de.fu_berlin.ties.extract
Class TrainEval

java.lang.Object
  extended by de.fu_berlin.ties.ConfigurableProcessor
      extended by de.fu_berlin.ties.DirectoryProcessor
          extended by de.fu_berlin.ties.extract.TrainEval
All Implemented Interfaces:
Processor

public class TrainEval
extends DirectoryProcessor

Trains an extractor and evaluates extraction quality.

Version:
$Revision: 1.24 $, $Date: 2004/04/13 08:13:23 $, $Author: siefkes $
Author:
Christian Siefkes

Nested Class Summary
static class TrainEval.Results
          An inner class wrapping the results of a training + evaluation run.
 
Field Summary
static String CONFIG_FILE_EXT
          Configuration key: The extension(s) of files to evaluate.
static String CONFIG_RUN
          Configuration key: Number of evaluation runs to do to get average results.
static String CONFIG_TRAIN_SPLIT
          Configuration key: The percentage of a corpus to use for training.
static String CONFIG_UNIFORM
          Configuration key for the isUniform() attribute.
static String KEY_RUN
          Serialization key for the number of the run.
static String OUTPUT_DIR
          The base name of the subdirectory created and used to store the output results.
static String RUN_DIR
          The base name of the subdirectories created in the OUTPUT_DIR to store the results of each evaluation run.
 
Constructor Summary
TrainEval()
          Creates a new instance, using the standard configuration.
TrainEval(FileFilter filter, float trainingSplit, int runNo, boolean uniformTesting, TiesConfiguration config)
          Creates a new instance.
TrainEval(TiesConfiguration config)
          Creates a new instance.
 
Method Summary
 float getEvalSplit()
          Returns the percentage of a corpus to use for evaluation.
 int getRuns()
          Returns the number of evaluation runs to do to get average results.
 float getTrainSplit()
          Returns the percentage of a corpus to use for training, expressed as a value x between 0 and 1; the remaining documents (1-x) are used for evaluation.
protected  Extractor initExtractor(Trainer trainer)
          Creates and initializes an extractor to use for an evaluation run, re-using the components of the provided trainer.
protected  Trainer initTrainer(File runDirectory)
          Creates and initializes a trainer to use for an evaluation run, configured from the stored configuration.
 boolean isUniform()
          If true, the evaluator does two runs with 50/50 split, using each file once for training and once for evaluation.
 void process(File[] files, ContextMap context)
          Processes an array of files, calling the trainAndEval(File[], ContextMap, File, int) method getRuns() times.
 String toString()
          Returns a string representation of this object.
 TrainEval.Results trainAndEval(File[] files, ContextMap context, File runDirectory, int runNo)
          Processes an array of files.
 
Methods inherited from class de.fu_berlin.ties.DirectoryProcessor
process, process
 
Methods inherited from class de.fu_berlin.ties.ConfigurableProcessor
getConfig
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

OUTPUT_DIR

public static final String OUTPUT_DIR
The base name of the subdirectory created and used to store the output results.

See Also:
Constant Field Values

RUN_DIR

public static final String RUN_DIR
The base name of the subdirectories created in the OUTPUT_DIR to store the results of each evaluation run.

See Also:
Constant Field Values

CONFIG_FILE_EXT

public static final String CONFIG_FILE_EXT
Configuration key: The extension(s) of files to evaluate.

See Also:
Constant Field Values

CONFIG_TRAIN_SPLIT

public static final String CONFIG_TRAIN_SPLIT
Configuration key: The percentage of a corpus to use for training.

See Also:
Constant Field Values

CONFIG_RUN

public static final String CONFIG_RUN
Configuration key: Number of evaluation runs to do to get average results.

See Also:
Constant Field Values

CONFIG_UNIFORM

public static final String CONFIG_UNIFORM
Configuration key for the isUniform() attribute.

See Also:
Constant Field Values

KEY_RUN

public static final String KEY_RUN
Serialization key for the number of the run.

See Also:
Constant Field Values

Constructor Detail

TrainEval

public TrainEval()
          throws IllegalArgumentException,
                 ClassCastException,
                 NoSuchElementException
Creates a new instance, using the standard configuration.

Throws:
IllegalArgumentException - if the configured values are outside the allowed ranges
ClassCastException - if the configured numeric values cannot be parsed
NoSuchElementException - if one of the required values is missing from the configuration

TrainEval

public TrainEval(TiesConfiguration config)
          throws IllegalArgumentException,
                 ClassCastException,
                 NoSuchElementException
Creates a new instance.

Parameters:
config - used to configure this instance
Throws:
IllegalArgumentException - if the configured values are outside the allowed ranges
ClassCastException - if the configured numeric values cannot be parsed
NoSuchElementException - if one of the required values is missing from the configuration

TrainEval

public TrainEval(FileFilter filter,
                 float trainingSplit,
                 int runNo,
                 boolean uniformTesting,
                 TiesConfiguration config)
          throws IllegalArgumentException
Creates a new instance.

Parameters:
filter - the filter used to decide which files to accept
trainingSplit - the percentage of a corpus to use for training, expressed as a value x between 0 and 1; the remaining documents (1-x) are used for evaluation
runNo - Number of evaluation runs to do to get average results
uniformTesting - if true, the evaluator does two runs with 50/50 split, using each file once for training and once for evaluation (ignoring the trainingSplit and runNo arguments)
config - used to configure superclasses, trainer, and extractor; if null, the standard configuration is used
Throws:
IllegalArgumentException - if trainingSplit is not a percentage (larger than 1 or smaller than 0) or if runNo is non-positive
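
The range checks described in the Throws clause can be sketched as follows. This is a minimal illustration, not the library's code: the class SplitArgs and its check method are hypothetical names, and the rejection of NaN is an added assumption.

```java
/** Hypothetical helper for illustration; not part of the TIES API. */
final class SplitArgs {
    private SplitArgs() { }

    /**
     * Rejects a training split outside [0, 1] and a non-positive run count,
     * mirroring the documented IllegalArgumentException conditions.
     */
    static void check(float trainingSplit, int runNo) {
        // NaN fails both range comparisons, so it is tested explicitly (assumption)
        if (Float.isNaN(trainingSplit) || trainingSplit < 0.0f || trainingSplit > 1.0f) {
            throw new IllegalArgumentException(
                "trainingSplit must be between 0 and 1: " + trainingSplit);
        }
        if (runNo <= 0) {
            throw new IllegalArgumentException(
                "runNo must be positive: " + runNo);
        }
    }
}
```

In this sketch the boundary values 0 and 1 are accepted, since the documentation only rejects values larger than 1 or smaller than 0.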

Method Detail

getEvalSplit

public float getEvalSplit()
Returns the percentage of a corpus to use for evaluation.

Returns:
1.0 - getTrainSplit()
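
The relation between the two splits, and how it might translate into file counts, can be sketched as below. The class SplitSketch is hypothetical, and rounding the training count to the nearest integer is an assumption; the documentation does not say how fractional counts are handled.

```java
/** Hypothetical sketch of the train/eval split arithmetic; not the TIES code. */
final class SplitSketch {
    private SplitSketch() { }

    /** Evaluation share is whatever the training share leaves over. */
    static float evalSplit(float trainSplit) {
        return 1.0f - trainSplit;
    }

    /** Number of training files; rounding to nearest is an assumption. */
    static int trainCount(int totalFiles, float trainSplit) {
        return Math.round(totalFiles * trainSplit);
    }

    /** All files not used for training are used for evaluation. */
    static int evalCount(int totalFiles, float trainSplit) {
        return totalFiles - trainCount(totalFiles, trainSplit);
    }
}
```

With 8 files and a training split of 0.25, this sketch assigns 2 files to training and 6 to evaluation.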

getRuns

public int getRuns()
Returns the number of evaluation runs to do to get average results.

Returns:
the value of the attribute

getTrainSplit

public float getTrainSplit()
Returns the percentage of a corpus to use for training, expressed as a value x between 0 and 1; the remaining documents (1-x) are used for evaluation.

Returns:
the percentage to use for training

initExtractor

protected Extractor initExtractor(Trainer trainer)
Creates and initializes an extractor to use for an evaluation run, re-using the components of the provided trainer. Subclasses can override this method to provide a different extractor.

Parameters:
trainer - trainer whose components should be re-used
Returns:
the created extractor

initTrainer

protected Trainer initTrainer(File runDirectory)
                       throws ProcessingException
Creates and initializes a trainer to use for an evaluation run, configured from the stored configuration. Subclasses can override this method to provide a different trainer.

Parameters:
runDirectory - directory used to run the classifier
Returns:
the created trainer
Throws:
ProcessingException - if an error occurs during initialization

isUniform

public boolean isUniform()
If true, the evaluator does two runs with 50/50 split, using each file once for training and once for evaluation. The getRuns() and getTrainSplit() settings are ignored in this case.

Returns:
the value of the attribute
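
The uniform mode amounts to a two-fold scheme: train on one half and evaluate on the other, then swap. A minimal sketch follows, assuming the files are split at the midpoint in their given order; how the class actually partitions the files is not documented here, and UniformSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a two-run 50/50 scheme; not the TIES code. */
final class UniformSketch {
    private UniformSketch() { }

    /** Splits the list at the midpoint; with an odd size the second half is larger. */
    static <T> List<List<T>> halves(List<T> files) {
        int mid = files.size() / 2;
        List<List<T>> folds = new ArrayList<>();
        folds.add(new ArrayList<>(files.subList(0, mid)));
        folds.add(new ArrayList<>(files.subList(mid, files.size())));
        return folds;
    }

    /** Describes both runs: each half is used once for training and once for evaluation. */
    static <T> List<String> plan(List<T> files) {
        List<List<T>> folds = halves(files);
        List<String> runs = new ArrayList<>();
        for (int run = 0; run < 2; run++) {
            List<T> train = folds.get(run);
            List<T> eval = folds.get(1 - run);
            runs.add("run " + (run + 1) + ": train=" + train + " eval=" + eval);
        }
        return runs;
    }
}
```

Because the two runs swap roles, every file appears exactly once in a training set and once in an evaluation set.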

process

public void process(File[] files,
                    ContextMap context)
             throws IOException,
                    ProcessingException
Processes an array of files, calling the trainAndEval(File[], ContextMap, File, int) method getRuns() times. For each file, a corresponding answer key (*.ans) must exist.

Specified by:
process in class DirectoryProcessor
Parameters:
files - the array of files to process
context - a map of objects that are made available for processing; will be empty when called from the implemented process methods in this class
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing
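
Before starting a run, a caller may want to verify that every input file has its answer key. The sketch below assumes the key sits next to the input file and replaces its extension with .ans (for example doc.txt and doc.ans); the actual naming rule is not stated here, and AnswerKeySketch is a hypothetical helper.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check for answer keys; not part of the TIES API. */
final class AnswerKeySketch {
    private AnswerKeySketch() { }

    /** Assumed naming rule: replace the last extension, if any, with ".ans". */
    static File answerKeyFor(File input) {
        String name = input.getName();
        int dot = name.lastIndexOf('.');
        String base = (dot > 0) ? name.substring(0, dot) : name;
        return new File(input.getParentFile(), base + ".ans");
    }

    /** Returns the input files whose assumed answer key does not exist on disk. */
    static List<File> missingAnswerKeys(File[] files) {
        List<File> missing = new ArrayList<>();
        for (File file : files) {
            if (!answerKeyFor(file).isFile()) {
                missing.add(file);
            }
        }
        return missing;
    }
}
```

An empty result from missingAnswerKeys suggests the files are ready to be passed to process, under the naming assumption above.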

toString

public String toString()
Returns a string representation of this object.

Overrides:
toString in class DirectoryProcessor
Returns:
a textual representation

trainAndEval

public TrainEval.Results trainAndEval(File[] files,
                                      ContextMap context,
                                      File runDirectory,
                                      int runNo)
                               throws IOException,
                                      ProcessingException
Processes an array of files. For each file, a corresponding answer key (*.ans) must exist.

Parameters:
files - the array of files to process
context - a map of objects that are made available for processing; will be empty when called from the implemented process methods in this class
runDirectory - directory used to do this run and store the results
runNo - the number of this run (counting starts with 1)
Returns:
a wrapper of the results of this run
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing
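
The per-run directories passed as runDirectory could be laid out as in the sketch below. It is an illustration only: the values of OUTPUT_DIR and RUN_DIR are not shown on this page, so the base names are plain parameters, and appending the run number to the run base name is an assumption. RunDirSketch is a hypothetical name.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a run directory layout; not the TIES code. */
final class RunDirSketch {
    private RunDirSketch() { }

    /**
     * Builds one directory path per run below the output subdirectory.
     * Run numbers start at 1, matching the runNo parameter of trainAndEval.
     */
    static List<File> runDirectories(File corpusDir, String outputBase,
                                     String runBase, int runs) {
        File outputDir = new File(corpusDir, outputBase);
        List<File> dirs = new ArrayList<>();
        for (int runNo = 1; runNo <= runs; runNo++) {
            // Assumed naming: base name followed by the run number
            dirs.add(new File(outputDir, runBase + runNo));
        }
        return dirs;
    }
}
```

The sketch only computes paths; it creates nothing on disk.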


Copyright © 2003-2004 Christian Siefkes. All Rights Reserved.