de.fu_berlin.ties.extract
Class TrainEval

java.lang.Object
  extended by de.fu_berlin.ties.ConfigurableProcessor
      extended by de.fu_berlin.ties.TextProcessor
          extended by de.fu_berlin.ties.extract.TrainEval
All Implemented Interfaces:
Closeable, Processor

public class TrainEval
extends TextProcessor
implements Closeable

Trains an extractor and evaluates extraction quality. Processes shuffle files (as generated by ShuffleGenerator) that contain the files to use for training and evaluation. For each of these files, a corresponding answer key (*.ans) must exist.
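
The answer-key requirement can be checked before a run starts. The sketch below assumes that an input file's answer key has the same base name with the extension replaced by "ans" (for example, doc1.html becomes doc1.ans). The helper class AnswerKeys and its methods are hypothetical and not part of TIES.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not TIES code) for locating answer keys. */
final class AnswerKeys {
    private AnswerKeys() {}

    /** Replaces the extension of the last path segment (if any) by "ans". */
    static String answerKeyName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        int slash = Math.max(fileName.lastIndexOf('/'), fileName.lastIndexOf(File.separatorChar));
        // Only treat the dot as an extension separator if it is in the last segment
        String base = (dot > slash) ? fileName.substring(0, dot) : fileName;
        return base + ".ans";
    }

    /** Returns the input files (relative to dir) whose answer key is missing. */
    static List<String> missingKeys(File dir, List<String> fileNames) {
        List<String> missing = new ArrayList<>();
        for (String name : fileNames) {
            File key = new File(dir, answerKeyName(name));
            if (!key.isFile()) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```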

Instances of this class are not thread-safe.

Version:
$Revision: 1.88 $, $Date: 2006/10/21 16:04:14 $, $Author: siefkes $
Author:
Christian Siefkes

Nested Class Summary
static class TrainEval.Results
          An inner class wrapping the results of a training + evaluation run.
 
Field Summary
static String CONFIG_FEEDBACK
          Configuration key: If true, a fully incremental setup is used where the trainer is trained on each document after the extractor has processed it.
static String CONFIG_SENTENCE_TUNE
          Configuration key: The maximum number of iterations used for TUNE training the sentence classifier; if 0 or negative, the value of Tuner.CONFIG_TUNE is used.
static String EXT_METRICS
          File extension used for metrics files.
static String KEY_ITERATION
          Serialization key for the number of the iteration (when TUNE training).
static String KEY_RUN
          Serialization key for the number of the run.
static String KEY_TYPE
          Serialization key for the type (either "Train" or "Eval").
static String TYPE_EVAL
          Serialization value for the "Eval" type.
static String TYPE_TRAIN
          Serialization value for the "Train" type.
 
Fields inherited from class de.fu_berlin.ties.TextProcessor
CONFIG_POST, KEY_DIRECTORY, KEY_LOCAL_NAME, KEY_OUT_DIRECTORY, KEY_URL
 
Constructor Summary
TrainEval()
          Creates a new instance, using a default extension and the standard configuration.
TrainEval(String outExt)
          Creates a new instance, using the standard configuration.
TrainEval(String outExt, TiesConfiguration config)
          Creates a new instance.
TrainEval(String outExt, Tuner theTuner, int sentenceTUNE, boolean giveFeedback, boolean doEvaluate, String predExt, boolean doStorePredsInOutDir, TiesConfiguration config)
          Creates a new instance.
 
Method Summary
 void close(int errorCount)
          Closes this instance, releasing all resources and stopping any background threads.
protected  void doProcess(Reader reader, Writer writer, ContextMap context)
          Processes the contents of a reader, writing a modified version to a writer.
protected  Extractor initExtractor(Trainer trainer)
          Creates and initializes an extractor to use for an evaluation run, re-using the components of the provided trainer.
protected  Trainer initTrainer(File runDirectory)
          Creates and initializes a trainer to use for an evaluation run, configured from the stored configuration.
 String toString()
          Returns a string representation of this object.
 TrainEval.Results trainAndEval(String[] files, File inDirectory, File outDirectory, String baseName, Writer writer)
          Processes an array of files.
 
Methods inherited from class de.fu_berlin.ties.TextProcessor
getOutFileExt, process, process, process, process, process, process
 
Methods inherited from class de.fu_berlin.ties.ConfigurableProcessor
getConfig
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

CONFIG_FEEDBACK

public static final String CONFIG_FEEDBACK
Configuration key: If true, a fully incremental setup is used where the trainer is trained on each document after the extractor has processed it.

See Also:
Constant Field Values
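
The ordering that defines the fully incremental setup can be illustrated with a toy model. This sketch is hypothetical and not TIES code. FeedbackLoop "learns" by memorizing the entity tokens listed in each answer key and labels a document by looking up its tokens. What matters is the order: each document is first labeled with the current model and only afterwards used for training, so no document is labeled by a model that has already seen its answer key.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy illustration (not TIES code) of an extract-then-train feedback loop. */
final class FeedbackLoop {
    /** A document: its tokens plus the entity tokens from its answer key. */
    record Doc(List<String> tokens, Set<String> answerKey) {}

    private final Set<String> knownEntities = new HashSet<>();

    /** Extraction step: predicts the tokens already known to be entities. */
    Set<String> extract(Doc doc) {
        Set<String> predicted = new HashSet<>();
        for (String token : doc.tokens()) {
            if (knownEntities.contains(token)) {
                predicted.add(token);
            }
        }
        return predicted;
    }

    /** Training step: learns from the document's answer key. */
    void train(Doc doc) {
        knownEntities.addAll(doc.answerKey());
    }

    /** Runs extract-then-train for each document; returns the predictions. */
    List<Set<String>> run(List<Doc> docs) {
        List<Set<String>> predictions = new ArrayList<>();
        for (Doc doc : docs) {
            predictions.add(extract(doc)); // predict with what is known so far
            train(doc);                    // only then feed the answer key back
        }
        return predictions;
    }
}
```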

CONFIG_SENTENCE_TUNE

public static final String CONFIG_SENTENCE_TUNE
Configuration key: The maximum number of iterations used for TUNE training the sentence classifier; if 0 or negative, the value of Tuner.CONFIG_TUNE is used.

See Also:
Constant Field Values
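
The fallback rule amounts to a single comparison. In this sketch, the Tuner's iteration count is passed in as a plain int, and SentenceTune.effectiveIterations is a hypothetical name.

```java
/** Hypothetical sketch of the CONFIG_SENTENCE_TUNE fallback rule. */
final class SentenceTune {
    private SentenceTune() {}

    static int effectiveIterations(int sentenceTune, int tunerIterations) {
        // 0 or negative means "not set": fall back to the Tuner's value
        return (sentenceTune > 0) ? sentenceTune : tunerIterations;
    }

    public static void main(String[] args) {
        System.out.println(effectiveIterations(5, 20));  // 5
        System.out.println(effectiveIterations(0, 20));  // 20
        System.out.println(effectiveIterations(-1, 20)); // 20
    }
}
```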

KEY_ITERATION

public static final String KEY_ITERATION
Serialization key for the number of the iteration (when TUNE training).

See Also:
Constant Field Values

KEY_RUN

public static final String KEY_RUN
Serialization key for the number of the run.

See Also:
Constant Field Values

KEY_TYPE

public static final String KEY_TYPE
Serialization key for the type (either "Train" or "Eval").

See Also:
Constant Field Values

TYPE_TRAIN

public static final String TYPE_TRAIN
Serialization value for the "Train" type.

See Also:
Constant Field Values

TYPE_EVAL

public static final String TYPE_EVAL
Serialization value for the "Eval" type.

See Also:
Constant Field Values
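
The following hypothetical sketch shows how a metrics entry could be tagged with KEY_RUN, KEY_ITERATION and KEY_TYPE. Several details are assumptions and do not describe what TrainEval actually writes:

- The key strings are placeholders. The real values are listed under Constant Field Values.
- The type values are assumed to be the literal strings "Train" and "Eval".
- The "key: value" line format is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch (not TIES code) of tagging a metrics entry. */
final class MetricsTags {
    static final String KEY_RUN = "run";             // placeholder value
    static final String KEY_ITERATION = "iteration"; // placeholder value
    static final String KEY_TYPE = "type";           // placeholder value
    static final String TYPE_TRAIN = "Train";        // assumed value
    static final String TYPE_EVAL = "Eval";          // assumed value

    private MetricsTags() {}

    /** Builds the tags; the iteration is recorded only when TUNE training. */
    static Map<String, String> tags(int run, Integer iteration, boolean eval) {
        Map<String, String> tags = new LinkedHashMap<>(); // keeps insertion order
        tags.put(KEY_RUN, Integer.toString(run));
        if (iteration != null) {
            tags.put(KEY_ITERATION, Integer.toString(iteration));
        }
        tags.put(KEY_TYPE, eval ? TYPE_EVAL : TYPE_TRAIN);
        return tags;
    }

    /** Serializes the tags as "key: value" lines (an assumed format). */
    static String serialize(Map<String, String> tags) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : tags.entrySet()) {
            sb.append(e.getKey()).append(": ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }
}
```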

EXT_METRICS

public static final String EXT_METRICS
File extension used for metrics files.

See Also:
Constant Field Values

Constructor Detail

TrainEval

public TrainEval()
          throws IllegalArgumentException,
                 ClassCastException,
                 NoSuchElementException
Creates a new instance, using a default extension and the standard configuration.

Throws:
IllegalArgumentException - if the configured values are outside the allowed ranges
ClassCastException - if the configured numeric values cannot be parsed
NoSuchElementException - if one of the required values is missing from the configuration

TrainEval

public TrainEval(String outExt)
          throws IllegalArgumentException,
                 ClassCastException,
                 NoSuchElementException
Creates a new instance, using the standard configuration.

Parameters:
outExt - the extension to use for output files
Throws:
IllegalArgumentException - if the configured values are outside the allowed ranges
ClassCastException - if the configured numeric values cannot be parsed
NoSuchElementException - if one of the required values is missing from the configuration

TrainEval

public TrainEval(String outExt,
                 TiesConfiguration config)
          throws IllegalArgumentException,
                 ClassCastException,
                 NoSuchElementException
Creates a new instance.

Parameters:
outExt - the extension to use for output files
config - used to configure this instance
Throws:
IllegalArgumentException - if the configured values are outside the allowed ranges
ClassCastException - if the configured numeric values cannot be parsed
NoSuchElementException - if one of the required values is missing from the configuration
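
The three constructors above share the same exception contract. The sketch below shows how reading one required integer setting could map onto it. ConfigReader is a hypothetical helper that works on a plain Map and does not use the TiesConfiguration API.

```java
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical helper mirroring the documented constructor exceptions. */
final class ConfigReader {
    private ConfigReader() {}

    static int requireInt(Map<String, String> config, String key, int min, int max) {
        String raw = config.get(key);
        if (raw == null) {
            // Required value missing from the configuration
            throw new NoSuchElementException("Missing configuration key: " + key);
        }
        int value;
        try {
            value = Integer.parseInt(raw.trim());
        } catch (NumberFormatException nfe) {
            // Numeric value cannot be parsed; ClassCastException has no cause constructor
            ClassCastException cce = new ClassCastException(
                    "Not an integer for " + key + ": " + raw);
            cce.initCause(nfe);
            throw cce;
        }
        if (value < min || value > max) {
            // Configured value outside the allowed range
            throw new IllegalArgumentException(
                    key + " = " + value + " is outside [" + min + ", " + max + "]");
        }
        return value;
    }
}
```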

TrainEval

public TrainEval(String outExt,
                 Tuner theTuner,
                 int sentenceTUNE,
                 boolean giveFeedback,
                 boolean doEvaluate,
                 String predExt,
                 boolean doStorePredsInOutDir,
                 TiesConfiguration config)
Creates a new instance.

Parameters:
outExt - the extension to use for output files
theTuner - used for TUNE training
sentenceTUNE - the maximum number of iterations used for TUNE training the sentence classifier (if used); if 0 or negative, the value of Tuner.getTuneIterations() is used
giveFeedback - if true, a fully incremental setup is used where the trainer is trained on each document after the extractor has processed it; this and Tuner.isTuneEach() must not both be set to true when training for several TUNE iterations, because that would mean evaluating on the training set
doEvaluate - whether to evaluate predictions by comparing them to answer keys; if false, predictions are stored without being evaluated
predExt - the extension used to store predictions (if doEvaluate is set to false)
doStorePredsInOutDir - whether to write prediction files to the configured output directory or to the directory containing the input file
config - used to configure superclasses, trainer, and extractor; if null, the standard configuration is used
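
This hypothetical sketch illustrates two rules from the parameter list above:

- Reject giveFeedback combined with Tuner.isTuneEach() when training for several TUNE iterations.
- Choose where prediction files go. This only matters when doEvaluate is false.

TrainEvalOptions is not part of TIES, and the Tuner values are passed in as plain fields. Two details are assumptions: enforcing the first rule with IllegalArgumentException, and replacing the input file's extension by predExt.

```java
import java.io.File;

/** Hypothetical options holder (not TIES code). */
final class TrainEvalOptions {
    final boolean giveFeedback;
    final boolean tuneEach;           // stands in for Tuner.isTuneEach()
    final int tuneIterations;         // stands in for Tuner.getTuneIterations()
    final String predExt;
    final boolean storePredsInOutDir;

    TrainEvalOptions(boolean giveFeedback, boolean tuneEach, int tuneIterations,
                     String predExt, boolean storePredsInOutDir) {
        if (giveFeedback && tuneEach && tuneIterations > 1) {
            // This combination would mean evaluating on the training set
            throw new IllegalArgumentException(
                    "giveFeedback and tuneEach must not both be true for several TUNE iterations");
        }
        this.giveFeedback = giveFeedback;
        this.tuneEach = tuneEach;
        this.tuneIterations = tuneIterations;
        this.predExt = predExt;
        this.storePredsInOutDir = storePredsInOutDir;
    }

    /** Where the predictions for an input file would be stored (assumed naming). */
    File predictionFile(File inputFile, File outDirectory) {
        String name = inputFile.getName();
        int dot = name.lastIndexOf('.');
        String base = (dot > 0) ? name.substring(0, dot) : name;
        // Either the output directory or the directory containing the input file
        File dir = storePredsInOutDir ? outDirectory : inputFile.getParentFile();
        return new File(dir, base + "." + predExt);
    }
}
```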

Method Detail

close

public void close(int errorCount)
           throws IOException,
                  ProcessingException
Closes this instance, releasing all resources and stopping any background threads.

Specified by:
close in interface Closeable
Parameters:
errorCount - the number of errors (exceptions) that occurred during calls to this instance (0 if none)
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs while processing any remaining input
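
This sketch shows one way to honor the close(int errorCount) convention. It assumes the caller counts the exceptions raised while using the instance and always closes it in a finally block. ErrorCountingCloseable and CloseDemo are hypothetical stand-ins declared here to keep the example self-contained. They are not the TIES Closeable interface.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical stand-in for an interface with close(int errorCount). */
interface ErrorCountingCloseable {
    void close(int errorCount) throws IOException;
}

final class CloseDemo {
    private CloseDemo() {}

    /** Toy processor that fails on empty input and records the reported count. */
    static final class Recorder implements ErrorCountingCloseable {
        int reportedErrors = -1; // -1 until close() has been called

        void process(String input) {
            if (input.isEmpty()) {
                throw new IllegalStateException("empty input");
            }
        }

        @Override
        public void close(int errorCount) {
            reportedErrors = errorCount;
        }
    }

    /** Processes all inputs, counting failures, and always closes the processor. */
    static int runAll(Recorder processor, List<String> inputs) {
        int errors = 0;
        try {
            for (String input : inputs) {
                try {
                    processor.process(input);
                } catch (RuntimeException e) {
                    errors++; // count the failure and continue with the next input
                }
            }
        } finally {
            processor.close(errors); // release resources, reporting the error count
        }
        return errors;
    }
}
```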

doProcess

protected void doProcess(Reader reader,
                         Writer writer,
                         ContextMap context)
                  throws IOException,
                         ProcessingException
Processes the contents of a reader, writing a modified version to a writer.

Specified by:
doProcess in class TextProcessor
Parameters:
reader - reader containing the text to process; should not be closed by this method
writer - the writer to write the processed text to; might be flushed but not closed by this method; if this method does not use the writer, the underlying file will be deleted afterwards
context - a map of objects that are made available for processing; when called from the process methods inherited from TextProcessor, it will contain mappings from IOUtils.KEY_LOCAL_CHARSET to the character set of the output writer; from TextProcessor.KEY_OUT_DIRECTORY to the output directory (File); from ContentType.KEY_MIME_TYPE to the document's MIME type; from TextProcessor.KEY_LOCAL_NAME to the local name (String); and either from TextProcessor.KEY_DIRECTORY to the input directory (File), in case of a local file, or from TextProcessor.KEY_URL to the URL of the processed document (otherwise)
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing
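
A doProcess implementation may need to tell a local file from a remote document. This hypothetical sketch shows one way to do so using the context map. The key strings are placeholders for TextProcessor.KEY_LOCAL_NAME, KEY_DIRECTORY and KEY_URL. Combining the input directory with the local name to obtain the file path is an assumption.

```java
import java.io.File;
import java.util.Map;

/** Hypothetical helper (not TIES code) for locating the processed document. */
final class ContextLookup {
    // Placeholder key strings; the real constants are defined in TextProcessor
    static final String KEY_LOCAL_NAME = "localName";
    static final String KEY_DIRECTORY = "directory";
    static final String KEY_URL = "url";

    private ContextLookup() {}

    /** Returns the path of a local file, or the URL of a remote document. */
    static String describeSource(Map<String, Object> context) {
        Object dir = context.get(KEY_DIRECTORY);
        if (dir instanceof File) {
            // Local file: assumed to be the local name inside the input directory
            String localName = (String) context.get(KEY_LOCAL_NAME);
            return new File((File) dir, localName).getPath();
        }
        Object url = context.get(KEY_URL);
        if (url != null) {
            return url.toString();
        }
        throw new IllegalArgumentException("context has neither an input directory nor a URL");
    }
}
```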

initExtractor

protected Extractor initExtractor(Trainer trainer)
Creates and initializes an extractor to use for an evaluation run, re-using the components of the provided trainer. Subclasses can override this method to provide a different extractor.

Parameters:
trainer - trainer whose components should be re-used
Returns:
the created extractor

initTrainer

protected Trainer initTrainer(File runDirectory)
                       throws ProcessingException
Creates and initializes a trainer to use for an evaluation run, configured from the stored configuration. Subclasses can override this method to provide a different trainer.

Parameters:
runDirectory - directory used to run the classifier
Returns:
the created trainer
Throws:
ProcessingException - if an error occurs during initialization
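
initTrainer and initExtractor act as factory-method hooks that subclasses can override. The sketch below shows the pattern using simplified, hypothetical stand-in classes instead of the TIES types: Driver, SimpleTrainer, SimpleExtractor and Components. Run directories are plain strings rather than File objects.

```java
/** Shared model components (hypothetical stand-in). */
class Components {
    final String name;
    Components(String name) { this.name = name; }
}

class SimpleTrainer {
    final Components components;
    SimpleTrainer(Components components) { this.components = components; }
}

class SimpleExtractor {
    final Components components;
    SimpleExtractor(Components components) { this.components = components; }
}

/** Driver whose set-up logic only calls the two overridable hooks. */
class Driver {
    protected SimpleTrainer initTrainer(String runDirectory) {
        return new SimpleTrainer(new Components("default@" + runDirectory));
    }

    protected SimpleExtractor initExtractor(SimpleTrainer trainer) {
        // Re-use the trainer's components instead of building new ones
        return new SimpleExtractor(trainer.components);
    }

    final SimpleExtractor setUpRun(String runDirectory) {
        SimpleTrainer trainer = initTrainer(runDirectory);
        return initExtractor(trainer);
    }
}

/** Subclass providing a different trainer by overriding a single hook. */
class CustomDriver extends Driver {
    @Override
    protected SimpleTrainer initTrainer(String runDirectory) {
        return new SimpleTrainer(new Components("custom@" + runDirectory));
    }
}
```

setUpRun only calls the hooks. As a result, CustomDriver changes the trainer without touching the set-up logic, and the extractor still picks up whatever components the trainer provides.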

toString

public String toString()
Returns a string representation of this object.

Overrides:
toString in class TextProcessor
Returns:
a textual representation

trainAndEval

public TrainEval.Results trainAndEval(String[] files,
                                      File inDirectory,
                                      File outDirectory,
                                      String baseName,
                                      Writer writer)
                               throws IOException,
                                      ProcessingException
Processes an array of files. For each file, a corresponding answer key (*.ans) must exist.

Parameters:
files - the array of file names to process (relative to the inDirectory)
inDirectory - directory containing the files to process
outDirectory - directory used to do this run and store the results
baseName - the base name of the files to use for storing all extractions and training statistics
writer - used to serialize the calculated metrics
Returns:
a wrapper of the results of this run
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing
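
The following sketch simplifies the evaluation step. It assumes that extractions are compared with the answer key by exact string match and summarized as precision, recall and F1, with the counts written to the supplied Writer. ExactMatchEval is hypothetical. TrainEval.Results and the metrics that TIES actually serializes may differ.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical exact-match evaluator (not TIES code). */
final class ExactMatchEval {
    private int truePos;
    private int falsePos;
    private int falseNeg;

    /** Adds the counts for one document. */
    void add(Set<String> predicted, Set<String> answerKey) {
        Set<String> correct = new HashSet<>(predicted);
        correct.retainAll(answerKey);                  // predictions also in the key
        truePos += correct.size();
        falsePos += predicted.size() - correct.size(); // predicted but not in the key
        falseNeg += answerKey.size() - correct.size(); // in the key but missed
    }

    double precision() {
        int denom = truePos + falsePos;
        return denom == 0 ? 0.0 : (double) truePos / denom;
    }

    double recall() {
        int denom = truePos + falseNeg;
        return denom == 0 ? 0.0 : (double) truePos / denom;
    }

    double f1() {
        double p = precision();
        double r = recall();
        return (p + r == 0.0) ? 0.0 : 2 * p * r / (p + r);
    }

    /** Writes one summary line; flushes but does not close the caller's writer. */
    void writeTo(Writer writer) throws IOException {
        writer.write(String.format(Locale.ROOT,
                "TP=%d FP=%d FN=%d P=%.3f R=%.3f F1=%.3f%n",
                truePos, falsePos, falseNeg, precision(), recall(), f1()));
        writer.flush();
    }
}
```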


Copyright © 2003-2007 Christian Siefkes. All Rights Reserved.