de.fu_berlin.ties.extract
Class Trainer

java.lang.Object
  extended by de.fu_berlin.ties.ConfigurableProcessor
      extended by de.fu_berlin.ties.TextProcessor
          extended by de.fu_berlin.ties.DocumentReader
              extended by de.fu_berlin.ties.extract.ExtractorBase
                  extended by de.fu_berlin.ties.extract.Trainer
All Implemented Interfaces:
Processor, TokenProcessor

public class Trainer
extends ExtractorBase
implements TokenProcessor

A trainer trains a local Classifier to be used for extraction.

Instances of this class are not thread-safe and cannot handle training on several documents in parallel.
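
One way to respect this restriction is to funnel all training calls through a single worker thread. The sketch below only illustrates that pattern: `SequentialTraining` and `CountingTrainer` are hypothetical stand-ins invented for this example, not TIES classes, and the real Trainer may need further setup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch, not part of TIES: runs all training calls on one thread. */
final class SequentialTraining {
    private SequentialTraining() { }

    /** Stand-in for a trainer whose state is not protected against concurrent access. */
    static final class CountingTrainer {
        private int trained = 0; // deliberately unsynchronized

        void train(String documentName) {
            trained++;
        }

        int trained() {
            return trained;
        }
    }

    /** Trains on every document, one at a time, and returns the number of documents trained. */
    static int trainAll(List<String> documentNames) {
        CountingTrainer trainer = new CountingTrainer();
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String name : documentNames) {
                pending.add(worker.submit(() -> trainer.train(name)));
            }
            for (Future<?> task : pending) {
                task.get(); // waits for completion and makes the task's writes visible
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while training", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("training failed", e.getCause());
        } finally {
            worker.shutdown();
        }
        return trainer.trained();
    }
}
```

Because only the single worker thread ever touches the trainer, callers on any number of threads can queue documents without the trainer seeing concurrent access.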

Version:
$Revision: 1.24 $, $Date: 2004/04/08 16:07:28 $, $Author: siefkes $
Author:
Christian Siefkes

Field Summary
static String CONFIG_TEST_ONLY
          Configuration key determining whether the trainer only ensures that all answer keys exist and can be located in the document instead of doing any training.
static String CONFIG_TOE
          Configuration key for determining the training mode (isTrainingOnlyErrors()).
 
Fields inherited from class de.fu_berlin.ties.TextProcessor
CONFIG_POST, KEY_DIRECTORY, KEY_LOCAL_NAME, KEY_OUT_DIRECTORY, KEY_URL
 
Constructor Summary
Trainer()
          Creates a new instance without specifying an output extension (which isn't needed anyway, because this class doesn't produce output).
Trainer(String outExt)
          Creates a new instance.
Trainer(String outExt, File runDirectory, TiesConfiguration config)
          Creates a new instance.
Trainer(String outExt, TargetStructure targetStruct, TrainableClassifier theClassifier, Representation theRepresentation, CombinationStrategy combiStrat, TokenizerFactory tFactory)
          Creates a new instance, using the standard configuration to configure the training mode and the superclasses.
Trainer(String outExt, TargetStructure targetStruct, TrainableClassifier theClassifier, Representation theRepresentation, CombinationStrategy combiStrat, TokenizerFactory tFactory, boolean trainOnlyErrors, boolean testOnly, TiesConfiguration config)
          Creates a new instance.
Trainer(String outExt, TiesConfiguration config)
          Creates a new instance.
 
Method Summary
 boolean isTestingOnly()
          If true the trainer only ensures that all answer keys exist and can be located in the document instead of doing any training.
 boolean isTrainingOnlyErrors()
          Whether to train only errors (TOE mode, recommended) or to train all instances (brute-force mode).
 void process(Document document, Writer writer, ContextMap context)
          Trains the local classifier with the correct extractions of an XML document, using the provided context representation.
 void processToken(Element element, String left, String token, String right, int tokenRep, boolean whitespaceBefore, ContextMap context)
          Trains the local classifier on the features of a token in an XML document.
 String toString()
          Returns a string representation of this object.
 Accuracy train(Document document, ExtractionContainer correctExtractions)
          Trains the local classifier with the correct extractions of an XML document, using the provided context representation.
 
Methods inherited from class de.fu_berlin.ties.extract.ExtractorBase
getActiveClasses, getClassifier, getFactory, getFeatureCount, getFeatures, getPriorRecognitions, getRepresentation, getStrategy, getTargetStructure, initFields, updateState, viewFeatureCount
 
Methods inherited from class de.fu_berlin.ties.DocumentReader
doProcess
 
Methods inherited from class de.fu_berlin.ties.TextProcessor
getOutFileExt, process, process, process, process
 
Methods inherited from class de.fu_berlin.ties.ConfigurableProcessor
getConfig
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

CONFIG_TOE

public static final String CONFIG_TOE
Configuration key for determining the training mode (isTrainingOnlyErrors()).

See Also:
Constant Field Values

CONFIG_TEST_ONLY

public static final String CONFIG_TEST_ONLY
Configuration key determining whether the trainer only ensures that all answer keys exist and can be located in the document instead of doing any training.

See Also:
Constant Field Values

Constructor Detail

Trainer

public Trainer()
        throws IllegalArgumentException,
               ProcessingException
Creates a new instance without specifying an output extension (which isn't needed anyway, because this class doesn't produce output). Delegates to Trainer(String, TiesConfiguration) using the standard configuration and a dummy extension.

Throws:
IllegalArgumentException - if the combination strategy cannot be initialized (cf. CombinationStrategy.createStrategy(Set, TiesConfiguration))
ProcessingException - if an error occurs during initialization

Trainer

public Trainer(String outExt)
        throws IllegalArgumentException,
               ProcessingException
Creates a new instance. Delegates to Trainer(String, TiesConfiguration) using the standard configuration.

Parameters:
outExt - the extension to use for output files
Throws:
IllegalArgumentException - if the combination strategy cannot be initialized (cf. CombinationStrategy.createStrategy(Set, TiesConfiguration))
ProcessingException - if an error occurs during initialization

Trainer

public Trainer(String outExt,
               TiesConfiguration config)
        throws IllegalArgumentException,
               ProcessingException
Creates a new instance. Delegates to the Trainer(String, File, TiesConfiguration) constructor without specifying a runDirectory.

Parameters:
outExt - the extension to use for output files
config - the configuration to use
Throws:
IllegalArgumentException - if the combination strategy cannot be initialized (cf. CombinationStrategy.createStrategy(Set, TiesConfiguration))
ProcessingException - if an error occurs during initialization

Trainer

public Trainer(String outExt,
               File runDirectory,
               TiesConfiguration config)
        throws IllegalArgumentException,
               ProcessingException
Creates a new instance. Sets the training mode (isTrainingOnlyErrors()) to the value of the CONFIG_TOE configuration key in the provided configuration and delegates to the corresponding super constructor to configure the fields.

Parameters:
outExt - the extension to use for output files
runDirectory - the directory to run the classifier in; used instead of the configured directory if not null
config - the configuration to use
Throws:
IllegalArgumentException - if the combination strategy cannot be initialized (cf. CombinationStrategy.createStrategy(Set, TiesConfiguration))
ProcessingException - if an error occurs during initialization

Trainer

public Trainer(String outExt,
               TargetStructure targetStruct,
               TrainableClassifier theClassifier,
               Representation theRepresentation,
               CombinationStrategy combiStrat,
               TokenizerFactory tFactory)
Creates a new instance, using the standard configuration to configure the training mode and the superclasses.

Parameters:
outExt - the extension to use for output files
targetStruct - the target structure specifying the classes to recognize
theClassifier - the classifier to train
theRepresentation - the context representation to use for training
combiStrat - the combination strategy to use
tFactory - used to instantiate tokenizers

Trainer

public Trainer(String outExt,
               TargetStructure targetStruct,
               TrainableClassifier theClassifier,
               Representation theRepresentation,
               CombinationStrategy combiStrat,
               TokenizerFactory tFactory,
               boolean trainOnlyErrors,
               boolean testOnly,
               TiesConfiguration config)
Creates a new instance.

Parameters:
outExt - the extension to use for output files
targetStruct - the target structure specifying the classes to recognize
theClassifier - the classifier to train
theRepresentation - the context representation to use for training
combiStrat - the combination strategy to use
tFactory - used to instantiate tokenizers
trainOnlyErrors - whether to train only errors (TOE mode, recommended) or to train all instances (brute-force mode)
testOnly - if true the trainer only ensures that all answer keys exist and can be located in the document instead of doing any training
config - used to configure superclasses; if null, the standard configuration is used

Method Detail

isTestingOnly

public boolean isTestingOnly()
If true the trainer only ensures that all answer keys exist and can be located in the document instead of doing any training.

Returns:
the value of the attribute
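
As a rough illustration of what a test-only pass checks, the sketch below reports the answer keys that cannot be found in a document's text. The class `AnswerKeyCheck`, the method `missingKeys`, and the plain substring match are assumptions made for this example. They are not part of the TIES API, and the trainer itself may locate answer keys differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of TIES. */
final class AnswerKeyCheck {
    private AnswerKeyCheck() { }

    /**
     * Returns the answer keys that do not occur in the document text.
     * Assumption: a key counts as locatable if it is a substring of the text.
     */
    static List<String> missingKeys(String documentText, List<String> answerKeys) {
        List<String> missing = new ArrayList<>();
        for (String key : answerKeys) {
            if (!documentText.contains(key)) {
                missing.add(key);
            }
        }
        return missing;
    }
}
```

An empty result means every answer key was found; no classifier state is touched, which mirrors the idea of checking the training data without training.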

isTrainingOnlyErrors

public boolean isTrainingOnlyErrors()
Whether to train only errors (TOE mode, recommended) or to train all instances (brute-force mode).

Returns:
the value of the attribute
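
The difference between the two modes can be sketched with a toy model: in TOE mode an instance updates the model only when the current prediction for it is wrong, while brute-force mode updates on every instance. `ToeSketch` and its count-based model are hypothetical and exist only for this illustration; they do not reflect the classifier TIES actually uses.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy model contrasting the two training modes; not the TIES classifier. */
final class ToeSketch {
    // counts.get(feature).get(label) = how often this pair was trained
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    private int updates = 0;

    /** Predicts the most frequently trained label for a feature, or null if the feature is unseen. */
    String predict(String feature) {
        Map<String, Integer> byLabel = counts.get(feature);
        if (byLabel == null) {
            return null;
        }
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> entry : byLabel.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    /**
     * Processes one training instance and returns whether the prediction
     * made before any update was already correct.
     */
    boolean train(String feature, String label, boolean trainOnlyErrors) {
        boolean correct = label.equals(predict(feature));
        if (!trainOnlyErrors || !correct) {
            // Brute-force mode always updates; TOE mode updates only on errors.
            counts.computeIfAbsent(feature, f -> new HashMap<>()).merge(label, 1, Integer::sum);
            updates++;
        }
        return correct;
    }

    /** Number of model updates performed so far. */
    int updates() {
        return updates;
    }
}
```

Since TOE mode predicts before deciding whether to update, it yields a per-instance correctness signal as a side effect, which is consistent with train(Document, ExtractionContainer) reporting a token accuracy only in TOE mode.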

process

public void process(Document document,
                    Writer writer,
                    ContextMap context)
             throws IOException,
                    ProcessingException
Trains the local classifier with the correct extractions of an XML document, using the provided context representation. In TOE mode, training statistics are serialized to the writer. The answer keys must be in a corresponding file ending in AnswerBuilder.EXT_ANSWERS, located in the same directory (when processing a local file) or in the current working directory (when processing a URL).

Specified by:
process in class DocumentReader
Parameters:
document - the document to read
writer - ignored by this method
context - a map of objects that are made available for processing
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing
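
The lookup rule for the answer-key file can be sketched as follows. Everything here is an assumption made for illustration: the helper `AnswerFileLocator`, the way the base name is derived from the document name, and the extension passed in (the actual value of AnswerBuilder.EXT_ANSWERS is not stated on this page).

```java
import java.io.File;

/** Hypothetical helper, not part of TIES. */
final class AnswerFileLocator {
    private AnswerFileLocator() { }

    /**
     * Builds the expected location of the answer-key file.
     *
     * @param directory  directory of the local document, or null when the document came from a URL
     * @param localName  file name of the document, e.g. "seminar.xml"
     * @param answersExt extension of answer files, without the dot (assumed value supplied by the caller)
     */
    static File answerFile(File directory, String localName, String answersExt) {
        // Local file: look next to the document. URL: fall back to the working directory.
        File baseDir = (directory != null) ? directory : new File(System.getProperty("user.dir"));
        // Assumption: the answer file shares the document's base name.
        int dot = localName.lastIndexOf('.');
        String baseName = (dot > 0) ? localName.substring(0, dot) : localName;
        return new File(baseDir, baseName + "." + answersExt);
    }
}
```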

processToken

public void processToken(Element element,
                         String left,
                         String token,
                         String right,
                         int tokenRep,
                         boolean whitespaceBefore,
                         ContextMap context)
                  throws ProcessingException
Trains the local classifier on the features of a token in an XML document.

Specified by:
processToken in interface TokenProcessor
Parameters:
element - the element containing the token
left - the textual contents of the element to the left of the token (in case of mixed contents, only up to the last preceding child element, if any)
token - the token to process
right - the textual contents of the element to the right of the token (in case of mixed contents, only up to the next following child element, if any)
tokenRep - the repetition of the token in the document (counting starts with 0, as the first occurrence is the "0th repetition")
whitespaceBefore - whether there is whitespace before the main token (either at the end of left or in the preceding element)
context - a map of objects that are made available for processing; ignored by this method
Throws:
ProcessingException - if an error occurs during processing
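
The zero-based counting used for tokenRep can be illustrated with a small sketch that assigns a repetition index to each token of a sequence. The class `TokenRepetitions` is hypothetical, and comparing tokens by exact, case-sensitive string equality is an assumption; the tokenizer in TIES may treat tokens differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of TIES. */
final class TokenRepetitions {
    private TokenRepetitions() { }

    /**
     * Returns, for each token, how many times the same token has already
     * occurred earlier in the list (0 for a first occurrence).
     */
    static List<Integer> repetitions(List<String> tokens) {
        Map<String, Integer> seen = new HashMap<>();
        List<Integer> result = new ArrayList<>();
        for (String token : tokens) {
            int rep = seen.getOrDefault(token, 0);
            result.add(rep);
            seen.put(token, rep + 1);
        }
        return result;
    }
}
```

For the token sequence "the", "cat", "the", "the", the resulting indexes are 0, 0, 1, 2.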

toString

public String toString()
Returns a string representation of this object.

Overrides:
toString in class ExtractorBase
Returns:
a textual representation

train

public Accuracy train(Document document,
                      ExtractionContainer correctExtractions)
               throws IOException,
                      ProcessingException
Trains the local classifier with the correct extractions of an XML document, using the provided context representation.

Parameters:
document - a document whose contents should be classified
correctExtractions - a container of all correct extractions for the document
Returns:
The token accuracy of the trained document if in TOE mode; null otherwise
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing


Copyright © 2003-2004 Christian Siefkes. All Rights Reserved.