de.fu_berlin.ties.extract
Class ExtractorBase

java.lang.Object
  extended by de.fu_berlin.ties.ConfigurableProcessor
      extended by de.fu_berlin.ties.TextProcessor
          extended by de.fu_berlin.ties.DocumentReader
              extended by de.fu_berlin.ties.extract.ExtractorBase
All Implemented Interfaces:
Processor
Direct Known Subclasses:
Extractor, Trainer

public abstract class ExtractorBase
extends DocumentReader

Common code base shared by Extractor and Trainer.

Instances of subclasses are not thread-safe and cannot process several documents in parallel.
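Because an instance keeps per-document state in fields, one way to process documents in parallel is to give each thread its own instance. The sketch below uses a hypothetical `StatefulProcessor` in place of a real `Extractor` or `Trainer` (whose constructors need a configuration), together with the standard `ThreadLocal.withInitial`. It illustrates the pattern under those assumptions and is not part of the TIES API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an ExtractorBase subclass that keeps per-document state in fields. */
class StatefulProcessor {
    // Mutable per-document state: sharing one instance across threads would corrupt it.
    private final List<String> priorTokens = new ArrayList<>();

    String process(String document) {
        priorTokens.clear(); // reset before each new document
        for (String token : document.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                priorTokens.add(token);
            }
        }
        return document + " -> " + priorTokens.size() + " tokens";
    }
}

/** Gives each thread its own processor, so no instance is ever used by two threads. */
final class PerThreadProcessors {
    private static final ThreadLocal<StatefulProcessor> LOCAL =
            ThreadLocal.withInitial(StatefulProcessor::new);

    private PerThreadProcessors() { }

    static StatefulProcessor current() {
        return LOCAL.get();
    }

    static String process(String document) {
        return current().process(document);
    }
}
```

With a thread pool, threads are reused, so one thread-local instance processes many documents in turn; in the sketch that is safe only because `process` resets its state before each document.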

Version:
$Revision: 1.17 $, $Date: 2004/04/08 16:07:28 $, $Author: siefkes $
Author:
Christian Siefkes

Field Summary
 
Fields inherited from class de.fu_berlin.ties.TextProcessor
CONFIG_POST, KEY_DIRECTORY, KEY_LOCAL_NAME, KEY_OUT_DIRECTORY, KEY_URL
 
Constructor Summary
ExtractorBase(String outExt)
          Creates a new instance.
ExtractorBase(String outExt, File runDirectory, TiesConfiguration config)
          Creates a new instance, configuring target structure, classifier, DefaultRepresentation, node filter, combination strategy and tokenizer factory from the provided configuration.
ExtractorBase(String outExt, TargetStructure targetStruct, Classifier theClassifier, Representation theRepresentation, CombinationStrategy combiStrat, TokenizerFactory tFactory, TiesConfiguration config)
          Creates a new instance.
ExtractorBase(String outExt, TiesConfiguration config)
          Creates a new instance, configuring target structure, classifier, DefaultRepresentation, node filter and combination strategy from the provided configuration.
 
Method Summary
protected  Set getActiveClasses()
          Returns the set of candidate classes to consider for the current element.
 Classifier getClassifier()
          Returns the classifier used for the local classification decisions.
 TokenizerFactory getFactory()
          Returns the factory used to instantiate tokenizers.
 FeatureCount getFeatureCount()
          Returns the object used to count documents, contexts, and features and to calculate averages.
protected  FeatureVector getFeatures()
          Returns the vector of features representing the currently processed element.
 PriorRecognitions getPriorRecognitions()
          Returns the buffer of preceding Recognitions from the current document.
 Representation getRepresentation()
          Returns the context representation used for local classifications.
protected  CombinationStrategy getStrategy()
          Returns the combination strategy used.
 TargetStructure getTargetStructure()
          Returns the target structure specifying the classes to recognize.
protected  void initFields()
          Initializes the fields used for processing a document (feature cache, buffer of prior recognitions, and statistics) and resets the combination strategy.
 String toString()
          Returns a string representation of this object.
protected  void updateState(Element element, String leftText, String mainText, String rightText)
          Helper that builds the features and determines the active classes for an element.
 FeatureCountView viewFeatureCount()
          Returns a read-only view on the counted documents, contexts, and features and the calculated averages.
 
Methods inherited from class de.fu_berlin.ties.DocumentReader
doProcess, process
 
Methods inherited from class de.fu_berlin.ties.TextProcessor
getOutFileExt, process, process, process, process
 
Methods inherited from class de.fu_berlin.ties.ConfigurableProcessor
getConfig
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

ExtractorBase

public ExtractorBase(String outExt)
              throws IllegalArgumentException,
                     ProcessingException
Creates a new instance. Delegates to ExtractorBase(String, TiesConfiguration) using the standard configuration.

Parameters:
outExt - the extension to use for output files
Throws:
IllegalArgumentException - if the combination strategy cannot be initialized (cf. CombinationStrategy.createStrategy(Set, TiesConfiguration))
ProcessingException - if an error occurs during initialization
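The one-argument constructor is described as delegating to the two-argument one with the standard configuration. A minimal sketch of that telescoping-constructor pattern, using a hypothetical `SimpleConfig` in place of `TiesConfiguration` and a hypothetical `ConfiguredProcessor` in place of `ExtractorBase`:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical minimal configuration type standing in for TiesConfiguration. */
final class SimpleConfig {
    private static final SimpleConfig STANDARD =
            new SimpleConfig(Collections.singletonMap("source", "standard"));

    private final Map<String, String> values;

    SimpleConfig(Map<String, String> values) {
        this.values = new HashMap<>(values);
    }

    /** Stand-in for "the standard configuration". */
    static SimpleConfig standard() {
        return STANDARD;
    }

    String get(String key) {
        return values.get(key);
    }
}

/** Hypothetical class illustrating the constructor delegation described above. */
class ConfiguredProcessor {
    private final String outExt;
    private final SimpleConfig config;

    /** Counterpart of ExtractorBase(String): delegates with the standard configuration. */
    ConfiguredProcessor(String outExt) {
        this(outExt, SimpleConfig.standard());
    }

    /** Counterpart of ExtractorBase(String, TiesConfiguration): all settings come from config. */
    ConfiguredProcessor(String outExt, SimpleConfig config) {
        this.outExt = outExt;
        this.config = config;
    }

    String getOutExt() { return outExt; }

    SimpleConfig getConfig() { return config; }
}
```

Keeping the real initialization in a single constructor means the configuration-reading logic exists in one place; the shorter constructors only supply defaults.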

ExtractorBase

public ExtractorBase(String outExt,
                     TiesConfiguration config)
              throws IllegalArgumentException,
                     ProcessingException
Creates a new instance, configuring target structure, classifier, DefaultRepresentation, node filter and combination strategy from the provided configuration.

Parameters:
outExt - the extension to use for output files
config - the configuration to use
Throws:
IllegalArgumentException - if the combination strategy cannot be initialized (cf. CombinationStrategy.createStrategy(Set, TiesConfiguration))
ProcessingException - if an error occurs during initialization

ExtractorBase

public ExtractorBase(String outExt,
                     File runDirectory,
                     TiesConfiguration config)
              throws IllegalArgumentException,
                     ProcessingException
Creates a new instance, configuring target structure, classifier, DefaultRepresentation, node filter, combination strategy and tokenizer factory from the provided configuration.

Parameters:
outExt - the extension to use for output files
runDirectory - the directory to run the classifier in; used instead of the configured directory if not null
config - the configuration to use
Throws:
IllegalArgumentException - if the combination strategy cannot be initialized (cf. CombinationStrategy.createStrategy(Set, TiesConfiguration))
ProcessingException - if an error occurs during initialization
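The `runDirectory` rule amounts to a null-coalescing choice between the explicit argument and the configured directory. A hypothetical helper (not a TIES method) expressing it:

```java
import java.io.File;

/** Hypothetical helper: picks the directory the classifier should run in. */
final class RunDirectorySketch {
    private RunDirectorySketch() { }

    /**
     * Returns runDirectory when it is not null; otherwise falls back to the
     * directory taken from the configuration (which may itself be null).
     */
    static File effectiveRunDirectory(File runDirectory, File configuredDirectory) {
        return (runDirectory != null) ? runDirectory : configuredDirectory;
    }
}
```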

ExtractorBase

public ExtractorBase(String outExt,
                     TargetStructure targetStruct,
                     Classifier theClassifier,
                     Representation theRepresentation,
                     CombinationStrategy combiStrat,
                     TokenizerFactory tFactory,
                     TiesConfiguration config)
Creates a new instance.

Parameters:
outExt - the extension to use for output files
targetStruct - the target structure specifying the classes to recognize
theClassifier - the classifier to use for the local classification decisions
theRepresentation - the context representation to use for local classifications
combiStrat - the combination strategy to use
tFactory - used to instantiate tokenizers
config - used to configure superclasses; if null, the standard configuration is used

Method Detail

getActiveClasses

protected Set getActiveClasses()
Returns the set of candidate classes to consider for the current element.

Returns:
the set of active (candidate) classes

getClassifier

public Classifier getClassifier()
Returns the classifier used for the local classification decisions.

Returns:
the local classifier

getFactory

public TokenizerFactory getFactory()
Returns the factory used to instantiate tokenizers.

Returns:
the tokenizer factory

getFeatureCount

public FeatureCount getFeatureCount()
Returns the object used to count documents, contexts, and features and to calculate averages.

Returns:
the used feature count
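The `FeatureCount` class itself is not documented here, so the following is only a guess at the kind of bookkeeping described: a hypothetical `FeatureCountSketch` that counts documents, contexts and features and derives two plausible averages (contexts per document, features per context). Returning 0.0 for empty counts is a choice of this sketch.

```java
/** Hypothetical counter mirroring what getFeatureCount() is described as tracking. */
final class FeatureCountSketch {
    private long documents;
    private long contexts;
    private long features;

    void countDocument() {
        documents++;
    }

    /** Records one context (e.g. one classified element) and the number of features it produced. */
    void countContext(int featuresInContext) {
        contexts++;
        features += featuresInContext;
    }

    double averageContextsPerDocument() {
        return documents == 0 ? 0.0 : (double) contexts / documents;
    }

    double averageFeaturesPerContext() {
        return contexts == 0 ? 0.0 : (double) features / contexts;
    }
}
```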

getFeatures

protected FeatureVector getFeatures()
Returns the vector of features representing the currently processed element.

Returns:
the feature vector of the current element

getPriorRecognitions

public PriorRecognitions getPriorRecognitions()
Returns the buffer of preceding Recognitions from the current document.

Returns:
the buffer

getRepresentation

public Representation getRepresentation()
Returns the context representation used for local classifications.

Returns:
the context representation

getStrategy

protected CombinationStrategy getStrategy()
Returns the combination strategy used.

Returns:
the combination strategy

getTargetStructure

public TargetStructure getTargetStructure()
Returns the target structure specifying the classes to recognize.

Returns:
the used target structure

initFields

protected void initFields()
Initializes the fields used for processing a document (feature cache, buffer of prior recognitions, and statistics) and resets the combination strategy.
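A sketch of what resetting per-document state can look like, assuming a cache of tokenized contexts, a deque of prior recognitions, a per-document context counter and a strategy with a `reset()` method. `PerDocumentFields`, `SketchStrategy` and all their members are hypothetical stand-ins, not TIES classes:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical combination-strategy stand-in that remembers the last label it saw. */
class SketchStrategy {
    private String lastLabel;

    void observe(String label) { lastLabel = label; }

    void reset() { lastLabel = null; }

    String getLastLabel() { return lastLabel; }
}

/** Hypothetical holder of the per-document fields that initFields() is described as resetting. */
class PerDocumentFields {
    private Map<String, String[]> featureCache = new HashMap<>();
    private Deque<String> priorRecognitions = new ArrayDeque<>();
    private int contextsInDocument;
    private final SketchStrategy strategy = new SketchStrategy();

    /** Fresh feature cache, empty recognition buffer, zeroed statistics, reset strategy. */
    protected void initFields() {
        featureCache = new HashMap<>();
        priorRecognitions = new ArrayDeque<>();
        contextsInDocument = 0;
        strategy.reset();
    }

    void record(String context, String recognition) {
        // Tokenize each distinct context only once per document.
        featureCache.computeIfAbsent(context, c -> c.trim().split("\\s+"));
        priorRecognitions.addLast(recognition);
        contextsInDocument++;
        strategy.observe(recognition);
    }

    int cachedContexts() { return featureCache.size(); }

    int priorCount() { return priorRecognitions.size(); }

    int getContextsInDocument() { return contextsInDocument; }

    SketchStrategy getStrategy() { return strategy; }
}
```

The fields are initialized at declaration rather than by calling `initFields()` from the constructor, which avoids invoking an overridable method before a subclass is fully constructed.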


toString

public String toString()
Returns a string representation of this object.

Overrides:
toString in class TextProcessor
Returns:
a textual representation

updateState

protected void updateState(Element element,
                           String leftText,
                           String mainText,
                           String rightText)
Helper that builds the features and determines the active classes for an element.

Parameters:
element - the element to process
leftText - textual content to the left of (preceding) mainText, might be empty
mainText - the main textual content to represent, might be empty
rightText - textual content to the right of (following) mainText, might be empty
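One way the three text parameters could feed a feature vector: tokens are prefixed by their position (`L:`, `M:`, `R:`), and the candidate classes are then chosen from those features. The prefixes, the whitespace tokenization, the `candidateSelector` function and the omission of the `element` parameter are all assumptions of this hypothetical `UpdateStateSketch`; the actual representation and the way TIES selects active classes are not specified here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical illustration of building features and active classes from left/main/right text. */
final class UpdateStateSketch {
    private List<String> features = Collections.emptyList();
    private Set<String> activeClasses = Collections.emptySet();
    private final Function<List<String>, Set<String>> candidateSelector;

    UpdateStateSketch(Function<List<String>, Set<String>> candidateSelector) {
        this.candidateSelector = candidateSelector;
    }

    void updateState(String leftText, String mainText, String rightText) {
        List<String> vector = new ArrayList<>();
        addTokens(vector, "L:", leftText);
        addTokens(vector, "M:", mainText);
        addTokens(vector, "R:", rightText);
        features = Collections.unmodifiableList(vector);
        activeClasses = Collections.unmodifiableSet(
                new LinkedHashSet<>(candidateSelector.apply(features)));
    }

    /** Empty texts contribute no features, matching "might be empty" above. */
    private static void addTokens(List<String> out, String prefix, String text) {
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(prefix + token);
            }
        }
    }

    List<String> getFeatures() { return features; }

    Set<String> getActiveClasses() { return activeClasses; }
}
```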

viewFeatureCount

public FeatureCountView viewFeatureCount()
Returns a read-only view of the counted documents, contexts, and features and the calculated averages. This is not a snapshot: the view changes whenever a document is processed.

Returns:
a view on the counts and averages
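The difference between a live read-only view and a snapshot can be shown with the standard `Collections.unmodifiableMap` wrapper. `CountViewSketch` is hypothetical and uses plain maps instead of `FeatureCountView`:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical counter contrasting a live read-only view with a snapshot copy. */
final class CountViewSketch {
    private final Map<String, Integer> counts = new HashMap<>();
    // Wraps the same map: reads see later updates, writes throw UnsupportedOperationException.
    private final Map<String, Integer> view = Collections.unmodifiableMap(counts);

    void documentProcessed(int contexts) {
        counts.merge("documents", 1, Integer::sum);
        counts.merge("contexts", contexts, Integer::sum);
    }

    /** Live, read-only view (the behaviour described for viewFeatureCount()). */
    Map<String, Integer> viewCounts() {
        return view;
    }

    /** Independent copy that does not change afterwards. */
    Map<String, Integer> snapshot() {
        return new HashMap<>(counts);
    }
}
```

A caller that needs stable numbers, for example to report statistics after a particular document, should copy the values rather than hold on to the view.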


Copyright © 2003-2004 Christian Siefkes. All Rights Reserved.