java.lang.Object
de.fu_berlin.ties.ConfigurableProcessor
de.fu_berlin.ties.TextProcessor
de.fu_berlin.ties.DocumentReader
de.fu_berlin.ties.extract.ExtractorBase
Common code base shared by Extractor and Trainer.
Instances of subclasses are not thread-safe and cannot process several documents in parallel.
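
Because instances are not thread-safe, one common way to process several documents concurrently is to give every worker thread its own instance. The sketch below illustrates that idea with ThreadLocal. StatefulProcessor and PerThreadProcessing are hypothetical stand-ins invented for this illustration, not TIES classes; a real ExtractorBase subclass would be created through one of its own constructors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for a processor that keeps per-document state. */
class StatefulProcessor {
    // Mutable state shared by all calls on this instance (not thread-safe).
    private final StringBuilder buffer = new StringBuilder();

    String process(String document) {
        buffer.setLength(0); // reset the state before each document
        for (String token : document.split("\\s+")) {
            buffer.append(token.toUpperCase(Locale.ROOT)).append(' ');
        }
        return buffer.toString().trim();
    }
}

/** Hypothetical driver: one processor instance per worker thread. */
class PerThreadProcessing {
    private static final ThreadLocal<StatefulProcessor> PROCESSOR =
        ThreadLocal.withInitial(StatefulProcessor::new);

    static List<String> processAll(List<String> documents, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String doc : documents) {
                // Each task uses the instance owned by the thread running it.
                Callable<String> task = () -> PROCESSOR.get().process(doc);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // results keep the input order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("processing failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

No instance is ever touched by two threads at once, so the per-document state needs no locking. The cost is one instance per thread, which may matter if each instance holds a large model.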
Field Summary
Fields inherited from class de.fu_berlin.ties.TextProcessor
CONFIG_POST, KEY_DIRECTORY, KEY_LOCAL_NAME, KEY_OUT_DIRECTORY, KEY_URL
Constructor Summary

ExtractorBase(String outExt)
Creates a new instance.

ExtractorBase(String outExt, File runDirectory, TiesConfiguration config)
Creates a new instance, configuring target structure, classifier, DefaultRepresentation, node filter, combination strategy and tokenizer factory from the provided configuration.

ExtractorBase(String outExt, TargetStructure targetStruct, Classifier theClassifier, Representation theRepresentation, CombinationStrategy combiStrat, TokenizerFactory tFactory, TiesConfiguration config)
Creates a new instance.

ExtractorBase(String outExt, TiesConfiguration config)
Creates a new instance, configuring target structure, classifier, DefaultRepresentation, node filter and combination strategy from the provided configuration.
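
The overloads above differ in how much is passed explicitly and how much is read from a configuration. According to the constructor details, the single-argument form uses the standard configuration, and the fully explicit form falls back to the standard configuration when config is null. The following is a minimal sketch of that overload pattern only. SketchConfig and SketchProcessor are hypothetical names made up for the illustration and do not reproduce the actual TIES initialization code.

```java
/** Hypothetical stand-in for a configuration object. */
final class SketchConfig {
    private static final SketchConfig STANDARD = new SketchConfig("standard");
    private final String name;

    SketchConfig(String name) {
        this.name = name;
    }

    /** Returns the shared default configuration (an assumption of this sketch). */
    static SketchConfig standard() {
        return STANDARD;
    }

    String name() {
        return name;
    }
}

/** Hypothetical processor whose shorter constructor delegates to the longer one. */
class SketchProcessor {
    private final String outExt;
    private final SketchConfig config;

    /** Convenience form: uses the standard configuration. */
    SketchProcessor(String outExt) {
        this(outExt, SketchConfig.standard());
    }

    /** Explicit form: a null configuration falls back to the standard one. */
    SketchProcessor(String outExt, SketchConfig config) {
        this.outExt = outExt;
        this.config = (config != null) ? config : SketchConfig.standard();
    }

    String outExt() {
        return outExt;
    }

    SketchConfig config() {
        return config;
    }
}
```

Funnelling every overload into one constructor keeps validation and defaulting in a single place, so the overloads cannot drift apart.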
Method Summary

protected Set getActiveClasses()
Returns the set of candidate classes to consider for the current element.

Classifier getClassifier()
Returns the classifier used for the local classification decisions.

TokenizerFactory getFactory()
Returns the factory used to instantiate tokenizers.

FeatureCount getFeatureCount()
Returns the object used to count documents, contexts, and features and to calculate averages.

protected FeatureVector getFeatures()
Returns the vector of features representing the currently processed element.

PriorRecognitions getPriorRecognitions()
Returns the buffer of preceding Recognitions from the current document.

Representation getRepresentation()
Returns the context representation used for local classifications.

protected CombinationStrategy getStrategy()
Returns the combination strategy used.

TargetStructure getTargetStructure()
Returns the target structure specifying the classes to recognize.

protected void initFields()
Initializes the fields used for processing a document (feature cache, buffer of prior recognitions, and statistics) and resets the combination strategy.

String toString()
Returns a string representation of this object.

protected void updateState(Element element, String leftText, String mainText, String rightText)
Helper that builds the features and determines the active classes for an element.

FeatureCountView viewFeatureCount()
Returns a read-only view on the counted documents, contexts, and features and the calculated averages.

Methods inherited from class de.fu_berlin.ties.DocumentReader
doProcess, process

Methods inherited from class de.fu_berlin.ties.TextProcessor
getOutFileExt, process, process, process, process

Methods inherited from class de.fu_berlin.ties.ConfigurableProcessor
getConfig

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Constructor Detail

public ExtractorBase(String outExt) throws IllegalArgumentException, ProcessingException
Creates a new instance, delegating to ExtractorBase(String, TiesConfiguration) using the standard configuration.
Parameters:
outExt - the extension to use for output files
Throws:
IllegalArgumentException - if the combination strategy cannot be initialized (cf. CombinationStrategy.createStrategy(Set, TiesConfiguration))
ProcessingException - if an error occurs during initialization

public ExtractorBase(String outExt, TiesConfiguration config) throws IllegalArgumentException, ProcessingException
Creates a new instance, configuring target structure, classifier, DefaultRepresentation, node filter and combination strategy from the provided configuration.
Parameters:
outExt - the extension to use for output files
config - the configuration to use
Throws:
IllegalArgumentException - if the combination strategy cannot be initialized (cf. CombinationStrategy.createStrategy(Set, TiesConfiguration))
ProcessingException - if an error occurs during initialization

public ExtractorBase(String outExt, File runDirectory, TiesConfiguration config) throws IllegalArgumentException, ProcessingException
Creates a new instance, configuring target structure, classifier, DefaultRepresentation, node filter, combination strategy and tokenizer factory from the provided configuration.
Parameters:
outExt - the extension to use for output files
runDirectory - the directory to run the classifier in; used instead of the configured directory if not null
config - the configuration to use
Throws:
IllegalArgumentException - if the combination strategy cannot be initialized (cf. CombinationStrategy.createStrategy(Set, TiesConfiguration))
ProcessingException - if an error occurs during initialization

public ExtractorBase(String outExt, TargetStructure targetStruct, Classifier theClassifier, Representation theRepresentation, CombinationStrategy combiStrat, TokenizerFactory tFactory, TiesConfiguration config)
Creates a new instance.
Parameters:
outExt - the extension to use for output files
targetStruct - the target structure specifying the classes to recognize
theClassifier - the classifier to use for the local classification decisions
theRepresentation - the context representation to use for local classifications
combiStrat - the combination strategy to use
tFactory - used to instantiate tokenizers
config - used to configure superclasses; if null, the standard configuration is used

Method Detail
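
protected Set getActiveClasses()
Returns the set of candidate classes to consider for the current element.

public Classifier getClassifier()
Returns the classifier used for the local classification decisions.

public TokenizerFactory getFactory()
Returns the factory used to instantiate tokenizers.

public FeatureCount getFeatureCount()
Returns the object used to count documents, contexts, and features and to calculate averages.

protected FeatureVector getFeatures()
Returns the vector of features representing the currently processed element.

public PriorRecognitions getPriorRecognitions()
Returns the buffer of preceding Recognitions from the current document.

public Representation getRepresentation()
Returns the context representation used for local classifications.

protected CombinationStrategy getStrategy()
Returns the combination strategy used.

public TargetStructure getTargetStructure()
Returns the target structure specifying the classes to recognize.

protected void initFields()
Initializes the fields used for processing a document (feature cache, buffer of prior recognitions, and statistics) and resets the combination strategy.

public String toString()
Returns a string representation of this object.
Overrides:
toString in class TextProcessor

protected void updateState(Element element, String leftText, String mainText, String rightText)
Helper that builds the features and determines the active classes for an element.
Parameters:
element - the element to process
leftText - textual content to the left of (preceding) mainText, might be empty
mainText - the main textual content to represent, might be empty
rightText - textual content to the right of (following) mainText, might be empty

public FeatureCountView viewFeatureCount()
Returns a read-only view on the counted documents, contexts, and features and the calculated averages.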
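
The pairing of getFeatureCount() with viewFeatureCount() suggests the familiar read-only view pattern: one accessor hands out the mutable counter, the other exposes only its query methods. The sketch below shows one way such a pair can be arranged. CountView, Count and CountingProcessor are hypothetical names, and the assumption that the mutable type implements the view interface is not confirmed by this page.

```java
/** Hypothetical read-only view: query methods only. */
interface CountView {
    long documents();

    long features();

    double averageFeaturesPerDocument();
}

/** Hypothetical mutable counter that also serves as its own view. */
final class Count implements CountView {
    private long documents;
    private long features;

    void countDocument() {
        documents++;
    }

    void countFeatures(long n) {
        features += n;
    }

    @Override
    public long documents() {
        return documents;
    }

    @Override
    public long features() {
        return features;
    }

    @Override
    public double averageFeaturesPerDocument() {
        // Avoid division by zero before any document has been counted.
        return documents == 0 ? 0.0 : (double) features / documents;
    }
}

/** Hypothetical owner offering both the mutable object and the view. */
final class CountingProcessor {
    private final Count count = new Count();

    /** Full access, including the mutators. */
    Count getCount() {
        return count;
    }

    /** Same object, narrowed to the read-only interface. */
    CountView viewCount() {
        return count;
    }
}
```

In this sketch the view is a live window on the counter rather than a snapshot, so later updates remain visible through it. A caller could still downcast the view to the mutable type; wrapping the counter in a separate delegating class would close that gap.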