java.lang.Object
de.fu_berlin.ties.ConfigurableProcessor
de.fu_berlin.ties.TextProcessor
de.fu_berlin.ties.DocumentReader
de.fu_berlin.ties.extract.ExtractorBase
de.fu_berlin.ties.extract.Extractor
An extractor runs a local Classifier on a list of items/nodes and combines their results using a CombinationStrategy.
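The classify-then-combine idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the TIES implementation: the class `ClassifyAndCombineSketch`, the toy classifier, and the begin/inside/outside label scheme ("B-x", "I-x", "O") are all hypothetical stand-ins, and the real Classifier and CombinationStrategy types are not used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: none of these names belong to the TIES API.
class ClassifyAndCombineSketch {

    // Toy local classifier: labels each token in isolation (assumed label scheme).
    static String toyClassifier(String token) {
        switch (token) {
            case "Ada":      return "B-speaker"; // begins a "speaker" extraction
            case "Lovelace": return "I-speaker"; // continues a "speaker" extraction
            case "noon":     return "B-time";    // begins a "time" extraction
            default:         return "O";         // outside any extraction
        }
    }

    // Combination step: merges per-token labels into "type: text" extractions.
    static List<String> combine(List<String> tokens, List<String> labels) {
        List<String> extractions = new ArrayList<>();
        String type = null;                        // type of the open extraction, if any
        StringBuilder text = new StringBuilder();  // text of the open extraction
        for (int i = 0; i < tokens.size(); i++) {
            String label = labels.get(i);
            boolean continues = type != null && label.equals("I-" + type);
            if (type != null && !continues) {
                // The open extraction ends before this token.
                extractions.add(type + ": " + text);
                type = null;
                text.setLength(0);
            }
            if (label.startsWith("B-")) {
                type = label.substring(2);
                text.append(tokens.get(i));
            } else if (continues) {
                text.append(' ').append(tokens.get(i));
            }
        }
        if (type != null) {
            extractions.add(type + ": " + text);
        }
        return extractions;
    }

    // Runs the local classifier on every token, then combines the results.
    static List<String> extract(List<String> tokens, Function<String, String> classifier) {
        List<String> labels = new ArrayList<>();
        for (String token : tokens) {
            labels.add(classifier.apply(token));
        }
        return combine(tokens, labels);
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Talk", "by", "Ada", "Lovelace", "at", "noon");
        System.out.println(extract(tokens, ClassifyAndCombineSketch::toyClassifier));
    }
}
```

The point of the split is that the classifier only ever sees one token at a time, while the combination step alone decides where an extraction starts and ends.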
Instances of this class are not thread-safe and cannot extract from several documents in parallel.
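Because a single instance cannot be shared across documents processed in parallel, one common workaround is to give every worker thread its own instance. The following sketch shows that pattern with `ThreadLocal`. `StatefulExtractor` is a hypothetical stand-in that merely mimics per-document mutable state; it is not the TIES Extractor, whose construction cost and configuration needs are not covered here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of a one-instance-per-thread setup.
class PerThreadExtractorSketch {

    // Stand-in for a non-thread-safe extractor: it keeps per-document state in a field.
    static class StatefulExtractor {
        private final List<String> current = new ArrayList<>();

        // Toy extraction rule (assumption): collect capitalized tokens.
        List<String> extract(String document) {
            current.clear();
            for (String token : document.split("\\s+")) {
                if (!token.isEmpty() && Character.isUpperCase(token.charAt(0))) {
                    current.add(token);
                }
            }
            return new ArrayList<>(current);
        }
    }

    // Each worker thread lazily creates and then re-uses its own instance.
    private static final ThreadLocal<StatefulExtractor> EXTRACTOR =
            ThreadLocal.withInitial(StatefulExtractor::new);

    // Processes the documents in parallel; results are returned in input order.
    static List<List<String>> extractAll(List<String> documents, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String document : documents) {
                Callable<List<String>> task = () -> EXTRACTOR.get().extract(document);
                futures.add(pool.submit(task));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("extraction failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(extractAll(
                Arrays.asList("Ada met Bob", "no names here", "Call Grace"), 2));
    }
}
```

Whether separate instances can safely share an underlying classifier or model file is not stated in this documentation and would need to be checked before using this pattern.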
Field Summary
static String EXT_EXTRACTIONS
    The recommended file extension to use for storing extractions.

Fields inherited from class de.fu_berlin.ties.TextProcessor
    CONFIG_POST, KEY_DIRECTORY, KEY_LOCAL_NAME, KEY_OUT_DIRECTORY, KEY_URL
Constructor Summary
Extractor(String outExt)
    Creates a new instance.
Extractor(String outExt, File runDirectory, TiesConfiguration config)
    Creates a new instance.
Extractor(String outExt, TargetStructure targetStruct, Classifier theClassifier, Representation theRepresentation, CombinationStrategy combiStrat, TokenizerFactory tFactory, TiesConfiguration config)
    Creates a new instance.
Extractor(String outExt, TiesConfiguration config)
    Creates a new instance.
Extractor(String outExt, Trainer trainer)
    Creates a new instance, re-using the components from the provided trainer.
Method Summary
ExtractionContainer extract(Document document)
    Extracts items of interest from the contents of an XML document, based on context representation and local classifier.
protected ExtractionContainer getPredictedExtractions()
    Returns the extraction container used for storing the predicted extractions.
void process(Document document, Writer writer, ContextMap context)
    Extracts items of interest from the contents of an XML document and serializes the extractions.
void processToken(Element element, String left, String token, String right, int tokenRep, boolean whitespaceBefore, ContextMap context)
    Classifies a token in an XML document, building features and delegating to the classifier.
String toString()
    Returns a string representation of this object.
Methods inherited from class de.fu_berlin.ties.extract.ExtractorBase
    getActiveClasses, getClassifier, getFactory, getFeatureCount, getFeatures, getPriorRecognitions, getRepresentation, getStrategy, getTargetStructure, initFields, updateState, viewFeatureCount
Methods inherited from class de.fu_berlin.ties.DocumentReader
    doProcess
Methods inherited from class de.fu_berlin.ties.TextProcessor
    getOutFileExt, process, process, process, process
Methods inherited from class de.fu_berlin.ties.ConfigurableProcessor
    getConfig
Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Field Detail
public static final String EXT_EXTRACTIONS

Constructor Detail
public Extractor(String outExt) throws IllegalArgumentException, ProcessingException
    Creates a new instance via Extractor(String, TiesConfiguration), using the standard configuration.
    Parameters:
        outExt - the extension to use for output files
    Throws:
        IllegalArgumentException - if the combination strategy cannot be initialized (cf. CombinationStrategy.createStrategy(Set, TiesConfiguration))
        ProcessingException - if an error occurs during initialization

public Extractor(String outExt, TiesConfiguration config) throws IllegalArgumentException, ProcessingException
    Parameters:
        outExt - the extension to use for output files
        config - the configuration to use
    Throws:
        IllegalArgumentException - if the combination strategy cannot be initialized (cf. CombinationStrategy.createStrategy(Set, TiesConfiguration))
        ProcessingException - if an error occurs during initialization

public Extractor(String outExt, File runDirectory, TiesConfiguration config) throws IllegalArgumentException, ProcessingException
    Parameters:
        outExt - the extension to use for output files
        runDirectory - the directory to run the classifier in; used instead of the configured directory if not null
        config - the configuration to use
    Throws:
        IllegalArgumentException - if the combination strategy cannot be initialized (cf. CombinationStrategy.createStrategy(Set, TiesConfiguration))
        ProcessingException - if an error occurs during initialization

public Extractor(String outExt, Trainer trainer)
    Parameters:
        outExt - the extension to use for output files
        trainer - trainer whose components should be re-used

public Extractor(String outExt, TargetStructure targetStruct, Classifier theClassifier, Representation theRepresentation, CombinationStrategy combiStrat, TokenizerFactory tFactory, TiesConfiguration config)
    Parameters:
        outExt - the extension to use for output files
        targetStruct - the target structure specifying the classes to recognize
        theClassifier - the classifier to use for the local classification decisions
        theRepresentation - the context representation to use for local classifications
        combiStrat - the combination strategy to use
        tFactory - used to instantiate tokenizers
        config - used to configure superclasses; if null, the standard configuration is used

Method Detail
protected ExtractionContainer getPredictedExtractions()
public ExtractionContainer extract(Document document) throws IOException, ProcessingException
    Parameters:
        document - a document whose contents should be classified
    Throws:
        IOException - if an I/O error occurs
        ProcessingException - if an error occurs during processing

public void process(Document document, Writer writer, ContextMap context) throws IOException, ProcessingException
    Overrides:
        process in class DocumentReader
    Parameters:
        document - the document to read
        writer - the writer to write the extracted items to; flushed but not closed by this method
        context - a map of objects that are made available for processing
    Throws:
        IOException - if an I/O error occurs
        ProcessingException - if an error occurs during processing

public void processToken(Element element, String left, String token, String right, int tokenRep, boolean whitespaceBefore, ContextMap context) throws ProcessingException
    Specified by:
        processToken in interface TokenProcessor
    Parameters:
        element - the element containing the token
        left - the textual contents of the element to the left of the token (in case of mixed contents, only up to the last preceding child element, if any)
        token - the token to process
        right - the textual contents of the element to the right of the token (in case of mixed contents, only up to the next following child element, if any)
        tokenRep - the repetition of the token in the document (counting starts with 0, as the first occurrence is the "0th repetition")
        whitespaceBefore - whether there is whitespace before the main token (either at the end of left or in the preceding element)
        context - a map of objects that are made available for processing; ignored by this method
    Throws:
        ProcessingException - if an error occurs during classification

public String toString()
    Overrides:
        toString in class ExtractorBase
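
The zero-based tokenRep counting described for processToken can be made concrete with a short sketch. It assumes that two tokens count as the same when their strings are exactly equal; how TIES itself compares or normalizes tokens is not stated here. The class `TokenRepSketch` and its method are hypothetical and not part of the library.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of zero-based repetition counting for tokens.
class TokenRepSketch {

    // Returns, for each position, how often the same token occurred earlier.
    // The first occurrence therefore gets 0 (the "0th repetition").
    static int[] tokenRepetitions(List<String> tokens) {
        Map<String, Integer> seen = new HashMap<>(); // token -> occurrences so far
        int[] reps = new int[tokens.size()];
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            int rep = seen.getOrDefault(token, 0);
            reps[i] = rep;
            seen.put(token, rep + 1);
        }
        return reps;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "cat", "saw", "the", "other", "cat", "the");
        System.out.println(Arrays.toString(tokenRepetitions(tokens)));
    }
}
```

For the sample sentence, the three occurrences of "the" receive the values 0, 1 and 2, while "saw" and "other" only ever receive 0.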