java.lang.Object
de.fu_berlin.ties.ConfigurableProcessor
de.fu_berlin.ties.TextProcessor
de.fu_berlin.ties.DocumentReader
de.fu_berlin.ties.extract.ExtractorBase
de.fu_berlin.ties.extract.Extractor
public class Extractor
An extractor runs a local Classifier on a list of items/nodes and combines their results using a CombinationStrategy.
Instances of this class are not thread-safe and cannot extract from several documents in parallel.
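Because instances are not thread-safe, one common way to process several documents in parallel is to give every worker thread its own instance. The sketch below illustrates that pattern only; `StatefulExtractor` and `PerThreadExtraction` are hypothetical stand-ins written for this example and are not part of TIES.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for a stateful, non-thread-safe extractor. */
final class StatefulExtractor {
    // Per-document state that concurrent callers would corrupt.
    private final List<String> predicted = new ArrayList<>();

    /** Toy rule (an assumption for the demo): capitalized tokens are extractions. */
    List<String> extract(String document) {
        predicted.clear();
        for (String token : document.split("\\s+")) {
            if (!token.isEmpty() && Character.isUpperCase(token.charAt(0))) {
                predicted.add(token);
            }
        }
        return new ArrayList<>(predicted);
    }
}

final class PerThreadExtraction {
    // One extractor per worker thread; an instance is never shared.
    private static final ThreadLocal<StatefulExtractor> EXTRACTOR =
        ThreadLocal.withInitial(StatefulExtractor::new);

    static List<List<String>> extractAll(List<String> documents) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String doc : documents) {
                Callable<List<String>> task = () -> EXTRACTOR.get().extract(doc);
                futures.add(pool.submit(task));
            }
            // Collect results in submission order.
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractAll(List.of("Alice met Bob", "no names here")));
    }
}
```

Creating one instance per task would work as well; the thread-local variant simply avoids repeating a potentially expensive initialization.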
Field Summary | |
---|---|
static String | EXT_EXTRACTIONS: The recommended file extension to use for storing extractions. |
Fields inherited from class de.fu_berlin.ties.extract.ExtractorBase |
---|
CONFIG_AVOID, CONFIG_ELEMENTS, CONFIG_RELEVANT_PUNCTUATION, CONFIG_SENTENCE |
Fields inherited from class de.fu_berlin.ties.TextProcessor |
---|
CONFIG_POST, KEY_DIRECTORY, KEY_LOCAL_NAME, KEY_OUT_DIRECTORY, KEY_URL |
Constructor Summary | |
---|---|
Extractor() | Creates a new instance using a default extension. |
Extractor(String outExt) | Creates a new instance. |
Extractor(String outExt, File runDirectory, TiesConfiguration config) | Creates a new instance. |
Extractor(String outExt, TargetStructure targetStruct, Classifier[] theClassifiers, Representation theRepresentation, CombinationStrategy combiStrat, TokenizerFactory tFactory, TrainableFilter sentFilter, Reranker rerank, Set<String> relevantPunct, TiesConfiguration config) | Creates a new instance. |
Extractor(String outExt, TiesConfiguration config) | Creates a new instance. |
Extractor(String outExt, Trainer trainer) | Creates a new instance, re-using the components from the provided trainer. |
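The `Reranker rerank` argument of the ten-argument constructor is documented as recalculating probabilities to introduce a bias, for example a bias below 1 for the background class to favor recall over precision. A minimal sketch of that idea follows. It assumes the bias is a per-class multiplicative factor followed by renormalization, which the documentation does not confirm; `BiasReranker` and the class labels are hypothetical and not the TIES `Reranker`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bias-based reranker; not the TIES Reranker class. */
final class BiasReranker {
    private final Map<String, Double> biases;

    BiasReranker(Map<String, Double> biases) {
        this.biases = biases;
    }

    /** Scales each probability by its class bias (default 1.0), then renormalizes. */
    Map<String, Double> rerank(Map<String, Double> probabilities) {
        Map<String, Double> scaled = new LinkedHashMap<>();
        double sum = 0.0;
        for (Map.Entry<String, Double> entry : probabilities.entrySet()) {
            double value = entry.getValue() * biases.getOrDefault(entry.getKey(), 1.0);
            scaled.put(entry.getKey(), value);
            sum += value;
        }
        if (sum > 0.0) {
            // Renormalize so the values form a distribution again.
            for (Map.Entry<String, Double> entry : scaled.entrySet()) {
                entry.setValue(entry.getValue() / sum);
            }
        }
        return scaled;
    }

    public static void main(String[] args) {
        Map<String, Double> probabilities = new LinkedHashMap<>();
        probabilities.put("background", 0.6);
        probabilities.put("speaker", 0.4);
        // A bias of 0.5 on the background class makes "speaker" the winner.
        BiasReranker reranker = new BiasReranker(Map.of("background", 0.5));
        System.out.println(reranker.rerank(probabilities));
    }
}
```

With these numbers the background class drops from 0.6 to 0.3/0.7, so a token that was narrowly classified as background is now extracted; that is the recall-over-precision trade-off the parameter description mentions.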
Method Summary | |
---|---|
protected void | addPunctuationDetails(TokenDetails details): Adds an element to the collected punctuation details. |
protected void | appendPunctuation(Extraction ext): Appends the collected punctuation details (if any) to the provided extraction. |
protected void | clearPunctuation(): Clears the collected punctuation details. |
protected FilteringTokenWalker | createFilteringTokenWalker(TrainableFilter repFilter): Creates a filtering token walker to be used for walking through a document and sentence classification if a double classification approach is used. |
FMetricsView | evaluateSentenceFiltering(ExtractionContainer correctExtractions): Evaluates precision and recall for sentence filtering on the last processed document. |
ExtractionContainer | extract(Document document): Extracts items of interest from the contents of an XML document, based on context representation and local classifier. |
protected ExtractionContainer | getPredictedExtractions(): Returns the extraction container used for storing the predicted extractions. |
void | process(Document document, Writer writer, ContextMap context): Extracts items of interest from the contents of an XML document and serializes the extractions. |
void | processToken(Element element, String left, TokenDetails details, String right, ContextMap context): Processes a token in an XML element, optionally modifying the element or the document it is part of. |
protected void | resetStrategy(): Resets the strategy and discards the last predicted extraction, if requested. |
String | toString(): Returns a string representation of this object. |
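The overall idea behind `extract(Document)`, local classification of each token followed by a combination step, can be illustrated roughly as follows. The merge rule used here (consecutive tokens with the same non-background label form one extraction) is a simplification assumed for this example; the actual `CombinationStrategy` implementations may work differently. All names in the sketch are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

final class LocalClassificationSketch {
    /** Label assumed to mean "not part of any extraction". */
    static final String BACKGROUND = "O";

    /** Classifies each token locally, then merges equally labeled neighbors. */
    static List<Map.Entry<String, String>> extract(
            List<String> tokens, Function<String, String> classifier) {
        List<Map.Entry<String, String>> extractions = new ArrayList<>();
        String currentType = BACKGROUND;
        StringBuilder currentText = new StringBuilder();
        for (String token : tokens) {
            String label = classifier.apply(token);
            if (!label.equals(currentType)) {
                // The label changed: close the open extraction, if any.
                if (!currentType.equals(BACKGROUND)) {
                    extractions.add(Map.entry(currentType, currentText.toString()));
                }
                currentType = label;
                currentText.setLength(0);
            }
            if (!label.equals(BACKGROUND)) {
                if (currentText.length() > 0) {
                    currentText.append(' ');
                }
                currentText.append(token);
            }
        }
        // Close an extraction that runs to the end of the token list.
        if (!currentType.equals(BACKGROUND)) {
            extractions.add(Map.entry(currentType, currentText.toString()));
        }
        return extractions;
    }

    public static void main(String[] args) {
        Set<String> names = Set.of("John", "Smith");
        List<String> tokens = List.of("Talk", "by", "John", "Smith", "today");
        System.out.println(extract(tokens, t -> names.contains(t) ? "name" : BACKGROUND));
    }
}
```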
Methods inherited from class de.fu_berlin.ties.extract.ExtractorBase |
---|
createSentenceFilter, evaluateSentenceFiltering, getActiveClasses, getClassifiers, getFactory, getFeatureCount, getFeatures, getPriorRecognitions, getRepresentation, getSentenceFilter, getStrategy, getTargetStructure, getWalker, initFields, isRelevant, isSentenceFiltering, markRelevant, skip, updateState, viewFeatureCount, viewRelevantPunctuation |
Methods inherited from class de.fu_berlin.ties.DocumentReader |
---|
doProcess |
Methods inherited from class de.fu_berlin.ties.TextProcessor |
---|
getOutFileExt, process, process, process, process, process, process |
Methods inherited from class de.fu_berlin.ties.ConfigurableProcessor |
---|
getConfig |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
public static final String EXT_EXTRACTIONS
The recommended file extension to use for storing extractions.
Constructor Detail |
---|
public Extractor() throws IllegalArgumentException, ProcessingException
Creates a new instance using a default extension. Delegates to Extractor(String, TiesConfiguration) using the standard configuration.
Throws:
IllegalArgumentException - if the combination strategy cannot be initialized (cf. CombinationStrategy.createStrategy(java.util.Set, TiesConfiguration))
ProcessingException - if an error occurs during initialization
public Extractor(String outExt) throws IllegalArgumentException, ProcessingException
Creates a new instance. Delegates to Extractor(String, TiesConfiguration) using the standard configuration.
Parameters:
outExt - the extension to use for output files
Throws:
IllegalArgumentException - if the combination strategy cannot be initialized (cf. CombinationStrategy.createStrategy(java.util.Set, TiesConfiguration))
ProcessingException - if an error occurs during initialization
public Extractor(String outExt, TiesConfiguration config) throws IllegalArgumentException, ProcessingException
Creates a new instance.
Parameters:
outExt - the extension to use for output files
config - the configuration to use
Throws:
IllegalArgumentException - if the combination strategy cannot be initialized (cf. CombinationStrategy.createStrategy(java.util.Set, TiesConfiguration))
ProcessingException - if an error occurs during initialization
public Extractor(String outExt, File runDirectory, TiesConfiguration config) throws IllegalArgumentException, ProcessingException
Creates a new instance.
Parameters:
outExt - the extension to use for output files
runDirectory - the directory to run the classifier in; used instead of the configured directory if not null
config - the configuration to use
Throws:
IllegalArgumentException - if the combination strategy cannot be initialized (cf. CombinationStrategy.createStrategy(java.util.Set, TiesConfiguration))
ProcessingException - if an error occurs during initialization
public Extractor(String outExt, Trainer trainer)
Creates a new instance, re-using the components from the provided trainer.
Parameters:
outExt - the extension to use for output files
trainer - trainer whose components should be re-used
public Extractor(String outExt, TargetStructure targetStruct, Classifier[] theClassifiers, Representation theRepresentation, CombinationStrategy combiStrat, TokenizerFactory tFactory, TrainableFilter sentFilter, Reranker rerank, Set<String> relevantPunct, TiesConfiguration config)
Creates a new instance.
Parameters:
outExt - the extension to use for output files
targetStruct - the target structure specifying the classes to recognize
theClassifiers - the classifiers to use for the local classification decisions
theRepresentation - the context representation to use for local classifications
combiStrat - the combination strategy to use
tFactory - used to instantiate tokenizers
sentFilter - the filter used in the first step of a double classification approach ("sentence filtering"); if null, no sentence filtering is used
rerank - a reranker that recalculates probabilities to introduce a bias (can be used to favor recall over precision, by setting a bias < 1 for the background class, etc.); must not be null
relevantPunct - a set of punctuation tokens that have been found to be relevant for token classification; might be empty but not null
config - used to configure superclasses; if null, the standard configuration is used
Method Detail |
---|
protected void addPunctuationDetails(TokenDetails details)
Adds an element to the collected punctuation details.
Parameters:
details - the element to add
protected void appendPunctuation(Extraction ext)
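Appends the collected punctuation details (if any) to the provided extraction. Calls clearPunctuation() to delete the processed punctuation.
Parameters:
ext - the extraction to append to
protected void clearPunctuation()
Clears the collected punctuation details.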
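The three punctuation methods above follow a collect, append, clear pattern. A minimal sketch of that pattern is shown below, using plain strings instead of `TokenDetails` and `Extraction`. `PunctuationBuffer` is a hypothetical class, and the scenario (punctuation is kept only when the extraction continues after it) is an assumption about why such a buffer is useful, not something the documentation states.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical buffer mirroring the add/append/clear pattern with plain strings. */
final class PunctuationBuffer {
    private final List<String> pending = new ArrayList<>();

    /** Collects a punctuation token for later use. */
    void add(String punctuation) {
        pending.add(punctuation);
    }

    /** Appends the collected punctuation (if any), then clears the buffer. */
    void appendTo(StringBuilder extraction) {
        for (String punctuation : pending) {
            extraction.append(punctuation);
        }
        clear();
    }

    /** Discards the collected punctuation. */
    void clear() {
        pending.clear();
    }

    /** Builds the demo extraction used in main. */
    static String demo() {
        PunctuationBuffer buffer = new PunctuationBuffer();
        StringBuilder extraction = new StringBuilder("3");
        buffer.add(":");               // punctuation seen inside the extraction
        buffer.appendTo(extraction);   // extraction continues: keep the colon
        extraction.append("30");
        buffer.add(".");               // punctuation seen after the last token
        buffer.clear();                // extraction ended: drop the full stop
        buffer.appendTo(extraction);   // nothing left to append
        return extraction.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```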
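protected FilteringTokenWalker createFilteringTokenWalker(TrainableFilter repFilter)
Creates a filtering token walker to be used for walking through a document and sentence classification if a double classification approach is used.
Overrides:
createFilteringTokenWalker in class ExtractorBase
Parameters:
repFilter - the trainable filter to use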
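The double classification approach mentioned above can be pictured as two stages: a sentence filter first decides which sentences are worth looking at, and token-level classification then runs only inside the accepted sentences. The sketch below assumes this reading. `DoubleClassificationSketch` and both predicates are hypothetical; a `null` filter disables the first stage, mirroring the documented behavior of the `sentFilter` constructor argument.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class DoubleClassificationSketch {
    /**
     * Returns the tokens accepted by the token classifier, looking only at
     * sentences that pass the sentence filter (all sentences if it is null).
     */
    static List<String> relevantTokens(List<String> sentences,
                                       Predicate<String> sentenceFilter,
                                       Predicate<String> tokenClassifier) {
        List<String> result = new ArrayList<>();
        for (String sentence : sentences) {
            // Stage 1: skip sentences the filter rejects.
            if (sentenceFilter != null && !sentenceFilter.test(sentence)) {
                continue;
            }
            // Stage 2: classify the tokens of the remaining sentences.
            for (String token : sentence.split("\\s+")) {
                if (tokenClassifier.test(token)) {
                    result.add(token);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sentences = List.of("The talk starts at 3pm", "Bring 2 friends");
        Predicate<String> startsWithDigit =
            t -> !t.isEmpty() && Character.isDigit(t.charAt(0));
        System.out.println(relevantTokens(sentences, s -> s.contains("talk"), startsWithDigit));
        System.out.println(relevantTokens(sentences, null, startsWithDigit));
    }
}
```

In the demo, the filtered run keeps only `3pm`, while the unfiltered run also picks up `2` from the second sentence.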
public ExtractionContainer extract(Document document) throws IOException, ProcessingException
Extracts items of interest from the contents of an XML document, based on context representation and local classifier.
Parameters:
document - a document whose contents should be classified
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing
public FMetricsView evaluateSentenceFiltering(ExtractionContainer correctExtractions)
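Evaluates precision and recall for sentence filtering on the last processed document.
Parameters:
correctExtractions - a container of all correct extractions for the document
Returns:
the precision and recall metrics, or null if sentence filtering is disabled
protected ExtractionContainer getPredictedExtractions()
Returns the extraction container used for storing the predicted extractions.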
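For readers unfamiliar with the metrics that `evaluateSentenceFiltering` reports, the sketch below computes precision, recall and F1 from two sets of sentence identifiers using the standard definitions. It does not reproduce `FMetricsView`, whose accessors are not shown on this page; `FilterMetricsSketch` and the identifiers are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

final class FilterMetricsSketch {
    /** Returns {precision, recall, F1} for predicted vs. correct sentence ids. */
    static double[] evaluate(Set<String> predicted, Set<String> correct) {
        // True positives: sentences that were selected and are actually relevant.
        Set<String> truePositives = new HashSet<>(predicted);
        truePositives.retainAll(correct);
        double tp = truePositives.size();

        double precision = predicted.isEmpty() ? 0.0 : tp / predicted.size();
        double recall = correct.isEmpty() ? 0.0 : tp / correct.size();
        double f1 = (precision + recall == 0.0)
            ? 0.0
            : 2 * precision * recall / (precision + recall);
        return new double[] {precision, recall, f1};
    }

    public static void main(String[] args) {
        Set<String> predicted = Set.of("s1", "s2", "s3", "s4");
        Set<String> correct = Set.of("s2", "s3", "s5");
        double[] metrics = evaluate(predicted, correct);
        System.out.printf("P=%.3f R=%.3f F1=%.3f%n", metrics[0], metrics[1], metrics[2]);
    }
}
```

Here two of the four selected sentences are relevant (precision 1/2) and two of the three relevant sentences were selected (recall 2/3), giving F1 = 4/7.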
public void process(Document document, Writer writer, ContextMap context) throws IOException, ProcessingException
Extracts items of interest from the contents of an XML document and serializes the extractions.
Overrides:
process in class DocumentReader
Parameters:
document - the document to read
writer - the writer to write the extracted items to; flushed but not closed by this method
context - a map of objects that are made available for processing
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing
public void processToken(Element element, String left, TokenDetails details, String right, ContextMap context) throws ProcessingException
Processes a token in an XML element, optionally modifying the element or the document it is part of.
Parameters:
element - the element containing the token
left - the textual contents of the element to the left of the token (in case of mixed contents, only up to the last preceding child element, if any)
details - details about the token to process
right - the textual contents of the element to the right of the token (in case of mixed contents, only up to the next following child element, if any)
context - a map of objects that are made available for processing
Throws:
ProcessingException - if an error occurs during processing
protected void resetStrategy()
Resets the strategy and discards the last predicted extraction, if requested.
Overrides:
resetStrategy in class ExtractorBase
public String toString()
Returns a string representation of this object.
Overrides:
toString in class ExtractorBase