java.lang.Object
  de.fu_berlin.ties.extract.ExtractionMatcher
public class ExtractionMatcher
Matches all extractions from an extraction container against a
preprocessed document, ensuring that they can be located and ordering them.
See matchAndOrderExtractions(ExtractionContainer, Document) for details.
Instances of this class are not thread-safe and cannot be used to match and order extractions from different documents in parallel.
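One common way to respect a restriction like this is to give every worker thread its own instance. The following is only a sketch under stated assumptions: StubMatcher is a hypothetical stand-in for a stateful matcher (it is not the TIES class and does not use TiesConfiguration), and PerThreadMatchers is an illustrative helper name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for a stateful matcher that is not thread-safe. */
class StubMatcher {
    private String currentDoc; // per-instance state, unsafe to share between threads

    String match(String doc) {
        currentDoc = doc;
        return "matched:" + currentDoc;
    }
}

/** Illustrative helper: one matcher per worker thread, never shared. */
class PerThreadMatchers {
    // Each thread lazily creates and then reuses its own matcher instance.
    private static final ThreadLocal<StubMatcher> MATCHER =
        ThreadLocal.withInitial(StubMatcher::new);

    static List<String> matchAll(List<String> docs) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String doc : docs) {
                Callable<String> task = () -> MATCHER.get().match(doc);
                futures.add(pool.submit(task));
            }
            // Collect results in submission order.
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With the real class, the equivalent would presumably be to construct one ExtractionMatcher per thread (or per document processed at a time) rather than sharing a single instance.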
Field Summary | |
---|---|
static Object | PROP_DUPLICATE - All duplicate extractions will be marked with this property. |
Constructor Summary | |
---|---|
ExtractionMatcher(TiesConfiguration conf) | Creates a new instance. |
Method Summary | |
---|---|
ExtractionContainer | matchAndOrderExtractions(ExtractionContainer orgContainer, Document doc) - Matches all extractions from an extraction container against a preprocessed document, ensuring that they can be located. |
void | processToken(Element element, String left, TokenDetails details, String right, ContextMap context) - Processes a token in an XML element, optionally modifying the element or the document it is part of. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final Object PROP_DUPLICATE
See Also: Extraction.hasProperty(Object)
Constructor Detail |
---|
public ExtractionMatcher(TiesConfiguration conf)
Parameters:
conf - the configuration to use
Method Detail |
---|
public ExtractionContainer matchAndOrderExtractions(ExtractionContainer orgContainer, Document doc) throws IOException, ProcessingException
Matches all extractions from an extraction container against a preprocessed document, ensuring that they can be located. In the returned container, Extraction.getFirstTokenRep(), Extraction.getIndex(), and Extraction.getLastIndex() are set to the correct values.
If the FirstTokenRep of an extraction is negative, all instances of the extracted text are to be found in the document, not just a single instance at a specific position. In this case, the returned container will contain a copy of the extraction for each match, with the FirstTokenRep, Index, and LastIndex values of each copy set to their correct values. If any such extraction is not found at least once, an error is reported.
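The "match every instance" case can be illustrated with a small sketch. Everything in it is an assumption made for illustration: SketchExtraction and MatchAllSketch are hypothetical names, the document is reduced to a flat list of tokens, FirstTokenRep is interpreted as the 0-based count of occurrences of the extraction's first token, and a missing match is signalled by an exception. The actual TIES classes may define these values and report errors differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified stand-in for an extraction. */
final class SketchExtraction {
    final List<String> tokens; // tokens of the extracted text
    final int firstTokenRep;   // negative: match every occurrence
    final int index;           // document position of the first token
    final int lastIndex;       // document position of the last token

    SketchExtraction(List<String> tokens, int firstTokenRep, int index, int lastIndex) {
        this.tokens = tokens;
        this.firstTokenRep = firstTokenRep;
        this.index = index;
        this.lastIndex = lastIndex;
    }
}

final class MatchAllSketch {
    /** Returns one positioned copy of ext per match in docTokens, in document order. */
    static List<SketchExtraction> expand(SketchExtraction ext, List<String> docTokens) {
        if (ext.tokens.isEmpty()) {
            throw new IllegalArgumentException("empty extraction");
        }
        List<SketchExtraction> copies = new ArrayList<>();
        String first = ext.tokens.get(0);
        int rep = -1; // occurrences of the first token seen so far, minus one
        for (int i = 0; i < docTokens.size(); i++) {
            if (!docTokens.get(i).equals(first)) {
                continue;
            }
            rep++;
            int end = i + ext.tokens.size();
            if (end <= docTokens.size() && docTokens.subList(i, end).equals(ext.tokens)) {
                copies.add(new SketchExtraction(ext.tokens, rep, i, end - 1));
            }
        }
        if (copies.isEmpty()) {
            // Assumption: the sketch throws where the real method "reports an error".
            throw new IllegalStateException("extraction not found: " + ext.tokens);
        }
        return copies;
    }
}
```

For the token sequence "the cat saw the cat sat" and an extraction "the cat" with a negative FirstTokenRep, this sketch yields two copies, one covering positions 0 to 1 and one covering positions 3 to 4.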
Parameters:
orgContainer - the original container of extractions to process
doc - the preprocessed document to match the extractions against
Throws:
IOException - if an I/O error occurs during processing
ProcessingException - if an error occurs during processing
public void processToken(Element element, String left, TokenDetails details, String right, ContextMap context) throws IOException, ProcessingException
Specified by:
processToken in interface TokenProcessor
Parameters:
element - the element containing the token
left - the textual contents of the element to the left of the token (in case of mixed contents, only up to the last preceding child element, if any)
details - details about the token to process
right - the textual contents of the element to the right of the token (in case of mixed contents, only up to the next following child element, if any)
context - a map of objects that are made available for processing
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing