de.fu_berlin.ties.extract
Class ExtractionMatcher

java.lang.Object
  extended by de.fu_berlin.ties.extract.ExtractionMatcher
All Implemented Interfaces:
TokenProcessor

public class ExtractionMatcher
extends Object
implements TokenProcessor

Matches all extractions from an extraction container against a preprocessed document, ensuring that they can be located and ordering them. See matchAndOrderExtractions(ExtractionContainer, Document) for details.

Instances of this class are not thread-safe and cannot be used to match and order extractions from different documents in parallel.
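
Because instances are not thread-safe, a common pattern is to give each worker thread its own instance. The following is only a sketch of that pattern: StatefulMatcher and PerThreadMatcherSketch are hypothetical stand-ins invented for illustration and are not part of TIES. In real use, the per-thread object would presumably be an ExtractionMatcher created via its constructor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for a stateful, non-thread-safe matcher (NOT a TIES class). */
final class StatefulMatcher {
    // Working state that is reset for every document; sharing it across threads would be unsafe.
    private final List<String> seen = new ArrayList<>();

    int countTokens(String document) {
        seen.clear();
        seen.addAll(Arrays.asList(document.split(" ")));
        return seen.size();
    }
}

final class PerThreadMatcherSketch {
    // Each worker thread lazily creates and then reuses its own matcher instance.
    private static final ThreadLocal<StatefulMatcher> MATCHER =
            ThreadLocal.withInitial(StatefulMatcher::new);

    static List<Integer> processAll(List<String> documents) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String doc : documents) {
                Callable<Integer> task = () -> MATCHER.get().countTokens(doc);
                futures.add(pool.submit(task));
            }
            // Collect results in submission order.
            List<Integer> counts = new ArrayList<>();
            for (Future<Integer> f : futures) {
                counts.add(f.get());
            }
            return counts;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(processAll(Arrays.asList("a b c", "d e"))); // prints [3, 2]
    }
}
```

No instance is ever touched by two threads at once, so the matcher itself needs no synchronization.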

Version:
$Revision: 1.5 $, $Date: 2006/10/21 16:04:13 $, $Author: siefkes $
Author:
Christian Siefkes

Field Summary
static Object PROP_DUPLICATE
          All duplicate extractions will be marked with this property.
 
Constructor Summary
ExtractionMatcher(TiesConfiguration conf)
          Creates a new instance.
 
Method Summary
 ExtractionContainer matchAndOrderExtractions(ExtractionContainer orgContainer, Document doc)
          Matches all extractions from an extraction container against a preprocessed document, ensuring that they can be located.
 void processToken(Element element, String left, TokenDetails details, String right, ContextMap context)
          Processes a token in an XML element, optionally modifying the element or the document it is part of.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

PROP_DUPLICATE

public static final Object PROP_DUPLICATE
All duplicate extractions will be marked with this property.

See Also:
Extraction.hasProperty(Object)
Constructor Detail

ExtractionMatcher

public ExtractionMatcher(TiesConfiguration conf)
Creates a new instance.

Parameters:
conf - the configuration to use
Method Detail

matchAndOrderExtractions

public ExtractionContainer matchAndOrderExtractions(ExtractionContainer orgContainer,
                                                    Document doc)
                                             throws IOException,
                                                    ProcessingException
Matches all extractions from an extraction container against a preprocessed document, ensuring that they can be located. Returns a new extraction container containing all extractions in document order (the order in which they occur in the document). Also ensures that Extraction.getFirstTokenRep(), Extraction.getIndex(), and Extraction.getLastIndex() are set to the correct values.

If the FirstTokenRep of an extraction is negative, all instances of the extracted text are to be found in the document, not just a single instance at a specific position. In this case, the returned container will contain a copy of the extraction for each match, with the FirstTokenRep, Index, and LastIndex values of each copy set to their correct values. If such an extraction is not found at least once, an error is reported.

Parameters:
orgContainer - the original container of extractions to process
doc - the preprocessed document to match the extractions against
Returns:
a new extraction container as described above
Throws:
IOException - if an I/O error occurs during processing
ProcessingException - if an error occurs during processing
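
The matching behaviour described for this method can be illustrated with a small, self-contained sketch. It is not the TIES implementation: SketchExtraction and MatchSketch are hypothetical stand-ins, the document is reduced to a flat list of tokens, and the sketch assumes that FirstTokenRep is the zero-based occurrence count of the extraction's first token in the document. An IllegalStateException stands in for the reported error.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for an extraction (NOT the TIES Extraction class). */
final class SketchExtraction {
    final String text;       // extracted text, tokens separated by single spaces
    final int firstTokenRep; // assumed: zero-based occurrence of the first token; negative = match all
    final int index;         // position of the first matched token, -1 while unmatched
    final int lastIndex;     // position of the last matched token, -1 while unmatched

    SketchExtraction(String text, int firstTokenRep) {
        this(text, firstTokenRep, -1, -1);
    }

    SketchExtraction(String text, int firstTokenRep, int index, int lastIndex) {
        this.text = text;
        this.firstTokenRep = firstTokenRep;
        this.index = index;
        this.lastIndex = lastIndex;
    }
}

final class MatchSketch {
    /** Locates every extraction in the token list and returns copies in document order. */
    static List<SketchExtraction> matchAndOrder(List<SketchExtraction> extractions,
                                                List<String> docTokens) {
        List<SketchExtraction> result = new ArrayList<>();
        for (SketchExtraction ex : extractions) {
            String[] tokens = ex.text.split(" ");
            int rep = 0;   // occurrences of the first token seen so far
            int found = 0; // matches located for this extraction
            for (int i = 0; i < docTokens.size(); i++) {
                if (!docTokens.get(i).equals(tokens[0])) {
                    continue;
                }
                int thisRep = rep++;
                // A negative value asks for all instances, otherwise only one specific occurrence.
                boolean wanted = ex.firstTokenRep < 0 || ex.firstTokenRep == thisRep;
                if (wanted && matchesAt(docTokens, i, tokens)) {
                    result.add(new SketchExtraction(ex.text, thisRep, i, i + tokens.length - 1));
                    found++;
                }
            }
            if (found == 0) {
                throw new IllegalStateException("Cannot locate extraction: " + ex.text);
            }
        }
        result.sort(Comparator.comparingInt((SketchExtraction e) -> e.index));
        return result;
    }

    private static boolean matchesAt(List<String> doc, int start, String[] tokens) {
        if (start + tokens.length > doc.size()) {
            return false;
        }
        for (int j = 0; j < tokens.length; j++) {
            if (!doc.get(start + j).equals(tokens[j])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("Anna", "met", "Bob", "and", "Anna", "Lee");
        List<SketchExtraction> in = Arrays.asList(
                new SketchExtraction("Bob", 0),    // one specific occurrence
                new SketchExtraction("Anna", -1)); // all occurrences
        for (SketchExtraction e : matchAndOrder(in, doc)) {
            System.out.println(e.text + " rep=" + e.firstTokenRep
                    + " tokens " + e.index + ".." + e.lastIndex);
        }
    }
}
```

In this sketch the "Anna" extraction with a negative FirstTokenRep yields two copies (token positions 0 and 4), and the result lists "Bob" between them because the output follows document order rather than input order.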

processToken

public void processToken(Element element,
                         String left,
                         TokenDetails details,
                         String right,
                         ContextMap context)
                  throws IOException,
                         ProcessingException
Processes a token in an XML element, optionally modifying the element or the document it is part of.

Specified by:
processToken in interface TokenProcessor
Parameters:
element - the element containing the token
left - the textual contents of the element to the left of the token (in case of mixed contents, only up to the last preceding child element, if any)
details - details about the token to process
right - the textual contents of the element to the right of the token (in case of mixed contents, only up to the next following child element, if any)
context - a map of objects that are made available for processing
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing


Copyright © 2003-2007 Christian Siefkes. All Rights Reserved.