de.fu_berlin.ties.extract
Class ExtractionLocator

java.lang.Object
  extended by de.fu_berlin.ties.extract.ExtractionLocator

public class ExtractionLocator
extends Object

Locates extractions in a document.

Version:
$Revision: 1.11 $, $Date: 2006/10/21 16:04:13 $, $Author: siefkes $
Author:
Christian Siefkes

Constructor Summary
ExtractionLocator(ExtractionContainer extractions, TextTokenizer textTokenizer)
          Creates a new instance, setting isRetrySilently() to false.
ExtractionLocator(ExtractionContainer extractions, TextTokenizer textTokenizer, boolean doRetrySilently)
          Creates a new instance.
 
Method Summary
 boolean endOfExtraction()
          Whether we reached the end of the current extraction.
 Extraction getCurrentExtraction()
          Returns the current extraction.
 boolean inExtraction()
          Whether we are currently within an extraction.
 boolean isRetrySilently()
          Whether the locator accepts extractions that are not explicitly located in the document.
 void reachedEndOfDocument()
          This method must be called at the end of the current document.
 boolean startOfExtraction(String token, int tokenRep)
          Whether the current token starts a new extraction.
 void switchToNextExtraction()
          Switches to the next extraction, updating the current extraction and related fields.
 String toString()
          Returns a string representation of this object.
 boolean updateExtraction(String token, int tokenRep)
          Updates the currently processed extraction.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

ExtractionLocator

public ExtractionLocator(ExtractionContainer extractions,
                         TextTokenizer textTokenizer)
Creates a new instance, setting isRetrySilently() to false.

Parameters:
extractions - the extractions in this document
textTokenizer - the tokenizer used to split extractions into tokens

ExtractionLocator

public ExtractionLocator(ExtractionContainer extractions,
                         TextTokenizer textTokenizer,
                         boolean doRetrySilently)
Creates a new instance.

Parameters:
extractions - the extractions in this document
textTokenizer - the tokenizer used to split extractions into tokens
doRetrySilently - sets the state of isRetrySilently()

Method Detail

endOfExtraction

public boolean endOfExtraction()
Whether we reached the end of the current extraction.

Returns:
true iff the current extraction has ended

getCurrentExtraction

public Extraction getCurrentExtraction()
Returns the current extraction.

Returns:
the current extraction

inExtraction

public boolean inExtraction()
Whether we are currently within an extraction.

Returns:
true iff we are processing the current extraction (getCurrentExtraction()), false otherwise (we are waiting for it to start or there are no more extractions)

isRetrySilently

public boolean isRetrySilently()
Whether the locator accepts extractions that are not explicitly located in the document. If true, the locator accepts extractions that are not explicitly located in the document (i.e., whose FirstTokenRep is negative). If such an extraction is encountered, the locator will try matching it at all possible positions. When updateExtraction(String, int) fails (returns false) in such a case (indicating that only the first token(s) of the extraction could be matched, but not the full extraction), the locator will silently try to locate the extraction at the next possible position.

Returns:
the value of the attribute, false by default
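The retry behaviour described above amounts to trying the extraction at every document position and silently discarding partial matches. The following is a simplified sketch, not the TIES implementation: it assumes the document and the extraction are already plain lists of tokens (the real class works with ExtractionContainer and TextTokenizer), and the class name RetryMatcher is hypothetical.

```java
import java.util.List;

/**
 * Hypothetical sketch of the "retry silently" search (not the TIES implementation):
 * an extraction whose position is unknown is tried at every document position.
 */
final class RetryMatcher {
    private RetryMatcher() { }

    /**
     * Returns the index of the first position at which all extraction tokens
     * match, or -1 if there is none. A partial match (only the first token(s)
     * agree) is silently abandoned and the next position is tried.
     */
    static int locate(List<String> document, List<String> extraction) {
        if (extraction.isEmpty()) {
            return -1;
        }
        for (int start = 0; start + extraction.size() <= document.size(); start++) {
            int matched = 0;
            while (matched < extraction.size()
                    && document.get(start + matched).equals(extraction.get(matched))) {
                matched++;
            }
            if (matched == extraction.size()) {
                return start; // full match found
            }
            // otherwise: partial or no match, retry at the next position
        }
        return -1;
    }
}
```

For the tokens New York is big New York City and the extraction New York City, the partial match at position 0 is abandoned and the method returns 4. For a non-empty extraction, java.util.Collections.indexOfSubList(document, extraction) gives the same index; the explicit loop only makes the "partial match, then retry" step visible.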

reachedEndOfDocument

public void reachedEndOfDocument()
This method must be called at the end of the current document. It will log an error when there are still unprocessed or incompletely processed extractions.


startOfExtraction

public boolean startOfExtraction(String token,
                                 int tokenRep)
Whether the current token starts a new extraction. This method must be called once for each token in a document; otherwise we might miss extractions.

Parameters:
token - the token to check
tokenRep - the repetition of the token in the document (counting starts with 0, as the first occurrence is the "0th repetition").
Returns:
true iff the given token starts a new extraction
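The tokenRep argument can be derived by counting how often each token has already occurred earlier in the document. A minimal sketch, assuming tokens are compared by exact string equality; the helper class TokenReps is hypothetical and not part of TIES:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: computes the tokenRep value for each token of a document. */
final class TokenReps {
    private TokenReps() { }

    /**
     * Returns, for each position, how often the same token occurred earlier
     * in the document (0 for the first occurrence, the "0th repetition").
     */
    static List<Integer> compute(List<String> tokens) {
        Map<String, Integer> seen = new HashMap<>();
        List<Integer> reps = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            int rep = seen.getOrDefault(token, 0); // earlier occurrences so far
            reps.add(rep);
            seen.put(token, rep + 1);
        }
        return reps;
    }
}
```

For the tokens the cat the the this yields 0, 0, 1, 2, so the third token would be passed as startOfExtraction("the", 1).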

switchToNextExtraction

public void switchToNextExtraction()
                            throws IllegalStateException
Switches to the next extraction, updating the current extraction and related fields. The previous extraction must have been fully processed when this method is called, i.e. endOfExtraction() must return true.

Throws:
IllegalStateException - if endOfExtraction() is not true (there are still remaining tokens to process)

toString

public String toString()
Returns a string representation of this object.

Overrides:
toString in class Object
Returns:
a textual representation

updateExtraction

public boolean updateExtraction(String token,
                                int tokenRep)
Updates the currently processed extraction. This method must be called once for each token in each extraction.

Parameters:
token - the token to process
tokenRep - the repetition of the token in the document (counting starts with 0, as the first occurrence is the "0th repetition").
Returns:
true iff the extraction was successfully updated; false if the token was erroneous (not expected to occur within the current extraction)
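Taken together, the method contracts suggest a per-token driving loop: call startOfExtraction(String, int) for every document token, call updateExtraction(String, int) for every token inside an extraction, and call switchToNextExtraction() once endOfExtraction() is true. The sketch below simulates such a loop with a hypothetical MiniLocator standing in for ExtractionLocator. It replaces ExtractionContainer and TextTokenizer with pre-tokenized lists, has no retry mode, and returns a count from reachedEndOfDocument() instead of logging an error. The exact call order used inside TIES is an assumption here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified stand-in for ExtractionLocator (not the TIES class).
 * Each extraction is a non-empty token list plus the tokenRep of its first token.
 */
final class MiniLocator {
    private final List<List<String>> extractions;
    private final List<Integer> firstReps;
    private int current = 0;        // index of the current extraction
    private int consumed = 0;       // tokens of the current extraction matched so far
    private boolean inside = false; // true while within the current extraction

    MiniLocator(List<List<String>> extractions, List<Integer> firstReps) {
        this.extractions = extractions;
        this.firstReps = firstReps;
    }

    boolean startOfExtraction(String token, int tokenRep) {
        if (inside || current >= extractions.size()) {
            return false; // already inside, or no extractions left
        }
        inside = extractions.get(current).get(0).equals(token)
                && firstReps.get(current) == tokenRep;
        return inside;
    }

    boolean inExtraction() {
        return inside;
    }

    // tokenRep is ignored in this simplified sketch.
    boolean updateExtraction(String token, int tokenRep) {
        if (!inside) {
            return false;
        }
        List<String> ext = extractions.get(current);
        if (consumed >= ext.size() || !ext.get(consumed).equals(token)) {
            return false; // token not expected within the current extraction
        }
        consumed++;
        return true;
    }

    boolean endOfExtraction() {
        return inside && consumed == extractions.get(current).size();
    }

    void switchToNextExtraction() {
        if (!endOfExtraction()) {
            throw new IllegalStateException("current extraction not fully processed");
        }
        current++;
        consumed = 0;
        inside = false;
    }

    // Returns the number of unprocessed or incomplete extractions (the real class logs an error instead).
    int reachedEndOfDocument() {
        return extractions.size() - current;
    }
}

final class LocatorDemo {
    private LocatorDemo() { }

    /** Feeds each document token through the locator and returns the located extractions as text. */
    static List<String> run(List<String> document, MiniLocator locator) {
        Map<String, Integer> seen = new HashMap<>();
        List<String> found = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        for (String token : document) {
            int rep = seen.getOrDefault(token, 0); // 0-based repetition of this token
            seen.put(token, rep + 1);
            locator.startOfExtraction(token, rep); // called once per document token
            if (!locator.inExtraction()) {
                continue;
            }
            if (!locator.updateExtraction(token, rep)) {
                throw new IllegalStateException("unexpected token: " + token);
            }
            text.append(text.length() == 0 ? "" : " ").append(token);
            if (locator.endOfExtraction()) {
                found.add(text.toString());
                text.setLength(0);
                locator.switchToNextExtraction();
            }
        }
        return found;
    }
}
```

For the document tokens Meet John Smith in New York and John again, with the extractions John Smith and New York (both with a first tokenRep of 0), run returns "John Smith" and "New York", and reachedEndOfDocument() then returns 0. An extraction John with a first tokenRep of 1 would instead match the second occurrence of John.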


Copyright © 2003-2007 Christian Siefkes. All Rights Reserved.