de.fu_berlin.ties.extract
Class Extraction

java.lang.Object
  extended by de.fu_berlin.ties.io.BaseStorable
      extended by de.fu_berlin.ties.classify.Prediction
          extended by de.fu_berlin.ties.extract.Extraction
All Implemented Interfaces:
Recognition, Storable

public class Extraction
extends Prediction
implements Recognition

Extends a Prediction by also storing the extracted text and location data.

Instances of this class are not thread-safe and cannot handle extraction from several documents in parallel.

Version:
$Revision: 1.7 $, $Date: 2004/04/08 16:07:28 $, $Author: siefkes $
Author:
Christian Siefkes

Field Summary
static String KEY_FIRST_TOKEN_REP
          Serialization key for the repetition of the first token.
static String KEY_TEXT
          Serialization key for the extracted text.
 
Fields inherited from class de.fu_berlin.ties.classify.Prediction
KEY_PR, KEY_PROB, KEY_SOURCE, KEY_TYPE
 
Constructor Summary
Extraction(FieldMap fieldMap)
          Creates a new instance from a field map, fulfilling the Storable contract.
Extraction(String predicted, double prob, double pr, String extracted, int ftRep)
          Creates a new instance, setting the evaluation status to EvalStatus.UNKNOWN.
Extraction(String predicted, double prob, double pr, String extracted, int ftRep, EvalStatus status)
          Creates a new instance.
Extraction(String predicted, String extracted, int ftRep)
          Creates a new instance, setting the probability to -1 ("confirmed") and the evaluation status to EvalStatus.TRUTH.
 
Method Summary
 void append(String newText, boolean afterWhitespace)
          Appends text to this extraction, using a new probability of -1 ("confirmed").
 void append(String newText, boolean afterWhitespace, double newProb, double newPR)
          Appends text to this extraction, recalculating the probability by multiplying the prior probability value with the probability of the new text.
 boolean equals(Object obj)
          Indicates whether some other object is "equal to" this one, fulfilling the Object.equals(java.lang.Object) contract.
 int getFirstTokenRep()
          Returns the repetition of the first token of the extraction in the original text (counting starts with 0, as the first occurrence is the "0th repetition"), -1 if unknown.
 String getText()
          Returns the extracted text fragment.
 String getVisibleChars()
          Returns the visible characters of the text fragment (everything except whitespace and control characters).
 int hashCode()
          Returns a hash code value for this object, fulfilling the Object.hashCode() contract.
 boolean isSealed()
          Whether this extraction has been sealed.
 void seal()
          Seals this extraction.
 void setFirstTokenRep(int newFirstTokenRep)
          Modifies the repetition of the first token of the extraction in the original text (counting starts with 0, as the first occurrence is the "0th repetition"), or -1 if unknown or irrelevant.
 FieldMap storeFields()
          Stores all relevant fields of this object in a field map for serialization.
 
Methods inherited from class de.fu_berlin.ties.classify.Prediction
addProb, getEvalStatus, getPR, getProbability, getSource, getType, setEvalStatus, setSource
 
Methods inherited from class de.fu_berlin.ties.io.BaseStorable
toString, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface de.fu_berlin.ties.context.Recognition
getType
 

Field Detail

KEY_TEXT

public static final String KEY_TEXT
Serialization key for the extracted text.

See Also:
Constant Field Values

KEY_FIRST_TOKEN_REP

public static final String KEY_FIRST_TOKEN_REP
Serialization key for the repetition of the first token.

See Also:
Constant Field Values
Constructor Detail

Extraction

public Extraction(FieldMap fieldMap)
Creates a new instance from a field map, fulfilling the Storable contract. An extraction created this way is sealed immediately, so its extracted text cannot be changed.

Parameters:
fieldMap - map containing the serialized fields

Extraction

public Extraction(String predicted,
                  String extracted,
                  int ftRep)
Creates a new instance, setting the probability to -1 ("confirmed") and the evaluation status to EvalStatus.TRUTH. Use this constructor to build answer keys.

Parameters:
predicted - the predicted class
extracted - the extracted text fragment (or its first part, if more text will be appended); must not be null
ftRep - the repetition of the first token in the text (counting starts with 0, as the first occurrence is the "0th repetition"); -1 if unknown

Extraction

public Extraction(String predicted,
                  double prob,
                  double pr,
                  String extracted,
                  int ftRep)
Creates a new instance, setting the evaluation status to EvalStatus.UNKNOWN.

Parameters:
predicted - the predicted class
prob - the probability of the prediction (must be in the range from 0.0 to 1.0, or -1 if this is a confirmed extraction or an answer key)
pr - the pR of the prediction; or Double.NaN if not known
extracted - the extracted text fragment (or its first part, if more text will be appended); must not be null
ftRep - the repetition of the first token in the text (counting starts with 0, as the first occurrence is the "0th repetition"); -1 if unknown
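A minimal sketch of the probability convention described above: a valid value lies in the range 0.0 to 1.0, or is exactly -1 for a confirmed extraction. The class ProbabilityCheck is hypothetical and not part of TIES; whether Extraction itself rejects out-of-range values is not stated here.

```java
/**
 * Hypothetical helper (not part of TIES) that checks the probability
 * convention documented for the Extraction constructors.
 */
final class ProbabilityCheck {
    /** Value documented as meaning "confirmed" (answer key). */
    static final double CONFIRMED = -1.0;

    private ProbabilityCheck() { }

    /** Returns true if prob is in [0.0, 1.0] or exactly -1; NaN is rejected. */
    static boolean isValidProbability(double prob) {
        return prob == CONFIRMED || (prob >= 0.0 && prob <= 1.0);
    }
}
```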

Extraction

public Extraction(String predicted,
                  double prob,
                  double pr,
                  String extracted,
                  int ftRep,
                  EvalStatus status)
Creates a new instance.

Parameters:
predicted - the predicted class
prob - the probability of the prediction (must be in the range from 0.0 to 1.0, or -1 if this is a confirmed extraction or an answer key)
pr - the pR of the prediction; or Double.NaN if not known
extracted - the extracted text fragment (or its first part, if more text will be appended); must not be null
ftRep - the repetition of the first token in the text (counting starts with 0, as the first occurrence is the "0th repetition"); -1 if unknown
status - the evaluation status of this instance
Method Detail

append

public void append(String newText,
                   boolean afterWhitespace)
            throws IllegalStateException
Appends text to this extraction, using a new probability of -1 ("confirmed"). Use this method when building answer keys.

Parameters:
newText - the text to append to the extracted text fragment
afterWhitespace - whether to add a space character before the new text
Throws:
IllegalStateException - if this extraction is sealed
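A sketch of how the afterWhitespace flag is assumed to affect the stored text: when true, a single space is inserted before newText; when false, the texts are joined directly. AppendSketch is a hypothetical helper, not the TIES implementation.

```java
/**
 * Hypothetical illustration (not the TIES implementation) of the
 * afterWhitespace flag of append.
 */
final class AppendSketch {
    private AppendSketch() { }

    /** Returns text followed by newText, with one space in between if afterWhitespace is true. */
    static String append(String text, String newText, boolean afterWhitespace) {
        StringBuilder sb = new StringBuilder(text);
        if (afterWhitespace) {
            sb.append(' ');
        }
        return sb.append(newText).toString();
    }
}
```

For example, appending "York" with afterWhitespace set to true turns "New" into "New York", while appending "mail" with false turns "e-" into "e-mail".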

append

public void append(String newText,
                   boolean afterWhitespace,
                   double newProb,
                   double newPR)
            throws IllegalStateException
Appends text to this extraction, recalculating the probability by multiplying the prior probability value with the probability of the new text.

Parameters:
newText - the text to append to the extracted text fragment
afterWhitespace - whether to add a space character before the new text
newProb - the probability of the new text; or -1 if this is an answer key
newPR - the new pR; or Double.NaN if not used
Throws:
IllegalStateException - if this extraction is sealed, or if the new and old probabilities or pRs cannot be combined
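The probability update can be sketched as follows. Multiplying two regular probabilities follows the description above; treating two -1 ("confirmed") values as staying confirmed, and rejecting a mix of -1 with a regular probability, are assumptions about what "cannot be combined" means. The handling of pR is not documented here and is omitted. ProbabilityCombiner is hypothetical.

```java
/**
 * Hypothetical sketch (not the TIES implementation) of combining the prior
 * probability of an extraction with the probability of appended text.
 */
final class ProbabilityCombiner {
    /** Value documented as meaning "confirmed" (answer key). */
    static final double CONFIRMED = -1.0;

    private ProbabilityCombiner() { }

    static double combine(double oldProb, double newProb) {
        boolean oldConfirmed = oldProb == CONFIRMED;
        boolean newConfirmed = newProb == CONFIRMED;
        if (oldConfirmed && newConfirmed) {
            return CONFIRMED; // assumption: confirmed + confirmed stays confirmed
        }
        if (oldConfirmed || newConfirmed) {
            // assumption: mixing a confirmed value with a regular probability is an error
            throw new IllegalStateException("Cannot combine probabilities "
                    + oldProb + " and " + newProb);
        }
        return oldProb * newProb; // documented: multiply prior and new probability
    }
}
```

Because every regular probability is at most 1.0, repeated multiplication means an extraction assembled from many appended pieces never gains probability as it grows.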

equals

public boolean equals(Object obj)
Indicates whether some other object is "equal to" this one, fulfilling the Object.equals(java.lang.Object) contract. The evaluation status is ignored when checking equality: if all other fields of two extractions are equal, this method returns true even if their evaluation states differ. Only the visible characters of the extracted texts are compared; whitespace and control characters are ignored.

Overrides:
equals in class Prediction
Parameters:
obj - the reference object with which to compare
Returns:
true iff the specified object is an Extraction equal to this instance
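A simplified sketch of an equals/hashCode pair with the semantics described above: the evaluation status is excluded, and text is compared on its visible characters only. Because equal objects must have equal hash codes, hashCode is computed from the visible characters as well. VisibleTextKey is hypothetical and compares only a type and the text, whereas Extraction also compares its other fields; treating "control characters" as Character.isISOControl is an assumption.

```java
import java.util.Objects;

/**
 * Hypothetical value class (not the TIES Extraction): equality ignores
 * whitespace/control characters in the text and the evaluation status.
 */
final class VisibleTextKey {
    private final String type;
    private final String text;
    private final String evalStatus; // deliberately excluded from equals/hashCode

    VisibleTextKey(String type, String text, String evalStatus) {
        this.type = type;
        this.text = text;
        this.evalStatus = evalStatus;
    }

    String evalStatus() {
        return evalStatus;
    }

    /** Removes whitespace and ISO control characters. */
    static String visibleChars(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
         .filter(cp -> !Character.isWhitespace(cp) && !Character.isISOControl(cp))
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof VisibleTextKey)) {
            return false;
        }
        VisibleTextKey other = (VisibleTextKey) obj;
        return Objects.equals(type, other.type)
                && visibleChars(text).equals(visibleChars(other.text));
    }

    @Override
    public int hashCode() {
        // Must use the same view of the fields as equals: visible chars, not raw text.
        return Objects.hash(type, visibleChars(text));
    }
}
```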

getFirstTokenRep

public int getFirstTokenRep()
Returns the repetition of the first token of the extraction in the original text (counting starts with 0, as the first occurrence is the "0th repetition"), -1 if unknown. This is useful to locate this extraction in the original text.

Returns:
the repetition of the first token, or -1 if unknown
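A sketch of how a first-token repetition can be computed and then used to locate an extraction, assuming simple whitespace tokenization (TIES uses its own tokenizer, which may split text differently). TokenRepLocator is hypothetical.

```java
/**
 * Hypothetical helper (not part of TIES) relating token positions to
 * 0-based token repetitions, using whitespace tokenization.
 */
final class TokenRepLocator {
    private TokenRepLocator() { }

    /** Returns the token index of the rep-th occurrence of firstToken, or -1 if absent. */
    static int locate(String text, String firstToken, int rep) {
        if (rep < 0) {
            return -1; // -1 means "unknown", so no position can be derived
        }
        String[] tokens = text.trim().split("\\s+");
        int seen = 0;
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(firstToken)) {
                if (seen == rep) {
                    return i;
                }
                seen++;
            }
        }
        return -1;
    }

    /** Counts how often the token at tokenIndex occurred before it (its repetition). */
    static int repetitionAt(String text, int tokenIndex) {
        String[] tokens = text.trim().split("\\s+");
        if (tokenIndex < 0 || tokenIndex >= tokens.length) {
            throw new IllegalArgumentException("No token at index " + tokenIndex);
        }
        int rep = 0;
        for (int i = 0; i < tokenIndex; i++) {
            if (tokens[i].equals(tokens[tokenIndex])) {
                rep++;
            }
        }
        return rep;
    }
}
```

In "the cat saw the dog near the tree", an extraction starting at the third "the" has a first-token repetition of 2, and locate(text, "the", 2) returns token index 6.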

getText

public String getText()
Returns the extracted text fragment.

Specified by:
getText in interface Recognition
Returns:
the extracted text

getVisibleChars

public String getVisibleChars()
Returns the visible characters of the text fragment (everything except whitespace and control characters).

Returns:
the visible characters
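One way to compute the visible characters, assuming that "whitespace" and "control characters" correspond to Character.isWhitespace and Character.isISOControl; the exact definitions used by TIES are not stated here. VisibleChars is hypothetical.

```java
/**
 * Hypothetical helper (not part of TIES): strips whitespace and control
 * characters, iterating by code point so supplementary characters survive.
 */
final class VisibleChars {
    private VisibleChars() { }

    static String of(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isWhitespace(cp) && !Character.isISOControl(cp)) {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```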

hashCode

public int hashCode()
Returns a hash code value for this object, fulfilling the Object.hashCode() contract.

Overrides:
hashCode in class Prediction
Returns:
a hash code value for this object

isSealed

public boolean isSealed()
Whether this extraction has been sealed. The text of a sealed extraction can no longer be changed, so appending is not allowed after sealing.

Specified by:
isSealed in interface Recognition
Returns:
true iff this extraction is sealed

seal

public void seal()
Seals this extraction. The text of a sealed extraction can no longer be changed, so appending is not allowed after sealing.
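The sealing behaviour amounts to a one-way flag that is checked before every modification. SealableText is a hypothetical stand-in, not the TIES class; it only illustrates the pattern, including the IllegalStateException documented for append.

```java
/**
 * Hypothetical mutable text holder (not the TIES Extraction) showing the
 * seal-then-reject pattern.
 */
final class SealableText {
    private final StringBuilder text;
    private boolean sealed = false;

    SealableText(String initial) {
        this.text = new StringBuilder(initial);
    }

    /** Appends text; rejected once the holder has been sealed. */
    void append(String newText, boolean afterWhitespace) {
        if (sealed) {
            throw new IllegalStateException("Cannot append to sealed text: " + text);
        }
        if (afterWhitespace) {
            text.append(' ');
        }
        text.append(newText);
    }

    /** One-way: there is no method to unseal. */
    void seal() {
        sealed = true;
    }

    boolean isSealed() {
        return sealed;
    }

    String getText() {
        return text.toString();
    }
}
```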


setFirstTokenRep

public void setFirstTokenRep(int newFirstTokenRep)
Modifies the repetition of the first token of the extraction in the original text (counting starts with 0, as the first occurrence is the "0th repetition"), or -1 if unknown or irrelevant.

Parameters:
newFirstTokenRep - the new repetition of the first token, or -1 if unknown or irrelevant

storeFields

public FieldMap storeFields()
Stores all relevant fields of this object in a field map for serialization. An equivalent object can be created by calling FieldMap.createObject(Class) on the created field map.

Specified by:
storeFields in interface Storable
Overrides:
storeFields in class Prediction
Returns:
the created field map
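A sketch of the store/restore round trip, using a plain Map in place of the TIES FieldMap class. StoredExtraction and its key strings are hypothetical placeholders; the actual values of KEY_TEXT and KEY_FIRST_TOKEN_REP are listed under Constant Field Values and are not reproduced here. In the real API, the restored object would be created through the Extraction(FieldMap) constructor and would therefore be sealed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical round trip (not the TIES FieldMap API): stores fields in a map
 * under string keys and rebuilds an equivalent object from that map.
 */
final class StoredExtraction {
    // Placeholder key strings; NOT the real TIES constant values.
    static final String KEY_TYPE = "type";
    static final String KEY_TEXT = "text";
    static final String KEY_FIRST_TOKEN_REP = "rep";

    final String type;
    final String text;
    final int firstTokenRep;

    StoredExtraction(String type, String text, int firstTokenRep) {
        this.type = type;
        this.text = text;
        this.firstTokenRep = firstTokenRep;
    }

    /** Analogue of storeFields(): copies each relevant field into a map. */
    Map<String, Object> storeFields() {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put(KEY_TYPE, type);
        fields.put(KEY_TEXT, text);
        fields.put(KEY_FIRST_TOKEN_REP, firstTokenRep);
        return fields;
    }

    /** Analogue of the field-map constructor: rebuilds an object from the map. */
    static StoredExtraction fromFields(Map<String, Object> fields) {
        return new StoredExtraction(
                (String) fields.get(KEY_TYPE),
                (String) fields.get(KEY_TEXT),
                (Integer) fields.get(KEY_FIRST_TOKEN_REP));
    }
}
```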


Copyright © 2003-2004 Christian Siefkes. All Rights Reserved.