de.fu_berlin.ties.extract
Class Extraction

java.lang.Object
  extended by de.fu_berlin.ties.io.BaseStorable
      extended by de.fu_berlin.ties.classify.Prediction
          extended by de.fu_berlin.ties.extract.Extraction
All Implemented Interfaces:
Recognition, Storable

public class Extraction
extends Prediction
implements Recognition

Extends a Prediction by also storing the extracted text and location data.

Instances of this class are not thread-safe.

Version:
$Revision: 1.12 $, $Date: 2004/11/25 13:36:08 $, $Author: siefkes $
Author:
Christian Siefkes

Field Summary
static String KEY_FIRST_TOKEN_REP
          Serialization key for the repetition of the first token.
static String KEY_INDEX
          Serialization key for the index.
static String KEY_TEXT
          Serialization key for the extracted text.
 
Fields inherited from class de.fu_berlin.ties.classify.Prediction
KEY_PR, KEY_PROB, KEY_SOURCE, KEY_TYPE
 
Constructor Summary
Extraction(FieldMap fieldMap)
          Creates a new instance from a field map, fulfilling the Storable contract.
Extraction(String predicted, String extracted)
          Creates a new instance without locating it in a text (using -1 for both the first token repetition and the index), setting the probability to -1 ("confirmed") and the evaluation status to EvalStatus.TRUTH.
Extraction(String predicted, TokenDetails details)
          Creates a new instance, setting the probability to -1 ("confirmed") and the evaluation status to EvalStatus.TRUTH.
Extraction(String predicted, TokenDetails details, Probability prob)
          Creates a new instance, setting the evaluation status to EvalStatus.UNKNOWN.
Extraction(String predicted, TokenDetails details, Probability prob, EvalStatus status)
          Creates a new instance.
 
Method Summary
 void addToken(TokenDetails details, boolean atEnd)
          Adds a token to this extraction, delegating to addToken(TokenDetails, Probability, boolean) with a probability of -1 ("confirmed").
 void addToken(TokenDetails details, Probability prob, boolean atEnd)
          Adds a token to this extraction, recalculating the probability by multiplying the prior probability value with the probability of the new text.
 boolean equals(Object obj)
          Indicates whether some other object is "equal to" this one, fulfilling the Object.equals(java.lang.Object) contract.
 int getFirstTokenRep()
          Returns the repetition of the first token of the extraction in the original text (counting starts with 0, as the first occurrence is the "0th repetition"), or -1 if unknown or if isFirstTokenRepIgnored() is true.
 int getIndex()
          Returns the index of the first token in the text (indexing starts with 0); or -1 if unknown/irrelevant.
 String getText()
          Returns the extracted text fragment.
 String getVisibleChars()
          Returns the visible characters of the text fragment (everything except whitespace and control characters).
 int hashCode()
          Returns a hash code value for this object, fulfilling the Object.hashCode() contract.
 boolean isFirstTokenRepIgnored()
          Whether the repetition of the first token should be ignored, comparing only the text but not the position of extractions.
 boolean isSealed()
          Whether this extraction has been sealed.
 TokenDetails removeToken(boolean atEnd)
          Deletes one of the tokens from this extraction.
 void setFirstTokenRepIgnored(boolean firstTokenRepIgnored)
          Specifies whether the repetition of the first token should be ignored, comparing only the text but not the position of extractions.
 void setSealed(boolean newSealed)
          Seals or unseals this extraction.
 FieldMap storeFields()
          Stores all relevant fields of this object in a field map for serialization.
 int tokenCount()
          Returns the number of tokens in this extraction.
 
Methods inherited from class de.fu_berlin.ties.classify.Prediction
addProb, getEvalStatus, getProbability, getSource, getType, probCount, removeProb, setEvalStatus, setSource
 
Methods inherited from class de.fu_berlin.ties.io.BaseStorable
toString, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface de.fu_berlin.ties.context.Recognition
getType
 

Field Detail

KEY_TEXT

public static final String KEY_TEXT
Serialization key for the extracted text.

See Also:
Constant Field Values

KEY_FIRST_TOKEN_REP

public static final String KEY_FIRST_TOKEN_REP
Serialization key for the repetition of the first token.

See Also:
Constant Field Values

KEY_INDEX

public static final String KEY_INDEX
Serialization key for the index.

See Also:
Constant Field Values

Constructor Detail

Extraction

public Extraction(FieldMap fieldMap)
Creates a new instance from a field map, fulfilling the Storable contract. An extraction created this way will be immediately sealed, thus the extracted text cannot be changed.

Parameters:
fieldMap - map containing the serialized fields

Extraction

public Extraction(String predicted,
                  String extracted)
Creates a new instance without locating it in a text (using -1 for both the first token repetition and the index), setting the probability to -1 ("confirmed") and the evaluation status to EvalStatus.TRUTH.

Parameters:
predicted - the predicted class
extracted - the (first part of the) extracted text fragment; must not be null

Extraction

public Extraction(String predicted,
                  TokenDetails details)
Creates a new instance, setting the probability to -1 ("confirmed") and the evaluation status to EvalStatus.TRUTH. Use this constructor to build answer keys.

Parameters:
predicted - the predicted class
details - details about the extracted text fragment or its first token

Extraction

public Extraction(String predicted,
                  TokenDetails details,
                  Probability prob)
Creates a new instance, setting the evaluation status to EvalStatus.UNKNOWN.

Parameters:
predicted - the predicted class
details - details about the extracted text fragment or its first token
prob - the probability of the prediction

Extraction

public Extraction(String predicted,
                  TokenDetails details,
                  Probability prob,
                  EvalStatus status)
Creates a new instance.

Parameters:
predicted - the predicted class
details - details about the extracted text fragment or its first token
prob - the probability of the prediction
status - the evaluation status of this instance

Method Detail

addToken

public void addToken(TokenDetails details,
                     boolean atEnd)
              throws IllegalStateException
Adds a token to this extraction, delegating to addToken(TokenDetails, Probability, boolean) with a probability of -1 ("confirmed"). Use this method when building answer keys.

Parameters:
details - details about the new token
atEnd - whether to add the new token at the end or at the start
Throws:
IllegalStateException - if this extraction is sealed

addToken

public void addToken(TokenDetails details,
                     Probability prob,
                     boolean atEnd)
              throws IllegalStateException
Adds a token to this extraction, recalculating the probability by multiplying the prior probability value with the probability of the new text. Increments the token count by 1.

Parameters:
details - details about the new token
prob - the probability of the new token; may be null if the overall probability of the extraction should not be changed
atEnd - whether to add the new token at the end or at the start
Throws:
IllegalStateException - if this extraction is sealed; or if new and old probabilities/pRs cannot be combined
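
The probability update described above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the class ProbCombineSketch is hypothetical, plain double values stand in for the Probability class, and the handling of the special value -1 ("confirmed") and of pR values is left out because this page does not specify it.

```java
// Hypothetical stand-in: plain doubles replace the library's Probability class.
final class ProbCombineSketch {
    /** Current overall probability of the extraction, assumed to lie in [0, 1]. */
    private double overall;
    /** Number of tokens; the constructor supplies the first one. */
    private int tokens = 1;

    ProbCombineSketch(double firstTokenProb) {
        this.overall = firstTokenProb;
    }

    /**
     * Multiplies the prior value by the probability of the new token.
     * A null argument leaves the overall probability unchanged.
     * Either way, the token count grows by 1.
     */
    void addToken(Double tokenProb) {
        if (tokenProb != null) {
            overall *= tokenProb;
        }
        tokens++;
    }

    double overall() {
        return overall;
    }

    int tokenCount() {
        return tokens;
    }

    public static void main(String[] args) {
        ProbCombineSketch e = new ProbCombineSketch(0.9);
        e.addToken(0.5);   // overall becomes 0.9 * 0.5
        e.addToken(null);  // overall stays as it is, count still increases
        System.out.println(e.overall() + " over " + e.tokenCount() + " tokens");
    }
}
```

Because each factor is at most 1 in this sketch, the overall value can only shrink or stay constant as tokens are added.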

equals

public boolean equals(Object obj)
Indicates whether some other object is "equal to" this one, fulfilling the Object.equals(java.lang.Object) contract. The evaluation status is ignored when checking equality: if all other fields of two extractions are equal, this method returns true even if their evaluation states differ. Only the visible characters of the extractions are compared; whitespace and control characters are ignored.

Overrides:
equals in class Prediction
Parameters:
obj - the reference object with which to compare
Returns:
true iff the specified object is an Extraction equal to this instance

getFirstTokenRep

public int getFirstTokenRep()
Returns the repetition of the first token of the extraction in the original text (counting starts with 0, as the first occurrence is the "0th repetition"), or -1 if unknown or if isFirstTokenRepIgnored() is true. This is useful to locate this extraction in the original text.

Returns:
the value of the attribute
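
To make the "0th repetition" counting concrete, here is a small sketch under stated assumptions: the text is already split into a list of token strings, and a repetition is counted by exact string equality. The class TokenRepSketch and both method names are hypothetical and not part of this package.

```java
import java.util.List;

// Hypothetical helper illustrating zero-based repetition counting.
final class TokenRepSketch {
    /**
     * Counts how often the token at the given index occurred earlier in the list.
     * The first occurrence yields 0, the second occurrence yields 1, and so on.
     */
    static int repetitionAt(List<String> tokens, int index) {
        String target = tokens.get(index);
        int rep = 0;
        for (int i = 0; i < index; i++) {
            if (tokens.get(i).equals(target)) {
                rep++;
            }
        }
        return rep;
    }

    /**
     * Finds the index of the occurrence of the token with the given repetition,
     * or returns -1 if the token does not occur that often.
     */
    static int locate(List<String> tokens, String token, int rep) {
        int seen = 0;
        for (int i = 0; i < tokens.size(); i++) {
            if (tokens.get(i).equals(token)) {
                if (seen == rep) {
                    return i;
                }
                seen++;
            }
        }
        return -1;
    }
}
```

Under these assumptions, the first token together with its repetition is enough to find the start of an extraction again, which is the use the description above mentions.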

getIndex

public int getIndex()
Returns the index of the first token in the text (indexing starts with 0); or -1 if unknown/irrelevant.

Returns:
the value of the attribute

getText

public String getText()
Returns the extracted text fragment.

Specified by:
getText in interface Recognition
Returns:
the extracted text

getVisibleChars

public String getVisibleChars()
Returns the visible characters of the text fragment (everything except whitespace and control characters).

Returns:
the visible characters
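
A minimal sketch of such a filter follows. It assumes that "whitespace" and "control characters" correspond to Character.isWhitespace and Character.isISOControl; the library may draw the line differently. The class VisibleCharsSketch and its methods are hypothetical.

```java
// Hypothetical helper; the library's own filtering rules may differ.
final class VisibleCharsSketch {
    /** Returns the input with whitespace and ISO control characters removed. */
    static String visibleChars(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!Character.isWhitespace(cp) && !Character.isISOControl(cp)) {
                sb.appendCodePoint(cp);
            }
            // Advance by one or two chars, depending on the code point.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    /** Compares two fragments by their visible characters only. */
    static boolean sameVisibleText(String a, String b) {
        return visibleChars(a).equals(visibleChars(b));
    }
}
```

Comparing fragments this way treats texts that differ only in line breaks or spacing as the same, which matches the comparison rule stated for equals(Object).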

hashCode

public int hashCode()
Returns a hash code value for this object, fulfilling the Object.hashCode() contract.

Overrides:
hashCode in class Prediction
Returns:
a hash code value for this object

isFirstTokenRepIgnored

public boolean isFirstTokenRepIgnored()
Whether the repetition of the first token should be ignored, comparing only the text but not the position of extractions. Defaults to false.

Returns:
the value of the attribute

isSealed

public boolean isSealed()
Whether this extraction has been sealed. The text of a sealed extraction cannot be changed. This means that adding tokens is not allowed after sealing.

Specified by:
isSealed in interface Recognition
Returns:
true iff this extraction is sealed

removeToken

public TokenDetails removeToken(boolean atEnd)
                         throws IllegalStateException
Deletes one of the tokens from this extraction. At least one token must always remain, i.e. tokenCount() must be 2 or more prior to calling this method.

Parameters:
atEnd - whether to delete the last token (true) or the first token (false)
Returns:
details describing the removed token
Throws:
IllegalStateException - if there is only one token left or if this extraction is sealed
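
The rules for adding and removing tokens can be summarized in a simplified model. This is a sketch, not the class itself: TokenBufferSketch is hypothetical, tokens are plain strings instead of TokenDetails, joining tokens with a single space is an assumption, and atEnd is assumed to select the last token when true.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of the token and sealing rules described on this page.
final class TokenBufferSketch {
    private final Deque<String> tokens = new ArrayDeque<>();
    private boolean sealed;

    /** The constructor supplies the first token, so the buffer is never empty. */
    TokenBufferSketch(String firstToken) {
        tokens.addLast(firstToken);
    }

    /** Adds a token at the end (atEnd == true) or at the start. */
    void addToken(String token, boolean atEnd) {
        checkNotSealed();
        if (atEnd) {
            tokens.addLast(token);
        } else {
            tokens.addFirst(token);
        }
    }

    /** Removes the last (atEnd == true) or first token; one token must remain. */
    String removeToken(boolean atEnd) {
        checkNotSealed();
        if (tokens.size() < 2) {
            throw new IllegalStateException("only one token left");
        }
        return atEnd ? tokens.removeLast() : tokens.removeFirst();
    }

    void setSealed(boolean newSealed) {
        sealed = newSealed;
    }

    int tokenCount() {
        return tokens.size();
    }

    /** Joins the tokens with single spaces (an assumption of this sketch). */
    String text() {
        return String.join(" ", tokens);
    }

    private void checkNotSealed() {
        if (sealed) {
            throw new IllegalStateException("extraction is sealed");
        }
    }
}
```

A double-ended queue fits here because tokens are only ever added or removed at the two ends of the extraction.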

setFirstTokenRepIgnored

public void setFirstTokenRepIgnored(boolean firstTokenRepIgnored)
Specifies whether the repetition of the first token should be ignored, comparing only the text but not the position of extractions.

Parameters:
firstTokenRepIgnored - the new value of the attribute

setSealed

public void setSealed(boolean newSealed)
Seals or unseals this extraction. The text of a sealed extraction cannot be changed. This means that adding tokens is not allowed after sealing.

Parameters:
newSealed - the new value of the attribute

storeFields

public FieldMap storeFields()
Stores all relevant fields of this object in a field map for serialization. An equivalent object can be created by calling FieldMap.createObject(Class) on the created field map.

Specified by:
storeFields in interface Storable
Overrides:
storeFields in class Prediction
Returns:
the created field map

tokenCount

public int tokenCount()
Returns the number of tokens in this extraction. The count is reliable only if the first token was given to a constructor and each further token was added through operations such as addToken. The token count is omitted when serializing, so it cannot be restored.

Returns:
the value of the attribute


Copyright © 2003-2004 Christian Siefkes. All Rights Reserved.