de.fu_berlin.ties.extract
Class Extraction

java.lang.Object
  extended by de.fu_berlin.ties.io.BaseStorable
      extended by de.fu_berlin.ties.classify.Prediction
          extended by de.fu_berlin.ties.extract.Extraction
All Implemented Interfaces:
Recognition, Storable, Cloneable

public class Extraction
extends Prediction
implements Cloneable, Recognition

Extends a Prediction by also storing the extracted text and location data.

Instances of this class are not thread-safe.

Version:
$Revision: 1.22 $, $Date: 2006/10/21 16:04:13 $, $Author: siefkes $
Author:
Christian Siefkes

Field Summary
static String KEY_FIRST_TOKEN_REP
          Serialization key for the repetition of the first token.
static String KEY_INDEX
          Serialization key for the index.
static String KEY_TEXT
          Serialization key for the extracted text.
 
Fields inherited from class de.fu_berlin.ties.classify.Prediction
KEY_PR, KEY_PROB, KEY_SOURCE, KEY_TYPE
 
Constructor Summary
Extraction(FieldMap fieldMap)
          Creates a new instance from a field map, fulfilling the Storable contract.
Extraction(String predicted, String extracted)
          Creates a new instance without locating it in a text (using -1 for first token rep + index), setting the probability to -1 ("confirmed") and the evaluation status to EvalStatus.TRUTH.
Extraction(String predicted, TokenDetails details)
          Creates a new instance, setting the probability to -1 ("confirmed") and the evaluation status to EvalStatus.TRUTH.
Extraction(String predicted, TokenDetails details, Probability prob)
          Creates a new instance, setting the evaluation status to EvalStatus.UNKNOWN.
Extraction(String predicted, TokenDetails details, Probability prob, EvalStatus status)
          Creates a new instance.
 
Method Summary
 void addToken(TokenDetails details, boolean atEnd)
          Adds a token to this extraction, delegating to addToken(TokenDetails, Probability, boolean) with a probability of -1 ("confirmed").
 void addToken(TokenDetails details, Probability prob, boolean atEnd)
          Adds a token to this extraction, recalculating the probability by multiplying the prior probability value with the probability of the new text.
 Extraction clone()
          Creates and returns a deep copy of this object.
 boolean equals(Object obj)
          Indicates whether some other object is "equal to" this one, fulfilling the Object.equals(java.lang.Object) contract.
 int getFirstTokenRep()
          Returns the repetition of the first token of the extraction in the original text (counting starts with 0, as the first occurrence is the "0th repetition"); or -1 if unknown or if isFirstTokenRepIgnored() is true.
 int getIndex()
          Returns the index of the first token in the text (indexing starts with 0); or -1 if unknown/irrelevant.
 int getLastIndex()
          Returns the index of the last token in the text (indexing starts with 0); or -1 if unknown/irrelevant.
 String getText()
          Returns the extracted text fragment.
 String getVisibleChars()
          Returns the visible characters of the text fragment (everything except whitespace and control characters).
 int hashCode()
          Returns a hash code value for this object, fulfilling the Object.hashCode() contract.
 boolean hasProperty(Object prop)
          Checks if a specific user-defined property is set for this extraction.
 boolean isFirstTokenRepIgnored()
          Whether the repetition of the first token should be ignored, comparing only the text but not the position of extractions.
 boolean isSealed()
          Whether this extraction has been sealed.
 void modifyProbability(Probability prob)
          Modifies the probability of an extraction.
 void setFirstTokenRep(int newFirstTokenRep)
          Modifies the repetition of the first token of the extraction in the original text (counting starts with 0, as the first occurrence is the "0th repetition").
 void setFirstTokenRepIgnored(boolean ftRepIgnored)
          Specifies whether the repetition of the first token should be ignored, comparing only the text but not the position of extractions.
 void setIndex(int newIndex)
          Overrides the index of the first token in the text (indexing starts with 0).
 void setLastIndex(int newLastIndex)
          Overrides the index of the last token in the text (indexing starts with 0).
 boolean setProperty(Object prop)
          Sets a user-defined property for this extraction.
 void setSealed(boolean newSealed)
          Seals or unseals this extraction.
 FieldMap storeFields()
          Stores all relevant fields of this object in a field map for serialization.
 int tokenCount()
          Returns the number of tokens in this extraction.
 boolean unsetProperty(Object prop)
          Unsets a user-defined property for this extraction.
 
Methods inherited from class de.fu_berlin.ties.classify.Prediction
addProb, getEvalStatus, getProbability, getSource, getType, probCount, removeProb, setEvalStatus, setSource
 
Methods inherited from class de.fu_berlin.ties.io.BaseStorable
toString, toString
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface de.fu_berlin.ties.context.Recognition
getType
 

Field Detail

KEY_TEXT

public static final String KEY_TEXT
Serialization key for the extracted text.

See Also:
Constant Field Values

KEY_FIRST_TOKEN_REP

public static final String KEY_FIRST_TOKEN_REP
Serialization key for the repetition of the first token.

See Also:
Constant Field Values

KEY_INDEX

public static final String KEY_INDEX
Serialization key for the index.

See Also:
Constant Field Values

Constructor Detail

Extraction

public Extraction(FieldMap fieldMap)
Creates a new instance from a field map, fulfilling the Storable contract. An extraction created this way will be immediately sealed, thus the extracted text cannot be changed.

Parameters:
fieldMap - map containing the serialized fields

Extraction

public Extraction(String predicted,
                  String extracted)
Creates a new instance without locating it in a text (using -1 for first token rep + index), setting the probability to -1 ("confirmed") and the evaluation status to EvalStatus.TRUTH.

Parameters:
predicted - the predicted class
extracted - the (first part of the) extracted text fragment; must not be null

Extraction

public Extraction(String predicted,
                  TokenDetails details)
Creates a new instance, setting the probability to -1 ("confirmed") and the evaluation status to EvalStatus.TRUTH. Use this constructor to build answer keys.

Parameters:
predicted - the predicted class
details - details about the extracted text fragment or its first token

Extraction

public Extraction(String predicted,
                  TokenDetails details,
                  Probability prob)
Creates a new instance, setting the evaluation status to EvalStatus.UNKNOWN.

Parameters:
predicted - the predicted class
details - details about the extracted text fragment or its first token
prob - the probability of the prediction

Extraction

public Extraction(String predicted,
                  TokenDetails details,
                  Probability prob,
                  EvalStatus status)
Creates a new instance.

Parameters:
predicted - the predicted class
details - details about the extracted text fragment or its first token
prob - the probability of the prediction
status - the evaluation status of this instance

Method Detail

addToken

public void addToken(TokenDetails details,
                     boolean atEnd)
              throws IllegalStateException
Adds a token to this extraction, delegating to addToken(TokenDetails, Probability, boolean) with a probability of -1 ("confirmed"). Use this method when building answer keys.

Parameters:
details - details about the new token
atEnd - whether to add the new token at the end or at the start
Throws:
IllegalStateException - if this extraction is sealed

addToken

public void addToken(TokenDetails details,
                     Probability prob,
                     boolean atEnd)
              throws IllegalStateException
Adds a token to this extraction, recalculating the probability by multiplying the prior probability value with the probability of the new text. Increments the token count by 1.

Parameters:
details - details about the new token
prob - the probability of the new token; might be null if the overall probability of the extraction should not be changed
atEnd - whether to add the new token at the end or at the start
Throws:
IllegalStateException - if this extraction is sealed; or if new and old probabilities/pRs cannot be combined
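
The sealing and probability rules described for addToken can be illustrated with a small stand-in. This is only a sketch, not the TIES implementation: SealableFragment is a hypothetical class that uses a plain String in place of TokenDetails and a nullable Double in place of Probability. Joining tokens with a single space is likewise an assumption made for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for illustration; not part of TIES. */
final class SealableFragment {
    private final Deque<String> tokens = new ArrayDeque<>();
    private double probability;
    private boolean sealed;

    SealableFragment(String firstToken, double prob) {
        tokens.addLast(firstToken);
        probability = prob;
    }

    /** Adds a token at the end or the start; a null prob leaves the overall probability unchanged. */
    void addToken(String token, Double prob, boolean atEnd) {
        if (sealed) {
            throw new IllegalStateException("fragment is sealed");
        }
        if (atEnd) {
            tokens.addLast(token);
        } else {
            tokens.addFirst(token);
        }
        if (prob != null) {
            // Multiply the prior value with the probability of the new token.
            probability *= prob;
        }
    }

    void setSealed(boolean newSealed) { sealed = newSealed; }

    int tokenCount() { return tokens.size(); }

    double getProbability() { return probability; }

    /** Joins the tokens with a single space (an assumption of this sketch). */
    String getText() { return String.join(" ", tokens); }

    public static void main(String[] args) {
        SealableFragment f = new SealableFragment("University", 0.5);
        f.addToken("Free", 0.5, false);   // prepended
        f.addToken("Berlin", null, true); // appended, probability untouched
        System.out.println(f.getText() + " / " + f.getProbability() + " / " + f.tokenCount());
        f.setSealed(true);
        try {
            f.addToken("x", 0.5, true);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The sketch does not model the special value -1 ("confirmed") or the pR values that the real class combines.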

clone

public Extraction clone()
Creates and returns a deep copy of this object. "Deep" means that there are no dependencies between the two objects -- modifying any fields of the copy will not affect this object, and vice versa. Any user-set properties, however, are only copied, not cloned.

Overrides:
clone in class Object
Returns:
a deep copy of this object

equals

public boolean equals(Object obj)
Indicates whether some other object is "equal to" this one, fulfilling the Object.equals(java.lang.Object) contract. The evaluation status is ignored when checking equality: if all other fields of two extractions are equal, this method will return true even if their evaluation states differ. Only the visible characters of the extractions are compared; whitespace and control characters are ignored.

Overrides:
equals in class Prediction
Parameters:
obj - the reference object with which to compare
Returns:
true iff the specified object is an Extraction equal to this instance
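
The visible-character comparison can be sketched as follows. VisibleChars is a hypothetical helper, not a TIES class, and it assumes that "invisible" means whatever Character.isWhitespace and Character.isISOControl report; the library's own definition may differ.

```java
/** Hypothetical helper for illustration; not part of TIES. */
final class VisibleChars {
    private VisibleChars() { }

    /** Drops whitespace and ISO control characters (an assumed definition of "invisible"). */
    static String of(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints()
            .filter(cp -> !Character.isWhitespace(cp) && !Character.isISOControl(cp))
            .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    /** Compares two fragments by their visible characters only. */
    static boolean sameVisibleChars(String a, String b) {
        return of(a).equals(of(b));
    }

    public static void main(String[] args) {
        // Differs only in whitespace, so the fragments count as equal.
        System.out.println(sameVisibleChars("New  York\n", "New York"));
        // Case still matters in this sketch.
        System.out.println(sameVisibleChars("New", "new"));
    }
}
```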

getFirstTokenRep

public int getFirstTokenRep()
Returns the repetition of the first token of the extraction in the original text (counting starts with 0, as the first occurrence is the "0th repetition"); or -1 if unknown or if isFirstTokenRepIgnored() is true. This is useful to locate this extraction in the original text.

Returns:
the value of the attribute
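
To make the "0th repetition" convention concrete, the sketch below counts repetitions over a token list and maps a repetition back to a token index. TokenRepetition and its methods are hypothetical names, and the example assumes the text has already been split into tokens; how TIES tokenizes is not shown here.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of TIES. */
final class TokenRepetition {
    private TokenRepetition() { }

    /** Counts how often the token at index occurs before index (0 for the first occurrence). */
    static int repetitionAt(List<String> tokens, int index) {
        String token = tokens.get(index);
        int rep = 0;
        for (int i = 0; i < index; i++) {
            if (tokens.get(i).equals(token)) {
                rep++;
            }
        }
        return rep;
    }

    /** Returns the index of the given repetition of token, or -1 if there is none. */
    static int indexOfRepetition(List<String> tokens, String token, int rep) {
        int seen = 0;
        for (int i = 0; i < tokens.size(); i++) {
            if (tokens.get(i).equals(token)) {
                if (seen == rep) {
                    return i;
                }
                seen++;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "cat", "saw", "the", "dog");
        System.out.println(repetitionAt(tokens, 3));               // second "the"
        System.out.println(indexOfRepetition(tokens, "the", 1));   // back to its index
    }
}
```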

getIndex

public int getIndex()
Returns the index of the first token in the text (indexing starts with 0); or -1 if unknown/irrelevant.

Returns:
the value of the attribute

getLastIndex

public int getLastIndex()
Returns the index of the last token in the text (indexing starts with 0); or -1 if unknown/irrelevant.

Returns:
the value of the attribute

getText

public String getText()
Returns the extracted text fragment.

Specified by:
getText in interface Recognition
Returns:
the extracted text

getVisibleChars

public String getVisibleChars()
Returns the visible characters of the text fragment (everything except whitespace and control characters).

Returns:
the visible characters

hashCode

public int hashCode()
Returns a hash code value for this object, fulfilling the Object.hashCode() contract.

Overrides:
hashCode in class Prediction
Returns:
a hash code value for this object

hasProperty

public boolean hasProperty(Object prop)
Checks if a specific user-defined property is set for this extraction.

Parameters:
prop - the property to check
Returns:
true iff the property is set

setProperty

public boolean setProperty(Object prop)
Sets a user-defined property for this extraction.

Parameters:
prop - the property to set
Returns:
true iff the property had not been set before

unsetProperty

public boolean unsetProperty(Object prop)
Unsets a user-defined property for this extraction.

Parameters:
prop - the property to unset
Returns:
true iff the property had been set before
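
The return values of hasProperty, setProperty and unsetProperty match the semantics of a java.util.Set. The sketch below shows that correspondence with a hypothetical PropertyFlags class; whether TIES actually stores properties in a set is an assumption.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for illustration; not part of TIES. */
final class PropertyFlags {
    private final Set<Object> props = new HashSet<>();

    /** Returns true iff the property had not been set before. */
    boolean setProperty(Object prop) { return props.add(prop); }

    /** Returns true iff the property had been set before. */
    boolean unsetProperty(Object prop) { return props.remove(prop); }

    /** Returns true iff the property is set. */
    boolean hasProperty(Object prop) { return props.contains(prop); }

    public static void main(String[] args) {
        PropertyFlags flags = new PropertyFlags();
        System.out.println(flags.setProperty("reviewed"));   // newly set
        System.out.println(flags.setProperty("reviewed"));   // already set
        System.out.println(flags.hasProperty("reviewed"));
        System.out.println(flags.unsetProperty("reviewed")); // was set
        System.out.println(flags.unsetProperty("reviewed")); // no longer set
    }
}
```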

isFirstTokenRepIgnored

public boolean isFirstTokenRepIgnored()
Whether the repetition of the first token should be ignored, comparing only the text but not the position of extractions. Defaults to false.

Returns:
the value of the attribute

isSealed

public boolean isSealed()
Whether this extraction has been sealed. The text of a sealed extraction cannot be changed. This means that adding tokens is not allowed after sealing.

Specified by:
isSealed in interface Recognition
Returns:
true iff this extraction is sealed

modifyProbability

public void modifyProbability(Probability prob)
Modifies the probability of an extraction.

Parameters:
prob - the new probability; it will be combined with the current token probabilities to calculate the average

setFirstTokenRep

public void setFirstTokenRep(int newFirstTokenRep)
Modifies the repetition of the first token of the extraction in the original text (counting starts with 0, as the first occurrence is the "0th repetition"). This also sets isFirstTokenRepIgnored() to false (since it wouldn't make much sense to update the repetition if you want it to be ignored anyway).

Parameters:
newFirstTokenRep - the new value of the attribute

setFirstTokenRepIgnored

public void setFirstTokenRepIgnored(boolean ftRepIgnored)
Specifies whether the repetition of the first token should be ignored, comparing only the text but not the position of extractions.

Parameters:
ftRepIgnored - the new value of the attribute

setIndex

public void setIndex(int newIndex)
Overrides the index of the first token in the text (indexing starts with 0).

Parameters:
newIndex - the value of the attribute; if negative, the index of the first token will be used instead

setLastIndex

public void setLastIndex(int newLastIndex)
Overrides the index of the last token in the text (indexing starts with 0).

Parameters:
newLastIndex - the value of the attribute; if negative, the index of the last token will be used instead

setSealed

public void setSealed(boolean newSealed)
Seals or unseals this extraction. The text of a sealed extraction cannot be changed. This means that adding tokens is not allowed after sealing.

Parameters:
newSealed - the new value of the attribute

storeFields

public FieldMap storeFields()
Stores all relevant fields of this object in a field map for serialization. An equivalent object can be created by calling FieldMap.createObject(Class) on the created field map.

Specified by:
storeFields in interface Storable
Overrides:
storeFields in class Prediction
Returns:
the created field map

tokenCount

public int tokenCount()
Returns the number of tokens in this extraction. This will only be reliable if a constructor is used to give the first token and operations such as addToken are used for each further token. The count is omitted when serializing, so it cannot be restored.

Returns:
the value of the attribute


Copyright © 2003-2007 Christian Siefkes. All Rights Reserved.