de.fu_berlin.ties.context
Class Representation

java.lang.Object
  extended by de.fu_berlin.ties.context.Representation
All Implemented Interfaces:
FeatureExtractor
Direct Known Subclasses:
AbstractRepresentation

public abstract class Representation
extends Object
implements FeatureExtractor

Abstract class that manages context representations for entity recognition and information extraction. Subclasses must implement the buildContext(Element, String, String, String, PriorRecognitions, Map, String) method for building representations.

Version:
$Revision: 1.18 $, $Date: 2006/10/21 16:04:03 $, $Author: siefkes $
Author:
Christian Siefkes

Constructor Summary
Representation(int recogNum)
          Creates a new instance.
 
Method Summary
 FeatureVector buildContext(Document document, PriorRecognitions priorRecognitions, Map<Element,List<LocalFeature>> featureCache, String logPurpose)
          Builds the context representation of a document.
 FeatureVector buildContext(Element element, PriorRecognitions priorRecognitions, Map<Element,List<LocalFeature>> featureCache, String logPurpose)
          Builds the context representation of an element.
abstract  FeatureVector buildContext(Element element, String leftText, String mainText, String rightText, PriorRecognitions priorRecognitions, Map<Element,List<LocalFeature>> featureCache, String logPurpose)
          Builds the context representation of text in an element.
 FeatureVector buildFeatures(Reader reader)
          Extracts a vector of relevant features from a text sequence. The input text must contain a well-formed XML element, otherwise this method will not work.
 int getRecognitionNumber()
          Returns the number of preceding recognitions to represent.
 PriorRecognitions initDocument(File filename, TokenizerFactory tFactory)
          Initializes the processing of a new document and creates a buffer to be filled with prior Recognitions and passed as an argument to the buildContext(Element, String, String, String, PriorRecognitions, Map, String) method.
 String toString()
          Returns a string representation of this object.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Representation

public Representation(int recogNum)
Creates a new instance.

Parameters:
recogNum - the number of preceding recognitions to represent
Method Detail

buildContext

public FeatureVector buildContext(Document document,
                                  PriorRecognitions priorRecognitions,
                                  Map<Element,List<LocalFeature>> featureCache,
                                  String logPurpose)
                           throws ClassCastException
Builds the context representation of a document. The default implementation delegates to the buildContext(Element, PriorRecognitions, Map, String) method, using the root element of the document.

Parameters:
document - the XML document whose context should be represented
priorRecognitions - a buffer of the last Recognitions from the document, created by calling initDocument(java.io.File, de.fu_berlin.ties.text.TokenizerFactory); might be null
featureCache - a cache of (local) features; it should be re-used between all calls for the nodes in a single document, but must not be re-used when building the context of nodes in different documents
logPurpose - the type of contexts of main interest to the caller (e.g. "Token" or "Sentence"), used for logging
Returns:
a vector of features considered relevant for representation
Throws:
ClassCastException - if the priorRecognitions buffer contains objects that aren't Recognitions
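
The featureCache rule (share within one document, never across documents) can be illustrated with a minimal sketch. It uses plain strings as hypothetical stand-ins for the library's Element and LocalFeature types, and FeatureCacheSketch, localFeatures and process are illustrative names, not part of this API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of per-document cache scoping; not the library's code. */
final class FeatureCacheSketch {
    /** Counts how often local features are actually computed (cache misses). */
    private int computations = 0;

    /** Returns the cached features of a node, computing them on the first request. */
    List<String> localFeatures(String nodeId, Map<String, List<String>> featureCache) {
        return featureCache.computeIfAbsent(nodeId, id -> {
            computations++;
            return List.of("length=" + id.length());
        });
    }

    /** Processes each document with its own fresh cache and returns the miss count. */
    int process(List<List<String>> documents) {
        for (List<String> nodes : documents) {
            // One cache per document: re-used for all nodes of this document only.
            Map<String, List<String>> featureCache = new HashMap<>();
            for (String nodeId : nodes) {
                localFeatures(nodeId, featureCache);
            }
        }
        return computations;
    }
}
```

Creating the map inside the per-document loop makes it hard to leak cached features from one document into the next by accident.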

buildContext

public FeatureVector buildContext(Element element,
                                  PriorRecognitions priorRecognitions,
                                  Map<Element,List<LocalFeature>> featureCache,
                                  String logPurpose)
                           throws ClassCastException
Builds the context representation of an element. The default implementation delegates to the buildContext(Element, String, String, String, PriorRecognitions, Map, String) method, using the full textual content of the element as mainText and empty strings as leftText and rightText.

Parameters:
element - the element whose context should be represented
priorRecognitions - a buffer of the last Recognitions from the document, created by calling initDocument(java.io.File, de.fu_berlin.ties.text.TokenizerFactory); might be null
featureCache - a cache of (local) features; it should be re-used between all calls for the nodes in a single document, but must not be re-used when building the context of nodes in different documents
logPurpose - the type of contexts of main interest to the caller (e.g. "Token" or "Sentence"), used for logging
Returns:
a vector of features considered relevant for representation
Throws:
ClassCastException - if the priorRecognitions buffer contains objects that aren't Recognitions
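
The delegation between the buildContext overloads described above can be sketched as follows. This is a simplified illustration, not the library's implementation: it uses the JDK's org.w3c.dom types as stand-ins for the library's Document and Element, drops the priorRecognitions, featureCache and logPurpose arguments, and returns a list of strings instead of a FeatureVector. DelegationSketch, parse and demo are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical stand-in that mirrors only the delegation chain. */
abstract class DelegationSketch {
    /** Most specific overload; the only one a subclass has to supply. */
    abstract List<String> buildContext(Element element, String leftText,
                                       String mainText, String rightText);

    /** Element overload: full textual content as mainText, empty left and right text. */
    List<String> buildContext(Element element) {
        return buildContext(element, "", element.getTextContent(), "");
    }

    /** Document overload: delegates using the root element of the document. */
    List<String> buildContext(Document document) {
        return buildContext(document.getDocumentElement());
    }

    /** Parses a small XML string with the JDK parser (helper for the demo only). */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Runs the chain with a toy subclass that records the text arguments it receives. */
    static List<String> demo(String xml) {
        DelegationSketch sketch = new DelegationSketch() {
            @Override
            List<String> buildContext(Element element, String leftText,
                                      String mainText, String rightText) {
                List<String> features = new ArrayList<>();
                features.add("left=" + leftText);
                features.add("main=" + mainText);
                features.add("right=" + rightText);
                return features;
            }
        };
        return sketch.buildContext(parse(xml));
    }
}
```

With this layout a subclass overrides only the abstract overload, and callers that hold a whole document or a single element still reach it through the defaults.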

buildContext

public abstract FeatureVector buildContext(Element element,
                                           String leftText,
                                           String mainText,
                                           String rightText,
                                           PriorRecognitions priorRecognitions,
                                           Map<Element,List<LocalFeature>> featureCache,
                                           String logPurpose)
                                    throws ClassCastException
Builds the context representation of text in an element. Returns a feature vector of all context features considered relevant for representation.

Parameters:
element - the element whose context should be represented
leftText - textual content to the left of (preceding) mainText, might be empty
mainText - the main textual content to represent, might be empty
rightText - textual content to the right of (following) mainText, might be empty
priorRecognitions - a buffer of the last Recognitions from the document, created by calling initDocument(java.io.File, de.fu_berlin.ties.text.TokenizerFactory); might be null
featureCache - a cache of (local) features; it should be re-used between all calls for the nodes in a single document, but must not be re-used when building the context of nodes in different documents
logPurpose - the type of contexts of main interest to the caller (e.g. "Token" or "Sentence"), used for logging
Returns:
a vector of features considered relevant for representation
Throws:
ClassCastException - if the priorRecognitions buffer contains objects that aren't Recognitions

buildFeatures

public FeatureVector buildFeatures(Reader reader)
                            throws IOException,
                                   ProcessingException
Extracts a vector of relevant features from a text sequence. The input text must contain a well-formed XML element, otherwise this method will not work.

Specified by:
buildFeatures in interface FeatureExtractor
Parameters:
reader - a reader containing the text to represent
Returns:
a feature vector representing the input text sequence
Throws:
IOException - if an I/O error occurs while reading the input
ProcessingException - if an error occurs while processing the input
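
Because buildFeatures requires well-formed XML and this page does not say how it fails otherwise, a caller may want to check the input first. The sketch below does that using only the JDK's XML parser; WellFormedCheck and isWellFormed are hypothetical helper names, not part of this API. It takes a String rather than a Reader because parsing consumes a reader, so the caller can afterwards wrap the same text in a fresh StringReader for buildFeatures.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical pre-check helper; not part of the documented class. */
final class WellFormedCheck {
    private WellFormedCheck() {
    }

    /** Returns true if the text parses as a well-formed XML document. */
    static boolean isWellFormed(String text) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(text)));
            return true;
        } catch (SAXException e) {
            // Raised by the parser for input that is not well-formed.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("XML parser unavailable or unreadable input", e);
        }
    }
}
```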

initDocument

public PriorRecognitions initDocument(File filename,
                                      TokenizerFactory tFactory)
                               throws ProcessingException,
                                      IOException
Initializes the processing of a new document and creates a buffer to be filled with prior Recognitions and passed as an argument to the buildContext(Element, String, String, String, PriorRecognitions, Map, String) method. The caller must add new recognitions to the buffer by calling PriorRecognitions.add(Recognition), but it is not necessary to remove them: the buffer automatically deletes the oldest recognitions when appropriate.

Parameters:
filename - the name of the file
tFactory - a factory that can be used for creating tokenizers, if required
Returns:
a buffer to be used for collecting prior Recognitions
Throws:
ProcessingException - if an error occurs while starting to process the document
IOException - if an I/O error occurs
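
The add-only contract of the buffer can be pictured with a small bounded queue. This is a sketch under assumptions, not the actual PriorRecognitions class: it assumes a fixed capacity (presumably related to the recogNum constructor argument, which this page does not confirm) and evicts the oldest entry once that capacity is reached. BoundedRecognitionBuffer and snapshot are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical bounded buffer; callers only add, eviction is automatic. */
final class BoundedRecognitionBuffer<T> {
    private final int capacity;
    private final Deque<T> items = new ArrayDeque<>();

    BoundedRecognitionBuffer(int capacity) {
        if (capacity < 0) {
            throw new IllegalArgumentException("capacity must not be negative");
        }
        this.capacity = capacity;
    }

    /** Adds a non-null recognition, dropping the oldest one if the buffer is full. */
    void add(T recognition) {
        if (capacity == 0) {
            return; // nothing is retained when no prior recognitions are represented
        }
        if (items.size() == capacity) {
            items.removeFirst();
        }
        items.addLast(recognition);
    }

    /** Returns the retained recognitions, oldest first. */
    List<T> snapshot() {
        return new ArrayList<>(items);
    }
}
```

For example, with a capacity of 2, adding "A", "B" and "C" in that order leaves "B" and "C" in the buffer.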

getRecognitionNumber

public int getRecognitionNumber()
Returns the number of preceding recognitions to represent.

Returns:
the value of the attribute

toString

public String toString()
Returns a string representation of this object.

Overrides:
toString in class Object
Returns:
a textual representation


Copyright © 2003-2007 Christian Siefkes. All Rights Reserved.