de.fu_berlin.ties.context
Class SimpleRepresentation

java.lang.Object
  extended by de.fu_berlin.ties.context.Representation
      extended by de.fu_berlin.ties.context.AbstractRepresentation
          extended by de.fu_berlin.ties.context.SimpleRepresentation

public class SimpleRepresentation
extends AbstractRepresentation

A simple representation of text in an element of an XML document. Instances of this class are thread-safe.

Version:
$Revision: 1.2 $, $Date: 2004/09/02 16:30:14 $, $Author: siefkes $
Author:
Christian Siefkes

Field Summary
 
Fields inherited from class de.fu_berlin.ties.context.AbstractRepresentation
CONFIG_RECOGN_NUM, CONFIG_SPLIT_MAXIMUM, CONFIG_STORE_NTH
 
Constructor Summary
SimpleRepresentation()
          Creates a new instance based on the standard configuration.
SimpleRepresentation(int recogNum, int splitMax, int n, String outCharset, TextTokenizer textTokenizer)
          Creates a new instance.
SimpleRepresentation(TiesConfiguration config)
          Creates a new instance based on the provided configuration.
SimpleRepresentation(TiesConfiguration config, String suffix)
          Creates a new instance based on the provided configuration.
 
Method Summary
protected  void addFeature(FeatureVector features, String prefix, String value)
          Creates a feature and adds it to a feature vector.
protected  void addText(FeatureVector features, String prefix, String text)
          Adds feature(s) representing text to a feature vector, using the instance tokenizer for splitting the text into tokens.
protected  FeatureVector doBuildContext(Element element, String leftText, String mainText, String rightText, PriorRecognitions priorRecognitions, Map featureCache, String logPurpose)
          Builds the context representation of text in an element.
 String toString()
          Returns a string representation of this object.
 
Methods inherited from class de.fu_berlin.ties.context.AbstractRepresentation
buildContext, getSplitMaximum, getStoreN
 
Methods inherited from class de.fu_berlin.ties.context.Representation
buildContext, buildContext, createRecognitionBuffer, getRecognitionNumber
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

SimpleRepresentation

public SimpleRepresentation()
                     throws ProcessingException
Creates a new instance based on the standard configuration.

Throws:
ProcessingException - if an error occurs while initializing this instance

SimpleRepresentation

public SimpleRepresentation(TiesConfiguration config)
                     throws ProcessingException
Creates a new instance based on the provided configuration.

Parameters:
config - used to configure this instance
Throws:
ProcessingException - if an error occurs while initializing this instance

SimpleRepresentation

public SimpleRepresentation(TiesConfiguration config,
                            String suffix)
                     throws ProcessingException
Creates a new instance based on the provided configuration.

Parameters:
config - used to configure this instance
suffix - an optional suffix that is appended to the names of the configuration parameters used, to obtain values specific to this instance; may be null
Throws:
ProcessingException - if an error occurs while initializing this instance
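
The suffix mechanism described above can be pictured as a keyed lookup that prefers an instance-specific entry. The following is a minimal sketch under stated assumptions, not the TiesConfiguration implementation: the class SuffixLookupSketch, the "." separator between key and suffix, the fallback to the unsuffixed key, and the key name "split.max" are all hypothetical and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class SuffixLookupSketch {
    /**
     * Looks up a key, preferring the suffixed variant when a suffix is given.
     * ASSUMPTION: key and suffix are joined with '.', and the plain key is
     * used as a fallback; the real configuration class may behave differently.
     */
    static String lookup(Map<String, String> config, String key, String suffix) {
        if (suffix != null) {
            String specific = config.get(key + "." + suffix);
            if (specific != null) {
                return specific;
            }
        }
        return config.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("split.max", "5");          // hypothetical general value
        config.put("split.max.sentence", "3"); // hypothetical suffixed value
        System.out.println(lookup(config, "split.max", "sentence"));
        System.out.println(lookup(config, "split.max", null));
    }
}
```

A null suffix simply skips the instance-specific lookup, which matches the "may be null" note on the parameter.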

SimpleRepresentation

public SimpleRepresentation(int recogNum,
                            int splitMax,
                            int n,
                            String outCharset,
                            TextTokenizer textTokenizer)
Creates a new instance.

Parameters:
recogNum - the number of preceding recognitions to represent
splitMax - the maximum number of subsequences to keep when a feature value must be split (at whitespace)
n - every n-th context representation is stored if n > 0; otherwise no representation is stored
outCharset - the output character set to use (only used to store some configurations for inspection purposes, if n > 0); if null, the default charset of the current platform is used
textTokenizer - the tokenizer to use
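
The rule for the n parameter (store every n-th representation if n > 0, otherwise none) can be illustrated with a small counter. This is only a sketch: the class StoreNthSketch and its method are hypothetical, and it assumes that the n-th, 2n-th, ... calls are the ones stored, which the documentation does not specify.

```java
final class StoreNthSketch {
    private final int n;
    private long count = 0;

    StoreNthSketch(int n) {
        this.n = n;
    }

    /**
     * Counts one more built representation and reports whether it should
     * be stored. ASSUMPTION: counting starts at 1, so calls n, 2n, 3n, ...
     * return true. Synchronized so the counter stays consistent when the
     * instance is shared between threads.
     */
    synchronized boolean shouldStore() {
        count++;
        return n > 0 && count % n == 0;
    }

    public static void main(String[] args) {
        StoreNthSketch everyThird = new StoreNthSketch(3);
        for (int i = 1; i <= 6; i++) {
            System.out.println(i + " -> " + everyThird.shouldStore());
        }
    }
}
```

With n = 0 or a negative n, shouldStore() never returns true, mirroring the "otherwise no representation is stored" case.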

Method Detail

addFeature

protected void addFeature(FeatureVector features,
                          String prefix,
                          String value)
Creates a feature and adds it to a feature vector. The feature is created by joining prefix and value, with a colon as the separator character.

Parameters:
features - the feature vector to append to
prefix - the prefix of the new feature
value - the main value of the new feature
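
As an illustration of the joining rule, the sketch below builds a feature string from a prefix and a value. It is not the actual method body: a plain list of strings stands in for FeatureVector (an assumption), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class FeatureJoinSketch {
    /** Separator placed between prefix and value, as described above. */
    static final char SEPARATOR = ':';

    /** Joins prefix and value into a single feature string. */
    static String joinFeature(String prefix, String value) {
        return prefix + SEPARATOR + value;
    }

    public static void main(String[] args) {
        // ASSUMPTION: a List<String> stands in for the real FeatureVector.
        List<String> features = new ArrayList<>();
        features.add(joinFeature("left", "Berlin"));
        System.out.println(features); // prints [left:Berlin]
    }
}
```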

addText

protected void addText(FeatureVector features,
                       String prefix,
                       String text)
Adds feature(s) representing text to a feature vector, using the instance tokenizer for splitting the text into tokens.

Parameters:
features - the feature vector to append to
prefix - the prefix of the new feature(s)
text - the text to tokenize and add
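
A rough picture of what this method does, under explicit assumptions: whitespace splitting stands in for the instance's TextTokenizer, a list of strings stands in for FeatureVector, and each token is assumed to become one prefix:token feature. The real tokenizer and feature layout may differ; all names in the sketch are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class AddTextSketch {
    /**
     * Splits text into tokens and appends one feature per token.
     * ASSUMPTION: whitespace splitting replaces the real TextTokenizer,
     * and every token yields exactly one "prefix:token" feature.
     */
    static void addText(List<String> features, String prefix, String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return; // nothing to tokenize
        }
        for (String token : trimmed.split("\\s+")) {
            features.add(prefix + ":" + token);
        }
    }

    public static void main(String[] args) {
        List<String> features = new ArrayList<>();
        addText(features, "main", "hello  world");
        System.out.println(features); // prints [main:hello, main:world]
    }
}
```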

doBuildContext

protected FeatureVector doBuildContext(Element element,
                                       String leftText,
                                       String mainText,
                                       String rightText,
                                       PriorRecognitions priorRecognitions,
                                       Map featureCache,
                                       String logPurpose)
                                throws ClassCastException
Builds the context representation of text in an element. Returns a feature vector of all context features considered relevant for representation.

Specified by:
doBuildContext in class AbstractRepresentation
Parameters:
element - the element whose context should be represented
leftText - textual content to the left of (preceding) mainText, might be empty
mainText - the main textual content to represent, might be empty
rightText - textual content to the right of (following) mainText, might be empty
priorRecognitions - a buffer of the last Recognitions from the document, created by calling Representation.createRecognitionBuffer(); might be null
featureCache - a cache of (local) features; should be re-used between all calls for the nodes in a single document, but must not be re-used when building the context of nodes in different documents
logPurpose - the type of contexts of main interest to the caller (e.g. "Token" or "Sentence"), used for logging
Returns:
a vector of features considered relevant for representation
Throws:
ClassCastException - if the priorRecognitions buffer contains objects that aren't Recognitions
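
The featureCache contract (share one cache across all nodes of a document, never across documents) suggests a calling pattern like the one sketched below. This does not call the real API: strings stand in for nodes and features, and FeatureCacheSketch, localFeature and processDocument are hypothetical names used only to show the per-document cache lifetime.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FeatureCacheSketch {
    /** Counts how often a feature had to be computed rather than reused. */
    static int computations = 0;

    /** Returns the cached feature for a node, computing it on first use. */
    static String localFeature(String node, Map<String, String> cache) {
        return cache.computeIfAbsent(node, key -> {
            computations++;
            return "node:" + key; // placeholder for real feature extraction
        });
    }

    /** Processes one document with a cache that lives only for that document. */
    static void processDocument(List<String> nodes) {
        Map<String, String> featureCache = new HashMap<>(); // fresh per document
        for (String node : nodes) {
            localFeature(node, featureCache);
        }
    }

    public static void main(String[] args) {
        processDocument(Arrays.asList("a", "b", "a")); // "a" is computed once
        processDocument(Arrays.asList("a"));           // new document, new cache
        System.out.println(computations);              // prints 3
    }
}
```

Creating the map inside processDocument makes it impossible to leak cached features from one document into the next.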

toString

public String toString()
Returns a string representation of this object.

Overrides:
toString in class AbstractRepresentation
Returns:
a textual representation


Copyright © 2003-2004 Christian Siefkes. All Rights Reserved.