de.fu_berlin.ties.text
Class TokenizingExtractor

java.lang.Object
  extended by de.fu_berlin.ties.text.TokenizingExtractor
All Implemented Interfaces:
FeatureExtractor
Direct Known Subclasses:
FieldTokenizingExtractor

public class TokenizingExtractor
extends Object
implements FeatureExtractor

Uses a tokenizer to convert a text into a feature vector. Each token is stored as a feature, preserving the original order of the tokens in the text.
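
The idea of one feature per token, kept in input order, can be sketched with the standard library alone. Everything below is a hypothetical stand-in and not part of the TIES API: the class OrderedTokenSketch, the method tokenize, and the token pattern are assumptions made for illustration, and the actual TextTokenizer may split text differently.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for illustration; not part of the TIES API. */
public class OrderedTokenSketch {

    // Assumed token pattern: a run of letters/digits, or any single
    // non-whitespace character that is neither a letter nor a digit.
    private static final Pattern TOKEN =
        Pattern.compile("[\\p{L}\\p{N}]+|[^\\s\\p{L}\\p{N}]");

    /** Reads the whole input and returns one entry per token, in input order. */
    public static List<String> tokenize(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[1024];
        int read;
        while ((read = reader.read(buffer)) != -1) {
            text.append(buffer, 0, read);
        }

        // A list (not a set) keeps both the order and repeated tokens.
        List<String> features = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(text);
        while (matcher.find()) {
            features.add(matcher.group());
        }
        return features;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokenize(new StringReader("Hello, world! Hello again.")));
    }
}
```

Because the result is an ordered list, a token that occurs twice in the input (here "Hello") also appears twice among the features, at its original positions.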

Instances of this class are not thread-safe and must be synchronized externally, if required.
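
One way to provide external synchronization is to funnel every call on a shared instance through a single lock. The sketch below is an illustration under assumptions: UnsafeExtractor is a hypothetical non-thread-safe class standing in for an extractor, and ExternalSyncSketch and countMismatches are names invented here; none of them belong to the TIES API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical wrapper that serializes access to a non-thread-safe extractor. */
public class ExternalSyncSketch {

    /** Stand-in extractor: reuses a mutable buffer, so concurrent calls would interfere. */
    static class UnsafeExtractor {
        private final StringBuilder scratch = new StringBuilder();

        List<String> extract(String text) {
            List<String> tokens = new ArrayList<>();
            scratch.setLength(0);
            for (char c : text.toCharArray()) {
                if (Character.isWhitespace(c)) {
                    flush(tokens);
                } else {
                    scratch.append(c);
                }
            }
            flush(tokens);
            return tokens;
        }

        private void flush(List<String> tokens) {
            if (scratch.length() > 0) {
                tokens.add(scratch.toString());
                scratch.setLength(0);
            }
        }
    }

    private final UnsafeExtractor extractor = new UnsafeExtractor();
    private final Object lock = new Object();

    /** All callers go through the same lock, so only one extraction runs at a time. */
    public List<String> extract(String text) {
        synchronized (lock) {
            return extractor.extract(text);
        }
    }

    /** Runs several threads against one shared wrapper and counts wrong results. */
    public static int countMismatches(int threads, int callsPerThread)
            throws InterruptedException {
        ExternalSyncSketch shared = new ExternalSyncSketch();
        List<String> expected = Arrays.asList("one", "two", "three");
        AtomicInteger mismatches = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread worker = new Thread(() -> {
                for (int j = 0; j < callsPerThread; j++) {
                    if (!shared.extract("one two three").equals(expected)) {
                        mismatches.incrementAndGet();
                    }
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return mismatches.get();
    }
}
```

An alternative that avoids locking altogether is to give each thread its own instance, which may be preferable when instances are cheap to create.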

Version:
$Revision: 1.6 $, $Date: 2006/10/21 16:04:25 $, $Author: siefkes $
Author:
Christian Siefkes

Constructor Summary
TokenizingExtractor(TiesConfiguration conf, String suffix)
          Creates a new instance.
 
Method Summary
 FeatureVector buildFeatures(Reader reader)
          Extracts a vector of relevant features from a text sequence.
 TextTokenizer getTokenizer()
          Returns the tokenizer used by this instance.
 String toString()
          Returns a string representation of this object.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

TokenizingExtractor

public TokenizingExtractor(TiesConfiguration conf,
                           String suffix)
Creates a new instance.

Parameters:
conf - used to configure this instance
suffix - optional suffix; if not null, it is used to adapt configuration keys

Method Detail

buildFeatures

public FeatureVector buildFeatures(Reader reader)
                            throws IOException
Extracts a vector of relevant features from a text sequence.

Specified by:
buildFeatures in interface FeatureExtractor
Parameters:
reader - a reader containing the text to represent
Returns:
a feature vector representing the input text sequence
Throws:
IOException - if an I/O error occurs while reading the input
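
A caller typically has to close the reader and handle the IOException. The following sketch shows that pattern with standard-library types only; readAll is a hypothetical stand-in for a method that consumes a Reader, and FailingReader and describe are names invented here to simulate an I/O error. None of them are part of the TIES API.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical illustration of caller-side handling of a Reader and IOException. */
public class ReaderFailureSketch {

    /** Stand-in for a method that reads its whole input; any read error propagates. */
    public static String readAll(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            text.append((char) c);
        }
        return text.toString();
    }

    /** Test double: delivers a fixed number of characters, then fails. */
    public static class FailingReader extends Reader {
        private int remaining;

        public FailingReader(int charsBeforeFailure) {
            this.remaining = charsBeforeFailure;
        }

        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            if (remaining <= 0) {
                throw new IOException("simulated failure");
            }
            if (len == 0) {
                return 0;
            }
            cbuf[off] = 'x';
            remaining--;
            return 1;
        }

        @Override
        public void close() {
            // Nothing to release in this test double.
        }
    }

    /** Closes the reader via try-with-resources and reports success or failure. */
    public static String describe(Reader reader) {
        try (Reader r = reader) {
            return "read " + readAll(r).length() + " chars";
        } catch (IOException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new StringReader("abc")));
        System.out.println(describe(new FailingReader(3)));
    }
}
```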

getTokenizer

public TextTokenizer getTokenizer()
Returns the tokenizer used by this instance.

Returns:
the tokenizer used by this instance

toString

public String toString()
Returns a string representation of this object.

Overrides:
toString in class Object
Returns:
a textual representation


Copyright © 2003-2007 Christian Siefkes. All Rights Reserved.