de.fu_berlin.ties.text
Class TokenContainer

java.lang.Object
  extended by de.fu_berlin.ties.text.TokenContainer

public class TokenContainer
extends Object

A container that keeps track of the tokens in a document. Instances of this class are not thread-safe; if you want to share a single instance between different threads, you have to ensure proper synchronization.
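One way to provide the synchronization mentioned above is to guard every access with a single lock. The sketch below is only an illustration: GuardedCounterSketch is a hypothetical stand-in backed by a plain HashMap and whitespace splitting, not TokenContainer itself, and it assumes Java 8 or later.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a non-thread-safe token counter shared between threads. */
final class GuardedCounterSketch {
    private final Object lock = new Object();
    // Plain HashMap: not safe for concurrent use without the lock below.
    private final Map<String, Integer> counts = new HashMap<>();

    void add(String text) {
        synchronized (lock) {
            // Assumption: tokens are whitespace-separated chunks.
            for (String token : text.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
    }

    int getCount(String token) {
        synchronized (lock) {
            return counts.getOrDefault(token, 0);
        }
    }

    /** Two threads add the same text 1000 times each; returns the final count of "x". */
    static int demo() {
        GuardedCounterSketch shared = new GuardedCounterSketch();
        Runnable work = () -> {
            for (int i = 0; i < 1000; i++) {
                shared.add("x y");
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        return shared.getCount("x");
    }
}
```

With a real TokenContainer the same idea would apply to every call on the shared instance, reads as well as writes, because per-call state such as the "last added" values also changes on each add(String).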

Version:
$Revision: 1.9 $, $Date: 2006/10/21 16:04:25 $, $Author: siefkes $
Author:
Christian Siefkes

Constructor Summary
TokenContainer(TokenizerFactory tFactory)
          Creates a new instance.
 
Method Summary
 void add(String text)
          Adds text to this container.
 int getCount(String token)
          Returns the cardinality of the given token in this container.
 int getFirstTokenInLastIndex()
          Returns the index of the first token of the last added string in the original text (indexing starts with 0).
 int getFirstTokenInLastRep()
          Returns the repetition of the first token of the last added string in the original text (counting starts with 0, as the first occurrence is the "0th repetition").
 String getLast()
          Returns a trimmed and whitespace-normalized representation of the string added to this container by the last add(String) operation.
 int getLastCount(String token)
          Returns the cardinality of the given token in the text added by the last add(String) operation.
 boolean isWhitespaceAfterLast()
          Whether there is whitespace after the last added string.
 boolean isWhitespaceBeforeLast()
          Whether there is whitespace before the last added string.
 boolean lastContains(String token)
          Whether the text added by the last add(String) operation contains the specified token.
 Iterator lastIterator()
          Returns an iterator over the word and number tokens added by the last add(String) operation.
 int size()
          Returns the total number of tokens counted by this instance (including duplicates).
 String toString()
          Returns a string representation of this object.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

TokenContainer

public TokenContainer(TokenizerFactory tFactory)
Creates a new instance.

Parameters:
tFactory - used to instantiate the employed tokenizer

Method Detail

add

public void add(String text)
Adds text to this container. The specified string is split into a series of tokens, and the count of each contained token is increased accordingly.

Parameters:
text - the text to add

getCount

public int getCount(String token)
Returns the cardinality of the given token in this container.

Parameters:
token - the token to check
Returns:
the number of copies of the specified token in this container, >= 0
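The cumulative counting described for add(String) and getCount(String) can be sketched as follows. This is a minimal illustration, not the TIES implementation: CountingSketch is a hypothetical class, and it splits on whitespace, whereas the real container delegates to the tokenizer created by its TokenizerFactory. It assumes Java 8 or later.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in illustrating cumulative token counts. */
final class CountingSketch {
    private final Map<String, Integer> counts = new HashMap<>();
    private int total = 0;

    /** Splits the text on whitespace (an assumption) and counts every token. */
    void add(String text) {
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty or all-whitespace input yields no tokens
            }
            counts.merge(token, 1, Integer::sum);
            total++;
        }
    }

    /** Number of copies of the token seen so far; 0 if it was never added. */
    int getCount(String token) {
        return counts.getOrDefault(token, 0);
    }

    /** Total number of tokens counted, including duplicates. */
    int size() {
        return total;
    }
}
```

In this sketch, adding "to be or not to be" and then "to go" gives a count of 3 for "to" and a size of 8, since counts accumulate across calls.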

getFirstTokenInLastIndex

public int getFirstTokenInLastIndex()
Returns the index of the first token of the last added string in the original text (indexing starts with 0).

Returns:
the index of the first token of the last added string, >= 0

getFirstTokenInLastRep

public int getFirstTokenInLastRep()
Returns the repetition of the first token of the last added string in the original text (counting starts with 0, as the first occurrence is the "0th repetition").

Returns:
the repetition of the first token of the last added string, >= 0
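A worked example may help separate the two "first token" values. The sketch below encodes one reading of the descriptions above, which is an assumption: the index is the number of tokens added before the last string, and the repetition is the number of earlier occurrences of that string's first token. FirstTokenSketch is a hypothetical class that splits on whitespace; the initial value -1 and the handling of empty input are sketch-only choices. It assumes Java 8 or later.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in tracking position and repetition of the first token of the last add. */
final class FirstTokenSketch {
    private final Map<String, Integer> counts = new HashMap<>();
    private int total = 0;
    private int firstIndex = -1; // sketch-only placeholder before any add
    private int firstRep = -1;   // sketch-only placeholder before any add

    void add(String text) {
        String[] tokens = text.trim().split("\\s+");
        if (tokens[0].isEmpty()) {
            return; // no tokens: this sketch leaves the previous values unchanged
        }
        firstIndex = total;                           // tokens seen before this string
        firstRep = counts.getOrDefault(tokens[0], 0); // earlier occurrences of the first token
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
            total++;
        }
    }

    int getFirstTokenInLastIndex() {
        return firstIndex;
    }

    int getFirstTokenInLastRep() {
        return firstRep;
    }
}
```

Under this reading, adding "the cat sat" and then "the dog" yields index 3 and repetition 1 for the second string: three tokens precede it, and "the" has occurred once before.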

getLastCount

public int getLastCount(String token)
Returns the cardinality of the given token in the text added by the last add(String) operation.

Parameters:
token - the token to check
Returns:
the number of copies of the specified token in the text added last, >= 0

getLast

public String getLast()
Returns a trimmed and whitespace-normalized representation of the string added to this container by the last add(String) operation. Leading and trailing whitespace is removed; each internal whitespace is converted into a single space character.

Returns:
the normalized representation of the last string added to this container
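The normalization described above can be approximated with standard String methods. This is a sketch under assumptions: NormalizeSketch is a hypothetical helper, and it collapses each run of whitespace into one space, which may differ in detail from what the container does.

```java
/** Hypothetical helper approximating the trimmed, whitespace-normalized form. */
final class NormalizeSketch {
    static String normalize(String text) {
        // trim() removes leading and trailing characters up to U+0020;
        // \s+ then matches each remaining run of ASCII whitespace.
        return text.trim().replaceAll("\\s+", " ");
    }
}
```

For instance, an input with leading spaces, a tab, and a line break between two words comes back as the two words separated by a single space.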

isWhitespaceAfterLast

public boolean isWhitespaceAfterLast()
Whether there is whitespace after the last added string.

Returns:
true iff there is whitespace after/at the end of the string

isWhitespaceBeforeLast

public boolean isWhitespaceBeforeLast()
Whether there is whitespace before the last added string.

Returns:
true iff there is whitespace before/at the start of the string
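Because getLast() returns a trimmed string, the two whitespace flags are the only way to learn whether the original string had surrounding whitespace. The sketch below shows one plausible way to compute such flags from the raw input; WhitespaceFlagsSketch and its method names are hypothetical, and reading "before/at the start" as "the string begins with whitespace" is an assumption.

```java
/** Hypothetical helper computing leading/trailing whitespace flags for a raw string. */
final class WhitespaceFlagsSketch {
    static boolean startsWithWhitespace(String text) {
        return !text.isEmpty() && Character.isWhitespace(text.charAt(0));
    }

    static boolean endsWithWhitespace(String text) {
        return !text.isEmpty() && Character.isWhitespace(text.charAt(text.length() - 1));
    }
}
```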

lastContains

public boolean lastContains(String token)
Whether the text added by the last add(String) operation contains the specified token.

Parameters:
token - the token to check
Returns:
true iff the specified argument is contained as a word or number token in the last added string.

lastIterator

public Iterator lastIterator()
Returns an iterator over the word and number tokens added by the last add(String) operation. The iterator contains each token only once (no matter how often it occurred in the last string); the tokens are iterated in no particular order.

Returns:
an iterator over the last added tokens
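The per-call accessors (getLastCount, lastContains and lastIterator) all refer only to the most recent add(String) call. A minimal sketch of that behavior follows, assuming Java 8 or later. LastAddSketch is a hypothetical class; it treats every whitespace-separated chunk as a token, whereas the real container restricts these methods to word and number tokens produced by its tokenizer.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical stand-in illustrating state that is reset on every add. */
final class LastAddSketch {
    private Map<String, Integer> last = new HashMap<>();

    void add(String text) {
        last = new HashMap<>(); // per-call state is replaced on every add
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                last.merge(token, 1, Integer::sum);
            }
        }
    }

    int getLastCount(String token) {
        return last.getOrDefault(token, 0);
    }

    boolean lastContains(String token) {
        return last.containsKey(token);
    }

    /** Each distinct token appears once; HashMap gives no particular order. */
    Iterator<String> lastIterator() {
        return last.keySet().iterator();
    }
}
```

Backing the iterator with the key set of a hash map is one simple way to get the "each token once, no particular order" behavior the description mentions.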

size

public int size()
Returns the total number of tokens counted by this instance (including duplicates).

Returns:
the number of tokens counted

toString

public String toString()
Returns a string representation of this object.

Overrides:
toString in class Object
Returns:
a textual representation


Copyright © 2003-2007 Christian Siefkes. All Rights Reserved.