de.fu_berlin.ties.xml.dom
Class TokenWalker

java.lang.Object
  extended by de.fu_berlin.ties.xml.dom.TokenWalker

public class TokenWalker
extends Object

Walks through an XML document, handing all textual tokens over to a TokenProcessor.

Instances of this class are thread-safe iff the provided TokenProcessor is.
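
The thread-safety condition above can be illustrated with a minimal sketch. It does not use the real classes: `SharedWalkerSketch`, `Walker`, and `TokenSink` are hypothetical stand-ins, and whitespace splitting is an assumed placeholder for real tokenization. The point is only that a walker holding nothing but a final reference to its processor can be shared between threads when the processor itself is thread-safe (here, one backed by an `AtomicInteger`).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: none of these types belong to the documented package. */
public class SharedWalkerSketch {

    /** Hypothetical stand-in for a token processor. */
    public interface TokenSink {
        void accept(String token);
    }

    /** Holds only a final reference; all per-call state lives on the stack. */
    public static final class Walker {
        private final TokenSink sink;

        public Walker(TokenSink sink) {
            this.sink = sink;
        }

        public void walk(String text) {
            // Assumption: whitespace splitting stands in for real tokenization.
            for (String token : text.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    sink.accept(token);
                }
            }
        }
    }

    /** Shares one walker across several threads and returns the total token count. */
    public static int countConcurrently(int threads, int callsPerThread, String text) {
        AtomicInteger count = new AtomicInteger();
        // The sink is thread-safe, so the shared walker is as well.
        Walker shared = new Walker(token -> count.incrementAndGet());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    shared.walk(text);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return count.get();
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 1000, "one two three"));
    }
}
```

Replacing the `AtomicInteger` with a plain `int` field in the sink would make the count unreliable under concurrency even though the walker is unchanged, which is the sense in which safety depends on the processor.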

Version:
$Revision: 1.3 $, $Date: 2004/04/08 16:07:59 $, $Author: siefkes $
Author:
Christian Siefkes

Constructor Summary
TokenWalker(TokenProcessor processor, TokenizerFactory tFactory)
          Creates a new instance.
 
Method Summary
protected  void processCollectedText(Element element, CharSequence collectedText, TokenCounter tokenCounter, TextTokenizer tokenizer, ContextMap context)
          Helper method that tokenizes the collected textual contents of an element and hands each resulting token to the token processor.
 String toString()
          Returns a string representation of this object.
 void walk(Document document, ContextMap context)
          Walks through the contents of an XML document, tokenizing the textual contents.
protected  void walk(Element element, TokenCounter tokenCounter, TextTokenizer tokenizer, ContextMap context)
          Walks through the contents of an element, tokenizing textual contents and recursing through nested elements.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

TokenWalker

public TokenWalker(TokenProcessor processor,
                   TokenizerFactory tFactory)
Creates a new instance.

Parameters:
processor - used to process the tokens
tFactory - used to instantiate tokenizers
Method Detail

processCollectedText

protected void processCollectedText(Element element,
                                    CharSequence collectedText,
                                    TokenCounter tokenCounter,
                                    TextTokenizer tokenizer,
                                    ContextMap context)
                             throws IOException,
                                    ProcessingException
Helper method that tokenizes the collected textual contents of an element and hands each resulting token to the token processor.

Parameters:
element - the element to walk through
collectedText - the collected textual contents (limited to the text between/before/after child elements in case of mixed content)
tokenCounter - keeps track of the encountered tokens
tokenizer - used to tokenize text
context - a map of objects that are made available for processing
Throws:
IOException - might be thrown by the token processor
ProcessingException - might be thrown by the token processor
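
As a rough illustration of the tokenize-and-delegate step described above, the following sketch tokenizes a `CharSequence` and calls a callback once per token while a counter tracks the running total. All names here (`CollectedTextSketch`, `TokenSink`, `Counter`) and the token pattern are assumptions made for the example; they are not the signatures of TokenProcessor, TokenCounter, or TextTokenizer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: none of these types belong to the documented package. */
public class CollectedTextSketch {

    /** Hypothetical stand-in for a token processor callback. */
    public interface TokenSink {
        void accept(String elementName, String token, int index);
    }

    /** Hypothetical stand-in for a token counter: a running total. */
    public static final class Counter {
        private int total;

        public int next() {
            return total++;
        }

        public int total() {
            return total;
        }
    }

    // Assumption: a token is a run of letters/digits, or any single other
    // non-whitespace character (such as punctuation).
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+|\\S");

    /** Tokenizes the collected text and hands each token to the sink. */
    public static void processCollectedText(String elementName,
                                            CharSequence collectedText,
                                            Counter counter,
                                            TokenSink sink) {
        Matcher matcher = TOKEN.matcher(collectedText);
        while (matcher.find()) {
            sink.accept(elementName, matcher.group(), counter.next());
        }
    }

    public static void main(String[] args) {
        Counter counter = new Counter();
        List<String> seen = new ArrayList<>();
        processCollectedText("p", "Hello, world!", counter,
                (name, token, index) -> seen.add(index + ":" + token));
        System.out.println(seen); // prints [0:Hello, 1:,, 2:world, 3:!]
    }
}
```

Passing the counter in, rather than keeping it as a field, keeps the per-document state out of the object that does the tokenizing, so the same instance can serve several documents.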

walk

public final void walk(Document document,
                       ContextMap context)
                throws IOException,
                       ProcessingException
Walks through the contents of an XML document, tokenizing the textual contents. The resulting tokens are handed over to the stored TokenProcessor.

Parameters:
document - the document to walk through
context - a map of objects that are made available for processing; may be null if not required by the token processor
Throws:
IOException - might be thrown by the token processor
ProcessingException - might be thrown by the token processor

walk

protected void walk(Element element,
                    TokenCounter tokenCounter,
                    TextTokenizer tokenizer,
                    ContextMap context)
             throws IOException,
                    ProcessingException
Walks through the contents of an element, tokenizing textual contents and recursing through nested elements. The registered token processor is called for each token.

Parameters:
element - the element to walk through
tokenCounter - keeps track of the encountered tokens
tokenizer - used to tokenize text
context - a map of objects that are made available for processing
Throws:
IOException - might be thrown by the token processor
ProcessingException - might be thrown by the token processor
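
The recursion described above, together with the mixed-content behaviour noted for collectedText (text is limited to what lies between, before, or after child elements), might look roughly like the sketch below. It is not the actual implementation: it uses the JDK's `org.w3c.dom` API as a stand-in for whatever DOM library the class is built on, `RecursiveWalkSketch` and `TokenSink` are hypothetical names, and whitespace splitting is an assumed placeholder for the tokenizer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Illustrative only: none of these types belong to the documented package. */
public class RecursiveWalkSketch {

    /** Hypothetical stand-in for a token processor callback. */
    public interface TokenSink {
        void accept(String elementName, String token);
    }

    /** Walks depth-first, flushing collected text at each child-element boundary. */
    public static void walk(Element element, TokenSink sink) {
        StringBuilder collected = new StringBuilder();
        for (Node child = element.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                collected.append(child.getNodeValue());
            } else if (type == Node.ELEMENT_NODE) {
                flush(element, collected, sink); // text before this child
                walk((Element) child, sink);     // recurse into the child
            }
        }
        flush(element, collected, sink);         // text after the last child
    }

    private static void flush(Element owner, StringBuilder collected, TokenSink sink) {
        // Assumption: whitespace splitting stands in for real tokenization.
        for (String token : collected.toString().trim().split("\\s+")) {
            if (!token.isEmpty()) {
                sink.accept(owner.getTagName(), token);
            }
        }
        collected.setLength(0);
    }

    /** Parses an XML string and returns "element:token" entries in document order. */
    public static List<String> tokensOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            walk(doc.getDocumentElement(), (name, token) -> out.add(name + ":" + token));
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // prints [p:Hello, b:big, p:world]
        System.out.println(tokensOf("<p>Hello <b>big</b> world</p>"));
    }
}
```

Flushing before each recursion is what keeps tokens in document order: the text preceding a nested element is reported before any token inside that element.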

toString

public String toString()
Returns a string representation of this object.

Returns:
a textual representation


Copyright © 2003-2004 Christian Siefkes. All Rights Reserved.