de.fu_berlin.ties.xml.dom
Class TokenWalker

java.lang.Object
  extended by de.fu_berlin.ties.xml.dom.TokenWalker
Direct Known Subclasses:
FilteringTokenWalker

public class TokenWalker
extends Object

Walks through a document, handing all textual tokens over to a TokenProcessor.

Instances of this class are thread-safe if and only if the provided TokenProcessor is thread-safe; subclass implementations might not be.

Version:
$Revision: 1.14 $, $Date: 2006/10/21 16:04:33 $, $Author: siefkes $
Author:
Christian Siefkes

Constructor Summary
TokenWalker(TokenProcessor processor, TokenizerFactory tFactory)
          Creates a new instance.
 
Method Summary
protected  void endElementHook(Element element, ContextMap context)
          Empty method that can be overridden by subclasses to handle the end of elements in a special way.
protected  void processCollectedText(Element element, CharSequence collectedText, TokenCounter tokenCounter, TextTokenizer tokenizer, ContextMap context)
          Helper method that tokenizes the collected textual contents of an element and delegates each resulting token to the token processor.
protected  void processToken(Element element, String left, TokenDetails details, String right, ContextMap context)
          Processes a token in an XML element by delegating to the configured TokenProcessor.
protected  void startElementHook(Element element, ContextMap context)
          Empty method that can be overridden by subclasses to handle the start of elements in a special way.
 String toString()
          Returns a string representation of this object.
protected  void trailingWhitespaceHook(ContextMap context)
          Empty method that can be overridden by subclasses to handle whitespace at the end of element content in a special way.
 void walk(Document document, ContextMap context)
          Walks through the contents of an XML document, tokenizing the textual contents.
protected  void walk(Element element, TokenCounter tokenCounter, TextTokenizer tokenizer, ContextMap context)
          Walks through the contents of an element, tokenizing textual contents and recursing through nested elements.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

TokenWalker

public TokenWalker(TokenProcessor processor,
                   TokenizerFactory tFactory)
Creates a new instance.

Parameters:
processor - used to process the tokens
tFactory - used to instantiate tokenizers
Method Detail

endElementHook

protected void endElementHook(Element element,
                              ContextMap context)
                       throws IOException,
                              ProcessingException
Empty method that can be overridden by subclasses to handle the end of elements in a special way.

Parameters:
element - the element
context - a map of objects that are made available for processing
Throws:
IOException - might be thrown if an I/O error occurs
ProcessingException - might be thrown if an error occurs during processing

processCollectedText

protected void processCollectedText(Element element,
                                    CharSequence collectedText,
                                    TokenCounter tokenCounter,
                                    TextTokenizer tokenizer,
                                    ContextMap context)
                             throws IOException,
                                    ProcessingException
Helper method that tokenizes the collected textual contents of an element and delegates each resulting token to the token processor.

Parameters:
element - the element whose textual contents were collected
collectedText - the collected textual contents (limited to the text between/before/after child elements in case of mixed content)
tokenCounter - keeps track of the encountered tokens
tokenizer - used to tokenize text
context - a map of objects that are made available for processing
Throws:
IOException - might be thrown by the token processor
ProcessingException - might be thrown by the token processor

processToken

protected void processToken(Element element,
                            String left,
                            TokenDetails details,
                            String right,
                            ContextMap context)
                     throws IOException,
                            ProcessingException
Processes a token in an XML element by delegating to the configured TokenProcessor.

Parameters:
element - the element containing the token
left - the textual contents of the element to the left of the token (in case of mixed contents, only up to the last preceding child element, if any)
details - details about the token to process
right - the textual contents of the element to the right of the token (in case of mixed contents, only up to the next following child element, if any)
context - a map of objects that are made available for processing
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing
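
The relationship between a token and its left and right context can be illustrated with a minimal sketch. It does not use the TIES classes: TokenContextSketch, TokenContext and contexts are hypothetical names, and splitting on whitespace is an assumption standing in for whatever the configured tokenizer does. The sketch covers a single run of collected text; how text is split around child elements in mixed content is not modelled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; none of these names belong to the TIES API.
class TokenContextSketch {

    /** One token together with the text to its left and right. */
    static final class TokenContext {
        final String left;
        final String token;
        final String right;

        TokenContext(String left, String token, String right) {
            this.left = left;
            this.token = token;
            this.right = right;
        }
    }

    /**
     * Splits the text into whitespace-delimited tokens (an assumption) and
     * pairs each token with everything before and after it in the same text.
     */
    static List<TokenContext> contexts(String text) {
        List<TokenContext> result = new ArrayList<>();
        Matcher matcher = Pattern.compile("\\S+").matcher(text);
        while (matcher.find()) {
            result.add(new TokenContext(
                    text.substring(0, matcher.start()),
                    matcher.group(),
                    text.substring(matcher.end())));
        }
        return result;
    }

    public static void main(String[] args) {
        for (TokenContext c : contexts("Walks the document")) {
            System.out.println("[" + c.left + "] " + c.token + " [" + c.right + "]");
        }
    }
}
```

For the middle token of "Walks the document", the left context in this sketch is "Walks " and the right context is " document"; whitespace is kept as part of the context rather than trimmed.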

startElementHook

protected void startElementHook(Element element,
                                ContextMap context)
                         throws IOException,
                                ProcessingException
Empty method that can be overridden by subclasses to handle the start of elements in a special way.

Parameters:
element - the element
context - a map of objects that are made available for processing
Throws:
IOException - might be thrown if an I/O error occurs
ProcessingException - might be thrown if an error occurs during processing

trailingWhitespaceHook

protected void trailingWhitespaceHook(ContextMap context)
                               throws IOException,
                                      ProcessingException
Empty method that can be overridden by subclasses to handle whitespace at the end of element content in a special way.

Parameters:
context - a map of objects that are made available for processing
Throws:
IOException - might be thrown if an I/O error occurs
ProcessingException - might be thrown if an error occurs during processing
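
The hook methods follow the usual pattern of empty protected methods that a subclass overrides. The sketch below shows that pattern in isolation, under stated assumptions: it is built on the JDK's org.w3c.dom types rather than the TIES classes, BaseWalker and LoggingWalker are hypothetical stand-ins, and the hooks take only the element (the ContextMap parameter and the declared exceptions are left out to keep the example self-contained).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical illustration of the hook pattern; not the TIES TokenWalker.
class HookSketch {

    /** Stand-in walker whose hooks do nothing by default. */
    static class BaseWalker {
        /** Empty hook, called before the children of an element are visited. */
        protected void startElementHook(Element element) { }

        /** Empty hook, called after the children of an element were visited. */
        protected void endElementHook(Element element) { }

        /** Visits the element and all nested elements in document order. */
        public final void walk(Element element) {
            startElementHook(element);
            for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    walk((Element) n);
                }
            }
            endElementHook(element);
        }
    }

    /** Subclass that overrides both hooks to record an event log. */
    static class LoggingWalker extends BaseWalker {
        final List<String> events = new ArrayList<>();

        @Override
        protected void startElementHook(Element element) {
            events.add("start " + element.getTagName());
        }

        @Override
        protected void endElementHook(Element element) {
            events.add("end " + element.getTagName());
        }
    }

    /** Parses the XML string and returns the events seen while walking it. */
    static List<String> run(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            LoggingWalker walker = new LoggingWalker();
            walker.walk(root);
            return walker.events;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<doc><p>Hi <b>there</b></p></doc>"));
    }
}
```

Because the base hooks are empty, a subclass only overrides the ones it needs. Any mutable state it keeps, such as the event list here, is one way a subclass can lose the thread-safety of the base class.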

walk

public void walk(Document document,
                 ContextMap context)
          throws IOException,
                 ProcessingException
Walks through the contents of an XML document, tokenizing the textual contents. The resulting tokens are handed over to the stored TokenProcessor.

Parameters:
document - the document to walk through
context - a map of objects that are made available for processing; may be null if not required by the token processor
Throws:
IOException - might be thrown by the token processor
ProcessingException - might be thrown by the token processor

walk

protected void walk(Element element,
                    TokenCounter tokenCounter,
                    TextTokenizer tokenizer,
                    ContextMap context)
             throws IOException,
                    ProcessingException
Walks through the contents of an element, tokenizing textual contents and recursing through nested elements. The registered token processor is called for each token.

Parameters:
element - the element to walk through
tokenCounter - keeps track of the encountered tokens
tokenizer - used to tokenize text
context - a map of objects that are made available for processing
Throws:
IOException - might be thrown by the token processor
ProcessingException - might be thrown by the token processor
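
One plausible shape of such a recursive walk over mixed content is sketched below. It is an assumption-laden illustration, not the actual implementation: it uses the JDK's org.w3c.dom types, the TokenSink interface and the walk, flush and tokens helpers are hypothetical, and whitespace splitting stands in for the tokenizer. Text is collected until a child element is reached, the collected run is tokenized and handed over, and the child is then walked recursively.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical illustration of a mixed-content walk; not the TIES code.
class MixedContentWalkSketch {

    /** Stand-in for a token processor: receives each token with its element. */
    interface TokenSink {
        void accept(String elementName, String token);
    }

    /** Collects text between child elements and recurses into each child. */
    static void walk(Element element, TokenSink sink) {
        StringBuilder collected = new StringBuilder();
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            short type = n.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                collected.append(n.getNodeValue());
            } else if (type == Node.ELEMENT_NODE) {
                // Text before the child belongs to the current element.
                flush(element, collected, sink);
                walk((Element) n, sink);
            }
        }
        flush(element, collected, sink);
    }

    /** Tokenizes the collected run on whitespace (an assumption) and resets it. */
    static void flush(Element element, StringBuilder collected, TokenSink sink) {
        for (String token : collected.toString().trim().split("\\s+")) {
            if (!token.isEmpty()) {
                sink.accept(element.getTagName(), token);
            }
        }
        collected.setLength(0);
    }

    /** Parses the XML string and returns "element:token" for every token. */
    static List<String> tokens(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            List<String> out = new ArrayList<>();
            walk(root, (name, token) -> out.add(name + ":" + token));
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // prints [p:Hi, b:you, b:there, p:now]
        System.out.println(tokens("<p>Hi <b>you there</b> now</p>"));
    }
}
```

Flushing before each child element keeps tokens in document order and attributes each one to the element that directly contains it, which matches the description of collected text being limited to the text between, before or after child elements.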

toString

public String toString()
Returns a string representation of this object.

Overrides:
toString in class Object
Returns:
a textual representation


Copyright © 2003-2007 Christian Siefkes. All Rights Reserved.