de.fu_berlin.ties.xml.dom
Class DocumentWalker

java.lang.Object
  extended by de.fu_berlin.ties.xml.dom.DocumentWalker

public class DocumentWalker
extends Object

Walks through a document, handing the elements matched by a NodeFilter over to an ElementProcessor. The textual contents of the document are tokenized; the resulting tokens are stored in a multi-set (Bag).
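
As a rough illustration of what storing tokens in a multi-set means, the sketch below counts how often each token occurs. It is not the library's Bag or tokenizer: the class TokenBagSketch, its methods, and the whitespace-only tokenization are all assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a multi-set of tokens; not part of the documented API.
class TokenBagSketch {
    // Maps each distinct token to the number of times it has been added.
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    // Splits the text on whitespace (an assumption) and adds every token to the bag.
    void addAll(String text) {
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
    }

    // Number of occurrences of the given token; 0 if it was never added.
    int count(String token) {
        return counts.getOrDefault(token, 0);
    }

    // Total number of tokens stored, counting duplicates.
    int size() {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        return total;
    }
}
```

Unlike a set, the bag keeps duplicates: adding "to be or not to be" yields six stored tokens, with "to" and "be" each counted twice.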

Version:
$Revision: 1.6 $, $Date: 2004/11/04 15:27:06 $, $Author: siefkes $
Author:
Christian Siefkes

Constructor Summary
DocumentWalker(NodeFilter filter, ElementProcessor processor, TokenizerFactory tFactory)
          Creates a new instance.
 
Method Summary
 String toString()
          Returns a string representation of this object.
 void walk(Document document, ContextMap context)
          Walks through the contents of an XML document, tokenizing the textual contents.
protected  void walk(Element element, TokenContainer tokenContainer, ContextMap context)
          Walks through the contents of a node, tokenizing textual contents and recursing through nested elements.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

DocumentWalker

public DocumentWalker(NodeFilter filter,
                      ElementProcessor processor,
                      TokenizerFactory tFactory)
Creates a new instance.

Parameters:
filter - a filter used to decide which elements to hand over to the element processor
processor - used to process the elements selected by the filter
tFactory - used to instantiate tokenizers

Method Detail

walk

public final void walk(Document document,
                       ContextMap context)
                throws IOException,
                       ProcessingException
Walks through the contents of an XML document, tokenizing the textual contents. The resulting tokens are stored in a TokenContainer.

Parameters:
document - the document to walk through
context - a map of objects that are made available for processing; might be null if not required by the element processor
Throws:
IOException - might be thrown by the element processor
ProcessingException - might be thrown by the element processor

walk

protected void walk(Element element,
                    TokenContainer tokenContainer,
                    ContextMap context)
             throws IOException,
                    ProcessingException
Walks through the contents of a node, tokenizing textual contents and recursing through nested elements. For elements matched by the registered node filter, the registered element processor is called -- the full textual content of the matched element and its children is available via TokenContainer.getLast(). For other elements, the textual contents are stored and child elements are walked through and matched recursively.

A successful match stops recursion, i.e. child elements of a matching element are never handed over to the node filter for testing (in that case, only their textual contents are recursively collected).
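
The matching rule described above can be sketched with the JDK's own org.w3c.dom classes. This is a simplified illustration under stated assumptions, not the implementation of this class: WalkSketch, its fields, the Predicate used in place of the node filter, the list used in place of the element processor, and the whitespace tokenization are all hypothetical stand-ins.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch of "a match stops further matching below it".
class WalkSketch {
    // Stand-in for the element processor: records each matched element and its text.
    final List<String> processed = new ArrayList<>();
    // Stand-in for the token container: every token seen, in document order.
    final List<String> tokens = new ArrayList<>();
    // Stand-in for the node filter.
    private final Predicate<Element> filter;

    WalkSketch(Predicate<Element> filter) {
        this.filter = filter;
    }

    // Walks an element; insideMatch is true if an ancestor has already matched.
    void walk(Element element, boolean insideMatch) {
        // Descendants of a matched element are never tested against the filter.
        boolean matches = !insideMatch && filter.test(element);
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.TEXT_NODE) {
                // Text is always tokenized, whether or not the element matched.
                for (String token : child.getNodeValue().trim().split("\\s+")) {
                    if (!token.isEmpty()) {
                        tokens.add(token);
                    }
                }
            } else if (child instanceof Element) {
                walk((Element) child, insideMatch || matches);
            }
        }
        if (matches) {
            // The full text of the element and its children is available here.
            processed.add(element.getTagName() + ": " + element.getTextContent().trim());
        }
    }

    // Parses an XML string and returns its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

With a filter that accepts both sec and b elements, walking `<doc><sec>one <b>two</b></sec><b>three</b></doc>` processes the sec element and the top-level b element only. The b nested inside sec is not tested, but its text still ends up among the collected tokens.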

Parameters:
element - the element to walk through
tokenContainer - container storing all tokens
context - a map of objects that are made available for processing; might be null if not required by the element processor
Throws:
IOException - might be thrown by the element processor
ProcessingException - might be thrown by the element processor

toString

public String toString()
Returns a string representation of this object.

Overrides:
toString in class Object
Returns:
a textual representation


Copyright © 2003-2004 Christian Siefkes. All Rights Reserved.