de.fu_berlin.ties.filter
Class FilteringTokenWalker

java.lang.Object
  extended by de.fu_berlin.ties.xml.dom.TokenWalker
      extended by de.fu_berlin.ties.filter.FilteringTokenWalker
Direct Known Subclasses:
TrainableFilteringTokenWalker

public class FilteringTokenWalker
extends TokenWalker

A token walker that invokes the provided TokenProcessor only on those tokens that are children of an element accepted by the provided ElementFilter.

Instances of this class are not thread-safe.
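
The filtering idea described above can be illustrated with a minimal, self-contained sketch. It does not use the TIES classes: a java.util.function.Predicate stands in for ElementFilter, a plain list stands in for the TokenProcessor, and whitespace splitting stands in for a real tokenizer. The class name FilteredWalkSketch and its methods are hypothetical, and the sketch assumes that only the direct parent element of a text node is tested, which may differ from how FilteringTokenWalker resolves and caches decisions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical stand-in, not part of the TIES API.
public class FilteredWalkSketch {

    /** Parses an XML string into a DOM document (checked exceptions are wrapped). */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Collects whitespace-separated tokens whose parent element passes the filter. */
    static List<String> collectTokens(Element root, Predicate<Element> filter) {
        List<String> tokens = new ArrayList<>();
        visit(root, filter, tokens);
        return tokens;
    }

    private static void visit(Element element, Predicate<Element> filter, List<String> out) {
        // Assumption: the decision depends only on the element that directly holds the text.
        boolean accepted = filter.test(element);
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.TEXT_NODE && accepted) {
                for (String token : child.getNodeValue().trim().split("\\s+")) {
                    if (!token.isEmpty()) {
                        out.add(token);
                    }
                }
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                // Child elements are always visited; each gets its own decision.
                visit((Element) child, filter, out);
            }
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<doc><keep>alpha beta</keep><skip>gamma</skip></doc>");
        List<String> tokens = collectTokens(doc.getDocumentElement(),
                e -> e.getTagName().equals("keep"));
        System.out.println(tokens); // prints [alpha, beta]
    }
}
```

In the sketch, the text of the skip element is never tokenized because its parent fails the predicate, which mirrors the role the element filter plays in this class.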

Version:
$Revision: 1.17 $, $Date: 2006/10/21 16:04:19 $, $Author: siefkes $
Author:
Christian Siefkes

Constructor Summary
FilteringTokenWalker(TokenProcessor processor, TokenizerFactory tFactory, ElementFilter elementFilter, SkipHandler sHandler)
          Creates a new instance.
 
Method Summary
 Set getAcceptedElements()
          Returns the set of elements that have been accepted by the filter in the current document.
protected  ElementFilter getFilter()
          Returns the element filter used by this instance.
 Set getRejectedElements()
          Returns the set of elements that have been rejected by the filter in the current document.
protected  boolean handleAccept(Element element, Element filteredElement, boolean decision)
          This method can be overridden by subclasses to modify decisions of the element filter.
protected  void processToken(Element element, String left, TokenDetails details, String right, ContextMap context)
          Processes a token in an XML element by delegating to the configured TokenProcessor.
 String toString()
          Returns a string representation of this object.
 void walk(Document document, ContextMap context)
          Walks through the contents of an XML document, tokenizing the textual contents.
 
Methods inherited from class de.fu_berlin.ties.xml.dom.TokenWalker
endElementHook, processCollectedText, startElementHook, trailingWhitespaceHook, walk
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

FilteringTokenWalker

public FilteringTokenWalker(TokenProcessor processor,
                            TokenizerFactory tFactory,
                            ElementFilter elementFilter,
                            SkipHandler sHandler)
Creates a new instance.

Parameters:
processor - used to process the tokens
tFactory - used to instantiate tokenizers
elementFilter - the element filter to use
sHandler - a handler that is called whenever some tokens are skipped; may be null
Method Detail

getAcceptedElements

public Set getAcceptedElements()
Returns the set of elements that have been accepted by the filter in the current document.

Returns:
the accepted elements

getRejectedElements

public Set getRejectedElements()
Returns the set of elements that have been rejected by the filter in the current document.

Returns:
the rejected elements

getFilter

protected ElementFilter getFilter()
Returns the element filter used by this instance.

Returns:
the used element filter

handleAccept

protected boolean handleAccept(Element element,
                               Element filteredElement,
                               boolean decision)
                        throws ProcessingException
This method can be overridden by subclasses to modify decisions of the element filter. The standard behavior is to accept the decision as is.

Parameters:
element - the element to test
filteredElement - the element that was actually filtered (element or a parent), or null if the decision was cached (no filtering took place)
decision - the decision of the element filter
Returns:
the revised decision
Throws:
ProcessingException - if an error occurs while revising the decision
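
The override hook described above follows the usual template-method pattern, which the following minimal sketch illustrates. It is a simplified stand-in, not the TIES implementation: element names are plain strings, the filteredElement parameter and the ProcessingException are omitted, and the classes Decider and TitleFriendlyDecider are hypothetical.

```java
import java.util.function.Predicate;

// Hypothetical stand-in, not part of the TIES API.
public class DecisionHookSketch {

    /** Base class: asks the filter, then lets subclasses revise the result. */
    static class Decider {
        private final Predicate<String> filter;

        Decider(Predicate<String> filter) {
            this.filter = filter;
        }

        /** Final decision for an element name: the filter result passed through the hook. */
        final boolean accepts(String elementName) {
            return handleAccept(elementName, filter.test(elementName));
        }

        /** Hook; the default keeps the filter's decision as is. */
        protected boolean handleAccept(String elementName, boolean decision) {
            return decision;
        }
    }

    /** Subclass that also accepts "title" elements, whatever the filter decided. */
    static class TitleFriendlyDecider extends Decider {
        TitleFriendlyDecider(Predicate<String> filter) {
            super(filter);
        }

        @Override
        protected boolean handleAccept(String elementName, boolean decision) {
            return decision || elementName.equals("title");
        }
    }

    public static void main(String[] args) {
        Predicate<String> onlyBody = name -> name.equals("body");
        System.out.println(new Decider(onlyBody).accepts("title"));              // prints false
        System.out.println(new TitleFriendlyDecider(onlyBody).accepts("title")); // prints true
    }
}
```

A subclass of FilteringTokenWalker would override handleAccept with the three-parameter signature documented above; the sketch only shows how the default "accept the decision as is" behavior can be revised.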

processToken

protected void processToken(Element element,
                            String left,
                            TokenDetails details,
                            String right,
                            ContextMap context)
                     throws IOException,
                            ProcessingException
Processes a token in an XML element by delegating to the configured TokenProcessor.

Overrides:
processToken in class TokenWalker
Parameters:
element - the element containing the token
left - the textual contents of the element to the left of the token (in case of mixed contents, only up to the last preceding child element, if any)
details - details about the token to process
right - the textual contents of the element to the right of the token (in case of mixed contents, only up to the next following child element, if any)
context - a map of objects that are made available for processing
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing

walk

public void walk(Document document,
                 ContextMap context)
          throws IOException,
                 ProcessingException
Walks through the contents of an XML document, tokenizing the textual contents. The resulting tokens are handed over to the stored TokenProcessor.

Overrides:
walk in class TokenWalker
Parameters:
document - the document to walk through
context - a map of objects that are made available for processing; might be null if not required by the token processor
Throws:
IOException - might be thrown by the token processor
ProcessingException - might be thrown by the token processor

toString

public String toString()
Returns a string representation of this object.

Overrides:
toString in class TokenWalker
Returns:
a textual representation


Copyright © 2003-2007 Christian Siefkes. All Rights Reserved.