de.fu_berlin.ties.xml.dom
Class XMLStripper

java.lang.Object
  extended by de.fu_berlin.ties.ConfigurableProcessor
      extended by de.fu_berlin.ties.TextProcessor
          extended by de.fu_berlin.ties.DocumentReader
              extended by de.fu_berlin.ties.xml.dom.XMLStripper
All Implemented Interfaces:
Processor

public class XMLStripper
extends DocumentReader

An XML stripper converts an XML document to plain text, removing all markup.

This class is thread-safe and can be used to convert several documents in parallel.

Version:
$Revision: 1.4 $, $Date: 2004/08/23 17:11:21 $, $Author: siefkes $
Author:
Christian Siefkes

Field Summary
 
Fields inherited from class de.fu_berlin.ties.TextProcessor
CONFIG_POST, KEY_DIRECTORY, KEY_LOCAL_NAME, KEY_OUT_DIRECTORY, KEY_URL
 
Constructor Summary
XMLStripper()
          Creates a new instance, using a default extension and the standard configuration.
XMLStripper(String outExt)
          Creates a new instance, using the standard configuration.
XMLStripper(String outExt, TiesConfiguration config)
          Creates a new instance.
 
Method Summary
 void process(Document document, Writer writer, ContextMap context)
          Strips all markup from an XML document and stores the resulting plain text.
 
Methods inherited from class de.fu_berlin.ties.DocumentReader
doProcess
 
Methods inherited from class de.fu_berlin.ties.TextProcessor
getOutFileExt, process, process, process, process, process, process, toString
 
Methods inherited from class de.fu_berlin.ties.ConfigurableProcessor
getConfig
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

XMLStripper

public XMLStripper()
Creates a new instance, using a default extension and the standard configuration.


XMLStripper

public XMLStripper(String outExt)
Creates a new instance, using the standard configuration.

Parameters:
outExt - the extension to use for output files

XMLStripper

public XMLStripper(String outExt,
                   TiesConfiguration config)
Creates a new instance.

Parameters:
outExt - the extension to use for output files
config - used to configure superclasses

Method Detail

process

public void process(Document document,
                    Writer writer,
                    ContextMap context)
             throws IOException
Strips all markup from an XML document and writes the resulting plain text to the given writer. The text collection itself is delegated to DOMUtils.collectText(org.dom4j.Branch, StringBuffer).

Specified by:
process in class DocumentReader
Parameters:
document - the document to read
writer - the writer to write the resulting plain text to; flushed but not closed by this method
context - a map of objects that are made available for processing
Throws:
IOException - if an I/O error occurs
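
The behaviour described for process can be illustrated with a minimal, self-contained sketch. It is not the TIES implementation: it assumes the JDK's org.w3c.dom API instead of dom4j so that it runs without extra libraries, and the names StripSketch, strip and stripString are hypothetical. The local collectText helper only borrows its name from DOMUtils.collectText and may differ from it in detail (for example in whitespace handling). As in the parameter description above, the writer is flushed but not closed.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of the TIES library.
final class StripSketch {

    /** Recursively appends all character data below a node to the buffer. */
    static void collectText(Node node, StringBuilder buffer) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            buffer.append(node.getNodeValue());
        }
        // Elements, comments and processing instructions contribute no text
        // of their own; only their descendants are visited.
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            collectText(child, buffer);
        }
    }

    /** Writes the plain text of the document; flushes but does not close. */
    static void strip(Document document, Writer writer) throws IOException {
        StringBuilder buffer = new StringBuilder();
        collectText(document, buffer);
        writer.write(buffer.toString());
        writer.flush();
    }

    /** Convenience wrapper (an assumption of this sketch): parse, then strip. */
    static String stripString(String xml) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        StringWriter out = new StringWriter();
        strip(document, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(stripString("<p>Hello <b>plain</b> text</p>"));
    }
}
```

Because the sketch keeps all state in local variables, separate calls can run in parallel on different documents, which is in the spirit of the thread-safety note in the class description.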


Copyright © 2003-2004 Christian Siefkes. All Rights Reserved.