java.lang.Object
 extended by de.fu_berlin.ties.ConfigurableProcessor
  extended by de.fu_berlin.ties.TextProcessor
   extended by de.fu_berlin.ties.DocumentReader
    extended by de.fu_berlin.ties.xml.dom.XMLStripper
public class XMLStripper extends DocumentReader
An XML stripper converts an XML document to plain text, removing all markup.
This class is thread-safe and can be used to convert several documents in parallel.
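To illustrate what "removing all markup" and "converting several documents in parallel" amount to, here is a minimal sketch. It is an assumption-laden stand-in, not the TIES implementation: the class `StripSketch` and its `strip` method are hypothetical, and the sketch uses the JDK's own `org.w3c.dom` API rather than dom4j. Thread safety comes from keeping no shared mutable state: every call creates its own parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical stand-in for XMLStripper, built on the JDK DOM API (not on TIES or dom4j).
class StripSketch {

    // Parses an XML string and returns its text with all markup removed.
    static String strip(String xml) {
        try {
            // A new builder per call: DocumentBuilder is not thread-safe, so no
            // mutable parser state is shared between concurrent calls.
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // getTextContent() concatenates all descendant text, dropping tags and attributes.
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = List.of(
                "<p>Hello <b>world</b></p>",
                "<doc a=\"1\"><t>plain</t> text</doc>");
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<String>> results = new ArrayList<>();
            for (String xml : docs) {
                results.add(pool.submit(() -> strip(xml))); // documents converted in parallel
            }
            for (Future<String> result : results) {
                System.out.println(result.get()); // "Hello world", then "plain text"
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Because `strip` touches only local variables, the same code can be called from any number of threads without synchronization; a single shared `DocumentBuilder` would instead need a lock or a per-thread instance.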
Field Summary |
---|
Fields inherited from class de.fu_berlin.ties.TextProcessor |
---|
CONFIG_POST, KEY_DIRECTORY, KEY_LOCAL_NAME, KEY_OUT_DIRECTORY, KEY_URL |
Constructor Summary | |
---|---|
XMLStripper() | Creates a new instance, using a default extension and the standard configuration. |
XMLStripper(String outExt) | Creates a new instance, using the standard configuration. |
XMLStripper(String outExt, boolean stripToXML, boolean myNormalizeWS, TiesConfiguration config) | Creates a new instance. |
XMLStripper(String outExt, TiesConfiguration config) | Creates a new instance from the provided configuration. |
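The four-argument constructor takes two flags: `stripToXML`, which keeps the root element so that the output is still an XML document, and `myNormalizeWS`, which normalizes whitespace. The sketch below shows one plausible reading of both options. It is hypothetical (`StripOptionsSketch` is not a TIES class), and it assumes that "normalizing whitespace" means collapsing each whitespace run to a single space and trimming the ends, and that the root element's attributes are discarded along with all other markup.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical sketch of the stripToXML and whitespace-normalization options; not the TIES code.
class StripOptionsSketch {

    static String strip(String xml, boolean stripToXML, boolean normalizeWS) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        String text = root.getTextContent();
        if (normalizeWS) {
            // Assumption: collapse each whitespace run to one space and trim both ends.
            text = text.replaceAll("\\s+", " ").trim();
        }
        if (!stripToXML) {
            return text;
        }
        // Keep only the root element's name; its attributes and all descendant elements are dropped.
        // Text is escaped so the result stays well-formed (namespace declarations are not handled).
        String escaped = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        String name = root.getTagName();
        return "<" + name + ">" + escaped + "</" + name + ">";
    }

    public static void main(String[] args) {
        String xml = "<r x=\"1\">  a <b>b</b>\n c &amp; d </r>";
        System.out.println(strip(xml, false, true)); // a b c & d
        System.out.println(strip(xml, true, true));  // <r>a b c &amp; d</r>
    }
}
```

Escaping the collected text matters only in the XML-preserving mode: plain-text output may contain a literal `&`, but once the text is wrapped in the root element it must be re-escaped to remain a valid XML document.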
Method Summary | |
---|---|
void | process(Document document, Writer writer, ContextMap context): Strips all markup from an XML document and stores the resulting plain text. |
Methods inherited from class de.fu_berlin.ties.DocumentReader |
---|
doProcess |
Methods inherited from class de.fu_berlin.ties.TextProcessor |
---|
getOutFileExt, process, process, process, process, process, process, toString |
Methods inherited from class de.fu_berlin.ties.ConfigurableProcessor |
---|
getConfig |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
public XMLStripper()
Creates a new instance, using a default extension and the standard configuration.

public XMLStripper(String outExt)
Creates a new instance, using the standard configuration.
Parameters:
outExt - the extension to use for output files

public XMLStripper(String outExt, TiesConfiguration config)
Creates a new instance from the provided configuration.
Parameters:
outExt - the extension to use for output files
config - used to configure this instance

public XMLStripper(String outExt, boolean stripToXML, boolean myNormalizeWS, TiesConfiguration config)
Creates a new instance.
Parameters:
outExt - the extension to use for output files
stripToXML - if set to true, the output will be an XML document instead of plain text, because the root element is preserved (all other elements and attributes are still discarded)
myNormalizeWS - whether to normalize whitespace
config - used to configure superclasses

Method Detail |
---|
public void process(Document document, Writer writer, ContextMap context) throws IOException
Strips all markup from an XML document and stores the resulting plain text. See DOMUtils.collectText(org.dom4j.Branch, StringBuilder).
Declared in: process in class DocumentReader
Parameters:
document - the document to read
writer - the writer to write the resulting plain text to; flushed but not closed by this method
context - a map of objects that are made available for processing
Throws:
IOException - if an I/O error occurs
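The contract of `process` (collect all text, write it, flush the writer but leave it open) can be sketched with the JDK's DOM API. This is a hypothetical analogue, not the TIES source: `ProcessSketch`, its `collectText` helper (standing in for `DOMUtils.collectText`, whose exact behavior is not documented here), `parse`, and `stripToString` are all illustrative names, and the `ContextMap` parameter is omitted. The assumption is that text and CDATA content are kept while comments and processing instructions are dropped.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical org.w3c.dom analogue of process(...); collectText stands in for DOMUtils.collectText.
class ProcessSketch {

    // Depth-first walk appending text and CDATA content; comments and processing instructions are skipped.
    static void collectText(Node node, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            out.append(node.getNodeValue());
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectText(children.item(i), out);
        }
    }

    // Writes the document's plain text; the writer is flushed but left open for the caller.
    static void process(Document document, Writer writer) throws IOException {
        StringBuilder text = new StringBuilder();
        collectText(document, text);
        writer.write(text.toString());
        writer.flush();
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Convenience wrapper: strips an XML string into a String.
    static String stripToString(String xml) {
        StringWriter writer = new StringWriter();
        try {
            process(parse(xml), writer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return writer.toString();
    }

    public static void main(String[] args) throws IOException {
        StringWriter writer = new StringWriter();
        process(parse("<a>x<![CDATA[<y>]]><!-- note -->z</a>"), writer);
        writer.write(" (still open)"); // the caller can keep using the writer
        System.out.println(writer);    // x<y>z (still open)
    }
}
```

Flushing without closing leaves ownership of the writer with the caller, which is what allows several documents, or additional output, to be written to the same stream.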