java.lang.Object
  de.fu_berlin.ties.ConfigurableProcessor
    de.fu_berlin.ties.TextProcessor
      de.fu_berlin.ties.preprocess.PreProcessor
Preprocesses documents by converting them into a suitable XML format and adding linguistic information. Instances of this class are thread-safe.
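The page does not spell out the order of the preprocessing steps. The sketch below is a hypothetical routing by MIME type that combines the pieces this class documents (an external converter to HTML, HTML cleaning, plain-text handling, linguistic annotation). The step labels and their order are assumptions for illustration, not TIES method names or behavior confirmed by this page.

```java
import java.util.Locale;

public class PreprocessRouteSketch {

    /**
     * Returns the ordered steps (hypothetical labels, not TIES method names)
     * that a document of the given MIME type would pass through.
     */
    public static String[] route(String mimeType) {
        String type = mimeType.toLowerCase(Locale.ROOT);
        if (type.equals("text/html")) {
            // Assumed: HTML only needs cleaning before annotation (cf. cleanHTML).
            return new String[] {"clean HTML", "add linguistic information"};
        } else if (type.equals("text/plain")) {
            // Assumed: plain text is converted directly; CONFIG_PREPROCESS_TEXT
            // would control whether definition lists are recognized here.
            return new String[] {"convert plain text", "add linguistic information"};
        } else {
            // Assumed: other types are first turned into HTML by the external
            // converter configured under CONFIG_HTMLCONV_COMMAND.
            return new String[] {"convert to HTML externally", "clean HTML",
                                 "add linguistic information"};
        }
    }

    public static void main(String[] args) {
        for (String mime : new String[] {"text/html", "text/plain", "application/pdf"}) {
            System.out.println(mime + ": " + String.join(" -> ", route(mime)));
        }
    }
}
```

Routing on the MIME type mirrors the documented requirement that callers supply the document's MIME type in the processing context.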
Field Summary
static String CONFIG_HTMLCONV_COMMAND
    Configuration key prefix: command name and arguments of an external converter from a specified type to HTML.
static String CONFIG_PREPROCESS_TEXT
    Configuration key: Whether plain text is preprocessed to recognize and reformat definition lists.
Fields inherited from class de.fu_berlin.ties.TextProcessor
    CONFIG_POST, KEY_DIRECTORY, KEY_LOCAL_NAME, KEY_OUT_DIRECTORY, KEY_URL
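CONFIG_HTMLCONV_COMMAND is described as a key *prefix*, which suggests that the source type is appended to it to form the full configuration key. The sketch below shows that lookup against a plain Map. The prefix value `"htmlconv.command"`, the `"."` separator, the converter name `example-pdf-converter`, and the helper `converterCommand` are all hypothetical; the real constant's value and the way TiesConfiguration stores the command are not shown on this page.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HtmlConvCommandSketch {

    /** Hypothetical prefix value; the real CONFIG_HTMLCONV_COMMAND constant may differ. */
    public static final String CONFIG_HTMLCONV_COMMAND = "htmlconv.command";

    /**
     * Looks up the converter command for a document type by appending the type
     * to the key prefix, then splits it into program name and arguments.
     * Returns null if no converter is configured for that type.
     */
    public static List<String> converterCommand(Map<String, String> config, String type) {
        String value = config.get(CONFIG_HTMLCONV_COMMAND + "." + type);
        if (value == null || value.trim().isEmpty()) {
            return null;
        }
        // Naive whitespace split; quoted arguments are not supported in this sketch.
        return Arrays.asList(value.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        // Hypothetical converter; not a real tool.
        config.put("htmlconv.command.pdf", "example-pdf-converter --html");
        System.out.println(converterCommand(config, "pdf")); // [example-pdf-converter, --html]
        System.out.println(converterCommand(config, "doc")); // null
    }
}
```

Splitting the command into a list makes it directly usable with `java.lang.ProcessBuilder(List<String>)` if the converter is to be launched.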
Constructor Summary
PreProcessor(String outExt)
    Creates and configures a new instance, using the standard configuration.
PreProcessor(String outExt, TiesConfiguration config)
    Creates and configures a new instance.
Method Summary
String cleanHTML(String input)
    Converts HTML input to a clean XHTML representation, if necessary.
protected void doProcess(Reader reader, Writer writer, ContextMap context)
    Preprocesses the contents of a file.
String toString()
    Returns a string representation of this object.
Methods inherited from class de.fu_berlin.ties.TextProcessor
    getOutFileExt, process, process, process, process
Methods inherited from class de.fu_berlin.ties.ConfigurableProcessor
    getConfig
Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
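The "if necessary" in cleanHTML's summary implies that already-clean input can skip the JTidy pass. One way to decide this is to check whether the input is well-formed XML, sketched below with the JDK's DOM parser. The well-formedness test, the `HtmlCleaner` interface (a hypothetical stand-in for the JTidy call), and the class name are assumptions; the page does not say how PreProcessor makes this decision.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class CleanIfNecessarySketch {

    /** Hypothetical hook standing in for a JTidy-based HTML-to-XHTML cleaner. */
    public interface HtmlCleaner {
        String toXhtml(String html) throws IOException;
    }

    /** Returns true if the input parses as well-formed XML. */
    public static boolean isWellFormed(String input) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Resolve any DTD to empty content so no network access happens.
            builder.setEntityResolver((publicId, systemId) ->
                    new InputSource(new StringReader("")));
            // DefaultHandler ignores errors/warnings and rethrows fatal errors,
            // which avoids the parser's default messages on stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(input)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Cleans the input only if it is not already well-formed. */
    public static String cleanHTML(String input, HtmlCleaner cleaner) throws IOException {
        return isWellFormed(input) ? input : cleaner.toXhtml(input);
    }

    public static void main(String[] args) throws IOException {
        // Stub cleaner; a real implementation would call JTidy here.
        HtmlCleaner stub = html -> "<html><body>cleaned</body></html>";
        System.out.println(cleanHTML("<p>already fine</p>", stub)); // <p>already fine</p>
        System.out.println(cleanHTML("<p>unclosed<br>", stub));     // <html><body>cleaned</body></html>
    }
}
```

Well-formedness is a weaker condition than valid XHTML, so this check is a simplification; a stricter implementation might always run the cleaner.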
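Field Detail
public static final String CONFIG_HTMLCONV_COMMAND
Configuration key prefix: command name and arguments of an external converter from a specified type to HTML.

public static final String CONFIG_PREPROCESS_TEXT
Configuration key: Whether plain text is preprocessed to recognize and reformat definition lists.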
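Constructor Detail
public PreProcessor(String outExt)
Creates and configures a new instance, using the standard configuration.
Parameters:
outExt - the extension to use for output files

public PreProcessor(String outExt, TiesConfiguration config)
Creates and configures a new instance.
Parameters:
outExt - the extension to use for output files
config - used to configure superclasses

Method Detail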
public final String cleanHTML(String input) throws IOException
Converts HTML input to a clean XHTML representation, if necessary. Uses JTidy for checking and cleaning the HTML code.
Parameters:
input - the HTML to tidy
Throws:
IOException - if an I/O error occurs

protected final void doProcess(Reader reader, Writer writer, ContextMap context) throws IOException, ProcessingException
Preprocesses the contents of a file.
doProcess in class TextProcessor
Parameters:
reader - a reader containing the text to preprocess; not closed by this method
writer - a writer used to store the preprocessed text; flushed but not closed by this method
context - a map of objects that are made available for processing; the ContentType.KEY_MIME_TYPE key should be mapped to the MIME type of the document
Throws:
IOException - if an I/O error occurred
ProcessingException - if the file couldn't be parsed, e.g. due to an error in the XML input

public String toString()
Returns a string representation of this object.
toString in class TextProcessor
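doProcess has a specific stream contract: the reader is not closed, the writer is flushed but not closed, and the MIME type is read from the context. The sketch below illustrates that contract only. It uses a plain `Map<String, Object>` as a stand-in for ContextMap, a hypothetical key `"mime-type"` in place of ContentType.KEY_MIME_TYPE (whose value is not shown here), and a placeholder `<doc>` wrapper instead of the real conversion.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;

public class StreamContractSketch {

    /** Hypothetical stand-in; the real ContentType.KEY_MIME_TYPE value is not shown on this page. */
    public static final String KEY_MIME_TYPE = "mime-type";

    /** Reads all text, writes a placeholder XML wrapper, flushes but never closes either stream. */
    public static void doProcess(Reader reader, Writer writer, Map<String, Object> context)
            throws IOException {
        Object mimeType = context.get(KEY_MIME_TYPE);
        if (mimeType == null) {
            throw new IllegalArgumentException("context has no MIME type");
        }
        // Read directly from the caller's reader; it is deliberately not closed.
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[4096];
        int n;
        while ((n = reader.read(buffer)) != -1) {
            text.append(buffer, 0, n);
        }
        // Placeholder transformation; the real class converts according to the MIME type.
        writer.write("<doc type=\"" + escape(mimeType.toString()) + "\">"
                + escape(text.toString()) + "</doc>");
        writer.flush(); // flushed, but closing is left to the caller
    }

    /** Minimal XML escaping for text and attribute content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Writer that records whether flush() and close() were called. */
    public static class TrackingWriter extends StringWriter {
        public boolean flushed;
        public boolean closed;

        @Override
        public void flush() {
            flushed = true;
            super.flush();
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, Object> context = new HashMap<>();
        context.put(KEY_MIME_TYPE, "text/plain");
        TrackingWriter out = new TrackingWriter();
        doProcess(new StringReader("a < b"), out, context);
        System.out.println(out);                            // <doc type="text/plain">a &lt; b</doc>
        System.out.println(out.flushed + " " + out.closed); // true false
    }
}
```

Leaving both streams open lets the caller, which created them, decide when to close them, for example after writing further output to the same writer.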