java.lang.Object
  de.fu_berlin.ties.ConfigurableProcessor
    de.fu_berlin.ties.TextProcessor
      de.fu_berlin.ties.preprocess.TreeTagger
public class TreeTagger
Integrates the TreeTagger, a linguistic tool for part-of-speech tagging and chunk parsing. This integration converts XML-based input files into a form that TreeTagger can process, runs the external TreeTagger command, and converts the output into the augmented text format defined by TIE, inserting tags that mark sentences and unifying the original XML markup and the TreeTagger output in a single XML tree. This class is thread-safe.
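The conversion step described above can be illustrated with a small, self-contained sketch. It is not the code of this class. It assumes TreeTagger was run so that each token line has three tab-separated columns (token, part-of-speech tag, lemma) and that markup lines are passed through unchanged. The class name `TaggerOutputSketch`, the element name `w` and the attribute names `pos` and `lemma` are hypothetical; the actual augmented text format defined by TIE may differ.

```java
final class TaggerOutputSketch {

    /** Escapes characters that are special in XML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /**
     * Converts tagger output to XML. Lines with three tab-separated columns
     * (assumed: token, POS tag, lemma) become hypothetical w elements;
     * all other non-empty lines are treated as markup and copied unchanged.
     */
    static String toXml(String taggerOutput) {
        StringBuilder xml = new StringBuilder();
        for (String line : taggerOutput.split("\\R")) {
            if (line.isEmpty()) {
                continue;
            }
            String[] cols = line.split("\t");
            if (cols.length == 3) {
                xml.append("<w pos=\"").append(escape(cols[1]))
                   .append("\" lemma=\"").append(escape(cols[2])).append("\">")
                   .append(escape(cols[0])).append("</w>\n");
            } else {
                xml.append(line).append('\n');
            }
        }
        return xml.toString();
    }
}
```

Passing markup lines through untouched keeps the original XML structure around the annotated tokens, which is the idea behind merging both sources into a single tree.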
Field Summary |
---|
Fields inherited from class de.fu_berlin.ties.TextProcessor |
---|
CONFIG_POST, KEY_DIRECTORY, KEY_LOCAL_NAME, KEY_OUT_DIRECTORY, KEY_URL |
Constructor Summary | |
---|---|
TreeTagger(String outExt)
Creates a new instance, using the standard configuration. |
|
TreeTagger(String outExt,
TiesConfiguration config)
Creates a new instance. |
Method Summary | |
---|---|
protected String |
deleteSpuriousEndTags(String input)
Workaround for a strange TreeTagger bug: the tagger not only tends to omit trailing XML markup (which is not too bad since missing end tags are completed by the XML adjuster), but sometimes it appends spurious ones. |
protected void |
doProcess(Reader in,
Writer out,
ContextMap context)
Augments the input text with the output of the TreeTagger. |
protected String |
tagSentences(String input)
Adds tags to mark the sentences in a document. |
String |
toString()
Returns a string representation of this object. |
Methods inherited from class de.fu_berlin.ties.TextProcessor |
---|
getOutFileExt, process, process, process, process, process, process |
Methods inherited from class de.fu_berlin.ties.ConfigurableProcessor |
---|
getConfig |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
public TreeTagger(String outExt)
Creates a new instance, using the standard configuration.
Parameters:
outExt - the extension to use for output files

public TreeTagger(String outExt, TiesConfiguration config)
Creates a new instance.
Parameters:
outExt - the extension to use for output files
config - used to configure superclasses

Method Detail |
---|
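protected String deleteSpuriousEndTags(String input)
Workaround for a TreeTagger bug: the tagger not only tends to omit trailing XML markup (which is not too bad since missing end tags are completed by the XML adjuster), but sometimes it also appends spurious end tags.
Parameters:
input - the TreeTagger output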
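How this method detects spurious end tags is not documented here. One possible approach, shown as a minimal sketch, is to track open elements on a stack and drop every end tag that does not close the innermost open element. The class name `SpuriousEndTagSketch`, the method name `dropUnmatchedEndTags` and the simplified tag pattern are assumptions for illustration; comments, CDATA sections and processing instructions are not handled.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SpuriousEndTagSketch {

    // Simplified tag pattern: group 1 is "/" for end tags, group 2 is the
    // element name, group 3 is "/" for empty-element tags such as <br/>.
    private static final Pattern TAG =
        Pattern.compile("<(/?)([A-Za-z_][\\w:.-]*)[^<>]*?(/?)>");

    /** Returns the input without end tags that lack a matching start tag. */
    static String dropUnmatchedEndTags(String input) {
        Deque<String> open = new ArrayDeque<>();
        StringBuilder result = new StringBuilder();
        Matcher m = TAG.matcher(input);
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous tag and this one.
            result.append(input, last, m.start());
            last = m.end();
            boolean isEndTag = !m.group(1).isEmpty();
            boolean isEmptyElement = !m.group(3).isEmpty();
            String name = m.group(2);
            if (!isEndTag) {
                if (!isEmptyElement) {
                    open.push(name);
                }
                result.append(m.group());
            } else if (name.equals(open.peek())) {
                open.pop();
                result.append(m.group());
            }
            // Otherwise the end tag closes nothing that is open: skip it.
        }
        result.append(input, last, input.length());
        return result.toString();
    }
}
```

Start tags that are never closed are left in place in this sketch, on the assumption that a later step (such as the XML adjuster mentioned above) completes them.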
protected void doProcess(Reader in, Writer out, ContextMap context) throws IOException, ParsingException
Augments the input text with the output of the TreeTagger.
Overrides:
doProcess in class TextProcessor
Parameters:
in - reader containing the text to process; must contain the textual representation of a well-formed XML document
out - the writer to write the processed text to; the text will be augmented with part-of-speech, lemma, and chunk information and will be a well-formed XML document (if the input was well-formed)
context - a map of objects that are made available for processing
Throws:
IOException - if an I/O error occurred
ParsingException - if the file couldn't be parsed, e.g. due to an error in the XML input

protected final String tagSentences(String input)
Adds tags to mark the sentences in a document.
Parameters:
input - the text to process
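The sentence detection used by this method is not specified here. As a rough illustration of the general idea, the following sketch wraps each sentence of a plain-text string in a tag, using the standard `java.text.BreakIterator`. The class name `SentenceTagSketch`, the method name `wrapSentences` and the element name `sent` are hypothetical, and unlike the real method this sketch assumes plain text without XML markup.

```java
import java.text.BreakIterator;
import java.util.Locale;

final class SentenceTagSketch {

    /** Wraps every sentence found in plain text in a hypothetical sent element. */
    static String wrapSentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        StringBuilder out = new StringBuilder();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE;
                start = end, end = it.next()) {
            // Trim the whitespace that separates this sentence from the next.
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                out.append("<sent>").append(sentence).append("</sent>\n");
            }
        }
        return out.toString();
    }
}
```

A boundary detector tuned to the tagger output (for example, one that uses sentence-final part-of-speech tags) would likely be more accurate than a generic text segmenter; the sketch only shows the wrapping step.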
public String toString()
Returns a string representation of this object.
Overrides:
toString in class TextProcessor