de.fu_berlin.ties.preprocess
Class TreeTagger

java.lang.Object
  extended by de.fu_berlin.ties.ConfigurableProcessor
      extended by de.fu_berlin.ties.TextProcessor
          extended by de.fu_berlin.ties.preprocess.TreeTagger
All Implemented Interfaces:
Processor

public class TreeTagger
extends TextProcessor

Integrates the TreeTagger, a linguistic tool for part-of-speech tagging and chunk parsing. The integration converts XML-based input files into a form that TreeTagger can process, runs the external TreeTagger command, and converts its output into the augmented text format defined by TIES. In doing so, it inserts tags that mark sentences and unifies the original XML markup and the TreeTagger output in a single XML tree. This class is thread-safe.
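
The step of running an external command can be illustrated with a minimal sketch. It is not the code of this class: the class name ExternalTaggerRunner and the method run are hypothetical, the command line is supplied by the caller, and Java 9 or later is assumed. The sketch pipes text to the command's standard input and collects its standard output.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical helper, not part of TIES: runs an external filter command. */
public class ExternalTaggerRunner {

    /**
     * Pipes the input to the command's standard input and returns
     * whatever the command writes to its standard output.
     */
    public static String run(List<String> command, String input)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Send the tool's error messages to our own stderr so that
        // an unread error pipe can never block the child process.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();

        // Feed stdin from a separate thread so that a large input
        // cannot deadlock against a full stdout pipe.
        Thread feeder = new Thread(() -> {
            try (OutputStream stdin = process.getOutputStream()) {
                stdin.write(input.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                // The process exited early; the exit code check reports it.
            }
        });
        feeder.start();

        String output;
        try (InputStream stdout = process.getInputStream()) {
            output = new String(stdout.readAllBytes(), StandardCharsets.UTF_8);
        }
        feeder.join();

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command failed with exit code " + exitCode);
        }
        return output;
    }
}
```

A caller would pass the tagger's actual command line as the list, which depends on the local installation and is deliberately left out here.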

Version:
$Revision: 1.6 $, $Date: 2004/04/13 07:08:34 $, $Author: siefkes $
Author:
Christian Siefkes

Field Summary
 
Fields inherited from class de.fu_berlin.ties.TextProcessor
CONFIG_POST, KEY_DIRECTORY, KEY_LOCAL_NAME, KEY_OUT_DIRECTORY, KEY_URL
 
Constructor Summary
TreeTagger(String outExt)
          Creates a new instance, using the standard configuration.
TreeTagger(String outExt, TiesConfiguration config)
          Creates a new instance.
 
Method Summary
protected  void doProcess(Reader in, Writer out, ContextMap context)
          Augments the input text with the output of the TreeTagger.
protected  String tagSentences(String input)
          Adds tags to mark the sentences in a document.
 String toString()
          Returns a string representation of this object.
 
Methods inherited from class de.fu_berlin.ties.TextProcessor
getOutFileExt, process, process, process, process
 
Methods inherited from class de.fu_berlin.ties.ConfigurableProcessor
getConfig
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

TreeTagger

public TreeTagger(String outExt)
Creates a new instance, using the standard configuration.

Parameters:
outExt - the extension to use for output files

TreeTagger

public TreeTagger(String outExt,
                  TiesConfiguration config)
Creates a new instance.

Parameters:
outExt - the extension to use for output files
config - used to configure superclasses
Method Detail

doProcess

protected void doProcess(Reader in,
                         Writer out,
                         ContextMap context)
                  throws IOException,
                         ParsingException
Augments the input text with the output of the TreeTagger.

Specified by:
doProcess in class TextProcessor
Parameters:
in - reader containing the text to process; must contain the textual representation of a well-formed XML document
out - the writer to write the processed text to; the text will be augmented with part-of-speech, lemma, and chunk information and will be a well-formed XML document (if the input was well-formed)
context - a map of objects that are made available for processing
Throws:
IOException - if an I/O error occurred
ParsingException - if the file couldn't be parsed, e.g. due to an error in the XML input
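
How tagger output might be folded into XML can be sketched as follows. This is an illustration under stated assumptions, not the conversion this method performs: it assumes the tagger prints one token per line as tab-separated token, tag, and lemma columns, and the element name w and the attribute names pos and lemma are hypothetical rather than the augmented text format defined by TIES. The class name TaggedLineConverter is likewise invented for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

/** Hypothetical converter, not part of TIES. */
public class TaggedLineConverter {

    /** Escapes characters that are special in XML text and attribute values. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /**
     * Reads "token TAB tag TAB lemma" lines and writes one <w> element
     * per token. Lines without exactly three columns (for example markup
     * lines) are copied through unchanged.
     */
    public static void convert(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            String[] columns = line.split("\t");
            if (columns.length == 3) {
                out.write("<w pos=\"" + escape(columns[1])
                        + "\" lemma=\"" + escape(columns[2]) + "\">"
                        + escape(columns[0]) + "</w>\n");
            } else {
                out.write(line + "\n");
            }
        }
        out.flush();
    }

    /** Convenience wrapper for in-memory strings. */
    public static String convert(String tagged) {
        StringWriter out = new StringWriter();
        try {
            convert(new StringReader(tagged), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

The Reader and Writer parameters mirror the shape of doProcess, so the same loop works for files and for in-memory strings.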

tagSentences

protected final String tagSentences(String input)
Adds tags to mark the sentences in a document. This method tags only the ends of sentences, by inserting </sent> tags; the corresponding start tags are added later by the XML adjuster.

Parameters:
input - the text to process
Returns:
the processed text with </sent> tags added
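
The idea of marking only sentence ends can be sketched as below. The rule used here is an assumption made for illustration, not the rule this method applies: a sentence is taken to end at '.', '!' or '?' when followed by whitespace or the end of the text. The class name SentenceEndTagger and the method tagSentenceEnds are hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch, not part of TIES. */
public class SentenceEndTagger {

    // Assumed rule: '.', '!' or '?' followed by whitespace or end of text.
    private static final Pattern SENTENCE_END =
            Pattern.compile("([.!?])(?=\\s|$)");

    /** Inserts a closing sent tag after each assumed sentence end. */
    public static String tagSentenceEnds(String input) {
        // $1 re-emits the punctuation mark before the inserted end tag.
        return SENTENCE_END.matcher(input).replaceAll("$1</sent>");
    }
}
```

The lookahead keeps decimal numbers such as 1.6 intact, but a rule this simple still splits after abbreviations such as "Dr.", so a real implementation would need more context than this sketch uses.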

toString

public String toString()
Returns a string representation of this object.

Overrides:
toString in class TextProcessor
Returns:
a textual representation


Copyright © 2003-2004 Christian Siefkes. All Rights Reserved.