de.fu_berlin.ties
Class TextProcessor

java.lang.Object
  extended by de.fu_berlin.ties.ConfigurableProcessor
      extended by de.fu_berlin.ties.TextProcessor
All Implemented Interfaces:
Processor
Direct Known Subclasses:
ClassTrain, DocumentReader, PreProcessor, ReEvaluator, TreeTagger, XMLAdjuster

public abstract class TextProcessor
extends ConfigurableProcessor

Abstract base class for a Processor that operates on text documents. Input is read from a file, URL, or Reader; output is written to a file or Writer.

Version:
$Revision: 1.13 $, $Date: 2004/04/13 11:28:50 $, $Author: siefkes $
Author:
Christian Siefkes

Field Summary
static String CONFIG_POST
          Configuration prefix for post-processors.
static String KEY_DIRECTORY
          Context key referring to the directory of the processed document, if it is a local file.
static String KEY_LOCAL_NAME
          Context key referring to the local name of the processed document.
static String KEY_OUT_DIRECTORY
          Context key referring to the output directory; if missing, the value of KEY_DIRECTORY is used instead.
static String KEY_URL
          Context key referring to the URL of the processed document, if loaded from a URL.
 
Constructor Summary
TextProcessor(String outExt, TiesConfiguration conf)
          Creates a new instance.
 
Method Summary
protected abstract  void doProcess(Reader reader, Writer writer, ContextMap context)
          Processes the contents of a reader, writing a modified version to a writer.
 String getOutFileExt()
          Returns the extension used for output files.
 void process(File file, Writer writer, ContextMap context)
          Processes the contents of a file, delegating to the process(Reader, Writer, ContextMap) method.
 void process(Reader reader, Writer writer, ContextMap context)
          Delegates to the abstract doProcess(Reader, Writer, ContextMap) method and invokes a post-processor, if configured.
 void process(String inputName)
          Processes a file or URL given as input argument, delegating to the appropriate process method.
 void process(URLConnection urlConn, Writer writer, ContextMap context)
          Processes the contents of a URL connection, delegating to the process(Reader, Writer, ContextMap) method.
 String toString()
          Returns a string representation of this object.
 
Methods inherited from class de.fu_berlin.ties.ConfigurableProcessor
getConfig
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

CONFIG_POST

public static final String CONFIG_POST
Configuration prefix for post-processors.

See Also:
Constant Field Values

KEY_LOCAL_NAME

public static final String KEY_LOCAL_NAME
Context key referring to the local name of the processed document.

See Also:
Constant Field Values

KEY_DIRECTORY

public static final String KEY_DIRECTORY
Context key referring to the directory of the processed document, if it is a local file.

See Also:
Constant Field Values

KEY_OUT_DIRECTORY

public static final String KEY_OUT_DIRECTORY
Context key referring to the output directory; if missing, the value of KEY_DIRECTORY is used instead.

See Also:
Constant Field Values
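
The fallback from KEY_OUT_DIRECTORY to KEY_DIRECTORY can be illustrated with a minimal sketch. It is not the library's code: a plain java.util.Map stands in for ContextMap, the key strings are placeholder values (the real constant values are not given on this page), and the helper name resolveOutDir is hypothetical.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

class OutDirSketch {

    // Placeholder key values; the real constants are defined in TextProcessor.
    static final String KEY_DIRECTORY = "sketch.directory";
    static final String KEY_OUT_DIRECTORY = "sketch.out-directory";

    /** Hypothetical helper: prefers the output directory, else the input directory. */
    static File resolveOutDir(Map<String, Object> context) {
        Object dir = context.get(KEY_OUT_DIRECTORY);
        if (dir == null) {
            // No explicit output directory: fall back to the document's directory.
            dir = context.get(KEY_DIRECTORY);
        }
        return (File) dir;
    }

    public static void main(String[] args) {
        Map<String, Object> context = new HashMap<String, Object>();
        context.put(KEY_DIRECTORY, new File("input"));
        System.out.println(resolveOutDir(context));
    }
}
```

The result is null when neither key is present; how the real class handles that case is not documented here.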

KEY_URL

public static final String KEY_URL
Context key referring to the URL of the processed document, if loaded from a URL.

See Also:
Constant Field Values
Constructor Detail

TextProcessor

public TextProcessor(String outExt,
                     TiesConfiguration conf)
Creates a new instance.

Parameters:
outExt - the extension to use for output files
conf - used to configure this instance; if null, the standard configuration is used
Method Detail

doProcess

protected abstract void doProcess(Reader reader,
                                  Writer writer,
                                  ContextMap context)
                           throws IOException,
                                  ProcessingException
Processes the contents of a reader, writing a modified version to a writer.

Parameters:
reader - reader containing the text to process; should not be closed by this method
writer - the writer to write the processed text to; might be flushed but not closed by this method; if this method does not use the writer, the underlying file will be deleted afterwards
context - a map of objects that are made available for processing; when called from the process methods implemented in this class, it contains mappings from IOUtils.KEY_LOCAL_CHARSET to the character set of the output writer, from ContentType.KEY_MIME_TYPE to the document's MIME type, from KEY_LOCAL_NAME to the local name (String), and either from KEY_DIRECTORY to the directory (File) in case of a local file, or from KEY_URL to the URL of the processed document otherwise
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing
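
As a rough illustration of the doProcess contract, the following self-contained sketch shows what a subclass body might look like. It does not compile against the TIES library: SketchProcessor is a hypothetical stand-in for TextProcessor, a plain java.util.Map replaces ContextMap, and ProcessingException is omitted. The upper-casing transformation is an arbitrary example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class DoProcessSketch {

    /** Hypothetical stand-in for TextProcessor; a plain Map replaces ContextMap. */
    abstract static class SketchProcessor {
        protected abstract void doProcess(Reader reader, Writer writer,
                Map<String, Object> context) throws IOException;
    }

    /** Example subclass: copies the input to the output in upper case. */
    static class UpperCaser extends SketchProcessor {
        @Override
        protected void doProcess(Reader reader, Writer writer,
                Map<String, Object> context) throws IOException {
            char[] buffer = new char[4096];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                writer.write(new String(buffer, 0, count).toUpperCase(Locale.ROOT));
            }
            // Flushing is permitted by the contract; closing the reader or
            // the writer is left to the caller.
            writer.flush();
        }
    }

    /** Runs the example subclass on an in-memory string. */
    static String run(String input) {
        StringWriter out = new StringWriter();
        try {
            new UpperCaser().doProcess(new StringReader(input), out,
                    new HashMap<String, Object>());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run("some text"));
    }
}
```

Note that the method neither closes the reader nor the writer, matching the parameter descriptions above.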

getOutFileExt

public String getOutFileExt()
Returns the extension used for output files.

Returns:
the extension used for output files

process

public final void process(Reader reader,
                          Writer writer,
                          ContextMap context)
                   throws IOException,
                          ProcessingException
Delegates to the abstract doProcess(Reader, Writer, ContextMap) method and invokes a post-processor, if configured.

Parameters:
reader - reader containing the text to process; should not be closed by this method
writer - the writer to write the processed text to; might be flushed but not closed by this method; if this method does not use the writer, the underlying file will be deleted afterwards
context - a map of objects that are made available for processing; when called from the process methods implemented in this class, it contains mappings from IOUtils.KEY_LOCAL_CHARSET to the character set of the output writer, from ContentType.KEY_MIME_TYPE to the document's MIME type, from KEY_LOCAL_NAME to the local name (String), and either from KEY_DIRECTORY to the directory (File) in case of a local file, or from KEY_URL to the URL of the processed document otherwise
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing
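
The delegation described for this method follows the template-method pattern: a final public method calls the abstract hook and then runs an optional post-processing step. The sketch below shows that shape only. The class names are hypothetical, the post-processor is modelled as a simple Runnable (the real post-processor type and the way it is configured via CONFIG_POST are not documented on this page), and the context argument is left out for brevity.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

class TemplateMethodSketch {

    /** Hypothetical base class mirroring the process/doProcess split. */
    abstract static class Base {
        private final Runnable postProcessor; // null if none is configured

        Base(Runnable postProcessor) {
            this.postProcessor = postProcessor;
        }

        /** Final template method: subclasses customize doProcess only. */
        public final void process(Reader reader, Writer writer) throws IOException {
            doProcess(reader, writer);
            if (postProcessor != null) {
                postProcessor.run();
            }
        }

        protected abstract void doProcess(Reader reader, Writer writer)
                throws IOException;
    }

    /** Example subclass: copies the input unchanged. */
    static class Copier extends Base {
        Copier(Runnable postProcessor) {
            super(postProcessor);
        }

        @Override
        protected void doProcess(Reader reader, Writer writer) throws IOException {
            char[] buffer = new char[4096];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, count);
            }
        }
    }

    /** Returns the copied text and a trace of the post-processing step. */
    static String demo() {
        final StringBuilder trace = new StringBuilder();
        StringWriter out = new StringWriter();
        Base processor = new Copier(() -> trace.append("post"));
        try {
            processor.process(new StringReader("abc"), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out + "|" + trace;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Declaring process as final keeps the post-processing step from being bypassed by subclasses, which is presumably why the public process methods of this class are final.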

process

public final void process(File file,
                          Writer writer,
                          ContextMap context)
                   throws IOException,
                          ProcessingException
Processes the contents of a file, delegating to the process(Reader, Writer, ContextMap) method. Stores a mapping from ContentType.KEY_MIME_TYPE to the document's MIME type in the context.

Parameters:
file - the file to process
writer - the writer to write the processed text to; not closed by this method
context - a map of objects that are made available for processing; must contain a mapping from IOUtils.KEY_LOCAL_CHARSET to the character set to use for local files
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing

process

public final void process(String inputName)
                   throws IOException,
                          ProcessingException
Processes a file or URL given as input argument, delegating to the appropriate process method. A warning is logged if the input is neither a readable file nor a readable URL. Stores a mapping from IOUtils.KEY_LOCAL_CHARSET to the character set of the output writer in the created context.

Parameters:
inputName - the name of a readable file or URL to process
Throws:
IOException - if an I/O error occurs during processing
ProcessingException - if an error occurs during processing
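
One way to decide between the file and URL cases is sketched below. This is an assumption about the general approach, not the library's actual logic: the class, enum, and method names are hypothetical, the order of the checks is a guess, and the sketch only tests whether the name parses as an absolute URL rather than whether the URL is readable.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;

class InputDispatchSketch {

    enum Kind { LOCAL_FILE, REMOTE_URL, UNREADABLE }

    /** Hypothetical classifier for an input name. */
    static Kind classify(String inputName) {
        File file = new File(inputName);
        if (file.isFile() && file.canRead()) {
            return Kind.LOCAL_FILE;
        }
        try {
            // toURL() rejects relative URIs and unknown protocols.
            new URI(inputName).toURL();
            return Kind.REMOTE_URL;
        } catch (URISyntaxException | MalformedURLException
                | IllegalArgumentException e) {
            // Neither a readable file nor a well-formed URL: the real
            // method logs a warning in this situation.
            return Kind.UNREADABLE;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("http://example.org/doc.txt"));
        System.out.println(classify("no-such-file-3f9a1c.txt"));
    }
}
```

A caller would then hand a LOCAL_FILE to process(File, Writer, ContextMap) and a REMOTE_URL to process(URLConnection, Writer, ContextMap).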

process

public final void process(URLConnection urlConn,
                          Writer writer,
                          ContextMap context)
                   throws IOException,
                          ProcessingException
Processes the contents of a URL connection, delegating to the process(Reader, Writer, ContextMap) method. Stores a mapping from ContentType.KEY_MIME_TYPE to the document's MIME type in the context.

Parameters:
urlConn - the URL connection to process
writer - the writer to write the processed text to; not closed by this method
context - a map of objects that are made available for processing; must contain a mapping from IOUtils.KEY_LOCAL_CHARSET to the character set to use for local files
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing

toString

public String toString()
Returns a string representation of this object.

Returns:
a textual representation


Copyright © 2003-2004 Christian Siefkes. All Rights Reserved.