de.fu_berlin.ties.classify
Class TextFilter

java.lang.Object
  extended by de.fu_berlin.ties.ConfigurableProcessor
      extended by de.fu_berlin.ties.CollectingProcessor
          extended by de.fu_berlin.ties.classify.TextFilter
All Implemented Interfaces:
Closeable, Processor

public class TextFilter
extends CollectingProcessor

A text filter provides a simple API for classifying text files. All instances of this class share a common TrainableClassifier, which is initialized by the first instance created. This class is meant to be used with NailGun to avoid the cost of starting a new virtual machine for each call and to allow reusing the same classifier instance across multiple calls.
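
The sharing behaviour described above can be illustrated with a minimal sketch. It is not taken from the TIES sources: the class SharedClassifierDemo, its factory parameter, and the initCount counter are hypothetical, and a plain Object stands in for the TrainableClassifier. It only shows the general idea that the first constructed instance creates the shared object and later instances reuse it.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a class whose instances share one lazily created object. */
final class SharedClassifierDemo {
    /** Shared by all instances; created by whichever instance is constructed first. */
    private static Object sharedClassifier;

    /** Counts how often the factory ran, to show that initialization happens once. */
    private static int initCount = 0;

    SharedClassifierDemo(Supplier<Object> factory) {
        // Lock on the class so that two threads cannot both initialize the shared object.
        synchronized (SharedClassifierDemo.class) {
            if (sharedClassifier == null) {
                sharedClassifier = factory.get();
                initCount++;
            }
        }
    }

    /** Returns the object shared by all instances. */
    static synchronized Object classifier() {
        return sharedClassifier;
    }

    /** Returns the number of times the shared object was created. */
    static synchronized int initCount() {
        return initCount;
    }
}
```

In a long-running virtual machine, such as the one NailGun keeps alive, a static field of this kind survives between calls, which is what makes reuse possible.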

The classifier is configured from the provided configuration, using the ClassTrain.CONFIG_SUFFIX_TEXT suffix to allow text-specific settings. The classes to consider for classification are read from the CONFIG_CLASSES parameter. The probability of the very first class will be returned as "score".
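
The difference between "score" and the probability of the predicted class is easy to miss, so here is a small sketch of it. The class ScoreVersusProb, its Result type, and the probability map are hypothetical and only illustrate the rule stated above; how TIES actually computes probabilities is not shown here.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical illustration: "score" is tied to the first class, not to the winner. */
final class ScoreVersusProb {
    /** The three values reported for one classified text. */
    static final class Result {
        final String predictedClass;
        final double score; // probability of the first configured class
        final double prob;  // probability of the predicted class

        Result(String predictedClass, double score, double prob) {
            this.predictedClass = predictedClass;
            this.score = score;
            this.prob = prob;
        }
    }

    /**
     * @param classes       class names in configured order; the first one defines "score"
     * @param probabilities probability per class name (missing entries count as 0.0)
     */
    static Result summarize(List<String> classes, Map<String, Double> probabilities) {
        if (classes.isEmpty()) {
            throw new IllegalArgumentException("no classes configured");
        }
        String first = classes.get(0);
        String predicted = first;
        // Pick the most probable class; ties go to the earlier class in the list.
        for (String candidate : classes) {
            if (probabilities.getOrDefault(candidate, 0.0)
                    > probabilities.getOrDefault(predicted, 0.0)) {
                predicted = candidate;
            }
        }
        return new Result(predicted,
                probabilities.getOrDefault(first, 0.0),
                probabilities.getOrDefault(predicted, 0.0));
    }
}
```

With classes listed as "spam", "ham" and probabilities 0.2 and 0.8, the predicted class is "ham" with a probability of 0.8, while the score stays at 0.2 because it always refers to "spam", the first class.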

This class is meant to be invoked on the command line as the "filter" goal to classify a text file or to train the classifier on it. It supports two commands, "classify" and "train":

classify FILENAME
Classifies the given file, writing a single line of output to System.out:
class=PREDICTED-CLASS score=PROB.-OF-FIRST-CLASS prob=PROB.-OF-PREDICTED-CLASS
train TRUE-CLASS FILENAME
Trains the classifier on the given file, labeled as TRUE-CLASS (which must be one of the configured classes).

Additional parameters after the required arguments are allowed but ignored; other commands will be treated as errors.
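
A caller that runs the "classify" command typically has to read the single output line back in. The following sketch shows one way to do that. The class FilterOutputParser is hypothetical and not part of TIES, and it assumes that class names contain no whitespace. A leading "=" in a value is stripped, so both prob=0.8 and prob==0.8 are accepted.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that splits one line of "classify" output into its fields. */
final class FilterOutputParser {
    /**
     * Parses a line of whitespace-separated key=value tokens and checks that
     * the "class", "score" and "prob" fields are all present.
     */
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new HashMap<>();
        for (String token : line.trim().split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("not a key=value token: " + token);
            }
            String key = token.substring(0, eq);
            String value = token.substring(eq + 1);
            // Tolerate a doubled separator such as "prob==0.8".
            while (value.startsWith("=")) {
                value = value.substring(1);
            }
            fields.put(key, value);
        }
        for (String required : new String[] {"class", "score", "prob"}) {
            if (!fields.containsKey(required)) {
                throw new IllegalArgumentException("missing field: " + required);
            }
        }
        return fields;
    }
}
```

The numeric fields are returned as strings; a caller can convert them with Double.parseDouble once the number format of the actual output has been confirmed.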

Version:
$Revision: 1.7 $, $Date: 2006/10/21 16:03:55 $, $Author: siefkes $
Author:
Christian Siefkes

Field Summary
static String CONFIG_CLASSES
          Configuration key: Names of the classes used to filter text.
 
Constructor Summary
TextFilter()
          Creates a new instance, configured from the default configuration.
TextFilter(TiesConfiguration conf)
          Creates a new instance, configured from the given configuration.
 
Method Summary
 void process(List<String> collected, ContextMap context)
          Processes the collected input arguments.
 String toString()
          Returns a string representation of this object.
 
Methods inherited from class de.fu_berlin.ties.CollectingProcessor
close, process
 
Methods inherited from class de.fu_berlin.ties.ConfigurableProcessor
getConfig
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

CONFIG_CLASSES

public static final String CONFIG_CLASSES
Configuration key: Names of the classes used to filter text. The probability of the very first class will be returned as "score".

See Also:
Constant Field Values
Constructor Detail

TextFilter

public TextFilter()
           throws ProcessingException
Creates a new instance, configured from the default configuration.

Throws:
ProcessingException - if the configured classifier instance cannot be instantiated

TextFilter

public TextFilter(TiesConfiguration conf)
           throws ProcessingException
Creates a new instance, configured from the given configuration.

Parameters:
conf - used to configure this instance
Throws:
ProcessingException - if the configured classifier instance cannot be instantiated
Method Detail

process

public void process(List<String> collected,
                    ContextMap context)
             throws IOException,
                    ProcessingException
Processes the collected input arguments.

Specified by:
process in class CollectingProcessor
Parameters:
collected - a list of Strings containing the collected input arguments
context - a map of objects that are made available for processing; will be empty when called from the CollectingProcessor.close(int) method in this class
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing
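
As a rough illustration of how a list of collected arguments could be checked against the command syntax from the class description, consider the sketch below. It is not the TIES implementation: the class FilterCommand and its fields are hypothetical, and it throws IllegalArgumentException where the real method declares ProcessingException. It follows the documented rules only: "classify" needs a file name, "train" needs a configured class and a file name, extra arguments are ignored, and any other command is an error.

```java
import java.util.List;

/** Hypothetical value type describing one validated "filter" command. */
final class FilterCommand {
    enum Kind { CLASSIFY, TRAIN }

    final Kind kind;
    final String trueClass; // null for CLASSIFY
    final String filename;

    private FilterCommand(Kind kind, String trueClass, String filename) {
        this.kind = kind;
        this.trueClass = trueClass;
        this.filename = filename;
    }

    /**
     * @param collected         the command followed by its arguments
     * @param configuredClasses the class names allowed as TRUE-CLASS
     */
    static FilterCommand parse(List<String> collected, List<String> configuredClasses) {
        if (collected.isEmpty()) {
            throw new IllegalArgumentException("missing command");
        }
        String command = collected.get(0);
        switch (command) {
            case "classify":
                if (collected.size() < 2) {
                    throw new IllegalArgumentException("classify requires FILENAME");
                }
                // Anything after FILENAME is ignored.
                return new FilterCommand(Kind.CLASSIFY, null, collected.get(1));
            case "train":
                if (collected.size() < 3) {
                    throw new IllegalArgumentException("train requires TRUE-CLASS FILENAME");
                }
                String label = collected.get(1);
                if (!configuredClasses.contains(label)) {
                    throw new IllegalArgumentException("unknown class: " + label);
                }
                // Anything after FILENAME is ignored.
                return new FilterCommand(Kind.TRAIN, label, collected.get(2));
            default:
                throw new IllegalArgumentException("unknown command: " + command);
        }
    }
}
```

Validating the arguments before touching the file keeps usage errors separate from I/O errors, which matches the two exception types declared by process.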

toString

public String toString()
Returns a string representation of this object.

Overrides:
toString in class Object
Returns:
a textual representation


Copyright © 2003-2007 Christian Siefkes. All Rights Reserved.