de.fu_berlin.ties.classify
Class ClassTrain

java.lang.Object
  extended by de.fu_berlin.ties.ConfigurableProcessor
      extended by de.fu_berlin.ties.TextProcessor
          extended by de.fu_berlin.ties.classify.ClassTrain
All Implemented Interfaces:
Processor

public class ClassTrain
extends TextProcessor

Classifies a list of files, training the classifier on each error. See classifyAndTrain(FieldContainer, File, String) for a description of input and output formats.

This class does not calculate statistics itself. To get the number of errors during the last 500 classifications, you can run e.g. tail -q --lines 500 FILENAME | grep -v "|+" | wc on the output serialized in DelimSepValues format; the first number printed by wc is the line count, i.e. the error count. This assumes that class names do not start with a "+".
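
The shell pipeline above can also be expressed in Java. The following is only a minimal sketch and not part of TIES: the class ErrorCount, the method countErrors and the sample rows are hypothetical, and the row layout (file, true class, classification, separated by "|") is an assumption inferred from the grep pattern rather than a documented format.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the TIES API.
class ErrorCount {
    /** Marker used in the classification field for correct predictions. */
    static final String CORRECT = "+";

    /**
     * Counts the lines among the last {@code window} lines that do not
     * contain "|+", mirroring: tail --lines WINDOW | grep -v "|+" | wc -l
     */
    static int countErrors(List<String> lines, int window) {
        int start = Math.max(0, lines.size() - window);
        int errors = 0;
        for (String line : lines.subList(start, lines.size())) {
            if (!line.contains("|" + CORRECT)) {
                errors++;
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // Assumed row layout: file|true class|classification
        List<String> sample = Arrays.asList(
            "a.txt|spam|+",
            "b.txt|ham|spam",
            "c.txt|ham|+");
        System.out.println(countErrors(sample, 500)); // prints 1
    }
}
```

Like the shell version, this treats any line containing "|+" as a correct prediction, so it miscounts if a class name or file name starts with "+".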

Instances of this class are thread-safe.

Version:
$Revision: 1.13 $, $Date: 2004/11/17 09:15:10 $, $Author: siefkes $
Author:
Christian Siefkes

Field Summary
static String CONFIG_FILE_EXT
          Configuration key: The extension to append to file names given via the File key (if any).
static String CORRECT_CLASS
          Value of the KEY_CLASSIFICATION field for correct predictions: "+".
static String KEY_CLASS
          Serialization key for the correct class.
static String KEY_CLASSIFICATION
          Serialization key for the result of the classification: either CORRECT_CLASS if the correct class was predicted or the wrongly predicted class in case of an error.
static String KEY_FILE
          Serialization key for the name of the file to classify.
 
Fields inherited from class de.fu_berlin.ties.TextProcessor
CONFIG_POST, KEY_DIRECTORY, KEY_LOCAL_NAME, KEY_OUT_DIRECTORY, KEY_URL
 
Constructor Summary
ClassTrain()
          Creates a new instance using a default extension and the standard configuration.
ClassTrain(String outExt)
          Creates a new instance using the standard configuration.
ClassTrain(String outExt, TiesConfiguration conf)
          Creates a new instance from the provided configuration.
ClassTrain(String outExt, TiesConfiguration conf, TokenizerFactory factory, String fileExt)
          Creates a new instance.
 
Method Summary
 FieldContainer classifyAndTrain(FieldContainer filesToClassify, File directory, String charset)
          Classifies a list of files, training the classifier on each error.
protected  void doProcess(Reader reader, Writer writer, ContextMap context)
          Delegates to classifyAndTrain(FieldContainer, File, String).
 
Methods inherited from class de.fu_berlin.ties.TextProcessor
getOutFileExt, process, process, process, process, process, process, toString
 
Methods inherited from class de.fu_berlin.ties.ConfigurableProcessor
getConfig
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

CONFIG_FILE_EXT

public static final String CONFIG_FILE_EXT
Configuration key: The extension to append to file names given via the File key (if any).

See Also:
Constant Field Values

KEY_FILE

public static final String KEY_FILE
Serialization key for the name of the file to classify.

See Also:
Constant Field Values

KEY_CLASS

public static final String KEY_CLASS
Serialization key for the correct class.

See Also:
Constant Field Values

KEY_CLASSIFICATION

public static final String KEY_CLASSIFICATION
Serialization key for the result of the classification: either CORRECT_CLASS if the correct class was predicted or the wrongly predicted class in case of an error.

See Also:
Constant Field Values

CORRECT_CLASS

public static final String CORRECT_CLASS
Value of the KEY_CLASSIFICATION field for correct predictions: "+".

See Also:
Constant Field Values
Constructor Detail

ClassTrain

public ClassTrain()
Creates a new instance using a default extension and the standard configuration.


ClassTrain

public ClassTrain(String outExt)
Creates a new instance using the standard configuration.

Parameters:
outExt - the extension to use for output files

ClassTrain

public ClassTrain(String outExt,
                  TiesConfiguration conf)
Creates a new instance from the provided configuration.

Parameters:
outExt - the extension to use for output files
conf - used to configure this instance; if null, the standard configuration is used

ClassTrain

public ClassTrain(String outExt,
                  TiesConfiguration conf,
                  TokenizerFactory factory,
                  String fileExt)
Creates a new instance.

Parameters:
outExt - the extension to use for output files
conf - used to configure this instance; if null, the standard configuration is used
factory - used to create tokenizers
fileExt - the extension to append to file names given via the File key; null or the empty string if none should be appended
Method Detail

classifyAndTrain

public FieldContainer classifyAndTrain(FieldContainer filesToClassify,
                                       File directory,
                                       String charset)
                                throws IOException,
                                       ProcessingException
Classifies a list of files, training the classifier on each error.

Parameters:
filesToClassify - a field container of the files to process; each entry must contain a KEY_FILE field giving the name of the file to classify and a KEY_CLASS field giving the true class of the file
directory - file names are relative to this directory; if null they are relative to the working directory
charset - the character set of the files to process
Returns:
a field container of the classification results; in addition to the fields given above, each entry will contain the classification result in a KEY_CLASSIFICATION field: CORRECT_CLASS in case of a correct classification, the name of the wrongly predicted class otherwise
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing
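
The classify-then-train-on-error behaviour described for this method can be sketched as follows. This is an illustration under stated assumptions, not the TIES implementation: the Classifier interface, the MemorizingClassifier toy class and the use of String[] pairs in place of FieldContainer entries are all hypothetical. Only the "+" marker for correct predictions is taken from the CORRECT_CLASS field description.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of the TIES API.
class TrainOnErrorSketch {
    /** Value recorded for correct predictions, as documented for CORRECT_CLASS. */
    static final String CORRECT_CLASS = "+";

    /** Hypothetical stand-in for a trainable classifier. */
    interface Classifier {
        String classify(String text);
        void train(String text, String trueClass);
    }

    /** Toy classifier that memorizes the class of each text it was trained on. */
    static final class MemorizingClassifier implements Classifier {
        private final Map<String, String> seen = new HashMap<>();

        public String classify(String text) {
            return seen.getOrDefault(text, "unknown");
        }

        public void train(String text, String trueClass) {
            seen.put(text, trueClass);
        }
    }

    /**
     * Classifies each {text, trueClass} pair in order; the classifier is
     * trained only when its prediction was wrong. Returns one result per
     * item: "+" if correct, otherwise the wrongly predicted class.
     */
    static List<String> classifyAndTrain(Classifier classifier, List<String[]> items) {
        List<String> results = new ArrayList<>();
        for (String[] item : items) {
            String text = item[0];
            String trueClass = item[1];
            String predicted = classifier.classify(text);
            if (predicted.equals(trueClass)) {
                results.add(CORRECT_CLASS);
            } else {
                results.add(predicted);
                classifier.train(text, trueClass); // train on error only
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<String[]> items = new ArrayList<>();
        items.add(new String[] {"buy now", "spam"});
        items.add(new String[] {"buy now", "spam"});
        // The first item is misclassified and triggers training; the second is then correct.
        System.out.println(classifyAndTrain(new MemorizingClassifier(), items)); // prints [unknown, +]
    }
}
```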

doProcess

protected void doProcess(Reader reader,
                         Writer writer,
                         ContextMap context)
                  throws IOException,
                         ProcessingException
Delegates to classifyAndTrain(FieldContainer, File, String).

Specified by:
doProcess in class TextProcessor
Parameters:
reader - the FieldContainer of files to classify is read from this reader; not closed by this method
writer - the resulting FieldContainer containing classification results is serialized to this writer; not closed by this method
context - a map of objects that are made available for processing; the IOUtils.KEY_LOCAL_CHARSET entry is used to determine the character set of the listed files; the TextProcessor.KEY_DIRECTORY entry (a File), if given, is the directory that relative file names are resolved against (otherwise the current working directory is used)
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing


Copyright © 2003-2004 Christian Siefkes. All Rights Reserved.