java.lang.Object
  de.fu_berlin.ties.ConfigurableProcessor
    de.fu_berlin.ties.TextProcessor
      de.fu_berlin.ties.classify.ClassTrain
public class ClassTrain extends TextProcessor
Classifies a list of files, training the classifier on each error. See classifyAndTrain(FieldContainer, File, String) for a description of the input and output formats.
This class does not calculate statistics; you can do so by calling e.g.
tail -q --lines 500 FILENAME|grep -v "|+"|wc
on the output serialized in DelimSepValues format to get the number of errors during the last 500 classifications (the first number printed by wc, the line count; this assumes that class names do not start with a "+").
Instances of this class are thread-safe.
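The tail/grep pipeline above can be approximated in plain Java when a shell is not available. The following is a minimal sketch, not part of the TIES API: the class name TailErrorCount, the 500-line window, and the UTF-8 encoding are assumptions made for illustration. Like the shell command, it treats every line in the window that does not contain the substring "|+" as an error, so header or blank lines inside the window are counted too.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/** Hypothetical helper, not part of the TIES API. */
final class TailErrorCount {

    /** Substring that grep -v "|+" filters out: a delimiter followed by CORRECT_CLASS. */
    private static final String CORRECT_MARKER = "|+";

    /**
     * Counts the lines among the last {@code window} lines that do not
     * contain the marker, mirroring tail | grep -v | wc.
     */
    static int countErrors(List<String> lines, int window) {
        int start = Math.max(0, lines.size() - window);
        int errors = 0;
        for (String line : lines.subList(start, lines.size())) {
            if (!line.contains(CORRECT_MARKER)) {
                errors++;
            }
        }
        return errors;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TailErrorCount FILENAME");
            return;
        }
        // Assumption: the serialized output file is UTF-8 encoded.
        List<String> lines =
                Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        System.out.println(countErrors(lines, 500));
    }
}
```

Reading the whole file is adequate for moderately sized result files; for very large outputs the shell pipeline is likely the cheaper option.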
Field Summary | |
---|---|
static String |
CONFIG_FILE_EXT
Configuration key: The extension to append to file names given via the File key (if any). |
static String |
CORRECT_CLASS
Value of the KEY_CLASSIFICATION field for correct predictions:
"+". |
static String |
KEY_CLASS
Serialization key for the correct class. |
static String |
KEY_CLASSIFICATION
Serialization key for the result of the classification: either CORRECT_CLASS if the correct class was predicted or the
wrongly predicted class in case of an error. |
static String |
KEY_FILE
Serialization key for the name of the file to classify. |
Fields inherited from class de.fu_berlin.ties.TextProcessor |
---|
CONFIG_POST, KEY_DIRECTORY, KEY_LOCAL_NAME, KEY_OUT_DIRECTORY, KEY_URL |
Constructor Summary | |
---|---|
ClassTrain()
Creates a new instance using a default extension and the standard configuration. |
|
ClassTrain(String outExt)
Creates a new instance using the standard configuration. |
|
ClassTrain(String outExt,
TiesConfiguration conf)
Creates a new instance from the provided configuration. |
|
ClassTrain(String outExt,
TiesConfiguration conf,
TokenizerFactory factory,
String fileExt)
Creates a new instance. |
Method Summary | |
---|---|
FieldContainer |
classifyAndTrain(FieldContainer filesToClassify,
File directory,
String charset)
Classifies a list of files, training the classifier on each error. |
protected void |
doProcess(Reader reader,
Writer writer,
ContextMap context)
Delegates to classifyAndTrain(FieldContainer, File, String). |
Methods inherited from class de.fu_berlin.ties.TextProcessor |
---|
getOutFileExt, process, process, process, process, process, process, toString |
Methods inherited from class de.fu_berlin.ties.ConfigurableProcessor |
---|
getConfig |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
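public static final String CONFIG_FILE_EXT
Configuration key: The extension to append to file names given via the File key (if any).
public static final String KEY_FILE
Serialization key for the name of the file to classify.
public static final String KEY_CLASS
Serialization key for the correct class.
public static final String KEY_CLASSIFICATION
Serialization key for the result of the classification: either CORRECT_CLASS if the correct class was predicted or the wrongly predicted class in case of an error.
public static final String CORRECT_CLASS
Value of the KEY_CLASSIFICATION field for correct predictions: "+".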
Constructor Detail |
---|
public ClassTrain()
Creates a new instance using a default extension and the standard configuration.
public ClassTrain(String outExt)
Creates a new instance using the standard configuration.
Parameters:
outExt - the extension to use for output files
public ClassTrain(String outExt, TiesConfiguration conf)
Creates a new instance from the provided configuration.
Parameters:
outExt - the extension to use for output files
conf - used to configure this instance; if null, the standard configuration is used
public ClassTrain(String outExt, TiesConfiguration conf, TokenizerFactory factory, String fileExt)
Creates a new instance.
Parameters:
outExt - the extension to use for output files
conf - used to configure this instance; if null, the standard configuration is used
factory - used to create tokenizers
fileExt - the extension to append to file names given via the File key; null or the empty string if none should be appended
Method Detail |
---|
public FieldContainer classifyAndTrain(FieldContainer filesToClassify, File directory, String charset) throws IOException, ProcessingException
Classifies a list of files, training the classifier on each error.
Parameters:
filesToClassify - a field container of the files to process; each entry must contain a KEY_FILE field giving the name of the file to classify and a KEY_CLASS field giving the true class of the file
directory - file names are relative to this directory; if null, they are relative to the working directory
charset - the character set of the files to process
Returns:
a field container of classification results with a KEY_CLASSIFICATION field: CORRECT_CLASS in case of a correct classification, the name of the wrongly predicted class otherwise
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing
protected void doProcess(Reader reader, Writer writer, ContextMap context) throws IOException, ProcessingException
Delegates to classifyAndTrain(FieldContainer, File, String).
doProcess in class TextProcessor
Parameters:
reader - the FieldContainer of files to classify is read from this reader; not closed by this method
writer - the resulting FieldContainer containing classification results is serialized to this writer; not closed by this method
context - a map of objects that are made available for processing; the IOUtils.KEY_LOCAL_CHARSET is used to determine the character set of the listed files; the TextProcessor.KEY_DIRECTORY File determines the source of relative file names, if given (otherwise the current working directory is used)
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing
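The contract of classifyAndTrain(FieldContainer, File, String) described above (classify each item, report "+" on a correct prediction or the wrongly predicted class otherwise, and train on each error) can be illustrated with a self-contained sketch. Nothing here is TIES code: the Classifier interface, the MemorizingClassifier toy, and the use of in-memory strings instead of files are hypothetical stand-ins, and how the real classifier is trained internally is not stated in this documentation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative train-on-error loop; hypothetical, not the TIES implementation. */
final class TrainOnErrorSketch {

    /** Same value as ClassTrain.CORRECT_CLASS according to the field documentation. */
    static final String CORRECT_CLASS = "+";

    /** Hypothetical minimal classifier interface (an assumption for this sketch). */
    interface Classifier {
        String classify(String text);
        void train(String text, String trueClass);
    }

    /** Toy classifier: remembers the class of each trained text, else uses a fallback. */
    static final class MemorizingClassifier implements Classifier {
        private final Map<String, String> seen = new HashMap<>();
        private final String fallback;

        MemorizingClassifier(String fallback) {
            this.fallback = fallback;
        }

        public String classify(String text) {
            return seen.getOrDefault(text, fallback);
        }

        public void train(String text, String trueClass) {
            seen.put(text, trueClass);
        }
    }

    /**
     * Each element of labeledTexts is {text, trueClass}. Returns one result per
     * element: CORRECT_CLASS, or the wrongly predicted class after which the
     * classifier is trained on the true class.
     */
    static List<String> classifyAndTrain(Classifier classifier,
                                         List<String[]> labeledTexts) {
        List<String> results = new ArrayList<>();
        for (String[] item : labeledTexts) {
            String text = item[0];
            String trueClass = item[1];
            String predicted = classifier.classify(text);
            if (predicted.equals(trueClass)) {
                results.add(CORRECT_CLASS);
            } else {
                results.add(predicted);
                classifier.train(text, trueClass); // train only on errors
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<String[]> data = new ArrayList<>();
        data.add(new String[] {"buy now", "spam"});
        data.add(new String[] {"buy now", "spam"});
        data.add(new String[] {"meeting at noon", "ham"});
        System.out.println(classifyAndTrain(new MemorizingClassifier("ham"), data));
    }
}
```

In the sample run the first item is misclassified and triggers training, so the repeated item is then predicted correctly; this is the behavior the error count over the last 500 classifications is meant to track.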