java.lang.Object
de.fu_berlin.ties.ConfigurableProcessor
de.fu_berlin.ties.TextProcessor
de.fu_berlin.ties.classify.ClassTrain
public class ClassTrain
Classifies a list of files, training the classifier on each error if the
true class is provided. See
classifyAndTrain(FieldContainer, File, String, String)
for a description of the input and output formats.
This class does not calculate statistics; you can do so by calling, e.g.,
tail -q --lines 500 FILENAME|grep -v "|+"|wc
on the output serialized in DelimSepValues format to get the number of
errors during the last 500 classifications (assuming that classes do not
start with a "+" and that the true class is known for all files).
Instances of this class are not thread-safe and must be synchronized externally, if required.
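The shell pipeline above can also be approximated in Java. The following is a minimal sketch, not part of the TIES API: the class name RecentErrorCounter and its method are hypothetical, and it assumes (as the grep pattern does) that a correctly classified record contains the literal text "|+". The column order in the usage note is likewise only an illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/** Hypothetical helper; mirrors tail --lines N FILE | grep -v "|+" | wc -l. */
final class RecentErrorCounter {

    private RecentErrorCounter() { }

    /**
     * Counts the lines among the last lastN lines that do NOT contain "|+",
     * i.e. the records that are not marked as correct predictions.
     */
    static long countRecentErrors(List<String> lines, int lastN) {
        // Equivalent of "tail": keep only the last lastN lines.
        int from = Math.max(0, lines.size() - lastN);
        return lines.subList(from, lines.size()).stream()
                // Equivalent of grep -v "|+": drop records marked as correct.
                .filter(line -> !line.contains("|+"))
                .count();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: RecentErrorCounter FILENAME");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println(countRecentErrors(lines, 500));
    }
}
```

For example, given the four illustrative records "a.txt|spam|+", "b.txt|ham|spam", "c.txt|ham|+" and "d.txt|spam|ham", the method counts two errors. Like the shell command, it also counts any header or comment line that lacks "|+", so the same caveats apply.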
Field Summary | |
---|---|
static String | CONFIG_FILE_EXT - Configuration key: The extension to append to file names given via the File key (if any). |
static String | CONFIG_SUFFIX_TEXT - Configuration suffix used for text classification-specific settings. |
static String | CORRECT_CLASS - Value of the KEY_CLASSIFICATION field for correct predictions: "+". |
static String | KEY_CLASS - Serialization key for the correct class. |
static String | KEY_CLASSIFICATION - Serialization key for the result of the classification: either CORRECT_CLASS if the correct class was predicted or the wrongly predicted class in case of an error. |
static String | KEY_FILE - Serialization key for the name of the file to classify. |
Fields inherited from class de.fu_berlin.ties.TextProcessor |
---|
CONFIG_POST, KEY_DIRECTORY, KEY_LOCAL_NAME, KEY_OUT_DIRECTORY, KEY_URL |
Constructor Summary | |
---|---|
ClassTrain() | Creates a new instance using a default extension and the standard configuration. |
ClassTrain(String outExt) | Creates a new instance using the standard configuration. |
ClassTrain(String outExt, TiesConfiguration conf) | Creates a new instance from the provided configuration. |
ClassTrain(String outExt, TiesConfiguration conf, FeatureExtractor featureExt, Tuner myTuner, String fileExt, String classifierFile, boolean doReUse, boolean doStore, boolean doTestOnly) | Creates a new instance. |
Method Summary | |
---|---|
FieldContainer | classifyAndTrain(FieldContainer filesToClassify, File directory, String baseName, String charset) - Classifies a list of files, training the classifier on each error if the true class is known. |
void | close(int errorCount) - Closes this instance, releasing all resources and stopping any background threads. |
protected void | doProcess(Reader reader, Writer writer, ContextMap context) - Delegates to classifyAndTrain(FieldContainer, File, String, String). |
Methods inherited from class de.fu_berlin.ties.TextProcessor |
---|
getOutFileExt, process, process, process, process, process, process, toString |
Methods inherited from class de.fu_berlin.ties.ConfigurableProcessor |
---|
getConfig |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
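public static final String CONFIG_FILE_EXT
Configuration key: The extension to append to file names given via the File key (if any).
public static final String CONFIG_SUFFIX_TEXT
Configuration suffix used for text classification-specific settings.
public static final String KEY_FILE
Serialization key for the name of the file to classify.
public static final String KEY_CLASS
Serialization key for the correct class.
public static final String KEY_CLASSIFICATION
Serialization key for the result of the classification: either CORRECT_CLASS if the correct class was predicted or the wrongly predicted class in case of an error.
public static final String CORRECT_CLASS
Value of the KEY_CLASSIFICATION field for correct predictions: "+".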
Constructor Detail |
---|
public ClassTrain() throws ProcessingException
Creates a new instance using a default extension and the standard configuration.
Throws:
ProcessingException - if an error occurs while initializing this instance
public ClassTrain(String outExt) throws ProcessingException
Creates a new instance using the standard configuration.
Parameters:
outExt - the extension to use for output files
Throws:
ProcessingException - if an error occurs while initializing this instance
public ClassTrain(String outExt, TiesConfiguration conf) throws ProcessingException
Creates a new instance from the provided configuration.
Parameters:
outExt - the extension to use for output files
conf - used to configure this instance; if null, the standard configuration is used
Throws:
ProcessingException - if an error occurs while initializing this instance
public ClassTrain(String outExt, TiesConfiguration conf, FeatureExtractor featureExt, Tuner myTuner, String fileExt, String classifierFile, boolean doReUse, boolean doStore, boolean doTestOnly)
Creates a new instance.
Parameters:
outExt - the extension to use for output files
conf - used to configure this instance; if null, the standard configuration is used
featureExt - used to convert texts into feature vectors
myTuner - used to control TUNE training (iterative training)
fileExt - the extension to append to file names given via the File key; null or the empty string if none should be appended
classifierFile - name of the file used for storing the classifier
doReUse - whether to re-use classifiers between several runs (incl. classifiers stored in the classifierFile, if it exists)
doStore - whether to store the final classifier in the classifierFile
doTestOnly - if true, the classifier is used only for prediction; no training takes place
Method Detail |
---|
public FieldContainer classifyAndTrain(FieldContainer filesToClassify, File directory, String baseName, String charset) throws IOException, ProcessingException
Classifies a list of files, training the classifier on each error if the true class is known.
Parameters:
filesToClassify - a field container of the files to process; each entry must contain a KEY_FILE field giving the name of the file to classify; if it also contains a KEY_CLASS field giving the true class of the file, the classifier is trained in case of an error
directory - file names are relative to this directory; if null, they are relative to the working directory
baseName - the base name of the file listing the files to classify
charset - the character set of the files to process
Returns:
a field container with the classification results; the KEY_CLASSIFICATION field of each entry contains CORRECT_CLASS in case of a classification that is known to be correct (this requires that the true class is given in the KEY_CLASS field, otherwise we don't know whether a prediction is correct), and the name of the predicted class otherwise
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing
public void close(int errorCount) throws IOException
Closes this instance, releasing all resources and stopping any background threads.
close in interface Closeable
Parameters:
errorCount - the number of errors (exceptions) that occurred during calls to this instance (0 if none)
Throws:
IOException - if an I/O error occurs
protected void doProcess(Reader reader, Writer writer, ContextMap context) throws IOException, ProcessingException
Delegates to classifyAndTrain(FieldContainer, File, String, String).
doProcess in class TextProcessor
Parameters:
reader - the FieldContainer of files to classify is read from this reader; not closed by this method
writer - the resulting FieldContainer containing classification results is serialized to this writer; not closed by this method
context - a map of objects that are made available for processing; the IOUtils.KEY_LOCAL_CHARSET is used to determine the character set of the listed files; the TextProcessor.KEY_DIRECTORY File determines the source of relative file names, if given (otherwise the current working directory is used)
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing
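The per-entry conventions documented for classifyAndTrain (file name resolution, the value written to the KEY_CLASSIFICATION field, and when training is triggered) can be illustrated with a small stand-alone sketch. This is not the TIES implementation: the class ClassifyAndTrainSketch and all of its methods are hypothetical, and it assumes that the extension is appended verbatim (the documentation does not say whether a separating dot is inserted).

```java
import java.io.File;

/** Hypothetical illustration of the documented conventions; not TIES code. */
final class ClassifyAndTrainSketch {

    /** Documented value of CORRECT_CLASS. */
    static final String CORRECT_CLASS = "+";

    private ClassifyAndTrainSketch() { }

    /**
     * Resolves a listed file name: the extension is appended unless it is
     * null or empty (assumption: appended verbatim), and the name is taken
     * relative to the given directory, or to the working directory if the
     * directory is null.
     */
    static File resolve(File directory, String fileName, String fileExt) {
        String name = (fileExt == null || fileExt.isEmpty())
                ? fileName : fileName + fileExt;
        return (directory == null) ? new File(name) : new File(directory, name);
    }

    /**
     * Value of the classification field: CORRECT_CLASS if the true class is
     * known and matches the prediction, the predicted class otherwise.
     */
    static String classificationField(String predicted, String trueClass) {
        if (trueClass != null && trueClass.equals(predicted)) {
            return CORRECT_CLASS;
        }
        return predicted;
    }

    /**
     * Training happens only on errors, which requires a known true class,
     * and never in test-only mode.
     */
    static boolean shouldTrain(String predicted, String trueClass,
                               boolean testOnly) {
        return !testOnly && trueClass != null && !trueClass.equals(predicted);
    }
}
```

When the true class is unknown, the sketch reports the predicted class and never trains, which matches the note that correctness cannot be determined without the KEY_CLASS field.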