java.lang.Object
  de.fu_berlin.ties.ConfigurableProcessor
    de.fu_berlin.ties.TextProcessor
      de.fu_berlin.ties.extract.TrainEval
public class TrainEval
Trains an extractor and evaluates extraction quality. Processes shuffle
files (as generated by ShuffleGenerator) that contain the files to use for training and evaluation.
For each of these files, a corresponding answer key (*.ans) must exist.
Instances of this class are not thread-safe.
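How the files listed in a shuffle file might be divided between training and evaluation can be pictured with the minimal sketch below. It is not the TIES implementation: the class SplitSketch, the rounding via Math.round, and the assumption that the training portion is taken from the front of the shuffled list are illustrative choices. It follows the constructor documentation, where trainingSplit must lie between 0 and 1 and a testingSplit of -1 means "all remaining documents".

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of the TIES API.
public class SplitSketch {

    /** The two portions of a shuffled corpus. */
    public static final class Split {
        public final List<String> training;
        public final List<String> testing;

        Split(List<String> training, List<String> testing) {
            this.training = training;
            this.testing = testing;
        }
    }

    /**
     * Splits shuffled file names into a training and a testing portion.
     *
     * @param files         file names in shuffled order
     * @param trainingSplit fraction used for training, from 0 to 1
     * @param testingSplit  fraction used for testing, or -1 for all remaining files
     */
    public static Split split(List<String> files, float trainingSplit, float testingSplit) {
        if (trainingSplit < 0f || trainingSplit > 1f) {
            throw new IllegalArgumentException("trainingSplit is not in [0, 1]: " + trainingSplit);
        }
        int total = files.size();
        int trainCount = Math.round(total * trainingSplit); // assumed rounding rule
        int remaining = total - trainCount;
        int testCount = (testingSplit == -1f)
                ? remaining // -1: use all remaining files
                : Math.max(0, Math.min(remaining, Math.round(total * testingSplit)));
        return new Split(files.subList(0, trainCount),
                files.subList(trainCount, trainCount + testCount));
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h", "i", "j");
        Split split = split(files, 0.7f, -1f);
        System.out.println(split.training + " / " + split.testing);
    }
}
```

With ten files, a training split of 0.7 and a testing split of -1, the sketch assigns the first seven files to training and the last three to evaluation.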
Nested Class Summary | | |
---|---|---|
static class | TrainEval.Results | An inner class wrapping the results of a training + evaluation run. |
Field Summary | | |
---|---|---|
static String | CONFIG_FEEDBACK | Configuration key: If true, a fully incremental setup is used where the trainer is trained on each document after the extractor processed it. |
static String | CONFIG_SENTENCE_TUNE | Configuration key: The maximum number of iterations used for TUNE training the sentence classifier; if 0 or negative, the value of CONFIG_TUNE is used. |
static String | CONFIG_TEST_SPLIT | Configuration key: The percentage of a corpus to use for testing (evaluation). |
static String | CONFIG_TRAIN_SPLIT | Configuration key: The percentage of a corpus to use for training. |
static String | CONFIG_TUNE | Configuration key: The maximum number of iterations used for TUNE (train until no error) training; if 1, training is incremental. |
static String | CONFIG_TUNE_EACH | Configuration key: Whether to measure results after each TUNE iteration or only at the end of training. |
static String | CONFIG_TUNE_SINCE | Configuration key: The training iteration after which to evaluate results for the first time if CONFIG_TUNE_EACH is enabled. |
static String | CONFIG_TUNE_STOP | Configuration key: TUNE training is stopped if the training accuracy didn't improve for the specified number of iterations. |
static String | KEY_ITERATION | Serialization key for the number of the iteration (when TUNE training). |
static String | KEY_RUN | Serialization key for the number of the run. |
static String | KEY_TYPE | Serialization key for the type (either "Train" or "Eval"). |
static String | TYPE_EVAL | Serialization value for the "Eval" type. |
static String | TYPE_TRAIN | Serialization value for the "Train" type. |
Fields inherited from class de.fu_berlin.ties.TextProcessor |
---|
CONFIG_POST, KEY_DIRECTORY, KEY_LOCAL_NAME, KEY_OUT_DIRECTORY, KEY_URL |
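The TUNE-related settings above depend on each other in a few documented ways: a sentence-classifier limit of 0 or less falls back to the general TUNE limit, a single TUNE iteration means incremental training, and feedback may not be combined with per-iteration measuring over several TUNE runs. The sketch below encodes those rules in a hypothetical class TuneSettingsSketch. It is not the TIES source, and whether TrainEval actually throws for the feedback conflict is an assumption (its documented IllegalArgumentException cases only cover trainingSplit and tuneRuns).

```java
// Hypothetical illustration only; the checks mirror the documented constructor
// parameters of TrainEval but are not taken from the TIES source.
public class TuneSettingsSketch {
    private final int tuneRuns;
    private final int sentenceTune;

    public TuneSettingsSketch(int tuneRuns, int sentenceTune,
                              boolean measureEachTune, boolean giveFeedback) {
        if (tuneRuns <= 0) {
            // documented: non-positive tuneRuns are rejected
            throw new IllegalArgumentException("tuneRuns must be positive: " + tuneRuns);
        }
        if (giveFeedback && measureEachTune && tuneRuns > 1) {
            // assumed check: measuring after each TUNE iteration while the trainer
            // also learns from every processed document would evaluate on training data
            throw new IllegalArgumentException(
                    "giveFeedback and measureEachTUNE cannot both be set when tuneRuns > 1");
        }
        this.tuneRuns = tuneRuns;
        this.sentenceTune = sentenceTune;
    }

    /** Limit for the sentence classifier; 0 or negative falls back to tuneRuns. */
    public int effectiveSentenceTune() {
        return sentenceTune > 0 ? sentenceTune : tuneRuns;
    }

    /** A single TUNE iteration amounts to plain incremental training. */
    public boolean isIncremental() {
        return tuneRuns == 1;
    }

    public static void main(String[] args) {
        TuneSettingsSketch settings = new TuneSettingsSketch(10, 0, true, false);
        System.out.println(settings.effectiveSentenceTune()); // 10
    }
}
```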
Constructor Summary | |
---|---|
TrainEval() | Creates a new instance, using a default extension and the standard configuration. |
TrainEval(String outExt) | Creates a new instance, using the standard configuration. |
TrainEval(String outExt, float trainingSplit, float testingSplit, int tuneRuns, int tuneStopAfter, boolean measureEachTUNE, int startMeasureTUNE, List tuneEvalList, int sentenceTUNE, boolean giveFeedback, TiesConfiguration config) | Creates a new instance. |
TrainEval(String outExt, TiesConfiguration config) | Creates a new instance. |
Method Summary | | |
---|---|---|
void | close(int errorCount) | Closes this instance, releasing all resources and stopping any background threads. |
protected void | doProcess(Reader reader, Writer writer, ContextMap context) | Processes the contents of a reader, writing a modified version to a writer. |
float | getTestSplit() | Returns the percentage of a corpus to use for testing (evaluation). |
float | getTrainSplit() | Returns the percentage of a corpus to use for training; the remaining documents (1 - training split) are used for evaluation. |
protected Extractor | initExtractor(Trainer trainer) | Creates and initializes an extractor to use for an evaluation run, re-using the components of the provided trainer. |
protected Trainer | initTrainer(File runDirectory) | Creates and initializes a trainer to use for an evaluation run, configured from the stored configuration. |
String | toString() | Returns a string representation of this object. |
TrainEval.Results | trainAndEval(String[] files, File inDirectory, File outDirectory, String baseName, Writer writer) | Processes an array of files. |
Methods inherited from class de.fu_berlin.ties.TextProcessor |
---|
getOutFileExt, process, process, process, process, process, process |
Methods inherited from class de.fu_berlin.ties.ConfigurableProcessor |
---|
getConfig |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
public static final String CONFIG_TRAIN_SPLIT
public static final String CONFIG_TEST_SPLIT
public static final String CONFIG_FEEDBACK
Configuration key: If true, a fully incremental setup is used where the trainer is trained on each document after the extractor processed it.
public static final String CONFIG_TUNE
public static final String CONFIG_TUNE_STOP
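public static final String CONFIG_SENTENCE_TUNE
Configuration key: The maximum number of iterations used for TUNE training the sentence classifier; if 0 or negative, the value of CONFIG_TUNE is used.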
public static final String CONFIG_TUNE_EACH
public static final String CONFIG_TUNE_SINCE
Configuration key: The training iteration after which to evaluate results for the first time if CONFIG_TUNE_EACH is enabled.
public static final String KEY_ITERATION
public static final String KEY_RUN
public static final String KEY_TYPE
public static final String TYPE_TRAIN
public static final String TYPE_EVAL
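The interplay of CONFIG_TUNE (maximum number of iterations) and CONFIG_TUNE_STOP (stop once training accuracy has not improved for that many iterations) can be sketched as a loop with early stopping. This is a hypothetical illustration, not the TIES trainer: the class TuneLoopSketch, the callback trainOnePass, and the assumption that training also stops as soon as accuracy reaches 1.0 ("train until no error") are all illustrative.

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical illustration of TUNE training with early stopping; not the TIES source.
public class TuneLoopSketch {

    /**
     * @param maxIterations upper bound on training passes (cf. CONFIG_TUNE)
     * @param stopAfter     passes without improvement before stopping (cf. CONFIG_TUNE_STOP)
     * @param trainOnePass  trains one pass (numbered from 1) and returns its training accuracy
     * @return the number of passes actually performed
     */
    public static int runTune(int maxIterations, int stopAfter, IntToDoubleFunction trainOnePass) {
        double bestAccuracy = Double.NEGATIVE_INFINITY;
        int passesWithoutImprovement = 0;
        int iteration = 0;
        while (iteration < maxIterations) {
            iteration++;
            double accuracy = trainOnePass.applyAsDouble(iteration);
            if (accuracy >= 1.0) {
                break; // assumed: no training errors left, so TUNE is done
            }
            if (accuracy > bestAccuracy) {
                bestAccuracy = accuracy;
                passesWithoutImprovement = 0;
            } else if (++passesWithoutImprovement >= stopAfter) {
                break; // accuracy has stalled for stopAfter passes
            }
        }
        return iteration;
    }

    public static void main(String[] args) {
        double[] accuracies = {0.5, 0.6, 0.6, 0.6, 0.9};
        int passes = runTune(5, 2, i -> accuracies[i - 1]);
        System.out.println(passes); // 4
    }
}
```

With a maximum of one iteration the loop performs a single pass, which matches the documented meaning of a TUNE value of 1 (incremental training).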
Constructor Detail |
---|
public TrainEval() throws IllegalArgumentException, ClassCastException, NoSuchElementException
Creates a new instance, using a default extension and the standard configuration.
Throws:
IllegalArgumentException - if the configured values are outside the allowed ranges
ClassCastException - if the configured numeric values cannot be parsed
NoSuchElementException - if one of the required values is missing from the configuration

public TrainEval(String outExt) throws IllegalArgumentException, ClassCastException, NoSuchElementException
Creates a new instance, using the standard configuration.
Parameters:
outExt - the extension to use for output files
Throws:
IllegalArgumentException - if the configured values are outside the allowed ranges
ClassCastException - if the configured numeric values cannot be parsed
NoSuchElementException - if one of the required values is missing from the configuration

public TrainEval(String outExt, TiesConfiguration config) throws IllegalArgumentException, ClassCastException, NoSuchElementException
Creates a new instance.
Parameters:
outExt - the extension to use for output files
config - used to configure this instance
Throws:
IllegalArgumentException - if the configured values are outside the allowed ranges
ClassCastException - if the configured numeric values cannot be parsed
NoSuchElementException - if one of the required values is missing from the configuration

public TrainEval(String outExt, float trainingSplit, float testingSplit, int tuneRuns, int tuneStopAfter, boolean measureEachTUNE, int startMeasureTUNE, List tuneEvalList, int sentenceTUNE, boolean giveFeedback, TiesConfiguration config) throws IllegalArgumentException
Creates a new instance.
Parameters:
outExt - the extension to use for output files
trainingSplit - the percentage of a corpus to use for training
testingSplit - the percentage of a corpus to use for testing (evaluation); if -1, all remaining documents (1 - trainingSplit) are used
tuneRuns - the maximum number of iterations used for TUNE (train until no error) training; if 1, training is incremental
tuneStopAfter - TUNE training is stopped if the training accuracy didn't improve for the specified number of iterations
measureEachTUNE - whether to measure results after each TUNE iteration or only at the end of training
startMeasureTUNE - the training iteration after which to evaluate results for the first time if measureEachTUNE is enabled (ignored otherwise)
sentenceTUNE - the maximum number of iterations used for TUNE training the sentence classifier (if used); if 0 or negative, the value of tuneRuns is used
tuneEvalList - a list of Integers or int Strings specifying iterations after which to evaluate TUNE training in addition to the last one; ignored if measureEachTUNE is true
giveFeedback - if true, a fully incremental setup is used where the trainer is trained on each document after the extractor processed it; setting both this and measureEachTUNE to true is not allowed when training for several tuneRuns, because that would mean evaluating on the training set
config - used to configure superclasses, trainer, and extractor; if null, the standard configuration is used
Throws:
IllegalArgumentException - if trainingSplit is not a percentage in the range 0 to 1 (i.e., it is larger than 1 or smaller than 0) or if tuneRuns is non-positive

Method Detail |
---|
public void close(int errorCount) throws IOException, ProcessingException
Closes this instance, releasing all resources and stopping any background threads.
Specified by:
close in interface Closeable
Parameters:
errorCount - the number of errors (exceptions) that occurred during calls to this instance (0 if none)
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs while processing any remaining input

protected void doProcess(Reader reader, Writer writer, ContextMap context) throws IOException, ProcessingException
Processes the contents of a reader, writing a modified version to a writer.
Specified by:
doProcess in class TextProcessor
Parameters:
reader - reader containing the text to process; should not be closed by this method
writer - the writer to write the processed text to; might be flushed but not closed by this method; if this method does not use the writer, the underlying file will be deleted afterwards
context - a map of objects that are made available for processing; when called from the implemented process methods in this class, it will contain mappings from IOUtils.KEY_LOCAL_CHARSET to the character set of the output writer; from ContentType.KEY_MIME_TYPE to the document's MIME type; from TextProcessor.KEY_LOCAL_NAME to the local name (String); and either from TextProcessor.KEY_DIRECTORY to the directory (File) in case of a local file, or from TextProcessor.KEY_URL to the URL otherwise
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing

public float getTestSplit()
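Returns the percentage of a corpus to use for testing (evaluation); if -1, all remaining documents (1 - getTrainSplit()) are used for evaluation.

public float getTrainSplit()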
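Returns the percentage of a corpus to use for training; the remaining documents (1 - training split) are used for evaluation.

protected Extractor initExtractor(Trainer trainer)
Creates and initializes an extractor to use for an evaluation run, re-using the components of the provided trainer.
Parameters:
trainer - trainer whose components should be re-used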
protected Trainer initTrainer(File runDirectory) throws ProcessingException
Creates and initializes a trainer to use for an evaluation run, configured from the stored configuration. Subclasses can override this method to provide a different trainer.
Parameters:
runDirectory - directory used to run the classifier
Throws:
ProcessingException - if an error occurs during initialization

public String toString()
Returns a string representation of this object.
Overrides:
toString in class TextProcessor
public TrainEval.Results trainAndEval(String[] files, File inDirectory, File outDirectory, String baseName, Writer writer) throws IOException, ProcessingException
Processes an array of files.
Parameters:
files - the array of file names to process (relative to the inDirectory)
inDirectory - directory containing the files to process
outDirectory - directory used to do this run and store the results
baseName - the base name of the files to use for storing all extractions and training statistics
writer - used to serialize the calculated metrics
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing
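When trainAndEval runs TUNE training, the constructor parameters measureEachTUNE, startMeasureTUNE and tuneEvalList decide after which iterations results are measured. The sketch below computes that set under stated assumptions: the class EvalScheduleSketch is hypothetical, startMeasureTUNE is treated as inclusive, list entries outside the range of completed iterations are ignored, and the last completed iteration is always evaluated (as the documentation says for tuneEvalList). It is not the TIES implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration only; not part of the TIES API.
public class EvalScheduleSketch {

    /**
     * @param lastIteration    the last TUNE iteration actually performed
     * @param measureEachTune  measure after every iteration from startMeasureTune on
     * @param startMeasureTune first iteration to measure (assumed inclusive)
     * @param tuneEvalList     Integers or int Strings; ignored if measureEachTune is true
     */
    public static SortedSet<Integer> evalIterations(int lastIteration, boolean measureEachTune,
            int startMeasureTune, List<?> tuneEvalList) {
        SortedSet<Integer> result = new TreeSet<>();
        if (measureEachTune) {
            for (int i = Math.max(1, startMeasureTune); i <= lastIteration; i++) {
                result.add(i);
            }
        } else if (tuneEvalList != null) {
            for (Object entry : tuneEvalList) {
                // entries may be Integers or Strings holding an int
                int iteration = (entry instanceof Integer)
                        ? (Integer) entry
                        : Integer.parseInt(entry.toString().trim());
                if (iteration >= 1 && iteration < lastIteration) {
                    result.add(iteration);
                }
            }
        }
        result.add(lastIteration); // the last iteration is always evaluated
        return result;
    }

    public static void main(String[] args) {
        System.out.println(evalIterations(10, false, 0, Arrays.asList(2, "5", 12))); // [2, 5, 10]
    }
}
```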