java.lang.Object
  extended by de.fu_berlin.ties.ConfigurableProcessor
      extended by de.fu_berlin.ties.TextProcessor
          extended by de.fu_berlin.ties.extract.TrainEval
public class TrainEval
Trains an extractor and evaluates extraction quality. Processes shuffle
files (as generated by ShuffleGenerator) that contain the files to use
for training and evaluation.
For each of these files, a corresponding answer key (*.ans) must exist.
Instances of this class are not thread-safe.
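The answer-key requirement can be checked before a run starts. The following is a minimal sketch and is not part of the TIES API. The class AnswerKeyCheck and its methods are hypothetical. It also assumes that the answer key for an input file such as doc.html is doc.ans in the same directory. The documentation itself only states that a corresponding *.ans file must exist.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: checks that every input file has an answer key (*.ans). */
final class AnswerKeyCheck {

    private AnswerKeyCheck() {
    }

    /** Assumed naming rule: replace the file's extension (if any) with ".ans". */
    static File answerKeyFor(File input) {
        String name = input.getName();
        int dot = name.lastIndexOf('.');
        String base = (dot > 0) ? name.substring(0, dot) : name;
        // A null parent (bare file name) yields a file relative to the working directory.
        return new File(input.getParentFile(), base + ".ans");
    }

    /** Returns the inputs whose answer key is missing (empty if all are present). */
    static List<File> missingAnswerKeys(List<File> inputs) {
        List<File> missing = new ArrayList<>();
        for (File input : inputs) {
            if (!answerKeyFor(input).isFile()) {
                missing.add(input);
            }
        }
        return missing;
    }
}
```

Reporting all missing keys at once, instead of failing on the first one, makes it easier to fix a broken shuffle file in one pass.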
Nested Class Summary | | |
---|---|---|
static class | TrainEval.Results | An inner class wrapping the results of a training + evaluation run. |
Field Summary | | |
---|---|---|
static String | CONFIG_FEEDBACK | Configuration key: If true, a fully incremental setup is used where the trainer is trained on each document after the extractor has processed it. |
static String | CONFIG_SENTENCE_TUNE | Configuration key: The maximum number of iterations used for TUNE training the sentence classifier; if 0 or negative, the value of Tuner.CONFIG_TUNE is used. |
static String | EXT_METRICS | File extension used for metrics files. |
static String | KEY_ITERATION | Serialization key for the number of the iteration (when TUNE training). |
static String | KEY_RUN | Serialization key for the number of the run. |
static String | KEY_TYPE | Serialization key for the type (either "Train" or "Eval"). |
static String | TYPE_EVAL | Serialization value for the "Eval" type. |
static String | TYPE_TRAIN | Serialization value for the "Train" type. |
Fields inherited from class de.fu_berlin.ties.TextProcessor |
---|
CONFIG_POST, KEY_DIRECTORY, KEY_LOCAL_NAME, KEY_OUT_DIRECTORY, KEY_URL |
Constructor Summary | |
---|---|
TrainEval() | Creates a new instance, using a default extension and the standard configuration. |
TrainEval(String outExt) | Creates a new instance, using the standard configuration. |
TrainEval(String outExt, TiesConfiguration config) | Creates a new instance. |
TrainEval(String outExt, Tuner theTuner, int sentenceTUNE, boolean giveFeedback, boolean doEvaluate, String predExt, boolean doStorePredsInOutDir, TiesConfiguration config) | Creates a new instance. |
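The full constructor documents one forbidden combination. giveFeedback must not be set to true together with Tuner.isTuneEach() when training for several TUNE iterations, because that would mean evaluating on the training set. Below is a minimal sketch of such a guard, with the following assumptions:

* It uses plain boolean and int inputs in place of a real Tuner.
* It treats "several" as more than one iteration.
* It rejects the combination with an IllegalArgumentException.

The class and method names are hypothetical, and the actual constructor may enforce this differently.

```java
/** Hypothetical guard; not the actual TrainEval implementation. */
final class FeedbackCheck {

    private FeedbackCheck() {
    }

    /**
     * Rejects the documented forbidden combination.
     *
     * @param giveFeedback   whether the fully incremental setup is requested
     * @param tuneEach       stand-in for the value of Tuner.isTuneEach()
     * @param tuneIterations number of TUNE iterations; "several" is assumed to mean more than one
     */
    static void checkFeedbackSetting(boolean giveFeedback, boolean tuneEach, int tuneIterations) {
        if (giveFeedback && tuneEach && tuneIterations > 1) {
            throw new IllegalArgumentException("giveFeedback and tuneEach must not both be true when training for "
                    + tuneIterations + " TUNE iterations: that would mean evaluating on the training set");
        }
    }
}
```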
Method Summary | | |
---|---|---|
void | close(int errorCount) | Closes this instance, releasing all resources and stopping any background threads. |
protected void | doProcess(Reader reader, Writer writer, ContextMap context) | Processes the contents of a reader, writing a modified version to a writer. |
protected Extractor | initExtractor(Trainer trainer) | Creates and initializes an extractor to use for an evaluation run, re-using the components of the provided trainer. |
protected Trainer | initTrainer(File runDirectory) | Creates and initializes a trainer to use for an evaluation run, configured from the stored configuration. |
String | toString() | Returns a string representation of this object. |
TrainEval.Results | trainAndEval(String[] files, File inDirectory, File outDirectory, String baseName, Writer writer) | Processes an array of files. |
Methods inherited from class de.fu_berlin.ties.TextProcessor |
---|
getOutFileExt, process, process, process, process, process, process |
Methods inherited from class de.fu_berlin.ties.ConfigurableProcessor |
---|
getConfig |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
public static final String CONFIG_FEEDBACK
Configuration key: If true, a fully incremental setup is used where the
trainer is trained on each document after the extractor has processed it.
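In the fully incremental setup enabled by CONFIG_FEEDBACK, the extractor processes each document first, and only then is that document used for training. The sketch below is a generic illustration under stated assumptions. extract and train are hypothetical stand-ins, expressed as plain java.util.function types, for the TIES extractor and trainer. The class IncrementalLoop is not part of the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical illustration of the fully incremental (feedback) setup. */
final class IncrementalLoop {

    private IncrementalLoop() {
    }

    /**
     * Runs the extractor on each document in turn; if feedback is enabled,
     * the trainer is trained on that document right after it was processed,
     * so only later documents can benefit from it.
     */
    static <D, P> List<P> run(List<D> docs, Function<D, P> extract, Consumer<D> train, boolean feedback) {
        List<P> predictions = new ArrayList<>();
        for (D doc : docs) {
            predictions.add(extract.apply(doc)); // predict before training on this document
            if (feedback) {
                train.accept(doc); // then learn from the same document
            }
        }
        return predictions;
    }
}
```

Because prediction comes before training, each document is still unseen at the moment it is processed.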
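public static final String CONFIG_SENTENCE_TUNE
Configuration key: The maximum number of iterations used for TUNE training
the sentence classifier; if 0 or negative, the value of Tuner.CONFIG_TUNE
is used.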
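The fallback rule for CONFIG_SENTENCE_TUNE amounts to a single comparison. Below is a minimal sketch with hypothetical names, where generalTune stands for the value configured via Tuner.CONFIG_TUNE.

```java
/** Hypothetical helper illustrating the documented fallback; not part of TIES. */
final class SentenceTune {

    private SentenceTune() {
    }

    /**
     * Returns the iteration limit for TUNE training the sentence classifier:
     * the sentence-specific value if it is positive, otherwise the general
     * TUNE value (stand-in for Tuner.CONFIG_TUNE).
     */
    static int effectiveSentenceTune(int sentenceTune, int generalTune) {
        return (sentenceTune > 0) ? sentenceTune : generalTune;
    }
}
```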
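public static final String KEY_ITERATION
Serialization key for the number of the iteration (when TUNE training).
public static final String KEY_RUN
Serialization key for the number of the run.
public static final String KEY_TYPE
Serialization key for the type (either "Train" or "Eval").
public static final String TYPE_TRAIN
Serialization value for the "Train" type.
public static final String TYPE_EVAL
Serialization value for the "Eval" type.
public static final String EXT_METRICS
File extension used for metrics files.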
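The serialization keys and values above identify which run, iteration, and phase a metrics record belongs to. The sketch below shows one way to assemble such an identifying header. It is hypothetical throughout:

* The strings assigned to KEY_TYPE, KEY_RUN, and KEY_ITERATION are placeholders, since their real values are not given here.
* TYPE_TRAIN and TYPE_EVAL are assumed to be "Train" and "Eval", as the KEY_TYPE description suggests.
* The map-based layout is not the library's actual serialization format.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical record header; key strings are placeholders, not the TIES constants. */
final class MetricsHeader {

    static final String KEY_TYPE = "type";           // placeholder value
    static final String KEY_RUN = "run";             // placeholder value
    static final String KEY_ITERATION = "iteration"; // placeholder value
    static final String TYPE_TRAIN = "Train";        // assumed from the KEY_TYPE description
    static final String TYPE_EVAL = "Eval";          // assumed from the KEY_TYPE description

    private MetricsHeader() {
    }

    /**
     * Builds the identifying entries of one metrics record, in insertion order.
     * The iteration is only included when TUNE training (pass null otherwise).
     */
    static Map<String, String> header(String type, int run, Integer iteration) {
        if (!TYPE_TRAIN.equals(type) && !TYPE_EVAL.equals(type)) {
            throw new IllegalArgumentException("type must be \"Train\" or \"Eval\": " + type);
        }
        Map<String, String> entries = new LinkedHashMap<>();
        entries.put(KEY_TYPE, type);
        entries.put(KEY_RUN, Integer.toString(run));
        if (iteration != null) {
            entries.put(KEY_ITERATION, Integer.toString(iteration));
        }
        return entries;
    }
}
```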
Constructor Detail |
---|
public TrainEval() throws IllegalArgumentException, ClassCastException, NoSuchElementException
Creates a new instance, using a default extension and the standard configuration.
Throws:
IllegalArgumentException - if the configured values are outside the allowed ranges
ClassCastException - if the configured numeric values cannot be parsed
NoSuchElementException - if one of the required values is missing from the configuration

public TrainEval(String outExt) throws IllegalArgumentException, ClassCastException, NoSuchElementException
Creates a new instance, using the standard configuration.
Parameters:
outExt - the extension to use for output files
Throws:
IllegalArgumentException - if the configured values are outside the allowed ranges
ClassCastException - if the configured numeric values cannot be parsed
NoSuchElementException - if one of the required values is missing from the configuration

public TrainEval(String outExt, TiesConfiguration config) throws IllegalArgumentException, ClassCastException, NoSuchElementException
Creates a new instance.
Parameters:
outExt - the extension to use for output files
config - used to configure this instance
Throws:
IllegalArgumentException - if the configured values are outside the allowed ranges
ClassCastException - if the configured numeric values cannot be parsed
NoSuchElementException - if one of the required values is missing from the configuration

public TrainEval(String outExt, Tuner theTuner, int sentenceTUNE, boolean giveFeedback, boolean doEvaluate, String predExt, boolean doStorePredsInOutDir, TiesConfiguration config)
Creates a new instance.
Parameters:
outExt - the extension to use for output files
theTuner - used for TUNE training
sentenceTUNE - the maximum number of iterations used for TUNE training the sentence classifier (if used); if 0 or negative, the value of Tuner.getTuneIterations() is used
giveFeedback - if true, a fully incremental setup is used where the trainer is trained on each document after the extractor has processed it; this and Tuner.isTuneEach() must not both be set to true when training for several TUNE iterations, because that would mean evaluating on the training set
doEvaluate - whether to evaluate predictions by comparing them to answer keys; otherwise predictions are stored without evaluating them
predExt - the extension used to store predictions (if doEvaluate is set to false)
doStorePredsInOutDir - whether to write prediction files to the configured output directory or to the directory containing the input file
config - used to configure superclasses, trainer, and extractor; if null, the standard configuration is used

Method Detail |
---|
public void close(int errorCount) throws IOException, ProcessingException
Closes this instance, releasing all resources and stopping any background threads.
Specified by:
close in interface Closeable
Parameters:
errorCount - the number of errors (exceptions) that occurred during calls to this instance (0 if none)
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs while processing any remaining input

protected void doProcess(Reader reader, Writer writer, ContextMap context) throws IOException, ProcessingException
Processes the contents of a reader, writing a modified version to a writer.
Specified by:
doProcess in class TextProcessor
Parameters:
reader - reader containing the text to process; should not be closed by this method
writer - the writer to write the processed text to; might be flushed but not closed by this method; if this method does not use the writer, the underlying file will be deleted afterwards
context - a map of objects that are made available for processing; when called from the implemented process methods in this class, it will contain mappings from IOUtils.KEY_LOCAL_CHARSET to the character set of the output writer; from TextProcessor.KEY_OUT_DIRECTORY to the output directory (File); from ContentType.KEY_MIME_TYPE to the document's MIME type; from TextProcessor.KEY_LOCAL_NAME to the local name (String); and either from TextProcessor.KEY_DIRECTORY to the input directory (File, in case of a local file) or from TextProcessor.KEY_URL to the URL (otherwise) of the processed document
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing

protected Extractor initExtractor(Trainer trainer)
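Creates and initializes an extractor to use for an evaluation run, re-using the components of the provided trainer.
Parameters:
trainer - trainer whose components should be re-used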
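initExtractor builds its extractor from the trainer's components. Together with initTrainer, which subclasses may override to supply a different trainer, it forms a pair of protected factory hooks. The sketch below shows that hook pattern in isolation. SimpleTrainer, SimpleExtractor, EvalRun, and CustomEvalRun are hypothetical stand-ins, not the TIES Trainer and Extractor classes. A single shared object models "the trainer's components".

```java
import java.io.File;

/** Hypothetical stand-in for a trainer that owns a reusable component. */
class SimpleTrainer {
    final File runDirectory;
    final Object model = new Object(); // component shared with the extractor

    SimpleTrainer(File runDirectory) {
        this.runDirectory = runDirectory;
    }
}

/** Hypothetical stand-in for an extractor built on the trainer's component. */
class SimpleExtractor {
    final Object model;

    SimpleExtractor(Object model) {
        this.model = model;
    }
}

class EvalRun {
    /** Factory hook: subclasses may override to supply a different trainer. */
    protected SimpleTrainer initTrainer(File runDirectory) {
        return new SimpleTrainer(runDirectory);
    }

    /** Builds an extractor that re-uses (shares, not copies) the trainer's component. */
    protected SimpleExtractor initExtractor(SimpleTrainer trainer) {
        return new SimpleExtractor(trainer.model);
    }
}

class CustomEvalRun extends EvalRun {
    @Override
    protected SimpleTrainer initTrainer(File runDirectory) {
        // Example customization: keep this trainer's files in a sub-directory.
        return new SimpleTrainer(new File(runDirectory, "custom"));
    }
}
```

Because the extractor receives the same object rather than a copy, anything the trainer learns is immediately visible to the extractor.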
protected Trainer initTrainer(File runDirectory) throws ProcessingException
Creates and initializes a trainer to use for an evaluation run, configured from the stored configuration. Subclasses can override this method to provide a different trainer.
Parameters:
runDirectory - directory used to run the classifier
Throws:
ProcessingException - if an error occurs during initialization

public String toString()
Returns a string representation of this object.
Overrides:
toString in class TextProcessor

public TrainEval.Results trainAndEval(String[] files, File inDirectory, File outDirectory, String baseName, Writer writer) throws IOException, ProcessingException
Processes an array of files.
Parameters:
files - the array of file names to process (relative to the inDirectory)
inDirectory - directory containing the files to process
outDirectory - directory used to do this run and store the results
baseName - the base name of the files to use for storing all extractions and training statistics
writer - used to serialize the calculated metrics
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing
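Because files are given relative to inDirectory, each entry must be resolved against that directory before it can be read. Below is a minimal sketch of that step using java.nio.file. The class RunInputs is hypothetical, and whether trainAndEval normalizes paths in this way is an assumption.

```java
import java.nio.file.Path;

/** Hypothetical helper; not part of TrainEval. */
final class RunInputs {

    private RunInputs() {
    }

    /** Resolves each relative file name against the input directory and normalizes the result. */
    static Path[] resolveAll(String[] files, Path inDirectory) {
        Path[] resolved = new Path[files.length];
        for (int i = 0; i < files.length; i++) {
            resolved[i] = inDirectory.resolve(files[i]).normalize();
        }
        return resolved;
    }
}
```

Path.toFile() converts each result back to the java.io.File type that the method signature uses.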