java.lang.Object
de.fu_berlin.ties.ConfigurableProcessor
de.fu_berlin.ties.TextProcessor
de.fu_berlin.ties.DocumentReader
de.fu_berlin.ties.extract.ExtractorBase
de.fu_berlin.ties.extract.Trainer
A trainer trains a local Classifier
to be used for extraction.
Instances of this class are not thread-safe and cannot handle training on several documents in parallel.
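Because a single instance must not be used from several threads at once, callers that receive documents concurrently have to serialize their training calls. The following is a minimal sketch of one way to do that with a single-threaded executor. `SequentialTraining` is a hypothetical helper, and the `Consumer<String>` merely stands in for a trainer; neither is part of the TIES API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper, not part of TIES: runs all training calls on one worker thread. */
public class SequentialTraining {

    /**
     * Feeds the documents to the trainer one at a time, in submission order.
     * The Consumer is a stand-in (assumption) for a real training call.
     */
    static void trainAll(List<String> documents, Consumer<String> trainer) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            for (String doc : documents) {
                worker.submit(() -> trainer.accept(doc));
            }
            worker.shutdown();
            if (!worker.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("training did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            throw new IllegalStateException("interrupted while training", e);
        } finally {
            worker.shutdownNow(); // harmless if the worker has already terminated
        }
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>(); // unsynchronized on purpose
        trainAll(List.of("a.xml", "b.xml", "c.xml"), seen::add);
        System.out.println(seen); // submission order is preserved
    }
}
```

Exceptions thrown inside a submitted task end up in the returned `Future`; a fuller version would keep the futures and call `get()` on each to surface failures.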
Field Summary
static String | CONFIG_TEST_ONLY | Configuration key determining whether the trainer only ensures that all answer keys exist and can be located in the document instead of doing any training.
static String | CONFIG_TOE | Configuration key for determining the training mode (isTrainingOnlyErrors()).
Fields inherited from class de.fu_berlin.ties.TextProcessor |
CONFIG_POST, KEY_DIRECTORY, KEY_LOCAL_NAME, KEY_OUT_DIRECTORY, KEY_URL |
Constructor Summary
Trainer() | Creates a new instance without specifying an output extension (which isn't needed anyway, because this class doesn't produce output).
Trainer(String outExt) | Creates a new instance.
Trainer(String outExt, File runDirectory, TiesConfiguration config) | Creates a new instance.
Trainer(String outExt, TargetStructure targetStruct, TrainableClassifier theClassifier, Representation theRepresentation, CombinationStrategy combiStrat, TokenizerFactory tFactory) | Creates a new instance, using the standard configuration to configure the training mode and the superclasses.
Trainer(String outExt, TargetStructure targetStruct, TrainableClassifier theClassifier, Representation theRepresentation, CombinationStrategy combiStrat, TokenizerFactory tFactory, boolean trainOnlyErrors, boolean testOnly, TiesConfiguration config) | Creates a new instance.
Trainer(String outExt, TiesConfiguration config) | Creates a new instance.
Method Summary
boolean | isTestingOnly() | If true, the trainer only ensures that all answer keys exist and can be located in the document instead of doing any training.
boolean | isTrainingOnlyErrors() | Whether to train only errors (TOE mode, recommended) or to train all instances (brute-force mode).
void | process(Document document, Writer writer, ContextMap context) | Trains the local classifier with the correct extractions of an XML document, using the provided context representation.
void | processToken(Element element, String left, String token, String right, int tokenRep, boolean whitespaceBefore, ContextMap context) | Trains the local classifier on the features of a token in an XML document.
String | toString() | Returns a string representation of this object.
Accuracy | train(Document document, ExtractionContainer correctExtractions) | Trains the local classifier with the correct extractions of an XML document, using the provided context representation.
Methods inherited from class de.fu_berlin.ties.extract.ExtractorBase |
getActiveClasses, getClassifier, getFactory, getFeatureCount, getFeatures, getPriorRecognitions, getRepresentation, getStrategy, getTargetStructure, initFields, updateState, viewFeatureCount |
Methods inherited from class de.fu_berlin.ties.DocumentReader |
doProcess |
Methods inherited from class de.fu_berlin.ties.TextProcessor |
getOutFileExt, process, process, process, process |
Methods inherited from class de.fu_berlin.ties.ConfigurableProcessor |
getConfig |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
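public static final String CONFIG_TOE
Configuration key for determining the training mode (isTrainingOnlyErrors()).
public static final String CONFIG_TEST_ONLY
Configuration key determining whether the trainer only ensures that all answer keys exist and can be located in the document instead of doing any training.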
Constructor Detail |
public Trainer() throws IllegalArgumentException, ProcessingException
Creates a new instance without specifying an output extension (which isn't needed anyway, because this class doesn't produce output). Delegates to Trainer(String, TiesConfiguration) using the standard configuration and a dummy extension.
Throws:
IllegalArgumentException - if the combination strategy cannot be initialized (cf. CombinationStrategy.createStrategy(Set, TiesConfiguration))
ProcessingException - if an error occurs during initialization
public Trainer(String outExt) throws IllegalArgumentException, ProcessingException
Creates a new instance. Delegates to Trainer(String, TiesConfiguration) using the standard configuration.
Parameters:
outExt - the extension to use for output files
Throws:
IllegalArgumentException - if the combination strategy cannot be initialized (cf. CombinationStrategy.createStrategy(Set, TiesConfiguration))
ProcessingException - if an error occurs during initialization
public Trainer(String outExt, TiesConfiguration config) throws IllegalArgumentException, ProcessingException
Creates a new instance. Delegates to the Trainer(String, File, TiesConfiguration) constructor without specifying a runDirectory.
Parameters:
outExt - the extension to use for output files
config - the configuration to use
Throws:
IllegalArgumentException - if the combination strategy cannot be initialized (cf. CombinationStrategy.createStrategy(Set, TiesConfiguration))
ProcessingException - if an error occurs during initialization
public Trainer(String outExt, File runDirectory, TiesConfiguration config) throws IllegalArgumentException, ProcessingException
Creates a new instance. Sets the training mode (isTrainingOnlyErrors()) to the value of the CONFIG_TOE configuration key in the provided configuration and delegates to the corresponding super constructor to configure the fields.
Parameters:
outExt - the extension to use for output files
runDirectory - the directory to run the classifier in; used instead of the configured directory if not null
config - the configuration to use
Throws:
IllegalArgumentException - if the combination strategy cannot be initialized (cf. CombinationStrategy.createStrategy(Set, TiesConfiguration))
ProcessingException - if an error occurs during initialization
public Trainer(String outExt, TargetStructure targetStruct, TrainableClassifier theClassifier, Representation theRepresentation, CombinationStrategy combiStrat, TokenizerFactory tFactory)
Creates a new instance, using the standard configuration to configure the training mode and the superclasses.
Parameters:
outExt - the extension to use for output files
targetStruct - the target structure specifying the classes to recognize
theClassifier - the classifier to train
theRepresentation - the context representation to use for training
combiStrat - the combination strategy to use
tFactory - used to instantiate tokenizers
public Trainer(String outExt, TargetStructure targetStruct, TrainableClassifier theClassifier, Representation theRepresentation, CombinationStrategy combiStrat, TokenizerFactory tFactory, boolean trainOnlyErrors, boolean testOnly, TiesConfiguration config)
Creates a new instance.
Parameters:
outExt - the extension to use for output files
targetStruct - the target structure specifying the classes to recognize
theClassifier - the classifier to train
theRepresentation - the context representation to use for training
combiStrat - the combination strategy to use
tFactory - used to instantiate tokenizers
trainOnlyErrors - whether to train only errors (TOE mode, recommended) or to train all instances (brute-force mode)
testOnly - if true, the trainer only ensures that all answer keys exist and can be located in the document instead of doing any training
config - used to configure superclasses; if null, the standard configuration is used
Method Detail |
public boolean isTestingOnly()
If true, the trainer only ensures that all answer keys exist and can be located in the document instead of doing any training.
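The kind of check performed in test-only mode can be pictured with a simplified sketch. It is not the class's actual implementation: it treats the document as plain text and the answer keys as plain strings (both assumptions), and `AnswerKeyCheck` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of a test-only pass; not the TIES implementation. */
public class AnswerKeyCheck {

    /** Returns every answer key that does not occur in the document text. */
    static List<String> missingKeys(String documentText, List<String> answerKeys) {
        List<String> missing = new ArrayList<>();
        for (String key : answerKeys) {
            if (!documentText.contains(key)) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String text = "Seminar by Ada Lovelace at 3 pm in Room 12";
        List<String> keys = List.of("Ada Lovelace", "3 pm", "Room 14");
        System.out.println(missingKeys(text, keys)); // prints [Room 14]
    }
}
```

No classifier is touched in this pass; an empty result means every answer key could be located.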
public boolean isTrainingOnlyErrors()
Whether to train only errors (TOE mode, recommended) or to train all instances (brute-force mode).
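The two training modes selected by isTrainingOnlyErrors() can be contrasted with a toy example. The sketch assumes that TOE mode updates the classifier only on instances it currently misclassifies, while brute-force mode updates on every instance. `ToeDemo` and its `CountingClassifier` are hypothetical and far simpler than a real TrainableClassifier.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy comparison of TOE and brute-force training; not TIES code. */
public class ToeDemo {

    /** Toy classifier: remembers, per feature, how often each label was trained. */
    static class CountingClassifier {
        private final Map<String, Map<String, Integer>> counts = new HashMap<>();
        int updates = 0;

        /** Predicts the most frequently trained label, or "other" for unseen features. */
        String classify(String feature) {
            Map<String, Integer> byLabel = counts.getOrDefault(feature, Map.of());
            String best = "other";
            int bestCount = 0;
            for (Map.Entry<String, Integer> e : byLabel.entrySet()) {
                if (e.getValue() > bestCount) {
                    best = e.getKey();
                    bestCount = e.getValue();
                }
            }
            return best;
        }

        void learn(String feature, String label) {
            counts.computeIfAbsent(feature, f -> new HashMap<>()).merge(label, 1, Integer::sum);
            updates++;
        }
    }

    /** Trains on {feature, label} pairs and returns the number of classifier updates. */
    static int train(String[][] instances, boolean trainOnlyErrors) {
        CountingClassifier classifier = new CountingClassifier();
        for (String[] instance : instances) {
            String feature = instance[0];
            String label = instance[1];
            boolean wrong = !classifier.classify(feature).equals(label);
            if (!trainOnlyErrors || wrong) {
                classifier.learn(feature, label);
            }
        }
        return classifier.updates;
    }

    public static void main(String[] args) {
        String[][] data = {
            {"Dr.", "speaker"}, {"the", "other"}, {"Dr.", "speaker"},
            {"the", "other"}, {"Dr.", "speaker"}
        };
        System.out.println("TOE updates: " + train(data, true));          // 1
        System.out.println("brute-force updates: " + train(data, false)); // 5
    }
}
```

In this toy data only the first "Dr." instance is misclassified, so TOE mode performs a single update where brute-force mode performs five.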
public void process(Document document, Writer writer, ContextMap context) throws IOException, ProcessingException
Trains the local classifier with the correct extractions of an XML document, using the provided context representation. The answer keys must be in a corresponding file ending in AnswerBuilder.EXT_ANSWERS in the same directory (when processing a local file) or in the current working directory (when processing a URL).
process in class DocumentReader
Parameters:
document - the document to read
writer - ignored by this method
context - a map of objects that are made available for processing
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing
public void processToken(Element element, String left, String token, String right, int tokenRep, boolean whitespaceBefore, ContextMap context) throws ProcessingException
Trains the local classifier on the features of a token in an XML document.
Specified by:
processToken in interface TokenProcessor
Parameters:
element - the element containing the token
left - the textual contents of the element to the left of the token (in case of mixed contents, only up to the last preceding child element, if any)
token - the token to process
right - the textual contents of the element to the right of the token (in case of mixed contents, only up to the next following child element, if any)
tokenRep - the repetition of the token in the document (counting starts with 0, as the first occurrence is the "0th repetition")
whitespaceBefore - whether there is whitespace before the main token (either at the end of left or in the preceding element)
context - a map of objects that are made available for processing; ignored by this method
Throws:
ProcessingException - if an error occurs during processing
public String toString()
Overrides:
toString in class ExtractorBase
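The zero-based counting used for the tokenRep parameter of processToken can be illustrated with a short sketch. It assumes that a repetition means an earlier occurrence of the identical token string in the same document. `TokenRepetition` is a hypothetical helper, not part of TIES.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper showing zero-based repetition counts for tokens. */
public class TokenRepetition {

    /** Returns, for each token, how often the same token occurred before it. */
    static int[] repetitions(String[] tokens) {
        Map<String, Integer> seen = new HashMap<>();
        int[] reps = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // merge returns the updated occurrence count; subtract 1 to start at 0
            reps[i] = seen.merge(tokens[i], 1, Integer::sum) - 1;
        }
        return reps;
    }

    public static void main(String[] args) {
        String[] tokens = {"the", "talk", "the", "the"};
        System.out.println(Arrays.toString(repetitions(tokens))); // prints [0, 0, 1, 2]
    }
}
```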
public Accuracy train(Document document, ExtractionContainer correctExtractions) throws IOException, ProcessingException
Trains the local classifier with the correct extractions of an XML document, using the provided context representation.
Parameters:
document - a document whose contents should be classified
correctExtractions - a container of all correct extractions for the document
Returns:
... null otherwise
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing