java.lang.Object
  de.fu_berlin.ties.ConfigurableProcessor
    de.fu_berlin.ties.TextProcessor
      de.fu_berlin.ties.DocumentReader
        de.fu_berlin.ties.extract.ExtractorBase
          de.fu_berlin.ties.extract.Trainer
public class Trainer
A trainer trains a local Classifier to be used for extraction.
Instances of this class are not thread-safe and cannot handle training on several documents in parallel.
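Since one instance cannot train on several documents in parallel, a program that receives documents from several threads has to make sure they reach a shared trainer one at a time. The following is a minimal sketch of one way to do that with a lock. `UnsafeTrainer` and `SerializedTrainer` are hypothetical stand-ins (not TIES classes); the unsynchronized counter merely mimics per-document state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a non-thread-safe trainer (not a TIES class):
// the unsynchronized increment can lose updates under concurrent calls.
class UnsafeTrainer {
    private int documentsSeen = 0;

    void train(String document) {
        documentsSeen++;
    }

    int getDocumentsSeen() {
        return documentsSeen;
    }
}

// Guards one shared instance with a lock so that it only ever works on a
// single document at a time, even if several threads submit documents.
class SerializedTrainer {
    private final UnsafeTrainer delegate = new UnsafeTrainer();
    private final Object lock = new Object();

    void train(String document) {
        synchronized (lock) {
            delegate.train(document);
        }
    }

    int getDocumentsSeen() {
        synchronized (lock) {
            return delegate.getDocumentsSeen();
        }
    }

    /** Submits all documents from a thread pool; returns the trained count. */
    static int trainFromThreads(List<String> documents, int threads) {
        SerializedTrainer trainer = new SerializedTrainer();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String doc : documents) {
                futures.add(pool.submit(() -> trainer.train(doc)));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for each task and surface any failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while training", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("training failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return trainer.getDocumentsSeen();
    }
}
```

Keeping a single shared instance behind a lock preserves one prediction model; giving each thread its own trainer would instead produce several independently trained models.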
Field Summary | |
---|---|
static String | CONFIG_TEST_ONLY: Configuration key determining whether the trainer only ensures that all answer keys exist and can be located in the document instead of doing any training. |
static String | CONFIG_TOE: Configuration key for determining the training mode (isTrainingOnlyErrors()). |
static String | PREFIX_GLOBAL_ACC: Prefix used for serializing the global (overall) accuracy. |
static String | PREFIX_LOCAL_ACC: Prefix used for serializing the local (document-specific) accuracy. |
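The two prefixes distinguish a document-specific accuracy from an overall one. A minimal sketch of that distinction, assuming accuracy is simply correct/total classifications: the local counter starts afresh for each document while the global one keeps accumulating. The class names and the prefix strings are hypothetical placeholders; the real values of PREFIX_LOCAL_ACC and PREFIX_GLOBAL_ACC are not documented here.

```java
// Hypothetical accuracy counter: tracks correct vs. total classifications.
class SimpleAccuracy {
    private int correct;
    private int total;

    void update(boolean wasCorrect) {
        total++;
        if (wasCorrect) {
            correct++;
        }
    }

    double accuracy() {
        return total == 0 ? 0.0 : (double) correct / total;
    }

    void reset() {
        correct = 0;
        total = 0;
    }

    // Serializes the counts on one line, prefixed to say which scope they cover.
    String serialize(String prefix) {
        return prefix + correct + "/" + total;
    }
}

class AccuracyTracking {
    // Placeholder prefixes, not the actual TIES constants.
    static final String LOCAL_PREFIX = "local: ";
    static final String GLOBAL_PREFIX = "global: ";

    final SimpleAccuracy local = new SimpleAccuracy();
    final SimpleAccuracy global = new SimpleAccuracy();

    /** Records the outcomes for one document: the local counter is reset
     *  first, the global counter accumulates across documents. */
    void recordDocument(boolean[] outcomes) {
        local.reset();
        for (boolean ok : outcomes) {
            local.update(ok);
            global.update(ok);
        }
    }
}
```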
Fields inherited from class de.fu_berlin.ties.extract.ExtractorBase |
---|
CONFIG_AVOID, CONFIG_ELEMENTS, CONFIG_RELEVANT_PUNCTUATION, CONFIG_SENTENCE |
Fields inherited from class de.fu_berlin.ties.TextProcessor |
---|
CONFIG_POST, KEY_DIRECTORY, KEY_LOCAL_NAME, KEY_OUT_DIRECTORY, KEY_URL |
Constructor Summary | |
---|---|
Trainer() | Creates a new instance without specifying an output extension (which isn't needed anyway, because this class doesn't produce output). |
Trainer(String outExt) | Creates a new instance. |
Trainer(String outExt, File runDirectory, TiesConfiguration config) | Creates a new instance. |
Trainer(String outExt, TargetStructure targetStruct, TrainableClassifier[] theClassifiers, Representation theRepresentation, CombinationStrategy combiStrat, TokenizerFactory tFactory, TrainableFilter sentFilter) | Creates a new instance, using the standard configuration to configure the training mode and the superclasses. |
Trainer(String outExt, TargetStructure targetStruct, TrainableClassifier[] theClassifiers, Representation theRepresentation, CombinationStrategy combiStrat, TokenizerFactory tFactory, TrainableFilter sentFilter, Set<String> relevantPunct, boolean trainOnlyErrors, boolean testOnly, TiesConfiguration config) | Creates a new instance. |
Trainer(String outExt, TiesConfiguration config) | Creates a new instance. |
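The constructor details describe a chain of delegations: Trainer() passes a dummy extension to Trainer(String), which adds the standard configuration and calls Trainer(String, TiesConfiguration), which in turn calls the three-argument form without a run directory; only that last form reads the CONFIG_TOE key. The hypothetical, simplified class below mirrors the chain, using java.util.Properties in place of TiesConfiguration. The key name, the dummy extension and the standard default are placeholders, not values taken from TIES.

```java
import java.io.File;
import java.util.Properties;

// Hypothetical, simplified model of the constructor chain; it is not the
// actual TIES implementation, and Properties stands in for TiesConfiguration.
class TrainerSketch {
    // Placeholder; the real string value of CONFIG_TOE is not shown here.
    static final String TOE_KEY = "train.only.errors";
    // Placeholder for the dummy extension used by the no-argument constructor.
    static final String DUMMY_EXT = "tmp";

    final String outExt;
    final File runDirectory;
    final boolean trainingOnlyErrors;

    // Stand-in for the standard configuration (the "true" default is assumed).
    static Properties standardConfig() {
        Properties config = new Properties();
        config.setProperty(TOE_KEY, "true");
        return config;
    }

    TrainerSketch() {
        this(DUMMY_EXT); // no output is produced, so any extension will do
    }

    TrainerSketch(String outExt) {
        this(outExt, standardConfig());
    }

    TrainerSketch(String outExt, Properties config) {
        this(outExt, null, config); // no explicit run directory
    }

    TrainerSketch(String outExt, File runDirectory, Properties config) {
        this.outExt = outExt;
        // null means "use the configured directory instead"
        this.runDirectory = runDirectory;
        // Training mode comes from the configuration; a missing key yields false.
        this.trainingOnlyErrors = Boolean.parseBoolean(config.getProperty(TOE_KEY));
    }
}
```

Telescoping constructors like these keep all real initialization in one place, so each shorter form only has to supply defaults.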
Method Summary | |
---|---|
protected FilteringTokenWalker | createFilteringTokenWalker(TrainableFilter repFilter): Creates a filtering token walker to be used for walking through a document and sentence classification if a double classification approach is used. |
void | disableSentenceTraining(): Disables training the embedded sentence filter, if sentence filtering is used. |
void | enableSentenceTraining(): Re-enables training the embedded sentence filter, if sentence filtering is used. |
FMetricsView | evaluateSentenceFiltering(): Evaluates precision and recall for sentence filtering on the last processed document. |
boolean | isTestingOnly(): If true, the trainer only ensures that all answer keys exist and can be located in the document instead of doing any training. |
boolean | isTrainingOnlyErrors(): Whether to train only errors (TOE mode, recommended) or to train all instances (brute-force mode). |
void | process(Document document, Writer writer, ContextMap context): Trains the local classifier with the correct extractions of an XML document, using the provided context representation. |
void | processToken(Element element, String left, TokenDetails details, String right, ContextMap context): Processes a token in an XML element, optionally modifying the element or the document it is part of. |
void | reset(): Resets the internal classifier, completely deleting the prediction model. |
void | resetGlobalAccuracy(): Resets the global (overall) accuracies measured so far by each classifier. |
protected void | resetStrategy(): Resets the combination strategy, logging a warning if it indicates that the last extraction should be discarded. |
boolean | shouldMatch(Element element): Decides whether an element should be accepted by filters. |
String | toString(): Returns a string representation of this object. |
Accuracy[] | train(Document document, ExtractionContainer correctExtractions): Trains the local classifier with the correct extractions of an XML document, using the provided context representation. |
AccuracyView[] | viewGlobalAccuracy(): Returns a view on the global (overall) accuracies measured so far (or since the last call to resetGlobalAccuracy()) by each classifier. |
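evaluateSentenceFiltering() reports precision and recall of the sentence filter. Assuming the usual definitions (precision is the share of accepted sentences that are relevant, recall is the share of relevant sentences that were accepted, F1 is their harmonic mean), the computation can be sketched as follows. SentenceFilterMetrics is a hypothetical class; the exact contents of FMetricsView are not documented here.

```java
import java.util.HashSet;
import java.util.Set;

// Standard precision/recall/F1 over sets of sentence IDs. This illustrates
// the metrics only; it is not the TIES FMetricsView implementation.
class SentenceFilterMetrics {
    final double precision;
    final double recall;
    final double f1;

    SentenceFilterMetrics(Set<Integer> accepted, Set<Integer> relevant) {
        // True positives: sentences both accepted by the filter and relevant.
        Set<Integer> truePositives = new HashSet<>(accepted);
        truePositives.retainAll(relevant);
        int tp = truePositives.size();
        precision = accepted.isEmpty() ? 0.0 : (double) tp / accepted.size();
        recall = relevant.isEmpty() ? 0.0 : (double) tp / relevant.size();
        f1 = (precision + recall == 0.0)
                ? 0.0
                : 2 * precision * recall / (precision + recall);
    }
}
```

For example, if the filter accepts sentences {1, 2, 3, 4} and the relevant ones are {2, 3, 5}, precision is 2/4, recall is 2/3 and F1 is 4/7.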
Methods inherited from class de.fu_berlin.ties.extract.ExtractorBase |
---|
createSentenceFilter, evaluateSentenceFiltering, getActiveClasses, getClassifiers, getFactory, getFeatureCount, getFeatures, getPriorRecognitions, getRepresentation, getSentenceFilter, getStrategy, getTargetStructure, getWalker, initFields, isRelevant, isSentenceFiltering, markRelevant, skip, updateState, viewFeatureCount, viewRelevantPunctuation |
Methods inherited from class de.fu_berlin.ties.DocumentReader |
---|
doProcess |
Methods inherited from class de.fu_berlin.ties.TextProcessor |
---|
getOutFileExt, process, process, process, process, process, process |
Methods inherited from class de.fu_berlin.ties.ConfigurableProcessor |
---|
getConfig |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
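public static final String CONFIG_TOE
Configuration key for determining the training mode (isTrainingOnlyErrors()).

public static final String CONFIG_TEST_ONLY
Configuration key determining whether the trainer only ensures that all answer keys exist and can be located in the document instead of doing any training.

public static final String PREFIX_GLOBAL_ACC
Prefix used for serializing the global (overall) accuracy.

public static final String PREFIX_LOCAL_ACC
Prefix used for serializing the local (document-specific) accuracy.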
Constructor Detail |
---|
public Trainer() throws IllegalArgumentException, ProcessingException
Creates a new instance without specifying an output extension (which isn't needed anyway, because this class doesn't produce output). Delegates to Trainer(String) using a dummy extension.
Throws:
IllegalArgumentException - if the combination strategy cannot be initialized (cf. CombinationStrategy.createStrategy(java.util.Set, TiesConfiguration))
ProcessingException - if an error occurs during initialization

public Trainer(String outExt) throws IllegalArgumentException, ProcessingException
Creates a new instance. Delegates to Trainer(String, TiesConfiguration) using the standard configuration.
Parameters:
outExt - the extension to use for output files
Throws:
IllegalArgumentException - if the combination strategy cannot be initialized (cf. CombinationStrategy.createStrategy(java.util.Set, TiesConfiguration))
ProcessingException - if an error occurs during initialization

public Trainer(String outExt, TiesConfiguration config) throws IllegalArgumentException, ProcessingException
Creates a new instance. Delegates to the Trainer(String, File, TiesConfiguration) constructor without specifying a runDirectory.
Parameters:
outExt - the extension to use for output files
config - the configuration to use
Throws:
IllegalArgumentException - if the combination strategy cannot be initialized (cf. CombinationStrategy.createStrategy(java.util.Set, TiesConfiguration))
ProcessingException - if an error occurs during initialization

public Trainer(String outExt, File runDirectory, TiesConfiguration config) throws IllegalArgumentException, ProcessingException
Creates a new instance. Sets the training mode (isTrainingOnlyErrors()) to the value of the CONFIG_TOE configuration key in the provided configuration and delegates to the corresponding super constructor to configure the fields.
Parameters:
outExt - the extension to use for output files
runDirectory - the directory to run the classifier in; used instead of the configured directory if not null
config - the configuration to use
Throws:
IllegalArgumentException - if the combination strategy cannot be initialized (cf. CombinationStrategy.createStrategy(java.util.Set, TiesConfiguration))
ProcessingException - if an error occurs during initialization

public Trainer(String outExt, TargetStructure targetStruct, TrainableClassifier[] theClassifiers, Representation theRepresentation, CombinationStrategy combiStrat, TokenizerFactory tFactory, TrainableFilter sentFilter)
Creates a new instance, using the standard configuration to configure the training mode and the superclasses.
Parameters:
outExt - the extension to use for output files
targetStruct - the target structure specifying the classes to recognize
theClassifiers - the array of classifiers to train
theRepresentation - the context representation to use for training
combiStrat - the combination strategy to use
tFactory - used to instantiate tokenizers
sentFilter - the filter used in the first step of a double classification approach ("sentence filtering"); if null, no sentence filtering is used

public Trainer(String outExt, TargetStructure targetStruct, TrainableClassifier[] theClassifiers, Representation theRepresentation, CombinationStrategy combiStrat, TokenizerFactory tFactory, TrainableFilter sentFilter, Set<String> relevantPunct, boolean trainOnlyErrors, boolean testOnly, TiesConfiguration config)
Creates a new instance.
Parameters:
outExt - the extension to use for output files
targetStruct - the target structure specifying the classes to recognize
theClassifiers - the array of classifiers to train
theRepresentation - the context representation to use for training
combiStrat - the combination strategy to use
tFactory - used to instantiate tokenizers
sentFilter - the filter used in the first step of a double classification approach ("sentence filtering"); if null, no sentence filtering is used
relevantPunct - a set of punctuation tokens that have been found to be relevant for token classification; might be empty but not null
trainOnlyErrors - whether to train only errors (TOE mode, recommended) or to train all instances (brute-force mode)
testOnly - if true, the trainer only ensures that all answer keys exist and can be located in the document instead of doing any training
config - used to configure superclasses; if null, the standard configuration is used

Method Detail |
---|
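protected FilteringTokenWalker createFilteringTokenWalker(TrainableFilter repFilter)
Creates a filtering token walker to be used for walking through a document and sentence classification if a double classification approach is used.
createFilteringTokenWalker in class ExtractorBase
Parameters:
repFilter - the trainable filter to use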
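In a double classification approach, a sentence filter first decides which sentences are relevant, and only the tokens of accepted sentences are passed on for token classification. A minimal sketch of that flow, assuming a plain predicate as the filter and whitespace tokenization; TwoStageWalker is a hypothetical name, not the FilteringTokenWalker API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Illustrative two-stage ("double") classification: sentences rejected by
// the filter are skipped entirely; tokens of accepted sentences are kept
// for the token-level step. Hypothetical names, not TIES APIs.
class TwoStageWalker {
    private final Predicate<String> sentenceFilter;

    TwoStageWalker(Predicate<String> sentenceFilter) {
        this.sentenceFilter = sentenceFilter;
    }

    /** Returns the tokens that would reach the token classifier. */
    List<String> tokensToClassify(List<String> sentences) {
        List<String> tokens = new ArrayList<>();
        for (String sentence : sentences) {
            if (!sentenceFilter.test(sentence)) {
                continue; // the filter rejected the whole sentence
            }
            for (String token : sentence.trim().split("\\s+")) {
                if (!token.isEmpty()) { // an empty sentence splits into [""]
                    tokens.add(token);
                }
            }
        }
        return tokens;
    }
}
```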
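public void disableSentenceTraining()
Disables training the embedded sentence filter, if sentence filtering is used.

public void enableSentenceTraining()
Re-enables training the embedded sentence filter, if sentence filtering is used.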
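public FMetricsView evaluateSentenceFiltering()
Evaluates precision and recall for sentence filtering on the last processed document.
Returns:
the precision and recall metrics for sentence filtering, or null if sentence filtering is disabled

public boolean isTestingOnly()
If true, the trainer only ensures that all answer keys exist and can be located in the document instead of doing any training.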
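In test-only mode, the trainer checks the answer keys instead of training. A minimal sketch of such a check, assuming a plain substring search stands in for the real token-level matching (which is not documented here); AnswerKeyCheck is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of a test-only pass: report every answer key that cannot be
// located in the document text. Substring search is an assumed stand-in for
// the trainer's actual matching logic.
class AnswerKeyCheck {
    static List<String> unlocatedKeys(String documentText, List<String> answerKeys) {
        List<String> missing = new ArrayList<>();
        for (String key : answerKeys) {
            if (!documentText.contains(key)) {
                missing.add(key);
            }
        }
        return missing;
    }
}
```

An empty result means every key could be located, which is the condition test-only mode is meant to ensure.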
public boolean isTrainingOnlyErrors()
Whether to train only errors (TOE mode, recommended) or to train all instances (brute-force mode).
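The difference between the two training modes can be shown with a toy additive (count-based) classifier: in TOE mode the model is only updated when its current prediction is wrong, in brute-force mode on every instance. This is a sketch of the idea under those assumptions; ToyClassifier is hypothetical and is not the TIES classifier.

```java
import java.util.HashMap;
import java.util.Map;

// Toy additive classifier contrasting TOE and brute-force training.
class ToyClassifier {
    // class -> (feature -> weight)
    private final Map<String, Map<String, Integer>> weights = new HashMap<>();
    private final boolean trainOnlyErrors;
    private int updates = 0;

    ToyClassifier(boolean trainOnlyErrors) {
        this.trainOnlyErrors = trainOnlyErrors;
    }

    /** Returns the class with the highest summed feature weight;
     *  ties go to the class listed first. */
    String predict(String[] features, String[] classes) {
        String best = classes[0];
        int bestScore = Integer.MIN_VALUE;
        for (String c : classes) {
            Map<String, Integer> w = weights.getOrDefault(c, new HashMap<>());
            int score = 0;
            for (String f : features) {
                score += w.getOrDefault(f, 0);
            }
            if (score > bestScore) {
                best = c;
                bestScore = score;
            }
        }
        return best;
    }

    void train(String[] features, String trueClass, String[] classes) {
        if (trainOnlyErrors && predict(features, classes).equals(trueClass)) {
            return; // TOE: a correct prediction leaves the model unchanged
        }
        // Reinforce the true class's weights for all features of the instance.
        Map<String, Integer> w = weights.computeIfAbsent(trueClass, k -> new HashMap<>());
        for (String f : features) {
            w.merge(f, 1, Integer::sum);
        }
        updates++;
    }

    int getUpdates() {
        return updates;
    }
}
```

On data the model already classifies correctly, TOE mode skips the update, so it typically performs far fewer updates than brute-force mode.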
public void process(Document document, Writer writer, ContextMap context) throws IOException, ProcessingException
Trains the local classifier with the correct extractions of an XML document, using the provided context representation. The writer is ignored. The answer keys must be in a corresponding file ending in AnswerBuilder.EXT_ANSWERS in the same directory (when processing a local file) or in the current working directory (when processing a URL).
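Where the answer keys are looked up can be sketched as below, assuming the answer file shares the document's base name with its last extension replaced. That naming rule is an assumption, and "ans" is only a placeholder for the real value of AnswerBuilder.EXT_ANSWERS; AnswerFileLocator is a hypothetical class.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of the answer-file lookup rule: same directory for local files,
// current working directory for URLs. Naming rule and extension are assumed.
class AnswerFileLocator {
    static final String EXT_ANSWERS = "ans"; // placeholder value

    /** Strips the last extension (if any) and appends EXT_ANSWERS. */
    static String answerFileName(String documentName) {
        int dot = documentName.lastIndexOf('.');
        String base = dot > 0 ? documentName.substring(0, dot) : documentName;
        return base + "." + EXT_ANSWERS;
    }

    /** Local file: the answer file lives in the same directory. */
    static Path forLocalFile(Path document) {
        return document.resolveSibling(answerFileName(document.getFileName().toString()));
    }

    /** URL: the answer file is looked up in the current working directory. */
    static Path forUrl(URI url) {
        String path = url.getPath();
        String name = path.substring(path.lastIndexOf('/') + 1);
        return Paths.get("").toAbsolutePath().resolve(answerFileName(name));
    }
}
```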
process in class DocumentReader
Parameters:
document - the document to read
writer - ignored by this method
context - a map of objects that are made available for processing
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing

public void processToken(Element element, String left, TokenDetails details, String right, ContextMap context) throws ProcessingException
Processes a token in an XML element, optionally modifying the element or the document it is part of.
processToken in interface TokenProcessor
Parameters:
element - the element containing the token
left - the textual contents of the element to the left of the token (in case of mixed contents, only up to the last preceding child element, if any)
details - details about the token to process
right - the textual contents of the element to the right of the token (in case of mixed contents, only up to the next following child element, if any)
context - a map of objects that are made available for processing
Throws:
ProcessingException - if an error occurs during processing

public void reset() throws ProcessingException
Resets the internal classifier, completely deleting the prediction model.
Throws:
ProcessingException - if an error occurs during reset

public void resetGlobalAccuracy()
Resets the global (overall) accuracies measured so far by each classifier.
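protected void resetStrategy()
Resets the combination strategy, logging a warning if it indicates that the last extraction should be discarded.
resetStrategy in class ExtractorBase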
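public boolean shouldMatch(Element element)
Decides whether an element should be accepted by filters.
shouldMatch in interface Oracle
Parameters:
element - the element to test
Returns:
true if filters should accept the element; false otherwise

public String toString()
Returns a string representation of this object.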
toString in class ExtractorBase

public Accuracy[] train(Document document, ExtractionContainer correctExtractions) throws IOException, ProcessingException
Trains the local classifier with the correct extractions of an XML document, using the provided context representation.
Parameters:
document - a document whose contents should be classified
correctExtractions - a container of all correct extractions for the document
Returns:
… null otherwise
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing

public AccuracyView[] viewGlobalAccuracy()
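Returns a view on the global (overall) accuracies measured so far (or since the last call to resetGlobalAccuracy()) by each classifier. This is not a snapshot but will change whenever the underlying values are changed.
Returns:
… null otherwise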
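The distinction between a view and a snapshot can be illustrated with a minimal sketch: the view reads through to the live counter, so later updates show up, while a snapshot copies the values at the time it is taken. AccuracyCounter and AccuracyCounterView are hypothetical simplifications, not the TIES Accuracy and AccuracyView classes.

```java
// Hypothetical, simplified counterparts of Accuracy/AccuracyView.
interface AccuracyCounterView {
    int correct();
    int total();
}

class AccuracyCounter {
    private int correct;
    private int total;

    void record(boolean wasCorrect) {
        total++;
        if (wasCorrect) {
            correct++;
        }
    }

    /** Read-only view backed by this counter; reflects later updates. */
    AccuracyCounterView view() {
        return new AccuracyCounterView() {
            @Override public int correct() { return AccuracyCounter.this.correct; }
            @Override public int total() { return AccuracyCounter.this.total; }
        };
    }

    /** Independent copy of the current values; unaffected by later updates. */
    AccuracyCounterView snapshot() {
        final int c = correct;
        final int t = total;
        return new AccuracyCounterView() {
            @Override public int correct() { return c; }
            @Override public int total() { return t; }
        };
    }
}
```

Returning a separate view object, rather than the counter itself, keeps callers from casting back and modifying the underlying values.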