java.lang.Object
  de.fu_berlin.ties.classify.TrainableClassifier
public abstract class TrainableClassifier
Classifiers extending this abstract class must provide a training mechanism
by implementing the doTrain(FeatureVector, String, ContextMap)
method. This class supports error-driven learning ("train only errors")
which often leads to better prediction models than brute-force training.
The code in this class is thread-safe.
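The error-driven flow described above can be illustrated with a minimal, self-contained sketch. It does not use the TIES types: `SketchClassifier` and `CountingClassifier` are hypothetical stand-ins, features are plain string sets, and `trainOnError` returns a boolean instead of a prediction distribution, purely to keep the example short.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a trainable classifier; not the TIES API.
abstract class SketchClassifier {
    /** Returns the predicted class among the candidates, or null if none scores. */
    protected abstract String doClassify(Set<String> features, Set<String> candidates);

    /** Incorporates the features into the model for the given target class. */
    protected abstract void doTrain(Set<String> features, String targetClass);

    /** "Train only errors": trains only if the prediction is wrong; returns true if trained. */
    public final boolean trainOnError(Set<String> features, String targetClass,
                                      Set<String> candidates) {
        String predicted = doClassify(features, candidates);
        if (targetClass.equals(predicted)) {
            return false; // prediction already correct, model left untouched
        }
        doTrain(features, targetClass);
        return true;
    }
}

// Toy model (an assumption for illustration): counts feature/class co-occurrences.
class CountingClassifier extends SketchClassifier {
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    @Override
    protected String doClassify(Set<String> features, Set<String> candidates) {
        String best = null;
        int bestScore = 0;
        for (String candidate : candidates) {
            Map<String, Integer> perFeature =
                counts.getOrDefault(candidate, Collections.<String, Integer>emptyMap());
            int score = 0;
            for (String feature : features) {
                score += perFeature.getOrDefault(feature, 0);
            }
            if (score > bestScore) {
                best = candidate;
                bestScore = score;
            }
        }
        return best;
    }

    @Override
    protected void doTrain(Set<String> features, String targetClass) {
        Map<String, Integer> perFeature =
            counts.computeIfAbsent(targetClass, k -> new HashMap<>());
        for (String feature : features) {
            perFeature.merge(feature, 1, Integer::sum);
        }
    }
}
```

In this sketch the first call to `trainOnError` for an unseen item trains the model, while a repeated call with the same item and target leaves the model unchanged because the prediction is already correct.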
Field Summary | |
---|---|
(package private) static QName |
ATTRIB_CLASSES
Attribute name used for XML serialization. |
(package private) static QName |
ATTRIB_TRAIN_ALL
Attribute name used for XML serialization. |
static QName |
ELEMENT_MAIN
Name of the main element used for XML serialization. |
static String |
META_CLASSIFIER
Flag used to load the MetaClassifier . |
static String |
MULTI_CLASSIFIER
Flag used to load the MultiBinaryClassifier . |
static String |
OAR_CLASSIFIER
Flag used to load the OneAgainstTheRestClassifier . |
static String |
TIE_CLASSIFIER
Flag used to load the TieClassifier . |
Fields inherited from interface de.fu_berlin.ties.classify.Classifier |
---|
CONFIG_CLASSIFIER |
Constructor Summary | |
---|---|
TrainableClassifier(Element element)
Creates a new instance from an XML element, fulfilling the recommendation of the XMLStorable interface. |
|
TrainableClassifier(Set<String> allValidClasses,
FeatureTransformer trans,
boolean trainAll,
TiesConfiguration conf)
Creates a new instance. |
|
TrainableClassifier(Set<String> allValidClasses,
FeatureTransformer trans,
TiesConfiguration conf)
Creates a new instance. |
Method Summary | |
---|---|
PredictionDistribution |
classify(FeatureVector features,
Set candidateClasses)
Classifies an item that is represented by a feature vector by choosing the most probable class among a set of candidate classes. |
static TrainableClassifier |
createClassifier(Set<String> allValidClasses)
Factory method that delegates to createClassifier(Set, TiesConfiguration) using the
standard configuration. |
static TrainableClassifier |
createClassifier(Set<String> allValidClasses,
File runDirectory,
FeatureTransformer trans,
String[] spec,
TiesConfiguration conf)
Factory method that creates a trainable classifier based on the provided specification. |
static TrainableClassifier |
createClassifier(Set<String> allValidClasses,
File runDirectory,
TiesConfiguration conf,
String suffix)
Factory method that delegates to createClassifier(Set, File, FeatureTransformer, String[],
TiesConfiguration) . |
static TrainableClassifier |
createClassifier(Set<String> allValidClasses,
TiesConfiguration conf)
Factory method that delegates to createClassifier(Set, TiesConfiguration, String)
without specifying a suffix. |
static TrainableClassifier |
createClassifier(Set<String> allValidClasses,
TiesConfiguration conf,
String suffix)
Factory method that delegates to createClassifier(Set, File, TiesConfiguration, String)
without specifying a run directory. |
void |
destroy()
Destroys the classifier. |
protected abstract PredictionDistribution |
doClassify(FeatureVector features,
Set candidateClasses,
ContextMap context)
Classifies an item that is represented by a feature vector by choosing the most probable class among a set of candidate classes. |
protected abstract void |
doTrain(FeatureVector features,
String targetClass,
ContextMap context)
Incorporates an item that is represented by a feature vector into the classification model. |
protected boolean |
doTrainOnError(PredictionDistribution predDist,
FeatureVector features,
String targetClass,
Set candidateClasses,
ContextMap context)
The core of the trainOnError(FeatureVector, String, Set) method. |
Set<String> |
getAllClasses()
Returns the set of all valid classes. |
TiesConfiguration |
getConfig()
Returns the configuration used by this instance. |
abstract void |
reset()
Resets the classifier, completely deleting the prediction model. |
protected boolean |
shouldTrain(String targetClass,
PredictionDistribution predDist,
ContextMap context)
Invoked by trainOnError(FeatureVector, String, Set) to decide
whether to train an instance. |
ObjectElement |
toElement()
Stores all relevant fields of this object in an XML element for serialization. Subclasses of TrainableClassifier should extend this method and
the corresponding constructor from Element to
ensure (de)serialization works as expected. |
String |
toString()
Returns a string representation of this object. |
void |
train(FeatureVector features,
String targetClass)
Incorporates an item that is represented by a feature vector into the classification model. |
PredictionDistribution |
trainOnError(FeatureVector features,
String targetClass,
Set candidateClasses)
Handles error-driven learning ("train only errors"): the specified feature vector is trained into the model only if the predicted class for the feature vector differs from the specified target class. |
protected boolean |
trainOnErrorHook(PredictionDistribution predDist,
FeatureVector features,
String targetClass,
Set candidateClasses,
ContextMap context)
Subclasses can implement this hook for more refined error-driven learning. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
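public static final QName ELEMENT_MAIN
Name of the main element used for XML serialization.
static final QName ATTRIB_CLASSES
Attribute name used for XML serialization.
static final QName ATTRIB_TRAIN_ALL
Attribute name used for XML serialization.
public static final String META_CLASSIFIER
Flag used to load the MetaClassifier.
public static final String MULTI_CLASSIFIER
Flag used to load the MultiBinaryClassifier.
public static final String OAR_CLASSIFIER
Flag used to load the OneAgainstTheRestClassifier.
public static final String TIE_CLASSIFIER
Flag used to load the TieClassifier.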
Constructor Detail |
---|
public TrainableClassifier(Element element) throws InstantiationException
Creates a new instance from an XML element, fulfilling the recommendation of the XMLStorable interface.
Parameters:
element - the XML element containing the serialized representation
Throws:
InstantiationException - if the given element does not contain a valid classifier description
public TrainableClassifier(Set<String> allValidClasses, FeatureTransformer trans, TiesConfiguration conf)
Creates a new instance.
Parameters:
allValidClasses - the set of all valid classes
trans - the last transformer in the transformer chain to use, or null if no feature transformers should be used
conf - used to configure this instance
public TrainableClassifier(Set<String> allValidClasses, FeatureTransformer trans, boolean trainAll, TiesConfiguration conf)
Creates a new instance.
Parameters:
allValidClasses - the set of all valid classes; all class names must be printable names
trans - the last transformer in the transformer chain to use, or null if no feature transformers should be used
trainAll - set to true if and only if the classifier should consider all classes for error-driven training, not only the candidate classes (results are filtered to the candidate classes prior to returning them)
conf - used to configure this instance
Method Detail |
---|
public static TrainableClassifier createClassifier(Set<String> allValidClasses) throws IllegalArgumentException, ProcessingException
Factory method that delegates to createClassifier(Set, TiesConfiguration) using the standard configuration.
Parameters:
allValidClasses - the set of all valid classes
Throws:
IllegalArgumentException - if the value of the Classifier.CONFIG_CLASSIFIER key is missing or invalid
ProcessingException - if an error occurred while creating the classifier
public static TrainableClassifier createClassifier(Set<String> allValidClasses, TiesConfiguration conf) throws IllegalArgumentException, ProcessingException
Factory method that delegates to createClassifier(Set, TiesConfiguration, String) without specifying a suffix.
Parameters:
allValidClasses - the set of all valid classes
conf - the configuration to use
Throws:
IllegalArgumentException - if the value of the Classifier.CONFIG_CLASSIFIER key is missing or invalid
ProcessingException - if an error occurred while creating the classifier
public static TrainableClassifier createClassifier(Set<String> allValidClasses, TiesConfiguration conf, String suffix) throws IllegalArgumentException, ProcessingException
Factory method that delegates to createClassifier(Set, File, TiesConfiguration, String) without specifying a run directory.
Parameters:
allValidClasses - the set of all valid classes
conf - the configuration to use
suffix - an optional suffix that is appended to the Classifier.CONFIG_CLASSIFIER key if not null
Throws:
IllegalArgumentException - if the value of the Classifier.CONFIG_CLASSIFIER key is missing or invalid
ProcessingException - if an error occurred while creating the classifier
public static TrainableClassifier createClassifier(Set<String> allValidClasses, File runDirectory, TiesConfiguration conf, String suffix) throws IllegalArgumentException, ProcessingException
Factory method that delegates to createClassifier(Set, File, FeatureTransformer, String[], TiesConfiguration). It reads the specification of the classifier from the Classifier.CONFIG_CLASSIFIER key in the provided configuration. It calls FeatureTransformer.createTransformer(TiesConfiguration) to initialize a transformer chain, if configured.
Parameters:
allValidClasses - the set of all valid classes
runDirectory - the directory to run the classifier in; used for ExternalClassifier instead of the configured directory if not null; ignored otherwise
conf - the configuration to use
suffix - an optional suffix that is appended to the Classifier.CONFIG_CLASSIFIER key if not null
Throws:
IllegalArgumentException - if the value of the Classifier.CONFIG_CLASSIFIER key is missing or invalid
ProcessingException - if an error occurred while creating the classifier
public static TrainableClassifier createClassifier(Set<String> allValidClasses, File runDirectory, FeatureTransformer trans, String[] spec, TiesConfiguration conf) throws IllegalArgumentException, ProcessingException
Factory method that creates a trainable classifier based on the provided specification.
Currently supported values in the first element of the specification:
ExternalClassifier
Winnow
UltraconservativeWinnow
MoonClassifier
TieClassifier or MetaClassifier
MultiBinaryClassifier or OneAgainstTheRestClassifier (if there are only two classes to classify, the outer classifier is skipped and the inner classifier is used directly)
Otherwise the first element must be the qualified name of a TrainableClassifier subclass accepting a Set (of all valid class names) as first argument, a FeatureTransformer as second argument and a TiesConfiguration as third argument.
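The fallback case, where the first element of the specification is a qualified class name, can be sketched with plain Java reflection. This is only an illustration of the lookup pattern, not the actual factory code: `SketchTransformer`, `SketchConfig`, `ReflectiveDemoClassifier` and `ReflectiveFactorySketch` are hypothetical stand-ins for the TIES types, and wrapping failures in `IllegalArgumentException` is an assumption of this sketch.

```java
import java.lang.reflect.Constructor;
import java.util.Set;

// Hypothetical stand-ins for FeatureTransformer and TiesConfiguration.
class SketchTransformer { }
class SketchConfig { }

// Stand-in classifier exposing the three-argument constructor described above.
class ReflectiveDemoClassifier {
    final Set<String> classes;

    public ReflectiveDemoClassifier(Set<String> allValidClasses,
                                    SketchTransformer trans, SketchConfig conf) {
        this.classes = allValidClasses;
    }
}

class ReflectiveFactorySketch {
    /**
     * Instantiates the class named by spec[0] through its
     * (Set, transformer, configuration) constructor.
     */
    static Object create(Set<String> allValidClasses, SketchTransformer trans,
                         String[] spec, SketchConfig conf) {
        try {
            Class<?> type = Class.forName(spec[0]);
            Constructor<?> ctor = type.getConstructor(
                Set.class, SketchTransformer.class, SketchConfig.class);
            return ctor.newInstance(allValidClasses, trans, conf);
        } catch (ReflectiveOperationException e) {
            // Assumption of this sketch: report any lookup failure as an invalid spec.
            throw new IllegalArgumentException("Cannot instantiate " + spec[0], e);
        }
    }
}
```

A class without a matching public constructor fails at `getConstructor`, which is why the documented constructor signature matters for custom subclasses.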
Parameters:
allValidClasses - the set of all valid classes
runDirectory - the directory to run the classifier in; used for ExternalClassifier instead of the configured directory if not null; ignored otherwise
trans - the last transformer in the transformer chain to use, or null if no feature transformers should be used
spec - the specification used to initialize the classifier, as described above
conf - passed to the created classifier to configure itself
Throws:
IllegalArgumentException - if the value of the Classifier.CONFIG_CLASSIFIER key is missing or invalid
ProcessingException - if an error occurred while creating the classifier
public final PredictionDistribution classify(FeatureVector features, Set candidateClasses) throws IllegalArgumentException, ProcessingException
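Classifies an item that is represented by a feature vector by choosing the most probable class among a set of candidate classes. Delegates to the abstract doClassify(FeatureVector, Set, ContextMap) method.
Specified by:
classify in interface Classifier
Parameters:
features - the feature vector to consider
candidateClasses - a set of classes that are allowed for this item
Returns:
the prediction distribution; call PredictionDistribution.best() to get the most probable class
Throws:
IllegalArgumentException - if the set of valid classes does not contain all candidate classes
ProcessingException - if an error occurs during classification
public void destroy() throws ProcessingException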
Destroys the classifier. The default implementation delegates to reset(), but subclasses can override this behaviour if appropriate.
Specified by:
destroy in interface Classifier
Throws:
ProcessingException - if an error occurs while the classifier is being destroyed
protected abstract PredictionDistribution doClassify(FeatureVector features, Set candidateClasses, ContextMap context) throws ProcessingException
Classifies an item that is represented by a feature vector by choosing the most probable class among a set of candidate classes.
Parameters:
features - the feature vector to consider
candidateClasses - a set of classes that are allowed for this item
context - can be used to transport implementation-specific contextual information between the doClassify(FeatureVector, Set, ContextMap), doTrain(FeatureVector, String, ContextMap), and trainOnErrorHook(PredictionDistribution, FeatureVector, String, Set, ContextMap) methods
Returns:
the prediction distribution; call PredictionDistribution.best() to get the most probable class
Throws:
ProcessingException - if an error occurs during classification
protected abstract void doTrain(FeatureVector features, String targetClass, ContextMap context) throws ProcessingException
Incorporates an item that is represented by a feature vector into the classification model.
Parameters:
features - the feature vector to consider
targetClass - the class of this feature vector
context - can be used to transport implementation-specific contextual information between the doClassify(FeatureVector, Set, ContextMap), doTrain(FeatureVector, String, ContextMap), and trainOnErrorHook(PredictionDistribution, FeatureVector, String, Set, ContextMap) methods
Throws:
ProcessingException - if an error occurs during training
protected boolean doTrainOnError(PredictionDistribution predDist, FeatureVector features, String targetClass, Set candidateClasses, ContextMap context) throws ProcessingException
The core of the trainOnError(FeatureVector, String, Set) method. Generally there is no need for subclasses to modify this method.
Parameters:
predDist - the prediction distribution returned by classify(FeatureVector, Set)
features - the feature vector to consider
targetClass - the expected class of this feature vector; must be contained in the set of candidateClasses
candidateClasses - a set of classes that are allowed for this item (the actual targetClass must be one of them)
context - can be used to transport implementation-specific contextual information between the doClassify(FeatureVector, Set, ContextMap), doTrain(FeatureVector, String, ContextMap), and trainOnErrorHook(PredictionDistribution, FeatureVector, String, Set, ContextMap) methods
Returns:
shouldTrain(String, PredictionDistribution, ContextMap) method
Throws:
ProcessingException - if an error occurs during training
public Set<String> getAllClasses()
Returns the set of all valid classes.
public TiesConfiguration getConfig()
Returns the configuration used by this instance.
public abstract void reset() throws ProcessingException
Resets the classifier, completely deleting the prediction model.
Throws:
ProcessingException - if an error occurs during reset
protected boolean shouldTrain(String targetClass, PredictionDistribution predDist, ContextMap context)
Invoked by trainOnError(FeatureVector, String, Set) to decide whether to train an instance. The default behavior is to train if the best prediction was wrong or didn't yield a positive probability ("train only errors"). Subclasses can override this method to add their own behavior, e.g. reinforcement training (thick threshold heuristic).
Parameters:
targetClass - the expected class of this feature vector; must be contained in the set of candidateClasses
predDist - the prediction distribution returned by doClassify(FeatureVector, Set, ContextMap)
context - can be used to transport implementation-specific contextual information between the doClassify(FeatureVector, Set, ContextMap), doTrain(FeatureVector, String, ContextMap), and trainOnErrorHook(PredictionDistribution, FeatureVector, String, Set, ContextMap) methods
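The default rule and a possible override can be sketched as follows. The types are hypothetical stand-ins rather than the TIES classes: `BestPrediction` models only the best entry of a prediction distribution, and the `margin` parameter is one assumed way of expressing a thick threshold (train also when a correct prediction was not confident enough).

```java
// Hypothetical stand-in for the best entry of a prediction distribution.
class BestPrediction {
    final String type;
    final double probability;

    BestPrediction(String type, double probability) {
        this.type = type;
        this.probability = probability;
    }
}

// Default rule as described above: train only errors.
class DefaultTrainPolicy {
    boolean shouldTrain(String targetClass, BestPrediction best) {
        return best == null
            || !targetClass.equals(best.type)   // best prediction was wrong
            || best.probability <= 0.0;         // no positive probability
    }
}

// Assumed variant: additionally reinforce correct but weak predictions.
class ThickThresholdPolicy extends DefaultTrainPolicy {
    private final double margin;

    ThickThresholdPolicy(double margin) {
        this.margin = margin;
    }

    @Override
    boolean shouldTrain(String targetClass, BestPrediction best) {
        if (super.shouldTrain(targetClass, best)) {
            return true; // errors are always trained
        }
        // Here the prediction is correct; train anyway if it is below the margin.
        return best.probability < margin;
    }
}
```

With a margin of 0.8, a correct prediction with probability 0.6 would still be trained under the override, while the default rule would skip it.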
public ObjectElement toElement()
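Stores all relevant fields of this object in an XML element for serialization. An equivalent object can be created by calling ObjectElement.createObject(org.dom4j.Element, Class) on the created element. Subclasses of TrainableClassifier should extend this method and the corresponding constructor from Element to ensure (de)serialization works as expected.
Specified by:
toElement in interface XMLStorable
public String toString()
Returns a string representation of this object.
Overrides:
toString in class Object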
public final void train(FeatureVector features, String targetClass) throws IllegalArgumentException, ProcessingException
Incorporates an item that is represented by a feature vector into the classification model. Delegates to the abstract doTrain(FeatureVector, String, ContextMap) method.
Parameters:
features - the feature vector to consider
targetClass - the class of this feature vector
Throws:
IllegalArgumentException - if the target class is not in the set of valid classes
ProcessingException - if an error occurs during training
public final PredictionDistribution trainOnError(FeatureVector features, String targetClass, Set candidateClasses) throws ProcessingException
Handles error-driven learning ("train only errors"): the specified feature vector is trained into the model only if the predicted class for the feature vector differs from the specified target class.
Parameters:
features - the feature vector to consider
targetClass - the expected class of this feature vector; must be contained in the set of candidateClasses
candidateClasses - a set of classes that are allowed for this item (the actual targetClass must be one of them)
Returns:
null if no training was necessary (the prediction was already correct)
Throws:
ProcessingException - if an error occurs during training
protected boolean trainOnErrorHook(PredictionDistribution predDist, FeatureVector features, String targetClass, Set candidateClasses, ContextMap context) throws ProcessingException
Subclasses can implement this hook for more refined error-driven learning. It is called by the trainOnError(FeatureVector, String, Set) method after classifying. This method can do any necessary training itself and return true to signal that no further action is necessary. This implementation is just a placeholder that always returns false.
Parameters:
predDist - the prediction distribution returned by classify(FeatureVector, Set)
features - the feature vector to consider
targetClass - the expected class of this feature vector; must be contained in the set of candidateClasses
candidateClasses - a set of classes that are allowed for this item (the actual targetClass must be one of them)
context - can be used to transport implementation-specific contextual information between the doClassify(FeatureVector, Set, ContextMap), doTrain(FeatureVector, String, ContextMap), and trainOnErrorHook(PredictionDistribution, FeatureVector, String, Set, ContextMap) methods
Returns:
this implementation always returns false; subclasses can return true to signal that any error-driven learning was already handled
Throws:
ProcessingException - if an error occurs during training
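How such a hook interacts with the default handling can be sketched in a simplified form. The classes below are hypothetical and do not use the TIES types: predictions are reduced to plain class names, and the promote/demote behavior of the subclass is an invented example of "more refined" learning.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in; predictions are plain class names here.
class HookSketchTrainer {
    final List<String> log = new ArrayList<>();

    /** Placeholder hook: does nothing and returns false, so default handling proceeds. */
    protected boolean trainOnErrorHook(String predicted, String targetClass) {
        return false;
    }

    /** Default training step; here it only records what would be trained. */
    protected void doTrain(String targetClass) {
        log.add("train:" + targetClass);
    }

    /** Calls the hook first; falls back to "train only errors" if it returns false. */
    public final void trainOnError(String predicted, String targetClass) {
        if (trainOnErrorHook(predicted, targetClass)) {
            return; // the hook already handled all necessary training
        }
        if (!targetClass.equals(predicted)) {
            doTrain(targetClass);
        }
    }
}

// Subclass that handles errors itself: promotes the target, demotes the wrong class.
class PromoteDemoteTrainer extends HookSketchTrainer {
    @Override
    protected boolean trainOnErrorHook(String predicted, String targetClass) {
        if (targetClass.equals(predicted)) {
            return false; // nothing to refine, let the default logic decide
        }
        log.add("promote:" + targetClass);
        log.add("demote:" + predicted);
        return true; // signal that no further training is needed
    }
}
```

Returning true from the hook suppresses the default training step, so a subclass that overrides it takes full responsibility for updating the model in that case.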