de.fu_berlin.ties.classify.winnow
Class Winnow

java.lang.Object
  extended by de.fu_berlin.ties.classify.TrainableClassifier
      extended by de.fu_berlin.ties.classify.winnow.Winnow
All Implemented Interfaces:
Classifier, XMLStorable
Direct Known Subclasses:
UltraconservativeWinnow

public class Winnow
extends TrainableClassifier

Classifier implementing the Winnow algorithm (Nick Littlestone). Winnow supports only error-driven training, so you always have to use the TrainableClassifier.trainOnError(FeatureVector, String, Set) method. Trying to call the TrainableClassifier.train(FeatureVector, String) method instead will result in an UnsupportedOperationException.
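To make the error-driven contract concrete, here is a minimal sketch of mistake-driven Winnow training. It is not the TIES implementation. The class MistakeDrivenWinnow, its list-of-strings feature representation and its simplified update rule are hypothetical assumptions for illustration: one weight per feature and class, promote the target class, and demote only the wrongly predicted class.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified learner used only to illustrate error-driven training;
// it is not part of the TIES API. One weight per (class, feature), default 1.0.
final class MistakeDrivenWinnow {
    private final Set<String> classes;
    private final float promotion;
    private final float demotion;
    private final Map<String, Map<String, Float>> weights = new HashMap<>();

    MistakeDrivenWinnow(Set<String> classes, float promotion, float demotion) {
        if (promotion <= 1.0f || demotion >= 1.0f) {
            throw new IllegalArgumentException("promotion must be > 1.0, demotion < 1.0");
        }
        this.classes = classes;
        this.promotion = promotion;
        this.demotion = demotion;
        for (String c : classes) {
            weights.put(c, new HashMap<>());
        }
    }

    // Sum of the class's weights for the given features (unknown features count as 1.0).
    private float score(String cls, List<String> features) {
        float sum = 0f;
        for (String f : features) {
            sum += weights.get(cls).getOrDefault(f, 1.0f);
        }
        return sum;
    }

    // Returns the class with the highest score (ties go to the first class iterated).
    String classify(List<String> features) {
        String best = null;
        float bestScore = Float.NEGATIVE_INFINITY;
        for (String c : classes) {
            float s = score(c, features);
            if (s > bestScore) {
                best = c;
                bestScore = s;
            }
        }
        return best;
    }

    // Unconditional training is rejected, mirroring the documented behaviour of train().
    void train(List<String> features, String targetClass) {
        throw new UnsupportedOperationException("use trainOnError instead");
    }

    // Classifies first and adjusts weights only on a mistake; returns true iff weights changed.
    boolean trainOnError(List<String> features, String targetClass) {
        if (!classes.contains(targetClass)) {
            throw new IllegalArgumentException("unknown class: " + targetClass);
        }
        String predicted = classify(features);
        if (targetClass.equals(predicted)) {
            return false; // correct prediction: no update
        }
        Map<String, Float> target = weights.get(targetClass);
        Map<String, Float> wrong = weights.get(predicted);
        for (String f : features) {
            target.put(f, target.getOrDefault(f, 1.0f) * promotion); // promote
            wrong.put(f, wrong.getOrDefault(f, 1.0f) * demotion);    // demote
        }
        return true;
    }
}
```

For example, with the classes {ham, spam} iterated alphabetically, a promotion factor of 1.5 and a demotion factor of 0.5, a first call trainOnError(["buy", "now"], "spam") misclassifies the item as ham and updates the weights. A second identical call finds the prediction already correct and leaves the weights unchanged.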

Instances of this class are thread-safe.

Version:
$Revision: 1.80 $, $Date: 2006/10/21 16:03:59 $, $Author: siefkes $
Author:
Christian Siefkes

Field Summary
 
Fields inherited from class de.fu_berlin.ties.classify.TrainableClassifier
ELEMENT_MAIN, META_CLASSIFIER, MULTI_CLASSIFIER, OAR_CLASSIFIER, TIE_CLASSIFIER
 
Fields inherited from interface de.fu_berlin.ties.classify.Classifier
CONFIG_CLASSIFIER
 
Constructor Summary
  Winnow(Element element)
          Creates a new instance from an XML element, fulfilling the recommendation of the XMLStorable interface.
  Winnow(Set<String> allValidClasses)
          Creates a new instance based on the standard configuration.
  Winnow(Set<String> allValidClasses, FeatureTransformer trans, boolean balance, float promotionFactor, float demotionFactor, float thresholdThick, int ignoreExp, TiesConfiguration config, String configSuffix)
          Creates a new instance.
  Winnow(Set<String> allValidClasses, FeatureTransformer trans, TiesConfiguration config)
          Creates a new instance based on the provided configuration.
protected Winnow(Set<String> allValidClasses, FeatureTransformer trans, TiesConfiguration config, String configSuffix)
          Creates a new instance based on the provided configuration.
protected Winnow(Set<String> allValidClasses, String configSuffix)
          Creates a new instance based on the standard configuration.
  Winnow(Set<String> allValidClasses, TiesConfiguration config)
          Creates a new instance based on the provided configuration.
protected Winnow(Set<String> allValidClasses, TiesConfiguration config, String configSuffix)
          Creates a new instance based on the provided configuration.
 
Method Summary
protected  void adjustWeights(Feature feature, short[] directions)
          Adjusts the weights of a feature for all classes.
protected  boolean checkRelevance(float[] weights)
          Checks whether a feature is relevant for classification.
protected  void chooseClassesToAdjust(WinnowDistribution winnowDist, String targetClass, Set<String> classesToPromote, Set<String> classesToDemote)
          Chooses the classes to promote and the classes to demote.
protected  double confidence(float normalized, float sum)
          Converts a normalized activation value into a confidence estimate.
protected  float defaultWeight()
          Returns the default weight to use if a feature is unknown.
 void destroy()
          Destroys the classifier.
protected  PredictionDistribution doClassify(FeatureVector features, Set candidateClasses, ContextMap context)
          Classifies an item that is represented by a feature vector by choosing the most probable class among a set of candidate classes.
protected  void doTrain(FeatureVector features, String targetClass, ContextMap context)
          Winnow supports only error-driven training, so you always have to use the TrainableClassifier.trainOnError(FeatureVector, String, Set) method instead of this one.
protected  FeatureSet featureSet(FeatureVector fv)
          Converts a feature vector into a FeatureSet (a multi-set of features).
 float getDemotion()
          Returns the demotion factor used by the algorithm.
 float getPromotion()
          Returns the promotion factor used by the algorithm.
 float getThresholdThickness()
          Returns the thickness of the threshold if the "thick threshold" heuristic is used.
protected  float[] initScores()
          Initializes the score (activation values) to use for all classes.
protected  float initWeight()
          Returns the initial weight to use for each feature per class.
protected  float[] initWeightArray()
          Returns the initial weight array to use for a feature for all classes.
 boolean isBalanced()
          Whether the Balanced Winnow or the standard Winnow algorithm is used.
protected  float majorThreshold(float threshold, float rawThreshold)
          Calculates the major threshold (theta+) to use for classification with the "thick threshold" heuristic.
protected  float minorThreshold(float threshold, float rawThreshold)
          Calculates the minor threshold (theta-) to use for classification with the "thick threshold" heuristic.
protected  float normalizeScore(float score, float threshold, float rawThreshold)
          Converts the raw score (activation value) to a normalized value depending on the threshold theta.
protected  float rawThreshold(FeatureSet features)
          Calculates the raw threshold (theta_r) to use for classification, based on the number of active features.
 void reset()
          Resets the classifier, completely deleting the prediction model.
 Map<String,List<Float>> showFeatureWeights(FeatureVector features)
          Returns a mapping from feature representations to weights.
protected  float threshold(float rawThreshold)
          Calculates the threshold (theta) to use for classification.
 ObjectElement toElement()
          Stores all relevant fields of this object in an XML element for serialization. An equivalent object can be created by calling ObjectElement.createObject(org.dom4j.Element, Class) on the created element. Subclasses of TrainableClassifier should extend this method and the corresponding constructor from Element to ensure (de)serialization works as expected.
 String toString()
          Returns a string representation of this object.
protected  boolean trainOnErrorHook(PredictionDistribution predDist, FeatureVector features, String targetClass, Set candidateClasses, ContextMap context)
          Hook implementing error-driven learning, promoting and demoting weights as required.
protected  void updateScores(Feature feature, float[] scores)
          Updates the score (activation values) for all classes by adding the weights of a feature.
 
Methods inherited from class de.fu_berlin.ties.classify.TrainableClassifier
classify, createClassifier, createClassifier, createClassifier, createClassifier, createClassifier, doTrainOnError, getAllClasses, getConfig, shouldTrain, train, trainOnError
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Winnow

public Winnow(Element element)
       throws InstantiationException
Creates a new instance from an XML element, fulfilling the recommendation of the XMLStorable interface.

Parameters:
element - the XML element containing the serialized representation
Throws:
InstantiationException - if the given element does not contain a valid classifier description

Winnow

public Winnow(Set<String> allValidClasses)
       throws IllegalArgumentException,
              ProcessingException
Creates a new instance based on the standard configuration.

Parameters:
allValidClasses - the set of all valid classes
Throws:
IllegalArgumentException - if one of the parameters is outside the allowed range
ProcessingException - if an error occurred while creating the feature transformer(s)

Winnow

protected Winnow(Set<String> allValidClasses,
                 String configSuffix)
          throws IllegalArgumentException,
                 ProcessingException
Creates a new instance based on the standard configuration.

Parameters:
allValidClasses - the set of all valid classes
configSuffix - optional suffix appended to the configuration keys when configuring this instance; might be null
Throws:
IllegalArgumentException - if one of the parameters is outside the allowed range
ProcessingException - if an error occurred while creating the feature transformer(s)

Winnow

public Winnow(Set<String> allValidClasses,
              TiesConfiguration config)
       throws IllegalArgumentException,
              ProcessingException
Creates a new instance based on the provided configuration.

Parameters:
allValidClasses - the set of all valid classes
config - contains configuration properties
Throws:
IllegalArgumentException - if one of the parameters is outside the allowed range
ProcessingException - if an error occurred while creating the feature transformer(s)

Winnow

protected Winnow(Set<String> allValidClasses,
                 TiesConfiguration config,
                 String configSuffix)
          throws IllegalArgumentException,
                 ProcessingException
Creates a new instance based on the provided configuration.

Parameters:
allValidClasses - the set of all valid classes
config - contains configuration properties
configSuffix - optional suffix appended to the configuration keys when configuring this instance; might be null
Throws:
IllegalArgumentException - if one of the parameters is outside the allowed range
ProcessingException - if an error occurred while creating the feature transformer(s)

Winnow

public Winnow(Set<String> allValidClasses,
              FeatureTransformer trans,
              TiesConfiguration config)
       throws IllegalArgumentException,
              ProcessingException
Creates a new instance based on the provided configuration.

Parameters:
allValidClasses - the set of all valid classes
trans - the last transformer in the transformer chain to use, or null if no feature transformers should be used
config - contains configuration properties
Throws:
IllegalArgumentException - if one of the parameters is outside the allowed range
ProcessingException - if an error occurred while creating the feature transformer(s)

Winnow

protected Winnow(Set<String> allValidClasses,
                 FeatureTransformer trans,
                 TiesConfiguration config,
                 String configSuffix)
          throws IllegalArgumentException,
                 ProcessingException
Creates a new instance based on the provided configuration.

Parameters:
allValidClasses - the set of all valid classes
trans - the last transformer in the transformer chain to use, or null if no feature transformers should be used
config - contains configuration properties
configSuffix - optional suffix appended to the configuration keys when configuring this instance; might be null
Throws:
IllegalArgumentException - if one of the parameters is outside the allowed range
ProcessingException - if an error occurred while creating the feature transformer(s)

Winnow

public Winnow(Set<String> allValidClasses,
              FeatureTransformer trans,
              boolean balance,
              float promotionFactor,
              float demotionFactor,
              float thresholdThick,
              int ignoreExp,
              TiesConfiguration config,
              String configSuffix)
       throws IllegalArgumentException
Creates a new instance.

Parameters:
allValidClasses - the set of all valid classes
trans - the last transformer in the transformer chain to use, or null if no feature transformers should be used
balance - whether to use the Balanced Winnow or the standard Winnow algorithm
promotionFactor - the promotion factor used by the algorithm; must be > 1.0
demotionFactor - the demotion factor used by the algorithm; must be < 1.0
thresholdThick - the thickness of the threshold if the "thick threshold" heuristic is used (must be < 1.0), 0.0 otherwise
ignoreExp - exponent used to calculate which features to consider irrelevant for classification (if any)
config - contains configuration properties
configSuffix - optional suffix appended to the configuration keys when configuring this instance; might be null
Throws:
IllegalArgumentException - if one of the parameters is outside the allowed range

Method Detail

adjustWeights

protected void adjustWeights(Feature feature,
                             short[] directions)
Adjusts the weights of a feature for all classes. This method should be called in a synchronized context.

Parameters:
feature - the feature to process
directions - an array specifying for each class (in alphabetic order) whether it should be promoted (positive value), demoted (negative value) or left unmodified (0)
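The following sketch shows one way the directions array could drive multiplicative weight updates. The class WeightAdjuster and the flat float[] layout (one weight per class for standard Winnow; for Balanced Winnow a positive weight at index 2c and a negative weight at 2c+1) are assumptions, not taken from this class. The Balanced Winnow rule used here follows the usual textbook formulation: promotion multiplies the positive weight by the promotion factor and the negative weight by the demotion factor, and demotion does the reverse.

```java
// Hypothetical helper, not the TIES implementation. Assumed weight layout:
// standard Winnow stores one weight per class (weights[c]); Balanced Winnow stores
// a positive weight at weights[2 * c] and a negative weight at weights[2 * c + 1].
final class WeightAdjuster {
    static void adjustWeights(float[] weights, short[] directions, boolean balanced,
                              float promotion, float demotion) {
        for (int c = 0; c < directions.length; c++) {
            if (directions[c] == 0) {
                continue; // leave this class unmodified
            }
            boolean promote = directions[c] > 0;
            if (balanced) {
                // Promotion grows the positive weight and shrinks the negative one;
                // demotion does the reverse.
                weights[2 * c] *= promote ? promotion : demotion;
                weights[2 * c + 1] *= promote ? demotion : promotion;
            } else {
                weights[c] *= promote ? promotion : demotion;
            }
        }
    }
}
```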

chooseClassesToAdjust

protected void chooseClassesToAdjust(WinnowDistribution winnowDist,
                                     String targetClass,
                                     Set<String> classesToPromote,
                                     Set<String> classesToDemote)
Chooses the classes to promote and the classes to demote. This implementation chooses the targetClass for promotion if its score is less than or equal to the threshold. It chooses all other classes for demotion if their score is greater than the threshold.

Parameters:
winnowDist - the prediction distribution returned by TrainableClassifier.classify(FeatureVector, Set)
targetClass - the expected class of this instance; must be contained in the set of candidateClasses
classesToPromote - the classes to promote are added to this set
classesToDemote - the classes to demote are added to this set
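A minimal sketch of the selection rule described above. A hypothetical Map of raw scores stands in for the WinnowDistribution, and a single threshold value is used, ignoring any thick-threshold handling. AdjustmentChooser is an illustrative name, not part of TIES.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper: applies the documented promote/demote rule to raw scores.
final class AdjustmentChooser {
    static void chooseClassesToAdjust(Map<String, Float> rawScores, float threshold,
                                      String targetClass, Set<String> classesToPromote,
                                      Set<String> classesToDemote) {
        for (Map.Entry<String, Float> entry : rawScores.entrySet()) {
            String cls = entry.getKey();
            float score = entry.getValue();
            if (cls.equals(targetClass)) {
                // The target class should be above the threshold; promote it if not.
                if (score <= threshold) {
                    classesToPromote.add(cls);
                }
            } else if (score > threshold) {
                // Any other class should be at or below the threshold; demote it if not.
                classesToDemote.add(cls);
            }
        }
    }
}
```

If every class already lies on the correct side of the threshold, both sets stay empty and no weights need to change.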

confidence

protected double confidence(float normalized,
                            float sum)
Converts a normalized activation value into a confidence estimate.

Parameters:
normalized - the normalized activation value to convert
sum - the sum of all normalized activation values
Returns:
the estimated confidence: normalized / sum
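A worked example of the documented formula confidence = normalized / sum. The helper class ConfidenceSketch and its confidences method are hypothetical. Because each value is divided by the sum of all normalized values, the resulting confidences add up to 1 (up to floating-point rounding).

```java
// Hypothetical helper illustrating confidence(float, float) on a set of normalized scores.
final class ConfidenceSketch {
    // confidence = normalized / sum, as documented
    static double confidence(float normalized, float sum) {
        return normalized / sum;
    }

    // Converts all normalized scores into confidences.
    static double[] confidences(float[] normalized) {
        float sum = 0f;
        for (float n : normalized) {
            sum += n;
        }
        double[] result = new double[normalized.length];
        for (int i = 0; i < normalized.length; i++) {
            result[i] = confidence(normalized[i], sum);
        }
        return result;
    }
}
```

For normalized values 2.0, 1.0 and 1.0 the sum is 4.0, giving confidences of 0.5, 0.25 and 0.25.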

checkRelevance

protected boolean checkRelevance(float[] weights)
Checks whether a feature is relevant for classification.

Parameters:
weights - the weights of the feature
Returns:
true iff the feature is relevant for classification

defaultWeight

protected float defaultWeight()
Returns the default weight to use if a feature is unknown. This implementation returns 0.0 in case of Balanced Winnow (where positive and negative weights should cancel each other out), initWeight() otherwise.

Returns:
the default weight

destroy

public void destroy()
Destroys the classifier. This method must be called only if the classifier will never be used again. The default implementation delegates to TrainableClassifier.reset(), but subclasses can override this behaviour if appropriate.

Specified by:
destroy in interface Classifier
Overrides:
destroy in class TrainableClassifier

doClassify

protected PredictionDistribution doClassify(FeatureVector features,
                                            Set candidateClasses,
                                            ContextMap context)
Classifies an item that is represented by a feature vector by choosing the most probable class among a set of candidate classes.

Specified by:
doClassify in class TrainableClassifier
Parameters:
features - the feature vector to consider
candidateClasses - a set of classes that are allowed for this item
context - can be used to transport implementation-specific contextual information between the TrainableClassifier.doClassify(FeatureVector, Set, ContextMap), TrainableClassifier.doTrain(FeatureVector, String, ContextMap), and TrainableClassifier.trainOnErrorHook(PredictionDistribution, FeatureVector, String, Set, ContextMap) methods
Returns:
the result of the classification; you can call PredictionDistribution.best() to get the most probable class

doTrain

protected void doTrain(FeatureVector features,
                       String targetClass,
                       ContextMap context)
                throws UnsupportedOperationException
Winnow supports only error-driven training, so you always have to use the TrainableClassifier.trainOnError(FeatureVector, String, Set) method instead of this one. Trying to call this method instead will result in an UnsupportedOperationException.

Specified by:
doTrain in class TrainableClassifier
Parameters:
features - ignored by this method
targetClass - ignored by this method
context - ignored by this method
Throws:
UnsupportedOperationException - always thrown by this method; use TrainableClassifier.trainOnError(FeatureVector, String, Set) instead

featureSet

protected FeatureSet featureSet(FeatureVector fv)
Converts a feature vector into a FeatureSet (a multi-set of features). If the last transformation of the provided vector already is a FeatureSet instance, it is cast and returned. Otherwise a new FeatureSet with the same contents is created; the method used to account for feature frequencies in its strength values is read from the "classifier.winnow.strength.frequency" configuration key.

Parameters:
fv - the feature vector to convert
Returns:
a feature set with the same contents as the provided vector

getDemotion

public float getDemotion()
Returns the demotion factor used by the algorithm.

Returns:
the value of the attribute

getPromotion

public float getPromotion()
Returns the promotion factor used by the algorithm.

Returns:
the value of the attribute

isBalanced

public boolean isBalanced()
Whether the Balanced Winnow or the standard Winnow algorithm is used. Balanced Winnow keeps two weights per feature and class, a positive and a negative one.

Returns:
the value of the attribute

initScores

protected float[] initScores()
Initializes the score (activation values) to use for all classes.

Returns:
an array of floats containing the initial score for each class; the value of each float will be 0.0

getThresholdThickness

public float getThresholdThickness()
Returns the thickness of the threshold if the "thick threshold" heuristic is used.

Returns:
the value of the attribute, will be < 1.0; 0.0 if no thick threshold is used

initWeight

protected float initWeight()
Returns the initial weight to use for each feature per class. This implementation returns 1.0.

Returns:
the initial weight

initWeightArray

protected float[] initWeightArray()
Returns the initial weight array to use for a feature for all classes. The array returned by this implementation contains one weight per class in case of standard Winnow and two weights per class in case of Balanced Winnow. Each element is initialized to initWeight().

Returns:
the initial weight array
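The initialization rules of initWeight(), initWeightArray() and defaultWeight() can be sketched as static helpers. The class InitialWeights and the explicit numClasses and balanced parameters are assumptions; the documented methods are instance methods that use the classifier's own state.

```java
import java.util.Arrays;

// Hypothetical static versions of initWeight(), initWeightArray() and defaultWeight().
final class InitialWeights {
    static float initWeight() {
        return 1.0f;
    }

    // One weight per class for standard Winnow, two (positive, negative) for Balanced Winnow.
    static float[] initWeightArray(int numClasses, boolean balanced) {
        float[] weights = new float[balanced ? 2 * numClasses : numClasses];
        Arrays.fill(weights, initWeight());
        return weights;
    }

    // Balanced Winnow: positive and negative weights cancel out, so an unknown feature counts 0.0.
    static float defaultWeight(boolean balanced) {
        return balanced ? 0.0f : initWeight();
    }
}
```

In the Balanced Winnow case a freshly initialized feature has a positive and a negative weight of 1.0 each. Their difference is 0.0, which matches the default weight returned for unknown features.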

majorThreshold

protected float majorThreshold(float threshold,
                               float rawThreshold)
Calculates the major threshold (theta+) to use for classification with the "thick threshold" heuristic. This implementation multiplies theta_r by the threshold thickness and adds the result to theta. Subclasses can override this method to calculate the major threshold in a different way.

Parameters:
threshold - the threshold theta
rawThreshold - the raw threshold theta_r
Returns:
the major threshold (theta+) to use for classification
See Also:
minorThreshold(float, float)

minorThreshold

protected float minorThreshold(float threshold,
                               float rawThreshold)
Calculates the minor threshold (theta-) to use for classification with the "thick threshold" heuristic. This implementation multiplies theta_r by the threshold thickness and subtracts the result from theta. Subclasses can override this method to calculate the minor threshold in a different way.

Parameters:
threshold - the threshold theta
rawThreshold - the raw threshold theta_r
Returns:
the minor threshold (theta-) to use for classification
See Also:
majorThreshold(float, float)
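The two thick-threshold formulas, theta+ = theta + theta_r * thickness and theta- = theta - theta_r * thickness, can be written out directly. In this sketch (hypothetical class ThickThreshold) the thickness is passed as an explicit parameter, whereas the documented methods take only theta and theta_r and presumably obtain the thickness from the instance (see getThresholdThickness()).

```java
// Hypothetical helper mirroring majorThreshold/minorThreshold with an explicit thickness.
final class ThickThreshold {
    // theta+ = theta + theta_r * thickness
    static float majorThreshold(float threshold, float rawThreshold, float thickness) {
        return threshold + rawThreshold * thickness;
    }

    // theta- = theta - theta_r * thickness
    static float minorThreshold(float threshold, float rawThreshold, float thickness) {
        return threshold - rawThreshold * thickness;
    }
}
```

With theta = 10.0, theta_r = 10.0 and a thickness of 0.25, this gives theta+ = 12.5 and theta- = 7.5. With a thickness of 0.0, both collapse to theta.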

normalizeScore

protected float normalizeScore(float score,
                               float threshold,
                               float rawThreshold)
Converts the raw score (activation value) to a normalized value depending on the threshold theta. In this implementation this is calculated as follows:

$$\mathrm{norm}(\mathit{score}, \theta, \theta_r) = e^{(\mathit{score} - \theta) / \theta_r}$$

Parameters:
score - the raw score (activation value); must be a positive value in case of normal (non-balanced) Winnow
threshold - the threshold theta used for this instance
rawThreshold - the raw threshold theta_r used for this instance
Returns:
the normalized score calculated as described above
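The normalization formula translates directly into Java. ScoreNormalizer is a hypothetical helper, and the sketch assumes theta_r > 0.

```java
// Hypothetical helper implementing norm(score, theta, theta_r) = e^((score - theta) / theta_r).
final class ScoreNormalizer {
    static float normalizeScore(float score, float threshold, float rawThreshold) {
        return (float) Math.exp((score - threshold) / rawThreshold);
    }
}
```

A raw score equal to theta maps to 1.0, scores above theta map to values greater than 1.0, and scores below theta map to values between 0.0 and 1.0. Since the result is always positive, it can be passed to confidence(float, float) even when Balanced Winnow produces negative raw scores.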

rawThreshold

protected float rawThreshold(FeatureSet features)
Calculates the raw threshold (theta_r) to use for classification, based on the number of active features. This implementation returns the sum of all relevant features. Subclasses can override this method to calculate the raw threshold in a different way.

Parameters:
features - the feature set to consider
Returns:
the raw threshold (theta_r) to use

reset

public void reset()
Resets the classifier, completely deleting the prediction model.

Specified by:
reset in class TrainableClassifier

showFeatureWeights

public Map<String,List<Float>> showFeatureWeights(FeatureVector features)
Returns a mapping from feature representations to weights. Features that are irrelevant, unknown (never seen during training), or contain only a comment are skipped. For each other feature, the returned map contains a list of weights for all classes, stored in the order of TrainableClassifier.getAllClasses().

This method exists for debugging and demonstration purposes.

Parameters:
features - the feature vector to consider
Returns:
a mapping from known relevant feature representations to weights

threshold

protected float threshold(float rawThreshold)
Calculates the threshold (theta) to use for classification. This implementation returns rawThreshold multiplied by the default weight (see defaultWeight()). Subclasses can override this method to calculate the threshold in a different way.

Parameters:
rawThreshold - the raw threshold (theta_r)
Returns:
the threshold (theta) to use for classification
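The raw threshold and the threshold derived from it might be computed as follows. This sketch makes two assumptions that the documentation does not spell out: each feature carries a numeric strength, and "the sum of all relevant features" means the sum of the strengths of the features not marked as irrelevant. ThresholdSketch and its parameters are hypothetical.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper: theta_r as a sum of feature strengths, theta = theta_r * defaultWeight.
final class ThresholdSketch {
    // Assumption: "sum of all relevant features" = sum of the strengths of relevant features.
    static float rawThreshold(Map<String, Float> featureStrengths, Set<String> irrelevant) {
        float sum = 0f;
        for (Map.Entry<String, Float> entry : featureStrengths.entrySet()) {
            if (!irrelevant.contains(entry.getKey())) {
                sum += entry.getValue();
            }
        }
        return sum;
    }

    // theta = theta_r * defaultWeight, as documented for threshold(float)
    static float threshold(float rawThreshold, float defaultWeight) {
        return rawThreshold * defaultWeight;
    }
}
```

Because defaultWeight() returns 0.0 for Balanced Winnow, the documented rule yields theta = 0.0 in the balanced case. Standard Winnow (default weight 1.0) uses theta = theta_r.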

trainOnErrorHook

protected boolean trainOnErrorHook(PredictionDistribution predDist,
                                   FeatureVector features,
                                   String targetClass,
                                   Set candidateClasses,
                                   ContextMap context)
                            throws ProcessingException
Hook implementing error-driven learning, promoting and demoting weights as required.

Overrides:
trainOnErrorHook in class TrainableClassifier
Parameters:
predDist - the prediction distribution returned by TrainableClassifier.classify(FeatureVector, Set); must be a WinnowDistribution
features - the feature vector to consider
targetClass - the expected class of this feature vector; must be contained in the set of candidateClasses
candidateClasses - a set of classes that are allowed for this item (the actual targetClass must be one of them)
context - ignored by this implementation
Returns:
this implementation always returns true to signal that any error-driven learning was already handled
Throws:
ProcessingException - if an error occurs during training

toElement

public ObjectElement toElement()
Stores all relevant fields of this object in an XML element for serialization. An equivalent object can be created by calling ObjectElement.createObject(org.dom4j.Element, Class) on the created element. Subclasses of TrainableClassifier should extend this method and the corresponding constructor from Element to ensure (de)serialization works as expected.

Specified by:
toElement in interface XMLStorable
Overrides:
toElement in class TrainableClassifier
Returns:
the created XML element

toString

public String toString()
Returns a string representation of this object.

Overrides:
toString in class TrainableClassifier
Returns:
a textual representation

updateScores

protected void updateScores(Feature feature,
                            float[] scores)
Updates the score (activation values) for all classes by adding the weights of a feature. This method should be called in a synchronized context.

Parameters:
feature - the feature to process
scores - an array of floats containing the scores for each class; will be updated by this method
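A sketch of the score update is shown below. It assumes one weight per class for standard Winnow and, for Balanced Winnow, a (positive, negative) pair per class stored at indices 2c and 2c+1. A Balanced Winnow feature then contributes the difference of its two weights, as is usual for that variant. ScoreUpdater and this layout are hypothetical, and any scaling by feature strength is omitted.

```java
// Hypothetical helper: adds one feature's weights to the per-class scores.
final class ScoreUpdater {
    static void updateScores(float[] weights, float[] scores, boolean balanced) {
        for (int c = 0; c < scores.length; c++) {
            // Balanced Winnow: positive weight minus negative weight; otherwise the single weight.
            scores[c] += balanced ? weights[2 * c] - weights[2 * c + 1] : weights[c];
        }
    }
}
```

Starting from the all-zero scores returned by initScores(), calling this once per active feature accumulates each class's activation value.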


Copyright © 2003-2007 Christian Siefkes. All Rights Reserved.