de.fu_berlin.ties.classify.winnow
Class Winnow

java.lang.Object
  extended by de.fu_berlin.ties.classify.TrainableClassifier
      extended by de.fu_berlin.ties.classify.winnow.Winnow
All Implemented Interfaces:
Classifier
Direct Known Subclasses:
UltraconservativeWinnow

public class Winnow
extends TrainableClassifier

Classifier implementing the Winnow algorithm (Nick Littlestone). Winnow supports only error-driven training, so you always have to use the TrainableClassifier.trainOnError(FeatureVector, String, Set) method. Trying to call the TrainableClassifier.train(FeatureVector, String) method instead will result in an UnsupportedOperationException.
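
To illustrate what error-driven training means here, the following is a minimal sketch of a binary, non-balanced Winnow learner. It is not the TIES implementation: the class MistakeDrivenWinnowSketch, its methods, the use of plain strings as features, and the choice of 1.0 as initial weight with the number of active features as threshold are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified binary Winnow; not the TIES class. */
final class MistakeDrivenWinnowSketch {
    private final Map<String, Float> weights = new HashMap<>();
    private final float promotion; // expected to be > 1.0
    private final float demotion;  // expected to be < 1.0

    MistakeDrivenWinnowSketch(float promotion, float demotion) {
        this.promotion = promotion;
        this.demotion = demotion;
    }

    /** Sums the weights of the active features; unseen features weigh 1.0. */
    float score(Set<String> activeFeatures) {
        float sum = 0.0f;
        for (String feature : activeFeatures) {
            sum += weights.getOrDefault(feature, 1.0f);
        }
        return sum;
    }

    /** Predicts positive if the score exceeds the number of active features. */
    boolean predict(Set<String> activeFeatures) {
        return score(activeFeatures) > activeFeatures.size();
    }

    /**
     * Changes weights only if the current prediction is wrong.
     * Returns true if an update took place.
     */
    boolean trainOnError(Set<String> activeFeatures, boolean expected) {
        if (predict(activeFeatures) == expected) {
            return false; // correct prediction: nothing to learn
        }
        // Missed positive: promote. False positive: demote.
        float factor = expected ? promotion : demotion;
        for (String feature : activeFeatures) {
            weights.put(feature, weights.getOrDefault(feature, 1.0f) * factor);
        }
        return true;
    }
}
```

Because weights change only after a misclassification, a training call needs the current prediction, which is why a plain train method without a prediction step does not fit the algorithm.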

Instances of this class are thread-safe.

Version:
$Revision: 1.16 $, $Date: 2004/04/13 08:00:13 $, $Author: siefkes $
Author:
Christian Siefkes

Field Summary
 
Fields inherited from interface de.fu_berlin.ties.classify.Classifier
CONFIG_CLASSIFIER
 
Constructor Summary
Winnow(Set allValidClasses)
          Creates a new instance based on the standard configuration.
Winnow(Set allValidClasses, FeatureTransformer trans, boolean balance, float promotionFactor, float demotionFactor, float thresholdThick, int featureNum)
          Creates a new instance.
Winnow(Set allValidClasses, FeatureTransformer trans, TiesConfiguration config)
          Creates a new instance based on the provided configuration.
Winnow(Set allValidClasses, TiesConfiguration config)
          Creates a new instance based on the provided configuration.
 
Method Summary
protected  void adjustWeights(Feature feature, short[] directions)
          Adjusts the weights of a feature for all classes.
protected  void chooseClassesToAdjust(WinnowDistribution winnowDist, String targetClass, Set classesToPromote, Set classesToDemote)
          Chooses the classes to promote and the classes to demote.
protected  double confidence(float sigmoid, float sum)
          Converts a sigmoid activation value into a confidence estimate.
protected  float defaultWeight()
          Returns the default weight to use if a feature is unknown.
protected  PredictionDistribution doClassify(FeatureVector features, Set candidateClasses)
          Classifies an item that is represented by a feature vector by choosing the most probable class among a set of candidate classes.
protected  void doTrain(FeatureVector features, String targetClass)
          Winnow supports only error-driven training, so you always have to use the TrainableClassifier.trainOnError(FeatureVector, String, Set) method instead of this one.
protected  FeatureSet featureSet(FeatureVector fv)
          Converts a feature vector into a FeatureSet (a multi-set of features).
 float getDemotion()
          Returns the demotion factor used by the algorithm.
 float getPromotion()
          Returns the promotion factor used by the algorithm.
 float getThresholdThickness()
          Returns the thickness of the threshold if the "thick threshold" heuristic is used.
protected  float[] initScores()
          Initializes the score (activation values) to use for all classes.
protected  float initWeight()
          Returns the initial weight to use for each feature per class.
protected  float[] initWeightArray()
          Returns the initial weight array to use for a feature for all classes.
 boolean isBalanced()
          Whether the Balanced Winnow or the standard Winnow algorithm is used.
protected  float majorThreshold(float threshold, float rawThreshold)
          Calculates the major threshold (theta+) to use for classification with the "thick threshold" heuristic.
protected  float minorThreshold(float threshold, float rawThreshold)
          Calculates the minor threshold (theta-) to use for classification with the "thick threshold" heuristic.
protected  float rawThreshold(FeatureSet features)
          Calculates the threshold (theta) to use for classification, based on the number of active features.
protected  float sigmoid(float score, float threshold, float rawThreshold)
          Converts the raw score (activation value) to a value in the range from 0 to 1 via a sigmoid function depending on the threshold theta.
protected  float threshold(float rawThreshold)
          Calculates the threshold (theta) to use for classification.
 String toString()
          Returns a string representation of this object.
protected  boolean trainOnErrorHook(PredictionDistribution predDist, FeatureVector features, String targetClass, Set candidateClasses)
          Hook implementing error-driven learning, promoting and demoting weights as required.
protected  void updateScores(Feature feature, float[] scores)
          Updates the score (activation values) for all classes by adding the weights of a feature.
 
Methods inherited from class de.fu_berlin.ties.classify.TrainableClassifier
classify, createClassifier, createClassifier, createClassifier, getAllClasses, train, trainOnError
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Winnow

public Winnow(Set allValidClasses)
       throws IllegalArgumentException
Creates a new instance based on the standard configuration.

Parameters:
allValidClasses - the set of all valid classes
Throws:
IllegalArgumentException - if one of the parameters is outside the allowed range

Winnow

public Winnow(Set allValidClasses,
              TiesConfiguration config)
       throws IllegalArgumentException
Creates a new instance based on the provided configuration.

Parameters:
allValidClasses - the set of all valid classes
config - contains configuration properties
Throws:
IllegalArgumentException - if one of the parameters is outside the allowed range

Winnow

public Winnow(Set allValidClasses,
              FeatureTransformer trans,
              TiesConfiguration config)
       throws IllegalArgumentException
Creates a new instance based on the provided configuration.

Parameters:
allValidClasses - the set of all valid classes
trans - the last transformer in the transformer chain to use, or null if no feature transformers should be used
config - contains configuration properties
Throws:
IllegalArgumentException - if one of the parameters is outside the allowed range

Winnow

public Winnow(Set allValidClasses,
              FeatureTransformer trans,
              boolean balance,
              float promotionFactor,
              float demotionFactor,
              float thresholdThick,
              int featureNum)
       throws IllegalArgumentException
Creates a new instance.

Parameters:
allValidClasses - the set of all valid classes
trans - the last transformer in the transformer chain to use, or null if no feature transformers should be used
balance - whether to use the Balanced Winnow or the standard Winnow algorithm
promotionFactor - the promotion factor used by the algorithm; must be > 1.0
demotionFactor - the demotion factor used by the algorithm; must be < 1.0
thresholdThick - the thickness of the threshold if the "thick threshold" heuristic is used (must be < 1.0), 0.0 otherwise
featureNum - the number of features to store
Throws:
IllegalArgumentException - if one of the parameters is outside the allowed range
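
The documented parameter ranges can be checked as in the following sketch. The class WinnowParameterCheckSketch and its check method are hypothetical and only mirror the bounds stated above. Lower bounds are not stated in this documentation, so the sketch does not test them.

```java
/** Hypothetical range check mirroring the documented constructor constraints. */
final class WinnowParameterCheckSketch {
    private WinnowParameterCheckSketch() { }

    static void check(float promotionFactor, float demotionFactor, float thresholdThick) {
        if (promotionFactor <= 1.0f) {
            throw new IllegalArgumentException(
                "promotionFactor must be > 1.0: " + promotionFactor);
        }
        if (demotionFactor >= 1.0f) {
            throw new IllegalArgumentException(
                "demotionFactor must be < 1.0: " + demotionFactor);
        }
        if (thresholdThick >= 1.0f) {
            throw new IllegalArgumentException(
                "thresholdThick must be < 1.0: " + thresholdThick);
        }
    }
}
```

A call such as check(1.5f, 0.5f, 0.1f) passes, while check(1.0f, 0.5f, 0.1f) throws an IllegalArgumentException.
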
Method Detail

adjustWeights

protected void adjustWeights(Feature feature,
                             short[] directions)
Adjusts the weights of a feature for all classes. This method should be called in a synchronized context.

Parameters:
feature - the feature to process
directions - an array specifying for each class (in alphabetic order) whether it should be promoted (positive value), demoted (negative value) or left unmodified (0)
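
As an illustration of how a directions array can drive the update, here is a sketch for the standard (non-balanced) case with one weight per class. The class AdjustWeightsSketch and the explicit weights, promotion and demotion parameters are assumptions for this example. In the real class these values are presumably held internally, and Balanced Winnow would need to handle two weights per class.

```java
/** Hypothetical per-feature update for standard Winnow (one weight per class). */
final class AdjustWeightsSketch {
    private AdjustWeightsSketch() { }

    static void adjust(float[] weights, short[] directions,
                       float promotion, float demotion) {
        if (weights.length != directions.length) {
            throw new IllegalArgumentException("one direction per class is required");
        }
        for (int i = 0; i < weights.length; i++) {
            if (directions[i] > 0) {
                weights[i] *= promotion;  // promote this class
            } else if (directions[i] < 0) {
                weights[i] *= demotion;   // demote this class
            }
            // direction 0: leave the weight unmodified
        }
    }
}
```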

chooseClassesToAdjust

protected void chooseClassesToAdjust(WinnowDistribution winnowDist,
                                     String targetClass,
                                     Set classesToPromote,
                                     Set classesToDemote)
Chooses the classes to promote and the classes to demote. This implementation chooses the targetClass for promotion if its score is less than or equal to the threshold. It chooses every other class for demotion if its score is greater than the threshold.

Parameters:
winnowDist - the prediction distribution returned by TrainableClassifier.classify(FeatureVector, Set)
targetClass - the expected class of this instance; must be contained in the set of candidateClasses
classesToPromote - the classes to promote are added to this set
classesToDemote - the classes to demote are added to this set
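
The selection rule described above can be sketched as follows. The example replaces WinnowDistribution with a plain map from class name to score and passes the threshold explicitly. Both are assumptions made to keep the sketch self-contained, and the class ChooseClassesSketch is hypothetical.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of the promote/demote selection rule. */
final class ChooseClassesSketch {
    private ChooseClassesSketch() { }

    static void choose(Map<String, Float> scores, float threshold, String targetClass,
                       Set<String> classesToPromote, Set<String> classesToDemote) {
        for (Map.Entry<String, Float> entry : scores.entrySet()) {
            String candidate = entry.getKey();
            float score = entry.getValue();
            if (candidate.equals(targetClass)) {
                if (score <= threshold) {
                    classesToPromote.add(candidate); // target scored too low
                }
            } else if (score > threshold) {
                classesToDemote.add(candidate);      // wrong class scored too high
            }
        }
    }
}
```

With scores A=1.0, B=3.0, C=1.5, a threshold of 2.0 and target class A, the sketch adds A to classesToPromote and B to classesToDemote; C is left alone.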

confidence

protected double confidence(float sigmoid,
                            float sum)
Converts a sigmoid activation value into a confidence estimate.

Parameters:
sigmoid - the sigmoid activation value to convert
sum - the sum of all sigmoid activation values
Returns:
the estimated confidence: sigmoid / sum
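
The normalization sigmoid / sum can be sketched as below. The class ConfidenceSketch and the helper confidences, which normalizes a whole array of sigmoid values, are hypothetical additions for illustration.

```java
/** Hypothetical illustration of the confidence estimate sigmoid / sum. */
final class ConfidenceSketch {
    private ConfidenceSketch() { }

    static double confidence(float sigmoid, float sum) {
        return (double) sigmoid / sum;
    }

    /** Normalizes all sigmoid values so that the confidences add up to 1. */
    static double[] confidences(float[] sigmoids) {
        float sum = 0.0f;
        for (float s : sigmoids) {
            sum += s;
        }
        double[] result = new double[sigmoids.length];
        for (int i = 0; i < sigmoids.length; i++) {
            result[i] = confidence(sigmoids[i], sum);
        }
        return result;
    }
}
```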

defaultWeight

protected float defaultWeight()
Returns the default weight to use if a feature is unknown. This implementation returns 0.0 in case of Balanced Winnow (where positive and negative weights should cancel each other out), initWeight() otherwise.

Returns:
the default weight

doClassify

protected PredictionDistribution doClassify(FeatureVector features,
                                            Set candidateClasses)
Classifies an item that is represented by a feature vector by choosing the most probable class among a set of candidate classes.

Specified by:
doClassify in class TrainableClassifier
Parameters:
features - the feature vector to consider
candidateClasses - a set of classes that are allowed for this item
Returns:
the result of the classification; you can call PredictionDistribution.best() to get the most probable class

doTrain

protected void doTrain(FeatureVector features,
                       String targetClass)
                throws UnsupportedOperationException
Winnow supports only error-driven training, so you always have to use the TrainableClassifier.trainOnError(FeatureVector, String, Set) method instead of this one. Trying to call this method instead will result in an UnsupportedOperationException.

Specified by:
doTrain in class TrainableClassifier
Parameters:
features - ignored by this method
targetClass - ignored by this method
Throws:
UnsupportedOperationException - always thrown by this method; use TrainableClassifier.trainOnError(FeatureVector, String, Set) instead

featureSet

protected FeatureSet featureSet(FeatureVector fv)
Converts a feature vector into a FeatureSet (a multi-set of features). If the provided vector is already a FeatureSet instance, it is cast and returned. Otherwise a new FeatureSet with the same contents is created.

Parameters:
fv - the feature vector to convert
Returns:
a feature set with the same contents as the provided vector

getDemotion

public float getDemotion()
Returns the demotion factor used by the algorithm.

Returns:
the value of the attribute

getPromotion

public float getPromotion()
Returns the promotion factor used by the algorithm.

Returns:
the value of the attribute

isBalanced

public boolean isBalanced()
Whether the Balanced Winnow or the standard Winnow algorithm is used. Balanced Winnow keeps two weights per feature and class, a positive and a negative one.

Returns:
the value of the attribute

initScores

protected float[] initScores()
Initializes the score (activation values) to use for all classes.

Returns:
an array of floats containing the initial score for each class; the value of each float will be 0.0

getThresholdThickness

public float getThresholdThickness()
Returns the thickness of the threshold if the "thick threshold" heuristic is used.

Returns:
the value of the attribute, will be < 1.0; 0.0 if no thick threshold is used

initWeight

protected float initWeight()
Returns the initial weight to use for each feature per class. This implementation returns 1.0.

Returns:
the initial weight

initWeightArray

protected float[] initWeightArray()
Returns the initial weight array to use for a feature for all classes. The array returned by this implementation will contain one weight for each class in case of normal Winnow, two weights in case of Balanced Winnow. Each element is initialized to initWeight().

Returns:
the initial weight array
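
The following sketch shows one way to read the initialization rules for weights. It assumes that "two weights" means two weights per class (a positive and a negative one, as described for isBalanced()). The class InitWeightArraySketch and the numClasses and balanced parameters are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical illustration of weight initialization for both Winnow variants. */
final class InitWeightArraySketch {
    private InitWeightArraySketch() { }

    static float initWeight() {
        return 1.0f;
    }

    /** One weight per class for normal Winnow, two per class if balanced (assumed). */
    static float[] initWeightArray(int numClasses, boolean balanced) {
        float[] weights = new float[balanced ? 2 * numClasses : numClasses];
        Arrays.fill(weights, initWeight());
        return weights;
    }

    /** Weight of an unknown feature: 0.0 if balanced, initWeight() otherwise. */
    static float defaultWeight(boolean balanced) {
        return balanced ? 0.0f : initWeight();
    }
}
```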

majorThreshold

protected float majorThreshold(float threshold,
                               float rawThreshold)
Calculates the major threshold (theta+) to use for classification with the "thick threshold" heuristic. This implementation multiplies theta_r by the threshold thickness and adds the result to theta. Subclasses can override this method to calculate the major threshold in a different way.

Parameters:
threshold - the threshold theta
rawThreshold - the raw threshold theta_r
Returns:
the major threshold (theta+) to use for classification
See Also:
minorThreshold(float, float)

minorThreshold

protected float minorThreshold(float threshold,
                               float rawThreshold)
Calculates the minor threshold (theta-) to use for classification with the "thick threshold" heuristic. This implementation multiplies theta_r by the threshold thickness and subtracts the result from theta. Subclasses can override this method to calculate the minor threshold in a different way.

Parameters:
threshold - the threshold theta
rawThreshold - the raw threshold theta_r
Returns:
the minor threshold (theta-) to use for classification
See Also:
majorThreshold(float, float)
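
The two thick-threshold bounds described for majorThreshold and minorThreshold can be sketched together. The class ThickThresholdSketch is hypothetical, and the thickness is passed as an extra parameter here, whereas the real methods presumably read it from the instance (see getThresholdThickness()).

```java
/** Hypothetical illustration of the "thick threshold" bounds. */
final class ThickThresholdSketch {
    private ThickThresholdSketch() { }

    /** Upper bound: theta + thickness * raw threshold. */
    static float majorThreshold(float threshold, float rawThreshold, float thickness) {
        return threshold + rawThreshold * thickness;
    }

    /** Lower bound: theta - thickness * raw threshold. */
    static float minorThreshold(float threshold, float rawThreshold, float thickness) {
        return threshold - rawThreshold * thickness;
    }
}
```

For example, with a threshold of 4.0, a raw threshold of 4.0 and a thickness of 0.25, the major threshold is 5.0 and the minor threshold is 3.0.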

rawThreshold

protected float rawThreshold(FeatureSet features)
Calculates the threshold (theta) to use for classification, based on the number of active features. This implementation returns the number of features. Subclasses can override this method to calculate the threshold in a different way.

Parameters:
features - the feature set to consider
Returns:
the raw threshold (theta) to use

sigmoid

protected float sigmoid(float score,
                        float threshold,
                        float rawThreshold)
                 throws IllegalArgumentException
Converts the raw score (activation value) to a value in the range from 0 to 1 via a sigmoid function depending on the threshold theta. In this implementation this is calculated as follows:

Parameters:
score - the raw score (activation value); must be a positive value in case of normal (non-balanced) Winnow
threshold - the threshold theta used for this instance
rawThreshold - the raw threshold theta_r used for this instance
Returns:
the sigmoid score calculated as described above; will be in range from 0 to 1
Throws:
IllegalArgumentException - if normal Winnow is used and score <= 0

threshold

protected float threshold(float rawThreshold)
Calculates the threshold (theta) to use for classification. This implementation returns the rawThreshold multiplied by the default weight. Subclasses can override this method to calculate the threshold in a different way.

Parameters:
rawThreshold - the raw threshold
Returns:
the threshold (theta) to use for classification
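
Putting the descriptions of rawThreshold, defaultWeight and threshold together gives the following sketch. The class ThresholdSketch is hypothetical, and the feature count and default weight are passed in explicitly rather than taken from a FeatureSet or from the instance.

```java
/** Hypothetical illustration of how the threshold theta is derived. */
final class ThresholdSketch {
    private ThresholdSketch() { }

    /** Raw threshold: the number of active features. */
    static float rawThreshold(int numActiveFeatures) {
        return numActiveFeatures;
    }

    /** Threshold theta: raw threshold multiplied by the default weight. */
    static float threshold(float rawThreshold, float defaultWeight) {
        return rawThreshold * defaultWeight;
    }
}
```

Since the documented default weight is 0.0 for Balanced Winnow, this rule yields a threshold of 0.0 in the balanced case. For normal Winnow with an initial weight of 1.0, it yields the number of active features.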

trainOnErrorHook

protected boolean trainOnErrorHook(PredictionDistribution predDist,
                                   FeatureVector features,
                                   String targetClass,
                                   Set candidateClasses)
                            throws ProcessingException
Hook implementing error-driven learning, promoting and demoting weights as required.

Overrides:
trainOnErrorHook in class TrainableClassifier
Parameters:
predDist - the prediction distribution returned by TrainableClassifier.classify(FeatureVector, Set); must be a WinnowDistribution
features - the feature vector to consider
targetClass - the expected class of this feature vector; must be contained in the set of candidateClasses
candidateClasses - a set of classes that are allowed for this item (the actual targetClass must be one of them)
Returns:
this implementation always returns true to signal that any error-driven learning was already handled
Throws:
ProcessingException - if an error occurs during training

toString

public String toString()
Returns a string representation of this object.

Overrides:
toString in class TrainableClassifier
Returns:
a textual representation

updateScores

protected void updateScores(Feature feature,
                            float[] scores)
Updates the score (activation values) for all classes by adding the weights of a feature. This method should be called in a synchronized context.

Parameters:
feature - the feature to process
scores - an array of floats containing the scores for each class; will be updated by this method
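
For the standard (non-balanced) case, the score update amounts to adding the feature's per-class weights to the running scores. The sketch below assumes one weight per class and passes the feature's weight array directly. The class UpdateScoresSketch is hypothetical, and the balanced variant, which combines two weights per class, is not covered.

```java
/** Hypothetical score accumulation for standard Winnow (one weight per class). */
final class UpdateScoresSketch {
    private UpdateScoresSketch() { }

    static void updateScores(float[] featureWeights, float[] scores) {
        if (featureWeights.length != scores.length) {
            throw new IllegalArgumentException("one weight per class is required");
        }
        for (int i = 0; i < scores.length; i++) {
            scores[i] += featureWeights[i]; // add this feature's weight for class i
        }
    }
}
```

Starting from the all-zero array described for initScores(), calling this once per active feature produces the activation value of each class.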


Copyright © 2003-2004 Christian Siefkes. All Rights Reserved.