de.fu_berlin.ties.classify
Class Tuner

java.lang.Object
  extended by de.fu_berlin.ties.classify.Tuner

public class Tuner
extends Object

This class provides support for iterative training, also called TUNE (Train-until-no-errors) training.

Instances of this class are not thread-safe and must be synchronized externally, if required.

Version:
$Revision: 1.9 $, $Date: 2006/10/21 16:03:55 $, $Author: siefkes $
Author:
Christian Siefkes

Field Summary
static String CONFIG_SPLIT_SEPARATPR
          Configuration key: If given, the specified string is used to separate the training from the testing section of the corpus (e.g. "---") and the train split and test split values are ignored.
static String CONFIG_TEST_SPLIT
          Configuration key: The percentage of a corpus to use for testing (evaluation).
static String CONFIG_TRAIN_SPLIT
          Configuration key: The percentage of a corpus to use for training.
static String CONFIG_TUNE
          Configuration key: The maximum number of iterations used for TUNE (train until no error) training; if 1, training is incremental.
static String CONFIG_TUNE_EACH
          Configuration key: Whether to measure results after each TUNE iteration or only at the end of training.
static String CONFIG_TUNE_SINCE
          Configuration key: The training iteration after which to evaluate results for the first time if CONFIG_TUNE_EACH is enabled.
static String CONFIG_TUNE_STOP
          Configuration key: TUNE training is stopped if the training accuracy didn't improve for the specified number of iterations.
 
Constructor Summary
Tuner(float trainingSplit, float testingSplit, String splitSep, int tuneRuns, int tuneStopAfter, boolean measureEachTUNE, int startMeasureTUNE, List tuneEvalList)
          Creates a new instance.
Tuner(TiesConfiguration config, String suffix)
          Creates a new instance.
 
Method Summary
 boolean continueTraining(double[] currentAcc, int i)
          Whether to continue TUNE training after finishing an iteration.
 String getSplitSeparator()
          If not null, the returned string should be used to separate the training from the testing section of the corpus (e.g. "---") and the train split and test split values should be ignored.
 float getTestSplit()
          Returns the percentage of a corpus to use for testing; if -1, all remaining documents should be used.
 float getTrainSplit()
          Returns the percentage of a corpus to use for training.
 Set<Integer> getTuneEvaluations()
          Returns the set of iterations after which to evaluate TUNE training in addition to the last one; should be ignored if isTuneEach() is true.
 int getTuneIterations()
          Returns the maximum number of iterations used for TUNE (train until no error) training; if 1, training is incremental.
 int getTuneSince()
          Returns the training iteration after which to evaluate results for the first time if isTuneEach() is enabled.
 int getTuneStop()
          Returns the TUNE stopping criterion: TUNE training should be stopped if the training accuracy didn't improve for the specified number of iterations.
 boolean isTuneEach()
          Whether to measure results after each TUNE iteration or only at the end of training.
 void reset()
          Resets the state of this instance.
 void selectFiles(String[] allFiles, List<String> trainFiles, List<String> evalFiles)
          Chooses files to use for training and files to use for evaluation, depending on the configured settings.
 boolean shouldEvaluate(boolean continueTraining, int i)
          Whether to evaluate results after this TUNE iteration.
 String toString()
          Returns a string representation of this object.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

CONFIG_SPLIT_SEPARATPR

public static final String CONFIG_SPLIT_SEPARATPR
Configuration key: If given, the specified string is used to separate the training from the testing section of the corpus (e.g. "---") and the train split and test split values are ignored.

See Also:
Constant Field Values

CONFIG_TRAIN_SPLIT

public static final String CONFIG_TRAIN_SPLIT
Configuration key: The percentage of a corpus to use for training.

See Also:
Constant Field Values

CONFIG_TEST_SPLIT

public static final String CONFIG_TEST_SPLIT
Configuration key: The percentage of a corpus to use for testing (evaluation).

See Also:
Constant Field Values

CONFIG_TUNE

public static final String CONFIG_TUNE
Configuration key: The maximum number of iterations used for TUNE (train until no error) training; if 1, training is incremental.

See Also:
Constant Field Values

CONFIG_TUNE_STOP

public static final String CONFIG_TUNE_STOP
Configuration key: TUNE training is stopped if the training accuracy didn't improve for the specified number of iterations.

See Also:
Constant Field Values

CONFIG_TUNE_EACH

public static final String CONFIG_TUNE_EACH
Configuration key: Whether to measure results after each TUNE iteration or only at the end of training.

See Also:
Constant Field Values

CONFIG_TUNE_SINCE

public static final String CONFIG_TUNE_SINCE
Configuration key: The training iteration after which to evaluate results for the first time if CONFIG_TUNE_EACH is enabled.

See Also:
Constant Field Values

Constructor Detail

Tuner

public Tuner(TiesConfiguration config,
             String suffix)
Creates a new instance.

Parameters:
config - used to configure this instance
suffix - an optional suffix used to adapt configuration keys; might be null

Tuner

public Tuner(float trainingSplit,
             float testingSplit,
             String splitSep,
             int tuneRuns,
             int tuneStopAfter,
             boolean measureEachTUNE,
             int startMeasureTUNE,
             List tuneEvalList)
      throws IllegalArgumentException
Creates a new instance.

Parameters:
trainingSplit - the percentage of a corpus to use for training, given as a fraction between 0 and 1
testingSplit - the percentage of a corpus to use for testing (evaluation); if -1, all remaining documents (1 - trainingSplit) are used
splitSep - if not null, the specified string is used to separate the training from the testing section of the corpus and the train split and test split values are ignored.
tuneRuns - the maximum number of iterations used for TUNE (train until no error) training; if 1, training is incremental
tuneStopAfter - TUNE training is stopped if the training accuracy didn't improve for the specified number of iterations.
measureEachTUNE - whether to measure results after each TUNE iteration or only at the end of training
startMeasureTUNE - the training iteration after which to evaluate results for the first time if measureEachTUNE is enabled (ignored otherwise)
tuneEvalList - A list of Integers or int Strings specifying iterations after which to evaluate TUNE training in addition to the last one; ignored if measureEachTUNE is true
Throws:
IllegalArgumentException - if trainingSplit is not a percentage (larger than 1 or smaller than 0) or if tuneRuns is non-positive
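
The tuneEvalList parameter accepts both Integers and Strings that contain an int, while getTuneEvaluations() returns a Set<Integer>. The following is a minimal sketch of such a conversion, not the TIES implementation: the class EvalListSketch and its method toIterationSet are hypothetical names, and treating a null list as empty is an assumption.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper; not part of the TIES API. */
class EvalListSketch {

    /**
     * Converts a list of Integers and/or int Strings into a sorted set
     * of iteration numbers.
     */
    static Set<Integer> toIterationSet(List<?> tuneEvalList) {
        Set<Integer> result = new TreeSet<>();
        if (tuneEvalList == null) {
            return result; // Assumption: a missing list means "no extra evaluations".
        }
        for (Object entry : tuneEvalList) {
            if (entry instanceof Integer) {
                result.add((Integer) entry);
            } else {
                // Throws NumberFormatException if the entry is not an int String.
                result.add(Integer.valueOf(entry.toString().trim()));
            }
        }
        return result;
    }
}
```

Using a set means that duplicate entries in the list collapse into a single evaluation point.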

Method Detail

continueTraining

public boolean continueTraining(double[] currentAcc,
                                int i)
Whether to continue TUNE training after finishing an iteration.

Parameters:
currentAcc - the list of accuracies for the just finished TUNE iteration
i - the number of the just-finished TUNE iteration; counting starts at 1, not 0
Returns:
whether to continue TUNE training
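
The stopping rules named in this documentation (no errors left, maximum number of iterations reached, no improvement for getTuneStop() iterations) can be sketched as follows. This is an illustration under stated assumptions, not the TIES implementation: StopSketch is a hypothetical class, and combining several accuracies by averaging is an assumption, since the documentation does not say how currentAcc is aggregated.

```java
/** Hypothetical stand-in illustrating the documented stopping rules. */
class StopSketch {
    private final int maxIterations; // cf. getTuneIterations()
    private final int stopAfter;     // cf. getTuneStop()
    private double bestAccuracy = -1.0;
    private int bestIteration = 0;

    StopSketch(int maxIterations, int stopAfter) {
        if (maxIterations <= 0) {
            throw new IllegalArgumentException("maxIterations must be positive");
        }
        this.maxIterations = maxIterations;
        this.stopAfter = stopAfter;
    }

    /** Forgets the best result seen so far, cf. reset(). */
    void reset() {
        bestAccuracy = -1.0;
        bestIteration = 0;
    }

    /** i is the number of the just-finished iteration, starting at 1. */
    boolean continueTraining(double[] currentAcc, int i) {
        // Assumption: several accuracies are combined by averaging.
        double sum = 0.0;
        for (double acc : currentAcc) {
            sum += acc;
        }
        double mean = (currentAcc.length == 0) ? 0.0 : sum / currentAcc.length;

        if (mean > bestAccuracy) {
            bestAccuracy = mean;
            bestIteration = i;
        }
        boolean noErrorsLeft = mean >= 1.0;               // nothing left to learn
        boolean limitReached = i >= maxIterations;        // iteration budget used up
        boolean stalled = i - bestIteration >= stopAfter; // no recent improvement
        return !(noErrorsLeft || limitReached || stalled);
    }

    public static void main(String[] args) {
        StopSketch sketch = new StopSketch(10, 2);
        double[] accuracies = {0.8, 0.9, 0.9, 0.85, 0.95};
        int i = 0;
        boolean goOn = true;
        while (goOn && i < accuracies.length) {
            i++;
            goOn = sketch.continueTraining(new double[] {accuracies[i - 1]}, i);
        }
        System.out.println("Stopped after iteration " + i);
    }
}
```

In the main method the best accuracy is reached in iteration 2, so with a stop value of 2 the sketch stops after iteration 4 and never sees the fifth value.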

getTestSplit

public float getTestSplit()
Returns the percentage of a corpus to use for testing; if -1, all remaining documents should be used.

Returns:
the value of the attribute

getTrainSplit

public float getTrainSplit()
Returns the percentage of a corpus to use for training.

Returns:
the value of the attribute

getTuneEvaluations

public Set<Integer> getTuneEvaluations()
Returns the set of iterations after which to evaluate TUNE training in addition to the last one; should be ignored if isTuneEach() is true.

Returns:
the value of the attribute

getSplitSeparator

public String getSplitSeparator()
If not null, the returned string should be used to separate the training from the testing section of the corpus (e.g. "---") and the train split and test split values should be ignored.

Returns:
the value of the attribute

getTuneIterations

public int getTuneIterations()
Returns the maximum number of iterations used for TUNE (train until no error) training; if 1, training is incremental. Note that iterations should be indexed from 1 to X (this number) instead of from 0 to X-1 for compatibility with getTuneEvaluations().

Returns:
the value of the attribute

getTuneSince

public int getTuneSince()
Returns the training iteration after which to evaluate results for the first time if isTuneEach() is enabled.

Returns:
the value of the attribute

getTuneStop

public int getTuneStop()
Returns the TUNE stopping criterion: TUNE training should be stopped if the training accuracy didn't improve for the specified number of iterations.

Returns:
the value of the attribute

isTuneEach

public boolean isTuneEach()
Whether to measure results after each TUNE iteration or only at the end of training.

Returns:
the value of the attribute

reset

public void reset()
Resets the state of this instance. This method must be called before starting to TUNE train a set of instances after finishing training another set.
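
A TUNE training pass that respects this call order might look like the sketch below. It is not taken from TIES: TuneControl is a hypothetical interface that mirrors four of the documented Tuner methods so that the example compiles on its own, and trainOnce is a hypothetical callback that trains one iteration and returns its accuracies. Calling reset() before the first pass as well is an assumption made for simplicity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

/** Hypothetical interface mirroring four documented Tuner methods. */
interface TuneControl {
    void reset();
    int getTuneIterations();
    boolean continueTraining(double[] currentAcc, int i);
    boolean shouldEvaluate(boolean continueTraining, int i);
}

class TuneLoopSketch {

    /**
     * Runs one TUNE training pass and returns the iterations after which
     * an evaluation was requested.
     */
    static List<Integer> run(TuneControl tuner, IntFunction<double[]> trainOnce) {
        List<Integer> evaluated = new ArrayList<>();
        tuner.reset(); // clear state left over from a previous training set
        boolean goOn = true;
        // Iterations are numbered from 1, as the documentation requires.
        for (int i = 1; goOn && i <= tuner.getTuneIterations(); i++) {
            double[] accuracies = trainOnce.apply(i);
            goOn = tuner.continueTraining(accuracies, i);
            if (tuner.shouldEvaluate(goOn, i)) {
                evaluated.add(i); // a real caller would run the evaluation here
            }
        }
        return evaluated;
    }
}
```

The result of continueTraining is passed straight into shouldEvaluate, which matches the documented contract of the continueTraining parameter of that method.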


selectFiles

public void selectFiles(String[] allFiles,
                        List<String> trainFiles,
                        List<String> evalFiles)
                 throws IllegalArgumentException
Chooses files to use for training and files to use for evaluation, depending on the configured settings.

Parameters:
allFiles - the array of file names to process
trainFiles - receives the files to use for training, namely the first getTrainSplit() * allFiles.length files; must initially be empty
evalFiles - receives the files to use for evaluation, namely the next getTestSplit() * allFiles.length files (or all remaining files if the test split is negative); must initially be empty
Throws:
IllegalArgumentException - if the lists aren't empty
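
The selection rule described above can be sketched as follows. This is a stand-alone illustration, not the TIES implementation: SplitSketch and its method signature are hypothetical, rounding fractional file counts with Math.round is an assumption, and the separator-based mode (see getSplitSeparator()) is not modelled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in illustrating the documented selection rule. */
class SplitSketch {

    /**
     * Fills trainFiles with the first trainSplit * allFiles.length files and
     * evalFiles with the following testSplit * allFiles.length files, or with
     * all remaining files if testSplit is negative.
     */
    static void selectFiles(String[] allFiles, float trainSplit, float testSplit,
            List<String> trainFiles, List<String> evalFiles) {
        if (!trainFiles.isEmpty() || !evalFiles.isEmpty()) {
            throw new IllegalArgumentException("Lists must initially be empty");
        }
        // Assumption: fractional counts are rounded to the nearest integer.
        int trainCount = Math.round(trainSplit * allFiles.length);
        int remaining = allFiles.length - trainCount;
        int evalCount = (testSplit < 0)
                ? remaining
                : Math.min(Math.round(testSplit * allFiles.length), remaining);

        for (int k = 0; k < trainCount; k++) {
            trainFiles.add(allFiles[k]);
        }
        for (int k = trainCount; k < trainCount + evalCount; k++) {
            evalFiles.add(allFiles[k]);
        }
    }

    public static void main(String[] args) {
        String[] files = {"a.txt", "b.txt", "c.txt", "d.txt"};
        List<String> train = new ArrayList<>();
        List<String> eval = new ArrayList<>();
        // Half of the corpus for training, everything else for evaluation.
        selectFiles(files, 0.5f, -1f, train, eval);
        System.out.println(train + " " + eval);
    }
}
```

Because files are taken in array order, the caller decides the composition of both sets by the order of allFiles.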

shouldEvaluate

public boolean shouldEvaluate(boolean continueTraining,
                              int i)
Whether to evaluate results after this TUNE iteration.

Parameters:
continueTraining - the result returned by the preceding call to continueTraining(double[], int)
i - the number of the just-finished TUNE iteration; counting starts at 1, not 0
Returns:
whether to evaluate
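
Read together with isTuneEach(), getTuneSince() and getTuneEvaluations(), the decision might be sketched as below. This is an interpretation rather than the TIES source: EvalSketch is a hypothetical class that takes the three settings as explicit parameters, and treating iteration tuneSince itself as the first evaluated one is an assumption.

```java
import java.util.Set;

/** Hypothetical stand-in illustrating the documented evaluation rules. */
class EvalSketch {

    static boolean shouldEvaluate(boolean continueTraining, int i,
            boolean tuneEach, int tuneSince, Set<Integer> tuneEvaluations) {
        if (!continueTraining) {
            return true; // the last iteration is always evaluated
        }
        if (tuneEach) {
            // Assumption: evaluation starts with iteration tuneSince itself.
            return i >= tuneSince;
        }
        // The explicit set is only consulted when tuneEach is disabled.
        return tuneEvaluations.contains(i);
    }
}
```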

toString

public String toString()
Returns a string representation of this object.

Overrides:
toString in class Object
Returns:
a textual representation


Copyright © 2003-2007 Christian Siefkes. All Rights Reserved.