de.fu_berlin.ties.demo
Class SpamFilterDemo

java.lang.Object
  extended by de.fu_berlin.ties.demo.SpamFilterDemo

public class SpamFilterDemo
extends Object

Instances of this class can be used to demonstrate how statistical spam filtering works. This class supports only the Winnow classifier and its subclasses.
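
To make the idea of a Winnow-style statistical filter concrete, the following is a minimal, self-contained sketch. It is not the TIES implementation: the class TinyWinnowFilter, the word-token features, the promotion and demotion factors, and the plain String return value of classify (the documented method returns a FilterResult) are all assumptions chosen for illustration. Only the method names and the two class-name constants mirror this page.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy Winnow-style filter for illustration only; NOT the TIES implementation. */
class TinyWinnowFilter {
    static final String CLASS_SPAM = "spam";
    static final String CLASS_NONSPAM = "nonspam";

    // Assumed update factors; the values used by TIES are not stated on this page.
    private static final double PROMOTION = 1.5;
    private static final double DEMOTION = 0.5;

    // One weight per (class, feature); features never updated implicitly weigh 1.0.
    private final Map<String, Double> spamWeights = new HashMap<>();
    private final Map<String, Double> nonspamWeights = new HashMap<>();

    /** Hypothetical feature extraction: the set of lower-cased word tokens. */
    private static Set<String> features(String text) {
        Set<String> result = new HashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    /** Sums the weights of all active features for one class. */
    private static double score(Map<String, Double> weights, Set<String> feats) {
        double sum = 0.0;
        for (String f : feats) {
            sum += weights.getOrDefault(f, 1.0);
        }
        return sum;
    }

    /** Returns the class with the higher score; ties go to "nonspam". */
    String classify(String text) {
        Set<String> feats = features(text);
        return score(spamWeights, feats) > score(nonspamWeights, feats)
                ? CLASS_SPAM : CLASS_NONSPAM;
    }

    void trainSpam(String text) {
        train(text, spamWeights, nonspamWeights);
    }

    void trainNonspam(String text) {
        train(text, nonspamWeights, spamWeights);
    }

    /** Mistake-driven update: weights change only if the true class does not strictly win. */
    private void train(String text, Map<String, Double> right, Map<String, Double> wrong) {
        Set<String> feats = features(text);
        if (score(right, feats) > score(wrong, feats)) {
            return; // already classified correctly, nothing to learn
        }
        for (String f : feats) {
            right.put(f, right.getOrDefault(f, 1.0) * PROMOTION); // promote true class
            wrong.put(f, wrong.getOrDefault(f, 1.0) * DEMOTION);  // demote other class
        }
    }

    /** Forgets everything: all weights fall back to the implicit default. */
    void clearModel() {
        spamWeights.clear();
        nonspamWeights.clear();
    }

    public static void main(String[] args) {
        TinyWinnowFilter filter = new TinyWinnowFilter();
        filter.trainSpam("cheap pills buy now");
        filter.trainNonspam("meeting agenda for monday");
        System.out.println(filter.classify("buy cheap pills"));
        System.out.println(filter.classify("monday meeting agenda"));
    }
}
```

The multiplicative updates are what distinguish Winnow from additive schemes such as the perceptron. In this sketch, a cleared model scores every text as a tie and therefore reports "nonspam".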

Version:
$Revision: 1.18 $, $Date: 2006/10/21 16:04:09 $, $Author: siefkes $
Author:
Christian Siefkes

Field Summary
static String CLASS_NONSPAM
          Name of the nonspam (ham) class: "nonspam".
static String CLASS_SPAM
          Name of the spam class: "spam".
 
Constructor Summary
SpamFilterDemo(File trainingSetFile, File testSetFile)
          Creates a new instance.
SpamFilterDemo(SampleMails myTrainingSet, SampleMails myTestSet)
          Creates a new instance.
SpamFilterDemo(String trainingSetFile, String testSetFile)
          Creates a new instance.
 
Method Summary
 FilterResult classify(String text)
          Classifies a text.
 void clearModel()
          Completely resets the internal classification model.
 SampleMails getTestSet()
          Returns the set of mails used for testing.
 SampleMails getTrainingSet()
          Returns the set of mails used for training.
static void main(String[] args)
          Main method for testing.
 void reloadModel()
          Reloads the initial state of the internal classification model.
 String toString()
          Returns a string representation of this object.
 void trainNonspam(String text)
          Trains a text as ham.
 void trainSpam(String text)
          Trains a text as spam.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

CLASS_SPAM

public static final String CLASS_SPAM
Name of the spam class: "spam".

See Also:
Constant Field Values

CLASS_NONSPAM

public static final String CLASS_NONSPAM
Name of the nonspam (ham) class: "nonspam".

See Also:
Constant Field Values
Constructor Detail

SpamFilterDemo

public SpamFilterDemo(String trainingSetFile,
                      String testSetFile)
               throws IOException,
                      ProcessingException
Creates a new instance. The ZIP files must follow the SampleMails conventions.

Parameters:
trainingSetFile - a ZIP file containing the mails used for training
testSetFile - a ZIP file containing the mails used for testing
Throws:
IOException - if one of the files cannot be read or is not a valid ZIP file
ProcessingException - if an error occurs while initializing the classifier
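
The IOException contract above matches how the standard java.util.zip API behaves: opening a file that is missing or is not a valid ZIP archive fails with an IOException (ZipException is a subclass). The sketch below only lists the entries of an archive. It does not model the SampleMails conventions, which are not described on this page, and the class ZipEntryLister is a hypothetical helper, not part of TIES.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper that checks a ZIP file is readable; not part of TIES. */
class ZipEntryLister {
    /**
     * Returns the names of all non-directory entries in the given archive.
     * Throws IOException if the file cannot be read or is not a valid ZIP file.
     */
    static List<String> listEntries(File zip) throws IOException {
        List<String> names = new ArrayList<>();
        // try-with-resources closes the archive even if reading fails
        try (ZipFile zipFile = new ZipFile(zip)) {
            for (ZipEntry entry : Collections.list(zipFile.entries())) {
                if (!entry.isDirectory()) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }
}
```

A caller could run such a check on both archives before constructing the demo, so that a bad path is reported before any training starts.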

SpamFilterDemo

public SpamFilterDemo(File trainingSetFile,
                      File testSetFile)
               throws IOException,
                      ProcessingException
Creates a new instance. The ZIP files must follow the SampleMails conventions.

Parameters:
trainingSetFile - a ZIP file containing the mails used for training
testSetFile - a ZIP file containing the mails used for testing
Throws:
IOException - if one of the files cannot be read or is not a valid ZIP file
ProcessingException - if an error occurs while initializing the classifier

SpamFilterDemo

public SpamFilterDemo(SampleMails myTrainingSet,
                      SampleMails myTestSet)
               throws ProcessingException,
                      IOException
Creates a new instance.

Parameters:
myTrainingSet - the set of mails used for training
myTestSet - the set of mails used for testing
Throws:
ProcessingException - if an error occurs while initializing the classifier
IOException - if an I/O error occurs
Method Detail

main

public static void main(String[] args)
                 throws IOException,
                        ProcessingException
Main method for testing.

Parameters:
args - the command-line arguments (ignored)
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs while processing the tasks

classify

public FilterResult classify(String text)
                      throws ProcessingException,
                             IOException
Classifies a text.

Parameters:
text - the text to classify
Returns:
a FilterResult containing detailed results of the classification
Throws:
ProcessingException - if an error occurs during classification
IOException - if an I/O error occurs

clearModel

public void clearModel()
                throws ProcessingException
Completely resets the internal classification model. After a reset, the classifier will have no idea what "spam" or "nonspam" messages typically look like.

Throws:
ProcessingException - if an error occurs during reset

getTestSet

public SampleMails getTestSet()
Returns the set of mails used for testing.

Returns:
the test set

getTrainingSet

public SampleMails getTrainingSet()
Returns the set of mails used for training.

Returns:
the training set

reloadModel

public void reloadModel()
                 throws ProcessingException,
                        IOException
Reloads the initial state of the internal classification model. The model is cleared and then re-trained from the sample mails contained in the training set (shuffled in pseudo-random order).

Throws:
ProcessingException - if an error occurs during reset
IOException - if an I/O error occurs
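
The clear-then-retrain sequence described for reloadModel can be sketched as follows. This is a stand-in under stated assumptions, not the TIES code: the class ReloadSketch, the Sample type, and the fixed seed are hypothetical, and the "model" is reduced to the list of samples in the order they were trained. Whether the real method uses a fixed seed is not stated on this page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical stand-in showing clear + shuffled re-training; not part of TIES. */
class ReloadSketch {
    /** A labeled sample mail (hypothetical helper type). */
    static final class Sample {
        final String text;
        final boolean spam;

        Sample(String text, boolean spam) {
            this.text = text;
            this.spam = spam;
        }
    }

    private final List<Sample> trainingSet;
    private final long seed; // assumed: a fixed seed makes reloads reproducible
    // Stands in for the model state: records samples in training order.
    private final List<Sample> trainedOrder = new ArrayList<>();

    ReloadSketch(List<Sample> trainingSet, long seed) {
        this.trainingSet = new ArrayList<>(trainingSet);
        this.seed = seed;
    }

    /** Discards everything learned so far. */
    void clearModel() {
        trainedOrder.clear();
    }

    /** Trains a single sample (here: just records it). */
    void train(Sample sample) {
        trainedOrder.add(sample);
    }

    /** Clears the model, then re-trains on a shuffled copy of the training set. */
    void reloadModel() {
        clearModel();
        List<Sample> shuffled = new ArrayList<>(trainingSet); // leave the original order intact
        Collections.shuffle(shuffled, new Random(seed));
        for (Sample sample : shuffled) {
            train(sample);
        }
    }

    List<Sample> trainedOrder() {
        return Collections.unmodifiableList(trainedOrder);
    }
}
```

Shuffling matters for online learners such as Winnow because the final weights depend on the order in which samples are seen; presenting all spam before all ham would bias the result.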

toString

public String toString()
Returns a string representation of this object.

Overrides:
toString in class Object
Returns:
a textual representation

trainNonspam

public void trainNonspam(String text)
                  throws ProcessingException,
                         IOException
Trains a text as ham.

Parameters:
text - the text to train
Throws:
ProcessingException - if an error occurs during training
IOException - if an I/O error occurs

trainSpam

public void trainSpam(String text)
               throws ProcessingException,
                      IOException
Trains a text as spam.

Parameters:
text - the text to train
Throws:
ProcessingException - if an error occurs during training
IOException - if an I/O error occurs


Copyright © 2003-2007 Christian Siefkes. All Rights Reserved.