de.fu_berlin.ties.classify.feature
Class SBPHTransformer

java.lang.Object
  extended by de.fu_berlin.ties.classify.feature.FeatureTransformer
      extended by de.fu_berlin.ties.classify.feature.SBPHTransformer
All Implemented Interfaces:
XMLStorable

public class SBPHTransformer
extends FeatureTransformer

Transforms a feature vector using a simple implementation of the sparse binary polynomial hashing (SBPH) technique introduced by CRM114. This transformer discards all comment-only features (indeed, all comments). It slides a window of length N over the remaining original features. For each window position, it counts in binary from 1 to 2^N - 1. For each odd number, a joint feature is generated in which original features at "1" positions are visible and original features at "0" positions are hidden. Separators before the first visible feature are discarded, but all inner separators are kept. For example, if N=3 and a pipe character "|" is used as separator, four joint features will be generated from the original features "a", "b", "c" at the last position: "c" (binary 1=001), "b|c" (binary 3=011), "a||c" (binary 5=101), and "a|b|c" (binary 7=111). Thus 2^(N-1) joint features are generated for each original (non-comment) feature, except for the very first features, where fewer than N original features are available.
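The windowing scheme can be sketched in a few lines of Java. This is a minimal, hypothetical sketch over plain string features, not the TIES implementation: the names `SbphSketch`, `jointFeaturesAt` and `transformAll` are illustrative, comments are assumed to have been removed already, and the handling of the first positions (shrinking the window to the features available so far) is an assumption, since the description does not spell it out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical SBPH sketch over string features; not the TIES implementation. */
final class SbphSketch {

    /**
     * Returns the joint features for the window ending at index pos (0-based).
     * Assumption: near the start of the vector the window shrinks to the
     * features available so far.
     */
    static List<String> jointFeaturesAt(List<String> features, int pos, int n, String sep) {
        int len = Math.min(n, pos + 1);       // actual window length
        int start = pos - len + 1;            // index of the leftmost window slot
        List<String> result = new ArrayList<>();
        // Odd masks only: the lowest bit (the current feature) is always visible.
        for (int mask = 1; mask < (1 << len); mask += 2) {
            // Slot i maps to bit (len - 1 - i): the leftmost slot is the most significant bit.
            int first = 0;
            while (((mask >> (len - 1 - first)) & 1) == 0) {
                first++;                      // skip leading hidden slots, dropping their separators
            }
            StringBuilder sb = new StringBuilder();
            for (int i = first; i < len; i++) {
                if (i > first) {
                    sb.append(sep);           // inner separators are kept, even around hidden slots
                }
                if (((mask >> (len - 1 - i)) & 1) == 1) {
                    sb.append(features.get(start + i));
                }
            }
            result.add(sb.toString());
        }
        return result;
    }

    /** Concatenates the joint features generated at every position. */
    static List<String> transformAll(List<String> features, int n, String sep) {
        List<String> all = new ArrayList<>();
        for (int pos = 0; pos < features.size(); pos++) {
            all.addAll(jointFeaturesAt(features, pos, n, sep));
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> features = List.of("a", "b", "c");
        System.out.println(jointFeaturesAt(features, 2, 3, "|"));
        // prints [c, b|c, a||c, a|b|c]
        System.out.println(transformAll(features, 3, "|").size());
        // prints 7
    }
}
```

Under the shrinking-window assumption, a vector of M >= N features yields (M - N + 2) * 2^(N-1) - 1 joint features in total; for "a", "b", "c" with N=3 that is 7.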

Instances of this class are thread-safe.

Version:
$Revision: 1.10 $, $Date: 2006/10/21 16:03:57 $, $Author: siefkes $
Author:
Christian Siefkes

Field Summary
static String DEFAULT_SEPARATOR
          The separator used by default (a space character).
 
Fields inherited from class de.fu_berlin.ties.classify.feature.FeatureTransformer
CONFIG_TRANSFORMERS, ELEMENT_MAIN
 
Constructor Summary
SBPHTransformer(Element element)
          Creates a new instance from an XML element, fulfilling the recommendation of the XMLStorable interface.
SBPHTransformer(FeatureTransformer precTrans, int len, String sep)
          Creates a new instance.
SBPHTransformer(FeatureTransformer precTrans, TiesConfiguration config)
          Creates a new instance.
 
Method Summary
protected  FeatureVector doTransform(FeatureVector orgFeatures)
          Transforms a feature vector.
 int getLength()
          Returns the maximum number of original features joined.
 String getSeparator()
          Returns the string used to separate original features (by default a space character).
 ObjectElement toElement()
          Stores all relevant fields of this object in an XML element for serialization. An equivalent object can be created by calling ObjectElement.createObject(org.dom4j.Element, Class) on the created element.
 String toString()
          Returns a string representation of this object.
 
Methods inherited from class de.fu_berlin.ties.classify.feature.FeatureTransformer
createTransformer, createTransformer, getPrecedingTransformer, transform
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

DEFAULT_SEPARATOR

public static final String DEFAULT_SEPARATOR
The separator used by default (a space character).

See Also:
Constant Field Values

Constructor Detail

SBPHTransformer

public SBPHTransformer(Element element)
                throws InstantiationException
Creates a new instance from an XML element, fulfilling the recommendation of the XMLStorable interface.

Parameters:
element - the XML element containing the serialized representation
Throws:
InstantiationException - if the given element does not contain a valid transformer description

SBPHTransformer

public SBPHTransformer(FeatureTransformer precTrans,
                       int len,
                       String sep)
Creates a new instance.

Parameters:
precTrans - the preceding transformer to use if this transformer is part of a chain; null otherwise
len - the maximum number of original features joined
sep - the string used to separate original features; this string should never occur within original features
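Why the separator must never occur inside original features can be shown with a small example: with the default space separator, two different windows can collapse into the same joint feature. The sketch below is hypothetical; `SeparatorCheck` and `requireSeparatorAbsent` are illustrative names, and this documentation does not state that `SBPHTransformer` performs such a check itself.

```java
import java.util.List;

/** Hypothetical guard illustrating the separator constraint. */
final class SeparatorCheck {

    /** Rejects any feature that contains the separator (illustrative helper). */
    static void requireSeparatorAbsent(List<String> features, String sep) {
        for (String f : features) {
            if (f.contains(sep)) {
                throw new IllegalArgumentException(
                        "Feature \"" + f + "\" contains separator \"" + sep + "\"");
            }
        }
    }

    public static void main(String[] args) {
        // Two different windows join to the same string "new york city".
        String joined1 = String.join(" ", List.of("new york", "city"));
        String joined2 = String.join(" ", List.of("new", "york city"));
        System.out.println(joined1.equals(joined2)); // prints true: the collision
        requireSeparatorAbsent(List.of("new", "york", "city"), " "); // passes
    }
}
```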

SBPHTransformer

public SBPHTransformer(FeatureTransformer precTrans,
                       TiesConfiguration config)
Creates a new instance.

Parameters:
precTrans - the preceding transformer to use if this transformer is part of a chain; null otherwise
config - used to configure this instance

Method Detail

doTransform

protected FeatureVector doTransform(FeatureVector orgFeatures)
Transforms a feature vector. Discards all comments and generates the SBPH joint features described in the class description.

Specified by:
doTransform in class FeatureTransformer
Parameters:
orgFeatures - the original feature vector to transform
Returns:
a new feature vector containing the transformed features
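How `doTransform` fits into a transformer chain can be illustrated with a toy model. This sketch assumes that the inherited `transform` method first runs the preceding transformer (if any) and then passes its output to `doTransform`; that ordering is inferred from the `precTrans` constructor parameter, not documented here. All types are hypothetical stand-ins: features are plain strings, comments are marked by a leading `#`, and `PairJoiner` is a simplified joiner rather than full SBPH.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy model of a transformer chain; all names are hypothetical. */
final class ChainDemo {

    abstract static class ToyTransformer {
        private final ToyTransformer preceding; // null if this is the first transformer

        ToyTransformer(ToyTransformer preceding) {
            this.preceding = preceding;
        }

        /** Assumed template method: apply the preceding transformer, then this one. */
        final List<String> transform(List<String> features) {
            List<String> input = (preceding == null) ? features : preceding.transform(features);
            return doTransform(input);
        }

        protected abstract List<String> doTransform(List<String> features);
    }

    /** Lower-cases every feature. */
    static final class LowerCaser extends ToyTransformer {
        LowerCaser(ToyTransformer preceding) {
            super(preceding);
        }

        @Override
        protected List<String> doTransform(List<String> features) {
            List<String> out = new ArrayList<>();
            for (String f : features) {
                out.add(f.toLowerCase(Locale.ROOT));
            }
            return out;
        }
    }

    /** Drops comments (marked by '#') and adds each feature joined with its predecessor. */
    static final class PairJoiner extends ToyTransformer {
        PairJoiner(ToyTransformer preceding) {
            super(preceding);
        }

        @Override
        protected List<String> doTransform(List<String> features) {
            List<String> kept = new ArrayList<>();
            for (String f : features) {
                if (!f.startsWith("#")) {
                    kept.add(f);
                }
            }
            List<String> out = new ArrayList<>();
            for (int i = 0; i < kept.size(); i++) {
                out.add(kept.get(i));
                if (i > 0) {
                    out.add(kept.get(i - 1) + " " + kept.get(i));
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        ToyTransformer chain = new PairJoiner(new LowerCaser(null));
        System.out.println(chain.transform(List.of("Buy", "#note", "NOW")));
        // prints [buy, now, buy now]
    }
}
```

Keeping `transform` final and putting the per-step logic in `doTransform` is the template-method pattern; each toy transformer only reads its final field and its input, so instances hold no mutable state.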

getLength

public int getLength()
Returns the maximum number of original features joined.

Returns:
the maximum number of original features joined (the window length N)

getSeparator

public String getSeparator()
Returns the string used to separate original features (by default a space character). This string should never occur within original features.

Returns:
the string used to separate original features

toElement

public ObjectElement toElement()
Stores all relevant fields of this object in an XML element for serialization. An equivalent object can be created by calling ObjectElement.createObject(org.dom4j.Element, Class) on the created element.

Specified by:
toElement in interface XMLStorable
Overrides:
toElement in class FeatureTransformer
Returns:
the created XML element

toString

public String toString()
Returns a string representation of this object.

Overrides:
toString in class FeatureTransformer
Returns:
a textual representation


Copyright © 2003-2007 Christian Siefkes. All Rights Reserved.