de.fu_berlin.ties.filter
Class PredictionRewriter2

java.lang.Object
  extended by de.fu_berlin.ties.filter.PredictionRewriter2
All Implemented Interfaces:
DocumentRewriter, TokenProcessor

public class PredictionRewriter2
extends Object
implements DocumentRewriter, TokenProcessor

A variant of the prediction rewriter that uses predictions from another process (e.g. named entities) to provide additional semantic information. This variant does not modify the element structure of the document, but stores the predictions as XML attributes.

Prefer this class over PredictionRewriter, since it generally yields better results. Instances of this class are not thread-safe and must not be used to process multiple documents in parallel.
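
The difference between wrapping predicted spans in new elements and recording them as attributes can be illustrated with a small sketch. It is not taken from the TIES sources: it uses the JDK's org.w3c.dom types rather than the Element and Document types in the signatures on this page, and the attribute name "pred" is a hypothetical stand-in, since the value of ATTRIB_PRED is not shown here.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class PredictionAttributeSketch {
    // Hypothetical attribute name; the actual value of ATTRIB_PRED is not shown on this page.
    static final String PRED_ATTRIBUTE = "pred";

    /** Records a predicted class on an existing element without adding or moving any nodes. */
    static void annotate(Element element, String predictedClass) {
        element.setAttribute(PRED_ATTRIBUTE, predictedClass);
    }

    /** Builds an empty DOM document with the JDK's default parser. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element sentence = doc.createElement("sentence");
        sentence.setTextContent("Berlin");
        doc.appendChild(sentence);

        annotate(sentence, "location");

        // The element keeps its single text child; only an attribute was added.
        System.out.println(sentence.getAttribute(PRED_ATTRIBUTE)); // prints "location"
        System.out.println(sentence.getChildNodes().getLength());  // prints 1
    }
}
```

Because only an attribute is added, code that navigates the document by element structure sees the same tree before and after annotation.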

Version:
$Revision: 1.10 $, $Date: 2006/10/21 16:04:20 $, $Author: siefkes $
Author:
Christian Siefkes

Field Summary
static String ATTRIB_PRED
          Name of the attribute to add.
static String CONFIG_PRED_NONE
          Configuration key: "None" marker to use for tokens that do not belong to any prediction -- if empty or missing, these tokens are not tagged.
 
Constructor Summary
PredictionRewriter2(String fileExtension, String[] predictionClasses, String myNoneMarker, TokenizerFactory factory, TiesConfiguration conf)
          Creates a new instance.
PredictionRewriter2(TiesConfiguration conf)
          Creates a new instance.
 
Method Summary
 void processToken(Element element, String left, TokenDetails details, String right, ContextMap context)
          Processes a token in an XML element, optionally modifying the element or the document it is part of.
 Document rewrite(Document document, File filename)
          Rewrites a document.
 String toString()
          Returns a string representation of this object.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

ATTRIB_PRED

public static final String ATTRIB_PRED
Name of the attribute to add.

See Also:
Constant Field Values

CONFIG_PRED_NONE

public static final String CONFIG_PRED_NONE
Configuration key: "None" marker to use for tokens that do not belong to any prediction -- if empty or missing, these tokens are not tagged.

See Also:
Constant Field Values
Constructor Detail

PredictionRewriter2

public PredictionRewriter2(TiesConfiguration conf)
                    throws ProcessingException
Creates a new instance.

Parameters:
conf - used to configure this instance; must not be null
Throws:
ProcessingException - if an error occurs while initializing the combination strategies

PredictionRewriter2

public PredictionRewriter2(String fileExtension,
                           String[] predictionClasses,
                           String myNoneMarker,
                           TokenizerFactory factory,
                           TiesConfiguration conf)
                    throws ProcessingException
Creates a new instance.

Parameters:
fileExtension - extension of the files containing predictions
predictionClasses - names of the prediction classes to use -- if empty array, all are used
myNoneMarker - "none" marker to use for tokens that do not belong to any prediction -- if empty or null, these tokens are not tagged
factory - used to instantiate tokenizers
conf - used to configure this instance; must not be null
Throws:
ProcessingException - if an error occurs while initializing the combination strategies
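
The rules stated for predictionClasses and myNoneMarker can be read as a small decision function. The following is a minimal sketch of that reading, not the class's actual code: the helper name labelFor is hypothetical, and treating a prediction whose class is not in the list as "no prediction" is an assumption.

```java
import java.util.Arrays;
import java.util.Optional;

final class NoneMarkerSketch {
    /**
     * Decides which label, if any, a token receives.
     *
     * @param predictedClass    class predicted for the token, or null if it belongs to no prediction
     * @param predictionClasses classes to use; an empty array means all classes are used
     * @param noneMarker        marker for unpredicted tokens; empty or null means such tokens stay untagged
     */
    static Optional<String> labelFor(String predictedClass, String[] predictionClasses, String noneMarker) {
        boolean accepted = predictedClass != null
                && (predictionClasses.length == 0
                    || Arrays.asList(predictionClasses).contains(predictedClass));
        if (accepted) {
            return Optional.of(predictedClass);
        }
        // Assumption: a filtered-out prediction is handled like a token without any prediction.
        if (noneMarker == null || noneMarker.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(noneMarker);
    }

    public static void main(String[] args) {
        String[] all = new String[0];
        System.out.println(labelFor("person", all, "O")); // prints "Optional[person]"
        System.out.println(labelFor(null, all, "O"));     // prints "Optional[O]"
        System.out.println(labelFor(null, all, ""));      // prints "Optional.empty"
    }
}
```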
Method Detail

processToken

public void processToken(Element element,
                         String left,
                         TokenDetails details,
                         String right,
                         ContextMap context)
                  throws IOException
Processes a token in an XML element, optionally modifying the element or the document it is part of.

Specified by:
processToken in interface TokenProcessor
Parameters:
element - the element containing the token
left - the textual contents of the element to the left of the token (in case of mixed contents, only up to the last preceding child element, if any)
details - details about the token to process
right - the textual contents of the element to the right of the token (in case of mixed contents, only up to the next following child element, if any)
context - a map of objects that are made available for processing
Throws:
IOException - if an I/O error occurs
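
The mixed-content rule for left and right can be illustrated with standard DOM text nodes, where each run of text between child elements is a separate node. This is only a sketch under assumptions: it uses org.w3c.dom rather than the types in the signature above, the helper context is hypothetical, and whether TIES includes the surrounding whitespace in left and right is not stated on this page.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class TokenContextSketch {
    /**
     * Returns {left, right} for the first direct text child of the element that
     * contains the token, or null if no direct text child contains it.
     */
    static String[] context(Element element, String token) {
        element.normalize(); // merge adjacent text nodes so each text run is a single node
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.TEXT_NODE) {
                continue; // child elements delimit the text runs
            }
            String text = n.getNodeValue();
            int start = text.indexOf(token);
            if (start >= 0) {
                return new String[] {
                    text.substring(0, start),
                    text.substring(start + token.length())
                };
            }
        }
        return null;
    }

    /** Parses an XML string and returns its root element. */
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element p = parse("<p>Angela <b>Merkel</b> visited Paris today</p>");
        String[] ctx = context(p, "Paris");
        // Left context stops at the preceding <b> element instead of reaching back to "Angela".
        System.out.println("[" + ctx[0] + "]"); // prints "[ visited ]"
        System.out.println("[" + ctx[1] + "]"); // prints "[ today]"
    }
}
```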

rewrite

public Document rewrite(Document document,
                        File filename)
                 throws IOException,
                        ProcessingException
Rewrites a document.

Specified by:
rewrite in interface DocumentRewriter
Parameters:
document - the document to modify
filename - the name of the document
Returns:
the modified document; this object may or may not be identical to the document passed in
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during rewriting
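
Two points from this page affect callers: the result of rewrite may be a different object than the argument, and instances must not be shared between threads working in parallel. The following sketch shows one way to respect both. The Rewriter interface is a simplified stand-in that only mirrors the shape of rewrite(Document, File); it is not the TIES DocumentRewriter interface, it uses org.w3c.dom.Document, and it omits ProcessingException.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Supplier;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class RewriteUsageSketch {
    /** Hypothetical stand-in mirroring the shape of rewrite(Document, File). */
    interface Rewriter {
        Document rewrite(Document document, File filename) throws IOException;
    }

    // One rewriter per thread, so a non-thread-safe instance is never shared.
    private final ThreadLocal<Rewriter> perThread;

    RewriteUsageSketch(Supplier<Rewriter> factory) {
        this.perThread = ThreadLocal.withInitial(factory);
    }

    /** Rewrites a document and returns the result; callers should continue with the returned object. */
    Document process(Document input, File file) {
        try {
            return perThread.get().rewrite(input, file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        // An identity rewriter stands in for a real implementation.
        RewriteUsageSketch sketch = new RewriteUsageSketch(() -> (document, filename) -> document);
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

        // Keep working with the return value, not with the original reference.
        Document rewritten = sketch.process(doc, new File("example.xml"));
        System.out.println(rewritten != null);
    }
}
```

A ThreadLocal is only one option; creating a fresh instance per document or confining one instance to a single worker thread would satisfy the same constraint.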

toString

public String toString()
Returns a string representation of this object.

Overrides:
toString in class Object
Returns:
a textual representation


Copyright © 2003-2007 Christian Siefkes. All Rights Reserved.