de.fu_berlin.ties.eval
Class AverageLength

java.lang.Object
  extended by de.fu_berlin.ties.ConfigurableProcessor
      extended by de.fu_berlin.ties.TextProcessor
          extended by de.fu_berlin.ties.eval.AverageLength
All Implemented Interfaces:
Closeable, Processor

public class AverageLength
extends TextProcessor
implements Closeable

A simple goal that reads a list of EvaluatedExtractionContainers and calculates the average length (in characters and tokens) of extractions of all types (e.g. speaker, location) and all evaluation statuses (e.g. correct, missing).

Instances of this type are not thread-safe.

Version:
$Revision: 1.8 $, $Date: 2006/10/21 16:04:11 $, $Author: siefkes $
Author:
Christian Siefkes

Field Summary
static String KEY_TOKEN_LENGTH
          The key used by the metricsByLength() method to serialize the token lengths.
 
Fields inherited from class de.fu_berlin.ties.TextProcessor
CONFIG_POST, KEY_DIRECTORY, KEY_LOCAL_NAME, KEY_OUT_DIRECTORY, KEY_URL
 
Constructor Summary
AverageLength()
          Creates a new instance, using a default extension and the standard configuration.
AverageLength(String outExt)
          Creates a new instance, using the standard configuration.
AverageLength(String outExt, TiesConfiguration conf)
          Creates a new instance.
 
Method Summary
 FieldContainer[] calculateAverageLengths()
          Calculates the average length (in visible characters and tokens) for all extractions of all types and all evaluation statuses processed so far.
 void close(int errorCount)
          Closes this instance, releasing all resources and stopping any background threads.
protected  void doProcess(Reader reader, Writer writer, ContextMap context)
          Processes the contents of a reader, writing a modified version to a writer.
 Map<String,FieldContainer> metricsByLength()
          Returns the usual metrics F-measure, precision and recall, calculated separately for all extractions of the same type (as usual) and token length.
 void updateAverageLengths(ExtractionContainer extractions)
          Analyzes an extraction container, updating the average lengths for extractions of all types and all evaluation statuses.
 void updateAverageLengths(Reader reader)
          Analyzes the serialized contents of an extraction container, delegating to updateAverageLengths(ExtractionContainer).
 
Methods inherited from class de.fu_berlin.ties.TextProcessor
getOutFileExt, process, process, process, process, process, process, toString
 
Methods inherited from class de.fu_berlin.ties.ConfigurableProcessor
getConfig
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

KEY_TOKEN_LENGTH

public static final String KEY_TOKEN_LENGTH
The key used by the metricsByLength() method to serialize the token lengths.

See Also:
Constant Field Values
Constructor Detail

AverageLength

public AverageLength()
Creates a new instance, using a default extension and the standard configuration.


AverageLength

public AverageLength(String outExt)
Creates a new instance, using the standard configuration.

Parameters:
outExt - the extension to use for output files

AverageLength

public AverageLength(String outExt,
                     TiesConfiguration conf)
Creates a new instance.

Parameters:
outExt - the extension to use for output files
conf - the configuration to use
Method Detail

calculateAverageLengths

public FieldContainer[] calculateAverageLengths()
                                         throws IllegalStateException
Calculates the average length (in visible characters and tokens) for all extractions of all types and all evaluation statuses processed so far. Requires at least one previous call to one of the updateAverageLengths methods; otherwise there is nothing to calculate.

Returns:
an array of two field containers containing the average character counts (first container) and average token counts (second container) in a two-dimensional matrix
Throws:
IllegalStateException - if no update method has been invoked
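
The update-then-calculate contract described above can be illustrated with a minimal, self-contained sketch. It does not use the TIES classes: AverageLengthSketch, its update and average methods, and the "type/status" key format are hypothetical stand-ins, and the character and token lengths are assumed to be supplied by the caller. It only shows one plausible way to keep running sums per type and evaluation status and to reject a calculation before any update has happened.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for illustration; not part of the TIES API. */
public class AverageLengthSketch {

    /** Running sums for one (type, status) combination. */
    private static final class Cell {
        long chars;
        long tokens;
        long count;
    }

    /** Cells keyed by "type/status" (key format is an assumption). */
    private final Map<String, Cell> cells = new LinkedHashMap<>();

    private static String key(String type, String status) {
        return type + "/" + status;
    }

    /** Records one extraction with its character and token length. */
    public void update(String type, String status, int charLength, int tokenLength) {
        Cell cell = cells.computeIfAbsent(key(type, status), k -> new Cell());
        cell.chars += charLength;
        cell.tokens += tokenLength;
        cell.count++;
    }

    /**
     * Returns {average characters, average tokens} for one combination.
     * Throws IllegalStateException if update was never called, mirroring
     * the documented contract; yields NaN for a combination never seen.
     */
    public double[] average(String type, String status) {
        if (cells.isEmpty()) {
            throw new IllegalStateException("no update method has been invoked");
        }
        Cell cell = cells.get(key(type, status));
        if (cell == null) {
            return new double[] {Double.NaN, Double.NaN};
        }
        return new double[] {
            (double) cell.chars / cell.count,
            (double) cell.tokens / cell.count
        };
    }
}
```

In this sketch, calling average before any update fails fast, while a combination that was never observed yields NaN rather than an exception; how the real class treats empty cells is not stated in this documentation.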

updateAverageLengths

public void updateAverageLengths(ExtractionContainer extractions)
Analyzes an extraction container, updating the average lengths for extractions of all types and all evaluation statuses.

Parameters:
extractions - the container of evaluated extractions

updateAverageLengths

public void updateAverageLengths(Reader reader)
                          throws IOException
Analyzes the serialized contents of an extraction container, delegating to updateAverageLengths(ExtractionContainer).

Parameters:
reader - reader containing the extractions to analyze in DelimSepValues format; not closed by this method
Throws:
IOException - if an I/O error occurs while reading the extractions

close

public void close(int errorCount)
           throws IOException
Closes this instance, releasing all resources and stopping any background threads.

Specified by:
close in interface Closeable
Parameters:
errorCount - the number of errors (exceptions) that occurred during calls to this instance (0 if none)
Throws:
IOException - if an I/O error occurs

doProcess

protected void doProcess(Reader reader,
                         Writer writer,
                         ContextMap context)
                  throws IOException,
                         ProcessingException
Processes the contents of a reader, writing a modified version to a writer.

Specified by:
doProcess in class TextProcessor
Parameters:
reader - reader containing the text to process; should not be closed by this method
writer - the writer to write the processed text to; might be flushed but not closed by this method; if this method does not use the writer, the underlying file will be deleted afterwards
context - a map of objects that are made available for processing; when called from the implemented process methods in this class, it will contain mappings from IOUtils.KEY_LOCAL_CHARSET to the character set of the output writer; from TextProcessor.KEY_OUT_DIRECTORY to the output directory (File); from ContentType.KEY_MIME_TYPE to the document's MIME type; from TextProcessor.KEY_LOCAL_NAME to the local name (String); and either from TextProcessor.KEY_DIRECTORY to the input directory (File, in case of a local file) or from TextProcessor.KEY_URL to the URL (otherwise) of the processed document
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing

metricsByLength

public Map<String,FieldContainer> metricsByLength()
                                           throws IllegalStateException
Returns the usual metrics F-measure, precision and recall, calculated separately for all extractions of the same type (as usual) and token length. The returned map contains the names of the three metrics as keys and a two-dimensional representation of their values, indexed by extraction types as column names and token lengths as row names. In a fourth container, the number of answer keys (expected extractions) is returned.

Returns:
a mapping from metrics to field containers as described above
Throws:
IllegalStateException - if no update method has been invoked
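
As a rough illustration of the metrics named above, the following sketch computes precision, recall, F-measure and the number of answer keys for a single cell, that is, one extraction type at one token length. It assumes the standard definitions and the balanced (F1) form of the F-measure; the class LengthMetricsSketch, its method names, and the "spurious" count are hypothetical and not taken from the TIES API.

```java
/** Hypothetical helper for illustration; not part of the TIES API. */
public class LengthMetricsSketch {

    /** Precision: correct extractions divided by all predicted extractions. */
    public static double precision(int correct, int spurious) {
        int predicted = correct + spurious;
        return predicted == 0 ? 0.0 : (double) correct / predicted;
    }

    /** Recall: correct extractions divided by all expected extractions. */
    public static double recall(int correct, int missing) {
        int expected = answerKeys(correct, missing);
        return expected == 0 ? 0.0 : (double) correct / expected;
    }

    /** Balanced F-measure (F1), the harmonic mean of precision and recall. */
    public static double fMeasure(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2.0 * precision * recall / sum;
    }

    /** Number of answer keys (expected extractions): correct plus missing. */
    public static int answerKeys(int correct, int missing) {
        return correct + missing;
    }
}
```

Applying these functions once per (extraction type, token length) pair would fill a matrix of the shape described above, with types as columns and token lengths as rows. Returning 0.0 for an empty cell is a choice made for this sketch only.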


Copyright © 2003-2007 Christian Siefkes. All Rights Reserved.