de.fu_berlin.ties.extract
Class AnswerBuilder

java.lang.Object
  extended by de.fu_berlin.ties.ConfigurableProcessor
      extended by de.fu_berlin.ties.TextProcessor
          extended by de.fu_berlin.ties.DocumentReader
              extended by de.fu_berlin.ties.extract.AnswerBuilder
All Implemented Interfaces:
ElementProcessor, Processor

public class AnswerBuilder
extends DocumentReader
implements ElementProcessor

Builds an ExtractionContainer of answer keys from an annotated text (in XML format).

Instances of this class are thread-safe and can process several documents in parallel.
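The thread-safety guarantee means a single instance can be shared by several worker threads. The following is a minimal sketch of that usage pattern, not code from TIES: the class ParallelProcessingSketch and its processAll method are hypothetical, and the generic Function stands in for a call on a shared instance such as buildAnswers(Document), so that the example runs without the library on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical sketch: one shared, thread-safe processor serving several documents in parallel. */
final class ParallelProcessingSketch {
    private ParallelProcessingSketch() {}

    /**
     * Applies the shared processor to every document on a fixed-size thread pool
     * and returns the results in the order of the input list.
     */
    static <D, R> List<R> processAll(Function<D, R> sharedProcessor, List<D> documents, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per document; all tasks use the same processor instance.
            List<Future<R>> futures = new ArrayList<>();
            for (D doc : documents) {
                Callable<R> task = () -> sharedProcessor.apply(doc);
                futures.add(pool.submit(task));
            }
            // Collect the results in submission order.
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Processing failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With the library available, the Function argument would presumably wrap a call on one shared AnswerBuilder; sharing is only safe because the class documents itself as thread-safe.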

Version:
$Revision: 1.17 $, $Date: 2004/04/13 07:08:30 $, $Author: siefkes $
Author:
Christian Siefkes

Field Summary
static String EXT_ANSWERS
          The recommended file extension to use for storing answer keys.
static String KEY_ANSWERS
          Context key referring to the extraction container used for storing the answer keys.
 
Fields inherited from class de.fu_berlin.ties.TextProcessor
CONFIG_POST, KEY_DIRECTORY, KEY_LOCAL_NAME, KEY_OUT_DIRECTORY, KEY_URL
 
Constructor Summary
AnswerBuilder(String outExt)
          Creates a new instance, configuring the target structure from the standard configuration.
AnswerBuilder(String outExt, TargetStructure targetStruct, TokenizerFactory tFactory, TiesConfiguration config)
          Creates a new instance.
AnswerBuilder(String outExt, TiesConfiguration config)
          Creates a new instance, configuring the target structure from the provided configuration.
 
Method Summary
 ExtractionContainer buildAnswers(Document document)
          Builds an ExtractionContainer of answer keys from an annotated XML document.
 TargetStructure getTargetStructure()
          Returns the target structure specifying the classes to recognize.
 void process(Document document, Writer writer, ContextMap context)
          Builds an ExtractionContainer of answer keys from an annotated XML document.
 void processElement(Element element, TokenContainer tokenContainer, ContextMap context)
          Classifies an element in an XML document, building features and delegating to the classifier.
static ExtractionContainer readAnswerKeys(TargetStructure targetStruct, File file, Configuration config)
          Reads back answer keys stored by the process(Document, Writer, ContextMap) method of an instance of this class.
static ExtractionContainer readCorrespondingAnswerKeys(TargetStructure targetStruct, File orgFile, Configuration config)
          Reads the answer keys corresponding to a file.
 String toString()
          Returns a string representation of this object.
 
Methods inherited from class de.fu_berlin.ties.DocumentReader
doProcess
 
Methods inherited from class de.fu_berlin.ties.TextProcessor
getOutFileExt, process, process, process, process
 
Methods inherited from class de.fu_berlin.ties.ConfigurableProcessor
getConfig
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

KEY_ANSWERS

public static final String KEY_ANSWERS
Context key referring to the extraction container used for storing the answer keys.

See Also:
Constant Field Values

EXT_ANSWERS

public static final String EXT_ANSWERS
The recommended file extension to use for storing answer keys.

See Also:
Constant Field Values
Constructor Detail

AnswerBuilder

public AnswerBuilder(String outExt)
Creates a new instance, configuring the target structure from the standard configuration.

Parameters:
outExt - the extension to use for output files

AnswerBuilder

public AnswerBuilder(String outExt,
                     TiesConfiguration config)
Creates a new instance, configuring the target structure from the provided configuration.

Parameters:
outExt - the extension to use for output files
config - the configuration to use

AnswerBuilder

public AnswerBuilder(String outExt,
                     TargetStructure targetStruct,
                     TokenizerFactory tFactory,
                     TiesConfiguration config)
Creates a new instance.

Parameters:
outExt - the extension to use for output files
targetStruct - the target structure specifying the classes to recognize
tFactory - used to instantiate tokenizers
config - the configuration to use
Method Detail

readAnswerKeys

public static ExtractionContainer readAnswerKeys(TargetStructure targetStruct,
                                                 File file,
                                                 Configuration config)
                                          throws IllegalArgumentException,
                                                 IOException
Reads back answer keys stored by the process(Document, Writer, ContextMap) method of an instance of this class.

Parameters:
targetStruct - the target structure used when creating the answer keys
file - the file containing the answer keys
config - configuration used to determine the character set of the keys (cf. IOUtils.openReader(File, Configuration))
Returns:
an extraction container of the answer keys
Throws:
IllegalArgumentException - if the type (see de.fu_berlin.ties.classify.Prediction.getType()) of some of the answer keys doesn't fit the target structure
IOException - if an I/O error occurs while reading the file

readCorrespondingAnswerKeys

public static ExtractionContainer readCorrespondingAnswerKeys(TargetStructure targetStruct,
                                                              File orgFile,
                                                              Configuration config)
                                                       throws IllegalArgumentException,
                                                              IOException
Reads the answer keys corresponding to a file. The answer keys must be stored in a file whose name matches that of the original file, with the extension EXT_ANSWERS in place of the original extension.

Parameters:
targetStruct - the target structure used when creating the answer keys
orgFile - the file whose answer keys should be returned
config - configuration used to determine the character set of the keys (cf. IOUtils.openReader(File, Configuration))
Returns:
an extraction container of the answer keys
Throws:
IllegalArgumentException - if the type (see de.fu_berlin.ties.classify.Prediction.getType()) of some of the answer keys doesn't fit the target structure
IOException - if an I/O error occurs while reading the file
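The naming rule for readCorrespondingAnswerKeys can be illustrated with a minimal sketch. The helper below is hypothetical and not part of the TIES API. It assumes that the extension is everything after the last dot of the file name, and it takes the answer extension as a parameter because the actual value of EXT_ANSWERS is only listed under Constant Field Values; the value "ans" used here is a placeholder.

```java
import java.io.File;

/** Hypothetical helper, not part of TIES: derives the answer-key file for a document file. */
final class AnswerKeyPaths {
    private AnswerKeyPaths() {}

    /**
     * Replaces the extension of orgFile with answerExt, keeping the directory.
     * If the file name has no extension, the answer extension is appended.
     */
    static File correspondingFile(File orgFile, String answerExt) {
        String name = orgFile.getName();
        int dot = name.lastIndexOf('.');
        // A dot at index 0 marks a hidden file such as ".profile", not an extension.
        String base = (dot > 0) ? name.substring(0, dot) : name;
        return new File(orgFile.getParentFile(), base + "." + answerExt);
    }

    public static void main(String[] args) {
        File org = new File("corpus", "seminar1.xml");
        // "ans" is a placeholder for the real EXT_ANSWERS value.
        System.out.println(correspondingFile(org, "ans").getName()); // prints "seminar1.ans"
    }
}
```

How the library itself locates the file is not specified in this documentation, so the extension handling above (last dot, hidden files) is an assumption.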

buildAnswers

public ExtractionContainer buildAnswers(Document document)
                                 throws IOException,
                                        ProcessingException
Builds an ExtractionContainer of answer keys from an annotated XML document.

Parameters:
document - the document to read
Returns:
a container of the answer keys of this document
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing

getTargetStructure

public TargetStructure getTargetStructure()
Returns the target structure specifying the classes to recognize.

Returns:
the target structure in use

process

public void process(Document document,
                    Writer writer,
                    ContextMap context)
             throws IOException,
                    ProcessingException
Builds an ExtractionContainer of answer keys from an annotated XML document.

Specified by:
process in class DocumentReader
Parameters:
document - the document to read
writer - the writer to write the processed text to; flushed but not closed by this method
context - a map of objects that are made available for processing
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing

processElement

public void processElement(Element element,
                           TokenContainer tokenContainer,
                           ContextMap context)
Classifies an element in an XML document, building features and delegating to the classifier.

Specified by:
processElement in interface ElementProcessor
Parameters:
element - the element to process
tokenContainer - a container storing all tokens seen in the document so far; TokenContainer.getLast() contains the textual content of the element and its child elements
context - a map of objects that are made available for processing; the KEY_ANSWERS key must map to an extraction container used for storing the answer keys

toString

public String toString()
Returns a string representation of this object.

Overrides:
toString in class TextProcessor
Returns:
a textual representation


Copyright © 2003-2004 Christian Siefkes. All Rights Reserved.