Trainable Information Extractor 1.0 API

This document is the API specification for the Trainable Information Extractor (TIE) software.

Packages
de.fu_berlin.ties This package bundles main entry points and general interfaces and classes for the TIE software.
de.fu_berlin.ties.classify This package provides functionality for classification of texts and feature vectors.
de.fu_berlin.ties.classify.feature This package contains classes for working with features, feature vectors and feature transformers.
de.fu_berlin.ties.classify.winnow This package contains the Winnow classification algorithm and related algorithms and classes.
de.fu_berlin.ties.combi This package provides combination strategies for combining sequential classification decisions.
de.fu_berlin.ties.context This package provides functionality for building and managing representations of context in texts (XML documents).
de.fu_berlin.ties.context.sensor Sensors are objects that look up information for a token, for example semantic information from gazetteers or thesauri.
de.fu_berlin.ties.demo This package contains demo code for showing how the system works.
de.fu_berlin.ties.eval This package provides functionality for evaluating results of classifiers and extractors.
de.fu_berlin.ties.extract This package handles information extraction and entity recognition.
de.fu_berlin.ties.extract.amend This package provides code for reanalysing proposed extractions and performing suitable amendments to improve results.
de.fu_berlin.ties.extract.reestimate This package contains code for re-estimating the probabilities of extraction, for example based on the length or the content.
de.fu_berlin.ties.filter This package provides generic filtering and rewriting functionality.
de.fu_berlin.ties.io This package provides classes for input/output handling and for (de)serialization.
de.fu_berlin.ties.preprocess This package handles format conversions and linguistic preprocessing of documents.
de.fu_berlin.ties.text This package contains utility classes for working with texts.
de.fu_berlin.ties.util This package contains miscellaneous utility classes.
de.fu_berlin.ties.xml This package contains utility classes for working with XML documents and related data.
de.fu_berlin.ties.xml.convert This package contains code for converting XML to/from other formats and for transforming XML documents.
de.fu_berlin.ties.xml.dom This package contains utility classes for working with DOM-like XML representations, focussing especially on dom4j.

 

TIE is an incrementally trainable system for information extraction, text classification and language engineering in general. It employs classification models for working with texts. Other modules augment text with linguistic annotations (by delegating to external tools) and resolve nesting errors and other kinds of well-formedness violations in XML-like input.

Usage Notes

Thread Pooling and Asynchronous Execution

For asynchronous execution of tasks, the static TaskRunner functionality is available. It is often used internally, e.g. by several Processors and by ExternalCommand.

To allow efficient thread reuse, it is highly recommended to register your interest in the default task runner before you start and to deregister when you are done. A good place to do this is at the beginning and end of your main method. Deregister in a finally block so that it happens even if an exception is thrown. If you forget to deregister, your program might run forever, because the worker threads continue waiting for tasks even after all other threads have terminated.
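
The register/deregister pattern can be sketched as follows. This is a minimal, self-contained illustration and does not use TIE's actual TaskRunner API: SketchTaskRunner, registerInterest, deregisterInterest, schedule and interestCount are hypothetical names chosen for this example, and the worker pool is assumed to be a plain java.util.concurrent thread pool. Consult the TaskRunner class documentation for the real method names and signatures.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a shared task runner; NOT TIE's TaskRunner API. */
final class SketchTaskRunner {
    private static ExecutorService pool; // shared workers (non-daemon threads)
    private static int interested;       // number of currently registered users

    private SketchTaskRunner() { }

    /** Registers interest; the first caller creates the worker pool. */
    static synchronized void registerInterest() {
        if (interested++ == 0) {
            pool = Executors.newFixedThreadPool(2);
        }
    }

    /** Deregisters; the last caller shuts the pool down so the JVM can exit. */
    static synchronized void deregisterInterest() {
        if (interested == 0) {
            throw new IllegalStateException("deregister without register");
        }
        if (--interested == 0) {
            pool.shutdown(); // idle workers stop waiting for further tasks
            pool = null;
        }
    }

    /** Schedules a task for asynchronous execution on the shared pool. */
    static synchronized <T> Future<T> schedule(Callable<T> task) {
        if (pool == null) {
            throw new IllegalStateException("register interest first");
        }
        return pool.submit(task);
    }

    /** Returns the number of users that are currently registered. */
    static synchronized int interestCount() {
        return interested;
    }
}

final class TaskRunnerUsageSketch {
    /** Runs one task asynchronously and always deregisters afterwards. */
    static String runDemo() {
        SketchTaskRunner.registerInterest();
        try {
            Future<String> result = SketchTaskRunner.schedule(() -> "task ran");
            return result.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("task failed", e);
        } finally {
            // Without this call the idle worker threads would keep the JVM alive.
            SketchTaskRunner.deregisterInterest();
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

In this sketch the pool's threads are non-daemon, so omitting the finally block would leave them waiting for tasks and prevent the program from exiting, which is the failure mode described above.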



Copyright © 2003-2007 Christian Siefkes. All Rights Reserved.