java.lang.Object
de.fu_berlin.ties.context.Representation
public abstract class Representation
Abstract class that manages context representations for entity recognition
and information extraction. Subclasses must implement the
buildContext(Element, String, String, String, PriorRecognitions, Map, String)
method for building representations.
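The relationship between the buildContext overloads can be pictured as a small template-method hierarchy. The following is only a sketch under stated assumptions: Elem, Doc, RepresentationSketch and TokenRepresentation are hypothetical stand-ins, not the TIES or XML library classes, the priorRecognitions and featureCache parameters are left out for brevity, and a plain List<String> takes the place of FeatureVector.

```java
import java.util.ArrayList;
import java.util.List;

class DelegationDemo {
    public static void main(String[] args) {
        Doc doc = new Doc(new Elem("Berlin is a city"));
        // The document variant ends up in the subclass's full variant.
        System.out.println(new TokenRepresentation().buildContext(doc, "Token"));
    }
}

// Stand-in types for illustration only; NOT the TIES or XML library classes.
final class Elem {
    final String text;
    Elem(String text) { this.text = text; }
}

final class Doc {
    final Elem root;
    Doc(Elem root) { this.root = root; }
}

abstract class RepresentationSketch {
    /** The single variant a subclass has to provide. */
    abstract List<String> buildContext(Elem element, String leftText,
            String mainText, String rightText, String logPurpose);

    /** Element variant: full text as main text, empty left and right text. */
    List<String> buildContext(Elem element, String logPurpose) {
        return buildContext(element, "", element.text, "", logPurpose);
    }

    /** Document variant: forwards to the element variant with the root element. */
    List<String> buildContext(Doc document, String logPurpose) {
        return buildContext(document.root, logPurpose);
    }
}

/** Toy subclass: one "feature" per whitespace-separated token of the main text. */
class TokenRepresentation extends RepresentationSketch {
    @Override
    List<String> buildContext(Elem element, String leftText,
            String mainText, String rightText, String logPurpose) {
        List<String> features = new ArrayList<>();
        for (String token : mainText.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                features.add(token);
            }
        }
        return features;
    }
}
```

With this layout a subclass overrides one method and inherits the two convenience variants unchanged, which is the division of labour the class description suggests.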
Constructor Summary |
---|
Representation(int recogNum): Creates a new instance. |
Method Summary | |
---|---|
FeatureVector | buildContext(Document document, PriorRecognitions priorRecognitions, Map<Element,List<LocalFeature>> featureCache, String logPurpose): Builds the context representation of a document. |
FeatureVector | buildContext(Element element, PriorRecognitions priorRecognitions, Map<Element,List<LocalFeature>> featureCache, String logPurpose): Builds the context representation of an element. |
abstract FeatureVector | buildContext(Element element, String leftText, String mainText, String rightText, PriorRecognitions priorRecognitions, Map<Element,List<LocalFeature>> featureCache, String logPurpose): Builds the context representation of text in an element. |
FeatureVector | buildFeatures(Reader reader): Extracts a vector of relevant features from a text sequence. The input text must contain a well-formed XML element, otherwise this method will not work. |
int | getRecognitionNumber(): Returns the number of preceding recognitions to represent. |
PriorRecognitions | initDocument(File filename, TokenizerFactory tFactory): Initializes the processing of a new document and creates a buffer to be filled with prior Recognitions and passed as argument to the buildContext(Element, String, String, String, PriorRecognitions, Map, String) method. |
String | toString(): Returns a string representation of this object. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
public Representation(int recogNum)
Creates a new instance.
Parameters:
recogNum - the number of preceding recognitions to represent

Method Detail |
---|
public FeatureVector buildContext(Document document, PriorRecognitions priorRecognitions, Map<Element,List<LocalFeature>> featureCache, String logPurpose) throws ClassCastException
Builds the context representation of a document. Delegates to the buildContext(Element, PriorRecognitions, Map, String) method, using the root element of the document.
Parameters:
document - the XML document whose context should be represented
priorRecognitions - a buffer of the last Recognitions from the document, created by calling initDocument(java.io.File, de.fu_berlin.ties.text.TokenizerFactory); might be null
featureCache - a cache of (local) features; it should be re-used between all calls for the nodes in a single document, but must not be re-used when building the context of nodes in different documents
logPurpose - the type of contexts of main interest to the caller (e.g. "Token" or "Sentence"), used for logging
Throws:
ClassCastException - if the priorRecognitions buffer contains objects that aren't Recognitions

public FeatureVector buildContext(Element element, PriorRecognitions priorRecognitions, Map<Element,List<LocalFeature>> featureCache, String logPurpose) throws ClassCastException
Builds the context representation of an element. Delegates to the buildContext(Element, String, String, String, PriorRecognitions, Map, String) method, using the full textual content of the element as mainText and empty strings as leftText and rightText.
Parameters:
element - the element whose context should be represented
priorRecognitions - a buffer of the last Recognitions from the document, created by calling initDocument(java.io.File, de.fu_berlin.ties.text.TokenizerFactory); might be null
featureCache - a cache of (local) features; it should be re-used between all calls for the nodes in a single document, but must not be re-used when building the context of nodes in different documents
logPurpose - the type of contexts of main interest to the caller (e.g. "Token" or "Sentence"), used for logging
Throws:
ClassCastException - if the priorRecognitions buffer contains objects that aren't Recognitions

public abstract FeatureVector buildContext(Element element, String leftText, String mainText, String rightText, PriorRecognitions priorRecognitions, Map<Element,List<LocalFeature>> featureCache, String logPurpose) throws ClassCastException
Builds the context representation of text in an element.
Parameters:
element - the element whose context should be represented
leftText - textual content to the left of (preceding) mainText, might be empty
mainText - the main textual content to represent, might be empty
rightText - textual content to the right of (following) mainText, might be empty
priorRecognitions - a buffer of the last Recognitions from the document, created by calling initDocument(java.io.File, de.fu_berlin.ties.text.TokenizerFactory); might be null
featureCache - a cache of (local) features; it should be re-used between all calls for the nodes in a single document, but must not be re-used when building the context of nodes in different documents
logPurpose - the type of contexts of main interest to the caller (e.g. "Token" or "Sentence"), used for logging
Throws:
ClassCastException - if the priorRecognitions buffer contains objects that aren't Recognitions

public FeatureVector buildFeatures(Reader reader) throws IOException, ProcessingException
Extracts a vector of relevant features from a text sequence. The input text must contain a well-formed XML element, otherwise this method will not work.
Specified by:
buildFeatures in interface FeatureExtractor
Parameters:
reader - a reader containing the text to represent
Throws:
IOException - if an I/O error occurs while reading the input
ProcessingException - if an error occurs while processing the input

public PriorRecognitions initDocument(File filename, TokenizerFactory tFactory) throws ProcessingException, IOException
Initializes the processing of a new document and creates a buffer to be filled with prior Recognitions and passed as argument to the buildContext(Element, String, String, String, PriorRecognitions, Map, String) method. The caller must add new recognitions to the buffer by calling PriorRecognitions.add(Recognition), but it is not necessary to remove them: the buffer will automatically delete the oldest recognitions when appropriate.
Parameters:
filename - the name of the file
tFactory - a factory that can be used for creating tokenizers, if required
Returns:
a buffer to be filled with prior Recognitions
Throws:
ProcessingException - if an error occurs while starting to process the document
IOException - if an I/O error occurs

public int getRecognitionNumber()
Returns the number of preceding recognitions to represent.

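The buffer behaviour described for initDocument (callers only add, the oldest entries are dropped automatically) can be sketched with a bounded queue. This is an illustration under assumptions, not the actual PriorRecognitions implementation: BoundedBuffer and its methods are hypothetical names, and tying the capacity to the recognition number passed to the constructor is a guess.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class BufferDemo {
    public static void main(String[] args) {
        BoundedBuffer<String> buffer = new BoundedBuffer<>(2);
        buffer.add("first");
        buffer.add("second");
        buffer.add("third"); // capacity exceeded: "first" is dropped
        System.out.println(buffer.contents());
    }
}

/** Hypothetical stand-in; NOT the TIES PriorRecognitions class. */
class BoundedBuffer<T> {
    private final int capacity;
    private final Deque<T> items = new ArrayDeque<>();

    BoundedBuffer(int capacity) {
        if (capacity < 0) {
            throw new IllegalArgumentException("capacity must not be negative");
        }
        this.capacity = capacity;
    }

    /** Appends an item and evicts the oldest ones beyond the capacity. */
    void add(T item) {
        items.addLast(item); // ArrayDeque rejects null items
        while (items.size() > capacity) {
            items.removeFirst();
        }
    }

    /** Returns a copy of the buffered items, oldest first. */
    List<T> contents() {
        return new ArrayList<>(items);
    }
}
```

Because eviction happens inside add, calling code never has to track how many entries the buffer currently holds.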
public String toString()
Returns a string representation of this object.
Overrides:
toString in class Object
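The featureCache rule stated for the buildContext methods (share the cache across all nodes of one document, never across documents) amounts to scoping the map to a single document. A minimal sketch, assuming string node identifiers and a hypothetical localFeatures helper in place of the real Element keys and LocalFeature values:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CacheScopeDemo {
    /** Counts how often local features are actually computed. */
    static int computations = 0;

    /** Hypothetical stand-in for extracting the local features of one node. */
    static List<String> localFeatures(String node, Map<String, List<String>> cache) {
        return cache.computeIfAbsent(node, n -> {
            computations++;
            List<String> features = new ArrayList<>();
            features.add("len=" + n.length());
            return features;
        });
    }

    /** Processes one document with a cache that lives only as long as the document. */
    static void processDocument(List<String> nodes) {
        Map<String, List<String>> cache = new HashMap<>(); // fresh per document
        for (String node : nodes) {
            localFeatures(node, cache); // repeated nodes are served from the cache
        }
    }

    public static void main(String[] args) {
        processDocument(Arrays.asList("a", "b", "a")); // computes "a" and "b" once each
        processDocument(Arrays.asList("a"));           // new document: "a" is recomputed
        System.out.println(computations);
    }
}
```

Creating the map inside the per-document method makes cross-document reuse impossible by construction, at the cost of recomputing features for nodes that happen to look alike in different documents.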