java.lang.Object
de.fu_berlin.ties.context.Representation
de.fu_berlin.ties.context.AbstractRepresentation
de.fu_berlin.ties.context.DefaultRepresentation
public class DefaultRepresentation
The context representation used by default. This class is thread-safe.
Field Summary | |
---|---|
protected static String |
AXIS_ANCESTOR
Ancestor axis. |
protected static String |
AXIS_DESC_OR_SELF
Descendant-or-self axis. |
protected static String |
AXIS_FOLLOW_SIBLING
Following sibling axis. |
protected static String |
AXIS_PREC_SIBLING
Preceding sibling axis. |
protected static String |
AXIS_PRIOR
The pseudo-axis of prior recognitions. |
protected static org.apache.commons.collections.map.LinkedMap |
TOKEN_TYPE_PATTERNS
A sequence map used by calculateValuesFromText(String, String, List) to determine the
"tokenType" value. |
Fields inherited from class de.fu_berlin.ties.context.AbstractRepresentation |
---|
CONFIG_RECOGN_NUM, CONFIG_SPLIT_MAXIMUM, CONFIG_STORE_NTH |
Constructor Summary | |
---|---|
DefaultRepresentation()
Creates a new instance based on the standard configuration. |
|
DefaultRepresentation(int recogNum,
int detailedRecogs,
int numberOfAncestors,
int numberOfSiblings,
int splitMax,
int prefixMax,
String headElementName,
String headAttribName,
String[] defaultAttribs,
int n,
String outCharset,
String[] sensorNames,
TiesConfiguration config)
Creates a new instance. |
|
DefaultRepresentation(TiesConfiguration config)
Creates a new instance based on the provided configuration. |
Method Summary | |
---|---|
protected void |
buildFeatures(String axisName,
Element element,
ElementPosition position,
boolean recurseInsteadOfText,
LinkedList<Feature> featureList,
boolean addAtEnd,
Map<Element,List<LocalFeature>> cache)
Builds the features of an element and appends them to the specified featureList. |
protected List<LocalFeature> |
buildLocalFeatures(Element element,
ElementPosition position,
boolean ignoreText)
Builds the local features of an element. |
protected List<Feature> |
buildPrior(PriorRecognitions priorRecognitions)
Builds the pseudo-axis of prior recognitions. |
protected void |
buildTextFeatures(String axisName,
Element element,
String trimmedLeft,
String trimmedMain,
String trimmedRight,
LinkedList<Feature> featureList)
Builds the context representation of text in an element, differentiating between three kinds of textual contents: a left part, a main part, and a right part. |
protected void |
calculateHeadValues(Element element,
List<LocalFeature> values)
Creates values that depend on "head" children of an element, if the element contains any of them. |
protected void |
calculatePositionalValues(String elementName,
ElementPosition position,
List<LocalFeature> values)
Calculates values that depend on the position of an element within its parent. |
protected void |
calculateValuesFromText(String elementName,
String trimmedText,
List<LocalFeature> values)
Calculates values that depend on the textual content of an element: prefixes, suffixes, length data, and token type. |
protected String |
determineHeadValue(Element element)
Helper method for determining the head value for an element of type getHeadElement(). |
protected String |
determineRoughPosition(int position,
int elementCount)
Helper method called by calculatePositionalValues(String, ElementPosition, List) to
collapse a position into one of five values. |
protected FeatureVector |
doBuildContext(Element element,
String leftText,
String mainText,
String rightText,
PriorRecognitions priorRecognitions,
Map<Element,List<LocalFeature>> featureCache,
String logPurpose)
Builds the context representation of text in an element. |
protected List<Feature> |
filterRepresentation(FeatureVector originalRep)
Creates a filtered view of a context representation. |
int |
getAncestorNumber()
Returns the maximum number of ancestors to include in the context representation. |
Set |
getDefaultAttributes()
Returns the unmodifiable set of names of default attributes. |
int |
getDetailedRecognitions()
Returns the number of preceding recognitions to represent in detail. |
String |
getHeadAttribute()
Returns the name of the attribute to use for calculating head values. |
String |
getHeadElement()
Returns the name of the element to use for calculating head values. |
int |
getSiblingNumber()
Returns the basic number of preceding and following siblings to include in the context representation. |
protected void |
handleAncestors(Element element,
int ancestorsToAdd,
int ancestorSiblingsToAdd,
LinkedList<Feature> ancestorFeatures,
LinkedList<Feature> ancestorSiblingFeatures,
Bag processedAncestorNames,
Map<Element,List<LocalFeature>> cache)
Handles ancestors and ancestor siblings of an element. |
protected ElementPosition |
handleSiblings(String axisPrefix,
Element element,
int baseNumber,
LinkedList<Feature> precedingFeatures,
LinkedList<Feature> followingFeatures,
Map<Element,List<LocalFeature>> cache)
Adds the preceding and following siblings of an element. |
protected void |
removeExtraMarkers(List features)
Modifies a list of GlobalFeatures to remove extraneous
FeatureType.MARKER features. |
protected List<Element> |
selectFollowingSiblings(Element mainElement,
LinkedList<Element> allFollowingSiblings,
int baseNumber)
Selects the siblings to keep among all following siblings. |
protected List<Element> |
selectPrecedingSiblings(Element mainElement,
LinkedList<Element> allPrecedingSiblings,
int baseNumber)
Selects the siblings to keep among all preceding siblings. |
String |
toString()
Returns a string representation of this object. |
Methods inherited from class de.fu_berlin.ties.context.AbstractRepresentation |
---|
buildContext, getSplitMaximum, getStoreN |
Methods inherited from class de.fu_berlin.ties.context.Representation |
---|
buildContext, buildContext, createRecognitionBuffer, getRecognitionNumber |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
protected static final String AXIS_ANCESTOR
Ancestor axis.
protected static final String AXIS_DESC_OR_SELF
Descendant-or-self axis.
protected static final String AXIS_FOLLOW_SIBLING
Following sibling axis.
protected static final String AXIS_PREC_SIBLING
Preceding sibling axis.
protected static final String AXIS_PRIOR
The pseudo-axis of prior recognitions.
protected static final org.apache.commons.collections.map.LinkedMap TOKEN_TYPE_PATTERNS
A sequence map used by calculateValuesFromText(String, String, List) to determine the "tokenType" value. Maps description strings to Patterns to match. The description string of the first matching pattern is used. Initialized in the static initializer. The value "mixed" is reserved for tokens that are not matched by any of the patterns.
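
The first-match lookup described for TOKEN_TYPE_PATTERNS can be illustrated with a small stand-alone sketch. It uses a java.util.LinkedHashMap in place of the Commons Collections LinkedMap, and the class name, method name, and the three patterns are hypothetical; the patterns actually registered by DefaultRepresentation are not listed on this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

final class TokenTypeSketch {
    // Insertion-ordered map: description string -> pattern to match.
    // The entries are illustrative assumptions, not the library's patterns.
    private static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();

    static {
        PATTERNS.put("number", Pattern.compile("\\d+"));
        PATTERNS.put("capitalized", Pattern.compile("\\p{Lu}\\p{Ll}+"));
        PATTERNS.put("lowercase", Pattern.compile("\\p{Ll}+"));
    }

    /** Returns the description of the first matching pattern, or "mixed". */
    static String tokenType(String token) {
        for (Map.Entry<String, Pattern> entry : PATTERNS.entrySet()) {
            if (entry.getValue().matcher(token).matches()) {
                return entry.getKey();
            }
        }
        // No pattern matched: fall back to the reserved value.
        return "mixed";
    }
}
```

Because the map preserves insertion order, more specific patterns should be registered before more general ones.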
Constructor Detail |
---|
public DefaultRepresentation() throws ProcessingException
Creates a new instance based on the standard configuration.
Throws:
ProcessingException - if an error occurs while initializing this instance
public DefaultRepresentation(TiesConfiguration config) throws ProcessingException
Creates a new instance based on the provided configuration.
Parameters:
config - used to configure this instance
Throws:
ProcessingException - if an error occurs while initializing this instance
public DefaultRepresentation(int recogNum, int detailedRecogs, int numberOfAncestors, int numberOfSiblings, int splitMax, int prefixMax, String headElementName, String headAttribName, String[] defaultAttribs, int n, String outCharset, String[] sensorNames, TiesConfiguration config) throws ProcessingException
Creates a new instance. See also DOMUtils.name(Element).
Parameters:
recogNum - the number of preceding recognitions to represent
detailedRecogs - the number of preceding recognitions to represent in detail
numberOfAncestors - the maximum number of ancestors to include in the context representation
numberOfSiblings - the basic number of preceding and following siblings to include
splitMax - the maximum number of subsequences to keep when a feature value must be split (at whitespace)
prefixMax - the maximum length of prefixes and suffixes
headElementName - the name of the element to use for calculating head values (cf. calculateHeadValues(Element, List))
headAttribName - the name of the attribute to use for calculating head values
defaultAttribs - array of names of default attributes
outCharset - the output character set to use (only used to store some configurations for inspection purposes, if n > 0); if null, the default charset of the current platform is used
n - every n-th context representation is stored if n > 0; otherwise no representation is stored
sensorNames - array of fully specified names of classes implementing the Sensor interface to be used to look up semantic information
config - used to configure the sensors
Throws:
ProcessingException - if an error occurs while initializing this instance
Method Detail |
---|
protected FeatureVector doBuildContext(Element element, String leftText, String mainText, String rightText, PriorRecognitions priorRecognitions, Map<Element,List<LocalFeature>> featureCache, String logPurpose) throws ClassCastException, IllegalArgumentException
Builds the context representation of text in an element.
doBuildContext in class AbstractRepresentation
Parameters:
element - the element whose context should be represented
leftText - textual content to the left of (preceding) mainText, might be empty
mainText - the main textual content to represent, might be empty
rightText - textual content to the right of (following) mainText, might be empty
priorRecognitions - a buffer of the last Recognitions from the document, created by calling Representation.createRecognitionBuffer(); might be null
featureCache - a cache of (local) features, should be re-used between all calls for the nodes in a single document (but must not be re-used when building the context of nodes in different documents!)
logPurpose - the type of contexts of main interest to the caller (e.g. "Token" or "Sentence"), used for logging
Throws:
ClassCastException - if the priorRecognitions buffer contains objects that aren't Recognitions
IllegalArgumentException - if the specified node is of an unsupported type
protected void buildFeatures(String axisName, Element element, ElementPosition position, boolean recurseInsteadOfText, LinkedList<Feature> featureList, boolean addAtEnd, Map<Element,List<LocalFeature>> cache)
Builds the features of an element and appends them to the specified featureList. Handles attributes and calculated values and the element itself. Child elements are only handled when recurseInsteadOfText is true; the axis name is not changed for child elements.
Parameters:
axisName - the name of the axis, used to start each feature
element - the element to process
position - wraps the position of the element within its parent element and related data, used for calculating positional features; if null, no positional features are calculated
recurseInsteadOfText - if true, child elements are recursively processed, but no values from text are calculated; otherwise text is processed but child elements are not
featureList - the list of GlobalFeatures to add features to
addAtEnd - whether to add the new features at the end or at the beginning of the featureList
cache - a cache of local features, mapping Elements to Lists of LocalFeatures
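
The interplay of axisName and addAtEnd in buildFeatures can be pictured with a simplified sketch. Plain strings stand in for the LocalFeature and Feature types, the "/" separator and all names are assumptions made for illustration, and whether the real method keeps the relative order of prepended features is not stated on this page.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

final class AxisFeatureSketch {
    /**
     * Turns local features into global ones by starting each with the
     * axis name, then adds them at the end or at the beginning of the list.
     */
    static void addFeatures(String axisName, List<String> localFeatures,
            LinkedList<String> featureList, boolean addAtEnd) {
        List<String> global = new ArrayList<>();
        for (String local : localFeatures) {
            // Hypothetical format: axis name, separator, local feature.
            global.add(axisName + "/" + local);
        }
        if (addAtEnd) {
            featureList.addAll(global);
        } else {
            // Prepend while keeping the relative order of the new features.
            featureList.addAll(0, global);
        }
    }
}
```

Local features carry no axis name, which is what makes them cacheable per element and reusable on several axes.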
protected List<LocalFeature> buildLocalFeatures(Element element, ElementPosition position, boolean ignoreText)
Builds the local features of an element. LocalFeatures can be stored in a cache for re-use; they must be combined with an axis name to obtain a global feature.
Parameters:
element - the element to process
position - wraps the position of the element within its parent element and related data, used for calculating positional features; if null, no positional features are calculated
ignoreText - if true, no values from text or head values are calculated
Returns:
a list of LocalFeatures
protected List<Feature> buildPrior(PriorRecognitions priorRecognitions) throws ClassCastException
Builds the pseudo-axis of prior recognitions.
Parameters:
priorRecognitions - a buffer of the last Recognitions from the document, created by calling Representation.createRecognitionBuffer()
Returns:
a list of GlobalFeatures representing prior recognitions
Throws:
ClassCastException - if the priorRecognitions buffer contains objects that aren't Recognitions
protected void buildTextFeatures(String axisName, Element element, String trimmedLeft, String trimmedMain, String trimmedRight, LinkedList<Feature> featureList)
Builds the context representation of text in an element, differentiating between three kinds of textual contents: a left part, a main part, and a right part.
Parameters:
axisName - the name of the axis, used to start each feature
element - the element whose context should be represented
trimmedLeft - trimmed textual content to the left of (preceding) trimmedMain, might be empty
trimmedMain - trimmed main textual content to represent, might be empty
trimmedRight - trimmed textual content to the right of (following) trimmedMain, might be empty
featureList - a list of GlobalFeatures to add the values to
protected void calculateHeadValues(Element element, List<LocalFeature> values)
Creates values that depend on "head" children of an element, if the element contains any of them.
For an element of type getHeadElement(), the value of the getHeadAttribute() or else the textual content of its right-most child element is stored (in a calculated value named "head"). Child elements are iterated from right to left until one containing an appropriate attribute or textual content is found (or none are left).
If the element contains child elements of type getHeadElement(), the first and last child element of this type are recursively processed and the results stored in calculated values named "lhead" and "rhead". If there is only a single child of appropriate type, the "lhead" value is omitted.
If a value contains whitespace, only the final subsequence following all whitespace is preserved.
Parameters:
element - the element to process
values - a list of LocalFeatures to add the calculated values to
protected void calculatePositionalValues(String elementName, ElementPosition position, List<LocalFeature> values)
Calculates values that depend on the position of an element within its parent.
Parameters:
elementName - the name of the element to process, as returned by DOMUtils.name(Element)
position - wraps the position of the element within its parent element and related data, must not be null
values - a list to add the calculated values to
protected void calculateValuesFromText(String elementName, String trimmedText, List<LocalFeature> values) throws IllegalArgumentException
Calculates values that depend on the textual content of an element: prefixes, suffixes, length data, and token type.
Parameters:
elementName - the name of the element to process, as returned by DOMUtils.name(Element)
trimmedText - the trimmed textual content of the element to process, must not be empty
values - a list of LocalFeatures to add the calculated values to
Throws:
IllegalArgumentException - if the empty string was given as trimmedText
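
As a rough illustration of the prefix and suffix values mentioned for calculateValuesFromText, the following sketch enumerates all prefixes and suffixes of a text up to a maximum length (corresponding to the prefixMax constructor argument). The class and method names are hypothetical, and whether the real implementation also emits the full text when it is shorter than the maximum is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class TextValuesSketch {
    /** Returns the prefixes of lengths 1..prefixMax (capped at the text length). */
    static List<String> prefixes(String trimmedText, int prefixMax) {
        requireNonEmpty(trimmedText);
        List<String> result = new ArrayList<>();
        int max = Math.min(prefixMax, trimmedText.length());
        for (int len = 1; len <= max; len++) {
            result.add(trimmedText.substring(0, len));
        }
        return result;
    }

    /** Returns the suffixes of lengths 1..prefixMax (capped at the text length). */
    static List<String> suffixes(String trimmedText, int prefixMax) {
        requireNonEmpty(trimmedText);
        List<String> result = new ArrayList<>();
        int max = Math.min(prefixMax, trimmedText.length());
        for (int len = 1; len <= max; len++) {
            result.add(trimmedText.substring(trimmedText.length() - len));
        }
        return result;
    }

    // Mirrors the documented contract: the empty string is rejected.
    private static void requireNonEmpty(String text) {
        if (text.isEmpty()) {
            throw new IllegalArgumentException("trimmedText must not be empty");
        }
    }
}
```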
protected String determineHeadValue(Element element)
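Helper method for determining the head value for an element of type getHeadElement(). See calculateHeadValues(Element, List) for a description of the algorithm.
Parameters:
element - the element to process, must be of type getHeadElement()
Returns:
the head value of the element, if one can be determined (the element may have neither a getHeadAttribute() nor textual content)
protected String determineRoughPosition(int position, int elementCount)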
Helper method called by calculatePositionalValues(String, ElementPosition, List) to collapse a position into one of five values.
Parameters:
position - the position counted from 0, should be non-negative and smaller than elementCount (otherwise the results are undefined)
elementCount - the number of elements
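
This page does not say which five values determineRoughPosition produces or where the boundaries lie. The sketch below shows one way such a collapse could work; the five labels and the one-third thresholds are invented for illustration only.

```java
final class RoughPositionSketch {
    /**
     * Collapses a 0-based position into one of five coarse labels.
     * Labels and thresholds are hypothetical, not taken from the library.
     */
    static String roughPosition(int position, int elementCount) {
        if (position == 0) {
            return "first";
        }
        if (position == elementCount - 1) {
            return "last";
        }
        // Integer arithmetic avoids floating-point rounding at the boundaries.
        if (3 * position < elementCount) {
            return "early";
        }
        if (3 * position < 2 * elementCount) {
            return "middle";
        }
        return "late";
    }
}
```

Collapsing exact positions into a few buckets keeps the number of distinct feature values small, which is usually the point of such a helper.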
protected List<Feature> filterRepresentation(FeatureVector originalRep)
Creates a filtered view of a context representation. Features representing markers (FeatureType.MARKER), stand-alone elements (FeatureType.ELEMENT) and default attributes (getDefaultAttributes()) are included in all filtered representations. Comment-only features are ignored.
Parameters:
originalRep - a feature vector containing the representation to filter
Returns:
a list of Features combining the filtered representations created by this method
public int getAncestorNumber()
Returns the maximum number of ancestors to include in the context representation.
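public Set getDefaultAttributes()
Returns the unmodifiable set of names of default attributes. See also DOMUtils.name(Attribute).
public int getDetailedRecognitions()
Returns the number of preceding recognitions to represent in detail.
public String getHeadAttribute()
Returns the name of the attribute to use for calculating head values. See calculateHeadValues(Element, List).
public String getHeadElement()
Returns the name of the element to use for calculating head values. See calculateHeadValues(Element, List).
public int getSiblingNumber()
Returns the basic number of preceding and following siblings to include in the context representation.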
protected void handleAncestors(Element element, int ancestorsToAdd, int ancestorSiblingsToAdd, LinkedList<Feature> ancestorFeatures, LinkedList<Feature> ancestorSiblingFeatures, Bag processedAncestorNames, Map<Element,List<LocalFeature>> cache) throws IllegalArgumentException
Handles ancestors and ancestor siblings of an element.
Parameters:
element - the element to process
ancestorsToAdd - the number of ancestors to add, must be > 0; if > 1, this method calls itself recursively, decreasing the number by 1
ancestorSiblingsToAdd - the number of preceding/following siblings of the current ancestors to add; if 0 or negative, no siblings are added; a recursive call decreases this parameter by 1 if any ancestor siblings were found (if no siblings were found, the number passes unchanged)
ancestorFeatures - the list of GlobalFeatures to prepend the features on the ancestors to
ancestorSiblingFeatures - the list of GlobalFeatures to append the features on the ancestors' siblings to
processedAncestorNames - a bag that will typically be empty when first calling this method (will be filled by recursive calls)
cache - a cache of local features, mapping Elements to Lists of LocalFeatures
Throws:
IllegalArgumentException - if ancestorsToAdd is 0 or negative
protected ElementPosition handleSiblings(String axisPrefix, Element element, int baseNumber, LinkedList<Feature> precedingFeatures, LinkedList<Feature> followingFeatures, Map<Element,List<LocalFeature>> cache)
Adds the preceding and following siblings of an element.
Parameters:
axisPrefix - the prefix of the axis name, used to start each feature; specify the empty string if no prefix should be used
element - the element to process
baseNumber - the basic number of siblings to keep; the actual number of siblings kept might vary
precedingFeatures - the list of GlobalFeatures to prepend the features on the preceding siblings to
followingFeatures - the list of GlobalFeatures to append the features on the following siblings to
cache - a cache of local features, mapping Elements to Lists of LocalFeatures
Returns:
the position of the element within its parent, as used by calculatePositionalValues(String, ElementPosition, List), or null if there is no parent element
protected void removeExtraMarkers(List features)
Modifies a list of GlobalFeatures to remove extraneous FeatureType.MARKER features. Keeps only the last one of several sequential marker features; trailing marker features are removed as well.
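
The marker clean-up rule can be sketched on a plain list of strings. Here the string "|" plays the role of a FeatureType.MARKER feature; that stand-in and the class name are assumptions, since the real method operates on GlobalFeature objects.

```java
import java.util.List;

final class MarkerCleanupSketch {
    // Hypothetical stand-in for a feature of type FeatureType.MARKER.
    static final String MARKER = "|";

    /**
     * Removes extraneous markers in place: of several sequential markers
     * only the last one is kept, and trailing markers are dropped.
     */
    static void removeExtraMarkers(List<String> features) {
        // Walk backwards so that removals do not shift unvisited indices.
        boolean followedByMarkerOrEnd = true;
        for (int i = features.size() - 1; i >= 0; i--) {
            boolean isMarker = MARKER.equals(features.get(i));
            if (isMarker && followedByMarkerOrEnd) {
                // Either trailing or not the last marker of its run.
                features.remove(i);
            } else {
                followedByMarkerOrEnd = isMarker;
            }
        }
    }
}
```

For the input [a, |, |, b, |, |] this leaves [a, |, b]: the second marker of the first run survives and the trailing run disappears.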
Parameters:
features - the list of features to modify
protected List<Element> selectFollowingSiblings(Element mainElement, LinkedList<Element> allFollowingSiblings, int baseNumber)
Selects the siblings to keep among all following siblings. Keeps the baseNumber first siblings. One of the selected siblings may have a different type (name as returned by DOMUtils.name(Element)) than the main element; if there are more with different types, they are skipped.
Parameters:
mainElement - the element whose siblings should be selected
allFollowingSiblings - the list of all following siblings
baseNumber - the basic number of siblings to keep; the actual number of siblings kept might vary
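
One possible reading of the selection rule for following siblings is sketched below, with element names as plain strings instead of Element objects. The class and method names are hypothetical, and the exact stopping condition of the real selectFollowingSiblings is not documented here, so this is an interpretation rather than the actual algorithm.

```java
import java.util.ArrayList;
import java.util.List;

final class SiblingSelectionSketch {
    /**
     * Keeps up to baseNumber following siblings. Siblings named like the
     * main element are always eligible; at most one sibling with a
     * different name is kept, further ones are skipped.
     */
    static List<String> selectFollowing(String mainName,
            List<String> allFollowing, int baseNumber) {
        List<String> selected = new ArrayList<>();
        boolean differentTypeKept = false;
        for (String name : allFollowing) {
            if (selected.size() >= baseNumber) {
                break;
            }
            if (name.equals(mainName)) {
                selected.add(name);
            } else if (!differentTypeKept) {
                selected.add(name);
                differentTypeKept = true;
            }
            // Any further sibling of a different type is skipped.
        }
        return selected;
    }
}
```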
protected List<Element> selectPrecedingSiblings(Element mainElement, LinkedList<Element> allPrecedingSiblings, int baseNumber)
Selects the siblings to keep among all preceding siblings. Keeps the baseNumber last siblings. One of the last siblings may have a different type (name as returned by DOMUtils.name(Element)) than the main element; if there are more with different types, they are skipped. If none of the baseNumber last siblings has a different type, the last sibling with a different type (name) is also kept.
Parameters:
mainElement - the element whose siblings should be selected
allPrecedingSiblings - the list of all preceding siblings
baseNumber - the basic number of siblings to keep; the actual number of siblings kept might vary
public String toString()
Returns a string representation of this object.
toString in class AbstractRepresentation