java.lang.Object
  de.fu_berlin.ties.context.Representation
    de.fu_berlin.ties.context.DefaultRepresentation
The context representation used by default. This class is thread-safe.
Field Summary | |
protected static String |
AXIS_ANCESTOR
Ancestor axis. |
protected static String |
AXIS_DESC_OR_SELF
Descendant-or-self axis. |
protected static String |
AXIS_FOLLOW_SIBLING
Following sibling axis. |
protected static String |
AXIS_PREC_SIBLING
Preceding sibling axis. |
protected static String |
AXIS_PRIOR
The pseudo-axis of prior recognitions. |
protected static int |
MAXIMUM_PREFIX_LENGTH
The maximum length of prefixes and suffixes. |
protected static SequencedHashMap |
TOKEN_TYPE_PATTERNS
A sequenced map used by calculateValuesFromText(String, String, List) to determine the
"tokenType" value. |
Constructor Summary | |
DefaultRepresentation()
Creates a new instance based on the standard TIES configuration. |
|
DefaultRepresentation(int recogNum,
int detailedRecogs,
int numberOfAncestors,
int numberOfSiblings,
int splitMax,
String headElementName,
String headAttribName,
String[] defaultAttribs,
int n,
String outCharset)
Creates a new instance. |
|
DefaultRepresentation(TiesConfiguration config)
Creates a new instance based on the provided configuration. |
Method Summary | |
FeatureVector |
buildContext(Element element,
String leftText,
String mainText,
String rightText,
PriorRecognitions priorRecognitions,
Map featureCache)
Builds the context representation of text in an element. |
protected void |
buildFeatures(String axisName,
Element element,
ElementPosition position,
boolean recurseInsteadOfText,
LinkedList featureList,
boolean addAtEnd,
Map cache)
Builds the features of an element and appends them to the specified featureList. |
protected List |
buildLocalFeatures(Element element,
ElementPosition position,
boolean ignoreText)
Builds the local features of an element. |
protected List |
buildPrior(PriorRecognitions priorRecognitions)
Builds the pseudo-axis of prior recognitions. |
protected void |
buildTextFeatures(String axisName,
Element element,
String trimmedLeft,
String trimmedMain,
String trimmedRight,
LinkedList featureList)
Builds the context representation of text in an element, differentiating between three kinds of textual contents: a left part, a main part, and a right part. |
protected void |
calculateHeadValues(Element element,
List values)
Creates values that depend on "head" children of an element, if the element contains any of them. |
protected void |
calculatePositionalValues(String elementName,
ElementPosition position,
List values)
Calculates values that depend on the position of an element within its parent. |
protected void |
calculateValuesFromText(String elementName,
String trimmedText,
List values)
Calculates values that depend on the textual content of an element: prefixes, suffixes, length data, and token type. |
protected String |
determineHeadValue(Element element)
Helper method for determining the head value for an element of type getHeadElement(). |
protected String |
determineRoughPosition(int position,
int elementCount)
Helper method called by calculatePositionalValues(String, ElementPosition, List) to
collapse a position into one of five values:
first - for the first element
early - if position is within the first third of all elements (but not the first one), upper limit included
middle - if position is within the second third of all elements, limits excluded
late - if position is within the last third of all elements (but not the last one), lower limit included
last - for the last element
|
protected List |
filterRepresentation(FeatureVector originalRep)
Creates a filtered view of a context representation. |
int |
getAncestorNumber()
Returns the maximum number of ancestors to include in the context representation. |
Set |
getDefaultAttributes()
Returns the unmodifiable set of names of default attributes. |
int |
getDetailedRecognitions()
Returns the number of preceding recognitions to represent in detail. |
String |
getHeadAttribute()
Returns the name of the attribute to use for calculating head values. |
String |
getHeadElement()
Returns the name of the element to use for calculating head values. |
int |
getSiblingNumber()
Returns the basic number of preceding and following siblings to include in the context representation. |
int |
getSplitMaximum()
Returns the maximum number of subsequences to keep when a feature value must be split (at whitespace). |
int |
getStoreN()
Returns storeN: every storeN-th context representation is stored for debugging and inspection purposes if storeN > 0; otherwise no representation is stored. |
protected void |
handleAncestors(Element element,
int ancestorsToAdd,
int ancestorSiblingsToAdd,
LinkedList ancestorFeatures,
LinkedList ancestorSiblingFeatures,
Bag processedAncestorNames,
Map cache)
Handles ancestors and ancestor siblings of an element. |
protected ElementPosition |
handleSiblings(String axisPrefix,
Element element,
int baseNumber,
LinkedList precedingFeatures,
LinkedList followingFeatures,
Map cache)
Adds the preceding and following siblings of an element. |
protected void |
removeExtraMarkers(List features)
Modifies a list of GlobalFeatures to remove extraneous
FeatureType.MARKER features. |
protected List |
selectFollowingSiblings(Element mainElement,
LinkedList allFollowingSiblings,
int baseNumber)
Selects the siblings to keep among all following siblings. |
protected List |
selectPrecedingSiblings(Element mainElement,
LinkedList allPrecedingSiblings,
int baseNumber)
Selects the siblings to keep among all preceding siblings. |
String |
toString()
Returns a string representation of this object. |
Methods inherited from class de.fu_berlin.ties.context.Representation |
createRecognitionBuffer, getRecognitionNumber |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
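protected static final String AXIS_ANCESTOR
Ancestor axis.
protected static final String AXIS_DESC_OR_SELF
Descendant-or-self axis.
protected static final String AXIS_FOLLOW_SIBLING
Following sibling axis.
protected static final String AXIS_PREC_SIBLING
Preceding sibling axis.
protected static final String AXIS_PRIOR
The pseudo-axis of prior recognitions.
protected static final int MAXIMUM_PREFIX_LENGTH
The maximum length of prefixes and suffixes.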
protected static final SequencedHashMap TOKEN_TYPE_PATTERNS
A sequenced map used by calculateValuesFromText(String, String, List) to determine the "tokenType" value. Maps description strings to Patterns to match. The description string for the first matching pattern is used. Initialized in the static constructor. The value "mixed" is reserved for tokens that are not matched by any of the patterns.
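The first-match lookup described for TOKEN_TYPE_PATTERNS can be sketched as follows. This is only an illustration, not the TIES implementation: `LinkedHashMap` stands in for `SequencedHashMap`, the class name `TokenTypeSketch` is hypothetical, and the three patterns and their descriptions are assumptions made for the example (only the fallback value "mixed" comes from the documentation). Whether the real patterns must match the whole token is also an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch of an ordered "first matching pattern wins" lookup. */
final class TokenTypeSketch {
    // Insertion-ordered map: description string -> pattern (illustrative entries only).
    private static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();

    static {
        PATTERNS.put("digits", Pattern.compile("\\d+"));
        PATTERNS.put("lowercase", Pattern.compile("[a-z]+"));
        PATTERNS.put("capitalized", Pattern.compile("[A-Z][a-z]+"));
    }

    /** Returns the description of the first pattern matching the whole token, or "mixed". */
    static String tokenType(String token) {
        for (Map.Entry<String, Pattern> entry : PATTERNS.entrySet()) {
            if (entry.getValue().matcher(token).matches()) {
                return entry.getKey();
            }
        }
        return "mixed"; // reserved for tokens matched by none of the patterns
    }

    public static void main(String[] args) {
        System.out.println(tokenType("2024"));
        System.out.println(tokenType("Berlin"));
        System.out.println(tokenType("x86"));
    }
}
```

Because the map preserves insertion order, more specific patterns should be registered before more general ones.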
Constructor Detail |
public DefaultRepresentation()
Creates a new instance based on the standard TIES configuration.
public DefaultRepresentation(TiesConfiguration config)
Creates a new instance based on the provided configuration.
Parameters:
config - used to configure this instance
public DefaultRepresentation(int recogNum, int detailedRecogs, int numberOfAncestors, int numberOfSiblings, int splitMax, String headElementName, String headAttribName, String[] defaultAttribs, int n, String outCharset)
Creates a new instance (cf. DOMUtils.name(Element)).
Parameters:
recogNum - the number of preceding recognitions to represent
detailedRecogs - the number of preceding recognitions to represent in detail
numberOfAncestors - the maximum number of ancestors to include in the context representation
numberOfSiblings - the basic number of preceding and following siblings to include
splitMax - the maximum number of subsequences to keep when a feature value must be split (at whitespace)
headElementName - the name of the element to use for calculating head values (cf. calculateHeadValues(Element, List))
headAttribName - the name of the attribute to use for calculating head values
defaultAttribs - array of names of default attributes
n - every n-th context representation is stored if n > 0; otherwise no representation is stored
outCharset - the output character set to use (only used to store some configurations for inspection purposes, if n > 0); if null, the default charset of the current platform is used
Method Detail |
public FeatureVector buildContext(Element element, String leftText, String mainText, String rightText, PriorRecognitions priorRecognitions, Map featureCache) throws ClassCastException, IllegalArgumentException
Builds the context representation of text in an element.
buildContext in class Representation
Parameters:
element - the element whose context should be represented
leftText - textual content to the left of (preceding) mainText, might be empty
mainText - the main textual content to represent, might be empty
rightText - textual content to the right of (following) mainText, might be empty
priorRecognitions - a buffer of the last Recognitions from the document, created by calling Representation.createRecognitionBuffer()
featureCache - a cache of (local) features; it should be re-used between all calls for the nodes in a single document, but must not be re-used when building the context of nodes in different documents
Throws:
ClassCastException - if the priorRecognitions buffer contains objects that aren't Recognitions
IllegalArgumentException - if the specified node is of an unsupported type
protected void buildFeatures(String axisName, Element element, ElementPosition position, boolean recurseInsteadOfText, LinkedList featureList, boolean addAtEnd, Map cache)
Builds the features of an element and appends them to the specified featureList. Handles attributes and calculated values and the element itself. Child elements are only handled when recurseInsteadOfText is true; the axis name is not changed for child elements.
Parameters:
axisName - the name of the axis, used to start each feature
element - the element to process
position - wraps the position of the element within its parent element and related data, used for calculating positional features; if null, no positional features are calculated
recurseInsteadOfText - if true, child elements are recursively processed, but no values from text are calculated; otherwise text is processed but child elements are not
featureList - the list of GlobalFeatures to add features to
addAtEnd - whether to add the new features at the end or at the beginning of the featureList
cache - a cache of local features, mapping Elements to Lists of LocalFeatures
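The interplay of the cache, the axis name and the addAtEnd flag in buildFeatures can be sketched as below. This is a simplified stand-in, not the TIES code: features are plain strings, the cache is keyed by element name rather than by Element, and the class name, the `axis:local` string format and the `name=` feature are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: cached local features combined with an axis name. */
final class FeatureCacheSketch {
    /** Computes the local features of an element once and re-uses them afterwards. */
    static List<String> localFeatures(String elementName, Map<String, List<String>> cache) {
        return cache.computeIfAbsent(elementName, name -> {
            List<String> local = new ArrayList<>();
            local.add("name=" + name); // illustrative local feature
            return local;
        });
    }

    /** Turns local features into global ones by prefixing the axis name, then adds them. */
    static void addFeatures(String axisName, String elementName, LinkedList<String> featureList,
                            boolean addAtEnd, Map<String, List<String>> cache) {
        List<String> global = new ArrayList<>();
        for (String local : localFeatures(elementName, cache)) {
            global.add(axisName + ":" + local);
        }
        if (addAtEnd) {
            featureList.addAll(global);    // append
        } else {
            featureList.addAll(0, global); // prepend, keeping the order of the new features
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> cache = new HashMap<>();
        LinkedList<String> features = new LinkedList<>();
        addFeatures("anc", "p", features, true, cache);
        addFeatures("prec", "b", features, false, cache);
        System.out.println(features);
    }
}
```

Keeping the cached features free of the axis name is what allows the same cache entry to serve an element that appears on several axes.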
protected List buildLocalFeatures(Element element, ElementPosition position, boolean ignoreText)
Builds the local features of an element. LocalFeatures can be stored in a cache for re-use; they must be combined with an axis name for getting a global feature.
Parameters:
element - the element to process
position - wraps the position of the element within its parent element and related data, used for calculating positional features; if null, no positional features are calculated
ignoreText - if true, no values from text or head values are calculated
Returns:
a list of LocalFeatures
protected List buildPrior(PriorRecognitions priorRecognitions) throws ClassCastException
Builds the pseudo-axis of prior recognitions.
Parameters:
priorRecognitions - a buffer of the last Recognitions from the document, created by calling Representation.createRecognitionBuffer()
Returns:
a list of GlobalFeatures representing prior recognitions
Throws:
ClassCastException - if the priorRecognitions buffer contains objects that aren't Recognitions
protected void buildTextFeatures(String axisName, Element element, String trimmedLeft, String trimmedMain, String trimmedRight, LinkedList featureList)
Builds the context representation of text in an element, differentiating between three kinds of textual contents: a left part, a main part, and a right part.
Parameters:
axisName - the name of the axis, used to start each feature
element - the element whose context should be represented
trimmedLeft - trimmed textual content to the left of (preceding) trimmedMain, might be empty
trimmedMain - trimmed main textual content to represent, might be empty
trimmedRight - trimmed textual content to the right of (following) trimmedMain, might be empty
featureList - a list of GlobalFeatures to add the values to
protected void calculateHeadValues(Element element, List values)
Creates values that depend on "head" children of an element, if the element contains any of them.
For an element of type getHeadElement(), the value of the getHeadAttribute() or else the textual content of its right-most child element is stored (in a calculated value named "head"). Child elements are iterated from right to left until one containing an appropriate attribute or textual content is found (or none are left).
If the element has children of type getHeadElement(), the first and last child element of this type are recursively processed and the results stored in calculated values named "lhead" and "rhead". If there is only a single child of appropriate type, the "lhead" value is omitted.
If a value contains whitespace, only the final subsequence following all whitespace is preserved.
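The whitespace rule in the last sentence can be illustrated with a short sketch. The class and method names are hypothetical, and trimming the value before splitting is an assumption; the documentation only states that the final subsequence following all whitespace is kept.

```java
/** Hypothetical sketch of keeping only the text after the last whitespace. */
final class HeadValueSketch {
    /** Returns the final whitespace-separated part of the value (after trimming). */
    static String lastSubsequence(String value) {
        String[] parts = value.trim().split("\\s+");
        return parts[parts.length - 1];
    }

    public static void main(String[] args) {
        System.out.println(lastSubsequence("Freie Universitaet Berlin"));
    }
}
```

For example, `lastSubsequence("Freie Universitaet Berlin")` yields `Berlin`, while a value without whitespace is returned unchanged.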
Parameters:
element - the element to process
values - a list of LocalFeatures to add the calculated values to
protected void calculatePositionalValues(String elementName, ElementPosition position, List values)
Calculates values that depend on the position of an element within its parent.
Parameters:
elementName - the name of the element to process, as returned by DOMUtils.name(Element)
position - wraps the position of the element within its parent element and related data, must not be null
values - a list to add the calculated values to
protected void calculateValuesFromText(String elementName, String trimmedText, List values) throws IllegalArgumentException
Calculates values that depend on the textual content of an element: prefixes, suffixes, length data, and token type.
Parameters:
elementName - the name of the element to process, as returned by DOMUtils.name(Element)
trimmedText - the trimmed textual content of the element to process, must not be empty
values - a list of LocalFeatures to add the calculated values to
Throws:
IllegalArgumentException - if the empty string was given as trimmedText
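A rough sketch of the prefix, suffix and length values computed from text is shown below. It is an illustration under several assumptions: the maximum affix length of 4 is made up (the actual value of MAXIMUM_PREFIX_LENGTH is not given here), values are plain strings rather than LocalFeatures, and the names `prefix1`, `suffix1` and `length` are hypothetical. Token-type detection is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of text-derived values: prefixes, suffixes and length. */
final class TextValueSketch {
    static final int MAX_AFFIX_LENGTH = 4; // assumed value, for illustration only

    static List<String> affixesAndLength(String trimmedText) {
        if (trimmedText.isEmpty()) {
            throw new IllegalArgumentException("trimmedText must not be empty");
        }
        List<String> values = new ArrayList<>();
        int max = Math.min(MAX_AFFIX_LENGTH, trimmedText.length());
        for (int len = 1; len <= max; len++) {
            values.add("prefix" + len + "=" + trimmedText.substring(0, len));
            values.add("suffix" + len + "=" + trimmedText.substring(trimmedText.length() - len));
        }
        values.add("length=" + trimmedText.length());
        return values;
    }

    public static void main(String[] args) {
        System.out.println(affixesAndLength("Berlin"));
    }
}
```

As in the documented method, the empty string is rejected with an IllegalArgumentException instead of producing empty affixes.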
protected String determineHeadValue(Element element)
Helper method for determining the head value for an element of type getHeadElement(). See calculateHeadValues(Element, List) for a description of the algorithm.
Parameters:
element - the element to process, must be of type getHeadElement()
Returns:
getHeadAttribute() nor textual content)
protected String determineRoughPosition(int position, int elementCount)
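Helper method called by calculatePositionalValues(String, ElementPosition, List) to collapse a position into one of five values:
first - for the first element
early - if position is within the first third of all elements (but not the first one), upper limit included
middle - if position is within the second third of all elements, limits excluded
late - if position is within the last third of all elements (but not the last one), lower limit included
last - for the last element
Parameters:
position - the position counted from 0, should be non-negative and smaller than elementCount (otherwise the results are undefined)
elementCount - the number of elements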
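The five rough positions listed for determineRoughPosition can be sketched as follows. The exact boundary arithmetic is not documented here, so this version assumes that the thirds are measured over the index range 0 to elementCount - 1; the class name is hypothetical, and the real method may place the boundaries differently.

```java
/** Hypothetical sketch of collapsing a position into one of five rough values. */
final class RoughPositionSketch {
    static String roughPosition(int position, int elementCount) {
        int last = elementCount - 1;
        if (position == 0) {
            return "first";
        }
        if (position == last) {
            return "last";
        }
        if (3 * position <= last) {      // first third, upper limit included
            return "early";
        }
        if (3 * position >= 2 * last) {  // last third, lower limit included
            return "late";
        }
        return "middle";                 // second third, limits excluded
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            System.out.println(i + " -> " + roughPosition(i, 10));
        }
    }
}
```

Integer comparisons such as `3 * position <= last` avoid floating-point division when testing the one-third and two-thirds limits.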
protected List filterRepresentation(FeatureVector originalRep)
Creates a filtered view of a context representation. Features representing markers (FeatureType.MARKER), stand-alone elements (FeatureType.ELEMENT) and default attributes (getDefaultAttributes()) are included in all filtered representations. Comment-only features are ignored.
Parameters:
originalRep - a feature vector containing the representation to filter
Returns:
Features combining the filtered representations created by this method
public int getAncestorNumber()
Returns the maximum number of ancestors to include in the context representation.
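public Set getDefaultAttributes()
Returns the unmodifiable set of names of default attributes (cf. DOMUtils.name(Attribute)).
public int getDetailedRecognitions()
Returns the number of preceding recognitions to represent in detail.
public String getHeadAttribute()
Returns the name of the attribute to use for calculating head values (cf. calculateHeadValues(Element, List)).
public String getHeadElement()
Returns the name of the element to use for calculating head values (cf. calculateHeadValues(Element, List)).
public int getSiblingNumber()
Returns the basic number of preceding and following siblings to include in the context representation.
public int getSplitMaximum()
Returns the maximum number of subsequences to keep when a feature value must be split (at whitespace).
public int getStoreN()
Returns storeN: every storeN-th context representation is stored for debugging and inspection purposes if storeN > 0; otherwise no representation is stored.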
protected void handleAncestors(Element element, int ancestorsToAdd, int ancestorSiblingsToAdd, LinkedList ancestorFeatures, LinkedList ancestorSiblingFeatures, Bag processedAncestorNames, Map cache) throws IllegalArgumentException
Handles ancestors and ancestor siblings of an element.
Parameters:
element - the element to process
ancestorsToAdd - the number of ancestors to add, must be > 0; if > 1, this method calls itself recursively, decreasing the number by 1
ancestorSiblingsToAdd - the number of preceding/following siblings of the current ancestors to add; if 0 or negative, no siblings are added; a recursive call decreases this parameter by 1 if any ancestor siblings were found (if no siblings were found, the number passes unchanged)
ancestorFeatures - the list of GlobalFeatures to prepend the features on the ancestors to
ancestorSiblingFeatures - the list of GlobalFeatures to append the features on the ancestors' siblings to
processedAncestorNames - a bag that will typically be empty when first calling this method (will be filled by recursive calls)
cache - a cache of local features, mapping Elements to Lists of LocalFeatures
Throws:
IllegalArgumentException - if ancestorsToAdd is 0 or negative
protected ElementPosition handleSiblings(String axisPrefix, Element element, int baseNumber, LinkedList precedingFeatures, LinkedList followingFeatures, Map cache)
Adds the preceding and following siblings of an element.
Parameters:
axisPrefix - the prefix of the axis name, used to start each feature; specify the empty string if no prefix should be used
element - the element to process
baseNumber - the basic number of siblings to keep; the actual number of siblings kept might vary
precedingFeatures - the list of GlobalFeatures to prepend the features on the preceding siblings to
followingFeatures - the list of GlobalFeatures to append the features on the following siblings to
cache - a cache of local features, mapping Elements to Lists of LocalFeatures
Returns:
calculatePositionalValues(String, ElementPosition, List), or null if there is no parent element
protected void removeExtraMarkers(List features)
Modifies a list of GlobalFeatures to remove extraneous FeatureType.MARKER features. Keeps only the last one of several sequential marker features; trailing marker features are removed as well.
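The marker cleanup described above can be sketched on a list of strings. This is an illustration only: the string "|" is a hypothetical stand-in for a feature of type FeatureType.MARKER, and the class name is made up; the real method works on GlobalFeatures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of removing redundant and trailing marker features. */
final class MarkerCleanupSketch {
    static final String MARKER = "|"; // stand-in for a FeatureType.MARKER feature

    static void removeExtraMarkers(List<String> features) {
        // Walk backwards so that removals do not shift the indices still to be visited.
        // The end of the list is treated like a marker, which removes trailing markers.
        boolean followedByMarkerOrEnd = true;
        for (int i = features.size() - 1; i >= 0; i--) {
            boolean isMarker = MARKER.equals(features.get(i));
            if (isMarker && followedByMarkerOrEnd) {
                features.remove(i); // an earlier marker of a run, or a trailing marker
            } else {
                followedByMarkerOrEnd = isMarker; // the last marker of a run is kept
            }
        }
    }

    public static void main(String[] args) {
        List<String> features = new ArrayList<>(Arrays.asList("a", "|", "|", "b", "|", "|"));
        removeExtraMarkers(features);
        System.out.println(features);
    }
}
```

Leading markers are left untouched in this sketch, since the description only mentions sequential and trailing ones.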
Parameters:
features - the list of features to modify
protected List selectFollowingSiblings(Element mainElement, LinkedList allFollowingSiblings, int baseNumber)
Selects the siblings to keep among all following siblings. Keeps the baseNumber first siblings. One of the selected siblings may have a different type (name as returned by DOMUtils.name(Element)) than the main element; if there are more with different types, they are skipped.
Parameters:
mainElement - the element whose siblings should be selected
allFollowingSiblings - the list of all following siblings
baseNumber - the basic number of siblings to keep; the actual number of siblings kept might vary
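One possible reading of the selection rule for following siblings is sketched below. Element names stand in for elements, and the class name is hypothetical. The sketch assumes that only the first baseNumber siblings are examined and that the different-typed sibling kept is the first one encountered; the documentation does not settle either point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of selecting following siblings by element name. */
final class FollowingSiblingSketch {
    static List<String> selectFollowing(String mainName, List<String> allFollowing, int baseNumber) {
        List<String> kept = new ArrayList<>();
        boolean differentKept = false;
        int window = Math.min(Math.max(baseNumber, 0), allFollowing.size());
        for (String name : allFollowing.subList(0, window)) {
            if (name.equals(mainName)) {
                kept.add(name);
            } else if (!differentKept) {
                kept.add(name);       // at most one sibling of a different type is kept
                differentKept = true;
            }                         // further siblings of a different type are skipped
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(selectFollowing("td", Arrays.asList("td", "th", "tr", "td", "td"), 4));
    }
}
```

Under these assumptions fewer than baseNumber siblings can be returned, which is consistent with the note that the actual number kept might vary.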
protected List selectPrecedingSiblings(Element mainElement, LinkedList allPrecedingSiblings, int baseNumber)
Selects the siblings to keep among all preceding siblings. Keeps the baseNumber last siblings. One of the last siblings may have a different type (name as returned by DOMUtils.name(Element)) than the main element; if there are more with different types, they are skipped. If none of the baseNumber last siblings has a different type, the last sibling with a different type (name) is also kept.
Parameters:
mainElement - the element whose siblings should be selected
allPrecedingSiblings - the list of all preceding siblings
baseNumber - the basic number of siblings to keep; the actual number of siblings kept might vary
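The additional rule for preceding siblings (also keeping the last sibling of a different type when none falls within the baseNumber last siblings) can be sketched as follows. As before, element names stand in for elements and the class name is hypothetical. The sketch assumes that the different-typed sibling kept inside the window is the one nearest to the main element.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

/** Hypothetical sketch of selecting preceding siblings by element name. */
final class PrecedingSiblingSketch {
    static List<String> selectPreceding(String mainName, List<String> allPreceding, int baseNumber) {
        int start = Math.max(0, allPreceding.size() - Math.max(baseNumber, 0));
        LinkedList<String> kept = new LinkedList<>();
        boolean differentKept = false;
        // Walk from the nearest sibling backwards through the baseNumber last siblings.
        for (int i = allPreceding.size() - 1; i >= start; i--) {
            String name = allPreceding.get(i);
            if (name.equals(mainName)) {
                kept.addFirst(name);
            } else if (!differentKept) {
                kept.addFirst(name);
                differentKept = true;
            }
        }
        // No different type inside the window: also keep the last earlier one, if any.
        for (int i = start - 1; i >= 0 && !differentKept; i--) {
            if (!allPreceding.get(i).equals(mainName)) {
                kept.addFirst(allPreceding.get(i));
                differentKept = true;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(selectPreceding("td", Arrays.asList("th", "td", "td", "td"), 2));
    }
}
```

In the usage above the two nearest `td` siblings are kept together with the earlier `th`, so the result holds baseNumber + 1 entries.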
public String toString()
Returns a string representation of this object.
toString in class Representation