java.lang.Object
  de.fu_berlin.ties.text.TextTokenizer
Splits a text into a sequence of tokens.
This class is not thread-safe: to share a tokenizer between threads, you must ensure adequate synchronization.
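The tokenization model can be illustrated with plain java.util.regex. The following is a minimal sketch, not the actual TextTokenizer implementation: it assumes that the token patterns are joined as alternatives (the documentation only states that they are joined and compiled with Pattern.DOTALL), and the class GapTokenizerSketch and its method are hypothetical names. It shows how any text not matched by a token pattern ends up as the "whitespace" between tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not part of de.fu_berlin.ties.
class GapTokenizerSketch {

    /**
     * Returns alternating entries: the unmatched text ("whitespace")
     * preceding a token, followed by the token itself. The final entry is
     * the unmatched text between the last token and the end of the text.
     */
    static List<String> gapsAndTokens(String[] patterns, CharSequence text) {
        // Assumption: the patterns are joined as alternatives.
        Pattern joined = Pattern.compile(
            "(?:" + String.join(")|(?:", patterns) + ")", Pattern.DOTALL);
        Matcher matcher = joined.matcher(text);
        List<String> result = new ArrayList<>();
        int previousEnd = 0;
        while (matcher.find()) {
            // Everything between the previous match and this one is a gap.
            result.add(text.subSequence(previousEnd, matcher.start()).toString());
            result.add(matcher.group());
            previousEnd = matcher.end();
        }
        result.add(text.subSequence(previousEnd, text.length()).toString());
        return result;
    }
}
```

In this sketch an empty gap corresponds to the case where hasPrecedingWhitespace() would return false for the token that follows it.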
Constructor Summary | |
TextTokenizer(String[] patterns,
String whitespacePattern,
CharSequence text)
Creates a new instance. |
Method Summary | |
String |
capturedText()
Returns the text captured within "capturing groups" in the last token. |
String |
getNormalizedWhitespace()
Returns the normalized whitespace representation that is prepended if isNormalizedWhitespacePrepended() is true. |
boolean |
hasPrecedingWhitespace()
Whether the token returned by the last call to nextToken() is preceded by whitespace (i.e., text not matched by any token). |
int |
initialWhitespaceCount(String text)
Convenience method that counts the number of whitespace characters at the beginning of a string, according to the defined whitespace pattern. |
boolean |
isNormalizedWhitespacePrepended()
Returns whether whitespace is prepended in a normalized form (getNormalizedWhitespace()) to those tokens where hasPrecedingWhitespace() would return true. |
boolean |
isValidWhitespace(String text)
Convenience method that checks whether a string matches the defined whitespace pattern. |
boolean |
isWhitespacePatternEnsured()
Whether whitespace (the text between patterns) is checked to ensure that the defined whitespace pattern is matched. |
CharSequence |
leftText()
Returns the complete text to the left of (preceding) the token returned by the last call to nextToken(). |
String |
nextToken()
Returns the next token, or null if there are no more tokens left in the provided text. |
String |
precedingWhitespace()
Returns the whitespace (i.e., text not matched by any token) preceding the token returned by the last call to nextToken(). |
boolean |
precedingWhitespaceIsValid()
Checks whether the whitespace (i.e., text not matched by any token) preceding the token returned by the last call to nextToken() matches the defined whitespace pattern. |
void |
reset()
Resets this tokenizer, so it will restart at the beginning of the current text. |
void |
reset(CharSequence newText)
Resets this tokenizer, so it will restart at the beginning of the provided text. |
CharSequence |
rightText()
Returns the complete text to the right of (following) the token returned by the last call to nextToken(). |
void |
setNormalizedWhitespace(String newValue)
Changes the normalized whitespace representation that is prepended if isNormalizedWhitespacePrepended() is true. |
void |
setNormalizedWhitespacePrepended(boolean newValue)
Changes whether whitespace is prepended in a normalized form (getNormalizedWhitespace()) to those tokens where hasPrecedingWhitespace() would return true. |
void |
setWhitespacePatternEnsured(boolean ensured)
Specifies whether whitespace (the text between patterns) is checked to ensure that the defined whitespace pattern is matched. |
String |
toString()
Returns a string representation of this object. |
int |
trailingWhitespaceCount(String text)
Convenience method that counts the number of whitespace characters at the end of a string, according to the defined whitespace pattern. |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Constructor Detail |
public TextTokenizer(String[] patterns, String whitespacePattern, CharSequence text) throws PatternSyntaxException
Creates a new instance. See also TokenizerFactory.
Parameters:
patterns - a list of patterns to accept as tokens; the patterns are joined and compiled with the Pattern.DOTALL flag activated
whitespacePattern - a pattern that should match all text between tokens ("whitespace"), to ensure that no text is left out by mistake; the pattern is compiled with the Pattern.DOTALL flag activated
text - the text to tokenize
Throws:
PatternSyntaxException - if the syntax of the provided patterns is invalid
Method Detail |
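public final String capturedText()
Returns the text captured within "capturing groups" in the last token.
public final String getNormalizedWhitespace()
Returns the normalized whitespace representation that is prepended if isNormalizedWhitespacePrepended() is true. Defaults to a space character.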
public final boolean hasPrecedingWhitespace() throws IllegalStateException, IllegalArgumentException
Whether the token returned by the last call to nextToken() is preceded by whitespace (i.e., text not matched by any token). If we arrived at the end of the text to tokenize (the last call to nextToken() returned null), this refers to the whitespace between the last existing token and the end of the text.
Throws:
IllegalStateException - if this method is called without a prior call to nextToken()
IllegalArgumentException - if isWhitespacePatternEnsured() is true and the whitespace preceding the last read token does not match the defined whitespace pattern
public int initialWhitespaceCount(String text)
Convenience method that counts the number of whitespace characters at the beginning of a string, according to the defined whitespace pattern.
Parameters:
text - the text to check
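public final boolean isNormalizedWhitespacePrepended()
Returns whether whitespace is prepended in a normalized form (getNormalizedWhitespace()) to those tokens where hasPrecedingWhitespace() would return true. Defaults to false.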
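The effect of prepending normalized whitespace can be sketched independently of TextTokenizer. The example below is only an illustration built on java.util.regex; the class NormalizedWhitespaceSketch, its method, and its parameters are hypothetical. It shows that a token preceded by any amount of unmatched text receives a single normalized prefix, while a token that directly follows another token is returned unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not part of de.fu_berlin.ties.
class NormalizedWhitespaceSketch {

    static List<String> tokens(Pattern tokenPattern, CharSequence text,
            boolean prependNormalized, String normalizedWhitespace) {
        Matcher matcher = tokenPattern.matcher(text);
        List<String> result = new ArrayList<>();
        int previousEnd = 0;
        while (matcher.find()) {
            // Text between the previous token and this one counts as whitespace.
            boolean hasPrecedingWhitespace = matcher.start() > previousEnd;
            if (prependNormalized && hasPrecedingWhitespace) {
                result.add(normalizedWhitespace + matcher.group());
            } else {
                result.add(matcher.group());
            }
            previousEnd = matcher.end();
        }
        return result;
    }
}
```

With the token pattern `\w+|[.,!?]` and a single space as the normalized form, the input "Hi,\n\t there !" would yield the tokens "Hi", ",", " there" and " !" in this sketch.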
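public boolean isValidWhitespace(String text)
Convenience method that checks whether a string matches the defined whitespace pattern.
Parameters:
text - the text to match
Returns:
true iff the given text matches the defined whitespace pattern or is the empty string
public final boolean isWhitespacePatternEnsured()
Whether whitespace (the text between patterns) is checked to ensure that the defined whitespace pattern is matched. If it is, hasPrecedingWhitespace() or precedingWhitespace() will throw an IllegalArgumentException if the whitespace preceding the last read token does not match.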
public CharSequence leftText() throws IllegalStateException
Returns the complete text to the left of (preceding) the token returned by the last call to nextToken(). This includes any precedingWhitespace().
Throws:
IllegalStateException - if this method is called without a prior call to nextToken()
public final String nextToken() throws IllegalArgumentException
Returns the next token, or null if there are no more tokens left in the provided text. Once the tokenizer has arrived at the end of the text, all subsequent calls to this method will return null until you call one of the reset() methods. If the token is preceded by whitespace and isNormalizedWhitespacePrepended() is true, the returned token will start with the normalized whitespace representation (getNormalizedWhitespace()).
Returns:
the next token, or null if no tokens are left
Throws:
IllegalArgumentException - if isWhitespacePatternEnsured() and isNormalizedWhitespacePrepended() are true and the whitespace preceding this token does not match the defined whitespace pattern
public final String precedingWhitespace() throws IllegalStateException, IllegalArgumentException
Returns the whitespace (i.e., text not matched by any token) preceding the token returned by the last call to nextToken(). If we arrived at the end of the text to tokenize (the last call to nextToken() returned null), this is the whitespace between the last existing token and the end of the text.
Returns:
the preceding whitespace (if there is none, hasPrecedingWhitespace() would return false)
Throws:
IllegalStateException - if this method is called without a prior call to nextToken()
IllegalArgumentException - if isWhitespacePatternEnsured() is true and the whitespace preceding the last read token does not match the defined whitespace pattern
public boolean precedingWhitespaceIsValid() throws IllegalStateException
Checks whether the whitespace (i.e., text not matched by any token) preceding the token returned by the last call to nextToken() matches the defined whitespace pattern. This method is called automatically if isWhitespacePatternEnsured() is true. Otherwise it can be called externally to check whether the whitespace is valid and take appropriate action if required.
Returns:
true iff the preceding whitespace matches the specified whitespace pattern or if there is no preceding whitespace
Throws:
IllegalStateException - if this method is called without a prior call to nextToken()
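The validity rule stated above (empty text is always valid, otherwise the text must match the whitespace pattern) can be sketched as follows. This is an assumption-laden illustration rather than the class's actual code: it assumes that "matches" means the whole gap must match the pattern, and the class WhitespaceValiditySketch with its methods isValid and ensureValid is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; not part of de.fu_berlin.ties.
class WhitespaceValiditySketch {
    private final Pattern whitespace;

    WhitespaceValiditySketch(String whitespacePattern) {
        // The documentation states that the pattern is compiled with DOTALL.
        this.whitespace = Pattern.compile(whitespacePattern, Pattern.DOTALL);
    }

    /** True iff the gap is empty or matches the whitespace pattern entirely. */
    boolean isValid(CharSequence gap) {
        return gap.length() == 0 || whitespace.matcher(gap).matches();
    }

    /** Mirrors the "ensured" mode: reject gaps that are not valid whitespace. */
    void ensureValid(CharSequence gap) {
        if (!isValid(gap)) {
            throw new IllegalArgumentException(
                "Text between tokens is not valid whitespace: '" + gap + "'");
        }
    }
}
```

A caller that leaves the "ensured" mode switched off could use a check like isValid to log or skip unexpected text between tokens instead of failing with an exception.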
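public final void reset()
Resets this tokenizer, so it will restart at the beginning of the current text.
public final void reset(CharSequence newText)
Resets this tokenizer, so it will restart at the beginning of the provided text.
Parameters:
newText - the new text to tokenize
public CharSequence rightText() throws IllegalStateException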
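Returns the complete text to the right of (following) the token returned by the last call to nextToken(). This includes any following whitespace.
Throws:
IllegalStateException - if this method is called without a prior call to nextToken()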
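The relationship between a token and its left and right context can be sketched with match offsets. The example is an illustration only, under the assumption that the contexts are plain slices of the tokenized text; the class TokenContextSketch and its method contextOfToken are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not part of de.fu_berlin.ties.
class TokenContextSketch {

    /**
     * Returns {left text, token, right text} for the token at the given
     * zero-based index, or null if the text contains fewer tokens.
     */
    static String[] contextOfToken(Pattern tokenPattern, CharSequence text, int index) {
        Matcher matcher = tokenPattern.matcher(text);
        for (int i = 0; i <= index; i++) {
            if (!matcher.find()) {
                return null; // fewer than index + 1 tokens
            }
        }
        return new String[] {
            // Left context: everything before the token, including preceding whitespace.
            text.subSequence(0, matcher.start()).toString(),
            matcher.group(),
            // Right context: everything after the token, including following whitespace.
            text.subSequence(matcher.end(), text.length()).toString()
        };
    }
}
```

Concatenating the three parts always reproduces the original text in this sketch, which is a convenient sanity check when working with token contexts.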
public final void setNormalizedWhitespace(String newValue)
Changes the normalized whitespace representation that is prepended if isNormalizedWhitespacePrepended() is true.
Parameters:
newValue - the new value
public final void setNormalizedWhitespacePrepended(boolean newValue)
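Changes whether whitespace is prepended in a normalized form (getNormalizedWhitespace()) to those tokens where hasPrecedingWhitespace() would return true.
Parameters:
newValue - the new value
public final void setWhitespacePatternEnsured(boolean ensured)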
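Specifies whether whitespace (the text between patterns) is checked to ensure that the defined whitespace pattern is matched. If it is, hasPrecedingWhitespace() or precedingWhitespace() will throw an IllegalArgumentException if the whitespace preceding the last read token does not match.
Parameters:
ensured - the new value of this property
public String toString()
Returns a string representation of this object.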
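public int trailingWhitespaceCount(String text)
Convenience method that counts the number of whitespace characters at the end of a string, according to the defined whitespace pattern.
Parameters:
text - the text to check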
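Counting whitespace at the beginning or end of a string, as initialWhitespaceCount and trailingWhitespaceCount do, can be approximated by anchoring the whitespace pattern. The sketch below is one possible approach and not necessarily how TextTokenizer implements these methods; the class WhitespaceCountSketch and its method names are hypothetical, and counts are measured in char units.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not part of de.fu_berlin.ties.
class WhitespaceCountSketch {
    private final Pattern leading;
    private final Pattern trailing;

    WhitespaceCountSketch(String whitespacePattern) {
        // \A and \z anchor the whitespace pattern to the start and end of the input.
        leading = Pattern.compile("\\A(?:" + whitespacePattern + ")", Pattern.DOTALL);
        trailing = Pattern.compile("(?:" + whitespacePattern + ")\\z", Pattern.DOTALL);
    }

    /** Number of chars at the start of the text that match the whitespace pattern. */
    int initialCount(String text) {
        Matcher m = leading.matcher(text);
        return m.find() ? m.end() - m.start() : 0;
    }

    /** Number of chars at the end of the text that match the whitespace pattern. */
    int trailingCount(String text) {
        Matcher m = trailing.matcher(text);
        return m.find() ? m.end() - m.start() : 0;
    }
}
```

Because find() reports the leftmost match that reaches the end of the input, the trailing count covers the longest run of whitespace at the end when the pattern is a simple repetition such as `\s+`.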