de.fu_berlin.ties.text
Class TextUtils

java.lang.Object
  extended by de.fu_berlin.ties.text.TextUtils

public final class TextUtils
extends Object

A static class that provides utility constants and methods for working with texts and regular expressions. No instances of this class can be created; only the static members should be used.

Version:
$Revision: 1.17 $, $Date: 2006/10/21 16:04:25 $, $Author: siefkes $
Author:
Christian Siefkes

Field Summary
static String LINE_SEPARATOR
          The line separator on the current operating system ("\n" on Unix).
static String NEWLINE_ALTERNATIVES
          Regex fragment listing the newline alternatives used by different systems: "\r\n" (Windows), "\n" (Unix) or "\r" (Mac).
static Pattern NEWLINE_PATTERN
          A regular expression matching a single newline (built by enclosing NEWLINE_ALTERNATIVES in a non-capturing group).
static Pattern NEWLINE_TAB_PATTERN
          A regular expression matching a single newline or a tab character.
static Pattern NEWLINES_PATTERN
          A regular expression matching newlines, including surrounding whitespace.
static Pattern PUNCTUATION_PATTERN
          A simple regular expression for strings that contain only punctuation characters.
static Pattern PUNCTUATION_SYMBOL_PATTERN
          A simple regular expression for strings that contain only punctuation and symbol characters.
static Pattern SINGLE_LINE_WS
          A regular expression matching a non-line-breaking whitespace character (character class containing space and tab).
static Pattern WHITESPACE_PATTERN
          A simple regular expression for whitespace.
 
Method Summary
static String captureAlternatives(String[] alternatives, boolean quote)
          Helper method for building a regular expression Pattern by combining several alternatives in a capturing group.
static int countFirst(String str, char ch)
          Counts how often a character is repeated at the beginning of a string.
static int countLast(String str, char ch)
          Counts how often a character is repeated at the end of a string.
static void ensurePrintableName(String string)
          Checks that a string is a printable name, meaning it has at least one character and does not contain any whitespace.
static String joinAlternatives(String[] alternatives)
          Helper method for building a regular expression Pattern by combining several alternatives.
static String multipleReplaceAll(CharSequence input, Map replacements)
          Performs multiple replace-all operations on a text.
static String normalize(String input)
          Normalizes the whitespace in a string, replacing all internal whitespace sequences with a single space character and trimming any leading and trailing whitespace.
static boolean punctuation(CharSequence text)
          Checks whether a string contains only punctuation characters.
static boolean punctuationOrSymbol(CharSequence text)
          Checks whether a string contains only punctuation and symbol characters.
static String replaceAll(String input, Matcher matcher, String replacement)
          Replaces each substring of the input matched by the given pattern matcher with the given replacement.
static String replaceAll(String input, Pattern pattern, String replacement)
          Replaces each substring of the input that matches the given Pattern with the given replacement.
static String shorten(String input)
          Delegates to shorten(String, int, int), showing up to 24 characters at the start and the end of the shortened string.
static String shorten(String input, int numChars)
          Delegates to shorten(String, int, int), using the same number of characters at the start and the end of the shortened string.
static String shorten(String input, int startChars, int endChars)
          Shortens a string, inserting an ellipsis ("...") in the middle if the string is too long.
static String[] splitLines(CharSequence input)
          Splits a text into an array of lines.
static String[] splitLinesExact(CharSequence input)
          Splits a text into an array of lines, without trimming lines or discarding empty lines.
static String[] splitString(CharSequence input)
          Splits a string around whitespace.
static String[] splitString(CharSequence input, int splitMaximum)
          Splits a string around whitespace.
static String[] splitString(CharSequence input, Pattern whitespacePattern, int splitMaximum)
          Splits a string around whitespace.
static String weaklyNormalize(String input)
          Weakly normalizes the whitespace in a string, by replacing each whitespace element (space, tab, newline) with a space character.
static void writeln(Writer writer, String text)
          Convenience method that writes a text to a writer and appends the line separator.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LINE_SEPARATOR

public static final String LINE_SEPARATOR
The line separator on the current operating system ("\n" on Unix).


NEWLINE_ALTERNATIVES

public static final String NEWLINE_ALTERNATIVES
Regex fragment listing the newline alternatives used by different systems: "\r\n" (Windows), "\n" (Unix) or "\r" (Mac).

See Also:
Constant Field Values

SINGLE_LINE_WS

public static final Pattern SINGLE_LINE_WS
A regular expression matching a non-line-breaking whitespace character (character class containing space and tab).


NEWLINE_PATTERN

public static final Pattern NEWLINE_PATTERN
A regular expression matching a single newline (built by enclosing NEWLINE_ALTERNATIVES in a non-capturing group).
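
The construction described above might look roughly like the following sketch. It assumes the fragment is the plain alternation of the three listed line endings; the actual constant value is not shown on this page, and the class name and the countNewlines helper are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the actual TextUtils source.
final class NewlinePatternSketch {
    // Assumed fragment: "\r\n" is listed first so that a Windows line
    // ending is matched as one newline rather than as two.
    static final String NEWLINE_ALTERNATIVES = "\r\n|\n|\r";

    // Enclose the fragment in a non-capturing group.
    static final Pattern NEWLINE_PATTERN =
        Pattern.compile("(?:" + NEWLINE_ALTERNATIVES + ")");

    // Hypothetical helper: counts the single newlines found in a text.
    static int countNewlines(CharSequence text) {
        Matcher matcher = NEWLINE_PATTERN.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // One Windows, one Unix and one Mac line ending: prints 3
        System.out.println(countNewlines("a\r\nb\nc\rd"));
    }
}
```

Wrapping the fragment in a non-capturing group lets it be embedded in larger expressions without shifting the group numbers of the surrounding pattern.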


NEWLINE_TAB_PATTERN

public static final Pattern NEWLINE_TAB_PATTERN
A regular expression matching a single newline or a tab character.


NEWLINES_PATTERN

public static final Pattern NEWLINES_PATTERN
A regular expression matching newlines, including surrounding whitespace. Will match several newlines if they immediately follow each other or are separated by whitespace only.


PUNCTUATION_PATTERN

public static final Pattern PUNCTUATION_PATTERN
A simple regular expression for strings that contain only punctuation characters.


PUNCTUATION_SYMBOL_PATTERN

public static final Pattern PUNCTUATION_SYMBOL_PATTERN
A simple regular expression for strings that contain only punctuation and symbol characters.


WHITESPACE_PATTERN

public static final Pattern WHITESPACE_PATTERN
A simple regular expression for whitespace.

Method Detail

captureAlternatives

public static String captureAlternatives(String[] alternatives,
                                         boolean quote)
Helper method for building a regular expression Pattern by combining several alternatives in a capturing group.

Parameters:
alternatives - the alternatives to combine
quote - whether to quote the alternatives using the Pattern.quote(java.lang.String) method
Returns:
a pattern string containing the joined alternatives within a capturing group
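
As an illustration only, the behavior described for this method could be sketched as follows. The class name is hypothetical and the body is an assumption derived from the description, not the library's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the documented behavior.
final class CaptureAlternativesSketch {
    static String captureAlternatives(String[] alternatives, boolean quote) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < alternatives.length; i++) {
            if (i > 0) {
                sb.append('|');
            }
            // Pattern.quote turns the alternative into a literal pattern
            sb.append(quote ? Pattern.quote(alternatives[i]) : alternatives[i]);
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        String regex = captureAlternatives(new String[] {"a.b", "c"}, true);
        Matcher matcher = Pattern.compile(regex).matcher("xa.by");
        if (matcher.find()) {
            // group(1) holds whichever alternative matched
            System.out.println(matcher.group(1));
        }
    }
}
```

With quote set to true, the dot in "a.b" only matches a literal dot, so an input such as "aXb" would not match.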

countFirst

public static int countFirst(String str,
                             char ch)
Counts how often a character is repeated at the beginning of a string.

Parameters:
str - the string to check
ch - the character to count
Returns:
how often the character is repeated at the beginning of the string (0 if the string starts with another character or is empty)

countLast

public static int countLast(String str,
                            char ch)
Counts how often a character is repeated at the end of a string.

Parameters:
str - the string to check
ch - the character to count
Returns:
how often the character is repeated at the end of the string (0 if the string ends with another character or is empty)
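
The counting described for countFirst and countLast can be sketched as below. This is a minimal illustration under a hypothetical class name, not the actual implementation.

```java
// Hypothetical sketch of the documented counting behavior.
final class CountRepeatsSketch {
    // Number of consecutive occurrences of ch at the start of str.
    static int countFirst(String str, char ch) {
        int count = 0;
        while (count < str.length() && str.charAt(count) == ch) {
            count++;
        }
        return count;
    }

    // Number of consecutive occurrences of ch at the end of str.
    static int countLast(String str, char ch) {
        int count = 0;
        while (count < str.length()
                && str.charAt(str.length() - 1 - count) == ch) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countFirst("--a-", '-')); // prints 2
        System.out.println(countLast("--a-", '-'));  // prints 1
    }
}
```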

ensurePrintableName

public static void ensurePrintableName(String string)
                                throws IllegalArgumentException
Checks that a string is a printable name, meaning it has at least one character and does not contain any whitespace.

Parameters:
string - the string to check
Throws:
IllegalArgumentException - if the given string is null or empty or contains whitespace
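
A check of this kind might be written as in the sketch below. It assumes that "whitespace" means whatever Character.isWhitespace accepts, which this page does not specify; the class name and the isPrintableName helper are hypothetical.

```java
// Hypothetical sketch of the documented check.
final class PrintableNameSketch {
    static void ensurePrintableName(String string) {
        if (string == null || string.isEmpty()) {
            throw new IllegalArgumentException("Name is null or empty");
        }
        for (int i = 0; i < string.length(); i++) {
            // Assumption: Character.isWhitespace defines "whitespace" here
            if (Character.isWhitespace(string.charAt(i))) {
                throw new IllegalArgumentException(
                    "Name contains whitespace: '" + string + "'");
            }
        }
    }

    // Hypothetical helper that turns the check into a boolean.
    static boolean isPrintableName(String string) {
        try {
            ensurePrintableName(string);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isPrintableName("token"));     // prints true
        System.out.println(isPrintableName("two words")); // prints false
    }
}
```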

joinAlternatives

public static String joinAlternatives(String[] alternatives)
Helper method for building a regular expression Pattern by combining several alternatives.

Parameters:
alternatives - the alternatives to combine
Returns:
a pattern string containing the joined alternatives; two or more alternatives are combined in a non-capturing group; a single alternative is just returned as is; if the array is empty, an empty string is returned
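
The three documented cases (empty array, single alternative, several alternatives) can be sketched as follows. The class name is hypothetical and the body only mirrors the description above.

```java
// Hypothetical sketch of the three documented cases.
final class JoinAlternativesSketch {
    static String joinAlternatives(String[] alternatives) {
        if (alternatives.length == 0) {
            return "";                 // empty array: empty string
        }
        if (alternatives.length == 1) {
            return alternatives[0];    // single alternative: returned as is
        }
        // two or more alternatives: non-capturing group
        return "(?:" + String.join("|", alternatives) + ")";
    }

    public static void main(String[] args) {
        // prints (?:cat|dog)
        System.out.println(joinAlternatives(new String[] {"cat", "dog"}));
    }
}
```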

multipleReplaceAll

public static String multipleReplaceAll(CharSequence input,
                                        Map replacements)
Performs multiple replace-all operations on a text. The replacements are performed in the order of the key-set iterator of the given map.

Parameters:
input - the character sequence to perform the replacements on
replacements - a mapping of regular expression Patterns to replacement Strings
Returns:
the string constructed by performing all replacements
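
Because the replacements run in the map's iteration order, the choice of map type matters. The sketch below illustrates this with a LinkedHashMap, which iterates in insertion order. It uses a generic Map<Pattern, String> for clarity, whereas the documented signature takes a raw Map; the class name and the demo method are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch; the documented method takes a raw Map.
final class MultipleReplaceSketch {
    static String multipleReplaceAll(CharSequence input,
                                     Map<Pattern, String> replacements) {
        String result = input.toString();
        // Apply each replacement to the result of the previous one
        for (Map.Entry<Pattern, String> entry : replacements.entrySet()) {
            result = entry.getKey().matcher(result).replaceAll(entry.getValue());
        }
        return result;
    }

    // Hypothetical demo: later rules see the output of earlier rules.
    static String demo() {
        Map<Pattern, String> rules = new LinkedHashMap<>();
        rules.put(Pattern.compile("a"), "b");
        rules.put(Pattern.compile("b"), "c");
        return multipleReplaceAll("aab", rules);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints ccc
    }
}
```

With the two rules in the opposite order the same input would yield "bbc", so a map with unpredictable iteration order (such as HashMap) can give surprising results when rules interact.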

normalize

public static String normalize(String input)
Normalizes the whitespace in a string, replacing all internal whitespace sequences with a single space character and trimming any leading and trailing whitespace.

Parameters:
input - the string to normalize
Returns:
the normalized string
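
A minimal sketch of this kind of normalization is shown below. It assumes that whitespace is what the regex class \s matches; the class name is hypothetical and the actual implementation may differ.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of whitespace normalization.
final class NormalizeSketch {
    // Assumption: "whitespace" is what the regex class \s matches
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String normalize(String input) {
        // Trim the ends first, then collapse each internal run to one space
        return WHITESPACE.matcher(input.trim()).replaceAll(" ");
    }

    public static void main(String[] args) {
        // prints [a b]
        System.out.println("[" + normalize("  a \t\n b  ") + "]");
    }
}
```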

replaceAll

public static String replaceAll(String input,
                                Matcher matcher,
                                String replacement)
Replaces each substring of the input matched by the given pattern matcher with the given replacement. See Matcher.replaceAll(java.lang.String) for details of the replacement process and special characters in the replacement string.

This method only returns a new string if there is at least one match to replace. Otherwise the reference to the input object is returned. Thus you can use the == operator to find out whether replacements have been made; it is not necessary to use String.equals(java.lang.Object). When there is nothing to replace, it might be more efficient than Matcher.replaceAll(java.lang.String) (and certainly than String.replaceAll(java.lang.String, java.lang.String)), because (as of JDK 1.4.2) these methods always create and return new objects.

Matchers are stateful and not thread-safe. It is not necessary to Matcher.reset() the matcher prior to calling this method, but you should reset it if you want to use it in other matching operations afterwards.

Parameters:
input - the string to process
matcher - a matcher on the pattern
replacement - the replacement string
Returns:
the resulting string; or a reference to the input string if no replacements were made
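
The identity-return contract described above could be realized along the lines of the following sketch. The class name is hypothetical and the body is an assumption based on the description, not the actual source.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the identity-return contract.
final class ReplaceAllSketch {
    static String replaceAll(String input, Matcher matcher, String replacement) {
        // Point the (stateful) matcher at the new input
        matcher.reset(input);
        if (!matcher.find()) {
            return input;  // nothing to replace: same reference is returned
        }
        // Matcher.replaceAll resets the matcher itself before scanning
        return matcher.replaceAll(replacement);
    }

    public static void main(String[] args) {
        Matcher digits = Pattern.compile("\\d+").matcher("");
        String unchanged = "no digits here";
        // A reference comparison is enough to detect "no change"
        System.out.println(replaceAll(unchanged, digits, "#") == unchanged);
        System.out.println(replaceAll("a1b22", digits, "#"));
    }
}
```

Reusing one Matcher across many inputs avoids creating a new matcher per call, at the cost of having to confine that matcher to a single thread.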

shorten

public static String shorten(String input,
                             int startChars,
                             int endChars)
Shortens a string, inserting an ellipsis ("...") in the middle if the string is too long. Specifically:

This method is similar to StringUtils.abbreviate(String, int), but the ellipsis is inserted in the middle of the string, not at the end.

Parameters:
input - the input string
startChars - the number of characters to include before the ellipsis
endChars - the number of characters to include after the ellipsis
Returns:
a shortened string, as described above

shorten

public static String shorten(String input,
                             int numChars)
Delegates to shorten(String, int, int), using the same number of characters at the start and the end of the shortened string.

Parameters:
input - the input string
numChars - the number of characters to use for both the startChars and endChars parameters
Returns:
the shortened string

punctuation

public static boolean punctuation(CharSequence text)
Checks whether a string contains only punctuation characters.

Parameters:
text - the text to check
Returns:
true iff the text contains one or more punctuation characters and no other characters

punctuationOrSymbol

public static boolean punctuationOrSymbol(CharSequence text)
Checks whether a string contains only punctuation and symbol characters.

Parameters:
text - the text to check
Returns:
true iff the text contains one or more punctuation or symbol characters and no other characters
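
The two checks (punctuation and punctuationOrSymbol) might be expressed with Unicode category classes, as in the sketch below. Using \p{P} and \p{S} is an assumption; this page does not show the actual pattern strings, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; the real pattern strings are not shown on this page.
final class PunctuationSketch {
    // Assumption: Unicode general categories P (punctuation) and S (symbol)
    static final Pattern PUNCTUATION = Pattern.compile("\\p{P}+");
    static final Pattern PUNCTUATION_SYMBOL = Pattern.compile("[\\p{P}\\p{S}]+");

    static boolean punctuation(CharSequence text) {
        // matches() requires the whole text to consist of such characters
        return PUNCTUATION.matcher(text).matches();
    }

    static boolean punctuationOrSymbol(CharSequence text) {
        return PUNCTUATION_SYMBOL.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(punctuation("?!"));         // prints true
        System.out.println(punctuation("$"));          // prints false
        System.out.println(punctuationOrSymbol("$!")); // prints true
    }
}
```

Under this assumption the dollar sign counts as a symbol (category Sc) rather than as punctuation, which is why only the second check accepts it.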

shorten

public static String shorten(String input)
Delegates to shorten(String, int, int), showing up to 24 characters at the start and the end of the shortened string.

Parameters:
input - the input string
Returns:
the shortened string

replaceAll

public static String replaceAll(String input,
                                Pattern pattern,
                                String replacement)
Replaces each substring of the input that matches the given Pattern with the given replacement. See Matcher.replaceAll(java.lang.String) for details of the replacement process and special characters in the replacement string.

This method only returns a new string if there is at least one match to replace. Otherwise the reference to the input object is returned. Thus you can use the == operator to find out whether replacements have been made; it is not necessary to use String.equals(java.lang.Object).

This method is thread-safe since pattern objects are stateless. On the other hand, it needs to create a new Matcher object, thus replaceAll(String, Matcher, String) is more efficient for multiple replacements on the same pattern.

Parameters:
input - the string to process
pattern - the regular expression Pattern to replace
replacement - the replacement string
Returns:
the resulting string; or a reference to the input string if no replacements were made

splitLines

public static String[] splitLines(CharSequence input)
Splits a text into an array of lines. Only the textual contents of non-empty lines are retained; empty lines and trailing and leading whitespace are removed.

Parameters:
input - the text to split
Returns:
an array of the lines contained in the text; each line is trimmed (trailing and leading whitespace is removed) and empty lines are suppressed
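
The trimming and suppression of empty lines could be sketched as follows. The newline pattern used here is an assumption based on the three line endings named for NEWLINE_ALTERNATIVES, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of splitting with trimming and suppression.
final class SplitLinesSketch {
    // Assumed newline pattern: Windows, Unix and Mac line endings
    private static final Pattern NEWLINE = Pattern.compile("\r\n|\n|\r");

    static String[] splitLines(CharSequence input) {
        List<String> lines = new ArrayList<>();
        for (String line : NEWLINE.split(input)) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {   // suppress empty lines
                lines.add(trimmed);
            }
        }
        return lines.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // prints [a, b]
        System.out.println(Arrays.toString(splitLines("  a \r\n\r\n b\n")));
    }
}
```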

splitLinesExact

public static String[] splitLinesExact(CharSequence input)
Splits a text into an array of lines, without trimming lines or discarding empty lines.

Parameters:
input - the text to split
Returns:
an array of the lines contained in the text

splitString

public static String[] splitString(CharSequence input)
Splits a string around whitespace.

Parameters:
input - the string to split
Returns:
an array of strings computed by splitting the input

splitString

public static String[] splitString(CharSequence input,
                                   int splitMaximum)
Splits a string around whitespace. The number of returned subsequences won't be higher than the specified splitMaximum. If splitting results in more subsequences, only the last splitMaximum are kept, while the other ones are discarded. This implementation splits around the WHITESPACE_PATTERN.

Parameters:
input - the string to split
splitMaximum - the maximum number of subsequences to keep; or -1 if all subsequences should be kept
Returns:
an array of strings computed by splitting the input; will contain at least 1 and at most splitMaximum elements

splitString

public static String[] splitString(CharSequence input,
                                   Pattern whitespacePattern,
                                   int splitMaximum)
Splits a string around whitespace. The number of returned subsequences won't be higher than the specified splitMaximum. If splitting results in more subsequences, only the last splitMaximum are kept, while the other ones are discarded.

Parameters:
input - the string to split
whitespacePattern - the pattern around which to split
splitMaximum - the maximum number of subsequences to keep; or -1 if all subsequences should be kept
Returns:
an array of strings computed by splitting the input; will contain at most splitMaximum elements
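
Note that, unlike the limit argument of String.split, the splitMaximum described here keeps the last subsequences rather than the first ones. A minimal sketch of that behavior follows. The class name is hypothetical, treating every negative value like -1 is an assumption, and edge cases such as leading whitespace are not handled.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical sketch of "keep only the last splitMaximum parts".
final class SplitStringSketch {
    static String[] splitString(CharSequence input, Pattern whitespacePattern,
                                int splitMaximum) {
        String[] parts = whitespacePattern.split(input);
        // Assumption: any negative value means "keep everything"
        if (splitMaximum < 0 || parts.length <= splitMaximum) {
            return parts;
        }
        // Discard the leading parts and keep the last splitMaximum ones
        return Arrays.copyOfRange(parts, parts.length - splitMaximum,
                                  parts.length);
    }

    public static void main(String[] args) {
        Pattern whitespace = Pattern.compile("\\s+");
        // prints [c, d]
        System.out.println(Arrays.toString(splitString("a b c d", whitespace, 2)));
    }
}
```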

weaklyNormalize

public static String weaklyNormalize(String input)
Weakly normalizes the whitespace in a string, by replacing each whitespace element (space, tab, newline) with a space character. This is in accordance with the attribute-value normalization required by the XML specification.

Parameters:
input - the string to normalize
Returns:
the weakly normalized string
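
In contrast to normalize, weak normalization replaces each whitespace character individually, so the length of the string does not change. The sketch below illustrates this. Treating the carriage return as whitespace follows the XML attribute-value normalization rules and is an assumption here, since the description above only lists space, tab and newline; the class name is hypothetical.

```java
// Hypothetical sketch of one-for-one whitespace replacement.
final class WeakNormalizeSketch {
    static String weaklyNormalize(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // Assumption: '\r' is treated like tab and newline
            boolean whitespace = (c == '\t' || c == '\n' || c == '\r');
            sb.append(whitespace ? ' ' : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Tab and newline each become one space: prints [a  b]
        System.out.println("[" + weaklyNormalize("a\t\nb") + "]");
    }
}
```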

writeln

public static void writeln(Writer writer,
                           String text)
                    throws IOException
Convenience method that writes a text to a writer and appends the line separator.

Parameters:
writer - the writer to write to
text - the text to write
Throws:
IOException - if an I/O error occurs
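
A convenience method of this kind might look like the sketch below. It obtains the platform line separator from System.lineSeparator(), which is an assumption about how LINE_SEPARATOR is initialized; the class name and the demo method are hypothetical.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical sketch of the convenience method.
final class WritelnSketch {
    // Assumption: the constant holds the platform line separator
    static final String LINE_SEPARATOR = System.lineSeparator();

    static void writeln(Writer writer, String text) throws IOException {
        writer.write(text);
        writer.write(LINE_SEPARATOR);
    }

    // Hypothetical demo that writes one line into an in-memory writer.
    static String demo() {
        StringWriter writer = new StringWriter();
        try {
            writeln(writer, "hello");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return writer.toString();
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```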


Copyright © 2003-2007 Christian Siefkes. All Rights Reserved.