de.fu_berlin.ties.text
Class TextUtils

java.lang.Object
  extended by de.fu_berlin.ties.text.TextUtils

public final class TextUtils
extends Object

A static class that provides utility constants and methods for working with texts and regular expressions. No instances of this class can be created; only the static members should be used.

Version:
$Revision: 1.5 $, $Date: 2004/03/10 18:00:40 $, $Author: siefkes $
Author:
Christian Siefkes

Field Summary
static String LINE_SEPARATOR
          The line separator on the current operating system ("\n" on Unix).
static String NEWLINE_ALTERNATIVES
          Regex fragment listing the newline alternatives used by different systems: "\r\n" (Windows), "\n" (Unix) or "\r" (Mac).
static Pattern NEWLINE_PATTERN
          A regular expression matching a single newline (built by enclosing NEWLINE_ALTERNATIVES in a non-capturing group).
static Pattern NEWLINES_PATTERN
          A regular expression matching newlines, including surrounding whitespace.
static Pattern SINGLE_LINE_WS
          A regular expression matching a non-line-breaking whitespace character (character class containing space and tab).
static Pattern WHITESPACE_PATTERN
          A simple regular expression for whitespace.
 
Method Summary
static int countFirst(String str, char ch)
          Counts how often a character is repeated at the beginning of a string.
static int countLast(String str, char ch)
          Counts how often a character is repeated at the end of a string.
static String joinAlternatives(String[] alternatives)
          Helper method for building a regular expression Pattern by combining several alternatives.
static String multipleReplaceAll(CharSequence input, Map replacements)
          Performs multiple replace-all operations on a text.
static String normalize(String input)
          Normalizes the whitespace in a string, replacing all internal whitespace sequences with a single space character and trimming any leading and trailing whitespace.
static String replaceAll(String input, Matcher matcher, String replacement)
          Replaces each substring of the input matched by the given pattern matcher with the given replacement.
static String replaceAll(String input, Pattern pattern, String replacement)
          Replaces each substring of the input that matches the given Pattern with the given replacement.
static String[] splitLines(CharSequence input)
          Splits a text into an array of lines.
static String[] splitLinesExact(CharSequence input)
          Splits a text into an array of lines, without trimming lines or discarding empty lines.
static String[] splitString(String input, int splitMaximum)
          Splits a string around whitespace.
static String[] splitString(String input, Pattern whitespacePattern, int splitMaximum)
          Splits a string around whitespace.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LINE_SEPARATOR

public static final String LINE_SEPARATOR
The line separator on the current operating system ("\n" on Unix).


NEWLINE_ALTERNATIVES

public static final String NEWLINE_ALTERNATIVES
Regex fragment listing the newline alternatives used by different systems: "\r\n" (Windows), "\n" (Unix) or "\r" (Mac).

See Also:
Constant Field Values

SINGLE_LINE_WS

public static final Pattern SINGLE_LINE_WS
A regular expression matching a non-line-breaking whitespace character (character class containing space and tab).


NEWLINE_PATTERN

public static final Pattern NEWLINE_PATTERN
A regular expression matching a single newline (built by enclosing NEWLINE_ALTERNATIVES in a non-capturing group).


NEWLINES_PATTERN

public static final Pattern NEWLINES_PATTERN
A regular expression matching newlines, including surrounding whitespace. Will match several newlines if they immediately follow each other or are separated by whitespace only.


WHITESPACE_PATTERN

public static final Pattern WHITESPACE_PATTERN
A simple regular expression for whitespace.
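
The field descriptions above do not show the actual pattern strings. The following is a minimal sketch of how such constants could be composed with java.util.regex. The concrete regex strings, and the class name NewlinePatternsSketch, are assumptions made for illustration and are not taken from the TextUtils source.

```java
import java.util.regex.Pattern;

// Illustrative sketch only; the regex strings are assumptions, not the TextUtils source.
final class NewlinePatternsSketch {
    private NewlinePatternsSketch() {}

    // "\r\n" is listed first so that a Windows line end is matched as one newline.
    static final String NEWLINE_ALTERNATIVES = "\\r\\n|\\n|\\r";

    // A single newline: the alternatives enclosed in a non-capturing group.
    static final Pattern NEWLINE_PATTERN =
        Pattern.compile("(?:" + NEWLINE_ALTERNATIVES + ")");

    // One or more newlines together with any surrounding whitespace.
    static final Pattern NEWLINES_PATTERN =
        Pattern.compile("\\s*(?:" + NEWLINE_ALTERNATIVES + ")\\s*");

    // A non-line-breaking whitespace character: space or tab.
    static final Pattern SINGLE_LINE_WS = Pattern.compile("[ \\t]");

    public static void main(String[] args) {
        // Three newlines separated only by whitespace collapse into one match.
        System.out.println(NEWLINES_PATTERN.matcher("a \n\n  \n b").replaceAll("|"));
    }
}
```

Listing "\r\n" before "\n" and "\r" matters in this sketch: with the opposite order, a Windows line end would be split into two separate matches.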

Method Detail

countFirst

public static int countFirst(String str,
                             char ch)
Counts how often a character is repeated at the beginning of a string.

Parameters:
str - the string to check
ch - the character to count
Returns:
how often the character is repeated at the beginning of the string (0 if the string starts with another character or is empty)

countLast

public static int countLast(String str,
                            char ch)
Counts how often a character is repeated at the end of a string.

Parameters:
str - the string to check
ch - the character to count
Returns:
how often the character is repeated at the end of the string (0 if the string ends with another character or is empty)
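
The contracts of countFirst and countLast can be illustrated with a short sketch. The class name CountSketch and the loop-based implementation are assumptions for illustration; the real methods may be implemented differently.

```java
// Illustrative sketch only; not the actual TextUtils source.
final class CountSketch {
    private CountSketch() {}

    /** Counts how often ch is repeated at the beginning of str. */
    static int countFirst(String str, char ch) {
        int count = 0;
        while (count < str.length() && str.charAt(count) == ch) {
            count++;
        }
        return count;
    }

    /** Counts how often ch is repeated at the end of str. */
    static int countLast(String str, char ch) {
        int count = 0;
        for (int i = str.length() - 1; i >= 0 && str.charAt(i) == ch; i--) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countFirst("--abc-", '-')); // prints 2
        System.out.println(countLast("--abc-", '-'));  // prints 1
    }
}
```

Both loops stop at the first character that differs from ch, so an empty string, or one that starts (or ends) with another character, yields 0.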

joinAlternatives

public static String joinAlternatives(String[] alternatives)
Helper method for building a regular expression Pattern by combining several alternatives.

Parameters:
alternatives - the alternatives to combine
Returns:
a pattern string containing the joined alternatives; two or more alternatives are combined in a non-capturing group; a single alternative is just returned as is; if the array is empty, an empty string is returned
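
The three cases of the return value can be sketched as follows. The "|" separator and the "(?:...)" group syntax follow standard java.util.regex conventions; the class name JoinAlternativesSketch and the exact output format are assumptions, and the sketch does not quote the alternatives.

```java
import java.util.regex.Pattern;

// Illustrative sketch only; not the actual TextUtils source.
final class JoinAlternativesSketch {
    private JoinAlternativesSketch() {}

    static String joinAlternatives(String[] alternatives) {
        if (alternatives.length == 0) {
            return "";                      // empty array: empty string
        }
        if (alternatives.length == 1) {
            return alternatives[0];         // single alternative: returned as is
        }
        // Two or more alternatives: combined in a non-capturing group.
        return "(?:" + String.join("|", alternatives) + ")";
    }

    public static void main(String[] args) {
        String regex = joinAlternatives(new String[] {"cat", "dog"});
        System.out.println(regex); // prints (?:cat|dog)
        System.out.println(Pattern.matches(regex + "s", "dogs")); // prints true
    }
}
```

The non-capturing group lets the result be embedded in a larger expression (as with the trailing "s" above) without shifting the numbering of capturing groups.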

multipleReplaceAll

public static String multipleReplaceAll(CharSequence input,
                                        Map replacements)
Performs multiple replace-all operations on a text. The replacements are performed in the order of the key-set iterator of the given map.

Parameters:
input - the character sequence to perform the replacements on
replacements - a mapping of regular expression Patterns to replacement Strings
Returns:
the string constructed by performing all replacements
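
Because the replacements run in key-set iteration order, the choice of Map implementation affects the result. The sketch below assumes a generic Map<Pattern, String> (the documented signature uses a raw Map) and a hypothetical class name MultipleReplaceSketch; it illustrates the documented behavior rather than reproducing the original code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative sketch only; not the actual TextUtils source.
final class MultipleReplaceSketch {
    private MultipleReplaceSketch() {}

    static String multipleReplaceAll(CharSequence input, Map<Pattern, String> replacements) {
        String result = input.toString();
        // Apply each replacement to the output of the previous one,
        // in the order given by the key-set iterator.
        for (Pattern pattern : replacements.keySet()) {
            result = pattern.matcher(result).replaceAll(replacements.get(pattern));
        }
        return result;
    }

    public static void main(String[] args) {
        // LinkedHashMap iterates in insertion order, so the order is predictable.
        Map<Pattern, String> replacements = new LinkedHashMap<>();
        replacements.put(Pattern.compile("a"), "b");
        replacements.put(Pattern.compile("b"), "c");
        System.out.println(multipleReplaceAll("aab", replacements)); // prints ccc
    }
}
```

With the two entries inserted in the opposite order, the same input yields "bbc", so an ordered map such as LinkedHashMap is the safer choice whenever one replacement can produce text that another pattern matches.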

normalize

public static String normalize(String input)
Normalizes the whitespace in a string, replacing all internal whitespace sequences with a single space character and trimming any leading and trailing whitespace.

Parameters:
input - the string to normalize
Returns:
the normalized string
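
A minimal sketch of this normalization, assuming that "whitespace" means the characters matched by the regex class \s; the class name NormalizeSketch and the pattern are assumptions and may differ from WHITESPACE_PATTERN in the real class.

```java
import java.util.regex.Pattern;

// Illustrative sketch only; not the actual TextUtils source.
final class NormalizeSketch {
    private NormalizeSketch() {}

    // Assumed whitespace pattern: one or more whitespace characters.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String normalize(String input) {
        // Trim first, then collapse every internal whitespace run to one space.
        return WHITESPACE.matcher(input.trim()).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println("[" + normalize("  a \t\n b  ") + "]"); // prints [a b]
    }
}
```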

replaceAll

public static String replaceAll(String input,
                                Matcher matcher,
                                String replacement)
Replaces each substring of the input matched by the given pattern matcher with the given replacement. See Matcher.replaceAll(java.lang.String) for details of the replacement process and special characters in the replacement string.

This method returns a new string only if there is at least one match to replace; otherwise it returns the reference to the input object. You can therefore use the == operator to find out whether replacements have been made; it is not necessary to use String.equals(java.lang.Object). When there is nothing to replace, this method might be more efficient than Matcher.replaceAll(java.lang.String) (and certainly than String.replaceAll(java.lang.String, java.lang.String)), because (as of JDK 1.4.2) these methods always create and return new objects.

Matchers are stateful and not thread-safe. It is not necessary to call Matcher.reset() on the matcher prior to calling this method, but you should reset it if you want to use it in other matching operations afterwards.

Parameters:
input - the string to process
matcher - a matcher on the pattern
replacement - the replacement string
Returns:
the resulting string; or a reference to the input string if no replacements were made

replaceAll

public static String replaceAll(String input,
                                Pattern pattern,
                                String replacement)
Replaces each substring of the input that matches the given Pattern with the given replacement. See Matcher.replaceAll(java.lang.String) for details of the replacement process and special characters in the replacement string.

This method returns a new string only if there is at least one match to replace; otherwise it returns the reference to the input object. You can therefore use the == operator to find out whether replacements have been made; it is not necessary to use String.equals(java.lang.Object).

This method is thread-safe since pattern objects are stateless. On the other hand, it needs to create a new Matcher object, thus replaceAll(String, Matcher, String) is more efficient for multiple replacements on the same pattern.

Parameters:
input - the string to process
pattern - the regular expression Pattern to replace
replacement - the replacement string
Returns:
the resulting string; or a reference to the input string if no replacements were made
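
Both replaceAll overloads promise to return the input reference when nothing matches. One way to obtain that behavior is sketched below, using the standard Matcher.find(), appendReplacement() and appendTail() calls. The class name ReplaceAllSketch is hypothetical, and resetting the matcher on the input inside the method is an assumption about how the "no reset needed" guarantee might be met.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; not the actual TextUtils source.
final class ReplaceAllSketch {
    private ReplaceAllSketch() {}

    static String replaceAll(String input, Matcher matcher, String replacement) {
        matcher.reset(input);          // callers need not reset beforehand
        if (!matcher.find()) {
            return input;              // no match: same reference, no new object
        }
        StringBuffer result = new StringBuffer();
        do {
            matcher.appendReplacement(result, replacement);
        } while (matcher.find());
        matcher.appendTail(result);
        return result.toString();
    }

    static String replaceAll(String input, Pattern pattern, String replacement) {
        // Creates a fresh Matcher per call, so this overload shares no state.
        return replaceAll(input, pattern.matcher(input), replacement);
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("\\d+");
        String unchanged = "no digits here";
        System.out.println(replaceAll(unchanged, digits, "#") == unchanged); // prints true
        System.out.println(replaceAll("a1b22", digits, "#"));                // prints a#b#
    }
}
```

The identity check with == works only because the no-match path returns the input object itself; any code path that copies the string would break that contract.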

splitLines

public static String[] splitLines(CharSequence input)
Splits a text into an array of lines. Only the textual contents of non-empty lines are retained; empty lines and trailing and leading whitespace are removed.

Parameters:
input - the text to split
Returns:
an array of the lines contained in the text; each line is trimmed (trailing and leading whitespace is removed) and empty lines are suppressed

splitLinesExact

public static String[] splitLinesExact(CharSequence input)
Splits a text into an array of lines, without trimming lines or discarding empty lines.

Parameters:
input - the text to split
Returns:
an array of the lines contained in the text
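
The difference between splitLines and splitLinesExact can be illustrated with the sketch below. The newline regex, the class name SplitLinesSketch, and the decision to keep trailing empty lines in the exact variant (split limit -1) are assumptions; the documentation does not specify how a trailing newline is handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch only; not the actual TextUtils source.
final class SplitLinesSketch {
    private SplitLinesSketch() {}

    // Assumed single-newline pattern: Windows, Unix or old Mac line ends.
    private static final Pattern NEWLINE = Pattern.compile("(?:\\r\\n|\\n|\\r)");

    /** Trims each line and suppresses empty lines. */
    static String[] splitLines(CharSequence input) {
        List<String> lines = new ArrayList<>();
        for (String line : NEWLINE.split(input)) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines.toArray(new String[0]);
    }

    /** Keeps every line as is; limit -1 also keeps trailing empty lines (assumption). */
    static String[] splitLinesExact(CharSequence input) {
        return NEWLINE.split(input, -1);
    }

    public static void main(String[] args) {
        String text = "  a \r\n\r\n b\n";
        System.out.println(splitLines(text).length);      // prints 2
        System.out.println(splitLinesExact(text).length); // prints 4
    }
}
```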

splitString

public static String[] splitString(String input,
                                   int splitMaximum)
Splits a string around whitespace. The number of returned subsequences won't be higher than the specified splitMaximum. If splitting results in more subsequences, only the last splitMaximum are kept, while the other ones are discarded. This implementation splits around the WHITESPACE_PATTERN.

Parameters:
input - the string to split
splitMaximum - the maximum number of subsequences to keep; or -1 if all subsequences should be kept
Returns:
an array of strings computed by splitting the input; will contain at least 1 and at most splitMaximum elements

splitString

public static String[] splitString(String input,
                                   Pattern whitespacePattern,
                                   int splitMaximum)
Splits a string around whitespace. The number of returned subsequences won't be higher than the specified splitMaximum. If splitting results in more subsequences, only the last splitMaximum are kept, while the other ones are discarded.

Parameters:
input - the string to split
whitespacePattern - the pattern around which to split
splitMaximum - the maximum number of subsequences to keep; or -1 if all subsequences should be kept
Returns:
an array of strings computed by splitting the input; will contain at most splitMaximum elements
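
Unlike the limit of Pattern.split, which caps the number of pieces from the front, splitMaximum keeps the last pieces. The sketch below illustrates this under stated assumptions: the class name SplitStringSketch and the default whitespace pattern are hypothetical, any negative splitMaximum is treated like -1, and input with leading whitespace (which makes Pattern.split return a leading empty string) is not given special treatment.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Illustrative sketch only; not the actual TextUtils source.
final class SplitStringSketch {
    private SplitStringSketch() {}

    // Assumed stand-in for WHITESPACE_PATTERN.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String[] splitString(String input, Pattern whitespacePattern, int splitMaximum) {
        String[] parts = whitespacePattern.split(input);
        // Assumption: any negative maximum means "keep everything".
        if (splitMaximum < 0 || parts.length <= splitMaximum) {
            return parts;
        }
        // Keep only the last splitMaximum subsequences; discard the earlier ones.
        return Arrays.copyOfRange(parts, parts.length - splitMaximum, parts.length);
    }

    static String[] splitString(String input, int splitMaximum) {
        return splitString(input, WHITESPACE, splitMaximum);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitString("a b c d", 2)));  // prints [c, d]
        System.out.println(Arrays.toString(splitString("a b c d", -1))); // prints [a, b, c, d]
    }
}
```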


Copyright © 2003-2004 Christian Siefkes. All Rights Reserved.