de.fu_berlin.ties.xml.dom
Class DOMUtils

java.lang.Object
  extended by de.fu_berlin.ties.xml.dom.DOMUtils

public final class DOMUtils
extends Object

A static utility class that provides constants and methods for working with DOM-like XML representations, focusing especially on dom4j. No instances of this class can be created; only the static members should be used.

Version:
$Revision: 1.28 $, $Date: 2006/10/21 16:04:33 $, $Author: siefkes $
Author:
Christian Siefkes

Method Summary
static Attribute attributeByName(Element element, String name)
          Returns the attribute with the given name, compatible with the name format returned by name(Attribute).
static String collectText(Branch branch)
          Recursively collects the complete textual content of a branch, i.e. a document or element.
static void collectText(Branch branch, StringBuilder appender)
          Recursively collects the complete textual content of a branch, i.e. a document or element.
static void collectText(Branch branch, Writer writer)
          Recursively collects the complete textual content of a branch, i.e. a document or element.
static OutputFormat createDefaultOutFormat()
          Creates the default output format used by this class for storing XML.
static QName defaultName(String localName)
          Converts a local name into a qualified name in the default namespace.
static void deleteAllAttributes(Element element, boolean recurse)
          Deletes all attributes of an element and optionally of all its descendants.
static List elementsByName(Element element, String name)
          Returns the child elements with the given name, compatible with the name format returned by name(Element).
static String name(Attribute attrib)
          Static method that returns a String representing the name of an attribute in an XML document.
static String name(Element element)
          Static method that returns a String representing the name of an element in an XML document.
static Document readDocument(File file)
          Reads an XML document from a local file.
static Document readDocument(File file, Configuration config)
          Reads an XML document from a local file, using a configured charset.
static Document readDocument(File file, String charset)
          Reads an XML document from a local file, using a given charset.
static Document readDocument(InputStream in)
          Reads an XML document from a given stream.
static Document readDocument(Reader reader)
          Reads an XML document from a given reader.
static String showElement(Element element)
          Builds a simple partial representation of an element, containing the name of the element and its normalized and shortened textual content.
static String showToken(Element element, String token)
          Builds a simple partial representation of a textual token in an element, containing the name of the element and the normalized and shortened text of the token.
static void writeDocument(Document document, File file, TiesConfiguration config, String suffix)
          Writes an XML document to a file, consulting a given configuration about whether to use compression.
static void writeDocument(Document document, OutputStream out)
          Writes an XML document to a given stream.
static void writeDocument(Document document, OutputStreamWriter writer)
          Writes an XML document to a given writer, using the character set of the underlying output stream.
static void writeDocument(Document document, Writer writer, String charset)
          Writes an XML document to a given writer, using the given character set.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

attributeByName

public static Attribute attributeByName(Element element,
                                        String name)
Returns the attribute with the given name, compatible with the name format returned by name(Attribute). If there is more than one attribute with the given name (e.g. in different namespaces), the first one is returned.

Parameters:
element - the element whose attribute to return
name - the name of the attribute, compatible with the name format returned by name(Attribute)
Returns:
the (first) matching attribute or null if none exists

collectText

public static String collectText(Branch branch)
Recursively collects the complete textual content of a branch, i.e. a document or element.

Parameters:
branch - the branch to recurse
Returns:
the collected text of the branch and all its child elements
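
The recursion can be illustrated with a minimal sketch. It is not the dom4j-based implementation of this class: it assumes the JDK's built-in org.w3c.dom API so that it runs without extra libraries, and the class name CollectTextSketch and the helper parse are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch using the JDK DOM (org.w3c.dom), not dom4j's Branch.
public class CollectTextSketch {

    /** Appends the text of all descendants of the branch, in document order. */
    public static void collectText(Node branch, StringBuilder appender) {
        for (Node child = branch.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                appender.append(child.getNodeValue());
            } else if (type == Node.ELEMENT_NODE) {
                // Recurse into child elements; comments and PIs are skipped
                collectText(child, appender);
            }
        }
    }

    /** Convenience variant that returns the collected text as a String. */
    public static String collectText(Node branch) {
        StringBuilder appender = new StringBuilder();
        collectText(branch, appender);
        return appender.toString();
    }

    /** Helper for the example: parses an XML string into a JDK DOM document. */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<p>Hello <b>big</b> world</p>");
        System.out.println(collectText(doc)); // prints "Hello big world"
    }
}
```

Passing a document or an element works the same way, which mirrors the "document or element" wording above.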

collectText

public static void collectText(Branch branch,
                               StringBuilder appender)
Recursively collects the complete textual content of a branch, i.e. a document or element.

Parameters:
branch - the branch to recurse
appender - the collected text of the branch and all its child elements is appended to this string builder

collectText

public static void collectText(Branch branch,
                               Writer writer)
                        throws IOException
Recursively collects the complete textual content of a branch, i.e. a document or element.

Parameters:
branch - the branch to recurse
writer - the collected text of the branch and all its child elements is appended to this writer; flushed but not closed by this method
Throws:
IOException - if an I/O error occurs while writing to the writer

createDefaultOutFormat

public static OutputFormat createDefaultOutFormat()
Creates the default output format used by this class for storing XML. This format adds a platform-specific newline after each element but does not indent; all other whitespace is trimmed (normalized).

Returns:
the created output format

defaultName

public static QName defaultName(String localName)
Converts a local name into a qualified name in the default namespace.

Parameters:
localName - the local name to use
Returns:
a qualified name representing the local name in the default namespace; or null if localName is null

deleteAllAttributes

public static void deleteAllAttributes(Element element,
                                       boolean recurse)
Deletes all attributes of an element and optionally of all its descendants.

Parameters:
element - the element whose attributes should be deleted
recurse - whether to recursively delete the attributes of all direct and indirect child elements as well
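
A minimal sketch of the same idea, assuming the JDK's org.w3c.dom API rather than dom4j. The class name DeleteAttributesSketch and the helper parseRoot are hypothetical and only serve the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch using the JDK DOM (org.w3c.dom), not dom4j.
public class DeleteAttributesSketch {

    /** Removes all attributes of the element and, optionally, of its descendants. */
    public static void deleteAllAttributes(Element element, boolean recurse) {
        NamedNodeMap attributes = element.getAttributes();
        // The map is live and shrinks on removal, so iterate from the end
        for (int i = attributes.getLength() - 1; i >= 0; i--) {
            element.removeAttributeNode((Attr) attributes.item(i));
        }
        if (recurse) {
            for (Node child = element.getFirstChild(); child != null;
                    child = child.getNextSibling()) {
                if (child instanceof Element) {
                    deleteAllAttributes((Element) child, true);
                }
            }
        }
    }

    /** Helper for the example: parses an XML string and returns its root element. */
    public static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element root = parseRoot("<a x='1'><b y='2' z='3'/></a>");
        deleteAllAttributes(root, false);
        // Only the root was cleaned; the child keeps its two attributes
        System.out.println(root.getAttributes().getLength());
        System.out.println(root.getFirstChild().getAttributes().getLength());
    }
}
```

With recurse set to false only the given element is touched; with true the whole subtree below it is cleaned as well.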

elementsByName

public static List elementsByName(Element element,
                                  String name)
Returns the child elements with the given name, compatible with the name format returned by name(Element). If no elements are found, this method returns an empty list.

Parameters:
element - the element whose child elements to return
name - the name of the child elements, compatible with the name format returned by name(Element)
Returns:
a list of all the child Elements for the given name

name

public static String name(Attribute attrib)
Static method that returns a String representing the name of an attribute in an XML document. This method should always be used when building context representations and related structures to ensure that attributes are represented in a unified way. See name(Element) for details.

Parameters:
attrib - the attribute to name
Returns:
the name to use for this attribute

name

public static String name(Element element)
Static method that returns a String representing the name of an element in an XML document. This method should always be used when building context representations and related structures to ensure that elements are represented in a unified way. Please don't call Node.getName() or Element.getQualifiedName() or similar methods directly in such cases.

Currently, only the local name is used; namespace URIs and namespace prefixes are ignored. Including namespace prefixes in context representations would be quite useless, because in different documents different prefixes can represent the same namespace and vice versa.

Including namespace URIs might lead to higher precision by avoiding the risk of confusing elements from totally different namespaces. On the other hand, it might lead to lower recall and slower learning, because elements from similar namespaces (e.g. different versions of the HTML standard) would all be treated as distinct from each other.

Parameters:
element - the element to name
Returns:
the name to use for this element
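
The naming policy described above (local name only, prefix and namespace URI ignored) can be sketched as follows. This is an illustration under assumptions, not the actual implementation: it uses the JDK's namespace-aware org.w3c.dom parser instead of dom4j, and the class name LocalNameSketch and the helper parseRoot are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical sketch using the JDK DOM (org.w3c.dom), not dom4j.
public class LocalNameSketch {

    /** Returns the local name of the element, ignoring prefix and namespace URI. */
    public static String name(Element element) {
        String local = element.getLocalName();
        // getLocalName() may be null for nodes built without namespace support
        return (local != null) ? local : element.getTagName();
    }

    /** Helper for the example: namespace-aware parse, returns the root element. */
    public static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Same namespace under two different prefixes, plus an unqualified element
        Element a = parseRoot("<h:p xmlns:h='http://www.w3.org/1999/xhtml'>x</h:p>");
        Element b = parseRoot("<x:p xmlns:x='http://www.w3.org/1999/xhtml'>x</x:p>");
        Element c = parseRoot("<p>x</p>");
        System.out.println(name(a) + " " + name(b) + " " + name(c)); // prints "p p p"
    }
}
```

All three elements map to the same name, which is the trade-off discussed above: prefixes cannot cause spurious differences, but elements from unrelated namespaces are not told apart either.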

readDocument

public static Document readDocument(File file)
                             throws DocumentException,
                                    IOException
Reads an XML document from a local file. Compressed files are automatically decompressed (cf. IOUtils.openCompressableInStream(InputStream)).

Parameters:
file - the file to read
Returns:
the newly created document
Throws:
DocumentException - if an error occurs during parsing
IOException - if an I/O error occurs
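
How IOUtils.openCompressableInStream(InputStream) detects compression is not documented here. The following sketch shows one common way to do transparent decompression, under the assumption that the compression format is gzip and that it is recognized by its two-byte magic number. The class CompressableInputSketch and all its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch; assumes gzip compression detected via its magic bytes.
public class CompressableInputSketch {

    /** Wraps the stream in a GZIPInputStream if it starts with 0x1F 0x8B. */
    public static InputStream openMaybeGzipped(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 2);
        byte[] header = new byte[2];
        int n = 0;
        while (n < 2) {
            int r = pushback.read(header, n, 2 - n);
            if (r < 0) {
                break; // stream is shorter than two bytes
            }
            n += r;
        }
        if (n > 0) {
            pushback.unread(header, 0, n); // put the peeked bytes back
        }
        boolean gzipped = n == 2
                && (header[0] & 0xFF) == 0x1F && (header[1] & 0xFF) == 0x8B;
        return gzipped ? new GZIPInputStream(pushback) : pushback;
    }

    /** Helper for the example: reads plain or gzipped bytes as UTF-8 text. */
    public static String readText(byte[] data) {
        try (InputStream in = openMaybeGzipped(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            for (int r; (r = in.read(buffer)) != -1; ) {
                out.write(buffer, 0, r);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Helper for the example: gzips a string. */
    public static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String xml = "<doc>text</doc>";
        System.out.println(readText(gzip(xml)));
        System.out.println(readText(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Both calls in main yield the same XML text, so a parser layered on top does not need to know whether the file was compressed.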

readDocument

public static Document readDocument(File file,
                                    Configuration config)
                             throws DocumentException,
                                    IOException
Reads an XML document from a local file, using a configured charset. Delegates to IOUtils.openReader(File, Configuration) to determine the character set. Compressed files are automatically decompressed (cf. IOUtils.openCompressableInStream(InputStream)).

Parameters:
file - the file to read
config - the configuration to use
Returns:
the newly created document
Throws:
DocumentException - if an error occurs during parsing
IOException - if an I/O error occurs

readDocument

public static Document readDocument(File file,
                                    String charset)
                             throws DocumentException,
                                    IOException
Reads an XML document from a local file, using a given charset. Compressed files are automatically decompressed (cf. IOUtils.openCompressableInStream(InputStream)).

Parameters:
file - the file to read
charset - the character set to use for reading the file; if null, the default charset of the current platform is used
Returns:
the newly created document
Throws:
DocumentException - if an error occurs during parsing
IOException - if an I/O error occurs

readDocument

public static Document readDocument(InputStream in)
                             throws DocumentException,
                                    IOException
Reads an XML document from a given stream. Compressed files are automatically decompressed (cf. IOUtils.openCompressableInStream(InputStream)).

Parameters:
in - stream containing the text to parse; not closed by this method
Returns:
the newly created document
Throws:
DocumentException - if an error occurs during parsing
IOException - if an I/O error occurs

readDocument

public static Document readDocument(Reader reader)
                             throws DocumentException
Reads an XML document from a given reader.

Parameters:
reader - reader containing the text to parse; not closed by this method
Returns:
the newly created document
Throws:
DocumentException - if an error occurs during parsing

showElement

public static String showElement(Element element)
Builds a simple partial representation of an element, containing the name of the element and its normalized and shortened textual content. Useful for logging.

Parameters:
element - the element to show (may be null)
Returns:
a simple partial representation of the element
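
The exact output format of this method is not specified here. As a rough illustration of "normalized and shortened", the sketch below collapses whitespace and truncates long text. The maximum length of 20 characters, the "..." marker, the name[text] layout and the class name ShowTextSketch are all assumptions made for the example.

```java
// Hypothetical sketch; the length limit and output layout are assumptions.
public class ShowTextSketch {

    /** Assumed maximum number of characters kept from the text. */
    static final int MAX_LENGTH = 20;

    /** Collapses whitespace runs to single spaces and truncates long text. */
    public static String normalizeAndShorten(String text) {
        if (text == null) {
            return "";
        }
        String normalized = text.trim().replaceAll("\\s+", " ");
        if (normalized.length() <= MAX_LENGTH) {
            return normalized;
        }
        return normalized.substring(0, MAX_LENGTH) + "...";
    }

    /** Joins an element name and its shortened text for log output. */
    public static String show(String elementName, String text) {
        return elementName + "[" + normalizeAndShorten(text) + "]";
    }

    public static void main(String[] args) {
        System.out.println(show("p", "  Hello \n  world ")); // prints "p[Hello world]"
        System.out.println(show("p", "abcdefghijklmnopqrstuvwxyz"));
    }
}
```

Keeping log lines short and free of line breaks is the point of the normalization; the concrete limits used by the class itself may differ.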

showToken

public static String showToken(Element element,
                               String token)
Builds a simple partial representation of a textual token in an element, containing the name of the element and the normalized and shortened text of the token. Useful for logging.

Parameters:
element - the element containing the token; must not be null
token - the token to show (may be null)
Returns:
a simple representation joining element and token

writeDocument

public static void writeDocument(Document document,
                                 File file,
                                 TiesConfiguration config,
                                 String suffix)
                          throws IOException
Writes an XML document to a file, consulting a given configuration about whether to use compression.

Parameters:
document - the document to write
file - the file to write the document to
config - used to decide whether to use compression
suffix - an optional suffix that allows overriding the general value of the configuration parameter with a more specific value
Throws:
IOException - if an I/O error occurs while writing

writeDocument

public static void writeDocument(Document document,
                                 OutputStream out)
                          throws IOException
Writes an XML document to a given stream.

Parameters:
document - the document to write
out - the stream to write the document to; flushed but not closed by this method
Throws:
IOException - if an I/O error occurs during writing

writeDocument

public static void writeDocument(Document document,
                                 OutputStreamWriter writer)
                          throws IOException
Writes an XML document to a given writer, using the character set of the underlying output stream.

Parameters:
document - the document to write
writer - the writer to write the document to; flushed but not closed by this method
Throws:
IOException - if an I/O error occurs during writing

writeDocument

public static void writeDocument(Document document,
                                 Writer writer,
                                 String charset)
                          throws IllegalArgumentException,
                                 IOException
Writes an XML document to a given writer, using the given character set.

Parameters:
document - the document to write
writer - the writer to write the document to; flushed but not closed by this method
charset - the character set of the writer; this must be a valid charset name (not null or empty etc.) and should be the canonical (standard) name of the charset in use
Throws:
IllegalArgumentException - if the specified charset is null or empty
IOException - if an I/O error occurs during writing


Copyright © 2003-2007 Christian Siefkes. All Rights Reserved.