de.fu_berlin.ties.xml.dom
Class DOMUtils

java.lang.Object
  extended by de.fu_berlin.ties.xml.dom.DOMUtils

public final class DOMUtils
extends Object

A static class that provides utility constants and methods for working with DOM-like XML representations, focussing especially on dom4j. No instances of this class can be created; only the static members should be used.

Version:
$Revision: 1.6 $, $Date: 2004/02/26 19:32:10 $, $Author: siefkes $
Author:
Christian Siefkes

Method Summary
static Attribute attributeByName(Element element, String name)
          Returns the attribute with the given name, compatible with the name format returned by name(Attribute).
static void collectText(Branch branch, StringBuffer appender)
          Recursively collects the complete textual content of a branch, i.e. a document or element.
static void collectText(Branch branch, Writer writer)
          Recursively collects the complete textual content of a branch, i.e. a document or element.
static List elementsByName(Element element, String name)
          Returns the child elements with the given name, compatible with the name format returned by name(Element).
static String name(Attribute attrib)
          Static method that returns a String representing the name of an attribute in an XML document.
static String name(Element element)
          Static method that returns a String representing the name of an element in an XML document.
static Document readDocument(File file, Configuration config)
          Reads an XML document from a local file, using a configured charset.
static Document readDocument(File file, String charset)
          Reads an XML document from a local file, using a given charset.
static Document readDocument(InputStream in)
          Reads an XML document from a given stream.
static Document readDocument(Reader reader)
          Reads an XML document from a given reader.
static String showElement(Element element)
          Builds a simple partial representation of an element, containing the name of the element and its trimmed textual content.
static String showToken(Element element, String token)
          Builds a simple partial representation of a textual token in an element, containing the name of the element and token.
static void writeDocument(Document document, OutputStream out)
          Writes an XML document to a given stream.
static void writeDocument(Document document, OutputStreamWriter writer)
          Writes an XML document to a given writer, using the character set of the underlying output stream.
static void writeDocument(Document document, Writer writer, String charset)
          Writes an XML document to a given writer, using the given character set.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

attributeByName

public static Attribute attributeByName(Element element,
                                        String name)
Returns the attribute with the given name, compatible with the name format returned by name(Attribute). If there is more than one attribute with the given name (e.g. in different namespaces), the first one is returned.

Parameters:
element - the element whose attribute to return
name - the name of the attribute, compatible with the name format returned by name(Attribute)
Returns:
the (first) matching attribute or null if none exists
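The lookup rule above (match on the local name, return the first hit or null) can be sketched with the JDK's built-in W3C DOM so that it runs without dom4j. This is a hypothetical analogue, not the DOMUtils implementation: the class `AttributeByNameSketch` and its helpers are invented for illustration. Namespace declarations (`xmlns:*`) are skipped because the W3C DOM reports them as attributes while dom4j does not, and since `NamedNodeMap` does not guarantee document order, "first" here means first in iteration order.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical W3C-DOM analogue of DOMUtils.attributeByName (the original works on dom4j). */
final class AttributeByNameSketch {

    /** Local name of a node; falls back to the raw node name if the DOM is not namespace-aware. */
    static String localName(Node node) {
        String local = node.getLocalName();
        return (local != null) ? local : node.getNodeName();
    }

    /** Returns the first attribute whose local name equals {@code name}, or null if none does. */
    static Attr attributeByName(Element element, String name) {
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Attr attr = (Attr) attributes.item(i);
            // Skip namespace declarations such as xmlns:x, which are not real attributes here
            if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(attr.getNamespaceURI())) {
                continue;
            }
            if (name.equals(localName(attr))) {
                return attr;
            }
        }
        return null;
    }

    /** Parses with a namespace-aware parser so that getLocalName() is populated. */
    static Element parseRoot(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }
}
```

With `<p xmlns:x='urn:x' x:lang='de' id='p1'/>`, looking up `"lang"` finds `x:lang` even though the caller never mentions the prefix.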

collectText

public static void collectText(Branch branch,
                               StringBuffer appender)
Recursively collects the complete textual content of a branch, i.e. a document or element.

Parameters:
branch - the branch to recurse
appender - the collected text of the branch and all its child elements is appended to this string buffer
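A minimal sketch of the recursive collection, written against the JDK's W3C DOM rather than dom4j's `Branch`; the class name `CollectTextSketch` is hypothetical. It assumes that only text and CDATA nodes contribute text and that comments and processing instructions are ignored; the documentation above does not say how the original treats them.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical W3C-DOM analogue of DOMUtils.collectText(Branch, StringBuffer). */
final class CollectTextSketch {

    /** Appends the text of all descendant text and CDATA nodes, in document order. */
    static void collectText(Node branch, StringBuffer appender) {
        NodeList children = branch.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            switch (child.getNodeType()) {
                case Node.TEXT_NODE:
                case Node.CDATA_SECTION_NODE:
                    appender.append(child.getNodeValue());
                    break;
                case Node.ELEMENT_NODE:
                    collectText(child, appender); // recurse into child elements
                    break;
                default:
                    break; // assumption: comments and PIs contribute no text
            }
        }
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }
}
```

Because a `Document` node's only element child is the root, the same method works for whole documents and for single elements, mirroring the "document or element" wording.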

collectText

public static void collectText(Branch branch,
                               Writer writer)
                        throws IOException
Recursively collects the complete textual content of a branch, i.e. a document or element.

Parameters:
branch - the branch to recurse
writer - the collected text of the branch and all its child elements is appended to this writer; flushed but not closed by this method
Throws:
IOException - if an I/O error occurs while writing to the writer
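The Writer variant differs mainly in its resource contract: the writer is flushed but left open for the caller. The sketch below (hypothetical class `CollectTextWriterSketch`, JDK W3C DOM instead of dom4j) flushes once after the recursion finishes and never calls `close()`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical W3C-DOM analogue of DOMUtils.collectText(Branch, Writer). */
final class CollectTextWriterSketch {

    static void collectText(Node branch, Writer writer) throws IOException {
        appendText(branch, writer);
        writer.flush(); // make the text visible downstream; closing stays with the caller
    }

    private static void appendText(Node branch, Writer writer) throws IOException {
        for (Node child = branch.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                writer.write(child.getNodeValue());
            } else if (type == Node.ELEMENT_NODE) {
                appendText(child, writer); // recurse without flushing at every level
            }
        }
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }
}
```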

elementsByName

public static List elementsByName(Element element,
                                  String name)
Returns the child elements with the given name, compatible with the name format returned by name(Element). If no elements are found, this method returns an empty list.

Parameters:
element - the element whose child elements to return
name - the name of the child elements, compatible with the name format returned by name(Element)
Returns:
a list of all the child Elements with the given name
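A sketch of the same contract with the JDK's W3C DOM (hypothetical class `ElementsByNameSketch`, not the dom4j original): only direct children are examined, matching is on the local name, and the result is empty rather than null when nothing matches. The explicit child loop is deliberate, since `getElementsByTagNameNS` would also return nested descendants.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical W3C-DOM analogue of DOMUtils.elementsByName. */
final class ElementsByNameSketch {

    /** Returns the direct child elements whose local name matches; never null. */
    static List<Element> elementsByName(Element element, String name) {
        List<Element> matches = new ArrayList<>();
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip text, comments and processing instructions
            }
            String local = child.getLocalName();
            String childName = (local != null) ? local : child.getNodeName();
            if (name.equals(childName)) {
                matches.add((Element) child);
            }
        }
        return matches;
    }

    static Element parseRoot(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getLocalName()
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }
}
```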

name

public static String name(Attribute attrib)
Static method that returns a String representing the name of an attribute in an XML document. This method should always be used when building context representations and related structures to ensure that attributes are represented in a unified way. See name(Element) for details.

Parameters:
attrib - the attribute to name
Returns:
the name to use for this attribute

name

public static String name(Element element)
Static method that returns a String representing the name of an element in an XML document. This method should always be used when building context representations and related structures to ensure that elements are represented in a unified way. Please don't call Node.getName() or Element.getQualifiedName() or similar methods directly in such cases.

Currently, only the local name is used; namespace URIs and namespace prefixes are ignored. Including namespace prefixes in context representations would be quite useless, because in different documents different prefixes can represent the same namespace, and vice versa.

Including namespace URIs might lead to higher precision by avoiding the risk of confusing elements from totally different namespaces. On the other hand, it might lead to lower recall and slower learning, because elements from similar namespaces (e.g. different versions of the HTML standard) would all be considered separate from each other.

Parameters:
element - the element to name
Returns:
the name to use for this element
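To make the trade-off concrete, the sketch below uses the JDK's W3C DOM and a hypothetical `name(Node)` helper that, like the documented rule, keeps only the local name; it is an illustration, not the dom4j-based method itself. Two elements bound to the same namespace through different prefixes get the same name (the reason prefixes are ignored), and an element from an unrelated namespace gets that name too (the precision cost described above).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical illustration of the "local name only" naming rule. */
final class LocalNameSketch {

    /** Counterpart of name(Element)/name(Attribute): drop both prefix and namespace URI. */
    static String name(Node node) {
        String local = node.getLocalName();
        return (local != null) ? local : node.getNodeName();
    }

    /** Names of the root's child elements, as computed by name(Node). */
    static List<String> childNames(Element root) {
        List<String> names = new ArrayList<>();
        for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(name(child));
            }
        }
        return names;
    }

    static Element parseRoot(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }
}
```

For `<doc xmlns:a='urn:v1' xmlns:b='urn:v1' xmlns:c='urn:v2'><a:item/><b:item/><c:item/></doc>`, all three children are named `item`, while `getNodeName()` would yield `a:item`, `b:item` and `c:item`. Keying on the namespace URI plus local name instead would group `a:item` with `b:item` but keep `c:item` apart.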

readDocument

public static Document readDocument(File file,
                                    Configuration config)
                             throws DocumentException,
                                    FileNotFoundException,
                                    UnsupportedEncodingException
Reads an XML document from a local file, using a configured charset. Delegates to IOUtils.openReader(File, Configuration) to determine the character set.

Parameters:
file - the file to read
config - the configuration to use
Returns:
the newly created document
Throws:
DocumentException - if an error occurs during parsing
FileNotFoundException - if the file does not exist, is a directory rather than a regular file, or for some other reason cannot be opened for reading
UnsupportedEncodingException - if the configured charset is not supported

readDocument

public static Document readDocument(File file,
                                    String charset)
                             throws DocumentException,
                                    FileNotFoundException,
                                    UnsupportedEncodingException
Reads an XML document from a local file, using a given charset.

Parameters:
file - the file to read
charset - the character set to use for reading the file; if null, the default charset of the current platform is used
Returns:
the newly created document
Throws:
DocumentException - if an error occurs during parsing
FileNotFoundException - if the file does not exist, is a directory rather than a regular file, or for some other reason cannot be opened for reading
UnsupportedEncodingException - if the named charset is not supported
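A sketch of this behaviour with JDK classes only; the class `ReadDocumentSketch` is hypothetical, and JAXP's `SAXException` and `ParserConfigurationException` stand in for dom4j's `DocumentException`. Parsing from a `Reader` makes the given charset authoritative: the parser receives characters rather than bytes, so an `encoding` in the XML declaration is not used for decoding. Passing the charset as a `String` to `InputStreamReader` is what produces `UnsupportedEncodingException` for unknown names (`Charset.forName` would throw the unchecked `UnsupportedCharsetException` instead).

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical JAXP analogue of DOMUtils.readDocument(File, String). */
final class ReadDocumentSketch {

    /**
     * Reads an XML file using the given charset, or the platform default if charset is null.
     * FileNotFoundException and UnsupportedEncodingException both surface as IOException subtypes.
     */
    static Document readDocument(File file, String charset)
            throws IOException, SAXException, ParserConfigurationException {
        try (InputStream in = new FileInputStream(file)) { // the method opened the file, so it closes it
            Reader reader = (charset == null)
                    ? new InputStreamReader(in)            // platform default charset
                    : new InputStreamReader(in, charset);  // may throw UnsupportedEncodingException
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(reader));
        }
    }
}
```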

readDocument

public static Document readDocument(InputStream in)
                             throws DocumentException
Reads an XML document from a given stream.

Parameters:
in - stream containing the text to parse; not closed by this method
Returns:
the newly created document
Throws:
DocumentException - if an error occurs during parsing

readDocument

public static Document readDocument(Reader reader)
                             throws DocumentException
Reads an XML document from a given reader.

Parameters:
reader - reader containing the text to parse; not closed by this method
Returns:
the newly created document
Throws:
DocumentException - if an error occurs during parsing
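Honouring "not closed by this method" can take a little care, because JAXP parsers such as the JDK's built-in one may close their input once parsing ends. One way to keep the caller's reader open, sketched here with a hypothetical class `ReaderInputSketch` (JDK DOM, not dom4j), is to hand the parser a wrapper whose `close()` does nothing.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical JAXP analogue of DOMUtils.readDocument(Reader). */
final class ReaderInputSketch {

    /** Parses XML from the reader without closing it, so the caller keeps ownership. */
    static Document readDocument(Reader reader)
            throws IOException, SAXException, ParserConfigurationException {
        // Delegate all reads to the caller's reader but ignore any close() issued by the parser
        Reader shielded = new FilterReader(reader) {
            @Override
            public void close() {
                // intentionally empty: closing is the caller's responsibility
            }
        };
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(shielded));
    }
}
```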

showElement

public static String showElement(Element element)
Builds a simple partial representation of an element, containing the name of the element and its trimmed textual content. Useful for logging.

Parameters:
element - the element to show
Returns:
a simple partial representation of the element
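The exact output format of showElement is not documented here, so the sketch below invents one (`name: text`) purely for illustration; the class `ShowElementSketch` is hypothetical and uses the JDK's W3C DOM. It combines the local-name rule with the element's complete, trimmed textual content.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical analogue of DOMUtils.showElement with an assumed "name: text" format. */
final class ShowElementSketch {

    static String showElement(Element element) {
        String local = element.getLocalName();
        String name = (local != null) ? local : element.getNodeName();
        // getTextContent() concatenates all descendant text (comments and PIs excluded)
        return name + ": " + element.getTextContent().trim();
    }

    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }
}
```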

showToken

public static String showToken(Element element,
                               String token)
Builds a simple partial representation of a textual token in an element, containing the name of the element and the token. Useful for logging.

Parameters:
element - the element to show
token - the token to show
Returns:
a simple partial representation of the token in the element

writeDocument

public static void writeDocument(Document document,
                                 OutputStream out)
                          throws IOException
Writes an XML document to a given stream.

Parameters:
document - the document to write
out - the stream to write the document text to; flushed but not closed by this method
Throws:
IOException - if an I/O error occurs during writing
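A sketch of the stream variant using the JDK's identity `Transformer` in place of dom4j's writer; the class `WriteStreamSketch` and the choice of UTF-8 are assumptions, since the documentation does not say which encoding the original emits. Transformer errors are wrapped so that only the documented `IOException` escapes, and the stream is flushed but not closed.

```java
import java.io.IOException;
import java.io.OutputStream;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

/** Hypothetical JAXP analogue of DOMUtils.writeDocument(Document, OutputStream). */
final class WriteStreamSketch {

    static void writeDocument(Document document, OutputStream out) throws IOException {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8"); // assumed encoding
            transformer.transform(new DOMSource(document), new StreamResult(out));
        } catch (TransformerException e) {
            throw new IOException("could not serialize document", e); // keep the IOException contract
        }
        out.flush(); // flushed, but the caller still owns (and closes) the stream
    }
}
```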

writeDocument

public static void writeDocument(Document document,
                                 OutputStreamWriter writer)
                          throws IOException
Writes an XML document to a given writer, using the character set of the underlying output stream.

Parameters:
document - the document to write
writer - the writer to write the document text to; flushed but not closed by this method
Throws:
IOException - if an I/O error occurs during writing
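One detail when taking the character set from an `OutputStreamWriter`: `getEncoding()` returns the historical name where one exists (e.g. `UTF8` or `ISO8859_1`) and may return null once the stream is closed. The hypothetical sketch below (JDK `Transformer`, not dom4j) maps that name to the canonical one via `Charset.forName(...).name()` before writing it into the XML declaration; whether the original performs such a mapping is not stated.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

/** Hypothetical JAXP analogue of DOMUtils.writeDocument(Document, OutputStreamWriter). */
final class WriteOutputStreamWriterSketch {

    static void writeDocument(Document document, OutputStreamWriter writer) throws IOException {
        String historical = writer.getEncoding(); // may be null if the writer has been closed
        if (historical == null) {
            throw new IOException("writer is closed");
        }
        // Map e.g. "ISO8859_1" to the canonical "ISO-8859-1" for the XML declaration
        String charset = Charset.forName(historical).name();
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, charset);
            transformer.transform(new DOMSource(document), new StreamResult(writer));
        } catch (TransformerException e) {
            throw new IOException("could not serialize document", e);
        }
        writer.flush(); // flushed, not closed
    }
}
```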

writeDocument

public static void writeDocument(Document document,
                                 Writer writer,
                                 String charset)
                          throws IOException
Writes an XML document to a given writer, using the given character set.

Parameters:
document - the document to write
writer - the writer to write the document text to; flushed but not closed by this method
charset - the character set of the writer
Throws:
IOException - if an I/O error occurs during writing
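With a plain `Writer`, characters are already turned into bytes (or kept as characters) by the writer itself, so the `charset` argument can only be declared in the XML header and tell the serializer which characters need escaping. The hypothetical sketch below (JDK `Transformer`, not dom4j) shows that reading; if the argument does not match the writer's real encoding, the declaration will be wrong, so keeping the two consistent is the caller's responsibility.

```java
import java.io.IOException;
import java.io.Writer;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

/** Hypothetical JAXP analogue of DOMUtils.writeDocument(Document, Writer, String). */
final class WriteWriterSketch {

    static void writeDocument(Document document, Writer writer, String charset) throws IOException {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Only declared in the output; the writer does the actual encoding
            transformer.setOutputProperty(OutputKeys.ENCODING, charset);
            transformer.transform(new DOMSource(document), new StreamResult(writer));
        } catch (TransformerException e) {
            throw new IOException("could not serialize document", e);
        }
        writer.flush(); // flushed, not closed
    }
}
```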


Copyright © 2003-2004 Christian Siefkes. All Rights Reserved.