Reading Data into a DOM


1. Generating a DOM from XML Data

In this section of the tutorial, you'll construct a Document Object Model (DOM) by reading in an existing XML file. Then you'll write it out as XML to verify that the program is working.

Reading an XML Document into a DOM

The Document Object Model (DOM) provides APIs that let you create nodes, modify them, and delete or rearrange them. So it is relatively easy to create a DOM, as you'll see later in section 5 of this tutorial, Creating and Manipulating a DOM.

However, the Level 1 DOM specification is silent on the subject of input and output. It tells you how a DOM has to operate, but does not cover methods for reading or writing XML. As a result, you can't create a DOM from an existing XML file without going outside the DOM Level 1 specification.

The JAXP DocumentBuilder class standardizes the solution to that problem by specifying a variety of parse methods that take either a File object, an input stream, a SAX InputSource object, or a URI. When you invoke one of those methods, a DocumentBuilder implementation returns an org.w3c.dom.Document object.
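
As a rough illustration of those parse methods, the sketch below feeds the same XML text to a builder twice, once as an input stream and once as a SAX InputSource. The class name ParseOverloads, its helper methods, and the sample XML are made up for this example, and the data is held in memory only so that the sketch runs without any files.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ParseOverloads {

    // Creates a builder with the factory's default settings.
    static DocumentBuilder newBuilder() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    // parse(InputStream): the parser reads raw bytes.
    static String rootViaStream(String xml) throws Exception {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        Document doc = newBuilder().parse(new ByteArrayInputStream(bytes));
        return doc.getDocumentElement().getTagName();
    }

    // parse(InputSource): a SAX InputSource can wrap a character stream.
    static String rootViaInputSource(String xml) throws Exception {
        Document doc = newBuilder().parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTagName();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample data, not one of the tutorial's files.
        String xml = "<slideshow title=\"Sample\"><slide/></slideshow>";

        System.out.println(rootViaStream(xml));
        System.out.println(rootViaInputSource(xml));

        // The remaining overloads take a File or a URI string, for example:
        //   builder.parse(new java.io.File("slideSample01.xml"));
    }
}
```

Either way, the result is an org.w3c.dom.Document, so the rest of a program does not need to know where the XML came from.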

Note:
To output the DOM, you'll use a feature of the reference implementation. Parsers from different vendors may well use other mechanisms to achieve that goal.

Create the Skeleton

Now that you've had a quick overview of how to create a DOM, let's build a simple program that reads an XML document into a DOM and then writes it back out again.

Note:
The code discussed in this section is in DomEcho01.java. The files it operates on are slideSample01.xml and slideSample10.xml. The processing output is in DomEcho01-01.log and DomEcho01-10.log.

Start with the normal basic logic for an app, and check to make sure that an argument has been supplied on the command line:

public class DomEcho {

    public static void main (String argv [])
    {
        if (argv.length != 1) {
            System.err.println ("Usage: java DomEcho filename");
            System.exit (1);
        }
    }// main

}// DomEcho

Import the Required Classes

In this section, you're going to see all the classes individually named. That's so you can see where each class comes from when you want to reference the API documentation. In your own apps, you may well want to replace import statements like those below with the shorter form: javax.xml.parsers.*.

Add these lines to import the JAXP APIs you'll be using:

import javax.xml.parsers.DocumentBuilderFactory;  
import javax.xml.parsers.FactoryConfigurationError;  
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.DocumentBuilder;

Add these lines for the exceptions that can be thrown when the XML document is parsed:

import org.xml.sax.SAXException;  
import org.xml.sax.SAXParseException;

Add these lines to read the sample XML file and identify errors:

import java.io.File;
import java.io.IOException;

Finally, import the W3C definition for a DOM and DOM exceptions:

import org.w3c.dom.Document;
import org.w3c.dom.DOMException;

Note:
DOMExceptions are only thrown when traversing or manipulating a DOM. Errors that occur during parsing are reported using a different mechanism that is covered below.

Declare the DOM

The org.w3c.dom.Document interface is the W3C name for a Document Object Model (DOM). Whether you parse an XML document or create one, a Document instance will result. We'll want to reference that object from another method later on in the tutorial, so define it as a global object here:

public class DomEcho
{    
    static Document document;

    public static void main (String argv [])
    {

It needs to be static, because you're going to generate its contents from the main method in a few minutes.

Handle Errors

Next, put in the error handling logic. This is the same logic you saw in Handling Errors in the SAX tutorial, so we won't go into it in detail here. The only point worth noting is that a JAXP-conformant document builder is required to report SAX exceptions when it has trouble parsing the XML document. The DOM parser does not have to actually use a SAX parser internally, but since the SAX standard was already there, it seemed to make sense to use it for reporting errors. As a result, the error-handling code for DOM and SAX applications is pretty much identical.

public static void main (String argv [])
{
    if (argv.length != 1) {
        ...
    }

    try {

    } catch (SAXParseException spe) {
       // Error generated by the parser
       System.out.println ("\n** Parsing error" 
          + ", line " + spe.getLineNumber ()
          + ", uri " + spe.getSystemId ());
       System.out.println("   " + spe.getMessage() );

       // Use the contained exception, if any
       Exception  x = spe;
       if (spe.getException() != null)
           x = spe.getException();
       x.printStackTrace();

    } catch (SAXException sxe) {
       // Error generated by this application
       // (or a parser-initialization error)
       Exception  x = sxe;
       if (sxe.getException() != null)
           x = sxe.getException();
       x.printStackTrace();

    } catch (ParserConfigurationException pce) {
       // Parser with specified options can't be built
       pce.printStackTrace();

    } catch (IOException ioe) {
       // I/O error
       ioe.printStackTrace();
    }

}// main
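
To see that reporting in action, here is a minimal sketch that parses a string and summarizes the outcome using the same catch ordering. The class name ParseErrorDemo, the describe helper, and the sample inputs are invented for this example. Note that SAXParseException has to be caught before SAXException, because it is a subclass.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ParseErrorDemo {

    // Parses the text and returns a one-line summary of what happened.
    static String describe(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return "Parsed without errors";

        } catch (SAXParseException spe) {
            // Error generated by the parser; carries location information.
            return "Parsing error, line " + spe.getLineNumber()
                + ": " + spe.getMessage();

        } catch (SAXException sxe) {
            // Any other SAX error.
            return "SAX error: " + sxe.getMessage();

        } catch (ParserConfigurationException pce) {
            // Parser with the requested options can't be built.
            return "Parser configuration error: " + pce.getMessage();

        } catch (IOException ioe) {
            return "I/O error: " + ioe.getMessage();
        }
    }

    public static void main(String[] args) {
        // Well-formed input.
        System.out.println(describe("<slideshow><slide/></slideshow>"));
        // Hypothetical malformed input: the <slide> element is never closed.
        System.out.println(describe("<slideshow>\n<slide>\n</slideshow>"));
    }
}
```

Some parsers may also print their own diagnostic to standard error when no ErrorHandler has been installed, so you may see an extra line of output for the malformed case.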

Instantiate the Factory

Next, add the code highlighted below to obtain an instance of a factory that can give us a document builder:

public static void main (String argv [])
{
    if (argv.length != 1) {
        ...
    }
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    try {

Get a Parser and Parse the File

Now, add the code highlighted below to get an instance of a builder, and use it to parse the specified file:

try {
   DocumentBuilder builder = factory.newDocumentBuilder();
   document = builder.parse( new File(argv[0]) );

} catch (SAXParseException spe) {

Write out the XML

At this point, the code has the ability to read in and parse an XML document. To write out the document for inspection, you'll need to step outside the DOM level 1 standard that makes up the DOM section of JAXP. DOM write operations are not specified until DOM level 3. To get by in the meantime, you'll cast the Document object returned by DocumentBuilder to the real object that the reference implementation returns: XmlDocument.

Note:
This material is specific to Project X, Sun's reference implementation for the JAXP standard. The material in this section is not part of the standard. Instead, it represents helpful functionality that you may need to take advantage of until some equivalent mechanism is standardized. Because it is not part of the JAXP standard, the functionality described here may very well not exist in other JAXP-standard parsers. In fact, as standards evolve, future versions of the JAXP reference implementation could employ different mechanisms to achieve the same goals.
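
For comparison, here is a sketch of one such different mechanism. It assumes a later JAXP release that includes the javax.xml.transform package, which is not part of the JAXP version this section describes. In that API, a Transformer created without a stylesheet copies its source to its result unchanged, which serializes a DOM. The class name DomWriteSketch and its helper methods are made up for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DomWriteSketch {

    // Serializes a DOM to a string using an identity transform.
    static String toXml(Document document) throws Exception {
        // newTransformer() with no stylesheet copies the source to the result.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }

    // Parses XML text into a DOM, then writes the DOM back out as text.
    static String roundTrip(String xml) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return toXml(document);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical in-memory input; a File argument would work the same way.
        System.out.println(roundTrip("<slideshow><slide type=\"all\"/></slideshow>"));
    }
}
```

Because this approach uses only javax.xml and org.w3c.dom classes, it avoids the cast to an implementation-specific class, at the cost of requiring the newer API.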

Use XmlDocument

Start by adding the import statement that defines the class:

import org.w3c.dom.Document;
import org.w3c.dom.DOMException;

import com.sun.xml.tree.XmlDocument;

public class DomEcho
{

(The com.sun. prefix on that class is your tipoff to the fact that you are moving outside the JAXP standard, and making use of a feature in Sun's reference implementation.)

Next, add the code highlighted below to cast the document object to XmlDocument and write it out:

try {
  DocumentBuilder builder = factory.newDocumentBuilder();
  document = builder.parse( new File(argv[0]) ); 
           
  XmlDocument xdoc = (XmlDocument) document;
  xdoc.write (System.out);

} catch (SAXParseException spe) {

Run the Program

Throughout most of the DOM tutorial, you'll be using the sample slideshows you created in the SAX section. In particular, you'll use slideSample01.xml, a simple XML file without much in it, and slideSample10.xml, a more complex example that includes a DTD, processing instructions, entity references, and a CDATA section.

For instructions on how to compile and run your program, see Compiling the Program and Running the Program, from the SAX tutorial. Substitute "DomEcho" for "Echo" as the name of the program, and you're ready to roll. When you run the program on slideSample01.xml, this is the output you see:

<?xml version="1.0" encoding="UTF-8"?>

<!--  A SAMPLE set of slides  -->
<slideshow title="Sample Slide Show" date="Date of publication" author="Yours Truly">

    
  <!-- TITLE SLIDE -->
    
  <slide type="all">
      
    <title>Wake up to WonderWidgets!</title>
    
  </slide>

    
  <!-- OVERVIEW -->
    
  <slide type="all">
      
    <title>Overview</title>
      
    <item>Why 
      <em>WonderWidgets</em> are great
    </item>
      
    <item />
      
    <item>Who 
      <em>buys</em> WonderWidgets
    </item>
    
  </slide>


</slideshow>

When you run the program on slideSample10.xml, the result is pretty similar. In particular, note that the entity reference stayed as it was originally written (it was not replaced with the entity text):

<item>
 &copyright;
</item>

Also, notice that the CDATA section has been preserved:

      
    <item>
      <![CDATA[Diagram:
                         
    frobmorten <------------ fuznaten
        |            <3>        ^
        | <1>                   |   <1> = fozzle
        V                       |   <2> = framboze    
      staten--------------------+   <3> = frenzle
                     <2>
      ]]>
    </item>
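
If you want to confirm that a CDATA section survives as its own node in the DOM, rather than only in the written output, one possible check is sketched below. It assumes a JAXP release whose DocumentBuilderFactory offers the setCoalescing method, which asks the parser to convert CDATA sections into ordinary text. The class name CdataSketch, the helper method, and the one-line sample are invented for this example.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CdataSketch {

    // Counts the CDATA section nodes directly under the root element.
    static int countCdataChildren(String xml, boolean coalescing) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // When true, the parser converts CDATA sections to ordinary text nodes.
        factory.setCoalescing(coalescing);

        Document document = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        NodeList children = document.getDocumentElement().getChildNodes();
        int count = 0;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.CDATA_SECTION_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical one-line stand-in for the diagram in slideSample10.xml.
        String xml = "<item><![CDATA[frobmorten <--- fuznaten]]></item>";

        System.out.println("Without coalescing: " + countCdataChildren(xml, false));
        System.out.println("With coalescing:    " + countCdataChildren(xml, true));
    }
}
```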

Additional Information

Now that you have successfully read in a DOM, there are one or two more things you need to know in order to use DocumentBuilder effectively. Namely, you need to know about:

Configuring the Factory
Handling Validation Errors

Configuring the Factory

By default, the factory returns a nonvalidating parser that knows nothing about namespaces. To get a validating parser, and/or one that understands namespaces, you configure the factory to set either or both of those options using the command(s) highlighted below:

public static void main (String argv [])
{
    if (argv.length != 1) {
        ...
    }
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setValidating(true);
    factory.setNamespaceAware(true);
    try {
        ...

Note:
JAXP-conformant parsers are not required to support all combinations of those options, even though the reference parser does. If you specify an invalid combination of options, the factory generates a ParserConfigurationException when you attempt to obtain a parser instance.
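
To get a feel for what the namespace option changes, the sketch below parses the same text with the option on and off and asks the root element for its namespace URI. It assumes a DOM Level 2 (or later) implementation, since getNamespaceURI is not part of DOM Level 1. The class name NamespaceAwareSketch, the helper method, and the urn:example:slides URI are all made up for this example.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespaceAwareSketch {

    // Returns the namespace URI the parser reports for the root element.
    static String rootNamespace(String xml, boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);

        Document document = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return document.getDocumentElement().getNamespaceURI();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical namespace URI, used only for illustration.
        String xml = "<s:slideshow xmlns:s=\"urn:example:slides\"/>";

        System.out.println("aware:     " + rootNamespace(xml, true));
        System.out.println("not aware: " + rootNamespace(xml, false));
    }
}
```

With the option off, the parser treats s:slideshow as a plain element name and reports no namespace for it.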

You'll be learning more about how to use namespaces in the last section of the DOM tutorial, Using Namespaces. To complete this section, though, you'll want to learn something about...

Handling Validation Errors

Remember when you were wading through the SAX tutorial, and all you really wanted to do was construct a DOM? Well, here's when that information begins to pay off.

Recall that the default response to a validation error, as dictated by the SAX standard, is to do nothing. The JAXP standard requires throwing SAX exceptions, so you use exactly the same error handling mechanisms as you used for a SAX app. In particular, you need to use the DocumentBuilder's setErrorHandler method to supply it with an object that implements the SAX ErrorHandler interface.

Note:
DocumentBuilder also has a setEntityResolver method you can use.

The code below uses an anonymous inner class adapter to provide that ErrorHandler. The highlighted code is the part that makes sure validation errors generate an exception.

builder.setErrorHandler(
  new org.xml.sax.ErrorHandler() {
      // ignore fatal errors (an exception is guaranteed)
      public void fatalError(SAXParseException exception)
      throws SAXException {
      }

      // treat validation errors as fatal
      public void error (SAXParseException e)
      throws SAXParseException
      {
        throw e;
      }

      // dump warnings too
      public void warning (SAXParseException err)
      throws SAXParseException
      {
        System.out.println ("** Warning"
           + ", line " + err.getLineNumber ()
           + ", uri " + err.getSystemId ());
        System.out.println("   " + err.getMessage ());
      }
  }
);

This code uses an anonymous inner class to generate an instance of an object that implements the ErrorHandler interface. Since it has no class name, it's "anonymous". You can think of it as an "ErrorHandler" instance, although technically it's a no-name instance that implements the specified interface. The code is substantially the same as that described in the Handling Errors section of the SAX tutorial. For more background on validation issues, refer to Using the Validating Parser in that part of the tutorial.
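
Putting the pieces together, here is a self-contained sketch of a validating parse that turns validation errors into exceptions. The class name ValidationSketch, the validate helper, and the small inline DTD are invented for this example. Unlike the handler above, this one rethrows fatal errors explicitly and silently ignores warnings, which is a simplification rather than a recommendation.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ValidationSketch {

    // Hypothetical DTD: a slideshow holds one or more slides of plain text.
    static final String DTD =
        "<!DOCTYPE slideshow [\n"
        + "  <!ELEMENT slideshow (slide+)>\n"
        + "  <!ELEMENT slide (#PCDATA)>\n"
        + "]>\n";

    // Returns "valid", or a description of the first error the parser reports.
    static String validate(String body) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();

        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { /* ignored in this sketch */ }
            // Treat validation errors as fatal by rethrowing them.
            public void error(SAXParseException e) throws SAXException { throw e; }
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });

        try {
            builder.parse(new InputSource(new StringReader(DTD + body)));
            return "valid";
        } catch (SAXParseException spe) {
            return "line " + spe.getLineNumber() + ": " + spe.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // Conforms to the DTD.
        System.out.println(validate("<slideshow><slide>Overview</slide></slideshow>"));
        // Uses an element the DTD does not declare.
        System.out.println(validate("<slideshow><item>Oops</item></slideshow>"));
    }
}
```

If the error method did not throw, the second parse would complete and return a Document even though the content violates the DTD.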

Note:
Inner classes are supported in version 1.2 and later versions of the Java Platform. If you are coding for version 1.1, create an external class that implements ErrorHandler as shown above, and use that.

Looking Ahead

At this point, you have successfully parsed an XML document and written it out. To do anything useful with the DOM, though, you will need to know more about its structure. For example, how do the entity references and CDATA sections appear in the DOM?

Another interesting question is: How can you convert an existing data structure into an XML document? You'll get answers to those questions in the sections ahead.

In the next section, you'll take a quick look at a mechanism you can use in the JAXP reference implementation (Project X) to convert an arbitrary data structure into XML (assuming that you already have a program capable of reading that structure). In the section after that, you'll display the DOM in a JTree so you can begin to understand its internal structure.
