Examining DOM Structure


3b. Examining the Structure of a DOM

In this section, you'll use the GUI-fied DomEcho app you created in the last section to visually examine a DOM. You'll see what nodes make up the DOM, and how they are arranged. With the understanding you acquire, you'll be well prepared to construct and modify Document Object Model structures in the future.

Displaying A Simple Tree

We'll start out by displaying a simple file, so you get an idea of basic DOM structure. Then we'll look at the structure that results when you include some of the more advanced XML elements.

Note:
The code used to create the figures in this section is in DomEcho02.java. The file displayed is slideSample01.xml.

Figure 1 shows the tree you see when you run the DomEcho program on the first XML file you created in the DOM tutorial.

Figure 1: Document, Comment, and Element Nodes Displayed

Recall that the first bit of text displayed for each node is the node type. After that comes the node name, if any, and then the node value. This view shows three node types: Document, Comment, and Element. There is only one Document node in the whole tree: the root node. The Comment node displays its value property, while the Element node displays the element name, "slideshow".

Note:
The different node types, their properties, and the methods used to access them are documented in the org.w3c.dom.Node interface. Compare the table in that documentation with the code in the AdapterNode's toString method to see whether the name or the value is being displayed for a particular node. If you need to make it clearer, modify the program to indicate which property is being displayed (for example, with N: name, V: value).
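
As a rough illustration of the N:/V: labeling suggested above, here is a minimal standalone sketch. It is not the tutorial's AdapterNode code: the class NodeLabelSketch, its TYPE_NAMES table, and the label and parse helpers are hypothetical names introduced only for this example. It assumes a JAXP parser is available on the classpath, as it is in any current JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeLabelSketch {
    // Display names indexed by the node-type constants in org.w3c.dom.Node (1..12).
    // The wording of the names is an assumption made for this sketch.
    static final String[] TYPE_NAMES = {
        "none", "Element", "Attr", "Text", "CDATA", "EntityRef", "Entity",
        "ProcInstr", "Comment", "Document", "DocType", "DocFragment", "Notation"
    };

    // Builds a label of the form "Type N: name V: value"; the V: part is
    // omitted when the node has no value (for example, Document and Element).
    static String label(Node node) {
        StringBuilder s = new StringBuilder(TYPE_NAMES[node.getNodeType()]);
        s.append(" N: ").append(node.getNodeName());
        if (node.getNodeValue() != null) {
            s.append(" V: ").append(node.getNodeValue().trim());
        }
        return s.toString();
    }

    // Parses an XML string; checked exceptions are wrapped to keep callers simple.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<!-- A SAMPLE set of slides --><slideshow/>");
        System.out.println(label(doc));
        for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
            System.out.println(label(n));
        }
    }
}
```

Run on this two-node document, the sketch labels the Document node by name only, the Comment node by name and value, and the Element node by name only, which mirrors what Figure 1 shows.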

Expanding the slideshow element brings up the display shown in Figure 2.

Figure 2: Element Node Expanded, No Attribute Nodes Showing

Here, you can see the Text nodes and Comment nodes that are interspersed between slide elements. The seemingly empty Text nodes hold the whitespace between elements; they exist because there is no DTD to tell the parser that no text is allowed there. (Generally, the vast majority of nodes in a DOM tree will be Element and Text nodes.)
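
To see those whitespace-only Text nodes without the GUI, a small sketch along the following lines can count them. The class name WhitespaceTextSketch and its helpers are hypothetical, and the inline XML is a cut-down stand-in for the slide file, not the tutorial's actual sample.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class WhitespaceTextSketch {
    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts the direct children that are Text nodes holding only whitespace.
    static int countWhitespaceText(Node parent) {
        int count = 0;
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE && n.getNodeValue().trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // No DTD here, so the parser cannot tell that the whitespace is insignificant.
        Element root = parseRoot(
            "<slideshow>\n  <!-- TITLE SLIDE -->\n  <slide/>\n  <slide/>\n</slideshow>");
        System.out.println("children: " + root.getChildNodes().getLength());
        System.out.println("whitespace-only Text nodes: " + countWhitespaceText(root));
    }
}
```

For this input the root has seven children: one Comment, two slide elements, and four Text nodes that contain nothing but line breaks and indentation.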

Notably absent from this picture are the Attribute nodes. An inspection of the table in org.w3c.dom.Node shows that there is indeed an Attribute node type. Attribute nodes, however, are not included as children in the DOM hierarchy. Instead, they are obtained through the Node interface's getAttributes method.
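
The following sketch shows one way to reach the attributes through getAttributes. The class AttributeLookupSketch and the helper attributesOf are hypothetical names, and the attribute values in the sample string are placeholders chosen for the example.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AttributeLookupSketch {
    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Collects a node's attributes into a name-sorted map.
    // getAttributes returns null for every node type except Element.
    static Map<String, String> attributesOf(Node node) {
        Map<String, String> result = new TreeMap<>();
        NamedNodeMap attrs = node.getAttributes();
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                result.put(attr.getNodeName(), attr.getNodeValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Element root = parseRoot(
            "<slideshow title=\"Sample Slide Show\" author=\"Yours Truly\"/>");
        // The attributes are not children: the child list is empty.
        System.out.println("children: " + root.getChildNodes().getLength());
        System.out.println("attributes: " + attributesOf(root));
    }
}
```

The map is sorted by name here because a NamedNodeMap makes no promise about ordering.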

Note:
The display of the text nodes is the reason for including the lines below in the AdapterNode's toString method. If you remove them, you'll see the funny characters (typically square blocks) that are generated by the newline characters in the text.
// Trim surrounding whitespace, then keep only the first line of the value.
String t = domNode.getNodeValue().trim();
int x = t.indexOf("\n");
if (x >= 0) t = t.substring(0, x);
s += t;
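
The same trimming logic can be tried in isolation. The sketch below wraps it in a hypothetical helper, firstLine, inside a hypothetical class FirstLineSketch; the null check is an addition for the standalone version and is not part of the lines shown above.

```java
class FirstLineSketch {
    // Returns the first line of a node value after trimming surrounding
    // whitespace. A null value (as returned for Element nodes) yields "".
    static String firstLine(String value) {
        if (value == null) {
            return "";
        }
        String t = value.trim();
        int x = t.indexOf('\n');
        return (x >= 0) ? t.substring(0, x) : t;
    }

    public static void main(String[] args) {
        // A whitespace-only Text node collapses to an empty string.
        System.out.println("[" + firstLine("\n    \n") + "]");
        // A multi-line value is cut at its first newline.
        System.out.println("[" + firstLine("  Why\nWonderWidgets are great  ") + "]");
    }
}
```

A whitespace-only value therefore displays as nothing at all, which is why those Text nodes look empty in the tree.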

Displaying a More Complex Tree

Here, you'll display the example XML file you created at the end of the SAX tutorial, to see how entity references, processing instructions, and CDATA sections appear in the DOM.

Note:
The file displayed in this section is slideSample10.xml.

Figure 3 shows the result of running the DomEcho app on slideSample10.xml, which includes a DOCTYPE entry that identifies the document's DTD.

Figure 3: DocType Node Displayed

The DocType node implements the DocumentType interface, which is an extension of org.w3c.dom.Node. It defines a getEntities method that you would use to obtain Entity nodes, the nodes that define entities like the product entity, which has the value "WonderWidgets". Like Attribute nodes, Entity nodes do not appear as children of DOM nodes.
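
A minimal sketch of reaching the Entity nodes through getEntities follows. The class DoctypeEntitiesSketch, the SAMPLE string, and the entityNames helper are hypothetical; the sample uses an internal DTD subset so that it is self-contained, which differs from the tutorial file's setup.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

class DoctypeEntitiesSketch {
    // A tiny document whose internal DTD subset declares one general entity.
    static final String SAMPLE =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE slideshow [\n"
        + "  <!ENTITY product \"WonderWidgets\">\n"
        + "]>\n"
        + "<slideshow>&product;</slideshow>";

    // Parses an XML string; checked exceptions are wrapped to keep callers simple.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Lists the names of the general entities declared in the document's DTD.
    static List<String> entityNames(Document doc) {
        List<String> names = new ArrayList<>();
        DocumentType doctype = doc.getDoctype(); // null when there is no DOCTYPE
        if (doctype != null) {
            NamedNodeMap entities = doctype.getEntities();
            for (int i = 0; i < entities.getLength(); i++) {
                names.add(entities.item(i).getNodeName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println("doctype: " + doc.getDoctype().getName());
        System.out.println("entities: " + entityNames(doc));
    }
}
```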

When you expand the slideshow node, you get the display shown in Figure 4.

Figure 4: Processing Instruction Node Displayed

Here, the processing instruction node is highlighted, showing that those nodes do appear in the tree. The name property contains the target specification, which identifies the app that the instruction is directed to. The value property contains the text of the instruction.
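
The name and value properties of a processing instruction can be checked with a short sketch like the one below. The class ProcessingInstructionSketch and the helper firstInstruction are hypothetical, and the instruction in SAMPLE is an illustrative placeholder rather than a quotation from the tutorial file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

class ProcessingInstructionSketch {
    // Placeholder document with one processing instruction inside the root.
    static final String SAMPLE =
        "<slideshow><?my.presentation.Program QUERY=\"exec, tech, all\"?></slideshow>";

    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the first processing instruction among the direct children, or null.
    static ProcessingInstruction firstInstruction(Node parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
                return (ProcessingInstruction) n;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ProcessingInstruction pi = firstInstruction(parseRoot(SAMPLE));
        // getNodeName is the target; getNodeValue is the instruction text.
        System.out.println("name  = " + pi.getNodeName());
        System.out.println("value = " + pi.getNodeValue());
    }
}
```

The ProcessingInstruction interface also offers getTarget and getData, which return the same two strings as getNodeName and getNodeValue.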

Note that empty text nodes are also shown here, even though the DTD specifies that a slideshow can contain slide elements only, never text. Logically, then, you might think that these nodes would not appear. (When this file was run through the SAX parser, that whitespace generated ignorableWhitespace events rather than characters events.)

The empty text nodes are included because, by default, DocumentBuilder creates a DOM that includes all the lexical information necessary to reconstruct the original document in its original form. That includes comment nodes as well as text nodes. There is as yet no standard mechanism for eliminating such lexical information from the DOM so that you are left with only the logical structure.

Note:
The reference implementation's XmlDocumentBuilder class defines the setIgnoringLexicalInformation method for this purpose. For more information on using that method, see Wire Your "Parser" to an XmlDocumentBuilder in the Generating XML from an Arbitrary Data Structure section of the DOM tutorial.
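
JAXP releases that postdate this text added a standard switch on DocumentBuilderFactory, setIgnoringElementContentWhitespace, which relies on the DTD's content model and so expects a validating parser. The sketch below is written under that assumption and is not the XmlDocumentBuilder approach described in the note; the class IgnoreWhitespaceSketch, the SAMPLE document, and the helper countRootChildren are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class IgnoreWhitespaceSketch {
    // The DTD says a slideshow holds slide elements only, so the
    // indentation between slides is element-content whitespace.
    static final String SAMPLE =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE slideshow [\n"
        + "  <!ELEMENT slideshow (slide+)>\n"
        + "  <!ELEMENT slide (#PCDATA)>\n"
        + "]>\n"
        + "<slideshow>\n"
        + "  <slide>One</slide>\n"
        + "  <slide>Two</slide>\n"
        + "</slideshow>";

    // Parses with validation on and returns the number of children of the root.
    static int countRootChildren(String xml, boolean ignoreWhitespace) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            factory.setIgnoringElementContentWhitespace(ignoreWhitespace);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("whitespace kept:    " + countRootChildren(SAMPLE, false));
        System.out.println("whitespace ignored: " + countRootChildren(SAMPLE, true));
    }
}
```

With the flag off, the root keeps the three whitespace Text nodes around its two slide elements; with it on, only the two slide elements remain.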

Moving down to the second slide element and opening the item element under it brings up the display shown in Figure 5.

Figure 5: Entity Reference Node Displayed

Here, the Entity Reference node is highlighted. Note that the entity reference contains multiple nodes under it. This example shows only a comment and a text node, but the entity could conceivably contain other element nodes as well.
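
Whether Entity Reference nodes show up at all depends on parser configuration. With the standard DocumentBuilderFactory, references are expanded by default, and setExpandEntityReferences(false) asks the parser to keep them as nodes. The sketch below illustrates that switch; the class EntityReferenceSketch, the SAMPLE document, and the helper names are hypothetical, and what appears beneath a retained reference node may differ between parser versions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class EntityReferenceSketch {
    // One general entity, referenced from inside an item element.
    static final String SAMPLE =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE slideshow [\n"
        + "  <!ENTITY product \"WonderWidgets\">\n"
        + "]>\n"
        + "<slideshow><item>Why &product; are great</item></slideshow>";

    // Parses the document, expanding or retaining entity references as requested.
    static Element parseRoot(String xml, boolean expandReferences) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setExpandEntityReferences(expandReferences);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Depth-first search for the first EntityReference node, or null if none.
    static Node findEntityReference(Node node) {
        if (node.getNodeType() == Node.ENTITY_REFERENCE_NODE) {
            return node;
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            Node found = findEntityReference(c);
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Node ref = findEntityReference(parseRoot(SAMPLE, false));
        System.out.println("retained reference: "
                + (ref == null ? "none" : ref.getNodeName()));
        System.out.println("expanded text: " + parseRoot(SAMPLE, true).getTextContent());
    }
}
```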

Moving down to the last item element under the last slide brings up the display shown in Figure 6.

Figure 6: CDATA Node Displayed

Here, the CDATA node is highlighted. Note that there are no nodes under it. Since a CDATA section is entirely uninterpreted, all of its contents are contained in the node's value property.
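
A short sketch can confirm both observations: the CDATA node has no children, and its value holds the raw text. The class CdataSketch and the helper parseRoot are hypothetical, and the CDATA content in SAMPLE is a placeholder.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class CdataSketch {
    // The CDATA section contains markup-like characters that are never parsed.
    static final String SAMPLE =
        "<item><![CDATA[Diagram: frobmorten <--- fuznaten]]></item>";

    // Parses an XML string and returns its root element. The factory's
    // default (non-coalescing) mode keeps CDATA sections as separate nodes.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Node cdata = parseRoot(SAMPLE).getFirstChild();
        System.out.println("is CDATA:     "
                + (cdata.getNodeType() == Node.CDATA_SECTION_NODE));
        System.out.println("has children: " + cdata.hasChildNodes());
        System.out.println("value:        " + cdata.getNodeValue());
    }
}
```

If the factory's setCoalescing(true) option is used instead, CDATA sections are merged into adjacent Text nodes and no separate CDATA node appears.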

Finishing Up

At this point, you have seen most of the nodes you will ever encounter in a DOM tree. There are one or two more that we'll mention in the next section, but you now know what you need to know to create or modify a DOM structure. In the next section, you'll see how to convert a DOM into a JTree that is suitable for an interactive GUI. Or, if you prefer, you can skip ahead to the 5th section of the DOM tutorial, Creating and Manipulating a DOM, where you'll learn how to create a DOM from scratch.
