Examining the Structure of a DOM
In this section, you'll use the GUI-fied DomEcho application you created in the last section to visually examine a DOM. You'll see what nodes make up the DOM, and how they are arranged. With the understanding you acquire, you'll be well prepared to construct and modify Document Object Model structures in the future.
Displaying A Simple Tree
We'll start out by displaying a simple file, so you get an idea of basic DOM structure. Then we'll look at the structure that results when you include some of the more advanced XML elements.
Note: The code used to create the figures in this section is in DomEcho02.java. The file displayed is slideSample01.xml. (The browsable version is slideSample01-xml.html.)
Figure 1 shows the tree you see when you run the DomEcho program on the first XML file you created in the DOM tutorial.
Figure 1 Document, Comment, and Element Nodes Displayed
Recall that the first bit of text displayed for each node is the node type. After that comes the node name, if any, and then the node value. This view shows three node types: Document, Comment, and Element. There is only one Document node for the whole tree; it is the root node. The Comment node displays its value, while the Element node displays its name, "slideshow".
Compare Figure 1 with the code in the AdapterNode's toString method to see whether the name or value is being displayed for a particular node. If you need to make it clearer, modify the program to indicate which property is being displayed (for example, with N: name, V: value).
Expanding the slideshow element brings up the display shown in Figure 2.
Figure 2 Element Node Expanded, No Attribute Nodes Showing
Here, you can see the Text nodes and Comment nodes that are interspersed between slide elements. The empty Text nodes exist because there is no DTD to tell the parser that no text exists. (Generally, the vast majority of nodes in a DOM tree will be Element and Text nodes.)
Text nodes exist under element nodes in a DOM, and data is always stored in text nodes. Perhaps the most common error in DOM processing is to navigate to an element node and expect it to contain the data that is stored in that element. Not so! Even the simplest element node has a text node under it. For example, given <size>12</size>, there is an element node (size), and a text node under it which contains the actual data (12).
Notably absent from this picture are the Attribute nodes. An inspection of the table in org.w3c.dom.Node shows that there is indeed an Attribute node type. But attributes are not included as children in the DOM hierarchy. They are instead obtained via the Node interface's getAttributes method.
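The two points above, that an element's data lives in a child text node and that attributes are reached through getAttributes rather than through the child list, can be sketched as follows. This is a minimal illustration, not code from the DomEcho program: the class name ElementDataSketch, the parse helper, and the units attribute added to the size element are all assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class name; not part of the tutorial's DomEcho code.
public class ElementDataSketch {

    // Parses a small XML string into a DOM using default factory settings.
    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // The "units" attribute is an assumption added for illustration.
        Document doc = parse("<size units=\"pt\">12</size>");
        Element size = doc.getDocumentElement();

        // The element node itself carries no value.
        System.out.println("element value: " + size.getNodeValue());

        // The data is held by the Text node under the element.
        Node text = size.getFirstChild();
        System.out.println("text value:    " + text.getNodeValue());

        // Attributes are not children, so only the Text node is counted.
        System.out.println("child count:   "
            + size.getChildNodes().getLength());

        // Attributes are obtained through getAttributes instead.
        NamedNodeMap attrs = size.getAttributes();
        Node units = attrs.getNamedItem("units");
        System.out.println("attribute:     " + units.getNodeName()
            + "=" + units.getNodeValue());
    }
}
```

When only the character data is needed, Node.getTextContent (DOM Level 3) returns the concatenated text under an element, which avoids walking to the text node by hand.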
Note: The display of the text nodes is the reason for including the lines below in the AdapterNode's toString method. If you remove them, you'll see the funny characters (typically square blocks) that are generated by the newline characters that are in the text.
    String t = domNode.getNodeValue().trim();
    int x = t.indexOf("\n");
    if (x >= 0) t = t.substring(0, x);
    s += t;
Displaying a More Complex Tree
Here, you'll display the example XML file you created at the end of the SAX tutorial, to see how entity references, processing instructions, and CDATA sections appear in the DOM.
Note: The file displayed in this section is slideSample10.xml. The slideSample10.xml file references slideshow3.dtd which, in turn, references copyright.xml and a (very simplistic) xhtml.dtd.
(The browsable versions are slideSample10-xml.html, slideshow3-dtd.html, copyright-xml.html, and xhtml-dtd.html.)
Figure 3 shows the result of running the DomEcho application on slideSample10.xml, which includes a DOCTYPE entry that identifies the document's DTD.
Figure 3 DocType Node Displayed
The DocType node is an instance of the DocumentType interface, which is an extension of org.w3c.dom.Node. It defines a getEntities method that you would use to obtain Entity nodes--the nodes that define entities like the product entity, which has the value "WonderWidgets". Like Attribute nodes, Entity nodes do not appear as children of DOM nodes.
When you expand the slideshow node, you get the display shown in Figure 4.
Figure 4 Processing Instruction Node Displayed
Here, the processing instruction node is highlighted, showing that those nodes do appear in the tree. The name property contains the target-specification, which identifies the application that the instruction is directed to. The value property contains the text of the instruction.
Note that empty text nodes are also shown here, even though the DTD specifies that a slideshow can contain slide elements only, never text. Logically, then, you might think that these nodes would not appear. (When this file was run through the SAX parser, that whitespace generated ignorableWhitespace events, rather than characters events.)
Moving down to the second slide element and opening the item element under it brings up the display shown in Figure 5.
Figure 5 JAXP 1.2 DOM -- Item Text Returned from an Entity Reference
Here, you can see that a text node containing the copyright text was inserted into the DOM, rather than the entity reference which pointed to it.
For most applications, the insertion of the text is exactly what you want. That way, when you're looking for the text under a node, you don't have to worry about any entity references it might contain.
For other applications, though, you may need the ability to reconstruct the original XML. For example, an editor application would need to save the result of user modifications without throwing away entity references in the process.
Various DocumentBuilderFactory APIs give you control over the kind of DOM structure that is created. For example, add the setExpandEntityReferences line below, which turns off entity expansion, to produce the DOM structure shown in Figure 6.
    public static void main(String argv[]) {
        ...
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setExpandEntityReferences(false);
        ...
Figure 6 JAXP 1.1 in 1.4 Platform -- Entity Reference Node Displayed
Here, the Entity Reference node is highlighted. Note that the entity reference contains multiple nodes under it. This example shows only a comment node and a text node, but the entity could conceivably contain other element nodes, as well.
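The effect of the expansion setting can be checked with a small experiment. The sketch below is an illustration under stated assumptions rather than part of DomEcho: the class name EntityRefSketch, the helpers parse and firstChildType, and the inline document that declares the product entity in an internal DTD subset are all made up for the example. It reports the type of the first node under the root element for each setting. With expansion turned on, the entity's replacement text appears as an ordinary Text node. The exact shape of the unexpanded tree can differ between parser versions, so no output is claimed for that case.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class name; not part of the tutorial's DomEcho code.
public class EntityRefSketch {

    // Assumed sample document: one internal entity and one reference to it.
    static final String XML =
        "<!DOCTYPE item [<!ENTITY product \"WonderWidgets\">]>"
        + "<item>&product;</item>";

    // Parses the sample with entity expansion switched on or off.
    public static Document parse(boolean expand) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setExpandEntityReferences(expand);
        return factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
    }

    // Returns the node type of the first child of the root element.
    public static short firstChildType(boolean expand) throws Exception {
        return parse(expand).getDocumentElement()
            .getFirstChild().getNodeType();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("expanded is Text node:   "
            + (firstChildType(true) == Node.TEXT_NODE));
        System.out.println("unexpanded is EntityRef: "
            + (firstChildType(false) == Node.ENTITY_REFERENCE_NODE));
    }
}
```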
Finally, moving down to the last item element under the last slide brings up the display shown in Figure 7.
Here, the CDATA node is highlighted. Note that there are no nodes under it. Since a CDATA section is entirely uninterpreted, all of its contents are contained in the node's value property.
Summary of Lexical Controls
Lexical information is the information you need to reconstruct the original syntax of an XML document. As we discussed earlier, preserving lexical information is important for editing applications, where you want to save a document that is an accurate reflection of the original -- complete with comments, entity references, and any CDATA sections it may have included at the outset.
A majority of applications, however, are only concerned with the content of the XML structures. They can afford to ignore comments, and they don't care whether data was coded in a CDATA section, as plain text, or whether it included an entity reference. For such applications, a minimum of lexical information is desirable, because it reduces the number and kinds of DOM nodes that the application has to be prepared to examine.
The following DocumentBuilderFactory methods give you control over the lexical information you see in the DOM:
setCoalescing() - To convert CDATA nodes to Text nodes and append them to an adjacent Text node (if any).
setExpandEntityReferences() - To expand entity reference nodes.
setIgnoringComments() - To ignore comments.
setIgnoringElementContentWhitespace() - To ignore ignorable whitespace in element content.
The default value for each of these properties is false, except for setExpandEntityReferences, whose default is true. Table 2 shows the settings you need to preserve all the lexical information necessary to reconstruct the original document, in its original form. It also shows the settings that construct the simplest possible DOM, so the application can focus on the data's semantic content, without having to worry about lexical syntax details.
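The two configurations described above can be sketched in code. This is a minimal illustration, assuming that preserving lexical information means turning all four controls off and that the simplest DOM means turning all four on. The class name LexicalControlsSketch, the helpers newFactory and childCount, and the inline sample document are hypothetical. Note that ignoring element-content whitespace depends on the parser knowing the content model, so that setting has no visible effect on this DTD-less sample.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class name; not part of the tutorial's DomEcho code.
public class LexicalControlsSketch {

    // Assumed sample: a comment, plain text, and a CDATA section.
    static final String XML = "<doc><!--note-->a<![CDATA[<b>]]></doc>";

    // preserveLexical = true keeps comments, CDATA nodes, and entity
    // references; false asks for the simplest content-oriented DOM.
    public static DocumentBuilderFactory newFactory(boolean preserveLexical) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        boolean simplify = !preserveLexical;
        factory.setCoalescing(simplify);
        factory.setExpandEntityReferences(simplify);
        factory.setIgnoringComments(simplify);
        factory.setIgnoringElementContentWhitespace(simplify);
        return factory;
    }

    // Parses the sample document with the chosen configuration.
    public static Document parse(boolean preserveLexical) throws Exception {
        return newFactory(preserveLexical).newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
    }

    // Counts the nodes directly under the root element.
    public static int childCount(boolean preserveLexical) throws Exception {
        return parse(preserveLexical).getDocumentElement()
            .getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        // Comment, Text, and CDATA nodes are all kept when preserving.
        System.out.println("preserving: " + childCount(true) + " child nodes");
        // Fewer nodes are expected once comments are dropped and CDATA
        // is merged into the adjacent text.
        System.out.println("simplified: " + childCount(false) + " child nodes");
    }
}
```

Either way, the character data seen by the application is the same; only the number and kinds of nodes that carry it differ.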
Finishing Up
At this point, you have seen most of the nodes you will ever encounter in a DOM tree. There are one or two more that we'll mention in the next section, but you now know what you need to know to create or modify a DOM structure. In the next section, you'll see how to convert a DOM into a JTree that is suitable for an interactive GUI. Or, if you prefer, you can skip ahead to the fifth section of the DOM tutorial, Creating and Manipulating a DOM, where you'll learn how to create a DOM from scratch.
This tutorial contains information on the 1.0 version of the Java Web Services Developer Pack.
All of the material in The Java Web Services Tutorial is copyright-protected and may not be published in other works without express written permission from Sun Microsystems.