2. Generating XML from an Arbitrary Data Structure

In the last section, you saw how your knowledge of SAX parsing came in handy when you were handling parser errors. In this section, you'll use it to simplify the process of converting an arbitrary data structure to XML.

Note:
The material in this section is specific to Project X, Sun's reference implementation for the JAXP standard. The material in this section is not part of the standard. Instead, it represents helpful functionality that you may need to take advantage of until some equivalent mechanism is standardized. Because it is not part of the JAXP standard, the functionality described here may very well not exist in other JAXP-standard parsers. In fact, as standards evolve, future versions of the JAXP reference implementation could employ different mechanisms to achieve the same goals.

How It Works

Recall from The Project X Reference Implementation in the Overview part of the tutorial that Sun's reference implementation for the JAXP APIs uses a SAX parser to read in XML data when building a DOM. In this section, you'll see how to take advantage of that fact to convert an existing data set to XML.

In general outline, then, you're going to:

  1. Modify an existing program that reads the data so that it generates SAX events. (Whether that program is a real parser or simply a data filter of some kind is irrelevant for the moment. We'll keep calling it a "parser", in quotes, just so we're clear that it could be either one.)
  2. With the SAX "parser" in hand, wire it to a document builder to create a DOM.
  3. Use the reference implementation's write method to produce XML.

For starters, of course, we have to assume that you have some program that is capable of reading the data. If you have a data set that you want to convert, odds are good that you have some application lying around that can read it. That "parser" is the starting point.

Note:
This is an outline of the procedure you'll need to use. The code has not been tested. Any feedback you can provide will be valuable. Please send it to xml-feedback@eng.sun.com.

Modify The "Parser" to Generate SAX Events

The next step is to modify the "parser" to generate SAX events. Start by extending javax.xml.parsers.SAXParser. See com.sun.xml.parser.SAXParserImpl for an example.

Generating a SAX event means invoking one of the org.xml.sax.DocumentHandler methods. You saw most of these methods described in Echoing an XML File with the SAX Parser and Adding Additional Event Handlers. Here is the minimum set of events the "parser" needs to generate for some DocumentHandler, d:

d.startDocument()
d.endDocument()
d.startElement(String name, AttributeList attrs)
d.endElement(String name)
d.characters(char buf [], int offset, int len)

Note:
Since each of these methods can throw a SAXException, the "parser" will have to be prepared to handle them.
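
As a rough illustration of the minimum event set, here is a minimal sketch that walks a small in-memory data set and fires the five calls listed above. The Contact record, the ContactEventGenerator class, and the EventLog handler are hypothetical names invented for this example; only DocumentHandler, HandlerBase, AttributeList, AttributeListImpl, and SAXException come from the SAX 1 API. The SAXException that each call can throw is simply propagated to the caller.

```java
import java.util.List;
import org.xml.sax.AttributeList;
import org.xml.sax.DocumentHandler;
import org.xml.sax.HandlerBase;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributeListImpl;

/** Hypothetical generator: turns a list of records into SAX 1 events. */
final class ContactEventGenerator {
    // An empty attribute list, reused for elements that carry no attributes.
    private static final AttributeList NO_ATTRS = new AttributeListImpl();

    static void generate(List<Contact> contacts, DocumentHandler d) throws SAXException {
        d.startDocument();
        d.startElement("contacts", NO_ATTRS);
        for (Contact c : contacts) {
            d.startElement("contact", NO_ATTRS);
            textElement(d, "name", c.name);
            textElement(d, "email", c.email);
            d.endElement("contact");
        }
        d.endElement("contacts");
        d.endDocument();
    }

    // Emits <tag>text</tag> as a startElement / characters / endElement triple.
    private static void textElement(DocumentHandler d, String tag, String text)
            throws SAXException {
        d.startElement(tag, NO_ATTRS);
        char[] buf = text.toCharArray();
        d.characters(buf, 0, buf.length);
        d.endElement(tag);
    }

    public static void main(String[] args) throws SAXException {
        EventLog log = new EventLog();
        generate(List.of(new Contact("Ann", "ann@example.com")), log);
        System.out.println(log.out);
    }
}

/** Hypothetical record type standing in for whatever your "parser" reads. */
final class Contact {
    final String name;
    final String email;

    Contact(String name, String email) {
        this.name = name;
        this.email = email;
    }
}

/** Records events as text for inspection only; it does no XML escaping. */
final class EventLog extends HandlerBase {
    final StringBuilder out = new StringBuilder();

    @Override public void startElement(String name, AttributeList attrs) {
        out.append('<').append(name).append('>');
    }

    @Override public void endElement(String name) {
        out.append("</").append(name).append('>');
    }

    @Override public void characters(char[] buf, int offset, int len) {
        out.append(buf, offset, len);
    }
}
```

Because the generator only depends on the DocumentHandler interface, the same generate call could later be pointed at a DOM-building handler instead of the logging one.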

Here are the DocumentHandler events you will most likely want to ignore:

setDocumentLocator(Locator l)
ignorableWhitespace(char buf [], int offset, int len)
processingInstruction(String target, String data) 

The data file won't have processing instructions, so it's easy to see why you would ignore that one. And ignorableWhitespace will generate exactly the same XML as a plain old characters call, so that one can be ignored, too. That leaves setDocumentLocator.

The setDocumentLocator event is only useful for an application that is going to interpret the data in an XML file, identify a filename relative to the current location, and retrieve that file. But the events generated by your "parser" are not going to lead to any such processing -- they are going to a DocumentHandler, which will build a DOM tree using your data. Whether that data happens to name a file it wants to reference is irrelevant -- the data is not going to be interpreted, and the file is not going to be accessed, so the setDocumentLocator event is irrelevant, as well.

Implement the org.xml.sax.Parser Interface

Once the "parser" can generate SAX events, it needs to be told where to send them. To do that, it must implement the org.xml.sax.Parser interface and, at a minimum, define the setDocumentHandler() method with a non-null implementation.

Here is a list of the methods in that interface. You may choose to provide null implementations for many of them, or may choose to implement some, like setErrorHandler, in the interests of creating a more robust application.

parse(InputSource source) 
parse(java.lang.String systemId) 
setDocumentHandler(DocumentHandler handler) 
setDTDHandler(DTDHandler handler) 
setEntityResolver(EntityResolver resolver) 
setErrorHandler(ErrorHandler handler) 
setLocale(java.util.Locale locale)
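
The following is a minimal sketch of what such an implementation might look like. It assumes a hypothetical data format with one key=value pair per line; the KeyValueParser class and the element names "properties" and "entry" are invented for illustration. It gives real implementations to setDocumentHandler, setErrorHandler, and the two parse methods, and null implementations to the rest. The parse(String) variant assumes the system ID is an absolute URL whose content is UTF-8 text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import org.xml.sax.AttributeList;
import org.xml.sax.DTDHandler;
import org.xml.sax.DocumentHandler;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.AttributeListImpl;

/** Hypothetical "parser" for a line-oriented key=value format. */
class KeyValueParser implements Parser {
    // HandlerBase has do-nothing methods, so events are dropped until handlers are set.
    private DocumentHandler documentHandler = new HandlerBase();
    private ErrorHandler errorHandler = new HandlerBase();

    @Override public void setDocumentHandler(DocumentHandler handler) { documentHandler = handler; }
    @Override public void setErrorHandler(ErrorHandler handler) { errorHandler = handler; }

    // Null implementations: this format has no DTDs, entities, or localized messages.
    @Override public void setDTDHandler(DTDHandler handler) { }
    @Override public void setEntityResolver(EntityResolver resolver) { }
    @Override public void setLocale(Locale locale) { }

    // Assumption: systemId is an absolute URL pointing at UTF-8 text.
    @Override public void parse(String systemId) throws SAXException, IOException {
        try (Reader in = new InputStreamReader(
                URI.create(systemId).toURL().openStream(), StandardCharsets.UTF_8)) {
            parse(new InputSource(in));
        }
    }

    @Override public void parse(InputSource source) throws SAXException, IOException {
        Reader in = source.getCharacterStream();
        if (in == null) {
            throw new SAXException("This sketch needs an InputSource with a character stream");
        }
        BufferedReader lines = new BufferedReader(in);
        documentHandler.startDocument();
        documentHandler.startElement("properties", new AttributeListImpl());
        String line;
        int lineNumber = 0;
        while ((line = lines.readLine()) != null) {
            lineNumber++;
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            int eq = line.indexOf('=');
            if (eq < 0) {
                // Report the bad line as a recoverable error and keep going.
                errorHandler.error(
                    new SAXParseException("Line " + lineNumber + " has no '='", null));
                continue;
            }
            AttributeListImpl attrs = new AttributeListImpl();
            attrs.addAttribute("key", "CDATA", line.substring(0, eq).trim());
            documentHandler.startElement("entry", attrs);
            char[] value = line.substring(eq + 1).trim().toCharArray();
            documentHandler.characters(value, 0, value.length);
            documentHandler.endElement("entry");
        }
        documentHandler.endElement("properties");
        documentHandler.endDocument();
    }

    public static void main(String[] args) throws Exception {
        KeyValueParser parser = new KeyValueParser();
        parser.setDocumentHandler(new HandlerBase() {
            @Override public void startElement(String name, AttributeList attrs) {
                System.out.println("startElement: " + name);
            }
        });
        parser.parse(new InputSource(new StringReader("host=example.org\nport=8080\n")));
    }
}
```

Defaulting both handlers to a HandlerBase instance is one way to avoid null checks before every event; a stricter implementation might instead refuse to parse until a document handler has been set.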

Create a Factory

Extend SAXParserFactory and override the newSAXParser method to return an instance of your "parser". See com.sun.xml.parser.SAXParserFactoryImpl for an example.
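
A factory along these lines might be sketched as follows. All three class names (DataSAXParserFactory, DataSAXParser, StubDataParser) are hypothetical, and StubDataParser is only a placeholder for your own "parser". Note one assumption about the environment: the sketch is written against the javax.xml.parsers classes as they ship in current JDKs, where SAXParser and SAXParserFactory declare additional abstract methods (getXMLReader, the property methods, and the feature methods) that postdate this tutorial, so it provides minimal implementations of those as well.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.DTDHandler;
import org.xml.sax.DocumentHandler;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributeListImpl;
import org.xml.sax.helpers.ParserAdapter;

/**
 * Hypothetical factory that hands out the data "parser". In a real project this
 * class would normally be public and live in its own file so that it can be
 * instantiated by name.
 */
class DataSAXParserFactory extends SAXParserFactory {
    @Override public SAXParser newSAXParser() {
        return new DataSAXParser(new StubDataParser());
    }

    // This sketch recognizes no configurable features.
    @Override public void setFeature(String name, boolean value)
            throws SAXNotRecognizedException {
        throw new SAXNotRecognizedException(name);
    }

    @Override public boolean getFeature(String name) throws SAXNotRecognizedException {
        throw new SAXNotRecognizedException(name);
    }

    public static void main(String[] args) throws Exception {
        Parser parser = new DataSAXParserFactory().newSAXParser().getParser();
        parser.setDocumentHandler(new HandlerBase() {
            @Override public void startElement(String name, AttributeList attrs) {
                System.out.println("startElement: " + name);
            }
        });
        parser.parse(new InputSource(new StringReader("")));
    }
}

/** Thin SAXParser wrapper around a SAX 1 Parser. */
class DataSAXParser extends SAXParser {
    private final Parser parser;

    DataSAXParser(Parser parser) { this.parser = parser; }

    @Override public Parser getParser() { return parser; }

    // ParserAdapter exposes a SAX 1 Parser through the SAX 2 XMLReader interface.
    @Override public XMLReader getXMLReader() { return new ParserAdapter(parser); }

    @Override public boolean isNamespaceAware() { return false; }
    @Override public boolean isValidating() { return false; }

    @Override public void setProperty(String name, Object value)
            throws SAXNotRecognizedException {
        throw new SAXNotRecognizedException(name);
    }

    @Override public Object getProperty(String name) throws SAXNotRecognizedException {
        throw new SAXNotRecognizedException(name);
    }
}

/** Placeholder "parser": ignores its input and emits one empty <data> element. */
class StubDataParser implements Parser {
    private DocumentHandler handler = new HandlerBase();

    @Override public void setDocumentHandler(DocumentHandler h) { handler = h; }
    @Override public void setErrorHandler(ErrorHandler h) { }
    @Override public void setDTDHandler(DTDHandler h) { }
    @Override public void setEntityResolver(EntityResolver r) { }
    @Override public void setLocale(Locale locale) { }

    @Override public void parse(String systemId) throws SAXException {
        parse(new InputSource(systemId));
    }

    @Override public void parse(InputSource source) throws SAXException {
        handler.startDocument();
        handler.startElement("data", new AttributeListImpl());
        handler.endElement("data");
        handler.endDocument();
    }
}
```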

Wire Your "Parser" to an XmlDocumentBuilder

Next, use code like that shown below to wire your SAX parser to a document builder, and proceed to "parse" the data. (The Sun-specific parts of the code are the com.sun.xml imports and the Resolver, XmlDocumentBuilder, and XmlDocument classes they bring in.)

import java.io.File;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.DocumentBuilder;
import org.xml.sax.Parser;
import com.sun.xml.parser.Resolver;
import com.sun.xml.tree.XmlDocumentBuilder;
import com.sun.xml.tree.XmlDocument;

SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
Parser parser = saxParser.getParser();
XmlDocumentBuilder builder = new XmlDocumentBuilder();
builder.setIgnoringLexicalInfo(true); // Skip comments, entity refs, etc.
parser.setDocumentHandler(builder);
parser.parse(Resolver.createInputSource(new File(argv[0])));
XmlDocument document = builder.getDocument();

In this code, you obtain an instance of your SAX parser factory, use that to get your "parser", and then create an instance of the reference implementation's XmlDocumentBuilder. That class implements the DocumentHandler interface, which lets you connect it to your "parser". (The setIgnoringLexicalInfo line is optional. It customizes the XmlDocumentBuilder so that it creates a simplified DOM that doesn't have comments and which has the text of entities included inline, rather than having entity-reference nodes. For more information, see the XmlDocumentBuilder APIs.)

You then invoke the "parser's" parse method, assuming you implemented that, or whatever method fires it off. As it parses the data, it generates SAX events. The XmlDocumentBuilder reacts to those events and builds a DOM in the process. You retrieve that DOM with the getDocument method, specifying the class name (XmlDocument) rather than the general interface (Document) so you can use XmlDocument's output method.

Write It Out

As the last step in your program, you write out the DOM as an XML document using the XmlDocument write method you learned about in the last section.

Run It

Finally, specify the fully qualified class name of your parser factory on the command line as a system property, using the -D flag, like this:

-Djavax.xml.parsers.SAXParserFactory=fully.qualified.name.of.parserFactory

Now run the app. Congratulations! You have now successfully converted an existing data structure to XML with a bare minimum of effort. Well, OK, it was a lot of effort. But you did it! (And it was a lot easier than it could have been.)

