Echoing an XML File with the SAX Parser


2b. Adding Additional Event Handlers

Besides ignorableWhitespace, only two methods of the DocumentHandler interface remain to be implemented: setDocumentLocator and processingInstruction. In this section of the tutorial, you'll implement those two event handlers.

Identifying the Document's Location

A locator is an object that contains the information necessary to find the document. The Locator interface encapsulates a system ID (URL), a public identifier (URN), or both. You need that information whenever you want to find something relative to the current document. An HTML browser does the same thing when it processes an href="anotherFile" attribute in an anchor tag: it uses the location of the current document to find anotherFile.
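As a rough illustration of resolving something relative to the current document, the sketch below uses java.net.URI to resolve a relative reference against a system ID. The class name RelativeToDocument, the helper resolve, and the file:/docs/samples/ path are hypothetical and chosen only for this example; in a real handler the system ID would come from the locator's getSystemId method.

```java
import java.net.URI;

class RelativeToDocument {
    // Resolve a relative reference against the document's system ID,
    // much as a browser resolves href="anotherFile" against the page URL.
    static String resolve(String systemId, String relativeRef) {
        return URI.create(systemId).resolve(relativeRef).toString();
    }

    public static void main(String[] args) {
        // Hypothetical system ID; a real one would come from Locator.getSystemId().
        String systemId = "file:/docs/samples/slideSample01.xml";
        System.out.println(resolve(systemId, "anotherFile.xml"));
    }
}
```

The relative name replaces the last path segment of the system ID, so the result points into the same directory as the current document.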

You can also use the locator to print useful diagnostic messages. In addition to the document's system ID and public identifier, the locator provides methods that return the line and column number of the most recently processed event. However, the setDocumentLocator method is called only once, at the beginning of the parse. To get the current line or column number, save the locator when setDocumentLocator is invoked and then query it in the other event-handling methods.
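The save-then-query pattern might look like the following minimal sketch. It is not part of the Echo program: to stay self-contained it extends the SAX2 DefaultHandler class rather than implementing DocumentHandler (setDocumentLocator has the same signature in both), and the names LocatorDemo, PositionHandler, and elementPositions are made up for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

class LocatorDemo {
    // Handler that keeps the Locator and consults it in a later event.
    static class PositionHandler extends DefaultHandler {
        private Locator locator;                      // saved in setDocumentLocator
        final List<String> positions = new ArrayList<>();

        @Override
        public void setDocumentLocator(Locator l) {
            locator = l;                              // only save it; do no other work here
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attrs) {
            // A parser is not required to supply a locator, so check for null.
            int line = (locator != null) ? locator.getLineNumber() : -1;
            positions.add(qName + "@" + line);
        }
    }

    // Parses the XML text and returns "elementName@lineNumber" for each start tag.
    static List<String> elementPositions(String xml) {
        PositionHandler handler = new PositionHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.positions;
    }

    public static void main(String[] args) {
        System.out.println(elementPositions("<slideshow>\n  <slide/>\n</slideshow>"));
    }
}
```

The same saved reference could be used in an error message, for example by appending the line and column numbers to the text of a diagnostic.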

Note:
The code discussed in this section is in Echo04.java. Its output is stored at Echo04-01.log.

Add the method below to the Echo program to get the document locator and use it to echo the document's system ID.

    ...
    private String indentString = "    "; // Amount to indent
    private int indentLevel = 0;
  
    public void setDocumentLocator (Locator l)
    {
        try {
            out.write ("LOCATOR");
            out.write ("\n SYS ID: " + l.getSystemId() );
            out.flush ();
        } catch (IOException e) {
            // Ignore errors
        }
    }

    public void startDocument ()
    ...

Notes:

This method, in contrast to every other DocumentHandler method, is not declared to throw a SAXException. So, rather than using emit for output, this code writes directly to out and catches the IOException itself. (This method is generally expected to simply save the Locator for later use, rather than do the kind of processing that can generate an exception, as it does here.)
The spelling of these methods is "Id", not "ID". So you have getSystemId and getPublicId.

When you compile and run the program on slideSample01.xml, here is the significant part of the output:

LOCATOR
 SYS ID: file:<path>/../samples/slideSample01.xml

START DOCUMENT
<?xml version='1.0' encoding='UTF-8'?>
...

Here, it is apparent that setDocumentLocator is called before startDocument. That can make a difference if you do any initialization in the event handling code.
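One way to check the ordering for yourself is to record the name of each callback as it arrives. The sketch below does that with a small handler; it uses the SAX2 DefaultHandler class for brevity (an assumption, since the Echo program implements DocumentHandler), and the names EventOrderDemo, OrderHandler, and eventOrder are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

class EventOrderDemo {
    // Records the order in which the document-level callbacks arrive.
    static class OrderHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void setDocumentLocator(Locator l) { events.add("setDocumentLocator"); }

        @Override
        public void startDocument() { events.add("startDocument"); }

        @Override
        public void endDocument() { events.add("endDocument"); }
    }

    // Parses the XML text and returns the recorded callback names in order.
    static List<String> eventOrder(String xml) {
        OrderHandler handler = new OrderHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        System.out.println(eventOrder("<slideshow/>"));
    }
}
```

If startDocument resets the handler's fields, for example, it should leave the saved locator alone, because setDocumentLocator has already run by then.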

Handling Processing Instructions

It sometimes makes sense to code application-specific processing instructions in the XML data. In this exercise, you'll add a processing instruction to your slideSample.xml file and then modify the Echo program to display it.

Note:
The code discussed in this section is in Echo05.java. The file it operates on is slideSample02.xml. The output is stored at Echo05-02.log.

As you saw in A Quick Introduction to XML, the format for a processing instruction is <?target data?>, where "target" is the target application that is expected to do the processing, and "data" is the instruction or information for it to process. Add the text highlighted below to add a processing instruction for a mythical slide presentation program that will query the user to find out which slides to display (technical, executive-level, or all):

<slideshow 
    ...
    >
    
    <!-- PROCESSING INSTRUCTION -->
    <?my.presentation.Program: QUERY="exec, tech, all"?>

    <!-- TITLE SLIDE -->

Notes:

The "data" portion of the processing instruction can contain spaces, or can be omitted entirely. But there cannot be any space between the initial <? and the target identifier.
The data begins after the whitespace that follows the target.
Fully qualifying the target with the complete web-unique package prefix makes sense, so as to preclude any conflict with other programs that might process the same data.
In this case, the target includes a colon (:) after the name of the application. That is probably atypical, but it seemed like a good idea for readability.
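To see how a parser divides the instruction into target and data, the sketch below parses the instruction shown above and captures the two strings it receives. It is only an illustration: it uses the SAX2 DefaultHandler class (processingInstruction has the same signature in DocumentHandler), the names PiSplitDemo and targetAndData are hypothetical, and namespace processing is switched off explicitly because a namespace-aware parser may reject a colon in a target.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PiSplitDemo {
    // Returns {target, data} for the last processing instruction in the XML text.
    static String[] targetAndData(String xml) {
        final String[] result = new String[2];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void processingInstruction(String target, String data) {
                result[0] = target;   // everything up to the first whitespace
                result[1] = data;     // the rest, without the separating whitespace
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Namespace processing stays off (the JAXP default); a namespace-aware
            // parser may refuse a processing-instruction target that contains a colon.
            factory.setNamespaceAware(false);
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<slideshow>"
                   + "<?my.presentation.Program: QUERY=\"exec, tech, all\"?>"
                   + "</slideshow>";
        String[] parts = targetAndData(xml);
        System.out.println("target = [" + parts[0] + "]");
        System.out.println("data   = [" + parts[1] + "]");
    }
}
```

Note that the colon ends up as part of the target string, since the parser splits only at whitespace.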

Now that you have a processing instruction to work with, add the code highlighted below to the Echo app:

public void characters (char buf [], int offset, int len)
...
}
    
public void processingInstruction (String target, String data)
throws SAXException
{
    nl(); 
    emit ("PROCESS: ");
    emit ("<?"+target+" "+data+"?>");
}

private void emit (String s)
...

When your edits are complete, compile and run the program. The relevant part of the output should look like this:

...
CHARS:   
CHARS:   
PROCESS: <?my.presentation.Program: QUERY="exec, tech, all"?>
CHARS:   
CHARS:   
...

Now that you've had a chance to work with the processing instruction, you can remove that instruction from the XML file. You won't be needing it any more.

Summary

With the minor exception of ignorableWhitespace, you have used all of the methods in the DocumentHandler interface to handle SAX events. You'll see ignorableWhitespace a little later on. Next, though, you'll get deeper insight into how you handle errors in the SAX parsing process.
