8. Using a LexicalEventListener

You saw earlier that if you are writing text out as XML, you need to know if you are in a CDATA section. If you are, then angle brackets (<) and ampersands (&) should be output unchanged. But if you're not in a CDATA section, they should be replaced by the predefined entities &lt; and &amp;. But how do you know if you're processing a CDATA section?

Then again, if you are filtering XML in some way, you would want to pass comments along. Normally the parser ignores comments. How can you get comments so that you can echo them?

Finally, there are the parsed entity definitions. If an XML-filtering app sees &myEntity; it needs to echo the same string -- not the text that is inserted in its place. How do you go about doing that?

This section of the tutorial answers those questions. It shows you how to use com.sun.xml.parser.LexicalEventListener to identify comments, CDATA sections, and references to parsed entities.
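The escaping rule behind the first question can be sketched as a small helper. This is a minimal illustration, not code from the Echo app: the class name XmlText, the method name escape, and the inCdata flag are all hypothetical. The flag is assumed to be maintained by the caller, which is exactly what the lexical events described in this section make possible.

```java
// Hypothetical helper: decides how text is written back out as XML.
final class XmlText {

    /**
     * Returns the text unchanged when it came from a CDATA section;
     * otherwise replaces & and < with the predefined entities.
     */
    static String escape(String text, boolean inCdata) {
        if (inCdata) {
            return text; // CDATA content is output as-is
        }
        // Replace & first, so the & in "&lt;" is not escaped a second time.
        return text.replace("&", "&amp;").replace("<", "&lt;");
    }

    public static void main(String[] args) {
        System.out.println(escape("a < b & c", false));
        System.out.println(escape("a < b & c", true));
    }
}
```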

Note:
This material is specific to Project X, Sun's reference implementation for the JAXP standard. The material in this section is not part of the standard. Instead, it represents helpful functionality that you may need to take advantage of until some equivalent mechanism is standardized. Because it is not part of the JAXP standard, the functionality described here may very well not exist in other JAXP-standard parsers. In fact, as standards evolve, future versions of the JAXP reference implementation could employ different mechanisms to achieve the same goals.

Comments, CDATA tags, and references to parsed entities constitute lexical information -- that is, information that concerns the text of the XML itself, rather than the XML's information content. Most applications, of course, are concerned only with the content of an XML document. Such apps will not use the LexicalEventListener API. But apps that output XML text will find it invaluable.

Note:
The LexicalEventListener API is likely to be part of the SAX 2.0 specification.

How the LexicalEventListener Works

To be informed when the SAX parser sees lexical information, you configure the parser with a LexicalEventListener rather than a DocumentHandler. (For an overview of these two APIs, see An Overview of the Java XML APIs.) The LexicalEventListener interface extends the DocumentHandler interface to add the comment, startCDATA, endCDATA, startParsedEntity, and endParsedEntity methods.

Working with a LexicalEventListener

In the remainder of this section, you'll convert the Echo app into a lexical event listener and play with its features.

Note:
The code shown in this section is in Echo11.java. The output is shown in Echo11-09.log.

To start, add the code highlighted below to implement the LexicalEventListener interface and add the appropriate methods.

import com.sun.xml.parser.LexicalEventListener;

public class Echo extends HandlerBase
   implements LexicalEventListener
{ 
    ...

    public void processingInstruction (String target, String data)
      ...
    }
     
    public void comment(String text)
              throws SAXException
    {
    }

    public void startCDATA()
              throws SAXException
    {
    }

    public void endCDATA()
              throws SAXException
    {
    }

    public void startParsedEntity(String name)
              throws SAXException
    {
    }

    public void endParsedEntity(String name,
                                boolean included)
              throws SAXException
    {
    }

  private void emit (String s)
    ...

Those are the only changes you need to make to turn the Echo class into a lexical event listener. The parser checks the class type, and knows that the "document handler" you specify with:

parser.setDocumentHandler ( new Echo() );

also implements the extended interface, LexicalEventListener.
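The type check described above might look something like the following sketch. It does not show Project X's actual internals: DocHandler, LexicalListener, TinyParser, and Recorder are hypothetical stand-ins, used only to illustrate how a single setter can accept a plain handler and still discover that the handler supports an extended interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for DocumentHandler and LexicalEventListener.
interface DocHandler { void characters(String text); }
interface LexicalListener extends DocHandler { void comment(String text); }

// Hypothetical parser front end: one setter, two kinds of handler.
final class TinyParser {
    private DocHandler handler;
    private LexicalListener lexical; // stays null for a plain DocHandler

    void setDocumentHandler(DocHandler h) {
        handler = h;
        // The "class type check": keep an extended view only if h supports it.
        lexical = (h instanceof LexicalListener) ? (LexicalListener) h : null;
    }

    void reportText(String text) { handler.characters(text); }

    void reportComment(String text) {
        if (lexical != null) lexical.comment(text); // otherwise the comment is dropped
    }
}

// Records every event it receives, in order.
final class Recorder implements LexicalListener {
    final List<String> log = new ArrayList<>();
    public void characters(String text) { log.add("CHARS: " + text); }
    public void comment(String text) { log.add("COMMENT: " + text); }
}

final class TinyParserDemo {
    public static void main(String[] args) {
        TinyParser parser = new TinyParser();
        Recorder echo = new Recorder();
        parser.setDocumentHandler(echo);
        parser.reportComment(" A SAMPLE set of slides ");
        parser.reportText("Wake up to WonderWidgets!");
        System.out.println(echo.log);
    }
}
```

A handler that implements only the base interface receives the text events and never sees the comments, which is why existing DocumentHandler code keeps working unchanged.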

Echoing Comments

The next step is to do something with one of the new methods. Add the code highlighted below to echo comments in the XML file:

    public void comment(String text)
              throws SAXException
    {
       nl(); emit ("COMMENT: "+text);
    }

When you compile the Echo program and run it on your XML file, the result looks something like this:

COMMENT:  A SAMPLE set of slides  
COMMENT:  
    DTD for a simple "slide show".

COMMENT:  Defines the %inline; declaration 
COMMENT:  ...

The line endings in the comments are passed as part of the comment string, once again normalized to newlines (\n). You can also see that comments in the DTD are echoed along with comments from the file. (That can pose problems when you want to echo only comments that are in the data file. To get around that problem, use the startDTD and endDTD methods in the DtdEventListener interface to track when the parser is inside the DTD, and ignore any comments reported in between.)
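Here is one way the DTD-comment filtering could be sketched. Note the substitution: because the signatures of Project X's DtdEventListener are not shown in this section, the sketch uses the standardized SAX 2 equivalent that ships with the JDK, org.xml.sax.ext.LexicalHandler (through the DefaultHandler2 convenience class), which reports startDTD, endDTD, and comment events. The class name DocCommentCollector and the collect method are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical handler: keeps comments from the document, skips those in the DTD.
final class DocCommentCollector extends DefaultHandler2 {
    final List<String> comments = new ArrayList<>();
    private boolean inDtd = false; // true between startDTD and endDTD

    @Override public void startDTD(String name, String publicId, String systemId) { inDtd = true; }
    @Override public void endDTD() { inDtd = false; }

    @Override public void comment(char[] ch, int start, int length) {
        if (!inDtd) comments.add(new String(ch, start, length));
    }

    static List<String> collect(String xml) {
        try {
            DocCommentCollector handler = new DocCommentCollector();
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // SAX 2 registers lexical handlers through a property, not a setter.
            parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.comments;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE s [<!-- DTD comment --><!ELEMENT s (#PCDATA)>]>"
                   + "<s><!-- A SAMPLE set of slides --></s>";
        System.out.println(collect(xml));
    }
}
```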

Echoing Other Lexical Information

To finish up this section, you'll exercise the remaining LexicalEventListener methods.

Note:
The code shown in this section is in Echo12.java. The file it operates on is slideSample10.xml. The results of processing are in Echo12-10.log.

Make the changes highlighted below to remove the comment echo (you don't need that any more) and echo the other events:

public void comment(String text)
     throws SAXException
{
  // Comment echo removed; comments are now ignored.
}

public void startCDATA()
      throws SAXException
{
  nl(); emit ("START CDATA SECTION");
}

public void endCDATA()
      throws SAXException
{
  nl(); emit ("END CDATA SECTION");
}

public void startParsedEntity(String name)
               throws SAXException
{
  nl(); emit ("START PARSED ENTITY: "+name);
}

public void endParsedEntity(String name,
                    boolean included)
             throws SAXException
{
  nl(); emit ("END PARSED ENTITY: "+name);
  emit (", INCLUDED="+included);
}

Here is what happens when the internally defined products entity is processed with the latest version of the program:

ELEMENT: <slide-title>
CHARS:   Wake up to 
START PARSED ENTITY: products
CHARS:   WonderWidgets
END PARSED ENTITY: products, INCLUDED=true
CHARS:   !
END_ELM: </slide-title> 

And here is the result of processing the external copyright entity:

            START PARSED ENTITY: copyright
            CHARS:   
This is the standard copyright message ...
            END PARSED ENTITY: copyright, INCLUDED=true

Finally, you get output like this for the CDATA section:

START CDATA SECTION
CHARS:   Diagram:
         
frobmorten <------------ fuznaten
   |            <3>        ^
   | <1>                   |   <1> = fozzle
   V                       |   <2> = framboze    
staten --------------------+   <3> = frenzle
            <2>

END CDATA SECTION

In summary, the LexicalEventListener gives you the event notifications you need to produce an accurate reflection of the original XML text.
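To make that concrete, here is a minimal sketch of a filter that writes entity references and CDATA sections back out the way they appeared in the source. As a stated assumption, it is written against the standardized SAX 2 org.xml.sax.ext.LexicalHandler (through DefaultHandler2) rather than Project X's LexicalEventListener, so the callbacks are named startEntity and endEntity instead of startParsedEntity and endParsedEntity. The class name LexicalEcho and its echo method are hypothetical, and the sketch handles only character content, not tags or attributes.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical filter: rebuilds character content with its lexical markup intact.
final class LexicalEcho extends DefaultHandler2 {
    private final StringBuilder out = new StringBuilder();
    private boolean inDtd = false;   // true between startDTD and endDTD
    private boolean inCdata = false; // true between startCDATA and endCDATA
    private int entityDepth = 0;     // > 0 while inside an entity's replacement text

    @Override public void startDTD(String name, String publicId, String systemId) { inDtd = true; }
    @Override public void endDTD() { inDtd = false; }

    @Override public void startEntity(String name) {
        if (inDtd) return; // ignore parameter entities and the DTD itself
        // Echo only the outermost reference; nested ones belong to its replacement text.
        if (entityDepth++ == 0) out.append('&').append(name).append(';');
    }

    @Override public void endEntity(String name) {
        if (!inDtd) entityDepth--;
    }

    @Override public void startCDATA() {
        inCdata = true;
        if (entityDepth == 0) out.append("<![CDATA[");
    }

    @Override public void endCDATA() {
        inCdata = false;
        if (entityDepth == 0) out.append("]]>");
    }

    @Override public void characters(char[] ch, int start, int length) {
        if (entityDepth > 0) return; // already echoed as &name;
        String text = new String(ch, start, length);
        out.append(inCdata ? text : text.replace("&", "&amp;").replace("<", "&lt;"));
    }

    static String echo(String xml) {
        try {
            LexicalEcho handler = new LexicalEcho();
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE t [<!ENTITY products \"WonderWidgets\">]>"
                   + "<t>Wake up to &products;! <![CDATA[a < b]]></t>";
        System.out.println(echo(xml));
    }
}
```

The entity depth counter is a design choice: it suppresses the replacement text (and anything nested inside it) so that only the original reference is written, which is the behavior an XML-filtering app needs.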
