

6. Using the Validating Parser

By now, you have done a lot of experimenting with the nonvalidating parser. It's time to have a look at the validating parser and find out what happens when you use it to parse the sample presentation.

Two things to understand about the validating parser at the outset are:

  1. The DTD is required.
  2. Since the DTD is present, the ignorableWhitespace method is invoked whenever the DTD makes that possible.
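The second point can be observed with a small standalone program. What follows is only a sketch under stated assumptions: the miniature document and its internal DTD are made up for illustration, the class name IgnorableWhitespaceDemo is hypothetical, and the handler extends the SAX2 DefaultHandler class rather than the HandlerBase class used by the Echo program.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class IgnorableWhitespaceDemo {

    // Hypothetical miniature document. The internal DTD subset declares
    // slideshow and slide as element-only content, so the line breaks and
    // indentation between their children are ignorable whitespace.
    static final String XML =
          "<?xml version='1.0'?>\n"
        + "<!DOCTYPE slideshow [\n"
        + "  <!ELEMENT slideshow (slide+)>\n"
        + "  <!ELEMENT slide (title)>\n"
        + "  <!ELEMENT title (#PCDATA)>\n"
        + "]>\n"
        + "<slideshow>\n"
        + "  <slide>\n"
        + "    <title>Overview</title>\n"
        + "  </slide>\n"
        + "</slideshow>\n";

    /** Counts the ignorableWhitespace callbacks made by a validating parser. */
    static int countIgnorableCalls() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();

        final int[] calls = {0};
        parser.parse(new InputSource(new StringReader(XML)), new DefaultHandler() {
            @Override
            public void ignorableWhitespace(char[] ch, int start, int length) {
                calls[0]++;
            }
        });
        return calls[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("ignorableWhitespace calls: " + countIgnorableCalls());
    }
}
```

With the parser bundled in the JDK, the count should be greater than zero, because the DTD tells the parser which whitespace carries no content.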

Configuring the Factory

The first step is to modify the Echo program so that it uses the validating parser instead of the nonvalidating parser.

Note:
The code in this section is contained in Echo10.java.

To use the validating parser, make the changes highlighted below:

public static void main (String argv [])
{
    if (argv.length != 1) {
        ...
    }

    // Use the validating parser (this comment replaces
    // "Use the default (non-validating) parser")
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setValidating(true);    // new line
    try {
        ...

Here, you configured the factory so that it will produce a validating parser when newSAXParser is invoked. You can also configure it to return a namespace-aware parser using setNamespaceAware(true). The reference implementation supports any combination of configuration options. If a particular implementation cannot support the combination of options you specify, newSAXParser throws a ParserConfigurationException.
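To confirm that the factory settings took effect, you can ask the parser itself. The following is a minimal sketch, separate from Echo10.java. The class name FactoryConfigDemo and the helper method newParser are made up for illustration; the JAXP calls are the standard ones.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

public class FactoryConfigDemo {

    /** Builds a parser with the requested options (hypothetical helper). */
    static SAXParser newParser(boolean validating, boolean namespaceAware)
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(validating);
        factory.setNamespaceAware(namespaceAware);
        // newSAXParser throws ParserConfigurationException if no parser
        // can be created that satisfies the requested configuration.
        return factory.newSAXParser();
    }

    public static void main(String[] args) throws Exception {
        SAXParser parser = newParser(true, true);
        System.out.println("validating:      " + parser.isValidating());
        System.out.println("namespace-aware: " + parser.isNamespaceAware());
    }
}
```

Checking isValidating and isNamespaceAware on the returned parser is a cheap way to verify the configuration before you parse anything.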

Changing the System Property

If no other factory class is specified, the default SAXParserFactory class is used. To use a different manufacturer's parser, you can change the value of the system property that points to it (the -D option sets a Java system property, not an operating-system environment variable). You can do that from the command line, like this:

> java -Djavax.xml.parsers.SAXParserFactory=yourFactoryHere ...

The factory name you specify must be a fully qualified class name (all package prefixes included). For more information, see the documentation for the SAXParserFactory class.
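The same property can be inspected and set from inside a program, which makes it easy to see what happens when the name is wrong. This is only a sketch: the class FactoryLookupDemo, its helper methods, and the factory name com.example.NoSuchFactory are all made up for illustration, and the behavior described is that of the JAXP implementation bundled with the JDK.

```java
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.SAXParserFactory;

public class FactoryLookupDemo {

    static final String PROPERTY = "javax.xml.parsers.SAXParserFactory";

    /** Returns the class name of the factory that JAXP currently selects. */
    static String currentFactoryClass() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    /** Returns true if JAXP can load the named factory class. */
    static boolean canLoad(String factoryClassName) {
        String previous = System.getProperty(PROPERTY);
        System.setProperty(PROPERTY, factoryClassName);
        try {
            SAXParserFactory.newInstance();
            return true;
        } catch (FactoryConfigurationError e) {
            // Thrown when the named class cannot be found or instantiated.
            return false;
        } finally {
            // Restore the previous setting so later lookups are unaffected.
            if (previous == null) {
                System.clearProperty(PROPERTY);
            } else {
                System.setProperty(PROPERTY, previous);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("default factory: " + currentFactoryClass());
        // "com.example.NoSuchFactory" is a made-up name that should not exist.
        System.out.println("made-up factory loadable: "
                + canLoad("com.example.NoSuchFactory"));
    }
}
```

Note that a bad factory name surfaces as a FactoryConfigurationError, which is an Error rather than an Exception, so an ordinary catch (Exception e) block will not intercept it.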

Experimenting with Validation Errors

To see what happens when the XML document does not specify a DTD, remove the DOCTYPE statement from the XML file and run the Echo program on it.

Note:
The output shown here is contained in Echo10-01.log.

The result you see looks like this:

<?xml version='1.0' encoding='UTF-8'?>
** Warning, line 5, uri file: ...
   Valid documents must have a <!DOCTYPE declaration.
** Parsing error, line 5, uri file: ...
   Element type "slideshow" is not declared.

So now you know that a DTD is a requirement for a valid document. That makes sense. (Note, though, that the lack of a document type declaration generates only a warning, as specified in the standard. On the other hand, any attempt to actually parse the document is immediately greeted with an error! Oh well...)
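You can reproduce the experiment without editing any files by parsing a string. The sketch below rests on a few assumptions: the one-line document is hypothetical, the class name MissingDoctypeDemo is made up, and the handler extends the SAX2 DefaultHandler class instead of HandlerBase. The wording of the messages, and whether a given problem arrives as a warning or an error, depends on the parser you are running.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class MissingDoctypeDemo {

    // Hypothetical well-formed document with no DOCTYPE declaration.
    static final String XML =
        "<?xml version='1.0'?><slideshow><slide/></slideshow>";

    /** Validates the text and returns every warning and error reported. */
    static List<String> validate(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();

        final List<String> problems = new ArrayList<>();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void warning(SAXParseException e) {
                problems.add("warning, line " + e.getLineNumber() + ": " + e.getMessage());
            }

            @Override
            public void error(SAXParseException e) {
                problems.add("error, line " + e.getLineNumber() + ": " + e.getMessage());
            }
        });
        return problems;
    }

    public static void main(String[] args) throws Exception {
        for (String problem : validate(XML)) {
            System.out.println("** " + problem);
        }
    }
}
```

Because the handler records each problem instead of throwing, the parse runs to completion and you see every complaint the parser has about the document.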

So what happens when you run the parser on your current version of the slide presentation, with the DTD specified?

Note:
The output shown here is contained in Echo10-07.log.

This time, the parser gives the following error message:

** Parsing error, line 28, uri file:...
   Element "slide" does not allow "item" here.

This error occurs because the definition of the slide element requires a title. That element is not optional, and the copyright slide does not have one. To fix the problem, add a question mark after title, as shown below, to make it an optional element:

<!ELEMENT slide (image?, title?, item*)>
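The effect of that question mark can be checked in isolation. The following sketch validates the same title-less slide against both versions of the content model. It is not the tutorial's sample file: the tiny document, the internal DTD subset, the class name OptionalTitleDemo, and its helper methods are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class OptionalTitleDemo {

    /** Builds a slide with no title, governed by the given slide content model. */
    static String document(String slideModel) {
        return "<?xml version='1.0'?>"
            + "<!DOCTYPE slideshow ["
            + "<!ELEMENT slideshow (slide+)>"
            + "<!ELEMENT slide " + slideModel + ">"
            + "<!ELEMENT image EMPTY>"
            + "<!ELEMENT title (#PCDATA)>"
            + "<!ELEMENT item (#PCDATA | item)*>"
            + "]>"
            + "<slideshow><slide><item>Copyright notice</item></slide></slideshow>";
    }

    /** Counts the validation errors reported while parsing the text. */
    static int countErrors(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();

        final int[] errors = {0};
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors[0]++;
            }
        });
        return errors[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("title required: "
                + countErrors(document("(image?, title, item*)")) + " error(s)");
        System.out.println("title optional: "
                + countErrors(document("(image?, title?, item*)")) + " error(s)");
    }
}
```

The required-title model should produce at least one error for this slide, and the optional-title model should produce none.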

Now what happens when you run the program?

Note:
You could also remove the copyright slide, which produces the same result shown below, as reflected in Echo10-06.log.

The answer is that everything runs fine, until the parser runs into the <em> tag contained in the overview slide. Since that tag was not defined in the DTD, the attempt to validate the document fails. The output looks like this:

...
    ELEMENT: <title>
    CHARS:   Overview
    END_ELM: </title>
    ELEMENT: <item>
    CHARS:   Why ** Parsing error, line 24, uri file:...
Element "item" does not allow "em" -- (#PCDATA|item)
org.xml.sax.SAXParseException: Element "item" does not allow "em" -- (#PCDATA|item)
       at com.sun.xml.parser.Parser.error(Parser.java:2798)
...

The error message identifies the part of the DTD that caused validation to fail. In this case it is the line that defines an item element as (#PCDATA | item).

Exercise: Make a copy of the file and remove all occurrences of <em> from it. Can the file be validated now? (In the next section, you'll learn how to define parameter entities so that you can use XHTML in the elements you are defining as part of the slide presentation.)

Error Handling in the Validating Parser

It is important to recognize that an exception is thrown when the file fails validation only because of the error-handling code you entered in the early stages of this tutorial. That code is reproduced below:

static class MyErrorHandler extends HandlerBase
{
    public void error (SAXParseException e)
    throws SAXParseException
    {
        throw e;
    }
    ...
} 

If that exception is not thrown, the validation errors are simply ignored.

Exercise: Try commenting out the line that throws the exception. What happens when you run the parser now?

In general, a SAX parsing error is a validation error, although we have seen that it can also be generated if the file specifies a version of XML that the parser is not prepared to handle. The thing to remember is that your application will not generate a validation exception unless you supply an error handler like the one above.
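The difference between a handler that throws and one that does not can be seen side by side. This is a sketch under stated assumptions rather than the tutorial's own code: the document with its undeclared <em> element is a made-up miniature, ErrorHandlerDemo and ThrowingHandler are hypothetical names, and both handlers build on the SAX2 DefaultHandler class (whose error method does nothing) instead of HandlerBase.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ErrorHandlerDemo {

    // Hypothetical document: <em> is used inside <item> but never declared.
    static final String XML =
          "<?xml version='1.0'?>"
        + "<!DOCTYPE slideshow ["
        + "<!ELEMENT slideshow (slide+)>"
        + "<!ELEMENT slide (title?, item*)>"
        + "<!ELEMENT title (#PCDATA)>"
        + "<!ELEMENT item (#PCDATA | item)*>"
        + "]>"
        + "<slideshow><slide><title>Overview</title>"
        + "<item>Why <em>WonderWidgets</em> are great</item>"
        + "</slide></slideshow>";

    /** Rethrows validation errors, like the tutorial's MyErrorHandler. */
    static class ThrowingHandler extends DefaultHandler {
        @Override
        public void error(SAXParseException e) throws SAXParseException {
            throw e;
        }
    }

    /** Returns true if a validating parse finishes without an exception. */
    static boolean parseSucceeds(DefaultHandler handler) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();
        try {
            parser.parse(new InputSource(new StringReader(XML)), handler);
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("silent handler completes:   "
                + parseSucceeds(new DefaultHandler()));
        System.out.println("throwing handler completes: "
                + parseSucceeds(new ThrowingHandler()));
    }
}
```

The silent handler lets the invalid document through without complaint, which is why an application that cares about validity needs an error method that either throws or records the problem.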

