The JavaTM Web Services Tutorial

Using the Validating Parser

By now, you have done a lot of experimenting with the nonvalidating parser. It's time to have a look at the validating parser and find out what happens when you use it to parse the sample presentation.

Two things to understand about the validating parser at the outset are:

Configuring the Factory

The first step is to modify the Echo program so that it uses the validating parser instead of the nonvalidating parser.


Note: The code in this section is contained in Echo10.java.

To use the validating parser, update the comment and add the call to setValidating(true), as shown below:

public static void main(String argv[])
{
  if (argv.length != 1) {
    ...
  }
  // Use the validating parser
  SAXParserFactory factory = SAXParserFactory.newInstance();
  factory.setValidating(true);
  try {
    ...
 

Here, you configured the factory so that it will produce a validating parser when newSAXParser is invoked. You can also configure it to return a namespace-aware parser using setNamespaceAware(true). The reference implementation supports any combination of configuration options. (If a combination is not supported by any particular implementation, it is required to generate a factory configuration error.)
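
As a rough illustration of the factory configuration described above, the following standalone sketch builds a parser and reports how it was configured. The class name ValidatingFactorySketch and the newParser helper are invented for this example and are not part of Echo10.java.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the Echo program.
class ValidatingFactorySketch {

    // Builds a parser from a factory that is configured for validation
    // and, optionally, for namespace awareness.
    static SAXParser newParser(boolean namespaceAware)
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        factory.setNamespaceAware(namespaceAware);
        return factory.newSAXParser();
    }

    public static void main(String[] args) throws Exception {
        SAXParser parser = newParser(true);
        // Ask the parser itself how it was configured.
        System.out.println("validating:      " + parser.isValidating());
        System.out.println("namespace-aware: " + parser.isNamespaceAware());
    }
}
```

Querying isValidating() and isNamespaceAware() on the returned parser is a simple way to confirm that the factory settings took effect.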

Validating with XML Schema

Although a full treatment of XML Schema is beyond the scope of this tutorial, this section will show you the steps you need to take to validate an XML document using an existing schema written in the XML Schema language. (You can also examine the sample programs that are part of the JAXP download. They use a simple XML Schema definition to validate personnel data stored in an XML file.)


Note: There are multiple schema-definition languages, including RELAX NG, Schematron, and the W3C "XML Schema" standard. (Even a DTD qualifies as a "schema", although it is the only one that does not use XML syntax to describe schema constraints.) However, "XML Schema" presents us with a terminology challenge. While the phrase "XML Schema schema" would be precise, we'll use the phrase "XML Schema definition" to avoid the appearance of redundancy.

To be notified of validation errors in an XML document, the parser factory must be configured to create a validating parser, as shown in the previous section. In addition,

  1. The appropriate properties must be set on the SAX parser.
  2. The appropriate error handler must be set.
  3. The document must be associated with a schema.

Setting the SAX Parser Properties

It's helpful to start by defining the constants you'll use when setting the properties:

static final String JAXP_SCHEMA_LANGUAGE =
    "http://java.sun.com/xml/jaxp/properties/schemaLanguage";

static final String W3C_XML_SCHEMA =
    "http://www.w3.org/2001/XMLSchema";
 

Next, you need to configure the parser factory to generate a parser that is namespace-aware as well as validating:

...
  SAXParserFactory factory = SAXParserFactory.newInstance();
  factory.setNamespaceAware(true);
  factory.setValidating(true);
 

You'll learn more about namespaces in Using Namespaces. For now, understand that schema validation is a namespace-oriented process. Since JAXP-compliant parsers are not namespace-aware by default, it is necessary to set the property for schema validation to work.

The last step is to configure the parser to tell it which schema language to use. Here, you will use the constants you defined earlier to specify the W3C's XML Schema language:

saxParser.setProperty(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
 

In the process, however, there is an extra error to handle. You'll take a look at that error next.

Setting up the Appropriate Error Handling

In addition to the error handling you've already learned about, there is one error that can occur when you are configuring the parser for schema-based validation. If the parser is not JAXP 1.2 compliant, and therefore does not support XML Schema, it could throw a SAXNotRecognizedException.

To handle that case, you wrap the setProperty() statement in a try/catch block, as shown below.

...
SAXParser saxParser = factory.newSAXParser();
try {
  saxParser.setProperty(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
} 
catch (SAXNotRecognizedException x) {
  // Happens if the parser does not support JAXP 1.2
  ...
}
...
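
Put together, the property constants, the factory configuration, and the try/catch block might look like the following sketch. The class name SchemaLanguageSketch and the helper methods newParser and selectXmlSchema are assumptions made for this example; returning a boolean is just one way to report whether the parser recognized the property.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;

// Hypothetical helper class, not part of the Echo program.
class SchemaLanguageSketch {
    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String W3C_XML_SCHEMA =
        "http://www.w3.org/2001/XMLSchema";

    // Creates a parser that is both namespace-aware and validating.
    static SAXParser newParser()
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        return factory.newSAXParser();
    }

    // Returns true if the parser accepted the schema-language property,
    // false if it did not recognize the property at all.
    static boolean selectXmlSchema(SAXParser saxParser) throws SAXException {
        try {
            saxParser.setProperty(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
            return true;
        } catch (SAXNotRecognizedException x) {
            // Happens if the parser does not support JAXP 1.2
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        boolean supported = selectXmlSchema(newParser());
        System.out.println("XML Schema validation available: " + supported);
    }
}
```

A caller could use the return value to fall back to DTD validation or to report a configuration problem to the user.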
 

Associating a Document with a Schema

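Now that the program is ready to validate the data using an XML Schema definition, it is only necessary to ensure that the XML document is associated with one. There are two ways to do that: by including a schema declaration in the XML document, or by specifying the schema to use in the application.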


Note: When the application specifies the schema to use, it overrides any schema declaration in the document.

To specify the schema definition in the document, you would create XML like this:

<documentRoot
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation='YourSchemaDefinition.xsd'
>
  ...
 

The first attribute defines the XML namespace (xmlns) prefix, "xsi", where "xsi" stands for "XML Schema Instance". The second attribute specifies the schema to use for elements in the document that do not have a namespace prefix -- that is, for the elements you typically define in any simple, uncomplicated XML document.


Note: You'll be learning about namespaces in Using Namespaces. For now, think of these attributes as the "magic incantation" you use to validate a simple XML file that doesn't use them. Once you've learned more about namespaces, you'll see how to use XML Schema to validate complex documents that use them. Those ideas are discussed in Validating with Multiple Namespaces.
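
To see a schema declaration in the document at work, here is a minimal sketch. It assumes a made-up one-element schema (documentRoot holding an integer) written to a temporary file, and it points xsi:noNamespaceSchemaLocation at that file's URI. The class name SchemaInDocumentSketch, the validate helper, and the schema itself are illustrations only.

```java
import java.io.File;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the Echo program.
class SchemaInDocumentSketch {
    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String W3C_XML_SCHEMA =
        "http://www.w3.org/2001/XMLSchema";

    // Wraps rootContent in a <documentRoot> element that names its own
    // schema, parses it, and returns the validation error messages.
    static List<String> validate(String rootContent) throws Exception {
        // Made-up schema: documentRoot must contain an integer.
        File schemaFile = File.createTempFile("sketch", ".xsd");
        schemaFile.deleteOnExit();
        Files.write(schemaFile.toPath(), (
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='documentRoot' type='xs:int'/>"
          + "</xs:schema>").getBytes(StandardCharsets.UTF_8));

        String xml =
            "<documentRoot"
          + " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
          + " xsi:noNamespaceSchemaLocation='" + schemaFile.toURI() + "'>"
          + rootContent
          + "</documentRoot>";

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        SAXParser saxParser = factory.newSAXParser();
        saxParser.setProperty(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);

        // Collect validation errors instead of throwing them.
        List<String> errors = new ArrayList<>();
        saxParser.parse(new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage());
                }
            });
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("valid content:   " + validate("42"));
        System.out.println("invalid content: " + validate("forty-two"));
    }
}
```

The application code never mentions the schema file when it configures the parser; the parser finds it through the attribute in the document.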

You can also specify the schema file in the application, using code like this:

static final String JAXP_SCHEMA_SOURCE =
    "http://java.sun.com/xml/jaxp/properties/schemaSource";

...
SAXParser saxParser = factory.newSAXParser();
...
saxParser.setProperty(JAXP_SCHEMA_SOURCE,
    new File(schemaSource));
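
A fuller sketch of the application-side approach follows. It assumes a made-up one-element schema (a count element holding an integer) that is written to a temporary file and passed to the parser as a File. The class name SchemaSourceSketch, the validate helper, and the schema are illustrations, not part of the tutorial's sample code.

```java
import java.io.File;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the Echo program.
class SchemaSourceSketch {
    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String W3C_XML_SCHEMA =
        "http://www.w3.org/2001/XMLSchema";
    static final String JAXP_SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";

    // Made-up schema used only for this illustration.
    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='count' type='xs:int'/>"
      + "</xs:schema>";

    // Parses xml against SCHEMA and returns the validation error messages.
    static List<String> validate(String xml) throws Exception {
        File schemaFile = File.createTempFile("sketch", ".xsd");
        schemaFile.deleteOnExit();
        Files.write(schemaFile.toPath(),
            SCHEMA.getBytes(StandardCharsets.UTF_8));

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        SAXParser saxParser = factory.newSAXParser();
        saxParser.setProperty(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
        // The application, not the document, names the schema to use.
        saxParser.setProperty(JAXP_SCHEMA_SOURCE, schemaFile);

        // Collect validation errors instead of throwing them.
        List<String> errors = new ArrayList<>();
        saxParser.parse(new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage());
                }
            });
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("valid:   " + validate("<count>3</count>"));
        System.out.println("invalid: " + validate("<count>three</count>"));
    }
}
```

Because the schema is supplied by the application, the XML documents in this sketch carry no schema declaration of their own.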
 

Now that you know how to make use of an XML Schema definition, we'll turn our attention to the kinds of errors you can see when the application is validating its incoming data. To do that, you'll use a Document Type Definition (DTD) as you experiment with validation.

Experimenting with Validation Errors

To see what happens when the XML document does not specify a DTD, remove the DOCTYPE statement from the XML file and run the Echo program on it.


Note: The output shown here is contained in Echo10-01.txt. (The browsable version is Echo10-01.html.)

The result you see looks like this:

<?xml version='1.0' encoding='UTF-8'?>
** Parsing error, line 9, uri .../slideSample01.xml
  Document root element "slideshow", must match DOCTYPE root "null"
 

Note: The message above was generated by the JAXP 1.2 libraries. If you are using a different parser, the error message is likely to be somewhat different.

This message says that the root element of the document must match the element specified in the DOCTYPE declaration. That declaration specifies the document's DTD. Since you don't have one yet, its value is "null". In other words, the message is saying that you are trying to validate the document, but no DTD has been declared, because no DOCTYPE declaration is present.
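
The same situation can be reproduced outside the Echo program with a small sketch like the one below. It assumes a tiny made-up document with no DOCTYPE declaration; the class name MissingDoctypeSketch and the validationErrors helper are invented for this example. The exact messages depend on the parser in use, so the sketch only collects and prints whatever is reported.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the Echo program.
class MissingDoctypeSketch {

    // Parses xml with a validating parser and returns the messages
    // passed to the error() callback.
    static List<String> validationErrors(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser saxParser = factory.newSAXParser();

        List<String> errors = new ArrayList<>();
        saxParser.parse(new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add("line " + e.getLineNumber()
                        + ": " + e.getMessage());
                }
            });
        return errors;
    }

    public static void main(String[] args) throws Exception {
        // Well-formed, but there is no DOCTYPE, so it cannot be valid.
        String xml = "<slideshow><slide/></slideshow>";
        for (String message : validationErrors(xml)) {
            System.out.println("** Parsing error, " + message);
        }
    }
}
```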

So now you know that a DTD is a requirement for a valid document. That makes sense. What happens when you run the parser on your current version of the slide presentation, with the DTD specified?


Note: The output shown here, produced from slideSample07.xml, is contained in Echo10-07.txt. (The browsable version is Echo10-07.html.)

This time, the parser gives a different error message:

  ** Parsing error, line 29, uri file:...
  The content of element type "slide" must match "(image?,title,item*)"
 

Note: The message above was generated by the JAXP 1.2 libraries. If you are using a different parser, the error message is likely to be somewhat different.

This message says that the element found at line 29 (<item>) does not match the definition of the <slide> element in the DTD. The error occurs because the definition says that the slide element requires a title. That element is not optional, and the copyright slide does not have one. To fix the problem, add a question mark after title, as shown below, to make title an optional element:

<!ELEMENT slide (image?, title?, item*)>
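
The effect of that question mark can be checked in isolation with a sketch such as the following. It assumes a cut-down, made-up slideshow whose DTD is embedded in the document as an internal subset, with a single slide that has no title. The class name ContentModelSketch and its helpers are illustrations; the tutorial's real DTD and sample file are larger.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the Echo program.
class ContentModelSketch {

    // Builds a small document whose slide element uses the given
    // content model. The one slide deliberately has no title.
    static String document(String slideModel) {
        return "<?xml version='1.0'?>\n"
             + "<!DOCTYPE slideshow [\n"
             + "  <!ELEMENT slideshow (slide+)>\n"
             + "  <!ELEMENT slide " + slideModel + ">\n"
             + "  <!ELEMENT image EMPTY>\n"
             + "  <!ELEMENT title (#PCDATA)>\n"
             + "  <!ELEMENT item (#PCDATA)>\n"
             + "]>\n"
             + "<slideshow>\n"
             + "  <slide><item>Copyright notice</item></slide>\n"
             + "</slideshow>\n";
    }

    // Parses xml with a validating parser and returns the messages
    // passed to the error() callback.
    static List<String> validationErrors(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser saxParser = factory.newSAXParser();

        List<String> errors = new ArrayList<>();
        saxParser.parse(new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage());
                }
            });
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("title required: "
            + validationErrors(document("(image?, title, item*)")));
        System.out.println("title optional: "
            + validationErrors(document("(image?, title?, item*)")));
    }
}
```

With the required-title model the slide is reported as invalid; with the optional-title model the same document passes.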
 

Now what happens when you run the program?


Note: You could also remove the copyright slide, which produces the same result shown below, as reflected in Echo10-06.txt. (The browsable version is Echo10-06.html.)

The answer is that everything runs fine until the parser runs into the <em> tag contained in the overview slide. Since that tag was not defined in the DTD, the attempt to validate the document fails. The output looks like this:

  ...
  ELEMENT: <title>
  CHARS:   Overview
  END_ELM: </title>
  ELEMENT: <item>
  CHARS:   Why ** Parsing error, line 28, uri: ...
Element "em" must be declared.
org.xml.sax.SAXParseException: ...
...
 

Note: The message above was generated by the JAXP 1.2 libraries. If you are using a different parser, the error message is likely to be somewhat different.

The error message identifies the part of the DTD that caused validation to fail. In this case it is the line that defines an item element as (#PCDATA | item).

Exercise: Make a copy of the file and remove all occurrences of <em> from it. Can the file be validated now? (In the next section, you'll learn how to define parameter entities so that we can use XHTML in the elements we are defining as part of the slide presentation.)

Error Handling in the Validating Parser

It is important to recognize that an exception is thrown when the file fails validation only because of the error-handling code you entered in the early stages of this tutorial. That code is reproduced below:

public void error(SAXParseException e)
throws SAXParseException
{
  throw e;
}
 

If that exception is not thrown, the validation errors are simply ignored.

Exercise: Try commenting out the line that throws the exception. What happens when you run the parser now?

In general, a SAX parsing error is a validation error, although we have seen that it can also be generated if the file specifies a version of XML that the parser is not prepared to handle. The thing to remember is that your application will not generate a validation exception unless you supply an error handler like the one above.
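
The difference an error handler makes can be seen in a sketch like the one below. It assumes a small made-up document that is invalid because it uses an undeclared <em> element, and it parses that document twice: once with a plain DefaultHandler, whose error() method does nothing, and once with a handler that rethrows the exception as shown above. The class name ErrorHandlerSketch and its helpers are invented for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the Echo program.
class ErrorHandlerSketch {

    // Invalid on purpose: the DTD does not declare an <em> element.
    static final String XML =
        "<!DOCTYPE item [ <!ELEMENT item (#PCDATA)> ]>"
      + "<item>Why <em>WonderWidgets</em> are great</item>";

    // A handler that inherits the do-nothing error() method.
    static DefaultHandler silentHandler() {
        return new DefaultHandler();
    }

    // A handler that turns validation errors into exceptions.
    static DefaultHandler throwingHandler() {
        return new DefaultHandler() {
            @Override
            public void error(SAXParseException e)
                    throws SAXParseException {
                throw e;
            }
        };
    }

    // Returns true if parsing XML with the given handler ended with
    // a SAXParseException, false if the parse ran to completion.
    static boolean parseThrows(DefaultHandler handler) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        try {
            factory.newSAXParser().parse(
                new InputSource(new StringReader(XML)), handler);
            return false;
        } catch (SAXParseException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("silent handler threw:   "
            + parseThrows(silentHandler()));
        System.out.println("throwing handler threw: "
            + parseThrows(throwingHandler()));
    }
}
```

The document and the parser configuration are identical in both runs; only the error() method decides whether the validation failure surfaces as an exception.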


This tutorial contains information on the 1.0 version of the Java Web Services Developer Pack.

All of the material in The Java Web Services Tutorial is copyright-protected and may not be published in other works without express written permission from Sun Microsystems.