Validating with XML Schema
Now that you understand more about namespaces, you're ready to take a deeper look at the process of XML Schema validation. Although a full treatment of XML Schema is beyond the scope of this tutorial, this section will show you the steps you need to take to validate an XML document using an XML Schema definition. (You can also examine the sample programs that are part of the JAXP download. They use a simple XML Schema definition to validate personnel data stored in an XML file.)
Note: There are multiple schema-definition languages, including RELAX NG, Schematron, and the W3C "XML Schema" standard. (Even a DTD qualifies as a "schema", although it is the only one that does not use XML syntax to describe schema constraints.) However, "XML Schema" presents us with a terminology challenge. While the phrase "XML Schema schema" would be precise, we'll use the phrase "XML Schema definition" to avoid the appearance of redundancy.
At the end of this section, you'll also learn how to use an XML Schema definition to validate a document that contains elements from multiple namespaces.
Overview of the Validation Process
To be notified of validation errors in an XML document, the following must be true:
- The factory must be configured, and the appropriate error handler set.
- The document must be associated with at least one schema, and possibly more.
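The "appropriate error handler" in the first step can be any implementation of org.xml.sax.ErrorHandler. Here is a minimal sketch, assuming a hypothetical class named CollectingErrorHandler (it is not part of JAXP) that records problems instead of stopping at the first one:

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper (not part of JAXP): records every problem the parser reports.
class CollectingErrorHandler implements ErrorHandler {
    private final List<String> problems = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        problems.add(format("warning", e));
    }

    @Override
    public void error(SAXParseException e) {
        // Validation errors arrive here; recording them lets parsing continue.
        problems.add(format("error", e));
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        // Well-formedness errors are not recoverable, so record and rethrow.
        problems.add(format("fatal", e));
        throw e;
    }

    List<String> getProblems() {
        return problems;
    }

    private static String format(String level, SAXParseException e) {
        return level + " at line " + e.getLineNumber() + ": " + e.getMessage();
    }
}
```

You would typically register an instance by calling setErrorHandler() on the DocumentBuilder obtained from the factory, before calling parse(). Whether to collect errors or throw on the first one is a design choice; collecting is convenient when you want to report every validation problem in a single pass.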
Configuring the DocumentBuilder Factory
It's helpful to start by defining the constants you'll use when configuring the factory. (These are the same constants you define when using XML Schema for SAX parsing.)
    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String W3C_XML_SCHEMA =
        "http://www.w3.org/2001/XMLSchema";

Next, you need to configure DocumentBuilderFactory to generate a namespace-aware, validating parser that uses XML Schema:

    ...
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    factory.setValidating(true);
    try {
        factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
    } catch (IllegalArgumentException x) {
        // Happens if the parser does not support JAXP 1.2
        ...
    }
Since JAXP-compliant parsers are not namespace-aware by default, you must set the namespace-aware property for schema validation to work. You also set a factory attribute to specify the schema language to use. (For SAX parsing, on the other hand, you set a property on the parser generated by the factory.)
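For comparison, here is a sketch of the SAX counterpart mentioned in the parenthetical above. The class and method names (SaxSchemaSetup, newSchemaAwareParser) are assumptions made for illustration; the property names are the same constants used for DOM:

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;

// Hypothetical helper class, shown only to contrast SAX with DOM configuration.
class SaxSchemaSetup {
    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String W3C_XML_SCHEMA =
        "http://www.w3.org/2001/XMLSchema";

    static SAXParser newSchemaAwareParser()
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);

        // Unlike DOM, the schema language is set on the parser, not the factory.
        SAXParser parser = factory.newSAXParser();
        try {
            parser.setProperty(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
        } catch (SAXNotRecognizedException x) {
            // Happens if the parser does not support JAXP 1.2
            throw new IllegalStateException(
                "Parser does not recognize the JAXP 1.2 schema properties", x);
        }
        return parser;
    }
}
```

Note that validation must be enabled on the factory before the parser is created; the property is then set on each parser you create.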
Associating a Document with a Schema
Now that the program is ready to validate with an XML Schema definition, it is only necessary to ensure that the XML document is associated with (at least) one. There are two ways to do that:
- With a schema declaration in the XML document.
- By specifying the schema(s) to use in the application.
Note: When the application specifies the schema(s) to use, it overrides any schema declarations in the document.
To specify the schema definition in the document, you would create XML like this:
    <documentRoot
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation='YourSchemaDefinition.xsd'
    >
    ...

The first attribute defines the XML namespace (xmlns) prefix, "xsi", where "xsi" stands for "XML Schema Instance". The second attribute specifies the schema to use for elements in the document that do not have a namespace prefix -- that is, for the elements you typically define in any simple, uncomplicated XML document. (You'll see how to deal with multiple namespaces in the next section.)
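To see a document-declared schema in action, here is a minimal sketch. The schema contents, the item element, and the class name DocumentDeclaredSchema are assumptions made up for this example. It writes the schema and the document to a temporary directory so that the relative location in xsi:noNamespaceSchemaLocation can be resolved against the document's own location:

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DocumentDeclaredSchema {
    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";

    // Hypothetical schema: documentRoot holds a single integer-valued item.
    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='documentRoot'><xs:complexType><xs:sequence>"
      + "<xs:element name='item' type='xs:int'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Builds a document that names its own schema with a relative location.
    static String document(String itemValue) {
        return "<documentRoot"
             + " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
             + " xsi:noNamespaceSchemaLocation='YourSchemaDefinition.xsd'>"
             + "<item>" + itemValue + "</item></documentRoot>";
    }

    // Returns the validation errors reported for the given document text.
    static List<String> validate(String xml) throws Exception {
        // Temporary files are left behind; a real program would clean them up.
        Path dir = Files.createTempDirectory("schema-demo");
        Files.write(dir.resolve("YourSchemaDefinition.xsd"),
                    SCHEMA.getBytes(StandardCharsets.UTF_8));
        Path doc = dir.resolve("document.xml");
        Files.write(doc, xml.getBytes(StandardCharsets.UTF_8));

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);

        List<String> errors = new ArrayList<>();
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors.add(e.getMessage());
            }
        });
        builder.parse(doc.toFile());
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("valid document:   " + validate(document("42")));
        System.out.println("invalid document: " + validate(document("forty-two")));
    }
}
```

The first call should report no errors, while the second should report that the item content is not a valid integer. The exact message text depends on the parser implementation.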
You can also specify the schema file in the application, like this:
    static final String schemaSource = "YourSchemaDefinition.xsd";
    static final String JAXP_SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";
    ...
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    ...
    factory.setAttribute(JAXP_SCHEMA_SOURCE, new File(schemaSource));

Here, too, there are mechanisms at your disposal that will let you specify multiple schemas. We'll take a look at those next.
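Putting the single-schema pieces together, here is a minimal sketch of application-specified validation. The sample schema, the item element, and the names ApplicationDeclaredSchema and writeTempSchema are assumptions made for this example. The document itself carries no schema declaration; the schema comes entirely from the factory attribute:

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ApplicationDeclaredSchema {
    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String JAXP_SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";
    static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";

    // Hypothetical schema: documentRoot holds a single integer-valued item.
    static final String SAMPLE_SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='documentRoot'><xs:complexType><xs:sequence>"
      + "<xs:element name='item' type='xs:int'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Writes the schema text to a temporary .xsd file for the demonstration.
    static File writeTempSchema(String schemaText) throws IOException {
        File file = File.createTempFile("YourSchemaDefinition", ".xsd");
        file.deleteOnExit();
        Files.write(file.toPath(), schemaText.getBytes(StandardCharsets.UTF_8));
        return file;
    }

    // Validates xml against schemaFile and returns the reported errors.
    static List<String> validate(String xml, File schemaFile) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
        factory.setAttribute(JAXP_SCHEMA_SOURCE, schemaFile);

        List<String> errors = new ArrayList<>();
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors.add(e.getMessage());
            }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        File xsd = writeTempSchema(SAMPLE_SCHEMA);
        System.out.println(validate("<documentRoot><item>42</item></documentRoot>", xsd));
        System.out.println(validate("<documentRoot><item>forty-two</item></documentRoot>", xsd));
    }
}
```

The sketch sets the schema language before the schema source, matching the order used in this section.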
Validating with Multiple Namespaces
Namespaces let you combine elements that serve different purposes in the same document, without having to worry about overlapping names.
Note: The material discussed in this section also applies to validating when using the SAX parser. You're seeing it here, because at this point you've learned enough about namespaces for the discussion to make sense.
To contrive an example, consider an XML data set that keeps track of personnel data. The data set may include information from the W-2 tax form, as well as information from the employee's hiring form, with both elements named <form> in their respective schemas.

If a prefix is defined for the "tax" namespace, and another prefix is defined for the "hiring" namespace, then the personnel data could include segments like this:

    <employee id="...">
      <name>....</name>
      <tax:form>
        ...w2 tax form data...
      </tax:form>
      <hiring:form>
        ...employment history, etc....
      </hiring:form>
    </employee>

The contents of the tax:form element would obviously be different from the contents of the hiring:form element, and would have to be validated differently.

Note, too, that there is a "default" namespace in this example, to which the unqualified element names employee and name belong. For the document to be properly validated, the schema for that namespace must be declared, as well as the schemas for the tax and hiring namespaces.
Note: The "default" namespace, as the term is used here, is actually a specific namespace. It is defined as the "namespace that has no name". So you can't simply use one namespace as your default this week, and another namespace as the default later on. This "unnamed namespace" or "null namespace" is like the number zero. It doesn't have any value to speak of (no name), but it is still precisely defined. So a namespace that does have a name can never be used as the "default" namespace in this sense. (This is distinct from a default namespace declared with an xmlns="..." attribute, which places unprefixed elements in a named namespace.)
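A namespace-aware DOM parse makes the distinction concrete. The following sketch (the class name NamespaceInspection and the sample name value are assumptions) parses a document shaped like the personnel example and reports the namespace name of each element. The unprefixed elements report null, the "namespace that has no name", while the two form elements share a local name but report different namespace names:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespaceInspection {
    // Sample data modeled on the personnel example; the content is made up.
    static final String XML =
        "<employee id='1' xmlns:tax='http://www.irs.gov/'"
      + " xmlns:hiring='http://www.ourcompany.com/'>"
      + "<name>Pat</name><tax:form/><hiring:form/></employee>";

    // Returns "localName -> namespace name" for the root and each child element.
    static List<String> describe(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // without this, getLocalName() returns null
        Document doc = factory.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(xml)));

        List<String> result = new ArrayList<>();
        Element root = doc.getDocumentElement();
        result.add(root.getLocalName() + " -> " + root.getNamespaceURI());

        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                result.add(child.getLocalName() + " -> " + child.getNamespaceURI());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (String line : describe(XML)) {
            System.out.println(line);
        }
    }
}
```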
When parsed, each element in the data set will be validated against the appropriate schema, as long as those schemas have been declared. Again, the schemas can either be declared as part of the XML data set, or in the program. (It is also possible to mix the declarations. In general, though, it is a good idea to keep all of the declarations together in one place.)
Declaring the Schemas in the XML Data Set
To declare the schemas to use for the example above in the data set, the XML code would look something like this:
    <documentRoot
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="employeeDatabase.xsd"
        xsi:schemaLocation=
            "http://www.irs.gov/ fullpath/w2TaxForm.xsd
             http://www.ourcompany.com/ relpath/hiringForm.xsd"
        xmlns:tax="http://www.irs.gov/"
        xmlns:hiring="http://www.ourcompany.com/"
    >
    ...

The noNamespaceSchemaLocation declaration is something you've seen before, as are the last two entries, which define the namespace prefixes tax and hiring. What's new is the entry in the middle, which defines the locations of the schemas to use for each namespace referenced in the document.

The xsi:schemaLocation declaration consists of entry pairs, where the first entry in each pair is a fully qualified URI that specifies the namespace, and the second entry contains a full path or a relative path to the schema definition. (In general, fully qualified paths are recommended. That way, only one copy of the schema will tend to exist.)

Of particular note is the fact that the namespace prefixes cannot be used when defining the schema locations. The xsi:schemaLocation declaration only understands namespace names, not prefixes.

Declaring the Schemas in the Application
To declare the equivalent schemas in the application, the code would look something like this:
    static final String employeeSchema = "employeeDatabase.xsd";
    static final String taxSchema = "w2TaxForm.xsd";
    static final String hiringSchema = "hiringForm.xsd";

    static final String[] schemas = {
        employeeSchema,
        taxSchema,
        hiringSchema,
    };

    static final String JAXP_SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";
    ...
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    ...
    factory.setAttribute(JAXP_SCHEMA_SOURCE, schemas);

Here, the array of strings that points to the schema definitions (.xsd files) is passed as the argument to the factory.setAttribute method. Note the differences from when you were declaring the schemas to use as part of the XML data set:
- There is no special declaration for the "default" (unnamed) schema.
- You don't specify the namespace name. Instead, you only give pointers to the .xsd files.

To make the namespace assignments, the parser reads the .xsd files, and finds in them the name of the target namespace they apply to. Since the files are specified with URIs, the parser can use an EntityResolver (if one has been defined) to find a local copy of the schema.
If the schema definition does not define a target namespace, then it applies to the "default" (unnamed, or null) namespace. So, in the example above, you would expect to see these target namespace declarations in the schemas:
- employeeDatabase.xsd -- none
- w2TaxForm.xsd -- http://www.irs.gov/
- hiringForm.xsd -- http://www.ourcompany.com/
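If you want to check which namespace a schema definition applies to, you can read its targetNamespace attribute yourself. The following is a sketch only: the class name TargetNamespaceReader and the inline schema text are assumptions, and this is not how the parser itself is implemented. It returns null when the attribute is absent, which corresponds to the "default" (unnamed) namespace:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class TargetNamespaceReader {
    // Returns the schema's target namespace, or null if none is declared.
    static String targetNamespaceOf(InputSource xsd) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Element schemaRoot = factory.newDocumentBuilder().parse(xsd).getDocumentElement();
        return schemaRoot.hasAttribute("targetNamespace")
            ? schemaRoot.getAttribute("targetNamespace")
            : null;
    }

    public static void main(String[] args) throws Exception {
        // Inline stand-ins for employeeDatabase.xsd and w2TaxForm.xsd.
        String noTarget = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'/>";
        String taxTarget = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
                         + " targetNamespace='http://www.irs.gov/'/>";
        System.out.println(targetNamespaceOf(new InputSource(new StringReader(noTarget))));
        System.out.println(targetNamespaceOf(new InputSource(new StringReader(taxTarget))));
    }
}
```

To inspect a real schema file, you could pass an InputSource constructed from the file's URI instead of a StringReader.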
At this point, you have seen two possible values for the schema source property when invoking the factory.setAttribute() method: a File object in factory.setAttribute(JAXP_SCHEMA_SOURCE, new File(schemaSource)), and an array of strings in factory.setAttribute(JAXP_SCHEMA_SOURCE, schemas). Here is a complete list of the possible values for that argument:
- String that points to the URI of the schema
- InputStream with the contents of the schema
- SAX InputSource
- File
- an array of Objects, each of which is one of the types defined above.
Note: An array of Objects can be used only when the schema language (as specified by the http://java.sun.com/xml/jaxp/properties/schemaLanguage property) has the ability to assemble a schema at runtime. Also, when an array of Objects is passed, it is illegal to have two schemas that share the same namespace.
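As an illustration of the array form, the sketch below validates a personnel document against three schemas, passing a mix of File objects and a String URI in one array. The schema contents are heavily simplified stand-ins invented for this example (the real W-2 and hiring schemas would be far richer), and the class and helper names are assumptions:

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class MultiSchemaValidation {
    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String JAXP_SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";
    static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";

    static final String XS = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'";
    // No target namespace: applies to the unprefixed employee and name elements.
    static final String EMPLOYEE_XSD = XS + ">"
      + "<xs:element name='employee'><xs:complexType><xs:sequence>"
      + "<xs:element name='name' type='xs:string'/>"
      + "<xs:any namespace='##other' processContents='strict' maxOccurs='unbounded'/>"
      + "</xs:sequence><xs:attribute name='id' type='xs:string'/>"
      + "</xs:complexType></xs:element></xs:schema>";
    static final String TAX_XSD = XS + " targetNamespace='http://www.irs.gov/'>"
      + "<xs:element name='form' type='xs:decimal'/></xs:schema>";
    static final String HIRING_XSD = XS + " targetNamespace='http://www.ourcompany.com/'>"
      + "<xs:element name='form' type='xs:date'/></xs:schema>";

    static File write(Path dir, String name, String text) throws IOException {
        Path file = dir.resolve(name);
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
        return file.toFile();
    }

    // Builds a sample employee document with the given tax:form content.
    static String employee(String taxValue) {
        return "<employee id='1' xmlns:tax='http://www.irs.gov/'"
             + " xmlns:hiring='http://www.ourcompany.com/'>"
             + "<name>Pat</name><tax:form>" + taxValue + "</tax:form>"
             + "<hiring:form>2002-06-01</hiring:form></employee>";
    }

    // Returns the validation errors reported for the given document text.
    static List<String> validate(String xml) throws Exception {
        Path dir = Files.createTempDirectory("schemas");
        Object[] schemas = {
            write(dir, "employeeDatabase.xsd", EMPLOYEE_XSD),         // File
            write(dir, "w2TaxForm.xsd", TAX_XSD).toURI().toString(),  // String URI
            write(dir, "hiringForm.xsd", HIRING_XSD),                 // File
        };

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
        factory.setAttribute(JAXP_SCHEMA_SOURCE, schemas);

        List<String> errors = new ArrayList<>();
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors.add(e.getMessage());
            }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate(employee("100.50")));
        System.out.println(validate(employee("not-a-number")));
    }
}
```

The sketch uses files rather than InputStream objects because a stream can be read only once, which makes it awkward to reuse the same source for more than one parse.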
This tutorial contains information on the 1.0 version of the Java Web Services Developer Pack.
All of the material in The Java Web Services Tutorial is copyright-protected and may not be published in other works without express written permission from Sun Microsystems.