

2. XML and Related Specs: Digesting the Alphabet Soup


Now that you have a basic understanding of XML, it makes sense to get a high-level overview of the various XML-related acronyms and what they mean. There is a lot of work going on around XML, so there is a lot to learn.

The current APIs for accessing XML documents either serially or in random access mode are, respectively, SAX and DOM. The specifications for ensuring the validity of XML documents are DTD (the original mechanism, defined as part of the XML specification) and various schema proposals (newer mechanisms that use XML syntax to do the job of describing validation criteria). Other future standards that are nearing completion include the XSL standard -- a mechanism for setting up translations of XML documents (for example to HTML or other XML) and for dictating how the document is rendered. Another effort nearing completion is the XML Link Language specification (XLL), which enables links between XML documents.

Those are the major initiatives you will want to be familiar with. This section also surveys a number of other interesting proposals, including the HTML-lookalike standard, XHTML, and the meta-standard for describing the information an XML document contains, RDF. It also covers the XML Namespaces initiative that promotes modular reuse of XML documents by avoiding naming collisions.

Several of the XML schema proposals are covered here as well, along with a quick survey of the standards efforts that are using XML for remote control of desktops (DMTF) and document servers (WebDAV).

Finally, there are a number of interesting standards and standards-proposals that build on XML, including Synchronized Multimedia Integration Language (SMIL), Mathematical Markup Language (MathML), Scalable Vector Graphics (SVG), and DrawML.

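The remainder of this section gives you a more detailed description of these initiatives. To help keep things straight, it's divided into W3C Recommendations, W3C Proposed Recommendations, W3C Working Drafts, W3C "Notes", and Standards That Build on XML.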

Skim the terms once, so you know what's here, and keep a copy of this document handy so you can refer to it whenever you see one of these terms in something you're reading. Pretty soon, you'll have them all committed to memory, and you'll be at least "conversant" with XML!

W3C Recommendations

W3C "recommendations" are, in reality, the final form of specifications generated by the W3C. Such a specification is called a "recommendation" because the W3C does not impose it on anyone, but it is no longer open for further discussion and review. The case is closed. This is the spec you implement in order to conform to the standard.

SAX
Simple API for XML

This API was actually a product of collaboration on the XML-DEV mailing list, rather than a product of the W3C. It's included here because it has the same "final" characteristics as a W3C recommendation.

You can also think of this standard as the "serial access" protocol for XML. This is the fast-to-execute mechanism you would use to read and write XML data in a server, for example. This is also called an event-driven protocol, because the technique is to register your handler with a SAX parser, after which the parser invokes your callback methods whenever it sees a new XML tag (or encounters an error, or wants to tell you anything else).
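To make the callback idea concrete, here is a minimal sketch using the SAX support in Python's standard library (xml.sax). The handler class name, the sample document, and the way events are recorded are all assumptions made for illustration; they are not part of the SAX specification.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Hypothetical handler that records the events the parser reports."""

    def __init__(self):
        super().__init__()
        self.events = []  # ("start" | "end", tag name) in document order
        self.text = []    # non-blank character data, possibly in several chunks

    def startElement(self, name, attrs):
        # Called by the parser each time it sees an opening tag.
        self.events.append(("start", name))

    def endElement(self, name):
        # Called by the parser each time it sees a closing tag.
        self.events.append(("end", name))

    def characters(self, content):
        # Called for text between tags; a parser may deliver it in pieces.
        if content.strip():
            self.text.append(content)


# Register the handler, then let the parser drive the callbacks.
handler = EventRecorder()
xml.sax.parseString(b"<order><item>widget</item></order>", handler)

for event in handler.events:
    print(event)
print("text:", "".join(handler.text))
```

Note that the program never holds the whole document in memory. It only reacts to each event as the parser reports it, which is why this style suits serial processing.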

For more information on the SAX protocol, see Serial Access with the Simple API for XML.

DOM
Document Object Model

The Document Object Model protocol converts an XML document into a collection of objects in your program. You can then manipulate the object model in any way that makes sense. This mechanism is also known as the "random access" protocol, because you can visit any part of the data at any time. You can then modify the data, remove it, or insert new data. For more information on the DOM specification, see Manipulating Document Contents with the Document Object Model.
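As a rough illustration of random access, the following sketch uses xml.dom.minidom from Python's standard library, which implements a subset of the DOM. The inventory document and its item names are made up for the example.

```python
from xml.dom import minidom

# Parse the whole document into a tree of objects held in memory.
doc = minidom.parseString(
    "<inventory><item>widget</item><item>gadget</item></inventory>"
)
root = doc.documentElement
items = doc.getElementsByTagName("item")

# Random access: go straight to the second item and modify its text.
items[1].firstChild.data = "gizmo"

# Remove the first item from the tree.
root.removeChild(items[0])

# Insert new data at the end.
new_item = doc.createElement("item")
new_item.appendChild(doc.createTextNode("sprocket"))
root.appendChild(new_item)

result = root.toxml()
print(result)
```

The trade-off against the serial approach is memory. The entire document is loaded before you touch any of it, but in exchange you can visit and change nodes in any order.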

DTD
Document Type Definition

The DTD specification is actually part of the XML specification, rather than a separate entity. On the other hand, it is optional -- you can write an XML document without it. And there are a number of schema proposals that offer more flexible alternatives. So it is treated here as though it were a separate specification.

A DTD specifies the kinds of tags that can be included in your XML document, and the valid arrangements of those tags. You can use the DTD to make sure you don't create an invalid XML structure. You can also use it to make sure that the XML structure you are reading (or that got sent over the net) is indeed valid.

Unfortunately, it is difficult to specify a DTD for a complex document in such a way that it prevents all invalid combinations and allows all the valid ones. So constructing a DTD is something of an art. The DTD can exist at the front of the document, as part of the prolog. It can also exist as a separate entity, or it can be split between the document prolog and one or more additional entities.
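The sketch below shows a DTD placed in the prolog of a small mailing-address document, and reads the declarations back with the expat bindings in Python's standard library. Two assumptions to keep in mind: the sample document is invented, and expat is a non-validating parser, so the final comparison is only a toy check of which elements were declared. It is not real DTD validation.

```python
import xml.parsers.expat

# The DTD lives in the prolog, between "<!DOCTYPE mailAddress [" and "]>".
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE mailAddress [
  <!ELEMENT mailAddress (name, address, zipcode)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT address (#PCDATA)>
  <!ELEMENT zipcode (#PCDATA)>
]>
<mailAddress>
  <name>Pat Example</name>
  <address>1 Example Street</address>
  <zipcode>12345</zipcode>
</mailAddress>
"""

declared = []  # element names declared in the DTD
used = []      # element names that actually occur in the document body

parser = xml.parsers.expat.ParserCreate()
parser.ElementDeclHandler = lambda name, model: declared.append(name)
parser.StartElementHandler = lambda name, attrs: used.append(name)
parser.Parse(DOCUMENT, True)

# Toy check only: which tags appear in the body without a declaration?
undeclared = [name for name in used if name not in declared]
print("declared:", declared)
print("undeclared:", undeclared)
```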

However, while the DTD mechanism was the first method defined for specifying valid document structure, it was not the last. Several newer schema specifications have been devised. You'll learn about those momentarily.

For more information, see Defining a Document Type.

RDF
Resource Description Framework

RDF is a proposed standard for defining data about data. Used in conjunction with the XHTML specification, for example, or with HTML pages, RDF could be used to describe the content of the pages. For example, if your browser stored your ID information as FIRSTNAME, LASTNAME, and EMAIL, an RDF description could make it possible to transfer data to an application that wanted NAME and EMAILADDRESS. Just think! Some day you may not need to type your name and address at every web site you visit!

For the latest information on RDF, see http://www.w3.org/TR/PR-rdf-syntax/.

Namespaces

The namespace standard lets you write an XML document that uses two or more sets of XML tags in modular fashion. Suppose for example that you created an XML-based parts list that uses XML descriptions of parts supplied by other manufacturers (online!). The "price" data supplied by the subcomponents would be amounts you want to total up, while the "price" data for the structure as a whole would be something you want to display. The namespace specification defines mechanisms for qualifying the names so as to eliminate ambiguity. That lets you write programs that use information from other sources and do the right things with it.
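The parts-list scenario might look like the following sketch, which uses xml.etree.ElementTree from Python's standard library. The namespace URIs, the prefixes, and the element names other than "price" are hypothetical; they were chosen only to show two different "price" elements living in one document.

```python
import xml.etree.ElementTree as ET

# Two vocabularies in one document: the default namespace belongs to the
# assembly as a whole, and the "part" prefix belongs to a supplier.
PARTS_LIST = """\
<assembly xmlns="urn:example:assembly" xmlns:part="urn:example:supplier">
  <price>120.00</price>
  <part:component><part:price>30.00</part:price></part:component>
  <part:component><part:price>45.50</part:price></part:component>
</assembly>
"""

root = ET.fromstring(PARTS_LIST)
ns = {"asm": "urn:example:assembly", "part": "urn:example:supplier"}

# The assembly's own price: something to display.
display_price = root.find("asm:price", ns).text

# The suppliers' prices: amounts to total up.
component_total = sum(float(p.text) for p in root.iterfind(".//part:price", ns))

print("display price:", display_price)
print("component total:", component_total)
```

Because each name is qualified by its namespace, the program can treat the two kinds of "price" differently without guessing from context.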

The latest information on namespaces can be found at http://www.w3.org/TR/REC-xml-names.

W3C Proposed Recommendations

A W3C "proposed recommendation" is a not-quite-final-but-probably-really-close proposal for a W3C recommendation. It is still open for review, and may see some change if the harsh light of reality forces it. But a lot of thought has been given to the proposal by many gifted people, so it's a pretty good bet that a standard in this category will go forward without much change.

RDF Schema

The RDF Schema proposal allows the specification of consistency rules and additional information that describe how the statements in a Resource Description Framework (RDF) should be interpreted.

For more information on the RDF Schema recommendation, see http://www.w3.org/TR/PR-rdf-schema.


W3C Working Drafts

A W3C working draft is a reasonable first cut at what the standard will eventually be. It makes sense conceptually, and is ready for people to begin implementation. The feedback that is developed from the efforts to actually put the standard into practice is likely to cause some change to the internal details, but not to the overall outline of the specification.

XSL
Extensible Stylesheet Language

The XML standard specifies how to identify data, not how to display it. HTML, on the other hand, told how things should be displayed without identifying what they were. The coalescing XSL standard is essentially a translation mechanism that lets you specify what to convert an XML tag into so that it can be displayed -- for example, in HTML. Different XSL formats can then be used to display the same data in different ways, for different uses.
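Python's standard library does not include an XSL processor, so the following is not XSL. It is a hand-written stand-in that only illustrates the underlying idea of mapping each XML tag to something displayable. The tag-to-HTML table and the sample document are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from data-oriented tags to display-oriented HTML tags.
TAG_TO_HTML = {"mailAddress": "div", "name": "b", "zipcode": "p"}


def to_html(element):
    """Copy an element tree, renaming each tag according to the mapping."""
    html = ET.Element(TAG_TO_HTML.get(element.tag, "span"))
    html.text = element.text
    html.tail = element.tail
    for child in element:
        html.append(to_html(child))
    return html


source = ET.fromstring(
    "<mailAddress><name>Pat Example</name><zipcode>12345</zipcode></mailAddress>"
)
html_text = ET.tostring(to_html(source), encoding="unicode")
print(html_text)
```

Swapping in a different mapping table would display the same data in a different way, which is the role a stylesheet plays.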

The translation part of XSL is pretty complete, and a number of implementations exist. The second part of XSL is a bit more tenuous, however. That part covers formatting objects, also known as flow objects, which give you the ability to define multiple areas on a page and then link them together. When a text stream is directed at the collection, it fills the first area and then "flows" into the second when the first area is filled. Such objects are used by newsletters, catalogs, and periodical publications.

The latest W3C work on XSL is at http://www.w3.org/TR/WD-xsl.

XLL
XML Link Language

The XLL protocol consists of two proposed specifications to handle links between XML documents: XLink and XPointer, discussed next. These specifications are still in their preliminary stages, but are sure to have a big impact on how XML documents are used.

XLink: The XLink protocol is a proposed specification to handle links between XML documents. This specification allows for some pretty sophisticated linking, including two-way links, links to multiple documents, "expanding" links that insert the linked information into your document rather than replacing your document with a new page, links between two documents that are created in a third, independent document, and indirect links (so you can point to an "address book" rather than directly to the target document -- updating the address book then automatically changes any links that use it). For more information on the XLink specification, see http://www.w3.org/TR/WD-xml-link.

XPointer: In general, the XLink specification targets a document or document-segment using its ID. The XPointer specification defines mechanisms for "addressing into the internal structures of XML documents", without requiring the author of the document to have defined an ID for that segment. To quote the spec, it provides for "reference to elements, character strings, and other parts of XML documents, whether or not they bear an explicit ID attribute". For the latest XPointer specification, see http://www.w3.org/TR/WD-xptr.

XHTML

The XHTML specification is a way of making XML documents that look and act like HTML documents. Since an XML document can contain any tags you care to define, why not define a set of tags that look like HTML? That's the thinking behind the XHTML specification, at any rate. The result of this specification is a document that can be displayed in browsers and also treated as XML data. The data may not be quite as identifiable as "pure" XML, but it will be a heck of a lot easier to manipulate than standard HTML, because XML specifies a good deal more regularity and consistency.

For example, every tag in a well-formed XML document must either have an end-tag associated with it or it must end in />. So you might see <p>...</p>, or you might see <p/>, but you will never see <p> standing by itself. The upshot of that requirement is that you never have to program for the weird kinds of cases you see in HTML where, for example, a <dt> tag might be terminated by </dt>, by another <dt>, by <dd>, or by </dl>. That makes it a lot easier to write code!
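A quick way to see the rule in action is to hand a few fragments to an XML parser. This sketch uses xml.etree.ElementTree from Python's standard library; the helper name is_well_formed and the sample fragments are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if an XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Every tag is closed, either with an end-tag or with "/>".
print(is_well_formed("<body><p>text</p><p/></body>"))

# A <p> standing by itself is rejected.
print(is_well_formed("<body><p>text</body>"))

# HTML-style implied end-tags are rejected too.
print(is_well_formed("<dl><dt>term<dd>definition</dl>"))
```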

The XHTML specification is a reformulation of HTML 4.0 into XML. The latest information is at http://www.w3.org/TR/WD-html-in-xml/.

XML Schema

This specification is built on the schema proposals described below. It defines the types of elements a document can contain, their relationships, and the data they can contain in ways that go far beyond what the current DTD specification provides. See the "Schema Proposals" section below for more insight into the limitations of DTDs. For more information on the XML Schema proposal, see the W3C specs XML Schema (Structures) and XML Schema (Datatypes).

W3C "Notes"

"Notes" are not W3C standards at all. Instead, they are proposals made by various individuals and groups that cover topics that are under consideration. The W3C publishes them so that people who are busy working on the standards and reviewing them have some ideas to get started. One "note" is no more likely to reflect the eventual standard than any other -- each will be judged on its merits and, hopefully, the best features of all will be combined in the W3C draft. Most of the schema proposals to date [Mar 1999] fall into this category.

Schema Proposals

Although DTDs let you validate XML documents, they suffer from a number of shortcomings. Many of the issues stem from the fact that a DTD specification is not hierarchical. For a mailing address that contained several "parsed character data" (PCDATA) elements, for example, the DTD might look something like this:

    <!ELEMENT mailAddress (name, address, zipcode)>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT address (#PCDATA)>
    <!ELEMENT zipcode (#PCDATA)>

As you can see, the specifications are linear. There is no sense of containment, which can pollute the namespace, forcing you to come up with new names for similar elements in different settings. So if you wanted to add another "name" element to the DTD that consisted of the elements firstName, middleInitial, and lastName, then you would have to come up with another identifier. You could not simply call it "name" without conflicting with the name element defined for use in a mailAddress.

Another problem with the nonhierarchical nature of DTD specifications is that it is not clear what comments are meant to explain. A comment at the top like <!-- Address used for mailing via the postal system --> would apply to all of the elements that constitute a mailing address. But a comment like <!-- Addressee --> would apply to the name element only. On the other hand, a comment like <!-- A 5-digit string --> would apply specifically to the #PCDATA part of the zipcode element, to describe the valid formats. Finally, DTDs do not allow you to formally specify field-validation criteria, such as the 5-digit (or 5 and 4) limitation for the zipcode field.
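Since a DTD can only say that zipcode holds character data, a check like the one in the comment has to live in application code. The sketch below shows one way to do that in Python. The pattern assumes that "5 and 4" means five digits, a hyphen, and four digits, and the function name and sample document are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumed format: 5 digits, optionally followed by a hyphen and 4 digits.
ZIPCODE_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def is_valid_zipcode(text):
    """Field-level check that the DTD itself cannot express."""
    return ZIPCODE_PATTERN.fullmatch(text.strip()) is not None


address = ET.fromstring(
    "<mailAddress><name>Pat Example</name>"
    "<address>1 Example Street</address>"
    "<zipcode>12345-6789</zipcode></mailAddress>"
)
zipcode_ok = is_valid_zipcode(address.findtext("zipcode"))
print("zipcode valid:", zipcode_ok)
```

A schema language that supports data types moves this rule out of the program and into the document definition, where every consumer of the document can share it.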

To remedy these shortcomings, a number of proposals have been made for a more database-like, hierarchical "schema" that specifies validation criteria. Some of the major proposals are shown below.

DDML / XSchema
Document Definition Markup Language / XSchema

Document definitions like DTD are good to have, but a DTD has a somewhat strange syntax. DDML is the new name for the older XSchema proposal, which specifies validity constraints for an XML document using XML. DDML is one of several proposals that aim to be the successor to DTD. It is not yet clear what the final validation standard will be.

For more information on DDML, see http://www.w3.org/TR/NOTE-ddml.

DCD
Document Content Description

The DCD proposal is a mechanism for defining a standard XML front end for databases.

For more information on DCD, see http://www.w3.org/TR/NOTE-dcd.

SOX
Schema for Object-oriented XML

SOX is a schema proposal that includes extensible data types, namespaces, and embedded documentation.

For more information on SOX, see http://www.w3.org/TR/NOTE-SOX.

Other W3C Notes

Other proposals for XML-based standards include:

ICE
Information and Content Exchange

ICE is a protocol for use by content syndicators and their subscribers. It focuses on "automating content exchange and reuse, both in traditional publishing contexts and in business-to-business relationships".

For more information on ICE, see http://www.w3.org/TR/NOTE-ice.

Standards That Build on XML

The following standards and proposals build on XML. Since XML is basically a language-definition tool, these specifications use it to define standardized languages for specialized purposes.

Extended Document Standards

SMIL
Synchronized Multimedia Integration Language

SMIL is a W3C recommendation that covers audio, video, and animations. It also addresses the difficult issue of synchronizing the playback of such elements.

For more information on SMIL, see http://www.w3.org/TR/REC-smil.

MathML
Mathematical Markup Language

MathML is a W3C recommendation that deals with the representation of mathematical formulas.

For more information on MathML, see http://www.w3.org/TR/REC-MathML.

SVG
Scalable Vector Graphics

SVG is a W3C working draft that covers the representation of vector graphic images. (Vector graphic images are built from commands that say things like "draw a line (square, circle) from point x,y to point m,n" rather than encoding the image as a series of bits. Such images are more easily scalable, although they typically require more processing time to render.)

For more information on SVG, see http://www.w3.org/TR/WD-SVG.

DrawML
Drawing Meta Language

DrawML is a W3C note that covers 2D images for technical illustrations. It also addresses the problem of updating and refining such images.

For more information on DrawML, see http://www.w3.org/TR/1998/NOTE-drawml-19981203.

eCommerce Standards

cXML
Commerce XML

cXML is a RosettaNet (www.rosettanet.org) standard for setting up interactive online catalogs for different buyers, where the pricing and product offerings are company specific. It includes mechanisms to handle purchase orders, change orders, status updates, and shipping notifications.

For more information on cXML, see http://corp.ariba.com/News/AribaArchive/cxml.htm.

CBL
Common Business Library

CBL is a library of element and attribute definitions maintained by CommerceNet (www.commerce.net).

For more information on CBL and a variety of other initiatives that work together to enable eCommerce applications, see http://www.commerce.net/projects/currentprojects/eco/wg/eCo_Framework_Specifications.html.

Software Administration and Maintenance Standards

DMTF
Desktop Management Task Force

The DMTF is a group that is coming up with standards to remotely administer desktop equipment. They are planning to use XML to maintain catalogs of devices and their descriptions, and for other remote-management tasks. This group is not part of the W3C, but their activities appear to have progressed to the draft stage, so they are listed here.

For more information on this organization, see http://www.dmtf.org/.

WebDAV
Web Distributed Authoring and Versioning

WebDAV is an effort from the IETF that uses XML to maintain web servers. It allows a server's content to be created and modified over an HTTP connection. (The IETF is not affiliated with the W3C, but their "draft standard" is approximately the equivalent of a W3C "recommendation", so it is included here.)

For more information, see the "webdav" working group at http://www.ietf.org.

