

5c. Defining Attributes and Entities in the DTD

The DTD you've defined so far is fine for use with the nonvalidating parser. It tells where text is expected and where it isn't, which is all the nonvalidating parser is going to pay attention to. But for use with the validating parser, the DTD needs to specify the valid attributes for the different elements. You'll do that in this section, after which you'll define one internal entity and one external entity that you can reference in your XML file.

Defining Attributes in the DTD

Let's start by defining the attributes for the elements in the slide presentation.

Note:
The XML written in this section is contained in slideshow1b.dtd.

Add the text highlighted below to define the attributes for the slideshow element:

<!ELEMENT slideshow (slide+)>
<!ATTLIST slideshow 
            title    CDATA    #REQUIRED
            date     CDATA    #IMPLIED
            author   CDATA    "unknown"
>
<!ELEMENT slide (title, item*)>

The DTD tag ATTLIST begins the series of attribute definitions. The name that follows ATTLIST specifies the element for which the attributes are being defined. In this case, the element is the slideshow element. (Note once again the lack of hierarchy in DTD specifications.)

Each attribute is defined by a series of three space-separated values. Commas and other separators are not allowed, so formatting the definitions as shown above is helpful for readability. The first item in each line is the name of the attribute: title, date, or author, in this case. The second item indicates the type of the data: CDATA is character data, which for an attribute means a plain text string. (Entity references in the value are still expanded, and a literal left-angle bracket (<) is not permitted in an attribute value; it must be written as &lt;.) The following table presents the valid choices for the attribute type.

Attribute Type             Specifies...
(value1 | value2 | ...)    A list of values separated by vertical bars. (Example below)
CDATA                      "Unparsed character data". (For normal people, a text string.)
ID                         A name that no other ID attribute shares.
IDREF                      A reference to an ID defined elsewhere in the document.
IDREFS                     A space-separated list containing one or more ID references.
ENTITY                     The name of an unparsed entity declared in the DTD.
ENTITIES                   A space-separated list of entity names.
NMTOKEN                    A name token: one or more letters, digits, periods, hyphens, underscores, and colons, in any order.
NMTOKENS                   A space-separated list of name tokens.
NOTATION                   The name of a DTD-specified notation, which describes a non-XML data format, such as those used for image files.*

*This is a rapidly obsolescing specification that will be discussed at greater length towards the end of this section.
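
To make a few of the less familiar types concrete, here is a minimal sketch of a document that declares ID, IDREFS, and NMTOKEN attributes. It is not part of the slideshow DTD: the notes and note elements and the id, seeAlso, and keyword attributes are hypothetical names chosen only for illustration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical document: element and attribute names are illustrative only. -->
<!DOCTYPE notes [
  <!ELEMENT notes (note+)>
  <!ELEMENT note (#PCDATA)>
  <!ATTLIST note
            id        ID        #REQUIRED
            seeAlso   IDREFS    #IMPLIED
            keyword   NMTOKEN   #IMPLIED
  >
]>
<notes>
  <!-- Each id value must be unique within the document. -->
  <note id="n1" keyword="intro">First note.</note>
  <!-- Every name listed in seeAlso must match an id used somewhere in the document. -->
  <note id="n2" seeAlso="n1">Second note, which refers back to the first.</note>
</notes>
```

A validating parser should reject this document if two notes share an id, or if seeAlso names an id that does not exist.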

When the attribute type consists of a parenthesized list of choices separated by vertical bars, the attribute must use one of the specified values. For an example, add the text highlighted below to the DTD:

<!ELEMENT slide (title, item*)>
<!ATTLIST slide 
            type   (tech | exec | all) #IMPLIED
>
<!ELEMENT title (#PCDATA)>
<!ELEMENT item (#PCDATA | item)* >

This specification says that the slide element's type attribute, when present, must be given as type="tech", type="exec", or type="all". No other values are acceptable, although the attribute can be omitted because it is declared #IMPLIED. (DTD-aware XML editors can use such specifications to present a pop-up list of choices.)
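
As a rough illustration of how the enumerated type behaves, the sketch below repeats the declarations in an internal DTD subset so that the file stands on its own. The slide titles are made up, and the rejected slide is left inside a comment so that the document stays valid.

```xml
<?xml version="1.0"?>
<!DOCTYPE slideshow [
  <!ELEMENT slideshow (slide+)>
  <!ELEMENT slide (title, item*)>
  <!ATTLIST slide
            type   (tech | exec | all) #IMPLIED
  >
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT item (#PCDATA | item)* >
]>
<slideshow>
  <!-- Accepted: "tech" is one of the listed values. -->
  <slide type="tech"><title>Technical details</title></slide>
  <!-- Accepted: the attribute is #IMPLIED, so it may be left out. -->
  <slide><title>No type given</title></slide>
  <!-- A validating parser would reject the slide below if it were
       uncommented, because "sales" is not one of the listed values:
  <slide type="sales"><title>Rejected</title></slide>
  -->
</slideshow>
```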

The last entry in the attribute specification determines the attribute's default value, if any, and tells whether or not the attribute is required. The table below shows the possible choices.

Specification          Specifies...
#REQUIRED              The attribute value must be specified in the document.
#IMPLIED               The value need not be specified in the document. If it isn't, the parser supplies no value, and the application is expected to use a default of its own.
"defaultValue"         The default value to use, if a value is not specified in the document.
#FIXED "fixedValue"    The value to use. If the document specifies any value at all, it must be the same.
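
The four choices can be seen side by side in the sketch below. It reuses the slideshow attributes defined earlier and adds a format attribute, which is hypothetical and included only to show #FIXED. The content model is trimmed to keep the example short.

```xml
<?xml version="1.0"?>
<!DOCTYPE slideshow [
  <!ELEMENT slideshow (slide+)>
  <!ATTLIST slideshow
            title    CDATA    #REQUIRED
            date     CDATA    #IMPLIED
            author   CDATA    "unknown"
            format   CDATA    #FIXED "1.0"
  >
  <!ELEMENT slide (title)>
  <!ELEMENT title (#PCDATA)>
]>
<!-- title:  must be present, because it is #REQUIRED.
     date:   omitted here; it is #IMPLIED, so no value is reported.
     author: omitted here; the parser supplies the default, "unknown".
     format: omitted here; the parser supplies the fixed value, "1.0".
             Writing format="2.0" would be a validity error. -->
<slideshow title="Sample Slide Show">
  <slide><title>Only slide</title></slide>
</slideshow>
```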

Defining Entities in the DTD

So far, you've seen predefined entities like &amp; and you've seen that an attribute can reference an entity. It's time now for you to learn how to define entities of your own.

Note: The XML defined here is contained in slideSample06.xml. The output is shown in Echo09-06.log.

Add the text highlighted below to the DOCTYPE tag in your XML file:

<!DOCTYPE slideshow SYSTEM "slideshow1.dtd" [
  <!ENTITY product  "WonderWidget">
  <!ENTITY products "WonderWidgets">
]>

The ENTITY tag name says that you are defining an entity. Next comes the name of the entity and its definition. In this case, you are defining an entity named "product" that will take the place of the product name. Later, when the product name changes (as it most certainly will), you will only have to change the name in one place, and all your slides will reflect the new value.

The last part is the substitution string that replaces the entity name whenever it is referenced in the XML document. The substitution string is defined in quotes, which are not included when the text is inserted into the document.

Just for good measure, we defined two versions, one singular and one plural, so that when the marketing mavens come up with "Wally" for a product name, you will be prepared to enter the plural as "Wallies" and have it substituted correctly.

Note: Truth be told, this is the kind of thing that really belongs in an external DTD. That way, all your documents can reference the new name when it changes. But, hey, this is an example...

Now that you have the entities defined, the next step is to reference them in the slide show. Make the changes highlighted below to do that:

<slideshow 
    title="&product; Slide Show" 
    ...

    <!-- TITLE SLIDE -->
    <slide type="all">
       <title>Wake up to &products;!</title>
    </slide>

    <!-- OVERVIEW -->
    <slide type="all">
      <title>Overview</title>
      <item>Why <em>&products;</em> are great</item>
      <item/>
      <item>Who <em>buys</em> &products;</item>
    </slide>

The points to notice here are that entities you define are referenced with the same syntax (&entityName;) that you use for predefined entities, and that the entity can be referenced in an attribute value as well as in an element's contents.
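
Putting the pieces together, here is a minimal self-contained sketch that declares the two entities and references them in both an attribute value and element content. The DTD is written inline and trimmed to the bare minimum, which is a simplification for this example rather than the tutorial's full slideshow DTD.

```xml
<?xml version="1.0"?>
<!DOCTYPE slideshow [
  <!ELEMENT slideshow (slide+)>
  <!ATTLIST slideshow title CDATA #REQUIRED>
  <!ELEMENT slide (title)>
  <!ELEMENT title (#PCDATA)>
  <!-- Internal general entities: the quoted text replaces each reference. -->
  <!ENTITY product  "WonderWidget">
  <!ENTITY products "WonderWidgets">
]>
<!-- The parser reports the title attribute as "WonderWidget Slide Show". -->
<slideshow title="&product; Slide Show">
  <slide><title>Wake up to &products;!</title></slide>
</slideshow>
```

Changing the quoted text in the two ENTITY declarations is enough to rename the product everywhere it is referenced.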

Echoing the Entity References

When you run the Echo program on this version of the file, here is the kind of thing you see:

ELEMENT: <title>
CHARS:   Wake up to 
CHARS:   WonderWidgets
CHARS:   !
END_ELM: </title>

Note that the existence of the entity reference generates an extra call to the characters method, and that the text you see is what results from the substitution.

Additional Useful Entities

Here are five other examples of entity definitions that you might find useful when you write an XML document:
<!ENTITY ldquo  "&#8220;"> <!-- Left Double Quote -->
<!ENTITY rdquo  "&#8221;"> <!-- Right Double Quote -->
<!ENTITY trade  "&#8482;"> <!-- Trademark Symbol (TM) -->
<!ENTITY rtrade "&#174;"> <!-- Registered Trademark (R) -->
<!ENTITY copyr  "&#169;"> <!-- Copyright Symbol -->

Referencing External Entities

You can also use the SYSTEM or PUBLIC identifier to name an entity that is defined in an external file. You'll do that now.

Note: The XML defined here is contained in slideSample07.xml and in copyright.xml. The Echo output is shown in Echo09-07.log.

To reference an external entity, add the text highlighted below to the DOCTYPE statement in your XML file:

<!DOCTYPE slideshow SYSTEM "slideshow1.dtd" [
  <!ENTITY product  "WonderWidget">
  <!ENTITY products "WonderWidgets">
  <!ENTITY copyright SYSTEM "copyright.xml">
]>

This definition references a copyright message contained in a file named copyright.xml. Create that file and put some interesting text in it, perhaps something like this:

<!--  A SAMPLE copyright  -->
This is the standard copyright message that our lawyers
make us put everywhere so we don't have to shell out a
million bucks every time someone spills hot coffee in their
lap...

Finally, add the text highlighted below to your slideSample.xml file to reference the external entity:

<!-- TITLE SLIDE -->
  ...
</slide>

<!-- COPYRIGHT SLIDE -->
<slide type="all">
   <item>&copyright;</item>
</slide>
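
The declaration used here relies on SYSTEM. For comparison, the PUBLIC form mentioned at the start of this section might look something like the sketch below. The public identifier is invented for illustration, and the content model is trimmed to keep the example short.

```xml
<?xml version="1.0"?>
<!DOCTYPE slideshow [
  <!ELEMENT slideshow (slide+)>
  <!ELEMENT slide (item*)>
  <!ELEMENT item (#PCDATA)>
  <!-- The public identifier below is made up for illustration. The system
       identifier that follows it is still required for a general entity,
       and it is what a parser falls back on when it does not recognize
       the public identifier. -->
  <!ENTITY copyright PUBLIC "-//Example Corp//TEXT Copyright Notice//EN"
                            "copyright.xml">
]>
<slideshow>
  <slide><item>&copyright;</item></slide>
</slideshow>
```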

You could also use an external entity declaration to access a servlet that produces the current date using a definition something like this:

 <!ENTITY currentDate SYSTEM
     "http://www.example.com/servlet/CurrentDate?fmt=dd-MMM-yyyy"> 

You would then reference that entity the same as any other entity:

Today's date is &currentDate;.

Echoing the External Entity

When you run the Echo program on your latest version of the slide presentation, here is what you see:

        ...
        END_ELM: </slide>
        ELEMENT: <slide
           ATTR: type	"all"
        >
            ELEMENT: <item>
            CHARS:   
This is the standard copyright message that our lawyers
make us put everywhere so we don't have to shell out a
million bucks every time someone spills hot coffee in their
lap...
            END_ELM: </item>
        END_ELM: </slide>
        ...

Note that the newline which follows the comment in the file is echoed as a character, but that the comment itself is ignored. That is the reason that the copyright message appears to start on the next line after the CHARS: label, instead of immediately after the label -- the first character echoed is actually the newline that follows the comment.

Summarizing Entities

An entity that is referenced in the document content, whether internal or external, is termed a general entity. An entity that contains DTD specifications that are referenced from within the DTD is termed a parameter entity. (More on that later.)
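
As a brief illustration of the distinction, the following sketch shows a parameter entity next to a general entity. It is assumed to live in an external DTD file, because XML 1.0 does not allow a parameter-entity reference inside a declaration in the internal subset. The entity name inline is made up for this example.

```xml
<!-- Sketch of an external DTD file. The entity name "inline" is made up
     for illustration. -->

<!-- Parameter entity: declared with a percent sign, usable only inside the DTD. -->
<!ENTITY % inline "#PCDATA | em">

<!-- The reference %inline; is replaced by the text above, so this
     declaration reads as (#PCDATA | em | item)*. -->
<!ELEMENT item (%inline; | item)*>
<!ELEMENT em (#PCDATA)>

<!-- General entity: referenced as &product; in document content. -->
<!ENTITY product "WonderWidget">
```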

An entity which contains XML (text and markup), and which is therefore parsed, is known as a parsed entity. An entity which contains binary data (like images) is known as an unparsed entity. (By its very nature, it must be external.) We'll be discussing references to unparsed entities in the next section of this tutorial.

