In Chapter 3, we developed a document template for creating XML documents that can be viewed in Web browsers as HTML documents. In this chapter, we will create a document type definition (DTD) for this template. The DTD defines a set of rules that apply to every XML document created from the template, and it can be used both to create such documents and to validate that they conform to those rules.
Many tools are available for creating and editing DTDs, for example, XML Authority, XML Spy, and Near and Far. We will use XML Authority to create and edit our DTD. You can download a trial version of XML Authority from http://www.extensibility.com. Microsoft XML Notepad cannot be used to edit DTDs (although it can validate a document that has a DTD).
In this chapter, we will build a DTD that defines a set of rules for the content of the sample Web document template we created in Chapter 3. The DTD can be used to verify that a set of XML documents is created according to the rules defined in the DTD by checking the validity of the documents.
NOTE
If you are building a large Internet system, you can define a set of rules that all developers must use when creating Web pages. If the Web pages are written using XML, a DTD can be used to verify that all the pages follow the rules. XML can also be used to pass information from one corporation to another or from one department to another within a corporation. The DTD can be used to verify that the incoming information is in the correct format.
To open the sample document in XML Authority, follow these steps:
- Open XML Authority, select New from the File menu, and then select New (DTD) from the submenu. If a default UNNAMED element appears at the top of the document, delete it.
- Choose Import from the File menu, and then choose XML Document from the submenu.
- Select the Standard.xml document you created in Chapter 3. XML Authority will import the document as a DTD.
Figure 4-1 shows Standard.xml displayed in XML Authority.
Figure 4-1. The Standard.xml template displayed in XML Authority.
- Choose Source from the View menu.
XML Authority automatically builds a DTD for the XML document, so in this case, the source is a DTD for the Standard.xml XML document. The complete source code that XML Authority generated is shown here:
<!ELEMENT html (head, body)>
<!ELEMENT head (title, base)>
<!ELEMENT title ( )>
<!ELEMENT base ( )>
<!ATTLIST base target CDATA #REQUIRED>
<!ELEMENT body (basefont, a, table)>
<!ATTLIST body
  alink CDATA #REQUIRED
  text CDATA #REQUIRED
  bgcolor CDATA #REQUIRED
  link CDATA #REQUIRED
  vlink CDATA #REQUIRED>
<!ELEMENT basefont ( )>
<!ATTLIST basefont size CDATA #REQUIRED>
<!ELEMENT a ( )>
<!ATTLIST a
  href CDATA #IMPLIED
  name CDATA #IMPLIED
  target CDATA #IMPLIED>
<!ELEMENT table (tr)>
<!ATTLIST table
  width CDATA #REQUIRED
  rules CDATA #REQUIRED
  frame CDATA #REQUIRED
  align CDATA #REQUIRED
  cellpadding CDATA #REQUIRED
  border CDATA #REQUIRED
  cellspacing CDATA #REQUIRED>
<!ELEMENT tr (td)>
<!ATTLIST tr
  bgcolor CDATA #REQUIRED
  valign CDATA #REQUIRED
  align CDATA #REQUIRED>
<!ELEMENT td (CellContent)>
<!ATTLIST td
  bgcolor CDATA #REQUIRED
  valign CDATA #REQUIRED
  align CDATA #REQUIRED
  rowspan CDATA #REQUIRED
  colspan CDATA #REQUIRED>
<!ELEMENT CellContent (h1, p)>
<!ATTLIST CellContent cellname CDATA #REQUIRED>
<!ELEMENT h1 ( )>
<!ATTLIST h1 align CDATA #REQUIRED>
<!ELEMENT p (font+, img, br, a, ul, ol)>
<!ELEMENT font (b)>
<!ATTLIST font
  color CDATA #REQUIRED
  face CDATA #REQUIRED
  size CDATA #REQUIRED>
<!ELEMENT b ( )>
<!ELEMENT img ( )>
<!ATTLIST img
  width CDATA #REQUIRED
  height CDATA #REQUIRED
  hspace CDATA #REQUIRED
  vspace CDATA #REQUIRED
  src CDATA #REQUIRED
  alt CDATA #REQUIRED
  align CDATA #REQUIRED
  border CDATA #REQUIRED
  lowsrc CDATA #REQUIRED>
<!ELEMENT br ( )>
<!ATTLIST br clear CDATA #REQUIRED>
<!ELEMENT ul (font, li)>
<!ATTLIST ul type CDATA #REQUIRED>
<!ELEMENT li (font, a)>
<!ELEMENT ol (font, li)>
<!ATTLIST ol
  type CDATA #REQUIRED
  start CDATA #REQUIRED>
As you can see, the DTD consists of two basic components: !ELEMENT and !ATTLIST. In this chapter, we will look at these two statements in detail.
NOTE
The DTD that has been generated here is only the first approximation. In this chapter, you will refine this DTD so that it defines a set of rules for your XML documents.
Every element used in your XML documents has to be declared by using the <!ELEMENT> tag in the DTD. The format for declaring an element in a DTD is shown here:
<!ELEMENT ElementName Rule>
The Rule component defines the rule for the content contained in the element. These rules define the logical structure of the XML document and can be used to check the document's validity. The rule can consist of a generic declaration and one or more elements, either grouped or unordered.
Three generic content declarations are predefined for XML DTDs: PCDATA, ANY, and EMPTY.
PCDATA
The PCDATA declaration can be used when the content within an element is only text, that is, when the content contains no child elements. Our sample document contains several such elements, including title, a, h1, and b. These elements can be declared as follows. (The pound sign identifies a special predefined name.)
<!ELEMENT title (#PCDATA)>
<!ELEMENT a (#PCDATA)>
<!ELEMENT h1 (#PCDATA)>
<!ELEMENT b (#PCDATA)>
NOTE
PCDATA is also valid with empty elements.
ANY
The ANY declaration can include both text content and child elements. The html element, for example, could use the ANY declaration as follows:
<!ELEMENT html ANY>
This ANY declaration would allow the body and head elements to be included in the html element in an XML document:
<html><head/><body/></html>
The following XML would also be valid:
<html>This is an HTML document.<head/><body/></html>
And this XML would also be accepted under the ANY declaration, provided that AnotherTag is itself declared somewhere in the DTD:
<html>This is an HTML document.<head/><body/><AnotherTag/></html>
The ANY declaration allows any content to be marked by the element tags, provided the content is well-formed XML. Although this flexibility might seem useful, it defeats the purpose of the DTD, which is to define the structure of the XML document so that the document can be validated. In brief, the content of an element that uses ANY is checked only loosely: it must be well formed and built from declared elements, but its structure is not constrained.
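The behavior of ANY can be tried with a small self-contained document. The following is only a sketch: it assumes, purely for brevity, that head and body are declared EMPTY rather than with their real content models. Text and declared child elements may then appear in any order inside html.

```xml
<?xml version="1.0"?>
<!DOCTYPE html [
  <!-- ANY: text and any declared elements, in any order -->
  <!ELEMENT html ANY>
  <!-- Simplified declarations, assumed only for this sketch -->
  <!ELEMENT head EMPTY>
  <!ELEMENT body EMPTY>
]>
<html>This is an HTML document.<body/><head/></html>
```

Note that body precedes head here, an order that the (head, body) declaration discussed later would reject.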
EMPTY
It is possible for an element to have no content, that is, no child elements or text. The img element is an example of this scenario. The following is its definition:
<!ELEMENT img EMPTY>
The base, br, and basefont elements are also correctly declared using EMPTY in our sample DTD.
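As a small illustration, an element declared EMPTY can be written either as an empty-element tag or as a start tag immediately followed by its end tag. In this sketch the root element gallery is hypothetical and is not part of the sample template.

```xml
<?xml version="1.0"?>
<!DOCTYPE gallery [
  <!-- gallery is a hypothetical root element used only for this sketch -->
  <!ELEMENT gallery (img+)>
  <!ELEMENT img EMPTY>
  <!ATTLIST img src CDATA #REQUIRED>
]>
<gallery>
  <!-- Valid: empty-element tag -->
  <img src="Northwind.jpg"/>
  <!-- Valid: start tag immediately followed by end tag -->
  <img src="Northwind.jpg"></img>
  <!-- Invalid if uncommented: an EMPTY element may not contain text -->
  <!-- <img src="Northwind.jpg">logo</img> -->
</gallery>
```

An EMPTY element can still carry attributes, which is how img, base, br, and basefont hold their information.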
Instead of using the ANY declaration for the html element, you should define the content so that the html element can be validated. The following is a declaration that specifies the content of the html element and is the same as the one given by XML Authority:
<!ELEMENT html (head, body)>
This (head, body) declaration signifies that the html element will have two child elements: head and body. You can list one child element within the parentheses or as many child elements as are required. You must separate each child element in your declaration with a comma.
For the XML document to be valid, the order in which the child elements are declared must match the order of the elements in the XML document. The comma that separates each child element is interpreted as "followed by"; therefore, the preceding declaration tells us that the html element will have a head child element followed by a body child element. Building on the preceding declaration, the following is valid XML:
<html><head></head><body/></html>
However, the following statement would not be valid:
<html><body></body><head/></html>
The declaration indicates that the html element must contain two child elements (the first is head and the second is body) and that there can be only one instance of each element. Here the two elements appear in the wrong order.
The following two statements would also be invalid:
<html><body></body></html>
<html><head/><body/><head/><body/></html>
The first statement is missing the head element, and in the second statement the head and body elements are listed twice.
Reoccurrence
You will want every html element to include one head and one body child element, in the order listed. Other elements, such as the body and table elements, will have child elements that might be included multiple times within the main element or might not be included at all. XML provides three markers that can be used to indicate the reoccurrence of a child element, as shown in the following table:
XML Element Markers
Marker    Meaning
?         The element either does not appear or can appear only once (0 or 1).
+         The element must appear at least once (1 or more).
*         The element can appear any number of times, or it might not appear at all (0 or more).
Putting no marker after the child element indicates that the element must be included and that it can appear only one time.
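The markers can be seen side by side in a short example. This is a sketch only; the element names order, customer, note, item, and coupon are hypothetical and were chosen just to show one marker each.

```xml
<?xml version="1.0"?>
<!DOCTYPE order [
  <!-- All element names here are hypothetical -->
  <!ELEMENT order (customer, note?, item+, coupon*)>
  <!ELEMENT customer (#PCDATA)>  <!-- no marker: exactly once -->
  <!ELEMENT note (#PCDATA)>      <!-- ?: zero or one -->
  <!ELEMENT item (#PCDATA)>      <!-- +: one or more -->
  <!ELEMENT coupon (#PCDATA)>    <!-- *: zero or more -->
]>
<order>
  <customer>Northwind Traders</customer>
  <item>Tea</item>
  <item>Coffee</item>
</order>
```

The document is valid even though it has no note or coupon elements. Removing both item elements, or adding a second customer, would make it invalid.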
The head element contains an optional base child element. To declare this element as optional, modify the preceding declaration as follows:
<!ELEMENT head (title, base?)>
The body element contains a basefont element and an a element that are also optional. In our example, the table element is a required element used to format the page, so you want to make table a required element that appears only once in the body element. You can now rewrite the body element as follows:
<!ELEMENT body (basefont?, a?, table)>
The table element can have as many rows as are needed to format the page but must include at least one row. The table element should now be written as follows:
<!ELEMENT table (tr+)>
The same conditions hold true for the tr element: the row element must have at least one column, as shown here:
<!ELEMENT tr (td+)>
The a, ul, and ol elements might not be included in the p element, or they might be included many times, as shown here:
<!ELEMENT p (font+, img, br, a*, ul*, ol*)>
Because the br element formats text around an image, the img and br tags should always be used together.
Grouping child elements
Fortunately, XML provides a way to group elements. For example, you can rewrite the p element as follows:
<!ELEMENT p (font*, (img, br?)*, a*, ul*, ol*)>
This declaration specifies that an img element, optionally followed by a br element, appears zero or more times in the p element.
One problem remains in this declaration. As mentioned, the comma separator can be interpreted as the words followed by. Thus, each p element will have font, img, br, a, ul, and ol child elements, in that order. This is not exactly what you want; instead, you want to be able to use these elements in any order and to use some elements in some paragraphs and other elements in other paragraphs. For example, you would like to be able to write the following code:
<p>
  <font size="5">
    <b>Three Reasons to Shop Northwind Traders</b>
  </font>
  <ol>
    <li> <a href="Best.htm">Best Prices</a> </li>
    <li> <a href="Quality.htm">Quality</a> </li>
    <li> <a href="Service.htm">Fast Service</a> </li>
  </ol>
  <!--The following img element is not in the correct order.-->
  <img src="Northwind.jpg"></img>
</p>
As you can see, the img element is not in the correct order: it should precede the ol element, since the declaration imposes a strict ordering on the elements.
NOTE
Also, numerous elements are declared but are not included (for example, ul). The missing elements are not a problem because you have declared each element with an asterisk (*), indicating that there can be zero or more of each element.
To allow a "reordering" of elements, you could rewrite the declaration as follows:
<!ELEMENT p (font*, (img, br?)*, a*, ul*, ol*)+>
The plus sign (+) at the very end of the declaration indicates that one or more copies of these child elements can occur within a p element.
The preceding XML code could thus be interpreted as two sets of child elements, as shown here:
<p>
  <!--The elements that follow are the first set of (font*, (img, br?)*, a*, ul*, ol*) elements (missing the (img, br), a, and ul elements).-->
  <font size="5">
    <b>Three Reasons to Shop Northwind Traders</b>
  </font>
  <ol>
    <li> <a href="Best.htm">Best Prices</a> </li>
    <li> <a href="Quality.htm">Quality</a> </li>
    <li> <a href="Service.htm">Fast Service</a> </li>
  </ol>
  <!--The img element that follows is a second set of (font*,(img, br?)*, a*, ul*, ol*) elements containing only an img element.-->
  <img src="Northwind.jpg"></img>
</p>
This new declaration is better, but it is a roundabout way to let you choose any element in any order: all of the elements have been declared as optional, and the reordering depends on repeating the whole group (as indicated by the plus sign at the end of the list of elements). There is another option.
Creating an unordered set of child elements
In addition to using commas to separate elements, you can use a vertical bar (|). The vertical bar separator indicates that one child element or the other child element, but not both, will be included within the element; in other words, one element or the other must be present. The preceding declaration can thus be rewritten as follows:
<!ELEMENT p (font | (img, br?) | a | ul | ol)+>
This declaration specifies that each occurrence of the group in the p element is a font child element, an (img, br?) pair, an a child element, a ul child element, or an ol child element, but only one of these. The plus sign (+) indicates that the group must occur one or more times, so the p element contains one or more of these choices. With this declaration, you can use child elements in any order, as many times as needed.
NOTE
The additional markers (?, +, *) can be used to override the vertical bar (|), which limits the occurrences of the child element to one or none.
According to the new declaration, our XML code will be interpreted as follows:
<p>
  <!--First group, containing single font element-->
  <font size="5">
    <b>Three Reasons to Shop Northwind Traders</b>
  </font>
  <!--Second group, containing the single child element ol-->
  <ol>
    <li> <a href="Best.htm">Best Prices</a> </li>
    <li> <a href="Quality.htm">Quality</a> </li>
    <li> <a href="Service.htm">Fast Service</a> </li>
  </ol>
  <!--Third group, containing a single child element img-->
  <img src="Northwind.jpg"></img>
</p>
Suppose you also want to include text within the p element. To do this, you will need to add a PCDATA declaration to the group. You will have to use the vertical bar separator because you cannot use the PCDATA declaration if the child elements are separated by commas. You also cannot have a subgroup such as (img, br?) within a group that includes PCDATA. We can solve this problem by creating a new element named ImageLink that contains the subgroup and add it to the p element as follows:
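<!ELEMENT ImageLink (img, br?)>
<!ELEMENT p (#PCDATA | font | ImageLink | a | ul | ol)*>
Web browsers that do not understand XML will ignore the ImageLink element. When you use PCDATA within a group of child elements, it must be listed first and must be preceded by a pound sign (#). The group as a whole must be followed by an asterisk (*); the XML specification does not permit the plus sign on a group that contains PCDATA.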
You can use the DTD to make certain sections of the document appear in a certain order and include a specific number of child elements (as was done with the html element). You can also create sections of the document that contain an unspecified number of child elements in any order. DTDs are extremely flexible and can enable you to develop a set of rules that matches your requirements.
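Both styles can be combined in one DTD. The following is a minimal sketch, assuming the hypothetical elements page and para (b and a mirror the sample DTD): the page element fixes the order of its children with commas, while para accepts text and child elements in any order. The mixed-content group is written with an asterisk, which the XML specification requires for a group containing PCDATA.

```xml
<?xml version="1.0"?>
<!DOCTYPE page [
  <!-- page and para are hypothetical names used only for this sketch -->
  <!ELEMENT page (title, para+)>       <!-- fixed order: title, then paragraphs -->
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT para (#PCDATA | b | a)*>   <!-- text and children in any order -->
  <!ELEMENT b (#PCDATA)>
  <!ELEMENT a (#PCDATA)>
  <!ATTLIST a href CDATA #IMPLIED>
]>
<page>
  <title>Help Desk</title>
  <para>Read the <a href="FreqAskedQ.htm">FAQ</a> <b>first</b>, then call us.</para>
  <para><b>Shipping</b> rates are listed <a href="Rates.htm">here</a>.</para>
</page>
```

Moving title after the first para would make the document invalid, whereas the contents of each para can be rearranged freely.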
Every element can have a set of attributes associated with it. The attributes for an element are defined in an !ATTLIST statement. The format for the !ATTLIST statement is shown here:
<!ATTLIST ElementName AttributeDefinition>
ElementName is the name of the element to which these attributes belong.
AttributeDefinition consists of the following components:
AttributeName AttributeType DefaultDeclaration
AttributeName is the name of the attribute. AttributeType refers to the data type of the attribute. DefaultDeclaration contains the default declaration section of the attribute definition.
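To make the three components concrete, here is a sketch of one !ATTLIST statement that uses two attributes of the table element from the sample DTD. Comments cannot appear inside a declaration, so the labels are placed above it.

```xml
<!-- ElementName: table -->
<!-- Each row below is: AttributeName  AttributeType  DefaultDeclaration -->
<!ATTLIST table
  width   CDATA  #IMPLIED
  border  CDATA  "0">
```

Here width has no default value and may be omitted, while border takes the value 0 whenever a table element does not set it.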
XML DTD attributes can have the following data types: CDATA, enumerated, ENTITY, ENTITIES, ID, IDREF, IDREFS, NMTOKEN, and NMTOKENS.
CDATA
The CDATA data type indicates that the attribute can be set to any allowable character value. For our sample DTD used for creating Web pages, the vast majority of the elements will have attributes with a CDATA data type. The following body attributes should all be CDATA:
<!ATTLIST body
  alink CDATA #REQUIRED
  text CDATA #REQUIRED
  bgcolor CDATA #REQUIRED
  link CDATA #REQUIRED
  vlink CDATA #REQUIRED>
Notice that you can list multiple attributes for a single element.
Enumerated
The enumerated data type lists a set of values that are allowed for the attribute. Using an enumerated data type, you can rewrite the font element to limit the color attribute to Cyan, Lime, Black, White, or Maroon; limit the size attribute to 2, 3, 4, 5, or 6; and limit the face attribute to Times New Roman or Arial. The new font declaration would look as follows:
<!ATTLIST font
  color (Cyan | Lime | Black | White | Maroon) #REQUIRED
  size (2 | 3 | 4 | 5 | 6) #REQUIRED
  face ('Times New Roman'|Arial) #REQUIRED>
NOTE
Keep in mind that this declaration is case sensitive. Thus, entering cyan as a color value would cause an error. Also notice the use of the parentheses to group the collection of choices. Be aware that the XML specification requires every enumerated value to be a name token, so a value that contains spaces or quotation marks, such as 'Times New Roman', can be rejected by a conforming validating parser; declaring face as CDATA avoids the problem.
In the section "The Default Declaration" later in this chapter, you'll learn how to declare a default value for the color and size attributes.
ENTITY and ENTITIES
The ENTITY and ENTITIES data types are used to define reusable strings that are represented by a specific name. These data types will be discussed in detail in Chapter 5.
ID, IDREF, and IDREFS
Within a document, you may want to be able to identify certain elements with an attribute that is of the ID data type. The value of an attribute with an ID data type must be unique among all of the ID values in the document. Other elements can reference this ID by using the IDREF or IDREFS data types. An IDREF attribute holds a single ID value, and an IDREFS attribute holds a list of ID values separated by white space.
When you work with HTML, you use anchor (a) elements to bookmark sections of your document. These bookmarks can be used to link to sections of the document. Unlike an ID value, the name given to an a element does not have to be unique. In XML, IDs are used to create links to different places in your document. When we examine linking in detail in Chapter 6, you'll see that the ID data type offers other advantages.
Our example document includes an a element at the top of the document as an anchor that can be used to jump to the top of the page. You can modify the a element definition in the DTD as follows:
<!ATTLIST a linkid ID #REQUIRED href CDATA #IMPLIED name CDATA #IMPLIED target CDATA #IMPLIED>
Now when you create an XML document, you can define an a element at the top of the page and associate a unique ID with it using the linkid attribute. To reference this ID from another element, you first have to add an IDREF attribute to that element, as shown here:
<!ATTLIST ul headlink IDREF #IMPLIED type CDATA #REQUIRED>
In your XML document, you can associate the linkid attribute of the a element with the headlink attribute of the ul element by assigning the same value (HeadAnchor, for example) to these two attributes. If a second a element with its own ID value (FootAnchor, for example) were added at the bottom of the XML document, you could define references to both of these elements. One way is to declare a second IDREF attribute, named footlink, as shown here; alternatively, a single attribute of type IDREFS could hold both ID values, separated by white space:
<!ATTLIST ul headlink IDREF #IMPLIED footlink IDREF #IMPLIED type CDATA #REQUIRED>
The actual XML document would contain the following code:
<a linkid="HeadAnchor" name="head">
  <!--Head anchor-->
</a>
<!--Some HTML code here-->
<a href="#head">
  <ul headlink="HeadAnchor">
    <!--li elements here-->
  </ul>
</a>
<a href="#foot">
  <ul footlink="FootAnchor">
    <!--li elements here-->
  </ul>
</a>
<!--Some more HTML code here-->
<a linkid="FootAnchor" name="foot">
  <!--Foot anchor-->
</a>
This code will work with non-XML browsers and with browsers that support XML.
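For comparison, the IDREFS data type can be sketched as follows. The element page and the attribute anchors are hypothetical names introduced only for this illustration; linkid is the ID attribute described above. A single IDREFS attribute holds several ID values separated by white space, and each value must match an ID that exists somewhere in the document.

```xml
<?xml version="1.0"?>
<!DOCTYPE page [
  <!-- page and anchors are hypothetical names -->
  <!ELEMENT page (a+, ul)>
  <!ELEMENT a (#PCDATA)>
  <!ATTLIST a linkid ID #REQUIRED>
  <!ELEMENT ul (li+)>
  <!ATTLIST ul anchors IDREFS #IMPLIED>
  <!ELEMENT li (#PCDATA)>
]>
<page>
  <a linkid="HeadAnchor">Top</a>
  <a linkid="FootAnchor">Bottom</a>
  <!-- One IDREFS attribute holds both references -->
  <ul anchors="HeadAnchor FootAnchor">
    <li>Best Prices</li>
  </ul>
</page>
```

A validating parser would report an error if anchors named a value, such as MiddleAnchor, that no ID attribute in the document carries.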
NMTOKEN and NMTOKENS
The NMTOKEN and NMTOKENS data types are similar to the CDATA data type in that they represent character values. The name tokens are strings that consist of letters, digits, underscores, colons, hyphens, and periods. They cannot contain spaces. An NMTOKENS value is a list of name tokens separated by white space, not by commas. A declaration using these data types could look as follows:
<!ATTLIST body background NMTOKEN "Blue" foreground NMTOKENS "Green Yellow Orange">
The Default Declaration
The default declaration can consist of any valid value for your attributes, or it can consist of one of three predefined keywords: #REQUIRED, #IMPLIED, or #FIXED. The #REQUIRED keyword indicates that the attribute must be included with the element and that it must be assigned a value. There are no default values when #REQUIRED is used. The #IMPLIED keyword indicates that the attribute does not have to be included with the element and that there is no default value. The #FIXED keyword sets the attribute to one default value that cannot be changed. The default value is listed after the #FIXED keyword. If none of these three keywords are used, a default value can be assigned if an attribute is not set in the XML document.
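The four kinds of default declaration can be compared in one short document. This is a sketch under stated assumptions: the td element is simplified to hold only text, and the version attribute is hypothetical, added solely to show #FIXED.

```xml
<?xml version="1.0"?>
<!DOCTYPE td [
  <!-- Simplified td; version is a hypothetical attribute -->
  <!ELEMENT td (#PCDATA)>
  <!ATTLIST td
    cellname CDATA               #REQUIRED
    rowspan  CDATA               #IMPLIED
    version  CDATA               #FIXED "1.0"
    align    (Left|Right|Center) "Center">
]>
<td cellname="Header">Help Desk</td>
```

Only cellname has to be written in the document. A validating parser supplies align="Center" and version="1.0" on its own, leaves rowspan unset, and reports an error if the document sets version to anything other than 1.0.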
Based on this information about the components of the !ELEMENT and !ATTLIST statements, we can rewrite our original DTD as follows:
<!ELEMENT html (head, body)>
<!ELEMENT head (title, base?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT base EMPTY>
<!ATTLIST base target CDATA #REQUIRED>
<!ELEMENT body (basefont?, a?, table)>
<!ATTLIST body
  alink CDATA #IMPLIED
  text CDATA #IMPLIED
  bgcolor CDATA #IMPLIED
  link CDATA #IMPLIED
  vlink CDATA #IMPLIED>
<!ELEMENT basefont EMPTY>
<!ATTLIST basefont size CDATA #REQUIRED>
<!ELEMENT a (#PCDATA)>
<!ATTLIST a
  linkid ID #IMPLIED
  href CDATA #IMPLIED
  name CDATA #IMPLIED
  target CDATA #IMPLIED>
<!ELEMENT table (tr+)>
<!ATTLIST table
  width CDATA #IMPLIED
  rules CDATA #IMPLIED
  frame CDATA #IMPLIED
  align CDATA 'Center'
  cellpadding CDATA '0'
  border CDATA '0'
  cellspacing CDATA '0'>
<!ELEMENT tr (td+)>
<!ATTLIST tr
  bgcolor (Cyan | Lime | Black | White | Maroon) 'White'
  valign (Top | Middle | Bottom) 'Middle'
  align (Left | Right | Center) 'Center'>
<!ELEMENT td (CellContent)>
<!ATTLIST td
  bgcolor (Cyan | Lime | Black | White | Maroon) 'White'
  valign (Top | Middle | Bottom) 'Middle'
  align (Left | Right | Center) 'Center'
  rowspan CDATA #IMPLIED
  colspan CDATA #IMPLIED>
<!ELEMENT CellContent (h1? | p?)+>
<!ATTLIST CellContent cellname CDATA #REQUIRED>
<!ELEMENT h1 (#PCDATA)>
<!ATTLIST h1 align CDATA #IMPLIED>
<!ELEMENT ImageLink (img, br?)>
<!ELEMENT p (#PCDATA | font | ImageLink | a | ul | ol)*>
<!ATTLIST p align CDATA #IMPLIED>
<!ELEMENT font (#PCDATA | b)*>
<!ATTLIST font
  color (Cyan | Lime | Black | White | Maroon) 'Black'
  face ('Times New Roman' | Arial) #REQUIRED
  size (2 | 3 | 4 | 5 | 6) '3'>
<!ELEMENT b (#PCDATA)>
<!ELEMENT img EMPTY>
<!ATTLIST img
  width CDATA #IMPLIED
  height CDATA #IMPLIED
  hspace CDATA #IMPLIED
  vspace CDATA #IMPLIED
  src CDATA #IMPLIED
  alt CDATA #IMPLIED
  align CDATA #IMPLIED
  border CDATA #IMPLIED
  lowsrc CDATA #IMPLIED>
<!ELEMENT br EMPTY>
<!ATTLIST br clear CDATA #REQUIRED>
<!ELEMENT ul (font?, li+)>
<!ATTLIST ul type CDATA #IMPLIED>
<!ELEMENT li (font? | a?)+>
<!ELEMENT ol (font?, li+)>
<!ATTLIST ol
  type CDATA #REQUIRED
  start CDATA #REQUIRED>
The body element contains two optional child elements, basefont and a, and one required element, table. For this example, because you are using a table to format the page and all information will go into the table, the table element is required. The a element is used to create an anchor to the top of the page, and the basefont element specifies the default font size for the text in the document. Because all of the attributes associated with the body element are optional, they include the keyword #IMPLIED.
In the base element, the target attribute is required. It would make no sense to include a base element without specifying the target attribute, as the specification of this attribute is the reason you would use the base element. Therefore, the target attribute is #REQUIRED.
In the font element, the color and size attributes have enumerated data types and are assigned default values (Black and 3). The face attribute remains unchanged.
Now that the DTD has been created, it can be used to validate the Help.htm document we created in Chapter 3. There are two ways to associate a DTD with an XML document: the first is to place the DTD code within the XML document, and the second is to create a separate DTD document that is referenced by the XML document. Creating a separate DTD document allows multiple XML documents to reference the same DTD. We will take a look at how to declare a DTD first, and then examine how to place a DTD within the XML document.
The !DOCTYPE statement is used to declare a DTD. For an internal DTD, called an internal subset, you can use the following syntax:
<!DOCTYPE DocName [ DTD ]>
The new XML document that combines Help.htm and the DTD would look like this:
<!DOCTYPE html [
<!ELEMENT html (head, body)>
<!ELEMENT head (title, base?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT base EMPTY>
<!ATTLIST base target CDATA #REQUIRED>
<!ELEMENT body (basefont?, a?, table)>
<!ATTLIST body
  alink CDATA #IMPLIED
  text CDATA #IMPLIED
  bgcolor CDATA #IMPLIED
  link CDATA #IMPLIED
  vlink CDATA #IMPLIED>
<!ELEMENT basefont EMPTY>
<!ATTLIST basefont size CDATA #REQUIRED>
<!ELEMENT a (#PCDATA)>
<!ATTLIST a
  linkid ID #IMPLIED
  href CDATA #IMPLIED
  name CDATA #IMPLIED
  target CDATA #IMPLIED>
<!ELEMENT table (tr+)>
<!ATTLIST table
  width CDATA #IMPLIED
  rules CDATA #IMPLIED
  frame CDATA #IMPLIED
  align CDATA 'Center'
  cellpadding CDATA '0'
  border CDATA '0'
  cellspacing CDATA '0'>
<!ELEMENT tr (td+)>
<!ATTLIST tr
  bgcolor (Cyan | Lime | Black | White | Maroon) 'White'
  valign (Top | Middle | Bottom) 'Middle'
  align (Left | Right | Center) 'Center'>
<!ELEMENT td (CellContent)>
<!ATTLIST td
  bgcolor (Cyan | Lime | Black | White | Maroon) 'White'
  valign (Top | Middle | Bottom) 'Middle'
  align (Left | Right | Center) 'Center'
  rowspan CDATA #IMPLIED
  colspan CDATA #IMPLIED>
<!ELEMENT CellContent (h1? | p?)+>
<!ATTLIST CellContent cellname CDATA #REQUIRED>
<!ELEMENT h1 (#PCDATA)>
<!ATTLIST h1 align CDATA #IMPLIED>
<!ELEMENT ImageLink (img, br?)>
<!ELEMENT p (#PCDATA | font | ImageLink | a | ul | ol)*>
<!ATTLIST p align CDATA #IMPLIED>
<!ELEMENT font (#PCDATA | b)*>
<!ATTLIST font
  color (Cyan | Lime | Black | White | Maroon) 'Black'
  face ('Times New Roman' | Arial) #REQUIRED
  size (2 | 3 | 4 | 5 | 6) '3'>
<!ELEMENT b (#PCDATA)>
<!ELEMENT img EMPTY>
<!ATTLIST img
  width CDATA #IMPLIED
  height CDATA #IMPLIED
  hspace CDATA #IMPLIED
  vspace CDATA #IMPLIED
  src CDATA #IMPLIED
  alt CDATA #IMPLIED
  align CDATA #IMPLIED
  border CDATA #IMPLIED
  lowsrc CDATA #IMPLIED>
<!ELEMENT br EMPTY>
<!ATTLIST br clear CDATA #REQUIRED>
<!ELEMENT ul (font?, li+)>
<!ATTLIST ul type CDATA #IMPLIED>
<!ELEMENT li (font? | a?)+>
<!ELEMENT ol (font?, li+)>
<!ATTLIST ol
  type CDATA #REQUIRED
  start CDATA #REQUIRED>
]>
<html>
  <head>
    <title>Northwind Traders Help Desk</title>
    <base target=""><!--Default link for page--></base>
  </head>
  <body text="#000000" bgcolor="#FFFFFF" link="#003399" alink="#FF9933" vlink="#996633">
    <!--Default display colors for entire body-->
    <a name="Top"><!--Anchor for top of page--></a>
    <table border="0" frame="" rules="" width="100%" align="" cellspacing="0" cellpadding="0">
      <!--Rules/frame is used with border-->
      <tr valign="Center">
        <td rowspan="" colspan="2" align="Center">
          <!--Either rowspan or colspan can be used, but not both-->
          <!--Valign: top, bottom, middle-->
          <CellContent cellname="Table Header">
            <h1 align="Center">Help Desk</h1>
          </CellContent>
        </td>
      </tr>
      <tr valign="Top">
        <td rowspan="" colspan="" align="Left">
          <CellContent cellname="Help Topic List">
            <p align="">
              <ul type="">
                <font face="" color="" size="3"> <b>For First-Time Visitors</b> </font>
                <li> <a href="FirstTimeVisitorInfo.htm" target=""> First-Time Visitor Information </a> </li>
                <li> <a href="SecureShopping.htm" target=""> Secure Shopping at Northwind Traders </a> </li>
                <li> <a href="FreqAskedQ.htm" target=""> Frequently Asked Questions </a> </li>
                <li> <a href="NavWeb.htm" target=""> Navigating the Web </a> </li>
              </ul>
            </p>
          </CellContent>
        </td>
        <td rowspan="" colspan="" align="Left">
          <CellContent cellname="Shipping Links">
            <p align="">
              <ul type="">
                <font face=""> <b>Shipping</b> </font>
                <li> <a href="Rates.htm" target=""> Rates </a> </li>
                <li> <a href="OrderCheck.htm" target=""> Checking on Your Order </a> </li>
                <li> <a href="Returns.htm" target=""> Returns </a> </li>
              </ul>
            </p>
          </CellContent>
        </td>
      </tr>
    </table>
  </body>
</html>
The marked-up text has remained the same with one exception. Any element that uses an enumerated data type cannot have an attribute set to an empty string (""). For example, if a tr element does not use the align attribute, the attribute must be removed from the element. Because a default value (Center) has been assigned in the DTD for the align attribute of the tr element, the default value will be applied only when the attribute is omitted.
If you open this document in the browser, you will find that it almost works. The closing brackets (]>) belonging to the !DOCTYPE statement will appear in the browser, however, which is not acceptable. To solve this problem, save the original DTD in a file called StandardHTM.dtd, remove the empty attributes that have an enumerated data type, and reference the external file StandardHTM.dtd in the new file named HelpHTM.htm. The format for a reference to an external DTD is as follows:
<!DOCTYPE RootElementName SYSTEM|PUBLIC [Name] DTD-URI>
RootElementName is the name of the root element (in this example, html). The SYSTEM keyword is needed when you are using an unpublished DTD. If a DTD has been published and given a name, the PUBLIC keyword can be used, followed by that name. If the parser cannot identify the name, the DTD-URI will be used. The DTD-URI gives the location of the DTD as a Uniform Resource Identifier (URI). A URI is a general type of system identifier. One type of URI is the Uniform Resource Locator (URL) you're familiar with from the Internet.
For our example, we would need to add the following line of code to the beginning of the document HelpHTM.htm:
<!DOCTYPE html SYSTEM "StandardHTM.dtd">
A browser that does not understand XML will ignore this statement. Thus, by using an external DTD, you not only have an XML document that can be validated, but also one that can be displayed in any browser.
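The two keywords can be compared side by side. The declarations below are alternatives, not the contents of one file, because a document can have only one !DOCTYPE statement. The PUBLIC example uses the published XHTML 1.0 Strict DTD purely as an illustration; it is not the DTD developed in this chapter.

```xml
<!-- SYSTEM: the DTD is located by URI only (here a relative URI) -->
<!DOCTYPE html SYSTEM "StandardHTM.dtd">

<!-- PUBLIC: a public name first, then a URI to fall back on -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
```

In both forms the name after !DOCTYPE must match the root element of the document exactly, including case.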
You now know how to build a DTD to define a set of rules that can be used to validate an XML document. Using DTDs, a standard set of rules can be developed that can be used to create standard XML documents. These documents can be exchanged between corporations or internally within a corporation and validated using the DTD. The DTD can also be used to create standard documents within a group, such as a group that is building an e-commerce site.
In Chapter 5, we'll look at entities. Entities enable you to create reusable strings within a DTD.
