An Introduction to Document Type Definitions

    Author: 2007-08-27 17:03:58 From:

    In Chapter 3, we developed a document template for creating XML documents that can be viewed in Web browsers as HTML documents. In this chapter, we will create a document type definition (DTD) for this template. The DTD defines a set of rules that apply to all of the XML documents created using the template, and it can be used both to create such documents and to validate that they conform to those rules.

    Many tools are available for creating and editing DTDs, for example, XML Authority, XML Spy, and Near and Far. We will use XML Authority to create and edit our DTD. You can download a trial version of XML Authority from http://www.extensibility.com. Microsoft XML Notepad cannot be used to edit DTDs (although it can validate a document that has a DTD).

    In this chapter, we will build a DTD that defines a set of rules for the content of the sample Web document template we created in Chapter 3. The DTD can be used to verify that a set of XML documents is created according to the rules defined in the DTD by checking the validity of the documents.

    NOTE
    If you are building a large Internet system, you can define a set of rules that all developers must use when creating Web pages. If the Web pages are written using XML, a DTD can be used to verify that all the pages follow the rules. XML can also be used to pass information from one corporation to another or from one department to another within a corporation. The DTD can be used to verify that the incoming information is in the correct format.

    To open the sample document in XML Authority, follow these steps:

    1. Open XML Authority, select New from the File menu, and then select New (DTD) from the submenu. If a default UNNAMED element appears at the top of the document, delete it.
    2. Choose Import from the File menu, and then choose XML Document from the submenu.
    3. Select the Standard.xml document you created in Chapter 3. XML Authority will import the document as a DTD.

      Figure 4-1 shows Standard.xml displayed in XML Authority.

      Figure 4-1. The Standard.xml template displayed in XML Authority.

    4. Choose Source from the View menu.

    XML Authority automatically builds a DTD for the XML document, so in this case, the source is a DTD for the Standard.xml XML document. The complete source code that XML Authority generated is shown here:


      <!ELEMENT html  (head, body)>
    
      <!ELEMENT head  (title, base)>
    
      <!ELEMENT title  ( )>
    
      <!ELEMENT base  ( )>
      <!ATTLIST base  target CDATA  #REQUIRED>
      <!ELEMENT body  (basefont, a, table)>
      <!ATTLIST body  alink   CDATA  #REQUIRED
                      text    CDATA  #REQUIRED
                      bgcolor CDATA  #REQUIRED
                      link    CDATA  #REQUIRED
                      vlink   CDATA  #REQUIRED>
      <!ELEMENT basefont  ( )>
      <!ATTLIST basefont  size CDATA  #REQUIRED>
      <!ELEMENT a  ( )>
      <!ATTLIST a  href   CDATA  #IMPLIED
                   name   CDATA  #IMPLIED
                   target CDATA  #IMPLIED>
      <!ELEMENT table  (tr)>
      <!ATTLIST table  width       CDATA  #REQUIRED
                       rules       CDATA  #REQUIRED
                       frame       CDATA  #REQUIRED
                       align       CDATA  #REQUIRED
                       cellpadding CDATA  #REQUIRED
                       border      CDATA  #REQUIRED
                       cellspacing CDATA  #REQUIRED>
      <!ELEMENT tr  (td)>
      <!ATTLIST tr  bgcolor CDATA  #REQUIRED
                    valign  CDATA  #REQUIRED
                    align   CDATA  #REQUIRED>
      <!ELEMENT td  (CellContent)>
      <!ATTLIST td  bgcolor CDATA  #REQUIRED
                    valign  CDATA  #REQUIRED
                    align   CDATA  #REQUIRED
                    rowspan CDATA  #REQUIRED
                    colspan CDATA  #REQUIRED>
      <!ELEMENT CellContent  (h1, p)>
      <!ATTLIST CellContent  cellname CDATA  #REQUIRED>
      <!ELEMENT h1  ( )>
      <!ATTLIST h1  align CDATA  #REQUIRED>
      <!ELEMENT p  (font+, img, br, a, ul, ol)>
    
      <!ELEMENT font  (b)>
      <!ATTLIST font  color CDATA  #REQUIRED
                      face  CDATA  #REQUIRED
                      size  CDATA  #REQUIRED>
      <!ELEMENT b  ( )>
    
      <!ELEMENT img  ( )>
      <!ATTLIST img  width  CDATA  #REQUIRED
                     height CDATA  #REQUIRED
                     hspace CDATA  #REQUIRED
                     vspace CDATA  #REQUIRED
                     src    CDATA  #REQUIRED
                     alt    CDATA  #REQUIRED
                     align  CDATA  #REQUIRED
                     border CDATA  #REQUIRED
                     lowsrc CDATA  #REQUIRED>
      <!ELEMENT br  ( )>
      <!ATTLIST br  clear CDATA  #REQUIRED>
      <!ELEMENT ul  (font, li)>
      <!ATTLIST ul  type CDATA  #REQUIRED>
      <!ELEMENT li  (font, a)>
    
      <!ELEMENT ol  (font, li)>
      <!ATTLIST ol  type  CDATA  #REQUIRED
                    start CDATA  #REQUIRED>
      

    As you can see, the DTD consists of two basic components: !ELEMENT and !ATTLIST. In this chapter, we will look at these two statements in detail.

    NOTE
    The DTD that has been generated here is only a first approximation. In this chapter, you will refine this DTD so that it defines a set of rules for your XML documents.

    Every element used in your XML documents has to be declared by using the <!ELEMENT> tag in the DTD. The format for declaring an element in a DTD is shown here:


      <!ELEMENT ElementName Rule>
      

    The Rule component defines the rule for the content contained in the element. These rules define the logical structure of the XML document and can be used to check the document's validity. The rule can consist of a generic declaration or of one or more child elements, combined either as an ordered sequence or as an unordered choice.

    Three generic content declarations are predefined for XML DTDs: PCDATA, ANY, and EMPTY.

    PCDATA

    The PCDATA declaration can be used when the content within an element is only text, that is, when the content contains no child elements. Our sample document contains several such elements, including title, a, h1, and b. These elements can be declared as follows. (The pound sign identifies a special predefined name.)


      <!ELEMENT title (#PCDATA)>
      <!ELEMENT a (#PCDATA)>
      <!ELEMENT h1 (#PCDATA)>
      <!ELEMENT b (#PCDATA)>
      

    NOTE
    An element declared as (#PCDATA) can also be left empty in a document.
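
    As a small illustration of this point (the text values are made up for the example), the first two title elements below satisfy a (#PCDATA) declaration, while the third does not because it contains a child element:


      <title>Northwind Traders Help Desk</title>
      <title></title>
      <!--Not valid: title may contain text only, not child elements-->
      <title><b>Help Desk</b></title>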

    ANY

    The ANY declaration can include both text content and child elements. The html element, for example, could use the ANY declaration as follows:


      <!ELEMENT html  ANY>
      

    This ANY declaration would allow the body and head elements to be included in the html element in an XML document:


      <html><head/><body/></html>
      

    The following XML would also be valid:


      <html>This is an HTML document.<head/><body/></html>
      

    And this XML would also be valid with the ANY declaration, provided that the AnotherTag element is itself declared somewhere in the DTD:


      <html>This is an HTML document.<head/><body/><AnotherTag/></html>
      

    The ANY declaration allows any mix of text and declared elements to be marked by the element tags, provided the content is well-formed XML. Although this flexibility might seem useful, it defeats the purpose of the DTD, which is to define the structure of the XML document so that the document can be validated. In brief, the content of an element that uses ANY is checked for little more than being well formed.

    EMPTY

    It is possible for an element to have no content, that is, no child elements or text. The img element is an example of this scenario. The following is its definition:


      <!ELEMENT img EMPTY>
      

    The base, br, and basefont elements should also be declared using EMPTY in our sample DTD.
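
    A brief sketch of what EMPTY permits in a document (the clear value is only illustrative): an element declared EMPTY can be written as an empty-element tag or as a start tag immediately followed by an end tag, but nothing may appear between the tags, not even white space.


      <br clear="all"/>
      <br clear="all"></br>
      <!--Not valid for an EMPTY element: white space between the tags-->
      <br clear="all"> </br>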

    Instead of using the ANY declaration for the html element, you should define the content so that the html element can be validated. The following is a declaration that specifies the content of the html element and is the same as the one given by XML Authority:


      <!ELEMENT html  (head, body)>
      

    This (head, body) declaration signifies that the html element will have two child elements: head and body. You can list one child element within the parentheses or as many child elements as are required. You must separate each child element in your declaration with a comma.

    For the XML document to be valid, the order in which the child elements are declared must match the order of the elements in the XML document. The comma that separates each child element is interpreted as "followed by"; therefore, the preceding declaration tells us that the html element will have a head child element followed by a body child element. Building on the preceding declaration, the following is valid XML:


      <html><head></head><body/></html>
      

    However, the following statement would not be valid:


      <html><body></body><head/></html>
      

    This document is not valid because the declaration requires the html element to contain exactly two child elements, head first and body second, with only one instance of each. Here the body element comes before the head element.

    The following two statements would also be invalid:


      <html><body></body></html>
      <html><head/><body/><head/><body/></html>
      

    The first statement is missing the head element, and in the second statement the head and body elements are listed twice.

    Reoccurrence

    You will want every html element to include one head and one body child element, in the order listed. Other elements, such as the body and table elements, will have child elements that might be included multiple times within the main element or might not be included at all. XML provides three markers that can be used to indicate the reoccurrence of a child element, as shown in the following table:

    XML Element Markers

    Marker  Meaning
    ?       The element either does not appear or can appear only once (0 or 1).
    +       The element must appear at least once (1 or more).
    *       The element can appear any number of times, or it might not appear at all (0 or more).

    Putting no marker after the child element indicates that the element must be included and that it can appear only one time.
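
    To see the markers side by side, here is a small sketch that uses hypothetical elements (order, customer, item, note, and coupon are not part of the sample DTD, and the declarations of the child elements are omitted):


      <!--Hypothetical declaration: one customer, one or more items,
          zero or more notes, and at most one coupon, in that order-->
      <!ELEMENT order  (customer, item+, note*, coupon?)>

      <!--Valid: one customer, two items, no note, no coupon-->
      <order><customer/><item/><item/></order>

      <!--Not valid: at least one item is required-->
      <order><customer/><coupon/></order>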

    The head element contains an optional base child element. To declare this element as optional, modify the preceding declaration as follows:


      <!ELEMENT head  (title, base?)>
      

    The body element contains a basefont element and an a element that are also optional. In our example, the table element is a required element used to format the page, so you want to make table a required element that appears only once in the body element. You can now rewrite the body element declaration as follows:


      <!ELEMENT body (basefont?, a?, table)>
      

    The table element can have as many rows as are needed to format the page but must include at least one row. The table element should now be written as follows:


      <!ELEMENT table (tr+)>
      

    The same conditions hold true for the tr element: the row element must have at least one column, as shown here:


      <!ELEMENT tr (td+)>
      

    The a, ul, and ol elements might not be included in the p element, or they might be included many times, as shown here:


      <!ELEMENT p (font+, img, br, a*, ul*, ol*)>
      

    Because the br element formats text around an image, the img and br tags should always be used together.

    Grouping child elements

    Fortunately, XML provides a way to group elements. For example, you can rewrite the p element as follows:


      <!ELEMENT p (font*, (img, br?)*, a*, ul*, ol*)>
      

    This declaration specifies that an img element, optionally followed by a br element, appears zero or more times in the p element. It also relaxes font+ to font*, so the font element becomes optional.
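
    As a rough illustration of how this grouped declaration is applied (attributes and nested content are omitted for brevity, so these fragments only show the order of the child elements):


      <!--Valid: font, then two img groups (the first with a br), then a-->
      <p><font/><img/><br/><img/><a/></p>

      <!--Not valid: a br can appear only directly after an img-->
      <p><font/><br/><img/></p>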

    One problem remains in this declaration. As mentioned, the comma separator can be interpreted as the words "followed by." Thus, each p element will have font, img, br, a, ul, and ol child elements, in that order. This is not exactly what you want; instead, you want to be able to use these elements in any order and to use some elements in some paragraphs and other elements in other paragraphs. For example, you would like to be able to write the following code:


      <p>
          <font size="5">
              <b>Three Reasons to Shop Northwind Traders</b>
          </font>
          <ol>
              <li>
                  <a href="Best.htm">Best Prices</a>
              </li>
              <li>
                  <a href="Quality.htm">Quality</a>
              </li>
              <li>
                  <a href="Service.htm">Fast Service</a>
              </li>
          </ol>
          <!--The following img element is not in the correct order.-->
          <img src="Northwind.jpg"></img>
      </p>
      

    As you can see, the img element is not in the correct order. It should precede the ol element, since the declaration imposes a strict ordering on the elements.

    NOTE
    Also, numerous elements are declared but are not included (for example, ul). The missing elements are not a problem because you have declared each element with an asterisk (*), indicating that there can be zero or more of each element.

    To allow a "reordering" of elements, you could rewrite the declaration as follows:


      <!ELEMENT p  (font*, (img, br?)*, a*, ul*, ol*)+>
      

    The plus sign (+) at the very end of the declaration indicates that one or more copies of these child elements can occur within a p element.

    The preceding XML code could thus be interpreted as two sets of child elements, as shown here:


      <p>
          <!--The elements that follow are the first set of   
              (font*, (img, br?)*, a*, ul*, ol*) elements (missing 
              the (img, br), a, and ul elements).-->
    
          <font size="5">
              <b>Three Reasons to Shop Northwind Traders</b>
          </font>
          <ol>
              <li>
                  <a href="Best.htm">Best Prices</a>
              </li>
              <li>
                  <a href="Quality.htm">Quality</a>
              </li>
              <li>
                  <a href="Service.htm">Fast Service</a>
              </li>
          </ol>
          <!--The img element that follows is a second set of   
              (font*,(img, br?)*, a*, ul*, ol*) elements containing 
              only an img element.-->
          <img src="Northwind.jpg"></img>
      </p>
      

    This new declaration is better, but it handles reordering only indirectly: every member of the group is optional, and a different order is possible only because the whole group can be repeated (as indicated by the plus sign at the end of the list of elements). There is a more direct option.

    Creating an unordered set of child elements

    In addition to using commas to separate elements, you can use a vertical bar (|). The vertical bar separator indicates that one child element or the other child element, but not both, will be included within the element. In other words, one element or the other must be present. The preceding declaration can thus be rewritten as follows:


      <!ELEMENT p  (font | (img, br?) | a | ul | ol)+>
      

    This declaration specifies that each occurrence of the group in the p element is a font child element, an (img, br?) subgroup, an a child element, a ul child element, or an ol child element, but only one of these. The plus sign (+) indicates that the group must occur one or more times, and each occurrence can select a different alternative. With this declaration, you can use child elements in any order, as many times as needed.

    NOTE
    By itself, the vertical bar (|) limits the group to exactly one of the listed child elements. The occurrence markers (?, +, *) can be placed after the group to override this limit.

    According to the new declaration, our XML code will be interpreted as follows:


      <p>
          <!--First group, containing single font element-->
          <font size="5">
              <b>Three Reasons to Shop Northwind Traders</b>
          </font>
          <!--Second group, containing the single child element ol-->
          <ol>
              <li>
                  <a href="Best.htm">Best Prices</a>
              </li>
              <li>
                  <a href="Quality.htm">Quality</a>
              </li>
              <li>
                  <a href="Service.htm">Fast Service</a>
              </li>
          </ol>
          <!--Third group, containing a single child element img-->
          <img src="Northwind.jpg"></img>
      </p>
      

    Suppose you also want to include text within the p element. To do this, you will need to add a PCDATA declaration to the group. You will have to use the vertical bar separator because you cannot use the PCDATA declaration if the child elements are separated by commas. You also cannot have a subgroup such as (img, br?) within a group that includes PCDATA, and the only marker allowed after such a group is the asterisk (*). We can solve this problem by creating a new element named ImageLink that contains the subgroup and adding it to the p element as follows:


      <!ELEMENT ImageLink  (img, br?)>
      <!ELEMENT p  (#PCDATA | font | ImageLink | a | ul | ol)*>
      

    Web browsers that do not understand XML will ignore the ImageLink element. When you use PCDATA within a group of child elements, it must be listed first and must be preceded by a pound sign (#), and the group must be followed by an asterisk.
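
    As a minimal sketch of what such a mixed declaration admits (the sentence text and the clear value are invented for illustration; the file names come from the earlier examples), text and child elements can be interleaved freely inside the p element:


      <p>Visit our <a href="Best.htm">price list</a> or browse the catalog.
          <ImageLink><img src="Northwind.jpg"/><br clear="all"/></ImageLink>
      </p>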

    You can use the DTD to make certain sections of the document appear in a certain order and include a specific number of child elements (as was done with the html element). You can also create sections of the document that contain an unspecified number of child elements in any order. DTDs are extremely flexible and can enable you to develop a set of rules that matches your requirements.

    Every element can have a set of attributes associated with it. The attributes for an element are defined in an !ATTLIST statement. The format for the !ATTLIST statement is shown here:


      <!ATTLIST ElementName AttributeDefinition>
      

    ElementName is the name of the element to which these attributes belong.

    AttributeDefinition consists of the following components:


      AttributeName AttributeType DefaultDeclaration
      

    AttributeName is the name of the attribute. AttributeType refers to the data type of the attribute. DefaultDeclaration contains the default declaration section of the attribute definition.

    XML DTD attributes can have the following data types: CDATA, enumerated, ENTITY, ENTITIES, ID, IDREF, IDREFS, NMTOKEN, and NMTOKENS.

    CDATA

    The CDATA data type indicates that the attribute can be set to any allowable character value. For our sample DTD used for creating Web pages, the vast majority of the elements will have attributes with a CDATA data type. The following body attributes should all be CDATA:


      <!ATTLIST body  alink   CDATA  #REQUIRED
                      text    CDATA  #REQUIRED
                      bgcolor CDATA  #REQUIRED
                      link    CDATA  #REQUIRED
                      vlink   CDATA  #REQUIRED>
      

    Notice that you can list multiple attributes for a single element.

    Enumerated

    The enumerated data type lists a set of values that are allowed for the attribute. Each value must be a name token, so it cannot contain spaces or quotation marks. Using an enumerated data type, you can rewrite the font element to limit the color attribute to Cyan, Lime, Black, White, or Maroon and limit the size attribute to 2, 3, 4, 5, or 6. A face value such as Times New Roman contains spaces and therefore cannot be enumerated, so the face attribute stays CDATA. The new font declaration would look as follows:


      <!ATTLIST font  color (Cyan | Lime | Black | White | Maroon) #REQUIRED
                      size  (2 | 3 | 4 | 5 | 6)  #REQUIRED
                      face  CDATA  #REQUIRED>
      

    NOTE
    Keep in mind that this declaration is case sensitive. Thus, entering cyan as a color value would cause an error. Also notice the use of the parentheses to group the collection of choices.
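
    To make the effect of the enumerated lists concrete, here is a short sketch (the text content is made up). A validating parser would accept the first font element and reject the second:


      <!--Valid: both values appear in the enumerated lists-->
      <font color="Maroon" size="4" face="Arial"><b>Sale</b></font>

      <!--Not valid: "maroon" does not match "Maroon", and 7 is not listed-->
      <font color="maroon" size="7" face="Arial"><b>Sale</b></font>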

    In the section "The Default Declaration" later in this chapter, you'll learn how to declare a default value for the color and size attributes.

    ENTITY and ENTITIES

    The ENTITY and ENTITIES data types indicate that the attribute value is the name of an entity (or, for ENTITIES, a list of entity names) declared in the DTD. These data types will be discussed in detail in Chapter 5.

    ID, IDREF, and IDREFS

    Within a document, you may want to be able to identify certain elements with an attribute that is of the ID data type. The value of an attribute with an ID data type must be unique among all of the ID values in the document, and an element can have only one ID attribute. Other elements can reference this ID by using the IDREF or IDREFS data types. An IDREF attribute holds a single ID value; an IDREFS attribute holds a list of ID values separated by spaces.

    When you work with HTML, you use anchor (a) elements to bookmark sections of your document. These bookmarks can be used to link to sections of the document. Unlike the value of an ID attribute, the name of an a element does not have to be unique. In XML, IDs are used to create links to different places in your document. When we examine linking in detail in Chapter 6, you'll see that the ID data type offers other advantages.

    Our example document includes an a element at the top of the document as an anchor that can be used to jump to the top of the page. You can modify the a element definition in the DTD as follows:


      <!ATTLIST a  linkid   ID     #REQUIRED
                   href     CDATA  #IMPLIED
                   name     CDATA  #IMPLIED
                   target   CDATA  #IMPLIED>
      

    Now when you create an XML document, you can define an a element at the top of the page and associate a unique ID with it using the linkid attribute. To reference this ID from another element, you first have to add an IDREF attribute to that element, as shown here:


      <!ATTLIST ul   headlink IDREF  #IMPLIED
                     type     CDATA  #REQUIRED>
      

    In your XML document, you can associate the linkid attribute of the a element with the headlink attribute of the ul element by assigning the same value (HeadAnchor, for example) to these two attributes. If a second a element with its own ID (FootAnchor, for example) was added at the bottom of the XML document, you could define references to both of these elements. Each attribute must be declared separately, so one option is to add a second IDREF attribute named footlink (a single IDREFS attribute holding both IDs is the alternative), as shown here:


      <!ATTLIST ul   headlink IDREF  #IMPLIED
                     footlink IDREF  #IMPLIED
                     type     CDATA  #REQUIRED>
      

    The actual XML document would contain the following code:


      <a linkid="HeadAnchor" name="head">
          <!--Head anchor-->
      </a>
      <!--Some HTML code here-->
      <a href="#head">
          <ul headlink="HeadAnchor">
              <!--li elements here-->
          </ul>
      </a>
      <a href="#foot">
          <ul footlink="FootAnchor">
              <!--li elements here-->
          </ul>
      </a>
      <!--Some more HTML code here-->
      <a linkid="FootAnchor" name="foot"> 
          <!--Foot anchor-->
      </a>
      

    This code will work with non-XML browsers and with browsers that support XML.
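
    For comparison, here is a sketch of the IDREFS form, assuming a hypothetical attribute named anchors that is not part of the sample DTD (other ul attributes are omitted). A single IDREFS attribute lists several IDs separated by spaces, and each listed value must match an ID that exists in the document:


      <!ATTLIST ul   anchors  IDREFS  #IMPLIED>

      <ul anchors="HeadAnchor FootAnchor">
          <!--li elements here-->
      </ul>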

    NMTOKEN and NMTOKENS

    The NMTOKEN and NMTOKENS data types are similar to the CDATA data type in that they represent character values. The name tokens are strings that consist of letters, digits, underscores, colons, hyphens, and periods. They cannot contain spaces. An NMTOKENS value is a list of name tokens separated by spaces. A declaration using these data types could look as follows:


      <!ATTLIST body
         background NMTOKEN "Blue"
         foreground NMTOKENS "Green Yellow Orange"
      >
      

    The Default Declaration

    The default declaration can consist of any valid value for your attributes, or it can consist of one of three predefined keywords: #REQUIRED, #IMPLIED, or #FIXED. The #REQUIRED keyword indicates that the attribute must be included with the element and that it must be assigned a value. There are no default values when #REQUIRED is used. The #IMPLIED keyword indicates that the attribute does not have to be included with the element and that there is no default value. The #FIXED keyword sets the attribute to one default value that cannot be changed. The default value is listed after the #FIXED keyword. If none of these three keywords is used, the value listed becomes the default that is assigned when the attribute is not set in the XML document.
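
    The four kinds of default declaration can be sketched in one attribute list. This declaration is only an illustration and differs from the sample DTD; in particular, the units attribute is hypothetical:


      <!ATTLIST img  src     CDATA  #REQUIRED
                     alt     CDATA  #IMPLIED
                     border  CDATA  '0'
                     units   CDATA  #FIXED 'pixels'>

      <!--src must be given and alt may be omitted. If border is omitted,
          a validating parser supplies "0". If units is written at all,
          its value must be "pixels".-->
      <img src="Northwind.jpg"/>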

    Based on this information about the components of the !ELEMENT and !ATTLIST statements, we can rewrite our original DTD as follows:


      <!ELEMENT html  (head, body)>
    
      <!ELEMENT head  (title, base?)>
    
      <!ELEMENT title  (#PCDATA)>
    
      <!ELEMENT base EMPTY>
      <!ATTLIST base  target CDATA  #REQUIRED>
      <!ELEMENT body  (basefont?, a?, table)>
      <!ATTLIST body  alink   CDATA  #IMPLIED
                      text    CDATA  #IMPLIED
                      bgcolor CDATA  #IMPLIED
                      link    CDATA  #IMPLIED
                      vlink   CDATA  #IMPLIED>
      <!ELEMENT basefont EMPTY>
      <!ATTLIST basefont  size CDATA  #REQUIRED>
      <!ELEMENT a  (#PCDATA)>
      <!ATTLIST a  linkid ID     #IMPLIED
                   href   CDATA  #IMPLIED
                   name   CDATA  #IMPLIED
                   target CDATA  #IMPLIED>
      <!ELEMENT table  (tr+)>
      <!ATTLIST table  width       CDATA  #IMPLIED
                       rules       CDATA  #IMPLIED
                       frame       CDATA  #IMPLIED
                       align       CDATA  'Center'
                       cellpadding CDATA  '0'
                       border      CDATA  '0'
                       cellspacing CDATA  '0'>
      <!ELEMENT tr  (td+)>
      <!ATTLIST tr  bgcolor  (Cyan | Lime | Black | White | Maroon)  'White'
                    valign   (Top | Middle | Bottom)  'Middle'
                    align    (Left | Right | Center)  'Center'>
      <!ELEMENT td  (CellContent)>
      <!ATTLIST td  bgcolor  (Cyan | Lime | Black | White | Maroon)  'White'
                    valign   (Top | Middle | Bottom)  'Middle'
                    align    (Left | Right | Center)  'Center'
                    rowspan CDATA  #IMPLIED
                    colspan CDATA  #IMPLIED>
      <!ELEMENT CellContent  (h1?| p?)+>
      <!ATTLIST CellContent  cellname CDATA  #REQUIRED>
      <!ELEMENT h1  (#PCDATA)>
      <!ATTLIST h1  align CDATA  #IMPLIED>
      <!ELEMENT ImageLink  (img, br?)>
    
      <!ELEMENT p  (#PCDATA | font | ImageLink | a | ul | ol)*>
      <!ATTLIST p  align CDATA  #IMPLIED>
      <!ELEMENT font  (#PCDATA | b)*>
      <!ATTLIST font  color  (Cyan | Lime | Black | White | Maroon)  'Black'
                      face   CDATA  #REQUIRED
                      size   (2 | 3 | 4 | 5 | 6)  '3'>
      <!ELEMENT b  (#PCDATA)>
    
      <!ELEMENT img EMPTY>
      <!ATTLIST img  width  CDATA  #IMPLIED
                     height CDATA  #IMPLIED
                     hspace CDATA  #IMPLIED
                     vspace CDATA  #IMPLIED
                     src    CDATA  #IMPLIED
                     alt    CDATA  #IMPLIED
                     align  CDATA  #IMPLIED
                     border CDATA  #IMPLIED
                     lowsrc CDATA  #IMPLIED>
      <!ELEMENT br EMPTY>
      <!ATTLIST br  clear CDATA  #REQUIRED>
      <!ELEMENT ul  (font?, li+)>
      <!ATTLIST ul  type CDATA  #IMPLIED>
      <!ELEMENT li  (font?| a?)+>
    
      <!ELEMENT ol  (font?, li+)>
      <!ATTLIST ol  type  CDATA  #REQUIRED
                    start CDATA  #REQUIRED>
      

    The body element contains two optional child elements, basefont and a, and one required element, table. For this example, because you are using a table to format the page and all information will go into the table, the table element is required. The a element is used to create an anchor to the top of the page, and the basefont element specifies the default font size for the text in the document. Because all of the attributes associated with the body element are optional, they include the keyword #IMPLIED.

    In the base element, the target attribute is required. It would make no sense to include a base element without specifying the target attribute, as the specification of this attribute is the reason you would use the base element. Therefore, the target attribute is #REQUIRED.

    In the font element, the color and size attributes have enumerated data types and are assigned default values (Black and 3). The face attribute remains unchanged.


    Now that the DTD has been created, it can be used to validate the Help.htm document we created in Chapter 3. There are two ways to associate a DTD with an XML document: the first is to place the DTD code within the XML document, and the second is to create a separate DTD document that is referenced by the XML document. Creating a separate DTD document allows multiple XML documents to reference the same DTD. We will take a look at how to declare a DTD first, and then examine how to place a DTD within the XML document.

    The !DOCTYPE statement is used to declare a DTD. For an internal DTD, called an internal subset, you can use the following syntax:


      <!DOCTYPE DocName [ DTD ]>
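
    For comparison, a DTD kept in a separate file is usually referenced with the SYSTEM keyword. The following is a sketch only: the file name Standard.dtd is an assumption, and DocName is replaced by html because the name in the !DOCTYPE statement has to match the root element of the document.


      <!DOCTYPE html SYSTEM "Standard.dtd">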
      

    The new XML document that combines Help.htm and the DTD would look like this:


      <!DOCTYPE html
      [
      <!ELEMENT html  (head, body)>
    
      <!ELEMENT head  (title, base?)>
    
      <!ELEMENT title  (#PCDATA)>
    
      <!ELEMENT base EMPTY>
      <!ATTLIST base  target CDATA  #REQUIRED>
      <!ELEMENT body  (basefont?, a?, table)>
      <!ATTLIST body  alink   CDATA  #IMPLIED
                      text    CDATA  #IMPLIED
                      bgcolor CDATA  #IMPLIED
                      link    CDATA  #IMPLIED
                      vlink   CDATA  #IMPLIED>
      <!ELEMENT basefont EMPTY>
      <!ATTLIST basefont  size CDATA  #REQUIRED>
      <!ELEMENT a  (#PCDATA)>
      <!ATTLIST a  linkid ID     #IMPLIED
                   href   CDATA  #IMPLIED
                   name   CDATA  #IMPLIED
                   target CDATA  #IMPLIED>
      <!ELEMENT table  (tr+)>
      <!ATTLIST table  width       CDATA  #IMPLIED
                       rules       CDATA  #IMPLIED
                       frame       CDATA  #IMPLIED
                       align       CDATA  'Center'
                       cellpadding CDATA  '0'
                       border      CDATA  '0'
                       cellspacing CDATA  '0'>
      <!ELEMENT tr  (td+)>
      <!ATTLIST tr  bgcolor  (Cyan | Lime | Black | White | Maroon) 'White'
                    valign   (Top | Middle | Bottom)  'Middle'
                    align    (Left | Right | Center)  'Center'>
      <!ELEMENT td  (CellContent)>
      <!ATTLIST td  bgcolor  (Cyan | Lime | Black | White | Maroon) 'White'
                    valign   (Top | Middle | Bottom)  'Middle'
                    align    (Left | Right | Center)  'Center'
                    rowspan CDATA  #IMPLIED
                    colspan CDATA  #IMPLIED>
      <!ELEMENT CellContent  (h1? | p?)+>
      <!ATTLIST CellContent  cellname CDATA  #REQUIRED>
      <!ELEMENT h1  (#PCDATA)>
      <!ATTLIST h1  align CDATA  #IMPLIED>
      <!ELEMENT ImageLink  (img, br?)>
    
      <!ELEMENT p  (#PCDATA | font | ImageLink | a | ul | ol)*>
      <!ATTLIST p  align CDATA  #IMPLIED>
      <!ELEMENT font  (#PCDATA | b)*>
      <!ATTLIST font  color  (Cyan | Lime | Black | White | Maroon) 'Black'
                      face   CDATA  #REQUIRED
                      size   (2 | 3 | 4 | 5 | 6)  '3'>
      <!ELEMENT b  (#PCDATA)>
    
      <!ELEMENT img EMPTY>
      <!ATTLIST img  width  CDATA  #IMPLIED
                     height CDATA  #IMPLIED
                     hspace CDATA  #IMPLIED
                     vspace CDATA  #IMPLIED
                     src    CDATA  #IMPLIED
                     alt    CDATA  #IMPLIED
                     align  CDATA  #IMPLIED
                     border CDATA  #IMPLIED
                     lowsrc CDATA  #IMPLIED>
      <!ELEMENT br EMPTY>
      <!ATTLIST br  clear CDATA  #REQUIRED>
      <!ELEMENT ul  (font?, li+)>
      <!ATTLIST ul  type CDATA  #IMPLIED>
      <!ELEMENT li  (font? | a?)+>
    
      <!ELEMENT ol  (font?, li+)>
      <!ATTLIST ol  type  CDATA  #REQUIRED
                    start CDATA  #REQUIRED>
      ]>
    
      <html>
          <head>
              <title>Northwind Traders Help Desk</title>
              <base target=""><!--Default link for page--></base>
          </head>
          <body text="#000000" bgcolor="#FFFFFF" link="#003399" 
                alink="#FF9933" vlink="#996633">
              <!--Default display colors for entire body-->
              <a name="Top"><!--Anchor for top of page--></a>
              <table border="0" frame="" rules="" width="100%" align=""
                     cellspacing="0" cellpadding="0">
                  <!--Rules/frame is used with border-->
                  <tr valign="Middle">
                      <td rowspan="" colspan="2" align="Center">
                          <!--Either rowspan or colspan can be used, but  
                              not both-->
                          <!--Valign: top, bottom, middle-->
                          <CellContent cellname="Table Header">
                              <h1 align="Center">Help Desk</h1>
                          </CellContent>
                      </td>
                  </tr>
                  <tr valign="Top">
                      <td rowspan="" colspan="" align="Left">
                          <CellContent cellname="Help Topic List">
                              <p align="">
                              <ul type="">
                              <font face="" color="" size="3">
                                  <b>For First-Time Visitors</b>
                              </font>
                              <li>
                              <a href="FirstTimeVisitorInfo.htm" target="">
                                  First-Time Visitor Information
                              </a>
                              </li>
                              <li>
                              <a href="SecureShopping.htm" target="">
                                  Secure Shopping at Northwind Traders
                              </a>
                              </li>
                              <li>
                              <a href="FreqAskedQ.htm" target="">
                                  Frequently Asked Questions
                              </a>
                              </li>
                              <li>
                              <a href="NavWeb.htm" target="">
                                  Navigating the Web
                              </a>
                              </li>
                              </ul>
                              </p>
                          </CellContent>
                      </td>
                      <td rowspan="" colspan="" align="Left">
                          <CellContent cellname="Shipping Links">
                              <p align="">
                              <ul type="">
                              <font face="">
                                  <b>Shipping</b>
                              </font>
                              <li>
                              <a href="Rates.htm" target="">
                                  Rates
                              </a>
                              </li>
                              <li>
                              <a href="OrderCheck.htm" target="">
                                  Checking on Your Order
                              </a>
                              </li>
                              <li>
                              <a href="Returns.htm" target="">
                                  Returns
                              </a>
                              </li>
                              </ul>
                              </p>
                          </CellContent>
                      </td>
                  </tr>
              </table>
          </body>
      </html>
      

    The marked-up text remains the same, with one exception: an attribute declared with an enumerated data type cannot be set to an empty string (""). For example, if a tr element does not use the align attribute, the attribute must be removed from the element rather than left empty. Because the DTD assigns a default value (Center) to the align attribute of the tr element, that default is applied only when the attribute is omitted.
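
    The rule can be seen in a reduced example. The following is a minimal sketch, not part of the help desk template: the tr content model is simplified to text only, and the row contents are made up for illustration.

      ```xml
      <?xml version="1.0"?>
      <!DOCTYPE table [
        <!ELEMENT table  (tr+)>
        <!ELEMENT tr  (#PCDATA)>
        <!ATTLIST tr  align  (Left | Right | Center)  'Center'>
      ]>
      <table>
          <!-- Valid: align is omitted, so the default value Center is supplied -->
          <tr>first row</tr>
          <!-- Valid: an explicit value taken from the enumerated list -->
          <tr align="Left">second row</tr>
          <!-- Not valid: a tr with align="" because the empty string is not in the list -->
      </table>
      ```

    A validating parser accepts the two rows shown and would reject a row written with align="".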

    If you open the help desk document in a browser, you will find that it almost works. However, the closing characters (]>) of the !DOCTYPE declaration appear in the browser window, which is not acceptable. To solve this problem, save the DTD declarations in a separate file called StandardHTM.dtd (without the surrounding !DOCTYPE wrapper), remove the empty attributes that have an enumerated data type from the document, and reference the external file StandardHTM.dtd from a new file named HelpHTM.htm. The format for a reference to an external DTD is as follows:


      <!DOCTYPE RootElementName SYSTEM|PUBLIC [Name] DTD-URI>
      

    RootElementName is the name of the root element (in this example, html). The SYSTEM keyword is used for an unpublished DTD that is identified only by its location. If a DTD has been published and given a name, the PUBLIC keyword can be used instead, followed by that name; if the parser cannot resolve the name, it falls back to the DTD-URI. The DTD-URI is the Uniform Resource Identifier (URI) that gives the location of the DTD. A URI is a general way of identifying a resource; one type of URI is the Uniform Resource Locator (URL) you're familiar with from the Internet.
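
    To illustrate the two forms, the declarations below show a SYSTEM reference and a PUBLIC reference. The example.com address is a placeholder, not a real location for this tutorial's DTD. The PUBLIC line is the declaration used for XHTML 1.0 Strict, chosen only because it is a widely published DTD with a registered name.

      ```xml
      <!-- SYSTEM: an unpublished DTD, located by its URI alone (placeholder address) -->
      <!DOCTYPE html SYSTEM "http://www.example.com/dtds/StandardHTM.dtd">

      <!-- PUBLIC: a published name, followed by a URI to use if the name is not recognized -->
      <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
      ```

    These are alternatives; a document contains only one !DOCTYPE declaration.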

    For our example, we would need to add the following line of code to the beginning of the document HelpHTM.htm:


      <!DOCTYPE html SYSTEM "StandardHTM.dtd">
      

    A browser that does not understand XML will ignore this statement. Thus, by using an external DTD, you not only have an XML document that can be validated, but also one that can be displayed in any browser.
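
    The cleanup of empty enumerated attributes in HelpHTM.htm might look like the following sketch for one font element. The value Arial is an assumed choice for illustration: face is declared #REQUIRED, so it cannot simply be dropped, whereas color can be omitted so that its default from the DTD (Black) applies.

      ```xml
      <!-- Before: empty strings on the enumerated attributes face and color -->
      <font face="" color="" size="3">
          <b>For First-Time Visitors</b>
      </font>

      <!-- After: color is omitted, and face is given a value from its list (assumed) -->
      <font face="Arial" size="3">
          <b>For First-Time Visitors</b>
      </font>
      ```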


    You now know how to build a DTD that defines a set of rules for validating an XML document. With DTDs, you can develop a standard set of rules for creating standard XML documents. These documents can be exchanged between corporations, or between departments within a corporation, and validated using the DTD. A DTD can also be used to keep documents consistent within a group, such as a team that is building an e-commerce site.

    In Chapter 5, we'll look at entities. Entities enable you to create reusable strings within a DTD.

