Entities and Other Components

    Author: 2007-08-27 17:10:41 From:

    In tutorial 4, we examined the two principal components of a document type definition (DTD): elements and attributes. In this tutorial, we will look at some additional components that can be added to the DTD. The focus of this tutorial will be entities, which are used to represent text that can be part of either the DTD or the XML document. You can use a single entity to represent a lengthy declaration and then use the entity in the DTD. You can also use entities to make one common file that contains a set of standard declarations that can be shared by many DTDs.

    Entities are like macros in the C programming language in that they allow you to associate a string of characters with a name. This name can then be used in either the DTD or the XML document; the XML parser will replace the name with the string of characters. All entities consist of three parts: the word ENTITY, the name of the entity (called the literal entity value), and the replacement text, that is, the string of characters that the literal entity value will be replaced with. All entities are declared in either an internal or an external DTD.

    Entities come in several types, depending on where their replacement text comes from and where it will be placed. Internal entities will get their replacement text from within the DTD, inside their declaration. External entities will get their replacement text from an external file. Both internal and external entities can be broken down into general entities and parameter entities. General entities are used in XML documents, and parameter entities are used in DTDs.

    Internal general entities, internal parameter entities, and external parameter entities always contain text that should be parsed. Because external general entities go within the body of a document and because you might want to insert a nontext file (such as an image) into the body of the document, external general entities can be parsed or unparsed. External parsed general entities are used to insert XML statements from external files into the XML document. External unparsed general entities are used to insert information into the XML document that is not text-based XML and should not be parsed. Thus, we have five basic entity categories: internal general entities, internal parameter entities, external parsed general entities, external unparsed general entities, and external parameter entities.

    Figure 5-1 illustrates the source of the replacement text for each of the entity categories (the closed circles) and where the replacement text will go (the arrows).

    Figure 5-1. Source and destination of the replacement text for the five entity categories.

    Let's begin by looking at internal entities. An entity that is going to be used in only one DTD can be an internal entity. If you intend to use the entity in multiple DTDs, it should be an external entity. In this section, you'll learn how to declare internal entities, where to insert them, and how to reference them.

    Internal general entities are the simplest among the five types of entities. They are defined in the DTD section of the XML document. First let's look at how to declare an internal general entity.

    Declaring an internal general entity

    The syntax for the declaration of an internal general entity is shown here:


    <!ENTITY name "string_of_characters">

    NOTE
    As you can see from the syntax line above, characters such as angle brackets (< >) and quotation marks (" ") are used specifically for marking up the XML document; they cannot be used as content directly. So to include such a character as part of your content, you must use one of XML's five predefined entities. The literal entity values for these predefined entities are &amp;, &lt;, &gt;, &quot;, and &apos;. The replacement text for these literal entity values will be &, <, >, ", and '.
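
    The effect of the predefined entities can be sketched in a few lines of Python. This is only an illustration, assuming the standard library's xml.etree.ElementTree parser and the xml.sax.saxutils.escape helper; the sample string and variable names are made up for the example and are not part of the tutorial.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Illustrative content containing characters that are reserved for markup.
raw = 'if (a < b && c > d) print "ok"'

# escape() rewrites &, <, and > as &amp;, &lt;, and &gt;.
doc = "<code>" + escape(raw) + "</code>"

# While parsing, each predefined entity is replaced with its replacement text.
parsed = ET.fromstring(doc).text

print(doc)
print(parsed == raw)
```

    The escaped document is what would be stored on disk; the parsed text is what an application sees after the parser has made the replacements.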

    You can create your own general entities. General entities are useful for associating names with foreign language characters, such as ü or ß, or escape characters, such as <, >, and &. You can use Unicode character values in your XML documents as replacements for any character defined in the Unicode standard. These are called character references.

    To use a Unicode representation in your XML document, you must precede the Unicode character value with &# (or &#x for a hex value) and follow it with a semicolon. You can use either the Unicode characters' hex values or their decimal values. For example, in Unicode, ü is represented as xFC and ß is represented as xDF. These two characters' decimal values are 252 and 223. Thus, in your DTD you could create general entities for the preceding two characters as follows:


      <!ENTITY u_um "&#xFC;">
      <!ENTITY s_sh "&#xDF;">
      

    The two entities could also be declared like this:


      <!ENTITY u_um "&#252;">
      <!ENTITY s_sh "&#223;">
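
    As a quick check that the hex and decimal forms name the same characters, here is a small Python sketch. It assumes the standard library's xml.etree.ElementTree parser; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hex and decimal character references for the same two characters.
hex_doc = "<title>Gr&#xFC;&#xDF;</title>"
dec_doc = "<title>Gr&#252;&#223;</title>"

hex_text = ET.fromstring(hex_doc).text
dec_text = ET.fromstring(dec_doc).text

# Both spellings resolve to the same code points.
print(hex_text == dec_text)
print([hex(ord(ch)) for ch in hex_text[2:]])
```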
      

    Using internal general entities

    To reference a general entity in the XML document, you must precede the entity with an ampersand (&) and follow it with a semicolon (;). For example, the following XML statement references the two general entities we declared in the previous section:


      <title>Gr&u_um;&s_sh;</title>
      

    When the replacement text is inserted by the parser, it will look like this:


      <title>Grüß</title>
      

    Internal general entities can be used in three places: in the XML document as content for an element, within the DTD in an attribute with a #FIXED data type declaration or as the default value for the attribute, and within other general entities inside the DTD. We used the first location in the preceding example: (<title>Gr&u_um;&s_sh;</title>).
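
    The first location can be tried out with a short Python sketch. It assumes the standard library's xml.etree.ElementTree parser, which reads entity declarations from a DTD placed inside the document (the DOCTYPE wrapper is added here for the example and does not appear in the tutorial's fragment).

```python
import xml.etree.ElementTree as ET

# The two general entities are declared in an internal DTD subset,
# then referenced as element content with & and ;.
doc = """<!DOCTYPE title [
  <!ENTITY u_um "&#xFC;">
  <!ENTITY s_sh "&#xDF;">
]>
<title>Gr&u_um;&s_sh;</title>"""

title = ET.fromstring(doc)

# The application only ever sees the replacement text.
print(title.text)
```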

    The second place you can use an internal general entity is within the DTD in an attribute with a #FIXED data type declaration or as the default value for an attribute. For example, you can use the following general entities in your DTD declaration to create entities for several colors:


      <!ENTITY Cy "Cyan">
      <!ENTITY Lm "Lime">
      <!ENTITY Bk "Black">
      <!ENTITY Wh "White">
      <!ENTITY Ma "Maroon">
      

    Then if you want the value of the bgcolor attribute for tr elements to be White for all XML documents that use the DTD, you could include the following line in the previous DTD declaration:


      <!ATTLIST tr align (Left | Right | Center) 'Center'
              valign (Top | Middle | Bottom) 'Middle'
              bgcolor CDATA #FIXED "&Wh;">
      

    The internal general entities must be defined before they can be used in an attribute default value since the DTD is read through once from beginning to end. In this case, internal general entities for several colors have been created. The bgcolor attribute is declared with the keyword #FIXED, which means that its value cannot be changed by the user; the value will always be White. The color general entities could also be used as content for the elements in the body section of the XML document.

    You can use the internal general entity as a default value, for example, bgcolor CDATA "&Wh;". In this case, if no value is given, &Wh; is substituted for bgcolor when the XML attribute is needed in the document body, and that reference will be converted to White.
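
    The #FIXED case can be sketched in Python as follows. This assumes that the parser applies attribute defaults declared in an internal DTD subset, which is my understanding of how the expat-based xml.etree.ElementTree behaves; the cut-down table and tr declarations are invented for the example.

```python
import xml.etree.ElementTree as ET

# The entity is declared before the ATTLIST that uses it.
# The document itself never mentions bgcolor.
doc = """<!DOCTYPE table [
  <!ENTITY Wh "White">
  <!ELEMENT table (tr)>
  <!ELEMENT tr EMPTY>
  <!ATTLIST tr bgcolor CDATA #FIXED "&Wh;">
]>
<table><tr/></table>"""

tr = ET.fromstring(doc).find("tr")

# The attribute is supplied from the DTD, with the entity already replaced.
print(tr.attrib)
```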

    NOTE
    You can use an internal general entity in a DTD for a #FIXED attribute, but the attribute value will be assigned in the XML document's body only when the attribute is referenced. You cannot use an internal general entity in an enumerated type attribute declaration because the general entity would have to be interpreted in the DTD, which cannot happen.

    The third place you can use internal general entities is within other general entities inside the DTD. For example, we could use the preceding special character entities as follows:


      <!ENTITY u_um "&#252;">
      <!ENTITY s_sh "&#223;">
      <!ENTITY greeting "Gr&u_um;&s_sh;">
      

    At this point, it's not clear whether greeting will be replaced with Gr&u_um;&s_sh; in the XML document's body and then converted to Grüß or whether greeting will be replaced directly with Grüß when the entity is parsed. The order of replacement will be discussed in the section "Processing Order" later in this chapter.

    CAUTION
    When you include general entities within other general entities, circular references are not allowed. For example, the following construction is not correct:


      <!ENTITY greeting "&hello;! Gr&u_um;&s_sh;">
      <!ENTITY hello "Hello &greeting;">
      
    In this case, greeting is referencing hello, and hello is referencing greeting, making a circular reference.
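
    Both situations, a legal nested entity and an illegal circular one, can be observed with a short Python sketch. It assumes the standard library's xml.etree.ElementTree parser, which raises ParseError for documents that are not well formed; the circular example is shortened from the one above so that it needs no other entities.

```python
import xml.etree.ElementTree as ET

# A general entity that references two other general entities.
nested = """<!DOCTYPE title [
  <!ENTITY u_um "&#252;">
  <!ENTITY s_sh "&#223;">
  <!ENTITY greeting "Gr&u_um;&s_sh;">
]>
<title>&greeting;</title>"""

# greeting references hello, and hello references greeting.
circular = """<!DOCTYPE title [
  <!ENTITY greeting "&hello;!">
  <!ENTITY hello "Hello &greeting;">
]>
<title>&greeting;</title>"""

nested_text = ET.fromstring(nested).text
print(nested_text)

try:
    ET.fromstring(circular)
    circular_rejected = False
except ET.ParseError as err:
    # The parser refuses to expand an entity that refers back to itself.
    circular_rejected = True
    print("rejected:", err)
```

    Note that the circular declarations themselves are accepted; the error appears only when the parser tries to expand a reference to one of them.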


    Internal parameter entities are interpreted and replaced within the DTD and can be used only within the DTD. While you need to use an ampersand (&) when referencing general entities, you need to use a percent sign (%) when referencing parameter entities.

    NOTE
    If you need to use a quotation mark, percent sign, or ampersand in your parameter or general entity strings, you must use character or general entity references, for example, &#x22;, &#x25;, &#x26;, or &quot; and &amp;. (There is no predefined entity for the percent sign, but you could create a general or parameter entity for it.)

    Declaring an internal parameter entity

    The syntax for declaring an internal parameter entity is shown here:


    <!ENTITY % name "string_of_characters">

    As you can see, the syntax for declaring an internal parameter entity is only slightly different from that used for declaring internal general entities: a percent sign is used in front of the entity name. (The percent sign must be preceded and followed by a white space character.)

    In Chapter 4, we created a sample DTD for a static HTML page. If you want to create a dynamic page, you will probably want to add forms and other objects to your DTD. There is a standard set of events associated with all of these objects, but instead of listing the events for every declaration of every object, you could use the following parameter entity in your DTD:


      <!ENTITY % events
          "onclick    CDATA       #IMPLIED
          ondblclick  CDATA       #IMPLIED
          onmousedown CDATA       #IMPLIED
          onmouseup   CDATA       #IMPLIED
          onmouseover CDATA       #IMPLIED
          onmousemove CDATA       #IMPLIED
          onmouseout  CDATA       #IMPLIED
          onkeypress  CDATA       #IMPLIED
          onkeydown   CDATA       #IMPLIED
          onkeyup     CDATA       #IMPLIED"
          >
      

    This code declares a parameter entity named events that can be used in the attribute declaration of any element that has these event attributes.

    NOTE
    You could have also declared a parameter entity named Script, and then used it within the events parameter entity declaration, as shown here:


      <!ENTITY % Script "CDATA">
      <!ENTITY % events
               "onclick     %Script;        #IMPLIED
                ondblclick  %Script;       #IMPLIED
                
      >
      

    The Script parameter entity allows you to use data type names that are more readable than just using CDATA. Although this code is more readable, some XML tools (such as XML Authority) cannot accept parameter entities used in this way. Be aware of this limitation if you use this technique.

    Using internal parameter entities

    The events parameter entity will be used in the attribute declaration of the form objects and in other elements, such as body. To reference a parameter entity, you must precede the entity with a percent sign and follow it with a semicolon. For example, you could now make this declaration:


      <!ATTLIST body
          alink    CDATA  #IMPLIED
          text     CDATA  #IMPLIED
          bgcolor  CDATA  #IMPLIED
          link     CDATA  #IMPLIED
          vlink    CDATA  #IMPLIED
          %events;
          onload   CDATA  #IMPLIED
          onunload CDATA  #IMPLIED
        >
      

    In this case, the internal parameter entity %events; has been added to the body element's attribute declaration. The parameter entity events could be used in any declaration in which these events are allowed.

    Now would be a good time to introduce a new standard that is being created for HTML. This new standard is called XHTML; it is also represented in a new version of HTML (version 4.01). The World Wide Web Consortium (W3C) standards committee is currently working out the last details of the standard, which is all about doing what we've done in the last few chapters, XMLizing HTML. You can find information about this standard by visiting http://www.w3.org.

    Basically, the XHTML standard introduces two content models: inline and block. The inline elements affect individual text elements, whereas the block elements affect entire blocks of text. These two kinds of elements are then used as child elements for other elements.

    Inline entities and elements

    The XHTML standard provides the following declarations for defining a series of internal parameter entities to be used to define the inline elements:


      <!ENTITY % special "br
                          | span
                          | img">
      <!ENTITY % fontstyle "tt
                            | i
                            | b
                            | big
                            | small">
      <!ENTITY % phrase "em
                         | strong
                         | q
                         | sub
                         | sup">
      <!ENTITY % inline.forms "input
                               | select
                               | textarea
                               | label
                               | button">
      <!ENTITY % inline "a
                         | %special;
                         | %fontstyle;
                         | %phrase;
                         | %inline.forms;">
      <!-- Entities that can occur at block or inline level. -->
      <!ENTITY % misc "script
                       | noscript">
      <!ENTITY % Inline " (#PCDATA
                         | %inline;
                         | %misc; )*">
      

    This declaration fragment builds the final Inline parameter entity in small pieces. Notice that the Inline entity definition contains the inline and misc entities and uses the technique described in Chapter 4 for including an unlimited number of child elements in any order; in this example, using (#PCDATA | %inline; | %misc; )*.

    In the example DTD created in Chapters 3 and 4, the p element was used to organize the content within a cell. Although that usage makes sense, the purpose of the p element is to make text that is not included in a block element (such as text within an h element) word-wrap properly. Therefore, putting the h element or any of the block elements within a p element is not necessary because text within a block element is already word-wrapped. On the other hand, if any of the inline elements are used outside of a block element, they should be placed inside a p element so that the text element wraps properly. Therefore, you could rewrite the definition for the p element as follows:


      <!ELEMENT p %Inline;>
      

    This shows exactly the way the definition for the p element appears in the XHTML specification.

    Block entities and elements

    The XHTML standard also declares a set of internal parameter entities that can be used in the declarations of the block elements. These internal parameter entities appear as follows:


      <!ENTITY % heading "h1
                          | h2
                          | h3
                          | h4
                          | h5
                          | h6">
      <!ENTITY % lists "ul
                        | ol">
      <!ENTITY % blocktext "hr
                            | blockquote">
      <!ENTITY % block "p
                        | %heading;
                        | div
                        | %lists;
                        | %blocktext;
                        | fieldset
                        | table">
      <!ENTITY % Block " (%block;
                        | form
                        | %misc; )*">
      

    Notice that the Block entity contains the block entity, the misc entity, and the form element and also includes an unlimited number of these child elements in any order. Using the Block parameter entity, the declaration for the body element becomes the following:


      <!ELEMENT body %Block;>
      

    As you can see, using the parameter entities, you can give your document a clear structure.

    Using parameter entities in attributes

    The XHTML standard also uses parameter entities in attributes, as we saw earlier with the events entity. You could use this events entity and two additional entities to create an internal parameter entity for attributes shared among many elements, as shown here:


      <!-- Internationalization attributes
          lang        Language code (backward-compatible)
          xml:lang    Language code (per XML 1.0 spec)
          dir         Direction for weak/neutral text
      -->
      <!ENTITY % i18n " lang     NMTOKEN  #IMPLIED
                        xml:lang NMTOKEN  #IMPLIED
                        dir      (ltr | rtl )  #IMPLIED">
      <!ENTITY % coreattrs
                    " id      ID       #IMPLIED
                      class   CDATA    #IMPLIED
                      style   CDATA    #IMPLIED
                      title   CDATA    #IMPLIED">
    
      <!ENTITY % attrs " %coreattrs;
                         %i18n;
                         %events;">
      

    The language entity i18n can be understood by XML and non-XML compliant browsers and is used to mark elements as belonging to a particular language.

    NOTE
    For more information about language codes, visit the Web site http://www.oasis-open.org/cover/iso639a.html.

    The attrs parameter entity can be used for the most common attributes associated with the HTML elements in the DTD. For example, the body element's attribute can now be written as follows:


      <!ATTLIST body  %attrs;
                      onload   CDATA  #IMPLIED
                      onunload CDATA  #IMPLIED>
      

    Rewriting the sample DTD using parameter entities

    Ideally, you want your XML Web documents to be compatible with the new XHTML standard. Using entities, along with a few other changes, the DTD example from Chapter 4 can be rewritten as follows:


      <!-- Entities that can occur at block or inline level. ====-->
    
      <!ENTITY % misc " script
                       | noscript">
      <!-- %Inline; is declared further down, after %inline;, because a
           parameter entity must be declared before it is referenced. -->
    
      <!-- Entities for inline elements ================-->
      <!ENTITY % special "br
                          | span
                          | img">
    
      <!ENTITY % fontstyle "tt
                            | i
                            | b
                            | big
                            | small">
    
      <!ENTITY % phrase "em
                         | strong
                         | q
                         | sub
                         | sup">
    
      <!ENTITY % inline.forms "input
                               | select
                               | textarea
                               | label
                               | button">
    
      <!ENTITY % inline "a
                         | %special;
                         | %fontstyle;
                         | %phrase;
                         | %inline.forms;">
    
      <!ENTITY % Inline  "(#PCDATA
                         | %inline;
                         | %misc;)*">
    
      <!-- Entities used for block elements ============-->
      <!ENTITY % heading "h1
                          | h2
                          | h3
                          | h4
                          | h5
                          | h6">
    
      <!ENTITY % lists "ul
                        | ol">
    
      <!ENTITY % blocktext "hr
                            | blockquote">
    
      <!ENTITY % block "p
                        | %heading;
                        | div
                        | %lists;
                        | %blocktext;
                        | fieldset
                        | table">
    
      <!ENTITY % Block " (%block;
                        | form
                        | %misc; )*">
    
      <!-- Mixed block and inline ========================-->
      <!-- %Flow; mixes block and inline and is used for list
           items and so on. -->
      <!ENTITY % Flow " (#PCDATA
                       | %block;
                       | form
                       | %inline;
                       | %misc; )*">
      <!ENTITY % form.content " #PCDATA
                               | p
                               | %lists;
                               | %blocktext;
                               | a
                               | %special;
                               | %fontstyle;
                               | %phrase;
                               | %inline.forms;
                               | table
                               | %heading;
                               | div
                               | fieldset
                               | %misc; ">
    
      <!ENTITY % events " onclick     CDATA  #IMPLIED
                           ondblclick  CDATA  #IMPLIED
                           onmousedown CDATA  #IMPLIED
                           onmouseup   CDATA  #IMPLIED
                           onmouseover CDATA  #IMPLIED
                           onmousemove CDATA  #IMPLIED
                           onmouseout  CDATA  #IMPLIED
                           onkeypress  CDATA  #IMPLIED
                           onkeydown   CDATA  #IMPLIED
                           onkeyup     CDATA  #IMPLIED">
    
      <!ENTITY % i18n " lang     NMTOKEN  #IMPLIED
                           xml:lang NMTOKEN  #IMPLIED
                           dir      (ltr | rtl )  #IMPLIED">
    
      <!-- Core attributes common to most elements
       id       Document-wide unique ID
       class    Space-separated list of classes
       style    Associated style info
       title    Advisory title/amplification
      -->
      <!-- Style sheet data -->
      <!ENTITY % StyleSheet "CDATA">
      <!ENTITY % coreattrs " id    ID   #IMPLIED
                           class CDATA  #IMPLIED
                           style CDATA  #IMPLIED">
    
      <!ENTITY % attrs " %coreattrs;
                            %i18n;
                            %events;">
    
      <!-- End Entity Declarations  ====================-->
      <!ENTITY % URI "CDATA">
      <!--a Uniform Resource Identifier, see [RFC2396]-->
      <!ELEMENT html  (head, body)>
      <!ATTLIST html  %i18n;
                      xmlns CDATA  #FIXED 'http://www.w3.org/1999/xhtml'>
    
      <!ELEMENT head  (title, base?)>
      <!ATTLIST head  %i18n;
                      profile CDATA  #IMPLIED>
    
      <!ELEMENT title  (#PCDATA )>
      <!ATTLIST title  %i18n; >
    
      <!ELEMENT base EMPTY>
      <!ATTLIST base  target CDATA  #REQUIRED >
    
      <!ELEMENT body  (basefont? ,  (p )? , table )>
      <!ATTLIST body  alink   CDATA  #IMPLIED
                      text    CDATA  #IMPLIED
                      bgcolor CDATA  #IMPLIED
                      link    CDATA  #IMPLIED
                      vlink   CDATA  #IMPLIED >
    
      <!ELEMENT basefont EMPTY>
      <!ATTLIST basefont  size CDATA  #REQUIRED >
    
      <!-- generic language/style container ==============-->
      <!ELEMENT a  (#PCDATA )>
      <!ATTLIST a  %attrs;
                   href   CDATA  #IMPLIED
                   name   CDATA  #IMPLIED
                   target CDATA  #IMPLIED >
    
      <!ELEMENT table  (tr )+>
      <!ATTLIST table  %attrs;
                       width       CDATA  #IMPLIED
                       rules       CDATA  #IMPLIED
                       frame       CDATA  #IMPLIED
                       align       CDATA  'Center'
                       cellpadding CDATA  '0'
                       border      CDATA  '0'
                       cellspacing CDATA  '0' >
    
      <!ELEMENT tr  (td+ )>
      <!ATTLIST tr  %attrs; >
    
      <!ELEMENT td  (cellcontent )>
      <!ATTLIST td  %attrs;
                    bgcolor  (Cyan|Lime|Black|White|Maroon ) 'White'
                    align   CDATA  'Center'
                    rowspan CDATA  #IMPLIED
                    colspan CDATA  #IMPLIED >
    
      <!ELEMENT cellcontent  (%Block; | p?)+>
      <!ATTLIST cellcontent  cellname CDATA  #REQUIRED >
    
      <!ELEMENT h1 %Inline;>
      <!ATTLIST h1  align CDATA  #IMPLIED
                    %attrs; >
      <!ELEMENT h2 %Inline;>
      <!ATTLIST h2  align CDATA  #IMPLIED
                    %attrs; >
      <!ELEMENT h3 %Inline;>
      <!ATTLIST h3  align CDATA  #IMPLIED
                    %attrs; >
      <!ELEMENT h4 %Inline;>
      <!ATTLIST h4  align CDATA  #IMPLIED
                    %attrs; >
      <!ELEMENT h5 %Inline;>
      <!ATTLIST h5  align CDATA  #IMPLIED
                    %attrs; >
      <!ELEMENT h6 %Inline;>
      <!ATTLIST h6  align CDATA  #IMPLIED
                    %attrs; >
      <!ELEMENT p %Inline;>
      <!ATTLIST p  %attrs; >
    
      <!-- Inline Element Declarations =================-->
    
      <!-- Forced line break -->
      <!ELEMENT br EMPTY>
      <!ATTLIST br  %coreattrs;
                    clear     CDATA  #REQUIRED >
    
      <!-- Emphasis -->
      <!ELEMENT em %Inline;>
      <!ATTLIST em  %attrs; >
    
      <!-- Strong emphasis -->
      <!ELEMENT strong %Inline;>
      <!ATTLIST strong  %attrs; >
    
      <!-- Inlined quote -->
      <!ELEMENT q %Inline;>
      <!ATTLIST q  %attrs;
                   cite  CDATA  #IMPLIED >
    
      <!-- Subscript -->
      <!ELEMENT sub %Inline;>
      <!ATTLIST sub  %attrs; >
    
      <!-- Superscript -->
      <!ELEMENT sup %Inline;>
      <!ATTLIST sup  %attrs; >
    
      <!-- Fixed-pitch font -->
      <!ELEMENT tt %Inline;>
      <!ATTLIST tt  %attrs; >
    
      <!-- Italic font -->
      <!ELEMENT i %Inline;>
      <!ATTLIST i  %attrs; >
    
      <!-- Bold font -->
      <!ELEMENT b %Inline;>
      <!ATTLIST b  %attrs; >
    
      <!-- Bigger font -->
      <!ELEMENT big %Inline;>
      <!ATTLIST big  %attrs; >
    
      <!-- Smaller font -->
      <!ELEMENT small %Inline;>
      <!ATTLIST small  %attrs; >
    
      <!-- hspace, border, align, and vspace are not in the strict
          XHTML standard for img. -->
      <!ELEMENT img EMPTY>
      <!ATTLIST img  %attrs;
                    align  CDATA  #IMPLIED
                    border CDATA  #IMPLIED
                    width  CDATA  #IMPLIED
                    height CDATA  #IMPLIED
                    hspace CDATA  #IMPLIED
                    vspace CDATA  #IMPLIED
                    src    CDATA  #REQUIRED >
    
      <!ELEMENT ul  (font? , li+ )>
      <!ATTLIST ul  %attrs;
                    type  CDATA  'text' >
    
      <!ELEMENT ol  (font? , li+ )>
      <!ATTLIST ol  type  CDATA  'text'
                    start CDATA  #IMPLIED
                    %attrs; >
    
      <!ELEMENT li  %Flow; >
      <!ATTLIST li  %attrs; >
    
      <!--================= Form Elements===============-->
      <!--Each label must not contain more than one field.
          Label elements shouldn't be nested.
      -->
      <!ELEMENT label %Inline;>
      <!ATTLIST label  %attrs;
                       for   IDREF  #IMPLIED >
    
      <!ENTITY % InputType "(text | password | checkbox |
          radio | submit | reset |
          file | hidden | image | button)">
    
      <!-- The name attribute is required for all elements but
           the submit and reset elements. -->
      <!ELEMENT input EMPTY>
      <!ATTLIST input  %attrs; >
    
      <!ELEMENT select  (optgroup | option )+>
      <!ATTLIST select %attrs;>
    
      <!-- Option selector -->
      <!ATTLIST select name     CDATA  #IMPLIED>
      <!ATTLIST select size     CDATA  #IMPLIED>
      <!ATTLIST select multiple  (multiple)  #IMPLIED>
      <!ATTLIST select disabled  (disabled)  #IMPLIED>
      <!ATTLIST select tabindex CDATA  #IMPLIED>
      <!ATTLIST select onfocus  CDATA  #IMPLIED>
      <!ATTLIST select onblur   CDATA  #IMPLIED>
      <!ATTLIST select onchange CDATA  #IMPLIED>
      <!ELEMENT optgroup  (option )+>
      <!ATTLIST optgroup  %attrs;
                          disabled  (disabled )  #IMPLIED
                          label    CDATA  #REQUIRED>
    
      <!ELEMENT option  (#PCDATA )>
      <!ATTLIST option  %attrs;
                        selected  (selected )  #IMPLIED
                        disabled  (disabled )  #IMPLIED
                        label    CDATA  #IMPLIED
                        value    CDATA  #IMPLIED >
      <!-- Multiple-line text field -->
      <!ELEMENT textarea  (#PCDATA )>
      <!ATTLIST textarea  %attrs; >
    
      <!ELEMENT legend %Inline;>
      <!ATTLIST legend  %attrs; >
    
      <!--=================== Horizontal Rule ============-->
      <!ELEMENT hr EMPTY>
      <!ATTLIST hr  %attrs; >
      <!--=================== Block-like Quotes ==========-->
      <!ELEMENT blockquote %Block;>
      <!ATTLIST blockquote  %attrs;
                            cite  CDATA  #IMPLIED >
      <!-- The fieldset element is used to group form fields.
        Only one legend element should occur in the content,
        and if present it should be preceded only by white space.
      -->
    
      <!ELEMENT fieldset
         (#PCDATA | legend | %block; | form | %inline; | %misc; )*>
      <!ATTLIST fieldset  %attrs; >
    
      <!ELEMENT script  (#PCDATA )>
      <!ATTLIST script  charset   CDATA  #IMPLIED
                        type      CDATA  #REQUIRED
                        src       CDATA  #IMPLIED
                        defer     CDATA  #IMPLIED
                        xml:space CDATA  #FIXED 'preserve' >
    
      <!-- Alternative content container for non-script-based
           rendering -->
    
      <!ELEMENT noscript %Block;>
      <!ATTLIST noscript %attrs; >
    
      <!ELEMENT button  (#PCDATA | p | %heading; | div | %lists; |
         %blocktext; | table | %special; | %fontstyle; |
         %phrase; | %misc; )*>
      <!ATTLIST button  %attrs;
                        name      CDATA  #IMPLIED
                        value     CDATA  #IMPLIED
                        type      (button | submit | reset )  'submit'
                        disabled  (disabled )  #IMPLIED
                        tabindex  CDATA  #IMPLIED
                        accesskey CDATA  #IMPLIED
                        onfocus   CDATA  #IMPLIED
                        onblur    CDATA  #IMPLIED >
    
      <!ELEMENT span %Inline;>
      <!ATTLIST span  %attrs; >
      <!--The font element is not included in the XHTML standard. -->
      <!ELEMENT font  (b )>
      <!ATTLIST font  color CDATA  #REQUIRED
                      face  CDATA  #REQUIRED
                      size  CDATA  #REQUIRED >
    
      <!ELEMENT form (%form.content;)*>
      <!ELEMENT div %Flow;>
      <!ATTLIST div %attrs; >
      

    This might look like a completely different DTD, but it is essentially the same as the DTD we created in Chapter 4. Only one structural change has occurred: the block elements, such as the h1 element, have been moved out of the p element and now are child elements of the body element. Several elements have been added, including the form element itself and its child elements (button, label, select, and so on) and the font formatting elements, including i and b. Numerous additions have been made to the attributes, including language, id, and the scripting events.

    XML documents built using this new DTD will still use a table to format and contain all of the elements that will be displayed in the browser. However, in the new DTD, the declaration for the body element is different from that in our original DTD. In our original DTD, the a (anchor) element at the top of the page is a child element of the body element. However, this element is not a child element of the body element in the XHTML standard. As we have seen, the declaration for the body element in the XHTML standard is as follows:


      <!ELEMENT body %Block;>
      

    As we have discussed, the Block internal parameter entity is declared as follows:


      <!ENTITY % Block " (%block; | form | %misc;)*">
      

    Replacing %block; and %misc; results in the following code:


      <!ENTITY % Block " (p | %heading; | div | %lists; |
          %blocktext; | fieldset | table | form | script |
          noscript)*">
      

    Replacing %heading;, %lists;, and %blocktext; gives you the fully expanded content model that the body element uses, as shown here:


      <!ENTITY % Block " (p | h1 | h2 | h3 | h4 | h5 | h6 | div | ul |
          ol |  hr | blockquote  | fieldset | table |
          form | script | noscript)*">
      

    NOTE
    It would be worth your time to go through the DTD and replace the entities with their actual values. You may also find it interesting to download the latest version of the XHTML standard and do all of the replacements in that document, too.

    Creating this expanded declaration manually took some time, but any of the DTD tools could have done this work for you in just a few moments. For example, Figure 5-2 shows our sample XHTML DTD as it appears in XML Authority.

    Figure 5-2. The Body element of the XHTML DTD displayed in XML Authority.

    The child elements of the Body element are readily visible. (You can scroll down to see the complete list.)

    NOTE
    You do not have to include all of these child elements in your DTD to be compatible with the XHTML standard; instead, you can include only those elements that you need for your projects. If you want to be compliant with the standard, however, you cannot add elements to the body element that are not included in the standard.

    Notice that the a element is not a child element of the XHTML body element; it is actually a child element of the p element. Therefore, you cannot use the declaration included in the original DTD we discussed in Chapter 4, shown here:


      <!ELEMENT body (basefont? , a? , table)>
      

    In this declaration, the a element is a child element of the body element, which does not comply with the standard. To solve this problem, you will need to use the p element, as shown here:


      <!ELEMENT body (basefont? , (p)? , table)>
      

    While this declaration makes the DTD conform to the XHTML standard, it also means that any of the inline elements, not just the a element, can be used in the body element as long as they are contained within a p element.
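
    A minimal sketch of a document body that this declaration accepts is shown below. It assumes that em is among the inline elements allowed in p; the text inside the em element is a placeholder, and the table rows are elided:

      <body>
          <p><em>Help Desk</em> <a name="Top"><!--Top tag--></a></p>
          <table>
              <!--Rows and cells here-->
          </table>
      </body>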

    Many child elements that are included in the body element of the XHTML standard are not included in the example DTD. This is because you are using the table to hold most of the content and do not need most of these child elements. You can think of the XML documents defined by the example DTD as a subset of the XML documents defined by the more general XHTML DTD. The example DTD includes only the structure you need for your documents.

    The XHTML standard declaration for the table cell element (td) is shown here:


      <!ELEMENT td %Flow;>
      

    If you replace the Flow parameter entity and all of the parameter entities contained within %Flow; as you did earlier for the body element, your final td declaration will look like this:


      <!ELEMENT td (#PCDATA | p | h1 | h2 | h3 | h4 | h5 | h6 | div | ul |
          ol | hr | blockquote | fieldset | table | form | a | br | span |
          img | tt | i | b | big | small | em | strong | q | sub |
          sup | input | select | textarea | label | button | script |
          noscript)*>
      

    As you can see, the Flow entity includes virtually everything. You can use a td element as a container for all of the block and inline elements, which is exactly what you want to do.

    In the example DTD, the following declaration is created for the td element and the cellcontent element:


      <!ELEMENT td (cellcontent)>
      <!ELEMENT cellcontent (%Block;)+>
      

    This declaration doesn't comply with the XHTML standard. The cellcontent element does not belong to the standard; it was created for marking up the text. When you use custom elements such as cellcontent, you will need to remove them from your documents using Extensible Stylesheet Language (XSL). Once the cellcontent elements have been transformed away, the documents conform to the following declaration:


      <!ELEMENT td (%Block;)+>
      

    This declaration is compliant with the XHTML standard. We'll have a detailed discussion about XSL in Chapter 12.

    Because of the changes in the DTD, you will have to make some minor changes to the sample HelpHTM.htm document we created in Chapter 4. You will now have to delete all the p elements because the block elements are no longer child elements of the p elements. You will also have to add several p elements to wrap the a elements. Change the a element at the beginning of the document as shown here:


      <p><a name="Top"><!--Top tag--></a></p>
      

    Then wrap all the links in the lists using the p element. For example, you can wrap the first link in the HelpHTM.htm document as follows:


      <p>
          <a href="FirstTimeVisitorInfo.html" target="">
              First-Time Visitor Information</a>
      </p>
      

    If you do this and then reference the new DTD, the document is valid.

     

    The parameter entities have made the overall DTD more compact, but have they made it more readable? In general, grouping items into parameter entities can make the document more readable, but if you go too far and create too many parameter entities, it might be nearly impossible for a human to read your DTD. For example, most developers would consider the basic form objects (button, label, textarea, and so on) to be the primary child elements of a form element. However, you will need to dig through many layers of the XHTML DTD to discover how these elements are actually related to the form element.

    In the XHTML DTD, the form objects are defined in an internal parameter entity named inline.forms, which is included in the inline parameter entity. The inline entity is used in the Inline parameter entity, which in turn is used in the p element's declaration. The p element is included in the block parameter entity's declaration, and the block entity is included in the form.content parameter entity. Finally, the form.content entity is included in the form element's declaration, as shown here:


      <!ENTITY % inline.forms "input | select | textarea | label 
          | button">
      <!ENTITY % inline 
                "a | %special; | %fontstyle; | %phrase; | %inline.forms;">
      <!ENTITY % Inline "(%inline;| %misc;)*">
      <!ELEMENT p %Inline;>
      <!ENTITY % block
          "p | %heading; | div | %lists; | %blocktext; | fieldset |
           table">
      <!ENTITY % form.content "(%block; | %misc;)*">
      <!ELEMENT form %form.content;>
      

    To use a form object such as select, you will need to include the following statement in your XML document:


      <form><p><select/></p></form>
      

    There is another path to the form objects. Notice that the block entity declaration includes a fieldset element. The fieldset element's declaration also includes the inline entity, just as the p element's declaration did, as shown here:


      <!ENTITY % inline.forms "input | select | textarea | label | 
          button">
      <!ENTITY % inline 
          "a | %special; | %fontstyle; | %phrase; |
          %inline.forms;">
      <!ELEMENT fieldset (#PCDATA | legend | %block; | form |
          %inline; | %misc;)*>
      <!ENTITY % block
          "p | %heading; | div | %lists; | %blocktext; | fieldset |
          table">
      <!ENTITY % form.content "(%block; | %misc;)*">
      <!ELEMENT form %form.content;>
      

    To use a form object such as select in this case, you would include the following statement in your XML document:


      <form><fieldset><select/></fieldset></form>
      

    You can use an XML tool to view this relationship. An excellent tool for viewing the structure of an XML DTD is Near and Far, available at http://www.microstar.com. Without an XML tool, the parameter entities make the DTD nearly impossible to read. Try to strike a balance by using enough parameter entities to create reusable groups that make your DTD neater but not so many parameter entities that your DTD is unreadable.

    You must also be careful that the document is still valid and well formed once the parameter entity has been substituted. For example, consider the following declaration:


      <!ENTITY % Inline " (#PCDATA
                         | %inline;
                         | %misc;*">
      

    As you can see, this declaration is missing the closing parenthesis. When the Inline parameter entity is substituted, it will create an invalid declaration. Be sure that all your components are properly nested, opened, and closed after the entities are substituted.
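
    For comparison, here is a balanced form of the same declaration, with the closing parenthesis placed before the asterisk, together with an element declaration that uses it. This is only a sketch; it assumes that the inline and misc parameter entities have already been declared earlier in the DTD:

      <!ENTITY % Inline " (#PCDATA
                         | %inline;
                         | %misc;)*">
      <!--After substitution, span has the content model
          (#PCDATA | ... )* and the declaration is complete.-->
      <!ELEMENT span %Inline;>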

    A common problem when working with XML is finding errors in your XML documents and your DTDs. Often XML tools display cryptic error messages that leave you with no idea as to the real source of a problem. XML Notepad, which was used to write the code in this book, can be used for writing and debugging XML documents that have no external DTDs. XML Authority works well with DTDs and usually provides clear error messages that help you locate errors in your DTD. If you are working with an XML document that references an external DTD, Web Writer usually provides helpful error messages. All of these products provide trial versions. Try them all, and then choose the tools that best meet your needs. Be aware that sometimes a small error in a DTD could take a long time to track down (for example, using Block instead of block in the preceding DTD will cause an error that might take several hours to track down).


    In this section, we'll look at the three categories of external entities: external parsed general entities, external unparsed general entities, and external parameter entities. External entities can be used when more than one DTD uses the same entities. You can reduce the amount of time it takes to produce new DTDs by creating a repository of documents containing entity declarations.

    External parsed general entities enable you to store a piece of your XML document in a separate file. The entity declaration associates a name with that external file, and a reference to the name pulls the file's content into the document. Using the external general entity, the external XML file can be referenced anywhere in the content of your XML document.

    Declaring an external parsed general entity

    The syntax for declaring an external general entity is shown here:


    <!ENTITY name SYSTEM "URI">

    Notice that the external general entity declaration uses a keyword following the entity name. This keyword can be SYSTEM or PUBLIC. The PUBLIC identifier is used when the document is officially registered. The SYSTEM identifier is used with unregistered documents that are located using a URI, which stands for Uniform Resource Identifier, to tell the parser where to find the object referenced in the declaration. Since we are now working with unregistered documents, we will use the SYSTEM identifier in the examples below.
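
    The PUBLIC form follows the same pattern but supplies a public identifier ahead of the URI. A minimal sketch, in which the entity name, the public identifier, and the file name are all hypothetical:

      <!ENTITY legalnotice PUBLIC "-//Example//TEXT Legal Notice//EN"
          "LegalNotice.xml">

    A parser that cannot resolve the public identifier can fall back to the URI that follows it.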

    Using external parsed general entities

    External parsed general entities can be referenced in the document instance and in the content of another general entity. Unlike internal general entities, external parsed general entities cannot be referenced in an attribute value. To reference an external parsed general entity, you precede the entity name with an ampersand and follow it with a semicolon, the same way you reference internal general entities. Let's look at how to use external parsed general entities in the XML document. Since our sample file HelpHTM.htm is a well-formed XML document, we can save it as Help.xml. To divide the Web page in this document into header, footer, left navigation bar, and body sections, add the following code to Help.xml:


      <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
      <!DOCTYPE html SYSTEM  "StandXHTML.dtd" [
      <!ENTITY topheader SYSTEM "Topheader.htm">
      <!ENTITY leftnav SYSTEM "Leftnav.htm">
      <!ENTITY footer SYSTEM "Footer.htm">
      <!ENTITY body SYSTEM "Body.htm">
      ]>
      <html>
          <head>
              <title>Northwind Traders Help Desk</title>
          </head>
          <body text="#000000" bgcolor="#FFFFFF" link="#003399"
              alink="#FF9933"  vlink="#996633">
              &topheader;
              &leftnav;
              &body;
              &footer;
          </body>
      </html>
      

    Using this new DTD, the Body.htm file referenced in our sample Web help page would look like this:


      <html>
          <p><a name="Top"><!--Top tag--></a></p>
            <table border="5" frame="" rules="" width="100%" 
                cellspacing="0" cellpadding="0">
               <tr>
                   <td  colspan="2" align="Center">
                       <cellcontent cellname="Help Topic List ">
                           <h1 align="Center">Help Desk</h1>
                       </cellcontent>
                   </td>
               </tr>
               <tr  valign="Top" >
                   <td align="Left" >
                       <cellcontent cellname="First-Time Visitor">
                           <ul >
                           <font  size="3">
                              <b>For First-Time Visitors</b>
                           </font>
                           <li>
                           <p>
                           <a href="FirstTimeVisitorInfo.html"   
                              target=""> First-Time Visitor Information</a>
                           </p>
                           </li>
                           <li>
                           <p>
                           <a href="SecureShopping.html" target="">
                              Secure Shopping at Northwind Traders</a>
                           </p>
                           </li>
                           <li>
                           <p>
                           <a href="FreqaskedQ.htm" target="">
                              Frequently Asked Questions</a>
                           </p>
                           </li>
                           <li>
                           <p>
                           <a href="NavWeb.html" target="">
                              Navigating the Web</a>
                           </p>
                           </li>
                           </ul>
                       </cellcontent>
                   </td>
                   <td align="Left">
                       <cellcontent cellname="Shipping links">
                           <ul type="">
                               <font size="3">
                                   <b>Shipping</b>
                               </font>
                               <li>
                               <p> 
                                   <a href="Rates.htm" target="">Rates</a>
                               </p>
                               </li>
                               <li>
                                   <p>
                                   <a href="OrderCheck.htm" target="">
                                       Checking on Your Order</a>
                                   </p>
                               </li>
                               <li>
                               <p>
                                   <a href="Returns.htm" target="">
                                       Returns</a>
                               </p>
                               </li>
                           </ul>
                       </cellcontent>
                   </td>
              </tr>
          </table>
      </html>
      

    Similarly, you can create three other external files: Topheader.htm, Leftnav.htm, and Footer.htm. Apart from the restriction on attribute values noted earlier, the rules that apply to internal general entities also apply to external parsed general entities. Only the declaration and the source of the replacement text are different.

    External unparsed general entities are similar to other entities, except that the XML parser will not try to parse the information within them. Essentially, the data within an external unparsed general entity is ignored by the XML parser and passed on to the application that is using the document in its original format. This is exactly what we want done for non-XML files such as images.

    Notations

    External unparsed general entities contain one additional component: notations. Notations are used by the application to identify the data in the external unparsed general entity or to identify what application needs to be used to interpret the data. For example, if the data contained in the entity is a GIF image file, the following notation would identify it:


      <!NOTATION GIF89a SYSTEM
          "-//Compuserve//NOTATION Graphic Interchange Format 89a//EN">
      

    It would be up to the application to determine how to interpret this information and present the image properly.

    Notations can be declared in two different ways. The first method is used when the notation is not public and is located at some URI. It uses the syntax shown here:


    <!NOTATION notation_name SYSTEM resource_URI>

    The second method is used for a notation that has been registered as public and given a unique ID. It uses the following syntax:


    <!NOTATION notation_name PUBLIC public_ID resource_URI>

    Examples of the two types of declarations are shown here:


      <!NOTATION GIF89a SYSTEM
          "-//Compuserve//NOTATION Graphic Interchange Format 89a//EN">
      <!NOTATION GIF SYSTEM "GIF">
      <!NOTATION BMP SYSTEM "MSPAINT.EXE">
      <!NOTATION GIF89a PUBLIC "-//Compuserve//NOTATION Graphic
          Interchange Format 89a//EN" "ps4prp.exe">
      

    Declaring an external unparsed general entity

    Once you have created a notation, you can use the notation to declare external unparsed general entities. The format for these declarations is similar to the declarations for external parsed general entities, except that in this case a notation appears at the end of the declaration. The NDATA keyword is used to associate the external unparsed general entity with a particular notation. The syntax for the declaration is shown here:


    <!ENTITY entity_name SYSTEM URI NDATA  notation_name>

    Using our second notation definition, you could create the following declaration:


      <!ENTITY image.topnav SYSTEM "topnav.gif" NDATA GIF>
      

    Now that you have defined the notation and then defined an external unparsed general entity that uses this notation, you will want to use this external unparsed general entity in your XML document body. For example, you might want to insert this GIF image at the top of a Web page.

    Using external unparsed general entities

    When you are using an external unparsed general entity as a value for an attribute in your XML document, you will want the XML parser to ignore the data returned by the entity. To accomplish this, you must tell the XML parser that you are referencing an external unparsed general entity in the declaration of the attribute. The ENTITY or ENTITIES keyword will be used in the attribute declaration to mark an attribute as containing an external unparsed general entity reference, as shown here:


      <!--Part of the DTD-->
      <!NOTATION gif SYSTEM "gif">
      <!NOTATION jpg SYSTEM "jpg">
      <!NOTATION bmp SYSTEM "bmp">
      <!ENTITY image.topimage SYSTEM "topimage.gif" NDATA gif>
      <!ENTITY image.topnav1 SYSTEM "topnav1.gif" NDATA gif>
      <!ENTITY image.topnav2 SYSTEM "topnav2.gif" NDATA gif>
      <!ENTITY Welcome SYSTEM "Welcome.jpg" NDATA jpg>
      <!ELEMENT topimages EMPTY>
      <!ATTLIST topimages
          topimage ENTITY #FIXED "image.topimage"
          topnav ENTITIES "image.topnav1 image.topnav2">
      <!ELEMENT img EMPTY>
      <!ATTLIST img  %attrs;
                    align  CDATA  #IMPLIED
                    border CDATA  #IMPLIED
                    width  CDATA  #IMPLIED
                    height CDATA  #IMPLIED
                    hspace CDATA  #IMPLIED
                    vspace CDATA  #IMPLIED
                    src    ENTITY #REQUIRED 
                    type   NOTATION (gif|jpg|bmp) "jpg">
      <!--XML Body-->
      <topimages topimage="image.topimage" topnav="image.topnav1
          image.topnav2"></topimages>
      <img src = "Welcome"></img>
      

    This code declares two elements: topimages and img. The topimages element has two attributes associated with it: topimage and topnav. The img element is the one used in the DTD example discussed in the "Rewriting the sample DTD using parameter entities" section, except that here it contains the type attribute. The type attribute is a notation attribute, as it contains the keyword NOTATION. The items listed in the enumerated type must be defined in the DTD as notations, as is done in the above declaration.

    External parameter entities are just like internal parameter entities except that they retrieve the replacement text from external files.

    Declaring an external parameter entity

    The syntax for declaring an external parameter entity is similar to the declarations for internal parameter entities, except that the SYSTEM keyword or the PUBLIC keyword is used. The syntax for the declaration is shown here:


    <!ENTITY % name SYSTEM  "URI">

    To use the external parameter entity, you could place all of the parameter entities that were defined in the example DTD in a file named Parameter.dtd. To do so, you would add the following code to the DTD:


      <!ENTITY % parameterentities SYSTEM "Parameter.dtd">
      %parameterentities;
      <!--================ Document Structure=========================-->
      <!ELEMENT html  (head , body)>
      <!ATTLIST html  %i18n;
                      xmlns CDATA #FIXED 'http://www.w3.org/1999/xhtml'>
    
      <!--Rest of DTD here-->
      

    First we declare the parameterentities entity, which links to the external Parameter.dtd, and then we reference parameterentities to insert the contents of that file into the DTD. The same external parameter entity could be used to create several DTDs. External parameter entities are useful when parts of your DTD will be used by several other DTDs.
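
    As a rough sketch of how the shared file might be laid out, the listing below shows a few declarations that Parameter.dtd could contain, followed by a second DTD that reuses them. The replacement text for heading, lists, and blocktext is taken from the expanded Block declaration shown earlier; the second DTD and its helptopic element are hypothetical and are included only to illustrate the reuse:

      <!--Excerpt from Parameter.dtd-->
      <!ENTITY % heading "h1 | h2 | h3 | h4 | h5 | h6">
      <!ENTITY % lists "ul | ol">
      <!ENTITY % blocktext "hr | blockquote">

      <!--A second, hypothetical DTD that shares the same entities-->
      <!ENTITY % parameterentities SYSTEM "Parameter.dtd">
      %parameterentities;
      <!ELEMENT helptopic (%heading; | %lists; | %blocktext;)*>
      <!--Declarations for h1, ul, and the other elements go here-->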

    It is important to understand exactly how the DTD will be processed, especially if it includes both internal and external entities. The order in which the different types of entities are processed determines which declarations take effect, and therefore what the DTD and the XML document contain once the entities have been substituted. Before we examine processing order, let's look at the rules for processing a document:

    • If a document contains more than one entity declaration using the same entity name, the first entity declaration that is encountered will be used. All subsequent declarations with the same name will be ignored.
    • If a document contains more than one declaration for the same attribute of the same element, the first declaration that is encountered will be used. All subsequent declarations for that attribute will be ignored.
    • If more than one element declaration has the same element name, the document is invalid, and a validating processor will report an error.

    Now that you know the rules for processing an XML document, let's look at the processing order that the processor follows:

    1. The internal subset of the DTD is read before everything else. This guarantees that any attribute or entity definitions listed in the internal subset will override any definitions in an externally referenced DTD. Developers can still use external DTDs, but they can override the declarations in the external DTDs.
    2. If the internal DTD contains external parameter entities, these entities will be replaced when the processor reaches them in the DTD. The internal DTD will be expanded to include the replacement text, and the replacement text will be processed. Once this is done, the rest of the internal DTD is processed. If the internal DTD contains additional external parameter entities, they will be replaced in the same manner when the processor reaches them. All general entities will be ignored at this step.
    3. Once the entire internal DTD is processed, any external DTDs referenced in the DOCTYPE declaration using the PUBLIC or SYSTEM keyword will be processed.
    4. Once the internal and external DTDs are processed and validated, the processor will replace general entities in the document body when they are referenced in the document.

    Thus, you can create general external DTD documents containing declarations that apply to a large set of applications. You can override entities and attributes in these external DTDs in an internal DTD because the internal DTD will be processed first. Notice that the DTD is validated before the general entities are replaced. This order explains why general entities can never be used in any part of your declarations that are being validated, such as the enumerated values for an attribute.
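
    The override can be sketched with two files. The company entity and its replacement text are hypothetical and do not appear in the sample DTD. The external DTD supplies a default value:

      <!--In StandXHTML.dtd-->
      <!ENTITY company "Northwind Traders">

    The document redeclares the entity in its internal subset. Because the internal subset is read first, this declaration is the one that is used, and the later declaration in the external DTD is ignored:

      <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
      <!DOCTYPE html SYSTEM "StandXHTML.dtd" [
      <!ENTITY company "Northwind Traders Help Desk">
      ]>
      <html>
          <head>
              <!--The reference is replaced with the internal value-->
              <title>&company;</title>
          </head>
          <!--Rest of document here-->
      </html>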


    The XML specification defines conditional sections of your DTD so that you can decide to include or exclude portions of your DTD. The conditional sections can occur only in the external subset of the DTD and in those external entities referenced from the internal subset. The syntax is shown here:


      <![ INCLUDE [
          <!--The declarations you want to include-->
      
      ]]>
      <![ IGNORE [
          <!--The declarations you want to ignore-->
          
      ]]>
      

    If you combine these conditional sections with parameter entities, you will have a way to include and exclude blocks of text by changing the values of the parameter entities. For example, if you wanted to include declarations that could be used for debugging your application, you could add the following declaration:


      <!ENTITY % debug "INCLUDE">
      <![ %debug; [
          <!--Debugging code here -->
      ]]>
      

    You could turn debugging off by changing the entity declaration as follows:


      <!ENTITY % debug "IGNORE">
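
    Because the first declaration of an entity wins and the internal subset is read first, the switch can also be set from an individual document without editing the shared DTD. The following is a sketch only; the trace element is hypothetical. The external DTD declares IGNORE as the default:

      <!--In the external DTD-->
      <!ENTITY % debug "IGNORE">
      <![ %debug; [
          <!ELEMENT trace (#PCDATA)>
      ]]>

    A document that needs the debugging declarations declares the entity as INCLUDE in its internal subset:

      <!DOCTYPE html SYSTEM "StandXHTML.dtd" [
      <!ENTITY % debug "INCLUDE">
      ]>

    The conditional section itself must stay in the external DTD, because conditional sections are not allowed in the internal subset.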
      

    Entities provide a useful shorthand notation that allows you to assign a string of characters (or, for unparsed entities, non-XML data such as an image) to a particular name. This name can then be inserted into either the DTD (parameter entities) or the XML document body (general entities).

    Using XML tools such as XML Authority or Near and Far, you can build DTDs from these entities. The tools will also help you view the structure of complicated documents. Entities used carefully can make DTDs more readable; too many entities can make your DTD readable only by using one of the XML tools.

    External entities enable you to include external files in your document. These files can be reusable declarations for your DTDs, reusable XML code for your XML document, and non-text information in your document body. By carefully planning the structure of your documents, how you are going to build them, and what information they will contain, you can create a set of reusable documents using entities and external DTDs.

    In Chapter 6, we will discuss four additional XML specifications: XLink, XPath, XPointer, and Namespaces. The first three specifications are used for placing links in your documents. Namespaces are used to prevent names from clashing when a DTD is imported.

     

