  • XML Schemas

    Author: 2007-08-27 16:42:09 From:

    Up to now, we've been looking almost exclusively at document type definitions (DTDs) as a way of defining rules for an XML document. Although this is an excellent method, there are a few problems with DTDs. The most obvious problem is the fact that DTDs are written in their own special text format, not in XML. It would make a great deal of sense to create a document written in XML to define the rules of an XML document.

    In the XHTML sample we discussed in Chapter 5, data types were not that important: all the document content was of the string data type. Often, however, you will have documents that contain several different data types, and you will want to be able to validate these data types. Unfortunately, DTDs are not designed for validating data types or checking ranges of values. DTDs also do not understand namespaces.

    To solve these problems, schemas were invented. Unlike DTDs, which have their own peculiar syntax, XML schemas are written in XML. In addition to providing the information that DTDs offer, schemas allow you to specify data types, use namespaces, and define ranges of values for attributes and elements. In this chapter, you'll learn about XML schemas and how to use them in your XML documents. We'll look at the XML schema data types and their categories and then explore how to create simple and complex data types. Finally, we'll examine namespaces used in XML schemas.

    For the most part, XML documents fall into two categories: document-oriented and data-oriented. The document-oriented XML document contains text sections mixed with field data, whereas the data-oriented XML document contains only field data. The XHTML document we created in Chapter 5 is an example of a document-oriented XML document. Another example of a document-oriented XML document is a message such as the one shown here:


      <message priority="high"
          date="2000-01-11">
          <from>Jake Sturm</from>
          <to>Gwen Sturm</to>
          <subject>DNA Course</subject>
          <body>
              The new DNA course that we are offering is now complete. 
              It will provide a complete overview discussion of 
              designing and building DNA systems, including DNS, DNA,
              COM, and COM+. The course is also listed on the Web site, at 
              http://ies.gti.net.
          </body>
      </message>
      

    This message has a large text body, but it also contains attributes, in this case date and priority. The date attribute has a date data type, and the priority attribute has an enumerated data type. It will be useful to be able to validate that these attributes are correctly formatted for these two data types. Schemas will allow you to do this.
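
    As a rough illustration of the kind of check a schema makes possible, the following Python sketch validates the two attributes by hand. It is not part of the schema mechanism itself; the set of allowed priority values is an assumption (the chapter only shows "high"), and the message body is shortened.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical enumeration; the chapter only shows the value "high".
ALLOWED_PRIORITIES = {"low", "normal", "high"}

MESSAGE = """<message priority="high" date="2000-01-11">
    <from>Jake Sturm</from>
    <to>Gwen Sturm</to>
    <subject>DNA Course</subject>
    <body>The new DNA course that we are offering is now complete.</body>
</message>"""


def check_message(xml_text):
    """Return a list of problems found in the date and priority attributes."""
    root = ET.fromstring(xml_text)
    problems = []
    try:
        # Raises ValueError when the text is not a valid calendar date.
        date.fromisoformat(root.get("date", ""))
    except ValueError:
        problems.append("date is not a valid CCYY-MM-DD value")
    if root.get("priority") not in ALLOWED_PRIORITIES:
        problems.append("priority is not one of the enumerated values")
    return problems


print(check_message(MESSAGE))
```

    A schema lets you state these rules declaratively, so that a validating processor performs the equivalent checks for you.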

    A data-oriented document looks like this:


      <bill>
          <OrderDate>2001-02-11</OrderDate>
          <ShipDate>2001-02-12</ShipDate>
          <BillingAddress>
              <name>John Doe</name>
              <street>123 Main St.</street>
              <city>Anytown</city>
              <state>NY</state>
              <zip>12345-0000</zip>
          </BillingAddress>
          <voice>555-1234</voice>
          <fax>555-5678</fax>
      </bill>
      

    This entire document contains data fields that will need to be validated. Validating data fields is an essential aspect of this type of XML document. We'll look at an example schema for a data-oriented document in the section "A Schema for a Data-Oriented XML Document" later in this chapter.

    Up to this point, we've been looking only at document-oriented XML documents that contain only one data type (the string data type) because DTDs work best with document-oriented XML documents that contain only string data types. Because schemas allow you to validate datatype information, it's time now to take a look at data types as they are defined in the schema specification.

    The term data type is defined in the second schema standard, which can be found at http://www.w3.org/TR/xmlschema-2/. A data type represents a type of data, such as a string, an integer, and so on. The second schema standard defines simple data types in detail, and that's what we'll look at in this section.

    In a schema, a data type has three parts: a value space, a lexical space, and a facet. The value space is the range of acceptable values for a data type. The lexical space is the set of valid literals that represent the ways in which a data type can be displayed. For example, 100 and 1.0E2 are two different literals, but both denote the same floating point value. A facet is some characteristic of the data type. A data type can have many facets, each defining one or more characteristics. Facets specify how one data type is different from other data types. Facets define the value space for the data type.
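
    The difference between the lexical space and the value space can be seen in a couple of lines of Python. This is only an analogy, using Python's float as a stand-in for the schema float data type:

```python
# Two different literals (lexical space) that denote one value (value space).
literals = ["100", "1.0E2"]
values = {float(text) for text in literals}

print(len(set(literals)))  # number of distinct literals
print(len(values))         # number of distinct values
```

    The two literals are different strings, yet the set of parsed values contains a single member.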

    There are two kinds of facets: fundamental and constraining. Fundamental facets define the data type, and constraining facets place constraints on the data type. Examples of fundamental facets are rules specifying an order for the elements, a maximum or minimum allowable value, the finite or infinite nature of the data type, whether the instances of the data type are exact or approximate, and whether the data type is numeric. Constraining facets can include the limit on the length of a data type (number of characters for a string or number of bits for a binary data type), minimum and maximum lengths, enumerations, and patterns.

    We can categorize the data types along several dimensions. First, data types can be atomic or aggregate. An atomic data type cannot be divided. An integer value or a date that is represented as a single character string is an atomic data type. If a date is presented as day, month, and year values, the date is an aggregate data type.

    Data types can also be distinguished as primitive or generated. Primitive data types are not derived from any other data type; they are predefined. Generated data types are built from existing data types, called basetypes. Basetypes can be primitive or generated data types. Generated types, which will be discussed later in the chapter, can be either simple or complex data types.

    Primitive data types include the following: string, Boolean, float, decimal, double, timeDuration, recurringDuration, binary, and uri. In addition, there is the timeInstant data type, which is derived from the recurringDuration data type. Among these primitive data types, two are specific to XML schemas: timeDuration and recurringDuration. The timeInstant data type is also specific to XML. Let's have a look at them here.

    The timeInstant data type represents a combination of date and time values that represent a specific instance of time. The pattern is shown here:


      CCYY-MM-DDThh:mm:ss.sss
      

    CC represents the century, YY is the year, MM is the month, and DD is the day. The representation can be preceded by an optional leading sign to indicate a negative number. If the sign is omitted, a plus sign (+) is assumed. The letter T is the date/time separator, and hh, mm, and ss.sss represent the hour, minute, and second values. Additional digits can be used to increase the precision of fractional seconds if desired. To accommodate year values greater than 9999, digits can be added to the left of this representation.

    The timeInstant representation can be immediately followed by a Z to indicate Coordinated Universal Time (UTC). Alternatively, time zone information is represented by the difference between the local time and UTC; it is specified immediately following the time and consists of a plus or minus sign (+ or -) followed by hh:mm.
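
    The timeInstant pattern can be approximated with a regular expression. The following Python sketch is an illustration only: the function name is hypothetical, and it checks the shape of the literal without verifying ranges (for example, that the month is no greater than 12).

```python
import re

# Shape of CCYY-MM-DDThh:mm:ss.sss with an optional sign and time zone.
TIME_INSTANT = re.compile(
    r"(?P<sign>[+-])?"
    r"(?P<year>\d{4,})-(?P<month>\d{2})-(?P<day>\d{2})"
    r"T(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2}(?:\.\d+)?)"
    r"(?P<zone>Z|[+-]\d{2}:\d{2})?"
)


def parse_time_instant(text):
    """Return the named parts of a timeInstant literal, or None if malformed."""
    match = TIME_INSTANT.fullmatch(text)
    return match.groupdict() if match else None


print(parse_time_instant("2000-01-11T13:20:00.000-05:00"))
print(parse_time_instant("2000-01-11"))
```

    The first call returns a dictionary of parts, including the zone offset; the second returns None because the time portion is missing.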

    The timeDuration data type represents some duration of time. The pattern for timeDuration is shown here:


      PyYmMdDThHmMsS
      

    In this pattern, the lowercase letters stand for numbers and the uppercase letters are fixed designators. The number before Y is the number of years, before the first M the number of months, and before D the number of days; T is the date/time separator; the number before H is the number of hours, before the second M the number of minutes, and before S the number of seconds. The P at the beginning indicates that this pattern represents a time period. The number of seconds can include decimal digits to arbitrary precision. An optional preceding minus sign is allowed to indicate a negative duration. If the sign is omitted, a positive duration is assumed.
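
    A minimal Python sketch of a timeDuration parser follows. It assumes that each component may be omitted, which the text above does not state, and it is deliberately lenient (a bare P is accepted). The function name is hypothetical.

```python
import re

# Shape of PnYnMnDTnHnMnS with an optional leading minus sign.
# Assumption: every component is optional.
TIME_DURATION = re.compile(
    r"(?P<sign>-)?P"
    r"(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?"
)


def parse_time_duration(text):
    """Return the components present in a timeDuration literal, or None."""
    match = TIME_DURATION.fullmatch(text)
    if match is None:
        return None
    return {name: value for name, value in match.groupdict().items()
            if value is not None}


print(parse_time_duration("P1Y2M3DT10H30M12.5S"))
print(parse_time_duration("-P120D"))
```

    Note how the T separator lets the parser tell months (before T) from minutes (after T), even though both use the letter M.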

    The recurringDuration data type represents a moment in time that recurs. The pattern for recurringDuration is the left-truncated representation for timeInstant. For example, if the CC century value is omitted from the timeInstant representation, that timeInstant recurs every hundred years. Similarly, if CCYY is omitted, the timeInstant recurs every year.

    Every two-character unit of the representation that is omitted is indicated by a single hyphen (-). For example, to indicate 1:20 P.M. on May 31 of every year for Eastern Standard Time that is 5 hours behind UTC, you would write the following code:


      --05-31T13:20:00-05:00
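
    To see what this literal means for a particular year, the Python sketch below fills in the omitted CCYY units and converts the result to UTC. The helper name is hypothetical, and the sketch handles only the case in which exactly CCYY is omitted.

```python
from datetime import datetime, timezone

RECURRING = "--05-31T13:20:00-05:00"


def occurrence_in_year(recurring, year):
    """Fill in the omitted CCYY units (two leading hyphens) with a year."""
    if not recurring.startswith("--") or recurring.startswith("---"):
        raise ValueError("expected exactly CCYY to be omitted")
    # Drop one hyphen; the remaining one separates the year from the month.
    return datetime.fromisoformat(f"{year:04d}" + recurring[1:])


local = occurrence_in_year(RECURRING, 2001)
print(local.isoformat())
print(local.astimezone(timezone.utc).isoformat())
```

    Because the local time is 5 hours behind UTC, the occurrence at 13:20 local time corresponds to 18:20 UTC.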
      

    New simple data types can be created by using simpleType elements. A simplified version of a DTD declaration required for the simpleType element is shown below. (For a complete declaration, see the schema specification at http://www.w3.org/TR/xmlschema-2/.)


      <!ENTITY % ordered ' (minInclusive | minExclusive) | (maxInclusive | 
          maxExclusive) | precision | scale '>
      <!ENTITY % unordered 'pattern | enumeration | length | maxlength |
          minlength | encoding | period'>
      <!ENTITY % facet '%ordered; | %unordered;'>
      <!ELEMENT simpleType ((annotation)?, (%facet;)*)>
      <!ATTLIST simpleType
          name     NMTOKEN        #IMPLIED
          base      CDATA         #REQUIRED
          final     CDATA           ''
          abstract (true | false) 'false'
          derivedBy (list | restriction | reproduction) 'restriction'>
      <!ELEMENT annotation (documentation)>
    
      <!ENTITY % facetAttr 'value CDATA #REQUIRED'>
      <!ENTITY % facetModel '(annotation)?'>
      <!ELEMENT maxExclusive %facetModel;>
      <!ATTLIST maxExclusive %facetAttr;>
      <!ELEMENT minExclusive %facetModel;>
      <!ATTLIST minExclusive %facetAttr;>
    
      <!ELEMENT maxInclusive %facetModel;>
      <!ATTLIST maxInclusive %facetAttr;>
      <!ELEMENT minInclusive %facetModel;>
      <!ATTLIST minInclusive %facetAttr;>
    
      <!ELEMENT precision %facetModel;>
      <!ATTLIST precision %facetAttr;>
      <!ELEMENT scale %facetModel;>
      <!ATTLIST scale %facetAttr;>
    
      <!ELEMENT length %facetModel;>
      <!ATTLIST length %facetAttr;>
      <!ELEMENT minlength %facetModel;>
      <!ATTLIST minlength %facetAttr;>
      <!ELEMENT maxlength %facetModel;>
      <!ATTLIST maxlength %facetAttr;>
    
      <!-- This one can be repeated. -->
      <!ELEMENT enumeration %facetModel;>
      <!ATTLIST enumeration %facetAttr;>
      <!ELEMENT pattern %facetModel;>
      <!ATTLIST pattern %facetAttr;>
      <!ELEMENT encoding %facetModel;>
      <!ATTLIST encoding %facetAttr;>
      <!ELEMENT period %facetModel;>
      <!ATTLIST period %facetAttr;>
      <!ELEMENT documentation ANY>
      <!ATTLIST documentation 
                source   CDATA #IMPLIED
                xml:lang CDATA #IMPLIED>
      

    As you can see, the simpleType element, which represents a simple data type, can be either ordered or unordered. An ordered type can be placed in a specific sequence. Positive integers are ordered; that is, you can start at 1 and continue to the maximum integer value. Unordered data types do not have any order, and would include data types such as a Boolean that cannot be placed in a sequence. Using the preceding DTD, you can create your own simple data types. These simple data types can then be used in your schemas to define elements and attributes.

    Unordered data types include Boolean and binary data types. All of the numeric data types are ordered. Strings are ordered, but when you are defining your own string data types, they will be defined with the unordered elements.

    For each data type, numerous possible child elements can be used to define the simpleType element. Each child element will contain an attribute with the value for the child element and an optional comment. The child elements define facets for the data types you create.

    Let's look now at how to create simple data types using ordered and unordered facets.

    Using ordered facets

    Notice that in the previous code listing, ordered facets consist of the following facets: maxExclusive, minExclusive, maxInclusive, minInclusive, precision, and scale. The value of maxExclusive is the smallest value for the data type outside the upper bound of the value space for the data type. The value of minExclusive is the largest value for the data type outside the lower bound of the value space for the data type. Thus, if you wanted to have an integer data type with a range of 100 to 1000, the value of minExclusive would be 99 and the value of maxExclusive would be 1001. The simple data type could be declared as follows:


      <simpleType name="limitedInteger" base="integer">
          <minExclusive value="99"/>
          <maxExclusive value="1001"/>
      </simpleType>
      

    The minInclusive and maxInclusive facets work in the same way as minExclusive and maxExclusive, except that the minInclusive value is the lower bound of the value space for a data type, and the maxInclusive is the upper bound of the value space for a data type. Our simple data type could be rewritten as follows:


      <simpleType name="limitedInteger" base="integer">
          <minInclusive value="100"/>
          <maxInclusive value="1000"/>
      </simpleType>
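
    For integers, the exclusive bounds 99 and 1001 and the inclusive bounds 100 and 1000 describe the same value space. The following Python sketch (function names are illustrative) checks this over a sample range:

```python
def in_exclusive_range(value, min_exclusive=99, max_exclusive=1001):
    """True when the value lies strictly between the two bounds."""
    return min_exclusive < value < max_exclusive


def in_inclusive_range(value, min_inclusive=100, max_inclusive=1000):
    """True when the value lies between the bounds, bounds included."""
    return min_inclusive <= value <= max_inclusive


# Compare the two rules for every integer from 0 through 1199.
same = all(in_exclusive_range(n) == in_inclusive_range(n)
           for n in range(0, 1200))
print(same)
```

    The equivalence holds only because the base type is integer; for a decimal base type, 99.5 would satisfy the exclusive bounds but not the inclusive ones.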
      

    Precision is the number of digits that will be used to represent a number. The scale, which must always be less than the precision, represents the number of digits that will appear to the right of the decimal place. For example, a data type that does not go above but includes 1,000,000 and that has two digits to the right of the decimal place (1,000,000.00) has a precision of 9 (ignore commas and decimals) and a scale of 2. The declaration would look as follows:


      <simpleType name="TotalSales" base="decimal">
         <minInclusive value="0"/>
         <maxInclusive value="1000000"/>
         <precision value="9"/>
         <scale value="2"/>
      </simpleType>
      

    If you had left out the maxInclusive facet, numbers up to 9,999,999.99 would have been valid. If you had needed a value less than 1,000,000, the following declaration would have been sufficient:


      <simpleType name="TotalSales" base="decimal">
         <precision value="8"/>
         <scale value="2"/>
      </simpleType>
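
    The effect of precision and scale on the TotalSales type can be sketched in Python. The function below is hypothetical and follows this chapter's reading, in which the scale digits are reserved for the fraction, so the integer part may use at most precision minus scale digits.

```python
from decimal import Decimal


def fits(literal, precision, scale):
    """Check a decimal literal against precision (total digits) and scale."""
    sign, digits, exponent = Decimal(literal).as_tuple()
    fraction_digits = max(0, -exponent)
    integer_digits = max(0, len(digits) + exponent)
    # Assumption: scale digits are reserved for the fractional part.
    return fraction_digits <= scale and integer_digits <= precision - scale


print(fits("1000000.00", 9, 2))
print(fits("9999999.99", 9, 2))
print(fits("10000000.00", 9, 2))
print(fits("999999.99", 8, 2))
print(fits("1000000.00", 8, 2))
```

    With a precision of 9, seven integer digits are available, so 1,000,000.00 fits; with a precision of 8, only six are available, so the largest value is 999,999.99.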
      

    Now that you have learned how to use ordered facets to create simple data types, let's look at how to use unordered facets to create simple data types.

    Using unordered facets

    In the previous code, you can see that unordered facets are made up of the following facets: period, length, maxLength, minLength, pattern, enumeration, and encoding.

    For time data types, you can use the period facet to define the frequency of recurrence of the data type. The value of the period facet is itself expressed as a timeDuration. Recurring dates can also be listed explicitly. For example, if you wanted to create a special holiday data type that includes recognized U.S. holidays, you could enumerate them with the following declaration:


      <simpleType name="holidays" base="date">
         <annotation>
            <documentation>Some U.S. holidays</documentation>
         </annotation>
         <enumeration value='--01-01'>
            <annotation>
               <documentation>New Year's Day</documentation>
            </annotation>
         </enumeration>
         <enumeration value='--07-04'>
            <annotation>
               <documentation>Fourth of July</documentation>
            </annotation>
         </enumeration>
         <enumeration value='--12-25'>
            <annotation>
               <documentation>Christmas</documentation>
            </annotation>
         </enumeration>
      </simpleType>
      

    When you use the length facet, the data type must be a certain fixed length. Using length, you can create fixed-length strings. The maxLength facet represents the maximum length a data type can have. The minLength facet represents the smallest length a data type can have. Using minLength and maxLength, you can define a variable-length string that can be as small as minLength and as large as maxLength.
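
    The three length facets amount to simple comparisons on the length of the value. In the Python sketch below, the function name and the specific limits (a two-character state code and a ZIP code of 5 to 10 characters) are assumptions chosen for illustration:

```python
def check_length(text, length=None, min_length=None, max_length=None):
    """Apply whichever of the three length facets are supplied."""
    size = len(text)
    if length is not None and size != length:
        return False
    if min_length is not None and size < min_length:
        return False
    if max_length is not None and size > max_length:
        return False
    return True


print(check_length("NY", length=2))                              # fixed length
print(check_length("12345-0000", min_length=5, max_length=10))   # variable length
print(check_length("1234", min_length=5, max_length=10))         # too short
```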

    The pattern facet is a constraint on the value space of the data type achieved by constraining the lexical space (the valid literals). The enumeration facet limits the value space to a set of values. The encoding facet is used for binary types, which can be encoded as either hex or base64. In addition to containing a facet, simple data types also contain a set of attributes that can be used to define the data type. Let's now take a look at these attributes.

    Attributes for simple data types

    Notice in the simpleType declaration shown earlier that the simpleType element has the following attributes: name, base, abstract, final, and derivedBy. The name attribute is the name of the data type, which can be either a built-in type or a user-defined type. The base attribute is the basetype that is being used to define the new type. The final attribute is discussed in detail later in this chapter. The abstract attribute of a data type is beyond the scope of this book. For more information about this attribute, refer to the schema specification.

    The derivedBy attribute can be set to list, restriction, or reproduction. The list value allows you to create a data type that consists of a list of items separated by white space. For example, you can use the following declaration to create a list data type:


      <simpleType name='StringList' base='string' derivedBy='list'/>
      

    This data type can then be used in an XML document to create a new list type, as shown here:


      <myListElement xsi:type='StringList'>
         This is not list item 1.
         This is not list item 2.
         This is not list item 3.
      </myListElement>
      

    By using the xsi:type attribute, you overrode the default declaration of the myListElement and made it a StringList data type. Since a StringList data type contains a list of strings, you can now use a list of strings as content for the myListElement. The xsi namespace will be discussed in more detail later in the chapter.
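
    Because the items of a list type are separated by white space, each word in the content is a separate item, which is why the sample sentences say they are not list items. A short Python sketch, using split() as a stand-in for the way a processor would tokenize the list, makes this visible:

```python
# Content of myListElement from the example above.
CONTENT = """
   This is not list item 1.
   This is not list item 2.
   This is not list item 3.
"""

# Splitting on white space yields one item per word, not one per line.
items = CONTENT.split()
print(len(items))
print(items[:6])
```

    The three lines produce 18 items rather than 3.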

    Up to this point, we have been discussing the XML schema 2 specification, which covers simple data types. The XML schema 1 specification covers all the general issues involving schemas and also covers complex data types. Let's now take a look at the complex data types described in the first schema specification.


    A data type can either be simple or complex. Simple data types include the data types discussed in the previous section. Complex data types contain the child elements that belong to an element, as well as all the attributes that are associated with the element. If you visit http://www.w3.org/TR/xmlschema-1/, you'll find the first schema specification. Combining this specification with that for simple data types, you will have the complete schema specification. Our discussion of complex data types in this section will include an explanation of all of the elements of a schema, a sample DTD for a schema, and numerous examples showing how to create a schema and complex data types.

    The sample schema used in this section, XHTMLschema.xsd, was generated from the XHTML DTD we created in Chapter 5. You can open XHTMLschema.xsd in XML Authority version 1.2 or higher.

    Because schemas are written as well-formed XML documents, you can also view the schema in any other XML tools, such as XML Spy or Microsoft XML Notepad. For example, Figure 7-1 shows the schema as it would appear in XML Spy.

    Figure 7-1. The schema in XML Spy.

    As you can see in Figure 7-1, a schema has a well-defined structure. This structure includes a root element named schema, with one or more element child elements. These element elements can have complexType child elements; the complexType elements can in turn have annotation, group, attributeGroup, and attribute child elements. Clearly, this schema is a well-formed XML document.

    In Figure 7-1, you can also see that the essential components of a schema are element, complexType, and simpleType elements. Essentially, a schema is all about associating data types with element elements. A portion of the source code from XHTMLschema.xsd is shown below.


      <schema targetNamespace = "XHTMLschema.xsd"
          xmlns = "http://www.w3.org/xmlschema">
          <element name = "html">
              <complexType content = "elementOnly">
                  <annotation>
                      <documentation>
                          a Uniform Resource Identifier, see [RFC2396]
                      </documentation>
                  </annotation>
                  <group>
                      <sequence>
                          <element ref = "head"/>
                          <element ref = "body"/>
                      </sequence>
                  </group>
                  <attributeGroup ref = "i18n"/>
              </complexType>
          </element>
          <element name = "head">
              <complexType content = "elementOnly">
                  <group>
                      <sequence>
                          <element ref = "title"/>
                          <element ref = "base" minOccurs = "0"
                              maxOccurs = "1"/>
                      </sequence>
                  </group>
                  <attributeGroup ref = "i18n"/>
                  <attribute name = "profile" type = "string"/>
              </complexType>
          </element>
          <element name = "title">
              <complexType content = "textOnly">
                  <attributeGroup ref = "i18n"/>
              </complexType>
          </element>
          <element name = "base">
              <complexType content = "empty">
                  <attribute name = "target" use = "required"
                      type = "string"/>
              </complexType>
          </element>
          <element name = "atop">
              <complexType content = "elementOnly">
                  <sequence>
                      <element ref = "p"/>
                      <element ref = "a"/>
                  </sequence>
              </complexType>
          </element>
          <element name = "body">
              <complexType content = "elementOnly">
                  <group>
                      <sequence>
                          <element ref = "basefont" minOccurs = "0"
                              maxOccurs = "1"/>
                          <element ref = "table"/>
                      </sequence>
                  </group>
                 <attribute name = "alink" type = "string"/>
                 <attribute name = "text" type = "string"/>
                 <attribute name = "bgcolor" type = "string"/>
                 <attribute name = "link" type = "string"/>
                 <attribute name = "vlink" type = "string"/>
              </complexType>
          </element>
          
          <element name = "h1">
              <complexType content = "elementOnly">
                  <sequence>
                      <group ref = "Inline" />
                  </sequence>
                  <attributeGroup ref = "attrs"/>
                  <attribute name = "align" type = "string"/>
              </complexType>
          </element>
          
      </schema>
      

     

    This particular version of the schema does not use anything like the entities in a DTD; everything is listed out here. Schemas do provide components that are similar to parameter entities, which will be discussed later in this chapter. Comments located within the schema element are contained within documentation elements. The schema element is the root for the document. The schema element and other elements and attributes will be discussed in detail in the next section.
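
    Because the schema is itself well-formed XML, ordinary XML tools can read it. The following Python sketch parses a two-element excerpt of the listing and reports the content attribute of each embedded complexType. The function name is hypothetical, and the namespace URI is copied as written in the listing.

```python
import xml.etree.ElementTree as ET

NS = "http://www.w3.org/xmlschema"  # namespace URI as written in the listing

# A shortened excerpt of XHTMLschema.xsd.
SCHEMA = """<schema targetNamespace="XHTMLschema.xsd"
    xmlns="http://www.w3.org/xmlschema">
    <element name="title">
        <complexType content="textOnly">
            <attributeGroup ref="i18n"/>
        </complexType>
    </element>
    <element name="base">
        <complexType content="empty">
            <attribute name="target" use="required" type="string"/>
        </complexType>
    </element>
</schema>"""


def content_models(schema_text):
    """Map each top-level element name to its complexType content value."""
    root = ET.fromstring(schema_text)
    result = {}
    for element in root.findall(f"{{{NS}}}element"):
        complex_type = element.find(f"{{{NS}}}complexType")
        if complex_type is not None:
            result[element.get("name")] = complex_type.get("content")
    return result


print(content_models(SCHEMA))
```

    This kind of processing is not possible with a DTD, which would need a dedicated parser for its own syntax.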

    The schema specification provides a fairly complex DTD that can be used to define every possible schema. This DTD is designed to work with a wide range of possible schemas and to cover every possible condition. Here we'll work with a simplified DTD that presents a subset of the schema specification DTD. Any schema that conforms to the simplified DTD will also conform to the schema specification DTD.

    A simplified DTD for schemas is shown below. (For the full DTD, visit http://www.w3.org/TR/xmlschema-1 to see the schema specification.)


      <!ENTITY % xs-datatypes PUBLIC 'datatypes' 'Datatypes.dtd'>
         %xs-datatypes;
      <!ELEMENT schema  ((include | import | annotation)*,
                         (element | simpleType | complexType | 
                          attributeGroup | group | notation)*)>
      <!ATTLIST schema  targetNamespace CDATA  #IMPLIED
                        version         CDATA  #IMPLIED
                        xmlns           CDATA  #REQUIRED
                        xmlns:dt        CDATA  #REQUIRED >
      <!ELEMENT element  ((annotation)?, (complexType | simpleType)?,
                          (unique | key | keyref)*)>
      <!ATTLIST element  type      CDATA  #IMPLIED
                         name      CDATA  #IMPLIED
                         ref       CDATA  #IMPLIED
                         minOccurs  (1 | 0 ) #IMPLIED
                         maxOccurs CDATA  #IMPLIED
                         id        ID     #IMPLIED
                         nullable (true | false ) 'false'
                         default CDATA #IMPLIED
                         fixed   CDATA #IMPLIED >
      <!ELEMENT complexType  ((annotation)?, ((%ordered; | %unordered;)* |
          (element | all | choice | sequence | group | any)*), 
          (attribute | attributeGroup)*, (anyAttribute)?)>
      <!ATTLIST complexType  content  
                (mixed | empty | textOnly | elementOnly ) #REQUIRED
                name CDATA #REQUIRED
                derivedBy (restriction | extension | reproduction) #IMPLIED
                base CDATA #IMPLIED
                id    ID   #IMPLIED 
                final
                block>
      <!ELEMENT group ((annotation)?, (all | choice | sequence)*)>
      <!ATTLIST group
                  minOccurs   CDATA                '1'
                  maxOccurs   CDATA                #IMPLIED
                  order       (choice | seq | all) 'seq'
                  name        CDATA                #IMPLIED
                  ref         CDATA                #IMPLIED
                  id          ID                   #IMPLIED> 
      <!ELEMENT all ((annotation)?, (element | group | any | 
                     choice | sequence)*)>
      <!ATTLIST all minOccurs CDATA #FIXED '1'
                    maxOccurs CDATA #FIXED '1'
                    id         ID   #IMPLIED>
      <!ELEMENT choice ((annotation)?, (element | group | any | choice | 
                         sequence)*)>
      <!ATTLIST choice minOccurs CDATA '1'
                    maxOccurs CDATA #IMPLIED
                    id         ID   #IMPLIED>
      <!ELEMENT sequence ((annotation)?, (element | group | any |
                           choice | sequence)*)>
      <!ATTLIST sequence minOccurs CDATA '1'
                    maxOccurs CDATA #IMPLIED
                    id         ID   #IMPLIED>
      <!ELEMENT attribute  ((annotation)?, (simpleType)? )>
      <!ATTLIST attribute  type       CDATA  #IMPLIED
                           default    CDATA  #IMPLIED
                           fixed      CDATA  #IMPLIED
                           name       CDATA  #REQUIRED
                           minOccurs  (0|1)    '0' 
                    maxOccurs  (0|1)     '1' >
      <!ELEMENT attributeGroup ((annotation)?, 
           (attribute | attributeGroup)*, 
           (anyAttribute)?)>
      <!ELEMENT anyAttribute EMPTY>
      <!ATTLIST anyAttribute
                namespace    CDATA   '##any'>
      <!ELEMENT unique ((annotation)?, selector, (field)+)>
      <!ATTLIST unique name     CDATA       #REQUIRED
                         id       ID        #IMPLIED
                         uniqueAttrs>
      <!ELEMENT key    ((annotation)?, selector, (field)+)>
      <!ATTLIST key     name     CDATA       #REQUIRED
                         id       ID         #IMPLIED
                         keyAttrs>
    
      <!ELEMENT keyref ((annotation)?, selector, (field)+)>
      <!ATTLIST keyref  name     CDATA       #REQUIRED
                         id       ID         #IMPLIED
                         refer   CDATA       #REQUIRED>
      <!ELEMENT any EMPTY>
      <!ATTLIST any
                  namespace       CDATA                  '##any'
                  processContents (skip|lax|strict)      'strict'
                  minOccurs       CDATA                  '1'
                  maxOccurs       CDATA                  #IMPLIED>
      <!ELEMENT selector (#PCDATA)>
      <!ELEMENT field (#PCDATA)>
      <!ELEMENT include EMPTY>
      <!ATTLIST include schemaLocation CDATA #REQUIRED>
      <!ELEMENT import EMPTY>
      <!ATTLIST import namespace      CDATA #REQUIRED
                       schemaLocation CDATA #IMPLIED>
      

    This DTD includes all the essential elements of a schema and also includes the data types' DTD. All the schema elements that will be defined in this chapter are listed. Notice that the elements you saw in XML Spy are now much more visible. The DTD uses a set of elements and attributes to define the structure of a schema document. The principal elements of a schema are simpleType, datatype, enumeration, schema, annotation, complexType, element, attribute, attributeGroup, and group. We've already looked at the first three elements; we'll examine the remaining elements next.

    The schema element corresponds to the root element defined in a DTD. In a schema, all element elements are child elements of the schema root element. We will discuss the attributes of the schema element in the section on namespaces in this chapter.

     

    NOTE


    Technically speaking, the DTD for a schema in the specification does not require that the schema element be the root element. The usual definition of a schema does have a schema element as the root, however.

    The annotation element is used to create comments within the complexType element. Comments are contained within one of two possible child elements of the annotation element: appinfo and documentation. The documentation element is used for human-readable comments. The appinfo elements are used for application-readable comments, as shown here:


      <annotation>
          <appinfo>
              The machine-readable comment goes here. 
          </appinfo>
      </annotation>
      

    Notice that the comment is the content of a child element of the annotation element, which means that it is not enclosed in the usual comment symbols (<!-- ... -->). When the annotation element is an allowable child element for an element, it will always be the first child element.
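
    Since annotations are ordinary element content, an application can read them like any other part of the document, which is not possible with XML comments that a parser may discard. The Python sketch below is illustrative; the function name and the documentation text are assumptions.

```python
import xml.etree.ElementTree as ET

ANNOTATION = """<annotation>
    <appinfo>The machine-readable comment goes here.</appinfo>
    <documentation>The human-readable comment goes here.</documentation>
</annotation>"""


def read_annotation(xml_text):
    """Return the text of each child of an annotation element, by tag name."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


print(read_annotation(ANNOTATION))
```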

    You can think of the complexType element as equivalent to a combination of the attributes and the child element list enclosed in parentheses in the element element declaration used in a DTD. Essentially, it defines the child elements and attributes for an element element. The complexType element will define the element elements, attributes, or a combination that will be associated with an element element that has attributes or child elements. The simplified DTD in "A DTD for Schemas" declared the complexType element as follows:


      <!ELEMENT complexType  ((annotation)?, ((%ordered; | %unordered;)* |
          (element | all | choice | sequence | group | any)*), 
          (attribute | attributeGroup)*, (anyAttribute)?)>
      <!ATTLIST complexType  content  
                (mixed | empty | textOnly | elementOnly)  #REQUIRED
                name CDATA #REQUIRED
                derivedBy "(restriction|extension|reproduction)"> #IMPLIED
                base CDATA #IMPLIED
                id    ID   #IMPLIED >
      

    The complexType element can contain three types of content, in the following order: comment, element, and attribute. The comment is located in the annotation element. Element information is usually defined using element or group elements. You can also use choice, sequence, any, or all elements to define the element content within a complexType, as described later in this section. Attributes can be defined using the attribute or attributeGroup elements.

    The schema in "A DTD for Schemas" uses what is called an embedded complexType declaration; that is, the declaration is embedded in the element declaration. The following fragment shows the complexType element embedded within the element element:


      <element name = "title">
          <complexType content = "textOnly">
             <attributeGroup ref = "i18n"/>
          </complexType>
      </element>
      

    The complexType declarations have a scope, specifying where the data type can be seen in the document. Embedded datatype declarations can be seen only within the element in which they are embedded; that is, they have local scope. Thus, the title element can see the complexType element declared inside of it, but this complexType declaration is not visible from anywhere else in the document. You can also declare complexType elements outside of an element element. The complexType elements declared outside the element element are visible to the entire document and have document scope. You can reference a document scope element using the ref attribute. The document scope complexType elements will be discussed in detail later in this chapter.

    NOTE
    As we have mentioned, the schema element can contain element, simpleType, complexType, attributeGroup, and group elements as child elements. When any of these elements are child elements of the schema element, they also have document scope.

    The content attribute can be textOnly, mixed, elementOnly, or empty. If the content consists of only text and no elements, you can use textOnly. For both text and elements, you would use mixed. If the content is only elements, you would use elementOnly. When there is no content, you can use empty.

    The ref attribute is used to reference document scope elements. The ref attribute can be used with attributeGroup, element, and group elements. When used with the attributeGroup element, it can reference only simpleType elements.

    When an element element is included as the content of the complexType element, it represents a child element. Thus, the following code declares a child element of h1:


      <element name = "h1">
          <complexType content = "mixed">
              <element ref = "a"/>
          </complexType>
      </element>
      

    Notice that the ref attribute is used to reference the name of the child element, in this case, a.

    You can also use the minOccurs and maxOccurs attributes with the child element to specify its occurrence, as shown here:


      <element name = "h1">
          <complexType content = "mixed">
              <element ref = "a" minOccurs = "0" maxOccurs = "1"/>
          </complexType>
      </element>
      

    We'll discuss the minOccurs and maxOccurs attributes in the next section. When you use an element that has the ref attribute, it's as if the referenced element were substituted for the element that contains the ref attribute.

    As shown in the code in "A DTD for Schemas," the simplified DTD declaration for an element element is as follows:


      <!ELEMENT element  ((annotation)?, (complexType | simpleType)?,
                          (unique | key | keyref)*)>
      <!ATTLIST element  type      CDATA  #IMPLIED
                         name      CDATA  #IMPLIED
                         ref       CDATA  #IMPLIED
                         minOccurs  (1 | 0 )  #IMPLIED
                         maxOccurs CDATA  #IMPLIED
                         id        ID     #IMPLIED
                         nullable (true | false ) 'false'
                         default CDATA #IMPLIED
                         fixed CDATA #IMPLIED >
      

    The name attribute is the name of the element. The name attribute must follow all the rules defined for DTD element names. You can define your element using a complexType element, a simpleType element, or a type attribute. The type attribute and either the simpleType element or the complexType element are mutually exclusive. If you are declaring a data type, then one and only one of these must be used for the datatype declaration to be valid.

    The type attribute

    The type attribute associates either a simple or complex data type with an element. As we've seen, simple data types are either the predefined simple data types or simple data types you define based on these predefined simple data types. Complex data types can be used to associate attributes, elements, or a combination of both to an element. For example, you can declare the simple data type String24 and associate it with the customerName element, as shown here:


      <simpleType name="String24" base="string">
          <maxLength value = "24"/>
          <minLength value = "0"/>
      </simpleType>
      <element name = "customerName" type = "String24"/>
      

    In this case, you have created a data type named String24 that has a length between 0 and 24 characters. This data type is then used in the element declaration, which means that the customerName element will be a string that is between 0 and 24 characters.
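
    To make the String24 constraint concrete, the following Python sketch reads a draft-style simpleType declaration with the standard library's xml.etree.ElementTree module and checks candidate values against its length facets. It assumes that each facet carries its limit in a value attribute, and the helper names load_length_facets and is_valid are invented for this illustration. It is not a schema validator.

```python
import xml.etree.ElementTree as ET

# Draft-style declaration. Storing each limit in a "value" attribute is an
# assumption made for this sketch.
SIMPLE_TYPE = """
<simpleType name="String24" base="string">
    <maxLength value="24"/>
    <minLength value="0"/>
</simpleType>
"""

def load_length_facets(xml_text):
    """Return (min_length, max_length) read from a simpleType declaration."""
    root = ET.fromstring(xml_text)
    min_len = int(root.find("minLength").get("value"))
    max_len = int(root.find("maxLength").get("value"))
    return min_len, max_len

def is_valid(value, facets):
    """Check that the length of the string falls inside the facet range."""
    min_len, max_len = facets
    return min_len <= len(value) <= max_len

facets = load_length_facets(SIMPLE_TYPE)
print(is_valid("Alfreds Futterkiste", facets))  # a 19-character name
print(is_valid("x" * 25, facets))               # one character too long
```

    A real validating parser would perform this check for every customerName element it encounters.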

    The String24 declaration has document scope, meaning that every element in the document can use the String24 data type. The type attribute can be used to assign either a complex or a simple data type with document scope to an element.

    The minOccurs and maxOccurs attributes

    Notice that the minOccurs and maxOccurs attributes are also used in the DTD declaration for an element element to specify the number of occurrences of an element. When working with DTDs, we used the markers *, ?, and + to indicate the number of times a particular child element could be used as content for an element. For attributes, we used #IMPLIED for optional attributes, #REQUIRED for required attributes, #FIXED for attributes that had a fixed default value, and a default value when the attribute was optional. In schemas, both elements and attributes use the minOccurs and maxOccurs attributes. The minOccurs and the maxOccurs attributes are also used with the group element; the choice, sequence, and all elements that are contained within the group element; and the any element.

    When used with elements, the minOccurs and maxOccurs attributes specify the number of occurrences of the element. For example, if an element has a minOccurs value of 0, the element is optional. You can limit an element to at most one occurrence by setting maxOccurs to 1, or allow any number of occurrences by setting maxOccurs to *. The default value for minOccurs is 1, and maxOccurs has no default value.
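
    The correspondence between these attributes and the DTD occurrence markers can be sketched in a few lines of Python. The function name dtd_marker is hypothetical, and the sketch covers only the four combinations that have a single-character DTD equivalent.

```python
def dtd_marker(min_occurs, max_occurs):
    """Map a minOccurs/maxOccurs pair to the equivalent DTD occurrence marker."""
    markers = {
        (1, 1): "",      # exactly once: no marker in a DTD
        (0, 1): "?",     # optional
        (0, "*"): "*",   # zero or more
        (1, "*"): "+",   # one or more
    }
    pair = (min_occurs, max_occurs)
    if pair not in markers:
        raise ValueError(f"no single DTD marker for {pair}")
    return markers[pair]

print(dtd_marker(0, 1))
print(dtd_marker(1, "*"))
```

    Other combinations, such as a minOccurs of 2, have no single-marker DTD equivalent, which is one of the ways schemas are more expressive than DTDs.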

    When used with attributes, minOccurs and maxOccurs indicate whether the attribute is required. The maxOccurs attribute defaults to 1 unless it is specified or minOccurs is greater than 1. If minOccurs is set to 0 for an attribute and the default maxOccurs is equal to 1, you can have between 0 and 1 occurrences of this attribute. Thus, an attribute with minOccurs set to 0 is optional. If minOccurs is set to 1, the attribute is required. The default for minOccurs is 0, but it's better to specify a value for it in your schema. The minOccurs and maxOccurs attributes can be set only to 0 or 1. For example, the following declaration makes the target attribute required:


      <attribute name = "target" minOccurs = "1" maxOccurs = "1" 
          type = "string"/>
      

    Notice that the attributes we have discussed in this section can also be used to define other elements such as the attribute element.

    Attributes were declared in the simplified DTD in "A DTD for Schemas" as follows:


      <!ELEMENT attribute  ((annotation)?, (simpleType)?)>
      <!ATTLIST attribute  type       CDATA  #IMPLIED
                           default    CDATA  #IMPLIED
                           fixed      CDATA  #IMPLIED
                           name       CDATA  #REQUIRED
                           minOccurs  (0|1)    '0' 
                           maxOccurs  (0|1)    '1' >
      

    In schemas, an attribute is the association of a name with a particular simple data type. The attribute element cannot be a child element of the schema element; it can be used only as a child element of the complexType or attributeGroup element. This means that all attribute elements will have local scope.

    You can use the attribute element within a complexType element that has either local or document scope. As we'll see in the next section, you can group attribute elements together in an attributeGroup element. The name attribute must follow the same naming conventions as attribute names for DTDs.

    You can use either a default attribute or a fixed attribute with attribute elements, but not both for the same attribute element. Unlike in DTDs, the fixed and default values are not linked to whether an attribute is optional or required; you can choose to make any attribute have a fixed value or a default value. A fixed value cannot be changed. The value of the default attribute will be the default value if one is not supplied for the attribute. The following declarations show the usage of default and fixed attributes:


      <attribute name = "myAttribute" minOccurs = "1" fixed = "preserve" 
          type = "string"/>
      <attribute name = "align" minOccurs = "0" default = "Center" 
          type = "string"/>
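
    The rules for default and fixed values can be summarized in a short Python sketch. The function effective_value is hypothetical, and it assumes that a fixed value is also reported when the attribute is omitted from the instance document, which the text above does not state explicitly.

```python
def effective_value(supplied=None, default=None, fixed=None):
    """Return the value an application would see for an attribute."""
    if default is not None and fixed is not None:
        raise ValueError("an attribute cannot declare both default and fixed")
    if fixed is not None:
        # A fixed value cannot be changed by the instance document.
        if supplied is not None and supplied != fixed:
            raise ValueError(f"value must be {fixed!r}, got {supplied!r}")
        return fixed  # assumption: also used when the attribute is omitted
    if supplied is None:
        return default  # fall back to the declared default, if any
    return supplied

print(effective_value(default="Center"))           # align omitted
print(effective_value("Left", default="Center"))   # align supplied
print(effective_value("preserve", fixed="preserve"))
```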
      

    As you can see in the simplified DTD for schemas in "A DTD for Schemas," there is nothing equivalent to the DTD parameter entity used for attributes in schemas. Schemas do, however, allow you to create something similar to a parameter entity for attributes by using the attributeGroup element. Attribute groups declared using the attributeGroup element can have either document-level scope or local scope. (The element can be included in the declaration of the schema element or in the declaration of the complexType element.) In the original version of the schema, all attributes were defined without using the attributeGroup element.

    The sample DTD we created in Chapter 5 included a parameter entity named attrs. You can define an attribute group named attrs in your schema as follows:


      <schema>
          <attributeGroup name="attrs">
              <attribute name = "id" type = "ID"/>
              <attribute name = "class" type = "string"/>
              <attribute name = "style" type = "string"/>
              <attribute name = "lang" type = "NMTOKEN"/>
              <attribute name = "xml:lang" type = "NMTOKEN"/>
              <attribute name = "dir">
                  <simpleType base = "string">
                      <enumeration value = "ltr"/>
                      <enumeration value = "rtl"/>
                  </simpleType>
              </attribute>
              <attribute name = "onclick" type = "string"/>
              <attribute name = "ondblclick" type = "string"/>
              <attribute name = "onmousedown" type = "string"/>
              <attribute name = "onmouseup" type = "string"/>
              <attribute name = "onmouseover" type = "string"/>
              <attribute name = "onmousemove" type = "string"/>
              <attribute name = "onmouseout" type = "string"/>
              <attribute name = "onkeypress" type = "string"/>
              <attribute name = "onkeydown" type = "string"/>
              <attribute name = "onkeyup" type = "string"/>
              <attribute name = "href" type = "string"/>
              <attribute name = "name" type = "string"/>
              <attribute name = "target" type = "string"/>   
          </attributeGroup>
      
      </schema>
      

    You can use attrs as follows:


      <element name = "option">
          <complexType content = "textOnly">
              <attributeGroup ref = "attrs"/>
              <attribute name = "selected"/>
          </complexType>
      </element>
      

    Thus, you declare the attributeGroup element as a child element of the schema element to create a document scope group of attributes. You can then reference the document scope attributeGroup element in a complexType element by including an attributeGroup element in the complexType element with the ref attribute set equal to the name of the document scope group. As you can see, this greatly simplifies the schema.
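
    To illustrate how a processor might resolve such a reference, here is a minimal Python sketch that expands an attributeGroup ref into the attributes it stands for. It uses a shortened version of the attrs group, the function name attribute_names is invented for this example, and it handles only document scope groups declared directly under the schema element.

```python
import xml.etree.ElementTree as ET

SCHEMA = """
<schema>
    <attributeGroup name="attrs">
        <attribute name="id" type="ID"/>
        <attribute name="class" type="string"/>
    </attributeGroup>
    <element name="option">
        <complexType content="textOnly">
            <attributeGroup ref="attrs"/>
            <attribute name="selected"/>
        </complexType>
    </element>
</schema>
"""

def attribute_names(schema_text, element_name):
    """List the attribute names of an element, expanding attributeGroup refs."""
    root = ET.fromstring(schema_text)
    # Document scope groups are direct children of the schema element.
    groups = {g.get("name"): g for g in root.findall("attributeGroup")}
    element = root.find(f"element[@name='{element_name}']")
    names = []
    for child in element.find("complexType"):
        if child.tag == "attribute":
            names.append(child.get("name"))
        elif child.tag == "attributeGroup":
            group = groups[child.get("ref")]
            names.extend(a.get("name") for a in group.findall("attribute"))
    return names

print(attribute_names(SCHEMA, "option"))
```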

    NOTE
    Attribute groups can contain only simple data types.


    The group element enables you to group elements in the same way you use parentheses when declaring elements in a DTD. The group element also enables you to create something similar to DTD parameter entities. The order of the elements in the group element can vary as defined by the order attribute.

    The declaration for a group element looks like this:


      <!ELEMENT group ((annotation)?, (all | choice | sequence)*)>
      <!ATTLIST group
                  minOccurs   CDATA                  '1'
                  maxOccurs   CDATA                #IMPLIED
                  order       (choice | seq | all)   'seq'
                  name        CDATA                #IMPLIED
                  ref         CDATA                #IMPLIED
                  id            ID                 #IMPLIED>
      

    A group element can optionally contain an annotation element (comments) and must contain an all, a choice, or a sequence element. These elements define the order and usage of the elements in the group, and are examined in detail in the next section. Notice that group elements do not include attributes; they are used only for grouping elements.

    The minOccurs and maxOccurs attributes indicate how many times the group element can occur. They replace the markers (*, ?, and +) in the DTD.

    The choice, sequence, and all elements

    The choice element indicates a choice of elements in the group element; its function is the same as the bar (|) in the DTD. The DTD declaration <!ELEMENT select (optGroup | option )+> would thus become the following schema declaration:


      <group minOccurs = "1" maxOccurs = "*">
          <choice>
              <element ref = "optGroup"/>
              <element ref = "option"/>
          </choice>
      </group>
      

    The sequence element indicates that the elements must appear in the sequence listed and that each element can occur 0 or more times. When using sequence, you can use minOccurs and maxOccurs as attributes for the elements to specify the number of allowable occurrences of an element element in the group. Using sequence is the same as using the comma separator in the DTD with subgroups that are enclosed in parentheses with occurrence operators. In its simplest form, a sequence element can consist of only one element. For example, the DTD declaration <!ELEMENT optGroup (option )+> would look like this in a schema declaration:


      <group minOccurs = "1" maxOccurs = "*">
          <sequence>
              <element ref = "option"/>
          </sequence>
      </group>
      

    The DTD declaration <!ELEMENT ol (font? , li+ )> would look like this as a schema declaration:


      <group >
          <sequence>
              <element ref = "font" minOccurs = "0" 
                      maxOccurs = "1"/>
              <element ref = "li" minOccurs = "1" maxOccurs = "*"/>
          </sequence>
      </group>
      

    The all element indicates that all the element and group elements listed in the schema must be used, in any order. Each element element in an all group element must have minOccurs and maxOccurs attributes set to 1. The minOccurs and maxOccurs attributes cannot be used for the group element when you are using all; they can be used only for the element elements in the group. A group element declared with order equal to all must not be a subgroup of another group element. For example, because every HTML document must have one and only one head and body, you could declare them in your schema as follows:


      <group >
          <all>
              <element ref = "head" minOccurs= "1" maxOccurs= "1"/>
              <element ref = "body" minOccurs= "1" maxOccurs= "1"/>
          </all>
      </group>
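
    The difference between the three elements can be sketched in Python as a check on the child elements found in an instance document. This is a simplification: the hypothetical function matches assumes that every listed element has minOccurs and maxOccurs set to 1 and that the group itself occurs exactly once.

```python
def matches(compositor, declared, children):
    """Check a list of child element names against a simple group."""
    if compositor == "sequence":
        # Every declared element, in the declared order.
        return children == declared
    if compositor == "choice":
        # Exactly one of the declared elements.
        return len(children) == 1 and children[0] in declared
    if compositor == "all":
        # Every declared element, in any order.
        return sorted(children) == sorted(declared)
    raise ValueError(f"unknown compositor: {compositor}")

print(matches("all", ["head", "body"], ["body", "head"]))
print(matches("sequence", ["head", "body"], ["body", "head"]))
print(matches("choice", ["optGroup", "option"], ["option"]))
```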
      

    Local embedded groups

    When you include the group element declaration within a complexType element, you are embedding the group element declaration inside the complexType element. If you define the group element within the complexType element, that group element is not visible from anywhere else within the schema document; it has local scope. If you are going to use a group element to contain only one element, it makes sense to use a local group.

    Wildcards, particles, and compositors

    According to the schema specification, the term particle refers to content of an element element that contains only other elements, groups, and wildcards (in other words, no text). Wildcards include several different ways to use the any keyword.

    NOTE
    The any keyword in schemas is similar to the ANY keyword in DTDs, except that in a schema any refers only to element and group elements. In a DTD, ANY refers to text and elements. Text content is not part of the any keyword in the schema specification.

    One way to reference all element or group elements within the same namespace and schema as the complexType element is to use the any keyword. If this keyword is used within a complexType element, it indicates that any element or group in the schema in the same namespace as the complexType element could be included within this complexType element. If the value for the namespace is ##targetNamespace, all of the elements within the current document will be used.

    Another possibility is to reference all element and group elements in a namespace other than the one the complexType element is in. In this case, you would use the following declaration:


      <any namespace="##name_of_namespace"/>
      

    In a schema, the order attribute and the minOccurs and maxOccurs attributes together define what is called a compositor. A compositor for a given group element specifies which of the following the elements in the group provide:

    • A sequence of the elements that are permitted or required by the specified particles
    • A choice between the elements permitted or required by the specified particles
    • A repeated choice among the elements permitted or required by the specified particles
    • A set of the elements required by the specified particles

    A more precise definition can now be created: a group consists of two or more particles plus a compositor.

    Document scope groups

    Just as you can create complexType elements that have document scope, you can create group elements that have document scope. The same basic rules apply: you include a name attribute and declare the elements as child elements of the schema element. Thus, you could create the following global group elements:


      <schema>
          <group name = "Block" minOccurs = "1" maxOccurs = "*">
              <choice>
                  <element ref = "p"/>
                  <element ref = "h1"/>
                  <element ref = "h2"/>
                  <element ref = "h3"/>
                  <element ref = "h4"/>
                  <element ref = "h5"/>
                  <element ref = "h6"/>
                  <element ref = "div"/>
                  <group >
                      <choice>
                          <element ref = "ul"/>
                          <element ref = "ol"/>
                      </choice>
                  </group>
                  <element ref = "hr"/>
                  <element ref = "blockquote"/>
                  <element ref = "fieldset"/>
                  <element ref = "table"/>
                  <element ref = "form"/>
                  <element ref = "script"/>
                  <element ref = "noscript"/>
              </choice>
          </group>
      
      </schema>
      

    This declaration states that any element that uses this group must include at least one of these elements as its content. The content of the element can also be any number of copies of the elements in any order. This declaration corresponds to the DTD declaration shown here (note that the * marker in the DTD also permits empty content, whereas a minOccurs value of 1 does not):


      <!ENTITY % Block " (%block; | form | %misc;)*">
      <!ELEMENT noscript %Block;>
      

    Thus, document scope group elements allow you to create something similar to the parameter entities in DTDs that contained element declarations. You can now use the group element as follows:


      <element name = "noscript">
          <complexType content = "elementOnly">
              <group ref = "Block"/>
              <attributeGroup ref = "attrs"/>
          </complexType>
      </element>
      <element name = "blockquote">
          <complexType content = "elementOnly">
              <group ref = "Block"/>
              <attributeGroup ref = "attrs"/>
              <attribute name = "cite" type = "string"/>
          </complexType>
      </element>
      

    Now that we've covered group and attributeGroup elements, we can examine document scope complexType elements in more detail. If you have a grouping of attributes and elements that will be used by more than one element element, you can create a document scope complexType element. You declare the document scope complexType element exactly as you declare the embedded complexType element, except that the declaration will include the name attribute and will not be within the content of an element element; that is, it will be outside an element element declaration. Thus, it will be declared as a child element of the schema element. For example, all the heading elements (h1, h2, and so on) share a common set of child elements and attributes. You could declare a global complexType element and use it as shown here:


      <schema>  
          <complexType name= "standardcontent" content = "mixed">
              <element ref = "a"/>
              <element ref = "br"/>
              <element ref = "span"/>
              <element ref = "img"/>
              <element ref = "tt"/>
              <element ref = "i"/>
              <element ref = "b"/>
              <element ref = "big"/>
              <element ref = "small"/>
              <element ref = "em"/>
              <element ref = "strong"/>
              <element ref = "q"/>
              <element ref = "sub"/>
              <element ref = "sup"/>
              <element ref = "input"/>
              <element ref = "select"/>
              <element ref = "textarea"/>
              <element ref = "label"/>
              <element ref = "button"/>
              <element ref = "script"/>
              <element ref = "noscript"/>
              <attribute name = "align" type = "string"/>
              <attributeGroup ref = "attrs"/>
          </complexType>
          <element name= "h1" type ="standardcontent"/>
          <element name= "h2" type ="standardcontent"/>
          
      </schema>
      

    You can also extend a complexType element using the base attribute. For example, the li element uses all the preceding content and several other elements. You can extend the complexType element example as follows:


      <complexType name = "licontent" base = "standardcontent" 
          derivedBy = "extension">
          <element ref = "p"/>
          <element ref = "h1"/>
          <element ref = "h2"/>
          <element ref = "h3"/>
          <element ref = "h4"/>
          <element ref = "h5"/>
          <element ref = "h6"/>
          <element ref = "div"/>
          <element ref = "ul"/>
          <element ref = "ol"/>
          <element ref = "hr"/>
          <element ref = "blockquote"/>
          <element ref = "fieldset"/>
          <element ref = "table"/>
          <element ref = "form"/>
      </complexType>
      <element name= "li" type ="licontent"/>
      
      

    To extend a complexType element, you need to use the base and derivedBy attributes of the complexType element. The base attribute identifies the source of the element and can be either #all, a single element, or a space-separated list. The derivedBy attribute can be set to restriction, extension, or reproduction. When you are adding elements or attributes to a complexType element, the derivedBy attribute should be set to extension.

    The restriction value for the derivedBy attribute allows you to add restrictions to the element elements included within the complexType element. For element elements included in the original complexType element, you can restrict the number of occurrences of an element or replace a wildcard with one or more elements. For example, if you wanted to restrict the a element to one or more occurrences and remove the br element in a new complexType element, based on the standardcontent type defined in the preceding example, you could write the following code:


      <complexType name = "licontent" base = "standardcontent" 
          derivedBy = "restriction">
          <element name = "a"  minOccurs = "1"/>
          <element name ="br" maxOccurs ="0"/>
      </complexType>
      

    For attributes, you can add or fix defaults or restrict the attribute's simple data type definition.

    If you set the derivedBy attribute to reproduction, the new element is identical to the type it is derived from. Essentially, reproduction indicates neither restriction nor extension.

    If the value for the final attribute for the complexType element is not empty, the complexType cannot be extended, restricted, or reproduced. A complexType that is derived by extension, restriction, or reproduction also cannot be extended, restricted, or reproduced. The block attribute allows you to block extension, restriction, or reproduction. If you set the block attribute to restriction, the complexType element cannot be used to create a new complex type by restriction.

    The example that has been used up to this point has been a document-oriented XML document with no data types besides string. To see how the other data types work, in this section we'll create an example using the Northwind Traders database. (This database can be found in Microsoft Access, Visual Studio, and Microsoft SQL Server 7.) For the Customer and Categories tables, you could create the schema shown below.


      <?xml version ="1.0"?>
      <schema targetNamespace = "http://www.northwind.com/Category"
          xmlns = "http://www.w3.org/1999/XMLSchema"
          xmlns:Categories = "http://www.northwind.com/Category">
          <simpleType name="String15" base="string">
              <maxLength value = "15"/>
              <minLength value = "0"/>
          </simpleType>
          <simpleType name="String5" base="string">
              <maxLength value = "5"/>
              <minLength value = "0"/>
          </simpleType>
          <simpleType name="String30" base="string">
              <maxLength value = "30"/>
              <minLength value = "0"/>
          </simpleType>
          <simpleType name="String60" base="string">
              <maxLength value = "60"/>
              <minLength value = "0"/>
          </simpleType>
          <simpleType name="String10" base="string">
              <maxLength value = "10"/>
              <minLength value = "0"/>
          </simpleType>
          <simpleType name="String24" base="string">
              <maxLength value = "24"/>
              <minLength value = "0"/>
          </simpleType>
          <simpleType name="String40" base="string">
              <maxLength value = "40"/>
              <minLength value = "0"/>
          </simpleType>
    
          <element name = "Categories">
              <complexType content = "elementOnly">
                  <group>
                  <sequence>
                  <element ref = "Categories.CategoryID"
                           minOccurs = "1" maxOccurs = "1" />
                  <element ref = "Categories.CategoryName"
                           minOccurs = "1" maxOccurs = "1" />
                  <element ref = "Categories.Description"
                           minOccurs = "0" maxOccurs = "1" />
                  <element ref = "Categories.Picture" minOccurs = "0"
                           maxOccurs = "1"/>
                  </sequence>
                  </group>
              </complexType>
          </element>
    
          <element name = "Categories.CategoryID" type = "integer">
              <annotation>
                  <documentation>Number automatically assigned to a new
                                 category
                  </documentation>
              </annotation>
          </element>
    
          <element name = "Categories.CategoryName" type = "String15">
              <annotation>
                  <documentation>Name of food category</documentation>
              </annotation>
          </element>
    
          <element name = "Categories.Description" type = "string"/>
          <element name = "Categories.Picture" type = "binary">
              <annotation>
                  <documentation> Picture representing the food category
                  </documentation>
              </annotation>
          </element>
    
          <element name = "Customers">
              <complexType content = "elementOnly">
                  <group>
                      <sequence>
                          <element ref = "Customers.CustomerID"
                                   minOccurs = "1" maxOccurs = "1"/>
                          <element ref = "Customers.CompanyName"
                                   minOccurs = "1" maxOccurs = "1"/>
                          <element ref = "Customers.ContactName"
                                   minOccurs = "1" maxOccurs = "1"/>
                          <element ref = "Customers.ContactTitle"
                                   minOccurs = "0" maxOccurs = "1"/>
                          <element ref = "Customers.Address"
                                   minOccurs = "1" maxOccurs = "1"/>
                          <element ref = "Customers.City" minOccurs = "1"
                                   maxOccurs = "1"/>
                          <element ref = "Customers.Region"
                                   minOccurs = "1" maxOccurs = "1"/>
                          <element ref = "Customers.PostalCode"
                                   minOccurs = "1" maxOccurs = "1"/>
                          <element ref = "Customers.Country"
                                   minOccurs = "1" maxOccurs = "1"/>
                          <element ref = "Customers.Phone" minOccurs = "1"
                                   maxOccurs = "1"/>
                          <element ref = "Customers.Fax" minOccurs = "0"
                                   maxOccurs = "1"/>
                      </sequence>
                  </group>
              </complexType>
          </element>
    
          <element name = "Customers.CustomerID" type = "String5">
              <annotation>
                  <documentation>
                      Unique five-character code based on customer name
                  </documentation>
              </annotation>
          </element>
    
          <element name = "Customers.CompanyName" type = "String40"/>
          <element name = "Customers.ContactName" type = "String30"/>
          <element name = "Customers.ContactTitle" type = "String30"/>
          <element name = "Customers.Address" type = "String60">
              <annotation>
                  <documentation>Street or post-office box</documentation>
              </annotation>
          </element>
    
          <element name = "Customers.City" type = "String15"/>
          <element name = "Customers.Region" type = "String15">
              <annotation>
                  <documentation>State or province</documentation>
              </annotation>
          </element>
    
          <element name = "Customers.PostalCode" type = "String10"/>
          <element name = "Customers.Country" type = "String15"/>
          <element name = "Customers.Phone" type = "String24">
              <annotation>
                  <documentation>
                      Phone number includes country code or area code
                  </documentation>
              </annotation>
          </element>
          <element name = "Customers.Fax" type = "String24">
              <annotation>
                  <documentation>
                      Fax number includes country code or area code
                  </documentation>
              </annotation>
          </element>
      </schema>
      

    Notice that Categories and Customers have been used as prefixes to identify what objects the elements belong to. If you look in the Northwind Traders database, you'll see that the field data types and the lengths for character data types match those in the database. The comments that were included in the Northwind Traders database were also used in the schema. You can see that it's fairly easy to convert a database table into a schema.
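
    The table-to-schema mapping described above is mechanical enough to sketch in a few lines of Python. The following is only an illustration, not the tool used to build the listing: the COLUMNS list and both function names are hypothetical, the three rows simply mirror declarations that appear in the Customers schema, and the rule that a nullable column maps to minOccurs = "0" is an assumption.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical column descriptions: (field name, schema type, nullable).
# The three rows mirror declarations that appear in the schema above.
COLUMNS = [
    ("ContactTitle", "String30", True),
    ("Address", "String60", False),
    ("Fax", "String24", True),
]


def element_references(table, columns):
    """Return the <element ref> entries; nullable columns get minOccurs 0."""
    lines = []
    for name, _type_name, nullable in columns:
        qualified = f"{table}.{name}"  # table name used as the prefix
        min_occurs = "0" if nullable else "1"  # assumed mapping
        lines.append(
            f'<element ref = {quoteattr(qualified)} '
            f'minOccurs = "{min_occurs}" maxOccurs = "1"/>'
        )
    return lines


def element_declarations(table, columns):
    """Return one top-level <element> declaration per column."""
    lines = []
    for name, type_name, _nullable in columns:
        qualified = f"{table}.{name}"
        lines.append(
            f"<element name = {quoteattr(qualified)} "
            f"type = {quoteattr(type_name)}/>"
        )
    return lines


if __name__ == "__main__":
    for line in element_references("Customers", COLUMNS):
        print(line)
    for line in element_declarations("Customers", COLUMNS):
        print(line)
```

    The annotation elements are left out of the sketch; they would come from the column descriptions stored in the database.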

    Now that we have covered the basics of schemas, we can turn to how schemas work with namespaces. In the following section, we'll examine how to use namespaces in schemas.


    In Chapter 6, we looked at using namespaces for DTDs. Namespaces can be read and interpreted in well-formed XML documents. Unfortunately, DTDs are not well-formed XML. If you use a namespace in a DTD, the namespace cannot be resolved. Let's look at the following DTD as an example:


      <!DOCTYPE doc [
      <!ELEMENT doc (body)>
      <!ELEMENT body EMPTY>
      <!ATTLIST body bodyText CDATA #REQUIRED>
      <!ELEMENT HTML:body EMPTY>
      <!ATTLIST HTML:body HTML:bodyText CDATA #REQUIRED>
      ]>
      

    A valid usage of this DTD is shown here:


      <doc><body bodyText="Hello, world"/></doc>
      

    The following usage would be invalid, however, because the HTML:body element is not defined as a child element of the doc element:


      <doc><HTML:body bodyText="Hello, world"/></doc>
      

    As far as the DTD is concerned, the HTML:body element and the body element are two completely different elements. A DTD cannot resolve a namespace and break the qualified name into its prefix (HTML) and its local name (body), so the prefix and the name simply become one word. We want to use namespaces and still be able to separate the prefix from the name. Schemas enable us to do this.
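
    To see what separating the prefix from the name means in practice, here is a small Python sketch using the standard xml.etree.ElementTree module, which is namespace-aware. The namespace URI bound to the HTML prefix is an assumption made for the sketch (the DTD example never binds the prefix to anything), and split_name is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The namespace URI is an assumption made for this sketch; the DTD example
# above uses the HTML prefix without ever binding it to a URI.
HTML_NS = "http://www.w3.org/1999/xhtml"

DOCUMENT = (
    f'<doc xmlns:HTML="{HTML_NS}">'
    '<HTML:body HTML:bodyText="Hello, world"/>'
    '</doc>'
)


def split_name(tag):
    """Split ElementTree's '{uri}local' form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag  # the name is in no namespace


root = ET.fromstring(DOCUMENT)
child = root[0]
namespace, local_name = split_name(child.tag)
print(namespace, local_name)
```

    A namespace-aware processor sees the element as the pair (namespace URI, body), so the prefix itself carries no meaning beyond pointing at the URI.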

    We could write a similar schema with a namespace to identify schema elements. For example, let's create a schema named NorthwindMessage.xsd, as shown here:


      <schema targetNamespace="http://www.northwindtraders.com/Message"
          xmlns:northwindMessage="http://www.northwindtraders.com/Message"
          xmlns ="http://www.w3.org/1999/XMLSchema">
          <include schemaLocation=
              "http://www.northwindtraders.com/HTMLMessage.xsd"/>
          <element name="doc">
              <group>
                  <option>
                      <element ref="northwindMessage:body"/>
                      <element ref="northwindMessage:HTMLbody"/>
                  </option>
              </group>
          </element>
          <element name="body">
              <attribute name="bodyText" type="northwindMessage:TextBody"/>
          </element>
      </schema>
      

    NOTE
    The schema namespace is not assigned a prefix, so it is the default namespace. All elements without a prefix belong to the schema namespace. When elements or types that are defined in this schema are referenced, the prefix bound to the schema's target namespace must be used, as was done with the body and HTMLbody elements. In schemas, the body and HTMLbody names can be separated from their namespace prefixes and properly identified.

    The included file, HTMLMessage.xsd, would look like this:


      <xsd:schema targetNamespace="http://www.northwindtraders.com/Message"
          xmlns:northwindMessage="http://www.northwindtraders.com/Message"
          xmlns:xsd="http://www.w3.org/1999/XMLSchema">
          <xsd:simpleType name="TextBody" base="xsd:string"
              minLength="0"
              maxLength="20"/>
          <xsd:element name="HTMLbody">
              <xsd:attribute name="bodyText" type="xsd:string"/>
          </xsd:element>
      </xsd:schema>
      

    NOTE
    In this case, we did assign a prefix to the schema namespace and used this prefix throughout the document. You can use either method, but keep in mind that defaults can sometimes be harder for people to interpret.

    As you can see, namespaces play a major role in schemas. Let's look at the different elements included in this example. Both documents include a targetNamespace. A targetNamespace defines the namespace that this schema belongs to (http://www.northwindtraders.com/Message). Remember, a namespace uses a URI as a unique identifier, not for document retrieval (although an application could use the namespace to identify the associated schema). It will be up to the application to determine how the namespace is used. The include element has a schemaLocation attribute that can be used by an application or a person to identify where the schema is located.
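
    The point that a namespace URI is an identifier rather than something to be downloaded can be checked with a short Python sketch. The msg prefix is hypothetical; the two URIs are the ones used in this tutorial. Parsing never touches the network, and two elements compare equal only when both the URI and the local name match.

```python
import xml.etree.ElementTree as ET

MESSAGE_NS = "http://www.northwindtraders.com/Message"

# Two fragments that bind different prefixes (msg is hypothetical) to the
# same namespace URI.
FIRST = f'<northwindMessage:doc xmlns:northwindMessage="{MESSAGE_NS}"/>'
SECOND = f'<msg:doc xmlns:msg="{MESSAGE_NS}"/>'
# Same local name, different URI: a different element altogether.
OTHER = '<msg:doc xmlns:msg="http://www.northwindtraders.com/Types"/>'

first = ET.fromstring(FIRST)
second = ET.fromstring(SECOND)
other = ET.fromstring(OTHER)

print(first.tag == second.tag)  # prefixes differ, URI and name match
print(first.tag == other.tag)   # URI differs
```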

    The HTMLMessage.xsd file is included in the NorthwindMessage.xsd by using the include element. For one schema to be included in another, both must belong to the same namespace. Using the include element will result in the included document being inserted into the schema in place of the include element. Once the insertion is complete, the schema should still be a well-formed XML document. A top-level schema can be built that includes many other schemas.

    Notice that the simpleType TextBody is declared in the included document but used in the top-level document. This separation makes no difference, as both documents will be combined into one document by the processing application.
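
    The merge that a processing application performs for include can be sketched roughly as follows. This is a simplified illustration, not how any particular schema processor is implemented: the two schema strings are cut-down, hypothetical stand-ins for the files above, the documents are looked up in an in-memory dictionary instead of being fetched from the schemaLocation, and nested includes are not followed.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/1999/XMLSchema"
INCLUDE_TAG = f"{{{XSD_NS}}}include"

# Hypothetical in-memory stand-ins for the two files; a real processor
# decides for itself how to locate the schemaLocation value.
TOP_LEVEL = f"""
<schema targetNamespace="http://www.northwindtraders.com/Message"
        xmlns="{XSD_NS}">
    <include schemaLocation="HTMLMessage.xsd"/>
    <element name="doc"/>
</schema>
"""
INCLUDED = {
    "HTMLMessage.xsd": f"""
<schema targetNamespace="http://www.northwindtraders.com/Message"
        xmlns="{XSD_NS}">
    <simpleType name="TextBody"/>
    <element name="HTMLbody"/>
</schema>
"""
}


def resolve_includes(schema, documents):
    """Replace each include element with the included schema's children."""
    merged = []
    for child in list(schema):
        if child.tag != INCLUDE_TAG:
            merged.append(child)
            continue
        included = ET.fromstring(documents[child.get("schemaLocation")].strip())
        # include is only allowed between schemas in the same namespace.
        if included.get("targetNamespace") != schema.get("targetNamespace"):
            raise ValueError("include requires the same targetNamespace")
        merged.extend(list(included))
    for child in list(schema):
        schema.remove(child)
    schema.extend(merged)
    return schema


combined = resolve_includes(ET.fromstring(TOP_LEVEL.strip()), INCLUDED)
print([child.get("name") for child in combined])
```

    After the merge, TextBody and HTMLbody sit alongside doc in a single schema element, which is why it does not matter which file declared them.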

    The XML document that is based on the schema is referred to as an instance document. This instance document will have only a reference to the top-level schema. The instance document will need to use only the namespace of the top-level schema. Thus, the instance document for our example schema would look like this:


      <?xml version="1.0"?>
      <northwindMessage:doc xmlns:northwindMessage=
          "http://www.northwindtraders.com/Message" >
          <body bodyText="Hello, world"/>
      </northwindMessage:doc>
      

    You could also have the following instance document:


      <?xml version="1.0"?>
      <northwindMessage:doc xmlns:northwindMessage= 
          "http://www.northwindtraders.com/Message">
          <HTMLbody bodyText="&lt;h1&gt;Hello, world&lt;/h1&gt;"/>
      </northwindMessage:doc>
      

    As far as the instance document is concerned, all the elements come from the top-level schema. The instance document is unaware that the HTMLbody element actually comes from a different schema document, because the schema processor combines the included documents under a single namespace.

    When you use the include element, you insert the entire referenced schema into the top-level schema, and both documents must have the same targetNamespace attribute. You might also want to create schema documents that contain simpleType and complexType declarations that you can use in multiple schemas. If those schemas have different targetNamespace values, you cannot use the include element for a document shared between them. Instead of using the include element, you can use the import element. If you use the import element, you can reference any data type created in the imported document and use, extend, or restrict the data type, as shown here:


      <schema targetNamespace="http://www.northwindtraders.com/Message"
          xmlns:northwindMessage="http://www.northwindtraders.com/Message"
          xmlns:northwindType="http://www.northwindtraders.com/Types"
          xmlns ="http://www.w3.org/1999/XMLSchema">
          <import namespace="http://www.northwindtraders.com/Types"
              schemaLocation="http://www.northwindtraders.com/HTMLTypes.xsd"/>
          <element name="doc">
              <element ref="northwindMessage:body"/>
          </element>
          <element name="body">
              <attribute name="bodyText" type="northwindType:TextBody"/>
          </element>
      </schema>
      

    The HTMLTypes.xsd file might look like this:


      <xsd:schema targetNamespace="http://www.northwindtraders.com/Types"
          xmlns:northwindType="http://www.northwindtraders.com/Types"
          xmlns:xsd="http://www.w3.org/1999/XMLSchema">
          <xsd:simpleType name="TextBody" base="xsd:string"
              minLength="0"
              maxLength="20"/>
      </xsd:schema>
      

    In the top-level schema, we associated a namespace called http://www.northwindtraders.com/Types with the prefix northwindType. Using the import element, we can associate that namespace with a schema location. The application will determine how to use the schemaLocation attribute. Once you have done this, you can use the data type. An instance document for this schema is shown here:


      <?xml version="1.0"?>
      <northwindMessage:doc xmlns:northwindMessage= 
          "http://www.northwindtraders.com/Message" >
          <body bodyText="Hello, world"/>
      </northwindMessage:doc>
      

    Once again, as far as the instance document is concerned, it does not matter where the data types are defined: everything comes from the top-level document.
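
    How a processor gets from the attribute value northwindType:TextBody to a type in the imported namespace can be sketched in Python. The schema text is a trimmed copy of the top-level schema above; prefix_map and resolve_qname are hypothetical helpers, and the flat prefix map ignores the fact that prefixes can be redeclared on nested elements.

```python
import io
import xml.etree.ElementTree as ET

SCHEMA = """<schema targetNamespace="http://www.northwindtraders.com/Message"
    xmlns:northwindMessage="http://www.northwindtraders.com/Message"
    xmlns:northwindType="http://www.northwindtraders.com/Types"
    xmlns="http://www.w3.org/1999/XMLSchema">
    <element name="body">
        <attribute name="bodyText" type="northwindType:TextBody"/>
    </element>
</schema>"""


def prefix_map(xml_text):
    """Collect prefix-to-URI bindings declared anywhere in the document."""
    bindings = {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        bindings[prefix] = uri  # the default namespace has the prefix ""
    return bindings


def resolve_qname(qname, bindings):
    """Turn 'prefix:local' into (namespace URI, local name)."""
    prefix, _, local = qname.rpartition(":")
    return bindings[prefix], local


bindings = prefix_map(SCHEMA)
root = ET.fromstring(SCHEMA)
attribute = root.find(".//{http://www.w3.org/1999/XMLSchema}attribute")
print(resolve_qname(attribute.get("type"), bindings))
```

    Once the type name is resolved to the Types namespace, the import element tells the application which schema document may hold the definition for that namespace.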

    Using namespaces, we have managed to build a schema from other schemas and include data types from other schemas. In both of these cases, the instance document uses only the top-level schema. You can also declare an element as being one particular data type and then override that data type in the instance document. Consider the following top-level schema:


      <schema targetNamespace="http://www.northwindtraders.com/Message"
          xmlns:northwindMessage="http://www.northwindtraders.com/Message"
          xmlns="http://www.w3.org/1999/XMLSchema">
          <include schemaLocation=
              "http://www.northwindtraders.com/HTMLMessage.xsd"/>
          <element name="doc" type="northwindMessage:Body"/>
          <complexType name="Body">
              <element name="body">
                  <attribute name="bodyText" type="string"/>
              </element>
          </complexType>
          <complexType name="HTMLBodyCT">
              <element name="HTMLBody">
                  <complexType>
                      <element name="h1" type="string" content="text"/>
                  </complexType>
              </element>
          </complexType>
      </schema>
      

    This schema has defined the doc element as being a Body data type, and the doc element will contain a body child element that has a bodyText attribute. Now suppose you also want to be able to create messages from other body data types, such as the HTMLBodyCT data type defined in the schema. You could do this by creating a group element with choices.

    Another option is to declare the schema as above and then substitute the HTMLBodyCT data type for the Body data type in the instance document. To do this, you will need to reference the schema instance namespace in the instance document. To use the HTMLBodyCT data type, you would need to create an instance document such as this:


      <?xml version="1.0"?>
      <northwindMessage:doc xmlns:northwindMessage= 
          "http://www.northwindtraders.com/Message"
          xmlns:xsi="http://www.w3.org/1999/XMLSchema/instance"
          xsi:type="HTMLBodyCT" >
          <HTMLBody> 
              <h1>Hello, world</h1>
          </HTMLBody>
      </northwindMessage:doc>
      

    In this example, you have used the xsi:type attribute to reference a type defined in the schema (HTMLBodyCT). The xsi:type attribute is part of the schema instance namespace and is used to override an element's type with another type that is defined in the schema. In this example, you have now redefined the doc element as being of the HTMLBodyCT data type instead of the Body data type. You could also have defined the HTMLBodyCT data type in a separate schema and used the include element in the top-level schema.
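
    From the application's side, honoring xsi:type comes down to checking for that attribute before falling back to the declared type. The sketch below assumes the schema instance namespace URI used in this tutorial; effective_type is a hypothetical helper, and it only picks the type name, leaving validation against that type to a schema processor.

```python
import xml.etree.ElementTree as ET

# Schema instance namespace as used in this tutorial's listings.
XSI_NS = "http://www.w3.org/1999/XMLSchema/instance"

INSTANCE = """<?xml version="1.0"?>
<northwindMessage:doc xmlns:northwindMessage=
    "http://www.northwindtraders.com/Message"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema/instance"
    xsi:type="HTMLBodyCT">
    <HTMLBody>
        <h1>Hello, world</h1>
    </HTMLBody>
</northwindMessage:doc>"""


def effective_type(element, declared_type):
    """Return the local type name an element should be checked against."""
    override = element.get(f"{{{XSI_NS}}}type")
    if override is None:
        return declared_type  # no override: keep the schema's declaration
    # Drop an optional prefix; a full processor would resolve it to a URI.
    return override.rpartition(":")[2]


root = ET.fromstring(INSTANCE)
print(effective_type(root, "Body"))
```

    An element without the attribute keeps its declared type, so instance documents that never mention xsi:type behave exactly as before.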

    Schemas enable you to associate data types with attributes, create your own data types, and define the structure of your document using well-formed XML. Schemas are used to define elements that are associated with a name and a type. The type is either a data type or one or more attributes or elements. Elements can be grouped together in group elements, and attributes can be grouped together in attributeGroup elements. The group and attributeGroup elements can either be used locally or have document-level scope.

    Schemas provide many advantages over DTDs; namely, they use namespaces, they utilize a wide range of data types, and they are written in XML. It's likely that schemas will gradually replace DTDs over the next few years. Schemas will be discussed in more detail when we look at BizTalk in Chapter 8 and the Document Object Model in Chapter 11.

