The structure of an XML document can be defined by two standards. The first standard is the XML specification, which defines the basic rules for building all XML documents. You can see the specification at the following Web site: http://www.w3.org/TR/1998/REC-xml-19980210. Any XML document that meets the basic rules as defined by the XML specification is called a well-formed XML document. An XML document can be checked to determine whether it is well formed, that is, whether the document has the correct structure (syntax). For example, one of the rules for a well-formed document is that every XML element must have a begin tag and an end tag. If an element is missing either tag in an XML document, the document is not well formed. Whether an XML document conforms to the XML specification can be easily verified by an XML-compliant computer application such as Microsoft Internet Explorer 5.
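Any XML parser can perform the well-formedness check described above. The following is a minimal sketch using the parser in Python's standard library; the greeting element and the is_well_formed helper are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Hypothetical sample documents: one complete, one missing its end tag.
complete = "<greeting>Hello</greeting>"
missing_end_tag = "<greeting>Hello"

print(is_well_formed(complete))         # True
print(is_well_formed(missing_end_tag))  # False
```

The parser only confirms that the syntax is correct; it says nothing about whether the content is what an application expects.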
The second standard, which is optional, is created by the authors of the document and defined in a document type definition (DTD). When an XML document meets the rules defined in the DTD, it is called a valid XML document. A valid XML document can be checked for proper content. For example, suppose you have created an XML DTD that constrains the body element to only one instance in the entire document. If the document contained two instances of the body element, it would not be valid. Thus, using the DTD and the rules contained in the XML specification, an application can verify that an XML document is valid and well formed. Schemas are similar to DTDs, but they use a different format. DTDs and schemas are useful when the content of a group of documents shares a common set of rules. Computer applications can be written that produce documents that are valid according to the DTD and well formed according to the current XML standard.
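To make the difference between well formed and valid concrete, here is a rough sketch of the single-body rule mentioned above. The parser in Python's standard library checks well-formedness but does not validate against a DTD, so this sketch hand-codes the one rule rather than reading it from a DTD; the html root element and the has_single_body helper are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def has_single_body(xml_text: str) -> bool:
    """Hand-coded stand-in for one DTD rule: exactly one body element."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well formed
    return len(list(root.iter("body"))) == 1


# Both documents are well formed; only the first satisfies the rule.
one_body = "<html><head/><body/></html>"
two_bodies = "<html><body/><body/></html>"

print(has_single_body(one_body))    # True
print(has_single_body(two_bodies))  # False
```

A validating parser would read such rules from the DTD itself, which is why a shared DTD is more practical than rules coded separately into each application.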
Many industries are currently writing standard DTDs and schemas. These standards will be used to create XML documents that will share information among the members of the industry. For example, a committee of members from the medical community could determine the essential information for a patient and then use that information to build a patient record DTD. Patient information could be sent from one medical facility to another by writing applications that create messages containing an XML document built according to the patient record DTD. When an XML patient message was received, the patient record DTD would then be used to verify that the patient record was valid, that is, that it contained all of the required information. If the XML patient message was invalid, the message would be sent back to the sending facility for correction. The patient record DTD and schema could be stored in a repository accessible through the Internet, allowing any medical facility to check the validity of incoming XML documents. One of the goals of BizTalk is to create a repository of schemas.
In this tutorial, we will begin the process of creating an XML document that can be used to build Internet applications. Ideally, you will want to create an XML document that can be read as an XML document by an XML-compliant browser, as an HTML document using style sheets for non-XML-compliant browsers that understand cascading style sheets (CSS), and as straight HTML for browsers that do not recognize CSS or XML.
We will focus here on the process of creating a well-formed document. We'll review the rules that must be met by a well-formed document and create a well-formed document that can be used to display XML over the Internet in any HTML 4-compliant Web browser. In tutorial 4, you'll learn how to create a DTD for this well-formed document, and in tutorial 5, we will rework the DTD to make it more concise.
The most basic components of an XML document are elements, attributes, and comments. To make it easier to understand how these components work in an XML document, we will look at them using Microsoft XML Notepad. XML Notepad is included in the Microsoft Windows DNA XML Resource Kit, which can be found at Microsoft's Web site (msdn.microsoft.com/vstudio/xml/default.asp).
Elements are used to mark up the sections of an XML document. An XML element has the following form:
<ElementName>Content</ElementName>
The content is contained within the XML tags.
Although XML tags usually enclose content, you can also have elements that have no content, called empty elements. In XML, an empty element can be represented as follows:
<ElementName/>
NOTE
The <ElementName/> XML notation is sometimes called a singleton. In HTML, the empty tag is represented as <ElementName></ElementName>.
In a patient record XML document, for example, PatientName, PatientAge, PatientIllness, and PatientWeight can all be elements of the XML document, as shown here:
<PatientName>John Smith</PatientName>
<PatientAge>108</PatientAge>
<PatientWeight>155</PatientWeight>
This PatientName element marks the content John Smith as the patient's name, PatientAge marks the content 108 as the patient's age, and PatientWeight marks the content 155 as the patient's weight. Elements provide information about the content in the document and can be used by computer applications to identify each content section. The application can then manipulate the content sections according to the requirements of the application.
In the case of the patient record document, the content sections could be placed into fields for a new record in a patient database or presented to a user in text boxes in a Web browser. The elements will determine what fields or text boxes each content section belongs in; for example, the content marked by the PatientName element will go into the PatientName field in the database or in the txtPName text box in the Web browser. Using elements, the presentation, storage, and transfer of data can be automated.
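As a rough illustration of how an application might route content by element name, the sketch below reads the patient elements into a dictionary keyed by element name, which could then feed database fields or text boxes. The enclosing Patient element is an assumption added here because a parser needs a single root element; the fields variable is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# The Patient wrapper is an assumption: a parser needs one root element.
record_xml = """
<Patient>
  <PatientName>John Smith</PatientName>
  <PatientAge>108</PatientAge>
  <PatientWeight>155</PatientWeight>
</Patient>
""".strip()

root = ET.fromstring(record_xml)

# Each element name becomes the key of the field that receives its content.
fields = {child.tag: child.text for child in root}

for name, value in fields.items():
    print(name, "=", value)
```

Note that every value arrives as text; converting 108 or 155 to a number is left to the application.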
Nesting elements
Elements can be nested. For example, if you wanted to group all the patient information under a single Patient element, you might want to rewrite the patient record example as follows:
<Patient>
  <PatientName>John Smith</PatientName>
  <PatientAge>108</PatientAge>
  <PatientWeight>155</PatientWeight>
</Patient>
When nesting elements, you must not overlap tags. The following construction would not be well formed because the </Patient> end tag appears between the tags of one of its nested elements:
<Patient>
  <PatientName>John Smith</PatientName>
  <PatientAge>108</PatientAge>
  <PatientWeight>155</Patient>
</PatientWeight>
Thus XML elements can contain other elements. However, the elements must be strictly nested: an element that begins inside another element must also end inside that element.
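A parser rejects overlapping tags outright. The sketch below feeds a properly nested fragment and an overlapping one to Python's standard library parser; the parse_error helper is a hypothetical name, and the fragments are shortened versions of the examples above.

```python
import xml.etree.ElementTree as ET


def parse_error(xml_text):
    """Return the parser's complaint as a string, or None if there is none."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


nested = (
    "<Patient>"
    "<PatientWeight>155</PatientWeight>"
    "</Patient>"
)
overlapping = (
    "<Patient>"
    "<PatientWeight>155</Patient>"
    "</PatientWeight>"
)

print(parse_error(nested))       # None
print(parse_error(overlapping))  # a message reported by the parser
```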
Naming conventions
Element names must conform to the following rules:
- Names consist of one or more nonspace characters.
- A name can begin only with a letter or an underscore. (The specification also permits a leading colon, but colons are best left to namespaces, and names beginning with the letters xml in any combination of case are reserved.)
- Beyond the first character, a name can contain letters, digits, hyphens, underscores, periods, and colons. Letters are not limited to A-Z and a-z; they include the letters defined in the Unicode standard (http://www.unicode.org/).
- Element names are case sensitive; thus, PatientName, PATIENTNAME, and patientname are considered different elements.
For example, the following element names are well formed:
Fred
_Fred
Fredd123
FredGruß
These element names would not be considered well formed:
Fred 123
-Fred
123
Here the first element name contains a space, the second begins with a dash, and the third begins with a numeral instead of a letter or an underscore.
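One way to try out candidate names, sketched here under the assumption that the parser's verdict is good enough for our purposes, is to wrap each name in an empty element and see whether the parser accepts it. The is_usable_element_name helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def is_usable_element_name(name: str) -> bool:
    """Ask the parser whether <name/> is accepted as an empty element."""
    try:
        root = ET.fromstring(f"<{name}/>")
    except ET.ParseError:
        return False
    # Guard against input such as 'a b="c"', which parses as element a.
    return root.tag == name


good_names = ["Fred", "_Fred", "Fredd123", "FredGruß"]
bad_names = ["Fred 123", "-Fred", "123"]

for name in good_names + bad_names:
    print(repr(name), is_usable_element_name(name))
```

This parser is namespace aware, so a name containing a colon would be judged by namespace rules as well; the names above avoid that case.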
An attribute is a mechanism for adding descriptive information to an element. For example, in our patient record XML document, we have no idea whether the patient's weight is measured in pounds or kilograms. To indicate that PatientWeight is given in pounds, we would add a unit attribute and specify its value as LB:
<PatientWeight unit="LB">155</PatientWeight>
Attributes can be included only in the begin tag, and like element names, attribute names are case sensitive. Attribute values must be enclosed in quotation marks, either double (") or single (').
Attributes can be used with empty elements, as in the following well-formed example:
<PatientWeight unit="LB"/>
In this case, this might mean that the patient weight is unknown or it has not yet been entered into the system.
An attribute can be declared only once in an element. Thus, the following element would not be well formed:
<PatientWeight unit="LB" unit="KG">155</PatientWeight>
This makes sense because the weight cannot be both kilograms and pounds.
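The attribute rules can be observed with a parser. This short sketch reads the unit attribute, shows that an empty element still carries its attribute, and confirms that a repeated attribute is rejected; the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# An attribute is read from the element that carries it.
weight = ET.fromstring('<PatientWeight unit="LB">155</PatientWeight>')
print(weight.get("unit"), weight.text)  # LB 155

# An empty element keeps its attribute but has no content.
empty_weight = ET.fromstring('<PatientWeight unit="LB"/>')
print(empty_weight.get("unit"), empty_weight.text)  # LB None

# Declaring the same attribute twice is not well formed.
try:
    ET.fromstring('<PatientWeight unit="LB" unit="KG">155</PatientWeight>')
    duplicate_accepted = True
except ET.ParseError:
    duplicate_accepted = False
print(duplicate_accepted)  # False
```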
Comments are descriptions embedded in an XML document to provide additional information about the document. Comments in XML use the same syntax as HTML comments and are formatted so that they are ignored by the application processing the document, as shown here:
<!-- Comment text -->
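To see a processing application ignore a comment, consider this small sketch. By default, the parser in Python's standard library discards comments when it builds the element tree; the Patient fragment and the comment text are made up for illustration.

```python
import xml.etree.ElementTree as ET

record = ET.fromstring(
    "<Patient>"
    "<!-- Weight is entered at intake -->"
    "<PatientWeight>155</PatientWeight>"
    "</Patient>"
)

# Only the element survives; the comment is not part of the parsed tree.
child_tags = [child.tag for child in record]
print(child_tags)  # ['PatientWeight']
```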
Before we begin using the basic components of an XML document to create Web applications, we must cover some basics of HTML documents. Unlike XML, HTML markup does not always define content within the markup. For example, HTML includes a set of tags that do not contain anything, such as <hr>, <img>, and <br>. These elements do not have end tags in HTML; if you include end tags with these elements, the Web browser will ignore them.
For the most part, elements and attributes in HTML can be divided into two groups: physical and logical. A logical HTML element or attribute is similar to an XML element. Logical elements and attributes describe the format of the content enclosed within the tags. For example, here the text Hello, world should be displayed with a font size of 3:
<font size="3">Hello, world</font>
The actual size of the font will depend on the browser settings and the user's preferences. With logical elements, the Web browser will use the markup elements and attributes to identify what the content is and then display the content accordingly.
Physical elements and attributes do not give the user any options as to how content is displayed; they define exactly what the content will look like. Rewriting our font size example using a physical attribute, we have:
<font size="12 pt">Hello, world</font>
The Hello, world text will now always be displayed in 12 point type, regardless of the user's preferences. The attribute no longer defines the content as being of a certain format that the application will interpret; it simply sets the attribute to a value that the application will use.
When you develop XML applications, you will want to define elements and attributes that give the user more control, such as logical HTML elements and attributes. These elements and attributes will be used by the application to identify the content contained within the element. Once the application understands what the content is, it can determine how to use the content based on user preferences (for example, setting the default size 3 text to 14 point text in the browser), the structure of the database (for example, in one corporation a customer's last and first names might be saved as a single entity called CustomerName, and in another corporation the same information might be saved as LName and FName), and so on. As we create Web applications using XML throughout this book, we will use logical elements and attributes whenever possible.
The main problem we will have with building Web applications using XML is that most people are working with browsers that only understand HTML. We'll need some way to get the non-XML browsers to view XML code as HTML code so that the pages will render properly in the browser. When cascading style sheets (CSS) were introduced, they faced the same problem: how to render documents properly in non-CSS browsers. The ingenious solution that was used for CSS documents can also be used for XML. Let's take a look at how CSS can work in both browsers that understand style sheets and browsers that do not.
When the CSS standard was created, a great number of people were still using browsers that did not support style sheets (many still are). Web developers needed to be able to create Web applications using style sheets for the new browsers and yet still have these applications present properly in browsers that did not support style sheets.
This might sound like an impossible task, but it is actually quite simple. Web browsers will ignore any tag or attribute they do not recognize. Thus, you could put the following tag in your HTML code without causing any errors:
<Jake>This is my tag</Jake>
The browser will ignore the <Jake> tag and output This is my tag to the browser. Taking advantage of this browser characteristic is the key to using style sheets.
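The behavior can be imitated with a very crude stand-in for a browser. The sketch below uses Python's html.parser module to skip every tag and keep only the text; a real browser does far more, and the TextOnly class is a hypothetical name.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect character data and skip every tag, known or unknown."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


viewer = TextOnly()
viewer.feed("<Jake>This is my tag</Jake>")
viewer.close()

visible_text = "".join(viewer.chunks)
print(visible_text)  # This is my tag
```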
A style sheet is a document that defines what the elements in the document will look like. For example, we can define the <h1> tag as displaying the normal font at 150 percent of the default h1 size, as shown here:
<style>
h1 {font-weight: normal; font-size: 150%}
</style>
NOTE
The style definitions do not have to be contained in a separate document; you can place the style definitions in the same document as the HTML code that will use these definitions. To support both CSS browsers and non-CSS browsers, it's recommended that the style sheet be referred to as a separate document.
In browsers that support style sheets, this definition will display all content within <h1> tags in the document in the normal font at 150 percent of the default h1 size. If the style rule (without the enclosing <style> tags) is saved in a document named MyStyle.css, you can use this style in your HTML page by including the following line of code:
<link rel="stylesheet" type="text/css" href="MyStyle.css">
Browsers that do not support style sheets will not know what the <link> tag is, nor will they know what the rel attribute is. These browsers will simply ignore the <link> tag and use the default settings associated with the h1 element. Browsers that do support style sheets will recognize the tag, access and read the style sheet, and present the h1 elements as defined in the style sheet (unless the style definition is overridden locally in the HTML page).
A detailed discussion of style sheets is beyond the scope of this book. The specification for the latest version for CSS can be found at http://www.w3.org/TR/REC-CSS2/. You can use style sheets to do much more than simply override the standard HTML tags.
To "XMLize" your HTML code, that is to convert HTML to XML, you will begin by creating a Web document using XML elements that will default to standard HTML tags. To do this, you will have to close all HTML tags. For example, if you use the tag <br> that does not have an end tag in the document, you will have to add one, as shown here:
<br></br>
Because the Web browser does not know what the </br> tag is, it will ignore it. You could not use the following empty element XML notation, however, because non-XML browsers would not be able to identify the tag:
<br/>
Adding end tags is one of several tasks that will need to be performed to create a well-formed document. In other words, the first step in XMLizing your HTML is to make the document well formed.
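If you generate documents from a program rather than by hand, the choice between the two notations is usually a serializer setting. As a sketch, assuming Python's standard library serializer stands in for whatever tool you use, the short_empty_elements option switches between the singleton form and the begin tag/end tag pair:

```python
import xml.etree.ElementTree as ET

br = ET.Element("br")

# Default output uses the empty-element (singleton) notation.
singleton_form = ET.tostring(br, encoding="unicode")

# Turning the option off writes an explicit begin tag and end tag.
paired_form = ET.tostring(br, encoding="unicode", short_empty_elements=False)

print(singleton_form)  # <br />
print(paired_form)     # <br></br>
```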
HTML contains many features that can make it a difficult language to use. For example, the following code would work but probably would not give you the results you wanted:
<h1>Hello, world <p>How are you today?</p>
The missing end tag </h1> is added implicitly at the end of the document, meaning that both lines would be presented in the h1 style, not just the first line. Another problem with HTML is that there is no easy way to create an application to validate HTML documents to find errors such as the one shown above.
Keeping your document well formed will help prevent these types of problems. When you create a document, you will need to make sure that tags are positioned correctly so that you get the results you want. With these HTML basics in mind, you are ready to build an XML Web application using XML Notepad.
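Once every tag is expected to be closed, finding a mistake like the missing </h1> becomes a mechanical task. The following sketch tracks open tags with Python's html.parser module and reports any that are never closed. The UnclosedTagFinder class and unclosed_tags function are hypothetical names, and the list of tags that need no end tag is limited to the three mentioned earlier.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img"}  # tags the text names as having no end tag


class UnclosedTagFinder(HTMLParser):
    """Keep a stack of begin tags that have not yet met their end tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Only close a tag when it matches the most recently opened one.
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()


def unclosed_tags(html_text):
    finder = UnclosedTagFinder()
    finder.feed(html_text)
    finder.close()
    return finder.open_tags


print(unclosed_tags("<h1>Hello, world <p>How are you today?</p>"))
# ['h1']
print(unclosed_tags("<h1>Hello, world</h1> <p>How are you today?</p>"))
# []
```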
The information needed to create a complete XML document that will work in an XML-compliant browser such as Internet Explorer 5 is presented over the course of several chapters in this book. We will begin in this chapter by building a well-formed XML document that will always default to standard HTML. Any browser can read this document. We will use XML Notepad to create an XML Web document using a simple user interface.
XML Notepad enables us to focus on working with elements, attributes, and comments and to properly position them in the document structure. XML Notepad will handle the complexities of writing the XML document in a well-formed manner. In the section "The Final XML Document" later in this chapter, you'll find a review of the code created by XML Notepad. To create the initial document structure, follow these steps:
- To open XML Notepad, choose Program Files from the Start menu, and then choose Microsoft XML Notepad.
- XML Notepad will be displayed with a root element, which will contain all the other elements of the XML document. Every XML document must have a single root element to be well formed. Click on the root element (Root_Element), and rename it html.
- We will create two main child elements for our HTML document, body and head. Change the name of the default child element (Child_Element) to head.
- To add a second child element, click on head and choose Element from the Insert menu. Name the new child element body.
Figure 3-1 shows XML Notepad after you've made these changes.
Figure 3-1. XML Notepad, showing the root element and two child elements.
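For comparison, the same three-element structure can be built in a few lines of code. This is only a sketch of the equivalent tree using Python's standard library, not a description of what XML Notepad does internally.

```python
import xml.etree.ElementTree as ET

# The root element contains every other element in the document.
html = ET.Element("html")
head = ET.SubElement(html, "head")
body = ET.SubElement(html, "body")

structure = ET.tostring(html, encoding="unicode")
print(structure)  # <html><head /><body /></html>
```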
In this example, we will build a simple help desk Web page that uses a table to properly place the page elements in the Web browser. The Web page is shown in Figure 3-2.
Figure 3-2. Sample help desk Web page.
The table consists of two rows and two columns, for a total of four table cells, as shown in Figure 3-3. Notice that the title spans the two columns in the first row.
Figure 3-3. The four table cells.
In the following section, we'll create a generic template for producing Web pages that follow this design. These pages will use tables for formatting text and lists for presenting information.
To complete the head element, follow these steps:
- To add a child element to the head section, click on the head element and choose Child Element from the Insert menu. Name the new child element title.
- To add another child element, click on title and choose Element from the Insert menu. Name this element base.
- In HTML, the base element has an attribute named target. The target attribute names the default window or frame in which links open when an a element does not specify a target of its own. To add an attribute to this element, click on base and choose Attribute from the Insert menu. Name the attribute target.
The completed head element is shown in Figure 3-4.
Figure 3-4. The completed head element in XML Notepad.
- Choose Source from the View menu to display the source code, shown in Figure 3-5.
Figure 3-5. Source code for the completed head element.
As you can see, this source code looks a lot like HTML. This document meets the requirements for a well-formed XML document, but it can also be used as an HTML document with a little work. Three of the elements are empty elements: <title/>, <base target=""/>, and <body/>. XML uses the singleton format to denote an empty element, which is not recognized by HTML. To modify these elements so that HTML Web browsers can read them, they should be written as follows:
<title></title>
<base target=""></base>
<body></body>
We could leave the title and body elements as singletons since a Web browser reading this as an HTML document will simply ignore them. However, we don't want a Web browser to ignore the empty base element because it has the target attribute associated with it. The base element is supposed to be empty because it exists only as a container for its target attribute. We should change the base element to <base target=""></base>, but this cannot be done in XML Notepad. If you edit the document in a regular text editor and change this element, XML Notepad will change it back to the singleton when it reads the file.
We can prevent XML Notepad from converting the element back to a singleton by adding a comment to the element. To do so, click on base and choose Comment from the Insert menu, and then add the following comment value in the right pane of XML Notepad:
Default link for page
The source code will now look as follows:
<base target=""><!--Default link for page--></base>
This added comment solves the empty element problem without having to resort to any ugly hacks.
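The same effect can be reproduced in code: an element that contains a comment is no longer empty, so a serializer writes both tags. The sketch below uses Python's standard library as a stand-in for XML Notepad.

```python
import xml.etree.ElementTree as ET

base = ET.Element("base", {"target": ""})

# The comment gives the element content, so it is not written as a singleton.
base.append(ET.Comment("Default link for page"))

source = ET.tostring(base, encoding="unicode")
print(source)  # <base target=""><!--Default link for page--></base>
```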
These problems are caused by the fact that HTML doesn't understand singletons. You will encounter these difficulties when you XMLize currently existing HTML document structures.
Now that we have completed the head section, we can next make the body section. The body section will contain the information that will be displayed in the browser. To complete the body element, follow these steps:
- Add the following attributes to the body element: text, bgcolor, link, alink, and vlink. Then add the following child elements: basefont, a, and table.
- Click on vlink, and add the following comment below the attribute:
Default display colors for entire body
Figure 3-6 shows the modified body element.
Figure 3-6. The body element, with attributes and elements added.
Completing the basefont element
To complete the basefont element, you need to add a size attribute and the following comment:
Size is default font size for body; values should be 1 through 7 (3 is usual browser default).
Once again, the comment solves the problem of the empty tag.
NOTE
Although we have placed constraints on the possible values for size, you will not be able to verify whether the correct values are used unless you create a DTD. You'll learn how to do this in Chapter 4.
Completing the a element
This a element will act as an anchor for the top of the page. Add the following attributes to the a element: name, href, and target, and then add the following comment to the name attribute:
Anchor for top of page
Completing the table element
To complete the table element, follow these steps:
- Add the following attributes to the table element: border, frame, rules, width, align, cellspacing, and cellpadding. Add the following comment below cellpadding:
Rules/frame is used with border.
- Next you will need to add a tr child element to the table element to create rows for the table. The result is shown in Figure 3-7.
Figure 3-7. Adding a tr element to the table element.
- Add the following attributes to the tr element: align, valign, and bgcolor.
- Add a td child element to the tr element. The td element represents a table cell. Each cell will contain the content that will go into the Web page.
- Add the following attributes to the td element: rowspan, colspan, align, valign, and bgcolor. Then add the following comments:
Either rowspan or colspan can be used, but not both.
Valign: top, bottom, middle
Next we will add a child element to td named CellContent. CellContent is not recognized by HTML, so HTML Web browsers will ignore the tag. We will use CellContent to identify the type of information being stored in the cell. This information can be used later by applications to help identify the content in the Web site.
The CellContent element will contain a series of tags that can be used as a template to create the content that will go into the cell. To keep things organized, h1, h2, h3, and h4 headers could be used. To keep this example simple, we will use only an h1 child element. Below each header will be a paragraph. Within the paragraph will be a series of elements that can be arranged as necessary to build the cell.
Completing the CellContent element
To complete the CellContent element, follow these steps:
- Add an attribute named cellname below the CellContent element.
- Add an h1 child element to the CellContent element, and add an align attribute to the h1 element.
- Add a p child element to the CellContent element, and add an align attribute to the p element. Add the following comments to the p element:
All of the elements below can be used in any order within the p element. You must remove the li elements from ul and ol if they are not used.
- Add the following child elements to the p element: font, font, img, br, a, ul, and ol. Two font elements are needed because one will be used to create sections of text that are marked to be displayed in a different font than the default font and one will be used with the b element to display any content within the b tags in boldface.
- Click on p, and then choose Text from the Insert menu to create an object that you can use for adding content to the p element.
- In the first font element, add the following attributes: face, color, and size. In the second font element, add the same attributes and a b child element.
- In the img element, add the following attributes: src, border, alt, width, height, lowsrc, align, hspace, and vspace. Add the following comments after vspace:
Border is thickness in pixels.
Align = right, left
The hspace and vspace attributes represent padding between image and text in pixels.
- The br element prevents text wrapping around images. Add an attribute to the br element named clear. Add the following comment after the clear attribute:
Clear = left, right, all; used to prevent text wrapping around an image
Figure 3-8 shows what the structure of your XML document should look like at this point.
Figure 3-8. The structure of the img and br elements.
- Add a type attribute to the ul element, and add the following comment:
Type: circle, square, disc
- To create text that appears in boldface at the top of the list as a heading, click on the font element that contains the b element, and choose Copy from the Edit menu. Click on the ul element, and then choose Paste from the Edit menu. Next add an li child element to the ul element. An li element represents an item in the list. Copy the font element that does not contain the b element into the li element. Copy the a element into the li element. Add a text object to the li element.
- Finally, add the following attributes to the ol element: type and start. Add the following comment:
Type: A for caps, a for lowercase, I for capital roman numerals, i for lowercase roman numerals, 1 for numeric
- Copy the font element that contains the b element from the p element into the ol element. Copy the li element from the ul element into the ol element.
Figure 3-9 shows the completed CellContent element.
Figure 3-9. XML Notepad, showing the CellContent element.
NOTE
Several elements in Figure 3-9, such as the img and br elements, are collapsed.
You have now created a basic XML template that can be used to build Web pages. In the next section, we will build a Web help page using this XML document.
You can insert values for elements, text, and attributes in the right pane of the XML Notepad just as you did when you entered values for the comments. Save the document we have just created as Standard.xml. Next choose Save As from the File menu and save the file as Help.htm. You can now begin to add values to the template.
You can also add copies of existing elements if you do not alter the overall structure of the document. For the most part, the structure will be maintained as long as a new element is added at exactly the same level in the tree as the item that was copied. Thus, we could add many copies of the li element if all of the new li elements are located under a ul element or the ol element. You could not position an li element under any other element without changing the structure of the document.
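A rule such as "li elements belong only under ul or ol" is easy to check once the document is well formed. The following sketch lists any other parent that directly contains an li element; the misplaced_li_parents function and the sample fragments are hypothetical.

```python
import xml.etree.ElementTree as ET


def misplaced_li_parents(root):
    """Return the tags of parents that hold an li but are not ul or ol."""
    return [
        parent.tag
        for parent in root.iter()
        if parent.tag not in ("ul", "ol") and parent.find("li") is not None
    ]


good = ET.fromstring("<p><ul><li/><li/></ul><ol><li/></ol></p>")
bad = ET.fromstring("<p><li/></p>")

print(misplaced_li_parents(good))  # []
print(misplaced_li_parents(bad))   # ['p']
```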
Now that we have created the template, we can use it to build a Web document. We can now add content for the elements and values for the attributes. To add values to the head and body elements, follow these steps:
- Expand the body element, and enter the following value for the title element of the head element: Northwind Traders Help Desk.
- Next add values for the body element attributes as shown in the following table:
Attribute   Value
text        #000000
bgcolor     #FFFFFF
link        #003399
alink       #FF9933
vlink       #996633

- Expand the a element and give the name attribute a value of Top.
- Enter the values for the table element attributes shown in the following table:
Attribute     Value
border        0
width         100%
cellspacing   0
cellpadding   0
Completing the first row
As shown in Figure 3-3, the first row of our sample table contains the title centered on the page. To accomplish this, follow these steps:
- Expand the tr element, and set its valign attribute to Middle. Then expand the td element and set its align attribute to Center.
- For the colspan attribute of the td element, enter 2 (meaning that the title will span the two columns).
- Expand the CellContent element, and enter Table Header for the cellname attribute. Enter Help Desk for the value of the h1 element and Center for the value of its align attribute.
Figure 3-10 shows what the document should look like at this point. You can now collapse this tr section because we have finished with this row.
Figure 3-10. XML Notepad, showing the completed first row.
Completing the second row
To add a second row, follow these steps:
- Click on tr, and then choose Duplicate Subtree from the Insert menu. This will add another tr element, complete with all of its subtrees. Expand the new tr element, and set its valign attribute to Top.
- We need two cells in the second row to allow two sets of hyperlink lists in two separate columns. To accomplish this, click on the td element and choose Duplicate Subtree from the Insert menu to add a second td element and all of its subtrees.
- We'll begin by working with the first td element. For the align attribute of the first td element, enter Left. Expand the CellContent element, and enter Help Topic List for the cellname attribute. Expand the p element, expand the ul element, and then expand the font element. Enter 3 for the size attribute. For the b element, enter For First-Time Visitors.
- Because we want to make hyperlinks to help pages, we will use the a element. Expand the li element, and then expand the a element and enter the value First-Time Visitor Information. For the href attribute of the a element, enter FirstTimeVisitorInfo.htm.
- Click on li, and then choose Duplicate Subtree from the Insert menu to add an li element and all of its subtrees. Expand the new li element, and then expand the a element and enter the value Secure Shopping at Northwind Traders. For the href attribute of this a element, enter SecureShopping.htm.
- Click on li, and choose Duplicate Subtree from the Insert menu to add a third li element. Expand this li element, expand the a element, and enter the value Frequently Asked Questions. Enter the value FreqAskedQ.htm for the href attribute.
- Click on li, and choose Duplicate Subtree from the Insert menu to add a fourth li element. Expand this li element, expand the a element, and enter the value Navigating the Web. Enter the value NavWeb.htm for the href attribute. Figure 3-11 shows the document at this point.
- Expand the second td element, and set its align attribute to Left. Expand the CellContent element, and enter the value Shipping Links for the cellname attribute. Expand the p, ul, and font elements, and enter the value Shipping for the b element.
Figure 3-11. XML Notepad, showing the completed first list.
- Expand the li element, expand the a element, and enter the value Rates. Enter the value Rates.htm for the href attribute.
- Click on li, and choose Duplicate Subtree from the Insert menu to insert a second li element. Expand the new li element, expand the a element, and enter the value Checking on Your Order. For the href attribute, enter the value OrderCheck.htm.
- Click on li, and choose Duplicate Subtree from the Insert menu to insert a third li element. Expand the new li element, expand the a element, and enter the value Returns. For the href attribute, enter the value Returns.htm.
Figure 3-12 shows the completed second row.
Figure 3-12. XML Notepad, showing the completed second list.
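The same two lists can also be produced by a short program instead of by hand in XML Notepad. The following is a minimal sketch in Python using the standard xml.etree.ElementTree module; the helper name build_list and the two constant names are hypothetical and chosen only for this illustration, while the link text and href values come from the steps above.

```python
import xml.etree.ElementTree as ET

# Link text and href values taken from the steps above.
FIRST_TIME_LINKS = [
    ("First-Time Visitor Information", "FirstTimeVisitorInfo.htm"),
    ("Secure Shopping at Northwind Traders", "SecureShopping.htm"),
    ("Frequently Asked Questions", "FreqAskedQ.htm"),
    ("Navigating the Web", "NavWeb.htm"),
]
SHIPPING_LINKS = [
    ("Rates", "Rates.htm"),
    ("Checking on Your Order", "OrderCheck.htm"),
    ("Returns", "Returns.htm"),
]


def build_list(heading, links):
    """Return a ul element with a bold heading and one li/a pair per link."""
    ul = ET.Element("ul", {"type": ""})
    font = ET.SubElement(ul, "font", {"face": "", "color": "", "size": ""})
    ET.SubElement(font, "b").text = heading
    for text, href in links:
        li = ET.SubElement(ul, "li")
        a = ET.SubElement(li, "a", {"href": href, "target": ""})
        a.text = text
    return ul


shipping = build_list("Shipping", SHIPPING_LINKS)
print(ET.tostring(shipping, encoding="unicode"))
```

Because every li element is created by the same loop, each one has the same structure, which is the effect Duplicate Subtree gives you in XML Notepad.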
Many of the available elements in our template have not been used; for example, we have not used the ol elements, several h1 elements, the base element, and so on. There's no need to keep these elements, and some of them will affect the output when the document is viewed as HTML. Go through the template and delete any elements that have not been used. When you have finished, save the document. When you view the document in a Web browser, it should look like Figure 3-2.
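If you would rather not hunt for unused elements by hand, the cleanup can be approximated in code. The sketch below is only one possible definition of "unused": it assumes an element is unused when it has no text, no child elements, and no attribute with a value. The function name prune_unused and the small sample fragment are hypothetical. Note that this parser discards comments, and text that follows a removed element would be lost, so review the result before saving it.

```python
import xml.etree.ElementTree as ET


def prune_unused(element):
    """Remove descendants that have no text, no children, and no attribute values."""
    for child in list(element):
        # Clean the subtree first, so that parents emptied by pruning are caught too.
        prune_unused(child)
        has_text = bool((child.text or "").strip())
        has_values = any(value.strip() for value in child.attrib.values())
        if len(child) == 0 and not has_text and not has_values:
            element.remove(child)


# A hypothetical fragment of the template: only the p element has been filled in.
body = ET.fromstring(
    '<body><h1 align=""></h1>'
    '<ol type=""><li><a href="" target=""></a></li></ol>'
    '<p align=""><a href="Rates.htm" target="">Rates</a></p></body>'
)
prune_unused(body)
remaining = [child.tag for child in body]
print(remaining)
```

An element such as the Top anchor, which has a value for its name attribute, is kept under this definition even though it contains no text.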
We've gone through quite a bit of work building this standard template and using it to create a simple Web page. The obvious question is, "Have you gained anything by doing this?" In this section, we'll look at some of the advantages of the standard template.
Our ultimate goal is to be able to write computer applications that catalog, present, and store content in documents. Ideally, these applications should perform these tasks as an automatic process, without human intervention. You should always be able to create a computer application that automatically processes a well-formed XML document.
Ordinary HTML documents are difficult to process automatically because they are often not well formed. It is also extremely difficult to create HTML code according to a uniform standard. If you sketch out a design for a Web page and give it to ten different Web developers, it is likely they will create ten documents containing completely different HTML code. Even with a written standard, it is likely that the code will still differ.
An automated computer application needs a standard format to work with. If every HTML document can have only a certain set of tags and these tags can appear only in a certain order, you can write an application to process the content. You could define a set of rules and pass them to your developers. In our sample XML template, we used XML and the XML Notepad to define these rules.
These rules could have simply been written in a document, but you would then have no way to verify that the ten developers all built their HTML pages according to the rules. By defining the rules as XML, you can quickly verify whether the document meets the requirements by verifying whether the document is well formed (which it must be if it is built in an XML editor). You will also need a DTD to check all the rules. (You'll learn how to build this DTD in Chapter 4.) Using XML Notepad to create the document in XML thus helps prevent errors when an application reads the document.
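To make the idea of automatic verification concrete, here is a minimal sketch in Python. It checks two things: that the document is well formed, and that it uses only the elements found in our template. The function name check_document and the ALLOWED_TAGS set are assumptions made for this illustration. A check like this does not replace a DTD, because it says nothing about the order or nesting of the elements.

```python
import xml.etree.ElementTree as ET

# Element names used in the sample template (an assumed, illustrative rule set).
ALLOWED_TAGS = {
    "html", "head", "title", "base", "body", "a", "table", "tr", "td",
    "CellContent", "h1", "p", "ul", "font", "b", "li",
}


def check_document(xml_text):
    """Return a list of problems; an empty list means both checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The parser stops at the first well-formedness error it finds.
        return [f"not well formed: {err}"]
    return [
        f"unexpected element: {element.tag}"
        for element in root.iter()
        if element.tag not in ALLOWED_TAGS
    ]


print(check_document("<html><body><p>ok</p></body></html>"))
print(check_document("<html><body><blink>x</blink></body></html>"))
```

Each developer's page could be run through such a check before it is accepted.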
The elements of our sample Web page could be stored in a database. You could then create tables and fields based on the information stored in the database. Because an XML-aware computer application can identify the content of each element, the application can automatically put the correct element in the correct table and field.
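As a rough illustration of this idea, the following Python sketch reads the links out of a cut-down copy of the sample page and stores them in a database table. The table name links, its three columns, and the function name load_links are hypothetical; an in-memory SQLite database is used only so that the example is self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET

# A cut-down copy of the second table row from the sample page.
PAGE = """<table><tr>
<td><CellContent cellname="Help Topic List"><ul>
<li><a href="FirstTimeVisitorInfo.htm">First-Time Visitor Information</a></li>
<li><a href="FreqAskedQ.htm">Frequently Asked Questions</a></li>
</ul></CellContent></td>
<td><CellContent cellname="Shipping Links"><ul>
<li><a href="Rates.htm">Rates</a></li>
</ul></CellContent></td>
</tr></table>"""


def load_links(conn, xml_text):
    """Store one row per hyperlink, tagged with the name of its CellContent."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links (cellname TEXT, href TEXT, label TEXT)"
    )
    root = ET.fromstring(xml_text)
    for cell in root.iter("CellContent"):
        for a in cell.iter("a"):
            conn.execute(
                "INSERT INTO links VALUES (?, ?, ?)",
                (cell.get("cellname"), a.get("href"), (a.text or "").strip()),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_links(conn, PAGE)
rows = conn.execute("SELECT cellname, href FROM links ORDER BY href").fetchall()
for row in rows:
    print(row)
```

The cellname attribute tells the application which group each link belongs to, so no human has to sort the links into the table.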
You can also define the content in any manner you see fit. In our sample document, we added a CellContent element. You could have added numerous elements throughout the document to identify the content of each section. You could also have added attributes to existing elements. For example, you could have defined the ul element as follows:
<ul type="" comment=""></ul>
These additional attributes and elements can then be used by an application to catalog the content of your documents. Imagine using these tags to build the search indexes for your Web site.
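A very small search index along these lines might look like the following Python sketch. It maps each word found inside a CellContent element to the cellname values of the sections that contain it. The function name build_index and the sample text are assumptions for this illustration; a real index would also need to handle ranking, common words, and multiple pages.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

PAGE = """<body>
<CellContent cellname="Help Topic List">
  <a href="FreqAskedQ.htm">Frequently Asked Questions</a>
</CellContent>
<CellContent cellname="Shipping Links">
  <a href="Rates.htm">Rates</a>
  <a href="Returns.htm">Returns</a>
</CellContent>
</body>"""


def build_index(xml_text):
    """Map each lowercased word to the set of cellname values it appears under."""
    index = defaultdict(set)
    root = ET.fromstring(xml_text)
    for cell in root.iter("CellContent"):
        name = cell.get("cellname", "")
        for text in cell.itertext():
            for word in text.lower().split():
                word = word.strip(".,;:!?")
                if word:
                    index[word].add(name)
    return index


index = build_index(PAGE)
print(sorted(index["rates"]))
```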
These extra tags and attributes also make the document much more readable to humans. When you are designing a Web site, you can define the content of different elements rather than just drawing what the page should look like. Certain components, such as the navigation bars at the top and sides of the page and the footer section, are likely to be shared by many pages. These components can be identified and can be added to the standard template. The developer will need to change only the elements on the page that differ from one page to the next.
In our sample template, you created elements that could be used to build a Web document. When it came time to add a new row, you copied the row structure and pasted a new row into the document. This new row had the entire structure already built into it. The same technique was used to duplicate several other elements, including the li element, the font element, the p element, and the a element.
Reusing elements that contain attributes and child elements guarantees that the entire document will be uniform. When you are building documents, this uniformity will help ensure that you are following the rules for the document. Reusable elements will also make it easier to build the entire document since you are building the document from predefined pieces. For example, it would be easy to include the additional h elements by reusing the p element. You would only need to insert the h2, h3, and h4 elements and copy and paste three p elements. In this example, you are reusing the p element. Figure 3-13 illustrates this.
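The Duplicate Subtree command has a close counterpart in code. The sketch below, written in Python, copies an li element together with its a child and the child's attributes. The function name duplicate_subtree is hypothetical, and unlike XML Notepad this version simply appends the copy to the end of the parent.

```python
import copy
import xml.etree.ElementTree as ET

# One empty list item, as it appears in the template.
ul = ET.fromstring('<ul type=""><li><a href="" target=""></a></li></ul>')
template_li = ul.find("li")


def duplicate_subtree(parent, child):
    """Append a deep copy of child (with all descendants) to parent and return it."""
    clone = copy.deepcopy(child)
    parent.append(clone)
    return clone


new_li = duplicate_subtree(ul, template_li)
link = new_li.find("a")
link.set("href", "Returns.htm")
link.text = "Returns"
print(ET.tostring(ul, encoding="unicode"))
```

The copy arrives with the same attributes as the original, so only the values that differ need to be filled in, and the original element is left unchanged.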
Figure 3-13. XML Notepad, showing added h2, h3, and h4 elements.
Other programs are available that allow you to view XML documents. Some of these applications will work with DTDs and are discussed in Chapter 4. For viewing and editing an XML document, you can use XML Spy (http://xmlspy.com). Figure 3-14 shows the final Help.htm file displayed in XML Spy.
Figure 3-14. The Help.htm file displayed in XML Spy.
You can also view XML documents using XML Pro (http://www.vervet.com). XML Pro provides a window that lists the elements you can insert. Figure 3-15 shows the final Help.htm file displayed in XML Pro.
Figure 3-15. The Help.htm file displayed in XML Pro.
You can download trial versions of these programs from their Web sites. Use the tool that works best for you.
To be well formed, your XML document must meet the following requirements:
- The document must contain a single root element.
- Every element must be correctly nested.
- Each attribute can have only one value.
- All attribute values must be enclosed in double quotation marks or single quotation marks.
- Elements must have begin and end tags, unless they are empty elements.
- Empty elements are denoted by a single tag ending with a slash (/).
- Isolated markup characters are not allowed in content. The special characters <, &, and > are represented as &lt;, &amp;, and &gt; in content sections.
- A double quotation mark is represented as &quot;, and a single quotation mark is represented as &apos; in content sections.
- The sequences <![CDATA[ and ]]> cannot be used as literal text.
- If a document does not have a DTD, the values for all attributes must be of type CDATA by default.
Rules 1 through 6 have been addressed in this chapter. If you need to use the special characters listed in rules 7 and 8, be sure to use the appropriate replacement characters. The sequence in rule 9 has a special meaning in XML and so cannot be used in content sections and names. We will discuss this sequence in Chapter 5. The CDATA type referred to in rule 10 consists of any allowable characters. In our sample document, the values for the attributes must contain characters, which they do.
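Several of these rules can be tried out with a few lines of code. The following Python sketch is one way to do so, not a complete conformance test: the function is_well_formed (a hypothetical name) simply reports whether the standard parser accepts the text, and escape from xml.sax.saxutils produces the replacement characters described in rules 7 and 8. The QUOTES dictionary and the sample string are assumptions for this illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() handles &, <, and > by default; the quotation marks are added here.
QUOTES = {'"': "&quot;", "'": "&apos;"}


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as a well-formed document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


raw = 'Rates < $5 & "free" returns'
safe = escape(raw, QUOTES)

print(is_well_formed(f"<p>{raw}</p>"))    # isolated < and & in content
print(is_well_formed(f"<p>{safe}</p>"))   # same text with replacement characters
print(is_well_formed("<a><b></a></b>"))   # elements not correctly nested
print(is_well_formed("<a/><b/>"))         # more than one root element
```

When the escaped text is parsed, the application reading the document gets the original characters back.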
XML Notepad does not add the XML declaration to an XML document. The XML declaration is optional, but if it is provided, it must be the first line of the XML document. The syntax for the declaration is shown here:
<?xml version="version_number" encoding="encoding_declaration" standalone="standalone status"?>
The version attribute identifies the version of the XML standard that this document complies with. The encoding attribute names the character encoding used by the document, such as UTF-8 or UTF-16. By declaring the encoding, you can create documents in any language or character set. The standalone attribute specifies whether the document is dependent on other files, such as an external DTD (standalone="no"), or complete by itself (standalone="yes").
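An application can read these three values back after parsing. The sketch below uses Python's standard xml.dom.minidom module, whose Document object exposes version, encoding, and standalone attributes; the one-line sample document is an assumption made for this illustration.

```python
from xml.dom import minidom

# A minimal document with all three parts of the declaration present.
SOURCE = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><html/>'

doc = minidom.parseString(SOURCE)
print(doc.version)
print(doc.encoding)
print(doc.standalone)   # reported as a Boolean rather than "yes" or "no"
```

If the declaration is omitted, these attributes are left as None, which is what you would see for a document saved directly from XML Notepad.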
The final XML document is shown here:
<?xml version="1.0" standalone="yes"?>
<html>
  <head>
    <title>Northwind Traders Help Desk</title>
    <base target=""><!--Default link for page--></base>
  </head>
  <body text="#000000" bgcolor="#FFFFFF" link="#003399" alink="#FF9933" vlink="#996633">
    <!--Default display colors for entire body-->
    <a name="Top"><!--Anchor for top of page--></a>
    <table border="0" frame="" rules="" width="100%" align="" cellspacing="0" cellpadding="0">
      <!--Rules/frame is used with border.-->
      <tr align="" valign="Center" bgcolor="">
        <td rowspan="" colspan="2" align="Center" valign="" bgcolor="">
          <!--Either rowspan or colspan can be used, but not both.-->
          <!--Valign: top, bottom, middle-->
          <CellContent cellname="Table Header">
            <h1 align="Center">Help Desk</h1>
          </CellContent>
        </td>
      </tr>
      <tr align="" valign="Top" bgcolor="">
        <td rowspan="" colspan="" align="Left" valign="" bgcolor="">
          <CellContent cellname="Help Topic List">
            <p align="">
              <ul type="">
                <font face="" color="" size="3">
                  <b>For First-Time Visitors</b>
                </font>
                <li>
                  <a href="FirstTimeVisitorInfo.htm" target="">First-Time Visitor Information</a>
                </li>
                <li>
                  <a href="SecureShopping.htm" target="">Secure Shopping at Northwind Traders</a>
                </li>
                <li>
                  <a href="FreqAskedQ.htm" target="">Frequently Asked Questions</a>
                </li>
                <li>
                  <a href="NavWeb.htm" target="">Navigating the Web</a>
                </li>
              </ul>
            </p>
          </CellContent>
        </td>
        <td rowspan="" colspan="" align="Left" valign="" bgcolor="">
          <CellContent cellname="Shipping Links">
            <p align="">
              <ul type="">
                <font face="" color="" size="">
                  <b>Shipping</b>
                </font>
                <li>
                  <a href="Rates.htm" target="">Rates</a>
                </li>
                <li>
                  <a href="OrderCheck.htm" target="">Checking on Your Order</a>
                </li>
                <li>
                  <a href="Returns.htm" target="">Returns</a>
                </li>
              </ul>
            </p>
          </CellContent>
        </td>
      </tr>
    </table>
  </body>
</html>
The final document looks basically like an HTML document and works like one, but it meets the criteria for being a well-formed XML document. Notice that all of the tags nest properly, all tags are closed, and the root element (<html></html>) encloses all the other elements.
You could have written all of this XML code manually, but it would have been more difficult and you would have been more likely to make a syntax error. There are various XML editors such as XML Authority, XML Instances, and XML Spy that allow you to focus on the structure of your document and the elements that will go into your document without being concerned about the syntax. Of course, once you have finished with the XML editor, you should review the final document to verify that the XML code is actually what you want.
Well-formed XML documents can be created by using elements, attributes, and comments. These components define content within the document. Using these definitions, applications can be created that will manipulate the content.
The requirements for well-formed documents that have been addressed in this chapter include having a single root element, having properly nested elements, having no duplicate attribute names in an element, and enclosing all attribute values in single or double quotation marks. Using XML editors to create XML documents will allow you to focus on defining the structure of your document, the first step in building a well-formed XML document.
To make a class of documents with the same format as the one we created in this chapter, you will want to create a DTD to validate the entire class of documents. In the next chapter, you will learn how to create DTDs, and you will create one specifically for this document.
