XHTML (eXtensible Hypertext Markup Language), which is based on XML, took the place of HTML as the recommended markup more than a year ago (see www.w3.org/MarkUp/ for more information). In fact, the most current Web markup standard is XHTML 1.0. Very few designers, however, are using XHTML, because of the vast confusion XML—and by extension XHTML—has caused Web site builders.

But XHTML is a great intermediate step—a bridge between HTML and XML that not only is useful but can help Web builders conquer their fears of XML.

HTML is an application of SGML (Standard Generalized Markup Language). SGML is what is known as a metalanguage: a language for defining other document markup languages, such as HTML. SGML is complicated and syntactically strict, but its applications span industry and business. When HTML was created, most of SGML's complexity and strictness were discarded. The idea behind HTML was a simple language that worked with Internet protocols.

HTML focuses on the structure of a document. Originally designed for text, it was never meant to be a language of design. HTML included very few facilities for formatting documents, though some tags (such as <font> and <table>) were added to make documents more user-friendly. But the increasing popularity of the Web meant that designers had to do more: Everyone wanted sites that were attractive and interactive. Too often, HTML was stretched far beyond its limits, resulting in complex code that was difficult to maintain. And despite the introduction of Cascading Style Sheets (CSS), which are intended to separate document structure from presentation, browser support has been extremely problematic, forcing Web developers to rely on HTML alone for presentation. This has also resulted in extremely bloated Web browsers that carry a great deal of error-recovery code to forgive poorly written markup.

The fact is, a fair amount of the HTML created today does not conform to HTML's rules. This is not necessarily a problem in today's browsers, but it will certainly create problems in the newer user agents such as PDAs, cell phones, and other devices.

HTML 4.0 sought to solve some of these problems by reexamining document structure. Initiatives in HTML 4.0 included:

* Adherence to one of three document type definitions, or DTDs (Strict, Transitional, and Frameset), which form the basis of XHTML 1.0 (see "The XHTML DTDs" below).
* Greater accessibility for people with disabilities.
* Separation of presentation and content via style sheets.
* An awareness of the growing need to internationalize documents.

In all cases, XML's focus is data. The markup is simply meant as a way for user agents to take that data and do something with it. Presentation is left to style sheets and not included in the document itself. The key to XML is that you can customize the language to suit your needs by combining the tags, DTDs, and other elements you create.

Figure 1: In this example of XML markup, the tags are both machine-readable and intelligible to humans.

<?xml version="1.0" standalone="yes"?>
<AddressBook>
<entry>
<name>Jon E. Persen</name>
<address>4445 East Hilltop Road</address>
<city>Soulville</city>
<state>CA</state>
<zip>000000</zip>
</entry>
</AddressBook>
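
As a rough illustration of a user agent taking that data and doing something with it, the following Python sketch reads the address book from Figure 1 with the standard library's xml.etree.ElementTree module. The names ADDRESS_BOOK and read_entries are invented for this example, and this is only one of many ways a program might consume such a document.

```python
import xml.etree.ElementTree as ET

# The address book from Figure 1, embedded as a string for the example.
ADDRESS_BOOK = """<?xml version="1.0" standalone="yes"?>
<AddressBook>
<entry>
<name>Jon E. Persen</name>
<address>4445 East Hilltop Road</address>
<city>Soulville</city>
<state>CA</state>
<zip>000000</zip>
</entry>
</AddressBook>"""


def read_entries(xml_text):
    """Return one dict per <entry>, mapping each child tag name to its text."""
    root = ET.fromstring(xml_text)
    return [
        {field.tag: field.text for field in entry}
        for entry in root.findall("entry")
    ]


if __name__ == "__main__":
    for entry in read_entries(ADDRESS_BOOK):
        print(entry["name"], "-", entry["city"], entry["state"])
```

Because the tags describe the data rather than its appearance, the program can pick out a name or a city without knowing anything about how the address book will eventually be displayed.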

If HTML is where we're starting and XML is the ideal, how do we get there from here? Facing the problems inherent to HTML, W3C members studied HTML in the context of XML's paradoxical strictness and flexibility. What they came up with is XHTML, a reformulation of HTML into an XML application.

In other words, HTML 4.01 has been reworked to conform to XML syntax rules. XHTML is an XML document with an HTML vocabulary, which is why it's readable across platforms as well as on past and present browsers. Figure 2 shows XHTML 1.0 markup. You'll immediately see some differences. As in Figure 1, there's an XML declaration at the top. Just below that, you'll find the document type declaration, which states that the document conforms to the XHTML 1.0 Transitional DTD. In the opening HTML tag, there's the attribute xmlns (XML Namespace), which in this case identifies the XHTML namespace. (In XML and XHTML documents, all elements belong to a particular namespace, which is like a list with a unique name. The idea is that elements can be used in other documents and that the tags you define won't conflict with other people's tags.)

Figure 2: This sample document points to the XHTML 1.0 Transitional DTD and the XHTML namespace.

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Welcome to Danny's Web Site</title>
</head>
<body>
<h1>Hi There!</h1>
<p>Welcome to my Web site. Here you can:</p>
<ul>
<li>Read about me</li>
<li>See pictures of me and my family</li>
<li>Listen to my favorite music</li>
</ul>
<p>Cool! Let's <a href="next.html">get started</a></p>
</body>
</html>
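
To make the namespace idea a little more concrete, here is a small Python sketch that parses a shortened copy of the Figure 2 document with the standard library's xml.etree.ElementTree module. The names XHTML_NS and PAGE are made up for this example. The parser reads the document type declaration but does not fetch or validate against the DTD, so this shows only how an XML tool sees the namespace, not full validation.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A shortened version of Figure 2. The DOCTYPE is parsed, but the DTD
# itself is not downloaded or checked by this parser.
PAGE = """<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Welcome to Danny's Web Site</title>
</head>
<body>
<h1>Hi There!</h1>
<ul>
<li>Read about me</li>
<li>See pictures of me and my family</li>
</ul>
</body>
</html>"""

root = ET.fromstring(PAGE)

# ElementTree reports each tag as "{namespace}localname".
print(root.tag)

# Lookups must name the namespace too; a bare "li" would match nothing.
items = [li.text for li in root.iter("{%s}li" % XHTML_NS)]
print(items)
```

Every element in the page carries the XHTML namespace, which is how an XML tool can tell this document's li apart from an li element defined in someone else's vocabulary.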

When you look at the document itself, however, you see familiar HTML tags, because XHTML uses HTML vocabulary. Are there any differences at all? Actually, there are quite a few, thanks to the influence of XML. But you shouldn't have much trouble if you keep a few rules in mind. Some of the most significant syntax rules in XHTML 1.0 include:

* All element and attribute names must be lowercase: <p align="right"> ... </p>.
* All nonempty elements must have closing tags: <li> ... </li>.
* All empty elements must terminate with a trailing slash: <br />.
* All attribute values must be quoted: <div align="center">.

In HTML, most of these conventions are optional, and several forms are acceptable. In XHTML, the rules have no exceptions.
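
One way to see the difference is to feed both styles of markup to an XML parser. The Python sketch below uses the standard library's xml.etree.ElementTree module; the helper is_well_formed and the fragments in SAMPLES are hypothetical and written only for this illustration. It checks XML well-formedness, which is a weaker test than validating against an XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments: each pair shows an HTML habit and its XHTML form.
SAMPLES = [
    # Closing tags are required.
    ("<ul><li>one<li>two</ul>", "<ul><li>one</li><li>two</li></ul>"),
    # Empty elements need a trailing slash.
    ("<p>line<br></p>", "<p>line<br /></p>"),
    # Attribute values must be quoted.
    ("<div align=center>x</div>", '<div align="center">x</div>'),
    # Names are case-sensitive, so opening and closing tags must match.
    ("<P>text</p>", "<p>text</p>"),
]

if __name__ == "__main__":
    for html_style, xhtml_style in SAMPLES:
        print(is_well_formed(html_style), is_well_formed(xhtml_style))
```

Each HTML-style fragment is rejected and each XHTML-style fragment is accepted. Note that a consistently uppercase <P> ... </P> would still pass this check, because it is well-formed XML; it is the XHTML DTDs, which declare only lowercase names, that rule it out.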

XHTML, like HTML 4.0 and XML, asks its authors to separate content from presentation. Thus, you can employ style sheets via CSS or XSLT (Extensible Stylesheet Language Transformations). With the resulting streamlined syntax, XHTML becomes very attractive as a means of marking up documents for use in devices such as PDAs.

The XHTML DTDs

Document Type Definitions (DTDs) specify the markup rules for particular types of documents so that they can be understood by user agents. Validating a document means checking its markup against a DTD and reporting errors.

For a document to conform to XHTML 1.0, it must conform to one of the three DTDs first described in the HTML 4.0 standard:

Strict DTD.
A DTD that excludes the presentation attributes and elements that W3C expects to phase out as support for style sheets matures.

Transitional DTD.
A DTD that includes the aforementioned presentation attributes and elements.

Frameset DTD.
A DTD for documents that use frames. It is identical to the Transitional DTD except that in frameset documents, the frameset element replaces the body element.

The three DTDs are very similar, and the W3C recommends that you use the Strict DTD if possible. Because the Transitional DTD is the most forgiving, however, it's likely to be the most used DTD for some time.
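
A document says which of the three DTDs it follows through its document type declaration. As a small sketch, the Python snippet below builds that declaration from the public and system identifiers published in the XHTML 1.0 Recommendation. The table name XHTML_DTDS and the helper doctype_for are invented for this example.

```python
# Public and system identifiers for the three XHTML 1.0 DTDs.
XHTML_DTDS = {
    "strict": (
        "-//W3C//DTD XHTML 1.0 Strict//EN",
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
    ),
    "transitional": (
        "-//W3C//DTD XHTML 1.0 Transitional//EN",
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd",
    ),
    "frameset": (
        "-//W3C//DTD XHTML 1.0 Frameset//EN",
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd",
    ),
}


def doctype_for(flavor):
    """Build the DOCTYPE for 'strict', 'transitional', or 'frameset'."""
    public_id, system_id = XHTML_DTDS[flavor.lower()]
    return '<!DOCTYPE html PUBLIC "%s"\n"%s">' % (public_id, system_id)


if __name__ == "__main__":
    print(doctype_for("strict"))
```

Switching a page from Transitional to Strict is then a matter of changing the declaration and removing whatever presentation markup the validator reports.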

In terms of best practices, XHTML is an opportunity for all Web designers to learn markup methods that adhere to standards and that ease you into the world of extensibility. XHTML can open your horizons to other XML applications; you'll be able to look at WML