The X stands for Extensible
XHTML, a mixture of HTML and eXtensible Markup Language (XML), is the successor to HTML 4, the previous standard in Web languages.
Before we examine how XHTML differs from HTML, let's explore how it came about. Cast your mind back to the mid to late 1990s. 'Twas a time of heady enthusiasm. The Internet industry was booming and more and more people were creating Web documents and wanting them to look good.
Responding to this demand, the Web standards setters continued to expand the set of tags allowed in HTML to accommodate demands for presentational control, and HTML documents tended to get very messy. HTML 4 was not very demanding in how its code was formed, and browsers were designed to be very forgiving of mistakes. They would try to render buggy pages the best they could.
This encouraged Web developers to create sloppy documents that were riddled with malformed code and tag soup. Browsers, in turn, were forced to dedicate more and more resources to deciphering bad code, and thus, we ended up with browser software that was very big and bloated.
An example of tag soup is the code for the following navigation menu for the Harvard University Web site:
<table bgcolor="#cdd397" border="0" cellpadding="0" cellspacing="0" width="650">
<tbody><tr>
<td valign="top"><img src="images/shield3.gif" alt="Harvard University shield" border="0" height="25" width="117"></td>
<td valign="top"><a href="http://www.harvard.edu/"><img src="images/home2.gif" alt="Home" name="nav01" border="0" height="25" width="47"></a></td>
<td><img src="images/nav_bullet.gif" border="0" height="25" width="14"></td>
<td><a href="http://www.harvard.edu/admissions/" onmouseover="imgOn('nav02')" ;="" onmouseout="navOff('nav02')"><img src="images/admissions.gif" alt="Admissions" name="nav02" border="0" height="25" width="166"></a></td>
<td><img src="images/nav_bullet.gif" border="0" height="25" width="14"></td>
<td><a href="http://atwork.harvard.edu/" onmouseover="imgOn('nav03')" ;="" onmouseout="navOff('nav03')"><img src="images/employment.gif" alt="Employment" name="nav03" border="0" height="25" width="80"></a></td>
<td><img src="images/nav_bullet.gif" border="0" height="25" width="14"></td>
<td><a href="http://lib.harvard.edu/" onmouseover="imgOn('nav04')" ;="" onmouseout="navOff('nav04')"><img src="images/libraries.gif" alt="Libraries" name="nav04" border="0" height="25" width="59"></a></td>
<td><img src="images/nav_bullet.gif" border="0" height="25" width="14"></td>
<td><a href="http://www.harvard.edu/museums/" onmouseover="imgOn('nav05')" ;="" onmouseout="navOff('nav05')"><img src="images/museums.gif" alt="Museums" name="nav05" border="0" height="25" width="64"></a></td>
<td><img src="images/nav_bullet.gif" border="0" height="25" width="14"></td>
<td><a href="http://www.harvard.edu/arts/" onmouseover="imgOn('nav06')" ;="" onmouseout="navOff('nav06')"><img src="images/arts.gif" alt="Arts" name="nav06" border="0" height="25" width="33"></a></td>
</tr>
</tbody></table>
While Harvard is a fine institution, this code sample has all the trademark ingredients of tag soup. To name a few:
1. HTML table elements are used to control or contain the list of links.
2. JavaScript-powered rollovers are used (the "onmouseover" and "onmouseout" attributes).
3. Presentational attributes like bgcolor and border are used in the HTML table tag.
With more and more sites coding in this method, what was needed was a way to enforce clean and logical markup that could evolve with new technology.
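To see where this is headed, here is one way the same menu could be marked up more leanly. This is only a sketch: the link addresses and labels come from the sample above, but the id value "nav" is a made-up name, and it assumes the bullets, colors, and rollover effects would be handled in a style sheet instead of in the markup.

```html
<!-- Hypothetical rewrite of the menu as a plain list of links. -->
<!-- The id "nav" is an assumed hook for a style sheet. -->
<ul id="nav">
  <li><a href="http://www.harvard.edu/">Home</a></li>
  <li><a href="http://www.harvard.edu/admissions/">Admissions</a></li>
  <li><a href="http://atwork.harvard.edu/">Employment</a></li>
  <li><a href="http://lib.harvard.edu/">Libraries</a></li>
  <li><a href="http://www.harvard.edu/museums/">Museums</a></li>
  <li><a href="http://www.harvard.edu/arts/">Arts</a></li>
</ul>
```

The list says what the content is (a set of links) and leaves how it looks to CSS.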
Along Comes XHTML
This sad state of affairs might have continued ad infinitum had new Web-enabled devices such as cell phones and PDAs not arrived on the market. These devices simply didn't have the resources to interpret bad code. (And why should they?) Instead, these devices did what the browsers could not, and created a demand for the adoption of XHTML.
XML is a meta-language used to create other markup languages. It's a markup language like HTML, but much stricter in its rules for creating valid code. The World Wide Web Consortium (W3C) decided to combine XML with HTML, creating XHTML 1.0.
XHTML looks much like HTML, but its rules for code are much stricter.
While these constraints place more demands on the Web coder (at least initially), in the end they make for markup that works in alternative browsers (allowing for use in various technologies and ensuring greater accessibility) and that will stand the test of time by working in future browsers as well.
As most modern browsers know how to read XHTML, there is little reason not to make the move from HTML to XHTML. And if this is your first dive into Web encoding, starting with XHTML is a great way to develop good coding practices that will benefit you in the long run.
Writing Web Pages in XHTML
Let's look at a basic XHTML page:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="application/xhtml+xml" />
<title>My Great Web Page</title>
</head>
<body>
<h1>My Great Web Page</h1>
<p>Aliquam libero velit, elementum eu, venenatis id, consequat eget, diam. Suspendisse posuere, sem sed condimentum sagittis, tortor est viverra augue, quis fermentum neque nulla eget nulla.</p>
<p>Nulla aliquet varius velit. Etiam id nibh ac diam aliquet fringilla.</p>
</body>
</html>
It looks a lot like an HTML document, but let's take a closer look at its basic structure.
At the top of the document, we've specified the character set the document is encoded in (here it is 8-bit Unicode). Next comes a document type (DOCTYPE) declaration, which lets the browser know that it is an XHTML 1.0 Strict document. (More on that in a minute!)
You'll also note that, unlike the HTML documents we've looked at so far, all the paragraphs include a closing tag. XHTML requires that all tags be closed. This is just one example of the ways in which XHTML is stricter than HTML. We'll get into the specifics in a bit.
Now, let's start learning XHTML by examining the necessary elements of the document.
Discerning DOCTYPES
All XHTML documents must have a DOCTYPE. You'll recall from Lecture One that the document type declaration tells the browser what version of XHTML (or HTML) you're using, and thus how to interpret and render the page. This declaration comes at the beginning of the XHTML document, and there are three basic types of XHTML documents you can declare:
XHTML 1.0 Strict
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
The strict DOCTYPE is used when you want clean code in which presentation (controlled by style sheets) is separated from content.
XHTML 1.0 Transitional
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
The transitional DOCTYPE is used when you still want to include presentational elements within the XHTML code (for example, when you are working with browsers that don't understand CSS).
XHTML 1.0 Frameset
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
The Frameset DOCTYPE is used when you want to include frames in your Web page.
A note about frames: Frames allow for the display of more than one Web page at a time. In the 1990s, frames were common because the technique allowed Web designers to place Web documents such as navigation, header/logo, and content in their own respective areas, or frames.
However, due to their poor accessibility (screenreaders can't process more than one Web document at a time) and the advancement in CSS support by mainstream browsers, the use of frames is discouraged.
Declaring Encoding
Let's briefly identify the different elements above the head of the XHTML page. Above the DOCTYPE information, you will have noticed some description of character encoding:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
Character encoding is a way of representing characters (letters, numbers, and other symbols) as numbers that a computer can understand. For example, in ASCII, the uppercase letter "A" is represented by the number 65. Text files like XHTML documents are saved with a certain character encoding, and browsers interpret the file depending on the kind of encoding they think was used. XHTML documents require that you declare what kind of character encoding you are using. If you declare the wrong kind of encoding, then your page may be rendered as a bunch of nonsense.
So how do you know what kind of character encoding to use? For most Web pages, 8-bit Unicode (or UTF-8) is more than adequate, as it accommodates all of the Latin characters, as well as many non-Latin characters (like Greek and Chinese). (Unicode is the successor to—and is backwards compatible with—ASCII. Unicode supports many more characters than ASCII, which only represents 128 characters.) Most relevant applications (text editors and modern browsers) support UTF-8. To declare the encoding in an XHTML document, include the following line at the top of your document:
<?xml version="1.0" encoding="UTF-8"?>
Note: This XML prolog is an optional line of code, meaning you don't need to put it into your Web document. This is good because some Web servers can't process this line if it's in your document.
Some (not all) hosts have a programming language called PHP set to process files with a .html file extension in addition to .php files. On some configurations, a block of PHP code can open with <?, the same characters that start the XML prolog, which can confuse the Web server.
So, if your Web page isn't showing up in your browser and there's an error on line 1 of your document, try removing this line to see if that fixes the problem.
For the html element within an XHTML document, you assign the attribute xmlns:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
This attribute and its value set the XML namespace, which tells the browser that the element names in the document belong to XHTML and lets it resolve conflicts when another XML vocabulary uses the same names. This isn't a problem with HTML 4 documents, but with an XHTML document, which is an HTML document wrapped in an XML structure, it's a very real possibility.
Finally, you'll have noticed a space for meta information:
<meta http-equiv="Content-Type" content="application/xhtml+xml" />
This element tells the browser what kind of data it is receiving. Most of the time this information is set on the server, which sends the Web document to your browser. However, if you don't have access to set it properly on the server, this element does that for you. (Typically I don't bother with anything on the server-side, if I can get away with it, as that makes me more of a geek than I need to be.)
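The content value can also name the character encoding. As a sketch of a common variation, assuming the page is saved as UTF-8 and served to browsers as ordinary text/html, the element might read:

```html
<!-- Assumes the file is saved as UTF-8 and served as text/html. -->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
```

If you use this form, the charset value should match the encoding the file was actually saved in.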
Closing Tags
Unlike HTML, which lets you open certain elements like <p>, <img>, or <li> without explicitly closing them, in XHTML, you must close all of your tags.
For example, this would be invalid in XHTML:
<ul>
<li>Here is a list item.
<li>Here is another.
</ul>
You must close all of your list items, like this:
<ul>
<li>Here is a list item.</li>
<li>Here is another.</li>
</ul>
Case Sensitivity
While HTML is indifferent to what case your tags are in, XHTML requires that all tags and attribute names be in lowercase.
For example, you might have an HTML document that you want to convert to XHTML, and all of the tags are in uppercase, as in this:
<A HREF="about.html">About</A>
To make this bit of code valid XHTML, change the tag and attribute names to lowercase (the link text itself can stay as it is):
<a href="about.html">About</a>
Proper Tag Nesting
Another requirement of XHTML is that all your tags must be properly nested inside of one another. That is, if you open a second tag inside of a first tag, that second tag must be closed before the first one is. The following is an example of improper nesting:
<p>Nest your markup <em>right!</p></em>
Note that the <em> tag was opened after the <p> tag, but closed after the <p> tag was closed. This code should have been marked up like this:
<p>Nest your markup <em>right!</em></p>
Create a basic XHTML page from the Word document found in your lesson3 download folder in the semantic directory (SemanticSchool.doc). Make sure to construct the document correctly (declare your DOCTYPE and encoding) and follow all of the XHTML rules.
When you're done, you can attach the CSS document in your semantic folder to style it:
<head>
<title>Semantic School</title>
<link rel="stylesheet" type="text/css" href="css/semantic.css" />
</head>
You can check your results against this page. The finished (validated) version is also available in the lesson3 download folder, semantic.html.
More XHTML Rules
Closing Tags and Closing Empty Elements
Some tags, like <br> (line breaks) or <hr> (horizontal rules), don't actually contain any content. They are known as empty elements. The <p> tag wraps a paragraph, and <li> encloses a list item. But <hr> isn't there to indicate the role of some text; all it does is show where a horizontal line should be placed. So, it doesn't really make sense for it to have a closing tag, since there isn't going to be any content between those tags.
How, then, should you close empty elements in XHTML? You are allowed to add a closing tag to all of these elements, but to save you time and effort, you can also close the opening tag itself (that is, make it "self-closing"), by adding a space and a forward slash ("/") at the end of the tag.
This, for example, is incorrect XHTML:
<p>There once was a man from the sticks <br>
Who liked to compose limericks. <br>
But he failed at the sport, <br>
For he wrote 'em too short. <br>
</p>
This is right, as we are self-closing all of the line breaks:
<p>There once was a man from the sticks <br />
Who liked to compose limericks. <br />
But he failed at the sport, <br />
For he wrote 'em too short. <br />
</p>
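The same shorthand applies to other empty elements, such as horizontal rules and images. A short sketch (the image file name and alt text here are made up for illustration):

```html
<!-- hr and img have no content, so each one closes itself -->
<hr />
<p><img src="images/limerick.gif" alt="A man from the sticks" width="120" height="80" /></p>
```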
Dealing with Many Attributes
XHTML also makes some demands regarding attributes. Attributes are those additional terms and values that can be added to modify elements, such as by adding a width or height to a table cell.
1. Attribute Values Must Be Quoted
HTML allowed for unquoted attribute values, when the values are simply numbers:
<table cellpadding=5>
In XHTML, you must place quotes around all attribute values, whether they are a number or string:
<table cellpadding="5">
2. Do Not Minimize Attributes
HTML allowed for certain attributes to be abbreviated, or minimized. If you wanted to put a check mark in a form's checkbox, you could just code the following:
<input type="checkbox" checked>
In XHTML, however, all attributes must have a value. So, you would have to change the code to the following:
<input type="checkbox" checked="checked" />
3. The Name and ID Attributes
The name attribute has been used with certain elements (like anchors and form elements) to provide an identifier for the element that can be referenced in other parts of the document. The name attribute, however, is being replaced with the id attribute. The difference between the name and id attribute is that any id value can only be used once per document. For example, suppose you mark up an anchor like this:
<a id="part_one">
That means that you can't call another anchor "part_one" within that document. So, you are forced to give a unique identifier to each of these elements.
Unfortunately, not all browsers yet understand the id attribute. So, for now, to ensure backwards-compatibility, use both the id and name attribute, like this:
<a id="part_one" name="part_one">
A bit of a pain, but it is necessary to ensure that your page is fully compatible with all browsers.
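To show why the identifier matters, here is a sketch of a link that jumps to that anchor from elsewhere on the same page. The heading and link text are invented for the example; the "#" in the href points at the id (or name) value.

```html
<!-- A link that points to the anchor below -->
<p><a href="#part_one">Skip to Part One</a></p>

<!-- The anchor itself, identified with both id and name -->
<h2><a id="part_one" name="part_one">Part One</a></h2>
```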
4. Some Elements Not Allowed
Strict XHTML, as specified by the XHTML 1.0 Strict DOCTYPE, touts complete separation of presentation from content. There are certain tags and attributes in HTML that are explicitly intended to control the look of page elements, which, according to strict XHTML standards, should be controlled exclusively by CSS. These presentational markup items are thus not allowed in strict XHTML.
Elements not allowed in strict XHTML include:
- center
- font
- iframe (which creates a frame that contains another document)
- strike (puts a line through the middle of text)
- u (underlines text)
Attributes not allowed in strict XHTML:
- align (although it is allowed in the following table elements: col, colgroup, tbody, td, tfoot, th, thead, and tr)
- language
- background
- bgcolor
- border (allowed in the table tag)
- height (allowed in image and object tags)
- hspace
- name on elements such as img and form (use id instead; anchors and form controls can still carry name)
- noshade
- nowrap
- target
- text, link, vlink, and alink
- vspace
- width (allowed in image and object tags, and the following table tags: table, col, and colgroup)
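As a rough illustration of what these restrictions mean in practice, here is a centered, red notice written first with presentational markup and then in a form that fits the Strict rules. The class name "notice" and the text are assumptions made up for this sketch.

```html
<!-- Not allowed in strict XHTML: center and font are presentational -->
<center><font color="red">Tours fill up quickly!</font></center>

<!-- Allowed: the markup only identifies the paragraph... -->
<p class="notice">Tours fill up quickly!</p>

<!-- ...and a rule in the style sheet (or in the head) controls its look -->
<style type="text/css">
  p.notice { color: red; text-align: center; }
</style>
```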
In your lesson3 folder, you will find a folder called whisky. Within this folder is a file called whiskytours.html. Convert the page to Strict XHTML. Make sure you pay attention to case and to which attributes are allowed, and close all of your elements!
You can check your results against whiskytours_final.html in your whisky folder.
Troubleshooting Web Page Structure
How and Where to Test
While looking at your page in a browser does not necessarily alert you to coding errors, getting your page to look right in a browser is certainly an essential step in Web page design. Unfortunately, completing this step is often easier said than done.
The most frustrating thing about designing Web pages is the varying ways in which different browsers render the same code. Even the same browser in a different operating system can interpret a page differently. Despite the existence of Web standards that state how browsers should interpret and render code, different browsers comply with these standards to varying degrees. Internet Explorer, the browser currently used by the majority of computer users, is, ironically enough, notorious for its non-compliance with Web standards, though version 7.0 is a major step up from its predecessor.
What this means, practically speaking, is that a page that looks great in one browser may appear "broken," or look just awful, in another, even if it is technically valid. The Web designer is forced to test her site in various browsers (on various platforms) and employ various hacks and workarounds to get the site to look relatively consistent across browsers. Hopefully, as Web standards compliance improves, the need for such rigorous testing and use of hacks will lessen.
In the meantime, it is still imperative that you test your XHTML documents on various browsers, various browser versions, and, ideally, on different operating systems even if the page validates (conforms to Web standards).
It is best to start with browsers that more strictly adhere to Web standards, fix what is necessary to get the page working, then move on to less compliant browsers. If you start off by tailoring your code to browsers that don't interpret the code correctly to begin with, you tend to end up with code that is not clean and not compliant.
Start coding Web designs in a browser like Firefox, Safari, or Opera, all of which are very standards-compliant. Internet Explorer is known to flout Web standards (though version 7 is much better than 6 and below), so you'll want to test your pages on a couple of versions of IE once you get them working in Firefox, Safari, or Opera.
Test your work in Firefox, Safari, or Opera first... then Internet Explorer.
Getting Access to Different Browsers (on the Cheap)
Being able to test different browsers on different platforms can be a challenge. Internet Explorer 6 and up, for example, is not available for Mac or Linux, and you generally can't install more than one version of Internet Explorer on Windows. And seeing how pages look on different operating systems can be problematic for the majority of us who only use one.
There are, however, a few workarounds to see your pages in various browsers:
- Install a virtual machine or emulator. Using Parallels for OS X or Wine for Linux allows you to run Windows programs while running your native operating system. (And if you use Linux, check out the IEs4Linux project, which installs several versions of Internet Explorer on systems that run Wine.)
- Install multiple versions of IE on Windows. It can be done! While configuring Windows to run more than one version of IE is beyond the scope of this lecture, you're encouraged to check out the following page and learn how to do this on your own: http://www.positioniseverything.net/articles/multiIE.html
- Use a Web-based service. Browsershots.org and browsercam.com provide you with screenshots of your page running in various browsers, on various operating systems. While this doesn't give you an interactive view of your page on these various systems, you do get an idea of how it appears.
Validating XHTML
Once you're satisfied with your design, you'll want to validate your XHTML code to make sure it complies with all of the requirements and rules that we have established so far. A validator is a tool that checks the code for proper XHTML syntax by making sure that all tags are closed and properly nested. If there aren't any syntax errors, then the page is said to validate.
More than likely (particularly when you first start coding pages) your pages won't validate the first time around. It's surprisingly easy to forget a tag or a quote, but the validators are specific as to what the error is and where it is located.
At first, it may seem impossible to validate your code (particularly if you are used to writing malformed HTML), but it becomes easier as you get more used to what clean, syntactical code looks like.
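As a sketch of the kind of slip a validator catches, here is a fragment that breaks several of the rules covered so far, followed by a corrected version. The class name is made up for the example.

```html
<!-- Invalid: uppercase tags, an unquoted attribute value, -->
<!-- an unclosed <br>, and a list item with no closing tag -->
<UL>
<LI class=first>Line one<br>Line two
</UL>

<!-- Valid: lowercase tags, quoted value, every element closed -->
<ul>
<li class="first">Line one<br />Line two</li>
</ul>
```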
Plenty of validating tools are freely available:
- The W3C has a markup validation service at http://validator.w3.org. You simply upload your document, or provide its address, and it quickly gives you a validation report.
- Many robust text and HTML editors include validators (try HTML Kit for XP, or TextMate for OS X).
- WYSIWYG editors like Dreamweaver also come with their own built-in validators.
Gaining the coveted seal of approval from a validator.
Another valuable tool worth mentioning is HTML Tidy. This is a tool that actually fixes badly formed markup: it adds closing tags, changes mismatched tags, adds quotes to attribute values, and properly nests tags that are incorrectly nested.
Stand-alone tidy applications are available for various platforms, and there are online versions available as well. HTML Tidy also comes with many HTML and text editors. Learn more about it at: http://infohound.net/tidy/
Fixing Broken Images and Links
In addition to checking your page in a browser and validating its markup, you'll also want to make sure your page doesn't contain any broken links or images. You'll probably want to publish your page in order to check your links (as, ultimately, you'll want to make sure all of the linked documents and files are located on the server where they are supposed to be).
Finding broken images is relatively easy: It will be obvious when you look at the page in a browser whether it pulls up the images or not.
Finding broken links is a bit trickier. You could click on all of your hyperlinks, one by one, to check that they link to the right document. This can be quite tedious, however, when you have dozens of pages to check, each with dozens of links per page.
Thank goodness there are tools that check your links for you: The W3C has a link checker on its site, at http://validator.w3.org/checklink. And Dreamweaver also includes link-checking tools.
A large, disorganized site may turn up many broken links, as shown here in the Dreamweaver Link Checker report.
Once you find a broken image or link, you must fix it. If the link is pointing to a page outside of your site, it may be that the page no longer exists or has moved elsewhere. You might also want to check that such links are written as absolute links (make sure that the address begins with "http://" in the anchor tag).
For broken links that reference documents on your local site, you'll have to check to make sure that (1) the address wasn't simply mistyped, (2) the referenced document or link exists, and (3) that the referenced document or link is in the location specified by the hyperlink. It's all too easy to move a document or file within a Web site, and then forget to change all of the links that reference it.
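To make the distinction concrete, here is a sketch of both kinds of link. The file name about.html stands in for any page on your own site.

```html
<!-- Absolute link: the full address, starting with http:// -->
<a href="http://www.w3.org/">World Wide Web Consortium</a>

<!-- Relative link: found relative to the current page's location, -->
<!-- so it breaks if about.html is moved and the link is not updated -->
<a href="about.html">About</a>
```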
Check out your page in various browsers, and don't stop until your page validates.
As you've learned, creating valid XHTML pages is essential to preventing user errors or problems when making future changes to your Web page. In the following video tutorial, you can watch the validation process in action and how to fix common validation problems:
Running Time: 11:14.