Internal USGS Access Only

The Case for Valid HTML

The Web is being held back by sloppy markup. What is the role for validation services? How much can be done to automate fixup of bad markup? What roles should ISPs, Content providers, vendors of editing tools, and W3C take?

From "Shaping the Future of HTML", a W3C-sponsored conference announcement.

Strangely enough, almost no WYSIWYG or conversion tools produce completely valid HTML (for example, the output of Word's "Save As HTML" is awful). This has come about because the guidelines suggest that browsers should be very forgiving in what they accept. As a result, even badly constructed HTML will not usually crash a browser, which will display as much of the page as it can make sense of. This leeway has left little incentive to produce valid HTML.

Reasons for Producing Valid HTML (when it seems to display OK anyway):

Craftsmanship
If one is expending the effort to provide a product or service, strong attention to quality can only enhance the result.
Inclusiveness
Bad HTML that looks OK in Netscape and IE may still break in some other browser. The best way to make sure that all browsers can display a page is to stick to the HTML standards. We have to make sure everyone can read our information. It is also easier to write valid HTML than to acquire a copy of every browser that might be in use: IE, Netscape, Opera (popular in Europe), Safari (an Apple product), etc.
Longevity
A case study:

An early version of Netscape was quite forgiving about unclosed quotation marks in anchors. But then Netscape released a new version, with all sorts of new bells and whistles that everyone seemed to want, so people downloaded it en masse. And when those people went to sites whose authors hadn't bothered to check that all their quotes were closed, they found big chunks of pages missing and links not working. All because Netscape had changed the implementation of their parser, and the tolerance for unclosed quotes in the earlier version wasn't a designed feature, it was simply an accidental artifact of a particular implementation decision.

Validation is the only practical way to catch errors like this.

-- quoted from Eric Bohlman <ebohlman@netcom.com>, March 1998

Portability
If one wants to import HTML into an HTML editor, convert it to some other format, or even upgrade the version of HTML using an automated procedure, the only way to be sure that the file will be read or translated correctly is to use valid HTML. As an example, conversion of HTML to XML currently requires valid HTML.
Accessibility
Valid HTML is a level 2 WAI accessibility requirement. You are unlikely to purchase a screen reader just to find out how invalid HTML might be rendered, so validation is the practical check.
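
The Longevity and Portability points above can be illustrated with a strict parser. The following is a minimal sketch, not a validator: it uses the XML parser from the Python standard library as a stand-in for any tool that is less forgiving than a browser. The helper name is_well_formed and the file name report.html are made up for the illustration. Note that an XML parser checks only well-formedness, which is a weaker test than validation against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if a strict XML parser accepts the markup.

    Hypothetical helper for illustration only; well-formedness is a
    weaker check than validating against an HTML DTD.
    """
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Sample fragments; "report.html" is an invented file name.
samples = {
    "closed quote": '<p><a href="report.html">Report</a></p>',
    "unclosed quote": '<p><a href="report.html>Report</a></p>',
    "unclosed tag": "<p>First line<br>Second line</p>",
}

for label, markup in samples.items():
    # A forgiving browser may display all three; the strict parser
    # accepts only the first.
    print(label, "->", is_well_formed(markup))
```

The fragments that a forgiving browser might still display are rejected outright by the strict parser, which is the situation an HTML-to-XML converter or a stricter future browser puts the author in.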

A good overview of the reasons for producing valid HTML is "HTML Standards Compliance - Why Bother?" by Alan Richmond.

The USGS is an information repository. For throw-away pages such as advertisements, announcements, or schedules, use whatever tools are easiest. Reports and other scientific data, however, make the USGS more akin to a library than a promotional site. For these purposes the quality of the HTML matters.

Tools for Validating HTML

There are many "half-witted" HTML validators out there. If the product does not use SGML and DTDs, it is not a full-fledged validation product and can miss errors.
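
To see why a checker without a DTD can miss errors, consider a minimal sketch of a hypothetical tag-balancing check (the class and function names are invented here and do not describe any product listed below). It only confirms that every tag is closed in order, and knows nothing about which elements the HTML 4.01 DTD allows inside which.

```python
from html.parser import HTMLParser


class TagBalanceChecker(HTMLParser):
    """Hypothetical naive check: verifies only that tags nest and close."""

    def __init__(self):
        super().__init__()
        self.stack = []       # currently open tags
        self.balanced = True  # set to False on the first mismatch

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # An end tag must match the most recently opened tag.
        if not self.stack or self.stack.pop() != tag:
            self.balanced = False


def tags_balance(markup):
    """Return True if every start tag has a matching end tag in order."""
    checker = TagBalanceChecker()
    checker.feed(markup)
    checker.close()
    return checker.balanced and not checker.stack


# Invalid under the HTML 4.01 DTD (UL may contain only LI), yet "balanced".
print(tags_balance("<ul><p>Not a list item</p></ul>"))

# Valid HTML 4.01 (the </li> end tag is optional), yet reported unbalanced.
print(tags_balance("<ul><li>item</ul>"))
```

The naive check accepts the first fragment and rejects the second, which is wrong both times. Only a validator that reads the DTD knows the content model of each element and which end tags may be omitted.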

For Windows
"CSE HTML Validator" ($69), and "A Real Validator" ($25).
For Unix
Validate.
Online
Although less convenient, several online validators are available:

HTML Repair

A later section of this course, HTML Converters, covers methods for obtaining valid HTML from various sources.


Lab Exercise on HTML


"Mastering a Web Site" online course
Created and maintained by Lorna Schmid and David Boldt.
http://water.usgs.gov/usgs/training/webmaster/valid_html.html    
Last modified: Sun May 8 17:58:52 EDT 2005