"This Page Is Valid HTML 4.01 Strict!" means less than you may expect.
Many people probably think that their documents conform to the HTML specification if they pass the W3C validator. But this is not true, and the validator does not make such a claim.
If you take a closer look at the validator's FAQ, you can read:
It only means that a tool (not necessarily without flaws) has found the page to comply with a specific set of rules.
[...]
The Validator is based on James Clark's nsgmls SGML parser.
nsgmls is a generic SGML parser that has no knowledge of the HTML specification, except for the HTML DTD (the "specific set of rules") that is referenced by the document type declaration (the <!DOCTYPE ...> line of the document). But the DTD does not have the expressive power to formalize all aspects of the HTML specification.
A simple example is the href attribute. The HTML specification says that this attribute must contain a URI as defined in RFC 2396, but there is no way to enforce this in the DTD. The DTD declares the value as CDATA, which basically means arbitrary text, and that is all nsgmls can verify.
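The RFC 2396 rule that the DTD cannot express can at least be approximated in a few lines. The following Python sketch uses a hypothetical helper, looks_like_rfc2396_uri_reference (not part of any library), that only checks whether every character is one that RFC 2396 allows in a URI reference, with at most one "#" before the fragment. It does not parse schemes, authorities or paths, so passing it is necessary but not sufficient for a correct URI.

```python
import re

# Character classes from RFC 2396, section 2:
#   reserved   = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
#   unreserved = alphanum | mark
#   mark       = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
#   escaped    = "%" hex hex
URIC = r"(?:[A-Za-z0-9;/?:@&=+$,\-_.!~*'()]|%[0-9A-Fa-f]{2})"

# URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
# Only the character repertoire and the single "#" are checked here;
# scheme, authority and path syntax are NOT parsed.
URI_REFERENCE = re.compile(rf"{URIC}*(?:#{URIC}*)?")


def looks_like_rfc2396_uri_reference(value: str) -> bool:
    """Return True if `value` uses only characters legal in an RFC 2396 URI reference."""
    return URI_REFERENCE.fullmatch(value) is not None


if __name__ == "__main__":
    # The href from the example document: "å" is not a URI character.
    print(looks_like_rfc2396_uri_reference("http://www.example.org/Håkon"))       # False
    # The same address with "å" percent-escaped as its UTF-8 bytes.
    print(looks_like_rfc2396_uri_reference("http://www.example.org/H%C3%A5kon"))  # True
```

A DTD-based validator would accept both strings, because both are perfectly good CDATA.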
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
<head>
<title>Illegal HREF</title>
</head>
<body>
<p><a href= "http://www.example.org/Håkon">This is not a valid URI</a></p>
</body>
</html>
The document above passes the W3C validator, but it does not conform to the HTML specification: the non-ASCII character "å" is not allowed in a URI and would have to be escaped.
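To make the gap concrete, here is a minimal Python sketch (an illustration only, not an existing tool) that adds the one rule a DTD cannot express. It feeds the example document to the standard library's html.parser.HTMLParser and collects href values containing characters outside the RFC 2396 set. The class name HrefChecker is hypothetical, and the check is character-level only, so it is far from a complete conformance checker.

```python
import re
from html.parser import HTMLParser

# Legal characters in an RFC 2396 URI reference (alphanum, mark, reserved,
# %-escapes), plus at most one "#" introducing the fragment.
URIC = r"(?:[A-Za-z0-9;/?:@&=+$,\-_.!~*'()]|%[0-9A-Fa-f]{2})"
URI_REFERENCE = re.compile(rf"{URIC}*(?:#{URIC}*)?")


class HrefChecker(HTMLParser):
    """Collects href values that are fine as CDATA but are not URI references."""

    def __init__(self):
        super().__init__()
        self.bad_hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names and unescapes their values.
        for name, value in attrs:
            if name == "href" and value is not None and not URI_REFERENCE.fullmatch(value):
                self.bad_hrefs.append(value)


DOCUMENT = """<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
<head>
<title>Illegal HREF</title>
</head>
<body>
<p><a href= "http://www.example.org/Håkon">This is not a valid URI</a></p>
</body>
</html>
"""

checker = HrefChecker()
checker.feed(DOCUMENT)
checker.close()
print(checker.bad_hrefs)  # ['http://www.example.org/Håkon']
```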
There is an important difference between valid and conforming. The term valid is probably defined in the SGML specification (I say probably because I have not read it), and it covers only a subset of the rules that a document must fulfill in order to conform to the HTML specification.
Does anyone know of a tool that does a complete check of all normative rules in the HTML specification?
- Written on July 3, 2005 at 18:51
You are reading the (archived) weblog of Benjamin Niemann. This weblog has been closed, no new articles will be posted here.
If you can read German, you may want to have a look at my new weblog.