
Adventures With document.documentElement.firstChild

Here’s an interesting DOM test-case I ran across inadvertently yesterday.

For the purpose of this post assume the following markup:

<!DOCTYPE html>
<html>
<!-- i broke the dom -->
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
    <title>Testcase</title>
</head>
<body>
<p>Something</p>
</body>
</html>

If I use document.documentElement.firstChild, I don’t get consistent behavior. In Firefox and IE I get the <head/> element, which is what I was initially expecting. In WebKit (Safari/Chrome) and Opera I get the HTML comment, which I wasn’t expecting.

I think WebKit and Opera are technically correct on this as the DOM Level 2 specs state:

firstChild of type Node, read only
The first child of this node. If there is no such node, this
returns null.

A COMMENT_NODE is a node, so the comment should have been returned as the first child. As for the position of the comment, the document is valid HTML5 and is also valid as XHTML 1.0 Strict and HTML 4 Strict. My interpretation is that the comment is therefore the first valid node in the documentElement.
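If what you actually want is the first element rather than the first node, one option is to walk the siblings and check nodeType. The following is only a minimal sketch: firstElementChildOf is a hypothetical helper name, and the small hand-built objects stand in for the test case's DOM so the snippet can run outside a browser. In a page you would pass document.documentElement instead.

```javascript
// nodeType value for elements, as defined by the DOM spec (Node.ELEMENT_NODE).
var ELEMENT_NODE = 1;

// Hypothetical helper: return the first child of `parent` that is an element,
// skipping comment and text nodes. Returns null if there is no element child.
function firstElementChildOf(parent) {
  var node = parent.firstChild;
  while (node && node.nodeType !== ELEMENT_NODE) {
    node = node.nextSibling;
  }
  return node || null;
}

// Stand-in for the test case's <html> element (an assumption for illustration):
// a comment node (nodeType 8) followed by <head>.
var head = { nodeType: 1, nodeName: 'HEAD', nextSibling: null };
var comment = { nodeType: 8, nodeName: '#comment', nextSibling: head };
var html = { nodeType: 1, nodeName: 'HTML', firstChild: comment };

// The comment is skipped and the <head> stand-in is returned.
var firstElementName = firstElementChildOf(html).nodeName;
```

This sidesteps the question of which browser is right about firstChild, since it never assumes the first node is an element.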

One of the reasons why I even thought to use document.documentElement.firstChild is that I saw Google doing it the other day for the new asynchronous tracking code for Google Analytics (currently in beta). Originally the code was:

var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-XXXXX-X']);
  _gaq.push(['_trackPageview']);

  (function() {
    var ga = document.createElement('script');
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' :
        'http://www') + '.google-analytics.com/ga.js';
    ga.setAttribute('async', 'true');
    document.documentElement.firstChild.appendChild(ga);
  })();

It has now been updated to prevent this problem. I don’t know if I was the first to report it or if it was already known by the Google engineers. The code, still in beta, is now:

var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-XXXXX-X']);
  _gaq.push(['_trackPageview']);

  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(ga);
  })();

The new code seems a bit more resilient. They also got rid of the longhand ga.setAttribute in favor of just ga.async and added the type attribute.
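To make the fallback easier to see on its own, here is a rough sketch of the same head-then-body lookup pulled out into a function. The names findScriptParent and fakeDocument are hypothetical and not part of Google's snippet; the fake document exists only so the example runs without a browser. In a page you would pass the real document.

```javascript
// Hypothetical helper mirroring the fallback in the updated snippet:
// prefer <head>, fall back to <body>, and return null if neither is found.
function findScriptParent(doc) {
  return doc.getElementsByTagName('head')[0] ||
         doc.getElementsByTagName('body')[0] ||
         null;
}

// Minimal document stand-in (an assumption for illustration): it only
// implements getElementsByTagName over a plain tag-to-array lookup.
function fakeDocument(elementsByTag) {
  return {
    getElementsByTagName: function (name) {
      return elementsByTag[name] || [];
    }
  };
}

var headEl = { nodeName: 'HEAD' };
var bodyEl = { nodeName: 'BODY' };

// With both elements present, <head> wins.
var withHead = findScriptParent(fakeDocument({ head: [headEl], body: [bodyEl] }));
// With no <head> in the lookup, <body> is used instead.
var bodyOnly = findScriptParent(fakeDocument({ body: [bodyEl] }));
```

Unlike the original snippet, this lookup goes by tag name, so a comment or text node sitting before <head> makes no difference to the result.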

There is a test case for anyone who wants to try it. I haven’t found a relevant Mozilla bug.

10 replies on “Adventures With document.documentElement.firstChild”

Thanks for writing this! It’s good to get this documented. We were aware of the issue; we just underestimated the number of sites with comments above the head. We made the wrong tradeoff. The new code isn’t as terse, but it should cover pretty much any page you throw at it.

The DOM spec is the wrong place to look for this, since your question is really what the DOM should look like. The relevant spec there would be the one that covers how to convert HTML source into a DOM: the HTML parsing spec. There isn’t one at the moment, though HTML5 is working on it. So currently behavior is undefined. Not sure what the HTML5 draft proposes for the behavior.

In particular, whitespace before <head> is treated magically in various UAs and in the HTML5 draft, last I checked; comments may or may not be, depending. It’s interesting that you didn’t expect firstChild to be the text node coming before the comment; why not?

@google analytics: I think the new code is pretty solid since it leaves little to chance. I’m pretty sure even I won’t manage to break it, at least from the implementer’s standpoint.

@Boris: My expectations were admittedly based on previous experience more than specs. Not sure if I’ve ever run across a situation exactly like this before.

@Matt: I’m aware of document.head, though I don’t foresee using it for quite some time, not until 99% of the world runs a web browser that it will work on. At the rate we’re going that’s 2029.

If you need to access document.head more than once it is probably worth adding it (if it doesn’t already exist):

if (!document.head) document.head = document.getElementsByTagName("head")[0];

This should always work in HTML pages, even if the script occurs before the tag.

This post is a bit confusing on planet mozilla, as the HTML is partly parsed instead of displayed, so the comment is missing.
