Full SPDY Ahead

For those not keeping score, Twitter, and Facebook have both come out publicly in favor of SPDY. Twitter is already using it in production. It sounds like Facebook will be soon. Mozilla implemented it in Firefox. Opera has SPDY. Google, the author of SPDY is using it in production.

This leaves Microsoft and Apple as the holdouts. Microsoft’s HTTP + Mobility is SPDY at it’s core. Microsoft hasn’t started supporting SPDY in any products, but it seems inevitable at some point. They are a holdout in implementation but not opposed to SPDY it seems.

Apple is the last major holdout. SPDY hasn’t been announced for iOS 6 or Mac OS X 10.8. As far as I’m aware Apple hasn’t made any statement suggesting support or opposition to SPDY. However I can’t see why they would oppose it. There’s nothing for them to disapprove of, other than it’s not using their IP. I’d be surprised if they don’t want to implement it.

However given SPDY is a rather backwards compatible thing to support, I don’t see this holding back adoption. Nginx is adding support for SPDY (thanks to WordPress creator Automattic), and Google is working on mod_spdy for Apache. That makes adoption for lots of large websites possible.

While the details of SPDY and the direction it will go are still in flux, it seems nearly certain that SPDY is the future of the web. Time to start digging into how to adopt it and ease the transition. The primary concerns I see are as follow:

  1. TLS Required – While not explicitly required, SPDY essentially builds on TLS and virtually any real world application needs it. This means purchasing SSL certificates for any website you wish to use SPDY with. Some have argued performance and scalability, but Google, Facebook and Twitter use SSL extensively on commodity hardware.
  2. IP Address – Unless you use Server Name Indication (SNI), which almost no websites do because of compatibility, you need an IP address for every hostname that you use TLS with. That means until IPv6 is widely adopted, it will be putting further strain on the remaining IPv4 pool.

Both of the above concerns increase complexity and cost of building websites at scale and for those who are on a very tight budget (the rest of us will manage). Because of this, I don’t think we’ll see a 100% SPDY or HTTP 2.0 web for quite some time. Don’t expect SPDY for shared hosting sites anytime soon.

In a world of increasing surveillance and user data being integrated into everything, the benefits of TLS will be realized. Both Facebook and Twitter acknowledge it’s importance in preventing user data from getting into the wrong hands.

I, For One, Welcome Our New SPDY overlord.

HTTP Status 451 – The HTTP Status At Which Requests Burn

Tim Bray is proposing a new HTTP status, 451 for:

…when resource access is denied for legal reasons. This allows server operators to operate with greater transparency in circumstances where issues of law or public policy affect their operation. This transparency may be beneficial both to these operators and to end users.

This is awesome and I 100% support this idea. I’d even like to see governments mandate that 451′s be used. Of course North Korea, Iran, and China would never follow along, but hopefully most western countries would.

It’s also a very fitting tribute to Ray Bradbury, the author of Fahrenheit 451 who recently passed away.

BitTorrent For HTTP Failover

There is a proposal circulating around the web to create a X-Torrent HTTP header for the purpose of pointing to a torrent file as an alternative way to download a file from an overloaded server. I’ve been an advocate of implementing BitTorrent in browsers in particular Firefox since at least 2004 according to this blog, and like the idea in principal but don’t like the implementation proposed.

The way the proposal would work is a server would send the X-Torrent HTTP header and if the browser chose to use the BitTorrent it would do that rather than leach the servers bandwidth. This however fails if the server is already overloaded.

Unnecessary Header

This is also a little unnecessary since browsers automatically send an Accept-Encoding Requests header which could contain support for torrents removing the need for a server to send this by default. Regardless the system still fails if the server is overloaded.

Doesn’t Failover

A nicer way would be to also utilize DNS which is surprisingly good at scaling for these types of tasks. It’s already used for similar things like DNSBL and SPF.

Example

Assume my browser supports the BitTorrent protocol and I visit the following url for a download:

http://dl.robert.accettura.com/pub/myfile.tar.gz

My request would look something like this:

Get: /pub/myfile.tar.gz
Host: dl.robert.accettura.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5
Accept: */*
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate,torrent
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://robert.accettura.com/download

The servers response would look something like this:

Date: Sun, 18 Jan 2009 00:25:54 GMT
Server: Apache
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: application/x-bittorrent

The content would be the actual torrent. The browser would handle as appropriate by opening a helper application or handling it internally. If I didn’t have torrent in my Accept-Encoding header, I would have been served via HTTP like we are all accustomed.

Now what happens if the server is not responding? A fallback to the DNS level could be done.

First take the GET and generate a SHA1 checksum for the GET, in my example that would be:

438296e855494825557824b691a09d06a86a21f1

Now to generate a DNS Query in the format [hash]._torrent.[server]:

438296e855494825557824b691a09d06a86a21f1._torrent.dl.robert.accettura.com

The response would look something like a Base64 encoded .torrent file broken up and served as TOR or TXT records. Should the string not fit in one record (I think the limit is 512 bytes) the response could be broken up into multiple records and concatenated by the client to reassemble.

Odds of a collision with existing DNS space is limited due to the use of a SHA1 hash and the _torrent subdomain. It coexists peacefully.

Downside

The downside here is that if your server fails your DNS is going to take an extra query from any client capable of doing this. There is slight latency in this process.

Upside/Conclusion

The upside is that DNS scaling has come a long way and is rarely an issue for popular sites and web hosts. DNS can (and often is) cached by ISP’s resulting in an automatic edge CDN thanks to ISP’s. ISP’s can also mitigate traffic on their networks by caching on their side (something I also suggested in 2004).

BitTorrent may be used for illegal content, but so is HTTP. I think costs for ISP’s and websites could be significantly cut by making BitTorrent more transparent as a data transfer protocol.

W3C On DTD Perversions

According to the W3C Systeam’s blog, there’s a lot of poorly designed software out there. It’s pretty rare that something has a legitimate need to pull down a DTD in order to work. They should never be requesting it on a very frequent basis. It’s a very cachable asset. The post includes some pretty impressive stats too:

..up to 130 million requests per day, with periods of sustained bandwidth usage of 350Mbps, for resources that haven’t changed in years.

They also make a few requests which really all developers should follow. Here’s my summary:

  • Cache as much as possible, to minimize your impact on others (not to mention improve your performance).
  • Respect caching headers
  • Don’t fetch what you don’t need
  • Identify yourself. Don’t use a generic UA.
  • Try not to suck.

Snoopy’s Relative Redirect Bug

Snoopy is a PHP class that automates many common web browsing functions making it easier to fetch and navigate the web using PHP. It’s pretty handy. I found an interesting bug recently and diagnosed it this afternoon.

If you navigate to a 301 or 302 redirect in a subdirectory you can get something like this:

HTTP/1.1 302
Date: Sat, 13 Oct 2007 20:26:46 GMT
Server: Apache/1.3.33 (Unix)
Location: destination.xml
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8

The key thing to pay attention to here is Location: destination.xml. Say your initial request was to:

http://somesite.tld/directory/request.xml

Our next request based on the redirect should be to:

http://somesite.tld/directory/desination.xml

Instead what Snoopy is doing is appending to the hostname, resulting in an incorrect request:

http://somesite.tld/request.xml

This is correct in cases where the first character of a redirect location contains a “/”. In this case it does not, which makes it incorrect. The following patch I wrote corrects this behavior. As far as I can tell (I haven’t read every word of the spec, but many chunks over the years) the HTTP 1.1 specs RFC 2616 only dictate that URI be provided, it doesn’t seem to require full url’s. See comments for follow up discussion on the specs. My conclusion is that it’s best practice but not required to use absolute uri’s). I wouldn’t call this a very common practice, but it does exist in the wild.

— Snoopy.class.php    2005-11-08 01:55:33.000000000 -0500
+++ Snoopy-patched.class.php    2007-10-13 16:10:38.000000000 -0400
@@ -871,8 +871,18 @@
                                // look for :// in the Location header to see if hostname is included
                                if(!preg_match("|\:\/\/|",$matches[2]))
                                {
                                        // no host in the path, so prepend
                                        $this->_redirectaddr = $URI_PARTS["scheme"]."://".$this->host.":".$this->port;
+                                       // START patch by Robert Accettura
+                                       // Make sure to keep the directory if it doesn’t start with a ‘/’
+                                       if($matches[2]{0} != ‘/’)
+                                       {
+                                               list($urlPath, $urlParams) = explode(‘?’, $url);
+                                               $urlDirPath = substr($urlPath, 0, strrpos($urlPath, ‘/’)+1);
+                                               $this->_redirectaddr .= $urlDirPath;
+                                       }
+                                       // END patch by Robert Accettura
+
                                        // eliminate double slash
                                        if(!preg_match("|^/|",$matches[2]))
                                                        $this->_redirectaddr .= "/".$matches[2];

Code provided in this post is released under the same license as Snoopy itself (GNU Lesser General Public License).

Hopefully that solves this problem for anyone else who runs across it. It also teaches a good lesson about redirects. I bet this isn’t the only code out there that incorrectly handles this. Most redirects don’t do this, but there are a few out there that will.