Snoopy is a PHP class that automates many common web browsing functions, making it easier to fetch and navigate the web using PHP. It’s pretty handy. I found an interesting bug recently and diagnosed it this afternoon.
If you navigate to a 301 or 302 redirect in a subdirectory you can get something like this:
HTTP/1.1 302
Date: Sat, 13 Oct 2007 20:26:46 GMT
Server: Apache/1.3.33 (Unix)
Location: destination.xml
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8
The key thing to pay attention to here is Location: destination.xml. Say your initial request was to:
http://somesite.tld/directory/request.xml
Our next request based on the redirect should be to:
http://somesite.tld/directory/destination.xml
Instead, Snoopy appends the location directly to the hostname, dropping the directory and resulting in an incorrect request:
http://somesite.tld/destination.xml
This is correct when the redirect location starts with a “/”. In this case it does not, so the result is wrong. The following patch I wrote corrects this behavior. As far as I can tell (I haven’t read every word of the spec, but many chunks over the years), the HTTP 1.1 spec, RFC 2616, only dictates that a URI be provided; it doesn’t seem to require full URLs. See the comments for follow-up discussion on the specs. My conclusion is that using absolute URIs is best practice but not required. I wouldn’t call relative redirects a very common practice, but they do exist in the wild.
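To make the three cases concrete (absolute location, location starting with “/”, and bare relative location), here is a minimal standalone sketch of the resolution logic. It is an illustration only and not the Snoopy patch itself: the function name resolve_redirect() is hypothetical, it assumes PHP 7 or later, and it deliberately ignores trickier forms such as “../”, query-only locations, and protocol-relative “//host/path” values.

```php
<?php
// Hypothetical helper, not part of Snoopy: resolve a Location header value
// against the URL of the request that produced the redirect.
function resolve_redirect(string $requestUrl, string $location): string
{
    // Case 1: the location already has a scheme, so it is absolute.
    if (preg_match('|^[a-z][a-z0-9+.-]*://|i', $location)) {
        return $location;
    }

    // Rebuild "scheme://host[:port]" from the original request.
    $parts = parse_url($requestUrl);
    $base  = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $base .= ':' . $parts['port'];
    }

    // Case 2: the location starts with "/", so it hangs off the hostname.
    if (substr($location, 0, 1) === '/') {
        return $base . $location;
    }

    // Case 3: a bare relative location. Keep the directory of the original
    // request (everything up to and including the last "/").
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);

    return $base . $dir . $location;
}

// Prints: http://somesite.tld/directory/destination.xml
echo resolve_redirect('http://somesite.tld/directory/request.xml', 'destination.xml'), "\n";
```

The third branch is the one that matters for this bug: without it, a bare destination.xml gets treated the same as /destination.xml and the directory is lost.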
Code provided in this post is released under the same license as Snoopy itself (GNU Lesser General Public License).
Hopefully that solves this problem for anyone else who runs across it. It also teaches a good lesson about redirects. I bet this isn’t the only code out there that incorrectly handles this. Most redirects don’t do this, but there are a few out there that will.