Snoopy’s Relative Redirect Bug

Snoopy is a PHP class that automates many common web browsing functions, making it easier to fetch and navigate web pages from PHP. It’s pretty handy. I recently ran into an interesting bug and diagnosed it this afternoon.

If you navigate to a 301 or 302 redirect in a subdirectory you can get something like this:

HTTP/1.1 302
Date: Sat, 13 Oct 2007 20:26:46 GMT
Server: Apache/1.3.33 (Unix)
Location: destination.xml
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8

The key thing to pay attention to here is Location: destination.xml. Say your initial request was to:

http://somesite.tld/directory/request.xml

Our next request based on the redirect should be to:

http://somesite.tld/directory/destination.xml

Instead, Snoopy appends the Location value directly to the hostname and drops the directory, resulting in an incorrect request:

http://somesite.tld/destination.xml

Appending to the hostname is correct when the redirect location starts with a “/”. Here it does not, so the resulting request is wrong. The following patch I wrote corrects this behavior. As far as I can tell (I haven’t read every word of the spec, just many chunks over the years), the HTTP 1.1 spec, RFC 2616, only dictates that a URI be provided; it doesn’t seem to require a full URL. See the comments for follow-up discussion on the spec. My conclusion is that using absolute URIs is best practice but not required. I wouldn’t call relative redirects a very common practice, but they do exist in the wild.

--- Snoopy.class.php    2005-11-08 01:55:33.000000000 -0500
+++ Snoopy-patched.class.php    2007-10-13 16:10:38.000000000 -0400
@@ -871,8 +871,18 @@
                                // look for :// in the Location header to see if hostname is included
                                if(!preg_match("|\:\/\/|",$matches[2]))
                                {
                                        // no host in the path, so prepend
                                        $this->_redirectaddr = $URI_PARTS["scheme"]."://".$this->host.":".$this->port;
+                                       // START patch by Robert Accettura
+                                       // Make sure to keep the directory if it doesn't start with a '/'
+                                       if($matches[2]{0} != '/')
+                                       {
+                                               list($urlPath, $urlParams) = explode('?', $URL);
+                                               $urlDirPath = substr($urlPath, 0, strrpos($urlPath, '/')+1);
+                                               $this->_redirectaddr .= $urlDirPath;
+                                       }
+                                       // END patch by Robert Accettura
+
                                        // eliminate double slash
                                        if(!preg_match("|^/|",$matches[2]))
                                                        $this->_redirectaddr .= "/".$matches[2];

Code provided in this post is released under the same license as Snoopy itself (GNU Lesser General Public License).
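
For anyone who wants the idea outside of Snoopy, here is a minimal sketch of the same resolution logic as a standalone function. The name resolveRedirect is hypothetical (it is not part of Snoopy), and the sketch assumes PHP 7 or later. It only covers the three cases discussed in this post (absolute, root-relative, and path-relative locations); it does not normalize ".." segments or handle other reference forms.

```php
<?php
/**
 * Hypothetical helper, not part of Snoopy: resolve a Location header
 * value against the URL that was originally requested.
 */
function resolveRedirect(string $requestUrl, string $location): string
{
    // Case 1: the Location already includes a scheme and host; use it as-is.
    if (strpos($location, '://') !== false) {
        return $location;
    }

    // Rebuild scheme://host[:port] from the original request.
    $parts = parse_url($requestUrl);
    $base = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $base .= ':' . $parts['port'];
    }

    // Case 2: root-relative Location; append straight to the host.
    if ($location !== '' && $location[0] === '/') {
        return $base . $location;
    }

    // Case 3: path-relative Location; keep the directory of the requested
    // path (everything up to and including its last '/').
    $path = $parts['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);

    return $base . $dir . $location;
}

// The example from this post:
echo resolveRedirect('http://somesite.tld/directory/request.xml', 'destination.xml'), "\n";
// prints http://somesite.tld/directory/destination.xml

// A root-relative Location still resolves against the host only:
echo resolveRedirect('http://somesite.tld/directory/request.xml', '/destination.xml'), "\n";
// prints http://somesite.tld/destination.xml
```

Using parse_url here means the query string is already split off from the path, so there is no need to explode on "?" by hand.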

Hopefully that solves this problem for anyone else who runs across it. It also teaches a good lesson about redirects. I bet Snoopy isn’t the only code out there that handles relative Location headers incorrectly. Most redirects use absolute URLs, but a few out there don’t.