Snoopy’s Relative Redirect Bug

Snoopy is a PHP class that automates many common web browsing functions making it easier to fetch and navigate the web using PHP. It’s pretty handy. I found an interesting bug recently and diagnosed it this afternoon.

If you navigate to a 301 or 302 redirect in a subdirectory you can get something like this:

HTTP/1.1 302
Date: Sat, 13 Oct 2007 20:26:46 GMT
Server: Apache/1.3.33 (Unix)
Location: destination.xml
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8

The key thing to pay attention to here is Location: destination.xml. Say your initial request was to:

http://somesite.tld/directory/request.xml

Our next request based on the redirect should be to:

http://somesite.tld/directory/desination.xml

Instead what Snoopy is doing is appending to the hostname, resulting in an incorrect request:

http://somesite.tld/request.xml

This is correct in cases where the first character of a redirect location contains a “/”. In this case it does not, which makes it incorrect. The following patch I wrote corrects this behavior. As far as I can tell (I haven’t read every word of the spec, but many chunks over the years) the HTTP 1.1 specs RFC 2616 only dictate that URI be provided, it doesn’t seem to require full url’s. See comments for follow up discussion on the specs. My conclusion is that it’s best practice but not required to use absolute uri’s). I wouldn’t call this a very common practice, but it does exist in the wild.

— Snoopy.class.php    200511-08 01:55:33.000000000 -0500
+++ Snoopy-patched.class.php    2007-10-13 16:10:38.000000000 -0400
@@ -871,8 +871,18 @@
                                // look for :// in the Location header to see if hostname is included
                                if(!preg_match("|\:\/\/|",$matches[2]))
                                {
                                        // no host in the path, so prepend
                                        $this->_redirectaddr = $URI_PARTS["scheme"]."://".$this->host.":".$this->port;
+                                       // START patch by Robert Accettura
+                                       // Make sure to keep the directory if it doesn’t start with a ‘/’
+                                       if($matches[2]{0} != ‘/’)
+                                       {
+                                               list($urlPath, $urlParams) = explode(‘?’, $url);
+                                               $urlDirPath = substr($urlPath, 0, strrpos($urlPath, ‘/’)+1);
+                                               $this->_redirectaddr .= $urlDirPath;
+                                       }
+                                       // END patch by Robert Accettura
+
                                        // eliminate double slash
                                        if(!preg_match("|^/|",$matches[2]))
                                                        $this->_redirectaddr .= "/".$matches[2];

Code provided in this post is released under the same license as Snoopy itself (GNU Lesser General Public License).

Hopefully that solves this problem for anyone else who runs across it. It also teaches a good lesson about redirects. I bet this isn’t the only code out there that incorrectly handles this. Most redirects don’t do this, but there are a few out there that will.

4 thoughts on “Snoopy’s Relative Redirect Bug

  1. Now you made me dig even deeper 😉 .

    14.30 clearly states:

    For 3xx responses, the location SHOULD indicate the server’s preferred URI for automatic redirection to the resource. The field value consists of a single absolute URI.

    Note the key word “SHOULD” according to section 1.2 is defined in accordance with RFC 2119. According to RFC2119 the terminology “SHOULD” is to be interpreted:

    3. SHOULD This word, or the adjective “RECOMMENDED”, mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.

    It’s not required. It’s best practice to use absolute URI.

  2. good!
    when i get the web:”http://www.56php.com/portal.php?mod=list&catid=39″ with snoopy,it get something like this:
    HTTP/1.1 302 Moved Temporarily
    but the web is:HTTP/1.1 200 OK(look this:”http://www.mjjer.com/gethttpheader.php?url=www.56php.com%2Fportal.php%3Fmod%3Dlist%26catid%3D39″)

    i can’t fix the bug.can you help me?(my english is very poor,I hope you can understand O(∩_∩)O)

    用snoopy抓取页面:
    http://www.56php.com/portal.ph.....8;catid=39 没有获取到内容,查看头响应,返回 302状态,可是我用其他工具去查这个页面,发现返回200,并没有302状态,可见,这是snoopy的一个bug。不知道然后解决?

Leave a Reply

Your email address will not be published. Required fields are marked *