Snoopy is a PHP class that automates many common web browsing functions, making it easier to fetch and navigate the web using PHP. It’s pretty handy. I found an interesting bug recently and diagnosed it this afternoon.
If you navigate to a 301 or 302 redirect in a subdirectory you can get something like this:
HTTP/1.1 302
Date: Sat, 13 Oct 2007 20:26:46 GMT
Server: Apache/1.3.33 (Unix)
Location: destination.xml
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8
The key thing to pay attention to here is Location: destination.xml. Say your initial request was to:
http://somesite.tld/directory/request.xml
Our next request based on the redirect should be to:
http://somesite.tld/directory/destination.xml
Instead, Snoopy appends the location directly to the hostname, resulting in an incorrect request:
http://somesite.tld/destination.xml
This is correct when the redirect location begins with a “/”. In this case it does not, which makes it incorrect. The following patch I wrote corrects this behavior. As far as I can tell (I haven’t read every word of the spec, but many chunks over the years), the HTTP 1.1 spec (RFC 2616) only dictates that a URI be provided; it doesn’t seem to require full URLs. See the comments for follow-up discussion on the spec. My conclusion is that using absolute URIs is best practice but not required. I wouldn’t call relative redirects a very common practice, but they do exist in the wild.
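As a rough illustration of the idea (a minimal sketch, not the patch to Snoopy itself), the resolution logic could look something like the following. The function name resolve_redirect is hypothetical and is not part of Snoopy. The sketch assumes PHP 7 or later and only covers the three cases discussed here: an absolute URI, a location starting with “/”, and a bare relative path. It does not try to handle “..” segments, protocol-relative locations starting with “//”, or query-only locations.

```php
<?php
// Hypothetical helper (not part of Snoopy): resolve a Location header value
// against the URL of the request that produced the redirect.
function resolve_redirect(string $requestUrl, string $location): string
{
    // Already absolute (has a scheme such as http:// or https://): use as-is.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $location)) {
        return $location;
    }

    // Rebuild scheme://host[:port] from the original request.
    $parts = parse_url($requestUrl);
    $base  = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $base .= ':' . $parts['port'];
    }

    // Location starts with "/": appending to the host is the right thing to do.
    if (strpos($location, '/') === 0) {
        return $base . $location;
    }

    // Bare relative path: resolve against the directory of the request,
    // i.e. everything in the request path up to and including the last "/".
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);

    return $base . $dir . $location;
}

echo resolve_redirect('http://somesite.tld/directory/request.xml', 'destination.xml'), "\n";
// http://somesite.tld/directory/destination.xml
```

With the example from this post, a Location of /destination.xml would still resolve to http://somesite.tld/destination.xml, while the bare destination.xml now lands in /directory/ as expected.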
Code provided in this post is released under the same license as Snoopy itself (GNU Lesser General Public License).
Hopefully that solves this problem for anyone else who runs across it. It also teaches a good lesson about redirects. I bet this isn’t the only code out there that incorrectly handles this. Most redirects don’t do this, but there are a few out there that will.
4 replies on “Snoopy’s Relative Redirect Bug”
Actually, according to section 14.30, the Location header has to contain an absolute URI, so I think Snoopy’s behavior is correct… (but a lot of sites get this wrong, I guess).
http://www.w3.org/Protocols/rf.....l#sec14.30
Now you made me dig even deeper 😉 .
14.30 clearly states:
Note that the key word “SHOULD”, according to section 1.2, is defined in accordance with RFC 2119. According to RFC 2119, the term “SHOULD” is to be interpreted:
It’s not required. It’s best practice to use absolute URI.
Ah, I guess you’re right.
good!
When I fetch the page “http://www.56php.com/portal.php?mod=list&catid=39” with Snoopy, I get something like this:
HTTP/1.1 302 Moved Temporarily
But the page actually returns HTTP/1.1 200 OK (see “http://www.mjjer.com/gethttpheader.php?url=www.56php.com%2Fportal.php%3Fmod%3Dlist%26catid%3D39”).
I can’t fix the bug. Can you help me? (My English is very poor; I hope you can understand O(∩_∩)O)
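Fetching a page with Snoopy:
http://www.56php.com/portal.ph.....8;catid=39 returns no content. Checking the response headers shows a 302 status, but when I check the same page with other tools it returns 200, with no 302 at all. So this looks like a bug in Snoopy. Does anyone know how to solve it?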