Categories: Google, Security, Web Development

Phorm’s UserAgent

There’s a fair amount of controversy regarding Phorm, a company that plans to target advertising by harvesting information via deep packet inspection. It is already in talks with several ISPs. I’ll leave the debate over Phorm from a user’s perspective for someplace else.

They claim to offer websites a way to opt out of their tracking, but it’s a true double-edged sword: they don’t play nicely with a standard robots.txt file. Take a look at what they are doing here:

The Webwise system observes the rules that a website sets for the Googlebot, Slurp (Yahoo! agent) and “*” (any robot) user agents. Where a website’s robots.txt file disallows any of these user agents, Webwise will not profile the relevant URL. As an example, the following robots.txt text will prevent profiling of all pages on a site:
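The robots.txt text that the quote refers to didn’t survive in this copy. Based on the description (“prevent profiling of all pages on a site”), it was presumably something along these lines; treat this as a hypothetical reconstruction, not Phorm’s actual example:

```
# Hypothetical reconstruction of the missing example.
# Blocks the wildcard agent, which Webwise honours -- but so does
# every well-behaved crawler, including Googlebot and Slurp.
User-agent: *
Disallow: /
```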

Rather than use a unique user agent, they piggyback on those of Google and Yahoo. The only way to block them via a robots.txt file is to tell one of the two largest search engines in the Western world not to index your site. This seems fundamentally wrong.
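To see the bind concretely, here is a sketch using Python’s standard-library `urllib.robotparser`, which applies robots.txt rules the same way a well-behaved crawler would. The file and URLs are made up for illustration; the point is that any rule strict enough to stop Webwise (which honors the Googlebot rules) also locks Googlebot itself out:

```python
from urllib.robotparser import RobotFileParser

# The only robots.txt that keeps Webwise from profiling the site:
robots_txt = """\
User-agent: Googlebot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Webwise would honor this rule and skip the site -- but so would
# the real Googlebot, deindexing the site from Google Search.
print(rp.can_fetch("Googlebot", "http://example.com/page"))   # False

# Any crawler with its own honest user agent is unaffected.
print(rp.can_fetch("SomeOtherBot", "http://example.com/page"))  # True
```

There is no rule you can write that distinguishes the two, because Webwise never announces a user agent of its own.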

There is an email address where you can submit a list of domains to exclude, but that requires manual intervention and updating the list every time you create a site. This obviously doesn’t scale.

Now I’m curious: is piggybacking on another company’s user agent a trademark violation? From what I understand they aren’t broadcasting it, just honoring it. If I were Google or Yahoo I’d be pretty annoyed, particularly Yahoo, since some websites will simply block Slurp outright given Google’s dominance in search. Yes, there are many user-agent-spoofing tools out there (including wget and curl), but to my knowledge nobody is crawling web pages for commercial purposes while hiding behind another company’s name.

robots.txt is a somewhat flawed system, since sadly not all user agents even obey it, but short of actual server-side blocking it’s one of the only defenses that exists.

3 replies on “Phorm’s UserAgent”

Many people in the UK thought the same about Phorm/BT Webwise regarding robots.txt.

BT and Phorm seem to rewrite UK law and web standards as they go, so the IT community of the UK (or some of them) wrote a new standard themselves! Violate it at your peril.

http://www.parasitestxt.org/ (It’s a good read, have a wander around.)

Already it has scared the Amazon and Wiki empires off.

http://news.bbc.co.uk/1/hi/technology/7999635.stm

http://techblog.wikimedia.org/.....-of-phorm/

And the European Union looks as if it has been forced to enshrine it into the European Convention on Human Rights.

http://news.bbc.co.uk/1/hi/technology/7998009.stm

So, dear reader, learn the lesson; abuse robots.txt at your peril!

Sorry, me again.

Just to further your knowledge of the ‘stranglehold’ Phorm has in the UK: at a recent briefing in the UK parliament, in front of MPs and Lords, the CEO of Phorm accused Tim Berners-Lee of being a neo-Luddite who did not understand the internet.

https://nodpi.org/wp-content/uploads/2008/06/priceless.gif

Hope you had the same ad campaign, else you won’t get that.
