Privacy Issues Behind localStorage

Browsers need to overhaul their privacy settings to account for things like localStorage and bring control back to the user. In the days of cookies, it was relatively simple for a user to wipe any identifiers (excluding their IP address) from the browser: simply clear cookies.

Firefox offers two basic abilities: you can clear all cookies, or you can browse and delete individual cookies. That’s great, but it’s not terribly clear that there’s more to site data than cookies.

Firefox Cookie Privacy

Chrome, as far as I know, has no cookie browser like Firefox’s (edit: Erunno notes in the comments that it does, via chrome://settings/cookies), but it explicitly lets you “Delete cookies and other site and plug-in data”. That’s pretty good.

Chrome Cookie Privacy

Today, I think Safari’s UI is the closest to perfect. Each hostname shows exactly what kinds of data it has stored. My only gripe is that Safari doesn’t let you inspect the data itself. That’s a “power-user” feature, however, and I think it does an adequate job regardless.

Safari Cookie Privacy

Websites use more than just cookies these days. I discussed this a little over a year ago. The reason evercookie is controversial is that browsers don’t quite give users the level of control (real or perceived) that they expect for objects other than cookies.

Here is another use case for why this is needed. Google Analytics is used on perhaps half the internet’s websites, and it sets a cookie every time. That means roughly 230 bytes added to every HTTP request for a lot of websites. Google could switch to localStorage and free up those 230 bytes. While they technically could do this, in practice it could create a firestorm of attacks against them: the move would be spun as Google trying to evade cookie deletion, and as a privacy violation. The same storm that evercookie created. I suspect that’s why it hasn’t been done to date. The truth is the Google Analytics team has done a lot to improve performance, including making the script entirely async, but this move would be controversial.
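The technical difference is that a cookie rides along in the Cookie header of every HTTP request to its host, while a localStorage value stays on the client until a script explicitly reads it and sends it. A minimal sketch of the swap (the key name and the injected storage/ID-generator parameters are hypothetical, not Google’s actual implementation):

```javascript
// Sketch: a first-party analytics ID kept in localStorage instead of
// a cookie. Nothing here is auto-sent with page requests; the script
// would have to read the ID and transmit it itself (e.g. via a beacon).
function getVisitorId(storage, makeId) {
  let id = storage.getItem("_va_id"); // "_va_id" is a made-up key
  if (!id) {
    id = makeId();
    storage.setItem("_va_id", id); // persisted, but never auto-sent
  }
  return id;
}
```

In a browser, `storage` would simply be `window.localStorage`.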

It’s no longer about “cookies”, but “user data”.

Smartphone Guest Mode

A very good idea by Greg Kumparak on TechCrunch:

Here’s the dream: one lock-screen, two PINs. One for me, one for anyone else who might use my phone but doesn’t necessarily need to see everything.

Not only is it a good idea for there to be a guest mode, the proposed implementation is quite nice and simple: Maps, Phone, Clock, Calculator, Safari. Perhaps add the ability to granularly add/remove apps from that default set. Everything is stateless and resets when guest mode ends.

This could potentially even lower the divorce rate in the US.

On Apple’s Location Tracking

The controversy over Apple’s “Location Tracking” is quite interesting. It’s worth making clear that the nodes stored in the database are approximations of cell phone towers and WiFi hotspots you’re likely to encounter rather than your location(s) at any given point in time. It’s a way to “prime the well” when doing a GPS lookup to improve performance.

Apple notably failed in a few key ways which should serve as a lesson to others:

  1. Always disclose what you’re doing. – Never just assume what you’re doing with someone’s information is cool. Apple could have mitigated a lot of this had they disclosed what the phone was actually doing from day 1. Never transmit anonymous or personal information without letting the user know first.
  2. Never store more than you need – I can’t believe how many companies mess this up. Storing user information is a liability. A good business limits its liabilities to only what’s necessary to conduct business. Storing so much data and never expunging it was a very bad move and amplified the situation. On top of not letting users know what was going on, there was no way to purge the information, which just made things much worse. Apple even went as far as backing up what should be an expendable cache.
  3. Always be paranoid with information – In its response to Edward J. Markey, Apple states: “The local cache is protected with iOS security features, but it is not encrypted. Beginning with the next major release of iOS, the operating system will encrypt any local cache of the hotspot and cell tower location information.” This should have been encrypted since day 1. Tools that could read this data had existed in the surveillance community for a few years, and Apple undoubtedly knew people were sometimes using it for illicit purposes. No company has gotten in trouble for being too secure with customer information with anyone other than the NSA or FBI.

It’s worth noting that their software update in response to this controversy is actually pretty good and pretty thorough. I’m surprised they couldn’t quickly shim some encryption around the cache; iOS is already loaded with plenty of DRM and crypto.

On another note, I fully expect some court cases to be reopened now that “cell phone records” are not quite as accurate as they were falsely billed to be. Companies who marketed software as capable of showing a user’s location history may also be liable, as that capability was never accurately vetted; if they had done good testing, they would have seen the true extent of its “tracking”. It seems inevitable.

Lastly, I wonder how much battery life and bandwidth this was utilizing. Some customers are on metered WiFi (especially some hotspots). To geo-tag, the phone must turn on GPS, meaning battery life was being drained behind the scenes.

Apple’s full response can be found on Congressman Ed Markey’s website (copied here for posterity).

On HTML5 And The Future Of Privacy

Today’s alarmist, under-researched news story is “New Web Code Draws Concern Over Risks to Privacy”, about HTML5 and its threat to privacy. How evil of HTML5 and its creators.

The Real Deal

Persistent cookies are nothing new. Essentially the strategy works like this: store data everywhere you can on the user’s machine, and if the data is deleted from a few locations, copy it back from another location at the next opportunity. It’s regenerative by design. A popular example is evercookie, which uses:

  • Standard HTTP Cookies
  • Local Shared Objects (Flash Cookies)
  • Storing cookies in RGB values of auto-generated, force-cached PNGs using HTML5 Canvas tag to read pixels (cookies) back out
  • Storing cookies in and reading out Web History
  • Storing cookies in HTTP ETags
  • Internet Explorer userData storage
  • HTML5 Session Storage
  • HTML5 Local Storage
  • HTML5 Global Storage
  • HTML5 Database Storage via SQLite

Note that several of these aren’t HTML5-specific, and more than one of them isn’t cleared by just “erasing cookies”.
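The regenerative design is simple to sketch in miniature. Real evercookie juggles all the mechanisms above; assuming toy stores with a localStorage-style getItem/setItem interface, the principle reduces to: read the ID from any store that still has it, then write it back into every store that lost it.

```javascript
// Sketch of the regenerative idea behind persistent cookies: the
// identifier survives as long as at least one storage location survives.
function regenerate(stores, key) {
  // Find any surviving copy of the identifier.
  const survivor = stores
    .map((s) => s.getItem(key))
    .find((v) => v !== null && v !== undefined);
  if (survivor === undefined) return null; // wiped everywhere: truly gone
  // Copy it back into every store that lost it.
  for (const s of stores) {
    if (s.getItem(key) == null) s.setItem(key, survivor);
  }
  return survivor;
}
```

Clearing only your cookies while Flash storage (or an ETag, or localStorage) survives accomplishes nothing against this design.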

HTML5 does add a few new possibilities, but they are also, by design, as easy to control, monitor and restrict as your browser (or a third-party add-on) will allow. HTML5 storage mechanisms are bound to the host that created them, making them as easy to search, sift and manage as HTTP cookies. The more obscure cookie methods (Flash cookies, various history hacks) are much worse. HTML5 storage doesn’t really pose any more of a privacy risk than what browsers have already been offering for the past decade.
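Because each HTML5 store is bound to a single origin, wiping it is trivial for a browser or add-on. A sketch against the standard Storage interface (in a browser you would pass `window.localStorage`):

```javascript
// Remove every key from one origin's Storage object, returning what was
// deleted. Iterating backwards avoids index shifting as items vanish.
function wipeOriginStorage(storage) {
  const removed = [];
  for (let i = storage.length - 1; i >= 0; i--) {
    const key = storage.key(i);
    removed.push(key);
    storage.removeItem(key);
  }
  return removed;
}
```

There is no equivalent one-liner for Flash cookies or history-sniffing hacks, which is exactly the point: the HTML5 mechanisms are the manageable ones.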

To Shut Up The Geolocation Conspiracy Theorists

Before someone even attempts the “Geolocation API lets advertisers know my location” myth, let’s get this out of the way. The specification explicitly states:

User agents must not send location information to Web sites without the express permission of the user. User agents must acquire permission through a user interface, unless they have prearranged trust relationships with users, as described below. The user interface must include the URI of the document origin [DOCUMENTORIGIN]. Those permissions that are acquired through the user interface and that are preserved beyond the current browsing session (i.e. beyond the time when the browsing context [BROWSINGCONTEXT] is navigated to another URL) must be revocable and user agents must respect revoked permissions.

Some user agents will have prearranged trust relationships that do not require such user interfaces. For example, while a Web browser will present a user interface when a Web site performs a geolocation request, a VOIP telephone may not present any user interface when using location information to perform an E911 function.

To my knowledge no user agent implements Geolocation without complying with these specifications. None.

No HTML5 Needed For Fingerprinting

Even if you do manage to wipe all the above storage locations, you’re still not untraceable. Browser fingerprinting is the idea that your system configuration alone makes you unique enough to be traceable. This includes things like your browser version, platform, Flash version, and various other bits of data plugins may additionally leak. The EFF recently did a rather impressive study on the accuracy of this technique: computers with Flash and Java installed sport 18.8 bits of entropy, and 94.2% of browsers in the EFF study were unique [cite, pdf]. Of course, their data was likely skewed towards more experienced web users, who are more likely to have an assortment of customizations on their computers (specific plugins, more variety in web browsers, operating systems, fonts) than the average internet user. I’d wager that their data downplays the effectiveness of this technique.
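To put that entropy figure in perspective (18.8 bits is the EFF’s number; the arithmetic below is just the standard information-theory conversion, not part of their study):

```javascript
// n bits of identifying information narrows a browser down to roughly
// 1 in 2^n. Independent signals add their bits together.
const anonymitySet = (bits) => Math.pow(2, bits);

// Surprisal: how many bits a single observed value carries, given the
// fraction p of browsers that share it.
const surprisal = (p) => -Math.log2(p);

// 18.8 bits -> only about 1 in 456,000 browsers share your fingerprint.
const effFingerprint = anonymitySet(18.8);
```

A user agent string shared by a quarter of users contributes `surprisal(0.25)` = 2 bits; stack enough independent signals and the anonymity set shrinks to one.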

The idea that HTML5 is a privacy risk is FUD. It doesn’t pose any greater risk than anything else already out there. It’s actually easier to counteract than what’s already being used, since it’s handled by the browser itself.

The Future

I still believe all browsers can do a much better job of protecting privacy when it comes to local data storage used for tracking. What I believe needs to happen is for web browsers to move away from the “cookie manager” interfaces that are now a decade-plus old, towards a “my data management” interface that lets users view and delete more than just cookies. It needs to encompass all the storage methods listed above, as supported by the browser. Hooks should also exist so that plug-ins that have their own data storage (like Flash) can be dealt with through the same UI.

Additionally, it needs to be possible to control retention policies per website. For example, I should be able to let Google’s storage persist indefinitely, Facebook’s for two weeks, and Yahoo’s for the length of my browser session, should I wish.

My personal preference would be for the UI to denote the longest storage time requested by any object on a webpage. Clicking on it would give a breakdown of all the hostnames that make up the page and what they are storing, and let the user select their own policy. With two clicks I could then control my privacy on a granular level. For example, visiting SafePasswd.com would give me a [6] in the UI. Clicking would show me a panel like this:

+------------------------------------------------------------------------------+
| My Data Settings for SafePasswd.com:                                         |
|                                                                              |
|  Host                        Longest Requested Lifespan    Your Choice       |
|                                                                              |
| *safepasswd.com              2 years                       [site default]    |
| googleads.g.doubleclick.net  6 years                       [browser session] |
|                                                                              |
|                                                                              |
|                                                       (Done)  (Cancel)       |
+------------------------------------------------------------------------------+

I could then override googleads.g.doubleclick.net to last only for the browser session via the drop-down, or forbid it from saving anything at all. I could optionally click through for more detail, or view the stored data, to help me make my decision. Perhaps this would also be a good place for P3P-like data to be available. One of P3P’s notable failures was that its data was never easy to view, so it never caught on.

The browser would then remember I forbid googleads.g.doubleclick.net from storing data beyond my browser session. This would apply to googleads.g.doubleclick.net regardless of what website it was used on.
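The data model under such a UI could be sketched like this (all host names, the policy shape, and the function are hypothetical, illustrating only the per-host retention idea described above):

```javascript
// Per-host retention policies, as chosen through the imagined panel.
// maxAgeMs: 0 means "browser session only"; Infinity means "keep forever".
const policies = new Map([
  ["googleads.g.doubleclick.net", { maxAgeMs: 0 }],
  ["facebook.com", { maxAgeMs: 14 * 24 * 60 * 60 * 1000 }], // two weeks
  ["google.com", { maxAgeMs: Infinity }],
]);

// Decide whether a stored item for `host` should be purged. The policy
// follows the host regardless of which page embedded its content.
function shouldExpire(host, storedAtMs, nowMs, sessionEnded) {
  const policy = policies.get(host);
  if (!policy) return false; // no override: the site default applies
  if (policy.maxAgeMs === 0) return sessionEnded;
  return nowMs - storedAtMs > policy.maxAgeMs;
}
```

The browser would run a check like this on session end and on a periodic sweep, covering cookies, localStorage, and plug-in storage alike.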

This model works better than the “confirm every cookie” model that only a handful of people on earth ever had the patience for. It provides easy access to view and control information with minimal click-throughs.

It also makes a web page much more transparent to an end-user who could then easily see who they are interacting with when they visit one webpage with several ads, widgets, social media integration points etc.

One click to view data policies, two clicks to customize, three to save.

HTML5 is not a risk here. The web moving to HTML5 is like going from the lawless land to a civilized society where structure and order rule.

More On Facebook Places Privacy

Via NY Times:

“I like Foursquare because I can actually pick who sees where I actually am, compared to Facebook, where I have 1,200 friends,” she said. “I don’t want 1,200 people knowing where I am.” Facebook does let users pick a smaller subgroup of friends who can see location updates, but Ms. Lovelidge said it would be too much trouble to set that up.

Emphasis mine. This isn’t lost on Facebook. Zuckerberg himself said: “But guess what? Nobody wants to make lists”.

The problem is that for every Ms. Lovelidge who at least acknowledges the risk and avoids it, there will be 10 others completely oblivious to the risks.

One great lesson here is that you can’t change the paradigm and assume the old security model, in this case the “friends” network, will continue to work. This is equivalent to turning a store into a private residence without bothering to replace the open storefront with a more traditional door.

Sharing Location With Strangers Via Facebook Places

Twice in a week’s time [1, 2] I’ve suggested that teens in particular have more “friends” than friends. AOL apparently did some of the research for me regarding the prevalence:

…more than half of the children surveyed (54%) don’t personally know all of the friends…

54% of teens surveyed don’t know all their “friends”. Facebook defaults the privacy setting on Places to “friends”. That means 54% of the children surveyed will likely be sharing their current location with people they don’t personally know. Places will catch on, especially once the check-in games start coming out and it becomes more fun and competitive, and half its teenage users will likely be sharing their location with strangers.

Think about this for a second. Just a few years ago society would have found the idea of teenagers revealing their current location to people they don’t even personally know to be insanity.

It’s easy to fix: just set up a group and include/exclude people as desired. The problem is that awareness of the issue is low, as is the desire and patience to sort through several hundred “friends” and bucket people.

It would also be easy for Facebook to fix by forcing users to select specific groups or individuals rather than just defaulting to the overly broad “friends”. They have the UI, and it’s actually pretty good (I’ve got some gripes, but they don’t apply to 99.9% of the population); they just don’t make users go through it, for the sake of simplicity.

I don’t really like this.

More On Facebook “Friends” And Privacy

Last week when I wrote about the risks of Facebook Places I specifically said:

Decisions on who qualifies as a friend may have been made a few years ago when the risks were different and content being exposed was much less harmful. Letting a stranger see your obnoxious status update is different than letting them know where you are.

MG Siegler at TechCrunch just realized this himself and cut the number of friends he had in half. To quote:

Facebook is mutating. The problem is that the original social graph isn’t built for this mutation. And we’re going to see that very clearly with things like this new location element.

I’d argue MG Siegler is brighter and more in tune with this sort of thing than 90%+ of Facebook users. Perhaps 99%. If he’s only realizing this now, it’s going to take a long time for the more casual user to catch on.

As I wrote last week, the term “friend” has been grossly distorted over the past few years. I strongly suspect the most at risk users are the ones who distorted it the most. Defaulting things like Places to “friends” isn’t good enough.

You’ll be seeing more about this in the press over the coming several months. This is going to get messy as people leak information they didn’t intend to.

Facebook “Simplistic” Privacy Settings Coming Soon

I’d be nothing but a jerk if I didn’t post this, considering I’ve spent a fair amount of time criticizing Facebook’s privacy policies. Facebook head of public policy Tim Sparapani, as quoted in Wired:

“Now we’ve heard from our users that we have gotten a little bit complex,” Sparapani said in a radio interview Tuesday. “I think we are going to work on that. We are going to be providing options for users who want simplistic bands of privacy that they can choose from and I think we will see that in the next couple of weeks.”

I can deal with public defaults provided it’s clear in the UI that the defaults are public and the user has an easy way to adjust privacy. What isn’t addressed is the policy of resetting settings when changes are made; no comment on that as far as I can tell.

Why “The Geeks” Are Upset About Privacy

Pete Warden on why everyone should pay attention to “the geeks”:

So why are the geeks so upset? They’re looking down the road and imagining all the things that the bad guys will be able to do once they figure out what a bonanza of information is being released. Do you remember in the 90’s when techies were hating on Windows for its poor security model? That seemed pretty esoteric for ordinary people because it didn’t cause many problems in their day-to-day usage. The next decade was when those bad decisions about the security architecture became important, as viruses and malware became far more common, and the measures to prevent them became a lot more burdensome.

I’d recommend reading the entire article.

That might be the best argument I’ve seen in a while for people who just don’t get it. When you spend enough time dealing with data, you’re forced to understand the threat models that can impact your work. You become very tuned in to what the potential exploits are and how data can be used to everyone’s advantage, and disadvantage. Despite surveys showing that people are “concerned” about their privacy and that some “use privacy settings”, I’d venture very few, likely less than 10%, actually understand what harm any given piece of data can cause, or how exactly it’s being handled and shared.

There’s a reason the industry is so focused on this lately. There’s a reason why I’ve now dedicated a majority of recent blog posts to it.