Protecting Photo Privacy Via Browsers

Browsers can do more to protect users from inadvertently violating their own privacy. The NY Times today had an article about a topic that has been discussed in various circles several times now: the existence of geotagging data in photos. Many cameras, in particular smartphones like the iPhone, can tag photos with GPS data. This is pretty handy for various purposes, including organizing photos at a later date; iPhoto, for example, does a pretty nice job of it. Most photo applications, however, don’t make this information very visible. As a result many users don’t even know it exists, and others simply forget.

What the problem looks like

The data embedded in a photo looks something like this:

GPSLatitude                    : 57.64911
GPSLongitude                   : 10.40744
GPSPosition                    : 57.64911 10.40744

Which I could easily plot on a map.
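As an aside on where those decimal numbers come from: EXIF stores each GPS coordinate as three rational numbers (degrees, minutes, seconds) plus a hemisphere reference such as N or W. Here is a minimal sketch of the conversion in Python. The function name and the sample values (chosen to reproduce the latitude above) are my own illustration, not something read from a real file:

```python
from fractions import Fraction

def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert EXIF-style degree/minute/second values to signed decimal degrees."""
    value = Fraction(degrees) + Fraction(minutes) / 60 + Fraction(seconds) / 3600
    # Southern and western hemispheres are negative in decimal notation.
    if ref in ("S", "W"):
        value = -value
    return float(value)

# Hypothetical sample: 57 degrees, 38 minutes, 56.796 seconds north.
latitude = dms_to_decimal(57, 38, Fraction(56796, 1000), "N")
print(round(latitude, 5))
```

Once the value is in decimal form, dropping it onto any mapping service is trivial, which is exactly the problem.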

Proposal

I propose that browsers need to have a content policy for when users upload images that can better protect them from uploading information they may not even realize is there. Here’s what I’m imagining:

The first time a user attempts to upload a photo that has EXIF or XMP data containing location, they are asked whether they want it stripped from the image they are uploading. The original file remains unharmed; just the uploaded version won’t have the data. They can also choose to have the browser remember their preference to prevent being prompted in the future. They can revise their choice in the preferences window later if they want. This isn’t too different from how popups are handled. I think that per-site policy might be too confusing and not warranted, but perhaps I’m wrong.
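To give a rough idea of how small the stripping step could be, here is a sketch in Python that works on a copy of the JPEG bytes and leaves the original untouched. It is an illustration under some loud assumptions: `strip_app1_segments` is a name I made up, it drops every APP1 segment (where EXIF and XMP live) rather than surgically removing only the location tags, and it ignores rarer layouts such as padding bytes between segments. No browser exposes a hook like this today.

```python
import struct

SOI = b"\xff\xd8"   # start-of-image marker
APP1 = 0xE1         # segment type that carries EXIF and XMP
SOS = 0xDA          # start of scan: compressed image data follows

def strip_app1_segments(jpeg: bytes) -> bytes:
    """Return a copy of the JPEG without APP1 (EXIF/XMP) segments."""
    if jpeg[:2] != SOI:
        raise ValueError("not a JPEG file")
    out = bytearray(SOI)
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("unexpected byte where a marker should be")
        marker = jpeg[pos + 1]
        if marker == SOS:
            # Everything from here on is image data; copy it verbatim.
            out += jpeg[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        segment_end = pos + 2 + length
        if marker != APP1:
            out += jpeg[pos:segment_end]
        pos = segment_end
    out += jpeg[pos:]
    return bytes(out)
```

A smarter version would parse the EXIF block and remove only the GPS entries, so that harmless data like camera model or orientation survives the upload.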

Warning users about hidden information they may be revealing is a worthwhile effort. It’s only a matter of time before someone uses a “contest” or some other form of social engineering to solicit pictures that may reveal location data for users. Evildoers always find creative ways to exploit people.

Caveat

There is a notable caveat to this approach: Flash uploaders would bypass this security measure. Individual uploaders could implement it themselves, or Adobe could do it, but I don’t think that’s enough of a turnoff to this approach. The same caveat applied to “private browsing” in browsers.

Prior Work

As far as I know no browser actually implements a security feature like this yet. There are a few Firefox Add-ons like Exif Viewer and FxIF (both written in pure JavaScript) that look at EXIF data but nothing that intercepts uploads.

Who Can Do It First?

I’m curious who can do it first, whether by add-on (seems like it should be possible, at least in Firefox) or, dare I say, by including it in a browser itself. If this were earlier in the year I would have added this to the Summer of Code ideas list. Instead I’m just throwing it into the wind until 2011 rolls around.

The Myth Of The “Internet Generation”

I’m glad to see more evidence dispelling the myth that there is a generation that is so tuned into the Internet they put others to shame. There’s no such thing as “digital natives” or an “Internet generation”. There were no “automotive natives”, “electric natives”, “movable type natives”, or any other “[insert technical revolution] natives” in human history.

I do suspect the results will be slightly different in the US where social and cultural differences tend to result in more online usage and less social interaction than in Europe, but that is still immaterial.

The idea that any skills can become almost innate is silly at best. Growing up with something doesn’t make you more functional with it. Human language is a very specialized skill. Equating language to using a search engine or managing files on your computer is hardly sensible. Language is so specialized that science suggests there are parts of our brain that have evolved specifically for it. By contrast we don’t have a part of our brain for computer or internet skills.

From the article:

More surprising yet, these supposedly gifted netizens are not even particularly adept at getting the most out of the Internet. “They can play around,” says Rolf Schulmeister, an educational researcher from Hamburg who specializes in the use of digital media in the classroom. “They know how to start up programs, and they know where to get music and films. But only a minority is really good at using it.”

It’s not really surprising. People learn to do what they need and want to do. They also know how to shop in stores, and watch movies in theaters. That doesn’t mean they know how to produce products and movies. They don’t even become experts in products they buy or movies they watch. They became consumers over a new medium and nothing more. They couldn’t tell you what format the YouTube video they watched is in, and they likely don’t know what format it’s in at the theater either. They don’t know the technology behind their favorite websites, and they don’t know what goes into running a store. They just consume. That’s what consumers do.

A special segment with an interest in the technology will specialize in it and learn. Those people generally become Computer Science majors. Others will choose fields like medicine, geology, marketing, economics and basket weaving to gain a thorough understanding of them and eventually use that for gainful employment. None of these fields had “natives” either despite having their own renaissance periods.

So let’s stop with this “Internet Generation” stuff.

DNSSEC Root Key

Bruce Schneier pointed out that the DNSSEC root key has been divided among seven people for security:

Part of ICANN’s security scheme is the Domain Name System Security, a security protocol that ensures Web sites are registered and “signed” (this is the security measure built into the Web that ensures when you go to a URL you arrive at a real site and not an identical pirate site). Most major servers are a part of DNSSEC, as it’s known, and during a major international attack, the system might sever connections between important servers to contain the damage.

A minimum of five of the seven keyholders – one each from Britain, the U.S., Burkina Faso, Trinidad and Tobago, Canada, China, and the Czech Republic – would have to converge at a U.S. base with their keys to restart the system and connect everything once again.

Based on this key signing video it looks like they are using smart cards and an AEP Keyper HSM for this critical task. Schneier suspects it implements Shamir’s Secret Sharing algorithm.
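For anyone who hasn’t run into it, here is a toy sketch of Shamir’s Secret Sharing in Python with the same five-of-seven shape. To be clear, this is only an illustration of the general idea: whether the HSM actually works this way is Schneier’s suspicion, not something I can confirm, and the prime, function names, and sample secret are all my own choices. The secret is the constant term of a random polynomial of degree four; each keyholder gets one point on the curve, and any five points are enough to interpolate back to the constant term.

```python
import secrets

# Assumed field size for this sketch: the Mersenne prime 2**127 - 1.
PRIME = 2**127 - 1

def split_secret(secret, threshold, shares):
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        # Horner's rule, working modulo PRIME.
        acc = 0
        for coeff in reversed(coeffs):
            acc = (acc * x + coeff) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, shares + 1)]

def recover_secret(points):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = split_secret(123456789, threshold=5, shares=7)
print(recover_secret(shares[:5]) == 123456789)
```

With fewer than five points the polynomial is underdetermined, so four keyholders together learn nothing useful about the secret.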

Considering how much our economy and our lives rely on the Internet these days, DNS is becoming a more and more critical part of our society. This is a very big event. No precaution is too great to ensure security of such critical infrastructure.

Internet Under The Sea

Most people these days seem to assume that all telecom (phone and internet in particular) goes overseas by way of satellites. This, however, is far from the truth. There are actually many trans-oceanic cables and they provide most of the capacity. Mail Online has a great article about what goes into keeping the continents connected.

It’s a whole secretive industry (partly for security reasons) that in a low-key way keeps critical communication between entire continents moving. Most people aren’t even looking at the water; they look up at the sky.

Developing A Thick Skin Is Bullshit

Sarah Lacy’s TechCrunch post Conan O’Brien’s Love/Hate Relationship with the Internet has a very interesting insight from Coco:

O’Brien said the biggest thing that held him back from both writing and performing was a fear of being criticized because he’s incredibly sensitive. He punched a big hole in one of the biggest clichés in fame—that you just have to develop a thick skin. He says he’s still just as sensitive and criticism still hurts just as much. The secret is to just keep going anyway, because you will get criticized no matter how brilliant you are.

This is clearly something that’s gotten more pronounced in a Web age, but there may be a silver lining to that. In a time when every video, photo, blog post and Tweet can easily be trashed by others, people learn that criticism is inevitable early on.

The sensitivity to criticism I suspect is a motivator to O’Brien himself and I’m certain to others. It’s hard for it to not have an impact, even on the most jaded of individuals.

I think that has become true for anyone in any industry, not just late night show hosts. To quote a movie title, “We Live In Public,” thanks largely to the Internet.

I learned long ago to take criticism and praise with a grain of salt. Neither is always genuine or accurate, especially on the Internet. Anonymity does make people more aggressive, but it also sometimes makes them more honest. Most people have a hard time giving criticism to your face even when asked to do so. It’s almost too easy when you can type it from a distance. Praise can have many false motivators that can often be hard to detect online.

It’s never “fun” to get a really nasty or critical blog comment, email, article, or blog post written about you or something you’ve done. This is especially true if you’ve dedicated a lot of time and effort. Regardless, at some point you need to ask yourself: Is there any truth to this? Can I do something better? Once you’ve done that, it’s time to move on and plow forward. You won’t always be able to learn something, but that’s OK.

From my experience, being a web developer, writing code that’s open source, and blogging means you’re going to get feedback, welcome or not. It’s a certainty that not all of it will be positive. Some will be negative, some will just be vile. Some however will be constructive. It’s to your advantage to use it.

Criticism and debate are a critical part of academia. Programming still has strong ties to academia, perhaps more so than many other industries, which may be why accepting criticism seems so natural for me and at least some of my peers. It’s like being graded in school. Or perhaps it’s because I started at about 14 or 15 years old as a developer. I got used to this sort of thing pretty early on during my formative teen years and it’s more natural for me. I guess I’m not really a true part of the trophy kids generation despite falling into the age group.

Why “The Geeks” Are Upset About Privacy

Pete Warden on why everyone should pay attention to “the geeks”:

So why are the geeks so upset? They’re looking down the road and imagining all the things that the bad guys will be able to do once they figure out what a bonanza of information is being released. Do you remember in the 90’s when techies were hating on Windows for its poor security model? That seemed pretty esoteric for ordinary people because it didn’t cause many problems in their day-to-day usage. The next decade was when those bad decisions about the security architecture became important, as viruses and malware became far more common, and the measures to prevent them became a lot more burdensome.

I’d recommend reading the entire article.

That might be the best argument I’ve seen in a while for people who just don’t get it. When you spend enough time dealing with data you’re forced to understand the threat models that can impact your work. You become very tuned into what the potential exploits are and how they can be used to everyone’s advantage, and disadvantage. Despite surveys that show people are “concerned” about their privacy, and some “use privacy settings”, I’d venture very few, likely fewer than 10%, actually understand what harm any piece of data can do, and how exactly it’s being handled and shared.

There’s a reason the industry is so focused on this lately. There’s a reason why I’ve now dedicated a majority of recent blog posts to it.

Google Should Use Google Wave Against Facebook

Help me Google; you’re my only hope

Google should use Google Wave against Facebook.

It’s not as crazy as it sounds. I will be the first to say I was unimpressed by Google Wave from a user point of view. I should note Google Wave was pitched as an email alternative, and it’s not great at that job. The technical perspective was pretty impressive. It is however a potentially killer distributed social media network. It will take slight retooling to adjust it for the task, but it is already better suited to compete against Facebook than against email.

It’s actually a pretty good alternative if the UI were better tuned to the task. Allow me to explain:

It’s close feature wise

I won’t go into point after point, but Google Wave can carry out many of the same things that Facebook can. It’s a good way to communicate in an open or closed fashion and each wave can already be granular in terms of privacy. It can be used to share much more than text. It can be used for the purposes of photos or video. It can be extended by third parties utilizing its API. It already has chat support. It’s built on XMPP. It can easily match Facebook in almost every way already. It can be extended to do what it can’t today. Profiles are the biggest thing it lacks. I suspect that wouldn’t take much to add in. I’m thinking an extendable XMPP vCard from the technical side.

It’s distributed

Google Wave is hosted by Google, but it’s also an open protocol and Google’s releasing chunks of their implementation. That means they can partner with other large companies (AOL, Yahoo, Microsoft, Apple etc.) who can federate and let their users all instantly be part of one huge social network. Users already have “friends” via their address books for email. Importing from other sources is easy, just look how Facebook did it. If Google got AOL, Yahoo, or Microsoft to partner with them they would overnight reach a huge chunk of the Internet population via their e-mail users.

For those who are going to try and argue that Facebook users don’t have email addresses, yes they do. It’s the primary method of notifying users of things, other than SMS, and is required to sign up for an account.

This also means you can host yourself, or use the provider of your choice. You’re not subject to Facebook, or any one company, deciding your fate.

It would be more private

One of the primary gripes against Facebook is its privacy measures are inadequate. Facebook has motives to force people to be more public. There’s little incentive to help you stay private, since the alternatives are slim. With Google Wave being hosted by several providers they will need to give you more control, or you will just move to a provider that will give you the controls you want. Just like with email. By using your own domain to point to a provider you would have portability of your identity. Once again Google Wave by design is more granular than Facebook. It’s based already around the concept of sharing data. What Google Wave really needs is a robust profile implementation with granular permissions and the ability to bucket contacts to make permissions more manageable.

Despite its UI and marketing pitch, it’s a surprisingly close Facebook competitor.

It would be a healthier ecosystem

Like I mentioned before, Google Wave has a fairly decent API already. What is great about it is that providers would be pressured to provide a robust enough API so that the killer apps exist on their platform. Again, no more reliance on a single source. By standardizing at least a subset of the API developers can target multiple providers and implementors. It also means providers will need to allow for more granular controls over privacy settings for third-party apps or once again, people will be switching.

Google wins too – keeps them in the center of the universe

Google likes to be the center of things, especially information. By doing this Google would still be able to organize a user’s information in meaningful ways for them, which is really what Google Wave’s main goal for Google is. Google has a major win. Anyone a user trusts to index their information can do so. If the user is paranoid, they can keep totally private. If you really want to be private you could run it on your own private server. If you don’t trust Google, you can avoid them but still join the party.

It would be more permanent

Facebook is still not guaranteed to be around in 10 years. Email however is overwhelmingly likely to still be around. Just like newsgroups and IRC still have their place, even if they aren’t as mainstream anymore. Why? Because they are all open standards and not tied to one company’s profitability. I can still find and read old newsgroup posts from over 20 years ago. Feel that confident about Twitter? Facebook? foursquare? How much time do you invest in them?

What about diaspora or _______?

diaspora is a clever effort and a noble one getting a lot of press today. It really is. But I think it’s too complex for real widespread adoption, especially in the era of easy to use web apps. It’s true that users flocked to P2P apps despite complexity, but that’s because there were no alternatives with less overhead. I’d give most of these efforts a 5% chance of any real success.

Star Wars is copyright Lucasfilm

Opera Mini Approved For iPhone

I’ve yet to actually try it myself, but Opera Mini was approved today for the iPhone. While this is the first non-WebKit browser “on” the iPhone, it’s worth noting that the rendering engine isn’t actually on the phone. The rendering is done on a proxy server which is how they save bandwidth and increase performance.

Interesting, but I’d still like to see other rendering engines on the iPhone.

Fourth Amendment In The Cloud

The Fourth Amendment in the United States Constitution reads:

The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no Warrants shall issue, but upon probable cause, supported by Oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized.

James Madison slipped up and failed to account for advancements in technology like computers and the Internet. Are digital files considered “papers and effects”? Is law enforcement copying files considered “searches and seizures”? If your files live on a server is that considered your “house”? Of course back in his day, this wasn’t even comprehensible. The amendment is a bit dated.

The Electronic Communications Privacy Act (ECPA) was an effort in 1986 to clarify how such laws applied to electronic communications. It too is somewhat outdated and focused more heavily on the transfer than on the storage aspect, something the modern SaaS model has completely disrupted. It’s also been weakened and contradicted by court rulings and things like the Patriot Act.

This creates enough of a legal quagmire that a seemingly bizarre list of companies and organizations has formed the Digital Due Process Coalition to revise and clarify these laws. For companies like Google and Microsoft it makes sense. Their business relies on making companies and individuals feel comfortable trusting them with personal data. They are also increasingly stuck in odd positions thanks to contradictory and untested laws.

The outcome of this will possibly be as long-lasting and as iconic as the Fourth Amendment itself. Given that our culture, information, and way of life are becoming increasingly digital, it will impact a large part of how we function now and in years to come. For anyone working in IT, this will impact the way you do business.

Who Indexes Tweets

I was curious who is indexing the links that people tweet on Twitter. It’s obvious someone does since links get ‘clicks’ almost immediately after submission. To do this, presumably they are tapping into the XMPP firehose.

Let’s take a look:

66.xxx.xxx.xxx - - [06/Dec/2009:20:17:43 +0000] "GET /test HTTP/1.1" 301 20 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

I guess Google has a deal with Twitter. Googlebot indexed the link just a few seconds after the tweet was sent. As far as I know nothing has actually been announced. This is the first evidence I know of a potential deal of some sort. I’d be shocked if Google is scraping the site this quickly.

Edit: Stephen Duncan pointed out in the comments that this was announced in October. Totally forgot about that.

208.xxx.xxx.xxx - - [06/Dec/2009:20:17:47 +0000] "GET /test HTTP/1.0" 301 - "-" "Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly.html) Gecko/2009032608 Firefox/3.0.8"

This is Topsy, a Twitter search engine. Never saw this site before. A few tests in and I actually kind of like the output.

89.xxx.xxx.xxx - - [06/Dec/2009:20:17:58 +0000] "GET /test HTTP/1.1" 301 - "-" "Mozilla/5.0 (compatible; MSIE 6.0b; Windows NT 5.0) Gecko/2009011913 Firefox/3.0.6 TweetmemeBot"

Tweetmeme mines Twitter links and attempts to build a Digg-like index based on retweets rather than Diggs.

75.xxx.xxx.xxx - - [06/Dec/2009:20:18:05 +0000] "GET /test HTTP/1.1" 301 - "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"
72.xxx.xxx.xxx - - [06/Dec/2009:20:20:25 +0000] "GET /test HTTP/1.1" 301 - "-" "Python-urllib/2.5"

Can’t identify these AWS hosted services.

70.xxx.xxx.xxx - - [06/Dec/2009:20:20:53 +0000] "GET /test HTTP/1.1" 301 20 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"
70.xxx.xxx.xxx - - [06/Dec/2009:20:24:23 +0000] "GET /test HTTP/1.1" 301 20 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"

This is actually Microsoft. Microsoft’s Bing search engine indexes Twitter. I’m not sure why they indexed twice at such close intervals. That seems odd for this day and age.

Mining logs a little deeper it looks like when tweets meet certain criteria (such as retweeted) there are other bots that spider them. It also looks like other search engines may be indexing at a slower rate (Baidu for example).

There are several others from AWS and a few other dedicated providers. These servers are obviously trying to keep a low profile; they don’t even have reverse DNS.

So there you go. Just a matter of seconds after a link hits Twitter this all happens.
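If you want to run the same experiment on your own logs, something like the following Python sketch would pull out the timing. It assumes the combined log format seen in the lines above; the pattern, function names, and the choice to measure from the first hit (the logs don’t contain the tweet’s own timestamp) are mine.

```python
import re
from datetime import datetime

# Assumed layout: host, two ignored fields, [time], "request", status, size,
# "referer", "user agent", matching the log lines quoted above.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry

def seconds_after_first(lines):
    """List (seconds since the first hit, user agent) for each parsed line."""
    entries = [e for e in map(parse_line, lines) if e is not None]
    if not entries:
        return []
    entries.sort(key=lambda e: e["time"])
    first = entries[0]["time"]
    return [(int((e["time"] - first).total_seconds()), e["agent"]) for e in entries]

SAMPLE = [
    '66.xxx.xxx.xxx - - [06/Dec/2009:20:17:43 +0000] "GET /test HTTP/1.1" 301 20 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '208.xxx.xxx.xxx - - [06/Dec/2009:20:17:47 +0000] "GET /test HTTP/1.0" 301 - "-" "Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly.html) Gecko/2009032608 Firefox/3.0.8"',
    '89.xxx.xxx.xxx - - [06/Dec/2009:20:17:58 +0000] "GET /test HTTP/1.1" 301 - "-" "Mozilla/5.0 (compatible; MSIE 6.0b; Windows NT 5.0) Gecko/2009011913 Firefox/3.0.6 TweetmemeBot"',
]

for delay, agent in seconds_after_first(SAMPLE):
    print(delay, agent)
```

Keep in mind that user agents are trivially spoofed, so the agent string alone only tells you who a bot claims to be.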

Here’s a few more from another Tweet that weren’t in the first set:

Edit: More!:

75.xxx.xxx.xxx - - [06/Dec/2009:20:49:42 +0000] "GET /test HTTP/1.1" 301 - "-" "Mozilla/5.0 (compatible; Feedtrace-bot/0.2; bot@feedtrace.com)"

Feedtrace is some sort of Twitter mining service currently in beta.

67.xxx.xxx.xxx - - [06/Dec/2009:20:49:45 +0000] "GET /test HTTP/1.0" 301 - "-" "Mozilla/5.0 (compatible; mxbot/1.0; +http://www.chainn.com/mxbot.html)"

Chainn is a social data mining service with a few apps that make use of the data it collects.