Back in 2008 I did a special segment in my “Secrets In Websites” series for that year’s presidential election. It was quite popular (it almost crashed the server). I decided to do it again, slightly revised for 2012.
The OkCupid blog has a pretty cool series of posts analyzing all sorts of data that they collect anonymously. This time around they analyzed user photos and tried to figure out the trends behind what makes people look attractive. Some of the more interesting findings:
- Panasonic cameras are better than Nikon.
- Interchangeable lens cameras (like digital SLRs) are better than basic point & shoot cameras, which are better than camera phones.
- iPhone users have more sex.
- Flash makes you look older (obvious: harsh lighting is nobody’s friend).
- Shallow depth of field is a good thing.
- Late-night and late-afternoon photos are better.
In conclusion they say:
It’s actually not that hard. Use a decent camera. Go easy on the flash. Own the foreground. Take your picture in the afternoon. Then visit the nearest Apple store. Done.
Very interesting stuff. It’s a great example of what can be done with a large set of data. I’m guessing somewhere deep within Google and Facebook there is a group of people doing even deeper analysis with a much larger set of data.
A bunch of folks on Planet Mozilla are running Wordle on their blogs. I can’t resist. My apologies to all who hate these memes. I’m adding a little twist, though. The first is my whole blog; the second is only the Mozilla-related posts, so that this is a bit more relevant to PMO.
I’ve always had a little fascination with this stuff. I’ve had tag clouds on my blog archives page for years now. It’s an interesting way to look at text. It gives you a good feel for the content of a large body of text in just a quick glance. Often better than any summary could.
It would be interesting to be able to view your browsing history like this. I think it should take into account the number of times a word is found as well as the number of times you visit a page with a given word. The number of times a word is found should have a slightly higher weight. It could be implemented by using <canvas/>, as Benjamin Smedberg demonstrated in a similar exercise, together with Places, perhaps using the spell-checking dictionary to pull words out of URLs, for example. For obvious reasons the page title is easier. The complexity here would likely be processing time. Anyone interested?
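
To make the weighting idea a bit more concrete, here is a rough Python sketch. It is not how Places or Wordle actually work. The `word_weights` function, the two weight constants, and the sample history are all made up for illustration, and it only looks at page titles (the easy case), assuming history is available as (title, visit count) pairs.

```python
import re
from collections import Counter

# Hypothetical weights: a word's occurrence count matters slightly more
# than the visit count of the pages it appears on.
OCCURRENCE_WEIGHT = 1.2
VISIT_WEIGHT = 1.0


def word_weights(history, occurrence_weight=OCCURRENCE_WEIGHT,
                 visit_weight=VISIT_WEIGHT):
    """Score words from an iterable of (page_title, visit_count) pairs."""
    occurrences = Counter()  # how many times each word is found in titles
    visits = Counter()       # total visits to pages whose title has the word
    for title, visit_count in history:
        words = re.findall(r"[a-z']+", title.lower())
        occurrences.update(words)
        # Count a page's visits once per distinct word in its title.
        for word in set(words):
            visits[word] += visit_count
    return {
        word: occurrence_weight * occurrences[word] + visit_weight * visits[word]
        for word in occurrences
    }


# Made-up sample history, for illustration only.
history = [
    ("Mozilla Firefox release notes", 3),
    ("Firefox add-ons", 5),
    ("Planet Mozilla", 2),
]
weights = word_weights(history)

# Largest weights first; these would map to the biggest words in the cloud.
for word, weight in sorted(weights.items(), key=lambda item: -item[1])[:5]:
    print(word, round(weight, 1))
```

With raw counts like these, heavily visited pages can easily swamp the occurrence term, so a real version would probably want to normalize or dampen the visit counts before mixing the two.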