Categories
Internet Security

Why “The Geeks” Are Upset About Privacy

Pete Warden on why everyone should pay attention to “the geeks”:

So why are the geeks so upset? They’re looking down the road and imagining all the things that the bad guys will be able to do once they figure out what a bonanza of information is being released. Do you remember in the 90’s when techies were hating on Windows for its poor security model? That seemed pretty esoteric for ordinary people because it didn’t cause many problems in their day-to-day usage. The next decade was when those bad decisions about the security architecture became important, as viruses and malware became far more common, and the measures to prevent them became a lot more burdensome.

I’d recommend reading the entire article.

That might be the best argument I’ve seen in a while for people who just don’t get it. When you spend enough time dealing with data you’re forced to understand the threat models that can impact your work. You become very tuned in to what the potential exploits are and how they can be used to everyone’s advantage, and disadvantage. Despite surveys that show people are “concerned” about their privacy, and some “use privacy settings”, I’d venture very few, likely less than 10%, actually understand what harm any piece of data can do, and how exactly it’s being handled and shared.

There’s a reason the industry is so focused on this lately. There’s a reason why I’ve now dedicated a majority of recent blog posts to it.

Categories
Google Internet

Google Should Use Google Wave Against Facebook

Google should use Google Wave against Facebook.

It’s not as crazy as it sounds. I will be the first to say I was unimpressed by Google Wave from a user’s point of view: it was pitched as an email alternative, and it’s not great at that job. From a technical perspective, though, it was pretty impressive. It is a potentially killer distributed social media network. It would take some slight retooling to adjust it for the task, but it is already better suited to compete against Facebook than against email.

It’s actually a pretty good alternative if the UI were better tuned to the task. Allow me to explain:

It’s close feature wise

I won’t go into point after point, but Google Wave can do many of the same things that Facebook can. It’s a good way to communicate in an open or closed fashion, and each wave can already be granular in terms of privacy. It can be used to share much more than text, including photos and video. It can be extended by third parties via its API. It already has chat support. It’s built on XMPP. It can already mirror Facebook in almost every way, and it can be extended to do what it can’t today. Profiles are the biggest thing it lacks, and I suspect that wouldn’t take much to add. I’m thinking an extendable XMPP vCard from the technical side.
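To make the vCard idea concrete, here’s a minimal sketch using Python’s stdlib XML tools. The standard fields come from XEP-0054 (vcard-temp); the extension namespace and profile fields are purely hypothetical, just to show how extra social-profile data could ride along:

```python
import xml.etree.ElementTree as ET

# Standard vcard-temp fields (XEP-0054).
vcard = ET.Element("vCard", {"xmlns": "vcard-temp"})
ET.SubElement(vcard, "FN").text = "Jane Example"
ET.SubElement(vcard, "NICKNAME").text = "jane"
ET.SubElement(vcard, "URL").text = "https://example.com/jane"

# Hypothetical extension namespace carrying social-profile fields.
ext = ET.SubElement(vcard, "x", {"xmlns": "urn:example:wave:profile"})
ET.SubElement(ext, "status").text = "Riding the wave"
ET.SubElement(ext, "visibility").text = "contacts-only"

print(ET.tostring(vcard, encoding="unicode"))
```

Since vCards already travel over XMPP, a federated profile wouldn’t need much new plumbing.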

It’s distributed

Google Wave is hosted by Google, but it’s also an open protocol, and Google is releasing chunks of their implementation. That means they can partner with other large companies (AOL, Yahoo, Microsoft, Apple etc.) who can federate and let their users all instantly be part of one huge social network. Users already have “friends” via their address books for email. Importing from other sources is easy; just look at how Facebook did it. If Google got AOL, Yahoo, or Microsoft to partner with them, they would overnight reach a huge chunk of the Internet population via their email users.

For those who are going to try to argue that Facebook users don’t have email addresses: yes, they do. Email is a primary method of notifying users of things (other than SMS) and is required to sign up for an account.

This also means you can host it yourself, or use the provider of your choice. You’re not subject to Facebook, or any one company, deciding your fate.

It would be more private

One of the primary gripes against Facebook is that its privacy measures are inadequate. Facebook has motives to force people to be more public. There’s little incentive to help you stay private, since the alternatives are slim. With Google Wave being hosted by several providers, each will need to give you more control, or you will just move to a provider that gives you the controls you want. Just like with email. By using your own domain to point to a provider, you would have portability of your identity. Once again, Google Wave is by design more granular than Facebook; it’s already built around the concept of sharing data. What Google Wave really needs is a robust profile implementation with granular permissions and the ability to bucket contacts to make permissions more manageable.

Despite its UI and marketing pitch, it’s a surprisingly close Facebook competitor.

It would be a healthier ecosystem

As I mentioned before, Google Wave already has a fairly decent API. What’s great is that providers would be pressured to offer a robust enough API that the killer apps exist on their platform. Again, no more reliance on a single source. By standardizing at least a subset of the API, developers can target multiple providers and implementations. It also means providers will need to allow more granular controls over privacy settings for third-party apps or, once again, people will switch.

Google wins too – keeps them in the center of the universe

Google likes to be at the center of things, especially information. By doing this, Google would still be able to organize a user’s information in meaningful ways, which is really Google Wave’s main goal for Google. That’s a major win for them. Anyone a user trusts to index their information can do so. If users are paranoid, they can keep their data totally private. If you really want to be private, you could run it on your own server. If you don’t trust Google, you can avoid them but still join the party.

It would be more permanent

Facebook is still not guaranteed to be around in 10 years. Email, however, is overwhelmingly likely to still be around, just like newsgroups and IRC still have their place, even if they aren’t as mainstream anymore. Why? Because they are all open standards and not tied to one company’s profitability. I can still find and read newsgroup posts from over 20 years ago. Feel that confident about Twitter? Facebook? foursquare? How much time do you invest in them?

What about Diaspora or _______?

Diaspora is a clever effort and a noble one getting a lot of press today. It really is. But I think it’s too complex for real widespread adoption, especially in the era of easy-to-use web apps. It’s true that users flocked to P2P apps despite their complexity, but that’s because there were no alternatives with less overhead. I’d give most of these efforts a 5% chance of any real success.


Categories
Apple Internet

Opera Mini Approved For iPhone

I’ve yet to actually try it myself, but Opera Mini was approved today for the iPhone. While this is the first non-WebKit browser “on” the iPhone, it’s worth noting that the rendering engine isn’t actually on the phone. The rendering is done on a proxy server, which is how they save bandwidth and increase performance.

Interesting, but I’d still like to see other rendering engines on the iPhone.

Categories
Internet Politics Security

Fourth Amendment In The Cloud

The Fourth Amendment in the United States Constitution reads:

The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no Warrants shall issue, but upon probable cause, supported by Oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized.

James Madison slipped up and failed to account for advancements in technology like computers and the Internet. Are digital files considered “papers and effects”? Is law enforcement copying files considered “searches and seizures”? If your files live on a server, is that considered your “house”? Of course, back in his day none of this was even conceivable. The amendment is a bit dated.

The Electronic Communications Privacy Act (ECPA) was a 1986 effort to clarify how such laws apply to electronic communications. It too is somewhat outdated, focused more heavily on the transfer of data than on its storage, a distinction the modern SaaS model has completely disrupted. It’s also been weakened and contradicted by court rulings and things like the Patriot Act.

This creates enough of a legal quagmire that a seemingly bizarre list of companies and organizations has formed the Digital Due Process Coalition to revise and clarify these laws. For companies like Google and Microsoft it makes sense: their business relies on making companies and individuals feel comfortable trusting them with personal data. They are also increasingly stuck in odd positions thanks to contradictory and untested laws.

The outcome of this may be as long-lasting and as iconic as the Fourth Amendment itself. Given that our culture, information, and way of life are becoming increasingly digital, it will shape a large part of how we function in years to come. For anyone working in IT, this will impact the way you do business.

Categories
Google Internet

Who Indexes Tweets

I was curious who is indexing the links that people tweet on Twitter. It’s obvious someone does, since links get ‘clicks’ almost immediately after submission. Presumably they are tapping into the XMPP firehose to do this.

Let’s take a look:

66.xxx.xxx.xxx - - [06/Dec/2009:20:17:43 +0000] "GET /test HTTP/1.1" 301 20 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

I guess Google has a deal with Twitter. Googlebot indexed just a few seconds after the tweet was sent. As far as I know nothing has actually been announced; this is the first evidence I’ve seen of a potential deal of some sort. I’d be shocked if Google were scraping the site this quickly.

Edit: Stephen Duncan pointed out in the comments that this was announced in October. Totally forgot about that.

208.xxx.xxx.xxx - - [06/Dec/2009:20:17:47 +0000] "GET /test HTTP/1.0" 301 - "-" "Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly.html) Gecko/2009032608 Firefox/3.0.8"

This is Topsy, a Twitter search engine. I’d never seen this site before, but after a few tests I actually kind of like the output.

89.xxx.xxx.xxx - - [06/Dec/2009:20:17:58 +0000] "GET /test HTTP/1.1" 301 - "-" "Mozilla/5.0 (compatible; MSIE 6.0b; Windows NT 5.0) Gecko/2009011913 Firefox/3.0.6 TweetmemeBot"

Tweetmeme mines Twitter links and attempts to build a Digg-like index based on retweets rather than Diggs.

75.xxx.xxx.xxx - - [06/Dec/2009:20:18:05 +0000] "GET /test HTTP/1.1" 301 - "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"
72.xxx.xxx.xxx - - [06/Dec/2009:20:20:25 +0000] "GET /test HTTP/1.1" 301 - "-" "Python-urllib/2.5"

I can’t identify these AWS-hosted services.

70.xxx.xxx.xxx - - [06/Dec/2009:20:20:53 +0000] "GET /test HTTP/1.1" 301 20 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"
70.xxx.xxx.xxx - - [06/Dec/2009:20:24:23 +0000] "GET /test HTTP/1.1" 301 20 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"

This is actually Microsoft; Bing indexes Twitter. I’m not sure why they indexed twice at such close intervals; that seems odd for this day and age.

Mining the logs a little deeper, it looks like when tweets meet certain criteria (such as being retweeted) there are other bots that spider them. It also looks like other search engines may be indexing at a slower rate (Baidu, for example).

There are several others from AWS and a few other dedicated providers. These servers are obviously trying to keep a low profile; they don’t even have reverse DNS.

So there you go. Just a matter of seconds after a link hits Twitter this all happens.

Edit: Here are a few more, from another tweet, that weren’t in the first set:

75.xxx.xxx.xxx - - [06/Dec/2009:20:49:42 +0000] "GET /test HTTP/1.1" 301 - "-" "Mozilla/5.0 (compatible; Feedtrace-bot/0.2; bot@feedtrace.com)"

Feedtrace is some sort of Twitter mining service currently in beta.

67.xxx.xxx.xxx - - [06/Dec/2009:20:49:45 +0000] "GET /test HTTP/1.0" 301 - "-" "Mozilla/5.0 (compatible; mxbot/1.0; +http://www.chainn.com/mxbot.html)"

Chainn is a social data mining service with a few apps that make use of the data it collects.
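If you want to do this kind of log spelunking yourself, a minimal sketch of pulling bot hits out of an Apache combined-format access log might look like this (the bot-name markers are just a starting list, not exhaustive):

```python
import re

# Apache "combined" log format: IP, identity, user, [timestamp],
# "request", status, bytes, "referer", "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def bot_hits(lines):
    """Yield (timestamp, user-agent) for hits whose user-agent names a known bot."""
    markers = ("bot", "butterfly", "spider", "crawl")
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m and any(marker in m.group("agent").lower() for marker in markers):
            yield m.group("time"), m.group("agent")

sample = ('66.0.0.1 - - [06/Dec/2009:20:17:43 +0000] "GET /test HTTP/1.1" 301 20 '
          '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')
print(list(bot_hits([sample])))
```

Feed it your real access log and diff the timestamps against when the tweet went out.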

Categories
Internet

Data Center Power Consumption

It’s hardly a secret that there is serious demand for saving power in data centers. From a recent Times Magazine article:

Data centers worldwide now consume more energy annually than Sweden. And the amount of energy required is growing, says Jonathan Koomey, a scientist at Lawrence Berkeley National Laboratory. From 2000 to 2005, the aggregate electricity use by data centers doubled. The cloud, he calculates, consumes 1 to 2 percent of the world’s electricity.

To put that in a little more perspective, the 2009 census for Sweden puts its population at 9,263,872, just slightly higher than New York City (8,274,527 in 2007) or the state of New Jersey (8,682,661 estimated in 2008). Granted, Sweden’s population density is 20.6/km2 compared to New York City’s 10,482/km2 or New Jersey’s 438/km2. Population density matters, since it says a lot about energy consumption: dense populations require less energy thanks to communal resources. I still suspect the average Swede uses less electricity than the average American anyway. All these numbers were pulled from Wikipedia.

The US Department of Energy does have data on power consumption and capacity, as well as forecasts on consumption and production. The obvious downside in the data is the reliance on coal, oil and gas, which have environmental impacts as well as political impacts and costs (we know about the instabilities of the oil market). This is why companies with lots of servers, like Google, are looking very carefully at power generation alternatives such as hydroelectric and solar.

We all benefit from data center efficiency. Lower-cost computing is a big advantage to startups and encourages more innovation by removing price barriers. It’s also an advantage to the general public, since the technology and tricks learned eventually trickle down to consumers. We’re already seeing more efficient power supplies, some even beating the original 80 PLUS certification.

Perhaps if we started tracking “performance per watt” in addition to “watts per square foot” we’d be looking at things from a more sustainable perspective.
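The distinction is easy to see with a back-of-the-envelope calculation. Both servers below are hypothetical, with made-up numbers purely for illustration:

```python
# Two hypothetical servers: the newer one draws more watts but does far more work.
servers = {
    "old": {"requests_per_sec": 500, "watts": 400},
    "new": {"requests_per_sec": 900, "watts": 450},
}

for name, s in servers.items():
    perf_per_watt = s["requests_per_sec"] / s["watts"]
    print(f"{name}: {perf_per_watt:.2f} requests/sec per watt")
```

By “watts per square foot” the newer box looks worse; by “performance per watt” it’s clearly the more sustainable choice.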

Data center capacity and consumption is pretty interesting when you look at all the variables involved. Growth, power costs, facility size, technology available, even foreign politics play a role in what it costs to operate.

Categories
Google Internet

Google Wave

Google Wave is a pretty impressive demo, and the fact that they are open sourcing most of it, documenting the protocol and enabling federation is a major win, but I’m hesitant to think it will replace email anytime soon, if ever.

John Gruber has a very interesting observation:

Communication systems that succeed are usually conceptually simple: the telegraph, the telephone, fax, email, IM, Twitter. So color me skeptical regarding Wave’s prospects.

A very valid point. Popular technical communication systems solve one communications problem; attempts to solve more than one have so far failed. A good example is the video phone we were supposed to have in every home 20 years ago. Even today, with cheap webcams, combining video and telephony remains rare and is seen as somewhat of a novelty.

Wave also has other limitations, such as how people who use Wave interact with people who don’t. Most of the “wow” in Wave requires interacting with other Wave users. Pretty cool if everyone you communicate with is using Wave, but not so much if many or most of your contacts aren’t. How many people only communicate with others using the same mail provider? Do Google users never email Yahoo and Microsoft users?

Will Wave be adopted? For one thing, it will change the business model of many email providers. Wave will be significantly more resource intensive than basic webmail and POP3 access (or IMAP for the rare few). One could argue spam has made email somewhat resource intensive, but email has more slack regarding expected latency since it’s not “real time”. Email is often given away with internet access, web hosting, or just as a freebie because providers know that email is extremely cheap to provide and keeps users coming back. For example, lots of non-AOL users keep their AOL account just for the email.

Then there’s the issue of ownership. Group editing a wave sure sounds like fun, as it’s so wiki-like. All that collaboration is also a real boost for productivity, but it has its downsides. Who owns that data? Obviously companies are going to be a bit concerned about this aspect. Email has the benefit of being rather concrete: send and receive are the only two functions supported, and replies are merely a copy of a previously received email with an appended response. Ad-hoc collaboration creates a new twist. The courts have also seen their share of email; Wave means new precedents and interpretations in the law. How many companies do you think want to test those waters?

One thing the Google team said virtually nothing about was security.

Email was never designed to be secure. SMTP servers initially had no authentication; anyone could send using any SMTP server. Auth was bolted on later, and is still problematic (receive before send, anyone?). Presumably, since Wave is built on top of XMPP, SSL will be the encryption mechanism. But that’s only at the transport level between federated servers. What about end to end? Is an S/MIME-like method supported? SSL to the user is a secure transport layer but doesn’t protect from interception by either server. Since it’s text you could use PGP and send a message, though you lose a lot of functionality and grace.

SPF is a hack for email origin verification. It works OK where it’s supported, but not everyone supports it from a provider or user perspective, making it a pretty poor solution. Will Wave be utilizing EV SSL? How about supporting verification of the actual user, say via an S/MIME-style signature? Verifying identity, both the organization and the user at that organization, is critical to being a successor to email.

Lastly, spam. How does Wave attempt to mitigate the spam problem? It sounds like one of the possibilities is a whitelist, which doesn’t work for email and is unlikely to work for a wave. Unsolicited email is fine in many non-spammy situations, for example a friend emailing from a new address or another business discussing a partnership. Sure, you can prompt each time to add to the whitelist, but then the process itself becomes spam. Do you wish to add “buy-viagra-at-thebiggestviagrastoreintheworld.com@yahoo.com” to your whitelist? You get the picture. I’m sure there will be traditional filtering as well, but that still doesn’t solve the problem.

I think Wave has a chance, but not a very high chance of success. There are a lot of barriers. Email is still the ultimate API for its ease of use and implementation. Email didn’t survive for so long because nobody was willing to build something better; it survived because it became the standard and worked in virtually all situations. It was simple enough for users and implementers alike.

I think it’s much more likely that concepts from Wave will end up elsewhere, rather than Wave replace email. Because of that, I’d call it a disruptive innovation.

Categories
Google Internet

Engineering Efficiency

Internet companies have the unique ability to scale quicker than any other industry on earth. Never before has a company been able to go from nothing more than an idea to the living rooms of millions around the globe in a matter of hours. While this introduces seemingly unlimited opportunities to grow, it also allows for exponential waste if a company isn’t careful. It’s interesting to see how they scale. Scaling a business in many ways isn’t very different from scaling servers and software stacks.

The Classic Example: UPS

Started in 1907, and adopting the name United Parcel Service in 1919, UPS has no real “high tech” background unless you include the Ford Model T. That doesn’t mean it couldn’t become more efficient. UPS has made a science of the delivery business. For example, it’s famous for its “no left turns” policy: simply put, they found that avoiding left turns means less time waiting at lights, which means less fuel wasted. The more efficient routing, formerly done by humans and now computerized, saved them 3 million gallons of fuel in 2007 alone. Let’s do the math:

Assuming they run 100% diesel at an average cost of $2.87/gallon in 2007 [doe], multiplied by 3 million gallons, that’s $8.61 million saved by avoiding left turns.

Not bad for a souped up mapping application.
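The arithmetic is simple enough to check in two lines (the gallon figure is from the routing claim above, the price from the DOE average cited there):

```python
gallons_saved = 3_000_000    # gallons of fuel the computerized routing saved in 2007
price_per_gallon = 2.87      # average 2007 diesel price [doe], dollars/gallon

savings = gallons_saved * price_per_gallon
print(f"${savings:,.0f}")    # $8,610,000
```

At that scale, even a few percent of routing inefficiency is real money.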

By having their drivers do things like turning off the ignition while unbuckling their seat belt, and scanning for the doorbell while walking towards the door (it’s easier to spot from a distance than up close), they shave time off their routes.

Then of course there’s package tracking. While customers might like to know what city their weight-loss tapes are sitting in, tracking systems also help reduce loss and monitor package routing for optimal efficiency.

Cutting Utility Bills: Google

Being the largest search engine, a large ad network, an email provider, an analytics firm, a mapping service, a video site, and whatever else they do means Google needs a ton of servers. Cramming servers into data centers and keeping them cool to prevent hardware failures is a complicated and expensive task, and keeping the whole thing powered is also really expensive. Google has scrutinized server designs to eliminate all possible waste. The result is more horsepower at a lower cost than their competitors, which means Google can do more for the same money. I won’t discuss Google in too much detail since they did a great job themselves recently, and I mentioned it the other day in another blog post: Google’s Data Center Secrets.

Shipping Efficiency: Amazon

Amazon has long improved efficiency by using data collection and analysis to encourage their customers to spend more. Their algorithms for recommending related products you might be interested in are among the best out there. Their ordering system is streamlined to prevent customers from bailing before completion. Their products are SEO’d to appear at the top of Google searches. That doesn’t mean Amazon can’t improve other parts of their business.

Amazon several months ago started a Frustration-Free Packaging program. Here’s how they describe it:

The Frustration-Free Package (on the left) is recyclable and comes without excess packaging materials such as hard plastic clamshell casings, plastic bindings, and wire ties. It’s designed to be opened without the use of a box cutter or knife and will protect your product just as well as traditional packaging (on the right). Products with Frustration-Free Packaging can frequently be shipped in their own boxes, without an additional shipping box.

The key here is “can frequently be shipped in their own boxes”. By shipping the product’s own box rather than repackaging it, they can skip a step in their warehouses (and the packaging materials that go with it). This also lowers the weight, since those extra boxes don’t weigh 0 oz. The frustration-free packaging is also the perfect shape for efficiently filling trucks, and strong enough to not crush easily, lowering returns due to damage.

Amazon now even has a feedback form [login required] for users to share what they think of their package. This has the added bonus of helping further reduce the inefficient shipping practices so common right now.

Amazon has also done a significant amount of work on their infrastructure to make their servers scale well, using tech such as EC2 and S3. By selling capacity to other companies they’re able to take advantage of economies of scale as well as diversify their business beyond retail. Of course, they plan their data centers around access to cheap power.

These aren’t haphazard attempts at increasing efficiency; these are well-calculated, engineered approaches to removing even the smallest inefficiencies, with the knowledge of how they compound as operations scale. Aren’t they clever?

Categories
Internet

How To Build A Good Order/Shipment Notification Email

I buy a decent amount of stuff online, both physical goods and services from various vendors. It amazes me how few get the order confirmation and shipment notification emails right. Most companies do a downright awful job.

Order Confirmation

Order confirmations should be sent shortly after an order has been placed and the credit card has been accepted. They should contain the following information:

  • Order # – This should be in the subject as well as the body. Obvious.
  • Sanitized payment information – The last 4 digits of your credit card should be included, or other billing information.
  • Itemized order list – Each item, description, quantity, stock status (or est. date) should be listed in a table.
  • Shipping Address – Where is my stuff going?
  • Shipping Method – USPS? UPS? FedEx? Overnight? Ground?
  • Estimated ship date – If items will ship on different dates, give a date per item; otherwise one date.
  • Contact Information – Email address, link to contact form, phone number to get in touch with store
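The list above maps almost directly onto a plain-text email. Here’s a minimal sketch using Python’s stdlib email package; all of the order data, addresses, and contact details are made up for illustration:

```python
from email.message import EmailMessage

# Hypothetical order data; in practice this comes from your order system.
order = {
    "number": "12345",
    "card_last4": "4242",                         # sanitized payment info only
    "items": [("Widget", 2, "in stock"), ("Gadget", 1, "ships 06/15")],
    "ship_to": "123 Main St, Anytown, NJ",
    "ship_method": "UPS Ground",
    "est_ship_date": "06/10",
    "support": "support@example.com / 1-800-555-0100",
}

msg = EmailMessage()
msg["Subject"] = f"Order #{order['number']} confirmation"  # order # in the subject
msg["From"] = "orders@example.com"
msg["To"] = "customer@example.com"

lines = [
    f"Order #: {order['number']}",
    f"Payment: card ending in {order['card_last4']}",
    "",
    "Items:",
]
lines += [f"  {qty} x {name} ({status})" for name, qty, status in order["items"]]
lines += [
    "",
    f"Shipping to: {order['ship_to']}",
    f"Shipping method: {order['ship_method']}",
    f"Estimated ship date: {order['est_ship_date']}",
    f"Questions? {order['support']}",
]
msg.set_content("\n".join(lines))
print(msg["Subject"])
```

Every field in the checklist shows up exactly once, and the order number lives in both the subject and the body.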

Shipment Notification

A shipment notification should be sent for each day something ships. All shipments from all warehouses should appear in 1 email. For example if I order 3 things and it ships from 3 warehouses on 1 day, I expect 1 email. If I order 3 things and it ships over 2 days from any number of warehouses I expect 2 emails, one each day. If there is more than one package, I expect each to be listed in the email.

The email should contain the following:

  • Order # – Again, obvious
  • For each package it should tell me:
    • Delivery Address – Where is this package going?
    • Shipment Method – How the package is being mailed (USPS, UPS, FedEx, overnight, ground etc.).
    • Estimated Arrival – When is this package expected to arrive?
    • Tracking Number – The tracking number, with a direct link to the shipper’s tracking page, NOT a page that requires you to log in first. Most stores mess this up. I shouldn’t need to log in; I just want the number. (Pro tip: search Google for your tracking number for a quick direct link to the tracking status page.)
    • Inventory – What is in this package? Should be an itemized list with quantity.
  • Contact Information – Email address, link to contact form, phone number to get in touch with store

The closer companies get to following this, the easier things are. Most companies cover about 80% of the list; only a select few get it done correctly 100% of the time. The closest I’ve seen to date is ThinkGeek, which has been pretty close to perfect every time I order.

Categories
Internet

User Generated Content Ownership

Since the creation of the <form/> element, people have been wondering about the ownership and copyright of content created online. From email and message boards in Web 1.0 to blogs and Twitter in Web 2.0, the same fundamental questions remain.

Lately, Twitter has been the focus. Twitter is actually pretty clear about its claims to user-generated content:

  1. We claim no intellectual property rights over the material you provide to the Twitter service. Your profile and materials uploaded remain yours. You can remove your profile at any time by deleting your account. This will also remove any text and images you have stored in the system.
  2. We encourage users to contribute their creations to the public domain or consider progressive licensing terms.

It’s pretty clear that Twitter is taking a hands-off approach, but it doesn’t let users decide what they want. I’m personally a fan of Creative Commons, so my suggestion would be to let users decide in their account settings how they wish to license their content, choosing between CC licenses. That of course makes retweeting complicated, to put it nicely (it’s more like a minefield), which is likely the reason they avoid the licensing issue. Sure, you can put some sort of icon next to a tweet to indicate the licensing, but what if someone retweets it? Or modifies it ever so slightly? Is it a new tweet? How many characters must change for it to be a new one? This is where it gets murky.

Yahoo-owned Flickr chose to solve this problem by letting users pick what copyright they want to impose, including a Creative Commons option. A very graceful solution, though admittedly their situation is much simpler than Twitter’s, since they don’t have to deal with complexities like retweeting.

WordPress.com isn’t as clear in regards to its claims (or lack thereof) to copyright, though they are far from locking people in, considering you can delete your content at any time and download your entire blog to move it elsewhere. Matt‘s been pretty open about giving users choice, including the ability to leave WordPress.com. There is of course room for improvement in clarifying their stance on copyright ownership.

Even Google has been criticized for copyright concerns regarding services like Google Docs.

They could adopt the Richard Stallman stance on “intellectual property” (his air quotes), though that would alienate at least as many as it attracts.

While Twitter might be the hot topic today, it’s hardly a problem exclusive to Twitter. It’s an issue for virtually any site that accepts third-party content, and it gets more complicated when content can be remixed and redistributed.

The reality is people should know what rights they are giving up by putting content on these or any other services, but people rarely do. Perhaps a great Creative Commons project would be to create the same simplified icon/license system, but for websites that allow users to submit content: the icons would indicate, in plain English, what the Terms of Service jargon actually means. It’s essentially the inverse of what they do now: label the service as well as the content.

So what’s the best solution?