Web 2.0, I can’t hear you

There’s been a lot of talk lately about what seems to be called “Web 2.0”. It’s a new renaissance: browser wars, new dot-coms coming about, users contributing content (blogs, wikis), more fluid applications using AJAX, rich media over broadband, and all that good stuff. Personally, I agree that this is a great time for the Internet. I barely remember the last time it was this good. Ideas are flowing, and technology is advancing. But how far will it advance?

Using newly discovered (though not new) technologies like AJAX, it becomes possible to make a web page feel rather fluid, almost to the point of a good client-side application. Using something like SVG (or more likely Flash, as SVG is still rather new) you can enhance that even further. These are great. When put together nicely, you get this wonderful complete application. Well, not really. Since very early on, computers have given audible feedback. Apparently we lost that in Web 1.0, and haven’t fixed that regression in “Web 2.0”. We leave it to plug-ins like Flash or QuickTime, but is that really appropriate? I will suggest it’s not. Audio has been closely integrated with computing since the beginning, from the beeps computers made back when keyboards really clicked as you typed. Auditory feedback is part of a complete application (that error beep when you do something wrong in an OS, for example). We don’t have that on the web.

Innocent Proposal

Yes, I am aware the below proposal will upset some people, but hear me out before attacking.

I propose that the web push to adopt OGG, or find some other open solution, to solve part of this problem: pre-recorded audio that’s compact and patent-free, so web application developers can provide audio feedback when users run into problems. OGG has been used by games such as Unreal for some time, so it has proved to be adequate in quality. It would be perfect for things like voice-overs, music, and other pre-defined audio purposes.

Secondly, there’s a need for what is essentially MIDIXML: MIDI in XML format. It could easily be generated by a server using Java, PHP, Perl, ASP, ColdFusion, or whatever language, and then transmitted. Since XML can be gzipped, it could be compact (at the cost of slight latency for compression and decompression). Being easy for anyone to generate, it would allow for much simpler creation of audio than ever before.
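To make the idea a bit more concrete, here is a rough sketch of what generating such a document on the server might look like. The element and attribute names (`sequence`, `track`, `note`, `pitch`, `start`, `duration`) are invented for illustration and don't come from any existing format; only the XML and gzip handling use real Python standard library calls.

```python
import gzip
import xml.etree.ElementTree as ET


def build_note_xml(notes):
    """Build a hypothetical MIDI-like XML document.

    `notes` is a list of (pitch, start_ms, duration_ms) tuples, where
    pitch is a MIDI note number (69 is the A above middle C).
    The tag and attribute names are made up for this sketch.
    """
    root = ET.Element("sequence", {"tempo": "120"})
    track = ET.SubElement(root, "track", {"instrument": "0"})
    for pitch, start_ms, duration_ms in notes:
        ET.SubElement(track, "note", {
            "pitch": str(pitch),
            "start": str(start_ms),
            "duration": str(duration_ms),
        })
    return ET.tostring(root, encoding="utf-8")


# A two-note "something went wrong" sound: a high note, then a lower one.
error_beep = [(69, 0, 120), (62, 150, 200)]

xml_bytes = build_note_xml(error_beep)

# Compress for transmission, as a server would before sending the response.
compressed = gzip.compress(xml_bytes)

# The receiving side decompresses and parses the same structure back out.
restored = ET.fromstring(gzip.decompress(compressed))
```

For a document this tiny the gzip overhead can outweigh the savings; the compression would only start to pay off on longer sequences with lots of repeated markup.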

Bonus points for text-to-speech on the web, which would take this whole thing to a new level (imagine using simple XML-like markup to present a human speaking, from within a web application). Combine that with AJAX, and filling out your taxes online could be designed in a way that would be usable. You could get explanations while you enter data, with dynamic forms adjusting so you only see what you need to.
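As a rough illustration of the kind of markup I have in mind, a server could build a spoken explanation for a single form field like this. The tag names (`speak`, `emphasis`, `pause`, `sentence`) and the `field_help` function are made up for the sketch; no browser understands them today.

```python
import xml.etree.ElementTree as ET


def field_help(field_label, explanation):
    """Return hypothetical speech markup explaining one form field.

    The tags used here are invented for illustration only.
    """
    speak = ET.Element("speak")

    # Read the field name with emphasis so the listener knows the context.
    label = ET.SubElement(speak, "emphasis")
    label.text = field_label

    # A short pause between the label and its explanation.
    ET.SubElement(speak, "pause", {"ms": "300"})

    sentence = ET.SubElement(speak, "sentence")
    sentence.text = explanation

    return ET.tostring(speak, encoding="unicode")


markup = field_help(
    "Gross income",
    "Enter your total income for the year, before any deductions.",
)
```

An AJAX request could fetch a snippet like this whenever a field gains focus, and the browser would speak it instead of (or in addition to) showing help text.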

If these two formats were included in browsers, the way CSS support is finally starting to come of age, Web 2.0 would essentially be able to replicate a client-side experience, minus the graphical abilities, though Flash can compensate for part of that. Sound isn’t just a frill; it’s partly a matter of accessibility. Audible feedback is a good thing. That’s why cars do it (in addition to that light on your dashboard), and aircraft as well (“Pull Up!”). Even my cell phone is capable of audible feedback (key press sounds, ringing, photo taking, etc.). Yet my computer can’t really do audio when online.

There is an annoyance factor, of course (we all hate loud websites), but that could easily be compensated for by a good browser UI featuring volume controls, including a mute option. Ideally plug-ins would respect that setting so that the experience is clean and simple. Perhaps there could also be a visual notification when audio is used while the user has it muted. This would mitigate the annoyance factor while still providing audible feedback.

Why not plug-ins? Because they don’t standardize. We’d never get the penetration that you can get with standards. Look at video: there is still a complete lack of standards between players and codecs. Imagine if CSS were only available with a plug-in. Do you think the entire web would download the CSS plug-in? Not likely. The penetration Flash has had is unique and not likely to repeat itself, so that’s not even an argument. Audio is the one front where the browser has no hand at all. For video, the browser at least has GIF support (which is on occasion used for things like webcams), and it supports images and text natively. But there is really no audio support.

Imagine a web application that could verbally explain a form to you (filling out taxes online?), or the ability to have a service like Gmail open in a tab and get notification of a new message via audio. No JavaScript alert(). Imagine an online store with complete audio support (so far we really have only iTunes, which is proprietary).

Audio on the web has been misguided for a long time. I think Web 2.0 needs to address this. Audio is a part of computing.

The web is capable of so much, but it only touches one sense. If the web reaches two senses, it doubles its potential. Perhaps in a few years I’ll be able to suggest SmellML or TouchML or TasteML.