
Web 2.0 I can’t hear you

There’s been a lot of talk about what seems to be called “Web 2.0” lately. It’s this new renaissance of browser wars, new dot-coms coming about, users contributing content (blogs, wikis), more fluid applications using AJAX, rich media over broadband, and all that good stuff. Personally, I agree: we are at a great time for the Internet. I barely remember the last time it was this good. Ideas are flowing, and technology is advancing. But how far will it advance?

Using newly discovered (though not new) technologies like AJAX, it becomes possible to make a web page feel rather fluid, almost to the point of a good client-side application. Using something like SVG (or more likely Flash, as SVG is still rather new) you can enhance that even further. These are great. When put together nicely, you get this wonderful complete application. Well, not really. Since very early on, computers gave audible feedback. Apparently we lost that in Web 1.0, and haven’t fixed that regression in “Web 2.0”. We leave it to plug-ins like Flash or QuickTime, but is that really appropriate? I will suggest it’s not. Audio has been closely integrated with computing since the beginning, from the beeps computers made back when keyboards really clicked as you typed. Auditory feedback is part of a complete application (that error beep when you do something wrong in an OS, for example). We don’t have that on the web.

Innocent Proposal

Yes, I am aware the below proposal will upset some people, but hear me out before attacking.

I propose the web push to adopt OGG, or find some other open solution, to solve part of this problem: pre-recorded audio that’s compact and patent free, so web application developers can give users audible feedback when they run into problems. OGG has been used by games such as Unreal for some time, so it has proven adequate in quality. It would be perfect for things like voice-overs, music, and other pre-defined audio purposes.

Secondly, there’s a need for what is essentially MIDIXML: MIDI in XML format. Something that could easily be generated by a server using Java, PHP, Perl, ASP, CF, or whatever language, and then transmitted. Since XML can be gzipped, it could be compact (though with a slight latency from the compression). Easy for anyone to generate, it would allow for much simpler creation of audio than ever before.
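To make the idea concrete, here is a minimal sketch in Python of what a server might do: build a small MIDI-like XML document and gzip it before sending. The element and attribute names (`sequence`, `track`, `note`, `ticksPerBeat`, and so on) are made up for illustration and do not come from any existing spec.

```python
import gzip
import xml.etree.ElementTree as ET

def build_note_xml(notes):
    """Build a tiny MIDI-like XML document.

    `notes` is a list of (pitch, start_tick, duration_ticks) tuples.
    The element and attribute names are hypothetical.
    """
    root = ET.Element("sequence", {"ticksPerBeat": "480"})
    track = ET.SubElement(root, "track")
    for pitch, start, duration in notes:
        ET.SubElement(track, "note", {
            "pitch": str(pitch),        # MIDI note number, 60 = middle C
            "start": str(start),
            "duration": str(duration),
        })
    return ET.tostring(root, encoding="utf-8")

# A two-note error "beep": middle C, then the G below it.
notes = [(60, 0, 240), (55, 240, 240)]
xml_bytes = build_note_xml(notes)

# Compress before transmission; the receiver reverses this with gzip.decompress().
compressed = gzip.compress(xml_bytes)
```

One caveat: for a document this small, the fixed gzip header can outweigh the savings, so compression would mostly pay off on longer sequences.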

Bonus points for text-to-speech on the web, which would take this whole thing to a new level (imagine using simple XML-like markup to present a human speaking, from within a web application). Combine that with AJAX, and filling out your taxes online could be designed in a way that would actually be usable. You could get explanations while you enter data, with dynamic forms adjusting so you only see what you need to.
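As a rough illustration of that "simple XML-like markup", a server could emit a spoken explanation for each form field. The sketch below is in Python; the `speak` and `sentence` elements, the `for` attribute, and the `gross-income` field id are all invented for this example and are not taken from any real speech markup standard.

```python
import xml.etree.ElementTree as ET

def field_help(field_id, explanation):
    """Return hypothetical speech markup explaining one form field."""
    speak = ET.Element("speak", {"for": field_id})
    sentence = ET.SubElement(speak, "sentence")
    sentence.text = explanation
    return ET.tostring(speak, encoding="unicode")

# Markup a tax form might send when the user focuses a field.
markup = field_help("gross-income", "Enter your total income before deductions.")
print(markup)
```

A browser with built-in text-to-speech could then read the sentence aloud when the matching field gets focus.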

If these two formats were included in browsers, the way we are now seeing CSS support finally come of age, Web 2.0 would essentially be able to replicate a client-side experience, minus the graphical abilities, though Flash can compensate for part of that. Sound isn’t just a frill; it’s partly a matter of accessibility. Audible feedback is a good thing. That’s why cars do it (in addition to that light on your dashboard), and aircraft as well (“Pull Up!”). Even my cell phone is capable of audible feedback (key press sound, ringing, photo taking, etc.). Yet my computer can’t really do audio when online.

There is an annoyance factor of course (we all hate loud websites), but that could easily be compensated for by a good browser UI featuring volume controls, including a mute option. Ideally plug-ins would respect that setting so that the experience is clean and simple. Perhaps there could also be a visual notification when audio is used while the user has it muted. This would mitigate the annoyance factor while still providing audible feedback.
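The mute behaviour described above can be sketched as a small decision function. This is only a Python model of the idea, not how any browser actually works; `AudioSettings`, `request_sound`, and the action names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AudioSettings:
    """User-controlled browser audio preferences (hypothetical)."""
    volume: float = 1.0   # 0.0 (silent) to 1.0 (full)
    muted: bool = False

def request_sound(settings: AudioSettings, sound_name: str) -> dict:
    """Decide what happens when a page asks to play a sound."""
    if settings.muted or settings.volume <= 0.0:
        # Audio is suppressed, so fall back to a visual notification.
        return {"action": "show_indicator", "sound": sound_name}
    return {
        "action": "play",
        "sound": sound_name,
        "volume": min(settings.volume, 1.0),  # never exceed the user's cap
    }

muted_result = request_sound(AudioSettings(muted=True), "new-mail")
audible_result = request_sound(AudioSettings(volume=0.5), "new-mail")
```

The point of routing every sound request through one function is that plug-ins and pages would all pass through the same user setting.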

Why not plug-ins? Because they don’t standardize. We’d never get the penetration that you can get with standards. Look at video: there is still a complete lack of standards between players and codecs. Imagine if CSS was only available with a plug-in. Do you think the entire web would download the CSS plug-in? No, not likely. The penetration Flash has had is unique, and not likely to repeat itself, so that’s not even an argument. Audio is the one front where the browser has no hand. With video the browser at least has GIF support (which is on occasion used for things like webcams), and it supports images and text natively. But really no audio support.

Imagine a web application that could verbally explain a form to you (filling out taxes online?), or the ability to have a service like Gmail open in a tab and get notification of a new message via audio. No JavaScript alert(). Imagine an online store with complete audio support (so far we really have only iTunes, which is proprietary).

Audio on the web has been misguided for a long time. I think Web 2.0 needs to address this. Audio is a part of computing.

The web is capable of so much, but it only touches one sense. If the web reaches two senses, it doubles its potential. Perhaps in a few years I’ll be able to suggest SmellML or TouchML or TasteML.

6 replies on “Web 2.0 I can’t hear you”

Great ideas, audio seems like the natural followup to SVG. In fact, I think the need will become obvious once rich SVG apps start appearing. I like the idea of rolling over an audio icon to hear a talk about how the feature works. Or even a popup multimedia tutorial.

How can we get audio (and video) into the browser? There is no HTML tag for that yet. Would you accept a patch for playback of Speex and Vorbis in an Ogg container (with audio functions accessible via JavaScript)? I am currently experimenting with a browser plugin for audio/video. It is based on the http://www.wigiwigi.com Videophone project. (Ashod Apakian, the guy behind http://www.wigiwigi.com, is committed to standards and open source, and AFAIK his technologies are far more advanced than anything else in the audio/video field!)

I absolutely agree that we need sound. I have this problem all the time, especially with my JavaScript games. They are completely silent, and I use Linux as a development platform, so there’s no Flash available to make my sound.

Once upon a time there was BGSOUND, but now all we have is EMBED, and by default Firefox/Mozilla has no plugin to handle .wav…

SVG’s animation features are also built on top of Synchronized Multimedia Integration Language (SMIL), an XML language designed for coordinating images, movies, text, and audio.

http://www.w3.org/AudioVideo/

OS X has good text-to-speech capabilities built in, but there are also web services for this.

http://www.naturalvoices.att.com/demos/
http://www.research.att.com/projects/tts/demo.html

Someone already pointed out VoiceXML, but the Cover Pages also list SpeechML, CallML, and VoxML, and the W3C has a whole section devoted to various voice proposals for browsers.

http://xml.coverpages.org/xmlApplications.html
http://www.w3.org/Voice/

Again, from the Cover Pages, there are many XML specs for music, some of which explicitly tie in with MIDI.

http://xml.coverpages.org/xmlMusic.html

Outside of XML specs, there is the text format ABC for marking up music, which can be played directly, converted to sheet music, etc.

http://staffweb.cms.gre.ac.uk/~c.walshaw/abc/

Of course, that’s not all of the media possibilities. The W3C activity on multimodal annotations, including “ink” was linked previously, but the X3D (3D for the web, or VRML in XML) consortium have been making solid progress and there are open-source players available.

http://www.web3d.org/

Good article. I think all of this is coming, but look how long it took just to get SVG in the browser.
