An Overview of WebRTC – the most disruptive HTML5 technology to date?
WebRTC - a fantastic tool for web app developers
WebRTC, or Web Real-Time Communications, is an emerging standard that is currently being rolled out in most modern browsers.
The WebRTC APIs allow you to stream audio, video and other data between browsers with the minimum of effort, while shielding the developer from the underlying complexity required to make this happen.
Push and Pull
If you think about it, the World Wide Web is a classic example of a minimum viable product: early versions allowed pages to be created and text to be hyperlinked to other pages. Soon there were images and forms for user input, and things started to get interesting.
One part of the web that has been developing slowly is 'push' mechanisms to complement the existing 'pull'. By 'push' I really mean the ability for the server to push content to the browser without it being explicitly requested. The pull-only model served us well. Our old friend the HTTP request was the very minimum we needed to 'make things work'; it worked well, and there is no real reason for many content sites to move from that model. However, as average connection speeds get faster and the web matures into the dominant platform not only for content but also for applications, there is a real need for real-time technologies.
Since the advent of the supercharged XMLHttpRequest we can simulate pushing by making repeated requests to the server until some state changes and new content is delivered. It's not difficult to imagine why this polling method becomes increasingly inefficient and less feasible as you approach real-time accuracy, although it works out better if only an approximation of real-time is required.
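To put rough numbers on that trade-off, here is a minimal sketch under a simplified model that is my assumption, not a measurement: the client polls at a fixed interval and the server-side state changes exactly once. The `simulatePolling` function and `PollingCost` type are hypothetical names used only for illustration.

```typescript
// Hypothetical model: the client polls every `intervalMs`, starting at t = intervalMs,
// and the server-side state changes once at `changeAtMs`.
interface PollingCost {
  requests: number;  // requests sent up to and including the one that sees the change
  latencyMs: number; // delay between the change and the poll that detects it
}

function simulatePolling(changeAtMs: number, intervalMs: number): PollingCost {
  if (intervalMs <= 0 || changeAtMs <= 0) {
    throw new RangeError("changeAtMs and intervalMs must be positive");
  }
  // The first poll at or after the change is the one that detects it.
  const requests = Math.ceil(changeAtMs / intervalMs);
  const latencyMs = requests * intervalMs - changeAtMs;
  return { requests, latencyMs };
}

// A change 10.25 s in, polled once a second versus ten times a second.
console.log(simulatePolling(10250, 1000)); // { requests: 11, latencyMs: 750 }
console.log(simulatePolling(10250, 100));  // { requests: 103, latencyMs: 50 }
```

In this model, cutting the worst-case delay by a factor of ten costs roughly ten times as many requests, nearly all of which return nothing new.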
Then there's long-polling, which relies on the server holding open an HTTP request, ready for any data that the server might happen to pass down the line. Whole libraries have been written around that principle, but essentially -- like many great things in life -- it's a hack.
Luckily, newer, less hacky solutions were proposed by the W3C and others in response to the need for real-time comms, solutions such as the WebSocket API. The promise of WebSockets is that they give us full-duplex communication between browser and server. After an initial HTTP handshake, data can travel back and forth over a single long-lived TCP connection without a new HTTP request for each message. Stable and secure implementations are finally shipping in all major browsers, and libraries are available for nearly all server-side set-ups. The WebSocket API works well for real-time comms when you are sending data between browser and server.
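As a rough illustration of full-duplex use, the sketch below wires up both directions on one socket. `SocketLike` and `wireChat` are hypothetical names of my own; the interface is a cut-down stand-in for the browser `WebSocket` so the logic can run outside a browser, and the URL in the usage comment is a placeholder.

```typescript
// Minimal structural type covering the parts of the browser WebSocket used here.
interface SocketLike {
  send(data: string): void;
  addEventListener(type: "message", listener: (event: { data: unknown }) => void): void;
}

// Wire up both directions on one socket: incoming text goes to `onText`,
// and the returned function sends outgoing text.
function wireChat(socket: SocketLike, onText: (text: string) => void): (text: string) => void {
  socket.addEventListener("message", (event) => {
    // Binary frames are ignored in this sketch; only text messages are passed on.
    if (typeof event.data === "string") onText(event.data);
  });
  return (text) => socket.send(text);
}

// Browser usage (placeholder URL, not a real endpoint):
//   const socket = new WebSocket("wss://example.invalid/chat");
//   socket.addEventListener("open", () => {
//     const say = wireChat(socket, (text) => console.log("server:", text));
//     say("hello");
//   });
```

Note that nothing here polls: the server can deliver a message at any moment, and the client can send at any moment, over the same connection.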
WebRTC - the break-down
So what exactly is WebRTC and how does it fit in? WebRTC stands for Web Real-Time Communications. Part of HTML5, it is a W3C working draft and is just now starting to see adoption in most modern browsers.
WebRTC includes several unique features:
- Designed with browser-to-browser communication in mind.
- Handles live streaming of audio and video.
- Facilitates the streaming of any type of data.
I was lucky enough to be invited to the Open Video Conference a couple of years ago and as a result found myself in a room with representatives from all the major browser vendors (bar Microsoft) talking about how they could standardise peer-to-peer media streaming on the web. It was inspiring to say the least, and a year later it's fantastic to see progress made and standards maturing. I have a personal interest in web-based media, and that is very much a part of what WebRTC is about. It is not restricted to media, though: because it deals with binary formats, it can facilitate any type of file sharing.
So the main applications for WebRTC are:
- Voice Calling / Video Chat
- P2P File sharing
- Inter-application comms
One thing to note is that it's not pure P2P: some negotiation and brokerage on a server somewhere is required to be able to connect to other users. That said, apart from perhaps BitTorrent and similar, there are very few pure P2P applications out there. Even Skype, which used to use supernodes, has now replaced them with Linux servers.
So let's take a look at the API which is pretty neatly split into three component parts.
- MediaStream API (aka getUserMedia) - for accessing devices such as a camera and mic
- PeerConnection API - for connecting clients' audio/video streams
- DataChannel API - for the real-time P2P transfer of generic data
Sometimes known as getUserMedia, the MediaStream API allows you direct access to devices. Currently that means the microphone and camera, but in future this could include other devices. Audio and video are split into synchronised MediaStreamTracks. Effects can be applied to the video using canvas - check out this great demo from Paul Neave to see what's actually possible. I look forward to integration with the Web Audio API so that we can analyse and apply effects to the audio stream.
Getting access to devices such as cameras and microphones is of limited use if you don't connect those streams. The PeerConnection API allows you to do this. As touched on above, you will need a 'broker' in the cloud to be able to discover and connect to other peers. You can use your own WebSocket server, Google App Engine or something called SIP over WebSockets.
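Here is a small sketch of what a stream looks like once you have one. It assumes the promise-based `navigator.mediaDevices.getUserMedia` call found in current browsers; `TrackLike`, `StreamLike` and `describeTracks` are hypothetical names of mine, cut-down stand-ins for `MediaStreamTrack` and `MediaStream` so the helper also runs outside a browser.

```typescript
// Minimal structural types so the sketch type-checks outside a browser.
interface TrackLike { kind: string; label: string; }
interface StreamLike { getTracks(): TrackLike[]; }

// Summarise a stream as a list of "kind: label" strings, one per track.
function describeTracks(stream: StreamLike): string[] {
  return stream.getTracks().map((track) => `${track.kind}: ${track.label}`);
}

// Browser-only part: skipped when no mediaDevices object is available.
const mediaDevices = (globalThis as any).navigator?.mediaDevices;
if (mediaDevices) {
  mediaDevices
    .getUserMedia({ audio: true, video: true }) // the browser asks the user for permission
    .then((stream: StreamLike) => console.log(describeTracks(stream)))
    .catch((err: unknown) => console.error("Access denied or no device:", err));
}
```

With a camera and microphone attached you would expect one audio track and one video track in the list; the labels depend on the hardware.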
Thankfully the PeerConnection API makes this all relatively painless and quietly deals with the issues of lost packets and the general quality of streamed audio and video.
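To make the broker's job concrete, here is a minimal in-memory sketch of the relay step only. It is an assumption-laden stand-in for a real signalling server: `Signal` and `InMemoryBroker` are hypothetical names, the message shape is invented for illustration, and the payload strings are placeholders for the session descriptions a browser would produce with `createOffer()` and `createAnswer()` on an `RTCPeerConnection`.

```typescript
// Hypothetical message shape; real apps define their own signalling format.
type Signal = {
  from: string;
  to: string;
  kind: "offer" | "answer" | "candidate";
  payload: string;
};

class InMemoryBroker {
  private peers = new Map<string, (signal: Signal) => void>();

  // A peer registers a callback to receive signals addressed to it.
  join(peerId: string, onSignal: (signal: Signal) => void): void {
    this.peers.set(peerId, onSignal);
  }

  // Relay a signal to its addressee; returns false if that peer is unknown.
  relay(signal: Signal): boolean {
    const deliver = this.peers.get(signal.to);
    if (!deliver) return false;
    deliver(signal);
    return true;
  }
}

const broker = new InMemoryBroker();
const aliceInbox: Signal[] = [];
const bobInbox: Signal[] = [];
broker.join("alice", (signal) => { aliceInbox.push(signal); });
broker.join("bob", (signal) => { bobInbox.push(signal); });

// Alice offers, Bob answers; the broker only passes the messages along.
broker.relay({ from: "alice", to: "bob", kind: "offer", payload: "<offer description>" });
broker.relay({ from: "bob", to: "alice", kind: "answer", payload: "<answer description>" });
```

Once both sides have exchanged descriptions this way, the audio and video themselves can flow between the peers rather than through the broker.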
The full potential for this technology has yet to be realised, which is why it is very good news that the DataChannel API exists. The DataChannel API makes it possible to stream any data you can think of. It deals with any issues that might occur in the pipeline and includes features such as channel prioritisation, security and automatic congestion control, leaving the developer to focus on the application of that streamed data.
So what does all this mean for our humble web app developer? Well, this technology is currently available in most modern desktop browsers and it's surely just a matter of time until it rolls out on mobile. So we can start picking it up and experimenting right now.
But what can we do with it? WebRTC technology is very powerful, representing as it does a paradigm shift away from the traditional client-server model, and as such it is potentially hugely disruptive. Developers can look forward to building video and audio chat right into their apps, and of course there are many possibilities for real-time multi-player games and anything that requires you to push data around. Although currently we may need to touch the server, the server's involvement can be minimal and so we have the building blocks for decentralised networks as we move towards a P2P web.
This is just a basic introduction to WebRTC. If you want to learn more, I highly recommend you check out the following resources.
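As one example of 'any data', here is a hedged sketch of splitting a file's bytes into chunks before handing them to a data channel. `ChannelLike` and `sendInChunks` are hypothetical names of mine, and the 16 KiB default is an assumption chosen as a conservative message size, not a figure from the specification. The sketch does not wait for the channel's send buffer to drain, which a real transfer of a large file would want to do.

```typescript
// Minimal structural type: RTCDataChannel.send accepts binary views such as Uint8Array.
interface ChannelLike {
  send(data: Uint8Array): void;
}

// Send `data` as consecutive chunks of at most `chunkSize` bytes; returns the chunk count.
function sendInChunks(channel: ChannelLike, data: Uint8Array, chunkSize = 16 * 1024): number {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  let chunks = 0;
  for (let offset = 0; offset < data.length; offset += chunkSize) {
    // subarray creates a view over the same memory, so no bytes are copied here.
    channel.send(data.subarray(offset, offset + chunkSize));
    chunks++;
  }
  return chunks;
}
```

The receiving peer would collect the chunks from its channel's message events and reassemble them in order.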
Real-Time Communication without Plugins
WebRTC is almost here, and it will change the web
Video Conferencing in HTML5: WebRTC via Web Sockets
Record audio using webrtc in chrome and speech recognition with websockets
Bowser - the First WebRTC enabled Browser for Mobile
This blog post has been written by Mark Boas