Over the past few years, the world has drastically changed the way it does business by moving almost everything online. Among the many factors driving this shift, real-time digital communication has grabbed a lot of attention, whether through video chatting apps or online chat platforms.
Today, businesses are taking advantage of this evolving technology to connect with their customers, while ordinary people use it to chat with their loved ones. This has made video chat one of the most lucrative niches around.
Let’s start with some fundamentals of any video chatting app.
All About a WebRTC Video Chatting App
Before moving ahead with how to make a video chat app, it is important to understand how a video conferencing app differs from a simple chat app.
Let’s start with a simple chat web app. Every such app involves two browsers that send and receive messages, and typically a server in between that coordinates delivery. Because every message passes through the server, the two users never interact directly, and each message picks up extra delay along the way. Even so, each browser still needs an active connection to the server in order to communicate.
This is not the case with video calling apps. But let’s assume it were, and imagine what the communication would be like.
Quite horrible, of course: the receiver would hear the voice some 5 seconds late.
Now that we have some insight into how regular chat works, let’s look at video conferencing. Here the browsers need to communicate in real time, so the server is taken out of the media path and WebRTC is used instead.
What is WebRTC?
WebRTC (Web Real-Time Communication) is an open-source framework that gives web browsers and mobile applications real-time communication capabilities through APIs. It enables peer-to-peer communication, so audio, video, and chat data can be exchanged directly without passing through a server in between.
With WebRTC, the role of the server is very limited: it simply helps the two browsers, or peers, discover each other so that they can connect directly. If you ever plan to build a video chat app from scratch without WebRTC, you are in for a lot of groundwork, as you are likely to come across some typical issues, including:
- Drop in connections
- Loss of data
- NAT traversal
- Echo cancellation
- Dynamic jitter buffering
- Automatic gain control
- Noise reduction and suppression
- Bandwidth adaptivity
With WebRTC, all of the above comes built into the browser. The technology doesn’t require any plugins or third-party software; WebRTC handles these concerns automatically. Moreover, since it is open source, its source code is available for free at https://webrtc.org/
WebRTC is supported by major browsers such as Firefox, Chrome, and Edge. Its video calling APIs are available to developers, but you should check whether the browser versions you target support them before moving ahead. Let’s have a look at some of the WebRTC APIs!
WebRTC Video calling APIs “A Need for Real-time Communication”
In general, a WebRTC video chatting app consists of several interrelated APIs and protocols that work together to make real-time communication a success. Some of the important APIs are:
- getUserMedia(): captures audio and video streams from the user’s microphone and camera
- MediaRecorder: records audio and video
- RTCPeerConnection: streams audio and video between users
- RTCDataChannel: streams arbitrary data between users
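Since these APIs need to be checked for availability before use, here is a minimal feature-detection sketch. The `ScopeLike` and `WebRtcSupport` types and the `detectWebRtcSupport` function are hypothetical names introduced only for this illustration; the sketch merely checks whether the relevant globals exist and says nothing about how complete a browser's implementation is.

```typescript
// Hypothetical shape of the global scope we want to inspect.
// Only the names looked up below are listed.
interface ScopeLike {
  RTCPeerConnection?: unknown;
  MediaRecorder?: unknown;
  navigator?: { mediaDevices?: { getUserMedia?: unknown } };
}

interface WebRtcSupport {
  getUserMedia: boolean;
  mediaRecorder: boolean;
  peerConnection: boolean;
}

// Reports which of the APIs listed above appear to exist in the given scope.
function detectWebRtcSupport(scope: ScopeLike): WebRtcSupport {
  return {
    getUserMedia:
      typeof scope.navigator?.mediaDevices?.getUserMedia === "function",
    mediaRecorder: typeof scope.MediaRecorder === "function",
    peerConnection: typeof scope.RTCPeerConnection === "function",
  };
}

// In a browser, the real global scope can be passed in directly.
const support = detectWebRtcSupport(globalThis as unknown as ScopeLike);
```

Passing the scope in as a parameter, rather than reading globals inside the function, keeps the check easy to try out with a fake object outside the browser.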
Signaling “the Connecting mode”
Signaling is one of the most important concepts: before they can communicate, the two peers must exchange certain information about each other. This information includes:
- Presence information, i.e. whether another peer is available for communication
- Network data, such as a peer’s IP address and port
- Session-control messages, used to open and close communication
- Error messages
- Media metadata, including codecs, codec settings, bandwidth, and media types
- Key data needed for secure connections
This information is known as metadata, and it is a must for any direct connection to take place. Signaling, in turn, requires a server to be available.
The signaling mechanism is used to start the initial communication between the two browsers: through it they discover other available peers and share the information needed to create a direct connection. Signaling is used until the direct connection is established.
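To make the role of signaling concrete, here is a minimal in-memory sketch of a relay that only forwards metadata between peers. WebRTC does not define a signaling format, so the `SignalMessage` shapes and the `SignalingRelay` class are assumptions made up for this illustration; a real app would typically carry such messages over something like a WebSocket server.

```typescript
// Hypothetical message shapes; WebRTC itself does not define them.
type SignalMessage =
  | { kind: "offer"; from: string; to: string; sdp: string }
  | { kind: "answer"; from: string; to: string; sdp: string }
  | { kind: "candidate"; from: string; to: string; candidate: string }
  | { kind: "bye"; from: string; to: string };

type Handler = (msg: SignalMessage) => void;

// In-memory stand-in for a signaling server. It forwards metadata
// between registered peers and never touches audio or video.
class SignalingRelay {
  private peers = new Map<string, Handler>();

  // A peer announces its presence and how it wants to receive messages.
  register(peerId: string, onMessage: Handler): void {
    this.peers.set(peerId, onMessage);
  }

  // Ids of the other peers that are currently present.
  discover(selfId: string): string[] {
    return Array.from(this.peers.keys()).filter((id) => id !== selfId);
  }

  // Forwards a message to its addressee; returns false if that peer is unknown.
  send(msg: SignalMessage): boolean {
    const handler = this.peers.get(msg.to);
    if (!handler) return false;
    handler(msg);
    return true;
  }
}
```

Once the two peers have a direct connection, nothing in this relay is needed for the media itself, which matches the limited role of the server described above.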
Session Description Protocol
The Session Description Protocol (SDP) is a format that describes multimedia communication sessions for announcements and invitations. It supports streaming media applications, including VoIP and video conferencing. Note that WebRTC does not specify the signaling methods or protocol; we have to build that part ourselves.
As already mentioned, WebRTC requires the two peers to exchange an offer and an answer, and both of these are expressed in SDP format.
An SDP description looks like the excerpt below:
- o=- 7614219274584779017 2 IN IP4 127.0.0.1
- t=0 0
- a=group:BUNDLE audio video
- a=msid-semantic: WMS
- m=audio 1 RTP/SAVPF 111 103 104 0 8 107 106 105 13 126
- c=IN IP4 0.0.0.0
The lines above are generated automatically by WebRTC according to the audio and video devices available on your laptop or PC.
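Each SDP line has the form `<type>=<value>`, and an `m=` line lists the media kind, port, transport protocol, and payload formats. The sketch below splits the excerpt above into those parts. The `SdpLine` and `MediaSection` types and both function names are hypothetical helpers for this illustration only; in practice the browser generates and consumes SDP for you.

```typescript
interface SdpLine {
  type: string;  // single letter before "=", e.g. "m" or "a"
  value: string; // everything after "="
}

interface MediaSection {
  media: string;     // e.g. "audio"
  port: number;
  protocol: string;  // e.g. "RTP/SAVPF"
  formats: string[]; // payload type numbers
}

// Splits raw SDP text into its <type>=<value> lines.
function parseSdpLines(sdp: string): SdpLine[] {
  return sdp
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => /^[a-z]=/.test(line))
    .map((line) => ({ type: line.charAt(0), value: line.slice(2) }));
}

// Extracts the media descriptions ("m=" lines).
function mediaSections(lines: SdpLine[]): MediaSection[] {
  return lines
    .filter((l) => l.type === "m")
    .map((l) => {
      const parts = l.value.split(" ");
      return {
        media: parts[0] ?? "",
        port: Number(parts[1]),
        protocol: parts[2] ?? "",
        formats: parts.slice(3),
      };
    });
}

// The excerpt shown above, joined into one SDP string.
const sampleSdp = [
  "o=- 7614219274584779017 2 IN IP4 127.0.0.1",
  "t=0 0",
  "a=group:BUNDLE audio video",
  "a=msid-semantic: WMS",
  "m=audio 1 RTP/SAVPF 111 103 104 0 8 107 106 105 13 126",
  "c=IN IP4 0.0.0.0",
].join("\r\n");

const sampleMedia = mediaSections(parseSdpLines(sampleSdp));
```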
Now that you have a better understanding of the important concepts of WebRTC, namely signaling, the various APIs, and the Session Description Protocol, let’s move on to how all of these work together to make a successful connection.
Let’s Explore the Working Strategy of WebRTC Application!
Before getting into the details of how a WebRTC video chat application works, it’s better to have some knowledge of IP addresses and ports, as they are the base on which the entire structure stands.
To start with, an IP address is the identification number given to each device connected to the internet, whereas a port number identifies the specific process on that device to which an internet or other network message should be delivered.
In other words, the port number is used so that data can be directed to the correct destination within the device. Each device connected to the internet has an IP address and a range of ports: port numbers are 16 bits long, so there are 65,536 of them (0 to 65,535).
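The arithmetic behind that figure, and the way an IP address and port combine into one endpoint, can be sketched as follows. The function names are hypothetical and the address used is a private-range example.

```typescript
// A port number is 16 bits long, so there are 2^16 = 65,536 possible values.
const PORT_COUNT = 2 ** 16;

// Valid ports run from 0 to 65,535.
function isValidPort(port: number): boolean {
  return Number.isInteger(port) && port >= 0 && port < PORT_COUNT;
}

// Combines an IP address and a port into the usual "ip:port" endpoint notation.
function socketAddress(ip: string, port: number): string {
  if (!isValidPort(port)) {
    throw new RangeError("port out of range: " + port);
  }
  return ip + ":" + port;
}

const exampleEndpoint = socketAddress("192.168.1.10", 5000);
```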
To begin with the working process and APIs:
The RTCPeerConnection API and signaling are all about offers, answers, and candidates. Let’s see this in detail.
As already discussed, the RTCPeerConnection API is used to stream audio and video between users. Signaling works together with RTCPeerConnection to establish a direct connection between the browsers.
Moving ahead, let’s have a look at how the entire process of RTCPeerConnection is carried out. To begin with, it involves two tasks:
- Ascertain the local audio and video media conditions, i.e. the metadata to be sent via signaling
- Gather the potential network addresses for the application’s host, known as candidates
Once the local audio and video data, such as resolution and codec capabilities, has been ascertained, it must be exchanged with the remote browser through the signaling mechanism.
Let’s understand the scenario with an example. Imagine there are two users, X and Y. If X calls Y, the following steps take place as they share their media information:
- X creates an RTCPeerConnection object
- X creates an offer with the RTCPeerConnection createOffer() method
- X calls setLocalDescription() to set the created offer as the description of its local media
- X sends the offer to Y using the signaling mechanism
- Y calls setRemoteDescription() with X’s offer, so that Y’s RTCPeerConnection knows about X’s setup
- Y calls createAnswer(), which produces Y’s answer based on X’s offer
- Y sets its own answer as the local description by calling setLocalDescription()
- Y then uses the signaling mechanism to send the answer back to X
- X sets Y’s answer as the remote session description with setRemoteDescription()
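The steps above can be sketched in code as follows. To keep the example runnable outside a browser, it is written against a hypothetical `PeerLike` interface that mirrors the promise-based method names of RTCPeerConnection in a simplified form; `SessionDescription`, `Send`, and the three function names are likewise assumptions for this illustration, and the `Send` callback stands in for whatever signaling mechanism the app uses.

```typescript
// Simplified stand-ins for the browser types (assumptions for this sketch).
interface SessionDescription {
  type: "offer" | "answer";
  sdp: string;
}

interface PeerLike {
  createOffer(): Promise<SessionDescription>;
  createAnswer(): Promise<SessionDescription>;
  setLocalDescription(desc: SessionDescription): Promise<void>;
  setRemoteDescription(desc: SessionDescription): Promise<void>;
}

// Hands a description to the signaling mechanism for delivery.
type Send = (desc: SessionDescription) => Promise<void> | void;

// Caller side (X): create the offer, store it locally, send it to Y.
async function startCall(x: PeerLike, sendToY: Send): Promise<void> {
  const offer = await x.createOffer();
  await x.setLocalDescription(offer);
  await sendToY(offer);
}

// Callee side (Y): apply X's offer, create an answer, store it, send it back.
async function acceptCall(
  y: PeerLike,
  offer: SessionDescription,
  sendToX: Send
): Promise<void> {
  await y.setRemoteDescription(offer);
  const answer = await y.createAnswer();
  await y.setLocalDescription(answer);
  await sendToX(answer);
}

// Caller side (X): apply Y's answer to complete the exchange.
async function completeCall(
  x: PeerLike,
  answer: SessionDescription
): Promise<void> {
  await x.setRemoteDescription(answer);
}
```

Note that each side sets its own description as local and the other side's description as remote; mixing these up is a common source of errors.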
Alongside this, X and Y also exchange network information. The expression “finding candidates” refers to the process of finding network interfaces and ports using the ICE framework.
- X creates its RTCPeerConnection object with an onicecandidate handler
- This handler is called whenever a network candidate becomes available
- In the handler, X sends the candidate data to Y via their signaling mechanism
- When Y gets a candidate message from X, Y calls addIceCandidate() to add the candidate to the remote peer description
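The candidate exchange above can be sketched in the same spirit. `Candidate` and `IceCapablePeer` are hypothetical, simplified stand-ins that mirror the `onicecandidate` handler and `addIceCandidate()` method of RTCPeerConnection, and the `send` callback again represents the app's own signaling mechanism.

```typescript
// Simplified stand-ins for the browser types (assumptions for this sketch).
interface Candidate {
  candidate: string;
}

interface IceCapablePeer {
  onicecandidate: ((event: { candidate: Candidate | null }) => void) | null;
  addIceCandidate(candidate: Candidate): Promise<void>;
}

// X's side: forward each locally discovered candidate as soon as it appears.
function forwardLocalCandidates(
  peer: IceCapablePeer,
  send: (c: Candidate) => void
): void {
  peer.onicecandidate = (event) => {
    // A null candidate marks the end of gathering; there is nothing to send.
    if (event.candidate) send(event.candidate);
  };
}

// Y's side: hand each candidate received over signaling to the local connection.
async function receiveRemoteCandidate(
  peer: IceCapablePeer,
  c: Candidate
): Promise<void> {
  await peer.addIceCandidate(c);
}
```

Because candidates are forwarded one at a time as they are found, neither side has to wait for the complete list before trying to connect.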
WebRTC supports ICE candidate trickling, which allows the caller to provide candidates to the callee incrementally after the initial offer. So the callee can begin acting on the call and setting up a connection without waiting for all the candidates to arrive.
On the whole, the key point to note is that WebRTC starts gathering ICE candidates automatically once the offer has been created and set as the local description. What we are supposed to implement is the means of sending and receiving these candidates through signaling.
Once the media information and ICE candidates have been shared between the two peers, WebRTC automatically creates a direct connection between them for video chat or any other conversation.
Done with Signaling: Bringing in ICE to Cope with NATs and Firewalls
Getting a WebRTC connection for video chat by taking each peer’s IP address and port number and exchanging them so that the peers can communicate directly might sound simple, but it is far more difficult in practice. Two factors can cause issues here, so it is vital to deal with them before making use of any web video conferencing application.
Let’s look at these two factors.
Network Address Translation (NAT) is the process by which one or more local IP addresses are translated into one or more global IP addresses in order to provide internet access to local hosts.
We all know that an IP address identifies a device’s connection on the internet. So everybody assumes that every device has its own unique IP address, but that’s not the truth.
An IPv4 address is 32 bits long, which means there are only about 4 billion unique addresses (2³² = 4,294,967,296) available overall. Yet in 2018 alone, about 22 billion devices were reportedly connected to the internet.
Now, you might be wondering: how can 22 billion devices connect to the internet when there are only about 4 billion unique addresses?
The answer is NAT.
The story takes a turn when IP addresses are divided into two categories: public IP addresses and private IP addresses.
A public IP address can be assigned to only one device at a time, whereas the same private IP address can be reused across many separate local networks. The idea of NAT is to give multiple devices access to the internet through a single public address.
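As a rough illustration of that idea, the toy model below maps several private endpoints onto one public IP address by giving each a different public port. `ToyNat` is a hypothetical class invented for this sketch, the addresses come from private and documentation ranges, and real NAT devices are considerably more involved (timeouts, protocols, different mapping behaviours).

```typescript
// An IPv4 address is 32 bits long, so there are 2^32 possible addresses.
const IPV4_ADDRESS_COUNT = 2 ** 32; // 4,294,967,296

// Toy port-based NAT: many private endpoints share one public IP,
// each distinguished by its own public port.
class ToyNat {
  readonly publicIp: string;
  private nextPort = 40000; // arbitrary starting port for the illustration
  private outbound = new Map<string, number>(); // "privateIp:port" -> public port
  private inbound = new Map<number, string>(); // public port -> "privateIp:port"

  constructor(publicIp: string) {
    this.publicIp = publicIp;
  }

  // Returns the public endpoint that the outside world sees for a private one.
  translate(privateEndpoint: string): string {
    let port = this.outbound.get(privateEndpoint);
    if (port === undefined) {
      port = this.nextPort++;
      this.outbound.set(privateEndpoint, port);
      this.inbound.set(port, privateEndpoint);
    }
    return this.publicIp + ":" + port;
  }

  // Finds which private endpoint a reply to a given public port belongs to.
  reverse(publicPort: number): string | undefined {
    return this.inbound.get(publicPort);
  }
}

const nat = new ToyNat("203.0.113.5");
const laptopPublic = nat.translate("192.168.1.10:5000");
const phonePublic = nat.translate("192.168.1.11:5000");
```

Both devices appear to the outside world under the same public IP address, which is why a device cannot learn its public endpoint just by looking at its own network settings.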
This means each device knows only its own private IP address, not the public IP address of its router. Likewise, if you search Google for your IP address, it will show you only the public IP address of the router.
Thus, each device effectively has two IP addresses: a private one and a public one. In the WebRTC scenario above, the network candidates contain only the device’s private IP address, because the device is not aware of its public IP address at all. So we have an extra task: finding a way for the browser to learn its public IP address so that it can create a candidate containing it.
This is where a STUN (Session Traversal Utilities for NAT) server comes in. When the device makes a request to the STUN server, the server responds with a message containing the public IP address and port of the router, which helps the browser generate candidates.
A firewall is a network security device that monitors incoming and outgoing network traffic. It decides whether to allow or block specific traffic based on a defined set of security rules.
Now, let’s see how a firewall creates a problem when it comes to WebRTC. A restrictive firewall may block the direct traffic between the two peers, so the peer-to-peer connection cannot be established even when each side knows the other’s public address.
To resolve the firewall issue, we need a TURN (Traversal Using Relays around NAT) server. A TURN server acts as a relay that forwards the traffic between the two browsers, or peers, when a direct peer-to-peer connection fails.
As we now know, STUN and TURN servers are used to make peer-to-peer connections with WebRTC. We can integrate them with WebRTC simply by passing an object containing the URLs of the STUN and TURN servers to the RTCPeerConnection constructor as its argument.
Let’s illustrate this with code for better clarity about the entire concept.
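Here is a minimal sketch of such a configuration. The server URLs, user name, and credential are placeholders rather than real servers; `IceServer`, `PeerConfig`, and `buildPeerConfig` are hypothetical names for this illustration, while the `iceServers` key with its `urls`, `username`, and `credential` fields follows the configuration object that RTCPeerConnection accepts. The constructor is looked up on the global scope so that the snippet also runs outside a browser, where no connection is created.

```typescript
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface PeerConfig {
  iceServers: IceServer[];
}

// Builds the configuration object from one STUN and one TURN server.
function buildPeerConfig(
  stunUrl: string,
  turn: { url: string; username: string; credential: string }
): PeerConfig {
  return {
    iceServers: [
      { urls: stunUrl },
      { urls: turn.url, username: turn.username, credential: turn.credential },
    ],
  };
}

// Placeholder servers; replace them with the ones you actually use.
const config = buildPeerConfig("stun:stun.example.org:3478", {
  url: "turn:turn.example.org:3478",
  username: "demo-user",
  credential: "demo-pass",
});

// In a browser, the object is passed straight to the constructor.
const PeerConnectionCtor = (globalThis as unknown as {
  RTCPeerConnection?: new (configuration: PeerConfig) => unknown;
}).RTCPeerConnection;
const connection = PeerConnectionCtor
  ? new PeerConnectionCtor(config)
  : undefined;
```

TURN servers normally require credentials because relaying traffic costs bandwidth, which is why the TURN entry carries a user name and credential while the STUN entry does not.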
In the above example, we only have to pass the server URLs (plus credentials where a TURN server requires them); the rest is managed by WebRTC.
Have a look at the illustration diagram showing all the connections that are made during a WebRTC video call.
During the entire process, there are certain points to take into account:
- It’s quite usual to get a successful connection using a STUN server alone, without the need for TURN. But sometimes a TURN server is needed to make the call
- Some organizations, such as XirSys, provide TURN and STUN servers for free
This is just the beginning of the video chat app building process, and there is much more to explore when it comes to implementation. We will look into it in our future blogs, so stay tuned to know more about it.