Ever since the rise of the Internet, we have had this amazing gadget: the smartphone. It is one of the great inventions of our time, letting us play games, watch videos on Netflix, write mean tweets on Twitter, make phone calls to anywhere on earth, and much more…
Isn’t that amazing? Well, yes, kind of. Almost everyone has a smartphone today, and it has become the norm for every kid to have one. If you ask me, it’s a gift that can also be a burden if not treated properly, but that’s the spiritual angle 😉
In the past, we didn’t have smartphones that let us make phone calls so easily without being wired to anything.
So what did we have?
It’s kind of funny, but a lot of people today don’t remember the old landline phones, and some have never even seen one.
Over time, those landline networks developed protocols and equipment that let them scale, adding features and, in some cases, performance.
To this day, we use some of those protocols and infrastructure with our phones and even our home networks.
If that got you intrigued, then I highly suggest you read about DSL, PSTN, ISDN, and MSC (Mobile Switching Center).
There are even more networks and technologies, some of which I may not even be familiar with, but I bet that if you start reading about one of those, it will lead you to more subjects as well. Happy Reading! 🙂
As the Internet grew, a great idea came along: leverage the IP network for voice and video calls, in addition to the telecommunications networks. That brings a lot of features to the table and changes the game in many ways.
If we already have an IP infrastructure that works great, and we’re getting it “for free”, why not use it for phone calls, which can then be modified far more easily than on landlines?
The challenge here is the same as with everything else: as technology grows, complexity comes with it.
If you’re not familiar with TCP/IP, or with the networking world in general, then I highly suggest you check out this post as well.
The entrance of VoIP
VoIP, or Voice over IP, is a world of protocols and technologies that, in the end, allow us to make voice and video calls over IP networks.
Sounds great, doesn’t it?
It really is great, because it gives us a more dynamic infrastructure in terms of cabling and adding protocols, so when we need to grow in features or cabling, that can be done easily.
That last sentence might seem hard to understand, so think of the old movies.
Remember the operators who connected callers to each other over the phone?
They were wiring cables end-to-end to connect customers all along the way.
By wiring cables end-to-end, I mean electrically connecting the cables from one end to the other. As you can imagine, maintaining infrastructure like that for each client is quite hard, especially as the number of clients grows by the minute.
This means people needed a physical connection from end to end in order to reach anyone.
In IP networks it’s a whole other game.
The IP stack includes routing, for example, which lets data flow over dynamic routes and allows more flexibility in terms of infrastructure and client connections.
So how does it work?
There are many protocols and tools, but the main ones in this world are SIP (Session Initiation Protocol), RTP (Real-time Transport Protocol), RTCP (RTP Control Protocol), SDP (Session Description Protocol), and the new kid on the block, WebRTC.
At a very high level:
- SIP: Responsible for handling the call session itself in terms of signaling. But wait… what is signaling? It covers the basic operations in a call, like dialing, ringing, putting on hold, ending the call, transferring the call to someone else, and more…
- RTP: Responsible for sending and receiving the media of the call itself (video/audio).
- SDP: As its name suggests, it describes the media of the session so that the call parties can agree on it.
- RTCP: Responsible for collecting statistics about the call and feeding them back in real time, so that, for example, a sender can slow down its sending rate.
- WebRTC: A stack that, after a very long time, made it easy to set up real-time connections and develop live chat applications with audio/video.
I just dropped quite a lot on you, didn’t I?
Well, we’re going to cover each part from top to bottom, so don’t worry.
Do you have an example?
When we pick up an IP phone or use any application that makes VoIP calls, we go through the following flow.
When we dial, we invite the party we called to a call session.
That invite is a SIP message that describes to the called party how they can talk to us. That description is basically SDP carried in the SIP message body.
Once the called client has received the invite, their phone rings until they answer. Once that happens, the session has started, and both parties use RTP to send and receive the audio/video of the call.
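To make the flow above a bit more concrete, here is a minimal sketch of what such an invite could look like as text. It is only an illustration: the addresses, tag, branch, Call-ID, and the helper names `build_sdp` and `build_invite` are made up for this example, and a real phone would add more headers.

```python
def build_sdp(ip, audio_port):
    """Build a tiny SDP body offering one audio stream (hypothetical values)."""
    lines = [
        "v=0",
        f"o=alice 1 1 IN IP4 {ip}",
        "s=call",
        f"c=IN IP4 {ip}",                      # where we want to receive media
        "t=0 0",
        f"m=audio {audio_port} RTP/AVP 0 8",   # RTP port and offered payload types
        "a=rtpmap:0 PCMU/8000",
        "a=rtpmap:8 PCMA/8000",
    ]
    return "\r\n".join(lines) + "\r\n"


def build_invite(caller, callee, ip, audio_port):
    """Build a SIP INVITE whose body is the SDP above."""
    sdp = build_sdp(ip, audio_port)
    headers = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/TCP {ip}:5060;branch=z9hG4bK-example",
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag=caller-tag-1",
        f"To: <sip:{callee}>",
        f"Call-ID: example-call-id@{ip}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller}>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(sdp.encode('utf-8'))}",
    ]
    # A blank line separates the SIP headers from the SDP body.
    return "\r\n".join(headers) + "\r\n\r\n" + sdp


invite = build_invite("alice@example.com", "bob@example.com", "192.0.2.10", 40000)
print(invite)
```

The part to notice is the blank line: everything above it is SIP, everything below it is the SDP that tells the other side how to reach us with RTP.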
You might be thinking that SIP sounds quite lazy: it only does the call setup, while RTP does the hard work.
Let’s just say SIP’s job is far from finished, and it actually does a lot more.
While the call is live, many more events can change its course: busy, call transfer, conference, etc…
Another important fact is that SIP and RTP are application layer protocols.
SIP can run over UDP, TCP, or TLS. Here we assume it uses TCP, which ensures that its messages are delivered and not lost, while RTP is sent over UDP.
The reason RTP is sent over UDP is quite simple.
In a voice or video call, it doesn’t really matter if a video frame or a fragment of audio is lost. What matters is that the data arrives fast, so we can afford to lose some of it.
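Here is a small sketch of that idea. Every RTP packet starts with a 12-byte header that carries a sequence number, so the receiver can notice a lost packet and simply move on instead of waiting for a retransmission. The helper names, the SSRC value, and the 160-byte silent payload are assumptions for illustration, and the loss counter ignores reordering and sequence wraparound.

```python
import struct

RTP_VERSION = 2


def pack_rtp(payload_type, seq, timestamp, ssrc, payload):
    """Pack a minimal RTP packet: 12-byte fixed header followed by the payload."""
    first_byte = RTP_VERSION << 6        # no padding, no extension, no CSRC entries
    second_byte = payload_type & 0x7F    # marker bit left at 0
    header = struct.pack(
        "!BBHII",
        first_byte,
        second_byte,
        seq & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )
    return header + payload


def unpack_seq(packet):
    """Read the 16-bit sequence number from bytes 2-3 of the header."""
    return struct.unpack("!H", packet[2:4])[0]


def count_lost(received_seqs):
    """Count gaps between consecutive sequence numbers (simplified)."""
    lost = 0
    for prev, cur in zip(received_seqs, received_seqs[1:]):
        lost += max(0, cur - prev - 1)
    return lost


# Pretend packet 3 was dropped somewhere on the network.
packets = [pack_rtp(0, seq, seq * 160, 0x1234ABCD, b"\x00" * 160) for seq in (1, 2, 4, 5)]
seqs = [unpack_seq(p) for p in packets]
print(count_lost(seqs))  # 1
```

The receiver just plays what it got and skips the gap, which is exactly the trade-off UDP lets us make.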
Performance and networking
You might already be wondering:
How do video calls in popular applications work today, given that they transfer not only video but also audio, and sometimes other data as well?
The answer to that is quite simple.
A SIP call can have multiple RTP streams, one per data source, so each end of the call receives audio and video separately on different RTP ports.
RTP can also carry multiple sources of data (voice, video, etc.) on a single port.
That is achieved by multiplexing the streams, which both sides agree on beforehand by describing the multiple channels in the SDP carried by SIP.
Now we can see that RTP can do a lot. It can carry pretty much any media, and all we need to do is describe how.
If we’re sending our media sources, video and/or voice, over RTP, that must be very heavy on a typical smartphone or even a common PC.
You might be asking: what about compressing the data before sending it and decompressing it after receiving it, just like WinRAR does?
We do have a solution for that: codecs.
What are codecs?
In order to understand what codecs are, let’s first discuss how we send an image over the internet.
Do you know that awesome 4K technology?
Have you ever considered why the quality of the image is so astonishing when we see our favorite movie?
Well, there’s a reason for that.
Let’s just say that a 4K image has its own cost.
In terms of pixels, each 4K image has 3840 x 2160 = 8,294,400 pixels.
Say each pixel takes one byte (full color usually takes three). If we send that image over a socket, for example, we are actually sending about 8.29 MB over the network.
What if we need to send 4K video at 30 frames per second to the other party?
If we calculate it, we get about 248.8 MB being sent each second!
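If you want to check the arithmetic yourself, here is a quick sketch. It assumes one byte per pixel, the simplification used in this post. With 24-bit color you would set `BYTES_PER_PIXEL` to 3 and the numbers triple.

```python
WIDTH, HEIGHT = 3840, 2160
BYTES_PER_PIXEL = 1   # simplification; full color is usually 3 bytes per pixel
FPS = 30

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
bytes_per_second = frame_bytes * FPS

print(f"{frame_bytes / 1e6:.2f} MB per frame")          # 8.29 MB per frame
print(f"{bytes_per_second / 1e6:.1f} MB per second")    # 248.8 MB per second
print(f"{bytes_per_second * 8 / 1e9:.2f} Gbit/s")       # 1.99 Gbit/s
```

That is roughly 2 Gbit/s of raw video, more than most home connections can carry, before any audio is added.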
What if we also add voice to that?
What if many more applications are running on the phone or PC as well, some of them network-hungry services?
I hope you see where I’m going with this: it would simply put a very heavy load on our network card.
Therefore, codecs were born!
Codecs help us compress data into smaller units, so we send much less data over the wire.
This way we avoid the network errors and data loss that come from pushing too much data over the wire.
You can think of codecs as WinRAR, but for more specific cases: video and voice.
You might be asking yourself the following question.
Can computer A, which has codecs X, Y, and Z, talk to computer B, which has codecs X, G, and R?
It can, but only through the one codec they share, which is X.
When a voice or video call is being set up, the codecs must match, because a codec compresses the data with its own algorithm, and the same algorithm is needed to decompress it.
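The matching itself boils down to finding what both sides have in common. Here is a minimal sketch using the placeholder codec names from the example above. The `negotiate` helper is hypothetical, and keeping the caller's preference order is an assumption of this sketch.

```python
def negotiate(offered, supported):
    """Return the codecs both sides share, in the offerer's preference order."""
    supported_set = set(supported)
    return [codec for codec in offered if codec in supported_set]


computer_a = ["X", "Y", "Z"]
computer_b = ["X", "G", "R"]

common = negotiate(computer_a, computer_b)
print(common)  # ['X']
```

If the list comes back empty, the two sides have no codec in common and the call cannot carry that media.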
How is the call decided?
Another protocol we will now meet is SDP (Session Description Protocol).
SDP is actually sent inside the SIP negotiation.
SDP simply describes how the RTP will be sent and received.
The main values it settles are the ports for sending and receiving RTP, and the codecs to use.
The SIP client finds out which ports it can use to receive RTP data from the other party.
Once it has them, it inserts those UDP ports into the SDP, so the other party knows where to send its side of the RTP.
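To see where those ports and codecs live, here is a rough sketch that pulls them out of an SDP body. The sample SDP, its addresses and ports, and the `parse_sdp` helper are made up for illustration, and the parser only understands the few line types it needs.

```python
SAMPLE_SDP = "\r\n".join([
    "v=0",
    "o=alice 1 1 IN IP4 192.0.2.10",
    "s=call",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 40000 RTP/AVP 0 8",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
    "m=video 40002 RTP/AVP 96",
    "a=rtpmap:96 H264/90000",
]) + "\r\n"


def parse_sdp(sdp):
    """Extract the connection address plus port and codecs of each media line."""
    info = {"address": None, "media": []}
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("c="):
            # c=<nettype> <addrtype> <address>
            info["address"] = line.split()[-1]
        elif line.startswith("m="):
            # m=<media> <port> <proto> <payload types...>
            media, port, proto, *formats = line[2:].split()
            info["media"].append(
                {"type": media, "port": int(port), "proto": proto,
                 "formats": formats, "codecs": {}}
            )
        elif line.startswith("a=rtpmap:") and info["media"]:
            # a=rtpmap:<payload type> <codec name>/<clock rate>
            payload_type, name = line[len("a=rtpmap:"):].split(" ", 1)
            info["media"][-1]["codecs"][payload_type] = name
    return info


parsed = parse_sdp(SAMPLE_SDP)
for stream in parsed["media"]:
    print(stream["type"], parsed["address"], stream["port"], stream["codecs"])
```

Each `m=` line is one media stream with its own port, which is how audio and video can travel separately within the same call.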
Hmmm… you might be asking yourself.
Why wouldn’t RTP decide all of that for itself?
Remember we said that RTP runs over UDP?
UDP is a connectionless protocol, and we don’t even know whether a UDP message reached the other end. That’s why we need a reliable channel to tell both parties how to act once the RTP stream starts.
While SIP does its call negotiation, it sends the SDP itself inside its own SIP message.
It seems a little overwhelming, but believe me, after you work with it for a while, it will look like a cooking recipe.
To understand RTP a bit more deeply, note that even if nobody actually receives the data on the allocated RTP port, the RTP will still be sent, because UDP gives no confirmation that the data was delivered.
SIP Dialogs
In every SIP call, we have a SIP dialog.
A SIP dialog helps us tell SIP messages apart, so we can understand which session each one belongs to.
That means that for each SIP message, we can tell which call needs to be forwarded, marked busy, ended, and more…
Distinguishing between sessions is quite easy.
In the first SIP INVITE message, the From field ends with a tag parameter, and when the recipient of that INVITE responds with the OK message, another tag is added to the To field. Together with the Call-ID header, these two tags identify the dialog.
After the first SIP INVITE, each further INVITE in the same dialog is referred to as a SIP re-INVITE.
It signals that the RTP of the call might need to be updated, for example because clients were removed from or added to the call.
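Here is a rough sketch of how a server could pull the dialog identifiers out of a message. The sample messages, tag values, and helper names are invented for this example, and the header parsing is deliberately naive: it ignores compact header names and folded lines.

```python
import re


def header_value(message, name):
    """Return the value of the first header called `name`, or '' if missing."""
    for line in message.split("\r\n"):
        if line.lower().startswith(name.lower() + ":"):
            return line.split(":", 1)[1].strip()
    return ""


def tag_of(value):
    """Extract the ;tag= parameter from a From/To header value, if present."""
    match = re.search(r";tag=([^;>\s]+)", value)
    return match.group(1) if match else None


def dialog_id(message):
    """Return (Call-ID, From tag, To tag) for a SIP message."""
    return (
        header_value(message, "Call-ID"),
        tag_of(header_value(message, "From")),
        tag_of(header_value(message, "To")),
    )


invite_msg = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Call-ID: abc123@192.0.2.10\r\n"
    "From: <sip:alice@example.com>;tag=1928301774\r\n"
    "To: <sip:bob@example.com>\r\n\r\n"
)
ok_msg = (
    "SIP/2.0 200 OK\r\n"
    "Call-ID: abc123@192.0.2.10\r\n"
    "From: <sip:alice@example.com>;tag=1928301774\r\n"
    "To: <sip:bob@example.com>;tag=a6c85cf\r\n\r\n"
)

print(dialog_id(invite_msg))  # To tag is still missing
print(dialog_id(ok_msg))      # dialog is now fully identified
```

Any later message carrying the same three values, a re-INVITE for example, belongs to the same call.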
Architectures and stuff…
Today we use these technologies in a variety of use cases.
In some cases, we use a Call Center, which is a SIP server that lets SIP clients connect to it and call other clients on that Call Center.
The call center could be an off-the-shelf product, from Cisco or Juniper for example, an open-source project, or a self-developed SIP server.
Of course, better hardware and a better implementation of the product lead to better results in most cases.
Conclusion
VoIP is a constantly growing world with many interesting challenges, and every day it teaches me more about how the world of networking works. So if you wish to get better at networking, try giving it a shot 😉
With the entrance of WebRTC, of course, there are many opportunities to give our web-based applications a better experience in terms of features. Still, the world of SIP is needed and offers a lot of solutions as well.
I hope you had a great time reading this piece, and if you have any further questions I would be delighted to answer them.
Also, if you have any opinions or suggestions for improving this piece, I would like to hear them 🙂
Thank you all for your time and I wish you a great journey!