Voice over IP

Nova Stars Elecronics


Voice over an IP-based network: protocols

Purpose
Layered model
IP (Internet Protocol)
UDP (User Datagram Protocol)
RTP (Real-time Transport Protocol)
The complete header
RTP payload
Conclusion

Purpose

This white paper describes the protocols involved in transmitting voice samples through an IP-based network. It aims to give the reader the basic grounding required to investigate the bandwidth requirements of voice over IP further.

This paper does not discuss header compression schemes, and does not discuss layer 2 protocols. Furthermore, this paper only considers IPv4 and not IPv6.

Layered model

In common with many communications systems, the protocols involved in Voice over IP (VoIP) follow a layered hierarchy that can be compared with the theoretical Open Systems Interconnection (OSI) seven-layer model developed by the International Organization for Standardization (ISO). Breaking a system into defined layers makes it more manageable and flexible: each layer has its own job and does not need a detailed understanding of the layers around it.

For example, IP datagrams can be carried over a variety of link layer technologies, including serial lines (using PPP), Ethernet and Token Ring. The link layer protocol is for the most part irrelevant to IP (unless it limits the size of the packets it can carry), and it need not be the same on the first and final links of a VoIP call.

As always, there are exceptions (such as IP over ATM), but this document considers only the simple, discrete layered model.

Each layer's contribution to the communication process is an additional header preceding the information being transmitted. The complete packet that a layer creates (header and data) becomes the data passed to the next layer down for processing. That layer then adds its own header, and so on.
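The encapsulation process can be sketched in a few lines of Python. This is a minimal illustration only: the function name `encapsulate` is hypothetical, and the headers are placeholder bytes sized like the basic IP (20 octets), UDP (8 octets) and RTP (12 octets) headers described later in this paper, not real protocol fields.

```python
def encapsulate(payload, headers):
    """Wrap a payload in successive layer headers.

    `headers` is ordered from the highest layer down (e.g. RTP, then UDP,
    then IP). Each layer treats everything handed down from the layer above,
    header and data alike, as opaque data and prepends its own header.
    """
    packet = payload
    for header in headers:
        packet = header + packet
    return packet


# Placeholder headers with the basic sizes used in this document
# (dummy bytes, not real protocol fields).
rtp_header = b"R" * 12
udp_header = b"U" * 8
ip_header = b"I" * 20
voice = bytes(160)  # hypothetical 160-octet voice payload

packet = encapsulate(voice, [rtp_header, udp_header, ip_header])
print(len(packet))  # 200: 40 octets of headers + 160 octets of voice
print(packet[0:1], packet[20:21], packet[28:29])  # b'I' b'U' b'R'
```

The lowest layer's header ends up outermost, which is why a receiver strips the headers in the opposite order.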

Each layer, starting at the network (or Internet) layer, is considered in the sections that follow.

IP (Internet Protocol)

The Internet Protocol is the lowest-level protocol considered in this document. It is responsible for delivering packets (or datagrams) between host computers. IP is a connectionless protocol: it does not establish a virtual connection through the network before it starts transmitting; that job is left to higher-level protocols.

IP makes no guarantees concerning reliability, flow control or error correction, and its checksum covers only its own header, not the data carried. As a result, datagrams may arrive at the destination computer out of sequence, corrupted, or not at all. Nevertheless, IP succeeds in making the network transparent to the upper layers involved in voice transmission through an IP-based network.

Any Voice over IP transmission must, by definition, use IP. However, IP on its own is not well suited to voice transmission: real-time applications such as voice and video need timely delivery with consistent delay characteristics, which IP does not promise. Higher-layer protocols address these issues (to a certain extent).

The diagram below shows the header that precedes the data payload to be transmitted. In its most basic form, the header is 20 octets long. Optional fields can be appended to the basic header, but they offer capabilities that are not necessary for VoIP transmission as described in this document.

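Each row is one 32-bit word (four octets). Bits are numbered 0 to 31, starting from the most significant bit of the row's first octet.
Octets 1-4: Version (bits 0-3) | IHL (bits 4-7) | Type of service (bits 8-15) | Total length (bits 16-31)
Octets 5-8: Identification (bits 0-15) | Flags (bits 16-18) | Fragment offset (bits 19-31)
Octets 9-12: Time to live (bits 0-7) | Protocol (bits 8-15) | Header checksum (bits 16-31)
Octets 13-16: Source address (bits 0-31)
Octets 17-20: Destination address (bits 0-31)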
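To make the layout concrete, the following Python sketch decodes a basic 20-octet header with the standard `struct` module, separating the fields that share octets (Version/IHL and Flags/Fragment offset) with bit shifts and masks. The function name and the sample bytes are illustrative assumptions: the sample is a hand-made header for a UDP datagram between two private addresses. Options (IHL greater than 5) are not decoded.

```python
import socket
import struct


def parse_ipv4_header(data):
    """Decode the fixed 20-octet part of an IPv4 header (options ignored)."""
    if len(data) < 20:
        raise ValueError("a basic IPv4 header is 20 octets long")
    # "!" = network (big-endian) byte order; B = 1 octet, H = 2, 4s = 4 raw octets.
    (ver_ihl, tos, total_length, identification, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", data[:20])
    return {
        "version": ver_ihl >> 4,                  # top 4 bits of octet 1
        "ihl": ver_ihl & 0x0F,                    # bottom 4 bits, in 4-octet units
        "type_of_service": tos,
        "total_length": total_length,
        "identification": identification,
        "flags": flags_frag >> 13,                # top 3 bits
        "fragment_offset": flags_frag & 0x1FFF,   # remaining 13 bits, 8-octet units
        "time_to_live": ttl,
        "protocol": protocol,
        "header_checksum": checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }


sample = bytes.fromhex("45 00 00 73 00 00 40 00 40 11 b8 61 c0 a8 00 01 c0 a8 00 c7")
fields = parse_ipv4_header(sample)
print(fields["version"], fields["ihl"], fields["protocol"])  # 4 5 17
print(fields["source"], "->", fields["destination"])  # 192.168.0.1 -> 192.168.0.199
```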

The fields shown are briefly described below:

Version: The version of IP being used. For this format header, the version would be 4.
IHL: The length of the IP header in units of four octets (32 bits). For the basic header shown in this diagram, the value would be 5 (each line in the diagram represents four octets).
Type of service: Specifies the quality of service requested by the host computer sending the datagram. This is not always effectively supported by routers or Internet Service Providers.
Total length: The length of the datagram, measured in octets, including the header and payload.
Identification: As well as handling the addressing of datagrams between two computers (or hosts), IP needs to handle the splitting of data payloads into smaller pieces. This process, known as fragmentation, is required because, although a single IP datagram can be up to 65,535 octets long (65,515 octets of payload behind a basic 20-octet header), lower link layer protocols such as Ethernet cannot always handle such large packets. This field is a unique reference number assigned by the sending host to aid in the reassembly of a fragmented datagram.
Flags: These flags indicate whether the datagram may be fragmented, and, if it has been fragmented, whether further fragments follow this one.
Fragment offset: This field indicates where in the datagram this fragment belongs. It is measured in units of 8 octets (64 bits).
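Time to live: This field indicates the maximum time the datagram is permitted to remain in the internet system. In practice, each router that forwards the datagram decrements the value by at least one and discards the datagram when it reaches zero. This ensures that a datagram which cannot reach its destination host is given a finite lifetime.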
Protocol: This indicates the higher level protocol in use for this datagram. Numbers have been assigned for use with this field to represent transport layer protocols such as TCP (6) and UDP (17).
Header checksum: This is a checksum covering the header only.
Source address: The IP address of the host which generated this datagram. IPv4 addresses are 32 bits in length and, when written or spoken, a dotted decimal notation is used (e.g.: 192.168.0.1).
Destination address: The IP address of the destination host.
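The header checksum is the 16-bit one's complement of the one's complement sum of all the 16-bit words in the header, computed with the checksum field itself set to zero (RFC 791; RFC 1071 describes the calculation in detail). A receiver that runs the same calculation over a header including its checksum should get zero. Because routers change the time to live field, they must also update the checksum at every hop. A minimal Python sketch follows; the function name is hypothetical, and the sample is a hand-made illustrative header with its checksum field zeroed.

```python
def ipv4_header_checksum(header):
    """One's complement checksum over 16-bit big-endian words (RFC 1071 style)."""
    if len(header) % 2:
        raise ValueError("IPv4 headers are a multiple of 4 octets long")
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    # Fold any carries out of the top 16 bits back into the low 16 bits.
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Illustrative header with the checksum field (octets 11-12) set to zero.
header = bytearray.fromhex("45 00 00 73 00 00 40 00 40 11 00 00 c0 a8 00 01 c0 a8 00 c7")
checksum = ipv4_header_checksum(header)
header[10:12] = checksum.to_bytes(2, "big")
print(hex(checksum))                 # 0xb861
print(ipv4_header_checksum(header))  # 0, so the completed header verifies
```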

UDP (User Datagram Protocol)

Generally, there are two protocols available at the transport layer when transmitting information through an IP network. These are TCP (Transmission Control Protocol) and UDP (User Datagram Protocol). Both protocols enable the transmission of information between the correct processes (or applications) on host computers. These processes are associated with unique port numbers (for example, the HTTP application is usually associated with port 80).

TCP is a connection-oriented protocol; that is, it establishes a communications path before transmitting data. It handles sequencing, error detection and the retransmission of lost data, ensuring that the destination application receives a reliable stream of data.

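Voice is a real-time application, and mechanisms must be in place to ensure that information is received in the correct sequence, reliably and with predictable delay characteristics. Although TCP addresses sequencing and reliability, it does so by retransmitting lost data and holding back everything received after a gap until the retransmission arrives, which adds unpredictable delay; for live speech, a late packet is usually no more useful than a lost one. The timing and sequencing functions that voice needs are instead provided by the layer above the transport layer. Therefore TCP is not used for the voice stream, and the alternative transport protocol, UDP, is commonly used instead.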

In common with IP, UDP is a connectionless protocol. UDP delivers data to its correct destination port, but does not attempt to perform any sequencing or to ensure data reliability.

Octets 1-4: Source port (bits 0-15) | Destination port (bits 16-31)
Octets 5-8: Length (bits 0-15) | Checksum (bits 16-31)

The fields shown are briefly described below:

Source port: Identifies the higher layer process which originated the data.
Destination port: Identifies the higher layer process to which this data is being transmitted.
Length: The length in octets of the UDP header and payload (minimum 8, the length of the header alone).
Checksum: Optional field supporting error detection. In IPv4, a value of zero indicates that no checksum has been computed.
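The UDP header is simple enough to build directly. The sketch below packs the four 16-bit fields in network byte order. The function name and the port numbers are hypothetical, and the checksum is left at zero, which in IPv4 means no checksum was computed; a real protocol stack would normally compute it over a pseudo-header, the UDP header and the data.

```python
import struct


def build_udp_header(src_port, dst_port, payload_len, checksum=0):
    """Pack a UDP header; Length covers the 8-octet header plus the payload."""
    length = 8 + payload_len
    if length > 0xFFFF:
        raise ValueError("UDP datagram too long for the 16-bit Length field")
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)


# Hypothetical ports; payload = 12-octet RTP header + 160 octets of voice.
udp_header = build_udp_header(16384, 16386, payload_len=12 + 160)
print(len(udp_header))                     # 8
print(struct.unpack("!HHHH", udp_header))  # (16384, 16386, 180, 0)
```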

RTP (Real-time Transport Protocol)

Real time applications require mechanisms to be in place to ensure that a stream of data can be reconstructed accurately. Datagrams must be reconstructed in the correct order, and a means of detecting network delays must be in place.

Jitter is the variation in delay times experienced by the individual packets making up the data stream. In order to reduce the effects of jitter, data must be buffered at the receiving end of the link so that it can be played out at a constant rate. To support this requirement, two protocols have been developed. These are RTP (Real-time Transport Protocol) and RTCP (RTP Control Protocol).
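The buffering described above can be sketched as a fixed-delay playout buffer. This is one simple strategy among many, not a method prescribed by RTP or RTCP: each packet is scheduled for playout a fixed delay after the first packet arrived, offset by how far its RTP timestamp lies from the first packet's timestamp, and packets that arrive after their scheduled time are treated as lost. The function name, the 8,000 Hz clock rate and the 60 ms buffer are assumptions for illustration; times are whole milliseconds to keep the arithmetic exact, and timestamp wrap-around is ignored.

```python
def schedule_playout(packets, clock_rate=8000, buffer_ms=60):
    """Assign playout times to (arrival_ms, rtp_timestamp) pairs in arrival order.

    Returns (on_time, late): on_time is a list of (rtp_timestamp, playout_ms)
    sorted back into sender order; late lists packets that missed their slot.
    """
    first_arrival, first_ts = packets[0]
    on_time, late = [], []
    for arrival_ms, ts in packets:
        # Offset of this packet's samples from the first packet, in ms.
        media_offset_ms = (ts - first_ts) * 1000 // clock_rate
        playout_ms = first_arrival + buffer_ms + media_offset_ms
        if arrival_ms <= playout_ms:
            on_time.append((ts, playout_ms))
        else:
            late.append((ts, playout_ms))
    on_time.sort()  # restores the sender's order, even if packets were reordered
    return on_time, late


# 20 ms packets (160 samples at 8 kHz). The packets with timestamps 160 and
# 320 arrive out of order, and the packet with timestamp 480 is badly delayed.
arrivals = [(0, 0), (45, 320), (50, 160), (130, 480)]
on_time, late = schedule_playout(arrivals)
print(on_time)  # [(0, 60), (160, 80), (320, 100)]
print(late)     # [(480, 120)]
```

A larger buffer lets more late packets be played, at the cost of more end-to-end delay.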

RTCP provides feedback on the quality of the transmission link. RTP transports the digitised samples of real time information. RTP and RTCP do not reduce the overall delay of the real time information. Nor do they make any guarantees concerning quality of service.
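One of the figures RTCP receiver reports carry is an interarrival jitter estimate. RFC 3550 defines it as a running value: for each packet, the change D in relative transit time compared with the previous packet is computed, and the estimate J moves one sixteenth of the way towards |D|. The Python sketch below follows that definition; the function name is hypothetical, and arrival times are assumed to have already been converted into the same units as the RTP timestamp (here, 8 kHz sample ticks).

```python
def interarrival_jitter(timestamps, arrivals):
    """Running jitter estimate J over a packet stream, as defined in RFC 3550.

    D = (R_i - R_{i-1}) - (S_i - S_{i-1}), where S is the RTP timestamp and
    R the arrival time, both in timestamp units; then J += (|D| - J) / 16.
    """
    jitter = 0.0
    for i in range(1, len(timestamps)):
        transit_prev = arrivals[i - 1] - timestamps[i - 1]
        transit = arrivals[i] - timestamps[i]
        d = abs(transit - transit_prev)
        jitter += (d - jitter) / 16
    return jitter


timestamps = [0, 160, 320, 480]    # 20 ms apart at 8 kHz
steady = [1000, 1160, 1320, 1480]  # constant network delay
bumpy = [1000, 1160, 1400, 1480]   # third packet delayed by 80 ticks (10 ms)
print(interarrival_jitter(timestamps, steady))  # 0.0
print(interarrival_jitter(timestamps, bumpy))   # 9.6875
```

A constant delay, however long, produces zero jitter; only variation between packets raises the estimate.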

The RTP header, which precedes the data payload, is shown in the diagram below:

Octets 1-4: V=2 (bits 0-1) | P (bit 2) | X (bit 3) | CC (bits 4-7) | M (bit 8) | PT (bits 9-15) | Sequence number (bits 16-31)
Octets 5-8: Timestamp (bits 0-31)
Octets 9-12: Synchronisation source (SSRC) number (bits 0-31)

V (Version): Identifies the version of RTP (currently 2).
P (Padding): A flag which indicates whether padding octets have been appended after the payload data.
X (Header extension): Indicates whether an optional header extension follows the fixed RTP header.
CC (CSRC count): The number of contributing source (CSRC) identifiers that follow the fixed header. Although not shown on this header diagram, the 12-octet header can optionally be expanded to include a list of up to 15 contributing sources (4 octets each). Contributing sources are added by mixers, and are only relevant for conferencing applications where parts of the data payload have originated from different computers. For point-to-point communications, CSRCs are not required.
M (Marker): Allows significant events such as frame boundaries to be marked in the packet stream. Its exact meaning is defined by the application profile.
PT (Payload type): This field identifies the format of the RTP payload and determines its interpretation by the application.
Sequence number: A 16-bit number which increments by one for each RTP packet sent, wrapping around to zero after 65,535; its initial value is normally chosen at random. It allows the receiver to detect lost packets and to reconstruct the sender's packet sequence.
Timestamp: The sampling instant of the first octet of data in this packet, counted in ticks of the payload's sampling clock (for example, 8,000 ticks per second for G.711 voice) from a randomly chosen starting value. This field allows the receiver to buffer and play out the data in a continuous stream.
Synchronisation source (SSRC) number: A randomly chosen 32-bit number which identifies the source of the data stream.
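The fixed 12-octet RTP header can be packed in the same way as the other headers. This Python sketch assumes no padding, no header extension and no contributing sources (P = X = CC = 0), as in a point-to-point call, and uses payload type 0 (G.711 µ-law, which RFC 3551 assigns an 8,000 Hz clock) with 20 ms of audio per packet. The function name is hypothetical. Note how the sequence number and timestamp start from random values and wrap around modulo 65,536 and 4,294,967,296 respectively.

```python
import random
import struct


def build_rtp_header(seq, timestamp, ssrc, payload_type=0, marker=False):
    """Pack a 12-octet RTP header with V=2 and P = X = CC = 0."""
    version, padding, extension, csrc_count = 2, 0, 0, 0
    octet1 = (version << 6) | (padding << 5) | (extension << 4) | csrc_count
    octet2 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", octet1, octet2, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


SAMPLES_PER_PACKET = 160  # 20 ms at 8,000 samples per second

ssrc = random.getrandbits(32)
seq = random.getrandbits(16)
timestamp = random.getrandbits(32)
headers = []
for i in range(3):
    # For audio, the marker is commonly set on the first packet of a talkspurt.
    headers.append(build_rtp_header(seq, timestamp, ssrc, marker=(i == 0)))
    seq = (seq + 1) & 0xFFFF  # wraps after 65,535
    # The timestamp counts samples, not wall-clock time.
    timestamp = (timestamp + SAMPLES_PER_PACKET) & 0xFFFFFFFF

print(build_rtp_header(65535, 0, 0x12345678).hex())  # 8000ffff0000000012345678
```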

The complete header

The headers of the three protocols discussed are sent in sequence before the digitised voice or video samples, which form the payload of the RTP packet.

The result is a 40-octet overhead for every packet of data (20 octets of IP header, 8 octets of UDP header and 12 octets of RTP header):

Octets 1-4: Version (bits 0-3) | IHL (bits 4-7) | Type of service (bits 8-15) | Total length (bits 16-31)
Octets 5-8: Identification (bits 0-15) | Flags (bits 16-18) | Fragment offset (bits 19-31)
Octets 9-12: Time to live (bits 0-7) | Protocol (bits 8-15) | Header checksum (bits 16-31)
Octets 13-16: Source address (bits 0-31)
Octets 17-20: Destination address (bits 0-31)
Octets 21-24: Source port (bits 0-15) | Destination port (bits 16-31)
Octets 25-28: Length (bits 0-15) | Checksum (bits 16-31)
Octets 29-32: V=2 (bits 0-1) | P (bit 2) | X (bit 3) | CC (bits 4-7) | M (bit 8) | PT (bits 9-15) | Sequence number (bits 16-31)
Octets 33-36: Timestamp (bits 0-31)
Octets 37-40: Synchronisation source (SSRC) number (bits 0-31)
The headers are followed by a payload of digitised voice or video samples.
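A receiver walks through these headers in order. The sketch below splits a received IPv4/UDP/RTP packet into its parts, using IHL to find the end of the IP header and CC to account for any CSRC list. It is an illustration under stated assumptions: the function name is hypothetical, the packet is assumed to be a single unfragmented datagram, and RTP header extensions and padding are not handled.

```python
def split_voip_packet(packet):
    """Return (ip_header, udp_header, rtp_header, payload) from a raw packet."""
    ip_len = (packet[0] & 0x0F) * 4        # IHL is in units of 4 octets
    if packet[9] != 17:                    # Protocol field: 17 = UDP
        raise ValueError("not a UDP datagram")
    rtp_start = ip_len + 8                 # the UDP header is always 8 octets
    if packet[rtp_start] & 0x10:           # X bit set
        raise NotImplementedError("RTP header extensions are not handled here")
    csrc_count = packet[rtp_start] & 0x0F  # each CSRC adds 4 octets
    payload_start = rtp_start + 12 + 4 * csrc_count
    return (packet[:ip_len], packet[ip_len:rtp_start],
            packet[rtp_start:payload_start], packet[payload_start:])


# Minimal synthetic packet: only the fields this function reads are filled in.
ip = bytearray(20)
ip[0] = 0x45   # version 4, IHL 5
ip[9] = 17     # UDP
udp = bytes(8)
rtp = bytearray(12)
rtp[0] = 0x80  # V=2, P=0, X=0, CC=0
packet = bytes(ip) + udp + bytes(rtp) + bytes(160)
parts = split_voip_packet(packet)
print([len(p) for p in parts])  # [20, 8, 12, 160]
```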

RTP payload

The IP, UDP and RTP headers are followed by the RTP payload, which comprises digitised samples of voice or video. The length of this payload can vary, but for voice, a payload representing 20 ms of speech is considered the maximum duration.

The choice of payload duration is a compromise between bandwidth requirements and quality. Smaller payloads demand more bandwidth per channel, because every packet still carries 40 octets of headers, so the headers make up a larger share of the traffic. However, if payloads are made larger, the overall delay of the system increases (the sender must wait longer to collect each payload's worth of samples), and the system becomes more susceptible to the loss of individual packets, since each lost packet takes more speech with it.
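A worked example makes this trade-off concrete. Assuming a 64 kbit/s codec such as G.711 and the 40-octet IP/UDP/RTP overhead described above (no header compression, and layer 2 framing excluded, as elsewhere in this paper), the Python sketch below computes the bit rate per channel at the IP layer for two payload durations. The function name is hypothetical.

```python
HEADER_OCTETS = 20 + 8 + 12  # IPv4 + UDP + RTP


def ip_bandwidth_bps(codec_bps, packet_ms):
    """Bits per second per channel at the IP layer, one direction."""
    payload_octets = codec_bps * packet_ms // (8 * 1000)
    packet_octets = payload_octets + HEADER_OCTETS
    return packet_octets * 8 * 1000 // packet_ms


for ms in (10, 20):
    print(ms, "ms:", ip_bandwidth_bps(64_000, ms), "bit/s")
# 10 ms: 96000 bit/s   (80-octet payload, 100 packets per second)
# 20 ms: 80000 bit/s   (160-octet payload, 50 packets per second)
```

Halving the payload duration from 20 ms to 10 ms raises the rate from 80 to 96 kbit/s per channel, even though the voice itself still occupies 64 kbit/s.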

This subject is discussed in more detail in the white paper Bandwidth requirements for Voice over IP transmission.

Conclusion

This document has detailed a common set of protocols used for the transmission of voice over IP through a local or wide area network. It should be borne in mind that there are other methods of transmitting voice through an IP based network. Some of these are vendor specific, and some are still under development by the Internet Engineering Task Force.

In particular, header compression and multiplexing techniques can go some way towards reducing the bandwidth requirement across a WAN.