Chapter 7: Related subjects

This chapter covers some VoIP-related topics. There is a lot going on in the VoIP world, and it would be impossible to describe every item related to VoIP. Therefore, I will give a brief overview of some subjects which I found interesting and which also give a good idea of what is currently being worked on.

First, H.323 and the Session Initiation Protocol (SIP) are described. Because these protocols are somewhat related, a short comparison of them follows. Finally, a brief explanation of the Real-Time Streaming Protocol (RTSP) is given.

7.1 H.323

The ITU-T document about H.323 is a recommendation for multimedia conferencing over packet-based networks without QoS support. It is part of the H.32X series of recommendations, which all describe multimedia conferencing but over different types of networks. The recommendations in this series are [6]:

Recommendation   Network
H.320            Narrowband Integrated Services Digital Network (N-ISDN)
H.321            Broadband Integrated Services Digital Network (B-ISDN)
H.322            Guaranteed bandwidth packet switched network
H.323            Non-guaranteed bandwidth packet switched network
H.324            The analogue phone system

This section presents an overview of the H.323 recommendation, mostly based upon the information in [6] and [14].

7.1.1 Functionality

End systems conforming to the H.323 recommendation can communicate with each other, either point-to-point or in a multipoint conference. These end systems may have different capabilities, but each must at least support G.711 audio encoding. Video support and other audio coders are optional. H.323 also defines how to do general data transfers, but this feature is optional as well.

The recommendation allows communication with end systems on a different type of network, conforming to other H.32X standards. This requires special devices which connect to the different networks and do the necessary conversions.

Management and accounting support are also provided. This way it is possible to specify for example the maximum amount of bandwidth that may be occupied with H.323 calls. Accounting is provided to support billing of the callers.

The H.323 recommendation defines a framework for the development of supplementary services. Currently, two such services are already defined: call transfer and call forwarding.

Finally, since packet based networks - like IP networks - are often not very secure, H.323 defines several mechanisms to provide better security.

7.1.2 Components

Four components are specified in recommendation H.323: terminals, gateways, gatekeepers and multipoint control units (MCUs). A terminal is a system where H.323 data and signalling streams originate and terminate. It was already mentioned that such a system must at least be capable of handling G.711 audio.

A gateway is a device which allows H.323 capable systems to communicate with other H.32X systems. Gateways connect the different networks together and perform the necessary transformations. For example, it may be necessary to change signalling information or to use another audio encoding. A gateway is optional in an H.323-enabled network.

A gatekeeper is an optional component, but is very useful when present. When a gatekeeper is present, all terminals, gateways and MCUs must be registered with it. Two important services are provided by a gatekeeper. The first one is address translation from an alias - an international phone number for example - to a network address - an IP address for example.

The second major service of a gatekeeper is bandwidth management. A gatekeeper could be configured to limit the bandwidth used by H.323 calls or to only allow a certain number of simultaneous calls.

An optional feature of a gatekeeper is to route calls. Routing a call through a gatekeeper gives it more effective control over the call and more information about it. This could be used to bill calls or to re-route a call to another system when a user is unavailable at the called endpoint.
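The two main gatekeeper services, address translation and bandwidth management, can be pictured with a small sketch. This is only a toy in-memory model and not the actual signalling between endpoints and gatekeeper; the class name `Gatekeeper`, its methods, the alias and the address are all hypothetical.

```python
class Gatekeeper:
    """Toy model of address translation and bandwidth management."""

    def __init__(self, max_bandwidth_kbps):
        self.max_bandwidth_kbps = max_bandwidth_kbps
        self.used_kbps = 0
        self.aliases = {}  # alias (e.g. a phone number) -> network address

    def register(self, alias, address):
        """Record the network address at which an alias can be reached."""
        self.aliases[alias] = address

    def translate(self, alias):
        """Return the registered address for an alias, or None if unknown."""
        return self.aliases.get(alias)

    def admit(self, requested_kbps):
        """Admit a call only if it fits within the configured bandwidth limit."""
        if self.used_kbps + requested_kbps > self.max_bandwidth_kbps:
            return False
        self.used_kbps += requested_kbps
        return True

    def release(self, kbps):
        """Give bandwidth back when a call ends."""
        self.used_kbps = max(0, self.used_kbps - kbps)


gk = Gatekeeper(max_bandwidth_kbps=128)
gk.register("+3211000000", "192.0.2.10")  # made-up alias and address
print(gk.translate("+3211000000"))
print(gk.admit(64), gk.admit(64), gk.admit(64))
```

With a limit of 128 kbit/s, two 64 kbit/s calls fit and a third one is refused until bandwidth is released.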

An MCU is used for conferences between three or more endpoints. It contains a multipoint controller (MC) and possibly a number of multipoint processors (MPs). Participants send their control information to the MC so that endpoint capabilities can be exchanged and communication parameters can be negotiated. An MP is used to process the incoming media, for example to mix several streams together.
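As an illustration of the kind of media processing an MP might do, the sketch below mixes several audio frames by summing the samples and clipping the result. It assumes that the streams have already been decoded to signed 16-bit linear samples and are aligned in time; the function name `mix_frames` is hypothetical, and a real MP could use a different mixing strategy.

```python
def mix_frames(frames):
    """Mix equal-length frames of signed 16-bit samples by summing and clipping."""
    if not frames:
        return []
    length = len(frames[0])
    if any(len(frame) != length for frame in frames):
        raise ValueError("all frames must have the same length")
    mixed = []
    for samples in zip(*frames):
        total = sum(samples)
        # Clip to the 16-bit range so loud passages do not wrap around.
        mixed.append(max(-32768, min(32767, total)))
    return mixed


print(mix_frames([[1000, -2000], [500, 500]]))
```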

Three models for multipoint conferencing are defined. In all models each participant sends its control information to the MCU, where it can be processed by the MC. In the centralised model, each participant also sends its media to the MCU. In the decentralised model the different media are distributed by multicasting them. In the hybrid model, some participants use multicasting to distribute the media, while others send their media directly to the MCU.

7.1.3 Architecture

The H.323 recommendation is often called an `umbrella specification'. This is because it uses several other ITU-T recommendations to provide its functionality. The structure of the H.323 architecture is illustrated in figure 7.1.




Figure 7.1: H.323 architecture [14]

The audio coders are the ITU-T G-series standards which were already described in chapter four. The video coders defined in the recommendation are H.261 and H.263. The H.263 coder was designed for low bit rate transmission but is more complex than H.261. Both audio and video are encapsulated in RTP packets and then transmitted across the network. Additional information about these transmissions is provided by RTCP.

Before two or more parties can communicate with each other, the call must first be set up. This is done using mechanisms defined in H.225.0 and H.245. A part of the H.225.0 recommendation specifies how a call should be set up and torn down. When the call has been established, the capabilities of the involved end systems are exchanged so that each end system can select the appropriate coders. This capability exchange is done by H.245, which also defines other functions, for example the opening and closing of logical channels to transport audio and video.
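The idea behind capability exchange can be sketched as picking the first locally preferred coder that the remote side also supports. This is a deliberate simplification and not the H.245 procedure itself; the function name `select_coder` and the preference order are assumptions made for the example.

```python
def select_coder(local_preferences, remote_capabilities):
    """Return the first locally preferred coder the remote side supports."""
    for coder in local_preferences:
        if coder in remote_capabilities:
            return coder
    return None  # no coder in common


# Hypothetical preference list; G.711 is last because every terminal supports it.
local = ["G.723.1", "G.729", "G.711"]
print(select_coder(local, {"G.711", "G.729"}))
print(select_coder(local, {"G.711"}))
```

Because every H.323 terminal must support G.711, two conforming terminals always have at least that coder in common.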

Another part of the H.225.0 recommendation specifies how the interaction with a gatekeeper should be done. This is done by a protocol called RAS, which stands for Registration, Admission and Status. The RAS functions include gatekeeper discovery and endpoint registration with a gatekeeper. Functions like bandwidth management and admission control are also done by RAS messages.

H.323 end systems can also exchange general data with each other. How this should be done is specified in the T.120 recommendation. Like H.323, this is also an umbrella recommendation, defining how to use other protocols to exchange data.

How security services should be provided is defined in recommendation H.235. Authentication is provided by admission control of endpoints, which is done by a gatekeeper. Data integrity and privacy are implemented using encryption techniques. Finally, non-repudiation is also provided by a gatekeeper. Non-repudiation means that nobody can deny that he participated in a call.

7.2 Session Initiation Protocol (SIP)

The Internet Engineering Task Force (IETF) has also been working on protocols to provide multimedia communication. As with H.323, the media themselves are transported with RTP, so the main difference between the approaches of the ITU-T and IETF is how call signalling and control are done. These functions are covered by the Session Initiation Protocol (SIP).

SIP is formally specified in [31], where it is described as an application-layer control protocol that can establish, modify and terminate multimedia sessions or calls. Although no real assumptions are made about the underlying network and protocols, SIP has been designed with the TCP/IP architecture in mind. Unlike the call signalling and control protocols in the H.323 recommendation, SIP is a text based protocol. It somewhat resembles the Simple Mail Transfer Protocol (SMTP) and the Hypertext Transfer Protocol (HTTP), the protocols used to transfer e-mail and World Wide Web pages respectively.
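Because SIP messages are plain text with an HTTP-like layout (a start line, header lines and an optional body), they can be taken apart with ordinary string operations. The sketch below is a minimal illustration of that point. The example message is hand-written and incomplete (a real request carries more headers), and the function name `parse_sip_message` and all addresses are hypothetical.

```python
def parse_sip_message(raw):
    """Split a SIP message into its start line, headers and body."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Split at the first colon only; header values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Simplified, hand-written request used only for illustration.
example = (
    "INVITE sip:me@home.net SIP/2.0\r\n"
    "From: <sip:caller@example.org>\r\n"
    "To: <sip:me@home.net>\r\n"
    "Call-ID: 12345@example.org\r\n"
    "CSeq: 1 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

start_line, headers, body = parse_sip_message(example)
method, request_url, version = start_line.split(" ")
print(method, request_url, version)
print(headers["to"])
```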

This section describes the Session Initiation Protocol. The information herein is mostly based upon the SIP specification in [31].

7.2.1 User agent (UA)

A user agent (UA) is an application which resides at a SIP end station. It consists of two parts: a user agent client (UAC) and a user agent server (UAS). The UAC is responsible for sending SIP requests when a call needs to be established. The UAS is a server application which contacts the user when there is an incoming request and responds to it.

7.2.2 Network servers

Three types of network servers are defined. The first one is a redirect server. A user can send a call invitation request for another person to a redirect server. This server will then locate the user and return the necessary information to enable the caller to establish a call with the intended person.

The second type of server is a proxy server. As with a redirect server, a user can send an invitation request to a proxy server. The proxy server will also try to locate the destination of the call, but unlike a redirect server, it will not simply return possible locations of the called person. Instead, based upon that information, a proxy server will try to establish a connection on behalf of the caller.

Finally, the last server type is called a registrar. A user can register himself by sending information about his current location to a registrar. This information can then be used to contact him. Registration makes personal mobility possible, which means that a person should be able to accept calls directed to him at any end system. The information sent to a registrar describes at which system a user should be contacted.

7.2.3 Operation

Somehow, you must specify whom you want to call. A SIP user is identified by a SIP Uniform Resource Locator (SIP-URL). Such a URL looks somewhat like a World Wide Web URL or an e-mail address. An example is `sip:me@home.net'.

When a user wants to invite someone into a session or wants to make a call to someone, the user can send an invitation request to the end system specified in the destination's SIP-URL. In the example above, the request would be sent to `home.net'. If the called user is available at that system, he can send a response, indicating whether he wants to participate in the communication or not. When the caller receives this response, he sends an acknowledgement to the other system.
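How the destination system follows from a SIP-URL can be shown with a few lines of code. The sketch below handles only the simple `sip:user@host' form used in this section and ignores ports, parameters and other parts a full SIP-URL may carry; the function name `parse_sip_url` is hypothetical.

```python
def parse_sip_url(url):
    """Return (user, host) for a simple SIP-URL such as sip:me@home.net."""
    scheme, sep, rest = url.partition(":")
    if sep != ":" or scheme.lower() != "sip":
        raise ValueError("not a SIP-URL: " + url)
    user, at, host = rest.rpartition("@")
    if not host:
        raise ValueError("missing host in: " + url)
    # The user part is optional; without an '@' only a host is given.
    return (user if at else None), host


user, host = parse_sip_url("sip:me@home.net")
print(user, host)
```

The host part returned here is the system to which the invitation request would be sent.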

The invitation request could also be sent to a redirect server. This redirect server would then look for possible locations of the called user and send the corresponding SIP-URLs back to the caller. Based upon this information, the caller could then try to contact the other user directly, as described above.

Finally, the caller could also send its invitation request to a proxy server. This proxy server then looks for possible locations of the other user and tries to invite that user itself. When the proxy knows that the invitation was either accepted or denied, it can send an appropriate response back to the caller. This way, a proxy acts as both a client and a server.
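The difference between a redirect server and a proxy server can be summarised in a small sketch: the redirect server only reports possible locations, whereas the proxy tries them itself and relays the outcome. Everything here is a toy model; the location table, the function names and the `try_invite` callback are hypothetical and no real SIP messages are exchanged.

```python
# Hypothetical location information, e.g. as gathered by a registrar.
LOCATIONS = {
    "sip:me@home.net": ["sip:me@office.example", "sip:me@laptop.example"],
}


def redirect_server(target, locations):
    """Return the possible locations; the caller must contact them itself."""
    return {"status": "redirect", "contacts": list(locations.get(target, []))}


def proxy_server(target, locations, try_invite):
    """Invite each known location on behalf of the caller and relay the result."""
    for contact in locations.get(target, []):
        if try_invite(contact):
            return {"status": "accepted", "contact": contact}
    return {"status": "failed", "contact": None}


def only_laptop_answers(contact):
    """Stand-in for sending an invitation; only the laptop accepts."""
    return contact == "sip:me@laptop.example"


print(redirect_server("sip:me@home.net", LOCATIONS))
print(proxy_server("sip:me@home.net", LOCATIONS, only_laptop_answers))
```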

The invitation request normally contains information about the media that will be sent. If the invitation was successful, the response will also contain a description about the media that the other user will use. The SIP specification does not demand a specific format, but the Session Description Protocol (SDP) was designed for this purpose.
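To give an idea of what such a media description looks like, the sketch below extracts the media lines from a small SDP description. The description itself is hand-written for illustration (addresses, identifiers and port are made up), and the function name `media_lines` is hypothetical. Payload type 0 in the RTP audio/video profile denotes G.711 mu-law audio.

```python
def media_lines(sdp):
    """Collect the media ("m=") lines of an SDP description."""
    media = []
    for line in sdp.splitlines():
        if line.startswith("m="):
            # Layout: m=<media> <port> <transport> <format list>
            kind, port, transport, *formats = line[2:].split()
            media.append({
                "media": kind,
                "port": int(port),  # simplification: no "port/count" form
                "transport": transport,
                "formats": formats,
            })
    return media


# Hand-written example description with made-up values.
offer = "\r\n".join([
    "v=0",
    "o=caller 2890844526 2890844526 IN IP4 192.0.2.10",
    "s=Call",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0",
]) + "\r\n"

print(media_lines(offer))
```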

Note that SIP can be used to invite parties to both unicast and multicast sessions and that the initiator of the invitation does not actually have to participate in the session. SIP also offers services to provide secure communications.

7.3 H.323 vs SIP

Since H.323 and SIP offer similar services, which solution should be used? Comparisons of these protocols are given in [7] and [3]. In this section, I will summarise the key points made in [3]. For a more complete discussion you should consult these references.

When we compare the complexity of the two protocols, it seems that SIP is far less complex than H.323. The specification of H.323 is more extensive than that of SIP and defines a lot more elements. Furthermore, H.323 uses a binary encoding mechanism for call signalling and control, whereas SIP is text based. This textual format is easy to decode and much easier to debug than a binary representation. A part of the complexity of H.323 stems from the interaction between several components which are not cleanly separated. Also, in H.323 there may be several ways to accomplish a single task and some of the functionality is present in several parts of the protocol.

Considering the extensibility of the protocols, the experience with other protocols like SMTP and HTTP has been used to make SIP very extensible: new features can easily be incorporated into the protocol. H.323 also allows some extensions, but only at predefined places within the protocol. SIP is quite modular, which allows its components to be changed quite easily. H.323, on the other hand, is less modular. Since various protocol components usually need to work together to accomplish a task, it will be harder to simply replace one component.

H.323 was originally intended for use on a single LAN. Currently, this restriction is no longer present, but H.323 can have some difficulties in detecting looping messages. SIP can be used over wide area networks without any difficulties, easily detecting loops when they occur. H.323 also has some difficulties when the conference size keeps increasing. The use of a Multipoint Controller (MC) is a bottleneck for the conference. When the conference size keeps growing, eventually another protocol will have to be used: H.332. Since SIP does not have something similar to an MC, it does not suffer from such scalability problems.

As mentioned before, the services provided by H.323 and SIP are roughly the same. However, when it comes to capability exchange services, it seems that H.323 has a much richer set of functionality than SIP. Also, H.323 has various conference control services for which SIP has to rely on external protocols. On the other hand, the personal mobility services provided by SIP are more extensive than similar support in H.323.

7.4 Real-Time Streaming Protocol (RTSP)

The Real-Time Streaming Protocol (RTSP) is also related to VoIP, but in a different way than H.323 and SIP, which can be used to establish VoIP calls in a standardised way. RTSP does not have such a direct relationship with VoIP; it merely uses many of the same techniques.

The official specification can be found in [38], where RTSP is defined as a protocol which establishes and controls either a single or several time-synchronised streams of continuous media. Like SIP, RTSP is an application level protocol and is a part of the overall IETF multimedia data and control architecture.

As with VoIP, the continuous media which are transmitted are divided into tiny pieces which are sent separately across the network. At the other end, the continuous media will have to be reconstructed from those pieces. Usually RTP will be used for the transmission of such packets, although RTSP does not require it.

RTSP has some resemblance to HTTP, the protocol used to transfer World Wide Web pages. Whereas HTTP provides functionality to transmit text and images, RTSP tries to provide similar services for audio and video. RTSP provides a VCR-style remote control for audio and video. For example, a user can start, pause or stop the playback of media across a network.

A media server offers playback or recording functionality of media. A client can use RTSP to interact with such a media server. The following operations are provided:

A presentation or a media stream is identified by an RTSP Uniform Resource Locator (RTSP URL), which looks somewhat like an HTTP URL. An example of an RTSP URL is `rtsp://example.com/audio'. The overall presentation and the properties of the different media are defined in a presentation description, which can be obtained by means other than RTSP, for example via a World Wide Web page.
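Since RTSP requests are text messages with an HTTP-like layout, the remote control operations can be pictured as short requests sent to the media server. The sketch below only builds such request strings and sends nothing over the network. The method names PLAY, PAUSE and TEARDOWN come from the RTSP specification; the function name `build_rtsp_request` and the session identifier are made up for the example.

```python
def build_rtsp_request(method, url, cseq, session=None):
    """Build the text of a minimal RTSP request."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    if session is not None:
        lines.append(f"Session: {session}")
    # An empty line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


url = "rtsp://example.com/audio"
session_id = "12345678"  # hypothetical; normally assigned by the server
for cseq, method in enumerate(["PLAY", "PAUSE", "TEARDOWN"], start=1):
    print(build_rtsp_request(method, url, cseq, session_id))
```

In terms of the remote control analogy, PLAY starts playback, PAUSE halts it temporarily and TEARDOWN stops the stream.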

7.5 Summary

Several protocols and standards are related to VoIP. A first example is H.323, a recommendation for multimedia conferencing over packet based networks without QoS guarantees. It is a part of a series of standards which describe similar services over different kinds of networks.

H.323 is an umbrella specification: it defines how different components should work together to allow certain functions. Using H.323, a call can be set up, capabilities can be exchanged and parties can communicate with each other. Multi-user conferences are also possible. Media originate and terminate in terminals. A gateway makes communication possible over different kinds of networks. Multipoint conferences are made possible through the use of a multipoint control unit (MCU). A gatekeeper makes advanced features like bandwidth management and accounting possible.

Recommendation H.323 was developed by the ITU-T. The IETF has been working on an architecture which provides similar services. As in H.323, the media are transmitted using RTP, so the main difference between the two approaches lies in call signalling and control. In the IETF architecture, the Session Initiation Protocol (SIP) can be used for these services. The functionality which SIP provides is similar to that of signalling and control components in the H.323 architecture. An important feature of SIP is the possibility of personal mobility. This means that a user can answer calls directed to him at any place he wants.

When we compare signalling and control in H.323 and SIP, it seems that SIP is less complex, more extensible and more scalable. Their functionality is similar, but the capability exchange method of H.323 is more advanced. SIP, on the other hand, provides better personal mobility.

The Real-Time Streaming Protocol (RTSP) is another VoIP related protocol. The protocol serves as a sort of remote control for a media server. For example, RTSP can be used to play back a presentation or media file stored on a server. It could also be used to record an ongoing presentation.

