Thesis: Voice over IP in networked virtual environments
I submitted this thesis for a university degree at the School for Knowledge Technology (or 'School voor Kennistechnologie' in Dutch), a cooperation between Hasselt University and Maastricht University.
Voice over IP is about transmitting voice information across an IP network, for example the Internet. The classical application of VoIP is as a telephone alternative. However, this thesis is about using VoIP in networked virtual environments.
The Internet Protocol (IP) is a part of the TCP/IP architecture. The protocol itself offers only a best-effort service: packets can be delivered out of order, corrupted, duplicated or not at all. Also, the time a packet needs to reach its destination varies from packet to packet. Applications normally do not use IP itself, but the higher-level protocols: TCP, which offers a reliable byte stream service, and UDP, which offers a service similar to that of IP.
The speech signal is transmitted by digitising tiny pieces of it at regular intervals and sending these to the destination where an analogue signal is reconstructed. For good quality communication, the overall delay should be below 200 ms. Delay variance or jitter should be eliminated through buffering. Speech communication is fairly tolerant to lost or corrupted packets.
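The buffering mentioned above can be illustrated with a small playout (jitter) buffer. This is only a minimal sketch and not code from the thesis: the class name `JitterBuffer`, the fixed playout delay and the frame length are assumptions made for illustration, and sequence number wrap-around is ignored.

```python
class JitterBuffer:
    """Reorders packets and releases each one at a fixed playout time.

    Hypothetical illustration: names and parameters are assumptions.
    """

    def __init__(self, playout_delay_ms, frame_ms):
        self.playout_delay_ms = playout_delay_ms  # extra delay that absorbs jitter
        self.frame_ms = frame_ms                  # speech duration per packet
        self._pending = {}                        # seq -> payload
        self._base = None                         # (first seq seen, its arrival time)
        self._last_played = None

    def playout_time(self, seq):
        first_seq, first_arrival = self._base
        return first_arrival + self.playout_delay_ms + (seq - first_seq) * self.frame_ms

    def push(self, seq, payload, arrival_ms):
        if self._base is None:
            self._base = (seq, arrival_ms)
        if self._last_played is not None and seq <= self._last_played:
            return  # duplicate of, or older than, something already played
        if seq in self._pending:
            return  # duplicate of a packet that is still waiting
        if arrival_ms > self.playout_time(seq):
            return  # arrived after its playout slot: treat as lost
        self._pending[seq] = payload

    def pop_due(self, now_ms):
        """Return all packets whose playout time has come, in sequence order."""
        due = []
        for seq in sorted(self._pending):
            if self.playout_time(seq) > now_ms:
                break
            due.append((seq, self._pending.pop(seq)))
            self._last_played = seq
        return due
```

A larger playout delay hides more jitter but adds directly to the overall delay, so it has to be weighed against the 200 ms budget mentioned above.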
When the digitised speech signal is left uncompressed, a bandwidth of 64 kbps is needed for telephone quality communication. Various compression techniques can reduce this amount. The most successful among them model how the speech was produced rather than the signal itself. Various compression standards allow interoperability between applications.
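The 64 kbps figure follows from the usual telephone-quality sampling parameters (8000 samples per second, 8 bits per sample). The sketch below works this out and also estimates the on-wire rate once packet headers are added. The 20 ms packet length and the 40-byte header (IPv4 20 + UDP 8 + RTP 12, no options, no link-layer overhead) are assumptions chosen for illustration, not values taken from the thesis.

```python
def pcm_bitrate_bps(sample_rate_hz=8000, bits_per_sample=8):
    """Bit rate of uncompressed speech at telephone-quality sampling."""
    return sample_rate_hz * bits_per_sample


def on_wire_bitrate_bps(payload_bitrate_bps, packet_ms=20, header_bytes=40):
    """Bit rate including per-packet headers (assumed: IPv4 + UDP + RTP = 40 bytes)."""
    packets_per_second = 1000 / packet_ms
    payload_bytes = payload_bitrate_bps / 8 / packets_per_second
    return (payload_bytes + header_bytes) * 8 * packets_per_second


print(pcm_bitrate_bps())            # 64000
print(on_wire_bitrate_bps(64000))   # 80000.0
```

Under these assumptions the headers add 16 kbps regardless of the codec, which is why the relative overhead grows as compression reduces the payload.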
To transmit the speech data, TCP is not a good choice: it has a lot of features which are unnecessary for VoIP, but which increase the overall delay. UDP on its own is too simple, but it can be extended: this is how RTP is used in the TCP/IP architecture. The Real-time Transport Protocol (RTP) provides information for synchronisation, flow and congestion control, and identification. To provide some quality of service (QoS) guarantees, resources can be reserved by using RSVP, the Resource Reservation Protocol.
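To make the role of RTP more concrete, the following sketch packs and unpacks the 12-byte fixed RTP header that precedes the speech data in each UDP datagram. It is a simplified illustration, not the API of JRTPLIB: the function names are hypothetical, and padding, header extensions and CSRC entries are left out.

```python
import struct

RTP_VERSION = 2


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    """Build the 12-byte fixed RTP header (no padding, extension or CSRCs)."""
    byte0 = RTP_VERSION << 6                              # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)    # M bit + payload type
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF,                      # ordering, loss detection
                       timestamp & 0xFFFFFFFF,            # synchronisation
                       ssrc & 0xFFFFFFFF)                 # sender identification


def unpack_rtp_header(data):
    """Parse the fixed header from the start of a received packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Payload type 0 is PCMU in the RTP audio/video profile.
header = pack_rtp_header(payload_type=0, seq=1, timestamp=160, ssrc=0x12345678)
packet = header + b"\x00" * 160   # header followed by 20 ms of dummy 8-bit samples
```

The sequence number lets the receiver detect loss and reordering, the timestamp drives playout timing, and the SSRC identifies the sender.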
For VoIP in virtual environments, speech data will have to be sent to several destinations. This can be done in an efficient way by using multicasting. When a packet arrives at the receiver, the voice signal is extracted and a 3D effect is added to it, corresponding to the position of the sender. A sound appears to be localised because of interaural differences. These differences can be captured in Head-Related Transfer Functions (HRTFs) which can then be used to recreate localised sounds.
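As a rough illustration of one interaural difference, the sketch below delays the signal at the ear that is further from the source. It is far cruder than HRTF filtering and is not the method used in the thesis: it models only the interaural time difference, using Woodworth's spherical-head approximation (valid for azimuths between -90 and +90 degrees). The head radius, the speed of sound, the 8 kHz sample rate and all function names are assumptions for illustration.

```python
import math


def interaural_time_difference(azimuth_rad, head_radius_m=0.0875, speed_of_sound=343.0):
    """Woodworth approximation of the ITD in seconds (positive: source to the right)."""
    return (head_radius_m / speed_of_sound) * (azimuth_rad + math.sin(azimuth_rad))


def spatialise(mono, azimuth_rad, sample_rate_hz=8000):
    """Return (left, right) sample lists with the far ear delayed by the ITD."""
    itd = interaural_time_difference(azimuth_rad)
    delay = round(abs(itd) * sample_rate_hz)      # ITD rounded to whole samples
    delayed = [0] * delay + list(mono)            # far ear hears the sound later
    direct = list(mono) + [0] * delay             # padded to the same length
    if azimuth_rad >= 0:
        return delayed, direct                    # source on the right
    return direct, delayed                        # source on the left


left, right = spatialise([1, 2, 3], math.pi / 2)
```

At an 8 kHz sample rate the largest delay is only about five samples, so a realistic implementation would need fractional delays and level differences as well, which is part of what HRTFs capture.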
To be able to create VoIP applications myself, I first developed an RTP library which performs quite well. I also developed a VoIP framework in which different VoIP components can easily be tested. The applications I created with this framework include an Internet Telephony application and a 3D environment. Both allow good quality communication when sufficient bandwidth is available.
You can download the thesis text in PostScript format. Two versions are available. The first version is the original of my thesis, which was made with StarOffice 5.1 and 5.2. However, since the resulting PostScript file is rather large, I have also made a LaTeX version, which resulted in a much smaller file.
Apart from the PostScript versions, I have also converted the thesis into HTML format, so you can browse it online.
- Original version:
- LaTeX version:
- Browse thesis online
As mentioned in the thesis, I also developed some software. Here's a brief overview:
- JRTPLIB, an object-oriented RTP library, written in C++. This library can be downloaded and can be used freely.
- I also developed a VoIP framework, which makes it easier to create VoIP applications. Further work later resulted in a VoIP library: JVOIPLIB.
- Based upon the VoIP framework, I created some sample applications. One is a simple Internet telephony application, the other a 3D environment where simple 3D effects were added to the voice signals. You can look at some screenshots here (taken from the thesis text).
About the original
As I mentioned before, I created the original version of the thesis text with StarOffice v5.1 and v5.2. Both versions were for the Linux platform (I used the Slackware distribution). All images in the thesis were created with 'Dia' v0.84, which I found very useful. A few images were edited with 'The Gimp' v1.0.4.