A DISCUSSION ON VoIP
Peter Lupica
Copyright 12/15/2003
Some forty years ago, the telephone companies
began to introduce digital carrier systems (DS1 or
T1) and thus established the digital carrier
hierarchy that is still prevalent today. These
systems digitized voice, or analog communications,
into a series of 1s and 0s so that, in essence, "it
looked just like data". In fact, since the
beginnings of digitized voice, there has been little
suggestion that voice bits are somehow different from
data bits. This digitization offered numerous
advantages, primarily from the standpoint of
reducing/controlling transmission impairments.
Voice does, however, have some unique
characteristics when compared to data. Chief among
them is the fact that voice communication is time
sensitive in that each packet has to arrive not only
in the proper sequence (order) but also within a
certain time frame. When using our PCs, most of us
have probably experienced 'hung' applications or
what would be a seemingly inordinate delay in
response time. One need only imagine how a similar
situation would impact a voice call. Due in part to
this, voice communication has remained a circuit
switched application. Conventional TDM (Time
Division Multiplexing) was certainly a reliable
means to support voice communication; however, a
'disadvantage' of TDM is that the circuits, or paths,
sit idle until they are required to service another
connection. In the case of DS1 (T1-Carrier), there
are 24 time slots or paths, each of which can
support one connection for a voice call. With
fluctuating traffic, it is not uncommon for the
facility to be, on average, largely idle.
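The arithmetic behind the 24 time slots can be sketched as follows. This is a minimal illustration using the standard DS1 framing figures (8,000 frames per second, 8 bits per channel per frame, one framing bit per frame); the count of busy channels in the usage line is a made-up example, not a measurement.

```python
# Standard DS1/T1 framing figures. The busy-channel count used in the
# example at the bottom is hypothetical.
SAMPLES_PER_SECOND = 8000    # each voice channel is sampled 8,000 times a second
BITS_PER_SAMPLE = 8          # one 8-bit sample per channel per frame
CHANNELS = 24                # time slots in a DS1
FRAMING_BITS_PER_FRAME = 1   # one framing bit accompanies each 24-channel frame


def ds0_rate_bps() -> int:
    """Bit rate of a single time slot (one voice path)."""
    return SAMPLES_PER_SECOND * BITS_PER_SAMPLE


def ds1_rate_bps() -> int:
    """Bit rate of the whole DS1, including framing overhead."""
    bits_per_frame = CHANNELS * BITS_PER_SAMPLE + FRAMING_BITS_PER_FRAME
    return bits_per_frame * SAMPLES_PER_SECOND


def idle_fraction(busy_channels: int) -> float:
    """Share of the 24 time slots carrying no call at a given moment."""
    return (CHANNELS - busy_channels) / CHANNELS


if __name__ == "__main__":
    print(ds0_rate_bps())    # 64000
    print(ds1_rate_bps())    # 1544000
    print(idle_fraction(6))  # 0.75
```

With only 6 of the 24 slots in use, three quarters of the facility carries nothing, yet the full 1.544 Mbps remains reserved.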
With regard to data communications, the latter
situation is somewhat similar in that contemporary
networks are generally underutilized with the
average occupancy being 10% - 15% or less. However,
their architecture is fundamentally different in
that the entire facility is used for communication,
analogous to a 'party line', with the protocol
keeping track of which messages are intended for
which users.
Since most organizations currently have a data
communications network as well as a voice
communication network, it would appear that savings,
in some cases on a rather significant scale, could
be realized from having only one network. This would
be the case not only from the MAN/WAN perspective
but also from the perspective of the LAN for local
distribution or cabling. This factor, along with the
promise of futuristic applications, has largely
fueled the movement toward a "converged" network. In
many circles, IP communications is believed to be
the next generation of networking technology that
will be used to combine voice, data, video,
wireless, and multimedia applications into a single
integrated enterprise infrastructure. It holds the
promise not only of much more efficient use of
bandwidth, by having voice and data share the same
connections/networks, but also of handling all types
of traffic and delivering more services than were
available with separate voice and data networks.
VoIP, as it is generally referred to, stands for
"Voice over Internet Protocol". Some also refer to
it simply as 'voice over' or just IP. There is also
some interest devoted to accommodating voice over a
Frame Relay network. That is referred to as VoFR.
Regardless of what it is called, it means taking the
digitized voice signals, assembling them into a
protocol data unit, and transmitting them over a
shared facility. The front runner for the protocol
of choice is the Internet Protocol. What makes IP a
good choice, besides the fact that it has become the
universal standard for enterprise networks, is that
it operates over a wide array of physical networks,
from Ethernet LANs to MANs to WANs, almost
regardless of the underlying transport mechanism. IP
is able to do this because the application is unable
to "see" the physical network details, and the
protocol provides a consistent user interface.
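To make "assembling digitized voice into a protocol data unit" concrete, the following sketch estimates the IP-layer bandwidth of one direction of a call. It assumes the voice samples ride in RTP over UDP over IPv4 with minimum-size headers, and it ignores link-layer overhead; RTP, the codec rates, and the 20 ms packet interval are assumptions for illustration and are not taken from this paper.

```python
# Assumed header sizes: IPv4 without options, UDP, and the fixed RTP header.
IP_HEADER_BYTES = 20
UDP_HEADER_BYTES = 8
RTP_HEADER_BYTES = 12


def voip_call_bandwidth_bps(codec_bps: int, packet_interval_ms: int) -> float:
    """IP-layer bit rate for one direction of a call.

    codec_bps          -- bit rate of the digitized voice (e.g. 64000)
    packet_interval_ms -- how much speech is placed in each packet
    """
    packets_per_second = 1000 / packet_interval_ms
    payload_bytes = codec_bps / 8 * packet_interval_ms / 1000
    packet_bytes = (payload_bytes + IP_HEADER_BYTES
                    + UDP_HEADER_BYTES + RTP_HEADER_BYTES)
    return packet_bytes * 8 * packets_per_second


if __name__ == "__main__":
    # 64 kbps voice, 20 ms per packet: 160 payload bytes + 40 header bytes.
    print(voip_call_bandwidth_bps(64000, 20))  # 80000.0
    # 8 kbps compressed voice, 20 ms per packet: headers dominate.
    print(voip_call_bandwidth_bps(8000, 20))   # 24000.0
```

The second case shows why header overhead matters: an 8 kbps voice stream still needs 24 kbps at the IP layer under these assumptions.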
There appears to be little doubt that this
technology will eventually become commonplace. The
reason is that it has widespread vendor support.
Most, if not all, of the major data hardware vendors
(notably Cisco Systems) are solidly in this camp.
PBX vendors are beginning to alter the fundamental
switching fabric of their systems - away from TDM
and towards packetized voice using IP. The major
communications carriers, such as Sprint, AT&T, MCI,
SBC, and Verizon, are beginning to deploy IP-based
networks to carry voice communications. Their
impetus is derived from the potential cost savings
that could be realized by more fully utilizing their
network infrastructures and by offering
network-based applications which could not be easily
implemented just a few years ago.
The evolution of VoIP seems to be inevitable. It
appears that this is all you will be able to buy in
a matter of a couple of years. However, deploying,
or attempting to deploy, a VoIP network is not
without its pitfalls. Voice is more than just an IP
network application. It is a fundamental business
and consumer service that has for a century been
delivered on a daily basis with predictable quality.
When VoIP technology is deployed for voice services,
users will both expect and need service quality that
matches that of the Public Switched Telephone
Network (PSTN). Voice, being a real-time
application, requires special QoS (Quality of
Service) considerations that are not needed by data.
Being time-sensitive, voice has a low tolerance for
delay, and an even lower tolerance for delay
variance or jitter. In addition, voice applications
generally have a low tolerance for packet loss.
Since voice most often utilizes UDP (User Datagram
Protocol), there is no real 'end-to-end' connection,
as this would essentially defeat its purpose. A
lost packet therefore means lost data; there are
no retransmissions.
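As a rough illustration of how jitter and loss can be quantified from packet records, the sketch below uses the running interarrival jitter estimate defined for RTP in RFC 3550 and a simple loss ratio based on sequence numbers. The choice of the RFC 3550 estimator is an assumption made here for illustration; the timestamps in the example are invented.

```python
def interarrival_jitter(send_times_ms, recv_times_ms):
    """Running jitter estimate in the style of RFC 3550.

    For each consecutive pair of packets, D is the change in transit time;
    the estimate moves 1/16 of the way toward |D| with every packet.
    """
    jitter = 0.0
    for i in range(1, len(send_times_ms)):
        d = ((recv_times_ms[i] - recv_times_ms[i - 1])
             - (send_times_ms[i] - send_times_ms[i - 1]))
        jitter += (abs(d) - jitter) / 16
    return jitter


def loss_fraction(received_seq_numbers, expected_count):
    """Fraction of expected packets that never arrived (duplicates ignored)."""
    received = len(set(received_seq_numbers))
    return (expected_count - received) / expected_count


if __name__ == "__main__":
    # Packets sent every 20 ms; the third one arrives 16 ms late.
    print(interarrival_jitter([0, 20, 40], [50, 70, 106]))  # 1.0
    # Five packets expected, sequence number 3 missing.
    print(loss_fraction([1, 2, 4, 5], 5))                   # 0.2
```

Because UDP does not retransmit, the missing packet in the second example is simply gone; the receiver can only conceal the gap.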
While it is well-known that the IP network
performance parameters that impact voice are packet
loss, delay, and jitter, the type and degree of
impact that these parameters have on
voice quality is less well known. This is
because there are many other VoIP processes that
impact voice, and these various processes, together
with IP network performance, influence each other in
complex ways to affect overall voice service
quality.
A good IP Communications system is
standards-based. However, as the saying goes -
the thing about standards is that there are so many
from which to choose. A standards-based system,
while not absolutely guaranteeing interoperability,
goes a long way toward ensuring interoperability
among and between the products and services of
different vendors. The difficulty at this point is
that these standards are still emerging. This fact
does not lend itself to implementing a system that
is easily installed and maintained, especially when
considering that the system should also allow a
network to be upgraded or migrated in stages while
still being able to interoperate with existing
'legacy' systems whether they be voice or data.
It appears that the current standards efforts
that are receiving the most attention are addressing
issues such as call set-up and disconnect procedures
(e.g., H.323 vs. SIP) and schemes aimed at
'enhancing' packet delivery mechanisms (e.g., MPLS
vs. DiffServ). While these individual areas will in
fact have an impact on VoIP, in our opinion the
critical area of concern has to do with the
somewhat elusive concept of "voice
quality" or "call quality".
Thus, it warrants further discussion.
The two key parameters of voice service quality
most affected by IP network performance and VoIP
processing are voice clarity (also known as speech
quality) and voice delay. Voice clarity
depends on many factors in addition to packet loss
and jitter, and the various factors influence one
another. It is vital that the specific impact of
these parameters be known before judgments are made.
For example, a certain degree of packet loss can
have varying effects on clarity, so it may prove
unwise to invest in QoS technology to overcome a
perceived packet loss problem if packet loss does
not appear to affect voice quality.
Voice delay includes more than just IP
packet transmission delay. Delay can be introduced
from a number of sources including VoIP gateway
processes such as codecs and jitter buffers. High
packet jitter can add to delay by increasing a
gateway's jitter buffer size requirements. Actual
packet delay will simply add to this. Thus, it is
vital to know what the end user delay experience
will be, and this can only be accomplished with
active voice delay measurements. Knowing how an IP
network will perform in terms of these important end
user service parameters, and in terms of the
underlying factors of packet loss and jitter, is
very valuable prior to making critical decisions and
investments regarding a VoIP deployment. This is the
primary purpose of a pre-VoIP network assessment
(discussed below).
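The way the individual delay sources add up can be sketched as a simple budget. The component values below are hypothetical placeholders, and the 150 ms figure is the commonly cited one-way guideline from ITU-T G.114, which this paper does not itself quote; treat both as assumptions.

```python
# Commonly cited one-way delay guideline (ITU-T G.114); an assumption here.
ONE_WAY_TARGET_MS = 150


def one_way_delay_ms(components):
    """Total mouth-to-ear delay as the sum of its contributing parts."""
    return sum(components.values())


def within_target(components, target_ms=ONE_WAY_TARGET_MS):
    """True when the summed delay does not exceed the target."""
    return one_way_delay_ms(components) <= target_ms


if __name__ == "__main__":
    # Hypothetical figures, in milliseconds, for a single direction.
    budget = {
        "codec_processing": 5,
        "packetization": 20,
        "jitter_buffer": 40,
        "network_transit": 60,
    }
    print(one_way_delay_ms(budget))  # 125
    print(within_target(budget))     # True
```

Note that in this example the gateway-side components (65 ms) exceed the network transit time, which is why measuring IP packet delay alone understates what the user experiences.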
Data network performance is usually reported
using several metrics since there are many factors
to consider. However, in the "voice" telephony world,
call quality measurement has traditionally been
subjective and is accomplished by listening to the
quality of a voice call. The leading subjective,
single-metric measurement of voice quality is the
MOS (mean opinion score). This is derived by having
a group of people listen to the call and give their
opinion of the call quality on a scale from 1 to 5
with 5 being best.
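As a minimal sketch of the calculation just described, a MOS is simply the arithmetic mean of the listeners' ratings. The ratings in the example are invented.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average a panel's ratings, each on the 1 (worst) to 5 (best) scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be between 1 and 5")
    return mean(ratings)


if __name__ == "__main__":
    # Five hypothetical listeners rating the same call.
    print(mean_opinion_score([4, 4, 3, 5, 4]))  # 4
```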
Since VoIP is a data network
application, the MOS method leaves much
to be desired in measuring call quality, which is at
the very heart of the matter and perhaps the most
important criterion to consider. Progress has been
made in establishing objective measurements of call
quality. Again, various standards have been
developed and espoused:
- PSQM (ITU P.861) / PSQM+: Perceptual Speech
Quality Measure
- MNB (ITU P.861): Measuring Normalizing Blocks
- PESQ (ITU P.862): Perceptual Evaluation of
Speech Quality
- PAMS (British Telecom): Perceptual Analysis
Measurement System
- The E-model (ITU G.107)
PSQM, PSQM+, MNB, and PESQ are part of a
succession of algorithm modifications starting in
ITU standard P.861. British Telecom developed PAMS,
which is similar to PSQM. The PSQM and PAMS
measurements send a reference signal through the
network and then compare the reference signal with
the signal that's received on the other end of the
network via digital signal processing algorithms.
These measurements are frequently found in test labs
and are used primarily for analyzing the clarity of
individual devices such as a telephone handset.
Vendors that implement these algorithms then map
their scores to MOS. However, these approaches are
not really well suited to assessing call quality on
a data network. The models used are not based on
data network issues, so they do not lend themselves
to mapping back to the network issues of delay,
jitter, and datagram loss. Also, they aren't suited
to the two-way simultaneous flows of a real phone
conversation, and they don't scale to allow
evaluation of the quality of hundreds or thousands
of simultaneous calls. On the other hand, the
"E-model" (ITU G.107) is a complex formula that
calculates a single score called an "R factor". Once
an R factor is obtained, it can be used to calculate
an estimated MOS. R factor values range from 100
(excellent) down to 0 (poor) whereas a MOS can range
from 5 down to 1.
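The conversion from an R factor to an estimated MOS can be sketched with the mapping published in ITU-T G.107. Computing R itself requires the full E-model and is not attempted here; the final clamp to a minimum of 1.0 is a small guard added in this sketch, since the polynomial dips slightly below 1 for very small R.

```python
def r_factor_to_mos(r: float) -> float:
    """Estimated MOS for a given E-model R factor (ITU-T G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    mos = 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6
    # Guard added in this sketch: never report less than the scale minimum.
    return max(1.0, mos)


if __name__ == "__main__":
    for r in (50, 70, 80, 93.2):
        print(r, round(r_factor_to_mos(r), 2))
```

Note that the estimate tops out at 4.5 rather than 5, so an R factor of 100 does not correspond to a perfect subjective score.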
While it is beyond the scope of this paper to
present a detailed discussion of all of the
attendant protocols and issues, or to offer any
sort of endorsement, it is our considered opinion
that the critical issue of "voice quality" must be
thoroughly addressed, as it can either "make" or
"break" a VoIP implementation.
The essential starting point, if one is to
seriously consider a VoIP implementation, is
assessing the IP network for expected VoIP
performance, prior to VoIP deployment. This is
required in order to determine what needs to be done
to the IP network, and what VoIP systems and
architectures will be needed in order to take
advantage of the particular IP network that is in
place. This will enable an organization to put in
place the appropriate IP network architectures,
configurations, and possibly QoS mechanisms, to
guarantee voice service performance. It will also
enable the organization to select the optimal VoIP
systems and architectures needed.
In order to develop this guidance it is essential
that a complete and comprehensive assessment be
conducted to ensure that nothing is
overlooked. It goes without saying that the embedded
infrastructure must be thoroughly documented and
reviewed - routers, bridges, switches, distribution,
gateways, firewalls, servers, etc. Beyond that,
however, it is not enough to simply measure IP
packet loss, delay, and jitter. Knowing these
performance parameters, while establishing important
reference points, will not provide an adequate
indication of how well a voice service will perform.
As indicated above, one must also know how these
parameters affect voice clarity and delay.
An adequate pre-VoIP network assessment must
benchmark a network's performance in terms of voice
clarity and delay, as well as packet loss and
jitter. Also, actual end user voice delay should be
measured, rather than just IP packet delay. This
provides a complete and comprehensive assessment of
VoIP network performance, ensuring that critical end
user parameters are known prior to designing and
deploying VoIP services.
This part of a VoIP readiness assessment is
usually done in steps, starting with a simple test
and getting more advanced.
- One call - determine the voice quality of a
single call, in two directions
- Many calls - determine the voice quality of
each call, during peak call volume
- Many calls on a busy network - determine the
voice quality of each call, during peak call
volume with heavy background traffic
It is important to understand the results at each
step before continuing. For example, if the voice
quality is 'low' on successive single VoIP calls,
then a determination must be made as to the degree
to which the underlying network attributes are
contributing to the situation. Only after completing
the third step, with documented support that voice
quality will be acceptable, would an organization be
ready to proceed with VoIP deployment.
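The stepwise procedure above can be sketched as a simple gate that reports the first step whose results need to be understood before continuing. The stage names follow the list above; the acceptance threshold and the scores in the example are hypothetical and would be chosen by the organization doing the assessment.

```python
# Hypothetical acceptance threshold on the 1-to-5 MOS scale.
MIN_ACCEPTABLE_MOS = 3.6

# Test stages in the order they should be run.
STAGES = ["one call", "many calls", "many calls on a busy network"]


def first_failing_stage(results, threshold=MIN_ACCEPTABLE_MOS):
    """Return the earliest stage with any call scoring below the threshold.

    results maps a stage name to the list of per-call quality scores.
    A stage with no results yet counts as failing. Returns None only
    when every stage has results and every call meets the threshold.
    """
    for stage in STAGES:
        scores = results.get(stage, [])
        if not scores or min(scores) < threshold:
            return stage
    return None


if __name__ == "__main__":
    results = {
        "one call": [4.2, 4.1],      # both directions of a single call
        "many calls": [4.0, 3.2],    # one call falls below the threshold
    }
    print(first_failing_stage(results))  # many calls
```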
Most basic pre-VoIP assessments can be performed
using some rather straightforward techniques. While
the following is not intended as an exhaustive
treatment of an extremely important and somewhat
complex area, it will serve as a reference for the
reader when considering the extent of a thorough
network assessment.
- Measure voice clarity and delay between each
site at which VoIP will be deployed. If trended
measurement results fall within the thresholds
of acceptability, refer to packet loss and
jitter measurement results for acceptable
baseline values. These baseline values should be
maintained for acceptable service quality when
VoIP services have been deployed. However, when
actual VoIP services have been deployed, clarity
and delay testing should be repeated to certify
the deployment. If measurement results indicate
potential quality problems, refer to packet loss
and jitter measurement results for indications
of possible causes.
- Measure VoIP Packet Loss and Jitter for more
precise determination of causes of poor voice
clarity, or to baseline the performance
parameters of the IP network under conditions of
acceptable service quality.
- Measure Voice Delay. Round-trip voice delay
measurements are valuable because they more
accurately characterize a user's experience with
regard to delay. A telephone user perceives
round-trip voice delay, not one-way voice delay.
That is, the delay for a speaker's voice to
reach a listener's ear is perceptible to neither
speaker nor listener. However, the delay between
a speaker saying something, and then hearing the
other person's response, is perceptible.
- Baseline Performance with Trending. In order
to baseline the network's performance over time,
and to determine any variance in performance due
to network usage, perform clarity and delay
testing repeatedly over a period of time.
- Perform assessments with VoIP Equipment. One
may need to assess a network using a particular
VoIP gateway. Generate calls on analog FXO,
analog E&M, T1, E1, and ISDN PRI telephony
interfaces. The same techniques described
previously for testing clarity and delay can be
used.
- Assess Performance Against Background
Traffic. It is valuable to assess the
performance of a VoIP network against a
background of actual traffic. In a converged
network, this would include both voice and data
traffic.
- Test for Echo. Echo is usually the result of
an impedance mismatch on analog two-wire to
four-wire hybrid junctions. Echo impacts
conversational quality with the degree of impact
being proportional to the echo signal's level
and delay. The greater the echo signal level (or
lack of echo return loss), and the greater the
echo signal delay, the greater the impact on
conversational quality. A call originating on a
VoIP network may terminate to a PSTN two-wire
analog line. The two-wire to four-wire
conversion will generate an echo.
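One way to approach the baselining and trending item in the list above is to group repeated round-trip delay measurements by hour of day and flag the hours that drift from an agreed baseline. This is a sketch only: the sample data, the baseline, and the tolerance are all hypothetical values.

```python
from collections import defaultdict
from statistics import mean


def hourly_means(samples):
    """Average delay per hour of day.

    samples is an iterable of (hour_of_day, round_trip_delay_ms) pairs.
    """
    by_hour = defaultdict(list)
    for hour, delay_ms in samples:
        by_hour[hour].append(delay_ms)
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}


def hours_over_baseline(samples, baseline_ms, tolerance_ms):
    """Hours whose average delay exceeds the baseline by more than tolerance."""
    return [hour for hour, avg in hourly_means(samples).items()
            if avg > baseline_ms + tolerance_ms]


if __name__ == "__main__":
    # Hypothetical measurements: quiet at 09:00, congested at 14:00.
    samples = [(9, 100), (9, 120), (14, 200), (14, 220)]
    print(hourly_means(samples))
    print(hours_over_baseline(samples, baseline_ms=120, tolerance_ms=30))  # [14]
```

The same grouping could be applied to clarity scores, packet loss, or jitter to see whether poor results coincide with periods of heavy network usage.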
So, yes, VoIP is the "real deal", but the pathway
to successful implementation is tedious and
detailed. Research has shown that it is very likely
that your data network will not
deliver the call quality you would like. A recent
estimate predicted that 85% of today's router-based
data networks are not ready for toll-quality VoIP
calls.
Those of you paying attention will notice that I
have not mentioned SECURITY. Security will be the
topic of another paper.