Squeezing Bandwidth Out of Voice
By Woody Benson
To maximize network performance, corporations have been
employing effective data compression schemes for years. Yet, when it comes to voice
compression, the benefits often are underestimated.
In our never-ending quest for better network performance at a lower overall cost, the
issue is not only how much bandwidth we can get, but also how effectively we use the
bandwidth we have available. The deployment of broadband networks will give us
virtually unlimited bandwidth access, and the convergence of voice and data networks will
create additional efficiencies, allowing us to use our network resources more flexibly.
While we all have grown to accept the benefits of unlimited bandwidth and the
transmission efficiencies created by sending voice and data traffic over the same network,
voice compression and its effect on bandwidth utilization within this new network
environment have been overlooked. For example, to maximize network performance,
corporations have been employing effective data compression schemes for years. Yet, when
it comes to voice compression, the benefits often are underestimated and the
"rules" have yet to be written.
Voice, as defined by real-time human conversation, is an interactive, on-demand audio
application. Unlike data, which can be transmitted intermittently and reconstructed at the
termination point with less concern for time, voice traffic must be transmitted in real
time and with more care to minimize packet loss and ensure a natural voice experience.
There are many choices for digital voice compression, due to the wide range of
applications and technologies available. Whether you are deploying Internet telephony over
a local area network (LAN) or wide area network (WAN), using a wireless phone or extending
your corporate private branch exchange (PBX) to branch offices or telecommuters over
private data networks or broadband digital subscriber line (DSL) networks, the compression
requirements based on different applications will vary significantly. One thing everyone
seems to agree on is that voice quality cannot be sacrificed in the quest for bandwidth.
Digital Voice Technologies
The true enablers of digital telephony have been the voice coders (or vocoders) and
digital signal processing (DSP) technologies, which have evolved substantially over the
past 10 years. Depending on the application, vocoders now can operate at rates as low as 2
kilobits per second (kbps), providing greater bandwidth efficiency and acceptable quality.
Based on the application and user requirements, vocoders offer trade-offs in terms of bit
rates (which are determined by the degree of compression), complexity (the processing
requirements, measured in millions of instructions per second [MIPS]) and voice quality
(the user's experience).
Voice quality is the most difficult aspect of voice transmission to measure. While we
can evaluate some features of voice transmission through mean opinion scores (MOS) and
rate the quality of different speech codecs, audio quality ultimately remains subjective.
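As a rough illustration of how a mean opinion score is formed, the sketch below averages listener ratings on the conventional five-point scale (1 = bad, 5 = excellent). The function name and the sample ratings are hypothetical and are not drawn from any real listening test.

```python
from statistics import mean
from typing import Sequence


def mean_opinion_score(ratings: Sequence[int]) -> float:
    """Average a set of listener ratings given on a 1-to-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return mean(ratings)


# Hypothetical ratings from four listeners for one codec.
sample_ratings = [4, 4, 5, 3]
sample_mos = mean_opinion_score(sample_ratings)
print(f"MOS: {sample_mos:.2f}")
```

Because the score is an average of opinions, two groups of listeners can produce different scores for the same codec, which is one reason the number should be read as a guide rather than an absolute.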
When selecting a voice compression scheme, vendors and users must consider which
attributes are most important in the application, and which features may be sacrificed.
The application and the network environment will dictate the trade-offs available between
bandwidth, memory/MIPS and, most importantly, the voice quality required. When considering
voice quality, users and vendors must take into account the speaker, the environment,
network quality and fidelity required.
The two major compression technologies are waveform coders and model-based coders,
each with its own strengths. Waveform coders typically minimize distortion and provide a
more natural speech experience, but they compress less, usually operating at bit rates
above 16kbps. Model-based speech vocoders use a parametric model to approximate small
segments of speech and, as a result, can achieve much greater compression, typically
operating at data rates below 8kbps. Both have advantages for given applications, and
each continues to improve. One aspect they share is that, as voice coders get more
sophisticated, more processing power will be needed.
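To make the waveform-coder idea concrete, here is a minimal sketch of mu-law companding, the logarithmic curve behind one common form of 64kbps PCM. It uses the continuous textbook formula rather than the bit-exact segmented encoding of the G.711 standard, and it assumes the usual telephony sampling rate of 8,000 samples per second; the function names are illustrative only.

```python
import math

MU = 255  # companding constant of the mu-law curve

SAMPLE_RATE_HZ = 8000   # assumed telephony sampling rate
BITS_PER_SAMPLE = 8


def mu_law_compress(x: float) -> float:
    """Map a sample in [-1, 1] onto the logarithmic mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Invert mu_law_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


def encode_8bit(x: float) -> int:
    """Compand a sample and quantize it to one of 256 levels (0..255)."""
    clipped = max(-1.0, min(1.0, x))
    return round((mu_law_compress(clipped) + 1) / 2 * 255)


def decode_8bit(code: int) -> float:
    """Recover an approximate sample from its 8-bit code."""
    return mu_law_expand(code / 255 * 2 - 1)


bit_rate_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 1000
print(f"bit rate: {bit_rate_kbps} kbps")
for sample in (0.01, 0.5):
    restored = decode_8bit(encode_8bit(sample))
    print(f"{sample} -> {restored:.4f}")
```

The coder works on each sample of the waveform directly and spends more of its 256 levels on quiet sounds. A model-based vocoder instead transmits parameters describing short segments of speech, which is why it reaches much lower bit rates at the cost of more processing.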
DSP technology is rapidly advancing, with industry leaders such as Dallas-based Texas
Instruments Inc.; Analog Devices Inc., Norwood, Mass.; Motorola Inc., Schaumburg, Ill.;
Lucent Technologies Inc., Murray Hill, N.J.; and others providing multichannel capacity,
faster and more efficient processors and lower power requirements–all on less-expensive
chips. These chips now are powerful enough to drive some of the features that will improve
the user experience, such as echo cancelers, jitter buffering, filtering, equalization and
automatic gain control, which are necessary to duplicate the circuit-voice experience.
There is a range of compression standards, including pulse-code modulation (PCM) at
64kbps, adaptive differential PCM (ADPCM) at 32kbps, G.729 at 8kbps and G.723.1 at 5.3kbps
or 6.3kbps. These can compress speech down to as little as one-fortieth its original size,
or provide no compression at all, as in 64kbps G.711 (packetization only). Applications
such as videoconferencing or PC voice applications require the highest degrees of
compression, and the International Telecommunication Union (ITU) has sanctioned the G.723.1
speech vocoder as the videoconferencing standard for today’s network environment. As
broadband networks proliferate, less-compressed schemes may be sanctioned.
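The arithmetic behind these rates can be sketched in a few lines. The example below computes each scheme's compression ratio relative to 64kbps PCM and estimates how many simultaneous calls fit on a link. It counts voice payload only and ignores packet header overhead, and the 256kbps link size is a hypothetical figure chosen for illustration.

```python
PCM_KBPS = 64.0

# Bit rates as quoted in the text.
CODEC_RATES_KBPS = {
    "PCM (G.711)": 64.0,
    "ADPCM": 32.0,
    "G.729": 8.0,
    "G.723.1 (6.3kbps)": 6.3,
    "G.723.1 (5.3kbps)": 5.3,
}


def compression_ratio(rate_kbps: float, baseline_kbps: float = PCM_KBPS) -> float:
    """Return how many times smaller the stream is than the baseline."""
    return baseline_kbps / rate_kbps


def calls_per_link(link_kbps: float, rate_kbps: float) -> int:
    """Whole calls that fit on a link, counting voice payload only."""
    return int(link_kbps // rate_kbps)


HYPOTHETICAL_LINK_KBPS = 256  # assumption, not a figure from the article

for name, rate in CODEC_RATES_KBPS.items():
    ratio = compression_ratio(rate)
    calls = calls_per_link(HYPOTHETICAL_LINK_KBPS, rate)
    print(f"{name:20s} {rate:5.1f} kbps  {ratio:5.1f}:1  {calls:3d} calls")
```

In a real packet network each voice packet also carries headers, so the usable call count would be lower than this payload-only estimate.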
Which Is Best?
So how do vendors and users decide which voice compression scheme is best for a given
application? The goal should be to match the voice performance of the packet network to
what users are accustomed to in today’s circuit-switched network. And, for business users,
that includes the features and functionality of the PBX. Quality is paramount, and a
digital voice call must deliver the same voice quality, without latency and without echo,
that we experience today with a circuit-switched voice call. Additionally, features are
important. It’s not just that we can place a call over a packet network; it’s also how
seamlessly we extend today’s voice features over that network. So even though we can
compress voice packets down to 2.8kbps, we may introduce so many artifacts, or strip out
so many features, that the trade-offs are not acceptable.
One approach taken by some vendors is to offer the user a choice, providing a range of
compression options in products. Although there are standard voice metrics, voice quality
is subjective. What is acceptable to one person may not be acceptable to another.
Furthermore, the environment and the type of phone usage have an impact on the perceived
quality. A customer service representative (CSR) who is on the phone all day in a noisy
call center might require uncompressed 64kbps voice, or certainly minimally compressed
voice, such as ADPCM 32kbps or 24kbps. An accountant or engineer, whose job is much less
phone-centric and who works in a quiet environment, may find that highly compressed
voice–8kbps G.729A, for example–is perfectly acceptable. Users who access many dual-tone
multifrequency (DTMF) or digitized applications may find that the highly compressed G.729A
standard is not an acceptable scheme, since it was designed for live voice and does not
handle predigitized recordings as well.
Therefore, the ability to let the user decide, even down to a port-by-port or
call-by-call basis, is a very valuable feature.
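One way to picture port-by-port and call-by-call choice is as a small lookup with fallbacks: a per-call request wins, then the port's setting, then a system default. The sketch below is hypothetical; the class name, the codec labels and the port numbers are assumptions for illustration and do not describe any particular product.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CodecPolicy:
    """Resolve which compression scheme a call should use."""

    default_codec: str = "G.729A-8"
    per_port: Dict[int, str] = field(default_factory=dict)

    def codec_for(self, port: int, call_override: Optional[str] = None) -> str:
        # A per-call request takes priority over the port's setting.
        if call_override is not None:
            return call_override
        # Otherwise use the port's setting, or fall back to the default.
        return self.per_port.get(port, self.default_codec)


# Hypothetical setup: port 1 is a call-center seat, port 2 a frequent caller.
policy = CodecPolicy(per_port={1: "PCM-64", 2: "ADPCM-32"})

print(policy.codec_for(1))               # port setting
print(policy.codec_for(7))               # system default
print(policy.codec_for(7, "ADPCM-32"))   # per-call override
```

Keeping the override order explicit lets an administrator set sensible defaults while still leaving the final choice to the user.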
Additionally, products could monitor the network and reallocate bandwidth to compensate
for variations in network speed.
With one pipe for voice and data, an interesting approach is dynamically selecting the
compression scheme to meet bandwidth requirements or properly utilize available bandwidth.
Take the case of a fractional T1 connection from a corporate location to a branch office,
providing both voice and data connectivity to the corporate LAN and PBX systems. When no
one is on the phone, all bandwidth is allocated to data. When the first user makes a call,
there most likely is sufficient bandwidth to provide minimal voice compression, say 32kbps
ADPCM. When the second or third user picks up the phone, he or she may be given 8kbps
G.729A compression, since there is less bandwidth available. By offering the option of
dynamic compression, each user will get the optimal experience at any given time, making
the best use of available bandwidth.
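The fractional T1 scenario can be sketched as a simple admission rule: each new call gets the least-compressed scheme that still fits in the bandwidth reserved for voice, and whatever voice is not using goes to data. The 256kbps link size, the 48kbps voice budget and the class name are assumptions made for this example; a real product might also renegotiate calls already in progress, which this sketch does not attempt.

```python
from typing import List, Optional, Tuple

# Schemes in order of preference, least compressed first (rates from the text).
CODECS_BY_PREFERENCE: List[Tuple[str, int]] = [("ADPCM", 32), ("G.729A", 8)]


class SharedLink:
    """A single pipe carrying both voice and data."""

    def __init__(self, link_kbps: int, voice_budget_kbps: int) -> None:
        self.link_kbps = link_kbps
        self.voice_budget_kbps = voice_budget_kbps
        self.active_calls: List[Tuple[str, int]] = []

    def voice_kbps(self) -> int:
        """Bandwidth currently consumed by voice calls."""
        return sum(rate for _, rate in self.active_calls)

    def data_kbps(self) -> int:
        """Everything not in use by voice is available to data."""
        return self.link_kbps - self.voice_kbps()

    def place_call(self) -> Optional[str]:
        """Admit a call with the best scheme that fits, or return None."""
        remaining = self.voice_budget_kbps - self.voice_kbps()
        for name, rate in CODECS_BY_PREFERENCE:
            if rate <= remaining:
                self.active_calls.append((name, rate))
                return name
        return None


link = SharedLink(link_kbps=256, voice_budget_kbps=48)
idle_data_kbps = link.data_kbps()          # no calls yet: all bandwidth to data
codecs = [link.place_call() for _ in range(4)]
print(idle_data_kbps, codecs, link.data_kbps())
```

With these assumed numbers the first caller gets 32kbps ADPCM, the second and third get 8kbps G.729A, and a fourth call is refused because the voice budget is exhausted.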
As broadband network availability becomes widespread through the implementation of DSL,
asynchronous transfer mode (ATM) and cable service, there will be an increase in digital
voice applications. And, while bandwidth will be more readily available and more
cost-effective, there always will be a drive to maximize the service available through
compression. For digital voice applications to move into the mainstream, we will need to
see quality of service (QoS) issues addressed, along with high-quality compression
capabilities. That’s why the earliest successful providers of integrated voice and data
services will be partnering closely with vendors that develop customized applications in
managed networks. As network realities are addressed through packet prioritization,
framing and other QoS solutions, there is a huge opportunity for service providers to roll
out value-added voice applications to hundreds of millions of business users.
Compression will play a major role in this explosion, and there is no right or wrong,
better or best approach. At least for today, it should be a matter of application and user
choice. Service providers should keep their eyes (and ears) focused on the opportunities
and technologies ahead.
Woody Benson is president and CEO of MCK Communications Inc., Newton, Mass., a
company that designs, manufactures and markets hardware and software products that extend,
monitor and terminate voice traffic associated with corporate PBX systems. He can be
reached at +1 617 454 6100.