What’s new in Wideband Audio?
Wideband Audio
  VoIP is indeed a disruptive technology, but has it changed the life of the average consumer?
    Cost? Quality? Features?
  Wideband audio codecs and improved handling of music could soon change this dynamic.
  Let’s discuss:
    Technology behind the codecs
    Real-world implementations
Telecom Audio Spectrum
  Human voice: 80 Hz to 14,000 Hz
  Narrowband: 8 kHz sampling (300-3400 Hz bandwidth)
    Used in PSTN, mostly intelligible
  Wideband: 16 kHz sampling (50-7000 Hz bandwidth)
    Used in VoIP
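The sampling rates and passbands above are tied together by the Nyquist limit: a signal sampled at a given rate can only carry frequencies below half that rate. A minimal sketch follows; the function and dictionary names are illustrative and not taken from any standard.

```python
# Nyquist limit: sampling at fs can represent frequencies below fs / 2.
def nyquist_hz(sampling_rate_hz: float) -> float:
    return sampling_rate_hz / 2.0

# Sampling rates and passbands as quoted on the slide (all values in Hz).
BANDS = {
    "narrowband": {"fs": 8000, "low": 300, "high": 3400},
    "wideband": {"fs": 16000, "low": 50, "high": 7000},
}

def passband_width_hz(name: str) -> int:
    """Width of the audio passband for the named band."""
    band = BANDS[name]
    return band["high"] - band["low"]

def fits_below_nyquist(name: str) -> bool:
    """True if the upper passband edge lies below the Nyquist limit."""
    band = BANDS[name]
    return band["high"] < nyquist_hz(band["fs"])

if __name__ == "__main__":
    for name, band in BANDS.items():
        print(name, "Nyquist limit:", nyquist_hz(band["fs"]), "Hz,",
              "passband width:", passband_width_hz(name), "Hz")
```

With the slide's figures, wideband carries a passband a little more than twice as wide as narrowband (6,950 Hz against 3,100 Hz).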
Wideband Audio?
  Captures significantly more speech information
  Significant improvement in speech quality over traditional PSTN
    Improved naturalness and presence below 200 Hz
    Increased intelligibility above 3,400 Hz
  Improves user experience and satisfaction
    New applications: voice recognition
    Customer retention
    Fewer misunderstandings
Wideband Enablers
  Telecom was about minimizing transport cost
    Now about differentiation and enhancing the user experience
  Access bandwidth was limited
    Broadband access now a reality: high bandwidth (1-10 Mbit/s) delivered at low cost
    Cost of WB is similar to NB @ 64 kbit/s
  Endpoints and network were not wideband capable
    Now: VoIP, Wideband DECT, Skype, Microsoft OCS
    Wireless deployments: wideband, music codecs
    Private / corporate networks, Tandem Free Operation (TFO), Wideband extension, Wideband SLICs
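The "similar cost at 64 kbit/s" point can be checked with simple arithmetic. The sketch below assumes 20 ms packetization and plain RTP/UDP/IPv4 headers with no header compression and no link-layer overhead; these are illustrative assumptions, not figures from the slides. G.711 (narrowband) and G.722 in its 64 kbit/s mode (wideband) are used as the example pair.

```python
# Assumed per-packet overhead: RTP (12) + UDP (8) + IPv4 (20) bytes.
RTP_UDP_IPV4_HEADER_BYTES = 12 + 8 + 20

def payload_bytes(codec_bitrate_bps: int, packet_ms: int) -> int:
    """Codec payload carried in one packet of the given duration."""
    return codec_bitrate_bps * packet_ms // (8 * 1000)

def ip_bitrate_bps(codec_bitrate_bps: int, packet_ms: int = 20) -> int:
    """Bit rate on the wire at the IP layer, headers included."""
    packets_per_second = 1000 // packet_ms
    packet_bytes = payload_bytes(codec_bitrate_bps, packet_ms) + RTP_UDP_IPV4_HEADER_BYTES
    return packet_bytes * 8 * packets_per_second

if __name__ == "__main__":
    g711_narrowband = ip_bitrate_bps(64_000)  # 8 kHz sampling
    g722_wideband = ip_bitrate_bps(64_000)    # 16 kHz sampling, same codec bit rate
    print("G.711:", g711_narrowband, "bit/s; G.722:", g722_wideband, "bit/s")
    print("Share of a 1 Mbit/s access link:", 100 * g722_wideband / 1_000_000, "%")
```

Under these assumptions both streams cost the same at the IP layer, and either one occupies a small fraction of a 1 Mbit/s access link.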
The Technology
Lossy Codec Classes
  Speech communication codecs (G.72X, AMR, et al.)
    Designed for “real-time” speech; music handled poorly
    Low sampling rate (8-16 kHz), low fidelity
    Low-medium delay (10-30 ms)
    Mostly time-domain (CELP is the most popular)
  Music codecs (MP3, AAC, Vorbis)
    Can encode any signal (not optimal for speech); designed for entertainment
    Up to 48 kHz sampling rate (full bandwidth), high fidelity (“CD-quality”)
    High delay (>100 ms)
    Mostly frequency-domain (MDCT-based)
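To make the delay figures for speech codecs concrete, the sketch below computes frame sizes and algorithmic delay (frame length plus look-ahead). The frame and look-ahead values for G.729 (10 ms + 5 ms) and AMR-WB (20 ms + 5 ms) come from the codec specifications rather than from the slides, and the helper names are illustrative.

```python
def samples_per_frame(sampling_rate_hz: int, frame_ms: int) -> int:
    """Number of audio samples in one codec frame."""
    return sampling_rate_hz * frame_ms // 1000

def algorithmic_delay_ms(frame_ms: int, lookahead_ms: int) -> int:
    """Delay inherent to the codec: one full frame plus its look-ahead."""
    return frame_ms + lookahead_ms

# (sampling rate in Hz, frame length in ms, look-ahead in ms)
CODECS = {
    "G.729": (8000, 10, 5),
    "AMR-WB": (16000, 20, 5),
}

if __name__ == "__main__":
    for name, (fs, frame_ms, lookahead_ms) in CODECS.items():
        print(name,
              samples_per_frame(fs, frame_ms), "samples per frame,",
              algorithmic_delay_ms(frame_ms, lookahead_ms), "ms algorithmic delay")
```

Both examples land inside the 10-30 ms range quoted for speech codecs. Network and jitter-buffer delay come on top of this and are not modelled here.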
Speech Codec Spectrum
  Deployed bandwidth | Band               | Example codec               | Applications
  More than 15 kHz   | Full band (20 kHz) | AAC-LD                      | Presence (video conf)
  14 kHz             | Super wideband     | G.722.1C (Siren14), SILK    | VoIP, audio conf
  7 kHz              | Wideband           | G.722.2 (AMR-WB), SVOPC     | BB VoIP & audio chat
  3.5 kHz            | Narrowband         | G.729, G.723.1, G.711, iSAC | PSTN & VoIP
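ITU and 3GPP codec roadmap
  [Timeline figure: codecs plotted by year and bandwidth class; the legend distinguishes ITU, 3GPP, and joint 3GPP & ITU codecs]
  Super-wideband: EV-VBR (2008)
  Wideband: G.722 (1988), G.722.1 (1999), G.722.2 / AMR-WB (2002), G.729.1 (2007)
  Narrowband: G.726 (1984), GSM-FR (1987), G.728 (1992), GSM-HR (1994), GSM-EFR (1995), G.729 (1995), AMR-NB (1999)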
Embedded Speech Codecs
  ITU Super-WB
    Provides extended bandwidth and stereo capabilities
      16 kHz audible bandwidth
      Stereo extension
    Generic extension applicable to wideband codecs, e.g. ITU G.729.1 and EV-VBR
  3GPP EPS (Evolved Packet System, a.k.a. LTE)
    ITU EV-VBR is well positioned to meet future EPS requirements
    Interoperable with 3GPP AMR-WB
  Open codecs
    Speex (4 to 42 kbit/s)
    Royalty-free, but limited to non-patented techniques (ACELP, for example, is excluded)
Music Codecs
  MPEG-1 Layer III (a.k.a. MP3)
    Built on top of Layers I and II
    First-generation, very inefficient
  AAC
    Second-generation, much better than MP3
    Flexible, kitchen-sink type of approach
    Tons of tools and partially incompatible profiles
    Variants: AAC-LC, AAC-LD, AAC-HE, ...
  Vorbis
    Second-generation, similar quality to AAC
    Open-source, royalty-free (Xiph.Org Foundation)
Future of codecs
  Improving quality
    Super-wideband, coding of music
    The gap between speech and music codecs is closing
      AMR-WB+, G.722.1x moving to music, higher quality
      AAC-LD moving to lower delay
  Reducing delay
  Increasing robustness
    Shift from bit-error robustness to packet-loss robustness
Improved Music Handling
  Background music is poorly handled
    Most speech codecs (AMR-NB, G.729, AMR-WB, Speex, etc.) are derivatives of CELP
    CELP makes assumptions that are only valid for speech (and single-note music)
    CELP does not perform well on music, especially at low bit rates
  Music codecs are not suitable for speech
Improved Music Handling
  How do you improve the handling of background music? Three strategies:
    Increase the bit rate
    Dual-mode codecs (e.g. AMR-WB+)
    Use non-CELP codecs (AAC-LD, G.722.1x, G.711.1, CELT, …)
Wideband Extension (WEx) as an interim solution
  How do you provide a wideband experience when linking a wideband-capable client to the PSTN?
    Current solution: up-sample the narrowband speech to 16 kHz
    Better solution: create wideband “artificially” from the narrowband speech
  Support becoming available
    WEx-capable handsets (Philips, for example)
    WEx-enabled media gateways (Vocallo, for example)
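The "current solution" of up-sampling can be illustrated with a deliberately simple sketch. It doubles the sampling rate by linear interpolation, which is an assumption made for brevity; a production resampler would use a proper low-pass interpolation filter. Either way, up-sampling only changes the sample rate: it synthesizes no new speech content above the original narrowband range, which is the gap wideband extension tries to fill. The function name is hypothetical.

```python
import math

def upsample_2x_linear(samples):
    """Double the sampling rate by inserting the midpoint between neighbours.

    Illustrative only: linear interpolation is a crude stand-in for the
    low-pass interpolation filter a real resampler would use.
    """
    if not samples:
        return []
    out = []
    for current, following in zip(samples, samples[1:]):
        out.append(current)
        out.append((current + following) / 2.0)
    # Keep the last sample and hold it once so the output is exactly 2x long.
    out.append(samples[-1])
    out.append(samples[-1])
    return out

if __name__ == "__main__":
    fs_in = 8000
    # 10 ms of a 1 kHz tone sampled at 8 kHz (80 samples).
    tone = [math.sin(2 * math.pi * 1000 * n / fs_in) for n in range(80)]
    wide = upsample_2x_linear(tone)
    print(len(tone), "samples at 8 kHz ->", len(wide), "samples at 16 kHz")
```

The original samples are preserved at the even positions of the output; only the odd positions are new, and they are derived entirely from their neighbours.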
The Implementations, a.k.a. The Role of the Media Gateway
Wideband VoIP DECT - France Telecom
  [Network diagram: Mobile Platform, IAD, Access Platform, IP Network, IMS GW, TDM Network, DLC]
Wide Band Extension (WBE)
  Expand the signal to create the impression of wideband.
  [Network diagram: Mobile Platform, AEC, ANR, NLE, IP Network, WBE, IMS GW, LEC, IP/DLC, TDM Network, Access Platform, DLC, IAD]
Improving the User Experience
  AEC: Wideband Lite Acoustic Echo Canceller, acts as a complement to a badly designed handset
  ANR: Wideband Adaptive Noise Reduction, reduces the noise of the mobile handset environment
  NLE: Wideband Natural Level Enhancement, uses the intensity of the voice and the SNR to compensate for the loud environment of the talker
  [Network diagram: Mobile Platform, IP Network, IMS GW, IP/DLC, TDM Network, DLC, IAD, Access Platform]
The role of the MGW
  When selecting MGW solutions:
    Don’t just look for a checklist of codecs!
    Look for solutions that provide wideband extension, wideband ECAN, ANR, etc.
    Select solutions that incur low latency when transcoding IP-to-IP communications
Summary
  Clear benefit to the users
    Skype changed expectation levels
  Technology enablers already in place
    VoIP deployment
    Codecs
    WB-enabled endpoints and MGWs available