Published by Arabella Morrison. Modified over 6 years ago.
1
Audio
Henning Schulzrinne
Dept. of Computer Science, Columbia University
Fall 2003
2
Common narrowband audio codecs
Columns: rate (kb/s); delay (ms); multi-rate; embedded; VBR; bit-robust/PLC; remarks
iLBC: 15.2, 13.3; 20, 30; --/X; quality higher than G.729A, no licensing
Speex: X
AMR-NB: X/X; 3G wireless
G.729: 8; 15; TDMA wireless
GSM-FR: 13; GSM wireless (Cingular)
GSM-EFR: 12.2; 2.5G
G.728: 16, 12.8; 2.5; H.320 (ISDN videoconferencing)
G.723.1: X/--; H.323, videoconferences
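The rate and frame-duration figures for a codec together determine how many bytes each packet carries. A minimal sketch using the iLBC numbers above, assuming one codec frame per packet and 40 bytes of IPv4/UDP/RTP headers (20 + 8 + 12, no header compression). The function names are illustrative and not from the slides.

```python
def payload_bytes(rate_bps: int, frame_ms: int) -> int:
    """Bytes needed for one codec frame, rounded up to a whole byte."""
    bits = (rate_bps * frame_ms + 999) // 1000   # integer ceiling of bits per frame
    return (bits + 7) // 8                       # integer ceiling of bytes per frame


def ip_rate_bps(rate_bps: int, frame_ms: int, header_bytes: int = 40) -> float:
    """Bit rate on the wire with one frame per packet (header size is an assumption)."""
    packet_bits = (payload_bytes(rate_bps, frame_ms) + header_bytes) * 8
    return packet_bits * 1000 / frame_ms


if __name__ == "__main__":
    for rate, frame in [(15200, 20), (13300, 30)]:   # the two iLBC modes
        print(rate, frame, payload_bytes(rate, frame), ip_rate_bps(rate, frame))
```

Under these assumptions the headers are as large as or larger than the payload, so longer frames reduce the on-the-wire rate at the cost of added delay.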
3
Common wideband audio codecs
Columns: rate (kb/s); delay (ms); multi-rate; embedded; VBR; bit-robust/PLC; remarks
Speex: 4 – 44.4; 34; X; --/X; no licensing
AMR-WB: 6.6 – 23.85; 20; X/X; 3G wireless
G.722: 48, 56, 64; 0.125 (1.5); X/--; 2 sub-bands, now dated
4
iLBC – MOS behavior with packet loss
5
Recent audio codecs
iLBC: optimized for high packet loss rates (frames encoded independently)
AMR-NB
3G wireless codec
kb/s
20 ms coding delay
6
Speex
Open-source, patent-free speech codec
CELP (code-excited linear prediction) codec
operating modes:
narrowband (8 kHz sampling rate)
2.15 – 24.6 kb/s
delay of 30 ms
wideband (16 kHz sampling rate)
kb/s
delay of 34 ms
ultra-wideband (32 kHz sampling rate)
intensity stereo encoding
variable bit rate (VBR) possible
voice activity detection (VAD)
7
Ogg Vorbis
Similar in application to AAC, MP3, VQF, …, but claims to be free of patents
Ogg = container format file (also for Speex, FLAC)
Vorbis = music speech codec
near CD quality = 160 kb/s
forward-adaptive
modified DCT (discrete cosine transform)
overlapping windows
floor: carries frequency representation as piecewise linear interpolated representation on a dB amplitude scale and linear frequency scale
residue: subtract out floor
cascaded (multi-pass) vector quantization
entropy (Huffman) coding
carries codec parameters in header
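The floor/residue split can be illustrated with a toy example. This is a sketch of the idea only, not the Vorbis bitstream format: the floor is given as a few hypothetical (bin, dB) points, interpolated linearly across frequency bins, and subtracted from a made-up dB spectrum to leave the residue that would then go on to vector quantization.

```python
def floor_curve(points, n_bins):
    """Piecewise-linear interpolation of (bin, dB) points over bins 0..n_bins-1.

    Assumes the points are sorted by bin, have distinct bin indices,
    and span the whole range of bins.
    """
    curve = []
    for k in range(n_bins):
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 <= k <= x1:
                t = (k - x0) / (x1 - x0)          # position inside the segment
                curve.append(y0 + t * (y1 - y0))  # linear in dB
                break
        else:
            raise ValueError(f"bin {k} is not covered by the floor points")
    return curve


def residue(spectrum_db, floor_db):
    """What remains of the spectrum after the floor is subtracted (in dB)."""
    return [s - f for s, f in zip(spectrum_db, floor_db)]


if __name__ == "__main__":
    points = [(0, 60.0), (4, 40.0), (8, 20.0)]            # hypothetical floor points
    spectrum = [62, 54, 51, 45, 38, 36, 30, 27, 20]       # hypothetical spectrum, dB
    fl = floor_curve(points, len(spectrum))
    print(fl)
    print(residue(spectrum, fl))
```

The residue values are small compared with the spectrum itself, which is the reason for coding the coarse spectral envelope separately.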
8
Sound localization
Human ear uses 3 metrics for stereo localization:
intensity
time of arrival (TOA) – 7 µs
direction filtering and spectral shaping by outer ear
For shorter wavelengths (4 – 20 kHz), the head casts an acoustical shadow, giving rise to a lower sound level at the ear farthest from the sound source
At long wavelengths (20 Hz – 1 kHz), the head is very small compared to the wavelength
In this case, localization is based on perceived Interaural Time Differences (ITD)
UCSC CMPE250 Fall 2002
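The scale of these numbers can be checked with a simple path-difference model. A rough sketch, assuming two point receivers 0.18 m apart (an assumed ear spacing), sound speed of 343 m/s, and a distant source, so that ITD = d·sin(azimuth)/c. The head itself is ignored, and the constants and function names are illustrative.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, air at about 20 °C
EAR_SPACING = 0.18       # m, assumed distance between the ears


def wavelength_m(freq_hz):
    """Acoustic wavelength in air."""
    return SPEED_OF_SOUND / freq_hz


def itd_seconds(azimuth_deg):
    """Interaural time difference for a distant source at the given azimuth."""
    return EAR_SPACING * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND


def azimuth_for_itd(itd_s):
    """Azimuth (degrees) that produces a given ITD in this model."""
    return math.degrees(math.asin(itd_s * SPEED_OF_SOUND / EAR_SPACING))


if __name__ == "__main__":
    print(wavelength_m(1000.0), wavelength_m(4000.0))   # wavelengths at 1 kHz and 4 kHz
    print(itd_seconds(90.0))                            # largest ITD in this model
    print(azimuth_for_itd(7e-6))                        # angle matching a 7 µs difference
```

In this model the largest ITD is roughly half a millisecond, and a 7 µs difference corresponds to a source less than one degree off center.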
9
Audio samples http://www.cs.columbia.edu/~hgs/audio/codecs.html
Speex: both narrowband and wideband