Speech Coding Basics: A Tutorial. Mahdi Amiri. Supervisor: Dr. H. R. Rabiee. April 2009. Sharif University of Technology.
Speech Coding: A Road Map. PCM, DPCM, ADPCM, LPC, CELP. Page 1 of 30 Speech Coding Basics
Pulse-code Modulation (PCM): Basics. Digital representation of an analog signal through sampling and quantization. Parameters: sampling rate (samples per second) and quantization levels (bits per sample).
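The two PCM parameters can be tried out in a few lines. The following is a minimal sketch, assuming samples normalized to the range [-1, 1] and a plain uniform quantizer; the function names (`sample_sine`, `pcm_encode`, `pcm_decode`) and the 440 Hz test tone are illustrative choices, not taken from the slides.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, n_samples):
    """Sample a unit-amplitude sine wave at a fixed sampling rate."""
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

def pcm_encode(samples, bits):
    """Uniformly quantize samples in [-1, 1] to integer codes 0 .. 2**bits - 1."""
    levels = 2 ** bits
    codes = []
    for x in samples:
        x = max(-1.0, min(x, 1.0))             # clip to the quantizer range
        code = int((x + 1.0) / 2.0 * levels)   # index of the quantization cell
        codes.append(min(code, levels - 1))    # x == 1.0 falls in the top cell
    return codes

def pcm_decode(codes, bits):
    """Map each code back to the midpoint of its quantization cell."""
    step = 2.0 / (2 ** bits)
    return [-1.0 + (c + 0.5) * step for c in codes]

# Assumed example: a 440 Hz tone, 8000 samples per second, 4 bits per sample.
samples = sample_sine(freq_hz=440.0, sample_rate_hz=8000, n_samples=16)
codes = pcm_encode(samples, bits=4)
decoded = pcm_decode(codes, bits=4)
```

With 4 bits there are 16 cells of width 0.125, so each decoded sample lies within half a cell (0.0625) of the original.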
Pulse-code Modulation (PCM): Why Call it PCM? [Figure: 4-bit PCM]
Pulse-code Modulation (PCM): Bits per Second (bit/s). How to choose proper… Sampling rate: 8 kHz? Quantization level: 8 bits/sample? Bit rate of 8000 Hz, 8-bit PCM: 8000 × 8 = 64 kbit/s.
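The bit-rate arithmetic is a single multiplication. A small sketch follows; `pcm_bit_rate` is a hypothetical helper name, and the `channels` parameter and the 16-bit stereo CD figure are added here for comparison rather than taken from the slide.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Raw PCM bit rate in bit/s: samples per second times bits per sample."""
    return sample_rate_hz * bits_per_sample * channels

telephone_bps = pcm_bit_rate(8000, 8)             # 8 kHz, 8-bit mono
cd_audio_bps = pcm_bit_rate(44100, 16, channels=2)  # assumed 16-bit stereo CD audio

print(telephone_bps / 1000, "kbit/s")
print(cd_audio_bps / 1000, "kbit/s")
```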
Pulse-code Modulation (PCM): Sampling Rate. Human hearing frequency range: 20 Hz to 20 kHz. Play with “HearTest” to test your hearing. Most people will find that their hearing is most sensitive around 1-4 kHz and less sensitive at high and low frequencies.
Pulse-code Modulation (PCM): Hearing Range
Animal names with their Persian equivalents:
- Ferret: Raasoo
- Gerbil: Moosh Sahraaiee
- Hedgehog: Joojeh Tighi
- Possum (like opossums): Saarigh
- Seal: Fok
- Porpoise: Khook Daryaayee (similar to dolphins and whales)
Pulse-code Modulation (PCM): Sampling Rate, Human Vocal Range
- Normal: 80 Hz to 1100 Hz
- Charles Kellogg: 14 kHz (not verified)
- Guinness Book of Records, female: Georgia Brown (eight octaves, 25087 Hz)
- Guinness Book of Records, male: Tim Storms (six octaves)
- Georgia Brown's High Notes: "Georgia Brown incredibly screams the high notes that made her the woman with the largest vocal range on the planet." www.youtube.com/watch?v=P6wSyIdwCFM
- Tim Storms Sings Eight Hertz: "Tim Storms demonstrates his low range. He sings so low you can't even hear it." www.youtube.com/watch?v=___sG3AJaNc
Pulse-code Modulation (PCM): Common Sampling Rates
- 8,000 Hz - telephone, adequate for human speech
- 11,025 Hz
- 22,050 Hz - radio
- 32,000 Hz - miniDV digital video camcorder, DAT (LP mode)
- 44,100 Hz - audio CD, also most commonly used with MPEG-1 audio (VCD, SVCD, MP3)
- 48,000 Hz - digital sound used for miniDV, digital TV, DVD, DAT, films and professional audio
- 96,000 or 192,000 Hz - DVD-Audio, some LPCM DVD tracks, BD-ROM (Blu-ray Disc) audio tracks, and HD-DVD (High-Definition DVD) audio tracks
- 2.8224 MHz - SACD, 1-bit sigma-delta modulation process known as Direct Stream Digital, co-developed by Sony and Philips
Pulse-code Modulation (PCM): Quantization Levels. Goal: prevent ear fatigue by minimizing quantization noise. Signal-to-noise ratio: SNR ≈ 6.02 B dB, where B is the number of bits per sample, i.e. approximately 6 dB per bit. 16 bits => 96 dB. Above 36 dB is required.
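The 6 dB-per-bit rule can be checked numerically. The sketch below assumes a full-scale input that is uniformly distributed over [-1, 1] and a uniform mid-point quantizer, a case in which the measured SNR should land close to 6.02 B dB. The function names, sample count and seed are illustrative assumptions.

```python
import math
import random

def rule_of_thumb_snr_db(bits):
    """The slide's approximation: about 6.02 dB per bit."""
    return 6.02 * bits

def measured_snr_db(bits, n_samples=20000, seed=0):
    """Quantize uniformly distributed samples and measure the SNR in dB."""
    rng = random.Random(seed)
    levels = 2 ** bits
    step = 2.0 / levels
    signal_power = 0.0
    noise_power = 0.0
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)
        code = min(int((x + 1.0) / step), levels - 1)
        y = -1.0 + (code + 0.5) * step      # midpoint of the quantization cell
        signal_power += x * x
        noise_power += (x - y) ** 2
    return 10.0 * math.log10(signal_power / noise_power)

for bits in (6, 8, 12):
    print(bits, rule_of_thumb_snr_db(bits), round(measured_snr_db(bits), 2))
```

Other signal shapes shift the result by a constant offset, but the slope of roughly 6 dB per added bit stays the same.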
Pulse-code Modulation (PCM): Good to Know. The average person cannot tell the difference between audio compressed at a bit rate above 192 kbit/s and the original CD/WAV. Even if your headphones seal really well around your ears, they will probably give you only about 20 to 25 dB of insulation from external sound.
Pulse-code Modulation (PCM): Images
Pulse-code Modulation (PCM): μ-law, A-law. Nonuniform quantizers are difficult to make and expensive. Solution: companding, i.e. compressing the signal, applying a uniform quantizer, then expanding.
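The compress, quantize, expand chain can be sketched with the continuous μ-law curve. This is a minimal sketch assuming μ = 255 (the value commonly used in 8-bit telephony) and samples normalized to [-1, 1]; practical codecs use a segmented table-based approximation rather than this formula, and all function names here are illustrative.

```python
import math

MU = 255.0  # assumed value, as commonly used in 8-bit telephony

def mu_compress(x, mu=MU):
    """Compress x in [-1, 1]: small amplitudes are stretched, large ones squeezed."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_expand(y, mu=MU):
    """Inverse of mu_compress."""
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)

def uniform_quantize(y, bits):
    """Uniform mid-point quantizer on [-1, 1]."""
    levels = 2 ** bits
    step = 2.0 / levels
    code = min(int((y + 1.0) / step), levels - 1)
    return -1.0 + (code + 0.5) * step

def companded_quantize(x, bits=8):
    """Compress, quantize uniformly, then expand."""
    return mu_expand(uniform_quantize(mu_compress(x), bits))

# Quiet samples are reproduced more accurately with companding.
quiet = [i / 1000.0 for i in range(1, 51)]
uniform_error = sum(abs(x - uniform_quantize(x, 8)) for x in quiet)
companded_error = sum(abs(x - companded_quantize(x, 8)) for x in quiet)
```

The trade-off is coarser steps for loud samples, which is acceptable for speech because most of its samples have small amplitude.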
Pulse-code Modulation (PCM): μ-law, A-law
Pulse-code Modulation (PCM): μ-law, A-law. μ-law: North America and Japan. A-law: Europe.
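For comparison with μ-law, the continuous A-law curve is linear near zero and logarithmic elsewhere. The sketch below assumes A = 87.6 (the value commonly used in European telephony) and samples normalized to [-1, 1]; the function names are illustrative, and real codecs use a segmented approximation of this curve.

```python
import math

A = 87.6  # assumed value, as commonly used in European telephony

def a_law_compress(x, a=A):
    """Continuous A-law compression of x in [-1, 1]."""
    ax = abs(x)
    denom = 1.0 + math.log(a)
    if ax < 1.0 / a:
        y = a * ax / denom                       # linear segment near zero
    else:
        y = (1.0 + math.log(a * ax)) / denom     # logarithmic segment
    return math.copysign(y, x)

def a_law_expand(y, a=A):
    """Inverse of a_law_compress."""
    ay = abs(y)
    denom = 1.0 + math.log(a)
    if ay < 1.0 / denom:
        x = ay * denom / a
    else:
        x = math.exp(ay * denom - 1.0) / a
    return math.copysign(x, y)

round_trip = [a_law_expand(a_law_compress(x)) for x in (-1.0, -0.005, 0.0, 0.5)]
```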
Differential PCM (DPCM): Idea. Unfortunately, this does not work on analog sources: the quantized difference is not equal to the true one (d_n ≠ d̂_n), so the predictions differ as well (P_n ≠ P̂_n). This leads to error accumulation!
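The error accumulation is easy to reproduce. The sketch below assumes the simplest predictor (the previous sample) and an encoder that forms differences from the original samples while the decoder only sees quantized differences; the ramp signal and step size are chosen to make the drift obvious, and the names are illustrative.

```python
def quantize(value, step):
    """Round to the nearest multiple of the quantizer step."""
    return step * round(value / step)

def open_loop_dpcm(samples, step):
    """Encoder differences ORIGINAL samples; decoder sums quantized differences."""
    previous_original = 0.0
    reconstructed = 0.0
    output = []
    for x in samples:
        d = x - previous_original      # true difference d_n
        d_hat = quantize(d, step)      # what the decoder receives
        reconstructed += d_hat         # decoder-side running sum
        output.append(reconstructed)
        previous_original = x
    return output

# A slow ramp: every difference is about 0.3, which quantizes to 0 with step 1.0.
ramp = [0.3 * n for n in range(1, 101)]
decoded = open_loop_dpcm(ramp, step=1.0)
errors = [abs(x - y) for x, y in zip(ramp, decoded)]
```

Each sample contributes a small quantization error, and because nothing feeds it back to the encoder, the errors add up sample after sample.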
Differential PCM (DPCM): Basic Scheme. General predictive coding. Problem?
Differential PCM (DPCM): Better Structure
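A common way to avoid the accumulating error is to let the encoder predict from the same reconstructed samples the decoder will have. The sketch below assumes that closed-loop arrangement is what the better structure refers to, which the slide text alone does not confirm; it uses a previous-sample predictor, and the names are illustrative.

```python
def quantize(value, step):
    """Round to the nearest multiple of the quantizer step."""
    return step * round(value / step)

def closed_loop_dpcm(samples, step):
    """Encoder predicts from the RECONSTRUCTED signal, exactly as the decoder does."""
    prediction = 0.0
    output = []
    for x in samples:
        d = x - prediction             # difference against the decoder's value
        d_hat = quantize(d, step)      # transmitted quantized difference
        prediction += d_hat            # reconstruction shared by both sides
        output.append(prediction)
    return output

ramp = [0.3 * n for n in range(1, 101)]
decoded = closed_loop_dpcm(ramp, step=1.0)
max_error = max(abs(x - y) for x, y in zip(ramp, decoded))
```

Because each new difference already includes the previous quantization error, the reconstruction error stays within half a quantizer step instead of growing.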
Adaptive DPCM (ADPCM): Idea. Problem?
Adaptive DPCM (ADPCM): Size of Quantization Step
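One simple way to adapt the step size is to enlarge it whenever the quantizer saturates and shrink it whenever the codes are small. The sketch below is an illustrative toy scheme, not the adaptation rule of any standard ADPCM codec; the multipliers 1.5 and 0.9, the step limits and all names are assumptions.

```python
def adapt_step(step, code, max_code, min_step=1e-4, max_step=1.0):
    """Grow the step after an outermost code, shrink it after a small code."""
    if abs(code) >= max_code:
        step *= 1.5
    elif abs(code) <= 1:
        step *= 0.9
    return max(min_step, min(step, max_step))

def adpcm_encode(samples, bits=4, step=0.01):
    """Closed-loop DPCM with a step size adapted from the transmitted codes."""
    max_code = 2 ** (bits - 1) - 1
    pred, codes, recon = 0.0, [], []
    for x in samples:
        code = round((x - pred) / step)
        code = max(-max_code, min(code, max_code))   # clamp to the code range
        pred += code * step
        codes.append(code)
        recon.append(pred)
        step = adapt_step(step, code, max_code)
    return codes, recon

def adpcm_decode(codes, bits=4, step=0.01):
    """Mirror of the encoder: the step is re-derived from the codes alone."""
    max_code = 2 ** (bits - 1) - 1
    pred, recon = 0.0, []
    for code in codes:
        pred += code * step
        recon.append(pred)
        step = adapt_step(step, code, max_code)
    return recon

# A sudden jump to 1.0: the step grows until the signal is reached, then shrinks.
codes, recon = adpcm_encode([1.0] * 50)
decoded = adpcm_decode(codes)
```

Since the adaptation depends only on past codes, the decoder can track the step size without any side information.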
Speech Compression Concepts: Spectrogram, short-time Fourier transform (STFT). [Figure: 3D surface spectrogram of a part from a music piece.]
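A spectrogram is the magnitude of the STFT: the signal is cut into overlapping frames, each frame is windowed and transformed. The sketch below uses a naive DFT from the standard library to stay self-contained; the frame length, hop size, Hann window and test tone are assumed values, and a real implementation would use an FFT.

```python
import cmath
import math

def stft_magnitudes(signal, frame_len=64, hop=32):
    """Return one list of DFT magnitudes (bins 0 .. frame_len/2) per frame."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
              for n in range(frame_len)]             # Hann window
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

# Assumed example: a 1000 Hz tone sampled at 8000 Hz.
sample_rate = 8000
signal = [math.sin(2.0 * math.pi * 1000.0 * n / sample_rate) for n in range(256)]
frames = stft_magnitudes(signal)
peak_bins = [max(range(len(m)), key=lambda k: m[k]) for m in frames]
peak_hz = [k * sample_rate / 64 for k in peak_bins]
```

Plotting `frames` with time on one axis and frequency bin on the other gives the spectrogram image.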
Speech Compression Concepts: Spectrogram. [Figure: spectrogram of a male voice saying ‘nineteenth century’.]
Speech Compression Concepts: Spectrogram, Demonstration. Bat echolocation call; flute by Jean Pierre Rampal; face!; singing voice.
Speech Compression Concepts: Formant
Linear Predictive Coding (LPC): Modeling
Linear Predictive Coding (LPC): Modeling (Hiss or Buzz). Buzzer, filter. Chunks: 30 to 50 frames/sec. Speech = Formants + Residue. Predictor for each frame:
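One common way to obtain a per-frame predictor is the autocorrelation method solved with the Levinson-Durbin recursion. The sketch below assumes that method and a predictor of the form x̂[n] = a_1 x[n-1] + ... + a_p x[n-p]; the slides do not confirm which method they use, and the function names and the synthetic test frame are illustrative.

```python
def autocorrelation(frame, max_lag):
    """r[lag] = sum over n of frame[n] * frame[n - lag]."""
    return [sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations; returns (coefficients a_1..a_p, residual energy)."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err == 0.0:                 # silent frame: nothing to predict
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err

def lpc_residual(frame, coeffs):
    """Residue: what is left after subtracting the linear prediction."""
    residual = []
    for n, x in enumerate(frame):
        prediction = sum(c * frame[n - k]
                         for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        residual.append(x - prediction)
    return residual

# Synthetic frame that obeys x[n] = 0.9 * x[n-1] exactly.
frame = [0.9 ** n for n in range(200)]
coeffs, energy = levinson_durbin(autocorrelation(frame, 2), order=2)
residual = lpc_residual(frame, coeffs)
```

For this frame the first coefficient comes out close to 0.9 and the residue is near zero after the first sample, which is the sense in which the predictor captures the filter and the residue carries what remains.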
Linear Predictive Coding (LPC): Modeling (Hiss or Buzz)
Code Excited Linear Prediction (CELP). Problem of LPC: cases where there is both hiss and buzz. Solution: encode the residue. Method: vector quantization (codebook).
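Vector quantization of the residue amounts to a nearest-codeword search. The sketch below matches a residue vector directly against a tiny hand-made codebook with an optimal gain per entry. This is a deliberate simplification: actual CELP coders search by passing each candidate through the synthesis filter with perceptual weighting. The codebook, target and function name are illustrative.

```python
def best_codeword(target, codebook):
    """Return (index, gain, squared_error) of the best gain-scaled codeword."""
    best = None
    for index, code in enumerate(codebook):
        energy = sum(c * c for c in code)
        if energy == 0.0:
            continue                                  # skip all-zero entries
        gain = sum(t * c for t, c in zip(target, code)) / energy
        error = sum((t - gain * c) ** 2 for t, c in zip(target, code))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best

# Hypothetical 4-entry codebook of length-4 excitation vectors.
codebook = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, -1.0, 1.0, -1.0],
]
residue = [2.0, -2.0, 2.0, -2.0]
index, gain, error = best_codeword(residue, codebook)
```

Only the index and the gain need to be transmitted, since the decoder holds the same codebook.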
Comparison: Sample Speech. "A lathe is a big tool. Grab every dish of sugar."
Comparison: Demonstration. Original, ADPCM, LPC, CELP.
Thank You. Speech Coding Basics: A Tutorial. Find out more at: 1. http://ce.sharif.edu/~m_amiri/ 2. http://www.aictct.ir/dml/