Download presentation
Published byBeverly Adams Modified over 9 years ago
0
Sharif University of Technology
Speech Coding Basics A Tutorial Mahdi Amiri Supervisor Dr. H. R. Rabiee April 2009 Sharif University of Technology
1
Speech Coding PCM DPCM ADPCM LPC CELP A road map Page 1 of 30
Speech Coding Basics
2
Pulse-code Modulation (PCM)
Basics Digital Representation of an Analog Signal Sampling and Quantization Parameters: Sampling Rate (Samples per Second) Quantization Levels (Bits per Sample) Page 2 of 30 Speech Coding Basics
3
Pulse-code Modulation (PCM)
Why Call it PCM? 4-bit PCM Page 3 of 30 Speech Coding Basics
4
Pulse-code Modulation (PCM)
Bit per Second (bit/s) How to choose proper… Sampling Rate 8 Khz ? Quantization Level 8 bit/sample ? Bit per Second for 8000 Hz 8 bit PCM 64 kbit/s Page 4 of 30 Speech Coding Basics
5
Pulse-code Modulation (PCM)
Sampling Rate Human Hearing Frequency Range 20 Hz to 20 kHz Play with “HearTest” to test your hearing Most people will find that their hearing is most sensitive around 1-4 kHz and that it is less sensitive at high and low frequencies. Page 5 of 30 Speech Coding Basics
6
Pulse-code Modulation (PCM)
Hearing Range Ferret = Persian: Raasoo GERBIL = Persian: Moosh Sahraaiee Hedgehog = Persian: Joojeh Tighi Possom = Like Opossums : Persian: Saarigh Seal = Persian: Fok Porpoise = Persian: Khook Daryaayee (Shabihe dolphin va nahang) Page 6 of 30 Speech Coding Basics
7
Pulse-code Modulation (PCM)
Sampling Rate Human Vocal Range Normal: 80 Hz to 1100 Hz Charles Kellogg (14 KHz) (not verified) Guinness Book of Records Female: Georgia Brown (Eight octaves, 25087Hz) Male: Tim Storms (Six octaves) Georgia Brown's High Notes Georgia Brown incredibly screams the high notes that made her the woman with the largest vocal range on the planet Tim Storms Sings Eight Hertz Tim storms demonstrates his low range. He sings so low you can't even hear it. Page 7 of 30 Speech Coding Basics
8
Pulse-code Modulation (PCM)
Common Sampling Rates 8,000 Hz: Telephone, adequate for human speech 11,025 Hz 22,050 Hz – radio 32,000 Hz - miniDV digital video camcorder, DAT (LP mode) 44,100 Hz - audio CD, also most commonly used with MPEG-1 audio (VCD, SVCD, MP3) 48,000 Hz - digital sound used for miniDV, digital TV, DVD, DAT, films and professional audio 96,000 or 192,000 Hz - DVD-Audio, some LPCM DVD tracks, BD-ROM (Blu-ray Disc) audio tracks, and HD-DVD (High-Definition DVD) audio tracks MHz - SACD, 1-bit sigma-delta modulation process known as Direct Stream Digital, co-developed by Sony and Philips” Page 8 of 30 Speech Coding Basics
9
Pulse-code Modulation (PCM)
Quantization Levels Want to prevent human ear fatigue by minimizing quantization noise Signal-to-Noise Ratio = 6.02B dB SNR is approximately 6 dB per bit. 16-bit => 96 dB Above 36 dB is required Page 9 of 30 Speech Coding Basics
10
Pulse-code Modulation (PCM)
Good to Know The average person cannot tell the difference between a bitrate above 192 kbit/s and the original CD/WAV. Even if your headphones seal really well around your ears, they will probably only give you about 20 to 25 dB insulation from the external sound. Page 10 of 30 Speech Coding Basics
11
Pulse-code Modulation (PCM)
Images Page 11 of 30 Speech Coding Basics
12
Pulse-code Modulation (PCM)
u-law, a-law Nonuniform quantizers: Difficult to make, Expensive. Solution: Companding Uniform Q. Expanding Page 12 of 30 Speech Coding Basics
13
Pulse-code Modulation (PCM)
U-law, A-law Page 13 of 30 Speech Coding Basics
14
Pulse-code Modulation (PCM)
u-law, a-law North America and Japan Europe Page 14 of 30 Speech Coding Basics
15
Differential PCM (DPCM)
Idea Unfortunately, this does not work on analog sources since dn != dn^ , and thus Pn != Pn^. Leads to error accumulation! Page 15 of 30 Speech Coding Basics
16
Differential PCM (DPCM)
Basic Scheme General Predictive Coding Problem? Page 16 of 30 Speech Coding Basics
17
Differential PCM (DPCM)
Better Structure Page 17 of 30 Speech Coding Basics
18
Adaptive DPCM (ADPCM) Idea Problem? Page 18 of 30 Speech Coding Basics
Unfortunately, this does not work on analog sources since dn != dn^ , and thus Pn != Pn^. Leads to error accumulation! Page 18 of 30 Speech Coding Basics
19
Adaptive DPCM (ADPCM) Size of Quantization Step Page 19 of 30
Speech Coding Basics
20
Speech Compression Concepts
Spectrogram, STFT 3D surface spectrogram of a part from a music piece. Page 20 of 30 Speech Coding Basics
21
Speech Compression Concepts
Spectrogram Spectrogram of a male voice saying ‘nineteenth century’. Page 21 of 30 Speech Coding Basics
22
Speech Compression Concepts
Spectrogram, Demonstration Bat Echolocation Call Flute by Jean Pierre Rampal Face! Singing Voice Page 22 of 30 Speech Coding Basics
23
Speech Compression Concepts
Formant Page 23 of 30 Speech Coding Basics
24
Linear Predictive Coding (LPC)
Modeling Page 24 of 30 Speech Coding Basics
25
Linear Predictive Coding (LPC)
Modeling (Hiss or Buzz) Buzzer Filter Chuncks: 30 thr. 50 frames/sec. Speech = Formants + Residue Predictor for each frame: Page 25 of 30 Speech Coding Basics
26
Linear Predictive Coding (LPC)
Modeling (Hiss or Buzz) Page 26 of 30 Speech Coding Basics
27
Code Excited Linear Prediction
CELP Problem of LPC Where there is both Hiss and Buzz Solution Encode residue Method Vector Quantization (Codebook) Page 27 of 30 Speech Coding Basics
28
Comparison Sample Speech
A lathe is a big tool. Grab every dish of sugar. Page 28 of 30 Speech Coding Basics
29
Comparison Demonstration Original ADPCM LPC CELP Page 29 of 30
Speech Coding Basics
30
Thank You Speech Coding Basics A Tutorial
FIND OUT MORE AT... 1. 2. Page 30 of 30 Speech Coding Basics
31
Animated Title Title Abc Page 31 of 30 Speech Coding Basics
32
Definition of Vanishing Percentage (VP)
Title Title Abc Definition of Vanishing Percentage (VP) Page 32 of 20 Speech Coding Basics
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.