A Study on Scalable CELP


A Study on Scalable CELP. Graduate student: 鄒昇龍. Advisor: Dr. 尤信程. 2018/12/3

Outline: LPC and the fundamentals of CELP (Code-Excited Linear Prediction); Scalable CELP; the proposed scalable structure; experiment results; conclusion.

LPC (Linear Predictive Coding): LPC is an important technique in speech coding. Why? Because the speech signal is highly correlated in the time domain.

LPC Analysis: autocorrelation method; Levinson-Durbin recursive algorithm; prediction order vs. prediction gain.
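The autocorrelation method and the Levinson-Durbin recursion named on this slide can be sketched as follows. This is a minimal illustration, not the author's code: the function names and the sign convention A(z) = 1 + sum a[k] z^-k are assumptions.

```python
import numpy as np

def autocorr(frame, order):
    """Autocorrelation r[0..order] of one (already windowed) frame."""
    x = np.asarray(frame, dtype=float)
    return np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])

def levinson_durbin(r):
    """Solve the LPC normal equations recursively.

    Returns (a, err): a[0] = 1 and A(z) = 1 + sum_k a[k] z^-k (assumed
    convention); err is the final prediction-error energy.
    """
    order = len(r) - 1
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = float(r[0])
    for i in range(1, order + 1):
        # Correlation between the current residual and the sample i steps back.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # error energy shrinks at each order
    return a, err

def prediction_gain_db(frame, order):
    """Prediction gain: signal energy over residual energy, in dB."""
    r = autocorr(frame, order)
    _, err = levinson_durbin(r)
    return 10.0 * np.log10(r[0] / err)
```

Calling `prediction_gain_db` for increasing orders on the same frame gives the kind of order-versus-gain curve the slide title refers to.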

Prediction order vs. prediction gain

Pitch Prediction (1): The LP prediction error still contains a periodic component.

Pitch Prediction (2): A pitch predictor is used to find the pitch delay t. The error of the pitch predictor is essentially random noise. There are several approaches to encoding this noise: CELP, SELP (Self-Excitation Linear Prediction), MPLPC (Multi-Pulse LPC), and RPLPC (Regular-Pulse LPC).
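One common way to find the pitch delay is to search for the lag that maximizes the normalized correlation between the LP residual and its delayed copy. The sketch below assumes that approach; the lag range of 20 to 147 samples is a typical choice at 8 kHz and is not taken from the slides.

```python
import numpy as np

def find_pitch_delay(residual, min_lag=20, max_lag=147):
    """Return (lag, gain) of a one-tap pitch predictor e[n] ~ gain * e[n - lag].

    The lag range is an assumed, typical 8 kHz range.
    """
    e = np.asarray(residual, dtype=float)
    best_lag, best_score, best_gain = min_lag, -np.inf, 0.0
    for lag in range(min_lag, max_lag + 1):
        cur, past = e[lag:], e[:-lag]
        energy = np.dot(past, past)
        if energy <= 0.0:
            continue
        corr = np.dot(cur, past)
        score = corr / np.sqrt(energy)      # normalized correlation
        if score > best_score:
            best_lag, best_score, best_gain = lag, score, corr / energy
    return best_lag, best_gain
```

Subtracting `gain * e[n - lag]` from the residual leaves the noise-like signal that the excitation codebook has to encode.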

Fundamentals of CELP: The idea of CELP is to encode the random noise with a codebook. AbS (Analysis-by-Synthesis) coding and the PW (Perceptual Weighting) filter are two important procedures in CELP.

AbS Coding: Adjust the parameters to minimize the error signal. The main disadvantage of AbS coding is its much higher computational cost.
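The analysis-by-synthesis idea can be illustrated with an exhaustive codebook search: every candidate code vector is synthesized, and the one with the smallest error is kept. This is a simplified sketch; the names `target`, `codebook`, and `h` (an impulse response of the synthesis filter) are hypothetical, and real coders use faster search strategies.

```python
import numpy as np

def abs_codebook_search(target, codebook, h):
    """Exhaustive AbS search.

    For each code vector c, synthesize y = (h * c) truncated to the target
    length, compute the optimal gain, and keep the candidate that minimizes
    ||target - gain * y||^2. Returns (index, gain, error).
    """
    target = np.asarray(target, dtype=float)
    n = len(target)
    best_idx, best_gain, best_err = None, 0.0, np.inf
    for idx, c in enumerate(codebook):
        y = np.convolve(c, h)[:n]           # synthesized candidate
        energy = np.dot(y, y)
        if energy == 0.0:
            continue
        gain = np.dot(target, y) / energy   # least-squares optimal gain
        resid = target - gain * y
        err = np.dot(resid, resid)
        if err < best_err:
            best_idx, best_gain, best_err = idx, gain, err
    return best_idx, best_gain, best_err
```

The loop synthesizes every entry once per subframe, which is where the high computational cost mentioned on the slide comes from.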

Perceptual Weighting Filter (1) Perceptual masking effect: When the signal energy is greater than the noise energy, we are not sensitive to the noise. When the signal energy is less than the noise energy, we are sensitive to the noise. PW filter is set according to

Perceptual Weighting Filter (2)

Perceptual Weighting Filter (3)

Combination of LPC and PW: Because the denominator of the LP synthesis filter equals the numerator of the PW filter, the two filters can be combined into a single filter to reduce computation.
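The cancellation can be checked numerically. The sketch below assumes the widely used weighting filter W(z) = A(z) / A(z/γ), which is consistent with the statement above but is not shown in the transcript; γ = 0.8 is likewise an assumed value. Under that assumption the cascade of 1/A(z) and W(z) reduces to the single all-pole filter 1/A(z/γ).

```python
import numpy as np

def bandwidth_expand(a, gamma=0.8):
    """Coefficients of A(z/gamma): a[k] * gamma**k (gamma is an assumed value)."""
    a = np.asarray(a, dtype=float)
    return a * gamma ** np.arange(len(a))

def all_pole_filter(a, x):
    """Filter x through 1/A(z): y[n] = x[n] - sum_{k>=1} a[k] * y[n-k]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, min(len(a), n + 1)):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y

def fir_filter(a, x):
    """Filter x through A(z), truncated to the input length."""
    return np.convolve(x, a)[:len(x)]

def weighted_synthesis_separate(a, x, gamma=0.8):
    """LP synthesis 1/A(z) followed by the assumed W(z) = A(z)/A(z/gamma)."""
    s = all_pole_filter(a, x)
    return all_pole_filter(bandwidth_expand(a, gamma), fir_filter(a, s))

def weighted_synthesis_combined(a, x, gamma=0.8):
    """Single combined filter 1/A(z/gamma)."""
    return all_pole_filter(bandwidth_expand(a, gamma), x)
```

Both functions produce the same output, but the combined form runs one filter instead of three.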

Steps of CELP (1)
1. Calculate the LPC coefficients.
2. Determine the prediction error (LP synthesis).
3. Search for the pitch delay.
4. Determine the prediction error (pitch predictor).
5. Search for the optimum code vector by AbS iteration.
6. Pack all the parameters and send them out.

Scalable CELP (1)

Scalable CELP (2)

Scalable CELP (3): The bitrate scalable coder consists of a core coder and a bitrate scalable tool. Its disadvantage is that the decoding sequence is fixed.

Scalable CELP (4)

Performance of Scalable CELP (1): Quality at different core bitrates: high core bitrate (8 kbps) vs. low core bitrate (3850 bps). Why is the quality poor at the low core bitrate? Error signal; pulse position.

Performance of Scalable CELP (2)

Performance of Scalable CELP (3)

Experiment of Compensation: Which areas need compensation? Areas with larger error, and the start of a syllable. What is a syllable? A syllable consists of a consonant and a vowel. Detecting syllables is important to our proposed scalable structure.

The Proposed Scalable Structure (1): We define a unit, the block, for syllable detection; a block is 1600 samples long (0.2 s at 8 kHz). Each compensation area is 200 samples long. 400 bits are available in each block. Fixed or variable bitrate?

The Proposed Scalable Structure (2): Our approach has 4 procedures:
1. Error buffer
2. Syllable detection and classification
3. Transform and quantization
4. Source coding

Error Buffer: Because the block length and the frame length differ, a buffer is needed to store the error signal and the source signal. Once 5 frames (source signal and error signal) have been collected, syllable detection is performed.
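A minimal sketch of such a buffer follows. The class name and interface are hypothetical. The 320-sample frame used in the usage note assumes that five frames exactly fill one 1600-sample block, which the slides imply but do not state.

```python
import numpy as np

class ErrorBuffer:
    """Collects (source, error) frame pairs until one block is complete."""

    def __init__(self, frames_per_block=5):
        self.frames_per_block = frames_per_block
        self._source = []
        self._error = []

    def push(self, source_frame, error_frame):
        """Store one frame pair.

        Returns (source_block, error_block) once frames_per_block frames
        have been collected, otherwise None.
        """
        self._source.append(np.asarray(source_frame, dtype=float))
        self._error.append(np.asarray(error_frame, dtype=float))
        if len(self._source) < self.frames_per_block:
            return None
        block = (np.concatenate(self._source), np.concatenate(self._error))
        self._source, self._error = [], []   # start the next block
        return block
```

With 320-sample frames, the fifth call to `push` returns two 1600-sample arrays that can be handed to syllable detection.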

Syllable Detection and Classification (1) [Figure: a block of the source signal and of the error signal, each divided into 40 segments of 40 samples; a per-segment energy ratio compared against a threshold of 0.1; a slope computed over the segments.]

Syllable Detection and Classification (3) [Figure: position of a compensation area relative to block n and block n+1.]

Syllable Detection and Classification (4)

Transform and Quantization Concentration vs. Uniform

How to decide? Calculate the DCT coefficients. Divide the coefficients into 10 segments and calculate the energy of each segment. If the energy sum of any 4 segments divided by the total energy is greater than 0.8, the area is classified as concentrative; otherwise it is uniform.
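The decision rule can be sketched as below. Since "any 4 segments" is satisfied exactly when the 4 highest-energy segments pass the test, the sketch checks only those. The orthonormal DCT-II scaling and the handling of an all-zero area are assumptions.

```python
import numpy as np

def dct_ii(x):
    """Orthonormal DCT-II computed directly from its definition."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * (m + 0.5) * k / n)
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return scale * (basis @ x)

def classify_area(area, n_segments=10, n_top=4, threshold=0.8):
    """Return 'concentrative' or 'uniform' for one compensation area."""
    coeffs = dct_ii(area)
    seg_energy = np.array([np.sum(seg ** 2)
                           for seg in np.array_split(coeffs, n_segments)])
    total = seg_energy.sum()
    if total == 0.0:
        return "uniform"                     # assumption: silent area
    top = np.sort(seg_energy)[-n_top:].sum() # best possible choice of 4 segments
    return "concentrative" if top / total > threshold else "uniform"
```

For a 200-sample compensation area, each of the 10 segments holds 20 DCT coefficients.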

Quantization (1): Three situations may occur:
1. All compensation areas are concentrative.
2. All compensation areas are uniform.
3. Concentrative and uniform areas exist simultaneously.
When situation 1 or 2 occurs, the quantization step size is the sum of all the coefficients divided by 240.

Quantization (2): When situation 3 occurs (figure example: concentrative, concentrative, uniform), two quantization step sizes are listed: the sum of all the coefficients divided by 180, and the sum of all the coefficients divided by 60.

Source Coding (1)
DCT coef.   Symbol   Format                          Length of symbol   Total bits
0, ±1       00       00 + (2 bits length)            1 ~ 4              2 + 2 + 2^(n+1), n = 0..3
                     01 + (4 bits length)            5 ~ 20
                     10 + (8 bits length)            21 ~ 276
                     11 + (16 bits length)           277 ~ 65812
±2          01       sign bit + (1 bit length)       1 ~ 2              4
±3          10       sign bit + (1 bit length)       1 ~ 2              4
±4 ~ 35     11       sign bit + (5 bits magnitude)   1                  8

Source Coding (2) Example: the run 00000000000 (11 symbols) is coded as 00 01 1011; the pair -2 -2 is coded as 01 1 1.

Bitrate Control We adjust the step size to control the number of bits. Multiply or divide by 0.98 to change the step size.
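The control loop can be sketched as follows, using the bit costs from the Source Coding (1) table. Several details are assumptions rather than statements from the slides: quantized values 0 and ±1 are treated as one run-length class, magnitudes are clipped at 35, header bits are ignored, and the budget is the 400 bits per block mentioned earlier. All function names are hypothetical.

```python
import numpy as np

RUN_UPPER = (4, 20, 276, 65812)   # longest run each 0/±1 format can describe

def zero_run_bits(run):
    """Bits for one run in the 0/±1 class: 2 (symbol) + 2 (format) + 2**(n+1)."""
    for n, upper in enumerate(RUN_UPPER):
        if run <= upper:
            return 2 + 2 + 2 ** (n + 1)
    raise ValueError("run longer than the table allows")

def count_bits(q):
    """Total bits needed for a sequence of quantized coefficients."""
    q = [int(v) for v in q]
    bits, i = 0, 0
    while i < len(q):
        mag, j = abs(q[i]), i + 1
        if mag <= 1:                          # run of 0/±1 symbols
            while j < len(q) and abs(q[j]) <= 1:
                j += 1
            bits += zero_run_bits(j - i)
        elif mag <= 3:                        # ±2 or ±3, run of 1 or 2
            if j < len(q) and q[j] == q[i]:
                j += 1
            bits += 4
        else:                                 # ±4..35, one coefficient
            bits += 8
        i = j
    return bits

def quantize(coeffs, step):
    """Uniform quantizer, clipped to ±35 (clipping is an assumption)."""
    return np.clip(np.rint(np.asarray(coeffs, dtype=float) / step), -35, 35).astype(int)

def fit_step(coeffs, step, budget=400, factor=0.98, max_iter=2000):
    """Scale the step size by 0.98 (or 1/0.98) until the bit budget is met."""
    def fits(s):
        return count_bits(quantize(coeffs, s)) <= budget
    it = 0
    while not fits(step) and it < max_iter:   # too many bits: coarser step
        step /= factor
        it += 1
    while fits(step * factor) and it < max_iter:  # room left: finer step
        step *= factor
        it += 1
    return step
```

Starting from the initial step size, `fit_step` first coarsens the quantizer until the block fits the budget and then refines it for as long as the budget still holds.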

Bitstream Formatting (1): The header must record the following information: the number of compensation areas; the quantization step size; the start position of each compensation area; the range of each compensation area.

Bitstream Formatting (2): Header fields:
Number of compensation areas, n: 2 bits
Which step size is selected for each area: n bits
Step size: 8 or 16 bits
Start of every compensation area: 12, 10, or 6 bits
Range of every compensation area: 3, 6, or 9 bits

Bitstream Formatting (3)
Number of compensation areas   Total number of start-position arrangements   Bits for coding
3                              4060                                          12
2                              595                                           10
1                              40                                            6
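The bit counts in this table are the smallest whole number of bits that can index the listed number of arrangements. One closed form that reproduces the tabulated totals is C(40 - 5(n - 1), n), with 40 candidate start positions and a 200-sample area spanning 5 positions of 40 samples. The slides do not give the counting argument, so the formula below is an observation rather than the author's derivation.

```python
from math import ceil, comb, log2

def start_position_bits(n_areas, n_positions=40, area_span=5):
    """Return (arrangements, bits) for coding the start positions.

    The closed form is an assumed fit to the table, not a stated rule.
    """
    arrangements = comb(n_positions - area_span * (n_areas - 1), n_areas)
    return arrangements, ceil(log2(arrangements))

for n in (3, 2, 1):
    print(n, *start_position_bits(n))
```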

Coding Delay: Encoding delay: 5 frames + look-ahead. Decoding delay: 5 frames. Applications: broadcast, recorder.

How to obtain variable bitrate in ISO CELP? (1)

How to obtain variable bitrate in ISO CELP? (2): n frames must be adjusted, and there are 5 frames in a block, so ceil(5/n) layers are turned on. Table columns: number of frames to be corrected; number of bitrate-enhancement bitstream layers turned on. Values (flattened): 2 3 4 5 1
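The rule ceil(5/n) can be evaluated directly; the function name below is hypothetical.

```python
from math import ceil

def layers_to_enable(frames_to_correct, frames_per_block=5):
    """Number of bitrate-enhancement layers turned on: ceil(5 / n)."""
    if not 1 <= frames_to_correct <= frames_per_block:
        raise ValueError("frames_to_correct must be between 1 and frames_per_block")
    return ceil(frames_per_block / frames_to_correct)

for n in range(1, 6):
    print(n, layers_to_enable(n))
```

The fewer frames that need correction, the more layers each of them receives.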

Experiment Result (1): We use CMOS (Comparison Mean Opinion Score) to test the performance of our approach. Ref: the source signal. A: our approach. B: the comparison target. Fifteen listeners took part in the experiment.

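Experiment Result (2)
Speech no.   Speech name   Description
1            Spfc          Chinese, female
2            Spfe          English, female
3            Spff          French, female
4            Spfg          German, female
5            Spfj          Japanese, female
6            Spmc          Chinese, male
7            Spme          English, male
8            Spmf          French, male
9            Spmg          German, male
10           Spmj          Japanese, male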

Experiment Result (3)
Comparison result      Score
A is better than B     +1
A is the same as B     0
A is worse than B      -1
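With this scale, the average score for one speech item is the mean of the listeners' scores. The sketch below assumes a plain arithmetic mean; the counts in the example are illustrative.

```python
def cmos_score(n_better, n_same, n_worse):
    """Mean of the +1 / 0 / -1 ratings over all listeners."""
    total = n_better + n_same + n_worse
    if total == 0:
        raise ValueError("no ratings")
    return (n_better - n_worse) / total

# Illustrative counts for 15 listeners: 11 rate +1, 4 rate 0, none rate -1.
print(round(cmos_score(11, 4, 0), 2))
```

A positive average means listeners preferred A, a negative one that they preferred B.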

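Experiment Result (4)
Experiment     Description of A                     Description of B
Experiment 1   New method, core at 3850 bps         Variable bitrate
Experiment 2   New method, core at 3850 bps         High bitrate, 6300 bps
Experiment 3   New method, core at 3850 bps         High bitrate, 8300 bps
Experiment 4   New method, core at 6300 bps         Variable bitrate
Experiment 5   New method, core at 6300 bps         High bitrate, 8300 bps
Experiment 6   New method, core at 8300 bps         Variable bitrate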

A: Our approach, core bitrate = 3850 bps. B: Variable bitrate at core bitrate = 3850 bps. Columns: speech no.; variable enh-bitrate; number of +1 ratings; number of 0 ratings; number of -1 ratings; average score. Values (flattened): 1 2411 11 4 0.73 2 2437 13 0.87 3 2422 9 0.47 2407 6 8 0.33 5 2400 12 0.80 2433 7 2415 10 0.60 2410 0.67

A: Our approach, core bitrate = 3850 bps. B: High bitrate, 6300 bps. Columns: speech no.; number of +1 ratings; number of 0 ratings; number of -1 ratings; average score. Values (flattened): 1 4 5 6 -0.13 2 0.13 3 -0.07 7 0.33 8 -0.33 9 -0.27 10 0.20

A: Our approach, core bitrate = 3850 bps. B: High bitrate, 8300 bps. Columns: speech no.; number of +1 ratings; number of 0 ratings; number of -1 ratings; average score. Values (flattened): 1 2 6 7 -0.33 9 5 -0.27 3 10 -0.53 4 8 -0.13 -0.40

A: Our approach, core bitrate = 6300 bps. B: Variable bitrate at core bitrate = 6300 bps. Columns: speech no.; variable enh-bitrate; number of +1 ratings; number of 0 ratings; number of -1 ratings; average score. Values (flattened): 1 2494 2 8 5 -0.20 2403 6 4 0.07 3 2465 7 0.33 2460 9 -0.27 2408 0.20 2427 -0.07 2393 2402 2415 10 2419

A: Our approach, core bitrate = 6300 bps. B: High bitrate, 8300 bps. Columns: speech no.; number of +1 ratings; number of 0 ratings; number of -1 ratings; average score. Values (flattened): 1 4 10 -0.60 2 3 12 -0.80 9 -0.33 5 8 7 -0.47 6 -0.53 -0.13

A: Our approach, core bitrate = 8300 bps. B: Variable bitrate at core bitrate = 8300 bps. Columns: speech no.; variable enh-bitrate; number of +1 ratings; number of 0 ratings; number of -1 ratings; average score. Values (flattened): 1 2468 5 10 -0.67 2 2438 9 -0.53 3 2444 6 -0.60 4 2437 2410 8 -0.40 2475 7 -0.33 2443 2416 11 2470 -0.47 2400

Conclusion: Our approach is effective in low-bitrate situations. The limit of our approach is a core bitrate of approximately 6 kbps. Our approach is also useful with other CELP standards.