ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL
Preeti Rao and Pushkar Patwardhan
Department of Electrical Engineering, Indian Institute of Technology Bombay, India

The MBE Speech Model (Griffin & Lim, 1988)
[Figure: original signal and its MBE-modeled version]

Frame-Based Analysis
Within the analysis window, the signal is assumed to follow a constant-amplitude, constant-frequency sinusoidal model.

MBE Speech Model Parameters
The parameters are the pitch, the harmonic amplitudes and the band-wise voicing decisions, all estimated from the windowed speech. (Harmonic phases are predicted to ensure smooth synthesis.)
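As an illustration of how these parameters relate, the sketch below bundles one frame into a container whose band count follows from the 3-harmonics-per-band grouping given for the voicing decisions. It is a hypothetical layout, not the authors' data structure: the names `MBEFrame`, `num_harmonics` and `HARMONICS_PER_BAND` are invented here, and counting only harmonics strictly below the Nyquist frequency is an assumption.

```python
import math
from dataclasses import dataclass

HARMONICS_PER_BAND = 3  # one voicing decision per group of 3 harmonics


def num_harmonics(f0_hz: float, fs_hz: float) -> int:
    """Number of harmonics m * f0 lying strictly below the Nyquist frequency (assumption)."""
    return math.ceil(fs_hz / (2.0 * f0_hz)) - 1


@dataclass
class MBEFrame:
    """Hypothetical container for one frame of MBE parameters."""
    f0_hz: float              # pitch
    amplitudes: list[float]   # one spectral amplitude per harmonic
    voiced: list[bool]        # one voicing decision per band of harmonics

    def __post_init__(self):
        expected = math.ceil(len(self.amplitudes) / HARMONICS_PER_BAND)
        if len(self.voiced) != expected:
            raise ValueError(f"expected {expected} voicing decisions, got {len(self.voiced)}")

    def is_harmonic_voiced(self, m: int) -> bool:
        """Voicing of harmonic m (1-based), inherited from the band containing it."""
        return self.voiced[(m - 1) // HARMONICS_PER_BAND]
```

For example, a 100 Hz pitch at 8 kHz sampling gives 39 harmonics and therefore 13 voicing decisions per frame.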

MBE Analysis: Parameter Estimation
Pitch and spectral amplitudes: analysis-by-synthesis matching of a predicted harmonic spectrum to the actual signal spectrum.
Voicing decision per frequency band (3 harmonics per band): based on the error between the actual and predicted spectra.
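The matching step can be sketched as a per-harmonic least-squares fit. Around each harmonic m·ω0, the predicted spectrum is the window transform W(ω − m·ω0) scaled by a complex amplitude. The residual, summed over the 3 harmonics of a band and normalized by the band energy, is the quantity compared against a voicing threshold. This is a minimal sketch under stated assumptions, not the authors' code: the pitch is taken as known (no pitch search), and the Hamming window, the 1024-point grid, the function name `mbe_band_errors` and the fixed threshold used below are all illustrative choices.

```python
import numpy as np


def mbe_band_errors(frame, f0_hz, fs_hz, nfft=1024, harmonics_per_band=3):
    """Return (complex harmonic amplitudes, normalized matching error per band).

    Sketch only: assumes the pitch f0_hz is already known.
    """
    frame = np.asarray(frame, dtype=float)
    n = np.arange(len(frame))
    w = np.hamming(len(frame))                      # analysis window (assumed Hamming)
    spec = np.fft.fft(frame * w, nfft)              # S(k) on a zero-padded grid
    omega = 2 * np.pi * np.arange(nfft) / nfft      # bin frequencies, rad/sample
    w0 = 2 * np.pi * f0_hz / fs_hz                  # fundamental, rad/sample
    num_harm = int(np.ceil(fs_hz / (2 * f0_hz))) - 1  # harmonics below Nyquist

    amps = np.zeros(num_harm, dtype=complex)
    err = np.zeros(num_harm)
    energy = np.zeros(num_harm)
    for m in range(1, num_harm + 1):
        # Bins in [(m - 0.5) w0, (m + 0.5) w0), restricted to the positive half-band.
        sel = (omega >= (m - 0.5) * w0) & (omega < (m + 0.5) * w0) & (omega <= np.pi)
        # Predicted lobe shape W(omega - m*w0), evaluated as a direct DTFT of the window.
        basis = np.exp(-1j * np.outer(omega[sel] - m * w0, n)) @ w
        s = spec[sel]
        # Least-squares complex amplitude; for A*cos(m*w0*n + phi), |amps| is about A/2.
        amps[m - 1] = np.vdot(basis, s) / np.vdot(basis, basis).real
        err[m - 1] = np.sum(np.abs(s - amps[m - 1] * basis) ** 2)
        energy[m - 1] = np.sum(np.abs(s) ** 2)

    band_err = []
    for start in range(0, num_harm, harmonics_per_band):
        stop = start + harmonics_per_band
        band_err.append(err[start:stop].sum() / max(energy[start:stop].sum(), 1e-12))
    return amps, np.array(band_err)
```

A band would then be declared voiced when its error falls below a threshold, e.g. `voiced = band_err < 0.2`. The fixed 0.2 is a placeholder; the slides state that the actual thresholds are frame-adapted and tuned experimentally.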

MBE Analysis: Spectral Matching
Voicing thresholds are adapted frame by frame, with settings determined by experimental tuning.

MBE Synthesis
[Block diagram of the MBE synthesizer]
Voiced speech synthesis: the pitch and the voiced amplitudes drive a bank of harmonic oscillators, with parameters linearly interpolated between frames.
Unvoiced speech synthesis: white noise is analyzed by an STFT, its spectral envelope is replaced by the unvoiced amplitudes, and the signal is resynthesized by weighted overlap-add.
The voiced and unvoiced speech are added to give the reconstructed speech.
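The voiced branch of this diagram can be sketched as a bank of oscillators driven between two consecutive frames. Each harmonic's amplitude and the pitch are interpolated linearly, and the oscillator phase is the integral of the interpolated pitch, which keeps it continuous across frames. This is a simplified sketch, not the published synthesizer: the phase rule, the function name `synth_voiced_segment` and the convention that unvoiced harmonics arrive with zero amplitude are assumptions, and the unvoiced noise branch is not shown.

```python
import numpy as np


def synth_voiced_segment(amps_prev, amps_next, f0_prev, f0_next, fs_hz, hop, phase0=0.0):
    """Synthesize one hop of voiced speech between two frames of MBE parameters.

    amps_*: per-harmonic amplitudes, with unvoiced harmonics already set to 0.
    Returns (samples, fundamental phase at the last sample) so the next call
    can continue the phase track.
    """
    alpha = np.arange(1, hop + 1) / hop                   # interpolation weight, ends at 1
    f0 = (1 - alpha) * f0_prev + alpha * f0_next          # linearly interpolated pitch (Hz)
    phase = phase0 + 2 * np.pi * np.cumsum(f0) / fs_hz    # integrated fundamental phase
    out = np.zeros(hop)
    for m in range(1, min(len(amps_prev), len(amps_next)) + 1):
        # Linear amplitude interpolation for harmonic m of the oscillator bank.
        amp = (1 - alpha) * amps_prev[m - 1] + alpha * amps_next[m - 1]
        out += amp * np.cos(m * phase)
    return out, phase[-1]
```

Calling it frame after frame and feeding the returned phase back in as `phase0` gives a continuous voiced component; the unvoiced component would be added sample by sample.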

Narrowband Speech Coding with MBE
The efficient quantisation of MBE parameters has led to:
IMBE: 4.15 kbps
DVSI MBE: > 2 kbps
LR MBE: 1.5 kbps
Research groups (Univ. Surrey, UCSB, Sony): … kbps to 3 kbps
[Audio examples: MBE-modeled and reference]
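To relate these rates to the parameter sets above, the bit budget per frame is simply rate × frame length. The sketch assumes a 20 ms frame for every coder. This is a common MBE frame length, but the frame length of each specific coder should be checked against its own description. The function name `bits_per_frame` is illustrative.

```python
def bits_per_frame(rate_bps: float, frame_ms: float = 20.0) -> float:
    """Bits available per frame at a given bit rate (the 20 ms default is an assumption)."""
    return rate_bps * frame_ms / 1000.0


for name, rate in [("IMBE", 4150), ("LR MBE", 1500)]:
    print(f"{name}: {bits_per_frame(rate):.0f} bits per 20 ms frame")
```

At 4.15 kbps this leaves 83 bits per 20 ms frame, and at 1.5 kbps only 30 bits, to cover the pitch, all harmonic amplitudes and the band voicing decisions.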

Related Models: Speech Synthesis
Harmonics+Noise Model (HNM): Stylianou
Harmonic/Stochastic Model (H/S): Dutoit, 1996
These models emphasize natural-sounding wideband speech and easy prosody modification. Both essentially use the Griffin & Lim MBE analysis. The important differences are that analysis and synthesis are pitch-synchronous, and that the estimated harmonic phases are used in synthesis.

MBE Model: Limitations
The codec speech quality does not improve with increasing bit rate, which indicates that the model itself has limitations.
The assumption of frame-level quasi-stationarity enables accurate representation only of vowels and of unvoiced and voiced fricatives (not of plosives, onsets, …).

Steady Sounds: Voice Quality
[Figure: glottal pulse (time markers T1, T2, Tm) filtered by the vocal tract response to give the speech signal; pulse shapes ranging from dark to sharp]
Glottal pulse shape variation (brightness, vocal effort)
Pitch cycle variations: jitter / shimmer (roughness / harshness)
Frication and aspiration (friction, breathiness)
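The source-filter picture in this slide can be made concrete with a toy model. A glottal pulse is repeated at the pitch period and passed through a resonator standing in for the vocal tract. The pulse shape used below is the Rosenberg-style trigonometric pulse (raised-cosine opening, quarter-cosine closing), which is a stand-in chosen here rather than the pulse in the slides. The single-formant resonator and all function names are likewise illustrative assumptions. Shortening the closing phase ("sharp" rather than "dark") makes the closure more abrupt and shifts energy towards high frequencies, which `hf_fraction` measures.

```python
import numpy as np


def rosenberg_pulse(n_open, n_close):
    """Rosenberg-style glottal pulse: raised-cosine opening, quarter-cosine closing."""
    rise = 0.5 * (1 - np.cos(np.pi * np.arange(n_open + 1) / n_open))
    fall = np.cos(np.pi * np.arange(1, n_close + 1) / (2 * n_close))
    return np.concatenate([rise, fall])


def pulse_train(pulse, period, num_periods):
    """Strictly periodic excitation: the same pulse repeated every `period` samples."""
    out = np.zeros(period * num_periods + len(pulse))
    for k in range(num_periods):
        out[k * period:k * period + len(pulse)] += pulse
    return out


def resonator(x, freq_hz, bw_hz, fs_hz):
    """Toy one-formant vocal tract: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = np.exp(-np.pi * bw_hz / fs_hz)
    a1, a2 = 2 * r * np.cos(2 * np.pi * freq_hz / fs_hz), -r * r
    y = np.zeros(len(x))
    for i in range(len(x)):
        y[i] = x[i] + (a1 * y[i - 1] if i >= 1 else 0.0) + (a2 * y[i - 2] if i >= 2 else 0.0)
    return y


def hf_fraction(x, fs_hz, split_hz=2000.0, nfft=4096):
    """Fraction of spectral energy at or above split_hz."""
    p = np.abs(np.fft.rfft(x, nfft)) ** 2
    f = np.fft.rfftfreq(nfft, 1 / fs_hz)
    return p[f >= split_hz].sum() / p.sum()
```

For instance, `resonator(pulse_train(rosenberg_pulse(32, 4), 80, 5), 700.0, 100.0, 8000.0)` gives five cycles of a 100 Hz "vowel" with a single 700 Hz formant.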

Role of Model Excitation Parameters
The glottal spectral shape (glottal waveform shape) can be captured by the spectral envelope parameters. However, the perceptual effects of vocal cord vibration aperiodicities and of aspiration/frication noise must be reproduced (if at all) by the MB excitation.

Effect of Aperiodicities on MBE Parameters
Voice source aperiodicities distort the harmonic spectrum, especially if the frame contains several pitch cycles.
Modulation (jitter/shimmer) aperiodicities smear the harmonic lobe structure, and noise and subharmonics may be introduced.
Aspiration noise adds noise in the harmonic regions.

MBE Analysis: Aperiodic Vowel
Aperiodicity increases the analysis spectrum-matching error, so the affected frequency bands are synthesized as unvoiced (random noise).

Previous Work: On Multi-band Excitation
Fujimura, 1968: A crude approximation of the aperiodicity observed in natural speech can be made by distributing patches of random noise signals in the time-frequency space of the speech signal.
Makhoul, 1978: Spectral devoicing due to vocal cord vibration irregularities is an artifact of the spectral estimation, and it may not be appropriate to use a noise source for the synthesis…
Griffin and Lim, 1988: Justify the MBE model by quoting Fujimura, and also by their own observations of speech in noise.

Synthetic Vowel: Modulation Aperiodicities

Examples with high jitter and high shimmer at pitches of 80 Hz, 160 Hz and 250 Hz, each compared with a periodic reference.
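A synthetic source of this kind, with the smearing of the harmonic lobes it produces, can be sketched as follows. The generator perturbs each period (jitter) and each pulse amplitude (shimmer) by independent Gaussian amounts. The harmonic-energy fraction then measures how much spectral energy remains within ±f0/4 of the harmonics. Both the perturbation model and the measure are assumptions made for illustration; the slides do not specify how their stimuli were generated, and the function names are hypothetical.

```python
import numpy as np


def perturbed_impulse_train(f0_hz, fs_hz, duration_s, jitter=0.0, shimmer=0.0, seed=0):
    """Impulse train with random period (jitter) and amplitude (shimmer) perturbations.

    jitter, shimmer: standard deviation of the relative perturbation (0.05 = 5 %).
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(int(round(duration_s * fs_hz)))
    period = fs_hz / f0_hz                                # nominal period in samples
    t = 0.0
    while t < len(x):
        x[int(t)] += 1.0 + shimmer * rng.standard_normal()                 # shimmer
        t += max(1.0, period * (1.0 + jitter * rng.standard_normal()))    # jitter
    return x


def harmonic_energy_fraction(x, f0_hz, fs_hz):
    """Fraction of energy (above f0/2) lying within +/- f0/4 of a harmonic of f0."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
    f = np.fft.rfftfreq(len(x), 1 / fs_hz)
    keep = f >= f0_hz / 2                                 # ignore the DC region
    dist = np.abs(f - f0_hz * np.round(f / f0_hz))       # distance to nearest harmonic
    return spec[keep & (dist <= f0_hz / 4)].sum() / spec[keep].sum()
```

A periodic train keeps nearly all of its energy at the harmonics. With 5 % jitter at 160 Hz, the higher harmonics spread into the gaps between them, which is the lobe smearing that raises the MBE matching error.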

Fujimura-type Experiment
Highly jittered vowel /ɑ/. Examples: reference; MBE-modeled (note the unfused noise); MBE-modeled with forced decisions.

Experiments with Natural Speech
Goal: to study the MBE representation of unvoiced and voiced fricatives, breathy voice, rough and hoarse voices, and speech in a noisy background, and to understand the implications of simplifying the excitation to single-band (SBE) or two-band excitation (TBE).
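One way to derive SBE and TBE decisions from a frame's band-wise MBE decisions is sketched below. SBE keeps a single decision per frame. TBE keeps a single cutoff, with bands below it voiced and bands above it unvoiced. The collapsing rules are assumptions chosen for illustration, not the procedure used in these experiments: SBE uses a majority vote, and TBE picks the cutoff that disagrees with the fewest MBE band decisions, taking the lowest such cutoff on ties.

```python
def to_sbe(voiced):
    """Single-band excitation: one decision per frame (majority vote, an assumption)."""
    return sum(voiced) * 2 > len(voiced)


def to_tbe(voiced):
    """Two-band excitation: voiced below a cutoff band, unvoiced from the cutoff up.

    The cutoff minimizes disagreements with the MBE band decisions (an assumption).
    """
    best_cut, best_cost = 0, None
    for cut in range(len(voiced) + 1):
        cost = sum(v != (i < cut) for i, v in enumerate(voiced))
        if best_cost is None or cost < best_cost:
            best_cut, best_cost = cut, cost
    return [i < best_cut for i in range(len(voiced))]


mbe = [True, True, False, True, False, False]
print(to_tbe(mbe), to_sbe(mbe))
```

In this example the isolated voiced band 4 is devoiced by TBE, and SBE marks the whole frame unvoiced. Either simplification discards the scattered noise bands that MBE can place in the time-frequency plane.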

VCV: /ɑzɑ/
Examples: reference; MBE-modeled; SBE-modeled.

VCV: /ɑʒɑ/
Examples: reference; MBE-modeled.

Voice Quality: Breathy
Examples: reference; MBE-modeled; TBE-modeled (buzzy).

Voice Quality: Harsh
Examples: reference; MBE-modeled; TBE-modeled.

Voice Quality: Rough
Examples: reference; MBE-modeled.

Noise-Corrupted Speech (15 dB SNR)
Examples: reference; MBE-modeled; TBE-modeled (buzzy).

Conclusions
MB excitation represents frication and aspiration accurately, which is especially crucial for noisy speech.
Modulation aperiodicities are not captured at high pitches except through devoiced bands. Depending on the threshold settings, the noise bands may not fuse perceptually.
It is possible to partially simulate the perceptual effects of jitter/shimmer by controlled devoicing of bands in the time-frequency space.

Thank you