IIT Bombay, ICA 2004, Kyoto, Japan, April 4-9, 2004. Sections: Introduction, HNM, Methodology, Results, Conclusions.


1 Harmonic Plus Noise Model Based Speech Synthesis in Hindi and Pitch Modification
By P.K. Lehana and P.C. Pandey, IIT Bombay, India
ICA 2004, Kyoto, April 4-9, 2004 / Session: SPP02, Paper No (Th.P3.17)

2 ABSTRACT
In the harmonic plus noise model (HNM), each segment of speech is modeled as two bands with a dynamically varying boundary: a lower "harmonic" part, represented by the amplitudes and phases of the harmonics of a fundamental, and an upper "noise" part, modeled as an all-pole filter excited by random white noise. HNM-based synthesis gives good-quality output with a relatively small number of parameters, and it permits pitch and time scaling without explicit estimation of vocal tract parameters. We investigated its use for synthesis in Hindi, which has aspirated stops and lacks voiced fricatives. Good-quality synthesis could be carried out, including that of aspirated stops. The upper band of HNM was needed only for the palatal and alveolar fricatives. We also studied the sensitivity of output quality to errors in glottal closure instants: random perturbations exceeding 4% of the local pitch period resulted in noticeable degradation. Synthesis with pitch scaling showed that the frequency scale of the amplitudes and phases of the harmonics of the original signal needed to be modified by a speaker-dependent warping function, obtained by studying the relationship between pitch frequency and formant frequencies for the three cardinal vowels spoken at different pitches.

3 OVERVIEW
- Introduction
- Harmonic plus noise model (HNM)
- Methodology
- Results
- Conclusions

4 INTRODUCTION
Research objective: use of harmonic plus noise model (HNM) based pitch-synchronous synthesis to study
- Speech synthesis with phoneme sets in Hindi ("aspiration" is a feature for stops)
- Effect of perturbations in glottal closure instants (GCIs) on speech quality, using the electroglottogram (EGG) for accurate specification of GCIs
- Speaker modification

5 HARMONIC PLUS NOISE MODEL (1/3)
Harmonic plus Noise Model (HNM) of speech (Stylianou, 1995; 2001)
Speech signal divided into:
- Harmonic part
- Noise part
Parameters:
- Maximum voiced frequency
- Voiced/unvoiced (V/UV) decision and pitch
- Harmonic amplitudes and phases
- Noise parameters
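The parameter list above can be illustrated with a minimal sketch of how one voiced frame might be rebuilt from it. This is not the authors' implementation: the function names, the sample-by-sample loops, and the Gaussian noise source are assumptions made for illustration, and the high-pass shaping that confines the noise part to the band above the maximum voiced frequency is omitted for brevity.

```python
import math
import random


def harmonics_below(f0, fmax):
    """Number of harmonics k*f0 that lie at or below the maximum voiced frequency."""
    return int(fmax // f0)


def synth_harmonic(f0, amps, phases, fs, n):
    """Harmonic part: sum of cosines at k*f0 with the given amplitudes and phases."""
    out = []
    for t in range(n):
        s = 0.0
        for k, (a, ph) in enumerate(zip(amps, phases), start=1):
            s += a * math.cos(2.0 * math.pi * k * f0 * t / fs + ph)
        out.append(s)
    return out


def synth_noise(lpc, gain, n, seed=0):
    """Noise part: white noise through an all-pole filter 1 / (1 + sum a_i z^-i).

    The upper-band (high-pass) restriction is not applied in this sketch.
    """
    rng = random.Random(seed)
    out = []
    for t in range(n):
        y = gain * rng.gauss(0.0, 1.0)
        for i, a in enumerate(lpc, start=1):
            if t - i >= 0:
                y -= a * out[t - i]
        out.append(y)
    return out


def synth_frame(f0, fmax, amps, phases, lpc, gain, fs, n, seed=0):
    """One voiced frame: harmonics up to fmax plus filtered noise."""
    k = harmonics_below(f0, fmax)
    h = synth_harmonic(f0, amps[:k], phases[:k], fs, n)
    w = synth_noise(lpc, gain, n, seed)
    return [a + b for a, b in zip(h, w)]
```

Setting `gain` to zero gives a harmonic-only frame, which corresponds to the "Synth. (H)" condition in the results; a non-zero gain adds the noise part ("Synth. (H+N)").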

6 HARMONIC PLUS NOISE MODEL (2/3)
Analysis [figure not included in transcript]

7 HARMONIC PLUS NOISE MODEL (3/3)
Synthesis [figure not included in transcript]

8 METHODOLOGY (1/3)
Synthesis with Hindi Phoneme Sets
Material: recordings of speech and electroglottogram (EGG) for
- Isolated vowels
- Syllables
- Words
- Sentences
Processing: analysis/synthesis of recorded material using HNM

9 METHODOLOGY (2/3)
Effect of Pitch Perturbation on Speech Quality
Material: 2-channel recording of vowels for male and female speakers
- Speech signal
- EGG from impedance glottograph
Processing:
- Estimation of pitch periods from the speech signal and from the EGG
- Analysis of vowels for HNM parameters
- Resynthesis, with % perturbation in GCIs
- Assessment of quality of resynthesized vowels
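The perturbation step can be sketched as follows. The slides do not state how the random perturbations were drawn, so the uniform distribution, the choice to leave the first GCI fixed, and the names `pitch_periods` and `perturb_gcis` are assumptions for illustration only. GCIs are taken as a sorted list of time instants in seconds.

```python
import random


def pitch_periods(gcis):
    """Local pitch periods: intervals between successive GCIs."""
    return [b - a for a, b in zip(gcis, gcis[1:])]


def perturb_gcis(gcis, max_percent, seed=0):
    """Shift each GCI (except the first) by a random offset.

    The offset is drawn uniformly (an assumption) from
    +/- max_percent of the local pitch period preceding that GCI.
    """
    rng = random.Random(seed)
    periods = pitch_periods(gcis)
    out = [gcis[0]]
    for i in range(1, len(gcis)):
        local = periods[i - 1]
        shift = rng.uniform(-1.0, 1.0) * (max_percent / 100.0) * local
        out.append(gcis[i] + shift)
    return out
```

For example, `perturb_gcis(gcis, 4.0)` moves each instant by at most 4% of its local pitch period. The order of the instants is preserved as long as `max_percent` stays below 50.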

10 METHODOLOGY (3/3)
Spectral Modifications
Material: sustained vowels at different notes by male and female speakers
Processing:
- Study of F0 and formants in cardinal vowels
- Formant synthesis after interchanging the notes
- Scaling of HNM parameters by pitch-scaling factor

11 RESULTS (1/7)
ANALYSIS/SYNTHESIS RESULTS
Synthesis with Hindi Phoneme Sets
- All vowels and VCVs are natural and intelligible when synthesized using the harmonic part only, except /aʃa/ and /asa/, which also require the noise part.
- GCIs obtained from the glottal signal (EGG) give better synthesis.

12 RESULTS (2/7)
Example 1: Synthesis of /ata/
[panels: Recorded; Synth. (H); Synth. (H+N)]

13 RESULTS (3/7)
Example 2: Synthesis of /aʃa/
[panels: Recorded; Synth. (H); Synth. (H+N)]

14 RESULTS (4/7)
Effect of Pitch Perturbation, Example: vowel /a/
[panels: Recorded; Syn. and Syn. with GCI perturbation < 4%, for GCIs from speech and GCIs from EGG]

15 RESULTS (5/7)
Effect of Pitch Perturbation on Vowel Quality

Quality                  GCI perturbation,    GCI perturbation,
                         GCIs from speech     GCIs from EGG
Acceptable               < 4 %                < 6 %
Noticeable degradation   [...] %              [...] %
Unacceptable             > 8 %                > 10 %

16 RESULTS (6/7)
F0 and Formant Relations
- F1 increases monotonically with F0.
- Interchanging the notes results in unnatural output, so a proper relation between F0 and the formants is necessary.
- The relationship between F0 and the formants is speaker dependent.

17 RESULTS (7/7)
Scaling of HNM parameters by a speaker-dependent scaling factor gives more natural output.

Quality        Scaling factor
Natural        < 1.5
Degradation    [...]
Unacceptable   > 2.0

Examples: recorded /a/ (male, 117 Hz); /a/ pitch-scaled by 2.1
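One way to picture the scaling of harmonic amplitudes is the sketch below. The slides do not give the speaker-dependent warping function, so the single constant `warp_factor`, the piecewise-linear envelope, and both function names are hypothetical; phases are ignored here. With `warp_factor` equal to `pitch_factor`, each harmonic keeps its amplitude and the whole spectrum is stretched linearly; with `warp_factor` equal to 1, the original spectral envelope (and hence the formant positions) is kept.

```python
def interp_envelope(freqs, amps, f):
    """Piecewise-linear amplitude envelope through the harmonic peaks (clamped at the ends)."""
    if f <= freqs[0]:
        return amps[0]
    if f >= freqs[-1]:
        return amps[-1]
    for i in range(1, len(freqs)):
        if f <= freqs[i]:
            w = (f - freqs[i - 1]) / (freqs[i] - freqs[i - 1])
            return amps[i - 1] + w * (amps[i] - amps[i - 1])
    return amps[-1]


def pitch_scale(f0, amps, pitch_factor, warp_factor):
    """Return (new_f0, new_amps) for a pitch-scaled frame.

    New harmonics sit at k * pitch_factor * f0, up to the original
    highest harmonic frequency. Their amplitudes are read from the
    original envelope at f / warp_factor (warp_factor is a
    hypothetical stand-in for the speaker-dependent warping).
    """
    fmax = f0 * len(amps)
    old_freqs = [f0 * k for k in range(1, len(amps) + 1)]
    new_f0 = f0 * pitch_factor
    n_new = int(fmax // new_f0)
    new_amps = []
    for k in range(1, n_new + 1):
        f = k * new_f0
        new_amps.append(interp_envelope(old_freqs, amps, f / warp_factor))
    return new_f0, new_amps
```

A speaker-dependent relation between F0 and the formants would correspond to a warp somewhere between these two extremes, possibly varying with frequency rather than constant.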

18 CONCLUSIONS
- HNM-based synthesis provided good-quality synthesis in Hindi.
- GCI perturbations > 4% → quality degradation.
- GCIs from EGG → better output, indicating HNM's sensitivity to pitch estimation errors.
- Modest pitch modification is possible with linear frequency scaling of HNM parameters.


20 Harmonic Plus Noise Model Based Speech Synthesis in Hindi and Pitch Modification
By P.K. Lehana and P.C. Pandey, EE Dept, IIT Bombay, India
ICA 2004, Kyoto, April 4-9, 2004 / Session: SPP02, Paper No (Th.P3.17)