IIT Bombay, 17th National Conference on Communications, 28-30 Jan. 2011, Bangalore, India. Sp Pr. 1, P3. Detection of Burst Onset Landmarks in Speech.

Presentation transcript:

Detection of Burst Onset Landmarks in Speech Using Rate of Change of Spectral Moments
A. R. Jayan, P. S. Rajath Bhat, P. C. Pandey
{arjayan, rajathbhat,
EE Dept, IIT Bombay
30th January, 2011

PRESENTATION OUTLINE
1. Introduction
● Speech landmarks
● Landmark detection
● Clear speech
● Automated speech intelligibility enhancement
2. Methodology
● Band energy parameters
● Spectral moments
● Rate of change function
3. Evaluation and results
● VCV utterances
● Sentences
4. Conclusion

1. INTRODUCTION
Speech landmarks
Regions, associated with spectral transitions, containing important information for speech perception
Landmarks and related events [Park, 2008]
Segment type | Landmark     | Description
Vowel        | Vowel (V)    | Vowel nucleus
Glide        | Glide (G)    | Slow formant transitions
Consonant    | Glottis (g)  | Vocal fold vibration
Consonant    | Sonorant (s) | Nasal closure / release
Consonant    | Burst (b)    | Turbulence noise

Landmark detection
Processing
● Extraction of parameters characterizing the landmark
● Computation of the rate of change (ROC) of parameters
● Locating the landmark using ROC(s)
Applications
● Intelligibility enhancement
● Speech recognition
● Vocal tract shape estimation

Clear speech
● Speech produced with clear articulation when talking to a hearing-impaired listener, or in a noisy environment
More intelligible for
▪ Hearing impaired listeners (~17% higher, Picheny et al., 1985)
▪ Listeners in noisy environments (Payton et al., 1994)
▪ Non-native listeners (Bradlow and Bent, 2002)
▪ Children with learning disabilities (Bradlow et al., 2003)
● Pronounced acoustic landmarks

[Figure: Example ‘The book tells a story’. Panels: Conv., Clear] (Recordings from

Automated speech intelligibility enhancement
Automated detection of landmarks
● High detection rate with low false detections
● Good temporal accuracy (5-10 ms)
● Computational efficiency
Modification of speech characteristics
Intensity / duration / spectral modifications around landmarks with minimal perceptual distortions of the acoustic cues in the speech signal

Problems in stop consonant perception
● Transient sound with low intensity
● Severely affected by noise / hearing impairment
Stop landmarks:
● Closure
● Burst onset
● Onset of voicing
Example: /apa/

Some of the earlier landmark detection techniques
● Liu (1996): Rate-of-rise measures of parameters from a set of fixed spectral bands (Speech recognition, g, s, b landmarks, 80 TIMIT sentences, detection rate: 84% at ms, 50% at 5-10 ms)
● Salomon et al. (2002): Temporal parameters related to periodicity, envelope, spectral fine structure (Speech recognition, onsets and offsets of vowels, sonorants, & consonants, 120 TIMIT sentences, detection rate: 90% at 20 ms)
● Sainath and Hazan (2006): Sinusoidal model parameters (Speech segmentation, 453 TIMIT sentences, word error rates: 20%)
● Niyogi & Sondhi (2002): Stop landmark detection using total energy, energy above 3 kHz & Wiener entropy (Speech recognition, stop consonants, 320 TIMIT sentences, detection rate: 90% at 20 ms)
● Jayan & Pandey (2009): Stop landmark detection using GMM parameters (Speech enhancement, 50 TIMIT sentences, detection rate: 73% at 5 ms)

Improving landmark detection
● Parameters
▪ Capturing spectral transitions
▪ Adaptation to speech variability
● Rate of change measure
▪ Range of parameter variations
▪ Correlation among parameters
● Adaptive time steps
▪ Small time step for abrupt variations
▪ Large time step for slow variations
Objective of the present investigation
Detection of burst landmarks for automated intelligibility enhancement

2. METHODOLOGY
Band energy parameters
Log of spectral peaks in three bands
▪ b1: kHz
▪ b2: kHz
▪ b3: kHz
● Magnitude spectrum (10 kHz sampling) computed using 512-point DFT, 6 ms Hanning window, 1 frame per ms, and smoothed by 20-point moving average.
● Smoothed magnitude spectrum X(n, k) used for calculating log of spectral peak in band i, where n = time index, k = frequency index
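The band energy computation described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the band edges in `HYPOTHETICAL_BANDS_HZ` are placeholders (the slide's values did not survive extraction), the moving average is assumed to run along the time axis, and the log of the spectral peak is assumed to be expressed in dB.

```python
import numpy as np

# ASSUMPTION: placeholder band edges in Hz; the slide's actual b1..b3 limits are not available.
HYPOTHETICAL_BANDS_HZ = [(0, 1500), (1500, 3500), (3500, 5000)]

def smoothed_spectrogram(x, fs=10_000, win_ms=6.0, hop_ms=1.0, nfft=512, smooth_pts=20):
    """Smoothed magnitude spectrum X(n, k): one row per 1 ms frame."""
    x = np.asarray(x, dtype=float)
    win_len = int(round(win_ms * 1e-3 * fs))   # 60 samples at 10 kHz
    hop = int(round(hop_ms * 1e-3 * fs))       # 10 samples at 10 kHz
    n_frames = 1 + (len(x) - win_len) // hop
    if n_frames < smooth_pts:
        raise ValueError("signal too short for the requested smoothing length")
    window = np.hanning(win_len)
    frames = np.stack([x[i * hop: i * hop + win_len] * window for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, n=nfft, axis=1))   # zero-padded to nfft points
    # ASSUMPTION: the 20-point moving average is taken along the time (frame) axis.
    kernel = np.ones(smooth_pts) / smooth_pts
    return np.apply_along_axis(lambda col: np.convolve(col, kernel, mode="same"), 0, mag)

def band_energy_params(X, bands_hz=HYPOTHETICAL_BANDS_HZ, fs=10_000, nfft=512):
    """Log (dB) of the spectral peak in each band, per frame: shape (frames, bands)."""
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    feats = []
    for lo, hi in bands_hz:
        in_band = (freqs >= lo) & (freqs < hi)
        peak = X[:, in_band].max(axis=1)
        feats.append(10.0 * np.log10(peak ** 2 + 1e-12))  # small floor avoids log(0)
    return np.stack(feats, axis=1)
```

With a 10 kHz signal `x`, `band_energy_params(smoothed_spectrogram(x))` returns one row of three band energies per millisecond.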

[Figure: Example of band energy parameters for /aga/. Panels: (a) speech waveform, (b) band energies; x-axis: time (ms)]

Spectral moments
Computed from the normalized spectrum
● Centroid: frequency of energy concentration
● Variance: spread of energy around the centroid
● Skewness: measure of spectral symmetry
● Kurtosis: measure of spectral peakiness
n = time index, k = frequency index, N = DFT size
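The slide's equations for the four moments were lost in extraction, so the following sketch uses the standard definitions of moments of a distribution, treating each frame's normalized spectrum as a probability mass over frequency. Normalizing by the sum of magnitudes and expressing frequency in Hz are assumptions; the original may normalize differently or work in bin indices.

```python
import numpy as np

def spectral_moments(X, fs=10_000, nfft=512):
    """Per-frame [centroid, variance, skewness, kurtosis] of the normalized spectrum.

    X has shape (frames, nfft // 2 + 1), e.g. a smoothed magnitude spectrum.
    """
    X = np.asarray(X, dtype=float)
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    # ASSUMPTION: normalize so that each frame's spectrum sums to 1.
    p = X / (X.sum(axis=1, keepdims=True) + 1e-12)
    centroid = (p * freqs).sum(axis=1)
    dev = freqs[None, :] - centroid[:, None]
    variance = (p * dev ** 2).sum(axis=1)
    sigma = np.sqrt(variance) + 1e-12            # floor avoids division by zero
    skewness = (p * dev ** 3).sum(axis=1) / sigma ** 3
    kurtosis = (p * dev ** 4).sum(axis=1) / sigma ** 4
    return np.stack([centroid, variance, skewness, kurtosis], axis=1)
```

A spectrum that is symmetric about its centroid gives a skewness near zero, and a spectrum concentrated in two equal, narrow peaks gives a kurtosis near 1, which is a convenient sanity check.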

[Figure: Example of band energy parameters & spectral moments for /aga/. Panels: (a) waveform, (b), (c), (d); x-axis: time (ms)]

Measures of rate of change
● First difference based rate of change (ROC)
K = time step
● Mahalanobis distance based rate of change (ROC-MD)
A single measure indicative of the overall variation, taking care of parameter range and correlation effects
y(n) = parameter set at time n
K = time step
Σ = covariance matrix, pre-calculated using the parameter set from segments with energy above a threshold
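The two rate-of-change measures can be sketched as below. The slide's formulas are missing, so the backward difference y(n) - y(n - K) is an assumption (a centered difference y(n + K) - y(n - K) is equally plausible), and the helper name `covariance_of_active_frames` is hypothetical.

```python
import numpy as np

def roc_first_difference(y, K):
    """ROC(n) = y(n) - y(n - K) for K >= 1; the first K frames are set to 0."""
    y = np.asarray(y, dtype=float)
    roc = np.zeros_like(y)
    roc[K:] = y[K:] - y[:-K]
    return roc

def covariance_of_active_frames(Y, frame_energy_db, threshold_db):
    """Covariance of the parameter set over frames whose energy exceeds a threshold."""
    active = np.asarray(frame_energy_db) > threshold_db
    return np.cov(np.asarray(Y, dtype=float)[active], rowvar=False)

def roc_mahalanobis(Y, K, cov):
    """ROC-MD(n) = sqrt(d^T cov^-1 d), with d = Y(n) - Y(n - K).

    Y has shape (frames, parameters); cov is the pre-calculated covariance matrix.
    """
    d = roc_first_difference(Y, K)
    cov_inv = np.linalg.pinv(np.atleast_2d(cov))   # pseudo-inverse tolerates near-singular cov
    quad = np.einsum("ni,ij,nj->n", d, cov_inv, d)
    return np.sqrt(np.maximum(quad, 0.0))          # clip tiny negative rounding errors
```

Because the difference vector is scaled by the inverse covariance, a parameter with a large numeric range (such as a centroid in Hz) does not dominate one with a small range, and correlated parameters are not counted twice.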

Detection of voicing offset and onset
▪ Band energy in Hz
▪ ROC(n) computed with time step 50 ms
▪ Voicing offset [g-]: ROC(n) ≤ -12 dB
▪ Voicing onset [g+]: ROC(n) ≥ +12 dB
Burst onset landmark detection
Most prominent peak in the ROC-MD(n) between g- and g+
[Figure: Example /aga/. Panels: (a) waveform, (b) ROC-MD, (c) ROC; x-axis: time (ms)]
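The landmark search on this slide can be sketched as follows, assuming one frame per millisecond so that frame indices are times in ms. Several simplifications are assumptions: only the first voicing offset and the first onset after it are used (a single stop per utterance), the thresholds are inclusive, and the "most prominent peak" is taken to be the largest ROC-MD value in the interval.

```python
import numpy as np

def voicing_landmarks(roc_db, thresh_db=12.0):
    """Return (g_minus, g_plus) frame indices, or None if no offset/onset pair is found."""
    roc_db = np.asarray(roc_db, dtype=float)
    offsets = np.flatnonzero(roc_db <= -thresh_db)          # voicing offset candidates
    if offsets.size == 0:
        return None
    g_minus = int(offsets[0])
    onsets = np.flatnonzero(roc_db[g_minus:] >= thresh_db)  # onsets after the offset
    if onsets.size == 0:
        return None
    return g_minus, g_minus + int(onsets[0])

def burst_onset(roc_md, g_minus, g_plus):
    """Index of the largest ROC-MD value strictly between g- and g+, or None."""
    if g_plus - g_minus < 2:
        return None
    segment = np.asarray(roc_md, dtype=float)[g_minus + 1: g_plus]
    return g_minus + 1 + int(np.argmax(segment))
```

Restricting the search to the closure interval between g- and g+ is what keeps large ROC-MD values elsewhere in the utterance, such as at vowel transitions, from being reported as bursts.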

3. EVALUATION & RESULTS
Effects of rate of change functions & parameters on burst detection
ROC and parameters
1) ROC(BE): Sum of normalized ROCs of [E_b1, E_b2, E_b3]
2) ROC-MD(BE): ROC-MD of [E_b1, E_b2, E_b3]
3) ROC-MD(SM): ROC-MD of [F_c, F_σ, F_k, F_s]
4) ROC-MD(BE,SM): ROC-MD of [E_b1, E_b2, E_b3, F_c, F_σ, F_k, F_s]
Material: VCV utterances, TIMIT sentences
Time steps: 3, 6 ms
Temporal accuracies: 3, 5, 10, 15, 20 ms
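One plausible way to score detections at the listed temporal accuracies is sketched below. The scoring rule is an assumption, since the slides do not state it: a reference burst counts as detected at tolerance T if any detected landmark lies within T ms of it, without enforcing one-to-one matching.

```python
import numpy as np

def detection_rates(detected_ms, reference_ms, tolerances_ms=(3, 5, 10, 15, 20)):
    """Percentage of reference landmarks with a detection within each tolerance (ms)."""
    detected = np.asarray(detected_ms, dtype=float)
    reference = np.asarray(reference_ms, dtype=float)
    if detected.size == 0:
        nearest = np.full(reference.shape, np.inf)   # nothing detected: every reference missed
    else:
        # Distance from each reference landmark to its closest detection.
        nearest = np.abs(reference[:, None] - detected[None, :]).min(axis=1)
    return {tol: 100.0 * float(np.mean(nearest <= tol)) for tol in tolerances_ms}
```

For example, with reference bursts at 100, 200, 300 and 400 ms and detections at 102, 208 and 330 ms, one of the four references is matched within 3 ms and two are matched within 10 ms.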

VCV utterances
▪ 6 stop consonants (b, d, g, p, t, k)
▪ 3 vowel contexts (a, i, u)
▪ 10 speakers (5 M, 5 F)
▪ 180 tokens

TIMIT Sentences
▪ 5 speakers (2 M, 3 F)
▪ 10 sentences from each speaker
▪ 238 tokens
Insertion rates (%)
Error type             | ROC(BE) | ROC-MD(BE) | ROC-MD(SM) | ROC-MD(BE,SM)
Vowel / sem. vowel     |         |            |            |
Frication              |         |            |            |
Glottal stops / clicks | 4       | 3          | 3          | 4

4. CONCLUSION
● Increase in time steps reduced detection accuracy.
● Mahalanobis distance based ROC was more effective than first-difference based rate of change.
● Spectral moments were useful as additional parameters in improving burst-onset detection.

Thank you