IIT Bombay 1/26 Automated CVR Modification for Improving Perception of Stop Consonants A. R. Jayan & P. C. Pandey, EE Dept., IIT Bombay, 7 Jan. 2012

IIT Bombay 2/26 PRESENTATION OUTLINE
1. Introduction
   ▪ Clear speech
   ▪ Intelligibility enhancement using properties of clear speech
2. Landmark detection
   ▪ Speech landmarks
   ▪ Processing steps
   ▪ Evaluation
3. CVR modification
4. Evaluation and results
5. Conclusion

IIT Bombay 3/26 1. INTRODUCTION
Clear speech
● Speech produced with clear articulation when talking to a hearing-impaired listener, or in a noisy environment
● More intelligible for
  ▪ Hearing-impaired listeners (~17% higher, Picheny et al., 1985)
  ▪ Listeners in noisy environments (Payton et al., 1994)
  ▪ Non-native listeners (Bradlow and Bent, 2002)
  ▪ Children with learning disabilities (Bradlow et al., 2003)
● Pronounced acoustic landmarks

IIT Bombay 4/26 Example: 'The book tells a story' (recordings from …)
[Figure: conversational (Conv.) vs. clear speech; time axis in seconds]

IIT Bombay 5/26 Acoustic properties of clear speech
● Sentence level
  ▪ Reduced speaking rate (conv.: 200 wpm, clear: 100 wpm)
  ▪ More pauses, increased pause durations
  ▪ Increase in fundamental frequency and its dynamic range
  ▪ Increased modulation depth of the intensity envelope
  ▪ Increased energy in 1-3 kHz
● Word level
  ▪ Fewer sound deletions
  ▪ More sound insertions
● Phonetic level
  ▪ Increased intensity of consonant segments relative to vowel segments → increased consonant-vowel ratio (CVR)
  ▪ Context-dependent, non-uniform increase in segment durations
  ▪ More targeted vowel formants

IIT Bombay 6/26 Intelligibility enhancement using properties of clear speech
Approaches
▪ CVR modification: increasing consonant segment intensity relative to the nearby vowel
▪ Duration modification: expansion of consonant durations and compression of steady-state vowel segments
▪ Spectral modification: shifting the frequency of energy concentration of bursts and fricatives
Challenges
▪ Accurate detection of regions for modification
▪ Speech modification with low processing artifacts
▪ Processing without altering the overall speaking rate

IIT Bombay 7/26 Earlier investigations on CVR modification
Manual segmentation and segment-specific modification
▪ Gordon-Salant (1986): CV syllables, CVR +10 dB, 12-talker babble noise, NHI, +14%
▪ Montgomery et al. (1987): 100 CVC syllables, CVR adjusted to 0 dB, SNHI, +10.5%
▪ Kennedy et al. (1997): VC syllables, optimum increase in CVR: 8.3 dB (voiced), 10.7 dB (unvoiced), SNHI
▪ Hazan & Simpson (1998): burst +12 dB, fric. +6 dB, nas. +6 dB filtering, VCV, SUS, NH, +12%
Synthesis of speech stimuli with segment-specific modification
▪ Thomas (1996): CVR +3 to +12 dB, CV & VC, broad-band noise, NH, +14%
Clear speech 4/5

IIT Bombay 8/26 Automated speech intelligibility enhancement
▪ Colotte & Laprie (2000)
  ▪ Segmentation by an MFCC-based spectral variation function
  ▪ Detection rate (within 20 ms): bursts (82%), fricatives (91%)
  ▪ Processing: stops & unvoiced fricatives enhanced by +4 dB & time-scaled by 1.8 (TD-PSOLA)
  ▪ Improvement in missing-word identification: 9% (intensity scaling), 14% (time & intensity scaling)
▪ Ortega et al. (2000)
  ▪ Broad-class HMM classifier
  ▪ Detection rate: bursts (80%), fricatives (75%), nasals (90%), vowel onsets (78%), offsets (76%)
  ▪ Processing: frication and nasals +6 dB, bursts +12 dB
  ▪ Results for tests with speech-spectrum-shaped noise as masker (-5 dB SNR): −1.4% for VCV

IIT Bombay 9/26 2. LANDMARK DETECTION
Speech landmarks: regions, associated with spectral transitions, containing important information for speech perception
Landmarks and related events
  Segment type | Landmark     | Description
  Vowel        | Vowel (V)    | Vowel nucleus
  Glide        | Glide (G)    | Slow formant transitions
  Consonant    | Glottis (g)  | Vocal fold vibration
  Consonant    | Sonorant (s) | Nasal closure / release
  Consonant    | Burst (b)    | Turbulence noise

IIT Bombay 10/26 Stop consonant landmarks
● Closure [g−]
● Burst onset [b]
● Onset of voicing [g+]
Example: /apa/
● Transient sound with low intensity
● Severely affected by noise / hearing impairment

IIT Bombay 11/26 Processing steps
▪ Extraction of parameters characterizing the landmark
▪ Computation of the rate of change (ROC) of the parameters
▪ Locating the landmark using the ROC(s)
Parameters
Magnitude spectrum (10 kHz sampling) computed using a 512-point DFT, 6 ms Hanning window, 1 frame per ms, and smoothed by a 20-point moving average; n = time index, k = frequency index.
▪ Log of the spectral peak in 0-400 Hz, E_g(n), and in 1.2-5 kHz, E_b(n)
▪ Spectral centroid (first spectral moment) of the tone-added spectrum (tone at -20 dB w.r.t. max. signal level)
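A minimal Python/NumPy sketch of this parameter extraction, assuming a 10 kHz signal, taking the 20-point smoothing to run along frequency, and taking the "tone" to be a constant added at -20 dB relative to the spectral maximum; all function and variable names are illustrative, not from the paper.

```python
import numpy as np

FS = 10000                          # sampling rate (Hz)
N_FFT = 512                         # 512-point DFT
WIN = np.hanning(int(0.006 * FS))   # 6 ms Hanning window (60 samples)
HOP = int(0.001 * FS)               # 1 frame per ms

def frame_spectra(x):
    """Magnitude spectra, one frame per ms; each frame's spectrum is smoothed
    by a 20-point moving average (assumed here to run along frequency)."""
    kernel = np.ones(20) / 20.0
    n_frames = 1 + (len(x) - len(WIN)) // HOP
    mags = np.empty((n_frames, N_FFT // 2 + 1))
    for i in range(n_frames):
        seg = x[i * HOP : i * HOP + len(WIN)] * WIN
        mags[i] = np.convolve(np.abs(np.fft.rfft(seg, N_FFT)), kernel, mode="same")
    return mags

def band_peak_db(mags, f_lo, f_hi):
    """Log of the spectral peak in a band: E_g(n) for 0-400 Hz, E_b(n) for 1.2-5 kHz."""
    freqs = np.fft.rfftfreq(N_FFT, d=1.0 / FS)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return 20 * np.log10(mags[:, band].max(axis=1) + 1e-12)

def centroid(mags, tone_db=-20.0):
    """Spectral centroid (first moment) of the tone-added spectrum; a constant
    'tone' at -20 dB re. the maximum spectral level is added before computing
    the moment (assumed interpretation of the slide)."""
    m = mags + mags.max() * 10 ** (tone_db / 20.0)
    k = np.arange(m.shape[1])
    return (k * m).sum(axis=1) / m.sum(axis=1)

# Usage (hypothetical signal x at 10 kHz):
#   S = frame_spectra(x)
#   E_g = band_peak_db(S, 0, 400); E_b = band_peak_db(S, 1200, 5000); C = centroid(S)
```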

IIT Bombay 12/26 Rate of change measures
ROC of a parameter and a Mahalanobis-distance-based ROC (ROC-MD) of the parameter set, with K = time step, y(n) = parameter set, Σ = covariance matrix
Detection of stop consonant landmarks
▪ Voicing offset and onset: ROC of E_g(n) computed with a time step of 50 ms
  ▪ Voicing offset [g−]: ROC(n) ≤ −12 dB
  ▪ Voicing onset [g+]: ROC(n) ≥ +12 dB
▪ Burst onset: most prominent peak in ROC-MD(n) between [g−] and [g+]
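The exact ROC definitions are given in the NCC 2011 paper cited in the bibliography (ref. 16); the sketch below assumes the simplest plausible forms: the ROC of a scalar parameter as a symmetric difference over ±K frames, and ROC-MD as the Mahalanobis distance between the parameter vectors K frames ahead of and behind the current frame. The function names, the covariance estimate, and the peak-picking logic are illustrative only.

```python
import numpy as np

def roc(p, K):
    """ROC of a scalar parameter p(n) over a time step of K frames,
    taken here as the symmetric difference p(n + K) - p(n - K)."""
    r = np.zeros_like(p)
    r[K:-K] = p[2 * K:] - p[:-2 * K]
    return r

def roc_md(Y, K):
    """Mahalanobis-distance ROC of the parameter set Y (frames x parameters):
    distance between y(n + K) and y(n - K) under the covariance of Y."""
    cov_inv = np.linalg.inv(np.cov(Y, rowvar=False))
    d = np.zeros(len(Y))
    for n in range(K, len(Y) - K):
        dy = Y[n + K] - Y[n - K]
        d[n] = np.sqrt(dy @ cov_inv @ dy)
    return d

def detect_stop_landmarks(E_g, Y, thresh_db=12.0, K=50):
    """Simplified single-stop case: voicing offset/onset from the ROC of E_g
    (50 ms time step at 1 frame per ms), burst onset as the most prominent
    ROC-MD peak between them."""
    r_g = roc(E_g, K)
    g_off = int(np.argmin(r_g))                  # strongest fall
    g_on = g_off + int(np.argmax(r_g[g_off:]))   # strongest rise after it
    if r_g[g_off] > -thresh_db or r_g[g_on] < thresh_db:
        return None                              # thresholds not met
    burst = g_off + int(np.argmax(roc_md(Y, K)[g_off:g_on]))
    return g_off, burst, g_on
```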

IIT Bombay 13/26 Example: /aga/
[Figure: (a) waveform, (b) ROC-MD, (c) ROC; time axis in ms]

IIT Bombay 14/26 Detection rates (%) for VCV utterances
Material
▪ 180 VCV utterances with manually marked burst onsets (6 stops, 3 vowel contexts, 10 speakers)
[Table: detection rate (%) vs. temporal accuracy (±5, ±10, ±15, ±20 ms) for burst onset (180), voicing offset (180), and voicing onset (180); voicing onset row: 76, 93, 98]

IIT Bombay 15/26 3. CVR MODIFICATION
Method
● Detect the voicing offset (g−), burst onset (b), and voicing onset (g+) landmarks
● Regions selected for modification:
  ▪ VC transition: [g− − 20 ms, g−]
  ▪ CV transition: [b − 10 ms, g+ … ms]
● +9 dB intensity scaling (cosine-tapered window, 2.5 ms rise and fall times); a sketch follows below
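A minimal sketch of the intensity-scaling step, assuming a 10 kHz sampling rate and landmark positions given in samples; the +9 dB gain is applied over the selected regions through a gain contour with 2.5 ms raised-cosine (cosine-tapered) transitions. The names, and the end point of the CV transition after g+ (left as a parameter), are assumptions.

```python
import numpy as np

FS = 10000  # Hz; landmark positions below are in samples

def scaling_function(n_samples, regions, gain_db=9.0, ramp_ms=2.5):
    """Gain contour in dB: `gain_db` inside each (start, end) region, 0 dB elsewhere,
    with raised-cosine rise/fall of `ramp_ms` at the edges.
    Assumes the regions are not at the very ends of the signal."""
    ramp = int(ramp_ms * FS / 1000)
    up = 0.5 * (1 - np.cos(np.pi * np.arange(ramp) / ramp))
    g = np.zeros(n_samples)
    for start, end in regions:
        g[start:end] = gain_db
        g[start - ramp:start] = np.maximum(g[start - ramp:start], gain_db * up)
        g[end:end + ramp] = np.maximum(g[end:end + ramp], gain_db * up[::-1])
    return 10 ** (g / 20.0)              # convert dB contour to an amplitude scale

def modify_cvr(x, g_off, b, g_on, cv_end_ms=10):
    """Apply +9 dB to the VC transition [g- - 20 ms, g-] and the CV transition
    [b - 10 ms, g+ + cv_end_ms]; cv_end_ms is an assumed offset after g+
    (not specified above)."""
    ms = FS // 1000
    regions = [(g_off - 20 * ms, g_off), (b - 10 * ms, g_on + cv_end_ms * ms)]
    return x * scaling_function(len(x), regions)
```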

IIT Bombay 16/26 Example: /aga/
[Figure: (a) original signal with landmarks, (b) windows selected for CVR modification, (c) scaling function for intensity enhancement, (d) modified signal]

IIT Bombay 17/26 /ada/Inf.12 dB6 dB0 dB-6 dB-12 dB Unp. Proc.

IIT Bombay 18/26 4. EVALUATION AND RESULTS
Test material
▪ 90 VCV utterances (6 stops /b, d, g, p, t, k/, 3 vowel contexts /a, i, u/, 5 speakers: 3 female + 2 male)
▪ CVR modified by +9 dB
▪ 2 processing conditions (orig., cvr)
▪ Speech-shaped noise (added uniformly, starting 1 s before and continuing for 1 s after each stimulus)
▪ SNR levels (∞, +12, +6, 0, −6, −12 dB)
▪ Randomized presentations, 5 subjects
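As a side note on the test setup, one common way to realize "noise starting 1 s before and continuing 1 s after each stimulus" at a target SNR is sketched below; the reference segment over which the SNR is measured (the stimulus alone) is an assumption, since it is not stated here.

```python
import numpy as np

def mix_at_snr(stimulus, noise, snr_db, fs, pad_s=1.0):
    """Pad the stimulus with `pad_s` s of silence on each side and add noise
    over the whole interval, scaled so that stimulus power / noise power
    equals 10**(snr_db / 10). `noise` must be at least as long as the
    padded signal; the SNR reference (stimulus-only power) is an assumption."""
    pad = int(pad_s * fs)
    padded = np.concatenate([np.zeros(pad), stimulus, np.zeros(pad)])
    n = noise[:len(padded)]
    scale = np.sqrt(np.mean(stimulus ** 2) /
                    (np.mean(n ** 2) * 10 ** (snr_db / 10)))
    return padded + scale * n
```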

IIT Bombay 19/26 Recognition scores (%)
▪ Identical performance for SNR levels 12 to 6 dB
▪ Nearly 10% to 20% improvement at SNR levels below 0 dB
[Table: improvement (Imp. %) and p-value vs. SNR (dB)]

IIT Bombay 20/26 Recognition scores, consonant-wise (%)
[Figure: voiced stops /b, d, g/ and unvoiced stops /p, t, k/]

IIT Bombay 21/26 Results of information transmission analysis
[Table: relative information transmission for the overall, place, and voicing features (orig. vs. cvr.) at SNRs of Inf., 12, 6, 0, −6, and −12 dB]
▪ At lower SNR levels, loss of information is mainly due to the place feature
▪ Intensity modification improves transmission of the place feature at lower SNRs
▪ Comparatively lower improvement in transmission of the voicing feature
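The information transmission analysis follows Miller and Nicely (1955, ref. 19): relative transmitted information is the mutual information of the stimulus-response confusion matrix normalized by the stimulus entropy, computed per feature by pooling confusions within feature classes. A sketch, with the consonant ordering and feature groupings stated as assumptions:

```python
import numpy as np

def relative_transmitted_info(conf):
    """Relative transmitted information of a stimulus-response confusion matrix
    (rows: stimuli, cols: responses): mutual information normalized by the
    stimulus entropy, as in Miller & Nicely (1955)."""
    p = conf / conf.sum()
    px = p.sum(axis=1, keepdims=True)           # stimulus probabilities
    py = p.sum(axis=0, keepdims=True)           # response probabilities
    nz = p > 0
    mi = np.sum(p[nz] * np.log2(p[nz] / (px @ py)[nz]))
    hx = -np.sum(px[px > 0] * np.log2(px[px > 0]))
    return mi / hx

def pool_by_feature(conf, groups):
    """Pool a stop-consonant confusion matrix into feature classes by summing
    the confusions within each group of stimulus/response indices."""
    m = np.zeros((len(groups), len(groups)))
    for i, gi in enumerate(groups):
        for j, gj in enumerate(groups):
            m[i, j] = conf[np.ix_(gi, gj)].sum()
    return m

# Assumed consonant order b, d, g, p, t, k:
#   place   = [[0, 3], [1, 4], [2, 5]]     # bilabial, alveolar, velar
#   voicing = [[0, 1, 2], [3, 4, 5]]       # voiced, unvoiced
# place_score = relative_transmitted_info(pool_by_feature(conf, place))
```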

IIT Bombay 22/26 Response time
▪ Response time increases at lower SNR levels
▪ The difference in response times for orig. and cvr. stimuli is marginal
▪ No significant increase in perceptual load on the subjects introduced by the intensity modification

IIT Bombay 23/26 5. CONCLUSION
▪ Automated detection of stop consonant landmarks improved by using the rate of change of spectral moments
▪ Automated CVR modification can be realized using automated landmark detection
▪ Listening tests with CVR modification:
  - Improvement in recognition scores for stop consonants: 7, 18, 25% at SNR levels of 0, −6, −12 dB
  - Improved transmission of the place feature
  - More effective for alveolar stop consonants
  - No change in perceptual load

IIT Bombay 24/26 Future work
▪ Establishing the optimum range of CVR
▪ Evaluation with hearing-impaired listeners
▪ Real-time implementation

IIT Bombay 25/26 Reference
A. R. Jayan and P. C. Pandey, "Automated CVR modification for improving perception of stop consonants," Proc. 18th National Conf. on Communications, 3-5 Feb. 2012, IIT Kharagpur, India.
Abstract — Increasing the intensity of the consonant segments relative to the nearby vowel segments, known as consonant-vowel ratio (CVR) modification, is reported to be effective in improving perception of stop consonants for listeners in noisy backgrounds and for hearing-impaired listeners. A technique for automated CVR modification using detection of acoustic landmarks corresponding to the stop release bursts, with high temporal accuracy, is investigated. Its effectiveness in improving perception of stop consonants in the presence of speech-spectrum shaped noise is evaluated by conducting listening tests on five normal-hearing subjects with VCV utterances involving six stop consonants and three vowels. The processing improved the recognition scores for stop consonants by nearly 7, 18, and 25% at SNR levels of 0, −6, and −12 dB, respectively.

IIT Bombay 26/26 References
1. M. A. Picheny, N. I. Durlach, and L. D. Braida, "Speaking clearly for the hard of hearing I: Intelligibility differences between clear and conversational speech," J. Speech Hear. Res., vol. 28, 1985.
2. K. L. Payton, R. M. Uchanski, and L. D. Braida, "Intelligibility of conversational and clear speech in noise and reverberation for listeners with normal and impaired hearing," J. Acoust. Soc. Am., vol. 95, 1994.
3. A. R. Bradlow, N. Kraus, and E. Hayes, "Speaking clearly for children with learning disabilities," J. Speech Lang. Hear. Res., vol. 46, pp. 80-97, 2003.
4. A. R. Bradlow and T. Bent, "The clear speech effect for non-native listeners," J. Acoust. Soc. Am., vol. 112, 2002.
5. F. R. Chen, "Acoustic characteristics and intelligibility of clear and conversational speech at the segmental level," M.S. dissertation, Massachusetts Institute of Technology.
6. M. A. Picheny, N. I. Durlach, and L. D. Braida, "Speaking clearly for the hard of hearing II: Acoustic characteristics of clear and conversational speech," J. Speech Hear. Res., vol. 29, 1986.
7. S. Gordon-Salant, "Recognition of time/intensity altered CVs by young and elderly subjects with normal hearing," J. Acoust. Soc. Am., vol. 80, 1986.
8. A. A. Montgomery and R. A. Edge, "Evaluation of two speech enhancement techniques to improve intelligibility for hearing-impaired adults," J. Speech Hear. Res., vol. 31.
9. E. Kennedy, H. Levitt, A. C. Neuman, and M. Weiss, "Consonant-vowel intensity ratios for maximizing consonant recognition by hearing-impaired listeners," J. Acoust. Soc. Am., vol. 103.
10. T. G. Thomas, "Experimental evaluation of improvement in speech perception with consonantal intensity and duration modification," Ph.D. dissertation, Dept. of Electrical Engineering, IIT Bombay, 1996.
11. V. Hazan and A. Simpson, "The effect of cue-enhancement on the intelligibility of nonsense word and sentence materials presented in noise," Speech Commun., vol. 24, 1998.
12. R. W. Guelke, "Consonant burst enhancement: A possible means to improve intelligibility for the hard of hearing," J. Rehab. Res. Develop., vol. 24.
13. V. Colotte and Y. Laprie, "Automatic enhancement of speech intelligibility," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Istanbul, Turkey, 2000.
14. M. Ortega, V. Hazan, and M. Huckvale, "Automatic cue enhancement of natural speech for improved intelligibility," Speech, Hearing and Language, vol. 12, pp. 42-56, 2000.
15. S. A. Liu, "Landmark detection for distinctive feature based speech recognition," J. Acoust. Soc. Am., vol. 100, 1996.
16. A. R. Jayan, P. S. Rajath Bhat, and P. C. Pandey, "Detection of burst onset landmarks in speech using rate of change of spectral moments," in Proc. 17th National Conference on Communications, Bangalore, India, 2011.
17. P. C. Mahalanobis, "On the generalized distance in statistics," in Proc. National Institute of Sciences of India, vol. 2, pp. 49-55, 1936.
18. J. R. Deller, J. H. L. Hansen, and J. G. Proakis, Discrete-Time Processing of Speech Signals, New York: John Wiley.
19. G. A. Miller and P. E. Nicely, "An analysis of perceptual confusions among some English consonants," J. Acoust. Soc. Am., vol. 27, 1955.

IIT Bombay 27/26 Thank you.