Presentation transcript:

Slide 1/30
IIT Bombay, ICSCN 2008: International Conference on Signal Processing, Communications and Networking

Automated Detection of Transition Segments for Intensity and Time-Scale Modification for Speech Intelligibility Enhancement

A. R. Jayan, P. C. Pandey, P. K. Lehana
EE Dept, IIT Bombay
5th January 2008

Slide 2/30: PAPER OUTLINE
1. Introduction
2. Acoustic Properties of Clear Speech
3. Automated Detection of Transition Segments
4. Intensity and Time-Scale Modification
5. Experimental Results
6. Summary and Conclusion

Slide 3/30: INTRODUCTION (Intro. 1/2)

Speech landmarks
- Regions in speech containing important information for speech perception
- Associated with spectral transitions
- Most landmarks coincide with phoneme boundaries

Landmark types
1. Abrupt-consonantal (AC): tight constrictions of the primary articulators
2. Abrupt (A): fast glottal or velum activity
3. Non-abrupt (N): semivowel landmarks, less vocal tract constriction
4. Vocalic (V): vowel landmarks, oral cavity maximally open, maximum energy and F1

Relative occurrence: abrupt ~68%, vocalic ~29%, non-abrupt ~3%

Slide 4/30: INTRODUCTION (Intro. 2/2)

Objective
- To improve speech intelligibility in quiet and noisy environments
- Automated detection of landmarks
- Speech modification using acoustic properties of clear speech

[Figure: landmarks]

Slide 5/30: ACOUSTIC PROPERTIES OF CLEAR SPEECH (Clear speech 1/5)

Clear speech: speech produced with clear articulation when talking to a hearing-impaired listener, or in noisy environments

Examples (conversational vs. clear): 'the book tells a story', 'the boy forgot his book'

Intelligibility of clear speech
- More intelligible for different classes of listeners and listening conditions
- Picheny et al. (1985): ~17% more intelligible than conversational speech

Slide 6/30: Acoustic properties of clear speech, Picheny et al. (1986) (Clear speech 2/5)

Sentence level
- Reduced speaking rate (conversational ~200 wpm, clear ~100 wpm)
- Larger variation in fundamental frequency
- Increased number of pauses, longer pause durations

Word level
- Fewer sound deletions
- More sound insertions

Phonetic level
- Context-dependent, non-linear increase in segment durations
- More targeted vowel formants
- Increased consonant intensity

Slide 7/30 (Clear speech 3/5)

- Acoustic cues in clear speech are more robust and discriminable
- The intelligibility of conversational speech can be improved by incorporating properties of clear speech:
  - Consonant-vowel intensity ratio (CVR) enhancement: increasing the ratio of the RMS energy of a consonant segment to that of the adjacent vowel (see the sketch after this slide)
  - Consonant duration enhancement: increasing VOT, burst duration, and formant transition duration

Difficulties
- Detection of the regions for modification
- Performing the modification with low signal-processing artifacts
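To make CVR enhancement concrete, here is a minimal sketch in Python. It assumes the consonant and vowel segment boundaries are already known (e.g. from the transition detection described later); the function name, the RMS-based CVR measure, and the +6 dB default are illustrative rather than taken from the paper.

```python
import numpy as np

def rms(x):
    """Root-mean-square level of a segment."""
    return np.sqrt(np.mean(np.square(x)) + 1e-12)

def enhance_cvr(signal, cons_range, vowel_range, gain_db=6.0):
    """Boost the consonant segment by gain_db, raising the consonant-vowel
    intensity ratio (CVR) by roughly the same amount.
    cons_range, vowel_range: (start, end) sample indices of the segments."""
    out = np.asarray(signal, dtype=float).copy()
    c0, c1 = cons_range
    out[c0:c1] *= 10.0 ** (gain_db / 20.0)        # dB -> linear amplitude gain
    v0, v1 = vowel_range
    new_cvr_db = 20.0 * np.log10(rms(out[c0:c1]) / rms(out[v0:v1]))
    return out, new_cvr_db
```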

Slide 8/30: Earlier studies on CVR enhancement (Clear speech 4/5)

- House et al. (1965): MRT, high scores for high consonant level
- Gordon-Salant (1986): CVR +10 dB, 19 CV, elderly SNHI, +16%
- Guelke (1987): burst intensity +17 dB, stop CV, NH, +40%
- Montgomery et al. (1987): CVR -20 dB to +9 dB, CVC, NH and SNHI, no significant loudness increase
- Freyman & Nerbonne (1989): equated consonant levels across talkers, CV syllables, NH, +12%
- Thomas & Pandey (1996): CVR +3 to +12 dB, CV & VC, NH, +16%
- Kennedy et al. (1997): CE 0-24 dB, VC, SNHI, max CE 8.3 dB (voiced), 10.7 dB (unvoiced)
- Hazan & Simpson (1998): burst +12 dB, fricatives +6 dB, nasals +6 dB, filtering, VCV, SUS, NH, +12%

Slide 9/30: Earlier studies on duration enhancement (Clear speech 5/5)

- Gordon-Salant (1986): DUR +100%, marginal improvement
- Thomas & Pandey (1996): BD +100%, FTD +50%, VOT +100%; BD, FTD → improved scores, VOT → degraded
- Vaughan et al. (2002): unvoiced consonants expanded by 1.2, effective in noisy conditions
- Nejime & Moore (1998): voiced segments expanded by 1.2 and 1.5, degraded performance
- Liu & Zeng (2006): temporal envelope (2-50 Hz) contributes at positive SNRs; fine structure (> 500 Hz) contributes at lower SNRs
- Hodoshima et al. (2007): slowed-down, steady-state-suppressed speech more intelligible in reverberant environments

Slide 10/30: AUTOMATED DETECTION OF TRANSITION SEGMENTS (Auto. Trans. 1/3)

Identifying regions for enhancement: segmentation / landmark detection

Manual segmentation
- Accurate, high detection rate
- Time consuming, subjective
- Useful only for research, not for actual application

Automated detection of segments
- Lower detection rate, less accurate
- Consistent

Segmentation based on spectral transition measures: maximum spectral transitions coincide with segment boundaries

Slide 11/30: Earlier studies on automated segmentation (Auto. Trans. 2/3)

- Mermelstein (1975): based on loudness variation; low detection rate; slow, carefully uttered speech
- Glass & Zue (1988): based on auditory critical bands; detection rate 90% within ±20 ms
- Sarkar & Sreenivas (2005): based on level-crossing rate with adaptive level allocation; detection rate 78.6% within ±20 ms
- Alani & Deriche (1999): wavelet-transform based, energy in different bands; detection rate 90.9% within ±20 ms
- Liu (1996): landmark detection algorithm, energy variation in spectral bands; detection rate 83% within ±20 ms

Slide 12/30: Earlier studies on automated intelligibility enhancement (Auto. Trans. 3/3)

Colotte & Laprie (2000)
- Segmentation by spectral variation function (82%)
- Stops and unvoiced fricatives amplified by +4 dB
- Time-scaled by 1.8 and 2.0 (TD-PSOLA)
- Missing-word identification, TIMIT sentences
- Improved performance

Skowronski & Harris (2006)
- Spectral transition measure based voiced/unvoiced classification
- Energy redistribution in voiced/unvoiced segments (ERVU)
- Amplifying low-energy temporal regions critical to intelligibility
- Confusable words, TI-46 corpus, 16 talkers, 25 subjects
- Improved performance for 9 talkers, no degradation for the others
- Enhancement useful for native and non-native listeners

Slide 13/30: PROPOSED METHOD FOR INTELLIGIBILITY ENHANCEMENT (Intel. Enh. 1/15)

- VC and CV transition segments expanded, steady-state segments compressed, with the overall speech duration kept unaltered (see the sketch after this slide)
- Intensity scaling of transition segments (CVR enhancement)
- Objective: reducing the masking of consonantal segments by vowel segments
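One way to keep the overall duration unaltered is to derive the steady-state compression factor from the transition expansion factor. The following is a minimal sketch under that assumption; the paper does not state this exact formula, and the names are illustrative.

```python
def steady_state_factor(total_dur, trans_dur, beta):
    """Compression factor alpha for the steady-state segments such that
    expanding the transition segments by beta leaves the total duration
    unchanged:
        alpha * (total_dur - trans_dur) + beta * trans_dur = total_dur
    Durations may be in seconds, samples, or frames."""
    return (total_dur - beta * trans_dur) / (total_dur - trans_dur)

# Example: a 600 ms utterance with 120 ms of transitions, beta = 2.0
# -> alpha = (600 - 240) / 480 = 0.75, i.e. steady states compressed to 75%.
```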

Slide 14/30: Liu's landmark detection algorithm (Intel. Enh. 2/15)

- Based on energy variation in 6 spectral bands
- Segment duration, articulatory, and phonetic class constraints
- Glottal and sonorant closures and releases, stop closures and releases
- Peak picking based on a convex-hull algorithm
- Matching of peaks across bands for locating boundaries
- Detection rate 83%, accuracy ±20 ms

Observations
- Assumptions in the method: spectral prominence represented by the peak energy in the band; one spectral prominence per band
- Information regarding the frequency location of the peak energy is not used

Slide 15/30: Landmark detection using spectral peaks and centroids (Intel. Enh. 3/15)

- Spectrum divided into five non-overlapping bands: 0-0.4, 0.4-1.2, 1.2-2.0, 2.0-3.5, 3.5-5.0 kHz
- Spectral peak and centroid estimated in each band and used for calculating a transition index
- Quantities computed per band: peak energy, centroid frequency, rate-of-rise (ROR) functions, and the transition index (a computational sketch follows this slide)
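The slide's defining equations are not reproduced in this transcript, so the following is only a sketch of the kind of computation described: per-band peak energy and centroid, their rate of rise (ROR) over a fixed frame step, and a transition index formed from the product of the two RORs summed over bands. The frame length, FFT size, ROR step, and use of absolute differences are assumed values, not the paper's exact choices.

```python
import numpy as np

BANDS_HZ = [(0, 400), (400, 1200), (1200, 2000), (2000, 3500), (3500, 5000)]

def band_peaks_and_centroids(frames, fs, nfft=512):
    """Per-band peak energy (dB) and spectral centroid (Hz) for each frame.
    frames: 2-D array of shape (n_frames, frame_len)."""
    win = np.hanning(frames.shape[1])
    spec = np.abs(np.fft.rfft(frames * win, nfft, axis=1)) ** 2
    freqs = np.fft.rfftfreq(nfft, 1.0 / fs)
    peaks, cents = [], []
    for lo, hi in BANDS_HZ:
        sel = (freqs >= lo) & (freqs < hi)
        band = spec[:, sel]
        peaks.append(10.0 * np.log10(band.max(axis=1) + 1e-12))
        cents.append((band * freqs[sel]).sum(axis=1) / (band.sum(axis=1) + 1e-12))
    return np.stack(peaks, axis=1), np.stack(cents, axis=1)   # each (n_frames, 5)

def transition_index(peaks_db, cents_hz, step=2):
    """Rate of rise of peak energy and centroid over `step` frames in each
    band; their product is near zero in steady states and peaks at
    transitions. Summing over bands gives a single transition index."""
    ror_peak = np.abs(peaks_db[step:] - peaks_db[:-step])
    ror_cent = np.abs(cents_hz[step:] - cents_hz[:-step])
    return (ror_peak * ror_cent).sum(axis=1)
```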

Slide 16/30: Spectral peak and centroid variation in bands (Intel. Enh. 4/15)

Example: /aka/
[Figure: peak-energy and centroid contours in the five bands]

- Centroid variation is not necessarily in phase with the energy variation
- At transitions, some of the energy peaks and centroids undergo change

Slide 17/30: Peak and centroid ROR contours (Intel. Enh. 5/15)

Example: /aba/
[Figure: peak-energy and centroid ROR contours in the five bands]

Observation: the product of the two RORs is near zero during steady states and peaks during transition segments

Slide 18/30: Detection of transition segments (Intel. Enh. 6/15)

[Figure: (a) signal waveform for the VCV syllable /aka/, (b) spectrogram, (c) transition index, (d) transition boundaries detected]

Slide 19/30: Evaluation using sentences (Intel. Enh. 7/15)

[Figure: (a) sentence 'put the butcher block table', (b) TIMIT landmarks, (c) detected landmarks]
Manual annotation: "bcl" - /b/ closure onset, "b" - /b/ release burst, etc. Automatic detection: landmarks numbered 5, 6, etc.

Slide 20/30: Evaluation using sentences (Intel. Enh. 8/15)

- 50 manually annotated sentences from the TIMIT database
- 5 speakers: 3 female, 2 male

[Table: detection rates by phoneme class; ST - stop, FR - fricative, NAS - nasal, V - vowel, SV - semivowel]

Slide 21/30: Harmonic plus noise model (HNM) (Stylianou 1996) (Intel. Enh. 9/15)

- Harmonic part / deterministic part (quasi-periodic components of speech): modeled by harmonics of the fundamental frequency
- Noise part / stochastic part (non-periodic components): modeled by LPC coefficients and an energy envelope
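Written out, the decomposition described above takes the standard HNM form; this is a textbook statement of the model rather than the exact notation of the slide, whose equations are not in this transcript:

$$ s(n) \;=\; \underbrace{\sum_{k=1}^{K(n)} A_k(n)\,\cos\!\big(\phi_k(n)\big)}_{\text{harmonic part, } kF_0(n)\,\le\,F_m} \;+\; \underbrace{w(n)\,\big[h(n) * b(n)\big]}_{\text{noise part}} $$

where $A_k(n)$ and $\phi_k(n)$ are the amplitude and instantaneous phase of the $k$-th harmonic of the fundamental frequency $F_0$, harmonics are kept up to the maximum voiced frequency $F_m$, $h(n)$ is the LPC synthesis-filter impulse response, $b(n)$ is white noise, and $w(n)$ is the energy envelope. Unvoiced frames use only the noise part.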

Slide 22/30: HNM parameters (Lehana and Pandey) (Intel. Enh. 10/15)

Voiced/unvoiced classification (V/UV)

Harmonic part
- Pitch F0
- Maximum voiced frequency Fm
- Amplitudes and phases of harmonics Ak

Noise part
- LPC coefficients
- Energy envelope

Voiced frame → parameters of harmonic part + noise part
Unvoiced frame → parameters of noise part only

Slide 23/30: HNM-based analysis stage (Intel. Enh. 11/15)

- Modification using a small parameter set
- Low perceptual distortion; preserves naturalness and intelligibility

[Figure: block diagram of the HNM analysis stage]

Slide 24/30: HNM-based time-scale modification stage (Intel. Enh. 12/15)

[Figure: block diagram of the time-scale modification stage, with scaling factors as inputs]
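Since HNM resynthesis works frame by frame, time-scale modification can be sketched as a warping of the analysis-frame time instants: transition frames stretched by beta, steady-state frames compressed by alpha, while the per-frame pitch and spectral parameters are left unchanged. The frame-level view and the names below are assumptions, not the paper's exact implementation.

```python
import numpy as np

def warp_frame_times(analysis_times, is_transition, alpha, beta):
    """Map HNM analysis-frame time instants to synthesis time instants.
    analysis_times : 1-D array of frame times (s), monotonically increasing
    is_transition  : boolean array, True for frames inside transition segments
    Transition frames are stretched by beta, steady-state frames compressed
    by alpha; only the time axis changes."""
    steps = np.diff(analysis_times, prepend=analysis_times[0])
    scale = np.where(is_transition, beta, alpha)
    return analysis_times[0] + np.cumsum(steps * scale)
```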

Slide 25/30: Example: VCV syllable /aba/ (Intel. Enh. 13/15)

Time scaling of consonant duration with steady-state compression

[Audio examples]
Rows: /aba/ original, synthesized, time-scaled with β = 1.5, 2, 3
Columns (SNR): original, +6 dB, +3 dB, 0 dB, -2 dB, -4 dB, -6 dB

Slide 26/30: Spectrograms: time-scaled VCV syllable (Intel. Enh. 14/15)

[Figure: spectrograms of /ama/ - original, synthesized, and time-scaled with β = 1.5, 2, 2.5, showing steady-state compression and transition-segment expansion]

Slide 27/30: Time and intensity scaling: VCV syllable (Intel. Enh. 15/15)

[Figure: /aba/ - original, time-scaled, and intensity enhanced by +6 dB]

Slide 28/30: EXPERIMENTAL RESULTS (Exp. Res. 1/2)

- Test material: VCV syllables /aba/, /ada/, /aga/, /apa/, /ata/, /aka/
- Time-scaling factors: 1.0, 1.2, 1.5, 1.8, 2.0
- CVR enhancement: +6 dB

12 processing conditions
- Unprocessed: UP
- Enhanced CVR without time-scaling: E
- Time-scaled: TS-1.0, TS-1.2, TS-1.5, TS-1.8, TS-2.0
- Enhanced CVR and time-scaled: ETS-1.0, ETS-1.2, ETS-1.5, ETS-1.8, ETS-2.0

Simulated hearing impairment by adding broadband noise at 6 SNR levels (inf, 0, -3, -6, -9, -12 dB)
72 test conditions; 60 presentations, 5 tests for each condition, 1 subject
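A minimal sketch of how broadband noise can be added at a specified SNR for the simulated-impairment conditions; the use of white Gaussian noise and an overall-RMS SNR definition are assumptions about the setup, not details given in the paper.

```python
import numpy as np

def add_noise_at_snr(speech, snr_db, rng=None):
    """Add broadband (white Gaussian) noise so that the ratio of overall
    speech RMS to noise RMS equals snr_db."""
    rng = np.random.default_rng() if rng is None else rng
    speech = np.asarray(speech, dtype=float)
    noise = rng.standard_normal(speech.shape)
    speech_rms = np.sqrt(np.mean(speech ** 2))
    noise_rms = np.sqrt(np.mean(noise ** 2))
    target_noise_rms = speech_rms / (10.0 ** (snr_db / 20.0))
    return speech + noise * (target_noise_rms / noise_rms)
```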

Slide 29/30: Results (Exp. Res. 2/2)

- Time-scaling factors appear to be optimum
- Time-scaling improves performance at lower SNR levels
- Consonant intensity enhancement is more effective

Slide 30/30: SUMMARY & CONCLUSION

Processing improved recognition scores for stop consonants
- Without increasing the overall speech duration
- The method was found more effective at lower SNR levels
- Place-feature identification improved significantly with processing
- Intensity enhancement was found more effective than duration enhancement

To be investigated
- Optimum scaling factors for different speech material
- Testing using different speech material
- Testing on a larger number of subjects, including subjects with sensorineural impairment
- Analysis in terms of vowel context and consonant category
- Quantitative analysis of intelligibility enhancement using MRT