
Automated CVR Modification for Improving Perception of Stop Consonants
A. R. Jayan & P. C. Pandey
EE Dept, IIT Bombay
pcpandey@ee.iitb.ac.in
7 Jan. 2012

PRESENTATION OUTLINE
1. Introduction
   ▪ Clear speech
   ▪ Intelligibility enhancement using properties of clear speech
2. Landmark detection
   ▪ Speech landmarks
   ▪ Processing steps
   ▪ Evaluation
3. CVR modification
4. Evaluation and results
5. Conclusion

1. INTRODUCTION
Clear speech
▪ Speech produced with clear articulation when talking to a hearing-impaired listener, or in a noisy environment
▪ More intelligible for
  - Hearing-impaired listeners (~17% higher, Picheny et al., 1985)
  - Listeners in noisy environments (Payton et al., 1994)
  - Non-native listeners (Bradlow and Bent, 2002)
  - Children with learning disabilities (Bradlow et al., 2003)
▪ Pronounced acoustic landmarks

Example: ‘The book tells a story’, conversational (Conv.) and clear (Clear) speech; time axis in seconds.
(Recordings from http://www.acoustics.org/press/145th/clr-spch-tab.htm)

Acoustic properties of clear speech
▪ Sentence level
  - Reduced speaking rate (conv.: 200 wpm, clear: 100 wpm)
  - More pauses, increased pause durations
  - Increase in fundamental frequency & its dynamic range
  - Increased modulation depth of intensity envelope
  - Increased energy in 1-3 kHz
▪ Word level
  - Fewer sound deletions
  - More sound insertions
▪ Phonetic level
  - Increased intensity of consonant segments relative to vowel segments → increased consonant-vowel ratio (CVR)
  - Context-dependent, non-uniform increase in segment durations
  - More targeted vowel formants

Intelligibility enhancement using properties of clear speech
Approaches
▪ CVR modification: Increasing consonant segment intensity relative to nearby vowel
▪ Duration modification: Expansion of consonant duration and compression of steady-state vowel segments
▪ Spectral modification: Shifting frequency of energy concentration of bursts and fricatives
Challenges
▪ Accurate detection of regions for modification
▪ Speech modification with low processing artifacts
▪ Processing without altering the overall speaking rate

Earlier investigations on CVR modification
Manual segmentation and segment-specific modification
▪ Gordon-Salant (1986): CV syllables, CVR +10 dB, 12-talker babble noise, NHI, +14%
▪ Montgomery et al. (1987): 100 CVC syllables, CVR adjusted to 0 dB, SNHI, +10.5%
▪ Kennedy et al. (1997): VC syllables, optimum increase in CVR: 8.3 dB (voiced), 10.7 dB (unvoiced), SNHI
▪ Hazan & Simpson (1998): Burst +12 dB, fric. +6 dB, nas. +6 dB filtering, VCV, SUS, NH, +12%
Synthesis: synthesis of speech stimuli with segment-specific modification
▪ Thomas (1996): CVR +3 to +12 dB, CV & VC, broad-band noise, NH, +14%

Automated speech intelligibility enhancement
▪ Colotte & Laprie (2000)
  - Segmentation by MFCC-based spectral variation function
  - Detection rate (within 20 ms): Bursts (82%), Fricatives (91%)
  - Processing: Stops & unvoiced fricatives enhanced by +4 dB & time-scaled by 1.8 (TD-PSOLA)
  - Improvement in missing word identification: 9% (intensity scaling), 14% (time & intensity scaling)
▪ Ortega et al. (2000)
  - Broad-class HMM classifier
  - Detection rate: Bursts (80%), Fricatives (75%), Nasals (90%), vowel onsets (78%), offsets (76%)
  - Processing: Frication and nasals +6 dB, bursts +12 dB
  - Results for tests with speech-spectrum-shaped noise as masker (-5 dB SNR): −1.4% for VCV

2. LANDMARK DETECTION
Speech landmarks
Regions, associated with spectral transitions, containing important information for speech perception

Landmarks and related events
Segment type   Landmark       Description
Vowel          Vowel (V)      Vowel nucleus
Glide          Glide (G)      Slow formant transitions
Consonant      Glottis (g)    Vocal fold vibration
Consonant      Sonorant (s)   Nasal closure / release
Consonant      Burst (b)      Turbulence noise

Stop consonant landmarks
▪ Closure [-g]
▪ Burst onset [b]
  - Transient sound with low intensity
  - Severely affected by noise / hearing impairment
▪ Onset of voicing [+g]
Example: /apa/

Processing steps
▪ Extraction of parameters characterizing the landmark
▪ Computation of the rate of change (ROC) of parameters
▪ Locating the landmark using ROC(s)
Parameters
Magnitude spectrum (10 kHz sampling) computed using 512-point DFT, 6 ms Hanning window, 1 frame per ms, and smoothed by 20-point moving average.
▪ Log of spectral peak in 0 – 400 Hz, E_g(n), and in 1.2 – 5 kHz, E_b(n)
▪ Spectral centroid of tone-added spectrum (-20 dB w.r.t. max. signal level)
where n = time index, k = frequency index
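The parameter extraction described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `band_peak_params`, the use of 20·log10 for the "log of spectral peak", and the choice to apply the 20-point moving average along the time axis are all assumptions. The spectral centroid of the tone-added spectrum is left out because its exact definition is not recoverable from the slide.

```python
import numpy as np

def band_peak_params(x, fs=10000, nfft=512, win_ms=6.0, hop_ms=1.0, smooth=20):
    """Return (E_g, E_b): log spectral peaks (dB) in 0-400 Hz and 1.2-5 kHz per frame.

    Hypothetical helper; scaling and smoothing axis are assumptions.
    """
    x = np.asarray(x, dtype=float)
    win_len = int(round(win_ms * 1e-3 * fs))   # 60 samples at 10 kHz
    hop = int(round(hop_ms * 1e-3 * fs))       # 10 samples -> 1 frame per ms
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop:i * hop + win_len] * window
                       for i in range(n_frames)])
    # Zero-padded 512-point DFT magnitude, shape (n_frames, nfft // 2 + 1)
    mag = np.abs(np.fft.rfft(frames, n=nfft, axis=1))
    # 20-point moving average, applied here along time (assumption)
    kernel = np.ones(smooth) / smooth
    mag = np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, mag)
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)

    def log_peak(lo_hz, hi_hz):
        band = (freqs >= lo_hz) & (freqs <= hi_hz)
        return 20.0 * np.log10(mag[:, band].max(axis=1) + 1e-12)

    return log_peak(0.0, 400.0), log_peak(1200.0, 5000.0)
```

For a low-frequency voiced segment, `E_g` is expected to dominate `E_b`; around a release burst the relation reverses, which is what the rate-of-change step exploits.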

Rate of change measures
(K = time step, y(n) = parameter set, Σ = covariance matrix)
Detection of stop consonant landmarks
▪ Voicing offset and onset: rE_g(n) computed with time step 50 ms
  - Voicing offset [g-]: ROC(n) below -12 dB
  - Voicing onset [g+]: ROC(n) above +12 dB
▪ Burst onset: most prominent peak in ROC-MD(n) between g- and g+
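A rough sketch of the landmark search described above, for a single VCV token. The exact ROC formulas are not preserved in this transcript, so the forms used here are assumptions: the ROC is taken as a centred difference over the time step, and ROC-MD as the Mahalanobis distance between parameter vectors one time step apart. The names `roc`, `roc_md`, `stop_landmarks`, and the burst time step `step_b` are hypothetical.

```python
import numpy as np

def roc(e, step):
    """Centred difference e[n + step/2] - e[n - step/2] (assumed ROC form)."""
    e = np.asarray(e, dtype=float)
    half = step // 2
    out = np.zeros(len(e))
    out[half:len(e) - half] = e[2 * half:] - e[:len(e) - 2 * half]
    return out

def roc_md(y, step, cov):
    """Mahalanobis distance between parameter vectors `step` frames apart."""
    y = np.asarray(y, dtype=float)
    half = step // 2
    inv_cov = np.linalg.inv(cov)
    d = y[2 * half:] - y[:len(y) - 2 * half]
    md = np.sqrt(np.einsum("ij,jk,ik->i", d, inv_cov, d))
    out = np.zeros(len(y))
    out[half:len(y) - half] = md
    return out

def stop_landmarks(e_g, y, cov, step_g=50, step_b=4, thr_db=12.0):
    """Return (g_minus, burst, g_plus) frame indices, or None if not found.

    Frames are 1 ms apart, so step_g=50 is the 50 ms step from the slide;
    step_b is an illustrative value, not taken from the source.
    """
    r = roc(e_g, step_g)
    g_minus = int(np.argmin(r))                 # strongest fall in E_g
    if r[g_minus] > -thr_db:
        return None
    g_plus = g_minus + int(np.argmax(r[g_minus:]))  # strongest rise after it
    if r[g_plus] < thr_db:
        return None
    md = roc_md(y, step_b, cov)
    burst = g_minus + int(np.argmax(md[g_minus:g_plus + 1]))
    return g_minus, burst, g_plus
```

Taking the single strongest fall and rise keeps the sketch short; continuous speech would need peak picking over multiple candidate landmarks.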

Example: /aga/ (a) Waveform, (b) ROC-MD, (c) ROC; time axis in ms

Detection rates (%) for VCV utterances
Material
▪ 180 VCV utterances with manually marked burst onsets (6 stops, 3 vowel contexts, 10 speakers)
Detection rate (%) by temporal accuracy (5, 10, 15, 20 ms)
▪ Burst onset (180): 96, 97, 99
▪ Voicing offset (180): 36, 64, 73, 83
▪ Voicing onset (180): 76, 93, 98
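Detection rates of this kind can be computed from detected and manually marked landmark times along the following lines. This is a sketch under the assumption of one landmark per utterance, paired one-to-one with its reference, and with missed detections encoded as NaN; the function name `detection_rate` is hypothetical.

```python
import numpy as np

def detection_rate(detected_ms, reference_ms, tolerances_ms=(5, 10, 15, 20)):
    """Percentage of landmarks detected within each temporal tolerance.

    A NaN entry in detected_ms marks a missed detection and never counts
    as a hit, because comparisons with NaN are False.
    """
    det = np.asarray(detected_ms, dtype=float)
    ref = np.asarray(reference_ms, dtype=float)
    err = np.abs(det - ref)
    return {tol: 100.0 * float(np.mean(err <= tol)) for tol in tolerances_ms}
```

For example, errors of 2, 7, and 12 ms plus one miss give rates of 25, 50, 75, and 75% at 5, 10, 15, and 20 ms.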

3. CVR MODIFICATION
Method
▪ Detected voicing offset (g−), burst (b), and voicing onset (g+) landmarks
▪ Regions selected for modification:
  - VC transition: [g− − 20 ms, g−]
  - CV transition: [b − 10 ms, g+ + 10 ms]
▪ +9 dB intensity scaling [cosine-tapered window, 2.5 ms rise and fall time]
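The intensity scaling described above might be sketched as below. It is an illustration only: the names `cvr_gain` and `modify_cvr` are hypothetical, the raised-cosine ramps are placed inside each region, and the taper is applied to the linear gain rather than to the gain in dB; the slide does not settle either point.

```python
import numpy as np

def cvr_gain(n_samples, fs, regions_s, gain_db=9.0, ramp_ms=2.5):
    """Gain track: 1 outside the regions, 10**(gain_db/20) inside, cosine ramps."""
    peak = 10.0 ** (gain_db / 20.0)
    gain = np.ones(n_samples)
    ramp = int(round(ramp_ms * 1e-3 * fs))
    for start_s, end_s in regions_s:
        a = max(0, int(round(start_s * fs)))
        b = min(n_samples, int(round(end_s * fs)))
        if b - a < max(1, 2 * ramp):
            continue                      # region too short for both ramps
        w = np.ones(b - a)
        if ramp > 0:
            rise = 0.5 * (1.0 - np.cos(np.pi * np.arange(ramp) / ramp))
            w[:ramp] = rise               # 0 -> 1 over the rise time
            w[-ramp:] = rise[::-1]        # 1 -> 0 over the fall time
        gain[a:b] = np.maximum(gain[a:b], 1.0 + (peak - 1.0) * w)
    return gain

def modify_cvr(x, fs, g_minus_s, burst_s, g_plus_s, gain_db=9.0):
    """Scale the VC and CV transition regions around one stop (times in s)."""
    x = np.asarray(x, dtype=float)
    regions = [(g_minus_s - 0.020, g_minus_s),        # VC transition
               (burst_s - 0.010, g_plus_s + 0.010)]   # CV transition
    return x * cvr_gain(len(x), fs, regions, gain_db)
```

A gain of +9 dB corresponds to an amplitude factor of about 2.82, so the output may need headroom or renormalization before playback.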

Example: /aga/ (a) Original signal with landmarks, (b) windows selected for CVR modification, (c) scaling function for intensity enhancement, (d) modified signal

Example: /ada/, unprocessed (Unp.) and processed (Proc.), at SNR levels of ∞, 12, 6, 0, -6, and -12 dB

4. EVALUATION AND RESULTS
Test material
▪ 90 VCV utterances (6 stops (b, d, g, p, t, k), 3 vowel contexts (a, i, u), 5 speakers: 3 female + 2 male)
▪ CVR modified by +9 dB
▪ 2 processing conditions (orig., cvr)
▪ Speech-shaped noise (added uniformly, started 1 s before and continued for 1 s after each stimulus)
▪ SNR levels (∞, +12, +6, 0, -6, -12 dB)
▪ Randomized presentations, 5 subjects
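Mixing a stimulus with noise at a given SNR, with noise leading and trailing by 1 s, could look like the sketch below. The SNR definition used here (mean speech power over the utterance against mean noise power) is an assumption, as is the function name `mix_at_snr`; generating the speech-shaped noise itself is outside the sketch and the noise is simply passed in.

```python
import numpy as np

def mix_at_snr(speech, noise, fs, snr_db, pad_s=1.0):
    """Embed `speech` in `noise` that starts pad_s before and ends pad_s after it."""
    speech = np.asarray(speech, dtype=float)
    pad = int(round(pad_s * fs))
    total = len(speech) + 2 * pad
    if len(noise) < total:
        raise ValueError("noise is shorter than padded stimulus")
    noise = np.asarray(noise[:total], dtype=float)
    out = np.zeros(total)
    out[pad:pad + len(speech)] = speech
    if np.isinf(snr_db):
        return out                        # SNR of infinity: no noise added
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale noise so that 10*log10(p_speech / p_scaled_noise) equals snr_db
    scale = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return out + scale * noise
```

Because the noise level is set from the whole utterance, the local SNR around a low-intensity burst is lower than the nominal value.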

Recognition scores (%)
▪ Identical performance for SNR levels 12 to 6 dB
▪ Nearly 10% to 20% improvement at SNR levels below 0 dB

SNR (dB)   Imp. (%)   p
  12        -0.2      0.352
   6         0.4      0.238
   0         7.4      0.009
  -6        18.4      2E-04
 -12        24.8      4E-05

Recognition scores: consonant-wise (%), shown separately for voiced stops /b, d, g/ and unvoiced stops /p, t, k/

Results of information transmission analysis

SNR       Overall         Place           Voicing
          orig.   cvr.    orig.   cvr.    orig.   cvr.
Inf.      86      85      89      87      75      77
+12 dB    79      79      84      82      67      69
+6 dB     75      76      77      80      66      66
0 dB      63      78      55      81      69      66
-6 dB     45      72      29      70      61      69
-12 dB    20      43       6      32      34      51

▪ At lower SNR levels, loss of information is mainly due to place feature
▪ Intensity modification improves transmission of place feature at lower SNRs
▪ Comparatively lower improvement in transmission of voicing feature
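Feature-wise information transmission of this kind is commonly computed from a stimulus-response confusion matrix in the manner of Miller and Nicely (1955). The sketch below assumes that approach, with rows as stimuli and columns as responses in the order b, d, g, p, t, k; the grouping tables and function names are illustrative and not taken from the slides.

```python
import numpy as np

STOPS = ["b", "d", "g", "p", "t", "k"]
PLACE = {"b": 0, "p": 0, "d": 1, "t": 1, "g": 2, "k": 2}    # labial, alveolar, velar
VOICING = {"b": 0, "d": 0, "g": 0, "p": 1, "t": 1, "k": 1}  # voiced, unvoiced

def relative_info(conf):
    """Relative information transmitted (%) for a confusion matrix."""
    p = np.asarray(conf, dtype=float)
    p = p / p.sum()
    px = p.sum(axis=1)                    # stimulus probabilities
    py = p.sum(axis=0)                    # response probabilities
    nz = p > 0
    transmitted = np.sum(p[nz] * np.log2(p[nz] / np.outer(px, py)[nz]))
    h_x = -np.sum(px[px > 0] * np.log2(px[px > 0]))
    return 100.0 * transmitted / h_x

def collapse(conf, feature):
    """Pool a 6x6 stop confusion matrix into feature categories."""
    conf = np.asarray(conf, dtype=float)
    groups = [feature[s] for s in STOPS]
    k = max(groups) + 1
    out = np.zeros((k, k))
    for i, gi in enumerate(groups):
        for j, gj in enumerate(groups):
            out[gi, gj] += conf[i, j]
    return out
```

Usage would be `relative_info(conf)` for the overall score and `relative_info(collapse(conf, PLACE))` or `relative_info(collapse(conf, VOICING))` for the feature scores.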

Response time
▪ Response time increases at lower SNR levels
▪ Difference in response times for orig. and cvr. stimuli marginal
▪ No significant increase in perceptual load on subject introduced by intensity modification

5. CONCLUSION
▪ Automated detection of stop consonant landmarks improved by using rate of change of spectral moments
▪ Automated CVR modification can be realized by automated landmark detection
▪ Listening tests with CVR modification
  - Improvement in recognition score for stop consonants: 7, 18, 25% at SNR levels of 0, -6, -12 dB
  - Improved transmission of place feature
  - More effective for alveolar stop consonants
  - No change in perceptual load

Future work
▪ Establishing optimum range of CVR
▪ Evaluation on hearing-impaired listeners
▪ Real-time implementation

Reference
A. R. Jayan and P. C. Pandey, Automated CVR modification for improving perception of stop consonants, Proc. 18th National Conf. Communications, 3-5 Feb. 2012, IIT Kharagpur, India.

Abstract: Increasing the intensity of the consonant segments relative to the nearby vowel segments, known as consonant-vowel ratio (CVR) modification, is reported to be effective in improving perception of stop consonants for listeners in noisy backgrounds and for hearing-impaired listeners. A technique for automated CVR modification using detection of acoustic landmarks corresponding to the stop release bursts, with high temporal accuracy, is investigated. Its effectiveness in improving perception of stop consonants in the presence of speech-spectrum shaped noise is evaluated by conducting listening tests on five normal-hearing subjects with VCV utterances involving six stop consonants and three vowels. The processing improved the recognition scores for stop consonants by nearly 7, 18, and 25% at SNR levels of 0, −6, and −12 dB, respectively.

References
1. M. A. Picheny, N. I. Durlach, and L. D. Braida, “Speaking clearly for the hard of hearing I: Intelligibility differences between clear and conversational speech,” J. Speech Hear. Res., 28, 96-103, 1985.
2. K. L. Payton, R. M. Uchanski, and L. D. Braida, “Intelligibility of conversational and clear speech in noise and reverberation for listeners with normal and impaired hearing,” J. Acoust. Soc. Am., 95, 1581-1592, 1994.
3. A. R. Bradlow, N. Kraus, and E. Hayes, “Speaking clearly for children with learning disabilities,” J. Speech Lang. Hear. Res., 46, 80-97, 2003.
4. A. R. Bradlow and T. Bent, “The clear speech effect for non-native listeners,” J. Acoust. Soc. Am., 112, 272-284, 2002.
5. F. R. Chen, “Acoustic characteristics and intelligibility of clear and conversational speech at the segmental level,” M.S. dissertation, Massachusetts Institute of Technology, 1980.
6. M. A. Picheny, N. I. Durlach, and L. D. Braida, “Speaking clearly for the hard of hearing II: Acoustic characteristics of clear and conversational speech,” J. Speech Hear. Res., 29, 434-446, 1986.
7. S. Gordon-Salant, “Recognition of time/intensity altered CVs by young and elderly subjects with normal-hearing,” J. Acoust. Soc. Am., 80, 1599-1607, 1986.
8. A. A. Montgomery and R. A. Edge, “Evaluation of two speech enhancement techniques to improve intelligibility for hearing-impaired adults,” J. Speech Hear. Res., 31, 386-393, 1988.
9. E. Kennedy, H. Levitt, A. C. Neuman, and M. Weiss, “Consonant-vowel intensity ratios for maximizing consonant recognition by hearing-impaired listeners,” J. Acoust. Soc. Am., 103, 1098-1114, 1997.
10. T. G. Thomas, “Experimental evaluation of improvement in speech perception with consonantal intensity and duration modification,” Ph.D. dissertation, Dept. of Elect. Engg., IIT Bombay, 1996.
11. V. Hazan and A. Simpson, “The effect of cue-enhancement on the intelligibility of nonsense word and sentence materials presented in noise,” Speech Comm., 24, 211-226, 1998.
12. R. W. Guelke, “Consonant burst enhancement: A possible means to improve intelligibility for the hard of hearing,” J. Rehab. Res. Develop., 24, 217-220, 1987.
13. V. Colotte and Y. Laprie, “Automatic enhancement of speech intelligibility,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Istanbul, Turkey, 1057-1060, 2000.
14. M. Ortega, V. Hazan, and M. Huckvale, “Automatic cue enhancement of natural speech for improved intelligibility,” Speech, Hearing, and Language, 12, 42-56, 2000.
15. S. A. Liu, “Landmark detection for distinctive feature based speech recognition,” J. Acoust. Soc. Am., 100, 3417-3430, 1996.
16. A. R. Jayan, P. S. Rajath Bhat, and P. C. Pandey, “Detection of burst onset landmarks in speech using rate of change of spectral moments,” in Proc. 17th National Conference on Communications, Bangalore, India, 2011.
17. P. C. Mahalanobis, “On the generalized distance in statistics,” in Proc. National Institute of Sciences of India, 2, 49-55, 1936.
18. J. R. Deller, J. H. L. Hansen, and J. G. Proakis, Discrete-Time Processing of Speech Signals, New York: John Wiley, 2000.
19. G. A. Miller and P. E. Nicely, “An analysis of perceptual confusions among some English consonants,” J. Acoust. Soc. Am., 27(2), 338-352, 1955.

Thank you.

