Recognition of Speech Using Representation in High-Dimensional Spaces. Bishnu Atal, University of Washington, Seattle, WA; AT&T Labs (Retd), Florham Park, NJ.


1 Recognition of Speech Using Representation in High-Dimensional Spaces
Bishnu Atal
University of Washington, Seattle, WA
AT&T Labs (Retd), Florham Park, NJ
B. S. Atal, AAAS, Seattle, February 2004

2 Introduction
1. Robust performance of automatic speech recognition (ASR) continues to be a major issue.
2. Why are ASR systems not robust?
3. High-dimensional spaces in pattern recognition
4. Frequency versus power spectrum
5. Results with high-dimensional representations

3 Robustness Issue in ASR
1. Speech recognition has to work well for all speakers and in the presence of the noise, reverberation, and spectral distortion found in real speaking environments.
2. Why are ASR systems often not robust?
(a) The acoustics of the same speech sound vary with the speaker, the speaking style, the speaking rate, and the context in the utterance.
(b) The acoustic analysis in ASR is based on the short-time power spectrum, which is altered when the training and test conditions are mismatched.

4 Phones in Low-Dimensional Space
1. Two very different sounds, /a/ (bought) and /s/ (see), in two-dimensional space.
2. To deal with this large variability, the acoustic patterns of speech must be represented in a large multi-dimensional space.

5 N-dimensional Gaussian Distribution
[Equations 1 and 2 are not preserved in this transcript]
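
For reference, the standard density of an N-dimensional Gaussian is written out here. This is the textbook form and not necessarily the expression shown on the slide; the symbols x (feature vector), \mu (mean vector), and \Sigma (covariance matrix) are notation assumed for this sketch.

```latex
p(\mathbf{x}) =
  \frac{1}{(2\pi)^{N/2}\,|\Sigma|^{1/2}}
  \exp\!\left(
    -\tfrac{1}{2}\,(\mathbf{x}-\boldsymbol{\mu})^{\mathsf{T}}
    \Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})
  \right),
\qquad \mathbf{x},\,\boldsymbol{\mu} \in \mathbb{R}^{N},\;
\Sigma \in \mathbb{R}^{N \times N}
```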

6 Sounds in Continuous Speech
1. The acoustic influence of each sound in speech is spread across neighboring sounds.
2. The speech segment used to represent a sound could thus be fairly large, as much as 500 ms.

7 Acoustic Analysis of Speech
1. The speech signal is filtered into 16 frequency bands.
2. Two line spectral frequencies are used to represent the frequency content in each band.
3. The analysis window is moved in steps of 25 ms, resulting in a sequence of 32-dimensional vectors spaced uniformly at 25 ms intervals.
4. The frequencies in each band are not affected by linear filtering, so the mismatch between training and test conditions caused by different microphones is eliminated.
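
The intuition behind point 4 can be illustrated with a single tone: a linear filter changes its power but not its frequency. The sketch below is not the line-spectral-frequency analysis used in the talk. The sampling rate, tone frequency, and two-tap filter are assumptions chosen only to make the effect visible.

```python
import numpy as np

fs = 16000        # sampling rate in Hz (assumed for this sketch)
f0 = 4000.0       # tone frequency in Hz (assumed)
n = np.arange(4096)
x = np.sin(2 * np.pi * f0 * n / fs)

# Hypothetical "microphone": a two-tap averaging filter,
# y[n] = 0.5 * (x[n] + x[n-1]), which attenuates high frequencies.
h = np.array([0.5, 0.5])
y = np.convolve(x, h, mode="same")


def peak_frequency(signal, fs):
    """Return the frequency (Hz) of the largest FFT magnitude bin."""
    windowed = signal * np.hanning(len(signal))
    spectrum = np.abs(np.fft.rfft(windowed))
    return np.argmax(spectrum) * fs / len(signal)


def power(signal):
    """Return the mean squared amplitude of the signal."""
    return float(np.mean(signal ** 2))


freq_in, freq_out = peak_frequency(x, fs), peak_frequency(y, fs)
power_in, power_out = power(x), power(y)

print("frequency before/after filtering:", freq_in, freq_out)
print("power before/after filtering:", power_in, power_out)
```

The measured frequency is the same before and after filtering, while the power drops by roughly half. A feature built on frequency therefore survives this kind of channel change, whereas a power-spectrum feature does not.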

8 Filter Bank Frequency Response

9 High Dimensional Representation of Speech Sounds

10 Basic Blocks of a Speech Pattern Recognizer

11 TIMIT Speech Database
No  Region          # of Speakers
1   New England      49
2   Northern        102
3   North Midland   102
4   South Midland   100
5   Southern         98
6   New York City    46
7   Western         100
8   Army Brat        33

12 Frequency response of 540A Handset
[Figure: response in dB (0 to -40 dB) versus frequency in Hz (100 to 10000, logarithmic axis)]

13 Results (TIMIT Speech Database)
Training: 370 speakers, 6 regions
Testing: 129 speakers, 6 regions
40 phones, nbest = 4
Bandwidth                SNR       Accuracy (%)
8 kHz                    original  86
8 kHz                    +20 dB    82
3.5 kHz (540A Handset)   +20 dB    75
3.5 kHz (560A Handset)   +12 dB    58
3.5 kHz (560A Handset)   +6 dB     48
3.5 kHz (560A Handset)   0 dB      35

14 Computational Complexity Issue
1. Computation requirements for pattern recognition in high-dimensional spaces will become manageable in the future.
2. CPU and DSP MIPS increase 100 times in 10 years.
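
As a rough check on what the 100-fold figure implies, the equivalent annual growth factor r and doubling time T_2 can be worked out as follows, assuming steady exponential growth over the decade. That assumption, and the symbols r and T_2, are introduced here and are not part of the talk.

```latex
r = 100^{1/10} = 10^{0.2} \approx 1.585
\quad\text{(about 58\% per year)},
\qquad
T_2 = \frac{10 \,\ln 2}{\ln 100} \approx 1.5 \text{ years}
```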

15 Conclusions
1. The advantage of using high-dimensional spaces in pattern recognition is obvious.
2. Acoustic parameters based on frequency rather than on the power spectrum lead to robust recognition.
3. Using a 160-dimensional space to represent each speech sound provides good results.
4. It is desirable to use 2000 or even more dimensions to represent each speech sound.
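
The transcript does not say how the 160 dimensions are formed. One possible reading, offered purely as an assumption, is that consecutive 32-dimensional analysis vectors (16 bands with 2 line spectral frequencies each) are concatenated, so that five frames give 160 dimensions. The function `stack_frames`, the `context` parameter, and the random placeholder data below are hypothetical.

```python
import numpy as np

N_BANDS = 16                         # from the talk
LSF_PER_BAND = 2                     # from the talk
FRAME_DIM = N_BANDS * LSF_PER_BAND   # 32 dimensions per 25 ms step


def stack_frames(frames, context):
    """Concatenate `context` consecutive frames into one vector per position.

    frames: array of shape (n_frames, dim)
    returns: array of shape (n_frames - context + 1, context * dim)
    """
    frames = np.asarray(frames)
    n_frames, _ = frames.shape
    if context < 1 or context > n_frames:
        raise ValueError("context must be between 1 and the number of frames")
    windows = [
        frames[i:i + context].reshape(-1)
        for i in range(n_frames - context + 1)
    ]
    return np.stack(windows)


# Placeholder data standing in for 40 analysis vectors (1 s at 25 ms steps).
rng = np.random.default_rng(0)
frames = rng.random((40, FRAME_DIM))

# Assumed context of 5 frames: 5 * 32 = 160 dimensions per stacked vector.
stacked = stack_frames(frames, context=5)
print(stacked.shape)
```

Under the same assumption, covering a 500 ms segment at 25 ms steps would take 20 frames, or 640 dimensions, so reaching 2000 dimensions would need a longer context or a richer per-frame analysis.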

