Automatic Speech Recognition


1 Automatic Speech Recognition
Seminar on Automatic Speech Recognition

2 Introduction We have made significant progress in automatic speech recognition (ASR) for well-defined applications such as dictation and medium-vocabulary transaction-processing tasks in relatively controlled environments. Nevertheless, ASR performance has yet to reach the level required for speech to become a truly pervasive user interface. Indeed, even in “clean” acoustic environments, and across a variety of tasks, state-of-the-art ASR performance lags human speech perception by up to an order of magnitude (Lippmann, 1997).

3 History The first example of speech recognition appeared in 1952: a system that could recognize spoken digits. It was presented as a replacement for keyboard input and failed because it was not reliable or accurate enough; it succeeded only once it was presented as a supplement to keyboard input. Error rates started very high but are now much more reasonable: less than 10% in the majority of cases for English.
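To make the error-rate figure concrete: ASR accuracy is conventionally reported as word error rate (WER), the word-level edit distance between the reference transcript and the recognizer's output, divided by the number of reference words. Here is a minimal Python sketch; the example sentences are hypothetical, not from the presentation.

```python
# Word error rate (WER): edit distance between reference and hypothesis
# word sequences, divided by the reference length.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on a mat"))
# 0.1667: one substitution out of six reference words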

4 Types of Speech Recognition
Speech recognition systems can be separated into several classes according to the types of utterances they are able to recognize. These classes reflect one of the core difficulties of ASR: determining when a speaker starts and finishes an utterance. Most packages fit into more than one class, depending on which mode they are using. a. Isolated Words Isolated-word recognizers usually require each utterance to have quiet (a lack of audio signal) on BOTH sides of the sample window. This does not mean the system accepts only single words, but it does require a single utterance at a time. Often these systems have "Listen/Not-Listen" states, in which they require the speaker to wait between utterances (usually doing processing during the pauses). This class of systems might better be called the Isolated Utterance class.
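To illustrate the "quiet on both sides" requirement, here is a minimal energy-based endpoint detector in Python: it marks speech wherever short-time frame energy exceeds a threshold, yielding the start and end of a single utterance. The frame length, hop, and threshold below are illustrative assumptions, not values from the presentation.

```python
# Minimal energy-based endpoint detection: finding the silence-utterance-silence
# pattern that isolated-word recognizers expect around each sample window.
import numpy as np

def find_utterance(signal, sr, frame_ms=25, hop_ms=10, threshold=0.02):
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    # Root-mean-square energy of each short frame.
    rms = np.array([
        np.sqrt(np.mean(signal[i:i + frame] ** 2))
        for i in range(0, len(signal) - frame, hop)
    ])
    voiced = np.flatnonzero(rms > threshold)
    if voiced.size == 0:
        return None  # nothing but silence
    # First and last frame above threshold, converted back to samples.
    return voiced[0] * hop, voiced[-1] * hop + frame

# Synthetic test: 0.3 s silence, 0.5 s tone, 0.3 s silence at 16 kHz.
sr = 16000
t = np.arange(int(0.5 * sr)) / sr
tone = 0.1 * np.sin(2 * np.pi * 440 * t)
signal = np.concatenate([np.zeros(int(0.3 * sr)), tone, np.zeros(int(0.3 * sr))])
print(find_utterance(signal, sr))
# ≈ (4480, 13040): close to the true tone span of samples 4800-12800
```

Real systems use more robust voice-activity detection, but the state machine behind the Listen/Not-Listen modes described above is essentially this test applied frame by frame.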

5 Types … b. Connected Words Connected-word systems (or, more correctly, connected-utterance systems) are similar to isolated-word systems but allow separate utterances to be run together with a minimal pause between them. c. Continuous Speech Continuous recognition is the next step. Recognizers with continuous-speech capabilities are among the most difficult to create, because they must use special methods to determine utterance boundaries. Continuous speech recognizers allow users to speak almost naturally while the computer determines the content; essentially, this is computer dictation.

6 How Does ASR Work? The goal of an ASR system is to accurately and efficiently convert a speech signal into a text transcription of the spoken words, independent of the speaker, the environment, or the device used to record the speech (i.e. the microphone). The process begins when a speaker decides what to say and actually speaks a sentence (a sequence of words, possibly with pauses, uh's, and um's). Speaking produces a speech waveform, which embodies the words of the sentence as well as the extraneous sounds and pauses in the spoken input; the software then analyzes this waveform to recover the words.
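The first stage of this conversion is typically feature extraction: the waveform is cut into short overlapping frames, and each frame is reduced to a compact acoustic feature vector, commonly mel-frequency cepstral coefficients (MFCCs). A minimal sketch using the librosa library (an assumed dependency; the filename is hypothetical):

```python
# Sketch of the acoustic front end: waveform in, per-frame MFCC feature
# vectors out. Assumes librosa is installed; "utterance.wav" is hypothetical.
import librosa

# Load the recording as a mono waveform at 16 kHz.
y, sr = librosa.load("utterance.wav", sr=16000)

# 13 MFCCs per short frame; the recognizer's acoustic model scores these
# feature vectors against its models of speech sounds.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
print(mfcc.shape)  # (13, number_of_frames)
```

The recognizer's acoustic and language models then search for the word sequence that best explains this sequence of feature vectors.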

7 What is the Benefit of ASR?
There are fundamentally three major reasons why so much research and effort has gone into the problem of trying to teach machines to recognize and understand speech:
- Accessibility for the deaf and hard of hearing
- Cost reduction through automation
- Searchable text capability

8 Comparing ASR Systems Factors include:
Speaking mode: isolated words vs. continuous speech
Speaking style: read vs. spontaneous
"Enrollment": speaker-dependent vs. speaker-independent
Vocabulary size: small (< 20 words) … large (> 20,000 words)
Equipment: good-quality noise-cancelling mic … telephone
Size of the training set (if applicable) or rule set
Recognition method
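One way to compare systems along these axes is to record each factor in a structured profile. A hypothetical sketch; the field names and example values are mine, not from the presentation:

```python
# Hypothetical profile capturing the comparison factors above;
# all fields and example values are illustrative, not from a real system.
from dataclasses import dataclass

@dataclass
class ASRSystemProfile:
    speaking_mode: str        # "isolated words" or "continuous speech"
    speaking_style: str       # "read" or "spontaneous"
    speaker_dependent: bool   # does the system require per-speaker enrollment?
    vocabulary_size: int      # small < 20 words ... large > 20,000 words
    equipment: str            # e.g. "noise-cancelling mic" or "telephone"
    training_hours: float     # size of the acoustic training set, if any
    recognition_method: str   # e.g. "HMM", "neural network"

dictation = ASRSystemProfile(
    speaking_mode="continuous speech",
    speaking_style="spontaneous",
    speaker_dependent=True,
    vocabulary_size=60000,
    equipment="noise-cancelling mic",
    training_hours=10.0,
    recognition_method="HMM",
)
print(dictation)
```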

9 CONCLUSIONS Current commercial speech-to-text ASR systems are designed specifically for a mainstream, non-speech-disordered adult population, thereby excluding individuals with speech disorders. The literature reviewed demonstrates the numerous challenges that moderately to severely dysarthric speakers face in achieving good ASR performance, including the type and category of ASR application, the amount of system and user training, motivation, fatigue, frustration, error, and the surrounding environment.


