1
Sentence Recognizer by Kalp Shah
2
Sphinx4 Sphinx-4 is a flexible and versatile speech recognition system written entirely in the Java programming language. Sphinx-4 started out as a port of Sphinx-3 to Java, but it evolved into a recognizer designed to be much more flexible than Sphinx-3, making it an excellent platform for speech research.
3
Introduction Speech recognition, also known as automatic speech recognition (computer speech recognition), converts spoken words to text. Many different techniques have been developed for speech recognition, and one of the most efficient and accurate approaches is speech recognition using Sphinx-4.
4
Sphinx4 Sphinx-4 is a very flexible system capable of performing many different types of recognition tasks. As such, it is difficult to characterize Sphinx-4 with just a few simple numbers such as speed and accuracy. Sphinx-4 is a flexible, modular and pluggable framework intended to help foster new innovations in the core research of hidden Markov model (HMM) recognition systems. The design of Sphinx-4 is based on patterns that have emerged from the design of past systems, as well as new requirements based on areas that researchers currently want to explore.
5
Architecture
6
Sphinx4 Sphinx-4 has been designed with a high degree of flexibility and modularity. There are three basic parts in Sphinx-4, which the sketch below wires together: 1) the Front End 2) the Decoder 3) the Linguist
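A minimal sketch of how an application drives these three parts, assuming the high-level API of the sphinx4-5prealpha release (Configuration, StreamSpeechRecognizer, SpeechResult) and the bundled en-us acoustic model, dictionary and language model paths; other versions or setups may expose these parts differently.

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;

import java.io.FileInputStream;
import java.io.InputStream;

public class SentenceRecognizerSketch {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();

        // Acoustic Model: maps sub-word units to HMMs (used by the Linguist).
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        // Dictionary: word pronunciations as sequences of sub-word units.
        configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        // Language Model: word-level structure (a stochastic N-Gram model here).
        configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);

        // The Front End turns the audio stream into Features; the Decoder
        // searches the Linguist's graph and produces Results.
        try (InputStream stream = new FileInputStream(args[0])) {
            recognizer.startRecognition(stream);
            SpeechResult result;
            while ((result = recognizer.getResult()) != null) {
                System.out.println("Hypothesis: " + result.getHypothesis());
            }
            recognizer.stopRecognition();
        }
    }
}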
7
Sphinx4 Front End: it takes one or more input signals (for example, human speech) and converts them into a sequence of Features.
8
Sphinx4 The Front End comprises one or more parallel chains of replaceable, communicating signal processing modules called Data Processors. Supporting multiple chains allows simultaneous computation of different types of parameters from the same or different input signals. This enables the creation of systems that can simultaneously decode using different parameter types, such as MFCCs, or even parameter types derived from non-speech signals such as video.
9
Sphinx4 Mel-frequency cepstral coefficients (MFCCs) are the coefficients that collectively make up a mel-frequency cepstrum (MFC). They are derived from a cepstral representation of the sound clip. The difference between the cepstrum and the mel-frequency cepstrum is that in the MFC the frequency bands are equally spaced on the mel scale, which approximates the human auditory system's response more closely than the linearly spaced frequency bands used in the normal cepstrum. This frequency warping allows for a better representation of sound.
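To make the frequency warping concrete, here is a minimal sketch of the commonly used mel-scale conversion, m = 2595 * log10(1 + f / 700), and its inverse; the class and method names are illustrative and are not part of the Sphinx-4 API.

public final class MelScale {

    // Convert a frequency in hertz to its position on the mel scale.
    static double hertzToMel(double hertz) {
        return 2595.0 * Math.log10(1.0 + hertz / 700.0);
    }

    // Inverse mapping: mel value back to hertz.
    static double melToHertz(double mel) {
        return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }

    public static void main(String[] args) {
        // Filter banks that are equally spaced in mel are increasingly wide
        // in hertz, mimicking the ear's coarser resolution at high frequencies.
        for (double hz : new double[] {100, 500, 1000, 4000, 8000}) {
            System.out.printf("%6.0f Hz -> %8.2f mel%n", hz, hertzToMel(hz));
        }
    }
}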
10
Sphinx4 Decoder The Decoder has one Search Manager. The Search Manager uses the Features from the Front End and the Search Graph from the Linguist to perform the actual decoding, generating Results. At any time prior to or during the recognition process, the application can issue Controls to each of the modules, effectively becoming a partner in the recognition process. The Decoder merely tells the Search Manager to recognize a set of Feature frames. At each step of the process, the Search Manager creates a Result object that contains all the paths that have reached a final non-emitting state.
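A sketch of the decoding loop using the classic, lower-level Sphinx-4 API (ConfigurationManager, Recognizer, Result). The file name "config.xml" and the component name "recognizer" are assumptions taken from the naming convention of the standard Sphinx-4 demos, not a fixed requirement.

import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class DecoderLoopSketch {
    public static void main(String[] args) {
        // "config.xml" and "recognizer" are hypothetical names for this sketch.
        ConfigurationManager cm = new ConfigurationManager(
                DecoderLoopSketch.class.getResource("config.xml"));

        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();   // builds the Front End, Linguist and Decoder

        // Each call asks the Decoder (via its Search Manager) to consume
        // Feature frames until the utterance ends; the returned Result holds
        // the paths that reached a final non-emitting state.
        Result result = recognizer.recognize();
        if (result != null) {
            System.out.println("Best hypothesis: " + result.getBestFinalResultNoFiller());
        }

        recognizer.deallocate();
    }
}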
11
Sphinx4 Linguist The Linguist converts any type of standard language model, along with pronunciation data from the Dictionary and structural data from one or more sets of Acoustic Models, into a Search Graph. The Linguist has three parts: 1) the LanguageModel 2) the Dictionary 3) the AcousticModel
12
Sphinx4 Language Model The LanguageModel module of the Linguist provides word-level language structure, which can be represented by any number of pluggable implementations. These implementations typically fall into one of two categories: 1) graph-driven grammars 2) stochastic N-Gram models.
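To illustrate the graph-driven category, the sketch below swaps the N-Gram model for a JSGF grammar using the high-level Configuration API; the grammar path, grammar name and the example rule in the comment are hypothetical and only stand in for a real grammar file.

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.LiveSpeechRecognizer;
import edu.cmu.sphinx.api.SpeechResult;

public class GrammarRecognizerSketch {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");

        // Graph-driven grammar instead of a stochastic N-Gram model.
        // "commands.gram" is a hypothetical JSGF file containing, for example:
        //   public <command> = (open | close | delete) (file | window);
        configuration.setGrammarPath("resource:/grammars");
        configuration.setGrammarName("commands");
        configuration.setUseGrammar(true);

        LiveSpeechRecognizer recognizer = new LiveSpeechRecognizer(configuration);
        recognizer.startRecognition(true);
        SpeechResult result = recognizer.getResult();
        System.out.println("Heard: " + result.getHypothesis());
        recognizer.stopRecognition();
    }
}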
13
Sphinx4 Dictionary The Dictionary provides pronunciations for the words found in the LanguageModel. These pronunciations break words into sequences of sub-word units found in the AcousticModel. The Dictionary interface also supports the classification of words and allows a single word to belong to multiple classes.
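For illustration, entries in a CMUdict-style pronunciation dictionary (such as the bundled cmudict-en-us.dict) list a word followed by its phone sequence, with a suffix like (2) marking an alternate pronunciation; the lines below are typical examples rather than a quoted excerpt.

hello HH AH L OW
hello(2) HH EH L OW
speech S P IY CH
recognizer R EH K AH G N AY Z ER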
14
Acoustic Model The Acoustic Model provides a mapping between a unit of speech and an HMM that can be scored against the incoming Features provided by the Front End.
15
Thank you