Speech Recognition Created By : Kanjariya Hardik G.


Introduction
Speech recognition technology has recently reached a level of performance and robustness that lets a user communicate with a system by talking. Speech recognition is the process of decoding an acoustic speech signal, captured by a microphone or telephone, into a set of words. The speech as a whole is recognized word by word.

Types of SR
There are two main types of speaker models: speaker independent and speaker dependent. Speaker-independent models recognize the speech patterns of a large group of people. Speaker-dependent models recognize speech patterns from only one person. Both models use mathematical and statistical formulas to yield the best word match for speech. A third variation, called speaker adaptive, is now emerging. Speaker-adaptive systems usually begin with a speaker-independent model and adjust it more closely to each individual during a brief training period.

How does it work?
Speech produces a sound pressure wave, which forms an acoustic signal. The microphone receives the acoustic signal and converts it to an analogue electrical signal. To be stored, the analogue signal must be converted to a digital signal. A speech recognizer tries to transform this digitally encoded acoustic signal of speech in a natural language into text in that language.
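
The analogue-to-digital step described above can be sketched as sampling followed by quantization. This is only an illustration: the function name `digitize`, the 8000 Hz sample rate, the 16-bit depth, and the 440 Hz test tone are assumptions, not values taken from the slides.

```python
import math

def digitize(signal, duration_s, sample_rate_hz=8000, bits=16):
    """Sample a continuous-time signal and quantize each sample to a signed integer."""
    max_level = 2 ** (bits - 1) - 1               # 32767 for 16-bit audio
    n_samples = int(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                    # sampling: read the signal at discrete times
        value = max(-1.0, min(1.0, signal(t)))    # clip to the converter's input range
        samples.append(round(value * max_level))  # quantization: nearest integer level
    return samples

def tone(t):
    """Hypothetical 'analogue' input: a 440 Hz sine wave with amplitude 0.5."""
    return 0.5 * math.sin(2 * math.pi * 440 * t)

pcm = digitize(tone, duration_s=0.01)             # 10 ms of audio as integer samples
```

A higher sample rate captures higher frequencies, and more bits per sample reduce quantization error. Both increase the amount of data to store.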

Speech Waveform/Spectrogram
The spectrogram is an alternative way to characterize speech. The louder the sound, the greater the amplitude on the y-axis.
[Figure: waveform and spectrogram of the words "speech lab"; axes labelled in Hz and s]
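
As a rough sketch of how a spectrogram is obtained from digitized samples, the signal can be cut into short overlapping frames and each frame transformed to the frequency domain. The code below uses a plain discrete Fourier transform from the standard library. The frame size, hop length, Hann window, and 1000 Hz test tone are all assumptions made for illustration.

```python
import cmath
import math

def spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames; each frame is a list of magnitudes per frequency bin."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # A Hann window reduces spectral leakage at the frame edges.
        windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_size - 1)))
                    for i, x in enumerate(frame)]
        magnitudes = []
        for k in range(frame_size // 2 + 1):      # bins from 0 Hz up to the Nyquist frequency
            acc = sum(x * cmath.exp(-2j * math.pi * k * i / frame_size)
                      for i, x in enumerate(windowed))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames

sample_rate_hz = 8000
# Hypothetical test signal: a 1000 Hz tone, 256 samples long.
signal = [math.sin(2 * math.pi * 1000 * n / sample_rate_hz) for n in range(256)]
spec = spectrogram(signal)

# The strongest bin of the first frame should sit at the tone's frequency.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate_hz / 64
```

Each row of `spec` corresponds to one time step and each column to one frequency band, which is the grid a spectrogram image displays.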

Speech Recognition Process Flow

The major components
Audio input, grammar, acoustic model, recognized text

Audio I/O
It is important to understand that this audio stream is rarely pristine. It contains not only the speech data (what was said) but also background noise. This noise can interfere with the recognition process, and the speech engine must handle (and possibly even adapt to) the environment within which the audio is spoken.
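
One common way to quantify how far an audio stream is from pristine is the signal-to-noise ratio (SNR) in decibels. The sketch below assumes that a stretch of speech and a stretch of background noise have already been separated, which the slides do not describe. The helper names and sample values are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def snr_db(speech_samples, noise_samples):
    """Signal-to-noise ratio in decibels, from the RMS amplitudes of the two segments."""
    return 20 * math.log10(rms(speech_samples) / rms(noise_samples))

# Hypothetical segments: speech at amplitude 0.5, background noise at 0.05.
speech = [0.5, -0.5, 0.5, -0.5]
noise = [0.05, -0.05, 0.05, -0.05]
ratio = snr_db(speech, noise)   # about 20 dB: speech amplitude is 10x the noise
```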

Acoustic Model + Grammar
Once the speech data is in the proper format, the engine searches for the best match. It does this by taking into consideration the words and phrases it knows about (the active grammars), along with its knowledge of the environment in which it is operating. The knowledge of the environment is provided in the form of an acoustic model. Once it identifies the most likely match for what was said, it returns what it recognized as a text string.
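
The role of the active grammar in this search can be illustrated with a toy example. Here the acoustic model is replaced by a hand-written table of log-scores, so the phrases, the scores, and the function `best_match` are all hypothetical. A real engine computes such scores from the audio.

```python
def best_match(acoustic_log_scores, active_grammar):
    """Pick the in-grammar phrase with the highest acoustic log-score, or None."""
    allowed = {phrase: score
               for phrase, score in acoustic_log_scores.items()
               if phrase in active_grammar}
    if not allowed:
        return None                       # nothing the engine knows about matched
    return max(allowed, key=allowed.get)  # most likely match, returned as a text string

# Hypothetical scores: higher (less negative) means a better acoustic fit.
scores = {"call home": -12.5, "cold foam": -11.9, "call Tom": -14.0}
grammar = {"call home", "call Tom", "hang up"}

result = best_match(scores, grammar)
```

In this example "cold foam" fits the audio slightly better, but it is not in the active grammar, so the engine returns "call home". Constraining the search this way is what lets a grammar improve accuracy.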

About SR Engine
SR requires a software application "engine" with built-in logic to decipher and act on the spoken word.
Sound card: converts the analogue signal from the microphone to a digital signal.
Function of the SR engine: converts this digital signal to phonemes, and the phonemes to words.

Different SR engines
CMU Sphinx
Microsoft SAPI
IBM ViaVoice

Decoding process.

Recognition Process Flow Summary
Step 1: User Input. The system captures the user's voice in the form of an analog acoustic signal.
Step 2: Digitization. The analog acoustic signal is digitized.
Step 3: Phonetic Breakdown. The signal is broken into phonemes.

Recognition Process Flow Summary
Step 4: Statistical Modeling. Phonemes are mapped to their phonetic representation using a statistical model.
Step 5: Matching. Using the grammar, the phonetic representation, and the dictionary, the system returns an n-best list (i.e., candidate words, each with a confidence score).
Grammar: the union of words or phrases that constrains the range of input or output in the voice application.
Dictionary: the mapping table between phonetic representations and words (e.g., thu, thee → the).
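
The matching step can be sketched with a small pronunciation dictionary and a grammar filter. Only the "thu"/"thee" entries come from the slide. The other entries and the function `n_best` are hypothetical, and a string-similarity ratio from Python's `difflib` stands in for the confidence score that a real engine would derive from its statistical model.

```python
from difflib import SequenceMatcher

# Pronunciation dictionary: phonetic representation -> word.
# "thu" and "thee" follow the slide's example; the rest are made up.
DICTIONARY = {"thu": "the", "thee": "the", "then": "then", "tree": "tree"}

def n_best(phonetic, grammar, n=3):
    """Return up to n (word, confidence) pairs, best first, limited to grammar words."""
    best = {}
    for pronunciation, word in DICTIONARY.items():
        if word not in grammar:
            continue                               # the grammar constrains the output
        score = SequenceMatcher(None, phonetic, pronunciation).ratio()
        best[word] = max(score, best.get(word, 0.0))  # keep each word's best pronunciation
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

results = n_best("thee", grammar={"the", "tree"})
```

Here "the" ranks first with a confidence of 1.0 because "thee" is one of its listed pronunciations, "tree" follows with a lower score, and "then" is excluded because it is not in the grammar.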

REPRESENTATION OF SOFTWARE

Challenges and Difficulties of SR
Speech recognition is still a very difficult problem. The main difficulties are:
Speaker variability: two speakers, or even the same speaker, will pronounce the same word differently.
Channel variability: the quality and position of the microphone and the background environment will affect the output.

Current Software Options for PC
Dragon Systems – NaturallySpeaking
Philips – FreeSpeech
IBM – ViaVoice
Lernout & Hauspie – Voice Xpress