74.419 Artificial Intelligence 2004 Speech & Natural Language Processing Speech Recognition acoustic signal as input conversion into written words Natural.

Presentation transcript:

Artificial Intelligence 2004: Speech & Natural Language Processing
- Speech Recognition: acoustic signal as input; conversion into written words
- Natural Language Processing: written text as input; sentences (well-formed or not)
- Spoken Language Understanding: analysis of spoken language (transcribed speech)

Speech & Natural Language Processing
Areas in Speech Recognition
- Signal Processing
- Phonetics
- Word Recognition
Areas in Natural Language Processing
- Morphology
- Grammar & Parsing (syntactic analysis)
- Semantics
- Pragmatics
- Discourse / Dialogue
Spoken Language Understanding

Speech Production & Reception
Sound and Hearing
- change in air pressure → sound wave
- reception through inner ear membrane / microphone
- break-up into frequency components: receptors in cochlea / mathematical frequency analysis (e.g. Fast Fourier Transform, FFT) → frequency spectrum
- perception/recognition of phonemes and subsequently words (e.g. Neural Networks, Hidden Markov Models)
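The frequency-analysis step can be illustrated with a small sketch. It is not taken from the slides: the 8000 Hz sample rate, the two-tone test signal, and the helper name `dft_magnitudes` are assumptions chosen for illustration. The code uses a naive discrete Fourier transform for clarity; an FFT computes the same spectrum, only faster.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitudes of the discrete Fourier transform for bins 0..N/2.

    Naive O(N^2) version for readability; an FFT yields the same values.
    """
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc))
    return mags

SAMPLE_RATE = 8000   # Hz (assumed for this example)
N = 400              # 50 ms of signal

# Synthetic "sound wave": a strong 200 Hz tone plus a weaker 1000 Hz tone.
signal = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE)
          + 0.5 * math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE)
          for t in range(N)]

mags = dft_magnitudes(signal)
bin_width = SAMPLE_RATE / N                      # Hz covered by each bin
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * bin_width                   # strongest frequency component
print(f"strongest component: {peak_hz} Hz")
```

The list `mags` plays the role of the frequency spectrum: each entry says how much energy the signal carries near one frequency, which is the information a recognizer works from.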

Speech Recognition (processing pipeline)
Signal Processing / Analysis:
Acoustic / sound wave → Filtering, Sampling; Spectral Analysis; FFT → Frequency Spectrum → Features (Phonemes; Context)
Speech Recognition:
Features → Phoneme Recognition: HMM, Neural Networks → Phonemes → Grammar or Statistics → Phoneme Sequences / Words → Grammar or Statistics for likely word sequences → Word Sequence / Sentence
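As a rough illustration of how a Hidden Markov Model turns a sequence of acoustic features into a phoneme sequence, here is a minimal Viterbi decoding sketch. Everything specific in it is an assumption made up for the example, not material from the slides: the two phoneme states "s" and "i", the two coarse feature labels "noisy" and "voiced", and all probability values.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t,
    #               predecessor state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None)
             for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            prob, back = max((prev[p][0] * trans_p[p][s], p) for p in states)
            column[s] = (prob * emit_p[s][obs], back)
        best.append(column)

    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return path

# Toy model (all values hypothetical).
states = ["s", "i"]
start_p = {"s": 0.5, "i": 0.5}
trans_p = {"s": {"s": 0.6, "i": 0.4},
           "i": {"s": 0.3, "i": 0.7}}
emit_p = {"s": {"noisy": 0.9, "voiced": 0.1},
          "i": {"noisy": 0.2, "voiced": 0.8}}

features = ["noisy", "noisy", "voiced", "voiced"]
decoded = viterbi(features, states, start_p, trans_p, emit_p)
print(decoded)  # ['s', 's', 'i', 'i']
```

A real recognizer would use many more states, continuous feature vectors, and log probabilities to avoid numerical underflow; the sketch only shows the decoding idea.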

Speech Signal
Analog-Digital Conversion of acoustic signal → Sampling in Time Frames ("windows")
Characteristics of a Speech Signal
- formants: strong frequency components; characterize e.g. vowels, gender of speaker; appear as dark stripes in a spectrogram
- pitch: fundamental frequency (baseline for the higher-frequency harmonics)
- place of articulation (recognition model based on model of vocal tract)
- change in frequency distribution
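Splitting a sampled signal into time frames can be sketched as follows. The concrete numbers (8000 Hz sample rate, 25 ms frames, 10 ms hop) and the choice of a Hamming window are common conventions assumed here for illustration; the slides do not specify them, and the function names are hypothetical.

```python
import math

def hamming(length):
    """Hamming window: tapers a frame toward its edges."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
            for i in range(length)]

def frame_signal(samples, frame_len, hop):
    """Cut samples into overlapping, windowed frames."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        frames.append([x * w for x, w in zip(chunk, window)])
    return frames

SAMPLE_RATE = 8000   # Hz (assumed)
FRAME_LEN = 200      # 25 ms at 8000 Hz
HOP = 80             # 10 ms step between frame starts

# One second of a synthetic 200 Hz tone stands in for digitized speech.
signal = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE)
          for t in range(SAMPLE_RATE)]

frames = frame_signal(signal, FRAME_LEN, HOP)
print(len(frames), "frames of", len(frames[0]), "samples each")
```

Each frame is short enough that the speech signal is roughly stationary within it, so a spectrum computed per frame shows how the frequency distribution changes over time.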

Video of glottis and speech signal in lingWAVES (from

Speech Signal
Analog-Digital Conversion of Acoustic Signals → Sampling
Analysis of Signal in Time Frames ("windows")
Characteristics of a Speech Signal
- formants: strong frequency components; characterize e.g. vowels, gender of speaker; appear as dark stripes in a spectrogram
- pitch: fundamental frequency (baseline for the higher-frequency harmonics)
- place of articulation (recognition model based on model of vocal tract)
- change in frequency distribution
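Pitch, the fundamental frequency, can be estimated from one frame by looking for the lag at which the signal best matches a shifted copy of itself. The sketch below uses plain autocorrelation, one of several possible methods and not necessarily the one the course had in mind; the search range of 50 to 400 Hz, the synthetic test signal, and the function name `estimate_pitch` are assumptions.

```python
import math

def estimate_pitch(samples, sample_rate, f_min=50.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) by autocorrelation."""
    min_lag = int(sample_rate / f_max)   # shortest period considered
    max_lag = int(sample_rate / f_min)   # longest period considered

    def autocorr(lag):
        return sum(samples[i] * samples[i + lag]
                   for i in range(len(samples) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return sample_rate / best_lag

SAMPLE_RATE = 8000   # Hz (assumed)

# Synthetic voiced sound: 125 Hz fundamental plus two weaker harmonics.
voiced = [math.sin(2 * math.pi * 125 * t / SAMPLE_RATE)
          + 0.5 * math.sin(2 * math.pi * 250 * t / SAMPLE_RATE)
          + 0.3 * math.sin(2 * math.pi * 375 * t / SAMPLE_RATE)
          for t in range(800)]           # 100 ms

pitch = estimate_pitch(voiced, SAMPLE_RATE)
print(f"estimated pitch: {pitch:.1f} Hz")
```

For this test signal the estimate should land close to the 125 Hz fundamental even though the harmonics at 250 Hz and 375 Hz are also present.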

Speech Recognition Characteristics
- Speech Recognition vs. Speaker Identification
- Speaker-dependent vs. speaker-independent
- Single word vs. continuous speech
- Large vs. small vocabulary

Additional References
Huang, X., A. Acero & H. Hon: Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice Hall, NJ, 2001.