Using Motherese in Speech Recognition
EE516 final project, Steven Schimmel, March 13, 2003


What is Motherese? Motherese is the characteristic way mothers talk to their children when they are young.

Why use Motherese in Speech Recognition? The exaggerated articulation of motherese speech helps infants distinguish better between phonetic categories.

Why use Motherese in Speech Recognition?
- Mothers provide a great variety of word pronunciations, simulating many different talkers
- If an infant can benefit from this variety, can an ASR benefit too?

Presentation Outline
- Data preparation
- Building the recognizers
- Training and testing
- Performance
- Distance measure
- Formant analysis
- Conclusions

Data preparation
- Conversations between mothers and adults, and between mothers and infants
- Keyword transcription
- Keyword extraction
- Screening
- Dividing into training and test sets

Keywords: bead, key, sheep, pot, sock, top, boot, shoe, spoon

Dividing data into equal sets
[Table: tallies of each keyword ('bead', 'boot', 'sock', ...) per mother (A, B, ...) and their assignment to four equal sets; figures not recoverable]

Building the recognizers
- HTK toolkit
- Isolated-word recognizers
- Speech coding: MFCCs from a 26-channel filterbank, 13 coefficients every 10 ms
- HMM prototype: left-to-right, 8 states, single Gaussians
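
The prototype above can be sketched in plain Python. The 0.6/0.4 transition split is an assumed starting value, not taken from the slides; in HTK the equivalent definition lives in a text prototype file.

```python
N_STATES = 8    # emitting states, left-to-right
N_COEFFS = 13   # one 13-dimensional MFCC vector per 10 ms frame

# Left-to-right topology: each state may repeat itself or advance
# one step; no skips, no backward transitions.
trans = [[0.0] * N_STATES for _ in range(N_STATES)]
for i in range(N_STATES - 1):
    trans[i][i] = 0.6       # self-loop (assumed starting value)
    trans[i][i + 1] = 0.4   # advance  (assumed starting value)
trans[N_STATES - 1][N_STATES - 1] = 1.0

# Single-Gaussian output distribution per state:
# a mean vector and a diagonal variance vector.
means = [[0.0] * N_COEFFS for _ in range(N_STATES)]
variances = [[1.0] * N_COEFFS for _ in range(N_STATES)]
```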

Training and testing
- Flat-start initialization: use the global mean and variance of the training data for all models, and assign equal probability to all states
- Repeated embedded training: use all training data to simultaneously update all models
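
A minimal flat-start sketch in plain Python (HTK's HCompV tool performs this step; the function name and data layout here are illustrative):

```python
def flat_start(utterances, n_states=8):
    """Flat-start initialisation: pool every frame of every training
    utterance and give all states the same global mean and variance.
    Only the later embedded re-estimation differentiates the states."""
    frames = [frame for utt in utterances for frame in utt]
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n
           for d in range(dim)]
    # every state starts from identical parameters
    return ([mean[:] for _ in range(n_states)],
            [var[:] for _ in range(n_states)])
```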

Performance
- The infant-directed (ID) recognizer performs worse on adult-directed (AD) speech than the AD recognizer does
- But the ID recognizer performs better on AD material than the AD recognizer does on ID material
[Table: recognition results for the AD- and ID-trained recognizers on the AD and ID test sets; figures not recoverable]

Distance measure
- Distance between HMMs λ0 and λ1
- O_T is an observed sequence of T feature vectors generated from λ0
- Ergodicity
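
The slide does not reproduce the formula; a standard definition that matches this setup is Juang and Rabiner's HMM distance, which normalises the log-likelihood difference by the sequence length. A minimal sketch, assuming the two log-likelihoods have already been computed (for example with the α-recursion of the next slide):

```python
def hmm_distance(loglik_0, loglik_1, T):
    """D(l0, l1) = (1/T) * (log P(O_T | l0) - log P(O_T | l1)),
    where O_T is a length-T observation sequence generated by l0.
    Averaging D(l0, l1) and D(l1, l0) gives a symmetric version."""
    return (loglik_0 - loglik_1) / T
```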

α-recursion to compute P(O_T|λ)
- Computing the joint probability with the emission terms b_j(o_t) directly causes numerical underflow
- Work with log probabilities instead
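
A plain-Python sketch of the α-recursion carried out in the log domain; the interface, with nested lists of log-probabilities, is illustrative:

```python
import math

def logsumexp(values):
    """Stable log(sum(exp(v))) over an iterable of log-values."""
    vs = list(values)
    m = max(vs)
    if m == float('-inf'):
        return m
    return m + math.log(sum(math.exp(v - m) for v in vs))

def log_forward(log_pi, log_A, log_B):
    """Alpha-recursion in the log domain; returns log P(O_T | lambda).
    log_pi[i]  : log initial probability of state i
    log_A[i][j]: log transition probability i -> j
    log_B[t][j]: log emission probability of frame t in state j"""
    n = len(log_pi)
    alpha = [log_pi[j] + log_B[0][j] for j in range(n)]
    for t in range(1, len(log_B)):
        alpha = [logsumexp(alpha[i] + log_A[i][j] for i in range(n))
                 + log_B[t][j] for j in range(n)]
    return logsumexp(alpha)
```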

Log of summation
- Use the identity log(e^a + e^b) = b + log(1 + e^(a-b))
- Recursively apply it to sum more than two terms
- Make sure that the assumption b > a holds at every step of the recursion

Distances

Formant analysis
- Extract vowels from the keywords using a phone-based speech recognizer
- Estimate the frequencies of the first and second formants (F1 and F2)
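
The slides do not say how the formant frequencies were estimated; a common technique is LPC root-finding, sketched here with NumPy. The analysis order and the 90 Hz floor are assumed values, not taken from the project.

```python
import numpy as np

def lpc_coefficients(x, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion."""
    r = np.correlate(x, x, 'full')[len(x) - 1:len(x) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a.copy()
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a

def formant_frequencies(x, fs, order=10):
    """Estimate formants as the angles of the LPC polynomial roots."""
    roots = np.roots(lpc_coefficients(x, order))
    roots = roots[np.imag(roots) > 0.0]   # keep one root per conjugate pair
    freqs = np.angle(roots) * fs / (2.0 * np.pi)
    return np.sort(freqs[freqs > 90.0])   # drop near-DC roots; F1 is the first entry
```

Sorting the surviving root frequencies gives F1 and F2 as the first two entries, which is what the F1/F2 scatter on the next slide would be built from.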

Formant 1 vs. formant 2 [scatter plot of the estimated F1 and F2 values; figures not recoverable]

Conclusions
- In general, it is not worthwhile to train an ASR system on motherese speech
- The motherese-trained system produces more false positives, due to the reduced selectivity of its HMMs
- Motherese is not required to cover all speech sounds; using multiple speakers achieves the same effect
- Acquiring natural motherese speech is more difficult, because of the infants' presence during recording