From Word Spotting to OOV Modeling

Presentation transcript:

From Word Spotting to OOV Modeling
Paul Fitzpatrick (6345g11)

Title illustration (the title phrase as successive recognizer hypotheses, with the in-lexicon word spotted and OOV regions decoded as phone strings):
  OOV OOV spotting OOV OOV
  [f r ah m] OOV spotting [t uw] OOV
  [f r ah m] [w er d] spotting [t uw] OOV [m aa d el ih ng]

Goal
- To automatically extract filler vocabulary for word-spotting

Why?
- So language model has something to work with
- May improve recognition accuracy on keywords
- Gives earlier payoff in domain-specific training

Scenario
- Start with small lexicon (e.g. 5-50 words)
- Start with weak language model
- Bootstrap by clustering filler vocabulary from large collection of untranscribed data
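A minimal sketch (my illustration, not code from the presentation) of how such a hypothesis might be represented: in-lexicon words as plain strings, OOV regions as tuples of phone symbols, so the filler fragments can be pulled out for clustering. The phone-to-word glosses in the comments are my reading of the title-slide phone strings.

```python
# Minimal sketch (not code from the presentation): represent one recognizer
# hypothesis as in-lexicon words (strings) mixed with OOV regions decoded as
# tuples of phone symbols, then pull the OOV fragments out for clustering.

def extract_oov_fragments(hypothesis):
    """Return the OOV phone fragments (tuples of phones) in one hypothesis."""
    return [tok for tok in hypothesis if isinstance(tok, tuple)]

# The title-slide example "[f r ah m] [w er d] spotting [t uw] ...";
# the word glosses in the comments are my reading of the phone strings.
hypothesis = [
    ("f", "r", "ah", "m"),               # "from", decoded only as phones
    ("w", "er", "d"),                    # "word"
    "spotting",                          # in-lexicon word
    ("t", "uw"),                         # "to"
    ("m", "aa", "d", "el", "ih", "ng"),  # "modeling"
]
print(extract_oov_fragments(hypothesis))
```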

Methodology

Iterative loop:
- Run recognizer (producing a hypothesized transcript and N-best hypotheses)
- Extract OOV fragments
- Identify competition
- Identify rarely-used additions; remove from lexicon
- Add to lexicon; update lexicon and baseforms
- Update language model
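A rough, runnable sketch of one pass of this loop, under some simplifying assumptions: the recognizer step has already produced hypotheses in the word-plus-phone-fragment form sketched above, a "cluster" is just an exact phone string, and the add/prune thresholds (add_top_k, min_uses) are illustrative. The real system would also update baseforms and re-estimate the language model each pass.

```python
from collections import Counter

# Rough sketch of one pass of the update loop. Assumptions: hypotheses are
# lists mixing in-lexicon words (strings) and OOV fragments (phone tuples);
# a "cluster" is an exact phone string; thresholds are illustrative.

def update_lexicon(hypotheses, lexicon, add_top_k=10, min_uses=2):
    """Prune rarely-used filler entries, then add frequent OOV fragments."""
    fragment_counts = Counter(
        tok for hyp in hypotheses for tok in hyp if isinstance(tok, tuple)
    )
    # Identify rarely-used additions and remove them from the lexicon.
    for entry in [e for e in lexicon if isinstance(e, tuple)]:
        if fragment_counts[entry] < min_uses:
            lexicon.discard(entry)
    # Add the most frequent OOV fragments to the lexicon as filler entries
    # (in a full system this is also where baseforms would be updated).
    for frag, count in fragment_counts.most_common(add_top_k):
        if count >= min_uses:
            lexicon.add(frag)
    return lexicon

# Illustrative inputs echoing the results slide; not real system output.
lexicon = {"email", "phone", "room", "office", "address"}
hyps = [
    [("w", "ah", "t", "ih", "z"), "room", ("n", "ah", "m", "b", "er")],
    [("w", "eh", "r", "ih", "z"), "office", ("n", "ah", "m", "b", "er")],
]
print(update_lexicon(hyps, lexicon))  # gains ('n', 'ah', 'm', 'b', 'er')
```

Each bootstrap iteration would then re-estimate the language model over the updated hypotheses and run the recognizer again with the new lexicon.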

Results

Initial lexicon: email, phone, room, office, address

Top 10 OOV clusters found (ranked by frequency):
 1. n ah m b er
 2. w eh r ih z
 3. w ah t ih z
 4. t eh l m iy
 5. k ix n y uw
 6. p l iy z
 7. ae ng k y uw
 8. n ow
 9. hh aw ax b aw
10. g r uw p

Example sentence hypothesis:
  (w ah t ih z) (ih t er z uw) room (n ah m b er)
  i.e. "What is Victor Zue’s room number?"
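Read as English, these clusters are roughly "number", "where is", "what is", "tell me", "can you", "please", "thank you", "no", "how about", and "group" (my reading of the phone strings, not given in the slides). A natural way to install such clusters as filler lexicon entries with phone-level baseforms is sketched below; the "oov_NN" naming scheme is an assumption, not the presentation's.

```python
# Hedged sketch: turn the top-ranked OOV clusters into filler lexicon entries
# with auto-generated names and phone-level baseforms. The "oov_NN" names are
# an assumption; the phone strings are the ones reported on this slide.

top_clusters = [
    "n ah m b er", "w eh r ih z", "w ah t ih z", "t eh l m iy", "k ix n y uw",
    "p l iy z", "ae ng k y uw", "n ow", "hh aw ax b aw", "g r uw p",
]

baseforms = {
    f"oov_{i:02d}": phones.split() for i, phones in enumerate(top_clusters, 1)
}
print(baseforms["oov_01"])  # ['n', 'ah', 'm', 'b', 'er']
```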