
SpeechTEK August 22, 2007 Better Recognition by Manipulation of ASR Results: Generic concepts for components that post-process recognizer results. Emmett Coin, Industrial Poet, ejTalk, Inc.

SpeechTEK August 22, 2007 Who?  Emmett Coin  Industrial Poet  Rugged solutions via compact and elegant techniques  Focused on creating more powerful and richer dialog methods  ejTalk  Frontiers of Human-Computer conversation  What does it take to “talk with the machine”?  Can we make it meta?

SpeechTEK August 22, 2007 What this talk is about  How applications typically use the recognition result  Why accuracy is not that important, BUT error rate is.  How some generic techniques can sometimes help reduce the effective recognition error rate.

SpeechTEK August 22, 2007 How do most apps deal with recognition?  Specify a grammar (CFG or SLM)  Specify a level of “confidence”  Wait for the recognizer to decide what happens (no result, bad, good)  Use the 1st n-best result when it is “good”  Leave all the errors and uncertainties to the dialog management level
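A minimal sketch of that typical pattern, assuming a hypothetical n-best list of (text, confidence) pairs; the names and threshold are illustrative, not any vendor's API:

```python
# Typical pattern: take the top n-best entry if its confidence clears a threshold,
# otherwise hand the problem back to the dialog manager. All names are hypothetical.
CONFIDENCE_THRESHOLD = 0.5

def handle_recognition(nbest):
    """nbest: list of (text, confidence) tuples, best first."""
    if not nbest:
        return ("noinput", None)             # recognizer decided: no result
    text, confidence = nbest[0]              # use only the 1st n-best entry
    if confidence >= CONFIDENCE_THRESHOLD:
        return ("accept", text)              # "good" -- pass to the app
    return ("reject", text)                  # "bad" -- dialog level must recover

# Example:
print(handle_recognition([("pay my bill", 0.62), ("play my bill", 0.31)]))
# ('accept', 'pay my bill')
```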

SpeechTEK August 22, 2007 Accuracy: confusing concept  95% accuracy is good, 97% is a little better … or is it?  Think of roofing a house.  Do people accurately perceive the ratio of “correct” vs. “incorrect” recognition?  Users hardly notice when you “get it right”. They expect it.  When you get it wrong…
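A back-of-the-envelope illustration of why the error rate is the better lens: moving from 95% to 97% accuracy sounds like a two-point improvement, but the caller experiences roughly 40% fewer errors.

```python
def relative_error_reduction(acc_before, acc_after):
    """Error rate is 1 - accuracy; report the relative drop in errors."""
    err_before, err_after = 1 - acc_before, 1 - acc_after
    return (err_before - err_after) / err_before

print(relative_error_reduction(0.95, 0.97))  # ~0.40, i.e. about 40% fewer errors
```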

SpeechTEK August 22, 2007 Confidence: What is it?  A sort of “closeness” of fit  Acoustic scores  How well it matches the expected sounds  Language model scores  How much work it took to find the phrase  A splash of recognizer vendor voodoo  How voice-like, admix of noise, etc.  All mixed together and reformed as a number between 0.0 and 1.0 (usually)
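The exact recipe is vendor-specific (the "voodoo" above), but a toy sketch of the general shape, combining acoustic and language-model scores plus a signal-quality term and squashing the result into a 0.0 to 1.0 range, might look like this. Every weight, feature, and the logistic squash are invented for illustration, not any recognizer's actual formula:

```python
import math

def toy_confidence(acoustic_score, lm_score, snr_db,
                   w_am=0.6, w_lm=0.3, w_snr=0.1):
    """Blend normalized acoustic, language-model, and signal-quality scores,
    then squash with a logistic function. Purely illustrative."""
    raw = w_am * acoustic_score + w_lm * lm_score + w_snr * (snr_db / 30.0)
    return 1.0 / (1.0 + math.exp(-raw))      # map to (0, 1)

print(round(toy_confidence(acoustic_score=1.2, lm_score=0.8, snr_db=15), 3))
```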

SpeechTEK August 22, 2007 Confidence: How good is it?  Does it correlate with how a human would rank things?  Does it behave consistently?  long vs. short utterances?  Different word groups?  What happens when you rely on it?

SpeechTEK August 22, 2007 Can we add more to the model?  We already use  Sounds – the Acoustic Model (AM)  Words – the Language Model (LM)  We can add  Meaning – the Semantic Model (SM)  Rethinking

SpeechTEK August 22, 2007 Strategies that humans use  Rejection  Don’t hear repeated wrong utterances  Also called “skip lists”  Acceptance  Intentionally allowing only the likely utterances  Aka “pass lists”  Anticipation  Asking a question where the answer is known  Sometimes called “hints”

SpeechTEK August 22, 2007 Rejection (skip)  People and computers should not make the same mistake twice.  Keep a list of confirmed mis-recs  Remove those from the next recognition’s n-best list  But beware the dark side...  …the Chinese finger puzzle.  Remember: knowing what to reject is based on recognition too!
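A sketch of the skip-list idea: drop confirmed mis-recognitions from the next n-best list before the usual selection runs. As the slide warns, the skip list itself rests on recognition, so entries should only be added after an explicit confirmation. Names and data are illustrative:

```python
def apply_skip_list(nbest, skip_list):
    """Remove hypotheses already confirmed wrong earlier in the dialog.
    nbest: list of (text, confidence); skip_list: set of confirmed mis-recs."""
    return [(text, conf) for text, conf in nbest if text.lower() not in skip_list]

skip = {"play my bill"}                       # user already said "no, that's wrong"
nbest = [("play my bill", 0.58), ("pay my bill", 0.41)]
print(apply_skip_list(nbest, skip))           # [('pay my bill', 0.41)]
```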

SpeechTEK August 22, 2007 Acceptance (pass)  It is possible to specify the relative weights in the language model (grammar).  But there is a danger. It is a little like cutting the legs of a chair to make it level. Hasty modifications will have unintended interactions.  Another way is to create a sieve.  This has the advantage of not changing the balance of the model. The hypotheses that do not pass the sieve become a de facto garbage collector.
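One way to read the sieve idea: leave the grammar weights alone and partition the n-best list after recognition, so pass-list hypotheses are considered first and everything else falls through as de facto garbage. A hedged sketch with hypothetical names:

```python
def sieve(nbest, pass_list):
    """Split the n-best list into (passed, fell_through) without touching
    the language model itself."""
    passed = [(t, c) for t, c in nbest if t.lower() in pass_list]
    fell_through = [(t, c) for t, c in nbest if t.lower() not in pass_list]
    return passed, fell_through

expected = {"checking", "savings"}
nbest = [("chicken", 0.55), ("checking", 0.52), ("savings", 0.20)]
passed, rest = sieve(nbest, expected)
print(passed)   # [('checking', 0.52), ('savings', 0.2)]
```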

SpeechTEK August 22, 2007 Anticipation  Explicit  e.g. confirming identity, amounts, etc.  Probabilistic  Dialogs are journeys  Some parts of the route are routine, predictable
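A sketch of probabilistic anticipation: when the dialog can predict the likely answer (a "hint"), nudge matching hypotheses upward at post-processing time rather than rewriting the grammar. The boost factor here is invented for illustration:

```python
def apply_hints(nbest, hints, boost=1.2):
    """Rescore hypotheses that match an anticipated answer, then re-sort.
    nbest: list of (text, confidence); hints: set of anticipated phrases."""
    rescored = [(t, min(1.0, c * boost) if t.lower() in hints else c)
                for t, c in nbest]
    return sorted(rescored, key=lambda tc: tc[1], reverse=True)

# A yes/no confirmation is expected at this point in the dialog.
print(apply_hints([("yeah", 0.48), ("yale", 0.50)], hints={"yeah", "yes"}))
# "yeah" now ranks first
```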

SpeechTEK August 22, 2007 What should we disregard?  When is a recognition event truly the human talking to the computer?  The human is speaking  But not to the computer  But saying the wrong thing  Some human is saying something  Other noise  Car horn, mic bump, radio music, etc.  As dialogs get longer we need to politely ignore what we were not intended to respond to

SpeechTEK August 22, 2007 In and Out of Grammar (OOG)  The recognizer returned some text  Was it really what was said?  Can we improve over the “confidence”?  Look at the “scores” of the n-best list  Use them as a “feature space”  Use example waves to discover clusters in feature space that correlate with “in” and “out” of vocabulary
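A sketch of treating n-best scores as a feature space for in- versus out-of-grammar decisions. A real system would learn the boundary from labeled example waves; here a nearest-centroid toy stands in, and every feature and centroid value is made up:

```python
def nbest_features(scores):
    """Simple features over the n-best confidence scores: top score,
    margin to the runner-up, and spread across the list."""
    top = scores[0]
    margin = scores[0] - scores[1] if len(scores) > 1 else scores[0]
    spread = max(scores) - min(scores)
    return (top, margin, spread)

def classify_oog(scores,
                 in_centroid=(0.7, 0.3, 0.4),
                 out_centroid=(0.4, 0.05, 0.1)):
    """Label an utterance in- or out-of-grammar by the nearer centroid.
    Centroids would come from clustering labeled example waves."""
    f = nbest_features(scores)
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return "in-grammar" if dist(f, in_centroid) < dist(f, out_centroid) else "out-of-grammar"

print(classify_oog([0.72, 0.35, 0.30]))   # scores well separated: in-grammar
print(classify_oog([0.41, 0.39, 0.38]))   # flat, low scores: out-of-grammar
```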

SpeechTEK August 22, 2007 Where do we put it?  Where does all this heuristic post analysis go? Out in the dialog?  How can we minimize the cognitive load on the application developer?  We need to wrap up all this extra functionality inside a new container to hide the extra complexity
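One way to hide this post-analysis from the application developer, as the slide suggests, is to wrap the recognizer in a "listening box" that owns the skip lists, hints, and other heuristics while exposing the same simple interface the app already expects. A structural sketch; the recognizer object and its recognize() method are assumptions:

```python
class ListeningBox:
    """Wraps a recognizer and applies the post-recognition heuristics internally,
    so the dialog layer still sees a single best result. All names hypothetical."""

    def __init__(self, recognizer, threshold=0.5):
        self.recognizer = recognizer          # assumed: .recognize(audio, grammar) -> n-best list
        self.threshold = threshold
        self.skip_list = set()                # confirmed mis-recs
        self.hints = set()                    # anticipated answers

    def listen(self, audio, grammar):
        nbest = self.recognizer.recognize(audio, grammar)
        nbest = [(t, c) for t, c in nbest if t.lower() not in self.skip_list]
        nbest = sorted(((t, c * 1.2 if t.lower() in self.hints else c) for t, c in nbest),
                       key=lambda tc: tc[1], reverse=True)
        if nbest and nbest[0][1] >= self.threshold:
            return nbest[0]
        return None                           # let the dialog layer handle rejection
```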

SpeechTEK August 22, 2007 Re-listening  If an utterance is going to be rejected, try again. (Re-listen to the same wave.)  If you can infer a smaller scope, listen with a grammar that “leans” that way.  Merge the n-best lists via some heuristic  Re-think the combined utterance to see if it can now be considered “good and in grammar”
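A sketch of re-listening: before rejecting, run the same captured audio through a narrower grammar that leans toward the inferred scope, merge the two n-best lists, and re-check the result. The merge here is a naive max-confidence union; the actual heuristic is left open, and the recognizer interface is assumed:

```python
def merge_nbests(a, b):
    """Union of two n-best lists, keeping the higher confidence per phrase."""
    merged = {}
    for text, conf in a + b:
        merged[text] = max(conf, merged.get(text, 0.0))
    return sorted(merged.items(), key=lambda tc: tc[1], reverse=True)

def relisten(audio, recognizer, wide_grammar, narrow_grammar, threshold=0.5):
    """Re-listen to the *same* wave with a narrower grammar if the first pass fails."""
    first = recognizer.recognize(audio, wide_grammar)
    if first and first[0][1] >= threshold:
        return first[0]
    second = recognizer.recognize(audio, narrow_grammar)   # same audio, narrower scope
    combined = merge_nbests(first, second)
    if combined and combined[0][1] >= threshold:
        return combined[0]
    return None
```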

SpeechTEK August 22, 2007 Serial Listening  The last utterance is not “good enough”  Prompt for a repeat and listen again (live audio from the user)  If it is “good” by itself, use it  Otherwise, heuristically merge the n-best lists based on similarities  Re-think the combined utterance to see if it can now be considered “good and in grammar”
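Serial listening differs from re-listening in that the second n-best list comes from a fresh user utterance; the merge "based on similarities" could, for example, reward hypotheses that show up in both attempts. A hedged sketch of that idea, with an invented agreement bonus:

```python
def merge_serial(nbest1, nbest2, agreement_bonus=0.2):
    """Combine n-best lists from two separate attempts, boosting hypotheses
    that both attempts agree on. The bonus value is illustrative."""
    scores = {}
    for nbest in (nbest1, nbest2):
        for text, conf in nbest:
            scores[text] = max(scores.get(text, 0.0), conf)
    agreed = {t for t, _ in nbest1} & {t for t, _ in nbest2}
    for text in agreed:
        scores[text] = min(1.0, scores[text] + agreement_bonus)
    return sorted(scores.items(), key=lambda tc: tc[1], reverse=True)

attempt1 = [("seven fifty", 0.45), ("seventy", 0.44)]
attempt2 = [("seven fifty", 0.48), ("seven sixty", 0.41)]
print(merge_serial(attempt1, attempt2))   # "seven fifty" wins on agreement
```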

SpeechTEK August 22, 2007 Parallel Listening  Listen on two recognizers  One with the narrow “expectation” grammar  The other with the wide “possible” grammar  If the utterance is in both results, process the “expectation” results  If not, process the “possible” results
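A sketch of the parallel arrangement: run the audio against both grammars and prefer the narrow "expectation" result whenever the two result sets overlap. The recognizer object and method names are assumptions:

```python
def parallel_listen(audio, recognizer, expectation_grammar, possible_grammar):
    """Recognize against a narrow and a wide grammar; prefer the narrow result
    when the utterance appears in both n-best lists."""
    expected = recognizer.recognize(audio, expectation_grammar)
    possible = recognizer.recognize(audio, possible_grammar)
    expected_texts = {t for t, _ in expected}
    if any(t in expected_texts for t, _ in possible):
        return ("expectation", expected)      # both heard it: trust the narrow grammar
    return ("possible", possible)             # fall back to the wide grammar's view
```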

SpeechTEK August 22, 2007 Conclusions  Error rate is the metric to watch  There is more information in the recognition result than the 1st “good” n-best entry  Putting conventional recognition inside a heuristic “box” makes sense  The information needed by the “box” is a logical extension of the listening context

SpeechTEK August 22, 2007 Emmett Coin, ejTalk, Inc. Thank you