Tuning Your Application: The Job’s Not Done at Deployment Monday, February 3, 2003 Developing Applications: It’s Not Just the Dialog.

Presentation Overview
1. Potential “Non-Recognizer” Errors
2. System Evaluation Metrics
3. Tuning/Testing Tools

Speech applications face perception problems that revolve around numerous “non-recognizer” errors.

Types of Potential Errors
- Grammar Errors
- Prompt Errors
- User Errors
- Pronunciation Errors

Types of Potential Errors: Grammar Errors
- Program Error – the application fails to perform as designed (perhaps a bug or an incorrect linkage)
- Prediction Error – the grammar set does not accurately reflect all the words a caller might use
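A minimal sketch of how prediction errors can be spotted during tuning: compare transcribed caller utterances against the grammar's vocabulary and flag utterances containing out-of-grammar words. The grammar and transcripts below are made-up examples, not from any real application.

```python
# Hypothetical illustration: estimate how often callers use words the
# grammar does not cover (a rough proxy for prediction errors).
grammar = {"checking", "savings", "balance", "transfer", "operator"}

transcripts = [
    "checking balance",
    "savings balance",
    "talk to a person",  # none of these words are in the grammar
]

def out_of_grammar_rate(utterances, grammar):
    """Fraction of utterances containing at least one out-of-grammar word."""
    misses = sum(
        any(word not in grammar for word in utt.split())
        for utt in utterances
    )
    return misses / len(utterances)

print(out_of_grammar_rate(transcripts, grammar))
```

In practice the transcripts would come from recorded calls sampled after deployment; a high out-of-grammar rate suggests the grammar needs to be expanded.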

Types of Potential Errors: Prompt Errors
- Vague Prompts – the prompt doesn’t provide enough information to help callers complete their goal
- Redundant Prompts – the prompt continually repeats information that can easily be inferred
- Lengthy Prompts – the prompt is so long that customers lose track of their options
- Misleading Prompts – the prompt presents choices whose language may lead a caller to an incorrect or inappropriate response

Types of Potential Errors: Prompt/Grammar Issues
- Coordination Problems – prompts and grammar sets do not match
- Pronunciation Variation – unanticipated pronunciations cause mismatches

Types of Potential Errors: Issues on the Caller’s Side
- Loud background noises
- Speech directed at a person in the background instead of the system
- Bad phone or connection
- Unintentional speech (like exclamations) or speech-like noise (coughs, breaths)

System Evaluation Metrics
1. Non-Programmatic Evaluations
   - Customer Satisfaction
   - Word Error Rate (WER)
   - Dialog System Metrics
2. Programmatic Evaluations
   - PARADISE

M. A. Walker, D. Litman, C. A. Kamm, and A. Abella. PARADISE: A general framework for evaluating spoken dialogue agents. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL/EACL 97), 1997.

Non-Programmatic Evaluation: Customer Satisfaction
- Relies on customer surveys
- Poses problems with accuracy and missing/inaccurate information
- Low response rate
- Typically the most neglected evaluation, yet the most important

Non-Programmatic Evaluation: Word Error Rate (WER) / Word Accuracy (WA)
- WER measures the percentage of words incorrectly recognized
- WA measures the percentage of words correctly recognized, regardless of insertions, deletions, etc.
- Neither WER nor WA measures CPU load, response times, task completion rates, etc., all critical measures for a dialog system
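WER is conventionally computed as the word-level edit distance (substitutions + insertions + deletions) between the reference transcript and the recognizer output, divided by the number of reference words. A minimal sketch, with made-up example utterances:

```python
# Word error rate via Levenshtein (edit) distance over word sequences.
def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# "to" misrecognized as "two" (substitution), "my" dropped (deletion):
# 2 errors over 5 reference words, so WER = 0.4
print(word_error_rate("transfer to my savings account",
                      "transfer two savings account"))
```

Note that because insertions count against WER, it can exceed 100% on a very noisy channel.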

Non-Programmatic Evaluation: Dialog System Metrics
Unlike customer satisfaction or WER, these metrics generally require someone to count event frequencies in the call logs. Some typical questions:
- How often do callers hang up in the middle of the call flow?
- How often/soon do callers “pound out”?
- How often does a caller use a word not in the grammar?
- How many times does the system ask the caller for confirmation, or the caller correct the system?
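The questions above reduce to counting events in call logs. A minimal sketch over a simplified, fabricated event log (the event names here are invented for illustration, not from any real platform):

```python
# Illustrative only: each call is a list of event tokens from a log.
calls = [
    ["prompt", "response", "confirm", "response", "success"],
    ["prompt", "hangup"],                       # mid-flow hang-up
    ["prompt", "nomatch", "prompt", "optout"],  # caller "pounded out"
    ["prompt", "response", "success"],
]

n = len(calls)
hangup_rate  = sum("hangup" in c for c in calls) / n   # mid-flow hang-ups
optout_rate  = sum("optout" in c for c in calls) / n   # escapes to an agent
nomatch_rate = sum("nomatch" in c for c in calls) / n  # out-of-grammar input
confirms_per_call = sum(c.count("confirm") for c in calls) / n

print(hangup_rate, optout_rate, nomatch_rate, confirms_per_call)
```

With real logs the counting is the easy part; the work is instrumenting the application so these events are logged consistently.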

Programmatic Evaluation: PARADISE
PARADISE is a more complex, research-oriented approach that performs the same kind of analysis as the non-programmatic methods. It combines customer satisfaction surveys with detailed statistical models to produce a measure of dialog system performance.
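The statistical core of PARADISE is a linear regression: user satisfaction survey scores are modeled as a weighted combination of task success and dialogue cost measures, and the learned weights become the performance function. A toy sketch of that idea, with fabricated per-call data (the specific measures and values below are invented for illustration):

```python
import numpy as np

# Fabricated per-call measures: task success (kappa) and two cost factors
kappa  = np.array([0.9, 0.8, 0.4, 0.7, 0.2, 0.95])   # task success
turns  = np.array([8, 10, 15, 9, 18, 7])             # dialogue length
errors = np.array([0, 1, 4, 1, 5, 0])                # ASR error count
satisfaction = np.array([4.5, 4.0, 2.0, 3.8, 1.5, 4.8])  # survey, 1-5 scale

def zscore(x):
    """Normalize a measure so regression weights are comparable."""
    return (x - x.mean()) / x.std()

# Predictors are normalized, as PARADISE prescribes; last column = intercept
X = np.column_stack([zscore(kappa), zscore(turns), zscore(errors),
                     np.ones(len(kappa))])
weights, *_ = np.linalg.lstsq(X, satisfaction, rcond=None)
predicted = X @ weights
print(weights[:3])  # relative contribution of success vs. each cost factor
```

The learned weights make the trade-offs explicit: for example, whether shortening dialogues or reducing recognition errors would do more for satisfaction.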

Programmatic Evaluation: Things You Can Do in PARADISE
- Measure performance correlations between customer satisfaction and different dialog strategies
- Measure performance correlations between parts of the system and the system as a whole
- Measure performance correlations between very simple and very complex dialog systems/tasks

Programmatic vs. Non-Programmatic Solutions
- PARADISE requires a fairly sophisticated level of programming support, as well as extensive knowledge of statistical analysis
- The non-programmatic metrics are relatively straightforward to calculate and require very little programming

Tuning/Testing Tools
- Does the speech recognition company you currently use, or plan to use, provide them?
- Will the tools be detailed enough to allow someone within your company to evaluate potential errors and general caller satisfaction?
- If these tools are not available, your company will need to rely on caller feedback and the diligence of your application provider.

Thank You!