VISIONS, TECHNOLOGY, AND BUSINESS OF TALKING MACHINES Roberto Pieraccini, CTO, Tell-Eureka Corporation 535 West 34th Street New York, NY 10001 +1 646.

Presentation transcript:

The vision

Recreating the Speech Chain
DIALOG, SEMANTICS, SYNTAX, LEXICON, MORPHOLOGY, PHONETICS
VOCAL-TRACT ARTICULATORS
INNER EAR, ACOUSTIC NERVE
SPEECH RECOGNITION, DIALOG MANAGEMENT, SPOKEN LANGUAGE UNDERSTANDING, SPEECH SYNTHESIS

The technology

Talking Machines: First Steps into Spoken Language Technology
Joseph Faber (1835); Von Kempelen (1791); Homer Dudley, Bell Labs (1939)

Speech Recognition: the Early Years
- 1952 – Automatic Digit Recognition (AUDREY)
- Davis, Biddulph, Balashek (Bell Laboratories)

1960’s – Speech Processing and Digital Computers
- AD/DA converters and digital computers start appearing in the labs
James Flanagan, Bell Laboratories

The Illusion of Segmentation... or... Why Speech Recognition is so Difficult
m I n & m b & r i s e v & n th r E n I n z E r o t ü s e v & n f O r
MY NUMBER IS SEVEN THREE NINE ZERO TWO SEVEN FOUR
NP VP
(user:Roberto (attribute:telephone-num value: ))
errors: Intra-speaker variability, Noise/reverberation, Coarticulation, Context-dependency, Word confusability, Word variations, Speaker Dependency, Multiple Interpretations, Limited vocabulary, Ellipses and Anaphors
rules

1969 – Whither Speech Recognition?
[…] General purpose speech recognition seems far away. Special-purpose speech recognition is severely limited. It would seem appropriate for people to ask themselves why they are working in the field and what they can expect to accomplish. […] It would be too simple to say that work in speech recognition is carried out simply because one can get money for it. That is a necessary but not sufficient condition. We are safe in asserting that speech recognition is attractive to money. The attraction is perhaps similar to the attraction of schemes for turning water into gasoline, extracting gold from the sea, curing cancer, or going to the moon. One doesn’t attract thoughtlessly given dollars by means of schemes for cutting the cost of soap by 10%. To sell suckers, one uses deceit and offers glamour. […] Most recognizers behave, not like scientists, but like mad inventors or untrustworthy engineers. The typical recognizer gets it into his head that he can solve “the problem.” The basis for this is either individual inspiration (the “mad inventor” source of knowledge) or acceptance of untested rules, schemes, or information (the untrustworthy engineer approach).
J. R. Pierce, Executive Director, Bell Laboratories. The Journal of the Acoustical Society of America, June 1969

: The ARPA SUR project
- In spite of the anti-speech-recognition campaign headed by the Pierce Commission, ARPA launches a 5-year program on Speech Understanding Research
- REQUIREMENTS: 1000-word vocabulary, 90% understanding rate, near real time on a 100 MIPS machine
- 4 systems built by the end of the program:
  - SDC (24%)
  - BBN’s HWIM (44%)
  - CMU’s Hearsay II (74%)
  - CMU’s HARPY (95% times real time!)
- HARPY was based on an engineering approach: search on a network representing all the possible utterances
- Lack of a scientific evaluation approach
- Speech Understanding: too early for its time
- The project was not extended.
Raj Reddy -- CMU
LESSON LEARNED:
- Hand-built knowledge does not scale up
- Need of a global “optimization” criterion

Vintage Speech Recognition

1970’s – Dynamic Time Warping
The Brute Force of the Engineering Approach
TEMPLATE (WORD 7) aligned against UNKNOWN WORD
T.K. Vyntsyuk (1969); H. Sakoe, S. Chiba (1970)
Isolated Words, Speaker Dependent, Connected Words, Speaker Independent, Sub-Word Units
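The template-matching idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the formulation from the cited papers: it assumes each word is a sequence of scalar feature values (real systems compare spectral feature vectors), uses the absolute difference as the local distance, and allows the three basic step directions. The names `dtw_distance` and `recognize` are hypothetical.

```python
import math


def dtw_distance(template, unknown):
    """Return the cost of the best time alignment between two sequences."""
    n, m = len(template), len(unknown)
    # cost[i][j] = best cost of aligning template[:i] with unknown[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(template[i - 1] - unknown[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = local + min(
                cost[i - 1][j],      # template frame advances alone
                cost[i][j - 1],      # unknown frame advances alone
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


def recognize(templates, unknown):
    """Pick the stored word whose template warps onto the input most cheaply."""
    return min(templates, key=lambda word: dtw_distance(templates[word], unknown))
```

For example, with the toy templates `{"seven": [1.0, 3.0, 5.0, 3.0], "four": [5.0, 4.0, 1.0, 1.0]}`, the slower utterance `[1.0, 1.0, 3.0, 5.0, 5.0, 3.0]` is matched to "seven" because stretching the template in time absorbs the repeated frames at no cost.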

1980s -- The Statistical Approach
- Based on work on Hidden Markov Models done by Leonard Baum at IDA, Princeton in the late 1960s
- Purely statistical approach pursued by Fred Jelinek and Jim Baker at IBM T.J. Watson Research
- Foundations of modern speech recognition engines
Fred Jelinek
Acoustic HMMs: states S1, S2, S3 with transition probabilities a11, a12, a22, a23, a33
Word Tri-grams
- No Data Like More Data
- Whenever I fire a linguist, our system performance improves (1988)
- Some of my best friends are linguists (2004)
Jim Baker
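To make the acoustic HMM in the diagram concrete, here is a minimal Python sketch of the forward algorithm on a three-state left-to-right model with only the transitions named on the slide (a11, a12, a22, a23, a33). The probability values, the discrete symbols "a" and "b", and the function name `forward_likelihood` are illustrative assumptions; real recognizers use continuous acoustic features and combine such models with a word-level language model.

```python
def forward_likelihood(trans, emit, observations):
    """Total probability of an observation sequence under a left-to-right HMM.

    trans[i][j] is the probability of moving from state i to state j;
    emit[j][symbol] is the probability that state j emits the symbol.
    The model is assumed to start in the first state.
    """
    n = len(trans)
    alpha = [0.0] * n
    alpha[0] = emit[0][observations[0]]
    for obs in observations[1:]:
        # Sum over every predecessor state, then emit the new symbol.
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs]
            for j in range(n)
        ]
    return sum(alpha)


# Hypothetical parameters: rows are S1, S2, S3; zeros forbid backward jumps.
TRANS = [
    [0.6, 0.4, 0.0],  # a11, a12
    [0.0, 0.7, 0.3],  # a22, a23
    [0.0, 0.0, 1.0],  # a33
]
EMIT = [
    {"a": 0.9, "b": 0.1},
    {"a": 0.2, "b": 0.8},
    {"a": 0.5, "b": 0.5},
]
```

Calling `forward_likelihood(TRANS, EMIT, ["a", "b"])` sums the two possible paths (stay in S1, or move to S2), giving 0.9 × 0.6 × 0.1 + 0.9 × 0.4 × 0.8 = 0.342.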

– The statistical approach becomes ubiquitous
- Lawrence Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE, Vol. 77, No. 2, February 1989.

1980s-1990s – The Power of Evaluation
Pros and Cons of DARPA programs
+ Continuous incremental improvement
- Loss of “bio-diversity”
SPOKEN DIALOG INDUSTRY: SPEECHWORKS, NUANCE, MIT, SRI, TECHNOLOGY VENDORS, PLATFORM INTEGRATORS, APPLICATION DEVELOPERS, HOSTING, TOOLS, STANDARDS

The business of speech

Voice User Interface (VUI) Design: the Quantum Leap in Dialog Systems
- The WildFire Effect
- Change of perspective: from technology driven to user centered
- RESEARCH: natural language, free form
- COMMERCIAL: task completion and usability
- Persona: the personality of the application (TTS vs. recording)
- Speech recognition accuracy is important, but success is determined by the VUI.
- The importance of a repeatable, streamlined, teachable development process

The Speech Application Lifecycle
Phases: requirements, VUI design, usability 1, VUI development, speech science, high level system design, system engineering, integration, partial deployment, full deployment
Roles: Analyst, VUI Designer, Speech Scientist, VUI Designer, Architect, App Developer, Engineer, Project Manager

Voice User Interface Design

Call flow elements: Get Origin Account; Get Destination Account; Get Amount; Enter Transfer; decision "amount > origin account?" (YES / NO); Play Wrong Amount Message; Play Confirmation; decision "confirmed?" (YES / NO); What is wrong? (amount, destination account, origin account); Go to Main Menu

Interaction Module: Get Amount

PROMPTS
Type | Wording | Source
Initial | Please say the amount you would like to transfer from your | get_amount_I_1.wav
 | | TTS
 | to your | get_amount_I_2.wav
 | | TTS
 | in dollars and cents. | get_amount_I_3.wav
Retry 1 | Please say the amount you would like to transfer from your | get_amount_I_1.wav
 | | TTS
 | to your | get_amount_I_2.wav
 | | TTS
 | in dollars and cents. | get_amount_I_3.wav
Retry 2 | Please say the amount you would like to have transferred, like one hundred dollars and fifty cents. | get_amount_R_2_1.wav
Timeout 1 | I'm sorry, I didn't hear you. | get_amount_T_1_1.wav
 | Please say the amount you would like to transfer from your | get_amount_I_1.wav
 | | TTS
 | to your | get_amount_I_2.wav
 | | TTS
Timeout 2 | I didn't hear you this time either. Please say the amount you would like to have transferred, like one hundred dollars and fifty cents. | get_amount_T_2_1.wav
Help | Please say how much you wish to transfer. You can say the amount in dollars and cents, like, for instance, one hundred dollars and fifty cents. | get_amount_H.wav

ACTIONS
CONDITION | ACTION
if amount greater than amount in | Go to "Play Wrong Amount Message"
else | Go to "Play Confirmation"
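The prompt table and the action rule of an interaction module like Get Amount could be driven by logic along the following lines. This is a hedged Python sketch: the function names, the event labels, and the capping of retries and timeouts at level 2 are assumptions for illustration. The comparison against the origin account balance is also an assumption, taken from the "amount > origin account?" decision in the call flow, since the condition in the ACTIONS table is truncated.

```python
def select_prompt(event, attempt):
    """Return the prompt type to play for a dialog event.

    event is one of "initial", "help", "retry" (nothing recognized)
    or "timeout" (caller said nothing); attempt counts from 1.
    """
    if event in ("initial", "help"):
        return event
    # Retry and timeout prompts escalate once and then stay at level 2
    # (assumed behavior; the slide only defines levels 1 and 2).
    level = min(attempt, 2)
    return f"{event}_{level}"


def next_module(amount, origin_balance):
    """Apply the ACTIONS rule of the Get Amount module.

    Assumes the truncated condition compares the requested amount
    with the balance of the origin account.
    """
    if amount > origin_balance:
        return "Play Wrong Amount Message"
    return "Play Confirmation"
```

Keeping prompt selection and the branching rule in small separate functions mirrors the split between the PROMPTS and ACTIONS tables, which is what lets a VUI designer change wording without touching the call flow.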

Speech Science: Tuning for performance
Recognition
- In Vocabulary
  - Correctly Recognize
    - Accept: Correct acceptance
    - Confirm: Correct confirmation
    - Falsely Reject: False rejection
  - Misrecognize
    - Accept: False acceptance - in
    - Confirm: False confirmation
- Out of Vocabulary
  - Correctly Reject: Correct rejection
  - Falsely Accept: False acceptance - out

Speech Science: Tuning for performance
Columns: utt# | sub-err% | fa-err% | fr-err% | rej% | OOV% | fa-oov%
Rows: WaitPowerBothUp, WaitHowMuchSnow, MissingOneChannel, WPAllChannels, PictureBack, WaitFindInputSource, PictureProb, DM
Utt# = number of utterances
Sub-err% = percent of in-vocabulary utterances wrongly recognized
Fa-err% = percent of utterances wrongly accepted
Fr-err% = percent of utterances wrongly rejected
Rej% = total percent of all utterances rejected
OOV% = percent of out-of-vocabulary utterances
Fa-oov% = percent of out-of-vocabulary utterances wrongly accepted
ACTION
- Prioritize grammars that need improvement
- Use transcriptions to improve grammars
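The column definitions above can be turned into a small report function. The following Python sketch is one possible reading, not the tool used to produce the original table: the `Utterance` record and its three flags are hypothetical, a "wrongly accepted" utterance is taken to be any accepted utterance that was out of vocabulary or misrecognized, and a "wrongly rejected" utterance is taken to be a correctly recognized in-vocabulary utterance that was rejected.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    in_vocab: bool   # transcription is covered by the grammar
    correct: bool    # recognizer output matches the transcription
    accepted: bool   # result passed the confidence threshold


def pct(part, whole):
    """Percentage helper that tolerates an empty denominator."""
    return 100.0 * part / whole if whole else 0.0


def tuning_report(utts):
    """Compute the slide's tuning columns for one set of utterances."""
    total = len(utts)
    in_voc = [u for u in utts if u.in_vocab]
    oov = [u for u in utts if not u.in_vocab]
    wrongly_accepted = sum(
        u.accepted and not (u.in_vocab and u.correct) for u in utts
    )
    wrongly_rejected = sum(
        u.in_vocab and u.correct and not u.accepted for u in utts
    )
    return {
        "utt#": total,
        "sub-err%": pct(sum(not u.correct for u in in_voc), len(in_voc)),
        "fa-err%": pct(wrongly_accepted, total),
        "fr-err%": pct(wrongly_rejected, total),
        "rej%": pct(sum(not u.accepted for u in utts), total),
        "OOV%": pct(len(oov), total),
        "fa-oov%": pct(sum(u.accepted for u in oov), len(oov)),
    }
```

Running the report once per row of the table and sorting by the error columns gives a direct way to prioritize which grammars to improve first, as the ACTION items suggest.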

The Architectural Evolution of Spoken Dialog
Native Code; Proprietary IVR Systems; Standard Clients (VoiceXML); Standard Application servers

The Voice Web
Components: Telephone, Telephony Platform, Voice Browser, ASR, TTS, Internet, Web Server
Standards: VoiceXML/SALT, MRCP, SSML, SRGF, EMMA?, SCXML?, CCXML

The Evolution of the Interface and the Research-Industry Chasm
Natural Language; Directed Dialog
Research Systems à la DARPA Communicator
Small Vocabulary, Menu Based; Large Vocabulary, Dialog Modules; SLU: Statistical Language Understanding
Spoken dialog as an anthropomorphic system; Spoken dialog as a tool

The evolution of the market and the industry
TECHNOLOGY VENDORS: SPEECH RECOGNITION, TTS
PLATFORM INTEGRATORS: IVR, VoiceXML, CTI, …
TOOLS: AUTHORING, TUNING, PREPACKAGED APPLICATIONS
APPLICATION DEVELOPERS
PROFESSIONAL SERVICES
HOSTING
$600M to $1,000M revenue
More than 8,000 applications worldwide
New evolving standards guarantee interoperability of engines and platforms.

Third generation dialog systems
COMPLEXITY: LOW, MEDIUM, HIGH
FLIGHT STATUS, STOCK TRADING, PACKAGE TRACKING, FLIGHT/TRAIN RESERVATION, BANKING, CUSTOMER CARE, TECHNICAL SUPPORT
1st Generation: INFORMATIONAL
2nd Generation: TRANSACTIONAL
3rd Generation: PROBLEM SOLVING

Spoken Dialog goes to Saturday Night Live