Improving Dialogs with EMMA

Presentation on theme: "Improving Dialogs with EMMA" — Presentation transcript:

1 Improving Dialogs with EMMA
Deborah A. Dahl
Principal, Conversational Technologies
Chair, W3C Multimodal Interaction Working Group
SpeechTEK 2009, August 24-27, 2009, New York

2 What information does speech recognition contribute to a dialog system?

3 What the person said and what it means
The basics: what the person said and what it means
But there's a lot more information available!
VoiceXML represents some of this:
Other alternatives
Confidence

4 Information about the Context
When they said it
How long it took to say
Other possibilities for what they might have said
The recognizer's confidence
This additional information can be extremely useful; we'll talk about a few ideas today

5 How can we get more information?
Low-level recognition APIs like SAPI or JSAPI can provide a lot of other information
But not all recognizers support these APIs
They can be very complex to use
They only support speech, not multimodal inputs
EMMA can provide much more information

6 A New W3C Standard: EMMA
EMMA (Extensible MultiModal Annotation) provides a standard, XML-based, multimodal way to represent detailed information about user inputs and their contexts, from speech, handwriting, typing, biometrics, haptics, and many other modalities

7 EMMA adds
Timestamps
Processor
Source
Signal
Endpoints
Application-specific information
Grammar
Groupings of related inputs
Stages of interpretation: speech, natural language
Alternatives: nbest lists or lattices

8 An EMMA Document from a Speech Recognizer
<emma:emma version="1.0"
           xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:info>
    <application>music</application>
  </emma:info>
  <emma:one-of id="oneof2" emma:dialog-turn="1" emma:duration="1860"
               emma:start="" emma:end="" emma:function="dialog"
               emma:grammar-ref="gram-4" emma:lang="en-us"
               emma:medium="acoustic" emma:mode="speech" emma:verbal="true">
    <emma:interpretation id="interp10" emma:confidence=""
                         emma:tokens="Beethoven third symphony">
      <composer>ludwig_van_beethoven</composer>
      <name>opus 55</name>
    </emma:interpretation>
    <emma:interpretation id="interp8" emma:confidence=""
                         emma:tokens="Beethoven's ninth symphony">
      <name>opus 125</name>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>

9 Not very human-friendly…

10 But, since it’s an XML language,
many standard tools can process it and provide useful visualizations
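As a minimal sketch of that point, the document can be parsed with Python's standard ElementTree; the sample document below is hypothetical (shortened from the slide's music example) but uses the real EMMA 1.0 namespace.

```python
# Extract (tokens, confidence) pairs from an EMMA result document.
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"

SAMPLE = """<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:one-of id="oneof2">
    <emma:interpretation id="interp10" emma:confidence="0.82"
                         emma:tokens="Beethoven third symphony"/>
    <emma:interpretation id="interp8" emma:confidence="0.41"
                         emma:tokens="Beethoven's ninth symphony"/>
  </emma:one-of>
</emma:emma>"""

def interpretations(emma_xml):
    """Return (tokens, confidence) for every emma:interpretation."""
    root = ET.fromstring(emma_xml)
    results = []
    for interp in root.iter(f"{{{EMMA_NS}}}interpretation"):
        tokens = interp.get(f"{{{EMMA_NS}}}tokens")
        conf = float(interp.get(f"{{{EMMA_NS}}}confidence", "0"))
        results.append((tokens, conf))
    return results

print(interpretations(SAMPLE))
```

Because EMMA attributes like emma:tokens are themselves namespaced, they are looked up with the full namespace URI.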

11 How can we use the additional information available in EMMA to improve dialogs?

12 1. Improving dialogs in real time
Example: timestamps, along with the words spoken, can be used to compute the user's speech rate; the dialog can then be sped up or slowed down to accommodate the user

13 Log Analysis
Timestamps can tell you how long it took for the person to start talking after the end of the prompt
A longer time might indicate that the person was confused by the prompt
Looking at confidence across semantic values can indicate problems with specific words

14 Test Example
300 EMMA documents from a demo music-playing application
Example queries: "Play something by Beethoven", "I'd like to hear Mozart", "Brandenburg Concertos"
EMMA documents imported into Excel 2007

15 File of Music Queries in Excel
[Spreadsheet screenshot] Columns: Duration, Start, End, Confidence, Tokens, Composer, Action, Name, Artist. Sample tokens: "something by Beethoven", "I want to hear something by Beethoven", "I want to listen to Beethoven"; composer: ludwig_van_beethoven; action: play

16 Problem: Speech Rate
Some users think that the application speaks too slowly
Other users think that it speaks too fast
If we dynamically adjust the system's speech rate to the user's speech rate, we can accommodate both kinds of users

17 Speech Rate Data from Music Example
[Histogram of users' speech rates, in words per minute]
A bimodal distribution may point to two kinds of users
Match the UI to the different kinds of users: slow down for novices, speed up for experts

18 Calculating Users’ Speech Rate in real time from EMMA
EMMA provides the words that the user spoke and the duration of the user's speech
# tokens / duration in minutes = words per minute
We can measure the user's speech rate and match the system's speech rate to it in real time
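The formula above is a one-liner in code. This sketch assumes emma:duration is in milliseconds, as in the sample document earlier in the talk (duration 1860 ms for "Beethoven third symphony"):

```python
def words_per_minute(tokens: str, duration_ms: int) -> float:
    """# tokens / duration in minutes = words per minute."""
    n_words = len(tokens.split())
    minutes = duration_ms / 60000.0
    return n_words / minutes

# 3 words spoken over 1860 ms:
rate = words_per_minute("Beethoven third symphony", 1860)
print(round(rate))  # roughly 97 words per minute
```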

19 Log Analysis Simple yet powerful log analysis can be done by using EMMA with common tools like spreadsheets
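One hedged sketch of that workflow: flattening EMMA interpretations into CSV rows that a spreadsheet can open. The <composer> element here is application-specific (from the talk's music demo), not part of the EMMA standard, and the sample document is hypothetical.

```python
# Flatten EMMA interpretations into CSV for spreadsheet analysis.
import csv
import io
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"

SAMPLE = """<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:one-of id="oneof2" emma:duration="1860">
    <emma:interpretation id="interp10" emma:confidence="0.75"
                         emma:tokens="Beethoven third symphony">
      <composer>ludwig_van_beethoven</composer>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>"""

def emma_to_rows(emma_xml):
    """Yield one spreadsheet row per interpretation."""
    root = ET.fromstring(emma_xml)
    for interp in root.iter(f"{{{EMMA_NS}}}interpretation"):
        yield {
            "tokens": interp.get(f"{{{EMMA_NS}}}tokens", ""),
            "confidence": interp.get(f"{{{EMMA_NS}}}confidence", ""),
            "composer": interp.findtext("composer", default=""),
        }

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["tokens", "confidence", "composer"])
writer.writeheader()
writer.writerows(emma_to_rows(SAMPLE))
print(buf.getvalue())
```

In practice the loop would run over a directory of logged EMMA documents rather than a single string.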

20 Problem: Application Performance in a Specific State has Deteriorated
More misrecognitions
More noinputs

21 EMMA timestamps tell when the user has started speaking
If we know when the prompt ended, we can measure the lag between the end of the prompt and the beginning of speech Longer times indicate uncertainty
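A minimal sketch of that measurement, assuming both times are on the same millisecond clock; the prompt end time is taken from the application's own log, since it is not part of the EMMA result itself:

```python
def response_lag_ms(prompt_end_ms: int, speech_start_ms: int) -> int:
    """Milliseconds between the end of the prompt and the start of speech.
    Negative values mean the caller barged in over the prompt."""
    return speech_start_ms - prompt_end_ms

# Prompt ended at t=5000 ms; EMMA's emma:start says speech began at t=7300 ms.
print(response_lag_ms(5000, 7300))  # 2300 ms of hesitation
```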

22 Original Distribution of Start of Speech vs. Prompt
[Histogram: start of speech relative to the prompt, marking barge-in, the length of the full prompt, and the ASR timeout]

23 Distribution with New Prompt
[Same histogram for the new prompt] Uh oh! People are waiting too long to speak, and the ASR is timing out too often

24 Conclusion – the new prompt may be complex or confusing

25 Another Example: Confidence for Different Semantics
What responses have consistently lower confidences?
Can the grammar or dictionary be tuned, or prompts clarified, to get better inputs?

26 Problem: Accuracy needs to be improved

27 EMMA makes it easy to compare confidences for different semantics
Maybe the problem lies just with certain requests? If so, are those requests frequent?
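That comparison can be sketched in a few lines: average the recognition confidence per semantic value (here, the composer) to spot which requests are consistently low. The (composer, confidence) pairs below are hypothetical.

```python
from collections import defaultdict

def mean_confidence_by_semantics(results):
    """results: iterable of (semantic_value, confidence) pairs."""
    sums = defaultdict(lambda: [0.0, 0])
    for value, conf in results:
        sums[value][0] += conf
        sums[value][1] += 1
    return {value: total / n for value, (total, n) in sums.items()}

results = [
    ("ludwig_van_beethoven", 0.42),
    ("ludwig_van_beethoven", 0.38),
    ("wolfgang_amadeus_mozart", 0.81),
]
print(mean_confidence_by_semantics(results))
# A low average for beethoven would make it a candidate for tuning
```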

28 Confidence for Different Semantics
Should there be a dictionary entry for “Beethoven”?

29 Frequency of Requests
Johann Christian Bach: 12%
Wolfgang Amadeus Mozart: 14%
Johann Sebastian Bach: 8%
Ludwig van Beethoven: 66%
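Request frequencies like these fall out of the same logs with collections.Counter; the raw counts below are invented to reproduce the slide's percentages.

```python
from collections import Counter

# Hypothetical composer values extracted from 50 logged EMMA interpretations.
requests = (["ludwig_van_beethoven"] * 33
            + ["wolfgang_amadeus_mozart"] * 7
            + ["johann_christian_bach"] * 6
            + ["johann_sebastian_bach"] * 4)

counts = Counter(requests)
total = sum(counts.values())
for composer, n in counts.most_common():
    print(f"{composer}: {100 * n / total:.0f}%")
```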

30 So, we do want to look more closely at Beethoven!

31 Conclusion The additional information in EMMA can be useful for both real time and later analysis

32 There’s still more information in EMMA!
Alternatives: the nbest list is traditional
I want to hear beethoven
play beethoven
play bach
I'd like mozart
But nbest is very verbose and coarse-grained

33 Two Ways to Represent Alternatives in EMMA
Traditional nbest
Lattice: a compact representation of many alternatives

34 A lattice can provide detailed information about each word or concept, including
Start and end time
Confidence
Semantic interpretation
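To see why a lattice is more compact than nbest, here is a toy sketch: each arc carries one word and a confidence, and enumerating paths from the initial to the final node recovers the nbest-style alternatives. The lattice data is hypothetical, not EMMA's actual emma:lattice markup.

```python
def paths(lattice, node, final):
    """Yield (words, confidence-product) for every path node -> final."""
    if node == final:
        yield [], 1.0
        return
    for nxt, word, conf in lattice.get(node, []):
        for words, p in paths(lattice, nxt, final):
            yield [word] + words, conf * p

# arcs: node -> [(next_node, word, confidence), ...]
# 4 alternatives encoded with just 4 arcs instead of 4 full strings.
lattice = {
    1: [(2, "play", 0.9), (2, "hear", 0.4)],
    2: [(3, "beethoven", 0.8), (3, "bach", 0.5)],
}
for words, p in paths(lattice, 1, 3):
    print(" ".join(words), round(p, 2))
```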

35 Finally, Extensions
Information not in standard EMMA can be added using emma:info:
<emma:info>
  <state>ask_for_music</state>
  <prompt start="" end="" tokens="what music would you like?"/>
</emma:info>

36 Why do we need a standard?
These kinds of analyses can be done in proprietary ways, but are much easier with a standard
With a standard, you aren't tied to a certain recognizer's analysis tools
Third-party analysis tools are feasible

37 Organizations with EMMA Implementations
AT&T
Avaya
Conversational Technologies
Deutsche Telekom
DFKI
Kyoto Institute of Technology
Loquendo
Microsoft
Nuance
University of Trento

38 Available Implementations
AT&T Speech Mashup: cloud-based ASR
Conversational Technologies NLWorkbench: tools for illustrating principles of natural language processing
At SpeechTEK: Thursday's Natural Language Processing tutorial

39 More Information EMMA specification

40 Summary
EMMA provides a rich, standard, and easy-to-use representation of users' inputs
This information can be exploited to improve dialogs
Improvements can be made both in real time and after the fact
We've seen a few examples, but there are many more possibilities

