1. Shaping in Speech Graffiti: results from the initial user study
Stefanie Tomko, Dialogs on Dialogs meeting, 10 February 2006


2. Big picture (i.e., thesis statement)
- A system of shaping and adaptivity can be used to induce more efficient user interactions with spoken dialog systems.
- This strategy can increase efficiency by increasing the amount of user input that the system actually understands, leading to higher task completion rates and higher user satisfaction.
- This strategy can also reduce up-front training time, accelerating the process of reaching optimally efficient interaction.

3. This study
[Flowchart: user input is checked against the Speech Graffiti (target) grammar and for whether it is shapeable (expanded grammar); the yes/no branches lead to a result, a shaping prompt, or {confsig}.]

4. My approach, graphically
[Flowchart: user input is checked against the Speech Graffiti (target) grammar and for whether it is shapeable (expanded grammar); the yes/no branches lead to a result, a shaping prompt, or intelligent shaping help.]
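Slides 3 and 4 check each utterance against the target grammar and then the expanded grammar. Reading the flowcharts as a two-stage check (our interpretation of the diagrams), the decision logic can be sketched as below. The toy phrase-set grammars, the `route` function, and the `proposed` flag are hypothetical stand-ins, not the study's implementation; real membership tests would come from the recognizer and parser.

```python
from enum import Enum

class Response(Enum):
    RESULT = "result"                          # input is valid Speech Graffiti (target grammar)
    SHAPING_PROMPT = "shaping prompt"          # input parses only in the expanded grammar
    CONFSIG = "{confsig}"                      # neither grammar: error beep (this study)
    SHAPING_HELP = "intelligent shaping help"  # neither grammar: proposed approach

def route(utterance, in_target, in_expanded, proposed=False):
    """Hypothetical two-stage check: target grammar first, then expanded grammar."""
    if in_target(utterance):
        return Response.RESULT
    if in_expanded(utterance):
        return Response.SHAPING_PROMPT
    return Response.SHAPING_HELP if proposed else Response.CONFSIG

# Toy grammars as phrase sets (hypothetical stand-ins for real grammar parsers).
TARGET = {"genre is drama", "what is title"}
EXPANDED = TARGET | {"what movies are playing"}

def in_target(u):
    return u in TARGET

def in_expanded(u):
    return u in EXPANDED
```

Because the expanded grammar is consulted only after the target grammar fails, valid Speech Graffiti input is never shaped.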

5 5 Speech Graffiti  Standardized framework of syntax, keywords, and principles  Domain-specific vocabulary Theater is Showcase North Theater Showcase Cinemas Pittsburgh North Genre is drama Drama What movies are playing? {confsig} [an error beep, since previous utterance is not in grammar] WHERE WAS I? Theater is Showcase Cinemas Pittsburgh North, genre is drama OPTIONS You can specify or ask about title, show time, rating, {ellsig} [a 3-beep list continuation signal] What is title? 2 matches: Dark Water, War of the Worlds START OVER Starting over Theater is Northway Mall Cinemas Eight Northway Mall Cinemas 8 What is address? 1 match: 8000 McKnight Road in Pittsburgh

6 6 Expanded grammar  Exploit the fact that knowledge of speaking to a limited-language system restricts input  Create a grammar that will accept more natural language input cf. SG  This grammar is opaque for users  Why have two grammars? Lower perplexity LMs  lower error rates Some applications may be SG-only  Restriction: linear mapping from EXP input to TGT equivalent

7 7 Shaping strategy  Handle user input accepted by expanded grammar but not target  Balance current task success with future interaction efficiency  Baseline strategy – this study: Confirm expanded grammar input with full, explicit slot+value confirmation Give result if appropriate for query

8 8 Study participants  “Normal” adults, i.e. not CMU students  15 males, 14 females, aged 23-54  Native speakers of American Eng.  Little/no computer programming exp  New to Speech Graffiti

9 9 Study design  Between-subjects  3 conditions non-shaping+tutorial (BT) shaping+tutorial (ST) shaping+no_tutorial (SN)  Tutorial 9-slide.ppt presentation 5 minutes

10 10 Study tasks  15 tasks  4 difficulty levels # of slots to be specified/queried  40 minutes or when all tasks completed Only one user did not get to attempt all 15 tasks in 40 minutes  Afterwards: SASSI questionnaire

11 11 Results  In short, the baseline shaping strategy didn’t have an effect   Efficiency  Mean results from shaping subjects are only slightly better – non-significant

12 12 User satisfaction  Again, no significant differences  No differences on individual SASSI factors  No efficiency/satisfaction differences between tutorial/non-tutorial, either

13 13 Grammaticality  How often did users speak within the Target SG grammar?  From Q1 to Q4, both groups showed significant increases in TGT gram

14 14 Error rates - WER  For non-shaping: 39.9% 30.3% for grammatical utts 38.3% utt-level concept error  For shaping: a bit harder to figure, because of 2-pass ASR Each shaping input generated a TGT hyp & a EXP hyp Selection based on AM/LM score and a few simple heuristics

15 15 Error rates – WER  Shaping: For selected hypothesis: 37.3% All TGT: 40.9% All EXP: 64.2%  25.6% utt-level concept error

16. So, what happened?
- Shaping users had success with NL-ish input, and the shaping prompts were not strong enough to change their behavior.

17 17 Biggest problem  Using NL or slot-only query formats My theory: is specification format is very structured. what is sounds structured to me, but to users it sounds like  In new versions, query format will be list Users don’t seem to have too much trouble adapting to a structure – but the structure needs to be clear. Will also shape more explicitly by confirming with “I think you meant, ‘list movies’”  Also for more explicit shaping of specifications

18 18 Other problems  Not using start over to clear context  Confusion about semantics of location  Long utterances  Using next instead of more  Pacing  These will be addressed via targeted help messages

19 19 Current hang-up  Can we improve WER? LM improvements? COTS recognizer?  Dragon: Using Results Issues

20. A little bit about trying DNS
- Dragon NaturallySpeaking 8
  - Distribution from Jahanzeb
  - Set up for dictation, i.e., mic input, so no telephone models
- To compare with Sphinx:
  - Test set of utterances from this study
  - Re-recorded with a head-mounted mic (so, read speech) at 16 kHz
  - Downsampled to 8 kHz for Sphinx
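Downsampling 16 kHz audio to 8 kHz (to match telephone-rate acoustic models) is only safe after removing energy above the new 4 kHz Nyquist limit; plain decimation would alias it into the passband. A minimal pure-Python sketch of that idea: a Hamming-windowed sinc low-pass FIR followed by keeping every other sample. The tap count, window, and function names are illustrative assumptions; the deck does not say which tool was used, and in practice a resampling tool or library would normally do this.

```python
import math

def lowpass_taps(num_taps=31, cutoff=0.25):
    """Hamming-windowed sinc low-pass FIR.

    cutoff is a fraction of the input sampling rate: 0.25 * 16 kHz = 4 kHz,
    the Nyquist frequency of the 8 kHz output. num_taps must be odd.
    """
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd")
    m = num_taps - 1
    taps = []
    for n in range(num_taps):
        k = n - m // 2  # integer offset from the center tap
        ideal = 2 * cutoff if k == 0 else math.sin(2 * math.pi * cutoff * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / m)  # Hamming window
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalize for unity gain at DC

def downsample_by_2(samples, taps=None):
    """Low-pass filter, then keep every other sample (16 kHz -> 8 kHz)."""
    if taps is None:
        taps = lowpass_taps()
    half = len(taps) // 2
    out = []
    for center in range(0, len(samples), 2):  # only compute the samples we keep
        acc = 0.0
        for i, t in enumerate(taps):
            idx = center + i - half
            if 0 <= idx < len(samples):  # treat samples outside the signal as zero
                acc += t * samples[idx]
        out.append(acc)
    return out
```

Computing the filter output only at the retained sample positions halves the work compared with filtering everything and then discarding samples.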

21 21 More Dragon stuff  Two groups TGT  Sphinx mean 56.4% ( Worse than 8k telephone model (?)  Dragon mean 35.9%  Mean diff: Dragon 18.8pts less (ns) EXP  Sphinx mean 68.5%  Dragon mean 45.4%  Mean diff: Dragon 22.3pts less (s)

22 22 More Dragon stuff  But – Dragon rates are not that different from original Sphinx WER rates Sphinx WER in this test might be fishy  Setup seems tricky – can I still do 2-pass decoding?  Would need to change to mic setup  Black-box LM stuff Mysterious adaptation? – not good for user studies!  So, sticking with Sphinx.

