Published by Julius Lynch. Modified over 9 years ago.
1
Shaping Spoken Input in User-Initiative Systems
Stefanie Tomko and Roni Rosenfeld
Language Technologies Institute, School of Computer Science, Carnegie Mellon University
Presented by: Thomas Kevin Harris
2
outline
introduction
  – related work
method
results
  – user perceptions
discussion
3
introduction
(all other things being equal) spoken dialog systems perform best when users speak within the grammar that the system understands
  – make grammar more accepting?
  – get users to speak within the smaller grammar?
ok, how do we do this?
  – this study is a preliminary step
4
related work
shaping is pretty ubiquitous
  – users adapt to features of system prompts
      formality (Ringle & Halstead-Nussloch 1989)
      length (Zoltan-Ford 1991)
      vocabulary (Brennan 1996; Gustafson et al 1997)
  – users also simplify input under higher word error rate (WER) conditions (Shriberg et al 1992)
5
foundation: speech graffiti
a structured, subset language interaction protocol for interacting with simple machines
user input
  – slot+value pairs: theater is the Galleria Six
  – what-questions: what are the movies?
system output
  – terse restatement of value: Galleria six
user initiative
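The two input forms above can be illustrated with a minimal Python sketch. It is not the speech graffiti grammar itself: the regular expressions, the KNOWN_SLOTS set, and the function names are hypothetical, chosen only to cover the two example utterances on this slide.

```python
import re

# Hypothetical slot names; the real system's slot inventory is not given on the slide.
KNOWN_SLOTS = {"theater", "movie", "airline"}

# Assumed surface patterns for the two input forms shown on the slide.
SLOT_VALUE = re.compile(r"^(?P<slot>\w+) is (?:the )?(?P<value>.+)$", re.IGNORECASE)
WHAT_QUESTION = re.compile(r"^what (?:is|are) (?:the )?(?P<slot>.+?)\??$", re.IGNORECASE)


def parse_utterance(text):
    """Return (kind, slot, value) for an utterance, or a 'reject' triple."""
    text = text.strip()
    match = WHAT_QUESTION.match(text)
    if match:
        return ("query", match.group("slot").lower(), None)
    match = SLOT_VALUE.match(text)
    if match and match.group("slot").lower() in KNOWN_SLOTS:
        return ("specify", match.group("slot").lower(), match.group("value"))
    return ("reject", None, None)


def confirm(parsed):
    """Terse, value-only restatement for a slot+value input; None otherwise."""
    kind, _slot, value = parsed
    return value if kind == "specify" else None
```

For instance, confirm(parse_utterance("theater is the Galleria Six")) gives back only the value, mirroring the terse restatement style, while a conversational request falls through to the reject case.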
6
previous speech graffiti studies
shown to be an effective interaction style
  – compared to a natural language interface
      higher user satisfaction
      lower task completion times
      similar task completion rates
but…
users can have difficulty learning & speaking the subset language
when users tried natural language (NL) with the speech graffiti system, their utterances were simpler than NL directed at an NL system
7
initial questions about shaping
how can different instructions influence user input?
how will users shape input in response to
  – rejection of conversational, NL speech
  – speech graffiti-style, terse, value-only confirmations
how will this work in a user-initiative environment?
wizard-of-oz study
9
wizard-of-oz study
18 participants, mostly CMU students
  – most with non-technical backgrounds
  – most had used ASR systems before but not regularly
interact with a telephone information system providing movie schedules & airline flight data
10 tasks, e.g.
  – a friend told you that Miracle was pretty good. where is this movie playing?
  – a friend has told you that she's flying to San Francisco on United flight 500. when will she get there?
10
instructions
3 conditions: short - medium - long
welcome: Welcome to the InfoLine.
instruction-short: The system you are talking to only understands very simple English, so please speak to it as simply as you can.
instruction-medium: The system you are talking to only understands very simple English, so please speak to it as simply as you can. It will understand you best if you tell it only one idea at a time.
instruction-long: The system you are talking to only understands very simple English, so please speak to it as simply as you can. It will understand you best if you tell it only one idea at a time. This system understands only keywords, and not the structure of sentences.
example: For instance, you might say "movie The Lord of the Rings," or "airline is United," or "what are show times?"
prompt: You can now start speaking whenever you're ready.
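Each instruction condition adds one sentence to the previous one, which can be sketched as a small Python lookup. The strings are copied from this slide; the constant names, build_intro, and the assumption that the pieces are played in the order listed (welcome, instruction, example, prompt) are illustrative and not confirmed by the source.

```python
WELCOME = "Welcome to the InfoLine."
SIMPLE = ("The system you are talking to only understands very simple English, "
          "so please speak to it as simply as you can.")
ONE_IDEA = "It will understand you best if you tell it only one idea at a time."
KEYWORDS = "This system understands only keywords, and not the structure of sentences."
EXAMPLE = ('For instance, you might say "movie The Lord of the Rings," '
           'or "airline is United," or "what are show times?"')
PROMPT = "You can now start speaking whenever you're ready."

# Each condition extends the shorter one by a single sentence.
INSTRUCTIONS = {
    "short": [SIMPLE],
    "medium": [SIMPLE, ONE_IDEA],
    "long": [SIMPLE, ONE_IDEA, KEYWORDS],
}


def build_intro(condition):
    """Assemble the introduction for one condition (playback order is assumed)."""
    return " ".join([WELCOME, *INSTRUCTIONS[condition], EXAMPLE, PROMPT])
```

Written this way, the only thing that varies between conditions is the instruction text; the welcome, example, and prompt strings are shared.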
11
wizard rules
reject:
  – non-task conversational words
      could you tell me…
  – task-based non-content items
      what movies are showing in West Mifflin?
  – task-based OOV words
      films; earliest flight
rejection messages
  – excuse me?
  – I'm sorry, I didn't understand that.
  – [replay instruction & example strings from introduction]
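In the study a human wizard applied these rules, so the following Python sketch is only a rough approximation of the decision. The phrase lists are hypothetical stand-ins built from the slide's examples, and the escalation order (one message per successive rejection) is an assumption based on the order in which the messages are listed.

```python
import re

# Hypothetical phrase lists; the wizard's actual criteria were applied by a person.
CONVERSATIONAL = ("could you", "tell me", "i would like")
NON_CONTENT = ("are showing in", "is it playing")
OOV_WORDS = {"films", "earliest"}

# Messages from the slide; using them in this order is an assumption.
REJECTION_MESSAGES = [
    "excuse me?",
    "I'm sorry, I didn't understand that.",
    "[replay instruction & example strings from introduction]",
]


def should_reject(utterance):
    """Approximate the three rejection rules with simple phrase and word checks."""
    text = utterance.lower()
    if any(phrase in text for phrase in CONVERSATIONAL + NON_CONTENT):
        return True
    words = set(re.findall(r"[a-z']+", text))
    return bool(words & OOV_WORDS)


def rejection_message(consecutive_rejections):
    """Pick a message for the n-th rejection in a row, staying on the last one."""
    if consecutive_rejections < 1:
        raise ValueError("consecutive_rejections must be at least 1")
    index = min(consecutive_rejections, len(REJECTION_MESSAGES)) - 1
    return REJECTION_MESSAGES[index]
```

Simple phrase matching like this would miss much of what a human wizard catches; it is meant only to make the three rule categories concrete.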
12
sample user interaction
TASK A:
user: where is Miracle movie playing?
wizard: excuse me?
u: Miracle movie
w: Miracle
u: where is it playing?
w: excuse me?
u: which theaters?
w: 3 theaters: Destinta Theatres Plaza 22, Loews Waterfront Theatre, Northway Mall Cinemas 8
TASK B:
u: United flight 500 to San Francisco, what time does it arrive?
w: United, to San Francisco, flight 500
u: Arrival time
w: arriving at 12:33 pm
14
results
overall completion rate: 96.1%
users generally used value-only
no effect of instruction length on number of utterances per session
longer instruction length → shorter user utterances
  – due to extra content?

instruction condition   mean # words per utterance
short                   4.49
medium                  3.36
long                    2.98
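The words-per-utterance measure can be sketched in Python as follows. The slides do not say how words were counted, so whitespace tokenization is an assumption, and the input below is the Task A sample dialogue from the deck, not the study data.

```python
def mean_words_per_utterance(utterances):
    """Average number of whitespace-separated tokens per utterance."""
    if not utterances:
        raise ValueError("need at least one utterance")
    counts = [len(utterance.split()) for utterance in utterances]
    return sum(counts) / len(counts)


# User turns from the Task A sample interaction (illustrative input only).
task_a_user_turns = [
    "where is Miracle movie playing?",
    "Miracle movie",
    "where is it playing?",
    "which theaters?",
]
task_a_mean = mean_words_per_utterance(task_a_user_turns)
```

Under this counting rule the four Task A turns contain 5, 2, 4, and 2 words, for a mean of 3.25.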
15
rejected input
on average, about 22% of users' utterances were rejected
  – no effect of instruction condition
  – users only repeated input verbatim in 7 cases

sequential rejection instance   # of occurrences   rejected input shaped after this point
1st                             123                50%
2nd                             52                 75%
3rd                             15                 84%
4th                             3                  85%

rejection messages: excuse me?; I'm sorry…; the system you are talking to…
16
user perceptions
participants clearly aware of limited style
participants mentioned
  – simplification
  – minimization
  – keywords (key words)
these ideas parallel the instruction conditions
  – speak simply
  – one idea at a time
  – use keywords
but comments did not match condition
18
discussion (1)
how can different instructions influence user input?
  – more explanation of simplification → shorter utterances
how will users shape input in response to
  – rejection of conversational, NL speech
      50% of rejected utterances shaped after one rejection
      75% shaped after two rejections
  – speech graffiti-style, terse, value-only confirmations
      most shaped user input mimicked this style
19
discussion (2)
how will this work in a user-initiative environment?
  – 96% task completion rate without explicit system prompts
future work
  – shape input more precisely
      shape with slot+value confirmation, to avoid ambiguity?
      shape to specific acoustically distinct vocabulary?
  – can input be shaped even if some NL is handled?