Module U1: Speech in the Interface
Part 4: User-centered Design and Evaluation
Jacques Terken
Contents
- Methodological issues: design
- Evaluation methodology
The design process
- Requirements
- Specifications of prototype
- Evaluation 1: Wizard-of-Oz experiments ("bionic wizards")
- Redesign and implementation: V1
- Evaluation 2: objective and subjective measurements (laboratory tests)
- Redesign and implementation: V2
- Evaluation 3: lab tests, field tests
Requirements
- Sources of requirements:
  – you yourself
  – potential end users
  – customer
  – manufacturer
- Checklist:
  – consistency
  – feasibility (with respect to performance and price)
Interface design
- The success of a design depends on consideration of:
  – task demands
  – knowledge, needs and expectations of the user population
  – capabilities of the technology
Task demands
- Exploit structure in the task to make the interaction more transparent
  – e.g. the form-filling metaphor (see the sketch below)
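To make the form-filling metaphor concrete, here is a minimal sketch of a frame-driven dialogue loop: the unfilled slots of a task frame determine what the system asks next. The slot names, prompts, and the use of plain `input()` are hypothetical; a real system would parse ASR output instead of reading typed text.

```python
# Minimal form-filling sketch: the task frame (slots) drives the dialogue.
# Slot names and prompts are illustrative, not from any particular system.

SLOTS = {"departure": None, "destination": None, "date": None}

PROMPTS = {
    "departure": "Where do you want to leave from?",
    "destination": "Where do you want to go?",
    "date": "On what date do you want to travel?",
}

def next_prompt(slots):
    """Pick the first unfilled slot; the frame structure decides what to ask."""
    for name, value in slots.items():
        if value is None:
            return name, PROMPTS[name]
    return None, "All information collected."

if __name__ == "__main__":
    slots = dict(SLOTS)
    while True:
        name, prompt = next_prompt(slots)
        if name is None:
            print(prompt)
            break
        slots[name] = input(prompt + " ")  # a real system would parse ASR output here
    print("Frame:", slots)
```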
User expectations
- Users may bring advance knowledge of the domain
- Users may bring too-high expectations of the communicative capabilities of the system, especially if the quality of the output speech is high; this leads to user utterances that the system cannot handle
- Instruction is of limited value
- An interactive tutorial is more useful (Kamm et al., ICSLP 1998)
  – can also include training on how to speak to the system
- Edutainment approach (Weevers, 2004)
Capabilities of technology
- Awareness of ASR and NLP limitations
- Necessary modelling of domain knowledge through an ontology
- Understanding of needs with respect to cooperative communication: rationality, inferencing
- Understanding of needs with respect to conversational dynamics, including mechanisms for graceful recovery from errors
Specifications: check UI design principles (Shneiderman, 1986)
- Continuous representation of objects and actions of interest (transparency)
- Rapid, incremental, reversible operations with immediately visible impact
- Physical actions or labelled button presses, not complex syntax
Application to speech interfaces (Kamm & Walker, 1997)
- Continuous representation:
  – may be impossible or undesirable as such in speech interfaces
  – alternatives: open question, then pause, then options (zooming); a subset of the vocabulary with a consistent meaning throughout ("help me out", "cancel")
- Immediate impact:
    Agent: Annie here, what can I do for you?
    User: Call Lyn Walker.
    Agent: Calling Lyn Walker.
- Incrementality:
    User: I want to go from Boston to San Francisco.
    Agent: San Francisco has two airports: …
- Reversibility:
  – "cancel"
- NB, discussion topic:
  – Shneiderman's heuristic 7 (support internal locus of control) vs mixed-control dialogue
Contents
- Methodological issues: design
- Evaluation methodology
Aim of evaluation
- Diagnostic test / formative evaluation:
  – to inform the design team
  – to ensure that the system meets the expectations and requirements of end users
  – to improve the design where possible
- Benchmarking / summative evaluation:
  – to inform the manufacturer about the quality of the system relative to that of competitors or previous releases
Benchmarking
- Requires an accepted, standardised test
- There is no accepted solution for benchmarking complete spoken dialogue systems
- Stand-alone tests of separate components, both for diagnostic and benchmarking purposes (glass-box approach)
Glass box / black box
- Black box: system evaluation (e.g. "how will it perform in an application?")
- Glass box: performance of individual modules, both for benchmarking and diagnostic purposes (see the sketch below)
  – with perfect input from the previous modules
  – or with real input (always imperfect!)
  – evaluation methods: statistical, performance-based (objective/subjective)
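As a concrete example of a stand-alone, glass-box measure, the sketch below computes word error rate (WER) for the ASR component via edit distance. The slides do not name a specific metric, so WER is an illustrative choice here; it is, however, the standard measure for recognizer output.

```python
# Word error rate: (substitutions + deletions + insertions) / reference length,
# computed with a classic dynamic-programming edit distance over words.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimal edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)  # sub / del / ins
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("call lyn walker", "call in walker"))  # 0.33: one substitution
```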
- The problem of componentiality:
  – the relation between the performance of the individual components and the performance of the whole system
Anchoring: choosing the right contrast condition
- In the absence of validated standards, a reference condition is needed to evaluate the performance of the test system(s)
- For speech output, natural speech is often used as the reference
- This will lead to compression effects for the experimental systems when the evaluation is conducted by means of rating scales
- Anchoring is preferably done in the context of objective evaluation and with preference judgements
Evaluation tools/frameworks
- Hone & Graham: SASSI, a questionnaire tuned towards the evaluation of speech interfaces
- Walker et al.: PARADISE, establishing connections between objective and subjective measures
- PROMISE: an extension of PARADISE to multimodal interfaces
SASSI
- Subjective Assessment of Speech System Interfaces
- http://people.brunel.ac.uk/~csstksh/sassi.html
- Likert-type questions
- Factors:
  – response accuracy
  – likeability
  – cognitive demand
  – annoyance
  – habitability (match between mental model and actual system)
  – speed
Examples of questions (N = 36)
- The system is accurate
- The system is unreliable
- The interaction with the system is unpredictable
- The system is pleasant
- The system is friendly
- I was able to recover easily from errors
- I enjoyed using the system
- It is clear how to speak to the system
- The interaction with the system is frustrating
- The system is too inflexible
- I sometimes wondered if I was using the right word
- I always knew what to say to the system
- It is easy to lose track of where you are in an interaction with the system
- The interaction with the system is fast
- The system responds too slowly
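Note that the items mix positively and negatively worded statements, so negative items must be reverse-coded before factor scores are averaged. The sketch below shows this; the item-to-factor mapping is illustrative only, not the published SASSI key.

```python
# Scoring Likert responses per factor, with reverse-coding of negative items.
# The factor assignments below are hypothetical, NOT the official SASSI key.

FACTORS = {
    "response accuracy": [("The system is accurate", False),
                          ("The system is unreliable", True)],
    "likeability":       [("The system is pleasant", False),
                          ("The system is friendly", False)],
    "speed":             [("The interaction with the system is fast", False),
                          ("The system responds too slowly", True)],
}

def factor_scores(responses, scale_max=7):
    """responses: dict mapping item text to a 1..scale_max Likert rating."""
    scores = {}
    for factor, items in FACTORS.items():
        vals = [(scale_max + 1 - responses[item]) if rev else responses[item]
                for item, rev in items]
        scores[factor] = sum(vals) / len(vals)
    return scores

example = {"The system is accurate": 6, "The system is unreliable": 2,
           "The system is pleasant": 5, "The system is friendly": 4,
           "The interaction with the system is fast": 3,
           "The system responds too slowly": 6}
print(factor_scores(example))
```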
PARADISE
- User satisfaction (subjective) is related to task success and costs (objective measures):
  – users perform scenario-based tasks
  – measure task success for the scenarios, correcting for chance on the basis of attribute-value matrices denoting the number of possible options (measure: kappa; κ = 1 if all scenarios were completed successfully); see the sketch below
  – obtain objective measures of costs:
    – efficiency measures (number of utterances, dialogue time, …)
    – qualitative measures (repair ratio, inappropriate-utterance ratio, …)
  – normalize the task success and cost measures across subjects by taking z-scores
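A sketch of the chance-corrected task-success measure: kappa computed over an attribute-value confusion matrix in the PARADISE style. The matrix convention (rows = scenario key values, columns = achieved values) and the toy counts are assumptions for illustration.

```python
# kappa = (P(A) - P(E)) / (1 - P(E)): observed agreement corrected for the
# agreement expected by chance, given the frequencies of the possible options.
# kappa = 1 when all counts lie on the diagonal (all scenarios solved correctly).

def kappa(M):
    T = sum(sum(row) for row in M)
    p_agree = sum(M[i][i] for i in range(len(M))) / T
    # chance agreement from the column marginals (option frequencies)
    p_chance = sum((sum(M[i][j] for i in range(len(M))) / T) ** 2
                   for j in range(len(M)))
    return (p_agree - p_chance) / (1 - p_chance)

# toy example: 3 possible attribute values, 10 dialogues, 8 handled correctly
M = [[4, 1, 0],
     [0, 3, 1],
     [0, 0, 1]]
print(kappa(M))  # about 0.69
```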
- Measure user satisfaction (mean opinion scores across one or more scales)
- Estimate the performance function (see the sketch below):
    performance = α · z(κ) − Σ_i w_i · z(cost_i)
- Compute the values of α and the w_i by multiple linear regression
- w_i indicates the relative weight of the individual cost component cost_i
- The w_i give information about the primary cost factors, i.e. which factors have most influence on (the lack of) usability of the system
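A minimal sketch of estimating α and the w_i by multiple linear regression, assuming per-dialogue satisfaction scores and objective measures are available. The data values and variable names are invented for illustration; only the model form follows the slide.

```python
# Fit performance = alpha * z(kappa) - sum_i w_i * z(cost_i) by least squares.
import numpy as np

def zscore(x):
    return (x - x.mean()) / x.std()

# hypothetical data: one row per dialogue
satisfaction = np.array([4.5, 3.0, 5.0, 2.5, 4.0])       # mean opinion scores
kappa_vals   = np.array([1.0, 0.6, 1.0, 0.4, 0.8])       # task success
n_utterances = np.array([12, 20, 10, 25, 15])            # an efficiency cost
repair_ratio = np.array([0.05, 0.20, 0.00, 0.30, 0.10])  # a qualitative cost

X = np.column_stack([zscore(kappa_vals),
                     zscore(n_utterances),
                     zscore(repair_ratio),
                     np.ones(len(satisfaction))])          # intercept column
coef, *_ = np.linalg.lstsq(X, zscore(satisfaction), rcond=None)
alpha, w = coef[0], -coef[1:3]   # cost weights enter the model with a minus sign
print("alpha:", alpha, "cost weights w_i:", w)
```

Once α and the w_i are fitted and validated, the same formula predicts satisfaction for new dialogues from their objective measures alone, which is exactly the use the case study below makes of it.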
- Case study: performance = .40 · z(κ) − .78 · cost_2, where cost_2 is the number of repetitions
- Once the weights have been established and validated, user satisfaction can be predicted from objective data
- The typical finding is that user satisfaction as measured by the questionnaire is primarily determined by the quality of the speech recognition (which is not very informative)
- Concerns:
  – "conservative" scoring on semantic scales
  – not all cost functions may be linear
PROMISE
- Evaluation of multimodal interfaces
- The basic idea is the same as for PARADISE, but there are differences in the way task success is calculated and the correlations are computed
Where to evaluate: laboratory tests
- The use of scenarios gives some degree of experimental control
- Objective and subjective measurements aimed at identifying problem sources and testing potential solutions
- Interviews
- BUT: scenarios implicitly specify the domain
- AND: subjects may be overly co-operative or overly non-co-operative (exploring the limits of the system)
Where: field tests
- Advantage: gives information about the performance of the system with actual end users, with self-defined, real goals, in realistic situations
- Mainly diagnostic (how does the system perform under realistic conditions?)
- BUT: no information about the reasons for particular actions in the dialogue
Additional considerations
- Evaluation also in terms of the suitability of the system given the technological and cost constraints imposed by the application:
  – CPU consumption, real-time performance
  – bandwidth, memory consumption
  – cost
Project
- Wizard-of-Oz:
  – the usual assumption is that subjects are made to believe that they are interacting with a real system
  – most suited when the system to be developed is very complex, or when the performance of individual modules strongly affects overall performance
  – full vs bionic wizard
WOZ: general set-up
[Diagram: the subject interacts through a user interface; the wizard, supported by an assistant, operates a wizard interface with scenarios and simulation tools; all interaction feeds into data collection (logging)]
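As an illustration of the data-collection (logging) component in the diagram, here is a minimal sketch that appends timestamped subject and wizard turns to a JSON-lines file. The file name and record fields are hypothetical.

```python
# Minimal WOZ turn logger: one JSON object per turn, appended to a .jsonl file.
import json
import time

LOG_FILE = "woz_session.jsonl"  # hypothetical file name

def log_turn(session_id: str, role: str, text: str) -> None:
    """role is 'subject' or 'wizard'; each call writes one JSON line."""
    record = {"session": session_id, "time": time.time(),
              "role": role, "text": text}
    with open(LOG_FILE, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

log_turn("s01", "subject", "I want to go from Boston to San Francisco")
log_turn("s01", "wizard", "San Francisco has two airports: ...")
```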