SAI User-System Interaction, Module u1: Speech in the Interface, Part 4: User-centered Design and Evaluation. Jacques Terken.



Slide 2: Contents
- Methodological issues: design
- Evaluation methodology

Slide 3: The design process
- Requirements
- Specifications of prototype
- Evaluation 1: Wizard-of-Oz experiments ("bionic" wizards)
- Redesign and implementation: V1
- Evaluation 2: objective and subjective measurements (laboratory tests)
- Redesign and implementation: V2
- Evaluation 3: lab tests, field tests

Slide 4: Requirements
- Sources of requirements:
  - you yourself
  - potential end users
  - the customer
  - the manufacturer
- Checklist:
  - consistency
  - feasibility (with respect to performance and price)

Slide 5: Interface design
- The success of a design depends on consideration of:
  - task demands
  - knowledge, needs and expectations of the user population
  - capabilities of the technology

Slide 6: Task demands
- Exploit structure in the task to make the interaction more transparent, e.g. the form-filling metaphor.

Slide 7: User expectations
- Users may bring advance knowledge of the domain.
- Users may bring too high expectations of the communicative capabilities of the system, especially if the quality of the output speech is high; this leads to user utterances that the system cannot handle.
- Instruction is of limited value.
- An interactive tutorial is more useful (Kamm et al., ICSLP 1998).
- Training can also cover how to speak to the system.
- Edutainment approach (Weevers, 2004).

Slide 8: Capabilities of technology
- Awareness of ASR and NLP limitations.
- Necessary modelling of domain knowledge through an ontology.
- Understanding of needs with respect to cooperative communication: rationality, inferencing.
- Understanding of needs with respect to conversational dynamics, including mechanisms for graceful recovery from errors.

Slide 9: Specifications: check UI design principles (Shneiderman, 1986)
- Continuous representation of objects and actions of interest (transparency).
- Rapid, incremental, reversible operations with immediately visible impact.
- Physical actions or labelled button presses, rather than complex (natural-language) syntax.

Slide 10: Application to speech interfaces (Kamm & Walker, 1997)
- Continuous representation:
  - may be impossible or undesirable as such in speech interfaces
  - alternatives: an open question, then after a pause offer options (zooming in); a subset of the vocabulary with consistent meaning throughout ("help me out", "cancel")
- Immediate impact:
  Agent: Anny here, what can I do for you?
  User: Call Lyn Walker.
  Agent: Calling Lyn Walker.

Slide 11: Application to speech interfaces (continued)
- Incrementality:
  User: I want to go from Boston to San Francisco.
  Agent: San Francisco has two airports: …
- Reversibility: "cancel".
- NB, discussion topic: Shneiderman's heuristic 7 (locus of control) vs mixed-control dialogue.

Slide 12: Contents
- Methodological issues: design
- Evaluation methodology

Slide 13: Aim of evaluation
- Diagnostic test / formative evaluation:
  - to inform the design team
  - to ensure that the system meets the expectations and requirements of end users
  - to improve the design where possible
- Benchmarking / summative evaluation:
  - to inform the manufacturer about the quality of the system relative to competitors or previous releases

Slide 14: Benchmarking
- Requires an accepted, standardised test.
- There is no accepted solution for benchmarking complete spoken dialogue systems.
- Stand-alone tests of separate components serve both diagnostic and benchmarking purposes (glass-box approach).

Slide 15: Glass box / black box
- Black box: system evaluation (e.g. "how will it perform in an application?").
- Glass box: performance of individual modules (both for benchmarking and diagnostic purposes):
  - with perfect input from previous modules
  - or with real input (always imperfect!)
  - evaluation methods: statistical, performance-based (objective/subjective)

Slide 16: Glass box / black box (continued)
- The problem of componentiality: the relation between the performance of individual components and the performance of the whole system.

Slide 17: Anchoring: choosing the right contrast condition
- In the absence of validated standards, a reference condition is needed to evaluate the performance of the test system(s).
- For speech output, natural speech is often used as the reference.
- This leads to compression effects for experimental systems when the evaluation is conducted by means of rating scales.
- Anchoring is preferably done in the context of objective evaluation and with preference judgements.

Slide 18: Evaluation tools/frameworks
- Hone and Graham: SASSI, a questionnaire tuned towards the evaluation of speech interfaces.
- Walker et al.: PARADISE, establishing connections between objective and subjective measures.
- PROMISE: an extension of PARADISE to multimodal interfaces.

Slide 19: SASSI
- Subjective Assessment of Speech System Interfaces.
- Likert-type questions.
- Factors:
  - response accuracy
  - likeability
  - cognitive demand
  - annoyance
  - habitability (match between the user's mental model and the actual system)
  - speed

Slide 20: Examples of questions (N = 36)
- The system is accurate.
- The system is unreliable.
- The interaction with the system is unpredictable.
- The system is pleasant.
- The system is friendly.
- I was able to recover easily from errors.
- I enjoyed using the system.
- It is clear how to speak to the system.
- The interaction with the system is frustrating.
- The system is too inflexible.
- I sometimes wondered if I was using the right word.
- I always knew what to say to the system.
- It is easy to lose track of where you are in an interaction with the system.
- The interaction with the system is fast.
- The system responds too slowly.
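Because the questionnaire mixes positively and negatively worded statements, negative items must be reverse-coded before items are averaged into factor scores. A minimal sketch of this scoring step; the grouping of items into factors below is a hypothetical illustration, not the published SASSI scoring key:

```python
# Sketch of scoring SASSI-style Likert items (1-7 scale). The grouping of
# items into factors here is illustrative, not the official SASSI key.
# Negatively worded items ("The system is unreliable") are reverse-coded.

def reverse_code(score, scale_max=7):
    """Reverse-code a negatively worded item on a 1..scale_max scale."""
    return scale_max + 1 - score

def factor_score(responses, items, reversed_items):
    """Mean rating for one factor; responses maps item text -> raw rating."""
    scores = [reverse_code(responses[i]) if i in reversed_items else responses[i]
              for i in items]
    return sum(scores) / len(scores)

# Hypothetical ratings from one subject.
responses = {
    "The system is accurate": 6,
    "The system is unreliable": 2,                        # negative item
    "The system is pleasant": 5,
    "The interaction with the system is frustrating": 3,  # negative item
}
accuracy = factor_score(
    responses,
    ["The system is accurate", "The system is unreliable"],
    {"The system is unreliable"})
likeability = factor_score(
    responses,
    ["The system is pleasant", "The interaction with the system is frustrating"],
    {"The interaction with the system is frustrating"})
```

With these toy ratings the accuracy factor averages to 6.0 and likeability to 5.0, since the reverse-coded negative items agree with their positive counterparts.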

Slide 21: PARADISE
- User satisfaction (subjective) is related to task success and costs (objective measures):
  - Users perform scenario-based tasks.
  - Measure task success for scenarios, correcting for chance on the basis of attribute-value matrices denoting the number of possible options (measure: kappa; κ = 1 if all scenarios are completed successfully).
  - Obtain objective measures of costs:
    - efficiency measures (number of utterances, dialogue time, …)
    - qualitative measures (repair ratio, inappropriate-utterance ratio, …)
  - Normalize task success and cost measures across subjects by taking z-scores.
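The chance-corrected task-success measure can be computed from a confusion matrix over attribute values: compare what each scenario specified with what the dialogue actually established, and apply the standard kappa formula κ = (P(A) − P(E)) / (1 − P(E)). A sketch on a toy matrix:

```python
# Sketch: chance-corrected task success (kappa), as used in PARADISE.
# M[i][j] counts scenarios where the key (scenario) specified attribute
# value i and the dialogue ended with value j; the diagonal holds the
# successfully completed scenarios.

def kappa(M):
    total = sum(sum(row) for row in M)
    # P(A): observed agreement, the diagonal fraction.
    p_agree = sum(M[i][i] for i in range(len(M))) / total
    # P(E): chance agreement, from the row and column marginals.
    p_chance = sum(
        (sum(M[i]) / total) * (sum(row[i] for row in M) / total)
        for i in range(len(M))
    )
    return (p_agree - p_chance) / (1 - p_chance)

# All 10 scenarios completed successfully -> kappa = 1, as on the slide.
print(kappa([[6, 0], [0, 4]]))   # 1.0
```

With partial success the measure drops below 1 while still discounting agreement that could have arisen by chance, which is what makes it preferable to a raw success rate.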

Slide 22: PARADISE (continued)
- Measure user satisfaction (mean opinion scores across one or more scales).
- Estimate the performance function: performance = α · z_kappa − Σ_i w_i · z_cost_i
  - Compute the values of α and w_i by multiple linear regression.
  - w_i indicates the relative weight of the individual cost component cost_i.
  - The w_i give information about the primary cost factors, i.e. which factors have the most influence on (the lack of) usability of the system.
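The regression step can be sketched as follows. All data here are synthetic and chosen purely for illustration: satisfaction is constructed as an exact linear combination of the predictors, so the regression recovers the weights precisely; a real study would regress measured mean opinion scores on the normalized task-success and cost measures.

```python
# Sketch: estimating the PARADISE weights by multiple linear regression.
# Synthetic data; a real study would use measured user-satisfaction scores.
import numpy as np

def zscore(x):
    """z-normalize a vector of per-dialogue measures."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

kappa_z = zscore([0.9, 0.7, 0.8, 0.5, 0.6, 0.95])  # task success per dialogue
repeats_z = zscore([1, 3, 2, 5, 4, 0])             # cost: number of repetitions
satisfaction = 0.40 * kappa_z - 0.78 * repeats_z   # toy, noise-free target

# performance = alpha * z_kappa - sum_i w_i * z_cost_i
X = np.column_stack([kappa_z, repeats_z])
coef, *_ = np.linalg.lstsq(X, satisfaction, rcond=None)
alpha, w_repeats = coef[0], -coef[1]   # cost weights enter with a minus sign
```

On this noise-free toy data the fit returns alpha ≈ 0.40 and w_repeats ≈ 0.78; with real ratings the residuals indicate how much of user satisfaction the objective measures fail to explain.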

Slide 23: PARADISE (continued)
- Case study: performance = 0.40 · z_kappa − 0.78 · cost2, with cost2 the number of repetitions.
- Once the weights have been established and validated, user satisfaction can be predicted from objective data.
- The typical finding is that user satisfaction as measured by the questionnaire is primarily determined by the quality of the speech recognition (which is not very informative).
- Concerns:
  - "conservative" scoring on semantic scales
  - not all cost functions may be linear

Slide 24: PROMISE
- Evaluation of multimodal interfaces.
- The basic idea is the same as for PARADISE, but there are differences in the way task success is calculated and the correlations are computed.

Slide 25: Where to evaluate: laboratory tests
- The use of scenarios gives some degree of experimental control.
- Objective and subjective measurements aimed at identifying problem sources and testing potential solutions.
- Interviews.
- BUT: scenarios implicitly specify the domain.
- AND: subjects may be overly co-operative or overly non-co-operative (exploring the limits of the system).

Slide 26: Where to evaluate: field tests
- Advantage: gives information about the performance of the system with actual end users, with self-defined, real goals, in realistic situations.
- Mainly diagnostic (how does the system perform in realistic conditions?).
- BUT: no information about the reasons for particular actions in the dialogue.

Slide 27: Additional considerations
- Evaluation also in terms of the suitability of the system given the technological and cost constraints imposed by the application:
  - CPU consumption, real-time performance
  - bandwidth, memory consumption
  - cost

Slide 28: Project
- Wizard-of-Oz:
  - The usual assumption is that subjects are made to believe they are interacting with a real system.
  - Most suited when the system to be developed is very complex, or when the performance of individual modules strongly affects overall performance.
  - Full vs bionic wizard.

Slide 29: WOZ: general set-up
[Diagram: the subject interacts with the system through a user interface; on the other side, the wizard and an assistant operate a wizard interface, supported by scenarios and simulation tools; the interaction is logged for data collection.]