Quantitative Evaluation John Kelleher, IT Sligo

Definition: without measurement, success is undefined.
Methods:
Performance/predictive modeling: GOMS/KLM, Fitts' Law
Controlled experiments and statistical analysis
Formal usability study to compare two designs on measurable aspects: time required, number of errors, effectiveness in achieving very specific tasks

GOMS Model (Card, Moran & Newell, 1983)
Models the knowledge and cognitive processes involved when users interact with systems.
Goals refer to a particular state the user wants to achieve.
Operators refer to the cognitive processes and physical actions that need to be performed in order to attain those goals.
Methods are learned procedures for accomplishing the goals, consisting of the exact sequence of steps required.
Selection rules are used to determine which method to select when there is more than one available for a given stage of a task.

GOMS: Example of deleting a word in MS Word
Goal: delete a word in a sentence.
Method for accomplishing the goal of deleting a word using the menu option:
Step 1: Recall that the word to be deleted has to be highlighted
Step 2: Recall that the command is 'cut'
Step 3: Recall that the 'cut' command is in the Edit menu
Step 4: Accomplish the goal of selecting and executing the 'cut' command
Step 5: Return with goal accomplished

GOMS: Example of deleting a word in MS Word (contd.)
Method for accomplishing the goal of deleting a word using the delete key:
Step 1: Recall where to position the cursor in relation to the word to be deleted
Step 2: Recall which key is the delete key
Step 3: Press the 'delete' key to delete each letter
Step 4: Return with goal accomplished

GOMS: Example of deleting a word in MS Word (contd.)
Operators used in the above methods: click mouse; drag cursor over text; select menu; move cursor to command; press keyboard key.
Selection rules to decide which method to use:
1: Delete text using the mouse and selecting from the menu if a large amount of text is to be deleted
2: Delete text using the delete key if a small number of letters is to be deleted
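To make the structure concrete, here is a minimal sketch (not part of the original slides, written in Python) that encodes the two methods as operator sequences and the selection rules as a function; the dictionary representation and the 10-letter threshold are assumptions for illustration, not standard GOMS notation.
```python
# Illustrative only: one possible encoding of the GOMS example above.
methods = {
    "delete-via-menu": [
        "drag cursor over text",   # highlight the word
        "select menu",             # open the Edit menu
        "move cursor to command",  # locate 'cut'
        "click mouse",             # execute 'cut'
    ],
    "delete-via-key": [
        "click mouse",             # position the cursor after the word
        "press keyboard key",      # press delete once per letter
    ],
}

def select_method(letters_to_delete: int) -> str:
    """Selection rule: menu for large selections, delete key for a few letters."""
    return "delete-via-menu" if letters_to_delete > 10 else "delete-via-key"

for n in (3, 250):
    chosen = select_method(n)
    print(n, "letters ->", chosen, "->", "; ".join(methods[chosen]))
```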

Keystroke-Level Model (KLM)
A well-known analytic evaluation technique, derived from the Model Human Processor (MHP) of Card, Moran and Newell (1983).
Provides detailed quantitative (numerical) information about user performance.
Sufficient for predicting the speed of interaction with a user interface.
The basic time-prediction components are empirically derived.

KLM Constants
Operator  Description                                                        Time (sec)
K         Pressing a single key or button (average)                          0.35
            Skilled typist (55 wpm)                                          0.22
            Average typist (40 wpm)                                          0.28
            User unfamiliar with the keyboard                                1.20
            Pressing shift or control key                                    0.08
P         Pointing with a mouse or other device to a target on a display    1.10
            Clicking the mouse or similar device                             0.20
H         Homing hands on the keyboard or other device                       0.40
D         Drawing a line using a mouse                                       variable, depending on the length of the line
M         Mentally preparing to do something (e.g. making a decision)        1.35
R(t)      System response time; counted only if it causes the user to wait  t
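As a quick illustration of how the constants are used, here is a small calculator sketch in Python (not from the original slides); the example operator sequence and the separate "B" label for a mouse click are my own shorthand, since the table folds clicking under P.
```python
# Minimal KLM calculator sketch. Times follow the table above.
KLM_TIMES = {
    "K": 0.28,  # press a single key (average typist, 40 wpm)
    "P": 1.10,  # point with the mouse to a target on the display
    "B": 0.20,  # click the mouse (listed under P in the table above)
    "H": 0.40,  # home hands between keyboard and mouse
    "M": 1.35,  # mentally prepare
}

def predict_seconds(operators, response_time=0.0):
    """Sum the operator times; R(t) is added only if the user has to wait."""
    return sum(KLM_TIMES[op] for op in operators) + response_time

# Assumed task: select a menu command with the mouse.
# M (decide), H (hand to mouse), P B (open menu), P B (pick the item)
sequence = ["M", "H", "P", "B", "P", "B"]
print(f"Predicted time: {predict_seconds(sequence):.2f} s")  # 4.35 s
```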

Task in a Text Editor Using GOMS/KLM
Create a new file
Type in "Hello, World."
Save the document as "Hello"
Print the document
Exit the editor
Assume the system response time is 0, or comparable (constant) across systems.
Average typist (55 wpm): K = 0.2.
The editor has just been started, with hands in the lap.

All Mouse

Shortcuts
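The two slide titles above refer to a mouse-only and a keyboard-shortcut version of the task. As a rough, self-contained sketch (assumed operator sequences for the "save document" step, not the original slides' figures), the two strategies might compare like this:
```python
# Hypothetical KLM comparison for the "save document" step.
TIMES = {"K": 0.2, "P": 1.1, "B": 0.2, "H": 0.4, "M": 1.35}

def klm(seq):
    return sum(TIMES[op] for op in seq)

# All mouse: decide (M), home onto mouse (H), point at File menu (P), click (B),
# point at Save (P), click (B).
all_mouse = ["M", "H", "P", "B", "P", "B"]

# Shortcut: decide (M), hands already on keyboard, press Ctrl (K) then S (K).
shortcut = ["M", "K", "K"]

print(f"All mouse: {klm(all_mouse):.2f} s")  # 4.35 s
print(f"Shortcut:  {klm(shortcut):.2f} s")   # 1.75 s
```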

KLM Applicability
Best suited to user interfaces with a limited number of features and to repetitive task execution.
Really only useful for comparative study among alternatives, although it is sensitive to minor design changes (cf. Project Ernestine).
Caveats:
assumes expert behaviour; no errors tolerated
assumes the user already knows the sequence of operations that he or she is going to perform
time estimates are best followed up by empirical studies
ambiguity regarding placement of the M (mental preparation) operator
assumes serial processing

Fitts' Law
Predicts the time taken to reach a target using a pointing device:
T = k log2(D/S + 0.5), k ≈ 100 msec
where T = time to move the hand to the target, D = distance between hand and target, S = size of the target.
Highlights the corners of the screen as good targets, since the cursor stops at the edge and the effective target size becomes very large.
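A small numeric illustration of the formula above, as a Python sketch with hypothetical pixel distances and target sizes:
```python
import math

# Evaluate the Fitts' Law variant quoted above: T = k * log2(D/S + 0.5), k ~ 100 ms.
def fitts_time_ms(distance: float, size: float, k_ms: float = 100.0) -> float:
    """Predicted movement time in milliseconds."""
    return k_ms * math.log2(distance / size + 0.5)

# Same distance, increasing target size: larger targets are faster to acquire.
for size in (8, 32, 128):
    print(f"D = 600 px, S = {size} px -> {fitts_time_ms(600, size):.0f} ms")
```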

Performance measures
Time: easy to measure and suitable for statistical analysis, e.g. learning time, task completion time.
Errors: show where problems exist within a system and suggest the cause of a difficulty.
Patterns of system use: study the patterns of use of different sections; preference for and avoidance of sections in a system.
Amount of work done in a given time.

Other measures
Subjective impression and attitude measures: use questionnaires or interviews; rated aesthetics; rated ease of learning; stated decision to purchase.
Composite measures: weighted averages of the above, e.g. efficiency = throughput / number of errors.
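For instance, the composite efficiency measure above could be computed as follows; a trivial Python sketch in which the numbers and the tasks-per-minute throughput definition are assumptions for illustration.
```python
# Composite measure sketch: efficiency = throughput / number of errors.
def efficiency(tasks_completed: int, minutes: float, errors: int) -> float:
    throughput = tasks_completed / minutes  # tasks per minute (assumed unit)
    return throughput / max(errors, 1)      # guard against division by zero

print(efficiency(tasks_completed=12, minutes=30.0, errors=4))  # 0.1
```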

Controlled experiments
Designed to test predictions arising from an explicit hypothesis that is grounded in an underlying theory.
Allow comparison of systems, fine-tuning of details, and so on.
Strive for: a lucid and testable hypothesis; quantitative measurement; a measure of confidence in the results obtained (statistics); replicability of the experiment; control of variables and conditions; removal of experimenter bias.

Ben Shneiderman (University of Maryland, US): experiments have
two parents: 'a practical problem' and 'a theoretical foundation'
three children: 'help in resolving the practical problem', 'refinements to the theory', and 'advice to future experimenters who work on the same problem'

Designing Experiments
Formulating the hypotheses
Developing predictions from the hypotheses
Choosing a means to test the predictions
Identifying all the variables that might affect the results of the experiment
Deciding which are the independent variables, which are the dependent variables, and which variables need to be controlled by some means

Usability Laboratory

Usability Laboratory

Designing Experiments (contd.)
Designing the experimental task and method
Subject selection
Deciding the experimental design, the data collection method, and how to control confounding variables
Deciding on the appropriate statistical or other analysis
Carrying out a pilot study

The Experimental Method
a) Begin with a lucid, testable hypothesis.
Example 1: "there is no difference in the number of cavities in children and teenagers using Crest and No-teeth toothpaste"

The Experimental Method (contd.)
Example 2: "there is no difference in user performance (time and error rate) when selecting a single item from a pop-up or a pull-down menu, regardless of the subject's previous expertise in using a mouse or using the different menu types"

The Experimental Method (contd.)
b) Explicitly state the independent variables that are to be altered.
Independent variables:
the things you manipulate, independent of how a subject behaves
determine a modification to the conditions the subjects undergo
may arise from subjects being classified into different groups
In the toothpaste experiment: toothpaste type (Crest or No-teeth); age (<= 11 years or > 11 years).
In the menu experiment: menu type (pop-up or pull-down); menu length (3, 6, 9, 12, 15 items); subject type (expert or novice).

The Experimental Method (contd.)
c) Carefully choose the dependent variables that will be measured.
Dependent variables are measures that demonstrate the effects of the independent variables.
Desirable properties:
readily observable
stable and reliable, so that they do not vary under constant experimental conditions
sensitive to the effects of the independent variables
readily related to some scale of measurement

Dependent variables
Some commonly used dependent variables: number of errors made; time taken to complete a given task; time taken to recover from an error.
In the menu experiment: time to select an item; selection errors made.
In the toothpaste experiment: number of cavities; frequency of brushing.

What is an experiment? Three criteria:
1: The experimenter must systematically manipulate one or more independent variables in the domain under investigation.
2: The manipulation must be made under controlled conditions, such that all variables which could affect the outcome of the experiment are controlled (see confounding variables, next).
3: The experimenter must measure some un-manipulated feature that changes, or is assumed to change, as a function of the manipulated independent variables.

Confounding variables
Variables that are not independent variables but are nevertheless allowed to vary across conditions in the experiment.
"The logic of experiments is to hold variables-not-of-interest constant among conditions, systematically manipulate independent variables, and observe the effects of the manipulation on the dependent variables."

Sources of variation
Variations in the task performed
The effect of the treatment (i.e. the user interface improvements that we made)
Individual differences between experimental subjects (e.g. IQ)
Different stimuli for each task
Distractions during the trial (sneezing, dropping things)
Motivation of the subject
Accidental hints or intervention by the experimenter
Other random factors

Examples of Confounding
Order effects: tasks done early in testing are slower and more prone to error; tasks done late in testing may be affected by user fatigue.
Carry-over effects: a difference occurs if one condition follows another, e.g. learning text editor commands.
Experience factors: people in one condition have more or less relevant experience than in others.
Experimenter/subject bias: the experimenter systematically treats some subjects differently from others, or subjects have different motivation levels.
Other uncontrolled variables: time of day, system load.

Confounding Prevention
Randomization: negates order effects. Random assignment to conditions is used to ensure that any effect due to unknown differences among users or conditions is random.
Counterbalancing: addresses order and carry-over effects. Test half of the users in condition I first, and the other half in condition II first. Different permutations of condition order can be used.

Allocation of participants
Judiciously select and assign subjects to groups to control variability.
a) Between-groups experiment: two groups of test users, same tasks for both groups. Randomly assign users to two equally sized groups. Group A uses only system A; group B uses only system B.
b) Within-groups experiment: one group of test users; each user performs equivalent tasks on both systems. Randomly assign users to two equally sized pools. Pool A uses system A first; pool B uses system B first.
c) Matched pairs
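A short sketch (in Python; the participant labels are made up) of how the between-groups split and a counterbalanced within-groups order might be generated:
```python
import random

participants = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"]
shuffled = random.sample(participants, len(participants))

# a) Between-groups: random assignment to two equally sized groups.
group_a, group_b = shuffled[:4], shuffled[4:]

# b) Within-groups with counterbalancing: half the users get the A,B order,
#    the other half B,A, so order and carry-over effects cancel across the sample.
orders = {p: (["A", "B"] if i % 2 == 0 else ["B", "A"])
          for i, p in enumerate(shuffled)}

print("Between-groups:", group_a, "vs", group_b)
print("Within-groups orders:", orders)
```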

Example Designs
Between groups:
System A: John, James, Mary, Stuart
System B: Dave, May, Ann, Phil
Requires more participants; no transfer of learning effects; less arduous on participants; but subject to large individual variation in user skills.
Within groups:
Participant  Sequence
Elizabeth    A, B
Michael      B, A
Steven
Richard
More powerful statistically (can compare the same person across different conditions, thus isolating the effects of individual differences); requires fewer participants than between-groups; but subject to learning effects and fatigue effects.

Experimental Details
Order of tasks: choose one simple order (simple to complex), unless doing a within-groups experiment.
Training: depends on how the real system will be used.
What if someone doesn't finish? Assign a very large time and a large number of errors.
Pilot study: helps you fix problems with the study; do two, first with colleagues, then with real users.

Sample Size
Depends on the desired confidence level and confidence interval.
A confidence level of 95% is often used for research; 80% is acceptable for practical development.
Rule of thumb: 16-20 test users.

Analysing the numbers
Example: trying to get task time <= 30 min. The test gives: 20, 15, 35, 80, 10, 20.
Mean (average) = 30. Looks good!
Wrong answer: we are not yet certain of anything. Always chart the results.
Factors contributing to our uncertainty:
small number of test users (n = 6)
results are very variable (standard deviation ≈ 26; the standard deviation measures dispersal from the mean)
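The same calculation, plus a rough confidence interval, as a Python sketch; the normal-approximation interval is an addition for illustration and is not from the slides.
```python
import math
import statistics

times = [20, 15, 35, 80, 10, 20]   # task times in minutes, from the example above

mean = statistics.mean(times)       # 30.0
sd = statistics.stdev(times)        # sample standard deviation, roughly 26

# Rough 95% confidence interval for the mean (normal approximation; with
# n = 6 a t-based interval would be wider still).
half_width = 1.96 * sd / math.sqrt(len(times))
print(f"mean = {mean:.1f}, sd = {sd:.1f}, "
      f"95% CI roughly {mean - half_width:.1f} to {mean + half_width:.1f} min")
```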

Experimental Evaluation
Advantages: powerful method (depending on the effects investigated); quantitative data for statistical analysis; can compare different groups of users; good reliability and validity; replicable.
Disadvantages: high resource demands; requires knowledge of experimental method; time spent on experiments can mean evaluation is difficult to integrate into the design cycle; tasks can be artificial and restricted; cannot always generalise to the full system in a typical working situation; not all human behaviour variables can be controlled; little recognition of work, time, motivational and social context; the subjects' ideas, thoughts and beliefs are largely ignored. (Preece Ch. 31, pp. 641-649)
This method involves users carrying out specified tasks under controlled conditions and may make use of a mixture of some of the other methods described so far. For example, questionnaires or interviews might be used to establish the users' previous experience and, after the 'experiment', to elicit, say, their subjective judgements of the interface. The experiment itself might make use of techniques such as observation (including timing of performance), talk-aloud, data-logging, and so on. An important consideration in the design of the experiment is cost: the cost of setting up the controlled conditions, finding and paying the subjects, running the experiment, analysing the results, and so on. Advantages of the method are that the results are usually reliable and valid (assuming a good design) and can be replicated any number of times. It can be used to compare the performance and reactions of different groups of users who have been subjected to the same experimental conditions. Disadvantages are that it can be costly, it requires people who are knowledgeable about experimental method, and it can be difficult to fit into the design cycle because of the time it normally takes. Again, the experimental tasks can sometimes appear artificial and restricted, so it is difficult to generalise and to know for sure how the interface, and users, will perform in a real, typical working environment.

Summary
Allows comparison of alternative designs.
Collects objective, quantitative data (bottom-line data).
Needs a significant number of test users (16-20).
Usable only later in the development process.
Requires administrator expertise.
Cannot provide why-information (process data).
Formal studies can reveal detailed information but take extensive time and effort.
Applicability: when the system location is dangerous or impractical; for constrained single-user systems; to allow controlled manipulation of use.

Summary (contd.)
Suitable when the system location is dangerous or impractical, for constrained single-user systems, and to allow controlled manipulation of use.
Advantages and disadvantages of the laboratory setting: sophisticated and expensive equipment; an uninterrupted environment; but beware the Hawthorne effect (participants may behave differently because they know they are being observed).