i213: User Interface Design & Development Marti Hearst March 5, 2007

Example Study
Sample study by White et al., studying query-biased search results summaries.
First did an informal assessment to determine responses to the state of the art:
–6 participants
–Compared AltaVista & Google (versions from 2001 or maybe 2000)
–Google's summaries were query-biased, AltaVista's weren't
–Ranking wasn't as good then
Findings:
–Summaries were ambiguous and too short
–First thing they saw was the hit count – discouraging
–Had to scroll to see more than a few results
Main conclusion: the document summaries were not descriptive enough

Study Goals
Evaluate a new form of query-biased summaries for web search
Hypothesis:
–The presence of query-biased summaries will improve search effectiveness

Experiment Design
Independent variables:
–Search interface (levels: Google, Google + summaries, AV, AV + summaries)
–Task type (levels: 4 different tasks)
Dependent variables:
–Participant satisfaction
–Task completion success
–Task completion time

Blocking
Number of participants: 24
Within-participants design:
–They each use all 4 interfaces
–They each do 4 tasks
They control for:
–Effects of task (some harder than others)
–Effects of order of exposure to system (seeing one can influence the effects of seeing the next)
They do not control for:
–Order of task

Latin-Square Design
Start with an ordering
Rotate the order, moving one position per line

Latin Square Design
Within-participants design, 6 participants per condition (row):
      T1   T2   T3   T4
      G    G+   A    A+
      G+   A    A+   G
      A    A+   G    G+
      A+   G    G+   A

Latin Square Design
Start with an ordering
Rotate the order, moving one position per line
Note that this doesn't give you every possible ordering!
–(e.g., don't see AV right after G)
–The hope is the outcome isn't that sensitive to ordering
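A minimal sketch of the rotation rule described on this slide, assuming nothing beyond it: a cyclic shift of the starting order produces a square in which every interface appears once per row and once per column, but some adjacencies never occur.

```python
# Sketch: build a Latin square by cyclic rotation, as described above.
# The interface labels match the slide (G, G+, A, A+); everything else is illustrative.

def latin_square(conditions):
    """Rotate the starting order by one position per row."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]

if __name__ == "__main__":
    square = latin_square(["G", "G+", "A", "A+"])
    for task, row in zip(["T1", "T2", "T3", "T4"], square):
        print(task, row)
    # Each condition appears exactly once per row and once per column,
    # but not every possible ordering occurs (e.g., A never directly follows G).
```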

Study Procedure
Participants came by, one at a time; each session lasted about 1.5 hours.
Procedure:
–Introductory orientation session
–Background questionnaire
–The following 4 steps for each task:
  Short training session with the new system
  Receive a hard-copy task description
  10-minute search session
  Post-search questionnaire
–Final questionnaire
–Informal discussion (optional)

Study Procedure
Data collection:
–Questionnaires
  5-point Likert scales: 3 on task, 4 on search process, 4 on summaries
–Think-aloud
–Automatic logging
  # docs returned
  # summaries requested and returned
  # results pages viewed
  Time for each session
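A hedged sketch of the kind of per-session record such automatic logging might produce. The class and field names are assumptions for illustration, not taken from White et al.

```python
# Illustrative only: one possible shape for the automatically logged measures
# listed above. Names and fields are assumptions, not from the actual study.
from dataclasses import dataclass, field
from typing import Optional
import time

@dataclass
class SearchSessionLog:
    """Container for the measures captured during one 10-minute search session."""
    participant_id: int
    interface: str                      # e.g., "G", "G+", "A", "A+"
    task_id: str                        # e.g., "T1"
    docs_returned: int = 0
    summaries_requested: int = 0
    summaries_returned: int = 0
    results_pages_viewed: int = 0
    start_time: float = field(default_factory=time.time)
    end_time: Optional[float] = None

    def finish(self) -> float:
        """Stop the clock and return the session duration in seconds."""
        self.end_time = time.time()
        return self.end_time - self.start_time
```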

Questions Asked

Subjective Results
No interaction effects for task
All groups preferred the enhanced summaries
All groups felt they benefited from the summaries
G + enhanced summaries was significantly different from the rest on 3 out of 4 scales (relaxing, interesting, restful)
–Except for easy/difficult, where there was no difference
System ranking in the final questionnaire:
–23 out of 24 chose AV+ or G+ as first or second
–19 chose AV+ and G+ as the top two

More Qualitative Results
Participants liked both styles of results summaries
Participants disliked:
–Scrolling
–Moving the mouse to see enhanced summaries
–Hiding the URL in enhanced summaries
–Not seeing query terms in context (AV)

Quantitative Results
Task time:
–Artificial cutoff of 10 minutes assigned even if the task was not completed
–Participants were significantly faster with the enhanced summaries
–There was a slight correlation between system and task completion, but not a strong one

Experiment Design Example: Marking Menus
Based on Kurtenbach, Sellen, and Buxton, "Some Articulatory and Cognitive Aspects of Marking Menus", Graphics Interface '94.

Experiment Design Example: Marking Menus
Pie marking menus can reveal:
–the available options
–the relationship between mark and command
1. User presses down with stylus
2. Menu appears
3. User marks the choice; an ink trail follows

Why Marking Menus?
Same movement for selecting command as for executing it
Supporting markings with pie menus should help transition between novice and expert
Useful for keyboardless devices
Useful for large screens
Pie menus have been shown to be faster than linear menus in certain situations

What do we want to know?
Are marking menus better than pie menus?
–Do users have to see the menu?
–Does leaving an "ink trail" make a difference?
–Do people improve on these new menus as they practice?
Related questions:
–What, if any, are the effects of different input devices?
–What, if any, are the effects of different size menus?

Experiment Factors
Isolate the following factors (independent variables):
–Menu condition: exposed, hidden, hidden w/ marks (E, H, M)
–Input device: mouse, stylus, trackball (M, S, T)
–Number of items in menu: 4, 5, 7, 8, 11, 12 (note: both odd and even)
Response variables (dependent variables):
–Response time
–Number of errors
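The factors above define a 3 x 3 x 6 design. The short sketch below simply enumerates its 54 cells; the level codes come from the slide, everything else is illustrative.

```python
# Sketch: enumerate the full set of conditions implied by the factors above.
# 3 menu conditions x 3 devices x 6 menu sizes = 54 cells; each cell is then
# measured on the two response variables (response time, number of errors).
from itertools import product

MENU_CONDITIONS = ["E", "H", "M"]        # exposed, hidden, hidden w/ marks
DEVICES = ["M", "S", "T"]                # mouse, stylus, trackball
MENU_SIZES = [4, 5, 7, 8, 11, 12]

conditions = list(product(MENU_CONDITIONS, DEVICES, MENU_SIZES))
print(len(conditions))                   # 54
print(conditions[:3])                    # [('E', 'M', 4), ('E', 'M', 5), ('E', 'M', 7)]
```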

Experiment Hypotheses
Note these are stated in terms of the factors (independent variables):
1. Exposed menus will yield faster response times and lower error rates, but not when menu size is small
2. Response variables will monotonically increase with menu size for exposed menus
3. Response time will be sensitive to number of menu choices for hidden menus (familiar ones will be easier, e.g., 8 and 12)
4. Stylus better than mouse better than trackball

Experiment Hypotheses
5. Device performance is independent of menu type
6. Performance on hidden menus (both marking and hidden) will improve steadily across trials; performance on exposed menus will remain constant

Experiment Design
Participants:
–36 right-handed people (usually gender distribution is stated)
–considerable mouse experience
–(almost) no trackball or stylus experience

Experiment Design
Task:
–Select target "slices" from a series of different pie menus as quickly and accurately as possible, in (a) exposed and (b) hidden conditions
–Can move the mouse to select, as long as the button is held down
–Menus were simply numbered segments (meaningful items would have longer learning times)
–Participants saw running scores
  Shown grayed-out feedback about which item was selected
  Lose points for a wrong selection

Experiment Design
36 participants
One between-subjects factor:
–Menu View Type: three levels, E, H, or M (Exposed, Hidden, Marking)
Two within-subjects factors:
–Device Type: three levels, M, T, or S (Mouse, Trackball, Stylus)
–Number of Menu Items: six levels, 4, 5, 7, 8, 11, 12
How should we arrange these?

Experiment Design
Between-subjects design: three menu-view groups (E, H, M), 12 participants each.
How to arrange the devices?

Experiment Design
A Latin square for device order (columns are the E, H, M groups, 12 participants per column):
      E  H  M
      M  T  S
      T  S  M
      S  M  T
No row or column shares a label.
(Note: each of the 12 participants does everything in one column)

Experiment Design
(Same device Latin square as above.)
How to arrange the menu sizes?
Block by size, then randomize the blocks.

Experiment Design
(Same device Latin square as above.)
Block by size, then randomize the blocks.
(Note: the order of each set of menu-size blocks will differ for each participant in each square)

Experiment Design
(Same device Latin square as above.)
Each block contains a fixed number of trials.
(Note: these blocks will look different for each participant.)
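A hedged sketch of "block by size, then randomize the blocks": all trials in a block share one menu size, and the block order is shuffled separately for each participant. The trial count per block is an assumption, since the slide omits the number.

```python
# Sketch of blocked-and-randomized trial ordering; counts are illustrative.
import random

MENU_SIZES = [4, 5, 7, 8, 11, 12]
TRIALS_PER_BLOCK = 40   # assumption for illustration; the slide omits the number

def blocks_for_participant(rng):
    """Return one participant's blocks: shuffled block order, same size within a block."""
    sizes = MENU_SIZES[:]
    rng.shuffle(sizes)                      # randomize block order per participant
    return [(size, [f"trial {i + 1} (menu size {size})"
                    for i in range(TRIALS_PER_BLOCK)])
            for size in sizes]

rng = random.Random(42)                     # fixed seed so the sketch is reproducible
for size, trials in blocks_for_participant(rng):
    print(size, len(trials))
```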

Experiment Overall Results So exposing menus is faster … or is it? Let’s factor things out more.

A Learning Effect When we graph over the number of trials, we find a difference between exposed and hidden menus. This suggests that participants may eventually become faster using marking menus (was hypothesized). A later study verified this.

Factoring to Expose Interactions
Increasing menu size increases selection time and number of errors (was hypothesized).
No differences across menu groups in terms of response time, that is, until we factor by menu size AND menu group:
–Then we see that menu size has interaction effects on the Hidden groups not seen in the Exposed group
–This was hypothesized (12 easier than 11)

Factoring to Expose Interactions
Stylus and mouse outperformed trackball (hypothesized)
Stylus and mouse the same (not hypothesized)
Initially, effect of input device did not interact with menu type
–this is when comparing globally
–BUT...
More detailed analysis:
–Compare both by menu type and device type
–Stylus significantly faster with the Marking group
–Trackball significantly slower with the Exposed group
–Not hypothesized!
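For reference, one common way to test such an interaction is a two-way ANOVA. The sketch below is not from the original paper; it assumes trial-level data in a tidy DataFrame with hypothetical column names rt, menu_size, and menu_group.

```python
# Sketch: test for a menu-size x menu-group interaction on response time.
# Column names ("rt", "menu_size", "menu_group") are assumptions for illustration.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

def interaction_anova(df: pd.DataFrame) -> pd.DataFrame:
    """Fit a two-way ANOVA with an interaction term and return the ANOVA table."""
    model = ols("rt ~ C(menu_size) * C(menu_group)", data=df).fit()
    return sm.stats.anova_lm(model, typ=2)   # inspect the C(menu_size):C(menu_group) row

# Example usage (with real data loaded from the experiment logs):
# table = interaction_anova(df)
# print(table)
```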

Average response time and errors as a function of device, menu size, and menu type.
Potential explanations:
–Markings provide feedback for when the stylus is pressed properly.
–The ink trail is consistent with the metaphor of using a pen.

Experiment Design
(Same device Latin square as above.)
How can we tell if the order in which the device appears has an effect on the final outcome?
Some evidence:
–There is no significant difference among devices in the Hidden group.
–The trackball was slowest and most error prone in all three cases.
Still, there may be some hidden interactions, but they are unlikely to be strong given the previous graph.

Statistical Tests
Need to test for statistical significance
–This is a big area
–Assuming a normal distribution:
  Student's t-test to compare the means of two conditions
  ANOVA to compare more than two conditions
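A minimal sketch of the two tests named above using SciPy; the sample arrays are placeholder response-time values, not data from either study.

```python
# Sketch: Student's t-test and one-way ANOVA on placeholder response-time data.
from scipy import stats

group_a = [12.1, 10.4, 11.8, 13.0, 9.7]      # e.g., times with exposed menus (made up)
group_b = [14.2, 13.5, 15.1, 12.9, 14.8]     # e.g., times with hidden menus (made up)
group_c = [13.0, 12.2, 13.8, 14.1, 12.5]     # e.g., times with marking menus (made up)

# Student's t-test: compare the means of two conditions
t_stat, p_ttest = stats.ttest_ind(group_a, group_b)

# One-way ANOVA: compare the means of more than two conditions
f_stat, p_anova = stats.f_oneway(group_a, group_b, group_c)

print(f"t = {t_stat:.2f}, p = {p_ttest:.3f}")
print(f"F = {f_stat:.2f}, p = {p_anova:.3f}")
```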

Summary
Formal studies can reveal detailed information but take extensive time/effort
Human participants entail special requirements
Experiment design involves:
–Factors, levels, participants, tasks, hypotheses
–Important to consider which factors are likely to have real effects on the results, and to isolate these
Analysis:
–Often need to involve a statistician to do it right
–Need to determine statistical significance
–Important to make plots and explore the data

Longitudinal Studies Trace the use of an interface over time Do people continue to use it, or drop it? How does people’s use change over time?

Longitudinal Studies
Dumais et al.:
–Studied use of desktop search
–Some people had sort-by-date as the default, others had sort-by-relevance as the default
–A number of people switched from relevance to date; few went the other way
Kaki 2005:
–Studied use of a term-grouping search interface
–People used the groups only for certain types of queries
–People's queries got shorter, since the interface could disambiguate for them

Followup Work
Hierarchical Marking Menu study

Followup Work
Results of use of marking menus over an extended period of time:
–two-person extended study
–participants became much faster using gestures without viewing the menus

Followup Work Results of use of marking menus over an extended period of time –participants temporarily returned to “novice” mode when they had been away from the system for a while

Wizard of Oz Studies
(discussed briefly in Nielsen)
Useful for simulating a smart program in order to get participant responses
Examples, to test out:
–a speech interface
–a question-answering interface
There is a man behind the curtain!

Discuss Jeffries et al.
Compared 4 evaluation techniques:
–Heuristic Evaluation
–Software Guidelines
–Cognitive Walkthroughs
–Usability Testing
Findings?