Utility of Human-Computer Interactions: Toward a Science of Preference Measurement Michael Toomim, Travis Kriplean Claus Pörtner and James A. Landay University.

Presentation transcript:

Utility of Human-Computer Interactions: Toward a Science of Preference Measurement
Michael Toomim, Travis Kriplean, Claus Pörtner, and James A. Landay
University of Washington, dub Group
CHI 2011

Discretionary Use of Interfaces
The CHI research community grew from the study of discretionary use of computer interfaces (starting in the 1980s), meaning free choice: people choose which interfaces to use to accomplish their tasks
Now the task (and its goal) is itself a choice (e.g., blogs, web browsing, SNS, Wikipedia), as are ubiquitous applications (e.g., smartphones, Nike+iPod)
Widely accepted evaluation metrics in CHI research:
– Indirect predictions about whether an interface will be preferred over other alternatives
– Examples: time-on-task, number of errors, subjective interpretations of think-aloud, survey reports

Evaluating “User Choices”
Industry: A/B testing (split testing, bucket testing)
– A marketing test method in which multiple versions of one element are tested against a metric to determine which is more successful
– The versions are tested simultaneously to determine which is better
– Conversions are measured from different sets of users (between-subjects): control (baseline) vs. treatment A vs. treatment B
– Differences are checked with a statistical significance test (e.g., t-test or chi-square)
Yet A/B testing is challenging: it needs a large up-front investment and a large existing user base to deploy and test (say, thousands of people), because sample size matters
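
As a rough illustration of the significance test mentioned on this slide, the sketch below compares conversion rates for a control and one treatment. It uses a two-proportion z-test rather than the t-test or chi-square named on the slide (for a 2x2 table without continuity correction, the squared z statistic equals the chi-square statistic). The function name and all counts are hypothetical, not taken from the presentation.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    conv_a, conv_b: number of conversions in each group (hypothetical inputs)
    n_a, n_b: number of users assigned to each group
    Returns (z, p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Same 10% vs. 15% conversion rates, at two made-up sample sizes.
z_small, p_small = two_proportion_z_test(10, 100, 15, 100)
z_large, p_large = two_proportion_z_test(100, 1000, 150, 1000)
print(f"n=100 per group:  z={z_small:.2f}, p={p_small:.3f}")
print(f"n=1000 per group: z={z_large:.2f}, p={p_large:.4f}")
```

With these made-up numbers the same difference in rates is not significant at 100 users per group but is at 1,000 per group, which is one way to read "sample size matters".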

Measuring User’s Preference
Proposal: a semi-automated approach
– Post thousands of “interface test tasks” to M-Turk
– Observe how workers choose to complete the tasks (and how many times they do so)
– Analyze the data to measure the preference
How?

Example: Fitts’ law test
Fitts’ law models the time required to click a widget of a given width at a given distance; this technique can model how much people prefer to use a widget
Difficulty = f(width, distance)
For a given job, subjects are asked to click on a blue rectangle 60 times
Each time they click on the bar, it moves to the opposite side of the screen
[Figure: bar labeled with width and distance; on each click, the bar moves]
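
The slide gives difficulty only as a function of width and distance. One common concrete choice is the Shannon formulation of Fitts' index of difficulty, ID = log2(distance / width + 1), with movement time modeled as a + b * ID. The sketch below assumes that formulation; whether the authors used exactly this form is not stated here, and the constants `a` and `b` and the pixel values are hypothetical.

```python
import math

def index_of_difficulty(distance, width):
    """Shannon formulation of Fitts' index of difficulty, in bits (an assumption)."""
    return math.log2(distance / width + 1)

def predicted_time(distance, width, a, b):
    """Predicted movement time a + b * ID; a and b are fitted empirically."""
    return a + b * index_of_difficulty(distance, width)

# Hypothetical targets: a wide bar and a narrow bar at the same distance (pixels).
print(index_of_difficulty(300, 100))  # wide bar
print(index_of_difficulty(300, 20))   # narrow bar, higher difficulty
```

A narrower bar at the same distance yields a larger index of difficulty, matching the easy, medium, and hard conditions described in the study.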

Example: Fitts’ law test
Participants were assigned one of three index-of-difficulty conditions
Each point is the number of clicks a participant completed before quitting (points jittered to show spread)
Participants preferred big buttons to small buttons (p < 0.10)
Participants were allowed a maximum of 3,060 clicks each
The regression line accounts for this maximum using a Tobit analysis

Utility
Utility in economics:
– The degree to which a person prefers a particular choice among the options available
When a user chooses to use system A instead of B, it is said that Utility(A) > Utility(B)
Use economic utility to quantify aggregate user preference
– Example: suppose a user has no preference between (1) being paid $0.25 for using system A and (2) being paid $0.50 for using system B
– Money metric of utility: |Utility(A) – Utility(B)| = $0.25
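
The arithmetic behind the money metric can be spelled out in a few lines. The sketch assumes that payment simply adds to an interface's utility (an assumption made for illustration, not stated on the slide), so indifference means Utility(A) + pay_A = Utility(B) + pay_B. The function name is hypothetical, and amounts are in cents to avoid floating-point rounding.

```python
def utility_gap_cents(pay_a_cents, pay_b_cents):
    """Utility(A) - Utility(B) in cents, for a user indifferent between
    using A for pay_a_cents and using B for pay_b_cents.

    Assumes utility is additive in money:
        Utility(A) + pay_a == Utility(B) + pay_b
    """
    return pay_b_cents - pay_a_cents

# The slide's example: $0.25 for system A vs. $0.50 for system B.
gap = utility_gap_cents(25, 50)
print(gap)  # positive: A is preferred, since B needs extra pay to match it
```

The sign shows the direction of preference, and the absolute value is the money metric on the slide.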

Measuring Utility
Utility = f(task, interface, context)
– A user finds value in completing a task, but must take some actions with a computer through some interface
– The user’s context also matters (e.g., demographics, social, moral status, etc.)
Preference measurement begins with determining how much you must pay people to convince them to use an interface for a task

Measuring Utility
Reservation wage: the wage below which a worker will not take a task
Present a worker with a job at a price and observe their behavior: the worker will either complete the task at the given price or not
Gather and analyze all the data: (Interface ID, Worker ID, Wage, Number of Completions)
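
One simple first pass over records of the form (Interface ID, Worker ID, Wage, Number of Completions) is to average completions for each interface at each wage. The sketch below does only that; it is not the authors' analysis, and the function name and sample records are hypothetical.

```python
from collections import defaultdict

def mean_completions(records):
    """Average number of completions for each (interface, wage) cell.

    records: iterable of (interface_id, worker_id, wage_cents, completions).
    Returns a dict mapping (interface_id, wage_cents) to the mean.
    """
    totals = defaultdict(lambda: [0, 0])  # cell -> [sum of completions, workers]
    for interface_id, _worker_id, wage_cents, completions in records:
        cell = totals[(interface_id, wage_cents)]
        cell[0] += completions
        cell[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Made-up records for illustration only.
records = [
    ("easy", "w1", 1, 10),
    ("easy", "w2", 1, 20),
    ("hard", "w3", 1, 0),
    ("hard", "w4", 3, 12),
]
for cell, mean in sorted(mean_completions(records).items()):
    print(cell, mean)
```

If one interface needs a higher wage to reach the same average number of completions, that wage difference is a rough estimate of the utility gap between the interfaces.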

Measuring Utility
Posting all scenarios/conditions simultaneously to M-Turk
Handling selection bias via a mystery task with a “??? price”
Setting a limit on the number of sub-tasks a single worker can complete (e.g., 50)
Handling market price fluctuations (as people like to take high-paying tasks)

Fitts’ Law Study
Subjects clicked on a blue rectangle 60 times
Each time they clicked on the bar, it moved to the opposite side of the screen
Difficulty = f(width, distance)

Fitts’ Law Study
Price range: $0.01–$0.06
Difficulty: easy, medium, hard
Each task: 60 clicks
Upper limit on number of tasks: 51
5 hours 15 minutes, $970

Aesthetics: CAPTCHAs

The survival graph shows how many workers made it through how many tasks, for each of the four experimental conditions
Pretty and ugly lines are separated at the left, but converge toward the right
– This suggests either that the utility effect of aesthetics fades over time, or that the types of users who complete many CAPTCHAs are more concerned with pay than aesthetics
The shaded regions are 95% confidence intervals
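
A survival curve of this kind can be computed directly from per-worker completion counts: for each task number k, take the fraction of workers who completed at least k tasks. The sketch below shows only that step, for a single condition, and omits the confidence intervals. The function name and data are hypothetical.

```python
def survival_curve(completions, max_tasks):
    """Fraction of workers who completed at least k tasks, for k = 1..max_tasks.

    completions: list with one completion count per worker in a condition.
    """
    n = len(completions)
    return [
        sum(1 for c in completions if c >= k) / n
        for k in range(1, max_tasks + 1)
    ]

# Made-up completion counts for four workers in one condition.
print(survival_curve([1, 2, 2, 5], max_tasks=5))
```

Computing one curve per experimental condition and plotting them together gives a graph in which separation at the left and convergence at the right can be compared.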