Conducting a User Study Human-Computer Interaction.


Conducting a User Study Human-Computer Interaction

Overview
- What is a study?
  - Empirically testing a hypothesis
  - Evaluate interfaces
- Why run a study?
  - Determine ‘truth’
  - Evaluate if a statement is true

Example Overview
- Ex. The heavier a person is, the higher their blood pressure
- Many ways to do this:
  - Look at data from a doctor’s office
    - Descriptive design: What are the pros and cons?
  - Get a group of people to get weighed and have their BP measured
    - Analytic design: What are the pros and cons?
- Ideally?
  - Ideal solution: have everyone in the world get weighed and have their BP measured
  - Participants are a sample of the population
    - You should immediately question this!
  - Restrict population

Study Components
- Design
- Hypothesis
- Population
- Task
- Metrics
- Procedure
- Data Analysis
- Conclusions
- Confounds/Biases

Study Design
- How are we going to evaluate the interface?
- Hypothesis
  - What statement do you want to evaluate?
- Population
  - Who?
- Metrics
  - How will you measure?

Hypothesis
- Statement that you want to evaluate
  - Ex. A mouse is faster than a keyboard for numeric entry
- Create a hypothesis
  - Ex. Participants using a keyboard to enter a string of numbers will take less time than participants using a mouse.
- Identify Independent and Dependent Variables
  - Independent Variable – the variable that is being manipulated by the experimenter (interaction method)
  - Dependent Variable – the variable that is measured and is expected to change in response to the independent variable (time)

Hypothesis Testing
- Hypothesis: People who use a mouse and keyboard will be faster to fill out a form than people who use a keyboard alone.
- US Court system: Innocent until proven guilty
- NULL Hypothesis: Assume people who use a mouse and keyboard will fill out a form in the same amount of time as people who use a keyboard alone
- Your job is to show that the NULL hypothesis is unlikely to be true!
- Alternate Hypothesis 1 (two-tailed): People who use a mouse and keyboard will fill out a form either faster or slower than people who use a keyboard alone.
- Alternate Hypothesis 2 (one-tailed): People who use a mouse and keyboard will fill out a form faster than people who use a keyboard alone.

Population
- The people going through your study
- Anonymity
- Type - Two general approaches
  - Have lots of people from the general public
    - Results are generalizable
    - Logistically difficult
    - People will always surprise you with their variance
  - Select a niche population
    - Results more constrained
    - Lower variance
    - Logistically easier
- Number
  - The more, the better
  - How many is enough?
  - Logistics
  - Recruiting (n>20 is pretty good)
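One rough way to approach "How many is enough?" is a power calculation. The sketch below uses the standard normal approximation for comparing two group means. The function name `participants_per_group`, the default α = 0.05, and the 80% power target are assumptions chosen for illustration; none of them come from the slides.

```python
import math
from statistics import NormalDist


def participants_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per condition (two-sided, two groups).

    effect_size: expected difference between the group means divided by the
    (assumed common) standard deviation. This is a normal approximation; a
    t-based calculation gives slightly larger numbers.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical small, medium, and large standardized effects.
for d in (0.2, 0.5, 0.8):
    print(d, participants_per_group(d))
```

Under these assumptions a medium effect (0.5 standard deviations) needs roughly 63 participants per condition, so a small study is only adequate when the expected difference is large relative to the variance.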

Two Group Design
- Design Study
  - Groups of participants are called conditions
  - How many participants?
  - Do the groups need the same # of participants?
- Task
  - What is the task?
  - What are considerations for task?

Design
- External validity – do your results mean anything?
  - Results should be similar to other similar studies
  - Use accepted questionnaires, methods
- Power – how much meaning do your results have?
  - The more people, the more confidently you can say that the participants are representative of the population
  - Pilot your study
- Generalization – how much do your results apply to the true state of things

Design
- People who use a mouse and keyboard will be faster to fill out a form than keyboard alone.
- Let’s create a study design
  - Hypothesis
  - Population
  - Procedure
- Two types:
  - Between Subjects
  - Within Subjects
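The two design types differ in how participants are mapped to conditions. A minimal sketch, assuming 20 hypothetical participant IDs and the two conditions from the form-filling example: in a between-subjects design each participant is randomly placed in exactly one condition, while in a within-subjects design everyone does every condition and the order is alternated (counterbalanced) to limit learning effects. The function names are illustrative only.

```python
import random


def assign_between_subjects(participants, conditions, seed=None):
    """Shuffle participants, then deal them round-robin into the conditions."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {c: [] for c in conditions}
    for i, person in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(person)
    return groups


def assign_within_subjects(participants, conditions, seed=None):
    """Everyone does all conditions; rotate the starting condition per person."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    orders = {}
    for i, person in enumerate(shuffled):
        shift = i % len(conditions)
        orders[person] = conditions[shift:] + conditions[:shift]
    return orders


participants = [f"P{i:02d}" for i in range(1, 21)]  # hypothetical IDs
conditions = ["keyboard only", "mouse + keyboard"]

between = assign_between_subjects(participants, conditions, seed=1)
within = assign_within_subjects(participants, conditions, seed=1)
print({c: len(g) for c, g in between.items()})
print(within["P01"])
```

Fixing the seed makes the assignment reproducible, which helps when documenting the procedure.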

Procedure
- Formally have all participants sign up for a time slot (if individual testing is needed)
- Informed Consent (let’s look at one)
- Execute study
- Questionnaires/Debriefing (let’s look at one)

IRB
- Let’s look at a completed one
- You MUST turn one in to the TA before you run a study
- It must be approved before you run the study

Biases
- Hypothesis Guessing
  - Participants guess the hypothesis you are testing
- Learning Bias
  - Users get better as they become more familiar with the task
- Experimenter Bias
  - Subconscious bias of data and evaluation to find what you want to find
- Systematic Bias
  - Bias resulting from a flaw integral to the system
  - E.g. An incorrectly calibrated thermostat
- List of biases

Thought Experiment
- You are creating a new interface for Windows.
- You are having your friends test your interface, what are their biases?
- You are having your family test your interface, what are their biases?
- You are going to go through the Gainesville phonebook and call people to test your interface, what are their biases?

Confounds
- Confounding factors – factors that affect outcomes, but are not related to the study
- Population confounds
  - Who you get?
  - How you get them?
  - How you reimburse them?
  - How do you know groups are equivalent?
- Design confounds
  - Unequal treatment of conditions
  - Learning
  - Time spent

Metrics
- What you are measuring
- Types of metrics
  - Objective
    - Time to complete task
    - Errors
    - Ordinal/Continuous
  - Subjective
    - Satisfaction
- Pros/Cons of each type?

Analysis
- Most of what we do involves:
  - Normally Distributed Results
  - Independent Testing
  - Homogeneous Population
- Recall, we are testing the hypothesis by trying to prove the NULL hypothesis false

Raw Data
- Keyboard times
- What does mean mean?
- What do variance and standard deviation mean?
- E.g. 3.4, 4.4, 5.2, 4.8, 10.1, 1.1, 2.2
  - Mean = 4.46
  - Variance = 7.14 (Excel’s VARP)
  - Standard deviation = 2.67 (sqrt variance)
- What do the different statistical data tell us?
- User study.xlsx
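The summary statistics for the keyboard times can be reproduced with Python's standard library. This is a small sketch: `pvariance` and `pstdev` are the population versions (matching Excel's VARP), while `variance` is the sample version (dividing by n - 1), shown only for comparison.

```python
from statistics import mean, pstdev, pvariance, variance

# Keyboard times from the slide.
times = [3.4, 4.4, 5.2, 4.8, 10.1, 1.1, 2.2]

avg = mean(times)
pop_var = pvariance(times)    # population variance, like Excel's VARP
pop_sd = pstdev(times)        # square root of the population variance
sample_var = variance(times)  # sample variance (n - 1 in the denominator)

print(f"mean = {avg:.2f}")                  # 4.46
print(f"population variance = {pop_var:.2f}")  # 7.14
print(f"population sd = {pop_sd:.2f}")      # 2.67
print(f"sample variance = {sample_var:.2f}")
```

The single 10.1 value pulls the mean up and accounts for much of the variance, which is one reason to look at the raw data and not only the summary numbers.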

What does Raw Data Mean?

Role of Chance
- How do we know how much is the ‘truth’ and how much is ‘chance’?
- How much confidence do we have in our answer?

Hypothesis
- We assumed the means are “equal”
- But are they?
- Or is the difference due to chance?
- Ex. A: μ0 = 4, μ1 = 4.1
- Ex. B: μ0 = 4, μ1 = 6

T-test
- T-test – statistical test used to determine whether two observed means are statistically different
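As a sketch of how the statistic is built, the function below computes t for two independent groups without assuming equal variances (Welch's form): the difference between the means divided by the standard error of that difference. The first sample is the keyboard times from the Raw Data slide; the second sample is made up purely for illustration.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """t statistic for two independent samples with unequal variances."""
    # Standard error of the difference between the two sample means.
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


keyboard = [3.4, 4.4, 5.2, 4.8, 10.1, 1.1, 2.2]        # from the slides
mouse_keyboard = [2.9, 3.1, 4.0, 3.6, 5.5, 1.0, 1.9]   # hypothetical data

print(round(welch_t(keyboard, mouse_keyboard), 2))
```

Three things drive t: a larger difference between means raises it, while larger variances or fewer participants lower it.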

T-test
- Distributions

T-test
- (rule of thumb) Good values of t > 1.96
- Look at what contributes to t
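The 1.96 threshold is the two-sided 5% critical value of the standard normal distribution, which the t distribution approaches as the sample grows. A quick check, assuming α = 0.05:

```python
from statistics import NormalDist

alpha = 0.05  # assumed significance level
# Value that leaves alpha/2 in each tail of the standard normal distribution.
critical = NormalDist().inv_cdf(1 - alpha / 2)
print(round(critical, 2))  # 1.96
```

With small samples the exact t critical value is somewhat larger (about 2.09 at 20 degrees of freedom), so 1.96 is only a rule of thumb.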

F statistic, p values
- F statistic – assesses the extent to which the means of the experimental conditions differ more than would be expected by chance
- t is related to the F statistic (for two groups, F = t²)
- Look up the statistic in a table to get the p value, then compare it to α
- α value – probability of making a Type I error (rejecting the null hypothesis when it is really true)
- p value – probability, calculated from the sampling distribution of the statistic, of observing a pattern of data at least as extreme as the one found if the null hypothesis were true. (Informally: how easily chance alone could produce the result.)
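One way to make the p value concrete without a lookup table is a permutation test, which is a different technique from the t and F tables the slide refers to. If the null hypothesis holds, the condition labels are interchangeable, so the sketch below reshuffles the pooled data many times and counts how often a difference at least as large as the observed one appears. The function name, the number of permutations, and the second sample are assumptions for illustration.

```python
import random


def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Two-sided p value for the difference in means, by label shuffling."""
    rng = random.Random(seed)
    n_a = len(a)
    observed = abs(sum(a) / n_a - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / (len(pooled) - n_a)
        # Small tolerance so ties are not lost to floating-point rounding.
        if abs(mean_a - mean_b) >= observed - 1e-12:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


keyboard = [3.4, 4.4, 5.2, 4.8, 10.1, 1.1, 2.2]        # from the slides
mouse_keyboard = [2.9, 3.1, 4.0, 3.6, 5.5, 1.0, 1.9]   # hypothetical data

p = permutation_p_value(keyboard, mouse_keyboard)
print(p, "reject null" if p < 0.05 else "cannot reject null")
```

The final comparison against 0.05 is the "compare to α" step; the choice of α is made before looking at the data.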

T and alpha values

t-test with unequal variance, p-value

Comparison                 Small Pattern   Large Pattern
PVE – RSE vs. VFHE – RSE   **              ***
PVE – RSE vs. HE – RSE     **              *
VFHE – RSE vs. HE – RSE

Significance
- What does it mean to be significant?
  - You have some confidence it was not due to chance.
  - But there is a difference between statistical significance and meaningful significance
- Always know:
  - samples (n)
  - p value
  - variance/standard deviation
  - means
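The gap between statistical and meaningful significance shows up when the same small difference is tested with more and more participants. The sketch below takes the means 4 and 4.1 from Ex. A and assumes a standard deviation of 1 and equal group sizes (both are assumptions for illustration). The difference never changes, but t eventually crosses the 1.96 rule of thumb.

```python
import math


def t_for_equal_groups(mean_a, mean_b, sd, n_per_group):
    """t statistic for two equal-sized groups sharing one standard deviation."""
    se = sd * math.sqrt(2 / n_per_group)  # standard error of the difference
    return (mean_b - mean_a) / se


for n in (20, 200, 2000, 20000):
    t = t_for_equal_groups(4.0, 4.1, sd=1.0, n_per_group=n)
    verdict = "significant" if abs(t) > 1.96 else "not significant"
    print(n, round(t, 2), verdict)
```

A difference of 0.1 may or may not matter for the interface being studied; that judgment comes from the means and the task, not from the p value alone.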