Controlled Experiments Part 2. Basic Statistical Tests Lecture /slide deck produced by Saul Greenberg, University of Calgary, Canada Notice: some material.

Slides:



Advertisements
Similar presentations
Win8 on Intel Programming Course Win8 for developers, in detail Cédric Andreolli Intel.
Advertisements

10+10 Descending the Design Funnel Chapter 1.4 in Sketching User Experiences: The Workbook.
The State Transition Diagram
Win8 on Intel Programming Course Win8 and Intel Paul Guermonprez Intel Software
Why Should I Sketch? Chapter 1.2 in Sketching User Experiences: The Workbook.
The Keyboard Study Lecture /slide deck produced by Saul Greenberg, University of Calgary, Canada Notice: some material in this deck is used from other.
Evaluation - Controlled Experiments What is experimental design? What is an experimental hypothesis? How do I plan an experiment? Why are statistics used?
Controlled experiments Traditional scientific method Reductionist –clear convincing result on specific issues In HCI: –insights into cognitive process,
Controlled Experiments
The Branching Storyboard Chapter 4.3 in Sketching the User Interface: The Workbook Image from:
Statistics. Review of Statistics Levels of Measurement Descriptive and Inferential Statistics.
Statistical Tests Karen H. Hagglund, M.S.
ESE Einführung in Software Engineering X. CHAPTER Prof. O. Nierstrasz Wintersemester 2005 / 2006.
DATA ANALYSIS I MKT525. Plan of analysis What decision must be made? What are research objectives? What do you have to know to reach those objectives?
Sources of Data Levin and Fox Ch. 1: The Experiment The Survey Content Analysis Participant Observation Secondary Analysis 1.
Social Research Methods
PY 427 Statistics 1Fall 2006 Kin Ching Kong, Ph.D Lecture 6 Chicago School of Professional Psychology.
Sequential Storyboards Chapter 4.1 in Sketching the User Interface: The Workbook Image from:
Six Sigma Training Dr. Robert O. Neidigh Dr. Robert Setaputra.
CPSC 581 Human Computer Interaction II Interaction Design Lecture /slide deck produced by Saul Greenberg, University of Calgary, Canada Notice: some material.
Statistical hypothesis testing – Inferential statistics I.
Collecting Images & Clippings Chapter 2.3 in Sketching User Experiences: The Workbook.
Graphical Screen Design Part 1: Contrast, Repetition, Alignment, Proximity Lecture /slide deck produced by Saul Greenberg, University of Calgary, Canada.
CHAPTER 4 Research in Psychology: Methods & Design
© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Statistics in psychology Describing and analyzing the data.
Win8 on Intel Programming Course The challenge Paul Guermonprez Intel Software
Evaluation - Controlled Experiments What is experimental design? What is an experimental hypothesis? How do I plan an experiment? Why are statistics used?
HCI 510 : HCI Methods I HCI Methods –Controlled Experiments.
Copyright © 2012 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 17 Inferential Statistics.
What is a sketch? Chapter 1.2 addendum Sketching User Experiences: The Workbook.
Chapter 15 Data Analysis: Testing for Significant Differences.
Internet of Things with Intel Edison Compiling and running Pierre Collet Intel Software.
Learning Objectives In this chapter you will learn about the t-test and its distribution t-test for related samples t-test for independent samples hypothesis.
User Study Evaluation Human-Computer Interaction.
Analyzing and Interpreting Quantitative Data
Yes….. I expect you to read it! Maybe more than once!
The Animated Sequence Chapter 5.1 in Sketching User Experiences: The Workbook.
Educational Research: Competencies for Analysis and Application, 9 th edition. Gay, Mills, & Airasian © 2009 Pearson Education, Inc. All rights reserved.
The Sketchbook Chapter 1.4 in Sketching User Experiences: The Workbook.
Chapter 9 Fundamentals of Hypothesis Testing: One-Sample Tests.
Experimental Psychology PSY 433 Appendix B Statistics.
Sketching Vocabulary Chapter 3.4 in Sketching User Experiences: The Workbook Drawing objects, people, and their activities.
Win8 on Intel Programming Course Paul Guermonprez Intel Software
Chapter Eight: Using Statistics to Answer Questions.
Design of Everyday Things Part 2: Useful Designs? Lecture /slide deck produced by Saul Greenberg, University of Calgary, Canada Images from:
IMPORTANCE OF STATISTICS MR.CHITHRAVEL.V ASST.PROFESSOR ACN.
1 Outline 1. Why do we need statistics? 2. Descriptive statistics 3. Inferential statistics 4. Measurement scales 5. Frequency distributions 6. Z scores.
Descriptive and Inferential Statistics Or How I Learned to Stop Worrying and Love My IA.
The Narrative Storyboard Chapter 4.4 in Sketching User Experiences: The Workbook.
Power Point Slides by Ronald J. Shope in collaboration with John W. Creswell Chapter 7 Analyzing and Interpreting Quantitative Data.
STATISTICS IN PSYCHOLOGY Describing and analyzing the data.
Appendix I A Refresher on some Statistical Terms and Tests.
Outline Sampling Measurement Descriptive Statistics:
Controlled Experiments
Statistics in psychology
Controlled experiments Traditional scientific method Reductionist clear convincing result on specific issues In HCI: insights into cognitive process,
Sketching Vocabulary Chapter 3.4 in Sketching User Experiences: The Workbook Drawing objects, people, and their activities.
CHAPTER 4 Research in Psychology: Methods & Design
Data analysis Research methods.
Statistics in psychology
Agenda Video pre-presentations Digital sketches & photo traces
Dr.MUSTAQUE AHMED MBBS,MD(COMMUNITY MEDICINE), FELLOWSHIP IN HIV/AIDS
Controlled Experiments
Hypothesis Testing and Confidence Intervals (Part 1): Using the Standard Normal Lecture 8 Justin Kern October 10 and 12, 2017.
Methodology Overview 2 basics in user studies Lecture /slide deck produced by Saul Greenberg, University of Calgary, Canada Notice: some material in this.
Analyzing and Interpreting Quantitative Data
Introduction to Statistics
DESCRIPTIVE STATISTICS BORAM KANG
Chapter Nine: Using Statistics to Answer Questions
Presentation transcript:

Controlled Experiments Part 2. Basic Statistical Tests Lecture /slide deck produced by Saul Greenberg, University of Calgary, Canada Notice: some material in this deck is used from other sources without permission. Credit to the original source is given if it is known,

Outline Scales of measurements Logic of null hypothesis testing

Scales of Measurements Four major scales of measurements Nominal Ordinal Interval Ratio

Nominal Scale Classification into named or numbered unordered categories country of birth, user groups, gender… Allowable manipulations whether an item belongs in a category counting items in a category Statistics number of cases in each category most frequent category no means, medians… With permission of Ron Wardell

Nominal Scale Sources of error agreement in labeling, vague labels, vague differences in objects Testing for error agreement between different judges for same object With permission of Ron Wardell

Ordinal Scale Classification into named or numbered ordered categories no information on magnitude of differences between categories e.g. preference, social status, gold/silver/bronze medals Allowable manipulations as with interval scale, plus merge adjacent classes transitive: if A > B > C, then A > C Statistics median (central value) percentiles, e.g., 30% were less than B Sources of error as in nominal With permission of Ron Wardell

Interval Scale Classification into ordered categories with equal differences between categories zero only by convention e.g. temperature (C or F), time of day Allowable manipulations add, subtract cannot multiply as this needs an absolute zero Statistics mean, standard deviation, range, variance Sources of error instrument calibration, reproducibility and readability human error, skill… With permission of Ron Wardell

Ratio Scale Interval scale with absolute, non-arbitrary zero e.g. temperature (K), length, weight, time periods Allowable manipulations multiply, divide With permission of Ron Wardell

Example: Apples Nominal: apple variety oMacintosh, Delicious, Gala… Ordinal: apple quality oUS. Extra Fancy oU.S. Fancy, oU.S. Combination Extra Fancy / Fancy oU.S. No. 1 oU.S. Early oU.S. Utility oU.S. Hail With permission of Ron Wardell

Example: Apples Interval: apple ‘Liking scale’ Marin, A. Consumers’ evaluation of apple quality. Washington Tree Postharvest Conference After taking at least 2 bites how much do you like the apple? Dislike extremelyNeither like or dislikeLike extremely Ratio: apple weight, size, … With permission of Ron Wardell

Running Example: which interface is best for tapping (speed)? “Reciprocal tapping task” – alternately tap these buttons 100 times Click me! Get 10 people to complete this tapping task for each of the three input techniques.

Statistics Descriptive Statistics: used to describe the sample Central tendency: mean, median, mode Dispersion: range (min, max), standard deviation … and so on Median: useful when there are outliers in the sample data

Standard Deviation σ = standard deviation N = # of samples you have μ = mean x i = data point value Note: there are slight differences for sample and population standard deviations, but this is the basic idea

Running Example: which interface is best for tapping (speed)? “Reciprocal tapping task” – alternately tap these buttons 100 times Click me! Average time per click (s) 1.00s1.50s1.55s Get 10 people to complete this tapping task for each of the three input techniques.

Statistics Descriptive Statistics: used to describe the sample Central tendency: mean, median, mode Dispersion: range (min, max), standard deviation … and so on Inferential Statistics: statistical tests applied to sample data to make inferences about the population || tools that help us to make decisions/make statements about the population based on the sample

Null Hypothesis Testing Procedure 1.Decide on a null hypothesis null hypothesis: mouse speed = touch speed 2.Decide on an alternate hypothesis alternate hypothesis: mouse speed != touch speed 3.Decide on an alpha level alpha = 0.05  fairly typical (by convention) 4.Run your inferential statistics test. If p < alpha, then reject null hypothesis; otherwise, you do not reject the null hypothesis.

Type I vs. Type II Errors Spam filter: decision making on whether an is good mail (true) or spam (false) Note: the mp3 file podcast describes this figure with the axes transposed

Null Hypothesis Testing Procedure 1.Decide on a null hypothesis null hypothesis: mouse speed = touch speed 2.Decide on an alternate hypothesis alternate hypothesis: mouse speed != touch speed 3.Decide on an alpha level alpha = 0.05  fairly typical (by convention) 4.Run your inferential statistics test. If p < alpha, then reject null hypothesis; otherwise, you do not reject the null hypothesis. Parametric vs. non-parametric tests

alpha and p-value alpha value = probability of Type I error (i.e. probability of our rejecting the null hypothesis when it is in fact true) – often 0.05 or 0.01 by convention p-value = probability of seeing the data if the null hypothesis were true

Running Example: which interface is best for tapping (speed)? Average time per click (s) 1.00s1.50s1.55s Get 10 people to complete this tapping task for each of the three input techniques. Is it possible to collect samples of data where the averages work out the way they have here, even if the true values are equivalent (i.e. null hypothesis is true)? YES!

Logic of Null Hypothesis Testing We have an incumbent theory about the world. We might have something new to try (e.g. new drug, new interaction technique, etc.). The null hypothesis is that these things are basically the same (i.e. we have done no better). The alternate hypothesis is that they are different. We need to make a call about how willing we are to be wrong about the decision that we make about the population based on the sample. This is setting the alpha level. When we run the test, we simply take the p-value (likelihood of having our data if the null hypothesis were true), and compare it to alpha.

Some clarifications about p-value and alpha p-value and “more significant”. All this means is probability of seeing the data if the null hypothesis were true. p= is not “more significant” than p=0.01. alpha value of 0.05 or 0.01 is not magical. It was completely arbitrary. If my alpha was 0.05, and my p-value was 0.06, is this “borderline significant”? No. You have failed to reject the null hypothesis. That’s it. Statistical significance is not the same as practical significance.

Permissions You are free: to Share — to copy, distribute and transmit the work to Remix — to adapt the work Under the following conditions: Attribution — You must attribute the work in the manner specified by the author (but not in any way that suggests that they endorse you or your use of the work) by citing: “Lecture materials by Saul Greenberg, University of Calgary, AB, Canada. Noncommercial — You may not use this work for commercial purposes, except to assist one’s own teaching and training within commercial organizations. Share Alike — If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one. With the understanding that: Not all material have transferable rights — materials from other sources which are included here are cited Waiver — Any of the above conditions can be waived if you get permission from the copyright holder. Public Domain — Where the work or any of its elements is in the public domain under applicable law, that status is in no way affected by the license. Other Rights — In no way are any of the following rights affected by the license: Your fair dealing or fair use rights, or other applicable copyright exceptions and limitations; The author's moral rights; Rights other persons may have either in the work itself or in how the work is used, such as publicity or privacy rights. Notice — For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to this web page.