Fall 2002CS/PSY 67501 Empirical Evaluation Analyzing data, Informing design, Usability Specifications Inspecting your data Analyzing & interpreting results.

Slides:



Advertisements
Similar presentations
Interaksi Manusia Komputer Evaluasi Empiris1/68 EVALUASI EMPIRIS Pengenalan Evaluasi Empiris Perancangan Eksperimen Partisipasi, IRB dan Etika Pengumpulan.
Advertisements

Formulating The Method
Significance and probability Type I and II errors Practical Psychology 1 Week 10.
Unit 1: Science of Psychology
Research Methods for Counselors COUN 597 University of Saint Joseph Class # 8 Copyright © 2015 by R. Halstead. All rights reserved.
Data analysis and interpretation. Agenda Part 2 comments – Average score: 87 Part 3: due in 2 weeks Data analysis.
Using Statistics in Research Psych 231: Research Methods in Psychology.
Using Statistics in Research Psych 231: Research Methods in Psychology.
SADC Course in Statistics Comparing Means from Independent Samples (Session 12)
Experiments Testing hypotheses…. Agenda Homework assignment Review evaluation planning Observation continued Empirical studies In-class practice.
THE QUANTITATIVE RESEARCH APPROACH
1 Practicals, Methodology & Statistics II Laura McAvinue School of Psychology Trinity College Dublin.
Experiments Testing hypotheses…. Recall: Evaluation techniques  Predictive modeling  Questionnaire  Experiments  Heuristic evaluation  Cognitive.
Lecture 9: One Way ANOVA Between Subjects
Intro to Evaluation See how (un)usable your software really is…
Today Concepts underlying inferential statistics
Using Statistics in Research Psych 231: Research Methods in Psychology.
Data Analysis Statistics. Levels of Measurement Nominal – Categorical; no implied rankings among the categories. Also includes written observations and.
Statistics for CS 312. Descriptive vs. inferential statistics Descriptive – used to describe an existing population Inferential – used to draw conclusions.
Using Statistics in Research Psych 231: Research Methods in Psychology.
Inferential Statistics
AM Recitation 2/10/11.
CHAPTER 4 Research in Psychology: Methods & Design
1 Today Null and alternative hypotheses 1- and 2-tailed tests Regions of rejection Sampling distributions The Central Limit Theorem Standard errors z-tests.
Spring /6.831 User Interface Design and Implementation1 Lecture 15: Experiment Analysis.
User Study Evaluation Human-Computer Interaction.
Statistical Analysis. Statistics u Description –Describes the data –Mean –Median –Mode u Inferential –Allows prediction from the sample to the population.
Statistics Psych 231: Research Methods in Psychology.
Skewness and Curves 10/1/2013. Readings Chapter 2 Measuring and Describing Variables (Pollock) (pp.37-44) Chapter 6. Foundations of Statistical Inference.
Choosing the Appropriate Statistics Dr. Erin Devers October 17, 2012.
Statistics (cont.) Psych 231: Research Methods in Psychology.
Lecture 5: Chapter 5: Part I: pg Statistical Analysis of Data …yes the “S” word.
Hypothesis testing Intermediate Food Security Analysis Training Rome, July 2010.
Inference and Inferential Statistics Methods of Educational Research EDU 660.
Data analysis and interpretation. Project part 3 Watch for comments on your evaluation plans Finish your plan – Finalize questions, tasks – Prepare scripts.
Chapter 1: Science of Psychology Daily Objective (concept map): Apply basic statistical concepts to explain research findings: - Descriptive Statistics:
Statistics - methodology for collecting, analyzing, interpreting and drawing conclusions from collected data Anastasia Kadina GM presentation 6/15/2015.
The Statistical Analysis of Data. Outline I. Types of Data A. Qualitative B. Quantitative C. Independent vs Dependent variables II. Descriptive Statistics.
Chapter 8 Usability Specification Techniques Hix & Hartson.
Chapter 9 Three Tests of Significance Winston Jackson and Norine Verberg Methods: Doing Social Research, 4e.
Educational Research Chapter 13 Inferential Statistics Gay, Mills, and Airasian 10 th Edition.
1.1 Statistical Analysis. Learning Goals: Basic Statistics Data is best demonstrated visually in a graph form with clearly labeled axes and a concise.
Inferential Statistics. The Logic of Inferential Statistics Makes inferences about a population from a sample Makes inferences about a population from.
Chapter 6: Analyzing and Interpreting Quantitative Data
Tuesday, April 8 n Inferential statistics – Part 2 n Hypothesis testing n Statistical significance n continued….
Inferential Statistics Introduction. If both variables are categorical, build tables... Convention: Each value of the independent (causal) variable has.
STATISTICS FOR SCIENCE RESEARCH (The Basics). Why Stats? Scientists analyze data collected in an experiment to look for patterns or relationships among.
Observation & Experiments Watch, listen, and learn…
Part 4: Evaluation Days 25, 27, 29, 31 Chapter 20: Why evaluate? Chapter 21: Deciding on what to evaluate: the strategy Chapter 22: Planning who, what,
Statistics (cont.) Psych 231: Research Methods in Psychology.
Chapter 13 Understanding research results: statistical inference.
How Psychologists Do Research Chapter 2. How Psychologists Do Research What makes psychological research scientific? Research Methods Descriptive studies.
1 Collecting and Interpreting Quantitative Data Deborah K. van Alphen and Robert W. Lingard California State University, Northridge.
Statistics (cont.) Psych 231: Research Methods in Psychology.
Inferential Statistics Psych 231: Research Methods in Psychology.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 1 FINAL EXAMINATION STUDY MATERIAL III A ADDITIONAL READING MATERIAL – INTRO STATS 3 RD EDITION.
NURS 306, Nursing Research Lisa Broughton, MSN, RN, CCRN RESEARCH STATISTICS.
Appendix I A Refresher on some Statistical Terms and Tests.
Statistics In Research
User Interface Evaluation
CHAPTER 4 Research in Psychology: Methods & Design
Data analysis and interpretation
Analyzing Reliability and Validity in Outcomes Assessment Part 1
Inspecting your data Analyzing & interpreting results
Psych 231: Research Methods in Psychology
Psych 231: Research Methods in Psychology
Psych 231: Research Methods in Psychology
Psych 231: Research Methods in Psychology
Analyzing Reliability and Validity in Outcomes Assessment
Collecting and Interpreting Quantitative Data
Presentation transcript:

Fall 2002CS/PSY Empirical Evaluation Analyzing data, Informing design, Usability Specifications Inspecting your data Analyzing & interpreting results Using the results in your design Usability specifications

Fall 2002CS/PSY Data Inspection Look at the results First look at each participant’s data  Were there outliers, people who fell asleep, anyone who tried to mess up the study, etc.? Then look at aggregate results and descriptive statistics

Fall 2002CS/PSY Inspecting Your Data “What happened in this study?” Keep in mind the goals and hypotheses you had at the beginning Questions:  Overall, how did people do?  “5 W’s” (Where, what, why, when, and for whom were the problems?)

Fall 2002CS/PSY Descriptive Statistics For all variables, get a feel for results: Total scores, times, ratings, etc. Minimum, maximum Mean, median, ranges, etc. What is the difference between mean & median? Why use one or the other?  e.g. “Twenty participants completed both sessions (10 males, 10 females; mean age 22.4, range years).”  e.g. “The median time to complete the task in the mouse-input group was 34.5 s (min=19.2, max=305 s).”

Fall 2002CS/PSY Subgroup Stats Look at descriptive stats (means, medians, ranges, etc.) for any subgroups  e.g. “The mean error rate for the mouse- input group was 3.4%. The mean error rate for the keyboard group was 5.6%.”  e.g. “The median completion time (in seconds) for the three groups were: novices: 4.4, moderate users: 4.6, and experts: 2.6.”

Fall 2002CS/PSY Plot the Data Look for the trends graphically

Fall 2002CS/PSY Other Presentation Methods 0 20 Mean lowhigh Middle 50% Time in secs. Age Box plot Scatter plot

Fall 2002CS/PSY Experimental Results How does one know if an experiment’s results mean anything or confirm any beliefs? Example: 40 people participated, 28 preferred interface 1, 12 preferred interface 2 What do you conclude?

Fall 2002CS/PSY Inferential (Diagnostic) Stats Tests to determine if what you see in the data (e.g., differences in the means) are reliable (replicable), and if they are likely caused by the independent variables, and not due to random effects  e.g., t-test to compare two means  e.g., ANOVA (Analysis of Variance) to compare several means  e.g., test “significance level” of a correlation between two variables

Fall 2002CS/PSY Means Not Always Perfect Experiment 1 Group 1 Group 2 Mean: 7 Mean: 10 1,10,10 3,6,21 Experiment 2 Group 1 Group 2 Mean: 7 Mean: 10 6,7,8 8,11,11

Fall 2002CS/PSY Inferential Stats and the Data Ask diagnostic questions about the data Are these really different? What would that mean?

Fall 2002CS/PSY Hypothesis Testing Recall: We set up a “null hypothesis”  e.g., there should be no difference between the completion times of the three groups  Or, H 0 : Time Novice = Time Moderate = Time Expert Our real hypothesis was, say, that experts should perform more quickly than novices

Fall 2002CS/PSY Hypothesis Testing “Significance level” (p):  The probability that your null hypothesis was wrong, simply by chance  Can also think of this as the probability that your “real” hypothesis (not the null), is wrong  The cutoff or threshold level of p (“alpha” level) is often set at 0.05, or 5% of the time you’ll get the result you saw, just by chance  e.g. If your statistical t-test (testing the difference between two means) returns a t-value of t=4.5, and a p-value of p=.01, the difference between the means is statistically significant

Fall 2002CS/PSY Errors Errors in analysis do occur Main Types:  Type I/False positive - You conclude there is a difference, when in fact there isn’t  Type II/False negative - You conclude there is no different when there is  Dreaded Type III

Fall 2002CS/PSY Drawing Conclusions Make your conclusions based on the descriptive stats, but back them up with inferential stats  e.g., “The expert group performed faster than the novice group t(1,34) = 4.6, p >.01.” Translate the stats into words that regular people can understand  e.g., “Thus, those who have computer experience will be able to perform better, right from the beginning…”

Fall 2002CS/PSY Beyond the Scope… Note: We cannot teach you statistics in this class, but make sure you get a good grasp of the basics during your student career, perhaps taking a stats class.

Fall 2002CS/PSY Feeding Back Into Design Your study, was designed to yield information you can use to redesign your interface What were the conclusions you reached? How can you improve on the design? What are quantitative benefits of the redesign?  e.g., 2 minutes saved per transaction, which means 24% increase in production, or $45,000,000 per year in increased profit What are qualitative, less tangible benefit(s)?  e.g., workers will be less bored, less tired, and therefore more interested --> better cust. service

Fall 2002CS/PSY Usability Specifications “Is it good enough… …to stop working on it? …to get paid?” Quantitative usability goals, used a guide for knowing when interface is “good enough” Should be established as early as possible  Generally a large part of the Requirements Specifications at the center of a design contract  Evaluation is often used to demonstrate the design meets certain requirements (and so the designer/developer should get paid)  Often driven by competition’s usability, features, or performance

Fall 2002CS/PSY Formulating Specifications They’re often more useful than this…

Fall 2002CS/PSY Measurement Process “If you can’t measure it, you can’t manage it” Need to keep gathering data on each iterative evaluation and refinement Compare benchmark task performance to specified levels Know when to get it out the door!

Fall 2002CS/PSY What is Included? Common usability attributes that are often captured in usability specs:  Initial performance  Long-term performance  Learnability  Retainability  Advanced feature usage  First impression  Long-term user satisfaction

Fall 2002CS/PSY Assessment Technique Usability Measure Value to Current Worst Planned Best poss Observ attribute instrum. be meas. level perf. level target level level results Initial Benchmk Length of 15 secs 30 secs 20 secs 10 secs perf task time to (manual) successfully add appointment on the first trial First Quest ?? impression Explain How will you judge whether your design meets the criteria?

Fall 2002CS/PSY Fields Measuring Instrument  Questionnaires, Benchmark tasks Value to be measured  Time to complete task  Number of percentage of errors  Percent of task completed in given time  Ratio of successes to failures  Number of commands used  Frequency of help usage Target level  Often established by comparison with competing system or non- computer based task

Fall 2002CS/PSY Summary Usability specs can be useful in tracking the effectiveness of redesign efforts They are often part of a contract Designers can set their own usability specs, even if the project does not specify them in advance Know when it is good enough, and be confident to move on to the next project