Useful Statistical Tools February 19, 2010. Today’s Class Aphorisms Useful Statistical Tools Probing Question Assignments Surveys.


Useful Statistical Tools February 19, 2010

Today’s Class Aphorisms Useful Statistical Tools Probing Question Assignments Surveys

Aphorisms
“Get close enough to know the task, but stay far enough to see the patterns.”
“Humor happens, embrace it.”
“Much like improv, prom night, and getting into fights, the key to good contextual inquiry is to always say yes.”
“Learn as though you would never be able to master it; hold it as though you would be in fear of losing it.”
“Until you learn to interpret openly, you open yourself to mis-interpretation.”
“To know an answer, you must ask a question. To know a truth, you must contextually inquire the right question.”
“Your participant does all the hard stuff. All you have to do is talk about it and check your work.”
“You cannot learn if you already know, unless you first learn how to forget!”
“Listen to the people around you, including to those you know well -- but listen deeper.”
“Do, or do not. There is no try.”

Any guesses From those who did not in? Juelaila has won the first cookie There is one cookie remaining

Cookies!
“Do, or do not. There is no try.” – Juelaila answered first
“Until you learn to interpret openly, you open yourself to mis-interpretation.” – No answers

Let’s discuss A few of these aphorisms Do you think that they help us understand the idea and practice of contextual inquiry better?

Your thoughts? “Get close enough to know the task, but stay far enough to see the patterns.”

Your thoughts? “Much like improv, prom night, and getting into fights, the key to good contextual inquiry is to always say yes.”

Your thoughts? “Until you learn to interpret openly, you open yourself to mis-interpretation.”

Your thoughts? “Your participant does all the hard stuff. All you have to do is talk about it and check your work.”

Your thoughts? “You cannot learn if you already know, unless you first learn how to forget!”

Comments? Questions?

Today’s Class Aphorisms Useful Statistical Tools Probing Question Assignments Surveys

Useful Statistical Tools Power Analysis Meta-Analysis Imputation

Power Analysis A set of methods for determining the probability that you will obtain a statistically significant result, assuming a true effect size and a sample size of a certain magnitude

Or The reverse Given a certain true effect size, and a desired probability of obtaining a statistically significant result, what sample size is needed?

Why? When? Why might a researcher want to do each type of power analysis? When might a researcher want to do each type of power analysis?

When used Effect size + Power --> Sample Size – Usually used before running study to pick sample size Effect size + Sample Size --> Power – Usually used after running study to explain to thesis committee why more subjects are needed

Power analysis Can be computed from – “Effect Size” / Cohen’s d, $d = (M_1 - M_2) / \sigma$, where $\sigma$ is the pooled SD – r – Difference in two r values – And several other metrics
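As a minimal sketch of the Cohen's d metric listed above, the following computes the difference in means divided by the pooled standard deviation. The function name `cohens_d` and the sample scores are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group1, group2):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)


# Hypothetical post-test scores, for illustration only.
experimental = [78, 85, 90, 72, 88]
control = [70, 75, 80, 68, 77]
print(round(cohens_d(experimental, control), 2))
```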

Power analysis Can be computed for – Single-group t-test – Two-group t-test – Paired t-test – F test – Sign test – Etc., etc., etc.

Mathematical Details Differ for different statistical tests and metrics Possible to do this in online power calculators

Sign Test Example (Courtesy of John McDonald)

What is a good value for power? Conventionally, power = 0.80 is treated as “good” Kind of a magic number

Comments? Questions?

I need 3 volunteers

Play with calculator Two-sample t-test

Volunteer #1 If the true effect size is 0.5 σ, how big a sample do you need to achieve Power = 0.8?

Volunteer #2 If the true effect size is 0.2 σ, how big a sample do you need to achieve Power = 0.8?
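The two sample-size questions above can be approximated without a calculator. This is a sketch under stated assumptions: a two-sided test at alpha = 0.05 and the normal approximation to the two-sample t-test. The function name `n_per_group` is hypothetical. A t-based power calculator will typically report a participant or two more per group.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, power=0.80, alpha=0.05):
    """Approximate per-group sample size for a two-sided, two-sample comparison of means."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = std_normal.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


print(n_per_group(0.5))  # → 63
print(n_per_group(0.2))  # → 393
```

Because the effect size is squared in the denominator, the required sample grows quickly as the effect shrinks.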

Volunteer #3 If your control condition gains 20 points pre-post, your experimental condition gains 40 points pre-post, the pooled standard deviation is 30 points, and you have 20 students in each condition, what’s your statistical power?
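The question above can be worked through in code. This is a sketch under stated assumptions: a two-sided test at alpha = 0.05 and the normal approximation to the two-sample t-test, so an exact t-based calculator will give a slightly lower figure. The function name `two_sample_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the assumed true effect.
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability of landing in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Effect size from the slide: (40 - 20) / 30, with 20 students per condition.
d = (40 - 20) / 30
power = two_sample_power(d, 20)
print(round(power, 2))
```

Under these assumptions the result falls well short of the conventional 0.80 target.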

Comments? Questions?

How can statistical power be increased? Both in theory, and in real life

How can statistical power be increased? Increase sample size

How can statistical power be increased? Increase difference in means – Make your intervention better

How can statistical power be increased? Increase difference in means – Make your control condition worse Some researchers make the mistake of picking a control condition that’s impossibly good – ScienceAssistments versus ScienceAssistments, with one less potential IV This doesn’t mean you should fish for a control condition that is absurdly awful – DrScheme versus Learning programming through interpretive dance – Miley’s World versus Learning math through reading textbooks written in Danish

How can statistical power be increased? Reduce standard deviation – What methods have we discussed in class that could help us do this? Stratification

Comments? Questions?

Meta-Analysis

Very important point, right up front There is meta-analysis And then there are the statistical techniques used in meta-analysis – Much broader in application than just classical meta-analysis!

Meta-Analysis In the classic sense, integrating across a set of previous studies, to attempt to find an overall effect size or significance of finding across all those studies

Examples Kulik & Kulik (1991): computer-aided instruction does 0.3 σ better than traditional instruction. Cohen, Kulik, & Kulik (1982) found that expert tutors do 2.3 σ better than traditional instruction; novice tutors only do 0.4 σ better than traditional instruction.

Process of doing a meta-analysis Find all the studies on topic of interest Find measure of interest (effect size or statistical significance) Integrate across studies

Challenges What might make it difficult to Find all the studies on topic of interest ?

Challenges to Finding all Studies Knowing what terminology to use in literature review – many phenomena have many names – Off-task behavior, Time-on-task, Percent On-Task, Attention – Gaming the system, Systematic Guessing, Hint Abuse, Help Abuse, Executive Help-Seeking, Letaxmaning, Off-Task Gaming Behavior, Player Transformation, Goal Structure Misalignment

Challenges to Finding all Studies “File-Drawer Effect” – Papers with null results get rejected by conference program committees and journal reviewers – Papers with null results don’t get submitted in the first place

Find measure of interest Statistical significance – If you can find a p, you can turn it into a Z, and you’re good to go Using Z formula in Excel, or a Z-p table – Set direction on Z to be consistent E.g. all studies with finding X are positive All studies with finding not-X are negative
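The p-to-Z conversion described above can be sketched with the standard library instead of a lookup table. The function name `p_to_z` and its parameters are hypothetical. Whether a reported p is one-tailed or two-tailed is an assumption the analyst has to settle per study, so it is exposed as a flag here.

```python
from statistics import NormalDist


def p_to_z(p, direction=+1, two_tailed=True):
    """Convert a p-value to a signed Z score.

    direction is +1 for studies with finding X and -1 for studies with finding not-X.
    """
    tail_area = p / 2 if two_tailed else p
    return direction * NormalDist().inv_cdf(1 - tail_area)


print(round(p_to_z(0.05), 3))                    # two-tailed p, positive direction
print(round(p_to_z(0.05, direction=-1), 3))      # same p, opposite direction
print(round(p_to_z(0.05, two_tailed=False), 3))  # one-tailed p
```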

Find measure of interest Effect size – Transform values into correlations or Cohen’s d values

Why might you… Why might you want to do meta-analysis on effect size versus statistical significance?

Integrating Across Studies Two cases Studies are independent Studies are non-independent

Studies are Independent By far the statistically easier case

Aggregating significance tests Stouffer’s Z For N studies, each with Z value $Z_i$: $Z_{\text{Stouffer}} = \frac{\sum_{i=1}^{N} Z_i}{\sqrt{N}}$

Volunteer?

Example Five studies on the effects of taking gym class on mathematics performance – Two studies found positive effect of taking gym class, p= 0.02, p=0.06 – Three studies found negative effect of taking gym class, p=0.05, p=0.11, p=0.75 – One-tailed Z table on the next slide
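The gym-class example can be run through Stouffer's method as a sketch. One assumption is labeled loudly here: the slide does not say whether the reported p-values are one-tailed or two-tailed, and this sketch treats them as two-tailed. Under the one-tailed reading the numbers would differ. The helper names `signed_z` and `stouffer_z` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def signed_z(p, direction):
    """Signed Z from a p-value, ASSUMING the p-value is two-tailed."""
    return direction * NormalDist().inv_cdf(1 - p / 2)


def stouffer_z(z_values):
    """Stouffer's Z: sum of the per-study Z values divided by sqrt(N)."""
    return sum(z_values) / sqrt(len(z_values))


# (p, direction) pairs from the slide: +1 = positive effect, -1 = negative effect.
studies = [(0.02, +1), (0.06, +1), (0.05, -1), (0.11, -1), (0.75, -1)]
z_values = [signed_z(p, direction) for p, direction in studies]

combined = stouffer_z(z_values)
# Two-tailed p for the combined Z.
combined_p = 2 * (1 - NormalDist().cdf(abs(combined)))
print(round(combined, 3), round(combined_p, 3))
```

Setting a consistent sign on each Z is what lets opposing findings cancel rather than accumulate.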

Z table

Aggregating correlations Convert r to Fisher z’ For N studies, each with z’ value $z'_i$: $\bar{z}' = \frac{\sum_{i=1}^{N} z'_i}{N}$ Then convert the result back to r

Why Fisher z’? Equal differences between any two Fisher z’ values are equal in significance Whereas r is uneven – From r=0.8 to 0.9 is a bigger difference in significance than r=0.2 to r=0.3 – So transformation is necessary to weight all differences in correlation equally
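The Fisher z’ round trip can be sketched with the standard library, since Fisher's transformation is the inverse hyperbolic tangent. The function name `average_correlation` and the input values are hypothetical and chosen only for illustration.

```python
from math import atanh, tanh


def average_correlation(r_values):
    """Average correlations by converting to Fisher z', taking the mean, and converting back."""
    z_primes = [atanh(r) for r in r_values]  # r -> z'
    mean_z = sum(z_primes) / len(z_primes)
    return tanh(mean_z)                      # z' -> r


# The same 0.1 step in r is a much larger step in z' near the top of the scale.
print(round(atanh(0.9) - atanh(0.8), 3))
print(round(atanh(0.3) - atanh(0.2), 3))

# Hypothetical correlations from three studies.
print(round(average_correlation([0.1, 0.3, 0.5]), 3))
```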

Volunteer?

Example Five studies on the effects of learning computer programming on popularity – Two studies found positive correlation, r = 0.1, r = 0.3 – Three studies found negative correlation, r = -0.8, r = -0.6, r = – Fisher z’ table on the next slide

r z'

Comments? Questions?

Studies are non-independent Generally taken to mean that the same sample (at least in part) is involved. The case where there is non-independence due to similar (or the same) learning materials is generally not considered, as inter-correlation is low and difficult to compute

Math is “complex” Strube’s (1985) Adjusted Z is used instead of Stouffer’s Z in these cases – Accounts for correlation of different data points for the same subject Similar approach for effect size

Comments? Questions?

Other Uses of These Techniques

Non-independence in modeling Take the case where you are studying whether an EDM model is statistically significantly different than chance – N actions involving M students It is extremely invalid to do a statistical significance test involving N actions – Assumes each action is independent of each other action But it biases towards non-significance to collapse the N data points into one data point per student

Solution Do separate statistical significance test within each student (actions can be treated as independent of each other, once student is accounted for) Then use Stouffer’s Z to aggregate across students
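The two-step solution above can be sketched as follows. The within-student test here is a placeholder assumption: it compares each student's proportion of correctly predicted actions to chance using the normal approximation to the binomial. The cited papers may use a different within-student statistic. All names and data are hypothetical.

```python
from math import sqrt


def student_z(n_correct, n_actions, chance=0.5):
    """Z for one student's count of correct predictions versus chance (normal approximation)."""
    expected = n_actions * chance
    sd = sqrt(n_actions * chance * (1 - chance))
    return (n_correct - expected) / sd


def stouffer_z(z_values):
    """Stouffer's Z: sum of Z values divided by sqrt(number of values)."""
    return sum(z_values) / sqrt(len(z_values))


# Hypothetical data: (correct predictions, total actions) for each student.
per_student = {"s1": (30, 50), "s2": (18, 40), "s3": (45, 60)}

# Step 1: one significance test per student.
z_by_student = {s: student_z(c, n) for s, (c, n) in per_student.items()}

# Step 2: aggregate across students, so the effective N is students, not actions.
overall = stouffer_z(list(z_by_student.values()))
print(round(overall, 3))
```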

To see examples… There is not time to discuss the math in detail today, but see examples in – Baker, Corbett, & Aleven (2008) – Baker, Corbett, Roll, & Koedinger (2008)

Comments? Questions?

Imputation

In data sets with large amounts of data per data point – For instance, extremely long surveys or demographic data It is common to have small amounts of missing data in each data point – E.g. variable 17 missing for students 1, 14, 90, 112, 202, 477

In these cases… It may be undesirable to throw out every data point that has a missing response – You might end up losing 30-40% of your data, or more, and biasing your data. For instance, people who occasionally fail to respond to survey items probably differ systematically from people who diligently and carefully answer every question

Imputation For each data point missing a value, find a set of “similar” data points that are not missing that value – A similar data point has low absolute difference across non-missing variables. Randomly choose one of the non-missing values to fill in the missing data

Multiple Imputation Create 3-10 data sets in this fashion Then for all the missing data, find the mean (and SD) across all imputed data sets Use the no-longer-missing data in future analyses
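The imputation procedure described on these slides can be sketched as below. Several choices are assumptions rather than anything the slides specify: missing values are represented as `None`, similarity is the mean absolute difference over shared variables (which presumes the variables are on comparable scales), the `k` closest donors form the "similar" set, and every missing variable has at least one donor. All names and data are hypothetical.

```python
import random
from statistics import mean, stdev


def distance(a, b):
    """Mean absolute difference over variables present in both records."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    return sum(abs(x - y) for x, y in shared) / len(shared)


def impute_once(records, k, rng):
    """Fill each missing value with the value of a randomly chosen similar donor."""
    filled = [list(r) for r in records]
    for i, rec in enumerate(records):
        for j, value in enumerate(rec):
            if value is None:
                # Donors are other records that have this variable.
                donors = [r for r in records if r is not rec and r[j] is not None]
                donors.sort(key=lambda r: distance(rec, r))
                filled[i][j] = rng.choice(donors[:k])[j]
    return filled


def multiple_imputation(records, m=5, k=3, seed=0):
    """Create m imputed data sets; return mean and SD of each imputed cell."""
    rng = random.Random(seed)
    datasets = [impute_once(records, k, rng) for _ in range(m)]
    summary = {}
    for i, rec in enumerate(records):
        for j, value in enumerate(rec):
            if value is None:
                draws = [d[i][j] for d in datasets]
                summary[(i, j)] = (mean(draws), stdev(draws))
    return summary


# Hypothetical survey responses on a 1-5 scale; None marks a missing answer.
records = [
    [4, 5, 3, 4],
    [4, None, 3, 5],
    [1, 2, 1, 1],
    [5, 5, None, 4],
    [2, 1, 2, 2],
]
summary = multiple_imputation(records)
for cell, (cell_mean, cell_sd) in summary.items():
    print(cell, round(cell_mean, 2), round(cell_sd, 2))
```

The SD across imputed data sets gives a rough sense of how uncertain each filled-in value is.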

An alternative: regression imputation Find set of linear regression functions predicting each variable from all other variables Use this function to fill in missing data
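Regression imputation can be sketched in its simplest form. This is a simplification and an assumption: the slide describes predicting each variable from all other variables, whereas this sketch fits a single-predictor least-squares line and assumes the predictor itself has no missing values. The function names and data are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    return slope, y_bar - slope * x_bar


def regression_impute(xs, ys):
    """Fill None entries of ys with predictions from a line fitted on complete pairs."""
    complete = [(x, y) for x, y in zip(xs, ys) if y is not None]
    slope, intercept = fit_line([x for x, _ in complete], [y for _, y in complete])
    return [slope * x + intercept if y is None else y for x, y in zip(xs, ys)]


# Hypothetical data: the third value of ys is missing.
xs = [1, 2, 3, 4]
ys = [2, 4, None, 8]
filled = regression_impute(xs, ys)
print(filled)
```

Unlike the random-donor approach, this always fills in the same predicted value, so it adds no variability of its own to the imputed data.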

Advantages? Disadvantages? Multiple Imputation Regression Imputation Throwing out all data points with missing variables

Comments? Questions?

Today’s Class Aphorisms Useful Statistical Tools Probing Question Assignments Surveys

Probing Question Observation: Relatively few researchers use power analysis when designing their studies. Why? Are they making a mistake?

Today’s Class Aphorisms Useful Statistical Tools Probing Question Assignments Surveys

Assignment #5 Any questions?

Today’s Class Aphorisms Useful Statistical Tools Probing Question Assignments Surveys