Inference, Representation, Replication & Programmatic Research

Inference, Representation, Replication & Programmatic Research
- Populations & Samples -- Parameters & Statistics
- Descriptive vs. Inferential Statistics
- How we evaluate and support our inferences
- Identifying & evaluating replications
- Extracting replications from larger studies

Just a bit of review …
- The group that we are hoping our data will represent is the … population
- The list of folks in that group that we have is the … sampling frame
- The folks we select from that list with the intent of obtaining data from each are the … selected sample
- The group from which we actually collect data is the … data sample

Some new language …
- Parameter -- summary of a population characteristic
- Statistic -- summary of a sample characteristic

Just two more …
- Descriptive Statistic -- calculated from sample data to describe the sample
- Inferential Statistic -- calculated from sample data to infer about a specific population parameter

So, let's put it together -- first a simple one …

The purpose of the study was to determine the mean number of science credits taken by all students previously enrolled in Psyc350. We got the list of everybody enrolled last fall and from it selected 20 students to ask about their science coursework. It turned out that 3 had dropped, one was out of town, and another didn't remember whether they had taken 1 chem course or two.

Sampling -- identify each of the following:
- Target population … all students previously enrolled in Psyc350
- Sampling frame … last fall's enrollment list
- Selected sample … the 20 students selected from that list
- Data sample … the 15 students providing data

Kinds of values -- identify each of the following:
- Mean # science credits from all students previously enrolled in Psyc350 … population parameter
- Mean # science credits from the 15 students enrolled in Psyc350 during the fall semester who were in the data sample … descriptive statistic
- Mean # science credits from all students previously enrolled in Psyc350, estimated from the 15 students in the data sample … inferential statistic

Reviewing descriptive and inferential statistics …

Remember that the major difference between descriptive and inferential statistics is intent -- what information you intend to get from the statistic.

Descriptive statistics -- obtained from the sample
- used to describe characteristics of the sample
- used to determine if the sample represents the target population, by comparing sample statistics and population parameters

Inferential statistics
- used to describe, infer, estimate, or approximate characteristics of the target population

Parameters -- descriptions of population characteristics
- usually aren't obtained from the population (we can't measure everybody)
- ideally they come from repeated large samplings that produce consistent results, giving us confidence to use them as parameters

Let's look at an example of the interplay of these three…
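The distinction is easiest to see side by side. Below is a minimal Python sketch, not part of the original slides and using made-up science-credit counts for a hypothetical 15-student data sample: the same sample mean is first reported descriptively and then used inferentially, with a confidence interval, to estimate the population parameter.

```python
# A minimal sketch of the descriptive vs. inferential distinction,
# using hypothetical science-credit counts for a 15-student data sample.
import math
import statistics

credits = [3, 4, 6, 3, 5, 4, 7, 3, 4, 5, 6, 4, 3, 5, 4]   # hypothetical data sample

# Descriptive statistics: they describe THIS sample only.
m = statistics.mean(credits)
s = statistics.stdev(credits)          # sample standard deviation (n - 1)

# Inferential statistic: the same mean, now used to estimate the
# population mean, with a 95% confidence interval attached.
se = s / math.sqrt(len(credits))       # standard error of the mean
t_crit = 2.145                         # t critical value, df = 14, two-tailed .05
ci = (m - t_crit * se, m + t_crit * se)

print(f"sample mean = {m:.2f}, sample std = {s:.2f}")                                # descriptive
print(f"estimated population mean = {m:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")   # inferential
```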

First, sampling -- identify each of the following...

Each year we used to interview everybody from senior managers to part-time janitorial staff to get a feel for "How are things going?" Generally we found we had 70% female and 30% male employees, divided about 10% management, 70% clerical and 20% service/janitorial, with an average age of 32.5 (std = 5.0) years and 7.4 (std = 5.0) years seniority. From these folks we usually got an overall satisfaction rating of about 5.2 (std = 1.1) on a 7-point scale.

With the current cost increases we can no longer interview everybody. So, this year we had a company that conducts surveys complete the interview using a sample of 120 employees who volunteered in response to a notice in the weekly company newsletter. We were very disappointed to find that the overall satisfaction rating had dropped to 3.1 (std = 1.0). At a meeting to discuss how to improve worker satisfaction, one of the younger managers asked to see the rest of the report and asked us to look carefully at one of the tables…

Table 3 -- Sample statistics (from 100 completed interviews)
  gender      50 males (50%)    50 females (50%)
  job         34 mang. (34%)    30 cler. (30%)    36 ser/jan (36%)
  age         mean = 21.3       std = 10
  seniority   mean = 2.1        std = 6

Identify each of the following:
- Target population … our company
- Sampling frame … the company newsletter
- Selected sample … the 120 volunteers
- Data sample … the 100 who completed the survey

And now, kinds of "values" -- using the same scenario and Table 3, identify all of each type:
- Parameters … the values from the years when everybody was interviewed: 70% female / 30% male; 10% management, 70% clerical, 20% service/janitorial; mean age 32.5 (std = 5.0); mean seniority 7.4 (std = 5.0); mean satisfaction 5.2 (std = 1.1)
- Descriptive statistics … the Table 3 values from the 100 completed interviews: 50%/50% gender split; 34%/30%/36% job split; mean age 21.3 (std = 10); mean seniority 2.1 (std = 6)
- Inferential statistic … this year's overall satisfaction rating of 3.1 (std = 1.0), used to infer the satisfaction of the whole company from the sample


Of course, the real question is whether the "3.1 rating" is cause for concern…

Should we interpret the mean rating of 3.1 as indicating that the folks who work here are much less satisfied than they used to be? Why or why not?

Looks bad, doesn't it? Well -- that depends upon whether the sample is representative of the population. Any way to check that? We can compare the sample descriptive statistics with the population parameters:
- this sample is not representative of the population
- the sample is "too male," "too managerial & janitorial," "too young," and "short-tenured" compared to the population parameters
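The eyeball comparison above can also be made more formal. As an illustration only (the slide itself does not do this), a goodness-of-fit chi-square can test whether the sample's gender and job composition plausibly came from a population with the known parameter percentages; the sketch assumes scipy is available.

```python
# A rough representativeness check (illustrative, not from the slides):
# compare the sample's observed composition to the counts expected from
# the population parameters, using goodness-of-fit chi-square tests.
from scipy.stats import chisquare

n = 100                                   # completed interviews

# gender: population is 70% female / 30% male
obs_gender = [50, 50]                     # observed: females, males
exp_gender = [0.70 * n, 0.30 * n]         # expected under the population split
print(chisquare(obs_gender, f_exp=exp_gender))

# job type: population is 10% management / 70% clerical / 20% service/janitorial
obs_job = [34, 30, 36]
exp_job = [0.10 * n, 0.70 * n, 0.20 * n]
print(chisquare(obs_job, f_exp=exp_job))

# Large chi-square values (tiny p-values) signal that the sample's
# composition does not match the population parameters -- i.e., it is
# not representative on these variables.
```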

Here's the point…

Our theories, hypotheses and implementations are about populations, but (because we can never, ever collect data from the entire population) our data come from samples! We infer that the data and analysis results we obtain from our sample tell us about the relationships between those variables in the population!

How do we know our inferences are correct?
- we can never "know" -- there will never be "proof" (only evidence)
- we check the selection/sampling procedures we used
- we check that our sample statistics match known population parameters (when we know those parameters)
- we check whether our results agree with previous results from "the lit"
- we perform replication and converging-operations research

Here's another version of this same idea!

"Critical Experiment" vs. "Converging Operations"

You might be asking yourself, "How can we be sure we 'got the study right'?" How can we be sure that we…
- have a sample that represents the target population?
- have the best research design?
- have good measures, tasks and a good setting?
- did the right analyses and made the correct interpretations?

Said differently -- how can we be sure we're running the right study in the right way?

This question assumes the "critical experiment" approach to empirical research -- that there is "one correct way to run the one correct study" and the answer to that study will be "proof." For both philosophical and pragmatic reasons (that will become apparent as we go along), scientific psychologists have abandoned this approach and adopted "converging operations" -- the process of running multiple comparable versions of each study and looking for consistency (or determining sources of inconsistencies) -- also called the Research Loop.

The "Research Loop"

- Library Research -- learning "what is known" about the target behavior
- Hypothesis Formation -- based on the library research, propose some "new knowledge"
- Research Design -- determine how to obtain the data to test the RH:
- Data Collection -- carrying out the research design and getting the data
- Data Analysis -- data collation and statistical analysis
- Hypothesis Testing -- based on design properties and statistical results
- Draw Conclusions -- decide how your "new knowledge" changes "what is known" about the target behavior
- … and back to the start: novel RH:, replication, and convergence studies each feed another pass through the loop

"Comparable" studies -- replication

The more similar the studies, the more direct the test of replication, and the more meaningful a "failure to replicate" will be. The more differences between the studies, the more "reasons" the results might not agree, and so the less meaningful a "failure to replicate" will be.

Ground rules…
- Same or similar IV (qual vs. quant not important)
- Same or similar DV (qual vs. quant not important)
- Similar population, setting & task/stimulus
- Note if similar design (e.g., experiment or non-experiment)

"Comparing" studies -- replication

If the studies are comparable, then the comparison is based on effect size (r) and direction/pattern.

Direction/pattern
- Be sure to take DV "direction" into account (e.g., measuring % correct vs. % error, or "depression" vs. "happiness")

Effect size
- Don't get too fussy about effect size comparability … remember .1 = small, .3 = medium, .5 = large
- Smaller but in the same direction is still pretty similar

What about differences in "significance"? If the effect sizes are similar, these are usually just "power" or "sample size" differences -- far less important than effect size/direction!
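To make these ground rules concrete, here is a rough, hypothetical helper (the names size_band and compare_effects are invented for illustration) that compares two studies' correlations by direction and by the .1/.3/.5 size bands mentioned above. It is a loose sketch, not a formal replication test.

```python
# Hypothetical helper: an informal comparison of two studies' correlations
# by direction and by Cohen's size bands (.1 / .3 / .5).
def size_band(r: float) -> str:
    a = abs(r)
    if a >= 0.5:
        return "large"
    if a >= 0.3:
        return "medium"
    if a >= 0.1:
        return "small"
    return "trivial"

def compare_effects(r1: float, r2: float) -> str:
    same_direction = (r1 >= 0) == (r2 >= 0)
    band1, band2 = size_band(r1), size_band(r2)
    if not same_direction:
        return "opposite directions -- a failure to replicate"
    if band1 == band2:
        return f"same direction, both {band1} -- looks like a replication"
    return f"same direction, but {band1} vs. {band2} -- a questionable replication"

print(compare_effects(0.46, 0.45))   # the PC/Mac example below: replication
print(compare_effects(0.35, 0.12))   # the exam-prep example below: same direction, different size
```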

Replication -- some classic "conundrums," #1

Imagine there are four previous studies, all procedurally similar, all looking at the relationship between social skills and helping behavior. Here are the results…
  r(36) = .28, p > .05
  r(65) = .27, p < .05
  r(72) = .27, p < .05
  r(31) = .28, p > .05

Do these studies replicate one another? Yes!
- the studies all found similar effects (size & direction)
- the differences in significance are due to power & sample size differences
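If you want to see why the significance pattern splits this way, the short sketch below (an illustration, assuming scipy is available) recomputes each study's two-tailed p-value from its r and df using t = r√df / √(1 − r²); the near-identical effect sizes cross or miss the .05 line purely because of sample size.

```python
# Illustrative check (not from the slides): the same correlation can be
# "significant" or not purely because of sample size. p is computed from
# t = r * sqrt(df) / sqrt(1 - r^2), with df = N - 2.
from math import sqrt
from scipy.stats import t as t_dist

studies = [(0.28, 36), (0.27, 65), (0.27, 72), (0.28, 31)]   # (r, df) from the slide

for r, df in studies:
    t_val = r * sqrt(df) / sqrt(1 - r**2)
    p = 2 * t_dist.sf(t_val, df)          # two-tailed p-value
    verdict = "p < .05" if p < 0.05 else "p > .05"
    print(f"r({df}) = {r:.2f}  ->  t = {t_val:.2f}, p = {p:.3f}  ({verdict})")

# Nearly identical effect sizes; only the larger-df studies cross p < .05.
```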

Replication -- some classic "conundrums," #2

Imagine there are four previous studies, all procedurally similar, all looking at the relationship between amount of therapy and decrease in depressive symptoms. Here are the results…
  r(36) = .25, p > .05
  r(42) = .27, p > .05
  r(51) = .27, p > .05
  r(31) = .25, p > .05

Given these results, what is your "best guess" of the population correlation between amount of therapy and decrease in depressive symptoms?
- r = .00, since none of the results were significant?
- r ≈ .25-.27, since this is the consistent answer?

I'd go with the .25-.27, but encourage someone to do an a priori power analysis before the next study!
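For that a priori power analysis, a common Fisher-z approximation gives the sample size needed to detect a correlation of about .26 with 80% power. The sketch below is one hedged way to compute it (the function name is ours, and dedicated power-analysis tools may give slightly different answers).

```python
# A hedged sketch of the a priori power analysis the slide recommends,
# using the standard Fisher-z approximation for a correlation:
#   N ~ ((z_alpha + z_power) / arctanh(r))^2 + 3
from math import atanh, ceil
from scipy.stats import norm

def n_for_correlation(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate N needed to detect a population correlation r (two-tailed)."""
    z_alpha = norm.ppf(1 - alpha / 2)     # e.g., 1.96 for alpha = .05
    z_power = norm.ppf(power)             # e.g., 0.84 for power = .80
    z_r = atanh(r)                        # Fisher's z transform of r
    return ceil(((z_alpha + z_power) / z_r) ** 2 + 3)

# Expecting r around .25-.27, as in the four studies above:
print(n_for_correlation(0.26))            # roughly 110-115 participants
```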

Remember this one …

Researcher #1 -- Acquired 20 computers of each type, had research assistants (working in shifts & following a prescribed protocol) keep each machine working continually for 24 hours & count the number of times each machine failed and was re-booted.
  Mean failures PC = 5.7, std = 2.1
  Mean failures Mac = 3.6, std = 2.1
  F(1,38) = 10.26, p = .003
  r = √(F / (F + df)) = √(10.26 / (10.26 + 38)) = .46

Researcher #2 -- Acquired 20 computers of each type, had research assistants (working in shifts & following a prescribed protocol) keep each machine working continually for 24 hours or until it failed.
            PC    Mac
  Failed    15     6
  Not        5    14
  X²(1) = 8.12, p < .003
  r = √(χ² / N) = √(8.12 / 40) = .45

So, by computing effect sizes and effect direction/pattern, we can compare these similar studies (same IV -- conceptually similar DV) and see that the results replicate!
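The two conversions used on this slide are easy to script. The sketch below simply restates them (the function names are ours) and reproduces the slide's .46 and .45.

```python
# The two effect-size conversions used on this slide (numbers are the slide's own).
from math import sqrt

def r_from_F(F: float, df_error: int) -> float:
    """r equivalent for a 1-df F test: r = sqrt(F / (F + df_error))."""
    return sqrt(F / (F + df_error))

def r_from_chi2(chi2: float, n: int) -> float:
    """r (phi) for a 2x2 chi-square: r = sqrt(chi2 / N)."""
    return sqrt(chi2 / n)

print(round(r_from_F(10.26, 38), 2))      # 0.46  (Researcher #1)
print(round(r_from_chi2(8.12, 40), 2))    # 0.45  (Researcher #2)
```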

Try this one …

Researcher #1 -- Asked each of 60 students whether or not they had completed the 20-problem on-line exam preparation and noted their scores on the exam (%). She used BG ANOVA to compare the mean % of the two groups.
  Completed Exam Prep = 83%     No Exam Prep = 76%
  F(1,58) = 8.32, p = .003
  r = √(F / (F + df)) = √(8.32 / (8.32 + 58)) = .35

Researcher #2 -- Asked each of 304 students how many of the on-line exam questions they had completed and noted their scores on the exam (%). She used correlation to test for a relationship between these two quantitative variables.
  r(301) = .12, p = .042

Comparing the two studies, we see that while the effects are in the same direction (better performance is associated with "more" on-line exam practice), the size of the effects in the two studies is very different. Also, the significance of the second effect is due to the huge sample size!

And this one …

Researcher #1 -- Interviewed 85 patients from a local clinic and recorded the number of weeks of therapy they had attended and the change in their wellness scores. She used correlation to examine the relationship between these two variables.
  r(83) = .30, p = .002

Researcher #2 -- Assigned each of 120 patients to receive group therapy or not, and noted whether or not they had improved after 24 weeks. She used χ² to examine the relationship between these variables.
              Therapy   Control
  Improved       45        25
  Not            15        35
  X²(1) = 13.71, p < .001
  r = √(χ² / N) = √(13.71 / 120) = .34

So, by computing effect sizes and effect direction/pattern, we can compare these similar studies (conceptually similar IV & DV) and see that the results show a strong replication!
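As a check on the arithmetic, the sketch below recomputes Researcher #2's chi-square directly from the 2x2 counts (using scipy's chi2_contingency with the continuity correction turned off, to match the slide's uncorrected 13.71) and converts it to r for comparison with Researcher #1's r(83) = .30.

```python
# Recomputing Researcher #2's result from the 2x2 counts and putting both
# studies on the same effect-size scale (a sketch; assumes scipy is available).
from math import sqrt
from scipy.stats import chi2_contingency

#            Therapy  Control
table = [[45, 25],    # Improved
         [15, 35]]    # Not improved

chi2, p, dof, expected = chi2_contingency(table, correction=False)
n = sum(sum(row) for row in table)
r_study2 = sqrt(chi2 / n)

print(f"chi2({dof}) = {chi2:.2f}, p = {p:.4f}, r = {r_study2:.2f}")   # ~13.71, r ~ .34
print("Researcher #1: r(83) = .30  ->  similar size, same direction")
```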

Replication & Generalization in k-group Designs -- ANOVA

Most k-group designs are an "expansion" or an extension of an earlier, simpler design. When comparing with a 2-group design, be sure to use the correct conditions.

Study #1
  Mean failures PC = 5.7, std = 2.1
  Mean failures Mac = 3.6, std = 2.1
  F(1,38) = 10.26, p = .003
  r = √(F / (F + df)) = .46

Study #2
  Mean failures IBM = 5.9, std = 2.1
  Mean failures Dell = 3.8, std = 2.1
  Mean failures Mac = 3.6, std = 2.1
  F(2,57) = 10.26, p = .003, MSe =

We need to know what "PC" means in the first study! What if the "PCs" were IBM? What if they were Dell? What if they were something else?

Replication & Generalization in k-group Designs -- X²

Most k-group designs are an "expansion" or an extension of an earlier, simpler design.

Study #1
            PC    Mac
  Failed    15     6
  Not        5    14
  X²(1) = 8.12, p < .003
  r = √(χ² / N) = .45

Study #2
            IBM   Dell   Mac
  Failed     16     5     7
  Not         4    15    13
  X²(2) = 13.79, p < .004

We need to know what "PC" means in the first study! What if the "PCs" were IBM? What if they were Dell? What if they were something else?