Introduction to Inferential Statistics


Taking out the "loosey-goosey"

So far we've assessed relationships between variables two ways:

- Categorical variables: tables and proportions (percentages), e.g., higher rank → more stress, higher income → fancier cars
- Continuous variables: scattergrams and simple correlation (r), e.g., higher income → less crime (r = -.6, r2 = .36)
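As a minimal sketch of these two approaches in Python, assuming hypothetical data (the variable names and values below are invented for illustration, not taken from the slides):

```python
import pandas as pd
from scipy import stats

# Categorical variables: a table of proportions (percentages)
df = pd.DataFrame({
    "income": ["low", "low", "high", "high", "low", "high"],
    "car":    ["cheap", "cheap", "fancy", "fancy", "fancy", "cheap"],
})
# Row percentages: each value of the IV (income) sums to 100%
print(pd.crosstab(df["income"], df["car"], normalize="index") * 100)

# Continuous variables: simple correlation (r) and r-squared
age    = [25, 30, 35, 40, 45, 50]
weight = [180, 178, 175, 172, 171, 168]
r, p = stats.pearsonr(age, weight)
print(f"r = {r:.2f}, r^2 = {r**2:.2f}")
```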

Inferential statistics

Alas, results are often ambiguous:

- Higher rank → less cynicism? Maybe the difference in cynicism between officers and supervisors is slight:

      Rank          Cynicism: Low   High   Total
      Officers                75%   25%    100%
      Supervisors             81%   19%    100%

- Higher income → nicer cars? Maybe the student lot has better cars than we expected:

      Income (IV)        Car value (DV): LOW   MED   HIGH   Total
      LOW (student lot)                  30%   20%   50%    100%
      HIGH (F/S lot)                           40%   60%

- Age → weight? Maybe the r between age and weight is a small -.17 (r2 of .03)

Can we still confirm or reject the hypotheses? Inferential statistics (see next slide) are an extension of procedures that we've already used:

- They provide more precise assessments
- They tell us the chance - the probability - that there is no relationship between variables
- This allows us to properly "infer" (project) our results to populations
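To make the "probability of no relationship" idea concrete, here is a hedged sketch that computes the chance of seeing a correlation as large as the slide's r = -.17 if there were really no relationship. The sample size (n = 30) is an assumption for illustration; the slides do not give one:

```python
from math import sqrt
from scipy import stats

r, n = -0.17, 30                      # n = 30 is assumed, not from the slides
t = r * sqrt((n - 2) / (1 - r**2))    # t statistic for a correlation
p = 2 * stats.t.sf(abs(t), df=n - 2)  # two-tailed probability under the null
print(f"t = {t:.2f}, p = {p:.3f}")    # p > .05 here: cannot reject the null
```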

Inferential statistics (a.k.a. "test statistics")

Which test statistic applies depends on the level of measurement of the independent and dependent variables:

- Regression (all variables continuous):
  - r2, R2: proportion of change in the dependent variable accounted for by change in the independent variable
  - b: unit change in the dependent variable caused by a one-unit change in the independent variable
- Logistic regression (DV nominal and dichotomous; IVs nominal or continuous):
  - b: don't try to interpret it directly - it's on a logarithmic scale
  - exp(b) (a.k.a. the odds ratio): odds that the DV will change if the IV changes one unit or, if the IV is dichotomous, if it changes its state
- Chi-Square (all variables categorical - nominal or ordinal):
  - X2: reflects the difference between observed and expected frequencies
- Difference between the means test (IV dichotomous, DV continuous):
  - t: reflects the magnitude of the difference between means
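A sketch pairing each procedure above with a common Python call. The library choices (scipy, statsmodels) and the toy data are assumptions; the slides do not prescribe any software:

```python
import numpy as np
import scipy.stats as st
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=100)               # continuous IV
y = 0.5 * x + rng.normal(size=100)     # continuous DV

# Regression: r2 and b (unit change in DV per unit change in IV)
ols = sm.OLS(y, sm.add_constant(x)).fit()
print("r2 =", round(ols.rsquared, 2), " b =", round(ols.params[1], 2))

# Logistic regression: b and exp(b), the odds ratio
y_binary = (y > 0).astype(int)         # dichotomous DV
logit = sm.Logit(y_binary, sm.add_constant(x)).fit(disp=0)
print("odds ratio =", round(np.exp(logit.params[1]), 2))

# Chi-Square: both variables categorical (observed frequencies)
table = [[30, 70], [55, 45]]
x2, p, dof, expected = st.chi2_contingency(table)
print("X2 =", round(x2, 2), " p =", round(p, 3))

# Difference between means: dichotomous IV, continuous DV
t, p = st.ttest_ind(y[x > 0], y[x <= 0])
print("t =", round(t, 2), " p =", round(p, 3))
```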

General procedure

Types of hypotheses:

- Working hypothesis: what a regular hypothesis is called
- Null hypothesis: its opposite - the presumption that any apparent relationship between variables is caused by chance

Steps:

- Draw one or more samples and code the independent and dependent variables
- Use a test statistic to assess the working hypothesis
- The computer calculates a coefficient for the test statistic (e.g., r2 = .20)

These coefficients are the sum of two components:

- "Systematic" variance: the actual, "real" relationship between variables
- "Error" variance: an apparent relationship, caused by sampling error; it shrinks as sample size increases

The big question: once we remove the error component, is enough "real" relationship left to reject the null hypothesis?
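The error-variance component can be made visible with a small simulation: even when the null hypothesis is true (x and y are unrelated), samples show an apparent r2 by chance, and it shrinks as sample size grows. This sketch is purely illustrative; the setup is an assumption, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(42)
for n in (10, 100, 1000):
    r2s = []
    for _ in range(2000):              # many samples at each size
        x = rng.normal(size=n)
        y = rng.normal(size=n)         # no systematic relationship
        r2s.append(np.corrcoef(x, y)[0, 1] ** 2)
    print(f"n = {n:5d}: average chance r2 = {np.mean(r2s):.3f}")
```

The average chance r2 drops from roughly .11 at n = 10 to roughly .001 at n = 1,000, which is why larger samples leave less "room" for sampling error.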

Test statistics and the null hypothesis

- To reject the null hypothesis, the test statistic coefficient (e.g., r2 = .20) must be sufficiently large after subtracting sampling error.
- How much "room" is required? Enough to yield a probability of less than five in one hundred (< .05) that the relationship between variables was produced by chance.
- If so, the relationship between variables is deemed "statistically significant" and the null hypothesis of no relationship is rejected as FALSE.
- For significant relationships, one to three asterisks usually appear next to the test statistic coefficient (e.g., .25*, .36**, .41***). More asterisks = greater confidence that a relationship is "systematic" - meaning not caused by chance:
  - * ("good"): probability less than 5 in 100 that the coefficient was produced by chance (p < .05)
  - ** ("better"): probability less than 1 in 100 (p < .01)
  - *** ("best"): probability less than 1 in 1,000 (p < .001)
- If the coefficient is too small, no asterisk is awarded. The relationship is non-significant: the working hypothesis is rejected and the null hypothesis is TRUE.
- Instead of asterisks, sometimes the actual probability that a coefficient was produced by chance is given, often in a column labeled "p". Significant relationships are denoted by p's less than .05.
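The asterisk convention above is easy to express in code. A minimal sketch (the function name is invented for illustration):

```python
def significance_stars(p: float) -> str:
    """Map a p-value to the conventional significance asterisks."""
    if p < 0.001:
        return "***"   # less than 1 in 1,000
    if p < 0.01:
        return "**"    # less than 1 in 100
    if p < 0.05:
        return "*"     # less than 5 in 100
    return ""          # non-significant: no asterisk awarded

for p in (0.0004, 0.008, 0.03, 0.20):
    print(p, "->", significance_stars(p) or "n.s.")
```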

How do we know if the null hypothesis is true?

- Null hypothesis: no relationship between variables; any apparent effect was produced by chance.
- You can use the table for your test statistic (e.g., a Chi-Square table) to check whether the coefficient is sufficiently large to reject the null.
- To reject the null, the test statistic (e.g., R2, t, b, X2) must be so large that the probability the null is true is less than five in one hundred (< .05).
- In your articles, look for asterisks in the tables that depict relationships between variables. If there is no asterisk in the column for the test statistic (here it's a b), the null for that relationship is true.
- Remember: usually one asterisk (*) means the probability the null is true is less than 5 in 100 (p < .05). Two asterisks (**) is better (p < .01: less than 1 in 100). Three (***) is great (p < .001: less than 1 in 1,000).

[Slide shows an example table, with the dependent variable "satisfaction with police," annotated to mark where the null hypothesis is true and where it is rejected.]
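As a concrete illustration, here is a hedged sketch of testing the null with a Chi-Square statistic, using the rank/cynicism percentages from the earlier slide. The counts assume 100 cases per rank, since the slides give only percentages:

```python
from scipy.stats import chi2_contingency

observed = [[75, 25],   # officers: low, high cynicism (assumed n = 100)
            [81, 19]]   # supervisors: low, high cynicism (assumed n = 100)
x2, p, dof, expected = chi2_contingency(observed)
print(f"X2 = {x2:.2f}, p = {p:.3f}")
# p > .05 here, so the null of no relationship cannot be rejected
```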

A caution on hypothesis testing…

- Probabilities (that the null hypothesis is true) are the most common way to evaluate relationships. The smaller the probability, the more likely that the null hypothesis (meaning, no relationship) is false, and the greater the likelihood that the working hypothesis is true.
- But this process has been criticized for suggesting misleading results.
- We normally use p-values to accept or reject null hypotheses, but their real meaning is subtle. Formally, p < .05 means that if an association between variables was tested an infinite number of times, a coefficient as large as the one actually obtained (say, an r2 of .30) would come up less than five times in a hundred if the null hypothesis of no relationship was actually true.
- For our purposes, as long as we keep in mind the inherent sloppiness of social science and the difficulties of accurately quantifying social science phenomena, it's sufficient to use p-values to accept or reject null hypotheses.
- We should always be skeptical of findings of "significance," particularly when very large samples are involved. When sample size is large (say, a thousand), even weak relationships can show up as statistically significant. (More on this later.)
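The large-sample caution can be demonstrated directly. In this illustrative simulation (the effect size and sample size are assumptions, not from the slides), the relationship is trivially weak, yet it still comes out "significant":

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=10_000)
y = 0.05 * x + rng.normal(size=10_000)   # very weak true effect
r, p = stats.pearsonr(x, y)
print(f"r = {r:.3f}, r^2 = {r**2:.4f}, p = {p:.4f}")
# r^2 is tiny (the relationship explains almost nothing of the DV),
# yet p falls well under .05 because n is so large
```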

Examples of tables from articles, panels 1-6

1. Hypothesis: Alcohol consumption → Victimization Method: Logistic regression Statistics: b and Odds Ratio (Exp b) Richard B. Felson and Keri B. Burchfield, “Alcohol and the Risk of Physical and Sexual Assault Victimization,” Criminology (42:4, 2004)

2. Hypothesis: Race and class → Satisfaction with police Method: Logistic regression Statistics: b and Exp b (odds ratio) Yuning Wu, Ivan Y. Sun and Ruth A. Triplett, “Race, Class or Neighborhood Context: Which Matters More in Measuring Satisfaction With Police?,” Justice Quarterly (26:1, 2009)

3. Hypothesis: Low self-control → More contact with police Method: Logistic regression Statistics: b and Exp b (odds ratio) Kevin M. Beaver, Matt DeLisi, Daniel P. Mears and Eric Stewart, “Low Self-Control and Contact with the Criminal Justice System in a Nationally Representative Sample of Males,” Justice Quarterly (26:4, 2009)

4. Hypothesis: Gender and race of victim → Imposition of death sentence Method: Logistic regression Statistics: b (“coefficient”) and odds ratio (Exp b) Marian R. Williams, Stephen Demuth and Jefferson E. Holcomb, “Understanding the Influence of Victim Gender in Death Penalty Cases: The Importance of Victim Race, Sex-Related Victimization, and Jury Decision Making,” Criminology (45:4, 2007)

5. Hypothesis: Strains of imprisonment → Recidivism Method: Logistic regression Statistics: B and Exp B (odds ratio) Shelley Johnson Listwan, Christopher J. Sullivan, Robert Agnew, Francis T. Cullen and Mark Colvin, “The Pains of Imprisonment Revisited: The Impact of Strain on Inmate Recidivism,” Justice Quarterly (30:1, 2013)

6. Hypothesis: Officer and driver race → Vehicle search Method: Logistic regression Statistics: Odds ratio (Exp B) (standard errors in parentheses) Jeff Rojek, Richard Rosenfeld and Scott Decker, “Policing Race: The Racial Stratification of Searches in Police Traffic Stops,” Criminology (50:4, 2012)