Inferential Statistics


With inferential statistics you can do the following.
Determine the probability of characteristics of a population based on the characteristics of your sample. You know what the relationship “looks like” in the sample. You have the actual numbers. Inferential statistics tells you the probability that the same relationship exists in the population.
Assess the strength of the relationship between independent and dependent variables. Inferential statistics allows you to determine if the relationship is statistically significant and helps you decide if it is substantially significant (i.e., is the relationship strong enough to matter).

Inferential Statistics
Inferential statistics are used to test hypotheses. Research hypotheses state that there is a relationship between two or more variables. Initially you do NOT assume there is a relationship in the population (i.e., the null hypothesis claims there is NO relationship). You compute a statistic and its associated probability: the probability of seeing a relationship at least as strong as the one in your sample if the null hypothesis were true. When that probability is small, you reject the null hypothesis and support the research hypothesis (i.e., the relationship we see in our sample also exists in the population). If the probability is .05 or less, we say the relationship is significant at the .05 level: we accept a 5% chance of making an error when we say there is also a relationship in the population from which this sample was drawn.
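The logic of testing against a null hypothesis can be sketched with a permutation test. This is not the test the slides use (they rely on SAS printouts); it is a stand-in chosen because it needs no statistical tables. The data, the function names, and the number of shuffles are all assumptions made for illustration.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def permutation_p_value(xs, ys, n_shuffles=5000, seed=0):
    """Share of shuffles whose |r| is at least as large as the observed |r|.

    Shuffling ys breaks any link to xs, which mimics the null hypothesis
    of NO relationship.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)

# Made-up sample: hours in the program and exam scores for ten participants.
hours = [2, 4, 5, 7, 8, 10, 11, 13, 14, 16]
scores = [55, 60, 58, 66, 70, 69, 75, 80, 78, 88]

p_value = permutation_p_value(hours, scores)
significant = p_value <= 0.05  # the .05 level used throughout the slides
```

A small `p_value` means a relationship this strong would rarely appear if there were no relationship at all, which is the basis for rejecting the null hypothesis.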

How Do We Apply What We Learn From Inferential Statistics?
BEFORE you use any intervention, you should determine if there is evidence that it works. For instance, what is the probability that fertilizer will increase crop yield for farmers?
BEFORE you work with any group, you may want to know the characteristics of that group. For instance, what proportion of abused women are eventually killed by their partner? What is the probability that “risk” increases after they leave the partner?
BEFORE you make recommendations, you want to understand the probabilities of success. What is the probability that those who participate in 4-H will graduate from college? Are they significantly more likely to graduate than those who do not participate?

Inferential Statistics Can Answer the Following Questions
Is the intervention I am currently using worth my time? Does it work with 5% or 95% of program participants? Are participants more likely than the general population to reach goals?
What factors are most important when attempting to increase the effectiveness of the intervention? If I use a second intervention, does it increase success by 10%, 15%, or 60%? Are characteristics of program participants important?
Policy implications: Is it worth the taxpayers’ dollars? Is this a spurious relationship? Is there a difference between groups receiving the intervention and those not receiving it? Is it large enough to warrant use of limited resources (substantial significance)? Is it large enough to argue that this intervention “works” in different settings/situations?

Example of Applying Information Learned From Inferential Statistics
You implement an “on-line” program to improve communication amongst young married couples.
Using measurement of long term objectives, is it successful? What percent of my respondents stayed married? How much lower is the divorce rate for those who participated than for the general population?
What factors are most important? Does required payment increase success? Do older or younger couples respond more effectively to this counseling?
Policy implications: Is it worth the taxpayers’ dollars? Is this relationship spurious (e.g., proactive individuals are more likely to seek intervention and also more likely to have good marriages)? Is the intervention “cost effective”? Is the difference big enough to matter? Is there evidence that this program could work in other settings/situations or with other couples?

Correlation and Regression

Stating Hypotheses for Regression
Null Hypothesis: There is no relationship between any of the independent variables and the dependent variable. Technically, all of the slopes are zero.
Research Hypothesis: There is a relationship between at least one of the independent variables and the dependent variable. Technically, at least one of the slopes is not zero. This relationship could be positive or negative.

Inferential Statistics
First consider bi-variate statistics: Pearson Correlation
When is it used? When you have a continuous independent variable and a continuous dependent variable.
How do you interpret it? When the probability associated with the ___ statistic is .05 or less, you can assume there is a relationship between the dependent and independent variables. For instance, you may want to know if the number of hours participants spend in your program is positively related to their scores on school exams.
NOTE: The Pearson Correlation and Bi-variate Regression are very similar.

Inferential Statistics
First consider bi-variate statistics: Bi-variate Regression
When is it used? When you have a continuous independent variable and a continuous dependent (outcome) variable. For instance, you may want to know if the number of hours participants spend in your program is positively related to their scores on school exams.
How do you interpret it? When the probability associated with the F-statistic is .05 or less, you can assume there is a relationship between the dependent and independent variables.
NOTE: The Pearson Correlation and Bi-variate Regression are very similar.

Pearson Correlation
Consists of a continuous independent and a continuous dependent variable (i.e., X and Y). A Pearson correlation coefficient is used to estimate the strength of the relationship between X and Y in the population. A Pearson correlation coefficient ranges from -1 to +1. The closer to -1 or +1 it is, the stronger the relationship between X and Y and, for a given sample size, the lower the probability that we would make a mistake if we claimed there is a relationship between X and Y in the population.
A scatter plot can give a visual representation of the relationship between X and Y. A scatter plot shows all of the data points and their relationship, using an X and Y axis. On the following slide, respondents’ scores on BETA and SAT were plotted so that, for example, there is one data point for someone who scored 1000 on the SAT and 12 on the BETA.
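As a rough illustration of how the coefficient is computed, the sketch below works through the standard Pearson formula by hand. The SAT and BETA scores are invented for the example and are not the data plotted in the slides.

```python
def pearson_r(xs, ys):
    """Pearson r: covariation of X and Y divided by their combined spread."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical respondents: one (SAT, BETA) pair per person.
sat = [800, 900, 1000, 1100, 1200, 1300]
beta = [9, 11, 12, 13, 15, 16]

r = pearson_r(sat, beta)                          # close to +1: strong positive
r_perfect = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])  # every point on one line
r_inverse = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # perfect negative relationship
```

With these numbers `r` comes out a little above .99, while the two perfectly linear examples hit the ends of the range at +1 and -1.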

Bi-variate scatterplot showing a strong positive relationship
If all of the data points were on the regression line, then the correlation coefficient would be 1. This would indicate that if we know a person’s score on the SAT we can predict their score on the BETA 100% of the time.

Bi-variate scatterplot showing a strong negative/inverse relationship
The regression line or slope indicates where the data points would be if you could predict Y after knowing X 100% of the time. It is the “predicted” Y.

Correlation Matrix
The following slide contains a computer generated correlation matrix.
A correlation matrix can provide the following information: the strength of the relationship between any two of the variables, and the probability that you would make a mistake if you claimed any two variables are related in the population.
At the top of the correlation matrix, the following information is reported: the mean of each continuous variable, the sample size, the standard deviation of each continuous variable, and the range of scores for each continuous variable.

Pearson Correlation Coefficient
What does it tell us about the strength of the relationship between X and Y?

Strength of Relationship    r value    R2 value
Perfect                       1.0        1.0
Strong                         .8         .64
Moderate                       .5         .25
Weak                           .2         .04
No Relationship                0          0
Weak                          -.2         .04
Moderate                      -.5         .25
Strong                        -.8         .64
Perfect                      -1.0        1.0

Strength of relationship (r or R2): The closer r is to 1 or -1, and the closer R2 is to 1, the stronger the relationship. R2 is never negative because it is the square of r.
Significance: For a given sample size, the stronger the relationship the more likely it is significant.
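The R2 column can be reproduced by squaring each r value, as in this short sketch. Because a squared number cannot be negative, an inverse relationship of a given strength has the same R2 as a positive one.

```python
# r values in the order the table lists them, from perfect positive
# through no relationship to perfect negative.
r_values = [1.0, 0.8, 0.5, 0.2, 0.0, -0.2, -0.5, -0.8, -1.0]

# R-square is r multiplied by itself, rounded to two decimals for display.
r_squared = [round(r * r, 2) for r in r_values]
```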

Correlation Matrix
SR90 = number of men per every 100 women
TPOV90 = % of people living in poverty
FHH = % of female headed households
EMPMAL = % of males employed
EMPFEM90 = % of females employed

The first number in the matrix (marked by the maroon textbox) is the correlation coefficient. It indicates the strength of the relationship. The second number is the probability. It must be .05 or less if you are to generalize to the population. There may be a third number in the matrix. It would indicate the sample size.

Bi-Variate Regression
This is a bi-variate regression printout. It focuses specifically on the relationship between two of the variables (e.g., FHH90 and TPOV90) reported in the matrix on the previous slide. Note that the standardized estimate is the same as the correlation coefficient. If you square this number (.51215714) you get .2623 (the R-square). In a bi-variate regression, squaring the standardized estimate always gives the R-square. It is the percent increase in your ability to predict Y if you know X. In this example, your ability to predict the poverty rate (TPOV90) in a city increases by 26% if you know the percent of female headed households in that city. The Prob>F is .0001. This indicates that the relationship is significant: there is less than a 1% chance that we would be making a mistake if we claimed this relationship exists in the population.
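The link between the standardized estimate, the correlation coefficient, and the R-square can be checked numerically. The sketch below fits a one-predictor least-squares line by hand. The city values are invented and are not the data behind the printout; only the figure .51215714 is taken from the slide.

```python
def fit_bivariate(xs, ys):
    """Least-squares fit of y = a + b*x, returning slope, intercept,
    Pearson r, the standardized slope, and the R-square."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    # Standardized estimate: the slope rescaled by sd(x) / sd(y).
    standardized = slope * (sxx / syy) ** 0.5
    # R-square: share of the variation in y accounted for by the line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_square = 1 - sse / syy
    return slope, intercept, r, standardized, r_square

# Hypothetical cities: % female headed households and % living in poverty.
fhh = [8.0, 10.5, 12.0, 15.5, 18.0, 21.0, 25.0]
tpov = [9.0, 14.0, 11.0, 16.0, 15.0, 22.0, 20.0]

slope, intercept, r, standardized, r_square = fit_bivariate(fhh, tpov)

# The arithmetic quoted from the printout: .51215714 squared.
printout_r_square = round(0.51215714 ** 2, 4)
```

For any data set with one predictor, `standardized` equals `r` and `r_square` equals `r * r`. That identity does not carry over to individual predictors in a multiple regression.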

This is a SAS printout of a Pearson Correlation Matrix
This matrix reports the relationship between 3 continuous variables (i.e., GPR, grade in school and number of times the student has skipped class).

Types of Regression
Bi-Variate Regression: A continuous dependent variable and a continuous independent variable. The relationship between the two can be positive or negative (inverse), and linear or curvilinear.
Multiple Regression: You use multiple independent variables to predict a continuous dependent variable. For instance, you could use number of hours participating in the program, score on attitude index and age to predict success in school (i.e., GPR).
A variation: You use one or more continuous independent variables and one categorical variable to predict a dependent variable. The categorical variable can have only two categories (e.g., male or female). For instance, you would use gender, number of hours participating in the program, score on attitude index and age to predict success in school (i.e., GPR).

Regression – continued
Logit Regression: A categorical dependent variable (two categories). You use continuous independent variables to predict the probability of falling into one category or another. For instance, how do number of hours studying for exams, age, and number of classes skipped influence the probability that a student will graduate from high school? Graduation is measured simply as did or did not graduate.
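A logit model turns a weighted sum of the independent variables into a probability between 0 and 1 through the logistic function. The sketch below shows that step only. The coefficients are hypothetical values picked for illustration; a real analysis would estimate them from data.

```python
import math

def graduation_probability(hours_studying, age, classes_skipped):
    """Predicted probability of graduating under made-up logit coefficients."""
    # Hypothetical coefficients, NOT estimated from any data set.
    intercept = -1.0
    b_hours = 0.30
    b_age = 0.05
    b_skipped = -0.40

    # Linear predictor on the log-odds scale.
    log_odds = (intercept
                + b_hours * hours_studying
                + b_age * age
                + b_skipped * classes_skipped)

    # Logistic function maps any log-odds value into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-log_odds))

p_few_skips = graduation_probability(hours_studying=5, age=17, classes_skipped=1)
p_many_skips = graduation_probability(hours_studying=5, age=17, classes_skipped=10)
```

Because the assumed coefficient for skipped classes is negative, the student with more skipped classes gets the lower predicted probability.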

Regression Printout
This is a copy of a multiple regression printout and includes a brief explanation of the numbers reported.

Interpreting a regression printout
Pr > F is <.0001, indicating that we can reject the null hypothesis: at least one independent variable is significantly related to the dependent variable. R-Square is .5187, indicating that our ability to predict posself (self-esteem score) increases by about 52% if we know the values of all of the independent variables. Pr > |t| is less than .0001 for grades (abc) and for how much they like other students (likestu), indicating that these are the independent variables that are significantly related to posself (self-esteem score).

Interpreting Parameter Estimates
Parameter estimates are how much Y changes for every one unit change in X. From the printout on the previous slide, we see that for every one-grade increase (e.g., from C to B), posself (self-esteem score) increases by .69, or about 2/3 of a point.
If the independent variable is a dummy variable then we interpret it slightly differently. It is how much Y changes when we go from one category to the other. From the printout on the previous slide we see that as we go from the category of male (coded 0) to female (coded 1), posself decreases by .21688.

Interpreting Parameter Estimates
Caution: When you interpret a parameter estimate, you must consider how you measured the X variables. If your parameter estimate is 1 and you measured money in $10,000 increments, then for every $10,000 you spend on your child, their SAT score would increase by 1. If your parameter estimate is 1 and you measured money in dollars, then for every dollar, their SAT score would increase by 1. $10,000 would result in an increase of 10,000 on their SAT.
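The effect of measurement units on a parameter estimate can be seen by fitting the same data twice, once with spending in dollars and once in $10,000 increments. The spending and SAT figures below are invented for the example.

```python
def slope(xs, ys):
    """Least-squares slope of y on x for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical families: money spent on the child and the child's SAT score.
spending_dollars = [10000, 20000, 30000, 40000, 50000]
sat = [900, 980, 1010, 1100, 1150]

# Same spending, re-expressed in units of $10,000.
spending_10k = [d / 10000 for d in spending_dollars]

slope_per_dollar = slope(spending_dollars, sat)  # SAT points per $1
slope_per_10k = slope(spending_10k, sat)         # SAT points per $10,000
```

Here `slope_per_10k` is 62 and `slope_per_dollar` is .0062. The two describe the same relationship, and the estimate changes by exactly the factor used to rescale X.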

Predicting Y
Why and when do we predict Y? To explain results to a nonacademic audience and to explain regression in an interesting way.
How do we predict? First generate the regression printout, THEN use the prediction formula:
Y = a + b1x1 + b2x2 + b3x3 + …
Where:
a = the intercept or constant
b = the slope or parameter estimate, which tells you the change in Y for every one unit change in X
x = the value of each independent variable that you select using the codebook
Y = the predicted value of your dependent variable as predicted by the combination of X variables
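The prediction formula can be applied in a few lines of code. In the sketch below, the slopes .69 (grades) and -.21688 (female, coded 0 or 1) come from the slides, but the intercept of 10.0 is a placeholder because the slides do not report it, and the other predictors on the printout are left out. The resulting numbers are therefore illustrative only.

```python
def predict_y(intercept, slopes, values):
    """Apply Y = a + b1*x1 + b2*x2 + ... for one respondent."""
    return intercept + sum(b * x for b, x in zip(slopes, values))

# ASSUMPTION: the intercept is a placeholder; the slides do not give it.
intercept = 10.0
slopes = [0.69, -0.21688]  # grades, female (0 = male, 1 = female)

# Two respondents with the same grade value (3) who differ only in gender.
predicted_male = predict_y(intercept, slopes, [3, 0])
predicted_female = predict_y(intercept, slopes, [3, 1])

# Going from the category coded 0 to the category coded 1 shifts the
# prediction by exactly the dummy variable's parameter estimate.
difference = predicted_female - predicted_male
```

Choosing values for each X from the codebook and plugging them in this way gives a concrete predicted score that is easier to present to a nonacademic audience than the coefficients alone.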

Now You can do it

Contact Information Dr. Carol Albrecht USU Extension Assessment Specialist carol.albrecht@usu.edu 979-777-2421