Introduction to Regression Analysis

Slides:



Advertisements
Similar presentations
Lesson 10: Linear Regression and Correlation
Advertisements

Correlation and regression
Forecasting Using the Simple Linear Regression Model and Correlation
Learning Objectives 1 Copyright © 2002 South-Western/Thomson Learning Data Analysis: Bivariate Correlation and Regression CHAPTER sixteen.
Chapter 15 (Ch. 13 in 2nd Can.) Association Between Variables Measured at the Interval-Ratio Level: Bivariate Correlation and Regression.
Introduction to Regression Analysis
Statistics II: An Overview of Statistics. Outline for Statistics II Lecture: SPSS Syntax – Some examples. Normal Distribution Curve. Sampling Distribution.
Simple Linear Regression
The Simple Regression Model
Intro to Statistics for the Behavioral Sciences PSYC 1900 Lecture 6: Correlation.
1 Simple Linear Regression Chapter Introduction In this chapter we examine the relationship among interval variables via a mathematical equation.
© 2000 Prentice-Hall, Inc. Chap Forecasting Using the Simple Linear Regression Model and Correlation.
Correlation and Regression Analysis
Summary of Quantitative Analysis Neuman and Robson Ch. 11
Week 14 Chapter 16 – Partial Correlation and Multiple Regression and Correlation.
Simple Linear Regression Analysis
Simple Linear Regression. Introduction In Chapters 17 to 19, we examine the relationship between interval variables via a mathematical equation. The motivation.
Leedy and Ormrod Ch. 11 Gray Ch. 14
Lecture 15 Basics of Regression Analysis
February  Study & Abstract StudyAbstract  Graphic presentation of data. Graphic presentation of data.  Statistical Analyses Statistical Analyses.
Regression and Correlation Methods Judy Zhong Ph.D.
This Week: Testing relationships between two metric variables: Correlation Testing relationships between two nominal variables: Chi-Squared.
Correlation and Linear Regression
Understanding Multivariate Research Berry & Sanders.
September In Chapter 14: 14.1 Data 14.2 Scatterplots 14.3 Correlation 14.4 Regression.
Chapter 15 Correlation and Regression
1.6 Linear Regression & the Correlation Coefficient.
Examining Relationships in Quantitative Research
Chapter 4 Linear Regression 1. Introduction Managerial decisions are often based on the relationship between two or more variables. For example, after.
Regression Chapter 16. Regression >Builds on Correlation >The difference is a question of prediction versus relation Regression predicts, correlation.
© Copyright McGraw-Hill Correlation and Regression CHAPTER 10.
Chapter 16 Data Analysis: Testing for Associations.
Chapter 11 Correlation and Simple Linear Regression Statistics for Business (Econ) 1.
Regression Analysis © 2007 Prentice Hall17-1. © 2007 Prentice Hall17-2 Chapter Outline 1) Correlations 2) Bivariate Regression 3) Statistics Associated.
Correlation & Regression Analysis
Advanced Statistical Methods: Continuous Variables REVIEW Dr. Irina Tomescu-Dubrow.
Copyright © 2011 by The McGraw-Hill Companies, Inc. All rights reserved. McGraw-Hill/Irwin Simple Linear Regression Analysis Chapter 13.
Copyright (C) 2002 Houghton Mifflin Company. All rights reserved. 1 Understandable Statistics Seventh Edition By Brase and Brase Prepared by: Lynn Smith.
Research Methods: 2 M.Sc. Physiotherapy/Podiatry/Pain Correlation and Regression.
Chapter 13 Linear Regression and Correlation. Our Objectives  Draw a scatter diagram.  Understand and interpret the terms dependent and independent.
Linear Regression Essentials Line Basics y = mx + b vs. Definitions
Regression and Correlation
Regression Analysis.
Lecture 10 Regression Analysis
REGRESSION G&W p
Difference between two groups Two-sample t test
Correlation and Simple Linear Regression
Inference and Tests of Hypotheses
Chapter 5 STATISTICS (PART 4).
Multiple Regression.
Week 14 Chapter 16 – Partial Correlation and Multiple Regression and Correlation.
Chapter 14: Correlation and Regression
Correlation and Simple Linear Regression
Correlation and Regression
CHAPTER 29: Multiple Regression*
CHAPTER 26: Inference for Regression
Ass. Prof. Dr. Mogeeb Mosleh
Correlation and Simple Linear Regression
LEARNING OUTCOMES After studying this chapter, you should be able to
STATISTICS Topic 1 IB Biology Miss Werba.
Correlation and Regression
Simple Linear Regression and Correlation
Correlation and Regression
Statistics II: An Overview of Statistics
Product moment correlation
15.1 The Role of Statistics in the Research Process
Warsaw Summer School 2017, OSU Study Abroad Program
Regression Part II.
Correlation and Simple Linear Regression
Correlation and Simple Linear Regression
Presentation transcript:

Introduction to Regression Analysis March 21-22, 2000

Review of correlation Correlation examines the relationship between two variables. The relationship can be expressed as a graph called a scattergram.

Correlation

Correlation The relationship between the two variables can also be expressed as a number called the correlation coefficient. The values of correlation coefficients range from -1 to +1. The most commonly used measure of correlation is the Pearson product-moment correlation coefficient, designated as “r”.

Correlation A positive correlation indicates that high values of one variable correspond with high values of the other variable and low values correspond with low values. When this occurs, the r value is a positive number (r=.66) A negative correlation indicates that when the value for one variable is high the value for the other variable will be low. When this occurs, the r value is a negative number (r=-.66)

Correlation A correlation coefficient of -1 or +1 represents a perfect correlation. r = -1 is a perfect negative relationship r = +1 is a perfect positive relationship A correlation of 0 (r = 0) means that no linear relationship exists between the two variables. The stronger the relationship between the two variables, the closer r is to 1.

Correlation r = 0.82 There is a strong positive relationship. As female literacy increases in a country, female life expectancy also increases.

Correlation R = - 0.84 There is a strong negative relationship. As female literacy in a country increases, infant mortality decreases.

Correlation r = 0.37 There is a moderate, positive relationship between birth rate and death rate for a country.

Correlation Web site: Guessing correlations

Review of t test The t test is an inferential statistic that compares the means of two different groups. It asks the question, “are the differences between the two groups statistically significant or did the differences occur randomly (by chance)?”

Review of t test Steps in hypothesis testing Rule of thumb for t formulate a research and null hypothesis select an alpha level calculate a test statistic (in this case “t”) compare observed to critical value Rule of thumb for t The critical value of t for a two-tailed test with  = .05 and n>30 is 2.04 If these criteria are met, a t value that exceeds 2 is statistically significant.

Review of t test In research articles, the t value is usually reported as being statistically significant or not by indicating p values. p <.05 means the t value is statistically significant at an alpha level of .05. In other words, you are 95% confident. Other commonly used levels of statistical significance include p <.01 p <.001

Review of t test By custom, * is also used to indicate statistical significance. p <.05 * p <.01 ** p <.001 ***

Review of t test Statistical significance essentially means “difference.” It also means that the difference is large enough to assume that it did not occur by chance. It does not mean that the finding, however, is important. That requires theory and other types of evidence.

Regression Analysis - Introduction Often, researchers are interested in examining the relationships between two or more variables. Contingency tables can do this when we have nominal or ordinal level data and not too many categories. When we have interval level data, we can use another statistical technique called regression analysis. Model testing

Regression Analysis - Introduction Regression analysis is one of the strongest statistical tools we have in the social sciences. It takes full advantage of the information we have in our data. Measure associations among variables infer population characteristics based on a sample argue, by statistically manipulating variables, causal links between variables forecast outcomes

Regression Analysis - Introduction Suppose we are looking at the relationship between two variables: weight of a package of apples and price. If the scale is accurate, it looks like this.

This line, or relationship, can be displayed as a mathematical formula also. Y = .79X Where Y is the price of the bag of apples, and X is the weight of apples. .79 is the price per pound There is a positive perfect relationship between the two variables. r= 1.0

Suppose the scale was a little off. Now the line might look like this. r= .96 Suppose the scale was a little off. Now the line might look like this. 12 10 8 6 4 2 WEIGHT 20 40 60 80 100 120 140 PRICE

Now there is not a perfect relationship between the two variables, so it can not be expressed as a single equation. We can, however, estimate a line that is the “best fit” of the data. This is one of the things that regression analysis does.

Bivariate Regression Analysis We will begin with a two-variable model. A regression model can best be expressed as an equation.

a = the constant or Y intercept b = the regression coefficient or slope Y = the predicted value of Y, the dependent variable X = the independent variable

a = the constant or Y intercept This is what Y would equal if X = 0. X represents the possible values of the independent variable.  = the slope of the line. The slope is the change in Y per unit change in X. As slope approaches 0, it indicates that there is no relationship.

Y (DV) is crime rate (per 100,000) X (IV) is unemployment (in %) For example, if  = 120, then the slope tells us that for every one percent increase in unemployment, the crime rate will go up 120 per 100,000 persons.

Regression gives us a line that is the “best fit” of the data Regression gives us a line that is the “best fit” of the data. Because the relationship between the variables is not perfect, all the points will not fall exactly on the line. OVERHEAD Therefore we refer to Y as the predicted value of Y also called Y hat. The difference between the actual Y and the predicted Y is referred to as the residual.

OVERHEAD Scatterplot and regression line of car repair costs and miles

Correlation coefficient One of the major uses of regression analysis is to estimate population characteristics. The regression line is the best estimator of the linear model. If any other line were placed on the scatterplot, the sum of the square residuals (distance between Y and predicted Y) would be larger.

Correlation coefficient A test of goodness of fit lets us determine how well our equation fits the data. One way to assess this is visually. A second way is the correlation coefficient. (Pearson’s r).

Correlation coefficient Another test is the coefficient of determination referred to as r2. r2 indicates the proportion of the variance in the dep var associated with or explained by the indep var(s). In our previous car repair costs example, the r2 = .87 indicating that 87% of the variance in car repair costs is explained by mileage.

Correlation coefficient When is r2 big enough? It depends.