
The Right Questions about Statistics: How regression works
Maths Learning Centre, The University of Adelaide

Regression is a method designed to create a FORMULA that uses some information to PREDICT/EXPLAIN an outcome, using DATA. You calculate the formula that most closely matches your data.

Regression begins with a research question about a numerical outcome: you’re interested in exactly how one or more things affect that outcome. The outcome is a NUMERICAL variable called the “RESPONSE VARIABLE”, “DEPENDENT VARIABLE”, “OUTCOME VARIABLE” or “CRITERION VARIABLE”. The things that affect it are also NUMERICAL variables, called the “INDEPENDENT VARIABLES”, “EXPLANATORY VARIABLES” or “PREDICTOR VARIABLES”.

For example, you might be interested in how a person’s body temperature is affected by the number of grams of chilli in a meal. Here the explanatory variable is Chilli (g) and the outcome variable is Temp (°C); both are NUMERICAL.

What the regression will produce is a FORMULA that will let you calculate the outcome $Y$ based on the explanatories $X_1$ and $X_2$ (the formula is also called a MODEL): $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$

For example, with Chilli (g) as the explanatory variable and Temp (°C) as the outcome, the formula might look like this: temp = … + … × (chilli). The process of regression finds the numbers in this formula so that it gives answers closest to the actual data.

What sort of relationship might be there? A “DESCRIBE” question! Draw a “SCATTERPLOT”: the shape will tell you what sort of formula you ought to use.

LINEAR: $Y = \beta_0 + \beta_1 X$

EXPONENTIAL: $Y = \alpha e^{\beta X}$

LOGARITHMIC: $Y = \beta_0 + \beta_1 \ln(X)$
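To make the three shapes concrete, here is a minimal Python sketch that evaluates each formula. The function names and the parameter values are made up purely for illustration; they do not come from the presentation.

```python
import math

def linear(x, b0, b1):
    """Y = b0 + b1 * X: Y changes by the same amount b1 for every extra unit of X."""
    return b0 + b1 * x

def exponential(x, alpha, beta):
    """Y = alpha * e^(beta * X): Y is multiplied by the same factor for every extra unit of X."""
    return alpha * math.exp(beta * x)

def logarithmic(x, b0, b1):
    """Y = b0 + b1 * ln(X): Y changes by b1 each time X is multiplied by e (X must be positive)."""
    return b0 + b1 * math.log(x)

# Hypothetical parameter values, chosen only to show how the shapes differ.
for x in [1, 2, 4, 8]:
    print(x,
          linear(x, 1.0, 0.5),
          round(exponential(x, 1.0, 0.5), 3),
          round(logarithmic(x, 1.0, 0.5), 3))
```

Printed side by side, the linear column grows steadily, the exponential column grows faster and faster, and the logarithmic column flattens out, which are the patterns to look for in a scatterplot.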

LINEAR: The easiest one to work with is LINEAR regression, because the formula is the simplest: $Y = \beta_0 + \beta_1 X$

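With LINEAR relationships, to DESCRIBE how strong the relationship is, you can calculate the CORRELATION (r). It runs from r = -1 (points exactly on a falling line) through r = 0 (no linear pattern) to r = 1 (points exactly on a rising line). It ignores how steep the slope is; it just tells you how close to a line the points are.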
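The point that correlation ignores steepness can be checked with a short Python sketch. The `correlation` function below is an illustrative implementation of the usual Pearson formula, and the data are made up.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]                 # made-up data
shallow = [0.1 * x for x in xs]      # points exactly on a shallow rising line
steep = [10 * x for x in xs]         # points exactly on a steep rising line
downhill = [-3 * x for x in xs]      # points exactly on a falling line

print(correlation(xs, shallow))
print(correlation(xs, steep))
print(correlation(xs, downhill))
```

Up to rounding, the shallow and steep lines both give r = 1 and the falling line gives r = -1, because in each case every point sits exactly on a line.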

The process so far... Have a “what’s the formula?” question. Look at the pattern, usually with a scatterplot, to help choose a formula.

The next step is to find the numbers in the formula itself: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$. The number $\beta_0$ is called the “INTERCEPT” or “CONSTANT”, and the numbers $\beta_1$ and $\beta_2$ are called the “COEFFICIENTS” or “SLOPES”. There are some complicated-looking equations to figure out what these are, based on calculus and matrix algebra... BUT the computer program will do all that for you.
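For the simplest case of one explanatory variable, the calculation is short enough to sketch in Python. This is the standard least-squares solution for a straight line, not necessarily what any particular program runs internally; the function name `fit_line` and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how much x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # the line passes through the point of means
    return intercept, slope

# Made-up data, not from the presentation.
chilli = [0, 2, 4, 6, 8]                  # grams
temp = [36.1, 36.9, 38.2, 38.8, 40.0]     # degrees C

b0, b1 = fit_line(chilli, temp)
print(f"temp = {b0:.2f} + {b1:.2f} * (chilli)")
```

With more than one explanatory variable the same idea applies, but the sums become matrices, which is why the work is normally left to software.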

What the computer will do in Excel: take the original data, produce the regression output, and give you the formula numbers: temp = … + … × (chilli)

What the formula means: temp = … + 0.45 × (chilli). The number in front of chilli is how much temperature changes on average for a change of 1 g of chilli: 1 g extra of chilli puts your temperature up by 0.45 degrees on average. Does getting this number in my data mean that chilli really does affect temperature?

Is there really a relationship? A “DECIDE” question! temp = … + … × (chilli). If no relationship, then the number in front of chilli would be most likely to be zero. Assuming some things, we can calculate a test statistic that comes from a t-distribution, and find a p-value. P-value = … (“SIGNIFICANT EFFECT”)
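As a rough sketch of what the software does here, the Python below computes the slope, its standard error, the t statistic (slope divided by standard error) and a two-sided p-value. It assumes the usual simple linear regression setup with n - 2 degrees of freedom. The p-value is found by numerically integrating the t density, a simple stand-in for the distribution routines a statistics package would provide. All names and data are made up for illustration.

```python
import math

def t_density(x, df):
    """Probability density of the t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + x * x / df))

def two_sided_p(t, df, steps=20000):
    """Chance of a t statistic at least as far from zero as t, if the true slope is zero.

    Integrates the density from 0 to |t| with Simpson's rule (steps must be even).
    """
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps
    total = t_density(0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central = total * h / 3               # P(0 < T < |t|)
    return max(0.0, 1 - 2 * central)

def slope_t_test(xs, ys):
    """Return (slope, standard error, t statistic, p-value) for a straight-line fit."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    # Leftover scatter around the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    std_error = math.sqrt(sse / (n - 2) / sxx)
    t = slope / std_error
    return slope, std_error, t, two_sided_p(t, n - 2)

# Made-up data, not from the presentation.
chilli = [0, 2, 4, 6, 8]
temp = [36.1, 36.9, 38.2, 38.8, 40.0]
slope, std_error, t, p = slope_t_test(chilli, temp)
print(f"slope = {slope:.3f}, t = {t:.2f}, p-value = {p:.4f}")
```

A small p-value says a slope this far from zero would be unusual if there were really no relationship.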

Is there really a relationship (for multiple regression)? $Y = \ldots X_1 - 0.24 X_2$. If no relationship at all, then the numbers in front of $X_1$ and $X_2$ would be most likely to be zero. Assuming some things, we can calculate a test statistic that comes from an F distribution, and find a p-value. P-value = … (“SIGNIFICANT RELATIONSHIP”)
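The F statistic compares how much of the variation in the outcome the formula explains with how much it leaves unexplained. The Python sketch below assumes the fitted values come from a least-squares formula that includes an intercept. The function name, the data and the fitted line are made up for illustration, and converting F to a p-value is left to statistical software.

```python
def f_statistic(ys, fitted, n_explanatory):
    """F statistic for the question 'is there any relationship at all?'.

    ys: observed outcomes; fitted: the formula's predictions for the same rows;
    n_explanatory: how many explanatory variables the formula uses.
    """
    n = len(ys)
    mean_y = sum(ys) / n
    ss_model = sum((f - mean_y) ** 2 for f in fitted)          # explained by the formula
    ss_error = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # left unexplained
    df_model = n_explanatory
    df_error = n - n_explanatory - 1
    return (ss_model / df_model) / (ss_error / df_error)

# Made-up example with one explanatory variable; 36.06 + 0.485 * x is the
# least-squares line for these particular numbers.
chilli = [0, 2, 4, 6, 8]
temp = [36.1, 36.9, 38.2, 38.8, 40.0]
fitted = [36.06 + 0.485 * x for x in chilli]
print(round(f_statistic(temp, fitted, n_explanatory=1), 2))
```

With a single explanatory variable, F is the square of the t statistic for the slope. With several, it tests all the slopes at once.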

Is there really a relationship with $X_1$? $Y = \ldots X_1 - 0.24 X_2$. If no relationship with $X_1$, then the number in front of $X_1$ would be most likely to be zero. P-value = … (“SIGNIFICANT EFFECT”)

Is there really a relationship with $X_2$? $Y = \ldots X_1 - 0.24 X_2$. If no relationship with $X_2$, then the number in front of $X_2$ would be most likely to be zero. P-value = 0.26 (“NOT SIGNIFICANT EFFECT”). At this stage you would normally remove $X_2$ from the formula and do a new regression.

How big could the effect be? An “ESTIMATE” question! temp = … + 0.45 × (chilli). We are asking what options for the number in front of chilli are consistent with our data. Assuming some things, we can calculate a confidence interval for this number. 95% CI is from 0.38 to 0.52.
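A confidence interval of this kind is the estimate plus or minus a critical value times the standard error. The Python sketch below uses the normal critical value (about 1.96 for 95%) as a large-sample stand-in for the t critical value that regression software would use. The slope 0.45 is the one quoted in the presentation, but the standard error is a hypothetical value chosen only so that the result lands near the quoted interval.

```python
from statistics import NormalDist

def confidence_interval(estimate, std_error, level=0.95):
    """Approximate confidence interval: estimate +/- critical value * standard error.

    Uses the normal critical value, a large-sample approximation to the
    t critical value used by regression software.
    """
    critical = NormalDist().inv_cdf(0.5 + level / 2)
    margin = critical * std_error
    return estimate - margin, estimate + margin

# 0.45 is the slope from the presentation; 0.0357 is a hypothetical standard error.
low, high = confidence_interval(0.45, 0.0357)
print(f"95% CI is from {low:.2f} to {high:.2f}")   # prints: 95% CI is from 0.38 to 0.52
```

A wider interval (from a larger standard error or a higher confidence level) means more values of the slope are consistent with the data.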

DISCLAIMER: There are a whole lot of things you need to check in order to make sure your regression is acceptable statistically (especially if you are using p-values or confidence intervals). I have not mentioned any of these today. You will need to look them up in a book like Medical Statistics at a Glance or Intro Stats.

So this is how you perform regression: Have a “what’s the formula?” question. Collect data. Look at the pattern to choose a formula. Get a computer to calculate the numbers and p-values. Check the p-values. Choose your final formula.

And this is what regression means: It tells you a formula for how to calculate an outcome based on other information. It does NOT tell you if some things CAUSE others, only how to calculate them as accurately as possible. The computer output will tell you p-values and confidence intervals to answer other types of questions.