Linear Regression and Correlation. Fitted Regression Line.

Presentation transcript:

Linear Regression and Correlation

Fitted Regression Line

Equation of the Regression Line. Least squares regression line of Y on X: ŷ = b0 + b1x, where b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1x̄.
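A minimal sketch of the least-squares calculation in Python/NumPy; the data values here are made up purely for illustration:

```python
import numpy as np

# Hypothetical toy data: x is the predictor, y the response.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares estimates: b1 = Sxy / Sxx, b0 = ybar - b1 * xbar.
Sxx = np.sum((x - x.mean()) ** 2)
Sxy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = Sxy / Sxx
b0 = y.mean() - b1 * x.mean()
print(f"fitted line: y = {b0:.3f} + {b1:.3f} x")
```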

Regression Calculations

Plotting the regression line

Residuals. Using the fitted line, it is possible to obtain an estimate ŷ of each y coordinate. The "error" in the fit, e = y − ŷ, is what we term the residual.

Residual

Residual Standard Deviation: s = √( Σ(y − ŷ)² / (n − 2) ).
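A short sketch, again with made-up data; the divisor is n − 2 because two parameters (slope and intercept) were estimated from the data:

```python
import numpy as np

# Hypothetical toy data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

resid = y - (b0 + b1 * x)                        # e_i = y_i - yhat_i
s = np.sqrt(np.sum(resid ** 2) / (len(x) - 2))   # residual SD, df = n - 2
```

Note that the residuals from a least-squares fit always sum to zero, which is a handy sanity check.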

Residuals from example

Other ways to evaluate residuals • Lag plots: plot residuals vs. a time delay of the residuals to look for temporal structure. • Look for skew in the residuals. • Kurtosis in the residuals: the error is not distributed "normally".

Model residuals, constrained vs. freely moving conditions, for the pairwise and independent models.

Parametric Interpretation of regression: linear models. Conditional Populations and Conditional Distributions • A conditional population of Y values is associated with a fixed, or given, value of X. • A conditional distribution is the distribution of values within the conditional population above. • μY|X: the population mean Y value for a given X. • σY|X: the population SD of Y values for a given X.

The linear model Assumptions: • Linearity • Constant standard deviation

Statistical inference concerning β1. You can make statistical inferences on the model parameters themselves, using their estimates b0 and b1.

Standard error of the slope: SE(b1) = s / √Σ(x − x̄)². The 95% confidence interval for β1 is b1 ± t0.025 · SE(b1), where t has n − 2 degrees of freedom.

Hypothesis testing: is the slope significantly different from zero? H0: β1 = 0. Using the test statistic t = b1 / SE(b1), with df = n − 2.
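The slope test can be sketched end to end with the same sort of made-up data (a hypothetical example, not the slides' data set):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))  # residual SD

se_b1 = s / np.sqrt(Sxx)   # standard error of the slope
t = b1 / se_b1             # test statistic for H0: beta1 = 0, df = n - 2
```

The resulting t would be compared against a t table with n − 2 degrees of freedom.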

Coefficient of Determination. r², the coefficient of determination: how much of the variance in the data is accounted for by the linear model.
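A sketch of r² on made-up data, computed two equivalent ways for simple linear regression: as 1 − SS(resid)/SS(total), and as the squared correlation of x and y.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

ss_resid = np.sum((y - (b0 + b1 * x)) ** 2)
ss_total = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_resid / ss_total            # fraction of variance explained
r2_alt = np.corrcoef(x, y)[0, 1] ** 2   # equals r2 for simple regression
```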

Line “captures” most of the data variance.

Correlation Coefficient. r = Σ(x − x̄)(y − ȳ) / √( Σ(x − x̄)² · Σ(y − ȳ)² ). Note that r is symmetrical under exchange of x and y.

What's this? Adjusted R². It adjusts R² to compensate for the fact that adding even uncorrelated variables to the regression improves R².

Statistical inference on correlations. Like the slope, one can define a t-statistic for correlation coefficients: t = r √( (n − 2) / (1 − r²) ), with df = n − 2.
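The t-statistic for a correlation is a one-liner; r and n below are example values, not results from a real data set:

```python
import math

# Assumed example: correlation r computed from n paired observations.
r, n = 0.25, 446
t = r * math.sqrt((n - 2) / (1 - r ** 2))   # df = n - 2
```

Even a modest correlation becomes highly significant at large n, as here.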

Consider the following "spike-triggered averages":

STA example: r = 0.25. Is this correlation significant? With n = 446, t = 0.25 · √(444 / (1 − 0.25²)) ≈ 5.44.

When is Linear Regression Inadequate? Curvilinearity Outliers Influential points

Curvilinearity

Outliers. Can reduce correlations and unduly influence the regression line. You can "throw out" some clear outliers; there are a variety of tests to use. Example? Grubbs' test: compute the statistic for the suspect point, look up the critical value in a table, and ask whether your value is larger. If so, the difference is significant and the data point can be discarded.
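A sketch of the Grubbs statistic on hypothetical data; the critical value still has to be looked up in a Grubbs table for your n and α:

```python
import numpy as np

# Hypothetical sample with one suspected outlier (4.9).
data = np.array([2.1, 2.3, 2.2, 2.4, 2.3, 4.9])

# Grubbs statistic: largest absolute deviation from the mean,
# in units of the sample standard deviation.
G = np.max(np.abs(data - data.mean())) / data.std(ddof=1)
# Compare G against the tabulated critical value for this n and alpha;
# if G exceeds it, the point is a significant outlier.
```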

Influential points. Points that have a lot of influence on the regressed model. Not really outliers, as their residuals are small.

Conditions for inference Design conditions • Random subsampling model: for each x observed, y is viewed as randomly chosen from the distribution of Y values for that X. • Bivariate random sampling: each observed (x, y) pair must be independent of the others. The experimental structure must not include pairing, blocking, or an internal hierarchy. Conditions on parameters • σ is not a function of X. Conditions concerning population distributions • Same SD for all levels of X • Independent observations • Normal distribution of Y for each fixed X • Random samples

Error Bars on Coefficients of Model

MANOVA and ANCOVA

MANOVA: Multivariate Analysis of Variance. Developed as a theoretical construct by S. S. Wilks in 1932. • Key to assessing differences in groups across multiple metric dependent variables, based on a set of categorical (non-metric) variables acting as independent variables.

MANOVA vs ANOVA
ANOVA: Y1 = X1 + X2 + X3 + … + Xn (one metric DV; non-metric IVs)
MANOVA: Y1 + Y2 + … + Yn = X1 + X2 + X3 + … + Xn (metric DVs; non-metric IVs)

ANOVA Refresher

Source     SS             df     MS     F
Between    SS(B)          k − 1  MS(B)  MS(B)/MS(W)
Within     SS(W)          N − k  MS(W)
Total      SS(B) + SS(W)  N − 1

Reject the null hypothesis if the test statistic is greater than the critical F value with k − 1 numerator and N − k denominator degrees of freedom. If you reject the null, at least one of the group means is different from the others.
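The quantities in the ANOVA table can be computed directly; the three groups below are hypothetical:

```python
import numpy as np

# Hypothetical data: k = 3 groups, 3 observations each.
groups = [np.array([4.0, 5.0, 6.0]),
          np.array([6.0, 7.0, 8.0]),
          np.array([9.0, 10.0, 11.0])]
k = len(groups)
N = sum(len(g) for g in groups)
grand = np.concatenate(groups).mean()

ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)
ms_between = ss_between / (k - 1)   # df = k - 1
ms_within = ss_within / (N - k)     # df = N - k
F = ms_between / ms_within          # compare to critical F(k-1, N-k)
```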

MANOVA Guidelines Assumptions the same as ANOVA Additional condition of multivariate normality  all variables and all combinations of the variables are normally distributed Assumes equal covariance matrices (standard deviations between variables should be similar)

Example The first group receives technical dietary information interactively from an on-line website. Group 2 receives the same information from a nurse practitioner, while group 3 receives the information from a video tape made by the same nurse practitioner. Users rate the instruction based on its usefulness, difficulty, and importance. Note: a three-level categorical independent variable and three metric dependent variables.

Hypotheses H0: There is no difference among the treatment groups (online, oral, and visual learners). HA: There is a difference.

Order of operations

MANOVA Output 2: the individual ANOVAs are not significant.

MANOVA output: the overall multivariate effect is significant.

Post hoc tests to find the culprit

Post hoc tests to find the culprit!

Once more, with feeling: ANCOVA. Analysis of covariance: a hybrid of regression analysis and ANOVA-style methods. Suppose you have pre-existing effect differences between subjects. Given two experimental conditions, A and B, you could test half your subjects with AB (A then B) and the other half with BA, using a repeated-measures design.

Why use it? Suppose there exists a particular variable that *explains* some of what's going on in the dependent variable in an ANOVA-style experiment. Removing the effects of that variable can help you determine whether the categorical difference is "real" or simply depends on this variable. In a repeated-measures design, suppose the following situation: sequencing effects, where performing A first impacts the outcomes in B. • Example: A and B represent different learning methodologies. ANCOVA can also compensate for systematic biases among samples (if sorting produces unintentional correlations in the data).
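One way to sketch the ANCOVA idea is ordinary least squares with a group dummy plus the covariate: the coefficient on the dummy is the group effect *after* adjusting for the covariate. All numbers below are hypothetical:

```python
import numpy as np

# Hypothetical data: group indicator g (0/1), covariate z, outcome y.
g = np.array([0, 0, 0, 0, 1, 1, 1, 1])
z = np.array([1.0, 2.0, 3.0, 4.0, 1.5, 2.5, 3.5, 4.5])
y = np.array([2.0, 3.1, 3.9, 5.1, 2.6, 3.4, 4.6, 5.4])

# Design matrix: intercept, group effect, covariate.
X = np.column_stack([np.ones_like(z), g, z])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
# coef[1] is the group effect adjusted for z; coef[2] is the
# pooled within-group slope on the covariate.
```

Here the raw group means differ, but once the covariate is accounted for, the adjusted group effect (coef[1]) is essentially zero, which is exactly the kind of conclusion the grocery example below reaches.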

Example

Results

Second Example How do the amount spent on groceries and the amount one intends to spend depend on a subject's sex? H0: no dependence. Two analyses: • MANOVA to look at the dependence. • ANCOVA to determine whether there is significant covariance between intended spending and actual spending.

MANOVA

Results

ANCOVA

ANCOVA Results So if you remove the amount the subjects intend to spend from the equation, there is no significant difference in spending. The spending difference is not a result of "impulse buys", it seems.

Principal Component Analysis Say you have time series data, characterized by multiple channels or trials. Is there a set of factors underlying the data that explains it (is there a simpler explanation for the observed behavior)? In other words, can you infer the quantities that are supplying variance to the observed data, rather than testing *whether* known factors supply the variance?

Example: 8 channels of recorded EMG activity

PCA works by "rotating" the data (considering each time series as a spatial vector) to a "position" in the abstract space that minimizes the covariance between the new axes. Don't worry about what this means.
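A sketch of PCA via eigendecomposition of the covariance matrix, on synthetic stand-in data (not the EMG recordings from the slides): three "channels" that are noisy scalings of one underlying signal, so a single component should dominate.

```python
import numpy as np

# Synthetic stand-in for multi-channel data: 200 samples x 3 channels,
# each channel a scaled copy of one underlying signal plus noise.
rng = np.random.default_rng(1)
t = np.linspace(0, 4 * np.pi, 200)
signal = np.sin(t)
data = np.outer(signal, [1.0, 0.8, -0.5]) + 0.05 * rng.standard_normal((200, 3))

# PCA: center, form the covariance matrix, eigendecompose.
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
explained = eigvals[::-1] / eigvals.sum()   # variance fraction per component
# explained[0] should be close to 1: one component captures
# nearly all the variance, mirroring the EMG example.
```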

Note how a single component explains almost all of the variance in the 8 EMGs recorded. The next step would be to correlate these components with some other parameter in the experiment.

Largest PC vs. neural firing rates.

Some additional uses: Say you have a very large data set, but believe there are some common features uniting it. Use a PCA-type analysis to identify those common features, and retain only the most important components to describe a "reduced" data set.