Multiple regression, ANCOVA, General Linear Models

Multiple regression

We have more than one predictor. In a manipulative experiment – amount of water and dose of nutrients as independent variables for the biomass of the plants raised. In an observational study – species richness is explained by latitude, altitude, and annual rainfall.

Ideally, the predictors should not be correlated with each other. This can be ensured in an experiment, but hardly in an observational study (e.g., it would be difficult to choose locations in such a way that latitude and precipitation were independent).

Model The same assumptions as in simple linear regression, i.e., the random variability is additive and independent of the expected value (i.e., homogeneity of variances), and the relationship is linear. Moreover, the effects of the individual independent variables are additive.
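With k predictors the model can be written out as follows (a standard formulation matching the assumptions above):

    Yi = α + β1X1i + β2X2i + … + βkXki + εi

where the errors εi are independent, with zero mean and constant variance.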

For two predictors, the model is represented by a plane in three-dimensional space. [Figure: regression plane for ozone concentration as a function of temperature and wind velocity.]

Many procedures are analogous to simple regression. The coefficients α and βi (one for each predictor) are the unknown population values; we estimate them from a sample as a and bi. βi (for the population), or bi for the sample, is a slope (dependent on the units used). The criterion is least squares, i.e., minimizing the residual sum of squares. Tests: either an ANOVA of the whole model, or t-tests of the individual regression coefficients.
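A minimal illustration in Python with statsmodels (not the course's software; the data frame and all its values are hypothetical, standing in for the species-richness example above):

    import pandas as pd
    import statsmodels.formula.api as smf

    # hypothetical observational data: species richness and three predictors
    df = pd.DataFrame({
        "richness": [12, 18, 9, 22, 15, 11, 25, 17],
        "latitude": [48.1, 45.3, 50.2, 44.0, 47.5, 49.8, 43.2, 46.6],
        "altitude": [300, 150, 600, 100, 420, 550, 80, 260],
        "rainfall": [650, 820, 540, 900, 700, 580, 950, 760],
    })

    # least-squares fit: one intercept a and one slope b_i per predictor
    fit = smf.ols("richness ~ latitude + altitude + rainfall", data=df).fit()
    print(fit.params)     # estimates of a and the b_i
    print(fit.tvalues)    # t-tests of the individual coefficients
    print(fit.fvalue)     # F statistic of the ANOVA of the whole model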

In contrast to simple regression, the meanings of the tests differ. ANOVA of the whole model: H0: the response is independent of all the predictors, i.e., βi = 0 for all i. Separate null hypotheses βi = 0 for the individual predictors, each relating to a single variable.

The ranges of the predictor values can differ considerably, and the slope values depend on the units used. [Figure: fitted response surface with water and nutrients as predictors.]

ANOVA of the whole model Decomposition of the sum of squares: SSTOT = SSRegress + SSResidual. DFTOT = n − 1; DFRegress = number of variables; DFResid = n − 1 − number of variables. As usual, MS = SS/DF is an estimate of the population variance if H0 is true – this leads to the classic F-test.
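Written out (with k = number of predictors), the test statistic is:

    F = MSRegress / MSResid = (SSRegress / k) / (SSResidual / (n − 1 − k))

which under H0 follows the F-distribution with k and n − 1 − k degrees of freedom.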

R2 – coefficient of determination: the proportion of variability explained by the model. R2adj – adjusted R2; several different corrections exist. With many independent variables and relatively few observations, R2 is higher in our sample than in the population. The number of observations should be considerably higher than the number of predictors: when the number of observations equals the number of predictors + 1, the model fits all points perfectly (but its predictive ability is null).
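One common correction (the one reported by most programs) is:

    R2adj = 1 − (1 − R2) · (n − 1) / (n − 1 − k)

where n is the number of observations and k the number of predictors.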

Partial regression coefficients How much a given variable explains in addition to all the other variables in the model ("in addition" matters especially when the predictors are correlated).

Tests of partial regression coefficients Beta in the Statistica program is something different from "our" β (which, in principle, cannot be computed from a finite sample). It is the standardized partial regression coefficient, computed after Z-transformation of all the variables (both predictors and response). The regression plane then passes through the origin.
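A minimal sketch of obtaining these standardized coefficients by hand, reusing the hypothetical df and the statsmodels import from the sketch above:

    # Z-transform the response and all predictors, then refit;
    # the resulting slopes are the "Beta" values
    zdf = (df - df.mean()) / df.std()
    zfit = smf.ols("richness ~ latitude + altitude + rainfall", data=zdf).fit()
    print(zfit.params)   # intercept is ~0: the plane passes through the origin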

Tests of partial regression coefficients Beta (i.e., the standardized coefficient) indicates the relative size of a predictor's effect (with regard to the range of predictor values used); it is independent of the units. B (the b in "our" notation) is used to construct the function Y = a + ΣbiXi and thus depends on the measurement units: it "translates" a change in the predictor into a change in the response.

Tests of partial regression coefficients Beta – by how much the (standardized) response changes when the predictor changes by a proportional part of its variability (one standard deviation). B – by how much the response changes [in its own units] when the predictor changes by one of its units.

Tests of partial regression coefficients For testing we use t = B / s.e.(B) = Beta / s.e.(Beta). The standard error depends considerably on the correlation among the predictors! The test of the intercept is, again, usually uninteresting. Beware: the results of the whole-model ANOVA and of the tests of the partial coefficients need not agree with each other!

Marginal and partial effects

It is not always an advantage to have many predictors. There are several methods for simplifying the model (used mostly in observational studies). It is better to use your head first and not to throw everything into the program just because it came out of an automatic analyser. Stepwise selection of predictors: forward, backward, etc. Criteria that weigh goodness of fit against a "penalty" for model complexity (AIC). "Jack-knife" and similar methods. A sketch of backward selection by AIC follows below.
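A minimal sketch of backward elimination by AIC, reusing the hypothetical df and the statsmodels import from above (automatic selection deserves the caution urged above):

    def backward_aic(data, response, predictors):
        """Drop predictors one at a time as long as AIC improves."""
        current = list(predictors)
        best = smf.ols(f"{response} ~ {' + '.join(current)}", data=data).fit().aic
        while len(current) > 1:
            # AIC of every model with one predictor removed
            trials = {p: smf.ols(f"{response} ~ {' + '.join(q for q in current if q != p)}",
                                 data=data).fit().aic
                      for p in current}
            drop = min(trials, key=trials.get)
            if trials[drop] >= best:   # no single removal improves AIC -> stop
                break
            best = trials[drop]
            current.remove(drop)
        return current

    print(backward_aic(df, "richness", ["latitude", "altitude", "rainfall"]))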

Mind variables on a circular scale used as predictors. We can hardly expect a linear response to: 1. orientation of a slope (or anything else) measured e.g. in degrees or radians; 2. "Julian day"; 3. hour of the day. Various solutions exist (e.g., northness and eastness for orientation, as sketched below).
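A minimal sketch of the northness/eastness transformation (aspect assumed to be measured in degrees; variable names are hypothetical):

    import numpy as np

    aspect_deg = np.array([0.0, 90.0, 180.0, 270.0])   # hypothetical slope aspects
    northness = np.cos(np.radians(aspect_deg))         # +1 due north, -1 due south
    eastness = np.sin(np.radians(aspect_deg))          # +1 due east, -1 due west
    # northness and eastness then enter the model as two ordinary linear predictors
    print(northness.round(2), eastness.round(2))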

General Linear Models

We have had the ANOVA model: Xij = μ + αi + εij, possibly with more categorical variables. We can compute the mean as ΣX/n, but it can also be obtained by the method of least residual sum of squares. Regression, and generally: Y = deterministic part of the model + ε. When the deterministic part is a combination of categorical and quantitative predictors whose individual effects are additive, we have a General Linear Model (beware: the abbreviation GLM is also used for Generalized Linear Models).
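In a formula interface such a model might be written like this (a sketch with statsmodels; the data frame and all its values are hypothetical):

    import pandas as pd
    import statsmodels.formula.api as smf

    # hypothetical data: richness vs. bedrock (categorical) and altitude (quantitative)
    veg = pd.DataFrame({
        "richness": [14, 9, 21, 12, 17, 8, 24, 11],
        "rock": ["limestone", "granite", "limestone", "granite",
                 "limestone", "granite", "limestone", "granite"],
        "altitude": [200, 450, 150, 600, 300, 520, 100, 480],
    })

    # C() marks the categorical term; altitude enters with an ordinary slope
    glm_fit = smf.ols("richness ~ C(rock) + altitude", data=veg).fit()
    print(glm_fit.summary())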

Examples Number of species in a community ~ rock [categ.], type of land management [categ.], altitude [quant.]. Cholesterol level ~ sex [categ.], age [quant.], amount of bacon consumed [quant.]. Level of heterozygosity ~ ploidy [categ., probably], population size [quant.].

Various formulations of the model enable us to test whether: two regression lines are the same; they are not the same but have the same slope; or they even have different slopes (then the interaction between the quantitative variable and the factor, i.e., the categorical variable, is significant). And many similar questions; one such test is sketched below.
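A minimal sketch of such a test, reusing the hypothetical veg data from above: fit nested models with and without the interaction and compare them by an F-test.

    from statsmodels.stats.anova import anova_lm

    # same slope in both rock types vs. a separate slope for each rock type
    parallel = smf.ols("richness ~ C(rock) + altitude", data=veg).fit()
    separate = smf.ols("richness ~ C(rock) * altitude", data=veg).fit()
    print(anova_lm(parallel, separate))   # a significant F -> the slopes differ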

ANCOVA (analysis of covariance) Probably the most common of the general linear models. We assume that the lines are parallel to each other. Most often we want to filter out some "disturbing" effect – this should lead to lower error variability.

Example – I compare the weight of members of a sports club and of a beer club. As weight depends on body height (which is trivial), I will have quite large variability in both groups, so I use height as a covariate. In principle, I test whether the lines of the weight-on-height dependence are the same or shifted, assuming they have the same slope.
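A minimal ANCOVA sketch for this example (all values hypothetical):

    import pandas as pd
    import statsmodels.formula.api as smf

    clubs = pd.DataFrame({
        "weight": [72, 81, 68, 95, 88, 102, 76, 91],
        "height": [175, 182, 170, 185, 178, 190, 172, 183],
        "club": ["sport", "sport", "sport", "beer", "beer", "beer", "sport", "beer"],
    })

    # parallel lines: a common slope for height, separate intercepts per club
    ancova = smf.ols("weight ~ height + C(club)", data=clubs).fit()
    print(ancova.params)   # C(club)[T.sport] is the vertical shift between the lines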

Example – an experiment with rats. I suspect that the result will depend on their weight, but it is impossible to have all rats of the same weight, so I use the rats' weight at the beginning of the experiment as a covariate. At the same time, I do my best to have rats of similar weight in all groups (so that the predictors "rat weight" and "experimental group" are independent).

How do I decide when to use a variable as quantitative and when as categorical? The fewer degrees of freedom the model "takes", the more powerful the test; the more degrees of freedom the model "takes", the better the "fit". So what now...

Fertilization at 0, 70 and 140 kg N/ha; effect on crop yield. Two possible models: Regression: Yield = a + b × dose of fertilizer + error [assumes a linear increase of yield with the dose, "takes" one degree of freedom]. ANOVA: Yield = grand mean + specific effect of the dose + error [does not presume a linear relation, uses two degrees of freedom]. If the assumption of linearity holds, the regression test is more powerful [but both are valid]; if it is false, the regression can be quite absurd.
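A minimal sketch of the two formulations (yield values hypothetical; the column is named "yld" to avoid the Python keyword "yield"):

    import pandas as pd
    import statsmodels.formula.api as smf

    crop = pd.DataFrame({
        "dose": [0, 0, 0, 70, 70, 70, 140, 140, 140],
        "yld": [4.1, 3.8, 4.4, 5.2, 5.6, 5.0, 6.3, 6.0, 6.5],
    })

    linear = smf.ols("yld ~ dose", data=crop).fit()      # dose takes 1 df
    classes = smf.ols("yld ~ C(dose)", data=crop).fit()  # dose takes 2 df
    print(linear.f_pvalue, classes.f_pvalue)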