Stat 391 – Lecture 14
Regression, Part B: Going a bit deeper
Assaf Oron, May 2008

Overview
We introduced simple linear regression, and some responsible-use tips (dot your t’s and cross your i’s, etc.). Today, we go behind the scenes:
- Regression with binary X, and t-tests
- The statistical approach to regression
- Multiple regression
- Regression hypothesis tests and inference
- Regression with categorical X, and ANOVA
- Advanced model selection in regression
- Advanced regression alternatives

Binary X and t-tests
- It is convenient to introduce regression using continuous X, but it can also be done when X is limited to a finite number of values, or even to non-numerical values. We use the exact same formulae and framework.
- When X is binary – that is, it divides the data into two groups (e.g., “male” vs. “female”) – the regression is completely equivalent to the two-sample t-test (the version with the equal-variance assumption).
- The regression assigns x = 0 to one group and x = 1 to the other, so our “slope” becomes the difference between group means, and our “intercept” is the mean of the x = 0 group.
- Let’s see this in action:
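A minimal R sketch of this equivalence (with simulated data, not from the lecture; variable names are illustrative):

  ## regression on a binary x reproduces the equal-variance two-sample t-test
  set.seed(1)
  y <- c(rnorm(20, mean = 5), rnorm(20, mean = 7))
  group <- factor(rep(c("A", "B"), each = 20))

  fit <- lm(y ~ group)
  summary(fit)$coefficients            # intercept = mean of group A; slope = difference of means
  t.test(y ~ group, var.equal = TRUE)  # same t statistic and p-value as the slope's t-test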

Regression: the Statistical Approach
- Our treatment of regression thus far has been devoid of any probability assumptions. All we saw was least-squares optimization, partition of sums of squares, some diagnostics, etc.
- But regression can be viewed via a probability model:
      y_i = β_0 + β_1 x_i + ε_i,   i = 1, …, n
- The β’s are seen (in classical statistics) as fixed constant parameters, to be estimated. The x’s are fixed as well.
- The ε’s are random, and differ between different y’s. They have expectation 0, and under standard regression they are assumed i.i.d. normal.

Regression: the Statistical Approach (2)
- The equation on the previous slide is a simple example of a probabilistic regression model.
- Such models describe observations as a function of fixed explanatory variables (x) – known as covariates – plus random noise.
- The linear-regression formula can also be written as a conditional distribution:
      Y_i | x_i ~ N(β_0 + β_1 x_i, σ²)

Regression: the Statistical Approach (3)
- The probability framework allows us to use the tools of hypothesis testing, confidence intervals – and statistical estimation.
- Under the i.i.d.-normal-error assumption, the MLEs for the intercept and slope are identical to the least-squares solutions (this is because the log-likelihood is quadratic in the parameters, so maximizing it is equivalent to least-squares optimization).
- Hence the “hats” in the formula: the estimated coefficients are written as beta-hats, and the fitted values as y-hats.
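A small numerical check of this point (a sketch only, with simulated data): maximizing the normal log-likelihood with a generic optimizer gives essentially the same intercept and slope as lm().

  set.seed(2)
  x <- runif(50); y <- 3 + 0.5 * x + rnorm(50, sd = 0.4)

  negloglik <- function(par) {           # par = (intercept, slope, log(sigma))
    mu <- par[1] + par[2] * x
    -sum(dnorm(y, mean = mu, sd = exp(par[3]), log = TRUE))
  }
  mle <- optim(c(0, 0, 0), negloglik)$par
  rbind(MLE = mle[1:2], LS = coef(lm(y ~ x)))   # essentially identical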

Multiple Regression
- Often, our response y can potentially be explained by more than one covariate. For example: earthquake ground movement at a specific location is affected by both the magnitude and the distance from the epicenter (the attenu dataset).
- It turns out that everything we did for a single x can be done with p covariates, using analogous formulae. Instead of finding the least-squares line in 2D, we find the least-squares hyperplane in p+1 dimensions.
- We have to convert to matrix-vector terminology:
      y = Xβ + ε

Multiple Regression (2)
In y = Xβ + ε:
- y – the responses: a vector of length n
- ε – the errors: a vector of i.i.d. normal r.v.’s, length n
- X – the model matrix: n rows, (p+1) columns
- β – the parameter vector quantifying the effects: length p+1
Where has the intercept term gone? It is merged into X, the model matrix, as the first column: a column of 1’s (check it out – see the sketch below). Each covariate takes up a subsequent column.
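A quick way to check this in R, using the built-in attenu data (the exact model formula here is my illustration):

  X <- model.matrix(accel ~ mag + dist, data = attenu)
  head(X)   # first column "(Intercept)" is all 1's; one column per covariate after it
  dim(X)    # n rows, p + 1 columns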

Multiple Regression (3)
- This was the math; conceptually, what does multiple regression do?
- It calculates the “pure” effect of each covariate, while neutralizing the effect of the other covariates (we call this “adjusting for the other covariates”).
- If a covariate is NOT in the model, it is NOT neutralized – so it becomes a potential confounder.
- Let’s see this in action on the attenu data:
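A sketch of this on attenu (the particular formulas are my illustration, not necessarily the ones shown in class): the distance coefficient changes once magnitude is also in the model, because magnitude is no longer left out as a confounder.

  coef(lm(accel ~ dist, data = attenu))         # distance alone
  coef(lm(accel ~ dist + mag, data = attenu))   # distance, adjusted for magnitude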

Multiple Regression (4)
…and now, for the first time, we actually write out the solutions.
- First for the parameters:
      β̂ = (X'X)^(-1) X'y
- And these are the fitted values for y:
      ŷ = X β̂ = X (X'X)^(-1) X'y
- Note the matrix transpose and inverse operators.
- All of this is a function of X alone, and can be written as a single matrix, H = X (X'X)^(-1) X', so that ŷ = H y – a.k.a. “the Hat Matrix” (why?)
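The same formulas written with R's matrix operators, as a sketch (continuing the illustrative attenu fit from above):

  X <- model.matrix(accel ~ mag + dist, data = attenu)
  y <- attenu$accel

  beta.hat <- solve(t(X) %*% X) %*% t(X) %*% y   # (X'X)^(-1) X'y
  H <- X %*% solve(t(X) %*% X) %*% t(X)          # the hat matrix
  y.hat <- H %*% y                               # "puts the hat on y"

  all.equal(as.vector(beta.hat),
            unname(coef(lm(accel ~ mag + dist, data = attenu))))   # TRUE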

Regression Inference
- Why did we bother with these matrix formulae? To show you that both the parameter estimates and the fitted values are linear combinations of the original observations.
- Each individual estimate or fitted value can be written as a weighted sum of the y’s, with weights that depend only on X.
- Useful fact: linear combinations of normal r.v.’s are also normal r.v.’s.
- So if our model assumptions hold, the beta-hats and y-hats are all normal (how convenient).

Regression Inference (2)
- What if the observation errors are not normal?
- Well, recall that since a sum is just the mean multiplied by n, its shape also becomes “approximately normal” as n increases, due to the CLT.
- This holds for weighted sums as well, under fairly general assumptions.
- Bottom line: if the errors are “reasonably well-behaved” and n is large enough, our estimates are still as good as normal.
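A quick simulation sketch of this point (entirely illustrative numbers): with skewed, non-normal errors, the sampling distribution of the slope estimate still looks close to normal at moderate n.

  set.seed(3)
  slopes <- replicate(2000, {
    x <- runif(50)
    y <- 1 + 2 * x + (rexp(50) - 1)   # mean-zero but skewed errors
    coef(lm(y ~ x))[2]
  })
  hist(slopes, breaks = 40)   # roughly bell-shaped, centered near the true slope 2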

Regression Inference (3)
- So each individual beta-hat or y-hat can be assumed (approximately) normal, with variance equal to σ² times a known function of X (for the beta-hats, the corresponding diagonal element of (X'X)^(-1)).
- The only missing piece is to estimate σ², the variance of the observation errors.
- …And σ² is easily estimated using the residuals (the sum of squared residuals divided by n-p-1).
- But since we estimate the variance from the data, all our inference is based on the t-distribution.
- We lose a degree of freedom for each parameter, including the intercept, ending up with n-p-1 degrees of freedom.
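Checking this in R on the illustrative attenu fit (a sketch; summary() and confint() already do this bookkeeping for you):

  fit <- lm(accel ~ mag + dist, data = attenu)
  n <- nrow(attenu); p <- 2

  sigma.hat <- sqrt(sum(residuals(fit)^2) / (n - p - 1))
  c(manual = sigma.hat, from_summary = summary(fit)$sigma)   # identical

  confint(fit)   # t-based confidence intervals on n - p - 1 df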

Back to that Printout again…

  Call:
  lm(formula = y1 ~ x1, data = anscombe)

  Residuals:
       Min       1Q   Median       3Q      Max
  -1.92127 -0.45577 -0.04136  0.70941  1.83882

  Coefficients:
              Estimate Std. Error t value Pr(>|t|)
  (Intercept)   3.0001     1.1247   2.667  0.02573 *
  x1            0.5001     0.1179   4.241  0.00217 **
  ---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

  Residual standard error: 1.237 on 9 degrees of freedom
  Multiple R-squared: 0.6665,  Adjusted R-squared: 0.6295
  F-statistic: 17.99 on 1 and 9 DF,  p-value: 0.002170

- The t-statistics and p-values are for tests against a null hypothesis that the true parameter value is zero (each parameter tested separately).
- If your null is different, you’ll have to do the test on your own.
- Which parameter does this null usually NOT make sense for?
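A sketch of “doing the test on your own”, using only the numbers in this printout; the null value of 1 is purely an example.

  t.stat <- (0.5001 - 1) / 0.1179         # (estimate - null value) / standard error
  p.val  <- 2 * pt(-abs(t.stat), df = 9)  # two-sided p-value on 9 residual df
  c(t = t.stat, p = p.val)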

Note: Beware the (Model) Matrix
- If X has a column which is a linear function of other column(s), the matrix X'X is singular:
  - It cannot be inverted; the software may scream at you.
  - Conceptually, you are asking the method to decide between two identical explanations; it cannot do this.
- If X has a column which is “almost” a linear function of other column(s), it is said to suffer from collinearity:
  - Your beta-hat S.E.’s will be huge.
  - Conceptually, you are asking the method to decide between two nearly-identical explanations; still not a good prospect.
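A sketch of the collinearity symptom on simulated data (names and numbers purely illustrative):

  set.seed(4)
  x1 <- rnorm(100)
  x2 <- x1 + rnorm(100, sd = 0.01)   # almost an exact copy of x1
  y  <- 1 + x1 + rnorm(100)

  summary(lm(y ~ x1))$coefficients        # reasonable SE for x1
  summary(lm(y ~ x1 + x2))$coefficients   # huge SEs for both x1 and x2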

Categorical X and ANOVA
- We saw that simple regression with binary X is equivalent to a t-test.
- Similarly, we can model a categorical covariate having k>2 categories within multiple regression. For example: ethnic origin vs. life expectancy.
- The covariate will take up k-1 columns in X – meaning there’ll be k-1 parameters to estimate.
- R interprets text covariates as categorical; you can also convert numerical values to categorical using factor().
- Let’s see this in action:
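A sketch using R's built-in iris data (chosen only because it has a 3-level factor; not the example from class):

  fit <- lm(Sepal.Length ~ Species, data = iris)
  coef(fit)                # intercept = mean of the baseline level, plus k - 1 = 2 contrasts
  head(model.matrix(fit))  # the k - 1 dummy columns

  ## a numeric code can be treated as categorical, e.g. (hypothetical data):
  ## lm(y ~ factor(region), data = mydata)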

Categorical X and ANOVA (2)
- Regression on a categorical variable is equivalent to a technique called ANOVA: analysis of variance.
- ANOVA is used to analyze designed experiments.
- ANOVA’s name is derived from the fact that its hypothesis tests are performed by comparing sums of squared deviations (such as those shown last lecture). This is known as the F test, and it appears in our standard regression printout.
- ANOVA is considered an older technology, but it is still very useful in engineering, agriculture, etc.
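For instance, a sketch of the ANOVA table for the illustrative iris fit above; with a single categorical term, this F test matches the overall F statistic in the regression printout.

  anova(lm(Sepal.Length ~ Species, data = iris))   # F statistic and p-value for Species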

Regression Inference and Model Selection
- So… can we keep fitting the data better by adding as many covariates as we want?
- Not quite. If p ≥ n-1, you can fit the observations perfectly. This is known as a saturated model; it is pretty useless for drawing conclusions (in statistics jargon, you will have used up all your degrees of freedom).
- Before reaching n-1, each additional covariate improves the fit. Where to stop?
- Obviously, there is a tradeoff; we seek the optimum between over-fitting and under-fitting.
- From a conceptual perspective, we usually prefer simpler models (fewer covariates).
- However, given all possible covariates, “how do we find the optimal combination?” remains an open question.

Regression Inference: Nested Models
- If two models are nested, we can perform a formal hypothesis test between them, called a likelihood-ratio test (LRT).
- This test checks whether the gain in explained variability is “worth” the price paid in degrees of freedom.
- But when are two models nested? The simplest case: if model B = model A + some added terms, then A is nested in B. (Sometimes nesting involves simplification of more complicated multi-level covariates: e.g., region vs. state.)
- In R, the LRT is available via lrtest() in the lmtest package (and also via the anova() function).
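A sketch of such a comparison on attenu (the specific models are my illustration): the magnitude-only model A is nested in model B, which adds distance.

  fitA <- lm(accel ~ mag, data = attenu)
  fitB <- lm(accel ~ mag + dist, data = attenu)

  anova(fitA, fitB)   # extra-sum-of-squares F test, base R

  ## if the lmtest package is installed:
  ## library(lmtest)
  ## lrtest(fitA, fitB)   # likelihood-ratio test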

Open-Ended Model Selection
- When all else fails… use common sense. Your final model is not necessarily the best in terms of “bang for the buck”.
- Your covariate of interest should definitely go in.
- “Nuisance covariates” required by the client or by the accepted wisdom should go in as well.
- Causal diagrams are a must for nontrivial problems; covariates with a clear causal connection to the response should go in first.
- (There are also model-selection tools, known as AIC, BIC, cross-validation, BMA, etc.)

Open-Ended Model Selection (2)
Additionally, the goal of the model matters:
- If it is for formal inference / policy / scientific conclusions, you should be more conservative (fewer covariates, less effort to fit the data closely).
- If it is for prediction and forecasting under conditions similar to those observed, you can be a bit more aggressive.
- In any case, there is no magic solution. Always remember not to put too much faith in the model.

More Sophisticated Regressions
- The assumptions of linearity, normality and i.i.d. errors are quite restrictive.
- Some violations can be handled within standard regression:
  - Nonlinearity – transform the variables
  - Unequal variances – weighted least squares
- For other violations, extensions of ordinary regression have been developed.
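Both fixes live inside lm() itself; here is a sketch on attenu (the particular transform and weights are purely illustrative choices, not recommendations):

  lm(log(accel) ~ mag + log(dist + 1), data = attenu)              # transformed variables
  lm(accel ~ mag + dist, data = attenu, weights = 1 / (dist + 1))  # weighted least squares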

More Sophisticated Regressions (2)
- For some types of non-normality we have generalized linear models (GLMs).
- The GLM solution is also an MLE.
- GLMs cover a family of distributions that includes the normal, exponential, Gamma, binomial and Poisson.
- The variant with binomial responses is known as logistic regression; let’s see it in action.
- If we suffer from outliers or heavy tails, there are many types of robust regression to choose from.
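A minimal logistic-regression sketch via glm(), using R's built-in mtcars data only as a convenient stand-in for the class example:

  fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
  summary(fit)                            # coefficients on the log-odds scale
  head(predict(fit, type = "response"))   # fitted probabilities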

More Sophisticated Regressions (3)
- If observations are not i.i.d., but are instead divided into groups, we can use hierarchical or “mixed” models (this is very common).
- Of course, any regression can also be done using Bayesian methods. These are especially useful for complicated hierarchical models.
- Finally, if y’s dependence upon x is not described well by any single function, there is nonparametric regression (“smoothing”) – some of which we may see next week.