Computational Statistics

Basic ideas
• Predict values that are hard to measure in real life by using covariates (other properties from the same measurement in a sample population)
• We will often consider two (or more) variables simultaneously
• 1) The data (x1, y1), …, (xn, yn) are considered as independent replications of a pair of random variables (X, Y) (observational studies)
• 2) The data are described by a linear regression model (planned experiments): yi = a + b·xi + εi, i = 1, …, n
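
A minimal sketch of case 2) in Python on simulated data (the values and seed are illustrative, not from the lecture):

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 50)                 # covariate, observed exactly
    y = 2.0 + 0.5 * x + rng.normal(0, 1, 50)   # y_i = a + b*x_i + eps_i

    # Fit a and b by least squares (degree-1 polynomial)
    b, a = np.polyfit(x, y, deg=1)
    print(f"estimated intercept a = {a:.2f}, slope b = {b:.2f}")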

Regression

The linear model
• Multiple regression model
• Predicts a response variable using a linear function of covariates (predictor variables)
• Goals:
  - to estimate the unknown parameter βp of each covariate Xp (its weight/significance)
  - to estimate the error variance
• Y = β1 + β2·X1 + β3·X2 + … + ε
• ε = systematic errors + random errors
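
A sketch of such a multiple regression fit using the statsmodels library on simulated data (the coefficient values are made up for illustration):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 100
    X = rng.normal(size=(n, 2))                       # two covariates X1, X2
    beta = np.array([1.0, 2.0, -0.5])                 # intercept, beta2, beta3
    y = beta[0] + X @ beta[1:] + rng.normal(0, 1, n)  # response with random error

    model = sm.OLS(y, sm.add_constant(X)).fit()
    print(model.params)  # estimates of the betas
    print(model.scale)   # estimate of the error variance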

The linear model II
• Quantify uncertainty in empirical data
• Assign significance to the various components (covariates)
• Find a good compromise between model size and the ability to describe the data (and hence the response)

The linear model III
• Sample size n > number of predictors p
• The p column vectors of the design matrix are linearly independent
• The errors are random => the responses are random as well
• E(ε) = 0; if E(ε) ≠ 0 there is a systematic error, assuming the model is correct
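
The linear-independence requirement can be checked numerically as the rank of the design matrix; a small illustrative sketch:

    import numpy as np

    X = np.column_stack([np.ones(5), [1, 2, 3, 4, 5], [2, 4, 6, 8, 10]])
    # The third column is 2x the second: the columns are linearly dependent,
    # so the rank is less than the number of predictors p = 3
    print(np.linalg.matrix_rank(X))  # prints 2, not 3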

Model variations
• Linear regression through the origin: p = 1, Y = β1·X + ε
• Simple linear regression: p = 2, Y = β1 + β2·X + ε
• Quadratic regression: p = 3, Y = β1 + β2·X + β3·X² + ε
• Regression with transformed predictor variables (example): Y = β1 + β2·log(X) + β3·sin(X) + ε
• The data need to be checked for linearity to identify which variation is appropriate
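
All of these variations are still linear in the parameters β, so the same least-squares machinery applies; a sketch of the quadratic case via an explicit design matrix (simulated data):

    import numpy as np

    rng = np.random.default_rng(2)
    x = np.linspace(-3, 3, 60)
    y = 1.0 + 0.5 * x + 2.0 * x**2 + rng.normal(0, 1, 60)

    # Design matrix with columns 1, x, x^2: quadratic regression, p = 3
    X = np.column_stack([np.ones_like(x), x, x**2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(beta)  # approximately [1.0, 0.5, 2.0]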

Goals of analysis
• A good fit with small errors, using the method of least squares
• Good parameter estimates: how much each predictor variable explains (contributes to) the response in the chosen model
• Good prediction of the response as a function of the predictor variables
• Confidence intervals and statistical tests help us reach these goals
• Find the best model in an iterative process, possibly using heuristics

Least Squares
• residual r = Y (empirical) − Ŷ(β, covariates)
• The best β minimizes the sum of squared residuals Σ ri² for the chosen set of covariates (in the given model)
• Least squares is based on the random errors => the least-squares estimates are random too => different betas for each measured sample => different regression lines (although with "enough" samples the estimates converge towards the "true" regression line, by the Central Limit Theorem)
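
In matrix form the least-squares estimate is the β̂ that solves the normal equations XᵀX β = XᵀY; a minimal sketch on simulated data:

    import numpy as np

    def least_squares(X, y):
        """Solve the normal equations X'X beta = X'y for beta."""
        return np.linalg.solve(X.T @ X, X.T @ y)

    rng = np.random.default_rng(3)
    X = np.column_stack([np.ones(30), rng.normal(size=30)])
    y = X @ np.array([1.0, 2.0]) + rng.normal(0, 0.5, 30)

    beta_hat = least_squares(X, y)
    residuals = y - X @ beta_hat           # r = Y - Y_hat
    print(beta_hat, (residuals**2).sum())  # minimized sum of squares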

Linear Model Assumptions
• E(ε) = 0 (the linear equation is correct)
• All xi’s are exact (no systematic error)
• The error variance is constant (homoscedasticity); the empirical error variance equals the theoretical variance
  - Otherwise weighted least squares can be used
• Uncorrelated errors: Cov(εi, εj) = 0 for all i ≠ j
  - Otherwise generalized least squares can be used
• Errors are normally distributed => Y is normally distributed as well
  - Otherwise robust methods can be used instead of least squares

Model cautions
• Covariates may change over time
• It is dangerous to use a fitted model to extrapolate beyond the region where predictor values have been observed
• Example: extrapolating a height trend backwards in time would suggest the average Viking was just a few centimeters tall

Test and Confidence (any predictor)
• Test predictor p using the null hypothesis H0,p: βp = 0 against the alternative HA,p: βp ≠ 0
• Use the t-test and P-values to determine relevance
• This quantifies the effect of the p’th predictor variable after having subtracted the linear effect of all other predictor variables on Y
• Problem: the apparent significance of a predictor can be distorted by correlation among the predictor variables
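
A sketch of the per-predictor t-tests on simulated data, where the second covariate deliberately has no true effect (names and values are illustrative):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    n = 200
    X = rng.normal(size=(n, 2))
    y = 1.0 + 2.0 * X[:, 0] + rng.normal(0, 1, n)  # X2 has no real effect

    fit = sm.OLS(y, sm.add_constant(X)).fit()
    print(fit.tvalues)  # t-statistic for each beta_p against H0: beta_p = 0
    print(fit.pvalues)  # the second predictor should not be significant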

Test and Confidence (global)
• Use ANOVA (ANalysis Of VAriance) to test the null hypothesis H0 that all βs = 0 (no relevance) against the alternative HA that at least one β ≠ 0
• The F-test quantifies the statistical significance of the predictor variables as a group
• Describe the goodness of fit using sums of squares: R² = SS(explained) / SS(total) = Σ(Ŷi − Ȳ)² / Σ(Yi − Ȳ)²
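
A sketch computing the global F-test and reconstructing R² from the sums of squares, on simulated data:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    x = rng.normal(size=80)
    y = 1.0 + 2.0 * x + rng.normal(0, 1, 80)

    fit = sm.OLS(y, sm.add_constant(x)).fit()
    print(fit.fvalue, fit.f_pvalue)  # global F-test of H0: all betas = 0

    # R^2 by hand: explained sum of squares over total sum of squares
    ss_explained = ((fit.fittedvalues - y.mean())**2).sum()
    ss_total = ((y - y.mean())**2).sum()
    print(ss_explained / ss_total, fit.rsquared)  # the two should agree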

Tukey-Anscombe plot (linearity assumption)
• Use the residuals as an approximation of the unobservable errors to check linearity
• Plot the residuals against the fitted values
• The correlation should be zero -> random fluctuation of the points around a horizontal line through zero
• A trend in the plot is evidence of a non-linear relation (or a systematic error)
• Possible solution: transform the response variable or perform a weighted regression
  - If the SD grows linearly with the fitted values: Y -> log(Y)
  - If the SD grows as the square root: Y -> √Y
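
A sketch of such a Tukey-Anscombe plot using matplotlib (simulated, well-behaved data, so no trend should appear):

    import numpy as np
    import matplotlib.pyplot as plt
    import statsmodels.api as sm

    rng = np.random.default_rng(6)
    x = rng.uniform(0, 10, 100)
    y = 1.0 + 0.5 * x + rng.normal(0, 1, 100)
    fit = sm.OLS(y, sm.add_constant(x)).fit()

    # Tukey-Anscombe plot: residuals against fitted values
    plt.scatter(fit.fittedvalues, fit.resid)
    plt.axhline(0, color="grey")  # points should fluctuate around this line
    plt.xlabel("fitted values")
    plt.ylabel("residuals")
    plt.show()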

The Normal/QQ Plot (normal distribution assumption)
• Check the normal distribution of the residuals using a quantile-quantile plot (QQ plot), also called a normal plot
• y-axis = quantiles of the residuals, x-axis = theoretical quantiles of N(0, 1)
• For normally distributed residuals, the normal plot gives a straight line whose intercept is the mean and whose slope is the standard deviation
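
A sketch of a QQ plot via scipy; here the residuals are simulated normal draws standing in for real regression residuals:

    import numpy as np
    import scipy.stats as stats
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(7)
    residuals = rng.normal(0, 2, 100)  # stand-in for regression residuals

    # QQ plot: residual quantiles against theoretical N(0, 1) quantiles;
    # normally distributed residuals fall on a straight line
    stats.probplot(residuals, dist="norm", plot=plt)
    plt.show()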

Weighted regression ??

Model selection
• We want the model to be as simple as possible
• Which predictors should be included?
• We want the best/optimal model, not necessarily the true model
• More predictors -> higher variance
• Optimize the bias-variance trade-off

Searching for the best model
• Forward selection
  - Start with the smallest model and include the predictor that reduces the residual sum of squares the most, until a large number of predictors has been selected. Choose the model with the smallest Cp-statistic
• Backward selection
  - Start with the full model and exclude the predictor that increases the residual sum of squares the least, until all, or most, predictor variables have been deleted. Choose the model with the smallest Cp-statistic
• The cross-validated R² can be used to pick the best model when multiple candidate models have been identified (by forward or backward selection); a sketch of forward selection follows below
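
A sketch of forward selection as described above, greedily adding the predictor that most reduces the residual sum of squares; the Cp stopping rule is omitted for brevity, and all names and data are illustrative:

    import numpy as np
    import statsmodels.api as sm

    def forward_selection(X, y):
        """Greedy forward selection: at each step add the predictor that
        reduces the residual sum of squares the most."""
        n, p = X.shape
        chosen, remaining = [], list(range(p))
        while remaining:
            rss = {j: sm.OLS(y, sm.add_constant(X[:, chosen + [j]])).fit().ssr
                   for j in remaining}
            best = min(rss, key=rss.get)
            chosen.append(best)
            remaining.remove(best)
        return chosen  # order in which the predictors entered the model

    rng = np.random.default_rng(9)
    X = rng.normal(size=(100, 4))
    y = 2.0 * X[:, 2] + 0.5 * X[:, 0] + rng.normal(0, 1, 100)
    print(forward_selection(X, y))  # predictor 2 should enter first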