Data Handling & Analysis Polynomials and model fit Andrew Jackson

Presentation transcript:

Data Handling & Analysis Polynomials and model fit Andrew Jackson

Linear-type data: How are two measures related?

Data are the number of species (Y) recorded per time spent looking for them (X). Specifically, these data come from fisheries data, and this measure is a good proxy for species diversity in the marine habitat. What do we do about curvature?

Clearly a straight line won’t do

… the residuals are horrible
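To make the straight-line fit and its residual check concrete, here is a minimal sketch (not the course's own code) using simulated data that stands in for the species-per-search-time example; the variable names and numbers are illustrative assumptions only.

```python
# Hypothetical illustration: a straight-line fit whose residuals reveal curvature.
# The data are simulated here purely to stand in for the fisheries example.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
X = rng.uniform(1, 50, 100)                      # time spent searching (illustrative)
Y = 5 * np.log(X) + rng.normal(0, 1, size=100)   # species recorded: saturating curve + noise
df = pd.DataFrame({"X": X, "Y": Y})

line = smf.ols("Y ~ X", data=df).fit()           # first-order polynomial (straight line)

# A clear arch in the residuals vs fitted values is the tell-tale sign
# that a straight line "won't do" for curved data.
plt.scatter(line.fittedvalues, line.resid)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values"); plt.ylabel("Residuals")
plt.show()
```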

Polynomials Polynomials are models that are linear in their parameters but can describe curvature:
– Quadratics: Y = b0 + b1X + b2X²
– Cubics: Y = b0 + b1X + b2X² + b3X³
– 5th, 6th order polynomials, etc…
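As a sketch of how those polynomial terms are added in practice (reusing the illustrative df from the earlier sketch; the code is an assumption, not the lecture's own):

```python
# Quadratic and cubic fits: still linear models, just with X^2 and X^3 terms added.
# Assumes the DataFrame `df` (columns X, Y) from the previous sketch.
import statsmodels.formula.api as smf

quadratic = smf.ols("Y ~ X + I(X**2)", data=df).fit()            # b0 + b1*X + b2*X^2
cubic     = smf.ols("Y ~ X + I(X**2) + I(X**3)", data=df).fit()  # ... + b3*X^3

print(quadratic.params)   # intercept b0 and coefficients b1, b2
print(cubic.params)       # intercept b0 and coefficients b1, b2, b3
```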

Quadratic model

Quadratic residuals: Better… but not so good at lower values of X. Try a more complicated model, like a cubic.

Cubic model: Note the double curvature. The model appears to explain the lower values better, but how sure are we of the increase at higher values?

Cubic residuals: Better than the quadratic, but still over-estimating the lowest values of X.

Log-transform the X variable: The model is Y ~ log(X). It appears to explain the data very well across the full range. Check the residuals…
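A sketch of the log-transformed model, again reusing the illustrative df from the first sketch (names are assumptions):

```python
# Y ~ log(X): still a straight line, but in log(X) rather than X.
# Assumes `df` from the first sketch.
import numpy as np
import statsmodels.formula.api as smf

log_model = smf.ols("Y ~ np.log(X)", data=df).fit()
print(log_model.params)
# If this model captures the curvature, its residuals should show no
# systematic pattern across the full range of X.
```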

Y ~ log(X) residuals: Now these look pretty near perfect.

The null model Consists of a mean and a variance only. It gives us a benchmark against which we can test our models that include more information. If we can't do better than the null model, then we don't understand our data or system!
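In model-fitting terms, the null model is just an intercept-only fit; a minimal sketch (reusing the illustrative df from the first sketch):

```python
# The null model: "Y ~ 1" estimates only a mean (the intercept) and the
# variance of the data around it. Assumes `df` from the first sketch.
import statsmodels.formula.api as smf

null_model = smf.ols("Y ~ 1", data=df).fit()
print(null_model.params)             # the intercept is simply the mean of Y
print(null_model.resid.var())        # spread of the data around that mean
```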

Residuals of the null model

Choosing between alternative models We now have a choice between 5 models:
– Null model (zero-order polynomial, which includes an intercept only – i.e. just a mean and a variance)
– Straight line (first-order polynomial)
– Quadratic (second-order polynomial)
– Cubic (third-order polynomial)
– First-order polynomial with log(X)
How do we select which one to use?
– Higher-order polynomials require more parameters

Parsimony as a central tenet Parsimony is the preference for the simplest adequate explanation for a phenomenon, and it underpins all of science. So we need to pick the model that:
– Fits the data the best, and…
– Uses the fewest parameters

Likelihood of data
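The slide content itself is not reproduced in this transcript; as a standard reference point (not taken from the slide), the log-likelihood of a regression model with normally distributed errors, evaluated at the least-squares fit, is

log L = -(n/2) [ ln(2π) + ln(RSS/n) + 1 ],

where RSS is the residual sum of squares and n the number of observations. This is the quantity that feeds into AIC on the next slide.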

AIC for model selection We will use Akaike's Information Criterion (AIC) to select the most suitable model: AIC = -2 log(likelihood) + 2k
– The log-likelihood gets bigger the better the fit
– k is the number of parameters in the model
Lower AIC = more suitable model
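As a sketch of the comparison (reusing the illustrative df from the first sketch; statsmodels reports the log-likelihood and AIC directly, and the candidate set below mirrors the one in the slides):

```python
# Fit the candidate models and rank them by AIC (lower = more parsimonious fit).
# Assumes `df` from the first sketch; np is imported so the formula can use np.log.
import numpy as np
import statsmodels.formula.api as smf

candidates = {
    "null":      "Y ~ 1",
    "line":      "Y ~ X",
    "quadratic": "Y ~ X + I(X**2)",
    "cubic":     "Y ~ X + I(X**2) + I(X**3)",
    "log(X)":    "Y ~ np.log(X)",
}
fits = {name: smf.ols(f, data=df).fit() for name, f in candidates.items()}

for name, fit in sorted(fits.items(), key=lambda kv: kv[1].aic):
    print(f"{name:10s}  logLik = {fit.llf:7.1f}  AIC = {fit.aic:7.1f}")
```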

AIC of our models Models compared: the null model, straight line, quadratic, cubic, higher-order polynomials (5th order and above), and log(X). [AIC values for each model were shown in a table on the original slide.] So the log(X) model is the best in this case. Note that adding more orders to the polynomials ceases to confer any benefit after the 5th order. Also… these get increasingly difficult to explain and relate to biological phenomena.

Conclusions AIC provides an objective way to compare alternative models. Lower AIC indicates a more parsimonious model. AIC must only be used to compare models of exactly the same response variable. It only provides a relative, and not absolute, indication of model fit:
– We still need to check that the model is any good
– Residuals etc…