Data Handling & Analysis: Polynomials and model fit, Andrew Jackson


1 Data Handling & Analysis: Polynomials and model fit. Andrew Jackson, a.jackson@tcd.ie

2 Linear-type data: how are two measures related?

3 The data are the number of species (Y) recorded against the time spent looking for them (X). Specifically, these data come from fisheries data, and such species counts are a good proxy for species diversity in the marine habitat. What do we do about curvature?

4 Clearly a straight line won’t do

5 … the residuals are horrible: they show a clear systematic pattern instead of random scatter around zero.
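The following minimal Python sketch (standard library only) makes "horrible" concrete. It fits a straight line by ordinary least squares to hypothetical, noise-free data that follow a log-shaped curve, then prints the sign of each residual. The data and the helper names `fit_line` and `residuals` are illustrative assumptions. They are not the fisheries data from the slides.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

def residuals(xs, ys, b0, b1):
    """Observed minus fitted values."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Hypothetical data that curve like y = 10*log(x) (NOT the slide data).
xs = [float(x) for x in range(1, 11)]
ys = [10 * math.log(x) for x in xs]

b0, b1 = fit_line(xs, ys)
res = residuals(xs, ys, b0, b1)
print("".join("+" if r > 0 else "-" for r in res))  # prints --+++++---
```

The residuals run negative, then positive, then negative again. That pattern is the signature of an unmodelled curve. Well-behaved residuals should scatter randomly around zero with no runs like this.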

6 Polynomials
Polynomial models are linear in their parameters ($b_0, b_1, \dots$) but can still show curvature in X:
– Quadratics: $Y = b_0 + b_1 X + b_2 X^2$
– Cubics: $Y = b_0 + b_1 X + b_2 X^2 + b_3 X^3$
– 5th-, 6th-order polynomials, etc.
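A polynomial is linear in $b_0, b_1, \dots$, so it can be fitted with ordinary least squares. The design matrix simply has the columns 1, X, X^2, and so on. The sketch below is standard-library Python with hypothetical helper names (`design_matrix`, `solve`, `fit_poly`), and it solves the normal equations directly. It is meant to show the idea only. For 5th- or 6th-order fits the normal equations become poorly conditioned, and a more stable solver (for example `numpy.linalg.lstsq`) is preferable.

```python
def design_matrix(xs, order):
    """Each row is [1, x, x**2, ..., x**order]; the model is linear in b."""
    return [[x ** p for p in range(order + 1)] for x in xs]

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_poly(xs, ys, order):
    """Least-squares coefficients [b0, b1, ..., b_order] via X'X b = X'y."""
    X = design_matrix(xs, order)
    k = order + 1
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data generated exactly from Y = 2 + 0.5X - 0.1X^2
xs = [float(x) for x in range(10)]
ys = [2.0 + 0.5 * x - 0.1 * x ** 2 for x in xs]
coefs = fit_poly(xs, ys, 2)
print([round(c, 6) for c in coefs])  # [2.0, 0.5, -0.1]
```

Moving from a quadratic to a cubic or a higher order only adds columns to the design matrix. The fitting procedure itself does not change, which is exactly why polynomials count as linear models.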

7 Quadratic model

8 Quadratic residuals. Better… but not so good at lower values of X. Try a more complicated model, such as a cubic.

9 Cubic model. Note the double curvature. The model appears to explain the lower values better, but how sure are we of the increase at higher values?

10 Cubic residuals. Better than the quadratic, but still over-estimating the lowest values of X.

11 Log transform the X variable. The model is Y~log(X). It appears to explain the data very well across the full range. Check the residuals…
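Fitting Y~log(X) is still a straight-line fit, just against the transformed predictor log(X). Here is a minimal sketch. It assumes every X is strictly positive, because log is undefined at zero. The function name `fit_log_model` is hypothetical, and so are the data, which are generated exactly from Y = 3 + 4 log(X).

```python
import math

def fit_log_model(xs, ys):
    """Fit Y = b0 + b1*log(X) by ordinary least squares on the transformed predictor."""
    lx = [math.log(x) for x in xs]  # assumes every x > 0
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ys) / n
    b1 = sum((u - mx) * (y - my) for u, y in zip(lx, ys)) / sum((u - mx) ** 2 for u in lx)
    b0 = my - b1 * mx
    res = [y - (b0 + b1 * u) for u, y in zip(lx, ys)]  # residuals to check next
    return b0, b1, res

# Hypothetical effort values and species counts from Y = 3 + 4*log(X)
xs = [1, 2, 4, 8, 16, 32]
ys = [3 + 4 * math.log(x) for x in xs]
b0, b1, res = fit_log_model(xs, ys)
print(round(b0, 6), round(b1, 6))  # 3.0 4.0
```

The model still has only two coefficients, the same number as the straight line. The transformation adds curvature without adding parameters, and that matters once models are compared on parsimony.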

12 Y~log(X) residuals. Now these look pretty near perfect.

13 The null model. It consists of a mean and a variance only. It gives us a benchmark against which we can test models that include more information. If we can't do better than the null model, then we don't understand our data or system!
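Here is a minimal sketch of the null model. The fitted value for every observation is the mean, and the residual variance is the second parameter. The sketch uses the maximum-likelihood variance, which divides by n rather than n - 1, because that is the version that enters the likelihood. The counts and the name `fit_null_model` are hypothetical.

```python
def fit_null_model(ys):
    """Null model: predict every observation by the mean; 2 parameters (mean, variance)."""
    n = len(ys)
    mean = sum(ys) / n
    residuals = [y - mean for y in ys]
    variance = sum(r * r for r in residuals) / n  # maximum-likelihood estimate (divides by n)
    return mean, variance, residuals

# Hypothetical species counts (not the data from the slides)
ys = [4, 7, 9, 10, 12]
mean, variance, residuals = fit_null_model(ys)
print(mean, round(variance, 2))  # 8.4 7.44
```

Any model that uses X should leave smaller residuals than these. If it does not, it has added parameters without adding understanding.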

14 Residuals of the null model

15 Choosing between alternative models
We now have a choice between 5 models:
– Null model (zero-order polynomial, which includes an intercept only, i.e. just a mean and variance model)
– Straight line (first-order polynomial)
– Quadratic (second-order polynomial)
– Cubic (third-order polynomial)
– First-order polynomial with log(X)
How do we select which one to use?
– Higher-order polynomials require more parameters

16 Parsimony as a central tenet
Parsimony is the principle of preferring the simplest explanation for a phenomenon, and it underpins all of science. So we need to pick the model that:
– fits the data the best, and …
– uses the fewest parameters

17 Likelihood of data
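These models are fitted by least squares, so the residuals are usually assumed to be normally distributed. Under that assumption, the likelihood of the data is the product of normal densities, and the log-likelihood is therefore a sum. The sketch below uses hypothetical residuals and a hypothetical helper, `normal_log_likelihood`. With the maximum-likelihood variance (RSS/n), the log-likelihood reduces to the closed form -n/2 (log(2π RSS/n) + 1).

```python
import math

def normal_log_likelihood(residuals, variance):
    """Log-likelihood of the data if each residual is drawn from Normal(0, variance)."""
    return sum(
        -0.5 * math.log(2 * math.pi * variance) - r * r / (2 * variance)
        for r in residuals
    )

# Hypothetical residuals from some fitted model, with the ML variance RSS / n
residuals = [-4.4, -1.4, 0.6, 1.6, 3.6]
n = len(residuals)
variance = sum(r * r for r in residuals) / n
loglik = normal_log_likelihood(residuals, variance)

# With the ML variance the sum collapses to this closed form
closed_form = -n / 2 * (math.log(2 * math.pi * variance) + 1)
print(round(loglik, 4), round(closed_form, 4))
```

Smaller residuals mean a smaller fitted variance and a larger (less negative) log-likelihood. This is the sense in which "the log-likelihood gets bigger the better the fit".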

18 AIC for model selection
We will use Akaike's Information Criterion (AIC) to select the most suitable model:
$\mathrm{AIC} = -2\log(L) + 2k$
– log(L), the log-likelihood, gets bigger the better the fit
– k is the number of parameters in the model
Lower AIC = more suitable model
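The formula translates directly into code. The sketch assumes, as is common for Gaussian models, that k counts the residual variance as well as the coefficients. Under that convention a polynomial of order p has p + 2 parameters, and the null model has 2 (mean and variance). Every model here has exactly one variance parameter, so the convention shifts all AICs by the same amount and does not change which model wins. The helper names and log-likelihood values are hypothetical.

```python
def aic(log_likelihood, k):
    """Akaike's Information Criterion: -2 log L + 2k (lower is better)."""
    return -2.0 * log_likelihood + 2 * k

def n_params(order):
    """Polynomial of the given order: (order + 1) coefficients plus 1 residual variance."""
    return order + 2

# Hypothetical comparison: a straight line vs a quadratic that fits slightly better
print(aic(-100.0, n_params(1)))  # 206.0
print(aic(-99.5, n_params(2)))   # 207.0
```

The quadratic improves the log-likelihood by only 0.5. Its extra parameter costs 2 AIC units, so the simpler straight line is preferred. That is parsimony expressed as a number.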

19 AIC of our models
  Model          AIC
  Null model     248.2
  Straight line  184.1
  Quadratic      142.5
  Cubic          124.9
  4th order       83.5
  5th order       77.6
  6th order       77.7
  log(X)          68.4
So the log(X) model is the best in this case. Note that adding more orders to the polynomials ceases to confer any benefit after the 5th order. Also, these get increasingly difficult to explain and relate to biological phenomena.
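Selecting a model then comes down to finding the minimum AIC. It is also common to report each model's difference from the best one (ΔAIC). The sketch below uses the AIC values from slide 19, read as positive numbers. That reading is the one consistent with "lower AIC = more suitable model" and with log(X) being chosen. The dictionary keys are just labels.

```python
# AIC values from slide 19 (lower is better)
aic_table = {
    "null": 248.2,
    "straight line": 184.1,
    "quadratic": 142.5,
    "cubic": 124.9,
    "4th order": 83.5,
    "5th order": 77.6,
    "6th order": 77.7,
    "log(X)": 68.4,
}

best = min(aic_table, key=aic_table.get)
# Difference of each model from the best one (delta AIC)
delta = {name: round(value - aic_table[best], 1) for name, value in aic_table.items()}

print(best)                # log(X)
print(delta["5th order"])  # 9.2
```

The best polynomial (5th order) is still 9.2 AIC units behind log(X), despite having more parameters. The 6th order is slightly worse than the 5th.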

20 Conclusions
AIC provides an objective way to compare alternative models.
Lower AIC indicates a more parsimonious model, i.e. a better balance between fit and number of parameters.
Only compare AIC between models of exactly the same response variable, fitted to the same data.
AIC gives only a relative, not an absolute, indication of model fit:
– we still need to check that the model is any good
– residuals etc…

