Data Handling & Analysis Polynomials and model fit Andrew Jackson

Data Handling & Analysis Polynomials and model fit Andrew Jackson a.jackson@tcd.ie

Linear type data
How are two measures related?

The data are the number of species (Y) recorded per time spent looking for them (X)
Specifically, they come from fisheries data
They are a good proxy for species diversity in the marine habitat
What do we do about curvature?

Clearly a straight line won’t do

… the residuals are horrible
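To see why the residuals give a straight line away, here is a minimal sketch using invented, noise-free saturating data (not the fisheries data from the slides). The helper `fit_line` and the variable names are hypothetical. When the true relationship curves, the residuals of a straight-line fit are negative at both ends and positive in the middle.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical saturating data, invented purely for illustration.
effort = list(range(1, 21))
species = [5.0 + 10.0 * math.log(t) for t in effort]

b0, b1 = fit_line(effort, species)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(effort, species)]

# A curved pattern in this column means the straight line is the wrong shape.
for xi, ri in zip(effort, residuals):
    print(f"{xi:2d} {ri:+7.2f}")
```

Residuals from an adequate model should scatter around zero with no trend against X.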

Polynomials
Polynomials are linear equations that show curvature
– Quadratics: $Y = b_0 + b_1 X + b_2 X^2$
– Cubics: $Y = b_0 + b_1 X + b_2 X^2 + b_3 X^3$
– 5th, 6th order polynomials etc…
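Polynomials count as linear models because they are linear in the coefficients b, even though they curve in X. The sketch below illustrates this by building a design matrix with columns 1, X, X², … and solving by ordinary least squares with NumPy. The function name `fit_polynomial` and the data are assumptions made for illustration, not taken from the slides.

```python
import numpy as np

def fit_polynomial(x, y, order):
    """Least-squares fit of Y = b0 + b1*X + ... + b_order*X^order.

    Returns the coefficients in increasing order: [b0, b1, ..., b_order].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Columns are 1, X, X^2, ...: the model is linear in the b's.
    design = np.vander(x, N=order + 1, increasing=True)
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

# Invented, noise-free quadratic data.
x = np.linspace(0.0, 5.0, 30)
y = 2.0 + 3.0 * x - 0.5 * x**2

quadratic = fit_polynomial(x, y, order=2)
cubic = fit_polynomial(x, y, order=3)
print(quadratic)  # b0, b1, b2
print(cubic)      # b0, b1, b2, b3
```

On these data the cubic term comes out at essentially zero, so the extra parameter buys nothing.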

Quadratic model

Quadratic residuals
Better… but not so good at lower values of X
Try a more complicated model, like a cubic

Cubic model
Note the double curvature
The model appears to explain the lower values better
But how sure are we of the increase at higher values?

Cubic residuals
Better than the quadratic
But still over-estimating the lowest values of X

Log transform the X variable
The model is Y~log(X)
It appears to explain the data very well across the full range
Check the residuals…

Y~log(X) residuals
Now these look pretty near perfect
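The Y~log(X) model is still a straight-line fit, just against log(X) instead of X. A minimal sketch with NumPy, using invented noise-free data rather than the data behind the slides, compares the residual sum of squares of the two fits.

```python
import numpy as np

# Invented data that follow a log curve exactly (illustration only).
x = np.arange(1.0, 41.0)
y = 4.0 + 6.0 * np.log(x)

# Straight line in log(X): np.polyfit returns [slope, intercept] for deg=1.
slope, intercept = np.polyfit(np.log(x), y, deg=1)
resid_log = y - (intercept + slope * np.log(x))

# Straight line in untransformed X, for comparison.
b1, b0 = np.polyfit(x, y, deg=1)
resid_line = y - (b0 + b1 * x)

rss_log = float(np.sum(resid_log**2))
rss_line = float(np.sum(resid_line**2))
print(rss_log, rss_line)
```

Both models use two coefficients, so the improvement here comes from choosing the right shape and not from adding parameters.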

The null model
Consists of a mean and a variance only
It gives us a benchmark against which we can test our models that include more information
If we can’t do better than the null model then we don’t understand our data or system!
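As a small worked example, the null model can be sketched with Python's standard library. The five response values are made up for illustration. Every prediction is the mean, so the residuals are simply the deviations from it.

```python
import statistics

# Made-up response values (illustration only).
y = [3.0, 7.0, 8.0, 12.0, 15.0]

mean_y = statistics.mean(y)     # the only fitted coefficient: the intercept
var_y = statistics.variance(y)  # sample variance (divides by n - 1)

# The null model predicts mean_y for every observation.
residuals = [yi - mean_y for yi in y]

print(mean_y, var_y)
print(residuals)
```

Any model with predictors should leave less residual variation than this benchmark.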

Residuals of the null model

Choosing between alternative models
We now have a choice between 5 models
– Null model (zero order polynomial, which includes an intercept only, i.e. just a mean and variance model)
– Straight line (first order polynomial)
– Quadratic (second order polynomial)
– Cubic (third order polynomial)
– First order polynomial with log(X)
How do we select which one to use?
– Higher order polynomials require more parameters

Parsimony as a central tenet
Parsimony is the application of the simplest explanation for a phenomenon, and it underpins all of science
So we need to pick the model that
– Fits the data the best, and…
– Uses the fewest parameters

Likelihood of data

AIC for model selection
We will use Akaike’s Information Criterion (AIC) to select the most suitable model
$\mathrm{AIC} = -2\log(\text{likelihood}) + 2k$
– The log-likelihood gets bigger the better the fit
– $k$ is the number of parameters in the model
Lower AIC = more suitable model
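The formula can be sketched in a few lines of Python. This assumes a least-squares model with normally distributed errors, so the maximised log-likelihood follows from the residuals alone, and it assumes that k counts the residual variance as well as the coefficients (a common convention, but check what your software does). The function names and residual values are hypothetical.

```python
import math

def gaussian_log_likelihood(residuals):
    """Maximised log-likelihood of a normal-errors model, given its residuals."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    sigma2 = rss / n  # maximum-likelihood estimate of the error variance
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)

def aic(log_likelihood, k):
    """AIC = -2 * log(likelihood) + 2k."""
    return -2.0 * log_likelihood + 2.0 * k

# Made-up residuals from some fitted model (illustration only).
residuals = [1.0, -1.0, 1.0, -1.0]
log_lik = gaussian_log_likelihood(residuals)

# Same fit, different parameter counts: the extra parameters are penalised.
aic_simple = aic(log_lik, k=2)
aic_complex = aic(log_lik, k=4)
print(log_lik, aic_simple, aic_complex)
```

With identical fit, each extra parameter raises AIC by 2, which is how the criterion builds in parsimony.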

AIC of our models
– Null model: 248.2
– Straight line: 184.1
– Quadratic: 142.5
– Cubic: 124.9
– 4th order: 83.5
– 5th order: 77.6
– 6th order: 77.7
– log(X): 68.4
So the log(X) model is the best in this case
Note that adding more orders to the polynomials ceases to confer any benefit after 5th order
Also… these get increasingly difficult to explain and relate to biological phenomena
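Ranking the models by AIC can be sketched as below. The values are those listed on the slide, reading the dashes as separators and not as minus signs (an assumption, but the only reading consistent with the log(X) model being the best). The difference from the best model, often called delta AIC, is added here as a convenience and is not part of the original slides.

```python
# AIC values as listed on the slide (dashes read as separators).
aic_by_model = {
    "Null model": 248.2,
    "Straight line": 184.1,
    "Quadratic": 142.5,
    "Cubic": 124.9,
    "4th order": 83.5,
    "5th order": 77.6,
    "6th order": 77.7,
    "log(X)": 68.4,
}

# The most suitable model is the one with the lowest AIC.
best = min(aic_by_model, key=aic_by_model.get)

# Difference from the best model, rounded to the precision of the table.
delta = {name: round(value - aic_by_model[best], 1)
         for name, value in aic_by_model.items()}

for name in sorted(aic_by_model, key=aic_by_model.get):
    print(f"{name:14s} AIC = {aic_by_model[name]:6.1f}  delta = {delta[name]:6.1f}")
```

The 6th order polynomial scores slightly worse than the 5th, which is the point at which extra terms stop paying for themselves.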

Conclusions
AIC provides an objective way to compare alternative models
Lower AIC indicates a more parsimonious model
AIC must only be compared between models of the exact same response variable
It only provides a relative, and not an absolute, indication of model fit
– Still need to check that the model is any good
– Residuals etc…
