R for Macroecology: Tests and models

R for Macroecology Tests and models

Smileys

Homework
- Solutions to the color assignment problem?

Statistical tests in R
- This is why we are using R (and not C++)!
- t.test()
- aov()
- lm()
- glm()
- And many more

Read the documentation!
- With statistical tests, it's particularly important to read and understand the documentation of each function you use.
- They may do complicated things with their options, and you want to make sure they do what you want.
- Default behaviors can change (with, e.g., sample size).

Returns from statistical tests
- Statistical tests are functions, so they return objects

> x = 1:10
> y = 3:12
> t.test(x,y)

        Welch Two Sample t-test

data:  x and y
t = -1.4771, df = 18, p-value = 0.1569
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -4.844662  0.844662
sample estimates:
mean of x mean of y 
      5.5       7.5 

Returns from statistical tests
- Statistical tests are functions, so they return objects

> x = 1:10
> y = 3:12
> test = t.test(x,y)
> str(test)
List of 9
 $ statistic  : Named num -1.48
  ..- attr(*, "names")= chr "t"
 $ parameter  : Named num 18
  ..- attr(*, "names")= chr "df"
 $ p.value    : num 0.157
 $ conf.int   : atomic [1:2] -4.84 0.845
  ..- attr(*, "conf.level")= num 0.95
 $ estimate   : Named num [1:2] 5.5 7.5
  ..- attr(*, "names")= chr [1:2] "mean of x" "mean of y"
 $ null.value : Named num 0
  ..- attr(*, "names")= chr "difference in means"
 $ alternative: chr "two.sided"
 $ method     : chr "Welch Two Sample t-test"
 $ data.name  : chr "x and y"
 - attr(*, "class")= chr "htest"

t.test() returns a list

Returns from statistical tests
- Getting the results out
- This hopefully looks familiar after last week's debugging

> x = 1:10
> y = 3:12
> test = t.test(x,y)
> test$p.value
[1] 0.1569323
> test$conf.int[2]
[1] 0.8446618
> test[[3]]
[1] 0.1569323
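Since each test object is just a list, the same extraction works inside a loop or an apply call. A minimal sketch (the data frame and its column names are invented for illustration):

```r
# Run a t-test of each column against a reference sample
# and collect the p-values into one named vector
set.seed(1)
dat <- data.frame(a = rnorm(20), b = rnorm(20, 1), c = rnorm(20, 2))
ref <- rnorm(20)

pvals <- sapply(dat, function(col) t.test(col, ref)$p.value)
pvals  # named numeric vector, one p-value per column
```

The same pattern works with any component of the list (conf.int, estimate, and so on).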

Model specification
- Models in R use a common syntax:
- Y ~ X1 + X2 + X3 + ... + Xi
- means Y is modeled as a linear function of X1 through Xi
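Formulas are ordinary R objects, so they can be stored in a variable and reused before any fitting happens. A small sketch with made-up variable names:

```r
f <- y ~ x1 + x2          # a formula object; nothing is evaluated yet
class(f)                  # "formula"
all.vars(f)               # "y" "x1" "x2"

# The same stored formula can be handed to a fitting function later
set.seed(1)
d <- data.frame(x1 = runif(30), x2 = runif(30))
d$y <- 2 * d$x1 - d$x2 + rnorm(30, sd = 0.1)
fit <- lm(f, data = d)
coef(fit)                 # intercept plus one coefficient per predictor
```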

Linear models
- Basic linear models are fit with lm()
- Again, lm() returns a list

> x = 1:10
> y = 3:12
> test = lm(y ~ x)
> test

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x  
          2            1  

Linear models
- summary() is helpful for looking at a model

> summary(test)

Call:
lm(formula = y ~ x)

Residuals:
(the residuals here are numerical noise, on the order of 1e-16, because y = x + 2 fits exactly)

Coefficients:
             Estimate Std. Error   t value Pr(>|t|)    
(Intercept) 2.000e+00  4.019e-16 4.977e+15   <2e-16 ***
x           1.000e+00  6.477e-17 1.544e+16   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.883e-16 on 8 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 2.384e+32 on 1 and 8 DF, p-value: < 2.2e-16

Extracting coefficients, P values
- For the list (e.g. "test") returned by lm(), test$coefficients gives the coefficients, but not the std. error or p-value.
- Instead, use summary(test)$coefficients
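For example, the slope's standard error and p-value can be pulled out of the coefficient matrix by row and column name; a minimal sketch with simulated data:

```r
set.seed(1)
x <- runif(50)
y <- 3 * x + rnorm(50)
fit <- lm(y ~ x)

# A matrix with columns: Estimate, Std. Error, t value, Pr(>|t|)
ctab <- summary(fit)$coefficients
ctab["x", "Estimate"]    # the slope
ctab["x", "Std. Error"]  # its standard error
ctab["x", "Pr(>|t|)"]    # its p-value
```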

Model specification - interactions
- Interactions are specified with a * or a :
- X1 * X2 means X1 + X2 + X1:X2
- (X1 + X2 + X3)^2 means each term and all second-order interactions
- "-" removes terms
- constants are included by default, but can be removed with "-1"
- more help is available via ?formula
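These expansion rules can be checked directly with terms(), which parses a formula without needing any data. A quick illustration (the predictor names are arbitrary):

```r
# x1 * x2 expands to both main effects plus their interaction
t_star <- attr(terms(y ~ x1 * x2), "term.labels")
t_star   # "x1" "x2" "x1:x2"

# (x1 + x2 + x3)^2 gives all main effects and all two-way interactions
t_sq <- attr(terms(y ~ (x1 + x2 + x3)^2), "term.labels")
t_sq

# "-" removes a term again
t_minus <- attr(terms(y ~ x1 * x2 - x1:x2), "term.labels")
t_minus  # back to just "x1" "x2"
```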

Quadratic terms
- Because ^2 means something specific in a model formula, if you want to square one of your predictors you have to wrap it in I():

> x = 1:10
> y = 3:12
> test = lm(y ~ x + x^2)
> test$coefficients
(Intercept)           x 
          2           1 
> test = lm(y ~ x + I(x^2))
> test$coefficients
(Intercept)           x      I(x^2) 
          2           1          ~0 
(the I(x^2) coefficient is numerical noise, on the order of 1e-17)
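A related option, not covered above, is poly(), which builds the linear and quadratic terms in one call (orthogonal polynomials by default). A sketch with simulated data that actually has curvature:

```r
set.seed(1)
x <- runif(50, -2, 2)
y <- 1 + x + 2 * x^2 + rnorm(50, sd = 0.2)

fit1 <- lm(y ~ x + I(x^2))   # raw quadratic via I()
fit2 <- lm(y ~ poly(x, 2))   # orthogonal polynomials
coef(fit1)["I(x^2)"]         # close to the true value of 2
```

The individual coefficients differ between the two parameterizations, but the fitted values are identical.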

A break to try things out
- t test
- anova
- linear models

Plotting a test object
- plot(t.test(x,y)) does nothing useful: there is no plot method for test (htest) objects
- plot(lm(y~x)) plots diagnostic graphs
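The four diagnostic plots can be drawn in a single window with par(mfrow = c(2, 2)). A minimal sketch (written to a temporary PDF file so it also runs non-interactively):

```r
set.seed(1)
x <- runif(50)
y <- 2 * x + rnorm(50)
fit <- lm(y ~ x)

f <- tempfile(fileext = ".pdf")  # throwaway file for the graphics device
pdf(f)
par(mfrow = c(2, 2))             # 2x2 grid for the four diagnostics
plot(fit)                        # residuals vs fitted, Q-Q, scale-location, leverage
dev.off()
```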

Forward and backward selection
- Uses the step() function
- object – the starting model
- scope – specifies the range of models to consider
- direction – "backward", "forward", or "both"
- trace – print progress to the screen?
- steps – set a maximum number of steps
- k – penalization for adding variables (2 means AIC)

> x1 = runif(100)
> x2 = runif(100)
> x3 = runif(100)
> x4 = runif(100)
> x5 = runif(100)
> x6 = runif(100)
> y = x1 + x2 + x3 + runif(100)
> model = step(lm(y~1), scope = y~x1+x2+x3+x4+x5+x6, direction = "both", trace = F)
> summary(model)

Call:
lm(formula = y ~ x2 + x3 + x1)

(coefficient table and fit statistics omitted here: the numbers change with each runif() draw; in the run shown, step() kept x1, x2 and x3, all highly significant, and dropped x4, x5 and x6)

All subsets selection

leaps(x=, y=, wt=rep(1, NROW(x)), int=TRUE, method=c("Cp", "adjr2", "r2"),
      nbest=10, names=NULL, df=NROW(x), strictly.compatible=TRUE)

- x – a matrix of predictors
- y – a vector of the response
- method – how to compare models (Mallows' Cp, adjusted R2, or R2)
- nbest – number of models of each size to return

> Xmat = cbind(x1,x2,x3,x4,x5,x6)
> leaps(x = Xmat, y, method = "Cp", nbest = 2)
$which
      1     2     3     4     5     6
1 FALSE  TRUE FALSE FALSE FALSE FALSE
1  TRUE FALSE FALSE FALSE FALSE FALSE
2  TRUE FALSE  TRUE FALSE FALSE FALSE
2 FALSE  TRUE  TRUE FALSE FALSE FALSE
3  TRUE  TRUE  TRUE FALSE FALSE FALSE
3 FALSE  TRUE  TRUE  TRUE FALSE FALSE
4  TRUE  TRUE  TRUE  TRUE FALSE FALSE
4  TRUE  TRUE  TRUE FALSE  TRUE FALSE
5  TRUE  TRUE  TRUE  TRUE  TRUE FALSE
5  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
6  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

$label
[1] "(Intercept)" "1" "2" "3" "4" "5" "6"

$size
[1] 2 2 3 3 4 4 5 5 6 6 7

$Cp
(Cp values omitted: they change with each runif() draw)

> leapOut = leaps(x = Xmat, y, method = "Cp", nbest = 2)

> Xmat = cbind(x1,x2,x3,x4,x5,x6)
> leapOut = leaps(x = Xmat, y, method = "Cp", nbest = 2)
> aicVals = NULL
> for(i in 1:nrow(leapOut$which))
+ {
+   model = as.formula(paste("y~", paste(c("x1","x2","x3","x4","x5","x6")[leapOut$which[i,]], collapse = "+")))
+   test = lm(model)
+   aicVals[i] = AIC(test)
+ }
> aicVals
(11 AIC values, one per candidate model; the numbers change with each runif() draw)
> i = 8
> leapOut$which[i,]
    1     2     3     4     5     6 
 TRUE  TRUE  TRUE FALSE  TRUE FALSE 
> c("x1","x2","x3","x4","x5","x6")[leapOut$which[i,]]
[1] "x1" "x2" "x3" "x5"
> paste("y~", paste(c("x1","x2","x3","x4","x5","x6")[leapOut$which[i,]], collapse = "+"))
[1] "y~ x1+x2+x3+x5"

Comparing AIC of the best models

> data.frame(leapOut$which, aicVals)
      X1    X2    X3    X4    X5    X6 aicVals
1  FALSE  TRUE FALSE FALSE FALSE FALSE     ...
2   TRUE FALSE FALSE FALSE FALSE FALSE     ...
3   TRUE FALSE  TRUE FALSE FALSE FALSE     ...
4  FALSE  TRUE  TRUE FALSE FALSE FALSE     ...
5   TRUE  TRUE  TRUE FALSE FALSE FALSE     ...
6  FALSE  TRUE  TRUE  TRUE FALSE FALSE     ...
7   TRUE  TRUE  TRUE  TRUE FALSE FALSE     ...
8   TRUE  TRUE  TRUE FALSE  TRUE FALSE     ...
9   TRUE  TRUE  TRUE  TRUE  TRUE FALSE     ...
10  TRUE  TRUE  TRUE  TRUE FALSE  TRUE     ...
11  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE     ...
(the aicVals column is omitted: the values change with each runif() draw)

Practice with the mammal data
- VIF, lm(), AIC(), leaps()
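For the VIF part, car::vif() is the standard tool, but the quantity is easy to compute from scratch: regress each predictor on all the others and take 1/(1 - R^2). A base-R sketch (vif_manual is a hypothetical helper name, and the data here are simulated, not the mammal data):

```r
set.seed(1)
x1 <- runif(100)
x2 <- runif(100)
x3 <- x1 + x2 + rnorm(100, sd = 0.1)  # deliberately collinear with x1 and x2
X <- data.frame(x1, x2, x3)

# VIF of predictor i = 1 / (1 - R^2) from regressing it on the other predictors
vif_manual <- function(X) {
  sapply(seq_along(X), function(i) {
    r2 <- summary(lm(X[[i]] ~ ., data = X[-i]))$r.squared
    1 / (1 - r2)
  })
}
v <- vif_manual(X)
names(v) <- names(X)
v  # x3 (and its parents x1, x2) should show clearly inflated values
```

A common rule of thumb is to worry about predictors with VIF above roughly 5 to 10.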