Regression and Analysis of Variance: Linear Models in R
What do you know already?
Regression
Continuous dependent variable
Continuous independent variable
Assumptions: normality, independence, constant variance, i.e. errors ~ N(0, σ²)
Relationship is linear or curvilinear
ANOVA
Continuous dependent variable
Discrete independent variable
Assumptions: normality, independence, constant variance, i.e. errors ~ N(0, σ²)
Factor level variances are equal
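The assumptions listed above are usually checked on the residuals of a fitted model. The following is a minimal sketch in R of one common way to do that. The data are simulated and the names `x`, `g`, `y`, `fit` and `res` are illustrative assumptions, not part of the course data.

```r
set.seed(1)

# Simulated stand-in data (hypothetical): one continuous and one discrete predictor
n <- 30
x <- runif(n, 0, 10)
g <- factor(rep(c("A", "B", "C"), each = 10))
y <- 2 + 0.5 * x + rnorm(n, sd = 1)

fit <- lm(y ~ x)
res <- residuals(fit)

# Normality: Q-Q plot and Shapiro-Wilk test of the residuals
qqnorm(res)
qqline(res)
sw <- shapiro.test(res)

# Constant variance and linearity: residuals against fitted values
# should scatter evenly around zero with no trend or funnel shape
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Discrete predictor: Bartlett's test of equal variances across factor levels
bt <- bartlett.test(y ~ g)

# Independence cannot be read off a plot; it comes from how the data were collected
```

Calling `plot(fit)` on the fitted model gives a similar set of diagnostic plots in one step.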
Linear Models
Regression and ANOVA (and in fact ANCOVA) are all related mathematically to one another. Exactly the same mathematics is used throughout. The only difference is the type (and number) of independent variables that you are working with. The same base assumptions are required for all linear models.
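One way to see that regression, ANOVA and ANCOVA share the same mathematics is to fit all three with the same function, `lm()`, and look at the design matrices it builds. This is a sketch on simulated data; the variable names and the simulated effects are assumptions made purely for illustration.

```r
set.seed(42)

# Simulated data (hypothetical), for illustration only
n <- 30
x <- runif(n, 0, 10)                           # continuous independent variable
g <- factor(rep(c("A", "B", "C"), each = 10))  # discrete independent variable
y <- 1 + 2 * x + c(0, 1, 3)[as.integer(g)] + rnorm(n)

reg_fit    <- lm(y ~ x)      # regression: continuous predictor
aov_fit    <- lm(y ~ g)      # ANOVA: discrete predictor
ancova_fit <- lm(y ~ g + x)  # ANCOVA: one of each

# The only difference is the design matrix:
# a numeric column for x, indicator (dummy) columns for the levels of g
head(model.matrix(reg_fit), 3)
head(model.matrix(aov_fit), 3)

# aov() is a wrapper around the same fit, so the F statistics agree
F_lm  <- anova(aov_fit)[["F value"]][1]
F_aov <- summary(aov(y ~ g))[[1]][["F value"]][1]
all.equal(F_lm, F_aov)
```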
What procedure are we going to use to analyse linear model data?
Wagga House Prices
A Wagga Wagga real estate agent wishes to use data from 30 recent house sales to predict future selling prices ($000) from land area (m²). The data were collected from internet real estate listings that included both the land size and the listing price. Most of the included listings were for houses with 2 bedrooms, 1 bathroom and 1 garage.
Call:
lm(formula = Price ~ Land, data = dat)

Residuals:
Min 1Q Median 3Q Max

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)
Land e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: on 28 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 1 and 28 DF, p-value: 3.141e-07
anova(dat.lm)

Analysis of Variance Table

Response: Price
Df Sum Sq Mean Sq F value Pr(>F)
Land e-07 ***
Residuals
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
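Output of this kind comes from fitting the model with `lm()` and then calling `summary()` and `anova()` on the result. The sketch below reuses the names `dat`, `Price`, `Land` and `dat.lm` from the output, but the data values are simulated stand-ins (the real sale data are not reproduced here), so the numbers it prints will not match the slides. The names `t_land`, `F_land`, `new_block` and `pred`, and the 800 m² block, are assumptions added for illustration.

```r
set.seed(2024)

# Hypothetical stand-in for the 30 house sales (made-up values and relationship)
dat <- data.frame(Land = round(runif(30, 400, 1200)))    # land area, m^2
dat$Price <- 150 + 0.25 * dat$Land + rnorm(30, sd = 40)  # price, $000

dat.lm <- lm(Price ~ Land, data = dat)
summary(dat.lm)  # coefficient table, residual standard error, R-squared
anova(dat.lm)    # analysis of variance table for the same fit

# With a single predictor, the ANOVA F statistic is the square of the
# slope's t statistic, which is why both outputs report the same p-value
t_land <- summary(dat.lm)$coefficients["Land", "t value"]
F_land <- anova(dat.lm)["Land", "F value"]
all.equal(t_land^2, F_land)

# Predicted price, with a 95% prediction interval, for a hypothetical 800 m^2 block
new_block <- data.frame(Land = 800)
pred <- predict(dat.lm, newdata = new_block, interval = "prediction")
pred
```

With 30 observations and two estimated coefficients, the fit has 28 residual degrees of freedom, matching the output above.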
Bottlenose Dolphins
Neonate bottlenose dolphins produce many sounds just after birth. Prior to suckling these sounds intensify, and then, as the neonate prepares to feed, the sounds cease; this is called a latency period (LP). It is thought that the LP is related to the suckling frequency. A study was conducted to collect information on the length of the LP and the suckling frequency, with the aim of defining this relationship, if it exists.
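A possible starting point for a question like this is to plot the two variables, then compare a straight-line fit with a curvilinear one. The sketch below is built on assumptions throughout: the data frame `dolphin`, its columns `LP` and `Suckling`, the simulated values, and the choice of suckling frequency as the response are all invented for illustration and are not taken from the study.

```r
set.seed(7)

# Hypothetical data: column names and values are invented for illustration
dolphin <- data.frame(LP = runif(25, 5, 60))                   # latency period length
dolphin$Suckling <- 20 - 0.2 * dolphin$LP + rnorm(25, sd = 2)  # suckling frequency

# Look at the shape of the relationship first
plot(Suckling ~ LP, data = dolphin)

lin_fit  <- lm(Suckling ~ LP, data = dolphin)            # straight line
quad_fit <- lm(Suckling ~ LP + I(LP^2), data = dolphin)  # curvilinear, still a linear model

# F test for the nested models: does the quadratic term improve on the straight line?
cmp <- anova(lin_fit, quad_fit)
cmp
```

A quadratic term keeps the model linear in its coefficients, so the same `lm()` machinery and assumptions apply.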
Johne’s Disease
To eliminate Johne’s disease from an infected farm, or to prevent transmission, it is essential that susceptible animals are not exposed to an environment contaminated with the bacterium that causes it. The bacterium is capable of persisting in the environment for long periods due to the high lipid content of its cell wall and the metabolic inactivity of the organism. Factors that could influence its survival in the soil, including temperature, pH, organic matter, exposure to ultraviolet light and moisture content, were investigated under controlled conditions.
Johne’s Disease continued
This experiment involved trays of contaminated soil randomised to 12 unique treatments, which varied the pH, UV light and moisture content. They are uniquely defined as Treatments 1 to 12. The treatments were allocated to the trays of soil in a completely randomised fashion so that each treatment was replicated 5 times. The response measured was ln(number of bacteria remaining), as an indication of the effectiveness of the treatment. The aim of the experiment is to determine the “best” treatment for removing the bacterium from the soil.
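A completely randomised design with 12 treatments and 5 replicates uses 60 trays, and it can be analysed as a one-way ANOVA with treatment as a factor. The sketch below shows one way to randomise and analyse such a layout in R. The data frame `trays`, the column names `Tray`, `Treatment` and `lnCount`, and all response values are hypothetical, simulated only so that the code runs.

```r
set.seed(12)

# Completely randomised design: 12 treatments x 5 replicates = 60 trays,
# with treatment labels shuffled at random across the trays
trays <- data.frame(Tray = 1:60,
                    Treatment = factor(sample(rep(1:12, each = 5))))

# Hypothetical response (log count of organisms remaining), simulated, not real data
trays$lnCount <- rnorm(60, mean = 10 - 0.3 * as.integer(trays$Treatment), sd = 1)

# One-way ANOVA fitted as a linear model with a 12-level factor
fit <- lm(lnCount ~ Treatment, data = trays)
anova(fit)  # overall F test on 11 and 48 degrees of freedom

# Treatment means, lowest first (a lower log count means fewer organisms survived)
sort(tapply(trays$lnCount, trays$Treatment, mean))

# All pairwise comparisons with Tukey's adjustment (requires an aov fit)
tuk <- TukeyHSD(aov(lnCount ~ Treatment, data = trays))
head(tuk$Treatment)
```

A significant overall F test says only that the treatments differ; pairwise comparisons like these are one common way to judge which treatment could be called “best”.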