Some Analysis of Some Perch Catch Data
56 perch were caught in a freshwater lake in Finland.
Their weights, lengths, heights and widths were recorded.
It may be anticipated that the fish's weights depend on their lengths, heights and widths, whose product is a proxy for volume.
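One way to read the volume proxy, as a sketch: if weight is roughly proportional to length × height × width, taking logs turns the product into a sum. The constant c and the coefficients b_0, ..., b_3 below are notation introduced here for illustration, not taken from the slides.

```latex
\begin{align*}
\text{weight} &\approx c \cdot \text{length} \cdot \text{height} \cdot \text{width} \\
\log(\text{weight}) &\approx \log c + \log(\text{length}) + \log(\text{height}) + \log(\text{width}) \\
\log(\text{weight}) &= b_0 + b_1 \log(\text{length}) + b_2 \log(\text{height}) + b_3 \log(\text{width}) + \text{error}
\end{align*}
```

The third line relaxes the second by letting the coefficients differ from 1, which gives a linear model in the logged variables.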
Some questions/goals:
summary
outliers
prediction
interpretation of coefficients
linearity, Gaussian errors
preparation for a comparative study
presentation of results
...
Some of the data.
Columns: Weight (g), Length (cm), Height (cm), Width (cm)
summary(weight)
Min. 1st Qu. Median Mean 3rd Qu. Max.
stem()
Two stem-and-leaf displays, the second with more stems. In each: The decimal point is 2 digit(s) to the right of the |
ecdf()
qqnorm()
density()
boxplot()
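The univariate displays named on these slides can be sketched as follows. The perch measurements are not reproduced here, so the block simulates 56 stand-in fish; the simulated values, the constants 0.28, 0.17 and 0.35, and the noise levels are assumptions made for illustration only.

```r
# Simulated stand-in for the perch measurements (NOT the real data).
set.seed(1)
n <- 56
length <- exp(rnorm(n, mean = log(25), sd = 0.3))                     # cm
height <- 0.28 * length * exp(rnorm(n, sd = 0.05))                    # cm
width  <- 0.17 * length * exp(rnorm(n, sd = 0.06))                    # cm
weight <- 0.35 * length * height * width * exp(rnorm(n, sd = 0.08))   # g

summary(weight)   # five-number summary plus the mean
stem(weight)      # stem-and-leaf display

op <- par(mfrow = c(2, 2))                 # 2 x 2 grid of plots
plot(ecdf(weight), main = "ecdf(weight)")  # empirical distribution function
qqnorm(weight); qqline(weight)             # normal probability plot
plot(density(weight), main = "density(weight)")
boxplot(weight, main = "boxplot(weight)")
par(op)                                    # restore graphics settings
```

Right-skewed weights show up as a curved normal probability plot and a long upper whisker, which is one common motivation for working on the log scale.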
library(lattice)
splom()
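A scatterplot matrix of the four measurements might be drawn as below. Again the data are a simulated stand-in (an assumption, not the perch data), and the data frame name perch is hypothetical.

```r
library(lattice)

# Simulated stand-in for the perch measurements (NOT the real data).
set.seed(1)
n <- 56
length <- exp(rnorm(n, mean = log(25), sd = 0.3))
height <- 0.28 * length * exp(rnorm(n, sd = 0.05))
width  <- 0.17 * length * exp(rnorm(n, sd = 0.06))
weight <- 0.35 * length * height * width * exp(rnorm(n, sd = 0.08))

perch <- data.frame(weight, length, height, width)  # hypothetical name

# splom() returns a trellis object; print() draws it, which is
# required when the code runs inside a script or a function.
p <- splom(~ perch, main = "Scatterplot matrix (simulated data)")
print(p)
```

Curved weight panels and tightly clustered length/height/width panels are the features to look for: the first suggests a transformation, the second warns of collinearity.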
plot()
qqnorm()
summary(junk2)

Call:
lm(formula = logweight ~ loglength + logheight + logwidth)

Residuals:
Min 1Q Median 3Q Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                             e-07 ***
loglength                               e-09 ***
logheight                                    ***
logwidth                                     **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: on 52 degrees of freedom
Multiple R-squared: 0.994, Adjusted R-squared:
F-statistic: 2890 on 3 and 52 DF, p-value: < 2.2e-16
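A fit of this form can be reproduced in outline as follows. The model formula and the name junk2 come from the slide; the data are simulated stand-ins (an assumption), so the printed estimates will not match the perch results.

```r
# Simulated stand-in for the perch measurements (NOT the real data).
set.seed(1)
n <- 56
length <- exp(rnorm(n, mean = log(25), sd = 0.3))
height <- 0.28 * length * exp(rnorm(n, sd = 0.05))
width  <- 0.17 * length * exp(rnorm(n, sd = 0.06))
weight <- 0.35 * length * height * width * exp(rnorm(n, sd = 0.08))

# Work on the log scale, as in the slide's formula.
logweight <- log(weight)
loglength <- log(length)
logheight <- log(height)
logwidth  <- log(width)

junk2 <- lm(logweight ~ loglength + logheight + logwidth)
summary(junk2)        # coefficients, standard errors, R-squared, F test

df.residual(junk2)    # 56 observations - 4 coefficients = 52

# Normal probability plot of the residuals, to check the Gaussian-errors assumption.
qqnorm(resid(junk2)); qqline(resid(junk2))
```

With 56 fish and four fitted coefficients the residual degrees of freedom are 52, which agrees with the summary output on the slide.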
qqnorm()
anova(junk2)

Analysis of Variance Table

Response: logweight
          Df Sum Sq Mean Sq F value Pr(>F)
loglength                           < 2.2e-16 ***
logheight                           e-07 ***
logwidth                            **
Residuals
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
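The table produced by anova() for a single lm fit is sequential: each row is the sum of squares added by that term given the terms above it. A minimal sketch on simulated stand-in data (an assumption, not the perch data) shows that the rows add up to the total corrected sum of squares and that the split depends on the order of the terms. The names junk2b, total_ss, a and b are hypothetical.

```r
# Simulated stand-in for the perch measurements (NOT the real data).
set.seed(1)
n <- 56
length <- exp(rnorm(n, mean = log(25), sd = 0.3))
height <- 0.28 * length * exp(rnorm(n, sd = 0.05))
width  <- 0.17 * length * exp(rnorm(n, sd = 0.06))
weight <- 0.35 * length * height * width * exp(rnorm(n, sd = 0.08))
logweight <- log(weight); loglength <- log(length)
logheight <- log(height); logwidth  <- log(width)

junk2 <- lm(logweight ~ loglength + logheight + logwidth)
a <- anova(junk2)
a

# Sequential sums of squares plus the residual sum of squares
# equal the total corrected sum of squares of the response.
total_ss <- sum((logweight - mean(logweight))^2)
all.equal(sum(a[["Sum Sq"]]), total_ss)

# Same model with the terms reordered: same residual row,
# different split among the three regressors.
junk2b <- lm(logweight ~ logwidth + logheight + loglength)
b <- anova(junk2b)
b
```

Because the regressors are strongly correlated, whichever term is entered first absorbs most of the explained sum of squares, so the rows of the table should be read in order.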
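Fitted equation: log(weight) = b0 + b1 log(length) + b2 log(height) + b3 log(width), with standard errors (.1690), (.2265), (.2167), (.1803) for b0, b1, b2, b3 respectively.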
Leverage $h_i$ (diagonal of the hat matrix)
library(MASS)
lm.influence()$hat
Studentized residual: $E_i^* = E_i / [S_{(-i)} \sqrt{1 - h_i}]$
qqline(studres())
$E_i^* \sqrt{h_i / (1 - h_i)}$
dffits()
Cook's distance $D_i$
cooks.distance()
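The diagnostics on the last four slides can be computed by hand and checked against the built-in functions. This is a sketch on simulated stand-in data (an assumption, not the perch data); the names h, e, s_del, estar, dff, D and D_manual are hypothetical.

```r
library(MASS)  # for studres()

# Simulated stand-in for the perch measurements (NOT the real data).
set.seed(1)
n <- 56
length <- exp(rnorm(n, mean = log(25), sd = 0.3))
height <- 0.28 * length * exp(rnorm(n, sd = 0.05))
width  <- 0.17 * length * exp(rnorm(n, sd = 0.06))
weight <- 0.35 * length * height * width * exp(rnorm(n, sd = 0.08))
logweight <- log(weight); loglength <- log(length)
logheight <- log(height); logwidth  <- log(width)
junk2 <- lm(logweight ~ loglength + logheight + logwidth)

infl  <- lm.influence(junk2)
h     <- infl$hat          # leverages h_i; they sum to the number of coefficients
e     <- resid(junk2)      # ordinary residuals E_i
s_del <- infl$sigma        # S_(-i): residual SD with case i deleted

# Studentized residuals E_i* = E_i / [S_(-i) sqrt(1 - h_i)]
estar <- e / (s_del * sqrt(1 - h))
all.equal(unname(estar), unname(studres(junk2)))

# DFFITS_i = E_i* sqrt(h_i / (1 - h_i))
dff <- estar * sqrt(h / (1 - h))
all.equal(unname(dff), unname(dffits(junk2)))

# Cook's distance uses the overall residual SD s and p coefficients.
s <- summary(junk2)$sigma
p <- length(coef(junk2))
D_manual <- (e / (s * (1 - h)))^2 * h / p
D <- cooks.distance(junk2)
all.equal(unname(D_manual), unname(D))

qqnorm(studres(junk2)); qqline(studres(junk2))
```

Cases with large leverage, large studentized residual, or both stand out in dffits() and cooks.distance(), which combine the two quantities.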
Added-variable plots
library(car)
avPlots() (named av.plots() in older versions of car)
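An added-variable plot can also be built by hand in base R, which shows what the car function is plotting. The sketch below does this for loglength on simulated stand-in data (an assumption, not the perch data); rx, ry and av_fit are hypothetical names.

```r
# Simulated stand-in for the perch measurements (NOT the real data).
set.seed(1)
n <- 56
length <- exp(rnorm(n, mean = log(25), sd = 0.3))
height <- 0.28 * length * exp(rnorm(n, sd = 0.05))
width  <- 0.17 * length * exp(rnorm(n, sd = 0.06))
weight <- 0.35 * length * height * width * exp(rnorm(n, sd = 0.08))
logweight <- log(weight); loglength <- log(length)
logheight <- log(height); logwidth  <- log(width)
junk2 <- lm(logweight ~ loglength + logheight + logwidth)

# Remove the effect of the other regressors from both the response
# and the regressor of interest, then plot residual against residual.
ry <- resid(lm(logweight ~ logheight + logwidth))
rx <- resid(lm(loglength ~ logheight + logwidth))

plot(rx, ry,
     xlab = "loglength | others", ylab = "logweight | others",
     main = "Added-variable plot for loglength (simulated data)")
av_fit <- lm(ry ~ rx)
abline(av_fit)

# The slope of this line equals the loglength coefficient in the full model.
all.equal(unname(coef(av_fit)[2]), unname(coef(junk2)["loglength"]))
```

A narrow horizontal spread in rx means loglength carries little information beyond the other two regressors, which is another view of the collinearity question.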
junk3 <- cbind(length - mean(length), width - mean(width), height - mean(height))
cor(junk3)
     [,1] [,2] [,3]
[1,]
[2,]
[3,]
Is X'X near singular? That would make interpretation of the coefficients difficult.
junk3 <- cbind(length - mean(length), width - mean(width), height - mean(height))
junk4 <- svd(junk3)
junk4$d
[1]
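The singular values can be turned into a single number for judging near-singularity. The sketch below uses simulated stand-in data (an assumption, not the perch data); junk3 and junk4 follow the slide, while cond is a hypothetical name.

```r
# Simulated stand-in for the perch measurements (NOT the real data).
set.seed(1)
n <- 56
length <- exp(rnorm(n, mean = log(25), sd = 0.3))
height <- 0.28 * length * exp(rnorm(n, sd = 0.05))
width  <- 0.17 * length * exp(rnorm(n, sd = 0.06))

# Centred regressor matrix, as on the slide.
junk3 <- cbind(length - mean(length), width - mean(width), height - mean(height))
cor(junk3)            # pairwise correlations of the regressors

junk4 <- svd(junk3)
junk4$d               # singular values, largest first

# The squared singular values are the eigenvalues of X'X.
all.equal(junk4$d^2, eigen(crossprod(junk3), symmetric = TRUE)$values)

# Condition number: ratio of largest to smallest singular value.
cond <- junk4$d[1] / junk4$d[3]
cond
all.equal(cond, kappa(junk3, exact = TRUE))
```

A large ratio means some linear combination of the regressors is nearly constant, so X'X is close to singular and individual coefficients are poorly determined even when the overall fit is good.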
Conclusions.
Weight can be replaced by the product of the lengths.
Traditional linear model results are not strongly invalidated.
Began with EDA, to look for unusual "things", then moved on to the linear model.
...