1
Descriptive Stat and Correlation
Homework #1 and #2: Descriptive Statistics and Correlation
2
What Determines Housing Price?
Industrial: the mean is slightly higher than the median, suggesting that the distribution is slightly right-skewed. It is odd that so many values equal 18.1.
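The mean-versus-median check and the count of repeated values can be sketched in R. The vector below is made up for illustration and is not the real industrial variable; only the value 18.1 is taken from the slide.

```r
# Hypothetical values standing in for the industrial variable (not the real data)
indus <- c(1.2, 2.5, 3.4, 4.0, 5.2, 6.9, 8.1, 18.1, 18.1, 18.1, 18.1, 27.7)

indus_mean <- mean(indus)
indus_median <- median(indus)

# A mean above the median hints at a right (positive) skew
mean_above_median <- indus_mean > indus_median

# Frequency table, sorted so the most repeated value comes first
value_counts <- sort(table(indus), decreasing = TRUE)
most_common <- as.numeric(names(value_counts)[1])

print(c(mean = indus_mean, median = indus_median))
print(head(value_counts, 3))
```

A single value that dominates the frequency table is worth investigating before interpreting the skew, since it may reflect how the data were recorded rather than the underlying distribution.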
3
Correlation Matrix Chart from Prior Slide
4
Correlation and Outliers
5
Correlation and Outliers
Analysis with all data =
Outliers: rationale or story? Sometimes don't know
How to remove outliers
Without outliers = ???
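How much one extreme point can move a correlation is easy to see on toy data. The vectors and the cutoff below are invented for illustration; they are not the housing data or the slide's results.

```r
# Hypothetical data: a clear negative trend plus one extreme point
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 40)
y <- c(30, 28, 27, 25, 24, 22, 21, 19, 50)

# Correlation using every observation
r_all <- cor(x, y)

# Correlation after dropping the extreme point (cutoff chosen for illustration)
keep <- x < 37
r_without <- cor(x[keep], y[keep])

print(c(all_data = r_all, without_outlier = r_without))
```

In this sketch the sign of the correlation flips once the extreme point is dropped, which is why the decision to keep or remove an outlier needs a stated rationale.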
6
Normal Distribution & Outliers
7
Housing Price: Outliers
Tukey Outlier: Keep or Remove?
hist(bh$hprice, col = "green")
plot(density(bh$hprice), ylim = c(0, 0.07), col = "blue")
curve(dnorm(x, mean = mean(bh$hprice), sd = sd(bh$hprice)), col = "red", add = TRUE)
More:
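A minimal sketch of Tukey's rule, which flags values more than 1.5 times the interquartile range below the first quartile or above the third. The prices below are hypothetical, not the bh data, and quantile() uses R's default method.

```r
# Hypothetical housing prices (not the real bh$hprice)
hprice <- c(12, 15, 17, 19, 20, 21, 22, 23, 25, 27, 50)

# Quartiles and interquartile range
q <- quantile(hprice, probs = c(0.25, 0.75))
iqr <- IQR(hprice)

# Tukey fences: 1.5 * IQR beyond the quartiles
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[2]] + 1.5 * iqr

# Flag observations outside the fences
is_outlier <- hprice < lower_fence | hprice > upper_fence
print(c(lower = lower_fence, upper = upper_fence))
print(hprice[is_outlier])
```

The rule only flags candidates; whether to keep or remove them is still a judgment call.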
8
Normal Distribution (figure: normal curve with axis marks at 4.13, 13.33, 22.53, 31.73, 40.93, 50.13, evenly spaced 9.20 apart)
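The axis values on this slide are spaced 9.20 apart, which is consistent with steps of one standard deviation around a mean of 22.53. Treating those two numbers as assumptions read off the axis, the marks and the usual normal-curve coverage can be reproduced as follows.

```r
# Assumed from the axis marks: middle value as mean, spacing as standard deviation
m <- 22.53
s <- 9.20

# Marks from 2 sd below the mean to 3 sd above it
ticks <- round(m + (-2:3) * s, 2)
print(ticks)

# Share of a normal distribution within 1, 2 and 3 sd of the mean
within_sd <- pnorm(1:3) - pnorm(-(1:3))
print(round(within_sd, 4))
```

Under a normal model almost all observations fall within 3 standard deviations of the mean, so values far beyond that range are candidates for outliers.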
9
Hard to know what trim value to use ahead of time
trim: the fraction (0 to 0.5) of observations to be trimmed from each end of x before the mean is computed. Values of trim outside that range are taken as the nearest endpoint. The default shown here is 0.1; note that base R's mean() defaults to trim = 0, while psych::describe() uses 0.1.
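A small sketch of how the trim argument of base R's mean() behaves, using a made-up vector with one extreme value.

```r
# Hypothetical data with one extreme value
x <- c(2, 4, 5, 6, 7, 9, 9, 10, 11, 100)

# Ordinary mean, pulled upward by the extreme value
plain_mean <- mean(x)

# Drop 10% of observations (here, one value) from each end before averaging
trimmed_mean <- mean(x, trim = 0.1)

# A trim value above 0.5 is treated as 0.5, which gives the median
over_trimmed <- mean(x, trim = 0.7)

print(c(plain = plain_mean, trimmed = trimmed_mean, over = over_trimmed))
```

Comparing the result across a few trim values is one way to see how sensitive the mean is to the tails when no trim value can be fixed in advance.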
10
More Descriptive Statistics
11
Mean, Standard Deviation, and Coefficient of Variation
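The coefficient of variation is the standard deviation divided by the mean. A minimal sketch follows; the helper name coef_var and the sample values are hypothetical.

```r
# Coefficient of variation: standard deviation relative to the mean
# (coef_var is a hypothetical helper name, not a base R function)
coef_var <- function(x) sd(x) / mean(x)

# Hypothetical prices
prices <- c(10, 20, 30)
cv_prices <- coef_var(prices)

# Rescaling the units leaves the coefficient of variation unchanged
cv_rescaled <- coef_var(prices * 1000)

print(c(cv = cv_prices, cv_rescaled = cv_rescaled))
```

Because it is unit-free, the coefficient of variation allows the spread of variables measured on different scales, such as price and crime rate, to be compared.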
12
Crime
13
Removing Outliers: Subset Data
bh.out <- subset(bh, hprice < 37 & crime < 37)
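To see what the subset call does, here is a sketch on a small made-up data frame. The column names follow the slide, but the rows are hypothetical.

```r
# Hypothetical stand-in for the bh data frame
bh <- data.frame(
  hprice = c(21, 24, 50, 18, 33, 15),
  crime  = c(0.1, 3.2, 0.5, 88.9, 5.1, 40.2)
)

# Keep only rows where both variables are below the cutoff of 37
bh.out <- subset(bh, hprice < 37 & crime < 37)

# How many rows were dropped as outliers
n_removed <- nrow(bh) - nrow(bh.out)
print(bh.out)
print(n_removed)
```

Reporting how many rows were removed alongside the results makes it clear how much of the data the later analysis is based on.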
14
Correlation after Removal of Outliers
15
Correlation and Outliers
Analysis with all data =
Outliers: rationale or story? Sometimes don't know
How to remove outliers
Without outliers =
16
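Equation for a line: Y = mX + b
m = slope = change in Y / change in X
b = y-intercept (the value of Y when X = 0)
$$ m = \frac{\Delta Y}{\Delta X} = \frac{29.75 - 16.79}{0 - 20} = \frac{12.96}{-20} $$
(Figure: line plotted with Y on the vertical axis, X on the horizontal axis, and b marked at the intercept.)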
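The slope arithmetic can be checked in R. The sketch reads the two points off the slide's numbers as (0, 29.75) and (20, 16.79); the variable and function names are hypothetical.

```r
# Two points on the line, read from the slide's numbers
x1 <- 0;  y1 <- 29.75
x2 <- 20; y2 <- 16.79

# Slope: change in Y divided by change in X
m <- (y1 - y2) / (x1 - x2)

# Intercept: the value of Y when X = 0, which is the first point
b <- y1

# Hypothetical helper that evaluates the line Y = mX + b
line_y <- function(x) m * x + b

print(c(slope = m, intercept = b))
print(line_y(10))
```

The slope works out to -0.648, so each one-unit increase in X lowers the value of Y on the line by about 0.65.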