1 Outline
Research Question: What determines height?
Data Input
Look at One Variable
Compare Two Variables: Children’s Height and Parents’ Height; Children’s Height and Gender
Graphics Package: ggplot2

2 What factors are most responsible for height? Outcome = (Model) + Error
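One way to write "Outcome = (Model) + Error" for this question, assuming the model is the linear regression listed on the Steps slide (X1 = father's height, X2 = mother's height, X3 = gender). The coefficient names b_0 to b_3 and the 0/1 coding of gender are illustrative assumptions, not taken from the slides:

```latex
\text{childHeight}_i =
  \underbrace{b_0 + b_1\,\text{father}_i + b_2\,\text{mother}_i + b_3\,\text{male}_i}_{\text{model}}
  + \underbrace{\varepsilon_i}_{\text{error}}
```

Here male_i is 1 for sons and 0 for daughters, and ε_i is whatever part of child i's height the model does not explain.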

3 Galton’s Notebook on Families & Height

4 Galton’s Family Height Dataset (predictors X1, X2, X3 and outcome Y)

5 > getwd()
[1] "C:/Users/johnp_000/Documents"
> setwd("path/to/folder")   # placeholder: the folder that contains GaltonFamilies.csv

6 Dataset Input: object <- function("filename")
h <- read.csv("GaltonFamilies.csv")

7 Data Types: Numbers and Factors (Categorical)
str(h)
summary(h)
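A minimal sketch of what str() and summary() report, using a few made-up rows laid out like GaltonFamilies.csv. The column names are assumed from the HistData version of the file and the values are invented. Since R 4.0.0, read.csv() keeps text columns as character unless stringsAsFactors = TRUE is given, so gender only becomes a factor with that argument:

```r
# Made-up rows in a GaltonFamilies-like layout (illustration only, not Galton's data)
csv_text <- "family,father,mother,gender,childHeight
1,70.0,64.0,male,70.5
1,70.0,64.0,female,64.5
2,68.0,63.0,male,68.5
2,68.0,63.0,female,63.0"

# stringsAsFactors = TRUE turns the text column gender into a factor
h_toy <- read.csv(text = csv_text, stringsAsFactors = TRUE)

str(h_toy)      # numeric columns vs. the gender factor
summary(h_toy)  # quartiles and mean for numbers, level counts for the factor
```

Without stringsAsFactors = TRUE, gender would be listed as chr, and factor(h$gender) would be needed wherever a categorical variable is expected.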

8 Outline
One Variable (Univariate): the dependent / outcome variable
Two Variables (Bivariate): the outcome and each predictor
All Four Variables: Multivariate

9 Steps
Variable | Type | Role | Plot
Child’s Height | Continuous | Y | Histogram
Dad’s Height, Mom’s Height | Continuous | X1, X2 | Scatter (Linear Regression)
Gender | Categorical | X3 | Boxplot

10 Frequency Distribution: Histogram
hist(h$childHeight)
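The frequency distribution behind a histogram can also be inspected as numbers. This sketch uses simulated heights (an assumption standing in for h$childHeight) and hist(..., plot = FALSE), which returns the bins without drawing them:

```r
set.seed(1)
sim_height <- rnorm(200, mean = 66, sd = 3.5)  # simulated, not Galton's data

# plot = FALSE returns the binning without drawing anything
hd <- hist(sim_height, plot = FALSE)

# One row per bin: lower break, upper break, and number of values in the bin
freq_table <- data.frame(
  from  = head(hd$breaks, -1),
  to    = tail(hd$breaks, -1),
  count = hd$counts
)
freq_table
```

The counts add up to the number of observations, which is what the bar heights of the default (frequency) histogram show.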

11 Density Plot (area under the curve = 1)
plot(density(h$childHeight))
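A quick numerical check of the "Area = 1" property, sketched on simulated heights (assumed, not the Galton data): density() returns the curve on a grid of x values, and the trapezoid rule over that grid should give an area very close to 1:

```r
set.seed(1)
sim_height <- rnorm(500, mean = 66, sd = 3.5)  # simulated stand-in for h$childHeight

d <- density(sim_height)  # kernel density estimate evaluated on a 512-point grid

# Trapezoid rule: width of each grid step times the average of its two heights
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
area
```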

12 Histogram with a normal curve overlaid (plot annotations: Mode, Bimodal)
hist(h$childHeight, freq = FALSE, breaks = 25, ylim = c(0, 0.14))
curve(dnorm(x, mean = mean(h$childHeight), sd = sd(h$childHeight)), col = "red", add = TRUE)

13 Grammar of Graphics: Seven Components (diagram labels include …formations, Legend, Axes)
ggplot2 is built using the grammar of graphics approach.

14 Hadley Wickham and ggplot2
Asst. Professor of Statistics at Rice University
Packages: ggplot2, plyr, reshape, rggobi, profr
http://ggplot2.org/

15 ggplot2 Plot: Grammar of Graphics
In ggplot2, a plot is made up of layers.
Layer: Data, Mapping, Geom, Stat, Position
Scale
Coord
Facet

16 ggplot2
library(ggplot2)
h.gg <- ggplot(h, aes(childHeight))
h.gg + geom_histogram(binwidth = 1) + labs(x = "Height", y = "Frequency")
h.gg + geom_density()

17 ggplot2
h.gg <- ggplot(h, aes(childHeight)) + theme(legend.position = "right")
h.gg + geom_density() + labs(x = "Height", y = "Density")
h.gg + geom_density(aes(fill = factor(gender)), size = 2)  # ggplot2 >= 3.4.0: use linewidth = 2 instead of size

18 Steps (same roadmap as slide 9)

19 Correlation and Regression

20

21 Covariance
1. Calculate the difference between the mean and each person’s score for the first variable (x).
2. Calculate the difference between the mean and their value for the second variable (y).
3. Multiply these “error” values.
4. Add these values to get the cross-product deviations.
5. The covariance is the average of the cross-product deviations (for a sample, R’s cov() divides this sum by N - 1 rather than N).
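The five steps above, sketched in R on five made-up pairs of scores (illustrative values, not from the Galton data). The final division uses N - 1 so that the result matches R's built-in cov():

```r
# Made-up paired scores for five people (illustration only)
x <- c(5, 4, 4, 6, 8)
y <- c(8, 9, 10, 13, 15)

dev_x <- x - mean(x)                       # step 1: deviations from the mean of x
dev_y <- y - mean(y)                       # step 2: deviations from the mean of y
cross <- dev_x * dev_y                     # step 3: multiply the deviations
sum_cross <- sum(cross)                    # step 4: sum of cross-product deviations
cov_manual <- sum_cross / (length(x) - 1)  # step 5: "average", dividing by N - 1

cov_manual  # 4.25
cov(x, y)   # built-in sample covariance, same value
```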

22 Plot of Y against X: persons 2, 3, and 5 appear to deviate from their means by similar magnitudes.

23 Covariance
Calculate the error [deviation] between the mean and each subject’s score for the first variable (x).
Calculate the error [deviation] between the mean and their score for the second variable (y).
Multiply these error values.
Add these values to get the cross-product deviations.
The covariance is the average of the cross-product deviations.
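Written as a formula, the sample covariance for N pairs of scores is the standard expression below (shown here as the usual textbook form, with x̄ and ȳ the two means; the slide's own formula did not survive in this transcript):

```latex
\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N - 1}
```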

24 Standardizing the Covariance
Covariance depends upon the units of measurement. To normalize it, divide by the standard deviations of both variables. The standardized version of the covariance is known as the correlation coefficient.
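A small sketch of the standardization, reusing made-up scores (not Galton's data): dividing the covariance by both standard deviations reproduces cor(), and rescaling one variable (for example inches to centimetres) changes the covariance but not the correlation:

```r
x <- c(5, 4, 4, 6, 8)     # made-up scores for the first variable
y <- c(8, 9, 10, 13, 15)  # made-up scores for the second variable

# Standardize the covariance by both standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))
r_manual
cor(x, y)                 # built-in Pearson correlation, same value

# Rescaling x (e.g. inches to centimetres) multiplies the covariance by 2.54...
cov(x * 2.54, y)
# ...but leaves the correlation unchanged
cor(x * 2.54, y)
```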

25 Correlation
?cor
cor(h$father, h$childHeight)
[1] 0.2660385
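Correlation and regression are linked: the least-squares slope of y on x equals r multiplied by sd(y) / sd(x). The sketch below checks this on made-up numbers; for the Galton data the analogous call would presumably be lm(childHeight ~ father, data = h), which is not run here:

```r
x <- c(5, 4, 4, 6, 8)     # made-up predictor scores
y <- c(8, 9, 10, 13, 15)  # made-up outcome scores

fit <- lm(y ~ x)          # least-squares line y = b0 + b1 * x
coef(fit)

# The fitted slope equals r scaled by the ratio of standard deviations
slope_from_r <- cor(x, y) * sd(y) / sd(x)
slope_from_r

# The line passes through the point of means, which fixes the intercept
intercept_from_means <- mean(y) - slope_from_r * mean(x)
intercept_from_means
```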

26 Scatterplot Matrix: pairs()

27 Correlation Matrix
library(car)
scatterplotMatrix(heights)  # heights is not created on earlier slides; presumably a data frame of the height columns of h
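scatterplotMatrix() draws the pairwise plots; the numbers of a correlation matrix come from cor() applied to a data frame of numeric columns. A sketch on a made-up frame (for the Galton data, something like cor(h[, c("father", "mother", "childHeight")]) with assumed column names):

```r
# Made-up heights for six families (illustration only)
toy <- data.frame(
  father      = c(70, 72, 68, 69, 74, 66),
  mother      = c(64, 63, 65, 62, 66, 61),
  childHeight = c(68, 71, 66, 67, 72, 64)
)

cm <- cor(toy)   # pairwise Pearson correlations of all columns
round(cm, 2)
```

The result is symmetric with 1s on the diagonal, since each variable correlates perfectly with itself.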

28 ggplot2

29 Steps (same roadmap as slide 9)

30 Box Plot

31 Children’s Height vs. Gender
boxplot(childHeight ~ gender, data = h, col = c("pink", "lightblue"), main = "Children's Height by Gender", xlab = "Gender", ylab = "")

32 Descriptive Stats: Box Plot
Values marked on the plot: 69.23 (males), 64.10 (females); difference 69.23 - 64.10 = 5.13 inches.
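Group centres like these can be computed with tapply(). The slide does not say whether 69.23 and 64.10 are means or medians, so this sketch, on made-up values, computes both:

```r
# Made-up heights for three daughters and three sons (illustration only)
toy <- data.frame(
  gender      = factor(c("female", "male", "female", "male", "female", "male")),
  childHeight = c(63.5, 69.0, 64.5, 70.0, 64.0, 68.5)
)

means   <- tapply(toy$childHeight, toy$gender, mean)    # group means
medians <- tapply(toy$childHeight, toy$gender, median)  # group medians (box centre lines)
means
medians
means["male"] - means["female"]  # gap between the group means
```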

33 Subset Males
men <- subset(h, gender == 'male')

34 Subset Females
women <- subset(h, gender == 'female')
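When only the heights are needed, split() produces both groups in one call. This sketch, on a made-up frame with assumed column names, checks that it matches the two subset() results:

```r
# Made-up rows (illustration only)
toy <- data.frame(
  gender      = factor(c("female", "male", "male", "female")),
  childHeight = c(64, 70, 69, 63)
)

men   <- subset(toy, gender == "male")    # rows for sons
women <- subset(toy, gender == "female")  # rows for daughters

# One list element per factor level, keeping the original row order
by_gender <- split(toy$childHeight, toy$gender)
by_gender
```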

35 Children’s Height: Males
hist(men$childHeight)

36 Children’s Height: Females
hist(women$childHeight)

37 ggplot2
library(ggplot2)
h.bb <- ggplot(h, aes(factor(gender), childHeight))
h.bb + geom_boxplot()
h.bb + geom_boxplot(aes(fill = factor(gender)))

38 Steps
Variable | Type | Plot
Child’s Height | Continuous | Histogram
Dad’s Height, Mom’s Height | Continuous | Scatter
Gender | Categorical | Boxplot

