Research Question
  What determines a person's height?
Hypothesis Brainstorming
  Genetics
  Nutrition
  Immigration / Origins
  Disease
  Hypotheses:
  Sons will be similar in height to their dads
  Daughters will be similar in height to their moms
Literature Review: Article #1
  Francis Galton
  Invented regression
  When mid-parents are taller than mediocrity, their children tend to be shorter than they
  When mid-parents are shorter than mediocrity, their children tend to be taller than they
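Galton's "regression toward mediocrity" can be illustrated with a small simulation. This is only a sketch on simulated values, not Galton's data: the population mean of 68 inches, the slope of 2/3, and the noise level are assumptions chosen for illustration. A fitted slope between 0 and 1 means children of tall mid-parents are, on average, shorter than their parents, and children of short mid-parents are taller.

```r
# Simulated (not Galton's) mid-parent heights, in inches
set.seed(42)
n <- 500
midparent <- rnorm(n, mean = 68, sd = 1.8)

# Assumed slope of 2/3: children sit closer to the mean than their parents do
child <- 68 + (2 / 3) * (midparent - 68) + rnorm(n, mean = 0, sd = 2)

fit <- lm(child ~ midparent)
slope <- unname(coef(fit)["midparent"])
slope  # between 0 and 1 indicates regression toward the mean

# Families with tall mid-parents: compare the two averages
tall <- midparent > mean(midparent) + 1
c(midparent = mean(midparent[tall]), child = mean(child[tall]))
```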
Literature Review: Article #2
  Hatton
  Variables:
  Genes
  First two years of life
  Illnesses
  Infant mortality rates
  Smaller families
  Higher income
  Better education
Literature Review: Article #3
  Aulchenko
  “we find that a 54-loci genomic profile explained 4–6% of the sex- and age-adjusted height variance”
  “the Galtonian mid-parental prediction method explained 40% of the sex- and age-adjusted height variance”
Literature Review: Summary
  Variable | Galton | Hatton | Aulchenko
  Height | Individuals | Country Average | Individuals
  Gender | Men and Women | Men Only | Men and Women
  Age: Individuals; Countries
  Infant Mortality: Country Average
  GDP: Country Average
  Family Size: Country Average
  Time: X
  Genome: Individuals
  Observations: ~1, ,478
Variables
  Dependent variable (Y): Height
  Independent variables (X's): X1, X2, X3, X4
Height Dataset Variables
  heights <- read.csv("GaltonFamilies.csv")
Dataset Variables: Type
  Data Types: Numbers and Factors/Categorical
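One way to check which columns are numbers and which are factors is `str()` or `sapply(..., class)`. The sketch below uses a four-row, made-up stand-in data frame rather than the real file; the column names `father` and `mother` are assumptions, while `gender` and `childHeight` appear in the code on these slides.

```r
# Made-up stand-in values; in practice use the data frame read from the CSV
heights <- data.frame(
  father      = c(78.5, 75.5, 75.0, 69.0),   # assumed column name
  mother      = c(67.0, 66.5, 64.0, 63.5),   # assumed column name
  gender      = c("male", "female", "male", "female"),
  childHeight = c(73.2, 69.2, 71.0, 64.5),
  stringsAsFactors = TRUE                    # store text columns as factors
)

str(heights)                     # compact overview of every column
types <- sapply(heights, class)  # named vector: column -> type
types
```

Since R 4.0, `read.csv()` leaves text columns as character unless `stringsAsFactors = TRUE` is passed, so a column such as `gender` may need `factor()` before it is treated as categorical.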
Summary Statistics
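A minimal sketch of how summary statistics for one variable might be produced in base R. The values are simulated placeholders (mean 67, standard deviation 3.5 are assumptions), not the dataset's actual numbers.

```r
# Simulated placeholder for the child height column
set.seed(1)
childHeight <- rnorm(100, mean = 67, sd = 3.5)

stats <- summary(childHeight)  # min, quartiles, median, mean, max
stats
sd(childHeight)                # standard deviation is not part of summary()
```

Calling `summary()` on the whole data frame gives the same statistics for each numeric column and level counts for each factor.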
Frequency Distribution, Histogram
  hist(heights$childHeight)
Mode, Bimodal
  Bimodal: two modes
  hist(heights$childHeight, freq = FALSE, breaks = 25, ylim = c(0, 0.14))
  curve(dnorm(x, mean = mean(heights$childHeight), sd = sd(heights$childHeight)), col = "red", add = TRUE)
Q-Q Plot
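A Q-Q plot compares the sample quantiles with those of a normal distribution; points close to the reference line suggest approximate normality. A minimal base R sketch, using simulated placeholder values rather than the real child heights:

```r
# Simulated placeholder for the child height column
set.seed(1)
childHeight <- rnorm(200, mean = 67, sd = 3.5)

# qqnorm() draws the plot and invisibly returns the plotted coordinates
qq <- qqnorm(childHeight, main = "Normal Q-Q Plot: child height")
qqline(childHeight, col = "red")  # reference line through the quartiles
```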
Correlation Matrix for Continuous Variables
  library(PerformanceAnalytics)
  chart.Correlation(num2)
Correlation Matrix: Both Types
  library(car)
  scatterplotMatrix(heights)
  Zoom in on Gender
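The slide does not show how `num2` was built. One plausible reading, offered here only as an assumption, is that it holds the numeric columns of the dataset. The sketch below builds such a subset from simulated stand-in data (all coefficients and the `father` and `mother` column names are illustrative) and computes the correlation matrix with base R's `cor()`.

```r
# Simulated stand-in data; the numbers are illustrative only
set.seed(1)
n <- 200
heights <- data.frame(
  father = rnorm(n, 69, 2.5),
  mother = rnorm(n, 64, 2.3),
  gender = factor(sample(c("female", "male"), n, replace = TRUE))
)
heights$childHeight <- 20 + 0.4 * heights$father +
  0.3 * heights$mother + rnorm(n, 0, 2.5)

# Assumption: num2 is the data frame restricted to its numeric columns
num2 <- heights[sapply(heights, is.numeric)]
cor_mat <- round(cor(num2), 2)
cor_mat
```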
Categorical: Revisit Box Plot
  Note there is an equation here: Y = mx + b
  Correlation will depend on the spread of the distributions
Children's Height by Gender
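The line Y = mx + b has a direct reading when x is a two-level factor: with R's default coding, b is the mean of the reference group and m is the difference between the two group means. The sketch below checks this on simulated data; the group means of 64 and 69 inches are assumptions for illustration.

```r
# Simulated stand-in data; group means are assumed values
set.seed(1)
n <- 200
gender <- factor(sample(c("female", "male"), n, replace = TRUE))
childHeight <- ifelse(gender == "male", 69, 64) + rnorm(n, 0, 2.5)

boxplot(childHeight ~ gender, ylab = "Child height (in)")

fit <- lm(childHeight ~ gender)
b <- unname(coef(fit)["(Intercept)"])  # mean of the reference level, "female"
m <- unname(coef(fit)["gendermale"])   # mean(male) minus mean(female)

group_means <- tapply(childHeight, gender, mean)
c(b = b, m = m)
group_means
```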
Linear Regression: Model 1
  Child's Height = f(Father's Height)
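A sketch of how Model 1 might be fitted with `lm()`. The data are simulated, and the object name `model.1` and the column name `father` are assumptions that do not come from the slides.

```r
# Simulated stand-in data; intercept, slope, and noise are illustrative
set.seed(1)
n <- 200
heights <- data.frame(father = rnorm(n, 69, 2.5))
heights$childHeight <- 40 + 0.4 * heights$father + rnorm(n, 0, 3)

# Child's height as a linear function of father's height
model.1 <- lm(childHeight ~ father, data = heights)
coef(model.1)                # intercept and slope
summary(model.1)$r.squared   # share of variance explained
```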
Linear Regression: Model 2
  Child's Height = f(Gender)
  model.5 <- lm(childHeight ~ gender, data = heights)
Linear Regression: Additional Models
  Predictors: Mom, MidParent Height
Compare Models
  Columns: Model, Intercept, Father, Mom, midparentHeight, Gender, r, R-squared (R^2)
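A comparison like this can be assembled by fitting each single-predictor model and collecting its R-squared. The sketch below uses simulated data, so its numbers say nothing about the real results; the column names `father` and `mother` and all coefficients are assumptions. The mid-parent formula, which scales the mother's height by 1.08, follows Galton's definition.

```r
# Simulated stand-in data; all coefficients are illustrative
set.seed(1)
n <- 300
heights <- data.frame(
  father = rnorm(n, 69, 2.5),
  mother = rnorm(n, 64, 2.3),
  gender = factor(sample(c("female", "male"), n, replace = TRUE))
)
# Galton's mid-parent height: mother's height is scaled by 1.08
heights$midparentHeight <- (heights$father + 1.08 * heights$mother) / 2
heights$childHeight <- 16 + 0.7 * heights$midparentHeight +
  ifelse(heights$gender == "male", 5, 0) + rnorm(n, 0, 2)

# One model per predictor, kept in a named list
models <- list(
  father    = lm(childHeight ~ father, data = heights),
  mother    = lm(childHeight ~ mother, data = heights),
  midparent = lm(childHeight ~ midparentHeight, data = heights),
  gender    = lm(childHeight ~ gender, data = heights)
)

r2 <- sapply(models, function(m) summary(m)$r.squared)
round(r2, 3)
```

Keeping the models in a named list makes it easy to add a row to the comparison without repeating the extraction code.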
Discussion Summary
  Key Findings:
  Gender was the biggest factor
  Parents' height played a lesser role
  Downsides:
  Dataset used did not include more variables of interest
  Dataset for X Country for 1877
Future Research
  Include More Predictor Variables
  Literature review of a few articles suggests several important factors: Nutrition
  Analyze a Contemporary Dataset
  Dataset used was from 18??
  Location Specific as Well