Research Question: What determines a person’s height?

Presentation transcript:

Research Question: What determines a person’s height?

Hypothesis Brainstorming: Genetics, Nutrition, Immigration / Origins, Disease.
Hypotheses: Sons will be similar in height to their dads. Daughters will be similar in height to their moms.

Literature Review: Article #1. Francis Galton, who invented regression. "When Mid-Parents are taller than mediocrity, their Children tend to be shorter than they. When Mid-Parents are shorter than mediocrity, their Children tend to be taller than they."
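Galton's observation is what is now called regression toward the mean. A minimal R sketch of the idea, using simulated numbers (not Galton's measurements); the mean of 68, the slope of 0.65, and all variable names are assumptions chosen only for illustration:

```r
# Synthetic data only: these values are simulated, not Galton's data.
set.seed(42)
n <- 1000
midparent <- rnorm(n, mean = 68, sd = 1.8)
# Assumed true slope below 1, so children sit closer to the average
# than their parents do.
child <- 68 + 0.65 * (midparent - 68) + rnorm(n, sd = 2)

fit <- lm(child ~ midparent)
slope <- unname(coef(fit)["midparent"])

tall_parents <- midparent > 70
short_parents <- midparent < 66
# Average gap between child height and mid-parent height in each group.
gap_tall <- mean(child[tall_parents] - midparent[tall_parents])
gap_short <- mean(child[short_parents] - midparent[short_parents])

print(c(slope = slope, gap_tall = gap_tall, gap_short = gap_short))
```

With a fitted slope between 0 and 1, children of tall mid-parents average shorter than their parents and children of short mid-parents average taller, which is the pattern in the quotation.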

Literature Review: Article #2. Variables: Genes, First two years of life, Illnesses, Infant mortality rates, Smaller families, Higher income, Better education.

Literature Review: Article #3. “we find that a 54-loci genomic profile explained 4–6% of the sex- and age-adjusted height variance”; “the Galtonian mid-parental prediction method explained 40% of the sex- and age-adjusted height variance”.

Literature Review: Summary (columns: Variable, Galton, Hatton, Aulchenko)
Height: Individuals, Country Average, Individuals
Gender: Men and Women, Men Only, Men and Women
Age: Individuals, Countries
Infant Mortality: Country Average
GDP: Country Average
Family Size: Country Average
Time: X
Genome: Individuals
Observations: ~1, ,478

Variables. Dependent Variable (Y): Height. Independent Variables (X’s): X1, X2, X3, X4.

Height Dataset Variables
heights <- read.csv("GaltonFamilies.csv")

Dataset Variables: Type. Data Types: Numbers and Factors/Categorical.
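One way to check which columns are numbers and which are factors is sketched below. The tiny table is made up so the example runs without the real file; the column names (father, mother, gender, childHeight) are assumed to match the dataset:

```r
# Tiny made-up table standing in for the real file (illustrative values).
toy <- data.frame(
  father = c(70.0, 68.5, 66.0),
  mother = c(64.0, 65.5, 62.0),
  gender = c("male", "female", "male"),
  childHeight = c(71.2, 63.0, 67.5)
)
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# stringsAsFactors = TRUE reads text columns as factors (categorical).
heights <- read.csv(path, stringsAsFactors = TRUE)

# Class of every column: "numeric" for numbers, "factor" for categories.
types <- sapply(heights, class)
print(types)
str(heights)
```

Since R 4.0 the default is stringsAsFactors = FALSE, so text columns stay character unless the argument is set or the column is converted with factor().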

Summary Statistics
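A short sketch of the usual summary statistics in base R. The data are simulated (the group means of 64 and 69 inches are assumptions, not values from the dataset):

```r
# Synthetic heights, for illustration only.
set.seed(1)
n <- 200
gender <- factor(sample(c("female", "male"), n, replace = TRUE))
childHeight <- ifelse(gender == "male", 69, 64) + rnorm(n, sd = 2.5)
heights <- data.frame(gender, childHeight)

# Five-number summary plus mean for numbers, counts for factors.
print(summary(heights))

# Individual statistics for one continuous variable.
stats <- c(
  mean = mean(heights$childHeight),
  median = median(heights$childHeight),
  sd = sd(heights$childHeight)
)
print(stats)

# Mean height within each gender.
by_gender <- tapply(heights$childHeight, heights$gender, mean)
print(by_gender)
```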

Frequency Distribution, Histogram
hist(heights$childHeight)

Mode, Bimodal. Bimodal: two modes.
hist(h$childHeight, freq = FALSE, breaks = 25, ylim = c(0, 0.14))
curve(dnorm(x, mean = mean(h$childHeight), sd = sd(h$childHeight)), col = "red", add = TRUE)

Q-Q Plot
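A Q-Q plot compares the sample quantiles with those of a normal distribution. A minimal base R sketch on simulated heights (a mix of two groups, assumed here to mimic the bimodal shape):

```r
# Synthetic bimodal sample: two groups with different means (illustrative).
set.seed(1)
childHeight <- c(rnorm(250, mean = 64, sd = 2.4),
                 rnorm(250, mean = 69, sd = 2.6))

# When run as a script, draw to a null device instead of writing Rplots.pdf.
if (!interactive()) pdf(NULL)

# Points close to the reference line suggest approximate normality.
qq <- qqnorm(childHeight, main = "Normal Q-Q Plot of childHeight")
qqline(childHeight, col = "red")
```

Systematic bends away from the line, especially in the middle of the range, are one hint that the distribution is a mixture rather than a single normal.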

Correlation Matrix for Continuous Variables (PerformanceAnalytics package)
chart.Correlation(num2)
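The numbers behind such a chart can be computed with base R's cor(). In this sketch the data are simulated, and num2 is assumed to be the data frame restricted to its numeric columns:

```r
# Synthetic data, for illustration only.
set.seed(1)
n <- 300
father <- rnorm(n, mean = 69, sd = 2.5)
mother <- rnorm(n, mean = 64, sd = 2.3)
gender <- factor(sample(c("female", "male"), n, replace = TRUE))
childHeight <- 20 + 0.4 * father + 0.3 * mother + rnorm(n, sd = 2)
heights <- data.frame(father, mother, gender, childHeight)

# Keep only the continuous (numeric) columns; gender is dropped.
num2 <- heights[sapply(heights, is.numeric)]

# Pearson correlation matrix, rounded for display.
cor_mat <- round(cor(num2), 2)
print(cor_mat)
```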

Correlations Matrix: Both Types. Zoom in on Gender.
library(car)
scatterplotMatrix(heights)

Categorical: Revisit Box Plot. Note there is an equation here: Y = mx + b. Correlation will depend on the spread of the distributions.

Children’s Height by Gender

Linear Regression: Model 1 Child’s Height = f(Father’s Height)
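A sketch of how Model 1 might be fitted with lm(). The data are simulated and the names model.1 and father are assumptions; the coefficients printed here say nothing about the real dataset:

```r
# Synthetic data, for illustration only.
set.seed(1)
n <- 400
father <- rnorm(n, mean = 69, sd = 2.5)
childHeight <- 40 + 0.4 * father + rnorm(n, sd = 3)
h <- data.frame(father, childHeight)

# Child's height as a linear function of father's height.
model.1 <- lm(childHeight ~ father, data = h)
print(coef(model.1))

# Share of variance in child height explained by the model.
r2 <- summary(model.1)$r.squared
print(r2)

# Predicted child height for a hypothetical 72-inch father.
pred <- unname(predict(model.1, newdata = data.frame(father = 72)))
print(pred)
```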

Linear Regression: Model 2. Child’s Height = f(Gender)
model.5 <- lm(childHeight ~ gender, data = h)
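With a two-level categorical predictor, the fitted line still has the form Y = mx + b: the intercept is the mean of the reference group and the slope is the difference between group means. A sketch on simulated data (group means assumed for illustration):

```r
# Synthetic data, for illustration only.
set.seed(1)
n <- 400
gender <- factor(sample(c("female", "male"), n, replace = TRUE))
childHeight <- ifelse(gender == "male", 69, 64) + rnorm(n, sd = 2.5)
h <- data.frame(gender, childHeight)

model.5 <- lm(childHeight ~ gender, data = h)
b <- coef(model.5)
print(b)

# Group means for comparison with the coefficients.
group_means <- tapply(h$childHeight, h$gender, mean)
print(group_means)

# "female" is the reference level (alphabetical order), so:
#   intercept  = mean height of females
#   gendermale = mean(male) - mean(female)
diff_means <- unname(group_means["male"] - group_means["female"])
print(diff_means)
```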

Linear Regression: Additional Models. Mom’s Height, MidParent Height.
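A sketch of the two additional models on simulated data. The mid-parent formula (father + 1.08 * mother) / 2 follows Galton's convention of scaling the mother's height by 1.08; whether the slides computed it this way is an assumption, as are the model names:

```r
# Synthetic data, for illustration only.
set.seed(1)
n <- 400
father <- rnorm(n, mean = 69, sd = 2.5)
mother <- rnorm(n, mean = 64, sd = 2.3)
# Galton's mid-parent height: mother's height scaled by 1.08, then averaged.
midparentHeight <- (father + 1.08 * mother) / 2
childHeight <- 22 + 0.65 * midparentHeight + rnorm(n, sd = 2.5)
h <- data.frame(father, mother, midparentHeight, childHeight)

# Child's height as a function of mother's height alone.
model.mom <- lm(childHeight ~ mother, data = h)
# Child's height as a function of mid-parent height.
model.mid <- lm(childHeight ~ midparentHeight, data = h)

print(coef(model.mom))
print(coef(model.mid))
```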

Compare Models. Table columns: Model, Intercept, Father, Mom, midparentHeight, Gender, R-squares (r, R^2); NA marks a term not in the model. Table values not recovered.
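A comparison table of this kind could be assembled as sketched below. The data are simulated, so the R-squared values are not the ones from the slides; the model list and column names are assumptions:

```r
# Synthetic data, for illustration only.
set.seed(1)
n <- 500
father <- rnorm(n, mean = 69, sd = 2.5)
mother <- rnorm(n, mean = 64, sd = 2.3)
gender <- factor(sample(c("female", "male"), n, replace = TRUE))
midparentHeight <- (father + 1.08 * mother) / 2
childHeight <- 19 + 0.65 * midparentHeight +
  5 * (gender == "male") + rnorm(n, sd = 2)
h <- data.frame(father, mother, gender, midparentHeight, childHeight)

# Candidate models, each predicting child height.
models <- list(
  father = childHeight ~ father,
  mother = childHeight ~ mother,
  midparent = childHeight ~ midparentHeight,
  gender = childHeight ~ gender,
  midparent_gender = childHeight ~ midparentHeight + gender
)
fits <- lapply(models, lm, data = h)

# One row per model with R-squared and adjusted R-squared.
comparison <- data.frame(
  model = names(fits),
  r_squared = sapply(fits, function(f) summary(f)$r.squared),
  adj_r_squared = sapply(fits, function(f) summary(f)$adj.r.squared),
  row.names = NULL
)
print(comparison)
```

Adjusted R-squared is included because plain R-squared never decreases when a predictor is added, which makes it a weak basis for comparing models of different size.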

Discussion Summary. Key Findings: Gender was the biggest factor. Parents’ height played a lesser role. Downsides: The dataset used did not include more variables of interest. DataSet for X Country for 1877.

Future Research. Include More Predictor Variables: a literature review of a few articles suggests several important factors, such as Nutrition. Analyze a Contemporary DataSet: the dataset used was from 18?? and was location-specific as well.