Outline Research Question: What determines height? Data Input Look at One Variable Compare Two Variables Children’s Height and Parents Height Children’s.

Presentation transcript:

Outline
Research Question: What determines height?
Data Input
Look at One Variable
Compare Two Variables
  Children’s Height and Parents’ Height
  Children’s Height and Gender
Graphic Packages: ggplot2

What factors are most responsible for height? Outcome = (Model) + Error
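The "Outcome = (Model) + Error" idea can be sketched with a small linear model. The data frame `toy` below is made up for illustration (it is not Galton's data), and using `lm()` as the model is an assumption about where the deck is heading; the point is only that the fitted values plus the residuals reproduce the observed outcome.

```r
# Toy data: invented heights in inches, NOT Galton's measurements
toy <- data.frame(
  father = c(65, 67, 68, 70, 72, 74),
  child  = c(66, 66, 69, 69, 71, 72)
)

# Model: predict the child's height from the father's height
fit <- lm(child ~ father, data = toy)

model_part <- fitted(fit)     # the "(Model)" term
error_part <- residuals(fit)  # the "Error" term

# Outcome = Model + Error, up to floating-point rounding
all.equal(unname(model_part + error_part), toy$child)
```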

Galton’s Notebook on Families & Height

Galton’s Family Height Dataset: X1, X2, X3, Y

> getwd()
[1] "C:/Users/johnp_000/Documents"
> setwd()

Dataset Input
h <- read.csv("GaltonFamilies.csv")
Object: h   Function: read.csv()   Filename: "GaltonFamilies.csv"

Data Types: Numbers and Factors/Categorical
str()
summary()
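A minimal sketch of what `str()` and `summary()` report for numeric versus factor columns. The data frame `toy` and its values are invented for illustration. One thing to keep in mind: since R 4.0, `read.csv()` leaves text columns as character unless `stringsAsFactors = TRUE` is passed, so a column such as gender may need converting with `factor()`.

```r
# Toy data: invented values, NOT Galton's measurements
toy <- data.frame(
  father      = c(70.0, 68.5, 72.0),
  gender      = c("male", "female", "male"),
  childHeight = c(71.0, 64.5, 70.0),
  stringsAsFactors = TRUE  # store text columns as factors
)

str(toy)      # one line per column: type plus the first few values
summary(toy)  # quartiles and mean for numbers, counts per level for factors

is.numeric(toy$childHeight)  # TRUE: a number
is.factor(toy$gender)        # TRUE: a categorical variable
levels(toy$gender)           # the categories, in alphabetical order
```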

Outline
One Variable: Univariate (Dependent / Outcome Variable)
Two Variables: Bivariate (Outcome and each Predictor)
All Four Variables: Multivariate

Steps
Variable             Type         Plot
Child’s Height (Y)   Continuous   Histogram
Dad’s Height (X1)    Continuous   Scatter
Mom’s Height (X2)    Continuous   Scatter
Gender (X3)          Categorical  Boxplot
Linear Regression

Frequency Distribution, Histogram
hist(h$childHeight)

Density Plot (Area = 1)
plot(density(h$childHeight))
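The "Area = 1" label can be checked numerically. This sketch uses simulated heights (`sim_height` is generated, not taken from the Galton file) and approximates the area under the `density()` curve with the trapezoid rule. It also shows that the bars of a density-scaled histogram sum to 1.

```r
set.seed(42)
# Simulated heights in inches, for illustration only
sim_height <- rnorm(500, mean = 67, sd = 3.5)

# Kernel density estimate: d$x is the grid, d$y the estimated density
d <- density(sim_height)

# Trapezoid rule: width of each grid step times the average of its two heights
area_density <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
area_density  # close to 1

# A histogram on the density scale: bar height times bar width sums to 1
hh <- hist(sim_height, plot = FALSE)
area_hist <- sum(hh$density * diff(hh$breaks))
area_hist
```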

Mode, Bimodal
hist(h$childHeight, freq = FALSE, breaks = 25, ylim = c(0, 0.14))
curve(dnorm(x, mean = mean(h$childHeight), sd = sd(h$childHeight)), col = "red", add = TRUE)

Grammar of Graphics
Seven Components (diagram labels include Legend and Axes)
ggplot2 is built using the grammar of graphics approach

Hadley Wickham and ggplot2
Asst. Professor of Statistics at Rice University
Packages: ggplot2, plyr, reshape, rggobi, profr

In ggplot2 a plot is made up of layers.
Grammar of Graphics: a Plot consists of
  Layer: Data, Mapping, Geom, Stat, Position
  Scale
  Coord
  Facet

ggplot2
library(ggplot2)
h.gg <- ggplot(h, aes(childHeight))
h.gg + geom_histogram(binwidth = 1) + labs(x = "Height", y = "Frequency")
h.gg + geom_density()

ggplot2
h.gg <- ggplot(h, aes(childHeight)) + theme(legend.position = "right")
h.gg + geom_density() + labs(x = "Height", y = "Density")
h.gg + geom_density(aes(fill = factor(gender)), size = 2)

Steps Continuous Categorical Histogram Scatter Boxplot Child’s Height Linear Regression Dad’s Height Gender Continuous Y X1, X2 X3 Type Variable Mom’s Height

Correlation and Regression

Covariance
1. Calculate the difference between the mean and each person’s score for the first variable (x).
2. Calculate the difference between the mean and their value for the second variable (y).
3. Multiply these “error” values.
4. Add these values to get the cross-product deviations.
5. The covariance is the average of the cross-product deviations.

Y vs. X: Persons 2, 3, and 5 appear to deviate from their means by similar magnitudes.

Covariance
Calculate the error [deviation] between the mean and each subject’s score for the first variable (x).
Calculate the error [deviation] between the mean and their score for the second variable (y).
Multiply these error values.
Add these values and you get the cross-product deviations.
The covariance is the average of the cross-product deviations:
$\mathrm{cov}(x, y) = \dfrac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$
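The covariance steps above can be sketched in R as follows. The vectors `x` and `y` are small invented scores, not values from the Galton data. Note that the "average" here divides by N - 1 rather than N, which is what R's built-in `cov()` does.

```r
# Toy scores for five people (invented for illustration)
x <- c(5, 4, 4, 6, 8)
y <- c(8, 9, 10, 13, 15)

dev_x <- x - mean(x)    # step 1: deviations from the mean of x
dev_y <- y - mean(y)    # step 2: deviations from the mean of y
cross <- dev_x * dev_y  # step 3: multiply the paired deviations
sum_cross <- sum(cross) # step 4: sum of cross-product deviations

# step 5: "average" them, dividing by N - 1
cov_manual <- sum_cross / (length(x) - 1)

cov_manual
all.equal(cov_manual, cov(x, y))  # agrees with the built-in function
```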

Standardizing the Covariance
Covariance depends upon the units of measurement.
Normalize the data: divide by the standard deviations of both variables.
The standardized version of covariance is known as the correlation coefficient.
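A short sketch of the standardization, again on invented vectors `x` and `y`: dividing the covariance by both standard deviations gives the same number as `cor()`. Converting the units (inches to centimetres here, as an arbitrary example) changes the covariance but leaves the correlation untouched.

```r
# Toy scores (invented for illustration)
x <- c(5, 4, 4, 6, 8)
y <- c(8, 9, 10, 13, 15)

# Correlation = covariance divided by the product of the standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))
all.equal(r_manual, cor(x, y))

# Change of units: multiply both variables by 2.54
x_cm <- x * 2.54
y_cm <- y * 2.54

cov(x, y)        # covariance in the original units
cov(x_cm, y_cm)  # larger by a factor of 2.54^2
cor(x_cm, y_cm)  # same as cor(x, y): the correlation is unit-free
```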

Correlation
?cor
cor(h$father, h$childHeight)

Scatterplot Matrix: pairs()

Correlations Matrix
library(car)
scatterplotMatrix(heights)

ggplot2

Steps Continuous Categorical Histogram Scatter Boxplot Child’s Height Linear Regression Dad’s Height Gender Continuous Y X1, X2 X3 Type Variable Mom’s Height

Box Plot

Children’s Height vs. Gender
boxplot(childHeight ~ gender, data = h, col = c("pink", "lightblue"), main = "Children's Height by Gender", xlab = "Gender", ylab = "")

Descriptive Stats: Box Plot

Subset Males
men <- subset(h, gender == 'male')

Subset Females
women <- subset(h, gender == 'female')

Children’s Height: Males
hist(men$childHeight)

Children’s Height: Females
hist(women$childHeight)

ggplot2
library(ggplot2)
h.bb <- ggplot(h, aes(factor(gender), childHeight))
h.bb + geom_boxplot()
h.bb + geom_boxplot(aes(fill = factor(gender)))

Steps Continuous Categorical Histogram Scatter Boxplot Child’s Height Dad’s Height Gender Continuous Type Variable Mom’s Height