Error check in data
Hein Stigum, Mar-16
Presentation, data and programs at:

Example data
– HUMIS
  – Birth cohort, 5 counties in Norway
  – N=475 mother-child pairs
  – Repeated questionnaires
– Purpose
  – Outcome: growth after birth
  – Exposure: contaminants in mother's milk

Agenda
– Potential problems
  – String variables, missing, …
– Univariate
– Bivariate
– Multivariable
– Individual growth

Potential problems

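String variables: string to numeric

    * KJONN (Norwegian for sex) is stored as text; encode creates a numeric
    * variable sex3 with value labels. Entries that are a single blank stay missing.
    encode KJONN if KJONN!=" ", generate(sex3)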
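A minimal sketch of the same conversion, assuming KJONN holds short letter codes (the values "M" and "K" below are invented). Instead of excluding a single blank with `if`, it trims surrounding blanks first, so that both "" and " " end up missing, and then checks the resulting codes. `strtrim()` needs Stata 14 or later; older versions have `trim()`.

```stata
* Invented example data: KJONN as text, including one blank and one empty entry
clear
input str2 KJONN
"M"
"K"
" "
""
"M"
end

* Remove leading/trailing blanks so " " becomes "" (encode treats "" as missing)
replace KJONN = strtrim(KJONN)

* encode numbers the categories in alphabetical order of the strings
encode KJONN, generate(sex3)

* Check the mapping: blanks should be missing, not a category of their own
tabulate KJONN sex3, missing
label list sex3
```

Because the codes follow alphabetical order rather than any study convention, it is worth checking `label list` before using sex3 in models.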

Missing
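A minimal sketch of a first missing-data check, on a few invented rows that reuse variable names from the slides. It also shows a trap that matters for the outlier rules that follow: Stata stores numeric missing (.) as larger than any number, so a condition such as `weight > 4000` is also true when weight is missing.

```stata
* Invented example data; . marks a missing value
clear
input id weight mHeight
1 3500 168
2 .    172
3 3900 .
4 4100 165
end

* Overview of missing values per variable (Stata 11 or later)
misstable summarize weight mHeight

* Count missing values explicitly
count if missing(weight)
display "Missing weight: " r(N)

* Trap: this count includes id 2, whose weight is missing
count if weight > 4000

* Exclude missing values explicitly when using > or >=
count if weight > 4000 & !missing(weight)
```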

Univariate outliers

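Commands for previous plot

    * One horizontal box plot per variable; outside values are labelled with id.
    * msymbol(i) hides the marker so that only the id label is drawn.
    local i=1
    foreach var of varlist age1 weight1 fHCB BMI1 mHeight mWeight {
        graph hbox `var', marker(1, mlabel(id) msymbol(i) mlabpos(0) mlabangle(-90)) ///
            name(plt`i', replace)
        local ++i
    }
    * Arrange the six named graphs in two columns
    graph combine plt1 plt2 plt3 plt4 plt5 plt6, col(2)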
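The box plots show outliers by eye. A minimal sketch that turns the same idea into a list, assuming the usual whisker rule of 1.5 times the interquartile range beyond the quartiles. `graph box` computes its quartiles on its own, so borderline cases may differ slightly. The weights are invented, and the flag variable out_w1 is hypothetical.

```stata
* Invented birth weights (grams); id 8 looks like a typing error
clear
input id weight1
1 3400
2 3550
3 3600
4 3700
5 3800
6 3900
7 4000
8 390
end

* Quartiles and whisker limits from summarize, detail
quietly summarize weight1, detail
local iqr = r(p75) - r(p25)
local lo  = r(p25) - 1.5*`iqr'
local hi  = r(p75) + 1.5*`iqr'

* Flag values outside the limits; !missing() keeps . from counting as > `hi'
generate byte out_w1 = (weight1 < `lo' | weight1 > `hi') & !missing(weight1)
list id weight1 if out_w1
```

The same lines can be placed inside the `foreach` loop above to produce one flag per variable.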

Bivariate outliers

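Commands for previous plot

    * Red points: BMI above 35 or below 16. Without !missing(BMI1), a missing
    * BMI1 would also count as >35, since Stata treats . as larger than any number.
    * The second quadratic fit leaves out mothers with mHeight>=185.
    * text() expects the y and x position before the string, text(#y #x "BMI>35");
    * the positions used on the slide are not given.
    twoway (scatter mWeight mHeight) ///
        (scatter mWeight mHeight if (BMI1>35 | BMI1<16) & !missing(BMI1), mcol(red)) ///
        (qfit mWeight mHeight) ///
        (qfit mWeight mHeight if mHeight<185) ///
        , legend(off) text( "BMI>35", col(red)) ///
        ytitle("Mother's weight") xtitle("Mother's height")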
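The plot marks mothers with a BMI above 35 or below 16. A minimal sketch that recomputes BMI from the two plotted variables and lists the flagged ids, assuming mWeight is in kg and mHeight in cm (the slides do not state the units). The values are invented, and BMI_chk is a hypothetical name chosen so as not to overwrite BMI1.

```stata
* Invented mother data: weight in kg, height in cm (assumed units)
clear
input id mWeight mHeight
1  62 168
2  70 172
3 105 160
4  40 170
5  58 .
end

* BMI = weight (kg) / height (m)^2
generate BMI_chk = mWeight / (mHeight/100)^2

* Same cut-offs as the plot; id 5 has missing BMI and must not be flagged
list id mWeight mHeight BMI_chk if (BMI_chk > 35 | BMI_chk < 16) & !missing(BMI_chk)
```

Comparing BMI_chk with the stored BMI1 is also a quick consistency check on how BMI1 was derived.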

Multivariable outliers: weight

Commands for previous plot

    * Cubic polynomial in age
    gen agesq=age^2
    gen ageqb=age^3
    regress weight age agesq ageqb if age>=0 & age<1000
    capture drop xb res
    predict xb, xb              /* predicted value */
    predict res, residuals      /* residuals */
    * Red points: |residual| > 4000; !missing(res) keeps rows with missing weight out
    twoway (scatter weight age) ///
        (scatter weight age if abs(res)>4000 & !missing(res), mcol(red)) ///
        (line xb age, sort lcol(red)) if age>=0 & age<1000, legend(off)
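To get the ids behind the red points, the residuals can be listed directly. A minimal sketch on invented long-format data; because the toy data have only three distinct ages, it fits a quadratic instead of the slide's cubic. The helper variable absres is hypothetical.

```stata
* Invented data: weight (g) by age (days), several rows per child
clear
input id age weight
1   0  3500
1  90  6000
1 180  7600
2   0  3300
2  90  5800
2 180 17600
3   0  3600
3  90     .
3 180  7900
end

generate agesq = age^2
regress weight age agesq
predict res, residuals

* Largest absolute residuals first. The row with missing weight has a missing
* residual, which would pass "> 4000" without the !missing() guard.
generate absres = abs(res)
gsort -absres
list id age weight res if absres > 4000 & !missing(absres)
```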

Plot of individual growth patterns: weight versus age
[Figures: 16 pages titled "Weight by age, 1" to "Weight by age, 16", 30 children per page]

Commands for previous plots

    * Individual growth patterns. Note: 16 pages of 30 plots each
    * Repeated measurements, long format, age nested in id
    sort id age               /* sort by id-number and age */
    global d=30               /* 30 plots per page */
    forvalues i=1(1)16 {      /* 16 pages*30 plots=480 subjects */
        local j=(`i'-1)*$d+1  /* plot subjects in id-interval: j<=id<=k */
        local k=`i'*$d
        twoway (line weight age, connect(ascending)) if id>=`j' & id<=`k' ///
            , by(id, compact title("Weight by age, `i'") note("")) ///
            ylabel(0(5000)15000) xlabel(0(200)800)
        graph export "H:\Projects\HUMIS\Weight gain\plt`i'.emf", replace /* Enhanced Metafile Format */
    }                         /* end of loop */
    * Make a new photo album in PowerPoint and add all plots.
    * This gives one plot per slide at maximum size.
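Scanning curves by eye can be complemented by simple rule-based checks. A minimal sketch, on invented data, that flags repeated id-age pairs and any weight lower than at the child's previous visit; the variables dup, wdrop and anydrop are hypothetical. A drop is not always an error (newborns commonly lose some weight in the first days after birth), so real data may need a tolerance or a minimum age.

```stata
* Invented long-format data; id 3 has the same visit loaded twice
clear
input id age weight
1   0 3500
1  30 4400
1  60 5300
2   0 3400
2  30 3300
2  60 5100
3   0 3600
3  30 4500
3  30 4500
end

* Repeated id-age pairs often point to double-loaded or mis-merged records
duplicates tag id age, generate(dup)
list id age weight if dup > 0

* Within each child, ordered by age, flag a weight below the previous visit
sort id age
by id (age): generate byte wdrop = weight < weight[_n-1] if _n > 1

* Mark all rows of children with at least one drop, then list them
by id: egen byte anydrop = max(wdrop)
list id age weight if anydrop == 1, sepby(id)
```

The children flagged this way can then be plotted first, instead of paging through all curves.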

After new data merge: plot of individual growth patterns, weight versus age
[Figures: the individual growth plots redrawn after the new data merge]
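Since the growth curves changed after new data were merged, it may help to check the merge itself. A minimal sketch, assuming a child-level file in long format and a mother-level file with one row per id; both files and the tempfile name mothers are invented for illustration.

```stata
* Invented mother-level file: one row per id
clear
input id mHeight
1 168
2 172
4 165
end
tempfile mothers
save `mothers'

* Invented child-level file in long format: several rows per id
clear
input id age weight
1  0 3500
1 30 4400
2  0 3400
3  0 3600
end

* m:1 because id is unique in the using (mother) file
merge m:1 id using `mothers'

* _merge: 1 = master only, 2 = using only, 3 = matched
tabulate _merge
list id age weight mHeight _merge if _merge != 3
```

In a do-file, merge's `assert()` option, for example `assert(match)`, stops the run when unmatched rows appear, which would catch ids 3 and 4 here.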

Individual plots in large datasets?
– Scan 1 page (= 30 curves) in 5 seconds
  – Hours used = 5N/(30*60*60)
– Scan all
  – If N = 50 000, this takes 2.3 hours
– Alternatively, scan only the curves of subjects with medium to large residuals
  – Residual > 1000 selects 190 of the 470 children (40%) and finds 12 of the 15 deviant growth patterns (80%)
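The scan-time formula and the residual screen, worked through with the numbers given on the slide:

```latex
\begin{aligned}
T &= \frac{5\ \text{s}}{\text{page}} \times \frac{N}{30\ \text{curves/page}} \times \frac{1\ \text{h}}{3600\ \text{s}}
   = \frac{5N}{30 \cdot 60 \cdot 60}\ \text{h} \\
N = 50\,000:\quad T &= \frac{250\,000}{108\,000} \approx 2.3\ \text{h} \\
\text{Residual} > 1000:\quad & \frac{190}{470} \approx 40\%\ \text{of curves scanned}, \qquad
   \frac{12}{15} = 80\%\ \text{of deviant patterns found}
\end{aligned}
```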

Summing up
– Graph, outliers
  – Uni: box plots
  – Bi: scatter plots
  – Multi: scatter plots + residuals
  – Individual growth
– Merge errors are not rare!