Published by Ralph Brown. Modified over 9 years ago.
1
Error check in data
Hein Stigum, Mar-16
Presentation, data and programs at: http://folk.uio.no/heins/
2
Example data
HUMIS
– Birth cohort, 5 counties in Norway
– N=475 mother-child pairs
– Repeated questionnaires
Purpose
– Outcome: Growth after birth
– Exposure: Contaminants in mother's milk
3
Agenda
Potential problems
– String variables, Missing, …
Univariate
Bivariate
Multivariable
Individual growth
4
Potential problems
5
String variables
String to numeric (KJONN is Norwegian for sex):
encode KJONN if KJONN!=" ", generate(sex3)
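One way to check the conversion is to tabulate the string variable before encoding and cross-tabulate it with the new numeric variable afterwards. The lines below are a minimal sketch using the variable names from the slide; they are not part of the original presentation.

```stata
* Sketch: inspect the string variable before and after encoding
tabulate KJONN, missing                      /* all distinct string values, including blanks */
encode KJONN if KJONN!=" ", generate(sex3)
label list sex3                              /* value label that encode created for sex3 */
tabulate KJONN sex3, missing                 /* each string value should map to one code */
```

Stray spellings or codes in the first table show up as extra categories in `sex3`, so it helps to run the tabulation before trusting the encoded variable.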
6
Missing
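The slide shows only a figure. As a hedged illustration of how missing values might be examined in Stata, the sketch below borrows the variable names from the boxplot slide; the choice of commands is an assumption, not the author's documented method.

```stata
* Sketch: overview of missing values (variable names taken from the other slides)
misstable summarize age1 weight1 fHCB BMI1 mHeight mWeight   /* missing counts per variable */
count if missing(weight1)                                    /* records lacking weight1 */
* Numeric missing (.) is larger than any number, so "if BMI1>35" also selects missing BMI1
list id BMI1 if BMI1>35 & !missing(BMI1)
```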
7
Univariate outliers
8
Commands for previous plot
local i=1
foreach var of varlist age1 weight1 fHCB BMI1 mHeight mWeight {
    graph hbox `var', marker(1, mlabel(id) msymbol(i) mlabpos(0) mlabangle(-90)) ///
        name(plt`i', replace)
    local ++i
}
graph combine plt1 plt2 plt3 plt4 plt5 plt6, col(2)
9
Bivariate outliers
10
Commands for previous plot
twoway (scatter mWeight mHeight) ///
    (scatter mWeight mHeight if BMI1>35 | BMI1<16, mcol(red)) ///
    (qfit mWeight mHeight) ///
    (qfit mWeight mHeight if mHeight<185) ///
    , legend(off) text(110 195 "BMI>35", col(red)) ///
    ytitle("Mother's weight") xtitle("Mother's height")
11
Multivariable outliers: weight
12
Commands for previous plot
gen agesq=age^2
gen ageqb=age^3
regress weight age agesq ageqb if age>=0 & age<1000
capture: drop xb res
predict xb, xb       /* predicted value */
predict res, res     /* residuals */
tw (scatter weight age) (scatter weight age if abs(res)>4000, mcol(red)) ///
    (line xb age, sort lcol(red)) if age>=0 & age<1000, legend(off)
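Besides marking the large residuals in red, it can help to list them so the records can be looked up. A minimal sketch, assuming the variables `id`, `xb` and `res` exist as created by the commands on this slide:

```stata
* Sketch: list the observations flagged as multivariable outliers
* (missing residuals are excluded because abs(.) > 4000 evaluates as true in Stata)
list id age weight xb res if abs(res)>4000 & !missing(res)
```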
13
Plot of individual growth patterns: weight versus age
14-29
Weight by age, pages 1 to 16 (one page of plots per slide)
30
Commands for previous plots
* Individual growth patterns. Note: 16 pages of 30 plots each
* Repeated measurements, long format, age nested in id
sort id age                     /* sort by id-number and age */
global d=30                     /* 30 plots per page */
forvalues i=1(1)16 {            /* 16 pages*30 plots=480 subjects */
    local j=(`i'-1)*$d+1        /* plot subjects in id-interval: j<=id<=k */
    local k=`i'*$d
    twoway (line weight age, connect(ascending)) if id>=`j' & id<=`k' ///
        , by(id, compact title("Weight by age, `i'") note("")) ///
        ylabel(0(5000)15000) xlabel(0(200)800)
    graph export "H:\Projects\HUMIS\Weight gain\plt`i'.emf", replace /* Enhanced Metafile Format */
}                               /* end of loop */
* Make a new Photo album in PowerPoint and add all plots. This gives one plot per page at maximum size.
31
After new data merge
Plot of individual growth patterns: weight versus age
32-47
[Slides 32-47 contain no text]
48
Individual plots in large datasets?
Scan 1 page (=30 curves) in 5 sec
– Hours used = 5N/(30*60*60)
Scan all
– If N=50 000, need 2.3 hours
May instead scan curves of subjects with medium to large residuals.
– Residual>1000 finds
  190 of the 470 children = 40%
  12 of the 15 deviant growth patterns = 80%
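The scanning-time arithmetic and the residual-based selection can be sketched in Stata as follows. This assumes the residual variable `res` from the earlier regression slide and long-format data with `id`; the names `bigres`, `anybig` and `tag` are hypothetical and introduced only for illustration.

```stata
* Scanning time in hours: 5 seconds per page of 30 curves
display 5*50000/(30*60*60)                  /* about 2.3 hours for N=50 000 */

* Flag every subject with at least one medium-to-large residual
gen byte bigres = abs(res)>1000 & !missing(res)
bysort id: egen byte anybig = max(bigres)   /* 1 if any measurement of the subject is flagged */
egen tag = tag(id)                          /* marks one record per subject */
count if tag & anybig                       /* number of subjects whose curves to inspect */
```

The individual growth plots could then be restricted to subjects with `anybig==1`, at the cost of missing deviant patterns that never produce a large residual.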
49
Summing up
Graph, outliers
– Uni: Boxplots
– Bi: Scatterplots
– Multi: Scatterplots + residuals
– Individual growth
Merge errors are not rare!
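Since merge errors are singled out, a basic merge check is sketched below. The file names `mothers` and `growth` are hypothetical, and the one-to-many structure (one mother record, repeated child measurements) is an assumption based on the long-format data described earlier.

```stata
* Sketch: check a merge (file names are hypothetical)
use mothers, clear
isid id                      /* stops with an error if id is not unique in the master file */
merge 1:m id using growth    /* one mother record, many child measurements */
tabulate _merge              /* 1 = master only, 2 = using only, 3 = matched */
list id if _merge!=3         /* records that did not match */
```

A clean `_merge` table does not rule out records matched to the wrong subject, which is why the individual growth plots remain useful after a merge.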