1
Data Screening
Hans Baumgartner, Penn State University
2
Missing data (Thoemmes and Mohan 2015)
Data matrix: $D_{N \times K} = (D_{obs}, D_{mis})$
0/1 indicator matrix of missingness: $R_{N \times K}$
M-graphs: fully observed variable; fully unobserved variable; partially observed variable; observed portion of a variable with missing data (marked *)
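As a small illustration of the indicator matrix R, the following sketch builds one 0/1 column per variable from a data set with missing values. The data set `toy`, its variables `x` and `y`, the `r_` prefix, and the coding 1 = missing are all assumptions made for this example; the slides do not state which value marks a missing entry.

```sas
/* Hypothetical toy data: a period denotes a missing value */
data toy;
   input id x y;
   datalines;
1 3 5
2 . 4
3 6 .
4 2 7
5 . .
;
run;

/* One indicator per variable; MISSING() returns 1 if the value is missing, else 0 */
data toy_r;
   set toy;
   r_x = missing(x);
   r_y = missing(y);
run;

/* Tabulate the missing-data patterns (rows of R) */
proc freq data=toy_r;
   tables r_x*r_y / list;
run;
```

Each distinct combination of `r_x` and `r_y` in the PROC FREQ output corresponds to one missing-data pattern.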
3
Missing data: MCAR
$P(R \mid D) = P(R \mid D_{obs}, D_{mis}) = P(R)$
$R \perp (D_{obs}, D_{mis})$
Example: [m-graph with nodes X, Y, Y*, R_y, ε_y, ε_R]
4
Missing data: MAR
$P(R \mid D) = P(R \mid D_{obs}, D_{mis}) = P(R \mid D_{obs})$
$R \perp D_{mis} \mid D_{obs}$
Examples: [two m-graphs with nodes X, Y, Y*, R_y, ε_y, ε_R; the second adds a node A]
5
Missing data: NMAR or MNAR
$P(R \mid D) = P(R \mid D_{obs}, D_{mis}) \neq P(R \mid D_{obs})$
$R \not\perp D_{mis} \mid D_{obs}$
Examples: [two m-graphs with nodes X, Y, Y*, R_y, ε_y, ε_R; the second adds a node L]
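To make the three mechanisms concrete, the sketch below simulates a fully observed X and a partially observed Y, then deletes Y values under MCAR, MAR, and MNAR. It is not taken from the slides: the seed, sample size, coefficients, variable names, and the coding of R (1 = missing) are arbitrary choices for illustration.

```sas
/* Illustrative simulation; all constants and names are assumptions */
data mechdemo;
   call streaminit(554);
   do id = 1 to 5000;
      x = rand('normal');              /* fully observed variable      */
      y = 0.6*x + rand('normal');      /* variable subject to deletion */

      /* MCAR: missingness is independent of x and y */
      r_mcar = rand('bernoulli', 0.3);
      /* MAR: missingness depends only on the observed x */
      r_mar  = rand('bernoulli', logistic(-1 + 1.5*x));
      /* MNAR: missingness depends on the value of y itself */
      r_mnar = rand('bernoulli', logistic(-1 + 1.5*y));

      /* Observed versions of y (missing where the indicator is 1) */
      if r_mcar then y_mcar = .; else y_mcar = y;
      if r_mar  then y_mar  = .; else y_mar  = y;
      if r_mnar then y_mnar = .; else y_mnar = y;
      output;
   end;
run;

/* Compare the complete-data mean of y with the observed-case means */
proc means data=mechdemo n mean std maxdec=3;
   var y y_mcar y_mar y_mnar;
run;
```

In a setup like this, the observed-case mean under MCAR should stay close to the complete-data mean, while under MAR and MNAR it should be pulled downward because larger values are more likely to be deleted. Under MAR the distortion can be addressed by conditioning on X; under MNAR it cannot be addressed from the observed data alone.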
6
%include 'd:\m554\programs\jitter.sas';

TITLE 'Attitude toward using coupons -- data screening';

DATA coupon;
   INFILE 'd:\m554\DataScreening\cfa.dat';
   INPUT id aa1t1 aa2t1 aa3t1 aa4t1 aa1t2 aa2t2 aa3t2 aa4t2;
run;

/* keep only the time-1 items */
DATA coupont1;
   SET coupon(keep=id aa1t1 aa2t1 aa3t1 aa4t1);
run;

/* add jittered copies of the four items for plotting */
%jitter(data=coupont1, out=coupont1,
        var=aa1t1 aa2t1 aa3t1 aa4t1,
        new=jaa1t1 jaa2t1 jaa3t1 jaa4t1);
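The source of the %jitter macro is not shown in the slides. Assuming its purpose is to add a small random offset to discrete ratings so that tied points separate in a scatterplot, the idea can be sketched in a plain DATA step. The data set `ratings`, the variables `item1` and `item2`, and the jitter width of 0.4 are hypothetical.

```sas
/* Hypothetical 7-point ratings with many tied values */
data ratings;
   input id item1 item2;
   datalines;
1 4 4
2 4 4
3 4 5
4 5 5
5 5 5
6 6 7
7 6 7
8 7 7
;
run;

/* Add uniform noise in (-0.2, 0.2) to each rating; originals are kept */
data ratings_j;
   set ratings;
   call streaminit(554);   /* only the first call sets the seed */
   jitem1 = item1 + 0.4*(rand('uniform') - 0.5);
   jitem2 = item2 + 0.4*(rand('uniform') - 0.5);
run;

/* Tied observations now appear as small clusters instead of single points */
proc sgplot data=ratings_j;
   scatter x=jitem1 y=jitem2;
run;
```

The jittered copies are meant for display only; any statistics should still be computed from the original items.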
7
title 'proc univariate for coupon data';
proc univariate plot normal;
   var aa1t1 aa2t1 aa3t1 aa4t1;
   histogram aa1t1 aa2t1 aa3t1 aa4t1 /
      normal (mu=est sigma=est color=red w=2.5)
      midpoints = 1 to 7 by 1;
   /* reference-line options assumed to mirror the histogram statement */
   probplot aa1t1 aa2t1 aa3t1 aa4t1 /
      normal (mu=est sigma=est color=red w=2.5);
   qqplot aa1t1 aa2t1 aa3t1 aa4t1;
run;

proc sgscatter data=coupont1;
   title 'Scatterplot Matrix for coupon data';
   matrix aa1t1 aa2t1 aa3t1 aa4t1 /
      diagonal=(histogram normal)
      ellipse=(type=predicted);
run;
8
/*
proc sgplot data=coupont1;
   title 'jittered scatterplot';
   scatter x=aa1t1 y=aa2t1 / jitter;
   ellipse x=aa1t1 y=aa2t1;
run;
*/
9
%include 'd:\m554\programs\outlier.sas';
%include 'd:\m554\programs\label.sas';
%include 'd:\m554\programs\cqplot.sas';
%let devtyp=SCREEN;

TITLE 'Attitude toward using coupons -- data screening';

DATA coupon;
   INFILE 'd:\m554\DataScreening\cfa.dat';
   INPUT id aa1t1 aa2t1 aa3t1 aa4t1 aa1t2 aa2t2 aa3t2 aa4t2;
run;

/* keep only the time-1 items */
DATA coupont1;
   SET coupon(keep=id aa1t1 aa2t1 aa3t1 aa4t1);
run;

title 'Multivariate outlier detection - 5 passes';
%outlier(data=coupont1,
         var=aa1t1 aa2t1 aa3t1 aa4t1,
         id=id, pvalue=.0002, passes=5);
run;
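The %outlier macro is not reproduced in the slides, so its internals are not known here. A common basis for multivariate outlier screening, and the one assumed in this sketch, is the squared Mahalanobis distance of each observation from the centroid, referred to a chi-square distribution with degrees of freedom equal to the number of variables. With the four items and pvalue=.0002 from the call above, the cutoff can be worked out in closed form. The symbols $x_i$, $\bar{x}$, $S$, and $c$ are notation introduced for this illustration.

```latex
% Squared Mahalanobis distance of observation i (p = 4 variables)
\[
  D_i^2 = (x_i - \bar{x})^{\top} S^{-1} (x_i - \bar{x}),
  \qquad D_i^2 \;\dot\sim\; \chi^2_{4}
\]

% Upper-tail probability of a chi-square variate with 4 degrees of freedom
\[
  P\!\left(\chi^2_{4} > c\right) = e^{-c/2}\left(1 + \frac{c}{2}\right)
\]

% Setting the tail probability to the p-value used in the macro call
\[
  e^{-c/2}\left(1 + \frac{c}{2}\right) = 0.0002
  \quad\Longrightarrow\quad c \approx 22.0
\]
```

Under this assumption, an observation would be flagged when its squared distance exceeds roughly 22.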