Hans Baumgartner Penn State University


Data Screening

Missing data (Thoemmes and Mohan 2015)
Data matrix: $D_{N \times K} = \{D_{obs}, D_{mis}\}$
0/1 indicator matrix of missingness: $R_{N \times K}$
M-graphs (node types): fully observed variable; fully unobserved variable; partially observed variable; observed portion of a variable with missing data (marked with *, e.g., Y*)

Missing data: MCAR
$P(R \mid D) = P(R \mid D_{obs}, D_{mis}) = P(R)$
$R \perp (D_{obs}, D_{mis})$
Example: m-graph with nodes X, Y, Y*, $R_y$, $\varepsilon_y$, $\varepsilon_R$

Missing data: MAR
$P(R \mid D) = P(R \mid D_{obs}, D_{mis}) = P(R \mid D_{obs})$
$R \perp D_{mis} \mid D_{obs}$
Examples: two m-graphs with nodes X, Y, Y*, $R_y$, $\varepsilon_y$, $\varepsilon_R$, one of which includes an additional variable A

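Missing data: NMAR or MNAR
$P(R \mid D) = P(R \mid D_{obs}, D_{mis}) \neq P(R \mid D_{obs})$
$R \not\perp D_{mis} \mid D_{obs}$
Examples: two m-graphs with nodes X, Y, Y*, $R_y$, $\varepsilon_y$, $\varepsilon_R$, one of which includes an additional variable L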
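The three mechanisms can be made concrete with a small worked parameterization. This is a hypothetical illustration, not taken from the slides: X is assumed fully observed, Y partially observed, $R_y = 1$ is assumed to mean "Y is missing", and the logistic form and the symbols $\pi$, $\alpha$, $\beta$, $\gamma$ are chosen only for the example.

```latex
\begin{align*}
\text{MCAR:} \quad & P(R_y = 1 \mid X, Y) = \pi \\
\text{MAR:}  \quad & P(R_y = 1 \mid X, Y) = \operatorname{logit}^{-1}(\alpha + \beta X) \\
\text{MNAR:} \quad & P(R_y = 1 \mid X, Y) = \operatorname{logit}^{-1}(\alpha + \gamma Y), \qquad \gamma \neq 0
\end{align*}
```

Under MCAR the probability of missingness is a constant; under MAR it varies only with the observed X; under MNAR it depends on the value of Y itself, including values that end up missing, so conditioning on the observed data does not remove the dependence.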

%include 'd:\m554\programs\jitter.sas';

TITLE 'Attitude toward using coupons -- data screening';

/* Read the id and the eight items (aa1-aa4 at t1 and t2) */
DATA coupon;
  INFILE 'd:\m554\DataScreening\cfa.dat';
  INPUT id aa1t1 aa2t1 aa3t1 aa4t1 aa1t2 aa2t2 aa3t2 aa4t2;
run;

/* Keep only the t1 items */
DATA coupont1;
  SET coupon(keep=id aa1t1 aa2t1 aa3t1 aa4t1);
run;

/* Add jittered copies of the t1 items for plotting */
%jitter(data=coupont1,out=coupont1,var=aa1t1 aa2t1 aa3t1 aa4t1,new=jaa1t1 jaa2t1 jaa3t1 jaa4t1);

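title 'proc univariate for coupon data';
proc univariate data=coupont1 plot normal;
  var aa1t1 aa2t1 aa3t1 aa4t1;
  histogram aa1t1 aa2t1 aa3t1 aa4t1 / normal (mu=est sigma=est color=red w=2.5)
                                      midpoints = 1 to 7 by 1;
  /* The options for the next two statements were truncated on the slide;
     they are assumed here to mirror the histogram's normal() options */
  probplot aa1t1 aa2t1 aa3t1 aa4t1 / normal (mu=est sigma=est color=red w=2.5);
  qqplot aa1t1 aa2t1 aa3t1 aa4t1 / normal (mu=est sigma=est color=red w=2.5);
run;

proc sgscatter data=coupont1;
  title 'Scatterplot Matrix for coupon data';
  matrix aa1t1 aa2t1 aa3t1 aa4t1 / diagonal=(histogram normal) ellipse=(type=predicted);
run;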

/*
proc sgplot data=coupont1;
  title 'jittered scatterplot';
  scatter x=aa1t1 y=aa2t1 / jitter;
  ellipse x=aa1t1 y=aa2t1;
run;
*/

%include 'd:\m554\programs\outlier.sas';
%include 'd:\m554\programs\label.sas';
%include 'd:\m554\programs\cqplot.sas';
%let devtyp=SCREEN;

TITLE 'Attitude toward using coupons -- data screening';

DATA coupon;
  INFILE 'd:\m554\DataScreening\cfa.dat';
  INPUT id aa1t1 aa2t1 aa3t1 aa4t1 aa1t2 aa2t2 aa3t2 aa4t2;

DATA coupont1;
  SET coupon(keep=id aa1t1 aa2t1 aa3t1 aa4t1);

title 'Multivariate outlier detection - 5 passes';
%outlier(data=coupont1, var=aa1t1 aa2t1 aa3t1 aa4t1, id=id, pvalue=.0002, passes=5);
run;
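The slides do not show the internals of the %outlier macro, so the following is only a single-pass sketch of one common approach to multivariate outlier screening, not the macro's algorithm. It computes each observation's squared Mahalanobis distance as the sum of squared standardized principal component scores and compares it with a chi-square distribution with 4 degrees of freedom. The data set names pcs and mahal and the variables d2, pval, and flag are hypothetical; the cutoff reuses pvalue=.0002 from the macro call above.

```sas
/* Standardized principal component scores (unit variance, via STD) */
proc princomp data=coupont1 std out=pcs noprint;
  var aa1t1 aa2t1 aa3t1 aa4t1;
run;

data mahal;
  set pcs;
  /* Sum of squared standardized scores = squared Mahalanobis distance */
  d2 = uss(of prin1-prin4);
  /* Guard against missing scores: in SAS a missing value compares
     as smaller than any number, which would flag the case wrongly */
  if not missing(d2) then do;
    pval = 1 - probchi(d2, 4);   /* upper-tail chi-square probability */
    flag = (pval < .0002);       /* 1 = candidate multivariate outlier */
  end;
run;

proc print data=mahal;
  where flag = 1;
  var id aa1t1 aa2t1 aa3t1 aa4t1 d2 pval;
run;
```

A multi-pass procedure such as the one requested with passes=5 would presumably repeat this kind of calculation after down-weighting or removing flagged cases, so that extreme observations do not distort the means and covariances used to compute the distances.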