Data Analysis Using SAS


1 Data Analysis Using SAS
SAS Workshop Data Analysis Using SAS Hun Myoung Park, Ph.D. University Information Technology Services Center for Statistical and Mathematical Computing Sunday, September 16, 2018 © The Trustees of Indiana University (812)

2 Outline
Descriptive Statistics
Chi-Square Test
Measure of Association
T-TEST
Analysis of Variance
Correlation Analysis
Ordinary Least Squares (OLS)
Binary Logit and Probit Models
Panel Data Models

3 OUTPUT DELIVERY SYSTEM
ODS controls SAS output (format, styles, etc.).
The HTML format is especially useful for data conversion and graphics.
    ODS HTML FILE='c:\temp\test.html';
    PROC …; …;
    ODS HTML CLOSE;
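The ODS pattern above can be sketched with a concrete procedure in place of the elided one. This is only an illustration: the relative path 'test.html' and the use of the built-in sashelp.class data set are assumptions, not part of the workshop materials.

```sas
/* Open the HTML destination; 'test.html' is a hypothetical output path. */
ODS HTML FILE='test.html';

/* Any procedure run here is written to the HTML file.                  */
/* sashelp.class ships with SAS and stands in for the workshop data.    */
PROC MEANS DATA=sashelp.class N MEAN STD;
   VAR height weight;
RUN;

/* Close the destination so the file is completely written. */
ODS HTML CLOSE;
```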

4 DESCRIPTIVE STATISTICS
You MUST describe and examine data sets of interest carefully before conducting analyses.
PROC REPORT
PROC SUMMARY
PROC UNIVARIATE
PROC MEANS
PROC FREQ
PROC TABULATE
PROC PLOT
PROC CHART

5 PROC REPORT
Provides the contents and summary statistics of data sets in many flexible ways.
    PROC REPORT DATA=sm.airline NOWD HEADLINE HEADSKIP;
    COLUMN airline year cost output fuel;
    DEFINE airline / ORDER;
    DEFINE year / ORDER;
    DEFINE cost / ANALYSIS MEAN;
    DEFINE output / ANALYSIS MEAN;
    DEFINE fuel / ANALYSIS MEAN;
    BREAK AFTER airline / OL SUMMARIZE SKIP;
    RBREAK AFTER / DOL SUMMARIZE;
    RUN;

6 PROC SUMMARY
Provides descriptive statistics of variables.
    PROC SUMMARY DATA=sm.cancer PRINT;
    VAR cigar bladder lung kidney;
    RUN;

7 PROC MEANS
Like PROC SUMMARY, this procedure provides various descriptive statistics.
    PROC MEANS DATA=sm.grade7;
    VAR stat math;
    PROC MEANS DATA=sm.grade7 N SUM MEAN VAR;
Conducts a one-sample t-test (the T keyword tests whether the mean is 0):
    PROC MEANS DATA=sm.grade7 T STD STDERR;
    RUN;
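Because the T keyword in PROC MEANS tests a mean against 0, testing against another constant requires shifting the variable first. A minimal sketch, assuming the built-in sashelp.class data set and a hypothetical null value of 60; neither comes from the slides.

```sas
/* Subtract the hypothesized mean so that H0: mean(height) = 60 */
/* becomes H0: mean(diff) = 0. The value 60 is hypothetical.     */
DATA work.shifted;
   SET sashelp.class;
   diff = height - 60;
RUN;

/* T is the t statistic for H0: mean = 0; PROBT is its two-sided p-value. */
PROC MEANS DATA=work.shifted N MEAN STDERR T PROBT;
   VAR diff;
RUN;
```

PROC TTEST with H0= or PROC UNIVARIATE with MU0= reaches the same test without the extra DATA step.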

8 PROC UNIVARIATE
Provides various descriptive statistics.
    PROC UNIVARIATE DATA=sm.airline;
    VAR cost output;
    RUN;
Conducts normality tests and a one-sample t-test:
    PROC UNIVARIATE DATA=sm.airline NORMAL PLOT;
    VAR cost;

9 PROC UNIVARIATE (Q-Q)
Provides Q-Q plots.
    PROC UNIVARIATE DATA=sm.airline;
    VAR cost;
    QQPLOT cost /NORMAL;
    RUN;
PROC CAPABILITY provides P-P plots as well:
    PROC CAPABILITY DATA=sm.airline NORMAL;
    PPPLOT cost /NORMAL;

10 PROC FREQ
Produces frequency tables of the variables listed.
    PROC FREQ DATA=sm.airline;
    TABLES airline year;
Produces contingency tables (cross-tabulations) when * is placed between variables.
    PROC FREQ DATA=sm.cancer;
    TABLES area*smoke / NOROW;
    RUN;
NOROW, NOCOL, and NOPERCENT suppress the row, column, and total percentages in each cell, respectively.

11 PROC TABULATE
PROC TABULATE produces various statistics in table form.
TABULATE can control formats and table layouts in a sophisticated way.
Useful when summarizing and examining data sets.
    PROC TABULATE DATA=sm.airline F=12.3;
    CLASS airline;
    VAR cost;
    TABLE airline,cost*(N MEAN STD)*(F=9.2);
    LABEL cost='Cost of Airline';

12 PROC PLOT
Produces a plot of two variables.
    PROC PLOT DATA=sm.cancer;
    PLOT lung*cigar;
    RUN;
Overlays two plots with different plotting symbols:
    PLOT lung*cigar='%' kidney*cigar='*' / OVERLAY;

13 PROC CHART
Produces various vertical and horizontal charts with many options.
    PROC CHART DATA=sm.cancer;
    HBAR cigar /TYPE=PERCENT;
    VBAR lung / GROUP = smoke TYPE=MEAN;
    BLOCK area / GROUP = smoke TYPE=MEAN SUMVAR=lung NOHEADER SYMBOL='X';
    RUN;

14 CHI-SQUARE TEST 1
The chi-square test examines whether two variables are independent.
PROC FREQ conducts the chi-square test with the /CHISQ option.
    PROC FREQ DATA=sm.cancer;
    TABLES area*smoke /CHISQ;
The expected frequency of each cell should be at least 5; otherwise, the chi-square test is not reliable. The EXPECTED option prints the expected frequencies.
    TABLES area*smoke /CHISQ EXPECTED;
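When EXPECTED shows cells with small expected frequencies, a commonly used alternative is Fisher's exact test, which PROC FREQ can report alongside the chi-square test. The slides do not cover this, so the following is only a sketch; it assumes the built-in sashelp.class data set rather than the workshop data.

```sas
/* CHISQ requests the chi-square test, EXPECTED prints expected cell  */
/* frequencies, and FISHER adds Fisher's exact test for the table.    */
/* sashelp.class is a stand-in; its sex*age table has sparse cells.   */
PROC FREQ DATA=sashelp.class;
   TABLES sex*age / CHISQ EXPECTED FISHER;
RUN;
```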

15 CHI-SQUARE TEST 2
A measure of association tells the strength of the relationship. The MEASURES option is needed.
    PROC FREQ DATA=sm.cancer;
    TABLES area*smoke /CHISQ MEASURES;
    RUN;
If both variables are ordinal, read gamma (ranging from -1 to 1).
Otherwise (at least one variable is nominal), read lambda (ranging from 0 to 1).

16 T-TEST 1
T-test compares group means.

17 T-TEST 2
The one-sample t-test examines whether the mean of a variable equals 0 or a hypothesized constant.
Use PROC TTEST, UNIVARIATE, and MEANS.
    TITLE2 'One Sample T-Test';
    PROC TTEST H0=20 ALPHA=.01 DATA=sm.cancer;
    VAR lung;
    PROC UNIVARIATE MU0=20 VARDEF=DF NORMAL ALPHA=.01 DATA=sm.cancer;
    RUN;

18 T-TEST 3
Use the PAIRED statement for paired t-tests.
Data should be arranged in the wide format.
    PROC TTEST DATA=sm.cancer;
    PAIRED lung*kidney;
    RUN;
Use the * and : operators to specify several pairs at once; the statement below pairs lung with kidney and lung with bladder.
    PROC TTEST H0=3 DATA=sm.cancer;
    PAIRED (lung)*(kidney bladder);
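The wide-format requirement can be illustrated with a toy data set in which each row holds both measurements for one subject. The data set work.wide, its variable names, and its values are made up purely for illustration. The sketch also shows that a paired t-test is equivalent to a one-sample t-test on the within-pair difference.

```sas
/* Hypothetical wide-format data: one row per subject, two measurements. */
/* All values are invented for illustration only.                        */
DATA work.wide;
   INPUT id before after;
   diff = before - after;   /* within-pair difference */
   DATALINES;
1 12 10
2 15 14
3 11 12
4 14 11
5 13 12
;
RUN;

/* Paired t-test on before - after. */
PROC TTEST DATA=work.wide;
   PAIRED before*after;
RUN;

/* One-sample t-test on the difference; the t statistic should match. */
PROC TTEST DATA=work.wide H0=0;
   VAR diff;
RUN;
```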

19 T-TEST 4
Independent sample t-test. Data should be arranged in the long form.
    PROC TTEST H0=0 ALPHA=.05 DATA=sm.cancer;
    CLASS smoke;
    VAR lung;
Check the F-test for equal variance. Read Pooled T in case of equal variance.
    PROC TTEST COCHRAN DATA=sm.cancer;
    CLASS west;
    VAR kidney;

20 ANALYSIS OF VARIANCE 1
Use PROC ANOVA, GLM, and MIXED. The CLASS and MODEL statements are the same in each procedure.
    PROC ANOVA DATA=sm.cancer;
    CLASS smoke;
    MODEL lung=smoke;
    PROC GLM DATA=sm.cancer;
    CLASS smoke;
    MODEL lung=smoke;
    PROC MIXED DATA=sm.cancer;
    CLASS smoke;
    MODEL lung=smoke;
    RUN;

21 ANALYSIS OF VARIANCE 2
PROC ANOVA handles balanced data, while GLM and MIXED handle both balanced and unbalanced data.
GLM and MIXED are generally recommended for complex models.
    PROC GLM DATA=sm.cancer;
    CLASS smoke area;
    MODEL lung=smoke area /SS3;
    RUN;

22 CORRELATION ANALYSIS
Pearson correlation coefficients for interval variables:
    PROC CORR DATA=sm.airline PEARSON COV;
    VAR cost output fuel load;
    RUN;
For ordinal variables, add the SPEARMAN and/or KENDALL options to the PROC CORR statement instead of PEARSON.
    PROC CORR DATA=sm.airline SPEARMAN KENDALL;

23 ORDINARY LEAST SQUARES 1
Classical linear regression model or ordinary least squares (OLS).
Has many strong assumptions such as linearity, constant variance, and independent variables that are not related to errors.
Use PROC REG with the MODEL statement.
    PROC REG DATA=sm.airline;
    MODEL cost = output fuel load;
    RUN;

24 ORDINARY LEAST SQUARES 2
Imposing restrictions:
    PROC REG DATA=sm.airline;
    MODEL cost = output fuel load /NOINT;
    MODEL cost = output fuel load;
    RESTRICT load=1;
Hypothesis test (Wald test):
    TEST output=0;
    RUN;

25 ORDINARY LEAST SQUARES 3
Get residuals (R) and the Durbin-Watson statistic (DW) for AR(1):
    PROC REG DATA=sm.airline;
    MODEL cost = output fuel load /R DW;
    RUN;
Check multicollinearity:
    MODEL cost = output fuel load /COLLIN VIF TOL;
Serious multicollinearity is indicated by tolerance < (1 - R^2) or .1, VIF > 10, eigenvalue < .01, condition index > 50, or variance proportion > .8.
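Residuals and predicted values can also be saved to a data set for further checking rather than only printed. A minimal sketch, assuming the built-in sashelp.class data set; the output data set name work.diag and the variable names predicted and residual are hypothetical. DW is left out here because sashelp.class is not time-ordered.

```sas
/* Fit OLS with collinearity diagnostics and save predictions/residuals. */
/* sashelp.class stands in for the workshop's airline data.              */
PROC REG DATA=sashelp.class;
   MODEL weight = height age / VIF TOL COLLIN;
   OUTPUT OUT=work.diag P=predicted R=residual;
RUN;
QUIT;   /* PROC REG is interactive; QUIT ends it */

/* Inspect the first few saved rows. */
PROC PRINT DATA=work.diag (OBS=5);
   VAR name weight predicted residual;
RUN;
```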

26 ORDINARY LEAST SQUARES 4
OLS has strong assumptions that are easily violated in the real world.
PROC NLIN for nonlinear models
PROC SYSLIN for equation systems with correlated errors
PROC AUTOREG and ARIMA for autocorrelation
PROC LOGISTIC and QLIM for categorical dependent variables

27 LOGIT/PROBIT MODELS 1
Use PROC LOGISTIC, PROBIT, QLIM, and GENMOD to fit logit and probit models.
    PROC LOGISTIC DESCENDING DATA = sm.trust;
    MODEL trust = educate income age male;
    PROC PROBIT DATA = sm.trust;
    MODEL trust = educate income age male /DIST=LOGISTIC;
    PROC QLIM DATA=sm.trust;
    MODEL trust = educate income age male /DISCRETE(DIST=LOGIT);
    PROC GENMOD DATA = sm.trust DESC;
    MODEL trust = educate income age male /DIST=BINOMIAL LINK=LOGIT;
    RUN;
Without the DESCENDING option, LOGISTIC models the lower response level and produces opposite signs.

28 LOGIT/PROBIT MODELS 2
Compute odds ratios using the UNITS statement.
    PROC LOGISTIC DATA = sm.trust;
    MODEL trust(EVENT='1') = educate income age male;
    UNITS educate=SD income=SD age=SD;
    RUN;
For a one standard deviation increase in x, the odds of the outcome being 1 are expected to change by a factor of exp(b_hat*sd), the odds ratio.
Marginal effects must be computed separately.
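The factor exp(b_hat*sd) follows directly from the logit model. A short derivation, where the symbols $p$, $\beta_k$, and $s_k$ (the standard deviation of $x_k$) are notation introduced here rather than taken from the slides; the last line gives the standard marginal effect for a logit model, which is what has to be computed by hand.

```latex
\begin{align}
\ln\frac{p}{1-p} &= \beta_0 + \sum_k \beta_k x_k \\
\frac{\mathrm{odds}(x_k + s_k)}{\mathrm{odds}(x_k)}
  &= \frac{\exp\bigl(\beta_0 + \beta_k (x_k + s_k) + \dots\bigr)}
          {\exp\bigl(\beta_0 + \beta_k x_k + \dots\bigr)}
   = \exp(\beta_k s_k) \\
\frac{\partial p}{\partial x_k} &= \beta_k \, p\,(1-p)
\end{align}
```

Unlike the odds ratio, the marginal effect depends on $p$ and therefore on where the covariates are evaluated.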

29 LOGIT/PROBIT MODELS 3
Estimate probit models.
    PROC PROBIT DATA = sm.trust;
    MODEL trust = educate income age male;
    PROC LOGISTIC DATA = sm.trust DESC;
    MODEL trust = educate income age male /LINK=PROBIT;
    PROC QLIM DATA=sm.trust;
    MODEL trust = educate income age male /DISCRETE (DIST=NORMAL);
    RUN;
LOGISTIC and QLIM are recommended.

30 PANEL DATA MODELS 1
Fixed effect model assumes different intercepts among groups or periods.
Fixed effect model is in fact a dummy variable least squares model.
Random effect model assumes different variances among groups or periods.
In SAS, PROC PANEL and TSCSREG fit fixed and random effect models. PROC PANEL is preferred.
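The distinction can be written out for the one-way group effect case. The notation ($u_i$ for the group effect, $v_{it}$ for the idiosyncratic error) is an assumption introduced here, not taken from the slides.

```latex
\begin{align}
\text{Fixed effect:}  \quad y_{it} &= (\alpha + u_i) + x_{it}'\beta + v_{it} \\
\text{Random effect:} \quad y_{it} &= \alpha + x_{it}'\beta + (u_i + v_{it})
\end{align}
```

In the fixed effect model $u_i$ is part of the intercept, so each group gets its own intercept; in the random effect model $u_i$ is part of the composite error, so groups differ through the error variance components.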

31 PANEL DATA MODELS 2
A dummy variable least squares model:
    PROC REG DATA=sm.airline;
    MODEL cost = g1-g5 output fuel load;
    RUN;
Fixed effect model using PROC PANEL, which fits the adjusted within-effect model:
    PROC PANEL DATA=sm.airline;
    ID airline year;
    MODEL cost = output fuel load /FIXONE;
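The dummy variables g1-g5 used in the PROC REG model have to exist in the data set. One way they might be created is sketched below. It assumes that the libref sm is already assigned and that airline is a numeric identifier coded 1 through 6, so that the sixth airline is the omitted baseline; neither assumption is confirmed by the slides. The data set name work.airline_d is hypothetical.

```sas
/* Create five group dummies; airline 6 is the reference group.        */
/* Assumes airline is numeric and coded 1-6 (not stated in the slides). */
DATA work.airline_d;
   SET sm.airline;
   ARRAY g{5} g1-g5;
   DO i = 1 TO 5;
      g{i} = (airline = i);   /* 1 if the row belongs to airline i, else 0 */
   END;
   DROP i;
RUN;
```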

32 PANEL DATA MODELS 3
Random effect model using PROC PANEL and TSCSREG; the ID and MODEL statements are the same in both procedures.
    PROC PANEL DATA=sm.airline;
    ID airline year;
    MODEL cost = output fuel load /RANONE;
    RUN;
    PROC TSCSREG DATA=sm.airline;
    ID airline year;
    MODEL cost = output fuel load /RANONE;
    RUN;

33 FACTOR ANALYSIS 1
Extract a small number of factors (latent variables) out of many manifest variables (observed variables).
    PROC FACTOR DATA=sm.survey;
    VAR q1-q20;
    RUN;
Specify the rotation method (e.g., VARIMAX, PARSIMAX, EQUAMAX, or PROMAX) in ROTATE= or R= and the number of factors in NFACTORS= or N=.
    PROC FACTOR DATA=sm.survey ROTATE=VARIMAX NFACTORS=3;

34 FACTOR ANALYSIS 2
METHOD= (or M=) selects the method of extracting factors: principal component analysis (the default), maximum likelihood (ML), or iterated principal factor analysis (PRINIT).
    PROC FACTOR DATA=sm.survey METHOD=ML R=PROMAX N=3;
    VAR q1-q20;
Store factor scores using OUT=. Variables Factor1, Factor2, Factor3, … are created in the data set.
    PROC FACTOR DATA=sm.survey M=ML R=VARIMAX N=3 OUT=sm.surveyScore;

35 RELIABILITY TEST
PROC CORR produces Cronbach's coefficient alpha with the ALPHA option.
    PROC CORR DATA=sm.survey ALPHA NOMISS;
    VAR q1-q20;
    RUN;
NOMISS excludes observations with missing values.
An alpha larger than .8 indicates high reliability of measurement.
The section labeled "Cronbach Coefficient Alpha with Deleted Variable" lists the alpha that results if each variable is removed.
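For reference, the raw coefficient alpha is conventionally defined as below. The symbols are introduced here for illustration: $k$ is the number of items (20 for q1-q20), $\sigma_i^2$ is the variance of item $i$, and $\sigma_T^2$ is the variance of the total score across all items.

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_T^2}\right)
```

Alpha rises as the items covary more strongly, because the total-score variance then exceeds the sum of the item variances by a wider margin.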


