Lecture 3
Topic - Descriptive Procedures
Programs 3-4
LSB 4:1-4:4; 4:9-4:11; 8:1-8:5; 5:1-5:2
Descriptive Procedures In SAS
Syntax for Procedures

  PROC PROCNAME DATA=datasetname [options] ;
    substatements / [options] ;
  RUN;

The WHERE statement is a useful substatement available to all procedures.

  PROC PRINT DATA=demo ;
    VAR marstat ;
    WHERE state = 'MN';
  RUN;
Data Layout of tomhs.data (In Course Notes)

  Variable   Type   Len   Pos   Inform      Description
  PTID       Char   10    1     $10.        Patient ID
  CLINIC     Char   1     12    $1.         Clinical center
  RANDDATE   Num    6     14    mmddyy10.   Randdate
  SBPBL      Num    3     115   3.          SBP at baseline

  DATA tomhs;
    INFILE 'folderpath\tomhs.data';
    INPUT @1   ptid     $10.
          @12  clinic   $1.
          @14  randdate mmddyy10.
          @115 sbpbl    3. ;
  RUN;

Note: You can give any legal variable name.
Program 3

  DATA weight;
    INFILE 'C:\SAS_Files\tomhs.data' ;
    * Column pointers and informats for each variable follow the data layout in the course notes ;
    INPUT ptid clinic group sex height weight sbpbl sbp12 3.;
    bmi = (weight* )/(height*height);
    sbpchg = sbp12 - sbpbl;
    IF group = 6 THEN active = 2;
    ELSE active = 1;
  RUN;
  PROC PRINT DATA = weight (OBS=5) NOOBS;
    TITLE 'Proc Print: Five observations from the TOMHS Study';
  RUN;

  PROC MEANS DATA = weight;
    VAR height weight bmi;
    TITLE 'Proc Means Example 1';
  RUN;

  PROC MEANS DATA = weight MEAN MEDIAN STD MAXDEC=2;
    VAR height weight bmi;
    TITLE 'Proc Means Example 2 (specifying options)';
  RUN;
Proc Print: Five observations from the TOMHS Study

  ptid     clinic   sex   height   weight   bmi
  C03615   C
  B00979   B
  B00644   B
  D01348   D
  A01088   A

Proc Means Example 1
The MEANS Procedure

  Variable   N   Mean   Std Dev   Minimum   Maximum
  height
  weight
  bmi
Proc Means Example 2 (specifying options)
The MEANS Procedure

  Variable   Mean   Median   Std Dev
  height
  weight
  bmi
  PROC MEANS DATA = weight N MEAN STD MAXDEC=2 ;
    CLASS clinic;
    VAR height weight bmi;
    TITLE 'Proc Means Example 3 (Using a CLASS statement)';
  RUN;

  clinic   N Obs   Variable   N   Mean   Std Dev
  A        18      height
                   weight
                   bmi
  B        29      height
                   weight
                   bmi
  C        36      height
                   weight
                   bmi
  D        17      height
                   weight
                   bmi
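For comparison, here is a minimal sketch of getting the same by-clinic statistics with a BY statement instead of CLASS. BY-group processing requires the data to be sorted by the BY variable, so a PROC SORT step comes first. The dataset name weight_sorted and the title text are hypothetical, chosen only for this illustration.

```sas
/* Sort by clinic. weight_sorted is a hypothetical name for the sorted copy. */
PROC SORT DATA = weight OUT = weight_sorted;
  BY clinic;
RUN;

/* BY prints a separate table for each clinic, whereas CLASS prints one combined table. */
PROC MEANS DATA = weight_sorted N MEAN STD MAXDEC=2;
  BY clinic;
  VAR height weight bmi;
  TITLE 'Proc Means by clinic (BY statement sketch)';
RUN;
```

CLASS is usually the more convenient choice here because it needs no sorting step.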
  PROC TTEST DATA = weight ;
    VAR sbpchg;
    CLASS active;
    TITLE 'T-Test Comparing Active and Placebo Groups';
  RUN;

T-Test Comparing Active and Placebo Groups
The TTEST Procedure

Statistics
  Columns: Variable, active, N, Lower CL Mean, Mean, Upper CL Mean, Lower CL Std Dev, Std Dev, Std Err
  Rows: sbpchg (one row for each value of active) and sbpchg Diff (1-2)

T-Tests
  Variable   Method   Variances   DF   t Value   Pr > |t|
  sbpchg     Pooled   Equal
PROC UNIVARIATE

  PROC UNIVARIATE DATA = weight ;
    VAR bmi;
    ID ptid;
    TITLE 'Proc Univariate Example 1';
  RUN;

  * Note: PROC UNIVARIATE produces a lot of output ;
Proc Univariate Example 1
The UNIVARIATE Procedure
Variable: bmi

Moments
  N                 100    Sum Weights        100
  Mean                     Sum Observations
  Std Deviation            Variance
  Skewness                 Kurtosis
  Uncorrected SS           Corrected SS
  Coeff Variation          Std Error Mean

Basic Statistical Measures
  Location       Variability
  Mean           Std Deviation
  Median         Variance
  Mode           Range
                 Interquartile Range

Tests for Location: Mu0=0
  Test          -Statistic-     -----p Value-----
  Student's t   t               Pr > |t|    <.0001
  Sign          M      50       Pr >= |M|   <.0001
  Signed Rank   S      2525     Pr >= |S|   <.0001
Quantiles
  Quantile      Estimate
  100% Max
  99%
  95%
  90%
  75% Q3
  50% Median
  25% Q1
  10%
  5%
  1%
  0% Min

Extreme Observations
  ------Lowest------     ------Highest-----
  Value   ptid   Obs     Value   ptid   Obs
  (five lowest and five highest bmi values, each identified by ptid)
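Because PROC UNIVARIATE prints so many tables, it can help to request only the ones you need. The sketch below uses the ODS SELECT statement with the ODS table names Moments and Quantiles. The title text is hypothetical.

```sas
/* Keep only the Moments and Quantiles tables from the next procedure step. */
ODS SELECT Moments Quantiles;

PROC UNIVARIATE DATA = weight;
  VAR bmi;
  TITLE 'Proc Univariate: selected tables only';
RUN;
```

The selection applies only to the procedure step that follows it, so later procedures print their full output again.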
  * High resolution graphs can also be produced. The following makes a histogram and normal plot ;
  ODS GRAPHICS ON;
  PROC UNIVARIATE DATA = weight;
    VAR bmi;
    HISTOGRAM bmi / NORMAL MIDPOINTS=20 to 40 by 2;
    INSET N = 'N' (5.0) MEAN = 'Mean' (5.1) STD = 'Sdev' (5.1)
          MIN = 'Min' (5.1) MAX = 'Max' (5.1) / POS=NW HEADER='Summary Statistics';
    LABEL bmi = 'Body Mass Index (kg/m2)';
    TITLE 'Histogram of BMI';
    PROBPLOT bmi / NORMAL (MU=est SIGMA=est);
  RUN;
  * PROC SGPLOT can do several types of plots - here a boxplot ;
  PROC SGPLOT;
    VBOX bmi;   * Vertical boxplot;
    TITLE 'Boxplot of BMI';
  RUN;
  * Using SGPLOT to make side-by-side boxplots;
  PROC SGPLOT;
    TITLE "Boxplot of BMI for Men and Women";
    HBOX bmi / CATEGORY=sex;
  RUN;

Later we will see how to format the values 1 and 2 so they display as men and women.
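As a preview of that formatting step, here is a minimal sketch using PROC FORMAT. The format name sexf is hypothetical. The coding 1 = men and 2 = women is an assumption, inferred from the women-only analyses in this lecture that select sex = 2.

```sas
/* sexf is a hypothetical format name. Coding assumed: 1 = men, 2 = women. */
PROC FORMAT;
  VALUE sexf 1 = 'Men'
             2 = 'Women';
RUN;

/* The FORMAT statement changes only how sex is displayed, not the stored values. */
PROC SGPLOT DATA = weight;
  HBOX bmi / CATEGORY = sex;
  FORMAT sex sexf.;
  TITLE "Boxplot of BMI for Men and Women";
RUN;
```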
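Program 4

  DATA weight;
    INFILE 'C:\SAS_Files\tomhs.data' ;
    * Column pointers and informats for each variable follow the data layout in the course notes ;
    INPUT ptid clinic age sex height weight cholbl 3.0 ;
    bmi = (weight* )/(height*height);
  RUN;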
  PROC FREQ DATA=weight;
    TABLES sex clinic ;
    TITLE 'Frequency Distribution of Sex and Clinical Center';
  RUN;

Frequency Distribution of Sex and Clinical Center
The FREQ Procedure

  sex      Frequency   Percent   Cumulative Frequency   Cumulative Percent

  clinic   Frequency   Percent   Cumulative Frequency   Cumulative Percent
  A
  B
  C
  D
  * 2-Way Frequency Tables ;
  PROC FREQ DATA=weight;
    TABLES sex*clinic / CHISQ ;
    TITLE 'Cross Tabulation of Clinical Center and Sex';
  RUN;
Cross Tabulation of Clinical Center and Sex
The FREQ Procedure
Table of sex by clinic

Each cell lists four entries: Frequency, Percent, Row Pct, Col Pct.

  sex            clinic
                 A       B       C       D       Total
  1  Frequency   12      20      30      11      73
  2  Frequency   6       9       6       6       27
     Percent     6.00    9.00    6.00    6.00
  Total

Callout on the slide: Percent men in clinic A (the Col Pct entry in the first row under clinic A).
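To see how the four entries in a cell relate, the arithmetic for the clinic A cell in the first row (frequency 12) can be worked from the counts in the table. This assumes the first row is men, which matches the N = 27 reported for the women-only correlation later in the lecture. The overall total is 73 + 27 = 100.

```latex
\begin{align*}
\text{Percent} &= \frac{12}{100} \times 100 = 12.00 \\
\text{Row Pct} &= \frac{12}{73} \times 100 \approx 16.44 \\
\text{Col Pct} &= \frac{12}{12 + 6} \times 100 = \frac{12}{18} \times 100 \approx 66.67
\end{align*}
```

The column percent answers "what percent of clinic A patients are men", while the row percent answers "what percent of men are in clinic A".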
Statistics for Table of sex by clinic

  Statistic                      DF   Value   Prob
  ------------------------------------------------
  Chi-Square
  Likelihood Ratio Chi-Square
  Mantel-Haenszel Chi-Square
  Phi Coefficient
  Contingency Coefficient
  Cramer's V
USEFUL TABLE OPTIONS
  CHISQ - performs chi-square analyses for 2-way tables
  MISSING - includes missing data as a separate category
  LIST - makes condensed table (useful when looking at 3-way or higher tables)
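A short sketch of the LIST and MISSING options applied to the same sex by clinic table. The title text is hypothetical.

```sas
/* LIST prints one line per sex-by-clinic combination instead of a grid.   */
/* MISSING counts missing values of sex or clinic as their own category.   */
PROC FREQ DATA = weight;
  TABLES sex*clinic / LIST MISSING;
  TITLE 'Sex by Clinical Center in list format';
RUN;
```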
  * Using PROC SGPLOT for bar charts;
  PROC SGPLOT;
    VBAR clinic;
    TITLE "Vertical Bar Chart of Clinical Center";
    LABEL clinic = "Clinical Center";
  RUN;

The plot can be embedded in an HTML document or kept as a separate file. The file can be inserted in Office documents.
  * DATALABEL puts values on top of bar;
  PROC SGPLOT;
    YAXIS LABEL = "Mean Cholesterol" VALUES = (0 to 300 by 50);
    VBAR clinic / RESPONSE=cholbl STAT=MEAN DATALABEL ;
    TITLE 'Mean Cholesterol by Clinical Center';
    LABEL clinic = "Clinical Center";
  RUN;
  PROC SGPLOT DATA=weight;
    YAXIS LABEL = "Body Mass Index (BMI)" ;
    XAXIS LABEL = "Age (y)" ;
    REG X=age Y=bmi / CLM;
    WHERE sex = 2;
    TITLE 'Plot of BMI and Age for Women';
  RUN;
  PROC CORR DATA=weight;
    VAR bmi age;
    WHERE sex = 2;
    TITLE 'Correlation of BMI and Age for Women';
  RUN ;

Pearson Correlation Coefficients, N = 27
Prob > |r| under H0: Rho=0

         bmi   age
  bmi
  age

In each cell, the first entry is the correlation coefficient and the second is the p-value testing whether the correlation is significantly different from zero.
  ODS GRAPHICS ON;
  PROC REG DATA=weight ;
    MODEL bmi=age;
    WHERE sex = 2;
    TITLE 'Simple Linear Regression';
  RUN;

Partial Output

  Parameter Estimates
  Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
  Intercept                                                        <.0001
  age

Regression equation: bmi = (Intercept estimate) + (age estimate)*age

Note: PROC REG has many plotting options; see Ch 5, H, I. With ODS GRAPHICS ON, many plots are produced by default.
Fit plot from PROC REG
Exercise 3 See exercise 3 in course notes