Creating Summary Data Sets Ron Cody, Ed.D. Robert Wood Johnson Medical School.


1 Creating Summary Data Sets Ron Cody, Ed.D. Robert Wood Johnson Medical School

2 Test data set (CLINIC)

SUBJECT  GENDER  AGE_GROUP  BLOOD_TYPE   HR   SBP   DBP
   1       M         1          A        80   130    80
   2       M         1          B        68   128    70
   3       M         2          O         .   120    72
   4       M         1          A        48   140    86
   5       F         2          A        56   160    94
   6       F         1          B        60   109    64
   7       F         2          O        82   118    70
   8       F         2          O        64     .    76
   9       F         1          A        56     .    88
  10       F         1          B        88   188   110
  11       M         1          B        64   120    80
  12       M         2          B        62   120    76
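To run the examples on the following slides, the test data set can be recreated with a short DATA step. This is a minimal sketch, not the author's original program: it assumes GENDER and BLOOD_TYPE are character variables and AGE_GROUP is numeric, which is consistent with how missing values print in the later listings (blank for GENDER and BLOOD_TYPE, a period for AGE_GROUP).

```sas
/* Sketch: rebuild the CLINIC test data set from the listing above.   */
/* Assumption: GENDER and BLOOD_TYPE are character, the rest numeric. */
DATA CLINIC;
   INPUT SUBJECT GENDER $ AGE_GROUP BLOOD_TYPE $ HR SBP DBP;
DATALINES;
1 M 1 A 80 130 80
2 M 1 B 68 128 70
3 M 2 O . 120 72
4 M 1 A 48 140 86
5 F 2 A 56 160 94
6 F 1 B 60 109 64
7 F 2 O 82 118 70
8 F 2 O 64 . 76
9 F 1 A 56 . 88
10 F 1 B 88 188 110
11 M 1 B 64 120 80
12 M 2 B 62 120 76
;
RUN;
```

A period in the data lines is read as a missing numeric value, matching the missing HR and SBP entries in the listing.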

3 PROC MEANS vs. PROC SUMMARY

PROC MEANS DATA=data_set_name NOPRINT;

is equivalent to

PROC SUMMARY DATA=data_set_name;

4 Creating a SUMMARY Data Set Containing MEANS

PROC MEANS DATA=CLINIC NOPRINT;
/****************************************
Equivalent to
PROC SUMMARY DATA=CLINIC;
*****************************************/
   CLASS GENDER;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;

Listing of data set OUT1

Obs  GENDER  _TYPE_  _FREQ_     M_HR     M_SBP     M_DBP
 1              0      12     66.1818   133.300   80.5000
 2     F        1       6     67.6667   143.750   83.6667
 3     M        1       6     64.4000   126.333   77.3333

5 Using a BY Statement Instead of a CLASS Statement

PROC SORT DATA=CLINIC;
   BY GENDER;
RUN;

PROC MEANS DATA=CLINIC NOPRINT;
   BY GENDER;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;

Listing of data set OUT1

Obs  GENDER  _TYPE_  _FREQ_     M_HR     M_SBP     M_DBP
 1     F        0       6     67.6667   143.750   83.6667
 2     M        0       6     64.4000   126.333   77.3333

6 Creating a SUMMARY Data Set Containing MEANS Broken Down by GENDER and AGE_GROUP

PROC MEANS DATA=CLINIC NOPRINT;
   CLASS GENDER AGE_GROUP;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;

GENDER  AGE_GROUP  _TYPE_  _FREQ_     M_HR     M_SBP     M_DBP
            .         0      12     66.1818   133.300   80.5000
            1         1       7     66.2857   135.833   82.5714
            2         1       5     66.0000   129.500   77.6000
  F         .         2       6     67.6667   143.750   83.6667
  M         .         2       6     64.4000   126.333   77.3333
  F         1         3       3     68.0000   148.500   87.3333
  F         2         3       3     67.3333   139.000   80.0000
  M         1         3       4     65.0000   129.500   79.0000
  M         2         3       2     62.0000   120.000   74.0000

7 Explaining the _TYPE_ Variable

CLASS GENDER AGE_GROUP;

  Class Variables       Representation
GENDER   AGE_GROUP     Binary   Decimal
  0          0           00        0
  0          1           01        1
  1          0           10        2
  1          1           11        3
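One way to make the table above concrete is to attach a readable label to each _TYPE_ value. The step below is a sketch that assumes OUT1 was created as on slide 6 (CLASS GENDER AGE_GROUP, with neither NWAY nor CHARTYPE, so _TYPE_ is numeric). The data set name LABELED and the variable BREAKDOWN are hypothetical names introduced only for illustration.

```sas
/* Sketch: label each level of the breakdown from its numeric _TYPE_. */
/* LABELED and BREAKDOWN are hypothetical names.                      */
DATA LABELED;
   SET OUT1;
   LENGTH BREAKDOWN $ 20;
   SELECT (_TYPE_);
      WHEN (0) BREAKDOWN = 'Grand mean';          /* binary 00 */
      WHEN (1) BREAKDOWN = 'By AGE_GROUP';        /* binary 01 */
      WHEN (2) BREAKDOWN = 'By GENDER';           /* binary 10 */
      WHEN (3) BREAKDOWN = 'By GENDER*AGE_GROUP'; /* binary 11 */
      OTHERWISE;
   END;
RUN;
```

The rightmost binary digit corresponds to the last variable on the CLASS statement (AGE_GROUP) and the leftmost to the first (GENDER), which is why GENDER alone yields _TYPE_ = 2.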

8 Demonstrating the NWAY Option

PROC MEANS DATA=CLINIC NOPRINT NWAY;
   CLASS GENDER AGE_GROUP;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;

GENDER  AGE_GROUP  _TYPE_  _FREQ_     M_HR    M_SBP     M_DBP
  F         1         3       3     68.0000   148.5   87.3333
  F         2         3       3     67.3333   139.0   80.0000
  M         1         3       4     65.0000   129.5   79.0000
  M         2         3       2     62.0000   120.0   74.0000

9 Outputting More than One Statistic

PROC MEANS DATA=CLINIC NOPRINT;
   CLASS GENDER;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1
          MEAN   = M_HR   M_SBP   M_DBP
          N      = N_HR   N_SBP   N_DBP
          MAX    = MAX_HR MAX_SBP MAX_DBP
          MEDIAN = MED_HR MED_SBP MED_DBP;
RUN;

GENDER  _TYPE_  _FREQ_     M_HR     M_SBP     M_DBP   N_HR  N_SBP
           0      12     66.1818   133.300   80.5000   11     10
  F        1       6     67.6667   143.750   83.6667    6      4
  M        1       6     64.4000   126.333   77.3333    5      6

N_DBP  MAX_HR  MAX_SBP  MAX_DBP  MED_HR  MED_SBP  MED_DBP
  12     88      188      110      64      124       78
   6     88      188      110      62      139       82
   6     80      140       86      64      124       78

10 Partial List of Some Available Statistics

Keyword   Description
________________________________
MEAN      Mean
N         Number of non-missing values
NMISS     Number of missing values
MIN       Smallest non-missing value
MAX       Largest value
MEDIAN    Median
RANGE     Range (difference between the minimum and maximum values)
Q1        25th percentile
Q3        75th percentile
QRANGE    Interquartile range (difference between the 25th and 75th percentiles)
STD       Standard deviation
STDERR    Standard error
UCLM      Upper bound of the 95% confidence interval
LCLM      Lower bound of the 95% confidence interval
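Any of the keywords listed above can be requested on the OUTPUT statement in the same way as MEAN. The following sketch asks for several of them for SBP only. The output data set name SBP_STATS and the output variable names are hypothetical, chosen for illustration.

```sas
/* Sketch: request several statistics from the list for one variable. */
/* SBP_STATS and the output variable names are hypothetical.          */
PROC MEANS DATA=CLINIC NOPRINT NWAY;
   CLASS GENDER;
   VAR SBP;
   OUTPUT OUT=SBP_STATS
          N      = N_SBP
          NMISS  = NMISS_SBP
          MEAN   = M_SBP
          STD    = SD_SBP
          STDERR = SE_SBP
          LCLM   = LCL_SBP
          UCLM   = UCL_SBP;
RUN;
```

With the CLINIC data, N_SBP should be 4 for females and 6 for males, and NMISS_SBP should be 2 and 0, since the two missing SBP values both belong to female subjects.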

11 Demonstrating the AUTONAME OUTPUT Option

PROC MEANS DATA=CLINIC NOPRINT;
   CLASS GENDER;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1
          MEAN   =
          N      =
          MAX    =
          MEDIAN = / AUTONAME;
RUN;

GENDER  _TYPE_  _FREQ_  HR_Mean  SBP_Mean  DBP_Mean  HR_N  SBP_N
           0      12    66.1818   133.300   80.5000   11     10
  F        1       6    67.6667   143.750   83.6667    6      4
  M        1       6    64.4000   126.333   77.3333    5      6

DBP_N  HR_Max  SBP_Max  DBP_Max  HR_Median  SBP_Median  DBP_Median
  12     88      188      110       64         124          78
   6     88      188      110       62         139          82
   6     80      140       86       64         124          78

12 Another Way of Naming Output Variables

PROC MEANS DATA=CLINIC NOPRINT NWAY;
   CLASS GENDER AGE_GROUP;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=;
RUN;

Listing of Data Set OUT1

GENDER  AGE_GROUP  _TYPE_  _FREQ_      HR       SBP       DBP
  F         1         3       3     68.0000   148.5   87.3333
  F         2         3       3     67.3333   139.0   80.0000
  M         1         3       4     65.0000   129.5   79.0000
  M         2         3       2     62.0000   120.0   74.0000

13 Dropping Unneeded Variables in the Output Data Set

PROC MEANS DATA=CLINIC NOPRINT NWAY;
   CLASS GENDER AGE_GROUP;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1(DROP=_:) MEAN=M_HR M_SBP M_DBP;
RUN;

Listing of Data Set OUT1

GENDER  AGE_GROUP     M_HR    M_SBP     M_DBP
  F         1       68.0000   148.5   87.3333
  F         2       67.3333   139.0   80.0000
  M         1       65.0000   129.5   79.0000
  M         2       62.0000   120.0   74.0000

14 Demonstrating the CHARTYPE Procedure Option

PROC MEANS DATA=CLINIC NOPRINT CHARTYPE;
   CLASS GENDER AGE_GROUP;
   VAR HR SBP DBP;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;

Demonstrating CHARTYPE Option

GENDER  AGE_GROUP  _TYPE_  _FREQ_     M_HR     M_SBP     M_DBP
            .        00      12     66.1818   133.300   80.5000
            1        01       7     66.2857   135.833   82.5714
            2        01       5     66.0000   129.500   77.6000
  F         .        10       6     67.6667   143.750   83.6667
  M         .        10       6     64.4000   126.333   77.3333
  F         1        11       3     68.0000   148.500   87.3333
  F         2        11       3     67.3333   139.000   80.0000
  M         1        11       4     65.0000   129.500   79.0000
  M         2        11       2     62.0000   120.000   74.0000

15 Demonstrating the CHARTYPE Procedure Option

PROC PRINT DATA=OUT1 NOOBS;
   TITLE "Demonstrating CHARTYPE Option";
   WHERE _TYPE_ EQ "10";
RUN;

Demonstrating CHARTYPE Option

GENDER  AGE_GROUP  _TYPE_  _FREQ_     M_HR     M_SBP     M_DBP
  F         .        10       6     67.6667   143.750   83.6667
  M         .        10       6     64.4000   126.333   77.3333

16 Another Way to Name Variables (instead of using a VAR statement)

PROC MEANS DATA=CLINIC NOPRINT;
   CLASS GENDER;
   ***VAR STATEMENT OPTIONAL;
   OUTPUT OUT=OUT1
          MEAN(HR)        = M_HR
          N(HR SBP DBP)   = N_HR N_SBP N_DBP
          MAX(SBP)        = MAX_SBP
          MEDIAN(SBP DBP) = MED_SBP MED_DBP;
RUN;

GENDER  _TYPE_  _FREQ_     M_HR    N_HR  N_SBP  N_DBP  MAX_SBP  MED_SBP  MED_DBP
           0      12     66.1818    11     10     12     188      124       78
  F        1       6     67.6667     6      4      6     188      139       82
  M        1       6     64.4000     5      6      6     140      124       78

17 Multi-way Breakdowns Using a TYPES Statement

PROC MEANS DATA=CLINIC NOPRINT CHARTYPE;
   CLASS GENDER AGE_GROUP BLOOD_TYPE;
   VAR HR SBP DBP;
   TYPES GENDER AGE_GROUP*GENDER BLOOD_TYPE*GENDER;
   OUTPUT OUT=OUT1 MEAN=M_HR M_SBP M_DBP;
RUN;

GENDER  AGE_GROUP  BLOOD_TYPE  _TYPE_  _FREQ_     M_HR     M_SBP     M_DBP
  F         .                   100       6     67.6667   143.750   83.6667
  M         .                   100       6     64.4000   126.333   77.3333
  F         .          A        101       2     56.0000   160.000   91.0000
  F         .          B        101       2     74.0000   148.500   87.0000
  F         .          O        101       2     73.0000   118.000   73.0000
  M         .          A        101       2     64.0000   135.000   83.0000
  M         .          B        101       3     64.6667   122.667   75.3333
  M         .          O        101       1       .       120.000   72.0000
  F         1                   110       3     68.0000   148.500   87.3333
  F         2                   110       3     67.3333   139.000   80.0000
  M         1                   110       4     65.0000   129.500   79.0000
  M         2                   110       2     62.0000   120.000   74.0000

18 Using the _TYPE_ Values to Create Multiple Data Sets

DATA GENDER AGE_BY_GENDER BLOOD_BY_GENDER;
   SET OUT1;
   IF _TYPE_ = "100" THEN OUTPUT GENDER;
   ELSE IF _TYPE_ = "110" THEN OUTPUT AGE_BY_GENDER;
   ELSE IF _TYPE_ = "101" THEN OUTPUT BLOOD_BY_GENDER;
RUN;

Listing of Data Set GENDER

GENDER  AGE_GROUP  BLOOD_TYPE  _TYPE_  _FREQ_     M_HR     M_SBP     M_DBP
  F         .                   100       6     67.6667   143.750   83.6667
  M         .                   100       6     64.4000   126.333   77.3333

Listing of Data Set AGE_BY_GENDER

GENDER  AGE_GROUP  BLOOD_TYPE  _TYPE_  _FREQ_     M_HR    M_SBP     M_DBP
  F         1                   110       3     68.0000   148.5   87.3333
  F         2                   110       3     67.3333   139.0   80.0000
  M         1                   110       4     65.0000   129.5   79.0000
  M         2                   110       2     62.0000   120.0   74.0000

19 Examples of TYPES Statements

TYPES A A*C D*C;
TYPES A*(B C D);
TYPES () A A*C*D;
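The parenthesized forms above can be tried on the CLINIC data. In a TYPES statement, A*(B C D) is shorthand for A*B A*C A*D, and an empty pair of parentheses requests the overall (grand total) observation. The sketch below applies both to the class variables used on slide 17. The output data set name TYPES_DEMO is hypothetical.

```sas
/* Sketch: grand total plus two two-way breakdowns in one pass.        */
/* GENDER*(AGE_GROUP BLOOD_TYPE) is shorthand for                      */
/* GENDER*AGE_GROUP GENDER*BLOOD_TYPE. TYPES_DEMO is a hypothetical    */
/* data set name.                                                      */
PROC MEANS DATA=CLINIC NOPRINT CHARTYPE;
   CLASS GENDER AGE_GROUP BLOOD_TYPE;
   VAR HR SBP DBP;
   TYPES () GENDER*(AGE_GROUP BLOOD_TYPE);
   OUTPUT OUT=TYPES_DEMO MEAN=M_HR M_SBP M_DBP;
RUN;
```

With CHARTYPE in effect, the resulting _TYPE_ values should be "000" for the grand total, "101" for GENDER by BLOOD_TYPE, and "110" for GENDER by AGE_GROUP.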

20 Using PROC FREQ to Count Frequencies

PROC FREQ DATA=CLINIC NOPRINT;
   TABLES AGE_GROUP / OUT=NUMBER;
RUN;

Listing of Data Set NUMBER

AGE_GROUP  COUNT  PERCENT
    1        7    58.3333
    2        5    41.6667

21 Renaming the COUNT Variable

PROC FREQ DATA=CLINIC NOPRINT;
   TABLES AGE_GROUP / OUT=NUMBER(RENAME=(COUNT=N_AGE) DROP=PERCENT);
RUN;

Listing of Data Set NUMBER

AGE_GROUP  N_AGE
    1        7
    2        5

22 Using PROC MEANS to Count Frequencies

PROC MEANS DATA=CLINIC NOPRINT NWAY;
   CLASS AGE_GROUP;
   VAR HR; /* ANY NUMERIC VARIABLE */
   OUTPUT OUT=COUNTS(RENAME=(_FREQ_ = N_AGE) DROP=_TYPE_ DUMMY) N=DUMMY;
RUN;

Listing of Data Set COUNTS

AGE_GROUP  N_AGE
    1        7
    2        5

23 Using PROC FREQ to Count Frequencies in a Two-way Table

PROC FREQ DATA=CLINIC NOPRINT;
   TABLES GENDER*BLOOD_TYPE / OUT=FREQOUT(DROP=PERCENT RENAME=(COUNT=NUMBER));
RUN;

Listing of Data Set FREQOUT

GENDER  BLOOD_TYPE  NUMBER
  F         A          2
  F         B          2
  F         O          2
  M         A          2
  M         B          3
  M         O          1

24 Using PROC FREQ to Output More than One Data Set

PROC FREQ DATA=CLINIC NOPRINT;
   TABLES AGE_GROUP / OUT=OUT1;
   TABLES GENDER / OUT=OUT2;
   TABLES GENDER*AGE_GROUP / OUT=OUT3;
RUN;

Listing of Data Set OUT1

AGE_GROUP  COUNT  PERCENT
    1        7    58.3333
    2        5    41.6667

----------------------------------------------------------------

Listing of Data Set OUT2

GENDER  COUNT  PERCENT
  F       6      50
  M       6      50

----------------------------------------------------------------

Listing of Data Set OUT3

GENDER  AGE_GROUP  COUNT  PERCENT
  F         1        3    25.0000
  F         2        3    25.0000
  M         1        4    33.3333
  M         2        2    16.6667

