Download presentation
Presentation is loading. Please wait.
Published byDomenic Green Modified over 9 years ago
1
Introduction to SAS Essentials Mastering SAS for Data Analytics Alan Elliott and Wayne Woodward SAS ESSENTIALS -- Elliott & Woodward1
2
Chapter 10: ANALYZING COUNTS AND TABLES SAS ESSENTIALS -- Elliott & Woodward2
3
LEARNING OBJECTIVES SAS ESSENTIALS -- Elliott & Woodward3 To be able to use PROC FREQ to create one-way frequency tables To be able to use PROC FREQ to create two-way (cross- tabulation) tables To be able to use two-by-two contingency tables to calculate relative risk measures To be able to use Cohen's kappa to calculate inter-rater reliability
4
10.1 USING PROC FREQ SAS ESSENTIALS -- Elliott & Woodward4 PROC FREQ is a multipurpose SAS procedure for analyzing count data. It can be used to obtain frequency counts for one or more individual variables or to create two-way tables (cross-tabulations) from two variables. A simplified syntax is PROC FREQ ; TABLES requests ;
5
SAS ESSENTIALS -- Elliott & Woodward5 Table 10.1. Common Options for PROC FREQ OptionMeaning DATA=dataname Specify which data set to use ORDER=option Specifies the order in which results are listed in the output table Options are DATA, FORMATTED, FREQ and ORDER. This is illustrated in an upcoming example. PAGE Specifies that only one table will appear per page (not applicable to HTML output.) ALPHA=n Sets the level for confidence limits (default 0.05) COMPRESS Begins the next table on the same page when possible (not applicable to HTML output.) NOPRINT Used when you want to capture output but not display tables.
6
SAS ESSENTIALS -- Elliott & Woodward6 Table 10.2 Common Statements for PROC FREQ OptionMeaning EXACT Produces exact p-values for tests. Fisher’s Exact Test automatically calculated for a 2x2 table. OUTPUT= dataname Creates an output data set containing statistics from an analysis WEIGHT variable Identifies a weight variable that contains summarized counts TABLES ; Specifies which tables will be displayed. More information about this statement is given below. TEST Specifies which statistical tests will be performed (Requires a TABLES statement) BY, FORMAT, LABEL, WHERE These statements are common to most procedures, and may be used here.
7
The TABLES Statement SAS ESSENTIALS -- Elliott & Woodward7 The TABLES statement is required for all of the examples in this chapter. Its format is: TABLES ; where variable-combinations specifies frequency or cross-tabulation tables. Options for the TABLE statement follow a slash (/). For example, TABLES A*B / CHISQ; requests that the chi-square and related statistics will be reported for the cross-tabulation A*B.
8
More about TABLES SAS ESSENTIALS -- Elliott & Woodward8 To obtain counts of the number of subjects observed in each category of group (GP), use the following: PROC FREQ; TABLES GP; RUN; To produce a cross-tabulation of GENDER by treatment GP: PROC FREQ; TABLES GENDER*GP;RUN; The variables specified in the TABLES statement can be either categorical/character or numeric. To request chi-square statistics for a table, include the option /CHISQ at the end of the TABLES statement. For example, PROC FREQ; TABLES GENDER*GP/CHISQ;
9
SAS ESSENTIALS -- Elliott & Woodward9 Table 10.3. Sample TABLES statements Table SpecificationDescription TABLES A;Specifies frequencies for a single variable TABLES A*B;Specifies a crosstabulation between two variables TABLES A*B B*C X*Y; Also, TABLES A*(B C D); is the same as TABLES A*B A*C A*D; Specify several cross-tabulation tables TABLES (A -- C)*X; is the same as TABLES A*X B*X C*X; Use a range of variables in a TABLES statement
10
SAS ESSENTIALS -- Elliott & Woodward10 TABLE 10.4 Options for the TABLE Statement OptionDescription AGREERequest Kappa statistic (inter-rater reliability). RELRISKRequests relative risk calculations FISHERRequests Fisher’s Exact test for tables greater than 2 x 2 SPARSERequests all possible combinations of levels MISSINGRequests missing values treated as nonmissing CELLCHI2Displays the contribution to chi-square NOCOLSuppresses column percentages for each cell NOCUMSuppresses cumulative frequencies NOFREQSuppresses frequency count for each cell NOPERCENTSuppresses row percentage and column percentage in cross- tabulation tables NOPRINTSuppresses tables but displays statistics NOROWSuppresses row percentage for each cell TESTP=(list)Specifies a test based on percentages (Goodness-of-Fit test.)
11
10.2 ANALYZING ONE-WAY FREQUENCY TABLES SAS ESSENTIALS -- Elliott & Woodward11 When count data are collected, you can use PROC FREQ to produce tables of the counts by category as well as to perform statistical analyses on the counts. This section describes how to create tables of counts by category and how to perform a goodness-of-fit test. Do Hands On Example p 246 (AFREQ1.SAS) (Frequencies)
12
ORDER= Option SAS ESSENTIALS -- Elliott & Woodward12 The ORDER=FORMATTED option for PROC FREQ specifies the order in which the categories are displayed in the table. You must first create a custom format in a PROC FORMAT command to define the order that you want to be used in your output table. For Example: PROC FORMAT; VALUE $FMTRACE "AA"="African American" "H"="Hispanic" "OTH"="Other " "C"="White"; RUN ;
13
Apply Your Created Format SAS ESSENTIALS -- Elliott & Woodward13 To cause PROC FREQ to display categories in the Formatted order, apply your created FORMAT: PROC FREQ ORDER= FORMATTED DATA=" C: \SASDATA \SURVEY"; TABLES RACE; FORMAT RACE $FMTRACE.; RUN; Do Hands On Exercise p 248. (AFREQ2.SAS) The ORDER= option species the order in which the categories are displayed. In this case, they are displayed in FORMATTED order. You must also apply the format to the variable for it to be correctly used.
14
10.3 CREATING ONE-WAY FREQUENCY TABLFS FROM SUMMARIZED DATA SAS ESSENTIALS -- Elliott & Woodward14 The following example illustrates how to summarize counts from a data set into a frequency table Suppose your data is in this summarized form: CENTS 152 CENTS 100 NICKELS 49 DIMES 59 QUARTERS 21 HALF 44 DOLLARS 21 Do the Hands On Exercise p 250 (AFREQ3.SAS) This means there are 49 nickels
15
Testing Goodness of Fit in a One-Way Table SAS ESSENTIALS -- Elliott & Woodward15 A goodness-of-fit test of a single population is a test to determine if the distribution of observed frequencies in the sample data closely matches with the expected number of occurrences under a hypothetical distribution for the population. The hypotheses being tested are as follows: H 0 : The population follows the hypothesized distribution. H a : The population does not follow the hypothesized distribution.
16
Goodness-of-fit using PROC FREQ SAS ESSENTIALS -- Elliott & Woodward16 A chi-square statistic is calculated, and a decision can be made based on the p-value associated with that statistic. A low p-value indicates that the data do not follow the hypothesized, or theoretical, distribution. If the p-value is sufficiently low (usually <0.05), you will reject the null hypothesis. The syntax to perform a goodness-of-fit test is as follows: PROC FREQ; TABLES variable/ CHISQ TESTP=(list of ratios);
17
Goodness-of-fit Example SAS ESSENTIALS -- Elliott & Woodward17 As an example, we will use data from an experiment conducted by the nineteenth-century monk Gregor Mendel. According to a genetic theory, crossbred pea plants show a 9:3:3:1 ratio. From 556 plants, you expect (9/16) x 556 = 312.75 yellow smooth peas (56.25%) (3/16) x 556 = 104.25 yellow wrinkled peas (18.75%) (3/16) x 556 = 104.25 green smooth peas (18.75%) (1/16) x 556 = 34.75 green wrinkled peas (6.25%)
18
Actual Observed Data SAS ESSENTIALS -- Elliott & Woodward18 After growing 556 of these pea plants, Mendel observed the following: 315 have yellow smooth peas 108 have yellow wrinkled peas 101 have green smooth peas 32 have green wrinkled peas
19
The Goodness-of-Fit Code SAS ESSENTIALS -- Elliott & Woodward19 Hypothesizing a 9:3:3 :1 Ratio: PROC FREQ ORDER=DATA ; WEIGHT NUMBER; TITLE 'GOODNESS OF FIT ANALYSIS'; TABLES COLORTYPE / NOCUM CHISQ TESTP=(0.5625 0.1875 0.1875 0.0625); RUN; Do Hands on Example p 252 (AFRE4.SAS) Note these proportions are in a 9:3:3:1 ratio
20
10.4 ANALYZING TWO-WAY TABLES SAS ESSENTIALS -- Elliott & Woodward20 To create a cross-tabulation table using PROC FREQ for relating two variables, use the TABLES statement with both variables listed and separated by an asterisk (*), (e.g., A*B). A cross-tabulation table is formed by counting the number of occurrences in a sample across two grouping variables. The number of columns in a table is usually denoted by c and the number of rows by r. Thus, a table is said to be an r x c table, that is, it has r x c cells.
21
Test of Independence SAS ESSENTIALS -- Elliott & Woodward21 The hypotheses associated with a test of independence are as follows: H 0 : The variables are independent (no association between them). H a : The variables are not independent. For example, a null hypothesis could be that there is no association between handedness (left and right handed) to hair color
22
Test of Homogeneity SAS ESSENTIALS -- Elliott & Woodward22 The null hypothesis is that the populations have the same distribution (they are homogeneous). In this case, the hypotheses are as follows: H 0 : The populations are homogeneous. H a : The populations are not homogeneous. For example, a null hypothesis could be that from two populations, (male and female) the distribution of handedness the same.
23
Testing These Hypotheses SAS ESSENTIALS -- Elliott & Woodward23 The chi-square test of independence or homogeneity is reported by PROC FREQ (the tests are mathematically equivalent) by the use of the I CHISQ option in the TABLES statement. For example, PROC FREQ; TABLES GENDER*GP/ CHISQ; Do Hands On Exercise p 254. (AFREQ5.SAS) Use the same code to test for either independence or homogeneity
24
Example code to perform a Chi-Square Test on an rx c contingency table (Crime Example) SAS ESSENTIALS -- Elliott & Woodward24 PROC FREQ DATA=DRINKERS; WEIGHT COUNT; TABLES CRIME*DRINKER/CHISQ; TITLE 'Chi Square Analysis of a Contingency Table'; RUN; Note that WEIGHT COUNT; is needed since the data are in summary form.
25
Summarized Data for the Crime Case SAS ESSENTIALS -- Elliott & Woodward25 CRIME DRINKER COUNT Arson150 Arson043 Rape188 Rape062 Violence1155 Violence0110 Stealing1379 Stealing0300 Coining118 Coining014 Fraud163 Fraud0144 Notice how the data are in summarized for. For Arson, there were 50 “Drinkers” (DRINKER=1) and 43 “Non- Drinkers) (DRINKER=0)
26
Results of Chi Square Analysis SAS ESSENTIALS -- Elliott & Woodward26 Observe the statistics table. The Chi-Ssquare value is 49.73 and the p- value is p < 0.0001. Thus, you reject the null hypothesis of no association (independence) and conclude that there is evidence of a relationship between drinking status and type of crime committed.
27
Creating a Contingency Table from Raw Data, the 2 x 2 Case SAS ESSENTIALS -- Elliott & Woodward27 In the previous example (CRIME) the data were in summary form, and you needed to use the WEIGHT COUNT; statement to reflect that. If your data are in raw form – one record per observation, you do not need the WEIGHT statement. Do Hands on Example p 257 (AFREQ6.SAS) For this data, each subject has one record – thus you have one record per observation.
28
Output from 2x2 Chi-Square Analysis SAS ESSENTIALS -- Elliott & Woodward28 The Resulting Table of counts: Statistical Results – note in particular the Chi-Square and Fisher Two-Sided Pr<=p values. The chi-square statistic, 8.29, p = 0.004, indicates an association between CLEANER and RASH (rejects the null hypothesis). The two-sided Fisher results p = 0.0095 provides the same decision.
29
Tables with Small Counts in Cells SAS ESSENTIALS -- Elliott & Woodward29 When you summarize counts in tables, and there are small numbers in one or more cells, a typical chi-square statistical analysis may not be valid. Do Hands On Example p 259 (AFREQ7.SAS) Observe the warning message "WARNlNG: 50% of the cells have expected counts <5. Chi-square may not be a valid test.“ In this case, the Fisher's Exact test (given in Table 10.13) is the more reliable test and should be used instead of the Chi-Square test.
30
10.5 GOING DEEPER: CALCULATING RELATIVE RISK MEASURES SAS ESSENTIALS -- Elliott & Woodward30 Two-by-two contingency tables are often used when examining a measure of risk. A measure of this risk in a retrospective (case-control) study is called the odds ratio (OR). In a case- control study, a researcher takes a sample of subjects and looks back in time for exposure (or nonexposure). If the data are collected prospectively, where subjects are selected by presence or absence of a risk and then observed over time to see if they develop an outcome, the measure of risk is called relative risk (RR). Either way RR=1 or OR=1 means no risk observed.
31
Testing Relative Risk in PROC FREQ SAS ESSENTIALS -- Elliott & Woodward31 In PROC FREQ, the option to calculate the values for OR or RR is RELRISK and appears as an option to the TABLES statement as shown here (for the RASH data): TABLES CLEANER*RASH /RELRISK; In the results, a risk measure > 1 indicates that exposure is harmful and a risk measure <1 implies that exposure is a benefit. Do Hands On Example p 261 (AFREQ6.SAS)
32
Results of Risk Analysis SAS ESSENTIALS -- Elliott & Woodward32 Typically, the Odds Ratio is the statistic of interest The OR= 0.1346 specifies the odds of Row1/Row2 - that is, for cleaner 1 versus cleaner 2. Because OR is <1, this indicates that the odds of a person's having a rash who is using cleaner 1 is less than they are when the person is using cleaner 2.
33
10.6 GOING DEEPER: INTER-RATER RELIABILITY (KAPPA) SAS ESSENTIALS -- Elliott & Woodward33 A method for assessing the degree of agreement between two raters is Cohen's kappa coefficient. For example, kappa is useful for analyzing the consistency of two raters who evaluate subjects on the basis of a categorical measurement. Data for inter-rater reliability analysis RATER A Psyc.Neuro.Organic Rater B Psych. 7514 Neuro. 541 Organic. 0010 80515 Two Raters A and B compared…
34
Code used to Calculate Kappa SAS ESSENTIALS -- Elliott & Woodward34 PROC FREQ WEIGHT WT; TABLE RATERl*RATER2 / AGREE ; TEST KAPPA; TITLE 'KAPPA EXAMPLE FROM FLEISS'; RUN; Do Hands on Exercise p 262 (AKAPPA1.SAS) These options provide the Kappa test results.
35
Results of Kappa Analysis SAS ESSENTIALS -- Elliott & Woodward35 This is Kappa – the primary statistic of interest Interpretation of kappa statistic Kappa ValueInterpretation <0No agreement 0.0–0.20Poor agreement 0.21–0.40Fair agreement 0.41–0.60Moderate agreement 0.61–0.80Substantial agreement 0.81–1.00Almost perfect agreement How to interpret kappa. See text for additional explanation of these results.
36
Calculating Weighted Kappa SAS ESSENTIALS -- Elliott & Woodward36 For the case in which rated categories are ordinal (that is the categories of interest are in a meaningful order), it is appropriate to use the weighted kappa statistic, because it is designed to give partial credit to ratings that are close to but not on the diagonal. For example, in a test of recognition of potentially dangerous airline passengers, suppose a procedure is devised that classifies passengers into three categories: 1 =No threat/Pass 2 =Concern/Recheck 3 =Potential threat/Detain. Do Hands on Exercise p 265 (AKAPPA2.SAS) Note how these 3 categories are ascending in danger – they have a definite order (they are ordinal.)
37
Weighted Kappa Results SAS ESSENTIALS -- Elliott & Woodward37 The code uses this code to caclulate the weighted kappa: TABLE RATER1*RATER2 /AGREE; TEST WTKAP; For this analysis report the “Weighted Kappa” value kappa=0.7413. Use the same interpretation of kappa table as before. See text for the interpretation of other results.
38
10.7 SUMMARY SAS ESSENTIALS -- Elliott & Woodward38 This chapter discusses the capabilities of PROC FREQ for creating one- and two-way frequency tables, analyzing contingency tables, calculating measures of risk, and measuring inter-rater reliability (using KAPPA). Continue to Chapter 11: COMPARING MEANS USING T- TESTS
39
SAS ESSENTIALS -- Elliott & Woodward39
40
SAS ESSENTIALS -- Elliott & Woodward40
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.