Richann Watson, DataRich Consulting; Lynn Mullins, PPD

Let’s Get FREQy with our Statistics: Data-Driven Approach to Determining Appropriate Test Statistic
Richann Watson, DataRich Consulting; Lynn Mullins, PPD

ABSTRACT
As programmers, we are often asked to program statistical analysis procedures to run against the data. Sometimes the specifications we are given by the statisticians outline which statistical procedures to run. Other times, the choice of statistical procedure must depend on the data. Running procedures based on the output of previous procedures requires a little more preplanning and programming. We present a macro that dynamically determines which statistical procedure to run based on previous procedure output. The user can specify parameters (for example, fshchi, plttwo, catrnd, bimain, and bicomp), and the macro returns counts, percents, and the appropriate p-value (Chi-Square or Fisher's Exact), as well as the trend test p-value and the binomial proportion CI, if applicable.

INPUT
The macro has 12 parameters; only 2 are required.

    Parameter  Default  Required               Description
    indsn               Yes                    input data set, including the libname (e.g., adam.adeff)
    sortby              Yes                    variables used to sort the data; include the descending specification if a descending sort is needed
    whrcls              No                     where clause used to subset the input data set
    kpvars     sortby*  No                     variables to keep
    tbvars     sortby*  No                     request for the PROC FREQ TABLES statement
    grpvar     sortby*  No                     group variable (i.e., the variable the p-value is for)
    expcnt     5        No                     minimum expected count per cell, used to determine whether Chi-Square or Fisher's Exact is used
    fshchi     0.25     No                     threshold used to determine whether the Chi-Square or Fisher's Exact p-value is used
    plttwo              No                     two-way plot option; for a two-way plot, specify GROUPVERTICAL, GROUPHORIZONTAL, or STACKED
    catrnd     Y        No                     whether the Cochran-Armitage trend test is needed
    bimain              No                     if a two-group comparison and/or binomial proportion CI is needed, the main group that the others are compared against; enclose in quotes
    bicomp              Conditional on bimain  if a two-group comparison and/or binomial proportion CI is needed, all comparator groups; enclose each group in quotes and separate the groups with an exclamation mark (!)

* If this parameter is not specified, its value is derived from the sortby parameter.
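The bicomp convention (quoted groups separated by !), combined with the rule stated for the output data sets (the TRTAn numbers follow the order of the groups in bicomp), can be illustrated with a short sketch. The Python below is a hypothetical illustration of that parsing, not the macro's SAS code; the function name parse_bicomp and the ARM values are assumptions made for the example.

```python
from typing import List, Tuple


def parse_bicomp(bicomp: str) -> List[Tuple[int, str]]:
    """Split a bicomp-style string into numbered comparator groups.

    Hypothetical helper: groups are separated by '!' and each group is
    wrapped in quotes, following the bicomp parameter description.
    """
    groups = []
    for token in bicomp.split("!"):
        token = token.strip()
        if not token:
            continue  # tolerate a stray leading or trailing '!'
        # Drop one pair of matching surrounding quotes, if present.
        if len(token) >= 2 and token[0] == token[-1] and token[0] in "\"'":
            token = token[1:-1]
        groups.append(token)
    # Number the groups 1, 2, ... in the order given (the TRTAn values).
    return list(enumerate(groups, start=1))


if __name__ == "__main__":
    # Hypothetical values; the main group is compared against each comparator.
    bimain = '"ARM A"'
    bicomp = '"ARM B"!"ARM C"!"ARM D"'
    for trtan, group in parse_bicomp(bicomp):
        print(trtan, group, "vs", bimain.strip('"'))
```

Because the separator is a plain exclamation mark, a group value that itself contains ! would be split incorrectly; this is one reason to pick a separator that cannot occur in the data.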

MACRO PROCESS
1. Certain macro parameters are needed to execute. If they are not specified, they are determined from the value of sortby.
2. Retrieve the data.
3. Execute PROC FREQ. By default it produces a cross-tabulation frequency table (CTF), the Chi-Square statistic, and the Fisher's Exact statistic. On request, it also produces the Cochran-Armitage trend test and a frequency plot.
4. Use the CTF to determine whether the Chi-Square or the Fisher's Exact test statistic should be used. Fisher's Exact is used if more than the desired threshold (fshchi) of cells have expected counts less than the desired minimum expected count (expcnt); with the defaults, this means more than 25% of cells have expected counts less than 5.
5. If group comparisons and/or binomial proportion CIs (BiCI) are needed, the main group is specified separately from the comparison groups, and the macro determines whether a pairwise comparison and/or a BiCI can be done. If a pairwise comparison can be done, the macro loops through each comparison group and produces a Chi-Square and a Fisher's Exact test statistic for each pairwise comparison. If a BiCI can be generated, the macro generates one for each pair and formats it as (x.x, x.x).
6. Depending on which test statistic step 4 selected, the test statistic and p-value are captured in a data set (t_pvals).
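The rule in step 4 can be checked by hand. The sketch below is written in Python rather than SAS. It computes expected counts under independence (row total times column total divided by N) and picks a test using expcnt and fshchi at the macro's defaults. It illustrates the stated rule under that assumption and is not the authors' implementation; in the macro, the expected counts come from the EXPECTED option of PROC FREQ. The helper names expected_counts and choose_test are hypothetical.

```python
from typing import List


def expected_counts(table: List[List[int]]) -> List[List[float]]:
    """Expected count per cell under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def choose_test(table: List[List[int]], expcnt: float = 5, fshchi: float = 0.25) -> str:
    """Return "FIS" if MORE than fshchi of the cells have an expected count
    below expcnt; otherwise return "CHI"."""
    cells = [e for row in expected_counts(table) for e in row]
    share_low = sum(e < expcnt for e in cells) / len(cells)
    return "FIS" if share_low > fshchi else "CHI"


if __name__ == "__main__":
    # TRTA (ARM D, C, B, A) by CRIT1FL (Y, N) counts from the sample CTF output.
    counts = [[24, 4], [24, 5], [22, 3], [28, 3]]
    print(choose_test(counts))  # all four CRIT1FL = N cells have expected < 5
```

With the sample counts, 4 of the 8 cells (50%) have expected counts below 5, so the sketch selects Fisher's Exact. This matches the FIS entries in T_PVALS. A table in which exactly 25% of cells fall below 5 still uses Chi-Square, because the rule requires more than the threshold.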

IN DEPTH LOOK AT STEP 3

    ods output crosstabfreqs = ctf (where = (_TYPE_ ne '00')
                                    drop = Table _TABLE_ Missing Percent);
    ods output chisq = chi_oall;
    ods output fishersexact = fis_oall;
    /* the first semicolon ends the %then clause; the second ends the ODS OUTPUT statement */
    %if &catrnd = Y %then ods output trendtest = trend; ;

    proc freq data = outdsn order = data;
       tables &tbvars / OUTPCT chisq cmh fisher expected
              %if &plttwo ne %then plots=freqplot(twoway=&plttwo);
              %if &catrnd = Y %then trend;
       ; /* this semicolon ends the tables statement - do NOT delete */
    run;

If catrnd is Y, the Cochran-Armitage trend test is run and the output data set TREND is generated. If plttwo is specified, PROC FREQ produces the corresponding plot based on the value of plttwo. The only possible values for plttwo are GROUPHORIZONTAL, GROUPVERTICAL, and STACKED.
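The two %if tests in the step 3 code decide which options end up on the TABLES statement. The following hypothetical Python sketch builds the statement text that the macro logic resolves to. The function build_tables_statement is illustrative only, and rejecting other plttwo values is an added assumption based on the list of allowed values above.

```python
# The three two-way plot layouts the presentation lists as valid plttwo values.
VALID_PLTTWO = {"GROUPHORIZONTAL", "GROUPVERTICAL", "STACKED"}


def build_tables_statement(tbvars: str, plttwo: str = "", catrnd: str = "") -> str:
    """Return the TABLES statement text that the step 3 %if logic would produce.

    plots=freqplot(...) is added only when plttwo is non-blank, and trend only
    when catrnd is exactly "Y" (the macro's %if comparison is case-sensitive).
    """
    options = ["OUTPCT", "chisq", "cmh", "fisher", "expected"]
    if plttwo:
        # Added assumption: reject values outside the documented list.
        if plttwo.upper() not in VALID_PLTTWO:
            raise ValueError("plttwo must be one of " + ", ".join(sorted(VALID_PLTTWO)))
        options.append(f"plots=freqplot(twoway={plttwo})")
    if catrnd == "Y":
        options.append("trend")
    return f"tables {tbvars} / {' '.join(options)};"


if __name__ == "__main__":
    print(build_tables_statement("TRTA*CRIT1FL", plttwo="STACKED", catrnd="Y"))
```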

IN DEPTH LOOK AT STEP 5

    ods output chisq = chi_&x;
    ods output fishersexact = fis_&x;

    proc freq data = ctf order = data;
       where _TYPE_ = '11';   /* keep only the individual cells, not the totals */
       weight FREQUENCY;      /* ctf holds cell counts, so weight by them */
       /* the semicolon after binomial ends the %then clause, not the TABLES statement */
       tables &tbvars / %if &&nvar2_&x = 2 %then binomial; alpha=0.05 chisq fisher;

       /* only execute binomial proportion CI if data is 2x2 */
       %if &&nvar2_&x = 2 and &numvars = 2 %then %do;
          exact riskdiff;
          output out=bci_&x (keep = L_RDIF1 U_RDIF1) riskdiff;
       %end;

       where also &grpvar in (&bimain "&bigrp");
    run;

Part of the processing in step 1 is to determine the number of variables used in PROC FREQ. If there are only two variables, a pairwise comparison can be done. To produce a binomial proportion CI, the program also determines the number of levels of each variable being compared, because a BiCI requires a 2x2 table: two variables with exactly two levels each. A data set is produced for the pairwise comparison and/or BiCI whenever it was possible to run the tests.
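The feasibility checks in steps 1 and 5 can be sketched as follows. This Python is a hypothetical illustration that assumes record-level input with one dictionary per record, whereas the macro itself works from the aggregated CTF data set weighted by FREQUENCY. The function plan_comparisons and its arguments are illustrative names only.

```python
from typing import Dict, List, Sequence


def plan_comparisons(
    records: Sequence[Dict[str, str]],
    tbvars: List[str],
    grpvar: str,
    bimain: str,
    bicomp: List[str],
) -> List[Dict[str, object]]:
    """Decide, per comparator group, whether a pairwise test and a BiCI can run.

    A pairwise comparison needs exactly two table variables; a BiCI also
    needs a 2x2 table, i.e. exactly two levels of each variable once the
    data is restricted to the main group and the current comparator.
    """
    plans = []
    for trtan, comp in enumerate(bicomp, start=1):
        # Analogous to: where also &grpvar in (&bimain "&bigrp")
        subset = [r for r in records if r[grpvar] in (bimain, comp)]
        levels = {v: {r[v] for r in subset} for v in tbvars}
        pairwise = len(tbvars) == 2
        bici = pairwise and all(len(lv) == 2 for lv in levels.values())
        plans.append({"TRTAn": trtan, "pairwise": pairwise, "bici": bici})
    return plans


if __name__ == "__main__":
    # Hypothetical records: ARM C has only CRIT1FL = Y, as does ARM A.
    recs = [
        {"TRTA": "ARM A", "CRIT1FL": "Y"},
        {"TRTA": "ARM B", "CRIT1FL": "Y"},
        {"TRTA": "ARM B", "CRIT1FL": "N"},
        {"TRTA": "ARM C", "CRIT1FL": "Y"},
    ]
    for plan in plan_comparisons(recs, ["TRTA", "CRIT1FL"], "TRTA", "ARM A", ["ARM B", "ARM C"]):
        print(plan)
```

In this example, ARM A versus ARM C still allows a pairwise test, but no BiCI is possible because CRIT1FL has only one level in that subset. The resulting table is 2x1, not 2x2.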

DEFAULT OUTPUT DATA SETS

Cross Tabulation Frequencies (CTF)

    TRTA   CRIT1FL  _TYPE_  Frequency  Expected  RowPercent  ColPercent
    ARM D  Y        11      24         24.2832   85.7143     24.4898
    ARM D  N        11      4          3.7168    14.2857     26.6667
    ARM D           10      28         .         .           .
    ARM C  Y        11      24         25.1504   82.7586     24.4898
    ARM C  N        11      5          3.8496    17.2414     33.3333
    ARM C           10      29         .         .           .
    ARM B  Y        11      22         21.6814   88          22.449
    ARM B  N        11      3          3.3186    12          20
    ARM B           10      25         .         .           .
    ARM A  Y        11      28         26.885    90.3226     28.5714
    ARM A  N        11      3          4.115     9.6774      20
    ARM A           10      31         .         .           .
           Y        01      98         .         .           .
           N        01      15         .         .           .

Test Statistic and P-Value (T_PVALS)

    TRTAn*  pvalue  Tvalue  Test†
    1       1       1       FIS
    2       0.4653  0.47    FIS
    3       0.6978  0.7     FIS
    OALL    0.8546  0.85    FIS

* TRTAn is set to OALL for the overall comparison, which is the one record that is always produced. For the pairwise comparisons, TRTAn is set to a numeric value determined programmatically from bicomp: the order in which the groups appear in bicomp corresponds to the numeric order.
† Test is either CHI for Chi-Square or FIS for Fisher's Exact, and the captured pvalue and Tvalue are associated with the indicated test.

OPTIONAL OUTPUT DATA SETS

Cochran-Armitage Trend Test (TREND)

    Table                 Name1     Label1               cValue1  nValue1
    Table TRTA * CRIT1FL  _TREND_   Statistic (Z)        0.6903   0.69028
    Table TRTA * CRIT1FL  PR_TREND  One-sided Pr > Z     0.245    0.245009
    Table TRTA * CRIT1FL  P2_TREND  Two-sided Pr > |Z|   0.49     0.490018

Binomial Proportion (BICI)

    L_RDIF1   U_RDIF1  TRTAn*  bici
    -0.18772  0.14127  1       (-18.8, 14.1)
    -0.24807  0.09679  2       (-24.8, 9.7)
    -0.21231  0.12014  3       (-21.2, 12.0)

The order of the variables on the PROC FREQ TABLES statement affects the layout of CTF and the values of RowPercent and ColPercent. In addition, BICI assesses the risk difference between rows, so the order in which the table is specified is important. If you wish to assess the risk difference for VARx but the table statement is VARy*VARx, BICI will produce the risk difference for VARy instead.
See Alternate Layout for the corresponding output when the table statement in PROC FREQ is CRIT1FL * TRTA.
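Judging from the sample BICI output, the risk-difference bounds L_RDIF1 and U_RDIF1 appear to be multiplied by 100 and rounded to one decimal place to build the (x.x, x.x) string. Here is a minimal Python sketch of that formatting, assuming this reading of the output. The name format_bici is hypothetical, and this is not the macro's own code.

```python
from typing import List, Tuple


def format_bici(lower: float, upper: float) -> str:
    """Format risk-difference CI bounds (proportions) as "(x.x, x.x)" in percent."""
    return f"({lower * 100:.1f}, {upper * 100:.1f})"


if __name__ == "__main__":
    # (L_RDIF1, U_RDIF1) pairs from the sample BICI output, in TRTAn order.
    bounds: List[Tuple[float, float]] = [
        (-0.18772, 0.14127),
        (-0.24807, 0.09679),
        (-0.21231, 0.12014),
    ]
    for trtan, (lo, hi) in enumerate(bounds, start=1):
        print(trtan, format_bici(lo, hi))
```

These calls reproduce the bici column shown above. Note that a bound between -0.0005 and 0 would print as -0.0 with this approach, so a production version may want to normalize negative zero.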

DEFAULT OUTPUT DATA SETS (ALTERNATE LAYOUT: TABLE STATEMENT CRIT1FL * TRTA)

Cross Tabulation Frequencies (CTF)

    CRIT1FL  TRTA   _TYPE_  Frequency  Expected  RowPercent  ColPercent
    Y        ARM D  11      24         24.2832   24.4898     85.7143
    Y        ARM C  11      24         25.1504   24.4898     82.7586
    Y        ARM B  11      22         21.6814   22.449      88
    Y        ARM A  11      28         26.885    28.5714     90.3226
    Y               10      98         .         .           .
    N        ARM D  11      4          3.7168    26.6667     14.2857
    N        ARM C  11      5          3.8496    33.3333     17.2414
    N        ARM B  11      3          3.3186    20          12
    N        ARM A  11      3          4.115     20          9.6774
    N               10      15         .         .           .
             ARM D  01      28         .         .           .
             ARM C  01      29         .         .           .
             ARM B  01      25         .         .           .
             ARM A  01      31         .         .           .

With the table statement CRIT1FL * TRTA instead of TRTA * CRIT1FL, the RowPercent and ColPercent values are reversed, so take extra care when selecting the percentage.

OPTIONAL OUTPUT DATA SETS

Binomial Proportion (BICI)

    L_RDIF1   U_RDIF1  TRTAn*  bici
    -0.48307  0.36307  1       (-48.3, 36.3)
    -0.52527  0.19834  2       (-52.5, 19.8)
    -0.50073  0.28095  3       (-50.1, 28.1)

The binomial proportion CI is based on the row risk difference. To assess the TRTA row risk difference, list TRTA first. In this illustration, CRIT1FL is first on the table statement, so the risk assessment is based on CRIT1FL and not on TRTA.
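The swap of RowPercent and ColPercent follows directly from the definitions: the row percent of a cell in one layout is the column percent of the same cell in the transposed layout. The following small Python sketch uses the sample counts from the CTF output. The helper names are hypothetical and are not part of the macro.

```python
from typing import List

Table = List[List[int]]


def row_percents(table: Table) -> List[List[float]]:
    """Each cell as a percentage of its row total (like PROC FREQ RowPercent)."""
    return [[100 * x / sum(row) for x in row] for row in table]


def col_percents(table: Table) -> List[List[float]]:
    """Each cell as a percentage of its column total (like PROC FREQ ColPercent)."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[100 * x / c for x, c in zip(row, col_totals)] for row in table]


def transpose(table: Table) -> Table:
    """Swap rows and columns, i.e. switch TRTA * CRIT1FL to CRIT1FL * TRTA."""
    return [list(col) for col in zip(*table)]


if __name__ == "__main__":
    # Rows: TRTA (ARM D, C, B, A); columns: CRIT1FL (Y, N), from the sample CTF.
    t = [[24, 4], [24, 5], [22, 3], [28, 3]]
    print(round(row_percents(t)[0][0], 4))             # RowPercent, ARM D / Y
    print(round(row_percents(transpose(t))[0][0], 4))  # same cell, CRIT1FL * TRTA
```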

PLOTS: TABLE STATEMENT IN PROC FREQ IS TRTA*CRIT1FL
If a plot is desired, specify its type. The option specified in the macro call determines the layout of the plot, and the table variables also affect the layout. For example, if the table statement in PROC FREQ is TRTA * CRIT1FL, the orientation is as shown in the sample plots.

[Figure: sample frequency plots for TRTA * CRIT1FL, panels GROUPHORIZONTAL, GROUPVERTICAL, and STACKED]

For sample layouts when the table statement in PROC FREQ is CRIT1FL * TRTA, refer to Alternate Layout.

PLOTS: TABLE STATEMENT IN PROC FREQ IS CRIT1FL*TRTA

[Figure: sample frequency plots for CRIT1FL * TRTA, panels GROUPHORIZONTAL, GROUPVERTICAL, and STACKED]

CONCLUSION
This macro provides an effective solution for running statistics based on the data. It is robust, providing both output data sets and multiple plots. You no longer have to rerun your program each time the data changes to see which statistic to use. With just a couple of macro parameters, the macro will do it all.

Contact Information
Richann Watson, DataRich Consulting, (513) 843-4081, richann.watson@datarichconsulting.com
Lynn Mullins, PPD, (910) 558-4343, Lynn.mullins@ppdi.com