Case Study: Using Base SAS® to Automate Quality Checks of Excel® Workbooks that have Multiple Worksheets
Lisa Mendez, PhD & Andrew Kuligowski
Overview
The Process
- Determine how to identify the smaller problems within the larger, overwhelming problem
- Solve each problem using SAS code
- Implement the code
- Lessons learned
Background
- Unfamiliar with the data: thrown into the deep end
- There were 5 markets: ADHD, BNZD, CNNB, CDNE, and PAIN
- Each market had 7 Excel workbooks that needed to be checked
- Each workbook had multiple worksheets, and the count varied by market:
  ADHD: 7 worksheets
  BNZD: 24 worksheets
  CNNB: 7 worksheets
  CDNE: 5 worksheets
  PAIN: 55 worksheets
The Overarching Problem
Let’s do the math!
- 5 markets multiplied by 7 workbooks (35 workbooks), with a total of 98 worksheets that needed to be checked
- That’s 3,430 worksheets FOR 27 QUARTERS!!!!!
- For a grand total of 92,610 worksheets
That can be just a little overwhelming…
Getting the Data into SAS®
XLSX Engine
- Allows you to read and write Microsoft Excel files as if they were data sets in a library
- Its advantage is that it accesses the XLSX file directly; it does not use the Microsoft data APIs as a go-between
- You need a license for SAS/ACCESS to PC Files to use the XLSX engine
- In SAS University Edition, the SAS/ACCESS product is part of the package
libname Cadhd1 XLSX "C:\Users\lmendez\Documents\RMPDC\Deliverables2017_Q2\ADHD\RMPD_Patient Tracking_ADHD_NDW_2018Q2.xlsx";
Getting the Data into SAS®
libname Cadhd1 XLSX "C:\Users\lmendez\Documents\RMPDC\Deliverables2017_Q2\ADHD\RMPD_Patient Tracking_ADHD_NDW_2018Q2.xlsx";
- The libname statement sets up the datasets, and you will see them in the cadhd1 library, but the datasets will be empty
- Names of datasets are the names of the worksheets
Loading the Data Using PROC SQL & SAS Dictionary tables
Loading the Data
Note: The libref must be written in all caps in the WHERE clause: where libname="CADHD1"
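The original code for this step was shown as a screenshot and is not reproduced here. A minimal sketch of the idea, assuming the CADHD1 libref from the earlier libname statement is assigned, might look like the following. The macro variable names snamlist_1 and n_1 are taken from the log output in this deck; the rest is an assumption about how they were built.

```sas
/* Sketch only: collect the worksheet (member) names that the XLSX      */
/* engine exposes for the CADHD1 libref into one macro variable.        */
proc sql noprint;
    select memname
        into :snamlist_1 separated by '*'
        from dictionary.tables
        where libname = "CADHD1";   /* librefs are stored in upper case */
quit;

/* SQLOBS holds the number of rows the last SELECT returned */
%let n_1 = &sqlobs;

%put &snamlist_1;   /* show the list of worksheet names in the log */
%put &n_1;          /* show the number of worksheets in the log    */
```

The asterisk is used as the separator because it cannot appear in a SAS member name, which makes the list safe to split later with %scan.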
Loading the Data
Macro variables (will be used in the macro)
Output from the log:
%put &snamlist_1; /* show the macro variable snamlist_1 in the log */
LOOKUP*STATE_SUBGRP*STATE_SUPERGRP*ZIP_SUBGRP_AMPH*ZIP_SUBGRP_METH*ZIP_SUBGRP_OTH_ANAL*ZIP_SUBGRP_OTH_ANTI*ZIP_SUPER
%put &n_1; /* show the macro variable n_1 in the log */
8
Loading the Data
SAS Macro
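The macro itself appeared as a screenshot. As a hedged sketch of how such a macro could use the two macro variables, the following loops over the asterisk-delimited list and copies each worksheet into WORK. The macro name load_sheets and its parameters are hypothetical, and the sketch assumes every worksheet name is a valid SAS name (no spaces or special characters).

```sas
/* Sketch only: copy every worksheet in an XLSX libref into WORK.    */
/* load_sheets, lib=, list=, and n= are illustrative names.          */
%macro load_sheets(lib=, list=, n=);
    %local i sheet;
    %do i = 1 %to &n;
        /* pull the i-th name from the asterisk-delimited list */
        %let sheet = %scan(&list, &i, *);

        data work.&sheet;
            set &lib..&sheet;   /* reading the member loads the worksheet */
        run;
    %end;
%mend load_sheets;

/* Usage, with the macro variables created from the dictionary table */
%load_sheets(lib=cadhd1, list=&snamlist_1, n=&n_1)
```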
Validating Worksheet & Variable Names
- Need templates to compare against
- Load templates each quarter
- Ensure a permanent template library (libname statement)
- Templates are kept by market:
  - List of variable names
  - List of worksheet names
Validating Worksheet Names Once templates are loaded, compare worksheet names
Validating Worksheet Names
- Dataset created after the PROC SQL compare for worksheet names
- All worksheet names match: no errors
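The comparison code was shown as a screenshot. One way such a check could be written is a full join between the template list and the current member names. In this sketch the TEMPLATE libref, the table adhd_sheets, and its column sheetname are hypothetical stand-ins for the pre-loaded template.

```sas
/* Sketch only: compare template worksheet names with the names found  */
/* in the current workbook. TEMPLATE.ADHD_SHEETS and SHEETNAME are     */
/* assumed names for the pre-loaded template.                          */
proc sql;
    create table work.sheet_check as
    select coalesce(t.sheetname, c.memname) as sheetname length=32,
           case
               when c.memname   is missing then 'Missing from workbook'
               when t.sheetname is missing then 'Not in template'
               else 'Match'
           end as status length=21
    from template.adhd_sheets as t
         full join
         (select memname
            from dictionary.tables
            where libname = "CADHD1") as c
         on upcase(t.sheetname) = upcase(c.memname);

    /* keep only the mismatches for the error report */
    create table work.sheet_errors as
    select *
        from work.sheet_check
        where status ne 'Match';
quit;
```

A full join is used so that both kinds of problems surface in one pass: worksheets the template expects but the workbook lacks, and worksheets in the workbook that the template does not know about.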
Validating Worksheet Names Create an error report
Validating Variable Names
- Once templates are loaded, compare variable names
- Use PROC CONTENTS to get a current list of variable names
Validating Worksheet & Variable Names
- Dataset created after the PROC SQL compare for variable names
- Everything matches
Note: change variable names either before PROC SQL, or in the PROC SQL statement
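A hedged sketch of the variable-name check follows. It assumes PROC CONTENTS can list every member of the XLSX libref through _ALL_, and that the template is stored in a hypothetical table template.adhd_vars with columns memname and name. Because both inputs carry the same column names, the columns are renamed with aliases inside the PROC SQL statement, as the note above suggests.

```sas
/* Sketch only: list the variables of every worksheet in the workbook */
proc contents data=cadhd1._all_ noprint
              out=work.current_vars (keep=memname name);
run;

/* Compare against the template; TEMPLATE.ADHD_VARS is an assumed name */
proc sql;
    create table work.var_check as
    select coalesce(t.memname, c.memname) as sheetname length=32,
           coalesce(t.name, c.name)       as varname   length=32,
           case
               when c.name is missing then 'Missing from workbook'
               when t.name is missing then 'Not in template'
               else 'Match'
           end as status length=21
    from template.adhd_vars as t
         full join work.current_vars as c
         on  upcase(t.memname) = upcase(c.memname)
         and upcase(t.name)    = upcase(c.name);
quit;
```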
Exporting Error Report to Excel® The macro variable ‘x’ is used to number the reports that correspond with each workbook
Exporting Error Report to Excel®
- Used within a macro
- One Excel file per market
- Multiple worksheets, one for each workbook checked
- No errors for this workbook
- Each worksheet corresponds to a workbook
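The export code was shown as a screenshot. A minimal sketch, assuming PROC EXPORT with DBMS=XLSX is available (it needs the same SAS/ACCESS to PC Files license as the XLSX engine), could look like this. The macro name export_report, its parameters, and the sheet naming pattern are hypothetical; only the role of x as the report number comes from the slides.

```sas
/* Sketch only: write one error data set to its own worksheet in the   */
/* market's report file. ds=, outfile=, and x= are illustrative names. */
%macro export_report(ds=, outfile=, x=);
    proc export data=&ds
        outfile="&outfile"
        dbms=xlsx
        replace;               /* replaces the named sheet if it exists */
        sheet="Report_&x";     /* x numbers the report for each workbook */
    run;
%mend export_report;

/* Usage: the path below is a placeholder, not the authors' location */
%export_report(ds=work.sheet_errors, outfile=ADHD_error_report.xlsx, x=1)
```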
Exporting Error Report to Excel® Sample of worksheet error
Exporting Error Report to Excel®
Lessons learned:
- Do not output if there are no errors, or output a “no error” message, because most of the workbooks do not have variable name or worksheet name errors
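This lesson can be sketched as a macro that counts the rows in the error data set and exports only when the count is above zero. The macro name export_if_errors and its parameters are hypothetical, and the sketch assumes SAS 9.3 or later for the TRIMMED keyword.

```sas
/* Sketch only: export the error data set only when it has rows */
%macro export_if_errors(ds=, outfile=, sheet=);
    %local nerr;

    /* count the error rows */
    proc sql noprint;
        select count(*) into :nerr trimmed
            from &ds;
    quit;

    %if &nerr > 0 %then %do;
        proc export data=&ds
            outfile="&outfile"
            dbms=xlsx
            replace;
            sheet="&sheet";
        run;
    %end;
    %else %put NOTE: No errors found in &ds, so no report was written.;
%mend export_if_errors;
```

Writing a note to the log instead of an empty worksheet keeps the report file limited to workbooks that actually need attention.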
Validating Data
- A macro variable was created, using the same methods as before, for all the worksheet/dataset names
- The macro variable was used in conjunction with a macro to execute a data step multiple times to check all the data within a worksheet/dataset
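The slides do not state which data checks were run, so the rule below (flag numeric values that are missing or negative) is purely an illustrative assumption. The sketch shows the pattern described above: a macro that walks the list of worksheet names and runs the same data step once per worksheet. The macro name check_values and the err_ prefix are hypothetical, and prefixed names must still fit within the 32-character limit.

```sas
/* Sketch only: run one data step per worksheet and keep the rows that */
/* fail an ASSUMED rule (numeric value missing or negative).           */
%macro check_values(lib=, list=, n=);
    %local i sheet;
    %do i = 1 %to &n;
        %let sheet = %scan(&list, &i, *);

        data work.err_&sheet (drop=_k);
            set &lib..&sheet;
            array nums {*} _numeric_;      /* all numeric variables read so far */
            length problem $32;
            do _k = 1 to dim(nums);
                if missing(nums{_k}) or nums{_k} < 0 then do;
                    problem = vname(nums{_k});   /* name of the failing variable */
                    output;
                end;
            end;
        run;
    %end;
%mend check_values;

%check_values(lib=cadhd1, list=&snamlist_1, n=&n_1)
```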
Validating Data
- Similar code was written to check the products within a workbook
- A pre-loaded template was used to ensure the correct products were in the correct worksheet/dataset
- A macro was used, along with a data step and a PROC SQL step, to compare product names in the pre-loaded template with the product names of the current data
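As a hedged illustration of the product check, the sketch below uses only PROC SQL (the original also used a data step, which is not reproduced here). It assumes the template library holds a table with the same name as each worksheet, and that both tables have a character column called product. Those names, and the macro name check_products, are hypothetical.

```sas
/* Sketch only: compare product names in one worksheet with its template */
%macro check_products(lib=, tmpl=, sheet=);
    proc sql;
        /* products in the current worksheet that the template does not list */
        create table work.extra_&sheet as
        select distinct c.product
            from &lib..&sheet as c
            where upcase(c.product) not in
                  (select upcase(t.product) from &tmpl..&sheet as t);

        /* template products that are absent from the current worksheet */
        create table work.missing_&sheet as
        select distinct t.product
            from &tmpl..&sheet as t
            where upcase(t.product) not in
                  (select upcase(c.product) from &lib..&sheet as c);
    quit;
%mend check_products;
```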
Validating Data
- An exception report was created for the values check
- Utilized the lesson learned from the previous Excel export
- For these exception reports, an MS Excel workbook was created for a worksheet only if errors were found
Deleting Datasets
- Many macros are used to create many datasets in the process of checking one workbook
- To ensure there is enough space in the SAS session, PROC DATASETS is used to clean up the libraries used in the program
Deleting Datasets
- To delete all files in a SAS data library at one time, use the KILL option
- CAUTION: The KILL option deletes all members of the library immediately after the statement is submitted
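A short sketch of both cleanup styles follows. The data set names in the DELETE statement are hypothetical examples of intermediate tables, and WORK is used as the target library on the assumption that the intermediate data sets were written there.

```sas
/* Delete only selected intermediate data sets (names are illustrative) */
proc datasets library=work nolist;
    delete sheet_check var_check;
quit;

/* Delete every member of WORK in one step.                             */
/* CAUTION: KILL takes effect as soon as the statement is submitted.    */
proc datasets library=work kill nolist;
quit;

/* Release the workbook once checking is finished */
libname cadhd1 clear;
```

Pointing KILL at a scratch library such as WORK, rather than at a permanent template library, limits the damage if the step runs at the wrong time.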
Conclusion
- When faced with an overwhelming task, break it down
- Solve one problem at a time
- Doing research online may help provide different solutions
- Find one that works for your problem, and that YOU prefer
- Don’t be afraid to code your program with some steps that are not as efficient (“down and dirty”)
- When utilizing macros, get the program to work before coding the macro(s)
- Enhance your program for efficiency when you have more time
Contact Information
Name: Lisa Mendez
Company: IQVIA GS
Email: mendezla@sbcglobal.net

Name: Andrew Kuligowski
Company: HSN
Email: kuligowskiconference@gmail.com