PROC DOC III: Self-generating Codebooks Using SAS®
Louise Hadden, Abt Associates Inc.
PharmaSUG 2017 Paper #QT07
Ms. Hadden has been using and loving SAS since the days of punch cards and computers the size of a “tiny house.” She spends most of her time in support of health policy analytics at Abt Associates Inc. and loves a good SAS reporting challenge. She is an ardent lifelong learner who reads voraciously, loves photography, and volunteers at the MSPCA Boston Adoption Center walking, training, and photographing dogs.
Introduction This paper demonstrates how to use good documentation practices and SAS® to easily produce attractive, camera-ready data codebooks (and accompanying materials such as label statements, format assignment statements, etc.). We’ll go briefly through the process, concepts first, then step by step. More details are included in the paper.
What is a SAS® Program? At the highest level, it is SOFTWARE: a set of instructions. At the most basic level, it is a TEXT FILE. Troy Martin Hughes will tell you it is software.
What does SAS do with text files? From within a SAS program, SAS: reads, as in input files and include files; writes, as in output files; and executes instructions. SAS can read all types of “text” files: flat, csv, html, etc. Similarly, SAS can write all types of “text” files, and can use include files that have been created within the same program (prior to inclusion, of course).
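As a minimal sketch of the write, read, and execute cycle described above: the fileref `gencode` is a hypothetical name, and the generated PROC MEANS step is only an illustration, not code from the paper.

```sas
/* Hypothetical fileref; the TEMP device lets SAS choose a scratch file. */
FILENAME gencode TEMP;

/* Write: a DATA _NULL_ step emits SAS statements as plain text. */
DATA _NULL_;
  FILE gencode;
  PUT 'PROC MEANS DATA=sashelp.heart N NMISS MEAN;';
  PUT '  VAR AgeAtStart Cholesterol;';
  PUT 'RUN;';
RUN;

/* Read: echo the generated text to the log to check its contents. */
DATA _NULL_;
  INFILE gencode;
  INPUT;
  PUT _INFILE_;
RUN;

/* Execute: include the text file written earlier in this same program. */
%INCLUDE gencode / SOURCE2;

FILENAME gencode CLEAR;
```

The SOURCE2 option prints the included statements in the log, which helps when debugging generated code.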
Codebook program overview We read metadata and execute instructions in the text file (the program). We write out text files of SAS instructions. We write out text %include files. We read (and %include) text files of SAS instructions. We write out output files. This program for generating a codebook covers many, but not all, uses of text files in SAS; for example, it does not include “control” files.
First steps Create a modified copy of SASHELP.HEART. Create a documentation spreadsheet of PROC CONTENTS output (PROC EXPORT). Review the spreadsheet and modify it if needed. In real life, we assume you’ve practiced good documentation and coding: you have labeled your data sets and variables and created formats. For our example, we used SASHELP.HEART. Although it’s not strictly necessary, I have added the copy step just to emphasize that. Copy: add a couple of variables, a data set label, and variable labels. Create: we could use PROC DATASETS, dictionary tables, sashelp.vcolumn, or PROC CONTENTS, among others. I use PROC CONTENTS because, among other things, of how it sorts variable names; output objects are the best. Review: your boss may want to redo labels or categorize variables. For example, there may be privacy concerns with IDs.
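The copy, create, and export steps above might look roughly like the following. This is a sketch under stated assumptions: the folder path, the added variables (`id`, `overweight`), their labels, and the work data set name `heart_meta` are all hypothetical, and DBMS=XLSX assumes SAS/ACCESS Interface to PC Files is licensed.

```sas
/* Assumption: replace the path with a real project folder. */
LIBNAME dd "/path/to/project";

/* Copy: add a data set label, two illustrative variables, and labels. */
DATA dd.heart (LABEL="Modified copy of SASHELP.HEART");
  SET sashelp.heart;
  id = _N_;                                     /* hypothetical record ID */
  overweight = (Weight_Status = "Overweight");  /* 1/0 flag; 0 if missing */
  LABEL id         = "Record identifier (hypothetical)"
        overweight = "Weight status is Overweight (1=yes, 0=no)";
RUN;

/* Create: capture the variable metadata as a data set. */
PROC CONTENTS DATA=dd.heart NOPRINT
  OUT=work.heart_meta (KEEP=name type length label format varnum);
RUN;

/* Export the metadata to a spreadsheet for review and editing. */
PROC EXPORT DATA=work.heart_meta
  OUTFILE="/path/to/project/heart_cb.xlsx"
  DBMS=XLSX REPLACE;
RUN;
```

In the OUT= data set, `type` is 1 for numeric and 2 for character variables, which is what later steps can branch on.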
Second steps Import the modified spreadsheet (PROC IMPORT). Use the data from the modified spreadsheet to write code that performs various documentation tasks.
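A minimal sketch of the import step, assuming the reviewed spreadsheet was saved as `heart_cb.xlsx` in a hypothetical project folder. The data set name `dd.heart_cb` matches the one read by the header macro in the slides; the path and file name are assumptions.

```sas
/* Assumption: same hypothetical project folder as the export step. */
LIBNAME dd "/path/to/project";

/* Read the reviewed spreadsheet back in as the codebook metadata. */
PROC IMPORT DATAFILE="/path/to/project/heart_cb.xlsx"
  OUT=dd.heart_cb
  DBMS=XLSX REPLACE;
  GETNAMES=YES;   /* first spreadsheet row holds the column names */
RUN;

/* Quick check that names, types, and labels survived the round trip. */
PROC PRINT DATA=dd.heart_cb (OBS=10);
RUN;
```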
Macros & friends Different macros are constructed to report on: “header information” (i.e., variable name, label, etc.); missing values (including different kinds of special missing values); and details on non-missing values, which differ by variable type (character, continuous, categorical). Additionally, the program accesses the metadata and outputs text files containing calls to these macros, conditional upon the variable type in the metadata, along with reporting macros; these text files are then reused in the program as include files.
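To illustrate the missing-value reporting mentioned above, here is one possible way to tabulate ordinary and special missing values separately. It is not the paper's macro: the demo data set, the `.R` "refused" code, and the format name `misscat` are all made up for this sketch.

```sas
/* Build a small demo data set with some special missing values. */
DATA work.demo;
  SET sashelp.heart (KEEP=Cholesterol OBS=200);
  /* Hypothetical: mark every 50th record as "refused" (.R). */
  IF MOD(_N_, 50) = 0 THEN Cholesterol = .R;
RUN;

/* Group values into missing categories versus non-missing. */
PROC FORMAT;
  VALUE misscat
    .     = 'Missing (.)'
    .R    = 'Refused (.R)'
    OTHER = 'Non-missing';
RUN;

/* The MISSING option keeps the missing categories in the table. */
PROC FREQ DATA=work.demo;
  TABLES Cholesterol / MISSING;
  FORMAT Cholesterol misscat.;
RUN;
```

Because PROC FREQ groups by formatted value, each kind of missing value gets its own row while all valid values collapse into one.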
Macros & friends

    %MACRO header(varname,order);
      DATA temp&order.a;
        LENGTH sentence1 . . . $ 500 blurb $ 25000 . . .;
        SET dd.heart_cb (WHERE=(name="&varname"));
        namecolon=CATS('^{STYLE [FONTSIZE=10pt FONTWEIGHT=bold]',name,":}");
        labelfmt=CATS('^{STYLE [FONTSIZE=10pt FONTWEIGHT=bold]',label,"}");
        sentence1=CATX(' ',namecolon,labelfmt);
        . . .
        blurb=CATT(sentence1,sentence2,sentence3);
        LABEL blurb =" ";
      RUN;
      . . .
    %MEND;

This is one of a number of macros, but we don’t really want to call the different macros by hand (it would take hours), so…
Macros & friends The rest of the program: accesses the metadata; outputs text files with calls to the macros created above, conditional upon the variable type in the metadata; and outputs reporting macros to be reused in the program as include files. This is where the review of the metadata spreadsheet comes in handy.
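The idea of writing out macro calls conditional on variable type and then including them can be sketched as follows. The three `demo_` macros are hypothetical stubs that only write to the log; they stand in for the paper's reporting macros, and the metadata here comes straight from PROC CONTENTS rather than the reviewed spreadsheet. PROC CONTENTS distinguishes only numeric from character, so the continuous versus categorical split would need a column added during the spreadsheet review.

```sas
/* Stand-in metadata: type is 1 for numeric, 2 for character. */
PROC CONTENTS DATA=sashelp.heart NOPRINT
  OUT=work.meta (KEEP=name type varnum);
RUN;

/* Hypothetical stub macros standing in for the real reporting macros. */
%MACRO demo_header(varname, order);
  %PUT NOTE: header for &varname (order &order);
%MEND demo_header;
%MACRO demo_char(varname, order);
  %PUT NOTE: character detail for &varname (order &order);
%MEND demo_char;
%MACRO demo_num(varname, order);
  %PUT NOTE: numeric detail for &varname (order &order);
%MEND demo_num;

FILENAME calls TEMP;

/* Write one header call and one type-specific call per variable.   */
/* Single quotes keep the % from being resolved while writing text. */
DATA _NULL_;
  SET work.meta;
  FILE calls;
  LENGTH call $ 200;
  call = CATS('%demo_header(', name, ',', _N_, ')');
  PUT call;
  IF type = 2 THEN call = CATS('%demo_char(', name, ',', _N_, ')');
  ELSE call = CATS('%demo_num(', name, ',', _N_, ')');
  PUT call;
RUN;

/* Execute the generated calls. */
%INCLUDE calls / SOURCE2;
```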
Macros & friends [Figure: codebook output annotated with callouts marking the header, missing, and detail sections.]
Macros & friends This codebook printout uses a special style template, noborders. The code for this template is available on the support.sas.com website and is provided in the zip file of code available from me.
Don’t stop with a codebook! Similarly, metadata can be accessed to create LABEL, FORMAT, LENGTH, and similar statements. The resulting statements can be included in other programs seamlessly.
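For example, a LABEL statement could be generated from metadata and included in another step along these lines. This is a sketch, not the paper's code: the fileref `labels` and the data set names are hypothetical, and labels containing `&` or `%` would need extra care because the generated text uses double quotes.

```sas
/* Stand-in metadata: variable names and labels. */
PROC CONTENTS DATA=sashelp.heart NOPRINT
  OUT=work.meta (KEEP=name label);
RUN;

FILENAME labels TEMP;

/* Write a single LABEL statement covering every labeled variable. */
DATA _NULL_;
  SET work.meta (WHERE=(label NE ' ')) END=last;
  FILE labels;
  LENGTH stmt $ 400;
  IF _N_ = 1 THEN PUT 'LABEL';
  /* QUOTE adds double quotes and doubles any embedded ones. */
  stmt = CATX(' ', name, '=', QUOTE(STRIP(label)));
  PUT '  ' stmt;
  IF last THEN PUT ';';
RUN;

/* Reuse the generated statement inside another program's DATA step. */
DATA work.heart_labeled;
  SET sashelp.heart;
  %INCLUDE labels / SOURCE2;
RUN;
```

The same pattern extends to FORMAT and LENGTH statements by swapping the metadata columns written to the file.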
Conclusion SAS provides numerous opportunities for creating self-documenting data sets. With care at the onset of a project, programmers can ensure quality data and accurate documentation. SAS enables the creation of user-friendly documentation, and self-generation of components of SAS programs. Only code snippets are shown here: full code is available from the author upon request.
Contact me!
Name: Louise Hadden
Organization: Abt Associates Inc.
Address: 55 Wheeler St.
City, State ZIP: Cambridge, MA 02138
Work Phone: 617-349-2385
E-mail: louise_hadden@abtassoc.com
LinkedIn: Louise Hadden
Twitter: ceeott56
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.