SAS Programming Techniques for Decoding Variables on the Database Level By Chris Speck PAREXEL International RTSUG – Wednesday, March 23, 2011.


Libraries can be viewed as discrete units. This may require a change in perspective for the beginning or intermediate programmer, but it is crucial now that programmers are under greater pressure to apply regulatory standards to clinical research data. Programmers must now deal with metadata on the library level, which can be especially difficult with legacy data. One tool for this is the Meta-Engine.

You’re given a library of 55 datasets of varying quality from 2001, which you must convert to submission-ready data. Many datasets have variables with different kinds of non-native SAS formats. New variables equivalent to the decode of these formatted variables must be made, and all non-native formats need to be stripped. What are you going to do?

One solution is to make one program per dataset. Another is to create one massive program that updates datasets one at a time. Some of the obvious flaws of this approach are that it is specific to only one project, involves much unnecessary rework, is disorganized, and is difficult to debug.

The Meta-Engine is a SAS program that manipulates the metadata of entire libraries. It is portable, streamlined, and easy to understand and debug. It loops through a library one dataset at a time to make quick and uniform changes. It relies heavily on the SAS macro facility, dictionary tables, and PROC FORMAT.

Meta-Engine Overview

Decode() Macro. Input: a WORK dataset with a formatted variable. Output: a WORK dataset with the decode of this variable.

Meta-Engine Structure:
1. %DO looping mechanism, which uses PROC SQL and dictionary tables.
2. Code to adjust dataset labels.
3. Code calling the Decode() macro, once for every variable needing decoding.
4. Code to make further changes.

Meta-Engine Overview

The Meta-Engine macro asks for two library names: the one that contains the existing database, and the one that will contain the corrected, submission-ready database. For example:

    %macro MetaEngine(lib=, outlib=);
    %mend MetaEngine;

    %MetaEngine(lib=MYLIB, outlib=MYNEWLIB);

%DO Looping Mechanism

    /* Use dictionary tables to produce a list of the datasets in the library, separated by tildes (~) */
    proc sql noprint;
      select memname into :dsnames separated by '~'
      from dictionary.columns
      where libname="&LIB" and varnum=1;
    quit;

    %let i = %eval(1);
    /* Loop until the tilde-separated list is exhausted */
    %do %while (%scan(&dsnames,&i,~) ne );
      /* THISDS represents one dataset name for every loop iteration */
      %let thisds =%scan(&dsnames,%eval(&i),~);
      %let i = %eval(&i+1);
    %end;
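The looping mechanism can be tried on its own before any decoding logic is added. The following is a minimal sketch, not the author's full Meta-Engine: the wrapper macro name ListDatasets and the sample datasets AE and DM are hypothetical, and %UPCASE is added on the assumption that the library name might be passed in lower case (dictionary tables store library names in upper case).

```sas
/* Hypothetical sample datasets so the loop has something to find. */
data work.ae; aeterm = "Headache"; run;
data work.dm; age = 42; run;

/* Hypothetical wrapper; the name ListDatasets is not from the presentation. */
%macro ListDatasets(lib=);
  %local dsnames i thisds;

  /* One row per dataset: VARNUM=1 picks the first variable of each member. */
  proc sql noprint;
    select memname into :dsnames separated by '~'
    from dictionary.columns
    where libname="%upcase(&lib)" and varnum=1;
  quit;

  %let i = 1;
  %do %while (%scan(&dsnames,&i,~) ne );
    %let thisds = %scan(&dsnames,&i,~);
    /* A real Meta-Engine would process &thisds here; this sketch only logs it. */
    %put NOTE: Iteration &i would process &lib..&thisds;
    %let i = %eval(&i+1);
  %end;
%mend ListDatasets;

%ListDatasets(lib=work);
```

Declaring DSNAMES as %LOCAL up front means the loop simply does not execute when the library is empty, because PROC SQL leaves the macro variable untouched if no rows are selected.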

Adjusting Dataset Labels

Dataset labels must be captured explicitly because DATA steps won't save them.

    /* Reset the label macro variable before each loop iteration */
    %let dslabel=;

    /* The dictionary table assigns the label of THISDS to macro variable DSLABEL, */
    /* which is later used to assign the dataset label to the final dataset */
    proc sql noprint;
      select memlabel into :dslabel
      from dictionary.tables
      where libname="&lib" and memname="&thisds";
    quit;
    %let dslabel=%trim(&dslabel);

    /* In case you want to manually adjust dataset labels (not in paper) */
    %if &thisds=DATA1 %then %let dslabel=Label for DATA1;
    %else %if &thisds=DATA2 %then %let dslabel=Label for DATA2;
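The reason the label has to be captured can be seen with a small experiment. This is an illustrative sketch with hypothetical dataset names (DEMO, DEMO_COPY): a dataset is created with a label, copied with a plain DATA step, and the labels of both are then read back from DICTIONARY.TABLES.

```sas
/* Hypothetical dataset with a dataset label. */
data work.demo (label="Demography");
  subjid = 101;
run;

/* A plain DATA step copy: no LABEL= option is given on the output dataset. */
data work.demo_copy;
  set work.demo;
run;

/* Compare the stored labels of the original and the copy. */
proc sql;
  select memname, memlabel
  from dictionary.tables
  where libname="WORK" and memname in ("DEMO", "DEMO_COPY");
quit;
```

The copy should come back with a blank MEMLABEL, which is why the Meta-Engine stores the label in &DSLABEL and reapplies it when the final dataset is written.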

Calling the Decode Macro

    /* Base dataset equal to THISDS */
    data ds0;
      set &lib..&thisds;
    run;

    /* FV will equal the list of all decoded variables for later processing */
    %let fv=;

    %if &thisds=DATA1 %then %do;
      %decode(fmtlib=&lib, ds=ds0, newds=ds0_00, var=D1VAR1, newvar=D1VAR1C);
      /* Builds the decode list */
      %let fv=&fv D1VAR1;
    %end;

DECODE macro parameters: FMTLIB is the format library, DS is the input dataset, NEWDS is the output dataset (with 1 new decode variable), VAR is the variable to be decoded, and NEWVAR is the decode variable name.

The DS0_00 numbering scheme is used with the NEWDS parameter because it will be the parameter DS in the next macro call, producing DS0_001. The final product should be DS1.

Calling the Decode Macro How it would appear in real code

Decode Macro

1. Finds the variable's format using the SAS dictionary table COLUMNS and assigns it to the FMT macro variable.

    proc sql noprint;
      select format into :fmt
      from dictionary.columns
      where libname="WORK" and memname=%upcase("&ds") and name="&var";
    quit;

2. Gets the full range of the &FMT format using the FMTLIB option and saves it to dataset FMT. LENGTH is the max length of the format value. Why do we use a substring function here? To remove the trailing period.

    proc format noprint cntlout=fmt (keep=length)
                library=%upcase(&fmtlib) fmtlib;
      select %substr(&fmt,1,%length(&fmt)-1);
    run;

Decode Macro

2.5. Assigns the max length of the format to &LEN to prevent truncation.

    data _null_;
      set fmt;
      if _n_=1 then call symput('len',cats(put(length,best.)));
    run;
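Steps 2 and 2.5 can be explored outside the macro. The sketch below assumes a hypothetical user-defined format SEXF stored in the WORK library; it writes the format definition to a dataset with CNTLOUT= and copies LENGTH into a macro variable. CALL SYMPUTX is swapped in here for the CALL SYMPUT/CATS/PUT combination because it strips blanks and converts numerics on its own; that substitution is this sketch's choice, not the presentation's.

```sas
/* Hypothetical user-defined format for illustration. */
proc format library=work;
  value sexf 1="Male" 2="Female";
run;

/* Write the definition of SEXF to dataset WORK.FMT. */
/* Extra CNTLOUT= variables are kept so the rows can be inspected. */
proc format library=work
            cntlout=work.fmt (keep=fmtname start end label length);
  select sexf;
run;

proc print data=work.fmt;
run;

/* Store LENGTH from the first row in macro variable LEN. */
data _null_;
  set work.fmt;
  if _n_=1 then call symputx('len', length);
run;

%put NOTE: LEN=&len;
```

Note that the SELECT statement takes the format name without the trailing period, which is why the Decode macro trims the last character of &FMT.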

Decode Macro

3. Retrieves the variable label with VLABEL and adjusts it so that decode variables will have unique labels: "-C" is appended to labels of 38 characters or fewer, "C" is appended to labels of 39 characters, and the last character of labels of 40 or more characters is replaced with "C". The result is assigned to macro variable &NEWLABEL.

    data _null_;
      set &ds;
      if _n_=1 then do;
        if length(vlabel(&var))<=38 then
          call symput('newlabel',cats(vlabel(&var))||"-C");
        else if length(vlabel(&var))=39 then
          call symput('newlabel',cats(vlabel(&var))||"C");
        else if length(vlabel(&var))>=40 then
          call symput('newlabel',substr(cats(vlabel(&var)),
                      1,length(vlabel(&var))-1)||"C");
      end;
    run;
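The three label branches are easier to follow with concrete inputs. The following sketch applies the same length rules to three hypothetical labels of 3, 39, and 40 characters and writes the results to the log. It uses ordinary character variables instead of VLABEL and macro variables, so it illustrates the rule only and is not a replacement for the macro code.

```sas
data _null_;
  length lbl $40 newlabel $41;
  /* REPEAT(c, n) returns n+1 copies, giving labels of 39 and 40 characters. */
  do lbl = "Sex", repeat("A", 38), repeat("B", 39);
    n = length(lbl);
    if n <= 38 then newlabel = cats(lbl, "-C");       /* room for both characters */
    else if n = 39 then newlabel = cats(lbl, "C");     /* room for one character   */
    else newlabel = substr(lbl, 1, n - 1) || "C";      /* overwrite last character */
    len_new = length(newlabel);
    put n= len_new= newlabel=;
  end;
run;
```

With these inputs no new label should exceed 40 characters, which is the point of the branching.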

Decode Macro

4. Creates the output dataset (&NEWDS). Derives the decode variable (&NEWVAR) with the variable format (&FMT). Assigns it a length (&LEN) and a label (&NEWLABEL).

    data &newds;
      length &newvar $&len;
      set &ds;
      &newvar=put(&var,&fmt);
      label &newvar="&newlabel";
    run;

5. Garbage collection.

    proc datasets nolist lib=work memtype=data;
      delete fmt &ds;
    run;
    quit;
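Step 4 is the heart of the macro, and it may help to see it with the macro variables resolved by hand. This is an illustrative sketch in which every name (format SEXF, datasets DS0 and DS0_00, variables SEX and SEXC, the length 6, the label text) is a hypothetical stand-in for what &FMT, &DS, &NEWDS, &VAR, &NEWVAR, &LEN and &NEWLABEL would hold.

```sas
/* Hypothetical user-defined format and input data. */
proc format;
  value sexf 1="Male" 2="Female";
run;

data work.ds0;
  input subjid sex;
  format sex sexf.;
  label sex = "Sex";
  datalines;
101 1
102 2
;
run;

/* The decode step with all macro variables written out. */
data work.ds0_00;
  length sexc $6;           /* &LEN: longest formatted value ("Female") */
  set work.ds0;
  sexc = put(sex, sexf.);   /* &NEWVAR = put(&VAR, &FMT)                */
  label sexc = "Sex-C";     /* &NEWLABEL                                */
run;

proc print data=work.ds0_00 label;
run;
```

Declaring the LENGTH before the SET statement matters: it fixes the size of the new character variable before PUT assigns to it, so long decodes are not truncated.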

Further Adjustments

    data ds2;
      set ds1;
      %if &thisds=DATA3 %then %do;
        label D3VAR1C="Trunc. decode label which was too long-C";
      %end;
    run;

For further information, see my previous paper, SAS Programming Techniques for Adjusting Metadata on the Database Level.

Completing Database Loop

    /* Creates the final dataset in the output library and assigns the dataset label */
    data &outlib..&thisds %if %length(&dslabel)>0 %then (label="&dslabel");;
      /* DS2 could be DS3 or any number depending on the adjustments performed */
      set ds2;
      /* Strips formats off of decoded variables */
      format &fv;
    run;

The process repeats for every dataset in the library.
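The final step combines two small tricks: a dataset option that is only generated when a label exists, and a FORMAT statement that lists variables without naming a format, which removes their formats. The sketch below isolates both in a hypothetical macro called WriteFinal with made-up sample data. The %IF guard around the FORMAT statement is an extra precaution added for this sketch and is not in the original code.

```sas
/* Hypothetical format and input dataset. */
proc format;
  value ynf 0="No" 1="Yes";
run;

data work.ds2;
  flag = 1;
  flagc = "Yes";
  format flag ynf.;
run;

/* Hypothetical macro; parameter names are illustrative only. */
%macro WriteFinal(outds=, inds=, dslabel=, fv=);
  /* The first semicolon ends the %IF statement, the second ends DATA. */
  data &outds %if %length(&dslabel)>0 %then (label="&dslabel");;
    set &inds;
    %if %length(&fv)>0 %then %do;
      format &fv;   /* no format named: removes formats from the listed variables */
    %end;
  run;
%mend WriteFinal;

%WriteFinal(outds=work.final, inds=work.ds2, dslabel=Demo flags, fv=flag);

/* Inspect the dataset label and the variable formats of the result. */
proc contents data=work.final;
run;
```

The double semicolon after the %IF clause is easy to misread as a typo, but removing one of them would leave the DATA statement unterminated.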

Other Possibilities

It is possible to automate variable decoding. PROC SQL produces a list of variables with non-native formats, and that list informs an inner loop calling %DECODE() once for each variable. This is a gain in automation but a loss in adaptability: not all formats exist in the same catalog, not all variable names may be 8 characters long, and not all formatted variables may require decodes.

Other Possibilities

The Meta-Engine can be tweaked depending on the task. Some ideas include:
1. Testing libraries for SAS 5 compliance.
2. Excluding certain datasets.
3. Renaming datasets.
4. Splitting a dataset into two if it takes up too much memory.
5. Adjusting dataset and variable metadata. See my previous paper, SAS Programming Techniques for Adjusting Metadata on the Database Level.

Conclusion

The Meta-Engine offers a quick and streamlined approach for a programmer to begin thinking about metadata on the library or database level. Programmers can begin to manipulate whole libraries intuitively as if they were datasets. The Meta-Engine in its entirety, plus further information, can be found in my paper, SAS Programming Techniques for Decoding Variables on the Database Level.

Contact Information

Chris Speck, Senior Programmer, PAREXEL International, 2520 Meridian Parkway, Suite 200, Durham, NC