A Better Way to Flip (Transpose) a SAS® Dataset


Next Presentation: A Better Way to Flip (Transpose) a SAS® Dataset
Presenter: Arthur Tabachneck
Art holds a PhD from Michigan State University, has been a SAS user since 1974, is president of the Toronto Area SAS Society, and has received such recognitions as the SAS Customer Value Award (1999), SAS-L Hall of Fame (2011) and SAS Circle of Excellence (2012). In 2012 he was also recognized as the first SAS user to have been awarded more than 10,000 points on the SAS Discussion Forums.
Copyright © 2010, SAS Institute Inc. All rights reserved.

A Better Way to Flip (Transpose) a SAS® Dataset
- Arthur Tabachneck (Art297), Thornhill, ON Canada
- Xia Ke Shan (KSharp), Beijing, China
- Joe Whitehurst (Joe Whitehurst), Atlanta, GA
- Robert Virgile (Astounding), Lexington, MA

Presentation Overview
- What the %transpose macro is
- How the macro works
- The macro's benefits
#SASGF11

What the %transpose macro is
- It's a SAS macro
- Looks and feels almost exactly like PROC TRANSPOSE
- Doesn't have all of the capabilities of PROC TRANSPOSE, as it was designed for just one purpose: to convert tall files into wide files
- Has virtually the same options and statements as PROC TRANSPOSE, plus a few more
- Is easier to use than PROC TRANSPOSE
- Runs significantly faster than PROC TRANSPOSE

Have you ever had to flip a SAS dataset from being tall to being wide? i.e., from:

    idnum  date     var1
    1      2001JAN  SD
    1      2001FEB  EF
    1      2001MAR  HK
    2      2001JAN  GH
    2      2001APR  MM
    2      2001MAY  JH

flipping a SAS dataset to:

    idnum  var1_2001JAN  var1_2001FEB  var1_2001MAR  var1_2001APR  var1_2001MAY
    1      SD            EF            HK
    2      GH                                        MM            JH
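To try the examples that follow, the tall input can be built with a short DATA step. This is only a sketch: it assumes that date is stored as a 7-character string such as 2001JAN, that var1 is a 2-character string, and that the first record for idnum 2 is dated 2001JAN.

```sas
/* Sketch: build the small tall dataset used in the first example.    */
/* Assumptions: date is character ($7.), var1 is character ($2.), and */
/* the first record for idnum 2 is dated 2001JAN.                     */
data have;
  input idnum date :$7. var1 :$2.;
  datalines;
1 2001JAN SD
1 2001FEB EF
1 2001MAR HK
2 2001JAN GH
2 2001APR MM
2 2001MAY JH
;
run;
```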

if you have, you are probably already familiar with PROC TRANSPOSE:

    proc transpose data=have out=want (drop=_:) prefix=var1_;
      by idnum;
      var var1;
      id date;
    run;

Not difficult, but you need to know:
- which are options and which are statements
- that you have to "drop" unwanted system variables
- that you have to specify a prefix
- that you have to ensure that your data has been pre-sorted

would you be interested in knowing how to obtain the same result with the following code?

    %transpose(data=have, out=want, by=idnum, var=var1, id=date, delimiter=_)

It may look like the PROC TRANSPOSE code, but:
- No system variables to drop
- No need for a prefix (var names are automatically included)
- No need to differentiate between options and statements, as they are all of the form parameter=value
- Easier to code (less to type)
- Runs 10 times faster than PROC TRANSPOSE

if you needed to flip a more complex SAS dataset from being tall to being wide, i.e., from:

    idnum  date       var1  var2
    1      31MAR2013  1     SD
    1      30JUN2013  2     EF
    1      30SEP2013  3     HK
    1      31DEC2013  4     HL
    2      31MAR2013  5     GH
    2      30JUN2013  6     MM
    2      30SEP2013  7     JH
    2      31DEC2013  8     MS

flipping a more complex SAS dataset to:

    idnum  var1_Qtr1  var2_Qtr1  var1_Qtr2  var2_Qtr2  var1_Qtr3  var2_Qtr3  var1_Qtr4  var2_Qtr4
    1      1          SD         2          EF         3          HK         4          HL
    2      5          GH         6          MM         7          JH         8          MS
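The more complex tall input can be sketched the same way. The layout is an assumption read from the slides: date is a numeric SAS date (read with the DATE9. informat), var1 is numeric and var2 is character, and each idnum has one record per quarter-end.

```sas
/* Sketch: build the more complex tall dataset.                       */
/* Assumptions: date is a numeric SAS date, var1 is numeric, var2 is  */
/* character, and each idnum has one record per quarter-end.          */
data have;
  input idnum date :date9. var1 var2 :$2.;
  format date date9.;
  datalines;
1 31MAR2013 1 SD
1 30JUN2013 2 EF
1 30SEP2013 3 HK
1 31DEC2013 4 HL
2 31MAR2013 5 GH
2 30JUN2013 6 MM
2 30SEP2013 7 JH
2 31DEC2013 8 MS
;
run;
```

Because date is a true SAS date here, the QTR1. format used in the later examples can turn each value into its quarter number.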

Again, you could use PROC TRANSPOSE, but it would require two steps. First you have to make the table even taller (i.e., one record for each by variable and var combination):

    proc transpose data=have out=tall;
      by idnum date;
      var var1-var2;
      format date qtr1.;
    run;

That will create a taller file (i.e., one record for each by variable and var combination), with the var names now in _NAME_ and the values in COL1.

Then, to make the table wide (i.e., one record for each by variable), you need to run PROC TRANSPOSE a second time:

    proc transpose data=tall out=want (drop=_:) delimiter=_Qtr;
      by idnum;
      id _name_ date;
      var col1;
    run;

But there are some problems with the method. Result:

    idnum  var1_Qtr1  var2_Qtr1  var1_Qtr2  var2_Qtr2  var1_Qtr3  var2_Qtr3  var1_Qtr4  var2_Qtr4
    1      1          SD         2          EF         3          HK         4          HL
    2      5          GH         6          MM         7          JH         8          MS

The numeric variables are now character variables.

and if the first idnum was missing data for one date:

    idnum  date       var1  var2
    1      31MAR2013  1     SD
    1      30JUN2013  2     EF
    1      31DEC2013  4     HL
    2      31MAR2013  5     GH
    2      30JUN2013  6     MM
    2      30SEP2013  7     JH
    2      31DEC2013  8     MS

the output variable order will be a bit distorted:
idnum var1 Qtr1 var2 Qtr2 Qtr3 Qtr4 1 SD 2 EF 3 HK 5 GH 6 MM 7 JH 8 MS
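One way to see both problems (the changed variable types and the distorted column order) is to list the output variables in their stored order. The sketch below assumes the two-step PROC TRANSPOSE result was written to a dataset named want, as in the code shown earlier.

```sas
/* Sketch: list the variables of the transposed dataset in creation   */
/* order (VARNUM), together with their types, to check for both the   */
/* character conversion and any out-of-sequence quarter columns.      */
proc contents data=want varnum;
run;
```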

would you be interested in knowing how to obtain the right result with the following code?

    %transpose(data=have, out=need, by=idnum, id=date, format=qtr1.,
               delimiter=_Qtr, var=var1-var2)

How about if you knew that the macro:
- only requires one step
- only needs one pass through the data
- doesn't produce distorted results
- can run more than 50 times faster than PROC TRANSPOSE

How the macro works
if we have: dataset have

    idnum  date       var1  var2
    1      31MAR2013  1     SD
    1      30JUN2013  2     EF
    1      30SEP2013  3     HK
    1      31DEC2013  4     HL
    2      31MAR2013  5     GH
    2      30JUN2013  6     MM
    2      30SEP2013  7     JH
    2      31DEC2013  8     MS

How the macro works
and we need: dataset need

    idnum  var1_Qtr1  var2_Qtr1  var1_Qtr2  var2_Qtr2  var1_Qtr3  var2_Qtr3  var1_Qtr4  var2_Qtr4
    1      1          SD         2          EF         3          HK         4          HL
    2      5          GH         6          MM         7          JH         8          MS

and we submit:

    %transpose(data=have, out=want, by=idnum, id=date, format=qtr1.,
               delimiter=_Qtr, var=var1-var2, sort=yes)

How the macro works: since SORT was set to 'yes', the macro will run PROC SORT with the NOEQUALS option (plus any options you declare in the SORT_OPTIONS parameter) and a KEEP option. The macro will then create and run a data step like the following:

    data work.want (drop=date ___: var1-var2);
      set work.have (keep=idnum date var2 var1);
      by idnum notsorted;
      retain want_chr want_num;
      array have_chr(*) $ var2;
      array have_num(*) var1;
      array want_chr(*) $ var2_Qtr1 var2_Qtr2 var2_Qtr3 var2_Qtr4;
      array want_num(*) var1_Qtr1 var1_Qtr2 var1_Qtr3 var1_Qtr4;
      /* character variables: reset per by group, then fill by position */
      if first.idnum then call missing(of want_chr(*));
      ___nchar=put(date,labelfmt.)*dim(have_chr);
      do ___i=1 to dim(have_chr);
        want_chr(___nchar+___i)=have_chr(___i);
      end;
      /* numeric variables: same logic */
      if first.idnum then call missing(of want_num(*));
      ___nnum=put(date,labelfmt.)*dim(have_num);
      do ___i=1 to dim(have_num);
        want_num(___nnum+___i)=have_num(___i);
      end;
      /* write one record per by group */
      if last.idnum then output;
    run;

labelfmt. is a format, created by the macro, that reflects the ordered names of the transposed variables (i.e., from 1 to n).
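The array technique behind that generated step can be illustrated with a much smaller sketch. This is not the macro's code: it handles only the numeric variable var1, assumes have is already sorted by idnum and that date is a non-missing SAS date, and swaps the macro's labelfmt. lookup for the built-in QTR function. The output name want_num is hypothetical.

```sas
/* Sketch (not the macro's code): one pass, one output record per     */
/* idnum, with each var1 value dropped into the array slot for its    */
/* quarter. Assumes have is sorted by idnum and date is never missing.*/
data want_num (keep=idnum var1_Qtr1-var1_Qtr4);
  set have (keep=idnum date var1);
  by idnum;
  array q(4) var1_Qtr1-var1_Qtr4;
  retain var1_Qtr1-var1_Qtr4;               /* hold values across records   */
  if first.idnum then call missing(of q(*)); /* reset at each new idnum     */
  q(qtr(date)) = var1;                       /* quarter 1-4 picks the slot  */
  if last.idnum then output;                 /* write the wide record       */
run;
```

Because the slot is chosen from the value of date rather than from the order in which records arrive, a missing quarter simply leaves its slot empty instead of shifting the later columns.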

and you can use almost all the features that you can with PROC TRANSPOSE, plus some additional ones:

    %transpose(libname_in=, libname_out=, data=, out=, by=, prefix=,
               var=, autovars=, id=, var_first=, format=, delimiter=,
               copy=, drop=, sort=, sort_options=, use_varname=,
               guessingrows=)

the %transpose macro's features
parameter: libnames
the names of the SAS libraries where your data reside and where you want the transposed file written
Feature: While data and out can be assigned 1- or 2-level filenames, if your input or output files often use certain libraries you can assign those libraries to these parameters as defaults.

the %transpose macro's features
parameter: autovars
determines whether char(acter), num(eric) or all variables should be transposed if the var parameter is null
Feature: Where PROC TRANSPOSE will only include all numeric variables if there is no var statement, this parameter lets you indicate whether you want all numeric variables, all character variables, or both.
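A call that relies on autovars might look like the sketch below. It assumes the macro has already been compiled in the session (the %include path in the comment is hypothetical), that the value is spelled all as the slide suggests, and that the macro leaves the by and id variables out of the automatically selected set.

```sas
/* %include '/path/to/transpose.sas';   hypothetical location of the macro */

/* Sketch: no var= parameter, so autovars decides what is transposed. */
/* Assumes autovars accepts char, num or all, as the slide describes. */
%transpose(data=have, out=want, by=idnum, id=date,
           format=qtr1., delimiter=_Qtr, autovars=all)
```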

the %transpose macro's features
parameter: var_first
determines which is named first in transposed variables:
- YES: prefix, var name, delimiter, id value
- NO: prefix, id value, delimiter, var name

the %transpose macro's features
parameter: sort
whether the input dataset should be sorted (YES or NO)
Feature: If set to YES, this parameter will ensure that your data are automatically presorted.

How many of you have ever:
- forgotten to run proc sort before running another proc that required sorted data?
- run proc sort before another proc only to discover that the file hadn't been sorted?
- run proc sort before another proc but didn't include a keep dataset option to limit the amount of data that had to be processed?
- run proc sort but didn't include the noequals option?

Compare the performance of the following three sets of almost identical code, run on a file with 40,000 records and 1,002 variables.

Set 1:

    proc sort data=have out=need;
      by idnum date;
    run;

took 6.96 seconds CPU time

    data need;
      set need;
      _name_="var1";
    run;

took 3.67 seconds CPU time

    proc transpose data=need out=want (drop=_:) delimiter=_Qtr;
      by idnum;
      var var1;
      id _name_ date;
      format date Qtr1.;
    run;

took 1.51 seconds CPU time

Set 2 (same file, with the keep and noequals options added to the sort):

    proc sort data=have (keep=idnum date var1) out=need noequals;
      by idnum date;
    run;

took 0.48 seconds CPU time

    data need;
      set need;
      _name_="var1";
    run;

took 0.06 seconds CPU time

    proc transpose data=need out=want (drop=_:) delimiter=_Qtr;
      by idnum;
      var var1;
      id _name_ date;
      format date Qtr1.;
    run;

took 0.43 seconds CPU time

Set 3 (same file, using the macro):

    %transpose(var=var1, id=date, format=Qtr1., sort=yes, delimiter=_Qtr)

took 0.46 seconds CPU time, i.e., about twice as fast as the optimized code (0.97 seconds in total) and about 25 times faster than the non-optimized code (12.14 seconds in total). On a more complex transposition of a larger dataset, the macro ran more than 50 times faster.

the %transpose macro's features
parameter: sort
whether the input data should be sorted (YES or NO); for both, only &by, &id, &var and &copy variables will be used
Feature: Where PROC TRANSPOSE will take up unnecessary system time unless you include a keep or drop statement, the macro will always ensure that only relevant variables are kept. If set to YES, this parameter will also ensure that your data are presorted using the noequals sort option and any other options you specify in the sort_options parameter.

the %transpose macro's features
parameter: sort_options
whether additional options should be specified if the sort parameter is set to YES
Feature: While the keep and noequals sort options will always be used, depending on your data there are other sort options you may want to specify that could increase efficiency and/or ensure that your data are sorted (e.g., presorted, tagsort and force).
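As a sketch of how an extra sort option might be passed, the call below adds TAGSORT, a standard PROC SORT option that sorts only the key values and record identifiers to save temporary disk space. It assumes the macro is already compiled and that sort_options takes the option text as is.

```sas
/* Sketch: ask the macro to presort and pass TAGSORT through to the   */
/* PROC SORT it runs. Assumes sort_options accepts plain option text. */
%transpose(data=have, out=want, by=idnum, id=date, var=var1-var2,
           format=qtr1., delimiter=_Qtr, sort=yes, sort_options=tagsort)
```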

the %transpose macro's features
parameter: use_varname
whether the transposed variable names should include the name of the var variable
Feature: With PROC TRANSPOSE, var variable names are not automatically included in the new transposed variable names. This parameter controls whether var variable names will be automatically included in the transposed variable names.

the %transpose macro's features
parameter: guessingrows
the number of rows to be read to determine the correct order for the set of transposed variables
Feature: With PROC TRANSPOSE, the transposed variables will be in the order they are initially found in the data. This parameter controls the order based on the values found in the first guessingrows records.

the %transpose macro's features
parameter: all parameters
Feature: Since they are all named macro parameters, you have direct control over their default values. If you set them to commonly used values, they don't have to be specified UNLESS you want to change their value.
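How defaults on named (keyword) macro parameters behave can be seen with a tiny stand-alone sketch. The macro %demo and its parameter values are hypothetical and are not part of %transpose; the same mechanism applies to the defaults set in the %macro statement of %transpose.

```sas
/* Sketch: a hypothetical macro with two keyword parameters whose     */
/* defaults are set in the %macro statement.                          */
%macro demo(data=have, sort=NO);
  %put NOTE: data=&data sort=&sort;
%mend demo;

%demo()            /* both defaults are used                          */
%demo(sort=YES)    /* only sort is overridden; data keeps its default */
```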

Benefits of the approach
- less typing, thus fewer errors
- contains some features that would be nice to see available with all SAS procs
- easier to learn than PROC TRANSPOSE
- runs faster than PROC TRANSPOSE
- ensures that you ALWAYS get the benefit of two critical SAS efficiency methods
- more likely to provide the desired results

Presentation Overview
- What the %transpose macro is ✓
- How the macro works ✓
- The macro's benefits ✓

The macro, paper and Powerpoint can be found at: http://www.sascommunity.org

Questions?

Your comments and questions are valued and encouraged.
Contact the Authors:
- Arthur Tabachneck, Ph.D., President, myQNA, Inc., Thornhill, ON, art297@rogers.com
- Joe Whitehurst, High Impact Technologies, Atlanta, GA, joewhitehurst@gmail.com
- Robert Virgile, Robert Virgile Associates, Inc., Lexington, MA, rvirgile@verizon.net
- Xia Ke Shan, Beijing, China, xiakeshan@yahoo.com.cn