Hints and Tips SAUSAG Q2 2015

SORTING – NOUNIQUEKEY

The NOUNIQUEKEY (or NOUNIQUEKEYS) option on PROC SORT, available from SAS 9.3, is an easy way to retain only those records with a duplicate key. NOUNIQUERECS and UNIQUEOUT= are related options: UNIQUEOUT= names a data set that receives the records NOUNIQUEKEYS removes from OUT=.

    proc sort data=results out=chk_results uniqueout=rest_of_results nouniquekeys;
      by key_1 key_n;
    run;

Before 9.3 the usual method was a sort followed by a DATA step that keeps every record that is not both the first and the last in its BY group:

    proc sort data=results out=chk_results;
      by key_1 key_n;
    run;

    data chk_results;
      set chk_results;
      by key_1 key_n;
      if not (first.key_n and last.key_n);
    run;
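
To see where each record ends up, here is a small self-contained sketch. The data values are made up for illustration, and a single key variable (KEY_1) stands in for the slide's KEY_1 to KEY_N placeholders. Key A occurs twice, so both A records should land in CHK_RESULTS, while the records with the unique keys B and C go to REST_OF_RESULTS.

```sas
/* Made-up sample data: key A is duplicated, keys B and C are unique */
data results;
  input key_1 $ value;
  datalines;
A 1
A 2
B 3
C 4
;
run;

/* Records whose key is duplicated stay in OUT=; unique-key records go to UNIQUEOUT= */
proc sort data=results out=chk_results uniqueout=rest_of_results nouniquekeys;
  by key_1;
run;

proc print data=chk_results; run;      /* both A records */
proc print data=rest_of_results; run;  /* the B and C records */
```

The same split with the pre-9.3 FIRST./LAST. method would need a second DATA step output to capture the unique-key records.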

SQL macro variables - TRIMMED

By default, when PROC SQL creates a single macro variable with INTO, leading and trailing blanks are retained. This is inconvenient, especially for numeric values, which are stored right-aligned with leading blanks, and it was usually solved by adding %LET mvar = &mvar; afterwards, since %LET strips leading and trailing blanks. SAS 9.3 added the TRIMMED option to address this. For example:

    data raw_data;
      value = 2;  output;
      value = 15; output;
      value = 1;  output;
    run;

    proc sql noprint;
      select sum(value) into :total_u from raw_data;
      select sum(value) into :total_t trimmed from raw_data;
    quit;

    %put Total (untrimmed) : ***&total_u***;
    %put Total (trimmed) : ***&total_t***;
    %let total_u = &total_u;
    %put Total (re-trimmed): ***&total_u***;

Giving:

    Total (untrimmed) : ***      18***
    Total (trimmed) : ***18***
    Total (re-trimmed): ***18***

SQL macro variable range creation

From 9.3 it is possible to specify an open-ended range when creating macro variables in PROC SQL. The old method was to specify a large end value or (more creatively) to use &sysmaxlong. For example:

    proc sql noprint;
      /* Old method: an explicit end value for the range */
      select distinct value into :o_val1-:o_val from raw_data;
      /* 9.3 onwards: open-ended range, one macro variable per row returned */
      select distinct value into :val1- from raw_data order by value;
    quit;
    %let num_vals = &sqlobs;
    %put Number of distinct values: &num_vals;
    %put &val1, &val2, &val3;

Giving:

    Number of distinct values: 3
    1, 2, 15
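
Because &SQLOBS holds the number of rows returned, it pairs naturally with the open-ended range as the upper bound of a %DO loop. A minimal sketch, reusing the slide's RAW_DATA values; the macro name LIST_VALS is made up for illustration.

```sas
data raw_data;
  value = 2;  output;
  value = 15; output;
  value = 1;  output;
run;

proc sql noprint;
  /* One macro variable per distinct value: VAL1, VAL2, ... */
  select distinct value into :val1- from raw_data order by value;
quit;
%let num_vals = &sqlobs;

/* Hypothetical helper: walk the range without knowing its size in advance */
%macro list_vals;
  %local i;
  %do i = 1 %to &num_vals;
    %put val&i = &&val&i;
  %end;
%mend list_vals;

%list_vals
```

The double ampersand makes the macro processor resolve &i first (giving, say, &val2) and then resolve that reference.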

DOSUB and DOSUBL

DOSUB and DOSUBL were introduced as experimental functions in 9.3 (production in 9.4) and extend the CALL EXECUTE concept. They execute SAS code immediately and then return to the calling DATA step, whereas CALL EXECUTE stacks the generated code for execution after the DATA step has completed (although it does resolve macro calls immediately). DOSUB takes a fileref, as a quoted string, that points to the code to be executed; DOSUBL takes a character string (a literal or an expression) containing the code itself. For example:

    data dosubtst;
      rc1 = dosubl('data tst; a=42; run;');
      rc2 = dosubl('%runcode(parm);');   /* %runcode is a user-written macro */
    run;

Most uses would be to call a macro for convenience. A return code of 0 means the code could be executed; a non-zero value means it could not.
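
The difference in timing can be seen by letting each piece of submitted code append a letter to a macro variable. In this sketch, which assumes SAS 9.4 (where macro variables set inside DOSUBL code are passed back to the caller), the DOSUBL code should run first (D), then the rest of the calling step (C), and only after the step ends the code stacked by CALL EXECUTE (E), leaving ORDER=DCE. The macro variable name ORDER is arbitrary.

```sas
%let order = ;

data _null_;
  /* Stacked: runs only after this DATA step has finished */
  call execute('data _null_; call symputx("order", cats(symget("order"), "E")); run;');
  /* Runs immediately, before the next statement of this step */
  rc = dosubl('data _null_; call symputx("order", cats(symget("order"), "D")); run;');
  /* Back in the calling step */
  call symputx('order', cats(symget('order'), 'C'));
run;

%put &=order;
```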

DOSUB and DOSUBL (cont)

This opens the possibility of executing global statements such as LIBNAME, macros and DATA steps within the submitted code, and then accessing the results via macro variables or via the OPEN, FETCH and CLOSE functions. There are plenty of examples of interesting usage on the net. There is a problem with macro variables being passed before 9.3 TS1M2, so check for the workaround if it applies to your release. The next slide shows how DOSUB can be used to create recursive code (but this can be dangerous!).

DOSUB and DOSUBL (cont)

Just for later perusal, recursive code to calculate a factorial:

    /* Write the recursive step to a temporary file that DOSUB can run */
    filename code temp;
    data _null_;
      file code;
      put 'data _null_;';
      put ' x = input(symget("parm"),32.);';
      put ' y = coalesce(input(symget("control"),32.),1);';
      put ' if x > 0 then do;';
      put ' call symputx("control",x*y);';
      put ' call symputx("parm",x-1);';
      put ' rc = dosub("code");';
      put ' end;';
      put ' else';
      put ' call symputx("result",y);';
      put 'run;';
    run;

    /* Start the recursion for 5! and pick up the result from the RESULT macro variable */
    data factorial;
      x = 5;
      call symputx('parm',x);
      call symputx('control',.);
      rc = dosub('code');
      y = input(symget('result'),32.);
    run;
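
Going back to the point that the results of DOSUBL code can be picked up through macro variables or through the OPEN, FETCH and CLOSE functions, here is a small sketch that does both in one DATA step. It assumes SAS 9.4 (given the macro variable issue before 9.3 TS1M2 mentioned earlier); the table WORK.STATS and the macro variable N_ROWS are names made up for illustration. With the slides' RAW_DATA values it should give RC=0, N_ROWS=3 and TOTAL=18.

```sas
data raw_data;
  value = 2;  output;
  value = 15; output;
  value = 1;  output;
run;

data dosubl_demo;
  length code $300;
  /* The code can be built as an expression, not only written as a literal */
  code = catx(' ', 'proc sql noprint;',
              'create table work.stats as select sum(value) as total from raw_data;',
              'select count(*) into :n_rows trimmed from raw_data;',
              'quit;');
  rc = dosubl(code);                        /* 0 = the code could be executed */

  /* Route 1: a macro variable set by the submitted code */
  n_rows = input(symget('n_rows'), 32.);

  /* Route 2: read the table it created with OPEN/FETCH/GETVARN/CLOSE */
  dsid = open('work.stats');
  if dsid > 0 then do;
    if fetch(dsid) = 0 then total = getvarn(dsid, varnum(dsid, 'total'));
    dsid = close(dsid);
  end;
  drop code dsid;
run;
```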

Tracking task progress in EG

The SYSECHO global statement, which displays a string in the SAS Enterprise Guide (EG) task status bar and window, can be combined with the DOSUBL function to build a neat task progress display. The macro below takes the total number of records that will be processed plus how often (in percent) the display is to be updated, and maintains a running display on the status bar. There is an undocumented bug in 9.3 TS1M0 (probably fixed in M2) which abends the task after 32 DOSUBL calls within a DATA step, so on that release set BY_PCT to at least 5.

    %macro display_pct_complete(totalrecs,by_pct=10);
      /* Act only when this record crosses a whole-percentage boundary */
      if int(100*(_n_-1)/(&totalrecs)) ne int(100*_n_/(&totalrecs)) then do;
        if mod(int(100*(_n_-1)/(&totalrecs)),&by_pct) = 0 then do;
          * Report on each complete by_pct;
          drop _rc _s;
          _rc = dosubl(cat('SYSECHO "Percentage complete: ',
                           put(int(100*(_n_-1)/(&totalrecs)),3.), '%";'));
          _s = sleep(.1,1); * A delay for fast code;
        end;
      end /* no semicolon: the one after the macro call closes this END */
    %mend display_pct_complete;

    data results;
      set large_dataset nobs=totrecs;
      * processing;
      %display_pct_complete(totrecs,by_pct=1);
    run;

ZIP file processing

New with 9.4 is the FILENAME ZIP access method, which makes processing standard (WinZip-style) zip files much easier than using the undocumented SASZIPAM filename engine or unnamed pipes. It makes the zip file look and act like a directory, allowing selective read/write access to individual members. Its limitation is that it does not handle other compression formats such as bzip2, so pipes still have their place, as long as the data is line-feed delimited text rather than binary.

This example uses a fairly involved pipe construct to read a group of related data files (ID_DATA_01, ID_DATA_02, etc.) from a zip file containing bzip2-compressed members without having to unzip any of them, something the new ZIP engine can't handle. The data is CSV-like, delimited by tildes.

    /* unzip -p writes the matching members to stdout; bunzip2 decompresses the stream */
    filename archive pipe "unzip -p '&latest_archive' 'ID_DATA_*.csv.bz2' | bunzip2";

    data id_data;
      infile archive dsd dlm='~' termstr=lf missover lrecl=300;
      length id $20 type_code $6;
      input id type_code;
    run;
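
A minimal sketch of the "zip file as a directory" idea, assuming SAS 9.4: it writes a small member into a new archive in the WORK folder, lists the members with the directory functions DOPEN, DNUM and DREAD, and then reads one member back selectively. The archive name demo.zip, the member hello.txt and the filerefs are made up for illustration.

```sas
/* Made-up archive location in the WORK library folder */
%let zip_path = %sysfunc(pathname(work))/demo.zip;

/* Write one text member into the archive (created if it does not exist) */
filename outzip zip "&zip_path" member="hello.txt";
data _null_;
  file outzip;
  put 'hello from a zip member';
run;

/* Treat the archive as a directory and list its members */
filename inzip zip "&zip_path";
data zip_members(keep=member_name);
  length member_name $200;
  did = dopen('inzip');
  if did > 0 then do;
    do i = 1 to dnum(did);
      member_name = dread(did, i);
      output;
    end;
    did = dclose(did);
  end;
run;

/* Selectively read a single member */
filename inmember zip "&zip_path" member="hello.txt";
data hello;
  infile inmember truncover;
  input line $100.;
run;
```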

ZIP file processing (cont)

The advantage of the FILENAME ZIP access method is that all the standard and, more importantly, the less-used filename options are available (and work properly). Probably the most useful is binary streaming, the RECFM=S option.

    filename inzip zip "path/ebcdic_data.zip" member="VB_data";

    * Read a variable blocked (VB) mainframe-sourced EBCDIC file with RDWs from a ZIP archive;
    data ebcdic_data;
      infile inzip recfm=s nbyte=_datalen;
      length line $300; * maximum variable line length;
      * Read the 4-byte Record Descriptor Word (RDW) to determine the line length;
      _datalen = 4;
      input;
      * The first 2 bytes of the RDW hold the record length including the RDW itself,
        so subtract 4 to get the number of bytes to read next, and save that length;
      _datalen = input(_infile_,s370fibu2.)-4;
      data_len = _datalen;
      * Read exactly that many bytes: the variable-length line itself (raw EBCDIC);
      input;
      line = _infile_;
    run;

Questions?