Chapter 22: Using Best Practices 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.

Slides:



Advertisements
Similar presentations
Effecting Efficiency Effortlessly Daniel Carden, Quanticate.
Advertisements

Chapter 5: Creating and Managing Tables using PROC SQL 1 © Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Statistical Methods Lynne Stokes Department of Statistical Science Lecture 7: Introduction to SAS Programming Language.
Slide C.1 SAS MathematicalMarketing Appendix C: SAS Software Uses of SAS  CRM  datamining  data warehousing  linear programming  forecasting  econometrics.
Chapter 9: Introducing Macro Variables 1 © Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Introduction to SAS Programming Christina L. Ughrin Statistical Software Consulting Some notes pulled from SAS Programming I: Essentials Training.
Performing Queries Using PROC SQL Chapter 1 1 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Basic And Advanced SAS Programming
SAS: Managing Memory and Optimizing System Performance Jacek Czajkowski 09/29/2008.
Chapter 18: Modifying SAS Data Sets and Tracking Changes 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Understanding SAS Data Step Processing Alan C. Elliott stattutorials.com.
Creating SAS® Data Sets
Welcome to SAS…Session..!. What is SAS..! A Complete programming language with report formatting with statistical and mathematical capabilities.
Chapter 3: Combining Tables Horizontally using PROC SQL 1 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Chapter 10:Processing Macro Variables at Execution Time 1 STAT 541 © Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Lecture 5 Sorting, Printing, and Summarizing Your Data.
Introduction to SAS Essentials Mastering SAS for Data Analytics Alan Elliott and Wayne Woodward SAS ESSENTIALS -- Elliott & Woodward1.
Introduction to SAS BIO 226 – Spring Outline Windows and common rules Getting the data –The PRINT and CONTENT Procedures Manipulating the data.
Chapter 21 Reading Hierarchical Files Reading Hierarchical Raw Data Files.
Introduction to SAS. What is SAS? SAS originally stood for “Statistical Analysis System”. SAS is a computer software system that provides all the tools.
©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina Chapter 17 supplement: Review of Formatting Data STAT 541.
Chapter 15: Combining Data Horizontally 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
SAS Efficiency Techniques and Methods By Kelley Weston Sr. Statistical Programmer Quintiles.
SE: CHAPTER 7 Writing The Program
Lesson 2 Topic - Reading in data Chapter 2 (Little SAS Book)
SQL Chapter Two. Overview Basic Structure Verifying Statements Specifying Columns Specifying Rows.
Chapter 16: Using Lookup Tables to Match Data 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
1 Data Manipulation (with SQL) HRP223 – 2010 October 13, 2010 Copyright © Leland Stanford Junior University. All rights reserved. Warning: This.
Lesson 6 - Topics Reading SAS datasets Subsetting SAS datasets Merging SAS datasets.
Chapter 5 Reading and Manipulating SAS ® Data Sets and Creating Detailed Reports Xiaogang Su Department of Statistics University of Central Florida.
SAS Basics. Windows Program Editor Write/edit all your statements here. Log Watch this for any errors in program as it runs. Output Will automatically.
Chapter 4 concerns various SAS procedures (PROCs). Every PROC operates on: –the most recently created dataset –all the observations –all the appropriate.
Chapter 17: Formatting Data 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Introduction to SAS Essentials Mastering SAS for Data Analytics Alan Elliott and Wayne Woodward SAS Essentials - Elliott & Woodward1.
Chapter 1: Overview of SAS System Basic Concepts of SAS System.
Chapter 19: Introduction to Efficient SAS Programming 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Controlling Input and Output
SAS Basics. Windows Program Editor Write/edit all your statement here.
Lecture 4 Ways to get data into SAS Some practice programming
An Introduction Katherine Nicholas & Liqiong Fan.
FORMAT statements can be used to change the look of your output –if FORMAT is in the DATA step, then the formats are permanent and stored with the dataset.
Chapter 4: Combining Tables Vertically using PROC SQL 1 © Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Chapter 2 Getting Data into SAS Directly enter data into SAS data sets –use the ViewTable window. You can define columns (variables) with the Column Attributes.
17b.Accessing Data: Manipulating Variables in SAS ®
BMTRY 789 Lecture 6: Proc Sort, Random Number Generators, and Do Loops Readings – Chapters 5 & 6 Lab Problem - Brain Teaser Homework Due – HW 2 Homework.
Chapter 17 Supplement: Alternatives to IF-THEN/ELSE Processing STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South.
Chapter 23: Selecting Efficient Sorting Strategies 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Chapter 21: Controlling Data Storage Space 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Lesson 2 Topic - Reading in data Programs 1 and 2 in course notes –Chapter 2 (Little SAS Book)
Chapter 6: Modifying and Combining Data Sets  The SET statement is a powerful statement in the DATA step DATA newdatasetname; SET olddatasetname;.. run;
Chapter 14: Combining Data Vertically 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
HRP Copyright © Leland Stanford Junior University. All rights reserved. Warning: This presentation is protected by copyright law and.
Based on Learning SAS by Example: A Programmer’s Guide Chapters 1 & 2
Online Programming| Online Training| Real Time Projects | Certifications |Online Classes| Corporate Training |Jobs| CONTACT US: STANSYS SOFTWARE SOLUTIONS.
Working Efficiently with Large SAS® Datasets Vishal Jain Senior Programmer.
SAS ® 101 Based on Learning SAS by Example: A Programmer’s Guide Chapters 5 & 6 By Ravi Mandal.
SAS ® 101 Based on Learning SAS by Example: A Programmer’s Guide Chapters 3 & 4 By Tasha Chapman, Oregon Health Authority.
Chapter 6: Modifying and Combining Data Sets
Chapter 19: Introduction to Efficient SAS Programming
Chapter 13: Creating Samples and Indexes
Chapter 18: Modifying SAS Data Sets and Tracking Changes
Former Chapter 23: Selecting Efficient Sorting Strategies
Chapter 1: Introduction to SAS
Chapter 4: Sorting, Printing, Summarizing
Chapters 5 and 7 supplement
Chapter 24 (4th ed.): Creating Functions
Introduction to DATA Step Programming: SAS Basics II
Using Macros to Solve the Collation Problem
Chapter 24: Querying Data Efficiently
Introduction to SAS Essentials Mastering SAS for Data Analytics
Presentation transcript:

Chapter 22: Using Best Practices 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina

2 Outline Executing Only Necessary Statements Executing Only Necessary Statements Eliminating Unnecessary Passes through the Data Eliminating Unnecessary Passes through the Data Reading and Writing Only Essential Data Reading and Writing Only Essential Data Storing Data in SAS Data Sets Storing Data in SAS Data Sets Avoiding Unnecessary Procedure Invocation Avoiding Unnecessary Procedure Invocation

3 Executing Only Necessary Statements Position of subsetting IF Position of subsetting IF –Even when subsetting IF is constructed from calculated variables, try to position it as soon as possible in the program to prevent unnecessary processing of cases that will not be output

4 Executing Only Necessary Statements data doerasch; data doerasch; set doe2012.doedb; nonmiss=nmiss(of q1-q25); if nonmiss le 7; array q q1-q25; sone=0;stwo=0;sthree=0;sfour=0;sfive=0;totscore=0; do over q; totscore+q; if q eq 1 then sone+1; else if q eq 2 then stwo+1; else if q eq 3 then sthree+1; else if q eq 4 then sfour+1; else if q eq 5 then sfive+1; end;run;

5 Executing Only Necessary Statements Using Conditional Logic Efficiently Using Conditional Logic Efficiently –IF-THEN/ELSE is efficient when  condition is based on character values  Data values are unevenly distributed  Condition comprises a small number of cases  Condition is ordered by frequency of occurrence  ELSE IF and DO/END constructions are used –SELECT (we’ve seen this once before) is efficient when  condition is based on numeric variables  Data values are evenly distributed  DO/END constructions are used

6 Executing Only Necessary Statements SELECT example SELECT example data a; set b; select(varname); when(num1) do;.. end; when(num2) do;.. end;.. otherwise do;.. end; end;

7 Executing Only Necessary Statements IF-THEN/ELSE conditions suggest a macro IF-THEN/ELSE conditions suggest a macro Macro here;

8 Eliminating Unncessary Passes through the Data Create multiple data sets from a single data set with OUTPUT (vs. multiple subsettingIFs) Create multiple data sets from a single data set with OUTPUT (vs. multiple subsettingIFs) data a b; set c; if condition1 then output a; else output b; run;

9 Eliminating Unncessary Passes through the Data Use WHERE in PROCs (e.g., PROC PRINT and PROC SORT) rather than a subsetting IF in a DATA step following by the PROC Use WHERE in PROCs (e.g., PROC PRINT and PROC SORT) rather than a subsetting IF in a DATA step following by the PROC Use PROC DATASETS to modify data attributes rather than the DATA step Use PROC DATASETS to modify data attributes rather than the DATA step –Cannot modify data (e.g., LENGTH, type), only descriptions (e.g., LABEL, NAME)

10 Reading and Writing Only Essential Data WHERE is more efficient than subsetting IF for variables in the input data set WHERE is more efficient than subsetting IF for variables in the input data set –WHERE selects cases in the input buffer –Subsetting IF selects cases after they have been loaded from the input buffer into the PDV (Program Data Vector) Subsetting IF can operate on any variable in the PDV, including newly-created variables Subsetting IF can operate on any variable in the PDV, including newly-created variables Subsetting IF can operate on external data sets Subsetting IF can operate on external data sets Subsetting IF can be embedded in conditional statements Subsetting IF can be embedded in conditional statements

11 Reading and Writing Only Essential Data proc print data=a (firstobs=4 obs=3); where condition; FIRSTOBS and OBS refer to the observations in the subset, not the original data set FIRSTOBS and OBS refer to the observations in the subset, not the original data set

12 Reading and Writing Only Essential Data When creating a new data set from an external file (e.g., called with an INFILE statement), position subsetting IF after the INPUT statement to improve efficiency When creating a new data set from an external file (e.g., called with an INFILE statement), position subsetting IF after the INPUT statement to improve efficiency –This works when the INPUT statement only reads a subset of the variables in the external file (e.g., through column-formatting)

13 Reading and Writing Only Essential Data Using KEEP= and DROP= in a SET statement is more efficient than using them in either a DATA statement, or using KEEP and DROP statements in a DATA step Using KEEP= and DROP= in a SET statement is more efficient than using them in either a DATA statement, or using KEEP and DROP statements in a DATA step –Reasoning is similar to reasoning for preferring WHERE over a subsetting IF (with similar disadvantages as well)

14 Storing Data in SAS Data Sets Advantages of storing data in a permanent SAS data set rather than reading from a raw data file when Advantages of storing data in a permanent SAS data set rather than reading from a raw data file when –More efficient when repeatedly using the data set in PROC or DATA steps –Self-documentation (labels, names, lengths, etc) –No processing of the raw data before using

15 Avoid Unnecessary Process Invocation Use procedures that can create multiple reports Use procedures that can create multiple reports –PROC SQL –DATASETS (you can process multiple data sets by including MODIFY/RUN blocks in PROC DATASETS, or use implicit RUN statements) –FREQ (e.g. TABLE A B A*B ) –TABULATE –BY groups –RUN group processing (DATASETS, CHART, PLOT, GLM REG, DATASETS