Effecting Efficiency Effortlessly Daniel Carden, Quanticate.

CONTENTS:
SAS VIEWS
WHERE STATEMENTS
EFFICIENT CODE STRUCTURING
SKIP MACRO
FORMAT LIBRARIES

Efficiency Metrics
CPU time = the time the Central Processing Unit spends performing the operations you assign.
I/O time = the time the computer spends on two tasks, input and output. Input refers to moving the data from storage areas such as disks or tapes into memory. Output refers to moving the results out of memory to storage or to a display device.
Real time = clock time.
Memory = the size of the work area that the CPU must devote to the operations in the program.
Another important resource is data storage: how much space is used on disk or tape.
A gain in efficiency is not usually absolute: improving one resource often increases the use of another. A few programming techniques do improve performance in all areas.
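These metrics can be read from the SAS log. As a minimal sketch, the FULLSTIMER system option asks SAS to report fuller step statistics (real time, CPU time and memory) after each step. The dataset work.demo and its variables are assumptions made up for this illustration.

```sas
/* Assumption: work.demo is a throwaway dataset built only for this sketch. */
options fullstimer;     /* log real time, CPU time and memory for each step */

data work.demo;
  do id = 1 to 100000;
    value = ranuni(1);  /* pseudo-random number from a fixed seed */
    output;
  end;
run;

proc sort data=work.demo out=work.demo_sorted;
  by value;
run;

options nofullstimer;   /* switch the fuller statistics off again */
```

Comparing the log notes for the two steps shows where real time and CPU time diverge, which is usually a sign that I/O dominates.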

SAS VIEWS
Three types of SAS data view:
DATA step views are a type of DATA step program.
PROC SQL views are stored query expressions that read data values from their underlying files, which can include SAS data files, SAS/ACCESS views, DATA step views, other PROC SQL views, or relational database data.
SAS/ACCESS views (also called view descriptors) describe data that is stored in DBMS (Database Management System) tables.
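A PROC SQL view could look like the sketch below. The dataset work.labsdata, its values and the view name work.labs_v are hypothetical and exist only to make the example self-contained.

```sas
/* Hypothetical source data standing in for a real lab dataset. */
data work.labsdata;
  input subjid $ sex $ lowrang hirang;
  datalines;
001 M 3.5 5.0
002 F 3.2 4.8
;
run;

proc sql;
  /* Only the query is stored here; no rows are copied. */
  create view work.labs_v as
    select subjid,
           sex as gender label='Gender Type',
           (lowrang + hirang) / 2 as mid
    from work.labsdata;
quit;

/* The stored query runs at this point, when the view is read. */
proc print data=work.labs_v;
run;
```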

SAS datasets: SAS views vs. SAS data files
Descriptor portion: the name and properties of the data set, e.g. when it was created and the number of observations and variables.
Data portion: contains the data values.
A SAS data file stores descriptor information and data values together.
A SAS data view defines a virtual data set. It has the information required to access data values and is stored separately from the data values.
[Diagram: a SAS data file holds a descriptor portion (name and properties of dataset) and a data portion; a SAS data view holds a descriptor portion that references the data values.]

SAS data views syntax:

    data labs / view = labs;
      set labsdata;
      gender = sex;
      label gender = 'Gender Type';
      mid = (lowrang + hirang)/2;
    run;

    data labs2;
      set labs;
    run;

SAS views and resources
SAS views cut I/O time and hence real time.
They have a negligible effect on CPU time, or increase it slightly.
They are best used when real execution times greatly exceed CPU times.
If a large dataset is used as an intermediate dataset more than once, then use a SAS view in the code.
*Drawbacks of SAS views: fewer errors in the log, and cannot overwrite.

Method 1:

    data labs;
      set labsdata;
      gender = sex;
      label gender = 'Gender Type';
      mid = (lowrang + hirang)/2;
    run;

    NOTE: DATA statement used:
          real time seconds
          cpu time 0.76 seconds

    data labs2;
      set labs;
    run;

    NOTE: DATA statement used:
          real time seconds
          cpu time 0.93 seconds

Total = 17.39s + ... = 46.14s

Method 2:

    data labs / view = labs;
      set labsdata;
      gender = sex;
      label gender = 'Gender Type';
      mid = (lowrang + hirang)/2;
    run;

    NOTE: DATA STEP view saved on file WORK.LABS.
    NOTE: A stored DATA STEP view cannot run under a different operating system.
    NOTE: DATA statement used:
          real time 0.01 seconds
          cpu time 0.01 seconds

    data labs2;
      set labs;
    run;

    NOTE: View WORK.LABS.VIEW used:
          real time seconds
          cpu time 0.59 seconds
    NOTE: DATA statement used:
          real time seconds
          cpu time 1.10 seconds

Total = 0.01s + ... = 21.66s

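WHERE STATEMENTS
[Diagram: data flows from the Input Data Set through the Input Buffer to the variables (input data set variables, automatic variables, new variables), then through the Output Buffer to the Output Data Set. The WHERE condition is applied before an observation reaches the variables; the IF condition is applied afterwards, before output.]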
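The difference between the two conditions can be sketched as follows. The dataset work.labsdata and its values are hypothetical. Both steps produce the same rows, but the WHERE version filters observations before they are loaded, while the subsetting IF reads every observation first.

```sas
/* Hypothetical input data for the comparison. */
data work.labsdata;
  input subjid $ sex $ lowrang hirang;
  datalines;
001 M 3.5 5.0
002 F 3.2 4.8
003 F 3.0 4.4
;
run;

/* WHERE: non-matching observations are never loaded into the step. */
data work.females_where;
  set work.labsdata;
  where sex = 'F';
  mid = (lowrang + hirang) / 2;
run;

/* Subsetting IF: every observation is read, then non-matching ones are dropped. */
data work.females_if;
  set work.labsdata;
  if sex = 'F';
  mid = (lowrang + hirang) / 2;
run;
```

One design point to keep in mind: WHERE can only test variables that already exist in the input dataset, so a condition on a newly created variable such as mid still needs IF.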

EFFICIENT CODE STRUCTURING

Invoke macros only when needed: Sort first, then invoke macro!!
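One reading of this advice is to keep the sort outside a macro that is called several times, so the data is sorted once rather than on every call. The sketch below assumes that reading; the macro %summarise, the dataset work.labs and its values are all hypothetical.

```sas
/* Hypothetical lab data, deliberately unsorted. */
data work.labs;
  input subjid $ test $ result;
  datalines;
002 ALT 31
001 ALT 28
001 AST 22
002 AST 25
;
run;

/* Sort once, before any macro call. */
proc sort data=work.labs out=work.labs_sorted;
  by subjid;
run;

/* Hypothetical macro: it assumes its input is already sorted by subjid. */
%macro summarise(test=);
  proc means data=work.labs_sorted(where=(test = "&test")) noprint;
    by subjid;
    var result;
    output out=work.mean_&test mean=mean_result;
  run;
%mend summarise;

%summarise(test=ALT)
%summarise(test=AST)
```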

SKIP MACRO
Commenting out code by /* */:
Advantages = Quick & ideal for making small comments
Disadvantages =
Can cause errors if left accidentally in code
Can unintentionally comment out items if not closed
Will still show commented-out code in the log
Needs to be repeated if the code is already commented…

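Skipping code with SKIP MACRO:
[Example figure: the same block of code skipped two ways. With /* */, 5 /* */ pairs are required. The more comments, the more /* */s!! With the skip macro, only the two marked statements (1 and 2) are needed. EASY!]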

SKIP MACRO Syntax

    %macro skip;
    %mend skip;

NB: Don't leave an unclosed %macro; SAS will treat everything submitted afterwards as macro code. Always close with %mend.
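A minimal sketch of the technique: the code to skip sits between %macro skip; and %mend skip;, and because the macro is only defined and never called, that code does not run. The dataset names are hypothetical.

```sas
%macro skip;

/* This existing comment does not need to be touched. */
data work.not_run;
  x = 0;   /* neither does this one */
run;

%mend skip;

/* %skip is defined above but never invoked, so work.not_run is not created. */
data work.runs;
  x = 1;
run;
```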

FORMAT LIBRARIES
It is efficient to restrict the amount of data being read in by SAS.
- A SAS index is similar to a search function, allowing access to a subset of records from a large data set.
- Format libraries offer another way to subset the data.
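For the index route, a simple index can be created with PROC DATASETS, as in the sketch below. The dataset work.labs and its values are hypothetical, and whether SAS actually uses the index for a given WHERE clause is decided by SAS at run time.

```sas
/* Hypothetical lab data. */
data work.labs;
  input subjid $ test $ result;
  datalines;
001 ALT 28
002 ALT 31
003 ALT 40
;
run;

/* Create a simple index on the key variable. */
proc datasets library=work nolist;
  modify labs;
  index create subjid;
quit;

options msglevel=i;   /* ask the log to note when an index is used */

/* A WHERE clause on the indexed variable lets SAS consider the index. */
data work.subj001;
  set work.labs;
  where subjid = '001';
run;
```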

Scenario:
Situation:
D1: Height, weight, ethnicity for Patient 1 and Patient 2.
D2: Lab test #1 results for Patient 1, Patient 2, Patient 3, Patient 4.
D3: Lab test #2 results for Patient 1, Patient 2, Patient 3, Patient 4.
D4: Lab test #3 results for Patient 1, Patient 2, Patient 3, Patient 4.
Objective:
Height, weight, ethnicity for Patient 1 and Patient 2.
Lab test #1, #2, #3 results for Patient 1 and Patient 2.

Create a Format Library:

    data D1;
      set rawdata.D1;
      start = subjid;
      fmtname = '$Fsubj';
      label = 'Y';
      type = 'C';
    run;

    proc format cntlin = D1;
    run;

PROC FORMAT is used with the CNTLIN option to turn the dataset into a format in a Format Library. The dataset needs the following variables:
*START: The value to format into a label (the KEY).
FMTNAME: The name of the format being created, which can be anything except the name of a format which is already defined. When the KEY is character, FMTNAME must start with a $, just like any PROC FORMAT value.
TYPE: Either character (C) or numeric (N) format.
LABEL: The label given to the KEY variable. This can be anything, but must not be the first byte in the KEY.
*NB: There must not be any duplicates of the variable used as the KEY variable.
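If the source data might hold repeated keys, one possible way to meet the no-duplicates requirement is to remove them with PROC SORT and the NODUPKEY option before building the CNTLIN dataset. This is a sketch under that assumption; work.d1_raw, work.d1_keys, work.fmt_in and the data values are hypothetical.

```sas
/* Hypothetical raw data with a repeated key (subjid 001). */
data work.d1_raw;
  input subjid $ height weight;
  datalines;
001 170 65
001 170 65
002 182 80
;
run;

/* Keep one row per key so that START has no duplicates. */
proc sort data=work.d1_raw out=work.d1_keys nodupkey;
  by subjid;
run;

/* Build the control dataset with the variables PROC FORMAT expects. */
data work.fmt_in;
  set work.d1_keys(keep=subjid);
  start   = subjid;
  fmtname = '$Fsubj';
  label   = 'Y';
  type    = 'C';
run;

proc format cntlin=work.fmt_in;
run;
```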

Standard method (red code on the slide):

    data D234;
      set D2 D3 D4;
      by subjid;
    run;

    data combine;
      merge D1 (in = a) D234 (in = b);
      by subjid;
      if a and b;
    run;

CPU time: 12.25s. Real time: 5m53s

Format library method (blue code on the slide):

    data D1;
      set rawdata.D1;
      start = subjid;
      fmtname = '$Fsubj';
      label = 'Y';
      type = 'C';
    run;

    proc format cntlin = D1;
    run;

    data D234;
      set D2 D3 D4;
      by subjid;
      if put(subjid, $Fsubj.) = 'Y';
    run;

    data combine;
      merge D1 (in = a) D234 (in = b);
      by subjid;
      if a and b;
    run;

CPU time: 11.24s. Real time: 2m37s

Effecting Efficiency Effortlessly Thanks for listening!