For a programming more efficient Claude Guyot PhUSE 2010 – Berlin Paper CS05.



2 Agenda: Reasons for a programming more efficient; Save the time of programming; Save your time and the time of the others; Save the space

3 Reasons for a programming more efficient: Health authority requirements mean working with more and more data: data pooling, a multiplication of new standards, and answers to health authorities on both recent and old studies. Deadlines are often very short.

4 Reasons for a programming more efficient: A full SAS WORK library

5 Reasons for a programming more efficient: A slow network

6 Save the time of programming: Programs must be clear. A program is not just a succession of statements, DATA steps or procedures in a single block. The different steps must be clearly separated: start a new line for each statement, skip lines to set the parts apart, and use indentation to show the beginning and the end of each DATA step, procedure or DO loop.

7 Save the time of programming: Programs must be well documented. Use a header at the beginning of the program to give all the information needed to explain the goal of the program, its prerequisites, the history of all changes and updates, and the people who have worked on it (the author and everyone who has made updates). Insert comments in the program to explain each major step.
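The layout and documentation rules can be illustrated with a minimal sketch. The program name, the header fields and the three steps are assumptions chosen for illustration, not the Sanofi-Aventis template; SASHELP.CLASS is the sample dataset shipped with SAS.

```sas
/*----------------------------------------------------------------
  Program  : demo_layout.sas (hypothetical name)
  Purpose  : List the pupils aged 13 or more in SASHELP.CLASS
  Requires : SASHELP.CLASS (sample data shipped with SAS)
  Author   : <initials>
  History  : <date> <initials> Creation
----------------------------------------------------------------*/

/* Step 1: select the records and the variables needed */
data work.teens;
  set sashelp.class (keep=name sex age);
  where age >= 13;
run;

/* Step 2: sort by sex and age for the listing */
proc sort data=work.teens;
  by sex age;
run;

/* Step 3: print the result */
proc print data=work.teens noobs;
run;
```

One statement per line, a blank line between steps and a comment before each step make the start and end of every block visible at a glance.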

8 Save the time of programming: At Sanofi-Aventis, all these rules are described in the Good Programming Practice (GPP). For some standard macro programs, a document is written for each macro. It gives the same information as the header but in more detail: an explanation of the macro parameters, several concrete examples, and the limits of the macro. In conclusion, think about making other people's work easier.

9 Save your time and the time of the others: Simple rules. Select the variables: in DATA steps and procedures, keep only what is useful for the aim of the program, to reduce the quantity of data to manage. This saves CPU time and work space. Use the KEEP statement to keep only the useful variables, or the DROP statement to drop the unnecessary ones.
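A minimal sketch of variable selection, assuming the sample dataset SASHELP.CLASS; the output dataset names are hypothetical. It also shows the KEEP= dataset option, which the slide does not mention but which limits the variables at read time.

```sas
/* KEEP= on the input dataset: only NAME and AGE are read */
data work.class_keep;
  set sashelp.class (keep=name age);
run;

/* DROP statement: HEIGHT and WEIGHT are read but not written out */
data work.class_drop;
  set sashelp.class;
  drop height weight;
run;

/* KEEP= also works on the input of a procedure */
proc means data=sashelp.class (keep=age) mean;
  var age;
run;
```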

10 Save your time and the time of the others

11 Save your time and the time of the others: Use the WHERE statement instead of the IF statement. In a DATA step, WHERE selects the records as they are read, before they are loaded into the program data vector, and it can use an index if one exists. The subsetting IF loads every record of the dataset and only then selects.
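The two forms of selection look like this in a minimal sketch, again assuming SASHELP.CLASS and hypothetical output names. Both steps produce the same records; only the moment at which the condition is applied differs.

```sas
/* WHERE: the condition is applied before the record is loaded */
data work.girls_where;
  set sashelp.class;
  where sex = 'F';
run;

/* Subsetting IF: every record is loaded, then tested */
data work.girls_if;
  set sashelp.class;
  if sex = 'F';
run;
```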

12 Save your time and the time of the others

13 Save your time and the time of the others: Choose the best way to join two datasets. Example: select the records of one dataset according to a selection made in another dataset. The slide shows the PATIENT dataset, the THER dataset and the resulting PATTHER_joint dataset.

14 Save your time and the time of the others: Merge of both datasets: two PROC SORT steps, then one DATA step to merge.
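A minimal sketch of the sort-and-merge approach. The slide's code is not in the transcript, so the key variable PATID, the variable DRUG, the test values and the output name PATTHER_MERGE are all assumptions.

```sas
/* Hypothetical test data: PATID is an assumed key variable */
data patient;
  input patid;
  datalines;
1
3
;
run;

data ther;
  input patid drug $;
  datalines;
1 A
2 B
3 C
3 D
;
run;

/* Both inputs must be sorted by the key before the merge */
proc sort data=patient;
  by patid;
run;

proc sort data=ther;
  by patid;
run;

/* Keep only the THER records whose patient is in PATIENT */
data patther_merge;
  merge patient (in=inpat) ther (in=inther);
  by patid;
  if inpat and inther;
run;
```

With this test data the result holds the three THER records of patients 1 and 3.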

15 Save your time and the time of the others: Join with SQL. Join with a macro variable.
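Both approaches can be sketched as follows. The key variable PATID, the test data and the names PATTHER_SQL, PATTHER_MVAR and PATLIST are assumptions, since the slide's code is not in the transcript.

```sas
/* Hypothetical test data: PATID is an assumed key variable */
data patient;
  input patid;
  datalines;
1
3
;
run;

data ther;
  input patid drug $;
  datalines;
1 A
2 B
3 C
3 D
;
run;

/* Join with SQL: no preliminary sort is needed */
proc sql;
  create table patther_sql as
  select t.*
  from ther as t
  where t.patid in (select patid from patient);
quit;

/* Join with a macro variable: store the selected keys in a list... */
proc sql noprint;
  select distinct patid into :patlist separated by ','
  from patient;
quit;

/* ...then use the list in a WHERE statement */
data patther_mvar;
  set ther;
  where patid in (&patlist);
run;
```

A macro variable has a maximum length, so the second form only suits selections whose key list stays reasonably short.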

16 Save your time and the time of the others: Join with a format.
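One common way to join with a format is to build the format from the selection dataset with the CNTLIN= option of PROC FORMAT. The sketch below makes that assumption; the key variable PATID, the test data and the format name SELPAT are hypothetical.

```sas
/* Hypothetical test data: PATID is an assumed key variable */
data patient;
  input patid;
  datalines;
1
3
;
run;

data ther;
  input patid drug $;
  datalines;
1 A
2 B
3 C
3 D
;
run;

/* Control dataset: each selected PATID maps to 'Y', anything else to 'N' */
data fmt;
  set patient (rename=(patid=start)) end=last;
  retain fmtname 'selpat' type 'n';
  label = 'Y';
  output;
  if last then do;
    hlo = 'O';      /* 'O' flags the OTHER range */
    label = 'N';
    output;
  end;
run;

proc format cntlin=fmt;
run;

/* Select the THER records through the format, without sort or merge */
data patther_fmt;
  set ther;
  where put(patid, selpat.) = 'Y';
run;
```

The selection dataset must hold each key only once, otherwise PROC FORMAT rejects the overlapping ranges.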

17 Save your time and the time of the others: Join with the hash method.
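A minimal sketch of a hash lookup in a DATA step, assuming SAS 9.1 or later. The key variable PATID, the test data and the output name PATTHER_HASH are hypothetical.

```sas
/* Hypothetical test data: PATID is an assumed key variable */
data patient;
  input patid;
  datalines;
1
3
;
run;

data ther;
  input patid drug $;
  datalines;
1 A
2 B
3 C
3 D
;
run;

data patther_hash;
  /* Load the selected keys into memory once, on the first iteration */
  if _n_ = 1 then do;
    declare hash h (dataset: 'patient');
    h.defineKey('patid');
    h.defineDone();
  end;
  set ther;
  /* CHECK returns 0 when the current PATID is found in the hash table */
  if h.check() = 0;
run;
```

Neither dataset needs to be sorted, but the selection dataset must fit in memory.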

18 Save your time and the time of the others: Join with an index.
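One way to join with an index is a keyed read with the KEY= option of the SET statement. The sketch below assumes that technique, which may differ from the slide's code; the key variable PATID, the test data and the output name PATTHER_INDEX are hypothetical.

```sas
/* Hypothetical test data: PATID is an assumed key variable */
data patient;
  input patid;
  datalines;
1
3
;
run;

data ther;
  input patid drug $;
  datalines;
1 A
2 B
3 C
3 D
;
run;

/* Create a simple index on the key of the large dataset */
proc datasets library=work nolist;
  modify ther;
  index create patid;
quit;

/* For each selected patient, read the matching THER records by index */
data patther_index;
  set patient;
  do until (_iorc_ ne 0);
    set ther key=patid;
    if _iorc_ = 0 then output;   /* 0 means a match was found */
  end;
  _error_ = 0;                   /* a failed lookup sets _ERROR_: reset it */
run;
```

This form assumes that each key appears only once in the selection dataset.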

19 Save your time and the time of the others: Performance of all methods on Windows

20 Save your time and the time of the others: Performance of all methods on Windows

21 Save your time and the time of the others: Performance of all methods on Unix

22 Save your time and the time of the others: Performance of all methods on Unix

23 Save the space: Fit the length of the variables as early as possible (according to the CDISC recommendations): not so small that information is truncated, but not so large that space is taken unnecessarily. The default length of numeric variables is 8 bytes but, depending on the kind of information, it can be less.
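A minimal sketch of a reduced numeric length, assuming SASHELP.CLASS and a hypothetical output name. On Windows and Unix a length of 3 bytes stores integers exactly up to 8,192, which is ample for an age in years; reduced lengths are not safe for non-integer values.

```sas
/* LENGTH placed before SET so that it defines the stored length of AGE */
data work.class_short;
  length age 3;
  set sashelp.class;
run;

/* Check the resulting variable lengths */
proc contents data=work.class_short;
run;
```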

24 Save the space: Windows and Unix

25 Save the space: Compress the datasets. Compression is applied to the character variables, so the gain in space is large when there are many character variables. The effect is reversed when there are too few character variables, because a flag is created on each record. The dataset is decompressed each time it is used, and decompression takes time. Weigh the gain in space against the time needed for decompression.
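A minimal sketch of dataset compression with the COMPRESS= dataset option. The dataset, the variable names and the values are hypothetical, built so that a long and mostly empty character variable gives compression something to gain.

```sas
/* 10,000 records with a 200-byte character variable that is mostly blank */
data work.notes_comp (compress=yes);
  length comment $200;
  do i = 1 to 10000;
    comment = 'short text';
    output;
  end;
run;
```

After the step, the SAS log reports by what percentage compression changed the size of the dataset, which helps to judge whether the gain is worth the decompression time.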

26 Save the space

27 Save the space

28 Conclusion: Many tricks exist to optimize programs. But while each trick improves one aspect (CPU time or space), it can have a negative effect on the other. An assessment must be made before choosing, in line with the priorities of the business and the technology already in place.

29 Thank you for your attention! Contact: (33) Questions?