Many to many, too many performance tests
Christoph Baumer, Biometrical Practice BIOP, Basel, Switzerland
PhUSE 2011, CS03


Slide 2 (footer date 1/10/2010)

"In software engineering, performance testing is testing that is performed to determine how fast some aspect of a system performs under a particular workload. It can also serve to validate and verify other quality attributes of the system, such as scalability, reliability and resource usage." (Wikipedia)

Slide 4: What is needed for performance testing in a clinical environment?
- Transparent approach
- Modular and flexible setting of testing environments
- Simulation of real-world scenarios

Slide 5: Available options
1. Running a single program in a batch job.
2. Running a program multiple times.
3. Running programs in different orders.

Slide 6: Many to many example: merging WHO ATC text with drug name

WHO.THG
    medprod   ATCcode   official
    5         C02AB     Y
    47        R03CB     Y
    373       G03AA     N
              J07BC     N
              D06BB     Y
              D06BB     Y
              S01GX     Y
              S01GX     Y

WHO.MP
    Medprod   Drugname
              ALOMIDE
              BONDIL
              BONDIL
              EVISTA
              EVISTA
              HUMALOG
              IMDUR
              IMDUR

WHO.ATC
    ATCcode   ATCtxt
    S01FB     Sympathomimetics excl. antiglaucoma preparations
    S01G      DECONGESTANTS AND ANTIALLERGICS
    S01GA     Sympathomimetics used as decongestants
    S01GX     Other antiallergics
    S01H      LOCAL ANESTHETICS
    S01HA     Local anesthetics
    S01J      DIAGNOSTIC AGENTS
    S01JA     Colouring agents

(Blank medprod cells: the values are missing from this transcript.)

Slide 7: Many to many programs
- Program 1 & 2: SQL join
- Program 3 & 4: Point option
- Program 5: Formats
- Program 6: Sorting and merging

Slide 8: Program 1

    %let description=sql inner join;

    proc sql;
        create table atc1 as
        select atc.atctxt, mp.drugname
        from thg
        inner join atc on thg.atccode eq atc.atccode
        inner join mp on thg.medprod eq mp.medprod;
    quit;

Slide 9: Program 2

    %let description=sql join with where clause;

    proc sql;
        create table atc2 as
        select atc.atctxt, mp.drugname
        from thg, atc, mp
        where thg.atccode eq atc.atccode
          and thg.medprod eq mp.medprod;
    quit;

Slide 10: Program 3

    %let description=point loops with mp inside;

    data atc3;
        set thg;
        /* outer loop: scan every ATC row for a matching code */
        do k = 1 to nobs_atc;
            set atc(rename = (atccode = _atccode_)) nobs=nobs_atc point=k;
            if atccode eq _atccode_ then do;
                /* inner loop: scan every MP row for a matching product */
                do l = 1 to nobs_mp;
                    set mp(rename = (medprod = _medprod_)) nobs=nobs_mp point=l;
                    if medprod = _medprod_ then output;
                end;
            end;
        end;
        keep atctxt drugname;
    run;

Slide 11: Program 4

    %let description=point loops with atc inside;

    data atc4;
        set thg;
        /* outer loop: scan every MP row for a matching product */
        do k = 1 to nobs_mp;
            set mp(rename = (medprod = _medprod_)) nobs=nobs_mp point=k;
            if medprod = _medprod_ then do;
                /* inner loop: scan every ATC row for a matching code */
                do l = 1 to nobs_atc;
                    set atc(rename = (atccode = _atccode_)) nobs=nobs_atc point=l;
                    if atccode eq _atccode_ then output;
                end;
            end;
        end;
        keep atctxt drugname;
    run;

Slide 12: Program 5

    %let description=Using formats;

    /* build control datasets for a numeric format (mp.) and a character format ($atc.) */
    proc sql;
        create table fmt_mp as
        select medprod as start, drugname as label, 'mp' as fmtname, 'n' as type
        from mp;
        create table fmt_atc as
        select atccode as start, atctxt as label, 'atc' as fmtname, 'c' as type
        from atc;
    quit;

    proc format cntlin=fmt_mp; run;
    proc format cntlin=fmt_atc; run;

    data atc5;
        set thg;
        atctxt = put(atccode, $atc.);
        drugname = put(medprod, mp.);
        keep atctxt drugname;
    run;

Slide 13: Program 6

    %let description=sorting and merging;

    proc sort data=atc; by atccode; run;
    proc sort data=thg; by atccode; run;

    data atc0;
        merge atc thg;
        by atccode;
        keep atctxt medprod;
    run;

    proc sort data=mp; by medprod; run;
    proc sort data=atc0; by medprod; run;

    data atc0;
        merge atc0 mp;
        by medprod;
        keep atctxt drugname;
    run;

Slide 14: Base conditions

Base 1:

    libname who "D:\many_to_many\lib";

    data mp; set who.mp; run;
    data thg; set who.thg; run;
    data atc; set who.atc; run;

Base 2:

    libname who "D:\many_to_many\lib";

    data thg;
        set who.thg;
        where official eq 'N';
    run;

    proc sql;
        create table mp as
        select * from who.mp
        where medprod in (select medprod from thg);
        create table atc as
        select * from who.atc
        where atccode in (select atccode from thg);
    quit;

Base 3:

    libname who "D:\many_to_many\lib";

    data thg;
        set who.thg;
        if _n_ le 2000;
    run;

    proc sql;
        create table mp as
        select * from who.mp
        where medprod in (select medprod from thg);
        create table atc as
        select * from who.atc
        where atccode in (select atccode from thg);
    quit;

    Base condition   Number of observations in result dataset
    BASE1
    BASE2
    BASE3            2000

(The counts for BASE1 and BASE2 are missing from this transcript.)

Slide 15

Files in the main folder: PROG1, PROG2, PROG3, …, PROGx and BASE1, BASE2, …, BASEx

1. The main folder is read and the program names are stored.
2. Programs with all combinations are created:
    %inc(BASE1) %inc(PROG1)
    %inc(BASE1) %inc(PROG2)
    %inc(BASE2) %inc(PROG1)
    %inc(BASE2) %inc(PROG2)
    %inc(BASE1) %inc(PROG3)
    %inc(BASE2) %inc(PROG3)
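The pairing of every base condition with every program could be sketched roughly as follows. This is only an illustration, not the author's implementation: the dataset names (bases, progs, combos), the output folder D:\many_to_many\jobs, and the job<n>.sas naming scheme are assumptions, and the program names are hard-coded instead of being read from the main folder.

```sas
/* Sketch only: the folder D:\many_to_many\jobs, the dataset names, and the
   job<n>.sas naming scheme are assumptions, not taken from the slides. */

/* Stored program names (hard-coded here instead of reading the folder). */
data bases;
    length basename $32;
    input basename $;
    datalines;
base1.sas
base2.sas
;
run;

data progs;
    length progname $32;
    input progname $;
    datalines;
prog1.sas
prog2.sas
prog3.sas
;
run;

/* A Cartesian join pairs every base condition with every program. */
proc sql;
    create table combos as
    select basename, progname
    from bases, progs
    order by basename, progname;
quit;

/* Write one small job file per combination. */
data _null_;
    set combos;
    length jobfile $200;
    jobfile = cats('D:\many_to_many\jobs\job', _n_, '.sas');
    file jobs filevar=jobfile;
    put '%include "D:\many_to_many\' basename +(-1) '";';
    put '%include "D:\many_to_many\' progname +(-1) '";';
run;
```

Each generated file then holds two %include statements, one for the base condition and one for the program, so it can be submitted as a separate batch job.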

Slide 16: Running a single program in a batch job

    %inc(BASE1)
    %inc(PROG1.SAS)
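A single timed run of one program on one base condition might look like the sketch below. The slides do not show how the duration is measured, so the approach (reading the clock before and after the %include), the file locations, and the macro variable names start and duration are all assumptions.

```sas
/* Sketch only: the file locations and the macro variable names start and
   duration are assumptions. */
%include "D:\many_to_many\base1.sas";            /* set up the input data */

%let start = %sysfunc(datetime(), 20.3);         /* clock before the program */
%include "D:\many_to_many\prog1.sas";            /* program under test */
%let duration = %sysevalf(%sysfunc(datetime(), 20.3) - &start);

%put NOTE: prog1.sas on base1.sas took &duration seconds;
```

This only measures correctly if the included program ends with an explicit RUN or QUIT statement, as Programs 1 to 6 do. Otherwise its last step would not execute until a later step boundary, after the clock has already been read.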

Slide 17

Running a program multiple times:

    %inc(BASE1)
    %inc(PROG1.SAS)
    %inc(PROG1.SAS)

Running programs in different orders:

    %inc(BASE1)
    %inc(PROG1.SAS)
    %inc(PROG2.SAS)

Slide 18: Results dataset

    description                  duration   basename    progname    repeat   order
    sql with inner join          9.089      base2.sas   prog1.sas   1        prog1.sas, prog2.sas
    sql join with where clause   7.418      base2.sas   prog2.sas   2        prog1.sas, prog2.sas
    sql with inner join                     base2.sas   prog1.sas   1        prog1.sas, prog5.sas
    Using formats                14.117     base2.sas   prog5.sas   2        prog1.sas, prog5.sas
    sql with inner join                     base2.sas   prog1.sas   1        prog1.sas, prog6.sas
    sorting and merging                     base2.sas   prog6.sas   2        prog1.sas, prog6.sas
    sql join with where clause              base2.sas   prog2.sas   1        prog2.sas, prog1.sas
    sql with inner join          8.73       base2.sas   prog1.sas   2        prog2.sas, prog1.sas
    sql join with where clause              base2.sas   prog2.sas   1        prog2.sas, prog5.sas
    Using formats                17.022     base2.sas   prog5.sas   2        prog2.sas, prog5.sas

(Blank duration cells: the values are missing from this transcript.)
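Mean durations per program could be derived from a results dataset of this shape with a short summary step. The sketch below assumes the dataset is called results (the slides do not name it) and that its variables match the column headers above.

```sas
/* Sketch only: the dataset name results is an assumption; the variable
   names follow the column headers of the results dataset. */
proc means data=results n mean min max maxdec=3;
    class basename progname;
    var duration;
run;
```

Reporting the minimum and maximum next to the mean gives a rough sense of how much the timings vary between runs.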

Slide 19: Program options
- &repeats: Specifies the number of repeats of a single program within a single run. This adds the program multiple times to a given file / batch job.
- &n_runs: Specifies how many times each batch job is executed.
- &multiple: Specifies whether two programs run within the same batch job.

Slide 20: Compare programs with BASE3 as basis

Used files: prog1 to prog6, base3

    %perf(n_runs=10);

    Description                   Filename    Mean duration (seconds)
    sql inner join                prog1.sas
    sql join with where clause    prog2.sas
    point loops with mp inside    prog3.sas
    point loops with atc inside   prog4.sas
    Using formats                 prog5.sas
    Sorting and merging           prog6.sas   0.0515

(Blank cells: the values are missing from this transcript.)

Slide 21: Compare programs with BASE2, running each program 3 times

Used files: prog1, prog2, prog5, prog6, base2

    %perf(n_runs=5,repeats=3);

Mean duration (seconds):

    Description                  First repeat   Second repeat   Third repeat
    sql with inner join
    sql join with where clause
    Using formats
    Sorting and merging

(The duration values are missing from this transcript.)

Slide 22: Compare programs with BASE1, running each program 3 times

Used files: prog1, prog2, prog5, prog6, base1

    %perf(n_runs=5,repeats=3);

Mean duration (seconds):

    Description                  First repeat   Second repeat   Third repeat
    sql inner join
    sql join with where clause
    Using formats
    Sorting and merging

(The duration values are missing from this transcript.)

Slide 23: Compare programs with BASE2, running all combinations of programs

Used files: prog1, prog2, prog5, prog6, base2

    %perf(n_runs=10,multiple=YES);

Mean duration in seconds. Rows give the previous program (the program run at first place), columns give the program being measured:

    Program run at first place   sql inner join   sql join with where clause   Using formats   Sorting and merging
    sql inner join               NA
    sql join with where clause                    NA
    Using formats                                                              NA
    Sorting and merging                                                                        NA

(Only the NA entries survive in this transcript; the other values are missing.)

Slide 24: Why?
- Prepared for the final run
- Improve your programming
- Gain understanding of SAS

Thank you!

Questions?