Doug Haigh, SAS Institute Inc.


Divide and Conquer: Writing Parallel SAS® Code to Speed Up Your SAS Program
Doug Haigh, SAS Institute Inc.
Copyright © 2010, SAS Institute Inc. All rights reserved.

Introduction
Have you ever wanted to
    Text and drive at the same time?
    Watch the big game and read a book at the same time?
    Be on vacation at the beach and get work done at the office?
Humans are not good at doing two things at the same time, but your SAS code can be.

Introduction
Parallel code is code in which two or more streams of execution run at nearly the same time.

Introduction
Parallel SAS code requires SAS/CONNECT
    One CONNECT client to many CONNECT servers
Parallel SAS code using SAS/CONNECT is created by
    SAS Data Integration Studio
    SAS Enterprise Miner
    SAS Enterprise Guide / PROC SCAPROC (SCAPROC = SAS Code Analyzer)
#SASGF15

Background
SIGNON / SIGNOFF
    Establish/terminate a connection to a CONNECT server on the
        Same machine (SASCMD SIGNON)
        Remote machine (Spawner SIGNON)
        SAS Grid machine (grid-enabled SIGNON, SAS Grid Manager)
RSUBMIT / ENDRSUBMIT
    Sends SAS code to the CONNECT server for processing
    May or may not wait for the code to complete

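Simple SIGNON
Define the session in one of three ways:
    OPTIONS SASCMD="!SASCMD";                               /* same machine */
    %let mySess=mySpawnerHost.myDomain.com 1234;            /* remote machine through a spawner */
    %let rc=%sysfunc(grdsvc_enable(mySess,server=SASApp));  /* SAS Grid machine */
Then sign on, submit code, and sign off:
    SIGNON mySess;
    RSUBMIT mySess;
    data _NULL_;rc=sleep(5,1);run;
    ENDRSUBMIT;
    SIGNOFF mySess;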

Simple SIGNON
[Timeline diagram: SIGNON, RSUBMIT, and SIGNOFF shown over time for the CONNECT client and CONNECT server(s)]

Multiple SIGNONs: Synchronous
    SIGNON mySess1;
    SIGNON mySess2;
    RSUBMIT mySess1;
    data _NULL_;rc=sleep(5,1);run;
    ENDRSUBMIT;
    RSUBMIT mySess2;
    …
    ENDRSUBMIT;
    SIGNOFF mySess1;
    SIGNOFF mySess2;
Code runs on two CONNECT servers but is not parallel.
    SIGNON to mySess2 waits for SIGNON to mySess1
    RSUBMIT to mySess1 waits for SIGNON to mySess2
    RSUBMIT to mySess2 waits for RSUBMIT to mySess1

Multiple SIGNONs: Synchronous
[Timeline diagram: SIGNON, RSUBMIT, and SIGNOFF for mySess1 and mySess2 run one after another]
Code runs on two CONNECT servers but is not parallel. Each statement waits for the previous one, including the last step: SIGNOFF to mySess2 waits for SIGNOFF to mySess1.

Multiple SIGNONs: Asynchronous
    SIGNON mySess1 SIGNONWAIT=NO;
    SIGNON mySess2 SIGNONWAIT=NO;
    RSUBMIT mySess1 WAIT=NO;
    data _NULL_;rc=sleep(5,1);run;
    ENDRSUBMIT;
    RSUBMIT mySess2 WAIT=NO;
    …
    ENDRSUBMIT;
    SIGNOFF _ALL_;
Better, but still not ideal.
    Everything blocks when the RSUBMIT to mySess1 is encountered, because RSUBMIT cannot do anything until its SIGNON completes.
    The RSUBMIT to mySess2 cannot start until the SIGNON to mySess1 has completed, even if the SIGNON to mySess2 is already done.

Multiple SIGNONs: Asynchronous
[Timeline diagram: RSUBMIT and SIGNOFF for the asynchronous case]

Multiple SIGNONs: Asynchronous
[Timeline diagram: RSUBMIT and SIGNOFF, worst case]
Worst case: code for mySess2 has to wait for the SIGNON to mySess1 even though mySess2 is ready.

Reusing a Session
    SIGNON mySess1 SWAIT=NO;
    SIGNON mySess2 SWAIT=NO;
    RSUBMIT mySess1 WAIT=NO;
    data _NULL_;rc=sleep(10,1);run;
    ENDRSUBMIT;
    RSUBMIT mySess2 WAIT=NO;
    data _NULL_;rc=sleep(5,1);run;
    ENDRSUBMIT;
    WAITFOR _ALL_ mySess1;
    RSUBMIT mySess1 WAIT=NO;
    …
    ENDRSUBMIT;
    SIGNOFF _ALL_;
Added WAITFOR _ALL_ to wait for all of the listed sessions to finish.
Better parallelism, but it still suffers from the length of time mySess1 takes.
It would be better if the code
    Had used mySess2 instead of mySess1
    Had only waited for mySess2 to finish

Reusing a Session
[Timeline diagram: SIGNON, RSUBMIT, and SIGNOFF when a session is reused]

Reusing a Session
[Timeline diagram: SIGNON, RSUBMIT, and SIGNOFF, worst case]
Worst case: the RSUBMIT is coded to go to mySess1 even though it could have been processed earlier on mySess2.

Reusing an Available Session
    SIGNON mySess1 SWAIT=NO;
    SIGNON mySess2 SWAIT=NO;
    RSUBMIT mySess1 WAIT=NO CMACVAR=myVar1;
    data _NULL_;rc=sleep(10,1);run;
    ENDRSUBMIT;
    RSUBMIT mySess2 WAIT=NO CMACVAR=myVar2;
    data _NULL_;rc=sleep(5,1);run;
    ENDRSUBMIT;
    WAITFOR _ANY_ mySess1 mySess2;
    %determineAvailableSession(2);
    RSUBMIT mySess&openSess WAIT=NO;
    …
    ENDRSUBMIT;
    SIGNOFF _ALL_;
Added
    CMACVAR to name the macro variable that is updated with RSUBMIT progress
    determineAvailableSession to determine which session completed
Macro variable values
    0 = SIGNON or RSUBMIT completed
    1 = SIGNON or RSUBMIT failed
    2 = Already signed on (SIGNON), or RSUBMIT in progress
    3 = SIGNON in progress
Much better parallelism
    WAITFOR _ANY_ waits for the first of the listed sessions to complete
    determineAvailableSession tells which session to use next

Reusing an Available Session
    %macro determineAvailableSession(numSessions);
      %global openSess;
      %do sess=1 %to &numSessions;
        %if (&&myVar&sess eq 0) %then %do;
          %let openSess=&sess;
          %let sess=&numSessions;
        %end;
      %end;
    %mend;
Loops through all session macro variables, looking for one that is complete.

Reusing an Available Session
[Timeline diagram: SIGNON, RSUBMIT, and SIGNOFF]
Better parallelism, but it still suffers from the length of time the SIGNONs take.

Reusing an Available Session
[Timeline diagram: SIGNON, RSUBMIT, and SIGNOFF, worst case]
Worst case: the SIGNON to mySess2 finishes before the SIGNON to mySess1, so the initial code, which is directed at a specific host, has to wait.
It would be better if the code
    Had used mySess2 instead of mySess1
    Had only waited for mySess2 to finish

Reusing the Best Available Session
    SIGNON mySess1 SWAIT=NO CMACVAR=mySignonVar1;
    …
    SIGNON mySessN SWAIT=NO CMACVAR=mySignonVarN;
    %waitForAvailableSession(N);
    RSUBMIT mySess&openSess WAIT=NO CMACVAR=myVar&openSess;
    data _NULL_;rc=sleep(10,1);run;
    ENDRSUBMIT;
    data _NULL_;rc=sleep(1,1);run;
    SIGNOFF _ALL_;
Best use of parallelism
    Code uses the first available session in all cases
    PROC SCAPROC generates code like this
Challenges
    How to do one-time initialization

Reusing the Best Available Session
    %macro waitForAvailableSession(numSessions);
      %global openSess;
      %let sessFound=0;
      %do %while (&sessFound eq 0);
        %do sess=1 %to &numSessions;
          %if (&&mySignonVar&sess eq 0) %then
            %if (&&myVar&sess eq 0) %then %do;
              %let openSess=&sess;
              %let sess=&numSessions;
              %let sessFound=1;
            %end;
        %end;
        %if (&sessFound eq 0) %then %let rc=%sysfunc(sleep(1,1));
      %end;
    %mend;
You need to initialize the macro variables before using this macro:
    mySignonVarX to 3
    myVarX to 0
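The slide asks for the status macro variables to be initialized but does not show how. A minimal sketch, assuming the naming used above (mySignonVar1..N and myVar1..N); the macro name initSessionVars and the session count of 4 are hypothetical and not part of the presentation:

```sas
/* Hypothetical helper (not from the presentation): create the global */
/* status variables that %waitForAvailableSession reads.              */
%macro initSessionVars(numSessions);
  %local i;
  %do i=1 %to &numSessions;
    %global mySignonVar&i myVar&i;
    %let mySignonVar&i=3;  /* 3 = SIGNON in progress */
    %let myVar&i=0;        /* 0 = no RSUBMIT running, session is free */
  %end;
%mend initSessionVars;

/* Call before the SIGNON statements; 4 is an arbitrary example count. */
%initSessionVars(4);
%put NOTE: mySignonVar1=&mySignonVar1 myVar1=&myVar1;
```

Starting myVarX at 0 presumably lets a session that has signed on but has not yet received any work count as available the first time %waitForAvailableSession runs.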

Reusing the Best Available Session
[Timeline diagram: SIGNON, RSUBMIT, and SIGNOFF]
Best parallelism

How about a macro to do all of this…
    Perform SIGNONs as needed (SASCMD or Grid)
    Retry SIGNONs if one fails
    Manage RSUBMITs to available hosts
    Retry RSUBMITs if one fails
    Display progress of RSUBMITs
    SIGNOFF hosts when no more work exists
    Email user when done

%Distribute
    Determine code that needs to be executed once when SIGNON completes
        LIBNAME, FILENAME
    Create code that can run at each iteration
        Base iteration differences on the macro variables provided
    Set up %Distribute parameters
    Run

    ** Predefined
    ** global macro   Contents
    ** -------------- ----------------------------------------
    ** Rem_Host       Name of this host
    ** Rem_iHost      Index of this host out of all hosts
    ** Rem_Seed       Random seed, rerandomized for each chunk
    ** Rem_NIterAll   Total number of iterations
    ** Rem_NIter      Number of Iterations in a chunk
    ** Rem_JobIters   Number of Iterations in this chunk *
    ** Rem_JobID      Chunk number *
    ** GlobalNSub     Number of iterations already submitted *
    **
    ** * = Available only in TaskRSub

%Distribute
    Signing on...
    GridDistribute: Maximum number of nodes is 4
    Processing...
    GridDistribute: Signing on to Host #1
    GridDistribute: Signing on to Host #2
    GridDistribute: Signing on to Host #3
    GridDistribute: Signing on to Host #4
    Stat: [0:00:00] ???? (0/0) 100000
    GridDistribute: Host #1 is host2.mydomain.com
    GridDistribute: Host #2 is host4.mydomain.com
    GridDistribute: Host #3 is host1.mydomain.com
    GridDistribute: Host #4 is host3.mydomain.com
    Stat: [0:00:02] !!!. (0/0) 100000
    Stat: [0:00:02] .... (8000/0) 100000
    Stat: [0:00:05] !!!! (8000/8000) 100000
    <similar lines deleted>
    Stat: [0:00:14] ...! (100000/94000) 100000

Summary
    Writing parallel SAS code can significantly speed up processing
    Some SAS products will do it for you
    See the paper for discussion of additional considerations
        Information movement
        Data movement
        Output management
        RSUBMIT and the SAS Macro Facility
SCAPROC = SAS Code Analyzer

Questions?

Session ID #1935

Additional Considerations
Information movement
    %SYSLPUT / %SYSRPUT for macro variables
        %SYSLPUT remVar=&localVar /REMOTE=mySess1;
        RSUBMIT mySess1;
        …
        %SYSRPUT localVar=&remVar;
        ENDRSUBMIT;

Additional Considerations
Data Movement
    Shared file system / RDBMS
    PROC UPLOAD/DOWNLOAD
    RLS
Output Movement
    Log and List files
        LOG=, LIST=
        PROC PRINTTO
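A minimal sketch of the data and output movement options listed above, assuming a session named mySess1 is already signed on and a client data set work.input exists; the data set names, the libref rwork, and the log path are placeholders, not from the presentation:

```sas
/* Assumes SIGNON mySess1 has already completed. */
RSUBMIT mySess1;
  /* Output movement: route the server-side log to a file (placeholder path) */
  proc printto log="/tmp/mySess1.log" new; run;

  /* Data movement: copy a client data set into the server's WORK library */
  proc upload data=work.input out=work.input; run;

  data work.result;
    set work.input;
  run;

  /* Copy the result back to the client's WORK library */
  proc download data=work.result out=work.result; run;

  /* Restore the default log destination */
  proc printto; run;
ENDRSUBMIT;

/* RLS (Remote Library Services): read the server's WORK library in place */
libname rwork slibref=work server=mySess1;
```

PROC UPLOAD and PROC DOWNLOAD copy the data, while RLS leaves it on the server and reads it over the connection. A shared file system or RDBMS avoids both by letting every session read the same source.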

Additional Considerations
RSUBMIT and the SAS Macro Facility
A macro statement inside an RSUBMIT block, such as
    RSUBMIT mySess1;
    %SYSRPUT localVar=&remVar;
    ENDRSUBMIT;
needs to be quoted
    %NRSTR(%%)SYSRPUT localVar=&remVar;
or wrapped in a macro
    %MACRO updateVar;
    %SYSRPUT localVar=&remVar;
    %MEND;
    %updateVar;