Published by Yazmin Beldin. Modified over 10 years ago.
1
Developing Client/Server Applications to Maximize SAS® 9 Parallel Capabilities
Cheryl Doninger, SAS Institute
Copyright © 2003, SAS Institute Inc. All rights reserved.
2
The SAS Intelligence Value Chain
Usability
Interoperability
Manageability
Scalability
3
Scalability – SAS 9
SAS Scalable Architecture
[Diagram labels: Scalable SAS/ACCESS (Oracle, DB2, Sybase, Teradata); Scalable Performance Data Server; Scalable Servers (Stored Process, OLAP, Metadata); SAS/CONNECT sessions on CPU 1, CPU 2, and a remote host; Threaded Procedures (Thread 1, Thread 2, … Thread N); Piping; Clients]
6
Scalability with SAS 9
parallel threads
parallel processes
7
Software Scalability
software must be designed to leverage scalable hardware
  − enter SAS 9
threads
  − multiple threads within a process or SAS session that execute in parallel
  − OS schedules threads to CPUs
processes
  − multiple SAS sessions that execute in parallel
  − OS schedules SAS sessions to CPUs
8
SAS Process
made up of many pieces
  − execution units
  − data structures
  − resources, etc.
corresponds to an OS process
  − private address space
  − scheduled by the OS
  − resources managed by the OS
9
Multiple SAS Processes
make use of multiple processors on SMP machines
make use of multiple remote processors across a network
OS schedules processes to specific processors
10
Threads
each process consists of one or more threads scheduled by the OS
all threads in a process share an address space and must share the resources of the process
11
Multiple Threads
make use of multiple processors on an SMP machine
limited to one physical box; cannot leverage remote resources
OS schedules threads to specific processors
12
Single Threaded V8 SAS
13
Multiple Processes
14
SAS 9 Multiple Threads
15
Multiple Processes and Multiple Threads
16
MP CONNECT
Multi-Process CONNECT
multiple SAS sessions execute in parallel
  − independent parallelism
  − pipeline parallelism
control and synchronization from parent or client SAS session
OS controls scheduling of SAS processes to CPUs
17
A Very Satisfied MP CONNECT Customer…
"I've been dreaming of this capability within SAS for approximately 12 years. The first day back in the office after the course, within 30 minutes I was able to apply the technique to an existing program and reduce processing time by over 50%."
David Walker, Centers for Disease Control and Prevention
18
Independent Parallelism
[Figure labels: Data Source A; Data Source B; Proc Sort; elapsed-time axis starting at 0]
19
MP CONNECT – Independent – Scale Up
[Figure labels: Execute Simultaneously; SMP Server; Read and Summarize SAS Data (twice); Extract Oracle Data; PROC STEP; DATA STEP]
20
MP CONNECT – Independent – Scale Out
[Figure labels: Parent SAS Session; SAS Session 2; SAS Session n]
21
Serial SAS Job
libname annual 'path to annual sales lib';
proc sort data=annual.sales out=annual.ssales;
  by region;
run;
libname month 'path to monthly sales lib';
data month.march;
  do i = 1 to 52;   /* create data set */
    output;
  end;
run;
/* continue execution */
22
Modified to use MP CONNECT
options sascmd="!sascmd" autosignon=yes;
rsubmit task1 wait=no;
  libname annual 'path to annual sales lib';
  proc sort data=annual.sales out=annual.ssales;
    by region;
  run;
endrsubmit;
rsubmit task2 wait=no;
  libname month 'path to monthly sales lib';
  data month.march;
    do i = 1 to 52;   /* create data set */
      output;
    end;
  run;
endrsubmit;
waitfor _all_ task1 task2;
/* continue local execution */
23
The Good News…
everything we have talked about so far is available now with SAS/CONNECT in SAS 8!
24
Piping – Worth the Price of Admission to SAS 9…
"…piping is the big one that has made a difference to our day - jobs have been cut by up to 60% meaning we can deliver in a much quicker time frame at end of month."
Charles Pollack, SUNCORP METWAY
25
Pipeline Parallelism
[Figure labels: Data Step; Proc Sort; elapsed-time axis starting at 0]
26
Piping Engine
new feature of SAS/CONNECT 9
facilitates pipeline parallelism in both SMP environments and across remote machines
overlaps dependent SAS procedure and/or data step execution
  − pipes the output of the first data step/proc as input to the next data step/proc
reduces disk space requirements
  − eliminates intermediate write to disk
27
MP CONNECT – Piping – Scale Up
[Figure labels: Overlapped Execution; SMP Server; Read and Summarize SAS Data; DATA STEP; DATA STEP; PROC STEP]
28
MP CONNECT – Piping – Scale Out
[Figure labels: Parent SAS Session; SAS Session 2; SAS Session n]
29
When to Use MP CONNECT
long running jobs
independent data sources
independent tasks
tasks that can be overlapped
utilize SMP hardware or processors on a network
30
Considerations When Using MP CONNECT
risk of creating an I/O bottleneck if multiple sessions read the same input file
  − create multiple copies on separate mount points
  − create subsets of the file on separate mount points
  − use SPDE engine to facilitate multi-user access to data
31
Considerations When Using MP CONNECT (cont.)
each MP CONNECT process has a unique WORK library, all in the temp file directory by default
use the WORK option on MP CONNECT session invocations
  − direct WORK to separate disks
use SPDE engine to create a partitioned temp lib
  − assign USER= as the libref
  − set the USER= option to the temp libref
  − utility files still go to WORK but one-level data sets go to USER=
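The WORK and USER= suggestions above can be sketched as follows. This is a minimal illustration, not the presenter's code: the session name task1, the libref tmplib, the data set name, and every UNIX-style path are hypothetical, and the SPD Engine options shown (DATAPATH=, TEMP=) should be checked against the documentation for your release.

```sas
/* Sketch only: session name, librefs, and all paths are hypothetical. */

/* Start the session with its own WORK location on a separate disk. */
signon task1 sascmd="!sascmd -work /disk1/saswork";

rsubmit task1 wait=no;
  /* Partitioned temporary library through the SPD Engine. */
  libname tmplib spde '/disk2/spdetemp'
          datapath=('/disk3/spdedata' '/disk4/spdedata')
          temp=yes;

  /* One-level data set names now resolve to TMPLIB instead of WORK. */
  options user=tmplib;

  data scratch;              /* stored as TMPLIB.SCRATCH */
    do i = 1 to 52;
      output;
    end;
  run;
endrsubmit;

waitfor _all_ task1;
signoff task1;
```

Utility files created by procedures in the session would still be written to the WORK location given on the invocation, which is why both settings appear together here.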
32
Considerations for MP CONNECT
I/O bottlenecks
WORK library
CPU bottlenecks
33
Piping Syntax
specify the new piping feature by including the pipe engine "sasesock" in the libname statement, as follows:
  libname libref sasesock "port-specifier";
"port-specifier" will be described in the next few slides
34
Port Specification Syntax
Piping supports two ways to specify a port:
Explicit: a specific port number or service name.
Implicit: a pseudo pipe name; the piping engine selects the first available TCP/IP port.
35
Explicit Port Specification
":explicit port" specifies an explicit port on the machine where the asynchronous RSUBMIT is executing.
  libname foo SASESOCK ":256";
":port service" specifies a defined service name on the machine where the asynchronous RSUBMIT is executing.
  libname foo SASESOCK ":port1";
36
Explicit Port Specification - With Machine Name
"machine-name:port-number" specifies an explicit port number on the machine indicated by "machine-name".
  libname foo SASESOCK "apex.finance.com:256";
"machine-name:port service" specifies the name of the service on the machine indicated by "machine-name".
  libname foo SASESOCK "apex.finance.com:port1";
37
Implicit Port Specification
use this method if you do not want to specify an explicit port or service. Instead, the first available port is dynamically assigned to the alias that you specify.
"alias-for-implicitly-selected-port"
  libname foo SASESOCK "autoPort";
requires use of a metadata repository
38
Sequential SAS Job
libname outLib "c:\writeToDisk";
libname inLib "c:\writeToDisk";   /* redundant */
data outLib.Intermediate;
  do i=1 to 5;
    put 'Writing row ' i;
    output;
  end;
run;
data outLib.Final;
  set inLib.Intermediate;
  do j=1 to 5;
    put 'Adding data ' j;
    n2=j*2;
    output;
  end;
run;
39
Modified to Use Piping
signon p1 sascmd='sas';
rsubmit p1 wait=no;
  libname outLib sasesock ":pipe1";
  data outLib.Intermediate;
    do i=1 to 5;
      put 'Writing row ' i;
      output;
    end;
  run;
endrsubmit;
signon p2 sascmd='sas';
rsubmit p2 wait=no;
  libname inLib sasesock ":pipe1";
  libname outLib "d:\temp";
  data outLib.Final;
    set inLib.Intermediate;
    do j=1 to 5;
      put 'Adding data ' j;
      n2 = j*2;
      output;
    end;
  run;
endrsubmit;
waitfor _all_ p1 p2;
40
Example: Utilizing Piping as an Intermediate Step
/* ----------- Data Step - Process P1 ----- */
signon p1 sascmd='!sascmd';
rsubmit p1 wait=no;
  libname outLib sasesock ":pipe1";
  data outLib.Intermediate1;
    do i=1 to 5;
      put 'Writing row ' i;
      output;
    end;
  run;
endrsubmit;
/* ----------- Data Step - Process P2 ----- */
signon p2 sascmd='!sascmd';
rsubmit p2 wait=no;
  libname inLib sasesock ":pipe1";
  libname outLib sasesock ":pipe2";
  data outLib.Intermediate2;
    set inLib.Intermediate1;
    do j=1 to 5;
      put 'Adding data ' j;
      n2 = j*2;
      output;
    end;
  run;
endrsubmit;
/* ----------- Data Step - Process P3 ----- */
signon p3 sascmd='!sascmd';
rsubmit p3 wait=no;
  libname inLib sasesock ":pipe2";
  libname outLib "d:\temp";
  data outLib.Final;
    set inLib.Intermediate2;
    do k=1 to 5;
      put 'Adding data ' k;
      n3 = k*2;
      output;
    end;
  run;
endrsubmit;
waitfor _all_ p1 p2 p3;
41
Sequential Processing
[Figure labels: Data Step; Sort; output from data step written to disk; output from proc sort written to disk]
Total time to complete: 5374 seconds
42
Pipeline Parallelism Same Machine
[Figure labels: Data Step; Sort subset 1; Sort subset 2; Sort subset 3; Sort subset 4; Merge data; Proc Sort]
Total time: 2063 seconds
62% improvement over sequential test case
43
Hardware Considerations for Pipeline Parallelism
What if your machine has only a single processor and you attempt to run a test such as this? The piping test case will probably take longer than the sequential test case. This is a result of contention for the I/O channel, the CPU, and memory.
If you work in an environment such as this, piping can still work for you. Simply farm the work out to other machines!
44
Pipeline Parallelism Remote Machines
[Figure labels: Data Step; Sort subset 1; Sort subset 2; Sort subset 3; Sort subset 4; Merge data]
Total time: 2102 seconds
61% improvement over sequential test case
45
Performance Results Summary
8-way Windows platform server machine
2-way Windows platform remote client machines
46
Piping Test Case
two raw data files as input (sales and goals)
two data steps, one per file, each reading its file and subsetting it into 4 quarters
four data steps to merge sales and goals for each of the 4 quarters
47
Sequential Processing
Step 1: data step separates raw data file Sales.txt by quarter (Q1, Q2, Q3, Q4)
Step 2: data step separates raw data file Goals.txt by quarter (Q1, Q2, Q3, Q4)
Step 3: merge Q1 Sales and Q1 Goals
Step 4: merge Q2 Sales and Q2 Goals
Step 5: merge Q3 Sales and Q3 Goals
Step 6: merge Q4 Sales and Q4 Goals
Total time: 1274 seconds
48
MP CONNECT - Independent Parallelism
Step 1: data steps separate the raw data files Sales.txt and Goals.txt by quarter (Q1, Q2, Q3, Q4)
Step 2: 4 merges
Total time: 439 seconds
Improvement over sequential: 66%
49
MP CONNECT - Pipeline Parallelism
Step 1: data steps separate the raw data files Sales.txt and Goals.txt by quarter (Q1, Q2, Q3, Q4), overlapped with the 4 merges
Total time: 247 seconds
Improvement over sequential: 81%
Improvement over MP CONNECT (independent parallelism): 44%
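The improvement percentages quoted for this test case follow from the three elapsed times as (baseline − new) / baseline. A small sketch of that arithmetic, using only the times from the slides; the variable names are made up for illustration:

```sas
/* Sketch: recompute the quoted improvements from the slide timings. */
data _null_;
  sequential  = 1274;   /* seconds, sequential processing      */
  independent = 439;    /* seconds, independent parallelism    */
  pipelined   = 247;    /* seconds, pipeline parallelism       */

  pct_indep_vs_seq  = 100 * (sequential  - independent) / sequential;
  pct_pipe_vs_seq   = 100 * (sequential  - pipelined)   / sequential;
  pct_pipe_vs_indep = 100 * (independent - pipelined)   / independent;

  /* Write the three percentages to the SAS log with one decimal place. */
  put pct_indep_vs_seq= 5.1 pct_pipe_vs_seq= 5.1 pct_pipe_vs_indep= 5.1;
run;
```

The three values work out to roughly 65.5, 80.6, and 43.7, which round to the 66%, 81%, and 44% shown on the slides.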
50
Performance Results Summary
16-way IA-64 (64-bit) ES7000
51
Considerations
You must have sufficient CPU and I/O processing resources. If these are not available on one machine, the work can be farmed out to additional machines.
Piping is not effective when execution time for procs is short.
Piping introduces a small performance overhead from the signon, a CPU overhead from the additional processes, and a complexity overhead in the SAS job.
52
Restrictions and Requirements
supports single-pass SAS procs and data steps only
  − examples: SORT, SUMMARY, PRINT, DATA step, GANTT
used with dependent SAS tasks, when one SAS proc requires the output of another SAS proc
requires a SAS/CONNECT license
53
Building Blocks - Combining Solutions
combine scale up and scale out
combine multiple threads with multiple processes
combine SPDE engine with other scalable solutions
combine MP CONNECT with other scalable solutions
54
Gartner's Definition of Grid Computing
"a grid is a collection of resources owned by multiple organizations that is coordinated to allow them to solve a common problem"
55
MP CONNECT in Cluster Environment
32-node Linux cluster / MOSIX
1 GHz Intel P3 processors
1 GB RAM per processor
100 Mb backplane
56
MP CONNECT in Cluster Environment
Host i  Host     No.    Work Time/  Estimated Time for  Distribution  Wait Time/
                 Iter   20 Iter     Entire Problem      Efficiency    20 Iter
4       task4    3940   0:04:17     446:05              96%           0:00:07
17      task17   3920   0:04:17     446:03              96%           0:00:09
18      task18   3900   0:04:17     445:26              96%           0:00:10
Total elapsed time: 14:30:03
Cumulative working time: 447:46
Cumulative waiting time: 15:14:54
Scaling efficiency: 96.50%
57
MP CONNECT in Grid Environment
100 heterogeneous nodes
W2K, WXP, variety of UNIX OSs
combination of V8 SAS and SAS 9
58
MP CONNECT in Grid Environment
Host i  Host     No.    Work Time/  Estimated Time for  Distribution  Wait Time/
                 Iter   30 Iter     Entire Problem      Efficiency    30 Iter
7       ld055    570    0:15:18     1060:11             204%          0:00:00
48      in028    1230   0:06:52     476:07              91%           0:00:00
97      hd204    3120   0:02:40     184:42              35%           0:00:01
Total elapsed time: 5:12:19
Cumulative working time: 468:41
Cumulative waiting time: 0:39:42
Scaling efficiency: 90.04%
59
Combining Parallel Processes and Threads
60
SAS 9 Partitioned Data Model
[Figure labels: SAS® 8 (data, index, metadata); SAS 9 SPDE Engine & SPD Server® (data1, data2, data3, data4; bitmap/B-tree hybrid index; metadata)]
61
SAS Partitioned Data
[Figure labels: data1 to data10; work; Index 1; Index 2; meta; RAID groups; 150 MB/s; 1000 MB/s; high I/O bandwidth]
62
MP CONNECT and SPDE Engine
single input, 4.8 GB, 20 million obs
two data steps, two PROC FREQs
4-way UNIX box
six iterations of implementation
63
MP CONNECT and SPDE Engine
[Figure labels: Data Step 1; Data Step 2; Proc Freq 1; Proc Freq 2; partitioned input; partitioned USER=; 4 parallel sessions]
64
MP CONNECT and SPDE Engine
total improvement in elapsed time of 65%
65
MP CONNECT and Threaded SUMMARY
two raw input files (goals and sales)
two data steps to read input and calculate revenue and profit
two PROC SUMMARYs to summarize by region and classify data by employee number
final merge of projected and actual sales by region to determine those employees who met/exceeded goals
66
MP CONNECT and Threaded SUMMARY
two raw input files (~1.5 GB each)
8-way 900 MHz UNIX box
two data steps, two PROC SUMMARYs, and a merge
67
MP CONNECT and Threaded Summary
[Figure labels: Sales.txt; Goals.txt; Data step; Summary; Merge; 1 Step]
68
MP CONNECT and Threaded Summary
total improvement in elapsed time of 70%
69
Considerations for Combining MP CONNECT and Threading
tune threads per session on SMP
  − CPUCOUNT
  − THREADS/NOTHREADS
  − OS processor set command
depends on
  − application,
  − data, and
  − hardware configuration
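As a rough illustration of tuning threads per session, the sketch below starts two MP CONNECT sessions and limits the threaded PROC SUMMARY in each to two CPUs. Everything specific here is an assumption for the example: a 4-way machine, the session names, the library path, and the data set and variable names. The right CPUCOUNT= value depends on the application, data, and hardware, as the slide notes.

```sas
/* Sketch only: assumes a 4-way SMP box; names and paths are hypothetical. */
signon sess1 sascmd="!sascmd";
signon sess2 sascmd="!sascmd";

rsubmit sess1 wait=no;
  options threads cpucount=2;      /* threaded procs plan for 2 CPUs */
  libname perf '/data/perf';
  proc summary data=perf.sales nway;
    class region;
    var revenue;
    output out=perf.sales_sum sum=;
  run;
endrsubmit;

rsubmit sess2 wait=no;
  options threads cpucount=2;      /* the other 2 CPUs for this session */
  libname perf '/data/perf';
  proc summary data=perf.goals nway;
    class region;
    var revenue;
    output out=perf.goals_sum sum=;
  run;
endrsubmit;

waitfor _all_ sess1 sess2;
signoff sess1;
signoff sess2;
```

Setting NOTHREADS in a session instead would turn threading off there, leaving the parallelism to the separate processes alone.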
70
Summary
many ways that SAS 9 is addressing performance through scalability
scalability requires thought and balance between:
  − partitioned data
  − parallel I/O
  − parallel multi-threaded computation
  − data piping
  − parallel SAS sessions
71
Acknowledgements
MP CONNECT
  − Cheryl.Doninger@sas.com
  − Renee.Palmer@sas.com
SPDE Engine
  − Billy.Clifford@sas.com
Scalable SAS Procedures
  − Robert.Ray@sas.com
  − Robert.Cohen@sas.com
72
For More Info…
Scalability and Performance Community
  − http://support.sas.com/rnd/scalability