
1 Copyright © 2003, SAS Institute Inc. All rights reserved. Developing Client/Server Applications to Maximize SAS® 9 Parallel Capabilities Cheryl Doninger SAS Institute

2 The SAS Intelligence Value Chain
- Usability
- Interoperability
- Manageability
- Scalability

3 Scalability – SAS 9: SAS Scalable Architecture
[Diagram labels: Scalable SAS/ACCESS (Oracle, DB2, Sybase, Teradata); Scalable Performance Data Server; Scalable Servers (Stored Process, OLAP, Metadata); Threaded Procedures (thread 1, thread 2, … thread N); SAS CONNECT sessions on CPU 1, CPU 2, and a remote host; Clients; Piping]

6 Scalability with SAS 9
- parallel threads
- parallel processes

7 Software Scalability
- software must be designed to leverage scalable hardware
  - enter SAS 9
- threads
  - multiple threads within a process or SAS session that execute in parallel
  - OS schedules threads to CPUs
- processes
  - multiple SAS sessions that execute in parallel
  - OS schedules SAS sessions to CPUs

8 SAS Process
- made up of many pieces
  - execution units
  - data structures
  - resources, etc.
- corresponds to an OS process
  - private address space
  - scheduled by the OS
  - resources managed by the OS

9 Multiple SAS Processes
- make use of multiple processors on SMP machines
- make use of multiple remote processors across a network
- OS schedules processes to specific processors

10 Threads
- each process consists of one or more threads
- scheduled by the OS
- all threads in a process share an address space and must share the resources of the process

11 Multiple Threads
- make use of multiple processors on an SMP machine
- limited to one physical box; cannot leverage remote resources
- OS schedules threads to specific processors

12 Single Threaded V8 SAS

13 Multiple Processes

14 SAS 9 Multiple Threads

15 Multiple Processes and Multiple Threads

16 MP CONNECT
- Multi-Process CONNECT
- multiple SAS sessions execute in parallel
  - independent parallelism
  - pipeline parallelism
- control and synchronization from parent or client SAS session
- OS controls scheduling of SAS processes to CPUs

17 A Very Satisfied MP CONNECT Customer…
"I've been dreaming of this capability within SAS for approximately 12 years. The first day back in the office after the course, within 30 minutes I was able to apply the technique to an existing program and reduce processing time by over 50%."
David Walker, Centers for Disease Control and Prevention

18 Independent Parallelism
[Diagram labels: Data Source A, Data Source B, Proc Sort; horizontal axis: elapsed time, starting at 0]

19 MP CONNECT – Independent – Scale Up
[Diagram: tasks execute simultaneously on an SMP server: Read and Summarize SAS Data (twice), Extract Oracle Data, PROC STEP, DATA STEP]

20 MP CONNECT – Independent – Scale Out
[Diagram labels: Parent SAS Session, SAS Session 2, … SAS Session n]

21 Serial SAS Job
    libname annual 'path to annual sales lib';
    proc sort data=annual.sales out=annual.ssales;
      by region;
    run;
    libname month 'path to monthly sales lib';
    data month.march;
      do i = 1 to 52;   /* create data set */
        output;
      end;
    run;
    /* continue execution */

22 Modified to use MP CONNECT
    options sascmd="!sascmd" autosignon=yes;
    /* task1 runs asynchronously in its own SAS session */
    rsubmit task1 wait=no;
      libname annual 'path to annual sales lib';
      proc sort data=annual.sales out=annual.ssales;
        by region;
      run;
    endrsubmit;
    /* task2 starts immediately, in parallel with task1 */
    rsubmit task2 wait=no;
      libname month 'path to monthly sales lib';
      data month.march;
        do i = 1 to 52;   /* create data set */
          output;
        end;
      run;
    endrsubmit;
    /* block until both tasks have finished */
    waitfor _all_ task1 task2;
    /* continue local execution */

23 The Good News…
- everything we have talked about so far is available now with SAS/CONNECT in SAS 8!

24 Piping – Worth the Price of Admission to SAS 9…
"…piping is the big one that has made a difference to our day - jobs have been cut by up to 60% meaning we can deliver in a much quicker time frame at end of month."
Charles Pollack, SUNCORP METWAY

25 Pipeline Parallelism
[Diagram labels: Data Step, Proc Sort; horizontal axis: elapsed time, starting at 0]

26 Piping Engine
- new feature of SAS/CONNECT 9
- facilitates pipeline parallelism in both SMP environments and across remote machines
- overlaps dependent SAS procedure and/or data step execution
  - pipe the output of the first data step/proc as input to the next data step/proc
- reduces disk space requirements
  - eliminates intermediate write to disk

27 MP CONNECT – Piping – Scale Up
[Diagram: overlapped execution on an SMP server: Read and Summarize SAS Data, DATA STEP, DATA STEP, PROC STEP]

28 MP CONNECT – Piping – Scale Out
[Diagram labels: Parent SAS Session, SAS Session 2, … SAS Session n]

29 When to Use MP CONNECT
- long running jobs
- independent data sources
- independent tasks
- tasks that can be overlapped
- utilize SMP hardware or processors on a network

30 Considerations When Using MP CONNECT
- risk of creating an I/O bottleneck if multiple sessions read the same input file
  - create multiple copies on separate mount points
  - create subsets of the file on separate mount points
  - use SPDE engine to facilitate multi-user access to data

31 Considerations When Using MP CONNECT (cont.)
- each MP CONNECT process has a unique WORK library, all in the temp file directory by default
- use WORK option on MP CONNECT session invocations
  - direct WORK to separate disks
- use SPDE engine to create partitioned temp lib
  - assign USER= as the libref
  - set USER= option to temp libref
  - utility files still go to WORK but one-level data sets go to USER=
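
The WORK and USER= suggestions on this slide can be sketched roughly as follows. This is an illustration rather than code from the presentation: it assumes a UNIX-style host, and the directory names, the libref `scratch`, and the data set `class_copy` are all hypothetical.

```sas
/* Start each MP CONNECT session with WORK on a different disk.     */
/* The -work paths are hypothetical; use directories that exist.    */
signon task1 sascmd="!sascmd -work /disk1/saswork";
signon task2 sascmd="!sascmd -work /disk2/saswork";

rsubmit task1 wait=no;
  /* Partitioned temporary library on the SPDE engine               */
  /* (hypothetical paths spread across two disks).                  */
  libname scratch spde '/disk3/spdetemp'
          datapath=('/disk3/spdetemp' '/disk4/spdetemp')
          temp=yes;

  /* One-level data set names now resolve to SCRATCH, not WORK.     */
  options user=scratch;

  data class_copy;            /* stored as scratch.class_copy       */
    set sashelp.class;
  run;
endrsubmit;

waitfor _all_ task1;
signoff _all_;
```

Utility files created by procedures in the remote session would still be written to that session's WORK location, which is why the two settings are usually combined.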

32 Considerations for MP CONNECT
- I/O bottlenecks
- WORK library
- CPU bottlenecks

33 Piping Syntax
- specify the new piping feature by including the pipe engine "sasesock" in the libname statement, as follows:
    libname libref sasesock "port-specifier";
- "port-specifier" will be described in the next few slides

34 Port Specification Syntax
- piping supports 2 ways to specify a port:
  - explicit: a specific port number or service name
  - implicit: a pseudo pipe name, and the piping engine will select the first available TCP/IP port

35 Explicit Port Specification
- ":explicit port"
  - specifies an explicit port on the machine where the asynchronous RSUBMIT is executing
  - libname foo SASESOCK ":256";
- ":port service"
  - specifies a defined service name on the machine where the asynchronous RSUBMIT is executing
  - libname foo SASESOCK ":port1";

36 Explicit Port Specification - With Machine Name
- "machine-name:port-number"
  - specifies an explicit port number on the machine indicated by "machine-name"
  - libname foo SASESOCK "apex.finance.com:256";
- "machine-name:port service"
  - specifies the name of the service on the machine specified by "machine-name"
  - libname foo SASESOCK "apex.finance.com:port1";

37 Implicit Port Specification
- use this method if you do not want to specify an explicit port or service; instead, the first available port is dynamically assigned to the alias that you specify
- "alias-for-implicitly-selected-port"
  - libname foo SASESOCK "autoPort";
- requires use of a metadata repository

38 Sequential SAS Job
    libname outLib "c:\writeToDisk";
    libname inLib "c:\writeToDisk";   /* redundant */
    data outLib.Intermediate;
      do i=1 to 5;
        put 'Writing row ' i;
        output;
      end;
    run;
    data outLib.Final;
      set inLib.Intermediate;
      do j=1 to 5;
        put 'Adding data ' j;
        n2=j*2;
        output;
      end;
    run;

39 Modified to Use Piping
    /* p1 writes its output to the pipe instead of to disk */
    signon p1 sascmd='sas';
    rsubmit p1 wait=no;
      libname outLib sasesock ":pipe1";
      data outLib.Intermediate;
        do i=1 to 5;
          put 'Writing row ' i;
          output;
        end;
      run;
    endrsubmit;
    /* p2 reads from the same pipe while p1 is still writing */
    signon p2 sascmd='sas';
    rsubmit p2 wait=no;
      libname inLib sasesock ":pipe1";
      libname outLib "d:\temp";
      data outLib.Final;
        set inLib.Intermediate;
        do j=1 to 5;
          put 'Adding data ' j;
          n2 = j*2;
          output;
        end;
      run;
    endrsubmit;
    waitfor _all_ p1 p2;

40 Example: Utilizing Piping as an Intermediate Step
    /* ----------- Data Step - Process P1 ----- */
    signon p1 sascmd='!sascmd';
    rsubmit p1 wait=no;
      libname outLib sasesock ":pipe1";
      data outLib.Intermediate1;
        do i=1 to 5;
          put 'Writing row ' i;
          output;
        end;
      run;
    endrsubmit;
    /* ----------- Data Step - Process P2 ----- */
    signon p2 sascmd='!sascmd';
    rsubmit p2 wait=no;
      libname inLib sasesock ":pipe1";
      libname outLib sasesock ":pipe2";
      data outLib.Intermediate2;
        set inLib.Intermediate1;
        do j=1 to 5;
          put 'Adding data ' j;
          n2 = j*2;
          output;
        end;
      run;
    endrsubmit;
    /* ----------- Data Step - Process P3 ----- */
    signon p3 sascmd='!sascmd';
    rsubmit p3 wait=no;
      libname inLib sasesock ":pipe2";
      libname outLib "d:\temp";
      data outLib.Final;
        set inLib.Intermediate2;
        do k=1 to 5;
          put 'Adding data ' k;
          n3 = k*2;
          output;
        end;
      run;
    endrsubmit;
    waitfor _all_ p1 p2 p3;

41 Sequential Processing
[Diagram: Data Step, then Sort; output from data step written to disk; output from proc sort written to disk]
Total time to complete: 5374 seconds

42 Pipeline Parallelism, Same Machine
[Diagram labels: Data Step; Sort subset 1, Sort subset 2, Sort subset 3, Sort subset 4; Merge data; Proc Sort]
Total time: 2063 seconds
62% improvement over sequential test case

43 Hardware Considerations for Pipeline Parallelism
- What if your machine has only a single processor and you attempt to run a test such as this?
- The piping test case will probably take longer than the sequential test case. This is a result of contention for the I/O channel, the CPU, and memory.
- If you work in an environment such as this, piping can still work for you. Simply farm the work out to other machines!

44 Pipeline Parallelism, Remote Machines
[Diagram labels: Data Step; Sort subset 1, Sort subset 2, Sort subset 3, Sort subset 4; Merge data]
Total time: 2102 seconds
61% improvement over sequential test case

45 Performance Results Summary
- server machine: 8-way Windows platform
- remote client machines: 2-way Windows platform

46 Piping Test Case
- two raw data files as input (sales and goals)
- two data steps to read each file and subset into 4 quarters
- four data steps to merge sales and goals for each of the 4 quarters

47 Sequential Processing
[Diagram: Step 1, a data step separates raw data file Sales.txt by quarter (Q1 to Q4); Step 2, a data step separates raw data file Goals.txt by quarter (Q1 to Q4); Steps 3 through 6 merge sales and goals for Q1, Q2, Q3, and Q4 respectively]
Total time: 1274 seconds

48 MP CONNECT - Independent Parallelism
[Diagram: Step 1, data steps separate the raw data files Sales.txt and Goals.txt by quarter (Q1 to Q4); Step 2, 4 merges]
Total time: 439 seconds
Improvement over sequential: 66%

49 MP CONNECT - Pipeline Parallelism
[Diagram: Step 1, data steps separate the raw data files Sales.txt and Goals.txt by quarter (Q1 to Q4) and feed 4 merges]
Total time: 247 seconds
Improvement over sequential: 81%
Improvement over MP CONNECT: 44%
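
The improvement percentages on the last three slides follow from the reported elapsed times as (baseline - new) / baseline. A small DATA step, offered only as a check on the arithmetic and using variable names chosen here for illustration, reproduces them:

```sas
/* Recompute the improvements from the elapsed times on the slides. */
/* Variable names are illustrative; the times are from the slides.  */
data _null_;
  sequential  = 1274;   /* seconds, sequential processing           */
  independent = 439;    /* seconds, MP CONNECT independent          */
  piped       = 247;    /* seconds, MP CONNECT pipeline             */

  pct_independent_vs_seq = 100 * (sequential  - independent) / sequential;
  pct_piped_vs_seq       = 100 * (sequential  - piped)       / sequential;
  pct_piped_vs_indep     = 100 * (independent - piped)       / independent;

  put pct_independent_vs_seq= 5.1;
  put pct_piped_vs_seq= 5.1;
  put pct_piped_vs_indep= 5.1;
run;
```

The three values come out at roughly 65.5, 80.6, and 43.7, which round to the 66%, 81%, and 44% shown on the slides.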

50 Performance Results Summary
- 16-way IA-64 bit ES7000

51 Considerations
- You must have sufficient CPU and I/O processing resources. If these are not available on one machine, the work can be farmed out to additional machines.
- Piping is not effective when execution time for procs is short. Piping introduces a small performance overhead with the signon, a CPU overhead of additional processes, and a complexity overhead to the SAS job.

52 Restrictions and Requirements
- supports single-pass SAS procs and data steps only
  - examples: SORT, SUMMARY, PRINT, DATA step, GANTT
- used with dependent SAS tasks
  - when one SAS proc requires the output of another SAS proc
- requires a SAS/CONNECT license

53 Building Blocks - Combining Solutions
- combine scale up and scale out
- combine multiple threads with multiple processes
- combine SPDE engine with other scalable solutions
- combine MP CONNECT with other scalable solutions

54 Gartner's Definition of Grid Computing
"a grid is a collection of resources owned by multiple organizations that is coordinated to allow them to solve a common problem"

55 MP CONNECT in Cluster Environment
- 32 node Linux cluster / MOSIX
- 1 GHz Intel P3 processors
- 1 GB RAM per processor
- 100 Mb backplane

56 MP CONNECT in Cluster Environment
Host i  Host    No. Iter  Time/20 Iter  Estimated Time for Entire Problem  Work Distribution Efficiency  Wait Time/20 Iter
4       task4   3940      0:04:17       446:05                             96%                           0:00:07
17      task17  3920      0:04:17       446:03                             96%                           0:00:09
18      task18  3900      0:04:17       445:26                             96%                           0:00:10
Total elapsed time: 14:30:03
Cumulative working time: 447:46
Cumulative waiting time: 15:14:54
Scaling efficiency: 96.50%

57 MP CONNECT in Grid Environment
- 100 heterogeneous nodes
- W2K, WXP, variety of Unix OSs
- combination of V8 SAS and SAS 9

58 MP CONNECT in Grid Environment
Host i  Host   No. Iter  Time/30 Iter  Estimated Time for Entire Problem  Work Distribution Efficiency  Wait Time/30 Iter
7       ld055  570       0:15:18       1060:11                            204%                          0:00:00
48      in028  1230      0:06:52       476:07                             91%                           0:00:00
97      hd204  3120      0:02:40       184:42                             35%                           0:00:01
Total elapsed time: 5:12:19
Cumulative working time: 468:41
Cumulative waiting time: 0:39:42
Scaling efficiency: 90.04%

59 Combining Parallel Processes and Threads

60 SAS 9 Partitioned Data Model
[Diagram labels: SAS® 8: data, index, metadata. SAS 9 SPDE Engine & SPD Server®: data1, data2, data3, data4, hybrid bitmap/B-tree index, metadata]

61 SAS Partitioned Data
[Diagram labels: RAID groups holding work, data1 through data10, Index 1, Index 2, and metadata; 150 MB/s; 1000 MB/s; high I/O bandwidth]

62 MP CONNECT and SPDE Engine
- single input, 4.8 GB, 20 million obs
- two data steps, two PROC FREQs
- 4-way UNIX box
- six iterations of implementation

63 MP CONNECT and SPDE Engine
[Diagram labels: 4 parallel sessions (Data Step 1, Data Step 2, Proc Freq 1, Proc Freq 2); partitioned input; partitioned USER=]

64 MP CONNECT and SPDE Engine
- total improvement in elapsed time of 65%

65 MP CONNECT and Threaded SUMMARY
- two raw input files (goals and sales)
- two data steps to read input and calculate revenue and profit
- two PROC SUMMARYs to summarize by region and classify data by employee number
- final merge of projected and actual sales by region to determine those employees who met/exceeded goals

66 MP CONNECT and Threaded SUMMARY
- two raw input files (~1.5 GB each)
- 8-way 900 MHz UNIX box
- two data steps, two PROC SUMMARYs, and a merge

67 MP CONNECT and Threaded Summary
[Diagram labels: Sales.txt, Goals.txt, Data step, Summary, 1 Step Merge]

68 MP CONNECT and Threaded Summary
- total improvement in elapsed time of 70%

69 Considerations for Combining MP CONNECT and Threading
- tune threads per session on SMP
  - CPUCOUNT
  - THREADS/NOTHREADS
  - OS processor set command
- depends on
  - application,
  - data, and
  - hardware configuration
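
As a rough illustration of tuning threads per session, the sketch below starts two MP CONNECT sessions and limits each one to two CPUs for threaded procedures. It is not taken from the presentation: the value 2, the task names, and the use of `sashelp.class` as input are assumptions chosen to keep the example self-contained.

```sas
options sascmd="!sascmd" autosignon=yes;

rsubmit task1 wait=no;
  /* Allow threading, but tell thread-enabled procedures to assume  */
  /* only 2 CPUs in this session (illustrative value).              */
  options threads cpucount=2;
  proc sort data=sashelp.class out=work.by_age;
    by age;
  run;
endrsubmit;

rsubmit task2 wait=no;
  options threads cpucount=2;
  proc summary data=sashelp.class nway;
    class sex;
    var height weight;
    output out=work.class_means mean=;
  run;
endrsubmit;

waitfor _all_ task1 task2;
signoff _all_;
```

With two sessions capped at two CPUs each, the job targets about four processors in total. Whether that split beats one session using all available CPUs depends on the application, the data, and the hardware, as the slide notes.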

70 Summary
- SAS 9 addresses performance through scalability in many ways
- scalability requires thought and balance between:
  - partitioned data
  - parallel I/O
  - parallel multi-threaded computation
  - data piping
  - parallel SAS sessions

71 Acknowledgements
- MP CONNECT
  - Cheryl.Doninger@sas.com
  - Renee.Palmer@sas.com
- SPDE Engine
  - Billy.Clifford@sas.com
- Scalable SAS Procedures
  - Robert.Ray@sas.com
  - Robert.Cohen@sas.com

72 For More Info…
- Scalability and Performance Community
  - http://support.sas.com/rnd/scalability
