Published by Lesley Lyons. Modified over 9 years ago.
1
www.OASUS.ca
Grid: The Evolution from Parallel Processing to Modern Day Computing
Greg McLean, Vecdet Mehmet-Ali
2
Agenda
- Grid Computing
- Introduction to Parallel Processing
- Types of Grids
- Why and When to Use Grid
- Early Findings
- Grid Components
- Considerations When Using SAS Grid
- SAS/CONNECT
- MP/CONNECT (example)
- Questions / Comments
3
Introduction to Parallel Processing
[Illustration: an unsorted deck of cards and the same deck sorted]
5
Introduction to Parallel Processing
Standard approach: sorting the unsorted deck takes 1 minute 30 seconds.
6
Introduction to Parallel Processing
Parallel approach: sorting the unsorted deck takes 45 seconds.
7
Introduction to Parallel Processing
Card Experiment vs. Parallel / Grid Computing
- Parallel processing can reduce elapsed time.
- “Pipeline parallelism” can reduce elapsed time even further.
- Using the optimal number of processes can reduce elapsed time.
- Some “processors” are faster than others.
- Data / software preparation is almost always required.
8
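SAS/CONNECT
[Diagram: a client computer linked through SAS/CONNECT to Machine X, which holds the data]
9
SAS/CONNECT
  /* The macro variable SERVER holds the host name of the remote machine */
  %LET server=F8DEV01;
  OPTIONS REMOTE=server;
  SIGNON;
  /* Everything between RSUBMIT and ENDRSUBMIT runs on the remote machine */
  RSUBMIT;
    data work.test;
      A = 10;
    run;
  ENDRSUBMIT;
  SIGNOFF;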
10
SAS/CONNECT (Pre SAS Version 8)
Synchronous Processing
[Diagram: one computer]
11
SAS/CONNECT (Pre SAS Version 8)
  LIBNAME IN  '\\Server1\Input';
  LIBNAME OUT '\\Server1\Output';
  PROC SORT DATA=IN.DATA1; BY KEY; RUN;
  PROC SORT DATA=IN.DATA2; BY KEY; RUN;
  DATA OUT.FINAL; MERGE IN.DATA1 IN.DATA2; BY KEY; RUN;
12
SAS/CONNECT (Pre SAS Version 8)
[Diagram: one machine runs Sort Data1, Sort Data2 and Merge Both in sequence, then returns the results]
13
SAS/CONNECT (Starting In SAS Version 8)
MP/CONNECT: Asynchronous Processing
[Diagram: two computers]
14
SAS/CONNECT (Starting In SAS Version 8)
Original program:
  LIBNAME IN  '\\Server1\Input';
  LIBNAME OUT '\\Server1\Output';
  PROC SORT DATA=IN.DATA1; BY KEY; RUN;
  PROC SORT DATA=IN.DATA2; BY KEY; RUN;
  DATA OUT.FINAL; MERGE IN.DATA1 IN.DATA2; BY KEY; RUN;
Program 1:
  LIBNAME IN '\\Server1\Input';
  PROC SORT DATA=IN.DATA1; BY KEY; RUN;
Program 2:
  LIBNAME IN '\\Server1\Input';
  PROC SORT DATA=IN.DATA2; BY KEY; RUN;
Program 3:
  LIBNAME IN  '\\Server1\Input';
  LIBNAME OUT '\\Server1\Output';
  DATA OUT.FINAL; MERGE IN.DATA1 IN.DATA2; BY KEY; RUN;
15
MP/CONNECT
[Diagram: Sort Data1 and Sort Data2 run at the same time on separate machines; their sort results feed Merge Both on another machine, which returns the results]
16
MP/CONNECT
  /****** SORT DATA1 ******/
  %LET remote1=F8DEV01;
  OPTIONS AUTOSIGNON=YES;
  /* WAIT=NO returns control at once, so the next block starts immediately */
  RSUBMIT PROCESS=remote1 WAIT=NO;
    LIBNAME data1 "\\F8DEV01\PFM-System\Tools";
    proc sort data=data1.data1; by city; run;
  ENDRSUBMIT;

  /****** SORT DATA2 ******/
  %LET remote2=F8TEST01;
  OPTIONS AUTOSIGNON=YES;
  RSUBMIT PROCESS=remote2 WAIT=NO;
    LIBNAME data2 "\\F8DEV01\PFM-System\Tools";
    proc sort data=data2.data2; by city; run;
  ENDRSUBMIT;
17
MP/CONNECT
  /* Wait until both asynchronous sorts have finished */
  WAITFOR _all_ remote1 remote2;

  /****** MERGE DATA1 & DATA2 ******/
  %LET remote3=F8PROD01;
  OPTIONS AUTOSIGNON=YES;
  RSUBMIT PROCESS=remote3;
    LIBNAME both "\\F8DEV01\PFM-System\Tools";
    data both.sorted; merge both.data1 both.data2; by city; run;
  ENDRSUBMIT;
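The same sort, sort, merge pattern can be tried on a single machine. The following is a minimal sketch, not the presenters' code: it assumes SAS/CONNECT is licensed, uses SASCMD="!sascmd" to start the extra sessions locally instead of on F8DEV01 / F8TEST01 / F8PROD01, and the data set names, session names (sort1, sort2) and the PWORK libref are made up for illustration.

```sas
/* Assumption: child sessions are spawned on the local machine. */
options sascmd="!sascmd" autosignon;

/* Small unsorted test data in the parent session's WORK library. */
data work.data1; do key = 3, 1, 2; a = key * 10;  output; end; run;
data work.data2; do key = 2, 3, 1; b = key * 100; output; end; run;

/* INHERITLIB makes the parent's WORK library visible to the child as PWORK. */
rsubmit process=sort1 wait=no inheritlib=(work=pwork);
  proc sort data=pwork.data1; by key; run;
endrsubmit;

rsubmit process=sort2 wait=no inheritlib=(work=pwork);
  proc sort data=pwork.data2; by key; run;
endrsubmit;

/* Block until both asynchronous sorts have finished. */
waitfor _all_ sort1 sort2;

/* The merge depends on both sorts, so it runs only after the WAITFOR. */
data work.final;
  merge work.data1 work.data2;
  by key;
run;

signoff _all_;
```

With data this small the session start-up time will dominate; the point is only to show the asynchronous control flow.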
18
Grid Computing
“A parallel processing architecture in which computer resources are shared across a network and all machines function as one large supercomputer.”
19
Grid Computing
Utility Grid
- Multiple users that require processing
- Multiple machines available to process
- Dynamic allocation of process to available machine
Compute Grid
- Task that can be decomposed into sub-units
- Sub-units dynamically allocated to available machines
- Sub-units able to run in parallel
20
Grid Computing: Why Use
- Budget constraints
- Higher volume of data
- Tighter processing schedules
- Idle processing power of existing hardware
- Centrally managed hardware & infrastructure
21
Grid Computing: When To Use
- Applications requiring hours / days to process
- Applications that are more processing intensive
- Applications that can be decomposed into sub-tasks
22
Grid Computing: Early Findings
Case 1: Optimization in a grid of PC laptops
- 60 laptops (266 - 400 MHz)
- 600 sales territories
- Total elapsed time: 87% improvement, 92% improvement
23
Grid Computing: Early Findings
Case 2: NIEHS, heterogeneous grid
- 100 nodes running a mixture of W2K, WXP and a variety of Unix OSs
- Combination of SAS v8 and SAS v9 on nodes
- Total elapsed time: 99% improvement
24
Grid Computing: Grid Infrastructure
- SAS® Programs\Data
- Grid Controller / Manager
- SAS® Grid Solution
25
Grid Computing: Grid Infrastructure
- SAS/CONNECT®
- SAS MP/CONNECT®: asynchronous connections
26
Grid Computing: Grid Controller / Manager (Then)
27
Grid Computing: Grid Controller / Manager (Then)
28
Grid Computing: Grid Controller / Manager (Now)
29
Grid Computing: SAS® Programs\Data (Then & Now)
Original program:
  LIBNAME IN  '\\Server1\Input';
  LIBNAME OUT '\\Server1\Output';
  PROC SORT DATA=IN.DATA1; BY KEY; RUN;
  PROC SORT DATA=IN.DATA2; BY KEY; RUN;
  DATA OUT.FINAL; MERGE IN.DATA1 IN.DATA2; BY KEY; RUN;
Program 1:
  LIBNAME IN '\\Server1\Input';
  PROC SORT DATA=IN.DATA1; BY KEY; RUN;
Program 2:
  LIBNAME IN '\\Server1\Input';
  PROC SORT DATA=IN.DATA2; BY KEY; RUN;
Program 3:
  LIBNAME IN  '\\Server1\Input';
  LIBNAME OUT '\\Server1\Output';
  DATA OUT.FINAL; MERGE IN.DATA1 IN.DATA2; BY KEY; RUN;
30
Considerations When Using SAS Grid
Vecdet Mehmet-Ali
SAS Grid Now @ Statistics Canada!
31
From Dream to Reality: Introducing the SAS Grid
Presented to: Informatics Branch, May 6, 2014
Yves DeGuire, Section Chief, SAS Technology Center, System Engineering Division, Statistics Canada
32
What is Grid Computing?
- Emerged in the academic research community with 2 primary goals:
  - Reduce overall elapsed processing time
  - Leverage commodity hardware
- Became mainstream with the SETI@Home project
- Today: a sophisticated computer infrastructure for the Enterprise with scalability, load balancing and high availability.
33
Use Case #3: Parallel Processing
- Long running jobs broken into smaller tasks and dispatched to the grid.
- Likely submitted as a batch job.
- SAS programs must be modified first using MP Connect directives:
  - Manually, or
  - Using SAS SCAPROC
- Another option: the SAS Data Integration loop transformation
- The easiest: directly from EG process flow!
- Myth: a SAS program will execute in parallel without any modifications!
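As a rough illustration of the SCAPROC route, the sketch below wraps the sort-and-merge program from the earlier slides in PROC SCAPROC. The two output paths are hypothetical, and the program that the GRID option writes should be treated as a starting point to review, not as finished parallel code.

```sas
/* Start recording. RECORD names the analysis file; GRID names the file
   that will receive a generated version of the job with SAS/CONNECT
   statements. Both paths are hypothetical. */
proc scaproc;
  record 'C:\temp\sort_merge_record.txt' grid 'C:\temp\sort_merge_grid.sas';
run;

/* The job to analyze, taken from the sort-and-merge slides. */
libname in  '\\Server1\Input';
libname out '\\Server1\Output';
proc sort data=in.data1; by key; run;
proc sort data=in.data2; by key; run;
data out.final; merge in.data1 in.data2; by key; run;

/* Stop recording and write the output files. */
proc scaproc;
  write;
run;
```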
34
Parallel Processing & Grid Computing with SAS
35
G-Tab (Generalized Tabulation System)
Input:
– Table specifications (XML)
– Micro data
36
G-Tab (Generalized Tabulation System)
[Diagram: input data and an XML file feed G-Tab, which produces tabulated output]
37
G-Tab (Generalized Tabulation System)
Table specifications (XML)
- Domain variable list (Ex: Region, Province, AgeGroup, Sex, etc.)
- Analysis variable list (Ex: Income, Expense, etc.)
- Weight variable (Ex: SWeight)
- Bootstrap weight variable specification (Ex: BSW1-BSW1000)
- Statistics:
  - Level-1: (MEAN, MAX, MIN, SUM, N, SUMWGT, MEDIAN, P1, P5, .., P99)
    – Calculated by PROC MEANS on micro data
  - Level-2: (GINI, GEOMEAN)
    – Calculated by special algorithm on micro data
  - Level-3: (DISTRIBUTION, PROPORTION, RATIO)
    – Calculated by using the results of Level-1 statistics
    – Example (RATIO): MEAN(Income) / MEAN(Expense)
38
G-Tab (Generalized Tabulation System)
Precision Measures (Bootstrap Variance Method)
- VAR (Variance)
- STD (Standard Deviation)
- CV (Coefficient of Variation)
- CILB (Confidence Interval Lower Bound)
- CIUB (Confidence Interval Upper Bound)
- QI (Quality Indicator)
39
G-Tab (Sequential Processing)
Process flow: Level-1, then Level-2, then Level-3, then Precision Measures
40
G-Tab (Sequential Processing)
Data flow
[Diagram with boxes: Input data, Level-1 Statistics, Level-2 GINI, Level-2 GEOMEAN, Level-3 Statistics, Precision Measures]
41
Considerations for Parallel Processing
- Can your job be divided into independent tasks?
  - Many SAS programs contain modules that are independent.
  - On a single server these tasks are performed sequentially.
  - On the Grid they can be processed in parallel sessions.
- Identify dependent and independent tasks.
  - A task is dependent if it requires output from another task.
- Finally, consider the length of time required to process each task.
  - If the tasks are short and take little time to process, you might not be able to offset the time required to start up multiple Grid sessions.
42
G-Tab (Task Dependency)
Data flow
[Diagram with boxes: Input data, Level-1 Statistics, Level-2 GINI, Level-2 GEOMEAN, Level-3 Statistics, Precision Measures]
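One way such dependencies could be expressed with MP/CONNECT is sketched below. The slides state that Level-3 is calculated from Level-1 results; treating Level-1, GINI and GEOMEAN as independent of one another, and Precision Measures as needing all of them, is an assumption made for this sketch. The session names are hypothetical, the DATA _NULL_ steps are placeholders for the real G-Tab modules, and the sessions are started locally with SASCMD.

```sas
options sascmd="!sascmd" autosignon;

/* Independent tasks: start all three at once. */
rsubmit process=level1 wait=no;
  data _null_; put 'Level-1 statistics (placeholder)'; run;
endrsubmit;

rsubmit process=gini wait=no;
  data _null_; put 'Level-2 GINI (placeholder)'; run;
endrsubmit;

rsubmit process=geomean wait=no;
  data _null_; put 'Level-2 GEOMEAN (placeholder)'; run;
endrsubmit;

/* Dependent task: Level-3 needs Level-1 output, so wait for that task only. */
waitfor level1;

rsubmit process=level3 wait=no;
  data _null_; put 'Level-3 statistics (placeholder)'; run;
endrsubmit;

/* Assumed: precision measures need every estimate, so wait for the rest. */
waitfor _all_ gini geomean level3;

/* Precision measures would be computed here, in the parent session. */

signoff _all_;
```

Waiting on a single named task lets Level-3 start as soon as Level-1 is done, even if the Level-2 tasks are still running.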
43
G-Tab Processing on the Grid
[Diagram: G-Tab splits the input data across grid nodes running Level-1 Statistics, Level-2 Gini, Level-2 GeoMean and Level-3 Statistics; each node returns a partial result; Precision Measures lead to the final result]
44
G-Tab Level-1 Statistics
[Diagram: G-Tab splits the input data for a grid node running Level-1 Statistics, which returns a partial result; Precision Measures follow]
45
G-Tab (Generalized Tabulation System)
Table specifications (XML)
- Domain variable list (Ex: Region, Province, AgeGroup, Sex, etc.)
- Analysis variable list (Ex: Income, Expense, etc.)
- Weight variable (Ex: SWeight)
- Bootstrap weight variable specification (Ex: BSW1-BSW1000)
- Statistics:
  - Level-1: (MEAN, MAX, MIN, SUM, N, SUMWGT, MEDIAN, P1, P5, .., P99)
    – Calculated by PROC MEANS on micro data
  - Level-2: (GINI, GEOMEAN)
    – Calculated by special algorithm on micro data
  - Level-3: (DISTRIBUTION, PROPORTION, RATIO)
    – Calculated from the results of Level-1 statistics
    – Example (RATIO): MEAN(Income) / MEAN(Expense)
46
G-Tab Sample Input Data
Province | AgeGroup | Sex | Income | SWeight | BSW1 | BSW2 | … | BSW1000
47
G-Tab Level-1 Statistics
  /* Run once with SWeight, then once per bootstrap weight (BSW1 - BSW1000) */
  proc means data=.. noprint;
    class province agegroup sex;
    var income / weight=sweight;
    output out=.. mean=;
  run;
Repetitive task: split data for parallel processing
48
Level-1 Statistics Sub-task(1) Input Data
Province | AgeGroup | Sex | Income | SWeight | BSW1 | BSW2 | … | BSW250
49
Level-1 Statistics Sub-task(2) Input Data
Province | AgeGroup | Sex | Income | SWeight | BSW251 | BSW252 | … | BSW500
50
Level-1 Statistics Sub-task(3) Input Data
Province | AgeGroup | Sex | Income | SWeight | BSW501 | BSW502 | … | BSW750
51
Level-1 Statistics Sub-task(4) Input Data
Province | AgeGroup | Sex | Income | SWeight | BSW751 | BSW752 | … | BSW1000
52
Level-1 Statistics Sub-task(1) Results
Province | AgeGroup | Sex | Income_Mean | Income1_Mean | Income2_Mean | … | Income250_Mean
53
Level-1 Statistics Sub-task(2) Results
Province | AgeGroup | Sex | Income251_Mean | Income252_Mean | … | Income500_Mean
54
Level-1 Statistics Sub-task(3) Results
Province | AgeGroup | Sex | Income501_Mean | Income502_Mean | … | Income750_Mean
55
Level-1 Statistics Sub-task(4) Results
Province | AgeGroup | Sex | Income751_Mean | Income752_Mean | … | Income1000_Mean
56
Level-1 Statistics Results
Province | AgeGroup | Sex | Income_Mean | Income1_Mean | Income2_Mean | … | Income1000_Mean
57
G-Tab Parallel Processing Level-1 Results
[Diagram: G-Tab splits the input data across four grid nodes, each running Level-1 on one slice of the weights and returning a partial result]
- Node 1: SWeight, BSW1-BSW250
- Node 2: BSW251-BSW500
- Node 3: BSW501-BSW750
- Node 4: BSW751-BSW1000
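The vertical slicing above can be sketched with MP/CONNECT roughly as follows. This is an illustration under stated assumptions, not the G-Tab code: the micro data is simulated, 8 bootstrap weights stand in for BSW1-BSW1000, two slices stand in for four, the run with SWeight is left out, the macro name %slice and the session names are hypothetical, and sessions are started locally through SASCMD.

```sas
options sascmd="!sascmd" autosignon;

/* Simulated micro data: 8 bootstrap weights stand in for BSW1-BSW1000. */
data work.micro;
  call streaminit(1);
  array bsw{8} bsw1-bsw8;
  do province = 1 to 2;
    do i = 1 to 50;
      income  = 30000 + 1000 * rand('normal');
      sweight = 10;
      do j = 1 to 8;
        bsw{j} = 20 * rand('uniform');
      end;
      output;
    end;
  end;
  drop i j;
run;

/* Submit one asynchronous session that computes the weighted mean of
   INCOME for bootstrap weights BSW&first through BSW&last. The %DO loop
   runs in the parent session and only generates the PROC MEANS steps
   that are sent to the child session. */
%macro slice(id, first, last);
  %local k;
  rsubmit process=slice&id wait=no inheritlib=(work=pwork);
    %do k = &first %to &last;
      proc means data=pwork.micro noprint nway;
        class province;
        var income;
        weight bsw&k;
        output out=pwork.mean&k (drop=_type_ _freq_) mean=income&k._mean;
      run;
    %end;
  endrsubmit;
%mend slice;

%slice(1, 1, 4)
%slice(2, 5, 8)

/* Wait for every slice before combining the partial results. */
waitfor _all_ slice1 slice2;

/* One row per province, one column per bootstrap weight. */
data work.level1;
  merge work.mean1-work.mean8;
  by province;
run;

signoff _all_;
```

Changing the number of %slice calls and their ranges is all that is needed to try a different number of sessions.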
58
G-Tab Precision Measures
Let Ŷ be the statistic to be considered. For example, Ŷ can be a mean, a median, a sum, etc.
The variance of Ŷ is given by: [equation not preserved in this transcript]
where Ŷ_j is the statistic calculated using the j-th Bootstrap weight, B is the number of Bootstrap weights, and Ŷ is the estimate produced using the Survey weight.
Standard Deviation: [equation not preserved in this transcript]
Coefficient of Variation: [equation not preserved in this transcript]
The Quality Indicator of the statistic is set based on the above calculations.
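Given the definitions of Ŷ_j, B and Ŷ, the precision measures plausibly take the usual bootstrap form below. The divisor B (rather than B − 1) and the CV expressed as a percentage are assumptions, since the slide's own equations are not available.

```latex
\widehat{V}(\hat{Y}) = \frac{1}{B} \sum_{j=1}^{B} \left( \hat{Y}_j - \hat{Y} \right)^2,
\qquad
\mathrm{STD}(\hat{Y}) = \sqrt{\widehat{V}(\hat{Y})},
\qquad
\mathrm{CV}(\hat{Y}) = 100 \times \frac{\mathrm{STD}(\hat{Y})}{\hat{Y}}
```

Under the same assumptions, a DATA step over a Level-1 results table could compute these values. The input rows are invented for illustration, with B = 4 bootstrap estimates per domain.

```sas
/* Hypothetical Level-1 results: the survey-weight estimate followed by
   B = 4 bootstrap estimates for each province. */
data work.level1;
  input province income_mean income1_mean-income4_mean;
  datalines;
1 100 98 102 101 99
2 200 190 210 205 195
;
run;

data work.precision;
  set work.level1;
  array boot{*} income1_mean-income4_mean;
  b = dim(boot);
  variance = 0;
  /* Sum of squared deviations from the survey-weight estimate. */
  do j = 1 to b;
    variance = variance + (boot{j} - income_mean)**2;
  end;
  variance = variance / b;         /* assumed divisor: B */
  std = sqrt(variance);
  cv  = 100 * std / income_mean;   /* assumed: CV as a percentage */
  drop j;
run;
```

For province 1 the squared deviations are 4, 4, 1 and 1, so the variance under this form is 10 / 4 = 2.5.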
59
Notes
In the example:
- The input data was sliced vertically into 4.
- This gave the BEST elapsed processing time for average surveys.
- Slicing it into 5 sessions (200 BSW each) took longer to run.
- For bigger volume, 5 sessions could give better results.
Other considerations:
- Slice the input data horizontally
- Time cycles
Warning: Maintain data integrity
60
Conclusion
- Grid is a sophisticated computer infrastructure for the Enterprise with scalability, load balancing and high availability.
- A SAS program will NOT execute in parallel without any modifications! It must be modified first using MP Connect directives to run in parallel.
- Long running jobs should be broken into smaller tasks and dispatched to the grid.
- Parallel processing will reduce the overall elapsed processing time.
The Future Of Grid Computing Is Now Here!
61
Questions / Comments

Greg McLean
Project Leader, System Engineering Division, Statistics Canada
Jean Talon Building, 5th Floor, Section A6
170, Tunney’s Pasture Driveway, Ottawa, Ont., K1A 0T6
(613) 951-2396
Greg.McLean@statcan.gc.ca

Vecdet Mehmet-Ali
Project Leader, System Engineering Division, Statistics Canada
Jean Talon Building, 5th Floor, Section A2
170, Tunney’s Pasture Driveway, Ottawa, Ont., K1A 0T6
(613) 951-2390
Vecdet.Mehmet-Ali@statcan.gc.ca