Grid: The Evolution from Parallel Processing to Modern Day Computing. Greg McLean, Vecdet Mehmet-Ali (www.OASUS.ca)

1 Grid: The Evolution from Parallel Processing to Modern Day Computing
Greg McLean, Vecdet Mehmet-Ali

2 Agenda
- Grid Computing
- Introduction to Parallel Processing
- Types of Grids
- Why and When to Use Grid
- Early Findings
- Grid Components
- Considerations When Using SAS Grid
- SAS/CONNECT
- MP/CONNECT (example)
- Questions / Comments

3-4 Introduction to Parallel Processing
[Illustration: an unsorted deck of cards becomes a sorted deck]

5 Introduction to Parallel Processing
Standard approach, unsorted deck to sorted deck: 1 minute 30 seconds

6 Introduction to Parallel Processing
Parallel approach, unsorted deck to sorted deck: 45 seconds

7 Introduction to Parallel Processing: Card Experiment vs. Parallel / Grid Computing
- Parallel processing can reduce elapsed time
- "Pipeline parallelism" can reduce elapsed time even further
- An optimal number of processes can reduce elapsed time
- Some "processors" are faster than others
- Data / software preparation is almost always required

8 SAS/CONNECT
[Diagram: client (iMac), Machine X, data]

9 SAS/CONNECT

    %LET server=F8DEV01;
    OPTIONS REMOTE=server;
    SIGNON;
    RSUBMIT;
      data work.test;
        A = 10;
      run;
    ENDRSUBMIT;
    SIGNOFF;
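In the program above, WORK.TEST is created in the remote session's WORK library, not on the client. A minimal sketch of bringing the result back, assuming the same server as on the slide and using PROC DOWNLOAD from SAS/CONNECT (the choice to download into the client's WORK library is an assumption for illustration):

```sas
%LET server=F8DEV01;            /* host name taken from the slide */
OPTIONS REMOTE=server;
SIGNON;
RSUBMIT;
  data work.test;
    A = 10;
  run;

  /* Copy the remote WORK.TEST into the client's WORK library */
  proc download data=work.test out=work.test;
  run;
ENDRSUBMIT;
SIGNOFF;
```

Without the download step, the data set disappears when the remote session ends at SIGNOFF.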

10 SAS/CONNECT (Pre SAS Version 8)
Synchronous processing
[Diagram: a single client (iMac) session]

11 SAS/CONNECT (Pre SAS Version 8)

    LIBNAME IN '\\Server1\Input';
    LIBNAME OUT '\\Server1\Output';

    PROC SORT DATA=IN.DATA1;
      BY KEY;
    RUN;

    PROC SORT DATA=IN.DATA2;
      BY KEY;
    RUN;

    DATA OUT.FINAL;
      MERGE IN.DATA1 IN.DATA2;
      BY KEY;
    RUN;

12 SAS/CONNECT (Pre SAS Version 8)
[Diagram: Sort Data1 -> Sort Data2 -> Merge Both -> Results, executed one after another in a single session]

13 SAS/CONNECT (Starting In SAS Version 8)
MP/CONNECT: asynchronous processing
[Diagram: several client (iMac) sessions running side by side]

14 SAS/CONNECT (Starting In SAS Version 8)

Original sequential program:

    LIBNAME IN '\\Server1\Input';
    LIBNAME OUT '\\Server1\Output';
    PROC SORT DATA=IN.DATA1; BY KEY; RUN;
    PROC SORT DATA=IN.DATA2; BY KEY; RUN;
    DATA OUT.FINAL; MERGE IN.DATA1 IN.DATA2; BY KEY; RUN;

The same work split into three units:

    /* Unit 1 */
    LIBNAME IN '\\Server1\Input';
    PROC SORT DATA=IN.DATA1; BY KEY; RUN;

    /* Unit 2 */
    LIBNAME IN '\\Server1\Input';
    PROC SORT DATA=IN.DATA2; BY KEY; RUN;

    /* Unit 3 */
    LIBNAME IN '\\Server1\Input';
    LIBNAME OUT '\\Server1\Output';
    DATA OUT.FINAL; MERGE IN.DATA1 IN.DATA2; BY KEY; RUN;

15 MP/CONNECT
[Diagram: Sort Data1 and Sort Data2 run at the same time in separate sessions; their sorted results feed a third session that runs Merge Both and produces the results]

16 MP/CONNECT

    /****** SORT DATA1 ******/
    %LET remote1=F8DEV01;
    OPTIONS AUTOSIGNON=YES;
    RSUBMIT PROCESS=remote1 WAIT=NO;
      LIBNAME data1 "\\F8DEV01\PFM-System\Tools";
      proc sort data=data1.data1;
        by city;
      run;
    ENDRSUBMIT;

    /****** SORT DATA2 ******/
    %LET remote2=F8TEST01;
    OPTIONS AUTOSIGNON=YES;
    RSUBMIT PROCESS=remote2 WAIT=NO;
      LIBNAME data2 "\\F8DEV01\PFM-System\Tools";
      proc sort data=data2.data2;
        by city;
      run;
    ENDRSUBMIT;

17 MP/CONNECT

    /* Wait until both asynchronous sorts have finished */
    WAITFOR _ALL_ remote1 remote2;

    /****** MERGE DATA1 & DATA2 ******/
    %LET remote3=F8PROD01;
    OPTIONS AUTOSIGNON=YES;
    RSUBMIT PROCESS=remote3;
      LIBNAME both "\\F8DEV01\PFM-System\Tools";
      data both.sorted;
        merge both.data1 both.data2;
        by city;
      run;
    ENDRSUBMIT;
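Because the two sorts were submitted with WAIT=NO, their logs and output are spooled on the remote side rather than shown in the client session. A short sketch of the clean-up that could follow the merge, assuming the three session names used on the slides (the slides themselves do not show this step):

```sas
/* Retrieve the spooled log and output of each asynchronous sort */
RGET PROCESS=remote1;
RGET PROCESS=remote2;

/* End every remote session that AUTOSIGNON opened */
SIGNOFF _ALL_;
```

The merge on remote3 ran synchronously, so its log is already in the client session and needs no RGET.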

18 Grid Computing
"A parallel processing architecture in which computer resources are shared across a network and all machines function as one large supercomputer."

19 Grid Computing

Utility Grid
- Multiple users that require processing
- Multiple machines available to process
- Dynamic allocation of process to available machine

Compute Grid
- Task that can be decomposed into sub-units
- Sub-units dynamically allocated to available machines
- Sub-units able to run in parallel

20 Grid Computing: Why Use
- Budget constraints
- Higher volume of data
- Tighter processing schedules
- Idle processing power of existing hardware
- Centrally managed hardware and infrastructure

21 Grid Computing: When To Use
- Applications requiring hours / days to process
- Applications that are more processing intensive
- Applications that can be decomposed into sub-tasks

22 Grid Computing: Early Findings
Case 1: Optimization in a grid of PC laptops
- 60 laptops (266 - 400 MHz)
- 600 sales territories
- Total elapsed time: 87% improvement and 92% improvement [chart not preserved]

23 Grid Computing: Early Findings
Case 2: NIEHS, heterogeneous grid
- 100 nodes running a mixture of W2K, WXP and a variety of Unix operating systems
- Combination of SAS v8 and SAS v9 on nodes
- Total elapsed time: 99% improvement

24 Grid Computing: SAS Grid Solution
[Diagram of the three components: Grid Infrastructure, SAS programs and data, Grid Controller / Manager]

25 Grid Computing: Grid Infrastructure
- SAS/CONNECT
- MP/CONNECT
[Diagram: client sessions with asynchronous connections]

26-27 Grid Computing: Grid Controller / Manager (Then)
[Screenshots not preserved]

28 Grid Computing: Grid Controller / Manager (Now)
[Screenshot not preserved]

29 Grid Computing: SAS Programs and Data (Then & Now)

Original sequential program:

    LIBNAME IN '\\Server1\Input';
    LIBNAME OUT '\\Server1\Output';
    PROC SORT DATA=IN.DATA1; BY KEY; RUN;
    PROC SORT DATA=IN.DATA2; BY KEY; RUN;
    DATA OUT.FINAL; MERGE IN.DATA1 IN.DATA2; BY KEY; RUN;

The same work split into three units:

    /* Unit 1 */
    LIBNAME IN '\\Server1\Input';
    PROC SORT DATA=IN.DATA1; BY KEY; RUN;

    /* Unit 2 */
    LIBNAME IN '\\Server1\Input';
    PROC SORT DATA=IN.DATA2; BY KEY; RUN;

    /* Unit 3 */
    LIBNAME IN '\\Server1\Input';
    LIBNAME OUT '\\Server1\Output';
    DATA OUT.FINAL; MERGE IN.DATA1 IN.DATA2; BY KEY; RUN;

30 Considerations When Using SAS Grid
Vecdet Mehmet-Ali
SAS Grid Now @ Statistics Canada!

31 From Dream to Reality: Introducing the SAS Grid
- Presented to: Informatics Branch
- May 6, 2014
Yves DeGuire, Section Chief, SAS Technology Center, System Engineering Division, Statistics Canada

32 What is Grid Computing?
- Emerged in the academic research community with 2 primary goals:
  - Reduce overall elapsed processing time
  - Leverage commodity hardware
- Became mainstream with the SETI@Home project
- Today: a sophisticated computer infrastructure for the Enterprise with scalability, load balancing and high availability.

33 Use Case #3: Parallel Processing
- Long running jobs are broken into smaller tasks and dispatched to the grid.
- Likely submitted as a batch job.
- SAS programs must be modified first using MP Connect directives:
  - Manually
  - or using SAS SCAPROC
- Another option: the SAS Data Integration loop transformation
- The easiest: directly from an EG process flow!
- Myth: a SAS program will execute in parallel without any modifications!
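As a rough illustration of the SCAPROC route mentioned above, the sketch below wraps an existing program in a recording session so that PROC SCAPROC can write out a grid-enabled copy. The two output file names are hypothetical, the library path is borrowed from the earlier slides, and the generated program would still need to be reviewed before use:

```sas
LIBNAME IN '\\Server1\Input';   /* path borrowed from the earlier slides */

/* Start recording; GRID names the file that will receive the
   grid-enabled version of the job (file names are hypothetical) */
proc scaproc;
  record 'scaproc_record.txt' grid 'job_grid.sas';
run;

/* The existing, unmodified program runs here */
proc sort data=in.data1; by key; run;
proc sort data=in.data2; by key; run;

/* Stop recording and write the output files */
proc scaproc;
  write;
run;
```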

34 Parallel Processing & Grid Computing with SAS

35 G-Tab (Generalized Tabulation System)
Input:
- Table specifications (XML)
- Micro data

36 G-Tab (Generalized Tabulation System)
[Diagram: input data and XML file -> G-Tab -> tabulated output]

37 G-Tab (Generalized Tabulation System)
Table specifications (XML):
- Domain variable list (e.g. Region, Province, AgeGroup, Sex)
- Analysis variable list (e.g. Income, Expense)
- Weight variable (e.g. SWeight)
- Bootstrap weight variable specification (e.g. BSW1-BSW1000)
- Statistics:
  - Level-1 (MEAN, MAX, MIN, SUM, N, SUMWGT, MEDIAN, P1, P5, .., P99): calculated by PROC MEANS on micro data
  - Level-2 (GINI, GEOMEAN): calculated by a special algorithm on micro data
  - Level-3 (DISTRIBUTION, PROPORTION, RATIO): calculated by using the results of Level-1 statistics. Example (RATIO): MEAN(Income) / MEAN(Expense)

38 G-Tab (Generalized Tabulation System)
Precision measures (bootstrap variance method):
- VAR (Variance)
- STD (Standard Deviation)
- CV (Coefficient of Variation)
- CILB (Confidence Interval Lower Bound)
- CIUB (Confidence Interval Upper Bound)
- QI (Quality Indicator)

39 G-Tab (Sequential Processing): Process Flow
[Diagram: Level-1 -> Level-2 -> Level-3 -> Precision Measures]

40 G-Tab (Sequential Processing): Data Flow
[Diagram: input data, Level-1 Statistics, Level-2 GINI, Level-2 GEOMEAN, Level-3 Statistics, Precision Measures]

41 Considerations for Parallel Processing
- Can your job be divided into independent tasks?
  - Many SAS programs contain modules that are independent.
  - On a single server these tasks are performed sequentially.
  - On the Grid they can be processed in parallel sessions.
- Identify dependent and independent tasks.
  - A task is dependent if it requires output from another task.
- Finally, consider the length of time required to process each task.
  - If the tasks are short and take little time to process, you might not be able to offset the time required to start up multiple Grid sessions.
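One simple way to judge whether a task is long enough to be worth a separate Grid session is to time it on its own first. A minimal sketch, assuming the input library from the earlier slides and a hypothetical output data set name:

```sas
LIBNAME IN '\\Server1\Input';          /* path borrowed from the earlier slides */

/* Record the start time in seconds */
%let t0 = %sysfunc(datetime());

proc sort data=in.data1 out=work.data1_sorted;   /* hypothetical output name */
  by key;
run;

/* Report the elapsed wall-clock time for this one task */
%put NOTE: Task elapsed %sysevalf(%sysfunc(datetime()) - &t0) seconds.;
```

If the elapsed time is close to what it takes to sign on to a remote session, running the task in parallel is unlikely to pay off.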

42 G-Tab (Task Dependency): Data Flow
[Diagram: input data, Level-1 Statistics, Level-2 GINI, Level-2 GEOMEAN, Level-3 Statistics, Precision Measures]

43 G-Tab Processing on the Grid
[Diagram: G-Tab splits the input data; grid nodes run Level-1 Statistics, Level-2 Gini, Level-2 GeoMean and Level-3 Statistics; each returns a partial result; the partial results feed Precision Measures and the final result]

44 G-Tab Level-1 Statistics
[Diagram: G-Tab splits the input data; a grid node runs Level-1 Statistics and returns a partial result to Precision Measures]

45 G-Tab (Generalized Tabulation System)
Table specifications (XML):
- Domain variable list (e.g. Region, Province, AgeGroup, Sex)
- Analysis variable list (e.g. Income, Expense)
- Weight variable (e.g. SWeight)
- Bootstrap weight variable specification (e.g. BSW1-BSW1000)
- Statistics:
  - Level-1 (MEAN, MAX, MIN, SUM, N, SUMWGT, MEDIAN, P1, P5, .., P99): calculated by PROC MEANS on micro data
  - Level-2 (GINI, GEOMEAN): calculated by a special algorithm on micro data
  - Level-3 (DISTRIBUTION, PROPORTION, RATIO): calculated from the results of Level-1 statistics. Example (RATIO): MEAN(Income) / MEAN(Expense)

46 G-Tab Sample Input Data
Columns: Province | AgeGroup | Sex | Income | SWeight | BSW1 | BSW2 | ... | BSW1000

47 G-Tab Level-1 Statistics

    Proc means data=.. noprint;
      Class province agegroup sex;
      Var income / weight=sweight;   /* repeated for BSW1 - BSW1000 */
      Output out=.. Mean= ;
    Run;

- Repetitive task
- Split data for parallel processing
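The "repetitive task" is the same PROC MEANS run once per bootstrap weight. A sketch of how one sub-task might loop over its share of the weights is shown below; the macro name, the data set names WORK.MICRO and WORK.L1_n, and the use of NWAY are assumptions for illustration, not taken from G-Tab itself:

```sas
%macro level1(first=, last=);
  %local j;
  %do j = &first %to &last;
    /* Weighted mean of income per domain, using bootstrap weight BSW&j */
    proc means data=work.micro noprint nway;
      class province agegroup sex;
      var income;
      weight bsw&j;
      output out=work.l1_&j (drop=_type_ _freq_) mean=Income&j._Mean;
    run;
  %end;
%mend level1;

/* Sub-task 1 would handle BSW1 to BSW250 */
%level1(first=1, last=250)
```

Each of the four sub-tasks would call the same macro with its own range, which is what makes the work easy to dispatch to separate sessions.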

48 Level-1 Statistics Sub-task(1) Input Data
Columns: Province | AgeGroup | Sex | Income | SWeight | BSW1 | BSW2 | ... | BSW250

49 Level-1 Statistics Sub-task(2) Input Data
Columns: Province | AgeGroup | Sex | Income | SWeight | BSW251 | BSW252 | ... | BSW500

50 Level-1 Statistics Sub-task(3) Input Data
Columns: Province | AgeGroup | Sex | Income | SWeight | BSW501 | BSW502 | ... | BSW750

51 Level-1 Statistics Sub-task(4) Input Data
Columns: Province | AgeGroup | Sex | Income | SWeight | BSW751 | BSW752 | ... | BSW1000
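The four input layouts above amount to keeping the common columns plus one block of 250 bootstrap weights per sub-task. A minimal sketch of that vertical split, assuming the micro data sits in a hypothetical data set WORK.MICRO and the slices are written to WORK.SLICE1 through WORK.SLICE4:

```sas
%macro slice(n_slices=4, n_bsw=1000);
  %local i size first last;
  %let size = %eval(&n_bsw / &n_slices);        /* 250 weights per slice */
  %do i = 1 %to &n_slices;
    %let first = %eval((&i - 1) * &size + 1);
    %let last  = %eval(&i * &size);
    /* Keep the common columns and this slice's bootstrap weights */
    data work.slice&i;
      set work.micro(keep=province agegroup sex income sweight
                          bsw&first-bsw&last);
    run;
  %end;
%mend slice;

%slice()
```

Changing n_slices to 5 would give the 200-weight layout discussed later in the notes, provided the number of weights divides evenly.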

52 Level-1 Statistics Sub-task(1) Results
Columns: Province | AgeGroup | Sex | Income_Mean | Income1_Mean | Income2_Mean | ... | Income250_Mean

53 Level-1 Statistics Sub-task(2) Results
Columns: Province | AgeGroup | Sex | Income251_Mean | Income252_Mean | ... | Income500_Mean

54 Level-1 Statistics Sub-task(3) Results
Columns: Province | AgeGroup | Sex | Income501_Mean | Income502_Mean | ... | Income750_Mean

55 Level-1 Statistics Sub-task(4) Results
Columns: Province | AgeGroup | Sex | Income751_Mean | Income752_Mean | ... | Income1000_Mean
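Since every partial result carries the same domain columns, the four tables can be stitched together side by side. A sketch of that step, assuming the partial results are stored as hypothetical data sets WORK.PART1 to WORK.PART4, each with one row per domain and already sorted by the domain variables:

```sas
/* Combine the four partial results into one wide table,
   matching rows on the domain variables */
data work.level1_results;
  merge work.part1 work.part2 work.part3 work.part4;
  by province agegroup sex;
run;
```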

56 Level-1 Statistics Results
Columns: Province | AgeGroup | Sex | Income_Mean | Income1_Mean | Income2_Mean | ... | Income1000_Mean
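With the results in this wide layout, the bootstrap precision measures can be computed row by row. The sketch below assumes the usual bootstrap variance estimator (the average squared deviation of the bootstrap estimates from the survey-weight estimate), a hypothetical input data set WORK.LEVEL1_RESULTS, and that the Income1_Mean to Income1000_Mean columns are stored next to each other; the output names are illustrative and are not G-Tab's own:

```sas
data work.precision;
  set work.level1_results;

  /* All bootstrap estimates, assumed to be contiguous columns */
  array boot{*} Income1_Mean--Income1000_Mean;

  B  = dim(boot);
  ss = 0;
  do j = 1 to B;
    ss = ss + (boot{j} - Income_Mean)**2;   /* squared deviation from the estimate */
  end;

  Income_VAR = ss / B;
  Income_STD = sqrt(Income_VAR);
  /* CV as a ratio (not a percentage); left missing when the mean is zero */
  if Income_Mean ne 0 then Income_CV = Income_STD / Income_Mean;

  drop j ss B;
run;
```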

57 G-Tab Parallel Processing: Level-1 Results
[Diagram: G-Tab splits the input data across four grid nodes, each running Level-1 on one slice (SWeight with BSW1-BSW250, BSW251-BSW500, BSW501-BSW750, BSW751-BSW1000) and returning a partial result]

58 G-Tab Precision Measures
Let Ŷ be the statistic to be considered. For example, Ŷ can be a mean, a median, a sum, etc.
The variance of Ŷ is given by: [formula not preserved]
where Ŷ_j is the statistic calculated using the j-th bootstrap weight, B is the number of bootstrap weights, and Ŷ is the estimate produced using the survey weight.
Standard Deviation: [formula not preserved]
Coefficient of Variation: [formula not preserved]
The Quality Indicator of the statistic is set based on the above calculations.
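The formulas on this slide were images and did not survive in the transcript. Under the standard bootstrap variance estimator, and using only the symbols defined on the slide, they would plausibly read as follows; the 1/B normalisation and the CV expressed as a ratio rather than a percentage are assumptions:

```latex
\[
\begin{aligned}
\widehat{\mathrm{Var}}(\hat{Y}) &= \frac{1}{B} \sum_{j=1}^{B} \left( \hat{Y}_j - \hat{Y} \right)^2 \\
\mathrm{STD}(\hat{Y}) &= \sqrt{\widehat{\mathrm{Var}}(\hat{Y})} \\
\mathrm{CV}(\hat{Y}) &= \frac{\mathrm{STD}(\hat{Y})}{\hat{Y}}
\end{aligned}
\]
```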

59 Notes
In the example:
- The input data was sliced vertically into 4.
- This gave the best elapsed processing time for average surveys.
- Slicing it into 5 sessions (200 BSW each) took longer to run.
- For bigger volumes, 5 sessions could give better results.
Other considerations:
- Slice the input data horizontally
- Time cycles
Warning: maintain data integrity.

60 Conclusion
- Grid is a sophisticated computer infrastructure for the Enterprise with scalability, load balancing and high availability.
- A SAS program will NOT execute in parallel without any modifications!
- It must first be modified using MP Connect directives to run in parallel.
- Long running jobs should be broken into smaller tasks and dispatched to the grid.
- Parallel processing will reduce the overall elapsed processing time.
The Future Of Grid Computing Is Now Here!

61 Questions / Comments

Greg McLean
Project Leader, System Engineering Division, Statistics Canada
Jean Talon Building, 5th Floor, Section A6
170 Tunney's Pasture Driveway, Ottawa, Ont., K1A 0T6
(613) 951-2396
Greg.McLean@statcan.gc.ca

Vecdet Mehmet-Ali
Project Leader, System Engineering Division, Statistics Canada
Jean Talon Building, 5th Floor, Section A2
170 Tunney's Pasture Driveway, Ottawa, Ont., K1A 0T6
(613) 951-2390
Vecdet.Mehmet-Ali@statcan.gc.ca
