Presentation transcript: "Parallel Programming on EGEE: Best practices"

1 Parallel Programming on EGEE: Best practices. Gergely Sipos, MTA SZTAKI. (EGEE-II INFSO-RI-031688; www.eu-egee.org; EGEE and gLite are registered trademarks.)

2 Outline
– Parallel computing architectures: supercomputers, clusters, the EGEE grid
– Functional vs. data parallelism
– Patterns and best practices for data parallelism: from jobs, to master-slave, to workflow

3 Traditional distributed architectures: shared memory
– Multiple processors operate independently but share the same memory resources.
– Only one processor can access a given shared memory location at a time; mutual exclusion is provided at the system level.
– Synchronization is achieved by controlling tasks' reading from and writing to the shared memory.
– Typical architecture of supercomputers.
(Diagram: several CPUs attached to one shared memory.)
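To make the shared-memory model concrete (this example is not from the slides, just an illustration), here is a minimal Python sketch: several threads share one counter, and a lock provides the system-level mutual exclusion described above.

```python
import threading

counter = 0                      # the shared memory location
lock = threading.Lock()          # system-level mutual exclusion

def worker(iterations):
    global counter
    for _ in range(iterations):
        with lock:               # only one thread may touch the shared location at a time
            counter += 1

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()                     # synchronization: wait for all workers
print(counter)                   # 400000, deterministic only because of the lock
```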

4 Traditional distributed architectures: distributed memory
– Multiple processors operate independently; each has its own private memory.
– Data is shared across a network using message passing; the user is responsible for synchronization via message passing.
– Typical architecture of clusters.
(Diagram: CPU/memory pairs connected by a network.)
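A minimal message-passing sketch, assuming the mpi4py bindings and an MPI runtime are available (an illustration of the model, not EGEE middleware): each process keeps private data and shares it only through explicit send and receive calls.

```python
# Run with an MPI launcher, e.g.: mpirun -n 2 python this_script.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    private_data = {"payload": [1, 2, 3]}   # lives only in rank 0's memory
    comm.send(private_data, dest=1, tag=0)  # explicit, user-managed communication
elif rank == 1:
    received = comm.recv(source=0, tag=0)   # synchronization point
    print("rank 1 received:", received)
```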

5 References
SP Parallel Programming Workshop, Parallel Programming Introduction: http://www.mhpcc.edu/training/workshop/parallel_intro/MAIN.html

6 EGEE architecture
In some respects a distributed memory system:
– Multiple processors operate independently; each Computing Element (CE) has its own private memory and disk.
– Direct communication between CEs (as in MPICH-style message passing) is not possible.
In some respects a shared memory system:
– Central services share data between CEs, i.e. between jobs (e.g. Storage Elements).
– Communication through central services must be handled at the user level: no mutual exclusion, locking, etc.
(Diagram: Computing Elements connected by a network to "shared memory" services: Storage Elements, the LFC catalog, the AMGA database, the User Interface, ...)

7 Functional vs. data parallelism
Functional decomposition (functional parallelism):
– Decompose the problem into different jobs that can be distributed to multiple CEs for simultaneous execution.
– Different code runs on different CEs.
– Good to use when there is no static structure or fixed determination of the number of calculations to be performed.
Domain decomposition (data parallelism):
– Partition the problem's data domain and distribute the portions to multiple instances of the same job for simultaneous execution.
– The same code runs on different CEs, processing different data.
– Good to use for problems where:
  - the data is static (e.g. factoring, solving large matrices, finite difference calculations, parameter studies);
  - a dynamic data structure is tied to a single entity that can be subsetted (large multi-body problems);
  - the domain is fixed but computation within various regions of the domain is dynamic (fluid vortices models).
More than 90% of grid applications employ data parallelism (parameter studies).
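A small local illustration of the two decompositions, using a process pool and hypothetical stand-in tasks (the function names are invented for the example):

```python
from multiprocessing import Pool

# Hypothetical stand-in tasks; the point is the decomposition pattern.
def preprocess(x): return x + 1
def simulate(x):   return x * x
def analyse(x):    return x - 1

if __name__ == "__main__":
    data = list(range(8))
    with Pool(4) as pool:
        # Functional decomposition: different code runs concurrently.
        parts = [pool.apply_async(f, (data[0],)) for f in (preprocess, simulate, analyse)]
        print([p.get() for p in parts])   # [1, 0, -1]
        # Domain decomposition: the same code runs on different data portions.
        print(pool.map(simulate, data))   # [0, 1, 4, 9, 16, 25, 36, 49]
```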

8 Functional parallelism
(Diagram: the problem is split into Jobs 1-4, each running different code on a different Computing Element, over time.)
Note: the same problem size does not guarantee equal execution time.

9 Intra-job communication
(Diagram: two jobs on different Computing Elements, each with its own memory and disk, exchange data through a central service over time.)
Jobs communicate through central services:
– GFAL API or lcg-* commands for Storage Elements
– AMGA API for the AMGA database
– LFC API for the LFC catalog
– Sandboxes for the User Interface

10 Data parallelism
(Diagram: the problem is split into Jobs 1-4, each running the same code on a different Computing Element over a different portion of the data.)
Note: the same problem size does not guarantee equal execution time on the Grid.

11 Data parallelism: master-slave paradigm
(Diagram: a master job feeds local input to slave jobs and merges their results into the final result.)
The master is a user process running on the UI or on a central server, such as the WMS, the P-GRADE Portal server, a GANGA server or a GridWay server.

12 Structure of the master
The master cycles through five steps, talking to the middleware via job submit, check job status and get job output operations:
– Generate inputs
– Spawn slaves (job submit)
– Monitor slaves (check job status)
– Collect results (get job output)
– Generate the final result
A local sketch of this loop follows.
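The same five steps, demonstrated locally with a process pool standing in for Computing Elements (a sketch; on EGEE the spawn/monitor/collect steps would go through the middleware's job submit, status and output operations):

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

def slave(chunk):
    # Stand-in for the slave job's computation on its portion of the data.
    return sum(x * x for x in chunk)

def master(problem_data, n_slaves):
    chunks = [problem_data[i::n_slaves] for i in range(n_slaves)]  # generate inputs
    with ProcessPoolExecutor(max_workers=n_slaves) as pool:
        futures = [pool.submit(slave, c) for c in chunks]          # spawn slaves
        partials = [f.result() for f in as_completed(futures)]     # monitor + collect
    return sum(partials)                                           # generate final result

if __name__ == "__main__":
    print(master(list(range(1000)), 4))   # 332833500
```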

13 Data distribution techniques
One-dimensional distribution of a parameter p:
– Block distribution: each job gets one contiguous block of the parameter range (e.g. 3 jobs, one block each, going to slaves 1-3).
– Cyclic distribution: with n > 3 jobs, parameter values are dealt out round-robin.
Two-dimensional distribution of parameters p and q: block and cyclic can be combined per dimension (block-block, block-cyclic, cyclic-block).
A sketch of the one-dimensional schemes follows.
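The one-dimensional schemes are easy to express in code; a minimal sketch (parameter values and job count are illustrative):

```python
def block_distribution(params, n_jobs):
    # Contiguous slices: job i gets one block of the parameter range.
    size, rem = divmod(len(params), n_jobs)
    blocks, start = [], 0
    for i in range(n_jobs):
        end = start + size + (1 if i < rem else 0)
        blocks.append(params[start:end])
        start = end
    return blocks

def cyclic_distribution(params, n_jobs):
    # Round-robin: job i gets every n_jobs-th parameter, starting at i.
    return [params[i::n_jobs] for i in range(n_jobs)]

p = list(range(9))
print(block_distribution(p, 3))   # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(cyclic_distribution(p, 3))  # [[0, 3, 6], [1, 4, 7], [2, 5, 8]]
```

The two-dimensional schemes simply apply one of these functions per dimension.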

14 Choose the number of jobs carefully
Fewer jobs, i.e. longer jobs:
– Smaller submission overhead: middleware overhead is 5-10 minutes per job, plus 0-X minutes of waiting-queue overhead per job, depending on the VO.
– Unequal utilization of resources: slow and fast resources must do the same amount of work.
More jobs, i.e. shorter jobs:
– Better load balancing: faster machines do more, so the overall execution time can be shorter.
– Larger total submission overhead.
A back-of-envelope model of this trade-off is sketched below.
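A back-of-envelope model of the trade-off, assuming the master submits jobs sequentially and using overhead figures in the spirit of the slide (all numbers are illustrative, not measurements):

```python
def makespan(total_work_min, n_jobs, submit_min=0.5, queue_min=7.5):
    # Sequential submission cost grows with n; per-job work shrinks with n;
    # queue/middleware overhead is paid once all jobs run in parallel.
    return n_jobs * submit_min + queue_min + total_work_min / n_jobs

if __name__ == "__main__":
    work = 600.0  # 10 hours of total computation
    best = min(range(1, 201), key=lambda n: makespan(work, n))
    # The optimum is near sqrt(work / submit_min), about 35 jobs here.
    print(best, round(makespan(work, best), 1))
```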

15 Distribution of large data sets
– Slaves receive only data references (e.g. LFNs) from the master and download the real data from a Storage Element, AMGA, etc.
– Slaves put their results into Storage Elements, AMGA, etc. and return references to them.
The master's loop is unchanged (generate local inputs, spawn slaves via job submit, monitor slaves via job status, collect local results via job output, generate the result); only references (e.g. LFNs) travel between master and slaves, while the inputs and results themselves go through the central services.
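A sketch of this pass-by-reference pattern using the lcg-* commands named on the earlier slides; the VO name and paths are hypothetical, and exact command options may differ per installation:

```python
import subprocess

VO = "myvo"  # hypothetical VO name

def upload(local_path, lfn):
    # Master side: copy the file to a Storage Element and register it in
    # the LFC catalog; only the LFN (the reference) is sent to the slave.
    # local_path must be absolute for the file: URL.
    subprocess.run(["lcg-cr", "--vo", VO, "-l", f"lfn:{lfn}",
                    f"file:{local_path}"], check=True)

def download(lfn, local_path):
    # Slave side: resolve the reference in the catalog and fetch the real
    # data from the Storage Element.
    subprocess.run(["lcg-cp", "--vo", VO, f"lfn:{lfn}",
                    f"file:{local_path}"], check=True)

# Hypothetical usage on the master:
# upload("/home/user/input-0.dat", "/grid/myvo/run42/input-0.dat")
```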

16 Multi-level master-slave
Slaves can themselves act as masters: each middle-level job generates inputs, spawns and monitors its own slaves, and collects their results (via job submit, check job status and get job output), before the top-level master collects everything and generates the final result.
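The multi-level structure can be demonstrated locally by nesting the master loop from slide 12 (a sketch; threads stand in for the top-level slave jobs purely to keep the local example simple, whereas on the grid every level is a job):

```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def leaf_slave(chunk):
    return sum(x * x for x in chunk)

def mid_level_master(chunk, n_slaves=2):
    # A slave of the top-level master, and a master of its own slaves.
    subchunks = [chunk[i::n_slaves] for i in range(n_slaves)]
    with ProcessPoolExecutor(max_workers=n_slaves) as pool:
        return sum(pool.map(leaf_slave, subchunks))

if __name__ == "__main__":
    data = list(range(1000))
    chunks = [data[i::4] for i in range(4)]
    with ThreadPoolExecutor(max_workers=4) as top:       # top-level master
        print(sum(top.map(mid_level_master, chunks)))    # 332833500
```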

17 Complex master-slave
A master can coordinate several groups of slaves, each group with its own generate inputs / spawn slaves / monitor slaves / collect results cycle, before the final result is generated.

18 Complex master-slave = workflow
Such a complex master-slave structure is a workflow: a workflow manager takes over the master's role, generating local inputs, spawning and monitoring the slave jobs, collecting local results and generating the result, with the inputs of one stage fed by the results of another.

19 Workflow managers
Mechanisms to tie the pieces of an application together in standard ways. Better than doing it yourself:
– Workflow systems handle many of the gritty details. You could implement them yourself, but you would do it very badly (trust me).
– They provide useful additional functionality beyond basic plumbing, such as failure management, resubmission and data conversion; a minimal sketch of resubmission follows below.
Requirements differ per scientific discipline and per application:
– Support for multiple levels of parallelization
– Data semantics and/or control flow semantics
– Monitoring (especially for long-running workflows)
– ...
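For example, failure management and resubmission, one of those gritty details, looks roughly like this minimal sketch (submit and wait_for_final_state are hypothetical wrappers around the middleware's job submit and status operations):

```python
def run_with_retries(jdl_file, submit, wait_for_final_state, max_attempts=3):
    # Resubmit a job until it succeeds or the attempt budget is exhausted.
    for attempt in range(1, max_attempts + 1):
        job_id = submit(jdl_file)
        final = wait_for_final_state(job_id)   # blocks until a terminal state
        if final == "Done":
            return job_id
        print(f"attempt {attempt} ended in {final}; resubmitting")
    raise RuntimeError(f"{jdl_file} failed after {max_attempts} attempts")
```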

20 Many workflow systems exist for different grid middleware, for example: Askalon, Bigbross Bossa, BEA's WLI, BioPipe, BizTalk, BPWS4J, Breeze, Carnot, Con:cern, DAGMan, DiscoveryNet, Dralasoft, Enhydra Shark, FileNet, Fujitsu's i-Flow, GridAnt, Grid Job Handler, GRMS (GridLab Resource Management System), Microsoft WWF, Moteur, NetWeaver, Oakgrove's Reactor, ObjectWeb Bonita, OFBiz, OMII-BPEL, Open Business Engine, Oracle's integration platform, OSIRIS, OSWorkflow, OpenWFE, Q-Link, Pegasus, Pipeline Pilot, Platform Process Manager, P-GRADE, PowerFolder, Ptolemy II, Savvion, SeeBeyond, GWFE, GWES, IBM's Holosofx tool, IT Innovation Enactment Engine, ICENI, InforSense, Intalio, jBPM, JIGSA, JOpera, Kepler, Karajan, Lombardi, Sonic's orchestration server, Staffware, ScyFLOW, SDSC Matrix, SHOP2, Swift, Taverna, Triana, Twister, Ultimus, Versata, WebMethods' process modeling, wftk, XFlow, YAWL Engine, WebAndFlo, Wildfire, Werkflow, wfmOpen, WFEE, ZBuilder, ...

21 The same list again, with the systems relevant to this course highlighted: those usable with the gLite WMS, those used by the EGEE Biomed community, and those from the EGEE-related DILIGENT project.

22 During the course
– gLite WMS: parametric jobs (master-slave); DAGs (workflows)
– GANGA: parameter studies (master-slave)
– P-GRADE Portal: workflows; parameter studies (master-slave); workflow-based parameter studies

23 Thank you. Questions?

24 Case studies
General parallel-programming case studies, not related to EGEE:
– Numeric Weather Prediction Model, developed by Glenn Wightwick of IBM Australia Science & Technology (2D mesh): http://www.mhpcc.edu/training/workshop/parallel_intro/nwp_case_study.html
– Monte Carlo Cellular Microphysiology (parameter study): http://whitepapers.techrepublic.com.com/casestudy.aspx?docid=109125

