A Dynamic World: What Can Grids Do for Multi-Core Computing?
Daniel Goodman, Anne Trefethen and Douglas Creager
What we will cover
- Why we think that cluster programming models are not always enough for multi-core computing
- Why we think that Grid programming models are, in many cases, more appropriate
- A quick look at some programming models that have worked well in grids and that we believe could be useful in multi-core environments
- Where some of these ideas are reappearing in models for multi-core computing
Assumptions when programming clusters
- Nodes within an allocated set are all homogeneous, both in their configuration and in the loads placed on them
- Once nodes have been allocated to a process, they will not be used by any other user process until the first one finishes
Assumptions when programming clusters (continued)
- Outside of very tightly coupled tasks running on very large numbers of processors, the noise caused by background tasks running on a node has a minimal effect on user processes
- Because all nodes run the same background tasks, large supercomputers are able to handle the problem of background tasks through centralised control of when such tasks execute
Models for Programming Clusters
- Message passing: MPI
- Shared memory: OpenMP
- Embarrassingly parallel batch jobs
Properties of Multi-core systems
- Cores will be shared dynamically with a wide range of other applications
- Load can no longer be considered homogeneous across the cores
- Cores will likely be heterogeneous, as accelerators become common in scientific hardware
- Source code will often be unavailable, preventing compilation against the specific hardware configuration
Multi-core processor with all nodes allocated to each task
[Figure (animated over several slides); legend: Idle, Task A, Task B (Single Threaded)]
Multi-core processor where allocated nodes can change
[Figure (animated over several slides); legend: Idle, Task A, Task B (Single Threaded)]
Map-Reduce
- Developed by Google to simplify writing analysis functions that execute in its heterogeneous distributed computing environments
- Built around ideas drawn from functional programming
- Has allowed huge amounts of computing power, spanning many distributed resources, to be harnessed easily
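To make the map-reduce model concrete, here is a minimal single-machine sketch in Python, assuming a word-count job. It is not Google's implementation or API: `map_reduce`, `word_count_mapper` and `word_count_reducer` are hypothetical names, and a thread pool stands in for distributed workers. The point is that the user writes only two side-effect-free functions, and the framework is free to run each map (and each reduce) wherever resources happen to be free.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_reduce(inputs, mapper, reducer, workers=4):
    """Single-machine sketch of the map-reduce model (hypothetical, not Google's code)."""
    # Map phase: every input is independent, so any free worker can take it.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(mapper, inputs))

    # Shuffle phase: group the intermediate (key, value) pairs by key.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce phase: each key is independent too and could also be reduced in parallel.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line):
    # Emit (word, 1) for every word in one input record.
    return [(word, 1) for word in line.split()]


def word_count_reducer(word, counts):
    # Combine all the partial counts for one word.
    return sum(counts)


counts = map_reduce(["the cat sat", "the dog sat"], word_count_mapper, word_count_reducer)
print(counts)  # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```

Because neither function depends on how many workers exist, the same job runs unchanged whether the pool has one thread or many.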
Boinc
- Developed by David Anderson at Berkeley
- An abstracted version of the framework behind the project
- Designed to make the construction and management of trivially parallel tasks trivial
- Used by a range of other projects, including climateprediction.net
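One way a volunteer-computing framework can cope with unreliable, heterogeneous clients is redundant computation: BOINC can issue the same work unit to several clients and accept a result only when a quorum agrees. The sketch below illustrates that idea under heavy simplification. Clients are plain Python functions standing in for remote machines, issuing is round-robin, and `run_work_units`, `quorum` and `max_issues` are hypothetical names, not BOINC's server API.

```python
from collections import Counter, deque


def run_work_units(payloads, clients, quorum=2, max_issues=10):
    """Issue each work unit to `quorum` clients and accept the result only when
    they all agree; otherwise reissue it. Clients are stand-in functions for
    remote volunteer machines (a hypothetical simplification)."""
    pending = deque(enumerate(payloads))
    issues = Counter()  # how many times each unit has been sent out
    accepted = {}
    next_client = 0
    while pending:
        unit_id, payload = pending.popleft()
        issues[unit_id] += 1
        if issues[unit_id] > max_issues:
            raise RuntimeError(f"no agreement for work unit {unit_id}")
        votes = Counter()
        for _ in range(quorum):
            client = clients[next_client % len(clients)]
            next_client += 1
            votes[client(payload)] += 1
        result, agreeing = votes.most_common(1)[0]
        if agreeing >= quorum:
            accepted[unit_id] = result
        else:
            pending.append((unit_id, payload))  # disagreement: send it out again
    return [accepted[i] for i in range(len(payloads))]


def square(x):
    return x * x


def faulty(x):
    return -1  # a misbehaving client that always returns a wrong answer


print(run_work_units([1, 2, 3], [square, square, faulty]))
```

The project author only supplies the per-unit function and the inputs; replication, reissuing and validation stay inside the framework, which is what keeps trivially parallel work trivial to manage.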
Martlet
- Developed for the analysis of data produced by the climateprediction.net project
- Based on ideas from functional programming
- Able to adjust the workflow dynamically to adapt to changing numbers of resources and data distributions
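The functional-programming idea that lets a workflow adapt to a data distribution known only at run time can be sketched with a hypothetical example: computing a mean over data split into an unknown number of partitions. This is plain Python, not Martlet syntax, and `partial_stats`, `combine` and `distributed_mean` are illustrative names. Each partition returns a combinable partial result (sum, count) rather than a partial mean, so the same code works for any number of pieces.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partial_stats(partition):
    # Runs wherever this partition lives. It returns (sum, count) rather than a
    # mean, because partial means cannot be combined directly.
    return (sum(partition), len(partition))


def combine(a, b):
    # Associative merge of two partial results, so they can be folded in any grouping.
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(partitions, workers=None):
    """Mean over data split into however many partitions exist at run time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_stats, partitions))
    total, count = reduce(combine, partials, (0, 0))
    return total / count


print(distributed_mean([[1, 2, 3], [4, 5], [6]]))  # 3.5
print(distributed_mean([[1, 2, 3, 4, 5, 6]]))      # 3.5
```

The user-facing function never mentions how many partitions or workers there are; that number is supplied only when the data is known.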
Grid-GUM and GpH
- Grid-GUM is a platform to support Glasgow parallel Haskell (GpH) in a Grid environment
- The programmer defines places where programs could potentially have multiple threads executing
- Each processor has a thread and uses work stealing where possible to handle the dynamic and heterogeneous nature of tasks and resources
- Intelligent scheduling reduces communication between disparate resources, e.g. between machines or clusters
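Work stealing can be illustrated with a small Python sketch. This is not how Grid-GUM or GpH are implemented: `run_with_work_stealing` is a hypothetical helper, and because of CPython's global interpreter lock the threads here show the scheduling pattern rather than real CPU speed-up. Each worker owns a double-ended queue, takes its newest task from one end, and when it runs dry steals the oldest task from the other end of another worker's queue, so workers on busy or slow cores simply end up doing less.

```python
import random
import threading
from collections import deque


def run_with_work_stealing(tasks, num_workers=4):
    """Run zero-argument callables on worker threads that steal from each other."""
    queues = [deque() for _ in range(num_workers)]
    locks = [threading.Lock() for _ in range(num_workers)]
    for i, task in enumerate(tasks):
        queues[i % num_workers].append((i, task))  # initial round-robin split
    results = [None] * len(tasks)

    def take(me):
        # Prefer our own queue, newest task first (LIFO).
        with locks[me]:
            if queues[me]:
                return queues[me].pop()
        # Otherwise steal the oldest task from another worker (FIFO end).
        victims = [v for v in range(num_workers) if v != me]
        random.shuffle(victims)
        for victim in victims:
            with locks[victim]:
                if queues[victim]:
                    return queues[victim].popleft()
        return None

    def worker(me):
        # Tasks never create new tasks here, so "nothing found anywhere" means done.
        while (item := take(me)) is not None:
            index, task = item
            results[index] = task()

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_with_work_stealing([lambda i=i: i * i for i in range(10)]))
```

Taking from opposite ends keeps the owner and the thief from contending for the same task most of the time, which is the usual reason for this design.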
Styx Grid Services
- Developed at Reading University to analyse huge amounts of environmental data
- Built on top of the Styx protocol, originally developed for the Plan 9 operating system
- Allows the effective construction of workflows that pipeline processes
- Reduces the amount of data active in the system at any one time, and improves the performance of many multi-stage analysis techniques
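Why pipelining reduces the data active in the system can be shown with a small sketch of concurrent stages joined by bounded queues. It does not use the Styx protocol; `run_pipeline` and `buffer_size` are hypothetical names, and Python threads stand in for separate services. Each stage starts work on an item as soon as the previous stage emits it, and a stage that gets ahead blocks once `buffer_size` items are waiting, so the full data set is never materialised between stages.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the stream


def _stage(func, inbox, outbox):
    """Apply func to each item as it arrives and pass the result downstream."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(func(item))


def run_pipeline(source, funcs, buffer_size=2):
    """Run funcs as concurrent stages connected by bounded queues.
    No error handling: a stage that raises would stall the pipeline."""
    queues = [queue.Queue(maxsize=buffer_size) for _ in range(len(funcs) + 1)]
    stages = [threading.Thread(target=_stage, args=(f, queues[i], queues[i + 1]))
              for i, f in enumerate(funcs)]

    def feed():
        for item in source:
            queues[0].put(item)  # blocks while the first stage is behind
        queues[0].put(_DONE)

    feeder = threading.Thread(target=feed)
    for t in stages + [feeder]:
        t.start()

    results = []
    while (item := queues[-1].get()) is not _DONE:
        results.append(item)
    for t in stages + [feeder]:
        t.join()
    return results


print(run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 2]))  # [2, 4, 6, 8, 10]
```

Since each stage processes items in arrival order, the output order matches the input order.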
Abstract Grid Workflow Language (AGWL)
- XML-based workflow language developed to hide much of the system-dependent complexity
- AGWL never contains descriptions of data transfer, data partitioning, or the locations of hardware
- At runtime, the underlying system examines the available resources and compiles the workflow into Concrete Grid Workflow Language, automatically adding the missing detail
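The abstract-to-concrete step can be sketched in a few lines. This uses Python data structures, not AGWL's XML, and `Activity`, `ConcreteStep`, `concretise` and the round-robin placement policy are all hypothetical. The abstract workflow states only what runs and what depends on what; host bindings and data transfers are added only once the list of available hosts is known.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    """Abstract step: what to run and which activities it needs data from."""
    name: str
    depends_on: list = field(default_factory=list)


@dataclass
class ConcreteStep:
    """Concrete step: the same activity plus where it runs and what must be moved."""
    name: str
    host: str
    transfers: list  # (source_host, producing_activity) pairs


def concretise(abstract, hosts):
    """Bind activities (given in dependency order) to hosts known only at run time,
    and add the data transfers implied by the chosen placement."""
    placement, steps = {}, []
    for i, activity in enumerate(abstract):
        host = hosts[i % len(hosts)]  # hypothetical placement policy: round-robin
        placement[activity.name] = host
        transfers = [(placement[dep], dep) for dep in activity.depends_on
                     if placement[dep] != host]
        steps.append(ConcreteStep(activity.name, host, transfers))
    return steps


workflow = [Activity("fetch"), Activity("clean", ["fetch"]), Activity("plot", ["clean"])]
for step in concretise(workflow, ["hostA", "hostB"]):
    print(step)
```

Running the same abstract workflow with a single host yields no transfers at all, which is the sense in which the abstract description stays free of data movement and hardware locations.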
Programming Multi-Core
Some ideas that appear in these projects are also appearing elsewhere. These include:
- Microsoft's LINQ constructs
- Codeplay's Sieve constructs
- Intel's Threading Building Blocks API
- Dynamic DAG generation
LINQ
Based on the lambda calculus and now part of the .NET Framework, LINQ is intended to provide a uniform way of accessing and applying functions to data stored in different data structures. This allows not only the easy construction of pipelines but also the automatic construction of parallel pipelines. This has much in common with Styx Grid Services.
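The following Python sketch imitates the LINQ style: the same chain of operators applies to any iterable, evaluation is deferred until the result is requested, and switching the chain to parallel execution does not change the query itself. It is not .NET code. `Query`, `where`, `select`, `as_parallel` and `to_list` are hypothetical names (PLINQ's real entry point is `AsParallel`), and only `select` is parallelised here, on a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


class Query:
    """A tiny LINQ-flavoured wrapper: the same operators work over any iterable."""

    def __init__(self, source, parallel=False):
        self._source = source
        self._parallel = parallel

    def where(self, predicate):
        # Lazy filter: nothing runs until the query is consumed.
        return Query((x for x in self._source if predicate(x)), self._parallel)

    def select(self, selector):
        if self._parallel:
            source = self._source

            def run():
                # Elements are independent, so the projection can be spread over threads;
                # pool.map still yields results in input order.
                with ThreadPoolExecutor() as pool:
                    yield from pool.map(selector, source)

            return Query(run(), self._parallel)
        return Query((selector(x) for x in self._source), self._parallel)

    def as_parallel(self):
        return Query(self._source, parallel=True)

    def to_list(self):
        return list(self._source)


evens_squared = Query(range(10)).where(lambda x: x % 2 == 0).select(lambda x: x * x)
print(evens_squared.to_list())  # [0, 4, 16, 36, 64]
```

Only the `as_parallel()` call distinguishes the parallel pipeline from the sequential one; the query body is identical.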
Sieve
Sieve is a range of language constructs and a supporting compiler that allow users to construct a range of parallel programming patterns. These patterns include marking points where the code can be split, and automatically managing a pool of threads, complete with work stealing, to execute this code. This is the same pattern used by Grid-GUM and Glasgow Parallel Haskell.
Threading Building Blocks
Intel's Threading Building Blocks is an API supporting a range of different parallel programming models. These include divide-and-conquer methods and batch methods that produce tasks to be handled by a thread pool, allowing dynamic load balancing. These are very similar to Boinc, Martlet and Map-Reduce.
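The divide-and-conquer pattern can be sketched in Python. It is loosely modelled on the idea behind TBB's `parallel_reduce` with a grain size, but it is not TBB's API: `split_range`, `parallel_reduce`, `grain` and `workers` are hypothetical names here. The range is halved recursively into many more pieces than there are threads, and a thread pool works through them, so a thread that finishes early simply takes the next piece; that is where the dynamic load balancing comes from.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(lo, hi, grain):
    """Divide: recursively halve [lo, hi) until each piece has at most `grain` items."""
    if hi - lo <= grain:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return split_range(lo, mid, grain) + split_range(mid, hi, grain)


def parallel_reduce(data, body, combine, identity, grain=1000, workers=4):
    """Conquer: run `body` on each piece in a thread pool, then fold the partial results."""
    if grain < 1:
        raise ValueError("grain must be at least 1")
    pieces = split_range(0, len(data), grain)
    result = identity
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(lambda p: body(data[p[0]:p[1]]), pieces):
            result = combine(result, partial)
    return result


print(parallel_reduce(list(range(10001)), sum, lambda a, b: a + b, 0, grain=100))  # 50005000
```

Choosing the grain size trades scheduling overhead (too many tiny pieces) against load balance (too few large pieces).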
Dynamic Dependency Analysis
- Work carried out at a range of institutions, including the University of Tennessee and Oak Ridge National Laboratory
- Takes code written in a high-level language and dynamically converts it into a DAG of dependent tasks
- This can automatically generate thousands of tasks, which can be scheduled to try both to keep all the cores busy all the time and to adapt to changing resources
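A sketch of the idea, assuming each task declares which data it reads and writes. This is a simplification of what such systems infer from the program, and `build_dag` and `run_dag` are hypothetical names. Dependencies come from read-after-write, write-after-write and write-after-read hazards, discovered in program order. The scheduler then submits every task whose predecessors have finished, so whatever cores are free are always given ready work.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def build_dag(tasks):
    """tasks: (name, reads, writes) in program order. Returns {name: predecessors}.
    A task depends on the last writer of anything it reads or writes
    (read-after-write, write-after-write) and on earlier readers of anything
    it overwrites (write-after-read)."""
    deps, last_writer, readers = {}, {}, {}
    for name, reads, writes in tasks:
        preds = {last_writer[d] for d in (*reads, *writes) if d in last_writer}
        for d in writes:
            preds |= readers.get(d, set())
        deps[name] = preds - {name}
        for d in reads:
            readers.setdefault(d, set()).add(name)
        for d in writes:
            last_writer[d] = name
            readers[d] = set()  # new version of d: earlier readers no longer matter
    return deps


def run_dag(deps, actions, workers=4):
    """Submit each task once all its predecessors have finished; returns completion order."""
    waiting = {name: set(p) for name, p in deps.items()}
    running, order = {}, []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while waiting or running:
            for name in [n for n, p in waiting.items() if not p]:
                del waiting[name]
                running[pool.submit(actions[name])] = name
            if not running:
                raise ValueError("dependency cycle")
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                name = running.pop(fut)
                fut.result()  # re-raise any error from the task
                order.append(name)
                for p in waiting.values():
                    p.discard(name)
    return order


tasks = [("a", [], ["x"]), ("b", ["x"], ["y"]), ("c", ["x"], ["z"]), ("d", ["y", "z"], ["w"])]
deps = build_dag(tasks)
print(deps)  # b and c depend only on a, so they may run concurrently
print(run_dag(deps, {name: (lambda: None) for name in deps}))
```

The order of `b` and `c` in the completion list can vary between runs, which is exactly the freedom the scheduler exploits when resources change.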
Conclusions
- Multi-core machines will operate in a much more heterogeneous and dynamic environment than clusters do today
- Some areas of grid computing have already started looking at the problems associated with such environments
- Some approaches to programming multi-core machines already include some of these ideas
- Functional programming ideas appear frequently
- It is important that we remember why we must include such functionality in the models