Dynamic BSP: Towards a Flexible Approach to Parallel Computing over the Grid
Jeremy Martin, Alex Tiskin

Topics
- The promise of the Grid
- The BSP programming model
- How the Grid differs from BSP
- Introducing 'Dynamic BSP'
- Example: Strassen's algorithm

The promise of the Grid
The WWW is a vast, distributed information resource. The Grid will harness the internet's untapped processing power as well as its information content, e.g. the ScreenSaver LifeSaver computational chemistry project for cancer research. Affordable supercomputing-on-demand.

The BSP programming model
We need better programming models to utilise the Grid effectively for problems that are not "embarrassingly parallel". The BSP model (s, p, l, g): a set of identical processors, communicating asynchronously by remote memory transfer, with global barrier synchronisation ensuring data consistency. Performance and scalability can be predicted prior to implementation. BSP is widely used to program supercomputers and networks of workstations (NOWs).
[Diagram: four processors alternating local computation and global barrier synchronisation over time]
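The claim of predictability rests on BSP's cost model. A standard formulation (not spelled out on the slide) charges each superstep separately for computation, communication and the barrier:

$$T \;=\; \sum_{i=1}^{S}\bigl(w_i + g\,h_i + l\bigr),$$

where $w_i$ is the maximum local computation performed by any processor in superstep $i$, $h_i$ is the maximum number of words any processor sends or receives, $g$ is the per-word communication cost, and $l$ is the barrier synchronisation cost.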

How the Grid differs from BSP
- Processor heterogeneity: architectural differences; time-dependent resource sharing.
- Network heterogeneity: BSP performance is usually constrained by the slowest communication link in the network.
- Reliability and availability: processors may fail or be withdrawn by the service provider.

Introducing 'Dynamic BSP'
Building on previous work (e.g. Vasilev 2003, Tiskin 1998, Sarmenta 1999, Nibhanupudi & Szymanski 1996), the essence of our approach is to use a task farm together with parallel slackness:
- A problem is partitioned onto N 'virtual processors', such that N >> p (the number of available physical processors).
- Virtual processors are scheduled to run on physical processors by a fault-tolerant task farm.
- Unlike standard BSP, there is no persistence of data at processor nodes between supersteps; instead, a fault-tolerant, distributed virtual shared memory is implemented.
- Any existing BSP algorithm could be implemented using this approach, but the cost prediction would differ because of the additional communication.
- We also allow the dynamic creation of child processes during a superstep.
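To make the fault-tolerant task farm concrete, here is a minimal sketch in Python, assuming a hypothetical Worker handle (is_idle/submit/poll) that stands in for a remote grid processor; it illustrates the scheduling discipline only and is not the authors' implementation:

```python
import queue
import time

TIMEOUT = 30.0  # seconds before a task is presumed lost (hypothetical value)

class Worker:
    """Stand-in for a remote grid processor (assumed API, for illustration)."""
    def __init__(self):
        self.current = None
    def is_idle(self):
        return self.current is None
    def submit(self, task):
        self.current = task        # real version: ship the task id to the node
    def poll(self, task):
        result, self.current = self.current, None
        return result              # real version: check for a returned result

def run_superstep(tasks, workers):
    """Run one BSP superstep's worth of virtual-processor tasks on a farm."""
    pending = queue.Queue()
    for t in tasks:
        pending.put(t)
    in_flight = {}                 # task -> (worker, start time)
    done = set()
    while len(done) < len(tasks):
        # Hand out work to idle workers.
        for w in workers:
            if w.is_idle() and not pending.empty():
                t = pending.get()
                w.submit(t)        # the worker downloads its inputs from the
                in_flight[t] = (w, time.time())  # distributed shared memory
        # Collect results and detect failed or withdrawn processors.
        for t, (w, started) in list(in_flight.items()):
            if w.poll(t) is not None:
                done.add(t)        # results are written back to the shared memory
                del in_flight[t]
            elif time.time() - started > TIMEOUT:
                del in_flight[t]   # processor presumed dead or withdrawn
                pending.put(t)     # reschedule the virtual processor elsewhere
    # Reaching here is the barrier: every virtual processor has finished.

run_superstep(list(range(12)), [Worker() for _ in range(3)])  # N = 12 >> p = 3
```

Re-queueing a timed-out virtual processor is safe precisely because the data lives in the distributed shared memory rather than on the failed node.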

Standard BSP computation vs. Dynamic BSP computation
[Diagram: a standard BSP computation on six processors separated by global barriers, contrasted with a Dynamic BSP computation in which a master processor farms virtual processors VP1-VP6 out to three grid processors; grid processor 3 dies while running VP3, the task times out at the master and is reassigned to grid processor 2. Not shown: distributed shared memory nodes, dynamic process spawning.]

Example: Strassen's algorithm
Strassen discovered an efficient method for calculating $C = AB$, where $A$ and $B$ are square matrices of dimension $n$, by dividing each matrix into four sub-matrices of size $n/2$, e.g.

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}.$$

The recursive algorithm derived naively from this partition spawns eight matrix-multiplication sub-computations; Strassen was able to reduce this to seven by careful use of matrix additions and subtractions.
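For reference, the standard seven Strassen products and their recombination (not spelled out on the slide) are:

$$
\begin{aligned}
M_1 &= (A_{11}+A_{22})(B_{11}+B_{22}), & M_2 &= (A_{21}+A_{22})\,B_{11},\\
M_3 &= A_{11}(B_{12}-B_{22}), & M_4 &= A_{22}(B_{21}-B_{11}),\\
M_5 &= (A_{11}+A_{12})\,B_{22}, & M_6 &= (A_{21}-A_{11})(B_{11}+B_{12}),\\
M_7 &= (A_{12}-A_{22})(B_{21}+B_{22}),
\end{aligned}
$$

$$
C_{11} = M_1+M_4-M_5+M_7,\quad C_{12} = M_3+M_5,\quad C_{21} = M_2+M_4,\quad C_{22} = M_1-M_2+M_3+M_6.
$$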

McColl and Valiant developed a two-tiered, recursive, generalised BSP implementation of Strassen's algorithm:
- initial data distribution;
- recursive generation of sub-computations, with recursion stopping at the level where there are sufficient sub-computations to utilise all the processors;
- redistribution of data;
- calculation of sub-computations;
- additions to complete the recursive steps.
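Since each recursive step turns one multiplication into seven, stopping "where there are sufficient sub-computations to utilise all the processors" amounts to (our gloss, not stated on the slide) recursing to the smallest depth $k$ with

$$7^k \ge p, \qquad\text{i.e.}\qquad k = \lceil \log_7 p \rceil.$$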

Dynamic BSP would provide a more elegant framework in which to implement this recursive algorithm:
- The master generates the first 'root' task, which requests the data server to do some data-parallel work without communication.
- Child tasks are spawned recursively (all within the master).
- Once the number of spawned tasks is big enough, they are distributed across the workers, which download data from the data server, synchronise, compute block-products and write them back to the data server.
- Child tasks terminate, and suspended parent tasks (at the master) resume by issuing data-parallel computation tasks to the data server.
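A minimal sketch of that spawning discipline, assuming a hypothetical master/data-server API (none of these names come from the paper):

```python
def strassen_task(block, size, depth, master, data_server):
    """One Dynamic BSP task for a node of the Strassen recursion (sketch).

    `master` and `data_server` are assumed collaborators: the master owns
    task spawning and the worker farm; the data server holds all matrix
    blocks (the distributed virtual shared memory).
    """
    if 7 ** depth >= master.slackness_target():
        # Enough tasks to keep all workers busy (with parallel slackness):
        # make this a leaf task. A worker will download the operand blocks,
        # compute the block product and write it back to the data server.
        master.enqueue_leaf(block, size)
        return
    # Data-parallel step, no inter-task communication: the data server forms
    # the seven Strassen operand sums and differences for this block.
    children = data_server.prepare_strassen_operands(block, size)
    # Spawn seven child multiplications; this parent task suspends inside
    # the master until all of its children have terminated.
    master.spawn_and_wait(strassen_task,
                          [(c, size // 2, depth + 1) for c in children])
    # Resume: recombine the seven products into the four quadrants of the
    # result via data-parallel additions issued to the data server.
    data_server.combine_strassen_products(block, size)
```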

Summary
We have proposed a modified version of BSP for Grid usage which counteracts the problems of resource heterogeneity, availability and reliability; this would seem much harder to achieve for a message-passing paradigm such as MPI. Dynamic BSP also provides a more elegant programming model for recursive algorithms. Now we need a Grid implementation. Note that this would also serve as a vehicle for embarrassingly parallel problems, which could be implemented with a single 'huge' BSP superstep.