Dynamic BSP: Towards a Flexible Approach to Parallel Computing over the Grid
Jeremy Martin, Alex Tiskin

Topics
- The promise of the Grid
- The BSP programming model
- How the Grid differs from BSP
- Introducing 'Dynamic BSP'
- Example: Strassen's algorithm

The promise of the Grid
The WWW is a vast, distributed information resource. The Grid will harness the internet's untapped processing power as well as its information content, e.g. the ScreenSaver LifeSaver computational chemistry project for cancer research. Affordable supercomputing-on-demand.

The BSP programming model
We need better programming models to utilise the Grid effectively for problems that are not "embarrassingly parallel". The BSP model (s, p, l, g): a set of identical processors, communicating asynchronously by remote memory transfer, with global barrier synchronisation ensuring data consistency. Performance and scalability can be predicted prior to implementation. BSP is widely used to program supercomputers and networks of workstations (NOWs).
[Diagram: four processors alternating local computation and global barrier synchronisation over time]
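The claim of predictability rests on BSP's cost model. A standard formulation (not spelled out on the slide) charges each superstep separately for computation, communication and the barrier:

$$T \;=\; \sum_{i=1}^{S}\bigl(w_i + g\,h_i + l\bigr),$$

where $w_i$ is the maximum local computation performed by any processor in superstep $i$, $h_i$ is the maximum number of words any processor sends or receives, $g$ is the per-word communication cost, and $l$ is the barrier synchronisation cost.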

How the Grid differs from BSP
- Processor heterogeneity: architectural differences; time-dependent resource sharing.
- Network heterogeneity: BSP performance is usually constrained by the slowest communication link in the network.
- Reliability and availability: processors may fail or be withdrawn by the service provider.

Introducing 'Dynamic BSP'
Building on previous work (e.g. Vasilev 2003, Tiskin 1998, Sarmenta 1999, Nibhanupudi & Szymanski 1996), the essence of our approach is to use a task farm together with parallel slackness:
- A problem is partitioned onto N 'virtual processors', such that N >> p (the number of available physical processors).
- Virtual processors are scheduled to run on physical processors by a fault-tolerant task farm.
- Unlike standard BSP, there is no persistence of data at processor nodes between supersteps; instead, a fault-tolerant, distributed virtual shared memory is implemented.
- Any existing BSP algorithm could be implemented using this approach, but the cost prediction would differ because of the additional communication.
- We also allow the dynamic creation of child processes during a superstep.
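To make the fault-tolerant task farm concrete, here is a minimal sketch in Python, assuming a hypothetical Worker handle (is_idle/submit/poll) that stands in for a remote grid processor; it illustrates the scheduling discipline only and is not the authors' implementation:

```python
import queue
import time

TIMEOUT = 30.0  # seconds before a task is presumed lost (hypothetical value)

class Worker:
    """Stand-in for a remote grid processor (assumed API, for illustration)."""
    def __init__(self):
        self.current = None
    def is_idle(self):
        return self.current is None
    def submit(self, task):
        self.current = task        # real version: ship the task id to the node
    def poll(self, task):
        result, self.current = self.current, None
        return result              # real version: check for a returned result

def run_superstep(tasks, workers):
    """Run one BSP superstep's worth of virtual-processor tasks on a farm."""
    pending = queue.Queue()
    for t in tasks:
        pending.put(t)
    in_flight = {}                 # task -> (worker, start time)
    done = set()
    while len(done) < len(tasks):
        # Hand out work to idle workers.
        for w in workers:
            if w.is_idle() and not pending.empty():
                t = pending.get()
                w.submit(t)        # the worker downloads its inputs from the
                in_flight[t] = (w, time.time())  # distributed shared memory
        # Collect results and detect failed or withdrawn processors.
        for t, (w, started) in list(in_flight.items()):
            if w.poll(t) is not None:
                done.add(t)        # results are written back to the shared memory
                del in_flight[t]
            elif time.time() - started > TIMEOUT:
                del in_flight[t]   # processor presumed dead or withdrawn
                pending.put(t)     # reschedule the virtual processor elsewhere
    # Reaching here is the barrier: every virtual processor has finished.

run_superstep(list(range(12)), [Worker() for _ in range(3)])  # N = 12 >> p = 3
```

Re-queueing a timed-out virtual processor is safe precisely because the data lives in the distributed shared memory rather than on the failed node.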

Standard BSP computation vs. Dynamic BSP computation
[Diagram: a standard BSP computation on six processors separated by global barriers, contrasted with a Dynamic BSP computation in which a master processor farms virtual processors VP1-VP6 out to three grid processors; grid processor 3 dies while running VP3, the task times out at the master and is reassigned to grid processor 2. Not shown: distributed shared memory nodes, dynamic process spawning.]

Example: Strassen's algorithm
Strassen discovered an efficient method for calculating $C = AB$, where $A$ and $B$ are square matrices of dimension $n$, by dividing each matrix into four sub-matrices of size $n/2$, e.g.

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}.$$

The recursive algorithm derived naively from this partition spawns eight matrix-multiplication sub-computations; Strassen was able to reduce this to seven by careful use of matrix additions and subtractions.
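For reference, the standard seven Strassen products and their recombination (not spelled out on the slide) are:

$$
\begin{aligned}
M_1 &= (A_{11}+A_{22})(B_{11}+B_{22}), & M_2 &= (A_{21}+A_{22})\,B_{11},\\
M_3 &= A_{11}(B_{12}-B_{22}), & M_4 &= A_{22}(B_{21}-B_{11}),\\
M_5 &= (A_{11}+A_{12})\,B_{22}, & M_6 &= (A_{21}-A_{11})(B_{11}+B_{12}),\\
M_7 &= (A_{12}-A_{22})(B_{21}+B_{22}),
\end{aligned}
$$

$$
C_{11} = M_1+M_4-M_5+M_7,\quad C_{12} = M_3+M_5,\quad C_{21} = M_2+M_4,\quad C_{22} = M_1-M_2+M_3+M_6.
$$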

McColl and Valiant developed a two-tiered, recursive, generalised BSP implementation of Strassen's algorithm:
- initial data distribution;
- recursive generation of sub-computations, with recursion stopping at the level where there are sufficient sub-computations to utilise all the processors;
- redistribution of data;
- calculation of sub-computations;
- additions to complete the recursive steps.
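Since each recursive step turns one multiplication into seven, stopping "where there are sufficient sub-computations to utilise all the processors" amounts to (our gloss, not stated on the slide) recursing to the smallest depth $k$ with

$$7^k \ge p, \qquad\text{i.e.}\qquad k = \lceil \log_7 p \rceil.$$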

Dynamic BSP would provide a more elegant framework in which to implement this recursive algorithm:
- The master generates the first 'root' task, which requests the data server to do some data-parallel work without communication.
- Child tasks are spawned recursively (all within the master).
- Once the number of spawned tasks is big enough, they are distributed across the workers, which download data from the data server, synchronise, compute block-products and write them back to the data server.
- Child tasks terminate, and suspended parent tasks (at the master) resume by issuing data-parallel computation tasks to the data server.
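A minimal sketch of that spawning discipline, assuming a hypothetical master/data-server API (none of these names come from the paper):

```python
def strassen_task(block, size, depth, master, data_server):
    """One Dynamic BSP task for a node of the Strassen recursion (sketch).

    `master` and `data_server` are assumed collaborators: the master owns
    task spawning and the worker farm; the data server holds all matrix
    blocks (the distributed virtual shared memory).
    """
    if 7 ** depth >= master.slackness_target():
        # Enough tasks to keep all workers busy (with parallel slackness):
        # make this a leaf task. A worker will download the operand blocks,
        # compute the block product and write it back to the data server.
        master.enqueue_leaf(block, size)
        return
    # Data-parallel step, no inter-task communication: the data server forms
    # the seven Strassen operand sums and differences for this block.
    children = data_server.prepare_strassen_operands(block, size)
    # Spawn seven child multiplications; this parent task suspends inside
    # the master until all of its children have terminated.
    master.spawn_and_wait(strassen_task,
                          [(c, size // 2, depth + 1) for c in children])
    # Resume: recombine the seven products into the four quadrants of the
    # result via data-parallel additions issued to the data server.
    data_server.combine_strassen_products(block, size)
```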

Summary
We have proposed a modified version of BSP for Grid usage which counteracts the problems of resource heterogeneity, availability and reliability; this would seem much harder to achieve for a message-passing paradigm such as MPI. Dynamic BSP also provides a more elegant programming model for recursive algorithms. Now we need a Grid implementation. Note that this would also serve as a vehicle for embarrassingly parallel problems, which could be implemented with a single 'huge' BSP superstep.