Download presentation
Presentation is loading. Please wait.
Published byStanley Chase Modified over 9 years ago
1
Bulk Synchronous Parallel Processing Model Jamie Perkins
2
Overview Four W’s – Who, What, When and Why Goals for BSP BSP Design and Program Cost Functions Languages and Machines Four W’s – Who, What, When and Why Goals for BSP BSP Design and Program Cost Functions Languages and Machines
3
A Bridge for Parallel Computation Von Neumann model Designed to insulate hardware and software BSP model (Bulk Synchronous Parallel) Proposed by Leslie Valiant of Harvard University in 1990 Developed by W.F. McColl of Oxford Designed to be a “bridge” for parallel computation Von Neumann model Designed to insulate hardware and software BSP model (Bulk Synchronous Parallel) Proposed by Leslie Valiant of Harvard University in 1990 Developed by W.F. McColl of Oxford Designed to be a “bridge” for parallel computation
4
Goals for BSP Scalability – performance of HW & SW must be scalable from a single processor to thousands of processors Portability – SW must run unchanged, with high performance, on any general purpose parallel architecture Predictability – performance of SW on different architecture must be predictable in a straight forward way Scalability – performance of HW & SW must be scalable from a single processor to thousands of processors Portability – SW must run unchanged, with high performance, on any general purpose parallel architecture Predictability – performance of SW on different architecture must be predictable in a straight forward way
5
BSP Design Three Components Node Processor and Local Memory Router or Communication Network Message Passing or Point-to-Point communication Barrier or Synchronization Mechanism Implemented in hardware Three Components Node Processor and Local Memory Router or Communication Network Message Passing or Point-to-Point communication Barrier or Synchronization Mechanism Implemented in hardware
6
BSP Design Fixed memory architecture Hashing to allocate memory in “random” fashion Fast access to local memory Uniformly slow access to remote memory Fixed memory architecture Hashing to allocate memory in “random” fashion Fast access to local memory Uniformly slow access to remote memory
7
Illustration of BSP Computer Communication Network P M P M P M Node Barrier http:// peace.snu.ac.kr/courses/parallelprocessing /
8
BSP Program Composed of S supersteps Superstep consists of: A computation where each processor uses only locally held values A global message transmission from each processor to any subset of the others A barrier synchronization Composed of S supersteps Superstep consists of: A computation where each processor uses only locally held values A global message transmission from each processor to any subset of the others A barrier synchronization
9
Strategies for programming on BSP Balance the computation between processes Balance the communication between processes Minimize the number of supersteps Balance the computation between processes Balance the communication between processes Minimize the number of supersteps
10
BSP Program Superstep 1 Superstep 2 Barrier P1P2P3P4 Computation Communication http:// peace.snu.ac.kr/courses/parallelprocessing /
11
Advantages of BSP Eliminates need for programmers to manage memory, assign communication and perform low-level synchronization (w/ sufficient parallel slackness) Synchronization allows automatic optimization of the communication pattern BSP model provides a simple cost function for analyzing the complexity of algorithms Eliminates need for programmers to manage memory, assign communication and perform low-level synchronization (w/ sufficient parallel slackness) Synchronization allows automatic optimization of the communication pattern BSP model provides a simple cost function for analyzing the complexity of algorithms
12
Cost Function g – “gap” or bandwidth inefficiency L – “latency”, minimum time needed for one superstep w – largest amount of work performed (per processor) h – largest number of packets sent or received w i + gh i + L = execution time for the superstep i g – “gap” or bandwidth inefficiency L – “latency”, minimum time needed for one superstep w – largest amount of work performed (per processor) h – largest number of packets sent or received w i + gh i + L = execution time for the superstep i
13
Languages & Machines BSP ++ C C++ Fortran JBSP Opal BSP ++ C C++ Fortran JBSP Opal IBM SP1 SGI Power Challenge (Shared Memory) Cray T3D Hitachi SR2001 TCP/IP
14
Thank You Any Questions
15
References http://peace.snu.ac.kr/courses/parallelprocessing/ http://wwwcs.uni-paderborn.de/fachbereich/AG/agmad http://www.cs.mu.oz.au/677/notes/node41.html McColl, W.F. The BSP Approach to Architecture Independent Parallel Programming. Technical report, Oxford University Computing Laboratory, Dec. 1994 United States Patent 5083265 Valiant, L.G. A Bridging Model for Parallel Computation. Communications of the ACM 33,8 (1990), 103-111. http://peace.snu.ac.kr/courses/parallelprocessing/ http://wwwcs.uni-paderborn.de/fachbereich/AG/agmad http://www.cs.mu.oz.au/677/notes/node41.html McColl, W.F. The BSP Approach to Architecture Independent Parallel Programming. Technical report, Oxford University Computing Laboratory, Dec. 1994 United States Patent 5083265 Valiant, L.G. A Bridging Model for Parallel Computation. Communications of the ACM 33,8 (1990), 103-111.
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.