Intro to Parallel and Distributed Processing. Some material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed Computing Seminar, 2007 (licensed under the Creative Commons Attribution 3.0 License). Slides from: Jimmy Lin, The iSchool, University of Maryland
Web-Scale Problems? How to approach problems in: – Biocomputing – Nanocomputing – Quantum computing – … It all boils down to: – Divide-and-conquer – Throwing more hardware at the problem – Clusters, Grids, Clouds. Simple to understand… a lifetime to master…
Cluster computing: compute nodes located locally, connected by a fast LAN. Tightly coupled – from a few nodes to a large number. A group of linked computers working closely together, in parallel, sometimes viewed as forming a single computer. Improves performance and availability.
Clusters: nodes have computational devices and local storage; networked disk storage allows for sharing. Static system, centralized control. Data Centers: multiple clusters at a datacenter. A Google cluster has thousands of servers, organized in racks.
Grid Computing: loosely coupled, heterogeneous, geographically dispersed. Although a grid can be dedicated to a single application, it is typically used for a variety of purposes. Can be formed from computing resources belonging to multiple organizations. Thousands of users worldwide – compute grids, data grids – access to resources not available locally.
Grids: resources in a grid are geographically dispersed – loosely coupled, heterogeneous. Sharing of data and resources in dynamic, multi-institutional virtual organizations – access to specialized resources – common in academia. Allows users to connect and disconnect, like an electrical grid: plug in your device. Middleware is needed to use a grid. A data center (cluster) can be connected to a grid.
EU Grid (European Grid Infrastructure): SE – storage element, CE – computing element. Example: the LHC Computing Grid.
Grid: a precursor to cloud computing, but without virtualization – “Virtualization distinguishes cloud from grid” (Zissis).
Divide and Conquer: the “work” is partitioned into units w1, w2, w3, each handled by a “worker” that produces a partial result r1, r2, r3; the partial results are combined into the final “result”. [diagram: Partition → workers → Combine]
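The partition/worker/combine idea above can be sketched in a few lines of Python. This is a minimal illustration, assuming the “work” is summing a list of numbers; the helper names (`partition`, `worker`, `combine`) are illustrative, not from the slides:

```python
from concurrent.futures import ThreadPoolExecutor

def partition(work, n):
    """Split the work into roughly n chunks (the w1, w2, ... units)."""
    k = max(1, len(work) // n)
    return [work[i:i + k] for i in range(0, len(work), k)]

def worker(chunk):
    """Each worker turns its chunk w_i into a partial result r_i."""
    return sum(chunk)

def combine(results):
    """Merge the partial results r1..rn into the final result."""
    return sum(results)

def divide_and_conquer(work, n_workers=3):
    chunks = partition(work, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(worker, chunks))   # workers run in parallel
    return combine(partials)
```

The same skeleton applies whether the “workers” are threads, cores, or machines; only the execution backend changes.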
Different Workers Different threads in the same core Different cores in the same CPU Different CPUs in a multi-processor system Different machines in a distributed system
Choices, Choices, Choices Commodity vs. “exotic” hardware Number of machines vs. processor vs. cores Bandwidth of memory vs. disk vs. network Different programming models
Flynn’s Taxonomy classifies architectures by instruction streams (Single, SI; Multiple, MI) and data streams (Single, SD; Multiple, MD): SISD – single-threaded process; MISD – pipeline architecture; SIMD – vector processing; MIMD – multi-threaded programming.
[diagram: SISD corresponds to a uniprocessor; the parallel computing architectures include supercomputers (shared or distributed memory), clusters, and grids]
SISD: a single processor executes one instruction stream over one data stream, processing data items one at a time. [diagram: Processor, Instructions, data items D]
SIMD: a single instruction stream is applied to many data streams at once – the processor operates on whole vectors of elements D0…Dn in lockstep. [diagram: Processor, Instructions, data vectors D0…Dn]
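The SISD/SIMD contrast can be modeled in plain Python. This is only a sketch of the programming model – real SIMD executes in hardware vector units, not in an interpreter loop – and the function names are illustrative:

```python
def sisd_add(vec_a, vec_b):
    """SISD style: one explicit scalar operation per instruction step."""
    out = []
    for i in range(len(vec_a)):
        out.append(vec_a[i] + vec_b[i])   # one add per iteration
    return out

def simd_add(vec_a, vec_b):
    """SIMD style: conceptually one 'add' instruction applied across the
    whole vectors D0..Dn in lockstep."""
    return [a + b for a, b in zip(vec_a, vec_b)]
```

Libraries like NumPy get real speedups from this model by dispatching whole-vector operations to compiled, vectorized code.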
MIMD: multiple processors each execute their own instruction stream over their own data stream. [diagram: two processors, each with independent Instructions and data items D]
Memory Typology: Shared – multiple processors access a single shared memory. [diagram: processors connected to one Memory]
Memory Typology: Distributed – each processor has its own private memory; processors communicate over a network. [diagram: Memory/Processor pairs connected by a Network]
Memory Typology: Hybrid – groups of processors share memory within a node, and nodes communicate over a network. [diagram: shared-memory nodes connected by a Network]
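The two communication styles behind these typologies can be modeled with Python threads. This is a sketch of the programming models only (threads always share an address space; the message-passing version simply refrains from using it, the way distributed-memory code must):

```python
import queue
import threading

def run_shared(n=4):
    """Shared-memory model: workers cooperate by updating one common
    counter, guarded by a lock."""
    counter = {"value": 0}
    lock = threading.Lock()

    def worker():
        with lock:                       # protect the shared location
            counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]

def run_message_passing(n=4):
    """Distributed-memory model: workers keep state private and send
    results as messages over a channel (here, a queue standing in for
    the network)."""
    channel = queue.Queue()

    def worker():
        channel.put(1)                   # send a message, touch no shared state

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(channel.get() for _ in range(n))
```

Hybrid systems mix the two: shared memory inside a node, messages between nodes.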
Parallelization Problems How do we assign work units to workers? What if we have more work units than workers? What if workers need to share partial results? How do we aggregate partial results? How do we know all the workers have finished? What if workers die? What is the common theme of all of these problems?
General Theme? Parallelization problems arise from: – Communication between workers – Access to shared resources (e.g., data). Thus, we need a synchronization system! This is tricky: – Finding bugs is hard – Fixing them is even harder
Managing Multiple Workers. Difficult because – (often) we don’t know the order in which workers run – (often) we don’t know where the workers are running – (often) we don’t know when workers interrupt each other. Thus, we need: – Semaphores (lock, unlock) – Condition variables (wait, notify, broadcast) – Barriers. Still, lots of problems: – deadlock, livelock, race conditions, … Moral of the story: be careful! – Even trickier if the workers are on different machines
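Python’s `threading` module provides all three primitives the slide lists (`Semaphore`, `Condition`, `Barrier`). A minimal sketch of two of them, assuming a toy two-phase computation where no worker may start phase 2 until every worker has finished phase 1:

```python
import threading

def run_phases(n=3):
    """Workers compute partial results (phase 1), then synchronize at a
    barrier so phase 2 can only begin once every partial exists."""
    barrier = threading.Barrier(n)
    lock = threading.Lock()          # lock/unlock for the critical section
    partials = []

    def worker(i):
        value = i * i                # phase 1: private computation
        with lock:                   # critical section: shared list update
            partials.append(value)
        barrier.wait()               # rendezvous: no worker proceeds early
        # phase 2 would go here; all partials are guaranteed to exist

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(partials)
```

Even in this tiny example, forgetting the lock would introduce a race condition, and sizing the barrier wrong would deadlock every worker – exactly the failure modes the slide warns about.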
Patterns for Parallelism Parallel computing has been around for decades Here are some “design patterns” …
Master/Slaves: a master node assigns work to slave nodes and collects their results (e.g., GFS). [diagram: master with multiple slaves]
Producer/Consumer: producers and consumers communicate through a bounded buffer (e.g., GFS). [diagram: P → bounded buffer → C]
Work Queues: multiple producers add work items to a shared queue, and multiple consumers remove and process them. [diagram: producers P → shared queue of work items W → consumers C]
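Both patterns above map directly onto Python’s thread-safe `queue.Queue`. A sketch with multiple producers and consumers sharing one bounded buffer (the item values and the “double each item” work are illustrative; the `None` poison pill is a common shutdown idiom, not something from the slides):

```python
import queue
import threading

def run_work_queue(n_items=10, n_producers=2, n_consumers=3):
    """Producers push work items into a shared bounded buffer; consumers
    pull and process them. Assumes n_items is divisible by n_producers."""
    buffer = queue.Queue(maxsize=4)   # bounded: producers block when full
    processed = []
    lock = threading.Lock()
    per_producer = n_items // n_producers

    def producer(start):
        for i in range(start, start + per_producer):
            buffer.put(i)             # blocks if the buffer is full

    def consumer():
        while True:
            item = buffer.get()       # blocks if the buffer is empty
            if item is None:          # poison pill: no more work coming
                break
            with lock:
                processed.append(item * 2)

    producers = [threading.Thread(target=producer, args=(k * per_producer,))
                 for k in range(n_producers)]
    consumers = [threading.Thread(target=consumer) for _ in range(n_consumers)]
    for t in producers + consumers:
        t.start()
    for t in producers:
        t.join()
    for _ in range(n_consumers):
        buffer.put(None)              # one shutdown signal per consumer
    for t in consumers:
        t.join()
    return sorted(processed)
```

The bounded buffer decouples producers from consumers: neither side needs to know how many workers exist on the other side, or how fast they run.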
Rubber Meets Road. From patterns to implementation: – pthreads, OpenMP for multi-threaded programming – MPI (Message Passing Interface) for cluster computing – … The reality: – Lots of one-off solutions, custom code – Write your own dedicated library, then program with it – Burden on the programmer to explicitly manage everything. MapReduce to the rescue!
Parallelization. MapReduce: the “back-end” of cloud computing – batch-oriented processing of large datasets – highly parallel: solve the problem in parallel, then reduce (e.g., counting the words in a document). Hadoop MapReduce: a programming model and software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes. Ajax: the “front-end” of cloud computing – highly interactive Web-based applications.
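The word-count example mentioned above is the canonical MapReduce illustration. A sketch of the two phases in plain sequential Python – in Hadoop, the framework runs many map tasks in parallel across the cluster, groups the pairs by key, and runs the reduces in parallel, but the programmer writes only logic like this:

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document.
    Each such task could run on its own node."""
    return [(word, 1) for word in document.split()]

def reduce_phase(pairs):
    """Reduce: group the pairs by key and sum the counts per word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

def word_count(documents):
    pairs = []
    for doc in documents:             # sequential stand-in for parallel maps
        pairs.extend(map_phase(doc))
    return reduce_phase(pairs)
```

Because the map tasks are independent and the reduce is associative, the framework can parallelize both phases without the programmer managing workers, queues, or synchronization at all – which is the point of the preceding slides.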