Published by Herbert Gilmore. Modified over 8 years ago.
1
Why Parallel/Distributed Computing
Sushil K. Prasad, sprasad@gsu.edu
2
What is Parallel and Distributed Computing?
- Solving a single problem faster using multiple CPUs
- E.g., matrix multiplication C = A × B
- Parallel = shared memory among all CPUs
- Distributed = local memory per CPU
- Common issues: partitioning, synchronization, dependencies, load balancing
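As a minimal sketch of the partitioning issue listed on this slide, the matrix product C = A × B can be split by rows, with each worker computing one contiguous block of rows of C. The function names `multiply_rows` and `parallel_matmul` are hypothetical, and Python threads are used only to illustrate the decomposition; in standard CPython builds they will not speed up pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_rows(A, B, row_start, row_end):
    """Compute rows [row_start, row_end) of C = A x B."""
    n_inner = len(B)
    n_cols = len(B[0])
    block = []
    for i in range(row_start, row_end):
        row = [sum(A[i][k] * B[k][j] for k in range(n_inner))
               for j in range(n_cols)]
        block.append(row)
    return block


def parallel_matmul(A, B, workers=2):
    """Partition the rows of A into contiguous blocks, one task per block."""
    n_rows = len(A)
    chunk = max(1, (n_rows + workers - 1) // workers)  # ceiling division
    bounds = [(s, min(s + chunk, n_rows)) for s in range(0, n_rows, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns the blocks in the same order as `bounds`
        blocks = pool.map(lambda b: multiply_rows(A, B, b[0], b[1]), bounds)
        C = [row for block in blocks for row in block]
    return C


A = [[1, 2], [3, 4], [5, 6]]
B = [[7, 8], [9, 10]]
C = parallel_matmul(A, B, workers=2)
print(C)
```

Each block of rows depends only on A and B, never on another block, so no synchronization is needed until the blocks are concatenated.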
3
ENIAC (350 op/s), 1946 (U.S. Army photo)
4
ASCI White (10 teraops/sec, 2006)
- Megaflops = 10^6 flops ≈ 2^20
- Giga = 10^9 = billion ≈ 2^30
- Tera = 10^12 = trillion ≈ 2^40
- Peta = 10^15 = quadrillion ≈ 2^50
- Exa = 10^18 = quintillion ≈ 2^60
5
65 Years of Speed Increases
- ENIAC, 1946: 350 flops
- Today (2011): K computer, 8 petaflops = 8 × 10^15 flops
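The size of this jump is easier to appreciate with the arithmetic written out. The sketch below uses only the two figures on the slide; the assumption of steady exponential growth over the whole period is a simplification for illustration, not a claim from the slides.

```python
import math

eniac_flops = 350        # ENIAC, 1946 (figure from the slide)
k_flops = 8e15           # K computer, 2011: 8 petaflops
years = 2011 - 1946      # 65 years

speedup = k_flops / eniac_flops
doublings = math.log2(speedup)
# Assumption: growth was a steady exponential over the whole period.
years_per_doubling = years / doublings

print(f"speedup: {speedup:.3e}")
print(f"doublings: {doublings:.1f}")
print(f"years per doubling: {years_per_doubling:.2f}")
```

Under that assumption, peak speed doubled roughly every year and a half.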
6
Why Parallel and Distributed Computing?
- Grand Challenge Problems
  - Weather forecasting; global warming
  - Materials design: superconducting material at room temperature; nano-devices; spaceships
  - Organ modeling; drug discovery
7
Why Parallel and Distributed Computing?
- Physical Limitations of Circuits
  - Heat and speed-of-light effects
  - Superconducting material to counter heat effect
  - Speed of light effect: no solution!
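To see why the speed of light limits a single fast circuit, it helps to compute how far a signal could travel in one clock cycle. The sketch below uses the vacuum speed of light as an upper bound; real signals in wires are slower. The clock rates are illustrative values, not figures from the slides.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact value in vacuum


def distance_per_cycle_cm(clock_hz):
    """Upper bound on the distance a signal covers in one clock period."""
    return SPEED_OF_LIGHT_M_PER_S / clock_hz * 100


for ghz in (1, 3, 10):  # illustrative clock rates
    print(f"{ghz} GHz: {distance_per_cycle_cm(ghz * 1e9):.2f} cm per cycle")
```

At 3 GHz the bound is about 10 cm per cycle, so raising the clock rate further shrinks the region a signal can cross in one cycle.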
8
Microprocessor Revolution
[Figure: speed (log scale) versus time for micros, minis, mainframes, and supercomputers]
9
Why Parallel and Distributed Computing?
- VLSI: Effect of Integration
  - 1 M transistors enough for full functionality: DEC’s Alpha (1990s)
  - Rest must go into multiple CPUs per chip
- Cost: multitudes of average CPUs give better FLOPS/$ than traditional supercomputers
10
Modern Parallel Computers
- Caltech’s Cosmic Cube (Seitz and Fox)
- Commercial copy-cats
  - nCUBE Corporation (512 CPUs)
  - Intel’s Supercomputer Systems: iPSC1, iPSC2, Intel Paragon (512 CPUs)
- Thinking Machines Corporation
  - CM2 (65K 1-bit CPUs): 12-dimensional hypercube, SIMD
  - CM5: fat-tree interconnect, MIMD
- Tianhe-1A: 4.7 petaflops, 14K Xeon X5670 and 7,168 Nvidia Tesla M2050
- K computer: 8 petaflops (8 × 10^15 FLOPS), 2011; 68K 2.0 GHz 8-core CPUs, 548,352 cores
11
Why Parallel and Distributed Computing?
- Everyday Reasons
  - Available local networked workstations and Grid resources should be utilized
  - Solve compute-intensive problems faster
    - Make infeasible problems feasible
    - Reduce design time
  - Leverage large combined memory
    - Solve larger problems in same amount of time
    - Improve answer’s precision
    - Reduce design time
  - Gain competitive advantage
  - Exploit commodity multi-core and GPU chips
  - Find jobs!
12
Why Shared Memory Programming?
- Easier conceptual environment
- Programmers typically familiar with concurrent threads and processes sharing address space
- CPUs within multi-core chips share memory
- OpenMP: an application programming interface (API) for shared-memory systems
  - Supports higher-performance parallel programming of symmetric multiprocessors
- Java threads
- MPI for distributed-memory programming
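The shared-address-space model can be sketched with Python's standard `threading` module, used here in place of the OpenMP and Java threads named on the slide. Several threads update one variable that they all see, and a lock provides the synchronization. The function name `run_shared_counter` and its parameters are hypothetical.

```python
import threading


def run_shared_counter(num_threads=4, increments=10_000):
    """Have several threads increment one counter in shared memory."""
    counter = {"value": 0}       # visible to every thread
    lock = threading.Lock()      # guards the read-modify-write below

    def worker():
        for _ in range(increments):
            with lock:
                counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                 # wait until every thread has finished
    return counter["value"]


print(run_shared_counter())
```

No data is copied between threads; the cost of that convenience is that every update to shared state must be synchronized. In a distributed-memory program the same total would instead be assembled by passing messages between processes.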
13
Seeking Concurrency
- Data dependence graphs
- Data parallelism
- Functional parallelism
- Pipelining
14
Data Dependence Graph
- Directed graph
- Vertices = tasks
- Edges = dependencies
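A dependence graph can be turned into a schedule by repeatedly taking every task whose dependencies are already satisfied; tasks taken in the same round have no edges between them and may run concurrently. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The task names and the helper `concurrency_levels` are hypothetical.

```python
from graphlib import TopologicalSorter


def concurrency_levels(dependencies):
    """Group tasks into rounds; tasks within a round are independent.

    `dependencies` maps each task to the set of tasks it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()                  # raises CycleError on a cyclic graph
    levels = []
    while sorter.is_active():
        ready = sorter.get_ready()    # all tasks with no unmet dependencies
        levels.append(sorted(ready))
        sorter.done(*ready)           # mark them finished, unlocking successors
    return levels


# Hypothetical graph: B and C each depend on A; D depends on both B and C.
deps = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
print(concurrency_levels(deps))
```

The number of rounds is the length of the longest dependency chain, which bounds how much the graph can be sped up regardless of processor count.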
15
Data Parallelism
- Independent tasks apply same operation to different elements of a data set
- Okay to perform operations concurrently
- Speedup: potentially p-fold, where p = number of processors

for i ← 0 to 99 do
    a[i] ← b[i] + c[i]
endfor
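The loop on this slide can be sketched as a data-parallel computation by giving each worker a disjoint range of indices. The names `add_chunk` and `parallel_add` are hypothetical. Threads are used here only to show the decomposition; in standard CPython builds they do not yield the p-fold speedup for pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor


def add_chunk(a, b, c, start, end):
    """Apply the same operation to elements [start, end)."""
    for i in range(start, end):
        a[i] = b[i] + c[i]


def parallel_add(b, c, workers=4):
    """Compute a[i] = b[i] + c[i] with one index range per worker."""
    n = len(b)
    a = [0] * n
    chunk = max(1, (n + workers - 1) // workers)  # ceiling division
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(add_chunk, a, b, c, s, min(s + chunk, n))
                   for s in range(0, n, chunk)]
        for f in futures:
            f.result()  # re-raises any exception from a worker
    return a


b = list(range(100))
c = [2 * i for i in range(100)]
a = parallel_add(b, c)
print(a[:5])
```

Because no iteration reads a value written by another iteration, the index ranges can be processed in any order with no locking.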
16
Functional Parallelism
- Independent tasks apply different operations to different data elements
- First and second statements can execute concurrently
- Third and fourth statements can execute concurrently
- Speedup: limited by the number of concurrent sub-tasks

a ← 2
b ← 3
m ← (a + b) / 2
s ← (a^2 + b^2) / 2
v ← s - m^2
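The five statements on this slide can be sketched with two different functions submitted as separate tasks: the mean m and the mean of squares s are independent of each other, while v must wait for both. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def mean(a, b):
    return (a + b) / 2


def mean_of_squares(a, b):
    return (a ** 2 + b ** 2) / 2


def variance_of_two(a, b):
    """Compute m and s concurrently, then combine them into v."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        m_future = pool.submit(mean, a, b)             # third statement
        s_future = pool.submit(mean_of_squares, a, b)  # fourth statement
        m = m_future.result()
        s = s_future.result()
    return s - m ** 2                                  # fifth statement


v = variance_of_two(2, 3)
print(v)
```

Only two tasks are ever available at once here, so adding more than two workers cannot help. That is the limit the slide describes.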
17
Pipelining
- Divide a process into stages
- Produce several items simultaneously
- Speedup: limited by the number of concurrent sub-tasks = number of stages in the pipeline
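A pipeline can be sketched as one thread per stage, connected by queues, so that different items occupy different stages at the same time. The names `stage`, `run_pipeline`, and `pipeline_speedup` are hypothetical. The speedup formula assumes every stage takes one time unit per item, which is an idealization.

```python
import queue
import threading

STOP = object()  # sentinel that tells each stage to shut down


def stage(func, inbox, outbox):
    """Repeatedly take an item, process it, and pass it downstream."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # forward the shutdown signal
            return
        outbox.put(func(item))


def run_pipeline(items, funcs):
    """Run items through the stage functions, one thread per stage."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
               for i, f in enumerate(funcs)]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(STOP)
    results = []
    while True:
        out = queues[-1].get()
        if out is STOP:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


def pipeline_speedup(n_items, n_stages):
    """Ideal speedup, assuming each stage takes one time unit per item."""
    sequential = n_items * n_stages
    pipelined = n_stages + n_items - 1  # fill the pipe, then one item per step
    return sequential / pipelined


print(run_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]))
print(round(pipeline_speedup(100, 4), 2))
```

As the number of items grows, `pipeline_speedup` approaches the number of stages but never exceeds it, matching the bound on the slide.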