Why Parallel/Distributed Computing
Sushil K. Prasad

What is Parallel and Distributed Computing?
- Solving a single problem faster using multiple CPUs, e.g., matrix multiplication C = A × B
- Parallel = shared memory among all CPUs
- Distributed = local memory per CPU
- Common issues: partitioning, synchronization, dependencies, load balancing
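The matrix-multiplication example maps naturally onto the shared-memory model. Below is a minimal sketch in C with OpenMP (not from the original slides); the matrix size N, the dummy input values, and the row-wise partitioning are illustrative assumptions. In a distributed-memory (MPI) version, slices of A and a copy of B would instead have to be sent explicitly to each process's local memory.

/* Minimal sketch: shared-memory parallel matrix multiplication C = A x B
 * with OpenMP. N and the row-wise partitioning are illustrative choices.
 * Build: gcc -fopenmp matmul.c -o matmul */
#include <stdio.h>
#include <omp.h>

#define N 512

static double A[N][N], B[N][N], C[N][N];

int main(void)
{
    /* Initialize the input matrices with arbitrary values. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = i + j;
            B[i][j] = i - j;
        }

    /* Each thread computes a disjoint block of rows of C: the work is
     * partitioned, and no synchronization is needed inside the loop
     * because every C[i][j] is written by exactly one thread. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;
        }

    printf("C[0][0] = %f\n", C[0][0]);
    return 0;
}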

ENIAC (350 op/s) (U.S. Army photo)

ASCI White (10 teraops/sec, 2000)
FLOPS prefixes:
- Mega = 10^6 (million, ≈ 2^20)
- Giga = 10^9 (billion, ≈ 2^30)
- Tera = 10^12 (trillion, ≈ 2^40)
- Peta = 10^15 (quadrillion, ≈ 2^50)
- Exa = 10^18 (quintillion, ≈ 2^60)

65 Years of Speed Increases: ENIAC, 350 flops (1946); today, the K computer at 8 petaflops (8 × 10^15 flops).

Why Parallel and Distributed Computing? Grand Challenge Problems
- Weather forecasting; global warming
- Materials design: superconducting materials at room temperature; nano-devices; spaceships
- Organ modeling; drug discovery

Why Parallel and Distributed Computing? Physical Limitations of Circuits
- Heat and speed-of-light effects limit how fast a single circuit can run
- Superconducting materials can counter the heat effect
- There is no solution for the speed-of-light effect!

The Microprocessor Revolution (chart: speed, on a log scale, versus time for micros, minis, mainframes, and supercomputers)

Why Parallel and Distributed Computing? VLSI and the Effect of Integration
- About 1 M transistors are enough for full functionality, e.g., DEC's Alpha (1990s); the rest must go into multiple CPUs per chip
- Cost: multitudes of average CPUs give better FLOPS/$ than traditional supercomputers

Modern Parallel Computers
- Caltech's Cosmic Cube (Seitz and Fox)
- Commercial copy-cats: nCUBE Corporation (512 CPUs); Intel's Supercomputer Systems: iPSC/1, iPSC/2, Intel Paragon (512 CPUs)
- Thinking Machines Corporation: CM-2 (65K 4-bit CPUs, 12-dimensional hypercube, SIMD); CM-5 (fat-tree interconnect, MIMD)
- Tianhe-1A: 4.7 petaflops, 14K Xeon X5670 CPUs and 7,168 Nvidia Tesla M2050 GPUs
- K computer (2011): 8 petaflops (8 × 10^15 FLOPS), 68K 2.0 GHz 8-core CPUs, 548,352 cores

Why Parallel and Distributed Computing? Everyday Reasons
- Utilize available local networked workstations and Grid resources
- Solve compute-intensive problems faster; make infeasible problems feasible; reduce design time
- Leverage large combined memory; solve larger problems in the same amount of time
- Improve answers' precision
- Gain competitive advantage
- Exploit commodity multi-core and GPU chips
- Find jobs!

Why Shared-Memory Programming?
- Easier conceptual environment: programmers are typically familiar with concurrent threads and processes sharing an address space
- CPUs within multi-core chips share memory
- OpenMP, an application programming interface (API) for shared-memory systems, supports high-performance parallel programming of symmetric multiprocessors
- Java threads
- MPI for distributed-memory programming (a sketch contrasting the two models follows below)
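As a minimal sketch of the two models side by side (not from the original slides; the problem, array size, and block distribution are illustrative assumptions), the following hybrid program sums N numbers: OpenMP threads share the node's memory, while MPI processes each own only a local slice and combine results by explicit message passing.

/* Sketch contrasting shared and distributed memory on one problem.
 * Build: mpicc -fopenmp sum.c -o sum     Run: mpirun -np 4 ./sum */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <omp.h>

#define N 1000000

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Distributed memory: each process owns only its slice of the data. */
    int local_n = N / size + (rank < N % size ? 1 : 0);
    double *local_a = malloc(local_n * sizeof(double));
    for (int i = 0; i < local_n; i++)
        local_a[i] = 1.0;                 /* dummy data */

    /* Shared memory: threads on this node share local_a[]; the
     * reduction clause combines their partial sums. */
    double local_sum = 0.0;
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < local_n; i++)
        local_sum += local_a[i];

    /* Explicit communication combines the per-process results. */
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
               0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %.0f (expected %d)\n", global_sum, N);

    free(local_a);
    MPI_Finalize();
    return 0;
}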

Seeking Concurrency
- Data dependence graphs
- Data parallelism
- Functional parallelism
- Pipelining

Data Dependence Graph
- Directed graph
- Vertices = tasks
- Edges = dependencies
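A minimal sketch of how such a graph might be represented and queried in C (the specific tasks and edges are illustrative assumptions, not from the slides): tasks with no incoming edges have no unmet dependencies and can run concurrently.

/* Data dependence graph: vertices are tasks; a directed edge u -> v
 * means task v depends on task u's result. */
#include <stdio.h>

#define NTASKS 4

int main(void)
{
    /* dep[u][v] = 1 means an edge u -> v (v must wait for u). */
    int dep[NTASKS][NTASKS] = {0};
    dep[0][2] = 1;   /* task 2 needs task 0 */
    dep[1][2] = 1;   /* task 2 needs task 1 */
    dep[2][3] = 1;   /* task 3 needs task 2 */

    /* A task with no incoming edges is ready to run immediately;
     * all such tasks can execute concurrently. */
    for (int v = 0; v < NTASKS; v++) {
        int indegree = 0;
        for (int u = 0; u < NTASKS; u++)
            indegree += dep[u][v];
        printf("task %d: %s\n", v,
               indegree == 0 ? "ready (no dependencies)" : "must wait");
    }
    return 0;
}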

Data Parallelism
- Independent tasks apply the same operation to different elements of a data set
- Okay to perform the operations concurrently
- Speedup: potentially p-fold, where p = number of processors

for i ← 0 to 99 do
  a[i] ← b[i] + c[i]
endfor
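The loop on this slide translates directly into C with OpenMP; a minimal sketch (the input values are illustrative assumptions, not from the slides):

/* Data-parallel loop: each iteration is independent, so the iterations
 * can be divided among the available processors. */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    double a[100], b[100], c[100];

    for (int i = 0; i < 100; i++) {      /* arbitrary input data */
        b[i] = i;
        c[i] = 2 * i;
    }

    /* Same operation applied to different elements: data parallelism. */
    #pragma omp parallel for
    for (int i = 0; i < 100; i++)
        a[i] = b[i] + c[i];

    printf("a[99] = %.0f\n", a[99]);     /* 99 + 198 = 297 */
    return 0;
}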

Functional Parallelism
- Independent tasks apply different operations to different data elements
- In the code below, the first and second statements are independent of each other, as are the third and fourth
- Speedup: limited by the amount of concurrent sub-tasks

a ← 2
b ← 3
m ← (a + b) / 2
s ← (a^2 + b^2) / 2
v ← s - m^2
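A minimal sketch of the same five statements in C with OpenMP sections (an assumed mapping, not from the original slides): m and s are computed concurrently by different operations, and v waits for both.

/* Functional parallelism: m and s are independent of each other, so
 * they can run in parallel; v depends on both and must wait. */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    double a, b, m, s, v;

    a = 2;                       /* first statement  */
    b = 3;                       /* second statement */

    #pragma omp parallel sections
    {
        #pragma omp section
        m = (a + b) / 2;         /* third statement  */
        #pragma omp section
        s = (a * a + b * b) / 2; /* fourth statement */
    }

    v = s - m * m;               /* depends on both m and s */
    printf("m = %.2f, s = %.2f, v = %.2f\n", m, s, v);
    return 0;
}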

Pipelining
- Divide a process into stages
- Produce several items simultaneously
- Speedup: limited by the amount of concurrent sub-tasks, i.e., the number of stages in the pipeline
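A small sketch of the schedule a 3-stage pipeline follows (the stage names and item count are illustrative assumptions, not from the slides): once the pipeline fills, every stage is busy with a different item at the same time, which is where the at-most-3-fold speedup comes from.

/* Prints the pipeline schedule: at time step t, stage s works on
 * item (t - s), if that item exists. */
#include <stdio.h>

#define STAGES 3
#define ITEMS  5

int main(void)
{
    const char *stage_name[STAGES] = { "fetch", "compute", "store" };

    for (int t = 0; t < ITEMS + STAGES - 1; t++) {
        printf("time %d:", t);
        for (int s = 0; s < STAGES; s++) {
            int item = t - s;
            if (item >= 0 && item < ITEMS)
                printf("  %s(item %d)", stage_name[s], item);
        }
        printf("\n");
    }
    return 0;
}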