Parallel Computing
Presented by Justin Reschke, 9-14-04

Overview
- Concepts and Terminology
- Parallel Computer Memory Architectures
- Parallel Programming Models
- Designing Parallel Programs
- Parallel Algorithm Examples
- Conclusion

Concepts and Terminology: What is Parallel Computing?
Traditionally, software has been written for serial computation. Parallel computing is the simultaneous use of multiple compute resources to solve a computational problem.

Concepts and Terminology: Why Use Parallel Computing?
- Saves time (wall-clock time)
- Saves cost
- Overcomes the memory constraints of a single machine
- It’s the future of computing

Concepts and Terminology: Flynn’s Classical Taxonomy
Classifies multiprocessor architectures along the instruction and data dimensions:
- SISD – Single Instruction, Single Data
- SIMD – Single Instruction, Multiple Data
- MISD – Multiple Instruction, Single Data
- MIMD – Multiple Instruction, Multiple Data

Flynn’s Classical Taxonomy: SISD
Serial: only one instruction stream and one data stream are acted on during any one clock cycle.

Flynn’s Classical Taxonomy: SIMD
All processing units execute the same instruction at any given clock cycle, but each processing unit operates on a different data element.
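As an illustrative sketch not taken from the slides, an element-wise loop like the one below is the shape of code a vectorizing compiler can map onto SIMD hardware: one instruction stream, several data elements processed per instruction.

```c
/* Element-wise addition: a natural SIMD candidate. A vectorizing
 * compiler can emit one SIMD add that processes several elements
 * of a, b, and c at once. */
void vector_add(const float *a, const float *b, float *c, int n)
{
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];   /* same operation, different data elements */
}
```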

Flynn’s Classical Taxonomy: MISD
Different instructions operate on a single data element. Very few practical uses exist for this class. Example: multiple cryptography algorithms attempting to crack a single coded message.

Flynn’s Classical Taxonomy: MIMD Can execute different instructions on different data elements. Most common type of parallel computer.

Concepts and Terminology: General Terminology
- Task – a logically discrete section of computational work
- Parallel task – a task that can be executed safely by multiple processors
- Communications – data exchange between parallel tasks
- Synchronization – the coordination of parallel tasks in real time

Concepts and Terminology: More Terminology
- Granularity – the ratio of computation to communication
  - Coarse – high computation, low communication
  - Fine – low computation, high communication
- Parallel overhead, which includes:
  - Synchronizations
  - Data communications
  - Overhead imposed by compilers, libraries, tools, operating systems, etc.

Parallel Computer Memory Architectures: Shared Memory Architecture
- All processors access all memory as a single global address space.
- Data sharing is fast.
- Scalability is limited: traffic on the memory-CPU path grows as CPUs are added.

Parallel Computer Memory Architectures: Distributed Memory
- Each processor has its own memory.
- Scalable, with no overhead for maintaining cache coherency.
- The programmer is responsible for many details of communication between processors.

Parallel Programming Models
Exist as an abstraction above hardware and memory architectures. Examples:
- Shared Memory
- Threads
- Message Passing
- Data Parallel

Parallel Programming Models: Shared Memory Model
- Appears to the user as a single shared memory, regardless of the underlying hardware implementation.
- Locks and semaphores may be used to control access to shared memory (see the sketch after this list).
- Program development can be simplified, since there is no need to explicitly specify communication between tasks.
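A minimal sketch of the locking idea, assuming POSIX threads (the counter and loop bound are hypothetical): the mutex serializes updates to a variable that lives in the shared address space, so concurrent tasks cannot interleave destructively.

```c
#include <pthread.h>

long shared_counter = 0;   /* lives in the single shared address space */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg)
{
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);     /* acquire before touching shared data */
        shared_counter++;
        pthread_mutex_unlock(&lock);   /* release so other tasks may proceed */
    }
    return NULL;
}
```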

Parallel Programming Models: Threads Model
- A single process may have multiple, concurrent execution paths.
- Typically used with a shared memory architecture.
- The programmer is responsible for determining all parallelism (see the sketch after this list).
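A minimal POSIX-threads sketch of the model (the thread count and task body are hypothetical): one process, several concurrent execution paths, with the programmer deciding what runs in parallel.

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4

void *task(void *arg)
{
    int id = *(int *)arg;
    printf("thread %d: one execution path inside the same process\n", id);
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_THREADS];
    int ids[NUM_THREADS];

    for (int i = 0; i < NUM_THREADS; i++) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, task, &ids[i]);  /* spawn a path */
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);   /* wait for every path to finish */
    return 0;
}
```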

Parallel Programming Models: Message Passing Model
- Tasks exchange data by sending and receiving messages.
- Typically used with distributed memory architectures.
- Data transfer requires cooperative operations by each process, e.g. a send operation must have a matching receive operation (illustrated below).
- MPI (Message Passing Interface) is the de facto standard interface for message passing.
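A minimal MPI sketch of the cooperative send/receive pairing, assuming the program is launched with at least two processes (e.g. mpirun -np 2); the payload value is hypothetical.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* the send...      */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                          /* ...needs a match */
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```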

Parallel Programming Models: Data Parallel Model
- Tasks perform the same operations on a data set, each task working on a separate piece of the set (see the sketch after this list).
- Works well with either shared memory or distributed memory architectures.
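On shared memory, OpenMP is one common way to express this model. A hedged sketch (the scaling kernel is hypothetical): every thread applies the same operation to its own chunk of the array.

```c
#include <omp.h>

/* Each thread performs the same operation on a separate piece of x. */
void scale(double *x, int n, double factor)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        x[i] *= factor;
}
```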

Designing Parallel Programs: Automatic Parallelization
- The compiler analyzes code and identifies opportunities for parallelism.
- The analysis includes determining whether the parallelism would actually improve performance.
- Loops are the most frequent target for automatic parallelization.

Designing Parallel Programs: Manual Parallelization
First, understand the problem.
- A parallelizable problem: calculate the potential energy for each of several thousand independent conformations of a molecule, then find the minimum-energy conformation. The conformations are independent, so they can be evaluated concurrently.
- A non-parallelizable problem: the Fibonacci series, where all calculations depend on earlier results (see the sketch below).
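To make the dependence concrete, a serial sketch: each term needs the two terms before it, so no step can begin until its predecessors finish, leaving no independent work to distribute.

```c
/* fib(k) depends on fib(k-1) and fib(k-2): a chain of dependencies. */
long fib(int n)
{
    long a = 0, b = 1;            /* fib(0), fib(1) */
    for (int k = 2; k <= n; k++) {
        long next = a + b;        /* cannot start until a and b are known */
        a = b;
        b = next;
    }
    return n == 0 ? 0 : b;
}
```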

Designing Parallel Programs: Domain Decomposition Each task handles a portion of the data set.
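One common way to assign those portions is a block partition. A hedged sketch (the helper name is hypothetical): task rank, out of p tasks, owns elements [lo, hi) of an n-element data set, with slice sizes balanced to within one element.

```c
/* Block decomposition: which slice of n elements does task `rank`
 * (out of p tasks) own? Returns the half-open range [lo, hi). */
void my_block(int rank, int p, int n, int *lo, int *hi)
{
    *lo = (int)((long)rank * n / p);         /* first owned element */
    *hi = (int)((long)(rank + 1) * n / p);   /* one past the last   */
}
```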

Designing Parallel Programs: Functional Decomposition
Each task performs one function of the overall work (e.g., one stage of a pipeline).

Parallel Algorithm Examples: Array Processing
Serial solution:
- Perform a function on each element of a 2D array.
- A single processor iterates through the elements in sequence.
Possible parallel solution:
- Assign each processor a partition of the array.
- Each process iterates through its own partition (one possible MPI realization follows).
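One possible MPI realization of the parallel solution, shown for a 1D slice and assuming the element count divides evenly among the processes; the per-element kernel f is hypothetical.

```c
#include <mpi.h>

static double f(double x) { return x * x; }   /* hypothetical per-element kernel */

void process_array(double *full, int n)       /* `full` need only be valid on rank 0 */
{
    int rank, p;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    int chunk = n / p;                        /* assumes p divides n evenly */
    double part[chunk];

    /* Hand each process its own partition of the array. */
    MPI_Scatter(full, chunk, MPI_DOUBLE, part, chunk, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    for (int i = 0; i < chunk; i++)           /* each process works on its slice */
        part[i] = f(part[i]);

    /* Collect the processed partitions back on rank 0. */
    MPI_Gather(part, chunk, MPI_DOUBLE, full, chunk, MPI_DOUBLE, 0, MPI_COMM_WORLD);
}
```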

Parallel Algorithm Examples: Odd-Even Transposition Sort
The basic idea is bubble sort, but concurrently comparing odd-indexed elements with their right neighbors, then even-indexed elements with theirs. With n elements and n/2 processors, the algorithm effectively runs in O(n) time.

Parallel Algorithm Examples: Odd-Even Transposition Sort
Worst-case scenario, initial array: 6, 5, 4, 3, 2, 1, 0
Phase 1: 6, 4, 5, 2, 3, 0, 1
Phase 2: 4, 6, 2, 5, 0, 3, 1
Phase 1: 4, 2, 6, 0, 5, 1, 3
Phase 2: 2, 4, 0, 6, 1, 5, 3
Phase 1: 2, 0, 4, 1, 6, 3, 5
Phase 2: 0, 2, 1, 4, 3, 6, 5
Phase 1: 0, 1, 2, 3, 4, 5, 6
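The slides give the idea and the trace; as a hedged illustration, here is a minimal serial implementation in C. In a parallel version, each of the n/2 processors would perform one compare-exchange per phase, so the phases above complete in O(n) total.

```c
#include <stdio.h>

static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Odd-even transposition sort: alternate compare-exchanges starting
 * at odd indices (Phase 1) and even indices (Phase 2). The exchanges
 * within one phase are independent, so they can run concurrently. */
void odd_even_sort(int a[], int n)
{
    for (int phase = 0; phase < n; phase++) {
        int start = (phase % 2 == 0) ? 1 : 0;
        for (int i = start; i + 1 < n; i += 2)
            if (a[i] > a[i + 1])
                swap(&a[i], &a[i + 1]);
    }
}

int main(void)
{
    int a[] = {6, 5, 4, 3, 2, 1, 0};   /* the worst case traced above */
    odd_even_sort(a, 7);
    for (int i = 0; i < 7; i++)
        printf("%d ", a[i]);           /* prints 0 1 2 3 4 5 6 */
    printf("\n");
    return 0;
}
```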

Other Parallelizable Problems
- The n-body problem
- Floyd’s algorithm – serial: O(n^3), parallel: O(n log p) (serial kernel sketched below)
- Game trees
- Divide-and-conquer algorithms
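For concreteness, a sketch of the serial O(n^3) Floyd's algorithm kernel (not from the slides); a parallel version typically distributes rows of the distance matrix across processes.

```c
/* Floyd's algorithm: all-pairs shortest paths on an n x n distance
 * matrix d stored row-major; d[i*n + j] is the best known distance
 * from vertex i to vertex j. Three nested loops give O(n^3). */
void floyd(double *d, int n)
{
    for (int k = 0; k < n; k++)
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (d[i*n + k] + d[k*n + j] < d[i*n + j])
                    d[i*n + j] = d[i*n + k] + d[k*n + j];
}
```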

Conclusion
- Parallel computing can dramatically reduce the wall-clock time of suitable computations.
- There are many different approaches and models of parallel computing.
- Parallel computing is the future of computing.

References
A Library of Parallel Algorithms, www-2.cs.cmu.edu/~scandal/nesl/algorithms.html
Internet Parallel Computing Archive, wotug.ukc.ac.uk/parallel
Introduction to Parallel Computing
Parallel Programming in C with MPI and OpenMP, Michael J. Quinn, McGraw-Hill Higher Education, 2003
The New Turing Omnibus, A. K. Dewdney, Henry Holt and Company, 1993