CHAPTER 12: INTRODUCTION TO PARALLEL PROCESSING. CS 147, Guy Wong. Pages 514-526.


What is Parallel Processing? Parallel processing is another method used to improve performance in a computer system. When a system processes two different instructions simultaneously, it is performing parallel processing.

Topics Include
- Parallelism in Uniprocessor Systems
- Parallelism in Multiprocessor Systems
  - Flynn’s Classification
  - System Topologies
  - MIMD System Architectures

Parallelism in Uniprocessor Systems A uniprocessor (one-CPU) system can perform two or more tasks simultaneously; the tasks need not be related to each other. So, a system that processes two different instructions simultaneously could be considered to perform parallel processing.

Example of Uniprocessor System Recall from Chapter 11 that the instruction pipeline is similar to a manufacturing assembly line. Suppose the assembly line is partitioned into four stages: the first stage receives some parts, performs its assembly task, and passes the results to the second stage; the second stage takes the partially assembled product, performs its task, and passes its work to the third stage; the third stage does its work, passing the results to the last stage, which completes the task and outputs its results. As the first piece moves from the first stage to the second, a new set of parts for a new piece enters the first stage. Once the line is full, every stage processes a piece simultaneously. This is how time is saved, and it is an example of parallelism in a uniprocessor system.
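
As a rough illustration of the saving, here is a minimal timing sketch, assuming k equal-length stages that each take one time unit and n independent tasks (the numbers are illustrative):

```python
# Pipeline timing sketch: k stages, each taking one time unit, n tasks.
# Unpipelined: every task occupies the whole unit for k units -> n * k.
# Pipelined: the first task takes k units; each later task completes
# one unit after the previous one -> k + (n - 1).

def unpipelined_time(n_tasks: int, k_stages: int) -> int:
    return n_tasks * k_stages

def pipelined_time(n_tasks: int, k_stages: int) -> int:
    return k_stages + (n_tasks - 1)

n, k = 100, 4
print(unpipelined_time(n, k))   # 400 time units
print(pipelined_time(n, k))     # 103 time units
print(unpipelined_time(n, k) / pipelined_time(n, k))  # speedup ~ 3.88
```

With many tasks, the speedup approaches the number of stages, which is why the four-stage line keeps all four stages busy.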

Reconfigurable pipeline Another example of parallelism in a uniprocessor system is the reconfigurable arithmetic pipeline, in which each stage has a multiplexer at its input. The multiplexer may pass the input data, or the data output from other stages, to the stage inputs.
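
A toy model of the idea, with made-up stage functions, where each stage's "multiplexer" is just a table entry selecting either the external input or another stage's output:

```python
# Toy reconfigurable pipeline: sources[i] plays the role of the
# multiplexer at stage i's input, naming either the external input
# ("in") or the output of an earlier stage j. Stage functions are
# invented for illustration only.

stages = [lambda x: x + 1,    # stage 0
          lambda x: x * 2,    # stage 1
          lambda x: x - 3]    # stage 2

def run_configuration(data, sources):
    outputs = {}
    for i, stage in enumerate(stages):
        src = sources[i]
        value = data if src == "in" else outputs[src]
        outputs[i] = stage(value)
    return outputs

# Configuration A: a straight pipeline, in -> 0 -> 1 -> 2.
print(run_configuration(5, {0: "in", 1: 0, 2: 1}))    # {0: 6, 1: 12, 2: 9}
# Configuration B: stage 2's multiplexer selects the external input.
print(run_configuration(5, {0: "in", 1: 0, 2: "in"})) # {0: 6, 1: 12, 2: 2}
```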

Vector Arithmetic Unit A vector arithmetic unit is used to perform different arithmetic operations in parallel. It contains multiple functional units: some perform addition, others subtraction, and others perform other functions.

A Vector Arithmetic Unit To add two numbers, the control unit routes the values to an adder unit. For the operations A = B + C and D = E - F, the CPU would route B and C to an adder and send E and F to a subtracter; this allows the CPU to execute both instructions simultaneously.
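
A sketch of this dispatch, with Python threads standing in for the adder and subtracter functional units (the variable names follow the slide; the values are illustrative):

```python
# The "control unit" dispatches B+C to one functional unit and E-F to
# another at the same time; both results come back independently.
from concurrent.futures import ThreadPoolExecutor
import operator

B, C, E, F = 7, 3, 10, 4

with ThreadPoolExecutor(max_workers=2) as units:
    adder = units.submit(operator.add, B, C)        # A <- B + C
    subtracter = units.submit(operator.sub, E, F)   # D <- E - F
    A, D = adder.result(), subtracter.result()

print(A, D)   # 10 6
```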

Parallelism in Multiprocessor Systems Parallel processing systems achieve parallelism by having more than one processor perform tasks simultaneously. Since multiprocessor systems are more complicated than uniprocessor systems, there are many different ways to organize the processors and memory. Michael J. Flynn therefore proposed a classification based on the flow of instructions and data within the computer, known as Flynn’s classification.

Flynn’s Classification Flynn’s classification is based on instruction and data processing: a computer is classified by whether it processes a single instruction at a time or multiple instructions simultaneously, and by whether it operates on one data set or multiple data sets.

Categories of Flynn’s Classification
- SISD: single instruction with single data
- SIMD: single instruction with multiple data
- MISD: multiple instruction with single data
- MIMD: multiple instruction with multiple data

Single Instruction Single Data (SISD) SISD machines execute a single instruction on individual data values using a single processor. Even if the processor incorporates internal parallelism, such as an instruction pipeline, the computer is still classified as SISD.

(SIMD) Single Instruction Multiple Data As its name implies, an SIMD machine executes a single instruction on multiple data values simultaneously, using many processors. Since there is only one instruction stream, each processor does not have to fetch and decode each instruction; instead, a single control unit handles this task for all of the processors within the SIMD computer.
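
For a feel of the programming model, NumPy's vectorized expressions (assuming NumPy is available) are a conceptual analogue of SIMD: one expressed operation applied to many data values, often mapped to SIMD hardware instructions underneath:

```python
# Conceptual SIMD: a single operation ("add 1") applied to many data
# elements at once.
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8])
result = data + 1    # one operation, eight data values
print(result)        # [2 3 4 5 6 7 8 9]

# The SISD equivalent: one instruction stream, one datum at a time.
result_sisd = [x + 1 for x in [1, 2, 3, 4, 5, 6, 7, 8]]
```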

(MISD) Multiple Instruction Single Data This classification is not practical to implement, so no significant MISD computers have ever been built. It is included for completeness of the classification.

(MIMD) Multiple Instruction Multiple Data Systems referred to as multiprocessors or multicomputers are usually MIMD. Unlike an SIMD machine, an MIMD machine may execute multiple different instructions simultaneously, so each processor must include its own control unit. MIMD machines are well suited to general-purpose use.
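
A sketch of the MIMD model, with Python threads standing in for processors that each run their own instruction stream on their own data:

```python
# MIMD sketch: two independent instruction streams, each with its own
# data, running at the same time. Each thread models a processor with
# its own control unit.
import threading

def sum_squares(values, out, key):     # one program...
    out[key] = sum(v * v for v in values)

def count_evens(values, out, key):     # ...and a completely different one
    out[key] = sum(1 for v in values if v % 2 == 0)

results = {}
t1 = threading.Thread(target=sum_squares, args=([1, 2, 3], results, "sq"))
t2 = threading.Thread(target=count_evens, args=([4, 5, 6, 7], results, "ev"))
t1.start(); t2.start()
t1.join(); t2.join()
print(results)   # {'sq': 14, 'ev': 2}
```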

System Topologies The topology of a multiprocessor system refers to the pattern of connections between its processors. Various factors, typically involving a cost-performance tradeoff, determine which topology a computer designer will select for a multiprocessor system.

Topology Although topologies differ greatly, standard metrics are used to quantify them:
- Diameter: the maximum distance between two processors in the computer system.
- Bandwidth: the capacity of a communications link multiplied by the number of such links in the system.
- Bisection bandwidth: the maximum data transfer that could occur at the bottleneck in the topology, i.e., across the links that are cut when the system is split into two halves.
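
A worked example for one concrete case, assuming an 8-processor bidirectional ring with an illustrative per-link capacity of 100 MB/s:

```python
# Metrics for an n-processor bidirectional ring; n and c are assumed.
n, c = 8, 100                # 8 processors, 100 MB/s per link

diameter = n // 2            # farthest pair is halfway around -> 4 hops
total_bandwidth = n * c      # n links of capacity c -> 800 MB/s
bisection_bandwidth = 2 * c  # halving the ring cuts exactly 2 links -> 200 MB/s

print(diameter, total_bandwidth, bisection_bandwidth)
```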

Examples of Topologies (patterns of connections between processors)
- Shared bus topology
- Ring topology
- Tree topology
- Mesh topology
- Hypercube topology
- Completely connected topology

Shared Bus Topology In this topology, processors communicate with each other exclusively through a shared bus. However, the bus can handle only one data transmission at a time. In most shared-bus systems, processors communicate directly with their own local memory.
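
A toy model of that constraint, using a lock to play the role of the single shared bus:

```python
# Only one processor may hold the "bus" at a time; the others wait.
import threading

bus = threading.Lock()

def send(processor, message):
    with bus:   # acquire the bus, transmit, release
        print(f"P{processor} on the bus: {message}")

threads = [threading.Thread(target=send, args=(i, "hello")) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```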

Ring Topology The ring topology uses direct connections between processors instead of a shared bus. This allows all communication links to be active simultaneously. However, data may have to travel through several processors to reach its destination.

Tree Topology Like the ring, the tree topology uses direct connections between processors; each processor has up to three connections (its parent and two children). There is only one unique path between any pair of processors.
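
A sketch of that unique path, assuming a heap-style node labeling (root = 1, children of node i are 2i and 2i+1; the labeling is an assumption, not from the slide), which makes the lowest-common-ancestor routing concrete:

```python
# The unique path between two tree nodes goes up from each node to
# their lowest common ancestor.
def path_to_root(i):
    path = [i]
    while i > 1:
        i //= 2          # parent of node i under heap-style labels
        path.append(i)
    return path

def path_between(a, b):
    up_a, up_b = path_to_root(a), path_to_root(b)
    common = next(x for x in up_a if x in up_b)   # lowest common ancestor
    down_b = list(reversed(up_b[:up_b.index(common)]))
    return up_a[:up_a.index(common) + 1] + down_b

print(path_between(4, 5))   # [4, 2, 5]
print(path_between(4, 6))   # [4, 2, 1, 3, 6] -- up through the root
```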

Mesh Topology In the mesh topology, every processor connects to the processors above and below it, and to its right and left.
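
The connection rule in a few lines, for an R x C grid (edge processors simply have fewer neighbors):

```python
# Neighbors of processor (r, c) in an R x C mesh: above, below,
# left, and right, clipped at the grid boundary.
def mesh_neighbors(r, c, R, C):
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in candidates if 0 <= i < R and 0 <= j < C]

print(mesh_neighbors(1, 1, 3, 3))   # interior: [(0, 1), (2, 1), (1, 0), (1, 2)]
print(mesh_neighbors(0, 0, 3, 3))   # corner:   [(1, 0), (0, 1)]
```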

Hypercube Topology The hypercube is a multidimensional mesh topology. Each processor connects to all other processors whose binary values differ by one bit.
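
The neighbor rule falls directly out of the binary labels: flipping any single bit of a processor's number (an XOR with a power of two) names one of its neighbors:

```python
# In a d-dimensional hypercube, processors are numbered 0 .. 2^d - 1,
# and two processors are connected exactly when their binary labels
# differ in one bit.
def hypercube_neighbors(p, d):
    return [p ^ (1 << bit) for bit in range(d)]

print(hypercube_neighbors(0b101, 3))   # [4, 7, 1], i.e. 100, 111, 001
```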

Completely connected Topology In the most extreme connection scheme, the processors are completely connected. Every processor has (n-1) connections, one to each of the other processors. This increases the complexity of the processors as the system grows, but offers maximum communication capabilities.
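
Putting the topologies side by side, here are the standard diameter and link-count formulas (assuming n is a power of two for the hypercube, and a square mesh of side sqrt(n)):

```python
# Diameter and number of links for n processors under each topology.
import math

def summary(n):
    d = int(math.log2(n))        # hypercube dimension
    side = int(math.isqrt(n))    # mesh side length
    return {
        "ring":      {"diameter": n // 2,         "links": n},
        "mesh":      {"diameter": 2 * (side - 1), "links": 2 * side * (side - 1)},
        "hypercube": {"diameter": d,              "links": n * d // 2},
        "complete":  {"diameter": 1,              "links": n * (n - 1) // 2},
    }

for topo, m in summary(16).items():
    print(f"{topo:10s} diameter={m['diameter']:2d} links={m['links']}")
```

The table this prints makes the cost-performance tradeoff concrete: the completely connected scheme has the smallest diameter but by far the most links.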

MIMD System Architectures The architecture of an MIMD system, as opposed to its topology, refers to its connections with respect to system memory. There are two types of architectures: uniform memory access (UMA) and nonuniform memory access (NUMA).

(UMA) Uniform Memory Access The UMA architecture gives all CPUs equal (uniform) access to all locations in shared memory. The CPUs interact with shared memory through some communications mechanism, such as a simple bus or a complex multistage interconnection network.

(NUMA) Nonuniform Memory Access In contrast to UMA architectures, NUMA architectures do not allow uniform access to all shared memory locations, although every processor can still access every shared memory location. Each processor can access the memory module closest to it, its local shared memory, more quickly than the other modules, so the memory access times are nonuniform.
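
A toy cost model of the nonuniformity, with illustrative latencies (the numbers are assumptions, not measurements):

```python
# Every processor can reach every memory module, but its local module
# is faster; that asymmetry is what makes the architecture "nonuniform".
LOCAL_NS, REMOTE_NS = 100, 300   # assumed access latencies

def access_time(processor, module):
    return LOCAL_NS if processor == module else REMOTE_NS

print(access_time(0, 0))   # 100 ns: processor 0 reading its local module
print(access_time(0, 3))   # 300 ns: a remote module, still accessible
```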

The End