
INTRODUCTION TO PARALLEL PROCESSING
Chapter 12
Shobhana Rajan, 7/16/2001

The Primary Objective of any Computer Design: To correctly fetch, decode, and execute every instruction in its instruction set, producing correct results. Beyond this, computer architects may seek to maximize system performance.

What is Parallel Processing? A method used to improve performance in a computer system. A uniprocessor system can achieve parallelism, but most parallel processing systems are multiprocessor systems.

What is Parallel Processing? Parallelism means two or more things happening at the same time. A system that processes two different instructions simultaneously can be considered to perform parallel processing, but a system that performs different operations on the same instruction cannot.

What is Parallel Processing? Example: A relatively simple CPU includes the following RTL statement as part of its instruction FETCH routine: FETCH2: DR <- M, PC <- PC + 1. Two micro-operations, copying the contents of M to DR and loading PC + 1 into PC, occur during this state, but both serve the same instruction. Therefore, this is not parallel processing.
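To make the distinction concrete, here is a tiny Python sketch of the FETCH2 state (the register and memory names mirror the RTL; the stored instruction word is made up):

```python
# Both micro-operations read the old values and commit together in one
# state: concurrent transfers serving ONE instruction, which is why this
# does not count as parallel processing.
state = {"PC": 100, "DR": 0, "M": {100: 0x1A2B}}  # 0x1A2B: made-up word

def fetch2(s):
    new_dr = s["M"][s["PC"]]   # DR <- M[PC]
    new_pc = s["PC"] + 1       # PC <- PC + 1
    s["DR"], s["PC"] = new_dr, new_pc  # both commit in the same "clock"

fetch2(state)
print(hex(state["DR"]), state["PC"])  # 0x1a2b 101
```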

Parallelism in Uniprocessor Systems: A uniprocessor system may achieve parallelism in any of the following ways. Instruction Pipelines: Each instruction requires several cycles to be fetched, decoded, and executed, but while one instruction is being decoded the next can already be fetched; once the pipeline is full, the processor completes one instruction per clock cycle. Example: the IBM 801, which used a four-stage instruction pipeline: fetch instruction; decode instruction and select registers; execute instruction; store result.
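A short Python sketch of the overlap (stage names follow the slide; the instruction count is arbitrary): once the pipeline fills, one instruction completes per cycle, so 5 instructions need 8 cycles rather than 20.

```python
# Print which instruction occupies each pipeline stage on each cycle.
STAGES = ["Fetch", "Decode/Select", "Execute", "Store"]

def schedule(n_instructions):
    for cycle in range(n_instructions + len(STAGES) - 1):
        row = []
        for s in range(len(STAGES)):
            i = cycle - s          # instruction i is in stage s at cycle i + s
            row.append(f"I{i}" if 0 <= i < n_instructions else "--")
        print(f"cycle {cycle}: " + "  ".join(row))

schedule(5)   # 5 instructions complete in 5 + 4 - 1 = 8 cycles
```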

Parallelism in Uniprocessor Systems: Reconfigurable Arithmetic Pipelines: Each stage has a multiplexer at its input. The control unit of the CPU sets the select signals of each multiplexer to control the flow of data. [Figure: a pipeline of stages (*, then -, then +), each preceded by a 4-input MUX with select signals S1 S0 and followed by a latch; a final MUX routes the result to memory and registers.]
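A minimal Python model of the reconfiguration idea (the stage operations follow the figure; operand routing is simplified to a single selectable source per stage):

```python
import operator

STAGE_OPS = [operator.mul, operator.sub, operator.add]  # *, -, + stages

def run_pipeline(x, sources, selects):
    """selects[i] plays the role of stage i's MUX select signals (S1 S0):
    it chooses which source feeds the stage's second operand."""
    latch = x
    for op, sel in zip(STAGE_OPS, selects):
        latch = op(latch, sources[sel])  # MUX output into the stage, then latch
    return latch

# Compute x*3 - 1 + 10; changing only `selects` reconfigures the pipeline.
print(run_pipeline(5, sources=[3, 1, 10], selects=[0, 1, 2]))  # 24
```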

Parallelism in Uniprocessor Systems: Vectored Arithmetic Unit: Arithmetic pipelines cannot perform different operations simultaneously. A vectored arithmetic unit contains multiple functional units that perform different operations in parallel. [Figure: data inputs fan out through data input connections to separate +, -, *, and / functional units.]

Parallelism in Uniprocessor Systems: Vectored Arithmetic Unit: Problem: getting all the data to the vectored arithmetic unit. The CPU can address this by using multiple buses or very wide data buses. The system can improve performance by allowing multiple simultaneous memory accesses, and by having the memory chips themselves handle multiple transfers simultaneously.

Multiport Memory: Designed to handle multiple transfers within the memory itself. A multiport memory chip has two sets of address, data, and control pins for simultaneous data transfers. The CPU and a DMA controller can transfer data concurrently, and a system with more than one CPU can handle simultaneous requests from two different processors.

Multiport Memory: Advantage: it can handle two requests to read data from the same location at the same time. Disadvantage: it cannot process two simultaneous requests to write data to the same memory location, or to read from and write to the same memory location.
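The arbitration rule can be sketched in a few lines of Python (a dual-port chip is assumed; the request format is invented for illustration):

```python
def may_proceed_together(req_a, req_b):
    """Each request is (op, addr), op in {'read', 'write'}. Returns True
    if both ports can be serviced in the same cycle."""
    (op_a, addr_a), (op_b, addr_b) = req_a, req_b
    if addr_a != addr_b:
        return True                           # different locations never conflict
    return op_a == "read" and op_b == "read"  # same location: two reads only

print(may_proceed_together(("read", 0x40), ("read", 0x40)))    # True
print(may_proceed_together(("read", 0x40), ("write", 0x40)))   # False
print(may_proceed_together(("write", 0x40), ("write", 0x40)))  # False
```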

Organization of Multiprocessor Systems: There are many ways to organize the processors and memory within a multiprocessor system, and different ways to classify these systems, including Flynn's classification, system topologies, and MIMD system architectures.

Flynn’s Classification: A commonly accepted taxonomy of computer organization, proposed by researcher Michael J. Flynn. The classification is based on the flow of instructions and the flow of data within the computer.

Flynn’s Classification (contd.): A computer is classified by whether it processes a single instruction or multiple instructions at a time, and whether it operates on one or multiple data sets. The four categories are as follows: 1) SISD: Single Instruction Single Data. 2) SIMD: Single Instruction Multiple Data. 3) MISD: Multiple Instruction Single Data. 4) MIMD: Multiple Instruction Multiple Data.

Flynn’s Classification (contd.): SISD Machines: SISD machines consist of a single CPU executing individual instructions on individual data items. This is the classic von Neumann architecture studied in this text. MISD Machines: The MISD classification is not practical to implement; no significant MISD machines have been built to date. SIMD Machines: SIMD machines execute a single instruction on multiple data values simultaneously, using many processors. SIMD machines have been built and serve practical purposes.

A Generic SISD Organization: [Figure: the CPU connected to the memory subsystem by address, data, and control buses, with an I/O subsystem of I/O devices on the same buses.]

A Generic SIMD Organization: [Figure: a control unit and main memory drive an array of processor/memory pairs joined by a communication network.]
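A software analogy for the SISD/SIMD distinction (a sketch only; real SIMD parallelism is in hardware, but NumPy's vectorized operations mimic the one-instruction-many-data pattern):

```python
import numpy as np

data = list(range(8))

# SISD style: one instruction stream operates on one datum per step.
sisd = [x * 2 for x in data]

# SIMD style: one vectorized "instruction" applied to all elements at once.
simd = (np.array(data) * 2).tolist()

assert sisd == simd == [0, 2, 4, 6, 8, 10, 12, 14]
```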

MIMD Machines: Referred to as multiprocessors or multicomputers. Because these machines have multiple processors, each processor (CPU) includes its own control unit. The processors can be assigned to parts of the same task or to completely separate tasks.

Topology of a Multiprocessor System: Some definitions: Topology: The topology of a multiprocessor system refers to the pattern of connections between its processors. Diameter: The diameter is the maximum distance between two processors in the system, i.e. the largest number of links a message may have to cross to reach its final destination. Bandwidth: The total bandwidth is the capacity of a communications link multiplied by the number of such links in the system; it is a best-case figure, achieved only when every link is active simultaneously, which almost never occurs.

Topology of a Multiprocessor System: Some definitions: (contd.) Bisection Bandwidth: When a network is divided into two halves with an equal number of processors (or within one if the number of processors is odd), the total bandwidth of the links connecting the two halves is the bisection bandwidth. It is close to a worst-case figure: it represents the maximum data transfer that could occur at the bottleneck of the topology.
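These definitions can be computed directly for a small topology. A sketch (adjacency lists as input; every link is assumed to have bandwidth l = 1, and the bisection search brute-forces all balanced splits, which is fine for small n):

```python
from collections import deque
from itertools import combinations

def diameter(adj):
    def bfs(src):                       # hop counts from src to every node
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return max(dist.values())
    return max(bfs(p) for p in adj)     # worst case over all processor pairs

def bisection_bandwidth(adj):
    nodes = sorted(adj)
    best = float("inf")
    for half in combinations(nodes, len(nodes) // 2):
        a = set(half)
        crossing = sum(1 for u in a for v in adj[u] if v not in a)
        best = min(best, crossing)      # fewest links joining the two halves
    return best

ring6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(diameter(ring6))             # 3, i.e. floor(6/2)
print(bisection_bandwidth(ring6))  # 2
```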

Types of System Topologies: Shared Bus Topology. Ring Topology. Tree Topology. Mesh Topology. Hypercube. Completely Connected.

Shared Bus Topology: Processors communicate with each other exclusively via this bus. The bus can handle only one data transmission at a time. Its diameter is 1, its total bandwidth is 1*l, and its bisection bandwidth is also 1*l (where l is the bandwidth of a single link).

SHARED BUS TOPOLOGY: [Figure: processors (P) and global memory modules (M) attached to a single shared bus.]

Ring Topology: Processors communicate with each other directly rather than over a bus, and all communication links can be active simultaneously. A ring of n processors has a diameter of ⌊n/2⌋, a total bandwidth of n*l, and a bisection bandwidth of 2*l (where l is the bandwidth of a single link).

RING TOPOLOGY: [Figure: six processors (P) connected in a closed ring.]

Tree Topology: Processors communicate with each other directly, as in the ring topology, and each processor has at most three connections (its parent and two children). The tree has an advantageously low diameter of 2⌊log2 n⌋, a total bandwidth of (n-1)*l, and a bisection bandwidth of 1*l (where l is the bandwidth of a single link).

TREE TOPOLOGY: [Figure: seven processors (P) arranged as a complete binary tree.]

Mesh Topology: Every processor connects to the processors above and below it, and to its left and right. A square mesh of n processors (√n × √n) has a diameter of 2(√n - 1), a total bandwidth of (2n - 2√n)*l, and a bisection bandwidth of √n*l (where l is the bandwidth of a single link).

MESH TOPOLOGY: [Figure: nine processors (P) in a 3 × 3 grid.]

Hypercube: A multidimensional mesh with n = 2^m processors, each connected to log2 n neighbors, giving (n/2) * log2 n links in all. It has a relatively low diameter of log2 n, a total bandwidth of ((n/2) * log2 n)*l, and a bisection bandwidth of (n/2)*l (where l is the bandwidth of a single link).

HYPERCUBE: [Figure: sixteen processors (P) forming a four-dimensional hypercube.]
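The hypercube's structure has a convenient encoding: label each processor with a log2(n)-bit number; neighbors differ in exactly one bit, and the number of hops between two processors is the Hamming distance of their labels. A short sketch:

```python
def neighbors(node, dims):
    return [node ^ (1 << b) for b in range(dims)]  # flip each address bit

def hops(a, b):
    return bin(a ^ b).count("1")  # Hamming distance between labels

dims = 4                        # 16-processor (four-dimensional) hypercube
print(neighbors(0b0000, dims))  # [1, 2, 4, 8]
print(hops(0b0000, 0b1111))     # 4 hops = log2(16), the diameter
```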

Completely Connected: Every processor has n-1 connections, one to each of the other processors. Its diameter is 1, its total bandwidth is (n(n-1)/2)*l, and its bisection bandwidth is (⌈n/2⌉ * ⌊n/2⌋)*l (where l is the bandwidth of a single link).

COMPLETELY CONNECTED: [Figure: eight processors (P), each linked directly to every other.]
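Pulling the six slides together, a sketch that tabulates the quoted formulas side by side (the formulas are exactly those above; the mesh is assumed square, and l = 1):

```python
from math import ceil, floor, log2, sqrt

def metrics(n):
    """(diameter, total bandwidth / l, bisection bandwidth / l) per topology."""
    return {
        "shared bus":      (1,                  1,                   1),
        "ring":            (n // 2,             n,                   2),
        "tree":            (2 * floor(log2(n)), n - 1,               1),
        "mesh":            (2 * (sqrt(n) - 1),  2 * n - 2 * sqrt(n), sqrt(n)),
        "hypercube":       (log2(n),            (n / 2) * log2(n),   n / 2),
        "fully connected": (1,                  n * (n - 1) / 2,     ceil(n / 2) * floor(n / 2)),
    }

for topo, (d, tb, bb) in metrics(16).items():
    print(f"{topo:>15}: diameter {d:g}, total {tb:g}*l, bisection {bb:g}*l")
```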

MIMD System Architectures: The architecture of an MIMD system refers to how its processors are connected to system memory. A Symmetric Multiprocessor (SMP) is a computer system with two or more processors of comparable capability. The processors can all perform the same functions; this is the symmetry of an SMP.

Types of SMP: Uniform Memory Access (UMA). NonUniform Memory Access (NUMA). Cache Coherent NUMA (CC-NUMA). Cache Only Memory Access (COMA).

Uniform Memory Access (UMA): UMA gives all CPUs equal access to all locations in shared memory. [Figure: processors 1 through n reach shared memory through a common communications mechanism.]

NonUniform Memory Access (NUMA): NUMA architectures do not provide uniform access times to all shared memory locations. Each processor can access the memory module closest to it, its local shared memory, faster than the other modules; hence the nonuniform memory access times. Example: the Cray T3E supercomputer. [Figure: processors 1 through n, each with its own memory module 1 through n, joined by a communications mechanism.]
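A toy model of the effect (the latency numbers are invented purely for illustration): a processor reaches its own module quickly and pays an interconnect penalty for every other module.

```python
LOCAL_NS, REMOTE_NS = 100, 400   # hypothetical access latencies

def access_time(cpu, module):
    return LOCAL_NS if cpu == module else REMOTE_NS  # remote goes via interconnect

print(access_time(cpu=0, module=0))  # 100 ns: local shared memory
print(access_time(cpu=0, module=3))  # 400 ns: a remote module
```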

Cache Coherent NUMA (CC-NUMA): Similar to the NUMA architecture, except that each processor also includes cache memory. Example: Silicon Graphics' Origin 2000. Cache Only Memory Access (COMA): In this architecture, each processor's local memory is treated as a cache. Examples: 1) Kendall Square Research's KSR1 and KSR2. 2) The Swedish Institute of Computer Science's Data Diffusion Machine (DDM).

Multicomputers: A multicomputer is an MIMD machine in which the processors are not all under the control of a single operating system. Each processor or group of processors instead runs under its own operating system. A centralized scheduler allocates tasks to processors and processors to tasks.

Multicomputers: Network Of Workstations (NOW) or Cluster Of Workstations (COW): NOWs and COWs are more than just a group of workstations on a local area network (LAN); they have a master scheduler that matches tasks and processors together.

Massively Parallel Processor (MPP): Consists of many self-contained nodes, each having a processor, memory, and hardware for implementing internal communications. The nodes communicate with each other by passing messages over a high-speed internal interconnection network. Example: IBM's Blue Gene.