Lecture 13 Parallel Processing

2 What is Parallel Computing?
Traditionally, software has been written for serial computation. Parallel computing is the simultaneous use of multiple compute resources to solve a computational problem.
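A minimal sketch of the contrast (an illustration, assuming a C compiler with OpenMP support, e.g. gcc -fopenmp): the same loop can run serially, one element at a time, or be divided across cores by the runtime.

```c
#include <stdio.h>

#define N 1000000

int main(void) {
    static double a[N];
    double sum = 0.0;

    /* Serial computation: one instruction stream, one element at a time. */
    for (int i = 0; i < N; i++)
        a[i] = i * 0.5;

    /* Parallel computation: the runtime splits iterations across cores,
       and the reduction combines each thread's partial sum. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f\n", sum);
    return 0;
}
```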

3 Why Use Parallel Computing?
- Saves time (wall-clock time)
- Saves cost
- Overcomes memory constraints
- It is the future of computing
Parallelism is increasingly attractive (economics, technology, architecture, application demand) and increasingly central and mainstream. It is exploited at many levels:
- Instruction-level parallelism
- Multiprocessor servers
- Large-scale multiprocessors ("MPPs")
The focus of this class is the multiprocessor level of parallelism. The same story holds from the memory-system perspective: many local memories increase bandwidth and reduce average latency. A wide range of parallel architectures makes sense, with different cost, performance and scalability.

4 Architecture Categories
- Single instruction, single data stream (SISD)
- Single instruction, multiple data stream (SIMD)
- Multiple instruction, single data stream (MISD)
- Multiple instruction, multiple data stream (MIMD)

5 Single Instruction, Single Data Stream - SISD
- Single processor
- Single instruction stream
- Data stored in a single memory
- Uniprocessor

6 Single Instruction, Multiple Data Stream - SIMD
- A single machine instruction controls the simultaneous execution of a number of processing elements on a lockstep basis.
- Each processing element has an associated data memory, so each instruction is executed on a different set of data by different processors.
- Vector and array processors fall into this category.
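For a concrete feel of the lockstep model, a small sketch using x86 SSE intrinsics (an assumption: an x86 CPU and a compiler providing <immintrin.h>), where one instruction adds four floats at once:

```c
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    float a[4] = {1, 2, 3, 4};
    float b[4] = {10, 20, 30, 40};
    float c[4];

    /* One SIMD instruction operates on four data elements in lockstep. */
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_add_ps(va, vb);   /* single instruction, multiple data */
    _mm_storeu_ps(c, vc);

    printf("%f %f %f %f\n", c[0], c[1], c[2], c[3]);
    return 0;
}
```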

7 Multiple Instruction, Single Data Stream - MISD
- A sequence of data is transmitted to a set of processors, each of which executes a different instruction sequence.
- This organization has never been implemented.

8 Multiple Instruction, Multiple Data Stream - MIMD
- A set of processors simultaneously executes different instruction sequences on different sets of data.
- SMPs, clusters and NUMA systems fall into this category.

9 Taxonomy of Parallel Computing Paradigms
Parallel computer:
- Synchronous: vector/array, SIMD, systolic
- Asynchronous: MIMD

10 Flynn's Hardware Taxonomy
Processor organizations:
- Single instruction, single data (SISD) stream: uniprocessor
- Single instruction, multiple data (SIMD) stream: vector processor, array processor
- Multiple instruction, single data (MISD) stream
- Multiple instruction, multiple data (MIMD) stream
  - Shared memory: symmetric multiprocessor (SMP), nonuniform memory access (NUMA)
  - Distributed memory: clusters

11 Shared Memory Architecture
- All processors access all memory as a single global address space.
- Data sharing is fast.
- Scalability between memory and CPUs is limited: adding processors increases traffic on the shared memory path.
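A minimal POSIX-threads sketch of the shared-memory model (compile with -lpthread; the array name and thread count are illustrative): every thread sees the same global array, so no explicit data movement is needed.

```c
#include <pthread.h>
#include <stdio.h>

#define N 8
#define NTHREADS 4

double data[N];   /* one global address space, visible to all threads */

void *worker(void *arg) {
    long id = (long)arg;
    /* Each thread writes its own disjoint slice; no copying required. */
    for (int i = id * (N / NTHREADS); i < (id + 1) * (N / NTHREADS); i++)
        data[i] = id;
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    for (int i = 0; i < N; i++)
        printf("%.0f ", data[i]);
    printf("\n");
    return 0;
}
```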

12 Distributed Memory
- Each processor has its own local memory.
- Scalable, with no overhead for cache coherency.
- The programmer is responsible for many details of communication between processors.
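A minimal message-passing sketch (assuming an MPI implementation such as MPICH or Open MPI is installed; run with, e.g., mpirun -np 2): each rank owns its memory, and data moves only through explicit sends and receives.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;   /* lives only in rank 0's local memory */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Explicit receive: the programmer manages all communication. */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```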

13 Tightly Coupled - SMP
- Processors share memory and communicate via that shared memory.
- Symmetric multiprocessor (SMP): processors share a single memory or pool and use a shared bus to access it.
- Memory access time to a given area of memory is approximately the same for each processor.

14 Tightly Coupled - NUMA
- Nonuniform memory access: access times to different regions of memory may differ.

15 Distributed Shared Memory Architecture: NUMA
- Memory is physically distributed but logically shared; the physical layout is similar to the distributed-memory case.
- The aggregated memory of the whole system appears as one single address space.
- Because of the distributed layout, memory access performance varies depending on which CPU accesses which part of memory ("local" vs. "remote" access).
- Example: two locality domains linked through a high-speed connection called HyperTransport.
- Advantage: scalability. Disadvantages: locality problems and connection congestion.
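On Linux, the local/remote distinction can be made explicit with libnuma (an assumption: a NUMA machine with libnuma installed; link with -lnuma). A minimal sketch placing an allocation on a specific locality domain:

```c
#include <numa.h>
#include <stdio.h>

int main(void) {
    if (numa_available() < 0) {
        printf("NUMA not supported on this system\n");
        return 1;
    }
    /* Allocate 1 MiB on node 0: a CPU in node 0 sees "local" latency,
       while a CPU in another domain pays the "remote" interconnect cost. */
    size_t size = 1 << 20;
    double *buf = numa_alloc_onnode(size, 0);
    if (buf == NULL)
        return 1;
    buf[0] = 1.0;   /* touch the memory */
    numa_free(buf, size);
    return 0;
}
```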

16 Loosely Coupled - Clusters
- A collection of independent uniprocessors or SMPs, interconnected to form a cluster.
- Communication via fixed paths or network connections.

17 Symmetric Multiprocessors
A stand-alone computer with the following characteristics:
- Two or more similar processors of comparable capacity
- Processors share the same memory and I/O, and are connected by a bus or other internal connection
- Memory access time is approximately the same for each processor
- All processors share access to I/O, either through the same channels or through different channels giving paths to the same devices
- All processors can perform the same functions (hence "symmetric")
- The system is controlled by an integrated operating system that provides interaction between processors at the job, task, file and data-element levels

18 SMP Advantages
- Performance: if some work can be done in parallel
- Availability: since all processors can perform the same functions, failure of a single processor does not halt the system
- Incremental growth: users can enhance performance by adding additional processors
- Scaling: vendors can offer a range of products based on the number of processors

19 Interconnection Networks (IN)
IN topology:
- Distributed memory: static networks (1-dimensional, 2-dimensional, hypercube)
- Shared memory: dynamic networks (single-stage, multi-stage, crossbar), typical of vector and MIMD machines

20 Distributed Memory - Static Networks
- Linear array (1-dimensional)
- 2-dimensional networks: ring, star, tree, mesh
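The topology determines which nodes are directly connected. A small sketch (the function name and mesh dimensions are illustrative) computing the neighbors of a node in a 2-dimensional mesh, where interior nodes have four links and corner nodes only two:

```c
#include <stdio.h>

#define ROWS 4
#define COLS 4

/* Print the directly connected neighbors of node (r, c) in a
   ROWS x COLS mesh: up/down/left/right, with no wraparound. */
void mesh_neighbors(int r, int c) {
    if (r > 0)        printf("(%d,%d) ", r - 1, c);
    if (r < ROWS - 1) printf("(%d,%d) ", r + 1, c);
    if (c > 0)        printf("(%d,%d) ", r, c - 1);
    if (c < COLS - 1) printf("(%d,%d) ", r, c + 1);
    printf("\n");
}

int main(void) {
    mesh_neighbors(0, 0);   /* corner node: 2 neighbors */
    mesh_neighbors(1, 2);   /* interior node: 4 neighbors */
    return 0;
}
```

A ring would instead wrap the index arithmetic around modulo the dimension, giving every node the same degree.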

21 Organization Classification
- Time-shared or common bus
- Multiport memory
- Central control unit

22 Time-Shared Bus
- The simplest form: structure and interface are similar to a single-processor system.
- The following features are provided:
  - Addressing: distinguishes modules on the bus
  - Arbitration: any module can be a temporary master
  - Time-sharing: if one module has the bus, others must wait and may have to suspend
- There are now multiple processors as well as multiple I/O modules.

23 Time-Shared Bus - Advantages
- Simplicity
- Flexibility
- Reliability

24 Time-Shared Bus - Disadvantages
- Performance is limited by the bus cycle time.
- Each processor should therefore have a local cache to reduce the number of bus accesses.
- Local caches lead to problems with cache coherence, which are solved in hardware (covered later).
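One software-visible consequence of hardware coherence: when two processors repeatedly write variables that happen to share a cache line, the line bounces between their caches. A sketch of the usual mitigation (the 64-byte line size is an assumption; it varies by CPU; compile with -lpthread):

```c
#include <pthread.h>

/* Two counters in the same cache line would invalidate each other's
   cached copy on every write ("false sharing"). Padding each counter
   out to a full line keeps coherence traffic off the hot path. */
struct padded_counter {
    long value;
    char pad[64 - sizeof(long)];   /* assume 64-byte cache lines */
};

struct padded_counter counters[2];

void *bump(void *arg) {
    long id = (long)arg;
    for (int i = 0; i < 1000000; i++)
        counters[id].value++;      /* each thread touches only its own line */
    return NULL;
}

int main(void) {
    pthread_t t[2];
    for (long i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, bump, (void *)i);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return 0;
}
```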

25 Operating System Issues
- Simultaneous concurrent processes
- Scheduling
- Synchronization
- Memory management
- Reliability and fault tolerance
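Of these, synchronization is the issue most directly visible to application programmers: concurrent threads must coordinate access to shared data. A minimal POSIX sketch protecting a shared counter with a mutex (compile with -lpthread):

```c
#include <pthread.h>
#include <stdio.h>

long shared = 0;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);    /* mutual exclusion around the update */
        shared++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t[4];
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    printf("shared = %ld (expected %d)\n", shared, 4 * 100000);
    return 0;
}
```

Without the mutex, the unsynchronized increments would race and the final count would usually fall short of the expected value.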

26 Clusters
- An alternative to SMP offering high performance and high availability; attractive for server applications.
- A group of interconnected whole computers working together as a unified resource, with the illusion of being one machine.
- Each computer is called a node.

27 Cluster Benefits
- Absolute scalability
- Incremental scalability
- High availability
- Superior price/performance