CS 284a Lecture
Tuesday, 7 October 1997
Copyright (c) 1997, John Thornley
Multiprocessors and Multiprocessing
Hardware: Multiprocessor computers have become commodity products, e.g., quad-processor Pentium Pros, SGI and Sun workstations.
Programming: Multithreaded programming is supported by commodity operating systems, e.g., Windows NT, UNIX/Pthreads.
Applications: Traditionally science and engineering; now also business and home computing.
Problem: Multithreaded programming is more difficult than sequential programming (a minimal Pthreads sketch follows).
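As a point of reference for the UNIX/Pthreads programming mentioned above, the following is a minimal sketch (not taken from the lecture) of what a multithreaded C program looks like: it creates a few worker threads and waits for them to finish. The thread count and the worker body are illustrative choices; with gcc or clang it would be built with something like cc -pthread.

/* Minimal Pthreads sketch: create a few worker threads and join them.
   NUM_THREADS and the worker function are illustrative, not from the slides. */
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4

static void *worker(void *arg)
{
    int id = *(int *) arg;
    printf("thread %d running\n", id);
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_THREADS];
    int ids[NUM_THREADS];

    for (int i = 0; i < NUM_THREADS; i++) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, worker, &ids[i]);
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);

    return 0;
}

Even this tiny example hints at the extra concerns of multithreaded programming: thread creation and termination must be managed explicitly, and any data shared between threads must be coordinated.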
Why Buy a Multiprocessor?
Multiple users.
Multiple applications.
Multitasking within an application.
Responsiveness and/or throughput.
Multiprocessor Architectures
Message-Passing Architectures
– Separate address space for each processor.
– Processors communicate via message passing.
Shared-Memory Architectures
– Single address space shared by all processors.
– Processors communicate by memory reads and writes (illustrated in the sketch below).
– SMP or NUMA.
– Cache coherence is an important issue.
Lots of middle ground and hybrids.
No clear consensus on terminology.
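A small sketch of the shared-memory model, assuming Pthreads (the array sizes and the summation task are illustrative): worker threads communicate their results simply by writing into a shared array that the main thread later reads, with no explicit messages.

/* Shared-memory communication sketch: workers write partial sums into a
   shared array; the main thread reads them after joining the workers.
   The join provides the synchronization that makes the reads safe. */
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4
#define N 1000

static int data[N];
static long partial[NUM_THREADS];   /* shared: written by workers, read by main */

static void *sum_part(void *arg)
{
    int id = *(int *) arg;
    long s = 0;
    for (int i = id; i < N; i += NUM_THREADS)
        s += data[i];
    partial[id] = s;                /* communicate by writing shared memory */
    return NULL;
}

int main(void)
{
    pthread_t t[NUM_THREADS];
    int ids[NUM_THREADS];
    long total = 0;

    for (int i = 0; i < N; i++)
        data[i] = 1;
    for (int i = 0; i < NUM_THREADS; i++) {
        ids[i] = i;
        pthread_create(&t[i], NULL, sum_part, &ids[i]);
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(t[i], NULL);
        total += partial[i];        /* communicate by reading shared memory */
    }
    printf("total = %ld\n", total);
    return 0;
}

In a message-passing program the same exchange would require explicit send and receive operations; here the hardware and coherence protocol move the data.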
Message-Passing Architecture
[Figure: N processors, each with its own cache and private memory, connected by an interconnection network; processors communicate only through the network.]
Shared-Memory Architecture
[Figure: processors 1..N, each with a cache, connected through an interconnection network to shared memories 1..M.]
Shared-Memory Architecture: SMP and NUMA
SMP = Symmetric Multiprocessor
– All memory is equally close to all processors.
– Typical interconnection network is a shared bus.
– Easier to program, but doesn't scale to many processors.
NUMA = Non-Uniform Memory Access
– Each memory is closer to some processors than to others.
– a.k.a. "Distributed Shared Memory".
– Typical interconnection network is a grid or hypercube.
– Harder to program, but scales to more processors (data placement matters; see the sketch below).
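One concrete way to see why NUMA is "harder to program" is that data placement becomes the programmer's concern. The following sketch uses the Linux libnuma library, which is an assumption of this example and not something the lecture specifies, to place a buffer on a particular memory node so that threads running near that node get local accesses (build with -lnuma).

/* NUMA data-placement sketch using Linux libnuma (assumed for illustration):
   allocate a buffer on memory node 0. On an SMP the choice is irrelevant;
   on a NUMA machine it decides which accesses are local and which are remote. */
#include <numa.h>
#include <stdio.h>

int main(void)
{
    size_t size = 64 * 1024 * 1024;     /* illustrative buffer size */

    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not supported on this system\n");
        return 1;
    }

    double *buf = numa_alloc_onnode(size, 0);   /* place buffer on node 0 */
    if (buf == NULL) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    for (size_t i = 0; i < size / sizeof(double); i++)
        buf[i] = 0.0;

    numa_free(buf, size);
    return 0;
}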
Shared-Memory Architecture: Cache Coherence
Effective caching reduces memory contention.
Processors must see a single consistent memory.
Many different consistency models.
Weak consistency is sufficient.
Snoopy cache coherence for bus-based SMPs.
Distributed directories for NUMA.
Many implementation issues: multiple levels, instruction/data separation, cache line size, update policy, etc.
Usually don't need to know all the details (though see the false-sharing sketch below).
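One detail that does leak through to performance is the cache line size: when two threads repeatedly update different variables that happen to sit on the same cache line, the coherence protocol bounces the line between processors ("false sharing"). A minimal sketch, assuming a 64-byte line size (an assumption; the slide names no specific size):

/* False-sharing sketch: two threads increment separate counters.
   With the pad array, each counter sits on its own cache line; removing the
   pad lets the counters share a line, so every increment invalidates the
   other processor's cached copy. Line size and iteration count are assumed. */
#include <pthread.h>
#include <stdio.h>

#define ITERATIONS 100000000L
#define CACHE_LINE 64

struct padded_counter {
    volatile long value;
    char pad[CACHE_LINE - sizeof(long)];  /* keep each counter on its own line */
};

static struct padded_counter counters[2];

static void *bump(void *arg)
{
    struct padded_counter *c = arg;
    for (long i = 0; i < ITERATIONS; i++)
        c->value++;
    return NULL;
}

int main(void)
{
    pthread_t t0, t1;
    pthread_create(&t0, NULL, bump, &counters[0]);
    pthread_create(&t1, NULL, bump, &counters[1]);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    printf("%ld %ld\n", counters[0].value, counters[1].value);
    return 0;
}

Deleting the pad field typically makes both loops noticeably slower on a multiprocessor, even though the threads never touch each other's counter; the program's result is unchanged, only its coherence traffic differs.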
Example: Quad-Processor Pentium Pro
SMP, bus interconnection.
4 x 200 MHz Intel Pentium Pro processors.
8 Kb instruction + 8 Kb data L1 cache per processor.
512 Kb L2 cache per processor.
Snoopy cache coherence.
Sold by Compaq, HP, IBM, NetPower.
Runs Windows NT, Solaris, Linux, etc.
Example: SGI Origin 2000
NUMA, hypercube interconnection.
Up to 128 (64 x 2) MIPS R10000 processors.
32 Kb instruction + 32 Kb data L1 cache per processor.
4 Mb L2 cache per processor.
Distributed directory-based cache coherence.
Automatic page migration/replication.
SGI IRIX with Pthreads.
Message-Passing versus Shared-Memory Architectures
Shared-memory programming model is easier because data transfer is handled automatically.
Proof: message passing can be implemented efficiently on shared memory (see the sketch below), but not vice versa, since emulating shared memory requires intercepting individual reads and writes.
How much of the shared-memory programming model should be implemented in hardware?
How efficient is the shared-memory programming model?
How well does shared memory scale?
Does scalability really matter?
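To illustrate the first half of that claim, here is a sketch (using Pthreads; the channel_send/channel_recv names are illustrative, not a real message-passing library) of a one-slot message channel built from nothing but shared memory, a mutex, and condition variables.

/* Message passing on shared memory: a one-slot channel. The "message" is
   simply a value in shared memory; the mutex and condition variables give
   the blocking send/receive semantics of a message-passing system. */
#include <pthread.h>
#include <stdio.h>

typedef struct {
    int slot;                /* the message lives in shared memory */
    int full;                /* 1 if a message is waiting */
    pthread_mutex_t lock;
    pthread_cond_t not_full, not_empty;
} channel_t;

static void channel_init(channel_t *ch)
{
    ch->full = 0;
    pthread_mutex_init(&ch->lock, NULL);
    pthread_cond_init(&ch->not_full, NULL);
    pthread_cond_init(&ch->not_empty, NULL);
}

static void channel_send(channel_t *ch, int msg)
{
    pthread_mutex_lock(&ch->lock);
    while (ch->full)
        pthread_cond_wait(&ch->not_full, &ch->lock);
    ch->slot = msg;
    ch->full = 1;
    pthread_cond_signal(&ch->not_empty);
    pthread_mutex_unlock(&ch->lock);
}

static int channel_recv(channel_t *ch)
{
    pthread_mutex_lock(&ch->lock);
    while (!ch->full)
        pthread_cond_wait(&ch->not_empty, &ch->lock);
    int msg = ch->slot;
    ch->full = 0;
    pthread_cond_signal(&ch->not_full);
    pthread_mutex_unlock(&ch->lock);
    return msg;
}

static channel_t ch;

static void *sender(void *arg)
{
    for (int i = 0; i < 5; i++)
        channel_send(&ch, i);
    return NULL;
}

int main(void)
{
    pthread_t t;
    channel_init(&ch);
    pthread_create(&t, NULL, sender, NULL);
    for (int i = 0; i < 5; i++)
        printf("received %d\n", channel_recv(&ch));
    pthread_join(t, NULL);
    return 0;
}

The send and receive operations cost only a few memory accesses and synchronization operations, which is why message passing layers efficiently on shared memory; the reverse direction has no comparably cheap implementation.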