ECE 259 / CPS 221 Advanced Computer Architecture II (Parallel Computer Architecture)
Interactions with Microarchitectures and I/O
Copyright 2004 Daniel J. Sorin, Duke University

(C) 2004 Daniel J. Sorin from Adve, Falsafi, Hill, Lebeck, Reinhardt, Singh ECE 259 / CPS 221

Outline
Interactions with Microarchitectures
  – Instruction Level Parallelism (dynamic scheduling)
  – Memory Level Parallelism (multiple outstanding requests)
  – Thread Level Parallelism (multithreading, SMT)
Interactions with Input/Output
  – Remote DMA in general
  – VIA/InfiniBand
  – Using IP

Interactions with Microarchitectures
We've mostly assumed that we've been given CPUs
  – But what do these processors do?
Processors exploit many levels of parallelism
  – How does this affect multiprocessor design?
Types of parallelism
  – Instruction-level (ILP)
  – Memory-level (MLP)
  – Thread-level (TLP)

Instruction Level Parallelism (ILP)
We're not using 5-stage, in-order CPUs
ILP = instruction level parallelism
  – Faster rate of memory requests
  – Greater demands on system bandwidth
How do complex processors interact in a multiprocessor (MP)?
To speed up processors, we can relax consistency
  – MIPS R10000 speculatively relaxes sequential consistency (SC)
  – Other processors exploit processor consistency (PC), weak ordering, release consistency (RC), etc.
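As a minimal sketch of what relaxing consistency means, the Python below enumerates every outcome of the classic store-buffering litmus test (T0: x = 1; r0 = y and T1: y = 1; r1 = x). The litmus test, the function names, and the simplified "relaxed" rule (program order within a thread is dropped entirely, which for these two-instruction threads amounts to letting a load bypass an earlier store to a different address) are illustrative assumptions, not taken from the slides or from any particular processor.

```python
from itertools import permutations

# Each thread is a list of (op, variable, register) tuples in program order.
# T0: x = 1; r0 = y        T1: y = 1; r1 = x
T0 = [("st", "x", None), ("ld", "y", "r0")]
T1 = [("st", "y", None), ("ld", "x", "r1")]
THREADS = (T0, T1)


def run(schedule):
    """Execute one global order of operations against a single shared memory."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for op, var, reg in schedule:
        if op == "st":
            mem[var] = 1
        else:
            regs[reg] = mem[var]
    return regs["r0"], regs["r1"]


def outcomes(keep_program_order):
    """Collect every (r0, r1) result reachable under the chosen ordering rule."""
    ops = [(t, i) for t in range(2) for i in range(2)]
    seen = set()
    for order in permutations(ops):
        if keep_program_order:
            # SC: each thread's store must precede its own load in the global order.
            pos = {o: k for k, o in enumerate(order)}
            if pos[(0, 0)] > pos[(0, 1)] or pos[(1, 0)] > pos[(1, 1)]:
                continue
        seen.add(run([THREADS[t][i] for t, i in order]))
    return seen


sc_results = outcomes(keep_program_order=True)
relaxed_results = outcomes(keep_program_order=False)
print("SC:     ", sorted(sc_results))
print("relaxed:", sorted(relaxed_results))
```

Under this model the result r0 = r1 = 0 never appears with program order enforced, but it does appear once loads may pass earlier stores. That extra outcome is the price of the performance gained by relaxing the model.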

Memory Level Parallelism (MLP)
Not only can modern processors issue memory requests more frequently and out-of-order, but they can have multiple outstanding requests
Miss status holding registers (MSHRs) maintain state for multiple outstanding requests
Hypothesis: out-of-order scheduling of processors is most helpful because it enables greater MLP
  – Gets requests out sooner and overlaps their latencies
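The benefit of overlapping miss latencies can be illustrated with a small, idealized model. The sketch below assumes independent misses of equal latency, at most one miss issued per cycle, and a fixed number of MSHRs; the function name and all parameters are hypothetical and chosen only for illustration.

```python
import heapq


def total_miss_time(num_misses, latency, num_mshrs):
    """Cycles until all independent misses complete with a limited number of MSHRs.

    Idealized model: one miss may issue per cycle whenever an MSHR is free,
    and every miss takes exactly `latency` cycles.
    """
    in_flight = []  # min-heap of completion times, one entry per occupied MSHR
    now = 0
    for _ in range(num_misses):
        if len(in_flight) == num_mshrs:
            # All MSHRs are occupied: wait for the earliest outstanding miss.
            now = max(now, heapq.heappop(in_flight))
        heapq.heappush(in_flight, now + latency)
        now += 1  # the next miss can issue no earlier than the following cycle
    return max(in_flight, default=0)


for mshrs in (1, 4, 8):
    print(mshrs, "MSHR(s):", total_miss_time(8, 100, mshrs), "cycles")
```

With a single MSHR the eight misses are fully serialized; with eight MSHRs their latencies overlap almost completely, which is the effect the hypothesis on this slide attributes to out-of-order scheduling.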

Modern CPUs in MPs (presentation)

SC + ILP = RC? (presentation)

Thread Level Parallelism (TLP)
Commercial workloads often exhibit TLP
  – Threads handle independent requests/transactions
Some processors support multithreading
  – Simultaneous Multithreading (SMT)
  – Intel Hyper-Threading
  – Sun MAJC
Challenge: assign threads to contexts
  – Intra-query parallelism → use multiple contexts on single CPU?
  – Inter-query parallelism → use contexts on different CPUs?

Outline
Interactions with Microarchitectures
Interactions with Input/Output
  – SANs in general
  – VIA/InfiniBand
  – Using IP

Interactions with I/O
Real machines interact with the outside world
I/O = disks, internet, printer, monitor, etc.
What's the best way to interact with I/O?
Traditionally
  – I/O bridge on memory bus & I/O protocol, such as PCI or SCSI

System Area Networks
SANs connect systems together and/or connect systems to I/O devices
Must design a communication assist, just like we did at the beginning of the class → must answer the same questions
How does a system communicate with I/O on a SAN?
  – Synchronous send/receive of small messages
  – Asynchronous bulk transfer (DMA)
How much hardware support do we provide?
Does the OS have to be involved?
Can we offload work from the primary processor(s)?

Remote DMA (RDMA)
Class of techniques for remote communication
  – Asynchronous transfer of bulk data
Allows a process on one node to read/write from/into pre-arranged buffer space on another node
  – Requires establishment of buffers on one or both nodes
After completion, the reader/writer is notified
  – Just like "normal" uniprocessor DMA
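The RDMA pattern described on this slide (pre-arranged buffers, remote read/write, notification on completion) can be mimicked in a few lines of Python. This is only an in-process toy model: the `Node` class, `rdma_write`, `rdma_read`, and the buffer keys are hypothetical names invented for the sketch and do not correspond to any real RDMA library, and the transfer here completes immediately rather than asynchronously.

```python
class Node:
    """A toy node that exposes registered buffers to remote peers."""

    def __init__(self, name):
        self.name = name
        self.registered = {}   # buffer key -> bytearray that peers may access
        self.completions = []  # notifications delivered to this node

    def register_buffer(self, key, size):
        # Buffers must be set up before any remote access is allowed.
        self.registered[key] = bytearray(size)


def _lookup(target, key, offset, length):
    buf = target.registered.get(key)
    if buf is None:
        raise PermissionError(f"buffer {key!r} is not registered on {target.name}")
    if offset < 0 or offset + length > len(buf):
        raise ValueError("access falls outside the registered buffer")
    return buf


def rdma_write(initiator, target, key, offset, data):
    """Copy data into the target's registered buffer, then notify the writer."""
    buf = _lookup(target, key, offset, len(data))
    buf[offset:offset + len(data)] = data
    initiator.completions.append(("write", key, len(data)))


def rdma_read(initiator, target, key, offset, length):
    """Copy data out of the target's registered buffer, then notify the reader."""
    buf = _lookup(target, key, offset, length)
    data = bytes(buf[offset:offset + length])
    initiator.completions.append(("read", key, length))
    return data


node_a, node_b = Node("A"), Node("B")
node_b.register_buffer("inbox", 16)
rdma_write(node_a, node_b, "inbox", 0, b"bulk data")
print(rdma_read(node_a, node_b, "inbox", 0, 9), node_a.completions)
```

Note that the target node's code never runs during the transfer: once the buffer is registered, only the initiator acts and only the initiator is notified.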

Virtual Interface Architecture (VIA)
VIA = Virtual Interface Architecture
  – SAN standard developed by many companies
Enables user-level communication (incl. RDMA)
  – Process p1 on Proc1 registers buffer for sending data
  – Process p2 on Proc2 registers buffer for receiving data
  – Once registering is done, no OS involvement in transfers
To communicate, processes post requests (for sending or receiving) on work queues and ring a "doorbell" to tell the network interface that a request is waiting
Upon completion, the poster of the request is notified that it is done (by checking the request's status or a completion queue)
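A rough sketch of the VIA-style flow, with work queues, a doorbell, and completion notification, is given below. It is a single-process toy model in Python, not the VIA programming interface: `VirtualInterface`, `connect`, `post_send`, `post_recv`, and `ring_doorbell` are hypothetical names. The sketch models the doorbell as the signal from the process to the network interface that a request has been posted, and reports completions on a separate per-endpoint queue.

```python
from collections import deque


class VirtualInterface:
    """Toy endpoint with a send queue, a receive queue, and a completion queue."""

    def __init__(self, name):
        self.name = name
        self.send_queue = deque()   # posted send requests
        self.recv_queue = deque()   # posted receive buffers (bytearrays)
        self.completions = deque()  # completion records for this endpoint
        self.peer = None

    def post_recv(self, buffer):
        # The receiver hands a pre-registered buffer to the interface.
        self.recv_queue.append(buffer)

    def post_send(self, data):
        # The sender queues a request, then rings the doorbell; no OS call here.
        self.send_queue.append(bytes(data))
        self.ring_doorbell()

    def ring_doorbell(self):
        """Stand-in for the network interface: drain the send queue."""
        while self.send_queue:
            data = self.send_queue.popleft()
            if not self.peer.recv_queue:
                self.completions.append(("send", "error: no receive posted"))
                continue
            buf = self.peer.recv_queue.popleft()
            n = min(len(data), len(buf))
            buf[:n] = data[:n]
            self.completions.append(("send", n))
            self.peer.completions.append(("recv", n))


def connect(a, b):
    """Pair two endpoints so that each one's sends land in the other's buffers."""
    a.peer, b.peer = b, a


p1, p2 = VirtualInterface("p1"), VirtualInterface("p2")
connect(p1, p2)
recv_buffer = bytearray(8)
p2.post_recv(recv_buffer)
p1.post_send(b"hello")
print(bytes(recv_buffer[:5]), list(p1.completions), list(p2.completions))
```

The receive must be posted before the matching send arrives; in this model a send with no posted receive completes with an error record instead of delivering data.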

InfiniBand
InfiniBand integrates VIA-style communication over a unified SAN fabric
  – Like VIA, InfiniBand has been designed by committee
  – Unlike VIA, InfiniBand is all-inclusive
  – Covers high-level protocols all the way down to physical design
  – Designed by committee → has every feature possible
  – Specifies interfaces, but not implementations
  – Printing the specs involves the cutting of many trees
The jury is still out on whether InfiniBand will survive

QPIP (presentation)