Published by Nigel Baldwin. Modified over 8 years ago.
ECE 259 / CPS 221
Advanced Computer Architecture II (Parallel Computer Architecture)
Interactions with Microarchitectures and I/O
Copyright 2004 Daniel J. Sorin, Duke University
(C) 2004 Daniel J. Sorin, from Adve, Falsafi, Hill, Lebeck, Reinhardt, Singh
ECE 259 / CPS 221
Outline
Interactions with Microarchitectures
  – Instruction Level Parallelism (dynamic scheduling)
  – Memory Level Parallelism (multiple outstanding requests)
  – Thread Level Parallelism (multithreading, SMT)
Interactions with Input/Output
  – Remote DMA in general
  – VIA/InfiniBand
  – Using IP
Interactions with Microarchitectures
We’ve mostly assumed that we’ve been given CPUs
  – But what do these processors do?
Processors exploit many levels of parallelism
  – How does this affect multiprocessor design?
Types of parallelism
  – Instruction-level (ILP)
  – Memory-level (MLP)
  – Thread-level (TLP)
Instruction Level Parallelism (ILP)
We’re not using 5-stage, in-order CPUs
ILP = instruction level parallelism
  – Faster rate of memory requests
  – Greater demands on system bandwidth
How do complex processors interact in an MP?
To speed up processors, we can relax consistency
  – MIPS R10000 speculatively relaxes SC
  – Other processors exploit PC, weak ordering, RC, etc.
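What relaxing consistency buys, and what it changes, can be seen on a small litmus test. The sketch below is not from the slides: the two-thread program, the variable names `x`, `y`, `r1`, `r2`, and the helper functions are illustrative assumptions. It enumerates every sequentially consistent interleaving, then models one relaxation (a load allowed to perform before an earlier store to a different address) by also trying each thread with its two operations swapped.

```python
# Each thread is a list of operations in program order.
# ("st", var, value) writes memory; ("ld", var, reg) reads memory into a register.
T0 = [("st", "x", 1), ("ld", "y", "r1")]
T1 = [("st", "y", 1), ("ld", "x", "r2")]


def interleavings(a, b):
    """Yield every merge of a and b that preserves the order within each list."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one global order of operations and return (r1, r2)."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for op, var, arg in schedule:
        if op == "st":
            mem[var] = arg
        else:
            regs[arg] = mem[var]
    return regs["r1"], regs["r2"]


def sc_outcomes():
    """Outcomes when every thread's operations perform in program order."""
    return {run(s) for s in interleavings(T0, T1)}


def relaxed_outcomes():
    """Outcomes when a thread's load may perform before its earlier store."""
    results = set()
    for t0 in (T0, T0[::-1]):
        for t1 in (T1, T1[::-1]):
            results |= {run(s) for s in interleavings(t0, t1)}
    return results


if __name__ == "__main__":
    print("SC:     ", sorted(sc_outcomes()))
    print("relaxed:", sorted(relaxed_outcomes()))
```

Under SC the outcome `r1 = r2 = 0` never appears; with the store-to-load reordering it does. A processor that wants to issue loads early while still presenting SC therefore has to detect and undo such cases, which is why speculation is involved.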
Memory Level Parallelism (MLP)
Not only can modern processors issue memory requests more frequently and out-of-order, but they can have multiple outstanding requests
Miss status holding registers (MSHRs) maintain state for multiple outstanding requests
Hypothesis: out-of-order scheduling of processors is most helpful because it enables greater MLP
  – Gets requests out sooner and overlaps their latencies
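The benefit of overlapping latencies can be estimated with a back-of-the-envelope model. The sketch below assumes the misses are independent, all ready to issue at once, and all take the same latency, and it ignores bandwidth limits; the function name and the numbers in the usage lines are illustrative, not measurements.

```python
import math


def miss_service_cycles(num_misses, miss_latency, mshrs):
    """Cycles to service independent misses with at most `mshrs` outstanding.

    Assumes equal latencies and that every miss is ready at cycle 0, so the
    misses proceed in ceil(num_misses / mshrs) fully overlapped batches.
    """
    if mshrs < 1:
        raise ValueError("need at least one MSHR")
    return math.ceil(num_misses / mshrs) * miss_latency


if __name__ == "__main__":
    for k in (1, 4, 8):
        print(k, "MSHRs:", miss_service_cycles(8, 200, k), "cycles")
```

With eight misses of 200 cycles each, one MSHR serializes them (1600 cycles), four MSHRs overlap them in two batches (400 cycles), and eight overlap all of them (200 cycles).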
Modern CPUs in MPs (presentation)
SC + ILP = RC? (presentation)
Thread Level Parallelism (TLP)
Commercial workloads often exhibit TLP
  – Threads handle independent requests/transactions
Some processors support multithreading
  – Simultaneous Multithreading (SMT)
  – Intel Hyperthreading
  – Sun MAJC
Challenge: assign threads to contexts
  – Intra-query parallelism: use multiple contexts on a single CPU?
  – Inter-query parallelism: use contexts on different CPUs?
Outline
Interactions with Microarchitectures
Interactions with Input/Output
  – SANs in general
  – VIA/InfiniBand
  – Using IP
Interactions with I/O
Real machines interact with the outside world
I/O = disks, internet, printer, monitor, etc.
What’s the best way to interact with I/O?
Traditionally
  – I/O bridge on memory bus & I/O protocol, such as PCI or SCSI
System Area Networks
SANs connect systems together and/or connect systems to I/O devices
Must design a communication assist, just like we did at the beginning of the class, so we must answer the same questions
How does a system communicate with I/O on a SAN?
  – Synchronous send/receive of small messages
  – Asynchronous bulk transfer (DMA)
How much hardware support do we provide?
Does the OS have to be involved?
Can we offload work from the primary processor(s)?
Remote DMA (RDMA)
Class of techniques for remote communication
  – Asynchronous transfer of bulk data
Allows a process on one node to read/write from/into pre-arranged buffer space on another node
  – Requires establishment of buffers on one or both nodes
After completion, the reader/writer is notified
  – Just like “normal” uniprocessor DMA
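The three steps above (set up a buffer, start an asynchronous bulk transfer, get notified on completion) can be sketched as a toy in-process simulation. Everything here is hypothetical: `Node`, `Fabric`, `register_buffer`, `rdma_write`, and `progress` are names invented for illustration and do not correspond to any real RDMA library.

```python
class Node:
    """A node that exposes pre-arranged buffer space to remote writers."""

    def __init__(self, name):
        self.name = name
        self.buffers = {}      # handle -> bytearray registered for remote access
        self.next_handle = 0

    def register_buffer(self, size):
        """Set up a buffer ahead of time and return a handle naming it."""
        handle = self.next_handle
        self.next_handle += 1
        self.buffers[handle] = bytearray(size)
        return handle


class Fabric:
    """Stands in for the NICs and the network: queues transfers, completes them later."""

    def __init__(self):
        self.in_flight = []

    def rdma_write(self, target, handle, offset, data, on_complete):
        """Start an asynchronous write; returns before any data has moved."""
        buf = target.buffers[handle]   # KeyError if the buffer was never registered
        if offset < 0 or offset + len(data) > len(buf):
            raise ValueError("write falls outside the registered buffer")
        self.in_flight.append((buf, offset, bytes(data), on_complete))

    def progress(self):
        """Complete every queued transfer, then notify each initiator."""
        pending, self.in_flight = self.in_flight, []
        for buf, offset, data, on_complete in pending:
            buf[offset:offset + len(data)] = data
            on_complete(len(data))


if __name__ == "__main__":
    receiver = Node("B")
    handle = receiver.register_buffer(16)
    fabric = Fabric()
    fabric.rdma_write(receiver, handle, 0, b"bulk data",
                      lambda n: print("write of", n, "bytes completed"))
    fabric.progress()
    print(bytes(receiver.buffers[handle][:9]))
```

Note that the receiving node's code never runs during the transfer; only the registration step involves it, which mirrors the "pre-arranged buffer" requirement.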
Virtual Interface Architecture (VIA)
VIA = Virtual Interface Architecture
  – SAN standard developed by many companies
Enables user-level communication (incl. RDMA)
  – Process p1 on Proc1 registers buffer for sending data
  – Process p2 on Proc2 registers buffer for receiving data
  – Once registering is done, no OS involvement in transfers
To communicate, processes post requests (for sending or receiving) on work queues
Upon completion, a “doorbell” notifies the poster of the request that it is done
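The register / post / complete flow described above can be modeled in a few lines. This is a toy sketch, not the VIA programming interface: `VirtualInterface`, `Descriptor`, `register`, `post_send`, `post_recv`, `connect`, and `nic_process` are hypothetical names, completion is modeled as a `done` flag on the descriptor, and a receive buffer that is too small simply truncates the message, which is a simplification.

```python
from collections import deque


class Descriptor:
    """A request posted on a work queue."""

    def __init__(self, buffer, length):
        self.buffer = buffer
        self.length = length
        self.done = False          # set when the transfer completes


class VirtualInterface:
    """One process's endpoint: registered memory plus send and receive queues."""

    def __init__(self):
        self.registered = []
        self.send_queue = deque()
        self.recv_queue = deque()
        self.peer = None

    def register(self, size):
        """One-time setup; the only step where the OS would be involved."""
        buf = bytearray(size)
        self.registered.append(buf)
        return buf

    def _check_registered(self, buf):
        if not any(buf is r for r in self.registered):
            raise ValueError("buffer was not registered")

    def post_send(self, buf, length):
        self._check_registered(buf)
        desc = Descriptor(buf, length)
        self.send_queue.append(desc)
        return desc

    def post_recv(self, buf):
        self._check_registered(buf)
        desc = Descriptor(buf, len(buf))
        self.recv_queue.append(desc)
        return desc


def connect(a, b):
    a.peer, b.peer = b, a


def nic_process(vi):
    """Model the NIC draining vi's send queue into the peer's posted receives."""
    while vi.send_queue and vi.peer.recv_queue:
        send = vi.send_queue.popleft()
        recv = vi.peer.recv_queue.popleft()
        n = min(send.length, len(recv.buffer))   # truncation is a simplification
        recv.buffer[:n] = send.buffer[:n]
        recv.length = n
        send.done = recv.done = True             # completion notification


if __name__ == "__main__":
    p1, p2 = VirtualInterface(), VirtualInterface()
    connect(p1, p2)
    src, dst = p1.register(8), p2.register(8)
    src[:5] = b"hello"
    r = p2.post_recv(dst)
    s = p1.post_send(src, 5)
    nic_process(p1)
    print(s.done, r.done, bytes(dst[:r.length]))
```

After the two `register` calls, every step is an ordinary user-level queue operation, which is the point of the design: the OS sets up the mapping once and then stays out of the data path.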
InfiniBand
InfiniBand integrates VIA-style communication over a unified SAN fabric
  – Like VIA, InfiniBand has been designed by committee
  – http://www.infinibandta.org/
Unlike VIA, InfiniBand is all-inclusive
  – Covers high-level protocols all the way down to physical design
  – Designed by committee, so it has every feature possible
  – Specifies interfaces, but not implementations
  – Printing the specs involves the cutting of many trees
The jury is still out on whether InfiniBand will survive
QPIP (presentation)