CALTECH cs184c Spring2001 -- DeHon CS184c: Computer Architecture [Parallel and Multithreaded] Day 15: May 29, 2001 Interconnect.

Slides:



Advertisements
Similar presentations
CALTECH CS137 Fall DeHon 1 CS137: Electronic Design Automation Day 19: November 21, 2005 Scheduling Introduction.
Advertisements

EDA (CS286.5b) Day 10 Scheduling (Intro Branch-and-Bound)
What is Flow Control ? Flow Control determines how a network resources, such as channel bandwidth, buffer capacity and control state are allocated to packet.
©2003 Dror Feitelson Parallel Computing Systems Part II: Networks and Routing Dror Feitelson Hebrew University.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day17: November 20, 2000 Time Multiplexing.
Module 3.4: Switching Circuit Switching Packet Switching K. Salah.
High Performance Router Architectures for Network- based Computing By Dr. Timothy Mark Pinkston University of South California Computer Engineering Division.
1 Lecture 12: Interconnection Networks Topics: dimension/arity, routing, deadlock, flow control.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 15: March 12, 2007 Interconnect 3: Richness.
CS294-6 Reconfigurable Computing Day 8 September 17, 1998 Interconnect Requirements.
Interconnection Network PRAM Model is too simple Physically, PEs communicate through the network (either buses or switching networks) Cost depends on network.
CS 258 Parallel Computer Architecture Lecture 5 Routing February 6, 2008 Prof John D. Kubiatowicz
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day8: October 18, 2000 Computing Elements 1: LUTs.
DeHon March 2001 Rent’s Rule Based Switching Requirements Prof. André DeHon California Institute of Technology.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 22: April 4, 2007 Interconnect 7: Time Multiplexed Interconnect.
1 Lecture 24: Interconnection Networks Topics: communication latency, centralized and decentralized switches (Sections 8.1 – 8.5)
1 Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 16: March 14, 2007 Interconnect 4: Switching.
1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,
ECE669 L16: Interconnection Topology March 30, 2004 ECE 669 Parallel Computer Architecture Lecture 16 Interconnection Topology.
Jennifer Rexford Princeton University MW 11:00am-12:20pm Wide-Area Traffic Management COS 597E: Software Defined Networking.
Performance and Power Efficient On-Chip Communication Using Adaptive Virtual Point-to-Point Connections M. Modarressi, H. Sarbazi-Azad, and A. Tavakkol.
Switching, routing, and flow control in interconnection networks.
Switching Techniques Student: Blidaru Catalina Elena.
Interconnect Network Topologies
Caltech CS184 Spring DeHon 1 CS184b: Computer Architecture (Abstractions and Optimizations) Day 20: May 23, 2003 Interconnect.
1 The Turn Model for Adaptive Routing. 2 Summary Introduction to Direct Networks. Deadlocks in Wormhole Routing. System Model. Partially Adaptive Routing.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 13: February 4, 2005 Interconnect 1: Requirements.
Interconnect Networks
On-Chip Networks and Testing
High-Performance Networks for Dataflow Architectures Pravin Bhat Andrew Putnam.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 15: February 12, 2003 Interconnect 5: Meshes.
ESE Spring DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes.
1 Interconnects Shared address space and message passing computers can be constructed by connecting processors and memory unit using a variety of interconnection.
Networks-on-Chips (NoCs) Basics
CSE Advanced Computer Architecture Week-11 April 1, 2004 engr.smu.edu/~rewini/8383.
Multiprocessor Interconnection Networks Todd C. Mowry CS 740 November 3, 2000 Topics Network design issues Network Topology.
Computer Networks with Internet Technology William Stallings
Anshul Kumar, CSE IITD CSL718 : Multiprocessors Interconnection Mechanisms Performance Models 20 th April, 2006.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day11: October 30, 2000 Interconnect Requirements.
CS 8501 Networks-on-Chip (NoCs) Lukasz Szafaryn 15 FEB 10.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.
InterConnection Network Topologies to Minimize graph diameter: Low Diameter Regular graphs and Physical Wire Length Constrained networks Nilesh Choudhury.
Anshul Kumar, CSE IITD ECE729 : Advanced Computer Architecture Lecture 27, 28: Interconnection Mechanisms In Multiprocessors 29 th, 31 st March, 2010.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 14: February 10, 2003 Interconnect 4: Switching.
Networks and Distributed Systems Mark Stanovich Operating Systems COP 4610.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 18: April 2, 2014 Interconnect 4: Switching.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 16: February 14, 2005 Interconnect 4: Switching.
Interconnect Networks Basics. Generic parallel/distributed system architecture On-chip interconnects (manycore processor) Off-chip interconnects (clusters.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 13: February 6, 2003 Interconnect 3: Richness.
Super computers Parallel Processing
Networks and Distributed Systems Sarah Diesburg Operating Systems COP 4610.
1 Lecture 24: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix F)
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 11: January 31, 2005 Compute 1: LUTs.
Computer Communication & Networks Lecture # 03 Circuit Switching, Packet Switching Nadeem Majeed Choudhary
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day14: November 10, 2000 Switching.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 12: February 5, 2003 Interconnect 2: Wiring Requirements.
Univ. of TehranIntroduction to Computer Network1 An Introduction to Computer Networks University of Tehran Dept. of EE and Computer Engineering By: Dr.
1 Lecture 29: Interconnection Networks Papers: Express Virtual Channels: Towards the Ideal Interconnection Fabric, ISCA’07, Princeton Interconnect Design.
1 Lecture 22: Interconnection Networks Topics: Routing, deadlock, flow control, virtual channels.
ESE534: Computer Organization
Auburn University COMP8330/7330/7336 Advanced Parallel and Distributed Computing Interconnection Networks (Part 2) Dr.
Lecture 23: Interconnection Networks
ESE532: System-on-a-Chip Architecture
Switching, routing, and flow control in interconnection networks
ESE534: Computer Organization
ESE534: Computer Organization
CS184a: Computer Architecture (Structures and Organization)
CS184a: Computer Architecture (Structure and Organization)
Presentation transcript:

CALTECH cs184c Spring DeHon CS184c: Computer Architecture [Parallel and Multithreaded] Day 15: May 29, 2001 Interconnect

CALTECH cs184c Spring DeHon Previously CS184a: Day –interconnect needs and requirements –basic topology This quarter –most systems require –interfacing issues model, hardware, software

CALTECH cs184c Spring DeHon Today Issues Topology/locality/scaling –(some review) Styles –from static –to online, packet, wormhole Online routing

CALTECH cs184c Spring DeHon Issues Bandwidth –aggregate, per endpoint –local contention and hotspots Latency Cost (scaling) –locality Arbitration –conflict resolution –deadlock Routing –(quality vs. complexity) Ordering

CALTECH cs184c Spring DeHon Topology and Locality (Partially) Review

CALTECH cs184c Spring DeHon Simple Topologies: Bus Single Bus –simple, cheap –low bandwidth not scale with PEs –typically online arbitration can be offline scheduled

CALTECH cs184c Spring DeHon Bus Routing Offline: –divide time into N slots –assign positions to various communications –run modulo N w/ each consumer/producer send/receiving on time slot e.g. 1: A->B 2: C->D 3: A->C 4: A->B 5: C->B 6: D->A 7: D->B 8: A->D

CALTECH cs184c Spring DeHon Bus Routing Online: –request bus –wait for acknowledge Priority based: –give to highest priority which requests –consider ordering –Got i = Want i ^ Avail i Avail i+1 =Avail i ^ /Want i Solve arbitration in log time using parallel prefix For fairness –start priority at different node –use cyclic parallel prefix deal with variable starting point

CALTECH cs184c Spring DeHon Token Ring On bus –delay of cycle goes as N –can’t avoid, even if talking to nearest neighbor Token ring –pipeline bus data transit (ring) high frequency –can exit early if local –use token to arbitrate use of bus

CALTECH cs184c Spring DeHon Multiple Busses Simple way to increase bandwidth –use more than one bus Can be static or dynamic assignment to busses –static A->B always uses bus 0 C-> always uses bus 1 –dynamic arbitrate for a bus, like instruction dispatch to k identical CPU resources

CALTECH cs184c Spring DeHon Crossbar No bandwidth reduction –(except receiver at endoint) Easy routing (on or offline) Scales poorly –N 2 area and delay No locality

CALTECH cs184c Spring DeHon Hypercube Arrange 2 n nodes in n-dimensional cube At most n hops from source to sink High bisection bandwidth –good for traffic –bad for cost [O(n 2 )] May not be able to use all of bisect ?!? Exploit locality Node size grows as log(N)…or maybe log 2 (N)

CALTECH cs184c Spring DeHon Multistage Unroll hypercube vertices so log(N), constant size switches per hypercube node –solve node growth problem –lose locality –similar good/bad points for rest

CALTECH cs184c Spring DeHon Hypercube/Multistage Blocking Minimum length multistage –many patterns cause bottlenecks –e.g.

CALTECH cs184c Spring DeHon Hypercube/Multistage Blocking Solvable with non-minimum length (e.g. Beneš) Also solvable by routing multiple times through net –I.e. Beneš is two back-to-back MINs

CALTECH cs184c Spring DeHon Beneš Nework

CALTECH cs184c Spring DeHon Beneš Routing Solve recursively by looping Start at a route Pick top or bottom half to route path Allocate at destination Look at other route must come in here Must take alternate path Continue until –cycle closes or ends If unrouted at this level, –pick new starting point and continue Once finish this level, –repeat/recurse on top and bottom subproblems remaining

CALTECH cs184c Spring DeHon Online Hypercube Blocking If routing offline, can calculate Benes- like route Online, don’t have time, global view Observation: only a few, canonically bad patterns Solution: Route to random intermediate –then route from there to destination

CALTECH cs184c Spring DeHon K-ary N-cube Alternate reduction from hypercube –restrict to N<log(N) dimensional structure –allow more than 2 ordinates in each dimension E.g. mesh (2-cube), 3D-mesh (3-cube) Matches with physical world structure Bounds degree at node Has Locality Even more bottleneck potentials –make channels wider (CS184a)

CALTECH cs184c Spring DeHon Torus Wrap around n-cube ends –2-cube  cylinder –3-cube  donut Cuts worst-case distances in half Can be laid-out reasonable efficiently –maybe 2x cost in channel width?

CALTECH cs184c Spring DeHon Fat-Tree Saw that communications typically has locality (CS184a) Modeled recursive bisection/Rent’s Rule Leiserson showed Fat-Tree was (area, volume) universal –w/in log(N) the area of any other structure –exploit physical space limitations wiring in {2,3}-dimensions

CALTECH cs184c Spring DeHon Universal Fat-Tree P=0.5 for area universal P=2/3 for volume I.e. go as ratio –surface/perimeter –area/volume Directly related –results on depop. CS184a day 13

CALTECH cs184c Spring DeHon Express Cube (Mesh with Bypass) Large machine in 2 or 3 D mesh –routes must go through square/cube root switches –vs. log(N) in fat-tree, hypercube, MIN Saw practically can go further than one hop on wire… Add long-wire bypass paths

CALTECH cs184c Spring DeHon Segmentation To improve speed (decrease delay) Allow wires to bypass switchboxes Maybe save switches? Certainly cost more wire tracks CS184a Day 14

CALTECH cs184c Spring DeHon Routing Styles

CALTECH cs184c Spring DeHon Hardwired Direct, fixed wire between two points E.g. Conventional gate-array, std. cell Efficient when: –know communication a priori fixed or limited function systems high load of fixed communication –often control in general-purpose systems –links carry high throughput traffic continually between fixed points

CALTECH cs184c Spring DeHon Configurable Offline, lock down persistent route. E.g. FPGAs Efficient when: –link carries high throughput traffic (loaded usefully near capacity) –traffic patterns change on timescale >> data transmission

CALTECH cs184c Spring DeHon Time-Switched Statically scheduled, wire/switch sharing E.g. TDMA, NuMesh, TSFPGA Efficient when: –thruput per channel < thruput capacity of wires and switches –traffic patterns change on timescale >> data transmission

CALTECH cs184c Spring DeHon Self-Route, Circuit-Switched Dynamic arbitration/allocation, lock down routes E.g. METRO/RN1 Efficient when: –instantaneous communication bandwidth is high (consume channel) –lifetime of comm. > delay through network –communication pattern unpredictable –rapid connection setup important

CALTECH cs184c Spring DeHon Self-Route, Store-and- Forward, Packet Switched Dynamic arbitration, packetized data Get entire packet before sending to next node E.g. nCube, early Internet routers Efficient when: –lifetime of comm < delay through net –communication pattern unpredictable –can provide buffer/consumption guarantees –packets small

CALTECH cs184c Spring DeHon Self-Route, Wormhole Packet-Switched Dynamic arbitration, packetized data E.g. Caltech MRC, Modern Internet Routers Efficient when: –lifetime of comm < delay through net –communication pattern unpredictable –can provide buffer/consumption guarantees –message > buffer length allow variable (? Long) sized messages

CALTECH cs184c Spring DeHon Online Routing

CALTECH cs184c Spring DeHon Costs: Area Area –switch (1-1.5K / switch) larger with pipeline (4K) and rebuffer –state (SRAM bit = 1.2K / bit) multiple in time-switched cases –arbitrartion/decision making usually dominates above –buffering (SRAM cell per buffer) can dominate

CALTECH cs184c Spring DeHon Costs: Latency Time local –make decisions –round-trip flow-control Time –blocking in buffers –quality of decision pick wrong path have stale data

CALTECH cs184c Spring DeHon Intermediate For large # of predictable patterns –switching memory may dominate allocation area –area of routed case < time-switched Get offline, global planning advantage –by source routing –source specifies offline determined route path –offline plan avoids contention

CALTECH cs184c Spring DeHon Offline vs. Online If know patterns in advance –offline cheaper no arbitration (area, time) no buffering use more global data –better results As becomes less predictable –benefit to online routing

CALTECH cs184c Spring DeHon Deadlock Possible to introduce deadlock Consider wormhole routed mesh [example from Li and McKinley, IEEE Computer v26n2, 1993]

CALTECH cs184c Spring DeHon Dimension Order Routing Simple (early Caltech) solution –order dimensions –force complete routing in lower dimensions before route in next higher dimension

CALTECH cs184c Spring DeHon Dimension Order Routing Avoids cycles in channel graph Limits routing freedom Can cause artificial congestion –consider (0,0) to (4,3) (1,0) to (4,2) (2,0) to (4,1) (3,0) to (4,0) [There is a rich literature on how to do better]

CALTECH cs184c Spring DeHon Virtual Channel Variation: each physical channel represents multiple logical channels –each logical channel has own buffers –blocking in one VC allows other VCs to use the physical link

CALTECH cs184c Spring DeHon Virtual Channel Benefits –can be used to remove cycles e.g. separate increasing and decreasing channels route increasing first, then decreasing more freedom than dimension ordered –prioritize traffic e.g. prevent control/OS traffic from being blocked by user traffic –better utilization of physical routing channels

CALTECH cs184c Spring DeHon Lost Freedom? Online routes often make (must make) decisions based on local information Can make wrong decision –I.e. two paths look equally good at one point in net but one leads to congestion/blocking further ahead

CALTECH cs184c Spring DeHon Multibutterfly Network Routers have multiple outputs in each logical direction Use to avoid congestion –also faults

CALTECH cs184c Spring DeHon Multibutterfly Network Can get into local blocking when there is a path Costs of not having global information

CALTECH cs184c Spring DeHon Transit/Metro Self-routing circuit switched network When have choice –select randomly avoid bad structural cases When blocked –drop connection –allow to route again from source –stochastic search explores all paths finds any available

CALTECH cs184c Spring DeHon Chaos Router For mesh/packet –when blocked, allow route in any direction –allows to take non-minimizing path to get around congestion –avoids deadlock since blocking causes misroute Refs: –[Konstantinidou and Snyder, SPAA90] –

CALTECH cs184c Spring DeHon Big Ideas Must work with constraints of physical world –only have 3 dimensions (2 on current VLSI) in which to build interconnect –Interconnect can be dominate area, time –gives rise to universal networks e.g. fat-tree

CALTECH cs184c Spring DeHon Big Ideas Structure –exploit physical locality where possible Structure –the more predictable behavior cheaper the solution –exploit earlier binding time cheaper configured solutions allow higher quality offline solutions