Issues in System-Level Direct Networks
Jason D. Bakos

Research Space
Marculescu (CMU) formally defines the design space for NoCs…
– Communication infrastructure synthesis
  Network topology
    – Ex: mesh, torus, cube, butterfly, tree
    – Affects everything: latency, throughput, area, fault tolerance, power consumption
    – Depends mostly on the floorplan and the communication structure
      » Grid floorplans lend themselves to meshes, but assume regular core sizes
      » Meshes keep wire lengths uniform
  Floorplanning
    – Coupled with topology
    – Biggest issues: regular vs. irregular core sizes, matching the floorplan to the topology
  Channel width
    – BW = f_ch × W (channel bandwidth = channel clock frequency × channel width in bits)
    – Larger W reduces message latency (shorter worm)
    – Affects area (wiring, buffers)
    – Serial links are preferred for electrical reasons
  Buffer size
    – Depends on the switching technique (store-and-forward, cut-through, circuit switching, wormhole)
    – Has a large effect on router complexity and size
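
The channel-width trade-off can be made concrete with a little arithmetic. Below is a minimal Python sketch, assuming one flit is exactly W bits wide and ignoring header and encoding overhead; the function names and the 512-bit, 1 GHz figures are hypothetical.

```python
import math


def channel_bandwidth(f_ch_hz: float, width_bits: int) -> float:
    """Channel bandwidth BW = f_ch x W, in bits per second."""
    return f_ch_hz * width_bits


def worm_length(message_bits: int, width_bits: int) -> int:
    """Flits needed to carry a message, assuming one flit = W bits.

    This is the serialization term that a wider channel shortens.
    """
    return math.ceil(message_bits / width_bits)


# Hypothetical 512-bit message on a 1 GHz channel of varying width.
for w in (1, 32, 128):
    gbps = channel_bandwidth(1e9, w) / 1e9
    print(f"W={w}: BW={gbps:.0f} Gb/s, worm length={worm_length(512, w)} flits")
```

Going from a 1-bit serial link to a 32-bit channel shrinks this worm from 512 flits to 16, at the price of 32 times the wires and wider buffers.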

Research Space
– Communication paradigm
  Routing (and flow control)
    – Affects latency, network throughput, and network utilization
  Types of routing
    – Deterministic
      » PROs: Easy to make free of deadlock, livelock, and indefinite postponement (e.g., dimension-ordered routing on a mesh)
      » CONs: Bad for latency and throughput/utilization
    – Adaptive
      » PROs: Good for latency and throughput/utilization
      » CONs: Difficult to avoid deadlock, livelock, and indefinite postponement
    – Partially adaptive
      » PROs: Good for latency and throughput/utilization
      » CONs: Doesn't exploit the full network throughput
    – Flow control:
      » Virtual channels: originally introduced for deadlock avoidance, but now also used to increase throughput
  Switching
    – Ex: circuit switching, store-and-forward, cut-through, wormhole
    – Wormhole is better for data networks with dynamic traffic
    – Circuit switching makes guaranteed-service operation easier to achieve (and is better for application-specific NoCs)
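
Why the switching technique matters for latency shows up in the usual zero-load latency expressions: store-and-forward pays the serialization delay L/b at every hop, while cut-through and wormhole pay it only once. A Python sketch, assuming no contention, a fixed per-hop router delay, and hypothetical parameter values:

```python
def store_and_forward_latency(hops: int, t_router: float, length_bits: int, bw: float) -> float:
    # Each router must receive the whole packet before forwarding it,
    # so serialization (length_bits / bw) is paid on every hop.
    return hops * (t_router + length_bits / bw)


def cut_through_latency(hops: int, t_router: float, length_bits: int, bw: float) -> float:
    # The header is forwarded as soon as it is routed and the body follows in a
    # pipeline, so serialization is paid only once. Wormhole switching has the
    # same zero-load latency; it differs in buffering flits rather than packets.
    return hops * t_router + length_bits / bw


# Hypothetical: 4 hops, 1-cycle routers, 512-bit packet, 32 bits/cycle.
print(store_and_forward_latency(4, 1, 512, 32))  # 68.0
print(cut_through_latency(4, 1, 512, 32))        # 20.0
```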

Research Space
– Application mapping optimization
  Scheduling
    – Given a set of tasks, find a schedule for the cores (static or dynamic)
    – Traditional scheduling doesn't account for network latency
  IP mapping
    – Assume the floorplan and topology are fixed; map cores to placeholders (tiles) to minimize energy (hops)
    – Perform a search over the space of assignments
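
The IP-mapping step can be illustrated with an exhaustive search on a tiny instance. A Python sketch, assuming a 2x2 mesh, minimal routing (so hop count equals Manhattan distance), and a hypothetical four-core traffic pattern; the cost is traffic volume times hops, used as a proxy for energy.

```python
from itertools import permutations

# Hypothetical 2x2 mesh: each placeholder (tile) is an (x, y) coordinate.
TILES = [(0, 0), (1, 0), (0, 1), (1, 1)]
CORES = ["A", "B", "C", "D"]
# Hypothetical communication volumes between cores: (src, dst) -> volume.
TRAFFIC = {("A", "B"): 100, ("B", "C"): 80, ("C", "D"): 10, ("A", "D"): 5}


def hops(t1, t2):
    # Minimal routing in a mesh: hop count is the Manhattan distance.
    return abs(t1[0] - t2[0]) + abs(t1[1] - t2[1])


def cost(mapping):
    # Energy proxy: sum of volume x hops over all communicating pairs.
    return sum(vol * hops(mapping[s], mapping[d]) for (s, d), vol in TRAFFIC.items())


def best_mapping():
    # Exhaustive search over every assignment of cores to tiles.
    best = None
    for perm in permutations(TILES):
        mapping = dict(zip(CORES, perm))
        c = cost(mapping)
        if best is None or c < best[0]:
            best = (c, mapping)
    return best


print(best_mapping())
```

Exhaustive search visits n! assignments (24 here), so realistic core counts need pruned or heuristic searches over the same cost function.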

Deterministic Wormhole Routing
– Deterministic
  Ex: dimension-ordered routing
  – One possible path for any source S and destination D
  – The worm stops when its header encounters a locked destination channel (router output port)
    » While stopped, it keeps all channels along its path locked
  – Routers are small and simple
    » Each input port of each router requires a buffer for only one flit
  – Guarantees the shortest hop count (energy) and prevents deadlock, livelock, and indefinite postponement
  – BAD: High latency (blocking)
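
Dimension-ordered (XY) routing on a 2D mesh fits in a few lines. A Python sketch, assuming (x, y) router coordinates; the function name is hypothetical.

```python
def xy_route(src, dst):
    """Routers visited under XY dimension-ordered routing, src and dst included.

    The X offset is corrected completely before any Y hop is taken, so there
    is exactly one path for each (S, D) pair and it is always minimal.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```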

Adaptive Wormhole Routing
– Adaptive
  – Many paths between any S and any D
  – The worm follows a set path until it reaches a block, then routes around it
  – If any shortest remaining path may be taken, the routing is fully adaptive
  – Lower latency, higher throughput
  – Susceptible to deadlock
  – Packets may arrive out of order
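
A fully adaptive minimal router may pick any minimal path, whereas deterministic routing uses exactly one. In a 2D mesh a minimal path is any ordering of |dx| X-hops and |dy| Y-hops, so the number of choices is a binomial coefficient that grows quickly with distance. A Python sketch (function name hypothetical):

```python
from math import comb


def minimal_path_count(src, dst):
    """Number of distinct minimal paths between two routers in a 2D mesh:
    C(|dx| + |dy|, |dx|)."""
    dx = abs(dst[0] - src[0])
    dy = abs(dst[1] - src[1])
    return comb(dx + dy, dx)


for d in [(2, 1), (3, 3), (0, 4)]:
    print(d, minimal_path_count((0, 0), d))  # 3, 20, 1 paths respectively
```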

Partially Adaptive Wormhole Routing
– Partially adaptive routing
  – Deadlock avoidance: eliminate a quarter of the turns (2 of the 8 possible 90-degree turns in a 2D mesh)
[Figure: allowed turns for fully adaptive (8 turns), XY routing (4 turns), west-first (6 turns), north-last (6 turns), negative-first (6 turns)]
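
The turn counts in the figure can be reproduced by listing which turns each scheme prohibits, and west-first then gives a simple routing function. A Python sketch, assuming a turn is written as (direction before, direction after), y grows to the north, and routing is minimal; all names are hypothetical.

```python
# The eight 90-degree turns in a 2D mesh: (direction before, direction after).
ALL_TURNS = {(a, b) for a in "NS" for b in "EW"} | {(a, b) for a in "EW" for b in "NS"}

PROHIBITED = {
    "fully adaptive": set(),
    "XY": {(a, b) for a in "NS" for b in "EW"},  # never turn from a Y move back to X
    "west-first": {("N", "W"), ("S", "W")},      # westward hops must come first
    "north-last": {("N", "E"), ("N", "W")},      # northward hops must come last
    "negative-first": {("N", "W"), ("E", "S")},  # W and S (negative) hops must come first
}


def west_first_next_hops(cur, dst):
    """Minimal west-first routing: take all west hops first, then choose
    adaptively among E, N and S."""
    dx, dy = dst[0] - cur[0], dst[1] - cur[1]
    if dx < 0:
        return {"W"}
    hops = set()
    if dx > 0:
        hops.add("E")
    if dy > 0:
        hops.add("N")
    if dy < 0:
        hops.add("S")
    return hops


for name, banned in PROHIBITED.items():
    print(f"{name}: {len(ALL_TURNS - banned)} turns")  # 8, 4, 6, 6, 6
```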

Odd-Even Wormhole Routing
– In the methods above, at least half of the S/D pairs are restricted to a single minimal path, while full adaptiveness is provided to the others
  – Unfair!
– Odd-even turn routing offers a solution (EN = a turn from traveling east to traveling north, etc.):
  – Even column: no EN or ES turn
  – Odd column: no NW or SW turn
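
The two odd-even rules can be written as a turn check that depends on the column. Each column still allows 6 of the 8 turns, but unlike west-first no turn is banned everywhere. A Python sketch, assuming column index x and the same (before, after) turn notation; it encodes only the two turn restrictions, not a complete odd-even routing function.

```python
def odd_even_turn_allowed(x: int, before: str, after: str) -> bool:
    """Odd-even turn restrictions for a router in column x of a 2D mesh.

    Even column: no EN or ES turn (east-going packet turning north or south).
    Odd column: no NW or SW turn (north- or south-going packet turning west).
    """
    if x % 2 == 0:
        return (before, after) not in {("E", "N"), ("E", "S")}
    return (before, after) not in {("N", "W"), ("S", "W")}


TURNS = [(a, b) for a in "NS" for b in "EW"] + [(a, b) for a in "EW" for b in "NS"]
for x in (0, 1):
    allowed = [a + b for a, b in TURNS if odd_even_turn_allowed(x, a, b)]
    print(f"column {x}: {len(allowed)} turns allowed: {allowed}")
```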

Virtual Channel Routing
[Figure: sources S0, S1, S2 and destinations D0, D1, D2]
– Here, virtual channels are used as a way to improve network throughput
  – Time-multiplex virtual channels onto physical channels
  – Assume deterministic routing
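
Time-multiplexing VCs onto one physical channel is what lets a blocked worm stop without idling the link. A Python sketch, assuming a round-robin arbiter that sends one flit per cycle from the next VC that has a flit and free downstream buffer space; the queues, the `blocked` set and the function name are hypothetical.

```python
from collections import deque


def multiplex(vc_queues, blocked, cycles):
    """Round-robin time-multiplexing of VC flit queues onto one physical channel.

    vc_queues: list of deques of flits, one per virtual channel.
    blocked: indices of VCs whose downstream buffer is full.
    Returns the (cycle, vc, flit) triples sent over the physical channel.
    """
    sent = []
    n = len(vc_queues)
    last = n - 1  # so that VC 0 has priority in the first cycle
    for cycle in range(cycles):
        for k in range(1, n + 1):
            vc = (last + k) % n
            if vc not in blocked and vc_queues[vc]:
                sent.append((cycle, vc, vc_queues[vc].popleft()))
                last = vc
                break
    return sent


# Worms from three sources share the channel; the worm on VC 1 is blocked downstream.
queues = [deque(["a0", "a1"]), deque(["b0", "b1"]), deque(["c0", "c1"])]
print(multiplex(queues, blocked={1}, cycles=4))
```

With a single buffer per port, the blocked worm would hold the physical channel and the other two worms would wait behind it; here they keep using every cycle.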

Fully Adaptive Routing with VCs
– Fully adaptive routing can be achieved with VCs
  – Problem: minimize the required number of VCs
– Ex: virtual channel 1 on the N and S channels can only be used once the message no longer needs to be routed west (cf. west-first)
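
One plausible reading of this example, sketched in Python under stated assumptions: N and S channels carry two VCs, E and W channels carry one (VC 0), messages that still need to go west use VC 0 on N/S, and messages that no longer need to go west use VC 1. Every minimal direction is offered at every hop, so the routing is fully adaptive with extra VCs only in the Y dimension. The channel labels and function name are hypothetical.

```python
def allowed_channels(cur, dst):
    """(direction, vc) pairs a message at router `cur` may request.

    Hypothetical scheme: E/W channels have one VC (0); N/S channels have two.
    N/S VC 1 is used only once no westward hops remain; otherwise N/S VC 0.
    Assumes y grows to the north and minimal routing.
    """
    dx, dy = dst[0] - cur[0], dst[1] - cur[1]
    y_vc = 0 if dx < 0 else 1  # still heading west -> VC 0, else VC 1
    out = set()
    if dx < 0:
        out.add(("W", 0))
    if dx > 0:
        out.add(("E", 0))
    if dy > 0:
        out.add(("N", y_vc))
    if dy < 0:
        out.add(("S", y_vc))
    return out


print(sorted(allowed_channels((2, 0), (0, 2))))  # [('N', 0), ('W', 0)]
print(sorted(allowed_channels((0, 0), (2, 2))))  # [('E', 0), ('N', 1)]
```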

Where to go from here…
– NoC
  – Channels are wide and fast => lots of bandwidth
  – Routers should be FAST (core speed) and SMALL
  – Channels don't require a lot of power
– Array of FPGAs
  – Routers cannot be fast, but can be large and complex
  – Channels are serial (differential) and require a LOT of power
  – Minimum hop count is important for low power (assuming idle links can be shut down)

Applications
– For both FPGAs and NoCs:
  – Some (perhaps most?) signal processing algorithms can be realized as wide and/or deep dataflow graphs
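
As a small example of a signal processing kernel expressed as a dataflow graph, a 2-tap FIR filter y[n] = C0·x[n] + C1·x[n-1] needs two multiply nodes, one add node and one delay (D). A Python sketch that streams samples through such a graph, assuming each node fires once per sample in a fixed topological order; the graph, node names and coefficients are hypothetical.

```python
# Hypothetical DFG for a 2-tap FIR filter: y[n] = C0*x[n] + C1*x[n-1].
# Each node maps to (operation, input names).
DFG = {
    "d": ("delay", ["x"]),
    "m0": ("mul", ["x", "C0"]),
    "m1": ("mul", ["d", "C1"]),
    "y": ("add", ["m0", "m1"]),
}
ORDER = ["d", "m0", "m1", "y"]  # topological order for one iteration


def run(dfg, order, consts, samples):
    # Delay nodes hold one value between iterations, initially zero.
    state = {n: 0.0 for n, (op, _) in dfg.items() if op == "delay"}
    outputs = []
    for x in samples:
        val = dict(consts, x=x)
        for node in order:
            op, args = dfg[node]
            if op == "delay":
                val[node] = state[node]  # emit the value stored last iteration
            elif op == "mul":
                val[node] = val[args[0]] * val[args[1]]
            elif op == "add":
                val[node] = val[args[0]] + val[args[1]]
        for node in state:
            state[node] = val[dfg[node][1][0]]  # latch the delay's input
        outputs.append(val["y"])
    return outputs


print(run(DFG, ORDER, {"C0": 0.5, "C1": 0.25}, [4.0, 8.0, 0.0]))  # [2.0, 5.0, 2.0]
```

Longer filters make the graph wider (more parallel multiplies) and deeper (longer adder chains), which is the shape the slide refers to.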

Applications
– FPGAs implement a sea of logic blocks interconnected in dataflow fashion
  – Slow for arbitrary logic due to wiring overheads (e.g., more latency and area per gate than an ASIC)
– How about designing an ASIC with an array of high-speed double-precision floating-point units, interconnected by a NoC?
  – TRIPS-like, but allows reuse of functional units within the same DFG
  – Introduces scheduling issues

NoC-based General Purpose Streaming Data Flow Architecture
[Figure, animated over ten slides: a DFG with inputs in0 to in3, a constant C, a memory operand mem[0], multiply (*) and add (+) nodes and a delay D, shown next to a NoC of + and * units with ports input 0 to input 3 and out; successive frames stream in 0, in 1, in 2, in 3 through the network]

Other Ideas
– Marculescu recently looked at mapping strategies for regular tile-based NoCs…
  – He handwaved away the possibility of adaptive VC-based routing, citing router complexity
  – In class, we read about a pipelined VC router design… it didn't seem that complex
  – How about we evaluate the trade-offs between router complexity and network throughput?
– Apply the dataflow architecture to an FPGA array?