Published by Amanda Fowler. Modified over 9 years ago.
1
Elastic Buffer Flow Control for On-chip Networks
George Michelogiannakis, Prof. William J. Dally
Concurrent Architecture & VLSI Group, Stanford University
2
The PPL Vision
[Figure: the PPL stack. Applications: Virtual Worlds, Personal Robotics, Data Informatics, Scientific Engineering. Domain Specific Languages: Physics (Liszt), Scripting, Probabilistic (RandomT), Machine Learning (OptiML), Rendering. DSL Infrastructure: Domain Embedding Language (Scala), Polymorphic Embedding, Staging, Static Domain Specific Opt., Parallel Runtime (Delite, Sequoia, GRAMPS), Dynamic Domain Spec. Opt., Locality Aware Scheduling, Task & Data Parallelism. Heterogeneous Hardware: OOO Cores, SIMD Cores, Threaded Cores, Specialized Cores, Programmable Hierarchies, Scalable Coherence, Isolation & Atomicity, On-chip Networks, Pervasive Monitoring.]
3
In a Nutshell
- Elastic-buffer (EB) flow control uses the channels as distributed FIFOs
  - Input buffers at routers are not needed
- Compared to VC routers:
  - Reduces cycle time by up to 67%
  - Provides 43% more throughput per unit power and 22% more throughput per unit area
  - Makes for a simpler network
- EB uses duplicate subnetworks for traffic isolation
- For many classes, a hybrid EB-VC router is used instead
  - Uses buffers only to alleviate severe contention and resolve deadlocks
  - Increases power efficiency
4
Outline
- Building EB channels: the basic building blocks of EB networks
- EB router design
- Deadlock avoidance & congestion sensing
- Evaluation results
5
The Idea
- Use the network channels as distributed FIFOs
- Use that storage instead of input buffers at routers, to remove the area and power costs of input buffers
[Figure: a pipelined channel vs. the same channel used as a FIFO]
6
Building an Elastic Buffer
- To build an EB in a pipelined channel with master-slave flip-flops (FFs), use the latches for storage by driving their enables independently
[Figure: a master-slave FF vs. an elastic buffer]
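Because the two latches of each FF then hold separate flits, each EB has two storage slots. The control of one such EB can be sketched as a three-state machine over its occupancy. This is a behavioral sketch under our own assumptions: the state and function names are hypothetical, and it models only how many slots are full, not the latch-enable timing of an actual EB.

```python
from enum import Enum


class EBState(Enum):
    """Occupancy of a two-slot EB (one slot per latch)."""
    EMPTY = 0  # neither latch holds a flit
    HALF = 1   # one latch holds a flit
    FULL = 2   # both latches hold flits


def eb_outputs(state: EBState) -> tuple[bool, bool]:
    """Return (ready, valid) as driven by an EB in the given state."""
    ready = state is not EBState.FULL   # at least one free storage slot
    valid = state is not EBState.EMPTY  # driving valid data
    return ready, valid


def eb_next_state(state: EBState, upstream_valid: bool, downstream_ready: bool) -> EBState:
    """Occupancy after the next clock edge (hypothetical helper).

    A flit enters if the upstream EB is valid and this EB is ready;
    a flit leaves if this EB is valid and the downstream EB is ready.
    """
    ready, valid = eb_outputs(state)
    push = upstream_valid and ready
    pop = valid and downstream_ready
    return EBState(state.value + int(push) - int(pop))
```

For example, an EB in state HALF that both receives and sends a flit stays in HALF, which is what lets a chain of EBs stream one flit per cycle.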
7
How Elastic Buffer Channels Work
- Ready/valid handshake between elastic buffers
  - Ready: at least one free storage slot
  - Valid: non-empty (driving valid data)
[Figure: cycle-by-cycle animation of flits moving through an EB channel, Cycles 1 to 6]
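The handshake can be illustrated with a small cycle-level model of a channel built from a chain of two-slot EBs. This is a sketch under stated assumptions, not the authors' simulator: `simulate_channel` and its parameters are hypothetical, ready and valid are evaluated from each EB's state at the start of a cycle, and all transfers then happen on the same clock edge. A flit moves between neighbors only when the sender is valid and the receiver is ready, so a stalled receiver makes the channel fill up and behave as a FIFO.

```python
from collections import deque
from typing import Callable

SLOTS_PER_EB = 2  # assumed: one slot per latch of the original master-slave FF


def simulate_channel(flits, num_ebs: int, sink_ready: Callable[[int], bool], cycles: int):
    """Cycle-level sketch of an EB channel.

    flits: flits offered by the source, in order.
    sink_ready: maps a cycle number to whether the receiver accepts a flit.
    Returns (delivered, peak): delivered lists (cycle, flit) pairs, and peak
    is the largest number of flits stored in the channel at the end of a cycle.
    """
    ebs = [deque() for _ in range(num_ebs)]
    source = deque(flits)
    delivered, peak = [], 0
    for cycle in range(cycles):
        # Handshake signals come from the state at the start of the cycle.
        ready = [len(eb) < SLOTS_PER_EB for eb in ebs]  # at least one free slot
        valid = [len(eb) > 0 for eb in ebs]             # non-empty
        sink_takes = valid[-1] and sink_ready(cycle)
        moves = [valid[i] and ready[i + 1] for i in range(num_ebs - 1)]
        source_pushes = bool(source) and ready[0]
        # Apply every transfer decided above (same clock edge), sink side first.
        if sink_takes:
            delivered.append((cycle, ebs[-1].popleft()))
        for i in reversed(range(num_ebs - 1)):
            if moves[i]:
                ebs[i + 1].append(ebs[i].popleft())
        if source_pushes:
            ebs[0].append(source.popleft())
        peak = max(peak, sum(len(eb) for eb in ebs))
    return delivered, peak
```

With the receiver always ready, `simulate_channel(range(5), 3, lambda c: True, 10)` delivers flits 0 to 4 on cycles 3 to 7 while holding one flit per EB. If the receiver stalls, the same three EBs absorb up to six flits before backpressure stops the source, and the flits still leave in order.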
8
Outline
- Building EB channels
- EB router design: the implications for router design
- Deadlock avoidance & congestion sensing
- Evaluation results
9
Use EB Flow Control Through the Router
[Figure: a VC input-buffered router vs. an EB router]
- The input buffer is replaced by an input EB
- VC and switch (SW) allocators are removed; per-output arbiters are used instead
- A three-slot output EB covers for arbitration being done one cycle in advance
- Look-ahead (LA) routing is also applicable to EB networks
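Since each input of an EB router holds a single FIFO of flits (no VCs), the head flit at an input requests exactly one output, so independent arbiters, one per output, already yield a conflict-free set of grants. A minimal sketch follows, assuming round-robin arbiters that grant only when the output EB has a free slot. The class and function names are hypothetical, and packet-level locking (holding an output until the tail flit passes) and the one-cycle-ahead arbitration timing are omitted.

```python
from typing import Optional, Sequence


class RoundRobinArbiter:
    """Round-robin arbiter for one output port (hypothetical sketch)."""

    def __init__(self, num_inputs: int):
        self.num_inputs = num_inputs
        self.last_grant = num_inputs - 1  # input 0 has priority on the first grant

    def arbitrate(self, requests: Sequence[bool]) -> Optional[int]:
        """Grant the first requesting input after the previous winner."""
        for offset in range(1, self.num_inputs + 1):
            i = (self.last_grant + offset) % self.num_inputs
            if requests[i]:
                self.last_grant = i
                return i
        return None


def per_output_arbitration(routes, arbiters, output_ready):
    """routes[i]: output requested by input i's head flit, or None if input i is empty.
    output_ready[o]: whether output o's EB has a free slot.
    Returns {output: granted input}."""
    grants = {}
    for out, arbiter in enumerate(arbiters):
        if not output_ready[out]:
            continue  # no free slot downstream: no grant this cycle, priority unchanged
        requests = [route == out for route in routes]
        winner = arbiter.arbitrate(requests)
        if winner is not None:
            grants[out] = winner
    return grants
```

Because an input never requests two outputs at once, no input can receive two grants, which is why this sketch needs no separable allocator.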
10
Two Improved Router Designs
- Enhanced two-stage
  - Fixes the baseline design's main inefficiencies
  - Prioritizes cycle time
- Single-stage
  - Removes pipelining overhead
  - Prioritizes latency
11
Outline
- Building EB channels
- EB router design
- Deadlock avoidance & congestion sensing: how to provide traffic classes
- Evaluation results
12
Deadlock Avoidance
- No input buffers, hence no virtual channels
- Traffic isolation can instead be provided with duplicate physical channels
- Duplicating entire subnetworks is the most efficient option, because crossbar cost grows quadratically with the number of ports
  - That is only true up to a certain number of classes
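The quadratic-cost argument can be made concrete with a simple model that we assume here rather than take from the talk: crossbar area grows as (P × w)^2 for P ports of width w. Supporting C classes by duplicating channels into one router gives a crossbar with C·P ports, while duplicating subnetworks gives C separate P-port crossbars, so the first option costs C times more crossbar area. The function names are hypothetical.

```python
def crossbar_area(ports: int, width: int) -> int:
    """Relative crossbar area, assumed proportional to (ports * width)^2."""
    return (ports * width) ** 2


def duplicated_channels_area(ports: int, width: int, classes: int) -> int:
    """One router with every port replicated per class: a (classes * ports)-port crossbar."""
    return crossbar_area(classes * ports, width)


def duplicated_subnetworks_area(ports: int, width: int, classes: int) -> int:
    """One independent router per class, each with the original port count."""
    return classes * crossbar_area(ports, width)


# Example: 5-port mesh routers with a 64-bit datapath.
for classes in (1, 2, 4):
    ratio = duplicated_channels_area(5, 64, classes) / duplicated_subnetworks_area(5, 64, classes)
    print(classes, ratio)  # the ratio equals the number of classes
```

The model captures only the crossbar term, so on its own it does not explain why the advantage holds only up to a certain number of classes.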
13
Hybrid EB-VC Router
- For many classes, add an input buffer that drains flits after a predefined number of blocking cycles
- Thus, the buffer is used only to alleviate heavy contention and resolve deadlocks
- In the common case, the router is as energy efficient as EB networks
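The drain policy can be sketched as a per-input blocking counter. In this sketch, which is ours and not the authors' RTL, the head flit stays on the EB path while it makes progress; once it has been blocked for `block_threshold` consecutive cycles it is drained into the input buffer, if a slot is free. `HybridInput`, `block_threshold`, and `buffer_slots` are hypothetical names, and how buffered flits later re-enter the switch is not modeled.

```python
class HybridInput:
    """Drain decision for one input of a hybrid EB-VC router (behavioral sketch)."""

    def __init__(self, block_threshold: int, buffer_slots: int):
        self.block_threshold = block_threshold  # predefined number of blocking cycles
        self.buffer_slots = buffer_slots
        self.buffer = []         # input buffer, written only under contention
        self.blocked_cycles = 0  # consecutive cycles the current head flit was blocked

    def tick(self, head_flit, blocked: bool) -> str:
        """Advance one cycle for the flit at the head of the input EB."""
        if not blocked:
            self.blocked_cycles = 0
            return "forwarded"  # common case: EB path only, no buffer access
        self.blocked_cycles += 1
        if self.blocked_cycles >= self.block_threshold and len(self.buffer) < self.buffer_slots:
            self.buffer.append(head_flit)  # drain the flit to free the channel
            self.blocked_cycles = 0
            return "drained"
        return "waiting"
```

Without contention `tick` never writes the buffer, which is the behavior behind the claim that the hybrid is as energy efficient as EB networks in the common case.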
14
Output Channel Occupancy Load Metric
- Flit-buffered networks use credit counts
- EB networks measure output channel occupancy instead
  - Measured at a certain segment of the output channel (shown in red in the figure)
  - Occupancy is decremented when flits leave that segment
  - Occupancy is incremented by a packet's length when the routing decision is made, so packets see other decisions made in the same cycle
[Figure: router output channel with the monitored segment highlighted in red]
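The occupancy bookkeeping can be sketched as one counter per output. This is a minimal illustration under assumptions of our own: `OutputOccupancy` is a hypothetical name, and the counter is used here to pick the least-occupied of several candidate outputs, which is one plausible use of a congestion metric rather than the routing function from the talk. The detail taken from the slide is that the counter is charged the full packet length at decision time, so a second packet routed in the same cycle already sees the first.

```python
from typing import Sequence


class OutputOccupancy:
    """Output channel occupancy load metric, one counter per output (sketch)."""

    def __init__(self, num_outputs: int):
        self.occupancy = [0] * num_outputs

    def route(self, candidates: Sequence[int], packet_length: int) -> int:
        """Pick the least-occupied candidate and charge it the whole packet now,
        so later routing decisions in the same cycle see this one."""
        best = min(candidates, key=lambda out: self.occupancy[out])
        self.occupancy[best] += packet_length
        return best

    def flit_left_segment(self, out: int) -> None:
        """Called when a flit leaves the monitored segment of output `out`."""
        assert self.occupancy[out] > 0, "more flits left than were counted"
        self.occupancy[out] -= 1
```

For example, two 4-flit packets routed in the same cycle with candidates 0 and 1 are split across both outputs instead of both choosing output 0.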
15
Outline
- Building EB channels
- EB router design
- Deadlock avoidance & congestion sensing
- Evaluation results: let's talk numbers
16
Throughput-Power, Mesh (Baseline Router)
[Figure: throughput-power curves, with the EB throughput gain marked]
- EB network improvement:
  - At the same power: 10% higher throughput
  - At the same throughput: 12% lower power
- The EB router's 18% lower cycle time is not taken into account
17
Router RTL Implementation
- EB router: no buffers, VCs, allocators, or credits
- The VC router had look-ahead routing
- VC router buffers: FF arrays, 2 VCs with 8 slots each

Aspect        VC router   EB router   Savings
Area (μm²)    63,515      14,730      77%
Clock (ns)    3.3         2.7         18%
Power (mW)    2.59        0.12        95%

45nm LP-CMOS, worst case. 5x5 mesh routers, DOR (dimension-order routing), 64-bit datapath.
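The Savings column follows from the other two as 1 − EB/VC. The short check below recomputes it from the RTL values; the dictionary layout and the `savings` helper are ours. It also shows that an 18% shorter cycle time corresponds to about a 22% higher clock frequency, since 3.3 / 2.7 ≈ 1.22.

```python
# VC router vs. EB router values from the RTL comparison (45nm LP-CMOS, worst case).
RESULTS = {
    "Area (um^2)": (63_515, 14_730),
    "Clock (ns)": (3.3, 2.7),
    "Power (mW)": (2.59, 0.12),
}


def savings(vc_value: float, eb_value: float) -> float:
    """Fractional reduction of the EB router relative to the VC router."""
    return 1 - eb_value / vc_value


for aspect, (vc, eb) in RESULTS.items():
    print(f"{aspect}: {savings(vc, eb):.0%}")  # Area 77%, Clock 18%, Power 95%

# A cycle time 18% shorter means roughly 22% higher clock frequency.
print(f"Frequency gain: {3.3 / 2.7 - 1:.0%}")  # 22%
```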
18
Router Comparison
- Baseline: 9% less energy than single-stage, 35% less than enhanced
- Enhanced: 26% lower cycle time than single-stage, 42% lower than baseline
19
Hybrid EB-VC Comparison
- Cycle time comparable to VC routers, not EB routers
- The hybrid offers 21% more throughput per unit power than VC, and 12% more than EB
- The VC network offers 41% more throughput per unit area than the hybrid; the EB network offers 49% more
20
Conclusions
- EB flow control uses channels as distributed FIFOs
  - Uses the pipeline flip-flops that are required anyway
  - Removes input buffers from routers
- Provides 43% more throughput per unit power and 22% more throughput per unit area
  - The gain depends on what fraction of the cost the input buffers represent
- Reduces cycle time by up to 67%
- The hybrid EB-VC router supports a large number of traffic classes
  - The input buffer is used only when it has to be
  - 21% more throughput per unit power than VC
- Remove buffers, keep buffering. Elastic buffers!
21
Questions?