1
Recursive Partitioning Multicast: A Bandwidth-Efficient Routing for Networks-On-Chip
Lei Wang, Yuho Jin, Hyungjun Kim and Eun Jung Kim
Department of Computer Science and Engineering, Texas A&M University
2
Multi-Core Wave & Networks-On-Chip
First, let's look at two changes in processor design.
Uniprocessors have hit the power wall; multiprocessors provide high performance within a lower power budget.
Shared-bus architectures have limited scalability; Networks-On-Chip (NOCs) orchestrate chip-wide communication for future many-core processors.
MIT Raw (0.18 µm, 300 MHz): 16-core chip with four 4×4 mesh networks
Intel Polaris (65 nm, 4 GHz): 80-core chip with an 8×10 mesh network
Lei Wang - NOCS 2009
3
Challenges in On-Chip Communication
High performance: low communication latency is critical for high system performance.
Bandwidth efficiency: well-designed routing algorithms provide high network throughput.
Power and area constraints: simple topologies and slim routers reduce communication power consumption and save chip area.
Efficient multicast support: cache coherence protocols rely heavily on multicast or broadcast communication.
We propose a bandwidth-efficient multicast routing for NOCs with low latency and low power consumption.
4
Prior Work in Multicast Communication
Routing Evaluation Criteria for Multicast Communication [Ni93]: multicast in multicomputer systems
Tree-based Multicast Routing for DSM Multiprocessors [Torrellas96]: short-message multicast in DSM systems
Virtual Circuit Tree Multicasting for NOCs [Lipasti08]: demonstrates the need for on-chip multicast; proposes table-based multicast routing
Region-based Multicast for CMPs [Duato08]: multicast routing for irregular topologies in CMPs
5
Outline
Motivation
Multicast Router Design
  State-of-the-Art Unicast Router Architecture
  Replication Schemes
  Destination List Management
Recursive Partitioning Multicast (RPM)
  Network Partitioning
  Routing Rules
  Example
  Deadlock Avoidance
Evaluation
Conclusion
6
Different Bandwidth Usage Example
Figure: two 4×4 meshes (left and right) with the same source and destinations, each routed along a different multicast path.
Left path: 11 link traversals, 12 buffer writes, 15 buffer reads, and 15 crossbar traversals.
Right path: 5 link traversals, 6 buffer writes, 10 buffer reads, and 10 crossbar traversals.
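The savings come from letting destinations share links. A minimal sketch, assuming a 2D mesh with dimension-order (XY) routing, counts link traversals for one unicast packet per destination versus one multicast packet that crosses each shared link only once. The XY policy and the helper names (`xy_path`, `unicast_link_traversals`, `tree_link_traversals`) are hypothetical illustrations, not the left and right paths of this example.

```python
from typing import List, Set, Tuple

Node = Tuple[int, int]          # (x, y) mesh coordinate (assumed layout)
Link = Tuple[Node, Node]        # directed link between neighbouring routers

def xy_path(src: Node, dst: Node) -> List[Link]:
    """Directed links of the dimension-order route: X first, then Y."""
    links: List[Link] = []
    x, y = src
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links

def unicast_link_traversals(src: Node, dests: List[Node]) -> int:
    # One separate packet per destination: every path is paid in full.
    return sum(len(xy_path(src, d)) for d in dests)

def tree_link_traversals(src: Node, dests: List[Node]) -> int:
    # One multicast packet: XY paths from one source form a tree,
    # so a link shared by several destinations is crossed only once.
    shared: Set[Link] = set()
    for d in dests:
        shared.update(xy_path(src, d))
    return len(shared)
```

For instance, with the source at (0, 0) and destinations (3, 0) and (3, 3), separate unicasts need 9 link traversals while the shared tree needs 6.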
7
State-of-Art Wormhole Unicast Router
Figure: at each of two consecutive hops, a flit passes through the router pipeline (RC → VA → SA → ST) and then the link (LT).
RC: Route Computation
VA: Virtual Channel (VC) Allocation
SA: Switch Allocation
ST: Switch Traversal
LT: Link Traversal
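A minimal timing sketch of this pipeline, assuming one cycle per stage, no contention, and none of the common shortcuts such as lookahead routing or speculative allocation. `head_flit_schedule` and `zero_load_latency` are hypothetical helpers, and the latency formula is a simplified zero-load model rather than a measured result.

```python
from typing import List, Tuple

# Four router stages followed by the link stage, one cycle each (assumed).
STAGES = ("RC", "VA", "SA", "ST", "LT")

def head_flit_schedule(hops: int, start_cycle: int = 0) -> List[Tuple[int, str, int]]:
    """(hop, stage, cycle) entries for a head flit crossing `hops` router+link pairs."""
    schedule: List[Tuple[int, str, int]] = []
    cycle = start_cycle
    for hop in range(hops):
        for stage in STAGES:
            schedule.append((hop, stage, cycle))
            cycle += 1
    return schedule

def zero_load_latency(hops: int, packet_flits: int) -> int:
    """Head-flit pipeline delay plus one cycle per remaining flit.

    Only the head flit performs RC and VA; body and tail flits reuse its
    route and VC and follow one cycle apart (serialization).
    """
    return hops * len(STAGES) + (packet_flits - 1)
```

With 3 hops and a 5-flit packet, this model gives 3 × 5 + 4 = 19 cycles.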
8
What Do We Need in a Multicast Router?
Packet Replication
  Synchronous Replication
  Asynchronous Replication
Destination List Management
  All-destination Encoding
  Bit String Encoding
  Multiple-region Broadcast Encoding
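Two of the destination-list formats can be compared by header size. A sketch, assuming node IDs 0 to N−1: all-destination encoding is taken here to mean listing every destination ID in ⌈log2 N⌉ bits, and bit-string encoding uses one bit per node. The function names are hypothetical, and multiple-region broadcast encoding is not modeled.

```python
import math
from typing import Dict, Iterable, List

def encode_bit_string(dests: Iterable[int], num_nodes: int) -> int:
    """Set bit d for every destination node d."""
    mask = 0
    for d in dests:
        if not 0 <= d < num_nodes:
            raise ValueError(f"node {d} outside 0..{num_nodes - 1}")
        mask |= 1 << d
    return mask

def decode_bit_string(mask: int, num_nodes: int) -> List[int]:
    """Recover the destination IDs, in ascending order."""
    return [n for n in range(num_nodes) if (mask >> n) & 1]

def header_bits(num_dests: int, num_nodes: int) -> Dict[str, int]:
    """Destination-field size for each encoding (assumed formats)."""
    id_bits = max(1, math.ceil(math.log2(num_nodes)))
    return {"all_destination": num_dests * id_bits, "bit_string": num_nodes}
```

For 5 destinations in a 64-node network the ID list costs 30 bits and the bit string 64 bits; the bit string becomes smaller once a packet has more than about N / ⌈log2 N⌉ destinations.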
9
Synchronous Replication
Figure: timing diagram (cycles 1–3) of a packet's head (H), middle (M), and tail (T) flits across input ports 0–3 and output ports 0–3.
Packet replication happens at the Switch Traversal (ST) stage.
10
Asynchronous Replication
Figure: timing diagram (cycles 1–3) of a packet's head (H), middle (M), and tail (T) flits across input ports 0–3 and output ports 0–3.
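A toy model contrasting the two replication schemes for a single flit. It assumes synchronous replication means a flit is copied only in a cycle when every requested output port is free, and asynchronous replication means each copy leaves as soon as its own port is free, with the flit held in the buffer until all copies are sent. The port-busy schedule and the `send_times` helper are hypothetical.

```python
from typing import Dict, Set

def send_times(requested: Set[int], busy: Dict[int, Set[int]],
               synchronous: bool, horizon: int = 100) -> Dict[int, int]:
    """Cycle in which one flit is copied to each requested output port.

    busy[p] holds the (hypothetical) cycles in which output port p is taken.
    Ports still unsent after `horizon` cycles are missing from the result.
    """
    pending = set(requested)
    sent: Dict[int, int] = {}
    for cycle in range(horizon):
        if not pending:
            break
        free_now = {p for p in pending if cycle not in busy.get(p, set())}
        if synchronous:
            # Lockstep: copy only when every remaining output is free together.
            if free_now == pending:
                sent.update({p: cycle for p in pending})
                pending.clear()
        else:
            # Independent: each free output takes its copy now; the flit stays
            # buffered until the last copy has left.
            sent.update({p: cycle for p in free_now})
            pending -= free_now
    return sent
```

If output 1 is busy in cycles 0 and 1, synchronous replication delays both copies to cycle 2, while asynchronous replication sends the copy on output 0 in cycle 0.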
11
Network Partitioning
Figure: the network is divided into eight parts (0–7) around the source node; each of the four output directions (N, W, E, S) is associated with three adjacent parts: (0, 1, 7), (1, 2, 3), (3, 4, 5), and (5, 6, 7).
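A sketch of the eight-part classification under an assumed numbering: parts 0, 2, 4, 6 are the straight north, east, south, and west lines through the source, and parts 1, 3, 5, 7 are the north-east, south-east, south-west, and north-west quadrants. With that numbering, the four triples listed above line up as N (7, 0, 1), E (1, 2, 3), S (3, 4, 5), W (5, 6, 7). Both the numbering and the port assignment are assumptions made for illustration.

```python
from typing import Optional, Tuple

Node = Tuple[int, int]  # (x, y); x grows eastward, y grows northward (assumed)

# Assumed coverage: each port reaches its own straight line plus the two
# neighbouring quadrants, so every quadrant (odd part) has two candidate ports.
COVERAGE = {"N": (7, 0, 1), "E": (1, 2, 3), "S": (3, 4, 5), "W": (5, 6, 7)}

def part_of(src: Node, dst: Node) -> Optional[int]:
    """Part index 0-7 of dst relative to src, or None if dst is src itself."""
    dx, dy = dst[0] - src[0], dst[1] - src[1]
    if dx == 0 and dy == 0:
        return None                 # local destination: eject here
    if dx == 0:
        return 0 if dy > 0 else 4   # straight north / south
    if dy == 0:
        return 2 if dx > 0 else 6   # straight east / west
    if dx > 0:
        return 1 if dy > 0 else 3   # north-east / south-east quadrant
    return 7 if dy > 0 else 5       # north-west / south-west quadrant

def candidate_ports(part: int) -> Tuple[str, ...]:
    """Output ports whose three-part region contains the given part."""
    return tuple(port for port, parts in COVERAGE.items() if part in parts)
```

Straight-line parts have one candidate port and quadrants have two; the routing rules then decide which of the two a quadrant uses.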
12
Basic Routing Rules
North port: top-right corner.
West port: top-left corner.
South port: bottom-left corner.
East port: bottom-right corner.
Figure: a source and its destinations, with each hop labeled by the output port it uses (N, W, E, S).
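Reading the basic rules as "each output port also serves the quadrant on its clockwise side" (N serves the top-right corner, E the bottom-right, S the bottom-left, W the top-left), and assuming a destination on a straight line is served by the port facing it, recursive partitioning can be sketched as follows: every router splits its destination set by port, forwards one copy per non-empty subset, and repeats at the next hop. The coordinate convention and all names are hypothetical; the optimized rules are not modeled.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]  # (x, y); x grows eastward, y grows northward (assumed)
STEP = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}

def basic_port(cur: Node, dst: Node) -> str:
    """Output port for one destination under the assumed basic rules."""
    dx, dy = dst[0] - cur[0], dst[1] - cur[1]
    if dx >= 0 and dy > 0:
        return "N"   # straight north or top-right corner
    if dx > 0 and dy <= 0:
        return "E"   # straight east or bottom-right corner
    if dx <= 0 and dy < 0:
        return "S"   # straight south or bottom-left corner
    return "W"       # straight west or top-left corner (dx < 0, dy >= 0)

def rpm_basic(cur: Node, dests: List[Node],
              links: List[Tuple[Node, Node]], ejected: List[Node]) -> None:
    """Recursively partition the destination set at every router."""
    groups: Dict[str, List[Node]] = {}
    for d in dests:
        if d == cur:
            ejected.append(d)        # this router is a destination
        else:
            groups.setdefault(basic_port(cur, d), []).append(d)
    for port, subset in groups.items():
        sx, sy = STEP[port]
        nxt = (cur[0] + sx, cur[1] + sy)
        links.append((cur, nxt))     # one copy per non-empty partition
        rpm_basic(nxt, subset, links, ejected)
```

Each chosen port moves a copy one hop closer to every destination in its subset, so the recursion terminates. From (0, 0) to destinations (2, 2) and (0, 2), the shared copy travels north twice, ejects at (0, 2), and continues east, for 4 link traversals instead of 6 with separate unicasts under the same rule.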
13
Optimized Routing Rules
Figure: a source and destinations routed under the optimized rules, annotated "Deadlock!" to mark a potential deadlock.
14
RPM Example: Step 1
Figure legend: multicast packet, source, destination, partitioning.
15
RPM Example: Step 2
Figure legend: multicast packet, source, destination, partitioning; one ejection is marked.
16
RPM Example: Step 3
Figure legend: multicast packet, source, destination, partitioning.
17
RPM Example: Step 4
Figure legend: multicast packet, source, destination, partitioning; three ejections and four multicast packets (M) are marked.
18
RPM Example: Step 5
Figure legend: multicast packet, source, destination, partitioning; one ejection and two multicast packets (M) are marked.
19
Deadlock Avoidance
RPM has no turn restrictions, so it can potentially deadlock.
We use virtual networks (VNs) to avoid deadlock:
Two VNs share the same physical network.
The virtual channels of each port are divided equally between the two VNs.
The VN ID (0 or 1) of each packet is decided at the source.
Figure: a 4×4 mesh shown twice, once as Virtual Network 0 and once as Virtual Network 1.
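A sketch of the VC split, assuming 4 VCs per port (the evaluation setting) and two VNs, so VN 0 owns VCs 0–1 and VN 1 owns VCs 2–3. VC allocation only ever considers the VCs of the packet's own VN, so a packet in one VN never waits on buffers of the other. The rule that picks the VN at the source is not described on this slide, so the sketch takes the VN ID as an input; `vcs_for_vn` and `allocate_vc` are hypothetical names.

```python
from typing import Optional, Set

def vcs_for_vn(vn: int, vcs_per_port: int = 4, num_vns: int = 2) -> range:
    """VC indices of one port that belong to virtual network `vn`."""
    if not 0 <= vn < num_vns:
        raise ValueError(f"VN id must be in 0..{num_vns - 1}")
    if vcs_per_port % num_vns:
        raise ValueError("VCs must divide evenly among VNs")
    per_vn = vcs_per_port // num_vns
    return range(vn * per_vn, (vn + 1) * per_vn)

def allocate_vc(vn: int, free_vcs: Set[int],
                vcs_per_port: int = 4, num_vns: int = 2) -> Optional[int]:
    """VC allocation restricted to the packet's own VN; None if all are busy."""
    for vc in vcs_for_vn(vn, vcs_per_port, num_vns):
        if vc in free_vcs:
            return vc
    return None  # stall: VCs of the other VN are never borrowed
```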
20
Evaluation Methodology
Performance model: cycle-accurate network simulator
  Models all router pipeline stages in detail
  Highly parameterized
Power model: Orion, with both dynamic and leakage power models
Network configuration (default value; values in parentheses were also evaluated):
  Topology: 8×8 mesh (6×6, 10×10, and 16×16 meshes)
  Routing: RPM
  VCs per port: 4
  VC depth: (value missing)
  Packet length (flits): (value missing)
  Unicast traffic pattern: uniform random (bit complement, transpose)
  Multicast packet portion: 10% (5%, 20%, 40%, 80%)
  Multicast destination number: 0–16 (uniformly distributed)
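A hypothetical traffic generator matching the listed parameters on a k×k mesh (node ID = y·k + x): each packet is a multicast with probability 10%, a multicast draws its destination count uniformly from 0–16 and its destinations uniformly from the other nodes, and unicasts here use uniform random destinations. Bit complement and transpose follow their standard definitions (bit complement assumes a power-of-two node count). This is an illustrative sketch, not the authors' simulator.

```python
import random
from typing import List, Tuple

def bit_complement(src: int, num_nodes: int) -> int:
    """Flip every address bit (num_nodes must be a power of two)."""
    if num_nodes & (num_nodes - 1):
        raise ValueError("num_nodes must be a power of two")
    return (num_nodes - 1) ^ src

def transpose(src: int, k: int) -> int:
    """Send node (x, y) to node (y, x) on a k x k mesh with ID = y*k + x."""
    x, y = src % k, src // k
    return x * k + y

def make_packet(src: int, k: int, rng: random.Random,
                multicast_portion: float = 0.10,
                dest_range: Tuple[int, int] = (0, 16)) -> Tuple[str, List[int]]:
    """Draw one packet: ('multicast', dests) or ('unicast', [dst])."""
    num_nodes = k * k
    if rng.random() < multicast_portion:
        count = rng.randint(*dest_range)  # inclusive; a 0 draw gives no destinations
        others = [n for n in range(num_nodes) if n != src]
        return "multicast", sorted(rng.sample(others, count))
    dst = rng.randrange(num_nodes - 1)    # uniform over the other nodes
    if dst >= src:
        dst += 1
    return "unicast", [dst]
```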
21
Uniform Random Traffic
Figure annotations: 50%, 40%, 40%.
Latency is reduced by around 50% before network saturation.
Network throughput is extended by 40%.
22
Link Utilization
At low load, RPM reduces link utilization by 33%.
At high load, RPM reduces link utilization by 45%.
23
Dynamic Power Consumption
Figure annotations: 50%, 40%.
24
Scalability Study-Network Size
Figure annotation: over 50%.
25
Scalability Study-Multicast Traffic Portion
26
Scalability Study-Destination Number
27
Conclusion
We propose a new multicast routing algorithm, Recursive Partitioning Multicast (RPM), that is bandwidth-efficient and scalable.
Performance improvement:
  Up to 50% latency reduction
  33% link utilization reduction
Power savings:
  Up to 40% total dynamic power savings
  25% crossbar and link power savings
28
Thank you!
29
Backup
30
Hardware Implementation of Routing logic
31
Bit Complement Traffic
32
Transpose Traffic