Do We Need Wide Flits in Networks-On-Chip? Junghee Lee, Chrysostomos Nicopoulos, Sung Joo Park, Madhavan Swaminathan and Jongman Kim. Presented by Junghee Lee.

Presentation transcript:

Do We Need Wide Flits in Networks-On-Chip? Junghee Lee, Chrysostomos Nicopoulos, Sung Joo Park, Madhavan Swaminathan and Jongman Kim Presented by Junghee Lee

2 Introduction
Increasing number of cores → Communication-centric → Packet-based Networks-on-Chip (NoC)
Unit
–Packet: a meaningful unit of the upper-layer protocol
–Flit: the smallest unit of flow control maintained by the NoC
If a packet is larger than a flit, the packet is split into multiple flits.
The flit size usually matches the physical channel width.
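The packet-to-flit split described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `split_into_flits` is hypothetical, sizes are in bits, and the packet size is assumed to already include any header.

```python
from typing import List


def split_into_flits(packet_bits: int, flit_bits: int) -> List[int]:
    """Return the number of useful bits carried by each flit of one packet."""
    if packet_bits <= 0 or flit_bits <= 0:
        raise ValueError("packet and flit sizes must be positive")
    full_flits, remainder = divmod(packet_bits, flit_bits)
    flits = [flit_bits] * full_flits
    if remainder:
        # The last flit is only partially filled.
        flits.append(remainder)
    return flits


# A packet that fits exactly, and one that leaves a partially filled last flit.
print(split_into_flits(512, 128))
print(split_into_flits(100, 64))
```

A packet no larger than the flit occupies a single flit, so widening the flit beyond that point cannot reduce the flit count for that packet any further.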

3 Motivation
What is the optimal flit size in Networks-on-Chip for general-purpose computing?
–64 or 128: research papers
–256 or 512: research papers
–144: Intel Single-Chip Cloud
–160: Tilera
–256: Intel Sandy Bridge

4 Multifaceted Factors
Flit size:
–Global wires
–Cost of router
–Workload
–Latency
–Throughput
A first attempt at drawing a balanced conclusion

5 Assumed NoC Router Architecture
[Figure: router architecture, with parameters labeled d, v, c and p]

6 Packet and Flit
[Figure: Header and Payload fields]

7 Simulation Environment
Parameter: Default value
Simulator: Simics + GEMS (Garnet)
Benchmark: PARSEC
Number of processors: 64
Operating system: Linux Fedora
L1 cache size: 32 KB
L1 cache number of ways: 4
L1 cache line size: 64 B
L2 cache (shared): 16 MB, 16-way, 128-B line
MSHR size: 32 for I- and 32 for D-cache
Main memory: 2 GB SDRAM
Cache coherence protocol: MOESI directory
Topology: 2D mesh

8 Default NoC Parameters
Parameter: Default value
Number of virtual channels: 3
Buffer depth: 8 flits per virtual channel
Number of pipeline stages: 4
Number of ports: 5
Header overhead: 16 bits

9 Key Questions
–Can we afford wide flits as technology scales?
–Is the cost of wide-flit routers justifiable?
–How much do wide flits contribute to overall performance?
–Do memory-intensive workloads need wide flits?
–Do we need wider flits as the number of processing elements increases?

10 #1) Global Wires
Can we afford wide flits as technology scales?
Technology scaling does not allow for a direct widening of the flits, because the power portion of the global wires increases as technology scales.
[Table: Item, Unit, Value. Rows: Technology (nm); Chip size* (mm); Transistors* (MTRs); Global wiring pitch* (nm); Power index* (W/GHz cm); Total chip power* (W); Normalized power portion]
* International Technology Roadmap for Semiconductors (ITRS) 2009 and 2011

11 #2) Cost of Router
Is the cost of wide-flit routers justifiable?
Cost of buffers ∝ Flit size × Buffer depth × Number of virtual channels
Cost of switch ∝ (Flit size)^2 × (Number of ports)^2
[Figure: cost of switch and buffer against flit size]
Flit size × 2 → cost of router × 2.97
Flit size × 4 → cost of router × [value lost]
If the performance improvement does not compensate for the increase in cost, widening the flit is hard to justify.
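The two proportionalities on this slide can be turned into a rough scaling sketch. The weights `k_buffer` and `k_switch` are hypothetical placeholders (the slide gives only proportionalities), and the default depth, virtual-channel and port counts are taken from the deck's default NoC parameters. The sketch is not calibrated to reproduce the 2.97 figure, which comes from the authors' own cost estimate.

```python
def router_cost(flit_bits, buffer_depth=8, num_vcs=3, num_ports=5,
                k_buffer=1.0, k_switch=1.0):
    """Relative router cost in arbitrary units (weights are assumptions)."""
    # Buffers grow linearly with flit size.
    buffer_cost = k_buffer * flit_bits * buffer_depth * num_vcs
    # The switch grows quadratically with flit size and port count.
    switch_cost = k_switch * flit_bits ** 2 * num_ports ** 2
    return buffer_cost + switch_cost


def cost_growth(base_flit_bits, factor, **kwargs):
    """Cost ratio after multiplying the flit size by `factor`."""
    widened = router_cost(base_flit_bits * factor, **kwargs)
    return widened / router_cost(base_flit_bits, **kwargs)


# Doubling the flit size: the ratio falls between 2 (buffer-dominated)
# and 4 (switch-dominated), depending on the assumed weights.
print(cost_growth(64, 2))
print(cost_growth(64, 2, k_switch=0.0))
print(cost_growth(64, 2, k_buffer=0.0))
```

Under this model a doubling of the flit size always costs between 2× and 4×, which is consistent in kind with the 2.97× reported on the slide.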

12 #3) Latency
How much do wide flits contribute to overall performance?
The network traffic usually consists of packets of different sizes:
–l_s: the size of the shortest packet
–l_l: the size of the longest packet
[Figure: latency against flit size, with l_s + h and l_l + h marked]
Suggested rule of thumb: Flit size = shortest packet size + header overhead
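The rule of thumb can be illustrated with a small serialization sketch. Several things here are assumptions rather than facts from the slides: h is read as the header overhead (16 bits in the deck's defaults), the header is counted once per packet, and the 64-bit and 512-bit packet sizes are hypothetical stand-ins for l_s and l_l.

```python
def suggested_flit_bits(shortest_packet_bits, header_bits=16):
    """Rule of thumb: flit size = shortest packet size + header overhead."""
    return shortest_packet_bits + header_bits


def flits_needed(payload_bits, flit_bits, header_bits=16):
    """Flits needed for one packet, assuming one header per packet."""
    total_bits = payload_bits + header_bits
    return -(-total_bits // flit_bits)  # ceiling division


SHORT_BITS = 64   # hypothetical l_s
LONG_BITS = 512   # hypothetical l_l

print("suggested flit size:", suggested_flit_bits(SHORT_BITS))
for flit_bits in (40, 80, 160, 528):
    print(flit_bits,
          flits_needed(SHORT_BITS, flit_bits),
          flits_needed(LONG_BITS, flit_bits))
```

With these assumed sizes, the short packet stops benefiting once the flit reaches l_s + h, while the long packet keeps shrinking in flit count until the flit reaches l_l + h.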

13 #4) Workload Characteristics
Do memory-intensive workloads need wide flits?
[Table: cache misses / Kcycle / node and injected packets / Kcycle / node for the applications Blackscholes, Bodytrack, Ferret, Fluidanimate, Freqmine, Streamcluster, Swaptions, Vips and X]
The injection rate of real applications is far below the typical saturation point of a NoC → self-throttling effect [34]
Up to 64 cores, we can keep the rule of thumb because of the low injection rate.

14 #5) Throughput
Do we need wider flits as the number of processing elements increases?
Widening the flit is not a cost-effective way to increase throughput, because of fragmentation.
If widening the physical channel is the only option for increasing the throughput, we suggest using physically separated networks.
[Figure: axes Flit size and Latency; series: one 80-bit network, one 160-bit network, two 80-bit networks]
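The fragmentation argument can be sketched numerically, assuming that "fragmentation" refers to the unused bits in partially filled flits. The traffic mix below (three 80-bit packets and one 528-bit packet, headers included) is hypothetical and is not taken from the paper's measurements.

```python
def channel_utilization(packet_bits_list, flit_bits):
    """Fraction of transmitted flit bits that carry packet data."""
    useful = sum(packet_bits_list)
    # Each packet occupies a whole number of flits (ceiling division).
    carried = sum(-(-p // flit_bits) * flit_bits for p in packet_bits_list)
    return useful / carried


# Hypothetical mix: mostly short packets plus one long packet.
mix = [80, 80, 80, 528]

for width in (80, 160):
    print(width, round(channel_utilization(mix, width), 3))
```

In this mix, every short packet fills only half of a 160-bit flit, so doubling the channel width yields well under double the useful throughput. Two separate 80-bit networks avoid that waste for short packets.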

15 Conclusions
–Can we afford wide flits as technology scales? No, unless the power budget for the NoC increases.
–Is the cost of wide-flit routers justifiable? No, the cost increases sharply with the flit size.
–How much do wide flits contribute to overall performance? Until the flit size reaches the shortest packet size.
–Do memory-intensive workloads need wide flits? No, because of the self-throttling effect.
–Do we need wider flits as the number of processing elements increases? No, because of fragmentation.

16 Final Conclusion
Suggested rule of thumb: Flit size = shortest packet size + header overhead
This paper provides a comprehensive discussion of all key aspects pertaining to the NoC's flit size.
This exploration could serve as a quick reference for designers/architects of general-purpose multi-core microprocessors who need to decide on an appropriate flit size for their design.

17 Thank you!

18 Questions?
Contact info: Junghee Lee, Electrical and Computer Engineering, Georgia Institute of Technology