Déjà Vu Switching for Multiplane NoCs NOCS’12 University of Pittsburgh Ahmed Abousamra Rami MelhemAlex Jones.

Slides:



Advertisements
Similar presentations
Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC) Ran Manevich, Isask har (Zigi) Walter, Israel Cidon, and Avinoam Kolodny Technion – Israel.
Advertisements

Interconnection Networks: Flow Control and Microarchitecture.
Coherence Ordering for Ring-based Chip Multiprocessors Mike Marty and Mark D. Hill University of Wisconsin-Madison.
A Novel 3D Layer-Multiplexed On-Chip Network
International Symposium on Low Power Electronics and Design Energy-Efficient Non-Minimal Path On-chip Interconnection Network for Heterogeneous Systems.
The Locality-Aware Adaptive Cache Coherence Protocol George Kurian 1, Omer Khan 2, Srini Devadas 1 1 Massachusetts Institute of Technology 2 University.
Zhongkai Chen 3/25/2010. Jinglei Wang; Yibo Xue; Haixia Wang; Dongsheng Wang Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China This paper.
University of Utah1 Interconnect-Aware Coherence Protocols for Chip Multiprocessors Liqun Cheng Naveen Muralimanohar Karthik Ramani Rajeev Balasubramonian.
Aérgia: Exploiting Packet Latency Slack in On-Chip Networks
CCNoC: On-Chip Interconnects for Cache-Coherent Manycore Server Chips CiprianSeiculescu Stavros Volos Naser Khosro Pour Babak Falsafi Giovanni De Micheli.
Allocator Implementations for Network-on-Chip Routers Daniel U. Becker and William J. Dally Concurrent VLSI Architecture Group Stanford University.
1 Lecture 17: On-Chip Networks Today: background wrap-up and innovations.
L2 to Off-Chip Memory Interconnects for CMPs Presented by Allen Lee CS258 Spring 2008 May 14, 2008.
NoC Modeling Networks-on-Chips seminar May, 2008 Anton Lavro.
June 20 th 2004University of Utah1 Microarchitectural Techniques to Reduce Interconnect Power in Clustered Processors Karthik Ramani Naveen Muralimanohar.
Lei Wang, Yuho Jin, Hyungjun Kim and Eun Jung Kim
1 Link Division Multiplexing (LDM) for NoC Links IEEE 2006 LDM Link Division Multiplexing Arkadiy Morgenshtein, Avinoam Kolodny, Ran Ginosar Technion –
Design of a High-Throughput Distributed Shared-Buffer NoC Router
1 Lecture 21: Router Design Papers: Power-Driven Design of Router Microarchitectures in On-Chip Networks, MICRO’03, Princeton A Gracefully Degrading and.
1 Lecture 13: Interconnection Networks Topics: flow control, router pipelines, case studies.
1 Lecture 25: Interconnection Networks Topics: flow control, router microarchitecture Final exam:  Dec 4 th 9am – 10:40am  ~15-20% on pre-midterm  post-midterm:
Trace-Driven Optimization of Networks-on-Chip Configurations Andrew B. Kahng †‡ Bill Lin ‡ Kambiz Samadi ‡ Rohit Sunkam Ramanujam ‡ University of California,
1 Lecture 26: Interconnection Networks Topics: flow control, router microarchitecture.
Distributed Microarchitectural Protocols in the TRIPS Prototype Processor Sankaralingam et al. Presented by Cynthia Sturton CS 258 3/3/08.
Low-Latency Virtual-Channel Routers for On-Chip Networks Robert Mullins, Andrew West, Simon Moore Presented by Sailesh Kumar.
Performance and Power Efficient On-Chip Communication Using Adaptive Virtual Point-to-Point Connections M. Modarressi, H. Sarbazi-Azad, and A. Tavakkol.
SYNTHESIS OF NETWORKS ON CHIPS FOR 3D SYSTEMS ON CHIPS Srinivasan Murali, Ciprian Seiculescu, Luca Benini, Giovanni De Micheli Presented by Puqing Wu.
Single-Chip Multi-Processors (CMP) PRADEEP DANDAMUDI 1 ELEC , Fall 08.
McRouter: Multicast within a Router for High Performance NoCs
29-Aug-154/598N: Computer Networks Switching and Forwarding Outline –Store-and-Forward Switches.
1 University of Utah & HP Labs 1 Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0 Naveen Muralimanohar Rajeev Balasubramonian.
Communication issues for NOC By Farhadur Arifin. Objective: Future system of NOC will have strong requirment on reusability and communication performance.
José Vicente Escamilla José Flich Pedro Javier García 1.
International Symposium on Low Power Electronics and Design NoC Frequency Scaling with Flexible- Pipeline Routers Pingqiang Zhou, Jieming Yin, Antonia.
1 Application Aware Prioritization Mechanisms for On-Chip Networks Reetuparna Das Onur Mutlu † Thomas Moscibroda ‡ Chita Das § Reetuparna Das § Onur Mutlu.
SMART: A Single- Cycle Reconfigurable NoC for SoC Applications -Jyoti Wadhwani Chia-Hsin Owen Chen, Sunghyun Park, Tushar Krishna, Suvinay Subramaniam,
Presenter: Min-Yu Lo 2015/10/19 Asit K. Mishra, N. Vijaykrishnan, Chita R. Das Computer Architecture (ISCA), th Annual International Symposium on.
Design and Evaluation of Hierarchical Rings with Deflection Routing Rachata Ausavarungnirun, Chris Fallin, Xiangyao Yu, ​ Kevin Chang, Greg Nazario, Reetuparna.
In-network cache coherence MICRO’2006 Noel Eisley et.al, Princeton Univ. Presented by PAK, EUNJI.
Department of Computer Science and Engineering The Pennsylvania State University Akbar Sharifi, Emre Kultursay, Mahmut Kandemir and Chita R. Das Addressing.
CS 8501 Networks-on-Chip (NoCs) Lukasz Szafaryn 15 FEB 10.
Yu Cai Ken Mai Onur Mutlu
Analyzing the Impact of Data Prefetching on Chip MultiProcessors Naoto Fukumoto, Tomonobu Mihara, Koji Inoue, Kazuaki Murakami Kyushu University, Japan.
Lecture 16: Router Design
IMPROVING THE PREFETCHING PERFORMANCE THROUGH CODE REGION PROFILING Martí Torrents, Raúl Martínez, and Carlos Molina Computer Architecture Department UPC.
Topology-aware QOS Support in Highly Integrated CMPs Boris Grot (UT-Austin) Stephen W. Keckler (NVIDIA/UT-Austin) Onur Mutlu (CMU) WIOSCA '10.
1 Lecture 22: Router Design Papers: Power-Driven Design of Router Microarchitectures in On-Chip Networks, MICRO’03, Princeton A Gracefully Degrading and.
Predictive High-Performance Architecture Research Mavens (PHARM), Department of ECE The NoX Router Mitchell Hayenga Mikko Lipasti.
Network On Chip Cache Coherency Final presentation – Part A Students: Zemer Tzach Kalifon Ethan Kalifon Ethan Instructor: Walter Isaschar Instructor: Walter.
A Low-Area Interconnect Architecture for Chip Multiprocessors Zhiyi Yu and Bevan Baas VLSI Computation Lab ECE Department, UC Davis.
Data Communication Networks Lec 13 and 14. Network Core- Packet Switching.
HAT: Heterogeneous Adaptive Throttling for On-Chip Networks Kevin Kai-Wei Chang Rachata Ausavarungnirun Chris Fallin Onur Mutlu.
Network On Chip Cache Coherency Midterm presentation Students: Zemer Tzach Kalifon Ethan Kalifon Ethan Instructor: Walter Isaschar Instructor: Walter Isaschar.
Power-aware NOC Reuse on the Testing of Core-based Systems* CSCE 932 Class Presentation by Xinwang Zhang April 26, 2007 * Erika Cota, et al., International.
COMP8330/7330/7336 Advanced Parallel and Distributed Computing Communication Costs in Parallel Machines Dr. Xiao Qin Auburn University
M AESTRO : Orchestrating Predictive Resource Management in Future Multicore Systems Sangyeun Cho, Socrates Demetriades Computer Science Department University.
FlexiBuffer: Reducing Leakage Power in On-Chip Network Routers
Pablo Abad, Pablo Prieto, Valentin Puente, Jose-Angel Gregorio
Exploring Concentration and Channel Slicing in On-chip Network Router
Babak Sorkhpour, Prof. Roman Obermaisser, Ayman Murshed
Lecture 16: On-Chip Networks
Rahul Boyapati. , Jiayi Huang
Using Packet Information for Efficient Communication in NoCs
Impact of Interconnection Network resources on CMP performance
Data Communication Networks
Natalie Enright Jerger, Li Shiuan Peh, and Mikko Lipasti
Low-Latency Virtual-Channel Routers for On-Chip Networks Robert Mullins, Andrew West, Simon Moore Presented by Sailesh Kumar.
Leveraging Optical Technology in Future Bus-based Chip Multiprocessors
A Case for Interconnect-Aware Architectures
Lecture 25: Interconnection Networks
Presentation transcript:

Déjà Vu Switching for Multiplane NoCs NOCS’12 University of Pittsburgh Ahmed Abousamra Rami MelhemAlex Jones

Déjà Vu Switching for Multiplane NoCs NOCS’12 Power efficiency has become a primary concern in the design of CMPs. The NoC of Intel’s TeraFLOPS processor consumes more than 28% of the chip’s power. Network messages can be classified into critical and non-critical It may be possible to send non-critical messages on a slower plane without hurting performance.

Déjà Vu Switching for Multiplane NoCs NOCS’12 Baseline Single Plane NoC Control Plane Data Plane Carries control and coherence messages: data requests, invalidates, acknowledgments, … Operates at baseline’s voltage & frequency Carries data messages Operates at a lower voltage & frequency to save power

Déjà Vu Switching for Multiplane NoCs NOCS’12 Motivation & Related work Importance of data messages to performance The Optimization Problem Proposed Solution Déjà Vu Switching Analysis of the acceptable data plane speed Evaluation Summary

Déjà Vu Switching for Multiplane NoCs NOCS’ Cache line having 8 data words Data Message Header Critical Word Subsequent miss for another word in the line  Delayed cache hit Critical Word Non-Critical Words

Déjà Vu Switching for Multiplane NoCs NOCS’12 If delayed cache hits are overly delayed, performance can suffer. Percentage of L1 misses that are delayed cache hits

Déjà Vu Switching for Multiplane NoCs NOCS’12 Motivation & Related work Importance of data messages to performance The Optimization Problem Proposed Solution Déjà Vu Switching Analysis of the acceptable data plane speed Evaluation Summary

Déjà Vu Switching for Multiplane NoCs NOCS’12 How slow can we operate the data plane to maximize energy savings while not impacting performance?

Déjà Vu Switching for Multiplane NoCs NOCS’12 Split NoC physically into 2 planes: control + data On data plane: Use circuit-switching to speed-up communication. Reduce voltage & frequency. Send a control message to establish the circuit once a cache hit is detected. Do not block circuit establishment message: Déjà Vu Switching Analyze acceptable slow down of the data plane to minimize energy while maintaining performance.

Déjà Vu Switching for Multiplane NoCs NOCS’12 Request PacketReservation Packet Data Packet Control Plane Data Plane 1) Upon a cache miss, a data request is sent

Déjà Vu Switching for Multiplane NoCs NOCS’12 Request PacketReservation Packet Data Packet Control Plane Data Plane 2) Upon a data hit in the next cache level, the circuit reservation packet is sent

Déjà Vu Switching for Multiplane NoCs NOCS’12 Request PacketReservation Packet Data Packet Control Plane Data Plane 3) Finally, the data packet is sent to the requester on the reserved circuit

Déjà Vu Switching for Multiplane NoCs NOCS’ Control Plane Data Plane Reserved Circuits Reservation Packets Data Packets East-North West-NorthWest-East Router

Déjà Vu Switching for Multiplane NoCs NOCS’12 Reservation Queue of West Input Port Reservation Queue of East Output Port Reserving the West-East Connection: EW Head of the Queues Port Names: North West South East Local

Déjà Vu Switching for Multiplane NoCs NOCS’12 S S Reservation Queues of the Input Ports Reservation Queues of the Output Ports E N N N N S W North West South East Local E W N Head of the Queues North West South East Local Port Names: North West South East Local

Déjà Vu Switching for Multiplane NoCs NOCS’12 Reservation Queues of the Input Ports Reservation Queues of the Output Ports N S North West South East Local E W N Head of the Queues North West South East Local N Port Names: North West South East Local

Déjà Vu Switching for Multiplane NoCs NOCS’12 Reservation Queues of the Input Ports Reservation Queues of the Output Ports N S North West South East Local E W N Port Names: North West South East Local Head of the Queues North West South East Local N

Déjà Vu Switching for Multiplane NoCs NOCS’12 Each input and output port must independently track the reserved circuits it is part of. Any two reservation packets that share part of their paths, must traverse all the shared links in the same order Data packets must be injected onto the data plane in the same order their reservation packets are injected onto the control plane.

Déjà Vu Switching for Multiplane NoCs NOCS’12 Motivation & Related work Importance of data messages to performance The Optimization Problem Proposed Solution Déjà Vu Switching Analysis of the acceptable data plane speed Evaluation Summary

Déjà Vu Switching for Multiplane NoCs NOCS’ Time Reservation Packet injected early at cycle 1 Data Packet injected at cycle 3

Déjà Vu Switching for Multiplane NoCs NOCS’ Time Reservation Packet injected early at cycle 1 Data Packet injected at cycle 3

Déjà Vu Switching for Multiplane NoCs NOCS’12

Motivation & Related work Importance of data messages to performance The Optimization Problem Proposed Solution Déjà Vu Switching Analysis of the acceptable data plane speed Evaluation Summary

Déjà Vu Switching for Multiplane NoCs NOCS’12 We use the functional simulator, Simics, to simulate cache coherent CMPs of 16 and 64 cores. We use Orion2 to get power numbers for the interconnect routers We evaluate with Synthetic traces: allows varying the network load Execution driven simulation of parallel benchmarks from the SPLASH-2 and PARSEC suits Interconnect Topology: Mesh

Déjà Vu Switching for Multiplane NoCs NOCS’12 Baseline NoC: Single plane, 16 byte links, packet switched with 3 cycles router pipeline, clocked at 4 GHz Evaluated NoC: Control plane: 6 byte links, packet switched, 4GHz Data plane: 10 byte links, circuit switched. Control and Coherence Packets: 1-flit Data Packets: Baseline NoC: 5 flits Data Plane: 7 flits

Déjà Vu Switching for Multiplane NoCs NOCS’12 Normalized NoC energy and completion time of synthetic traces on a 64-core CMP *Each request receives a reply data packet

Déjà Vu Switching for Multiplane NoCs NOCS’12 Normalized execution time on a 16-core CMP

Déjà Vu Switching for Multiplane NoCs NOCS’12 Normalized NoC energy on a 16-core CMP

Déjà Vu Switching for Multiplane NoCs NOCS’12 Normalized NoC energy and execution time on a 64-core CMP

Déjà Vu Switching for Multiplane NoCs NOCS’12 Performance relative to CMP with a single plane NoC on a 16-core CMP

Déjà Vu Switching for Multiplane NoCs NOCS’12 L. Cheng et al. (ISCA'06) and A. Flores et al. (IEEE Trans. Computers'10): Heterogeneous NoC using wires of different latency and power characteristics to improve performance and reduce NoC energy. Proposal requires wide links (75 bytes), but performance degrades with narrow links. Our work differs in: Tying latency of messages to performance Using Déjà Vu Switching to Compensate for slower data plane

Déjà Vu Switching for Multiplane NoCs NOCS’12 Problem: Saving power in the NoC by reducing the data plane’s power consumption without impacting performance. Delayed cache hits are important to performance. Operating data plane in circuit-switched mode allows it to operate at reduced frequency. Déjà Vu Switching allows reservation to proceed when resources are not currently available. The constraints governing the speed of the data plane can be estimated analytically.

Déjà Vu Switching for Multiplane NoCs NOCS’12