NTPT: On the End-to-End Traffic Prediction in the On-Chip Networks Yoshi Shih-Chieh Huang 1, June 16, 2010 1 Department of Computer Science, National Tsing.

Slides:



Advertisements
Similar presentations
Dynamic History-Length Fitting: A third level of adaptivity for branch prediction Toni Juan Sanji Sanjeevan Juan J. Navarro Department of Computer Architecture.
Advertisements

Bayesian Piggyback Control for Improving Real-Time Communication Quality Wei-Cheng Xiao 1 and Kuan-Ta Chen Institute of Information Science, Academia Sinica.
Project Proposal Presented by Michael Kazecki. Outline Background –Algorithms Goals Ideas Proposal –Introduction –Motivation –Implementation.
Towards Virtual Routers as a Service 6th GI/ITG KuVS Workshop on “Future Internet” November 22, 2010 Hannover Zdravko Bozakov.
CS752 Decoupled Architecture for Data Prefetching Jichuan Chang Kai Xu.
Destination-Based Adaptive Routing for 2D Mesh Networks ANCS 2010 Rohit Sunkam Ramanujam Bill Lin Electrical and Computer Engineering University of California,
Monday, June 01, 2015 ARRIVE: Algorithm for Robust Routing in Volatile Environments 1 NEST Retreat, Lake Tahoe, June
1 Prediction-based Strategies for Energy Saving in Object Tracking Sensor Networks Yingqi Xu, Wang-Chien Lee Proceedings of the 2004 IEEE International.
Reporter: Bo-Yi Shiu Date: 2011/05/27 Virtual Point-to-Point Connections for NoCs Mehdi Modarressi, Arash Tavakkol, and Hamid Sarbazi- Azad IEEE TRANSACTIONS.
Power Efficient IP Lookup with Supernode Caching Lu Peng, Wencheng Lu*, and Lide Duan Dept. of Electrical & Computer Engineering Louisiana State University.
Self-Correlating Predictive Information Tracking for Large-Scale Production Systems Zhao, Tan, Gong, Gu, Wambolt Presented by: Andrew Hahn.
1 Improving Branch Prediction by Dynamic Dataflow-based Identification of Correlation Branches from a Larger Global History CSE 340 Project Presentation.
Presenter: Jyun-Yan Li Multiprocessor System-on-Chip Profiling Architecture: Design and Implementation Po-Hui Chen, Chung-Ta King, Yuan-Ying Chang, Shau-Yin.
Adaptive Cache Compression for High-Performance Processors Alaa R. Alameldeen and David A.Wood Computer Sciences Department, University of Wisconsin- Madison.
Prophet/Critic Hybrid Branch Prediction Falcon, Stark, Ramirez, Lai, Valero Presenter: Christian Wanamaker.
Performance and Power Efficient On-Chip Communication Using Adaptive Virtual Point-to-Point Connections M. Modarressi, H. Sarbazi-Azad, and A. Tavakkol.
Not All Microseconds are Equal: Fine-Grained Per-Flow Measurements with Reference Latency Interpolation Myungjin Lee †, Nick Duffield‡, Ramana Rao Kompella†
McRouter: Multicast within a Router for High Performance NoCs
Gigabit Routing on a Software-exposed Tiled-Microprocessor
Yao Wang, Yu Wang, Jiang Xu, Huazhong Yang EE. Dept, TNList, Tsinghua University, Beijing, China Computing System Lab, Dept. of ECE Hong Kong University.
Shuchang Shan † ‡, Yu Hu †, Xiaowei Li † † Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences.
Power Issues in On-chip Interconnection Networks Mojtaba Amiri Nov. 5, 2009.
Korea Univ B-Fetch: Branch Prediction Directed Prefetching for In-Order Processors 컴퓨터 · 전파통신공학과 최병준 1 Computer Engineering and Systems Group.
Wei Gao1 and Qinghua Li2 1The University of Tennessee, Knoxville
SMART: A Single- Cycle Reconfigurable NoC for SoC Applications -Jyoti Wadhwani Chia-Hsin Owen Chen, Sunghyun Park, Tushar Krishna, Suvinay Subramaniam,
Thread Criticality Predictors for Dynamic Performance, Power, and Resource Management in Chip Multiprocessors Abhishek Bhattacharjee and Margaret Martonosi.
Building Expressive, Area-Efficient Coherence Directories Michael C. Huang Guofan Jiang Zhejiang University University of Rochester IBM 1 Lei Fang, Peng.
ACSAC’04 Choice Predictor for Free Mongkol Ekpanyapong Pinar Korkmaz Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Institute.
Wire Speed Packet Classification Without TCAMs ACM SIGMETRICS 2007 Qunfeng Dong (University of Wisconsin-Madison) Suman Banerjee (University of Wisconsin-Madison)
Network-on-Chip Energy-Efficient Design Techniques for Interconnects Suhail Basit.
Predicting Coherence Communication by Tracking Synchronization Points at Run Time Socrates Demetriades and Sangyeun Cho 45 th International Symposium in.
Srihari Makineni & Ravi Iyer Communications Technology Lab
Department of Computer Science and Engineering The Pennsylvania State University Akbar Sharifi, Emre Kultursay, Mahmut Kandemir and Chita R. Das Addressing.
Using Prediction to Accelerate Coherence Protocols Authors : Shubendu S. Mukherjee and Mark D. Hill Proceedings. The 25th Annual International Symposium.
CS 8501 Networks-on-Chip (NoCs) Lukasz Szafaryn 15 FEB 10.
Veronica Eyo Sharvari Joshi. System on chip Overview Transition from Ad hoc System On Chip design to Platform based design Partitioning the communication.
Department of Computer Science A Scalable, Commodity Data Center Network Architecture Mohammad Al-Fares Alexander Loukissas Amin Vahdat SIGCOMM’08 Reporter:
Non-Minimal Routing Strategy for Application-Specific Networks-on-Chips Hiroki Matsutani Michihiro Koibuchi Yutaka Yamada Jouraku Akiya Hideharu Amano.
Test Architecture Design and Optimization for Three- Dimensional SoCs Li Jiang, Lin Huang and Qiang Xu CUhk Reliable Computing Laboratry Department of.
MadCache: A PC-aware Cache Insertion Policy Andrew Nere, Mitch Hayenga, and Mikko Lipasti PHARM Research Group University of Wisconsin – Madison June 20,
Replicating Memory Behavior for Performance Skeletons Aditya Toomula PC-Doctor Inc. Reno, NV Jaspal Subhlok University of Houston Houston, TX By.
McGraw-Hill©The McGraw-Hill Companies, Inc., 2004 Connecting Devices CORPORATE INSTITUTE OF SCIENCE & TECHNOLOGY, BHOPAL Department of Electronics and.
Quadrisection-Based Task Mapping on Many-Core Processors for Energy-Efficient On-Chip Communication Nithin Michael, Yao Wang, G. Edward Suh and Ao Tang.
Department of Computer Science MapReduce for the Cell B. E. Architecture Marc de Kruijf University of Wisconsin−Madison Advised by Professor Sankaralingam.
Run-time Adaptive on-chip Communication Scheme 林孟諭 Dept. of Electrical Engineering National Cheng Kung University Tainan, Taiwan, R.O.C.
Networks-on-Chip (NoC) Suleyman TOSUN Computer Engineering Deptartment Hacettepe University, Turkey.
Yu Cai Ken Mai Onur Mutlu
Parallel And Distributed System Lab (PADS) National Tsing Hua University, Hsinchu, Taiwan Attackboard: A Novel Dependency-Aware Traffic Generator for Exploring.
11 Online Computing and Predicting Architectural Vulnerability Factor of Microprocessor Structures Songjun Pan Yu Hu Xiaowei Li {pansongjun, huyu,
An Efficient Gigabit Ethernet Switch Model for Large-Scale Simulation Dong (Kevin) Jin.
BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs Socrates Demetriades and Sangyeun Cho Computer Frontiers.
GreenCloud: A Packet-level Simulator of Energy-aware Cloud Computing Data Centers Dzmitry Kliazovich ERCIM Fellow University of Luxembourg Apr 16, 2010.
Team LDPC, SoC Lab. Graduate Institute of CSIE, NTU Implementing LDPC Decoding on Network-On-Chip T. Theocharides, G. Link, N. Vijaykrishnan, M. J. Irwin.
An Efficient Gigabit Ethernet Switch Model for Large-Scale Simulation Dong (Kevin) Jin.
1 CMP-MSI.07 CARES/SNU A Reusability-Aware Cache Memory Sharing Technique for High Performance CMPs with Private Caches Sungjune Youn, Hyunhee Kim and.
A Low-Area Interconnect Architecture for Chip Multiprocessors Zhiyi Yu and Bevan Baas VLSI Computation Lab ECE Department, UC Davis.
HAT: Heterogeneous Adaptive Throttling for On-Chip Networks Kevin Kai-Wei Chang Rachata Ausavarungnirun Chris Fallin Onur Mutlu.
Dept. of Electronics Engineering & Institute of Electronics National Chiao Tung University Hsinchu, Taiwan ISPD’16 Generating Routing-Driven Power Distribution.
Performed by: Erez Davidi / Aviad Zrihen Instructor: Yaniv Ben-Yitzhak המעבדה למערכות ספרתיות מהירות High speed digital systems laboratory הטכניון - מכון.
NoCVision: A Network-on-Chip Dynamic Visualization Solution
Runtime Reconfigurable Network-on- chips for FPGA-based systems Mugdha Puranik Department of Electrical and Computer Engineering
SketchVisor: Robust Network Measurement for Software Packet Processing
Yu-Guang Chen1,2, Wan-Yu Wen1, Tao Wang2,
Providing QoS through Active Domain Management
Christophe Dubach, Timothy M. Jones and Michael F.P. O’Boyle
Natalie Enright Jerger, Li Shiuan Peh, and Mikko Lipasti
Lu Tang , Qun Huang, Patrick P. C. Lee
A Novel Cache-Utilization Based Dynamic Voltage Frequency Scaling (DVFS) Mechanism for Reliability Enhancements *Yen-Hao Chen, *Yi-Lun Tang, **Yi-Yu Liu,
Communication Driven Remapping of Processing Element (PE) in Fault-tolerant NoC-based MPSoCs Chia-Ling Chen, Yen-Hao Chen and TingTing Hwang Department.
Address-Stride Assisted Approximate Load Value Prediction in GPUs
Presentation transcript:

NTPT: On the End-to-End Traffic Prediction in the On-Chip Networks Yoshi Shih-Chieh Huang 1, June 16, Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan 2 SoC Technology Center, Industrial Technology Research Institute, Hsinchu, Taiwan 1 Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan 2 SoC Technology Center, Industrial Technology Research Institute, Hsinchu, Taiwan Kaven Chun-Kai Chou 1 Chung-Ta King 1 Shau-Yin Tseng 2 Kaven Chun-Kai Chou 1 Chung-Ta King 1 Shau-Yin Tseng 2

Outline Motivation An observation from application’s traffic trace Proposed Design Table-driven solution Evaluations On Tilera’s TILE64 Conclusion and Future Works 2

Motivation Consider the LU decomposition in SPLASH-2 running on a many-core machine Observe the communication trace between any pair of the nodes End-to-end traffic / pair-wise communication

Key Observation Consider the end-to-end traffic from node 7 to node 4: Time (1K cycles) Payload (8 bytes) Node 7 to node 4 Regular end-to-end traffic pattern! 4

Key Questions Can the end-to-end traffic pattern be recognized at runtime? If so, can the end-to-end traffic pattern be predicted at runtime? What if we can? Controlling the injection rate for end-to-end flow control Performing dynamic routing based on workload Remapping tasks Controlling the power mode of switches and links … 5

Key Idea We believe that application-driven prediction is the best way to predict the traffic patterns in the on-chip network In contrast to switch layer prediction Prediction-based flow control for network-on-chip traffic, DAC Domain: chip-multiprocessors with on-chip networks Proposal: A hardware design for end-to-end traffic recognition and prediction

Overview of the Design Processing Element 1 On-Chip Network NIC Predictor 1 Processing Element k NIC Predictor k … 7

Traffic Pattern Recognition Monitor outgoing traffic to each destination node time payload Sampling period time payload

Traffic Prediction Last value predictor Use the traffic of current sampling period to predict the traffic of the next period Easy, small, but low accuracy time payload miss hit miss 9 2/4 = 50% accuracy

Traffic Prediction (cont’d) Intuition: more accurate prediction by tracking longer traffic history Pattern-oriented predictor Traffic pattern register Indexing into a table for prediction of traffic in the next sampling period 0333 indexPrediction 0nil …… nil …… 5 (History length=4) 10

Last Value Predictor Versus Pattern-Oriented Predictor Last Value PredictorPattern-Oriented Predictor AccuracyXO Space Overhead OX 11

Network Traffic Prediction Table (NTPT) NTPT-based predictor = last value predictor (LVP) + simplified pattern-oriented predictor (SPOP) + selector The appropriate predictor is chosen by the selector Simplified POP = many space reduction techniques Similar to branch prediction in computer architecture LVP SPOP Selector 12 POP

Evaluation Setup Emulate the NTPT-based predictor on the Tilera TILE64 LU decomposition in SPLASH-2 ported to TILE64 16 nodes for executing application 16 nodes for emulating NTPT-based predictors The following two figures show that NTPT-based predictor has a good tradeoff between The accuracy of prediction The space overhead 13

Prediction Error Rate Sampling Period (Million Cycles) Last Value Predictor Pattern-Oriented Predictor 14

Prediction Table Size Required memory size (in terms of entry) History length 15

Estimated Area Overhead 0.017% in TILE % in AsAP 16

Conclusion and Future Works A hardware design for recognizing and predicting end-to- end traffic on chip-multiprocessors using on-chip interconnection networks Traffic patterns recognition at runtime Traffic behavior prediction at runtime NTPT has good prediction accuracy with low hardware overhead Ongoing works Injection rate control for congestion avoidance in NoC Dynamic voltage/frequency scaling for routers and links Previous works use switch-layer perspective Buffer/link utilization in router 17

Thank you! 18

BACKUP SLIDES

More Techniques for Table Reduction Merging the tables corresponding to different destinations Using the least recently used (LRU) replacement strategy Quantizing the values in the tables … LVP SPOP Selector 20

Design of Selector LVP POP 21

Selected Related Works Switch-layer prediction Prediction-based flow control for network-on-chip traffic, DAC 2006 Compile-time prediction Reducing NoC energy consumption through compiler- directed channel voltage scaling, PLDI