Parallel And Distributed System Lab (PADS) National Tsing Hua University, Hsinchu, Taiwan Attackboard: A Novel Dependency-Aware Traffic Generator for Exploring.

Presentation transcript:

Attackboard: A Novel Dependency-Aware Traffic Generator for Exploring NoC Design Space
Yoshi Shih-Chieh Huang, June 5, 2012
Yu-Chi Chang, Tsung-Chan Tsai, Yuan-Ying Chang, Chung-Ta King
Parallel And Distributed System Lab (PADS), National Tsing Hua University, Hsinchu, Taiwan

Problem Domain
We want to use trace-driven simulation to explore network-on-chip (NoC) performance for message-passing many-core architectures.
Traces that record only send/receive events:
- Simple and fast
- Lack information about the interaction between the NoC and the PEs
- Unable to reflect the effects of changes in the design space
- Storage space overhead (on the order of gigabytes)
Recent dependency-aware traces [1][2]:
- Packet dependencies are embedded in the traces
- Packet injections can be adjusted based on the dependencies under different NoC configurations
- Trace logs are very complicated and require much more space
Traces with packet dependencies improve accuracy but require more storage space!

The BIG Problem Is: Size!
How can we reduce the size of dependency-aware traces while maintaining their accuracy?
- Lossless compression, e.g., Gzip? Not enough.
Key insight:
- Each PE has its own BIG trace for NoC operations
- Each BIG trace is actually a log of the execution of the corresponding state machine
- 10 KB of code may result in 1 GB of traces!
[Figure: one state machine per PE, each with states S1, S2, ..., Sn]

Rebuild the State Machines
What if we could rebuild the interacting state machines from the traces?
- We would not have to record the huge traces; we would only need to generate them at runtime!
How?
- Intuitive idea: find repetitive patterns in the traces and fold them
- Difficulties:
  - Patterns may not match exactly
  - Patterns may be discrete and fragmented in the traces
  - Patterns must be found across the traces produced by different PEs

Rebuild the State Machines (cont'd)
State transitions in state machines are often triggered by the arrival of packets.
- Leverage the packet dependency information in the traces
- Forget about time sequencing in the traces
How long should we wait before starting to detect a receiving pattern? Use an interval-based approach:
- Interval I for capturing receiving patterns: the captured results are the attackboards
- Interval I' for reproducing traffic: used for driving the attackboards
[Figure: PE j with capture interval I]

Data Structure of an Attackboard
Each PE has an attackboard.
An injection to Y is allowed if the necessary conditions are satisfied.
- Necessary conditions = predecessors = the packets that must be received from X
[Table: one row per injection. The columns "Requires pkt from 1?", "Requires pkt from 2?", "Requires pkt from 3?", ..., "Requires pkt from X?" hold the Yes/No necessary conditions. The last column, "Inject to", holds the destination Y, which is injected to if the conditions are satisfied.]
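A minimal sketch of one attackboard row, assuming the necessary conditions are stored as the set of source PEs and rendered as a bit string such as 1100 (leftmost bit = PE 1). The class name `Entry`, its field names, and the flit count of 4 are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One hypothetical attackboard row for a single PE."""
    required: frozenset  # source PEs whose packets must have arrived
    dest: int            # destination PE of the allowed injection
    flits: int           # packet length in flits (illustrative value)

    def satisfied_by(self, received):
        # The injection is allowed only if every required source was seen.
        return self.required <= set(received)

    def bits(self, num_pes):
        # Render the conditions as a bit string, leftmost bit = PE 1.
        return "".join(
            "1" if pe in self.required else "0"
            for pe in range(1, num_pes + 1)
        )


entry = Entry(required=frozenset({1, 2}), dest=3, flits=4)
print(entry.bits(4))               # 1100
print(entry.satisfied_by({1}))     # False
print(entry.satisfied_by({1, 2}))  # True
```

Making the row immutable and hashable is a design choice that lets duplicated rows be merged later with an ordinary set or dictionary.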

Attackboard Traffic Generator
1. Rebuild the state machines as attackboards
   - Use the packet arrival patterns in the traces to rebuild the state machine
   - Space complexity: O(execution time) -> O(# of patterns)
2. Compact the states
   - Merge duplicated patterns
3. Drive the state machines, i.e., the attackboards
   - Inject traffic based on the rebuilt machine

An Illustration (Viewpoint of PE 4)
[Figure: execution flow of a parallel program on PE 1, PE 2, PE 3 and PE 4. The trace of PE 4 is grouped by Interval I into (recv 1, recv 2, send 1), (recv 3, recv 4, send 2) and (recv 5, recv 6, recv 7, send 3).]
Attackboard entries of PE 4 (packet dependencies, injection info.), as shown in the figure:
- 1100, (3, flit counts)
- 1110, (2, flit counts)
Steps:
- Compress the entries with the same packet dependencies
- Merge duplicated entries
How long should we wait before starting to detect a receiving pattern? Interval I decides the pattern!
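The capture step illustrated above can be sketched as follows. This is only one possible reading of the slides: it assumes that the dependencies of a send are the sources of all packets received within the preceding I cycles, and that duplicated entries are merged by using them as dictionary keys. The function name `build_attackboard`, the event tuple layout, and the sample trace are hypothetical.

```python
def build_attackboard(events, interval):
    """Fold a time-ordered trace of one PE into attackboard entries.

    events: tuples (time, kind, peer, flits) with kind "recv" or "send".
    Returns a list of unique (required_sources, dest, flits) entries.
    """
    board = {}   # a dict keeps insertion order and merges duplicates
    recvs = []   # (time, source) of every packet received so far
    for time, kind, peer, flits in events:
        if kind == "recv":
            recvs.append((time, peer))
        else:
            # Assumed rule: look back `interval` cycles from the send.
            required = frozenset(
                src for t, src in recvs if time - interval <= t
            )
            board[(required, peer, flits)] = None
    return list(board)


# Hypothetical trace of PE 4: three loop iterations, two distinct patterns.
trace = [
    (0, "recv", 1, 4), (2, "recv", 2, 4), (5, "send", 3, 4),
    (100, "recv", 1, 4), (101, "recv", 2, 4), (103, "recv", 3, 4),
    (106, "send", 2, 4),
    (200, "recv", 1, 4), (203, "recv", 2, 4), (204, "send", 3, 4),
]
board = build_attackboard(trace, interval=10)
print(len(board))  # 2
```

With an interval of 10 cycles, the first and third iterations both yield the dependencies {1, 2} with an injection to PE 3, so only two entries remain. A different choice of I would group the receives differently, which is why I decides the pattern.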

Driving the Attackboards!
Traffic generation with the attackboard:
- The router receive status is cleared every interval I'
- What if there is no exact match within I'? Match the entry with the highest similarity
  - Similar to a Bloom filter (deny for sure, allow with high confidence)
[Figure: the Attackboard Traffic Generator (ATG) sits between the router and the NoC. The router receive status of the PE (current receive sources, initially 0000) is compared with the attackboard of PE 4 (entry 1100); on a match, the ATG injects traffic.]
Design details are left to the poster. You are welcome to take a look!
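A minimal sketch of the matching step, assuming entries are (required_sources, dest, flits) tuples. The slides do not define the similarity measure, so the Jaccard index used here is purely an assumption, as are the names `similarity` and `select_injection`. An empty receive status always denies injection.

```python
def similarity(required, received):
    # Assumed measure: Jaccard index between the two sets of source PEs.
    union = required | received
    return len(required & received) / len(union) if union else 0.0


def select_injection(board, received):
    """Pick the entry to inject for the receive status seen in one I' window.

    Returns None when nothing was received or no entry overlaps at all.
    An exact match has similarity 1.0 and therefore always wins.
    """
    received = frozenset(received)
    if not received or not board:
        return None
    best = max(board, key=lambda entry: similarity(entry[0], received))
    return best if similarity(best[0], received) > 0 else None


board = [
    (frozenset({1, 2}), 3, 4),     # 1100 -> inject to PE 3
    (frozenset({1, 2, 3}), 2, 4),  # 1110 -> inject to PE 2
]
print(select_injection(board, {1, 2}))  # exact match: the 1100 entry
print(select_injection(board, {2, 3}))  # closest match: the 1110 entry
print(select_injection(board, set()))   # None
```

In a full generator the receive status would be reset after every interval I' before the next lookup; that loop is omitted here.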

Space Overhead and Accuracy
Simulation platform:
- Processor element: Tilera TILE64 (S. Bell et al., ISSCC 2008)
- Processor frequency: 700 MHz
- Simulated topology: 4×4 mesh network
- Routing algorithm: Dimension-order
- Bandwidth: 1 flit/cycle per port
- Benchmarks: Intel MPI Benchmarks (IMB), Parallel Object Detection (POD)
Storage space overhead:
- Storage space can be greatly reduced!
- 20% of the storage space can be saved in the computation-intensive benchmark!
[Figure: storage space overhead for IMB-Bcast and Parallel Object Detection]
Accuracy:
- I and I' should be properly selected to give accurate results
- For IMB-Bcast, I does not have much impact compared with POD
[Figure: accuracy for IMB-Bcast and POD (2 frames)]

Conclusion & Future Work
Attackboard not only compresses the NoC traces, but also generates them at runtime.
Key ideas:
- Programs -> raw trace logs -> use arrival patterns to rebuild the state machines (as attackboards) -> rebuild the trace logs
Benefits:
- Strikes a good tradeoff between accuracy and space overhead
Limitations:
- Currently only for message-passing programs
- Only suitable for injections with strong dependencies
Future work:
- Take the computation time before an injection into consideration
- Eliminate the tricky parameters (I and I')

Thank You! To learn more, please come to my poster!

Selected References
1. Joel Hestness, Boris Grot, and Stephen W. Keckler. Netrace: dependency-driven trace-based network-on-chip simulation. In Proceedings of the Third International Workshop on Network on Chip Architectures (NoCArc '10).
2. Christopher Nitta, Matthew Farrens, Kevin Macdonald, and Venkatesh Akella. Inferring packet dependencies to improve trace based simulation of on-chip networks. In Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip (NOCS '11).

Backup: More Frames in POD
- Trace logs: O(execution time)
- Attackboard: O(# of patterns)
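A toy illustration of the two growth rates, assuming a hypothetical PE that repeats the same receive-then-send pattern in every loop iteration (or frame). The function name and the pattern are made up for this sketch; it is not the evaluation from the slides.

```python
def trace_vs_attackboard(iterations):
    # Hypothetical PE: each iteration waits for PE 1 and PE 2, then sends to PE 3.
    pattern = (frozenset({1, 2}), 3)
    trace_log = [pattern for _ in range(iterations)]  # grows with execution time
    attackboard = set(trace_log)                      # grows with distinct patterns
    return len(trace_log), len(attackboard)


for n in (10, 1000):
    print(n, trace_vs_attackboard(n))
```

The trace log grows linearly with the number of iterations, while the number of attackboard entries stays at one as long as no new pattern appears.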