An Offline Approach for Whole-Program Paths Analysis using Suffix Arrays G. Pokam, F. Bodin.

Introduction
Many optimizations and program analyses rely on control-flow information
  The more accurate, the better
Trace analysis to identify hot sub-paths
  Similar to finding patterns in strings
  Or searching for DNA sequences
Adaptive cache reconfiguration
  Identification of the configuration change
  Reducing energy consumption

Whole Program Path Analysis
Embedded systems
  No mechanism for run-time monitoring
Offline analysis
  Interprocedural analysis
  Phase detection
Consider a signature for each basic block
  Cache misses, ILP, static cycles (VLIW), …
  What happens between hot sub-paths
  How the hot sub-paths are interleaved

Example

Main steps
Reduce the potential size of the trace
  Keep only representative basic blocks
Instrument the code to get the basic block (BB) signature
Run the program
Compute the hot sub-paths
  Find the repeating patterns
Exploit the sub-path information
  Insertion of cache configuration instructions

Reducing the size of the trace
Keep only a subset of the basic blocks
  Use strong regions [Ball93]
  Don't keep iterations of simple loops
  Keep only control condition basic blocks (lossy)
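
The reduction step can be pictured with a small sketch. It is an illustration only, not the authors' tool: the function name `reduce_trace` and the `keep` set are hypothetical, and the strong-region selection of [Ball93] is not implemented here (the set of representative blocks is assumed to be given). A "simple loop" is approximated as the same basic block ID appearing back to back in the raw trace.

```python
def reduce_trace(trace, keep):
    """Return a smaller trace: drop blocks outside `keep` and collapse
    back-to-back repeats of the same block (a single-block loop)."""
    reduced = []
    prev = None  # previous block in the RAW trace, kept or not
    for bb in trace:
        # A block equal to its raw predecessor is a further loop iteration.
        if bb != prev and bb in keep:
            reduced.append(bb)
        prev = bb
    return reduced


if __name__ == "__main__":
    raw = [1, 2, 2, 2, 3, 4, 2, 2, 5]
    print(reduce_trace(raw, keep={1, 2, 5}))
```

Repeats are detected on the raw trace rather than on the filtered one, so two visits to block 2 separated by dropped blocks remain two entries. Both steps lose information, which matches the "lossy" remark on the slide.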

Suffix Arrays
Karp, Miller and Rosenberg (KMR) algorithm
Complexity is low
  log(N) iterations, where N is the length of the trace
  Memory space used is linear in the size of the trace
Can be used to
  Find the longest repeated sub-path
  Find the n-length repeated sub-path of BBWS
  Determine the frequency of each sub-path
  Identify the position of each instance of a sub-path
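
The queries listed on this slide can be sketched on a toy trace of basic block IDs. The function names are hypothetical and the construction below is a naive sort of all suffixes, chosen for readability; it is not the KMR construction the slide refers to and would not scale to large traces.

```python
def suffix_array(trace):
    """Start positions of all suffixes, sorted by suffix (naive construction)."""
    return sorted(range(len(trace)), key=lambda i: trace[i:])


def common_prefix_len(a, b):
    """Number of leading elements shared by sequences a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_repeated_subpath(trace, sa):
    """Longest sub-path occurring at least twice: the best common prefix
    between suffixes that are neighbours in the suffix array."""
    best_len, best_pos = 0, 0
    for i, j in zip(sa, sa[1:]):
        length = common_prefix_len(trace[i:], trace[j:])
        if length > best_len:
            best_len, best_pos = length, i
    return trace[best_pos:best_pos + best_len]


def occurrences(trace, sa, path):
    """Sorted start positions of `path`; their count is its frequency.
    Suffixes starting with `path` form one contiguous range of `sa`."""
    k = len(path)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose k-prefix is >= path
        mid = (lo + hi) // 2
        if trace[sa[mid]:sa[mid] + k] < path:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose k-prefix is > path
        mid = (lo + hi) // 2
        if trace[sa[mid]:sa[mid] + k] <= path:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


if __name__ == "__main__":
    trace = [1, 2, 3, 1, 2, 3, 4, 1, 2]
    sa = suffix_array(trace)
    print(longest_repeated_subpath(trace, sa))
    print(occurrences(trace, sa, [1, 2]))
```

Because every occurrence of a sub-path is a prefix of some suffix, all its instances sit next to each other in the sorted order, which is what makes the frequency and position queries two binary searches.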

Suffix Arrays

KMR Algorithm
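
The doubling idea behind KMR can be sketched as follows; the slide's figure is not available in this transcript, so this is a generic illustration rather than the authors' implementation. Each round ranks every position by its next 2k symbols, using the ranks of two adjacent k-symbol windows, so about log(N) rounds suffice. The name `kmr_suffix_array` is hypothetical, and the sketch uses Python's comparison sort in each round where a tuned version would typically use a radix sort.

```python
def kmr_suffix_array(trace):
    """Suffix array built by rank doubling (KMR-style sketch)."""
    n = len(trace)
    # Round 0: rank each position by its single symbol.
    order = {sym: r for r, sym in enumerate(sorted(set(trace)))}
    rank = [order[sym] for sym in trace]
    k = 1
    # Stop once every suffix has a distinct rank.
    while len(set(rank)) < n:
        # Name each position by (rank of first k symbols, rank of next k);
        # -1 stands for "past the end" and sorts before any real rank.
        key = [(rank[i], rank[i + k] if i + k < n else -1) for i in range(n)]
        order = {pair: r for r, pair in enumerate(sorted(set(key)))}
        rank = [order[pair] for pair in key]
        k *= 2
    # Ranks are now a permutation of 0..n-1; invert it.
    sa = [0] * n
    for pos, r in enumerate(rank):
        sa[r] = pos
    return sa


if __name__ == "__main__":
    print(kmr_suffix_array([1, 2, 3, 1, 2, 3, 4, 1, 2]))
```

The intermediate ranks are useful on their own: after the round covering length 2k, two positions share a rank exactly when the same 2k-block sub-path starts at both, which is how repeated sub-paths of a given length can be read off.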

Characterizing the Hot Sub-Paths
Three metrics
  Local Coverage: how long a sub-path lasts
  Global Coverage: how representative a sub-path is
  Reuse Distance: dispersion in the trace
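
The slide names the three metrics without giving formulas, so the sketch below is only one possible reading, and every definition in it is an assumption: local coverage is taken as the longest run of back-to-back occurrences (in basic blocks), global coverage as the fraction of the trace spanned by occurrences, and reuse distance as the mean gap between successive occurrence starts. The function name `subpath_metrics` is hypothetical.

```python
def subpath_metrics(trace_len, positions, length):
    """Assumed metric definitions for one sub-path.

    trace_len: number of basic blocks in the trace
    positions: sorted, non-empty start indexes of the sub-path
    length:    size of the sub-path in basic blocks
    """
    if not positions:
        raise ValueError("sub-path must occur at least once")

    # Local coverage (assumed): longest run of contiguous occurrences.
    run = best = 1
    for a, b in zip(positions, positions[1:]):
        run = run + 1 if b - a == length else 1
        best = max(best, run)

    # Global coverage (assumed): share of the trace covered by
    # occurrences, counting overlapping blocks once.
    covered, end = 0, 0
    for p in positions:
        start, stop = max(p, end), p + length
        if stop > start:
            covered += stop - start
            end = stop

    # Reuse distance (assumed): mean gap between successive starts.
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return {
        "local_coverage": best * length,
        "global_coverage": covered / trace_len,
        "reuse_distance": sum(gaps) / len(gaps) if gaps else 0.0,
    }


if __name__ == "__main__":
    print(subpath_metrics(trace_len=9, positions=[0, 3], length=3))
```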

Experiments
Offline analysis takes from a few minutes (40 MB trace) to hours (GB-sized trace)

Experiments (cont.)
Trace Compression

Experiments (cont.)
Coverage: Adaptive Cache Reconfiguration
  The basic block signature is a set of data misses

Conclusion
Suffix arrays are an efficient tool for dealing with traces
  Accurate description of the sub-path sequences
  But the CFG has to be simplified
Has been used to dynamically adapt the cache configuration to reduce energy consumption

Future Work
Convert hot sub-paths into speculative threads
System on chip
  Identification of computation to migrate to co-processors
More trace compression techniques
  Abstraction of the control flow