University of California San Diego Locality Phase Prediction Xipeng Shen, Yutao Zhong, Chen Ding Computer Science Department, University of Rochester Class.

Presentation transcript:

University of California San Diego Locality Phase Prediction Xipeng Shen, Yutao Zhong, Chen Ding Computer Science Department, University of Rochester Class Discussion prepared by Bumyong Choi

Memory Adaptation: Programs exhibit dynamic locality. Several studies have been done, but they require manual analysis to find program phases. Locality-based phase prediction can solve the problem.

Previous Analysis. Interval-based: it is unclear what the best interval length is. Code-based: the program structure may not reveal its locality pattern, for example with inlined functions or intertwined function calls.

The New Technique. Locality analysis: no fixed-size windows. Phase marking: all instructions in the program binary.

Locality Phase: a period of a program execution that has stable or slowly changing data locality. For optimization purposes, we are interested in phases that are repeatedly executed with similar locality. Phase prediction: knowing a phase and its locality whenever the execution enters the phase.

Examples of Recurring Locality Phases: the aging of an airplane model; structural, mechanical, and molecular simulations; other scientific and commercial simulations. These have a great demand for computing resources. They exhibit dynamic but stable phases and are good candidates for adaptation, if we can predict locality phases.

Program Phase. Source: Phase Tracking and Prediction (Sherwood et al.)

Downside

Motivation for the Use of Locality Analysis: Recent studies found that reuse-distance histograms change in predictable patterns in many programs. Reuse distance reveals patterns in program locality.

Reuse Distance: the number of distinct data elements accessed between two consecutive uses of the same element.

Reuse Distance Example: a b c a a c b, rd=2

Reuse Distance Example: a b c a a c b, rd=0

Reuse Distance Example: a b c a a c b, rd=1

Reuse Distance Example: a b c a a c b, rd=2

Reuse Distance Example: a b c a a c b, rd=0
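The definition of reuse distance can be checked against the example trace with a short script. This is a minimal sketch, not the measurement tool used by the authors: the function name `reuse_distances` is hypothetical, first accesses are reported as `None`, and the set-based counting is quadratic in the worst case, so it only suits short traces.

```python
from typing import Hashable, List, Optional, Sequence


def reuse_distances(trace: Sequence[Hashable]) -> List[Optional[int]]:
    """Return the reuse distance of every access; None marks a first access."""
    last_seen = {}  # element -> index of its most recent access
    distances: List[Optional[int]] = []
    for i, elem in enumerate(trace):
        if elem in last_seen:
            # Count the distinct elements touched strictly between the two uses.
            between = set(trace[last_seen[elem] + 1 : i])
            distances.append(len(between))
        else:
            distances.append(None)
        last_seen[elem] = i
    return distances


# The trace from the example slides: a b c a a c b
print(reuse_distances("abcaacb"))  # [None, None, None, 2, 0, 1, 2]
```

The four reuses in the trace give the distances 2, 0, 1, and 2, in that order.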

The reuse-distance trace of Tomcatv

What the Example Confirms: Major shifts in program locality are marked by radical changes. Locality phases have different lengths. The size changes greatly with program inputs. A phase is a unit of repeating behavior rather than a unit of uniform behavior.

New Locality Prediction Method: 1. Analyzes the data locality in profiling runs: (a) variable-distance sampling, (b) wavelet filtering, (c) optimal phase partitioning. 2. Analyzes the instruction trace and identifies the phase boundaries in the code. 3. Uses grammar compression to identify phase hierarchies and then inserts program markers through binary rewriting.

Off-line Analysis: variable-distance sampling, wavelet filtering, optimal phase partitioning.

Variable-distance Sampling: 1. Samples a small number of representative data elements. 2. Records only long-distance reuses. 3. Uses dynamic feedback to find suitable thresholds.

Wavelet Filtering: Used as a filter to expose abrupt changes in the reuse pattern; removes temporal redundancy. A common technique in signal and image processing. Shows the change of frequency over time. Further reading on wavelets: I. Daubechies, Ten Lectures on Wavelets, Capital City Press, Montpelier, Vermont, 1992.

Wavelet Filtering: The wavelet filtering removes reuses of the same data within a phase.
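To illustrate how a wavelet can expose abrupt changes, here is a minimal sketch that uses overlapping level-1 Haar detail coefficients. The Haar wavelet is swapped in for simplicity, since the slides do not say which wavelet family the authors use. The function names, the input signal (reuse distances of one data sample over time), and the cutoff of the mean plus `k` standard deviations are all assumptions made for the example.

```python
import math
from typing import List, Sequence


def haar_details(signal: Sequence[float]) -> List[float]:
    """Level-1 Haar detail coefficients over overlapping neighbor pairs."""
    return [(signal[i + 1] - signal[i]) / math.sqrt(2)
            for i in range(len(signal) - 1)]


def abrupt_changes(signal: Sequence[float], k: float = 3.0) -> List[int]:
    """Indices where the signal jumps by more than mean + k * std of all jumps."""
    mags = [abs(d) for d in haar_details(signal)]
    if not mags:
        return []
    mean = sum(mags) / len(mags)
    std = math.sqrt(sum((m - mean) ** 2 for m in mags) / len(mags))
    # Index i + 1 is the first access after the jump between i and i + 1.
    return [i + 1 for i, m in enumerate(mags) if m > mean + k * std]


# Hypothetical reuse distances: stable at 5, then an abrupt shift to 500.
print(abrupt_changes([5] * 20 + [500] * 20))  # [20]
```

Accesses inside a stable stretch produce near-zero coefficients and are dropped, which is the sense in which the filter removes reuses of the same data within a phase.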

Optimal Phase Partitioning: Removes the spatial redundancy. Conditions for a good phase partition: a phase should include accesses to as many data samples as possible, and a phase should not include multiple accesses of the same data sample.

Optimal Phase Partitioning: The filtered trace is converted into a directed acyclic graph. Each edge has a weight. More details are in the paper.
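One way to read the graph formulation is as a shortest-path problem: every position in the filtered trace is a node, an edge from i to j stands for one phase covering accesses i through j-1, and the cheapest path from start to end gives the partition. The sketch below follows that reading. The edge weight used here, 1 for opening a phase plus `alpha` times the number of repeated data samples inside it, is an assumption chosen to reflect the two conditions for a good partition; the slides only say that each edge has a weight. The name `partition_phases` is hypothetical.

```python
from typing import List, Sequence, Tuple


def partition_phases(trace: Sequence[str], alpha: float = 1.0) -> Tuple[float, List[int]]:
    """Return (total cost, start index of each phase) for the cheapest partition."""
    n = len(trace)
    best = [0.0] + [float("inf")] * n  # best[j]: cheapest cost to cover trace[:j]
    prev = [0] * (n + 1)               # prev[j]: start of the last phase ending at j
    for i in range(n):
        seen = set()
        repeats = 0
        for j in range(i + 1, n + 1):
            # Extend the candidate phase trace[i:j] by one access.
            sample = trace[j - 1]
            if sample in seen:
                repeats += 1
            seen.add(sample)
            cost = best[i] + 1 + alpha * repeats  # assumed edge weight
            if cost < best[j]:
                best[j] = cost
                prev[j] = i
    # Walk the predecessor links back from the end to recover phase starts.
    starts = []
    j = n
    while j > 0:
        starts.append(prev[j])
        j = prev[j]
    return best[n], starts[::-1]


# Hypothetical filtered trace of data samples a, b, c repeating three times.
print(partition_phases("abcabcabc"))  # (3.0, [0, 3, 6])
```

Because edges only point forward, nodes can be processed in index order and no general shortest-path algorithm is needed.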

New Prediction Method: 1. Analyzes the data locality in profiling runs: (a) variable-distance sampling, (b) wavelet filtering, (c) optimal phase partitioning. 2. Analyzes the instruction trace and identifies the phase boundaries in the code. 3. Uses grammar compression to identify phase hierarchies and then inserts program markers through binary rewriting.

Phase Marker Selection: This step finds the basic blocks in the code that uniquely mark detected phases. It examines all instruction blocks, since the high-level program structure may be lost due to compiler optimizations.

Phase Marker Selection: Phase detection finds the number of phases but cannot locate the precise time of phase transitions. Detection works at a granularity of hundreds of memory accesses, versus a few memory references in a basic block. What about gradual transitions?

Phase Marker Selection, solution: Use the frequency of the phases instead of the time of their transitions. Marker block: a basic block that is always executed at the beginning of a phase, chosen based on the frequency found. If a blank region (removed blocks) is larger than a threshold, it is considered a phase execution.
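The frequency-based idea can be sketched as a filter over a basic-block trace: drop every block that executes more often than the phases do, then treat each long blank region as one phase execution and the surviving block just before it as a marker candidate. This is one reading of the slide under stated assumptions, not the authors' exact procedure. The names `find_phase_executions`, `max_count`, and `threshold`, and the block labels in the example, are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def find_phase_executions(block_trace: Sequence[str], max_count: int,
                          threshold: int) -> List[Tuple[str, int]]:
    """Return (candidate marker block, blank-region length) for each long gap."""
    counts = Counter(block_trace)
    # Keep only positions of blocks that run no more often than the phases.
    kept = [i for i, block in enumerate(block_trace) if counts[block] <= max_count]
    found = []
    # Append the trace length as a sentinel so the trailing gap is measured too.
    for start, end in zip(kept, kept[1:] + [len(block_trace)]):
        gap = end - start - 1  # number of removed blocks in between
        if gap > threshold:
            found.append((block_trace[start], gap))
    return found


# Hypothetical trace: M1 and M2 each run twice, loop bodies x and y run often.
trace = (["M1"] + ["x"] * 5 + ["M2"] + ["y"] * 4) * 2
print(find_phase_executions(trace, max_count=2, threshold=3))
# [('M1', 5), ('M2', 4), ('M1', 5), ('M2', 4)]
```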

New Prediction Method: 1. Analyzes the data locality in profiling runs: (a) variable-distance sampling, (b) wavelet filtering, (c) optimal phase partitioning. 2. Analyzes the instruction trace and identifies the phase boundaries in the code. 3. Uses grammar compression to identify phase hierarchies and then inserts program markers through binary rewriting.

Hierarchical Construction: SEQUITUR compresses a string of symbols into a context-free grammar. By constructing the phase hierarchy, we find phases of the largest granularity.
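To show how grammar compression turns a flat phase sequence into a hierarchy, the sketch below uses a simplified offline scheme: it repeatedly replaces the most frequent adjacent pair of symbols with a new rule. This is not SEQUITUR itself, which builds its grammar incrementally while enforcing digram uniqueness and rule utility; it is a stand-in in the same spirit. The rule names `R0`, `R1`, and so on are hypothetical and are assumed not to collide with symbols in the input.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def compress(sequence: Sequence[str]) -> Tuple[List[str], Dict[str, Tuple[str, str]]]:
    """Replace repeated adjacent pairs with rules until no pair repeats."""
    seq = list(sequence)
    rules: Dict[str, Tuple[str, str]] = {}
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break
        name = f"R{len(rules)}"
        rules[name] = pair
        out, i = [], 0
        while i < len(seq):
            # Scan left to right, replacing non-overlapping occurrences.
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbols: Sequence[str], rules: Dict[str, Tuple[str, str]]) -> List[str]:
    """Recursively expand rule names back into the original symbols."""
    out: List[str] = []
    for s in symbols:
        out.extend(expand(rules[s], rules) if s in rules else [s])
    return out


# Hypothetical sequence of basic phases A, B, C.
top, rules = compress("ABABCABABC")
print(top, rules)
print("".join(expand(top, rules)))  # ABABCABABC
```

The rules nearest the top of the resulting grammar correspond to the largest repeating units, which is the role the phase hierarchy plays in the slides.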

Phase Marker Insertion: ATOM, a binary rewriting tool. The basic phases (the leaves of the phase hierarchy) have unique markers in the program, so their prediction is trivial. Predictions are made based on the phase hierarchy, using a finite automaton to recognize the current phase in the hierarchy.

Evaluation: 1. Measure the granularity and accuracy of phase prediction. 2. Cache resizing. 3. Memory remapping. 4. Test the results against manual phase marking.

Phase Prediction

Phase Prediction

Adaptive Cache Resizing

Memory Remapping: Assumes the support of the Impulse controller. Key requirement: identify when remapping is profitable.

Manual vs Phase

Conclusions: A general method for predicting hierarchical memory phases in programs with input-dependent but consistent phase behavior. It predicts the length and locality with near-perfect accuracy. It reduces cache size by 40% without increasing the number of cache misses. It improves program performance by 35% when used for memory remapping.

Conclusion (cont.): Locality phase detection should benefit modern adaptation techniques for increasing performance, reducing energy, and other improvements.

Questions?