An Efficient Profile-Analysis Framework for Data-Layout Optimizations By Shai Rubin, Rastislav Bodik, Trishul Chilimbi.

Data-Layout Optimization
Improve memory-hierarchy performance by exploiting spatial locality
 – Group data that is accessed together
 – Minimize cache conflicts
Optimizations
 – Field rearrangement within an object
 – Object rearrangement in the heap
 – Object inlining
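To make field rearrangement concrete, here is a minimal sketch that counts how many cache blocks the frequently used fields of one object span under two field orders. The field names, sizes, the 16-byte block size, and the assumption that the object starts on a block boundary are all hypothetical; the slides do not give them.

```python
BLOCK_SIZE = 16  # bytes per cache block (assumed for illustration)

def field_offsets(order, sizes):
    """Lay fields out back to back in the given order; return name -> byte offset."""
    offsets, pos = {}, 0
    for name in order:
        offsets[name] = pos
        pos += sizes[name]
    return offsets

def blocks_touched(order, sizes, accessed):
    """Count distinct cache blocks covered by the accessed fields of one object.

    Assumes the object itself starts on a block boundary.
    """
    offsets = field_offsets(order, sizes)
    blocks = set()
    for name in accessed:
        first = offsets[name] // BLOCK_SIZE
        last = (offsets[name] + sizes[name] - 1) // BLOCK_SIZE
        blocks.update(range(first, last + 1))
    return len(blocks)

# Hypothetical list node: 'key' and 'next' are used together during traversal.
sizes = {"key": 8, "payload": 24, "next": 8}
hot = ["key", "next"]

original = ["key", "payload", "next"]    # hot fields separated by a cold field
rearranged = ["key", "next", "payload"]  # hot fields grouped together

print(blocks_touched(original, sizes, hot))
print(blocks_touched(rearranged, sizes, hot))
```

Grouping the two hot fields lets a traversal touch fewer blocks per node, which is the spatial-locality effect the slide refers to.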

Motivation
Finding the optimal layout is NP-hard and also poorly approximable
All of these techniques are in fact heuristics, with no guarantee of effectiveness or robustness
A naïve approach is tedious

Solution
A generalized framework to determine the best layout for a given program
 – Unifies existing profile-based data-layout optimizations
A very efficient way of evaluating a candidate layout by simulation
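The idea of evaluating candidate layouts by simulation can be sketched as follows. This is an illustration under stated assumptions, not the framework's actual algorithm: each object occupies exactly one cache block, the cache is direct-mapped, and the search is a brute-force enumeration of all placements (a stand-in that is only feasible for tiny inputs). The function names `count_misses` and `best_layout` are hypothetical.

```python
from itertools import permutations

def count_misses(trace, layout, num_sets):
    """Simulate a direct-mapped cache on a trace of object references.

    layout maps object name -> block address; one object per block (assumed).
    """
    cache = [None] * num_sets  # block currently held by each cache set
    misses = 0
    for obj in trace:
        block = layout[obj]
        index = block % num_sets
        if cache[index] != block:
            misses += 1
            cache[index] = block
    return misses

def best_layout(trace, objects, num_sets):
    """Brute-force search: try every assignment of objects to blocks 0..n-1."""
    best = None
    for perm in permutations(range(len(objects))):
        layout = dict(zip(objects, perm))
        misses = count_misses(trace, layout, num_sets)
        if best is None or misses < best[0]:
            best = (misses, layout)
    return best

trace = list("abababcc")  # hypothetical profile: a and b alternate, then c
objects = ["a", "b", "c"]

# Placing a and b in blocks that map to the same set makes them evict each other.
conflicting = {"a": 0, "b": 2, "c": 1}
print(count_misses(trace, conflicting, num_sets=2))

misses, layout = best_layout(trace, objects, num_sets=2)
print(misses, layout)
```

The number of placements grows factorially with the number of objects, which is consistent with the motivation slide's point that exhaustive approaches do not scale and that cheap evaluation of each candidate matters.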

Optimization Evaluation
Memoization
 – Dynamic programming
Whole Program Misses
 – The representation omits references that can never suffer any memory fault
 – Example: abba
On-demand cache simulation
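One plausible reading of the "abba" example is that the second b immediately follows a reference to the same object, so it hits under any layout and can be dropped from the trace without changing the miss count. The sketch below illustrates that reading together with a very simple form of memoization (caching the result per layout). Both are assumptions made for illustration: the slides do not describe the actual trace representation or the dynamic-programming scheme, and the names `compact`, `count_misses`, and `MemoizedEvaluator` are hypothetical.

```python
def compact(trace):
    """Drop references that immediately repeat the previous reference."""
    out = []
    for ref in trace:
        if not out or out[-1] != ref:
            out.append(ref)
    return out

def count_misses(trace, layout, num_sets):
    """Direct-mapped cache simulation; one object per block (assumed)."""
    cache = [None] * num_sets
    misses = 0
    for obj in trace:
        block = layout[obj]
        index = block % num_sets
        if cache[index] != block:
            misses += 1
            cache[index] = block
    return misses

class MemoizedEvaluator:
    """Simulate each distinct layout at most once, on the compacted trace."""

    def __init__(self, trace, num_sets):
        self.trace = compact(trace)
        self.num_sets = num_sets
        self.memo = {}        # layout key -> miss count
        self.simulations = 0  # number of simulations actually run

    def misses(self, layout):
        key = tuple(sorted(layout.items()))
        if key not in self.memo:
            self.simulations += 1
            self.memo[key] = count_misses(self.trace, layout, self.num_sets)
        return self.memo[key]

print(compact("abba"))

evaluator = MemoizedEvaluator("abba", num_sets=2)
separate = {"a": 0, "b": 1}     # a and b map to different sets
conflicting = {"a": 0, "b": 2}  # a and b map to the same set
print(evaluator.misses(separate), evaluator.misses(conflicting))
print(evaluator.misses(separate), evaluator.simulations)
```

In this sketch the compacted trace yields the same miss counts as the full trace for both layouts, and repeated queries for a layout do not trigger a new simulation.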

Results

Discussion
How many applications really use this kind of optimization?
Some of the optimizations are already automated (e.g., object inlining)