Experiments on the Effectiveness of an Automatic Insertion of Memory Reuses into ML-like Programs Oukseh Lee (Hanyang University) Kwangkeun Yi (Seoul National University)

Experiments on the Effectiveness of an Automatic Insertion of Memory Reuses into ML-like Programs Oukseh Lee (Hanyang University) Kwangkeun Yi (Seoul National University)

Question
Our SAS 2003 paper* presented
- an algorithm to replace allocations by memory reuse (or destructive update); and
- some promising yet preliminary experiment numbers.
When and how much is it cost-effective?
- Space- and time-wise.
- Before launching it inside our nML compiler.
* Oukseh Lee, Hongseok Yang, and Kwangkeun Yi. Inserting Safe Memory Reuse Commands into ML-like Programs. In Proceedings of the Annual International Static Analysis Symposium, volume 2694 of Lecture Notes in Computer Science, pp , San Diego, California, June 2003.

Brief Overview of Our Algorithm

Example: insert
[Diagram: the heap cells of l and of the result of insert 5 l]

Original:
    fun insert i l =
      case l of
        [] => i::[]
      | h::t => if i<h then i::l
                else let z = insert i t
                     in h::z

With the head cell of l freed:
    fun insert i l =
      case l of
        [] => i::[]
      | h::t => if i<h then i::l
                else let z = insert i t
                     in free l; h::z

Example: insert
[Diagram: the heap cells of l and of the result of insert 5 l]

With a flag b that controls the free:
    fun insert b i l =
      case l of
        [] => i::[]
      | h::t => if i<h then i::l
                else let z = insert b i t
                     in free l when b; h::z

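Analysis
    fun insert i l =
      case l of
        [] => i::[]
      | h::t => if i<h then i::l
                else let z = insert i t
                     in h::z

Annotations recoverable from the slide (∪ is set union, ⊆ is inclusion):
- X1, X2, X3, X4: newly allocated cells; L: cells of the input list l, split into L.hd and L.tl; Z: cells of z
- i::[] yields X1; i::l yields X2 ∪ L; h::z yields X4 ∪ Z
- Z ⊆ X3 ∪ L.tl
- result: X1 ∪ X2 ∪ L ∪ X4 ∪ Z ⊆ X ∪ L, where X = X1 ∪ X2 ∪ X3 ∪ X4
- usage: L.hd ∪ L.tl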

Transformation [1/3]

Before:
    fun insert i l =
      case l of
        [] => i::[]
      | h::t => if i<h then i::l
                else let z = insert i t
                     in h::z

After adding the flag parameter b:
    fun insert b i l =
      case l of
        [] => i::[]
      | h::t => if i<h then i::l
                else let z = insert i t
                     in h::z

When b=true, the transformed insert function deallocates the cons cells of the input list l, excluding those of the result list.

Transformation [2/3]
When is it safe to free the tail cells t not in the result z (L.tl \ Z)?

    must not be freed    when         area      overlap?   necessary condition
    the input list l     b = false    L         yes        b = true
    the result list                   X4 ∪ Z    no         none

So the recursive call passes b:
    fun insert b i l =
      case l of
        [] => i::[]
      | h::t => if i<h then i::l
                else let z = insert b i t
                     in h::z

Transformation [3/3]
When is it safe to free the head cell (L.hd)?

    must not be freed                          when         area        overlap?   necessary condition
    the input list l                           b = false    L           yes        b = true
    the cons cells freed during insert b i t   b = true     L.tl \ Z    no         none
    the result list                                         X4 ∪ Z      no         none

So the free is guarded by b:
    fun insert b i l =
      case l of
        [] => i::[]
      | h::t => if i<h then i::l
                else let z = insert b i t
                     in free l when b; h::z
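The flag-guarded free can be tried out in OCaml with an explicit heap model. This is only a sketch under stated assumptions: OCaml has no free, so cons cells live in a hash table keyed by integer addresses, and the names heap, cons, free, of_list and to_list are hypothetical helpers that are not part of the slides or of nML. As on the slide, the cell is released and a fresh one is allocated for the result.

```ocaml
(* Explicit heap model (an assumption of this sketch): an address is an
   int, 0 stands for nil, and live cons cells are entries of [heap]. *)
type addr = int
let nil : addr = 0

let heap : (addr, int * addr) Hashtbl.t = Hashtbl.create 16
let next = ref 1      (* next unused address *)
let allocs = ref 0    (* number of cells allocated so far *)
let frees = ref 0     (* number of cells freed so far *)

(* Allocate a fresh cons cell holding head h and tail address t. *)
let cons h t =
  let a = !next in
  incr next;
  incr allocs;
  Hashtbl.replace heap a (h, t);
  a

(* Release the cell at address a; later reads of a raise Not_found. *)
let free a =
  Hashtbl.remove heap a;
  incr frees

(* The transformed insert from the slide: the head cell of l is freed
   only when the flag b is true. *)
let rec insert b i l =
  if l = nil then cons i nil
  else
    let (h, t) = Hashtbl.find heap l in
    if i < h then cons i l
    else begin
      let z = insert b i t in
      if b then free l;
      cons h z
    end

(* Conversions between OCaml lists and the modelled heap. *)
let rec of_list = function
  | [] -> nil
  | x :: xs -> cons x (of_list xs)

let rec to_list a =
  if a = nil then []
  else
    let (h, t) = Hashtbl.find heap a in
    h :: to_list t

(* Demo: insert 3 into [1; 2; 4], allowing the input cells to be freed. *)
let l = of_list [1; 2; 4]
let r = insert true 3 l

let () =
  Printf.printf "result = [%s], frees = %d, live cells = %d\n"
    (String.concat "; " (List.map string_of_int (to_list r)))
    !frees (Hashtbl.length heap)
```

In this run the cells holding 1 and 2 are freed, while the cell holding 4 is shared with the result and survives. Reading l afterwards fails in the model, which is why a caller that still needs l has to pass b=false.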

Experiments

Analysis & Transformation Cost
- 1,500~29,000 lines/sec
- slope = 1.46
[Plot: analysis & transformation cost (logarithmic scale) against program size (logarithmic scale)]
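To read the slope: assuming 1.46 is the slope of a straight-line fit on the log-log plot (the slide does not say how it was obtained), the cost grows polynomially in program size. The constant c below is a hypothetical intercept.

```latex
\log(\mathrm{cost}) = 1.46\,\log(\mathrm{size}) + c
\quad\Longrightarrow\quad
\mathrm{cost} \propto \mathrm{size}^{1.46},
\qquad
\frac{\mathrm{cost}(2n)}{\mathrm{cost}(n)} = 2^{1.46} \approx 2.75 .
```

Under that reading, doubling the program size raises the analysis and transformation cost by a factor of about 2.75.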

Reuse Ratio
- 3.4%~93.9% of allocations are avoided.
- Low reuse ratios are due to much sharing.

Memory Peak Reduction
- 0.0%~71.9% peak reduction
- much reuse = much peak reduction
[Plot: memory peak reduction against memory reuse ratio. Labelled points (reuse ratio, peak reduction): (84.4%, 2.6%), (10.6%, 25.6%), (41.9%, 8.1%)]

Difference in Live Cells

    program      memory reuse ratio   memory peak reduction
    sieve        84.3%                56.5%
    merge        50.0%                49.4%
    qsort        93.9%                71.9%
    msort        89.3%                55.0%
    queens       4.2%                 0.0%
    kb           3.4%                 2.3%
    nucleic      16.9%                13.8%
    k-eval       31.5%                9.6%
    life         10.6%                25.6%
    mirage       84.4%                2.6%
    professor    41.9%                8.1%

GC Time & Runtime Changes (in Objective Caml system)
- -6.9%~90.5% GC-time reduction
- -7.3%~39.1% runtime reduction

High reuse ratio & big GC portion: runtime speedup
[Chart values as extracted, labels lost: 50.0% 93.9% 89.3% 16.9% 50.0% 93.9% 89.3% 16.9% 76.0% 63.2% 59.9% 52.1% 78.2% 57.2% 55.3% 46.3% 24.0% 39.1% 21.6% 7.2% 30.0% 28.2% 20.7% 8.8%]

Low reuse ratio: flags overhead
[Chart values as extracted, labels lost: 4.2% 3.4% 4.2% 3.4% -8.4% -9.1% -6.8% -3.6% -5.8% -6.7% -4.7% -7.3%]

Small GC portion: almost no effect
[Chart values as extracted, labels lost: 7.2% 5.6% 1.4% 1.9% 4.3% 4.2% 1.1% 1.3% -5.5% -2.6% -3.8% -2.9% 4.8% 0.1% -0.9% 0.6%]

GC-time & Runtime Changes
- much reuse = much GC-time reduction
- much reuse & big GC-time portion = much runtime reduction
[Plots: GC time reduction against memory reuse ratio; runtime reduction against GC portion x memory reuse ratio]
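The quantity on the horizontal axis of the second plot, GC portion x memory reuse ratio, is simple to compute. The sketch below only illustrates that product: predictor is a hypothetical name, the 40% GC portion is an invented figure, and the reuse ratios 93.9% and 3.4% are the extremes reported in the slides. It makes no claim about the runtime reductions actually measured.

```ocaml
(* Hypothetical helper: the slide's x-axis quantity,
   GC portion x memory reuse ratio, both given as fractions. *)
let predictor ~gc_portion ~reuse_ratio = gc_portion *. reuse_ratio

let () =
  (* The 0.40 GC portion is an invented illustration; 0.939 and 0.034
     are the highest and lowest reuse ratios from the slides. *)
  let high = predictor ~gc_portion:0.40 ~reuse_ratio:0.939 in
  let low = predictor ~gc_portion:0.40 ~reuse_ratio:0.034 in
  Printf.printf "high reuse: %.4f, low reuse: %.4f\n" high low
```

With the same GC portion, the high-reuse program lands far to the right on that axis and the low-reuse one stays near zero, which matches the slide's claim that much reuse combined with a big GC-time portion goes with much runtime reduction.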

Conclusion

    program                   transformation result   program performance
    not much sharing          high reuse ratio        memory peak reduction & GC time speedup
    + big GC-time portion                             runtime speedup