Improving Cache Performance of OCaml Programs Case Study - MetaPRL Alexey Nogin and Alexei Kopylov April 15, 1999.



Background Information
OCaml is a dialect of the ML functional language.
MetaPRL is the next generation of the NuPrl Proof Development System.
All measurements were done on a 400 MHz Pentium II Xeon with 512 KB of L2 cache, running Linux 2.2.2.

Overview
What we tried to do
– Collect some data
– See if standard techniques (developed for Java and C programs) can be applied
Why it didn't work
– OCaml programs (MetaPRL in particular) are quite different from Java and C programs in their cache behavior.

Memory Usage Statistics
Most objects are really small:
– 60-90% of all allocated objects are 3 words (12 bytes) big
We allocate them really fast: … MB/sec
Only 1-10% of allocated objects survive the first garbage collection.
L1 DCU miss rate is …%
L2 cache miss rate is 18-47%

Cache-Conscious Structure Definition
Trishul M. Chilimbi, Bob Davidson, James R. Larus

Ideas
Structure size << cache block size
– no action
Structure size ≈ cache block size
– splitting structure into "hot" and "cold" portions
Structure size >> cache block size
– field reordering
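The three size classes above can be sketched as a small C decision function. This is only an illustration: the names `layout_action` and `choose_action` are hypothetical, and the factor-of-two thresholds standing in for "<<" and ">>" are an assumption, not something the slides specify.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; the slides only describe the three cases informally. */
enum layout_action { NO_ACTION, SPLIT_HOT_COLD, REORDER_FIELDS };

/* Pick a layout optimization from the structure size and the cache block
 * size (both in bytes). The 2x thresholds are an illustrative assumption. */
enum layout_action choose_action(size_t struct_size, size_t cache_block_size)
{
    assert(cache_block_size > 0);
    if (struct_size * 2 <= cache_block_size)
        return NO_ACTION;        /* structure much smaller than a block */
    if (struct_size >= cache_block_size * 2)
        return REORDER_FIELDS;   /* structure much larger than a block */
    return SPLIT_HOT_COLD;       /* structure comparable to a block */
}
```

Under these assumed thresholds, a 12-byte object with an assumed 32-byte cache block falls into the no-action case.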

Structure Splitting
f1 f2 f3 f4 becomes f3 f1 f2 f4 (hot fields first, cold fields last)
Pros:
– pack more hot object fields per cache line
Cons:
– cost of additional reference from hot to cold portion
– code bloat
– more objects in memory
– extra indirection to access fields in the cold portion
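A minimal C sketch of hot/cold splitting may make the trade-off concrete. The structure and field names (`node`, `key`, `next`, `name`, `created`) are hypothetical, and which fields count as hot is assumed here rather than measured.

```c
#include <assert.h>
#include <stddef.h>

/* Original layout: hot and cold fields share one structure. */
struct node {
    int          key;       /* hot: read on every traversal step */
    struct node *next;      /* hot */
    char         name[48];  /* cold: rarely read */
    long         created;   /* cold */
};

/* After splitting: cold fields move to a separate object. */
struct node_cold {
    char name[48];
    long created;
};

struct node_hot {
    int               key;
    struct node_hot  *next;
    struct node_cold *cold;  /* the additional reference listed under Cons */
};

/* Traversal touches only the small hot portions. */
const struct node_hot *find_key(const struct node_hot *head, int key)
{
    for (; head != NULL; head = head->next)
        if (head->key == key)
            return head;
    return NULL;
}

/* Cold access pays one extra indirection. */
const char *node_name(const struct node_hot *n)
{
    assert(n != NULL && n->cold != NULL);
    return n->cold->name;
}
```

The hot portion is smaller than the original structure, so more of them fit in one cache line, while every access to `name` or `created` now goes through the `cold` pointer.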

Field Reordering
Typically, fields in big structures are grouped logically
– reorder fields to better match the program's access pattern
Problems:
– C code may use pointer arithmetic to access fields
– structure layout may be fixed by existing file formats and protocol specifications
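Field reordering can be illustrated with a short C sketch. The structures, the choice of `id` and `count` as the hot fields, and the helper `same_block` are all hypothetical, and the check assumes the structure starts on a cache block boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Fields grouped logically: the two hot fields are far apart. */
struct record_logical {
    int  id;                /* hot */
    char description[120];  /* cold */
    int  count;             /* hot */
};

/* Reordered so the hot fields are adjacent. */
struct record_reordered {
    int  id;                /* hot */
    int  count;             /* hot */
    char description[120];  /* cold */
};

/* Nonzero if two field offsets fall in the same cache block, assuming the
 * structure itself is aligned to a block boundary. */
int same_block(size_t off_a, size_t off_b, size_t block_size)
{
    assert(block_size > 0);
    return off_a / block_size == off_b / block_size;
}
```

With an assumed 64-byte block, `id` and `count` land in different blocks in `record_logical` and in the same block in `record_reordered`. Code that reaches fields by name or through `offsetof` keeps working after the reorder; code that hard-codes byte offsets does not, which is the pointer-arithmetic problem the slide mentions.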