Making Object-Based STM Practical in Unmanaged Environments Torvald Riegel and Diogo Becker de Brum ( Dresden University of Technology, Germany)

Slides:

Advertisements

Similar presentations

Part IV: Memory Management

Advertisements

Enabling Speculative Parallelization via Merge Semantics in STMs Kaushik Ravichandran Santosh Pande College.

Programming Languages and Paradigms

Pointer Analysis – Part I Mayur Naik Intel Research, Berkeley CS294 Lecture March 17, 2009.

Lecture 10: Heap Management CS 540 GMU Spring 2009.

Parallel Inclusion-based Points-to Analysis Mario Méndez-Lojo Augustine Mathew Keshav Pingali The University of Texas at Austin (USA) 1.

Lightweight Abstraction for Mathematical Computation in Java 1 Pavel Bourdykine and Stephen M. Watt Department of Computer Science Western University London.

Breno de MedeirosFlorida State University Fall 2005 Buffer overflow and stack smashing attacks Principles of application software security.

CS 330 Programming Languages 10 / 16 / 2008 Instructor: Michael Eckmann.

1 Hash-Based Indexes Yanlei Diao UMass Amherst Feb 22, 2006 Slides Courtesy of R. Ramakrishnan and J. Gehrke.

Pointer and Shape Analysis Seminar Context-sensitive points-to analysis: is it worth it? Article by Ondřej Lhoták & Laurie Hendren from McGill University.

Next Section: Pointer Analysis Outline: –What is pointer analysis –Intraprocedural pointer analysis –Interprocedural pointer analysis (Wilson & Lam) –Unification.

Memory Allocation. Three kinds of memory Fixed memory Stack memory Heap memory.

Honors Compilers Addressing of Local Variables Mar 19 th, 2002.

Capriccio: Scalable Threads for Internet Services Rob von Behren, Jeremy Condit, Feng Zhou, Geroge Necula and Eric Brewer University of California at Berkeley.

Previous finals up on the web page use them as practice problems look at them early.

Run time vs. Compile time

1 Run time vs. Compile time The compiler must generate code to handle issues that arise at run time Representation of various data types Procedure linkage.

HARDBOUND: ARCHITECURAL SUPPORT FOR SPATIAL SAFETY OF THE C PROGRAMMING LANGUAGE Kyle Yan Yu Xing 2014/10/15.

Building An Interpreter After having done all of the analysis, it’s possible to run the program directly rather than compile it … and it may be worth it.

Automatic Pool Allocation for Disjoint Data Structures Presented by: Chris Lattner Joint work with: Vikram Adve ACM.

1.3 Executing Programs. How is Computer Code Transformed into an Executable? Interpreters Compilers Hybrid systems.

Language Evaluation Criteria

1 Yolanda Gil Information Sciences InstituteJanuary 10, 2010 Requirements for caBIG Infrastructure to Support Semantic Workflows Yolanda.

Secure Virtual Architecture John Criswell, Arushi Aggarwal, Andrew Lenharth, Dinakar Dhurjati, and Vikram Adve University of Illinois at Urbana-Champaign.

P ARALLEL P ROCESSING I NSTITUTE · F UDAN U NIVERSITY 1.

Automatic Data Partitioning in Software Transactional Memories Torvald Riegel, Christof Fetzer, Pascal Felber (TU Dresden, Germany / Uni Neuchatel, Switzerland)

SPL/2010 StackVsHeap. SPL/2010 Objectives ● Memory management ● central shared resource in multiprocessing RTE ● memory models that are used in Java and.

Programming Languages and Design Lecture 7 Subroutines and Control Abstraction Instructor: Li Ma Department of Computer Science Texas Southern University,

A Time Predictable Instruction Cache for a Java Processor Martin Schoeberl.

Extending Open64 with Transactional Memory features Jiaqi Zhang Tsinghua University.

CPSC 388 – Compiler Design and Construction Runtime Environments.

Storage Allocation for Embedded Processors By Jan Sjodin & Carl von Platen Present by Xie Lei ( PLS Lab)

Simulated Pointers Limitations Of Java Pointers May be used for internal data structures only. Data structure backup requires serialization and deserialization.

Static Program Analyses of DSP Software Systems Ramakrishnan Venkitaraman and Gopal Gupta.

© Janice Regan, CMPT 300, May CMPT 300 Introduction to Operating Systems Memory: Relocation.

WG5: Applications & Performance Evaluation Pascal Felber

Mark Marron IMDEA-Software (Madrid, Spain) 1.

By Teacher Asma Aleisa Year 1433 H.   Goals of memory management  To provide a convenient abstraction for programming.  To allocate scarce memory.

RUN-Time Organization Compiler phase— Before writing a code generator, we must decide how to marshal the resources of the target machine (instructions,

Operating Systems Lecture 14 Segments Adapted from Operating Systems Lecture Notes, Copyright 1997 Martin C. Rinard. Zhiqing Liu School of Software Engineering.

1 Memory Management Chapter 7. 2 Memory Management Subdividing memory to accommodate multiple processes Memory needs to be allocated to ensure a reasonable.

Ronny Krashinsky Erik Machnicki Software Cache Coherent Shared Memory under Split-C.

WTM’12, Bern, April 10, 2012 In-Place Metainfo Support in DeuceSTM Ricardo J. Dias, Tiago M. Vale and João M. Lourenço CITI / Universidade Nova de Lisboa.

PRESTO: Program Analyses and Software Tools Research Group, Ohio State University Merging Equivalent Contexts for Scalable Heap-cloning-based Points-to.

Transparent Pointer Compression for Linked Data Structures June 12, 2005 MSP Chris Lattner Vikram Adve.

Pointer Analysis Survey. Rupesh Nasre. Aug 24, 2007.

1 Becoming More Effective with C++ … Day Two Stanley B. Lippman

How to execute Program structure Variables name, keywords, binding, scope, lifetime Data types – type system – primitives, strings, arrays, hashes – pointers/references.

Copyright 2014 – Noah Mendelsohn Code Tuning Noah Mendelsohn Tufts University Web:

Pointer Analysis – Part I CS Pointer Analysis Answers which pointers can point to which memory locations at run-time Central to many program optimization.

High Performance Embedded Computing © 2007 Elsevier Lecture 10: Code Generation Embedded Computing Systems Michael Schulte Based on slides and textbook.

Simulated Pointers Limitations Of C++ Pointers May be used for internal data structures only. Data structure backup requires serialization and deserialization.

Procedures and Functions Procedures and Functions – subprograms – are named fragments of program they can be called from numerous places  within a main.

Architectural Features of Transactional Memory Designs for an Operating System Chris Rossbach, Hany Ramadan, Don Porter Advanced Computer Architecture.

COMP 1402 Winter 2008/Summer 2008 Tutorial #10 Enumerations, Unions, Linked Lists.

Tuning Threaded Code with Intel® Parallel Amplifier.

LECTURE 19 Subroutines and Parameter Passing. ABSTRACTION Recall: Abstraction is the process by which we can hide larger or more complex code fragments.

©SoftMoore ConsultingSlide 1 Code Optimization. ©SoftMoore ConsultingSlide 2 Code Optimization Code generation techniques and transformations that result.

Eliminating External Fragmentation in a Non-Moving Garbage Collector for Java Author: Fridtjof Siebert, CASES 2000 Michael Sallas Object-Oriented Languages.

A Single Intermediate Language That Supports Multiple Implemtntation of Exceptions Delvin Defoe Washington University in Saint Louis Department of Computer.

Object Lifetime and Pointers

Jonathan Walpole Computer Science Portland State University

Compositional Pointer and Escape Analysis for Java Programs

Optimization Code Optimization ©SoftMoore Consulting.

Lecture 25: Introduction to Memory Management

TensorFlow: A System for Large-Scale Machine Learning

Dynamic Performance Tuning of Word-Based Software Transactional Memory

Lecture 25: Introduction to Memory Management

Presentation transcript:

Making Object-Based STM Practical in Unmanaged Environments Torvald Riegel and Diogo Becker de Brum ( Dresden University of Technology, Germany)

2 Background: Word-based or Object-based? Location to metadata mapping Hash into array:  variable, can be tuned  But on fast path In-place metadata  Fixed mapping Hash into array also possible Cache locality Potentially miss in data and in metadata Data and metadata potentially share cache line vs. perhaps more misses for nontxnal code False conflicts Depends on hash but not unlikely Probably unlikely if objects are small Space overhead Seems to need large metadata arrays Fixed per object (might get large) Choice depends on the workload and on tuning quality Object-based has advantages that we should not ignore

3 Object-based might have advantages: Can we actually choose? TM must be practical  Manual load/store transactification is not sufficient  Annotations are error-prone and less composable  Programmer should not need to choose Unmanaged environments matter  A lot of server code in C / C++  Desktop apps too (e.g., KDE)! Need compiler support!  Managed Environments (Java, C#, …): environment provides object-abstraction for memory  Unmanaged Environments (C, C++, …): no ready-to-use object abstraction  “Making Object-Based STM Practical in Unmanaged Environments”

4 Contributions Show how to use pointer analysis to detect which memory chunks / pointers can be safely used for object-based accesses  No programmer-supplied annotations necessary  Programming language is not extended nor restricted  Use object-based accesses where this is safe, fall back to word- based otherwise  Integrated into Tanger (STM Compiler for C/C++/..., [Transact 07])  Enables TinySTM [PPoPP 08] to provide object-based accesses (in-place and external metadata) Performance results for word-based vs. object-based

5 Analysis  We use Data Structure Analysis (DSA [1]):  Pointer analysis for LLVM compiler framework  Creates a points-to graph with Data Structure (DS) nodes  Context-sensitive:  Data structures distinguished based on call graphs  Field-sensitive:  distinguish between DS fields  Unification-based:  Pointers target a single node in the points-to graph  Information about pointers from different places get merged  If incompatible information, node is collapsed (= “nothing known”)  Can safely analyze incomplete programs:  Calls to external / not analyzed functions have an effect only on the data that escapes into / from these functions (get marked “External”)  Analyzing more code increases analysis precision [1] Chris Lattner, PhD thesis, 2005

6 Analysis (2) Integration into Tanger compilation process: 1.Compile and link program parts into LLVM intermediate representation module 2.Analyze module using DSA  Local intra-function analysis: per-function DS graph  Merge DS graphs bottom-up in callgraph (put callees’ information into callers)  Merge DS graphs top-down in callgraph (vice versa) 3.Transactify module  Use DSA information to decide between object-based / word- based  Requirement: If memory chunk (DS node) is object-based, then it must be safe for object-based everywhere in the program  DSA can give us this guarantee 4.Link in STM library and generate native code

7 Example: STAMP Vacation Type, if known Collapsed: accessed in different ways (e.g., as void* and as int) struct has 4 fields, 2 are pointers A Red-Black Tree instance HMR: onHeap, Modified, Read Partial, simplified DS graph for main()

8 How many object-based accesses?  Memory chunks that we consider for object-based accesses:  All uses must have been analyzed (e.g., doesn’t escape to external function): required for safety  Not an array (in-place metadata would be more tricky, but could be possible in some cases)  Not a primitive type (probably not useful, but is possible) BenchmarkObject-based loads/stores (static) Object-based loads/stores at runtime Red-Black Tree Only object-based accesses Linked List STAMP Vacation90% / 95%97% / 84% STAMP KMeans No object-based accesses STAMP Genome40% / 48%12% / 98%

9 STM Implementations + Transformations Word-based (TinySTM):  Time-based reads, blocking writes  Metadata: array of versioned locks: lock = (address >> shifts) % locks:  More details: see PPoPP 08 paper  %val = load %addr transformed to %val = call STMLoad(%addr) Object-based (where possible, otherwise fall back to word-based) External metadata (TinySTM-ObjE):  Metadata:  Reuse word-based lock array  Use base address of object to select lock from lock array  %val = call STMLoad(%addr, %baseaddr) In-place metadata (TinySTM-Obj):  Metadata:  Single per-object versioned lock  Located after the end of the object  Compiler must enlarge all memory allocations accessed in an object-based way and initialize metadata  %val = call STMLoad(%addr, %baseaddr, %objSize)

10 Microbenchmarks: Tuning External metadata (TinySTM, TinySTM-ObjE): different “shapes” In-place metadata (TinySTM-Obj): flat “shape” (lock array not used) usually better performance

11 Lock arrays: Tuning difficulties (Cuts with “enough locks”, performance relative to object-based, in-place metadata) -40% -30% Bad tuning is costly Best tuning configuration can easily change (e.g., tree gets larger) Tuning trade-offs: one data structure benefits, another looses Object-based, in-place often better or equal and not affected! RB LL RB

12 STAMP Vacation Application has word- based and object- based accesses Performance degrades even for TinySTM-Obj if word-based accesses suffer from false conflicts TinySTM-ObjE is just slightly better than TinySTM

13 STAMP Vacation 30% throughput increase for object-based with in-place metadata (TinySTM-Obj) Possible reasons: App has different data structures: No tuning trade-offs Large working set: Better cache locality

14 Conclusion  STM compilers for unmanaged environments can target object-based STMs without requiring anything special from the programmer  Pointer analysis very useful  Object-based accesses can have better performance  In-place metadata less sensitive to tuning / false conflicts  Potentially improved cache locality 

15 Backup Slides

16 Microbenchmarks: Maximum Throughput Use lock/shift setting with best throughput for particular STM (= perfect tuning) (not considering huge lock arrays (>256MB space overhead)) Linked List: in-place metadata is beneficial RBTree: no conclusive result one of the object-based STMs is usually better BUT: Lock array must be tuned to get these best throughput results

17 RBTree scaling

18 RBTree