Pay-to-use strong atomicity on conventional hardware Martín Abadi, Tim Harris, Mojtaba Mehrara Microsoft Research.

Presentation transcript:

Our approach
Strong semantics: atomic, retry, ...
  What, ideally, should these constructs do?
Programming discipline(s)
  What does it mean for a program to use the constructs correctly?
Low-level semantics & actual implementations
  Transactions, optimistic concurrency, program transformations, weak memory models, ...

Programming disciplines
Which programs are correctly synchronized?
[Figure: nested sets of programs, from outermost to innermost: All programs; Violation-free programs; Obeying dynamic separation; Obeying static separation. Arrow labels: "More implementation flexibility" and "More programs correctly synchronized".]

Strong atomicity
Direct accesses work like single-access transactions
We would like:
 – Implementation flexibility; ongoing innovation in STM/hybrid techniques, optimizations, ...
     Invisible / visible readers
     In-place / deferred updates
     Eager / lazy conflict detection
 – No overhead on direct accesses
 – Robust performance, not dependent on success of static analyses

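Strong atomicity: implementation
[Figure: the physical address space is mapped twice into the virtual address space, once as the Tx-heap and once as the Direct-heap. Direct memory accesses use the Direct-heap; memory accesses from atomic blocks use the Tx-heap.]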
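The two-view layout on this slide can be illustrated with a minimal sketch. It assumes a POSIX-like system and uses Python's standard `mmap` module to map the same file-backed pages twice; the helper name `make_two_views` and the variable names are hypothetical, not taken from the talk. Python's `mmap` does not expose per-page protection changes, so the sketch only shows that the two views alias the same memory, not the revocation step.

```python
import mmap
import tempfile

PAGE = mmap.PAGESIZE


def make_two_views(num_pages=1):
    """Map the same backing pages twice (hypothetical helper for illustration)."""
    backing = tempfile.TemporaryFile()
    backing.truncate(num_pages * PAGE)  # reserve the backing pages
    # Two independent shared mappings of one file alias the same pages,
    # much like the Tx-heap and Direct-heap views of one physical heap.
    tx_view = mmap.mmap(backing.fileno(), num_pages * PAGE)
    direct_view = mmap.mmap(backing.fileno(), num_pages * PAGE)
    return backing, tx_view, direct_view


backing, tx_view, direct_view = make_two_views()

# A write through the "Tx-heap" view is visible through the "Direct-heap" view.
tx_view[0:4] = (42).to_bytes(4, "little")
value_seen_directly = int.from_bytes(direct_view[0:4], "little")

# The aliasing works in both directions.
direct_view[8:12] = (7).to_bytes(4, "little")
value_seen_by_tx = int.from_bytes(tx_view[8:12], "little")
```

Because both views have different virtual addresses, protection on the direct view could in principle be changed without affecting the view used by atomic blocks, which is what the following slides rely on.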

Writes from atomic blocks
[Figure, repeated for each step: Tx-heap and Direct-heap views of the same physical address space; direct memory accesses and memory accesses from atomic blocks.]
1. Atomic block attempts to write to a field of an object
2. Revoke direct access to the page holding the direct view of the object
3. Use underlying STM write primitives
4. Restore direct access once the underlying transaction has finished and an access violation (AV) occurs
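The four steps can be traced with a small single-threaded model. This is a sketch under stated assumptions, not the authors' runtime: the class `DualViewHeapModel`, the 4096-byte page size, and the in-place byte store standing in for the STM write primitive are all hypothetical. A real system would block the faulting thread in the AV handler until the transaction finishes; the model raises an exception instead.

```python
class AccessViolation(Exception):
    """Raised when a direct access hits a page whose direct view is revoked."""


class DualViewHeapModel:
    PAGE_SIZE = 4096  # assumed page size for the model

    def __init__(self, size):
        self.memory = bytearray(size)   # shared backing store for both views
        self.revoked_by = {}            # page number -> ids of writing transactions
        self.finished = set()           # ids of transactions that have finished
        self.protection_changes = 0     # number of simulated page-table changes

    def _page(self, addr):
        return addr // self.PAGE_SIZE

    def tx_write(self, tx_id, addr, value):
        page = self._page(addr)                 # step 1: write from an atomic block
        if page not in self.revoked_by:
            self.revoked_by[page] = set()       # step 2: revoke direct access
            self.protection_changes += 1
        self.revoked_by[page].add(tx_id)
        self.memory[addr] = value               # step 3: stand-in for the STM write

    def tx_finish(self, tx_id):
        self.finished.add(tx_id)

    def direct_read(self, addr):
        page = self._page(addr)
        writers = self.revoked_by.get(page)
        if writers is not None:                 # the access faults (AV)
            if not writers <= self.finished:
                raise AccessViolation(f"page {page} is still owned by a transaction")
            del self.revoked_by[page]           # step 4: restore direct access
            self.protection_changes += 1
        return self.memory[addr]


heap = DualViewHeapModel(2 * DualViewHeapModel.PAGE_SIZE)
heap.tx_write(tx_id=1, addr=10, value=7)
try:
    heap.direct_read(10)
    blocked = False
except AccessViolation:
    blocked = True
heap.tx_finish(1)
value = heap.direct_read(10)
```

In this model the direct read is refused while transaction 1 is live and succeeds after it finishes, at the cost of two protection changes for the page.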

Avoiding Access Violations
1. Safe accesses in runtime system code
 – Virtual method tables and array length
 – Memory allocation structures (e.g. free list)
 – STM implementation structures
 – GC implementation
Forward all these to TX-heap at compile time

Avoiding Access Violations
2. Safe accesses in normal code
 – Normal writes to locations that haven’t been read or written in a TX
 – Normal reads from locations that haven’t been written in a TX
 Forward to TX-heap
3. Safe accesses in TX code
 – TX writes to locations that haven’t been read or written outside TXs
 – TX reads from locations that haven’t been written outside TXs
 Avoid page-level tracking
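The four safety conditions can be written down as predicates over per-location access summaries. The sketch below is illustrative only: the `AccessSummary` type, the function names, and the use of plain string labels for locations are assumptions, and a real compiler would work on abstract locations produced by a static analysis.

```python
from dataclasses import dataclass, field


@dataclass
class AccessSummary:
    """Which locations are read or written inside and outside transactions."""
    tx_reads: set = field(default_factory=set)
    tx_writes: set = field(default_factory=set)
    normal_reads: set = field(default_factory=set)
    normal_writes: set = field(default_factory=set)


def is_safe_normal_access(loc, is_write, s):
    """Safe normal access: may be forwarded to the TX-heap view."""
    if is_write:
        # Normal write: location never read or written in a TX.
        return loc not in s.tx_reads and loc not in s.tx_writes
    # Normal read: location never written in a TX.
    return loc not in s.tx_writes


def is_safe_tx_access(loc, is_write, s):
    """Safe TX access: needs no page-level tracking."""
    if is_write:
        # TX write: location never read or written outside TXs.
        return loc not in s.normal_reads and loc not in s.normal_writes
    # TX read: location never written outside TXs.
    return loc not in s.normal_writes


summary = AccessSummary(
    tx_reads={"counter", "config"},
    tx_writes={"counter"},
    normal_reads={"config", "log"},
    normal_writes={"log"},
)
```

With this summary, a normal read of `config` is safe because no transaction writes it, while a normal write to `config` is not, because a transaction reads it.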

Sample Code

    private int ComputeUniqueSegments(int nthreads) {
        int numUniqueSegment = 0;
        for (int i = 0; i < nthreads; i++)
            numUniqueSegment += this.uniqueSegments[i].Count;
        return numUniqueSegment;
    }

    Genome_Sequencer_ComputeUniqueSegments::
    loop:
        mov  eax, dword ptr [edi+0x20]        // Load uniqueSegments array reference
        cmp  ebx, dword ptr [eax+0x4]         // Check index against array bounds
        jae  outOfRange
        mov  ecx, dword ptr [eax+ebx*4+0x08]  // Load array element
        mov  eax, dword ptr [ecx]             // Load Count function pointer
        call dword ptr [eax+0x88]             // Call Count (get) function
        add  ebp, eax                         // Add it to numUniqueSegment
        add  ebx, 1
        cmp  ebx, esi
        jl   loop

Access immutable runtime-system data:
        cmp  ebx, dword ptr [eax+0x ]         // Check index against array bounds
        mov  eax, dword ptr [ecx+0x ]         // Load Count function pointer
        call dword ptr [eax+0x ]              // Call Count (get) function

Safe normal access:
        mov  ecx, dword ptr [eax+ebx*4+0x ]   // Load array element
        mov  eax, dword ptr [edi+0x ]         // Load uniqueSegments array reference

Exploiting Safe Accesses
Implemented by extending Steensgaard’s points-to analysis
Only safe accesses from normal code were beneficial
Little benefit from identifying safe accesses from inside atomic blocks

#page-table changes:
            Genome   Delaunay   Labyrinth   Vacation
  Ratio     99%      90%        36%         92%
  Before: 31 K ... K
  After:  31 K ... K
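For readers unfamiliar with this style of analysis, here is a heavily simplified, unification-based points-to sketch in the spirit of Steensgaard's algorithm. It is not the authors' extended analysis: the class `PointsTo`, its method names, and the four statement forms it handles are illustrative assumptions. Each equivalence class of variables has at most one pointee class, and assignments merge classes with union-find.

```python
class PointsTo:
    """Simplified unification-based points-to analysis (illustrative only)."""

    def __init__(self):
        self.parent = {}    # union-find parent links
        self.pointee = {}   # class representative -> node it points to
        self.fresh = 0

    def find(self, n):
        self.parent.setdefault(n, n)
        while self.parent[n] != n:
            self.parent[n] = self.parent[self.parent[n]]  # path halving
            n = self.parent[n]
        return n

    def target(self, n):
        """Representative of the class that n's class points to."""
        r = self.find(n)
        if r not in self.pointee:
            self.fresh += 1
            self.pointee[r] = f"$t{self.fresh}"  # fresh placeholder location
        return self.find(self.pointee[r])

    def join(self, a, b):
        a, b = self.find(a), self.find(b)
        if a == b:
            return
        self.parent[b] = a
        pa, pb = self.pointee.get(a), self.pointee.pop(b, None)
        if pa is None and pb is not None:
            self.pointee[a] = pb
        elif pa is not None and pb is not None:
            self.join(pa, pb)  # merged classes must share one pointee class

    def address_of(self, x, y):   # x = &y
        self.join(self.target(x), y)

    def copy(self, x, y):         # x = y
        self.join(self.target(x), self.target(y))

    def load(self, x, y):         # x = *y
        self.join(self.target(x), self.target(self.target(y)))

    def store(self, x, y):        # *x = y
        self.join(self.target(self.target(x)), self.target(y))

    def same_target(self, x, y):
        return self.target(x) == self.target(y)


pt = PointsTo()
pt.address_of("p", "a")   # p = &a, used only in normal code
pt.address_of("q", "b")   # q = &b, written through inside an atomic block
pt.copy("r", "p")         # r = p
separate_before = not pt.same_target("r", "q")
pt.copy("r", "q")         # r = q merges the classes of a and b
merged_after = pt.same_target("p", "q")
```

In the first state, accesses through `r` cannot touch what the atomic block writes through `q`, so they could be treated as safe; after `r = q` the analysis can no longer separate them.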

Patching access violations
Patch sites of AVs
Our heuristic:
 – Patch on first AV
 – Also change page protection as normal
Future work:
 – Remove patches if they become unnecessary
 – Make multiple patches to bound worst-case perf
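The "patch on first AV" heuristic can be sketched as a tiny model. Everything here is hypothetical: the class `PatchingRuntime`, the site labels, and the assumption that a patched site takes a path through the TX-heap view that cannot fault. The model also assumes the owning transaction has already finished when the AV is handled.

```python
class PatchingRuntime:
    """Toy model: each code site is patched the first time it faults."""

    def __init__(self):
        self.patched_sites = set()    # sites rewritten to avoid future AVs
        self.protected_pages = set()  # pages whose direct view is revoked
        self.av_count = 0

    def direct_access(self, site, page):
        if site in self.patched_sites:
            # Assumed behaviour of a patched site: access via the TX-heap view.
            return "patched-path"
        if page in self.protected_pages:
            self.av_count += 1
            self.patched_sites.add(site)         # patch on first AV
            self.protected_pages.discard(page)   # also change page protection as normal
            return "direct-after-av"
        return "direct"


rt = PatchingRuntime()
rt.protected_pages.add(3)
first = rt.direct_access("siteA", 3)    # faults once, site gets patched
rt.protected_pages.add(3)               # a later transaction revokes the page again
second = rt.direct_access("siteA", 3)   # patched site no longer faults
third = rt.direct_access("siteB", 3)    # a different, unpatched site still faults
```

Under this heuristic a given site faults at most once, which is why repeated conflicts at a hot access site stop paying the AV cost.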

Results - Vacation

Results - Delaunay

Results - Genome

Results - Labyrinth

Scaling
[Figure legend: SA – patch AV + analysis; WA]

Conclusion
Weak atomicity is an obstacle to providing clear semantics for TM models
We use conventional memory protection hardware to provide strong atomicity
This comes at a low performance cost… high runtime complexity cost
The performance hit can be lowered by compile-time analysis