McRT-Malloc: A Scalable Non-Blocking Transaction Aware Memory Allocator Ali Adl-Tabatabai Ben Hertzberg Rick Hudson Bratin Saha.


McRT-Malloc: A Scalable Non-Blocking Transaction Aware Memory Allocator Ali Adl-Tabatabai Ben Hertzberg Rick Hudson Bratin Saha

2 Goals of McRT-Malloc
- Scalable: performance scales linearly with the number of processors, then stays flat as more software threads are added
- Preemption safety
  - Implies a lock-free approach to all structures
  - Allows other scalable McRT algorithms to use malloc and remain scalable
- Transactional memory awareness
  - Avoid memory blowup within a transaction
  - Avoid freeing bits needed to validate other transactions
  - Enable object-level conflict detection in STM
- Best of class

3 Block Data Structure
- Heap divided into aligned 16K blocks; 18 significant address bits identify a block
- Block owned by a single thread during allocation
- Blocks segregated into bins according to object size
- Metadata header
  - Free lists
  - Bump pointer
  - Next/previous block
  - Object size and usage info
- No per-object headers
- Free blocks kept on a non-blocking LIFO queue
  - 46 bits for the update timestamp
- Figure labels: addresses 0xABCD0000, 0xABCD0040, 0xABCD4000; "Meta data Header"; "Object Pointer"
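The alignment trick on this slide can be sketched in C. Because blocks are 16K-aligned, masking off the low 14 bits of any object pointer gives the block start, where the metadata header lives, so no per-object header is needed. The second half shows one way an 18-bit block index and a 46-bit timestamp could share a single 64-bit word for the non-blocking free-block queue. The names `block_of`, `pack_head`, `head_index` and `head_stamp`, and the exact bit layout, are assumptions made for illustration and are not taken from the slides.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE ((uintptr_t)16 * 1024)  /* 16K blocks, as on the slide */
#define BLOCK_MASK (~(BLOCK_SIZE - 1))     /* clears the low 14 offset bits */

/* Start of the aligned block (and its metadata header) containing ptr. */
static uintptr_t block_of(uintptr_t ptr) {
    return ptr & BLOCK_MASK;
}

/* Assumed layout: low 18 bits = block index, high 46 bits = timestamp. */
#define INDEX_BITS 18
#define STAMP_BITS 46
#define INDEX_MASK ((UINT64_C(1) << INDEX_BITS) - 1)
#define STAMP_MASK ((UINT64_C(1) << STAMP_BITS) - 1)

/* Pack a block index and an update timestamp into one word, so a single
 * 64-bit compare-and-swap can update both at once. */
static uint64_t pack_head(uint32_t block_index, uint64_t stamp) {
    assert(block_index <= INDEX_MASK);
    return ((stamp & STAMP_MASK) << INDEX_BITS) | block_index;
}

static uint32_t head_index(uint64_t head) {
    return (uint32_t)(head & INDEX_MASK);
}

static uint64_t head_stamp(uint64_t head) {
    return head >> INDEX_BITS;
}
```

With the addresses from the figure, `block_of(0xABCD0040)` yields `0xABCD0000`, while `0xABCD4000` already starts the next block. Bumping the timestamp on every queue update is a common way to keep a compare-and-swap from succeeding against a stale head (the ABA problem); whether McRT-Malloc uses the timestamp in exactly this way is not stated on the slide.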

4 Object Allocation and Freeing
- A thread owns the block it allocates in
- Trick: free uses two linked free lists per block
  - Private free list for the block owner avoids atomic instructions
  - Public list for other threads uses an atomic instruction and a non-blocking algorithm
- Trick: a fresh block uses a frontier pointer to avoid free-list initialization
  - Then allocates from the private free list
  - Privatizes the entire public list as needed with an atomic xchg
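The two-list scheme above can be sketched with C11 atomics. This is a minimal illustration under stated assumptions, not the McRT-Malloc source: the `block` struct, its field names, and the functions `block_init`, `block_alloc`, `block_free_local` and `block_free_remote` are all hypothetical. The owner allocates first from the frontier (bump) pointer, then from its private list, and only when both are empty takes the whole public list with one atomic exchange.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct free_node { struct free_node *next; } free_node;

typedef struct block {
    char *frontier;                    /* bump pointer into never-used space */
    char *limit;                       /* end of the block */
    size_t obj_size;                   /* one object size per block (bin) */
    free_node *private_free;           /* touched by the owner thread only */
    _Atomic(free_node *) public_free;  /* other threads push freed objects here */
} block;

static void block_init(block *b, void *mem, size_t bytes, size_t obj_size) {
    assert(obj_size >= sizeof(free_node));  /* a free object must hold a link */
    b->frontier = mem;
    b->limit = (char *)mem + bytes;
    b->obj_size = obj_size;
    b->private_free = NULL;
    atomic_init(&b->public_free, (free_node *)NULL);
}

/* Owner-only allocation: no atomic instruction on the common paths. */
static void *block_alloc(block *b) {
    if ((size_t)(b->limit - b->frontier) >= b->obj_size) {
        void *p = b->frontier;          /* fresh space: no free list was ever built */
        b->frontier += b->obj_size;
        return p;
    }
    if (b->private_free == NULL) {
        /* Privatize the entire public list with a single atomic exchange. */
        b->private_free = atomic_exchange_explicit(
            &b->public_free, (free_node *)NULL, memory_order_acquire);
    }
    free_node *n = b->private_free;
    if (n != NULL) b->private_free = n->next;
    return n;                           /* NULL means the block is exhausted */
}

/* Free by the owning thread: plain pointer writes. */
static void block_free_local(block *b, void *p) {
    free_node *n = p;
    n->next = b->private_free;
    b->private_free = n;
}

/* Free by any other thread: lock-free push with compare-and-swap. */
static void block_free_remote(block *b, void *p) {
    free_node *n = p;
    free_node *old = atomic_load_explicit(&b->public_free, memory_order_relaxed);
    do {
        n->next = old;                  /* old is refreshed when the CAS fails */
    } while (!atomic_compare_exchange_weak_explicit(
        &b->public_free, &old, n, memory_order_release, memory_order_relaxed));
}
```

Because remote threads only ever push and the owner only ever takes the whole list, the public list in this sketch never pops a single node concurrently, which sidesteps the usual ABA hazard of lock-free stacks.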

5 McRT-Malloc: A Transaction Aware Memory Allocator
Three problems:
1. Speculative memory allocation and de-allocation inside transactions can cause space blowup
2. Transactional conflict detection and frees
3. Object-based conflict detection in C/C++
Garbage collection also solves these issues.

6 Allocation with STM
- Speculatively allocate or free inside a transaction
  - Valid at commit, rolled back on abort
- Balanced: both malloc and free happen within the transaction
  - The memory is transaction-local and must be reused to prevent memory blowup

    transaction {
        for (i = 0; i < big_number; i++) {
            foo = malloc(size);
            …
            free(foo);
        }
    }

7 Solution
- Use sequence numbers to track allocation relationships
- Sequence counter is per-thread (thread-local)
- Every transaction (even a nested one) takes a new (incremented) sequence number upon start
- Every allocation in the transaction is tagged with its sequence number
- The relationship of an object being freed in a given transaction is determined by sequence number:
  - seq(object) < seq(transaction) → speculative free
  - seq(object) == seq(transaction) → balanced free
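The sequence-number rule can be sketched in a few lines of C. The type and function names (`tx_thread`, `tx_begin`, `classify_free`) are hypothetical, and the sketch assumes that an object allocated outside any transaction is tagged with the counter value current at that time. The slide defines only the two cases shown; the third enum value marks inputs the slide does not cover.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    FREE_SPECULATIVE,  /* object predates the transaction: defer until commit */
    FREE_BALANCED,     /* malloc and free in the same transaction: reuse now */
    FREE_NOT_COVERED   /* seq(object) > seq(transaction): not specified on the slide */
} free_kind;

/* Per-thread state; in real use this would live in thread-local storage. */
typedef struct {
    uint64_t seq_counter;
} tx_thread;

/* Every transaction, nested or not, takes a fresh sequence number on start. */
static uint64_t tx_begin(tx_thread *t) {
    return ++t->seq_counter;
}

/* Compare the tag stored with the object against the current transaction. */
static free_kind classify_free(uint64_t obj_seq, uint64_t tx_seq) {
    if (obj_seq < tx_seq) return FREE_SPECULATIVE;
    if (obj_seq == tx_seq) return FREE_BALANCED;
    return FREE_NOT_COVERED;
}
```

In the loop from the previous slide, each `malloc` tags its object with the enclosing transaction's number, so the matching `free` is classified as balanced and the memory can be handed back to the next iteration instead of accumulating until commit.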

8 Monitors != Transactions
- STM uses bits in the object to validate at commit
- Pessimistic: monitors (locks) allow only one thread inside a critical section
- Optimistic: transactions allow multiple threads inside a critical section
- This causes problems when freeing an object

9 Thread 1 is deleting node 2; thread 2 is deleting node 3. Both threads run the same code:

    nodeDelete(int key) {
        ptr = head of list;
        transaction {
            while (ptr->next->key != key) {
                ptr = ptr->next;
            } /* end while */
            temp = ptr->next;
            ptr->next = ptr->next->next;
        } /* validate & end transaction */
        free(temp); /* Anyone using? */
    }

Slides 10 to 15 repeat this code for both threads; only the captions change:

10 (code only, no caption)

11 At this point you have a read/read (non-)conflict.

12 Now we have a read/write conflict. Thread 1 commits and thread 2 will abort.

13 STM version information needed for validation is destroyed along with object 2.

14 Thread 2 wakes up.

15 The bits thread 2 is relying on to detect and resolve the conflict by aborting are now garbage.

16 Solution
- Delay the actual free and reuse until in a consistent state
- A global epoch (timestamp) is maintained and incremented periodically
- Each thread locally remembers the global epoch of the last time it entered or exited a top-level transaction
  - Set as part of TransactionBegin and TransactionAbort/Commit
- Each free, together with the global epoch, is noted in a thread-local buffer
- When the buffer fills, each thread's epoch is queried
  - All frees before the minimum epoch are freed "for real"
- O(number of frees), not O(number of memory accesses)
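The epoch scheme can be sketched as follows. This is a simplified illustration under stated assumptions, not the McRT implementation: the thread count and buffer size are fixed at small values, every name (`note_tx_boundary`, `advance_epoch`, `deferred_free`, `flush_frees`) is hypothetical, and a thread that never reaches another transaction boundary would hold back reclamation indefinitely, which a real system would have to handle.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_THREADS 4   /* assumed fixed thread count */
#define BUF_CAP 8       /* assumed buffer capacity */

static _Atomic uint64_t global_epoch = 1;
static _Atomic uint64_t thread_epoch[MAX_THREADS];  /* epoch at last tx begin/end */

typedef struct { void *ptr; uint64_t epoch; } pending_free;
typedef struct { pending_free buf[BUF_CAP]; size_t count; } free_buffer;

/* Called from TransactionBegin and TransactionAbort/Commit (top level only). */
static void note_tx_boundary(int tid) {
    atomic_store(&thread_epoch[tid], atomic_load(&global_epoch));
}

/* The global epoch is incremented periodically. */
static void advance_epoch(void) {
    atomic_fetch_add(&global_epoch, 1);
}

/* Really free every buffered object whose epoch is below the minimum
 * epoch recorded by any thread. Returns how many objects were released. */
static size_t flush_frees(free_buffer *fb) {
    uint64_t min = atomic_load(&thread_epoch[0]);
    for (int t = 1; t < MAX_THREADS; t++) {
        uint64_t e = atomic_load(&thread_epoch[t]);
        if (e < min) min = e;
    }
    size_t kept = 0, freed = 0;
    for (size_t i = 0; i < fb->count; i++) {
        if (fb->buf[i].epoch < min) {
            free(fb->buf[i].ptr);
            freed++;
        } else {
            fb->buf[kept++] = fb->buf[i];  /* still possibly visible: keep */
        }
    }
    fb->count = kept;
    return freed;
}

/* Note the free and the current global epoch in the thread-local buffer.
 * Returns false if the buffer stays full even after a flush. */
static bool deferred_free(free_buffer *fb, void *p) {
    if (fb->count == BUF_CAP) flush_frees(fb);
    if (fb->count == BUF_CAP) return false;
    fb->buf[fb->count].ptr = p;
    fb->buf[fb->count].epoch = atomic_load(&global_epoch);
    fb->count++;
    return true;
}
```

The work is done per buffered free and per flush, not on every transactional read or write, which matches the slide's point that the cost is O(number of frees) and not O(number of memory accesses).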

17 McRT-Malloc Beats Hoard
- Machias benchmark: mimics the producer-consumer pattern with a minimal workload
- (Normalized so X axis indicates linear scaling)

18 McRT STM Malloc Running Machias

19 McRT STM vs. McRT Malloc Running Machias

20 McRT STM vs. McRT Malloc Memory Usage Running Machias

21 Conclusion
- Best-of-class scalable malloc implementation
- Non-blocking, to enable other McRT algorithms to be non-blocking and still use malloc
- Solved memory blowup within a transaction
- Solved the premature freeing problem for STM with optimistic concurrency
- Enabled object-granularity conflict detection in C

22 Questions