Scalable Computing model : Lock free protocol By Peeyush Agrawal 2010MCS3469 Guided By Dr. Kolin Paul.



Problem: Programmers use mutexes in multi-threaded programs to prevent inconsistency in shared data. But mutexes (locks) can become a bottleneck: on a many-core system, threads compete for the lock, and the ones that lose are kept out of the ready queue (blocked, waiting for the lock until another thread wakes them up).

Disadvantages of using mutexes:
* Taking too few locks: It is common to forget to acquire a lock before writing and end up with multiple threads modifying the same data.
* Deadlock and live-lock: It is hard for a programmer to foresee the situations in which the system can go into deadlock, and there is no easy way to recover once a deadlock has occurred.
* Lost wake-ups: It is easy to forget to signal a condition variable on which a thread is waiting.

Replacements for locks on multi-core systems

Atomic operations: the best known is CAS (compare-and-swap), which is supported by modern processors. The whole operation below executes as one atomic step:

CAS (Value held in memory, Old value, New value) {
    Existing value = Value held in memory;
    if (Existing value == Old value) {
        Value held in memory = New value;
        return Old value;        // success
    } else
        return Existing value;   // fail
}
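The CAS pseudocode above maps fairly directly onto std::atomic in C++. The following is a minimal sketch, not the author's benchmark code: the helper names cas and cas_demo are hypothetical, and it assumes an int-sized value.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper mirroring the slide's CAS(location, old, new):
// returns the value observed in memory. The swap happened if and only if
// the returned value equals `expected`.
int cas(std::atomic<int>& location, int expected, int desired) {
    int observed = expected;
    // On failure, compare_exchange_strong stores the current value in `observed`;
    // on success, `observed` is left equal to `expected`.
    location.compare_exchange_strong(observed, desired);
    return observed;
}

// Hypothetical self-check: one successful CAS followed by one failed CAS.
bool cas_demo() {
    std::atomic<int> x{5};
    assert(cas(x, 5, 7) == 5);  // memory held 5, so 7 is installed
    assert(x.load() == 7);
    assert(cas(x, 5, 9) == 7);  // memory holds 7, not 5, so nothing changes
    assert(x.load() == 7);
    return true;
}
```

The caller tells success from failure by comparing the return value with the old value it passed in, exactly as in the pseudocode.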

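Atomic operation... So the update routine becomes: while ( CAS ( &address, oldValue, newValue ) != oldValue ) { re-read oldValue and recompute newValue } Retry: if the CAS fails, another thread has changed the value in the meantime, so we need to retry the operation with the fresh value.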

Example of atomic increment:

while (true) {
    old_val = sequenceNumber.get();
    new_val = old_val + 1;
    if (sequenceNumber.compareAndSet(old_val, new_val)) // CAS
        break;
}
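The example above uses Java-style get()/compareAndSet() calls. Since the benchmarks are described as written in C++, here is a hedged C++ counterpart of the same retry loop. The names sequence_number, next_sequence_number and run_increment_demo are assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<int> sequence_number{0};

// CAS retry loop: try to install old_val + 1; if another thread got there
// first, compare_exchange_weak refreshes old_val and we try again.
int next_sequence_number() {
    int old_val = sequence_number.load();
    while (!sequence_number.compare_exchange_weak(old_val, old_val + 1)) {
        // old_val now holds the value currently in memory; retry with it
    }
    return old_val + 1;
}

// Hypothetical driver: several threads increment the shared counter.
int run_increment_demo(int num_threads, int increments_per_thread) {
    assert(num_threads > 0);
    sequence_number.store(0);
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) next_sequence_number();
        });
    }
    for (auto& w : workers) w.join();
    return sequence_number.load();  // no increment is lost
}
```

For a plain counter, sequence_number.fetch_add(1) would do the same job in one call; the explicit loop is shown because it generalizes to updates that are not simple additions.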

Benchmark: Written in C++ and tested on a 16-core Linux server and an 8-core Solaris system. The following benchmarks were run:
1) Counting: Many threads increment a shared variable.
2) Linked list: Multiple threads insert a node if its key is not already present: if (!found(head, node)) insert(head, node);
3) Hash table: An extension of the linked list.

Benchmark, hash table: Contention: many threads insert nodes into the same bucket. Advantage of using atomic operations: under contention, while one thread is inserting at the tail, other threads can still proceed to search/scan the existing nodes.
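One way to realize the insert-if-absent bucket described above is to append with a CAS on the last next pointer. This is a sketch under stated assumptions, not the benchmark's actual code: it supports insertion only (no deletion, so nodes are never freed while in use), and Node, kBuckets, insert_if_absent, count_nodes and run_table_demo are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct Node {
    int key;
    std::atomic<Node*> next{nullptr};
    explicit Node(int k) : key(k) {}
};

constexpr std::size_t kBuckets = 8;         // assumed table size
std::atomic<Node*> buckets[kBuckets] = {};  // every bucket starts empty

// Insert `key` at the tail of its bucket if absent; true if this call added it.
bool insert_if_absent(int key) {
    assert(key >= 0);
    std::atomic<Node*>* link = &buckets[static_cast<std::size_t>(key) % kBuckets];
    Node* fresh = nullptr;
    for (;;) {
        Node* cur = link->load();
        if (cur == nullptr) {               // reached the tail: try to append
            if (!fresh) fresh = new Node(key);
            Node* expected = nullptr;
            if (link->compare_exchange_strong(expected, fresh)) return true;
            cur = expected;                 // someone appended first; inspect that node
        }
        if (cur->key == key) { delete fresh; return false; }  // already present
        link = &cur->next;                  // keep scanning; readers are never blocked
    }
}

std::size_t count_nodes() {
    std::size_t n = 0;
    for (auto& head : buckets)
        for (Node* p = head.load(); p != nullptr; p = p->next.load()) ++n;
    return n;
}

// Four threads race to insert the same 100 keys; each key must appear once.
// Nodes are intentionally leaked to keep the sketch short.
std::size_t run_table_demo() {
    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t)
        workers.emplace_back([] { for (int k = 0; k < 100; ++k) insert_if_absent(k); });
    for (auto& w : workers) w.join();
    return count_nodes();
}
```

A failed CAS costs only one extra step along the list, and threads that are merely searching never wait for a writer. Supporting deletion safely needs more machinery than shown here.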

Disadvantage of atomic operations: * If the critical section (the writes) is large, there will be many retries, which degrades performance.
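The retry cost can be made visible by counting failed CAS attempts. The sketch below is only an illustration with assumed names (RetryStats, run_retry_demo) and a busy loop standing in for a long update computation. The number of failures depends on the machine and the scheduler, so no particular value should be expected.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

struct RetryStats {
    long total;       // final value of the shared counter
    long failed_cas;  // how many CAS attempts had to be retried
};

RetryStats run_retry_demo(int num_threads, int updates_per_thread, int work_per_update) {
    assert(num_threads > 0);
    std::atomic<long> shared{0};
    std::atomic<long> failed{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < updates_per_thread; ++i) {
                for (;;) {
                    long seen = shared.load();
                    // Stand-in for a large critical section: the longer this takes,
                    // the more likely another thread changes `shared` meanwhile.
                    volatile int sink = 0;
                    for (int w = 0; w < work_per_update; ++w) sink = sink + 1;
                    if (shared.compare_exchange_strong(seen, seen + 1)) break;
                    failed.fetch_add(1);  // all of the work above is thrown away
                }
            }
        });
    }
    for (auto& w : workers) w.join();
    return {shared.load(), failed.load()};
}
```

Calling run_retry_demo with a larger work_per_update lengthens the window between the read and the CAS, which is the situation the slide warns about.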

Conclusion:
* Increasing the number of threads beyond the number of physical cores does not give much improvement.
* Having many buckets makes contention unlikely, so in that case atomic operations do not seem to be advantageous over mutexes.
* Acquiring and releasing locks does not in itself create noticeable overhead; there is overhead only when there is contention.

References:
* The Art of Multiprocessor Programming: Maurice Herlihy and Nir Shavit.
* A Pragmatic Implementation of Non-Blocking Linked-Lists: Timothy L. Harris.