Enforcing Isolation and Ordering in STM Systems

Tatiana Shpeisman et al. Presentation by Ashish Dore.

Introduction What is Transactional Memory (TM)?
A simple concurrency control mechanism, seen as a way to avoid traditional concurrency mechanisms such as locking. The programmer identifies the regions of code that must be atomic, and the system ensures isolation and ordering of their execution.

Ok .. I am good so far .. Does it ??
The authors characterize present STM systems as weakly atomic: non-transactional memory accesses bypass the STM access protocols. So what does it mean for the programmer? It voids the isolation provided to transactional code if there is a data race between transactional and non-transactional code.

So it is good to segregate data
To prevent this situation, it is good to segregate data into a transactional part and a non-transactional part. How easy is it to segregate data? Any thoughts? Let us consider an example.

Aha .. Not all jargon .. Maybe I will not fall asleep during this presentation
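Think of STM systems with eager versioning and lazy versioning. Eager versioning writes to memory in place and keeps an undo log for rollback; lazy versioning buffers writes and applies them to memory at commit.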

What do the authors suggest instead?
Strong atomicity: even non-transactional code has to use the STM access protocols to access memory. It provides isolation and consistent ordering without segregation.

Has anyone else tried this?
Most previous efforts to provide strong atomicity rely on hardware support. Others assume a uniprocessor, assume strict static segregation of data, or do not demonstrate scalability.

So what is in this paper?
The first scalable STM designed for multiprocessors with strong atomicity. An analysis of the problems with weak atomicity. An implementation of strong atomicity via efficient read and write barriers. Optimizations (what is the point of doing this without them?). Finally, a comparison to show that their method is better.

Characterizing weak atomicity behaviors
Giving a semantics for transactions is out of the scope of the paper. All unexpected behaviors arise when transactional and non-transactional code interleave and there is a write to the shared data. The problems are categorized as follows: those shared with locks (improperly synchronized code), those specific to eager versioning or lazy versioning systems, and granularity issues.

Incorrect synchronization … We meet again .. 
Non-repeatable reads, lost updates, dirty reads.

Eager versioning system anomalies
Speculative lost updates, speculative dirty reads.
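To make the speculative lost update concrete, here is a minimal single-threaded sketch that replays one possible interleaving by hand. It assumes an eager versioning STM that writes in place and keeps an undo log; the class and variable names are hypothetical and not taken from the paper.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration: replays one interleaving sequentially, no real threads.
class SpeculativeLostUpdateDemo {
    static int x; // shared location

    static int run() {
        x = 0;
        Deque<Integer> undoLog = new ArrayDeque<>();

        // Transaction T1 (eager versioning): log the old value, then write in place.
        undoLog.push(x);
        x = 1;

        // Non-transactional thread T2 writes x directly, bypassing the STM protocol.
        x = 2;

        // T1 aborts: rollback restores the logged value and wipes out T2's write.
        while (!undoLog.isEmpty()) {
            x = undoLog.pop();
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println("x after abort = " + run()); // prints "x after abort = 0"
    }
}
```

T2's write of 2 disappears even though T1 never committed, which is the behavior a write isolation barrier is meant to rule out.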

Lazy versioning system anomalies
Overlapped writes, buffered writes.

Anomalies due to coarse-grained versioning.
Granular lost updates, granular inconsistent reads.
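A granular lost update can be sketched the same way. The example below assumes a lazy versioning STM whose write buffer works on a whole two-field block rather than on individual fields; that granularity, and all names, are assumptions made for illustration only.

```java
import java.util.Arrays;

// Hypothetical illustration: one interleaving replayed sequentially.
class GranularLostUpdateDemo {
    static int[] run() {
        int[] block = {0, 0}; // block[0] is field x, block[1] is field y

        // Transaction T1 writes only x, but the STM buffers the whole block.
        int[] writeBuffer = block.clone();
        writeBuffer[0] = 1;

        // Non-transactional thread T2 updates the neighbouring field y.
        block[1] = 2;

        // T1 commits: the whole buffered block is copied back, including stale y.
        System.arraycopy(writeBuffer, 0, block, 0, block.length);
        return block;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run())); // prints "[1, 0]"
    }
}
```

T1 never touched y, yet T2's update to y is lost because versioning happened at a coarser granularity than the field.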

Summing up all the anomalies

Venture a guess !!! What is the problem faced by the first example if the system is an eager versioning system? What is the problem if it is a lazy versioning system?

Ok .. Propose a solution already !!
Enforcing isolation and ordering between transactional and non-transactional threads requires read and write isolation barriers. The authors implemented a high-performance STM system with the following features.

What does it consist of?
Extends Java with an atomic {B} construct for declaring an atomic code block. Includes closed and open nesting and user-initiated retry operations. Automatically inserts and optimizes STM operations for code that executes inside a transaction, and isolation barriers for non-transactional code.

And ..
It is based on McRT-STM, which implements optimistic concurrency control. It uses versioning for reads, and two-phase locking and eager versioning for writes. For the static program analysis, the Paddle extension of Soot is used.

How they implemented their ideas
The base STM system has a pointer-sized transaction record that tracks the state of every object accessed by a transaction. The record is in one of two states. Shared state: read access by many transactions. Exclusive state: write access for only one transaction. Each object stores its transaction record in a dedicated field. To support strong atomicity, two more states are introduced.

And they would be ..
Exclusive anonymous state: indicates that some thread is updating the data, with no indication of which one. It means that non-transactional code is updating the data. Private state: the object is visible to only a single thread, so there is never any contention for private objects.

Let's show everything in a table.
Non-transactional reads and writes can now look up the state. In an eager versioning system they can detect dirty reads; in a lazy versioning system they can detect pending updates by a committed transaction. This can be detected by looking at just the second-lowest bit. The lower three bits indicate which state the transaction record is in. This creates effective read and write barriers. How?
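The table itself is not reproduced in this transcript, so the following is only a sketch of how a state could be decoded from the low three bits of a transaction record. The concrete bit patterns (011 for shared, 010 for exclusive anonymous, all ones for private, an aligned owner pointer ending in 00 for exclusive) are assumptions chosen to be consistent with the 010 pattern and the second-lowest-bit check mentioned in the slides; the class and method names are hypothetical.

```java
// Hypothetical sketch: the bit patterns below are ASSUMED, not quoted from the table.
class TxRecordStates {
    enum State { SHARED, EXCLUSIVE, EXCLUSIVE_ANONYMOUS, PRIVATE }

    static final long LOW_BITS = 0b111;
    static final long SHARED_BITS = 0b011;    // assumed: version number in the upper bits
    static final long EXCL_ANON_BITS = 0b010; // assumed: version number in the upper bits
    static final long PRIVATE_BITS = 0b111;   // assumed: record is all ones

    static State decode(long txRecord) {
        long low = txRecord & LOW_BITS;
        if (low == PRIVATE_BITS) return State.PRIVATE;
        if (low == SHARED_BITS) return State.SHARED;
        if (low == EXCL_ANON_BITS) return State.EXCLUSIVE_ANONYMOUS;
        // Assumed: an exclusive record is an aligned owner pointer, so it ends in 00.
        if ((low & 0b11) == 0) return State.EXCLUSIVE;
        throw new IllegalArgumentException("unrecognized record bits: " + low);
    }

    // Under this encoding, only a transaction-owned record has the second-lowest bit clear.
    static boolean ownedByTransaction(long txRecord) {
        return (txRecord & 0b10) == 0;
    }

    public static void main(String[] args) {
        long sharedVersion5 = (5L << 3) | SHARED_BITS;
        System.out.println(decode(sharedVersion5));             // prints "SHARED"
        System.out.println(ownedByTransaction(sharedVersion5)); // prints "false"
    }
}
```

With an encoding of this shape, a non-transactional read only has to test one bit to learn whether a transaction currently owns the data.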

State transitions
BTR: the IA-32 bit test and reset instruction.
CAS: the compare-and-swap operation.

How do the read and write barriers work?
Lazy versioning.
Read barrier: it first loads the Tx record and checks whether the low three bits are 010. That pattern means exclusive, so there is a read conflict. Otherwise it reads the value and then checks whether the Tx record has changed since it was first loaded. If it has not changed, the read is done; if it has, the read is discarded and readConflict is called.
Write barrier: it tries to lock the Tx record by atomically flipping the lowest bit. If that succeeds, it performs the write; otherwise it calls writeConflict.
With lazy versioning there are no dirty reads, so once the Tx record has been checked to determine its state, the barrier can go ahead and update the value. It only needs to check for an update from the most recent transaction.
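The barrier logic described above can be approximated in plain Java with an AtomicLong standing in for the Tx record. This is a sketch under several assumptions: the bit patterns (011 shared, 010 exclusive anonymous), the choice to acquire the record by clearing bit 0 (the role BTR, bit test and reset, would play), and the use of an exception in place of the readConflict and writeConflict handlers are all illustrative, and the real barriers are short machine-code sequences rather than Java methods.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of non-transactional isolation barriers; encodings are ASSUMED.
class IsolationBarrierSketch {
    static final long SHARED = 0b011;        // assumed state bits for "shared"
    static final long VERSION_STEP = 0b1000; // version number lives above the low three bits

    final AtomicLong txRecord = new AtomicLong(SHARED); // starts shared, version 0
    volatile int field;

    static class ConflictException extends RuntimeException {
        ConflictException(String message) { super(message); }
    }

    // Write barrier: acquire by clearing bit 0 (011 -> 010), write, release with a new version.
    void writeBarrier(int newValue) {
        long seen = txRecord.get();
        long acquired = seen & ~1L;
        if ((seen & 0b111) != SHARED || !txRecord.compareAndSet(seen, acquired)) {
            throw new ConflictException("writeConflict");
        }
        field = newValue;
        txRecord.set(acquired + VERSION_STEP + 1); // back to shared, version bumped by one
    }

    // Read barrier: check the state, read the value, then re-check that the record is unchanged.
    int readBarrier() {
        long before = txRecord.get();
        if ((before & 0b111) != SHARED) { // any non-shared state counts as a conflict here
            throw new ConflictException("readConflict");
        }
        int value = field;
        if (txRecord.get() != before) {
            throw new ConflictException("readConflict");
        }
        return value;
    }

    public static void main(String[] args) {
        IsolationBarrierSketch obj = new IsolationBarrierSketch();
        obj.writeBarrier(42);
        System.out.println(obj.readBarrier());                       // prints "42"
        System.out.println(Long.toBinaryString(obj.txRecord.get())); // prints "1011"
    }
}
```

The version bump on release is what lets a concurrent reader notice that the value may have changed between its two loads of the record.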

Quiescence
Quiescence can provide partial isolation and ordering. It can also solve the privatization problem without requiring read and write barriers. However, quiescence and similar approaches do not solve general isolation problems such as speculative dirty reads.

Dynamic escape analysis
We use dynamic escape analysis to solve the privatization problem. It detects whether an object is private (visible to one thread) or public (visible to many). If an object is private, accesses to it need no barriers. Here is what the code looks like.
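The slide's original code is not reproduced in this transcript. As a stand-in, here is a minimal sketch of the idea: a write first checks whether the object is still private and, if so, skips the barrier entirely. The marker values and all names are assumptions, and the full barrier is reduced to a counter.

```java
// Hypothetical sketch: PRIVATE and SHARED_V0 are assumed marker values.
class PrivateFastPathSketch {
    static final long PRIVATE = -1L;     // assumed: record of an object only one thread can see
    static final long SHARED_V0 = 0b011; // assumed: shared state, version 0

    long txRecord = PRIVATE; // objects start out private
    int value;
    int barrierCount;        // how many writes went through the full barrier

    void write(int newValue) {
        if (txRecord == PRIVATE) {
            value = newValue; // private object: no other thread can race, skip the barrier
            return;
        }
        barrierCount++;       // public object: the full write barrier would run here
        value = newValue;
    }

    void publish() {
        txRecord = SHARED_V0; // the object has become visible to other threads
    }

    public static void main(String[] args) {
        PrivateFastPathSketch obj = new PrivateFastPathSketch();
        obj.write(1);
        obj.write(2);
        obj.publish();
        obj.write(3);
        System.out.println(obj.barrierCount); // prints "1"
    }
}
```

Only the write after publication pays for a barrier; the two writes to the still-private object do not.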

Dynamic escape analysis cont ..
For writes of reference types, the write barrier also contains instructions to publish a private object if it becomes public because of the write. If the new value being written references a non-null private object, these instructions call the function publishObject to publish the written object before it becomes visible to other threads.

Dynamic escape analysis cont ..
Each object has a vtable containing a map of the object's fields that hold references. publishObject iterates over these slots, traversing the graph rooted at the object. Every private object encountered during the traversal is made public. The traversal is finite: it does not go beyond public objects, so it cannot follow chains indefinitely.
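The traversal can be sketched as a worklist algorithm. In the sketch below a plain list of references stands in for the vtable's map of reference slots, a boolean flag stands in for the private state of the transaction record, and the holder check in writeRef is an assumption; only the name publishObject comes from the slides.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of publishing a private object graph.
class PublishSketch {
    static class Obj {
        boolean isPrivate = true;                 // objects start out private
        final List<Obj> refs = new ArrayList<>(); // stand-in for the reference slots
    }

    // Makes root and every private object reachable from it public; returns how many changed.
    static int publishObject(Obj root) {
        int published = 0;
        Deque<Obj> worklist = new ArrayDeque<>();
        if (root != null && root.isPrivate) {
            root.isPrivate = false;
            published++;
            worklist.push(root);
        }
        while (!worklist.isEmpty()) {
            Obj current = worklist.pop();
            for (Obj child : current.refs) {
                // Public objects are skipped, so the walk never goes beyond them
                // and terminates even when the graph contains cycles.
                if (child != null && child.isPrivate) {
                    child.isPrivate = false;
                    published++;
                    worklist.push(child);
                }
            }
        }
        return published;
    }

    // Reference store: publish the new value before it becomes reachable from a public holder.
    static void writeRef(Obj holder, Obj newValue) {
        if (!holder.isPrivate && newValue != null && newValue.isPrivate) {
            publishObject(newValue);
        }
        holder.refs.add(newValue);
    }

    public static void main(String[] args) {
        Obj shared = new Obj();
        shared.isPrivate = false;
        Obj a = new Obj();
        Obj b = new Obj();
        a.refs.add(b);
        writeRef(shared, a);
        System.out.println(a.isPrivate + " " + b.isPrivate); // prints "false false"
    }
}
```

Marking an object public before pushing it on the worklist is what guarantees each object is visited at most once.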

Static not accessed in transaction analysis.
A memory write does not need a barrier if the memory it writes is never accessed in a transaction. A memory read does not need a barrier if the memory it reads is never written in a transaction. This is shown in the table.

Static not accessed in transaction analysis cont ..

Considerable work has been done on detecting thread-local objects. Here not-accessed-in-transaction analysis (NAIT) is compared with thread-local analysis (TL). NAIT complements TL in two ways. First, truly thread-shared data may never be accessed in a transaction.

A common example is data handoff, where the queues used are accessed in critical sections but the objects handed off never are. NAIT optimizes this case, whereas TL has to treat these objects as shared as well, which limits its effectiveness. Second, TL treats a static field as shared even if it is used by only a single thread; NAIT does not.

Pointer analysis
For an access to a field or array outside a transaction, we need to know whether it may touch an object that is also accessed in a transaction. The Paddle extension to Soot is used to compute a points-to set for each bytecode that accesses memory. The analysis is whole-program, field-sensitive, and flow-insensitive.

Pointer analysis cont ..
But for a context-insensitive analysis to tell the two cases apart, there would need to be two copies of the bytecode: one for in-transaction and one for not-in-transaction. To avoid this, we simulate the effect of duplication by making the analysis context-sensitive with two contexts: in-transaction and not-in-transaction.

Pointer analysis cont ..
All calls inherit the current context, and calls that are lexically inside an atomic block are labeled in-transaction. Thus, after pointer analysis, there are two points-to sets for each bytecode that accesses memory.

Annotating Memory operations
After the pointer analysis, two more passes over the code are required to annotate the bytecode with barrier-removal information. First we figure out how each abstract object is accessed within transactions, using both the in-transaction and not-in-transaction points-to sets for loads and stores.

Annotating Memory operations cont ..
For accesses that are not in a transaction, no barrier is needed if the access is a load and no object in its points-to set is written in a transaction, or if it is a store and no object in its points-to set is read or written in a transaction.
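The barrier-removal rule for accesses outside transactions can be written down directly. This sketch assumes each abstract object has already been summarized with two flags (read in a transaction, written in a transaction) and that the points-to set of the access is available as a plain set; the types and names are hypothetical and unrelated to the Soot or Paddle APIs.

```java
import java.util.Set;

// Hypothetical sketch of the not-accessed-in-transaction (NAIT) decision.
class NaitSketch {
    static final class AbstractObject {
        final String name;
        final boolean readInTx;
        final boolean writtenInTx;

        AbstractObject(String name, boolean readInTx, boolean writtenInTx) {
            this.name = name;
            this.readInTx = readInTx;
            this.writtenInTx = writtenInTx;
        }
    }

    // A non-transactional load needs a barrier only if some target may be written in a transaction.
    static boolean loadNeedsBarrier(Set<AbstractObject> pointsTo) {
        return pointsTo.stream().anyMatch(o -> o.writtenInTx);
    }

    // A non-transactional store needs a barrier if some target may be read or written in a transaction.
    static boolean storeNeedsBarrier(Set<AbstractObject> pointsTo) {
        return pointsTo.stream().anyMatch(o -> o.readInTx || o.writtenInTx);
    }

    public static void main(String[] args) {
        AbstractObject queue = new AbstractObject("queue", true, true);
        AbstractObject payload = new AbstractObject("payload", false, false);
        System.out.println(loadNeedsBarrier(Set.of(payload)));         // prints "false"
        System.out.println(storeNeedsBarrier(Set.of(queue, payload))); // prints "true"
    }
}
```

In the data handoff case, the payload objects are never touched inside a transaction, so every access to them loses its barrier, while accesses that may reach the queue keep theirs.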

Static analysis results
The same points-to information is used for NAIT and TL for comparison purposes. NAIT removes more barriers than TL, and it also removes all the barriers that TL removes.

JIT optimizations Do we need any more optimizations?
Of course we always do .. We want things to get done at the speed with which my stock portfolio hit the ground. ( OK maybe that is a little too fast .. Slower than that would be ok).

JIT optimizations cont ..
The JIT does not insert barriers for immutable objects. It detects and eliminates barriers on thread-local objects via escape analysis. Barrier aggregation detects multiple barriers to the same object in the same basic block and joins them together, which should improve performance. An aggregated barrier covers accesses to a single object, and aggregation does not cross basic blocks.
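The effect of barrier aggregation can be illustrated by counting barriers before and after. The sketch below is a deliberate simplification: a basic block is just a list of the object names it accesses, reads and writes are not distinguished, and every name is hypothetical. It only shows the two constraints stated above, one aggregated barrier per object and no aggregation across basic blocks.

```java
import java.util.HashSet;
import java.util.List;

// Hypothetical sketch: counts barriers with and without per-block aggregation.
class BarrierAggregationSketch {
    // Without aggregation, every access carries its own barrier.
    static int barriersBefore(List<List<String>> basicBlocks) {
        int total = 0;
        for (List<String> block : basicBlocks) {
            total += block.size();
        }
        return total;
    }

    // With aggregation, each basic block needs one barrier per distinct object it accesses.
    static int barriersAfter(List<List<String>> basicBlocks) {
        int total = 0;
        for (List<String> block : basicBlocks) {
            total += new HashSet<>(block).size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> blocks = List.of(
                List.of("a", "a", "b", "a"), // one block: three accesses to a, one to b
                List.of("a"));               // a separate block touching a again
        System.out.println(barriersBefore(blocks)); // prints "5"
        System.out.println(barriersAfter(blocks));  // prints "3"
    }
}
```

The access to a in the second block keeps its own barrier because aggregation stops at the basic block boundary.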

Performance
The evaluation investigates the cost of strong atomicity and the effectiveness of the optimizations using both transactional and non-transactional workloads. For non-transactional benchmarks, the overhead of strong atomicity is measured by running each benchmark with and without read and write isolation barriers.

Performance cont
For transactional benchmarks, the authors investigate the performance of a weakly atomic execution (with no isolation barriers), a strongly atomic execution (with isolation barriers), and a lock-based synchronized execution (with synchronized regions in the source instead of atomic ones).

Performance cont
The aim is to show that enforcing strong atomicity has little effect on the scalability of multi-threaded transactional workloads, and that the optimizations are extremely effective in mitigating the overhead for non-transactional and single-threaded workloads.

Overhead of strong atomicity on SPEC JVM98 without optimizations.

Read barrier overhead.

Write barrier overhead.

TSP over multiple threads.

OO7 over multiple threads.

Spec JBB over multiple threads.

Conclusions
Strong atomicity is required to guarantee isolation and ordering. Since there are problems with weak atomicity, we need a stricter mechanism. The cost of this strictness is reduced by the optimizations used.
