An Operational Approach to Relaxed Memory Models. Xinyu Feng, University of Science and Technology of China. Joint work with Yang Zhang @ USTC.
Why Memory Models: [Diagram: a concurrent program C1 || C2 goes through the compiler and runs against memory to produce a result.] Which reads see which writes?
Two different philosophies for RMM: (1) Define behaviors of all programs, such as x86-TSO and JMM. These give the DRF guarantee. Behaviors of racy programs must be weak, to incorporate mainstream optimizations, but not too weak: type safety, security, etc. need to prohibit thin-air reads. (2) Define behaviors of only DRF programs, as in C/C++11. But DRF with low-level atomics is difficult to understand.
Operational Happens-Before Memory Model (OHMM): Follows the first philosophy. Motivated by solving some of the problems of JMM. Uses an abstract machine to simulate relaxed behaviors; the memory model is defined as operational semantics. (Almost) avoids thin-air reads. Avoids many surprising behaviors and bugs of JMM. Weak enough to allow common compiler optimizations.
Basic settings: Two types of memory cells: normal and volatile. A volatile read/write roughly corresponds to a C++ load-acquire/store-release. Unlike C++, volatile memory cells cannot be used as normal ones. This talk: non-volatile memory only.
Design of the abstract machine: Starting from an SC machine, add three features to relax the program behaviors: event buffers, history-based memory, and replay of events. The design is similar to [Yang et al. 2002], with replay as a new mechanism for compiler optimizations.
The Abstract Machine - SC: [Diagram: processors T1 ... Tn attached directly to a shared memory.]
The Abstract Machine – Event Buffer: [Diagram: processors T1 ... Tn, together with a timer, emit timestamped events <<T1, t>, i>, ..., <<Tn, t’>, i'> into an event buffer that sits between the processors and memory.]
Events and Event Buffer: Instructions are converted to events following the interleaving semantics. Program: t1: x = 1; r1 = y; || t2: y = 1; r2 = x; Events: <<t1, 0>, x = 1>, <<t2, 1>, y = 1>, <<t2, 2>, r2 = x>, <<t1, 3>, r1 = y>.
Events and Event Buffer: Events from the same thread can be reordered. Program: t1: x = 1; r1 = y; || t2: y = 1; r2 = x; Events: <<t1, 0>, x = 1>, <<t2, 1>, y = 1>, <<t2, 2>, r2 = x>, <<t1, 3>, r1 = y>. Execution order: 2, 3, 0, 1. Result: r1 = r2 = 0.
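The reordering on this slide can be replayed with a small simulation. This is only a sketch under stated assumptions: the tuple encoding of events, the helper names `conflicts`, `is_legal` and `run`, the initial values x = y = 0, and the rule that two same-thread events must keep their order only when they touch the same variable are all illustrative choices, not the machine's actual definition.

```python
# Each event is ((thread, timestamp), (kind, register, variable, value)).
# Timestamps follow the interleaving shown on the slide.
EVENTS = [
    (("t1", 0), ("write", None, "x", 1)),    # x = 1
    (("t2", 1), ("write", None, "y", 1)),    # y = 1
    (("t2", 2), ("read", "r2", "x", None)),  # r2 = x
    (("t1", 3), ("read", "r1", "y", None)),  # r1 = y
]


def conflicts(a, b):
    """Assumption: same-thread events conflict only on the same variable."""
    (thread_a, _), (_, _, var_a, _) = a
    (thread_b, _), (_, _, var_b, _) = b
    return thread_a == thread_b and var_a == var_b


def is_legal(order):
    """Reject execution orders that swap two conflicting events."""
    for i, earlier in enumerate(order):
        for later in order[i + 1:]:
            if conflicts(earlier, later) and earlier[0][1] > later[0][1]:
                return False
    return True


def run(order):
    """Execute events in the given order against a plain memory."""
    memory = {"x": 0, "y": 0}  # assumed initial values
    registers = {}
    for _, (kind, reg, var, value) in order:
        if kind == "write":
            memory[var] = value
        else:
            registers[reg] = memory[var]
    return registers


# Execution order 2, 3, 0, 1 from the slide.
slide_order = [EVENTS[i] for i in (2, 3, 0, 1)]
print(is_legal(slide_order), run(slide_order))
```

Under these assumptions the slide's order is legal and both reads return 0, because each thread's read touches a different variable than its own write.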
Limitation of Event Reordering: Reordering of events is not weak enough for the following program: t1: x = 1; r1 = x; || t2: x = 2; r2 = x; Can we get r1 = 2, r2 = 1? Reordering is not allowed here due to data dependency!
The Abstract Machine – History-Based Memory: [Diagram: processors T1 ... Tn and a timer feed timestamped events <<T1, t>, i>, ..., <<Tn, t’>, i'> into the event buffer; each memory cell stores an update history such as <<t1,0>, n1>, <<t2,1>, n2>, ..., <<t2,3>, n3>, <<t1,4>, n4>, <<t2,5>, n5>, <<t1,7>, n7>, <<t1,9>, n9>.]
History-Based Memory: We keep all the write operations in the corresponding memory cell. Update history of x: <<t1,0>, n1>, <<t2,1>, n2>, ..., <<t2,3>, n3>, <<t1,4>, n4>, <<t2,5>, n5>, <<t1,7>, n7>, <<t1,9>, n9>. A read such as <<t1, 8>, r = x> sees either (1) the most recent write that happens-before it, or (2) a write from another thread that does not happen-before it.
History-Based Memory: Program: t1: x = 1; r1 = x; || t2: x = 2; r2 = x; Update history of x: <<t1,1>, 1>, <<t2,2>, 2>. Reads: <<t1, 3>, r1 = x>, <<t2, 4>, r2 = x>. r1 = 2, r2 = 1?
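The read rule for history-based memory can be sketched as a function that returns every value a read may see. The names `Write` and `visible_values` and the `initial` parameter are hypothetical. The sketch also assumes that, with non-volatile cells only, a write happens-before a read exactly when it is an earlier write by the same thread, which simplifies whatever the full model defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    thread: str
    time: int
    value: int


def visible_values(history, reader, read_time, initial=0):
    """Return the set of values a read by `reader` at `read_time` may see.

    Assumption: happens-before is program order only, so the writes that
    happen-before the read are the reader's own earlier writes.
    """
    own = [w for w in history if w.thread == reader and w.time < read_time]
    others = [w for w in history if w.thread != reader]

    # (2) Any write from another thread that does not happen-before the read.
    visible = {w.value for w in others}

    # (1) The most recent write that happens-before the read; fall back to
    # the assumed initial value when the reader has not written yet.
    if own:
        visible.add(max(own, key=lambda w: w.time).value)
    else:
        visible.add(initial)
    return visible


# Update history of x from the slide: <<t1,1>, 1> and <<t2,2>, 2>.
history_x = [Write("t1", 1, 1), Write("t2", 2, 2)]

r1_choices = visible_values(history_x, "t1", 3)  # <<t1, 3>, r1 = x>
r2_choices = visible_values(history_x, "t2", 4)  # <<t2, 4>, r2 = x>
print(r1_choices, r2_choices)
```

Under this reading both reads may return either 1 or 2, so the outcome r1 = 2, r2 = 1 becomes possible without reordering the dependent events.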
Support of Compiler Analysis: Still cannot allow the following behavior. Initially: x = 0, y = 0. t1: r1 = x; r2 = x; if (r1 == r2) y = 2; || t2: r3 = y; x = r3; r1 = r2 = r3 = 2?
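Compiler Optimization Can Be Smart: Initially: x = 0, y = 0. Original: t1: r1 = x; r2 = x; if (r1 == r2) y = 2; || t2: r3 = y; x = r3; After redundant read elimination, r2 = x becomes r2 = r1, the test becomes if (true), and the write y = 2 can move ahead of the reads: t1: y = 2; r1 = x; r2 = r1; || t2: r3 = y; x = r3; r1 = r2 = r3 = 2? Must be allowed!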
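To see why the optimized program forces the model to allow r1 = r2 = r3 = 2, one can enumerate all sequentially consistent interleavings of both versions. This is a sketch: the step functions and the helpers `interleavings` and `sc_outcomes` are illustrative names, and the optimized thread is taken to be y = 2; r1 = x; r2 = r1, which is one reading of the slide.

```python
def interleavings(a, b):
    """Yield every merge of lists a and b that keeps each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


# Thread 1, original form.
def read_r1(s): s["r1"] = s["x"]
def read_r2(s): s["r2"] = s["x"]
def cond_write_y(s):
    if s["r1"] == s["r2"]:
        s["y"] = 2

# Thread 1, after the assumed optimization.
def write_y(s): s["y"] = 2
def copy_r2(s): s["r2"] = s["r1"]

# Thread 2.
def read_r3(s): s["r3"] = s["y"]
def write_x(s): s["x"] = s["r3"]


def sc_outcomes(thread1, thread2):
    """Collect (r1, r2, r3) over all SC interleavings, from x = y = 0."""
    results = set()
    for schedule in interleavings(thread1, thread2):
        state = {"x": 0, "y": 0, "r1": 0, "r2": 0, "r3": 0}
        for step in schedule:
            step(state)
        results.add((state["r1"], state["r2"], state["r3"]))
    return results


original = [read_r1, read_r2, cond_write_y]
optimized = [write_y, read_r1, copy_r2]
thread2 = [read_r3, write_x]

print((2, 2, 2) in sc_outcomes(original, thread2))
print((2, 2, 2) in sc_outcomes(optimized, thread2))
```

The outcome is unreachable for the original program under SC but reachable for the optimized one, so a memory model that wants to permit this optimization has to admit it for the original program as well.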
Support of Compiler Analysis: Still cannot allow the following behavior: r1 = r2 = r3 = 2? Our idea: use dynamic execution to simulate static analysis (or symbolic execution). Duplicate the first two lines: t1: r1 = x; r2 = x; r1 = x; r2 = x; if (r1 == r2) y = 2; || t2: r3 = y; x = r3;
The Abstract Machine - Replay: [Diagram: the machine extended with a replay buffer next to each processor, alongside the timer, the event buffer, and memory.]
Replay Buffer: Instead of rewriting code, we put events into the replay buffer when they are executed, so that they can be executed a second time later. Program: t1: r1 = x; r2 = x; if (r1 == r2) y = 2; || t2: r3 = y; x = r3; Replaying r1 = x; r2 = x; from the replay buffer has the effect of duplicating the first two lines. r1 = r2 = r3 = 2? Need to be careful to preserve sequential semantics.
Some constraints for replay: When a read is replayed, its timestamp doesn’t change. Example: r = x; x = r+1; When r = x is replayed, it cannot see the write x = r+1.
Some constraints for replay (continued): When a write is replayed, the old write in the history is overwritten. Recall the update history stored in memory cells: <<t1,0>, n1>, <<t2,1>, n2>, ..., <<t2,3>, n3>, <<t1,4>, n4>, <<t2,5>, n5>, <<t1,7>, n7>, <<t1,9>, n9>. Replaying the write at <t1,4> with value N replaces <<t1,4>, n4> with <<t1,4>, N>. You won’t end up having two events with the same timestamp but different update values.
Some constraints for replay (continued): If a write has been seen by other threads, it cannot be replayed. Use Boolean flags (true/false) on each entry of the update history <<t1,0>, n1>, <<t2,1>, n2>, ..., <<t2,3>, n3>, <<t1,4>, n4>, <<t2,5>, n5>, <<t1,7>, n7>, <<t1,9>, n9> to remember whether it has been seen (by others).
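The constraints on replaying writes can be sketched as a small update-history structure. The class and method names (`Entry`, `CellHistory`, `mark_seen`, `replay_write`) are hypothetical, and the sketch assumes that a timestamp identifies a write uniquely. It covers only the overwrite rule and the seen-by-others flag, not the rest of the machine.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    thread: str
    time: int
    value: int
    seen_by_others: bool = False  # the Boolean flag from the slide


class CellHistory:
    """Update history of one memory cell (illustrative sketch)."""

    def __init__(self):
        self.entries = []

    def write(self, thread, time, value):
        self.entries.append(Entry(thread, time, value))

    def mark_seen(self, time, reader):
        """Flag the write at `time` if it is read by a different thread."""
        for entry in self.entries:
            if entry.time == time and entry.thread != reader:
                entry.seen_by_others = True

    def replay_write(self, thread, time, new_value):
        """Replay a write, keeping its timestamp.

        Returns False when another thread has already seen the write;
        otherwise overwrites the old value in place, so the history never
        holds two entries with the same timestamp.
        """
        for entry in self.entries:
            if entry.thread == thread and entry.time == time:
                if entry.seen_by_others:
                    return False
                entry.value = new_value
                return True
        raise KeyError((thread, time))


history = CellHistory()
history.write("t1", 4, 10)

first_replay = history.replay_write("t1", 4, 11)   # allowed: nobody saw it
history.mark_seen(4, reader="t2")                  # t2 reads the write
second_replay = history.replay_write("t1", 4, 12)  # refused: already seen
print(first_replay, second_replay, history.entries)
```

Overwriting in place, rather than appending a second entry, is what keeps each timestamp tied to a single value.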
Necessary to prevent the following behavior:
Summary: [Diagram: the complete abstract machine, with processors T1 ... Tn, a timer, a replay buffer next to each processor, the event buffer, and memory.]
Properties of OHMM: DRF guarantee (DRF defined under SC semantics). (Almost) passes the JMM Causality Test Cases, except two controversial ones (test cases 5 and 10).
Test case 5: JMM decides to prohibit this behavior, but the decision was controversial on the mailing list (at least the value 1 does appear in the program).
Properties of OHMM (continued): Avoids some counter-intuitive/buggy behaviors of JMM [Aspinall & Sevcik, 2007].
Properties of OHMM (continued): Soundness with respect to program transformations.
Soundness w.r.t. Program Transformations: Results for SC, JMM and JMM-Alt are taken from [Sevcik and Aspinall, ECOOP 2008]. JMM-Alt refers to [Aspinall and Sevcik, TPHOLs 2007], a fixed version of JMM. A grain of salt: the transformations used for OHMM are defined syntactically and are less general than their semantically defined counterparts in [Sevcik and Aspinall, ECOOP 2008].
Summary: OHMM: event buffer, history-based memory, and replay. Properties: Supports the DRF guarantee (proved in Coq). Weak enough to support compiler optimization, thanks to the replay mechanism. No out-of-thin-air reads (?). Avoids surprising behaviors in JMM. Lockless programs have more relaxed semantics than in JMM; the semantics of locks is stronger. Adding more synchronization reduces (instead of increases) behaviors.
Thank you!