Type Systems For Distributed Data Sharing Ben Liblit, Alex Aiken, and Katherine Yelick University of California, Berkeley
Why Shared/Private Matters: data location management; cache coherence; race condition detection; program/algorithm documentation; consistency model relaxation; synchronization elimination; autonomous garbage collection; security.
Highlights of This Talk: (1) review of the underlying memory model, originally in [Liblit et al, POPL ’00], which captures representation but not sharing; (2) a suite of type systems for data sharing, since one size does not fit all; (3) overview of type inference; (4) selected experimental findings.
Distributed Memory Model: Multiple machines, each with local memory. Global memory is the union of the local memories. Two kinds of pointers are distinguished: a local pointer points to local memory only and is just an address; a global pointer points anywhere and is a (machine, address) pair. The two kinds have different representations and operations.
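The two pointer representations can be sketched in a few lines of Python. This is only an illustrative model, not the authors' implementation: the names `LocalPtr`, `GlobalPtr`, `Cluster`, and `widen` are hypothetical, and each machine's memory is modeled as a dictionary from addresses to values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalPtr:
    """Hypothetical local pointer: a bare address, meaningful only on its own machine."""
    address: int


@dataclass(frozen=True)
class GlobalPtr:
    """Hypothetical global pointer: a (machine, address) pair, valid anywhere."""
    machine: int
    address: int


class Cluster:
    """Global memory modeled as the union of per-machine local memories."""

    def __init__(self, n_machines):
        self.memory = [dict() for _ in range(n_machines)]

    def store(self, machine, address, value):
        self.memory[machine][address] = value

    def deref_local(self, here, ptr):
        # A local dereference implicitly uses the current machine.
        return self.memory[here][ptr.address]

    def deref_global(self, ptr):
        # A global dereference names the machine explicitly.
        return self.memory[ptr.machine][ptr.address]


def widen(here, ptr):
    """Coerce a local pointer held on machine `here` into a global pointer."""
    return GlobalPtr(here, ptr.address)
```

Widening is always possible because the holder of a local pointer knows which machine it is on; going the other way requires knowing that the global pointer's machine component is the current machine.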
Type Grammar Integers and pointers; unboxed pairs in paper All indirection (boxing) is explicit Pointers are either local or global Coercion, not subtyping
Review of Global Dereferencing: Standard Approach Unsound x = 5 x =
Review of Global Dereferencing: Sound With Type Expansion 5 x =
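The typing rules on these slides did not survive extraction, so the following is a hedged sketch of the idea behind type expansion rather than the rule as presented. The assumption is that a local pointer read through a global pointer is relative to the remote machine, so the type of the result must be widened to global. The names `Int`, `Ptr`, `expand`, and `deref_type` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Int:
    """The integer type."""


@dataclass(frozen=True)
class Ptr:
    """A pointer type with a width ("local" or "global") and a target type."""
    width: str
    target: object


def expand(t):
    """Widen a top-level local pointer type to global; other types are unchanged."""
    if isinstance(t, Ptr):
        return Ptr("global", t.target)
    return t


def deref_type(t):
    """Type of dereferencing an expression of pointer type `t`."""
    if not isinstance(t, Ptr):
        raise TypeError("cannot dereference a non-pointer type")
    if t.width == "local":
        return t.target
    # Through a global pointer, any local pointer that comes back refers to
    # the remote machine's memory, so its type is expanded to global.
    return expand(t.target)
```

Under this sketch, dereferencing a global pointer to a local pointer to int yields a global pointer to int, whereas the unsound standard approach would yield a local pointer to int.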
Representation Versus Sharing Consider obvious assumptions: local pointers address private data global pointers address shared data 5
Representation Versus Sharing Locally pointed-to data might not be private 5
Representation Versus Sharing Locally pointed-to data might not be private Because of local / global aliasing x = 5
Representation Versus Sharing Locally pointed-to data might not be private Because of transitivity + pointer widening y = 5 y =
Representation Versus Sharing Globally pointed-to data might not be shared What if “y” never actually happens? y = 5 y =
Distinct, but not Independent Local pointer to shared data: Local pointer to private data: Global pointer to shared data: Global pointer to private data: ?!? Several possible approaches Determines what “private” really means Determines which clients can benefit
Sharing Qualifiers: shared, private, and mixed. Polymorphism is needed in practice. Mixed is a supertype of shared and private: local access only, but assume others may be watching. The ordering has a top (mixed), but no bottom.
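The qualifier ordering is small enough to write down directly. The sketch below assumes exactly the three qualifiers named on the slide, with mixed above both shared and private and nothing below them; the function names `is_subqualifier` and `join` are illustrative only.

```python
SHARED, PRIVATE, MIXED = "shared", "private", "mixed"


def is_subqualifier(a, b):
    """True when qualifier `a` may be used where qualifier `b` is expected."""
    # Every qualifier is below itself and below mixed; shared and private
    # are unrelated to each other.
    return a == b or b == MIXED


def join(a, b):
    """Least qualifier that is above both `a` and `b`."""
    return a if a == b else MIXED
```

Because there is no bottom, shared and private have an upper bound (mixed) but no common lower bound.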
Augmented Type Grammar Allow subtyping of pointers But not across pointers, since we allow assignment Allocation is explicitly shared or private
Late Enforcement: Limited Use of Global Pointers
Late Enforcement: Applicability Data location management Cache coherence Race condition detection Program/algorithm documentation Consistency model relaxation Synchronization elimination Autonomous garbage collection (in practice) Security
Why Garbage Collection Breaks: (1) Locally allocate some private data. (2) Send its address to another machine. (3) Forget the original local pointer. (4) Garbage collect the now-unreachable private data. (5) Later, retrieve the global pointer. (6) Coerce it back to local (runtime check), yielding a local pointer to data that has already been collected.
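The failure scenario can be replayed with a toy model. This is a deliberately simplified sketch: the `Machine` class, its methods, and `run_scenario` are hypothetical, heap values are flat (the collector does not trace pointers inside objects), and "autonomous" is taken to mean that the collector consults only the machine's own roots.

```python
class Machine:
    """Toy machine with a local heap and a set of locally known roots."""

    def __init__(self, machine_id):
        self.machine_id = machine_id
        self.heap = {}        # address -> value
        self.roots = set()    # addresses reachable from local pointers
        self.next_address = 0

    def allocate(self, value):
        address = self.next_address
        self.next_address += 1
        self.heap[address] = value
        self.roots.add(address)
        return address

    def forget(self, address):
        self.roots.discard(address)

    def collect(self):
        # Autonomous collection: only local roots are consulted, so
        # references held by other machines are invisible here.
        for address in list(self.heap):
            if address not in self.roots:
                del self.heap[address]

    def coerce_to_local(self, global_ptr):
        # Runtime check: the global pointer must name this machine.
        machine, address = global_ptr
        if machine != self.machine_id:
            raise ValueError("global pointer does not refer to this machine")
        return address


def run_scenario():
    m0 = Machine(0)
    address = m0.allocate(5)               # locally allocate private data
    escaped = (m0.machine_id, address)     # its address is sent to another machine
    m0.forget(address)                     # the original local pointer is dropped
    m0.collect()                           # the data looks unreachable and is freed
    local = m0.coerce_to_local(escaped)    # later: the check passes
    return local in m0.heap                # is the data still there?
```

In this model `run_scenario()` returns `False`: the coercion check succeeds, yet the address no longer refers to live data.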
Export Enforcement: No Escape of Private Addresses Note that τ′ might reference private data Autonomous garbage collection: OK Security: not OK
Early Enforcement: Shared is Transitively Closed
Recap of Enforcement Strategies Late enforcement Anything can point to anything Restricted global dereference & assignment y = 3 5
Recap of Enforcement Strategies Export enforcement Can only reveal shared addresses Still restrict global pointer operations y = 3 5
Recap of Enforcement Strategies Early enforcement Shared universe is transitively closed Global pointer restrictions trivially satisfied y = 3 5
Type Inference: Constraint Generation. Type structure is already known, including local / global. Induce constraints on sharing qualifiers: δ = shared from global dereference / assignment; δ ≤ δ′ from assignments; δ = δ′ from various other operations. Stricter enforcement adds more constraints: δ = shared ⇒ δ′ = shared.
Type Inference: Constraint Resolution. Given constraints: δ ≤ δ1, private ≤ δ1, δ ≤ δ2, shared ≤ δ2.
Type Inference: Constraint Resolution. Two “minimal” solutions: δ = shared ⇒ δ1 = mixed ∧ δ2 = shared; δ = private ⇒ δ1 = private ∧ δ2 = mixed.
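The two minimal solutions can be checked by brute force. The sketch below assumes the constraint set δ ≤ δ1, private ≤ δ1, δ ≤ δ2, shared ≤ δ2, which is the reading consistent with the two solutions listed; the variable names `d`, `d1`, `d2` stand for δ, δ1, δ2, and all function names are hypothetical.

```python
from itertools import product

QUALIFIERS = ("shared", "private", "mixed")
VARIABLES = ("d", "d1", "d2")

# Each pair (lower, upper) means lower <= upper. This constraint set is an
# assumption chosen to match the two solutions stated on the slide.
CONSTRAINTS = [("d", "d1"), ("private", "d1"), ("d", "d2"), ("shared", "d2")]


def leq(a, b):
    """Qualifier ordering: shared <= mixed, private <= mixed, and a <= a."""
    return a == b or b == "mixed"


def value(term, env):
    """Look up a variable; constants evaluate to themselves."""
    return env.get(term, term)


def solutions():
    """Enumerate every assignment that satisfies all constraints."""
    found = []
    for choice in product(QUALIFIERS, repeat=len(VARIABLES)):
        env = dict(zip(VARIABLES, choice))
        if all(leq(value(lo, env), value(hi, env)) for lo, hi in CONSTRAINTS):
            found.append(env)
    return found


def pointwise_leq(a, b):
    return all(leq(a[v], b[v]) for v in VARIABLES)


def minimal(candidates):
    """Keep solutions that have no other solution pointwise below them."""
    return [s for s in candidates
            if not any(o != s and pointwise_leq(o, s) for o in candidates)]
```

Since the ordering has no bottom, the two minimal solutions are incomparable and there is no single least solution, which is why the resolution needs a bias.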
Type Inference: Biased Constraint Resolution. (1) Push “shared” and “mixed” forward. (2) Identify qualifiers which cannot be private. (3) Set all other qualifiers to private. (4) Push “private” forward. (5) Set remaining qualifiers to “shared” or “mixed”. Result on the running example: δ = private, δ1 = private, δ2 = mixed.
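The five steps above can be sketched as a small solver that prefers private wherever possible. This is an illustration under stated assumptions, not the Titanium compiler's algorithm: every constraint has the form lower ≤ variable, where lower is a variable or one of the constants, equality constraints are not modeled, and the names `solve_biased` and `forward_closure` are hypothetical. The example constraint set is the one consistent with the final result shown on the slides.

```python
def forward_closure(constraints, seed_constants, seed_variables=()):
    """Variables reachable by following lower <= upper edges from the seeds."""
    reached = set(seed_variables)
    changed = True
    while changed:
        changed = False
        for lower, upper in constraints:
            if (lower in seed_constants or lower in reached) and upper not in reached:
                reached.add(upper)
                changed = True
    return reached


def solve_biased(variables, constraints):
    """Resolve qualifier constraints with a bias toward "private".

    Each constraint is a pair (lower, upper) meaning lower <= upper, where
    upper is a variable and lower is a variable or a qualifier constant.
    """
    # Steps 1-2: anything above "shared" or "mixed" cannot be private.
    not_private = forward_closure(constraints, {"shared", "mixed"})
    # Step 3: every other qualifier is set to private.
    private_vars = set(variables) - not_private
    # Step 4: push "private" (and "mixed") forward; a non-private variable
    # that sits above something private or mixed must be mixed.
    above_private = forward_closure(constraints, {"private", "mixed"}, private_vars)
    mixed_vars = above_private - private_vars
    # Step 5: whatever remains is shared.
    result = {}
    for v in variables:
        if v in private_vars:
            result[v] = "private"
        elif v in mixed_vars:
            result[v] = "mixed"
        else:
            result[v] = "shared"
    return result


EXAMPLE = [("d", "d1"), ("private", "d1"), ("d", "d2"), ("shared", "d2")]
```

Calling `solve_biased(["d", "d1", "d2"], EXAMPLE)` selects the solution with δ and δ1 private and δ2 mixed, which is the minimal solution that maximizes the number of private qualifiers.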
Implementation for Titanium. Titanium is Java + SPMD extensions: objects, classes, interfaces, methods; multidimensional arrays, templates; local / global, communications primitives. Sharing validation is implemented as type checking. Sharing inference is implemented as a compiler analysis: late or early enforcement; whole-program or partial.
Experimental Findings: Consistency Model Relaxation. Titanium has a very weak consistency model. A sequential model is preferred, but is it too slow? Sequential consistency is overkill for private data, so use a hybrid: weakly consistent on private data, sequentially consistent on shared data. Compare to the weak and fully sequential models. Test platform: four-way Pentium III SMP at 550 MHz.
Experimental Findings: Consistency Model Relaxation
Experimental Findings: Data Location Management. Tally allocations by type at run time. Tremendous variation: 1% to 100% of allocated bytes are private; 45% in the large gas benchmark. Sensitivity to enforcement policy: amr, 74% with late enforcement / 19% with early enforcement.
Summary. “Private” might not mean what you think. The type systems generalize earlier (often implicit) designs and are amenable to efficient type inference. Experimental implementation: ideas and algorithms scale to a real system; more aggressive clients are needed; potential for stronger, phase-aware inference.