Hawkeye: Effective Discovery of Dataflow Impediments to Parallelization
Omer Tripp, John Field, Greta Yorsh, Mooly Sagiv
Dataflow Impediments to Parallelization
    public void set(Object o) { this.f = calc_f(o); }
    public void process() {
        Object o = this.f;
        if (o == null) { doA(); } else { doB(); }
    }
    public void setAndProcess(Object o) { set(o); process(); }
Can we run set(o) || process()? No: process() reads this.f after set(o) writes it, which is a RAW (read-after-write) dependency.
Sometimes It’s Less Obvious
    for (Vertex cutpoint : this.cutpoints) {
        UndirectedGraph subgraph = new SimpleGraph();
        subgraph.addVertex(cutpoint);
        this.cutpointGraphs.put(cutpoint, subgraph);
        this.addVertex(subgraph);
        Set<UndirectedGraph> blocks = this.vertex2blocks.get(cutpoint);
        for (UndirectedGraph block : blocks) {
            int oldHitCount = this.block2hits.get(block);
            this.block2hits.put(block, oldHitCount + 1);
            this.addEdge(subgraph, block);
        }
    }
Simplified version of the JGraphT algorithm for building a block-cutpoint graph.
This code admits a lot of available parallelism, but a few impediments must be addressed before it can be parallelized. How can we pinpoint these dependencies precisely and concisely?
Field-based Dependence Analysis
Static dependence analysis is challenged by dynamic containers, aliasing, etc.
So let’s use dynamic dependence analysis instead…
[Figure: concrete state of the map m: a modcount field, a table array with bucket chains, and entries (key, value, next) holding K -> 1 and K’ -> 2]
m = new ConcurrentHashMap();
m.put(k,1); m.put(k’,2);
    Spurious dependencies, which inhibit m.put(k,1) || m.put(k’,2)!
m.put(k,2);
    Semantic dependency, which gets “lost” in the noise!
Eureka: Let’s Use Abstraction
– Abstract Locking
– Galois
– Leveraging ADT semantics in STM conflict detection
– Using ADT semantics in DB concurrency control (Muth et al., 93)
– Exploiting commutativity in DB transactions (Bernstein, 66)
But…
– We need a predictive tool; our code is still sequential
– We want the tool to pinpoint impediments to parallelization before applying parallelization transformations
The Hawkeye Analysis Tool
[Figure: a representation function maps the concrete Map state (modcount, table, entries with key, value, next) to the Map ADT state (key -> value pairs K -> 1 and K’ -> 2)]
– Dynamic analysis tool
– Uses abstraction while tracking (certain) dependencies
– User specifies representation function for data structures of choice; rest tracked concretely
– Allows concentrating on semantic dependencies while suppressing spurious dependencies
Specification Language
Map:
    foreach key k in m.keySet()
        adtState.add(m -> k);
    foreach entry (k,v) in m.entrySet()
        adtState.add(k -> v);
Graph:
    foreach node n in g.nodes()
        adtState.add(g -> n);
    foreach edge (n1,n2) in g.edges()
        adtState.add(n1 -> n2);
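The Map specification can be read as a function from a concrete map to a set of abstract locations. The following is a minimal Java sketch of that reading, not Hawkeye's actual API: the class `MapSpec`, the method `represent`, the `mapId` parameter, and the string encoding of abstract locations are all hypothetical names chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class MapSpec {
    // Hypothetical representation function: each abstract location is
    // encoded as a string "source -> target". mapId names the map
    // instance (an assumption made for readability).
    static Set<String> represent(String mapId, Map<?, ?> m) {
        Set<String> adtState = new LinkedHashSet<>();
        // foreach key k in m.keySet(): adtState.add(m -> k)
        for (Object k : m.keySet()) {
            adtState.add(mapId + " -> " + k);
        }
        // foreach entry (k,v) in m.entrySet(): adtState.add(k -> v)
        for (Map.Entry<?, ?> e : m.entrySet()) {
            adtState.add(e.getKey() + " -> " + e.getValue());
        }
        return adtState;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("k", 1);
        m.put("k'", 2);
        System.out.println(represent("M", m));
    }
}
```

Internal fields such as modcount or the bucket array never appear in the result, which is the point of the abstraction: only keys and key-to-value bindings are visible to the dependence tracker.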
Specification Language
DistanceFunction:
    foreach instance i1 in instances()
        foreach instance i2 in instances()
            adtState.add((i1,i2) -> distance(i1,i2));
…
Specification Language
– No need to model ADT operations
    – Hawkeye uses heuristics for (sound) approximation of the footprint of an ADT operation
    – User can refine the approximation (though our experience shows that the default is mostly accurate)
– No need for a commutativity spec
The Hawkeye Algorithm
[Figure: the concrete map state (modcount, table, entries with key, value, next) shown next to the logical state, whose locations are (M,X), (M,K), (M,K,1), (M,K’), (M,K’,2), (M,K,2)]
m.put(k,1);     (R: {}, W: {(M,K),(M,K,1)})
m.put(k’,2);    (R: {}, W: {(M,K’),(M,K’,2)})
m.put(k,2);     (R: {}, W: {(M,K),(M,K,1),(M,K,2)})
The write sets of m.put(k,1) and m.put(k,2) overlap: WAW.
Our assumptions:
– linearizability, for trace abstraction
– encapsulation, for state abstraction
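The WAW dependency on this slide follows from intersecting the abstract footprints of the three put operations. The sketch below illustrates that step with a plain pairwise intersection check over a trace. It is an illustration under stated assumptions rather than Hawkeye's implementation: `FootprintSketch`, `Footprint`, `dependencies`, and `slideTrace` are hypothetical names, and abstract locations are encoded as strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FootprintSketch {
    // Hypothetical footprint of one ADT operation: abstract read and write sets.
    static final class Footprint {
        final String op;
        final Set<String> reads;
        final Set<String> writes;

        Footprint(String op, Set<String> reads, Set<String> writes) {
            this.op = op;
            this.reads = reads;
            this.writes = writes;
        }
    }

    static boolean intersects(Set<String> a, Set<String> b) {
        for (String x : a) {
            if (b.contains(x)) return true;
        }
        return false;
    }

    // Reports dependencies of later operations on earlier ones, in trace order.
    static List<String> dependencies(List<Footprint> trace) {
        List<String> deps = new ArrayList<>();
        for (int i = 0; i < trace.size(); i++) {
            for (int j = i + 1; j < trace.size(); j++) {
                Footprint a = trace.get(i);
                Footprint b = trace.get(j);
                if (intersects(a.writes, b.reads)) deps.add("RAW: " + a.op + " -> " + b.op);
                if (intersects(a.reads, b.writes)) deps.add("WAR: " + a.op + " -> " + b.op);
                if (intersects(a.writes, b.writes)) deps.add("WAW: " + a.op + " -> " + b.op);
            }
        }
        return deps;
    }

    static Set<String> set(String... xs) {
        return new HashSet<>(Arrays.asList(xs));
    }

    // The three operations and footprints listed on the slide.
    static List<Footprint> slideTrace() {
        return Arrays.asList(
            new Footprint("m.put(k,1)", set(), set("(M,K)", "(M,K,1)")),
            new Footprint("m.put(k',2)", set(), set("(M,K')", "(M,K',2)")),
            new Footprint("m.put(k,2)", set(), set("(M,K)", "(M,K,1)", "(M,K,2)")));
    }

    public static void main(String[] args) {
        System.out.println(dependencies(slideTrace())); // [WAW: m.put(k,1) -> m.put(k,2)]
    }
}
```

Only the two operations on key k conflict; m.put(k',2) shares no abstract location with either of them, so the spurious dependencies seen at the concrete level do not appear.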
Challenges
– What is the meaning of dependencies under abstraction?
– How can we track both concrete and abstract dependencies simultaneously?
We’ve developed a uniform framework for tracking data dependencies…
Best Write Set
The write set of a transition $\tau$ is the union of
– the locations whose value was changed by $\tau$;
– the locations allocated by $\tau$; and
– the locations de-allocated by $\tau$.
Intuitively, the write set of a transition is its observable effect, i.e., the delta between the entry and exit states.
Best Read Set (More Tricky)
A set of locations $L$ is a sufficient read set of transition $\tau = (\sigma, p, \sigma')$ iff for every transition $\tau_1 = (\sigma_1, p, \sigma_1')$ such that $\sigma$ and $\sigma_1$ agree on $L$, $\mathit{write}(\tau) \equiv \mathit{write}(\tau_1)$.
The read set of transition $\tau$ is the union of all its minimal sufficient read sets.
Intuitively, the read set of a transition is the set of locations whose values determine the observable effect of the transition.
Simple Example
([y=3], set(y,4), [y=4])    Read set: { y } (secures y=4 in exit state)    Write set: { y }
([y=3], set(y,3), [y=3])    Read set: { y } (secures empty write set)      Write set: { }
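The write sets in this example can be reproduced by taking the delta between the entry and exit states of each transition. The following is a minimal Java sketch, assuming states are modeled as maps from location names to integer values and that a missing key means the location is not allocated; `WriteSetSketch`, `writeSet`, and `state` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

class WriteSetSketch {
    // Write set as the delta between entry and exit states:
    // changed, allocated, or de-allocated locations.
    static Set<String> writeSet(Map<String, Integer> entry, Map<String, Integer> exit) {
        Set<String> all = new TreeSet<>(entry.keySet());
        all.addAll(exit.keySet());
        Set<String> delta = new TreeSet<>();
        for (String loc : all) {
            boolean inEntry = entry.containsKey(loc);
            boolean inExit = exit.containsKey(loc);
            boolean allocated = !inEntry && inExit;
            boolean deallocated = inEntry && !inExit;
            boolean changed = inEntry && inExit
                    && !Objects.equals(entry.get(loc), exit.get(loc));
            if (allocated || deallocated || changed) {
                delta.add(loc);
            }
        }
        return delta;
    }

    // Helper that builds a one-location state such as [y=3].
    static Map<String, Integer> state(String loc, int value) {
        Map<String, Integer> s = new HashMap<>();
        s.put(loc, value);
        return s;
    }

    public static void main(String[] args) {
        System.out.println(writeSet(state("y", 3), state("y", 4))); // [y]
        System.out.println(writeSet(state("y", 3), state("y", 3))); // []
    }
}
```

The read set is harder to compute this way, because it quantifies over all other entry states rather than comparing one pair of states.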
Approximating the “Best” Definitions
– The good news: the “best” definitions apply both in concrete and in abstract semantics.
– The bad news: the “best” read set is not computable in general.
– An approximation $(r, w)$ of $(\mathit{read}, \mathit{write})$ is sound iff $\mathit{read} \subseteq r \cup w$ and $\mathit{write} \subseteq w$.
Usage Scenario
[Figure: concrete map state (modcount, table, entries with key, value, next)] Hmmm… Too many dependencies!
[Figure: Map ADT state (key -> value pairs K -> 1 and K’ -> 2)] Now I understand what’s going on!
[Chart: number of inter-iteration dependencies at the level of ADT operations, with and without abstraction; only built-in spec (Java collections)]
[Chart: number of inter-iteration dependencies at the level of ADT operations, with and without abstraction; including user spec (for user types)]
Thank You!
Backup
Preliminaries
– A state $\sigma$ maps memory locations to values.
– A transition $\tau$ is a triple $(\sigma, p, \sigma')$, where $p$ is a program statement and $\sigma, \sigma'$ are states, such that executing $p$ in $\sigma$ yields $\sigma'$.
– A program trace is a sequence of transitions.
– We assume an interleaving semantics of concurrency.
Approximate Read Set
– Take 1: all the locations reachable from arguments
– Take 2: all the locations reachable from arguments that were accessed during the statement’s execution
– Take 3: all the locations reachable from arguments that were accessed during the statement’s execution, with user specification of the frame
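The first two takes can be sketched as a reachability computation followed by an intersection. The Java code below is an illustration only: it assumes the heap is given as a graph from location names to the locations they point to, that the arguments themselves count as reachable, and that the set of accessed locations was recorded separately during execution. The names `ReadSetApprox`, `reachable`, and `reachableAndAccessed` are hypothetical, and Take 3 is not modeled because the slide does not say what form the frame specification takes.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ReadSetApprox {
    // Take 1: every location reachable from the arguments in the heap graph.
    static Set<String> reachable(Map<String, List<String>> heap, List<String> args) {
        Set<String> seen = new TreeSet<>(args);
        Deque<String> work = new ArrayDeque<>(args);
        while (!work.isEmpty()) {
            String loc = work.pop();
            for (String next : heap.getOrDefault(loc, Collections.<String>emptyList())) {
                if (seen.add(next)) {
                    work.push(next);
                }
            }
        }
        return seen;
    }

    // Take 2: reachable locations that the statement actually accessed.
    static Set<String> reachableAndAccessed(Map<String, List<String>> heap,
                                            List<String> args,
                                            Set<String> accessed) {
        Set<String> result = reachable(heap, args);
        result.retainAll(accessed);
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> heap = new HashMap<>();
        heap.put("m", Arrays.asList("table", "modcount"));
        heap.put("table", Arrays.asList("e1", "e2"));
        Set<String> accessed = new HashSet<>(Arrays.asList("m", "table", "e1"));
        System.out.println(reachable(heap, Arrays.asList("m")));
        System.out.println(reachableAndAccessed(heap, Arrays.asList("m"), accessed));
    }
}
```

Take 2 is never larger than Take 1, which is why it reports fewer spurious dependencies for operations that touch only part of a large structure.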