
Slide 1: Lazy Code Motion in an SSA World
A CS 526 Course Project
Patrick Meredith and Steven Lauterburg
05 May 2006

Slide 2: Presentation Overview
Introduction
Motivation
Preliminaries
Implementing LCM
Results
Implementation Status

Slide 3: Motivation
What problem are we trying to solve?
Lazy Code Motion (LCM) is a bit-vector-based iterative dataflow algorithm for partial redundancy elimination (PRE) that delivers safe, computationally optimal results.
SSAPRE is an approach to PRE that was specifically designed to work on SSA form and that also delivers a computationally optimal placement.
Unfortunately, the sparse SSAPRE algorithm does not always perform better than the older Lazy Code Motion dataflow algorithm.

Slide 4: Solution?
Why is implementing Lazy Code Motion on an SSA-based internal representation (like LLVM's) difficult?
LCM is based on the source-level syntax of a program: an expression like a+b is easy to identify in non-SSA form. In SSA form, variables are renamed, which raises several questions:
Which variables are the same from a source-level perspective?
Which expressions are equivalent to each other?
How do we handle multiple instances of the same variable being live at the same time?
Which instance of a variable do we use when we move a computation to a new location?

Slide 5: Redundant and Partially Redundant Computations
Code motion is used to remove redundant computations and partially redundant computations.
[Diagrams: examples built from f := 7, y := e + f, s := b + c, and t := b + c.]

Slide 6: Critical Edges
Problem: Code motion can be blocked by "critical edges", i.e. edges leading from nodes with more than one successor to nodes with more than one predecessor.
Solution: An edge-splitting transformation can be performed that inserts extra nodes.
[Diagrams: before and after edge splitting, using z := u + v, w := u + v, and an inserted h := u + v with z := h.]
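
As a rough illustration (not the project's actual LLVM pass), splitting a critical edge in a toy CFG might look like the C++ sketch below; the Node type and helper names are made up for this example.

#include <memory>
#include <string>
#include <vector>

// Minimal CFG node: a label plus successor pointers (illustrative only).
struct Node {
  std::string label;
  std::vector<Node*> succs;
};

// An edge pred -> succ is "critical" when pred has several successors
// and succ has several predecessors (predecessor count passed in for brevity).
bool isCriticalEdge(const Node& pred, int succPredCount) {
  return pred.succs.size() > 1 && succPredCount > 1;
}

// Split pred -> succ by inserting a fresh, empty node on the edge.
Node* splitEdge(Node& pred, Node& succ, std::vector<std::unique_ptr<Node>>& owner) {
  owner.push_back(std::make_unique<Node>(Node{"split_" + pred.label + "_" + succ.label, {&succ}}));
  Node* mid = owner.back().get();
  for (Node*& s : pred.succs)
    if (s == &succ) s = mid;   // redirect the edge through the new node
  return mid;                  // insertions (e.g. h := u + v) can now go here
}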

Slide 7: Variable Equivalence Classes (VECs)
What is a variable? A VEC?
Variables that are operands of a phi-node, along with the phi-node itself, are placed into the same VEC.
Many variables may be tied together by multiple phi-nodes.
Independent variables and constants are placed in singleton VECs.
Function arguments can also be included in VECs.
[Example: a1 := c, a0 := d, a3 := f, x := a1 + b, y := b + a2, z := a3 + b, with a2 = phi(a1, a0).]
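
A minimal sketch of how VECs could be built, assuming a union-find over SSA value names; the names and data layout here are illustrative, not the project's actual LLVM data structures.

#include <map>
#include <string>
#include <vector>

// Union-find over SSA value names; each disjoint set is one VEC.
struct VECBuilder {
  std::map<std::string, std::string> parent;

  std::string find(const std::string& v) {
    if (!parent.count(v)) parent[v] = v;   // every value starts in its own (singleton) VEC
    if (parent[v] == v) return v;
    return parent[v] = find(parent[v]);    // path compression
  }

  void unite(const std::string& a, const std::string& b) {
    parent[find(a)] = find(b);
  }

  // A phi-node ties its result and all of its operands into one VEC.
  void addPhi(const std::string& result, const std::vector<std::string>& operands) {
    for (const std::string& op : operands) unite(result, op);
  }
};

// Example: a2 = phi(a1, a0) puts a0, a1, and a2 in one VEC; a3 and b stay in singleton VECs.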

Slide 8: Expression Equivalence Classes (EECs)
When are two expressions equivalent?
Two expressions are considered equivalent for purposes of code motion if and only if:
1. they have the same operator, and
2. the corresponding operands of the two expressions are in the same VEC.
(Modulo commutativity, etc., of course.)
[Example: the expressions a1 + b, b + a2, and a3 + b, with a2 = phi(a1, a0).]
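
One plausible way to realize this (a sketch under that assumption, not necessarily the project's implementation) is to key each expression by its opcode plus the VEC representatives of its operands, sorting the operands for commutative operators:

#include <algorithm>
#include <map>
#include <string>
#include <vector>

// vecRep maps an SSA value to its VEC's representative (as produced by a union-find
// pass over the phi-nodes). Operands of commutative operators are sorted so that
// a1 + b and b + a2 land in the same class when a1 and a2 share a VEC.
std::string eecKey(const std::map<std::string, std::string>& vecRep,
                   const std::string& opcode,
                   std::vector<std::string> operands, bool commutative) {
  for (std::string& op : operands) {
    auto it = vecRep.find(op);
    op = (it != vecRep.end()) ? it->second : op;   // singleton VECs map to themselves
  }
  if (commutative) std::sort(operands.begin(), operands.end());
  std::string key = opcode;
  for (const std::string& op : operands) key += "|" + op;
  return key;
}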

Slide 9: "Stale" Uses and "Fresh" Values
What is a "stale" use? A "stale" use occurs when the live ranges of two different versions of the same source-level variable overlap. An expression that uses a "stale" definition is not syntactically the same as one using "fresh" values.
What is a "fresh" value? Intuitively, it is the most recent definition of a variable. More formally, for a given VEC, the fresh value is the VEC member definition that immediately dominates a program point.
[Example: a1 := c and a0 := d feed a2 = phi(a1, a0); y := b + a2 versus z := b + a0.]

Slide 10: Freshness Analysis
The freshness lattice: BOTTOM < SSA values < TOP.
Local freshness: for each instruction, make that SSA value the fresh value for its VEC. Whatever is fresh at the exit is X_FRESH.
Global freshness: to compute N_FRESH for a basic block, we meet over the successors.
Removal of stale uses: after completion of the freshness analysis, we remove stale uses by inserting copies.
[Example: a1 := c, a0 := d, a2 = phi(a1, a0), y := b + a2, z := b + a0.]
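
A rough sketch of the local (per-block) part of this analysis, under the assumption that each definition simply overwrites the fresh value for its VEC; the instruction type and names are illustrative only.

#include <map>
#include <string>
#include <vector>

// One instruction: the SSA value it defines plus the values it uses (toy IR).
struct Inst {
  std::string def;
  std::vector<std::string> uses;
};

// Walk a block forward: each definition becomes the fresh value for its VEC,
// and the map left at the end of the block is its X_FRESH set.
std::map<std::string, std::string>
localFreshness(const std::vector<Inst>& block,
               const std::map<std::string, std::string>& vecRep,
               std::map<std::string, std::string> fresh /* N_FRESH at block entry */) {
  for (const Inst& inst : block) {
    auto it = vecRep.find(inst.def);
    const std::string& vec = (it != vecRep.end()) ? it->second : inst.def;
    fresh[vec] = inst.def;   // this definition is now the fresh value for its VEC
  }
  return fresh;              // X_FRESH for the block
}

// A use u is "stale" at a point when fresh[vecRep(u)] exists and differs from u.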

Slide 11: LCM Analyses
Which analyses do we perform? We perform up-safety, down-safety, earliestness, delayability, and lateness.
Why do we not do the isolation analysis? Because of mem2reg, skipping it is essentially like leaving the original computation in place.
The analyses are worklist-based; blocks with no predecessors or successors can cause problems.
[Example: a1 := c, a0 := d, a2 = phi(a1, a0), y := b + a2, z := b + a0.]

Slide 12: Moving Code... The Almost-LCM Transformation
The basic-block-local transform:
We do not require local CSE as a prerequisite.
We first insert new computations for everything marked as N_INSERT for this basic block.
As we step through the instructions in a given basic block, we update the local fresh set based on the fresh set at the beginning of the basic block. We also keep a set of dead computations.
For each binary operator, if its computation is dead, we insert a new computation with the proper fresh operands. We store this computation to a memory location specific to each EEC.
At the point of each original computation we insert a load of the proper memory location, and replace all uses of that original computation with the load.
At the end of the basic block we insert computations and stores for all expressions that are X_INSERT and not X_REPLACE. These will be used in later basic blocks.
(A sketch of this compute/store/load pattern follows below.)
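
A very small sketch of the emit pattern described above, assuming a toy three-address IR rather than the actual LLVM pass; the slot names and helper are hypothetical.

#include <iostream>
#include <string>

// Emit the compute/store/load pattern used when a computation is (re)materialized:
// the value is computed from the fresh operands, stored to the EEC's dedicated slot,
// and every original occurrence is then replaced by a load of that slot.
void emitComputeStoreLoad(const std::string& eecSlot, const std::string& expr, int seq) {
  std::cout << eecSlot << "_comp_" << seq << " := " << expr << "\n";
  std::cout << "store " << eecSlot << "_comp_" << seq << ", " << eecSlot << "\n";
  std::cout << eecSlot << "_load_" << seq << " := load " << eecSlot << "\n";
}

int main() {
  emitComputeStoreLoad("EEC0", "a0 + b0", 0);  // first fresh computation of a + b
  // ... original uses of a0 + b0 are rewritten to use EEC0_load_0 ...
  emitComputeStoreLoad("EEC0", "a1 + b0", 1);  // a1 := G0 made a0 stale, so recompute
  return 0;
}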

Slide 13: Example
Before:
y0 := a0 + b0
y1 := a0 + b0
a1 := G0
y2 := a1 + b0
y3 := y2 + b0
After:
EEC0_comp_0 := a0 + b0
store EEC0_comp_0, EEC0
EEC0_load_0 := load EEC0
a1 := G0
EEC0_comp_1 := a1 + b0
store EEC0_comp_1, EEC0
EEC0_load_1 := load EEC0
EEC1_comp_0 := EEC0_load_1 + b0

Slide 14: Results

Slide 15: Bmps! [image-only slide; no text captured]

Slide 16: Bmps! [image-only slide; no text captured]

Slide 17: Limitations
Currently we can only dubiously handle programs that use unwind (maybe it will work, maybe not; if it does, it is probably by accident).
While we appear to handle programs that use unreachable correctly, we are not completely sure.
The algorithm is fairly slow due to all the bookkeeping we must do.

Slide 18: Random Thoughts
I actually found a case where map is faster than hash_map!
Using handles to make Fresh updates not suck.
The truth(?) of VECs!

Slide 19: Results

Slide 20:
The predicate equations used by the algorithm make use of two local predicates, defined below. For every assignment node n ≡ v := t′ and every term t ∈ T \ V (where T is the set of all terms and V is the set of all variables):
Used(n, t) = t ∈ SubTerms(t′)
Transp(n, t) = v ∉ Var(t)
When t is understood, these predicates will be denoted Used(n) and Transp(n).
[Running example: a CFG containing z := a + b, x := a + b, y := a + b, w := a + b, a := c, and x := a + b.]
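
A small sketch of these two predicates over a toy term representation (a flat binary expression); the types are illustrative only.

#include <set>
#include <string>

// A toy assignment  v := t'  where t' is a binary term over variable names.
struct Assign {
  std::string v;                    // defined variable
  std::string op;                   // operator of t'
  std::set<std::string> operands;   // variables occurring in t'
};

// A toy term t (e.g. a + b) given as its operator plus its variable operands.
struct Term {
  std::string op;
  std::set<std::string> vars;
};

// Used(n, t): t is computed at n, i.e. t occurs as a subterm of t'.
// For flat binary terms this reduces to matching the operator and operands.
bool Used(const Assign& n, const Term& t) {
  return n.op == t.op && n.operands == t.vars;
}

// Transp(n, t): n is transparent for t, i.e. n does not redefine any variable of t.
bool Transp(const Assign& n, const Term& t) {
  return t.vars.count(n.v) == 0;
}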

Slide 21: Down-Safe
A node n is D-SAFE if a computation of a term t at n does not introduce a new value on a terminating path starting in n.
D-SAFE(n) =
    false                                                  if n = e
    Used(n) ∨ (Transp(n) ∧ ∏_{m ∈ succ(n)} D-SAFE(m))      otherwise
[Figure: the running example with its D-Safe nodes marked.]
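
A minimal sketch of solving this equation by fixpoint iteration, on a made-up diamond-shaped CFG in which nodes 1 and 3 both compute a + b; the predicate values below are illustrative, not taken from the presentation.

#include <cstdio>
#include <vector>

int main() {
  // Toy diamond CFG for the term a + b: 0 -> {1,2}, 1 -> 3, 2 -> 3, 3 -> 4 (exit).
  std::vector<std::vector<int>> succ = {{1, 2}, {3}, {3}, {4}, {}};
  std::vector<bool> used   = {false, true, false, true, false};
  std::vector<bool> transp = {true, true, true, true, true};   // nothing redefines a or b

  int n = (int)succ.size(), exitNode = 4;
  std::vector<bool> dsafe(n, true);                 // greatest fixpoint: start optimistic
  for (bool changed = true; changed; ) {
    changed = false;
    for (int v = 0; v < n; ++v) {
      bool allSucc = true;
      for (int m : succ[v]) allSucc = allSucc && dsafe[m];
      bool next = (v == exitNode) ? false : (used[v] || (transp[v] && allSucc));
      if (next != dsafe[v]) { dsafe[v] = next; changed = true; }
    }
  }
  for (int v = 0; v < n; ++v)
    std::printf("D-SAFE(%d) = %d\n", v, (int)dsafe[v]);   // expect 1 1 1 1 0
  return 0;
}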

Slide 22: Earliest
A node n is EARLIEST if there is a path from s to n on which no node prior to n is D-Safe and delivers the same value for t as when computed at n.
EARLIEST(n) =
    true                                                          if n = s
    Σ_{m ∈ pred(n)} (¬Transp(m) ∨ (¬D-SAFE(m) ∧ EARLIEST(m)))     otherwise
[Figure: the running example with its D-Safe and Earliest nodes marked.]

Slide 23: The Safe-Earliest Transformation
The set of nodes that are both D-Safe and Earliest are computationally optimal computation points.
Safe-Earliest Transformation:
Introduce a new auxiliary variable h for the term t.
Insert the assignment h := t at the entry of every node n that is both D-Safe and Earliest.
Replace every original computation of t by h.
[Figure: the running example after the transformation, with h := a + b inserted and original computations replaced by h.]

Slide 24: An Example...
The Lazy Code Motion approach:
Identifies the earliest optimal computation points based on predicate values determined through a series of forward and backward analyses.
Identifies computation points that allow variables to be initialized "as late as possible".
Replaces original computations with auxiliary variables that are initialized at the identified computation points, but only when there is computational gain.
[Figure: the running example CFG.]

Slide 25: Delay
A node n is DELAY if on every path from s to n there is a computation point of the Safe-Earliest Transformation such that all subsequent original computations lie in n.
DELAY(n) = (D-SAFE(n) ∧ EARLIEST(n)) ∨
    false                                       if n = s
    ∏_{m ∈ pred(n)} (¬Used(m) ∧ DELAY(m))       otherwise
[Figure: the running example with its Delay and D-Safe & Earliest nodes marked.]

Slide 26: Latest
A node n is LATEST if n is a computation point of some computationally optimal placement, and on every terminating path starting in n, any subsequent optimal computation point follows an original computation.
LATEST(n) =
    false                                                  if n = e
    DELAY(n) ∧ (Used(n) ∨ Σ_{m ∈ succ(n)} ¬DELAY(m))       otherwise
[Figure: the running example with its Delay and Latest nodes marked.]
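
Continuing the earlier made-up diamond example, DELAY can be solved by the same kind of fixpoint iteration (forward, over predecessors), and LATEST then needs no iteration at all; the D-SAFE and EARLIEST values below were worked out by hand for that toy CFG.

#include <cstdio>
#include <vector>

int main() {
  // Same toy diamond CFG as the D-SAFE sketch: 0 -> {1,2}, 1 -> 3, 2 -> 3, 3 -> 4.
  std::vector<std::vector<int>> pred = {{}, {0}, {0}, {1, 2}, {3}};
  std::vector<std::vector<int>> succ = {{1, 2}, {3}, {3}, {4}, {}};
  std::vector<bool> used     = {false, true, false, true, false};
  std::vector<bool> dsafe    = {true, true, true, true, false};     // from the D-SAFE pass
  std::vector<bool> earliest = {true, false, false, false, false};  // from the EARLIEST pass

  int n = 5, entry = 0, exitNode = 4;
  std::vector<bool> delay(n, true);                       // greatest fixpoint
  for (bool changed = true; changed; ) {
    changed = false;
    for (int v = 0; v < n; ++v) {
      bool allPred = (v != entry);                        // the product term is false at s
      for (int m : pred[v]) allPred = allPred && !used[m] && delay[m];
      bool next = (dsafe[v] && earliest[v]) || allPred;
      if (next != delay[v]) { delay[v] = next; changed = true; }
    }
  }
  for (int v = 0; v < n; ++v) {                           // LATEST needs no iteration
    bool someSuccNotDelayed = false;
    for (int m : succ[v]) someSuccNotDelayed = someSuccNotDelayed || !delay[m];
    bool latest = (v == exitNode) ? false : (delay[v] && (used[v] || someSuccNotDelayed));
    std::printf("DELAY(%d)=%d  LATEST(%d)=%d\n", v, (int)delay[v], v, (int)latest);
  }
  return 0;   // expect DELAY = 1 1 1 0 0 and LATEST = 0 1 1 0 0
}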

Slide 27: An Example...
The Lazy Code Motion approach:
Identifies the earliest optimal computation points based on predicate values determined through a series of forward and backward analyses.
Identifies computation points that allow variables to be initialized "as late as possible".
Replaces original computations with auxiliary variables that are initialized at the identified computation points, but only when there is computational gain.
[Figure: the running example CFG.]

Slide 28: Isolated
A node n is ISOLATED if on every terminating path starting from a successor of n, any original computation of t is preceded by a new, latest computation.
ISOLATED(n) =
    true                                                        if n = e
    ∏_{m ∈ succ(n)} (LATEST(m) ∨ (¬Used(m) ∧ ISOLATED(m)))      otherwise
[Figure: the running example with its Latest and Isolated nodes marked.]

Slide 29: Lazy Code Motion Transformation
Set of optimal computation points for t: OCP = { n | LATEST(n) ∧ ¬ISOLATED(n) }
Set of redundant occurrences of t: RO = { n | Used(n) ∧ ¬(LATEST(n) ∧ ISOLATED(n)) }
LCM Transformation:
Introduce a new auxiliary variable h for the term t.
Insert the assignment h := t at the entry of every node in OCP.
Replace every original computation of t in nodes of RO by h.
[Figure: the transformed running example, with h := a + b inserted at the optimal computation points and redundant occurrences replaced by h.]
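
Finishing the made-up diamond example from the earlier sketches, OCP and RO fall out of the predicate bits directly, with no further iteration; all values here are the illustrative ones computed above.

#include <cstdio>
#include <vector>

int main() {
  // Predicate results for the toy diamond CFG from the earlier sketches.
  std::vector<bool> used     = {false, true, false, true, false};
  std::vector<bool> latest   = {false, true, true, false, false};
  std::vector<bool> isolated = {true, false, false, true, true};

  for (int v = 0; v < 5; ++v) {
    bool ocp = latest[v] && !isolated[v];               // insert h := a + b at entry of v
    bool ro  = used[v] && !(latest[v] && isolated[v]);  // replace the computation in v by h
    std::printf("node %d: OCP=%d RO=%d\n", v, (int)ocp, (int)ro);
  }
  // Expected: OCP = {1, 2} and RO = {1, 3}, i.e. h := a + b is inserted on both branches
  // of the diamond and both original computations of a + b become uses of h.
  return 0;
}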

Slide 30: Considerations
Register pressure is not always reduced.
Some desirable code motion is not allowed.
Code size can be increased.

Slide 31: Reducing Register Pressure?
How do the live ranges of a and b affect "lifetime optimality"?
[Figure: two alternative placements of h := a + b, with uses z := h, x := h, y := h, and w := h, showing the live ranges of a, b, and h.]

Slide 32: Code Motion & Down-Safety
Some desirable code motion is not D-Safe and therefore not allowed.
[Figure: an example where rewriting w := a + b as w := h would require an insertion h := a + b that is not down-safe.]

Slide 33: Code Bloat
Late placement of computation points increases code size.
[Figure: an example where late placement duplicates h := a + b in front of several uses y := h.]


Slide 35: Equations 1
For every node n ≡ v := t′ and every term t ∈ T \ V:
Used(n, t) = t ∈ SubTerms(t′)
Transp(n, t) = v ∉ Var(t)
D-SAFE(n) =
    false                                                  if n = e
    Used(n) ∨ (Transp(n) ∧ ∏_{m ∈ succ(n)} D-SAFE(m))      otherwise
EARLIEST(n) =
    true                                                          if n = s
    Σ_{m ∈ pred(n)} (¬Transp(m) ∨ (¬D-SAFE(m) ∧ EARLIEST(m)))     otherwise

Slide 36: Equations 2
DELAY(n) = (D-SAFE(n) ∧ EARLIEST(n)) ∨
    false                                       if n = s
    ∏_{m ∈ pred(n)} (¬Used(m) ∧ DELAY(m))       otherwise
LATEST(n) =
    false                                                  if n = e
    DELAY(n) ∧ (Used(n) ∨ Σ_{m ∈ succ(n)} ¬DELAY(m))       otherwise

Slide 37: Equations 3
ISOLATED(n) =
    true                                                        if n = e
    ∏_{m ∈ succ(n)} (LATEST(m) ∨ (¬Used(m) ∧ ISOLATED(m)))      otherwise
OCP = { n | LATEST(n) ∧ ¬ISOLATED(n) }
RO = { n | Used(n) ∧ ¬(LATEST(n) ∧ ISOLATED(n)) }

