Instruction Scheduling II: Beyond Basic Blocks Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp.


1 Instruction Scheduling II: Beyond Basic Blocks
COMP 412, FALL 2010
Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved.
Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved.
If you have already taught superlocal value numbering, then this lecture is short. If not, then it is a full lecture, as the explanation of EBBs takes time.

2 Local Instruction Scheduling
The Overall Picture
1. Build a dependence graph
2. Compute priority function
3. Apply the local scheduling algorithm
    Cycle ← 1
    Ready ← leaves of P
    Active ← Ø
    while (Ready ∪ Active ≠ Ø)
        if (Ready ≠ Ø) then
            remove an op from Ready
            S(op) ← Cycle
            Active ← Active ∪ op
        Cycle ← Cycle + 1
        for each op ∈ Active
            if (S(op) + delay(op) ≤ Cycle) then
                remove op from Active
                for each successor s of op in P
                    if (s is ready) then
                        Ready ← Ready ∪ s
Recap of previous lecture
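The three steps and the loop above can be sketched in Python. This is a minimal sketch, not the course's reference code: it assumes one issue slot per cycle, represents the dependence graph P as a hypothetical `deps` map from each op to the ops whose results it reads, and uses latency-weighted path length as the priority function. The names `users_of`, `critical_path_priority`, and `list_schedule` are ours.

```python
from typing import Dict, List


def users_of(deps: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert deps: for each op, list the ops that read its result."""
    users: Dict[str, List[str]] = {op: [] for op in deps}
    for op, preds in deps.items():
        for p in preds:
            users[p].append(op)
    return users


def critical_path_priority(deps: Dict[str, List[str]],
                           delay: Dict[str, int]) -> Dict[str, int]:
    """Step 2: latency-weighted length of the longest path from op to the end."""
    users = users_of(deps)
    prio: Dict[str, int] = {}

    def walk(op: str) -> int:
        if op not in prio:
            prio[op] = delay[op] + max((walk(u) for u in users[op]), default=0)
        return prio[op]

    for op in deps:
        walk(op)
    return prio


def list_schedule(deps: Dict[str, List[str]], delay: Dict[str, int],
                  prio: Dict[str, int]) -> Dict[str, int]:
    """Step 3: the loop above; returns S, the issue cycle of each op."""
    users = users_of(deps)
    done = set()                                    # results now available
    S: Dict[str, int] = {}
    cycle = 1
    ready = [op for op, preds in deps.items() if not preds]  # leaves of P
    active: List[str] = []
    while ready or active:
        if ready:
            # Highest priority first; ties go to the alphabetically first op.
            op = min(ready, key=lambda o: (-prio[o], o))
            ready.remove(op)
            S[op] = cycle
            active.append(op)
        cycle += 1
        for op in list(active):
            if S[op] + delay[op] <= cycle:          # op's result is available
                active.remove(op)
                done.add(op)
                for s in users[op]:
                    # s is ready once every operand it reads is available.
                    if s not in ready and all(p in done for p in deps[s]):
                        ready.append(s)
    return S


# Hypothetical block: a and c are loads, b an add, d a multiply, e a store.
deps = {"a": [], "b": ["a"], "c": [], "d": ["b", "c"], "e": ["d"]}
delay = {"a": 3, "b": 1, "c": 3, "d": 2, "e": 3}
S = list_schedule(deps, delay, critical_path_priority(deps, delay))
print(S)  # {'a': 1, 'c': 2, 'b': 4, 'd': 5, 'e': 7}
```

The store e issues in cycle 7 and completes in cycle 9; nothing issues in cycles 3 and 6, which are the interlocks this priority function could not hide.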

3 Local Scheduling
As long as we stay within a single block, list scheduling does well
The problem is hard, so tie-breaking matters
- Prefer the op with more descendants in the dependence graph
- Prefer an operation with a last use over one with none
- Breadth first makes progress on all paths
  → Tends toward more ILP & fewer interlocks
- Depth first tries to complete uses of a value
  → Tends to use fewer registers
Classic work on this is Gibbons & Muchnick ([154] in EaC)
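One way to express the first tie-breaker is a composite sort key: rank first (for example, the latency-weighted priority), then the number of distinct descendants. This is a hedged sketch; `count_descendants` and `pick` are hypothetical helpers, and the graph is given as a map from each op to the ops that read its result.

```python
from typing import Dict, List, Set


def count_descendants(users: Dict[str, List[str]]) -> Dict[str, int]:
    """Number of distinct ops that transitively depend on each op."""
    memo: Dict[str, Set[str]] = {}

    def below(op: str) -> Set[str]:
        if op not in memo:
            found: Set[str] = set()
            for u in users[op]:
                found.add(u)
                found |= below(u)      # a set, so shared descendants count once
            memo[op] = found
        return memo[op]

    return {op: len(below(op)) for op in users}


def pick(ready: List[str], rank: Dict[str, int], ndesc: Dict[str, int]) -> str:
    """Choose from Ready: highest rank, then more descendants breaks ties."""
    return max(ready, key=lambda op: (rank[op], ndesc[op]))


# a feeds a diamond (b, c -> d); q feeds only d.
users = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "q": ["d"]}
ndesc = count_descendants(users)
print(ndesc["a"], ndesc["q"])                      # 3 1
print(pick(["q", "a"], {"a": 5, "q": 5}, ndesc))   # a  (tie on rank)
print(pick(["q", "a"], {"a": 4, "q": 5}, ndesc))   # q  (rank decides)
```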

4 Scheduling Larger Regions
One step beyond a block is an Extended Basic Block (EBB)
EBB is a maximal set of blocks s.t.
- Set has a single entry, Bi
- Each block Bj other than Bi has exactly one predecessor
Example CFG has three EBBs
[Figure: example CFG with blocks B1 (a, b, c, d), B2 (e, f), B3 (g), B4 (h, i), B5 (j, k), B6 (l)]
CFG ≅ Control Flow Graph

5 Scheduling Larger Regions
One step beyond a block is an Extended Basic Block (EBB)
EBB is a maximal set of blocks such that
- Set has a single entry, Bi
- Each block Bj other than Bi has exactly one predecessor
Example has three EBBs
- Big EBB has two paths
- {B1,B2,B4} & {B1,B3}
Many optimizations operate on EBBs (including scheduling)
[Figure: the example CFG (B1 to B6)]
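The EBB definition translates into a simple partition: a block starts a new EBB if it is the entry or has anything other than exactly one predecessor, and every other block joins its unique predecessor's EBB. In this sketch the edges of the example CFG are our reading of the slide's figure (B1→B2, B1→B3, B2→B4, B2→B5, B3→B5, B4→B6, B5→B6), and `find_ebbs` is a hypothetical helper that assumes every block is reachable from the entry.

```python
from typing import Dict, List


def find_ebbs(succ: Dict[str, List[str]], entry: str) -> List[List[str]]:
    """Partition a CFG (successor lists) into extended basic blocks."""
    npred = {b: 0 for b in succ}
    for targets in succ.values():
        for t in targets:
            npred[t] += 1
    # EBB heads: the entry, plus every block without exactly one predecessor.
    heads = [b for b in succ if b == entry or npred[b] != 1]
    is_head = set(heads)
    ebbs = []
    for h in heads:
        members, work = [], [h]
        while work:
            b = work.pop()
            members.append(b)
            # A successor with exactly one predecessor (b) stays in this EBB.
            work.extend(t for t in succ[b] if t not in is_head)
        ebbs.append(sorted(members))
    return ebbs


cfg = {"B1": ["B2", "B3"], "B2": ["B4", "B5"], "B3": ["B5"],
       "B4": ["B6"], "B5": ["B6"], "B6": []}
print(find_ebbs(cfg, "B1"))
# [['B1', 'B2', 'B3', 'B4'], ['B5'], ['B6']]
```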

6 Scheduling Larger Regions
Superlocal Scheduling
Schedule entire paths through EBBs
Example has four EBB paths
[Figure: the example CFG (B1 to B6)]
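Enumerating the paths is a walk from each EBB head that stops wherever the next block would be a join point. A sketch under the same assumed reading of the example CFG; `ebb_paths` is a hypothetical name.

```python
from typing import Dict, List


def ebb_paths(succ: Dict[str, List[str]], entry: str) -> List[List[str]]:
    """List every head-to-leaf path through every EBB of a CFG."""
    npred = {b: 0 for b in succ}
    for targets in succ.values():
        for t in targets:
            npred[t] += 1
    heads = [b for b in succ if b == entry or npred[b] != 1]
    paths: List[List[str]] = []

    def walk(b: str, prefix: List[str]) -> None:
        path = prefix + [b]
        # Successors that are not EBB heads continue the current EBB.
        inside = [t for t in succ[b] if npred[t] == 1 and t != entry]
        if not inside:                 # the EBB ends here on this path
            paths.append(path)
        for t in inside:
            walk(t, path)

    for h in heads:
        walk(h, [])
    return paths


cfg = {"B1": ["B2", "B3"], "B2": ["B4", "B5"], "B3": ["B5"],
       "B4": ["B6"], "B5": ["B6"], "B6": []}
for p in ebb_paths(cfg, "B1"):
    print(p)
# ['B1', 'B2', 'B4']
# ['B1', 'B3']
# ['B5']
# ['B6']
```

Two of the four paths are nontrivial, and both start with B1, which is where the conflicts on the next slides come from.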

7 Scheduling Larger Regions
Superlocal Scheduling
Schedule entire paths through EBBs
Example has four EBB paths
- Two paths are nontrivial
- {B1,B2,B4} & {B1,B3}
Having B1 in both causes conflicts
- Moving an op out of B1 causes problems
[Figure: the example CFG (B1 to B6)]

8 Scheduling Larger Regions
Superlocal Scheduling
Schedule entire paths through EBBs
Example has four EBB paths
- Two paths are nontrivial
- {B1,B2,B4} & {B1,B3}
Having B1 in both causes conflicts
- Moving an op out of B1 causes problems
[Figure: the example CFG with op c moved from B1 into B2, which now holds c, e, f; annotation at B3: "no c here!"]

9 Scheduling Larger Regions
Superlocal Scheduling
Schedule entire paths through EBBs
Example has four EBB paths
- Two paths are nontrivial
- {B1,B2,B4} & {B1,B3}
Having B1 in both causes conflicts
- Moving an op out of B1 causes problems
- Must insert “compensation” code in B3, unless c is dead in B3
- Increases code space
- May not help on {B1,B3}
[Figure: the example CFG with c moved from B1 into B2 and a copy of c inserted in B3 (B3 now holds c, g); annotation: "c was moved for correctness not for speed!"]
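The bookkeeping for this downward move can be sketched as follows: remove the op from B1, place it at the top of the successor it moves into, and put a compensation copy at the top of every other successor where its result is live on entry. This is a hedged sketch: the block contents, the `live_in` sets, the register name `rc`, and the helper `sink_op` are hypothetical, and it assumes the caller has already checked that the move is otherwise legal (nothing later in B1 reads or overwrites the op's operands or result).

```python
from typing import Dict, List, Set


def sink_op(blocks: Dict[str, List[str]], succ: Dict[str, List[str]],
            live_in: Dict[str, Set[str]], result: Dict[str, str],
            src: str, dst: str, op: str) -> List[str]:
    """Move op from src into successor dst; return blocks that got a copy."""
    blocks[src].remove(op)
    blocks[dst].insert(0, op)
    compensated = []
    for other in succ[src]:
        # Other paths out of src still need op's value if it is live there.
        if other != dst and result[op] in live_in[other]:
            blocks[other].insert(0, op)        # compensation code
            compensated.append(other)
    return compensated


succ = {"B1": ["B2", "B3"]}
result = {"c": "rc"}                 # hypothetical: op c defines register rc

blocks = {"B1": ["a", "b", "c", "d"], "B2": ["e", "f"], "B3": ["g"]}
print(sink_op(blocks, succ, {"B2": {"rc"}, "B3": {"rc"}}, result, "B1", "B2", "c"))
# ['B3']
print(blocks)
# {'B1': ['a', 'b', 'd'], 'B2': ['c', 'e', 'f'], 'B3': ['c', 'g']}

blocks = {"B1": ["a", "b", "c", "d"], "B2": ["e", "f"], "B3": ["g"]}
print(sink_op(blocks, succ, {"B2": {"rc"}, "B3": set()}, result, "B1", "B2", "c"))
# [] : c is dead in B3, so no compensation code is needed
```

The first call reproduces the figure: c leaves B1, lands in B2, and B3 grows by one op, which is the code-space cost the slide mentions.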

10 Scheduling Larger Regions
Superlocal Scheduling
Work EBB at a time
Example has four EBB paths
Only two are nontrivial
- {B1,B2,B4} & {B1,B3}
Having B1 in both causes conflicts
- Moving an op into B1 causes problems
[Figure: the example CFG (B1 to B6)]

11 Scheduling Larger Regions
Superlocal Scheduling
Work EBB at a time
Example has four EBB paths
Only two are nontrivial
- {B1,B2,B4} & {B1,B3}
Having B1 in both causes conflicts
- Moving an op into B1 causes problems
- Lengthens {B1,B3}
- May need compensation code
  → Renaming may avoid “undo f”
  → For example, rename the result of f along the path and add a copy if it is still live
[Figure: the example CFG with f moved from B2 into B1 (B1 now holds a, b, c, d, f) and "undo f" inserted in B3; annotation: "Compensation code makes the path B1 B3 even longer!"]

12 Scheduling Larger Regions
Superlocal Scheduling
How much can we get?
- Schielke constrained away compensation code
- Algorithm produced 11 to 12% speed ups
- So, it is worth doing …
[Figure: the example CFG (B1 to B6)]

13 Scheduling Larger Regions
More Aggressive Superlocal Scheduling
Clone blocks to create more context
Join points create blocks that must work in multiple contexts
[Figure: the example CFG; B5 is reached by 2 paths, B6 by 3 paths; 7 branches]
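The path and branch counts in the figure can be checked mechanically: in an acyclic CFG, the number of entry-to-block paths reaching a block is the sum of the counts of its predecessors, and the slide's "branches" are the CFG edges. A sketch using the same assumed edges; `paths_to` is a hypothetical helper.

```python
from typing import Dict, List


def paths_to(succ: Dict[str, List[str]], entry: str) -> Dict[str, int]:
    """Count distinct entry-to-block paths in an acyclic CFG."""
    preds: Dict[str, List[str]] = {b: [] for b in succ}
    for b, targets in succ.items():
        for t in targets:
            preds[t].append(b)
    count: Dict[str, int] = {}

    def walk(b: str) -> int:
        if b not in count:
            count[b] = 1 if b == entry else sum(walk(p) for p in preds[b])
        return count[b]

    for b in succ:
        walk(b)
    return count


cfg = {"B1": ["B2", "B3"], "B2": ["B4", "B5"], "B3": ["B5"],
       "B4": ["B6"], "B5": ["B6"], "B6": []}
n = paths_to(cfg, "B1")
print(n["B5"], n["B6"])                           # 2 3
print(sum(len(t) for t in cfg.values()))          # 7 edges ("branches")
```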

14 Scheduling Larger Regions
More Aggressive Superlocal Scheduling
Clone blocks to create more context
[Figure: the example CFG after cloning: B5 becomes B5a and B5b (each j, k), B6 becomes B6a, B6b, B6c (each l); 8 branches]
Superblock Cloning
- Enabling transformation
- Clone up to a backward branch
- Creates better conditions for other transformations, e.g., superlocal scheduling
- Other enabling transformations: loop unrolling, inlining
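For an acyclic region, cloning every join point once per incoming path turns the CFG into a tree. The sketch below does this with a depth-first walk and names clones by appending a, b, c in visit order, which happens to reproduce the slide's B5a/B5b and B6a/B6b/B6c. It is a simplification: it assumes the region has no backward branches (real superblock cloning stops at them), `clone_to_tree` is a hypothetical name, and the number of clones can grow exponentially with the number of joins.

```python
from string import ascii_lowercase
from typing import Dict, List


def clone_to_tree(succ: Dict[str, List[str]], entry: str) -> Dict[str, List[str]]:
    """Give every join point a private copy per incoming path (DAG only)."""
    npred = {b: 0 for b in succ}
    for targets in succ.values():
        for t in targets:
            npred[t] += 1
    copies = {b: 0 for b in succ}
    tree: Dict[str, List[str]] = {}

    def visit(b: str) -> str:
        if npred[b] > 1:                       # join point: make a fresh clone
            name = b + ascii_lowercase[copies[b]]  # sketch: at most 26 clones
            copies[b] += 1
        else:
            name = b
        tree[name] = [visit(t) for t in succ[b]]
        return name

    visit(entry)
    return tree


cfg = {"B1": ["B2", "B3"], "B2": ["B4", "B5"], "B3": ["B5"],
       "B4": ["B6"], "B5": ["B6"], "B6": []}
tree = clone_to_tree(cfg, "B1")
print(sorted(tree))
# ['B1', 'B2', 'B3', 'B4', 'B5a', 'B5b', 'B6a', 'B6b', 'B6c']
print(sum(len(t) for t in tree.values()))   # 8 edges ("branches")
```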

15 Scheduling Larger Regions
More Aggressive Superlocal Scheduling
Clone blocks to create more context
Some of the resulting blocks combine
- Single successor, single predecessor
[Figure: the cloned CFG (B1, B2, B3, B4, B5a, B5b, B6a, B6b, B6c); 8 branches]
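Combining blocks is the matching clean-up: whenever a block has a single successor and that successor has a single predecessor, the two can be fused. A sketch over the cloned tree from the figure, with each block's contents written as a string of op names; `merge_chains` is a hypothetical helper and assumes the input is a tree (as cloning produces), so the chain walk cannot loop forever.

```python
from typing import Dict, List, Tuple


def merge_chains(succ: Dict[str, List[str]], code: Dict[str, str],
                 entry: str) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Fuse single-successor blocks into single-predecessor successors."""
    npred = {b: 0 for b in succ}
    for targets in succ.values():
        for t in targets:
            npred[t] += 1
    new_succ: Dict[str, List[str]] = {}
    new_code: Dict[str, str] = {}

    def visit(head: str) -> str:
        b, body = head, code[head]
        # Absorb the chain below head while it has no branch and no join.
        while len(succ[b]) == 1 and npred[succ[b][0]] == 1:
            b = succ[b][0]
            body += code[b]
        new_code[head] = body
        new_succ[head] = [visit(t) for t in succ[b]]
        return head

    visit(entry)
    return new_succ, new_code


tree = {"B1": ["B2", "B3"], "B2": ["B4", "B5a"], "B3": ["B5b"],
        "B4": ["B6a"], "B5a": ["B6b"], "B5b": ["B6c"],
        "B6a": [], "B6b": [], "B6c": []}
code = {"B1": "abcd", "B2": "ef", "B3": "g", "B4": "hi", "B5a": "jk",
        "B5b": "jk", "B6a": "l", "B6b": "l", "B6c": "l"}
merged_succ, merged_code = merge_chains(tree, code, "B1")
print(merged_code)
# {'B1': 'abcd', 'B2': 'ef', 'B4': 'hil', 'B5a': 'jkl', 'B3': 'gjkl'}
print(sum(len(t) for t in merged_succ.values()))   # 4 edges ("branches")
```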

16 Scheduling Larger Regions
More Aggressive Superlocal Scheduling
Clone blocks to create more context
Some of the resulting blocks combine
- Single successor, single predecessor
Now schedule the EBB paths {B1,B2,B4}, {B1,B2,B5a}, {B1,B3}
- Pay heed to compensation code
Works well for forward motion
Backward motion still needs compensation code
- Speeding up one path can slow down others (undo)
[Figure: the CFG after combining: B1 (a, b, c, d), B2 (e, f), B3 (g, j, k, l), B4 (h, i, l), B5a (j, k, l); 4 branches]

17 Scheduling Larger Regions
Trace Scheduling
Start with execution counts for edges
- Obtained by profiling
[Figure: the example CFG (B1 to B6)]

18 Scheduling Larger Regions
Two options:
1. Profile data: instrument the code and run it on “representative data”
2. Static estimates: use some simple heuristic and guess
   - jumps x 1, branches x 0.5, backward branches x 10
Trace Scheduling
Start with execution counts for edges
- Obtained by profiling
Pick the “hot” path
- A “trace” is a maximal length, acyclic path through the CFG
Block counts can mislead us: see B5
[Figure: the example CFG annotated with edge execution counts 10, 3, 7, 5, 5, 5, 3, 2]
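The warning about block counts can be made concrete. Suppose the figure's labels mean B1→B2 = 7, B1→B3 = 3, B2→B4 = 5, B2→B5 = 2, B3→B5 = 3, B4→B6 = 5, B5→B6 = 5, with 10 executions entering B1 (our interpretation of the labels, not stated on the slide). Then B4 and B5 both execute 5 times, yet only 2 of B5's executions come from B2. A path builder looking at block counts sees a tie after B2; edge counts resolve it. `block_counts` is a hypothetical helper.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]


def block_counts(edges: Dict[Edge, int], entry: str,
                 entry_count: int) -> Dict[str, int]:
    """A block's count is the sum of the counts on its incoming edges."""
    counts = {entry: entry_count}
    for (_, dst), n in edges.items():
        counts[dst] = counts.get(dst, 0) + n
    return counts


# Assumed reading of the figure's edge labels.
edges = {("B1", "B2"): 7, ("B1", "B3"): 3, ("B2", "B4"): 5, ("B2", "B5"): 2,
         ("B3", "B5"): 3, ("B4", "B6"): 5, ("B5", "B6"): 5}
counts = block_counts(edges, "B1", 10)
print(counts["B4"], counts["B5"])                    # 5 5  : a tie
print(edges[("B2", "B4")], edges[("B2", "B5")])      # 5 2  : B4 is hotter after B2
```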

19 Scheduling Larger Regions
Trace Scheduling
Start with execution counts for edges
- Obtained by profiling
Pick the “hot” path
- B1,B2,B4,B6
Schedule it
- Compensation code in B3,B5 if needed
- Get the hot path right!
If we picked the right path, the other blocks do not matter as much
- Places a premium on quality profiles
[Figure: the example CFG annotated with edge execution counts]

20 Scheduling Larger Regions
Trace Scheduling Entire CFG
- Pick & schedule hot path
- Insert compensation code
- Remove hot path from CFG
- Repeat the process until CFG is empty
Example
- B1 B2 B4 B6, then B3 B5
- other edges run between scheduled blocks
Idea
- Hot paths matter
- The farther we go off the hot path, the less it matters
[Figure: the example CFG annotated with edge execution counts]
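The trace-picking part of this loop can be sketched greedily: seed each trace with the hottest block not yet scheduled, grow it forward along the hottest edge into an unscheduled block, grow it backward the same way, then remove its blocks and repeat. This is a simplified, hypothetical `pick_traces`: it omits the scheduling and compensation steps, breaks ties by block and edge order, and uses an assumed reading of the figure's edge counts (B1→B2 = 7, B1→B3 = 3, B2→B4 = 5, B2→B5 = 2, B3→B5 = 3, B4→B6 = 5, B5→B6 = 5, with 10 entering B1). On that input it yields the slide's two traces.

```python
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]


def pick_traces(blocks: List[str], edges: Dict[Edge, int], entry: str,
                entry_count: int) -> List[List[str]]:
    """Greedily cover a CFG with traces, hottest first."""
    count = {b: 0 for b in blocks}
    count[entry] = entry_count
    for (_, dst), n in edges.items():
        count[dst] += n
    done: Set[str] = set()

    def hottest(cands: List[Tuple[int, str]]) -> Optional[str]:
        live = [(n, b) for n, b in cands if b not in done]
        # max() keeps the first of equal counts, so ties follow edge order.
        return max(live, key=lambda nb: nb[0])[1] if live else None

    traces = []
    while len(done) < len(blocks):
        seed = max((b for b in blocks if b not in done), key=lambda b: count[b])
        trace = [seed]
        done.add(seed)
        # Grow forward along the hottest edge into an unscheduled block.
        while (nxt := hottest([(n, d) for (s, d), n in edges.items()
                               if s == trace[-1]])) is not None:
            trace.append(nxt)
            done.add(nxt)
        # Grow backward the same way.
        while (prv := hottest([(n, s) for (s, d), n in edges.items()
                               if d == trace[0]])) is not None:
            trace.insert(0, prv)
            done.add(prv)
        traces.append(trace)
    return traces


edges = {("B1", "B2"): 7, ("B1", "B3"): 3, ("B2", "B4"): 5, ("B2", "B5"): 2,
         ("B3", "B5"): 3, ("B4", "B6"): 5, ("B5", "B6"): 5}
print(pick_traces(["B1", "B2", "B3", "B4", "B5", "B6"], edges, "B1", 10))
# [['B1', 'B2', 'B4', 'B6'], ['B3', 'B5']]
```

Because only unscheduled blocks can join a trace, every trace is acyclic, and the leftover edges (B2→B5, B5→B6) run between blocks that belong to different traces. The walrus operator needs Python 3.8 or later.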

21 Extra Materials Start Here

22 Local Scheduling
Schielke’s RBF algorithm (Randomized Backward & Forward)
- Run 5 passes of forward list scheduling and 5 passes of backward list scheduling
- Break each tie randomly
Keep the best schedule
- Shortest time to completion
- Other metrics are possible (shortest time + fewest registers)
In practice, this does very well
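The RBF idea can be sketched on top of a small list scheduler. The assumptions are all ours: one issue slot per cycle; latency-weighted path length as the priority, with equal priorities broken by `random.Random`; backward passes schedule the reversed dependence graph and then mirror the cycle numbers; "best" means the earliest completion. The helper names `schedule`, `finish`, and `rbf` are hypothetical, and this is not Schielke's implementation.

```python
import random
from typing import Dict, List, Tuple

Lat = Dict[Tuple[str, str], int]


def schedule(succs: Dict[str, List[str]], lat: Lat,
             rng: random.Random) -> Dict[str, int]:
    """List-schedule a DAG with one issue slot per cycle.

    lat[(x, y)] is the minimum distance from x's issue cycle to y's. Priority
    is the latency-weighted path length; equal priorities break randomly.
    """
    prio: Dict[str, int] = {}

    def path(op: str) -> int:
        if op not in prio:
            prio[op] = max((lat[(op, s)] + path(s) for s in succs[op]), default=0)
        return prio[op]

    npred = {op: 0 for op in succs}
    for op in succs:
        path(op)
        for s in succs[op]:
            npred[s] += 1
    earliest = {op: 1 for op in succs}
    ready = [op for op in succs if npred[op] == 0]
    start: Dict[str, int] = {}
    cycle = 1
    while ready:
        cands = [op for op in ready if earliest[op] <= cycle]
        if cands:
            op = max(cands, key=lambda o: (prio[o], rng.random()))
            ready.remove(op)
            start[op] = cycle
            for s in succs[op]:
                earliest[s] = max(earliest[s], cycle + lat[(op, s)])
                npred[s] -= 1
                if npred[s] == 0:
                    ready.append(s)
        cycle += 1
    return start


def finish(start: Dict[str, int], delay: Dict[str, int]) -> int:
    """Last cycle in which any op is still in flight (issue + delay - 1)."""
    return max(start[op] + delay[op] - 1 for op in start)


def rbf(deps: Dict[str, List[str]], delay: Dict[str, int],
        passes: int = 5, seed: int = 0) -> Dict[str, int]:
    """Keep the best of `passes` forward and `passes` backward schedules."""
    rng = random.Random(seed)
    fwd: Dict[str, List[str]] = {op: [] for op in deps}
    for op, preds in deps.items():
        for p in preds:
            fwd[p].append(op)
    bwd = {op: list(preds) for op, preds in deps.items()}   # reversed edges
    flat: Lat = {(p, u): delay[p] for u, preds in deps.items() for p in preds}
    blat: Lat = {(u, p): d for (p, u), d in flat.items()}
    best: Dict[str, int] = {}
    for _ in range(passes):
        rev = schedule(bwd, blat, rng)          # cycles counted from the end
        last = max(rev.values())
        for cand in (schedule(fwd, flat, rng),
                     {op: last + 1 - t for op, t in rev.items()}):
            if not best or finish(cand, delay) < finish(best, delay):
                best = cand
    return best


deps = {"a": [], "b": ["a"], "c": [], "d": ["b", "c"], "e": ["d"]}
delay = {"a": 3, "b": 1, "c": 3, "d": 2, "e": 3}
best = rbf(deps, delay)
print(finish(best, delay))   # 9
```

The example reuses a small made-up block; any valid schedule for it that finishes in cycle 9 is optimal, because the chain a → b → d → e alone needs 3 + 1 + 2 + 3 = 9 cycles.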

