Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † *


1 Adapting Convergent Scheduling Using Machine Learning
Diego Puppin*, Mark Stephenson†, Una-May O’Reilly†, Martin Martin†, and Saman Amarasinghe†
* Institute for Information Science and Technologies, Italy
† Massachusetts Institute of Technology, USA

2 Outline
This talk shows how one can apply machine learning techniques to find good phase orderings for an instruction scheduler.
First, I’ll introduce the scheduler that we are interested in improving.
Then, I’ll discuss genetic programming.
Then, I’ll present experimental results.

3 Clustered Architectures
[Figure labels: R4000-like processor core; operand network]
Memory and registers separated into clusters
  → RAW
  → Clustered VLIWs
When scheduling, we try to co-locate data with computation

4 Convergent Scheduling
Convergent scheduling passes are symmetric
Each pass takes as input a preference map and outputs a preference map
Passes are modular and can be applied in any order

5 Convergent Scheduling: Preference Maps
[Figure: preference map over instructions, clusters, and time]
Each entry is a weight
The weights correspond to the “confidence” of a space-time assignment for a given instruction
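A preference map and a pass over it can be sketched as follows. This is a minimal illustration, not the scheduler's actual data structure: the names `uniform_map`, `boost_cluster`, and `preferred_slot` are hypothetical, the eight time slots are chosen for illustration, and normalizing each instruction's weights to sum to 1 is an assumption.

```python
from typing import Dict, List, Tuple

# weights[instruction][cluster][time] holds the confidence of that assignment.
PreferenceMap = Dict[str, List[List[float]]]

def uniform_map(instrs: List[str], n_clusters: int, n_slots: int) -> PreferenceMap:
    """Start with no preference: every (cluster, time) entry gets equal weight."""
    w = 1.0 / (n_clusters * n_slots)
    return {i: [[w] * n_slots for _ in range(n_clusters)] for i in instrs}

def normalize(pmap: PreferenceMap) -> PreferenceMap:
    """Rescale each instruction's weights to sum to 1 (an assumption of this sketch)."""
    out = {}
    for instr, grid in pmap.items():
        total = sum(sum(row) for row in grid)
        out[instr] = [[w / total for w in row] for row in grid]
    return out

def boost_cluster(pmap: PreferenceMap, instr: str, cluster: int, factor: float) -> PreferenceMap:
    """A toy pass: map in, map out. It only nudges weights, so later passes can undo it."""
    out = {i: [row[:] for row in grid] for i, grid in pmap.items()}
    out[instr][cluster] = [w * factor for w in out[instr][cluster]]
    return normalize(out)

def preferred_slot(pmap: PreferenceMap, instr: str) -> Tuple[int, int]:
    """Return the (cluster, time) entry with the highest weight; first one wins ties."""
    grid = pmap[instr]
    slots = [(c, t) for c in range(len(grid)) for t in range(len(grid[c]))]
    return max(slots, key=lambda ct: grid[ct[0]][ct[1]])

pmap = uniform_map(["add", "mul"], n_clusters=4, n_slots=8)
pmap = boost_cluster(pmap, "mul", cluster=2, factor=3.0)
print(preferred_slot(pmap, "mul"))
```

Because every pass has the same map-in, map-out shape, passes like `boost_cluster` can be chained in any order.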

6 Example Dependence Graph
[Figure legend: four clusters; high confidence; low confidence]

7 Placement Propagation

8 Critical Path Strengthening

9 Path Propagation

10 Parallelism Distribute

11 Path Propagation

12 Communication Reduction

13 Path Propagation

14 Final Schedule

15 Convergent Scheduling
“Classical” scheduling passes make absolute decisions that can’t be undone
Convergent scheduling passes make soft decisions in the form of preferences
  → Mistakes made early on can be undone
Passes don’t impose order!

16 Double-Edged Sword
The good news: convergent scheduling does not constrain phase order
  → Nice interface makes writing and integrating passes easy
The bad news: convergent scheduling does not constrain phase order
  → Limitless number of phase orders to consider, some of which are much better than others

17 Our Proposal
Use genetic programming to automatically search for a phase ordering that’s tailored to a given
  → Architecture
  → Compiler
Our inspiration comes from Cooper’s work [Cooper et al., LCTES 1999]

18 Genetic Programming
Search algorithm analogous to Darwinian evolution
  → Maintain a population of expressions
(sequence INITTIME (sequence PLACE (if imbalanced LOAD COMM)))
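One way to read an expression like the one on this slide is as a small program that expands into a phase ordering. The sketch below encodes the expression as nested tuples and flattens it into a list of pass names. The `flatten` helper is hypothetical, and the semantics of `if` (run the first branch when the condition holds, otherwise the second) are an assumption, since the slide does not spell them out.

```python
# Nested tuples stand in for the s-expression shown on the slide.
EXPR = ("sequence", "INITTIME",
        ("sequence", "PLACE",
         ("if", "imbalanced", "LOAD", "COMM")))

def flatten(expr, conditions):
    """Return the list of pass names the expression would run, in order."""
    if isinstance(expr, str):
        return [expr]                      # a leaf is a single pass
    op = expr[0]
    if op == "sequence":
        _, first, second = expr
        return flatten(first, conditions) + flatten(second, conditions)
    if op == "if":
        _, cond, then_branch, else_branch = expr
        chosen = then_branch if conditions[cond] else else_branch
        return flatten(chosen, conditions)
    raise ValueError(f"unknown operator: {op!r}")

print(flatten(EXPR, {"imbalanced": True}))   # ['INITTIME', 'PLACE', 'LOAD']
print(flatten(EXPR, {"imbalanced": False}))  # ['INITTIME', 'PLACE', 'COMM']
```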

19 Genetic Programming
Search algorithm analogous to Darwinian evolution
  → Maintain a population of expressions
  → Selection: the fittest expressions in the population are more likely to reproduce
  → Reproduction: crossing over subexpressions of two expressions
  → Mutation
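The three operators can be sketched on nested-tuple expressions as below. This is an illustrative sketch, not the system's actual operators: tournament selection is one common way to make fitter individuals more likely to reproduce, and the mutation shown (replace a random subtree with a random pass) is an assumption. All function names are hypothetical.

```python
import random

PASSES = ["INITTIME", "PLACE", "LOAD", "COMM"]

def children(expr):
    """Indices of child expressions; slot 1 of an `if` node holds its condition."""
    if not isinstance(expr, tuple):
        return []
    start = 2 if expr[0] == "if" else 1
    return list(range(start, len(expr)))

def subtree_paths(expr, path=()):
    """Yield the index path of every subexpression, root first."""
    yield path
    for i in children(expr):
        yield from subtree_paths(expr[i], path + (i,))

def get(expr, path):
    for i in path:
        expr = expr[i]
    return expr

def replace(expr, path, new):
    """Return a copy of expr with the subtree at path swapped for new."""
    if not path:
        return new
    i = path[0]
    return expr[:i] + (replace(expr[i], path[1:], new),) + expr[i + 1:]

def crossover(a, b, rng):
    """Copy a, replacing one random subtree with a random subtree of b."""
    pa = rng.choice(list(subtree_paths(a)))
    pb = rng.choice(list(subtree_paths(b)))
    return replace(a, pa, get(b, pb))

def mutate(expr, rng):
    """Replace one random subtree with a randomly chosen pass."""
    p = rng.choice(list(subtree_paths(expr)))
    return replace(expr, p, rng.choice(PASSES))

def tournament(population, fitness, rng, k=3):
    """Selection: the fittest of k randomly drawn individuals gets to reproduce."""
    return max(rng.sample(population, k), key=fitness)

rng = random.Random(0)
a = ("sequence", "INITTIME", ("sequence", "PLACE", "COMM"))
b = ("sequence", "LOAD", ("if", "imbalanced", "LOAD", "COMM"))
child = mutate(crossover(a, b, rng), rng)
print(child)
```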

20 General Flow
[Flowchart: create initial population (initial solutions); evaluation; selection; create variants; done?]
Randomly generated initial population

21 General Flow
[Flowchart repeated from slide 20]
Compiler is modified to use the given expression as the phase ordering
Each expression is evaluated by compiling and running the benchmark(s)
Fitness is the relative speedup over our original phase ordering on the benchmark(s)

22 General Flow
[Flowchart repeated from slide 20]
Just as with Natural Selection, the fittest individuals are more likely to survive

23 General Flow
[Flowchart repeated from slide 20]
Use crossover and mutation to generate new expressions
And thus, generate new and hopefully improved phase orderings
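The evaluate, select, and create-variants loop from the flowchart can be sketched generically. Everything below is a simplification for illustration: individuals are flat tuples of pass names rather than expression trees, selection simply keeps the top half, and `toy_fitness` compares against an arbitrary target instead of compiling and running benchmarks.

```python
import random

def evolve(population, fitness, make_variant, generations, rng, keep=0.5):
    """Repeat evaluation -> selection -> create variants for a fixed number of rounds."""
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)      # evaluation
        survivors = ranked[: max(2, int(len(ranked) * keep))]        # selection
        variants = [make_variant(rng.choice(survivors), rng.choice(survivors), rng)
                    for _ in range(len(population) - len(survivors))]  # create variants
        population = survivors + variants
    return max(population, key=fitness)

PASSES = ["inittime", "func", "dep", "load", "comm", "place"]
TARGET = ("inittime", "func", "dep", "comm")  # arbitrary; exists only for the toy fitness

def toy_fitness(seq):
    """Stand-in for measured speedup: count positions that match TARGET."""
    return sum(x == y for x, y in zip(seq, TARGET))

def make_variant(a, b, rng):
    """One-point crossover of two parents, followed by an occasional point mutation."""
    cut = rng.randrange(1, len(a))
    child = list(a[:cut] + b[cut:])
    if rng.random() < 0.2:
        child[rng.randrange(len(child))] = rng.choice(PASSES)
    return tuple(child)

rng = random.Random(0)
initial = [tuple(rng.choice(PASSES) for _ in range(len(TARGET))) for _ in range(30)]
best = evolve(initial, toy_fitness, make_variant, generations=40, rng=rng)
print(best, toy_fitness(best))
```

Because the best individuals are always carried over, the best fitness never decreases from one generation to the next in this sketch.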

24 Experimental Setup
We use an in-house VLIW compiler (SUIF, MachSUIF) and simulator
Compiler and simulator are parameterized so we can easily change VLIW configurations
Experiments presented here are for clustered architectures
  → Details of the architectures are in the paper

25 Convergent Scheduling Heuristics
Noise Introduction
Initial Time Assignment
Preplacement
Critical Path Strengthening
Communication Minimization
Parallelism Distribution
Load Balance
Dependence Enforcement
Assignment Strengthening
Functional Unit Distribution
Push to first cluster
Critical Path Distance
Cluster Creation
Register Pressure Reduction in Time
Register Pressure Reduction in Space

26 Hand-Tuned Results 4-cluster VLIW, Rich Interconnect

27 Results 4-cluster VLIW, Limited Interconnect

28 Training an Improved Sequence
Goal: find a sequence that works well for all the benchmarks in the last graph (vmul, rbsorf, yuv, etc.)
Train a sequence using these benchmarks then…
  → For each expression in the population, compile and run all the benchmarks, and take the average speedup as fitness
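The fitness computation described here can be sketched as follows. The cycle counts are invented for illustration, the helper names are hypothetical, and using the arithmetic mean is an assumption, since the slide only says "average".

```python
from statistics import mean

def speedup(baseline_cycles, candidate_cycles):
    """Relative speedup of the candidate ordering over the original ordering."""
    return baseline_cycles / candidate_cycles

def fitness(baseline, candidate):
    """Average the per-benchmark speedups (arithmetic mean assumed)."""
    return mean(speedup(baseline[b], candidate[b]) for b in baseline)

# Cycle counts below are made up for illustration only.
baseline = {"vmul": 1000, "rbsorf": 2000, "yuv": 1500}
candidate = {"vmul": 800, "rbsorf": 2000, "yuv": 1000}

print(fitness(baseline, candidate))  # (1.25 + 1.0 + 1.5) / 3 = 1.25
```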

29 The Schedule
Evolved sequence is much more conservative in communication
inittime → func → dep → func → load → func → dep → func → comm → dep → func → comm → place
func reduces weights of instructions on overloaded clusters
dep increases the probability that dependent instructions are scheduled “nearby”
comm tries to keep neighboring instructions in the same cluster

30 Results 4-cluster VLIW, Limited Interconnect

31 Results Leave-One-Out Cross Validation
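Leave-one-out cross validation holds out one benchmark at a time: a sequence is evolved on the remaining benchmarks and then measured on the one it never saw. A minimal sketch of the splits, using only the three benchmark names mentioned earlier in the talk (the helper name is hypothetical):

```python
def leave_one_out(benchmarks):
    """Yield (training set, held-out benchmark) pairs, one per benchmark."""
    for i, held_out in enumerate(benchmarks):
        yield benchmarks[:i] + benchmarks[i + 1:], held_out

for train, test in leave_one_out(["vmul", "rbsorf", "yuv"]):
    print("evolve on", train, "then evaluate on", test)
```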

32 Summary of Results
When we changed the architecture, the hand-tuned sequence failed
  → UAS and PCC outperform convergent scheduling
Our GP system found a sequence that usually outperforms UAS and PCC
Cross validation suggests that it is possible to find a “general-purpose” sequence

33 Running Time
Using about 20 machines in a small cluster of workstations, it takes about 2 days to evolve a sequence
This is a one-time process!
  → Performed by the compiler vendor

34 Disappointing Result
Unfortunately, sequences with conditionals are weeded out of the GP selection process
  → Our system rewards parsimony
  → Convergent scheduling passes make soft decisions, so running an extra pass may not be detrimental
We’d like to get to the bottom of this unexpected result
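One way the two factors on this slide could interact is sketched below. It assumes, purely for illustration, that parsimony is applied as a tie-break on expression size; the slide does not say how the system actually rewards parsimony. If always running both passes performs as well as choosing between them, the conditional form only adds a node and loses the tie.

```python
def size(expr):
    """Count nodes in a nested-tuple expression (a condition counts as one node)."""
    if isinstance(expr, str):
        return 1
    return 1 + sum(size(child) for child in expr[1:])

def rank_key(expr, fitness):
    """Higher fitness wins; among equals, the smaller expression wins (assumed tie-break)."""
    return (fitness(expr), -size(expr))

with_if = ("sequence", "PLACE", ("if", "imbalanced", "LOAD", "COMM"))
without_if = ("sequence", "PLACE", ("sequence", "LOAD", "COMM"))

# Hypothetical case: both orderings yield the same measured speedup.
same_speedup = lambda expr: 1.0

winner = max([with_if, without_if], key=lambda e: rank_key(e, same_speedup))
print(size(with_if), size(without_if), winner)
```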

35 Conclusions
Using GP we’re able to find architecture-specific, application-independent sequences
We can quickly retune the compiler when
  → The architecture changes
  → The compiler itself changes

37 Implemented Tests

