Adapting Convergent Scheduling Using Machine Learning
Diego Puppin*, Mark Stephenson†, Una-May O'Reilly†, Martin Martin†, and Saman Amarasinghe†
* Institute for Information Science and Technologies, Italy
† Massachusetts Institute of Technology, USA
Outline
This talk shows how machine learning can be used to find good phase orderings for an instruction scheduler.
- First, I'll introduce the scheduler we are interested in improving
- Then, I'll discuss genetic programming
- Finally, I'll present experimental results
Clustered Architectures
- Memory and registers are separated into clusters (e.g., RAW, clustered VLIWs)
- Each cluster resembles an R4000-like processor core, connected to the others by an operand network
- When scheduling, we try to co-locate data with computation
Convergent Scheduling
- Convergent scheduling passes are symmetric: each pass takes a preference map as input and outputs a preference map
- Passes are modular and can be applied in any order
Convergent Scheduling: Preference Maps
[Figure: a grid of weights indexed by instruction, cluster, and time slot]
- Each entry is a weight
- The weights correspond to the "confidence" of a space-time assignment for a given instruction
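As a rough illustration of the data structure the talk describes, here is a minimal sketch of a preference map: for every instruction, a weight for each (cluster, time-slot) pair. The function names and the normalization convention are assumptions for illustration, not the paper's actual implementation.

```python
def make_preference_map(n_instr, n_clusters, n_slots):
    """Start every instruction with uniform (no-preference) weights."""
    w = 1.0 / (n_clusters * n_slots)
    return [[[w for _ in range(n_slots)] for _ in range(n_clusters)]
            for _ in range(n_instr)]

def normalize(prefs):
    """Rescale each instruction's weights so they sum to 1."""
    for instr in prefs:
        total = sum(sum(row) for row in instr)
        for row in instr:
            for t in range(len(row)):
                row[t] /= total
    return prefs

def best_assignment(prefs, i):
    """Highest-confidence (cluster, time-slot) pair for instruction i."""
    return max(((c, t) for c in range(len(prefs[i]))
                for t in range(len(prefs[i][c]))),
               key=lambda ct: prefs[i][ct[0]][ct[1]])
```

A pass that raises one entry's weight thereby raises the scheduler's confidence in that space-time assignment, which is exactly the "soft decision" the later slides rely on.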
Example Dependence Graph
[Figure: a dependence graph spread over four clusters; shading marks high- vs. low-confidence assignments]
Placement Propagation
Critical Path Strengthening
Path Propagation
Parallelism Distribution
Path Propagation
Communication Reduction
Path Propagation
Final Schedule
Convergent Scheduling
- "Classical" scheduling passes make absolute decisions that can't be undone
- Convergent scheduling passes make soft decisions in the form of preferences
- Mistakes made early on can be undone
- Passes don't impose an order!
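Because every pass shares the same interface, map in, map out, passes compose in any order. The following sketch shows that interface with two toy passes; the pass names, the simplified {instruction: {cluster: weight}} map shape, and the pass bodies are all illustrative assumptions, not the paper's code.

```python
from functools import reduce

def critical_path_pass(prefs):
    """Toy pass: nudge instruction 'a' toward cluster 0."""
    out = {i: dict(cw) for i, cw in prefs.items()}
    out["a"][0] *= 2.0
    return out

def load_balance_pass(prefs):
    """Toy pass: pull each instruction's weights slightly toward uniform."""
    out = {}
    for i, cw in prefs.items():
        mean = sum(cw.values()) / len(cw)
        out[i] = {c: 0.5 * w + 0.5 * mean for c, w in cw.items()}
    return out

def run_sequence(passes, prefs):
    """Apply passes left to right; the order is a free choice."""
    return reduce(lambda p, f: f(p), passes, prefs)
```

Either ordering of the two passes type-checks and runs; which ordering produces the better schedule is exactly the question the rest of the talk addresses.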
Double-Edged Sword
- The good news: convergent scheduling does not constrain phase order. A clean interface makes writing and integrating passes easy
- The bad news: convergent scheduling does not constrain phase order. There is a limitless number of phase orders to consider, some of which are much better than others
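To give a feel for "limitless": the talk later lists about 15 heuristics, and the evolved sequence it reports contains 12 passes (with repeats allowed). Even restricting to fixed-length sequences of that size, before counting conditionals in the expression grammar, the space is far too large to enumerate:

```python
# ~15 candidate passes, sequences of length 12 with repetition allowed
n_passes, length = 15, 12
orderings = n_passes ** length
print(orderings)  # 129,746,337,890,625 candidate orderings
```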
Our Proposal
- Use genetic programming to automatically search for a phase ordering catered to a given architecture and compiler
- Our inspiration comes from Cooper's work [Cooper et al., LCTES 1999]
Genetic Programming
- A search algorithm analogous to Darwinian evolution
- Maintains a population of expressions, e.g.:
  (sequence INITTIME (sequence PLACE (if imbalanced LOAD COMM)))
Genetic Programming
- A search algorithm analogous to Darwinian evolution; maintains a population of expressions
- Selection: the fittest expressions in the population are more likely to reproduce
- Reproduction: crossing over subexpressions of two expressions
- Mutation: randomly altering a subexpression
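Crossover and mutation over such expressions can be sketched as operations on nested lists mirroring the (sequence ...) form above. This is a generic GP sketch under assumed representations, not the paper's implementation; the pass vocabulary and helper names are made up for illustration.

```python
import random

PASSES = ["INITTIME", "PLACE", "LOAD", "COMM", "DEP"]

def subtrees(expr, path=()):
    """Yield (path, node) for every node in an expression tree."""
    yield path, expr
    if isinstance(expr, list):
        for k, child in enumerate(expr[1:], start=1):
            yield from subtrees(child, path + (k,))

def replace_at(expr, path, new):
    """Return a copy of expr with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    out = list(expr)
    out[path[0]] = replace_at(out[path[0]], path[1:], new)
    return out

def crossover(a, b, rng):
    """Swap a random subexpression of a with a random subexpression of b."""
    pa, _ = rng.choice(list(subtrees(a)))
    _, sb = rng.choice(list(subtrees(b)))
    return replace_at(a, pa, sb)

def mutate(expr, rng):
    """Replace a random subexpression with a randomly chosen pass."""
    p, _ = rng.choice(list(subtrees(expr)))
    return replace_at(expr, p, rng.choice(PASSES))
```

Repeatedly applying these two operators to selected parents is what "create variants" means in the flow on the next slides.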
General Flow
Create initial population (randomly generated initial solutions) → Evaluation → Selection → Create Variants → repeat until done
General Flow: Evaluation
- The compiler is modified to use the given expression as the phase ordering
- Each expression is evaluated by compiling and running the benchmark(s)
- Fitness is the relative speedup over our original phase ordering on the benchmark(s)
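The evaluation step can be sketched as follows. The real fitness comes from compiling with the candidate ordering and running the simulator; here `run_benchmark` is a stand-in stub, and its behavior (shorter orderings run slightly faster) is an arbitrary assumption for illustration only.

```python
BASELINE_CYCLES = 1_000_000  # cycles under the original hand-tuned ordering

def run_benchmark(ordering):
    """Stub for 'compile with `ordering`, run on the simulator,
    report cycle count'; the formula here is made up."""
    return 900_000 + 10_000 * len(ordering)

def fitness(ordering):
    """Relative speedup over the original phase ordering (>1.0 is better)."""
    return BASELINE_CYCLES / run_benchmark(ordering)
```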
General Flow: Selection
- Just as with natural selection, the fittest individuals are more likely to survive
General Flow: Create Variants
- Use crossover and mutation to generate new expressions
- And thus generate new, and hopefully improved, phase orderings
Experimental Setup
- We use an in-house VLIW compiler (SUIF, MachSUIF) and simulator
- Compiler and simulator are parameterized so we can easily change VLIW configurations
- Experiments presented here are for clustered architectures; details of the architectures are in the paper
Convergent Scheduling Heuristics
- Noise Introduction
- Initial Time Assignment
- Preplacement
- Critical Path Strengthening
- Communication Minimization
- Parallelism Distribution
- Load Balance
- Dependence Enforcement
- Assignment Strengthening
- Functional Unit Distribution
- Push to First Cluster
- Critical Path Distance
- Cluster Creation
- Register Pressure Reduction in Time
- Register Pressure Reduction in Space
Hand-Tuned Results
4-cluster VLIW, Rich Interconnect
Results
4-cluster VLIW, Limited Interconnect
Training an Improved Sequence
- Goal: find a sequence that works well for all the benchmarks in the last graph (vmul, rbsorf, yuv, etc.)
- Train a sequence using these benchmarks: for each expression in the population, compile and run all the benchmarks and take the average speedup as its fitness
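The multi-benchmark fitness described above can be sketched in a few lines; the cycle counts in the usage below are made up for illustration.

```python
def average_speedup(candidate_cycles, baseline_cycles):
    """Mean per-benchmark speedup of a candidate ordering over the
    original ordering, given one cycle count per benchmark."""
    ratios = [b / c for b, c in zip(baseline_cycles, candidate_cycles)]
    return sum(ratios) / len(ratios)
```

For example, a candidate that is 10% faster on one benchmark and 5% slower on another scores (1.10 + 0.95) / 2 = 1.025.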
The Schedule
- The evolved sequence is much more conservative in communication:
  inittime func dep func load func dep func comm dep func comm place
- func: reduces the weights of instructions on overloaded clusters
- dep: increases the probability that a dependent instruction is scheduled "nearby"
- comm: tries to keep neighboring instructions in the same cluster
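To make the pipeline reading of that sequence concrete, here is a hypothetical sketch that runs the evolved ordering left to right; the pass bodies are stubs that only record execution order, standing in for real map-to-map passes.

```python
# The evolved ordering quoted on the slide, applied left to right.
EVOLVED = "inittime func dep func load func dep func comm dep func comm place"

def make_stub(name):
    def stub(prefs):
        # A real pass would reweight the preference map; the stub
        # just records that the pass ran.
        return prefs + [name]
    return stub

def run(sequence, prefs):
    for name in sequence.split():
        prefs = make_stub(name)(prefs)
    return prefs
```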
Results
4-cluster VLIW, Limited Interconnect
Results
Leave-One-Out Cross Validation
Summary of Results
- When we changed the architecture, the hand-tuned sequence failed: UAS and PCC outperform convergent scheduling there
- Our GP system found a sequence that usually outperforms UAS and PCC
- Cross validation suggests that it is possible to find a "general-purpose" sequence
Running Time
- Using about 20 machines in a small cluster of workstations, it takes about 2 days to evolve a sequence
- This is a one-time process, performed by the compiler vendor
Disappointing Result
- Unfortunately, sequences with conditionals are weeded out of the GP selection process
- Our system rewards parsimony
- Convergent scheduling passes make soft decisions, so running an extra pass may not be detrimental
- We'd like to get to the bottom of this unexpected result
Conclusions
- Using GP, we're able to find architecture-specific, application-independent sequences
- We can quickly retune the compiler when the architecture changes or the compiler itself changes
Implemented Tests