LLVM Greedy Register Allocator – Improving Region Split Decisions

Slides:



Advertisements
Similar presentations
Code Optimization and Performance Chapter 5 CS 105 Tour of the Black Holes of Computing.
Advertisements

Register Allocation COS 320 David Walker (with thanks to Andrew Myers for many of these slides)
Register Allocation Consists of two parts: Goal : minimize spills
8. Static Single Assignment Form Marcus Denker. © Marcus Denker SSA Roadmap  Static Single Assignment Form (SSA)  Converting to SSA Form  Examples.
School of EECS, Peking University “Advanced Compiler Techniques” (Fall 2011) SSA Guo, Yao.
Register Usage Keep as many values in registers as possible Register assignment Register allocation Popular techniques – Local vs. global – Graph coloring.
P3 / 2004 Register Allocation. Kostis Sagonas 2 Spring 2004 Outline What is register allocation Webs Interference Graphs Graph coloring Spilling Live-Range.
Register Allocation CS 320 David Walker (with thanks to Andrew Myers for most of the content of these slides)
COMPILERS Register Allocation hussein suleman uct csc305w 2004.
Stanford University CS243 Winter 2006 Wei Li 1 Register Allocation.
Register Allocation CS 671 March 27, CS 671 – Spring Register Allocation - Motivation Consider adding two numbers together: Advantages: Fewer.
Carnegie Mellon Lecture 6 Register Allocation I. Introduction II. Abstraction and the Problem III. Algorithm Reading: Chapter Before next class:
Quincy: Fair Scheduling for Distributed Computing Clusters Microsoft Research Silicon Valley SOSP’09 Presented at the Big Data Reading Group by Babu Pillai.
1 CS 201 Compiler Construction Lecture 12 Global Register Allocation.
Cpeg421-08S/final-review1 Course Review Tom St. John.
Improving code generation. Better code generation requires greater context Over expressions: optimal ordering of subtrees Over basic blocks: Common subexpression.
Register Allocation (Slides from Andrew Myers). Main idea Want to replace temporary variables with some fixed set of registers First: need to know which.
1 Register Allocation Consists of two parts: –register allocation What will be stored in registers –Only unambiguous values –register assignment Which.
1 CS 201 Compiler Construction Lecture 13 Instruction Scheduling: Trace Scheduler.
U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Emery Berger University of Massachusetts, Amherst Advanced Compilers CMPSCI 710.
2005 International Symposium on Code Generation and Optimization Progressive Register Allocation for Irregular Architectures David Koes
Improving Code Generation Honors Compilers April 16 th 2002.
Register Allocation and Spilling via Graph Coloring G. J. Chaitin IBM Research, 1982.
CS745: Register Allocation© Seth Copen Goldstein & Todd C. Mowry Register Allocation.
Generic Software Pipelining at the Assembly Level Markus Pister
Supplementary Lecture – Register Allocation EECS 483 University of Michigan.
Fall 2012 Chapter 2: x86 Processor Architecture. Irvine, Kip R. Assembly Language for x86 Processors 6/e, Chapter Overview General Concepts IA-32.
1 October 18, October 18, 2015October 18, 2015October 18, 2015 Azusa, CA Sheldon X. Liang Ph. D. Azusa Pacific University, Azusa, CA 91702, Tel:
U NIVERSITY OF D ELAWARE C OMPUTER & I NFORMATION S CIENCES D EPARTMENT Optimizing Compilers CISC 673 Spring 2009 Register Allocation John Cavazos University.
Predicated Static Single Assignment (PSSA) Presented by AbdulAziz Al-Shammari
U NIVERSITY OF D ELAWARE C OMPUTER & I NFORMATION S CIENCES D EPARTMENT Optimizing Compilers CISC 673 Spring 2009 Static Single Assignment John Cavazos.
Project Presentation by Joshua George Advisor – Dr. Jack Davidson.
Register Usage Keep as many values in registers as possible Keep as many values in registers as possible Register assignment Register assignment Register.
Register Allocation CS 471 November 12, CS 471 – Fall 2007 Register Allocation - Motivation Consider adding two numbers together: Advantages: Fewer.
2/22/2016© Hal Perkins & UW CSEP-1 CSE P 501 – Compilers Register Allocation Hal Perkins Winter 2008.
COMPILERS Liveness Analysis hussein suleman uct csc3003s 2009.
Simone Campanoni SSA Simone Campanoni
Chapter Overview General Concepts IA-32 Processor Architecture
Code Optimization Overview and Examples
Global Register Allocation Based on
Topic Register Allocation
C function call conventions and the stack
Mooly Sagiv html://
Optimizing Compilers Background
The compilation process
Register Allocation Hal Perkins Autumn 2009
Static Single Assignment
Topic Register Allocation
Introduction to Compilers Tim Teitelbaum
Register Allocation Hal Perkins Autumn 2011
Optimizing Transformations Hal Perkins Autumn 2011
CS 201 Compiler Construction
Optimizing Transformations Hal Perkins Winter 2008
Register Allocation Hal Perkins Summer 2004
Register Allocation Hal Perkins Autumn 2005
Lecture 16: Register Allocation
Optimizing Compilers CISC 673 Spring 2011 Static Single Assignment II
Instruction Scheduling Hal Perkins Autumn 2005
Compiler Construction
Lecture 17: Register Allocation via Graph Colouring
Code Generation Part II
COMPILERS Liveness Analysis
CSE P 501 – Compilers SSA Hal Perkins Autumn /31/2019
Computer Organization and Assembly Language
(via graph coloring and spilling)
Return-to-libc Attacks
Register Allocation Harry Xu CS 142 (b) 05/08/2018.
CS 201 Compiler Construction
Presentation transcript:

LLVM Greedy Register Allocator – Improving Region Split Decisions Marina Yatsina Intel Corporation, Israel April 16-17, 2018 European Developers Meeting Bristol, United Kingdom

Motivation Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

Motivation Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

Motivation * idiv implicitly clobbers %edx Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

Motivation %ecx %ebx %edi %edx %ebp %ecx %ebx %edi %ebp %ecx %ebx %edi * idiv implicitly clobbers %edx Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

Motivation %ecx %ebx %edi %edx %ebp %ecx %ebx %edi %ebp %ecx %ebx %edi * idiv implicitly clobbers %edx Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

Motivation %ecx %ebx %edi %edx %ebp %ecx %ebx %edi %ebp %ecx %ebx %edi * idiv implicitly clobbers %edx Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

Motivation %edx %ebp %ebp %edx * idiv implicitly clobbers %edx Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

Motivation %edx %ebp %ebp %edx * idiv implicitly clobbers %edx Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll The redundant copies were inserted by the greedy register allocator.

Greedy Register Allocator Greedy Register Allocator Overview Region Split Encountered Issues Performance Impact

Greedy Register Allocator Greedy Register Allocator Overview Region Split Encountered Issues Performance Impact

High Level Design of Code Generator Instruction Selection Instruction Selection Scheduling and Formation Scheduling and Formation SSA-based Machine Code Optimizations SSA-based Machine Code Optimizations Register Allocation Prolog/Epilog Code Insertion Late Machine Code Optimizations Code Emission

Greedy Register Allocator Overview General flow Live Interval Analysis Spill Weight Calculation Priority Queue Construction Register Assignment Eviction Split Spill

Greedy Register Allocator Overview General flow Live Interval Analysis Spill Weight Calculation Priority Queue Construction Register Assignment Eviction Split Spill

Greedy Register Allocator Overview General flow Stand alone pass Live Interval Analysis Spill Weight Calculation Priority Queue Construction Register Assignment Eviction Split Spill

Live Interval Analysis BB#0: x = … … BB#1: … = …x… … BB#2: … … = …x…

Live Interval Analysis Earlier pass named SlotIndexes added numbering to the instructions 000 BB#0: 001 x = … 002 … 003 BB#1: 004 … = …x… 005 … 006 BB#2: 007 … 008 … = …x… 009 …

Live Interval Analysis Analyze x’s live interval x 000 BB#0: 001 x = … 002 … 003 BB#1: 004 … = …x… 005 … 006 BB#2: 007 … 008 … = …x… 009 …

Live Interval Analysis Look for uses and defs x 000 BB#0: 001 x = … 002 … 003 BB#1: 004 … = …x… 005 … 006 BB#2: 007 … 008 … = …x… 009 …

Live Interval Analysis Connect them into intervals x 000 BB#0: 001 x = … 002 … 003 BB#1: 004 … = …x… 005 … 006 BB#2: 007 … 008 … = …x… 009 …

Live Interval Analysis Represented x’s liveness as a collection of segments x [001, 004), [006, 008) 000 BB#0: 001 x = … 002 … 003 BB#1: 004 … = …x… 005 … 006 BB#2: 007 … 008 … = …x… 009 …

Greedy Register Allocator Overview General flow Live Interval Analysis Spill Weight Calculation Priority Queue Construction Register Assignment Eviction Split Spill

Greedy Register Allocator Overview RegAllocGreedy Pass General flow Live Interval Analysis Spill Weight Calculation Priority Queue Construction Register Assignment Eviction Split Spill

Greedy Register Allocator Overview General flow Live Interval Analysis Spill Weight Calculation Priority Queue Construction Register Assignment Eviction Split Spill

Greedy Register Allocator Overview General flow For every interval Live Interval Analysis Spill Weight Calculation Priority Queue Construction Register Assignment Eviction Split Spill

Spill Weight Calculation Intervals calculated by Live Interval Analysis w x y z

Spill Weight Calculation Estimate spill weight of each interval based on interval characteristics w has uses in a hot loop Higher spill weight x is cheaply rematerializable Lower spill weight w x y z 30 KG 20 KG 10 KG 5 KG Based on number of uses/defs and their block frequency, if instruction is rematerializable or looks like induction variable etc.

Greedy Register Allocator Overview General flow Live Interval Analysis Spill Weight Calculation Priority Queue Construction Register Assignment Eviction Split Spill

Greedy Register Allocator Overview General flow For every interval Live Interval Analysis Spill Weight Calculation Priority Queue Construction Register Assignment Eviction Split Spill

Priority Queue Construction w x y z

Priority Queue Construction w x y z

Priority Queue Construction w x y z

Priority Queue Construction Calculate interval allocation priority and insert into the queue w x y z Priority Queue

Priority Queue Construction Calculate interval allocation priority and insert into the queue w is local in one basic block Lower allocation priority x y z Priority Queue w

Priority Queue Construction Calculate interval allocation priority and insert into the queue x is global and spans across a lot of instructions Higher allocation priority y z Priority Queue w x

Priority Queue Construction Calculate interval allocation priority and insert into the queue z Priority Queue y w x

Priority Queue Construction Calculate interval allocation priority and insert into the queue Priority Queue y w z x

Priority Queue Construction The Priority Queue will always dequeue the interval with the highest priority Priority Queue y w z x

Greedy Register Allocator Overview General flow Live Interval Analysis Spill Weight Calculation Priority Queue Construction Register Assignment Eviction Split Spill

Greedy Register Allocator Overview General flow Live Interval Analysis Spill Weight Calculation Priority Queue Construction For every interval in queue Register Assignment Eviction Split Spill

Greedy Register Allocator Overview General flow Live Interval Analysis Spill Weight Calculation Priority Queue Construction Register Assignment Eviction Split Spill

Register Assignment R0, R1 are the physical registers in the system R0

Register Assignment R0 R1 Priority Queue w x y z

Register Assignment Dequeue interval with highest priority R0 R1 y w z Priority Queue y w z x

Register Assignment x R0 R1 Priority Queue y w z

Register Assignment Assign to available register if possible R0 R1 y w x Priority Queue x y w z x

Register Assignment Dequeue interval with highest priority R0 R1 y w z x Priority Queue x y w z x

Register Assignment z R0 R1 x Priority Queue x y w x

Register Assignment Assign to available register if possible R0 R1 y w x Priority Queue z x y w z x

Register Assignment Dequeue interval with highest priority R0 R1 y w x Priority Queue z x y w z x

Register Assignment w R0 R1 x Priority Queue z x y z x

Register Assignment Interference with x in R0 w R0 R1 y x z x z x Priority Queue z x y z x

Register Assignment Interference with z in R1 w R0 R1 y x z x z x Priority Queue z x y z x

Greedy Register Allocator Overview General flow Live Interval Analysis Spill Weight Calculation Priority Queue Construction Register Assignment Eviction Split Spill

Eviction w R0 R1 x Priority Queue z x y z x

Eviction Compare spill weights of interfering intervals w R0 R1 y x z Priority Queue z x y z x 30 KG 20 KG 5 KG

Eviction x cheaper than w w R0 R1 y x z x z x 30 KG 20 KG 5 KG Priority Queue z x y z x 30 KG 20 KG 5 KG

Eviction Evict x w x R0 R1 Priority Queue z y z

Eviction Assign w x R0 R1 w Priority Queue z y z

Eviction R0 R1 w Priority Queue z y x z

Eviction Enqueue x back to the queue R0 R1 Usually receives the same allocation priority R0 R1 w Priority Queue z y x z Because it’s the old allocation priority it means it will probably enqueued with very high priority to the beginning of the queue.

Register Assignment R0 R1 w Priority Queue z y x z

Register Assignment Dequeue interval with highest priority R0 R1 y x w Priority Queue z y x z

Register Assignment x R0 R1 w Priority Queue z y z

Register Assignment Interference with w in R0 x R0 R1 y w z z Priority Queue z y z

Register Assignment Interference with z in R1 x R0 R1 y w z z Priority Queue z y z

Eviction Compare spill weights of interfering intervals x R0 R1 Can we Evict? x R0 R1 w Priority Queue z y z 30 KG 20 KG 5 KG

Eviction Compare spill weights of interfering intervals x R0 R1 Can we Evict? No! x is the cheapest one x R0 R1 w Priority Queue z y z 30 KG 20 KG 5 KG

Register Assignment x R0 R1 w Priority Queue z y z

Register Assignment Mark x to be split x* R0 R1 w Priority Queue z y z

Register Assignment R0 R1 w Priority Queue z y x* z

Register Assignment Enqueue x* back to the queue R0 R1 Intervals marked to be split receive lower allocation priority R0 R1 w Priority Queue z y x* z

Register Assignment R0 R1 w Priority Queue z x* y z

Register Assignment Dequeue interval with highest priority R0 R1 x* y Priority Queue z x* y z

Register Assignment y R0 R1 w Priority Queue z x* z

Register Assignment Assign to available register if possible R0 R1 x* w Priority Queue z x* y z

Register Assignment Dequeue interval with highest priority R0 R1 x* w Priority Queue z x* y z

Register Assignment x is marked to be split x* R0 R1 w z y z Priority Queue z y z

Greedy Register Allocator Overview General flow Live Interval Analysis Spill Weight Calculation Priority Queue Construction Register Assignment Eviction Split Spill

Split Is split beneficial? x* R0 R1 w Priority Queue z y z

Split Is split beneficial? x* R0 R1 (Assume) Yes! w z y z Priority Queue z y z

Split Is split beneficial? x* R0 R1 (Assume) Yes! Find the best way to split x* R0 R1 w Priority Queue z y z

Split Do the split Add COPY instruction between the intervals x* x1 x2 Priority Queue z x2 = COPY x1 y z x1 = COPY x2

Split x1 x2 R0 R1 w Priority Queue z y z

Split Calculate spill weights x1 x2 R0 R1 Split artifacts may receive higher weight than the original interval x1 x2 R0 R1 w Priority Queue z y z 15 KG For example high use density in the resulted interval x2 may cause it to have higher spill weight then the original interval x. 2 KG

Split R0 R1 w Priority Queue z x1 x2 y z

Split Enqueue x1, x2 into the queue R0 R1 This will also calculate their allocation priority R0 R1 w Priority Queue z x1 x2 y z

Register Assignment R0 R1 w Priority Queue z x2 x1 y z

Register Assignment Dequeue interval with highest priority R0 R1 x2 x1 Priority Queue z x2 x1 y z

Register Assignment x1 R0 R1 w Priority Queue z x2 y z

Register Assignment Assign to available register if possible R0 R1 x2 w Priority Queue z x1 x2 y z x1

Register Assignment Dequeue interval with highest priority R0 R1 x2 x1 Priority Queue z x1 x2 y z x1

Register Assignment x2 R0 R1 x1 w Priority Queue z x1 y z x1

Register Assignment Interference with y in R0 x2 R0 R1 x1 w z x1 y z Priority Queue z x1 y z x1

Register Assignment Interference with z in R1 x2 R0 R1 x1 w z x1 y z Priority Queue z x1 y z x1

Register Assignment x2 R0 R1 x1 w Priority Queue z x1 y z x1

Eviction Compare spill weights of interfering intervals x2 R0 R1 x1 w Priority Queue z x1 y z x1 20 KG 15 KG 10 KG

Eviction y cheaper than x2 x2 R0 R1 x1 w z x1 y z x1 20 KG 15 KG 10 KG Priority Queue z x1 y z x1 20 KG 15 KG 10 KG

Eviction Evict y x2 y R0 R1 x1 w Priority Queue z x1 z x1

Eviction A split artifact can evict original interval Part of the split-eviction gradual refinement x2 y R0 R1 x1 w Priority Queue z x1 z x1

Eviction Assign x2 y R0 R1 x1 w Priority Queue z x1 x2 z x2 x1

Eviction R0 R1 x1 w Priority Queue z x1 y x2 z x2 x1

Eviction Enqueue y back to the queue R0 R1 y x1 w z x1 x2 z x2 x1 Priority Queue z x1 y x2 z x2 x1

Register Assignment R0 R1 x1 w Priority Queue z x1 y x2 z x2 x1

Register Assignment Dequeue interval with highest priority R0 R1 y x1 Priority Queue z x1 y x2 z x2 x1

Register Assignment y R0 R1 x1 w Priority Queue z x1 x2 z x2 x1

Register Assignment Interference with x2 in R0 y R0 R1 x1 w z x1 x2 z Priority Queue z x1 x2 z x2 x1

Register Assignment Interference with z in R1 y R0 R1 x1 w z x1 x2 z Priority Queue z x1 x2 z x2 x1

Eviction Compare spill weights of interfering intervals y R0 R1 Can we Evict? y R0 R1 x1 w Priority Queue z x1 x2 z x2 x1 20 KG 15 KG 10 KG

Eviction Compare spill weights of interfering intervals y R0 R1 Can we Evict? No! y is the cheapest one y R0 R1 x1 w Priority Queue z x1 x2 z x2 x1 20 KG 15 KG 10 KG

Eviction y R0 R1 x1 w Priority Queue z x1 x2 z x2 x1

Register Assignment Mark y to be split y* R0 R1 x1 w z x1 x2 z x2 x1 Priority Queue z x1 x2 z x2 x1

Register Assignment R0 R1 x1 w Priority Queue z x1 y* x2 z x2 x1

Register Assignment Enqueue y* back to the queue R0 R1 y* x1 w z x1 x2 Priority Queue z x1 y* x2 z x2 x1

Register Assignment R0 R1 x1 w Priority Queue z x1 y* x2 z x2 x1

Register Assignment Dequeue interval with highest priority R0 R1 y* x1 Priority Queue z x1 y* x2 z x2 x1

Register Assignment Register Assignment y* R0 R1 x1 w z x1 x2 z x2 x1 Priority Queue z x1 x2 z x2 x1

Split Is split beneficial? y* R0 R1 x1 w z x1 x2 z x2 x1 Priority Queue z x1 x2 z x2 x1

Split Is split beneficial? y* R0 R1 (Assume) No! x1 w z x1 x2 z x2 x1 Priority Queue z x1 x2 z x2 x1

Greedy Register Allocator Overview General flow Live Interval Analysis Spill Weight Calculation Priority Queue Construction Register Assignment Eviction Split Spill

Spill y* R0 R1 x1 w Priority Queue z x1 x2 z x2 x1

Spill Spill around uses y* R0 R1 Spill after def Reload before use x1 w Priority Queue z x1 x2 spill z reload x2 x1

Spill Create new intervals for spills and reloads y* y1 y2 R0 R1 x1 w Priority Queue z x1 x2 spill spill z reload reload x2 x1

Spill y1 y2 R0 R1 x1 w Priority Queue z x1 x2 z x2 x1

Spill Calculate spill weights y1 y2 R0 R1 x1 w z x1 x2 z x2 x1 3 KG Priority Queue z x1 x2 z x2 x1 3 KG 2 KG

Spill Spill y1 y2 R0 R1 x1 w Priority Queue z x1 y1 y2 x2 z x2 x1

Spill Enqueue y1, y2 into the queue y1 y2 R0 R1 This will also calculate their allocation priority y1 y2 R0 R1 x1 w Priority Queue z x1 y1 y2 x2 z x2 x1

Register Assignment R0 R1 x1 w Priority Queue z x1 y1 y2 x2 z x2 x1

Register Assignment Dequeue interval with highest priority R0 R1 y1 y2 x1 w Priority Queue z x1 y1 y2 x2 z x2 x1

Register Assignment y2 R0 R1 x1 w Priority Queue z x1 y1 x2 z x2 x1

Register Assignment Assign to available register if possible R0 R1 y1 x1 w Priority Queue z x1 y1 x2 z y2 x2 x1

Register Assignment Dequeue interval with highest priority R0 R1 y1 x1 Priority Queue z x1 y1 x2 z y2 x2 x1

Register Assignment y1 R0 R1 x1 w Priority Queue z x1 x2 z y2 x2 x1

Register Assignment Assign to available register if possible R0 R1 x1 w Priority Queue z x1 x2 y1 z y2 x2 x1

Greedy Register Allocator Greedy Register Allocator Overview Region Split Encountered Issues Performance Impact

Motivation %ecx %ebx %edi %edx %ebp %ecx %ebx %edi %ebp %ecx %ebx %edi Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

Exploration Why did the register allocator create these redundant mov instructions? Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

Exploration Why did the register allocator create these redundant mov instructions? Artifacts of split x* x1 x2 x2 = COPY x1 x1 = COPY x2 Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

Exploration Why did the register allocator create these redundant mov instructions? Artifacts of split If we would have chosen to do the split differently we could have avoided the redundant mov instructions Why was this way to split was chosen? Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

Greedy Register Allocator Overview General flow Live Interval Analysis Spill Weight Calculation Priority Queue Construction Register Assignment Eviction Split Spill

Region Split How to find the best way to split? How to know if split is beneficial? x* R0 R1 w Priority Queue z y z

Find Best Split x R0 R1 R2 … Rn R0 - Rn are the physical registers

Find Best Split The registers already have assigned intervals x R0 R1 … Rn

Find Best Split The registers already have assigned intervals x R0 R1 … Rn y z w z

Find Best Split The registers already have assigned intervals x R0 R1 These intervals impose allocation constraints x R0 R1 R2 … Rn

Find Best Split Do the split of x for each one of the registers x R0 … Rn

Find Best Split Do the split of x for each one of the registers x R0 … Rn

Find Best Split Do the split of x for each one of the registers x R0 … Rn

Find Best Split Do the split of x for each one of the registers x R0 … Rn

Find Best Split Do the split of x for each one of the registers x R0 Estimate split cost, e.g. the amount of spill code this split may cause x R0 R1 R2 … Rn 20$ 10$ 15$ 20$

Find Best Split Do the split of x for each one of the registers x R0 Choose the cheapest one x R0 R1 R2 … Rn 20$ 10$ 15$ 20$

Find Best Split Do the split of x for each one of the registers x R0 … Rn

Find Best Split for Given Register How do we do the best split for a given register R1? x R1

Find Best Split for Given Register The region split is usually divided into 2 intervals x x1 x2 R1 x2 = COPY x1 x1 = COPY x2

Find Best Split for Given Register The region split is usually divided into 2 intervals ByReg Parts of x that pass on R1 register Should comply with current allocation constraints provided by intervals already assigned to R1 ByStack Parts of x that pass on or on the stack Usually where the already allocated R1 intervals interfere with x x x1 ByReg x2 ByStack R1 x2 = COPY x1 x1 = COPY x2

Find Best Split for Given Register A good split x x1 ByReg x2 ByStack R1 x2 = COPY x1 x1 = COPY x2

Find Best Split for Given Register A good split Reduces the transitions between ByReg and ByStack Each such transitions is potentially a spill/reload In case ByStack is not allocated to another register Places the transitions in blocks less frequently executed x x1 ByReg x2 ByStack R1 x2 = COPY x1 x1 = COPY x2

Find Best Split for Given Register A good split Reduces the transitions between ByReg and ByStack Each such transitions is potentially a spill/reload In case ByStack is not allocated to another register Places the transitions in blocks less frequently executed Use Hopfield neural network Converges to a result that satisfies the above characteristics x x1 ByReg x2 ByStack R1 x2 = COPY x1 x1 = COPY x2

Determine if Split is Beneficial Split reduces the amount of spills compared to spilling around uses BB#0: x = … … BB#1: … = …x… … BB#2: … BB#3: … BB#4: … BB#5: … = …x… …

Determine if Split is Beneficial Split reduces the amount of spills compared to spilling around uses Use/def blocks must have x in a register at some point BB#0: x = … … BB#1: … = …x… … BB#2: … BB#3: … BB#4: … BB#5: … = …x… …

Determine if Split is Beneficial Split reduces the amount of spills compared to spilling around uses Use/def blocks must have x in a register at some point If the split can create “regions” of several basic blocks where x is passed by register this will reduce the amount of spills Only if constraints allow it BB#0: x = … … ByReg BB#1: … = …x… … BB#2: … BB#3: … BB#4: … BB#5: … = …x… …

Determine if Split is Beneficial Split reduces the amount of spills compared to spilling around uses Use/def blocks must have x in a register at some point If the split can create “regions” of several basic blocks where x is passed by register this will reduce the amount of spills Only if constraints allow it BB#0: x = … … Passed by register region BB#1: … = …x… … BB#2: … BB#3: … BB#4: … BB#5: … = …x… …

Greedy Register Allocator Greedy Register Allocator Overview Region Split Encountered Issues Performance Impact

Region Split Cost Issues Inaccurate split cost calculation Root cause of the following encountered issues Does not model the affect of local interference caused by the split Makes the split cost inaccurate The “cheapest” split may actually be more expensive than other splits Can choose suboptimal split 20$ 10$ 15$ 20$

Local Interference Find best split of interval x for R1 R1 Using Hopfield neural network R1 BB#1: … … …

Local Interference Find best split of interval x for R1 R1 Using Hopfield neural network The network determines how x will be passed on the CFG edges R1 BB#1: … … …

Local Interference Find best split of interval x for R1 R1 Using Hopfield neural network The network determines how x will be passed on the CFG edges “ByReg” interval or “By stack” interval R1 x1 ByReg BB#1: … x2 ByStack

Local Interference Find best split of interval x for R1 R1 Using Hopfield neural network The network determines how x will be passed on the CFG edges “ByReg” interval or “By stack” interval Determined which basic block will have a copy between these two intervals R1 x1 ByReg BB#1: … x2 = x1 x2 ByStack

Local Interference Find best split of interval x for R1 R1 Using Hopfield neural network The network determines how x will be passed on the CFG edges “ByReg” interval or “By stack” interval Determined which basic block will have a copy between these two intervals The Hopfield neural network does not model what happens to x inside the basic block R1 x1 ByReg BB#1: … x2 ByStack

Local Interference The Hopfield neural network does not model what happens to x inside the basic block R1 BB#1: …

Local Interference The Hopfield neural network does not model what happens to x inside the basic block x split for R1 determined x’s ByReg interval should enter and leave BB#1 R1 x1 ByReg BB#1: … x1 ByReg

Local Interference The Hopfield neural network does not model what happens to x inside the basic block x split for R1 determined x’s ByReg interval should enter and leave BB#1 y in BB#1 is already assigned to R1 R1 y x1 ByReg BB#1: … y = … … = …y… x1 ByReg

Local Interference The Hopfield neural network does not model what happens to x inside the basic block x split for R1 determined x’s ByReg interval should enter and leave BB#1 y in BB#1 is already assigned to R1 x is used in BB#1 R1 y x1 ByReg BB#1: … = …x… y = … … = …y… … … = …x… … x1 ByReg

Local Interference The Hopfield neural network does not model what happens to x inside the basic block x split for R1 determined x’s ByReg interval should enter and leave BB#1 y in BB#1 is already assigned to R1 x is used in BB#1 y interferes with assigning x to R1 locally in BB#1 R1 y x1 ByReg BB#1: … = …x… y = … … = …y… … … = …x… … x1 ByReg

Local Interference The Hopfield neural network does not model what happens to x inside the basic block x split for R1 determined x’s ByReg interval should enter and leave BB#1 y in BB#1 is already assigned to R1 x is used in BB#1 y interferes with assigning x to R1 locally in BB#1 The part of x that contains this local interference will be added to x’s “ByStack” split artifact R1 y x1 ByReg BB#1: … = …x… y = … … = …y… … … = …x… … x2 ByStack x1 ByReg

Local Interference Local interferences may have very negative affects on assignment of the “ByStack” split artifact Can cause bad eviction chains Encountered issues #1, #2 Can cause a lot of reloads Encountered issue #3 This affect is not considered during split cost calculation R1 y x1 ByReg BB#1: … = …x… … = …y… … … = …x… … x2 ByStack x1 ByReg

Encountered Issue Bad eviction chain Cyclic eviction/split chain – Issue #1 Domino effect eviction – Issue #2 Multiple reloads from the same location Issue #3 Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

Encountered Issue #1 Bad eviction chain – scenario 1 llvm/test/CodeGen/X86/ greedy_regalloc_bad_eviction_sequence.ll Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

Encountered Issue #1 Bad eviction chain – scenario 1 %ecx %ebx %edi llvm/test/CodeGen/X86/ greedy_regalloc_bad_eviction_sequence.ll %ecx %ebx %edi %edx %ebp %ecx %ebx %edi %ebp %ecx %ebx %edi Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll %ecx %ebx %edi %edx

Encountered Issue #1 Bad eviction chain – scenario 1 Cyclic eviction/split chain Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

Encountered Issue #1 Bad eviction chain – scenario 1 Cyclic eviction/split chain Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

Encountered Issue #1 Bad eviction chain – scenario 1 y edi Cyclic eviction/split chain y evicts x from edi y edi x Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

Encountered Issue #1 Bad eviction chain – scenario 1 y edi Cyclic eviction/split chain y evicts x from edi y edi x Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll 2 KG 1 KG

Encountered Issue #1 Bad eviction chain – scenario 1 y x edi y x Cyclic eviction/split chain y evicts x from edi y x edi Evicts y x Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

Encountered Issue #1 Bad eviction chain – scenario 1 x edi y x Cyclic eviction/split chain y evicts x from edi x edi Evicts y x y Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

Encountered Issue #1 Bad eviction chain – scenario 1 x edi y x Cyclic eviction/split chain y evicts x from edi x is split into x1 and x2 x edi Evicts y x y Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

Encountered Issue #1 Bad eviction chain – scenario 1 x edi y x Cyclic eviction/split chain y evicts x from edi x is split into x1 and x2 x edi Evicts y x y Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

Encountered Issue #1 Bad eviction chain – scenario 1 x1 x2 edi y x x Cyclic eviction/split chain y evicts x from edi x is split into x1 and x2 x1 x2 edi Evicts y x y x part of Split Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

Encountered Issue #1 Bad eviction chain – scenario 1 x1 edi y x x Cyclic eviction/split chain y evicts x from edi x is split into x1 and x2 x1 represent the part of the split that has local interference with y x1 edi Evicts y x y x Interfering part of Split Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

Encountered Issue #1 Bad eviction chain – scenario 1 x1 edi y x x Cyclic eviction/split chain y evicts x from edi x is split into x1 and x2 x1 represent the part of the split that has local interference with y x1 edi Evicts y x y x Interfering part of Split Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

Encountered Issue #1 Bad eviction chain – scenario 1 x1 edi y x x Cyclic eviction/split chain y evicts x from edi x is split into x1 and x2 x1 represent the part of the split that has local interference with y x1 evicts y from edi x1 edi Evicts y x y x Interfering part of Split 3 KG Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll 2 KG

Encountered Issue #1 Bad eviction chain – scenario 1 x1 y edi y x x Cyclic eviction/split chain y evicts x from edi x is split into x1 and x2 x1 represent the part of the split that has local interference with y x1 evicts y from edi x1 y edi Evicts y x Evicts x Interfering part of Split Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

Encountered Issue #1 Bad eviction chain – scenario 1 y edi y x x Cyclic eviction/split chain y evicts x from edi x is split into x1 and x2 x1 represent the part of the split that has local interference with y x1 evicts y from edi y edi Evicts y x x1 Evicts x Interfering part of Split Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

Encountered Issue #1 Bad eviction chain – scenario 1 y x x Cyclic eviction/split chain Every such “movl” duo was created by cyclic eviction/split chain Evicts y x Evicts x Interfering part of Split Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

Encountered Issue #1 Bad eviction chain – scenario 1 z w w Cyclic eviction/split chain Every such “movl” duo was created by cyclic eviction/split chain Evicts z w Evicts w Interfering part of Split Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

Encountered Issue #1 Bad eviction chain – scenario 1 a b b Cyclic eviction/split chain Every such “movl” duo was created by cyclic eviction/split chain Evicts a b Evicts b Interfering part of Split Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll

Encountered Issue #1 Bad eviction chain – scenario 1 y x x The problem x is split in such a way that creates local interference split artifact That artifact causes cyclic eviction The solution Tailored for this case Identify if a split will create a local interference artifact Identify if that split artifact will cause a cyclic eviction Increase split cost Make this split less attractive compared to other splits Commit: https://reviews.llvm.org/rL316295 Evicts y x Evicts x Interfering part of Split 20$ Based on llvm/test/CodeGen/X86/greedy_regalloc_bad_eviction_sequence.ll The solution is not architecture specific, but is currently protected by a flag that is turned on only for x86.

Encountered Issue #2 Bad eviction chain – scenario 2 https://bugs.llvm.org/show_bug.cgi?id=26810

Encountered Issue #2 Bad eviction chain – scenario 2 https://bugs.llvm.org/show_bug.cgi?id=26810 This time parts of the chain are spread around %xmm5 %xmm4 %xmm3 %xmm2 %xmm7 %xmm5 %xmm4 %xmm3 %xmm7 %xmm5 %xmm4 %xmm3 %xmm5 %xmm4 %xmm3 %xmm2

Encountered Issue #2 Bad eviction chain – scenario 2 Domino effect eviction

Encountered Issue #2 Bad eviction chain – scenario 2 x xmm2 Domino effect eviction x evicts y from xmm2 x xmm2 y

Encountered Issue #2 Bad eviction chain – scenario 2 x xmm2 Domino effect eviction x evicts y from xmm2 x xmm2 y 4 KG 1 KG

Encountered Issue #2 Bad eviction chain – scenario 2 x y xmm2 x y Domino effect eviction x evicts y from xmm2 x y xmm2 x Evicts y

Encountered Issue #2 Bad eviction chain – scenario 2 y xmm2 x y Domino effect eviction x evicts y from xmm2 y xmm2 x Evicts y x

Encountered Issue #2 Bad eviction chain – scenario 2 y xmm2 x y Domino effect eviction x evicts y from xmm2 y is split into y1 and y2 for xmm2 y xmm2 x Evicts y x

Encountered Issue #2 Bad eviction chain – scenario 2 y xmm2 x y Domino effect eviction x evicts y from xmm2 y is split into y1 and y2 for xmm2 y xmm2 x Evicts y x

Encountered Issue #2 Bad eviction chain – scenario 2 y1 y2 xmm2 x y y Domino effect eviction x evicts y from xmm2 y is split into y1 and y2 for xmm2 y1 y2 xmm2 x Evicts y part of y Split x

Encountered Issue #2 Bad eviction chain – scenario 2 y1 xmm2 x y y Domino effect eviction x evicts y from xmm2 y is split into y1 and y2 for xmm2 y1 represent the part of the split that has local interference with x y1 xmm2 x Evicts y Interfering part of y Split x

Encountered Issue #2 Bad eviction chain – scenario 2 y1 xmm2 x y y Domino effect eviction x evicts y from xmm2 y is split into y1 and y2 for xmm2 y1 represent the part of the split that has local interference with x y1 xmm2 x Evicts y Interfering part of y Split x

Encountered Issue #2 Bad eviction chain – scenario 2 y1 xmm2 x y y Domino effect eviction x evicts y from xmm2 y is split into y1 and y2 for xmm2 y1 represent the part of the split that has local interference with x y1 cannot evict x from xmm2 y1 xmm2 x Evicts y Interfering part of y Split x 4 KG 3 KG

Encountered Issue #2 Bad eviction chain – scenario 2 y1 xmm2 x y y Domino effect eviction x evicts y from xmm2 y is split into y1 and y2 for xmm2 y1 represent the part of the split that has local interference with x y1 cannot evict x from xmm2 y1 xmm2 x Evicts y Interfering part of y Split x

Encountered Issue #2 Bad eviction chain – scenario 2 y1 x y y Domino effect eviction x evicts y from xmm2 y is split into y1 and y2 for xmm2 y1 represent the part of the split that has local interference with x y1 cannot evict x from xmm2 y1 x Evicts y Interfering part of y Split

Encountered Issue #2 Bad eviction chain – scenario 2 y1 xmm3 x y y Domino effect eviction x evicts y from xmm2 y is split into y1 and y2 for xmm2 y1 represent the part of the split that has local interference with x y1 cannot evict x from xmm2 y1 evicts z from xmm3 y1 xmm3 x Evicts y Interfering part of y Split z

Encountered Issue #2 Bad eviction chain – scenario 2 y1 xmm3 x y y Domino effect eviction x evicts y from xmm2 y is split into y1 and y2 for xmm2 y1 represent the part of the split that has local interference with x y1 cannot evict x from xmm2 y1 evicts z from xmm3 y1 xmm3 x Evicts y Interfering part of y Split z 3 KG 1 KG

Encountered Issue #2 Bad eviction chain – scenario 2 y1 z xmm3 x y y z Domino effect eviction x evicts y from xmm2 y is split into y1 and y2 for xmm2 y1 represent the part of the split that has local interference with x y1 cannot evict x from xmm2 y1 evicts z from xmm3 y1 z xmm3 x Evicts y y Interfering part of Split Evicts z

Encountered Issue #2 Bad eviction chain – scenario 2 z xmm3 x y y z Domino effect eviction x evicts y from xmm2 y is split into y1 and y2 for xmm2 y1 represent the part of the split that has local interference with x y1 cannot evict x from xmm2 y1 evicts z from xmm3 z xmm3 x Evicts y y Interfering part of Split y1 Evicts z

Encountered Issue #2 Bad eviction chain – scenario 2 z xmm3 x y y z Domino effect eviction y1 evicts z from xmm3 z xmm3 x Evicts y y Interfering part of Split y1 Evicts z

Encountered Issue #2 Bad eviction chain – scenario 2 z xmm3 x y y z Domino effect eviction y1 evicts z from xmm3 z is split into z1 and z2 for xmm3 z xmm3 x Evicts y y Interfering part of Split y1 Evicts z

Encountered Issue #2 Bad eviction chain – scenario 2 z xmm3 x y y z Domino effect eviction y1 evicts z from xmm3 z is split into z1 and z2 for xmm3 z xmm3 x Evicts y y Interfering part of Split y1 Evicts z

Encountered Issue #2 Bad eviction chain – scenario 2 z1 z2 xmm3 x y y Domino effect eviction y1 evicts z from xmm3 z is split into z1 and z2 for xmm3 z1 z2 xmm3 x Evicts y y Interfering part of Split y1 Evicts z z part of Split

Encountered Issue #2 Bad eviction chain – scenario 2 z1 xmm3 x y y z z Domino effect eviction y1 evicts z from xmm3 z is split into z1 and z2 for xmm3 z1 represent the part of the split that has local interference with y1 z1 xmm3 x Evicts y y Interfering part of Split y1 Evicts z z Interfering part of Split

Encountered Issue #2 Bad eviction chain – scenario 2 z1 xmm3 x y y z z Domino effect eviction y1 evicts z from xmm3 z is split into z1 and z2 for xmm3 z1 represent the part of the split that has local interference with y1 z1 xmm3 x Evicts y y Interfering part of Split y1 Evicts z z Interfering part of Split

Encountered Issue #2 Bad eviction chain – scenario 2 z1 xmm3 x y y z z Domino effect eviction y1 evicts z from xmm3 z is split into z1 and z2 for xmm3 z1 represent the part of the split that has local interference with y1 z1 cannot evict y1 from xmm3 z1 xmm3 x Evicts y y Interfering part of Split y1 Evicts z z Interfering part of Split 3 KG 2 KG

Encountered Issue #2 Bad eviction chain – scenario 2 z1 xmm3 x y y z z Domino effect eviction y1 evicts z from xmm3 z is split into z1 and z2 for xmm3 z1 represent the part of the split that has local interference with y1 z1 cannot evict y1 from xmm3 z1 xmm3 x Evicts y y Interfering part of Split y1 Evicts z z Interfering part of Split

Encountered Issue #2 Bad eviction chain – scenario 2 z1 x y y z z Domino effect eviction y1 evicts z from xmm3 z is split into z1 and z2 for xmm3 z1 represent the part of the split that has local interference with y1 z1 cannot evict y1 from xmm3 z1 x Evicts y y Interfering part of Split Evicts z z Interfering part of Split

Encountered Issue #2 Bad eviction chain – scenario 2 z1 xmm4 x y y z z Domino effect eviction y1 evicts z from xmm3 z is split into z1 and z2 for xmm3 z1 represent the part of the split that has local interference with y1 z1 cannot evict y1 from xmm3 z1 evicts w from xmm4 z1 xmm4 x Evicts y y Interfering part of Split w Evicts z z Interfering part of Split

Encountered Issue #2 Bad eviction chain – scenario 2 z1 xmm4 x y y z z Domino effect eviction y1 evicts z from xmm3 z is split into z1 and z2 for xmm3 z1 represent the part of the split that has local interference with y1 z1 cannot evict y1 from xmm3 z1 evicts w from xmm4 z1 xmm4 x Evicts y y Interfering part of Split w Evicts z z Interfering part of Split 2 KG 1 KG

Encountered Issue #2 Bad eviction chain – scenario 2 z1 w xmm4 x y y z Domino effect eviction y1 evicts z from xmm3 z is split into z1 and z2 for xmm3 z1 represent the part of the split that has local interference with y1 z1 cannot evict y1 from xmm3 z1 evicts w from xmm4 z1 w xmm4 x Evicts y y Interfering part of Split Evicts z z Interfering part of Split w Evicts

Encountered Issue #2 Bad eviction chain – scenario 2 w xmm4 x y y z z Domino effect eviction y1 evicts z from xmm3 z is split into z1 and z2 for xmm3 z1 represent the part of the split that has local interference with y1 z1 cannot evict y1 from xmm3 z1 evicts w from xmm4 w xmm4 x Evicts y y Interfering part of Split z1 Evicts z z Interfering part of Split w Evicts

Encountered Issue #2 Bad eviction chain – scenario 2 x y y z z w Domino effect eviction Every such “movl” duo was created by the domino effect x Evicts y y Interfering part of Split Evicts z z Interfering part of Split w Evicts

Encountered Issue #2 Bad eviction chain – scenario 2 x y y z z w w a Domino effect eviction Every such “movl” duo was created by the domino effect x Evicts y y Interfering part of Split Evicts z z Interfering part of Split w Evicts w Interfering part of Split a Evicts

Encountered Issue #2 Bad eviction chain – scenario 2 x y y z z w w a a Domino effect eviction Every such “movl” duo was created by the domino effect x Evicts y y Interfering part of Split Evicts z z Interfering part of Split w Evicts w Interfering part of Split a Evicts a Interfering part of Split b Evicts

Encountered Issue #2 x y Bad eviction chain – scenario 2 y z z w Evicts y Bad eviction chain – scenario 2 The problem y is split to fit the register it was evicted from This split creates local interference split artifact that causes domino effect eviction The solution Tailored for this case Identify if a split for evicted register creates local interference artifact Identify if that split artifact will cause domino effect eviction Increase split cost Make this split less attractive compared to other splits Commit: https://reviews.llvm.org/rL316295 y Interfering part of Split Evicts z z Interfering part of Split w Evicts 20$ The solution is not architecture specific, but is currently protected by a flag that is turned on only for x86.

Encountered Issue #3 Multiple reloads from the same location

Encountered Issue #3 Multiple reloads from the same location All the reloads are from the same location

Encountered Issue #3 Multiple reloads from the same location All the reloads are from the same location Appeared in a hot loop after a higher level change

Encountered Issue #3 Multiple reloads from the same location Before Change After Change Loop MBB is the same until Greedy Other basic blocks have changed because of the higher level change, live ranges of intervals have changed – this is why we see an affect in the allocation decisions in the loop’s MBB as well.

Encountered Issue #3 Multiple reloads from the same location Before Change After Change Loop MBB is the same until Greedy x is split for R0 x is split for R1 Other basic blocks have changed because of the higher level change, live ranges of intervals have changed – this is why we see an affect in the allocation decisions in the loop’s MBB as well.

Encountered Issue #3 Multiple reloads from the same location Before Change After Change Loop MBB is the same until Greedy x is split for R0 Split doesn’t have local interferences x is split for R1 Split has local interference in Loop’s MBB

Encountered Issue #3 Multiple reloads from the same location Before Change After Change Loop MBB is the same until Greedy x is split for R0 Split doesn’t have local interferences x is split for R1 Split has local interference in Loop’s MBB Local interference spilled & reloaded around uses We have a lot of uses in the basic block that has the interference and this is why we see a lot of reloads.

Encountered Issue #3 Multiple reloads from the same location The problem Local interference interval has a lot of uses This interval is spilled and reloaded Solution Identify if the created local interference interval will spill Increase split cost Make this split less attractive compared to other splits Commit: https://reviews.llvm.org/rL323870 20$ The solution is not architecture specific, but is currently protected by a flag that is turned on only for x86.

Greedy Register Allocator Greedy Register Allocator Overview Region Split Encountered Issues Performance Impact

Fix for Bad Eviction Chains - Issues #1, #2 Fix affected mostly EEMBC workloads Regressions unrelated to this change No actual compile time impact on CTMark CTMark – part of the llvm test suite, aimed to test compile time

Fix for Multiple Reloads - Issue #3 Fix affected mostly EEMBC workloads No actual compile time impact on CTMark CTMark – part of the llvm test suite, aimed to test compile time

Conclusions Local interference caused by split may have a negative affect Current split cost does not take this affect into account Committed solutions tailored to catch 3 specific scenarios Need a more holistic approach for quantifying the cost of local interferences caused by split The solution is not architecture specific, but is currently protected by a flag that is turned on only for x86.

marina.yatsina@intel.com