From Sequences of Dependent Instructions to Functions: A Complexity-Effective Approach for Improving Performance Without ILP or Speculation

Sami YEHIA and Olivier TEMAM
LRI, Paris South University, France

2/18 Scaling Up Processors
- Larger pipelines, caches, instruction windows and reservation stations
- Aggressive speculation mechanisms: branch prediction, value prediction, data prefetching, ...
- Rely on ILP exploitation
- What about scaling with little ILP?

3/18 Concept
- A sequence of instructions is a set of functions
- Combinatorial functions
- 2^(64*num_registers) input combinations (theoretically)

Program:
    …
    addq r1,r2,r3
    subq r3,10,r4
    …
    sll r5,6,r6
    addq r5,r5,r4

As functions: r6 = f1(r1,r2,…,rn), r4 = f2(r1,r2,…,rn)

Logic circuit: input bits r1_63, r1_62, r1_61, …, r1_1, r1_0; output bits f1_63, f1_62, f1_61, …, f1_1, f1_0
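The idea of treating an instruction sequence as a combinational function can be illustrated at a reduced width. The sketch below is a toy under stated assumptions, not the authors' tooling: it models only the first two visible instructions (addq r1,r2,r3 and subq r3,10,r4), uses hypothetical 4-bit registers instead of 64-bit ones, and tabulates r4 for every input combination. All function names are illustrative.

```python
from itertools import product

WIDTH = 4                      # toy register width (assumption; the slide uses 64 bits)
MASK = (1 << WIDTH) - 1

def run_sequence(r1, r2):
    """Execute the two dependent instructions one after the other."""
    r3 = (r1 + r2) & MASK      # addq r1,r2,r3
    r4 = (r3 - 10) & MASK      # subq r3,10,r4
    return r4

def build_truth_table():
    """Tabulate r4 for every input combination: 2**(WIDTH * 2) rows."""
    return {(r1, r2): run_sequence(r1, r2)
            for r1, r2 in product(range(1 << WIDTH), repeat=2)}

def output_bit(table, r1, r2, i):
    """Bit i of r4, read as a Boolean function of the input bits."""
    return (table[(r1, r2)] >> i) & 1

table = build_truth_table()
print(len(table))              # number of input combinations
```

With two 4-bit inputs the table has 2^(4*2) rows; at 64 bits the same construction gives the 2^(64*num_registers) figure quoted on the slide, which is why the table view is only theoretical.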

4/18 Principles
- An "independent" function for each output

f_r3(r9,r10) = r9 + r10 - 1
f_r4(r9,r10) = sign_extension(r9 + r10 - 1)_{31:0}
f_r5(r9,r10) = ((r9 + r10 - 1) > 1
f_br(r9,r10) = (r9 + r10 - 1)  ((r9 + r10 - 1) >1)

[Figure: DFG]
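A minimal sketch of the "one function per output" principle, covering only the two functions whose definitions are fully legible here (f_r3 and f_r4). It assumes 64-bit two's-complement registers and reads the 31:0 subscript as "sign-extend the low 32 bits"; both are assumptions, and the helper names are illustrative.

```python
MASK64 = (1 << 64) - 1

def to_signed64(x):
    """Interpret the low 64 bits of x as a two's-complement value."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

def sign_extend_32(x):
    """Keep bits 31:0 of x and sign-extend them."""
    low = x & 0xFFFFFFFF
    return low - (1 << 32) if low >> 31 else low

def f_r3(r9, r10):
    # r3 expressed directly in terms of the sequence inputs
    return to_signed64(r9 + r10 - 1)

def f_r4(r9, r10):
    # r4 is also written in terms of r9 and r10 only, so it does not
    # have to wait for r3 to be produced first
    return sign_extend_32(r9 + r10 - 1)

print(f_r3(5, 7), f_r4(5, 7))
```

Each function takes only the inputs of the sequence, so the outputs can be evaluated side by side rather than in dependence order.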

5/18 Hardware Operator
- Eliminate dependencies to calculate a+b+c
- r10 + r9 - 1 to hardware operators

f1_i = f'(a_i, b_i, cout1_{i-1})
cout1_i = f'_c(a_i, b_i, cout1_{i-1})
out_i = f''(f1_i, c_i, cout2_{i-1}) = f''(a_i, b_i, c_i, cout1_{i-1}, cout2_{i-1})
cout2_i = f''_c(a_i, b_i, c_i, cout1_{i-1}, cout2_{i-1})

[Figure: two + operators with inputs a, b, c, intermediate value f1 and output out]
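The bit-level recurrences on this slide can be sketched in software. The version below assumes that f' and f'' are ordinary full-adder sum functions and that f'_c and f''_c are the matching carry (majority) functions; the slide does not spell these out, so this is one plausible reading rather than the authors' circuit. The names add3 and maj are illustrative.

```python
WIDTH = 64
MASK = (1 << WIDTH) - 1

def maj(x, y, z):
    """Majority of three bits: the carry-out of a full adder."""
    return (x & y) | (x & z) | (y & z)

def add3(a, b, c):
    """Compute (a + b + c) mod 2**WIDTH one bit slice at a time."""
    out = 0
    cout1 = cout2 = 0                      # carries entering bit 0
    for i in range(WIDTH):
        a_i, b_i, c_i = (a >> i) & 1, (b >> i) & 1, (c >> i) & 1
        f1_i = a_i ^ b_i ^ cout1           # f'(a_i, b_i, cout1_{i-1})
        next1 = maj(a_i, b_i, cout1)       # f'_c(a_i, b_i, cout1_{i-1})
        out_i = f1_i ^ c_i ^ cout2         # f''(f1_i, c_i, cout2_{i-1})
        next2 = maj(f1_i, c_i, cout2)      # f''_c(...)
        cout1, cout2 = next1, next2
        out |= out_i << i
    return out

# r10 + r9 - 1, with -1 encoded as an all-ones 64-bit word
print(add3(7, 5, MASK))
```

The loop is sequential only because it is software. The point of the recurrences is that slice i needs just a_i, b_i, c_i and the two carries from slice i-1, so the second addition does not wait for the complete a+b result.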

6/18 Complexity Effectiveness
- Scalability of ILP vs. functions

[Figure: axes labeled Complexity and Performance; curves labeled ILP exploitation and Functions]

7/18 Related Work
- ASIC
- General-purpose context
  - 3-1 Interlock Collapsing ALU [Y. Sazeides, S. Vassiliadis and J. Smith, MICRO'29, 1996]
  - Chimaera [Z. Ye et al., ISCA'27, 2000]
  - Grid Processors [R. Nagarajan et al., MICRO'34, 2001]

Cascade one or more hardware operators to execute specific functions

[Figure: AND, OR and XOR operators and an adder]

8/18 Building Functions
- From traces of instructions to configuration macros
- Compilation toolchain to study:
  - Potential of the approach
  - Performance analysis on a superscalar processor

[Figure: Traces]

9/18 Potential of the Approach
- Cuts: limits to DFG collapsing (height)
  - Number of inputs
  - Non-collapsible instructions
  - Load instructions (27.7%)
  - Carries from upper significant bits
- Theoretical speedup
- The lower the ILP, the higher the speedup

[Figure: DFG with labels op, LD, mem, @, Cut, F1 and F2]
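How cuts bound the size of a collapsed function can be sketched with a greedy pass over a trace. Everything here is an assumption made for illustration: the trace format (opcode, destination, source registers), the input limit MAX_INPUTS, the set of non-collapsible opcodes, and the greedy policy itself. The slides do not describe the actual toolchain algorithm, and cuts caused by carries from upper significant bits are not modeled.

```python
MAX_INPUTS = 4                   # hypothetical limit on inputs per function
NON_COLLAPSIBLE = {"ld"}         # assumed set; the slide names loads explicitly

def split_into_functions(trace):
    """Greedily group instructions into functions, cutting at
    non-collapsible instructions and at the input limit."""
    groups, current, inputs, defined = [], [], set(), set()

    def flush():
        nonlocal current, inputs, defined
        if current:
            groups.append(current)
        current, inputs, defined = [], set(), set()

    for op, dest, srcs in trace:
        if op in NON_COLLAPSIBLE:
            flush()                              # cut before the load
            groups.append([(op, dest, srcs)])    # the load stays on its own
            continue
        # registers read by the group but not produced inside it
        new_inputs = inputs | {s for s in srcs if s not in defined}
        if len(new_inputs) > MAX_INPUTS:
            flush()                              # cut: too many inputs
            new_inputs = set(srcs)
        current.append((op, dest, srcs))
        inputs = new_inputs
        defined.add(dest)
    flush()
    return groups

trace = [
    ("addq", "r3", ("r1", "r2")),
    ("subq", "r4", ("r3",)),
    ("ld",   "r5", ("r4",)),
    ("sll",  "r6", ("r5",)),
    ("addq", "r7", ("r6", "r5")),
]
for group in split_into_functions(trace):
    print([op for op, _, _ in group])
```

In this toy trace the load separates the instructions before it from those after it, which mirrors the F1 / LD / F2 split drawn on the slide.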

10/18 Theoretical Speedup

11/18 Number of Inputs

12/18 Non-Collapsible Instructions

13/18 Implementation: rePLay Framework

14/18 Performance Evaluation

15/18 rePLay Optimization Engine Delay
- Function built "offline"

16/18 Latency of Function units

17/18 Future Work
- Address prediction to overcome load cuts

[Figure: Address Prediction & Cache Preloading; labels op, LD, mem, @, @', F1 and F2]

18/18 Q & A

Carries from Upper Significant Bits

Optimization Engine Delay

Latency of Function units
