1 EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science Laboratory University of Illinois at Urbana-Champaign IMPACT Group http://www.crhc.uiuc.edu/IMPACT/

2 EPIC Architectures and Compiler Technology Wen-mei Hwu Outline History and Background Control Speculation Predication IMPACT EPIC Architecture Compiler Technology Outlook

3 EPIC Architectures and Compiler Technology Wen-mei Hwu Vision: Bridging the Gap Between Programs and Hardware if (x>=0) if (x==1 || x==2 || x==3) m=f(x); else m=g(x); [Figure: the same decision drawn as a branch tree (tests x>=0, x!=1, x!=2, x!=3 selecting m=f(x) or m=g(x)) and as a hardware logic network with comparators and an enable signal]

4 EPIC Architectures and Compiler Technology Wen-mei Hwu Can we get the best of both worlds? Hardware –highly speculative –parallel in nature –efficient logic manipulation –special purpose –area efficient –energy efficient Programming –conservative semantics –sequential in nature –awkward logic manipulation –easily retargeted –area inefficient –energy inefficient

5 EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Design Objectives To define a programmable architecture model that allows compiled programs to approach special hardware design in –logic manipulation capability –speculation and parallelism –chip area efficiency –energy efficiency

6 EPIC Architectures and Compiler Technology Wen-mei Hwu Significant Milestones 1994 Intel/HP forms IA-64 alliance with U. of Illinois contribution 1997 Announcement of IA-64 1997 Motorola/Lucent forms StarCore alliance with U. of Illinois contribution 1998 major computer vendors adopt IA-64 1998 Announcement of StarCore 1999 Release of user mode architecture

7 EPIC Architectures and Compiler Technology Wen-mei Hwu Evolution of VLIW/EPIC

8 EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC - the IMPACT Perspective IMPACT work done since 1987 to lay foundation for EPIC architectures –Intel/HP IA-64, Motorola/Lucent StarCore Key Technologies –control speculation [ISCA-91] [ASPLOS-92] [MICRO-96] –data (dependence) speculation [ICS-92] [ASPLOS-94] –predicated execution [MICRO-92][ISCA-95] [MICRO-97] –integrated architecture and inline recovery [ISCA-98] –logic minimization approach to predication [ISCA-99] –implementation neutral predication architecture [TBD]

9 EPIC Architectures and Compiler Technology Wen-mei Hwu Outline History and Background Control Speculation Predication IMPACT EPIC Architecture Compiler Technology Outlook

10 EPIC Architectures and Compiler Technology Wen-mei Hwu Control Speculation Executing an instruction before knowing that its execution is required Moving an instruction above a branch –Removes control dependences to increase ILP –Win when branch directions predicted correctly Instruction sequence seen by hardware is changed! –Must ensure that execution result unaffected by such movement

11 EPIC Architectures and Compiler Technology Wen-mei Hwu Control Speculation Example A: r6 = r4+1 B: If (r9==0) goto L1 C: r1 = MEM(r2+0) D: r3 = MEM(r2+4) E: r4 = r3+1 F: r5 = r1+1 G: MEM(r2+r4) = r4 C: r1 = MEM(r2+0) A: r6 = r4+1 D: r3 = MEM(r2+4) E: r4 = r3+1 F: r5 = r1+1 B: if (r9==0) goto L1 G: MEM(r2+4) = r4

12 EPIC Architectures and Compiler Technology Wen-mei Hwu Scheduling Error An ordering of instructions that will –cause early program termination or –produce results that differ from those of the unscheduled program. To avoid scheduling errors –Live values must be properly preserved - register renaming –Spurious exception conditions must be suppressed

13 EPIC Architectures and Compiler Technology Wen-mei Hwu Safe Speculation Compiler analysis to identify –instructions that are always safe. –speculation that will not introduce a new exception. Trivial analysis examples: –array references with constant indices –divide and remainder with non-zero divisor Complex analysis examples: –Branches to ensure legal input operands –Earlier use of the same input operand –Loop analysis

14 EPIC Architectures and Compiler Technology Wen-mei Hwu Silent Instructions Architecture provides silent versions of instructions that may potentially cause exceptions. –Multiflow - silent FP instructions –HPPA - silent FP instructions, silent de-referenced null pointer –SPARC V9 - silent load instruction To move an instr. above a branch, convert it into its silent version. Both Multiflow TRACE and Cydrome Cydra-5 used similar ideas.

15 EPIC Architectures and Compiler Technology Wen-mei Hwu Silent Instructions Memory access instructions –If a segmentation fault condition occurs, the instruction is canceled before it reaches the memory system. An arbitrary garbage value is returned. –If a page fault happens without segmentation fault, the OS page fault handler is immediately invoked as usual. Extra page faults may occur from speculation. Arithmetic instructions –If a trap condition occurs, an arbitrary garbage value is deposited into the destination register. The exception condition is either immediately handled or simply ignored.
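The silent memory-access behaviour described on this slide can be sketched in Python. This is a minimal model under stated assumptions, not any real ISA: the `Memory` class, the tiny 4-word `PAGE_SIZE`, and the `GARBAGE` constant are hypothetical names introduced only for illustration.

```python
PAGE_SIZE = 4         # hypothetical page size in words, kept tiny for illustration
GARBAGE = 0xDEADBEEF  # stand-in for the "arbitrary garbage value" on the slide


class Memory:
    def __init__(self, words, resident_pages):
        self.words = list(words)
        self.resident = set(resident_pages)
        self.page_faults = 0

    def _page_in(self, addr):
        # A page fault is handled immediately, even for a speculative access.
        page = addr // PAGE_SIZE
        if page not in self.resident:
            self.page_faults += 1
            self.resident.add(page)

    def load(self, addr):
        """Normal load: an illegal address traps."""
        if not 0 <= addr < len(self.words):
            raise MemoryError(f"segmentation fault at {addr}")
        self._page_in(addr)
        return self.words[addr]

    def silent_load(self, addr):
        """Silent load: an illegal address is cancelled and garbage is returned."""
        if not 0 <= addr < len(self.words):
            return GARBAGE
        self._page_in(addr)
        return self.words[addr]


mem = Memory(words=range(100, 108), resident_pages={0})
in_bounds = mem.silent_load(5)       # page 1 not resident: extra page fault
out_of_bounds = mem.silent_load(64)  # illegal address: no trap, garbage value
```

The extra page fault counted for the in-bounds access is the cost the slide warns about: the fault is taken even if the load later turns out to be unnecessary.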

16 EPIC Architectures and Compiler Technology Wen-mei Hwu Debugging Implications If the speculated instruction turns out not to be needed (the branch disagrees with the compile-time prediction): –the garbage value generated by a silent instruction would not be used. –the exception condition is correctly ignored since the silent instruction should not have been executed. If the branch agrees with compile-time prediction: –the exception condition that occurred to a silent instruction is incorrectly ignored. –the garbage value generated may be used by a subsequent instruction without warning. –not acceptable if exceptions must be reported timely and accurately

17 EPIC Architectures and Compiler Technology Wen-mei Hwu Performance Issues Page faults caused by silent loads are handled right away –no support to defer page fault until execution of instruction is confirmed. –Additional page faults may result from speculation. –The number of additional page faults should be small for systems that are designed not to page. Similar issues exist if TLB misses are handled through the exception mechanism.

18 EPIC Architectures and Compiler Technology Wen-mei Hwu Sentinel Scheduling Design Objective –Correctly ignore exceptions generated by speculative instructions whose execution turns out to be unnecessary. –Correctly report exceptions generated by speculative instructions whose execution is confirmed. –Support recovery from exceptions thus reported. –Provide the option to handle page faults after the need for executing a speculative instruction is confirmed. –Minimize the extra hardware and instructions needed to achieve the objectives above.

19 EPIC Architectures and Compiler Technology Wen-mei Hwu Accurate Exception Report Each instruction has two parts: –Non-excepting part which performs the actual operation –Sentinel part that flags an exception if necessary Non-excepting part of I can be speculatively executed provided the sentinel part stays in I's home block

20 EPIC Architectures and Compiler Technology Wen-mei Hwu Sentinel Speculation Example A: r6 = r4+1 B: If (r9==0) goto L1 C: r1 = MEM(r2+0) D: r3 = MEM(r2+4) E: r4 = r3+1 F: r5 = r1+1 G: MEM(r2+r4) = r4 C: r1 = MEM(r2+0) A: r6 = r4+1 D: r3 = MEM(r2+4) E: r4 = r3+1 F: r5 = r1+1 B: if (r9==0) goto L1 sentinels B, C, D, E G: MEM(r2+4) = r4

21 EPIC Architectures and Compiler Technology Wen-mei Hwu Sentinel Elimination The sentinel of I can be eliminated if –there is another instruction in I's home block which uses the result of I OR –I is non-excepting and is not the last direct or indirect use of an excepting instruction's destination Unprotected instruction - an instruction whose sentinel cannot be eliminated. If an unprotected instruction is speculated, an explicit instruction must be created to serve as the sentinel

22 EPIC Architectures and Compiler Technology Wen-mei Hwu Sentinel Speculation Example A: r6 = r4+1 B: If (r9==0) goto L1 C: r1 = MEM(r2+0) D: r3 = MEM(r2+4) E: r4 = r3+1 F: r5 = r1+1 G: MEM(r2+r4) = r4 C: r1 = MEM(r2+0) A: r6 = r4+1 D: r3 = MEM(r2+4) E: r4 = r3+1 F: r5 = r1+1 B: if (r9==0) goto L1 H: check r5 G: MEM(r2+4) = r4

23 EPIC Architectures and Compiler Technology Wen-mei Hwu Architectural Support Additional bit in opcode field to specify speculative instruction. –can be partially supported by adding speculative version of all opcodes that should be considered for speculative scheduling and that can directly or indirectly cause exceptions. Exception bit (vector) added to each register to mark exceptions caused by a speculative instruction. –These bits need to be preserved across context switches.

24 EPIC Architectures and Compiler Technology Wen-mei Hwu Execution Model Speculative instructions –src(I).except = 0 I does not cause an exception, normal execution I causes an exception – dest(I).except = 1 – dest(I).data = pc of I –src(I).except = 1 (exception propagation) dest(I).except = 1, dest(I).data = src(I).data

25 EPIC Architectures and Compiler Technology Wen-mei Hwu Execution Model Non-speculative instructions –src(I).except = 0 I does not cause an exception - normal execution I causes an exception - I reported as source of exception –src(I).except = 1 (report exception for speculative instruction) signal exception src(I).data is PC of exception
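The two execution-model slides (speculative and non-speculative instructions) can be sketched together as a small Python interpreter step. This is an illustrative model only: the names `Reg`, `ReportedException`, and `execute` are hypothetical, and a Python `ArithmeticError` stands in for a hardware trap.

```python
from dataclasses import dataclass


@dataclass
class Reg:
    data: int = 0
    exc: bool = False  # the exception bit attached to each register


class ReportedException(Exception):
    def __init__(self, pc):
        super().__init__(f"exception reported for instruction at pc={pc}")
        self.pc = pc


def execute(pc, op, dest, srcs, regs, speculative):
    """Execute one instruction under the sentinel execution model."""
    tagged = [regs[s] for s in srcs if regs[s].exc]
    if tagged:
        if speculative:
            # Exception propagation: copy the excepting PC to the destination.
            regs[dest] = Reg(tagged[0].data, True)
            return
        # A non-speculative use acts as the sentinel and reports the exception.
        raise ReportedException(tagged[0].data)
    try:
        value = op(*(regs[s].data for s in srcs))
    except ArithmeticError:
        if speculative:
            # Defer: record the PC of the excepting instruction and set the bit.
            regs[dest] = Reg(pc, True)
            return
        raise ReportedException(pc) from None
    if dest is not None:
        regs[dest] = Reg(value, False)


regs = {"r1": Reg(10), "r2": Reg(0), "r3": Reg(), "r4": Reg()}
execute(100, lambda a, b: a // b, "r3", ["r1", "r2"], regs, speculative=True)
execute(104, lambda a: a + 1, "r4", ["r3"], regs, speculative=True)
try:
    execute(108, lambda a: a, None, ["r4"], regs, speculative=False)  # sentinel
    reported_pc = None
except ReportedException as err:
    reported_pc = err.pc
```

In this run the divide at PC 100 faults speculatively, the dependent add merely forwards the tag, and only the non-speculative use at PC 108 reports the exception, naming PC 100 as its source.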

26 EPIC Architectures and Compiler Technology Wen-mei Hwu Scheduling Algorithm Identify unprotected instructions Perform conventional scheduling –if an unprotected instruction is moved above a branch, an explicit sentinel instruction is inserted into the list of to-be-scheduled instructions –Explicit sentinel restricted to remain in I's home block with control dependences –All instructions moved above a branch are marked as speculative

27 EPIC Architectures and Compiler Technology Wen-mei Hwu Recovery from Exception Important to allow accurate handling of page faults and TLB misses. Issues: –ensure that instructions can be retried after the exception condition is handled –minimize the negative performance impact in terms of register pressure and instruction count due to recovery.

28 EPIC Architectures and Compiler Technology Wen-mei Hwu Recovery Block Copy speculative instructions into recovery blocks –One entrance point per potential exception reported by a sentinel –Code Expansion vs. Efficiency –must provide a means to reach recovery block - explicit checks Source registers of the instructions not in the recovery blocks are not preserved. Instructions re-executed during recovery are reduced.

29 EPIC Architectures and Compiler Technology Wen-mei Hwu Recovery Block Example A: r6 = r4+1 B: If (r9==0) goto L1 C: r1 = MEM(r2+0) D: r3 = MEM(r2+4) E: r4 = r3+1 F: r5 = r1+1 G: MEM(r2+r4) = r4 C: r1 = MEM(r2+0) A: r6 = r4+1 D: r3 = MEM(r2+4) E: r4 = r3+1 F: r5 = r1+1 B: if (r9==0) goto L1 H: check r5, L2 I: check r4, L3 G: MEM(r2+4) = r4

30 EPIC Architectures and Compiler Technology Wen-mei Hwu Recovery Block Example Recovery Block for C L2: C: r1 = MEM (r2+0) F: r5 = r1 + 1 Recovery Block for D L3: D: r3 = MEM (r2+4) E: r4 = r3 + 1 G: MEM (r2+4) = r4

31 EPIC Architectures and Compiler Technology Wen-mei Hwu Multiple Exceptions Different basic blocks –first sequential exception always reported since check instruction guaranteed to remain in home block of each potential trap-causing instruction Same basic block –An exception will be signaled but no guarantee it will be the first according to original source code

32 EPIC Architectures and Compiler Technology Wen-mei Hwu Outline History and Background Control Speculation Predication IMPACT EPIC Architecture Compiler Technology Outlook

33 EPIC Architectures and Compiler Technology Wen-mei Hwu Predicated Execution Conditional execution of instructions based on a Boolean source operand Execution model –Load r1, r2, r3 (p1) –If p1 is TRUE, instruction executes normally –If p1 is FALSE, instruction treated as NOP (with some exceptions)

34 EPIC Architectures and Compiler Technology Wen-mei Hwu Full Predication Support Predicate defining instructions Full set of predicated instructions Separate predicate register file Best performance Cydra-5, IA-64, TI-C60, StarCore

35 EPIC Architectures and Compiler Technology Wen-mei Hwu Partial Predication Support Adds limited set of predicated instructions to existing ISA –no extension to operand format –CMOV Brings some performance increase to existing ISA’s SPARC, Alpha, MIPS, P6

36 EPIC Architectures and Compiler Technology Wen-mei Hwu HP-PD Predicate Defines pred dest, src1, src2 (Pin) - condition: =, >, <, etc. –Unconditional (U, U) –OR-type (O, O) –AND-type (A, A)
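The three destination types listed on this slide can be sketched as a single update function. The slide names the types but does not spell out their update rules; the rules below are an assumption based on the usual description of this predicate-define style (unconditional always writes, OR-type can only set, AND-type can only clear). The function name `pred_define` and the `complement` flag are hypothetical.

```python
def pred_define(preds, dest, kind, cond, pin=True, complement=False):
    """Update one destination predicate of a predicate define.

    preds      -- dict mapping predicate names to booleans
    kind       -- "U" (unconditional), "O" (OR-type) or "A" (AND-type)
    cond       -- result of the comparison
    pin        -- value of the guarding input predicate (Pin)
    complement -- assumed flag: use the negated comparison for this destination
    """
    value = (not cond) if complement else bool(cond)
    if kind == "U":
        preds[dest] = bool(pin) and value  # always written, False when Pin is False
    elif kind == "O":
        if pin and value:
            preds[dest] = True             # can only set
    elif kind == "A":
        if pin and not value:
            preds[dest] = False            # can only clear
    else:
        raise ValueError(f"unknown predicate define type: {kind}")


# Accumulate a disjunction in p1 and a conjunction in p3 over two compares.
preds = {"p1": False, "p3": True}  # corresponds to pred_clr p1, pred_set p3
a, b = 4, 0
for cond in (a == 0, b == 0):
    pred_define(preds, "p1", "O", cond)                   # (a == 0) or (b == 0)
    pred_define(preds, "p3", "A", cond, complement=True)  # (a != 0) and (b != 0)
```

Because OR-type and AND-type defines only ever move a predicate in one direction, the defines in the loop could be issued in any order, or in the same cycle, and still produce the same result.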

37 EPIC Architectures and Compiler Technology Wen-mei Hwu Unconditional Predicate Defines For blocks reached on one condition If (a < 10) c= c+1; else if (b > 20) d = d+1; else e = e+1; bge a, 10, L1 add c, c, 1 jmp L3 ble b, 20, L2 add d, d, 1 jmp L3 add e, e, 1 L3 F TF T

38 EPIC Architectures and Compiler Technology Wen-mei Hwu Unconditional Predicate Define Pout bge a, 10, L1 add c, c, 1 jmp L3 ble b, 20, L2 add d, d, 1 jmp L3 add e, e, 1 L3 F T F T pred p1(U), p2(U), a ≥ 10 add c, c, 1 (p2) pred p3(U), p4(U), b ≤ 20 (p1) add d, d, 1 (p4) add e, e, 1 (p3)
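As a sanity check on this if-conversion, the branching and predicated versions can be compared in Python. This is a sketch under assumptions: the comparison symbols lost in the transcript are taken to be a >= 10 and b <= 20 (matching the bge and ble branches), the second destination of each define is taken to be the complement of the first, and a conditional expression models a nullified instruction.

```python
from itertools import product


def branching(a, b, c, d, e):
    """The original if / else-if / else chain."""
    if a < 10:
        c += 1
    elif b > 20:
        d += 1
    else:
        e += 1
    return c, d, e


def predicated(a, b, c, d, e):
    """Branch-free version: every instruction is issued, guards nullify some."""
    p1 = a >= 10                 # pred p1(U), p2(U, complement), a >= 10
    p2 = not p1
    c = c + 1 if p2 else c       # add c, c, 1 (p2)
    p3 = p1 and (b <= 20)        # pred p3(U), p4(U, complement), b <= 20 (p1)
    p4 = p1 and not (b <= 20)
    d = d + 1 if p4 else d       # add d, d, 1 (p4)
    e = e + 1 if p3 else e       # add e, e, 1 (p3)
    return c, d, e


# Compare both versions over a grid of inputs that covers all three paths.
mismatches = [
    (a, b)
    for a, b in product(range(0, 31, 5), repeat=2)
    if branching(a, b, 0, 0, 0) != predicated(a, b, 0, 0, 0)
]
```

Note that p3 and p4 are both false when p1 is false, which is what an unconditional define guarded by p1 would produce, so neither add d nor add e fires on the a < 10 path.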

39 EPIC Architectures and Compiler Technology Wen-mei Hwu Or Predicate Defines For blocks reached on multiple conditions If (a && b) c= c+1; else d = d+1; beq a, 0, L1 beq b, 0, L1 add d, d, 1 jmp L2 L1: add e, e, 1 L2: F T F T

40 EPIC Architectures and Compiler Technology Wen-mei Hwu Or-type Predicate Define Pout pred_clr p1 pred p1(O), p2(U), a = 0 pred p1(O), p3(U), b = 0 (p2) add d, d, 1 (p3) add e, e, 1 (p1) bge a, 0, L1 ble b, 0, L1 add d, d, 1 jmp L2 L1: add e, e, 1 L2: F T F T

41 EPIC Architectures and Compiler Technology Wen-mei Hwu And-type Predicate Define Pout pred_clr p1 pred_set p3 pred p1(O), p3(A), a = 0 pred p1(O), p3(A), b = 0 add d, d, 1 (p3) add e, e, 1 (p1) bge a, 0, L1 ble b, 0, L1 add d, d, 1 jmp L2 L1: add e, e, 1 L2: F T F T

42 EPIC Architectures and Compiler Technology Wen-mei Hwu Outline History and Background Control Speculation Predication IMPACT EPIC Architecture Compiler Technology Outlook

43 EPIC Architectures and Compiler Technology Wen-mei Hwu IMPACT EPIC Architecture Predication –base model is HP-PD [Schlansker,Rau, Kathail] –added implicit predicate pR to facilitate speculation –prefix alternative for code size control [EuroPar-99] –added new conjunctive and disjunctive types to facilitate minimization of program decision logic –moving towards implementation-neutral predication Control Speculation –based on Sentinel model [ASPLOS-92] –added R-Tags (in addition to E-tags) and pR (implicit recovery predicate) to enable inline recovery

44 EPIC Architectures and Compiler Technology Wen-mei Hwu IMPACT EPIC Architecture [Figure: register file entries (Value/PC, E-Tag, R-Tag), predicate register file entries (T/F, E-Tag, R-Tag), the implicit predicate pR, the memory conflict buffer, and the instruction formats LOAD, CHECK, and OPERATION with their predicate, tag, and attribute fields]

45 EPIC Architectures and Compiler Technology Wen-mei Hwu Control Speculative Execution Speculative instruction causes an exception –write current PC into destination register –set E-Tag in destination register Speculative instruction propagates an exception –a source register with set E-Tag –Propagate PC from source to destination register –set E-Tag in destination register Non-speculative instruction detects exceptions –a source register with set E-Tag

46 EPIC Architectures and Compiler Technology Wen-mei Hwu Microprocessor Microarchitecture

47 EPIC Architectures and Compiler Technology Wen-mei Hwu Result of Applying EPIC Techniques

48 EPIC Architectures and Compiler Technology Wen-mei Hwu Integrated Predication and Control Speculation All of the following must be true for a predicated instruction to take effect –input predicate true –input predicate E-Tag false –either pR false, or R-Tag of at least one input register true
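The three conditions on this slide combine into one Boolean test. The sketch below writes it out in Python; the function and parameter names are hypothetical, and the comments on normal versus recovery mode are an interpretation of the role of pR, not a statement from the slide.

```python
def takes_effect(pred_true, pred_e_tag, p_r, input_r_tags):
    """Return True if a predicated instruction should take effect.

    pred_true    -- value of the input predicate
    pred_e_tag   -- E-Tag of the input predicate
    p_r          -- value of the implicit recovery predicate pR
    input_r_tags -- R-Tags of the instruction's input registers
    """
    return bool(pred_true and not pred_e_tag and (not p_r or any(input_r_tags)))


# pR clear: only the guarding predicate and its E-Tag matter.
normal = takes_effect(True, False, False, [False, False])
# pR set: an instruction with no R-Tagged input is skipped ...
skipped = takes_effect(True, False, True, [False, False])
# ... while one that consumes an R-Tagged value takes effect.
replayed = takes_effect(True, False, True, [False, True])
# A predicate whose own E-Tag is set never lets the instruction take effect.
deferred = takes_effect(True, True, False, [])
```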

49 EPIC Architectures and Compiler Technology Wen-mei Hwu Speculation Example Speculative (affected by exception) speculative (not affected) Non-speculative branch check (non-speculative use)

50 EPIC Architectures and Compiler Technology Wen-mei Hwu Inline Recovery Model Processor enters recovery mode, set pR –PC in source register used as recovery PC –The speculative instruction at recovery PC is executed non-speculatively. –Exception processing is performed. –If exception is non-terminating, the result is stored into destination register, set R-Tag. –Instructions with R-Tag set in source registers are executed, set R-Tag in destination register

51 EPIC Architectures and Compiler Technology Wen-mei Hwu Inline Recovery Model (cont.) –Non-speculative instructions not repeated. Stores, self-incrementing loads and stores, etc. are safe. Same effect is achieved by recovery blocks. Source registers of non-speculative instructions do not need to be preserved. –Branches and predicate defines repeated to reproduce original control flow; input condition must be preserved –Recovery mode is turned off when reaching a check with set source R-Tag.

52 EPIC Architectures and Compiler Technology Wen-mei Hwu Recovery Block - Code Size

53 EPIC Architectures and Compiler Technology Wen-mei Hwu Instruction Cache Miss Comparison (32k, direct mapped)

54 EPIC Architectures and Compiler Technology Wen-mei Hwu Instruction Cache Miss Comparison (64k, 8way)

55 EPIC Architectures and Compiler Technology Wen-mei Hwu Spurious Cache Misses and Exceptions Spurious cache misses, TLB misses, and page faults are frequent in speculated code. Failing to suppress them can have a detrimental effect on performance.

56 EPIC Architectures and Compiler Technology Wen-mei Hwu Outline History and Background Control Speculation Predication IMPACT EPIC Architecture Compiler Technology Outlook

57 EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Compiler Technology Overview If-Conversion Classical Optimization Predicate Optimization ILP Optimization Scheduling/Partial Reverse If-Conversion Register Allocation Code Generation Debugging of Optimized Code Predicated Dataflow Predicate Analysis Source Memory Disambiguation Machine Description

58 EPIC Architectures and Compiler Technology Wen-mei Hwu Technology Vision Compiler Technology –analyze programmatic intentions pointer alias analysis integer range analysis predicate analysis –transformations program decision logic minimization [ISCA-99] fully resolved predicate optimizations data structure optimization algorithm transformations Architecture Support –logic manipulation instructions efficient condition tests instructions to efficiently combine conditions –highly effective speculative execution cache misses, TLB misses exceptions and dependence violations [ISCA-98]

59 EPIC Architectures and Compiler Technology Wen-mei Hwu Vision: Bridging the Gap Between Programs and Hardware if (x>=0) if (x==1 || x==2 || x==3) m=f(x); else m=g(x); [Figure: the same decision drawn as a branch tree (tests x>=0, x!=1, x!=2, x!=3 selecting m=f(x) or m=g(x)) and as a hardware logic network with comparators and an enable signal]

60 EPIC Architectures and Compiler Technology Wen-mei Hwu Analysis of Predicated Codes Live Variable Analysis Example: –Without Predicate Aware Dataflow (Only instructions on TRUE predicate can kill.) R7 is defined and killed by instruction 5; R7 is used by instruction 6. R7’s live range is (5,6). R3 is not defined and killed by instruction 3 in all cases because it is predicated on P1. R3 is used by instruction 4. R3’s live range is (1,2,3,4) and live out the top of the CB. –With Predicate Aware Dataflow R7’s live range is also (5,6). R3’s live range is (3,4) because instruction 3 defines R3 for all uses by instruction 4. This is known by studying the relation of P1 to P2. Dataflow without regard to predicates leads to conservative results.
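The difference between the two analyses can be sketched with a small backward liveness pass in Python. The code listing from the slide is not in the transcript, so the four instructions below are a hypothetical reconstruction that only mirrors the description (instruction 3 defines R3 under P1, instruction 4 uses it under P2, and P2 implies P1). Predicates are modelled as sets of abstract "worlds" in which they are true; all names are illustrative.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    num: int
    dest: Optional[str]
    srcs: tuple
    guard: frozenset  # worlds in which the instruction executes


def live_before(instrs, universe, predicate_aware):
    """Backward liveness over straight-line predicated code.

    Returns {instruction number: set of registers live just before it}.
    """
    live = {}  # register -> worlds in which it is live
    result = {}
    for ins in reversed(instrs):
        if ins.dest is not None and ins.dest in live:
            if predicate_aware:
                remaining = live[ins.dest] - ins.guard  # kill where guard holds
            elif ins.guard == universe:
                remaining = frozenset()                 # only TRUE-guarded defs kill
            else:
                remaining = live[ins.dest]
            if remaining:
                live[ins.dest] = remaining
            else:
                del live[ins.dest]
        for src in ins.srcs:
            worlds = ins.guard if predicate_aware else universe
            live[src] = live.get(src, frozenset()) | worlds
        result[ins.num] = set(live)
    return result


TRUE = frozenset({"w0", "w1", "w2"})
P1 = frozenset({"w1", "w2"})
P2 = frozenset({"w1"})  # P2 implies P1

code = [
    Instr(3, "R3", ("R1",), P1),
    Instr(4, "R4", ("R3",), P2),
    Instr(5, "R7", ("R2",), TRUE),
    Instr(6, "R8", ("R7",), TRUE),
]
naive = live_before(code, TRUE, predicate_aware=False)
aware = live_before(code, TRUE, predicate_aware=True)
```

In this run the naive analysis still reports R3 as live before instruction 3, while the predicate-aware analysis does not, because the use under P2 is fully covered by the definition under P1. R7 behaves identically in both, as the slide notes.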

61 EPIC Architectures and Compiler Technology Wen-mei Hwu Dataflow Analysis of Predicated Code Traditional dataflow requires reverse if-conversion (RIC) RIC of some codes is exponential (wc: 5,20,80,240,...) Factoring reduces order of complexity (wc: 8,15,22,28,...) RIC of one iter. (width 5) Code example (wc) RIC of code with 2x unroll (width 20)

62 EPIC Architectures and Compiler Technology Wen-mei Hwu Code Size Control using Predication Code example (099.go copyshape): Predication reduced code size by instruction merging (in example 35%) [Figure: original vs. predicated instruction layout] Code example (MediaBench Experimental Image Compression reflect1): Original (Overhead=8/17 instrs (47%)) Optimized (6/19 (30%)) Predicated (3/13 (23%)) [Figure: control-flow graphs of the original, optimized, and predicated versions, with branches B1-B2, jumps J, predicate defines P1-P3, and instructions I0-I9]

63 EPIC Architectures and Compiler Technology Wen-mei Hwu Program Decision Logic Optimization Express control as a predicate network Reformulate decision as a logic network mimicking circuit minimization techniques

64 EPIC Architectures and Compiler Technology Wen-mei Hwu Working Example - 132.ijpeg in SPEC95 Contains 477 functions and 25,889 lines of code Spends 200 seconds and 18MB of memory in analysis 229 of 266 indirect call-sites are converted into direct ones [Figure: points-to graphs prior to and after object elevation for the code f3(&s1, &i, &j); f7(s1); *s->p = 10; *s->q = 20; (*s->fp)(s); t = malloc(); t->p = v1; t->q = v2; t->fp = f5; *s = t;]

65 EPIC Architectures and Compiler Technology Wen-mei Hwu Debugging of Optimized Code (PLDI-99) When to take over execution and when to stop forward recovery? –original execution order of instructions has to be tracked –instructions might be moved up to different paths leading to the breakpoint or down to different paths starting from the breakpoint [Figure: control-flow graphs over blocks A to F, annotated with instructions I1 to I5 (and copies I1', I1'', I3', I4'), their source statements S1 to S4, and a breakpoint]

66 EPIC Architectures and Compiler Technology Wen-mei Hwu Outline History and Background Control Speculation Predication IMPACT EPIC Architecture Compiler Technology Outlook

67 EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Research Challenges Implementation neutral architecture Profile independence and program transparent profiling Code size optimizations Analysis of predicated code Interprocedural alias analysis Debugging of optimized code

68 EPIC Architectures and Compiler Technology Wen-mei Hwu Outlook Compilers critical to the performance of EPIC uP’s –Use of predication and speculation is a serious challenge –Any misuse will lead to performance loss. –Brand new algorithms will be deployed in the EPIC compilers. –Existing software development models must be supported. Expect performance robustness issues –Awesome performance leap seen for some applications. –Less for others due to limitations of analyses and optimizations. –It can take years for the performance gain to be universal. –A lot of research activities needed, www.trimaran.org. Evolution of EPIC architectures –Revisions of architectures are likely as compilers mature. –Code size and power consumption are critical for embedded EPICs.

