Programmer Specified Pointer Independence
David Koes, Mihai Budiu, Girish Venkataramani
Memory Systems Performance Workshop 2004 (MSP 2004). © David Ryan Koes 2004


1 Programmer Specified Pointer Independence
David Koes, Mihai Budiu, Girish Venkataramani, Seth Copen Goldstein
MSP 2004

2 Outline
Motivation
#pragma independent
Automated Annotation
Evaluation
Conclusion

3 Problem
Potentially aliasing pointers inhibit compiler optimization.
Fully determining pointer aliasing may be infeasible or expensive.
How to get the benefit without paying the cost?

4 Memory Dependencies
Memory dependencies inhibit optimization
  Introduce edges into dependence graph
  Limits parallelization
  Inhibits code motion
    – instruction scheduling
    – loop invariant code motion
    – partial redundancy elimination
    – register promotion
Breaking memory dependencies difficult
  compile-time analysis infeasible or expensive
  run-time analysis limited to local window

5 Examples
    while(len--) {
      *p++ = *q++;
    }
There is a real data dependence between the load and store within a single iteration.
Unroll loop to exploit parallelism.

Itanium assembly from gcc:
    .L26:
      mov r24 = r33
      mov r17 = r32
      adds r22 = 8, r33
      adds r19 = 8, r32
      adds r20 = 12, r33
      adds r21 = 12, r32
      ;;
      ld4 r14 = [r24], 4
      adds r33 = 16, r33
      adds r32 = 16, r32
      ;;
      st4 [r17] = r14, 4
      ld4 r23 = [r24]
      ;;
      st4 [r17] = r23
      ld4 r18 = [r22]
      ;;
      st4 [r19] = r18
      ld4 r16 = [r20]
      ;;
      st4 [r21] = r16
      br.cloop .L26
      ;;

Without memory dependence:
    .L26:
      mov r18 = r33
      mov r23 = r32
      adds r25 = 8, r33
      adds r24 = 12, r33
      adds r22 = 8, r32
      adds r21 = 12, r32
      ;;
      ld4 r14 = [r18], 4
      ld4 r19 = [r25]
      adds r33 = 16, r33
      adds r32 = 16, r32
      ;;
      st4 [r23] = r14, 4
      ld4 r16 = [r18]
      ld4 r20 = [r24]
      ;;
      .mmb
      st4 [r23] = r16
      st4 [r22] = r19
      st4 [r21] = r20
      br.cloop .L26
      ;;
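The effect of removing the memory dependence can be sketched at the C level. The following is an illustration only, not the compiler's actual output: copy_unrolled is a hypothetical hand-unrolled version of the slide's loop in which all four loads are issued before any store, which is only equivalent to the simple loop when the stores through p cannot change what q reads.

```c
#include <assert.h>

/* The copy loop from the slide: one load, then one store, per iteration. */
void copy_simple(int len, int *p, const int *q)
{
    while (len--)
        *p++ = *q++;
}

/* Hypothetical hand-unrolled variant (name and shape are assumptions).
 * All four loads are grouped ahead of the four stores. This matches
 * copy_simple only if the regions written through p and read through q
 * do not overlap, which is the fact the compiler cannot prove by itself. */
void copy_unrolled(int len, int *p, const int *q)
{
    assert(len >= 0);
    while (len >= 4) {
        int t0 = q[0], t1 = q[1], t2 = q[2], t3 = q[3]; /* loads first */
        p[0] = t0;                                      /* then stores */
        p[1] = t1;
        p[2] = t2;
        p[3] = t3;
        p += 4;
        q += 4;
        len -= 4;
    }
    while (len--)            /* leftover elements, at most three */
        *p++ = *q++;
}
```

With non-overlapping buffers the two functions produce the same result. With overlapping buffers, for example p == q + 1, they can differ, which is why the compiler must keep the loads and stores in order unless it is told the pointers are independent.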

6 Examples
Original loop:
    for(i = 0; i < len; i++) {
      ...
      ... = *q;
      ...
      *p = ...
    }
After loop invariant code motion:
    t0 = *q;    /* if loop will be executed */
    for(i = 0; i < len; i++) {
      ...
      ... = t0;
      ...
      *p = ...
    }
After register promotion:
    t0 = *q;
    for(i = 0; i < len; i++) {
      ...
      ... = t0;
      ...
      t1 = ...
    }
    *p = t1;    /* if loop was executed */
Hardware can’t do this
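A concrete instance of these two transformations, as a sketch: the loop body (store *q + i through p) and the names fill_naive and fill_promoted are made up for illustration, and the second function is only valid under the assumption that p and q do not alias.

```c
#include <assert.h>

/* Untransformed loop: *q is reloaded and *p is stored on every
 * iteration, because p and q might point to the same int. */
void fill_naive(int *p, const int *q, int len)
{
    for (int i = 0; i < len; i++)
        *p = *q + i;
}

/* Hypothetical result of loop invariant code motion plus register
 * promotion. Correct only when p and q are independent. */
void fill_promoted(int *p, const int *q, int len)
{
    assert(p != q);          /* the independence assumption */
    if (len <= 0)
        return;              /* loop would not execute: no load, no store */
    int t0 = *q;             /* loop invariant load hoisted out of the loop */
    int t1 = 0;
    for (int i = 0; i < len; i++)
        t1 = t0 + i;         /* value kept in a register instead of *p */
    *p = t1;                 /* single store after the loop */
}
```

The guard on len plays the role of the "if loop will be executed" and "if loop was executed" conditions on the slide: the hoisted load and the sunk store must not happen when the original loop would have run zero times.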

7 Pointer Analysis
Memory Disambiguation is important
  hardware can’t do everything
  so have compiler figure it out...

easy!
    int p[10];
    foo() {
      int q[10];
      ...
    }

harder.. need precise dataflow analysis
    foo() {
      int *p, *q;
      int a,b;
      if(...) {
        p = &a;
        q = &b;
      } else {
        p = &b;
        q = &a;
      }
      ...
    }

requires inter-procedural information
    foo(int *p, int *q) {
      ...
    }

8 Inter-procedural Pointer Analysis
Just apply same techniques as used for intraprocedural
  may not be possible
    – gcc -c foo.c
  may not be feasible
    – n^2 analysis on source code of Microsoft Office?
Use less precise analysis
  still might not be possible (separate compilation, libraries)
  still takes time (every time you compile, or at least link)
  less precise => less optimization

9 Alternative: Have Programmer Do It
Programmer annotates source code
  informs compiler of pointer relationships
Previous Work
  ANSI C99 restrict keyword
    – difficult for compiler and programmer to reason about
    – non-local semantics
  MIPSpro #pragma ivdep
    – break loop carried dependence in inner loop

10 Outline
Motivation
#pragma independent
Automated Annotation
Evaluation
Conclusion

11 #pragma independent
Syntax
    #pragma independent ptr1 ptr2
Example
    int x[100];
    int y;
    void foo(int *a, int *b)
    {
    #pragma independent a b
      int arr[50];
      …
    }
[Figure: objects x, y, malloc_site_1, arr, malloc_site_2]
pointers guaranteed to always point to different objects

12 Examples
    void f(int len, int * p, int * q)
    {
    #pragma independent p q
      while (len--)
        *p++ = *q++;
    }

    void example(int *a, int *b, int *c)
    {
    #pragma independent a b
    #pragma independent a c
      (*b)++;
      *a = *b;
      *a = *a + *c;
    }
pragmas allow compiler to eliminate a store to *a
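To make the eliminated store visible, here is a sketch of what an optimized version of example could look like once a is known to be independent of b and c. The function example_one_store is a hand-written illustration, not actual compiler output, and the pragmas are left out of the code because a compiler without support for them would simply ignore them.

```c
#include <assert.h>

/* The function from the slide, without the pragmas. */
void example(int *a, int *b, int *c)
{
    (*b)++;
    *a = *b;          /* first store to *a */
    *a = *a + *c;     /* second store to *a */
}

/* Hypothetical optimized form, valid when a is independent of b and c.
 * The first store to *a is dead: its value is forwarded in a register,
 * and the load of *c cannot observe it. Note that b and c may still
 * alias each other, so *c is read after the store to *b. */
void example_one_store(int *a, int *b, int *c)
{
    assert(a != b && a != c);   /* the two pragma claims */
    int t = *b + 1;
    *b = t;
    *a = t + *c;                /* the only store to *a */
}
```

Only the pairs (a, b) and (a, c) need to be independent for this rewrite, which is exactly what the two pragmas state. Nothing is claimed about b and c.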

13 #pragma independent
Advantages
  more flexible and powerful than restrict
  relationships between pointers explicit
  easy to reason about
    – affects only listed pointers
  easy to implement in compiler
    – fewer than 100 lines of code
Possible Disadvantage
  could take programmer a lot of time to annotate existing source

14 Outline
Motivation
#pragma independent
Automated Annotation
Evaluation
Conclusion

15 Automated Annotation Toolflow
[Toolflow diagram. Stages: compiler, execution, script, pragma aware compiler, programmer. Inputs: *.c, *.h, inputs. Intermediate artifacts: executable with runtime checks, invalid pointer pairs, execution frequencies, candidate pointer pairs, static scores, pragma annotations ranked by score, source code with verified pragmas. Output: faster executable.]
Compiler finds interesting pointer pairs
  pairs which inhibit optimization
  pairs whose aliasing is unknown
Inserts profiling code and checks

16 Automated Annotation Toolflow
[Toolflow diagram, as on slide 15]
Instrumented executable run on input
  records pointers which conflict
  counts number of pointer uses
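The slides do not show the inserted checks, so the following is only a sketch of what such instrumentation might look like. The struct pair_profile, the helper check_pair and the function copy_instrumented are hypothetical names, and the conflict test is deliberately simplified to address equality, whereas a real tool would need to decide whether two pointers refer to the same object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-pair profile record (field names are assumptions). */
struct pair_profile {
    const char *name;      /* candidate pair, e.g. "p q" */
    unsigned long uses;    /* number of times the check was reached */
    int conflict;          /* set once the pair was seen to collide */
};

/* Simplified runtime check: counts the use and flags the pair as
 * conflicting when both pointers hold the same address. */
static void check_pair(struct pair_profile *prof, const void *x, const void *y)
{
    assert(prof != NULL);
    prof->uses++;
    if (x == y)
        prof->conflict = 1;
}

/* The copy loop from the earlier slides with a check inserted
 * before each load/store pair. */
void copy_instrumented(int len, int *p, int *q, struct pair_profile *prof)
{
    while (len--) {
        check_pair(prof, p, q);
        *p++ = *q++;
    }
}
```

After a run, uses corresponds to the execution frequency of the pair, and a set conflict flag marks it as an invalid pointer pair for the given input. A pair that never conflicts on the tested inputs is still only a candidate, which is why the programmer verifies it afterwards.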

17 Automated Annotation Toolflow
[Toolflow diagram, as on slide 15]
Script combines static and dynamic info
  eliminates conflicting pairs
  assigns score to each pair
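The combining step can be sketched as a filter followed by a sort. The slides do not give the scoring formula, so the product of static score and execution frequency used below is purely an assumption, as are the struct candidate layout and the function name rank_candidates.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record merging compiler output with profile data. */
struct candidate {
    const char *pair;            /* e.g. "p q" */
    unsigned long static_score;  /* from the compiler (assumed meaning) */
    unsigned long frequency;     /* execution count from the profile run */
    int conflict;                /* nonzero if the run saw a conflict */
    unsigned long score;         /* filled in by rank_candidates */
};

/* qsort comparator: higher score first. */
static int by_score_desc(const void *a, const void *b)
{
    const struct candidate *x = a;
    const struct candidate *y = b;
    return (x->score < y->score) - (x->score > y->score);
}

/* Drops conflicting pairs, scores the rest, and sorts them by score.
 * Returns the number of surviving pairs, stored at the front of c.
 * ASSUMPTION: score = static_score * frequency; the real formula is
 * not stated in the slides. */
size_t rank_candidates(struct candidate *c, size_t n)
{
    assert(c != NULL);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (c[i].conflict)
            continue;                    /* eliminate conflicting pair */
        c[kept] = c[i];
        c[kept].score = c[kept].static_score * c[kept].frequency;
        kept++;
    }
    qsort(c, kept, sizeof *c, by_score_desc);
    return kept;
}
```

Whatever the actual formula is, the point of the ranking is the same: pairs that both block optimizations and execute often end up at the top of the list shown to the programmer.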

18 Automated Annotation Toolflow
[Toolflow diagram, as on slide 15]
Programmer verifies pointer pairs
  can verify high scoring pairs only

19 Example Output
    void summer(int *p, int *q, int n, int *result)
    {
    #pragma independent p q /* score: 1100 */
    #pragma independent p result /* score: 15 */
    #pragma independent q result /* score: 12 */
      int i, sum = 0;
      for(i = 0; i < n; i++) {
        *p += *q;
        sum += *q;
      }
      *result = sum;
    }
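Why the generated pragmas still need to be verified can be shown with a small sketch. summer_plain is the slide's function without the pragmas, and summer_assumed is a hypothetical hand-written stand-in for what a compiler might produce if it trusts all three pragmas. Both names are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* The function from the slide, with no independence assumed. */
void summer_plain(int *p, int *q, int n, int *result)
{
    int i, sum = 0;
    for (i = 0; i < n; i++) {
        *p += *q;
        sum += *q;
    }
    *result = sum;
}

/* Hypothetical code after trusting the pragmas: *q is loaded once and
 * *p is kept in a register for the whole loop. */
void summer_assumed(int *p, int *q, int n, int *result)
{
    assert(p != NULL && q != NULL && result != NULL);
    int i, sum = 0;
    if (n > 0) {
        int qv = *q;          /* single load of *q */
        int pv = *p;          /* *p promoted to a register */
        for (i = 0; i < n; i++) {
            pv += qv;
            sum += qv;
        }
        *p = pv;              /* single store back to *p */
    }
    *result = sum;
}
```

When p and q point to different ints the two functions agree. When they point to the same int, summer_plain sees *q change on every iteration while summer_assumed does not, so the results differ. A wrong pragma silently changes program behavior, which is the reason the toolflow ends with a manual verification step.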

20 Sample Score Distribution

21 Outline
Motivation
#pragma independent
Automated Annotation
Evaluation
Conclusion

22 Targets & Benchmarks
Targets
  Itanium
    EPIC/VLIW architecture
    instruction scheduling important for good performance
  ASH (Application Specific Hardware)
    can take full advantage of parallelism
Benchmarks
  Mediabench
    small, multimedia applications
    can’t time accurately on Itanium
  Spec95, Spec2000
    general purpose integer
    longer running
      – sometimes days for ASH simulation

23 Compilers
gcc
  not very sophisticated optimizations
  -funroll-loops -O2
CASH
  more sophisticated optimizations
  memory dependencies are first class objects
    – token edge
    – pragma independent removes edge

24 Questions
Do we find a reasonable number of potential annotations? Yes!
Do the annotations result in faster code? Yes!
Does our scoring mechanism find the pointer pairs with the biggest impact on performance? Yes!
How much time does the programmer have to spend verifying pragmas? Not a lot!

25 Annotations Found

26 Do the annotations result in faster code?
Of 19 Spec benchmarks, these were the only ones to demonstrate measurable speedup
[Chart: Itanium Speedup]

27 Do the annotations result in faster code?
[Chart: CASH Speedup]

28 Does our scoring mechanism work?
[Chart: mpeg2_e]

29 How much time does the programmer have to spend?

30 Verified Speedup

31 Conclusions
We’ve performed a limit study of pointer analysis
  gcc doesn’t fully exploit the results of pointer analysis
  CASH and ASH can fully exploit parallelism
Programmer specified annotations are effective
  faster and more flexible than inter-procedural analysis
Annotations can be automatically generated
  automatic score successfully focuses programmer’s attention
  manual verification does not take long


33 ANSI C99 restrict keyword
An object that is accessed through a restrict-qualified pointer has a special association with that pointer. This association, defined in 6.7.3.1 below, requires that all accesses to that object use, directly or indirectly, the value of that particular pointer. The intended use of the restrict qualifier (like the register storage class) is to promote optimization, and deleting all instances of the qualifier from all preprocessing translation units composing a conforming program does not change its meaning (i.e., observable behavior).
ISO/IEC 9899, Second edition, 1999-12-01, 6.7.3-7

34 restrict Example
    void f(int len, int * restrict p, int * restrict q)
    {
      while (len--)
        *p++ = *q++;
    }
restrict tells the compiler that p and q refer to different objects, enabling optimizations
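A short usage sketch for the restrict version, assuming a C99 compiler. The caller copy_halves is a hypothetical helper added for illustration. It passes two pointers into the same array, which stays within the rules as long as the elements written through p and the elements read through q do not overlap.

```c
#include <assert.h>

/* The restrict-qualified copy loop from the slide. */
void f(int len, int * restrict p, int * restrict q)
{
    while (len--)
        *p++ = *q++;
}

/* Hypothetical caller: copies the second half of buf onto the first.
 * The two regions [0, half) and [half, 2*half) are disjoint, so the
 * promise made by restrict holds. A call such as f(n, buf + 1, buf)
 * with n > 1 would break that promise and has undefined behavior. */
void copy_halves(int *buf, int half)
{
    assert(half >= 0);
    f(half, buf, buf + half);
}
```

The responsibility sits with the caller: nothing in the signature of f checks the claim, and removing the qualifier would not change the meaning of a correct program.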

35 Problems with restrict
6.7.3.1

36 gcc’s restrict Implementation
No two restricted pointers can alias
A restricted pointer and an unrestricted pointer may alias
This definition is intuitive for both the programmer and compiler
But not the C99 definition!

