© 2009 IBM Corporation 19-20 July, 2009 | PADTAD Chicago, Illinois A Proposal of Operation History Management System for Source-to-Source Optimization.


A Proposal of Operation History Management System for Source-to-Source Optimization of HPC Programs
Yasushi Negishi, Hiroki Murata and Takao Moriyama
Deep Computing, Tokyo Research Laboratory, IBM Research
July, 2009 | PADTAD Chicago, Illinois
© 2009 IBM Corporation

Outline of this Presentation
1. Proposal of an algorithm for managing the operation history of source-to-source optimization.
2. Prototype system with a new user interface for managing operation history explicitly.


Background
• Improvement in single-processor performance is stalling, and supercomputer architectures are becoming more complex.
  – Architecture-specific optimizations are needed to exploit the various kinds of network and processor architectures and achieve reasonable performance.
• Application areas for numerical simulations continue to expand.
  – We need to solve performance issues more effectively and more easily.
• Source-to-source optimization tools are becoming important.
  – Automatic conversion (a.k.a. refactoring) for optimization.
  – Support for typical architecture-specific and application-specific performance optimization patterns.
  – Reduce programmer time and human error by supporting routine but troublesome optimizations.

Typical Source-to-Source Optimization Steps
• Strength reduction
  – Replace a costly operation with an equivalent but less expensive operation.
    E.g.  x = r ** (-1)  →  x = 1 / r
  – Steps
    1. Modify the code to use the less expensive operation by manual editing.
• Loop unrolling & SIMDization
  – Use SIMD instructions when the compiler does not generate optimal SIMD instructions in a loop.
    E.g.  x(i)   = a(i)   + b(i)   * c(i)
          x(i+1) = a(i+1) + b(i+1) * c(i+1)
          →  x(i) = FPMADD(a(i), b(i), c(i))
  – Steps
    1. Unroll the loop by automatic conversion, specifying the range and the unroll factor.
    2. Rewrite the unrolled loop body with inline assembly code for SIMD by manual editing.
• Loop tiling (a.k.a. loop blocking, strip-mine and interchange)
  – Change the loop structure to increase memory access locality and the cache hit ratio.
    E.g.
        for (i=0; i<N; i++)
          for (j=0; j<N; j++)
            c[i] = c[i] + a[i][j]*b[j];
    →
        for (i=0; i<N; i+=Bi)
          for (j=0; j<N; j+=Bj)
            for (ii=i; ii<min(i+Bi,N); ii++)
              for (jj=j; jj<min(j+Bj,N); jj++)
                c[ii] = c[ii] + a[ii][jj]*b[jj];
  – Steps
    1. Modify the loop by automatic conversion, specifying the range and the blocking factors.
• Optimization steps are combinations of automatic conversion and manual editing.
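As a quick sanity check on the loop tiling example above, the following minimal sketch compares the untiled and tiled loop nests on a small matrix-vector product. It is written in Python purely for illustration; the function names `matvec` and `matvec_tiled` are hypothetical and not part of the presented tool, and the sketch only demonstrates that both loop structures compute the same result, not the cache benefit.

```python
def matvec(a, b, n):
    """Untiled loop nest: c[i] += a[i][j] * b[j]."""
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[i] += a[i][j] * b[j]
    return c


def matvec_tiled(a, b, n, bi, bj):
    """Tiled loop nest with blocking factors bi and bj (hypothetical names)."""
    c = [0] * n
    for i in range(0, n, bi):
        for j in range(0, n, bj):
            # min() keeps the last, possibly partial, tile inside the array
            for ii in range(i, min(i + bi, n)):
                for jj in range(j, min(j + bj, n)):
                    c[ii] += a[ii][jj] * b[jj]
    return c


if __name__ == "__main__":
    n = 5
    a = [[i * n + j for j in range(n)] for i in range(n)]
    b = [j + 1 for j in range(n)]
    print(matvec(a, b, n) == matvec_tiled(a, b, n, 2, 3))
```

The blocking factors deliberately do not divide `n`, so the partial tiles guarded by `min()` are exercised as well.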

"Reapplication Conflict"
• Because optimization work proceeds by trial and error, it is sometimes necessary to undo a past operation, or to insert or change an operation in the past, even when a single user manages the code.
• We call this conflict, which is caused by a single user, a "Reapplication Conflict".
• A system that supports source-to-source optimization should handle this conflict correctly.

Issues of Existing Version Management Systems Handling "Reapplication Conflict"
• Existing version management systems use the algorithm of the "patch" command, or a similar one, to handle conflicts.
• But the patch algorithm has an issue.
  – For modifications made by manual editing, the patch algorithm works fine. It applies the difference produced by an operation to a different base code, adjusting the target range as needed.
  – For modifications made by automatic conversion, the patch algorithm may generate unexpected results.
• A scenario in which an existing system does not work as expected is shown next.

Example Scenario of "Reapplication Conflict" (original)
The original code is checked out.
Operation history: Original

      program sample
      implicit none
      integer i, n
      parameter(n= )
      real*8 a, b, pi, x(n), sin, s, t1, t2, t3, rtc
      a = 0
      b = 0
      pi = d0
      s = rtc()
      do i = 1, n
        x(i) = i * sin(i / (pi * 4.0d0))
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
        a = a + x(i) ** (-1)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n
        b = b + ((x(i) + a) / (pi * 4.0d0) + 1.0d0)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end

Example Scenario of "Reapplication Conflict" (Step 1)
Step 1: Do loop-invariant code motion by manual editing, and check it in (operation A).
Operation history: Original → A

      program sample
      implicit none
      integer i, n
      parameter(n= )
      real*8 a, b, pi, fourpi, x(n), sin, s, t1, t2, t3, rtc
      a = 0
      b = 0
      pi = d0
      s = rtc()
      fourpi = pi * 4.0d0
      do i = 1, n
        x(i) = i * sin(i / fourpi)
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
        a = a + x(i) ** (-1)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n
        b = b + ((x(i) + a) / fourpi + 1.0d0)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end

Example Scenario of "Reapplication Conflict" (Step 2)
Step 2: Do strength reduction by manual editing, and check it in (operation B).
Operation history: Original → A → B

      program sample
      implicit none
      integer i, n
      parameter(n= )
      real*8 a, b, pi, fourpi, x(n), sin, s, t1, t2, t3, rtc
      a = 0
      b = 0
      pi = d0
      s = rtc()
      fourpi = pi * 4.0d0
      do i = 1, n
        x(i) = i * sin(i / fourpi)
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
        a = a + 1.0d0 / x(i)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n
        b = b + ((x(i) + a) / fourpi + 1.0d0)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end

Example Scenario of "Reapplication Conflict" (Step 3)
Step 3: Do loop unrolling by automatic conversion, and check it in (operation C).
Operation history: Original → A → B → C

      program sample
      implicit none
      integer i, n
      parameter(n= )
      real*8 a, b, pi, fourpi, x(n), sin, s, t1, t2, t3, rtc
      a = 0
      b = 0
      pi = d0
      s = rtc()
      fourpi = pi * 4.0d0
      do i = 1, n
        x(i) = i * sin(i / fourpi)
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
        a = a + 1.0d0 / x(i)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n, 4
        b = b + ((x(i) + a) / fourpi + 1.0d0)
        b = b + ((x(i+1) + a) / fourpi + 1.0d0)
        b = b + ((x(i+2) + a) / fourpi + 1.0d0)
        b = b + ((x(i+3) + a) / fourpi + 1.0d0)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end

Example Scenario of "Reapplication Conflict" (Step 4)
Step 4: Compile and execute the code, and analyze the effects of the optimizations.
Operation history: Original → A → B → C
(The code is unchanged from the result of Step 3.)
• The following results are found:
  – Optimization A: not effective (N.G.)
  – Optimization B: effective (O.K.)
  – Optimization C: effective (O.K.)

Example Scenario of "Reapplication Conflict" (Step 5)
Step 5: Undo optimization A with the "patch" command.
Operation history: Original → A → B → C → Step 5
(The code shown is the result of Step 3, annotated as follows.)
• Target of optimization A: the lines changed in Step 1 (the fourpi declaration and assignment, and the statements rewritten to use fourpi).
• Not target of optimization A, but influenced: the copies of the loop body that loop unrolling created in Step 3, which also refer to fourpi.

Example Scenario of "Reapplication Conflict" (Final Results)

      program sample
      implicit none
      integer i, n
      parameter(n= )
      real*8 a, b, pi, x(n), sin, s, t1, t2, t3, rtc
      a = 0
      b = 0
      pi = d0
      s = rtc()
      do i = 1, n
        x(i) = i * sin(i / (pi * 4.0d0))
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
        a = a + 1 / x(i)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n, 4
        b = b + ((x(i) + a) / (pi * 4.0d0) + 1.0d0)
        b = b + ((x(i+1) + a) / fourpi + 1.0d0)
        b = b + ((x(i+2) + a) / fourpi + 1.0d0)
        b = b + ((x(i+3) + a) / fourpi + 1.0d0)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end

Problem: the wrong line is unrolled! The unrolled copies still refer to fourpi, which no longer exists.
This happens because "patch" does not actually apply the automatic conversion operation again; it only applies the textual difference that the automatic conversion produced.
A system for managing automatic conversion operations is needed. It should:
(1) Adjust the target range.
(2) Actually apply the automatic operation again.
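The failure above can be reproduced with a deliberately simplified model. The sketch below is not the real "patch" algorithm: `reverse_apply` is a hypothetical helper that undoes operation A by exact line matching only, and `CHANGES_A` and `AFTER_C` are hand-written excerpts of the scenario. It shows why the copies created by unrolling are left behind: they never appeared in A's difference, so a purely textual undo cannot see them.

```python
def reverse_apply(lines, changes):
    """Undo an edit by exact textual matching (toy stand-in for a reverse patch).

    changes is a list of (old_line, new_line) pairs recorded for the edit;
    old_line is None when the edit inserted new_line.
    """
    new_to_old = {new: old for old, new in changes}
    result = []
    for line in lines:
        if line in new_to_old:
            old = new_to_old[line]
            if old is not None:      # an inserted line is simply dropped
                result.append(old)
        else:
            result.append(line)      # unknown line: left untouched
    return result


# Operation A (loop-invariant code motion), as line pairs.
CHANGES_A = [
    (None, "fourpi = pi * 4.0d0"),
    ("x(i) = i * sin(i / (pi * 4.0d0))", "x(i) = i * sin(i / fourpi)"),
    ("b = b + ((x(i) + a) / (pi * 4.0d0) + 1.0d0)",
     "b = b + ((x(i) + a) / fourpi + 1.0d0)"),
]

# Excerpt of the code after operation C (loop unrolling by 4).
AFTER_C = [
    "fourpi = pi * 4.0d0",
    "x(i) = i * sin(i / fourpi)",
    "do i = 2, n, 4",
    "b = b + ((x(i) + a) / fourpi + 1.0d0)",
    "b = b + ((x(i+1) + a) / fourpi + 1.0d0)",
    "b = b + ((x(i+2) + a) / fourpi + 1.0d0)",
    "b = b + ((x(i+3) + a) / fourpi + 1.0d0)",
    "enddo",
]

undone = reverse_apply(AFTER_C, CHANGES_A)
leftover = [line for line in undone if "fourpi" in line]

if __name__ == "__main__":
    print("\n".join(undone))
    print("lines still using fourpi:", len(leftover))
```

In this model the definition of `fourpi` is removed while three unrolled statements still use it, which mirrors the broken result on the slide.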

Proposed Algorithm for Saving/Applying Automatic Operations
• Manual editing is handled by the patch algorithm.
• Automatic conversion is handled by our proposed algorithm.

[Figure: data flow for saving an operation and for applying a saved operation]
Manual editing
  – Saving an operation: original code → manual editing → optimization results. The context difference file between the two is saved as the operation log.
  – Applying a saved operation: modified code + context difference file → patch algorithm → optimized results on modified code.
Automatic conversion
  – Saving an operation: original code → specify range → pseudo change file; specify conversion ID and arguments → optimization results. The operation log consists of the context difference file, the conversion ID and the arguments.
  – Applying a saved operation: modified code + context difference file → patch algorithm → pseudo change file; apply the automatic conversion with the saved conversion ID and arguments → optimization results.

Scenario of Proposed Algorithm to Save Automatic Operations
Algorithm for saving operation history

Step 1: Generate a pseudo change file by inserting special lines that specify the range for the automatic operation.
The file before editing is the result of Step 2 of the example scenario (optimizations A and B applied). The pseudo change file is the same file with $BEGIN and $END lines around the target loop:

      program sample
      implicit none
      integer i, n
      parameter(n= )
      real*8 a, b, pi, fourpi, x(n), sin, s, t1, t2, t3, rtc
      a = 0
      b = 0
      pi = d0
      s = rtc()
      fourpi = pi * 4.0d0
      do i = 1, n
        x(i) = i * sin(i / fourpi)
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
        a = a + 1.0d0 / x(i)
      enddo
      t2 = rtc() - s
      s = rtc()
$BEGIN
      do i = 2, n
        b = b + ((x(i) + a) / fourpi + 1.0d0)
      enddo
$END
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end

Step 2: Create a context difference file between the file before editing and the pseudo change file.

    *** opeB.F  Sat Jul 11 11:36:
    --- opeC2.F Sun Jul 12 13:36:
    ***************
    *** 19,27 ****
    --- 19,29 ----
            enddo
            t2 = rtc() - s
            s = rtc()
    + $BEGIN
            do i = 2, n
              b = b + ((x(i) + a) / fourpi + 1.0d0)
            enddo
    + $END
            t3 = rtc() - s
            write(*,*) 'a=', a, 'b=', b
            write(*,*) 'time=', t1, t2, t3

By saving this context difference file, the range-adjustment algorithm of the "patch" command can be used to identify the target range of the automatic conversion.

Step 3: Save the identifier of the automatic conversion operation (e.g. "loop unrolling"), its parameter (e.g. "4"), and the context difference file as the operation log.

Operation log
  – Identifier of automatic conversion: "loop unrolling"
  – Parameter: 4
  – Context difference file: as shown in Step 2
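The three saving steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the prototype's implementation: `save_operation` and the dictionary layout of the log are hypothetical, and Python's standard `difflib.context_diff` stands in for whatever tool produces the context difference file.

```python
import difflib


def save_operation(lines, first, last, conversion_id, args):
    """Build an operation log for an automatic conversion (hypothetical layout).

    lines      : source file before editing, one string per line
    first,last : 0-based inclusive range selected for the conversion
    """
    # Step 1: pseudo change file = original plus marker lines around the range.
    pseudo = (lines[:first] + ["$BEGIN"] + lines[first:last + 1]
              + ["$END"] + lines[last + 1:])
    # Step 2: context difference between the original and the pseudo change file.
    diff = list(difflib.context_diff(lines, pseudo, "before", "pseudo",
                                     lineterm=""))
    # Step 3: the log keeps the conversion identifier, its arguments and the diff.
    return {"conversion_id": conversion_id, "args": args, "context_diff": diff}


if __name__ == "__main__":
    source = [
        "t2 = rtc() - s",
        "s = rtc()",
        "do i = 2, n",
        "b = b + ((x(i) + a) / fourpi + 1.0d0)",
        "enddo",
        "t3 = rtc() - s",
    ]
    log = save_operation(source, 2, 4, "loop unrolling", [4])
    print("\n".join(log["context_diff"]))
```

Only the two marker lines appear as additions in the difference, so the log records where the conversion applies rather than what it produced.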

Scenario of Proposed Algorithm to Apply Automatic Operation (Step 1)
Algorithm for applying operation history on modified target code

Step 1: Apply the context difference file to the target program by using the algorithm of the "patch" command. The operation log is the one saved in the saving scenario ("loop unrolling", parameter 4, and the context difference file between opeB.F and opeC2.F).

Target program:

      program sample
      implicit none
      integer i, n
      parameter(n= )
      real*8 a, b, pi, x(n), sin, s, t1, t2, t3, rtc
      a = 0
      b = 0
      pi = d0
      s = rtc()
      do i = 1, n
        x(i) = i * sin(i / (pi * 4.0d0))
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
        a = a + x(i) ** (-1)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n
        b = b + ((x(i) + a) / (pi * 4.0d0) + 1.0d0)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end

Trials, in order:
  – Trial 1: Apply the history at the same position. → No match
  – Trial 2: Ignore the starting and ending line numbers. → Match
  – Trial 3: Ignore the outermost one line before/after the modification.
  – Trial 4: Ignore the outermost two lines before/after the modification.

The result is a pseudo change file for the target program, in which $BEGIN and $END lines enclose the loop "do i = 2, n ... enddo".
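The trial sequence above can be sketched as a search with progressively relaxed context. This is a simplified model rather than the actual "patch" implementation: `locate_range` and its parameters are hypothetical names, lines are compared after stripping whitespace, and the context is assumed to be already split into the lines before the range (`pre`), the range itself (`body`) and the lines after it (`post`).

```python
def locate_range(target, pre, body, post, expected_start, max_fuzz=2):
    """Return the (first, last) 0-based line range of body in target, or None.

    fuzz 0 tries the recorded position first (Trial 1) and then every other
    position (Trial 2). fuzz 1 and 2 additionally ignore the outermost one or
    two context lines before and after the range (Trials 3 and 4).
    """
    lines = [line.strip() for line in target]
    for fuzz in range(max_fuzz + 1):
        kept_pre = pre[fuzz:]
        kept_post = post[:max(len(post) - fuzz, 0)]
        pattern = [line.strip() for line in kept_pre + body + kept_post]
        candidates = list(range(len(lines) - len(pattern) + 1))
        if fuzz == 0 and expected_start in candidates:
            # Trial 1: the recorded position is checked before any other.
            candidates.remove(expected_start)
            candidates.insert(0, expected_start)
        for pos in candidates:
            if lines[pos:pos + len(pattern)] == pattern:
                first = pos + len(kept_pre)
                return first, first + len(body) - 1
    return None


if __name__ == "__main__":
    pre = ["enddo", "t2 = rtc() - s", "s = rtc()"]
    body = ["do i = 2, n", "b = b + 1", "enddo"]
    post = ["t3 = rtc() - s", "write a", "write b"]
    target = ["inserted line"] + pre + body + post
    print(locate_range(target, pre, body, post, expected_start=0))
```

The returned range is where the $BEGIN and $END lines of the pseudo change file would be placed in the target program.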

Scenario of Proposed Algorithm to Apply Automatic Operation (Step 2)
Algorithm for applying operation history on modified target code

Step 2: Redo the automatic conversion with the parameter saved in the operation log.
  – Operation log: identifier of automatic conversion "loop unrolling", parameter 4, and the context difference file saved earlier.
  – Pseudo change file: the file produced in Step 1, with $BEGIN and $END lines around the loop "do i = 2, n ... enddo".
  – Redo "loop unrolling" "4" times on "the loop" enclosed by $BEGIN and $END.
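Redoing the conversion on the located loop can be illustrated with a small textual unroller. This is only a sketch: `unroll` is a hypothetical helper that handles a simple `do var = lo, hi ... enddo` loop, works by textual substitution instead of using a real Fortran parser, and, like the slides, does not generate a remainder loop for trip counts that are not a multiple of the factor.

```python
import re


def unroll(loop_lines, factor):
    """Unroll a simple Fortran do-loop, given as a list of lines, by factor."""
    header = re.fullmatch(r"(\s*)do\s+(\w+)\s*=\s*([^,]+),\s*([^,]+?)\s*",
                          loop_lines[0])
    if header is None or loop_lines[-1].strip() != "enddo":
        raise ValueError("expected a simple 'do var = lo, hi ... enddo' loop")
    indent, var, lo, hi = header.groups()
    body = loop_lines[1:-1]
    out = [f"{indent}do {var} = {lo.strip()}, {hi}, {factor}"]
    for k in range(factor):
        for line in body:
            if k > 0:
                # Substitute (var+k) for var, then drop redundant double
                # parentheses so that x(i) becomes x(i+1), not x((i+1)).
                line = re.sub(rf"\b{re.escape(var)}\b", f"({var}+{k})", line)
                line = line.replace(f"(({var}+{k}))", f"({var}+{k})")
            out.append(line)
    out.append(loop_lines[-1])
    return out


if __name__ == "__main__":
    loop = [
        "do i = 2, n",
        "  b = b + ((x(i) + a) / (pi * 4.0d0) + 1.0d0)",
        "enddo",
    ]
    print("\n".join(unroll(loop, 4)))
```

Because the conversion runs on the loop as it exists in the target program, every unrolled copy uses `(pi * 4.0d0)`, which is the behaviour the proposed algorithm is after.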

Proposed Algorithm to Apply Automatic Operation (Final Results)

      program sample
      implicit none
      integer i, n
      parameter(n= )
      real*8 a, b, pi, x(n), sin, s, t1, t2, t3, rtc
      a = 0
      b = 0
      pi = d0
      s = rtc()
      do i = 1, n
        x(i) = i * sin(i / (pi * 4.0d0))
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
        a = a + 1 / x(i)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n, 4
        b = b + ((x(i) + a) / (pi * 4.0d0) + 1.0d0)
        b = b + ((x(i+1) + a) / (pi * 4.0d0) + 1.0d0)
        b = b + ((x(i+2) + a) / (pi * 4.0d0) + 1.0d0)
        b = b + ((x(i+3) + a) / (pi * 4.0d0) + 1.0d0)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end

Problem solved: the correct line is unrolled!
The proposed system can reapply automatic conversion operations correctly.


Prototype Implementation of the Proposed System
• Implemented as an Eclipse plug-in module.
  – Works with the open-source CDT/Photran modules.
  – Uses the CDT/Photran C/Fortran parsers.

[Figure: Eclipse hosting the open-source Photran module (Fortran) and CDT module (C); the HPC refactoring module is built on them and is driven by pre-defined and user-defined transformation rules]

Proposal of User Interface for Operation History Management System
[Figure: screenshot of the prototype with four views: source code tree view, source code view, operation history view, and information and console output view]
1. The operation history is displayed as a sequence, and the user can select and modify any point in it.
2. The succeeding operations are automatically reapplied as needed to produce a new version according to the user's instructions.
3. Operations are classified into the following three categories according to the status and necessity of reapplication, and are displayed in three colors.
   – Green: applied
   – Yellow: not yet tried
   – Red: application was tried, but failed
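The three display categories suggest a simple status model for the history view. The sketch below is an assumption about how such bookkeeping could look, not a description of the prototype: `Status` and `reapply_from` are hypothetical names, each operation is modelled as a function that returns new code or raises `ValueError` when it cannot be applied, and stopping at the first failure is a design choice made here for illustration.

```python
from enum import Enum


class Status(Enum):
    APPLIED = "green"
    NOT_TRIED = "yellow"
    FAILED = "red"


def reapply_from(operations, start, code):
    """Reapply operations[start:] to code and report a status per operation.

    Operations before start are assumed to be already applied. Reapplication
    stops at the first failure, so later operations stay NOT_TRIED.
    """
    statuses = ([Status.APPLIED] * start
                + [Status.NOT_TRIED] * (len(operations) - start))
    for k in range(start, len(operations)):
        try:
            code = operations[k](code)
        except ValueError:
            statuses[k] = Status.FAILED
            break
        statuses[k] = Status.APPLIED
    return code, statuses


if __name__ == "__main__":
    def op_b(code):
        return code + ["strength reduction"]

    def op_c(code):
        raise ValueError("target range not found")

    def op_d(code):
        return code + ["loop tiling"]

    result, statuses = reapply_from([op_b, op_c, op_d], 0, [])
    print(result, [s.value for s in statuses])
```

In this model a failed operation leaves the code at the last successfully produced version, so the user can fix the failing step and trigger reapplication again from that point.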

Conclusion
1. We proposed an algorithm for managing the operation history of source-to-source optimization.
2. We presented a prototype system with a new user interface for managing operation history explicitly.

Questions?