Array Operation Synthesis to Optimize Data Parallel Programs Department of Computer Science, National Tsing-Hua University Student: Gwan-Hwan Hwang Advisor: Dr. Jenq Kuen Lee

Presentation transcript:

Array Operation Synthesis to Optimize Data Parallel Programs Department of Computer Science, National Tsing-Hua University Student: Gwan-Hwan Hwang Advisor: Dr. Jenq Kuen Lee

Array Operation Synthesis to Optimize Data Parallel Programs Department of Computer Science, National Tsing-Hua University Student: Gwan-Hwan Hwang (黃冠寰) Advisor: Dr. Jenq Kuen Lee (李政崑)

Array Operation Synthesis on Distributed-memory Machines Department of Computer Science, National Tsing-Hua University Gwan-Hwan Hwang (黃冠寰), Ph.D.

Research Interests and Key Issues. Compiler Optimization for Parallel Computations on Distributed & Shared Memory Machines: Communication Code for Block-Cyclic Distribution of HPF (IPPS'98); Array Operation Synthesis for Intrinsic Array Functions (JPDC, ACM PPoPP'95, ICPP'96); Automatic Alignment for Data Parallel Languages (LCPC'97). Concurrent Testing: Reachability Testing of Concurrent Program (IJSEKE'95, APSEC'93). Parallel Object Program Model & Heterogeneous Computing: Java-Based Network Computing Environment; Transparent Parallel Computing Environment (Ongoing).

Outline of Presentation: Fortran 90 Intrinsic Array Operations; Array Operation Synthesis (AOS); SYNTOOL; Apply AOS to Shared-Memory Machines; Apply AOS to Distributed-Memory Machines; Conclusion and Future Work

Outline of Presentation: Fortran 90 Intrinsic Array Operations; Array Operation Synthesis (AOS); SYNTOOL; Apply AOS to Shared-Memory Machines; Apply AOS to Distributed-Memory Machines; Integrate AOS with Automatic Data Alignment; Conclusion and Future Work

Intrinsic Array Operations. Provided by modern programming languages, e.g. Fortran 90, High Performance Fortran (HPF), HPF2, Fortran 97, APL, MATLAB, MATHEMATICA, NESL, C*. Used in engineering and scientific applications. Facilitate compilation analysis for optimization. Support parallel execution and portability.

Intrinsic Array Operations (Cont'd). Array operations provided by Fortran 90 and HPF. Examples: CSHIFT, TRANSPOSE, MERGE, EOSHIFT, RESHAPE, SPREAD, Section Move, Where Constructs, Reductions. B=CSHIFT(A,1,1) C=TRANSPOSE(B)
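
As a rough illustration of the two statements on this slide, here is a minimal Python sketch over nested lists. The helper names `cshift` and `transpose` are illustrative, and the argument order (array, dimension, shift) is an assumption inferred from how the deck writes CSHIFT(F1,2,-1); it is not the deck's implementation.

```python
def cshift(a, dim, shift):
    """Circularly shift a 2-D list along dimension dim (1 or 2) by shift.

    Assumed argument order (array, dimension, shift), following the slides.
    """
    rows, cols = len(a), len(a[0])
    if dim == 1:
        # Result row i takes source row i+shift, wrapping around.
        return [[a[(i + shift) % rows][j] for j in range(cols)] for i in range(rows)]
    # Result column j takes source column j+shift, wrapping around.
    return [[a[i][(j + shift) % cols] for j in range(cols)] for i in range(rows)]


def transpose(a):
    """Return the transpose: result[i][j] == a[j][i]."""
    return [list(column) for column in zip(*a)]


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = cshift(A, 1, 1)   # B(I,J) = A(I+1,J), with the last row wrapping to the first
C = transpose(B)      # C(I,J) = B(J,I)
```

Each call materializes a full temporary array, which is the cost that synthesis later tries to avoid.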

Consecutive Array Expressions. Array Expression: C=EOSHIFT(MERGE(RESHAPE(S,/N,N/),A+B,T),1,0,1). Consecutive Array Operations:
FXP=CSHIFT(F1,1,+1)
FXM=CSHIFT(F1,1,-1)
FYP=CSHIFT(F1,2,+1)
FYM=CSHIFT(F1,2,-1)
FDERIV=ZXP*(FXP-F1)+ZXM*(FXM-F1)+ZYP*(FYP-F1)+ZYM*(FYM-F1)

Classification of Array Operations. Model Array Operations by Data Access Functions (DAF): Type 1, Type 2, Type 3, Type 4

Data Access Functions. Represent Array Operations by Mathematical Functions. Model Array Operations by Data Access Functions (DAF): Single-Source or Multiple-Source; Single-Clause or Multiple-Clause

Type 1: Single-source Single-clause Data Access Function. One Source Array, One Data Access Pattern. B=TRANSPOSE(A): the Data Access Function is B(I,J)=A(J,I)

Type 2: Multiple-source Single-clause Data Access Function. Multiple Source Arrays, One Data Access Pattern. R=MERGE(T,F,M): the Data Access Function is R(I,J)=T(I,J) where M(I,J) is true, and R(I,J)=F(I,J) otherwise. [Figure: Array T, Array F, Array M, Array R]
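
A small Python sketch of the multiple-source, single-clause pattern: every result element is computed by one rule that reads from several source arrays. The function name `merge` and the sample data are illustrative only.

```python
def merge(t, f, m):
    """Elementwise select: take t where the mask m is true, f elsewhere."""
    rows, cols = len(t), len(t[0])
    return [[t[i][j] if m[i][j] else f[i][j] for j in range(cols)]
            for i in range(rows)]


T = [[1, 2], [3, 4]]
F = [[10, 20], [30, 40]]
M = [[True, False], [False, True]]
R = merge(T, F, M)   # three source arrays (T, F, M), one access pattern
```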

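Type 3: Single-source Multiple-clause Data Access Function. Single Source Array, Multiple Data Access Patterns. B=CSHIFT(A,1,1): the Data Access Function is B(I,J)=A(I+1,J) for (I,J) in /1:N-1,1:N/ and B(I,J)=A(I-N+1,J) for (I,J) in /N:N,1:N/, where each index-range condition is a segmentation descriptor. [Figure: Array A, Array B]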
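
One way to picture a multiple-clause data access function is as a list of (index region, source-index map) pairs. The sketch below assumes that representation, with hypothetical names `in_ranges`, `cshift_daf`, and `apply_daf`; it uses 1-based indices to match the slide.

```python
def in_ranges(idx, ranges):
    """True when every 1-based index lies inside its inclusive (lo, hi) range."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(idx, ranges))


def cshift_daf(n):
    """Clauses for B=CSHIFT(A,1,1) on an n-by-n array: (region, source map)."""
    return [
        (((1, n - 1), (1, n)), lambda i, j: (i + 1, j)),      # interior rows
        (((n, n), (1, n)),     lambda i, j: (i - n + 1, j)),  # wrapped last row
    ]


def apply_daf(clauses, a):
    """Fill B by picking, for each (I,J), the clause whose region contains it."""
    n = len(a)
    b = [[None] * n for _ in range(n)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            for region, src in clauses:
                if in_ranges((i, j), region):
                    si, sj = src(i, j)
                    b[i - 1][j - 1] = a[si - 1][sj - 1]
                    break
    return b


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = apply_daf(cshift_daf(3), A)
```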

Type 4: Multiple-source Multiple-clause Data Access Function. Multiple Source Arrays, Multiple Data Access Patterns. No array operation of Fortran 90 belongs to type 4. Synthesis of multiple array operations may derive a type 4 data access function.

Straightforward Compilation. Translate each operation into a parallel loop. B=CSHIFT(TRANSPOSE(EOSHIFT(A,1,0,1)),1,1)
EOSHIFT:
FORALL (I=1:N:1; J=1:N:1)
  IF (1<=I<=N-1) and (1<=J<=N) THEN T1(I,J)=A(I+1,J) ELSE T1(I,J)=0
ENDFORALL
TRANSPOSE:
FORALL (I=1:N:1; J=1:N:1)
  T2(I,J)=T1(J,I)
ENDFORALL
CSHIFT:
FORALL (I=1:N:1; J=1:N:1)
  IF (1<=I<=N-1) and (1<=J<=N) THEN B(I,J)=T2(I+1,J) ELSE B(I,J)=T2(I-N+1,J)
ENDFORALL
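
The three loops can be mimicked in Python to make the cost visible: two full temporaries (T1 and T2) are built before B. This is only a sequential sketch with an illustrative function name `straightforward`; it assumes the circular shift wraps row N to row 1 and does not model parallel FORALL semantics.

```python
def straightforward(a):
    """Evaluate B=CSHIFT(TRANSPOSE(EOSHIFT(A,1,0,1)),1,1) one operation at a time."""
    n = len(a)
    idx = range(1, n + 1)  # 1-based indices, as on the slide

    def at(x, i, j):
        return x[i - 1][j - 1]

    # EOSHIFT: shift along dimension 1, filling the vacated last row with 0.
    t1 = [[at(a, i + 1, j) if i <= n - 1 else 0 for j in idx] for i in idx]
    # TRANSPOSE: T2(I,J) = T1(J,I).
    t2 = [[at(t1, j, i) for j in idx] for i in idx]
    # CSHIFT: shift along dimension 1, wrapping row N around to row 1.
    b = [[at(t2, i + 1, j) if i <= n - 1 else at(t2, i - n + 1, j) for j in idx]
         for i in idx]
    return b


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = straightforward(A)
```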

Array Operation Synthesis. Construct the Parse Tree of the Array Expression. Represent Array Operations by Mathematical Functions (DAF). B=CSHIFT(TRANSPOSE(EOSHIFT(A,1,0,1)),1,1) [Figure: parse tree with nodes CSHIFT, TRANSPOSE, EOSHIFT]

Array Operation Synthesis (Cont'd). Synthesis of two functions: CSHIFT and TRANSPOSE are synthesized into a single CSHIFT+TRANSPOSE node above EOSHIFT.

Synthesis of two Data Access Functions. Substitution (a term-rewriting-like method). Having two Data Access Patterns: The Synthesized Data Access Pattern is: where
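
The slide's formulas did not survive in the transcript, so the following is only a guess at the idea: substituting one data access function into another amounts to composing their index maps and conjoining their clause conditions. The representation (a list of (condition, source-index map) pairs) and the names `transpose_daf`, `cshift_daf`, `synthesize`, and `evaluate` are assumptions for illustration, not the author's notation.

```python
def transpose_daf():
    """One clause: always applicable, B(I,J) reads A(J,I)."""
    return [(lambda i, j: True, lambda i, j: (j, i))]


def cshift_daf(n):
    """Two clauses for CSHIFT along dimension 1 by 1 on an n-by-n array."""
    return [
        (lambda i, j: 1 <= i <= n - 1, lambda i, j: (i + 1, j)),
        (lambda i, j: i == n,          lambda i, j: (i - n + 1, j)),
    ]


def synthesize(outer, inner):
    """Substitute inner into outer: compose index maps, conjoin conditions."""
    fused = []
    for o_cond, o_map in outer:
        for i_cond, i_map in inner:
            def cond(i, j, oc=o_cond, om=o_map, ic=i_cond):
                return oc(i, j) and ic(*om(i, j))

            def src(i, j, om=o_map, im=i_map):
                return im(*om(i, j))

            fused.append((cond, src))
    return fused


def evaluate(daf, a):
    """Build the result directly from A using the first clause that applies."""
    n = len(a)
    b = [[None] * n for _ in range(n)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            for cond, src in daf:
                if cond(i, j):
                    si, sj = src(i, j)
                    b[i - 1][j - 1] = a[si - 1][sj - 1]
                    break
    return b


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
fused = synthesize(cshift_daf(3), transpose_daf())  # CSHIFT(TRANSPOSE(A),1,1)
B = evaluate(fused, A)
```

The fused function reads A directly, so no intermediate transposed array is ever stored.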

Synthesis of two DAFs (Cont'd). For example, by the substitution rule:

Code Generation for Synthesized Data Access Function
FORALL (I=1:N:1; J=1:N:1)
  IF (/I,J/ in /1:N-1,1:N/) and (/J,I+1/ in /1:N-1,1:N/) THEN B(I,J)=A(J+1,I+1)
  IF (/I,J/ in /1:N-1,1:N/) and (/J,I+1/ in /N:N,1:N/) THEN B(I,J)=0
  IF (/I,J/ in /N:N,1:N/) and (/J,I-N+1/ in /1:N-1,1:N/) THEN B(I,J)=A(J+1,I-N+1)
  IF (/I,J/ in /N:N,1:N/) and (/J,I-N+1/ in /N:N,1:N/) THEN B(I,J)=0
ENDFORALL
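
For comparison with the three-loop version, here is a Python sketch of a single fused loop for B=CSHIFT(TRANSPOSE(EOSHIFT(A,1,0,1)),1,1). The function name `synthesized` is illustrative, and the four cases are folded into two tests on J and I; this assumes the circular shift wraps row N to row 1.

```python
def synthesized(a):
    """Compute B in one pass over (I,J), reading A directly with no temporaries."""
    n = len(a)
    b = [[0] * n for _ in range(n)]
    for i in range(1, n + 1):          # 1-based I, as on the slide
        for j in range(1, n + 1):      # 1-based J
            if j <= n - 1:
                # Column index of A: I+1 for interior rows, I-N+1 for the wrapped row.
                col = i + 1 if i <= n - 1 else i - n + 1
                b[i - 1][j - 1] = a[j][col - 1]   # A(J+1, col) in 1-based terms
            else:
                b[i - 1][j - 1] = 0    # the row of zeros introduced by EOSHIFT
    return b


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = synthesized(A)
```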

Code Generation for Synthesized Data Access Function After Optimization [Figure: index-space diagram with axis ticks 1, N-1, N and 1, N]

Optimization. Simplifying the ranges at compilation time instead of runtime. Optimization process: Normalize: Intersection for each dimension:
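
The slide's formulas are not preserved, so the sketch below only illustrates one plausible reading of the two steps: "normalize" rewrites a range condition on an offset index such as I+1 into a range on I itself, and "intersection" combines the resulting ranges dimension by dimension. The function names `normalize` and `intersect` and the tuple representation are assumptions.

```python
def normalize(offset, lo, hi):
    """Rewrite the condition lo <= x + offset <= hi as a range on x."""
    return (lo - offset, hi - offset)


def intersect(r1, r2):
    """Intersect two lists of inclusive (lo, hi) ranges, one per dimension.

    Returns None when any dimension becomes empty, so the clause can be dropped.
    """
    result = []
    for (l1, u1), (l2, u2) in zip(r1, r2):
        lo, hi = max(l1, l2), min(u1, u2)
        if lo > hi:
            return None
        result.append((lo, hi))
    return result


N = 4
# Condition 1: (I,J) in /1:N-1,1:N/.
cond1 = [(1, N - 1), (1, N)]
# Condition 2: (J,I+1) in /1:N-1,1:N/, rewritten as ranges on (I,J).
cond2 = [normalize(1, 1, N), (1, N - 1)]
simplified = intersect(cond1, cond2)
```

With the ranges reduced to fixed loop bounds ahead of time, the generated loop no longer needs the IF tests at run time.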