High Performance Fortran (HPF)
Source: Chapter 7 of "Designing and Building Parallel Programs" (Ian Foster, 1995)

Question
Can't we just have a clever compiler generate a parallel program from a sequential program?
- Fine-grained parallelism:
    x = a*b + c*d
- Trivial parallelism:
    for i := 1 to 100 do
      for j := 1 to 100 do
        C[i,j] := dotproduct(A[i,*], B[*,j]);
      od
    od
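
As an illustration, here is a minimal, self-contained Fortran 90 rendering of the trivially parallel loop nest above (a sketch, not from Foster's book): every C(i,j) is computed independently of all the others, which is exactly what makes the loop easy to parallelize. DOT_PRODUCT is the standard Fortran 90 intrinsic.

  ! Minimal Fortran 90 sketch of the trivially parallel loop nest.
  ! Each iteration writes a distinct C(i,j), so all 100x100
  ! iterations are independent and could run concurrently.
  program trivial_parallelism
    real :: A(100,100), B(100,100), C(100,100)
    integer :: i, j
    A = 1.0
    B = 2.0
    do i = 1, 100
      do j = 1, 100
        C(i,j) = DOT_PRODUCT(A(i,:), B(:,j))   ! row i of A times column j of B
      enddo
    enddo
    print *, C(1,1)   ! expect 200.0
  end program trivial_parallelism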

Automatic parallelism
Automatic parallelization of any program is extremely hard.
Solutions:
- Make restrictions on the source program
- Restrict the kind of parallelism used
- Use a semi-automatic approach
- Use application-domain oriented languages

High Performance Fortran (HPF)
- Designed by a forum from industry, government, and universities
- Extends Fortran 90
- To be used for computationally expensive numerical applications
- Portable to SIMD machines, vector processors, shared-memory MIMD, and distributed-memory MIMD

Fortran 90 - Base language of HPF
Extends Fortran 77 with 'modern' features:
- abstract data types, modules
- recursion
- pointers, dynamic storage
Array operators:
  A = B + C
  A = A
  A(1:7) = B(1:7) + B(2:8)
  WHERE (X /= 0) X = 1.0/X
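
A small self-contained Fortran 90 program (my sketch, not from the slides) exercising the array features listed above:

  program f90_array_demo
    real :: A(8), B(8), X(8)
    integer :: i
    B = (/ (real(i), i = 1, 8) /)    ! array constructor with implied do
    A = B + B                        ! whole-array elementwise operation
    A(1:7) = B(1:7) + B(2:8)         ! array sections: sums of adjacent elements
    X = B - 4.0
    WHERE (X /= 0) X = 1.0/X         ! masked elementwise update, avoids 1/0
    print *, A
    print *, X
  end program f90_array_demo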

Data parallelism
- Data parallelism: the same operation applied to different data elements in parallel
- Data parallel program: a sequence of data parallel operations
Overall approach:
- The programmer does the domain decomposition
- The compiler partitions the operations automatically
- Data may be regular (arrays) or irregular (trees, sparse matrices)
- Most data parallel languages only deal with arrays

Data parallelism - Concurrency
Explicit parallel operations:
  A = B + C        ! A, B, and C are arrays
Implicit parallelism:
  do i = 1,m
    do j = 1,n
      A(i,j) = B(i,j) + C(i,j)
    enddo
  enddo

Compiling data parallel programs
- Programs are translated automatically into parallel SPMD (Single Program Multiple Data) programs
- Each processor executes the same program on a subset of the data
Owner computes rule:
- Each processor owns a subset of the data structures
- Operations required for an element are executed by its owner
- Each processor may read (but not modify) other elements
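
As a hedged sketch of what this translation looks like: for a statement such as X = X * 3.0 over a block-distributed X, the compiler might emit SPMD code in which each processor loops only over the block it owns. The bounds my_lo and my_hi below are illustrative names, not part of any real HPF runtime interface.

  ! Illustrative owner-computes SPMD code for  X = X * 3.0 .
  ! my_lo/my_hi (the bounds of the locally owned block) are
  ! hypothetical; a real compiler derives them from the distribution.
  subroutine scale_owned_block(X_local, my_lo, my_hi)
    integer, intent(in) :: my_lo, my_hi
    real, intent(inout) :: X_local(my_lo:my_hi)
    integer :: i
    do i = my_lo, my_hi              ! only the owned elements are updated
      X_local(i) = X_local(i) * 3.0
    enddo
  end subroutine scale_owned_block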

Example
  real s, X(100), Y(100)    ! s is scalar, X and Y are arrays
  X = X * 3.0               ! multiply each X(i) by 3.0
  do i = 2,99
    Y(i) = (X(i-1) + X(i+1))/2   ! communication required
  enddo
  s = SUM(X)                ! communication required
Arrays X and Y are distributed (partitioned); scalar s is replicated on each machine.
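
To see why s = SUM(X) requires communication: each processor can only sum the block of X it owns, so the generated code must combine per-processor partial sums. A hedged sketch, where global_sum stands for a hypothetical runtime reduction routine (e.g. built on message passing):

  ! Sketch of SUM(X) in SPMD form; global_sum is a hypothetical
  ! runtime reduction routine, not a real HPF library call.
  real function distributed_sum(X_local, my_lo, my_hi)
    integer, intent(in) :: my_lo, my_hi
    real, intent(in) :: X_local(my_lo:my_hi)
    real :: local_sum
    integer :: i
    local_sum = 0.0
    do i = my_lo, my_hi                       ! reduce the owned block
      local_sum = local_sum + X_local(i)
    enddo
    distributed_sum = global_sum(local_sum)   ! combine partials across processors
  end function distributed_sum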

HPF primitives for data distribution
Directives:
- PROCESSORS: shape & size of abstract processors
- ALIGN: align elements of different arrays
- DISTRIBUTE: distribute (partition) an array
Directives affect the performance of the program, not its result.

Processors directive
  !HPF$ PROCESSORS P(32)
  !HPF$ PROCESSORS Q(4,8)
The mapping of abstract to physical processors is not specified in HPF (it is implementation-dependent).
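
HPF also provides system inquiry intrinsics such as NUMBER_OF_PROCESSORS(), so the arrangement can be sized to whatever machine executes the program; a small example:

  !HPF$ PROCESSORS R(NUMBER_OF_PROCESSORS())   ! one abstract processor per physical one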

Alignment directive
Aligns an array with another array: specifies that specific elements should be mapped to the same processor.
  real A(50), B(50), C(50,50)
  !HPF$ ALIGN A(I) WITH B(I)     ! A(i) placed with B(i)
  !HPF$ ALIGN A(I) WITH B(I+2)   ! A(i) placed with B(i+2)
  !HPF$ ALIGN A(:) WITH C(*,:)   ! * replicates A along that dimension of C

Figure 7.6 from Foster's book

Distribution directive
Specifies how elements should be partitioned among the local memories. Each dimension can be distributed as follows:
  *           no distribution
  BLOCK(n)    block distribution
  CYCLIC(n)   cyclic distribution
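
To make the two distributions concrete, here is a small example (mine, not Foster's) of where the elements of two 8-element arrays land on 4 abstract processors:

  !HPF$ PROCESSORS P(4)
  integer A(8), B(8)
  !HPF$ DISTRIBUTE A(BLOCK) ONTO P    ! P(1): A(1:2), P(2): A(3:4),
                                      ! P(3): A(5:6), P(4): A(7:8)
  !HPF$ DISTRIBUTE B(CYCLIC) ONTO P   ! P(1): B(1),B(5); P(2): B(2),B(6);
                                      ! P(3): B(3),B(7); P(4): B(4),B(8)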

Figure 7.7 from Foster's book

Example program
See Ian Foster, Program 7.2:
  program hpf_finite_difference
  !HPF$ PROCESSORS pr(4)               ! use 4 CPUs
  real X(100, 100), New(100, 100)      ! data arrays
  !HPF$ ALIGN New(:,:) WITH X(:,:)
  !HPF$ DISTRIBUTE X(BLOCK,*) ONTO pr  ! row-wise
  New(2:99, 2:99) = (X(1:98, 2:99) + X(3:100, 2:99) + &
                     X(2:99, 1:98) + X(2:99, 3:100))/4
  diffmax = MAXVAL(ABS(New - X))
  end

Example program (2)
Use block distribution instead of row distribution:
  program hpf_finite_difference
  !HPF$ PROCESSORS pr(2,2)                  ! use a 2x2 grid
  real X(100, 100), New(100, 100)           ! data arrays
  !HPF$ ALIGN New(:,:) WITH X(:,:)
  !HPF$ DISTRIBUTE X(BLOCK, BLOCK) ONTO pr  ! block-wise
  New(2:99, 2:99) = (X(1:98, 2:99) + X(3:100, 2:99) + &
                     X(2:99, 1:98) + X(2:99, 3:100))/4
  diffmax = MAXVAL(ABS(New - X))
  end

Performance
The distribution affects:
- Load balance
- Amount of communication
Example (communication costs):
  !HPF$ PROCESSORS pr(3)
  integer A(8), B(8), C(8)
  !HPF$ ALIGN B(:) WITH A(:)
  !HPF$ DISTRIBUTE A(BLOCK) ONTO pr
  !HPF$ DISTRIBUTE C(CYCLIC) ONTO pr
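
A hedged reading of this example: because B is aligned with A, an elementwise statement combining A and B needs no communication, whereas combining the block-distributed A with the cyclically distributed C forces most elements of C to be fetched from other processors. Illustrative statements (mine, not Foster's):

  A = A + B    ! B(i) lives on the same processor as A(i): no communication
  A = A + C    ! C(i) usually lives on a different processor than A(i):
               ! most elements of C must be communicated first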

Figure 7.9 from Foster's book

Conclusions
- High-level model: the user specifies the data distribution, the compiler generates the parallel program + communication
- More restrictive than the general message passing model (only data parallelism)
- Restricted to array-based data structures
- HPF programs will be easy to modify, which enhances portability: changing the data distribution only requires changing directives