Refining High Performance FORTRAN Code from Programming Model Dependencies Ferosh Jacob University of Alabama Department of Computer Science

Challenges in Parallel Programming
The execution plot of the satisfiability problem in Figure 1 shows that, although OpenMP and MPI perform comparably overall, the OpenMP version is faster than the MPI version for small problems. When the size of the data varies, a program written against a single HPC library may therefore need to exist in several versions.

Architecture Dependencies in Parallel Programming
Maintaining such variations by hand is redundant effort, and it is error-prone whenever the core algorithms are maintained or updated. As a result, the development of an HPC program is often limited to a specific parallel library. Otherwise, the programmer pays the price of developing and maintaining several versions of the same program.

Program Analysis of FORTRAN OpenMP Programs
Analysis: Most of the programs involve an initialization segment that initializes the execution of the parallel part, and a code segment that is used to collect data from the parallel instances.
Conclusion: A DSL that uses only the parallel features can express parallel problems in a platform-independent manner.
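The structure described on this slide (an initialization segment, a parallel part, and a segment that collects data from the parallel instances) can be illustrated with a minimal sketch. This is not code from the talk or from the ten analyzed programs. The module name `structure_demo`, the function `sum_of_squares`, and the chunking scheme are all assumptions made for illustration.

```fortran
! Illustrative sketch only: names and chunking scheme are hypothetical,
! not taken from the presentation.
module structure_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  function sum_of_squares(n, nchunks) result(total)
    integer, intent(in) :: n, nchunks
    real(dp) :: total
    real(dp), allocatable :: x(:), partial(:)
    integer :: i, c, lo, hi

    ! Initialization segment: prepare the data the parallel part works on.
    allocate(x(n), partial(nchunks))
    do i = 1, n
      x(i) = real(i, dp)
    end do
    partial = 0.0_dp

    ! Parallel part: each chunk c writes only to its own slot partial(c).
    !$omp parallel do private(i, lo, hi)
    do c = 1, nchunks
      lo = (c - 1) * n / nchunks + 1
      hi = c * n / nchunks
      do i = lo, hi
        partial(c) = partial(c) + x(i) * x(i)
      end do
    end do
    !$omp end parallel do

    ! Collection segment: gather the results of the parallel instances.
    total = sum(partial)
  end function sum_of_squares
end module structure_demo
```

Calling `sum_of_squares(10, 3)` from a program that uses this module sums the squares of 1 through 10. Compiled without OpenMP support, the directives are treated as comments and the same code runs serially, which is part of why the parallel-specific lines are easy to isolate.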

Proposed Approach to Express Parallel Programs in FORTRAN

MPI Case Study

Conclusion and Future Work
For high performance FORTRAN code to evolve, the core computation must be separated from the machine or architecture dependencies that come from using a specific API. We analyzed ten FORTRAN programs from diverse domains to understand how OpenMP is used in scientific code. The analysis revealed that programs often share a common structure, such that platform and machine details could be specified in a different file. A case study is included to show that the approach can be extended to other architectures.
Future work includes refactoring legacy code to the approach specified in this paper with minimum input from the user. Another direction is executing the parallel programs on a GPU. A third is a user study to explore the advantages and disadvantages of the approach from a human factors perspective.
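As a rough illustration of keeping the core computation apart from platform details, the sketch below uses two modules to stand in for two files. It is not the DSL or file format proposed in the talk. The names `core_kernel`, `kernel`, `platform_openmp`, and `run` are hypothetical, and the midpoint-rule integration of 4/(1+x^2) is just a convenient stand-in computation.

```fortran
! Illustrative sketch only: module and function names are hypothetical.

! Platform-independent core computation (would live in its own file).
module core_kernel
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  pure function kernel(i, n) result(v)
    ! i-th midpoint-rule term for the integral of 4/(1+x^2) on [0,1].
    integer, intent(in) :: i, n
    real(dp) :: v, x
    x = (real(i, dp) - 0.5_dp) / real(n, dp)
    v = 4.0_dp / (1.0_dp + x * x) / real(n, dp)
  end function kernel
end module core_kernel

! Platform-specific driver: the only place that mentions OpenMP.
module platform_openmp
  use core_kernel
  implicit none
contains
  function run(n) result(total)
    integer, intent(in) :: n
    real(dp) :: total
    integer :: i
    total = 0.0_dp
    !$omp parallel do reduction(+:total)
    do i = 1, n
      total = total + kernel(i, n)
    end do
    !$omp end parallel do
  end function run
end module platform_openmp
```

Under this arrangement, targeting another programming model would mean writing a different driver module around the same `kernel`, leaving `core_kernel` untouched.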

Thank You. Questions?
Ferosh Jacob, University of Alabama, Department of Computer Science