Programming Project: Hybrid Programming Rebecca Hartman-Baker Oak Ridge National Laboratory

2 Hybrid Programming Project
Laplace Equation
The Code
Its Performance
Your Project

3 Laplace Equation
Second-order elliptic partial differential equation: ∇²u = 0
Simplest of the elliptic PDEs, applied in many application areas (e.g., electromagnetism, astronomy, fluid dynamics)

4 Laplace Equation
We will solve the 2-D Laplace equation, ∂²u/∂x² + ∂²u/∂y² = 0, with prescribed boundary conditions and a known analytical solution
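The slides do not show the discretization, but a common way to solve this problem numerically (and a plausible reading of the code discussed next) is a 5-point finite-difference stencil relaxed with Jacobi iteration. A minimal serial sketch follows; the grid size, tolerance, and iteration cap are illustrative values, not taken from the original code.

/* Minimal serial Jacobi sweep for the 2-D Laplace equation.
   NX, NY, the tolerance, and the iteration cap are illustrative;
   boundary values are assumed to be set before iterating. */
#include <math.h>
#include <stdio.h>

#define NX 100
#define NY 100

int main(void)
{
    static double u[NX][NY], unew[NX][NY];   /* zero-initialized */
    /* ... set boundary conditions on u here ... */

    double diff = 1.0;
    int iter = 0;
    while (diff > 1.0e-6 && iter < 10000) {
        diff = 0.0;
        for (int i = 1; i < NX - 1; i++) {
            for (int j = 1; j < NY - 1; j++) {
                /* 5-point stencil: average of the four neighbors */
                unew[i][j] = 0.25 * (u[i-1][j] + u[i+1][j]
                                   + u[i][j-1] + u[i][j+1]);
                double d = fabs(unew[i][j] - u[i][j]);
                if (d > diff) diff = d;
            }
        }
        for (int i = 1; i < NX - 1; i++)
            for (int j = 1; j < NY - 1; j++)
                u[i][j] = unew[i][j];
        iter++;
    }
    printf("Converged after %d iterations (max change %g)\n", iter, diff);
    return 0;
}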

5 The Code
I downloaded an "interesting" MPI implementation of the Laplace equation from the internet
– You can find anything on the internet!
– Actual URL withheld to protect the innocent
This program is a great example of both things to do and things not to do
– Fixing codes like this and scaling them to the full machine (224,256 cores on Jaguar) is part of my job description
– If you get a kick out of doing this project, let me know when you are graduating!

6 The Code
Pure MPI program
One process is the manager; the remaining processes are workers
The manager just manages (does not perform computations)
– Bad idea? Yes and no (but "yes" as this code is implemented)
Workers receive info from the manager and perform the calculations
After they finish, workers send results to the manager, who prints them to the output file
Comments in the code indicate that the student also wrote a hybrid implementation
If anyone is curious, I will provide the hybrid implementation, but it is probably best not to be influenced by it
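The downloaded program itself is not reproduced in these slides; the sketch below only illustrates the manager/worker message flow described above, with made-up tags and a placeholder work decomposition rather than the actual code.

/* Sketch of the manager/worker message flow described on this slide.
   Tags, the work decomposition, and the "result" are placeholders. */
#include <mpi.h>
#include <stdio.h>

#define TAG_WORK   1
#define TAG_RESULT 2

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* Manager: hand out work, then collect and print results. */
        for (int w = 1; w < size; w++) {
            int block = w - 1;                      /* placeholder decomposition */
            MPI_Send(&block, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
        }
        for (int w = 1; w < size; w++) {
            double result;
            MPI_Recv(&result, 1, MPI_DOUBLE, w, TAG_RESULT,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("worker %d returned %f\n", w, result);  /* goes to output file */
        }
    } else {
        /* Worker: receive a work description, compute, return a result. */
        int block;
        MPI_Recv(&block, 1, MPI_INT, 0, TAG_WORK,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        double result = (double)block;              /* stand-in for the real sweep */
        MPI_Send(&result, 1, MPI_DOUBLE, 0, TAG_RESULT, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}

In the downloaded code the manager only does this bookkeeping while the workers compute; the assignment asks you to change that.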

7 Performance
Ran on ORNL's Jaguar on 4 processors; runtime roughly 6 seconds
Recompiled with VampirTrace instrumentation and viewed the trace with Vampir (my new favorite tool)
Very interesting trace results

8 Performance: Overall Trace

9 Performance: Initialization

10 Performance: Load Imbalance

11 Performance: Finalization

12 Performance: Communication Matrix

13 Your Assignment
Fix this code!
– First, test for correctness (the code runs, but does it work?) and fix it if necessary. Trust me, this is an important first step any time you work on someone else's code!
– Add OpenMP directives
– Add threading support in MPI (a minimal sketch of both ingredients follows this list)
– Rewrite the manager-worker paradigm so that the manager does some work too
Things to consider
– Message passing: who sends to whom? What performance gains (if any) can be attained by threads sending messages? (Hint: try it and see!)
– Is there a case for having a manager process (either MPI or OpenMP) that only communicates?
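As a starting point for the hybrid part of the assignment, here is a sketch of the two ingredients named above: requesting a thread support level from MPI and putting an OpenMP directive on a Jacobi-style sweep. The array names, sizes, and decomposition are illustrative assumptions, not taken from the original code.

/* Hedged sketch of the hybrid ingredients: MPI thread support plus an
   OpenMP worksharing directive. u, unew, nx, ny are placeholders. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

void jacobi_sweep(int nx, int ny, double u[nx][ny], double unew[nx][ny])
{
    /* Threads split the rows of this process's block among themselves. */
    #pragma omp parallel for
    for (int i = 1; i < nx - 1; i++)
        for (int j = 1; j < ny - 1; j++)
            unew[i][j] = 0.25 * (u[i-1][j] + u[i+1][j]
                               + u[i][j-1] + u[i][j+1]);
}

int main(int argc, char **argv)
{
    int provided;
    /* MPI_THREAD_FUNNELED: only the master thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    if (provided < MPI_THREAD_FUNNELED)
        printf("Warning: requested threading level not available\n");

    /* ... decompose the grid, exchange halo rows, call jacobi_sweep(), ... */

    MPI_Finalize();
    return 0;
}

If you experiment with threads sending messages, as the last slide suggests, request MPI_THREAD_MULTIPLE instead of MPI_THREAD_FUNNELED and check the level actually provided.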