CS 179: GPU Programming Lab 7 Recitation: The MPI/CUDA Wave Equation Solver

MPI/CUDA – Wave Equation  Big idea: Divide our data array between n processes!
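As a rough sketch (illustrative names, not the lab's actual code), a block decomposition of a global array of N points across the MPI processes might look like this:

    #include <mpi.h>

    /* Sketch: split a global array of N points into contiguous blocks,
       one block per MPI process. All names here are illustrative. */
    void block_decompose(int N, int rank, int size, int *local_n, int *offset)
    {
        int base = N / size;
        int rem  = N % size;                     /* first `rem` ranks get one extra point */
        *local_n = base + (rank < rem ? 1 : 0);  /* points owned by this rank */
        *offset  = rank * base + (rank < rem ? rank : rem);  /* global index of first owned point */
    }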

MPI/CUDA – Wave Equation  Problem if we’re at the boundary of a process! (Figure: the update stencil spans neighboring x positions across times t-1, t, t+1.)
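For reference, assuming the standard explicit finite-difference scheme for the 1-D wave equation, each new value is built from the current point, its two spatial neighbors, and the old value:

    y_x^{t+1} = 2\,y_x^{t} - y_x^{t-1} + \left(\frac{c\,\Delta t}{\Delta x}\right)^{2}\left(y_{x+1}^{t} - 2\,y_x^{t} + y_{x-1}^{t}\right)

The y_{x+1}^{t} and y_{x-1}^{t} terms are exactly why a point at the edge of a process’s block needs a value owned by the neighboring process.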

Wave Equation – Simple Solution  After every time-step, each process gives its leftmost and rightmost piece of “current” data to neighbor processes! (Figure: Proc0 … Proc4.)

Wave Equation – Simple Solution  Pieces of data to communicate: (Figure: Proc0 … Proc4.)

Wave Equation – Simple Solution
 Can do this with MPI_Irecv, MPI_Isend, MPI_Wait:
 Suppose process has rank r:
   If we’re not the rightmost process:
     Send data to process r+1
     Receive data from process r+1
   If we’re not the leftmost process:
     Send data to process r-1
     Receive data from process r-1
   Wait on requests
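A minimal sketch of that exchange (illustrative names; cur holds local_n owned points plus one ghost cell at each end). In the actual lab the data lives on the GPU, so the boundary values would typically be cudaMemcpy’d to host buffers first, unless CUDA-aware MPI is used:

    #include <mpi.h>

    /* Sketch only: exchange one boundary value with each neighbor.
       cur[0] and cur[local_n + 1] are ghost cells; cur[1..local_n] is owned. */
    void exchange_boundaries(float *cur, int local_n, int rank, int size)
    {
        MPI_Request reqs[4];
        int nreq = 0;

        if (rank < size - 1) {                       /* not the rightmost process */
            MPI_Isend(&cur[local_n],     1, MPI_FLOAT, rank + 1, 0,
                      MPI_COMM_WORLD, &reqs[nreq++]);
            MPI_Irecv(&cur[local_n + 1], 1, MPI_FLOAT, rank + 1, 1,
                      MPI_COMM_WORLD, &reqs[nreq++]);
        }
        if (rank > 0) {                              /* not the leftmost process */
            MPI_Isend(&cur[1], 1, MPI_FLOAT, rank - 1, 1,
                      MPI_COMM_WORLD, &reqs[nreq++]);
            MPI_Irecv(&cur[0], 1, MPI_FLOAT, rank - 1, 0,
                      MPI_COMM_WORLD, &reqs[nreq++]);
        }
        MPI_Waitall(nreq, reqs, MPI_STATUSES_IGNORE);   /* "wait on requests" */
    }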

Wave Equation – Simple Solution  Boundary conditions:  Use MPI_Comm_rank and MPI_Comm_size  Rank 0 process will set leftmost condition  Rank (size-1) process will set rightmost condition
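In the same sketch layout, only the edge ranks touch the physical boundaries. Here left_boundary_value(), right_boundary_value(), and t are hypothetical stand-ins for whatever the lab prescribes:

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)
        cur[1] = left_boundary_value(t);           /* leftmost physical point */
    if (rank == size - 1)
        cur[local_n] = right_boundary_value(t);    /* rightmost physical point */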

Simple Solution – Problems  Communication can be expensive!  Expensive to communicate every timestep to send 1 value!  Better solution: Send some m values every m timesteps!

Possible Implementation  Initial setup: (Assume 3 processes)

Possible Implementation  Give each array “redundant regions”  (Assume communication interval = 3)
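One way to lay this out, as a sketch with made-up names (local_n comes from the decomposition sketch above): pad each process’s chunk with m redundant cells on each side, where m is the communication interval.

    #include <stdlib.h>

    /* Sketch: owned cells live in [m, m + local_n); the 2*m outer cells are
       the redundant regions that neighbors refresh every m timesteps. */
    int    m        = 3;                      /* communication interval */
    int    padded_n = local_n + 2 * m;
    float *old_data = calloc(padded_n, sizeof(float));
    float *cur_data = calloc(padded_n, sizeof(float));
    float *new_data = calloc(padded_n, sizeof(float));

In this layout the edge ranks still have an outer redundant region; it simply never gets filled, which is the “garbage” the corruption slides below talk about.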

Possible Implementation  Every (3) timesteps, send some of your data to neighbor processes!

Possible Implementation  Send “current” data (current at time of communication)

Possible Implementation  Then send “old” data
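A sketch of that interval exchange, using the padded layout above: every m timesteps, send the m outermost owned cells of “current”, then of “old”, to each neighbor. (In the lab those cells would first be copied off the GPU; names are illustrative.)

    #include <mpi.h>

    void exchange_halos(float *old_data, float *cur_data,
                        int local_n, int m, int rank, int size)
    {
        MPI_Request reqs[8];
        int nreq = 0;
        float *arrays[2] = { cur_data, old_data };   /* send "current" first, then "old" */

        for (int b = 0; b < 2; b++) {
            float *a = arrays[b];
            if (rank < size - 1) {                   /* exchange with the right neighbor */
                MPI_Isend(&a[local_n],     m, MPI_FLOAT, rank + 1, 10 + b,
                          MPI_COMM_WORLD, &reqs[nreq++]);
                MPI_Irecv(&a[m + local_n], m, MPI_FLOAT, rank + 1, 20 + b,
                          MPI_COMM_WORLD, &reqs[nreq++]);
            }
            if (rank > 0) {                          /* exchange with the left neighbor */
                MPI_Isend(&a[m], m, MPI_FLOAT, rank - 1, 20 + b,
                          MPI_COMM_WORLD, &reqs[nreq++]);
                MPI_Irecv(&a[0], m, MPI_FLOAT, rank - 1, 10 + b,
                          MPI_COMM_WORLD, &reqs[nreq++]);
            }
        }
        MPI_Waitall(nreq, reqs, MPI_STATUSES_IGNORE);
    }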

 Then…  Do our calculation as normal over our entire array (including the redundancies!), skipping only the cells at the very ends of our array
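A sketch of the corresponding CUDA kernel, launched over the whole padded array (names are illustrative; courant_sq stands for (c·Δt/Δx)² from the update formula earlier):

    __global__ void wave_step(const float *old_data, const float *cur_data,
                              float *new_data, int padded_n, float courant_sq)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        /* Update everything except the two end cells, which lack a neighbor
           on one side; redundant cells are updated like any other cell. */
        if (i > 0 && i < padded_n - 1) {
            new_data[i] = 2.0f * cur_data[i] - old_data[i]
                        + courant_sq * (cur_data[i + 1] - 2.0f * cur_data[i]
                                        + cur_data[i - 1]);
        }
    }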

What about corruption?  Suppose we’ve just copied our data… (assume a non-boundary process)
 Figure legend: . = valid, ? = garbage, ~ = doesn’t matter
 (Recall that there exist only 3 spaces – the gray areas are nonexistent at the current time.)

What about corruption?  Calculate new data…  Value unknown!

What about corruption?  Time t+1:  Current -> old, new -> current (and space for old is overwritten by new…)

What about corruption?  More garbage data!  “Garbage in, garbage out!”

What about corruption?  Time t+2…

What about corruption?  Even more garbage!

What about corruption?  Time t+3…  Core data region - corruption imminent!?

What about corruption?  Saved!  Data exchange occurs after the communication interval has passed!

“It’s okay to play with garbage… just don’t get sick”

Boundary Conditions  Applied only at the leftmost and rightmost process!

Boundary corruption?  Examine the leftmost process:  We never copy to it, so its left redundant region is garbage! (B = boundary condition set)

Boundary corruption?  Calculation brings garbage into non-redundant region!

Boundary corruption?  …but boundary condition is set at every interval!

Other details
 To run programs with MPI, use the “mpirun” command, e.g.
   mpirun -np (number of processes) (your program and arguments)
 CMS machines: Add this to your .bashrc file:
   alias mpirun=/cs/courses/cs179/openmpi-1.6.4/bin/mpirun
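For example, with 4 processes (the executable name and arguments here are just placeholders):

    mpirun -np 4 ./wave_solver (your arguments)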

Common bugs (and likely causes)
 Lock-up (it seems like nothing’s happening):
   Often an MPI issue – locks up on MPI_Wait because some request wasn’t fulfilled
   Check that all sends have corresponding receives
 Your wave looks weird:
   Likely cause 1: Garbage data is being passed between processes
   Likely cause 2: Redundant regions aren’t being refreshed and/or are contaminating non-redundant regions

Common bugs (and likely causes)
 Your wave is flat-zero:
   Left boundary condition isn’t being initialized and/or isn’t propagating
   Same reasons as previous

 General debugging tips:
   Run MPI with the number of processes set to 1 or 2
   Set the kernel to write a constant value