Gauss: A Framework for Verifying Scientific Software. Robert Palmer, Steve Barrus, Yu Yang, Ganesh Gopalakrishnan, Robert M. Kirby. University of Utah. Supported in part by NSF Award ITR.

Motivations “One of the simulations will run for 30 days. A Cray supercomputer built in 1995 would take 60,000 years to perform the same calculations.” 12,300 GFLOPS. Need permission/grant to use it.

Motivations $10k/week on Blue Gene (180 GFLOPS) at IBM’s Deep Computing Lab; 136,800 GFLOPS max.

Motivations 50% of the development of parallel scientific codes is spent in debugging [Vetter and de Supinski 2000]. Programmers come from a variety of backgrounds, often not computer science.

Overview What scientific programs look like. What challenges are faced by scientific code developers. How formal methods can help. The Utah Gauss project.

SPMD Programs Single Program Multiple Data – the same image runs on each node in the grid; processes do different things based on rank; it is possible to impose a virtual topology within the program.
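
A minimal sketch of the SPMD pattern, not taken from the talk: every process runs the same image, and branching on the result of MPI_Comm_rank is what makes them do different work. The master/worker split below is purely illustrative.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank   */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes   */

    if (rank == 0) {
        printf("rank 0: coordinating %d processes\n", size);    /* master */
    } else {
        printf("rank %d: doing its share of the work\n", rank); /* worker */
    }

    MPI_Finalize();
    return 0;
}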

MPI Library for communication. MPI is to HPC what PThreads is to systems programming or OpenGL is to graphics. More than 60% of HPC applications use MPI libraries in some form. There are proprietary and open source implementations. Provides both communication primitives and virtual topologies in MPI-1.

Concurrency Primitives Point-to-point communications that either don’t specify system buffering (but might have it in some implementations) or use user-program-provided buffering (with possibly hard or soft limitations), each in blocking and non-blocking variants. Collective communications that “can (but are not required to) return as soon as their participation in the collective communication is complete.” [MPI-1.1 Standard, pg. 93, lines 10-11]
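
These flavors map onto concrete MPI calls roughly as in the following sketch (illustrative only, not from the talk; it assumes at least two processes, and the tags, counts, and buffer size are arbitrary): MPI_Send is standard mode, MPI_Bsend uses a user-attached buffer, and MPI_Isend does not block.

#include <mpi.h>

#define N 1

int main(int argc, char **argv) {
    int rank, data = 7, in0 = 0, in1 = 0, in2 = 0, bufsize;
    char buffer[1024 + MPI_BSEND_OVERHEAD];
    void *detached;
    MPI_Status status;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Standard mode: blocks; may or may not use system buffering. */
        MPI_Send(&data, N, MPI_INT, 1, 0, MPI_COMM_WORLD);

        /* Buffered mode: uses user-provided buffer space. */
        MPI_Buffer_attach(buffer, sizeof buffer);
        MPI_Bsend(&data, N, MPI_INT, 1, 1, MPI_COMM_WORLD);
        MPI_Buffer_detach(&detached, &bufsize);

        /* Non-blocking: returns at once; the send buffer is not safe
           to reuse until MPI_Wait completes. */
        MPI_Isend(&data, N, MPI_INT, 1, 2, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, &status);
    } else if (rank == 1) {
        MPI_Recv(&in0, N, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        MPI_Recv(&in1, N, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
        MPI_Recv(&in2, N, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);
    }

    MPI_Finalize();
    return 0;
}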

MPI Tutorial

#include <mpi.h>
#define CNT 1
#define TAG 1

int main(int argc, char **argv){
    int mynode = 0, totalnodes = 0, recvdata0 = 0, recvdata1 = 0;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &totalnodes);
    MPI_Comm_rank(MPI_COMM_WORLD, &mynode);

    /* Phase 1: even ranks send to both neighbors; odd ranks receive from both. */
    if(mynode%2 == 0){
        MPI_Send(&mynode,CNT,MPI_INT,(mynode+1)%totalnodes,TAG,MPI_COMM_WORLD);
        MPI_Send(&mynode,CNT,MPI_INT,(mynode-1+totalnodes)%totalnodes,TAG,MPI_COMM_WORLD);
    } else {
        MPI_Recv(&recvdata0,CNT,MPI_INT,(mynode-1+totalnodes)%totalnodes,TAG,MPI_COMM_WORLD,&status);
        MPI_Recv(&recvdata1,CNT,MPI_INT,(mynode+1)%totalnodes,TAG,MPI_COMM_WORLD,&status);
    }
    MPI_Barrier(MPI_COMM_WORLD);

    /* Phase 2: roles reversed -- odd ranks send, even ranks receive. */
    if(mynode%2 == 1){
        MPI_Send(&mynode,CNT,MPI_INT,(mynode+1)%totalnodes,TAG,MPI_COMM_WORLD);
        MPI_Send(&mynode,CNT,MPI_INT,(mynode-1+totalnodes)%totalnodes,TAG,MPI_COMM_WORLD);
    } else {
        MPI_Recv(&recvdata0,CNT,MPI_INT,(mynode-1+totalnodes)%totalnodes,TAG,MPI_COMM_WORLD,&status);
        MPI_Recv(&recvdata1,CNT,MPI_INT,(mynode+1)%totalnodes,TAG,MPI_COMM_WORLD,&status);
    }
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Finalize();
}

Why is parallel scientific programming hard? Portability Scaling Performance

Variety of bugs that are common in parallel scientific programs: deadlock, race conditions, misunderstanding the semantics of MPI procedures, resource-related assumptions, and incorrectly matched sends/receives.
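
As an illustration of the deadlock and resource-assumption items, the following sketch (not from the talk) hangs on any implementation that does not buffer standard-mode sends: both ranks block in MPI_Send while the matching receives are never posted.

#include <mpi.h>

int main(int argc, char **argv) {
    int rank, other, out, in;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    other = 1 - rank;   /* assumes exactly two processes */
    out = rank;

    /* Head-to-head blocking sends: correct only if the MPI implementation
       happens to buffer them -- a resource-related assumption. */
    MPI_Send(&out, 1, MPI_INT, other, 0, MPI_COMM_WORLD);
    MPI_Recv(&in,  1, MPI_INT, other, 0, MPI_COMM_WORLD, &status);

    MPI_Finalize();
    return 0;
}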

State of the art in Debugging TotalView – parallel debugger, trace visualization. Parallel DBX. gdb. MPICHECK – does some deadlock checking, uses trace analysis.

Related work Verification of wildcard-free models [Siegel, Avrunin 2005] – deadlock freedom with zero-length buffers implies deadlock freedom with buffers of length greater than zero. SPIN models of MPI programs [Avrunin, Siegel, Siegel 2005] and [Siegel, Mironova, Avrunin, Clarke 2005] – compare serial and parallel versions of numerical computations for numerical equivalence.

Automatic Formal Analysis One could prove a program correct by hand in a theorem prover, but users don’t want to spend time making models. The approach should be completely automatic (it is intended for use by the scientific community at large).

The Big Picture Automatic model extraction. Improved static analysis – model checking with better partial-order reduction, parallel state-space enumeration, symmetry, abstraction refinement. Integration with existing tools – Visual Studio, TotalView.

The Big Picture [Toolflow diagram. Components shown: MPI Program, Model Generator (built from the Zing Abstractor plus an Environment Model and an MPI Library Model), Program Model, MC Server and MC Clients, Compiler and MPI Binary, Error Simulator, Result Analyzer, Refinement, OK. The slide also displays a sample MPI C program, a sample Zing program model, and a fragment of the MPI library model.]

Environment modeling C is very prevalent among HPC developers, and we want to analyze code as it is written for performance. Zing has nearly everything needed – numeric types; pointers, arrays, casting, recursion, … – and is missing one thing only: “&” (the address-of operator, not bitwise and). We provide a layer that makes this possible and that can also track pointer arithmetic and unsafe casts. We also provide a variety of stubs for system calls.
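
A hypothetical C fragment (not from the Gauss examples) of the idioms this layer has to model: taking an address with “&”, pointer arithmetic, and an unsafe cast.

#include <stdio.h>

static void fill(int *p, int n) {
    int i;
    for (i = 0; i < n; ++i)
        *(p + i) = i;              /* pointer arithmetic */
}

int main(void) {
    int x = 0;
    int *px = &x;                  /* "&": the one operator Zing lacks */
    int buf[4];
    char *raw;

    fill(buf, 4);
    fill(px, 1);                   /* writes 0 to x through the pointer */

    raw = (char *)buf;             /* unsafe cast the layer must track */
    printf("%d %d\n", buf[3], (int)raw[0]);
    return 0;
}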

Environment Example

class pointer {
    object reference;
    static object addressof(pointer p) {
        pointer ret;
        ret = new pointer;
        ret.reference = p;
        return ret;
    }
    …
}

Data encapsulated in Zing objects makes it possible to handle additional C-isms.

MPI Library MPI library modeled carefully by hand from the MPI Specification. Preliminary shared-memory-based implementation – Send, Recv, Barrier, BCast, Init, Rank, Size, Finalize.

Library Example

integer MPI_Send(pointer buf, integer count, integer datatype,
                 integer dest, integer tag, integer c) {
    …
    comm = getComm(c);
    atomic {
        ret = new integer;
        msg1 = comm.create(buf, count, datatype, _mpi_rank, dest, tag, true);
        msg2 = comm.find_match(msg1);
        if (msg1 != msg2) {
            comm.copy(msg1, msg2);
            comm.remove(msg2);
            msg1.pending = false;
        } else {
            comm.add(msg1);
        }
        ret.value = 0;
    }
    select {
        wait(!msg1.pending) -> ;
    }

Model Extraction Map C onto Zing (using CIL) – first through the C preprocessor; processes become Zing threads; each file becomes a Zing class (structs and unions are also extracted to classes); integral data types map to the environment layer (all numeric types, plus the pointer class).

Extraction Example

Extracted Zing code for the C call shown below:

__cil_tmp45 = integer.addressof(recvdata1);
__cil_tmp46 = integer.create(1);
__cil_tmp47 = integer.create(6);
__cil_tmp48 = integer.create(1);
__cil_tmp49 = integer.add(mynode, __cil_tmp48);
__cil_tmp50 = integer.mod(__cil_tmp49, totalnodes);
__cil_tmp51 = integer.create(1);
__cil_tmp52 = integer.create(91);
__cil_tmp53 = __anonstruct_MPI_Status_1.addressof(status);
MPI_Recv(__cil_tmp45, __cil_tmp46, __cil_tmp47, __cil_tmp50,
         __cil_tmp51, __cil_tmp52, __cil_tmp53);

Original C source:

MPI_Recv(&recvdata1,CNT,MPI_INT,(mynode+1)%totalnodes,TAG,MPI_COMM_WORLD,&status);

Experimental Results Correct example – 2 processes: 12,882 states; 4 processes: does not complete. Deadlock example – 24 processes: 2,522 states.

Possible Improvements Atomic regions Constant reuse More formal extraction semantics

Looking ahead (in the coming year) Full MPI-1.1 library model in Zing – all point-to-point and collective communication primitives, virtual topologies. ANSI C capable model extractor – dependencies: CIL, GCC, CYGWIN. Preliminary tool integration – error visualization and simulation. Textbook and validation-suite examples.

Looking ahead (beyond) Better static analysis through partial-order reduction – the MPI library model is intended to leverage transaction-based reduction; this can be improved by determining transaction independence.

Looking ahead (beyond) Better static analysis through abstraction refinement – control flow is determined mostly by rank; nondeterministic over-approximation.

Looking ahead (beyond) Better static analysis through distributed computing – grid based; client-server based.

Looking ahead (beyond) More library support – MPI-2 library (one-sided communication), PThread library (mixed MPI/PThread concurrency). More languages – FORTRAN, C++. Additional static analysis techniques.

Can we get more performance? Can we phrase a performance bug as a safety property – e.g., there does not exist a communication chain longer than N? Is there a way to leverage formal methods to reduce synchronizations? Can formal methods help determine the right balance between MPI and PThreads for concurrency?

Questions?