MPI 3 RMA Bill Gropp and Rajeev Thakur


2 Why Change RMA?
- Problems with using MPI 1.1 and 2.0 as compilation targets for parallel language implementations (mpi.pdf)
  - But note the assumption of cache coherence
- Mismatch with hardware evolution
  - Or not: will GPGPUs be cache-coherent? 1000-core processors?
- Lack of support for classical shared-memory operations (including the misnamed Win_lock); active messages
- Lack of use by MPI programmers

3 Possible Topics
- Read-modify-update for MPI RMA
- Blocking RMA routines (with implied sync); what is the memory consistency model?
- Respond to Dan Bonachea's paper on why MPI RMA is unsuited to UPC with enhancements (such as a different "window" model). Probably includes light-weight replacements
- Changes to MPI RMA for cache-coherent systems (good idea, or is it too late?)
- Other RMA models (remote mmap?)
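To make the "read-modify-update" topic concrete, here is a minimal sketch in C11 that models a single window location with an atomic integer. It is not MPI code and is not taken from the slides: `window_cell`, `cell_accumulate_sum`, `cell_fetch_and_add`, and `take_ticket` are hypothetical names used only for illustration. The first function mirrors the shape of MPI-2's MPI_Accumulate, which updates the target but returns nothing to the origin. The second also returns the value the target held before the update, which is the piece a shared counter needs.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for one integer exposed in an RMA window. */
typedef struct {
    atomic_long value;
} window_cell;

/* Accumulate-style update: adds to the target and returns nothing,
   the same shape as MPI-2 MPI_Accumulate with MPI_SUM. */
static void cell_accumulate_sum(window_cell *cell, long operand) {
    atomic_fetch_add(&cell->value, operand);
}

/* Read-modify-update: adds to the target and returns the value
   the target held before the update. */
static long cell_fetch_and_add(window_cell *cell, long operand) {
    return atomic_fetch_add(&cell->value, operand);
}

/* A shared ticket counter: each caller must learn the old value,
   so the accumulate-only form above is not sufficient here. */
static long take_ticket(window_cell *counter) {
    return cell_fetch_and_add(counter, 1);
}
```

In this model, a caller initializes the cell with `atomic_init(&counter.value, 0)` and each call to `take_ticket(&counter)` hands back a distinct ticket number. With only the accumulate-style call, the origin could increment the counter but would need a separate read to learn its value, and another process could update the location between the two steps.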

4 Scope must include:
- MPI "flavor"
- Portability and ubiquity
- Performance on the fastest platforms
- Memory consistency with THREAD_MULTIPLE