Overview of Recent MCMD Developments
Jarek Nieplocha
CCA Forum Meeting, San Francisco

MCMD Working Group

- Recent activities focus on developing specifications for CCA-based processor teams
- BOFs held during the CCA meetings in April and July 2007
- Mini-workshop held January 24, 2007
- Use cases documented and analyzed
- Wiki page and mailing list
- Specification document, version 0.3
- Telecon held September 28, 2007
  - Several other people sent good comments
  - Issues raised: threads, fault-tolerant environments, the MPI-centric narrative and examples, and the representation of ids
- Plans
  - Complete work on the spec document by the end of 2007
  - Telecons, mailing-list discussions, and reviews
  - Prototype implementation and evaluation with some applications (NWChem, subsurface)

Multilevel Parallelism

- How can applications effectively exploit the massive amount of hardware parallelism available in petaflop-scale machines?
- The massive number of CPUs in future systems requires algorithm and software redesign to exploit all available parallelism
- Multilevel parallelism
  - Divides work into parts that can be executed concurrently on groups of processors
  - Can exploit massive hardware parallelism
  - Increases the granularity of computation, which improves overall scalability

[Figure: Task 1 and Task 2 executing on separate processor groups]
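The split-into-teams idea above can be sketched in plain Python (no real MPI; the function names such as `split_into_teams` are illustrative, not part of any CCA API): the global set of ranks is partitioned into teams, and each team is assigned one coarse-grained task whose members then work in parallel.

```python
# A minimal sketch of multilevel parallelism: tasks across teams (outer
# level), data parallelism within a team (inner level). Plain Python only.

def split_into_teams(ranks, n_teams):
    """Partition a list of ranks into n_teams roughly equal teams."""
    return [ranks[i::n_teams] for i in range(n_teams)]

def run_multilevel(n_procs, tasks):
    """Assign one task per team; every rank belongs to exactly one team."""
    teams = split_into_teams(list(range(n_procs)), len(tasks))
    schedule = {}
    for team, task in zip(teams, tasks):
        for rank in team:
            schedule[rank] = task   # outer level: one task per team
    return teams, schedule

teams, schedule = run_multilevel(8, ["Task 1", "Task 2"])
# Each team can now execute its task concurrently with the others,
# while its member ranks share the data-parallel work inside the task.
```

In a real code the split would be done with something like `MPI_Comm_split` or GA process groups; the sketch only shows the bookkeeping.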

Multiple Component Multiple Data

- MCMD extends the SCMD (single component multiple data) model that was the main focus of CCA in SciDAC-1
- A prototype solution for computational chemistry was described at SC'05
- Allows different groups of processors to execute different CCA components
- The main motivation for MCMD is support for multiple levels of parallelism in applications

[Figure: SCMD vs. MCMD execution, illustrated with an NWChem example]

MCMD Use Cases

- Coop Parallelism
- Hierarchical parallelism in computational chemistry
- Ab initio nuclear structure calculations
- Coupled climate modeling
- Molecular dynamics and multiphysics simulations
- Fusion use case described at the Silver Springs meeting

Target Execution Model and Global Ids

- Global id specification: global id = <job id> + <machine id> + <task id> (+ <thread id>, if threads are supported)
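The composition of a global id can be sketched as a small value type (a hypothetical illustration, not the actual CCA spec types; the field names are assumptions based on the components listed on the slide):

```python
# Sketch of a global id built from a job id, a machine id, and a task id,
# with a thread id reserved for possible future thread support.
from dataclasses import dataclass

@dataclass(frozen=True)
class GlobalId:
    job_id: int
    machine_id: int
    task_id: int
    thread_id: int = 0   # reserved; only meaningful if threads are supported

# frozen=True makes instances hashable, so ids can key lookup tables
gid = GlobalId(job_id=1, machine_id=0, task_id=42)
owner = {gid: "ocean model"}
```

Making the id an immutable object (rather than a packed integer) anticipates the telecon conclusion later in the talk: ids as objects, with a separate "global rank" representation.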

Group Management

- Various execution models, e.g., Coop parallelism vs. a single mpirun
- Programming models
  - Should be MPI-friendly but also open to other models
  - MPI, threads, GAS models including GA, UPC, and the HPCS languages
- Global process and team ids
- Group translators

CCA Processor Teams

- We propose the slightly different term process(or) "teams" rather than "groups"
  - Avoids confusion with existing terminology and interfaces in programming models
- Some use cases call for something more general than MPI groups, e.g., COOP with multiple mpiruns
  - For example, a CCA team can encompass a collection of processes in two different MPI jobs; no single MPI group corresponding to that collection can be constructed
- Operations on CCA teams might not map directly onto group operations in programming models that support groups

[Figure: a CCA process team spanning MPI groups in MPI jobs A and B]

CCA Team Service

- How do we initialize the application? The COOP example makes this non-trivial
- Provides the following
  - Create, destroy, compare, and split teams
  - More capabilities can be added as required
- Assigns global ids to tasks from one or more jobs running on one or more machines
  - Global id = <job id> + <machine id> + <task id>
  - A <thread id> would be added if we were to support threads at the component level in the future
- Locality information
  - Gets the job id, machine id, and task id of a given task
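The operations listed above can be sketched as a toy service (assumed names and signatures, not the CCA specification): teams are sets of global ids, and because an id carries a job id, one team can span tasks from several jobs.

```python
# Toy team service: create, destroy, compare, and split teams whose
# members are (job_id, machine_id, task_id) global-id tuples.

class TeamService:
    def __init__(self):
        self._teams = {}
        self._next_handle = 0

    def create(self, members):
        """Register a new team and return its handle."""
        handle = self._next_handle
        self._next_handle += 1
        self._teams[handle] = tuple(members)
        return handle

    def destroy(self, handle):
        del self._teams[handle]

    def same(self, a, b):
        """Compare two teams by membership."""
        return set(self._teams[a]) == set(self._teams[b])

    def split(self, handle, color_of):
        """Split a team by a color function, MPI_Comm_split style."""
        buckets = {}
        for gid in self._teams[handle]:
            buckets.setdefault(color_of(gid), []).append(gid)
        return [self.create(m) for _, m in sorted(buckets.items())]

svc = TeamService()
# A team spanning two jobs (job ids 0 and 1) -- a membership that a
# single MPI group could not express.
team = svc.create([(0, 0, t) for t in range(4)] + [(1, 0, t) for t in range(4)])
by_job = svc.split(team, color_of=lambda gid: gid[0])
```

Splitting by job id recovers per-job subteams, which is where a translation back to native MPI groups would become possible again.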

Plugins

- An interoperable group-service layer connects the CCA team service to per-model plugins: an MPI group service, a GA group service, a PVM group service, or any other programming model's group service
- Provides mappings between CCA teams and the task/image/thread groups of the programming models that components are written in
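The plugin layer can be sketched as a small registry (the class name, `register`/`group_for` methods, and the shape of each plugin's return value are illustrative assumptions, not a real interface):

```python
# Sketch of an interoperable group-service layer: each programming model
# registers a translator from a CCA team to its native group form.

class GroupServiceLayer:
    def __init__(self):
        self._plugins = {}

    def register(self, model, translate):
        """translate: maps a CCA team (tuple of task ids) to the
        model's native group representation."""
        self._plugins[model] = translate

    def group_for(self, model, cca_team):
        """Translate a CCA team via the plugin registered for `model`."""
        return self._plugins[model](cca_team)

layer = GroupServiceLayer()
# e.g. an "MPI" plugin might yield the rank list to hand to
# MPI_Group_incl, while a "GA" plugin yields a process-group descriptor.
layer.register("MPI", lambda team: {"ranks": sorted(team)})
layer.register("GA", lambda team: {"pgroup": tuple(sorted(team))})

mpi_group = layer.group_for("MPI", (3, 1, 2))
```

New models (the slide's "XYZ" programming model) plug in by registering one more translator, without changing the team service itself.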

Example: Land-Ocean Coupled System

[Figure: a global CCA team spanning PVM job A (land model, with a PVM process group) and MPI/GA job B (ocean model and I/O, with MPI and GA process groups)]

Specification Document

- Version 0.3 on the wiki (Word, PDF)
- Please review and contribute
- Looking at candidate applications and component software for initial evaluation
  - Numerical, I/O

Issues from the Telecon

- Eliminate threads from the spec
- Add more emphasis on mixing multiple programming models
- How do we handle global ids? Pros and cons of using integers
  - Conclusion: use "global ids" as objects and introduce a new representation called "global ranks"
- Need for dynamic team management

Dynamic Behavior

- We want to support the dynamic nature of applications
  - Applications composed of parallel jobs that are launched and complete at different stages of application execution
- Fault tolerance in the style of FT-MPI
  - Adaptation to faults
  - Teams can shrink or expand
  - Callers cannot count on the persistence of values returned by team-service calls
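The shrink/expand behavior and the non-persistence caveat can be sketched with a generation counter (purely illustrative; not an actual CCA or FT-MPI interface, and the method names are assumptions):

```python
# Sketch of a dynamic team: it shrinks when a task fails (FT-MPI style)
# and grows when a new job joins. A generation counter is bumped on every
# membership change, so cached answers to earlier queries can be detected
# as stale.

class DynamicTeam:
    def __init__(self, members):
        self.members = set(members)
        self.generation = 0   # incremented on every membership change

    def fail(self, task):
        """Shrink the team when a task is lost to a fault."""
        self.members.discard(task)
        self.generation += 1

    def join(self, task):
        """Expand the team when a late-arriving task joins."""
        self.members.add(task)
        self.generation += 1

    def size(self):
        return len(self.members)

team = DynamicTeam(range(4))
g0 = team.generation      # a caller caches some team state here
team.fail(2)              # a fault shrinks the team ...
team.join(7)              # ... and a new job expands it again
stale = team.generation != g0   # the cached state is now invalid
```

This is exactly why the slide warns that values returned by team-service calls must not be assumed persistent: any membership change invalidates them.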