Overview of Recent MCMD Developments
Manojkumar Krishnan
January CCA Forum Meeting, Boulder


MCMD Working Group
- 2007 activities focused on developing specifications for CCA-based processor teams (groups)
  - BOFs held during CCA meetings in April and July 2007
  - Mini-workshop held January 24, 2007
  - Use cases documented and analyzed
  - Wiki webpage and mailing list
- Specifications document, version 0.4
  - Discussed during Sept and Nov HPC initiative telecons
  - Several other people sent good comments
  - Issues raised: threads, fault-tolerant environments, MPI-centric narrative and examples, ID representation
- Recent developments
  - Prototype implementation
  - Application evaluation: NWChem, subsurface

Multiple Component Multiple Data
- MCMD extends the SCMD (single component, multiple data) model that was the main focus of CCA in SciDAC-1
- Prototype solution described at SC'05 for computational chemistry
- Allows different groups of processors to execute different CCA components
- Main motivation for MCMD is support for multiple levels of parallelism in applications
(Figure: SCMD vs. MCMD execution; NWChem example)

MCMD Use Cases
- Coop Parallelism
- Hierarchical parallelism in computational chemistry
- Ab initio nuclear structure calculations
- Coupled climate modeling
- Molecular dynamics, multiphysics simulations
- Fusion use case described at the Silver Springs meeting

Target Execution Model and Global Ids
- Global id specification: global id = <machine id> + <job id> + <task id> + <thread id>
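The composition above can be sketched as a plain struct. This is an illustrative sketch, not the spec's data layout; field names follow the ProcessID class shown on a later slide (machineId, jobId, procId), and encodeGlobalId is a hypothetical helper.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical sketch: a global id composed of machine, job, and task ids.
// A thread id component could be added if component-level threads are
// supported in the future.
struct GlobalId {
    int32_t machineId;
    int32_t jobId;
    int32_t taskId;
};

// Encode the components into a single printable id string.
std::string encodeGlobalId(const GlobalId& g) {
    std::ostringstream os;
    os << g.machineId << ":" << g.jobId << ":" << g.taskId;
    return os.str();
}
```

A string encoding like this makes ids usable across jobs that share no common communicator; the real service assigns integer global ranks instead.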

Group Management
- Various execution models, e.g. coop parallelism vs. a single mpirun
- Programming models
  - Should be MPI-friendly but also open to other models
  - MPI, threads, GAS models including GA, UPC, HPCS languages
- Global process and team ids
- Group translators

CCA Processor Teams
- We propose the slightly different term process(or) "teams" rather than "groups"
  - Avoids confusion with existing terminology and interfaces in programming models
- Some use cases call for something more general than MPI groups, e.g. COOP with multiple mpiruns
  - For example, a CCA team can encompass a collection of processes in two different MPI jobs; we cannot construct a single MPI group corresponding to that
- Operations on CCA teams might not have a direct mapping to group operations in programming models that support groups
(Figure: a CCA process team spanning MPI groups in MPI Job A and MPI Job B)
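The key point above is that a team spanning two MPI jobs cannot be a single MPI group, but is trivial to represent as a list of (job, rank) pairs. A minimal sketch, with illustrative names (TaskRef, spanningTeam) that are not part of the spec:

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// A task reference: (jobId, rank within that job). Two tasks in different
// jobs can coexist in one team even though no single MPI communicator
// contains both.
using TaskRef = std::pair<int32_t, int32_t>;

// Build a team containing all tasks of job A followed by all tasks of job B.
std::vector<TaskRef> spanningTeam(int32_t jobA, int32_t sizeA,
                                  int32_t jobB, int32_t sizeB) {
    std::vector<TaskRef> team;
    for (int32_t r = 0; r < sizeA; ++r) team.push_back({jobA, r});
    for (int32_t r = 0; r < sizeB; ++r) team.push_back({jobB, r});
    return team;
}
```

A task's position in this list is a natural candidate for its team rank, independent of either job's MPI ranks.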

CCA Team Service
- How do we initialize the application? The COOP example makes it non-trivial
- Provides the following:
  - Create, destroy, compare, and split teams
  - More capabilities can be added as required
- Assigns global ids to tasks from one or more jobs running on one or more machines
  - Global id = <machine id> + <job id> + <task id>
  - A <thread id> component could be added if we were to support threads at the component level in the future
- Locality information
  - Gets the job id, machine id, and task id of a given task
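The create/split/compare operations listed above can be sketched over plain rank lists. This is a toy model of the semantics, assuming a team is an ordered list of global task ids; the function names are illustrative, not the service's actual API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// A team as an ordered list of global task ids.
using Team = std::vector<int32_t>;

Team createTeam(const std::vector<int32_t>& ranks) { return ranks; }

// Split a team into `parts` roughly equal, contiguous sub-teams.
std::vector<Team> splitTeam(const Team& t, std::size_t parts) {
    std::vector<Team> out(parts);
    for (std::size_t i = 0; i < t.size(); ++i)
        out[i * parts / t.size()].push_back(t[i]);
    return out;
}

// Teams compare equal when they contain the same tasks in the same order.
bool compareTeams(const Team& a, const Team& b) { return a == b; }
```

Because teams are built from global ids rather than MPI ranks, the same operations work whether the members come from one mpirun or several.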

Example: Land-Ocean Coupled System
(Figure: a global CCA team spanning PVM Job A and MPI/GA Job B; the ocean model, land model, and I/O each run on their own PVM, GA, or MPI ProcGroup)

Prototype Implementation – MCMD Specification
- Based on spec 0.4
  - Version 0.4 available on the wiki; please review and contribute
- Proof of concept: it works!
  - Not really high performance (future work)
- MCMD Initializer
- MCMD TeamService (port?) and classes
  - class Team
  - ProcessID, to store the global ID and other info
- Creates and manages teams and parallel jobs

MCMD Initialization
- One or more parallel jobs, e.g. COOP style
- Init(): all processes must participate
- MCMD barrier and locks: file based
- Job file as input, similar to a machinefile or hostfile
  - MCMD environment is initialized based on this information

MCMD TeamService

TeamService (port):
  create, gRank, globalID, globalID2, gSize, jobCount, jobSize, jobProcList

Team:
  proclist, rank, rank2, size, split, compare, merge, create, destroy, jobCount, joblist, jobSize, jobProcList

ProcessID:
  rank, machineId, jobId, procId

Molecular Dynamics Application
- How can applications effectively exploit the massive amount of hardware parallelism available in petaflop-scale machines?
- Massive numbers of CPUs in future systems require algorithm and software redesign to exploit all available parallelism
- Molecular dynamics example: multilevel parallelism
  - Divide work into parts that can be executed concurrently on groups of processors
  - Can exploit massive hardware parallelism
  - Increases the granularity of computation, improving overall scalability
(Figure: two MD tasks executing concurrently on separate processor groups)
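The group-based decomposition above can be sketched as simple rank bookkeeping: map each process to an MD task by contiguous blocks, the same partitioning an MPI_Comm_split with a block-wise color would produce. assignTasks is an illustrative helper, not application code.

```cpp
#include <cstdint>
#include <vector>

// Assign each of `worldSize` processes to one of `ntasks` MD tasks by
// contiguous blocks. Returns, for each world rank, the task it belongs to.
// For simplicity this sketch assumes worldSize is divisible by ntasks.
std::vector<int32_t> assignTasks(int32_t worldSize, int32_t ntasks) {
    std::vector<int32_t> taskOf(worldSize);
    int32_t block = worldSize / ntasks;
    for (int32_t r = 0; r < worldSize; ++r)
        taskOf[r] = r / block;
    return taskOf;
}
```

Each block then forms its own team and runs one MD task; coarser tasks per group mean fewer, larger messages and better scalability at large process counts.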

MCMD driver

int32_t mcmd::Driver_impl::go_impl() {
  // DO-NOT-DELETE splicer.begin(mcmd.Driver.go)
  gov::cca::Port mcmdport = svc.getPort("mcmdport");
  if (mcmdport._not_nil()) {
    mcmd::TeamService ts = ::babel_cast<mcmd::TeamService>(mcmdport);

    // Initialize XM's MCMD service -- implementation specific for now
    xm::TeamService ts_xm = ::babel_cast<xm::TeamService>(ts);
    ts_xm.init();

    int32_t jobcnt = ts.jobCount();
    int32_t grank  = ts.gRank();
    int32_t gsize  = ts.gSize();
    // ...
    mcmd::Team t1 = ts.create(ranks, teamsize);
    // ...
    printf("%d: A new team is created. size=%d, rank=%d\n",
           grank, t1.size(), t1.rank());

Ongoing (and Future) Work
- "Getting NWChem to Petascale" meeting
  - Internal meeting of the CS and chemistry groups
  - Most of the modules are group-aware
  - NWChem components are now part of the release (production version)
- Current implementation is not high performance
  - Explore low-level network APIs: Elan, OpenIB, etc. No sockets!
- MCMD Application Generator
- Dynamic MCMD environment, similar to MPI-2
- Task-based parallelism
