
Charm++ Data-driven Objects L. V. Kale

Parallel Programming
- Decomposition: what to do in parallel
- Mapping: which processor does each task
- Scheduling (sequencing): on each processor
- Machine-dependent expression: express the above decisions for the particular parallel machine
The parallel objects model of Charm++ automates mapping, scheduling, and machine-dependent expression.

Shared objects model
Basic philosophy:
- Let the programmer decide what to do in parallel.
- Let the system handle the rest: which processor executes what, and when, with some override control for the programmer when needed.
Basic model:
- The program is a set of communicating objects.
- Objects only know about other objects (not processors).
- The system maps objects to processors, and may remap objects dynamically for load balancing, etc.
Shared objects, not shared memory:
- In between "shared nothing" message passing and the "shared everything" of a shared address space (SAS)
- Additional information-sharing mechanisms
- "Disciplined" sharing

Charm++
Charm++ programs specify parallel computations consisting of a number of "objects".
- How do they communicate? By invoking methods on each other, typically asynchronously; also by sharing data using "specifically shared variables".
- What kinds of objects?
  - Chares: singleton objects
  - Chare arrays: generalized collections of objects
  - Advanced: chare groups (used by library writers and the system)

Data Driven Execution in Charm++
[Diagram: on each processor, a scheduler pulls messages from a message queue and delivers them to the target objects]

Need for Proxies
Consider:
- Object x of class A wants to invoke method f of object y of class B.
- x and y are on different processors.
- What should the syntax be? y->f(...)? That doesn't work, because y is not a local pointer.
Needed:
- Instead of "y" we must use an ID that is valid across processors.
- Method invocation should use this ID.
- Some part of the system must pack the parameters and send them.
- Some part of the system on the remote processor must invoke the right method on the right object with the supplied parameters.

Charm++ solution: proxy classes
Classes with remotely invokable methods:
- inherit from the (system-defined) "chare" class
- entry methods can only have one parameter: a subclass of message
For each chare class D whose methods we want to invoke remotely, the system automatically generates a proxy class CProxy_D:
- Proxy objects know where the real object is.
- Methods invoked on a proxy simply put the data in an "envelope" and send it out to the destination.
Each chare object has a proxy:
  CProxy_D thisProxy;  // thisProxy inherited from "CBase_D"
You can also get a proxy for a chare when you create it:
  CProxy_D myNewChare = CProxy_D::ckNew(arg);

Chare creation and method invocation
  CProxy_D x = CProxy_D::ckNew(25);
  x.f(5,7);
Sequential equivalent:
  y = new D(25);
  y->f(5,7);

Chares (data-driven objects)
Regular C++ classes, with some methods designated as remotely invokable (called entry methods).
Creation of an instance of chare class C:
  CProxy_C myChareProxy = CProxy_C::ckNew(args);
To create an instance of C on a specified processor pe:
  CProxy_C::ckNew(args, pe);
CProxy_C is a proxy class generated by Charm++ for the chare class C declared by the user.

Remote method invocation
Proxy classes:
- For each chare class C, the system generates a proxy class (C : CProxy_C).
- Proxies are global, in the sense of being valid on all processors.
- thisProxy (analogous to this) gets you your own proxy.
- You can send proxies in messages.
- Given a proxy p, you can invoke methods: p.method(msg);

  CProxy_main mainProxy;

  main::main(CkArgMsg *m)     // execution begins here; m carries argc/argv
  {
    int i = 0;
    for (i = 0; i < 100; i++)
      new CProxy_piPart();
    responders = 100;
    count = 0;
    mainProxy = thisProxy;    // readonly initialization
  }

  void main::results(int pcount)
  {
    count += pcount;
    if (0 == --responders) {
      // totalSamples is a stand-in name: the divisor (the total sample
      // count) did not survive the transcript.
      cout << "pi= " << 4.0*count/totalSamples << endl;
      CkExit();               // exit the program
    }
  }

  piPart::piPart()
  {
    // declarations..
    srand48((long) this);
    // totalSamples is a stand-in name: the numerator (the total sample
    // count) did not survive the transcript.
    mySamples = totalSamples/100;
    for (i = 0; i < mySamples; i++) {
      x = drand48();
      y = drand48();
      if ((x*x + y*y) <= 1.0)
        localCount++;
    }
    mainProxy.results(localCount);
    delete this;
  }

Generation of proxy classes
How does Charm++ generate the proxy classes?
- It needs help from the programmer: name the classes and methods that can be remotely invoked.
- Declare them in a special "charm interface" file (pgm.ci).
- Include the generated code in your program.

pgm.ci:
  mainmodule PiMod {
    mainchare main {
      entry main();
      entry results(int pc);
    };
    chare piPart {
      entry piPart(void);
    };
  };

The translator generates PiMod.decl.h and PiMod.def.h:

pgm.h:
  #include "PiMod.decl.h"
  ...

pgm.C:
  ...
  #include "PiMod.def.h"

Charm++ Data Driven Objects
- Message classes
- Asynchronous method invocation
- Prioritized scheduling
- Object arrays
- Object groups: global objects with a "representative" on each PE
- Information sharing abstractions:
  - readonly data
  - accumulators
  - distributed tables

Object Arrays
A collection of chares:
- with a single global name for the collection, and
- each member addressed by an index
- Mapping of element objects to processors is handled by the system.
[Diagram: user's view is a logical array A[0], A[1], A[2], A[3], ...; in the system view, elements such as A[0] and A[3] reside on particular processors]

Introduction
- Elements are parallel objects like chares.
- Elements are indexed by a user-defined data type: [sparse] 1D, 2D, 3D, tree, ...
- Send messages to an index; receive messages at an element.
- Reductions and broadcasts across the array.
- Dynamic insertion, deletion, migration: and everything still has to work!
- Interfaces with the automatic load balancer.

1D Declare & Use
In the interface (.ci) file:
  module m {
    array [1D] Hello {
      entry Hello(void);
      entry void SayHi(int HiData);
    };
  };
In the .C file:
  // Create an array of Hello's with 4 elements:
  int nElements = 4;
  CProxy_Hello p = CProxy_Hello::ckNew(nElements);
  // Have element 2 say "hi":
  p[2].SayHi(12345);

1D Definition
  class Hello : public CBase_Hello {  // inherits from ArrayElement1D
  public:
    Hello(void) {
      ... thisProxy ...
      ... thisIndex ...
    }
    void SayHi(int m) {
      if (m < 1000)
        thisProxy[thisIndex+1].SayHi(m+1);
    }
    Hello(CkMigrateMessage *m) {}
  };

3D Declare & Use
  module m {
    array [3D] Hello {
      entry Hello(void);
      entry void SayHi(int HiData);
    };
  };

  CProxy_Hello p = CProxy_Hello::ckNew();
  for (int i=0; i<800000; i++)
    p(x(i), y(i), z(i)).insert();
  p.doneInserting();
  p(12,23,7).SayHi(34);

3D Definition
  class Hello : public CBase_Hello {
  public:
    Hello(void) {
      ... thisProxy ...
      ... thisIndex.x, thisIndex.y, thisIndex.z ...
    }
    void SayHi(int HiData) { ... }
    Hello(CkMigrateMessage *m) {}
  };

Pup Routine
  void pup(PUP::er &p) {
    // Call our superclass's pup routine:
    ArrayElement3D::pup(p);
    p|myVar1; p|myVar2;
    ...
  }

Generalized "arrays": Declare & Use
  module m {
    array [Foo] Hello {
      entry Hello(void);
      entry void SayHi(int data);
    };
  };

  CProxy_Hello p = CProxy_Hello::ckNew();
  for (...)
    p[CkArrayIndexFoo(..)].insert();
  p.doneInserting();
  p[CkArrayIndexFoo(..)].SayHi(..);

General Definition
  class Hello : public CBase_Hello {
  public:
    Hello(void) {
      ... thisIndex ...
    }
    ...
  };

  class CkArrayIndexFoo : public CkArrayIndex {
    Bar b;  // e.g. char b[8]; or float b[2]; ...
  public:
    CkArrayIndexFoo(...) {
      ...
      nInts = sizeof(b)/sizeof(int);
    }
  };

Collective ops
Broadcast message SayHi:
  p.SayHi(data);
Reduce x across all elements:
  contribute(sizeof(x), &x, CkReduction::sum_int, cb);
Where do reduction results go? To a "callback" function, named cb above:
  // Call some function foo with fooData when done:
  CkCallback cb(foo, fooData);
  // Broadcast the results to my method "bar" when done:
  CkCallback cb(CkIndex_MyArray::bar, thisProxy);

Migration support
- Delete element i: p[i].destroy();
- Migrate to processor destPe: migrateMe(destPe);
- Enable the load balancer by creating a load balancing object.
- Provide pack/unpack functions: each object that needs this provides a "pup" method. (pup is a single abstraction that allows data traversal for determining size, packing, and unpacking.)

Object Groups
A group of objects (chares):
- with exactly one representative on each processor
- A single proxy for the group as a whole
- Invoke methods on one branch (asynchronously), all branches (broadcast), or the local branch
- Creation: agroup = CProxy_C::ckNew(msg);
- Remote invocation: p.methodName(msg); // p.methodName(msg, peNum);
- Local access: p.ckLocalBranch()->f(...);

Information sharing abstractions
Observation: information is shared in several specific modes in parallel programs.
Other models support only a limited set of modes:
- Shared memory: everything is shared: a sledgehammer approach
- Message passing: messages are the only method
Charm++ identifies and supports several modes:
- Readonly / writeonce
- Tables (hash tables)
- Accumulators
- Monotonic variables

Compiling Charm++ programs
- Define an interface specification file mod.ci for each module mod.
- It contains the declarations the system uses to produce proxy classes.
- The produced classes must be included in your mod.C file.
- See the examples provided on the class web site.
More information: manuals, example programs, papers.
These slides are currently at: –

Fortran 90 version
A quick implementation on top of Charm++.
How to use:
- Follow the example program, with the same basic concepts.
- Only use object arrays for now: they are the most useful construct.
- Object groups can be implemented in C++, if needed.

Further Reading
More information: manuals, example programs, papers.
These slides are currently at: –