1 Converse BlueGene Emulator Gengbin Zheng Parallel Programming Lab 2/27/2001.


1 Converse BlueGene Emulator Gengbin Zheng Parallel Programming Lab 2/27/2001

2 Objective
Complete rewrite of the previous Charm++ Blue Gene emulator
BlueGene emulator for architecture studies (PetaFLOPS computers)
Performance estimation (with proper time stamping)
Provides an API for building Charm++ on top of it

3 Big picture - emulator
[Diagram: the emulator maps physical processors onto a grid of emulated nodes Node(x1,y1,z1), Node(x2,y2,z2), Node(x3,y3,z3), ...]
–34x34x36 nodes
–25 processors per node
–8 threads per processor
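For scale, the configuration on this slide works out to the following totals (simple arithmetic over the slide's numbers, written as a checkable C sketch):

```c
/* Total emulated-machine size from the slide's parameters:
 * 34 x 34 x 36 nodes, 25 processors per node, 8 threads each. */
long bgNumNodes(void)   { return 34L * 34L * 36L; }   /* 41,616 nodes */
long bgNumProcs(void)   { return bgNumNodes() * 25; } /* 1,040,400 processors */
long bgNumThreads(void) { return bgNumProcs() * 8; }  /* 8,323,200 threads */
```

So a full-size emulation must multiplex over eight million emulated threads onto the real machine.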

4 Bluegene Emulator Node Structure
[Diagram: an inBuffer feeds the communication threads, which route messages into a shared non-affinity message queue and per-worker affinity message queues consumed by the worker threads]
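The node layout in the diagram can be sketched as a plain C struct (the names and layout here are illustrative, not the emulator's actual declarations):

```c
#include <stdlib.h>

/* A single timestamped message, as held in a node's queues. */
typedef struct BgMsg {
    double recvTime;
    struct BgMsg *next;
} BgMsg;

/* Simple FIFO queue of messages. */
typedef struct {
    BgMsg *head, *tail;
} MsgQueue;

/* Illustrative node layout: one inBuffer feeding the communication
 * threads, which sort messages into a shared non-affinity queue or
 * per-worker affinity queues. */
typedef struct {
    MsgQueue inBuffer;     /* filled by incoming packets */
    MsgQueue nonAffinityQ; /* shared by all worker threads */
    MsgQueue *affinityQ;   /* one queue per worker thread */
    int numWorkers;
} BgNode;

BgNode *bgNodeCreate(int numWorkers) {
    BgNode *n = calloc(1, sizeof(BgNode));
    n->numWorkers = numWorkers;
    n->affinityQ = calloc(numWorkers, sizeof(MsgQueue));
    return n;
}
```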

5 Communication Threads
Communication threads get messages from the inBuffer:
–If it is small work, execute the task itself
–If it is an affinity message, put it in the target thread's local queue
–If it is a non-affinity message, put it in the node queue
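The three routing rules can be sketched as a standalone C function (the enum names here are invented for illustration; the emulator's real types differ):

```c
/* Message categories, mirroring the three cases on this slide. */
typedef enum { SMALL_WORK, AFFINITY, NON_AFFINITY } MsgType;

/* What the communication thread does with the message. */
typedef enum { EXECUTED_INLINE, TO_THREAD_QUEUE, TO_NODE_QUEUE } Disposition;

/* Decide what a communication thread does with a message pulled
 * from the inBuffer. */
Disposition dispatch(MsgType t) {
    switch (t) {
    case SMALL_WORK: return EXECUTED_INLINE; /* run the handler itself */
    case AFFINITY:   return TO_THREAD_QUEUE; /* bound to one worker */
    default:         return TO_NODE_QUEUE;   /* any worker may run it */
    }
}
```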

6 Worker Threads
Worker threads examine messages from two queues: the affinity queue and the non-affinity queue
Compare the receive times of the two head messages, pick the one that comes first, and execute it
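The selection rule amounts to taking the head message with the earlier receive time; a small C sketch (illustrative types, not the emulator's own):

```c
#include <stddef.h>

typedef struct Msg {
    double recvTime; /* timestamp assigned by the sender */
    struct Msg *next;
} Msg;

/* Return the head of whichever queue holds the message with the
 * earlier receive time; either queue may be empty (NULL). */
Msg *pickNext(Msg *affinityHead, Msg *nonAffinityHead) {
    if (affinityHead == NULL) return nonAffinityHead;
    if (nonAffinityHead == NULL) return affinityHead;
    return (affinityHead->recvTime <= nonAffinityHead->recvTime)
               ? affinityHead : nonAffinityHead;
}
```

Executing messages in receive-time order is what makes the emulator's virtual timestamps meaningful for performance estimation.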

7 Low-level API
Class NodeInfo: id, x, y, z, udata, commThQ, workThQ
Class ThreadInfo (thread-private variable): id, type, myNode, currTime
Class BgMessage: node, threadID, handlerID, type, sendTime, recvTime, data
getFullBuffer()
checkReady()
addBgNodeMessage()
addBgThreadMessage()
sendPacket()
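Rendered as C struct sketches (field names follow the slide; the types and layout are guesses, not the emulator's actual declarations):

```c
/* Per-node bookkeeping, as listed on the slide. */
typedef struct NodeInfo {
    int id;
    int x, y, z;   /* torus coordinates of this node */
    void *udata;   /* user-attached node data */
    void *commThQ; /* queue feeding communication threads */
    void *workThQ; /* queue feeding worker threads */
} NodeInfo;

/* Thread-private state. */
typedef struct ThreadInfo {
    int id;
    int type;         /* communication vs. worker thread */
    NodeInfo *myNode; /* node this thread belongs to */
    double currTime;  /* this thread's virtual clock */
} ThreadInfo;

/* A message in flight between emulated nodes. */
typedef struct BgMessage {
    int node;
    int threadID;
    int handlerID;
    int type;
    double sendTime;
    double recvTime;
    char data[1]; /* variable-length payload follows */
} BgMessage;
```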

8 User’s API
BgGetXYZ()
BgGetSize(), BgSetSize()
BgGetNumWorkThread(), BgSetNumWorkThread()
BgGetNumCommThread(), BgSetNumCommThread()
BgRegisterHandler()
BgGetNodeData(), BgSetNodeData()
BgGetThreadID(), BgGetGlobalThreadID()
BgGetTime()
BgSendPacket(), etc.
BgShutdown()
BgEmulatorInit(), BgNodeStart()

9 Bluegene application example - Ring

void BgEmulatorInit(int argc, char **argv)
{
  if (argc < 6) {
    CmiPrintf("Usage: ring x y z numCommTh numWorkTh\n");
    BgShutdown();
  }
  BgSetSize(atoi(argv[1]), atoi(argv[2]), atoi(argv[3]));
  BgSetNumCommThread(atoi(argv[4]));
  BgSetNumWorkThread(atoi(argv[5]));
  passRingID = BgRegisterHandler(passRing);
}

void BgNodeStart(int argc, char **argv)
{
  int x, y, z;
  int nx, ny, nz;
  int data = 888;
  BgGetXYZ(&x, &y, &z);
  nextxyz(x, y, z, &nx, &ny, &nz);
  /* node (0,0,0) starts the ring */
  if (x == 0 && y == 0 && z == 0)
    BgSendPacket(nx, ny, nz, passRingID, LARGE_WORK, sizeof(int), (char *)&data);
}

void passRing(char *msg)
{
  int x, y, z;
  int nx, ny, nz;
  int data = *(int *)msg;
  BgGetXYZ(&x, &y, &z);
  nextxyz(x, y, z, &nx, &ny, &nz);
  /* each time the message returns to (0,0,0), one iteration is done */
  if (x == 0 && y == 0 && z == 0)
    if (++iter == MAXITER) BgShutdown();
  BgSendPacket(nx, ny, nz, passRingID, LARGE_WORK, sizeof(int), (char *)&data);
}
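The example relies on a helper nextxyz() that is not shown on the slide; one plausible implementation walks the nodes in x-major order, carrying into y and then z so the last node wraps back to (0,0,0). This is a guess at the intended traversal, not the original code, and the SX/SY/SZ grid dimensions are stand-ins for the sizes passed to BgSetSize():

```c
/* Assumed grid dimensions (the real code would query BgGetSize()). */
#define SX 2
#define SY 2
#define SZ 2

/* Advance (x,y,z) to the next node of the ring over an
 * SX x SY x SZ grid: increment x, carrying into y and z. */
void nextxyz(int x, int y, int z, int *nx, int *ny, int *nz) {
    *nx = x; *ny = y; *nz = z;
    if (++*nx == SX) { *nx = 0;
        if (++*ny == SY) { *ny = 0;
            if (++*nz == SZ) *nz = 0;
        }
    }
}
```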

10 Performance
Pingpong
–Close to Converse pingpong: us vs. 92 us RTT
–Charm++ pingpong: 116 us RTT
–Charm++ Bluegene pingpong: us RTT

11 Charm++ on top of Emulator
BlueGene thread represents a Charm++ node
Name conflicts:
–Cpv, Ctv
–MsgSend, etc.
–CkMyPe(), CkNumPes(), etc.