1 Converse BlueGene Emulator Gengbin Zheng Parallel Programming Lab 2/27/2001.


1 Converse BlueGene Emulator Gengbin Zheng Parallel Programming Lab 2/27/2001

2 Objective
Complete rewrite of the previous Charm++ Blue Gene emulator
BlueGene emulator for architecture studies (PetaFLOPS computers)
Performance estimation (with proper time stamping)
Provides an API for building Charm++ on top of it

3 Big picture - emulator
[Diagram: the emulator maps physical processors onto a grid of emulated nodes Node(x1,y1,z1), Node(x2,y2,z2), Node(x3,y3,z3), ...]
–34x34x36 nodes
–25 processors per node
–8 threads per processor
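For scale, the configuration on this slide works out to the following totals (simple arithmetic over the slide's numbers, written as a checkable C sketch):

```c
/* Total emulated-machine size from the slide's parameters:
 * 34 x 34 x 36 nodes, 25 processors per node, 8 threads each. */
long bgNumNodes(void)   { return 34L * 34L * 36L; }   /* 41,616 nodes */
long bgNumProcs(void)   { return bgNumNodes() * 25; } /* 1,040,400 processors */
long bgNumThreads(void) { return bgNumProcs() * 8; }  /* 8,323,200 threads */
```

So a full-size emulation must multiplex over eight million emulated threads onto the real machine.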

4 Bluegene Emulator Node Structure
[Diagram: an inBuffer feeds the communication threads, which route messages into a shared non-affinity message queue and per-worker affinity message queues consumed by the worker threads]
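The node layout in the diagram can be sketched as a plain C struct (the names and layout here are illustrative, not the emulator's actual declarations):

```c
#include <stdlib.h>

/* A single timestamped message, as held in a node's queues. */
typedef struct BgMsg {
    double recvTime;
    struct BgMsg *next;
} BgMsg;

/* Simple FIFO queue of messages. */
typedef struct {
    BgMsg *head, *tail;
} MsgQueue;

/* Illustrative node layout: one inBuffer feeding the communication
 * threads, which sort messages into a shared non-affinity queue or
 * per-worker affinity queues. */
typedef struct {
    MsgQueue inBuffer;     /* filled by incoming packets */
    MsgQueue nonAffinityQ; /* shared by all worker threads */
    MsgQueue *affinityQ;   /* one queue per worker thread */
    int numWorkers;
} BgNode;

BgNode *bgNodeCreate(int numWorkers) {
    BgNode *n = calloc(1, sizeof(BgNode));
    n->numWorkers = numWorkers;
    n->affinityQ = calloc(numWorkers, sizeof(MsgQueue));
    return n;
}
```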

5 Communication Threads
Communication threads get messages from the inBuffer:
–If it is small work, execute the task itself
–If it is an affinity message, put it in the target thread's local queue
–If it is a non-affinity message, put it in the node queue
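The three routing rules can be sketched as a standalone C function (the enum names here are invented for illustration; the emulator's real types differ):

```c
/* Message categories, mirroring the three cases on this slide. */
typedef enum { SMALL_WORK, AFFINITY, NON_AFFINITY } MsgType;

/* What the communication thread does with the message. */
typedef enum { EXECUTED_INLINE, TO_THREAD_QUEUE, TO_NODE_QUEUE } Disposition;

/* Decide what a communication thread does with a message pulled
 * from the inBuffer. */
Disposition dispatch(MsgType t) {
    switch (t) {
    case SMALL_WORK: return EXECUTED_INLINE; /* run the handler itself */
    case AFFINITY:   return TO_THREAD_QUEUE; /* bound to one worker */
    default:         return TO_NODE_QUEUE;   /* any worker may run it */
    }
}
```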

6 Worker Threads
Worker threads examine messages from two queues: the affinity queue and the non-affinity queue
Compare the receive times of the two head messages, pick the one that comes first, and execute it
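The selection rule amounts to taking the head message with the earlier receive time; a small C sketch (illustrative types, not the emulator's own):

```c
#include <stddef.h>

typedef struct Msg {
    double recvTime; /* timestamp assigned by the sender */
    struct Msg *next;
} Msg;

/* Return the head of whichever queue holds the message with the
 * earlier receive time; either queue may be empty (NULL). */
Msg *pickNext(Msg *affinityHead, Msg *nonAffinityHead) {
    if (affinityHead == NULL) return nonAffinityHead;
    if (nonAffinityHead == NULL) return affinityHead;
    return (affinityHead->recvTime <= nonAffinityHead->recvTime)
               ? affinityHead : nonAffinityHead;
}
```

Executing messages in receive-time order is what makes the emulator's virtual timestamps meaningful for performance estimation.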

7 Low-level API
Class NodeInfo: id, x, y, z, udata, commThQ, workThQ
Class ThreadInfo (thread-private variable): id, type, myNode, currTime
Class BgMessage: node, threadID, handlerID, type, sendTime, recvTime, data
getFullBuffer()
checkReady()
addBgNodeMessage()
addBgThreadMessage()
sendPacket()
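Rendered as C struct sketches (field names follow the slide; the types and layout are guesses, not the emulator's actual declarations):

```c
/* Per-node bookkeeping, as listed on the slide. */
typedef struct NodeInfo {
    int id;
    int x, y, z;   /* torus coordinates of this node */
    void *udata;   /* user-attached node data */
    void *commThQ; /* queue feeding communication threads */
    void *workThQ; /* queue feeding worker threads */
} NodeInfo;

/* Thread-private state. */
typedef struct ThreadInfo {
    int id;
    int type;         /* communication vs. worker thread */
    NodeInfo *myNode; /* node this thread belongs to */
    double currTime;  /* this thread's virtual clock */
} ThreadInfo;

/* A message in flight between emulated nodes. */
typedef struct BgMessage {
    int node;
    int threadID;
    int handlerID;
    int type;
    double sendTime;
    double recvTime;
    char data[1]; /* variable-length payload follows */
} BgMessage;
```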

8 User’s API
BgGetXYZ()
BgGetSize(), BgSetSize()
BgGetNumWorkThread(), BgSetNumWorkThread()
BgGetNumCommThread(), BgSetNumCommThread()
BgRegisterHandler()
BgGetNodeData(), BgSetNodeData()
BgGetThreadID(), BgGetGlobalThreadID()
BgGetTime()
BgSendPacket(), etc.
BgShutdown()
BgEmulatorInit(), BgNodeStart()

9 Bluegene application example - Ring

void BgEmulatorInit(int argc, char **argv)
{
  if (argc < 6) {
    CmiPrintf("Usage: ring x y z numCommTh numWorkTh\n");
    BgShutdown();
  }
  BgSetSize(atoi(argv[1]), atoi(argv[2]), atoi(argv[3]));
  BgSetNumCommThread(atoi(argv[4]));
  BgSetNumWorkThread(atoi(argv[5]));
  passRingID = BgRegisterHandler(passRing);
}

void BgNodeStart(int argc, char **argv)
{
  int x, y, z;
  int nx, ny, nz;
  int data = 888;
  BgGetXYZ(&x, &y, &z);
  nextxyz(x, y, z, &nx, &ny, &nz);
  /* node (0,0,0) starts the ring */
  if (x == 0 && y == 0 && z == 0)
    BgSendPacket(nx, ny, nz, passRingID, LARGE_WORK, sizeof(int), (char *)&data);
}

void passRing(char *msg)
{
  int x, y, z;
  int nx, ny, nz;
  int data = *(int *)msg;
  BgGetXYZ(&x, &y, &z);
  nextxyz(x, y, z, &nx, &ny, &nz);
  /* each time the message returns to (0,0,0), one iteration is done */
  if (x == 0 && y == 0 && z == 0)
    if (++iter == MAXITER) BgShutdown();
  BgSendPacket(nx, ny, nz, passRingID, LARGE_WORK, sizeof(int), (char *)&data);
}
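The example relies on a helper nextxyz() that is not shown on the slide; one plausible implementation walks the nodes in x-major order, carrying into y and then z so the last node wraps back to (0,0,0). This is a guess at the intended traversal, not the original code, and the SX/SY/SZ grid dimensions are stand-ins for the sizes passed to BgSetSize():

```c
/* Assumed grid dimensions (the real code would query BgGetSize()). */
#define SX 2
#define SY 2
#define SZ 2

/* Advance (x,y,z) to the next node of the ring over an
 * SX x SY x SZ grid: increment x, carrying into y and z. */
void nextxyz(int x, int y, int z, int *nx, int *ny, int *nz) {
    *nx = x; *ny = y; *nz = z;
    if (++*nx == SX) { *nx = 0;
        if (++*ny == SY) { *ny = 0;
            if (++*nz == SZ) *nz = 0;
        }
    }
}
```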

10 Performance
Pingpong
–Close to Converse pingpong: us vs. 92 us RTT
–Charm++ pingpong: 116 us RTT
–Charm++ Bluegene pingpong: us RTT

11 Charm++ on top of Emulator
BlueGene thread represents a Charm++ node
Name conflicts:
–Cpv, Ctv
–MsgSend, etc.
–CkMyPe(), CkNumPes(), etc.