© 2008 IBM Corporation. Blue Heron Project. IBM Rochester: Tom Budnik, Amanda Peters. Condor: Greg Thain. With contributions.


2 Blue Heron Project
IBM Rochester: Tom Budnik (tbudnik@us.ibm.com), Amanda Peters (apeters@us.ibm.com)
Condor: Greg Thain
With contributions from:
- IBM Rochester: Mark Megerian, Sam Miller, Brant Knudson and Mike Mundy
- Other IBMers: Patrick Carey, Abbas Farazdel, Maria Iordache and Alex Zekulin
- UW-Madison Condor: Dr. Miron Livny
April 30, 2008
© 2008 IBM Corporation

3 Agenda
- What is the Blue Heron Project?
- Condor and IBM Blue Gene Collaboration
- Introduction to Blue Gene/P
- What applications fit the Blue Heron model?
- How does Blue Heron work?
- Information Sources
- Condor on BG/P demo (Greg Thain)

4 What is the Blue Heron Project?
Blue Heron = Blue Gene/P HTC and Condor
Blue Heron provides a complete integrated solution that gives users a simple, flexible mechanism for submitting single-node jobs.
- Blue Gene looks like a "cluster" from an app’s point of view
- Blue Gene supports a hybrid application environment
- Classic HPC (MPI) apps and now HTC apps
[Diagram: Paths Toward a General Purpose Machine. The Blue Gene environment offers two paths: HTC for serial and pleasantly parallel apps (new, available 5/16/08) and HPC (MPI) for highly scalable message-passing apps.]

5 Condor and Blue Gene Collaboration
- Both the IBM and Condor teams are engaged in adapting code to bring Condor and Blue Gene technologies together
- Previous Activities (BG/L)
  - Prototype/research: Condor running HTC workloads
- Current Activities (BG/P)
  - Blue Heron Project
  - Partner in design of HTC services
  - Condor supports HTC workloads using static partitions
- Future Collaboration (BG/P and BG/Q)
  - Condor supports dynamic machine partitioning
  - Condor supports HPC (MPI) jobs
  - I/O Node exploitation with Condor
  - Persistent memory support (data affinity scheduling)
  - Petascale environment issues

6 Introduction to Blue Gene: Technology Roadmap
- 2004: Blue Gene/L, PPC 440 @ 700 MHz, scalable to 596+ TF
- 2007: Blue Gene/P, PPC 450 @ 850 MHz, scalable to 3+ PF
- Next on the roadmap: Blue Gene/Q
BG/P is the 2nd generation of the Blue Gene family.

7 Introduction to Blue Gene/P
Packaging hierarchy:
- Chip (quad-core PowerPC system-on-chip): 4 processors, 13.6 GF/s, 8 MB EDRAM
- Compute Card: 1 chip, 20 DRAMs, 13.6 GF/s, 2 or 4 GB DDR2
- Node Card: 32 Compute Cards, up to 2 I/O cards, 435 GF/s, 64 or 128 GB
- Rack (cabled): 32 Node Cards, up to 64x10 GigE I/O links, 14 TF/s, 2 or 4 TB
- System: up to 256 racks, up to 3.56 PF/s, 512 or 1024 TB
Highlights:
- Leadership performance in a space-saving, processor-dense, power-efficient package.
- High reliability: designed for less than 1 failure per rack per year (7 days MTBF for 72 racks).
- Easy administration using the powerful web-based Blue Gene Navigator.
- Ultrascale capacity machine (“cluster buster”): run 4,096 HTC jobs on a single rack.
- The system scales from 1 to 256 racks: 3.56 PF/s peak.
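The packaging figures on this slide multiply through from the chip upward. The sketch below redoes that arithmetic using only the numbers quoted on the slide; the function names are illustrative, and the job-slot count assumes one single-core HTC job per core.

```python
# Peak-performance arithmetic using only figures quoted on the slide.
CHIP_GFLOPS = 13.6                 # GF/s per chip (one chip per compute card)
COMPUTE_CARDS_PER_NODE_CARD = 32
NODE_CARDS_PER_RACK = 32
CORES_PER_CHIP = 4


def nodes(racks: int = 1) -> int:
    """Number of compute cards (nodes) in the given number of racks."""
    return racks * NODE_CARDS_PER_RACK * COMPUTE_CARDS_PER_NODE_CARD


def peak_gflops(racks: int = 1) -> float:
    """Peak GF/s for a system with the given number of racks."""
    return nodes(racks) * CHIP_GFLOPS


def htc_job_slots(racks: int = 1) -> int:
    """HTC job slots, assuming one single-core job on every core."""
    return nodes(racks) * CORES_PER_CHIP


node_card_gflops = COMPUTE_CARDS_PER_NODE_CARD * CHIP_GFLOPS
print(f"node card: {node_card_gflops:.1f} GF/s")
print(f"one rack:  {peak_gflops(1) / 1e3:.2f} TF/s")
print(f"256 racks: {peak_gflops(256) / 1e6:.3f} PF/s")
print(f"HTC job slots per rack: {htc_job_slots(1)}")
```

The results agree with the slide's rounded values: about 435 GF/s per node card, about 14 TF/s per rack, about 3.56 PF/s at 256 racks, and 4,096 job slots per rack.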

8 What applications fit the Blue Heron model?
- Master/Worker Paradigm:
  - Many “pleasantly parallel” apps on BG/P use a compute node as the “master node”
- Advantage of Blue Heron (HTC) Solution:
  - Move the “master node” from a Blue Gene compute node to the Front-End Node (FEN). This is a better solution for the following reasons:
  - Application resiliency: in the MPI model a single node failure kills the entire app for the partition. In HTC mode only the job running on the failed node is ended; other single-node jobs continue to run on the partition.
  - The FEN has more memory, better performance, and more functionality than a single compute node
  - Code that runs on the compute nodes is much cleaner, since it only contains the work to be performed and leaves the coordination to a script or scheduler (NO MPI NEEDED)
  - The coordinator functionality can be a Perl script, Python, a compiled program, or anything that runs on Linux
  - The coordinator can interact directly with DB2 or MySQL, either to get the inputs for the application or to store the results. This can eliminate the need to create a flat-file input for the app, or to generate the results in an output file.
- Example: American Monte Carlo (options pricing)
  Reference: en.wikipedia.org/wiki/Monte_Carlo_methods_in_finance
MPI master/worker skeleton (the pattern that HTC mode replaces):

    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            // master: send work to other nodes and collect results
        } else {
            // worker: do real work
        }
        MPI_Finalize();
        return 0;
    }
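To make the coordinator idea concrete, here is a minimal sketch of a coordinator that farms out independent tasks and keeps going when one of them fails. Everything in it is an assumption for illustration: `run_task` runs locally in a thread instead of being submitted to a compute node, the payoff formula is a toy rather than a real pricing model, and the failure for seed 3 is simulated.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_task(seed: int) -> float:
    """Stand-in for one single-node job: a tiny Monte Carlo estimate.

    In a real HTC setup this would be a separate executable launched on a
    compute node; here it runs locally so the sketch is self-contained.
    """
    if seed == 3:  # hypothetical failure, to show that other tasks survive
        raise RuntimeError("simulated node failure")
    rng = random.Random(seed)
    # Toy payoff: max(price - strike, 0) for normally distributed prices.
    samples = [max(rng.gauss(100.0, 20.0) - 100.0, 0.0) for _ in range(10_000)]
    return sum(samples) / len(samples)


def coordinate(seeds):
    """Coordinator: farm out tasks, collect results, tolerate failures."""
    results, failed = {}, []
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {seed: pool.submit(run_task, seed) for seed in seeds}
        for seed, future in futures.items():
            try:
                results[seed] = future.result()
            except RuntimeError:
                failed.append(seed)  # only this one task is lost
    return results, failed


results, failed = coordinate(range(8))
estimate = sum(results.values()) / len(results)
print(f"{len(results)} tasks succeeded, failed seeds: {failed}")
print(f"combined estimate: {estimate:.2f}")
```

Because the worker code contains no communication logic, the failed seed can simply be resubmitted by the coordinator without restarting the other tasks.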

9 How does Blue Heron work? “Software Architecture Viewpoint”
Design Goals:
- Lightweight
- Extreme scalability
- Flexible scalability
- High throughput (fast)

10 How does Blue Heron work? “End user perspective”
Submitting jobs (typically from the FEN):
- “submit” client:
  - Acts as a shadow or proxy for the real job running on the compute node; very lightweight
  - Submits jobs to a location or a pool
  - Pool ID concept: a scheduler alias for a collection of partitions available to run a job on
  - Location: the resource where the job will execute, in the form of a processor or wildcard location
  - Example #1 (submit to location): submit -location "R00-M0-N00-J05-C00" -exe hello_world
  - Example #2 (submit to pool): submit -pool BIOLOGY -exe hello_world
- Job scheduler example:
  - Submit jobs using Condor (“condor_submit”)
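A coordinator script would typically assemble these command lines programmatically. The sketch below builds the argument list for the two examples above; the helper name `build_submit_command` is hypothetical, and the rule that exactly one of pool or location must be given is an assumption, not something the slide states.

```python
from typing import List, Optional


def build_submit_command(exe: str,
                         pool: Optional[str] = None,
                         location: Optional[str] = None) -> List[str]:
    """Return an argv list for the submit client (nothing is executed)."""
    # Assumption: a job targets either a pool or a location, never both.
    if (pool is None) == (location is None):
        raise ValueError("give exactly one of pool or location")
    target = ["-pool", pool] if pool is not None else ["-location", location]
    return ["submit", *target, "-exe", exe]


print(build_submit_command("hello_world", location="R00-M0-N00-J05-C00"))
print(build_submit_command("hello_world", pool="BIOLOGY"))
```

Returning a list rather than a single string means the result can be handed to `subprocess.run` without shell quoting concerns.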

11 Navigator: viewing active HTC jobs running on Blue Gene partitions (blocks)

12 Navigator: viewing HTC job history on Blue Gene

13 Information Sources
- Official Website: www.ibm.com/servers/deepcomputing/bluegene.html
- Blue Gene Redbooks and Redpapers: for the latest list go to www.redbooks.ibm.com and search for “Blue Gene”
- IBM Journal of Research and Development:
  - researchweb.watson.ibm.com/journal/rd/521/team.html
  - www.research.ibm.com/journal/rd49-23.html
- Research Site: www.research.ibm.com/bluegene/index.html
- TOP500 List: www.top500.org
- Green500 List: www.green500.org

14 Condor using HTC on BG/P
Demo: Rosetta++ with MySQL
- Rosetta++ is a protein structure prediction algorithm
  - It is very well suited to HTC, since it runs many simulations of the same protein, using different random number seeds
  - The run that produces the lowest-energy model among those attempted is the “solution”
- Rosetta++ had already been shown to work on Blue Gene by David Baker’s lab
  - Our goal was to show that it runs well in HTC mode
- Very few code changes were required:
  - Compiled for Blue Gene, but using the single-node version (NO MPI)
  - Changed a few places that did file output to use stdout, since that made it easier for the submitting script to associate each task with its results
- Created a simple database front-end using both DB2 and MySQL, to contain the proteins and the seeds
- A Perl script reads inputs from the database, submits each task to Condor, and processes results back into the database
- Demonstrates HTC mode using Condor, with perfect linear scaling and no MPI
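The shape of that workflow (read tasks from a database, run each one, write results back, keep the lowest energy per protein) can be sketched as follows. This is not the demo's Perl script: it uses Python with an in-memory SQLite database standing in for DB2/MySQL, the table layout and protein names are invented, and `simulate` is a hypothetical placeholder that returns a pseudo-random number instead of submitting a Rosetta++ run to Condor.

```python
import random
import sqlite3


def simulate(protein: str, seed: int) -> float:
    """Hypothetical stand-in for one Rosetta++ run; returns a fake energy."""
    return random.Random(f"{protein}:{seed}").uniform(-200.0, -100.0)


# In-memory SQLite stands in for the DB2/MySQL front-end (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (protein TEXT, seed INTEGER, energy REAL)")
conn.executemany(
    "INSERT INTO tasks (protein, seed) VALUES (?, ?)",
    [(protein, seed) for protein in ("1abc", "2xyz") for seed in range(5)],
)

# Read the inputs, run each task, and store its result back in the database.
rows = conn.execute("SELECT rowid, protein, seed FROM tasks").fetchall()
for rowid, protein, seed in rows:
    energy = simulate(protein, seed)  # a real script would submit a job here
    conn.execute("UPDATE tasks SET energy = ? WHERE rowid = ?", (energy, rowid))

# The lowest-energy run for each protein is taken as its "solution".
best = {}
for protein, seed, energy in conn.execute("SELECT protein, seed, energy FROM tasks"):
    if protein not in best or energy < best[protein][1]:
        best[protein] = (seed, energy)

for protein, (seed, energy) in sorted(best.items()):
    print(f"{protein}: best seed {seed}, energy {energy:.2f}")
```

Keeping inputs and results in the database is what lets the submitting script associate each task with its output without managing per-task files.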

15 Questions?

16 Backup Slides

17 What are the Blue Gene System Components?
- Blue Gene Rack(s): hardware/software
- Host System: Service Node and Front End (login) Nodes, with SuSE SLES/10, HPC SW stack, file servers, storage subsystem, XLF/C compilers, DB2
- 3rd party: Ethernet switch

18 Blue Gene Integrated Networks
- Torus
  - Compute nodes only
  - Direct access by app
  - DMA
- Collective
  - Compute and I/O node attached
  - 16 routes allow multiple network configurations to be formed
  - Contains an ALU for collective operation offload
  - Direct access by app
- Barrier
  - Compute and I/O nodes
  - Low-latency barrier across system (< 1 usec for 72 racks)
  - Used to synchronize time bases
  - Direct access by app
- 10Gb Functional Ethernet
  - I/O nodes only
- 1Gb Private Control Ethernet
  - Provides JTAG, i2c, etc. access to hardware; accessible only from the Service Node
- Clock network
  - Single clock source for all racks

19 Blue Gene is the most Power, Space, and Cooling Efficient Supercomputer (published specs per peak performance)
[Chart comparing IBM BG/P with other systems; values not recoverable from the transcript.]

20 Blue Gene is Orders of Magnitude more Reliable than other Platforms
Results of a survey conducted by Argonne National Lab on 10 clusters ranging from 1.2 to 365 TFlops (peak); excluding storage subsystem, management nodes, SAN network equipment, and software outages.
* Estimated based on reliability improvements implemented in BG/P compared to BG/L
[Chart values as extracted, labels not recoverable: 394 127 1 800 <1 *]

21 Blue Gene Software Hierarchical Organization
- Compute nodes are dedicated to running the user application and almost nothing else: simple compute node kernel (CNK)
- I/O nodes run Linux and provide a more complete range of OS services: files, sockets, process launch, signaling, debugging, and termination
- Service node performs system management services (e.g., heart beating, monitoring errors), transparent to application software

22 BG/P Job Modes allow Flexible use of Compute Node Resources
- Quad Mode (also called Virtual Node Mode)
  - All 4 cores run 1 process each
  - No threading
  - Each process gets ¼ node memory
  - MPI/HTC programming model
- Dual Mode
  - 2 cores run 1 process each
  - Each process may spawn 1 thread on a core not used by the other process
  - Each process gets ½ node memory
  - MPI/OpenMP/HTC programming model
- SMP Mode
  - 1 core runs 1 process
  - Process may spawn threads on each of the other cores
  - Process gets full node memory
  - MPI/OpenMP/HTC programming model
[Diagrams: for each mode, the layout of application processes and threads across Core 0 to Core 3 and the node's memory address space.]
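The trade-off between the three modes is easier to see as numbers. The sketch below encodes the per-mode rules from this slide and derives memory per process and jobs per rack; the class and function names are illustrative, and the figure of 1,024 nodes per rack is taken from the packaging slide (32 node cards of 32 compute cards each).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobMode:
    name: str
    processes_per_node: int
    threads_per_process: int  # including the process's main thread


MODES = {
    "quad": JobMode("Quad (Virtual Node)", 4, 1),  # no threading
    "dual": JobMode("Dual", 2, 2),                 # 1 extra thread each
    "smp": JobMode("SMP", 1, 4),                   # threads on other 3 cores
}

NODES_PER_RACK = 32 * 32  # node cards per rack x compute cards per node card


def memory_per_process_gb(mode: str, node_memory_gb: float) -> float:
    """Each process gets an equal share of the node's memory."""
    return node_memory_gb / MODES[mode].processes_per_node


def jobs_per_rack(mode: str) -> int:
    """Single-process jobs that fit on one rack in the given mode."""
    return NODES_PER_RACK * MODES[mode].processes_per_node


for key, mode in MODES.items():
    print(f"{mode.name}: {jobs_per_rack(key)} jobs/rack, "
          f"{memory_per_process_gb(key, 2.0)} GB each on a 2 GB node")
```

Quad mode maximizes job count (the 4,096 jobs per rack quoted earlier), while SMP mode gives each job the whole node's memory.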

23 Why and for What is Blue Gene Used?
- Improve understanding: significantly larger scale, more complex and higher resolution models; new science applications
- Multiscale and multiphysics: from atoms to mega-structures; coupled applications
- Shorter time to solution: answers from months to minutes
Application areas:
- Physics – Materials Science, Molecular Dynamics
- Environment and Climate Modeling
- Life Sciences: Sequencing
- Biological Modeling – Brain Science
- Computational Fluid Dynamics
- Life Sciences: In-Silico Trials, Drug Discovery
- Financial Modeling, Streaming Data Analysis
- Geophysical Data Processing, Upstream Petroleum

24 Many Computational Science Modeling and Simulation Algorithms and Numerical Methods are Massively Parallel

25 What applications fit the Blue Heron model?
- A wide range of applications can run in HTC mode
  - Many applications that run on Blue Gene today are “embarrassingly (pleasantly) parallel” or “independently parallel”
  - They don’t exploit the torus for MPI communication and just want a large number of small tasks, with a coordinator of results
HTC Application Identification (Source: Grid.org)
- Solution Statement:
  - A high-throughput computing (HTC) application is one in which the same basic calculation must be performed over many independent input data elements and the results collected. Because each calculation is independent, it is extremely easy to spread calculations out over multiple cluster nodes. For this reason, high-throughput applications are sometimes called “embarrassingly parallel.” HTC applications occur much more frequently than one might think, showing up in areas such as parameter studies, search applications, data analytics, and what-if calculations.
- Identifying an HTC application:
  - There are a number of identifiers you can use to determine if your specific computing problem fits into the category of a high-throughput application:
  - Do you need to run many instances of the same application with different arguments or parameters?
  - Do you need to run the same application many times with different input files?
  - Do you have an application that can select subsets of the input data and whose results can be combined by a simple merge process such as concatenating, placing them into a single database, or adding them together?
  - If the answer to any of these questions is “yes,” then it is quite likely that you have an HTC application.
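The third question (subsets of the input whose results combine by a simple merge) can be checked on a toy problem. In this sketch the calculation, the data, and the chunk count are all made up for illustration; the point is only that processing independent subsets and adding the partial results gives the same answer as a single serial run.

```python
def split(items, n_chunks):
    """Deal the items into n_chunks independent subsets (round-robin)."""
    return [items[i::n_chunks] for i in range(n_chunks)]


def task(chunk):
    """The 'same basic calculation', applied to one subset of the input."""
    return sum(x * x for x in chunk)


data = list(range(1, 101))

# Each subset could run as a separate single-node job; here they run in turn.
partials = [task(chunk) for chunk in split(data, 8)]

# Merge step: the partial results are simply added together.
merged = sum(partials)

print(merged == task(data))  # the merge reproduces the serial result
```

If a problem cannot be decomposed so that a check like this passes, the tasks are not independent and the HTC model is probably a poor fit.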

26 How does Blue Heron work?
Key Features:
- Provides a job submit command that is simple, lightweight, and extremely fast
- Job state is integrated into the Control System database, so administrators know which nodes have jobs and which are idle
- Provides stdin/stdout/stderr on a per-job basis
- Enables individual jobs to be signaled or killed
- Maintains a user ID on a per-job basis (allows multiple users per partition)
- Blue Gene Navigator shows HTC jobs (active or in history) with job exit status and runtime stats
- Designed for easy integration with job schedulers (e.g. Condor, LoadLeveler, SIMPLE, etc.)

27 submit command
./submit [options] or
./submit [options] binary [arg1 arg2 ... argn]
Job options:
  [-]-exe                   executable to run
  [-]-args "arg1 arg2 ... argn"   arguments, must be enclosed in double quotes
  [-]-env                   add an environmental for the job
  [-]-exp_env               export an environmental to the job's environment
  [-]-env_all               add all current environmentals to the job's environment
  [-]-cwd                   the job's current working directory
  [-]-timeout               number of seconds before the job is killed
  [-]-strace                run job under system call tracing
Resource options:
  [-]-mode                  the job mode
  [-]-location              compute core location to run the job
  [-]-pool                  compute node pool ID to run the job
Options:
  [-]-port                  listen port of the submit mux to connect to (default 10246)
  [-]-trace                 tracing level, default(6)
  [-]-enable_tty_reporting  disable the default line buffering of stdin, stdout, and stderr when input (stdin) or output (stdout/stderr) is not a tty
  [-]-raise                 if a job dies with a signal, submit will raise this signal
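The -raise option matters to a wrapper script because it changes what the wrapper sees when a job is killed. The sketch below assumes a Python wrapper that launches the submit client through the standard `subprocess` module, which reports a child terminated by signal N as a return code of -N on POSIX systems; `describe_exit` is a hypothetical helper, and how submit reports a job's exit status in other cases is not covered by the slide.

```python
import signal


def describe_exit(returncode: int) -> str:
    """Turn a subprocess-style return code into a readable job outcome."""
    if returncode == 0:
        return "job exited normally"
    if returncode < 0:
        # Negative value: the child process was terminated by a signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"terminated by {name}"
    return f"exited with status {returncode}"


print(describe_exit(0))
print(describe_exit(-signal.SIGTERM))
print(describe_exit(2))
```

A coordinator could use a check like this to decide whether a task should be resubmitted or reported as a genuine application error.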

