Published by Deborah Sparks. Modified over 8 years ago.
1
Building the TeraGrid: An Update
Pete Beckman
Director of Engineering, TeraGrid
Argonne National Laboratory
2
NPACI 2003 AHM, 3/19/2003
The Last Time I Spoke Here: Developing a Cluster Strategy for NPACI
Friday, February 11, 2000
Chaired by Andrew Chien, UCSD
The advent of robust large-scale clusters is revolutionizing the high-performance computing landscape. This session will include briefs from leading efforts around the nation and from applications users who are leading the demand for cluster computing. Our objective is to define what NPACI need do to be a leader in bringing cluster HPC to the user community.
Speakers:
- Pete Beckman, Los Alamos National Laboratories
- David Culler, University of California, Berkeley
- Rob Pennington, Alliance (NCSA)
- Rolf Riesen, Sandia National Laboratories
- Detlef Stammer, Scripps Institution of Oceanography (UCSD)
3
The Expansion of Linux in the Top500
4
Then, Now, and the Future
Then:
- Linux clusters breaking on the scene
- Linux clusters broken on the scene
- NPACI identifies critical inflection and technology shift
Now:
- TeraGrid: $89M to build the world’s most ambitious Grid, Linux, compute, and data infrastructure
- Tremendous investment and expertise in Linux and Grid technologies
Future:
- Hold on… more later…
5
The TeraGrid Vision
Distributing the resources is better than putting them at one site
- Build new, extensible, grid-based infrastructure to support grid-enabled scientific applications
  - New hardware, new networks, new software, new practices, new policies
- Transform existing centers into a prototype for cyberinfrastructure
  - Distributed, coordinated operations center
  - Exploit unique partner expertise and resources to make the whole greater than the sum of its parts
- Leverage homogeneity to make distributed computing easier and to simplify initial development and standardization
  - Run a single job across the entire TeraGrid
  - Move executables between sites
6
What Is Being Built?
- An environment that is the ideal target for Grid-based projects and applications
- Easy will make the difference
- Traditional view: big iron and fast networks
- New view: easy-to-use Grid Hosting Environment
- Grids… uh, we install Globus, right?
7
Infrastructure, Infrastructure, Infrastructure
(Carl is almost always right)
It is the Software and the Organization
8
A Grid Hosting Environment
An SLA for a virtual organization that hosts other virtual organizations on a Grid environment
Classic Unix-like Environment:
- /bin/sh, /bin/cp, /bin/ls, Unix file system and tools, dev tools (make, compilers), etc.
Web Hosting Environment Example:
- PHP, Perl, Python scripting
- MySQL, FrontPage
- 100 POP accounts, 100 MB disk
- SMTP, IMAP, and Webmail
- US$49 per year
Special Capabilities:
- Experimental math libraries
- Unique storage system
- Large shared memory arch….
TeraGrid Hosting Environment:
- Single contact: help@teragrid.org
- Unified Ops center
- Certified Software Stack: MPICH, Globus GridFTP, BLAS, Linpack, Atlas, SoftEnv, gsi-ssh, $TG_SCRATCH, …
- $89 Million
9
“TeraGrid Roaming”
It must be easy, it must be easy, it must be easy
- Nearly eliminate the barrier to entry
- Develop application at ANL, run at NCSA
- Run at CalTech with data from SDSC
- Run large job across all sites
- Unified accounting and billing
- Predictable levels of service
Sites: NCSA, ANL, PSC, SDSC, CalTech
Launching a new service is hard:
- Enormous investment
- Ubiquitous
- Easy
- Paying customers (remember Iridium?)
10
Job #1: Create virtual organization for TG participants
Single, Distributed Team:
- Software Testing, QA, Verification
- Software Stack
- Advanced Networking
- Data Services
- Viz Services
- Accounting SW
User-visible Org:
- Operations Center
- User Services
- Accounting
CVS Repo: *.pl, *.doc
Developer’s Org:
- EC
- Directors
- Site Leads
- Working Group Leads
- Engineers
It is not your code, it is our code
It is not your doc, it is our doc
One repo to rule them all, one repo to find them,
one repo to bring them all and in the grid-world bind them
11
TeraGrid Software Stack V1.0
A social contract with the user:
- LORA: Learn Once, Run Anywhere
- Precise definitions:
  - Services
  - Software
  - User Environment
- Reproducibility
  - Standard configure, build, and install
  - Single CVS repository for software
12
Testing, Verification, Monitoring
- New software, built to create the Grid Hosting Environment
- Goal: use ‘unit test’, ‘version reporter’, and ‘integration tests’ to assure the quality of each component in the system
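The slide names a ‘version reporter’ without describing it. As a rough illustration only, here is a minimal sketch of what such a tool might do, assuming each package can be asked for its version through some command. The package names, the commands, and the function names below are all hypothetical and are not taken from the TeraGrid software.

```python
import re
import shlex
import subprocess
from typing import Callable, Dict, Optional

# Matches dotted version numbers such as "1.2.5" or "2.0".
VERSION_RE = re.compile(r"\d+(?:\.\d+)+")


def parse_version(output: str) -> Optional[str]:
    """Return the first dotted version number in `output`, or None."""
    match = VERSION_RE.search(output)
    return match.group(0) if match else None


def run_command(command: str) -> str:
    """Run a command and return its stdout followed by its stderr."""
    result = subprocess.run(
        shlex.split(command), capture_output=True, text=True, check=False
    )
    return result.stdout + result.stderr


def report_versions(
    commands: Dict[str, str],
    run: Callable[[str], str] = run_command,
) -> Dict[str, Optional[str]]:
    """Map each package name to the version its command reports.

    `commands` maps a (hypothetical) package name to the command that
    prints its version. A package whose command cannot be started, or
    whose output contains no version number, is reported as None.
    """
    report: Dict[str, Optional[str]] = {}
    for package, command in commands.items():
        try:
            report[package] = parse_version(run(command))
        except OSError:
            # The command is not installed or could not be executed.
            report[package] = None
    return report
```

Passing the runner in as a parameter keeps the reporter itself easy to unit test: a test can substitute a function that returns canned output instead of launching real programs.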
13
Software Development Principles For Building The TeraGrid
- Drive development with applications
- The TG Software Stack and environment are homogeneous across all sites, except where a difference is clearly justified to the users and driven by their requirements
- Every package is versioned, and its build/install/config parameters are reproducible
- Every package has version, unit, and integrated tests
- A Test Harness is constantly working to ensure stability and conformance to the TG Hosting Environment
- After successful deployment on the TestGrid, new components are tested on the ProdGrid
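The homogeneity and conformance principles above can be pictured as a simple comparison between a certified version list and what a site actually has installed. The following is a minimal sketch under that assumption, not the actual TeraGrid Test Harness; the function name, the data layout, and the package names in the usage note are hypothetical.

```python
from typing import Dict, FrozenSet, List


def conformance_problems(
    certified: Dict[str, str],
    site_versions: Dict[str, str],
    justified_exceptions: FrozenSet[str] = frozenset(),
) -> List[str]:
    """List the ways a site deviates from the certified stack.

    `certified` maps package names to required versions, and
    `site_versions` maps package names to the versions found at one
    site. Packages named in `justified_exceptions` are skipped, which
    models a difference that is "clearly justified to the users".
    An empty result means the site conforms.
    """
    problems: List[str] = []
    for package, expected in sorted(certified.items()):
        if package in justified_exceptions:
            continue
        actual = site_versions.get(package)
        if actual is None:
            problems.append(f"{package}: missing, expected {expected}")
        elif actual != expected:
            problems.append(f"{package}: found {actual}, expected {expected}")
    return problems
```

For example, with a certified stack of `{"pkg_a": "1.2.5", "pkg_b": "2.0"}` and a site reporting `{"pkg_a": "1.2.5", "pkg_b": "2.1"}`, the function returns a single entry describing the `pkg_b` mismatch. A harness could run a check like this on a schedule at every site and raise an alert whenever the list is non-empty.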
14
TeraGrid: Round 2, More Sites
You must be this high to ride the TeraGrid:
- Fast network
- Non-trivial resources
- Meet SLA (testing and QA requirements)
- Become a member of the virtual organization
- Capable of TG hosting (peering arrangements)
- TG Software Environment:
  - User (download, configure, install, and run TG 1.0)
  - Developer (join distributed engineering team)
- TG Virtual Organization
  - Operations, User-services
- Add new capability, make the whole greater than the sum of its parts
15
Early TG Performance Results
Caveat: brand new IA64 hardware, fast lambda networking
- 1000s of files in the CVS Repo
- 10s of people working together from traditionally competitive sites
- 1000s of bytes of jointly developed source code
- 10s of co-developed policies and documents
- 0 fist fights (avoided an SDSC/NCSA arm wrestling match)
- 1 TeraGrid Infrastructure for users to build Grid-capable applications and build scientific collaboration
16
The Cool New Future…
- Yawn… a big Linux cluster, big disk array, fast network, and big tape robot
- An Infrastructure! (Cybertools, Software)
- Grid-enabled applications are developed for Grid hosting environments such as the TeraGrid
- Virtual organizations collaborate, share data and resources as easily as using Kazaa
17
New Technologies
The next inflection point…
- Lambda Routing
  - Remote storage pipes on demand
- Grid Skins
  - We want more than your hosted environment, we want to extend it with our Grid view: Load(HEP-skin);
- New Linux cluster technologies
  - HW accelerators, Reconfig comp., BG/BG