OAK RIDGE NATIONAL LABORATORY / U.S. DEPARTMENT OF ENERGY
Deployment, Deployment, Deployment
March 2002
Randy Burris, Center for Computational Sciences, Oak Ridge National Laboratory

Overview of this presentation
- Our goal: let scientists (our customers) do science without worrying about their computer environment
- Our clientele:
  - Four disciplines (climate, astrophysics, genomics and proteomics, high-energy physics)
  - National labs and universities
  - Using resources all over the country
  - Residing all over the place
- We must deploy the result ("Deploy or die")

Well, OK. But… deploy what?
- Where are the commonalities in our space?
  - Security and trust – nonexistent to extreme
  - Network connectivity – dialup to OC12
  - File sizes – bytes to terabytes
  - File location – local unit to partitions around the world
  - Visualization – static to dynamic real-time
  - And so on.
- We can't do it all.
- So exactly what are we going to deploy?
- And how should we proceed?

Achieving successful deployment
- For each of the 4 projects, define basic steps:
  - Define target environment(s)
  - Characterize successful deployment (in each)
  - Prototype in a close-to-production environment
  - Deploy in production
- In parallel with the above:
  - Produce documentation at every step
  - Develop tools for support staff
- Start now.

Step 1: Define target environment(s)
- We cannot support all combinations.
  - Security – {DCE, Kerberos, PKI, gss}, firewalls, …
  - Compute resource – MPP, cluster, workstation, …
  - User platform – MPP, cluster, Unix/Linux, Windows, …
  - Storage
    - Storage resource – HPSS, PVFS, …?
    - User API for access to data – NetCDF, HDF5, both, something else?
    - HRM, pftp, GridFTP, hsi, …
  - Network
    - WAN – GigE/jumbo, FastE, OC12, OC3, ESnet, hops, …
    - LAN – GigE, FastE, iSCSI, FibreChannel, …
  - Visualization – CAVE, workstations, Palm Pilots, …
- We will have to choose (one way to record the choice is sketched below).
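To make "we will have to choose" actionable, the chosen combinations can be written down as data that both the documentation and the support tooling read from. The sketch below is a hypothetical illustration only: the project name and the marked choices are invented, not decisions the projects have made.

```python
# Hypothetical sketch of a target-environment support matrix.
# Dimension values come from the slide above; the "supported" choices
# are placeholders, not decisions that have actually been made.

CANDIDATES = {
    "security": ["DCE", "Kerberos", "PKI", "gss"],
    "storage":  ["HPSS", "PVFS"],
    "data_api": ["NetCDF", "HDF5"],
    "transfer": ["HRM", "pftp", "GridFTP", "hsi"],
    "wan":      ["GigE/jumbo", "FastE", "OC12", "OC3"],
}

# Example commitment for one project (hypothetical).
SUPPORTED = {
    "climate": {
        "security": "Kerberos",
        "storage":  "HPSS",
        "data_api": "NetCDF",
        "transfer": "hsi",
        "wan":      "OC12",
    },
}

def is_supported(project: str, dimension: str, value: str) -> bool:
    """Return True if `value` is the committed choice for this project."""
    return SUPPORTED.get(project, {}).get(dimension) == value

if __name__ == "__main__":
    # Quick check: which candidate transfer tools are covered for "climate"?
    for tool in CANDIDATES["transfer"]:
        status = "supported" if is_supported("climate", "transfer", tool) else "not supported"
        print(tool, status)
```

Keeping such a matrix in one machine-readable place would let the white papers and the support scripts stay consistent as the choices evolve.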

Step 2: Characterize successful deployment
- A. Correct operation in the security environment
- B. Optimized performance in the target network environment
- C. Rugged infrastructure
- D. Unobtrusive infrastructure
- E. Thorough documentation for users and support staff

Step 2: Characterize – A: Security
- I believe we must define the environment into which we intend to deploy.
  - Starting now
  - Because it will take a long time and will almost certainly require development.
- Questions to which we need answers:
  - Are we concerned with DOE sites or DOE+NSF+…?
  - Are there circumstances where clear-text passwords are OK? Where no security is OK?
  - Must we support authentication in PKI, GSI, DCE and/or Kerberos?
  - Will all of our infrastructure work with firewalls at one or both ends of a transfer? Whose firewalls, what filtering parameters, …?

Step 2: Characterize – B: Network
- On what network are the end nodes?
- What is our target environment – ESnet, ESnet+Internet2, Grid, www, …?
- What throughput is needed for effective science? (A back-of-envelope estimate follows below.)
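The throughput question on this slide reduces to simple arithmetic once a project states how much data must move and how long a scientist will tolerate waiting. The sketch below uses invented example sizes and windows; the OC12/OC3 figures in the comment are nominal line rates, not measured throughput.

```python
# Back-of-envelope throughput estimate: how fast must the network be
# to move a dataset of a given size within a given window?
# The example sizes and windows are hypothetical.

def required_mbps(dataset_gb: float, window_hours: float) -> float:
    """Sustained throughput (megabits/s) needed to move dataset_gb in window_hours."""
    bits = dataset_gb * 8e9          # gigabytes -> bits (decimal units)
    seconds = window_hours * 3600.0
    return bits / seconds / 1e6      # bits/s -> Mb/s

if __name__ == "__main__":
    cases = [
        ("1 TB overnight (8 h)", 1000.0, 8.0),
        ("100 GB in 1 h",        100.0,  1.0),
    ]
    for label, gb, hours in cases:
        print(f"{label}: ~{required_mbps(gb, hours):.0f} Mb/s sustained")
        # For reference: OC12 is roughly 622 Mb/s and OC3 roughly 155 Mb/s (line rates).
```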

Step 2: Characterize – C: Rugged
- Must not crash (of course)
- Must be in service when needed (a minimal liveness-check sketch follows below)
- Must be secure
- Must have a support plan (one that does not require an army of support people)
- Must have a trouble-resolution mechanism and resources
- Must be survivable over normal maintenance
  - System software patches and upgrades
  - Equipment upgrades
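"In service when needed" implies routine liveness checks by support staff rather than waiting on user trouble reports. A minimal sketch is below, assuming hypothetical host names and a bare TCP reachability test; a real support plan would check service-level health (HPSS movers, disk caches, batch queues), not just open ports.

```python
# Minimal availability probe: can we open a TCP connection to each
# service endpoint? Host names and ports here are hypothetical placeholders.
import socket

ENDPOINTS = {
    "hpss-core.example.ornl.gov": 1234,   # placeholder port, not a real service port
    "probe-node1.example.ornl.gov": 22,   # ssh
}

def reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for host, port in ENDPOINTS.items():
        print(f"{host}:{port}", "up" if reachable(host, port) else "DOWN")
```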

Step 2: Characterize – D: Unobtrusive
- User should need minimal knowledge
  - The deeper the infrastructure, the less the user should need to know
- User should be protected from mistakes
  - Try not to let the user screw things up
  - Documentation and real-time warnings
  - Effective defaults (see the sketch below)
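"Effective defaults" can be made concrete in the client tools: the user supplies only the essentials and the infrastructure fills in tuned values, warning on anything it does not recognize. The sketch below is hypothetical, showing the shape of such a wrapper rather than any existing tool; the option names and default values are invented for illustration.

```python
# Hypothetical transfer wrapper illustrating "effective defaults":
# the user names source and destination; everything else is defaulted
# to values the support staff have tuned for the site.

DEFAULTS = {
    "tcp_buffer_bytes": 4 * 1024 * 1024,  # tuned for a high-latency WAN path
    "parallel_streams": 4,
    "retries": 3,
}

def transfer(source: str, destination: str, **overrides) -> dict:
    """Build the effective transfer settings; warn on unknown overrides."""
    settings = dict(DEFAULTS)
    for key, value in overrides.items():
        if key not in DEFAULTS:
            print(f"warning: ignoring unknown option {key!r}")  # real-time warning
            continue
        settings[key] = value
    settings["source"], settings["destination"] = source, destination
    # A real tool would now hand these settings to the chosen transfer protocol.
    return settings

if __name__ == "__main__":
    print(transfer("/local/run42.h5", "hpss:/proj/tsi/run42.h5"))
    print(transfer("/local/run42.h5", "hpss:/proj/tsi/run42.h5", parallel_streams=8))
```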

Step 2: Characterize – E: Documentation
- White papers to inform the larger community
- For users: how-to-use documents
- For system-admin staff:
  - How to install, debug, maintain, troubleshoot
- For user-support staff:
  - How to troubleshoot
  - Tuning knobs
- For programmers:
  - Overview documents to give context
  - Correct interface documents
  - Correct documentation for all appropriate platforms

Step 3: Prototype in a close-to-production environment
- Example of the deployment approach on Probe:
  - Deploy early prototypes in Oak Ridge and NERSC
    - Use Probe, Probe HPSS, production HPSS and supercomputers
    - Use (and require) documented code and procedures
  - As development progresses, evaluate and address deployment issues such as security, network performance, and system-admin documentation
  - As the prototype becomes more robust, migrate more functions to the Oak Ridge and NERSC production environments
  - Continue to evaluate and address deployment issues, which now include user and user-support documentation
  - Iterate as necessary
- When this sequence is done, you're in production.

Overview of ORNL Architecture, March 2002 (architecture diagram)
- Probe and Production segments connected by Gigabit Ethernet (jumbo frames) and an external ESnet router
- Components shown: Stingray (RS/6000 S80), Marlin (RS/6000 H70), other Probe nodes, Probe HPSS and Production HPSS with disk caches, STK libraries, IBM and Compaq supercomputers and a 64-node Linux cluster, Origin 2000 "Reality Monster", CAVE, 220 GB SCSI RAID, 360 GB Sun FibreChannel RAID, 360 GB FibreChannel RAID, 600 GB SCSI JBOD

Example: How the Terascale Supernova Initiative could be prototyped (architecture diagram)
- Components shown: Stingray (RS/6000 S80), Marlin (RS/6000 H70), Origin 2000 "Reality Monster", external ESnet router, IBM and Compaq supercomputers, other Probe nodes, Probe HPSS and Production HPSS, CAVE
- Roles annotated: bulk storage; data reduction and pre-visualization manipulation; rendering

We should start right away:
- Select initial, intermediate and ultimate target environments
  - Including supported applications, platforms, security and target network
  - Describe in a white paper
- Seek common elements in supported applications
  - Develop a deployment plan for common elements
  - Write a white paper describing the deployment plan
    - Specify our approach to deploying support for those elements
    - Identify unmet requirements and how to remedy them
    - Describe our approach to ruggedness and unobtrusiveness
- Address non-common elements in supported applications
  - Seek to minimize their impact
  - Specify our approach to deploying support for those elements
  - Develop deployment plans and describe them
  - Write a white paper describing the deployment plan

DISCUSSION?

Serious questions for early resolution
- What is the role of HPSS?
  - HPSS will never be pervasive – it is expensive.
  - Treat HPSS sites as primary repositories?
- Which file transfer protocol(s) do we support?
  - GridFTP, pftp, hsi (a thin wrapper sketch follows below)
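Whichever protocols are chosen, users can be given one uniform call that hides the underlying tool. The sketch below wraps the Globus 2.0 command-line client globus-url-copy; the destination host and paths are hypothetical, it assumes the user already holds a valid proxy credential, and error handling is minimal.

```python
# Hypothetical wrapper around the GridFTP command-line client.
# Assumes globus-url-copy (from the Globus toolkit) is installed and
# that the caller already has a valid proxy credential.
import subprocess

def gridftp_put(local_path: str, remote_url: str) -> None:
    """Copy a local file to a GridFTP server, e.g. gsiftp://host/path."""
    subprocess.run(
        ["globus-url-copy", f"file://{local_path}", remote_url],
        check=True,   # raise CalledProcessError if the transfer fails
    )

if __name__ == "__main__":
    # Hypothetical destination; replace with a real GridFTP endpoint.
    gridftp_put("/tmp/run42.h5", "gsiftp://hpss-gw.example.ornl.gov/proj/tsi/run42.h5")
```

The same interface could dispatch to pftp or hsi instead, so the eventual protocol decision stays invisible to the user.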

Probe – "Place to be"
Overview of ORNL Probe Cell, February 2002 (architecture diagram)
- Components shown: Stingray (RS/6000 S80), Marlin (RS/6000 H70), RS/6000 B80, RS/ P-170, Sun E250, Sun E450, Sun Ultra 10, IBM F50, SGI Origin 200, Intel dual P-III Linux node, Origin 2000 "Reality Monster", IBM and Compaq supercomputers, Compaq DS, STK silos, 3494 library, 200 GB SCSI RAID disks, Sun FibreChannel disks, 360 GB STK FibreChannel disks, FibreChannel switch, GSN switch and GSN bridge, Gigabit Ethernet, external ESnet router, link to NERSC Probe
- Probe and Production segments

Backup slide

Technology on hand and available
- Software
  - HPSS (unlimited instantiations) and HPSS development license
  - HDF5, NetCDF (an HDF5 usage sketch follows below)
  - R, ggobi
  - gcc suite
  - C on Solaris, AIX, IRIX and Tru64
  - Fortran on AIX
  - Oracle 8i and DB2 (current developer's editions) on AIX
  - Globus 2.0 on AIX and Solaris
  - HRM
  - Inter-HPSS hsi application
  - OPNET modeling product
  - MPI/IO testbed
- 18 nodes – IBM/AIX, Sun/Solaris, SGI/IRIX, Compaq/Tru64
- GRID nodes (Sun/Solaris, IBM/AIX, possibly Linux)
- ESnet III OC12 externally, GigE jumbo and Fast Ethernet internally
- Web100 and NET100 participation
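As a small illustration of the data-access software listed above, the sketch below writes and reads an HDF5 file through the h5py Python binding (the binding is an assumption for illustration; the slide lists HDF5 itself, not any particular language interface). The file and dataset names are invented.

```python
# Minimal HDF5 example using the h5py binding (assumed available);
# it writes a small array with an attribute, then reads it back.
import numpy as np
import h5py

with h5py.File("example.h5", "w") as f:
    dset = f.create_dataset("temperature", data=np.linspace(270.0, 310.0, 16))
    dset.attrs["units"] = "K"   # self-describing metadata travels with the data

with h5py.File("example.h5", "r") as f:
    data = f["temperature"][:]
    print(f["temperature"].attrs["units"], data.mean())
```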