Slide 1: HPC at PNNL, March 2004
R. Scott Studham, Associate Director, Advanced Computing
April 13, 2004

Slide 2: HPC Systems at PNNL
Molecular Science Computing Facility
- 11.8 TF Linux-based supercomputer using Intel Itanium2 processors and an Elan4 interconnect
- A balance for our users: 500 TB of disk, 6.8 TB of memory
PNNL Advanced Computing Center
- 128-processor SGI Altix
- NNSA-ASC "Spray Cool" cluster

Slide 3: William R. Wiley Environmental Molecular Sciences Laboratory
Who are we?
- A 200,000-square-foot U.S. Department of Energy national scientific user facility
- Operated by Pacific Northwest National Laboratory in Richland, Washington
What we provide for you:
- Free access to over 100 state-of-the-art research instruments
- A peer-review proposal process
- Expert staff to assist or collaborate
Why use EMSL?
- EMSL provides, under one roof, staff and instruments for fundamental research on physical, chemical, and biological processes.

Slide 4: HPCS2 Configuration (system diagram)
- 1,976 next-generation Itanium® processors: 11.8 TF, 6.8 TB of memory
- Compute nodes interconnected with Elan4 (Elan3 also shown in the diagram)
- 4 login nodes with 4Gb-Enet; 2 system management nodes
- 2 Gb SAN / 53 TB Lustre storage
The 11.8 TF system is in full operation now.

Slide 5: Who uses the MSCF, and what do they run? (FY02 usage charts; applications include Gaussian.)

Slide 6: MSCF is focused on grand challenges
- More than 67% of the usage is for large jobs.
- Demand for access to this resource is high.
- Fewer users, focused on longer, larger runs and big science.

Slide 7: World-class science is enabled by systems with the fastest time-to-solution for our science
- Significant improvement (25-45% for a moderate number of processors) in time to solution from upgrading the interconnect to Elan4
- Improved efficiency
- Improved scalability
HPCS2 is a science-driven computer architecture with the fastest time-to-solution for our users' science of any system we have benchmarked.

Slide 8: Accurate binding energies for large water clusters
These results provide unique information on the transition from the cluster to the liquid and solid phases of water.
- Code: NWChem
- Kernel: MP2 (disk bound)
- Sustained performance: ~0.6 Gflop/s per processor (10% of peak)
- Choke point: sustained 61 GB/s of disk I/O and used 400 TB of scratch space
- Took only 5 hours on 1,024 CPUs of the HP cluster
This is a capability-class problem that could not be completed on any other system.
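As a back-of-the-envelope check (my own arithmetic, not from the slides), the per-processor figures are consistent with the system peak quoted earlier: 11.8 TF spread over 1,976 processors is roughly 6 Gflop/s per processor, so a sustained ~0.6 Gflop/s is indeed about 10% of peak. A minimal sketch:

```python
# Back-of-the-envelope check of the MP2 water-cluster figures on this slide.
# The per-processor peak is derived from the system numbers; it is not stated
# explicitly anywhere in the talk.

system_peak_tflops = 11.8          # HPCS2 peak (slides 2 and 4)
total_processors = 1976            # Itanium processors (slide 4)
sustained_gflops_per_proc = 0.6    # stated sustained rate for the MP2 kernel
cpus_used = 1024                   # CPUs used for the 5-hour run

peak_gflops_per_proc = system_peak_tflops * 1000 / total_processors
fraction_of_peak = sustained_gflops_per_proc / peak_gflops_per_proc
aggregate_sustained_tflops = sustained_gflops_per_proc * cpus_used / 1000

print(f"Per-processor peak: ~{peak_gflops_per_proc:.1f} Gflop/s")        # ~6.0
print(f"Sustained fraction of peak: ~{fraction_of_peak:.0%}")            # ~10%
print(f"Aggregate sustained on {cpus_used} CPUs: ~{aggregate_sustained_tflops:.2f} TF")
```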

Slide 9: Energy calculation of a protein complex
The Ras-RasGAP protein complex is a key switch in the signaling network initiated by the epidermal growth factor (EGF). This network controls cell death and differentiation, and mutations in the protein complex are responsible for 30% of all human tumors.
- Code: NWChem
- Kernel: Hartree-Fock
- Time to solution: ~3 hours for one iteration on 1,400 processors
- Computation of 107 residues of the full protein complex using approximately 15,000 basis functions
This is believed to be the largest calculation of its type.
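To give a sense of the scale (an illustration of my own, not from the slides): a conventional Hartree-Fock calculation over N basis functions touches on the order of N^4/8 unique two-electron integrals, so roughly 15,000 basis functions implies about 6 x 10^15 integrals per Fock build, far too many to store, which is why calculations of this size typically recompute integrals on the fly. A rough sketch of the estimate:

```python
# Rough scale estimate for the Hartree-Fock calculation described above.
# The N**4 / 8 count of unique two-electron integrals is a standard textbook
# approximation (8-fold index symmetry, no integral screening); it is
# illustrative only and not a figure quoted in the talk.

n_basis = 15_000  # approximate number of basis functions from the slide

unique_eri = n_basis**4 / 8
naive_storage_pb = unique_eri * 8 / 1e15   # 8 bytes per integral, in petabytes

print(f"Unique two-electron integrals: ~{unique_eri:.1e}")               # ~6.3e+15
print(f"Naive storage if written to disk: ~{naive_storage_pb:.0f} PB")   # ~51 PB
```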

Slide 10: Biogeochemistry: Membranes for Bioremediation
Molecular dynamics of a lipopolysaccharide (LPS), spanning the HPCS1, HPCS2, and HPCS3 systems:
- Classical molecular dynamics of the LPS membrane of Pseudomonas aeruginosa and mineral
- Quantum mechanical/molecular mechanics molecular dynamics of the membrane plus mineral

Slide 11: A new trend is emerging
- With the expansion into biology, the need for storage has drastically increased.
- EMSL users have stored >50 TB in the past 8 months.
- More than 80% of the data is from experimentalists.
- (Chart: projected growth trend for biology data, plotted on a log scale.)
The MSCF provides a synergy between computational scientists and experimentalists.

Slide 12: Storage Drivers
We support three different domains with different requirements:
High Performance Computing - Chemistry
- Low storage volumes (10 TB)
- High-performance storage (>500 MB/s per client, GB/s aggregate)
- POSIX access
High Throughput Proteomics - Biology
- Large storage volumes (PBs) and exploding
- Write once, read rarely if used as an archive
- Modest latency okay (<10 s to data)
- If analysis could be done in place, it would require faster storage
Atmospheric Radiation Measurement - Climate
- Modest storage requirements (100s of TB)
- Shared with the community and replicated to ORNL

Slide 13: PNNL's Lustre Implementation
- PNNL and the ASCI Tri-Labs are currently working with CFS and HP to develop Lustre.
- Lustre has been in full production since last August and is used for aggressive I/O from our supercomputer: highly stable, but still hard to manage.
- We are expanding our use of Lustre to act as the filesystem for our archival storage, deploying a ~400 TB filesystem.
- 660 MB/s from a single client with a simple "dd" is faster than any local or global filesystem we have tested.
We are finally in the era where global filesystems provide faster access than local ones.
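For context, a single-client streaming-write test in the spirit of that "dd" measurement could look like the sketch below. This is my own illustration: the mount point, file size, and block size are assumptions, and the original measurement was a plain dd run against the Lustre filesystem rather than this script.

```python
# Minimal single-client sequential-write throughput test, roughly equivalent
# to the "dd" measurement mentioned on this slide. The mount point, file size,
# and block size below are illustrative assumptions, not values from the talk.
import os
import time

TARGET = "/lustre/throughput_test.tmp"   # hypothetical Lustre mount point
BLOCK_SIZE = 1024 * 1024                 # 1 MiB writes
TOTAL_BYTES = 8 * 1024**3                # write 8 GiB in total

def sequential_write_throughput(path, block_size, total_bytes):
    block = b"\0" * block_size
    written = 0
    start = time.time()
    with open(path, "wb", buffering=0) as f:
        while written < total_bytes:
            f.write(block)
            written += block_size
        os.fsync(f.fileno())             # make sure data actually hit storage
    elapsed = time.time() - start
    return written / elapsed / 1e6       # MB/s

if __name__ == "__main__":
    mbps = sequential_write_throughput(TARGET, BLOCK_SIZE, TOTAL_BYTES)
    print(f"Sequential write throughput: {mbps:.0f} MB/s")
    os.remove(TARGET)
```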

Slide 14: Security
Open computing requires a trust relationship between sites. A user logs into siteA and ssh's to siteB; if siteA is compromised, the attacker has probably sniffed the password for siteB.
- Reaction #1: Teach users to minimize jumping through hosts they do not personally know are secure (why did the user trust siteA?)
- Reaction #2: Implement one-time passwords (SecureID)
- Reaction #3: Turn off open access (Earth Simulator?)

Slide 15: Thoughts about one-time passwords
A couple of different hurdles to cross:
- We would like to avoid forcing our users to carry a different SecureID card for each site they have access to. However, the distributed nature of security (it is run by local site policy) will probably end up with something like this for the short term.
- As of April 8th, the MSCF has converted over to the PNNL SecureID system for all remote ssh logins.
- Lots of FedEx'ed SecureID cards.

Slide 16: Summary
- HPCS2 is running well, and the I/O capabilities of the system are enabling chemistry and biology calculations that could not be run on any other system in the world.
- Storage for proteomics is on a super-exponential growth trend.
- Lustre is great: 660 MB/s from a single client; building a 1/2 PB single filesystem.
- We rapidly implemented SecureID authentication methods last week.