National Energy Research Scientific Computing Center (NERSC) PDSF at NERSC Thomas M. Langley NERSC Center Division, LBNL November 19, 2003

PDSF: A Tool for Science – What is PDSF? – History – Our clients and their science – Configuration and Administration – What is in the future for PDSF? – Conclusion

PDSF: What is PDSF? PDSF is a large Linux cluster constructed in-house from off-the-shelf components. It is designed to support large numbers of applications with large data-capacity requirements and is tuned for serial processing. PDSF is a cooperative effort between NERSC and the high energy and nuclear physics (HEPNP) communities, providing a shared alternative to individual computing facilities for each project.

PDSF: History PDSF was initially developed in 1991 at the SSC (Superconducting Super Collider) Laboratory, where it was composed of mid-range workstations and shared disk. It was acquired by NERSC in 1996 in collaboration with the HEPNP community.

PDSF: Our clients and their science PDSF has a varied client community. Collider facilities: STAR/RHIC, CDF/FNAL, ATLAS/CERN, E871/FNAL, ALICE/CERN. Neutrino experiments: Sudbury Neutrino Observatory (SNO), KamLAND, AMANDA. Astrophysics: Deep Search, Supernova Factory, Large Scale Structure.

PDSF: Our clients and their science – STAR/RHIC: STAR is an experiment at the RHIC facility at Brookhaven National Laboratory. The primary goal of this field of research is to re-create in the laboratory a novel state of matter, the quark-gluon plasma (QGP), which the standard model of particle physics (quantum chromodynamics) predicts existed ten millionths of a second after the Big Bang and may exist in the cores of very dense stars. STAR, the largest PDSF client, uses PDSF for detector simulations and software development.

PDSF: Our clients and their science – CDF/FNAL: The Collider Detector at Fermilab (CDF) experiment studies high-energy particle collisions at the world's highest-energy particle accelerator. The goal is to discover the identity and properties of the particles that make up the universe and to understand the forces and interactions between them. www-cdf.fnal.gov

PDSF: Our clients and their science – ATLAS/CERN: The goal of the ATLAS experiment for the Large Hadron Collider at the CERN laboratory in Switzerland is to explore the fundamental nature of matter and the basic forces that shape our universe. The ATLAS collaboration uses PDSF for detector simulation and software development. atlasexperiment.org

PDSF: Our clients and their science – E871/FNAL: The HyperCP (E871) collaboration at Fermilab searches for asymmetries between matter and antimatter (CP violation) in Lambda and Xi hyperon decays, as well as in charged kaon decays. The collaboration also has an extensive program of hyperon and kaon physics beyond CP violation. ppd.fnal.gov/experiments/e871/

PDSF: Our clients and their science – ALICE/CERN: The ALICE (A Large Ion Collider Experiment) collaboration at CERN is building a dedicated heavy-ion detector to exploit the unique physics potential of nucleus-nucleus interactions at LHC energies. The aim is to study the physics of strongly interacting matter at extreme energy densities, where the formation of a new phase of matter, the quark-gluon plasma, is expected. The ALICE collaboration uses PDSF for detector simulations and software development. alice.web.cern.ch/Alice/AliceNew/

PDSF: Our clients and their science – Sudbury Neutrino Observatory (SNO): SNO is taking data that has provided revolutionary insight into the properties of neutrinos and the core of the sun. The detector was built 6800 feet underground in INCO's Creighton mine near Sudbury, Ontario. SNO is a heavy-water Cherenkov detector designed to detect neutrinos produced by fusion reactions in the sun. The SNO collaboration uses PDSF for software development and data analysis production. neutrino.lbl.gov/index.html

PDSF: Our clients and their science – KamLAND: KamLAND, the Kamioka Liquid Scintillator Anti-Neutrino Detector, is the largest scintillation detector ever constructed. The KamLAND collaboration uses PDSF for data analysis and production, processing the massive amounts of data generated by the experiment.

PDSF: Our clients and their science – AMANDA: The AMANDA telescope consists of neutrino detectors buried between 1 and 1.5 miles beneath the snow surface at the geographic South Pole. Its primary objective is to discover sources of very-high-energy neutrinos from galactic and extragalactic sources.

PDSF: Our clients and their science – Deep Search: The DeepSearch experiment is affiliated with the Supernova Cosmology Project at Berkeley Lab. Its goal is to search for distant supernovae. PDSF is used for data reduction and analysis. panisse.lbl.gov

PDSF: Our clients and their science – Supernova Factory: The Nearby Supernova Factory (SNfactory) is designed to address a wide range of supernova issues using detailed observations of low-redshift supernovae. SNfactory relies heavily on PDSF for computational support and data storage. snfactory.lbl.gov

PDSF: Our clients and their science – Large Scale Structure: The Large Scale Structure (LSS) group models the formation and evolution of structure in the universe and uses PDSF for simulations.

PDSF: Configuration and Administration PDSF has grown significantly since it was first installed. Initial configuration: 64 nodes, 224GB of disk, a 10Mb/sec network fabric, and a 100Mb/sec connection to ESnet.

PDSF: Configuration and Administration 1998 configuration: Added 20 Intel nodes for 84 total. Increased disk to 490GB. 100Mb/sec network on the new nodes.

PDSF: Configuration and Administration 1999 configuration: Removed HP and Sun nodes and increased Intel-based nodes to 48. Added 7 disk vaults with 64GB each and a Sun E450 with 210GB of disk. 100Mb/sec network throughout the cluster. Added FDDI connectivity to HPSS.

PDSF: Configuration and Administration 2000 configuration: Increased the number of nodes to 152. Disk increased to 7.5TB. Introduced Gigabit Ethernet between the high-performance nodes, HPSS, and ESnet.

PDSF: Configuration and Administration 2002 configuration: Increased the number of nodes to 207. Introduced Athlon technology. Disk increased to 35TB, including new hardware IDE RAID. Added new Extreme 7i switches.

PDSF: Configuration and Administration 2003 Configuration Increased number of nodes to 414. Introduced Opteron technology. Total disk capacity now at 131.5TB. Installed first NAS on PDSF. Added lower cost Dell switches.

PDSF: Configuration and Administration Interactive and compute nodes. Interactive nodes provide a point of entry into the cluster: user logins via SSH and job submission to LSF. Compute nodes are scheduled by LSF and provide the bulk computing resource of the cluster; there are no interactive logins. They consist of regular nodes and high-bandwidth nodes, the latter having faster processors and more local storage.
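
To make the batch workflow concrete, here is a minimal sketch of submitting a user script to LSF from an interactive node by wrapping the standard bsub command. The queue name, job name, and script path are illustrative placeholders, not actual PDSF settings, and the sketch is written in present-day Python purely for clarity.

    import subprocess

    def submit_lsf_job(script_path, queue="medium", job_name="star_sim", cpus=1):
        """Submit a batch script to LSF from an interactive node.

        Queue name, job name, and CPU count are placeholders; real PDSF
        queue names and limits would come from the site's LSF configuration.
        """
        cmd = [
            "bsub",                      # standard LSF submission command
            "-q", queue,                 # target queue
            "-J", job_name,              # job name shown by bjobs
            "-n", str(cpus),             # number of processors requested
            "-o", job_name + ".%J.out",  # output file (%J expands to the LSF job ID)
            script_path,                 # the user's analysis script
        ]
        # bsub prints e.g.: Job <12345> is submitted to queue <medium>.
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return result.stdout.strip()

    if __name__ == "__main__":
        print(submit_lsf_job("./run_analysis.sh"))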

PDSF: Configuration and Administration Administrative nodes are used to administer the cluster; they run the LSF manager, monitoring utilities, web server, etc. Logging nodes perform consolidated logging, gateway functions, etc. Development nodes are a small pool of processors made available for testing new system functions such as file systems. Grid nodes are Globus gateways to the DOE Science Grid. Console servers provide remote access to each cluster node by way of serial connections.
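
As an illustration of how the grid nodes are reached, the sketch below runs a command through a Globus gatekeeper using the Globus Toolkit's globus-job-run client, assuming a valid proxy certificate has already been created with grid-proxy-init. The gatekeeper contact string and host name are hypothetical, not real PDSF endpoints.

    import subprocess

    # Hypothetical gatekeeper contact string; a real one would name an actual
    # grid node and job manager, e.g. "host.example.gov/jobmanager-lsf".
    GATEKEEPER = "pdsfgrid.example.gov/jobmanager-lsf"

    def grid_run(executable, *args):
        """Run a command on the cluster through a Globus gatekeeper.

        Assumes a valid proxy certificate already exists (grid-proxy-init).
        """
        cmd = ["globus-job-run", GATEKEEPER, executable, *args]
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return result.stdout

    if __name__ == "__main__":
        # Simple sanity check: print the remote host name.
        print(grid_run("/bin/hostname"))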

PDSF: Configuration and Administration Data nodes provide 90TB of shared storage to the cluster. Small disk vaults (older technology) use software RAID and integrated IDE controllers. Raidzones units were the first hardware RAID devices in PDSF and use IDE drives. 3ware (the current technology) uses IDE RAID with 80 to 300GB drives. NAS provides 10TB of configurable storage. An additional 60TB is provided as locally attached disk on the compute nodes.

PDSF: Configuration and Administration RedHat Linux 7.3 is installed on all nodes; an upgrade to RedHat 8.0 is being prepared. LSF provides batch scheduling and queue management: each client group is allocated services by way of a share value, similar to a percentage of available resources. Each user's runtime environment is customized with the modules package to reflect their development environment, which permits several versions of software to be installed that would otherwise conflict with each other. High-capacity, high-speed backup is provided to NERSC's HPSS system. Open-source and in-house monitoring software provides 24x7 monitoring of the environment with appropriate alerts and notifications. An internally developed hardware database is updated both manually and automatically to reflect changing hardware conditions due to equipment additions, deletions, and failures, permitting component locating and tracking.
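
The share-value allocation described above can be thought of as a simple normalization: each group's share divided by the sum of all shares gives the fraction of the cluster the scheduler aims to deliver over time. The sketch below illustrates that idea with invented group names and share numbers; it is not LSF's actual fairshare algorithm, which also weights recent usage and running jobs.

    def share_fractions(shares):
        """Convert raw share values into target resource fractions.

        `shares` maps group name -> share value. The numbers used in the
        example are invented and do not reflect real PDSF allocations.
        """
        total = sum(shares.values())
        return {group: value / total for group, value in shares.items()}

    if __name__ == "__main__":
        example = {"star": 50, "sno": 20, "kamland": 20, "snfactory": 10}
        for group, fraction in share_fractions(example).items():
            print(f"{group:10s} {fraction:6.1%}")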

PDSF in the spotlight PDSF was one of the first operational Linux clusters for public use in the world and has been in continuous operation longer than any other Linux cluster. Using the capabilities of PDSF, the SNO collaboration was able to determine that neutrinos have mass. The KamLAND collaboration performed analysis of their data on PDSF that verified SNO's findings. Utilizing the large data-handling capabilities and flexible environment of PDSF, the Supernova Factory discovered 43 new supernovae in its first year, an astounding record. STAR has published more than 30 Physical Review Letters made possible by work performed on PDSF.

PDSF: The Future Remain poised to take advantage of the newest technologies as they become available. Continue to develop new tools to manage the cluster more effectively and reduce system outages. Investigate ways to foster cooperation with installations operating Linux clusters in the global community. Continue to expand the relationship with the astrophysics community and look for additional interest outside of HEPNP. Grid computing will play an ever-increasing role.

PDSF: Conclusion PDSF allows its client groups to fully leverage shared computational resources. PDSF has allowed NERSC to examine Linux and evaluate different models and approaches to providing computing. PDSF continues to evolve while maintaining production-quality service. PDSF provides a uniquely managed system where projects with very different environmental requirements can operate concurrently. With its large storage capacity, high number of compute nodes, very low cost, and extraordinary availability, PDSF provides capabilities otherwise unobtainable for our scientific users. Great science is being done on PDSF!

PDSF: Contacts Thank you! For more information visit our website. You may also contact us by email. PDSF support staff: Tom Langley, Shane Canon, Cary Whitney. PDSF user support: Iwona Sakrejda.

Ernest Orlando Lawrence Berkeley National Laboratory