RACF: an overview
- Formed in the mid-1990s to provide centralized computing resources for the four RHIC experiments (BRAHMS, PHOBOS, STAR, PHENIX)
- Role was expanded in the late 1990s to act as the US Tier-1 computing center for the ATLAS experiment at the LHC
- Small but growing astrophysics presence (Daya Bay, LSST)
- Located in the Brookhaven Computing Facility
- 35 FTEs providing a full range of scientific computing services for more than 4000 users

RACF: setting the scale
- RHIC
  - 1200 compute servers (130 kHS06, 16k job slots)
  - 7 PB of distributed storage on compute nodes; up to 16 GB/s between compute servers and distributed storage servers
  - 4 robotic tape libraries with 40 tape drives and 38k cartridge slots; 20 PB of active data
- ATLAS
  - 1150 compute servers (115 kHS06, 12k job slots)
  - 90 storage servers driving 8500 disk drives (10 PB); up to 18 GB/s observed in production between the compute and storage farms
  - 3 robotic tape libraries with 30 tape drives and 26k cartridge slots; 7 PB of active data
- Magnetic tape archive
  - Data inventory of currently 27 PB managed by the High Performance Storage System (HPSS), the archive layer below dCache and xrootd
  - Up to 4 GB/s throughput between tape/HPSS and dCache/xrootd
- Network
  - LAN: 13 enterprise switches with 5800 active ports (750 10GE ports), 160 Gbps inter-switch bandwidth
  - WAN: 70 Gbps in production (20 Gbps to CERN and other ATLAS T1s, 10 Gbps dedicated to LHCONE) + 20 Gbps for US ATLAS T1/T2 traffic and up to 20 Gbps serving domestic and international data transfer needs

(A quick arithmetic sanity check of these figures follows below.)
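
The per-unit ratios implied by these totals are sometimes more telling than the totals themselves. The short Python snippet below derives a few of them directly from the numbers quoted on this slide; the inputs are the slide's figures and the resulting ratios are approximate (the quoted totals are rounded and the hardware is not uniform).

```python
# Back-of-the-envelope ratios derived from the totals quoted on this slide.
# Approximate only: the quoted numbers are rounded and hardware is not uniform.

rhic  = {"hs06": 130_000, "job_slots": 16_000, "peak_io_gbs": 16}
atlas = {"hs06": 115_000, "job_slots": 12_000, "peak_io_gbs": 18}

for name, farm in (("RHIC", rhic), ("ATLAS", atlas)):
    hs06_per_slot = farm["hs06"] / farm["job_slots"]
    mbps_per_slot = farm["peak_io_gbs"] * 8 * 1000 / farm["job_slots"]
    print(f"{name}: {hs06_per_slot:.1f} HS06 per job slot, "
          f"~{mbps_per_slot:.0f} Mbps peak I/O per slot")

# ATLAS disk farm: 10 PB over 8500 drives behind 90 storage servers.
print(f"ATLAS disk: {10_000 / 8_500:.2f} TB per drive, "
      f"{18 / 90:.2f} GB/s peak per storage server")
```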

ATLAS Cloud support and R&D beyond Tier-1 Core Services
- Grid job submission
  - Deployment & operations of the grid job submission infrastructure and pandamover (to serve T2 input datasets)
  - Deployment & operations of AutoPyFactory (APF) for pilot submission in the US and other regions; includes work on the PanDA job wrapper (Jose C.) and local pilot submission for T3s (a sketch of the submission pattern follows after this list)
  - Condor-G performance improvements, in close collaboration with the Condor developers
- gLexec to run the analysis payload using the user's proxy
  - Lead effort ATLAS-wide; part of the OSG Software Integration activities
- CREAM Computing Element (CE) replacing the GT2-based CE
  - As part of the OSG Software Integration activities
- Primary GUMS (Grid Identity Mapping Service) developer (John Hover)
  - Used OSG-wide, including US ATLAS (~600 accounts) and US CMS (~250 accounts)
- Leading OSG architecture (Blueprint), native packaging (RPMs replacing Pacman) / configuration management, and the OSG integration and validation effort
- Support for T2 & T3 (DQ2 site services, FTS, LFC, network optimization & monitoring)
- Coordination of the worldwide Frontier deployment (ended in Nov 2011)
- Worldwide ATLAS S/W installation and validation service (Yuri Smirnov)
- Participation in ATLAS computing R&D projects: cloud computing, federated Xrootd storage system, NoSQL database evaluation (e.g. Cassandra, now moving to Hadoop)
- Storage system performance optimization: Linux & Solaris kernel, I/O driver, and file system tweaks
- Support for Tier-3 at BNL: user accounts, interactive services, fabric (all hardware components, OS, batch system), Xrootd and PROOF (in collaboration with Sergey and Shuwei)
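
To make the pilot-submission item above concrete, the sketch below shows roughly what a Condor-G style submission to a CREAM CE looks like, in the spirit of what AutoPyFactory automates. It is illustrative only, not taken from APF itself; the CE hostname, batch system, queue name, wrapper script, and proxy path are hypothetical.

```python
# Illustrative sketch of Condor-G pilot submission to a CREAM CE, in the spirit
# of what AutoPyFactory (APF) automates. The CE host, queue, wrapper script, and
# proxy path are hypothetical; this is NOT the actual APF code.
import subprocess
import tempfile

SUBMIT_TEMPLATE = """\
universe        = grid
grid_resource   = cream https://ce.example.bnl.gov:8443/ce-cream/services/CREAM2 pbs atlas
executable      = pilot_wrapper.sh
arguments       = --queue BNL_CLOUD
x509userproxy   = /tmp/x509up_u12345
output          = pilot_$(Cluster).$(Process).out
error           = pilot_$(Cluster).$(Process).err
log             = pilot.log
queue {npilots}
"""

def submit_pilots(npilots: int) -> None:
    """Write a Condor-G submit description and hand it to condor_submit."""
    with tempfile.NamedTemporaryFile("w", suffix=".sub", delete=False) as f:
        f.write(SUBMIT_TEMPLATE.format(npilots=npilots))
        submit_file = f.name
    subprocess.run(["condor_submit", submit_file], check=True)

if __name__ == "__main__":
    submit_pilots(5)   # e.g. keep a handful of pilots queued at the CE
```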

Reprocessing from 11/02 to 11/14 (MWT2 included from 11/06)

Contribution to Simulation (Aug-Oct): average number of fully utilized cores [chart; values 3843, 1867, 1762, 1067, 896]. ~1000 opportunistic job slots from NP/RHIC; ~2M CPU hours since August.

Facility Operations During Hurricane Sandy

Configuration Management (Jason Smith et al.)

Configuration management - Components

Benefits

RACF and OSG
- RACF staff is heavily engaged in the Open Science Grid
  - Major contributor to the Technology Investigation area and to the architectural development of OSG's Fabric of Services
  - Member of the Management Team; represents BNL on the OSG Council
  - Committed to developing OSG as a national computational infrastructure, jointly with other providers like XSEDE
- ATLAS Tier-1 center fully integrated with OSG
  - Provides opportunistic cycles to other OSG VOs

OSG continuing for another 5 years
- Besides the focus on physics and the momentum of the LHC, there is a broad spectrum of different science applications making use of OSG
- Very strong support from DOE and NSF to continue
- Extend support to more stakeholders: communities (e.g. nuclear physics and astrophysics) and scientists local to the campuses
  - Partnership/participation in NSF as an XSEDE Service Provider
  - XSEDE is a comprehensive, expertly managed and evolving set of advanced, heterogeneous, high-end digital services, integrated into a general-purpose infrastructure
  - XSEDE is about increased user productivity
  - Emergence of NSF Institutes: centers of excellence on particular cyberinfrastructure topics (e.g. Distributed High Throughput Computing, DHTC) across all science domains
  - Evolution of the DOE ASCR SciDAC program
  - Strong participant in the Extreme Scale Collaboratories initiative

R&D at the RACF: Cloud Computing
- ATLAS-wide activity
- Motivation
  - New and emerging paradigm in the delivery of IT services
  - Improved approach to managing and provisioning resources, allowing applications to easily adapt and scale to varied usage demands
  - A new, increasingly competitive market offers cost-effective computing resources; companies small and large already make extensive use of them
  - By providing "Infrastructure as a Service" (IaaS), clouds aim to efficiently share hardware resources for storage and processing without sacrificing flexibility in the services offered to applications
- BNL_CLOUD: standard production PanDA site with ~500 virtual machines (VMs)
  - Configured to use wide-area stage-in/out, so the same cluster can be extended transparently to a commercial cloud (e.g. Amazon) or other public academic clouds
  - Steadily running production jobs on auto-built VMs (a sketch of how such worker VMs might be provisioned follows after this list)
  - The key feature of the work has been to make all processes and configurations general and public, so they can be (re-)used outside BNL (e.g. at T2s, to dynamically establish analysis facilities using beyond-pledge resources)
  - Standard, widely used technology (Linux, Condor, OpenStack, etc.)
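
As an illustration of the "auto-built VMs on standard technology" point, the sketch below boots worker VMs through the OpenStack command-line client with a user-data script that starts an HTCondor startd pointing back at a central pool. It is a minimal sketch under assumed names: the image, flavor, and collector hostname are invented, and the production BNL_CLOUD provisioning is more involved than this.

```python
# Minimal sketch: boot worker VMs on an OpenStack cloud with user-data that makes
# them join a central HTCondor pool. Image, flavor, and collector host names are
# invented; the production BNL_CLOUD provisioning is more involved than this.
import subprocess
import tempfile

USER_DATA = """\
#!/bin/bash
# Point the condor startd at the central collector, then start the daemons.
cat > /etc/condor/config.d/50-cloud-worker.conf <<'EOF'
CONDOR_HOST = condor-head.example.bnl.gov
DAEMON_LIST = MASTER, STARTD
EOF
service condor start
"""

def boot_workers(count: int, prefix: str = "cloud-worker") -> None:
    """Launch `count` worker VMs via the OpenStack CLI."""
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(USER_DATA)
        user_data_file = f.name
    for i in range(count):
        subprocess.run(
            ["openstack", "server", "create",
             "--image", "sl6-worker",        # hypothetical pre-built worker image
             "--flavor", "m1.large",
             "--user-data", user_data_file,
             f"{prefix}-{i:03d}"],
            check=True,
        )

if __name__ == "__main__":
    boot_workers(3)
```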

Cloud Integration in U.S. ATLAS Facilities
- Flexible algorithms decide when to start and terminate running VMs (a sketch of such logic follows below)
- Cloud hierarchies: programmatic scheduling of jobs on site-local -> other private -> commercial clouds, based on job priority and cloud cost
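
The two items above amount to a simple control loop: compare queued demand against running capacity, and when more VMs are needed, fill the cheapest tier first, spilling high-priority work onto costlier tiers. The Python sketch below shows one way such logic could look; the thresholds, tier names, and costs are invented for illustration and do not reflect the actual ATLAS scheduling algorithms.

```python
# Illustrative VM start/terminate logic with a cloud hierarchy
# (site-local -> other private -> commercial). All numbers are invented;
# this is not the actual U.S. ATLAS implementation.
from dataclasses import dataclass

@dataclass
class CloudTier:
    name: str
    cost_per_hour: float   # relative cost; 0.0 for resources already paid for
    max_vms: int
    running_vms: int = 0

# Cheapest first: local private cloud, then other private clouds, then commercial.
TIERS = [
    CloudTier("site-local", 0.0, 500),
    CloudTier("other-private", 0.1, 200),
    CloudTier("commercial", 1.0, 1000),
]

def plan(idle_jobs: int, jobs_per_vm: int = 8, high_priority: bool = False) -> dict:
    """Return VMs to start (+) or terminate (-) per tier, given queued demand."""
    needed = -(-idle_jobs // jobs_per_vm)          # ceiling division
    running = sum(t.running_vms for t in TIERS)
    actions = {}
    if needed > running:
        deficit = needed - running
        for tier in TIERS:
            # Only spill onto paid clouds for high-priority work.
            if tier.cost_per_hour > 0 and not high_priority:
                continue
            start = min(deficit, tier.max_vms - tier.running_vms)
            if start > 0:
                actions[tier.name] = start
                deficit -= start
            if deficit == 0:
                break
    elif needed < running:
        # Terminate from the most expensive tier first.
        surplus = running - needed
        for tier in reversed(TIERS):
            stop = min(surplus, tier.running_vms)
            if stop > 0:
                actions[tier.name] = -stop
                surplus -= stop
            if surplus == 0:
                break
    return actions

# Example: 4000 high-priority idle jobs -> fill site-local first, spill outward.
print(plan(idle_jobs=4000, high_priority=True))
```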

Looking into Hadoop-based Storage Management
- Hiro, Shigeki Misawa, Tejas Rao, and Doug are helping with tests
- Reasons why we are interested
  - The Internet industry is developing scalable storage management solutions much faster than we will ever be able to
    - We just have to make them work for us
    - HTTP-based data access works well for them; why shouldn't it for us? (a sketch of this access pattern follows after this list)
  - With ever-increasing storage capacity per drive we expect performance bottlenecks
    - With disk-heavy worker nodes we could provide many more spindles, which helps scale up I/O performance and improves resilience against failures
- Apache distribution
  - Open source, but with several significant limitations (performance & resilience)
- Several commercial products
  - Free/unsupported downloads alongside value-added, supported licensed versions
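
As a concrete example of the "HTTP-based data access" argument, Hadoop's HDFS exposes a REST interface (WebHDFS) that any HTTP client can read from. The sketch below opens a file that way; the namenode host, port, and file path are hypothetical, and it only illustrates the access pattern, not the evaluation actually done at the RACF.

```python
# Minimal sketch of HTTP-based data access against HDFS via the WebHDFS REST API.
# Namenode host/port and file path are hypothetical.
import requests

NAMENODE = "http://hdfs-nn.example.bnl.gov:50070"

def read_file(path: str, user: str = "atlas") -> bytes:
    """Read a whole HDFS file over HTTP (op=OPEN redirects to a datanode)."""
    url = f"{NAMENODE}/webhdfs/v1{path}"
    # The namenode answers with a redirect to the datanode holding the data;
    # requests follows the redirect automatically.
    resp = requests.get(url, params={"op": "OPEN", "user.name": user})
    resp.raise_for_status()
    return resp.content

if __name__ == "__main__":
    data = read_file("/atlas/datasets/example/AOD.root")
    print(f"read {len(data)} bytes over plain HTTP")
```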

MapR (mapr.com)

Performance vs. CPU Consumption
- Maxing out a 10 Gbps network interface at the expense of ~10% of the server CPU for write and ~5% for read operations