BNL Facility Status and Service Challenge 3
HEPiX, Karlsruhe, Germany, May 9-13, 2005
Zhenping Liu, Razvan Popescu, and Dantong Yu
USATLAS/RHIC Computing Facility, Brookhaven National Lab

Outline
 Lessons learned from SC2
 Goals of BNL Service Challenges
 Detailed SC3 planning
 Throughput Challenge (Simple)
 Network Upgrade Plan
  - USATLAS dCache system at BNL
  - MSS
 Tier 2 Integration Planning
 File Transfer System
 Service Phase Challenge to include ATLAS applications (difficult)

One day of data transfer during SC2 (plot)

Lessons Learned From SC2
 Four file transfer servers with a 1 Gigabit WAN connection to CERN.
 Met the performance/throughput challenge (70-80 MB/s disk to disk).
 Enabled data transfer between the BNL dCache/SRM and the CERN SRM at openlab.
  - Wrote our own script to control SRM data transfers.
 Enabled data transfer between BNL GridFTP servers and CERN openlab GridFTP servers, controlled by the Radiant software.
 Many components needed tuning:
  - 250 ms RTT and a high packet-drop rate; had to use multiple TCP streams and multiple concurrent file transfers to fill the network pipe.
  - Sluggish parallel file I/O with EXT2/EXT3: many processes stuck in the "D" state, and the more file streams, the worse the file-system performance.
  - Slight improvement with XFS; file-system parameters still need tuning.
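The slide notes the 250 ms RTT and the need for multiple TCP streams and concurrent transfers to fill the pipe. The sketch below works out the bandwidth-delay product for the 1 Gbit/s BNL-CERN path and, under an assumed per-stream window cap, the number of parallel streams that implies; the 2 MB window cap is an illustrative assumption, not a measured BNL setting.

```python
# Rough bandwidth-delay-product estimate for the BNL-CERN path described above.
# The 1 Gbit/s link and 250 ms RTT come from the slide; the 2 MB per-stream
# window cap is an illustrative assumption, not a measured BNL setting.

LINK_GBPS = 1.0                       # WAN capacity towards CERN (from the slide)
RTT_S = 0.250                         # round-trip time (from the slide)
WINDOW_CAP_BYTES = 2 * 1024**2        # assumed per-stream TCP window limit

bdp_bytes = LINK_GBPS * 1e9 / 8 * RTT_S        # bytes in flight to fill the pipe
streams = -(-bdp_bytes // WINDOW_CAP_BYTES)    # ceiling division

print(f"bandwidth-delay product: {bdp_bytes / 1024**2:.1f} MB")
print(f"parallel TCP streams needed at a {WINDOW_CAP_BYTES // 1024**2} MB window: {int(streams)}")
```

The same reasoning applies at the file level: running several concurrent transfers is another way to keep that many bytes in flight when a single stream's window cannot grow large enough.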

Goals
 Network, disk, and tape service
  - Sufficient network bandwidth: 2 Gbit/s
  - Quality of service: 150 MB/s to storage and up to 60 MB/s to tape, achieved efficiently and effectively
  - Functionality/services: high reliability, data integrity, high performance
 Robust file transfer service
  - Storage servers
  - File Transfer Software (FTS)
  - Data management software (SRM, dCache)
  - Archiving service: tape servers, tape robots, tapes, tape drives
 Sustainability: weeks in a row of uninterrupted 24/7 operation
 Involve ATLAS experiment applications
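A back-of-the-envelope check of how these targets relate to the planned 2 Gbit/s WAN bandwidth; all figures are taken from the slide.

```python
# Quick sanity check of the SC3 targets listed above against the planned
# 2 Gbit/s WAN bandwidth. All figures come from the slide.

wan_gbps = 2.0
disk_target_mbs = 150.0   # MB/s to storage
tape_target_mbs = 60.0    # MB/s to tape

disk_gbps = disk_target_mbs * 8 / 1000
print(f"150 MB/s to disk = {disk_gbps:.2f} Gbit/s "
      f"({disk_gbps / wan_gbps:.0%} of the 2 Gbit/s link)")
print(f"tape target is {tape_target_mbs / disk_target_mbs:.0%} of the disk target, "
      "so archiving, not the WAN, is the tighter constraint")
```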

BNL Network Topology (diagram)

Network Upgrade Status and Plan
 WAN connection: OC-48.
 Dual GigE links connect the BNL border router to the ESnet router.
 LAN upgrade from 1 GigE to 10 GigE in progress; target completion: middle of June 2005.

BNL Storage Element: dCache System
 Allows transparent access to a large number of data files distributed across disk in dCache pools or stored in HPSS.
  - Presents users with one unified name space for all data files.
  - Significantly improves the efficiency of the connected tape storage system through caching, i.e. gather-and-flush and scheduled staging techniques.
 Clever selection mechanism:
  - The system determines whether a file is already stored on one or more disks or in HPSS.
  - The source or destination dCache pool is chosen based on the storage group and the network mask of the client, as well as CPU load, disk space, and the configuration of the dCache pools.
 Optimizes throughput to and from data clients and balances the load of the connected disk storage nodes by dynamically replicating files when hot spots are detected.
 Tolerant of failures of its data servers.
 Supports various access protocols, including GridFTP, SRM, and dccp.
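The pool-selection bullet above describes choosing a pool from the storage group, the client's network mask, CPU load, and free space. The following is a deliberately simplified Python illustration of that kind of policy, not dCache's actual PoolManager logic; all class names, pool names, and networks below are made up for the example.

```python
# Simplified illustration of the pool-selection policy described above:
# filter pools by storage group and by the client's network, then prefer the
# least-loaded pool with enough free space. This is NOT dCache code; the
# data structures and example values are hypothetical.
import ipaddress
from dataclasses import dataclass

@dataclass
class Pool:
    name: str
    storage_groups: set
    client_networks: list      # networks this pool is allowed to serve
    cpu_load: float            # 0.0 - 1.0
    free_bytes: int

def select_pool(pools, storage_group, client_ip, file_size):
    client = ipaddress.ip_address(client_ip)
    candidates = [
        p for p in pools
        if storage_group in p.storage_groups
        and any(client in ipaddress.ip_network(n) for n in p.client_networks)
        and p.free_bytes >= file_size
    ]
    if not candidates:
        raise RuntimeError("no matching pool; stage from HPSS or replicate")
    # prefer the pool with the lowest CPU load, then the most free space
    return min(candidates, key=lambda p: (p.cpu_load, -p.free_bytes))

pools = [
    Pool("readpool01", {"atlas"}, ["130.199.0.0/16"], 0.35, 400 * 1024**3),
    Pool("extwrite01", {"atlas"}, ["0.0.0.0/0"], 0.10, 200 * 1024**3),
]
print(select_pool(pools, "atlas", "130.199.48.12", 2 * 1024**3).name)
```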

BNL dCache Architecture (diagram): DCap, SRM, and GridFTP doors; read pools, internal write pools, and external write pools; PnfsManager and PoolManager; HPSS back end. DCap, GridFTP, and SRM clients and the Oak Ridge batch system connect through control and data channels.

dCache System, Continued
 The BNL USATLAS dCache system works as a disk-caching front end for the Mass Storage System.
 Current configuration: 72 nodes with 50.4 TB of disk in total:
  - Core server nodes and a database server
  - Internal/external read pool nodes: 65
  - Internal write pool nodes: 4 x 532 GB
  - External write pool nodes: 2 x 420 GB
  - dCache version: V
  - Access protocols: GridFTP, SRM, dCap, gsi-dCap

Immediate dCache Upgrade
 The existing dCache has 50 TB of data storage.
 288 new dual-CPU 3.4 GHz Dell hosts will be on site on May 11, 2005:
  - 2 x 250 GB SATA drives each
  - 2 GB memory and dual on-board Gigabit ports
 These hosts will be split into more than two dCache systems.
 One of the systems will be used for SC3. Its disk pool nodes will be connected directly to the ATLAS router, which has a 10 G uplink.
 SL3 will be installed on all of these Dell hosts.
 File system: XFS, which needs tuning to improve per-host disk utilization.
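A quick capacity estimate from the figures on this slide; the host count and drive size are from the slide, while the even split across systems is an assumption made only for illustration.

```python
# Raw-capacity estimate for the 288 new Dell hosts listed above. Host count
# and drive size come from the slide; the even split across dCache systems
# is an illustrative assumption.

hosts = 288
drives_per_host = 2
drive_gb = 250

raw_tb = hosts * drives_per_host * drive_gb / 1000
print(f"raw capacity: {raw_tb:.0f} TB across {hosts} hosts")
for n_systems in (2, 3):
    print(f"~{raw_tb / n_systems:.0f} TB per system if split {n_systems} ways")
```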

BNL ATLAS MSS
 Two 9940B tape drives; the data transfer rate is between 10 MB/s and 30 MB/s per drive. These two drives are saturated by daily USATLAS production.
 200 GB tapes.
 We need to borrow tape drives from other in-house BNL experiments in July to meet the 60 MB/s performance target.
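A small arithmetic check of how many drives the 60 MB/s tape target implies at the per-drive rates quoted above; purely illustrative, since drive availability is the open issue the slide raises.

```python
# Drives needed to reach the 60 MB/s tape target at the per-drive rates
# quoted above (10-30 MB/s per 9940B drive).

target_mbs = 60
for per_drive_mbs in (10, 30):
    drives = -(-target_mbs // per_drive_mbs)   # ceiling division
    print(f"at {per_drive_mbs} MB/s per drive: {drives} drives needed")
```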

File Transfer Service
 ATLAS sees benefit in trying gLite FTS as soon as possible:
  - To find out ASAP whether it meets the data transfer requirements.
  - Data transfer requires significant effort to ramp up; we learned this from SC2.
  - Helps debug gLite FTS.
 Transfers between Tier 0, Tier 1, and a few Tier 2 sites.
 Real usage with Rome production data.
 A uniform low-level file transfer layer that interfaces with several SRM implementations: dCache/SRM, DPM, and even vanilla GridFTP.
 Xin deployed the FTS service and has successfully run data transfer tests with it.
 We are ready for the prime time of July 1, 2005.
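As a rough illustration of how such a transfer test can be scripted, the sketch below drives the gLite FTS command-line clients (glite-transfer-submit / glite-transfer-status) from Python. The service endpoint, SURLs, and the exact set of job-state names are placeholders and assumptions; they may differ between FTS releases and are not the actual BNL/CERN values.

```python
# Minimal sketch of submitting and polling an FTS transfer, assuming the
# gLite FTS command-line clients are installed. Endpoint, SURLs, and the
# job-state names checked below are placeholders/assumptions.
import subprocess
import time

FTS_SERVICE = "https://fts.example.org:8443/glite-data-transfer-fts/services/FileTransfer"
SRC = "srm://source.example.org:8443/srm/managerv1?SFN=/pnfs/example/data/file1"
DST = "srm://dest.example.org:8443/srm/managerv1?SFN=/pnfs/example/sc3/file1"

def submit(src, dst):
    out = subprocess.run(
        ["glite-transfer-submit", "-s", FTS_SERVICE, src, dst],
        check=True, capture_output=True, text=True)
    return out.stdout.strip()          # the submit client prints the job identifier

def status(job_id):
    out = subprocess.run(
        ["glite-transfer-status", "-s", FTS_SERVICE, job_id],
        check=True, capture_output=True, text=True)
    return out.stdout.strip()

job = submit(SRC, DST)
print("submitted FTS job:", job)
for _ in range(20):                    # poll for up to ~10 minutes
    state = status(job)
    print("job state:", state)
    if state.lower().startswith(("done", "finished", "failed", "canceled")):
        break
    time.sleep(30)
```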

Tier 2 Plans
 Choose two USATLAS Tier 2 sites.
 Each site will deploy a DPM server as its storage element, with an SRM interface.
 gLite FTS will transfer data from BNL to each of the two chosen sites at 75 MB/s.
 Files will be kept in the BNL Tier 1 dCache until they have been read once by the Tier 2 center.

ATLAS and SC3 Service Phase
 September
 ATLAS release 11 (mid-September)
 Will include use of the conditions database and COOL
 We intend to use COOL for several sub-detectors
  - Not clear how many sub-detectors will be ready
  - Also not clear how we will use COOL: a central COOL database or a distributed COOL database
 Debug scaling for distributed conditions data access: calibration/alignment, DDM, event data distribution and discovery
 Tier 0 exercise testing
 A dedicated server is requested for the initial ATLAS COOL service
 Issues around FroNtier are still under discussion, and ATLAS is interested
 Data can be thrown away

ATLAS & SC3 Service Phase
 April-July: preparation phase
  - Test of FTS ("gLite-SRM")
  - Integration of FTS with DDM
 July: scalability tests (commissioning data; Rome Physics workshop data)
 September: test of new components and preparation for real use of the service
  - Intensive debugging of COOL and DDM
  - Prepare for "scalability" running
 Mid-October: use of the service
  - Scalability tests of all components (DDM)
  - Production of real data (Monte Carlo; Tier-0; …)
 Later: "continuous" production mode
  - Re-processing
  - Analysis

Conclusion
 The storage element and network upgrades are going well.
 The whole chain of the system will be tuned before the end of May.
 Waiting for the FTS software to control data transfers.
 Talking with USATLAS Tier 2 sites about participating in SC3.
 Discussing how the experiment software can be involved.

Thank You!