Atlas Tier 3 Overview Doug Benjamin Duke University

Purpose of a Tier 3
Tier 3's (and experiment computing in general) are tools to aid physicists in their work: analyzing the data to make measurements and scientific discoveries.
The computing is a tool and a means to an end.
Tier 3 productivity: the success of the Tier 3's will be measured by the amount of scientific output
Papers written
Talks given at conferences
Students trained (and theses written)
Not by CPU hours or events processed

Tier 3 Types
Tier 3's are non-pledged resources, though that does not imply they should be chaotic or troublesome resources.
Atlas examples include:
Tier 3's collocated with Tier 2's
Tier 3's with the same functionality as a Tier 2 site
National analysis facilities
Non-grid Tier 3's (most common for new sites in the US and likely throughout Atlas); very challenging due to limited support personnel
The Tier 3 effort is now part of the ADC, with Doug Benjamin as technical coordinator.

Atlas Tier 3 Workshop (Jan)
Organizers: Massimo Lamanna, Rik Yoshida, DB
Follow-on to activities in the US the year before
Showed the variety of Tier 3's in Atlas; good attendance from all across Atlas
6 working groups formed to address various issues:
1. Distributed storage (Lustre/GPFS and xrootd subgroups)
2. DDM – Tier 3 link
3. Tier 3 support
4. Proof
5. Software and conditions DB
6. Virtualization

Tier 3g (most common Tier 3 in the US)
 Interactive nodes
   Can submit grid jobs
 Batch system with worker nodes
 Atlas code available
 Client tools used to fetch data (dq2-ls, dq2-get)
   Including dq2-get + FTS for better control (a fetch sketch follows this slide)
 Storage can be one of two types (sites can have both)
   Located on the worker nodes: Lustre/GPFS (mostly in Europe) or XROOTD
   Located on dedicated file servers (NFS/XROOTD)
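The dq2 client tools are command-line driven; the following is a minimal sketch (Python) of how a Tier 3 user might script the fetch step, assuming the dq2 tools are already set up in the environment. The dataset name and target directory are hypothetical placeholders.

    import subprocess
    import sys

    # Hypothetical dataset name and local target directory; replace with real values.
    DATASET = "user.example.SomeD3PD.v1/"
    TARGET_DIR = "/atlas/data/d3pd"

    def fetch_dataset(dataset, target_dir):
        """List the dataset, then pull it to local storage with dq2-get."""
        # dq2-ls prints matching dataset names; a non-zero exit means nothing was found.
        subprocess.check_call(["dq2-ls", dataset])
        # dq2-get downloads into the current working directory, so run it from the target area.
        subprocess.check_call(["dq2-get", dataset], cwd=target_dir)

    if __name__ == "__main__":
        try:
            fetch_dataset(DATASET, TARGET_DIR)
        except subprocess.CalledProcessError as err:
            sys.exit("dq2 command failed: %s" % err)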

Tier 3g design philosophy
Design a system that is flexible and simple to set up (1 person, < 1 week)
Simple to operate: < 0.25 FTE to maintain
Scalable with data volumes
Fast: process 1 TB of data overnight
Relatively inexpensive
Run only the needed services/processes; devote most resources to CPUs and disk
Using common tools will make it easier for all of us, and easier to develop a self-supporting community
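As a sanity check on the "process 1 TB overnight" goal, a small back-of-envelope calculation helps size the cluster. This is only a sketch: the 8-hour window and the 20-core example are assumptions, not numbers from the slides.

    # Required aggregate read rate to process 1 TB in an assumed 8-hour night.
    data_tb = 1.0
    hours = 8.0

    bytes_total = data_tb * 1e12
    seconds = hours * 3600.0
    rate_mb_s = bytes_total / seconds / 1e6
    print("Aggregate rate needed: %.0f MB/s" % rate_mb_s)   # ~35 MB/s

    # Spread over an assumed 20 analysis cores, each core needs only a modest rate.
    cores = 20
    print("Per-core rate: %.1f MB/s" % (rate_mb_s / cores))  # ~1.7 MB/s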

Tier 3g configuration
[Diagram: users work on interactive nodes and submit to a local Condor batch system with worker nodes; user home areas and the Atlas/common software are shared over NFSv4 mounts; pathena/prun send jobs to the grid; a gridftp server plus a web server for Panda logs connect the site to the local Tier 2 cloud.]

Tier 3g – Interactive computing
[Diagram: interactive nodes mount the file server (user home areas, Atlas/common software) over NFSv4; pathena/prun submit to the grid.]
Common user environment (next slide)
Atlas software installed by one of two methods:
  manageTier3SW
  CVMFS (web file system)
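A quick way for a site admin to confirm that the Atlas software area is visible on an interactive node is to probe the mount point. A minimal sketch follows; /cvmfs/atlas.cern.ch is the usual location of the ATLAS CVMFS repository, but treat the path as an assumption for your site.

    import os
    import sys

    # Assumed mount point of the ATLAS CVMFS repository (adjust for your site).
    CVMFS_ATLAS = "/cvmfs/atlas.cern.ch"

    def check_cvmfs(path):
        """Touch the repository (so autofs mounts it) and list the top level."""
        if not os.path.isdir(path):
            sys.exit("CVMFS repository not visible at %s" % path)
        print("CVMFS repository %s is mounted; top-level entries:" % path)
        for name in sorted(os.listdir(path)):
            print("  ", name)

    if __name__ == "__main__":
        check_cvmfs(CVMFS_ATLAS)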

Atlas code installation
NFS file server with the manageTier3SW package (Asoka DeSilva, TRIUMF)
Well tested and straightforward to use

CVMFS 0.2 (v2)
(OSG All Hands Meeting, Fermilab, March)

NFS V4 vs CVMFS comparison: Athena compilations
Rik Yoshida (ANL), Dell R710: 8 cores (16 hyperthreaded)
Compilation time vs. number of simultaneous Condor jobs (increasing left to right):
NFS4:    7 min   15 min   60 min
CVMFS2:  7 min    8 min   11 min

How data comes to Tier 3g's
Data will come from any Tier 2 site (US Tier 2 cloud) by two methods:
  Enhanced dq2-get (uses an FTS channel), pulling through the site's gridftp server
  Data subscription, with a Bestman Storage Resource Manager (SRM) on the file server; the SRM/gridftp server is then part of the DDM Tiers of Atlas (ToA)
Sites in the DDM ToA will be tested frequently; troublesome sites will be blacklisted (no data), which is an extra support load
Xrootd/Proof (pq2 tools) used to manage the local storage

Implications of ToA
A Tier 3 site will be both a data sink and a data source
Atlas DDM tests are required to run at the T3 site on a fixed cycle (appropriate for T3's)
Removing a file implies removing it in the DDM database before removing it at the site, otherwise it becomes dark data (a consistency-check sketch follows this slide)
The site can/will be blacklisted if it fails too many DDM tests: no data
Must provide a good quality of service
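One way to guard against dark data is a periodic consistency check between what the catalogue thinks the site holds and what is actually on disk. The sketch below assumes a plain-text dump of the catalogued file names is available; the paths and file names are hypothetical.

    import os

    # Hypothetical inputs: a text dump of catalogued file names (one per line)
    # and the root of the site's storage area.
    CATALOG_DUMP = "/tmp/ddm_catalog_dump.txt"
    STORAGE_ROOT = "/atlas/data"

    def find_dark_data(catalog_dump, storage_root):
        """Report files on disk that the catalogue does not know about."""
        with open(catalog_dump) as f:
            catalogued = set(line.strip() for line in f if line.strip())

        dark = []
        for dirpath, _dirnames, filenames in os.walk(storage_root):
            for name in filenames:
                if name not in catalogued:
                    dark.append(os.path.join(dirpath, name))
        return dark

    if __name__ == "__main__":
        for path in find_dark_data(CATALOG_DUMP, STORAGE_ROOT):
            print("dark data candidate:", path)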

Tier 3g storage issues
 Monitoring of data storage
   Tier 3's will have a finite amount of storage; storage will be like a cache, so we need to monitor data usage patterns to determine data longevity at the site and clean up old data (a small sketch follows this slide)
 Storage system performance monitoring
   The Xrootd and Proof masters might help here
   Need help from the XrootD team and OSG for better packaging and installation scripts
   Alice uses MonALISA to monitor its XROOTD systems
 Efficient data access
   Tier 3g sites typically have 1 GbE between worker nodes and storage; want to minimize network traffic within the cluster
   Implies moving jobs to the data if possible
   Xrootd, or perhaps HADOOP, as a solution
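Since Tier 3 storage behaves like a cache, a simple usage monitor can flag data that has not been read for a while. A minimal sketch based on file access times is shown below; the 90-day threshold and storage path are assumptions, and it requires a filesystem that records atime.

    import os
    import time

    STORAGE_ROOT = "/atlas/data"      # assumed storage area
    MAX_IDLE_DAYS = 90                # assumed cleanup threshold

    def stale_files(root, max_idle_days):
        """Yield (path, idle_days, size_gb) for files not accessed recently."""
        # Note: atime must be recorded; filesystems mounted noatime will not work.
        cutoff = time.time() - max_idle_days * 86400
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                st = os.stat(path)
                if st.st_atime < cutoff:
                    idle_days = (time.time() - st.st_atime) / 86400
                    yield path, idle_days, st.st_size / 1e9

    if __name__ == "__main__":
        total_gb = 0.0
        for path, idle, size_gb in stale_files(STORAGE_ROOT, MAX_IDLE_DAYS):
            total_gb += size_gb
            print("%6.0f days idle  %8.2f GB  %s" % (idle, size_gb, path))
        print("Total reclaimable: %.1f GB" % total_gb)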

Tier 3g storage issues (cont.)
 Data security (data safety)
   In a Tier 3 we want to maximize the amount of space available at minimum cost, which implies commodity hardware
   Some data can be fetched again from the BNL cloud (US Tier 1 and Tier 2 sites); this puts an added tax on the wide area network between Tier 3 and Tier 1 & 2 sites, so why not also think about transfers among Tier 3 centers (a data cloud)?
   Some sites will store multiple copies of data on different platforms (RAID NFS file server, Xrootd system)
   Why not use a file system that can provide automatic replication (HADOOP)? (see the sketch after this slide)
 Networking
   We often take networking for granted, yet it needs to be optimized for efficient transfers; this implies interaction with the campus network admins
   Internet2 has agreed to help us (thanks!)
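If a site does go the HADOOP (HDFS) route, replication is controlled per path. Below is a small sketch that raises the replication factor on a directory by shelling out to the hadoop command line; it assumes a working HDFS client, and the path, replica count, and exact flag behaviour should be checked against the Hadoop version in use.

    import subprocess

    HDFS_PATH = "/atlas/data/important"   # hypothetical directory to protect
    REPLICAS = 2                          # desired number of copies

    def set_replication(path, replicas):
        """Ask HDFS to keep `replicas` copies of everything under `path`."""
        # -R applies recursively; -w waits until replication completes
        # (flag support varies by Hadoop version).
        subprocess.check_call(
            ["hadoop", "fs", "-setrep", "-R", "-w", str(replicas), path])

    if __name__ == "__main__":
        set_replication(HDFS_PATH, REPLICAS)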

22-Sep: IllinoisHEP T3gs Storage
Atlas T3 with grid services
Panda site
  – Production queue (IllinoisHEP-condor)
  – Analysis queue (ANALY_IllinoisHEP-condor)
Software
  – Scientific Linux 5.5 (64 bit)
  – dCache (Chimera) installed via VDT
Hardware
  – 8 nodes (3 doors, 1 head, pNFS, and 3 pool nodes)
  – Dell R710 (E5540, 24 GB) and Intel (E5345, 16 GB)
  – H800/MD1200/2 TB SAS disks (144 TB raw)
  – 12 drives (1 tray) per RAID 5 set with 512 KB stripe size, XFS file system
  – 10 GbE network (HP5400, Intel dual CX4)

22-Sep: IllinoisHEP T3gs Storage (cont.)
Good performance
  – Pool nodes exceed 1 GB/s read and 800 MB/s write via dd, 600 MB/s with Bonnie++
  – FTS transfers over 700 MB/s
Issues
  – dCache and -21 fixed many problems
  – Updating via the VDT package is very easy
Network tuning very important
  – 10 GbE tuning is different from 1 Gb
  – Cards need to be in 8x PCIe slots (R710 has both 8x and 4x)
  – Much larger memory needs
Problems seen with bad tuning
  – Broken network connections
  – Files transferred with errors (bad adler32 checksums); a re-check sketch follows this slide
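Since the bad transfers showed up as adler32 checksum mismatches, re-checking a transferred file locally is cheap. A minimal sketch using Python's zlib follows; the file path is hypothetical, and the reference value to compare against has to come from the DDM/dCache catalogue.

    import zlib

    def adler32_of(path, chunk_size=1024 * 1024):
        """Compute the adler32 checksum of a file, zero-padded to 8 hex digits."""
        value = 1  # adler32 starting value
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                value = zlib.adler32(chunk, value)
        return "%08x" % (value & 0xffffffff)

    if __name__ == "__main__":
        # Hypothetical example: compare against the checksum recorded in the catalogue.
        local = adler32_of("/atlas/data/some_transferred_file.root")
        print("local adler32:", local)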

22-Sep: IllinoisHEP T3gs dCache Tweaks
Some tweaking of dCache parameters recommended by T2 sites:
  – Use 64-bit Java with memory increased to 2048/4096M
  – gsiftpMaxLogin=1024
  – bufferSize=
  – tcpBufferSize=
  – srmCopyReqThreadPoolSize=2000
  – remoteGsiftpMaxTransfers=2000
Use the Berkeley Database for metadata on pool nodes:
  metaDataRepository=org.dcache.pool.repository.meta.db.BerkeleyDBMetaDataRepository

Conclusions
Tier 3 sites come in different varieties
The staffing levels at Tier 3 sites are smaller than at Tier 2 sites
Should keep things as simple as possible, and be as efficient as possible
The success of the Tier 3 program will be measured by physics productivity: papers written, conference talks given, and student theses produced
In the US, Tier 3's are being funded with ARRA funds, so we can expect enhanced oversight
With collaboration with CMS and OSG we can make it a success