Tier 3 architecture
Doug Benjamin, Duke University

ATLAS Analysis Model – analyzer view (Jim Cochran's slide about the Analysis Model)
ESD/AOD, D1PD and D2PD are POOL based; the D3PD is a flat ntuple.
Contents are defined by the physics group(s): made in official production (T0) and remade periodically on T1, or produced outside official production on T2 and/or T3 (by a group, sub-group, or university group).
Analysis chain across T0/T1, T2 and T3: streamed ESD/AOD -> thin/skim/slim -> D1PD -> first-stage analysis -> DnPD -> ROOT -> histograms.
Tier 3's are where people will do a lot of their analysis. "Form must follow function."
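Since D3PDs are flat ntuples, they can be read directly with PyROOT. A minimal sketch; the file name, the tree name "physics" and the branch names jet_n/jet_pt are only illustrative of a typical D3PD, not taken from the slides:

import ROOT

# Open a (hypothetical) D3PD file and loop over its flat ntuple.
f = ROOT.TFile.Open("user.someuser.D3PD.root")
tree = f.Get("physics")                      # assumed tree name

h_pt = ROOT.TH1F("h_pt", "leading jet pT;pT [GeV];events", 100, 0.0, 500.0)
for event in tree:                           # PyROOT event loop
    if event.jet_n > 0:
        h_pt.Fill(event.jet_pt[0] / 1000.0)  # MeV -> GeV

out = ROOT.TFile("hist.root", "recreate")
h_pt.Write()
out.Close()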

Tier 3g design/philosophy
Design a system that is flexible and simple to set up (1 person, < 1 week).
Simple to operate: < 0.25 FTE to maintain.
Scalable with data volumes.
Fast: process 1 TB of data overnight.
Relatively inexpensive.
Run only the needed services/processes; devote most resources to CPUs and disk.
Using common tools will make it easier to support and easier to develop a self-supporting community.
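A quick back-of-the-envelope check of the "1 TB overnight" goal; the 12-hour night is an assumption, not from the slides:

# Aggregate read rate needed to process 1 TB of data in one 12-hour night.
data_bytes = 1.0e12                 # 1 TB (decimal)
night_seconds = 12 * 3600           # assumed length of "overnight"

rate_MB_s = data_bytes / night_seconds / 1.0e6
rate_Mb_s = rate_MB_s * 8
print("required aggregate rate: %.0f MB/s (about %.0f Mb/s)" % (rate_MB_s, rate_Mb_s))

That is roughly 23 MB/s sustained, well within reach of a single file server or a few worker-node disks; the later AOD-analysis slide suggests CPU, not disk, is the tighter constraint.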

Tier 3 integration
 Tier 3 installation instructions before Jan '10.
 Starting to bring Storage Elements online.
◦ Need to know the limits of Atlas DDM before the crunch.
◦ The more sites the better.
 Tier 3's are connected to the Tier 1-2 cloud through the Storage Elements – the focus of integration with the US Atlas computing facilities.
 Once ARRA computing funds arrive, effort will initially focus on sites starting from scratch.

Tier 3 integration phases
Initial phase (now until Jan 2010): develop the design and initial instructions.
Instructions and description are being placed on the ANL wiki; Rik is in the process of organizing the wiki to make it easier to follow.
Goal: have vetted/tested (by many US Atlas members) instructions by January.
Next phase (implementation phase, ~March 2010):
New sites – as ARRA funds arrive, new sites will be assembled.
Existing sites – use the instructions to add services and grow.

Tier 3g configuration (schematic)
Grid Storage Element: Bestman Storage Resource Manager (SRM) fileserver and/or GridFTP server, connecting the site to the US Tier 2 cloud and the Grid.
Interactive node: users log in here and submit to the grid with Pathena/Prun.
Batch system (Condor): runs the user jobs on the worker nodes.
User home areas, Atlas/common software and Atlas data are shared across the cluster via NFSv4 mounts.
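As a rough illustration of the pieces in this layout, the sketch below checks that the main services answer on their usual TCP ports. The host names are placeholders and the ports are commonly used defaults (GridFTP 2811, BeStMan SRM 8443, Condor collector 9618, NFSv4 2049); adjust both to your site.

import socket

# Placeholder hosts; ports are commonly used defaults for each service.
SERVICES = {
    "se.example.edu":     [2811, 8443],  # GridFTP, BeStMan SRM
    "condor.example.edu": [9618],        # Condor collector
    "nfs.example.edu":    [2049],        # NFSv4 server
}

def reachable(host, port, timeout=3):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        socket.create_connection((host, port), timeout).close()
        return True
    except OSError:
        return False

for host, ports in SERVICES.items():
    for port in ports:
        print("%s:%-5d %s" % (host, port, "ok" if reachable(host, port) else "unreachable"))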

Tier 3g – Interactive computing
Users work on an interactive node and submit to the grid with Pathena/Prun; user home areas, Atlas/common software and Atlas data are served from a file server over NFSv4 mounts.
Design mostly done – instructions are on the wiki and being refined.
Learning how people are using the instructions and adapting them accordingly.
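A small sketch of the kind of sanity check a site can run on the interactive and worker nodes to confirm the NFSv4 mounts are in place; the mount points are placeholders for whatever layout the wiki instructions define.

import sys

# Placeholder mount points for home areas, Atlas/common software and data.
EXPECTED = ["/home", "/export/share", "/export/data"]

def mounted_nfs():
    """Return the set of mount points currently mounted as NFS/NFSv4."""
    mounts = set()
    with open("/proc/mounts") as f:
        for line in f:
            device, mountpoint, fstype = line.split()[:3]
            if fstype in ("nfs", "nfs4"):
                mounts.add(mountpoint)
    return mounts

missing = [m for m in EXPECTED if m not in mounted_nfs()]
if missing:
    sys.exit("missing NFS mounts: " + ", ".join(missing))
print("all expected NFS mounts present")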

How data comes to Tier 3g's
Data will come from any US Tier 2 site via the Bestman Storage Resource Manager (SRM) fileserver; installation instructions are in the wiki (your comments are encouraged).
4 T3g's are in Tiers of Atlas (Duke, ANL, Penn and SMU) and are part of the throughput testing; other T3g's have been asked to set up their SE's (all are welcome/encouraged to set one up).
A recent throughput test with the ANL SE (> 500 Mb/s) shows that a $1200 PC (Intel i7 chip / X58 chipset / SL5.3) can be a SE for a small T3.
As T3 Storage Elements come online and are tested, they will be added to the Atlas Data Management System (DDM) as Tiers of Atlas – perhaps GridFTP alone is sufficient.
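Besides DDM subscriptions to the Storage Element, users can pull a dataset by hand with the DQ2 client tools. A minimal wrapper sketch, assuming the DQ2 client environment and a valid grid proxy are already set up, and using a made-up dataset name:

import subprocess

# Hypothetical dataset name; dq2-get fetches its files into the current directory.
dataset = "user.someuser.123456.D3PD.test/"
subprocess.check_call(["dq2-get", dataset])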

Storage Element installation/testing
Instructions for Bestman-Gateway SRM.
Throughput testing and instructions (testing graphs).
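For the throughput testing, a crude local measurement can already be made with a plain file copy onto the mounted storage (real SE tests go through SRM/GridFTP, which this does not exercise); the paths below are placeholders:

import os
import shutil
import time

def copy_rate_mbit(src, dst):
    """Copy src to dst and return the observed rate in Mb/s."""
    size_bytes = os.path.getsize(src)
    start = time.time()
    shutil.copyfile(src, dst)
    elapsed = time.time() - start
    return size_bytes * 8.0 / elapsed / 1.0e6

# Example with placeholder paths, e.g. a ~1 GB test file onto the SE-mounted area:
# print("%.0f Mb/s" % copy_rate_mbit("/tmp/testfile_1GB", "/export/data/testfile_1GB"))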

Tier 3g – Batch/distributed computing
Users submit jobs from the interactive node to the Condor master; Condor runs the user jobs on the worker nodes. Design done – instructions are on the wiki.
AOD analysis at 10 Hz: > 139 hrs – many cores needed.
 A common user interface to the batch system simplifies users' work: ARCOND.
 ANL has developed such an interface (ARCOND), well tested on their system; it will need to be adapted for other Tier 3 sites.
 The latest version of Condor has a mechanism that can be used as a local site mover.
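ARCOND is the recommended front end, but underneath it drives ordinary Condor submissions. A minimal sketch of what that looks like, with a hypothetical wrapper script run_analysis.sh that processes one slice of the sample per job:

import subprocess
import textwrap

# Standard Condor submit-description keywords; run_analysis.sh is hypothetical.
submit = textwrap.dedent("""\
    universe   = vanilla
    executable = run_analysis.sh
    arguments  = $(Process)
    output     = job.$(Process).out
    error      = job.$(Process).err
    log        = jobs.log
    queue 8
""")

with open("analysis.sub", "w") as f:
    f.write(submit)

subprocess.check_call(["condor_submit", "analysis.sub"])   # submits 8 jobs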

Batch system / storage configuration
Condor installation/configuration instructions: DB/RK made two trips to Madison, WI and worked with the Condor and OSG teams – the instructions were developed during the trips.
VDT will provide a yum repository for Condor.
The Condor Team and Duke/ANL will collaborate on testing the new file transfer code in Condor.
OSG wrote a first pass of the XRootD installation instructions.
NFSv4 instructions/scripts: on the web this weekend (a script to configure the NFS mounts already exists).
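The NFSv4 script mentioned above is not reproduced here; the sketch below only generates the kind of fstab entries such a script would manage, with a placeholder server name and export paths:

# Placeholder NFSv4 server and exports; print the lines for review before
# appending them to /etc/fstab and running "mount -a".
ENTRIES = [
    ("nfs.example.edu:/export/home",  "/home",         "nfs4", "rw,hard,intr"),
    ("nfs.example.edu:/export/share", "/export/share", "nfs4", "ro,hard,intr"),
    ("nfs.example.edu:/export/data",  "/export/data",  "nfs4", "ro,hard,intr"),
]

for server, mountpoint, fstype, options in ENTRIES:
    print("%s %s %s %s 0 0" % (server, mountpoint, fstype, options))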

Tier 3g – Data storage options
Two layouts for data flow over the network:
Storage on dedicated storage servers (file servers), with worker nodes having little local storage.
Storage on the worker nodes themselves.
XRootD can be used to manage either type of storage; a draft of the XRootD installation instructions exists.
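Either layout looks the same to users once XRootD manages it: files are addressed through the redirector. A minimal copy sketch with a placeholder redirector host and file path (xrdcp is the standard XRootD copy client):

import subprocess

# Placeholder redirector and file path on the Tier 3 storage.
redirector = "xrootd.example.edu"
remote_file = "root://%s//atlas/data/user.someuser.D3PD.root" % redirector

subprocess.check_call(["xrdcp", remote_file, "/tmp/local_copy.root"])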

Distributed data advantages (data stored on worker nodes)
Disk on a worker node is cheaper than in dedicated file servers.
PROOF is a perfect example of the advantages.
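A minimal PyROOT sketch of a PROOF session; "lite://" starts PROOF-Lite on the local node, while a Tier 3 PROOF cluster would be opened with its master URL instead. The tree name, file path and selector are hypothetical:

import ROOT

proof = ROOT.TProof.Open("lite://")               # or the cluster's master URL

chain = ROOT.TChain("physics")                    # assumed D3PD tree name
chain.Add("/export/data/user.someuser.D3PD.root") # placeholder file on local storage
chain.SetProof()                                  # route Process() through PROOF
chain.Process("MySelector.C+")                    # hypothetical TSelector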

Where to find details
Tier 3 configuration wiki, currently at ANL.
Rik has begun to reorganize the wiki to improve the flow of the instructions and make it easier for users to use.
In draft form (personal opinion – ready for comments and editing by users).

Tier 3 Support/Help
US Atlas provides ½ person for T3 support; Open Science Grid (OSG) will help.
Our software providers are helping (Condor Team, XRootD team): Rik and Doug just spent 2 days working with the OSG and Condor teams to set up and provide documentation for US Atlas Tier 3's.
Tier 3's will be community supported: US Atlas Hypernews and a US Atlas Tier 3 trouble ticket at BNL.

Tier 3 Hypernews
Tier 3's will be community supported: US Atlas Hypernews.

Tier 3 self-organization
As a result of the recent Tier 3 meeting, many people volunteered to help. The effort is organized into 7 areas.
Documentation: Jim Cochran and Iowa State will follow the instructions at ISU; Paul Keener will configure things at Penn; Tom Rockwell will start at MSU; Craig Blocker (Brandeis) will use them on the test cluster at ANL.
Benchmarking (evaluate / develop): Waruna Fernando (OSU), Alden Stradling (UTA).

Tier 3 self-organization (page 2)
Hardware configuration: Justin Ross (SMU) will test blades; Sergey Panitkin (BNL); Waruna Fernando (OSU); Tom Rockwell (MSU).
Cluster management: Tom Rockwell (MSU), Rob Gardner (Chicago), Justin Ross (SMU); Marco Mambelli is surveying OSG sites for their solutions.

Tier 3 efforts (Grid storage management)
Grid storage management: do we need SRM + GridFTP, or GridFTP alone?
Duke, ANL, Penn and SMU are up. A recent FTS change caused the sites to fail until they were reconfigured (we need a better notification mechanism).
Iowa State and Ohio State are setting up Storage Elements.
Duke and ANL will test subscriptions. Iowa State has agreed to test Panda job output to a Tier 3 SRM.
Working to add additional sites when possible.

Tier 3 Future…
Continue to investigate technologies to make Tier 3's more efficient and easier to manage.
The virtual machine effort continues as a collaboration between BNL, Duke, OSU, LBL, UTA and CERN: a Bestman-gateway SRM is running in a VM at BNL as a proof of concept (it will be tuned for performance); an XRootD redirector is next; Condor is part of CernVM.
ANL will set up integration and test clusters. The integration cluster is a small Tier 3 – code is tested there before recommending upgrades in the rest of the Tier 3 cloud (including two virtual clusters – Duke & Dell contributed hardware).

Tier 3 efforts (Virtualization / web file system)
Virtualization: based on Xen and CernVM; other hypervisors will be investigated if needed. Virtualization will allow better standardization (reducing the overall support load). Volunteers: Sergey Panitkin (BNL), Waruna Fernando (OSU), Yushuo Yao (LBNL), Alden Stradling (UTA), DB (Duke).
Web file system: Yushuo Yao (LBNL) has an RPM, tested at Univ. Washington. It looks promising as a solution for distributing the Atlas software; works on a VM or a physical machine; space management issues still need to be evaluated. Squid caches at Tier 3 will be required for best performance.
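Since the web file system relies on a local Squid cache, a simple check is to fetch a small page through the proxy. The host, port (3128 is the Squid default) and URL below are placeholders:

import urllib.request

PROXY = "http://squid.example.edu:3128"   # placeholder Squid host, default port

opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({"http": PROXY}))
try:
    response = opener.open("http://example.org/", timeout=10)
    print("squid reachable, HTTP status", response.status)
except Exception as exc:
    print("squid check failed:", exc)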

Conclusions
 Tier 3's are our tool for data analysis.
◦ We are encouraging user feedback on the design, configuration and implementation.
◦ Several people have volunteered to help on tasks now.
 Tier 3's will be community supported.
◦ We should standardize as much as possible and use common tools (modifying existing tools to make them more inclusive).
 A flurry of Tier 3 activity is planned for the next 5 months.
◦ User involvement now will pay off very soon.
◦ Thanks for the excellent attendance – the Tier 3 meeting shows people are getting involved.