Efforts to Build a T3gs at Illinois
Mark Neubauer, Dave Lesny
University of Illinois at Urbana-Champaign
US ATLAS Distributed Facilities Meeting, LIGO Livingston Observatory, March 3, 2009

Tier-3 Task Force and Site Structure

Key recommendation of the Tier-3 Task Force (T3TF), in my paraphrasing:
- The US ATLAS computing infrastructure should be deeper than the Tier-1 and Tier-2s and accommodate a set of four possible Tier-3 architectures: T3gs, T3g, T3w, and T3af.
- The goal is to bring significant additional computing resources into a system flexible and nimble enough to weather the realities of ATLAS analysis once the data starts flowing from CERN.
- The Tier-2s should be considered a finite and precious resource that must be used efficiently, with as much end-user analysis as feasible done on the local Tier-3s.

What is a T3gs?

T3gs: Tier-3 with Grid Services
- A cluster with software deployed for pAthena jobs, DQ2 services, and possibly an LFC.
- Expected to have "significant" CPU and storage resources: roughly several hundred CPU cores and several tens of TB of disk.
- Expected to have a level of infrastructure (power, cooling), networking, and administrative support commensurate with the available resources.
- Functionally equivalent to a Tier-2 from a services perspective.
- Distinct from a Tier-2 in important ways: size (not likely to be as large as a Tier-2 in terms of available resources) and local policy control (resources funded at the institutional level).

Why bother trying to deploy a T3gs?

Value added from a T3gs:
1) Leverage university and lab-based computing to increase overall capacity for production tasks such as data format production (e.g. D1PD → D2PD) and Monte Carlo production.
   - Given the specific T3 nature of a T3gs, this is expected to happen during intensive periods of critical need (e.g. data reprocessing, MC production before conference seasons).
   - With their T2-like configuration and capabilities, T3gs sites are in the best position to pitch in when needed.
2) Offload from the Tier-2s computationally intensive jobs that do not need high-bandwidth access to data (e.g. matrix-element calculations, MVAs, pseudo-experiment generation for systematic evaluations).
   - This is generically true for any Tier-3 site; a T3gs could also offload data-intensive jobs given its nature.
   - See the T3TF document for some quantitative arguments regarding 1) and 2).
3) Spread around the wealth (of ATLAS computing knowledge).

My candid impression in trying to get our T3gs (T2-like) site together:
- Much of the documentation resides with a few experts (T2 site admins, etc.). Some is contained within a disparate set of Twiki pages and is disseminated when the need arises through e-mail, HN, etc.
- The experts have been very willing to help out when asked!
- To an outsider, this can appear to be a very large, very black box (I know that most don't need to know how this sausage is made...).

Main point: there is some value in outsiders like us trying to replicate a T2-like site, to help shed some light on how all of this works and widen the knowledge base for future T3 installations.

The IllinoisHEP Grid Site

- We've been a participating OSG site for about a year.
- We've been plugged into the Panda job management system on a similar time scale.
- We've participated (not on purpose!) in M* reprocessing.
- We've done FDR data analysis on our site (on purpose).
- Up until a month or so ago, we were peer-configured to use the IU Storage Element.
- We've basically been running as a T3g since before we came up with the name!

IllinoisHEP in Action

- Fully integrated into the pAthena ATLAS job-control infrastructure.
- An analysis job running over the full FDR dataset, sent to the IllinoisHEP Tier-3 site.
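
As an illustration of what that integration looks like from the user side, here is a minimal sketch of a pathena submission driven from Python. The jobOptions file, dataset names, and the ANALY_ILLINOISHEP analysis queue name are illustrative assumptions rather than details taken from the slide, and the PanDA client tools are assumed to be set up in the environment.

    # Minimal sketch: submit an Athena analysis job to a specific PanDA
    # analysis queue with pathena. All names below are illustrative.
    import subprocess

    cmd = [
        "pathena", "AnalysisSkeleton_topOptions.py",    # hypothetical jobOptions file
        "--inDS", "fdr08_run2.physics_Muon.merge.AOD",  # illustrative input dataset name
        "--outDS", "user09.SomeUser.fdr2.muons.test1",  # illustrative output dataset name
        "--site", "ANALY_ILLINOISHEP",                  # assumed analysis queue for the site
    ]
    subprocess.run(cmd, check=True)                     # raises if pathena exits non-zero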

IllinoisHEP in Action (Ganglia Monitoring)

Infrastructure & Support at Illinois

- There is a great deal of infrastructure on campus for scientific HPC: the National Center for Supercomputing Applications (NCSA), an enterprise-quality building in terms of space, power, cooling, and networking. Blue Waters in 2011 will be the world's first sustained-petaflop computer (likely of limited use for our particular applications).
- 10 Gbps connectivity to ICCN, Internet2, etc., with access from our physics building (Loomis Lab) and NCSA via a "Campus Research Network" designed to avoid bottlenecks in the campus firewalls for data-intensive research (we're the poster child for this).
- Loomis Lab has sufficient space, power, cooling, and networking for a few full racks of compute nodes and storage devices.
- Senior Research Physicist Dave Lesny is the one who put all of this together. I have some past experience with scientific computing from putting together and operating the CDF CAF.

Deployment Strategy

Note: We are not a large site at this point in time (16 CPU cores and 20 TB of disk).
- Our focus has not been on deploying a large amount of hardware, but rather on deploying the full set of services required to make our site a functional T3gs.
- Some of our "utility" hardware needs to be upgraded or replaced.
- We've deployed extensive monitoring (via Ganglia) to discover bottlenecks; a sketch of pulling metrics directly from gmond follows this list.
- We'll leave scaling of CPU cores and storage until later.
- We've chosen to deploy our IllinoisHEP site in Loomis Lab rather than NCSA in order to have "full" control over what we do in the development stage.
- Once our T3gs is in production and scaled to a reasonable level, we would like to consider replicating the site at NCSA within a partnership that we are trying to foster (Neubauer is an NCSA faculty fellow). In other words, we would like to house a future "substantial" procurement at NCSA. This depends on a number of factors: a willing partnership with NCSA, demonstrated utility of the Illinois T3gs, and external funding.
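
A minimal sketch of the kind of check the Ganglia deployment enables, assuming the default gmond XML port (8649) and the usual GANGLIA_XML/CLUSTER/HOST/METRIC document layout; the hostname is illustrative.

    # Pull metrics straight from a gmond daemon to look for bottlenecks
    # (load, network traffic) without going through the web front end.
    import socket
    import xml.etree.ElementTree as ET

    def read_gmond(host="ganglia.hep.example.edu", port=8649):  # illustrative host
        chunks = []
        with socket.create_connection((host, port), timeout=10) as s:
            while True:
                chunk = s.recv(65536)
                if not chunk:           # gmond closes the connection after the dump
                    break
                chunks.append(chunk)
        return ET.fromstring(b"".join(chunks))

    root = read_gmond()
    for host in root.iter("HOST"):
        metrics = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
        print(host.get("NAME"),
              "load_one =", metrics.get("load_one"),
              "bytes_in =", metrics.get("bytes_in"),
              "bytes_out =", metrics.get("bytes_out"))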

IllinoisHEP T3gs Architecture (diagram)

Diagram of the site layout: service nodes (GUMS, Condor master / Globus gatekeeper / GridFTP server, Ganglia system monitoring, NAT routers, /home/osg* service, LFC and DQ2 servers backed by MySQL databases), a Compute Element with worker nodes (16 CPU cores), and a Storage Element running dCache (admin node, doors, SRM, PoolManager, pnfs server, and 20 TB of pools). The nodes sit on a private internal network (HP 2900 switch) behind the NAT routers, with a Foundry FastIron SuperX providing 1 Gbps and 2 × 1 Gbps links and a 10 Gbps uplink via the public Campus Research Network to ICCN, Internet2, etc.
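
Given that layout, a quick smoke test of the Compute Element from an OSG client machine might look like the sketch below; it assumes the pre-WS GRAM client (globusrun) and the Condor tools are installed, and the hostnames are illustrative stand-ins for the service nodes in the diagram.

    # Two quick checks of the Compute Element: the Globus gatekeeper
    # answers an authentication-only request, and the Condor pool
    # reports its worker slots. Hostnames are illustrative.
    import subprocess

    GATEKEEPER = "osgx0.hep.example.edu"   # assumed gatekeeper host
    POOL = "condor.hep.example.edu"        # assumed Condor central manager

    # Authentication-only test against the gatekeeper (pre-WS GRAM).
    subprocess.run(["globusrun", "-a", "-r", GATEKEEPER + "/jobmanager-condor"], check=True)

    # Total slot counts for the pool behind the gatekeeper.
    subprocess.run(["condor_status", "-pool", POOL, "-total"], check=True)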

Some Baby Pictures

SE and LFC Tests

- Dave Lesny has been performing stress tests of our Storage Element and LFC, using the SRM interfaces to write and read files and to register the files in our LFC (see the sketch below).
- Hiro has also tested our DDM configuration by doing simple FTS transfers between BNL and IllinoisHEP.
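
A minimal sketch of the kind of write/read/register loop such a stress test runs, assuming the lcg-utils client commands are available and LFC_HOST points at the site LFC; the hostnames, LFN namespace, and iteration count are illustrative.

    # Stress loop: copy a test file to the Storage Element and register
    # it in the LFC, read it back, then clean up the replica and the
    # catalogue entry. All site-specific names are illustrative.
    import os
    import subprocess

    os.environ["LFC_HOST"] = "lfc.hep.example.edu"   # assumed LFC host
    SE = "se00.hep.example.edu"                      # assumed dCache/SRM storage element

    with open("/tmp/t3gs_testfile", "wb") as f:      # small local file to push around
        f.write(os.urandom(1024 * 1024))

    def run(*cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    for i in range(100):                             # iteration count is arbitrary
        lfn = "lfn:/grid/atlas/uiuc/stress/test_%04d" % i
        run("lcg-cr", "--vo", "atlas", "-d", SE, "-l", lfn, "file:/tmp/t3gs_testfile")
        run("lcg-cp", "--vo", "atlas", lfn, "file:/tmp/t3gs_readback_%04d" % i)
        run("lcg-del", "--vo", "atlas", "-a", lfn)   # drop replica and catalogue entry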

Current IllinoisHEP T3gs Status/Plans

- As far as we know, we are fully configured as a T3gs site with the required services (Panda-aware, DQ2 site services, LFC, etc.).
- There are a (small?) number of configuration questions still to be ironed out, such as our DQ2 site name (IllinoisHEP or UIUC) and whether or not the Grid can handle more combinations of UI, IU, UC, …
- At this point, I think we are ready for more high-level tests (e.g. DQ2 subscriptions and the full chain with Panda jobs); a sketch of one such check follows below.
- There are a number of policy questions still to be answered about how the T3gs and the other sites recommended by the T3TF should be used within US ATLAS. Hopefully some will get answered at this workshop.
- It would be fantastic for US ATLAS if Tier-3 site resources could be dynamically allocated to accommodate local analysis needs and to pitch in during times of intensive need for processing power within ATLAS.

Happy sqrt() day to all!
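
As a concrete example of the high-level tests mentioned above, the sketch below checks that a subscribed dataset has actually arrived at the site using the DQ2 end-user tools; the dataset and site names are illustrative, and the option spellings are from memory and should be checked against the installed dq2 client.

    # Verify that a subscribed dataset is replicated at the site, then
    # pull one file locally as an end-to-end sanity check. Dataset and
    # site names are illustrative.
    import subprocess

    DATASET = "fdr08_run2.physics_Muon.merge.AOD"    # illustrative dataset name
    SITE = "ILLINOISHEP"                             # assumed DQ2 site name

    # List replica locations; the site should show up as (in)complete.
    subprocess.run(["dq2-ls", "-r", DATASET], check=True)

    # Fetch a single file from the site as a full-chain check.
    subprocess.run(["dq2-get", "-n", "1", "-s", SITE, DATASET], check=True)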