Computing Operations Roadmap

Computing Operations Roadmap
Kaushik De, University of Texas at Arlington
US ATLAS Tier 2/3 Meeting, UM, May 27, 2008

Introduction
- Expect first data to arrive at the end of summer
- After 6+ years of preparing computing facilities and software systems – are we really ready?
- In this talk, I will review what is ready, and more importantly what remains to be done
- I will also review summer plans (alas, not vacation plans!)

Some Recent Achievements
- Excellent MC production capacity and availability
  - Good completion rates recently (when jobs are available!) – typically 0.5M fully simulated events/day (~5k CPUs)
  - Many Tier 3s ready – more joining soon
- Impressive throughput during CCRC08 tests
  - Tier 1 and Tier 2s demonstrated prompt transfers
- Analysis sites ready
  - But not much data available yet at Tier 2s
  - Need more users!
- Overall, U.S. facilities are in good shape
  - Production hardened, analysis ready, SRM v2.2 deployed (in record time)
  - But the 2008 procurement still needs to be completed

PanDA Production (a few weeks ago)

Production – This Year

CCRC Data Transfers – May 21
- Last 24h CCRC throughput (target: 1 GB/s)
- Last 24h errors
- From Simone Campana

Throughput Details
The rates are almost exactly what one would expect.

Rates (MB/s)   Tape     Disk     Total
BNL            79.63    218.98   298.61
IN2P3          47.78    79.63    127.41
SARA           47.78    79.63    127.41
RAL            31.85    59.72    91.57
FZK            31.85    59.72    91.57
CNAF           15.93    39.81    55.74
ASGC           15.93    39.81    55.74
PIC            15.93    39.81    55.74
NDGF           15.93    39.81    55.74
TRIUMF         15.93    39.81    55.74
Sum            318.5    696.76   1015.28

From Simone Campana
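As a quick cross-check, the per-site rates in the table can be summed to confirm they add up to roughly the nominal 1 GB/s target. The sketch below just recomputes the column totals from the slide's values; the printed sums differ from the slide's "Sum" row by a few hundredths of an MB/s due to rounding in the per-site entries.

```python
# Cross-check of the per-site throughput table from the slide.
# Values copied as-is; totals can differ slightly from the slide's
# "Sum" row because the per-site numbers are themselves rounded.
rates = {  # site: (tape MB/s, disk MB/s)
    "BNL":    (79.63, 218.98),
    "IN2P3":  (47.78, 79.63),
    "SARA":   (47.78, 79.63),
    "RAL":    (31.85, 59.72),
    "FZK":    (31.85, 59.72),
    "CNAF":   (15.93, 39.81),
    "ASGC":   (15.93, 39.81),
    "PIC":    (15.93, 39.81),
    "NDGF":   (15.93, 39.81),
    "TRIUMF": (15.93, 39.81),
}

tape_total = sum(tape for tape, _ in rates.values())
disk_total = sum(disk for _, disk in rates.values())
grand_total = tape_total + disk_total

# The aggregate should land close to the nominal 1 GB/s target.
print(f"tape={tape_total:.2f} disk={disk_total:.2f} total={grand_total:.2f}")
```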

CCRC08 Tier 2 Network Monitoring – Past Week
- Panels shown for SWT2, MWT2, WT2, GLT2

CCRC Plans – This Week
- Plan for a realistic data-taking exercise
  - Data transfers T0->T1->T2, assuming a 14h run
  - Simultaneous MC production – all week
  - Simultaneous reprocessing from tape (M5, M6, FDR data)
  - Also involve T1->T1 transfers at the end of the week
  - Software validation & FDR-2 mixing will continue during this period!
- Finally, a full-scale exercise
  - Except for the distributed analysis load – chaotic, but low so far
- Need help from everyone!

Operating Facilities
A mixture of global and local approaches.
- Global operations
  - Production shifts – many years of experience
    - Works well – now integrated ATLAS-wide
    - Primarily monitor and escalate
  - Distributed Analysis support – being organized
  - Task support – being organized at the global ATLAS level (Nevski, Yu)
- Local operations
  - Tier 1 facilities support – primarily through global support
  - Tier 1 support for Tier 2s
  - Tier 2 support – proactive site monitoring
- Communication works well within the U.S. among the various teams

Are We Ready?
- Everything looks good – thanks to hard work by many of you
- But are we ready? Not quite!
  - Distributed analysis – never exercised at scale
  - Distributed Data Management – still on the critical path (more details in the next talk)
  - Reprocessing – critical test this week
  - Core software – delayed and understaffed
  - Several other areas are still understaffed – transformations, job definition, PanDA core …
  - LRC vs. LFC – needs resolution quickly

Distributed Analysis
- All US sites are ready – but user demand is not very high
- Need to implement load-based brokering – currently most jobs go to BNL (in the computing model, most jobs should go to Tier 2s)
- Data management issues not fully resolved
  - Storage quotas / data archiving / data lifetime issues
  - Storage allocation for DA
- We have no experience (actually, no clue) what the scale will be – urgently need to plan for a larger exercise
- Needs work this summer
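The load-based brokering called for above could, in its simplest form, weigh each site's queued work against its running capacity and steer new jobs away from the most loaded site. The sketch below illustrates only that core idea; the site names and job counts are illustrative, and PanDA's actual brokerage also weighs data locality, software releases, and site status.

```python
# Minimal sketch of load-based brokering: send a new job to the site
# with the lowest ratio of queued to running jobs, so that work spreads
# beyond a single large site. Names and numbers are illustrative only.
def pick_site(site_load):
    """site_load: {site: (running_jobs, queued_jobs)} -> chosen site."""
    def pressure(site):
        running, queued = site_load[site]
        return queued / max(running, 1)  # guard against division by zero
    return min(site_load, key=pressure)

sites = {
    "BNL":  (4000, 6000),  # large site, but heavily queued (pressure 1.5)
    "MWT2": (1200, 300),   # pressure 0.25
    "SWT2": (800,  100),   # lightly loaded (pressure 0.125) -> should win
}
print(pick_site(sites))  # -> SWT2
```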

Tier 3 Integration

Distributed Data Management
- We can get data efficiently from T0 (mostly!)
- We can get data from some Tier 1s – and send data
- Tools to manage the data hierarchy within the US are not ready
  - We recently decided not to rely on central operations
  - The U.S. cloud will be managed by us
  - ATLAS-wide tools are not fully ready – need new U.S. development
  - This will be a major activity during the summer
- Many storage-related issues
  - Data distribution policies
  - Data deletion strategies
  - Allocation among various activities
  - Proper mixture of central vs. local management
  - …
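One of the deletion strategies listed above can be illustrated with a simple least-recently-used policy that frees cache space while never touching custodial replicas. This is only a sketch of the idea, with hypothetical dataset names and sizes, not the behavior of the actual DQ2 deletion tools.

```python
# Sketch of an LRU-style cleanup policy for cached (non-custodial)
# dataset replicas: delete the least recently used replicas until the
# requested space is freed, always skipping custodial copies.
def select_for_deletion(replicas, bytes_needed):
    """replicas: list of (name, size_bytes, last_access_ts, custodial).
    Returns replica names to delete, oldest access first."""
    victims, freed = [], 0
    cached = [r for r in replicas if not r[3]]  # custodial data is kept
    for name, size, _, _ in sorted(cached, key=lambda r: r[2]):
        if freed >= bytes_needed:
            break
        victims.append(name)
        freed += size
    return victims

replicas = [
    ("data08.AOD.r1", 2_000, 100, False),
    ("data08.AOD.r2", 5_000, 500, True),   # custodial: never deleted
    ("mc08.EVNT.e1",  1_000, 50,  False),  # oldest cached replica
]
print(select_for_deletion(replicas, 2_500))  # -> ['mc08.EVNT.e1', 'data08.AOD.r1']
```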

Reprocessing
- Stand-alone tests at BNL & TRIUMF worked well
  - But many steps were manual – not the final configuration
  - Used PandaMover for pre-staging
- New tests over the past two weeks were somewhat successful
  - At all Tier 1s – but only a few test jobs
  - Trying to use DQ2 (PandaMover still at BNL)
- Big tests planned this week
  - New conditions DB tarball
  - Schema changes in PanDA
  - Much higher rate planned
  - Tier 2s will also participate
- Still on the critical path

Organization Issues
- I started as Operations Coordinator in April
  - Main goal – better integration of applications and operations
  - Hope to spend more time on planning this summer!
  - Need input from all of you!
- Deputy Operations Coordinator
  - Armen Vartapetian – recently returned from CERN (after >2 years setting up ATLAS control room operations)
  - Will start in the new role, as my deputy, next week!
  - Will be stationed at BNL
- I also plan to spend 5 weeks at BNL this summer
  - Review overall operations organization and staffing

Summary
- The glass is more than half full
- But we are not fully ready
  - DDM, DA, software … some of the unknowns
- Expect a busy summer
- And then, finally, data!