Wahid Bhimji, Andy Washbrook, and others including the ECDF systems team. Not a comprehensive update, but whatever occurred to me yesterday.

ECDF Grid Tier2
 Cluster shared with other university users: ~3000 cores in total
 Storage just for GridPP: ~175 TB of DPM on dense Dell boxes
 Systems team do most of the hardware/OS work
Local computing
 Run physics-wide; Andy has to interact with it, but I'm not going to talk much about hardware etc. here
 Will talk a bit about ATLAS users running on it

 GridPP hardware cash: most spent on storage ((Dell R + MD1200) x3, plus 2 R510s); 175 -> 350(+) TB delivered (by June)
   Also middleware servers – soon all on newish hardware
 DRI cash: mostly spent by the networking team improving outgoing links to 10 Gig resilient links (matching investment from the university)
   Also a 10 Gig switch for pool servers, new racks, etc.

[Network diagram: new network layout with old links greyed out. King's Buildings (ACF, srif-kb1, kb7) and Appleton Tower (srif-at1, at5) connect through the "SRIF"/EdLAN backbone (S-PoP, S-PoP2, srif-bush1), with JANET links to Leeds (currently active) and Glasgow (currently standby).]

 Generally running smoothly
 A bit of deployment: CVMFS, CREAM CEs, etc.
 Lots (for us) of running jobs (up to ~1700)
   Though we should be able to run 3000

ATLAS availability for April: ECDF at 99%

 In the last week, queues have been up and down
   Cmtsite timeout: other sites see it, but it seems to turn us off more? And why now? Discuss?
 Also DATADISK blacklisting
   Being an "alpha" site for ATLAS with a decent number of jobs means ATLAS expects to be able to place data
   We don't actually have bigboy-level disk … but do have some being commissioned

Involved in three areas for ATLAS multicore readiness:
 AthenaMP performance measurement
   CPU/memory usage, serialisation timing, optimum job length and number of input files
 ATLAS queue validation
   Working with the ATLAS distributed computing team to validate queue readiness for all sites wishing to run ATLAS multicore jobs
   In the UK: RAL, ECDF, Glasgow, Lancaster
 Multicore scheduler simulation (a toy sketch follows below)
   Developed a testbed to simulate scheduler response to jobs requesting different numbers of CPU cores
   Used to identify resource contention that leads to loss of job throughput
See Andy's talk at GridPP and the CHEP 2012 talk: "Multi-core job submission and grid resource scheduling for ATLAS AthenaMP"
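To make the scheduler-simulation idea concrete, here is a minimal toy sketch in Python; it is not the actual testbed. It fills an assumed 3000-core farm with a made-up mix of single-core and 8-core jobs under simple head-of-queue scheduling (no backfilling) and reports the fraction of core-hours left idle. The farm size, job mix and job lengths are all invented numbers.

# Toy scheduler simulation (illustrative only): the farm size, job mix and
# job lengths below are assumptions, not ECDF measurements.
import heapq
import random

TOTAL_CORES = 3000                   # assumed farm size
SIM_HOURS = 48.0                     # assumed length of the simulated period
JOB_MIX = [(1, 0.7), (8, 0.3)]       # (cores requested, probability) - assumed

def draw_job():
    """Pick a job size from the assumed mix and a random run time in hours."""
    r, acc = random.random(), 0.0
    for cores, prob in JOB_MIX:
        acc += prob
        if r < acc:
            return cores, random.uniform(2.0, 8.0)
    return JOB_MIX[-1][0], random.uniform(2.0, 8.0)

def simulate():
    now, free = 0.0, TOTAL_CORES
    running = []                     # heap of (finish_time, cores_held)
    idle_core_hours = used_core_hours = 0.0
    pending = draw_job()             # job waiting at the head of the queue
    while now < SIM_HOURS:
        # Start jobs while the head of the queue fits: an 8-core job that
        # does not fit blocks everything behind it (no backfilling).
        while free >= pending[0]:
            cores, length = pending
            heapq.heappush(running, (now + length, cores))
            free -= cores
            pending = draw_job()
        # Advance to the next completion and account idle/used core-hours.
        finish, done_cores = heapq.heappop(running)
        idle_core_hours += free * (finish - now)
        used_core_hours += (TOTAL_CORES - free) * (finish - now)
        free += done_cores
        now = finish
    total = idle_core_hours + used_core_hours
    print(f"idle-core fraction: {idle_core_hours / total:.1%}")

if __name__ == "__main__":
    simulate()

Raising the multicore share in JOB_MIX makes the idle fraction grow, which is the kind of contention effect the real testbed is used to identify against actual scheduler configurations.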

 ATLAS users getting files locally with dq2-get hit failures and run out of local disk space
 So read interactively on the desktop from the T2 LOCALGROUPDISK instead
   Transfer in is managed with the central ATLAS DaTRI tool
   Files are opened with rfio directly
   They can run on ECDF batch (or Condor) to scale up
   Needs a few instructions for users but works fine
 Users need to use ROOT TTreeCache for decent network reads (see the sketch below)
 Also Tree->GetEntriesFast to avoid slow job start-up if there are lots of small files
 Obviously depends on the network to the T2.
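A minimal PyROOT sketch of that read pattern, as an illustration rather than a recipe: the rfio:// path, the file names, the tree name "physics" and the 30 MB cache size are all placeholders, not real datasets or tuned values.

# Sketch of reading small ntuples straight from the T2 LOCALGROUPDISK.
# The rfio:// path, file names, tree name and cache size are placeholders.
import ROOT

chain = ROOT.TChain("physics")                        # hypothetical tree name
for fname in ["file1.root", "file2.root"]:            # placeholder file list
    chain.Add("rfio:///dpm/ecdf.ed.ac.uk/home/atlas/atlaslocalgroupdisk/" + fname)

# TTreeCache: without it every branch read is a separate round trip to the
# pool server, which is what makes uncached network reads slow.
chain.SetCacheSize(30 * 1024 * 1024)                  # 30 MB, an assumed value
chain.AddBranchToCache("*", True)                     # cache all branches read

# GetEntriesFast gives an upper bound without opening every file up front,
# which keeps job start-up quick when there are many small input files;
# GetEntry returning 0 marks the real end of the chain.
for i in range(chain.GetEntriesFast()):
    if chain.GetEntry(i) <= 0:
        break
    # ... per-event analysis goes here ...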

ECDF running well (fingers crossed)
 Has allowed time for some other interesting stuff
 We are becoming quite a big site from the ATLAS side:
   T2D
   Analysis jobs
   GROUP space for SOFT-SIMUL
   Multicore jobs
 That brings responsibility and expectations on the hardware