Update on D0 and CDF Computing Models and Experience
Frank Wuerthwein, UCSD
For the CDF and D0 collaborations
October 2nd, 2003
Many thanks to A. Boehnlein, W. Merritt, G. Garzoglio

Computing Model
[diagram: the data handling system connects central robotic storage, central farms, remote farms, central analysis systems, remote analysis systems, and user desktops; data types shown: raw data, RECO data, RECO MC, user data]

Vital Statistics

  Vital stats.                   CDF               D0
  Raw data (kB/evt)                                230 (160)
  Reco data (kB/evt)
  Primary user data (kB/evt)
  Format for user skims          Ntuple/DST        TMB
  User skims (kB/evt)
  Reco time (GHz-sec/evt)
  MC chain                       param. & geant    param or full geant
  MC (GHz-sec/evt)                                 1 or
  Peak data rate (Hz)
  Persistent format              RootIO            d0om/dspack
  Total on tape in 9/03 (TB)     480               420

Hardware Cost Drivers
- CDF: user analysis computing
  - Many users run over 10^8-event samples
  - Aggressive bandwidth plans
  - M$ CPU farms needed for FY04/5/6
- D0: (re-)reconstruction
  - Small number of tracker layers makes pattern recognition expensive
  - Reprocessing: 3 months, once a year
  - Prorated cost: M$ CPU farms FY04/5/6
  - Purchase cost: M$ CPU farms FY04/5/6

Offsite Computing Plans

                             CDF       D0
  MC production              today     today
  Primary reconstruction     NO        NO
  Reprocessing               >FY06     FY04
  User-level MC              FY04      NA
  User analysis computing    FY05      ~FY05

Needs and fiscal pressure differ, so the focus differs as well:
- D0: WAN-distributed data handling
- CDF: WAN-distributed user analysis computing

Computing Model
[overview diagram repeated as a section divider]

Sequential Access via Metadata (SAM)
- Flagship CD-Tevatron joint project; initial design work ~7 years ago, pioneered by D0.
- Provides "grid-like" access to D0 and CDF data.
- Comprehensive metadata describe collider and Monte Carlo data.
- Consistent user interface via command line and web.
- Local and wide-area data transport.
- Caching layer.
- Batch adapter support.
- Data needs declared at submission time allow optimized staging.
- Stable SAM operations allow for global support and additional development.
- CDF status: SAM in production by 1/04.
- CDF requires the dCache (DESY/FNAL) caching software.
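To make the "define a dataset by metadata, then consume its files one at a time" pattern concrete, here is a minimal Python sketch of the idea. It is an illustration only: the names (FileRecord, MetadataCatalog, stage_file) are invented for this example and are not the real SAM interfaces.

```python
# Illustrative sketch only: a toy metadata catalog and staging loop in the
# spirit of SAM's metadata-driven data access. Not the real SAM API.
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass
class FileRecord:
    name: str                 # logical file name known to the catalog
    metadata: Dict[str, str]  # e.g. data tier, trigger stream, run range
    location: str             # "tape", "cache", or a station's local disk


class MetadataCatalog:
    """Toy stand-in for a SAM-like metadata catalog."""

    def __init__(self, files: List[FileRecord]):
        self.files = files

    def define_dataset(self, **criteria: str) -> List[FileRecord]:
        """A 'dataset' is just the set of files whose metadata match."""
        return [f for f in self.files
                if all(f.metadata.get(k) == v for k, v in criteria.items())]


def stage_file(record: FileRecord) -> str:
    """Pretend to pull a file from tape into cache and return a local path."""
    if record.location == "tape":
        record.location = "cache"   # in reality the tape/cache layer does this
    return f"/cache/{record.name}"


def consume(dataset: List[FileRecord]) -> Iterator[str]:
    """Hand files to the user job one at a time, staging on demand."""
    for record in dataset:
        yield stage_file(record)


if __name__ == "__main__":
    catalog = MetadataCatalog([
        FileRecord("run167000_em_0001.raw", {"tier": "raw", "stream": "em"}, "tape"),
        FileRecord("run167000_em_0001.tmb", {"tier": "tmb", "stream": "em"}, "cache"),
    ])
    for path in consume(catalog.define_dataset(tier="tmb", stream="em")):
        print("processing", path)
```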

D0 SAM Performance
[plot of D0 SAM throughput; annotation: 8 TB per day]

CDF Data Handling
- dCache
  - 3 caching policies: permanent, volatile, buffer
  - 100 TB disk space; MB/s total read
  - Drawback: data runtime
- Remote systems use SAM today.
- Future vision: SAM -> dCache -> Enstore
  - Storage and caching via Enstore/dCache
  - Cache for the Enstore tape archive
  - Virtualizing "scratch" space without backup
  - Quota, data movement, and metadata via SAM
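A toy sketch of how the three caching policies differ in practice: "permanent" data is pinned, "volatile" data can be evicted to make room, and "buffer" data is a staging area on its way to tape. This is illustrative Python under simplified assumptions, not dCache code.

```python
# Toy illustration of the three dCache-style caching policies named above.
# Class and method names are invented to make the policy distinction concrete.
from collections import OrderedDict


class ToyPool:
    def __init__(self, capacity_gb: int):
        self.capacity_gb = capacity_gb
        self.used_gb = 0
        self.files = OrderedDict()   # name -> (size_gb, policy); roughly LRU order

    def write(self, name: str, size_gb: int, policy: str) -> None:
        assert policy in ("permanent", "volatile", "buffer")
        self._make_room(size_gb)
        self.files[name] = (size_gb, policy)
        self.used_gb += size_gb
        if policy == "buffer":
            # buffer space is a staging area: once the tape copy is safe,
            # the disk copy becomes evictable
            self._flush_to_tape(name)

    def _make_room(self, size_gb: int) -> None:
        # only volatile (and flushed buffer) files may be evicted;
        # permanent files are pinned until explicitly removed
        for name in list(self.files):
            if self.used_gb + size_gb <= self.capacity_gb:
                return
            fsize, policy = self.files[name]
            if policy != "permanent":
                del self.files[name]
                self.used_gb -= fsize
        if self.used_gb + size_gb > self.capacity_gb:
            raise RuntimeError("pool full of pinned (permanent) data")

    def _flush_to_tape(self, name: str) -> None:
        size_gb, _ = self.files[name]
        self.files[name] = (size_gb, "volatile")  # now evictable


if __name__ == "__main__":
    pool = ToyPool(capacity_gb=100)
    pool.write("skim_A.root", 40, "permanent")   # pinned user skim
    pool.write("raw_chunk_1.dat", 40, "buffer")  # passes through to tape
    pool.write("ntuple_B.root", 40, "volatile")  # may evict raw_chunk_1
    print(sorted(pool.files))
```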

CDF DH Performance
[plots: CDF remote SAM usage over the past month; CDF CAF dCache usage yesterday, 900 MB/sec at peak]

Wide Area Networking
[plot: inbound/outbound traffic at the border router for a random week in June; scale marker at 100 Mb/s]
- OC(12) to ESnet today; OC(48) to Starlight proposed (~1 year).
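For scale, a quick comparison of the nominal SONET line rates of those links against the ~100 Mb/s peak visible in the traffic plot (622.08 and 2488.32 Mb/s are the standard OC-12 and OC-48 rates):

```python
# Headroom implied by the link capacities mentioned above, relative to the
# ~100 Mb/s peak read off the border-router traffic plot.
OC12_MBPS = 622.08       # nominal SONET OC-12 line rate
OC48_MBPS = 2488.32      # nominal SONET OC-48 line rate
observed_peak_mbps = 100

print(f"OC-12 headroom: ~x{OC12_MBPS / observed_peak_mbps:.0f} over the observed peak")
print(f"OC-48 headroom: ~x{OC48_MBPS / observed_peak_mbps:.0f} over the observed peak")
```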

Databases
- Both experiments use Oracle on Sun systems.
- Databases are used for SAM, calibration, run and trigger table tracking, and luminosity accounting.
- Different models for each experiment:
  - D0 uses a tiered architecture of user applications, DB servers, and databases.
  - CDF uses direct DB access and replicated databases.
- Neither system is ideal, and perhaps not exactly for technical reasons:
  - Both experiments have severely underestimated requirements, development time, use cases, integration time, and operational burden.
  - Table design is usually not optimal for actual use.
- A personal opinion as a non-expert: poor table design leads to massive DB needs, so early expert help is worth it.
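A minimal sketch of the two access patterns described above, using invented class names rather than either experiment's real code: in the tiered model a shared server process owns the database connections and can cache hot calibration queries, while in the direct model every client talks straight to a (replicated) database.

```python
# Sketch contrasting tiered vs. direct database access. All names here
# (Database, CalibDBServer, direct_access) are assumptions for illustration.
from typing import Any, Dict, Tuple


class Database:
    """Stand-in for an Oracle instance (or a read-only replica)."""

    def query(self, sql: str, params: Tuple[Any, ...]) -> Dict[str, Any]:
        # a real client would go through an Oracle driver here
        return {"sql": sql, "params": params, "rows": []}


class CalibDBServer:
    """Tiered model: one server process owns the DB connections and can cache
    frequently requested calibration constants for many clients."""

    def __init__(self, db: Database):
        self.db = db
        self._cache: Dict[Tuple[str, int], Dict[str, Any]] = {}

    def get_calibration(self, detector: str, run: int) -> Dict[str, Any]:
        key = (detector, run)
        if key not in self._cache:
            self._cache[key] = self.db.query(
                "SELECT * FROM calib WHERE det = :1 AND run = :2",
                (detector, run),
            )
        return self._cache[key]


def direct_access(replica: Database, detector: str, run: int) -> Dict[str, Any]:
    """Direct model: every client issues its own query to a replica."""
    return replica.query(
        "SELECT * FROM calib WHERE det = :1 AND run = :2", (detector, run)
    )


if __name__ == "__main__":
    server = CalibDBServer(Database())            # tiered: clients share this server
    print(server.get_calibration("smt", 167000))
    print(direct_access(Database(), "cot", 152000))  # direct: replica per site
```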

SAM-Grid
- Several aspects to the SAM-Grid project, including Job and Information Management (JIM).
- Part of the SAM joint project, with oversight and effort under the auspices of FNAL/CD and the Particle Physics Data Grid.
- JIM v1.0 is being deployed at several sites; a learning experience for all concerned.
- Close collaboration with the Globus and Condor teams.
- Integration of tools such as runjob and the CAF tools.
- Collaboration/discussions within the experiments on the interplay of LCG with the SAM-Grid efforts.
- Tevatron experiments are working towards a grid environment (interest in OSG).

Computing Model
[overview diagram repeated as a section divider]

D0 Farm Production
- D0 reconstruction farm: 13M event/week capacity, operating at 75% efficiency; events are processed within days of collection. 400M events processed in Run II.
- D0 Monte Carlo farms: 0.6M event/week capacity on globally distributed resources, running full Geant plus reconstruction and trigger simulation.
- Successful effort to start data reprocessing on National Partnership for Advanced Computational Infrastructure (NPACI) resources at the University of Michigan.
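A back-of-the-envelope check of what a 13M event/week farm at 75% efficiency implies in CPU terms. The per-event reconstruction time used below is an assumed, illustrative value, since the actual number from the Vital Statistics table did not survive transcription.

```python
# Rough sizing check for the reconstruction-farm figures quoted above.
events_per_week = 13e6          # quoted farm capacity
efficiency = 0.75               # quoted operating efficiency
reco_ghz_sec_per_event = 50.0   # ASSUMPTION, for illustration only

seconds_per_week = 7 * 24 * 3600
required_ghz = events_per_week * reco_ghz_sec_per_event / (seconds_per_week * efficiency)
print(f"~{required_ghz:.0f} GHz of CPU, i.e. roughly "
      f"{required_ghz / 2:.0f} dual 1-GHz worker nodes")
```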

D0 Remote Facilities
- Survey of guaranteed D0 resources (work in progress).
- 300M-event reprocessing targeted for fall.

Computing Model
[overview diagram repeated as a section divider]

Central Robotics
[tables: data stored and tape counts per library (LTO and STK 9940a/9940b) for D0 and CDF, data to tape as of June 25; one library entry shows 30.7 TB; plot of CDF drive usage, 20 TB at peak]
- Known data loss due to robotics/Enstore for D0: 3 GB!

Computing Model
[overview diagram repeated as a section divider]

CDF Central Analysis Facility
- Workers (600 CPUs):
  - 16 dual Athlon 1.6 GHz / 512 MB RAM
  - 48 dual P3 1.26 GHz / 2 GB RAM
  - 236 dual Athlon 1.8 GHz / 2 GB RAM
  - Fast Ethernet (11 MB/s) and 80 GB job scratch each
- Servers (~180 TB total, 92 servers):
  - IDE RAID50, hot-swap
  - Dual P3 1.4 GHz / 2 GB RAM
  - SysKonnect 9843 Gigabit Ethernet card

Use Model & Interfaces
- Usage model:
  - Develop and debug on the desktop
  - Submit a "sandbox" to the remote cluster
- Interface to the remote process:
  - Submit, kill, ls, tail, top, lock/release queue
  - Access submission logs
  - Attach a gdb session to a running process
- Monitoring:
  - Full information about cluster status
  - Utilization history for hardware and users
  - CPU and I/O consumption for each job
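A sketch of the "develop on the desktop, then submit a sandbox" step. The packaging part is ordinary Python; the submission command at the end uses a placeholder tool name (caf_submit) and invented options, not the real CAF client syntax.

```python
# Illustrative sandbox packaging and submission, under assumed tool names.
import tarfile
from pathlib import Path
from typing import List


def make_sandbox(workdir: str, out_tarball: str = "sandbox.tgz") -> str:
    """Bundle the user's working directory (binaries, scripts, tcl) so the
    identical environment can be unpacked on a remote worker node."""
    with tarfile.open(out_tarball, "w:gz") as tar:
        tar.add(workdir, arcname=Path(workdir).name)
    return out_tarball


def build_submit_command(tarball: str, segments: int, command: str) -> List[str]:
    """Build the submission call: one process per segment, each running the
    user's command inside the unpacked sandbox. In real life this list would
    be handed to the batch submission tool."""
    return ["caf_submit",                 # placeholder name, not the real client
            "--tarball", tarball,
            "--segments", str(segments),
            "--run", command]


if __name__ == "__main__":
    workdir = Path("my_analysis")
    workdir.mkdir(exist_ok=True)          # dummy working dir so the demo runs
    (workdir / "run_analysis.sh").write_text("#!/bin/sh\necho segment $1\n")
    tarball = make_sandbox(str(workdir))
    print(" ".join(build_submit_command(tarball, segments=20,
                                        command="./run_analysis.sh $SEGMENT")))
```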

CAF Utilization
- User perspective:
  - 10,000 jobs launched/day
  - 600 users total
  - 100 users per day
- System perspective:
  - Up to 95% average CPU utilization
  - Typically MB/sec I/O
  - Failure rate ~1/2000
[utilization plot; annotation marks a scheduled power outage]

D0 Central Analysis Systems
- An SGI Origin 2000 (D0mino) with MHz-class processors serves as a 30 TB file server for central analysis and as the central router.
- Central analysis backend: GHz-class AMD machines.
- The desktop cluster CLuED0 is also used as a batch engine and for development.
- Analysis effectiveness (includes reco and all delivery losses): W/Z skim sample / raw data available = 98.2%.

D0 Central Analysis Systems
[plots: CAB usage in CPU-days; event size per analysis platform]

Computing Model
[overview diagram repeated as a section divider]

D0 Remote Analysis
- Projects run at offsite stations Jan-Apr 2003: 2196 (vs. ~25,000 projects run on Fermilab analysis stations).
- Offsite stations: d0karlsruhe, munich, aachen, nijmegen, princeton-d0, azsam1, umdzero, d0-umich, wuppertal, ccin2p3-analysis.
- No accounting yet of what kind of analysis is run; SAM is the tracking mechanism.
[plots: TMB data at IN2P3; analysis verification plot, GridKa]

CDF Remote Analysis
- The CAF is packaged for offsite installation.
- 15 offsite CAFs: 7 active, 2 test, 6 not active.
- Not the only offsite option: GridKa, ScotGrid, and various university clusters.
- Still work to do before offsite installations are fully integrated.
- The biggest issues are in support, training, and operations.

Interactive Grid (PEAC: Proof Enabled Analysis Center)
[architecture diagram with components: user ROOT session, GM, SAM, PM, CE, S, dCache]
Workflow: declare dataset -> translate dataset -> request data -> copy data -> start PROOF -> interactive session.
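The workflow in the diagram, written out as a linear sequence of steps in Python. All function names are invented for illustration; in the real system these steps are coordinated between the user's ROOT session, SAM, dCache, and the PROOF infrastructure.

```python
# Illustrative linearization of the PEAC workflow shown in the diagram.
from typing import List


def declare_dataset(name: str) -> str:
    """The user's ROOT session declares which dataset it wants to analyze."""
    return name


def translate_dataset(dataset: str) -> List[str]:
    """The dataset name is translated (via SAM) into a concrete file list."""
    return [f"{dataset}_{i:04d}.root" for i in range(3)]


def request_and_copy(files: List[str], pool: str = "dcache://ce01") -> List[str]:
    """Files are requested and copied to the compute element's cache."""
    return [f"{pool}/{f}" for f in files]


def start_proof(cached_files: List[str]) -> str:
    """PROOF workers are started over the cached files; the user gets an
    interactive session handle back."""
    return f"proof-session({len(cached_files)} files)"


if __name__ == "__main__":
    files = translate_dataset(declare_dataset("bhel_skim"))
    session = start_proof(request_and_copy(files))
    print("interactive session ready:", session)
```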

Summary and Outlook
- Both experiments have a complete and operationally stable computing model:
  - CDF focused on user analysis computing
  - D0 focused on WAN data handling and reprocessing
- Both experiments are refining their computing systems and evaluating scaling issues:
  - Planning process to estimate needs
  - D0 is costing out a virtual center to meet all needs
- Strong focus on joint projects: SAM, dCache.
- Medium-term goals: convergence with CMS; Open Science Grid.