What do we want DØ Regional Analysis Centers (DØRAC) to do?
–Why do we need a DØRAC?
–What do we want a DØRAC to do?
–What is involved in each of these functions?
DØRAC Face-to-Face Meeting
Apr. 25, 2002
Jae Yu

Why do we need a DØRAM?
Total Run II data size reaches multiple petabytes
–300 TB and 2.8 PB of RAW data in Run IIa and Run IIb
–410 TB and 3.8 PB for RAW+DST+TMB
–1.0×10^9 / 1.0×10^9 events total
At the fully optimized 10 sec/event (40 SpecInt95) reconstruction → 1.0×10^10 seconds for a single reprocessing pass of Run IIa
–Takes 7.6 months using MHz-class (40 SpecInt95) machines at 100% CPU efficiency
–1.5 months with 500 4 GHz machines for Run IIa
–7.5 to 9 months with 500 4 GHz machines for Run IIb
–Time for data transfer, occupying 100% of a gigabit (125 MB/s) network:
 3.2×10^6 / 3.2×10^7 seconds to transfer the entire data set (a full year at 100% OC3 bandwidth)
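
These timing claims follow from simple arithmetic. As a sanity check, the sketch below reproduces them in Python from the figures quoted on this slide; the only assumptions not taken directly from the slide are the 30-day month and that the 7.6-month figure refers to the same 500-machine farm quoted for the 4 GHz case.

```python
# Back-of-the-envelope check of the Run IIa numbers quoted on this slide.
SECONDS_PER_MONTH = 30 * 24 * 3600     # assumed 30-day month

events_run2a  = 1.0e9                  # events (slide)
sec_per_event = 10.0                   # s/event at 40 SpecInt95 (slide)
farm_size     = 500                    # machines; assumed same farm size as the 4 GHz bullets

total_cpu_seconds = events_run2a * sec_per_event        # 1.0e10 s, as quoted
months_on_farm = total_cpu_seconds / farm_size / SECONDS_PER_MONTH
print(f"one reprocessing pass: {total_cpu_seconds:.1e} CPU-s "
      f"= {months_on_farm:.1f} months on {farm_size} baseline machines")

run2a_raw_dst_tmb_bytes = 410e12       # 410 TB RAW+DST+TMB for Run IIa (slide)
link_bytes_per_s        = 125e6        # saturated gigabit link, 125 MB/s (slide)
transfer_s = run2a_raw_dst_tmb_bytes / link_bytes_per_s
print(f"Run IIa data transfer: {transfer_s:.1e} s "
      f"= {transfer_s / 86400:.0f} days on a saturated gigabit link")
```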

Data should be readily available for expeditious analyses
–Preferably disk resident, so that the time spent caching is minimized
Analysis compute power should be available without users having to rely on the CAC
MC generation should be done transparently
Should exploit compute resources at remote sites
Should exploit human resources at remote sites
Minimize resource needs at the CAC
–Different resources will be needed

Proposed DØRAM Architecture (diagram)
A hierarchy with the Central Analysis Center (CAC) at FNAL at the top, Regional Analysis Centers (RACs) that store and process 10–20% of all data, Institutional Analysis Centers (IACs), and Desktop Analysis Stations (DAS). The diagram marks both normal and occasional interaction/communication paths between the tiers.
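
As a rough illustration of how the tiers in this diagram relate, here is a minimal Python sketch; the class, site names, and data fractions are illustrative only and not part of any DØ software.

```python
from dataclasses import dataclass, field

@dataclass
class Site:
    """One node in the proposed DØRAM hierarchy (names and numbers are illustrative)."""
    name: str
    tier: str                       # "CAC", "RAC", "IAC", or "DAS"
    data_fraction: float = 0.0      # fraction of all data stored/processed at this site
    children: list = field(default_factory=list)

# CAC at FNAL feeds RACs (10-20% of the data each), which serve IACs, which serve desktops.
cac = Site("FNAL", "CAC", data_fraction=1.0, children=[
    Site("RAC-1", "RAC", data_fraction=0.15, children=[
        Site("IAC-a", "IAC", children=[Site("DAS-1", "DAS")]),
        Site("IAC-b", "IAC"),
    ]),
])
print(cac.name, [c.name for c in cac.children])
```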

What is a DØRAC?
An institute with large, concentrated, and available computing resources
–Many 100s of CPUs
–Many 10s of TB of disk cache
–Many 100s of MB/s of network bandwidth
–Possibly equipped with HPSS
An institute willing to provide services to a few small institutes in the region
An institute willing to provide increased infrastructure as the data from the experiment grows
An institute willing to provide support personnel if necessary
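
Read as a minimum resource profile, this list can be captured in a few lines; the thresholds in the sketch below are the slide's order-of-magnitude figures, not official requirements.

```python
from dataclasses import dataclass

@dataclass
class SiteResources:
    cpus: int               # number of CPUs available
    disk_cache_tb: float    # TB of disk cache
    bandwidth_mb_s: float   # sustained network bandwidth, MB/s
    has_mss: bool           # mass storage system such as HPSS (optional per the slide)

def qualifies_as_rac(r: SiteResources) -> bool:
    """Order-of-magnitude check against the profile listed on this slide."""
    return (r.cpus >= 100                 # "many 100s of CPUs"
            and r.disk_cache_tb >= 10     # "many 10s of TBs of disk cache"
            and r.bandwidth_mb_s >= 100)  # "many 100s of MB/s of bandwidth"

print(qualifies_as_rac(SiteResources(cpus=400, disk_cache_tb=30,
                                     bandwidth_mb_s=120, has_mss=True)))
```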

Chip's W cross-section measurement (figure: analysis-flow diagram with numbered steps)

What services do we want a DØRAC to provide?
1. Provide intermediary code distribution
2. Generate and reconstruct MC data sets
3. Accept and execute analysis batch job requests
4. Store data and deliver it upon request
5. Participate in re-reconstruction of the data
6. Provide database access
7. Provide manpower support for the above activities

Code Distribution Service
Current releases: 4 GB total → will grow to >8 GB?
Why needed?
–Downloading 8 GB once every week is not a big load on network bandwidth
–Efficiency of release updates relies on network stability
–Exploit remote human resources
What is needed?
–Release synchronization must be done at all RACs every time a new release becomes available
–Potentially large disk space to keep releases
–UPS/UPD deployment at the RACs
 FNAL specific
 Interaction with other systems?
–Administrative support for bookkeeping
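
Release synchronization could be as simple as a mirror pulled whenever a new release appears. The sketch below assumes rsync over ssh is available; the host name and paths are hypothetical placeholders, not actual DØ locations.

```python
import subprocess

# Hypothetical locations: a release area exported by the CAC and a local mirror at a RAC.
CAC_RELEASE_AREA = "cac.example.org:/d0dist/releases/"
RAC_MIRROR = "/data/d0dist/releases/"

def sync_releases() -> None:
    """Pull new or updated release files from the CAC release area (sketch, not DØ tooling).
    Requires rsync and ssh access to the source host."""
    subprocess.run(["rsync", "-a", "--delete", CAC_RELEASE_AREA, RAC_MIRROR], check=True)

if __name__ == "__main__":
    sync_releases()
```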

Generate and Reconstruct MC Data
Currently done 100% at remote sites
Why needed?
–Extremely self-contained: code distribution is done via a tar-ball
–Demand will grow
–Exploit available compute resources
What is needed?
–A mechanism to automate request processing
–A Grid that can
 Accept job requests
 Package the job
 Identify and locate the necessary resources
 Assign the job to the selected institution
 Provide status to the users
 Deliver or keep the results
Perhaps the most undisputed task, but do we need a DØRAC for it?
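
The grid workflow itemized above amounts to a request loop: accept, package, locate, assign, track, deliver. The sketch below is an illustrative skeleton of that loop; the job format, site table, and function bodies are invented for illustration and do not represent SAM-Grid interfaces.

```python
from dataclasses import dataclass, field

@dataclass
class MCRequest:
    request_id: int
    generator_card: str               # placeholder for the MC generation parameters
    events: int
    status: str = "queued"
    results: list = field(default_factory=list)

def package(req: MCRequest) -> dict:
    """Bundle the request with the release tar-ball it needs (sketch only)."""
    return {"id": req.request_id, "tarball": "d0_release.tar.gz", "card": req.generator_card}

def locate_site(sites: dict) -> str:
    """Pick the site with the most free CPUs (an illustrative policy, nothing more)."""
    return max(sites, key=lambda name: sites[name]["free_cpus"])

def run_request(req: MCRequest, sites: dict) -> None:
    job = package(req)
    site = locate_site(sites)
    req.status = f"job {job['id']} running at {site}"
    # ... here the job would be submitted to the site's batch system and polled ...
    req.results.append(f"{site}:/mc_output/request_{req.request_id}")
    req.status = "done"

sites = {"RAC-1": {"free_cpus": 300}, "RAC-2": {"free_cpus": 120}}
req = MCRequest(request_id=1, generator_card="pythia_wjets.card", events=100_000)
run_request(req, sites)
print(req.status, req.results)
```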

Batch Job Processing
Currently relies on FNAL resources
–D0mino, ClueD0, CLUBS, etc.
Why needed?
–Bring the compute resources closer to the users
–Distribute the computing load over available resources
–Allow remote users to process their jobs expeditiously
–Exploit the available compute resources
–Minimize the resource load at the CAC
–Exploit remote human resources

Batch Job Processing cont'd
What is needed?
–Sufficient computing infrastructure to process the requests
 Network
 CPU
 Cache storage
–Access to relevant databases
–A Grid that can
 Accept job requests
 Package the job
 Identify and locate the necessary resources
 Assign the job to the selected institution
 Provide status to the users
 Deliver or keep the results
This task definitely needs a DØRAC
–What do we do with the input data? Keep it at the RACs?
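
For analysis batch jobs the key scheduling question is data locality: a job should preferably run where its input is already cached, which is also why the "keep the input at the RACs?" question matters. A toy routing policy, with hypothetical site and dataset names:

```python
def choose_rac(dataset: str, racs: dict) -> str:
    """Prefer a RAC that already caches the input dataset; otherwise take the
    least-loaded one. Illustrative policy only, not SAM-Grid brokering."""
    cached = [name for name, info in racs.items() if dataset in info["cached_datasets"]]
    candidates = cached if cached else list(racs)
    return min(candidates, key=lambda name: racs[name]["queued_jobs"])

racs = {
    "RAC-1": {"cached_datasets": {"tmb_run2a_v1"}, "queued_jobs": 40},
    "RAC-2": {"cached_datasets": set(), "queued_jobs": 5},
}
print(choose_rac("tmb_run2a_v1", racs))   # RAC-1: the data is already on disk there
```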

Data Caching and Delivery
Currently only at FNAL
Why needed?
–Limited disk cache at FNAL
 Tape access is needed
 Latencies are involved, sometimes very long
–Delivering data over the network to all requesters within a reasonable time is imprudent
–Reduce the resource load on the CAC
–Data should be readily available to users with minimal delivery latency

Data Caching and Delivery cont'd
What is needed?
–Need to decide what data, and how much of it, we want to store
 100% of the TMB
 10–20% of the DST?
 Any RAW data at all?
 What about MC? 50% of the actual data
–Data should be on disk to minimize caching latency
 How much disk space? (10 TB for the Run IIa TMB)
–Constant shipment of data from the CAC to all RACs
 Constant bandwidth occupation (14 MB/sec for Run IIa RAW)
 Resources at the CAC are needed
–A Grid that can
 Locate the data (SAM can do this already…)
 Tell the requester the extent of the request
 Decide whether to move the data or pull the job over
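
The last bullet is essentially a cost comparison between shipping the data to the job and shipping the job (and its output) to the data. A toy version of that decision is sketched below; the link rate and sizes are illustrative assumptions, and a real broker would weigh far more than byte counts.

```python
def move_data_or_job(dataset_gb: float, output_gb: float, link_mb_s: float = 30.0) -> str:
    """Toy decision rule: send whichever direction moves fewer bytes, and report the
    transfer time on an assumed link. A real broker would also weigh queue depth,
    CPU availability, and tape-staging latency."""
    move_gb = min(dataset_gb, output_gb)
    minutes = move_gb * 1024 / link_mb_s / 60
    choice = ("pull the job to the data" if output_gb < dataset_gb
              else "move the data to the job")
    return f"{choice} (~{minutes:.1f} min of transfer at {link_mb_s:.0f} MB/s)"

# e.g. a 500 GB TMB slice whose job writes a 2 GB root-tuple:
print(move_data_or_job(dataset_gb=500, output_gb=2))
```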

Data Reprocessing Services
These include:
–Re-reconstruction of the data
 From DST? From RAW?
–Re-streaming of the data
–Re-production of the TMB data sets
–Re-production of root-trees
–Ab initio reconstruction
Currently done only at the CAC offline farm

Reprocessing Services cont'd
Why needed?
–The CAC offline farm will be busy with fresh data reconstruction
 Only 50% of the projected capacity is used for this, but…
 Re-reconstruction gets harder as more data accumulates
–We will have to
 Reconstruct a few times (>2) to improve the data
 Re-stream the TMB
 Re-produce TMBs from DST and RAW
 Re-produce root-trees
–It will take many months to re-reconstruct the accumulated data
 1.5 months with 500 4 GHz machines for Run IIa
 7.5 to 9 months for full reprocessing of Run IIb
–Exploit the large resources at remote institutions
–Expedite reprocessing for expeditious analyses
 Cutting the time down by a factor of 2 to 3 will make a difference
–Reduce the load on the CAC offline farm
–If the CAC offline farm runs into trouble, the RACs can even help out with ab initio reconstruction

Reprocessing Services cont'd
What is needed?
–Permanent storage of the necessary data, because it would take too long just to transfer it
 DSTs
 RAW
–Large data storage
–Constant data transfer from the CAC to the RACs as we take and reconstruct data
 A dedicated file server for data distribution to the RACs
 Constant bandwidth occupation
 Sufficient buffer storage at the CAC in case the network goes down
 A reliable and stable network
–Access to relevant databases
 Calibration
 Luminosity
 Geometry and magnetic field map
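
The buffer-storage requirement follows directly from the shipping rate and the longest network outage one wants to ride out. A small sketch, using the ~14 MB/s Run IIa RAW rate quoted on the earlier caching slide; the outage durations are assumed.

```python
def buffer_tb_needed(rate_mb_s: float, outage_hours: float) -> float:
    """Disk buffer (TB) the CAC needs to hold data while the link to a RAC is down."""
    return rate_mb_s * outage_hours * 3600 / 1e6   # MB accumulated, converted to TB

# ~14 MB/s is the Run IIa RAW shipping rate quoted earlier; outage lengths are assumed.
for hours in (6, 24, 72):
    print(f"{hours:3d} h outage -> {buffer_tb_needed(14, hours):.2f} TB of buffer")
```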

Database Access Service
Currently done only at the CAC
Why needed?
–For data analysis
–For reconstruction of the data
–To exploit available resources
What is needed?
–Remote DB access software services
–Some copies of the DB at the RACs
–A substitute for the Oracle DB at remote sites
–A means of synchronizing the DBs
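
One lightweight reading of "a substitute for the Oracle DB at remote sites" is a periodically refreshed local snapshot of the read-mostly tables. The sketch below uses SQLite as a stand-in local store; the table layout and values are invented for illustration and are not the DØ schema.

```python
import sqlite3

def refresh_calibration_snapshot(rows, db_path="rac_calib_cache.db") -> None:
    """Rebuild a local read-only snapshot of a calibration table.
    `rows` stands in for whatever the central (Oracle) DB export provides."""
    con = sqlite3.connect(db_path)
    con.execute("DROP TABLE IF EXISTS calib")
    con.execute("CREATE TABLE calib (channel INTEGER PRIMARY KEY, gain REAL, run INTEGER)")
    con.executemany("INSERT INTO calib VALUES (?, ?, ?)", rows)
    con.commit()
    con.close()

# Example rows as they might be exported from the central DB (values invented).
refresh_calibration_snapshot([(1, 1.02, 151000), (2, 0.98, 151000)])
print(sqlite3.connect("rac_calib_cache.db").execute("SELECT COUNT(*) FROM calib").fetchone())
```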

Reprocessing Services cont'd
–Transfer of the new TMBs and root-trees to other sites
–Well-synchronized reconstruction code
–A Grid that can
 Identify resources on the net
 Optimize resource allocation for the most expeditious reproduction
 Move data around if necessary
–A dedicated block of time for concentrated CPU usage if disaster strikes
–Questions
 Do we keep copies of all data at the CAC?
 Do we ship DSTs and TMBs back to the CAC?
This service is perhaps the most debatable one, but I strongly believe it is one of the most valuable functions of a RAC.