Markus Frank (CERN) & Albert Puig (UB)

Outline:
- An opportunity (motivation)
- Adopted approach
- Implementation specifics
- Status
- Conclusions

[Diagram: LHCb online data flow — the Readout Network feeds the online cluster (event selection, CPU farm), which sends data to the data logging facility and on to storage.]

Online cluster (event selection):
- ~16,000 CPU cores foreseen (~1,000 boxes)
- Environmental constraints:
  - U boxes space limit
  - 50 x 11 kW cooling/power limit
- Computing power equivalent to that provided by all the Tier 1s to LHCb.
Storage system:
- 40 TB installed
- MB/s

Significant idle time of the farm:
- during the LHC winter shutdown (~ months),
- during beam periods, during experiment and machine downtime (~ hours).
Could we use it for reconstruction?
+ The farm is fully LHCb controlled.
+ Good internal network connectivity.
- Slow disk access (only fast for very few nodes, via a Fibre Channel interface).

Background information:
- One file (2 GB) contains ~60k events.
- It takes 1-2 s to reconstruct an event.
Consequences:
- Cannot reprocess à la Tier-1 (1 file per core).
- Cannot perform reconstruction in short idle periods: each file takes 1-2 s/evt * 60k evt, i.e. ~1 day.
- Insufficient storage, or CPUs not used efficiently:
  - Input: 32 TB (16,000 files * 2 GB/file)
  - Output: ~44 TB (16,000 files * 60k evt * 50 kB/evt)
A different approach is needed: a distributed reconstruction architecture. (These estimates are checked in the sketch below.)
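As a quick sanity check of these numbers, a minimal sketch using only the figures quoted above (file count, file size, events per file, per-event time and output event size):

```python
# Back-of-the-envelope check of the estimates quoted on the previous slide.
N_FILES = 16_000               # input files
FILE_SIZE_GB = 2.0             # per file
EVENTS_PER_FILE = 60_000       # ~60k events per file
RECO_S_PER_EVENT = (1.0, 2.0)  # quoted range, seconds per event
OUTPUT_KB_PER_EVENT = 50.0     # reconstructed event size

# One file reconstructed sequentially on a single core:
hours = [t * EVENTS_PER_FILE / 3600 for t in RECO_S_PER_EVENT]
print(f"one file, one core: {hours[0]:.0f}-{hours[1]:.0f} h")   # ~17-33 h, i.e. ~1 day

# Total input and output volumes:
input_tb = N_FILES * FILE_SIZE_GB / 1000
output_tb = N_FILES * EVENTS_PER_FILE * OUTPUT_KB_PER_EVENT / 1e9
print(f"input {input_tb:.0f} TB, output {output_tb:.0f} TB")    # 32 TB in, ~48 TB (~44 TiB) out
```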

Files are split into events, which are distributed to many cores performing the reconstruction:
- First idea: full parallelization (1 file over the 16k cores).
  - Reconstruction time: 4-8 s per file.
  - Full speed not reachable (only one file open at a time!).
- Practical approach: split the farm into slices of subfarms (1 file per slice of n subfarms).
  - Example: 4 concurrently open files yield a reconstruction time of ~30 s/file (see the estimate below).
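A rough estimate of the per-file time as a function of how many files are open concurrently, again using only the figures from the previous slides:

```python
# Per-file reconstruction time when the farm is sliced so that
# `concurrent_files` files are processed at the same time.
TOTAL_CORES = 16_000
EVENTS_PER_FILE = 60_000
RECO_S_PER_EVENT = 2.0          # upper end of the quoted 1-2 s range

def time_per_file(concurrent_files: int) -> float:
    cores_per_file = TOTAL_CORES // concurrent_files
    return EVENTS_PER_FILE * RECO_S_PER_EVENT / cores_per_file

print(f"{time_per_file(1):.1f} s")   # ~7.5 s: full parallelization, single open file
print(f"{time_per_file(4):.1f} s")   # ~30 s: four concurrently open files
```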

[Architecture diagram: the ECS controls and allocates the resources (storage nodes and subfarms behind a switch); a Reco Manager with its database does the job steering; DIRAC provides the connection to the LHCb production system. See A. Tsaregorodtsev's talk, "DIRAC3 - the new generation of the LHCb grid software".]

[Same architecture diagram, repeated to introduce the ECS (control and allocation of resources) part discussed next.]

Control uses the standard LHCb ECS software:
- Reuse of existing components for storage and subfarms.
- New components for reconstruction-tree management.
- See Clara Gaspar's talk (The LHCb Run Control System).
It allocates, configures and starts/stops the resources (storage and subfarms):
- Task initialization is slow, so tasks are not restarted on every file change.
- Idea: tasks sleep during data-taking and are only restarted on configuration change.

[Diagram: PVSS control tree of a subfarm — 50 subfarms with 1 control PC each, 4 PCs per subfarm with 8 cores/PC and 1 reconstruction task per core, plus tasks for event (data) processing and event (data) management.]

Two basic building blocks:
- Data processing block (processing node):
  - Producers put events into a buffer manager (MBM).
  - Consumers receive events from the MBM.
- Data transfer block (source node to target node):
  - Senders access events from the MBM on the source node.
  - Receivers get the data and declare it to the MBM on the target node.
A minimal sketch of this producer/consumer pattern follows.
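Purely as an illustration of the producer/consumer pattern the MBM implements (the real MBM is a shared-memory buffer manager; this is not its API), a minimal in-process sketch using a bounded queue:

```python
import queue
import threading

# Stand-in for the MBM: a bounded buffer decoupling producers from consumers.
buffer_manager = queue.Queue(maxsize=100)

def process(event):
    """Placeholder for the real work (reconstruction, or forwarding by a Sender)."""
    pass

def producer(events):
    """Producer: declares events to the buffer manager."""
    for event in events:
        buffer_manager.put(event)   # blocks when the buffer is full
    buffer_manager.put(None)        # end-of-data marker (illustrative convention)

def consumer():
    """Consumer: receives events from the buffer manager and processes them."""
    while True:
        event = buffer_manager.get()
        if event is None:           # end of data
            break
        process(event)

threads = [threading.Thread(target=producer, args=(range(1000),)),
           threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```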

[Diagram: data flow — on the storage nodes a Storage Reader feeds the input to a Sender; on each worker node a Receiver delivers the events to Brunel reconstruction tasks (one per core), whose output goes through a Sender back to a Receiver and a Storage Writer on the storage nodes.]

[Same architecture diagram, repeated to introduce the Reco Manager / job-steering part discussed next.]

- Granularity down to the file level; the individual event flow is handled automatically by the allocated resource slices.
- Reconstruction means specific actions in a specific order, so each file is treated as a finite state machine (FSM) with states TODO, PREPARING, PREPARED, PROCESSING, DONE and ERROR.
- Reconstruction information is stored in a database:
  - system status,
  - protection against crashes.
A sketch of such a per-file FSM is given below.
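A minimal sketch of what such a per-file state machine could look like; the state names come from the slide, while the transition rules, method names and file name are assumptions for illustration:

```python
from enum import Enum, auto

class FileState(Enum):
    TODO = auto()
    PREPARING = auto()
    PREPARED = auto()
    PROCESSING = auto()
    DONE = auto()
    ERROR = auto()

# Allowed transitions, following the order shown on the slide;
# any non-final state may fall into ERROR.
TRANSITIONS = {
    FileState.TODO:       {FileState.PREPARING, FileState.ERROR},
    FileState.PREPARING:  {FileState.PREPARED, FileState.ERROR},
    FileState.PREPARED:   {FileState.PROCESSING, FileState.ERROR},
    FileState.PROCESSING: {FileState.DONE, FileState.ERROR},
}

class FileFSM:
    """One FSM instance per input file; its state would be persisted in the database."""

    def __init__(self, lfn: str):
        self.lfn = lfn
        self.state = FileState.TODO

    def move_to(self, new_state: FileState) -> None:
        if new_state not in TRANSITIONS.get(self.state, set()):
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state
        # here the new state would also be written to the database (crash protection)

fsm = FileFSM("/lhcb/data/run1234/file0001.raw")   # hypothetical file name
for s in (FileState.PREPARING, FileState.PREPARED, FileState.PROCESSING, FileState.DONE):
    fsm.move_to(s)
```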

Job steering is done by a Reco Manager, which:
- holds each FSM instance and moves it through the states based on the feedback from the static resources;
- sends commands to the readers and writers (which files to read, which filenames to write to);
- interacts with the database.
A sketch of such a steering loop follows.
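Purely as an illustration of this division of labour (none of the class or method names below are taken from the actual LHCb code), a steering loop reusing the FileFSM sketch above might look roughly like this:

```python
class RecoManager:
    """Hypothetical sketch: steers the per-file FSMs according to resource feedback."""

    def __init__(self, database, reader, writer):
        self.database = database   # persistent store of the file states
        self.reader = reader       # storage-reader control interface (assumed)
        self.writer = writer       # storage-writer control interface (assumed)
        self.files = {}            # lfn -> FileFSM

    def add_file(self, lfn: str) -> None:
        self.files[lfn] = FileFSM(lfn)
        self.database.save(lfn, FileState.TODO.name)

    def on_feedback(self, lfn: str, event: str) -> None:
        """Called whenever a reader/writer/worker reports progress for a file."""
        fsm = self.files[lfn]
        if event == "slice_allocated":
            fsm.move_to(FileState.PREPARING)
            self.reader.open(lfn)              # tell the reader which file to read
            self.writer.open(lfn + ".reco")    # and the writer which file to produce
            fsm.move_to(FileState.PREPARED)
        elif event == "processing_started":
            fsm.move_to(FileState.PROCESSING)
        elif event == "file_done":
            fsm.move_to(FileState.DONE)
        elif event == "failure":
            fsm.move_to(FileState.ERROR)
        self.database.save(lfn, fsm.state.name)
```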

- The Online farm will be treated like a CE (computing element) connected to the LHCb production system.
- Reconstruction is formulated as DIRAC jobs, managed by DIRAC WMS agents.
- DIRAC interacts with the Reco Manager through a thin client, not directly with the database.
- Data transfer in and out of the Online farm is managed by DIRAC DMS agents.

Current performance is constrained by the hardware:
- Reading from disk: ~130 MB/s; the Fibre Channel link is saturated with 3 readers, since a reader saturates its CPU at 45 MB/s.
- Test with a dummy reconstruction (data just copied from input to output):
  - stable throughput of 105 MB/s,
  - constrained by the Gigabit network (an upgrade to 10 Gigabit is planned).
- Resource handling and the Reco Manager are implemented.
- Integration into the LHCb production system was recently decided, but is not implemented yet.
(The bandwidth figures are cross-checked below.)
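A quick cross-check of why these particular numbers are the limits, using only the figures quoted above plus the standard Gigabit Ethernet wire speed:

```python
# Why three readers saturate the Fibre Channel link, and why ~105 MB/s is network-bound.
READER_MB_S = 45          # one reader saturates its CPU at ~45 MB/s
DISK_MB_S = 130           # measured read rate from disk over Fibre Channel
GIGABIT_MB_S = 1000 / 8   # ~125 MB/s theoretical ceiling of a 1 Gbit/s link

print(3 * READER_MB_S)    # 135 MB/s >= 130 MB/s: three readers are enough to saturate the link
print(GIGABIT_MB_S)       # 125 MB/s: the observed 105 MB/s sits just below the Gigabit ceiling
```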

Software:
- Pending: implementation of the thin client for interfacing with DIRAC.
Hardware:
- Network upgrade to 10 Gbit in the storage nodes before the summer.
- More subfarms and PCs to be installed: from the current ~4,800 cores to the planned ~16,000.

- The LHCb Online cluster needs huge resources for the event selection of LHC collisions.
- These resources have a lot of idle time (~50% of the time).
- They can be used during idle periods by applying a parallelized architecture to data reprocessing.
- A working system is already in place, pending integration into the LHCb production system.
- Planned hardware upgrades to meet the DAQ requirements should overcome the current bandwidth constraints.

[Backup slide: task FSM — states OFFLINE, NOT READY and READY, with transitions load, configure, start, unload, reset, stop, recover and "task dies".]

[Backup slide: farm layout — storage, switch, I/O nodes and subfarms.]

[Backup slide: architecture variant — PVSS control and allocation of resources, Reco Manager and database for job steering, DIRAC connection to the production system, with storage, switch, I/O nodes and subfarms.]

[Backup slide: the same architecture annotated with the data-flow tasks — Disk Reader, Event Sender, Storage Reader, Receiver, Brunel, Storage Writer and Sender.]

[Backup slide: detailed data flow — a Disk Reader and Event Sender on the I/O node read events over Fibre Channel; a Storage Reader/Sender and a Receiver/Storage Writer sit on the storage node; on each worker node a Receiver feeds Brunel reconstruction tasks (one per core), whose output goes to a Sender.]