Markus Frank (CERN) & Albert Puig (UB)


1 Markus Frank (CERN) & Albert Puig (UB)

2
- An opportunity (Motivation)
- Adopted approach
- Implementation specifics
- Status
- Conclusions

3
[Diagram: Readout Network, CPU, Online cluster (event selection), Storage, Data logging facility]

4
- ~16000 CPU cores foreseen (~1000 boxes)
- Environmental constraints:
  - 2000 1U boxes space limit
  - 50 x 11 kW cooling/power limit
- Computing power equivalent to that provided by all Tier-1s to LHCb.
- Storage system:
  - 40 TB installed
  - 400-500 MB/s
[Diagram: Readout Network, CPU, Online cluster (event selection), Storage, Data logging facility]

5
- Significant idle time of the farm:
  - During LHC winter shutdown (~ months)
  - During beam period, experiment and machine downtime (~ hours)
- Could we use it for reconstruction?
  - Farm is fully LHCb controlled
  - Good internal network connectivity
  - Drawback: slow disk access (fast only for a very few nodes, via the Fibre Channel interface)

6
- Background information:
  - 1 file (2 GB) contains 60,000 events.
  - It takes 1-2 s to reconstruct an event.
- Cannot reprocess à la Tier-1 (1 file per core):
  - Cannot perform reconstruction in short idle periods:
    - Each file takes 1-2 s/evt * 60k evt ~ 1 day.
  - Insufficient storage or CPUs not used efficiently:
    - Input: 32 TB (16000 files * 2 GB/file)
    - Output: ~44 TB (16000 * 60k evt * 50 kB/evt)
- A different approach is needed:
  - Distributed reconstruction architecture.
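The figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the function names are illustrative. One observation that is an assumption rather than something the slide states: the product 16000 * 60k * 50 kB is 48 TB in decimal units, and the quoted ~44 TB matches if the volume is counted in binary units (TiB).

```python
# Back-of-envelope check of the "one file per core" numbers quoted on the slide.
EVENTS_PER_FILE = 60_000
N_FILES = 16_000
FILE_SIZE_GB = 2
OUTPUT_KB_PER_EVENT = 50


def hours_per_file(sec_per_event):
    """Wall-clock hours for one core to reconstruct one file."""
    return EVENTS_PER_FILE * sec_per_event / 3600.0


def input_volume_tb():
    """Total input volume in TB (decimal, 1 TB = 1000 GB)."""
    return N_FILES * FILE_SIZE_GB / 1000.0


def output_volume_bytes():
    """Total output volume in bytes, taking 1 kB = 1000 bytes."""
    return N_FILES * EVENTS_PER_FILE * OUTPUT_KB_PER_EVENT * 1000


if __name__ == "__main__":
    print("hours per file:", hours_per_file(1.0), "to", hours_per_file(2.0))
    print("input volume [TB]:", input_volume_tb())
    print("output volume [TB, decimal]:", output_volume_bytes() / 1e12)
    print("output volume [TiB, binary]:", output_volume_bytes() / 2**40)
```

At 1-2 s per event a single core needs roughly 17 to 33 hours per file, which is the "~ 1 day" on the slide.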

7
- Files are split into events and distributed to many cores, which perform reconstruction:
  - First idea: full parallelization (1 file/16k cores)
    - Reconstruction time: 4-8 s
    - Full speed not reachable (only one file open!)
  - Practical approach: split the farm into slices of subfarms (1 file/n subfarms).
    - Example: 4 concurrent open files yield a reconstruction time of 30 s/file.
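The reconstruction times quoted here follow from dividing the per-file work by the number of cores serving that file. The sketch below assumes ideal scaling (no I/O or distribution overhead) and an even split of 16000 cores among the open files; both are simplifying assumptions, and the function name is illustrative.

```python
def seconds_per_file(concurrent_files, total_cores=16_000,
                     events_per_file=60_000, sec_per_event=2.0):
    """Ideal time to reconstruct one file when the farm is split evenly
    among `concurrent_files` open files (overheads are ignored)."""
    cores_per_file = total_cores // concurrent_files
    return events_per_file * sec_per_event / cores_per_file


if __name__ == "__main__":
    # Full parallelization: one file spread over all cores.
    print(seconds_per_file(1, sec_per_event=1.0), seconds_per_file(1, sec_per_event=2.0))
    # Practical approach: four files open at once, 4000 cores each.
    print(seconds_per_file(4))
```

With one open file the estimate is 3.75 to 7.5 s, consistent with the "4-8 s" above; with four open files and 2 s per event it is 30 s per file.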

8
[Architecture diagram]
- ECS: control and allocation of resources
- Reco Manager, Database: job steering
- DIRAC: connection to LHCb production system
- Farm: Storage Nodes, Switch, Subfarms
- See A. Tsaregorodtsev's talk, "DIRAC3 - the new generation of the LHCb grid software".

9
[Architecture diagram]
- ECS: control and allocation of resources
- Farm: Storage Nodes, Switch, Subfarms
- Reco Manager, Database: job steering
- DIRAC: connection to production system

10
- Control using standard LHCb ECS software:
  - Reuse of existing components for storage and subfarms.
  - New components for reconstruction tree management.
  - See Clara Gaspar's talk (LHCb Run Control System).
- Allocate, configure, start/stop resources (storage and subfarms).
- Task initialization is slow, so tasks don't restart on file change.
  - Idea: tasks sleep during data-taking, and are only restarted on configuration change.

11
[Subfarm diagram: PVSS Control; x 50 subfarms, 1 control PC each; 4 PC each x 8 cores/PC; 1 Reco task/core; data management tasks; event (data) processing; event (data) management]

12
- Data processing block:
  - Producers put events in a buffer manager (MBM)
  - Consumers receive events from the MBM
- Data transfer block:
  - Senders access events from MBM
  - Receivers get data and declare it to MBM
[Diagram labels: Processing Node, Producer, Buffer manager, Consumer; Source Node, Buffer, Sender; Target Node, Receiver, Buffer]
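The producer/consumer pattern around a buffer manager can be illustrated with a bounded queue. This is only an analogy: Python's `queue.Queue` stands in for the MBM, it is not the MBM interface, and the `SENTINEL` marker and the `upper()` "processing" step are hypothetical placeholders.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker, not part of the MBM


def producer(buffer, events):
    """Put events into the bounded buffer; blocks while the buffer is full."""
    for event in events:
        buffer.put(event)
    buffer.put(SENTINEL)


def consumer(buffer, results):
    """Take events from the buffer until the end-of-stream marker arrives."""
    while True:
        event = buffer.get()
        if event is SENTINEL:
            break
        results.append(event.upper())  # stand-in for real event processing


def run(events, buffer_size=4):
    buffer = queue.Queue(maxsize=buffer_size)
    results = []
    workers = [
        threading.Thread(target=producer, args=(buffer, events)),
        threading.Thread(target=consumer, args=(buffer, results)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    print(run(["evt1", "evt2", "evt3"]))
```

The bounded buffer decouples the two sides: a slow consumer makes the producer wait instead of letting events pile up without limit.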

13
[Data-flow diagram: Storage nodes (Storage Reader, Sender, Receiver, Storage Writer); Worker node (Receiver, Brunel Reco, 1 per core, Sender); event flow from Input to Output]

14
[Architecture diagram]
- ECS: control and allocation of resources
- Reco Manager, Database: job steering
- DIRAC: connection to LHCb production system
- Farm: Storage Nodes, Switch, Subfarms

15
- Granularity to file level.
  - Individual event flow handled automatically by allocated resource slices.
- Reconstruction: specific actions in specific order.
  - Each file is treated as a Finite State Machine (FSM)
- Reconstruction information stored in a database.
  - System status
  - Protection against crashes
[FSM diagram states: TODO, PREPARING, PREPARED, PROCESSING, DONE, ERROR]
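A per-file state machine along these lines could be sketched as follows. The state names come from the slide; the transition table is an assumption (a linear path from TODO to DONE, with ERROR reachable from any non-final state), since the slide does not spell out the arrows. `FileRecord` and the file name are hypothetical.

```python
from enum import Enum


class FileState(Enum):
    TODO = "TODO"
    PREPARING = "PREPARING"
    PREPARED = "PREPARED"
    PROCESSING = "PROCESSING"
    DONE = "DONE"
    ERROR = "ERROR"


# Assumed transitions: a linear happy path plus ERROR from any active state.
ALLOWED = {
    FileState.TODO: {FileState.PREPARING, FileState.ERROR},
    FileState.PREPARING: {FileState.PREPARED, FileState.ERROR},
    FileState.PREPARED: {FileState.PROCESSING, FileState.ERROR},
    FileState.PROCESSING: {FileState.DONE, FileState.ERROR},
    FileState.DONE: set(),
    FileState.ERROR: set(),
}


class FileRecord:
    """Tracks one input file through the reconstruction states."""

    def __init__(self, name):
        self.name = name
        self.state = FileState.TODO

    def advance(self, new_state):
        """Move to `new_state`, rejecting transitions not in the table."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.name}: {self.state.value} -> {new_state.value} not allowed")
        self.state = new_state


if __name__ == "__main__":
    record = FileRecord("run_0001.raw")  # hypothetical file name
    for step in (FileState.PREPARING, FileState.PREPARED,
                 FileState.PROCESSING, FileState.DONE):
        record.advance(step)
    print(record.name, record.state.value)
```

Rejecting out-of-order transitions is what enforces "specific actions in specific order" at the file level.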

16
- Job steering done by a Reco Manager:
  - Holds each FSM instance and moves it through all the states based on the feedback from the static resources.
  - Sends commands to the readers and writers: files to read and filenames to write to.
  - Interacts with the database.
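To illustrate why keeping per-file state in a database protects against crashes, here is a minimal sketch using SQLite from the Python standard library. The schema, table name, and helper functions are hypothetical; the slides do not describe the actual database or its layout.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the bookkeeping database and create the (hypothetical) table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "name TEXT PRIMARY KEY, state TEXT NOT NULL)"
    )
    return conn


def register(conn, name):
    """Add a file in the TODO state unless it is already known."""
    conn.execute(
        "INSERT OR IGNORE INTO files (name, state) VALUES (?, 'TODO')", (name,)
    )
    conn.commit()


def set_state(conn, name, state):
    """Persist a state change as soon as it happens."""
    conn.execute("UPDATE files SET state = ? WHERE name = ?", (state, name))
    conn.commit()


def unfinished(conn):
    """Files a restarted manager would still have to deal with."""
    cursor = conn.execute(
        "SELECT name, state FROM files "
        "WHERE state NOT IN ('DONE', 'ERROR') ORDER BY name"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = open_db()
    for name in ("file_a", "file_b", "file_c"):
        register(conn, name)
    set_state(conn, "file_a", "DONE")
    set_state(conn, "file_b", "PROCESSING")
    print(unfinished(conn))
```

Because every state change is committed immediately, a manager that restarts after a crash can query the table and resume from the recorded states rather than starting over.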

17
- The Online Farm will be treated like a CE connected to the LHCb Production system.
- Reconstruction is formulated as DIRAC jobs, and managed by DIRAC WMS Agents.
- DIRAC interacts with the Reco Manager through a thin client, not directly with the DB.
- Data transfer in and out of the Online farm managed by DIRAC DMS Agents.

18
- Current performance constrained by hardware:
  - Reading from disk: ~130 MB/s.
    - FC saturated with 3 readers.
    - Reader saturates CPU at 45 MB/s.
  - Test with dummy reconstruction:
    - Just copy data from input to output
    - Stable throughput 105 MB/s
    - Constrained by Gigabit network
    - Upgrade to 10 Gigabit planned
- Resource handling and Reco Manager implemented.
- Integration in LHCb Production system recently decided, not implemented yet.
[Diagram: Storage Reader, Sender, Receiver, Reco, Sender, Receiver, Storage Writer]
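The bandwidth figures on this slide are mutually consistent, as a short calculation shows. The sketch uses only the quoted numbers plus the standard conversion of 1 Gbit/s to 125 MB/s (decimal units, protocol overhead ignored); the function names are illustrative.

```python
import math


def link_limit_mb_s(link_gbit=1.0):
    """Raw link capacity in MB/s, ignoring protocol overhead."""
    return link_gbit * 1000 / 8


def readers_to_saturate(disk_mb_s=130, reader_mb_s=45):
    """Number of CPU-bound readers needed to use up the disk bandwidth."""
    return math.ceil(disk_mb_s / reader_mb_s)


if __name__ == "__main__":
    limit = link_limit_mb_s()
    print("Gigabit limit [MB/s]:", limit)
    print("observed / limit:", round(105 / limit, 2))
    print("readers to saturate disk:", readers_to_saturate())
    print("10 Gigabit limit [MB/s]:", link_limit_mb_s(10))
```

Three readers at 45 MB/s each roughly match the ~130 MB/s disk rate, and the stable 105 MB/s is about 84% of what a single Gigabit link can carry, which supports the statement that the network is the limiting factor.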

19
- Software:
  - Pending implementation of thin client for interfacing with DIRAC.
- Hardware:
  - Network upgrade to 10 Gbit in storage nodes before summer.
  - More subfarms and PCs to be installed.
    - From current ~4800 cores to the planned 16000.

20
- The LHCb Online cluster needs huge resources for event selection of LHC collisions.
- These resources have much idle time (50% of the time).
- They can be used during idle periods by applying a parallelized architecture to data reprocessing.
- A working system is already in place, pending integration in the LHCb Production system.
- Planned hardware upgrades to meet DAQ requirements should overcome current bandwidth constraints.

21
[Task state diagram. States: OFFLINE, NOT READY, READY. Transitions: load, configure, start, unload, reset, stop, recover, task dies]

22
[Diagram: Storage, Switch, I/O Nodes, Subfarms]

23
[Architecture diagram]
- PVSS: control and allocation of resources
- Reco Manager, Database: job steering
- DIRAC: connection to production system
- Farm: Storage, Switch, I/O Nodes, Subfarms

24
[Architecture diagram]
- PVSS: control and allocation of resources
- Reco Manager, Database: job steering
- DIRAC: connection to production system
- Farm: Storage, Disk Reader, Event Sender, Storage Reader, Receiver, Brunel, Storage Writer, Sender

25
[Data-flow diagram: I/O node (Disk Reader, Event Sender, FC); Storage node (Storage Reader, Sender, Receiver, Storage Writer); Worker node (Receiver, Brunel Reco, 1 per core, Sender); event flow from Input to Output]

