
1 LHCb Data Model
Nick Brook (University of Bristol), Glenn Patrick (Rutherford Lab)
Outline: Motivation, Data Model, Options, Future Plans

2 WHY NOW?
- DataGrid architecture design process has already started (ATF)
- LHCb input to the direction & needs of WP8 (HEP applications) → the Grid architecture should meet LHCb requirements
- how do we deal with the data: do we want "events" or "objects" or ...? (the user/LHCb is only superficially interested in files)
Timescales:
- feedback from the DataGrid work packages to the ATF has already begun
- end of June: WP8 needs to define its long-term use requirements for the EU

3 Philosophy
- take a conservative viewpoint: we want to perform analysis from day 1!
- a "simplistic" model, i.e. less reliant on the Grid
- building on our Grid tools is easier to unravel than built-in sophistication, if the Grid fails to meet expectations
[Figure: the technology hype cycle - hype vs. time, through the Technology Trigger, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment and Plateau of Productivity]

4 Architecture of the LHCb Computing Model
- based on distributed multi-Tier regional centres
- processing of real data at CERN (production centre)
- national centres act as simulation production centres

5 Dataflow Model
[Figure: dataflow diagram - the Physics Generator and Detector Simulation produce Generator Data and raw MC data; the DAQ system and L2/L3 Trigger produce Raw Data; the Calibration System produces Calibration Data; Reconstruction produces Event Summary Data (ESD), Event Tags, Raw Tags and raw MC Tags]
Forget MC needs at our peril!!! MC samples will be our greatest strain on resources...
... not only CPU: storage, bandwidth...

6 Dataflow Model (continued)
[Figure: analysis dataflow - ESD and Raw Data feed First Pass Analysis, producing Reconstruction Tags, Analysis Object Data (AOD) and Physics Tags; Physics Analysis (plus Generator Data, MC events only) produces Private Data; the Analysis Workstation produces physics results]
How do we access the data?
- How often do we need to access ESD or RAW data?
- What info. is available from the AOD?
- These questions need addressing for both MC and real data.

7 Nominal Event Sizes

    RAW per event    150 kB
    ESD per event    100 kB
    AOD per event     20 kB
    TAG per event      1 kB

- the AOD is a factor of 5 smaller than the ESD: if analysis needs access to the ESD, there are large consequences for the amount of data that is moved around (see the volume check below)
- re-visit how we view the data; perhaps it is better to break an event down into its constituent components
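A quick sanity check, sketched below in Python, shows how these per-event sizes translate into yearly volumes. The 1e9 events/year rate is an assumption inferred from the 20 TB/year/pass AOD figure on slide 9, not a number stated on this slide:

    # Back-of-the-envelope yearly volumes from the nominal event sizes.
    # EVENTS_PER_YEAR is an assumption, inferred from the 20 TB/year/pass
    # AOD figure quoted later (20 TB / 20 kB per event ~ 1e9 events).
    EVENT_SIZES_KB = {"RAW": 150, "ESD": 100, "AOD": 20, "TAG": 1}
    EVENTS_PER_YEAR = 1e9  # assumed

    for sample, size_kb in EVENT_SIZES_KB.items():
        tb_per_year = size_kb * EVENTS_PER_YEAR * 1e3 / 1e12  # bytes -> TB
        print(f"{sample}: {tb_per_year:.0f} TB/year/pass")
    # prints RAW: 150, ESD: 100, AOD: 20, TAG: 1 (TB/year/pass)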

8 AOD Information - 20 kB/evt
To minimise bandwidth requirements it is important that the whole of the info. needed for analysis is available on the AOD. How do we access non-AOD info.?
Example above: 3 π's and a K, the intermediate decay particles, and the B-meson.
"Objects" on the AOD:
- event information (e.g. total event energy, position of the primary vertex, ...)
- object info., e.g. quality of particle ID, error/quality of the track measurement, ...
- no/limited info. on non-B-hadron candidates (a sketch of such a record follows)
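A minimal sketch of what an AOD event record along these lines could look like. All class and field names are illustrative assumptions, not the actual LHCb/Gaudi event model:

    # Illustrative AOD event record within the ~20 kB/event budget.
    # Class and field names are assumptions, not the real LHCb event model.
    from dataclasses import dataclass, field

    @dataclass
    class Candidate:
        pid: str                 # e.g. "pi+", "K-", "B0"
        pid_quality: float       # quality of the particle ID
        track_error: float       # error/quality of the track measurement
        daughters: list = field(default_factory=list)  # intermediate decays

    @dataclass
    class AODEvent:
        total_energy: float      # event-level information
        primary_vertex: tuple    # (x, y, z) position of the primary vertex
        b_candidates: list       # B-hadron candidates with their decay trees
        # Non-B-hadron candidates carry no/limited info. on the AOD.

    # Example: a B-meson candidate decaying via intermediates to 3 pi's and a K.
    b = Candidate("B0", 0.92, 0.01, daughters=[
        Candidate("pi+", 0.95, 0.02), Candidate("pi-", 0.93, 0.02),
        Candidate("pi+", 0.90, 0.03), Candidate("K-", 0.88, 0.02),
    ])
    event = AODEvent(total_energy=123.4, primary_vertex=(0.0, 0.0, 1.3),
                     b_candidates=[b])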

9 Real Data: CERN → Tier 1
- data needs to get from CERN to the regional centres
- CERN (as production centre) should trigger the distribution of this data to the Tier 1's
- AOD: 20 TB/year/pass; ESD: 100 TB/year/pass
- repeat passes over the data, and their distribution, are under centralised control (as opposed to user-triggered over the Grid)

10 Real Data
What data goes where? Working assumption: the AOD (+TAG) will be distributed.
Options:
- all ESD & AOD data → < 1 PB of data, distributed by the production centre in addition to the physics TAG in a database (access problems?)
- streaming on event header info. - partial distribution based on this info.; match to national physics requirements? (see the sketch below)
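A sketch of what streaming on event-header/TAG info. could look like. The stream definitions and tag fields are invented for illustration:

    # Sketch: partial distribution by streaming on TAG/event-header info.
    # Stream definitions and tag fields are illustrative assumptions.
    STREAMS = {
        "tier1-uk": lambda tag: tag["trigger"] == "b-hadron",
        "tier1-fr": lambda tag: tag["trigger"] == "dimuon",
    }

    def route_event(tag):
        """Return the Tier 1's whose physics streams match this event's tag."""
        return [site for site, selects in STREAMS.items() if selects(tag)]

    # Each ~1 kB TAG record is cheap to inspect, so the distribution decision
    # never touches the 20 kB AOD or 100 kB ESD payloads.
    print(route_event({"trigger": "dimuon", "run": 1234}))  # ['tier1-fr']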

11 Real Data
- only Tier 0-2 will have Grid-enabled data
- analysis performed, to first order, over national computing resources, i.e. Tier 1-2 (+ 3-4?)
- Grid-advertised resources are needed to make the job-to-data or data-to-job decision (sketched below)
- analysis deals with "events" as the smallest working unit
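A hedged sketch of that decision; the cost model and threshold are invented for illustration, not an agreed policy:

    # Sketch of the job-to-data vs. data-to-job decision, driven by
    # Grid-advertised resources. Cost model and threshold are assumptions.
    def placement(dataset_gb, wan_mb_per_s, cpu_free_at_data_site):
        """Choose where an analysis job runs; events are the unit of work."""
        transfer_hours = dataset_gb * 1024 / wan_mb_per_s / 3600
        if cpu_free_at_data_site or transfer_hours > 12:  # assumed cut-off
            return "job-to-data"  # ship the job to where the events sit
        return "data-to-job"      # replicate the events to the user's site

    # 500 GB over a 10 MB/s link would take ~14 hours, so the job moves.
    print(placement(dataset_gb=500, wan_mb_per_s=10, cpu_free_at_data_site=False))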

12 Monte Carlo Data
[Figure: CERN and the Tier 1 centres exchanging MC data]
Complications: MC production centres are scattered throughout LHCb.
- How do we distribute and access MC data?
- How do we manage MC needs?

13 Monte Carlo Production
- distributed throughout the collaboration
- priority & production co-ordinated centrally (as opposed to user-triggered)
- production throughout the Tier structure - down to Tier 4?
- (local) Tier 1 centres will store the MC production (including raw MC data)?
- availability of MC samples across the distributed Tier 1's

14 Monte Carlo Production
- changes in the recons. code &/or AOD production? → re-run over the MC samples again
- "re-creation" of ESD/AOD should be centrally initiated, rather than triggered by user need → need to define obsolete MC samples
- "re-creation" performed at the original national Tier 1 → need to advertise the new dataset

15 Monte Carlo Analysis
Because of the distributed nature of MC production, it is not clear that we want to distribute AODs as with real data (the current baseline solution).
Alternatives:
(a) AODs stay at the "local" generation Tier 1 and analysis jobs are moved to the data, until some access threshold is reached which triggers transfer of the AOD to the user's local Tier 1 (sketched below)
(b) user-triggered transfer
(c) Tier 2 - Tier 1 caching, i.e. not moving MC data from Tier 2 unless requested
(d) automatically ship AODs & ESDs to all Tier 1's
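A sketch of alternative (a): count remote accesses per dataset and user site, and trigger AOD replication once a threshold is crossed. The threshold value and bookkeeping structure are assumptions:

    # Sketch of alternative (a): jobs move to the data until an access
    # threshold triggers AOD transfer to the user's local Tier 1.
    # ACCESS_THRESHOLD and the bookkeeping structure are assumptions.
    from collections import defaultdict

    ACCESS_THRESHOLD = 5
    remote_accesses = defaultdict(int)

    def access_aod(dataset, user_tier1, home_tier1):
        """Record one access and decide whether to ship the job or the data."""
        if user_tier1 == home_tier1:
            return f"run locally at {home_tier1}"
        remote_accesses[(dataset, user_tier1)] += 1
        if remote_accesses[(dataset, user_tier1)] >= ACCESS_THRESHOLD:
            return f"replicate {dataset} from {home_tier1} to {user_tier1}"
        return f"ship job to {home_tier1}"

    for _ in range(5):  # the fifth access triggers replication
        print(access_aod("mc-sample-v1", "tier1-uk", "tier1-it"))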

16 Versioning
- the availability of s/w (version compatibility)
- what datasets exist, created with which s/w version
Clean "versioning" of s/w (including LHCb s/w) over a distributed environment is crucial. The Grid will need to advertise not only the available h/w resources and LHCb data BUT also the available s/w resources (a compatibility-check sketch follows).
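One way to picture this: a catalogue record ties each dataset to the s/w version that produced it, so a site can be checked for compatibility before a job is scheduled. A sketch, with the record layout, dataset and version names invented for illustration:

    # Sketch: a catalogue tying datasets to the s/w versions that produced
    # them, so the Grid can advertise s/w alongside h/w and data.
    # Record layout, dataset and version names are illustrative assumptions.
    DATASETS = {
        "mc-sample-v1": {"type": "AOD", "produced_with": "recons-v12r3",
                         "site": "tier1-uk", "obsolete": False},
    }
    SITE_SOFTWARE = {"tier1-uk": {"recons-v12r3", "analysis-v8r1"}}

    def compatible(dataset, site):
        """Does the site have the s/w version this dataset was created with?"""
        return DATASETS[dataset]["produced_with"] in SITE_SOFTWARE.get(site, set())

    print(compatible("mc-sample-v1", "tier1-uk"))  # True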

17 Requirements
GRID:
- data replication - fast, reliable, common tools
- stats. on data access (incl. geographical parameters)
- uniform working env.
- info. services - incl. s/w availability
- meta-data services/cataloguing
- active & archive storage - migration/movement of data, restaging
LHCb:
- clean versioning system - data & s/w
- priority management - e.g. recreating MC AODs
- centralised control of MC
- definition of AOD & ESD data - i/p from the physics groups
- policy on obtaining "missing info." - "tough luck" or automatic generation

18 What Next?
- examine the options and, if possible, decide on a baseline (aim: LHCb Grid meeting, Bologna, June 13th-15th)
- initial working system: realistic, pragmatic, flexible
- identify the crucial elements and develop a backbone analysis Grid (BAG) now, to run in parallel to the Monte Carlo Production Grid:
  - enables the LHCb analysis framework & services to be developed
  - initial testbed to explore how Tier 2 & Tier 3 centres fit into the Grid topology
  - initial working model - to attract effort and funding
- physics group effort needed for the data model & analysis system design

19 BAG Requirements
- standardised replication tools - initially based on existing tools
- basic meta-data catalogues - based around a simple LDAP implementation (see the sketch below)
- standard LHCb analysis environment - installation toolkit, interfaces to mass storage systems, ...
- interface between the LHCb Gaudi analysis framework & the Grid
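A minimal sketch of querying such an LDAP-based catalogue with the python-ldap library. The server URI, base DN and attribute schema are assumptions, not a defined LHCb schema:

    # Sketch: dataset replica look-up in a simple LDAP meta-data catalogue
    # via python-ldap. The host, base DN, object class and attribute names
    # are illustrative assumptions.
    import ldap  # pip install python-ldap

    conn = ldap.initialize("ldap://catalogue.example.org")  # hypothetical host
    conn.simple_bind_s()  # anonymous bind

    results = conn.search_s(
        "ou=datasets,o=lhcb",  # assumed base DN
        ldap.SCOPE_SUBTREE,
        "(&(objectClass=dataset)(datasetName=mc-sample-v1))",  # assumed schema
        ["replicaSite", "swVersion"],
    )
    for dn, attrs in results:
        print(dn, attrs)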

20 GAUDI Meets the Grid
[Figure: architecture diagram - the Application Manager and Gaudi services (Job Options Service, Detector Description, Event Data Service, Histogram Service, Message Service, Particle Property Service, GaudiLab Service, Data Management Service) sit alongside Grid services (Information Services, Scheduling, Security, Monitoring, Service Discovery, Database Service?), linked via standard interfaces & protocols to the logical data stores (Event, Detector, Histogram, Ntuple) and to meta-data/data]
Most Grid services are producers or consumers of meta-data (illustrated in the sketch below).
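A toy sketch of that idea: Grid services plug in behind the same kind of service interface Gaudi algorithms already use, exchanging meta-data. The interface and class names are simplified assumptions, not the real Gaudi C++ API:

    # Toy sketch: a Grid data service behind a Gaudi-style interface,
    # consuming replica meta-data. Names are simplified assumptions and
    # not the real Gaudi C++ interfaces.
    from abc import ABC, abstractmethod

    class IDataService(ABC):
        """Gaudi-like interface: algorithms see logical stores, not files."""
        @abstractmethod
        def retrieve(self, logical_path):
            ...

    class GridDataService(IDataService):
        def __init__(self, replica_catalogue):
            self.catalogue = replica_catalogue  # meta-data: logical path -> site

        def retrieve(self, logical_path):
            site = self.catalogue[logical_path]  # consume replica meta-data
            # ... here a real service would schedule a Grid transfer from
            # `site`, stage the file and open the logical store ...
            return b""  # placeholder payload

    svc = GridDataService({"/Event/AOD/run1234": "tier1-uk"})
    svc.retrieve("/Event/AOD/run1234")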

21 Summary
- The data model depends on how we perform analysis. Can we really do analysis on the AOD only? Major implications.
- Analysis of MC data is a vital element of the problem, due to the distributed nature & scale of production.
- Realistic "physics" analyses should be performed (a) at CERN and (b) at an external institute.
- Need to begin work/studies on interfacing Gaudi to the Grid.
- Limitations on networking will have to feed into our approach to analysis.

