Published by Cory Austin. Modified over 8 years ago.
1
Grid Scenarios in CMS
Simulation, Reconstruction and Analysis scenarios
Vincenzo Innocente, CERN/EP
User Collections
2
CMS Data Analysis Model
[Diagram. Components: Detector Control, Online Monitoring, Environmental data, Event Filter / Object Formatter, Quasi-online Reconstruction, Simulation, Data Quality, Calibrations, Group Analysis, User Analysis on demand, Persistent Object Store Manager / Database Management System, Physics Paper. Arrow labels: "store", "Request part of event", "Store rec-Obj", "Store rec-Obj and calibrations".]
3
Forgive me the Obvious
- No simple solution to complex problems
- No Silver Bullet: technology is a helper, not a solution by itself
- What counts is the global analysis efficiency (time to paper)
- Single job turn-around is just one component
4
Assumptions
- A generic CMS “query” is too complex to be fully specified in any way other than several thousand lines of “code”, including:
  - Which files and which objects will be opened (or not)
  - Which objects will be created and stored, and where
- A CMS job is composed of:
  - A CMS software configuration, including the executable
  - A set of user shared libraries
  - A configuration file defining:
    - User libraries to load
    - Input event collection
    - Output dataset (including physical-clustering directives)
    - Values for user-configurable parameters
- Heuristics do exist that allow the configuration file to be “mapped” to file-sets
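As an illustration of the last two points, here is a minimal Python sketch of a job configuration and of one possible heuristic that maps it to a file-set. Everything in it is an assumption made for the example: the `JobConfig` fields, the library names, the `CATALOGUE` and `LIBRARY_READS` tables, and the logical file names are hypothetical and are not taken from COBRA.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JobConfig:
    """Hypothetical stand-in for the job configuration file."""
    user_libraries: List[str]
    input_collection: str
    output_dataset: str
    parameters: Dict[str, str] = field(default_factory=dict)


# Hypothetical catalogue: (event collection, data product) -> logical files.
CATALOGUE = {
    ("higgs_candidates", "Tracks"): ["lfn:tracks_001", "lfn:tracks_002"],
    ("higgs_candidates", "CaloClusters"): ["lfn:calo_001"],
}

# Hypothetical heuristic table: data products each user library tends to read.
LIBRARY_READS = {
    "libTrackAnalysis.so": ["Tracks"],
    "libJetFinder.so": ["Tracks", "CaloClusters"],
}


def guess_file_set(config: JobConfig) -> List[str]:
    """Map a configuration to the logical files the job will probably open."""
    files = set()
    for lib in config.user_libraries:
        for product in LIBRARY_READS.get(lib, []):
            files.update(CATALOGUE.get((config.input_collection, product), []))
    return sorted(files)


config = JobConfig(
    user_libraries=["libJetFinder.so"],
    input_collection="higgs_candidates",
    output_dataset="my_jets",
    parameters={"cone_size": "0.5"},
)
file_set = guess_file_set(config)
```

The result is only a guess: as the slide says, the real set of files depends on what several thousand lines of user code decide to open at run time.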
5
Some COBRA principles
- Developers and users are the same physicists
- Everything is expressed in code (C++)
- Unique development and running environment
- Minimal pre-requisites and pre-specifications
- Dependencies are implicit and expressed in the code
- COBRA discovers and self-adapts to
  - the environment
  - the configuration
  - the data-product to materialize
- COBRA is able to navigate from MetaData to the Event-Data and back
6
Generic COBRA job
- Input event collection is user-made
  - Can be easily time ordered
  - May contain events requiring different detector configurations
  - With different materialized data products
  - Reconstructed with different configurations
- COBRA discovers materializable data-products
  - From the existing configuration of the output dataset
  - From the instantiated algorithms (i.e. loaded shared libs)
  - Nothing prevents loading new libraries and instantiating new algorithms in the middle of the processing
- A data-product is materialized
  - When an algorithm “dereferences” it
  - If an “equivalent version” is not present in the input event
  - If the user explicitly asks for it
- Dependencies among data-products are known only a posteriori
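The on-demand behaviour described above can be sketched in a few lines of Python. This is not COBRA code (COBRA is C++): the `Event` class, its `get` method, and the toy algorithms are hypothetical names chosen for the illustration. A product is built only when something asks for it and no stored version exists, and the dependency graph is filled in as a side effect, which is why it is known only after the job has run.

```python
class Event:
    """Toy event whose data products are materialized on first access."""

    def __init__(self, stored, algorithms):
        self.products = dict(stored)   # products already present in the input event
        self.algorithms = algorithms   # product name -> callable(event)
        self.dependencies = {}         # product -> products it actually read
        self._building = []            # stack of products currently being built

    def get(self, name):
        # Record who asked for this product: the dependency is discovered here.
        if self._building:
            self.dependencies.setdefault(self._building[-1], set()).add(name)
        # Materialize only if no version is already available.
        if name not in self.products:
            self._building.append(name)
            try:
                self.products[name] = self.algorithms[name](self)
            finally:
                self._building.pop()
        return self.products[name]


# Hypothetical algorithms, standing in for code from loaded shared libraries.
algorithms = {
    "Tracks": lambda ev: [2 * hit for hit in ev.get("Hits")],
    "Jets": lambda ev: sum(ev.get("Tracks")),
}

event = Event(stored={"Hits": [1, 2, 3]}, algorithms=algorithms)
jets = event.get("Jets")           # triggers Tracks, which reads Hits
discovered = event.dependencies    # filled in only after the fact
```

Asking for "Jets" a second time returns the stored value without running any algorithm, mirroring the "equivalent version already present" rule.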
7
Grid Constraints
- Broker needs means to optimize resources in a competitive environment
  - Compute cost (CPU, I/O bandwidth and volume)
  - Identify input and output SE (Storage Element)
  - Identify most suitable CE (Computing Element)
  - Establish priorities
- Easy for bread-and-butter physics
- More difficult for discovery physics
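To make the broker's problem concrete, here is a rough Python sketch that ranks Computing Elements by an estimated turn-around time. The cost model (CPU time scaled by a speed factor, plus transfer time for files not held on the nearby Storage Element) is an assumption made for this example, as are all names and numbers; a real broker would also weigh priorities and competing jobs.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass
class ComputingElement:
    name: str
    cpu_speed: float             # relative speed factor (hypothetical unit)
    local_files: FrozenSet[str]  # logical files on the SE close to this CE
    bandwidth_mb_s: float        # bandwidth for fetching remote files


def estimated_seconds(ce: ComputingElement, cpu_seconds: float,
                      input_files_mb: Dict[str, float]) -> float:
    """CPU time on this CE plus transfer time for files it does not hold."""
    remote_mb = sum(size for lfn, size in input_files_mb.items()
                    if lfn not in ce.local_files)
    return cpu_seconds / ce.cpu_speed + remote_mb / ce.bandwidth_mb_s


def pick_ce(ces: List[ComputingElement], cpu_seconds: float,
            input_files_mb: Dict[str, float]) -> ComputingElement:
    """Choose the CE with the lowest estimated cost."""
    return min(ces, key=lambda ce: estimated_seconds(ce, cpu_seconds, input_files_mb))


ces = [
    ComputingElement("fast_remote", 2.0, frozenset(), 10.0),
    ComputingElement("slow_local", 1.0, frozenset({"lfn:a", "lfn:b"}), 10.0),
]
files = {"lfn:a": 4000.0, "lfn:b": 6000.0}   # sizes in MB
best = pick_ce(ces, cpu_seconds=1000.0, input_files_mb=files)
```

In this example the slower CE wins because the input data is already next to it. The estimate is only as good as the knowledge of `cpu_seconds` and of the input files, which is the hard part for discovery physics.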
8
Pre-emptive “Quasi-online” production
- Heuristics are usually easy
- Input event-collection is homogeneous
  - Same materialized data products
  - Same configuration
- Output dataset is homogeneous
  - No fancy event selection
  - Fixed, “stamped” configuration
  - Predefined types of data products to materialize
- In steady state it is not difficult for production management (human) to know and specify
  - Which input data products will be used
  - Which data products will be materialized, and where
9
Reality also strikes production
- To go faster and keep up with the input rate
  - Some data products will not be materialized
  - Fancy selections will be put in place to fully reconstruct only some classes of events
  - Configuration (and algorithms) will be modified almost online to get the best resolution even from pass 1
- Input collection to reprocessing (pass 2) will usually not be as homogeneous as hoped…
  - Unless restarting from scratch
10
Production of Analysis-Group AOD
- Input event collection
  - Selected events from previous pre-emptive reconstruction
  - Should contain (by definition) all required intermediate data-products
- Output dataset
  - Well-defined configuration
  - Only and all the group AOD
- Access to raw data not required?
  - Recalibration
  - Fancy (slow) reconstruction algorithm in special cases
11
User end analysis
- Input event collection
  - Group analysis AOD
- Output dataset
  - Personal AOD
  - Ntuple
- No access to raw data
- No access to basic reconstructed objects
- BUT random full access to selected events for detailed (interactive) analysis and visualization purposes
12
A possible scenario
- Run a small (0.1%) production test
- For each event in the full input collection
  - Compare input configuration (and actual list of materialized data-products) with the output configuration from the test
  - Produce a list of probable data-products to materialize and their cost
  - For each data-product get the actual dependencies from the production test
  - Compile the list of input data-products
  - Map each input data-product to a logical file
  - Compile the list of required input files
- Rearrange and split the input collection according to file location and cost of materialization
- Send jobs
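The planning steps above can be sketched as follows in Python. The sketch assumes the 0.1% test has already been run and has yielded a dependency table and per-product costs; `TEST_DEPENDENCIES`, `COST`, `LOGICAL_FILE`, the product names, and the event contents are all hypothetical. Grouping events by the set of input files they need is one simple way to "rearrange and split" the collection, not necessarily the one CMS would use.

```python
from collections import defaultdict

# Assumed outcome of the small production test: product -> inputs it read.
TEST_DEPENDENCIES = {"Jets": {"Tracks", "CaloClusters"}, "Tracks": {"Hits"}}
# Assumed cost of materializing each product (arbitrary units).
COST = {"Jets": 5.0, "Tracks": 20.0}
# Assumed catalogue: stored data product -> logical file.
LOGICAL_FILE = {"Hits": "lfn:raw", "Tracks": "lfn:reco", "CaloClusters": "lfn:calo"}


def plan_event(materialized, wanted):
    """Return (products to materialize, stored products to read) for one event."""
    to_make, to_read = set(), set()
    pending = list(wanted)
    while pending:
        product = pending.pop()
        if product in materialized:
            to_read.add(product)          # a stored version exists: read it
        elif product not in to_make:
            to_make.add(product)          # must be built, so follow its inputs
            pending.extend(TEST_DEPENDENCIES.get(product, ()))
    return to_make, to_read


def split_collection(events, wanted):
    """Group events by required input files and add up materialization cost."""
    groups = defaultdict(lambda: {"events": [], "cost": 0.0})
    for event_id, materialized in events.items():
        to_make, to_read = plan_event(materialized, wanted)
        files = frozenset(LOGICAL_FILE[p] for p in to_read)
        groups[files]["events"].append(event_id)
        groups[files]["cost"] += sum(COST.get(p, 0.0) for p in to_make)
    return dict(groups)


# Event 2 lacks Tracks, so it needs the raw file and costs more to process.
events = {
    1: {"Hits", "Tracks", "CaloClusters"},
    2: {"Hits", "CaloClusters"},
    3: {"Hits", "Tracks", "CaloClusters"},
}
jobs = split_collection(events, wanted={"Jets"})
```

Each entry of `jobs` is a candidate job: a list of events, the logical files they need, and an estimated cost that a broker could use for placement.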
13
Conclusion
- CMS can already provide today a COBRA application that, by running 0.1% of the production and processing all event headers and metadata, produces the list of the most probable data-products to materialize and the logical files to access for each event