3rd November Richard Hawkings
Luminosity, detector status and trigger - conditions database and meta-data issues
ATLAS luminosity TF workshop, 3/11/06

How we might apply the conditions DB to some of the requirements in:
- Luminosity task force report
- Run structure report
- Meta-data task force report (draft)
- Data preparation/data quality discussions
This talk:
- Reminder of conditions DB concepts relevant here
- Proposal for storage of luminosity, status and trigger information in CondDB
- Relation to TAG database, data flow through system
- Other meta-data related comments
For more in-depth discussion, see document attached to agenda page

Conditions DB - basic concepts
COOL-based database architecture - data with an interval of validity (IOV)
[Diagram: Application -> COOL (C++, python APIs, specific data model) -> Relational Abstraction Layer (CORAL, SQL-like C++ API) -> Oracle DB (online, Tier 0/1), MySQL DB (small DB replicas), SQLite file (file-based subsets), Frontier web (http-based proxy/cache)]
[Diagram: indexed folder rows]
  IOV start | IOV stop | channel1 (tag1) | payload1
  IOV start | IOV stop | channel2 (tag2) | payload2
- COOL IOV (63 bit) can be interpreted as:
  - Absolute timestamp (e.g. for DCS)
  - Run/event number (e.g. for calibration)
  - Run/LB number (possible to implement)
- COOL payload defined per ‘folder’
  - Tuple of simple types (1 DB table row)
  - Can also be a reference to external data
- Use channels (int, soon string) for multiple instances of data in 1 folder
- COOL tags allow multiple data versions
- COOL folders organised in a hierarchy
- Athena interfaces, replication, …
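
The IOV concept above can be illustrated with a small sketch. It is not the COOL API: the `Folder` class, the function names, the bit layout (run in the upper 31 bits, LB in the lower 32 bits) and the half-open [since, until) convention are all assumptions made for illustration. It shows how a (run, LB) pair could be packed into one 63-bit key and how a payload is found for a given key and channel.

```python
from bisect import bisect_right


def pack_run_lb(run, lb):
    """Pack (run, LB) into one 63-bit key (assumed layout: run << 32 | LB)."""
    if not (0 <= run < 2**31 and 0 <= lb < 2**32):
        raise ValueError("run or LB out of range for a 63-bit key")
    return (run << 32) | lb


def unpack_run_lb(key):
    """Inverse of pack_run_lb: return (run, LB)."""
    return key >> 32, key & 0xFFFFFFFF


class Folder:
    """Toy folder: per channel, a list of (since, until, payload) sorted by since.

    IOVs within one channel are assumed not to overlap.
    """

    def __init__(self):
        self._iovs = {}  # channel -> list of (since, until, payload)

    def store(self, channel, since, until, payload):
        iovs = self._iovs.setdefault(channel, [])
        iovs.append((since, until, payload))
        iovs.sort(key=lambda iov: iov[0])

    def find(self, channel, key):
        """Return the payload whose IOV contains key, or None."""
        iovs = self._iovs.get(channel, [])
        starts = [since for since, _, _ in iovs]
        i = bisect_right(starts, key) - 1
        if i >= 0 and iovs[i][0] <= key < iovs[i][1]:
            return iovs[i][2]
        return None
```

The same lookup works unchanged for timestamps or run/event keys, since only the interpretation of the integer key differs.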

Storage of luminosity block information in COOL
- Luminosity block information from the online system
  - Start/end event number and timestamps per LB, {livetimes, prescales}/trigger chain
- How might this look in COOL - an example structure (RE=run/event, RLB=run/LB)
/TDAQ/LUMI/LBRUN - LB indexed by run/event
  RE start | RE stop | LB value
/TDAQ/LUMI/LBTIME - LB indexed by timestamp
  T start | T stop | LB value
/TDAQ/LUMI/LBLB - LB information (start/stop event, time span) indexed by RLB
  RLB start | RLB stop | event start | event stop | T start | T stop | other data …
/TDAQ/LUMI/TRIGGERCHAIN - trigger chain info identified by channel, indexed by RLB
  RLB start | RLB stop | channel=Trigger chain | livetime | L1 prescale | HLT prescale | other data …
/TDAQ/LUMI/ESTIMATES - luminosity estimates versioned and indexed by RLB
  RLB start | RLB stop | Channel=Lumi estimate | Tag=version | Lumi value | Uncertainty | other data …
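
As a rough illustration of how versioned, RLB-indexed estimates might be used, the sketch below sums the luminosity of a selected set of luminosity blocks for one tag. The dictionary layout, the `"lumi"` field name and the function name are hypothetical stand-ins for the contents of a folder like /TDAQ/LUMI/ESTIMATES. Blocks without an estimate for the requested tag are reported rather than silently skipped.

```python
def integrated_luminosity(estimates, selected_lbs, tag):
    """Sum per-LB luminosity estimates of one version over the selected blocks.

    estimates:    dict mapping ((run, lb), tag) -> {"lumi": float, ...}
    selected_lbs: iterable of (run, lb) pairs
    Returns (total, missing), where missing lists the selected blocks
    that have no estimate for this tag.
    """
    total = 0.0
    missing = []
    for run_lb in sorted(set(selected_lbs)):
        entry = estimates.get((run_lb, tag))
        if entry is None:
            missing.append(run_lb)
        else:
            total += entry["lumi"]
    return total, missing
```

Returning the missing blocks lets the caller decide whether the total is usable or the selection must wait for a later version of the estimates.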

Storage of detector status information in COOL
- Detector status from DCS - many channels, many folders; to be merged:
  T start | T stop | Chan1 | TRT HV chan1
  T start | T stop | Chan2 | TRT HV chan2
  T start | T stop | Chan1 | Temp, gas property
  T start | T stop | Chan2 | Temp, gas property
- Merge process combines folders, channels, derives set of IOVs for summary
  - Involves ‘ANDing’ status over all channels, splitting/merging IOVs -> tool?
  - Similar activity for data indexed by run-event … have to correlate this somehow
/GLOBAL/STATUS/TISUMM - summary info (one channel per detector/physics), indexed by timestamp
  T start | T stop | Chan=TRT | Tag=pass1 | Traffic light | Efficiency | Thrust | Bad-list
- Final summary derived first as function of run-event (combining all information):
/GLOBAL/STATUS/RESUMM - summary info (one channel per detector/physics), indexed by run/evt
  RE start | RE stop | Chan=TRT | Tag=pass1 | Traffic light | Efficiency | Thrust | Bad-list
- Then map status changes to luminosity block boundaries (using LB tables):
/GLOBAL/STATUS/LBSUMM - summary info (one channel per detector/physics), indexed by RLB
  RLB start | RLB stop | Chan=TRT | Tag=pass1 | Traffic light | Efficiency | Thrust | Bad-list
  - Status in an LB is defined as the status of the ‘worst’ event in the LB
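
The merge step could look roughly like the sketch below. It is only one possible reading of ‘ANDing’: statuses are encoded as integers where a larger value is worse (a hypothetical encoding), the merged status of an interval is the worst status over all channels, and an interval not covered by some channel is treated as bad. Intervals are assumed half-open, [start, stop). The second function maps the merged summary onto luminosity blocks by taking the worst status overlapping each block.

```python
GREEN, YELLOW, RED = 0, 1, 2  # hypothetical encoding: larger means worse


def merge_channels(channels, default=RED):
    """Combine per-channel IOV lists into one summary list.

    channels: list of lists of (start, stop, status).
    Splits at every IOV boundary, takes the worst status over all channels,
    then merges adjacent intervals that carry the same status.
    """
    edges = sorted({t for iovs in channels
                    for start, stop, _ in iovs for t in (start, stop)})
    merged = []
    for lo, hi in zip(edges, edges[1:]):
        worst = GREEN
        for iovs in channels:
            covering = [s for start, stop, s in iovs
                        if start <= lo and hi <= stop]
            # A channel with no data in this interval counts as `default`.
            worst = max(worst, max(covering, default=default))
        if merged and merged[-1][1] == lo and merged[-1][2] == worst:
            merged[-1] = (merged[-1][0], hi, worst)
        else:
            merged.append((lo, hi, worst))
    return merged


def status_per_lb(summary, lb_bounds):
    """Status of each LB = worst summary status overlapping the LB.

    lb_bounds: dict mapping LB number -> (start, stop).
    """
    result = {}
    for lb, (lb_start, lb_stop) in lb_bounds.items():
        overlapping = [s for start, stop, s in summary
                       if start < lb_stop and lb_start < stop]
        result[lb] = max(overlapping, default=RED)
    return result
```

The same two functions apply whether the keys are timestamps or packed run/event numbers, as long as the LB boundaries are expressed in the same key space as the summary.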

Storage of trigger information in COOL
- Source for trigger setup information is the trigger configuration database
  - Complex relational database - complete trigger configuration accessed by key
  - Store trigger configuration used for each run
    RE start | RE stop | Trigger config key | other data …
  - LVL1 prescales may change per LB - stored in /TDAQ/LUMI/TRIGGERCHAIN
- In principle this is enough, but hard to access trigger config DB ‘everywhere’
  - Copy basic information needed for analysis/selection to condDB: ‘configured triggers’
    RE start | RE stop | channel=Trigger chain | Enabled? | other data (chain ctr?)
- Other information needed offline: efficiencies
  - Filled in offline, probably valid for IOVs much longer than a run:
    RE start | RE stop | Channel=Trigger chain | Tag=version | Efficiency | other data …
Folders:
/TDAQ/TRIGGEREFI - efficiency info (one channel per chain, versioned), indexed by run (/event)
/TDAQ/TRIGGEREFI - efficiency info (one channel per chain), indexed by run (/event)
/TDAQ/TRIGGER/CONFIG - trigger configuration (Run/event key, really spanning complete runs)

Relations to the TAG database
- TAG database contains event level ‘summary’ quantities
  - For quickly evaluating selections, producing event collections (lists) for detailed analysis of subsample of AOD, ESD, etc…
- Need luminosity block and detector status information to make useful queries
  - ‘Give me list of events with 2 electrons, 3 jets, from lumiblocks with good calo and tracking status and where the e25i and 2e15i triggers were active’
- Various ways to make this information available in TAGs
  1. Put all LB, status and trigger information in every event: make it a TAG attribute
     - Wasteful of space, makes it difficult to update e.g. status information afterwards
     - Hard to answer non-event-oriented questions (‘give me list of LBs satisfying condition’)
  2. Store just the (run,LB) number of each event in TAGs, have auxiliary table(s) containing LB and run-level information
     - TAG database does internal joins to answer a query
     - Need to regularly ‘publish’ new (versioned) status information from COOL to TAGs
  3. Have TAG queries get LB/status/trigger info from COOL on each query
     - Technically tricky, would have to go ‘underneath’ COOL API (or not use COOL at all)
- Solution 2 seems to be the best … try it?
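
Solution 2 can be sketched with an in-memory SQLite database. The schema below (table and column names, the `good` and `active` flags) is entirely hypothetical and is not the real TAG database layout. It is only meant to show how events carrying just (run, LB) can be joined against auxiliary, versioned LB-level tables to answer the example query.

```python
import sqlite3

SCHEMA = """
CREATE TABLE events (run INT, lb INT, event INT, n_ele INT, n_jet INT);
CREATE TABLE lb_status (run INT, lb INT, version TEXT, detector TEXT, good INT,
                        PRIMARY KEY (run, lb, version, detector));
CREATE TABLE lb_trigger (run INT, lb INT, chain TEXT, active INT,
                         PRIMARY KEY (run, lb, chain));
"""

# Events with 2 electrons and 3 jets, from LBs where both calo and tracking
# are good in the requested status version and both triggers were active.
QUERY = """
SELECT e.run, e.lb, e.event FROM events e
WHERE e.n_ele = 2 AND e.n_jet = 3
  AND (SELECT COUNT(*) FROM lb_status s
       WHERE s.run = e.run AND s.lb = e.lb AND s.version = ?
         AND s.detector IN ('calo', 'tracking') AND s.good = 1) = 2
  AND (SELECT COUNT(*) FROM lb_trigger t
       WHERE t.run = e.run AND t.lb = e.lb
         AND t.chain IN ('e25i', '2e15i') AND t.active = 1) = 2
ORDER BY e.run, e.lb, e.event
"""


def build_tag_db():
    """Create a toy TAG database with two published status versions."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO events VALUES (?,?,?,?,?)", [
        (100, 1, 1, 2, 3), (100, 1, 2, 1, 3),
        (100, 2, 3, 2, 3), (100, 3, 4, 2, 3),
    ])
    db.executemany("INSERT INTO lb_status VALUES (?,?,?,?,?)", [
        (100, 1, "pass1", "calo", 1), (100, 1, "pass1", "tracking", 1),
        (100, 2, "pass1", "calo", 1), (100, 2, "pass1", "tracking", 0),
        (100, 3, "pass1", "calo", 1), (100, 3, "pass1", "tracking", 1),
        (100, 1, "pass1a", "calo", 1), (100, 1, "pass1a", "tracking", 1),
        (100, 2, "pass1a", "calo", 1), (100, 2, "pass1a", "tracking", 1),
        (100, 3, "pass1a", "calo", 0), (100, 3, "pass1a", "tracking", 1),
    ])
    db.executemany("INSERT INTO lb_trigger VALUES (?,?,?,?)", [
        (100, 1, "e25i", 1), (100, 1, "2e15i", 1),
        (100, 2, "e25i", 1), (100, 2, "2e15i", 1),
        (100, 3, "e25i", 1), (100, 3, "2e15i", 0),
    ])
    return db


def select_events(db, version):
    """Run the example query against one published status version."""
    return db.execute(QUERY, (version,)).fetchall()
```

Publishing a new status version only adds rows to the auxiliary table; the per-event rows are untouched, and the same query run with a different version can return a different set of events.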

Data flow aspects
Walk through the information flow from online to analysis
- Online data-taking:
  - Luminosity, trigger, and ‘primary’ data quality written in COOL
- Calibration processing:
  - Detector status information is processed to produce first summary status information
  - Put this in COOL summary folders (tagged ‘pass1’); map to LB boundaries
- Bulk reconstruction:
  - Process data, produce TAGs
  - Detector quality information (‘pass1’) could be written to AODs and TAGs (per event)
  - Upload LB/run level information from COOL to TAG DB at same time as TAG event data upload … users can now make ‘quality/LB aware’ queries on TAGs
- Refining data-quality:
  - Subdetector experts look at pass1 reconstructed data, reflect, refine data quality information, enter it into COOL (‘pass1a’ tag)
  - At some point, intermediate quality information can be ‘published’ to TAG DB
  - Users can do new ‘pass1a’ TAG queries (LBs/events may come or go from selection)
  - This can be done before a new processing of the ESD or AOD is done
- Estimating luminosity:
  - Lumi experts estimate luminosities, fill in COOL
  - Export this info to TAGs, allow luminosity calculations directly from TAG queries?
- Re-reconstruction:
  - New data quality info ‘pass2’ in COOL, new AOD, new TAGs

A few comments
- Not all analyses will start from TAG DB and resulting event collection
  - Maybe just a list of files/datasets - need access to status/LB/trigger chain information in Athena
  - Make Athena IOVSvc match conditions info on RLB as well as run/event & timestamp
- AOD (and even TAG) can have detector status stored event-by-event
  - Allows vetoing of bad-quality/bad-lumi block events even without Cond DB access
  - With Cond DB access, can make use of updated (e.g. pass1a) status
    - Overriding detector status stored in AOD files
  - But Cond DB access may be slow for sparse events - no caching (need to test)
- Hybrid data selection scheme could also be supported:
  - Use TAG database to make a ‘data quality/trigger chain selection’ and output a list of good luminosity blocks
  - Feed this into Athena jobs running on a list of files - veto any event from an LB not in list
- Maintaining ability to do detector quality selection without LBs implies:
  - Correlation of event numbers with timestamps for each event (event index files?)
  - Storing detector status info per event in TAG DB (difficult to do ‘pass1a’ update)
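
The hybrid scheme amounts to a set-membership veto on (run, LB), which can be sketched as below. The text format of the good-LB list (one ‘run lb’ pair per line, ‘#’ for comments) and the event representation as a dictionary with `run` and `lb` keys are assumptions made for this example; a real Athena job would read these from the event header and from whatever the TAG query exports.

```python
def load_good_lbs(lines):
    """Parse 'run lb' pairs, one per line, ignoring blanks and '#' comments."""
    good = set()
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if line:
            run, lb = line.split()
            good.add((int(run), int(lb)))
    return good


def veto_bad_lbs(events, good_lbs):
    """Yield only the events whose (run, LB) is in the good-LB set."""
    for event in events:
        if (event["run"], event["lb"]) in good_lbs:
            yield event
```

Because the veto needs only the list and the (run, LB) of each event, it works without any Cond DB access at job time.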

Comments on other meta-data issues
- Luminosity TF requires ability to know which LBs are in a file, without the file
  - In case we lose / are unable to access file in our analysis
  - Implies need for file level metadata - on a scale of millions of files…
  - Who does this - DDM? AMI? New database? Should not be conditions DB?
- Definition of datasets
  - The process by which files make it from online at SFOs to offline in catalogued datasets needs more definition
  - What datasets are made for the RAW data? By run, by stream, by SFO?
  - What metadata will be stored? Datasets defined in AMI and DDM? Files catalogued in DDM?
  - What role would AMI play in selection of real data for an analysis? C.f. TAG DB?
  - What about ESD and AOD datasets - per run? per stream?
  - What about datasets defined for the RAW/ESD sent to each Tier-1?
    - The RAW/ESD dataset for each run will never exist on a single site?

Possible next steps
If this looks like a good direction to go in … some possible steps
- Set up the suggested structures in COOL
  - Look at filling them, e.g. with data from the streaming tests
  - Explore size and scalability issues
- In Athena …
  - Set up some access service and data structures to use the data
    - E.g. for status information, stored in condDB and/or AOD, accessible from either with the same interface
  - Make Athena IOVSvc ‘LB aware’
  - Look at speed issues - e.g. penalties for accessing status information from CondDB in every event in sparse data
- Work closely with efforts on luminosity / detector status in TAG database
  - First discussions on that (in context of streaming tests) have taken place this week