Tim Christiansen (CERN), Claudio Campagnari (UCSB), and Benedikt Hegner (CERN) for the Top-Physics Group
AOD/PAT-tuples: Top-PAG Plans and Needs for the Startup Phase
CMS AOD/PAT-tuple Workshop, CERN, 4 September 2009
Some general words …
- The Top group (as probably other PAGs/POGs) wants a data format (tier) of ~100 kB per event that is useful for 90% of the analysis phase space.
- We also strongly prefer this to be the common denominator for ~all analyses, not just for top.
- AOD (= RECO with dropped info) has been designed precisely with this in mind, but of course there are other options.
- A PAT-tuple that fulfills all of the above is likely to be as large as the AOD, and it contains info that changes frequently at the beginning (calibrations, particle-ID, …).
- Most analyses have so far run from RECO, but all top analyses that we are aware of can run from AOD.
- In past cases where missing info was identified, we made sure it was included in the next version of AOD.
Our Proposal
- Our preferred choice: take AOD as the 90% common data format for PAG analyses.
  - This means that we, CMS, have to maintain it centrally.
  - If things are missing in version AOD-x, they need to be included in AOD-x+1.
  - If necessary, groups can "privately" produce their AOD-x+1 samples and use them until production is ready to do this for everyone in the next iteration …
- CMS analysis should be done with PAT.
  - This can be PAT-on-the-fly or via intermediate PAT-tuples.
  - We request a useful, maintained (and somewhat certified!) PAT configuration that analysts can use for their analysis.
  - Analysts' common modifications to this default PAT configuration will likely tailor it to their needs, i.e. dropping or switching off things that the analysis does not need (to save time and/or space).
  - It is also possible to imagine sub-branches of this default PAT configuration per PAG (maintained by the group).
  - Certification under the roof of PVT? Or PAT? Clearly this needs help from all PAGs & POGs.
Our Proposal, continued …
- This proposal does not exclude the production of PAT-tuples. In fact this is encouraged, but to start with rather at sub-group level (e.g. common to similar signatures/channels).
- PAT-tuples can then be organized in smallish groups, for which it will also be easier and faster to agree, converge and react.
- This would then indeed give the chance of a real, interactively usable "tuple" of O(10 kB)/evt.
Top Strategy for first data
- Disclaimer: this is preliminary and still under discussion in the group, and it is not directly related to the AOD/PAT-tuple discussion.
- We would like to postpone the need for skimming (on reco info) for as long as possible through an optimal use of Secondary Dataset (SD) definitions.
  - SDs filter on trigger info only and are therefore unaffected by re-reconstruction.
  - SDs can be shared by more than one PAG and are thus a real candidate for central production and efficient use of resources.
  - No extra skimming also means no extra layer of production (be it pro or private).
- Currently, we are working on proposals for SD definitions:
  - a high-pT mu+X SD (almost done)
  - a high-pT e+X SD (more complicated, coming soon)
  - possibly a multi-jet SD later (mainly for monitoring)
Example: High-pT Muon SD
- The main difference from other proposals we have seen so far is the efficient use of the trigger info available in the data.
  - In addition to filtering on trigger bits, we cut on HLT-object information, i.e. we reduce the rate further in a flexible way without the need to introduce a whole new trigger bit.
  - Caveat: only the four-vectors (p4) of the trigger objects are available in RECO/AOD. This is OK for muons, but not optimal for electron + X.
- A preliminary draft for a high-pT muon SD is currently being circulated.
  - It is tailored for high-pT mu+X analyses.
  - High-pT means 20 GeV: this is the lowest threshold for top analyses and for most of EWK & SUSY (except possibly low-mass DY and some multi-lepton SUSY).
  - It may also include di-muon triggers with somewhat lower thresholds on the trigger-object pT.
Backup
Example: High-pT Muon SD II (Preliminary!)
Conditions for … (similarly for …):
- The event must be in the muon PD
- AND one of the following holds:
  - any of OpenHLT_Mu3, OpenHLT_Mu5, OpenHLT_Mu9, OpenHLT_IsoMu3 fired AND the pT of the corresponding HLT object (L3 muon) fulfills pT > 18 GeV
  - OR any of OpenHLT_DoubleMu0, OpenHLT_DoubleMu3 fired AND (one of the two L3 muons has pT > 18 GeV OR both fulfill pT > 10 GeV)
- Why 18 GeV? This threshold is expected to be close to 100% efficient for the 20 GeV selection used in the analysis.
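The conditions above can be written as a small predicate. The following is only an illustrative sketch in plain Python, not CMSSW code: the `Event` class and the function name are hypothetical, and as a simplification the L3-muon pT values are not matched to the specific trigger path that fired.

```python
from dataclasses import dataclass
from typing import List, Set

# Trigger path names as listed on the slide.
SINGLE_MU_PATHS = {"OpenHLT_Mu3", "OpenHLT_Mu5", "OpenHLT_Mu9", "OpenHLT_IsoMu3"}
DOUBLE_MU_PATHS = {"OpenHLT_DoubleMu0", "OpenHLT_DoubleMu3"}


@dataclass
class Event:
    """Hypothetical minimal event record (not a CMSSW type)."""
    in_muon_pd: bool           # event belongs to the muon primary dataset
    fired_paths: Set[str]      # names of the trigger paths that fired
    l3_muon_pts: List[float]   # pT (GeV) of the L3-muon trigger objects


def passes_high_pt_muon_sd(evt: Event,
                           single_thr: float = 18.0,
                           double_thr: float = 10.0) -> bool:
    """Return True if the event satisfies the draft high-pT muon SD conditions."""
    if not evt.in_muon_pd:
        return False
    pts = sorted(evt.l3_muon_pts, reverse=True)

    # Single-muon branch: a single-muon path fired and an L3 muon is above 18 GeV.
    single = bool(evt.fired_paths & SINGLE_MU_PATHS) and any(
        pt > single_thr for pt in pts
    )

    # Di-muon branch: a di-muon path fired and either one of the two leading
    # L3 muons is above 18 GeV or both are above 10 GeV.
    double = False
    if (evt.fired_paths & DOUBLE_MU_PATHS) and len(pts) >= 2:
        lead, sub = pts[0], pts[1]
        double = lead > single_thr or sub > double_thr
    return single or double
```

Because the muons are sorted by pT, "both above 10 GeV" reduces to a check on the subleading one. The thresholds are kept as parameters since the slide marks them as preliminary.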
Some Numbers
- In 100/pb, an SD with a cross-section of 110 nb would result in 11 M events.
- The event size in RECO is about 440 kB. This results in a RECO SD size of about 4.5 TB.
- In the case of AOD (~130 kB/event), the size of the SD is only about 1.3 TB.
- This can easily be stored at a Tier 2 and does not necessitate further skimming.
- This statement holds even when including a safety factor of a few on the openHLT cross-section for high-pT muons.
- Note: the SD is complete for our mu+X analyses, in the sense that it contains everything the analysts should need, i.e. both the signal and the control samples for BG determination.
- Note: the original mu-PD rate is ~25 Hz.
(F. Golf, J. Ribnik)
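The arithmetic behind these numbers can be checked with a few lines of Python. This is a sketch under one assumption that is not stated on the slide: the quoted sizes are reproduced when kB and TB are read as binary units (1 TB = 1024^3 kB). The function names and the `safety_factor` parameter are introduced here for illustration only.

```python
NB_TO_PB = 1000.0        # 1 nb = 1000 pb
KB_PER_TB = 1024.0 ** 3  # assumption: binary units, 1 TB = 1024^3 kB


def n_events(lumi_inv_pb: float, xsec_nb: float, safety_factor: float = 1.0) -> float:
    """Expected number of events for an integrated luminosity (in 1/pb)
    and a cross-section (in nb), optionally scaled by a safety factor."""
    return lumi_inv_pb * xsec_nb * NB_TO_PB * safety_factor


def sample_size_tb(events: float, event_size_kb: float) -> float:
    """Total sample size in TB for a given per-event size in kB."""
    return events * event_size_kb / KB_PER_TB


events = n_events(100.0, 110.0)          # 11 M events
reco_tb = sample_size_tb(events, 440.0)  # RECO, roughly 4.5 TB
aod_tb = sample_size_tb(events, 130.0)   # AOD, roughly 1.3 TB
print(f"{events:.3g} events, RECO {reco_tb:.1f} TB, AOD {aod_tb:.1f} TB")
```

With decimal units instead, the same inputs give about 4.8 TB (RECO) and 1.4 TB (AOD), so the conclusion about Tier-2 storage does not depend on the convention.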