New Analysis Model
David Nathan Brown, LBNL
Computing Model Review, 6 December 2002

Outline
- Why Revise BaBar's Analysis Model?
- Conceptual Overview of the New Model
- Requirements Summary
- Implementation
- Development
- Conclusions
Why Revise the Analysis Model?
- The April 2002 Computing Review recommended revising the A.M.
- The Micro-DST does not meet the design requirements
  - The Micro is bigger and slower than required
  - Pieces are missing: composite candidates, user data, different track fits, …
- The Mini-DST is not integrated into the current A.M.
  - The Mini is smaller, faster, and more versatile than expected
  - The Mini is now the direct output of reconstruction
- The current A.M. makes inefficient use of resources
  - e.g., many AWG ntuples are simply copies of the Micro
  - Costly in terms of disk space, manpower, CPU, and risk
  - Users have chosen the optimal way of using the A.M. for physics, so improvements must come by fixing the underlying problems in the A.M.
- Experts believe the current A.M. won't scale to 5X the current data
  - There is not enough manpower or financial resources
- The current A.M. has been very successful
  - Many physics results have been produced using it
  - Radical changes are not required, and would not be good for BaBar
Conceptual Overview
- Support analysis on both Mini and (New) Micro-DSTs
  - The Mini is too large (~4X the Micro) for everyday analysis
  - The Mini is an essential link in the analysis chain
    - Useful for detailed analysis of small samples, event scanning, systematic error estimates, updating to recent calibrations, …
    - Source for event skimming (and perhaps Micro-DST production)
- Improve (replace) the Micro-DST format
  - Support customization: composite candidate and user data persistence, content reduction, …
  - Support Framework and ROOT/CINT access from the same files
    - Eliminates the need for ntuple copies of the Micro
  - Maintain backwards-compatibility with existing analysis code
- Expand the role of Skims
  - Support deep-copy output
    - Improves disk-access efficiency and portability
  - Support customization of skim output
    - New skims should directly replace 'production' ntuples
  - Increase the frequency of production skimming
    - Reduces the time to correct mistakes and add features or streams
Requirements: Mini
- The Mini should be accessible from Tier-A sites
  - Both BaBar data and MC samples should be accessible
  - Caveats:
    - Mini data are not required to be disk-resident
    - Mini data are not required to be resident at all Tier-A sites
- Small-sample access should be 'immediate' (< 1 hour)
  - Supports event scanning
- Large or very sparse sample access might require waiting
  - Worst case: looping over all files should take at most 2 weeks
- Creating and distributing Mini subsamples should be easy
- The Mini should support deep-copy customization
  - Composite candidate persistence
  - Removal of unused content (low-level hit data)
Requirements: (New) Micro
- The Micro should be accessible at Tier-A sites
- The Micro should support dual access
  - Analysis Framework and ROOT/CINT (a minimal ROOT-side sketch follows this slide)
- Framework access to the Micro should be fast
  - ~1 kHz is the target
  - Requires improvements to the analysis interface (Beta)
- The Micro should be compatible with the Mini
  - Can produce Micro output when reading the Mini
  - Can easily locate Mini data for a given Micro event
  - Cleanly associates data objects between Micro and Mini
- The Micro should support content customization
  - Composite candidate persistence
  - User-added data persistence
  - Removal of unused content (detector information)
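As a minimal sketch of what the ROOT/CINT half of "dual access" could look like: the same file a Framework job reads as full objects is opened directly as a TTree. The file name ("micro.root") and tree name ("T") are assumptions for illustration, not the actual New Micro schema; the BtaMapV expression is borrowed from the proof-of-principle example shown later in this talk.

```cpp
// Hypothetical ROOT macro: inspect a New Micro file directly in CINT,
// without the analysis Framework. File name "micro.root" and tree name
// "T" are illustrative assumptions only.
#include "TFile.h"
#include "TTree.h"

void browseMicro() {
  TFile* f = TFile::Open("micro.root");   // open the (hypothetical) Micro file
  TTree* t = (TTree*)f->Get("T");         // fetch the event tree by name
  t->Print();                             // list the persistent branches
  t->Draw("BtaMapV.p4().Pt()");           // plot candidate Pt via the object interface
}
```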
Requirements: Skimming
- Skims should be accessible at Tier-A sites
- Skims should support deep-copy and pointer output (the two styles are sketched in code below)
  - Deep copy is preferred
  - Resource limits may force very large skims to use pointers
- Skims may output either Micro or Mini format
  - Mini output may be appropriate for very small samples
  - Implicitly requires that skim production be run on the Mini
- Skim production should be run regularly
  - Target: a production run every 3 months
  - High frequency allows quick bug fixes and improvements
  - Not every skim should be remade every 3 months! Changing data samples every 3 months would disrupt analysis
  - In steady state, ~1/4 of the skims are remade each production run
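To make the deep-copy versus pointer distinction concrete, here is a generic sketch in plain C++ (not BaBar code); all types and names are invented for illustration.

```cpp
// Generic sketch of the two skim output styles. A deep-copy skim writes
// the selected events' data into the skim itself; a pointer skim stores
// only references back into the parent sample.
#include <cstdint>
#include <vector>

struct Event { uint32_t run, id; std::vector<double> payload; };

// Deep copy: the skim is self-contained and portable,
// at the cost of duplicated disk space.
std::vector<Event> deepCopySkim(const std::vector<Event>& parent,
                                bool (*select)(const Event&)) {
  std::vector<Event> skim;
  for (const Event& e : parent)
    if (select(e)) skim.push_back(e);  // full event data is duplicated
  return skim;
}

// Pointer skim: only (run, event) references are stored; reading the
// skim later requires random access back into the parent sample.
struct EventRef { uint32_t run, id; };
std::vector<EventRef> pointerSkim(const std::vector<Event>& parent,
                                  bool (*select)(const Event&)) {
  std::vector<EventRef> refs;
  for (const Event& e : parent)
    if (select(e)) refs.push_back({e.run, e.id});
  return refs;
}
```

This is why deep copy is preferred above: it improves disk-access efficiency and portability, while pointer output trades those away to save space on very large skims.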
Implementation: Mini
- Core Mini functionality exists
  - The complete Mini is being produced in 12-series processing
  - The full data set will be available in Mini format early in 2003
  - Automatic staging access is working at SLAC and IN2P3
    - Disk allotments will need adjustment as use patterns evolve
  - Several analyses have already been ported to BetaMini
    - Reliability and performance problems are being addressed
- Content reduction is now supported (illustrated below)
  - Reduces the size of deep copies by removing unused objects
  - Saves a factor of ~3 in file size by removing Dirc and Tracking hits
  - Implementation is complete and will be released in 12.4.0
- Composite BtaCandidate persistence
  - Implementation is mostly done and should be released in January 2003
- Port of the Mini to ROOTIO
  - Not intrinsically difficult, given the modular design of the Mini
  - Has been started; see the migration plan for details
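The content-reduction idea can be pictured with standard ROOT branch selection, as in this sketch. This is an analogy only: the Mini's actual implementation predates its ROOTIO port, and the file and branch names ("mini.root", "DircHits*", "SvtHits*") are assumptions, not the real Mini schema.

```cpp
// Illustrative only: a "content reduction" pass that drops unused
// detector objects while writing a deep copy, using standard ROOT
// branch selection.
#include "TFile.h"
#include "TTree.h"

void reduceMini() {
  TFile in("mini.root");                 // hypothetical input Mini file
  TTree* t = (TTree*)in.Get("T");        // hypothetical event tree
  t->SetBranchStatus("*", 1);            // start with everything enabled
  t->SetBranchStatus("DircHits*", 0);    // drop Dirc hit data
  t->SetBranchStatus("SvtHits*", 0);     // drop tracking hit data
  TFile out("mini_reduced.root", "RECREATE");
  TTree* reduced = t->CloneTree(-1);     // copy only the enabled branches
  reduced->Write();
  out.Close();
}
```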
Implementation: New Micro
- A dual-access proof of principle exists
  - A modified version of Kanga (Roo) by Eric Charles
  - Can be read by existing Kanga executables
  - Provides a CINT interface to candidate lists and data
    - E.g., plot the Pt of tracks on the GoodTracksLoose list (annotated below):
      root [6] t->Draw("BtaMapV.p4().Pt()","BtaMapV.hasBit(7)");
  - File size is ~20% smaller than Kanga
    - A more logical definition of columns allows better compression
- Final design development is starting
  - A workshop is scheduled for January 13-14 at RAL
  - A 'Reduced Mini' is a likely choice
    - Better packing than the Micro
    - Structure well suited to the A.M. requirements
  - The Mini port to ROOTIO is being done using Eric's Roo base classes
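For readers unfamiliar with the CINT syntax, the proof-of-principle command breaks down as follows; the reading of bit 7 as GoodTracksLoose membership is taken from the example above.

```cpp
// The proof-of-principle command, annotated:
t->Draw("BtaMapV.p4().Pt()",   // expression: transverse momentum of each candidate's 4-vector
        "BtaMapV.hasBit(7)");  // selection: candidate carries list bit 7 (GoodTracksLoose)
```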
Development Tasks
- A RootIO file server for New Micro data at Tier-A sites
  - Experts believe NFS doesn't scale to our ultimate data size; existing problems will grow worse
  - Integration with HPSS is needed at SLAC and IN2P3
  - The Objectivity AMS is a working model of a scalable server
  - Roughly 1 year's work to develop, deploy, and tune
    - Requires highly skilled computer scientists
  - Not required to deploy the A.M.: NFS will work for now
- Analysis Interface (Beta) re-implementation
  - The existing code is largely un-optimized, the result of organic growth over many years
  - A rewrite should greatly improve analysis efficiency
    - We estimate the basic list access rate could be improved by a factor of 5-10
  - This will be a major effort and is just now getting organized
    - Groundwork has been laid by BetaMini development
  - Not required to deploy the A.M.: the existing interface works
Development Tasks (cont.)
- Coherent staging (data train) for Mini data
  - Synchronize jobs to read prestaged data
  - Requires integrating the batch system, bookkeeping, staging, ...
  - Probably requires 2 years to complete
  - Not required to deploy the A.M.: existing staging should work for the next year
- Tag reduction
  - Remove unused parts of the Tag
    - Jim Smith has already done the necessary research
  - Remove candidate-specific data from the Tag
    - This information should be accessed from the (new) Micro
  - Not required to deploy the A.M.: Tag storage and access is not resource-intensive
- Skim production
  - Requires AWG efforts to exploit the new functionality; see the migration plan for details
Conclusions
- We propose a new Analysis Model for BaBar which:
  - Integrates the Mini and the Micro
  - Addresses the computing problems of the current A.M.
  - Is (largely) backwards-compatible with existing code
  - Increases the functionality and (hopefully) the utility of centrally-produced analysis data
  - Expands the options available to users doing analysis
- The implementation looks feasible
  - The major pieces exist today (at least as prototypes)
  - The required development work looks tractable
- Integration, resource implications, and migration will be discussed in subsequent presentations