PanDA in a Federated Environment
Kaushik De
Univ. of Texas at Arlington
CC-IN2P3, Lyon
September 13, 2012
Outline
- Overview
- Concrete plans
  - Federated data access/stageout for fault tolerance
  - Federated data transfer for managed production
  - Federated data access for distributed analysis
- Speculative ideas
  - Data caching
  - Event caching
  - Cache aware brokerage
PanDA FAX Status
- Last year, I talked about local federations
  - Direct access through local redirectors is in use by PanDA at SLAC and SouthWest Tier 2, and has worked well for many years
- This year, the emphasis has been on global federations
  - Global redirectors have been set up and tested in ATLAS
  - Changes were implemented in the PanDA pilot to enable these global redirectors in the default workflow
- But progress has been somewhat slow
  - PanDA is under continuous use in ATLAS
  - Development activities not related to LHC data have been minimal
FAX for Fault Tolerance
- Phase 1 goal
  - If an input file cannot be transferred/accessed from the local SE, the PanDA pilot currently fails the job after a few retries
  - We plan to use federated storage for these (rare) cases
  - Start with file staging/transfers using FAX
  - Implemented in a recent release of the pilot, works fine at two test sites
  - Next step: wider scale testing at production/DA sites
- Phase 2
  - Once file transfers work well, try FAX Direct Access
- Phase 3
  - Try FAX for transfer of output files, if the default destination fails
- Next few slides from Tadashi/Paul
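A minimal sketch of the Phase 1 fallback, for illustration only. It is not the pilot's actual code: `stage_in`, `local_fetch`, `fax_fetch` and `max_retries` are hypothetical names, and the two fetchers stand in for whatever copy tools a site uses. The only idea taken from the slide is that, after a few failed retries against the local SE, the job tries the federation instead of failing.

```python
from typing import Callable, Tuple

# Hypothetical fetcher type: takes a logical file name, returns a local
# path, and raises IOError when the file cannot be staged.
Fetcher = Callable[[str], str]


def stage_in(lfn: str, local_fetch: Fetcher, fax_fetch: Fetcher,
             max_retries: int = 3) -> Tuple[str, str]:
    """Return (path, source), where source is 'local' or 'fax'."""
    last_error = None
    # Try the local storage element a few times first.
    for _attempt in range(max_retries):
        try:
            return local_fetch(lfn), "local"
        except IOError as err:
            last_error = err
    # Local SE exhausted: fall back to the federation instead of failing.
    try:
        return fax_fetch(lfn), "fax"
    except IOError as err:
        raise RuntimeError(
            f"stage-in failed for {lfn}: local={last_error}; fax={err}"
        ) from err
```

Returning the source alongside the path lets the caller record how often the fallback is actually used, which is the kind of number the wider scale tests would need.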
FAX for Managed Production
- Managed production has well defined workflow
  - PanDA schedules all input/output file transfers through DQ2
  - DQ2 provides dataset level callback when transfers are completed
- FAX can provide alternate transport mechanism
  - Transfers handled by FAX
  - Dataset level callback provided by FAX
  - Dataset discovery/registration handled by DQ2
- File level callback
  - Recent development: use ActiveMQ for file level callbacks
  - On best effort basis for scalability; dataset callbacks still used
  - FAX can use same mechanism
  - Work in progress
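One way to picture how best-effort file level callbacks can coexist with the dataset level callback is sketched below. This is an assumption-laden illustration, not PanDA or DQ2 code: the class `DatasetTransferTracker` and its methods are hypothetical, and no real ActiveMQ client API is used (messages are simply passed in as method calls).

```python
from typing import Callable, Dict, Iterable, Set


class DatasetTransferTracker:
    """Tracks pending files per dataset (hypothetical helper)."""

    def __init__(self, on_dataset_done: Callable[[str], None]) -> None:
        self._pending: Dict[str, Set[str]] = {}
        self._on_dataset_done = on_dataset_done

    def register(self, dataset: str, files: Iterable[str]) -> None:
        """Start tracking a dataset whose transfer has been requested."""
        self._pending[dataset] = set(files)

    def file_arrived(self, dataset: str, lfn: str) -> None:
        """File level callback: best effort, so duplicates and messages
        for unknown datasets are silently ignored."""
        pending = self._pending.get(dataset)
        if pending is None:
            return
        pending.discard(lfn)
        if not pending:
            # Every file was seen: finish early, before the dataset callback.
            self.dataset_arrived(dataset)

    def dataset_arrived(self, dataset: str) -> None:
        """Dataset level callback: completes the dataset even if some
        file level messages were lost. Fires the handler only once."""
        if self._pending.pop(dataset, None) is not None:
            self._on_dataset_done(dataset)
```

In this sketch a lost file message only delays completion until the dataset level callback arrives, which is why the file level channel can be run on a best-effort basis.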
FAX for Distributed Analysis
- Most challenging and most rewarding
  - Currently, DA jobs are brokered to sites which have input datasets
  - This may limit and slow the execution of DA jobs
  - Use FAX to relax constraint on locality of data
- Use cost metric generated with HammerCloud tests
  - Provides 'typical cost' of data transfer between two sites
- Brokerage will use 'nearby' sites
  - Calculate weight based on usual brokerage criteria (availability of CPU…) plus transfer cost
  - Jobs will be dispatched to the site with the best weight, not necessarily the site with local data or the most available CPUs
- Cost metric already available (see Ilija/Rob talks)
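A toy version of cost-aware brokerage, to make the weighting idea concrete. Everything specific here is an assumption: the slide does not give the weight formula, so the sketch uses free CPU slots divided by (1 + penalty × cost), treats a site as 'nearby' when its cost from the data site is at most `max_cost`, and uses invented names (`Site`, `pick_site`, `transfer_cost`) and invented numbers.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Site:
    name: str
    free_slots: int  # stand-in for the usual brokerage criteria


def transfer_cost(cost: Dict[Tuple[str, str], float],
                  src: str, dst: str) -> float:
    """Typical cost of moving data from src to dst; zero if local.
    Unknown pairs are treated as infinitely far away."""
    if src == dst:
        return 0.0
    return cost.get((src, dst), float("inf"))


def pick_site(sites: List[Site], data_site: str,
              cost: Dict[Tuple[str, str], float],
              max_cost: float = 2.0, penalty: float = 1.0) -> Optional[str]:
    """Return the name of the nearby site with the best weight."""
    best_name, best_weight = None, -1.0
    for site in sites:
        c = transfer_cost(cost, data_site, site.name)
        if c > max_cost:
            continue  # not 'nearby': excluded from brokerage
        weight = site.free_slots / (1.0 + penalty * c)
        if weight > best_weight:
            best_name, best_weight = site.name, weight
    return best_name


sites = [Site("A", 10), Site("B", 100), Site("C", 500)]
cost = {("A", "B"): 0.5, ("A", "C"): 5.0}
chosen = pick_site(sites, data_site="A", cost=cost)
```

With these made-up numbers the data lives at A, and C has the most free slots, but the job goes to B: C is too far to count as nearby, and B's weight (100 / 1.5) beats A's (10 / 1.0).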
Implementation Schedule
- FAX for fault tolerance
  - Phase 1 (FAX transfers): done, test for a few months
  - Phase 2 (FAX Direct Access): before year end
  - Phase 3 (FAX output): before year end
- FAX for central production
  - Within 6 months
  - Maybe sooner, since ActiveMQ is already under testing
- FAX in brokerage
  - Cost metric already available
  - A few months to set up and test in PanDA database
  - Next year: enable a few sites for high throughput tests
Data Caching
- Local data caching for WAN access
  - Maybe not for PanDA: can the federation do it transparently?
  - Various alternatives were discussed in WAN meeting at CERN
- PanDA could keep site level cache
  - Not a guaranteed file catalog, only a best effort list
  - Use FAX to fetch again if a file is no longer available
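The 'best effort list' idea could look roughly like the sketch below. This is speculative, in the same spirit as the slide: `SiteCache`, `exists` and `fax_fetch` are hypothetical names, and the existence check and the FAX fetch are injected as plain functions so that no real storage or XRootD API is assumed.

```python
from typing import Callable, Dict


class SiteCache:
    """Best effort list of files believed to be cached at a site.
    Entries may be stale; nothing here is a guaranteed catalog."""

    def __init__(self, exists: Callable[[str], bool],
                 fax_fetch: Callable[[str], str]) -> None:
        self._known: Dict[str, str] = {}  # logical file name -> local path
        self._exists = exists              # checks whether a path is still there
        self._fax_fetch = fax_fetch        # fetches via the federation, returns path

    def get(self, lfn: str) -> str:
        """Return a usable local path, refetching through FAX if the
        cached copy is unknown or has disappeared."""
        path = self._known.get(lfn)
        if path is not None and self._exists(path):
            return path
        path = self._fax_fetch(lfn)
        self._known[lfn] = path
        return path
```

Because a stale entry only costs one extra fetch, the list never has to be kept consistent with the storage, which is what separates it from a file catalog.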
Event Cache
- Long term PanDA goal: event service
  - Granularity of data processing in PanDA today is datasets and files
  - But events are really the atomic unit for HEP
  - PanDA event service will change current processing model
- Challenges of event service
  - Scalability: keeping track of hundreds of billions of events
  - Fault tolerance: processing all events without data loss
  - Chaining of data processing
  - Efficient use of WAN vs storage
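One possible way to approach the scalability and fault tolerance challenges together is to book-keep contiguous event ranges rather than single events, and to return a range to the queue when its worker fails. The sketch below only illustrates that idea; it is not a description of the planned event service, and `split_into_ranges` and `EventRangeBook` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

EventRange = Tuple[int, int]  # inclusive (first_event, last_event)


def split_into_ranges(first: int, last: int, size: int) -> List[EventRange]:
    """Cut events first..last into contiguous ranges of at most `size`."""
    return [(start, min(start + size - 1, last))
            for start in range(first, last + 1, size)]


class EventRangeBook:
    """Tracks one state per range instead of one state per event."""

    def __init__(self, ranges: List[EventRange]) -> None:
        self._state: Dict[EventRange, str] = {r: "todo" for r in ranges}

    def next_range(self) -> Optional[EventRange]:
        """Hand out the first unprocessed range, or None if none is left."""
        for rng, state in self._state.items():
            if state == "todo":
                self._state[rng] = "running"
                return rng
        return None

    def finish(self, rng: EventRange) -> None:
        self._state[rng] = "done"

    def fail(self, rng: EventRange) -> None:
        """A failed range goes back to the queue, so no events are lost."""
        self._state[rng] = "todo"

    def all_done(self) -> bool:
        return all(state == "done" for state in self._state.values())
```

The range size is the tuning knob in this sketch: larger ranges mean fewer records to track, smaller ranges mean less work repeated after a failure.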
Conclusion
- Wide array of FAX plans for PanDA
- Schedule depends on availability of effort during LHC run
- Do not foresee technical challenges for short/medium term
- Long term: many open ideas, some quite challenging