PanDA in a Federated Environment


1 PanDA in a Federated Environment
Kaushik De, Univ. of Texas at Arlington
CC-IN2P3, Lyon, September 13, 2012

2 Outline
- Overview
- Concrete plans
  - Federated data access/stage-out for fault tolerance
  - Federated data transfer for managed production
  - Federated data access for distributed analysis
- Speculative ideas
  - Data caching
  - Event caching
  - Cache-aware brokerage

3 PanDA FAX Status
- Last year, I talked about local federations
  - Direct access through local redirectors has been used by PanDA at SLAC and the SouthWest Tier 2 for many years and works well
- This year, the emphasis has been on global federations
  - Global redirectors have been set up and tested in ATLAS
  - Changes were implemented in the PanDA pilot to enable these global redirectors in the default workflow
- But progress has been somewhat slow
  - PanDA is under continuous use in ATLAS
  - Development activities not related to LHC data have been minimal

4 FAX for Fault Tolerance
- Phase 1 goal
  - If an input file cannot be transferred/accessed from the local SE, the PanDA pilot currently fails the job after a few retries
  - We plan to use federated storage for these (rare) cases
  - Start with file staging/transfers using FAX
  - Implemented in a recent release of the pilot; works fine at two test sites
  - Next step: wider-scale testing at production/DA sites
- Phase 2
  - Once file transfers work well, try FAX direct access
- Phase 3
  - Try FAX for transfer of output files if the default destination fails
- Next few slides from Tadashi/Paul
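The Phase 1 fallback can be pictured with the small Python sketch below. It is an illustration under assumptions, not the pilot's actual code: `stage_in`, `local_copy` and `fax_copy` are hypothetical names, and each transfer function is assumed to return True on success.

```python
from typing import Callable

# Hypothetical transfer hook: takes a logical file name, returns True on success.
Transfer = Callable[[str], bool]


def stage_in(lfn: str, local_copy: Transfer, fax_copy: Transfer,
             max_retries: int = 3) -> str:
    """Return 'local' or 'fax' depending on which mechanism delivered lfn.

    Hypothetical sketch of the Phase 1 idea: retry the local SE a few
    times, then fall back to a FAX transfer instead of failing the job.
    """
    for _ in range(max_retries):
        if local_copy(lfn):
            return "local"
    # Rare case: the local SE failed on every retry, so try the federation.
    if fax_copy(lfn):
        return "fax"
    raise RuntimeError(f"{lfn}: unavailable from local SE and via FAX")


# Usage with stand-in transfer functions.
def broken_se(lfn: str) -> bool:
    return False


def fax_ok(lfn: str) -> bool:
    return True


print(stage_in("example.root", broken_se, fax_ok))  # prints "fax"
```

The same shape would cover Phase 3 if the first function were a stage-out to the default destination and the second a FAX output transfer.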

5 (image-only slide; no text in the transcript)

6 FAX for Managed Production
- Managed production has a well-defined workflow
  - PanDA schedules all input/output file transfers through DQ2
  - DQ2 provides a dataset-level callback when transfers are completed
- FAX can provide an alternate transport mechanism
  - Transfers handled by FAX
  - Dataset-level callback provided by FAX
  - Dataset discovery/registration handled by DQ2
- File-level callback
  - Recent development: use ActiveMQ for file-level callbacks
  - On a best-effort basis for scalability; dataset callbacks are still used
  - FAX can use the same mechanism
- Work in progress
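The interplay between best-effort file-level callbacks and the still-authoritative dataset-level callback might look roughly like the sketch below. `DatasetTracker` and its methods are hypothetical; the message transport (for example ActiveMQ) is left out and is simply assumed to call `on_file_callback` once per arriving file, possibly dropping some messages.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class DatasetTracker:
    """Hypothetical bookkeeping for the transfer of one dataset."""
    files: Set[str]
    arrived: Set[str] = field(default_factory=set)
    complete: bool = False

    def on_file_callback(self, lfn: str) -> None:
        # Best effort: a message may be lost, so this only lets jobs
        # start early on files already known to be present.
        if lfn in self.files:
            self.arrived.add(lfn)

    def on_dataset_callback(self) -> None:
        # Authoritative: every file is now present, including any whose
        # file-level message was dropped.
        self.arrived = set(self.files)
        self.complete = True

    def ready(self, lfn: str) -> bool:
        return self.complete or lfn in self.arrived


tracker = DatasetTracker(files={"a.root", "b.root"})
tracker.on_file_callback("a.root")
print(tracker.ready("a.root"), tracker.ready("b.root"))  # prints True False
```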

7 FAX for Distributed Analysis
- Most challenging and most rewarding
- Currently, DA jobs are brokered to sites that have the input datasets
  - This may limit and slow the execution of DA jobs
- Use FAX to relax the constraint on data locality
  - Use a cost metric generated with HammerCloud tests
  - Provides the 'typical cost' of data transfer between two sites
- Brokerage will use 'nearby' sites
  - Calculate a weight based on the usual brokerage criteria (availability of CPU…) plus transfer cost
  - Jobs will be dispatched to the site with the best weight, not necessarily the site with local data or available CPUs
- Cost metric already available (see Ilija/Rob talks)
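A toy version of the brokerage weight is sketched below. The scoring rule (free CPUs minus a weighted transfer cost), the parameter `cost_weight` and all names are assumptions for illustration; the slide only says the weight combines the usual brokerage criteria with the transfer cost. Sites with no measured cost from the data site are treated as not 'nearby' and skipped.

```python
from typing import Dict, Optional, Tuple


def broker(data_site: str,
           free_cpus: Dict[str, int],
           transfer_cost: Dict[Tuple[str, str], float],
           cost_weight: float = 1.0) -> Optional[str]:
    """Return the site with the best (hypothetical) weight, or None."""
    best_site: Optional[str] = None
    best_score = float("-inf")
    for site, cpus in free_cpus.items():
        if site == data_site:
            cost = 0.0  # data is local: no transfer needed
        elif (data_site, site) in transfer_cost:
            cost = transfer_cost[(data_site, site)]
        else:
            continue  # no measured cost: not a 'nearby' site, skip it
        score = cpus - cost_weight * cost
        if score > best_score:
            best_site, best_score = site, score
    return best_site


free = {"SITE_A": 2, "SITE_B": 50, "SITE_C": 100}
cost = {("SITE_A", "SITE_B"): 10.0}
print(broker("SITE_A", free, cost))  # prints SITE_B
```

Here the job leaves the data site because SITE_B's spare CPUs outweigh its transfer cost, while SITE_C is never considered despite being idle, which matches the idea that the winner is neither necessarily the site with local data nor the one with the most free CPUs.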

8 Implementation Schedule
- FAX for fault tolerance
  - Phase 1 (FAX transfers): done; test for a few months
  - Phase 2 (FAX direct access): before year end
  - Phase 3 (FAX output): before year end
- FAX for central production
  - Within 6 months
  - Maybe sooner: ActiveMQ is already under testing
- FAX in brokerage
  - Cost metric already available
  - A few months to set up and test in the PanDA database
  - Next year: enable a few sites for high-throughput tests

9 Data Caching
- Local data caching for WAN access
  - Maybe not for PanDA: can the federation do it transparently?
  - Various alternatives were discussed at the WAN meeting at CERN
- PanDA could keep a site-level cache
  - Not a guaranteed file catalog; a best-effort list
  - Use FAX to fetch the file again if it is no longer available
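The best-effort site-level cache could behave roughly as in the sketch below. `SiteCache`, `exists` and `fax_fetch` are hypothetical names; the point is that the list is only a hint, so a listed file is checked before use and fetched again via FAX when it has disappeared.

```python
from typing import Callable, Set


class SiteCache:
    """Hypothetical best-effort list of files believed to be cached at a site."""

    def __init__(self, exists: Callable[[str], bool],
                 fax_fetch: Callable[[str], None]) -> None:
        self.known: Set[str] = set()  # a hint, not a guaranteed catalog
        self._exists = exists
        self._fax_fetch = fax_fetch

    def get(self, lfn: str) -> str:
        if lfn in self.known and self._exists(lfn):
            return "hit"
        # Not listed, or listed but gone (stale entry): fetch via FAX.
        self._fax_fetch(lfn)
        self.known.add(lfn)
        return "fetched"


on_disk: Set[str] = set()
cache = SiteCache(exists=lambda f: f in on_disk, fax_fetch=on_disk.add)
print(cache.get("x.root"), cache.get("x.root"))  # prints fetched hit
```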

10 Event Cache
- Long-term PanDA goal: an event service
- Granularity of data processing in PanDA: datasets and files
  - But events are really the atomic unit for HEP
  - The PanDA event service will change the current processing model
- Challenges of an event service
  - Scalability: keeping track of hundreds of billions of events
  - Fault tolerance: processing all events without data loss
  - Chaining of data processing
  - Efficient use of WAN vs. storage
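One way to picture the fault-tolerance challenge is a queue of event ranges in which a failed range is put back instead of being dropped. This is a hypothetical in-memory sketch (`EventRangeQueue` and the range size are assumptions); it deliberately ignores the scalability challenge, which at hundreds of billions of events would need persistent, distributed bookkeeping.

```python
from collections import deque
from typing import Deque, Dict, Optional, Set, Tuple

Range = Tuple[int, int]  # half-open [first, last) event numbers


class EventRangeQueue:
    """Hypothetical event-range bookkeeping: failed ranges are requeued."""

    def __init__(self, n_events: int, range_size: int) -> None:
        self.pending: Deque[Range] = deque(
            (s, min(s + range_size, n_events))
            for s in range(0, n_events, range_size))
        self.running: Dict[Range, str] = {}  # range -> worker name
        self.done: Set[Range] = set()

    def dispatch(self, worker: str) -> Optional[Range]:
        if not self.pending:
            return None
        r = self.pending.popleft()
        self.running[r] = worker
        return r

    def finished(self, r: Range) -> None:
        self.running.pop(r)
        self.done.add(r)

    def failed(self, r: Range) -> None:
        self.running.pop(r)
        self.pending.append(r)  # retry later, so no events are lost

    def all_done(self) -> bool:
        return not self.pending and not self.running
```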

11 (image-only slide; no text in the transcript)

12 (image-only slide; no text in the transcript)

13 Conclusion
- Wide array of FAX plans for PanDA
- Schedule depends on availability of effort during the LHC run
- Do not foresee technical challenges for the short/medium term
- Long term: many open ideas, some quite challenging

