Data Transfer Efficiency - leave no byte unchurned
Jens Jensen
Rutherford Appleton Laboratory
GridPP26, U Sussex, March 2011

Background
GridPP’s data grid
–Distributed Storage Elements
–Data movers (FTS, PhEDEx et al)
–Catalogues (usu. replica)
e-Infrastructure (aka cyberinfrastructure)
(Presentation at ISGC)

The Data Grid
WLCG is primarily a data grid
–Computation can (in principle) be redone
Jobs go to where data is
–Moving a job is quicker than moving data

Premature Optimisation is the Root of All Evil
Postmature non-optimisation is the root of some evil
The role of infrastructure code
–Scientist as a programmer
–“Bad” code moves up the stack?
–“Bad” code improves over time?
Doofers stay in prod’n

Efficiencaciousness Goals
Service
Availability
Performance
Grows as needed
Robust (no SPoF?)
People
(Effective) support
Training
Expertise
Availability of…

Approaches
Philosophy
–Get it done – WLCG
–Get it done right – EGI?
–Do It Perfectly The First Time…
Evolutionary (control system) vs revolutionary
–Proactive vs reactive

Efficiencaciousness Issues
Failures
–Sites – BDII, network
–Elements – storage
–Components – disk servers
Timeouts
DDoS

Efficiencaciousness Issues
Overall effort
–Funded, contributed, external
Availability of expertise
–Single Point of Knowledge
Decoherence
2nd Law of Thermodynamics
Learning from incidents

Efficiencaciousness Issues
Primary communication
–Sites
–Users: large VOs, small VOs, single users
–PMB
Secondary
–WLCG
–NGS

Efficiencaciousness Issues
Sites
–There Is Always A Bottleneck Somewhere
–Site dependent
–Usage dependent
Information
–Freshness
–Accuracy (“spped is substute fo accurcy”)

Efficiencaciousness Issues
Usage patterns
–C.f. Wahid’s talk yesterday
–WAN vs LAN (WN) traffic
Technology
–In the narrow sense (drives, controllers)
–And the wider sense: dist’d filesystems
Support: Upstream (EGI), Fabric

Efficiencaciousness Issues
Overheads
–Complexity of use of stack (see next)
–Infrastructure is complex
–But Complexity Has To Go Somewhere
Time-to-production
–Testing, troubleshooting, monitoring, tweaking, tuning

With apologies to the OSI stack
PROGRESS Particular Pain Point Principle
Progressing Forward
What is progress
How to measure progress

The Good News
We’ve come a long way
Don’t think there is a skills gap
–But some SPoKs

Graeme’s talk
“Get the best out of what we can afford to buy”
Proactive sites better
Standards are good

E[GM]I involvement
EMI data roadmap
–Support for dCache, DPM, StoRM
–Support for standards (NFS4, CDMI)
But then
–StoRM=INFN, dCache=DESY, DPM=CERN

The Cloud View
Supplement resources with on-demand
Agile
CDMI is superset of SRM
–But using ReST+JSON, not SOAP

(Open) Standards
Standards promote interoperation and stability
Interoperation
Multiple (independent) implementations
–Both Java and (C or C++)

The Case for Non-HEP Data
Benefit from non-HEP data
–Outreachy stuff
–Benefit to society (eg saving lives)
NGI interop (at compute)
Others…

SUMMARY
Efficiencaciousness Goals
Service
Availability
Performance
Grows as needed
Robust (no SPoF?)
People
(Effective) support
Training
Expertise
Availability of…