Mirjam van Daalen, (Stephan Egli, Derek Feichtinger) :: Paul Scherrer Institut Status Report PSI PaNDaaS2 meeting Grenoble 12 – 13 December 2016.


1 Mirjam van Daalen, (Stephan Egli, Derek Feichtinger) :: Paul Scherrer Institut
Status Report PSI PaNDaaS2 meeting Grenoble 12 – 13 December 2016

2 Current projects at PSI
- Data Analysis Service
- Data Policy
- Remote access
- Metadata catalogue
- Petabyte archive
- Remote data transfer
PSI, 10. April 2019

3 Covering larger parts of the life cycle

4 Project Overview: Data Analysis Service
SUK Project
- Project Manager: Dr. Stephan Egli, Dr. Derek Feichtinger, Paul Scherrer Institut
- Partner: ETHZ
- Project Duration:
- Financing Support (50% matching funds): CHF 1'618'000
- Team members: 16 (including 3 new positions financed by project)
Workpackages
- WP1: Common Tools and Services
- WP2: Data Analysis Environments for major use cases
- WP3: Identity Management, DUO, Authentication and Authorization
- WP4: Integration and development of scientific analysis codes
- WP5: Procurement, installation, operation of analysis cluster infrastructure
- WP6: Infrastructure sharing with other institutions
- WP7: Project Management

5 DaaS Project Status
- Main purpose: provide an integrated solution for all SLS users to do offline analysis of data taken at SLS (and later SwissFEL)
- Cluster of moderate size (~900 cores, 2 PB storage)
- Hired 3 persons dedicated to this project
- Currently about 50% of the foreseen hardware is installed and in operation
- Now in test phase with invited external users and internal users
- Adjusting the system and software according to the concrete use cases of these users; so far very good feedback
- Planning a storage upgrade to a total of about 3 PB by mid 2017
- Option to extend the cluster with “dedicated” resources (for paying customers), within the same infrastructure and using centrally provided hardware choices

6 Data Policy Status
- Data Policy based on PaNdata framework
- Policy was adopted by the Directorate in October 2016
- Policy applies not only to the large research facilities at PSI, but to all research activities
- Embargo period of 3 years, with easy extension to 5 years
- Implementation will be a long-term effort, carried out stepwise per facility and beamline
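The embargo rule can be illustrated with a short calculation. This is a minimal sketch only: the slide gives the 3-year default and the 5-year maximum, while the choice of the collection date as the starting point, the function name, and the leap-day handling are assumptions made for the example and are not taken from the PSI policy text.

```python
from datetime import date

DEFAULT_EMBARGO_YEARS = 3  # from the slide: embargo period of 3 years
MAX_EMBARGO_YEARS = 5      # from the slide: extension to 5 years


def embargo_end(collected: date, years: int = DEFAULT_EMBARGO_YEARS) -> date:
    """Return the date on which the embargo would end.

    Assumption: the embargo is counted from the collection date.
    """
    if not DEFAULT_EMBARGO_YEARS <= years <= MAX_EMBARGO_YEARS:
        raise ValueError("embargo must be between 3 and 5 years")
    try:
        return collected.replace(year=collected.year + years)
    except ValueError:
        # Collected on 29 February and the target year is not a leap year:
        # fall back to 1 March (an assumption, not a policy statement).
        return date(collected.year + years, 3, 1)


if __name__ == "__main__":
    print(embargo_end(date(2016, 12, 13)))
    print(embargo_end(date(2016, 12, 13), years=5))
```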

7 Remote Access
- Use cases: online and offline analysis, remote measurements, shift operation, sharing of sessions for support tasks, sharing of sessions for collaboration
- Support for 3D hardware acceleration
- Access to the beamlines and to the DaaS cluster through a common gateway
- Architecture based on separation of “server” and “node” processes of the NoMachine software, version 5
- Added a graphical management tool to define (time-based) access to beamline resources and the offline compute cluster, with role-based management delegated to the person responsible for each resource
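The idea of time-based access with delegated management can be sketched as a simple time-window check. Everything here is hypothetical: the `AccessGrant` record, its field names, and `may_connect` are illustrations of the concept and do not reflect the NoMachine software or the PSI management tool.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class AccessGrant:
    """One time-limited permission (hypothetical record layout)."""
    user: str
    resource: str    # e.g. a beamline console or the offline cluster
    start: datetime  # inclusive
    end: datetime    # exclusive
    granted_by: str  # the person responsible for the resource


def may_connect(grants: Iterable[AccessGrant], user: str,
                resource: str, when: datetime) -> bool:
    """True if any grant covers this user and resource at the given time."""
    return any(
        g.user == user and g.resource == resource and g.start <= when < g.end
        for g in grants
    )


if __name__ == "__main__":
    grant = AccessGrant("alice", "beamline-x",
                        datetime(2016, 12, 12, 8), datetime(2016, 12, 13, 8),
                        granted_by="beamline-manager")
    print(may_connect([grant], "alice", "beamline-x", datetime(2016, 12, 12, 12)))
```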

8 Data Catalogue
- Decision for an approach based on NoSQL document databases (MongoDB), taking advantage of recent developments in middleware (LoopBack) and component-based graphical user interfaces (Angular 2)
- Need to cover an extended set of use cases and long-term evolution at PSI and ESS; flexibility of the solution is therefore mission critical
- Currently preparing the production environment for data ingestion and recruiting for a developer position (3 years)
- First 3 beamlines should be connected within the DaaS project timeframe, in spring 2017
- This is also a decision for continued collaboration with the ICAT community, e.g. working on a common API to aim for interoperability of current and future products
- Evaluate the potential to develop software (components) that can be used in a “product”-independent fashion
- Open to further suggestions…
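The flexibility argument for a document database can be illustrated with a plain Python dictionary. This is a sketch under stated assumptions: the field names, the list of required fields, and both helper functions are invented for the example and are not the PSI catalogue schema. The point is only that a small fixed core can sit next to free-form, beamline-specific metadata in a single JSON-like document of the kind MongoDB stores.

```python
import json
from datetime import datetime, timezone

# Hypothetical core fields every dataset document would carry.
REQUIRED_FIELDS = ("pid", "owner", "beamline", "creation_time")


def make_dataset_document(pid, owner, beamline, creation_time,
                          **scientific_metadata):
    """Build a JSON-serialisable dataset document.

    Extra keyword arguments become free-form scientific metadata, so each
    beamline can add its own keys without a schema migration.
    """
    return {
        "pid": pid,
        "owner": owner,
        "beamline": beamline,
        "creation_time": creation_time.isoformat(),
        "scientific_metadata": scientific_metadata,
    }


def missing_fields(document):
    """Return the required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not document.get(name)]


if __name__ == "__main__":
    doc = make_dataset_document(
        "example/0001", "alice", "beamline-x",
        datetime(2016, 12, 12, tzinfo=timezone.utc),
        photon_energy_keV=12.4, detector="example-detector",
    )
    print(json.dumps(doc, indent=2))
```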

9 Interactive and Batch Data Analysis
- Support for interactive data analysis (e.g. Matlab) on the cluster; nodes can be reserved for interactive work
- Standard batch processing based on Slurm
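A batch job for Slurm is typically described by a shell script with `#SBATCH` directives. The sketch below renders such a script as text; the job name, command, and resource values are placeholders, and nothing here reflects the actual configuration of the PSI cluster. The options used (`--job-name`, `--ntasks`, `--time`, `--output` with the `%j` job-ID pattern) are standard `sbatch` options.

```python
def render_sbatch_script(job_name: str, command: str,
                         ntasks: int = 1,
                         time_limit: str = "01:00:00") -> str:
    """Return the text of a minimal Slurm batch script."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --ntasks={ntasks}",
        f"#SBATCH --time={time_limit}",          # wall-clock limit, HH:MM:SS
        f"#SBATCH --output={job_name}-%j.out",   # %j expands to the job ID
        "",
        f"srun {command}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder command; write the result to a file and submit with sbatch.
    print(render_sbatch_script("recon", "python reconstruct.py", ntasks=4))
```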

10 Petabyte Archive
- PSI must prepare for archiving the large amounts of data expected from SLS and SwissFEL over the next decades
- Strategic collaboration of PSI with the Swiss National Supercomputing Center (CSCS) in Lugano to build a petabyte tape archive solution at CSCS
- Project initiated by a PoC within the DaaS project
- Volume increase driven by detector and instrumentation advances
- Planning to leverage IBM Spectrum Scale (GPFS) AFM technology for asynchronous data transfers between the sites
- Dataflow orchestration and packaging tools are being evaluated; the selected candidate is Arema from IBM
- Definition of interfaces from and to the data catalogue is ongoing

11 Remote Data Transfer
- Support for rsync/scp and GridFTP (Globus Online)
- Also evaluated the Aspera solution from IBM; it could be added, but only if a (paying) customer requests it
- The integration with the long-term archive will create additional requirements
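For the rsync option, a transfer command might be assembled as follows. This is a minimal sketch: the host name, user, and paths are placeholders and not real PSI endpoints, and the helper function is hypothetical. The flags are standard rsync options: `-a` (archive mode), `--partial` (keep partially transferred files so an interrupted transfer can resume), and `-z` (compress during transfer).

```python
import shlex
from typing import List


def build_rsync_command(source: str, user: str, host: str,
                        destination: str, compress: bool = False) -> List[str]:
    """Return an rsync argument list for pushing data to a remote host."""
    options = ["-a", "--partial"]
    if compress:
        options.append("-z")
    return ["rsync", *options, source, f"{user}@{host}:{destination}"]


if __name__ == "__main__":
    cmd = build_rsync_command("run42/", "alice",
                              "transfer.example.org", "/data/run42/")
    # Print the command for inspection; pass the list to subprocess.run to execute.
    print(shlex.join(cmd))
```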

12 Wir schaffen Wissen – heute für morgen (We create knowledge: today for tomorrow)
My thanks go to Stephan Egli, Derek Feichtinger, and Gerd Mann.

