Development plan in WP19: MTA SZTAKI
Robert Lovas, MTA SZTAKI Laboratory of Parallel and Distributed Systems
18/03/2013, Villigen
2 Background
ELI is a European project involving nearly 40 research and academic institutions from 13 EU Member States, forming a pan-European laser facility that aims to host the most intense lasers worldwide. ELI consists of four main scientific “pillars”. It is being implemented as a distributed research infrastructure, with the first three facilities to be built in the Czech Republic (ELI-Beamlines), Hungary (ELI-ALPS) and Romania (ELI-NP). These three facilities are still at an early or intermediate stage of development; therefore the computing model that will support data processing and analysis at ELI-ALPS (the pillar specifically addressed by SZTAKI) has not yet been designed in detail. The strategic IT plan of the Hungarian pillar, ELI-ALPS, is currently under development. At the same time, ELI-ALPS is looking forward to learning from more advanced non-laser RIs that have experience and expertise in setting up and operating data management systems for user access. The first preliminary IT model, together with some requirements, was defined in the 2nd half of 2012 and submitted to the CRISP IT&DM workshop held in January 2013.
3 Brief overview of the planned IT infrastructure
The research IT infrastructure consists of four infrastructure areas: laser core, laser lab, laser research and information archive.
–Hardware components are to be bought from vendors.
–Part of the software components is to be built.
The main applications of ELI-ALPS are the Experiment and research management suite and the Research and analysis software suite for internal and external researchers. The software has to be developed, tailored and integrated in order to plan, manage and conduct all experiment requests, research schedules and resources of the facility. The tasks will include developing new software, tailoring existing software and integrating all the tools so that they are accessible from one interface for the different research groups of ELI and its collaborators.
Basic IT infrastructure (network, virtualization, data center): like the general IT infrastructure, it is to be bought.
5 Contribution to T3 and T4 tasks
Timeframe and main focus
SZTAKI’s development plans for the 2nd half of the project take into account the two-phase implementation of the ELI-ALPS IT architecture, with special attention to the Research and analysis tools described above, which belong to the Research and the External Research infrastructures:
–Phase I: until end of 2015: 57% of total IT budget
–Phase II: until end of 2017: 43% of total IT budget
In a broader sense, SZTAKI’s activities will address the feasible management of computing power and storage space that can be reallocated between containments as needed by the Experiment, research and resource management systems (using e.g. virtualization and other DCI techniques). The results will take the form of studies, roadmaps and recommendation documents. SZTAKI will contribute in particular to the T3 - Data Management and T4 - Evolution of distributed e-Infrastructures tasks in WP19 IT & DM: Distributed data infrastructure. The following steps will be carried out according to the current plans.
6 Step 1 – Further requirement analysis
Some estimates of the requirements are already available and were reported to the CRISP IT&DM workshop held in January 2013:
–The research containment (technically a data center) hosts the information center of ALPS. A large, “big data” capable data warehouse stores the structured experimental data and provides dedicated data marts for the different research activities.
–The target area (TA) laboratory containment hosts the experiment devices (subject material, sensor arrays, cameras, and monitoring system). The lab IT devices capture and store the experiment data. Since the sensors produce an enormous amount of raw data, a real-time filtering and compression computing module is planned to reduce the experiment data to a manageable size. Capturing the filtered and compressed raw data requires a high-speed storage device capable of parallel writes.
However, further analysis of the ELI-ALPS requirements is needed, with special focus on the areas to which SZTAKI will commit effort:
–research and analysis tools,
–external research.
The prerequisite of this step is the ELI-ALPS strategic IT plan, which ELI-ALPS is to complete in H1/2013.
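The filter-then-compress stage mentioned for the TA laboratory can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the threshold filter, the 16-bit sample format, zlib as the compressor, and all names (`filter_and_compress`, `decompress`, `THRESHOLD`) are hypothetical and are not taken from the ELI-ALPS plans.

```python
import struct
import zlib

THRESHOLD = 100  # hypothetical noise floor, in raw sensor counts


def filter_and_compress(samples, threshold=THRESHOLD):
    """Keep (index, value) pairs at or above the threshold, then compress."""
    kept = [(i, v) for i, v in enumerate(samples) if v >= threshold]
    # Pack each pair as an unsigned 32-bit index plus an unsigned 16-bit value.
    payload = b"".join(struct.pack("<IH", i, v) for i, v in kept)
    return zlib.compress(payload)


def decompress(blob):
    """Inverse of filter_and_compress: recover the kept (index, value) pairs."""
    payload = zlib.decompress(blob)
    return list(struct.iter_unpack("<IH", payload))


if __name__ == "__main__":
    raw = [3, 7, 250, 4, 1200, 2, 99, 100]  # made-up sample values
    blob = filter_and_compress(raw)
    print(decompress(blob))
```

Keeping the sample index next to each retained value means the sparse signal can still be placed on the original time axis after filtering. A real module would more likely work on binary detector frames and use a streaming compressor.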
7 Step 2 – Research and (data) analysis tools: Science Gateways
According to the preliminary plans, some planned tasks of ELI-ALPS will include “new software development, tailoring current software and integrating all the tools to be accessible from one interface for the different research groups of ELI.” From the point of view of many users (researchers), the research and analysis tools play a crucial role as the single access point to the laser facility. The related software development recommendations and roadmaps will be addressed in CRISP as follows.
Science Gateway Primer
SZTAKI will investigate feasible ways of applying Science Gateways to process data from the ELI-ALPS data warehouse and from external sources, e.g. other ELI partners (related to external research). This work will rely on the final version of the Science Gateway Primer document, which is to be ready at the EGI Community Forum (May 2013).
8 Science Gateway Primer
In May 2012 the Science Gateway Primer Virtual Team was launched within the EGI-InSPIRE project, with the main aim of directing more attention to, and helping, the science gateway developer community.
Science gateway: a selection of community-specific tools, applications, or data collections that are integrated via a web portal, a desktop application, or a mobile application, providing access to a large number of resources and services, e.g. from the European Grid Infrastructure.
The main motivation of the Primer document is to help developers, such as the ELI IT team, more easily identify the most suitable set of technologies, and collect and apply best practices and solutions for building a science-domain-specific gateway.
9 Joint development plans: SCI-BUS
The SCI-BUS (SCIentific gateway Based User Support) project creates a generic-purpose gateway technology that provides seamless access to major European DCIs, including clusters, supercomputers, grids, desktop grids, and academic and commercial clouds. SCI-BUS elaborates an application-specific gateway building technology and a customisation methodology with which user communities can easily develop their customised gateways.
SCI-BUS associated partnership is to be offered to ELI partners with all of its benefits. The short list of benefits (commitments) from the SCI-BUS project: training, dissemination, joint events, and support for designing/developing the science gateway of the associated partner.
Additional person-months and effort will be committed from CRISP by SZTAKI to the discussion and creation of the related long-term roadmaps and recommendations.
10 The SCI-BUS Infrastructure
11 gateways created in the 1st project year
11 Joint development plans: ER-FLOW
The ER-flow project builds a European research community to promote workflow sharing and to investigate the interoperability of scientific data in workflow sharing. It targets major research communities, such as ELI, that use (or intend to investigate and use) workflows to run their experiments on a regular basis.
Currently the project includes four major research communities: Astrophysics, Computational Chemistry, Heliophysics and Life Sciences. The project intends to collaborate strongly with ELI and CRISP (among others) in order to identify and involve further major research communities that either already use workflows or are prospective workflow users.
SZTAKI plans to follow the ER-flow supported method with ELI; i.e. the ELI research community will select workflows from its particular research areas to serve as pilot workflows that demonstrate how to develop, use and share workflows. The partners will port these pilot workflows to the simulation platform, etc.
CRISP effort will be committed by SZTAKI to help partners from ER-flow, CRISP, and ELI-ALPS (and later ELI) compile a study that outlines the above-mentioned requirements, protocols and standards, and makes recommendations on how to achieve interoperability of scientific data in the workflow domain.
12 Step 3 – Expanding/integration of ELI data processing layer with DCIs
EGI-related plans
o The exploitation of EGI results is to be studied in a new Virtual Team (VT) that would bring together the main stakeholders from CRISP, ELI, SZTAKI and the National Grid Initiatives (NGIs) contacted through EGI.
o In EGI-InSPIRE, a Virtual Team project is a relatively short project (up to 6 months) with multiple NGIs involved; its setup is carried out through the NGI International Liaisons and EGI.eu. More information on Virtual Teams is available at the EGI WIKI.
o The following VT projects support Research Infrastructures similar to ELI from different aspects: NGI - ELIXIR ESFRI collaboration; Technology study for CTA ESFRI; Towards a Chemistry, Molecular & Materials Science and Technology Virtual Research Community; Science Gateway Primer.
o The VT would also study Cloud-based solutions, building on the expertise collected by the EGI Cloud Federation Task force and SZTAKI Cloud.
o Working together with EGI-InSPIRE ensures that the approaches and solutions proposed to ELI (in the form of a Roadmap) will be in line with the long-term vision of key stakeholders of the European e-Infrastructure.
o The VT-related efforts of SZTAKI will be carried out as a CRISP activity.
13 Step 3 – Expanding/Integration of ELI data processing layer with DCIs
Desktop Grid related plans
–Another option for supporting the targeted external research with a distributed computing infrastructure is a low-cost and green desktop grid based DCI, such as a city-wide, a global volunteer or a campus-wide desktop grid.
–The main objectives of the ongoing EU FP7 ‘International Desktop Grid Federation – Support Project’ (IDGF-SP) can be summarised as follows:
  o Involve and engage, in the long term, significantly more citizens and new communities (such as ELI) in volunteer and private (campus-wide or enterprise) DCIs by supporting the rapid creation, efficient operation, and dynamic expansion of this type of DCI for e-Science.
  o Coordinate and synchronise the dissemination and support activities of major European stakeholders of volunteer and Desktop Grids, with focus on the International Desktop Grid Federation.
–ELI is considered a new scientific community to be co-supported by IDGF; IDGF membership is to be offered to ELI with its benefits. The plan:
  o Join IDGF as an organizational member
  o Create recommendation and Roadmap for ELI partners based on the IDGF Roadmap document (deployment, application porting, media campaign, launch similarly to
–Working together with IDGF ensures that the approaches and solutions proposed to ELI (in the form of a Roadmap) will be in line with the long-term vision of key stakeholders of the volunteer computing communities.
–The Roadmap/recommendation related efforts will be carried out in CRISP.
14 IDGF-SP
International Desktop Grid Federation - Support Project: fostering interoperability, dissemination, and sustainability of DCIs
-Bridged to other DCIs
-Marketing: ‘ambassadors’ and citizen-scientists
-Sustainability by self-maintained resource pool from volunteers (not FP7/H2020 funds), studies on green aspects & cost-efficiency
-Roadmap available
Hundreds of thousands of volunteers
Dozens of applications
Bridged more than 1 million CPU hours/month (HPC part)
15 Summary
Collaboration/development plans and opportunities:
–Science Gateways
–Grid / Cloud technologies
–Volunteer computing
We are open to supporting other experiments/facilities as well (NA2: joint events / direct support, MoUs, associated membership, organisational membership).