
Some ideas on possible INDIGO participation in the EINFRA call


1 Some ideas on possible INDIGO participation in the EINFRA-21-2017 call
DS et al. 16/12/2016 RIA

2 Ideas for the EINFRA-21 Call
EINFRA-12 has 2 areas: “Secure and agile data and distributed computing e-infrastructures” and “Access and preservation platforms for scientific information”. I will focus here only on the first one. It funds operation and integration of services at TRL8+, with no JRA-type development. The focus is on a production-ready EOSC (building on the INFRADEV-4 call, i.e. EOSCPilot). The EC expects one or at most two proposals, and the single-proposal option seems clearly preferred. The proposal(s) should integrate the results of INDIGO-DataCloud, EGI-Engage and EUDAT2020. Budget per proposal: 30M€ (one proposal) or 2x15M€ (two proposals). 30/11/2016 Ideas for the EINFRA-21 Call

3 Ideas for the EINFRA-21 Call
EINFRA-21 has 2 areas: “Support to Public Procurement of innovative HPC systems” and “Research and Innovation Actions for e-Infrastructure prototypes”. The latter has two points, with a total budget of 20M€ for both: “Universal discoverability of data objects and provenance” and “Computing e-infrastructure with extreme large datasets”. I will focus here only on the second one. It funds development of service prototypes at TRL6+. The aim is to bring them to TRL8 and include them in the EOSC in […]. According to the call, service prototypes “should follow common interfaces to access and analyse underlying data collected/stored in different platforms, formats, locations and e-infrastructures […]” and be “tested against requirements of very large or highly heterogeneous research data sets”. Budget per proposal: 2.5-3M€ (per the call text). The EC expects to receive several proposals with innovative services that show complementarity with the production-level services of an EINFRA-12 proposal. It is therefore not advisable to exceed the suggested budget.

4 INDIGO Developments
Given the positive results and the visibility of INDIGO-DataCloud, we'd like to retain the INDIGO brand (in some form) in EINFRA-21. We would propose [TRL6+] services that may be seen as an evolution of the current INDIGO offerings, although topics coming from elsewhere are not precluded. This is not trivial, for several reasons. One is “brand ownership”. Another is budget constraints. Both could be mitigated if all of us INDIGO partners play well in EINFRA-12 and EINFRA-21. A 3M€ proposal should probably have no more than 6-7 partners (INDIGO has 26), depending of course on their involvement. There are many possible technical ideas; here I'll describe some of them.

5 A possible general structure
Given the budget constraints in EINFRA-21 (3M€ per proposal), it may be worthwhile to think about a strategy built on three pillars.
(1) Most if not all INDIGO partners should participate in EINFRA-12. There we are trying, and will keep trying, to get official recognition for INDIGO as one of the three key players of a joint INDIGO-EGI-EUDAT proposal. Operations and integration efforts for TRL8+ INDIGO services, as well as concrete use cases brought by WP2 communities, should go there.
(2) and (3) We should then prepare two complementary EINFRA-21 proposals, clearly articulating their main topics. One would be centered on “data” (let's call it INDIGO2-Data here). The other would be centered on “compute” (let's call it INDIGO2-Compute here).
The participation of the current INDIGO [JRA] partners in the two EINFRA-21 proposals should be defined based on concrete contributions.

6 Summary of possible technical topics for the two proposals
INDIGO2-Data: Intelligent & Automated Dataset Distribution; Data ingestion preprocessing; Data access patterns / management; Smart caching; Filesystem and DB integration; Advanced metadata management.
INDIGO2-Compute: Hybrid Cloud Support; Enhanced networking; Interaction with bare-metal resources; Evolution of user interfaces; Big data analytics; Multiple PaaS; Monitoring as a Service.

7 INDIGO2-Data
High-level topics

8 INDIGO2-Data: Key points
General objective of the proposal: to provide a high-level distributed platform capable of dealing with very large scientific datasets. The platform will support features addressing both common and advanced use cases in an infrastructure-independent way. Scientific communities will thus be able to easily adopt and integrate a complete set of services usually implemented only by very large collaborations. In addition, future and extremely demanding data-centered collaborations will be able to integrate and federate very large and heterogeneous data sources and global infrastructures, and to effectively manage data and related analysis among those sources.
Relevance for EINFRA-21:
Added value: provide key missing services for data-centric scientific communities, i.e. a platform automating the management and analysis of extremely large datasets.
Specific impact during and after the project:
Work package structure:
WP1 -> Management
WP2 -> User requirements gathering; definition of test applications; integration and tests on the applications
WP3 -> SW management/exploitation and pilot testbed
WP4 -> JRA activities: Intelligent & Automated Dataset Distribution; Data access patterns / data management; APIs for external services/platforms
WP5 -> JRA activities: Data ingestion preprocessing; Smart caching; Filesystem + DB integration; Flexible Metadata management

9 Intelligent & Automated Dataset Distribution
Evolving from the INDIGO data solutions, we propose a solution based on FTS + the INDIGO Orchestrator Service + Dynafed, aimed at automating dataset distribution. Right now, we take a list of datasets and manually ask FTS to transfer them somewhere. We could automate this so that transfers are done via FTS over a Dynafed endpoint, using a TOSCA template for automation/task description. “Simple” use cases:
- keep a certain dataset distribution consistent with some policies, e.g. there must always be 2 replicas, even when some sites are unavailable for some time;
- make sure that there is always a mirror for given (real-time or quasi-real-time) data-producing centers.
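The “always 2 replicas” policy can be made concrete with a small reconciliation step. The minimal Python sketch below is an illustration under stated assumptions, not the INDIGO design. It takes the sites that currently hold each dataset and the sites that are reachable, and computes the copies needed to restore the policy. Site and dataset names are hypothetical. The step that would hand the plan to FTS (e.g. as a TOSCA-described task run by the Orchestrator) is deliberately omitted, since those interfaces are not defined here.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    replicas: set[str] = field(default_factory=set)  # sites currently holding a copy


def plan_transfers(datasets, available_sites, min_replicas=2):
    """Return the (dataset, source, destination) copies needed so that every
    dataset has at least `min_replicas` replicas on currently available sites."""
    transfers = []
    for ds in datasets:
        live = sorted(ds.replicas & available_sites)  # reachable replicas only
        if not live:
            continue  # no reachable source: nothing can be copied right now
        missing = max(min_replicas - len(live), 0)
        # Candidate destinations: reachable sites that do not hold a copy yet.
        candidates = sorted(available_sites - ds.replicas)
        for dest in candidates[:missing]:
            transfers.append((ds.name, live[0], dest))
    return transfers


# Hypothetical example: site "B" is down, so "run-1" has one reachable replica.
datasets = [Dataset("run-1", {"A", "B"}), Dataset("run-2", {"A", "C"})]
print(plan_transfers(datasets, available_sites={"A", "C", "D"}))  # [('run-1', 'A', 'C')]
```

Run periodically, a loop like this would keep the distribution consistent as sites go down. A real service would also need to remove surplus replicas when sites come back and to track in-flight transfers; the sketch ignores both.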

10 Data ingestion preprocessing
Evolving the INDIGO data solutions, we could add the capability of managing the import of large amounts of data through some preprocessing. During the data ingestion process, or later on a schedule, we would like to give end users the capability to run predefined preprocessing tasks. We should provide this feature at the infrastructure level, in a form into which virtually any user-provided application/algorithm can be plugged. The user community is responsible for the application that will be executed. Examples: experiment-independent quality checks before storing data, data skimming, metadata management, indexing.
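Infrastructure-level pluggable preprocessing could look like the following minimal Python sketch. It assumes that a community registers named steps and that each ingestion request lists the steps to run. A step returns the (possibly modified) record, or `None` to reject it. The registry, the step names and the `energy` cut are hypothetical illustrations of the examples listed above, not an existing INDIGO interface.

```python
from typing import Callable, Dict, List, Optional

Record = dict
# A preprocessing step returns a (possibly modified) record, or None to drop it.
Step = Callable[[Record], Optional[Record]]

REGISTRY: Dict[str, Step] = {}


def register(name: str) -> Callable[[Step], Step]:
    """Decorator through which a community plugs in its own step by name."""
    def wrap(fn: Step) -> Step:
        REGISTRY[name] = fn
        return fn
    return wrap


@register("quality_check")
def quality_check(rec: Record) -> Optional[Record]:
    # Experiment-independent check: reject records without a payload.
    return rec if rec.get("payload") is not None else None


@register("skim")
def skim(rec: Record) -> Optional[Record]:
    # Hypothetical skimming cut: keep only records with energy >= 10.
    return rec if rec.get("energy", 0) >= 10 else None


@register("index")
def index(rec: Record) -> Optional[Record]:
    # Add an indexing flag without mutating the caller's record.
    return {**rec, "indexed": True}


def ingest(records: List[Record], pipeline: List[str]) -> List[Record]:
    """Run the named steps in order on each record; store only the survivors."""
    steps = [REGISTRY[name] for name in pipeline]
    stored = []
    for rec in records:
        for step in steps:
            rec = step(rec)
            if rec is None:
                break
        else:  # no step dropped the record
            stored.append(rec)
    return stored


records = [{"payload": b"x", "energy": 12}, {"payload": None, "energy": 50},
           {"payload": b"y", "energy": 3}]
print(ingest(records, ["quality_check", "skim", "index"]))
```

Keeping the step list per request means the same mechanism could run at ingestion time or later as a scheduled job.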

11 Data access patterns / data management
Today, beyond the orchestration part, we lack a management layer for data movement / data federation: something that analyses actual data usage. For example, depending on actual data usage / access patterns, we may want to automatically move data to long-term storage (“glacier-like”), exploiting the current INDIGO Storage QoS solutions. This would happen at the infrastructure level. It could certainly be intra-site but, more ambitiously, it could also be an inter-site service. This feature set might also explore prediction mechanisms that exploit data popularity.
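A usage-driven tiering rule could look like the minimal Python sketch below. It assumes per-file access statistics are available and only two storage classes exist. The labels `disk`/`tape` and the thresholds are hypothetical placeholders. Actually changing a file's class would go through the INDIGO Storage QoS solutions, whose interface is not shown here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FileStats:
    path: str
    qos: str           # current storage class: "disk" or "tape" (hypothetical labels)
    idle_days: int     # days since last access
    accesses_30d: int  # number of reads in the last 30 days


def plan_qos_changes(files: List[FileStats], idle_limit: int = 90,
                     cold_reads: int = 2, hot_reads: int = 10) -> List[Tuple[str, str]]:
    """Return (path, target_qos) transitions suggested by the access pattern."""
    changes = []
    for f in files:
        if f.qos == "disk" and f.idle_days > idle_limit and f.accesses_30d < cold_reads:
            changes.append((f.path, "tape"))  # cold data: move to long-term storage
        elif f.qos == "tape" and f.accesses_30d >= hot_reads:
            changes.append((f.path, "disk"))  # popular again: bring it back online
    return changes


files = [FileStats("/a", "disk", 200, 0), FileStats("/b", "disk", 5, 40),
         FileStats("/c", "tape", 1, 15), FileStats("/d", "tape", 300, 0)]
print(plan_qos_changes(files))  # [('/a', 'tape'), ('/c', 'disk')]
```

Static thresholds are the simplest baseline. The prediction mechanisms mentioned above would replace them with an estimate of future popularity.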

12 Smart caching
Implement / support smart caching mechanisms based on flexible algorithms, together with the dynamic extension of computing centers to “remote sites”. Is this part of “data access patterns / data management” and “hybrid clouds”? Connection to the HNSciCloud project. Explore collaborations with NRENs, e.g. following a CDN-like paradigm.
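As a baseline for “flexible algorithms”, here is a minimal Python sketch of a size-bounded least-recently-used (LRU) cache for remote files. LRU is just one interchangeable policy chosen for illustration, not a decision from the slides. File names and sizes are hypothetical, and actual data transfer is not modelled.

```python
from collections import OrderedDict


class ByteLRUCache:
    """Cache of remote files bounded by total size in bytes, evicting the
    least recently used entries first."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.items: "OrderedDict[str, int]" = OrderedDict()  # name -> size, oldest first
        self.hits = 0
        self.misses = 0

    def access(self, name: str, size: int) -> bool:
        """Record a read; return True on a cache hit."""
        if name in self.items:
            self.items.move_to_end(name)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if size > self.capacity:
            return False  # larger than the whole cache: read from origin only
        while self.used + size > self.capacity:
            _, evicted_size = self.items.popitem(last=False)  # drop the LRU entry
            self.used -= evicted_size
        self.items[name] = size
        self.used += size
        return False


cache = ByteLRUCache(100)
for name in ["a", "b", "a", "c"]:
    cache.access(name, 40)
print(list(cache.items), cache.hits, cache.misses)  # ['a', 'c'] 1 3
```

Swapping the eviction rule (e.g. for a popularity-weighted one) only touches the `while` loop, which is where access-pattern information could be fed in.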

13 Filesystem + DB integration
This use case must be validated, but we currently lack ways to integrate storage solutions that cover not only POSIX filesystems but also DB access, for example presenting a MySQL DB as if it were a filesystem. This is not to be confused with presenting object storage chunks (e.g. S3) as if they were a POSIX filesystem, which INDIGO is already working on (e.g. with Onedata). Integration with existing and evolving persistent identifier technologies might also be explored. This topic fits the call, since the call text explicitly mentions “heterogeneous datasets” and the need to “access and analyse underlying data collected/stored in different platforms, formats”. However, we should clearly establish whether there are communities that require this integration.
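To illustrate what “seeing a DB as if it were a filesystem” might mean, here is a minimal read-only Python sketch. It uses the standard-library `sqlite3` module as a stand-in for MySQL so that it stays self-contained. The path layout is a hypothetical convention: `/` lists tables, `/<table>` lists row ids and `/<table>/<rowid>` returns one row as JSON. A real service would expose this through a filesystem interface (e.g. FUSE) rather than Python methods.

```python
import json
import sqlite3


class DBView:
    """Read-only, filesystem-like view of a relational database (sketch).

    "/" lists tables, "/<table>" lists row ids, "/<table>/<rowid>" reads one
    row serialised as JSON.
    """

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def _tables(self):
        rows = self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
        return [r[0] for r in rows]

    def _split(self, path):
        parts = [p for p in path.split("/") if p]
        # Table names are only interpolated into SQL after this whitelist check.
        if parts and parts[0] not in self._tables():
            raise FileNotFoundError(path)
        return parts

    def listdir(self, path):
        parts = self._split(path)
        if not parts:
            return self._tables()
        if len(parts) == 1:
            rows = self.conn.execute(f'SELECT rowid FROM "{parts[0]}" ORDER BY rowid')
            return [str(r[0]) for r in rows]
        raise NotADirectoryError(path)

    def read(self, path):
        parts = self._split(path)
        if len(parts) != 2 or not parts[1].isdigit():
            raise FileNotFoundError(path)
        cur = self.conn.execute(
            f'SELECT * FROM "{parts[0]}" WHERE rowid = ?', (int(parts[1]),))
        row = cur.fetchone()
        if row is None:
            raise FileNotFoundError(path)
        columns = [d[0] for d in cur.description]
        return json.dumps(dict(zip(columns, row)))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (id INTEGER, detector TEXT)")
conn.executemany("INSERT INTO runs VALUES (?, ?)", [(1, "A"), (2, "B")])
view = DBView(conn)
print(view.listdir("/"), view.listdir("/runs"), view.read("/runs/2"))
```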

14 Flexible Metadata management
To be evaluated. Provide the capability of searching for data through a metadata service that distributes queries using big data analytics tools, so that users' queries also scale to very large datasets. As a first approximation, this could lead to an easier way of finding the data to be processed/analyzed. The typical workflow will be: first search for data fulfilling the metadata query, then ask for the algorithms to be executed on those data. The metadata definition should be based on both RDF and JSON formats. We must be careful that this topic does not overlap with the other EINFRA-21 area, titled “Universal discoverability of data objects and provenance”.
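The two-step workflow (search the metadata, then run algorithms on the matching data) can be shown with a minimal Python sketch over JSON metadata. The catalog content, the exact-match query semantics and the `fetch` callback are hypothetical. Distributing the query with big data analytics tools, which is the actual point of the topic, is not modelled.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical metadata catalog, one JSON document per data object.
CATALOG_JSON = """[
  {"id": "obj-1", "meta": {"instrument": "X", "year": 2016}},
  {"id": "obj-2", "meta": {"instrument": "Y", "year": 2016}},
  {"id": "obj-3", "meta": {"instrument": "X", "year": 2015}}
]"""


def search(catalog: List[Dict[str, Any]], **criteria: Any) -> List[str]:
    """Step 1: IDs of objects whose metadata match all criteria exactly."""
    return [e["id"] for e in catalog
            if all(e.get("meta", {}).get(k) == v for k, v in criteria.items())]


def run_on(ids: List[str], algorithm: Callable[[Any], Any],
           fetch: Callable[[str], Any]) -> Dict[str, Any]:
    """Step 2: run the user's algorithm on each selected object."""
    return {i: algorithm(fetch(i)) for i in ids}


catalog = json.loads(CATALOG_JSON)
selected = search(catalog, instrument="X", year=2016)
print(selected, run_on(selected, algorithm=len, fetch=lambda i: f"data of {i}"))
```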

15 APIs for external services/platforms
All the services implemented by this project will provide APIs to programmatically exploit the provided features. This will also make it possible for other computing platforms to easily exploit the features provided by INDIGO2-Data and integrate them into more complex workflows. The APIs will be agnostic about both what data is moved/cached/etc. and where.

16 INDIGO2-Data: Tentative Consortium Partners
ID. Partner, Country: Community / Role or main focus
1. INFN (Lead), IT: Orchestration -> implement Intelligent & Automated Dataset Distribution, Data ingestion preprocessing, Data access patterns / data management
2. DESY, DE: TBD
3. CERN, CH: WLCG
4. Cyfronet, PL: Multi-zone data distribution, DB integration, Smart Caching, Intelligent & Automated Dataset Distribution
5. EBI?, UK: Elixir
6. CSIC, ES: Lifewatch
7. CNRS, FR

17 INDIGO2-Compute
High-level topics

18 INDIGO2-Compute: Key points
General objective of the proposal:
Relevance for EINFRA-21:
Added value:
Specific impact during and after the project:
Work package structure:

19 Hybrid Cloud Support
We still do not fully automate or support cloud bursting to hybrid cloud infrastructures, and this is likely to be a common use case. We need to investigate/develop high-level (i.e. not SDN-level) networking technologies to connect seamlessly to hybrid clouds. We also need fine-grained resource description and orchestration with TOSCA + the Orchestrator, for example by defining and supporting TOSCA-related blocks with different requirements. Cloud bursting probably makes little sense for HPC apps; it is likely to be more useful for HTC apps. Currently, the INDIGO Orchestrator targets a single IaaS. We could extend it to address multiple resource providers (linking e.g. to HNSciCloud).
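The basic cloud-bursting decision across multiple providers can be sketched in a few lines of Python. The sketch assumes a greedy rule: fill on-premise (private) clouds first and overflow to public providers only when private capacity runs out. The provider names, the core-count-only resource model and the rule itself are hypothetical simplifications. They are not the Orchestrator's actual logic, which would work from TOSCA requirements.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Provider:
    name: str
    free_cores: int
    public: bool  # False for on-premise / private clouds


def place(jobs_cores: List[int], providers: List[Provider]) -> Dict[int, str]:
    """Greedy placement: private clouds first, then burst to public ones.
    Returns {job_index: provider_name}; jobs that fit nowhere are left out
    (they would stay queued)."""
    order = sorted(providers, key=lambda p: p.public)  # private (False) sorts first
    free = {p.name: p.free_cores for p in providers}
    placement = {}
    for i, cores in enumerate(jobs_cores):
        for p in order:
            if free[p.name] >= cores:
                free[p.name] -= cores
                placement[i] = p.name
                break
    return placement


providers = [Provider("public-1", 100, True), Provider("onprem", 8, False)]
print(place([4, 4, 4], providers))  # {0: 'onprem', 1: 'onprem', 2: 'public-1'}
```

This suits independent HTC-style jobs, consistent with the remark above that bursting is of little use for tightly coupled HPC apps.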

20 Enhanced Networking
To enhance the capabilities of geographical network connections, we could provide:
- TOSCA-based network orchestration at the level of a single IaaS, i.e. the capability to implement complex networking set-ups via TOSCA templates: for example, choosing network addresses, the number of private/public networks and possibly the routing among them;
- TOSCA-based network orchestration at the level of multiple IaaS, i.e. building a software-based VPN that enables seamless connection across different IaaS;
- TOSCA-based network orchestration for connecting external clouds with on-premise resources, i.e. building a software-based VPN that enables seamless connection across external IaaS and local data centers.
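One concrete detail behind “choosing network addresses” for a multi-IaaS VPN is that the private subnets on each side must not overlap, or routing across the VPN becomes ambiguous. The minimal Python sketch below carves one subnet per site from a shared pool with the standard `ipaddress` module and derives the routes each site needs. The pool, prefix length and site names are hypothetical. Rendering the result into TOSCA network templates is not shown.

```python
import ipaddress
from itertools import combinations


def allocate_subnets(sites, pool="10.10.0.0/16", prefix=24):
    """Give each site its own private subnet carved from one shared pool, so
    that address ranges never clash once the sites are joined by a VPN."""
    blocks = ipaddress.ip_network(pool).subnets(new_prefix=prefix)
    plan = {}
    for site in sites:
        try:
            plan[site] = next(blocks)
        except StopIteration:
            raise ValueError("address pool exhausted") from None
    return plan


def vpn_routes(plan):
    """For each site, the remote subnets it must route through the VPN."""
    return {site: [str(net) for other, net in plan.items() if other != site]
            for site in plan}


plan = allocate_subnets(["iaas-a", "iaas-b", "on-premise"])
assert not any(a.overlaps(b) for a, b in combinations(plan.values(), 2))
print(vpn_routes(plan)["on-premise"])  # ['10.10.0.0/24', '10.10.1.0/24']
```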

21 Better interaction with bare-metal resources (esp. with HPC)
Currently, we have difficulties interacting with bare-metal resources. We could investigate/develop how Mesos or Kubernetes can be used or extended to orchestrate dynamic clusters over HPC resources, which we don't support well today. Today, the only interface we support for apps to create virtual clusters is Chronos over Mesos. We would like to support, e.g., jobs requiring 200 cores, InfiniBand and GPUs on infrastructures that do not necessarily have OpenStack installed, as is the case for practically all HPC installations. In other words, we would like to shift management of HPC resources to the PaaS level. Today, the most we can do is try to manage HPC resources at the IaaS level (if OpenStack is installed!). That is, we only support VMs, on which we then need to install MPI etc. manually.
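Managing HPC resources at the PaaS level starts with matching a job's requirements against resources described by their capabilities rather than by whether they run an IaaS layer. The minimal Python sketch below uses the slide's example of 200 cores plus InfiniBand and GPUs. The `Resource` fields and resource names are hypothetical; a real implementation would obtain them from Mesos/Kubernetes or the site's batch system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Resource:
    name: str
    cores: int
    gpus: int
    infiniband: bool
    has_openstack: bool  # recorded, but deliberately NOT a requirement


def matching_resources(resources: List[Resource], cores: int,
                       gpus: int = 0, infiniband: bool = False) -> List[str]:
    """Names of resources able to host the job, whether or not they run OpenStack."""
    return [r.name for r in resources
            if r.cores >= cores and r.gpus >= gpus
            and (r.infiniband or not infiniband)]


pool = [Resource("hpc-1", 512, 8, True, False),
        Resource("cloud-1", 256, 0, False, True),
        Resource("hpc-2", 128, 4, True, False)]
print(matching_resources(pool, cores=200, gpus=1, infiniband=True))  # ['hpc-1']
```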

22 Better interaction with dedicated hardware on bare metal resources
One specific area of the call focuses on dedicated hardware resources that could be exploited to support big data analytics. In INDIGO we already have some “primitives” that could be used to deploy a Spark-like solution. We could address this more specifically in EINFRA-21 by:
- implementing scalability at the geographical level, in order to dynamically exploit data where they are;
- instantiating the services on demand and in an automated way, upon user requests;
- scheduling data analysis tasks in a smart way on the available hardware resources.
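“Exploiting data where they are” and “scheduling analysis tasks in a smart way” can be illustrated with a minimal greedy, data-locality-aware scheduler in Python. Its assumption is simple: a task prefers a site that already holds its input dataset and falls back to any free site otherwise. Site, task and dataset names are hypothetical. A real scheduler would also weigh hardware type and transfer cost.

```python
from typing import Dict, Optional, Set


def schedule(tasks: Dict[str, str], data_location: Dict[str, Set[str]],
             free_slots: Dict[str, int]) -> Dict[str, Optional[str]]:
    """Greedy data-aware placement.

    tasks: task -> dataset it reads; data_location: dataset -> sites holding it;
    free_slots: site -> free analysis slots. A task goes to a site that already
    holds its data if one has a free slot, otherwise to any free site (the data
    would then have to be moved or streamed), otherwise it stays unscheduled (None).
    """
    slots = dict(free_slots)
    plan: Dict[str, Optional[str]] = {}
    for task, dataset in sorted(tasks.items()):
        local = sorted(s for s in data_location.get(dataset, set()) if slots.get(s, 0) > 0)
        remote = sorted(s for s, n in slots.items() if n > 0)
        site = (local or remote or [None])[0]
        if site is not None:
            slots[site] -= 1
        plan[task] = site
    return plan


tasks = {"t1": "d1", "t2": "d2", "t3": "d1"}
print(schedule(tasks, {"d1": {"A"}, "d2": {"B"}}, {"A": 1, "B": 1, "C": 1}))
```

In this example `t3` cannot run next to its data because site `A` is already busy, so it falls back to `C`. That is the kind of trade-off a smarter scheduler would optimise.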

23 User interfaces and high level APIs
Providing the capability to build simple or complex TOSCA templates easily via a web GUI (e.g. for workflow systems). Today, we only write TOSCA templates manually. Providing the capability to deploy an application on a cloud starting directly from the source code: for example, integrating a standard IDE (e.g. Eclipse) with the PaaS platform in order to deploy an application automatically on the cloud, potentially also defining scalability and monitoring requirements.
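A GUI that builds TOSCA templates essentially turns form input into a template document. The minimal Python sketch below generates a template using the normative TOSCA Simple Profile `tosca.nodes.Compute` type and its `host` capability properties (`num_cpus`, `mem_size`). The form fields are hypothetical. INDIGO's own custom node types (e.g. for images or Mesos clusters) are not reproduced here.

```python
import json


def compute_template(nodes):
    """Build a minimal TOSCA Simple Profile topology from a list of
    (name, num_cpus, mem_size) tuples, as a GUI form might collect them."""
    node_templates = {
        name: {
            "type": "tosca.nodes.Compute",
            "capabilities": {
                "host": {"properties": {"num_cpus": cpus, "mem_size": mem}},
            },
        }
        for name, cpus, mem in nodes
    }
    template = {
        "tosca_definitions_version": "tosca_simple_yaml_1_0",
        "topology_template": {"node_templates": node_templates},
    }
    # JSON is (for practical purposes) valid YAML 1.2, so a YAML parser can read this.
    return json.dumps(template, indent=2)


print(compute_template([("master", 2, "4 GB"), ("worker", 4, "8 GB")]))
```

Emitting JSON avoids a third-party YAML dependency in the sketch; a production tool would more likely emit conventional YAML.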

24 Big data analytics
Provide seamless deployment of algorithms on big data analytics facilities that are built and managed transparently by the PaaS layer. The PaaS manages the configuration/scaling/auto-provisioning of resources in a shared or private approach, depending on user requirements. Users should only take care of writing the code with an IDE (e.g. Eclipse, see also above) and of providing information about computational and data requirements. The data ingestion phase into the big data analysis tool should also be implemented in the PaaS layer whenever possible, hiding its complexity from the end user.

25 Multi-PaaS
Today, in INDIGO we only support one PaaS type, i.e. Mesos. This is currently not a strong limitation, but depending on use cases we may want to support other PaaS frameworks as well. With the INDIGO PaaS we already support: long-running services; automated IaaS; execution of containerized apps via Chronos. We need to check whether other use cases require more than that. If so, we may need to support more PaaS systems.

26 Monitoring as a Service
Implement and provide to end users the capability to instantiate a “Monitoring-as-a-Service” for their own applications/services. Users could receive alerts/notifications on events of interest to them, e.g. for applications they deploy into the cloud (via TOSCA templates, for example). This could be integrated into mobile applications. Could this cover accounting as well?
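The core of a user-defined alerting service is the evaluation of user rules against the metrics collected for their application. The minimal Python sketch below assumes simple threshold rules and one metrics sample at a time. Metric names and messages are hypothetical, and metric collection and notification delivry channels (e.g. to a mobile app) are not shown.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List

# Comparison operators a user may choose when defining an alert rule.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


@dataclass
class Rule:
    metric: str       # e.g. "cpu_load" (hypothetical metric name)
    op: str           # one of the keys of OPS
    threshold: float
    message: str      # text of the notification sent to the user


def evaluate(rules: List[Rule], sample: Dict[str, float]) -> List[str]:
    """Return the notifications triggered by one sample of the app's metrics.
    Rules whose metric is missing from the sample are skipped."""
    return [r.message for r in rules
            if r.metric in sample and OPS[r.op](sample[r.metric], r.threshold)]


rules = [Rule("cpu_load", ">", 0.9, "CPU saturated"),
         Rule("free_disk_gb", "<", 5, "Disk almost full")]
print(evaluate(rules, {"cpu_load": 0.95, "free_disk_gb": 20}))  # ['CPU saturated']
```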

27 Exploiting external platform/services
The services developed within INDIGO2-Compute will also be able to integrate external APIs, in order to exploit data management services provided by external components. This is particularly important because the INDIGO2-Compute services can then focus on the computational part and leave data management features to external components, such as: data movement; data distribution; data staging/import.

28 INDIGO2-Compute: Tentative Consortium Partners
ID. Partner, Country: Community / Role or main focus
1. CSIC (Lead), ES: TBD
2. CESNET, CZ: Network technologies
3. INFN, IT: Orchestration -> Hybrid Cloud Support, Bare Metal, Enhanced Networking, User Interfaces
4. LIP, PT
5. KIT, DE
6. PSNC, PL
7. UPV
8. EGI.eu, NL
9. Slovak A.S., SK: GPU support and developments?

29 Other topics
High-level topics

30 Smart Resource Brokering
Extend the capabilities of the brokering facility already available in the INDIGO PaaS Core by implementing a policy that takes into account the “cost” of the resources: economic cost; “data access” cost; “network” cost. This could be a very complex task. It may be best suited to an ICT call, where industry participation is essential.
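A cost-aware brokering policy has to combine criteria measured in different units. One simple option, shown in the minimal Python sketch below, is to min-max normalise each criterion across providers and rank them by a weighted sum. The weighted-sum method is an illustrative choice rather than the INDIGO PaaS Core's algorithm. The provider names, cost figures and weights are hypothetical.

```python
from typing import Dict, List


def rank_providers(costs: Dict[str, Dict[str, float]],
                   weights: Dict[str, float]) -> List[str]:
    """Rank providers by weighted, min-max normalised cost (lower is better).

    costs: provider -> {criterion: raw cost}, e.g. economic / data_access / network.
    weights: criterion -> relative importance chosen by the user or operator.
    """
    norm: Dict[str, Dict[str, float]] = {p: {} for p in costs}
    for c in weights:
        values = [costs[p][c] for p in costs]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # all equal: every provider scores 0 on c
        for p in costs:
            norm[p][c] = (costs[p][c] - lo) / span
    score = {p: sum(w * norm[p][c] for c, w in weights.items()) for p in costs}
    return sorted(costs, key=lambda p: (score[p], p))  # ties broken by name


costs = {"A": {"economic": 10, "data_access": 1, "network": 5},
         "B": {"economic": 5, "data_access": 8, "network": 5},
         "C": {"economic": 8, "data_access": 2, "network": 1}}
print(rank_providers(costs, {"economic": 1, "data_access": 1, "network": 1}))
```

Changing the weights shifts the ranking: with only the economic cost weighted, the cheapest provider `B` comes first.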

31 Next steps
High-level topics

32 Topics validation
For all topics: provide a detailed description; specify which products we are starting from (remember TRL6+); identify concrete use cases and interested scientific communities. For the “other topics”: decide whether to drop or retain them and, if retained, in which proposal.

33 Consortia and proposals
Check that the consortia are balanced and that no key INDIGO JRA partners are left out. Check whether the topics in the two proposals are balanced and, especially, verify possible overlaps or conflicts. Work out a common structure and timeframe for the preparation of the proposals. Integrate with topics in EINFRA-12.

