Virtual Research Environments as-a-Service


1 Virtual Research Environments as-a-Service
Pasquale Pagano, CNR
EGI Community Forum, 10-13 November 2015, Bari, Italy

2 Outline
Context: E-Infrastructure, History, D4Science
… as a Service: Capabilities, Virtual Research Environment
gCube: Features, Numbers

3 e-Infrastructure
An operational combination of digital technologies (hardware and software), resources (data and services), communications (protocols, access rights and networks), and the people and organizational structures needed to support research efforts and collaboration in the large.

4 Genealogy
DILIGENT: Testbed; Virtual Research Environment
D4Science: Operational; several use cases (fisheries); gCube became an open source project
D4Science-II: Operational Ecosystem; marine biodiversity use cases; D4Science born to go beyond project lifetime
iMarine: Operational HDI; exploit D4Science; iMarine CoP; >1500 active users

5 D4Science operates VREs for …
More than 2,000 scientists in 44 countries, integrating more than 50 heterogeneous data providers, executing more than 20,000 processes per month, and providing access to over a billion quality records in repositories worldwide, with 99.7% service availability. D4Science hosts more than 40 VREs.

6 Born to serve user needs
Capacities, Applications:
I need to host my applications in a secure and scalable environment
I need to maintain my database
I need to back up my data
I need to securely deliver my data to a set of known people
I want to offer a flexible sharing, storage, reporting, search and retrieval tool
I need to manage and analyze data
I need to manage the full data life-cycle from import to validation, curation, harmonization and publication
I need to offer my team a powerful tool to manage code-lists
I need to reduce the costs of data maintenance of my dept.
Data:
I need to access authoritative data
I need to simplify the access to my data
I need to mash up statistical and geospatial data
I need to analyse my big datasets
I need to validate my datasets and provide a standard access to them

7 Distinguishing capabilities of the e-infrastructure
D4Science

8 The D4Science infrastructure
Hybrid Data Infrastructure combining over 500 software components into a coherent and centrally managed system of hardware, software, and data resources

9 D4Science enables e-infrastructure by ...
Integrating geographically distributed computing infrastructure
Overcoming administrative boundaries
Exploiting private and commercial providers
Providing service allocations, deployment, monitoring, and operation
Ensuring uniform resource and data access
Operation: built on SLAs; support for monitoring, auditing, reporting, and notification
Trust: privacy, governance, and attribution; security, trusted network

10 Storage as Service
to host and maintain data
Database, Cloud Storage, Geographical DB
High-availability, Standard, Ready-to-use, Scalable, Reliable, Secure
Policies: Standard, Privacy and Attribution

11 Applications as a Service
to curate and manage data
Metadata Generation: Geospatial Data, Biodiversity Data, Statistical Data, Textual Data
Harmonization: Disambiguate, Validate, Integrate and Consistency Check
Data Exchange: OGC protocols, DarwinCore, SDMX, DublinCore

12 Computing as Service
to process and extract knowledge
Scalable, Easy to Manage, Across Boundaries
Tailored, Elastic Assignment of Computing, Assignment of Processors, Virtual Research Environment
Heterogeneous, High Throughput, Map-Reduce, Parallel R

13 Computational Engine
Not another cloud computing platform, but a platform where executions can be repeated, compared, discussed, and logged
Not another computational engine, but a platform where interdisciplinary tools and services can be easily contributed by the communities

14 Two exploitation models
Dispatcher:
  Tools (R, Java, …) must be uploaded to the storage
  Executable is deployed on the worker nodes assigned to the VRE
  Data are made accessible to the worker nodes according to the specification provided
  Monitoring, accounting, failure management, partial re-execution, sharing, and repeatability are granted
Application Framework:
  Predefined data splitting models are provided
  A large array of models and algorithms can be exploited to define custom workflows
  A large array of algorithms to compare results is provided
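The ideas of data splitting, failure management, and partial re-execution named on this slide can be pictured with a small sketch. The Python below is purely illustrative and is not gCube code: `split_fixed`, `Dispatcher`, and `flaky_sum` are hypothetical names, fixed-size chunking is only one possible splitting model, and chunks run sequentially rather than on worker nodes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional, Sequence


def split_fixed(records: Sequence[int], chunk_size: int) -> List[List[int]]:
    """Hypothetical splitting model: fixed-size, contiguous chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(records[i:i + chunk_size])
            for i in range(0, len(records), chunk_size)]


@dataclass
class Dispatcher:
    """Illustrative stand-in for a dispatcher (assumption, not a gCube class)."""
    task: Callable[[List[int]], int]
    results: Dict[int, int] = field(default_factory=dict)
    failed: Dict[int, str] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def run(self, chunks: List[List[int]],
            only: Optional[Iterable[int]] = None) -> None:
        """Run the task on all chunks, or only on the given chunk indices."""
        indices = range(len(chunks)) if only is None else list(only)
        for i in indices:
            try:
                self.results[i] = self.task(chunks[i])
                self.failed.pop(i, None)          # clear an earlier failure
                self.log.append(f"chunk {i}: ok")
            except Exception as exc:              # record, do not abort the run
                self.failed[i] = str(exc)
                self.log.append(f"chunk {i}: failed ({exc})")

    def rerun_failed(self, chunks: List[List[int]]) -> None:
        """Partial re-execution: repeat only the chunks that failed."""
        self.run(chunks, only=sorted(self.failed))


_fail_once = {4}  # chunks starting with this value fail on their first attempt


def flaky_sum(chunk: List[int]) -> int:
    """Toy task that simulates one transient worker failure."""
    if chunk[0] in _fail_once:
        _fail_once.discard(chunk[0])
        raise RuntimeError("simulated worker failure")
    return sum(chunk)


chunks = split_fixed(list(range(10)), 4)
dispatcher = Dispatcher(task=flaky_sum)
dispatcher.run(chunks)            # one chunk fails and is logged
dispatcher.rerun_failed(chunks)   # only that chunk is executed again
total = sum(dispatcher.results.values())
print(total, dispatcher.log)
```

Keeping results and failures keyed by chunk index is what makes the re-execution partial: successful chunks are never recomputed, and the log preserves the full execution history for later comparison.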

15 Virtual Research Environment
to access, share and collaborate
Share: Database Tables, Workflow, Files
Communicate: Post, Favourite, Connection
Organize: Dynamic, Secure, Policy Driven

16 Virtual Research Environment
A distributed and dynamically created environment where a subset of resources (data, services, computational, and storage resources), regulated by tailored policies (e.g. data encryption with a VRE-specific key, quotas on service calls and storage usage, …), is assigned to a subset of users via interfaces for a limited timeframe, at little or no cost for the providers of the participatory data e-infrastructures.
L. Candela, D. Castelli, P. Pagano (2013). Virtual Research Environments: An Overview and a Research Agenda. Data Science Journal, Vol. 12.
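One way to read this definition is as a data structure: a named set of resources, a set of users, an expiry time, and policies that gate each access. The sketch below is an assumption made for illustration only; `VirtualResearchEnvironment`, `authorize`, and the single per-user call quota are hypothetical and do not reflect how D4Science implements VREs.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Set


@dataclass
class VirtualResearchEnvironment:
    """Hypothetical model of the definition: resources + users + policies + time."""
    name: str
    resources: Set[str]            # subset of infrastructure resources
    users: Set[str]                # subset of users
    expires_at: datetime           # the "limited timeframe"
    call_quota: int                # example policy: max service calls per user
    calls_used: Dict[str, int] = field(default_factory=dict)

    def authorize(self, user: str, resource: str, now: datetime) -> bool:
        """Grant access only to members, on assigned resources, within limits."""
        if now >= self.expires_at:
            return False
        if user not in self.users or resource not in self.resources:
            return False
        used = self.calls_used.get(user, 0)
        if used >= self.call_quota:
            return False
        self.calls_used[user] = used + 1   # count the granted call
        return True


vre = VirtualResearchEnvironment(
    name="demo-vre",
    resources={"species-db", "r-engine"},
    users={"alice"},
    expires_at=datetime(2016, 1, 1),
    call_quota=2,
)
during = datetime(2015, 11, 10)
after = datetime(2016, 2, 1)
decisions = [
    vre.authorize("alice", "species-db", during),  # member, within quota
    vre.authorize("bob", "species-db", during),    # not a member of the VRE
    vre.authorize("alice", "r-engine", during),    # member, within quota
    vre.authorize("alice", "r-engine", during),    # quota exhausted
]
expired = vre.authorize("alice", "species-db", after)  # VRE no longer active
print(decisions, expired)
```

Denied requests do not consume quota in this sketch; a real policy engine could reasonably choose otherwise.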

17 VRE Definition
Simple and effective process to define a new environment
Metadata, Applications, Data, Configuration

18 Applications vs Services
Diagram labels: Logical View; Applications; Data; Registry; Hardware Configuration; Physical View; Software, Tools, Services; Data

19 Application Bundles
AppsCube: to develop applications interfacing gCube facilities
BiolCube: to aid modelling and analysing of distribution data, comparing checklists, and producing maps
ConnectCube: to facilitate data publication with appropriate tools, including semantic technologies
GeosCube: to properly access, consume and produce geospatial information
StatsCube: to assist tabular data validation, data enrichment and efficient analytical tools
IceCube: to support deployment, operation and management of a gCube-based infrastructure

20 VRE Exploitation
Exploited for Public VREs (used to offer an application environment to a subset of users of a community) and Private VREs (used for experiments, data access and preparation, and data analytics)
Fully operational VRE available in one hour
Software deployment and hardware setup completely hidden
Evolving needs of its users completely supported

21 Entity as Resource
Entity: Server, Storage, Container, Software, Data
As a resource: Publication/Discovery, Lifecycle management, Failure management, Authorization-accounting
As a service: Access, Orchestrate, Reference
Software as Resource: transforms servlet-based applications/services into e-Infrastructure resources
Container as Resource: transforms standard servlet-based containers into e-Infrastructure resources
Federated Sources as Resource: transforms external DBs and repositories into e-Infrastructure resources
Algorithm as Resource: for any new algorithm, model, procedure, workflow, …, it is possible to manage policies and assign dedicated hardware and storage resources
Dataset and single product as Resource: for any dataset, map, time series, code list, …, it is possible to manage policies and monitor their exploitation
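Treating every entity as a resource implies a common registry in which entities of any kind are published, discovered, given policies, and monitored. The following is a minimal sketch under that assumption; `Resource`, `Registry`, and the identifiers and policy keys used are hypothetical and are not taken from the gCube information system.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Resource:
    """Hypothetical uniform description shared by every kind of entity."""
    resource_id: str
    kind: str                                   # e.g. "algorithm", "dataset"
    policies: Dict[str, str] = field(default_factory=dict)
    access_count: int = 0                       # basis for monitoring/accounting


class Registry:
    """Illustrative publication/discovery registry (not a gCube API)."""

    def __init__(self) -> None:
        self._resources: Dict[str, Resource] = {}

    def publish(self, resource: Resource) -> None:
        self._resources[resource.resource_id] = resource

    def discover(self, kind: str) -> List[Resource]:
        """Find resources by kind, without hardcoded knowledge of their ids."""
        return [r for r in self._resources.values() if r.kind == kind]

    def set_policy(self, resource_id: str, key: str, value: str) -> None:
        self._resources[resource_id].policies[key] = value

    def access(self, resource_id: str) -> Resource:
        """Return the resource and record the access for later reporting."""
        resource = self._resources[resource_id]
        resource.access_count += 1
        return resource


registry = Registry()
registry.publish(Resource("kmeans", "algorithm"))
registry.publish(Resource("catch-2014", "dataset"))
registry.publish(Resource("sst-map", "dataset"))

registry.set_policy("kmeans", "dedicated_nodes", "4")   # policy on an algorithm
registry.access("catch-2014")
registry.access("catch-2014")                           # exploitation is counted

dataset_ids = [r.resource_id for r in registry.discover("dataset")]
print(dataset_ids, registry.access("kmeans").policies)
```

Because algorithms and datasets share one description type, the same publish, discover, policy, and accounting paths serve both.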

22 SmartGears
“a set of Java libraries that turn Servlet-compliant containers and applications into infrastructure resources, transparently.” (gCube Wiki)
Turn software and containers into resources: what does it mean?

23 SmartGears [cont.]
Software-as-Resource, Container-as-Resource
Actual solution: zero constraints
Software and nodes we can:
  discover: use without hardcoded knowledge
  monitor and control: take actions when not operational
  dedicate to user groups: change policies, assign roles
Human solutions: not practical, often impossible
Automated solutions: local enabling software, remotely controlled
Management tasks: compile and publish descriptions; track and change status; enforce policies
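The three management tasks listed on this slide (compile and publish descriptions, track and change status, enforce policies) can be sketched as a small local agent. This is not the SmartGears API, which is a set of Java libraries; the Python below is a hypothetical illustration, and `Status`, `ALLOWED`, `LocalAgent`, and the chosen state transitions are all assumptions.

```python
from enum import Enum
from typing import Dict, Iterable, List, Set


class Status(Enum):
    STARTED = "started"
    ACTIVE = "active"
    FAILED = "failed"
    STOPPED = "stopped"


# Assumed lifecycle: which status changes are legal from each status.
ALLOWED: Dict[Status, Set[Status]] = {
    Status.STARTED: {Status.ACTIVE, Status.FAILED, Status.STOPPED},
    Status.ACTIVE: {Status.FAILED, Status.STOPPED},
    Status.FAILED: {Status.ACTIVE, Status.STOPPED},
    Status.STOPPED: set(),
}


class LocalAgent:
    """Hypothetical 'local enabling software' wrapped around one service."""

    def __init__(self, name: str, allowed_groups: Iterable[str]) -> None:
        self.name = name
        self.allowed_groups = set(allowed_groups)
        self.status = Status.STARTED
        self.history: List[Status] = [Status.STARTED]

    def describe(self) -> dict:
        """Compile the description that would be published for discovery."""
        return {
            "name": self.name,
            "status": self.status.value,
            "groups": sorted(self.allowed_groups),
        }

    def transition(self, new_status: Status) -> None:
        """Track and change status, rejecting illegal lifecycle moves."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"{self.status.value} -> {new_status.value} not allowed")
        self.status = new_status
        self.history.append(new_status)

    def handle(self, group: str) -> bool:
        """Enforce policy: serve only when active and for dedicated groups."""
        return self.status is Status.ACTIVE and group in self.allowed_groups


agent = LocalAgent("demo-service", allowed_groups={"biodiversity"})
agent.transition(Status.ACTIVE)
served = agent.handle("biodiversity")        # active and group is allowed
denied = agent.handle("fisheries")           # group not dedicated to this node
agent.transition(Status.FAILED)
while_failed = agent.handle("biodiversity")  # not operational, so refused
agent.transition(Status.ACTIVE)              # recovery action
profile = agent.describe()
print(served, denied, while_failed, profile)
```

In a remotely controlled setup, the `transition` calls and group changes would arrive from the infrastructure rather than from local code, which is what removes the need for manual intervention.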

24 gCube: One stable open-source platform
gCube enables the D4Science HDI
Statistics from openhub.net/p/gCube

25 Multi-tenant Delivery Model
Infrastructure as a Service: Dynamic deployment, Hosting, Resource Lifecycle, Monitoring, Accounting, Security
Platform as a Service: FeatherWeightStack, SmartGears, ApplicationSupportLayer, SOA3
Software as a Service: VRE, BiolCube, ConnectCube, GeosCube, StatsCube

26 References / Links
D4Science: http://www.d4science.org; Policies; Procedures
gCube: Catalogue of Applications; Software Key Features; Developer Guide; FeatherWeightStack; SmartGears; gCube APIs; Administration Guide

27 Thank you for your attention
Questions?

