Realising Virtual Research Environments by Hybrid Data Infrastructures: the D4Science Experience Andrea Manzi (CERN) Leonardo Candela, Donatella Castelli,

Presentation transcript:

Realising Virtual Research Environments by Hybrid Data Infrastructures: the D4Science Experience
Andrea Manzi (CERN); Leonardo Candela, Donatella Castelli, Pasquale Pagano (ISTI-CNR)
ISGC 2014, Taipei, 25 March 2014

Outline
– The D4Science Infrastructure
– The Supporting Projects
– The Infrastructure Constituents
– The Virtual Research Environments
– VREs Examples

The D4Science Infrastructure
– Geographically distributed computing infrastructure
– Across administrative boundaries
– Across private and commercial providers
– Service allocation, deployment, monitoring, and operation
– Uniform resource and data access
– Production-level infrastructure deployed and maintained during the D4Science (2007) and D4Science II (2009) projects

Hybrid Data Infrastructure
An HDI is an IT infrastructure
– where research resources (HW, SW, data) can be shared and exploited on demand
– built on existing systems, infrastructures and repositories
– supporting an innovative application delivery model: computing, storage, data and software are made available as-a-Service
[Figure: a Hybrid Data Infrastructure layer serving Application #1, Application #2, … Application #N on top of Infrastructure/system A, B, … Z, which contribute data, servers, services and apps]

Supporting two models of provision
– For end users: a GUI-centric approach focusing on visual interfaces for accessing Data Infrastructure facilities via a web browser
– For service providers: an API-centric approach focusing on a comprehensive set of specifications and methods for accessing HDI facilities programmatically

– Operate a large-scale HDI supporting the Ecosystem Approach to Fishery and Conservation of Marine Living Resources
– Exploit the D4Science infrastructure and its interfaces with existing grid (EGI) and cloud (MS Azure via the VENUS-C platform) infrastructures and data sources (via SDMX, TAPIR, DiGIR, …)
– Manage the entire data lifecycle, where data can come from any domain: from species observations to socio-statistical data, documents and environmental monitoring data
– Support ecological niche modelling, temporal and spatial data harmonization, statistical data analysis with R, and data mining
– Serve statisticians, fishery biologists, marine ecologists, economists, lawyers and enforcement bodies (customs, coast guards), and conservationists

– Operate a large-scale HDI serving Biodiversity Science in Europe and Brazil
– Exploit the D4Science infrastructure and its interfaces with existing cloud (MS Azure and COMPSs via the VENUS-C platform) infrastructures and data sources (via SDMX, TAPIR, DiGIR, …)
– Provide open access to existing grid and cloud resources and software platforms across continents
– Combine Biodiversity Science and the Open Access Movement
– Integrate regional and global taxonomies
– Support biodiversity scientists willing to build, test and project models of species distribution

Enabling Technology
The D4Science infrastructure is powered by gCube

Software Platform
Enabling Layer
– Information System
– Resource Management
– Workflow Engine

Enabling Layer: Information System
Registration, Discovery, Notification, Monitoring, Inspection, Assignment, Accounting
A scalable and reliable framework
– supporting an extensible notion of resource (HW, data, services)
– open to modular extensions at runtime by arbitrary third parties
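
The registration, discovery and notification functions listed on this slide can be illustrated with a minimal in-memory sketch. It is not the gCube Information System API: the names `Resource`, `InformationSystem`, `register`, `discover` and `subscribe` are hypothetical, and a real deployment would be distributed and persistent rather than a single Python object.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Resource:
    """Hypothetical resource profile: an id, a kind and free-form properties."""
    resource_id: str
    kind: str  # e.g. "hardware", "data", "service"
    properties: Dict[str, str] = field(default_factory=dict)


class InformationSystem:
    """Toy registry illustrating registration, discovery and notification."""

    def __init__(self) -> None:
        self._resources: Dict[str, Resource] = {}
        self._subscribers: List[Callable[[str, Resource], None]] = []

    def subscribe(self, callback: Callable[[str, Resource], None]) -> None:
        """Notification: call callback(event, resource) on every change."""
        self._subscribers.append(callback)

    def register(self, resource: Resource) -> None:
        """Registration: publish (or update) a resource profile."""
        self._resources[resource.resource_id] = resource
        self._notify("registered", resource)

    def unregister(self, resource_id: str) -> None:
        """Remove a resource profile and tell the subscribers."""
        resource = self._resources.pop(resource_id)
        self._notify("unregistered", resource)

    def discover(self, kind: Optional[str] = None, **props: str) -> List[Resource]:
        """Discovery: return resources matching a kind and property values."""
        return [
            r for r in self._resources.values()
            if (kind is None or r.kind == kind)
            and all(r.properties.get(k) == v for k, v in props.items())
        ]

    def _notify(self, event: str, resource: Resource) -> None:
        for callback in self._subscribers:
            callback(event, resource)


# Invented example data: three resources of different kinds on two sites.
events = []
info = InformationSystem()
info.subscribe(lambda event, res: events.append((event, res.resource_id)))
info.register(Resource("node-1", "hardware", {"site": "A"}))
info.register(Resource("occurrences", "data", {"site": "A"}))
info.register(Resource("clustering", "service", {"site": "B"}))
found_at_site_a = [r.resource_id for r in info.discover(site="A")]
```

Treating hardware, data and services as the same kind of record is what makes the notion of resource extensible in this sketch: a new kind needs no new code, only a new `kind` value.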

Enabling Layer: Resource Management
A distributed framework managing a trusted resource network
– Dynamic deployment: remote deployment of resources across the infrastructure
– Resource lifetime management: managing the lifetime of resources, from creation and publication to discovery, access and consumption
– Virtual Research Environment management: cost-effective creation, operation and maintenance of Virtual Research Environments
– Interoperability, openness and integration at software level: third-party software can be added to the Data e-Infrastructure at runtime, as web applications (running in Tomcat), web services (running in service containers, e.g. JAX-WS, Axis) or executables (e.g. POJO, shell script, …)

Enabling Layer: Workflow Engine
Based on adaptors for execution on internal or external resources:
– JDLAdaptor: parses a Job Description Language (JDL) definition block and translates the described job or DAG of jobs into an Execution Plan which can be submitted to the ExecutionEngine (gCube) for execution.
– GridAdaptor: constructs an Execution Plan that can contact an EMI UI node, submit, monitor and retrieve the output of a grid job.
– CondorAdaptor: constructs an Execution Plan that can contact a Condor gateway node, submit, monitor and retrieve the output of a Condor job.
– HadoopAdaptor: constructs an Execution Plan that can contact a Hadoop UI node, submit, monitor and retrieve the output of a MapReduce job.
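
The adaptor idea, a job description going in and an ordered execution plan coming out, can be sketched as follows. This is only an illustration under stated assumptions: `Adaptor`, `DagAdaptor` and `build_plan` are hypothetical names, the job description is a plain Python dict rather than real JDL, and the plan is a list of tuples rather than the gCube Execution Plan format. The ordering uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+).

```python
from abc import ABC, abstractmethod
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Tuple

Step = Tuple[str, str, str]  # (target back-end, job name, command)


class Adaptor(ABC):
    """Hypothetical adaptor interface: job description in, execution plan out."""

    @abstractmethod
    def build_plan(self, jobs: Dict[str, dict]) -> List[Step]:
        ...


class DagAdaptor(Adaptor):
    """Orders a DAG of jobs so that every job follows its dependencies."""

    def __init__(self, target: str) -> None:
        self.target = target  # label of the back-end, e.g. "grid" or "hadoop"

    def build_plan(self, jobs: Dict[str, dict]) -> List[Step]:
        # Map each job to the set of jobs it must wait for.
        graph = {name: set(spec.get("after", [])) for name, spec in jobs.items()}
        unknown = {dep for deps in graph.values() for dep in deps} - set(graph)
        if unknown:
            raise ValueError(f"undefined dependencies: {sorted(unknown)}")
        # static_order() raises graphlib.CycleError if the jobs form a cycle.
        order = TopologicalSorter(graph).static_order()
        return [(self.target, name, jobs[name]["command"]) for name in order]


# Invented example: a diamond-shaped workflow with made-up commands.
jobs = {
    "publish": {"command": "publish.sh", "after": ["cluster", "map"]},
    "cluster": {"command": "cluster.sh", "after": ["clean"]},
    "map": {"command": "map.sh", "after": ["clean"]},
    "clean": {"command": "clean.sh", "after": ["fetch"]},
    "fetch": {"command": "fetch.sh"},
}
plan = DagAdaptor("grid").build_plan(jobs)
```

In this sketch a second back-end would be another `Adaptor` subclass producing steps for a different target, while the job description stays the same.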

Infrastructure Constituents: Technologies
The D4Science infrastructure hosts a set of components on top of different technologies to make available a large variety of services for managing, manipulating and processing data and metadata within an autonomously managed infrastructure:
– MS Azure
– EGI
– VENUS-C COMPSs PMES
– u.store
– openModeller
– MongoDB, Cassandra, Hadoop
– GeoNetwork
– ElasticSearch
– …

Infrastructure Constituents: Services and Data
The D4Science infrastructure leverages existing data sources, ranging from species data (species names, synonyms, taxonomical classifications, spatial occurrences) to literature and images:
– OBIS
– MyOcean
– Catalogue of Life
– FishBase
– speciesLink
– Biodiversity Heritage Library
– Bioline International
– Global Biodiversity Information Facility (GBIF)

Virtual Research Environment (VRE)
A Virtual Research Environment (VRE) is a distributed and dynamically created environment where a subset of data, services, computational, and storage resources, regulated by tailored policies, is assigned to a subset of users via interfaces for a limited timeframe.
L. Candela, D. Castelli, P. Pagano (2013). Virtual Research Environments: An Overview and a Research Agenda. Data Science Journal, Vol. 12.
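
Read as a data structure, the definition above says that a VRE holds a subset of the infrastructure's resources, a subset of its users, and a validity period. The sketch below illustrates only that reading. `VirtualResearchEnvironment`, `create_vre` and `can_access` are hypothetical names, and the "tailored policy" is reduced to membership plus a date check, which is a deliberate simplification.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class VirtualResearchEnvironment:
    """Hypothetical VRE record: resource subset, user subset, timeframe."""
    name: str
    resources: FrozenSet[str]
    users: FrozenSet[str]
    valid_from: date
    valid_until: date

    def can_access(self, user: str, resource: str, on: date) -> bool:
        """Simplified policy: member, assigned resource, within the timeframe."""
        return (
            user in self.users
            and resource in self.resources
            and self.valid_from <= on <= self.valid_until
        )


def create_vre(
    name: str,
    infrastructure_resources: Iterable[str],
    infrastructure_users: Iterable[str],
    resources: Iterable[str],
    users: Iterable[str],
    valid_from: date,
    valid_until: date,
) -> VirtualResearchEnvironment:
    """Create a VRE only from resources and users the infrastructure knows."""
    chosen_resources, chosen_users = frozenset(resources), frozenset(users)
    if not chosen_resources <= frozenset(infrastructure_resources):
        raise ValueError("a VRE can only use resources of the infrastructure")
    if not chosen_users <= frozenset(infrastructure_users):
        raise ValueError("a VRE can only include registered users")
    if valid_from > valid_until:
        raise ValueError("the timeframe must not end before it starts")
    return VirtualResearchEnvironment(
        name, chosen_resources, chosen_users, valid_from, valid_until
    )


# Invented example values; none of these names come from the slides.
vre = create_vre(
    name="demo-vre",
    infrastructure_resources={"occurrence-data", "clustering-service", "storage"},
    infrastructure_users={"alice", "bob", "carol"},
    resources={"occurrence-data", "clustering-service"},
    users={"alice", "bob"},
    valid_from=date(2014, 1, 1),
    valid_until=date(2014, 12, 31),
)
```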

Virtual Research Environment (VRE)
Cost-effective creation and management
– Definition
– Creation
– Configuration

The Social Extension: Workspace
– A virtual drive where you can upload and download the files needed by the services, together with their results
– Files can be organized into folders (sharing)
– Support for public URIs
– Support for WebDAV

The Social Extension: Messages and Notifications
– […]s in the Cloud
– Customizable alerts

The Social Extension: News Feed
Share news
– User-shared news
– Application-shared news

Outline
– VREs Examples

ICIS VRE - Tabular Data Analysis
– Import CodeLists
– Validate Datasets
– Analyse and Project

ScalableDataMining VRE - Features Clustering
– Presence points (FishBase + OBIS)
– Density-based clustering: DBSCAN
– Other methods are also available: K-Means, X-Means, …
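
As a reminder of what density-based clustering does with presence points, here is a compact, standard-library-only sketch of the textbook DBSCAN algorithm. It is not the VRE's implementation. The function name `dbscan`, the parameters `eps` and `min_pts` and the coordinates are assumptions chosen for illustration, and plain Euclidean distance on coordinate pairs is a simplification (real occurrence data would call for a geographic distance).

```python
from math import dist  # Python 3.8+
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels: List[Optional[int]] = [None] * len(points)  # None = unvisited

    def neighbours(i: int) -> List[int]:
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
    return [NOISE if label is None else label for label in labels]


# Invented coordinates: two dense groups and one isolated point.
presence_points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (5.0, 5.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
]
labels = dbscan(presence_points, eps=1.5, min_pts=3)
# labels == [0, 0, 0, 0, -1, 1, 1, 1, 1]
```

Unlike K-Means, the number of clusters is not given in advance: it follows from `eps` and `min_pts`, and isolated observations are reported as noise instead of being forced into a cluster.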

AquaMaps VRE - Ecological Modeling
– Access to external databases
– Extensible with predictive algorithms
– Exploits several computational back-ends
– Uses several storage technologies (RDBMS, column store, blob)
– Publishes distributions to geospatial web services

SpeciesLab VRE - Cross-Mapper
Detecting and reporting differences between species checklists
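
The simplest form of such a comparison is a set difference over normalised names. The sketch below shows only that baseline. `normalise` and `cross_map` are hypothetical names, the checklists are invented, and nothing here handles synonyms or taxonomic hierarchy, which a real cross-mapping tool would need to consider.

```python
from typing import Dict, Iterable, List


def normalise(name: str) -> str:
    """Collapse whitespace and ignore case so trivial variants match."""
    return " ".join(name.split()).casefold()


def cross_map(checklist_a: Iterable[str], checklist_b: Iterable[str]) -> Dict[str, List[str]]:
    """Report names shared by both checklists and names unique to each."""
    a = {normalise(n): n for n in checklist_a}
    b = {normalise(n): n for n in checklist_b}
    return {
        "shared": sorted(a[key] for key in a.keys() & b.keys()),
        "only_in_a": sorted(a[key] for key in a.keys() - b.keys()),
        "only_in_b": sorted(b[key] for key in b.keys() - a.keys()),
    }


# Invented checklists; spelling variants differ only in case and spacing.
report = cross_map(
    ["Thunnus thynnus", "Gadus morhua", "Xiphias gladius"],
    ["gadus  morhua", "Thunnus thynnus", "Salmo salar"],
)
# report["shared"]    == ["Gadus morhua", "Thunnus thynnus"]
# report["only_in_a"] == ["Xiphias gladius"]
# report["only_in_b"] == ["Salmo salar"]
```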

MarineSearch VRE - Information Retrieval
– Entity enrichment
– Semantic post-processing
– Search over several OAI-PMH repositories
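
OAI-PMH is a harvesting protocol, so one plausible first step towards searching several repositories is to issue the same `ListRecords` request to each of them and index the results. The sketch below only builds the request URLs and sends nothing over the network. The `verb`, `metadataPrefix`, `set` and `from` parameters are defined by the OAI-PMH 2.0 specification, while the function names and repository addresses are placeholders, not endpoints used by the VRE.

```python
from typing import Dict, Iterable, Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    set_spec: Optional[str] = None,
    from_date: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL for one repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # restrict harvesting to one set
    if from_date:
        params["from"] = from_date  # harvest records changed since this date
    return f"{base_url}?{urlencode(params)}"


def harvest_plan(base_urls: Iterable[str], from_date: Optional[str] = None) -> Dict[str, str]:
    """Map every repository to the ListRecords request to send to it."""
    return {url: list_records_url(url, from_date=from_date) for url in base_urls}


# Placeholder endpoints (example.org), used only to show the URL shape.
requests_to_send = harvest_plan(
    ["https://repo-a.example.org/oai", "https://repo-b.example.org/oai"],
    from_date="2014-01-01",
)
```

Each response would then be parsed and merged into a common index, where steps such as the entity enrichment and semantic post-processing listed on the slide could be applied.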

Summary
The D4Science Infrastructure, implementing the HDI approach:
– Enables heterogeneous resource sharing between cross-domain infrastructures
– Collects under a common environment resources coming from several e-infrastructures
– Successfully hosts Virtual Research Environments for members of different user communities
A sustainability plan is under development, targeting future EU funding and/or exploitation of public-private partnerships

Discussion
Landscape: D4Science e-Infrastructure, gCube Framework, gCube Apps
Thanks for your attention
i-marine.d4science.org
eubrazilopenbio.d4science.org
Questions?