1 D4Science: Opening data Infrastructures to boost education and knowledge
Pasquale Pagano, D4Science Technical Director, CNR - ISTI
EGI Conference: “Distributed Platforms for e-Learning: How can educators and Scholars benefit from the Open Science Commons?”
22 May 2015, Lisbon

2 EGI 2015 Conference, Lisbon, 17-22 May 2015
D4Science
D4Science is a Hybrid Data Infrastructure
o combining over 500 software components
o providing access to more than 25k datasets
o serving more than 1300 jobs a day
o +1700 users in 44 countries
[Diagram: D4Science, EGI FedCloud, Research Centers, …; Software as Service, Platform as Service, Data as Service]

3 D4Science Multi-tenant Delivery Model
Infrastructure as a Service: Provisioning, Hosting, Lifecycle Mgmt., Monitoring, Accounting, Security
Software as a Service: BiolCube, ConnectCube, GeosCube, StatsCube, IceCube, AppsCube
Platform as a Service: FeatherWeightStack, SmartGears, ApplicationSupportLayer, SOA3

4 D4Science for Data Providers
Common Approach: Registration / Import, Harmonization, Generation of Metadata, Publication in Standard Format
Specialized Implementation: Geospatial Data, Biodiversity Data, Statistical Data
Registration, Harmonization, Generation of Metadata, Publication in Standard Format

5 D4Science for Data Providers and Consumers
Registry, Validation, Enriching, Processing, Sharing
OBIS, WoRMS, WoRDS, GBIF, CoL, ITIS, IRMNG, NCBI, MyOcean, WOA, EuroStat, Data.FAO, …

6 D4Science for Data Providers and Consumers
Biological and Ecological Data (DarwinCore / ISO19139)
o >35 M Observations (OBIS)
o ≈ 120 K Observed Species (OBIS)
o ≈ 500 K Taxa (WoRMS)
o > 600 K Scientific Names (ITIS)
o > 40 K Species Maps
Statistical Data (SDMX *)
o FAO CodeLists
o IRD CodeLists
o FAO datasets
o Eurostat
o …
GeoSpatial Data (ISO19139, OGC W*S), > 350 variables
o 10 years of chemical and physical variables in 2D space: Ice concentration and velocity, Chlorophyll, Oxygen, Nitrate, Phosphate, Phytoplankton as carbon, Salinity, Temperature, …
o On-demand chemical and physical variables in 3D space: Oxygen Utilization, Dissolved Oxygen, Salinity, Temperature, …
Documents (OAI-PMH, OpenSearch)
o FAO Factsheets
o Aquatic Commons
o Bioline International
o Biodiversity Heritage
o OceanDocs…
Ontologies and Data Warehouses (RDF, OWL)
o FAO FLOD
o Marine Top Level Ontology
o IRD Ecoscope
o FactForge, Yago2

7 D4Science Integrated Platform for Scientists
o SPD (BiolCube): ecological and biological data
o GeoExplorer (GeosCube): geospatial data
o Tabular Data (StatsCube): statistical and reference data
o Cotrix (ConnectCube): reference data
o Statistical Manager (StatsCube): data analytics for interdisciplinary research

8 Scientific Exploitation: Fisheries
CMSY: estimates Maximum Sustainable Yield from catch statistics
o ICES workshop
o Combines mathematics, biological science, ICT, information and environmental engineering
o Ranked as the “most frequent best performer”*
CatchSeries Analysis: mining catch time-series
o Estimates the effect of piracy on catch trends
o Identifies outliers, estimates trend, forecasting, periodicity detection, seasonality, …
o Combines fisheries, biological science, ICT, information engineering
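As an illustration of the "identify outliers, estimate trend" steps named for CatchSeries Analysis, here is a minimal sketch in Python. It is not the D4Science implementation: the function names, the least-squares trend, and the median-absolute-deviation outlier rule (with the conventional 3.5 threshold) are all assumptions chosen for brevity.

```python
from statistics import median


def linear_trend(values):
    """Least-squares slope and intercept of values against their index (0, 1, 2, ...)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def flag_outliers(values, threshold=3.5):
    """Indices whose residual from the linear trend has a large robust z-score.

    Hypothetical rule: 0.6745 * |residual - median| / MAD > threshold.
    """
    slope, intercept = linear_trend(values)
    residuals = [v - (intercept + slope * i) for i, v in enumerate(values)]
    centre = median(residuals)
    mad = median(abs(r - centre) for r in residuals)
    if mad == 0:
        return []  # residuals are (almost) all identical: nothing to flag
    return [i for i, r in enumerate(residuals)
            if 0.6745 * abs(r - centre) / mad > threshold]


# Made-up yearly catch figures with one anomalous year at index 4.
catches = [100, 102, 104, 106, 300, 110, 112, 114, 116, 118]
suspicious_years = flag_outliers(catches)
```

A median-based rule is used because a single extreme year distorts the mean and standard deviation much more than the median; forecasting, periodicity, and seasonality detection are outside the scope of this sketch.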

9 Scientific Exploitation: Biodiversity
Trendylyzer: detects common and rare species in marine areas
o Novel workflow based on clustering and mining techniques
o Combines mathematics, biological science, ICT, information and environmental engineering
BiOnym: a flexible and powerful search engine in large taxonomic trees
o Customizable workflow engine for taxonomic analysis
o Combines taxonomy and biological science, ICT and engineering

10 Scientific Exploitation: Environment
Rasterization: transforms vectorial datasets into raster datasets
XYZT - DataExtraction: extracts data with uniform geographic projection
MaxEntropy: finds the correlation between one phenomenon and N environmental variables
o Exploitable in workflow with ANN
o Accept as-reference input data in a plethora of standard formats
o Need not worry about independence between features
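To make the rasterization idea concrete, the sketch below bins point features (the simplest kind of vector data) into a regular grid and counts the points per cell. It is only an illustration under stated assumptions: the function name, the parameters, and the choice of a count raster are hypothetical and do not describe the D4Science algorithm, which also has to deal with lines, polygons, and map projections.

```python
def rasterize_points(points, x_min, y_min, cell_size, n_cols, n_rows):
    """Count points per grid cell; row 0 is the row starting at y_min."""
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, y in points:
        col = int((x - x_min) // cell_size)
        row = int((y - y_min) // cell_size)
        # Points that fall outside the grid extent are ignored.
        if 0 <= col < n_cols and 0 <= row < n_rows:
            grid[row][col] += 1
    return grid


# Made-up observation coordinates on a 3 x 2 grid of unit cells.
observations = [(0.5, 0.5), (0.7, 0.2), (1.5, 0.5), (2.9, 1.1), (5.0, 5.0)]
raster = rasterize_points(observations, x_min=0, y_min=0,
                          cell_size=1, n_cols=3, n_rows=2)
```

Floor division keeps the cell assignment correct for coordinates below the grid origin, which would otherwise be truncated towards zero and land in the first cell.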

11 D4Science for end-users
A single place to
o Get status and updates from applications and other users they are interested in;
o Get notifications about messages, jobs completion, new generated products, etc.
[Screenshot: Share Updates; User news feed; VREs user is member of]

12 D4Science for end-users [cont.]
A single place to
o Manage data, store and preserve them
o Share data

13 D4Science for end-users [cont.]
[Screenshot: Notification to a user; Discussion topic; Members page; Member profile; Member contacts]

14 BOOSTING EDUCATION AND KNOWLEDGE
Listening to our communities …

15 Requirements from the iMarine Board
“Running a model can become a challenge, especially for students in ecology with limited IT skills but can be a problem as well for teachers to configure the machines, give access to datasets and sufficient machine resources. Some models can be left beside just because of configuration issues happening just after downloading the sources.”
Julien Barde, IRD
“Currently, experiments are run on our servers sequentially, which takes about 40 minutes. We lack the capacity to handle demand. The classroom can become a collaborative platform through the creation of a virtual research environment with the allocation of the required resources”.
Rainer Froese, GEOMAR

16 Requirements from the iMarine Board
“Practical classroom usage would open up new opportunities. A complex analysis to estimate Maximum Sustainable Yields takes several minutes for one user but our institute lacks the capacity to handle demand. Currently, experiments are run on our servers sequentially, which takes about 40 minutes. We lack the capacity to handle demand. The classroom can become a collaborative platform through the creation of a virtual research environment with the allocation of the required resources”.
Rainer Froese, GEOMAR
o Deployment and configuration of the environment
o Availability of data and models
o Sufficient computational and storage capacity
o Collaboration between students (students and teachers)

17 OPENING D4SCIENCE
Experience gained so far

18 Type 1: Training for Scientists (Biologists)
3 Master degree courses at the UPMC - Sorbonne University in Paris and Villefranche-sur-Mer
66 students in Oceanography and Computational Biology
Main topics:
1. Biodiversity experiments in the data infrastructure era
2. Retrieving species data, occurrences, and taxonomic information
3. Using data to predict species presences in the oceans
4. Habitat discovery
5. Estimating habitat similarities

19 Type 2: Training for scientific tools developers
PhD course in Computer Engineering at the University of Pisa
20 computer scientists, computer engineers, bio-engineers, telecom engineers
Main topics:
1. Geospatial data visualization and representation
2. Statistical models for distribution modelling
3. Accessing large heterogeneous catalogs
4. Signal processing of biodiversity-related observations
5. Machine Learning applied to observation records
6. Lexical search in large taxonomic trees
7. Cloud computing applied to biodiversity analyses

20 Type 3: Thesis & Internship
Example of a practical workflow of an intern at CNR:
1. Collecting and curating data (e.g. for the giant squid)
2. Producing maps using different Bayesian approaches and environmental features
3. Developing specialized data analysis models in R
4. Importing maps from reference literature approaches
5. Executing and tuning the new model
6. Comparing the results
7. Producing an article: 304380015001222

21 Feedback on using a data infrastructure for training
o Flexibility in adapting to different students/scientists
o Remotely hosted computational/storage facilities
o Large availability of data and models
o Sharing and social networking
o Experiment reproducibility
o Putting students in communication with scientists
o Possibility to use it after the course: degree thesis, PhD etc.

22 The experience continues: BlueBRIDGE
To support capacity building in interdisciplinary research communities actively involved in increasing scientific knowledge about resource overexploitation, degraded environment and ecosystem, with the aim of providing a more solid ground for informed advice to competent authorities and of enlarging the spectrum of growth opportunities as addressed by the Blue Growth Societal Challenge.
INFRA-9-2015: e-Infrastructures for virtual research environments
Planned start date: September 2015

23 Blue Skill: Objectives
Developing and deploying VREs essential in the area of protection and management of marine resources for
o boosting education
o knowledge bridging between research and innovation
[Diagram: Volume; Thematic; Geographical reach; Configuration; Constraints]

24 Blue Skill: target users
o Scientists operating on marine related and biological conservation topics (e.g. stock assessment)
o Scientists presenting their models at workshops and at hands-on meetings
o Scholars that intend to broaden their knowledge on marine conservation
o Students in the marine and oceanographic domain
Supporting reproducibility

25 Planned courses
o International Council for the Exploration of the Sea (ICES), Denmark
o Institut de recherche pour le développement (IRD), France
o Univ. of Kiel, Univ. of Pisa, Agrocampus-Ouest, Paris Sorbonne Univ., Technological Educational Institute of Western Greece

26 D4Science makes this possible
Virtual Research Environment platform: a distributed and dynamically created environment where subset of resources (data, services, computational, and storage resources) regulated by tailored policies are assigned to a subset of users via interfaces for a limited timeframe at little or no cost for the providers of the participatory data e-infrastructures
L. Candela, D. Castelli, P. Pagano (2013) Virtual Research Environments: An Overview and a Research Agenda. Data Science Journal, Vol. 12
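The VRE definition above can be read as a small data model: a set of resources, a set of users, and a validity window. The sketch below is one possible reading, assuming that "tailored policies" reduce to membership plus a timeframe; the class, its fields, and the example names are hypothetical and are not part of gCube or D4Science.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class VirtualResearchEnvironment:
    """Hypothetical model: resources assigned to users for a limited timeframe."""
    name: str
    resources: set = field(default_factory=set)  # data, services, compute, storage
    members: set = field(default_factory=set)    # users the VRE is assigned to
    valid_from: datetime = datetime.min
    valid_until: datetime = datetime.max

    def can_access(self, user, resource, when):
        """Simplified policy: member, resource in scope, and VRE still active."""
        return (user in self.members
                and resource in self.resources
                and self.valid_from <= when <= self.valid_until)


# Made-up classroom VRE that exists only for the duration of a course.
course_vre = VirtualResearchEnvironment(
    name="stock-assessment-course",
    resources={"catch-dataset", "cmsy-model"},
    members={"student-1", "teacher-1"},
    valid_from=datetime(2015, 9, 1),
    valid_until=datetime(2015, 12, 31),
)
allowed = course_vre.can_access("student-1", "cmsy-model", datetime(2015, 10, 15))
```

Real policies would be richer (roles, quotas, per-resource rules); the point here is only that access depends jointly on who, what, and when.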

27 Q&A Time
Thank you for your attention

28 References / Links
D4Science Web site: http://www.d4science.org
gCube Web site: http://www.gcube-system.org
o Software Key Features
o Developer Guide
o FeatherWeightStack
o SmartGears
o gCube APIs
o Administration Guide
