Published by Matthew Crawford. Modified over 8 years ago.
1
D4Science: Opening Data Infrastructures to Boost Education and Knowledge
Pasquale Pagano, D4Science Technical Director, CNR - ISTI
EGI Conference, “Distributed Platforms for e-Learning: How can educators and scholars benefit from the Open Science Commons?”
22 May 2015, Lisbon
www.d4science.eu
2
www.d4science.org, EGI 2015 Conference, Lisbon, 17-22 May 2015
D4Science
D4Science is a Hybrid Data Infrastructure that combines over 500 software components, provides access to more than 25k datasets, and serves more than 1300 jobs a day.
Diagram labels: D4Science, EGI FedCloud, Research Centers, …
Delivered as: Software as a Service, Platform as a Service, Data as a Service
More than 1700 users in 44 countries
3
D4Science Multi-tenant Delivery Model
Infrastructure as a Service: Provisioning, Hosting, Lifecycle Mgmt., Monitoring, Accounting, Security
Software as a Service: BiolCube, ConnectCube, GeosCube, StatsCube, IceCube, AppsCube
Platform as a Service: FeatherWeightStack, SmartGears, ApplicationSupportLayer, SOA3
4
D4Science for Data Providers
Common Approach:
1. Registration / Import
2. Harmonization
3. Generation of Metadata
4. Publication in Standard Format
Specialized Implementation (same steps) for: Geospatial Data, Biodiversity Data, Statistical Data
5
D4Science for Data Providers and Consumers
Functions: Registry, Validation, Enriching, Processing, Sharing
Data sources: OBIS, WoRMS, WoRDS, GBIF, CoL, ITIS, IRMNG, NCBI, MyOcean, WOA, Eurostat, Data.FAO, …
6
D4Science for Data Providers and Consumers
Biological and Ecological Data (DarwinCore / ISO19139):
o > 35 M Observations (OBIS)
o ≈ 120 K Observed Species (OBIS)
o ≈ 500 K Taxa (WoRMS)
o > 600 K Scientific Names (ITIS)
o > 40 K Species Maps
Statistical Data (SDMX *):
o FAO CodeLists
o IRD CodeLists
o FAO datasets
o Eurostat
o …
GeoSpatial Data (ISO19139, OGC W*S):
o 10 years of Chemical and Physical variables in 2D space: Ice concentration and velocity, Chlorophyll, Oxygen, Nitrate, Phosphate, Phytoplankton as carbon, Salinity, Temperature, …
o On-demand Chemical and Physical variables in 3D space: Oxygen Utilization, Dissolved Oxygen, Salinity, Temperature, …
o > 350 variables
Documents (OAI-PMH, OpenSearch):
o FAO Factsheets
o Aquatic Commons
o Bioline International
o Biodiversity Heritage
o OceanDocs
o …
Ontologies and Data Warehouses (RDF, OWL):
o FAO FLOD
o Marine Top Level Ontology
o IRD Ecoscope
o FactForge, Yago2
7
D4Science Integrated Platform for Scientists
o SPD (BiolCube): ecological and biological data
o GeoExplorer (GeosCube): geospatial data
o Tabular Data (StatsCube): statistical and reference data
o Cotrix (ConnectCube): reference data
o Statistical Manager (StatsCube): data analytics for interdisciplinary research
8
Scientific Exploitation: Fisheries
CMSY: estimates Maximum Sustainable Yield from catch statistics
o ICES workshop
o Combines mathematics, biological science, ICT, information and environmental engineering
o Ranked as the “most frequent best performer”*
CatchSeries Analysis: mining catch time series
o Estimates the effect of piracy on catch trends
o Identifies outliers, estimates trend, forecasting, periodicity detection, seasonality, …
o Combines fisheries, biological science, ICT, information engineering
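To make the "identify outliers, estimate trend" step concrete, here is a minimal sketch in Python. It is not the CatchSeries Analysis algorithm, which the slides do not describe. It assumes a simple least-squares linear trend and flags points whose residual exceeds a multiple of the residual standard deviation. The function names, the threshold, and the catch figures are all hypothetical.

```python
from statistics import mean, pstdev


def linear_trend(values):
    """Least-squares slope and intercept of values against their index."""
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def residual_outliers(values, threshold=2.0):
    """Indices whose distance from the trend line exceeds threshold * std."""
    slope, intercept = linear_trend(values)
    residuals = [y - (intercept + slope * x) for x, y in enumerate(values)]
    spread = pstdev(residuals)
    if spread == 0:
        return []  # perfectly linear series: nothing to flag
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * spread]


# Hypothetical yearly catches in tonnes (illustrative, not real data):
# a steady increase with one anomalous year at index 4.
catches = [100, 104, 108, 112, 60, 120, 124, 128, 132, 136]
slope, intercept = linear_trend(catches)
outliers = residual_outliers(catches)
print(f"trend: {slope:.2f} t/year, outlier years: {outliers}")
```

A real analysis would likely use a robust fit, since a single extreme year also pulls the fitted trend line towards itself.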
9
Scientific Exploitation: Biodiversity
Trendylyzer: detects common and rare species in marine areas
o Novel workflow based on clustering and mining techniques
o Combines mathematics, biological science, ICT, information and environmental engineering
BiOnym: a flexible and powerful search engine for large taxonomic trees
o Customizable workflow engine for taxonomic analysis
o Combines taxonomy and biological science, ICT and engineering
10
Scientific Exploitation: Environment
Rasterization: transforms vectorial datasets into raster datasets
XYZT - DataExtraction: extracts data with uniform geographic projection
MaxEntropy: finds the correlation between one phenomenon and N environmental variables
o Exploitable in workflow with ANN
o Accepts as-reference input data in a plethora of standard formats
o No need to worry about independence between features
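As an illustration of what "transforms vectorial datasets into raster datasets" means, the sketch below rasterizes one polygon onto a regular grid. It is not the D4Science Rasterization implementation. It assumes the simplest rule (a cell is filled when its centre lies inside the polygon, tested by ray casting), and the function names and the example square are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, x_min, y_min, cell_size, n_cols, n_rows):
    """Return rows (top to bottom) of 0/1 cells; 1 means the cell centre is inside."""
    grid = []
    for row in range(n_rows):
        y = y_min + (n_rows - row - 0.5) * cell_size
        grid.append([
            1 if point_in_polygon(x_min + (col + 0.5) * cell_size, y, polygon) else 0
            for col in range(n_cols)
        ])
    return grid


# Hypothetical vector feature: a square from (1, 1) to (3, 3) on a 4 x 4 grid.
square = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
raster = rasterize(square, x_min=0.0, y_min=0.0, cell_size=1.0, n_cols=4, n_rows=4)
for row in raster:
    print(row)
```

Production tools also handle map projections, multi-part geometries and partial cell coverage, all of which this sketch ignores.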
11
D4Science for end-users
A single place to:
o Get status and updates from applications and other users they are interested in
o Get notifications about messages, job completion, newly generated products, etc.
Screenshot labels: Share Updates; User news feed; VREs the user is a member of
12
D4Science for end-users [cont.]
A single place to:
o Manage data, store and preserve them
o Share data
13
D4Science for end-users [cont.]
Screenshot labels: Notification to a user; Discussion topic; Members page; Member profile; Member contacts
14
BOOSTING EDUCATION AND KNOWLEDGE
Listening to our communities …
15
Requirements from the iMarine Board
“Running a model can become a challenge, especially for students in ecology with limited IT skills but can be a problem as well for teachers to configure the machines, give access to datasets and sufficient machine resources. Some models can be left beside just because of configuration issues happening just after downloading the sources.”
Julien Barde, IRD
“Currently, experiments are run on our servers sequentially, which takes about 40 minutes. We lack the capacity to handle demand. The classroom can become a collaborative platform through the creation of a virtual research environment with the allocation of the required resources”.
Rainer Froese, GEOMAR
16
Requirements from the iMarine Board
“Running a model can become a challenge, especially for students in ecology with limited IT skills but can be a problem as well for teachers to configure the machines, give access to datasets and sufficient machine resources. Some models can be left beside just because of configuration issues happening just after downloading the sources.”
Julien Barde, IRD
“Practical classroom usage would open up new opportunities. A complex analysis to estimate Maximum Sustainable Yields takes several minutes for one user but our institute lacks the capacity to handle demand. Currently, experiments are run on our servers sequentially, which takes about 40 minutes. We lack the capacity to handle demand. The classroom can become a collaborative platform through the creation of a virtual research environment with the allocation of the required resources”.
Rainer Froese, GEOMAR
Resulting requirements:
o Deployment and configuration of the environment
o Availability of data and models
o Sufficient computational and storage capacity
o Collaboration between students (students and teachers)
17
OPENING D4SCIENCE
Experience so far
18
Type 1: Training for Scientists (Biologists)
3 Master degree courses at the UPMC - Sorbonne University in Paris and Villefranche-sur-Mer
66 students in Oceanography and Computational Biology
Main topics:
1. Biodiversity experiments in the data infrastructure era
2. Retrieving species data, occurrences, and taxonomic information
3. Using data to predict species presence in the oceans
4. Discovering habitats
5. Estimating habitat similarities
19
Type 2: Training for scientific tools developers
PhD course in Computer Engineering at the University of Pisa
20 computer scientists, computer engineers, bio-engineers, telecom engineers
Main topics:
1. Geospatial data visualization and representation
2. Statistical models for distribution modelling
3. Accessing large heterogeneous catalogs
4. Signal processing of biodiversity-related observations
5. Machine Learning applied to observation records
6. Lexical search in large taxonomic trees
7. Cloud computing applied to biodiversity analyses
20
Type 3: Thesis & Internship
Example of the working workflow of an intern at CNR:
1. Collecting and curating data (e.g. for the giant squid)
2. Producing maps using different Bayesian approaches and environmental features
3. Developing specialized data analysis models in R
4. Importing maps from reference literature approaches
5. Executing and tuning the new model
6. Comparing the results
7. Producing an article: http://www.sciencedirect.com/science/article/pii/S0304380015001222
21
Feedback on using a data infrastructure for training
o Flexibility in adapting to different students/scientists
o Remotely hosted computational/storage facilities
o Large availability of data and models
o Sharing and social networking
o Experiment reproducibility
o Putting students in communication with scientists
o Possibility to use it after the course: degree thesis, PhD, etc.
22
The experience continues: BlueBRIDGE
To support capacity building in interdisciplinary research communities actively involved in increasing scientific knowledge about resource overexploitation and degraded environments and ecosystems, with the aim of providing a more solid ground for informed advice to competent authorities and of enlarging the spectrum of growth opportunities addressed by the Blue Growth Societal Challenge.
INFRA-9-2015: e-Infrastructures for virtual research environments
Planned start date: September 2015
23
Blue Skill: Objectives
Developing and deploying VREs essential in the area of protection and management of marine resources for:
o boosting education
o knowledge bridging between research and innovation
Diagram labels: Volume; Thematic; Geographical reach; Configuration; Constraints
24
Blue Skill: target users
o Scientists operating on marine-related and biological conservation topics (e.g. stock assessment)
o Scientists presenting their models at workshops and at hands-on meetings
o Scholars who intend to broaden their knowledge of marine conservation
o Students in the marine and oceanographic domain
Supporting reproducibility
25
Planned courses
o International Council for the Exploration of the Sea (ICES), Denmark
o Institut de recherche pour le développement (IRD), France
o Univ. of Kiel, Univ. of Pisa, Agrocampus-Ouest, Paris Sorbonne Univ., Technological Educational Institute of Western Greece
26
D4Science makes this possible
Virtual Research Environment platform: a distributed and dynamically created environment where
o a subset of resources (data, services, computational, and storage resources)
o regulated by tailored policies
o is assigned to a subset of users via interfaces
o for a limited timeframe
o at little or no cost for the providers of the participatory data e-infrastructures
L. Candela, D. Castelli, P. Pagano (2013) Virtual Research Environments: An Overview and a Research Agenda. Data Science Journal, Vol. 12
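The definition above can be read as a small data model: a named environment with a time window, a set of members, and per-resource policies. The Python sketch below illustrates that reading only. It is not the gCube or D4Science API, and every class, field, user, and resource name in it is hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class VirtualResearchEnvironment:
    """Hypothetical model of a VRE; names are illustrative, not a real API."""
    name: str
    starts: datetime  # beginning of the limited timeframe
    ends: datetime    # end of the limited timeframe (exclusive)
    members: set = field(default_factory=set)  # the subset of users
    # Tailored policies: resource name -> actions members may perform on it.
    policies: dict = field(default_factory=dict)

    def is_active(self, when):
        """True while the environment's timeframe is open."""
        return self.starts <= when < self.ends

    def allows(self, user, resource, action, when):
        """A member may act on a resource only while the VRE is active
        and only if the resource's policy lists that action."""
        if not self.is_active(when) or user not in self.members:
            return False
        return action in self.policies.get(resource, set())


# Hypothetical classroom VRE for a three-month course.
course = VirtualResearchEnvironment(
    name="stock-assessment-course",
    starts=datetime(2015, 9, 1),
    ends=datetime(2015, 12, 1),
    members={"student_a", "teacher_b"},
    policies={"catch-dataset": {"read"}, "cmsy-model": {"read", "execute"}},
)
print(course.allows("student_a", "cmsy-model", "execute", datetime(2015, 10, 1)))
```

Resources without a policy entry are denied by default, which matches the idea that only an explicitly assigned subset of resources is visible inside the environment.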
27
Q&A Time
Thank you for your attention
28
References / Links
D4Science Web site: http://www.d4science.org
gCube Web site: http://www.gcube-system.org
Software Key Features: https://gcube.wiki.gcube-system.org/gcube/index.php/GCube_Features
Developer Guide: https://gcube.wiki.gcube-system.org/gcube/index.php/Developer%27s_Guide
FeatherWeightStack: https://gcube.wiki.gcube-system.org/gcube/index.php/Featherweight_Stack
SmartGears: https://gcube.wiki.gcube-system.org/gcube/index.php/SmartGears
gCube APIs: https://gcube.wiki.gcube-system.org/gcube/index.php/GCube_Application_Programming_Interface
Administration Guide: https://gcube.wiki.gcube-system.org/gcube/index.php/Administrator%27s_Guide