IMarine: Accessing and Managing Biodiversity Data Pasquale Pagano (CNR) iMarine Technical Director

Slides:



Advertisements
Similar presentations
ASCR Data Science Centers Infrastructure Demonstration S. Canon, N. Desai, M. Ernst, K. Kleese-Van Dam, G. Shipman, B. Tierney.
Advertisements

D4Science: a Data Infrastructure Ecosystem for Science DL.org Autumn School – Athens, 3-8 October 2010 Leonardo Candela 6 th October 2010.
University of Illinois Visualizing Text Loretta Auvil UIUC February 25, 2011.
HPC Pack On-Premises On-premises clusters Ability to scale to reduce runtimes Job scheduling and mgmt via head node Reliability HPC Pack Hybrid.
Infrastrutture per Biblioteche e “Repositories” Digitali Donatella Castelli.
A Social Networking Research Environment for Scientific Data Sharing: The D4Science Offering M. Assante, L. Candela, D. Castelli, F. Mangiacrapa, P. Pagano.
A Service for Data-Intensive Computations on Virtual Clusters Rainer Schmidt, Christian Sadilek, and Ross King Intensive 2009,
Scientific Data Infrastructure in CAS Dr. Jianhui Scientific Data Center Computer Network Information Center Chinese Academy of Sciences.
 Cloud computing  Workflow  Workflow lifecycle  Workflow design  Workflow tools : xcp, eucalyptus, open nebula.
DuraCloud Managing durable data in the cloud Michele Kimpton, Director DuraSpace.
Massimiliano Assante – Leonardo Candela – Donatella Castelli – Pasquale Pagano Fourteenth International Conference on Grey Literature An Environment Supporting.
Interoperability ERRA System.
EGI-Engage EGI-Engage Engaging the EGI Community towards an Open Science Commons Project Overview 9/14/2015 EGI-Engage: a project.
CEMS: The Facility for Climate and Environmental Monitoring from Space Victoria Bennett, ISIC/CEDA/NCEO RAL Space.
Project number: Data and Data Requirements Wouter Los University of Amsterdam.
Ocean Observatories Initiative Common Execution Infrastructure (CEI) Overview Michael Meisinger September 29, 2009.
material assembled from the web pages at
Introduction to iMarine and it’s challenges Alexandros Antoniadis (NKUA) John Gerbesiotis (NKUA)
Design central EMODnet portal Objectives, Technical Proposal and Consultation Process.
IPlant cyberifrastructure to support ecological modeling Presented at the Species Distribution Modeling Group at the American Museum of Natural History.
A framework to support collaborative Velo: Knowledge Management for Collaborative (Science | Biology) Projects A framework to support collaborative 1.
Large Scale Sky Computing Applications with Nimbus Pierre Riteau Université de Rennes 1, IRISA INRIA Rennes – Bretagne Atlantique Rennes, France
Grid Technologies  Slide text. What is Grid?  The World Wide Web provides seamless access to information that is stored in many millions of different.
Digital Earth Communities GEOSS Interoperability for Weather Ocean and Water GEOSS Common Infrastructure Evolution Roberto Cossu ESA
Data discovery and data processing for environmental research infrastructures Roberto Cossu ENVRI WP4 leader ESA.
Magellan: Experiences from a Science Cloud Lavanya Ramakrishnan.
Using Biological Cyberinfrastructure Scaling Science and People: Applications in Data Storage, HPC, Cloud Analysis, and Bioinformatics Training Scaling.
NA-MIC National Alliance for Medical Image Computing UCSD: Engineering Core 2 Portal and Grid Infrastructure.
GRID Overview Internet2 Member Meeting Spring 2003 Sandra Redman Information Technology and Systems Center and Information Technology Research Center National.
The FI-WARE Project – Base Platform for Future Service Infrastructures FI-WARE Stefano De Panfilis (Fi-WARE PCC Member) 4 th July 2011 FInES - Samos Summit.
IMarine and our contribution 1 Presentation methodology: PechaKucha 20x20 Andrea Manzi (CERN) Nick Drakopoulos (CERN) IT GT.
Computing Facilities CERN IT Department CH-1211 Geneva 23 Switzerland t CF Agile Infrastructure Monitoring HEPiX Spring th April.
Building Scientific Workflows for the Fisheries and Aquaculture Management Community based on Virtual Research Environments Pedro Andrade (CERN)
On the D4Science Approach Toward AquaMaps Richness Maps Generation Pasquale Pagano - CNR-ISTI Pedro Andrade.
Managing Virtual Research Environments in Hybrid Data Infrastructures Pasquale Pagano (CNR, Italy) iMarine Technical Director
User scenario on Marine Biodiversity AquaMaps Pasquale Pagano National Research Council (CNR) – ISTI Italy.
Deriving fishing monthly effort and caught species from vessel trajectories Coro, Gianpaolo Fortunati, Luigi Pagano, Pasquale OCEANS - Bergen, 2013 MTS/IEEE.
Cloud Computing for Ecological Modeling in the D4Science Infrastructure A. Manzi (CERN), L. Candela, D. Castelli, G. Coro, P. Pagano, F. Sinibaldi (ISTI-CNR)
ETICS An Environment for Distributed Software Development in Aerospace Applications SpaceTransfer09 Hannover Messe, April 2009.
Realising Virtual Research Environments by Hybrid Data Infrastructures: the D4Science Experience Andrea Manzi (CERN) Leonardo Candela, Donatella Castelli,
D4Science: Opening data Infrastructures to boost education and knowledge Pasquale Pagano D4Science Technical Director CNR - ISTI EGI Conference.
European Life Sciences Infrastructure for Biological Information ELIXIR Cloud Roadmap Chairs: Steven Newhouse, EMBL-EBI & Mirek Ruda,
Pedro Andrade > IT-GD > D4Science Pedro Andrade CERN European Organization for Nuclear Research GD Group Meeting 27 October 2007 CERN (Switzerland)
INTRODUCTION TO GRID & CLOUD COMPUTING U. Jhashuva 1 Asst. Professor Dept. of CSE.
DIRAC for Grid and Cloud Dr. Víctor Méndez Muñoz (for DIRAC Project) LHCb Tier 1 Liaison at PIC EGI User Community Board, October 31st, 2013.
Virtual Research Environments as-a-Service Donatella Castelli CNR-ISTI EGI Conference 2016, 6-8 April.
The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How) gLite Grid Introduction Salma Saber Electronic.
Wednesday 25 June 2014 – FAO, Rome BiOnym A concept-mapping workflow for taxon names reconciliation iMarine Board 5 – 25 June 2014, FAO, Rome, Italy Fabio.
EGI-InSPIRE RI EGI Compute and Data Services for Open Access in H2020 Tiziana Ferrari Technical Director, EGI.eu
Accessing the VI-SEEM infrastructure
Experience in managing service portfolio: OpenAIRE, BlueBridge
Organizations Are Embracing New Opportunities
The BlueBRIDGE project
Pasquale Pagano (CNR-ISTI) Project technical director
Discovering and accessing data from a distributed network of data centres S. Mazzeo (ESA)
EOSC MODEL Pasquale Pagano CNR - ISTI
Virtual Research Environments as-a-Service
Joslynn Lee – Data Science Educator
Pasquale Pagano CNR – ISTI (Pisa, Italy)
Pasquale Pagano CNR, Italy
INTAROS WP5 Data integration and management
Donatella Castelli CNR-ISTI
Flanders Marine Institute (VLIZ)
Marketplace & service catalog concepts, first design analysis
EGI-Engage Engaging the EGI Community towards an Open Science Commons
Introduction to D4Science
Virtual Research Environments as-a-Service
MMG: from proof-of-concept to production services at scale
A Research Data Catalogue supporting Blue Growth: the BlueBRIDGE case
EOSC-hub Contribution to the EOSC WGs
Presentation transcript:

iMarine: Accessing and Managing Biodiversity Data Pasquale Pagano (CNR) iMarine Technical Director

Concepts iMarine - Just an overview The initiative (the visionary leadership) The e-infrastructure (the operational platform) The system (the enabling sw system) 2

e-Infrastructure Geographically Distributed Computing Infrastructure Across administrative boundaries Across private and commercial providers Service Allocations, Deployment, Monitoring, and Operation Uniform resource and data access iMarine - Just an overview 3

Infrastructure: key characteristics Efficient and tailored storage technologies Computational environments dealing with the volume of the data Elastic management of the resources, monitoring, alerting, recovery Collaborative environment to support scientific communities Rich portfolio of applications to perform access, validation, enriching, processing, sharing, and mash-up of data iMarine - Just an overview 4

Infrastructure: Management as Service Operation Machine readable SLAs Machine readable monitoring, auditing, billing, reporting, and notification Machine readable resource/performance capabilities description Trust Privacy, governance, and attribution Security, trusted network iMarine - Just an overview 5

Infrastructure: Storage as Service Scalability and high availability Across sites ISO 19115/10139 Metadata Catalogue Open source RDBMS Up to 1 TB data Secure Fault-tolerant Replication Virtual Workspace Relational Databases Large and Active data storage Spatial Database iMarine - Just an overview 45 TB Currently Used 6

iMarine OBIS WoR MS WoR DS GBIFCoLITIS IRMN G NCBI MyOc ean WOA EuroS tat Data. FAO … iMarine Registries ValidationEnrichingProcessingSharing Data Bonanza iMarine - Just an overview Private Cloud Commercial Cloud 7

Infrastructure: Computing as Service Statistical Manager Hadoop R clusters Analysis/clustering/modeling MapReduce Windows and Linux iMarine - Just an overview 330 Cores Currently Allocated 8

Is this enough? An ecosystem of participatory data e- Infrastructures Regulated by policies Enabled by standards Promoting not only access but mash-up of heterogeneous data iMarine - Just an overview User centric 9

Virtual Research Environment iMarine is user-centric and workflow-oriented thanks to the gCube VRE technology Virtual Research Environment (VRE) is a distributed and dynamically created environment where subset of data, services, computational, and storage resources regulated by tailored policies are assigned to a subset of users via interfaces for a limited timeframe at little or no cost for the providers of the participatory data e-infrastructures iMarine - Just an overview L. Candela, D. Castelli, P. Pagano (2013) Virtual Research Environments: An Overview and a Research Agenda. Data Science Journal, Vol

Statistical Manager D4Science Computational Facilities Sharing Setup and execution Statistical Manager Statistical Manager is a set of web services that aim to: Help scientists in performing biological or climate analyses Supply precooked state-of-the-art algorithms as-a-Service Perform calculations by using Cloud Computing approaches in a transparent way to the users Share input, results, parameters and comments with colleagues by means of Virtual Research Environment in the D4Science e- Infrastructure iMarine - Just an overview 11

Architecture iMarine - Just an overview 12

Internal Work iMarine - Just an overview 13

Statistical Manager in D4Science The Statistical Manager distributed computations may run on the D4Science Infrastructure: D4Science WNs are VMs equipped with the gCube Container running the gCube Executor Web Service Executables and Input are remote downloaded (MongoDB and Postgresql) Tasks queue is implemented trough Messaging (ActiveMQ) iMarine - Just an overview 14

Statistical Manager & DIRAC The Statistical Manager exploits the assigned D4Science WNs and additional nodes can be added by site managers at any time by using a management control UI. D4Science WN assign VRE1 VRE2 iMarine - Just an overview 15

Statistical Manager & DIRAC Requirements Handling of credentials without X509 certificates ( username/password ) Integration of Accounting and Monitoring Integration steps Creation of a D4Science WN VM and upload to a VM Repository to test image contextualization Integration of VM Scheduler API within D4Science Infrastructure. iMarine - Just an overview 16

Landscape D4Science e-Infrastructure gCube Framework gCube Apps Discussion i-marine.d4science.org iMarine - Just an overview 17

Google Analytics iMarine portal iMarine - Just an overview 18

Application Bundles iMarine - Just an overview Management and interpretation of biological and ecological data in the environment Complete full life-cycle data framework, from observational data to aggregated data repositories enriched with validation and analytical tools Storage and interpretation of geospatial explicit information, including WPS processing Flexible sharing, storage, reporting, search and retrieval, aggregation and projection facilities A BUNDLE is a set of services and technologie s grouped according to a family of related tasks for ac hieving a common objective 19

BiolCube related publications W. Appeltans, P. Pissierssens, G. Coro, A. Italiano, P. Pagano, A. Ellenbroek, T. Webb (2013). Trendylyzer: a Long-Term Trend Analysis on Biogeographic Data, In Proceedings of the International Conference on Marine Data and Information Systems (IMDIS). Lucca, Italy. L. Candela, D. Castelli, G. Coro,P. Pagano, F. Sinibaldi (2013) Species Distribution Modeling in the Cloud, Concurrency and Computation: Practice and Experience, Ed. Wiley (DOI: /cpe.3030).DOI: /cpe.3030 D. Castelli, P. Pagano, G. Coro, F. Sinibaldi (2013) Modellazione della Nicchia Ecologica di Specie Marine (Marine Species Ecological Niche Modelling). In Le Tecnologie del CNR per il Mare (CNR Marine Technologies) pp. 140, Ed. CNR (Roma, Italy). D. Castelli, P. Pagano, L. Candela, G. Coro (2013). The iMarine Data Bonanza: Improving Data Discovery and Management through an Hybrid Data Infrastructure, In Proceedings of the International Conference on Marine Data and Information Systems (IMDIS), Lucca. G. Coro, P. Pagano, A. Ellenbroek (2013) Combining Simulated Expert Knowledge with Neural Networks to Produce Ecological Niche Models for Latimeria chalumnae, Ecological Modelling, DOI /j.ecolmodel , Ed. Elsevier. (Acknowledged in) R. Froese, J. Thorson, R. B. Reyes Jr. (2013) A Bayesian Approach to the estimation of length-weight relationships in fishes. Journal of Applied Ichthyology P. Pagano, G. Coro, D. Castelli, L. Candela, F. Sinibaldi, A. Manzi (2013) Cloud Computing for Ecological Modeling in the D4Science Infrastructure. Proceedings of EGI Community Forum. iMarine - Just an overview 20

Links and References (1/3) GeosCube Selected Links Geospatial Cluster work plan for iMArine Board marine.eu/index.php/Geospatial_clusterhttp://wiki.i- marine.eu/index.php/Geospatial_cluster Geospatial Data Processing system.org/gcube/index.php/Geospatial_Data_Processinghttp://gcube.wiki.gcube- system.org/gcube/index.php/Geospatial_Data_Processing OGC/ISO publishing guidelines marine.eu/index.php/OGC/ISO_Publishing_guidelines_for_Data_and_Services_Provi ders marine.eu/index.php/OGC/ISO_Publishing_guidelines_for_Data_and_Services_Provi ders OGC OWS Context 1.0 Guidelines marine.eu/index.php/OGC_OWS_Context_1.0_Guidelineshttp://wiki.i- marine.eu/index.php/OGC_OWS_Context_1.0_Guidelines Environmental Service system.org/gcube/index.php/Environmental_Servicehttps://gcube.wiki.gcube- system.org/gcube/index.php/Environmental_Service iMarine - Just an overview 21

Links and References (2/3) GeosCube Selected standardization work P.Gonçalves, R.Brackin, Open Geospatial Consortium, OWS Context 1.0 Conceptual Model, Candidate Standard, 30th June 2013: – (OGC r1) P.Gonçalves, R.Brackin, Open Geospatial Consortium, OWS Context 1.0, Atom Encoding Specification, Candidate Standard, 3rd June 2013: – (OGC r1) iMarine - Just an overview 22

Links and References (3/3) GeosCube Selected Publications D. Castelli, P. Pagano, G. Coro Variazioni Climatiche ed Effetto sulle Specie Marine (Climate Changes and Effect on Marine Species)”. In Le Tecnologie del CNR per il Mare (CNR Marine Technologies) pp. 139, Ed. CNR (Roma, Italy). D. Castelli, P. Pagano, G. Coro Elaborazione di Dati Trasmessi da Pescherecci (Processing of fishing vessel transmitted information). In Le Tecnologie del CNR per il Mare (CNR Marine Technologies). pp. 133, Ed. CNR (Roma, Italy). G. Coro, L. Fortunati, P. Pagano Deriving Fishing Monthly Effort and Caught Species from Vessel Trajectories. To be published in Oceans 2013, Proceedings of MTS/IEEE. iMarine - Just an overview 23