1
D4Science: An e-Infrastructure for Facilitating Data Management, Process, Sharing, and Access
Pasquale Pagano
National Research Council of Italy
pasquale.pagano@isti.cnr.it
Digital Repositories – Linked Open Data: the possible Role of D4Science
16-17 December 2010, FAO (Rome)
www.d4science.eu
2
Assumptions
Consolidated facts:
◦ Very rich applications and data collections are currently maintained by a multitude of authoritative providers
◦ Different problems require different execution paradigms: batch, map-reduce, synchronous call, message-queue, …
◦ Key distributed computation technologies exist: grid (gLite and Globus), distributed resource management (Condor), clusters (Hadoop), …
◦ Several standards are adopted in the same domain
Societal observations:
◦ A rich variety of protocols, models, and formats creates barriers to the usage of resources and dramatically delays new exploitation patterns
Technical observations:
◦ Heterogeneity of protocols, models, and formats increases load
◦ Load increases failures
3
D4Science Vision
D4Science objectives:
◦ hide heterogeneity, i.e. abstract over differences in location, protocol, and model
◦ embrace heterogeneity, i.e. allow for multiple locations, protocols, and models
Technical goals:
◦ no bottlenecks: scale no less than the interfaced resources
◦ no outages: keep failures partial and temporary
◦ autonomicity: the system reacts and recovers
4
From a testbed to a production ecosystem
Diligent (Oct. '04 – Nov. '07): Testbed
  Empower the grid middleware to:
  > manage data and metadata as primary resources
  > virtualise the VO environment
  Prototype => gCube 0.9
D4Science (Jan. '08 – Dec. '09): Production
  Stabilize gCube by supporting two large user communities:
  > FARM
  > EM
  Software Framework => gCube 1.6 (stable and open source) => d4science e-Infrastructure
D4Science II (Oct. '09 – Sept. '11): Production
  Promote interoperability across e-Infrastructures by empowering large user communities
  Open Platform => gCube 2.0 (feature rich and interop.) => d4science ecosystem
5
From a testbed to a production ecosystem
[Figure: functionality of gLite and gCube across the Diligent (Oct. '04 – Nov. '07), D4Science (Jan. '08 – Dec. '09), and D4Science II (Oct. '09 – Sept. '11) timeline]
6
Infrastructure Exploitation
Production: more than 500 autonomic Web Services
Nodes: 30 (CNR, NKUA, ESA, FAO, UNIBASEL)
  Collections: 25 data (EEA, MERIS, AATSR); 69 metadata (es, ISO19115, eiDB)
Nodes: 29 (CNR, NKUA, FAO, UNIBASEL)
  Collections: 15 data (AquaMaps, Fact Sheets, Country Maps); 28 metadata (FARM_dc, aquamaps)
Functionality: Integration with gPod; Geographical and text search; Search by metadata; Personal workspace; Objects annotation; Report generation; Maps Generation; Time Series management
7
gCube as a Digital Library System
A Digital Library System is a possibly distributed system that collects, manages and preserves for the long term rich digital content, and offers to its user communities specialised functionality on that content, of measurable quality and according to codified policies [The Digital Library Reference Model]
The gCube data infrastructure enabling framework provides DL functionality by:
◦ Federating existing digital content maintained in a variety of tailored repository systems
◦ Supporting the generation of new digital content by exploiting heterogeneous computational platforms
◦ Providing discovery and access capabilities on diversely described and modeled digital content
8
gCube as an e-Infrastructure ecosystem enabling framework
By bridging a number of well-established systems and standards from various domains, including high-energy physics, biodiversity, and fishery and aquaculture resources management, gCube realises an e-Infrastructure ecosystem.
9
How does it work?
◦ Each community (VO) registers its own resources under its domain, and registers and authorises its users.
◦ Starting from this set of resources (hardware, data and applications), VREs can be dynamically set up and activated.
◦ Each user logs in to the VO's personalised environment and, from there, searches, elaborates and stores shared and personal information.
◦ Later on, the community administrators can dynamically add or remove resources and users from their domain.
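The registration steps on this slide can be sketched as a small data model. This is only an illustration under stated assumptions: the class `VirtualOrganisation` and its methods are hypothetical names, not the gCube API, and resources and users are reduced to plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualOrganisation:
    """Hypothetical model of a community (VO) and what it has registered."""
    name: str
    resources: set = field(default_factory=set)
    users: set = field(default_factory=set)
    vres: dict = field(default_factory=dict)

    def register_resource(self, resource):
        self.resources.add(resource)

    def authorise_user(self, user):
        self.users.add(user)

    def create_vre(self, vre_name, resources, users):
        # A VRE may only draw on resources and users the VO already knows.
        resources, users = set(resources), set(users)
        if not resources <= self.resources:
            raise ValueError(f"unregistered resources: {sorted(resources - self.resources)}")
        if not users <= self.users:
            raise ValueError(f"unauthorised users: {sorted(users - self.users)}")
        self.vres[vre_name] = {"resources": resources, "users": users}
        return self.vres[vre_name]

    def remove_resource(self, resource):
        # Removing a resource from the VO also withdraws it from every VRE.
        self.resources.discard(resource)
        for vre in self.vres.values():
            vre["resources"].discard(resource)
```

The subset checks in `create_vre` capture the point of the slide: a VRE is a view over what the community has registered, so administrators can add or remove resources later without touching the providers.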
10
Why is sharing through VREs key?
Through the VRE, groups of users have controlled access to distributed data and services integrated under a personalised interface.
11
Why is sharing through VREs key?
A Virtual Research Environment (VRE) supports cooperative activities:
◦ Metadata cleaning, enrichment, and transformation by exploiting mapping schemas, controlled vocabularies, thesauri, and ontologies;
◦ Process refinement and showcase implementation (restricted to a set of users);
◦ Data assessment (required to make data publicly exploitable by VO members);
◦ Expert user validation of products generated through data elaboration or simulation.
12
Why is sharing through VREs key?
The integrated environment of a VRE offers a set of functionality to support and perform research activities:
◦ the ability to integrate heterogeneous data and services,
◦ the ability to process information on demand and ingest the results,
◦ to share data and processes with other users,
◦ to customise collections of information,
◦ to store user actions and exploit them for further use,
◦ to aggregate relevant information into ad-hoc information sources and keep them updated.
13
Building Virtual Research Environments
[Figure labels: Lifetime & Description; Information Space; Metadata; Functionality; QoS]
14
VRE Facilities
◦ Workspace: a virtual desktop to organize the working environment
◦ Species Maps Generation, Time Series Management: tools supporting specific tasks
◦ Report Management: a virtual live document to describe research results
[Figure labels: Search; Annotation; Visualisation; Storage; Transformation; …]
15
Workspace
A collaboration-oriented suite providing:
◦ seamless access and organisation facilities on a rich array of objects (e.g. Information Objects, Queries, Files, Templates)
◦ mediation between external world objects, systems and infrastructures (import/export/publishing)
◦ support for common file manager operations (drag & drop, contextual menu)
◦ an effective facility for sharing rich objects
16
Species Distribution Maps Generation
AquaMaps is an application* tailored to predict global distributions of marine species. It was initially designed for marine mammals and subsequently generalised to marine species. It generates color-coded species range maps using half-degree latitude and longitude blocks by interfacing several databases and repository providers.
* Algorithm by Kaschner et al. 2006
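To make the half-degree blocks concrete, the sketch below maps a latitude/longitude pair to a block on a global half-degree grid. The row/column numbering (row 0 at the North Pole, column 0 at 180°W) is an assumption chosen for illustration; the slides do not describe how AquaMaps numbers its cells.

```python
import math

CELL_SIZE = 0.5                 # degrees per block, as stated on the slide
COLS = int(360 / CELL_SIZE)     # 720 blocks around the globe
ROWS = int(180 / CELL_SIZE)     # 360 blocks from pole to pole


def half_degree_cell(lat, lon):
    """Return the (row, col) of the half-degree block containing a point.

    Hypothetical indexing: row 0 starts at 90N, col 0 starts at 180W.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude or longitude out of range")
    # min() folds the southern and eastern edges into the last row/column.
    row = min(int(math.floor((90.0 - lat) / CELL_SIZE)), ROWS - 1)
    col = min(int(math.floor((lon + 180.0) / CELL_SIZE)), COLS - 1)
    return row, col
```

Such a grid has 720 × 360 = 259,200 blocks over the whole globe, land included.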
17
Species Distribution Maps Generation
AquaMaps execution is based on the gCube Ecological Niche Modelling Suite, which allows the extrapolation of known species occurrences
◦ to determine environmental envelopes (species tolerances)
◦ to predict future distributions by matching species tolerances against local environmental conditions (e.g. climate change and sea pollution)
Very large volume of input and output data: HSPEC native range 56,468,301 - HSPEC suitable range 114,989,360
Very large number of computations: one multispecies map computed on 6,188 half-degree cells (out of over 170k) and 2,540 species requires 125 million computations
(Eli E. Agbayani, FishBase Project/INCOFISH WP1, WorldFish Center)
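Matching species tolerances against local conditions can be sketched as follows. The trapezoidal response curve and the product over parameters are assumptions used for illustration, a common way to express environmental envelopes; the slides do not spell out the algorithm, and the function and parameter names here are hypothetical.

```python
def suitability(value, lo, pref_lo, pref_hi, hi):
    """Trapezoidal response: 0 outside [lo, hi], 1 inside
    [pref_lo, pref_hi], linear on the two flanks (assumed shape)."""
    if value < lo or value > hi:
        return 0.0
    if value < pref_lo:
        return (value - lo) / (pref_lo - lo)
    if value > pref_hi:
        return (hi - value) / (hi - pref_hi)
    return 1.0


def cell_probability(cell_conditions, envelope):
    """Multiply the per-parameter suitabilities for one cell and one species.

    cell_conditions: parameter name -> local value in the cell
    envelope: parameter name -> (lo, pref_lo, pref_hi, hi) species tolerance
    """
    probability = 1.0
    for param, (lo, pref_lo, pref_hi, hi) in envelope.items():
        probability *= suitability(cell_conditions[param], lo, pref_lo, pref_hi, hi)
    return probability


# Workload figures quoted on the slide.
CELLS, SPECIES = 6_188, 2_540
PAIRS = CELLS * SPECIES  # one (cell, species) pair per call to cell_probability
```

With the slide's figures, `PAIRS` is 15,717,520, so the quoted 125 million computations correspond to roughly eight evaluations per cell-species pair. The slides do not say what those individual evaluations are.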
18
Time Series Management
Offers a set of tools to manage capture statistics
◦ Supports the complete TS lifecycle
◦ Supports validation, curation, and analysis
◦ Provides support for data reallocation
◦ Produces uniform data sets
19
Time Series
Offers a set of tools to operate on capture statistics
◦ Support for multiple key families
◦ Filtering, grouping, and aggregation
◦ Union
◦ Mining
◦ Automatic production of provenance information
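The operations listed on this slide can be illustrated with a small sketch in which every derived series carries a record of the steps that produced it. The `TimeSeries` class, its methods, and the row layout are hypothetical and are not the gCube Time Series API.

```python
from collections import defaultdict


class TimeSeries:
    """Hypothetical capture-statistics table that records its own provenance."""

    def __init__(self, rows, provenance=None):
        self.rows = list(rows)
        self.provenance = list(provenance or [])

    def _derive(self, rows, step):
        # Every operation returns a new series with one more provenance entry.
        return TimeSeries(rows, self.provenance + [step])

    def filter(self, description, predicate):
        kept = [row for row in self.rows if predicate(row)]
        return self._derive(kept, f"filter: {description}")

    def aggregate(self, keys, value_field):
        # Group rows by the key fields and sum the value field per group.
        totals = defaultdict(float)
        for row in self.rows:
            totals[tuple(row[k] for k in keys)] += row[value_field]
        rows = [dict(zip(keys, key), **{value_field: total})
                for key, total in totals.items()]
        return self._derive(rows, f"aggregate {value_field} by {', '.join(keys)}")

    def union(self, other):
        merged = TimeSeries(self.rows + other.rows,
                            self.provenance + other.provenance)
        merged.provenance.append("union")
        return merged
```

Because the original series is never modified, the provenance list of any result can be read as the chain of operations that led to it.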
20
Report Management
A collaboration-oriented suite providing for:
◦ template-oriented, feature-rich and flexible document format definition
◦ effective and infrastructure-integrated report compilation (drag & drop workspace items)
◦ collaborative and distributed editing (workspace based)
◦ standard-based report materialisation (HTML, OpenXML)
21
VREs, Workspaces and Report in Action
22
BEHIND THE SCENES
23
PE2ng Definition
Process Execution Engine (PE2ng, pronounced as 'peng') is a system to manage the execution of software elements in a distributed infrastructure under the coordination of a composite plan that defines the data dependencies among its actors.
Close relatives:
◦ Job Management Systems (Condor)
◦ Distributed Computing Frameworks (MPI, MapReduce)
24
More Info
The motivation for PE2ng is the instantiation of a liberal computational infrastructure that:
◦ Builds on existing infrastructures
◦ Integrates existing technologies
◦ Supports several software paradigms without performance compromises
◦ Provides a powerful, flow-oriented processing model
PE2ng's dual nature:
◦ Coordinator of external computational infrastructures
◦ Native computational infrastructure provider and manager
25
PE2ng and the Cloud
Exploits all modern cloud paradigms (PaaS, SaaS, IaaS)
◦ Provides a PaaS:
  ◦ Based on streams (gCube Resultset – gRS2)
  ◦ Support for dynamic infrastructure reorganisation
    ◦ Decision making offloaded to cloud management
    ◦ Direct interaction with cloud management: under implementation
◦ Supports SaaS via a combination of gCube services
◦ Fits several infrastructures: no built-in dependencies for computation or storage
26
Binding together infrastructures
Single Infrastructure
◦ Utilise capacities to the fullest
◦ Bound "for better or for worse"
◦ Bend business logic to fit
◦ One size fits all?
Infrastructure ecosystem
◦ Don't hide infrastructures
◦ Not yet another layer
◦ Choose infrastructure to fit needs
◦ Turn infrastructure into a utility
Unrestrictive Meta-Infrastructure
◦ Single submission, monitoring, access
◦ Single language for "Programming in the Large" and "Small"
◦ … ?
27
Terms used in PE2ng
◦ Workflow: a high-level plan that binds together conceptual operations for the implementation of a task.
◦ Execution Plan: a plan for the invocation of code components (aka invocables, i.e. services, binary executables, scripts, …) that ensures that prerequisite data are prepared and delivered to their consumers by defining the flow of data and/or control.
◦ Resource: software, data, network, systems…
◦ Registry: a directory service where resources are listed for discovery.
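The notion of an execution plan, where each invocable runs only after its prerequisite data have been produced, can be sketched with a dependency graph. This is a single-process illustration built on Python's standard `graphlib` module, not PE2ng itself: the function `run_plan` and the plan layout are hypothetical, and a real engine would dispatch invocables to distributed nodes.

```python
from graphlib import TopologicalSorter


def run_plan(invocables, dependencies):
    """Run each invocable once the outputs of its prerequisites exist.

    invocables:   name -> callable taking a dict {prerequisite: its output}
    dependencies: name -> set of prerequisite names (missing means none)
    """
    graph = {name: dependencies.get(name, set()) for name in invocables}
    results = {}
    # static_order() yields every node after all of its prerequisites
    # and raises graphlib.CycleError if the plan is circular.
    for name in TopologicalSorter(graph).static_order():
        inputs = {dep: results[dep] for dep in graph[name]}
        results[name] = invocables[name](inputs)
    return results
```

For example, a plan in which `sort` and `count` both consume the output of `fetch`, and `report` consumes both, runs `fetch` first and `report` last, whatever order the steps were declared in.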
28
System Architecture Overview
[Figure: architecture diagram. Recoverable labels: Clients, Applications; Domain Specific Language; Domain-specific Business Logic Layers (Search, Maintenance, Administration, …); Adaptors with Pluggable Domain Logic; Workflow Language; Workflow Engine; Workflow Plan; Execution Plan; Execution Engine; Software / Callable (SOAP calls, Java calls, HTTP API, Shell Invocations, …); Adaptor Specific Resources; Registry; Resource Model; State; Network State; Storage; Transport; Processing; Workflow; Presentation; System; Security; Proxying; Delegation; Query; Invoke; Store; Transfer; Comp. Process]
29
A PE2ng infrastructure
[Figure: PE2ng nodes with a Registry (Resource Model Adaptor) and Storage, connected to other infrastructures: Infrastructure A through a Grid UI and Infrastructure B through a Hadoop gateway, each with its own worker nodes; storage adapters for HDFS, an FTP server, and an SE. Caption: PE2ng Execution "Boundary": the distributed "node" of PE2ng executables.]
30
gCube Data Transformation Service (gDTS)
A service that tackles the transformation of data among various manifestations
Features:
◦ Distributed (PE2ng based)
◦ Manifestation and transformation agnostic
◦ "Intelligent", objective-driven operation
Why is it so important?
◦ It plays a vital role in several data staging steps within the infrastructure
◦ It seems to cover, out of the box, several needs of "interoperability" as conceived by the communities
31
gDTS case: a Transformers Registry
[Figure: a registry of transformers labelled T1 (B, A), T2 (C, A), T3 (D, C), and T4 (E, B under Conf A; C, B under Conf B), with two example chains from Input to Output, one of 3 hops and one of 2 hops. The direction of each transformation is not recoverable from the extracted text.]
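Choosing a chain of transformers with the fewest hops between an input and an output manifestation can be sketched as a breadth-first search over the registry. This is an illustration only: the slides do not describe how gDTS selects a chain, and the function `find_chain` and the registry layout of (name, source, target) triples are hypothetical.

```python
from collections import deque


def find_chain(transformers, source, target):
    """Return the shortest list of transformer names turning source into target.

    transformers: iterable of (name, source_format, target_format) triples
    Returns [] when source == target and None when no chain exists.
    """
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        fmt, chain = queue.popleft()
        if fmt == target:
            return chain
        for name, src, dst in transformers:
            if src == fmt and dst not in seen:
                seen.add(dst)
                queue.append((dst, chain + [name]))
    return None
```

Breadth-first search visits formats in order of hop count, so the first chain that reaches the target is one with the minimum number of hops. A registry that offered both a three-step and a two-step route would yield the two-step one.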
32
VRE Summary
D4Science approach:
◦ Heterogeneous resources are accessible in a common ecosystem of resources regardless of their locations, technologies, and protocols
◦ Different communities have access to different views according to the conditions under which the sharing can occur
◦ Each community can define its own virtual research environment to satisfy specific needs for a limited timeframe and at no cost for the providers of the resources
◦ Several virtual research environments can coexist without interfering with each other, even when competing for the same resources
33
Conclusions
Facts:
◦ Very rich services and data collections are currently maintained by a multitude of authoritative providers
◦ Several standards are adopted in the same domain
◦ Interoperability approaches are key to exploiting such richness
D4Science offers a variety of patterns, tools, and solutions to interconnect
◦ Heterogeneous digital content
◦ Heterogeneous repository systems
◦ Heterogeneous computation platforms
with a rich set of free-to-use tailored services
◦ to decrease the cost of adoption
◦ to reduce the time to market of new ideas
◦ to deal with the plethora of standards
34
Supported Standards
◦ WS-*
◦ WSRF
◦ WS-BPEL
◦ JDL
◦ JSDL
◦ Glue Schema (part)
◦ X-*
◦ DC, TEI, ISO etc.
◦ JSR (several)
◦ GSI-Security
◦ XACML
◦ SAML
◦ OpenSearch
◦ OGC related
Comply with: OAI-PMH, OAI-ORE
35
Supported Standards
WSRF Specifications: WS-ResourceProperties (WSRF-RP); WS-ResourceLifetime (WSRF-RL); WS-ServiceGroup (WSRF-SG); WS-BaseFaults (WSRF-BF)
JSR: 168 (Simple Portlets); 286 (update of 168); 160 (JMX)
WSN Specifications: WS-BaseNotification; WS-Topics (WS-BrokeredNotification); …
WS-* Standards: SOAP; WSDL; WS-Addressing; …
ISO: ISO3166 (countries); ISO4217 (currencies); ISO19115 (geo-location); …
X-*: XML; XSD; XSL; XSLT; XPath; XQuery
OGC: Web Coverage Processing Service; Web Coverage Service; Web Feature Service; Web Map Context; Web Map Service; Web Map Tile Service; Web Processing Service; Web Service Common
OGF Standard: Glue Schema (2); …
Comply with: OAI-PMH, OAI-ORE
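As an illustration of what compliance with OAI-PMH means for a harvester, the sketch below builds a `ListRecords` request URL. The verb and the `metadataPrefix`, `set`, and `from` arguments come from the OAI-PMH protocol. The helper name and the base URL in the example are hypothetical, and the slides do not state the OAI-PMH endpoints exposed by D4Science.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec       # optional selective harvesting by set
    if from_date:
        params["from"] = from_date     # optional lower bound on datestamps
    return f"{base_url}?{urlencode(params)}"


# Example against a placeholder endpoint:
# list_records_url("https://repository.example.org/oai", from_date="2010-12-16")
```

The default `oai_dc` prefix requests unqualified Dublin Core, the one metadata format every OAI-PMH repository is required to support.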
36
Find us
www.gcube-system.org
www.d4science.eu
Donatella Castelli, D4Science-II Project Director, donatella.castelli@isti.cnr.it
Pasquale Pagano, D4Science-II Technical Director, pasquale.pagano@isti.cnr.it
Thank You For Your Attention