1
D4Science technical features and opportunities of the grid infrastructure for large-scale data management
Pasquale Pagano, D4Science Technical Director, National Research Council (ISTI-CNR)
www.d4science.org, www.d4science.eu
2
2 A closer look at gCube technology
The talk is not about this:
- gCube architectural overview: services, layers & specifications
- Core Services
- Information Organisation Services
- Information Retrieval Services
- Presentation Services
3
3 Outline
- D4Science Mission
- D4Science World: E-Infrastructure, Technology, Service
- D4Science Exploitation: Data Management, VREs
- Summing Up
D4Science technical features
4
4 D4Science mission
To provide a scientific e-Infrastructure that removes all heterogeneity, sustainability, scalability, and other technical concerns from the minds of scientists, hides all related complexities from their perception, and enables them to focus on their science and collaborate on common research challenges.
gCube is a framework for managing e-Infrastructures in which it is possible to define, host, and maintain dynamic Virtual Research Environments (VREs) capable of satisfying the collaboration needs of distributed Virtual Organizations (VOs).
5
5 From a testbed to a production ecosystem
- Diligent (Oct. ’04 to Nov. ’07), testbed: empower the grid middleware to manage data and metadata as primary resources and to virtualise the VO environment => gCube 0.9 => testbed
- D4Science (Jan. ’08 to Dec. ’09), production: stabilize gCube by supporting two large user communities, FARM and EM => gCube 1.5 (stable and open source) => D4Science e-Infrastructure
- D4Science II (Oct. ’09 to Sept. ’11), production: promote interoperability across e-Infrastructures by empowering large user communities => gCube 2.0 => D4Science ecosystem
6
6 From a testbed to a production ecosystem
[Figure: functionality of gLite and gCube across the project timeline: Diligent (Oct. ’04 to Nov. ’07), D4Science (Jan. ’08 to Dec. ’09), D4Science II (Oct. ’09 to Sept. ’11)]
7
7 D4Science: a Threefold World
Infrastructure, Technology, Services
8
8 Infrastructure vs. e-Infrastructure
An infrastructure is the set of basic physical and organizational structures and facilities (roads, power supplies, ...) needed for the operation of a society or enterprise.
The D4Science e-Infrastructure supports the effective consumption of shared resources: hardware-bound resources (i.e. networks, storage, instruments, and computational resources), system-level software resources (i.e. basic middleware services), and application-level software resources (i.e. data sources and services).
9
9 Infrastructure vs. e-Infrastructure
An infrastructure:
- connects remote places by providing facilities to assist supported resources and consumers
- has policies
The D4Science e-Infrastructure:
- enables scientific communities to cooperate within a coherent model, regardless of the location of their research facilities
- enforces policies
10
10 D4Science e-Infrastructure (1/2)
Facilitate the life of scientists by hiding the complexity.
[Figure labels: D4Science e-Infrastructure; data analysis; 50 Gb]
11
11 D4Science e-Infrastructure (2/2)
Facilitate the life of scientists by supporting collaboration.
[Figure labels: D4Science e-Infrastructure; share; access; 50 Gb]
12
12 D4Science as e-Infrastructure: Key Features
The D4Science e-Infrastructure provides scientists with:
- easy-to-use tools for infrastructural resource registration and management
- cost-effective tools for data resource registration, metadata generation, and curation
- seamless access to shared, distributed, and heterogeneous resources organized in dynamically created Virtual Research Environments
13
13 e-Infrastructure Resources
The D4Science managed resources are:
- Hardware: storage, computing
- gCube Container
- Services & applications: gCube Web Services, external software
- Collections & related resources: data, metadata, indexes, annotations, schemas, mappings, transformation programs
14
14 e-Infrastructure Resources [cont.]
15
15 e-Infrastructure
[Figure labels: Site A, Site B, Site C]
16
16 Virtual Organization
A Virtual Organization (VO) specifies how a set of users can access a set of resources by defining what is shared, who is allowed to share, and the conditions under which sharing occurs, and by enforcing the authentication and authorization policies.
[Figure label: VO]
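The VO definition above can be read as a small access-control model. The following Python sketch is illustrative only: the class, field, and method names are hypothetical and are not taken from gCube. It simply encodes the three ingredients named on the slide (what is shared, who is allowed to share, and the conditions of sharing).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VirtualOrganization:
    """Hypothetical model of a VO; not the gCube implementation."""

    name: str
    members: set[str] = field(default_factory=set)      # who is allowed to share
    resources: set[str] = field(default_factory=set)    # what is shared
    # Conditions of sharing: resource id -> actions permitted to members.
    conditions: dict[str, set[str]] = field(default_factory=dict)

    def is_allowed(self, user: str, resource: str, action: str) -> bool:
        """Authorize only members, on shared resources, for permitted actions."""
        return (
            user in self.members
            and resource in self.resources
            and action in self.conditions.get(resource, set())
        )


vo = VirtualOrganization(
    name="example-vo",
    members={"alice"},
    resources={"dataset-1"},
    conditions={"dataset-1": {"read"}},
)
print(vo.is_allowed("alice", "dataset-1", "read"))   # member, permitted action
print(vo.is_allowed("alice", "dataset-1", "write"))  # action not permitted
print(vo.is_allowed("bob", "dataset-1", "read"))     # not a VO member
```

A real VO would also authenticate the user before this check; the sketch covers only the authorization decision.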
17
17 Virtual Research Environment (1/3)
VRE scenarios:
- Data needs to be assessed before it is made publicly exploitable by the VO members.
- A restricted set of users has to collaborate to refine processes and implement showcases.
- Products generated through the elaboration of data or simulation have to be validated by expert users.
Is the VO adequate to represent a growing aggregation of resources tailored to satisfy the evolving needs of the user community? No, it is not!
18
18 Virtual Research Environment (2/3)
A Virtual Research Environment (VRE) is a distributed and dynamically created environment where a subset of resources can be assigned to a subset of users for a limited timeframe.
VRE resources can be published in the VO at any time by the VRE data managers.
[Figure labels: VO, VRE 1, VRE 2]
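As a rough illustration of the VRE definition on this slide, the sketch below models a VRE as a time-limited slice of a VO whose data managers may publish VRE-local resources back to the VO. All names (`VO`, `VRE`, `can_access`, `publish`, `local_resources`) are assumptions made for this example and do not reflect the gCube API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class VO:
    users: set[str]
    resources: set[str]


@dataclass
class VRE:
    """Hypothetical model: a subset of VO users and resources for a limited timeframe."""

    vo: VO
    users: set[str]
    resources: set[str]
    managers: set[str]
    start: datetime
    end: datetime
    # Resources produced inside the VRE and not (yet) visible in the VO.
    local_resources: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # A VRE may only draw on the users and resources of its VO.
        if not self.users <= self.vo.users:
            raise ValueError("VRE users must be VO members")
        if not self.resources <= self.vo.resources:
            raise ValueError("VRE resources must belong to the VO")
        if not self.managers <= self.users:
            raise ValueError("VRE data managers must be VRE users")

    def can_access(self, user: str, resource: str, when: datetime) -> bool:
        """Access requires VRE membership, a visible resource, and an open timeframe."""
        visible = resource in self.resources or resource in self.local_resources
        return self.start <= when <= self.end and user in self.users and visible

    def publish(self, manager: str, resource: str) -> None:
        """A VRE data manager makes a VRE-local resource available to the whole VO."""
        if manager not in self.managers:
            raise PermissionError("only VRE data managers can publish")
        if resource not in self.local_resources:
            raise KeyError(resource)
        self.vo.resources.add(resource)


vo = VO(users={"alice", "bob", "carol"}, resources={"r1", "r2"})
vre = VRE(
    vo=vo,
    users={"alice", "bob"},
    resources={"r1"},
    managers={"alice"},
    start=datetime(2009, 1, 1),
    end=datetime(2009, 6, 30),
    local_resources={"draft-map"},
)
print(vre.can_access("bob", "r1", datetime(2009, 3, 1)))  # inside the timeframe
print(vre.can_access("bob", "r1", datetime(2009, 7, 1)))  # timeframe expired
vre.publish("alice", "draft-map")
print(sorted(vo.resources))
```

Keeping `local_resources` separate from `resources` mirrors the first scenario of the previous slide: data can be assessed inside the VRE before it becomes visible to all VO members.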
19
19 Virtual Research Environment (3/3)
A Virtual Research Environment (VRE) supports cooperative activities such as:
- data analysis and processing
- data generation, integration, enrichment, and curation
- production of new knowledge using specialized tools
20
20 Infrastructure, Virtual Organisation and VRE
[Figure labels: Infrastructure, VO, VRE]
21
21 D4Science as Technology: Key Features
- gCube Core (gCore) simplifies and standardizes all systemic aspects of service development and promotes the adoption of best practices in multiprogramming and distributed programming.
- gCube Enabling Services lift the Grid approach for batch job execution and resource sharing to Web Services deployment and invocation in a SOA-empowered e-Infrastructure.
22
22 gCore: innovation in developing
- An initiative to reduce complexity in the design and implementation of gCube services: an application framework for the consolidation of existing services and the development of new ones, the gCube Core Framework (gCF).
- An initiative to meet the needs of system administrators, infrastructure managers, and resource providers: an easy-to-install, self-contained sandbox for participating in the D4Science-empowered e-Infrastructure, the gCube Core Distribution (gHN).
23
23 gCube Enabling Services – IS
gCube provides an Information and Monitoring System where a rich set of resources, including computing, storage, service, data, metadata, and applications, can be, independently of their type:
- registered, discovered, and accessed
- monitored, shared in a controlled way, and accounted
Is a simple registry sufficient to manage a growing set of heterogeneous resources? No, it is not!
24
24 gCube Enabling Services – IS [cont.]
The gCube Information System collects information about the capabilities and status of all resources:
- Glue schema for computational and storage resources
- profiles for gCube services and their running instances
- profiles for content and metadata collections
It currently manages more than 100 M operations per year, serving more than 300 web services.
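To make the register / discover / monitor cycle of an information system concrete, here is a minimal in-memory sketch. It is an assumption-laden toy, not the gCube Information System: `Profile`, `InformationSystem`, and their fields are hypothetical names, and the real system relies on the Glue schema and gCube profiles rather than this flat record.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Profile:
    """Hypothetical resource profile: capabilities plus current status."""

    resource_id: str
    resource_type: str  # e.g. "service", "running-instance", "collection"
    status: str = "unknown"
    capabilities: dict[str, str] = field(default_factory=dict)


class InformationSystem:
    """Toy registry keyed by resource id (illustrative only)."""

    def __init__(self) -> None:
        self._profiles: dict[str, Profile] = {}

    def register(self, profile: Profile) -> None:
        self._profiles[profile.resource_id] = profile

    def update_status(self, resource_id: str, status: str) -> None:
        # Monitoring: keep the published status in line with the observed one.
        self._profiles[resource_id].status = status

    def discover(
        self,
        resource_type: str,
        predicate: Callable[[Profile], bool] = lambda p: True,
    ) -> list[Profile]:
        # Discovery: filter by type, then by an arbitrary condition on the profile.
        return [
            p
            for p in self._profiles.values()
            if p.resource_type == resource_type and predicate(p)
        ]


info = InformationSystem()
info.register(Profile("search-1", "running-instance", "ready", {"service": "search"}))
info.register(Profile("search-2", "running-instance", "down", {"service": "search"}))
info.register(Profile("maps", "collection", "ready"))

ready = info.discover("running-instance", lambda p: p.status == "ready")
print([p.resource_id for p in ready])

info.update_status("search-2", "ready")
ready = info.discover("running-instance", lambda p: p.status == "ready")
print([p.resource_id for p in ready])
```

For a sense of scale, 100 M operations per year corresponds to an average of roughly 3 operations per second (100,000,000 / 31,536,000 s).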
25
25 gCube Information System
[Figure labels: gHN embedded, Mandatory]
26
26 gCube Enabling Services – dynamic application building
The gCube VRE Management System:
- manages services and applications
- reduces deployment costs
- reduces operational costs and application porting timeframes
- grants execution only to certified software
It reduces the costs related to e-Infrastructure ownership, maintenance, and upgrade without compromising the essence of secure sharing.
[Figure labels: VO, VRE 1, VRE 2]
27
27 gCube VRE Management System
[Figure labels: gHN embedded, Mandatory]
28
28 D4Science as a Services Provider: Key Features
- gCube Service Frameworks: a tailored set of services to effectively manage all resources by providing seamless discovery, access, and retrieval of data, metadata, and annotations through a variety of tools and protocols.
- gCube Documentation: a tailored set of manuals to maximise the exploitation of the functionality by users, developers, and system administrators.
29
29 gCube Services – powerful information model
The gCube Data Management System:
- persistently stores compound objects
- manages heterogeneous metadata
- supports metadata cleaning, enrichment, and transformation by exploiting mapping schemas, controlled vocabularies, thesauri, and ontologies
- supports programmatic/manual annotation of content, e.g. data provenance
- supports content linking
- provides support for collections
- supports collection sharing across VREs
[Figure labels: describe, similar to, aggregate; collections C1, C2, C3; VRE 1, VRE 2]
30
30 gCube Data Management System
31
31 gCube Services – flexible IR
gCube Search Management:
- provides an XML-based query language over full text, geospatial, and temporal information
- maximizes the usefulness of resources available to VRE users by promoting resource sharing and avoiding suboptimal usage
- combines information retrieval and data processing capabilities
32
32 gCube Search Management
Search types:
- structured data (fielded search / XML search)
- semi-structured data (XML search)
- geospatial / temporal data (R-Tree)
- content-based search
- full text search
- image similarity search
Access:
- XML-based query language
- web user interface (portal / search portlets)
- command line UI
Retrieval:
- incremental result delivery
- automatic caching
- result persistence
33
33 gCube Services – collaboration
Collaborative Environment: a workspace where users can
- share private data, data process results, annotations, process definitions, and derived data
- collaborate to define new document templates and new documents, to tune applications and processes, and to compare execution results
- …
The workspace contains both objects owned by the workspace owner and objects the workspace owner has been allowed to see, e.g. group objects.
It opens unique opportunities for virtual collaborations.
34
34 Exploiting D4Science: Data Management
35
35 Data Resources Staging
[Figure labels: Analysis, Modelling, Generation & Curation, Registration]
36
36 Data Resources Staging I
Example               Protocol      Metadata         Data                  Restriction
EEA                   HTTP          to be generated  web pages scavenging  N/A
AATSR                 FTP           to be generated  download              N/A
AquaMaps              Database      to be generated  grid jobs             N/A
NASO                  RSS Feed                       download              N/A
Landsat7              GridFTP (SE)  to be generated  grid jobs             ESA Site
MERIS L3 Chlorophyll  GridFTP (SE)  to be generated  download              N/A
Specific Reports      File System   to be generated  download              N/A
…                     …
[Figure labels: Analysis, Modelling, Generation & Curation, Registration]
37
37 Data Resources Staging II
[Figure: an AquaMaps IO (global, Asia, Indian Ocean; alternative views: a 2D map and 13 global 3D views) and an EEA Report IO (multimedia & multi-part: report, part, talk, data, cover, URI)]
38
38 Data Resources Staging III
AquaMaps IO:
- descriptive metadata in a proprietary format
- data and metadata generated by filtering and rendering relational DB data
- standard classification (Phylum – Class – Order – Family – Species)
- data provenance injected
EEA Report IO:
- descriptive metadata in DC
- data and metadata generated by web page scavenging
- data provider classification (e.g. Agriculture, Land use)
- data provenance injected
39
39 Data Resources Staging IV
Staging is:
- based on a scripting language abstracting over the powerful gCube data model
  - 3 object types: {collection, resource, relationship}
  - each object has a set of properties
  - each object has a unique “external identifier”
- equipped with common data manipulation constructs, e.g. XSLT, XPath
- provided with predefined data and metadata importers hiding infrastructure complexities
Very compact workflow specifications.
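The three object types of the staging data model can be pictured with a few lines of Python. This is only a sketch of the shape described on the slide (typed objects, a property set, a unique external identifier); it is not the gCube scripting language, and every name in it (`StagedObject`, `StagingModel`, the `member-of` relationship kind, the sample identifiers) is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class StagedObject:
    """Common shape: a unique external identifier plus a set of properties."""

    external_id: str
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Collection(StagedObject):
    pass


@dataclass
class Resource(StagedObject):
    pass


@dataclass
class Relationship(StagedObject):
    source_id: str = ""
    target_id: str = ""
    kind: str = ""  # hypothetical label, e.g. "member-of"


class StagingModel:
    """Holds staged objects and enforces uniqueness of external identifiers."""

    def __init__(self) -> None:
        self._objects: dict[str, StagedObject] = {}

    def add(self, obj: StagedObject) -> StagedObject:
        if obj.external_id in self._objects:
            raise ValueError(f"duplicate external identifier: {obj.external_id}")
        self._objects[obj.external_id] = obj
        return obj

    def members_of(self, collection_id: str) -> list[StagedObject]:
        # Follow "member-of" relationships that point at the given collection.
        return [
            self._objects[rel.source_id]
            for rel in self._objects.values()
            if isinstance(rel, Relationship)
            and rel.kind == "member-of"
            and rel.target_id == collection_id
        ]


model = StagingModel()
model.add(Collection("col:maps", {"title": "Species maps"}))
model.add(Resource("res:map-001", {"format": "image/png"}))
model.add(Relationship("rel:1", source_id="res:map-001",
                       target_id="col:maps", kind="member-of"))
print([obj.external_id for obj in model.members_of("col:maps")])
```

Treating relationships as first-class objects with their own identifiers and properties is what lets annotations such as provenance be attached to a link as well as to a resource.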
40
40 Exploiting D4Science: The VREs
41
41 AquaMaps
- Grid implementation of the current AquaMaps.org approach
- Benefits from the computing capabilities
- Adds advanced filtering
- Manages integration of different data sources
- Generates provenance data
- 5 seconds to generate an AquaMaps object
- Up to hundreds of concurrent generations
- Bulk support
- Still to come: a facility to compare maps
42
42 FCPPS VRE
- Provides support for the generation of fisheries and aquaculture country reports
- Uses annotations as a means for the editors to communicate on specific topics and sections
- Supports aggregation of evolving data
- Enriched with a rich set of metadata
- Generates provenance data
- HTML publishing with a variety of XSLT
- OpenXML export
- Text, images, time series
43
43 ICIS VRE
- Offers a set of tools to manage capture statistics
- Supports the complete time series (TS) lifecycle
- Supports validation, curation, and analysis
- Provides support for data reallocation
- Produces uniform data sets
- Generates provenance data
- Multiple key families support
- Filtering, grouping, and aggregation
- Union
- Still to come: facilities to perform complex reallocation rules
- Still to come: facilities to compare large TSs
44
44 SUMMING UP
45
45 Exploitation Models
A new user community can exploit gCube / D4Science:
- By creating a new infrastructure
  - different communities can run their own infrastructure
  - the new community provides all resources
- By joining the D4Science infrastructure
  - the production infrastructure currently serves two user communities (Earth Monitoring and Fisheries Management)
  - the new community provides part of the resources
46
46 VOs & VREs building
A VRE brings together different types of resources through a well-defined, cost-effective process by offering a rich variety of functionality to access and exploit them.
The creation of the community environment is simple and easy:
- a new VO can join one infrastructure in less than 1 day
- a new VRE can be deployed in less than 1 hour
- many automatic deployment & configuration operations are managed via the gCube Portal
47
47 D4Science & the Grid
Grid is controlled sharing of computing and storage facilities.
D4Science provides controlled sharing of:
- computing and storage facilities
- services and applications
- data, metadata, and related resources
to offer control-oriented and cross-domain content-oriented applications to store, describe, curate, annotate, search, select, merge, and transform heterogeneous information, in the landscape of an on-demand created collaborative environment (VRE).
48
48 gCube Specifications, Standards & Technologies
- WS-*, WSRF, X-*, WS-BPEL, JSR, Glue Schema, GSI-Security, Java, Globus Toolkit, gLite
- More exploited: DC, ISO19*
- More coming: OAI-PMH & OAI-ORE, WS-DAI, OpenSearch, OpenGIS-related
https://quality.wiki.d4science.research-infrastructures.eu/quality/index.php/Standards
49
49 QUESTIONS?
The gCube Technology is open source.
50
50 gCube Main Links
- gCube software: http://software.d4science.research-infrastructures.eu/
- gCube Administrator Guide: https://wiki.gcore.research-infrastructures.eu/gCube/index.php/Administrator_Guide
- gCube User Guide: https://technical.wiki.d4science.research-infrastructures.eu/documentation/index.php/User%27s_Guide
- gCube Developer Guide: https://technical.wiki.d4science.research-infrastructures.eu/documentation/index.php/Developer%27s_Guide
51
51 gCube License I
EUPL, to simplify the circulation of the software, its maintenance, and distribution:
- Full ownership of the software and guarantee that copyright is publicly known.
- Distribution of improvements has to return to the author for free.
- Compatible with the specificity and diversity of Member States' law and Community law: copyright terminology, information, warranty, liability, applicable law, and jurisdiction.
- Downstream compatible with the most relevant other licenses, e.g. GPL.
52
52 gCube License II
EUPL licensing makes software Open Source (or, more generally, “Free / Libre / Open Source Software”, FLOSS) because the EUPL ensures the following rights to the licensee:
- obtain the source code from a free access repository
- modify the software, and/or make derivative works out of it
- reproduce (copy, duplicate) the software
- use the software in any circumstance and for all usage
- communicate the software to the public by using it through a public network or by distributing services based on it
- distribute the software or copies thereof to other users
- lend and rent the software or copies thereof
- sub-license rights in the software or copies thereof
53
53 gCube by numbers
gCube is inherently complex & large due to:
- the vast functional domain it covers
- the required abstractions allowing collaborative development and openness
Distinguished build elements as of release v1.5.0:
- 76 services and associated libraries
- 48 portlets & servlets
- 153 distribution packages
- corresponding test suites, service stubs, archives, …
Code size: 799 packages, 4,406 classes, 30,285 methods, 305,039 NCSS
Building blocks’ characteristics:
- highly sophisticated, composite sub-systems (accounting for more than 80%)
- 3 large frameworks (for building higher-level elements)
Implementation team size: constantly more than 20 developers & designers