Earth Server EU FP7-INFRA project
Scalability for Big Data
Roberto Barbera - University of Catania and INFN - Italy
Rome - 24 January 2014
Big Data infrastructures’ layout
[1] (*) EC Directorate General CNECT - Directorate C: Excellence in Science - Unit C1 - e-Infrastructures - «Research Data e-Infrastructures: Framework for Action in H2020»
Common Data Services - Evolution of distributed computing and storage
[Timeline figure, axis: Time] Mainframe Computing, 80’s-90’s, Cluster Computing, 90’s-00’s, Grid Computing (e.g., EGI), 00’s-10’s, Cloud Computing (e.g., EGI FedCloud)
[Figure labels] Cost of hw, Cost of networks, Power of COTS, WAN bandwidth
A Big Data e-Infrastructure ready for H2020 should be standard-based, scalable, computing-model-agnostic and interoperable
Combine everything together and get a new buzzword: Jungle Computing (*)
(*) B. Kahanwal and T. P. Singh, “The Distributed Computing Paradigms: P2P, Grid, Cluster, Cloud, and Jungle”, International Journal of Latest Research in Science and Technology, Vol. 1, Issue 2, Page , July-August (2012), ISSN (Online):
Scalability
A pertinent definition:
◦ “In electronics (including hardware, communication and software), scalability is the ability of a system, network, or process to handle a growing amount of work in a capable manner or its ability to be enlarged to accommodate that growth” [1]
...and several implementations:
◦ Scalability across infrastructure models
◦ Scalability of software
◦ Scalability of potential users
◦ Scalability across data types and clients
◦ Scalability across platforms
◦ Scalability across services and standards
[1] Bondi, André B. (2000), “Characteristics of scalability and their impact on performance”, Proceedings of the second international workshop on Software and performance - WOSP '00, p. 195, doi: / , ISBN X.
Scalability of potential users: Science Gateways
Scalability of potential users: Identity Federations
Scalability across infrastructure models
Scalability across platforms
[Figure labels] Cloud, FedCloud; IT, ES, EG, ZA, CZ, GR
8 clouds, 6 countries, 3 m/w stacks, 1 SME
Scalability across platforms
Current functionalities:
◦ Federated authentication
◦ Fine-grained authorisation
◦ Single/multi-deployment of VMs on a cloud and across clouds
◦ Single/multi-move of VMs across clouds
◦ Single/multi-deletion of VMs on a cloud and across clouds
◦ SSH connection to VMs
◦ Direct web access to VMs hosting web services
Fine-grained authorisation
Scalability across services and standards
Scalability across infrastructure models and clients
[Architecture diagram labels] eToken service; Front-ends; glibrary.ct.infn.it; REST API; AuthN / AuthZ; Science Gateway; User Tracking DB; Call gLibrary REST API through API Server; Gateway; Metadata Service; Local storage; Grid storage; Cloud Storage; Authorization service; Authentication service
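The request flow suggested by the diagram labels (a front-end calls a REST API, the call is authenticated and authorised, and the data is then served from local, grid or cloud storage) can be modelled in a few lines. This is a minimal in-memory sketch, not the gLibrary implementation: the class names, the token strings, the permission table and the file paths are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


class AuthError(Exception):
    """Raised when authentication or authorisation fails."""


@dataclass
class StorageBackend:
    """Hypothetical stand-in for a local, grid or cloud storage element."""
    name: str
    files: dict = field(default_factory=dict)

    def read(self, path: str) -> bytes:
        return self.files[path]


@dataclass
class ApiServer:
    """Hypothetical API server: authenticate, authorise, then dispatch."""
    tokens: dict       # token -> user (plays the authentication service)
    permissions: dict  # user -> set of readable repositories (authorisation service)
    catalogue: dict    # (repository, entry) -> (backend name, path) (metadata service)
    backends: dict     # backend name -> StorageBackend

    def download(self, token: str, repository: str, entry: str) -> bytes:
        user = self.tokens.get(token)
        if user is None:
            raise AuthError("unknown token")
        if repository not in self.permissions.get(user, set()):
            raise AuthError(f"{user} may not read {repository}")
        # The caller never learns which kind of storage holds the bytes.
        backend_name, path = self.catalogue[(repository, entry)]
        return self.backends[backend_name].read(path)


def build_demo_server() -> ApiServer:
    """Assemble a toy server with two made-up users and two backends."""
    grid = StorageBackend("grid", {"/repo1/a.dat": b"grid-bytes"})
    cloud = StorageBackend("cloud", {"/repo2/readme.txt": b"cloud-bytes"})
    return ApiServer(
        tokens={"tok-alice": "alice", "tok-bob": "bob"},
        permissions={"alice": {"repo1", "repo2"}, "bob": {"repo2"}},
        catalogue={
            ("repo1", "a"): ("grid", "/repo1/a.dat"),
            ("repo2", "readme"): ("cloud", "/repo2/readme.txt"),
        },
        backends={"grid": grid, "cloud": cloud},
    )
```

In this sketch a client only supplies a token, a repository and an entry name; the metadata lookup decides which storage backend answers, which is one way to keep the client unchanged as storage models are added.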
Scalability across data types and e-infrastructure models - the ESA MERIS repository
Scalability across data types and clients - the unified mobile client
[Architecture diagram labels] glibrary.ct.infn.it; REST API; AuthN / AuthZ; Unified mobile client; Local storage; Grid storage; Cloud Storage; Authorization service; Authentication service; Provider’s storage; …
Scalability across platforms (using Appcelerator Titanium)
Unified view of repositories
Hierarchical filtering
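Hierarchical filtering can be read as: each attribute the user picks narrows the result set, and the values offered at the next level are computed only from the entries that survive. The following is a minimal sketch of that idea under stated assumptions; the function names, the attribute names (`repository`, `year`, `type`) and the sample records are hypothetical and do not come from the client shown in the slides.

```python
from collections import Counter


def apply_filters(records, selected):
    """Keep the records that match every (attribute, value) pair chosen so far."""
    return [
        r for r in records
        if all(r.get(attr) == value for attr, value in selected.items())
    ]


def next_choices(records, selected, attribute):
    """Count the values of `attribute` among records that survive `selected`."""
    survivors = apply_filters(records, selected)
    return Counter(r[attribute] for r in survivors if attribute in r)


# Made-up metadata entries, for illustration only.
RECORDS = [
    {"repository": "repo-a", "year": 2011, "type": "image"},
    {"repository": "repo-a", "year": 2012, "type": "image"},
    {"repository": "repo-a", "year": 2012, "type": "text"},
    {"repository": "repo-b", "year": 2012, "type": "text"},
]
```

For example, `next_choices(RECORDS, {}, "repository")` gives the top-level menu with entry counts, and `next_choices(RECORDS, {"repository": "repo-a"}, "year")` gives the second-level menu restricted to that repository.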
Browse & download
The eCSG Mobile «in action» (data browsing and download)