VLDATA Common solution for the (very-)large data challenge EINFRA-1, focus on topics (4) & (5)


Objectives
- An open and generic platform supporting efficient and cost-effective solutions for large-scale distributed data processing, curation, analysis and publication, providing standard-based interfaces and interoperable access to various e-Infrastructures.
- Evolution of existing solutions, advancing the state of the art and addressing: openness, extensibility, flexibility, interoperability, scalability, efficiency, productivity, security, cost effectiveness.
- User Community driven co-design, validated by end users and supporting a new generation of data scientists.
- Cooperation among Technology providers, integrating existing technologies to simplify the connection between Users and Resource providers.

(Direct) Impact
- To the Research Infrastructures: scalability, robustness
  - Participating RIs will operate their Distributed Computing Systems efficiently, processing their large volumes of research data and making them available to their end users in a reliable and cost-effective manner that could not be achieved before. This may lead to new ways of organizing science activities and, in turn, to significant scientific breakthroughs.
- To the end user (scientist/operator): simplicity and interoperability
  - By providing important functional components (e.g., pilots, a single interface) missing from existing practices, the VLDATA platform will make possible the transparent integration of resources, hiding the complexity from users and extending the scale of the resources RIs can utilize.
- To funding agencies: cost efficiency
  - Reducing duplicated effort, maximizing the use of EU investment in e-Infrastructures, enlarging the user communities, providing efficient data processing services and integrating state-of-the-art technology will reduce development costs significantly.

Consortium

Make IT simple
- Simplicity: VLDATA provides an abstraction of the different Resources, which are all made accessible to the end user via the same interfaces.
- Transparency: Users can specify their Workflows/Pipelines at different levels of abstraction. The platform takes care of the Resource Allocation needed to fulfill the required specifications.
- Extensibility and flexibility: VLDATA provides an API that allows users to extend the provided functionality by developing new or customized components.
- Reliability: Quality standards and extensive validation in several scientific domains ensure the readiness-to-use and robustness of VLDATA-based solutions.
- Scalability: Modular implementation allows horizontal (number of connected Resources or Users) and vertical (number of processed Units) scaling to adapt VLDATA to the needs of each particular community or Research Infrastructure project.
- Smart and intelligent: Building on collected experience and monitoring data, algorithms can look for optimized scheduling/searching strategies, including automated decision making based on usage traces and expectations.
- Cost-effective: Building on existing well-established solutions and incrementally extending them to address new challenges yields an evolving, validated common solution and avoids unnecessary duplicated effort.
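The "same interfaces" and "Resource Allocation" ideas above can be illustrated with a minimal Python sketch. Every name in it (Task, Resource, GridSite, CloudProvider, allocate) is hypothetical and invented for illustration; none of it is the VLDATA or DIRAC API, and the first-fit allocation is only a stand-in for whatever strategy the platform would actually use.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Task:
    """A unit of work to be processed (hypothetical fields)."""
    name: str
    cpu_cores: int


class Resource(ABC):
    """Common interface that every backend type exposes to the user."""

    def __init__(self, name: str, free_cores: int) -> None:
        self.name = name
        self.free_cores = free_cores

    def can_run(self, task: Task) -> bool:
        # A resource qualifies if it has enough free cores for the task.
        return self.free_cores >= task.cpu_cores

    @abstractmethod
    def submit(self, task: Task) -> str:
        """Hand the task to the backend and return a job identifier."""


class GridSite(Resource):
    def submit(self, task: Task) -> str:
        # A real backend would talk to grid middleware here (assumption).
        self.free_cores -= task.cpu_cores
        return f"grid:{self.name}:{task.name}"


class CloudProvider(Resource):
    def submit(self, task: Task) -> str:
        # A real backend would start a virtual machine here (assumption).
        self.free_cores -= task.cpu_cores
        return f"cloud:{self.name}:{task.name}"


def allocate(task: Task, resources: list[Resource]) -> str:
    """First-fit allocation: submit to the first resource that can run the task."""
    for resource in resources:
        if resource.can_run(task):
            return resource.submit(task)
    raise RuntimeError(f"no resource can run task {task.name!r}")


if __name__ == "__main__":
    pool = [GridSite("site-a", 2), CloudProvider("cloud-b", 8)]
    # The caller never needs to know which kind of resource ran the task.
    print(allocate(Task("reco", 4), pool))
    print(allocate(Task("sim", 2), pool))
```

The caller only sees `allocate` and `Task`; adding a new resource type means adding one subclass, which is the kind of extension point the Extensibility bullet describes.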

VLDATA (architecture diagram)
- Framework: System Logging, Configuration, Accounting, Monitoring
- Advanced Modules: Compute & Data Management
- Basic Modules: File Catalog, Resource Status, Request Management, Workload Management
- User Interfaces: Portals, Command Line, REST, APIs
- Resource Interfaces: Grids, Clouds, Clusters; Computing, Storage; Public, Private, Volunteer
- Requirements
- Quality & Security

Organization

Development Area (WP 1-5)
- Main Partners: AMC, Cardiff, CPPM, CYFRONET, DESY, MTA SZTAKI, UAB, UB, Westminster
- Working Cycle: Requirement Analysis -> Design -> Development -> Integration -> Quality Control
- Work Plan:
  - Year 0: Prototype
  - Year 1: Scaling + Integration
  - Year 2: Catalogs + Quality
  - Year 3: Virtualization + Security
  - Year 4: New Challenges + Consolidation
- Sustainability:
  - Open Source Distributed Data Processing Collaboration: Open DISData Association
  - 4-year decreasing budget for the Development Area

Validation Area (WP 6)
- Each participating RI produces and validates its own solution using the common framework and tools
- Main Partners:
  - LHCb, Belle II, BES III, Pierre Auger Observatory, EISCAT_3D, Astrophysics, Computational Chemistry, Molecular Structure simulation, Seismology
  - (TBC) IceCube, COMPASS, NA62, CTA
  - SMEs
- Working Cycle: Design -> Integration -> Validation
- Transversal activities: sharing experience, training, tools, etc.

Exploitation Area (WP 7-9)
- Target: sustainability
- Main partners: CNRS, UvA, EGI, ASCAMM, ETL, Bull
- From “DIRAC Consortium” towards “Open Distributed Data Processing Collaboration”
- Work Packages:
  - Communication
  - Training
  - Sustainability

Outputs & Inputs (diagram)
- This Consortium: WP1; WP2, 3, 4, 5; WP6; WP7, 8, 9
- Experts, Resource Providers, Other Projects, Other RIs, Existing Products, Policy Makers

Working model
- User community driven co-development (Rapid Application Development):
  - Open, iterative, incremental and parallel, requirement-driven development process

Abstract
The proposed project aims to produce and validate common solutions for the processing, curation, analysis and publication of very large scientific data generated by European and world-wide scientific Research Infrastructures (RIs). The number of RIs in Europe and beyond expected to collect multi-petabyte data samples every year is increasing exponentially, and they will soon reach the exascale. Existing solutions must evolve in order to cope with large-scale distributed data processing.

The VLDATA platform will provide standard-based interoperable access to various types of resources: Grid, Cloud, Volunteer, HPC, etc. (funded with different models, capex or opex, and coming from the public or private sector), and software tools/services running on top to support global data science. Various RIs from different scientific domains, from physics to life sciences to chemistry, will validate the VLDATA platform by implementing solutions for their concrete use cases, achieving at the same time a significant optimization in efficiency and cost, and ensuring that no aspect of the challenge is ignored. The complete life-cycle of the data will be addressed, as well as interoperability between different scientific domains and e-Infrastructures.

The project gathers experts with complementary backgrounds from the Technology and the Resource Provider worlds who, collaborating with other relevant external experts and those from the participating RIs, will: (1) analyse the requirements of each of the RIs, (2) provide the VLDATA generic platform, starting from current solutions and following an incremental iterative development model, (3) design, prototype and implement the Distributed Computing Systems for each of the project's participating RIs, and (4) make the resulting VLDATA platform available to other RIs with similar needs.

To reach other RIs, VLDATA will promote standard interfaces and tools, define appropriate quality assurance mechanisms and provide dissemination and training events, aiming to be sustained in the long run by contributions from new RIs benefitting from the VLDATA platform.