Common solution for the (very-)large data challenge. VLDATA Call: EINFRA-1 (Focus on Topics 4-5) Deadline: Sep. 2nd 2014.

1.1 Objectives
The mission of VLDATA is to provide common solutions for handling large and extremely large scientific data in a cost-effective way. This solution builds on existing pan-European e-Infrastructures and tools to provide an interoperable, efficient and sustainable platform for scientific user communities, in particular to support a new generation of data scientists. The success of this project will secure European leadership in the development and support of big data and global data science and will therefore contribute to the leadership of European scientists and enterprises in many research and innovation fields.

Objectives (I)
O1: A flexible and extensible platform supporting common solutions for large-scale distributed data processing and analysis, ensuring interoperability among existing e-Infrastructure providers.
–O1.1 (WP2,3,4,5): provide a common solution using generic e-Infrastructures for processing large or extremely large volumes of scientific data in a robust, efficient and cost-effective way.
–O1.2 (WP6): provide a flexible and customizable platform that can be extended to cover the specific requirements of each community.

Objectives (II)
O2: Standardized solutions aiming at global interoperability of open access for large-scale data processing, minimizing unnecessary large transfers.
–O2.1 (WP2): provide a common language and standard for handling large volumes of data.
–O2.2 (WP2,3,4,5): improve the efficiency of distributed data processing by providing a smart data and computing management platform.
–O2.3 (WP2,3,4,5): enable effective handling of big data samples by integrating new technologies.
–O2.4 (WP8): assess the value of this generic solution for the relevant stakeholders: scientists, their management, funding agencies, policy makers, companies and society at large.

Objectives (III)
O3: Increase the number of users and Research Infrastructure projects making efficient use of existing e-Infrastructure resources, designing appropriate exploitation strategies and a long-term sustainability plan.
–O3.1 (WP5,7): deliver ready-to-use, high-quality standard products for internal and external usage, enhancing interdisciplinary data science on a global scale.
–O3.2 (WP6,9): increase the degree of open access to large-scale distributed data.
–O3.3 (WP9): educate a new generation of data scientists and society in general.

1.2 Relation to the work programme

1.3 Concept and approach (ideas)
Make IT simple:
–Simplicity: VLDATA provides an abstraction of the different resources, which are all made accessible to the end user via the same interfaces.
–Transparency: users can specify their workflows/pipelines at different levels of abstraction; the platform takes care of the resource allocation necessary to fulfil the required specifications.
–Extensibility and flexibility: VLDATA provides an API that allows users to extend the provided functionality by developing new or customized components.
–Reliability: quality standards and extensive validation in several scientific domains ensure the readiness-to-use and robustness of VLDATA-based solutions.
–Scalability: a modular implementation allows horizontal (number of connected resources or users) and vertical (number of processed units) scaling to adapt VLDATA to the needs of each particular community or Research Infrastructure project.
–Smart and intelligent: building on collected experience and monitoring data, algorithms can search for optimized scheduling/searching strategies, including automated decision making based on usage traces and expectations.
–Cost-effective: building on existing, well-established solutions and incrementally extending and developing them to address new challenges with an evolving, validated common solution, avoiding unnecessary duplication of effort.
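A minimal sketch of the simplicity/transparency idea above, with hypothetical names (this is not the actual DIRAC or VLDATA API): the user calls a single `submit()` interface, and the platform decides which back end runs the job.

```python
# Hypothetical sketch of "make IT simple": one interface over
# heterogeneous resources. All names are illustrative only.
from abc import ABC, abstractmethod


class Resource(ABC):
    """Any compute back end (grid site, cloud, cluster, HPC)."""

    @abstractmethod
    def submit(self, job: dict) -> str:
        ...


class GridSite(Resource):
    def submit(self, job: dict) -> str:
        return f"grid:{job['name']}"


class CloudProvider(Resource):
    def submit(self, job: dict) -> str:
        return f"cloud:{job['name']}"


class Platform:
    """The user sees one submit() call; the platform allocates a resource."""

    def __init__(self, resources):
        self.resources = resources

    def submit(self, job: dict) -> str:
        # Trivial allocation policy for illustration: take the first resource.
        return self.resources[0].submit(job)


platform = Platform([GridSite(), CloudProvider()])
print(platform.submit({"name": "analysis-1"}))  # grid:analysis-1
```

The point of the sketch is the abstraction boundary: user workflows never mention a concrete provider, so new resource types can be added without touching user code.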

1.3 Concept and approach (model)
Model (building blocks):
–Collaborative modular architecture, with multiple layers sharing the same Framework and Basic modules, allowing horizontal and vertical scaling to ensure scalability.
–Open, iterative, incremental and parallel, requirement-driven development process; Agile(?) methodology.
–Standard procedures for quality assurance, including security, platform integration and validation (with reference benchmarks) and release procedures in accordance with the requirements for production-level services.
Layers (the result of 10 years' evolution of the DIRAC development effort):
–Framework: communication, security, access control, user/group management, DBs.
–Basic modules: SystemLogging, Configuration, Accounting, Monitoring.
–Low-level modules: File Catalog, Resource Status, Request Management, Workload Management.
–High-level modules: Data Management, Workflow Management.
–Interfaces: User - Resource.
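The layer list above can be encoded as an ordered structure together with the implicit rule that modules depend only on lower layers. The layer names follow the slide; the code itself is an illustrative sketch, not actual DIRAC code.

```python
# Illustrative encoding of the layered model (not actual DIRAC code).
LAYERS = {
    "Framework": 0,   # communication, security, access control, user/group mgmt, DBs
    "Basic": 1,       # SystemLogging, Configuration, Accounting, Monitoring
    "LowLevel": 2,    # File Catalog, Resource Status, Request/Workload Management
    "HighLevel": 3,   # Data Management, Workflow Management
    "Interfaces": 4,  # User - Resource interfaces
}


def may_depend(module_layer: str, dependency_layer: str) -> bool:
    """A module may only depend on modules in strictly lower layers."""
    return LAYERS[dependency_layer] < LAYERS[module_layer]


# e.g. Workflow Management may use the security framework,
# but Accounting must not depend on Data Management:
assert may_depend("HighLevel", "Framework")
assert not may_depend("Basic", "HighLevel")
```

This strict layering is what lets every community share the same Framework and Basic modules while swapping the higher layers.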

1.3 Concept and approach (assumptions)
–The current solution can be evolved into a new general platform that is widely applicable.
–Evolution from grids to clouds, but heterogeneity will increase.
–Large degree of commonality in low-level requirements and tools between different scientific domains.
–Fast growth of data and computing requirements, almost doubling every year; the aggregated estimate is close to the exabyte level in 5 years from now (EGI expects cores and 1?? exabytes of scientific data by 2020).
–Similar growth in the number of data objects, computing units and end users (60% of ESFRI projects completed or launched by 2015).
–New scientific domains are entering the digital era: a 4th paradigm of science, a new data science, is emerging ( us/collaboration/fourthparadigm/).
–Data is to be made openly available beyond the community that produced it, down to the citizens, who might also contribute to its further processing.
–Common development and validation provide robustness as well as cost savings and thus enable sustainability.
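As a worked example of the doubling assumption above (the 30 PB starting volume is an assumed round figure, not a number stated in the proposal):

```python
# If the aggregate volume doubles every year, an assumed 30 PB today
# reaches roughly an exabyte within 5 years.
start_pb = 30                        # assumed starting volume, petabytes
years = 5
volume_pb = start_pb * 2 ** years    # doubling each year: 30 * 32
print(volume_pb)                     # 960 PB, i.e. close to 1 exabyte
```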

1.4 Ambition

2.1 Expected impacts
Direct impact:
–Scalability, robustness (for the Research Infrastructures): participating RI projects will be able to operate their distributed computing systems, efficiently processing their large volumes of research data and making them available to their end users in a reliable and cost-effective way that could not be achieved before. This may lead to new ways of organizing science activities and to significant scientific breakthroughs. By providing important functional components (e.g., ) that were missing from existing practices, the VLDATA platform will make possible the transparent integration of resources, hiding the complexity from users and extending the scale of the resources that Research Infrastructure projects can utilize. This will increase the number of RIs using the project tools and the number of different types of resources reachable through the tools.
–Simplicity (for the user: scientist/operator).
–Cost-efficiency (for funding agencies): reduce duplication of effort, maximizing the use of EU-invested e-Infrastructures, enlarging the user communities, providing efficient data processing services, and providing advanced technology by integrating the state of the art, which reduces development costs significantly (also for the processing algorithms).

2.1 Expected impacts
Indirect impact: a large user community
–science
–innovation
–society
–industry
–citizens
–policy makers
–a new generation of data scientists
On the other hand, the scale of the data challenge requires simple but intelligent solutions to integrate resources from different e-Infrastructure providers.

2.2 Measures to maximize impact

Research Infrastructures (I)
Belle II:
–Usage of DIRAC for the experiment; use case presented:
–Common access to various platforms: grid + cloud + cluster + HPC.
–Support for monitoring for workflow management tools.
–Integration for the needs of other participants.
–User interface.
–EU-T0: virtual data centres / new virtualization techniques?

Research Infrastructures (II)
PAO:
–Usage of DIRAC for the experiment; data taking until 2022, so using a standard solution will help sustainability. Extend functionality for their use case.
–Common access to various platforms: grid + cloud + cluster + HPC (following the evolution of providers), in particular OSG.
–Open access to data.
–EU-T0: data locality.

Research Infrastructures (III)
LHCb:
–Should cover Run 2 needs and target the needs of Run 3 (DAQ upgrade): the data rate will increase by a factor of ~5, to 10 PB/year.
–Integration of cloud resources.
–Massive data-driven workflows for users.
–Data preservation (?).
–Resource (CPU/storage/network/...) description/monitoring/availability/management, smart allocation.
–Smart/intelligent/dynamic data placement strategies (network).
–EU-T0: new virtualization techniques, resource description/monitoring/availability, virtual data centres, data locality.
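As a back-of-envelope check of what 10 PB/year means as a sustained rate (decimal units assumed):

```python
# 10 PB/year expressed as a sustained transfer/processing rate.
PB = 1e15                              # bytes, decimal petabyte
seconds_per_year = 365 * 24 * 3600     # 31,536,000 s
rate_mb_per_s = 10 * PB / seconds_per_year / 1e6
print(round(rate_mb_per_s))            # ~317 MB/s sustained
```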

Research Infrastructures (IV)
EISCAT_3D:
–Searching data (metadata catalogue), intelligent searching (pattern recognition).
–Visualization.
–Workflows to go from one data level to another with the appropriate access rights.
–Training.
–Flexible interconnection of different resources: central (HPC) + distributed (grid/cloud).
–Time-constrained massive data reduction (10 PB -> 1 PB / month ??), including the possibility of user-defined algorithms.
–EU-T0:
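A back-of-envelope estimate of the sustained input throughput that the 10 PB -> 1 PB per month reduction implies (decimal units and a 30-day month assumed):

```python
# Reading 10 PB of input within one month requires roughly this rate.
PB = 1e15                               # bytes, decimal petabyte
seconds_per_month = 30 * 24 * 3600      # 2,592,000 s
input_gb_per_s = 10 * PB / seconds_per_month / 1e9
print(round(input_gb_per_s, 1))         # ~3.9 GB/s sustained input
```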

Research Infrastructures (V) BES III:

3.1 Work Plan (to be confirmed)
WP1 Coordination (UB, Spain)
–External Advisory Board (EUDAT, OGF, RDA, OSG, PRACE, XSEDE, CERN/Helix Nebula)
WP2 Requirement analysis & design (CU, UK)
WP3 Data-driven development (UB, Spain)
WP4 User-driven development (CYFRONET, Poland)
WP5 Quality (UAB, Spain)
WP6 Validation (????)
–LHCb (CNRS/INFN)
–Belle II (Institut Jozef Stefan, UniMB Maribor and UniLJ, Slovenia)
–EISCAT_3D (SNIC, Sweden / EISCAT Science Associate)
–PAO (CESNET, Czech Republic)
–BES III (IHEP, China / INFN-Torino, Italy)
–DIRAC 4 EGI, multi-community EGI solution (EGI.eu, the Netherlands)
WP7 Dissemination: outreach + training (CNRS, France)
WP8 Exploitation (ASCAMM, Spain)
WP9 Communication, internationalization (UvA, the Netherlands)

3.2 Management structure and procedures
–Coordinator; Technical Coordinator; Project Manager
–Consortium Board (all partners)
–Executive Board (1 representative from each area)
–External Advisory Board
–Integration/Operations WPs (6)
–Design/Development WPs (2,3,4,5)
–Communication/Sustainability WPs (7,8,9)
–Internal Communities' Coordinators; External Communities' Coordinators
–Communication/Exploitation Coordinator

3.3 Consortium as a whole

Private Companies
–Bull/Dell (??)
–ETL (UK)
–AlpesLaser (CH)

3.4 Resources to be committed

Calendar (milestones)
–May 23: close the list of contractors.
–June 11-13: all WPs ready; F2F meeting to close the work plan. Deadline for RIs and third parties.
–July 9-11: close proposal (I).
–July 25: proofread -> external review.
–Aug 18 -> Sep 2: final updates.