XDC@CCR
D. Cesini – INFN-CNAF – Bari, 28/09/2017

The XDC Project

- The eXtreme DataCloud (XDC) project will develop scalable technologies for federating storage resources and managing data in highly distributed computing environments.
- It will build on existing tools (TRL 8+) that the project will enrich with new functionalities and plugins already available as prototypes (TRL 6+).
- The target platforms are the current and next-generation e-Infrastructures deployed in Europe:
  - the European Open Science Cloud (EOSC)
  - the European Grid Infrastructure (EGI) and the Worldwide LHC Computing Grid (WLCG)
  - the computing infrastructures funded by the upcoming EINFRA-12 call

ID | Partner | Country | Represented community | Tools and systems to be developed
1 | INFN (lead) | IT | HEP/WLCG | INDIGO Orchestrator, smart caching mechanisms, access pattern analyzer
2 | DESY | DE | Astroparticle physics, research with photons | dCache, Orchestrator, smart caching mechanisms
3 | CERN | CH | - | EOS, DYNAFED, FTS, smart caching mechanisms
4 | AGH | PL | - | ONEDATA
5 | ECRIN (ERIC) | - | Medical data | -
6 | UC | ES | LifeWatch | -
7 | CNRS | FR | Astro (CTA and LSST) | -
8 | EGI.eu | NL | EGI communities | -

Summary of the technical topics: intelligent and automated dataset distribution

- Orchestration to realize policy-driven data management.
- Data distribution policies based on Quality of Service (e.g. disk vs. tape vs. SSD) at the infrastructure level (cross-site).
- The user can specify the number of replicas and the Quality of Service associated with each of them, e.g. one replica on fast storage (disk or SSD) and two more on tape, in different locations (see the sketch below).
- Quality of Service attributes can be the access latency, the retention policy or the allowed access protocols.
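
To make the bullet points concrete, here is a minimal sketch of what such a replica/QoS policy could look like as plain data with a small helper. Every name here (ReplicaSpec, media, max_latency_ms) is a hypothetical illustration, not XDC's actual policy schema.

```python
# Hypothetical, minimal sketch of a QoS-driven replication policy.
# Field names and values are illustrative only, not XDC's actual schema.

from dataclasses import dataclass

@dataclass
class ReplicaSpec:
    count: int            # how many replicas with this QoS
    media: str            # "ssd", "disk" or "tape"
    max_latency_ms: int   # access latency the user is willing to accept
    distinct_sites: bool = True  # place replicas at different sites

# "One replica on fast storage and two more on tape in different locations"
policy = [
    ReplicaSpec(count=1, media="ssd", max_latency_ms=10),
    ReplicaSpec(count=2, media="tape", max_latency_ms=3_600_000),
]

def total_replicas(policy: list[ReplicaSpec]) -> int:
    """Number of copies the orchestrator must place to satisfy the policy."""
    return sum(spec.count for spec in policy)

assert total_replicas(policy) == 3
```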

Summary of the technical topics: intelligent and automated dataset distribution, a typical workflow

- Initially the data are stored on low-latency devices for fast access.
- To ensure data safety, the data are replicated to a second storage device and migrated to custodial systems, which might be tape or S3 appliances.
- Eligible users are granted permission to restore archived data if necessary.
- After a grace period, access control is changed from "private" to "open access" (a sketch of this lifecycle as a simple state machine follows).
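
The workflow above is essentially a small state machine. A minimal sketch, with illustrative state and event names that are not an XDC API:

```python
# Minimal sketch of the data lifecycle described above as a state machine.
# States and transition triggers are illustrative, not an XDC API.

from enum import Enum, auto

class Stage(Enum):
    HOT = auto()         # on low-latency storage, private
    REPLICATED = auto()  # safety copy on a second device
    ARCHIVED = auto()    # on custodial storage (tape / S3), restorable
    OPEN = auto()        # grace period over: access control set to open

TRANSITIONS = {
    (Stage.HOT, "replicate"): Stage.REPLICATED,
    (Stage.REPLICATED, "archive"): Stage.ARCHIVED,
    (Stage.ARCHIVED, "grace_period_expired"): Stage.OPEN,
}

def advance(stage: Stage, event: str) -> Stage:
    """Move a dataset through its lifecycle; reject illegal transitions."""
    try:
        return TRANSITIONS[(stage, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event} from {stage.name}")

stage = Stage.HOT
for event in ("replicate", "archive", "grace_period_expired"):
    stage = advance(stage, event)
assert stage is Stage.OPEN
```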

Summary of the technical topics: data pre-processing during ingestion

- Automatically run user-defined applications and workflows when data are uploaded, e.g. for skimming, indexing, metadata extraction or consistency checks.
- Implement a solution to discover new data at specific locations.
- Create the functions to request that the INDIGO PaaS Orchestrator execute specific applications on the computing resources of the infrastructure.
- Implement a high-level workflow engine that will execute applications defined by the users.
- Implement the orchestrator handler to notify the users about execution completion.
- Implement the data mover to store the processed data in the final destination (a sketch of the overall hook follows).
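
A minimal sketch of such an ingestion hook, assuming simple directory-based discovery; in XDC this role is played by the INDIGO PaaS Orchestrator and the workflow engine, not by code like this:

```python
# Minimal sketch of an ingestion hook: discover new files, run a
# user-defined pre-processing step, then move the result to its final
# destination and notify the user. All names here are hypothetical.

from pathlib import Path
from typing import Callable

def ingest(inbox: Path, outbox: Path,
           preprocess: Callable[[Path], bytes],
           notify: Callable[[str], None]) -> None:
    """Process every newly discovered file exactly once."""
    for path in sorted(inbox.glob("*.dat")):
        result = preprocess(path)               # e.g. skimming, indexing
        target = outbox / (path.stem + ".out")
        target.write_bytes(result)              # "data mover" step
        path.unlink()                           # mark as ingested
        notify(f"{path.name}: pre-processing complete -> {target}")

# Example usage with a trivial "consistency check" that just re-reads bytes:
# ingest(Path("/data/inbox"), Path("/data/store"),
#        preprocess=lambda p: p.read_bytes(),
#        notify=print)
```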

Summary of the technical topics: data management based on access patterns

- Move unused data to 'glacier-like' storage and "hot" data to fast storage, at the infrastructure level (sketched after this slide).
- Use access predictions to improve data availability.

Smart caching

- Develop a global caching infrastructure supporting the following building blocks:
  - dynamic integration of satellite caches by existing data centres
  - creation of standalone caches modelled on existing web solutions
  - federation of the above to create a large-scale caching infrastructure
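
A minimal sketch of access-pattern-driven tiering under assumed thresholds (180 days idle, 100 reads per week); the tiers and statistics are illustrative stand-ins for what the access pattern analyzer would provide:

```python
# Minimal sketch of access-pattern-driven tiering: demote files that have
# not been read recently, promote frequently read ones. Thresholds and the
# tier names are hypothetical illustrations.

import time
from dataclasses import dataclass

DAY = 86_400

@dataclass
class FileStats:
    name: str
    last_access: float    # epoch seconds
    reads_last_week: int
    tier: str             # "fast", "standard" or "glacier"

def plan_moves(files: list[FileStats],
               now: float | None = None) -> list[tuple[str, str]]:
    """Return (file, target_tier) moves suggested by recent access patterns."""
    now = now or time.time()
    moves = []
    for f in files:
        if now - f.last_access > 180 * DAY and f.tier != "glacier":
            moves.append((f.name, "glacier"))   # cold: demote
        elif f.reads_last_week > 100 and f.tier != "fast":
            moves.append((f.name, "fast"))      # hot: promote
    return moves
```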

Smart caching scenarios

- Scenario 1: dynamic extension of a site to remote locations. Data stored at the original site should be accessible from the remote location in a "quasi"-transparent way from the clients' point of view. Implemented in EOS, ONEDATA and dCache using internal namespaces and algorithms; the cache is not directly addressable.
- Scenario 2: a tactical storage set up as a standalone cache, e.g. running squid-like services where clients access the cache directly. The cache fetches data on a miss (or at least redirects the client). In this case the cache is federable, since it is directly addressable; the cache namespace is provided by a federator that is not embedded in the storage systems (Dynafed in our case). A sketch of such a read-through cache follows.
- Scenario 3: creation of a permanent "Virtual Data Cloud" in which storage resources (Grid and Cloud) are federated in a single namespace and remote data can be accessed transparently from any location, without explicitly copying them to the client location. As an extension of the previous scenario, this implies the creation of a distributed and federated cache system.
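
A minimal sketch of the scenario 2 behaviour: a standalone read-through cache that serves hits locally and fetches from the origin site on a miss. The fetch callback and on-disk layout are assumptions for illustration:

```python
# Minimal sketch of scenario 2: a standalone read-through cache.
# The fetch function and cache layout are hypothetical.

from pathlib import Path
from typing import Callable

class ReadThroughCache:
    def __init__(self, cache_dir: Path,
                 fetch_from_origin: Callable[[str], bytes]):
        self.cache_dir = cache_dir
        self.fetch = fetch_from_origin

    def read(self, name: str) -> bytes:
        local = self.cache_dir / name
        if local.exists():
            return local.read_bytes()           # cache hit
        data = self.fetch(name)                 # cache miss: go to origin
        local.parent.mkdir(parents=True, exist_ok=True)
        local.write_bytes(data)                 # populate for next reader
        return data

# cache = ReadThroughCache(Path("/var/cache/xdc"),
#                          fetch_from_origin=download)  # hypothetical helper
```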

Summary of the technical topics

- Unified data access platform at the PaaS level at the exascale:
  - multi-region support in ONEDATA
  - advanced metadata management with no pre-defined schema (see the sketch below)
- Encryption services and secure storage:
  - sensitive data management and key storage within ONEDATA
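
As an illustration of schema-free metadata, a minimal sketch in which arbitrary key/value tags are attached to files and queried with no pre-defined schema; this shows the concept only, not ONEDATA's metadata API:

```python
# Minimal sketch of schema-free metadata: free-form key/value pairs
# attached to files and queried without a pre-defined schema.
# The "store" is just a dict for illustration.

metadata: dict[str, dict[str, object]] = {}   # file name -> free-form tags

def annotate(name: str, **tags: object) -> None:
    metadata.setdefault(name, {}).update(tags)

def find(**criteria: object) -> list[str]:
    """Return files whose metadata matches all given key/value pairs."""
    return [name for name, tags in metadata.items()
            if all(tags.get(k) == v for k, v in criteria.items())]

annotate("run0042.fits", instrument="CTA", level="DL3", open_access=False)
annotate("lake_2017.nc", source="Copernicus", variable="chlorophyll")
assert find(instrument="CTA") == ["run0042.fits"]
```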

Metadata handling use cases

CTA
- The CTA distributed archive is based on the "Open Archival Information System" (OAIS) ISO standard.
- Event data are stored in files (FITS format) containing all metadata.
- Metadata are extracted from the ingested files, automatically filling the metadata database; the metadata are then used to query the archive (a sketch of this extraction step follows after this slide).
- The system should be able to manage replicas, tapes, disks, etc., with data ranging from low-level to high-level.

LIFEWATCH
- Metadata management to handle heterogeneous and large datasets: different data types, formats, sources and ways to access them (e.g. Copernicus data: ~16 PB per year).
- Used as input for water quality forecasting systems.
- Use of standards like EML (Ecological Metadata Language) and adoption of best practices like the FAIR+R principles.

ECRIN
- Clinical trial data objects are available for sharing through a variety of access mechanisms and from a wide variety of locations: a growing number of general and specialised data repositories, trial registries, publications, and the original researchers' institutions.
- 'Discoverability' will become much worse in the future as more and more material is made available for sharing.
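
A minimal sketch of the CTA-style extraction step: reading selected header keywords from a FITS file to fill a metadata database, using astropy (assumed to be available); the keyword list and the dict "database" are illustrative:

```python
# Minimal sketch of FITS header extraction at ingestion time.
# Uses astropy (assumed installed); the keyword selection is illustrative.

from astropy.io import fits

def extract_metadata(path: str, keywords: tuple[str, ...]) -> dict[str, object]:
    """Read selected header keywords from the primary HDU of a FITS file."""
    with fits.open(path) as hdul:
        header = hdul[0].header
        return {key: header[key] for key in keywords if key in header}

# metadata_db = {}   # stand-in for the archive's metadata database
# metadata_db[path] = extract_metadata(path, ("TELESCOP", "DATE-OBS", "OBJECT"))
```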

XDC functionality/community/tool matrix

- A use-case-driven project…
- …with new functionalities added on top of existing production-quality services.

XDC high-level architecture

Project structure and management bodies

- The ELG is responsible for maintaining active relationships with the infrastructure and technology providers, discussing synergies, strategies, roadmaps and the requirements workflow for the software released by the project.

Budget

- 3.077 M€ total budget; INFN: 580 k€ (of which 30 k€ for subcontracting).
- INFN is involved in WP4 to develop:
  - the PaaS Orchestrator policy-driven (QoS) data management
  - pre-processing workflows
  - smart caching mechanisms
  - the access pattern analyzer
- INFN is involved in all WPs (CNAF, BA, PD, PG).
- INFN leads WP1 and WP3.
- INFN will represent WLCG in WP2.

Starting date

- The starting date is currently set to Nov 1st, provided all the signatures for the GA are in place.
- Proposal: a joint kickoff with DEEP in January 2018, after the EOSC-hub kickoff:
  - during the week starting on the 22nd
  - held in Bologna
  - estimated 50-60 participants (for both projects)