Plans for LFC
Simone Campana, CERN IT-ES

Current Model
- LFC in DDM is used to map a GUID to a SURL, i.e. to map a logical file to a physical one (see the sketch below)
- DDM design: one catalog per endpoint
  - LFC/SE consistency left up to the site
  - LFC could even have been merged with the SE
- In reality:
  - Maintaining an LFC was a load for small sites
  - LFC/SE consistency is non-trivial (and requires dedication)
- The current deployment model: one LFC per T1, serving the whole cloud
  - There are exceptions (US T2s)
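As a purely conceptual sketch of that mapping (not the LFC API; the class, GUIDs and SURLs below are invented), the current model amounts to one catalog per cloud, each resolving a GUID only to the replicas it knows about:

    # Conceptual model only: one catalog per cloud, mapping GUID -> SURLs.
    # All names and values are invented for illustration.
    from typing import Dict, List

    class CloudCatalog:
        """Stands in for a per-cloud LFC: GUID -> SURLs at the cloud's SEs."""
        def __init__(self, cloud: str):
            self.cloud = cloud
            self.replicas: Dict[str, List[str]] = {}

        def register(self, guid: str, surl: str) -> None:
            self.replicas.setdefault(guid, []).append(surl)

        def get_surls(self, guid: str) -> List[str]:
            # Only replicas registered in this cloud's catalog are visible here
            return self.replicas.get(guid, [])

    # One LFC per T1, serving the whole cloud (the current deployment model)
    it_cloud = CloudCatalog("IT")
    it_cloud.register("guid-0001", "srm://se.example.org/atlas/datafile.root")
    print(it_cloud.get_surls("guid-0001"))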

Issues of the current model
- Each LFC is a single point of failure for its cloud
  - Replication scenarios are non-trivial
- Lots of duplicated information between catalogs
  - LFC vs LFC
  - LFC vs DDM, kept lazily consistent at the client level
- LFC/SE consistency is non-trivial (a toy check is sketched below)
  - Many SEs for one LFC
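To make the consistency point concrete, here is a toy sketch of an LFC/SE cross-check as a set comparison; the SURL sets are invented and would in practice come from a catalog dump and an SE namespace dump:

    # Toy LFC/SE consistency check: compare SURLs registered in the catalog
    # with files actually present on the SE. Both sets are invented examples.
    catalog_surls = {"srm://se.example.org/atlas/file1",
                     "srm://se.example.org/atlas/file2"}
    storage_surls = {"srm://se.example.org/atlas/file2",
                     "srm://se.example.org/atlas/file3"}

    lost_files = catalog_surls - storage_surls  # registered, but missing on storage
    dark_data  = storage_surls - catalog_surls  # on storage, but never registered

    print(len(lost_files), "lost replicas;", len(dark_data), "dark files")

With many SEs behind a single LFC, this comparison has to be repeated for every SE, which is part of what makes consistency non-trivial.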

LFC consolidation proposal
- Merge all LFCs into one
  - Its natural location would be CERN, in the same rack as the DDM Central Catalogs
- Replicate the master LFC to one (or more) T1s
  - Provides a live backup in case of disaster
  - The slave can become master at any point
  - Can be used for other purposes, for example to generate catalog dumps (sketched below)
- In the longer term, LFC and DDM Central Catalogs could be merged
  - Many commonalities in the file tables
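A hedged sketch of one such "other purpose": producing per-site catalog dumps from the read-only replica, so the dump workload never touches the master. The query_replica() helper, endpoint name and sample rows are invented; a real tool would query the replica's database directly:

    # Sketch only: stream (guid, surl, site) rows from the read-only replica
    # and write one dump file per site. All names below are placeholders.
    from collections import defaultdict

    def query_replica(endpoint: str):
        """Placeholder for a query against the slave LFC database."""
        yield ("guid-0001", "srm://se1.example.org/atlas/file1", "SITE-A")
        yield ("guid-0002", "srm://se2.example.org/atlas/file2", "SITE-B")

    dumps = defaultdict(list)
    for guid, surl, site in query_replica("lfc-replica.example.org"):
        dumps[site].append((guid, surl))

    for site, rows in dumps.items():
        with open(f"dump_{site}.txt", "w") as out:
            for guid, surl in rows:
                out.write(f"{guid} {surl}\n")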

Concerns
- Performance: can one LFC sustain all the load?
  - New developments in LFC (SSL sessions) will help
  - Still…
- Migration should be gradual
  - One cloud at a time (start with 5% T1s)
  - At each step we should carefully evaluate the load on the LFC and the error rate from SS and Panda
- Asynchronous registration would improve the workflow
  - Can the Panda server register the outputs?
  - Can we use ActiveMQ? (a possible messaging sketch is below)
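As one possible shape for asynchronous registration, here is a hedged sketch using the stomp.py client to publish registration requests to an ActiveMQ queue; the broker host, queue name and message fields are invented, and the real schema would have to be agreed between Panda, DDM and the LFC consumer:

    # Sketch of asynchronous registration via ActiveMQ (stomp.py client).
    # A consumer close to the central LFC would perform the actual catalog
    # insertion later, decoupling job completion from catalog load.
    import json
    import stomp

    def publish_registration(guid: str, lfn: str, surl: str) -> None:
        conn = stomp.Connection([("activemq.example.org", 61613)])  # invented host
        conn.connect("user", "password", wait=True)
        conn.send(destination="/queue/lfc.registration",
                  body=json.dumps({"guid": guid, "lfn": lfn, "surl": surl}))
        conn.disconnect()

    publish_registration("guid-0001",
                         "/grid/atlas/user/file1",
                         "srm://se.example.org/atlas/file1")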

Concerns
- Scalability: would a central LFC scale with the number of entries?
- Partitioning at the logical file level (GUID)
  - Per time interval
  - 90% of ATLAS GUIDs embed creation-time information (see the sketch below)
- Partitioning at the replica (SURL) level
  - Per site
  - Would allow generating fast dumps
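To illustrate partitioning per time interval, the sketch below derives a monthly partition key from the timestamp embedded in a time-based (version 1) GUID; the example GUID is generated on the spot and the key format is only an illustration:

    # Derive a time-based partition key (YYYY-MM) from a version-1 UUID/GUID.
    import uuid
    from datetime import datetime, timedelta

    GREGORIAN_EPOCH = datetime(1582, 10, 15)  # origin of the UUIDv1 timestamp

    def partition_key(guid: str) -> str:
        u = uuid.UUID(guid)
        if u.version != 1:
            return "no-time-info"  # GUIDs without embedded creation time
        # u.time counts 100 ns intervals since the Gregorian epoch
        created = GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)
        return created.strftime("%Y-%m")

    print(partition_key(str(uuid.uuid1())))  # partition key for a fresh GUID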

Operational impact
- Need to agree on the DB service with CERN IT
  - Resources, manpower impact…
- Need to write tools for the migration (a toy merge loop is sketched below)
  - Import and merging, both from Oracle and MySQL backends
  - In a "rolling" fashion, to minimize downtime impact
  - LFC experts can help with this
- Need some support from T1s at the time of the migration
  - Provide dumps, access, …
- With a well-thought-out plan, the estimate is less than a couple of days of downtime per T1 during migration
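Purely as a sketch of the rolling, one-cloud-at-a-time merge (the cloud list, dump reader and duplicate check are placeholders; real tools would read from each T1's Oracle or MySQL backend):

    # Toy rolling merge into the central catalog, one cloud at a time.
    clouds = ["CLOUD-A", "CLOUD-B", "CLOUD-C"]   # placeholder cloud names

    def read_source_dump(cloud: str):
        """Placeholder: would stream (guid, lfn, surl) rows dumped from the T1 LFC."""
        return []

    central_lfc = {}  # placeholder central catalog: (guid, surl) -> lfn

    def migrate_cloud(cloud: str) -> int:
        inserted = 0
        for guid, lfn, surl in read_source_dump(cloud):
            if (guid, surl) not in central_lfc:   # skip entries already merged
                central_lfc[(guid, surl)] = lfn
                inserted += 1
        return inserted

    # Migrate one cloud, evaluate load and error rates, then move to the next
    for cloud in clouds:
        print(cloud, migrate_cloud(cloud), "entries merged")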

Conclusions
- Consolidating LFCs would:
  - Simplify the scenario for disaster recovery and prolonged downtimes
  - Simplify the DDM architecture
  - Simplify operations
- We have to be careful about:
  - Performance impact
  - Scalability impact
  - Migration operational impact
- This is why the proposal is a staged rollout with constant feedback loops
- To get support from service providers and developers we need a concrete and well-defined plan
- Can we agree?