Update on replica management

Replica discovery algorithm
To choose the best SE for any operation (upload, download, transfer) we rely on a distance metric:
◦ Based on the network distance between the client and all known IPs of the SE
◦ Altered by the current SE status
  ▪ Writing: usage + weighted write reliability history
  ▪ Reading: weighted read reliability history
◦ Static promotion/demotion factors per SE
◦ Small random factor for democratic distribution
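
As a rough illustration of how these factors might combine, here is a minimal sketch in Java. It is not the actual AliEn/jAliEn code: all class names, fields and the weight of the random factor are assumptions, and the per-operation status terms are collapsed into a single reliability penalty plus a free-space bonus.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Illustrative container for the per-SE terms named on the slide (assumed names).
class StorageElement {
    String name;
    double networkDistance;      // from the network distance metric (next slide)
    double reliabilityPenalty;   // from the SE status component (add/get test history)
    double freeSpaceBonus;       // free-space term, only relevant for writing
    double staticFactor;         // per-SE static promotion (<0) or demotion (>0)
}

public class ReplicaDiscovery {
    private static final Random RNG = new Random();

    // Smaller score = better candidate for the requested operation.
    static double score(StorageElement se, boolean forWriting) {
        double d = se.networkDistance
                 + se.reliabilityPenalty
                 + se.staticFactor
                 + RNG.nextDouble() * 0.01;    // small random factor for democratic distribution
        if (forWriting) {
            d -= se.freeSpaceBonus;            // writing also rewards remaining free space
        }
        return d;
    }

    // Rank all known SEs for one operation; the best candidates come first.
    static List<StorageElement> rank(List<StorageElement> candidates, boolean forWriting) {
        candidates.sort(Comparator.comparingDouble(se -> score(se, forWriting)));
        return candidates;
    }
}
```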

Network distance metric
distance(IP1, IP2) is built from tiers of network proximity, from closest to farthest:
◦ same C-class network
◦ same DNS domain name
◦ same AS: f(RTT(IP1, IP2)), if known
◦ same country: + f(RTT(AS(IP1), AS(IP2)))
◦ same continent: f(RTT(AS(IP1), AS(IP2)))
◦ far, far away
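
A hedged sketch of this tiered metric follows; the slide only fixes the ordering of the tiers, so the concrete base values and the RTT scaling function f() used here are assumptions.

```java
// Tiered network distance: lower values for hosts that are closer in the
// network topology. Base values per tier are illustrative only.
public class NetworkDistance {

    // f(RTT): any monotonic mapping of the round-trip time into the distance scale (assumed).
    static double f(double rttMillis) {
        return Math.log1p(rttMillis);
    }

    static double distance(HostInfo a, HostInfo b, Double rttMillis, Double asRttMillis) {
        if (a.cClassNetwork.equals(b.cClassNetwork)) return 0;              // same C-class network
        if (a.dnsDomain.equals(b.dnsDomain))         return 1;              // same DNS domain name
        if (a.asNumber == b.asNumber)
            return rttMillis != null ? 2 + f(rttMillis) : 2;                // same AS: f(RTT), if known
        if (a.country.equals(b.country))
            return 3 + (asRttMillis != null ? f(asRttMillis) : 0);          // same country + f(RTT between the ASes)
        if (a.continent.equals(b.continent))
            return 4 + (asRttMillis != null ? f(asRttMillis) : 0);          // same continent
        return 10;                                                           // far, far away
    }
}

class HostInfo {
    String cClassNetwork, dnsDomain, country, continent;
    long asNumber;
}
```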

Network topology (figure)

SE status component
Driven by the functional add/get tests (12/day); failing the last test => heavy demotion
The distance increases with a reliability factor:
◦ ¾ · last day failures + ¼ · last week failures
The remaining free space is also taken into account for writing, with:
◦ f(ln(free space / 5 TB))
  ▪ Storages with a lot of free space are slightly promoted (with a cap on the promotion), while the ones running out of space are strongly demoted
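
For illustration, a sketch of this status contribution: the ¾ / ¼ weights and the ln(free space / 5 TB) term come from the slide, while the heavy-demotion magnitude, the cap value and all names are assumptions.

```java
public class SeStatus {

    // Reliability factor from the functional add/get test history (12 tests/day).
    static double reliabilityPenalty(double lastDayFailureFraction,
                                     double lastWeekFailureFraction,
                                     boolean lastTestFailed) {
        if (lastTestFailed) {
            return 1000;                        // failing the last test => heavy demotion (assumed magnitude)
        }
        return 0.75 * lastDayFailureFraction + 0.25 * lastWeekFailureFraction;
    }

    // Free-space term, used only for writing: lots of free space promotes slightly
    // (capped), running out of space demotes strongly (the log goes negative fast).
    static double freeSpaceBonus(double freeSpaceBytes) {
        final double FIVE_TB = 5e12;
        double bonus = Math.log(freeSpaceBytes / FIVE_TB);
        return Math.min(bonus, 1.0);            // cap on the promotion (assumed value)
    }
}
```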

What we gained
Maintenance-free system
◦ Automatic discovery of resources combined with monitoring data
Efficient file upload and access
◦ From the use of well-connected, functional SEs
◦ The local copy is always preferred for reading; unless there is a problem with it, and then the other copies are also close by (RTT is critical for remote reading)
◦ Writing falls back to even more remote locations until the initial requirements are met
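
The read fallback described above could look roughly like the following sketch (illustrative names only, not the actual client code): try the closest replica first and fall through to the next-closest one on error.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

public class ReplicaReader {

    interface Replica {
        double distanceToClient();
        InputStream open() throws IOException;
    }

    // Open the closest working replica; the local copy naturally comes first,
    // and the remaining copies (also nearby, by construction) act as fallbacks.
    static InputStream openClosestWorking(List<Replica> replicas) throws IOException {
        replicas.sort((r1, r2) -> Double.compare(r1.distanceToClient(), r2.distanceToClient()));
        IOException last = null;
        for (Replica r : replicas) {
            try {
                return r.open();
            } catch (IOException e) {
                last = e;                        // this replica has a problem, try the next-closest one
            }
        }
        throw last != null ? last : new IOException("no replicas available");
    }
}
```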

Effects on the data distribution
Raw data-derived files stay clustered around CERN and the T1 that holds a copy
◦ job splitting is thus efficient

Effects on MC data distribution
Some simulation results are spread over nearly all sites and in various combinations of SEs
◦ yielding inefficient job splitting
◦ this translates into more merging stages for the analysis
◦ affecting some analysis types
◦ overhead from more, shorter jobs
◦ no consequence for job CPU efficiency
(Figure: a very bad case)

Merging stages impact on trains
Merging stages are a minor contributor to the analysis turnaround time (few jobs, high priority)
Factors that do affect the turnaround:
◦ Many trains starting at the same time in an already saturated environment
◦ Sub-optimal splitting, with its overhead
◦ Resubmission of a few pathological cases
The cut-off parameters in LPM could be used: at the price of dropping 2 out of 7413 jobs, the above analysis would finish in 5h

How to fix the MC case
Old data: consolidate replica sets into larger, identical baskets for the job optimizer to split optimally
◦ With Markus’ help we are now in the testing phase on a large data set for a particularly bad train
  ▪ 155 runs, 58K LFNs (7.5 TB), 43K transfers (1.8 TB)
  ▪ target: 20 files / basket
◦ Waiting for the next departure to evaluate the effect on the overall turnaround time of this train

How to fix the MC case (2)
The algorithm tries to find the smallest number of operations that would yield large enough baskets
◦ Taking SE distance into account (the same kind of metric as for the discovery; in particular, usage is also considered and data is kept nearby for fallbacks, etc.)
jAliEn can now move replicas (delete after copy), copy to several SEs at the same time, and do delayed retries
◦ TODO: implement a “master transfer” to optimize the two stages of the algorithm (first the copy & move operations, then delete the extra replicas at the end)
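
To make the basket idea concrete, here is a minimal sketch of the packing step only, under assumed names and with the 20 files/basket target from the previous slide. Choosing the target SE per LFN (the distance/usage part) and the actual copy/move/delete operations are left out; this is not the jAliEn optimizer code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BasketConsolidation {

    static final int TARGET_BASKET_SIZE = 20;   // target: 20 files / basket

    // lfnToSe: for each LFN, the SE already chosen to host its consolidated replica.
    static Map<String, List<String>> buildBaskets(Map<String, String> lfnToSe) {
        // Group LFNs by their target SE...
        Map<String, List<String>> bySe = new HashMap<>();
        for (Map.Entry<String, String> e : lfnToSe.entrySet()) {
            bySe.computeIfAbsent(e.getValue(), se -> new ArrayList<>()).add(e.getKey());
        }
        // ...then cut each group into baskets of the target size, so the job
        // optimizer can split on identical, co-located sets of files.
        Map<String, List<String>> baskets = new HashMap<>();
        for (Map.Entry<String, List<String>> e : bySe.entrySet()) {
            List<String> lfns = e.getValue();
            for (int i = 0; i < lfns.size(); i += TARGET_BASKET_SIZE) {
                String basketId = e.getKey() + "#" + (i / TARGET_BASKET_SIZE);
                baskets.put(basketId, lfns.subList(i, Math.min(i + TARGET_BASKET_SIZE, lfns.size())));
            }
        }
        return baskets;
    }
}
```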

Option for future MC productions
Miguel’s implementation of the output location extension:
◦ The distance to the indicated SEs is altered by +/- 1
◦ after the initial discovery, so broken SEs are eliminated and location is still taken into account
The set should be:
◦ large enough (ln(subjobs)?)
◦ set at submission time, per masterjob
◦ with a different value each time, e.g.:
  ▪ a space- and reliability-weighted random set of working SEs
Caveats:
◦ Inefficiencies for writing and reading
◦ Not using the entire storage space, and later on not using all the available CPUs for analysis (though a large production would)
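
One plausible reading of the +/- 1 adjustment and of the weighted random set is sketched below, under assumed names: the indicated SEs are promoted by 1 and all others demoted by 1, applied only after the normal discovery so broken SEs stay excluded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class OutputLocationHint {
    private static final Random RNG = new Random();

    // Weighted random draw of roughly ln(nSubjobs) working SEs (weights combine
    // free space and reliability; how they are combined is an assumption).
    static List<String> pickPreferredSes(List<String> workingSes, List<Double> weights, int nSubjobs) {
        int wanted = Math.max(1, (int) Math.round(Math.log(nSubjobs)));
        List<String> pool = new ArrayList<>(workingSes);
        List<Double> w = new ArrayList<>(weights);
        List<String> chosen = new ArrayList<>();
        while (chosen.size() < wanted && !pool.isEmpty()) {
            double total = w.stream().mapToDouble(Double::doubleValue).sum();
            double r = RNG.nextDouble() * total;
            for (int i = 0; i < pool.size(); i++) {
                r -= w.get(i);
                if (r <= 0) {
                    chosen.add(pool.remove(i));
                    w.remove(i);
                    break;
                }
            }
        }
        return chosen;
    }

    // Applied after the initial discovery, so broken SEs have already been dropped
    // and locality is still reflected in discoveredDistance.
    static double adjustedDistance(String se, double discoveredDistance, Set<String> preferred) {
        return discoveredDistance + (preferred.contains(se) ? -1 : +1);
    }
}
```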

Summary
Two possibilities to optimize replica placement (with the current optimizer):
◦ Implement in LPM the algorithm described before
◦ Trigger the consolidation algorithm at the end of a production/job
And/or fix the “se_advanced” splitting method so that the SE sets become irrelevant