LFC Replication Tests, LCG 3D Workshop, Barbara Martelli


Objectives of LFC Replication Tests
- Understand if and how Streams replication impacts LFC behaviour.
- Understand whether the achievable throughput, in entries inserted per second, is suitable for LHCb needs.
- Understand whether the achievable sustained rate, in entries inserted per second, is suitable for LHCb needs.
- Measure the replication delay for a single entry.
- Measure the maximum throughput achievable in our configuration.
- Measure the maximum sustained rate achievable in our configuration.
- Compare read performance between the present setup and the Streams setup (we expect it to improve with a replica).

LHCb Access Pattern on LFC
- At the moment the LFC is used for DC06: MC production, stripping, analysis.
- It is really difficult to estimate the future access pattern, but we can take a snapshot of what happens today.
- Read access (end 2006): 10M PFNs expected; read access is mainly for analysis, and an average user starts O(100) jobs. Each job contacts the LFC twice: once for DIRAC optimization and once to create the XML POOL slice that the application uses to access the data. Every 15 minutes, 1000 users are expected to submit jobs, each contacting the LFC 200 times: 24*4*1000*200 ~ 20M LFC requests per day for analysis, i.e. about 200 Hz of read-only requests.
- Write access (today): MC production: inserts per day. DC06: about 40 MB/s of transfers from CERN to the Tier-1s with a file size of about 100 MB -> one replicated file every ~3 seconds. For every 30 files processed, 2 are created, so we can expect about 1 Hz of write access.
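A back-of-the-envelope check of the figures above; this is a minimal sketch using only the numbers quoted on the slide (1000 users per 15-minute slot, 200 LFC calls per user, 40 MB/s transfers of 100 MB files):

```python
# Sanity check of the read/write rates quoted above.
SLOTS_PER_DAY = 24 * 4          # 15-minute slots in a day
USERS_PER_SLOT = 1000
CALLS_PER_USER = 200
SECONDS_PER_DAY = 24 * 3600

read_requests_per_day = SLOTS_PER_DAY * USERS_PER_SLOT * CALLS_PER_USER
read_rate_hz = read_requests_per_day / SECONDS_PER_DAY

transfer_rate_mb_s = 40.0       # DC06 CERN -> Tier-1 transfer rate
file_size_mb = 100.0
seconds_per_replicated_file = file_size_mb / transfer_rate_mb_s

print(f"{read_requests_per_day / 1e6:.1f}M read requests/day, ~{read_rate_hz:.0f} Hz")
print(f"one replicated file every {seconds_per_replicated_file:.1f} s")
```

This reproduces roughly 20M read requests per day (~200 Hz) and one replicated file every ~2.5 s, consistent with the slide.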

LFC Local Test Description (Feasibility Test)
- 40 LFC clients, 40 LFC daemon threads, Streams pool.
- Client actions (sketched below): check whether the LFN already exists in the database (select from cns_file_metadata).
- If it exists -> add an SFN for that LFN (insert the SFN into cns_file_replica).
- If not -> add both the LFN and the SFN (insert the LFN into cns_file_metadata, then the SFN into cns_file_replica).
- For each LFN, 3 SFNs are inserted.
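A minimal sketch of the per-entry client logic described above, assuming a cx_Oracle connection to the LFC backend; the column names and the fileid sequence are illustrative assumptions, not the real LFC schema:

```python
import cx_Oracle

def insert_entry(conn, lfn, sfns):
    cur = conn.cursor()
    # Does the LFN already exist?
    cur.execute("SELECT fileid FROM cns_file_metadata WHERE name = :lfn", lfn=lfn)
    row = cur.fetchone()
    if row is None:
        # New LFN: insert the metadata row first (fileid generation is
        # schema-specific; a sequence is assumed here for illustration).
        cur.execute("SELECT cns_unique_id.NEXTVAL FROM dual")
        fileid = cur.fetchone()[0]
        cur.execute("INSERT INTO cns_file_metadata (fileid, name) VALUES (:fid, :lfn)",
                    fid=fileid, lfn=lfn)
    else:
        fileid = row[0]
    # Add the SFNs (3 per LFN in the feasibility test).
    for sfn in sfns:
        cur.execute("INSERT INTO cns_file_replica (fileid, sfn) VALUES (:fid, :sfn)",
                    fid=fileid, sfn=sfn)
    conn.commit()
```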

LFC Master HW Configuration (diagram)
- 2-node RAC (rac-lhcb-01, rac-lhcb-02) on Oracle 10gR2, RHEL 4 (ELsmp kernel).
- Each node: dual Xeon 3.2 GHz, 4 GB memory; Gigabit switch, private LHCb link.
- Dell 224F storage: 14 Fibre Channel disks (73 GB each), HBA QLogic QLA2340, Brocade FC switch.
- Disk storage managed with Oracle ASM (striping and mirroring).

LFC Slave Configuration
- LFC read-only replica.
- Dual Xeon 2.4 GHz, 2 GB RAM.
- Oracle 10gR2 (Oracle RAC software, but used as a single instance), RHEL 3 kernel.
- 250 GB disks in RAID 5, HBA QLogic QLA2340, Brocade FC switch.
- Disk storage formatted with OCFS2.

Performance
- About 75 transactions per second on each cluster node.
- Inserted and replicated 1.7M entries in 4 hours (118 inserts per second).
- Almost real-time replication with Oracle Streams, without significant delay (<< 1 s).

CPU load on the cluster nodes is far from saturation.

CERN to CNAF LFC Replication
- At CERN: 2 LFC servers connected to the same LFC master DB backend (single instance).
- At CNAF: 1 LFC server connected to the replica DB backend (single instance).
- Oracle Streams propagates entries from the master DB at CERN to the replica DB at CNAF.
- Population clients: a Python script starts N parallel clients; the clients write entries and replicas into the master LFC at CERN (see the sketch below).
- Read-only clients: a Python script reads entries from both the master and the replica LFC.
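A minimal sketch of the population driver: start N parallel clients that keep writing entries into the master LFC. The register_entry() helper, the path layout and the entry counts are hypothetical placeholders for the actual LFC client calls used in the test:

```python
import multiprocessing
import uuid

N_CLIENTS = 20                     # 40 in Test 1, 20 in Test 2
ENTRIES_PER_CLIENT = 10000         # illustrative value

def register_entry(lfn, sfns):
    """Placeholder for the real LFC registration (e.g. via the LFC Python binding)."""
    pass

def client(client_id):
    for i in range(ENTRIES_PER_CLIENT):
        lfn = f"/grid/lhcb/test/client{client_id}/file{i}"
        sfns = [f"srm://se{k}.example.org/{uuid.uuid4()}" for k in range(3)]
        register_entry(lfn, sfns)   # 3 SFNs per LFN, as in the local test

if __name__ == "__main__":
    workers = [multiprocessing.Process(target=client, args=(c,)) for c in range(N_CLIENTS)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```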

LFC Replication Testbed (diagram): population clients write to the LFC R-W servers, which use the LFC Oracle master DB at CERN (rls1r1.cern.ch, lxb0716.cern.ch, lxb0717.cern.ch); Oracle Streams propagates the changes over the WAN to the replica DB at CNAF, behind the LFC read-only server queried by the read-only clients (lfc-streams.cr.cnaf.infn.it, lfc-replica.cr.cnaf.infn.it).

Test 1: 40 Parallel Clients
- 40 parallel clients, equally divided between the two LFC master servers.
- 3700 replicas per minute inserted during the first two hours.
- Very good performance at the beginning, but after a few hours the master falls into a Flow Control state.
- Flow Control means that the master is notified by its client (the consumer of the Streams queue) that the update rate is too fast; the master slows down to avoid Spill Over on the client side.
- Spill Over means that the in-memory buffer of the Streams queue is full, so Oracle has to write the entries to disk (the persistent part of the queue), which decreases performance.
- Since the apply side of Streams replication (the slave) is usually slower than the master side, we argue that it is necessary to decrease the insert rate to achieve a good sustained rate.
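A minimal monitoring sketch for spotting this condition, assuming a cx_Oracle session with privileges to read the Streams dynamic views on the master DB; it checks whether the capture process is paused for flow control and whether the buffered queue is spilling to disk:

```python
import cx_Oracle

def check_flow_control(conn):
    cur = conn.cursor()
    cur.execute("SELECT capture_name, state FROM v$streams_capture")
    for name, state in cur:
        print(f"capture {name}: {state}")   # e.g. 'PAUSED FOR FLOW CONTROL'
    cur.execute("SELECT queue_name, num_msgs, spill_msgs FROM v$buffered_queues")
    for queue, in_memory, spilled in cur:
        print(f"queue {queue}: {in_memory} buffered msgs, {spilled} spilled to disk")
```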

Test 2: 20 Parallel Clients
- 20 parallel clients, equally divided between the two LFC master servers.
- 3000 replicas per minute inserted, i.e. 50 replicas per second.
- Apply parallelism increased: 4 parallel apply processes on the slave.
- After some hours the rate decreases, but it reaches a stable state at 33 replicas per second.
- A sustained rate of 33 replicas per second was achieved.
- No flow control on the master was detected.
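The apply parallelism is an Oracle Streams apply parameter; a minimal sketch of raising it on the replica, assuming a cx_Oracle DBA session (the apply process name 'LFC_APPLY' is illustrative, not the one used in this test):

```python
import cx_Oracle

def set_apply_parallelism(conn, apply_name, parallelism):
    cur = conn.cursor()
    cur.execute(
        "BEGIN DBMS_APPLY_ADM.SET_PARAMETER(:name, 'parallelism', :value); END;",
        name=apply_name, value=str(parallelism))

# e.g. set_apply_parallelism(conn, "LFC_APPLY", 4)
```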

Conclusions
- Even though this test setup is less powerful than the production one, the sustained insertion rate achieved is already higher than LHCb needs.
- We need to test random read access to understand if and how replication impacts the response time.
- It could be interesting to understand the best replication rate achievable with this setup, even if not requested by the experiments.