Load Testing
Dennis Waldron, CERN IT/DM/DA
CASTOR Face-to-Face Meeting, Feb 19th 2009
CERN IT Department, CH-1211 Genève 23, Switzerland (www.cern.ch/it)


Outline
– Certification (functionality) setup
– Stress testing setup
– Current stress tests

Certification Setup (CERT2)

(Diagram: stager, DLF and NS & CUPV databases, frontend headnode; name server synchronisation and mover callbacks between the components.)

Diskservers (hostnames: lxc2disk[21-24]):
– 4 diskservers in total.
– 10 filesystems, 15GB each, 150GB in total per diskserver.
– No RAID.
– Single core, 1.8GB of RAM.
– SLC4, x86_64 ONLY!

Headnodes (hostnames: lxcastorsrv101 (NS & CUPV) and lxcastordev10 (aliased: castorcert2)):
– Dual core, 3.7GB of RAM.
– No RAID.
– SLC4, x86_64 ONLY!
– Running secure and unsecure services.
– Note: the NS & CUPV frontend is shared across all CASTOR development machines!

Database (hostname: castordev64):
– Oracle, non-RAC based.
– RHES4, x86_64, 4GB of RAM, 500GB of disk space.
– Not dedicated; it also contains the schemas for:
  – All certification stager and DLF databases.
  – Three development setups.
  – The development NS and CUPV.
  – Development SRM, Repack and VDQM.

NO TAPE CERTIFICATION!! DISKCACHE ONLY.

Virtual machines: 1 physical machine, 8 cores, 16GB of RAM ("CASTOR in a BOX").

Certification Setup - Notes

Key principle: "Everything on Virtual Machines". Why?
– Hardware is expensive! (currently: 7 Dom0's = 40 DomU's)
– Minimise idle resources.
– Reduce power consumption.

All CASTOR releases are functionality tested on:
– CASTORCERT (-24)
– CASTORCERT (-6) + XROOT ( )

SLC4, 64-bit guests ONLY!
Fully quattor'ised (a full CERT reinstall takes < 2 hours).
Test suites are evolving all the time: ~250 test cases (mainly functionality).

Future plans:
– Development setup for tape.
– 2 additional certification setups for SLC5 (CERT3 and CERT4).
– 4 SRM endpoints (2.7 and 2.8 series).
– Nightly CVS builds, installations and tests.

Stress Test Setup - ITDC

Diskservers:
– 42 diskservers, out-of-warranty hardware (i.e. > 3 years old).
– All SLC4, x86_64.
– Configured in 5 service classes: default, itdc, diskonly, replicate, repack.
– Mixed hardware types (non-homogeneous), therefore diskservers have:
  – Different capacities.
  – Different numbers of filesystems.
  – Different performance results.

Two databases:
– c2itdcstgdb (stager): Oracle, identical to the production VO databases but non-RAC based (i.e. a single node).
– c2itdcdlfdb (DLF): Oracle, RHES4, x86_64, 4GB of RAM, 500GB of disk space.

Headnodes (8 cores, 16GB of RAM, SLC4, x86_64):
– Headnode 1: jobManager, rmMasterDaemon (master), stager, nsdaemon, repackserver, LSF (master), rhserver (internal/private).
– Headnode 2: jobManager, rmMasterDaemon (slave), stager, LSF (slave), rhserver (public), expertd, mighunter, rechandler, rtcpclientd, dlfserver.

(Diagram, marked OUTDATED!!: the two headnodes and the c2itdcstgdb stager and c2itdcdlfdb DLF databases, connected to production tape storage, production central services and 300 client nodes.)

Stress Test Setup - Notes

– Diskservers with hardware problems are retired (good testing for the admin tools).
– The Oracle database for the stager is:
  – Less powerful than production.
  – Run on a single node. (Why? It provides more detailed deadlock trace files.)
– The setup runs lemon-sensor-castormon, which provides detailed profiling of memory, file descriptor usage, thread activity, the number of core dumps, and key performance indicators.
– Access to 300 dedicated lxbatch nodes, 4-8 cores each:
  – Heavily dependent on resource allocation plans.
  – On average the stress test runs with 100 nodes.

Future plans:
– Installation of an additional 20 diskservers for SLC5 tests.
– Split of resources into 2 stress test setups (SLC4, SLC5, 2.1.[8|9]).
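The kind of per-daemon profiling that lemon-sensor-castormon provides can be approximated with a small stdlib-only sketch. This is not the real sensor (which feeds CERN's Lemon monitoring); the function names here are hypothetical, and the file-descriptor count relies on Linux's /proc filesystem:

```python
# Minimal sketch of a daemon-metrics sampler in the spirit of
# lemon-sensor-castormon: periodically record memory, file-descriptor
# and thread counts for the current process. Hypothetical helper names;
# the real sensor reports to Lemon rather than returning a dict.
import os
import resource
import threading
import time


def sample_metrics():
    """Collect one snapshot of key process health indicators."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    if os.path.isdir("/proc/self/fd"):          # Linux-specific
        open_fds = len(os.listdir("/proc/self/fd"))
    else:
        open_fds = -1                           # unknown on this platform
    return {
        "rss_kb": usage.ru_maxrss,              # peak resident memory (KB on Linux)
        "open_fds": open_fds,
        "threads": threading.active_count(),
    }


def monitor(duration_s, interval_s):
    """Poll sample_metrics() for duration_s seconds and return the series."""
    series = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        series.append(sample_metrics())
        time.sleep(interval_s)
    return series


if __name__ == "__main__":
    for snap in monitor(duration_s=0.3, interval_s=0.1):
        print(snap)
```

During a stress test, trends in such a series (monotonically growing RSS or fd counts, for example) are exactly the kind of inconsistency the slides say testers must hunt for by hand.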

Current Tests

Several categories of tests:
– Throughput tests: maximum IO rates.
– Database load tests: maximum ops/sec.
– Scheduling throughput.
– Dedicated activity tests.

Throughput tests:
– Require enough clients to saturate diskserver connectivity.
– File writing: 100M, 2G and 10G files in a loop.
– Depending on the test case, files may be deleted or read (0..n) times after creation, from the same service class or from different service classes.
– Tape is involved when appropriate.

Database load tests:
– Designed to stress the stager database.
– Expose row lock contention and potential deadlocks.
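The write-in-a-loop throughput test above can be sketched as follows. `copy_out` is a hypothetical stand-in for a CASTOR client transfer (e.g. an rfcp into a service class); here it is a plain local copy so the sketch is self-contained, and the demo uses a 1MB file rather than the 100M/2G/10G sizes from the slide:

```python
# Sketch of a throughput test: write files of a given size in a loop
# and report the achieved MB/s. `copy_out` is a hypothetical stand-in
# for a transfer to a CASTOR service class.
import os
import shutil
import tempfile
import time


def copy_out(src, dst):
    """Stand-in for a CASTOR client transfer (e.g. rfcp)."""
    shutil.copyfile(src, dst)


def write_loop(size_bytes, iterations, workdir):
    """Write `iterations` files of `size_bytes` each; return MB/s achieved."""
    src = os.path.join(workdir, "source.dat")
    with open(src, "wb") as f:
        f.truncate(size_bytes)          # sparse source file of the right size
    start = time.monotonic()
    for i in range(iterations):
        copy_out(src, os.path.join(workdir, f"file_{i}.dat"))
    elapsed = time.monotonic() - start
    return (size_bytes * iterations / 2**20) / elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        rate = write_loop(1 * 2**20, iterations=5, workdir=d)
        print(f"throughput: {rate:.1f} MB/s")
```

A real test would fan this loop out across enough client nodes to saturate the diskservers' network links, and optionally read back or delete each file after creation as the slide describes.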

Current Tests II

Database load tests (continued):
– Require very few physical clients, but many threads!!
– Typical test: mass file queries on a list of files.

Scheduling throughput:
– Designed to reach LSF's maximum submission rate of 20 jobs per second.
– Very small files (3k), but 1000's of clients: twice the number of available job slots.
– Includes replicationOnClose functionality tests (D2T0).

Dedicated activity tests:
– Repack.
– Tape: both migration and recall.
– Race condition tests (a small number of files, random commands).
– Diskserver draining tools.
– The killer! As many tests as possible running simultaneously, all service classes used, all job slots occupied.
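The "few physical clients, many threads" mass-query pattern can be sketched like this. `stat_file` is a hypothetical stand-in for a stager file query (e.g. stager_qry against the stager database); here it only simulates round-trip latency so the sketch runs anywhere:

```python
# Sketch of a database load test: one client hammers a query endpoint
# with many concurrent threads. `stat_file` is a hypothetical stand-in
# for a per-file stager query.
import time
from concurrent.futures import ThreadPoolExecutor


def stat_file(path):
    """Stand-in for querying the stager about one file."""
    time.sleep(0.001)               # simulate query round-trip latency
    return (path, "STAGED")


def mass_query(paths, n_threads):
    """Issue one query per path across n_threads workers; return results and ops/sec."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(stat_file, paths))
    elapsed = time.monotonic() - start
    return results, len(results) / elapsed


if __name__ == "__main__":
    files = [f"/castor/cern.ch/itdc/file_{i}" for i in range(200)]
    results, rate = mass_query(files, n_threads=50)
    print(f"{len(results)} queries at {rate:.0f} ops/sec")
```

Because many threads hit the same rows concurrently, this access pattern is what surfaces the row lock contention and deadlocks mentioned on the previous slide.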

Conclusion

Many test scripts exist (/afs/cern.ch/project/castor/stresstest):
– Heavily customised to the CERN environment.
– Each test should run for hours.

Running the tests requires expert knowledge; there is no green light at the end!
Not all elements of CASTOR are tested.
Tests are customised for each CASTOR version.
Finding bugs requires effort:
– Looking at database states.
– Reviewing errors in log files.
– Hunting for inconsistencies.
– Reviewing monitoring information.

Questions?