RECONSTRESSing: stress testing COOL reading of athena reconstruction clients. Database mini workshop, CERN.


1 RECONSTRESSing: stress testing COOL reading of athena reconstruction clients. David Front, Weizmann Institute. Database mini workshop, CERN, 26 Jan 07

2 Contents
● goals
● testing means
● results
● conclusions
● resources/further testing
● related work

3 Goals
● Stress test 100 athena reconstruction clients, each reading the same COOL data
● The reading client is either local to the data-server site or remote
● Compare reading means: Oracle/squid/sqlite
● Compare frontier compression levels
● Supply input / recommend: will ATLAS benefit from using squid?

4 Testing means: servers/clients
● Oracle server: DB cooldev
– Host: lxfsrk402, Intel Pentium III CPU 1133 MHz, 2 CPUs, 1 GB RAM
● Frontier/squid server:
– Host: lxb5556, Intel Xeon CPU 2.80 GHz, 2 CPUs, 2 GB RAM
● Client machines: lxb machines:
– Pentium III (Coppermine), 999 MHz, 2 CPUs, 512 MB RAM
– 1 'manager' to spawn/monitor work + 4 'worker nodes'

5 Testing means: SW
● The client: Richard Hawkings's client, which queries the data needed for one reconstruction job: ~50 COOL folders, amounting to ~10 MB of payload data
● 'Verification client': framework/scripts to run stress tests on multiple machines, and to monitor and present results
● 'Test variables' (things that changed between tests):
– Max number of clients running at the same time: 1 – 50
– Connection type: Oracle, squid, sqlite
● Scheduling enhanced not to spawn a new client if load > 10
● Added lemon graphs of the involved (server/client) machines
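The scheduling rule above (no new client while the host load exceeds 10) can be sketched as a small spawn loop. This is a hypothetical reconstruction, not the actual framework code; `client_cmd` is a placeholder for the real reconstruction-client command line:

```python
import os
import subprocess
import time

LOAD_LIMIT = 10  # do not spawn a new client while the 1-min load average exceeds this


def can_spawn(load, limit=LOAD_LIMIT):
    """Return True when the host is idle enough to start another client."""
    return load < limit


def run_clients(client_cmd, n_clients, poll_seconds=5):
    """Spawn n_clients copies of client_cmd, holding back while the load is high."""
    procs = []
    while len(procs) < n_clients:
        if can_spawn(os.getloadavg()[0]):
            procs.append(subprocess.Popen(client_cmd))
        else:
            time.sleep(poll_seconds)  # back off until the load drops
    return [p.wait() for p in procs]
```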

6 Results. In the following graphs: the Y axis is the average of the measures taken by each client; the X axis is the max number of clients that should run. The different access modes to the data appear in different colors; where all modes agree, only the color of the latest mode is visible.

7 Amount of clients that passed. For each unit on the X axis, 5 clients should run, one after the other. Hence the expected numbers of passed clients are: (1x5=) 5, 100, 150, 200, 250.
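The expected pass counts are just runs-per-point times the max-client setting. A quick check, assuming the X-axis points are 1, 20, 30, 40 and 50 clients (the (1x5=)5 start and the 250 end match the numbers on the slide):

```python
RUNS_PER_POINT = 5  # five clients run one after the other per X-axis point


def expected_passed(max_clients_points):
    """Expected number of passed clients at each X-axis point."""
    return [RUNS_PER_POINT * m for m in max_clients_points]


print(expected_passed([1, 20, 30, 40, 50]))  # [5, 100, 150, 200, 250]
```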

8 Elapsed time to read
- 'Elapsed' is the average number of seconds each client runs.
- Elapsed tends to grow with max-num-clients, up to hundreds of seconds.
- At larger numbers of clients, squid does better than Oracle.
- sqlite is the fastest, but at 50 clients its advantage becomes smaller.
- A few hundred seconds is acceptable; however, 'remote' clients are expected to have a TBD higher 'elapsed'.

9 Throughput (MB/sec)
- Combined throughput saturates (around 50 clients).
- The higher the number of clients, the more squid seems to outperform Oracle.
- sqlite does better than squid and Oracle (because it uses a copy of the DB on the file system).
- TBD: the sqlite slowdown at 40/50 clients may be due to client CPU or AFS access constraints.
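Since each client reads ~10 MB of payload (slide 5), the combined throughput follows from the number of concurrent clients and their average elapsed time. A back-of-the-envelope sketch, with illustrative elapsed values rather than the measured ones:

```python
PAYLOAD_MB = 10.0  # ~10 MB of COOL payload per reconstruction client


def combined_throughput(n_clients, avg_elapsed_seconds):
    """Aggregate MB/sec delivered to all clients together."""
    return n_clients * PAYLOAD_MB / avg_elapsed_seconds


# Illustrative only: if 50 clients each take 500 s on average,
# the servers are delivering about 1 MB/s in aggregate.
rate = combined_throughput(50, 500.0)
```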

10 Client host load
- 'Load' is the number of processes waiting for CPU.
- Client load is limited to 10 on purpose, in order to prevent overload.

11 Actual number of running clients (vs expected)
- At 40 and 50 max clients, more sleeping is done before spawning clients, to avoid overload.

12 Lemon: squid/frontier server write data. The amount of writing for 50 squid clients does not seem to be bigger than for 40 clients.

13 Lemon: squid/frontier server load. (Plot error: host down.) The load of the squid/frontier server is low, never above 3.

14 Oracle server write data. Oracle-client reading shows up well; squid reads hardly show at the Oracle server (because of caching).

15 Oracle server load
- The Oracle server is more loaded than the squid/frontier server.
- At the arrow, squid reading does seem to cause visible load on the Oracle server.

16 Oracle server CPU. Oracle reads consume more Oracle CPU than squid reads, but the amount of CPU consumed by squid is high relative to its low traffic (maybe because frontier does not use bind variables).
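The bind-variable remark matters because the server must hard-parse every distinct literal statement, while a bound statement is parsed once and reused with different parameter values. A minimal illustration with Python's built-in sqlite3 module (the same idea applies to Oracle); the table and values are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE folder (iov INTEGER, payload TEXT)")
conn.executemany("INSERT INTO folder VALUES (?, ?)", [(1, "a"), (2, "b")])

# Literal SQL: every distinct iov value produces a new statement text,
# so the server must parse each one afresh.
literal = conn.execute("SELECT payload FROM folder WHERE iov = 2").fetchone()

# Bind variable: the statement text stays constant and only the parameter
# changes, so a prepared statement can be reused across queries.
bound = conn.execute("SELECT payload FROM folder WHERE iov = ?", (2,)).fetchone()
```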

17 Results summary
● Scaling of this testing environment saturates (around 50 clients)
● 'Elapsed' time tends to grow to hundreds of seconds
● squid MEM_HIT snapshot: 93%, meaning that squid caching works well (in particular, the 'select 1 from dual' queries have been observed to be cached)
● Performance bottlenecks:
– Clients: sleep to avoid high self-load; more clients are needed to scale further
– Squid/frontier server: was not highly loaded, did not reach full capacity
– Oracle server: worked harder than the squid server, but not fully utilized. TBD: study Oracle wait events, looking for potential performance improvements
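The MEM_HIT figure can be derived from squid's access log, whose fourth field in the native format carries the result code (e.g. TCP_MEM_HIT/200). A sketch of the calculation, with made-up log lines rather than the real test logs:

```python
def mem_hit_ratio(log_lines):
    """Fraction of requests served from squid's in-memory cache."""
    codes = [line.split()[3].split("/")[0] for line in log_lines if line.strip()]
    return sum(code == "TCP_MEM_HIT" for code in codes) / len(codes)


# Hypothetical access.log excerpt (native squid log format).
sample = [
    "100.0 12 10.0.0.1 TCP_MEM_HIT/200 1024 GET http://frontier/q - NONE/- text/xml",
    "101.0 80 10.0.0.1 TCP_MISS/200 2048 GET http://frontier/q - DIRECT/ora text/xml",
    "102.0 10 10.0.0.1 TCP_MEM_HIT/200 1024 GET http://frontier/q - NONE/- text/xml",
    "103.0 11 10.0.0.1 TCP_MEM_HIT/200 1024 GET http://frontier/q - NONE/- text/xml",
]
```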

18 Conclusions
● Stress test 100 athena reconstruction clients?
– The available testing environment saturates around 50 clients
– With actual stronger resources, I guess that 100 is feasible
– Care should be taken to make sure that 'elapsed' is acceptable
● Compare data location (remote/local) and frontier compression level:
– Not done yet
– I guess that working remotely, and optimizing the compression level, will work in favor of squid

19 More conclusions
● Compare reading means: Oracle/squid/sqlite
– squid seems to scale better than Oracle
– sqlite is the fastest, but its elapsed time grows fast around 50 clients (hitting the client load limit)
● Supply input to decide whether ATLAS will benefit from using squid
– For a conclusive recommendation, it is better to also do remote testing
– Yet, looking at CMS's positive experience, and assuming that no hard ATLAS show-stoppers appeared in this testing, using squid for tier-2 retrieval seems to me reasonable

20 Testing resources/plans: local
● lxb clients at CERN are scarce and 'local', yet easier to configure/control/monitor than 'remote' machines
● Hence it is planned to continue 'local' tests, shortly:
– Control the squid compression level
– (Partially) simulate network overhead by causing local traffic between client and server to go via a remote site, or by simply sleeping
– Possibly repeat the tests with the new CORAL version with enhanced frontier performance
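The "simply sleep" option in the list above can be as small as a wrapper that delays each query by an assumed WAN round-trip. `read_folder` here is a stand-in for the real COOL read, and the 50 ms delay is an illustrative assumption:

```python
import functools
import time


def with_latency(delay_seconds):
    """Add a fixed artificial delay before each call, crudely simulating
    the extra round-trip cost a 'remote' client would pay."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            time.sleep(delay_seconds)
            return fn(*args, **kwargs)
        return inner
    return wrap


@with_latency(0.05)  # assumed ~50 ms of extra latency per query
def read_folder(name):
    # Stand-in for an actual COOL folder read.
    return f"payload for {name}"
```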

21 Testing resources/plans: remote
● Similar testing is planned to be done remotely:
– Weizmann may allocate some client resources
– Paul Millar from the University of Glasgow suggested arranging the allocation of machines for testing

22 Related work
● Richard Hansen is working on causing local traffic between client and server to go via a remote site, or simply sleeping.
● Submitting similar jobs via the grid simulates the actual workload better than submitting jobs from dedicated clients. Stefan Stonjek and Sasha Vanyashin are looking in that direction.

23 Link
● Raw data related to this presentation is available at https://webafs3.cern.ch/dfront/cool/vc133new/ at the line: athenaRead_ _