SW distribution tests at Lyon
Pierre Girard, Luisa Arrabito, David Bouvet, Yannick Perret, Xavier Canehan, Suzanne Poulat, Rolf Rumler
Initially presented at the LHCb Jamboree, March 7th, 2011; updated on March 25th for the ATLAS CAF

Content
- AFS latency story
  - AFS latency problem schedule
  - LHCb SetupProject tests
  - Preliminary conclusions
- xxx-FS client stress tests
  - Test suite results
  - CVMFS test results
- Conclusions

AFS latency problem schedule (May 2010 - January 2011)

AFS story:
- …/05: SetupProject.sh timeouts (5%)
- 17/06: SetupProject.sh timeouts (0.5%)
- 08/07: new LHCb setup timeout problems
- New AFS server version; read-only AFS volumes for the LHCb SW area; new AFS client
- 03/11: ATLAS timeout problems (50% failures)
- 05/11: ATLAS increased its timeout (3600 s)
- 26/11: PARTLY SOLVED: LHCb increased its timeout (3600 s)
- 07/01: SOLVED: CC-IN2P3 reduced the number of job slots on recent HW

SL5 story:
- 06/09: CC-IN2P3 added new HW (24 logical cores): 110 WNs
- Many WN crashes due to I/O problems
- Many (temporarily) freezing WNs after an OS patch
- Stable SL5 WNs after a new kernel patch

Tests story:
- LHCb test infrastructure set up for different tunings (AFS, SL5, LRMS)
- Tests of different kernel parameters

LHCb job setup tests
[Plot of the LHCb SetupProject test results; source: L. Arrabito]

AFS latency / LHCb job efficiency
All-T1s CPU efficiency during the same period
[Table: monthly LHCb CPU efficiency, January 2010 - January 2011 plus total, for CC-IN2P3 and for all T1s]
Visible effect of adding the latest (24-core) WNs?

Preliminary conclusions
- The LHCb/ATLAS environment setup is very (too) FS-intensive
  - Seen by stracing SetupProject.sh: mostly open() and stat() calls (see the strace sketch below)
- Investigate a job distribution strategy that avoids too many similar jobs on the same WN
  - Suggested by the "lhcb-alone" and "atlas-excluded" test results
- The AFS latency problem is now an AFS client scalability problem
  - Temporarily solved by decreasing the number of job slots on the most recent machines, but...
  - Is that a major concern for the near future? Have other sites already experienced the same problem?
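As a hedged illustration of the stracing mentioned above (a sketch, not the original measurement procedure), the snippet below runs a setup command under strace -f -c and extracts the open()/stat() call counts from the summary; the command line and the syscall list are placeholders.

#!/usr/bin/env python
# Hypothetical sketch: count the file-system syscalls issued by a setup
# script by running it under "strace -f -c".  The command line below is a
# placeholder, not the actual LHCb SetupProject.sh invocation.
import subprocess

SYSCALLS = ("open", "stat", "lstat", "access")

def count_syscalls(cmd):
    """Return a {syscall: call count} dict parsed from the strace -c summary."""
    proc = subprocess.Popen(
        ["strace", "-f", "-c", "-e", "trace=" + ",".join(SYSCALLS)] + cmd,
        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        universal_newlines=True)
    _, summary = proc.communicate()      # strace writes its summary to stderr
    counts = {}
    for line in summary.splitlines():
        fields = line.split()
        # Summary lines look like: "% time  seconds  usecs/call  calls [errors] syscall"
        if len(fields) >= 5 and fields[-1] in SYSCALLS:
            counts[fields[-1]] = int(fields[3])
    return counts

if __name__ == "__main__":
    print(count_syscalls(["bash", "SetupProject.sh", "DaVinci"]))  # placeholder args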

Test suite conditions
For each test:
- Walk of the same directory tree (DAVINCI), sketched below
- The same actions are performed in the same order, LHCb SetupProject-like:
  - stat()
  - open(), with the first block read to ensure the open() is effective
- The cache (if any) is pre-loaded by executing the test once beforehand
- Results are averaged over 4 executions
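The following is a minimal sketch of such a walk, written as a reconstruction of the described procedure rather than the original test suite; the 4 KiB "first block" size is an assumption.

#!/usr/bin/env python
# Minimal sketch of the described stress test (a reconstruction, not the
# original suite): walk a software-area directory tree, stat() every entry
# and open() every regular file, reading its first block so that the open()
# is actually served by the file system under test.
import os
import sys
import time

BLOCK_SIZE = 4096   # "first block" size is an assumption
N_RUNS = 4          # the reported results were averaged over 4 executions

def walk_once(root):
    """One SetupProject-like pass: stat() + open()/read over the whole tree."""
    n_stat = n_open = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            os.lstat(os.path.join(dirpath, name))
            n_stat += 1
        for name in filenames:
            try:
                with open(os.path.join(dirpath, name), "rb") as f:
                    f.read(BLOCK_SIZE)
                n_open += 1
            except (IOError, OSError):
                pass                     # e.g. broken symlinks: ignore in the sketch
    return n_stat, n_open

if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    walk_once(root)                      # pre-load the client cache, if any
    timings = []
    for _ in range(N_RUNS):
        start = time.time()
        n_stat, n_open = walk_once(root)
        timings.append(time.time() - start)
    print("stat(): %d  open(): %d" % (n_stat, n_open))
    print("mean walk time over %d runs: %.2f s" % (N_RUNS, sum(timings) / N_RUNS))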

FS test results
[Plot of the test-suite results for the different file-system clients]

CVMFS test suite conditions
- Different cache sizes
- Dedicated Squid
  - Used by the tested WN only
  - With a pre-loaded LHCb cache
- On the CVMFS client ( ), before each test (see the sketch after this list):
  - The cache was removed
  - The service was restarted
  - The different cache sizes were obtained by running "ls -lR" on sibling directories to grow the cache
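A minimal sketch of that per-test preparation is shown below; the cache directory and the "cvmfs" service name are assumptions about the client installation of that era (not facts from the slides), and the repository path used for the warming "ls -lR" is a placeholder.

#!/usr/bin/env python
# Hypothetical sketch of the per-test CVMFS preparation described above.
# CACHE_DIR and the "cvmfs" service name are assumptions about the local
# client installation; WARM_DIRS is a placeholder repository path.
import shutil
import subprocess

CACHE_DIR = "/var/cache/cvmfs2"             # assumed client cache location
WARM_DIRS = ["/cvmfs/lhcb.cern.ch/lib"]     # placeholder sibling directories

def reset_client():
    """Drop the local cache and restart the client before a test run."""
    shutil.rmtree(CACHE_DIR, ignore_errors=True)
    subprocess.check_call(["service", "cvmfs", "restart"])

def warm_cache(dirs):
    """Run 'ls -lR' on sibling directories so the cache grows to the wanted size."""
    with open("/dev/null", "w") as devnull:
        for d in dirs:
            subprocess.call(["ls", "-lR", d], stdout=devnull)

if __name__ == "__main__":
    reset_client()
    warm_cache(WARM_DIRS)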

CVMFS cache size test results
[Plot of the test-suite timings for the different CVMFS cache sizes]

CVMFS min/max results
[Plot of the minimum and maximum CVMFS test-suite timings]

Comparison with the latest CVMFS (0.2.61), new as of 2011/03/25
[Plot comparing the previous results with CVMFS 0.2.61]

Conclusions
- The LHCb/ATLAS job setup should be optimized
- Multi-VO sites should try to implement a fair distribution of VO jobs over the cluster WNs
  - Restrict the number of similar jobs on a WN
- The issue is (most likely) shared-FS client scalability
  - Checked with AFS, NFS 4.0, and CVMFS
  - Tests must go on:
    - NFS 4.1 (pNFS) still to be tested
    - With other HW (for now, only PowerEdge C6100)
    - By virtualizing the WNs ("divide and rule" principle); a first attempt split a 24-core WN into 2 x 12-core VM-WNs; must be further investigated
- CVMFS
  - Interesting for VO SW distribution (no installation job needed)
  - But beware that latency increases with cache size

Questions & Comments

Other T1s CPU efficiency, from January 2010 to January 2011
[Plots of monthly CPU efficiency for CERN, KIT, PIC, CNAF, NL-T1, and RAL]

Virtualized WN: divide and rule?