
1 Running the multi-platform, multi-experiment cluster at CCIN2P3
Wojciech A. Wojcik, IN2P3 Computing Center, May 2001
e-mail: wojcik@in2p3.fr
URL: http://webcc.in2p3.fr

2 IN2P3 Computing Center
 Provides computing and data services for French high energy and nuclear physicists: IN2P3 (18 physics laboratories, spread across the major French cities) and CEA/DAPNIA.
 French groups are involved in 35 experiments at CERN, SLAC, FNAL, BNL, DESY and other sites (including astrophysics).
 Specific situation: our computing center is not directly attached to any experimental facility such as CERN, FNAL, SLAC, DESY or BNL.

3 General rules
 All groups/experiments share the same interactive and batch (BQS) clusters and the other services (disk servers, tapes, HPSS and networking). Some exceptions are described later.
 /usr/bin and /usr/lib (OS and compilers) are local.
 /usr/local/* is on AFS, specific to each platform.
 /scratch is local temporary disk space.
 System, group and user profiles define the proper environment. (A toy sketch of this layering follows.)
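The profile layering can be pictured with a small sketch. This is purely hypothetical: the real profiles are shell scripts executed at login, and every variable name and path below is invented; the C program only models the precedence (system first, then group, then user, with later levels overriding earlier ones).

/* Hypothetical model of the system/group/user profile layering.
 * Real CCIN2P3 profiles are shell scripts; names and paths are invented. */
#include <stdio.h>
#include <stdlib.h>

static void apply_profile(const char *level, const char *var, const char *value)
{
    setenv(var, value, 1);                       /* 1 = later levels override */
    printf("%-6s profile sets %s=%s\n", level, var, value);
}

int main(void)
{
    apply_profile("system", "SCRATCH", "/scratch");
    apply_profile("group",  "GROUP_DIR", "/afs/in2p3.fr/group/demo");
    apply_profile("user",   "SCRATCH", "/scratch/demo42");   /* user override */
    printf("effective SCRATCH=%s\n", getenv("SCRATCH"));
    return 0;
}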

4 General rules
 Each user has an AFS account with access to the following AFS disk spaces: HOME (backed up by the CC), THRONG_DIR (up to 2 GB, backed up by the CC), GROUP_DIR (n * 2 GB, no backup).
 Data reside on disk (GROUP_DIR, Objectivity), on tape (xtage system) or in HPSS.
 Data exchange is done on the following media: DLT, 9840, or over the network (bbftp).
 ssh/ssf is the recommended access method to/from external domains.

5 Supported platforms
 Linux (RedHat 6.1, kernel 2.2.17-14smp) with the different egcs compilers requested by the experiments: gcc 2.91.66, gcc 2.91.66 with a patch for Objectivity 5.2, and gcc 2.95.2 (installed under /usr/local)
 Solaris 2.6, with 2.7 coming soon
 AIX 4.3.2
 HP-UX 10.20 (the end of this service has already been announced)

6 Support for experiments
 About 35 different high energy, astrophysics and nuclear physics experiments.
 LHC experiments: CMS, ATLAS, ALICE and LHCb.
 Big non-CERN experiments: BaBar, D0, STAR, PHENIX, AUGER, EROS II.

9 Disk space
 Need to make the disk storage independent of the operating system.
 Disk servers based on:
   A3500 from Sun with 3.4 TB
   VSS from IBM with 2.2 TB
   ESS from IBM with 7.2 TB
   9960 from Hitachi with 21.0 TB

10 Mass storage
 Supported media (all in the STK robots): 3490, DLT4000/7000, 9840 (Eagles); limited support for Redwood.
 HPSS, with local developments:
   Interface with RFIO:
     API: C and Fortran (via cfio from CERNLIB)
     API: C++ (iostream)
   bbftp – secure parallel FTP using the RFIO interface
 (A small client-side RFIO sketch follows.)
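To make the RFIO interface concrete, here is a minimal C sketch of a client reading an HPSS-resident file through RFIO's POSIX-like calls (rfio_open, rfio_read and rfio_close mirror open, read and close). The header name varies between RFIO releases and the file path is invented, so treat this as a sketch rather than site-specific code.

/* Minimal sketch: reading an HPSS-resident file through the RFIO C API.
 * rfio_open/rfio_read/rfio_close mirror open/read/close; the header name
 * varies between RFIO releases, and the path below is invented. */
#include <stdio.h>
#include <fcntl.h>
#include <rfio_api.h>   /* assumption: may be "rfio.h" in older releases */

int main(void)
{
    char buf[8192];
    int  n;
    int  fd = rfio_open("/hpss/in2p3.fr/demo/run001.dat", O_RDONLY, 0644);
    if (fd < 0) {
        fprintf(stderr, "rfio_open failed\n");
        return 1;
    }
    while ((n = rfio_read(fd, buf, sizeof buf)) > 0) {
        /* process the n bytes just read */
    }
    rfio_close(fd);
    return 0;
}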

11 Mass storage
 HPSS test and production services:
   $HPSS_TEST_SERVER:/hpsstest/in2p3.fr/…
   $HPSS_SERVER:/hpss/in2p3.fr/…
 HPSS usage:
   BaBar – via ams/oofs and RFIO
   EROS II – already 1.6 TB in HPSS
   AUGER, D0, ATLAS, LHCb
   Other experiments in tests: SNovae, DELPHI, ALICE, PHENIX, CMS

12 Networking – LAN
 Fast Ethernet (100 Mb/s full duplex) --> to the interactive and batch services
 Gigabit Ethernet (1 Gb/s full duplex) --> to the disk servers and the Objectivity/DB server

13 Networking – WAN
 Academic public network “Renater 2”, based on virtual networking with guaranteed bandwidth (VPN on ATM).
 Lyon  CERN at 34 Mb/s (155 Mb/s in June 2001).
 Lyon  US traffic goes through CERN.
 Lyon  ESnet (via STAR TAP), 30–40 Mb/s, reserved for traffic to/from ESnet, except FNAL.

14 BAHIA – interactive front-end
Based on multi-processor machines:
 Linux (RedHat 6.1) -> 10 PentiumII 450 MHz + 12 PentiumIII 1 GHz (2 processors each)
 Solaris 2.6 -> 4 Ultra-4/E450
 Solaris 2.7 -> 2 Ultra-4/E450
 AIX 4.3.2 -> 6 F40
 HP-UX 10.20 -> 7 HP9000/780/J282

15 Batch system – BQS
Batch is based on BQS, a CCIN2P3 product:
 In constant development; in use for 7 years.
 POSIX compliant and platform independent (portable).
 The user can declare the resources a job needs, and the scheduler computes the job's class as a function of them:
   CPU time, memory
   CPU bound or I/O bound
   platform(s)
   system resources: local scratch disk, stdin/stdout size
   user resources (switches, counters)
 (A hypothetical sketch of this classing idea follows.)
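As an illustration only (the actual BQS classing rules are internal to CCIN2P3 and not given in these slides), a scheduler could map the declared resources to a class along these lines; every name and threshold below is invented.

/* Hypothetical sketch of the BQS classing idea: a class is derived from
 * the resources declared at submission. Names and thresholds are ours. */
#include <stdio.h>

typedef struct {
    long cpu_seconds;   /* declared CPU time limit */
    long memory_mb;     /* declared memory limit   */
    int  io_bound;      /* 1 if the job is I/O bound rather than CPU bound */
} job_request;

static const char *job_class(const job_request *r)
{
    if (r->io_bound)            return "IO";
    if (r->cpu_seconds <  600)  return "SHORT";
    if (r->cpu_seconds < 36000) return "MEDIUM";
    return "LONG";
}

int main(void)
{
    job_request r = { 7200, 256, 0 };   /* 2 h of CPU, 256 MB, CPU bound */
    printf("class = %s\n", job_class(&r));
    return 0;
}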

16 Batch system – BQS
 The scheduler takes into account:
   targets for the groups (declared twice a year for the big production runs)
   CPU time consumed over the last month, week and day, per user and per group
   proper aging and interleaving in the class queues
 A worker can be opened for any combination of classes.
 (An illustrative fair-share sketch follows.)
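The fair-share idea can be sketched the same way. The weighting below is invented for illustration (the slides do not give the BQS formula): a group's recent consumption, compared with its declared target, drives how soon its jobs are dispatched.

/* Hypothetical fair-share sketch: recent CPU consumption (day/week/month)
 * is compared with the group's declared target; heavier recent usage
 * lowers the dispatch priority. Weights and formula are illustrative. */
#include <stdio.h>

static double fair_share_priority(double target_share,  /* 0..1, declared */
                                  double used_day,      /* share consumed */
                                  double used_week,
                                  double used_month)
{
    /* recent usage counts more than older usage */
    double recent = 0.5 * used_day + 0.3 * used_week + 0.2 * used_month;
    return target_share - recent;   /* higher = dispatched sooner */
}

int main(void)
{
    printf("priority = %.3f\n", fair_share_priority(0.10, 0.05, 0.12, 0.08));
    return 0;
}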

17 Batch system – configuration
 Linux (RedHat 6.1) -> 96 dual PIII 750 MHz + 110 dual PIII 1 GHz
 Solaris 2.6 -> 25 * Ultra60
 Solaris 2.7 -> 2 * Ultra60 (test service)
 AIX 4.3.2 -> 29 * RS390 + 20 * 43P-B50
 HP-UX 10.20 -> 52 * HP9000/780

18 Batch system – CPU usage (chart not transcribed)

19 Batch system – Linux cluster (figure not transcribed)

20 Regional Center for:
 EROS II (Expérience de Recherches d’Objets Sombres par effet de lentilles gravitationnelles – a search for dark objects via gravitational lensing)
 BaBar
 AUGER (PAO)
 D0

21 EROS II
 Raw data (from the ESO site in Chile) arrive on DLTs in tar format.
 The data are restructured from DLT to 3490 or 9840, and metadata are created in an Oracle DB.
 Data server (under development): currently 7 TB of data, 20 TB by the end of the experiment, using HPSS plus a web server.

22 BaBar
 AIX and HP-UX are not supported by BaBar; supported are Solaris 2.6 with Workshop 4.2 and Linux (RedHat 6.1), with Solaris 2.7 in preparation.
 Data are stored in Objectivity/DB; import/export of data is done with bbftp. Import/export on tapes has been abandoned.
 Ten Objectivity (ams/oofs) servers dedicated to BaBar have been installed.
 HPSS is used for staging the Objectivity/DB files.

23 Experiment PAO (illustration not transcribed)

24 PAO – sites (figure not transcribed)

25 PAO – AUGER
 CCIN2P3 acts as the AECC (AUGER European Computing Center).
 Access is granted to all AUGER users (AFS accounts provided).
 A CVS repository for the AUGER software has been installed at CCIN2P3, accessible from AFS (from the local and non-local cells) and from non-AFS environments using ssh.
 Linux is the preferred platform.
 The simulation software is based on Fortran programs.

26 D0
 Linux is one of the D0-supported platforms and is available at CCIN2P3.
 The D0 software uses the KAI C++ compiler.
 Import/export of D0 data (which use the internal Enstore format) is complicated; we will try to use bbftp as the file transfer program.

27 Import/export (diagram): CCIN2P3 exchanges data with CERN (CASTOR, HPSS), SLAC (HPSS), FNAL (ENSTORE, SAM) and BNL (HPSS); the question marks in the original mark open issues.

28 Problems
 Adding new Objectivity servers (for other experiments) is very complicated: it requires new, separate machines, with modified port numbers in /etc/services. Under development for CMS.
 OS versions and levels differ between experiments.
 Compiler versions differ (mainly those required by Objectivity for the different experiments).
 Solutions?

29 Conclusions
 Data exchange should rely on standards (e.g. files or tapes) and common access interfaces (bbftp and RFIO are good examples).
 Better coordination between experiments is needed, with similar requirements on supported system and compiler levels.
 The choice of CASE technology is out of the control of our CC acting as a Regional Computing Center.
 GRID will require a more uniform configuration of the distributed elements.
 Who can help? HEPCCC? HEPiX? GRID?

