AliEn central services (structure and operation)

1 AliEn central services (structure and operation)

2 Central machines
ALICE Offline Week - July 2008
5 32bit, 15 64bit, 6 Macs
26 machines on 2 * 1Gbit uplinks
Different roles; the MonALISA services running on them report machine monitoring + each service's status at:
19.2 KVA UPSs (15m..50m backup):

3 AliEn services – User interaction
3 Authen, 3 Proxy, 2 User API Services, 4 Jobs API Services

4 AliEn services – Internal services
PackMan, IS, Logger, TransferMgr
MonALISA repository
PackMan, Optimizers (Transfer, Catalogue, Jobs)
MySQL – Catalogue, LDAP master
MySQL – Task Queue, LDAP slave
(currently there are >44M entries in the catalogue, ~100x more than what you have on a PC)
Alice::CERN::SE xrootd redirector

5 AliEn services – backup
pcalienstorage: 9TB raw / 6TB available for backup
MySQL slave for both catalogue and task queue DBs (weekly stop / take snapshot / restart)
/backup on all machines is mounted over NFS from this machine
/opt/alien on all central machines is also mounted from this machine over NFS, with different base paths for each architecture
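The weekly "stop / take snapshot / restart" cycle on the MySQL slave can be sketched as below. The slide does not say how the slave is stopped or how the snapshot is taken, so the three steps are passed in as callables; the function name `weekly_snapshot` and its parameters are hypothetical. The only point illustrated is the ordering, and that the slave is restarted even when the snapshot step fails.

```python
from typing import Callable


def weekly_snapshot(
    stop: Callable[[], None],
    take_snapshot: Callable[[], None],
    restart: Callable[[], None],
) -> None:
    """Run stop -> snapshot -> restart on a database slave.

    All three callables are placeholders (assumptions): the slides do not
    describe the actual commands used on pcalienstorage.
    """
    stop()  # quiesce the slave so the data files are consistent
    try:
        take_snapshot()  # copy the data while nothing is being written
    finally:
        restart()  # always resume the slave, even if the snapshot failed


if __name__ == "__main__":
    weekly_snapshot(
        lambda: print("stopping slave"),
        lambda: print("taking snapshot"),
        lambda: print("restarting slave"),
    )
```

Restarting in a `finally` clause keeps a failed snapshot from leaving the slave stopped until the next weekly run.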

6 Build servers
32bit SLC4
64bit SLC4
32bit OSX 10.5
64bit OSX 10.5
(+Itanium build server in CC)

7 DNS load balancing
Each machine reports its full status through ML (MonALISA) to the central repository, including:
  Operational status of each service (tested every 15m)
  Load on the machine, CPU, memory and swap utilisation
  No. of connected sockets
A weighted score is generated from the parameters above. Every minute the CERN DNS aliases are updated with the IP addresses of the machines that are not overloaded.
Users and site services query the aliases when connecting to the central services; this distributes the load evenly between the active machines and limits the damage that can be caused to the central services.
TODO: faster reaction times to services not working / overloaded
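A minimal sketch of how a weighted score could decide which machines stay behind a DNS alias. The slides name the inputs (service status, load, CPU, memory, swap, connected sockets) but not the formula, so the weights, the socket normalisation, the threshold and all names below are assumptions made for illustration, not the values used by the central services.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MachineStatus:
    ip: str
    services_ok: bool  # operational status of the services on the machine
    load: float        # load, normalised to 0..1 (assumption)
    cpu: float         # CPU utilisation, 0..1
    memory: float      # memory utilisation, 0..1
    swap: float        # swap utilisation, 0..1
    sockets: int       # number of connected sockets


# Hypothetical weights and limits; the real ones are not given in the slides.
WEIGHTS = {"load": 0.3, "cpu": 0.2, "memory": 0.2, "swap": 0.2, "sockets": 0.1}
MAX_SOCKETS = 1000
OVERLOAD_THRESHOLD = 0.8


def clamp(x: float) -> float:
    """Limit a value to the range 0..1."""
    return max(0.0, min(1.0, x))


def score(m: MachineStatus) -> float:
    """Return 0.0 (idle) .. 1.0 (overloaded or services down)."""
    if not m.services_ok:
        return 1.0
    return (
        WEIGHTS["load"] * clamp(m.load)
        + WEIGHTS["cpu"] * clamp(m.cpu)
        + WEIGHTS["memory"] * clamp(m.memory)
        + WEIGHTS["swap"] * clamp(m.swap)
        + WEIGHTS["sockets"] * clamp(m.sockets / MAX_SOCKETS)
    )


def alias_members(machines: List[MachineStatus],
                  threshold: float = OVERLOAD_THRESHOLD) -> List[str]:
    """IPs to publish under the alias, least loaded first."""
    ranked = sorted(machines, key=score)
    return [m.ip for m in ranked if score(m) < threshold]


if __name__ == "__main__":
    machines = [
        MachineStatus("192.0.2.1", True, 0.1, 0.1, 0.1, 0.0, 100),
        MachineStatus("192.0.2.2", False, 0.1, 0.1, 0.1, 0.0, 100),
        MachineStatus("192.0.2.3", True, 1.0, 1.0, 1.0, 1.0, 2000),
    ]
    print(alias_members(machines))
```

In this sketch a machine whose services fail the test gets the worst possible score, so it drops out of the alias regardless of how idle it is.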

8 DNS load balancing in action
Wed Jul 9 07:23:24 CEST 2008 : alice-proxy
Thu Jul 10 13:40:38 CEST 2008 : alice-proxy
Thu Jul 10 13:44:52 CEST 2008 : alice-proxy

9 Making use of the Macs
6 8-core machines with 8GB of RAM... sounds very tempting!
Pablo managed to start both Authen and Proxy on alimacx01 in almost no time, BUT...
The services kept crashing very quickly:
  Default ulimit -u : 266
  Max ulimit -u : 2500
With these constraints, we cannot use the machines for anything that spawns many processes (e.g. Proxy). Authen runs fine though, as probably would several other central services.
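The limit reported by `ulimit -u` is the per-user process limit, which Python exposes on Unix as `resource.RLIMIT_NPROC`. The sketch below reads the current soft and hard limits and checks whether a service would fit under them. The 266 and 2500 figures come from the slide; the helper names and the example requirement of 300 processes are hypothetical.

```python
import resource
from typing import Tuple


def current_process_limits() -> Tuple[int, int]:
    """Return the (soft, hard) per-user process limits, as `ulimit -u` sees them."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


def fits_process_limit(required: int, limit: int) -> bool:
    """True if a service needing `required` processes fits under `limit`."""
    # RLIM_INFINITY means "no limit"; it must be checked explicitly
    # because its numeric value is not a usable upper bound.
    return limit == resource.RLIM_INFINITY or required <= limit


if __name__ == "__main__":
    soft, hard = current_process_limits()
    print("soft limit:", soft, "hard limit:", hard)
    # Hypothetical requirement of 300 processes against the limits on the slide.
    print(fits_process_limit(300, 266))   # default limit on the Macs
    print(fits_process_limit(300, 2500))  # maximum limit on the Macs
```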

10 Running jobs profile

11 Load comparison
"The more jobs, the fewer problems"? (Not quite: the load is higher when many jobs start / finish, or worse, when an SE is not available and causes an avalanche of failing jobs.)

12 Load at >10k jobs

13 Running jobs vs. Load (last 6 months, 2-hour averages)

14 Future plans
Upgrade old central machines (2+ years) with more modern hardware (8 cores, 16-32GB RAM, fast SAS drives)
Use all available resources (especially the Macs) to be prepared to run at least 2x more jobs
Install two additional power lines (16A) to accommodate the greedy hardware
Maybe install an additional AC unit...

15 Last slide :)

