ALICE – FAIR Offline Meeting KVI (Groningen), 3-4 May 2010

1 Interactions between sites, management/operations and services required by ALICE
ALICE – FAIR Offline Meeting, KVI (Groningen), 3-4 May 2010
Patricia Méndez Lorenzo (CERN IT, WLCG Project)

2 Outline
- ALICE job submission in 7 points
- ALICE services required at the sites
- Interactions between sites and interoperability
- Summary and conclusions

3 ALICE submission in 7 points
- Job agent infrastructure
- Agents submitted through the Grid infrastructure available at the site
- Real jobs held in a central queue handling priorities and quotas
- Agents provide a standard environment (job wrapper) across the different systems
- Real jobs pulled by the sites (see the sketch below)
- Automated operations
- Extensive monitoring
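
The pull model above can be pictured as a minimal job-agent loop. This is only an illustrative sketch, not AliEn code: the central queue endpoint, the matching call and the wrapper handling are hypothetical placeholders.

```python
import subprocess
import time

# Hypothetical central queue endpoint; the real AliEn TaskQueue protocol differs.
CENTRAL_TASK_QUEUE = "https://taskqueue.example.org/match"

def request_matching_job(capabilities):
    """Ask the central queue for a real job matching this node's capabilities.

    Placeholder: a real agent would authenticate and query CENTRAL_TASK_QUEUE,
    which applies the priorities and quotas held centrally.
    """
    return None  # None means no matching job is currently available

def run_job_agent():
    """Minimal pilot-style loop: the agent pulls real jobs; the site never pushes them."""
    capabilities = {"memory_gb": 2, "platform": "SL5/64b"}
    while True:
        job = request_matching_job(capabilities)
        if job is None:
            break  # nothing matches: the agent ends and frees the batch slot
        # The job wrapper gives every payload the same environment,
        # regardless of the batch system the agent was submitted through.
        subprocess.run(job["wrapper_command"], shell=True, check=False)
        time.sleep(1)

if __name__ == "__main__":
    run_job_agent()
```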

4 WLCG services required at the ALICE sites
- VOBOX
  - With direct access to the experiment software area
  - This area also has to be mounted on all available WNs
- Computing element front-end able to submit directly to the local batch system
  - gLite sites have to provide the CREAM-CE system (version 1.5 at least)
  - Specific ALICE queue behind the CREAM-CE
  - The resource BDII must publish information about this service (a query sketch follows below)
- Xrootd-based storage system
- CPU requirements
  - 2 GB of memory per job for MC and reconstruction jobs
  - Optimization needed for some user analysis jobs
- ALL services are already available for SL5/64b
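
Whether a site publishes its CREAM-CE in the resource BDII can be checked with a plain LDAP query. The sketch below assumes the standard BDII port 2170 and Glue 1.3 attribute names; the host name is a placeholder and the filter may need adjusting for a given site.

```python
import subprocess

SITE_BDII = "site-bdii.example.org"   # placeholder: the site's resource/site BDII host

def list_cream_endpoints(bdii_host: str):
    """Query the BDII (Glue 1.3 schema) for CREAM-CE endpoints.

    Assumes the usual BDII LDAP port 2170 and that the CE publishes
    GlueCEImplementationName=CREAM; adjust the filter if the site differs.
    """
    cmd = [
        "ldapsearch", "-x", "-LLL",
        "-H", f"ldap://{bdii_host}:2170",
        "-b", "o=grid",
        "(GlueCEImplementationName=CREAM)",
        "GlueCEUniqueID",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=False).stdout
    return [line.split(":", 1)[1].strip()
            for line in out.splitlines()
            if line.startswith("GlueCEUniqueID:")]

if __name__ == "__main__":
    for ce in list_cream_endpoints(SITE_BDII):
        print(ce)
```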

5 The VOBOX
- Primary service required at every site
- Service scope:
  - Run (local) ALICE-specific services
  - File system access to the experiment software area (a check sketch follows below)
- Mandatory requirement to enter production, independently of the middleware stack available
  - Same concept valid for gLite, OSG, ARC and generic AliEn sites
  - Same setup requirements for all sites
- Same behavior for T1 and T2 sites in terms of production
  - Differences between T1 and T2 sites are a QoS matter only
  - Applicable to ALL generic services provided at the sites
- Service-related problems should be managed by the site administrators
  - Support ensured at CERN for any of the middleware stacks
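
A quick way to verify that the software area is visible on the VOBOX and on the WNs is to check the conventional path. A minimal sketch: the environment variable follows the usual gLite VO_&lt;VO&gt;_SW_DIR convention and the fallback path is a placeholder.

```python
import os

def check_software_area():
    """Check that the ALICE software area is mounted (and writable) on this node.

    Run on the VOBOX and on a worker node: both must see the same area.
    VO_ALICE_SW_DIR is the usual gLite convention; the fallback path is a placeholder.
    """
    sw_dir = os.environ.get("VO_ALICE_SW_DIR", "/opt/exp_soft/alice")
    if not os.path.isdir(sw_dir):
        return f"MISSING: {sw_dir} is not mounted on this node"
    if not os.access(sw_dir, os.W_OK):
        # WNs usually only need read access; write access matters on the VOBOX
        return f"READ-ONLY: {sw_dir} is mounted but not writable"
    return f"OK: {sw_dir} is mounted and writable"

if __name__ == "__main__":
    print(check_software_area())
```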

6 The gLite3.2-VOBOX layer
- The gLite-VOBOX is a gLite-UI with two added values:
  - Automatic renewal of the user proxy
    - Needed to ensure the correct behavior of the experiment services
    - The voboxd service is responsible for running this process
  - Direct access to the experiment software area
- Access restricted to the VOMS role lcgadmin
  - User mapped to a SINGLE local account
  - Unique service that does not accept pool accounts
- Access to the machine based on a valid VOMS-aware user proxy
  - The gsisshd service provides this access through port 1975
  - NOTE: the access port is configurable, but please DON'T change it

7 Example: the gLite-VOBOX
[Flattened architecture diagram; the recoverable content is summarized below]
- 70% of the ALICE VOBOXes are running the gLite middleware
- ALICE responsibility (user privileges via gsissh): installation, maintenance, upgrades and follow-up of the ALICE services
  - ALICE services: PackMan (9991), MonALISA (8884), ClusterMonitor (8084), CE (talks through the ClusterMonitor) — a reachability sketch follows below
- Site admin responsibility (root privileges): installation, maintenance, upgrades and follow-up of the OS and generic services
  - Generic service: gLite3.2 VOBOX with gsisshd (1975) and vobox-renewd; GridFTP (obsolete); operating system (SL5/64b)
- Connectivity shown in the diagram: from the WNs, from any UI and the SAM UIs, to the world (MyProxy and VOMS servers); specific CERN IP range
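
The service ports listed in the diagram can be probed from a node allowed to reach the VOBOX to see which services answer. A minimal sketch; the host name is a placeholder and site firewall rules may block the probe.

```python
import socket

VOBOX_HOST = "alice-vobox.example.org"   # placeholder host name

# Ports taken from the diagram above
SERVICES = {
    "gsisshd": 1975,
    "ClusterMonitor": 8084,
    "MonALISA": 8884,
    "PackMan": 9991,
}

def probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for name, port in SERVICES.items():
        status = "open" if probe(VOBOX_HOST, port) else "unreachable"
        print(f"{name:15s} {port:5d}  {status}")
```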

8 gLite-VOBOX: the proxy renewal mechanism
[Flattened diagram of the proxy renewal mechanism; the recoverable content is summarized below]
- Prerequisite: the user must have uploaded his proxy to the MyProxy server at CERN beforehand
- User area ($HOME area assigned on the VOBOX):
  - Login via gsissh to the VOBOX from a UI with a valid proxy
  - The user must manually register his proxy with the VOBOX (in principle only once…)
- Service area (renewal daemon):
  - Connects to the MyProxy server at CERN, which requires a delegated plain proxy
  - Connects to the ALICE VOMS server, which requires a valid VOMS extension
  - Builds up a delegated proxy from the registered user proxies (a renewal-loop sketch follows below)
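
The renewal logic can be pictured as a small loop that watches the proxy lifetime and refreshes it from MyProxy plus the VOMS server when it gets short. This is only an illustration of the idea, not the actual vobox renewal daemon; the server names and command options are indicative and should be checked against the local installation.

```python
import subprocess
import time

MYPROXY_SERVER = "myproxy.cern.ch"     # assumption: the CERN MyProxy service
VOMS_NAME = "alice"                    # VO name used for the VOMS extension
RENEW_BELOW_SECONDS = 6 * 3600         # refresh when less than 6 hours remain

def proxy_timeleft() -> int:
    """Remaining lifetime of the current proxy in seconds (0 if none)."""
    out = subprocess.run(["voms-proxy-info", "-timeleft"],
                         capture_output=True, text=True, check=False).stdout.strip()
    return int(out) if out.isdigit() else 0

def renew_proxy():
    """Fetch a plain delegated proxy from MyProxy, then add the VOMS extension.

    Options are indicative only; a real setup also configures how the daemon
    authenticates against the MyProxy server.
    """
    subprocess.run(["myproxy-logon", "-s", MYPROXY_SERVER, "-n"], check=False)
    subprocess.run(["voms-proxy-init", "-voms", VOMS_NAME, "-noregen"], check=False)

if __name__ == "__main__":
    while True:
        if proxy_timeleft() < RENEW_BELOW_SECONDS:
            renew_proxy()
        time.sleep(1800)   # check every 30 minutes
```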

9 gLite-VOBOX operations
- Site side: minimal!
  - One of the most stable services
  - Easy installation and configuration
- Experiment support side: sites are supported at any moment by the CERN team
  - Remember: the gLite3.2 VOBOX was created by the ALICE support team in collaboration with the WLCG
- All VOBOXes are monitored via SAM every hour
  - T0 VOBOXes are monitored via Lemon with an alarm system behind
  - T1 sites can be notified via GGUS alarm tickets
  - T2 sites will be notified on a best-effort basis

10 ALICE Workload Management system
- The ALICE tendency in the last two years has been to minimize the number of services used to submit job agents
  - Already ensured with ARC, OSG and AliEn-native sites
- gLite sites can also follow this approach, through the CREAM-CE system
  - ALICE has been asking for the deployment of this service at all sites since 2008
- ALICE has deprecated the gLite3.X WMS with the deployment of AliEn v2.18
  - The WMS submission module is still available in AliEn v2.18
  - The 31st of May is the deadline established by ALICE to deprecate the usage of the WMS at all sites
  - ALL ALICE sites have to provide (at least) a CREAM-CE 1.5 by that date; sites are encouraged to migrate to CREAM 1.6 asap (a direct-submission sketch follows below)
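
Direct submission to a CREAM-CE, bypassing the WMS, is normally done with the glite-ce-job-submit client against the CE endpoint. The sketch below writes a trivial JDL and calls that client from Python; the endpoint, queue name and JDL contents are placeholders, and in AliEn the equivalent step is performed by the CREAM submission module rather than by hand.

```python
import subprocess
import tempfile

# Placeholder endpoint: <cream-host>:8443/cream-<batch-system>-<queue>
CREAM_ENDPOINT = "cream.example.org:8443/cream-pbs-alice"

# Minimal JDL describing a job-agent-like payload (illustrative only)
JDL = """[
  Executable = "/bin/hostname";
  StdOutput  = "agent.out";
  StdError   = "agent.err";
  OutputSandbox = {"agent.out", "agent.err"};
]"""

def submit_to_cream(endpoint: str) -> str:
    """Submit a job directly to a CREAM-CE and return the client output.

    Uses the gLite CREAM client: '-a' requests automatic proxy delegation,
    '-r' selects the CE endpoint and queue.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".jdl", delete=False) as f:
        f.write(JDL)
        jdl_path = f.name
    result = subprocess.run(
        ["glite-ce-job-submit", "-a", "-r", endpoint, jdl_path],
        capture_output=True, text=True, check=False)
    return result.stdout or result.stderr

if __name__ == "__main__":
    print(submit_to_cream(CREAM_ENDPOINT))
```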

11 ALICE and the CREAM-CE
- First experiment to put the CREAM-CE in production and to provide its developers with important feedback
- 2008 approach: testing
  - Testing phase (performance and stability) at FZK
- 2009 approach: AliEn implementation and distribution
  - System available at T0, all T1 sites (but NIKHEF) and several T2 sites
  - Dual submission (LCG-CE and CREAM-CE) at all sites providing CREAM
  - A second VOBOX was required at the sites providing CREAM to ensure the dual approach asked for by the WLCG-GDB
- 2010 approach: deprecation of the WMS
  - Currently only 5 sites are pending the setup of a CREAM-CE
  - The 2nd VOBOX provided in 2009 has been reused through the failover mechanism included in AliEn v2.18
  - Both VOBOXes submit in parallel to the same local CREAM-CE
- ALICE is actively involved in the operation of the service at all sites, together with the site admins and the CREAM-CE developers

12 ALICE experiences with the CREAM-CE
- Example: the 2009 Christmas period
  - Experiment and site support on a best-effort basis
- Experiment experience
  - Almost 45% of the Christmas production was executed with CREAM-CE resources only
- Site experience
  - Site admins were asked after the vacation period about the operations required at the local CREAM-CE
  - Most of the site admins confirmed that the system needed a negligible effort
- With the increase of the ALICE load in the last months, the stability of the system has been confirmed

13 Failover mechanism: the use of the 2nd VOBOX
- This approach has been included in AliEn v2.18 to take advantage of the 2nd VOBOX deployed at several ALICE sites
  - All sites which provided a CREAM-CE in 2009 had to provide a 2nd VOBOX for ALICE
  - It was used to submit to the LCG-CE and CREAM-CE backends in parallel
- Message to the sites: do not retire the 2nd VOBOX, we will keep on using it in failover mode with the 1st VOBOX
- VOBOX setup in a redundant approach (see the sketch below)
  - Both local VOBOXes are clones of each other and submit to the same local CREAM-CE backend
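
The redundancy can be pictured as a simple failover between two equivalent submission hosts pointing at the same CE backend. The sketch below only illustrates that idea and is not AliEn v2.18 code; the host names and the submission routine are hypothetical.

```python
# Illustrative failover between two cloned VOBOXes submitting job agents
# to the same local CREAM-CE backend. Host names and the submission call
# are placeholders, not AliEn v2.18 internals.

VOBOXES = ["vobox1.example.org", "vobox2.example.org"]   # the two cloned VOBOXes
CREAM_BACKEND = "cream.example.org:8443/cream-pbs-alice"

class SubmissionError(Exception):
    """Raised when one VOBOX cannot submit (service down, proxy expired, ...)."""

def submit_agent_via(vobox: str, cream_endpoint: str) -> str:
    """Placeholder: a real implementation would submit one job agent
    through this VOBOX to the given CREAM-CE endpoint."""
    raise SubmissionError(f"{vobox} unavailable in this sketch")

def submit_with_failover() -> str:
    """Try each cloned VOBOX in turn until one submission succeeds."""
    last_error = None
    for vobox in VOBOXES:
        try:
            return submit_agent_via(vobox, CREAM_BACKEND)
        except SubmissionError as err:
            last_error = err   # this clone failed, fall over to the next one
    raise RuntimeError(f"all VOBOXes failed: {last_error}")

if __name__ == "__main__":
    try:
        print(submit_with_failover())
    except RuntimeError as err:
        print(err)
```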

14 ALICE Data Management system
- (Some highlights only; Fabrizio's baby)
- Data access and file transfer: via xrootd
- Data access
  - T0 and T1 sites: interfaced with all supported MSS
  - T2 sites: pure xrootd setup (ALICE advice)
- File transfer
  - T0-T1: raw data transfer for custodial and reconstruction purposes; no specific assignments between data and sites
  - T1-T2: reconstructed data and analysis (NOTE: the T1-T2 assignment is not predefined in the ALICE computing model)
- Default behavior: jobs are sent where the data are physically stored
  - External files can be locally copied or remotely accessed, based on the user's decision (see the sketch below)
  - Job outputs are locally stored and replicated to (by default) two external sites
  - The choice is based on the MonALISA (ML) status report of the SE and on physical proximity
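
Copying a file locally and reading it remotely both go through xrootd. The sketch below calls the standard xrdcp client from Python; the redirector host and file path are placeholders, since in production the URL comes from the AliEn file catalogue.

```python
import subprocess

# Placeholder xrootd URL; in production this comes from the AliEn file catalogue
REMOTE_FILE = "root://xrootd-redirector.example.org//alice/data/example.root"

def copy_locally(xrootd_url: str, local_path: str) -> bool:
    """Copy an xrootd-hosted file to local disk with the xrdcp client."""
    result = subprocess.run(["xrdcp", xrootd_url, local_path], check=False)
    return result.returncode == 0

if __name__ == "__main__":
    # The alternative is to open the root:// URL directly from the analysis
    # application and read the file remotely instead of copying it first.
    ok = copy_locally(REMOTE_FILE, "/tmp/example.root")
    print("copy succeeded" if ok else "copy failed")
```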

15 Site connections and interoperability
- Site connections exist only at the data management level
  - In terms of job submission and services, sites are independent of each other
  - The ALICE computing model foresees the same service infrastructure at all sites
  - ALICE assumes the T1-T2 connection established by the ROC infrastructure for site control and issue escalation
- Interoperability between sites
  - Defined at the AliEn level
  - Definition of plug-ins for each middleware stack (see the sketch below)
  - Same AliEn setup for all sites
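
The per-middleware plug-in idea can be illustrated with a small common interface that each back end implements. This is schematic only and not the AliEn CE module API; all class and method names here are invented for illustration.

```python
from abc import ABC, abstractmethod

class MiddlewarePlugin(ABC):
    """Common interface that every middleware back end would implement.

    Schematic only: the real AliEn CE modules define their own API.
    """

    @abstractmethod
    def submit_agent(self, queue: str) -> str:
        """Submit one job agent and return a back-end-specific job id."""

class CreamPlugin(MiddlewarePlugin):
    def submit_agent(self, queue: str) -> str:
        return f"cream-job-id-for-{queue}"   # would call the CREAM client here

class ArcPlugin(MiddlewarePlugin):
    def submit_agent(self, queue: str) -> str:
        return f"arc-job-id-for-{queue}"     # would call the ARC client here

def pick_plugin(middleware: str) -> MiddlewarePlugin:
    """Same AliEn setup everywhere: only the plug-in differs per site."""
    return {"cream": CreamPlugin(), "arc": ArcPlugin()}[middleware]

if __name__ == "__main__":
    print(pick_plugin("cream").submit_agent("alice"))
```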

16 Summary and conclusions
- For data taking, the ALICE approach is to lighten the site requirements in terms of services
  - Come back to the list of service requirements!
  - This approach is quite realistic and has been demonstrated to work quite well
  - It scales to the level of manpower available at the core team at CERN and at the sites
- High performance of the VOBOXes, the CREAM-CE and xrootd is needed
  - This is critical: if any of these services fails, your site is down
- Besides the T0 site, the rest of the sites are all redundant
  - One T1 down will not stop the ALICE computing programme
- Sites are supported by a team of experts at any time
  - This approach has demonstrated its scalability: the same team for years, with a clear increase in the number of sites entering production
- During the 1st data taking, the ALICE computing model and its infrastructure have demonstrated a high level of stability and performance. It works!

