1 22 February 2008 GS Group Meeting - EIS section
GS-EIS: Experiment Integration Support section
Five staff:
– Harry Renshall, Section Leader
– Simone Campana, ATLAS support
– Roberto Santinelli, LHCb support
– Andrea Sciaba, CMS support
– Patricia Mendez, ALICE support
Four INFN-funded CERN fellows:
– Alessandro di Girolamo, ATLAS
– Elisa Lanciotti, LHCb
– Nicolo Magini, CMS and ALICE
– Vincenzo Miccio, CMS
One ASGC-funded visitor:
– Gang Qin, ATLAS
In the future we would like to broaden the associations with single experiments where possible, e.g. by leveraging common solutions or by having limited-duration task forces on a particular experiment problem area.

2 22 February 2008 GS Group Meeting - EIS section
H. Renshall:
– member of the WLCG team preparing for LHC startup (CCRC'08) and then production running
– deputy group leader: attend/contribute to departmental management activities
– section leader (light administrative load)
– scientific secretary of the LHCC Computing Resources Review Board and of the Computing Resource Scrutiny Group
– IT link person to the LHCb experiment

3 22 February 2008 GS Group Meeting - EIS section Simone Campana:

4 Experiment Integration Support for ATLAS
Liaison with WLCG and EGEE; in the ATLAS organization: Facility Coordinator
✓ Coordinate Grid middleware and ATLAS distributed software deployment/updates/upgrades
✓ Primary contact for Tier facilities managers
Organize, plan and coordinate ATLAS-wide tests
✓ Tier-0 throughput, DDM Functional Test, CCRC'08
✓ Includes scripting/development of tools, debugging, testing, follow-up…
➡ This is the most time-consuming activity
✓ This activity is done in close collaboration with Birger and Stephane Jezequel from ATLAS
➡ A lot of overlap, but also different scopes
Follow-up of Alessandro's activity on monitoring
✓ Only little effort now from my side (he is now independent)

5 22 February 2008 GS Group Meeting - EIS section Patricia Mendez:

6 22 February 2008 GS Group Meeting - EIS section
EIS commitments: WLCG support to the ALICE experiment
– Maintenance and support of the ALICE VOBOXes together with the AliEn software distribution
– Implementation of gLite middleware within the ALICE WMS
– Establishment of site/service contacts with the experiment
– SAM implementation
– ALICE FDR setup and planning

7 22 February 2008 GS Group Meeting - EIS section
EIS commitments: support to communities beyond HEP
– UNOSAT, Geant4, generic applications (theoretical physics, ITU, Garfield, HARP, QCD...)
– Creation and setup of VOs: gear, UNOSAT, geant4
– Resource research and setup, depending on each application's requirements
– Site/application contact
– Gridification of the applications and their migration onto the Grid environment

8 22 February 2008 GS Group Meeting - EIS section
EIS commitments: CCRC'08 exercises and services
– Follow-up of the ALICE participation, as WLCG contact person
– Additional tasks such as SAM implementation for the experiments, VOBOX setup, etc.
EGI proposal
– Application support working group
EGEE-III

9 22 February 2008 GS Group Meeting - EIS section Roberto Santinelli:

10 GS Group Meeting - EIS section
Supporting LHCb: setting up the LFC distributed service
The goal:
– A redundant and reliable file catalogue service for LHCb, based on LFC
– A system that best matches the LHCb use cases
Implementation:
– A master LFC at CERN and mirrored replicas at Tier-1 sites, using Oracle Streams
Several technical aspects to consider:
– Coherence of data and access control
– Latency in the propagation of updates
The VO support team contributed to the project:
– Definition of the solution and "acceleration" of all steps in the software lifecycle (whenever this was possible)
– Functionality and stress tests
– Readiness of site implementation
The distributed LHCb file catalogue was deployed in time for the currently ongoing combined computing challenge (CCRC'08).
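The latency concern above can be illustrated with a toy model of master-to-replica propagation. This is a sketch only: the class names, site names and the fixed-lag replication model are invented for illustration and do not reflect the real LFC client API or the actual Oracle Streams mechanism.

```python
class Catalogue:
    """Toy model of one LFC instance: a set of logical file names (LFNs)."""
    def __init__(self):
        self.entries = {}  # lfn -> registration time (simulated clock)

class ReplicationModel:
    """Illustrative master/replica setup: each replica sees an update only
    after its own propagation lag (in simulated seconds) has elapsed."""
    def __init__(self, replica_lags):
        self.master = Catalogue()
        self.replicas = {site: (Catalogue(), lag)
                         for site, lag in replica_lags.items()}
        self.clock = 0.0

    def register(self, lfn):
        # All writes go to the master at CERN.
        self.master.entries[lfn] = self.clock

    def advance(self, seconds):
        # Move simulated time forward and apply updates that are old enough.
        self.clock += seconds
        for cat, lag in self.replicas.values():
            for lfn, t in self.master.entries.items():
                if self.clock - t >= lag:
                    cat.entries.setdefault(lfn, t)

    def lagging_sites(self):
        """Sites whose replica is missing at least one master entry."""
        return sorted(site for site, (cat, _) in self.replicas.items()
                      if set(cat.entries) != set(self.master.entries))

model = ReplicationModel({"CNAF": 5.0, "PIC": 60.0})  # lags are made up
model.register("/grid/lhcb/data/file1")
model.advance(10.0)
print(model.lagging_sites())  # only the slower replica is still behind
```

A consistency check like `lagging_sites()` is the kind of thing the functionality and stress tests mentioned above would exercise: read-mostly clients can use any replica, but a freshly registered file is only guaranteed visible at the master until the streams catch up.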

11 GS Group Meeting - EIS section
Supporting LHCb: site readiness for CCRC and beyond
Not only monitoring resources and services (and writing custom tools for that), but also:
– Working with sites and the WLCG service to fix the problems this uncovers
– Negotiating resources
– Channelling problems to/from the VO
(Charts shown: service-class disk space monitoring; the FTS matrix of SRMv2 channels between all T1s.)

12 GS Group Meeting - EIS section
Supporting LHCb: SAM tests
LHCb uses the SAM framework to:
– Check the availability of Computing Elements: queues, WN hardware and software
– Detect operating system and architecture
– Manage the deployment of LHCb software: install (or remove) and publish the appropriate software versions
– Run test simulation, reconstruction and analysis
LHCb SAM jobs run with high priority, with software-manager credentials.
LHCb sensors are integrated in the DIRAC infrastructure:
– When a pilot job arrives on a WN, the test suite is executed
– Results are published in the SAM DB
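The flow above (pilot lands on a worker node, runs a suite of checks, publishes per-test results) can be sketched as follows. The check names and the "all critical tests must pass" policy are illustrative assumptions, not the actual LHCb SAM sensor code.

```python
def run_sam_suite(checks):
    """Run each named check (a zero-argument callable returning True/False)
    and collect SAM-style results. In the real system these would be the
    LHCb test jobs, and results would be published to the SAM DB."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "error"
        except Exception:
            results[name] = "error"  # a crashing test also counts as a failure
    return results

def site_usable(results):
    # Illustrative policy: a site counts as usable only if every test passed.
    return all(status == "ok" for status in results.values())

checks = {
    "os-architecture": lambda: True,      # detect OS/arch on the worker node
    "sw-installation": lambda: True,      # install/publish LHCb software versions
    "reconstruction-job": lambda: False,  # short test reconstruction fails here
}
results = run_sam_suite(checks)
print(results["reconstruction-job"], site_usable(results))
```

Running checks from inside a pilot job, as the slide describes, means the tests see exactly the environment real production jobs would get.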

13 22 February 2008 GS Group Meeting - EIS section Andrea Sciaba:

14 22 February 2008
CMS contact in EIS
– "Grid expert" in CMS: giving advice, solving problems reported by CMS users and developers
– Site commissioning: responsible for SAM in CMS (managing SAM test submission; interfacing with the SAM and Dashboard developers; development of CMS SAM tests; debugging site problems, mainly those exposed by SAM tests)
– VO management: CMS VO manager, processing registration requests and solving VOMRS/VOMS issues; interface with the VOMRS/VOMS developers
– Middleware testing: gLite WMS, CREAM, job priorities
– EGEE TCG: "alternate" CMS representative, giving input on middleware-related issues and future developments
– OSG/EGEE interoperability working group: representing CMS
– Training and documentation: editor of the gLite 3 User Guide; giving tutorials

15 22 February 2008 Nicolo Magini

16 Enabling Grids for E-sciencE, EGEE-II INFSO-RI-031688 (Nicolo' Magini - Third EGEE User Forum)
SAM tests for SRMv2
Start from higher-level functionality: lcg-util tests
– SRMv2-get-SURLs: for ops/dteam, get the path from the BDII and the corresponding space tokens; for VOs, replace with VO-specific plugins (a TFC test was developed for CMS)
– SRMv2-lcg-cp: copy a file to the SRMv2 and copy it back
– SRMv2-lcg-cr: copy a file to the SRMv2 and register it in the LFC file catalogue
– SRMv2-lcg-gt: get a TransferURL with supported protocols
– SRMv2-lcg-gt-rm-gt: verify the ability to correctly remove a file from the SRMv2
– SRMv2-lcg-ls-dir: list a directory on the SRMv2
– SRMv2-lcg-ls: list a file on the SRMv2
Other lower-level SRMv2 functionality will be added.
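The tests above form a natural chain: there is little point running the TransferURL or removal tests against an endpoint where the initial copy already failed. A minimal sketch of that short-circuiting, with a stubbed executor standing in for the real lcg-util invocations:

```python
def run_chain(steps, executor):
    """Run SRMv2 test steps in order; once a step fails, mark the remaining
    (dependent) steps as 'skipped' instead of running them against a broken
    endpoint. `executor` maps a step name to True/False; in reality it would
    shell out to the lcg-util commands listed on the slide."""
    results = {}
    failed = False
    for step in steps:
        if failed:
            results[step] = "skipped"
        else:
            ok = executor(step)
            results[step] = "ok" if ok else "error"
            failed = not ok
    return results

steps = ["SRMv2-lcg-cp", "SRMv2-lcg-cr", "SRMv2-lcg-gt",
         "SRMv2-lcg-gt-rm-gt", "SRMv2-lcg-ls"]

# Simulated endpoint where the copy works but the LFC registration fails:
outcome = run_chain(steps, lambda step: step != "SRMv2-lcg-cr")
print(outcome["SRMv2-lcg-cr"], outcome["SRMv2-lcg-gt"])
```

Whether the real SAM framework skips or still attempts dependent tests is a policy choice; the sketch assumes skipping, which keeps failure reports focused on the first broken operation.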

17 Enabling Grids for E-sciencE
VO support activity for CMS: DDT (Debugging Data Transfers)
During the CMS CSA07 period:
– Define a metric and procedure to commission data transfer links between CMS Tiers: ~4 MB/s sustained for 5 days
– Provide documentation and support on transfer debugging: FTS, SRM operations within the CMS PhEDEx middleware
– ~250 links commissioned in 2007
Current efforts (CCRC'08 and beyond):
– Scale up the rates to the requirements for the data-taking period: 20 MB/s over 24 h
– Ongoing support for transfer debugging: FTS, SRMv2, etc.
– Current global traffic in PhEDEx, counting DDT + CCRC transfers, is approaching the 2008 requirements for CMS: 20 Gbps
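The commissioning metric quoted above ("~4 MB/s sustained for 5 days") can be sketched as a simple streak check over daily average rates. The exact bookkeeping (daily averages, streak reset on any dip) is an illustrative assumption, not the official DDT procedure.

```python
def link_commissioned(daily_rates_mb_s, threshold=4.0, window_days=5):
    """Return True if the link sustained at least `threshold` MB/s for
    `window_days` consecutive days. A single day below threshold resets
    the streak, so the sustained requirement is strict."""
    run = 0
    for rate in daily_rates_mb_s:
        run = run + 1 if rate >= threshold else 0
        if run >= window_days:
            return True
    return False

# A dip on day 3 resets the streak, so 7 days of data are not enough here:
print(link_commissioned([5.1, 4.8, 3.2, 6.0, 4.4, 4.1, 5.5]))
# Five consecutive good days pass the metric:
print(link_commissioned([6.0, 4.4, 4.1, 5.5, 4.9]))
```

The same function with `threshold=20.0` and a 1-day window would express the CCRC'08-era target of 20 MB/s over 24 hours mentioned on the slide.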

18 22 February 2008 GS Group Meeting - EIS section Gang Qin:

19 22 February 2008 GS Group Meeting - EIS section
Storage Space Monitor (for T0, T1s & T2s)
Reliable space info:
– for different storage classes (i.e. Atlas:custodial:nearline, Atlas:replica:online, Atlas, etc.)
– for different time periods: last 24 hours, last month, last year
Current status:
– cron jobs running to fetch daily data
– still a lot of inconsistency, since lots of things are changing (SRMv2 space tokens)
To do: implement the SRMv2 function to get space info for each space token
– functions for different storage types
– cross-check between the BDII (ldap query) and the local command:
  DPM: dpm-qryconf
  CASTOR: stager_qry on the site VOBOX
  dCache: no local query command, ldap query (BDII)
  StoRM: no local query command, info from the site admin
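The BDII-versus-local cross-check mentioned above amounts to comparing two space figures per token and flagging disagreements. A minimal sketch, where the token names, numbers and 5% tolerance are all invented for illustration:

```python
def cross_check(bdii_tb, local_tb, rel_tolerance=0.05):
    """Compare space figures (in TB) from the BDII ldap query against the
    storage system's local command, per space token. Tokens that disagree
    by more than rel_tolerance, or that the local query cannot see at all
    (e.g. dCache/StoRM on the slide), are flagged for follow-up."""
    flagged = {}
    for token, bdii in bdii_tb.items():
        local = local_tb.get(token)
        if local is None:
            flagged[token] = "no local figure available"
        elif abs(bdii - local) > rel_tolerance * max(bdii, local):
            flagged[token] = f"BDII {bdii} TB vs local {local} TB"
    return flagged

# Made-up example figures:
bdii = {"ATLASDATATAPE": 120.0, "ATLASDATADISK": 80.0, "ATLASGROUPDISK": 10.0}
local = {"ATLASDATATAPE": 121.0, "ATLASDATADISK": 60.0}
issues = cross_check(bdii, local)
print(sorted(issues))
```

The "still a lot of inconsistency" remark on the slide is exactly what a check like this would surface while SRMv2 space tokens were in flux.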

20 22 February 2008 GS Group Meeting - EIS section
Lumber: a Lemon sensor
Monitors the status of user-specified processes. Process status:
– '0': everything is OK
– '1': process is not running (temporarily; a restart is tried)
– '2': process is closed (i.e. by an expert working on the system)
– '3': process restart failed: an ALARM mail is sent
Now in production and running on the ATLAS VOBOXes.
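The four status codes above describe a small decision procedure per monitoring pass. A sketch of that logic, with the process check and restart attempt stubbed out as callables (the real sensor would inspect actual processes on the VOBOX):

```python
def lumber_check(is_running, marked_closed, try_restart):
    """One monitoring pass, following the status codes on the slide:
    0 = process OK
    1 = not running; a restart was attempted and succeeded
    2 = process deliberately closed (e.g. by an expert on the system)
    3 = restart failed -> this is where the ALARM mail would be sent."""
    if is_running():
        return 0
    if marked_closed:
        return 2  # intentional downtime: no restart, no alarm
    return 1 if try_restart() else 3

# Process down, not marked closed, restart succeeds -> transient status 1:
print(lumber_check(lambda: False, False, lambda: True))
# Process down and the restart also fails -> status 3 (alarm):
print(lumber_check(lambda: False, False, lambda: False))
```

Distinguishing state 2 from states 1/3 is the key design point: it keeps planned expert interventions from triggering restarts and alarm mail.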

21 22 February 2008 GS Group Meeting - EIS section Alessandro di Girolamo

22 22 February 2008 GS Group Meeting - EIS section
ATLAS-specific tests integrated in the Service Availability Monitor framework
Storage and Computing Element endpoint definition: the intersection between GOCDB and TiersOfATLAS (the ATLAS-specific site configuration file with the cloud model)
– Different services and endpoints might need to be tested using different VOMS credentials
– ATLAS endpoints and paths must be explicitly tested
– The LFC of the cloud (residing in the T1) is used
Monitor the availability of ATLAS critical site services; verify the correct installation and proper functioning of the ATLAS software on each site:
– SE: Put, Get and Del for each SRM endpoint
– CE: GangaRobot on each site executes a real analysis job based on a MC dataset; a large part of the OPS suite also keeps running
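The endpoint definition above (intersect the sites GOCDB knows about with the TiersOfATLAS configuration, keeping the ATLAS-side cloud and credential details) is a straightforward set operation. A sketch with invented site names, cloud labels and VOMS role strings:

```python
def atlas_sam_endpoints(gocdb_sites, tiers_of_atlas):
    """Endpoints to test = sites registered in GOCDB that also appear in
    TiersOfATLAS, keeping the cloud/credential info from TiersOfATLAS.
    Both inputs here are simplified stand-ins for the real data sources."""
    return {site: info for site, info in tiers_of_atlas.items()
            if site in gocdb_sites}

gocdb = {"CERN-PROD", "INFN-T1", "RAL-LCG2", "SOME-OPS-ONLY-SITE"}
toa = {
    "CERN-PROD": {"cloud": "T0", "voms_role": "/atlas/Role=production"},
    "INFN-T1":   {"cloud": "IT", "voms_role": "/atlas/Role=production"},
    "NOT-IN-GOCDB": {"cloud": "XX", "voms_role": "/atlas"},
}
endpoints = atlas_sam_endpoints(gocdb, toa)
print(sorted(endpoints))
```

The per-site `voms_role` field illustrates the slide's point that different endpoints may need testing under different VOMS credentials; a site present only in one of the two sources is simply not tested.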

23 22 February 2008 GS Group Meeting - EIS section
Integration of the Tiers of ATLAS within the Grid
Lumber: a Lemon sensor to monitor the status of critical processes (such as data management and monitoring) running on the ATLAS VOBOXes
– fully integrated into Lemon (exceptions/alarms)
– availability output possible on Service Level Status (SLS)
The publication of the availability status of experiment-specific services into monitoring frameworks like Lemon and SLS is now in progress.
Great effort in testing the Tiers (Tier-0, 1 and 2) supporting ATLAS:
– commissioning of SRMv2 endpoints
– verification of the installation, configuration and proper functioning of the middleware versions and client tools installed
– monitoring of ATLAS-specific critical services

24 22 February 2008 GS Group Meeting - EIS section Enzo Miccio

25 22 February 2008 GS Group Meeting - EIS section

26 22 February 2008 GS Group Meeting - EIS section Elisa Lanciotti

27 22 February 2008 GS Group Meeting - EIS section

28 22 February 2008 GS Group Meeting - EIS section

