+ AliEn status report Miguel Martinez Pedreira. + Touching the APIs Bug found, not sending site info from ROOT to central side was causing the sites to.

Slides:



Advertisements
Similar presentations
DataTAG WP4 Meeting CNAF Jan 14, 2003 Interfacing AliEn and EDG 1/13 Stefano Bagnasco, INFN Torino Interfacing AliEn to EDG Stefano Bagnasco, INFN Torino.
Advertisements

New Release Announcements and Product Roadmap Chris DiPierro, Director of Software Development April 9-11, 2014
ALICE G RID SERVICES IP V 6 READINESS
CERN LCG Overview & Scaling challenges David Smith For LCG Deployment Group CERN HEPiX 2003, Vancouver.
IWay Service Manager 6.1 Product Update Scott Hathaway iWay Software Copyright 2010, Information Builders. Slide 1.
1 Using Subversion: Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
INFSO-RI Enabling Grids for E-sciencE EGEE Middleware The Resource Broker EGEE project members.
Batch Production and Monte Carlo + CDB work status Janusz Martyniak, Imperial College London MICE CM37 Analysis, Software and Reconstruction.
Using subversion COMP 2400 Prof. Chris GauthierDickey.
Version Control. What is Version Control? Manages file sharing for Concurrent Development Keeps track of changes with Version Control SubVersion (SVN)
© 2009 IBM Corporation 1 ClearQuest Synchronizer and ClearQuest Bridge Tech Enablement for CLM 4.0 Lorelei Ngooi & Yuhong Yin June 2012.
AliEn Tutorial MODEL th May, May 2009 Installation of the AliEn software AliEn and the GRID Authentication File Catalogue.
G RID SERVICES IP V 6 READINESS
How a little code can help with support.. Chris Barba – Developer at Cimarex Energy Blog:
Experiment Support CERN IT Department CH-1211 Geneva 23 Switzerland t DBES P. Saiz (IT-ES) AliEn job agents.
Real Time Monitor of Grid Job Executions Janusz Martyniak Imperial College London.
Experiment Support CERN IT Department CH-1211 Geneva 23 Switzerland t DBES PhEDEx Monitoring Nicolò Magini CERN IT-ES-VOS For the PhEDEx.
Costin Grigoras ALICE Offline. In the period of steady LHC operation, The Grid usage is constant and high and, as foreseen, is used for massive RAW and.
Write-through Cache System Policies discussion and A introduction to the system.
Panda Grid Status Kilian Schwarz, GSI on behalf of PANDA GRID Group (slides to a large extend from Radoslaw Karabowicz)
Framework of Job Managing for MDC Reconstruction and Data Production Li Teng Zhang Yao Huang Xingtao SDU
First attempt for validating/testing Testbed 1 Globus and middleware services WP6 Meeting, December 2001 Flavia Donno, Marco Serra for IT and WPs.
Site operations Outline Central services VoBox services Monitoring Storage and networking 4/8/20142ALICE-USA Review - Site Operations.
G4MICE Status and Plans 1M.Ellis - CM24 - RAL - 31st May 2009  Firstly, a correction to the agenda:  I failed to spot a mistake in the agenda that I.
+ AliEn status report Miguel Martinez Pedreira. + APIs dcache issue having lfn-like pfns root://srm.ndgf.org:1094//alice/cern.ch/user/a/alitrain/PWGJE/Jets_PbPb_2011/104_
Grid Scheduler: Plan & Schedule Adam Arbree Jang Uk In.
DDM Monitoring David Cameron Pedro Salgado Ricardo Rocha.
Andrei Gheata, Mihaela Gheata, Andreas Morsch ALICE offline week, 5-9 July 2010.
Analysis trains – Status & experience from operation Mihaela Gheata.
August 28, 1998Handling requests with a trouble ticket system at DESY Zeuthen1 Wolfgang Friebel Motivation The req/reqng request tracking system Enhancements.
Working with AliEn Kilian Schwarz ALICE Group Meeting April
MultiJob pilot on Titan. ATLAS workloads on Titan Danila Oleynik (UTA), Sergey Panitkin (BNL) US ATLAS HPC. Technical meeting 18 September 2015.
AliEn AliEn at OSC The ALICE distributed computing environment by Bjørn S. Nilsen The Ohio State University.
MySQL and GRID status Gabriele Carcassi 9 September 2002.
G.Govi CERN/IT-DB 1 September 26, 2003 POOL Integration, Testing and Release Procedure Integration  Packages structure  External dependencies  Configuration.
SAM Sensors & Tests Judit Novak CERN IT/GD SAM Review I. 21. May 2007, CERN.
Data & Storage Services CERN IT Department CH-1211 Genève 23 Switzerland t DSS XROOTD news New release New features.
JAliEn Java AliEn middleware A. Grigoras, C. Grigoras, M. Pedreira P Saiz, S. Schreiner ALICE Offline Week – June 2013.
+ AliEn site services and monitoring Miguel Martinez Pedreira.
Image Distribution and VMIC (brainstorm) Belmiro Moreira CERN IT-PES-PS.
EGEE is a project funded by the European Union under contract IST Package Manager Predrag Buncic JRA1 ARDA 21/10/04
STAR Scheduling status Gabriele Carcassi 9 September 2002.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks The LCG interface Stefano BAGNASCO INFN Torino.
LHCbDirac and Core Software. LHCbDirac and Core SW Core Software workshop, PhC2 Running Gaudi Applications on the Grid m Application deployment o CVMFS.
Experiment Support CERN IT Department CH-1211 Geneva 23 Switzerland t DBES L. Betev, A. Grigoras, C. Grigoras, P. Saiz, S. Schreiner AliEn.
Gennaro Tortone, Sergio Fantinel – Bologna, LCG-EDT Monitoring Service DataTAG WP4 Monitoring Group DataTAG WP4 meeting Bologna –
CASTOR Operations Face to Face 2006 Miguel Coelho dos Santos
Experiment Support CERN IT Department CH-1211 Geneva 23 Switzerland t DBES P. Saiz The future of AliEn.
D.Spiga, L.Servoli, L.Faina INFN & University of Perugia CRAB WorkFlow : CRAB: CMS Remote Analysis Builder A CMS specific tool written in python and developed.
The GridPP DIRAC project DIRAC for non-LHC communities.
Experiment Support CERN IT Department CH-1211 Geneva 23 Switzerland t DBES The AliEn File Catalogue Jamboree on Evolution of WLCG Data &
Log Shipping, Mirroring, Replication and Clustering Which should I use? That depends on a few questions we must ask the user. We will go over these questions.
GGUS summary (3 weeks) VOUserTeamAlarmTotal ALICE7029 ATLAS CMS LHCb Totals
SCDB Update Michel Jouvin LAL, Orsay March 17, 2010 Quattor Workshop, Thessaloniki.
Git workflows: using multiple branches for parallel development SE-2800 Dr. Mark L. Hornick 1.
BaBar & Grid Eleonora Luppi for the BaBarGrid Group TB GRID Bologna 15 febbraio 2005.
CERN IT Department CH-1211 Genève 23 Switzerland t DPM status and plans David Smith CERN, IT-DM-SGT Pre-GDB, Grid Storage Services 11 November.
Creating a simplified global unique file catalogue Miguel Martinez Pedreira Pablo Saiz.
Connect:Direct for UNIX v4.2.x Silent Installation
Running a job on the grid is easier than you think!
Technical Board Meeting, CNAF, 14 Feb. 2004
ALICE FAIR Meeting KVI, 2010 Kilian Schwarz GSI.
Patricia Méndez Lorenzo ALICE Offline Week CERN, 13th July 2007
Miguel Martinez Pedreira
Storage elements discovery
Pablo Saiz CAF and Grid User Forum
Introduction of Week 3 Assignment Discussion
Topic 15 Lesson 1 – Action queries
Job Application Monitoring (JAM)
Presentation transcript:

+ AliEn status report Miguel Martinez Pedreira

+ Touching the APIs Bug found, not sending site info from ROOT to central side was causing the sites to have network overloads and reduced efficiency Started to touch the ‘untouchable’ unmaintained code The bug fix implied having to modify the access methods Found tricky code hardcoded stuff for SEs redundant calls different use of cache and database tables in the different apis 2 AliEn development - Miguel Martinez Pedreira

+ Touching the APIs First step, adapt the code to process the site and attempt number from jobs read requests also reading from the right table (SEDistance) In the case of APIs specially, we need somewhere to test managed to create a new one in pcalice92 soon after, used new server to add a user api: apiserv08 Spotted a part of the code to select SEs based on ‘whereis’ but then almost same ‘whereis’ repeated idea to cache them (heavy operation) idea to sync it with job optimizers since jobs are splitted based on the inputdata the jobs request to read Reordered cache usage access the same envelope was misused (caching the same information as access) added whereis, same as in optimizer 3 AliEn development - Miguel Martinez Pedreira

+ Touching the APIs Result: we do less ‘whereis’ calls and use the cache better 4

+ Touching the APIs AliEn development - Miguel Martinez Pedreira 5

+ AliEn code unification Playing with the apis code raised the issue of the AliEn versions again decide to start merging sync of SVN and central services more differences than expected... Initial status v2-19 – CVMFS: voboxes + wns CS: v v2-20 (TQ) Job APIs: v , shared for 10 API servers User API: v patches, only api03 6 AliEn development - Miguel Martinez Pedreira

+ AliEn code unification 1. CENTRAL – CVMFS replace site-side parts into the CS installation JobAgent, ClusterMonitor... checked all files anyway created alien.NEW put it on some production services gradually 2. Jobs – Users APIs differences in access code and some manual patches from api03 new version, alien.219_API, put in job apis small issues forming the envelope, coming from whereis result 7 AliEn development - Miguel Martinez Pedreira

+ AliEn code unification 3. APIs + CS + CVMFS specially important authen, access, admin, user commands... Finally the one and only version! running: 1 user API: apiserv08 Authen in db2, JobBroker, JobManager, JobInfoManager, IS in db8 Progressively to the rest What now ? To be put in CVMFS new version to be used/tested explicitly first? SVN? (Name? now alien.FINAL) Scripts, installation To be fully tested... But quite smooth so far Differences in installations, also affect behavior Fresh installation where we put the new code 8 AliEn development - Miguel Martinez Pedreira

+ Certificates In the last months, several problems to access the GRID by several users Missing/outdated certificates in CS and/or API Have to add them manually not updated installations not clear what has to be there just IGTF package? Automatize: cronjob or tool to update some parts of the installations it exists for the CVMFS one 9 AliEn development - Miguel Martinez Pedreira

+ Other items SPLIT jobs not MERGING JOBSTOMERGE now correctly updated jobs splitting into 0 subjobs now to error Fix for ZOMBIEs race condition between insertion-waiting and execution in the node fix in a db field JA env cleanup between jobs CMreport sends more, bigger messages Proxy-init fix JA check for output size Option to disable catalog trace from LDAP 10 AliEn development - Miguel Martinez Pedreira

+ Other items dcache issue having lfn-like pfns root://srm.ndgf.org:1094//alice/cern.ch/user/a/alitrain/PWGJE/Jets_PbPb_2011/104_ /lego_train.C root://srm.ndgf.org:1094//alice/disk/14/41166/ea0fce6a-e98a-11e3-abef-c7fc858f3c77 though to be on new api only, because of new envelope creation but found also on addMirror commands on original user api under investigation G tables maintenance with a ‘high’ estable number of jobs running -> +2M entries/day new table every month aprox 11 AliEn development - Miguel Martinez Pedreira

+ IPv6 Starting next week First step: update PERL version in AliEn and see what crashes Update to xrootd in API Student coming on 7 th July to deal with this Test with IPv6 stack only to make sure it works 12 AliEn development - Miguel Martinez Pedreira

+ Conclusion Still more things to do JDL optimization in DB, Broker queries, improve commands... What to do with v2-20 and v2-21 ? Catalogue conversion takes long HLT/Cloud incoming? (Dario) 13 AliEn development - Miguel Martinez Pedreira

+ [Almost] Birthday About to reach jobs Questions ? AliEn development - Miguel Martinez Pedreira 14