AliEn Extreme Job Broker
L. Betev, A. Grigoras, C. Grigoras, P. Saiz, S. Schreiner
CERN IT Department, Experiment Support, DBES, CH-1211 Geneva 23, Switzerland

Pablo Saiz (CERN IT Department, ES), ALICE offline week, 10 Nov 2011

AliEn Job Broker
Two tasks:
– Tells the CE how many JobAgents to submit
– Gives user jobs to JobAgents
– Using job priorities
Uses ClassAd matchmaking
Pull mode
– Waits for requests
Critical service
– Without it, user jobs would wait in the queue
Multiple instances, on several hosts
– Using the DB to solve concurrency problems
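
A minimal Python sketch of the two broker tasks and the pull model described above. Everything here is hypothetical (the `Job` and `JobBroker` classes, the `ASSIGNED` status name, requirements written as Python predicates instead of ClassAd expressions); it only illustrates agents asking for the highest-priority job they can run, not AliEn's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Job:
    job_id: int
    priority: int
    # Hypothetical: the job's Requirements as a predicate over the agent's attributes
    requirements: Callable[[dict], bool]
    status: str = "WAITING"


@dataclass
class JobBroker:
    queue: list[Job] = field(default_factory=list)

    def submit(self, job: Job) -> None:
        self.queue.append(job)

    def waiting_matches(self, agent: dict) -> list[Job]:
        # Waiting jobs whose requirements this agent satisfies, highest priority first
        matches = [j for j in self.queue if j.status == "WAITING" and j.requirements(agent)]
        return sorted(matches, key=lambda j: j.priority, reverse=True)

    def agents_needed(self, agent: dict) -> int:
        # Task 1: tell a CE how many JobAgents with these attributes would find work
        return len(self.waiting_matches(agent))

    def request_job(self, agent: dict) -> Optional[Job]:
        # Task 2 (pull mode): a JobAgent asks for work and gets the best match, if any
        matches = self.waiting_matches(agent)
        if not matches:
            return None
        job = matches[0]
        job.status = "ASSIGNED"  # hypothetical status name
        return job


broker = JobBroker()
broker.submit(Job(1, 10, lambda a: a["TTL"] > 300))
broker.submit(Job(2, 50, lambda a: a["TTL"] > 80000))
broker.submit(Job(3, 20, lambda a: a["TTL"] > 300))
agent = {"TTL": 43200}
print(broker.agents_needed(agent), broker.request_job(agent).job_id)
```

Job 2 has the highest priority but needs a longer TTL than this agent offers, so the agent receives job 3.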

Job execution (diagram): Job Manager, TASKQUEUE, Job Broker and File catalogue (LFN, GUID, metadata); Sites A, B and C (CE, JobAgent JA, MonALISA, xrootd) running the JOB.

Current Job Broker concerns
Increased number of jobs
– More users, more analysis, more resources…
Load on server(s)
– We could deploy more instances
– No problems with database performance
Understand why jobs stay in WAITING

Let's go back to splitting…
Splitting per SE
– Per combination of SEs
– Users can specify the maximum number of files (although jobs usually have even fewer input files)
Other splitting algorithms are trivial:
– 'production': similar JDL in all subjobs
– 'file': one subjob per file
– 'directory': one subjob per directory
– 'user defined': as the user specifies
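
The splitting modes above can be sketched as follows, assuming a hypothetical map from each LFN to the set of SEs holding a replica; the SE names and paths are made up, and the 'production' and 'user defined' modes are left out.

```python
from __future__ import annotations

from collections import defaultdict
from posixpath import dirname


def split_per_se(replicas: dict[str, frozenset[str]], max_files: int) -> list[tuple[frozenset[str], list[str]]]:
    """Group files by the exact set of SEs holding them, then cap each subjob at max_files."""
    groups: dict[frozenset[str], list[str]] = defaultdict(list)
    for lfn in sorted(replicas):
        groups[replicas[lfn]].append(lfn)
    subjobs = []
    for ses, files in groups.items():
        for start in range(0, len(files), max_files):
            subjobs.append((ses, files[start:start + max_files]))
    return subjobs


def split_per_file(lfns: list[str]) -> list[list[str]]:
    """'file' splitting: one subjob per input file."""
    return [[lfn] for lfn in lfns]


def split_per_directory(lfns: list[str]) -> list[list[str]]:
    """'directory' splitting: one subjob per parent directory."""
    by_dir: dict[str, list[str]] = defaultdict(list)
    for lfn in lfns:
        by_dir[dirname(lfn)].append(lfn)
    return list(by_dir.values())


# Hypothetical catalogue: LFN -> SEs with a replica
replicas = {
    "/alice/run1/f1": frozenset({"SE_A"}),
    "/alice/run1/f2": frozenset({"SE_A", "SE_B"}),
    "/alice/run2/f3": frozenset({"SE_A"}),
    "/alice/run2/f4": frozenset({"SE_A"}),
}
for ses, files in split_per_se(replicas, max_files=2):
    print(sorted(ses), files)
```

Note that f2 ends up alone in its own subjob only because its SE set differs from the others, which hints at how per-combination splitting fragments the work.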

Current situation
Works nicely if there is one replica per file
A bit more complex with 3 SEs and 2 replicas
And a lot more complex with 50 SEs and 3 replicas
(Each case is illustrated with a Job Manager / JOB diagram.)
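
One rough way to see why the picture gets worse: if every file has exactly r replicas spread over n SEs and jobs are split per SE combination, the files can fall into up to C(n, r) distinct SE sets (also bounded by the number of files). A quick check of the three cases, under that simplifying assumption:

```python
from math import comb


def max_se_combinations(n_se: int, n_replicas: int) -> int:
    """Distinct SE sets a file can have with n_replicas replicas spread over n_se SEs."""
    return comb(n_se, n_replicas)


for n_se, n_replicas in [(3, 1), (3, 2), (50, 3)]:
    print(f"{n_se} SEs, {n_replicas} replica(s): up to "
          f"{max_se_combinations(n_se, n_replicas)} SE combinations")
```

With one replica each file has a single candidate SE; with 3 SEs and 2 replicas there are still only 3 sets, but each file could be processed at two places; with 50 SEs and 3 replicas there are up to 19,600 sets, so subjobs can become very small.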

What if...?
Split jobs without specifying input data per subjob
Decide which files to analyze at the last moment
– Based on location, and on the files already analyzed

Example
Sites A, B and C hold Files 1 to 5.
Current schema: submit 4 jobs (File 1 and File 4; File 2; File 3; File 5)
Broker per file: submit 3 empty subjobs
– When a job starts, it analyzes as much as possible (Files 1, 2, 4 and 5 in one subjob, File 3 in another)
– If nothing is left, it just exits
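
A sketch of the 'broker per file' idea: subjobs start empty and, when they start at a site, claim every not-yet-analyzed file with a replica there. The `FileBroker` class and the file placement are hypothetical (the slide's actual placement is not recoverable), and the lock only stands in for the database the broker uses to solve concurrency problems. With this placement the outcome matches the example: one subjob takes Files 1, 2, 4 and 5, another takes File 3, and the third exits.

```python
from __future__ import annotations

import threading


class FileBroker:
    """Hypothetical per-file broker: subjobs carry no input list and claim files when they start."""

    def __init__(self, replicas: dict[str, set[str]]):
        self._replicas = replicas          # LFN -> sites holding a replica
        self._assigned: set[str] = set()   # files already handed to some subjob
        self._lock = threading.Lock()      # stand-in for DB-level serialisation

    def claim(self, site: str, max_files: int | None = None) -> list[str]:
        """Give a starting subjob every unassigned file with a replica at its site."""
        with self._lock:
            local = [f for f in sorted(self._replicas)
                     if f not in self._assigned and site in self._replicas[f]]
            if max_files is not None:
                local = local[:max_files]
            self._assigned.update(local)
            return local


# Hypothetical placement of the five files on Sites A, B and C
broker = FileBroker({
    "File 1": {"A", "B"},
    "File 2": {"A"},
    "File 3": {"C"},
    "File 4": {"A", "C"},
    "File 5": {"A", "B"},
})
for site in ["A", "C", "B"]:  # three empty subjobs start, in this order
    files = broker.claim(site)
    print(site, files if files else "nothing left, exit")
```

Because files are bound only when a subjob starts, keeping track of what has already been analyzed moves into the broker, which is the extra complexity listed among the challenges.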

Even more options
'Each subjob with at least 5 files'
– And read them remotely
'Analyze only 70 out of 100 files'
– And stop once that number has been reached
'Stop the analysis after 3 hours when 90% of the files have been processed'
Ideas from: Jan Fiete Grosse-Oetringhaus
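
These options could be expressed as simple checks that a per-file broker evaluates while handing out files. The sketch below is one hypothetical reading of the three rules (in particular, that the 3-hour limit only triggers once at least 90% of the files are done); the names `StopPolicy` and `select_inputs` are illustrative, not a specification.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StopPolicy:
    """Hypothetical encoding of the stop conditions listed above."""
    max_files: int | None = None       # 'Analyze only 70 out of 100 files'
    time_limit_s: float | None = None  # 'Stop the analysis after 3 hours ...'
    min_percent: int = 100             # '... when 90% of the files have been processed'

    def should_stop(self, processed: int, total: int, elapsed_s: float) -> bool:
        if processed >= total:
            return True
        if self.max_files is not None and processed >= self.max_files:
            return True
        # Integer arithmetic avoids floating-point surprises in the percentage test
        return (self.time_limit_s is not None and elapsed_s >= self.time_limit_s
                and processed * 100 >= self.min_percent * total)


def select_inputs(local: list[str], remote: list[str], min_files: int) -> list[str]:
    """'At least min_files per subjob': top up local files with remote ones (read remotely)."""
    chosen = list(local)
    if len(chosen) < min_files:
        chosen += [f for f in remote if f not in chosen][:min_files - len(chosen)]
    return chosen


policy = StopPolicy(time_limit_s=3 * 3600, min_percent=90)
print(policy.should_stop(processed=90, total=100, elapsed_s=3 * 3600))
```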

Broker per file
Benefits:
– Fewer subjobs
– Easier to reach a maximum number of entries
– Can define a minimum number of files (remote reading)
– Can reduce the tail of execution
Challenges:
– Keep track of the files analyzed
– Increased complexity of the Broker
– Still to be implemented

Load on Job Broker
Increase with v2-19:
– Security: SSL connections

Old ClassAd matchmaking
Task queue:
Prio/Jobs | JDL
          | [executable="aliroot"; user='alidaq'; inputdata=…; …; Requirements = TTL>9000 and other.SE='ALICE::FZK::SE']
975       | [executable="aliroot"; user='psaiz'; inputdata=…; Requirements = TTL>300 and (other.SE='ALICE::Prague::SE' or other.SE='ALICE::Torino::SE')]
9520      | [executable="aliroot"; user='blabla'; inputdata=…; Requirements = TTL>300 and other.SE='ALICE::CERN::SE']
…         | [executable="aliroot"; user='aliprod'; Requirements = TTL>80000]
Match all entries, starting with the highest priority, against the JobAgent JDL:
[ Uname = "…server"; CE = "ALICE::CERN::aliendb5"; CloseSE = { "ALICE::CERN::Castor2", "ALICE::CERN::SE", … };
  Version = "v…"; Type = "machine"; Price = …E+00; Host = "aliendb5d"; WNHost = "aliendb5.cern.ch";
  LocalDiskSpace = …; Platform = "Linux-x86_64"; TTL = 43200; Packages = { … };
  Requirements = ( other.Type == "Job" ) ]
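
To make the old scheme concrete, a Python sketch of the linear scan: every waiting entry is tried in priority order until its requirements accept the JobAgent. The priorities are hypothetical (the slide's values are not recoverable), requirements are Python predicates rather than ClassAd expressions, and reading `other.SE = X` as "X is in the agent's CloseSE list" is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

Agent = dict  # attributes from the JobAgent JDL (Type, TTL, CloseSE, ...)


@dataclass
class QueuedJob:
    user: str
    priority: int                          # hypothetical priority values
    requirements: Callable[[Agent], bool]  # the job's Requirements expression
    type: str = "Job"


def agent_accepts(job: QueuedJob) -> bool:
    # The JobAgent JDL's own Requirements: other.Type == "Job"
    return job.type == "Job"


def old_match(queue: list[QueuedJob], agent: Agent) -> tuple[Optional[QueuedJob], int]:
    """Walk the whole queue, highest priority first, evaluating each job's requirements."""
    evaluated = 0
    for job in sorted(queue, key=lambda j: j.priority, reverse=True):
        evaluated += 1
        if agent_accepts(job) and job.requirements(agent):
            return job, evaluated
    return None, evaluated


queue = [
    QueuedJob("alidaq", 40, lambda a: a["TTL"] > 9000 and "ALICE::FZK::SE" in a["CloseSE"]),
    QueuedJob("psaiz", 30, lambda a: a["TTL"] > 300
              and bool({"ALICE::Prague::SE", "ALICE::Torino::SE"} & a["CloseSE"])),
    QueuedJob("blabla", 20, lambda a: a["TTL"] > 300 and "ALICE::CERN::SE" in a["CloseSE"]),
    QueuedJob("aliprod", 10, lambda a: a["TTL"] > 80000),
]
agent = {"Type": "machine", "TTL": 43200,
         "CloseSE": {"ALICE::CERN::Castor2", "ALICE::CERN::SE"}}
job, evaluated = old_match(queue, agent)
print(job.user, evaluated)
```

The CERN agent has to evaluate three entries before it finds one it can run; with thousands of waiting entries, that full scan is what makes the old scheme expensive.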

Current ClassAd matchmaking
Match all entries with that SE, starting with the highest priority:
Prio/Jobs | JDL | SE
          | [executable="aliroot"; user='alidaq'; packages=…; …; Requirements = TTL>9000 and other.SE='ALICE::FZK::SE'] | FZK
975       | [executable="aliroot"; user='psaiz'; packages=…; …; Requirements = TTL>300 and (other.SE='ALICE::Prague::SE' or other.SE='ALICE::Torino::SE')] | Torino, Prague
9520      | [executable="aliroot"; user='blabla'; packages=…; …; Requirements = TTL>300 and other.SE='ALICE::CERN::SE'] | CERN
…         | [executable="aliroot"; user='aliprod'; Requirements = TTL>80000] |
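
The change can be sketched as an index from SE to jobs, built once when jobs enter the queue, so that an agent only evaluates entries matching one of its close SEs. The `*` bucket for jobs without an SE constraint (like the aliprod row) is a hypothetical choice, as are the priorities; since the SE condition is handled by the index, only the TTL part remains in each predicate.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Optional

ANY_SE = "*"  # hypothetical bucket for jobs whose JDL names no SE


@dataclass
class QueuedJob:
    user: str
    priority: int                         # hypothetical priority values
    requirements: Callable[[dict], bool]  # remaining (non-SE) requirements
    ses: frozenset = field(default_factory=frozenset)  # SE column extracted from the JDL


def build_se_index(queue: list[QueuedJob]) -> dict[str, list[QueuedJob]]:
    """Pre-compute, once per job, which SE(s) it can use."""
    index: dict[str, list[QueuedJob]] = defaultdict(list)
    for job in queue:
        for se in job.ses or {ANY_SE}:
            index[se].append(job)
    return index


def current_match(index: dict[str, list[QueuedJob]], agent: dict) -> tuple[Optional[QueuedJob], int]:
    """Only evaluate jobs whose SE column matches one of the agent's close SEs."""
    candidates = {id(j): j for se in set(agent["CloseSE"]) | {ANY_SE} for j in index.get(se, [])}
    evaluated = 0
    for job in sorted(candidates.values(), key=lambda j: j.priority, reverse=True):
        evaluated += 1
        if job.requirements(agent):
            return job, evaluated
    return None, evaluated


queue = [
    QueuedJob("alidaq", 40, lambda a: a["TTL"] > 9000, frozenset({"ALICE::FZK::SE"})),
    QueuedJob("psaiz", 30, lambda a: a["TTL"] > 300,
              frozenset({"ALICE::Prague::SE", "ALICE::Torino::SE"})),
    QueuedJob("blabla", 20, lambda a: a["TTL"] > 300, frozenset({"ALICE::CERN::SE"})),
    QueuedJob("aliprod", 10, lambda a: a["TTL"] > 80000),
]
agent = {"TTL": 43200, "CloseSE": {"ALICE::CERN::Castor2", "ALICE::CERN::SE"}}
job, evaluated = current_match(build_se_index(queue), agent)
print(job.user, evaluated)
```

The same CERN agent now evaluates a single entry instead of scanning the FZK, Prague and Torino jobs.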

And in v2-20
Let the database do most of the matchmaking
– Extract the most common fields from the JDL: TTL, CE, SE, user, partition, size, packages…
– The first match should do it
– Less flexibility in requirements…
Prio/Jobs | JDL                       | SE             | User    | CE             | … | Partition
          | [executable="aliroot"; …] | FZK            | alidaq  | FZK            |   |
975       | [executable="aliroot"; …] | Torino, Prague | psaiz   | Torino, Prague |   |
9520      | [executable="aliroot"; …] | CERN           | blabla  | CERN           |   |
…         | [executable="aliroot"; …] |                | aliprod |                |   |
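
A sketch of letting the database do the matching, using Python's built-in sqlite3 as a stand-in for the production database. Only TTL, SE and user are turned into columns here; the schema, column names and priorities are hypothetical. A job with several SEs (the Torino, Prague row) gets one row per SE in a side table, and the first row returned by `ORDER BY priority DESC LIMIT 1` is the match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (
    id       INTEGER PRIMARY KEY,
    priority INTEGER NOT NULL,
    status   TEXT    NOT NULL DEFAULT 'WAITING',
    username TEXT    NOT NULL,
    min_ttl  INTEGER NOT NULL,  -- taken from 'Requirements = TTL > ...'
    jdl      TEXT    NOT NULL   -- full JDL kept alongside the extracted columns
);
CREATE TABLE job_se (job_id INTEGER NOT NULL REFERENCES jobs(id), se TEXT NOT NULL);
""")
conn.executemany(
    "INSERT INTO jobs (id, priority, username, min_ttl, jdl) VALUES (?, ?, ?, ?, ?)",
    [(1, 40, "alidaq", 9000, '[executable="aliroot"; ...]'),
     (2, 30, "psaiz", 300, '[executable="aliroot"; ...]'),
     (3, 20, "blabla", 300, '[executable="aliroot"; ...]'),
     (4, 10, "aliprod", 80000, '[executable="aliroot"; ...]')],
)
conn.executemany(
    "INSERT INTO job_se (job_id, se) VALUES (?, ?)",
    [(1, "ALICE::FZK::SE"), (2, "ALICE::Prague::SE"),
     (2, "ALICE::Torino::SE"), (3, "ALICE::CERN::SE")],
)


def match(conn, agent_ttl, close_ses):
    """Return (id, username) of the highest-priority waiting job this agent can run, or None."""
    marks = ", ".join("?" for _ in close_ses)  # only placeholders are interpolated, never values
    sql = f"""
        SELECT j.id, j.username FROM jobs AS j
        WHERE j.status = 'WAITING'
          AND j.min_ttl < ?
          AND (NOT EXISTS (SELECT 1 FROM job_se AS s WHERE s.job_id = j.id)
               OR EXISTS (SELECT 1 FROM job_se AS s
                          WHERE s.job_id = j.id AND s.se IN ({marks})))
        ORDER BY j.priority DESC
        LIMIT 1"""
    return conn.execute(sql, (agent_ttl, *close_ses)).fetchone()


print(match(conn, 43200, ["ALICE::CERN::Castor2", "ALICE::CERN::SE"]))
```

The trade-off noted on the slide shows up directly: anything not extracted into a column cannot appear in the WHERE clause, so free-form ClassAd requirements lose expressiveness.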

Other changes in v2-20
'Resubmit' uses the same job ID
The CE submits JobAgents even if the packages are not installed
Atomic transaction for job assignment
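
A common way to make job assignment atomic is a conditional UPDATE that only succeeds while the job is still WAITING: if two broker instances race for the same job, only one of them sees an updated row and the other moves on. The sketch below uses sqlite3 with hypothetical table, column and status names; it illustrates the compare-and-set pattern, not AliEn's actual transaction.

```python
import sqlite3
from typing import Optional


def try_assign(conn: sqlite3.Connection, job_id: int, agent_id: str) -> bool:
    """Compare-and-set: succeeds only if the job is still WAITING when the UPDATE runs."""
    with conn:  # one transaction: committed on success, rolled back on error
        cur = conn.execute(
            "UPDATE jobs SET status = 'ASSIGNED', agent = ? WHERE id = ? AND status = 'WAITING'",
            (agent_id, job_id),
        )
    return cur.rowcount == 1


def pick_and_assign(conn: sqlite3.Connection, agent_id: str) -> Optional[int]:
    """Try candidates in priority order; if another broker won the race, try the next one."""
    rows = conn.execute(
        "SELECT id FROM jobs WHERE status = 'WAITING' ORDER BY priority DESC"
    ).fetchall()
    for (job_id,) in rows:
        if try_assign(conn, job_id, agent_id):
            return job_id
    return None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, priority INTEGER, status TEXT, agent TEXT)")
conn.executemany("INSERT INTO jobs (id, priority, status) VALUES (?, ?, 'WAITING')", [(1, 10), (2, 20)])
conn.commit()
print(pick_and_assign(conn, "JA-1"), pick_and_assign(conn, "JA-2"), pick_and_assign(conn, "JA-3"))
```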

Summary
AliEn Job Broker
– Decides where jobs get executed
– Pull model, ClassAds and priorities
In the next version of AliEn
– Broker per file to be analyzed: fewer subjobs, more input data per subjob, more options (min. files, max. waiting time)
– Simplified ClassAd matching
– Atomic job assignment transaction
To be deployed in January