PROOF Status and Perspectives G. GANIS CERN / LCG VII ROOT Users workshop, CERN, March 2007.



27/03/2007, G. Ganis, ROOT Users Workshop

Outline
- (Very) quick introduction
- What's new since ROOT05
- Current developments and plans

PROOF in a slide
- PROOF: a dynamic approach to end-user HEP analysis on distributed systems, exploiting the intrinsic parallelism of HEP data (see backup slides)
[Diagram: a PROOF-enabled facility — the client sends commands/scripts to a top master; submasters in each geographical domain drive workers with access to an MSS; a list of output objects (histograms, …) is returned to the client]

PROOF aspects / issues
- Connection layer: XROOTD, authentication, error handling
- Software distribution: optimized package / class handling
- Data access: optimized distribution of data on the worker nodes
- Classification / handling of the results: query result manager
- Resource sharing among users: the client gets one ROOT session on each machine; scheduling

What's new since ROOT05
- Connection layer based on XROOTD
- Coordinator functionality
- Full implementation of the "interactive batch" model
- Dataset management
- Packetizer improvements
- Progress in uploading / enabling additional software
- Restructuring of the PROOF modules
- Progress in the integration with experiment software
- PROOF Wiki pages
- ALICE experience at the CAF (see J.F. Grosse-Oetringhaus' talk)

Coordinator functionality
- Independent channel to control the cluster
  - Global view
  - Independent access to information (e.g. log files)
  - Needed for the full implementation of "interactive batch"
- Not directly achievable with proofd
  - The daemon instance "disappears" into proofserv
  - The session lifetime is the same as the client connection lifetime
  - The parent proofd is not aware of its children
- Natural candidate: XROOTD
  - Lightweight, industrial-strength networking and protocol handler

New connection layer based on XROOTD
- New PROOF-related protocol: XrdProofdProtocol (XPD)
  - XPD launches and controls PROOF sessions (proofserv)
- Client connection (XrdProofConn) based on XrdClient
  - Concept of physical (per-client) / logical (per-session) connections
- Asynchronous reading via a dedicated thread
  - Messages are read as soon as available and added to a queue
  - Used to set up a control-interrupt network independent of OOB
- Cleaner security model
  - The physical connection is authenticated; associated logical connections inherit the "token"
- Client disconnection / reconnection handled naturally

XPD role
- XrdProofdProtocol: the client gateway to proofserv, just as XrdXrootdProtocol is the client gateway to files
[Diagram: on a PROOF farm, XROOTD links clients via XrdProofdProtocol (static area, MT machinery) to proofserv on the worker servers; on a file server, XrdXrootdProtocol links clients to files]

XPD communication layer
[Diagram: the client's TXSocket (xc) connects to XrdProofd (XS) on the master, which fork()s proofserv; XrdProofd on each worker fork()s a proofslave; XRD links connect the master to worker 1 … worker n]

Stateless connection and "interactive batch"
- "Interactive batch": a flexible submission system keeping the advantages of both interactivity and batch
- If a query is taking too long, the user has the option to abort it, to stop it and retrieve the results, or to leave it running on the system and come back later to browse / retrieve / archive the results
- Ingredients:
  - Non-blocking running mode (v …, ROOT05)
  - Query result management (v …, ROOT05)
  - Stateless client connection (v …)
  - Ctrl-Z functionality (soon)

Exploiting the coordinator: client side
- Not yet fully exploited: new functionality is added regularly
- Examples:
  - Log retrieval: TProofLog holds the log files as TMacro's and implements display, grep, save, … functionality
  - Session reset: cleanup of the user's entry in the coordinator; the only way out when something bad happens

root[] TProofLog *pl =
root[] pl->Grep("violation")

Exploiting the coordinator: server side
- Static control of resource usage
  - Max number of users
  - Max number of workers per user
- Access and usage control
  - Role of the server
  - List of users allowed to connect
  - Define the ROOT versions available on the cluster
    - Extendable to packages
- …

Dataset uploader
- Optimized distribution of data files on the farm using XROOTD functionality
  - By direct upload
  - By staging out from mass storage
- Direct upload
  - Sources: a local directory or a list of URLs
  - The XROOTD/OLBD pool ensures optimal distribution
  - No special configuration needed (except for clean-up)
- Using a stager
  - Requires XROOTD configuration, e.g. CASTOR for the CAF

Dataset manager
- Datasets are identified by name
- Datasets can be retrieved by name to automatically create TDSet's

root[0] TProof *proof = TProof::Open("master");
root[1] proof->UploadDataSet("MCppH", "/data1/mc/ppH_*");
Uploading file:///data1/mc/ppH_01.root to \
  root://poolurl//poolpath/ppH_01.root
[TFile::Cp] Total MB |===============| % [6.9 MB/s]
root[2] proof->ShowDataSets();
Existing Datasets:
MCppH
root[] TDSet *dset = new TDSet(proof->GetDataSet("MCppH"));

Dataset manager
- Metadata stored in the sandbox on the master
  - New sub-directory /dataset
- Concept of private / public datasets
  - User's private definitions: readable / writable by the owner only
  - User's public definitions: readable by anybody
  - Global public definitions
    - Workgroup- / experiment-wide (e.g. runs)
    - Readable by anybody (group restrictions?)
    - Writable by a privileged account

Packetizer improvements
- The packetizer's goal: optimize work distribution to process queries as fast as possible
- Standard TPacketizer strategy: first process local files, then try to process remote data
- Problem: an end-of-query bottleneck
[Plot: active workers vs. processing time, showing the tail at the end of the query]

New strategy: TAdaptivePacketizer
- Predict the processing time of the local files for each worker
- From the start of the query, keep assigning remote files to the workers expected to finish faster
- Processing time improved by up to 50%
[Plots: processing rate for all packets and for remote packets (same scale), old vs. new strategy]

Progress in using additional software
- Package enabling
  - Separate behaviour on client / cluster
  - Real-time feedback during the build
- Load mechanism extended to single classes / macros
- Binaries of selectors / macros / classes are now cached
  - Decreases initialization time
- API to modify include / library paths on the workers
  - Use packages globally available on the cluster

root[] TProof *proof = TProof::Open("master")
root[] proof->Load("MyClass.C")

Restructuring of the PROOF modules
- Goals: reduce dependencies, better control of the executable size (proofserv), faster worker startup
- First step:
  - Get rid of TVirtualProof and PROOF dependencies in 'tree'
  - All PROOF code in 'proof', 'proofx', 'proofd'
  - Still, proofserv needs a lot of libs
- Second step (current situation):
  - Separate out TProofPlayer, TPacketizer, … into 'proofplayer' (new libProofPlayer, v …)
  - proofserv size on the workers reduced by a factor of ~2 at startup

Further optimization of PROOF libs
- Differentiate the setups on client and cluster
- Client:
  - Needs graphics
  - May not need all the experiment software
  - TSelector: compile only Begin() and Terminate()
- Servers:
  - Need all the experiment software
  - Do not need graphics
  - TSelector: do not compile Begin() and Terminate()
- Client and server versions of the basic libs

Additional improvements (incomplete)
- GUI controller
  - Integration of the dataset manager
  - Integration of the new features of the package manager
  - Improved session / query history bookkeeping
  - Improved user-friendliness of parameter setting
- Automatic support for dynamic environment setting
  - proofserv is a script launching proofserv.exe
  - Environment variables define the context in which to run
  - Useful for experiment-specific settings (see later) and/or for debugging purposes (e.g. running valgrind on a worker, …)

root[] TProof *proof = TProof::Open("master")
root[] proof->SetParameter("factor", 1.1)

Integration with experiment software
- Finding and using the experiment software
  - Environment settings, library loading
- Implementing the analysis algorithms
  - TSelector framework
    - Structured analysis and automated interaction with trees (chains) (+)
    - Tightly coupled to the tree (-)
      - A new analysis implies a new selector
      - A change in the tree definition implies a new selector
  - May conflict with existing experiment technologies
    - Add a new layer to hide details irrelevant to the end-user

Setting the environment
- The experiment software is available on the nodes
  - Additional dedicated software is handled by the PROOF package manager
  - Allows users to run their own modifications
- The experiment environment can be set
  - Statically (e.g. ALICE): before starting xrootd (inherited by proofserv)
  - Dynamically (e.g. CMS): by evaluating a user-defined script in front of proofserv
    - Allows selecting different versions at run time

Dynamic environment setting: CMS
- CMS needs to run SCRAM before proofserv
- PROOF_INITCMD contains the path of a script (NEW)
- The script initializes the CMS environment using SCRAM

TProof::AddEnvVar("PROOF_INITCMD",
  "~maartenb/proj/cms/CMSSW_1_1_1/setup_proof.sh")

#!/bin/sh
# Export the architecture
export SCRAM_ARCH=slc3_ia32_gcc323
# Init CMS defaults
cd ~maartenb/proj/cms/CMSSW_1_1_1
. /app/cms/cmsset_default.sh
# Init runtime environment
scramv1 runtime -sh > /tmp/dummy
cat /tmp/dummy

Examples of implementing analysis algorithms
- ALICE:
  - A generic AliSelector hides the details
    - The user's selector derives from AliSelector
    - Access to the ESD event via the member fESD
  - Alternative technology using tasks (see J.F. Grosse-Oetringhaus' talk)
- PHOBOS: TAM
  - Based on modularized tasks
  - Separates the analysis tasks from the interaction with the tree
  - See C. Reed at ROOT05

Analysis algorithms in CMS
- CMSSW provides EDAnalyzer for analysis
- Algorithms with a well-defined interface can be used with both technologies (EDAnalyzer and TSelector)
- Used in a templated TSelector framework, TFWLiteSelector
- Selector libraries are distributed as PAR files

class MyAnalysisAlgorithm {
  void process( const edm::Event & );
  void postProcess( TList & );
  void terminate( TList & );
};

// Load framework library
gSystem->Load("libFWCoreFWLite");
// Load TSelector library
gSystem->Load("libPhysicsToolsParallelAnalysis");
TSelector *mysel = new TFWLiteSelector<MyAnalysisAlgorithm>;

Current developments and plans
- Scheduling
- Consolidation, error handling
  - Improved, but there are still cases where we lose control of the session
- Processing error report
  - Associate to a query an object detailing what went wrong (e.g. dataset elements not analyzed) and why
- Non-input-file-driven analysis
  - Current processing is based on trees or object files
- Local multi-core desktop optimization
  - No daemons, UNIX sockets (no master?)
- GUI: integration in a more general ROOT GUI controller

PROOF exploiting multi-cores
- ALICE search for  0's on 4 GB of simulated data
- Instantaneous rates (evt/s, MB/s)
- Clear advantage of quad core: the additional computing power is fully exploited
- Demo at the Intel quad-core launch, Nov 2006

PROOF: scheduling multiple users
- Fair resource sharing
  - The system scheduler is not enough if N users >= ~N workers / 2
  - Enforce priority policies
- Two approaches:
  - Quota-based worker-level load balancing
    - Simple and solid implementation, no central unit
    - Group quotas defined in the configuration file
  - Central scheduler
    - Per-query decisions based on cluster load, resources needed by the query, user history and priorities
    - Generic interface to external schedulers planned (MAUI, LSF, …)

Quota-based worker-level load balancing
- Lower-priority processes are slowed down: they sleep before the next packet request
- The sleeping time is proportional to the CPU time used; the factor depends on the number of users and the quotas
- Example: userA, quota 2/3; userB, quota 1/3
  - After T seconds: CPU(A) = T/2, CPU(B) = T/2
  - Sleep B for T/2 seconds
  - After T + T/2 seconds: CPU(A) = T/2 + T/2 = T = 2 × CPU(B)
- The general case of N users leads to a tri-diagonal linear system

Quota-based worker-level load balancing
- Group quotas are defined in the xrootd configuration file
- The factors are recalculated by the master XPD each time a user starts or ends processing
  - Only active users are considered
  - A low-priority user will get 100% of the resources when alone
- Under Linux, SCHED_RR system scheduling is enforced for the processes
  - The default, dynamic SCHED_OTHER scheme defeats the whole idea, as sleeping processes get a higher priority on restart

xpd.group tpc usra,usrb
xpd.grpparam tpc quota:70%

Demo
- The same sample analysis (h1, slightly slowed down) repeated 20 times
- 2 users:
  - gganis: reserved quota 70%
  - ganis: taking what is left
- Histograms show the processing rate in MB/s

Demo
[Screenshot: processing-rate histograms for the two users]

Central scheduling
- Entity running on the master XPD, loaded as a plug-in
- An abstract interface, XrdProofSched, is defined
- Input:
  - query info (via XrdProofServProxy -> proofserv)
  - cluster status, via the OLBD control network
  - the policy
- Output: the list of workers to continue with

class XrdProofSched {
   ...
public:
   virtual int GetWorkers(XrdProofServProxy *xps,
                          std::list<XrdProofWorker *> &wrks) = 0;
   ...
};
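A concrete policy plugged into such an interface could look like the following self-contained sketch. The Worker and Scheduler types here are simplified stand-ins, not the real XrdProofSched API, which works on XPD session proxies and OLBD information.

```cpp
#include <cassert>
#include <list>
#include <string>

// Simplified stand-in for a PROOF worker known to the master
struct Worker {
   std::string host;
   int load;   // e.g. number of sessions already assigned
};

// Stand-in for the abstract scheduler interface
class Scheduler {
public:
   virtual ~Scheduler() {}
   // Fill 'wrks' with the workers the query should continue with
   virtual int GetWorkers(int nWanted, std::list<Worker *> &wrks) = 0;
};

// Example policy: return the nWanted least-loaded workers
class LeastLoadedSched : public Scheduler {
public:
   void AddWorker(const std::string &host, int load) {
      fCluster.push_back({host, load});
   }
   int GetWorkers(int nWanted, std::list<Worker *> &wrks) override {
      fCluster.sort([](const Worker &a, const Worker &b) {
         return a.load < b.load;
      });
      for (Worker &w : fCluster) {
         if ((int)wrks.size() >= nWanted) break;
         wrks.push_back(&w);
      }
      return (int)wrks.size();
   }
private:
   std::list<Worker> fCluster;   // cluster status (via OLBD in the real XPD)
};
```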

Central scheduling
- Schematic view: Client (TProof) -> Master -> Scheduler, with TProofPlayer (session), TPacketizer (query), dataset lookup, XPD and the PLB (olbd)
- Needed ingredients:
  - full exploitation of the OLBD network
  - "come & go" functionality for workers
  - ...

Summary
- Several improvements in PROOF since ROOT05:
  - coordinator functionality
  - data set manager
  - resource control
- ALICE is stress-testing the system in the LHC environment using a test CAF at CERN
  - a lot of useful feedback
- Efforts are now concentrated on:
  - further consolidation and optimization
  - scheduling
- PROOF is steadily improving: getting ready for LHC data

Credits
- PROOF team: M. Ballintijn, B. Bellenot, L. Franco, G.G., J. Iwaszkiewicz, F. Rademakers
- J.F. Grosse-Oetringhaus, A. Peters (ALICE)
- A. Hanushevsky (SLAC)

Backup
See also the presentations at previous ROOT workshops and at CHEPxx

The ROOT data model: Trees & Selectors
- The Selector loops over the events of a Chain (trees with branches and leaves), reading only the needed parts
- Begin(): create histos, ..., define the output list
- Process(): preselection, analysis
- Terminate(): final analysis (fitting, ...), output list
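The control flow of the Begin/Process/Terminate scheme can be modelled in a few lines of plain C++ (a toy stand-in for ROOT's TSelector, with an invented Event type, not the real API):

```cpp
#include <cassert>
#include <vector>

struct Event { double pt; };   // hypothetical event content

// Toy selector: same control flow as a TSelector, without ROOT
class MySelector {
public:
   void Begin() {               // create histos, define the output list
      fOutput.clear();
      fAccepted = 0;
   }
   void Process(const Event &e) {
      if (e.pt < 1.0) return;   // preselection
      fOutput.push_back(e.pt);  // analysis: fill the output
      ++fAccepted;
   }
   void Terminate() { /* final analysis: fitting, ... */ }
   long Accepted() const { return fAccepted; }
private:
   std::vector<double> fOutput; // stands in for the output list
   long fAccepted = 0;
};

// The event loop driven by the framework
long RunLoop(MySelector &sel, const std::vector<Event> &chain) {
   sel.Begin();
   for (const Event &e : chain) sel.Process(e);
   sel.Terminate();
   return sel.Accepted();
}
```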

Motivation for PROOF
- Provide an alternative, dynamic approach to end-user HEP analysis on distributed systems
- Typical HEP analysis is a continuous refinement cycle: implement algorithm -> run over data set -> make improvements
- Data sets are collections of independent events
  - large (e.g. ALICE ESD+AOD: ~350 TB / year)
  - spread over many disks and mass storage systems
- Exploiting the intrinsic parallelism is the only way to analyze the data in reasonable times

The PROOF approach
- A PROOF query consists of a data file list and the analysis macro (myAna.C); the master distributes the work over the farm (catalog, storage, scheduler) and returns feedbacks and the final (merged) outputs
- The farm is perceived as an extension of the local PC
- Same syntax as in a local session
- More dynamic use of resources
- Real-time feedback
- Automated splitting and merging

PROOF design goals
- Transparency: minimal impact on the ROOT user habits
- Scalability: full exploitation of the available resources
- Adaptability: cope transparently with heterogeneous environments
- Preserve real-time interaction and feedback
- Intended for:
  - central analysis facilities
  - departmental workgroup computing facilities (Tier-2's)
  - multi-core / multi-disk desktops

PROOF dynamic load balancing
- The pull architecture guarantees scalability: each worker asks the master for the next packet as soon as it is ready
- Adapts to variations in worker performance
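The pull scheme can be illustrated with a small deterministic simulation (illustrative only; the real protocol goes through the TPacketizer): the master hands out a packet whenever a worker asks for one, so a worker that is twice as fast ends up processing twice as many packets.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// The master only keeps a queue of packets; workers pull from it
struct Master {
   std::queue<int> fPackets;
   bool NextPacket(int &pkt) {
      if (fPackets.empty()) return false;
      pkt = fPackets.front();
      fPackets.pop();
      return true;
   }
};

// Simulate workers of different speeds pulling until the work runs out;
// returns the number of packets processed by each worker.
std::vector<int> Simulate(int nPackets, const std::vector<double> &speed) {
   Master m;
   for (int i = 0; i < nPackets; ++i) m.fPackets.push(i);
   std::vector<double> busyUntil(speed.size(), 0.0);
   std::vector<int> done(speed.size(), 0);
   int pkt;
   for (;;) {
      // the worker that becomes free first issues the next request
      std::size_t w = 0;
      for (std::size_t i = 1; i < speed.size(); ++i)
         if (busyUntil[i] < busyUntil[w]) w = i;
      if (!m.NextPacket(pkt)) break;
      busyUntil[w] += 1.0 / speed[w];   // time to process one packet
      ++done[w];
   }
   return done;
}
```

With 30 packets and two workers of relative speeds 2 and 1, the faster worker processes 20 packets and the slower one 10, with no static assignment.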

PROOF intrinsic scalability
- Strictly concurrent user jobs at CAF (100% CPU used): in-memory data, dual Xeon 2.8 GHz, 1 / 2 / 4 / 8 users
- CMS analysis (I. Gonzales, Cantabria): 1 master, 80 workers, dual Xeon 3.2 GHz, local data (1.4 GB / node), non-blocking GB Ethernet

PROOF essentials: what can be done?
- Ideally, everything made of independent tasks
- Currently available:
  - processing of trees
  - processing of independent objects in a file
- Tree processing and drawing functionality is complete

LOCAL:
// Create a chain of trees
root[0] TChain *c = CreateMyChain.C;
// MySelec is a TSelector
root[1] c->Process("MySelec.C+");

PROOF:
// Create a chain of trees
root[0] TChain *c = CreateMyChain.C;
// Start PROOF and tell the chain to use it
root[1] TProof::Open("masterURL");
root[2] c->SetProof();
// Process goes via PROOF
root[3] c->Process("MySelec.C+");

The PROOF target
- Short analysis using local resources, e.g. end-analysis calculations, visualization
- Medium-term jobs, e.g. analysis design and development, also using non-local resources
- Long analysis jobs with well-defined algorithms (e.g. production of personal trees)
- Goals: optimize the response for short / medium jobs; make medium jobs be perceived as short

PROOF: additional remarks
- The intrinsic serial overhead is small, but a reasonable connection between a (sub-)master and its workers is required
- Hardware considerations:
  - IO-bound analysis (frequent in HEP) is often limited by hard drive access: N small disks are much better than 1 big one
  - a good amount of RAM helps efficient data caching
- Data access is The Issue:
  - optimize for data locality, when possible
  - low-latency access to mass storage

PROOF: data access issues
- Low latency in data access is essential for high performance (not only a PROOF issue)
- File opening overhead: minimized using asynchronous open techniques
- Data retrieval: caching and pre-fetching of the data segments to be analyzed, recently introduced in ROOT for TTree
- Techniques improving network performance (e.g. InfiniBand) or file access (e.g. memory-based file serving, PetaCache) should be evaluated

PROOF: PAR archive files
- Allow the client to add software to be used in the analysis
- Simple structure:
  - package/ : source / binary files
  - package/PROOF-INF/BUILD.sh : how to build the package (makefile)
  - package/PROOF-INF/SETUP.C : how to enable the package (load, dependencies)
- A PAR is a gzip'ed tar-ball of the package tree
- Versioning support is being added

PROOF essentials: monitoring
- Internal:
  - file access rates, packet latencies, processing time, etc.
  - basic set of histograms available at a tunable frequency
  - client temporary output objects can also be retrieved
  - possibility of a detailed tree for further analysis
- MonALISA-based (pcalimonitor.cern.ch:8889):
  - each host reports CPU, memory, swap, network
  - each worker reports CPU, memory, evt/s, IO vs. network rate
  - network traffic between nodes

PROOF GUI controller
- Allows full on-click control:
  - define a new session
  - submit a query, execute a command
  - query editor: create / pick up a chain, choose selectors
  - online monitoring of feedback histograms
  - browse folders with the results of a query
  - retrieve, delete, archive functionality