
Slide 1: Parallel Interactive and Batch HEP-Data Analysis with PROOF
Marek Biskup, ACAT 2005
Maarten Ballintijn*, Marek Biskup**, Rene Brun**, Philippe Canal***, Derek Feichtinger****, Gerardo Ganis**, Guenter Kickinger**, Andreas Peters**, Fons Rademakers**
* MIT   ** CERN   *** FNAL   **** PSI

Slide 2: Outline
■ Data analysis model of ROOT
■ Overview of PROOF
■ Recent developments
■ Future plans: interactive-batch data analysis

Slide 3: ROOT Trees
■ Tree: the main data structure of ROOT
■ A set of records (entries)
■ A record may contain basic C types (int, double, arrays) and any C++ object: polymorphic objects, collections, STL collections, etc. For example:
  std::list tracks;
  Electrons:
    Int_t NoElectrons;
    Double_t Momentum[NoElectrons][4];
    Float_t Position[NoElectrons][4];
  Muons:
    Int_t NoMuons;
    …
■ Trees provide efficient access to partial entry data
■ Typical size: < 2 GB
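The record layout above can be sketched as a plain C++ struct. This is a conceptual illustration only, with hypothetical names (`Track`, `Entry`, `makeEntry`); in ROOT the fields would be branches and leaves of a TTree, and the variable-length arrays would be described by the tree's counter leaves.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical per-entry record mirroring the slide's example layout.
// In a real ROOT tree each field would be a branch/leaf.
struct Track { double px, py, pz, e; };

struct Entry {
    int NoElectrons = 0;                          // leaf: Int_t NoElectrons
    std::vector<std::array<double, 4>> Momentum;  // Double_t Momentum[NoElectrons][4]
    std::vector<std::array<float, 4>> Position;   // Float_t Position[NoElectrons][4]
    int NoMuons = 0;                              // leaf: Int_t NoMuons
    std::vector<Track> tracks;                    // an STL collection stored in the tree
};

// Build an entry whose array lengths are fixed by NoElectrons.
Entry makeEntry(int nElectrons) {
    Entry e;
    e.NoElectrons = nElectrons;
    e.Momentum.resize(nElectrons);
    e.Position.resize(nElectrons);
    return e;
}
```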

Slide 4: Trees in Memory and in Files
■ Each leaf is an object (C++ object, array, basic type)
■ Each branch groups several leaves/branches
■ T.Fill() appends the in-memory entry to the tree; T.GetEntry(6) reads entry 6 back into memory
(figure: tree T in memory with its branches and leaves)

Slide 5: Tree Data Storage on Disk
■ A tree header describing the data structure
■ Branch data split into compressed buffers (Buffer 1, Buffer 2, …)

Slide 6: ROOT Trees – GUI
■ 8 branches of a tree named 'T'
■ 8 leaves of a branch named 'Electrons'
■ Double-click a leaf to histogram it

Slide 7: Tree Friends
■ Trees can be attached to each other as friends and are read in lockstep by entry number (e.g. entry #8)
■ For example, a user-writable tree can extend public read-only trees
■ Friends behave in exactly the same way as a single tree!
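The friend mechanism can be sketched in plain C++ (hypothetical types, not the ROOT API): two trees with the same number of entries are read in lockstep by entry number, so together they look like one tree with the union of their branches.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Conceptual sketch of tree friends. The types below are illustrative:
// one public, read-only tree and one user-writable tree.
struct PublicTree { std::vector<double> energy; };  // read-only branch
struct UserTree   { std::vector<int> myFlag; };     // user-written branch

// The view the analyst sees: both branches in a single record.
struct CombinedEntry { double energy; int myFlag; };

// Reading entry i of the main tree also fetches entry i of the friend.
CombinedEntry getEntry(const PublicTree& pub, const UserTree& usr, std::size_t i) {
    return CombinedEntry{pub.energy.at(i), usr.myFlag.at(i)};
}
```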

Slide 8: ROOT Chains
■ A typical tree is < 2 GB: you can process it on your laptop
■ Chain: a list of trees spread over many files; processing takes a long time!
(figure: File 1, File 2, File 3, each holding part of the entries)
■ Chains behave in exactly the same way as a single tree!
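The bookkeeping that makes a chain behave like a single tree can be sketched as follows (a simplified model, not ROOT's TChain API): a global entry number is resolved to a file and an entry within that file.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Conceptual sketch of a chain: a list of trees (files), each with its
// own entry count, addressed through one global entry number.
struct Chain {
    std::vector<std::size_t> entriesPerFile;  // one element per file

    std::size_t totalEntries() const {
        std::size_t n = 0;
        for (std::size_t e : entriesPerFile) n += e;
        return n;
    }

    // Map a global entry number to (file index, entry within that file);
    // this is exactly how many trees look like a single tree.
    std::pair<std::size_t, std::size_t> locate(std::size_t globalEntry) const {
        for (std::size_t f = 0; f < entriesPerFile.size(); ++f) {
            if (globalEntry < entriesPerFile[f]) return {f, globalEntry};
            globalEntry -= entriesPerFile[f];
        }
        return {entriesPerFile.size(), 0};  // past the end
    }
};
```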

Slide 9: Tree Viewer
■ Drag and drop variables to create expressions, then click the Draw button

Slide 10: Chain.Draw()
■ chain.Draw() is the function called by the GUI for drawing:
  chain.Draw("sumetc:nevent:nrun", "", "col");
  chain.Draw("nevent:nrun", "", "lego");

Slide 11: Advanced Data Processing
■ Preprocessing and initialization
■ Processing each entry (a loop over all files and all entries in each file)
■ Post-processing and clean-up
■ Selectors contain only the functions important for processing
■ In the example, only one branch is read
■ Do not assume anything about the order in which Process() is called for different entries!
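The selector life cycle can be sketched with a minimal, hypothetical interface (not ROOT's actual TSelector class): the framework owns the entry loop and calls the user's hooks, in whatever order it chooses.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical selector interface: the user supplies hooks, the framework
// drives the loop. The user must not rely on any particular entry order.
struct Selector {
    virtual void Begin() {}                       // preprocessing / initialization
    virtual void Process(std::size_t entry) = 0;  // called once per entry
    virtual void Terminate() {}                   // post-processing / clean-up
    virtual ~Selector() = default;
};

struct CountingSelector : Selector {
    std::size_t seen = 0;
    void Process(std::size_t) override { ++seen; }
};

// The framework's loop. Here it runs in reverse, to illustrate that the
// entry order is the framework's choice, not the user's.
void run(Selector& sel, std::size_t nEntries) {
    sel.Begin();
    for (std::size_t i = nEntries; i-- > 0;) sel.Process(i);
    sel.Terminate();
}
```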

Slide 12: ROOT Analysis Model
■ ROOT standard model:
  - Files are analyzed on a local computer
  - Remote data is accessed via a remote file server (rootd/xrootd)
(figure: client reading a local file, a remote file (dCache, Castor, RFIO, Chirp), or a rootd/xrootd server)

Slide 13: PROOF
■ Data transfer takes time: bring the kilobytes to the petabytes, not the petabytes to the kilobytes
■ Parallel interactive analysis of ROOT data
■ Uses the same ROOT selectors (transparency!)
■ Runs on clusters of heterogeneous computers (scalability!)
■ A normal laptop/PC can process up to 10 MB/s; current experiments and the LHC need much more

Slide 14: PROOF Basic Architecture
■ Single-cluster mode: the client sends commands and scripts to the master
■ The master divides the work among the slaves, which read the files
■ After the processing finishes, the master merges the results (histograms, scatter plots) and returns them to the client
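The master's merge step can be sketched as bin-wise addition of the slaves' partial histograms (a simplified model with a hypothetical `Histogram` type; in ROOT the merge operates on actual histogram objects).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each slave returns a partial histogram over the same binning; the
// master adds them bin by bin to form the final result.
using Histogram = std::vector<long>;  // bin contents only, for illustration

Histogram mergeHistograms(const std::vector<Histogram>& partials) {
    if (partials.empty()) return {};
    Histogram total(partials.front().size(), 0);
    for (const Histogram& h : partials)
        for (std::size_t bin = 0; bin < total.size(); ++bin)
            total[bin] += h[bin];
    return total;
}
```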

Slide 15: Workflow for Tree Analysis
■ The client calls Process("ana.C") on the master; the code is distributed and each slave initializes
■ The master runs a packet generator: each slave repeatedly calls GetNextPacket() and receives an entry range (e.g. 440,50 or 590,60) to process
■ When a slave finishes, it sends its output back (SendObject(histo)); the master adds the histograms and returns the results
■ Slaves then wait for the next command
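The packet generator can be sketched as follows (a simplified, hypothetical model of the master's bookkeeping): it hands out (firstEntry, nEntries) packets until the chain is exhausted, so faster slaves simply ask for more packets.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>

// Sketch of the master's packet generator. Each slave calls
// GetNextPacket() in a loop until no packet is returned.
class PacketGenerator {
    std::size_t next_ = 0;   // first entry not yet handed out
    std::size_t total_;      // total entries in the chain
    std::size_t packetSize_; // nominal entries per packet
public:
    PacketGenerator(std::size_t totalEntries, std::size_t packetSize)
        : total_(totalEntries), packetSize_(packetSize) {}

    // Returns (firstEntry, nEntries), or nothing when the work is done.
    std::optional<std::pair<std::size_t, std::size_t>> GetNextPacket() {
        if (next_ >= total_) return std::nullopt;
        std::size_t n = std::min(packetSize_, total_ - next_);
        auto packet = std::make_pair(next_, n);
        next_ += n;
        return packet;
    }
};
```

Because slaves pull packets on demand, the load balances itself: a slow slave holds only the small range it is currently processing.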

Slide 16: PROOF and Selectors
■ The user has no control over the entry loop!
■ Many trees are processed
■ The code is shipped to each slave, and SlaveBegin(), Init(), Process(), SlaveTerminate() are executed there; this initializes each slave
■ The same code also works without PROOF

Slide 17: PROOF Sequential Mode
■ The master executes scripts (selectors) and returns the results to the client
■ The master executes the selector and creates an off-screen canvas; canvases are fetched from the master automatically and displayed after the processing has finished
■ A pseudo-remote desktop (better than X Window over a WAN)
■ From the user's point of view it works in the same way as the standard PROOF mode

Slide 18: PROOF – Drawing a Histogram
■ Chains may also be created automatically by a query to a Grid catalog

Slide 19: GUI and Real-Time Feedback
■ Feedback histograms are updated every (e.g.) 1 second
■ The chain definition (header) is fetched from the PROOF master

Slide 20: Current Limitations of PROOF
■ Originally intended for interactive usage: typical query times of several minutes
■ Designed to work on a local cluster with a static configuration
■ Processing blocks the client
■ A permanent connection to the master is required
■ No dynamic usage of the Grid

Slide 21: Typical Queries
(figure: a spectrum from interactive to batch queries; GUI and command queries run connected, while script and interactive/batch queries run connected or disconnected)

Slide 22: Analysis Session Snapshot
What we are planning to implement:
■ Monday at 10:15, ROOT session on my laptop:
  AQ1: a 1 s query produces a local histogram
  AQ2: a 10 min query submitted to PROOF1
  AQ3–AQ7: short queries
  AQ8: a 10 h query submitted to PROOF2
■ Monday at 16:25, ROOT session on my laptop:
  BQ1: browse the results of AQ2
  BQ2: browse the temporary results of AQ8
  BQ3–BQ6: submit four 10 min queries to PROOF1
■ Wednesday at 8:40, session in any web browser:
  CQ1: browse the results of AQ8 and BQ3–BQ6

Slide 23: Planned Features
■ Session disconnect and reconnect
■ Asynchronous queries
■ Start-up of slaves via a Grid job scheduler
■ Allow slaves to join/leave the computation
■ Slaves calling out to the master (to cope with firewalls)

Slide 24: PROOF on the Grid
■ The client retrieves the list of logical files (LFN + MSN) from the Grid file/metadata catalogue
■ Components involved: Grid/ROOT authentication, a Grid access control service, TGrid UI/queue UI, and proofd start-up via Grid service interfaces
■ Guaranteed site access through PROOF sub-masters calling out to the master (agent technology)
(figure: a PROOF user session connected to the master server, sub-master servers, and slave servers at several Grid sites)

Slide 25: Summary
■ ROOT is a powerful analysis framework with very efficient data storage mechanisms.
■ PROOF works well for interactive parallel ROOT data analysis on a local cluster:
  - Fully integrated with ROOT: chains can be used with PROOF in the same way as locally.
  - The same selectors written for local processing can be reused.
  - But it was designed for short-duration interactive queries.
■ PROOF is evolving: we plan to accommodate longer-running queries:
  - Disconnecting from and reconnecting to a running query.
  - Non-blocking queries.
  - Dynamic configuration (using the Grid).

Slide 26: Questions