Published by Alannah Morris. Modified over 9 years ago.
A prototype for an extended PROOF
- What is PROOF?
  - The ROOT analysis model …
  - … on a multi-tier architecture
- Status
- New development
- Prototype based on XRD
- Demo
G. Ganis / CERN PH-SFT, June 2005
The ROOT analysis model: Trees
- Main data structure in ROOT, extending the concept of the PAW ntuple
- Collection of independent entries
- Organized in
  - Leaves (basic type, array, C++ object)
  - Branches (collections of leaves / branches)
The ROOT analysis model: Trees (cont'd)
- Efficient access to portions of the entry data
- Several facilities to work with trees
  - Tree friends (TTree::AddFriend): extend an existing tree without modifying it, e.g. add user-specific branches / leaves to an experiment's read-only tree
  - Tree chains (TChain): a list of trees, making the total size virtually unbounded (a single tree is typically < 2 GB)
- In all cases the result behaves exactly like a single tree
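To illustrate how a chain can behave like one large tree, here is a minimal plain-C++ sketch. It is not the TChain API: the class `ToyChain` and its methods are hypothetical, and only the entry count of each tree is modelled. A global entry number is mapped to a (tree index, local entry) pair with a binary search over the first global entry of each tree.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical toy model of a tree chain (not the TChain API): only the
// number of entries of each tree is kept, which is enough to show how a
// global entry number is mapped onto one of the underlying trees.
class ToyChain {
public:
    // Append a tree holding `nEntries` entries.
    void Add(std::int64_t nEntries) {
        offsets_.push_back(total_);   // global number of this tree's first entry
        total_ += nEntries;
    }

    std::int64_t GetEntries() const { return total_; }

    // Map a global entry number to (tree index, entry inside that tree).
    std::pair<std::size_t, std::int64_t> Locate(std::int64_t entry) const {
        if (entry < 0 || entry >= total_)
            throw std::out_of_range("entry outside the chain");
        // Last tree whose first global entry is <= entry (skips empty trees).
        auto it = std::upper_bound(offsets_.begin(), offsets_.end(), entry);
        auto i = static_cast<std::size_t>(it - offsets_.begin()) - 1;
        return {i, entry - offsets_[i]};
    }

private:
    std::vector<std::int64_t> offsets_;
    std::int64_t total_ = 0;
};
```

With trees of 100, 50 and 200 entries, global entry 120 falls in the second tree at local entry 20; the lookup cost grows only logarithmically with the number of trees.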
The ROOT analysis model: Selector
- TSelector: the main tool for defining the data-processing strategy
- Simple structure
- Skeleton generated automatically for a given tree: tree->MakeSelector("MySelector")

void MySelector::Begin(TTree *tree) {
   // called before starting the event loop
   fPtBranch = tree->GetBranch("pt");
   fPtBranch->SetAddress(&fPt);
   fPtHist = new TH1F("Pt", "Pt", 100, 0., 400.);
}
Bool_t MySelector::Process(Long64_t entry) {
   // called for each entry in the tree
   fPtBranch->GetEntry(entry);
   fPtHist->Fill(fPt);
   return kTRUE;
}
void MySelector::Terminate() {
   // called when the event loop is over
   fPtHist->Draw();
}

Read only what is needed by the algorithm.
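The division of labour behind the selector (the framework owns the event loop, the user only fills in Begin / Process / Terminate) can be sketched in plain C++. This is a hypothetical toy, not the TSelector class: `ToyTree`, `ToySelector`, `RunSelector` and `PtCounter` are illustrative names, and a counter stands in for the histogram.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a tree with a single float column "pt".
struct ToyTree {
    std::vector<float> pt;
};

// Toy interface with the same Begin / Process / Terminate structure as a
// selector; this is not the TSelector class itself.
class ToySelector {
public:
    virtual ~ToySelector() = default;
    virtual void Begin(const ToyTree&) {}
    virtual bool Process(std::int64_t entry) = 0;
    virtual void Terminate() {}
};

// The framework owns the event loop; the user only fills in the hooks.
void RunSelector(const ToyTree& tree, ToySelector& sel) {
    sel.Begin(tree);
    const auto n = static_cast<std::int64_t>(tree.pt.size());
    for (std::int64_t entry = 0; entry < n; ++entry)
        sel.Process(entry);
    sel.Terminate();
}

// Example selector: counts entries with pt above a cut (in place of
// filling a histogram).
class PtCounter : public ToySelector {
public:
    explicit PtCounter(float cut) : cut_(cut) {}
    void Begin(const ToyTree& tree) override { tree_ = &tree; count_ = 0; }
    bool Process(std::int64_t entry) override {
        // Only the "pt" column is read for each entry.
        if (tree_->pt[static_cast<std::size_t>(entry)] > cut_) ++count_;
        return true;
    }
    long Count() const { return count_; }

private:
    const ToyTree* tree_ = nullptr;
    float cut_;
    long count_ = 0;
};
```

Because the loop lives in `RunSelector`, the same selector could be driven by a different loop (for instance one that only visits a subset of entries) without changing user code, which is the property PROOF relies on.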
The ROOT analysis model: h1 analysis example

{
   // localProcessing.C
   // Define the data set
   TChain a("h42");
   a.Add("/home/ganis/rootdata/dstarmb.root");
   a.Add("/home/ganis/rootdata/dstarp1a.root");
   a.Add("/home/ganis/rootdata/dstarp1b.root");
   a.Add("/home/ganis/rootdata/dstarp2.root");
   // Process the selector
   a.Process("h1analysis.C");
}

root [0] .x localProcessing.C
Starting h1analysis with process option:
Processing file: /home/ganis/rootdata/dstarmb.root
Processing file: /home/ganis/rootdata/dstarp1a.root
Processing file: /home/ganis/rootdata/dstarp1b.root
Processing file: /home/ganis/rootdata/dstarp2.root
 FCN=70.4023 FROM MIGRAD    STATUS=CONVERGED     220 CALLS         221 TOTAL
                     EDM=1.37834e-08    STRATEGY= 1      ERROR MATRIX ACCURATE
  EXT PARAMETER                                   STEP         FIRST
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE
   1  p0           9.59988e+05   9.07051e+04   7.92857e+01  -2.69331e-09
   2  p1           3.51130e-01   2.32881e-02   4.69706e-05   5.29292e-03
   3  p2           1.18502e+03   5.95938e+01   6.72112e-01   2.29626e-06
   4  p3           1.45569e-01   5.93851e-05   8.69320e-07  -1.75027e+00
   5  p4           1.24388e-03   6.63103e-05   7.86533e-07  -6.72432e-01
Real time 0:00:17.563133, CP time 5.880
PROOF: why?
- Data to be analyzed can only rarely all be local
- Transferring full data sets takes time
Goal: provide a tool for interactive analysis on a heterogeneous cluster
- Exploit the mutual independence of the entries in a tree
- Basic parallelism achieved by splitting the data into packets of variable size, distributed to the participating nodes
Focus on:
- Transparency: same selectors, … on PROOF as in a local session
- Scalability: linear scaling up to a large number of workers (tested up to 1000)
- Adaptability: cope automatically with different cluster configurations and varying running conditions / performance
Motto: bring the KiloBytes to the PetaBytes, not the PetaBytes to the KiloBytes
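The slides only say that packets have variable size. One plausible way to size them, shown here as a hypothetical sketch and not as PROOF's actual packetizer, is to give each requesting worker a packet that should take about a fixed target time at that worker's measured rate, clamped to a minimum and maximum size. `ToyPacketizer`, `targetSec`, `minSize` and `maxSize` are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// A packet is the entry range [first, first + n).
struct Packet {
    std::int64_t first;
    std::int64_t n;
};

// Hypothetical adaptive packetizer (not PROOF's actual algorithm): each
// request gets a packet sized to take roughly `targetSec` seconds at the
// requesting worker's measured rate (entries/s), clamped to
// [minSize, maxSize], so fast workers get bigger packets.
class ToyPacketizer {
public:
    ToyPacketizer(std::int64_t total, double targetSec,
                  std::int64_t minSize, std::int64_t maxSize)
        : total_(total), targetSec_(targetSec), min_(minSize), max_(maxSize) {}

    // Fills `p` and returns true, or returns false when all entries are assigned.
    bool Next(double workerRate, Packet& p) {
        if (next_ >= total_) return false;
        auto want = static_cast<std::int64_t>(workerRate * targetSec_);
        want = std::max(min_, std::min(want, max_));
        p = {next_, std::min(want, total_ - next_)};
        next_ += p.n;
        return true;
    }

private:
    std::int64_t total_;
    std::int64_t next_ = 0;
    double targetSec_;
    std::int64_t min_, max_;
};
```

Because workers pull packets when they become free, a slow or heavily loaded node simply asks less often and receives smaller packets, which is one way to obtain the adaptability described above.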
PROOF: architecture
PROOF: connection layer
[Diagram: client → master → slaves 1 … n. On each node a parent proofd (always running) fork()s a child proofd, which execv()s into proofserv (master) or proofslave (slaves). proofserv / proofslave are TProofServ instances.]
PROOF: simplified message flow
PROOF: workflow
PROOF: data access strategies
- Each slave is assigned, as far as possible, packets representing data in local files
- If there is no (more) local data, remote data are read via (x)rootd, rfiod or dCache (requires a good LAN, e.g. Gigabit Ethernet)
- With a SAN/NAS, a simple round-robin strategy is used
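The locality-first rule can be sketched as follows. This is a hypothetical illustration, not PROOF code: `FileItem` and `NextFile` are made-up names, the host and path strings are placeholders, and a real packetizer hands out entry ranges within files rather than whole files.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A unit of work: a file and the host on which it is stored.
struct FileItem {
    std::string host;
    std::string path;
    bool done = false;
};

// Hypothetical locality-first assignment: a worker on `workerHost` first
// gets a file stored on its own host; only when none is left does it get a
// file from another host, which it would then read remotely.
// Returns nullptr when every file has been assigned.
FileItem* NextFile(std::vector<FileItem>& files, const std::string& workerHost) {
    for (auto& f : files)
        if (!f.done && f.host == workerHost) { f.done = true; return &f; }
    for (auto& f : files)
        if (!f.done) { f.done = true; return &f; }
    return nullptr;
}
```

The caller can compare the returned file's `host` with its own to know whether the read will be local or remote. With shared storage (the SAN/NAS case) no file is closer to any worker, so only the second loop matters and files are simply handed out in order to whichever worker asks next.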
PROOF: processing algorithms
TSelector adapted to PROOF. Natural additions:
- Input list: code to be run, …
- Output list: results
- Methods to initialize and finalize processing within a slave
- Method to initialize at each tree change

void MySelector::Begin(TTree *tree) {
   // called in the client for local inits
}
void MySelector::SlaveBegin(TTree *tree) {
   // called in each slave before processing
   fPtHist = new TH1F("Pt", "Pt", 100, 0., 400.);
   fOutput->Add(fPtHist);   // defines the list of objects wanted back
}
void MySelector::Init(TTree *tree) {
   // called at each tree change
   fPtBranch = tree->GetBranch("pt");
   fPtBranch->SetAddress(&fPt);
}
Bool_t MySelector::Process(Long64_t entry) {
   // called for each entry in the tree
   fPtBranch->GetEntry(entry);
   fPtHist->Fill(fPt);
   return kTRUE;
}
void MySelector::SlaveTerminate() {
   // called in each slave after processing
}
void MySelector::Terminate() {
   // called in the client after processing: fetch the merged histogram
   fPtHist = dynamic_cast<TH1F *>(fOutput->FindObject("Pt"));
   if (fPtHist) fPtHist->Draw();
}

Output objects with a Merge() method are merged automatically across slaves and made available in Terminate.
The modified TSelector also works in non-PROOF sessions.
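The automatic merging of output objects can be sketched with a toy histogram that has a Merge() method. This is a hypothetical plain-C++ illustration, not TH1F: each worker fills its own `ToyHist`, and a master-side `MergeOutputs` (also hypothetical) folds them into one result with a bin-by-bin sum.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-binning histogram with a Merge() method, standing in
// for an output object that each worker fills separately.
class ToyHist {
public:
    ToyHist(std::size_t nbins, double lo, double hi)
        : lo_(lo), hi_(hi), bins_(nbins, 0L) {}

    void Fill(double x) {
        if (x < lo_ || x >= hi_) return;   // under/overflow ignored in this toy
        auto i = static_cast<std::size_t>((x - lo_) / (hi_ - lo_) * bins_.size());
        if (i >= bins_.size()) i = bins_.size() - 1;   // guard against rounding
        ++bins_[i];
    }

    // Bin-by-bin sum; assumes both histograms use the same binning.
    void Merge(const ToyHist& other) {
        for (std::size_t i = 0; i < bins_.size(); ++i) bins_[i] += other.bins_[i];
    }

    long Bin(std::size_t i) const { return bins_.at(i); }
    long Entries() const {
        long n = 0;
        for (long b : bins_) n += b;
        return n;
    }

private:
    double lo_, hi_;
    std::vector<long> bins_;
};

// Master-side step: fold all per-worker outputs into a single result.
ToyHist MergeOutputs(const std::vector<ToyHist>& perWorker) {
    ToyHist total = perWorker.at(0);
    for (std::size_t i = 1; i < perWorker.size(); ++i) total.Merge(perWorker[i]);
    return total;
}
```

Since filling is additive, the merged histogram is the same as one filled sequentially over all entries, which is why objects of this kind can be produced in parallel without changing the result.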
PROOF: the data
- Data set: dedicated class TDSet
  - Specifies a collection of files containing objects
  - Understands logical file names
  - Could be returned by a query to a database, a file catalog, or …
- API very close to TChain

{
   // proofProcessing.C
   // Define the data set
   TDSet a("TTree", "h42");
   a.Add("root://");
   a.Add("root://");
   a.Add("root://");
   a.Add("root://");
   // Process the selector
   a.Process("h1analysis.C");
}
PROOF: running the query

root [0] gROOT->Proof("")
PROOF set to parallel mode (10 slaves)
root [1] .x proofProcessing.C
Starting h1analysis with process option:
Processing file: /tmp/ganis/rootdata/dstarp1a.root
Processing file: /tmp/ganis/rootdata/dstarp2.root
Starting h1analysis with process option:
Processing file: //tmp/ganis/rootdata/dstarmb.root
Processing file: //tmp/ganis/rootdata/dstarp1b.root
Processing file: //tmp/ganis/rootdata/dstarp2.root
 FCN=70.4023 FROM MIGRAD    STATUS=CONVERGED     220 CALLS         221 TOTAL
                     EDM=1.37834e-08    STRATEGY= 1      ERROR MATRIX ACCURATE
  EXT PARAMETER                                   STEP         FIRST
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE
   1  p0           9.59988e+05   9.07051e+04   7.92857e+01  -2.69331e-09
   2  p1           3.51130e-01   2.32881e-02   4.69706e-05   5.29292e-03
   3  p2           1.18502e+03   5.95938e+01   6.72112e-01   2.29626e-06
   4  p3           1.45569e-01   5.93851e-05   8.69320e-07  -1.75027e+00
   5  p4           1.24388e-03   6.63103e-05   7.86533e-07  -6.72432e-01
root [2]
Executing …
PROOF: additional features
- Upload and / or build additional packages, packed as PAR files (PROOF ARchive, like a Java JAR):
   gProof->UploadPackage("MyPackage.par");
   gProof->EnablePackage("MyPackage");
- Cache system to minimize the number of file transfers
- File identity and integrity checked using message digests
- Feedback information at configurable time intervals
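How a digest lets a cache avoid redundant transfers can be sketched as below. This is a hypothetical illustration: `ToyPackageCache` is a made-up name, and 64-bit FNV-1a is used only as a short, self-contained stand-in for a real message digest such as MD5 (it is not cryptographic and is not what PROOF uses). For brevity the content is passed in directly; in a real protocol the client would send the digest first and the file only if needed.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// 64-bit FNV-1a hash: a simple stand-in for a real message digest (e.g. MD5).
std::uint64_t Fnv1a(const std::string& data) {
    std::uint64_t h = 14695981039346656037ULL;   // FNV offset basis
    for (unsigned char c : data) {
        h ^= c;
        h *= 1099511628211ULL;                   // FNV prime
    }
    return h;
}

// Hypothetical server-side package cache: a package is (re)transferred only
// if the server has no copy or the stored digest differs from the new one.
class ToyPackageCache {
public:
    // Returns true if the content had to be transferred.
    bool Upload(const std::string& name, const std::string& content) {
        const std::uint64_t d = Fnv1a(content);
        auto it = digests_.find(name);
        if (it != digests_.end() && it->second == d) return false;   // up to date
        digests_[name] = d;   // a real system would store the file here
        return true;
    }

private:
    std::map<std::string, std::uint64_t> digests_;
};
```

Uploading the same package twice transfers it once; changing its content changes the digest and triggers a new transfer.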
PROOF: real-time feedback
- Feedback histogram, updated at a configurable interval (e.g. every second)
- The chain definition (header) is fetched from the PROOF master
PROOF on clusters
- PROOF can use "resource brokers" to find out where to start the slaves
- PROOF can use file catalogs to locate the files to be analyzed
Concrete examples:
- Interface with the Condor Computing-On-Demand (COD) system: the master starts the slaves as COD jobs
- PEAC (PROOF-Enabled Analysis Cluster): a complete event-analysis solution (data catalog, resource broker, PROOF)
- TGrid: abstract Grid interface for all Grid services; concrete implementation for AliEn

// Connect
TGrid *alien = TGrid::Connect("alien");
// Query
TGridResult *res = alien->Query("lfn:///alice/simulation/2001-04/V0.6*.root");
// Data set
TDSet *treeset = new TDSet("TTree", "AOD");
treeset->Add(res);
// Use the files in the result set to find the remote nodes
gROOT->Proof(res);
treeset->Process("myselector.C");
PROOF: current limitations
- Originally intended for short queries: TDSet::Process blocks until it is done
- Stateful connection: everything is lost if the connection is lost or cut
- Originally designed for a local cluster: static configuration
- Robustness of some components: interrupt control flow based on out-of-band (OOB) messages
- Authentication when different protocols are required at different steps
- Sandbox when a user account is not available
- Documentation
PROOF: team for new developments
Maarten Ballintijn, Marek Biskup, Rene Brun, Derek Feichtinger (ARDA), G.G., Guenter Kickinger, Andreas Peters (ARDA), Fons Rademakers
PROOF: new development fields
- Interactive batch: stateless connection, non-blocking queries
- Robustness: get rid of OOB messages
- Setup / configuration issues: zero-config setup, allow slaves to come and go
- Grid interfacing: efficient use of Grid information (catalogs, resource brokers, …)
- Performance issues: targeted read-ahead, improved caching, query estimators
- Authentication: adopt the XROOTD framework
- Analysis issues: tree friends, event lists, indices
- GUI, browsing
Typical query-time distribution
XPD: communication layer for PROOF based on XROOTD
- Transferring state from the client to the PROOF cluster requires a manager on the cluster side that keeps track of existing sessions and submitted queries
- XROOTD (in ROOT since v4.01.02) provides a generic main component (xrd) that handles networking and protocol scheduling, plus utility tools (forking, error handling, security, …) on which the manager can be based
- Candidate to introduce:
  - interactive-batch mode: possibility to leave a session if a query takes too long and reconnect later to pick up the results
  - non-blocking query submission: possibility to detach from a query while it is being processed (even for potentially short queries)
  - a more robust authentication system
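The point of a cluster-side manager is that sessions and their results outlive the client connection. A minimal plain-C++ sketch of that bookkeeping follows; `ToySessionManager` and all its methods are hypothetical, not the XPD classes, and in this toy the "processing" of a submitted query is done immediately just to have a result to collect.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical cluster-side session registry: sessions and their query
// results are kept independently of the client connection, so a client can
// detach and later reattach to collect the results.
class ToySessionManager {
public:
    // Start a session for `user`; returns its id.
    int StartSession(const std::string& user) {
        int id = nextId_++;
        sessions_[id] = Session{user, true, {}};
        return id;
    }

    // Record a query; in this toy its result is produced right away.
    void Submit(int id, const std::string& query) {
        sessions_.at(id).results.push_back("result of " + query);
    }

    void Detach(int id) { sessions_.at(id).attached = false; }

    // Only the session owner may reattach; returns the stored results
    // (an empty list for any other user).
    std::vector<std::string> Reattach(int id, const std::string& user) {
        Session& s = sessions_.at(id);
        if (s.user != user) return {};
        s.attached = true;
        return s.results;
    }

    bool IsAttached(int id) const { return sessions_.at(id).attached; }

private:
    struct Session {
        std::string user;
        bool attached;
        std::vector<std::string> results;
    };
    std::map<int, Session> sessions_;
    int nextId_ = 1;
};
```

The key design choice is that the registry is keyed by session id rather than by connection, so losing or closing the connection does not lose the state.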
How does XROOTD work?
- Multi-component server based on a multi-threaded architecture
- xrd component: provides networking, thread management, protocol scheduling
- Minimal set of threads:
  - Acceptor: opens the connection, matches the protocol, submits a job to the scheduler
  - Pollers: react to any activity on open links, submit jobs to the scheduler
  - Scheduler: schedules the work to be done (jobs)
  - Worker(s): wait for jobs to execute
  - Buffer manager: dynamically optimizes the use of memory buffers
- Workers are created / destroyed as needed
- Links are not attached to a specific worker: the first free worker takes the job
- Job ≡ data/information to be processed for a given link
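The scheduler/worker pattern described above (jobs go into one shared queue and the first free worker takes the next one, so no link is tied to a worker) can be sketched with the C++ standard library. This is a hypothetical illustration, not xrd code: `ToyScheduler` uses a fixed number of workers, whereas xrd creates and destroys workers as needed.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical scheduler with a fixed worker pool: jobs are queued and the
// first free worker picks up the next one.
class ToyScheduler {
public:
    explicit ToyScheduler(unsigned nWorkers) {
        for (unsigned i = 0; i < nWorkers; ++i)
            workers_.emplace_back([this] { WorkerLoop(); });
    }

    // Stop accepting work, let workers drain the queue, then join them.
    ~ToyScheduler() {
        {
            std::lock_guard<std::mutex> lk(m_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }

    void Schedule(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lk(m_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();   // wake one idle worker
    }

private:
    void WorkerLoop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return stop_ || !jobs_.empty(); });
                if (stop_ && jobs_.empty()) return;   // queue drained: exit
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();   // run the job outside the lock
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    std::vector<std::thread> workers_;
    bool stop_ = false;
};
```

Running jobs outside the lock keeps one slow job from blocking the others; the pollers and acceptor would be the producers calling `Schedule`.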
How does XROOTD work? (cont'd)
[Diagram: acceptor, scheduler, workers (WN), buffer manager (BM), pollers, XrdJob, links, files, XrdXrootdProtocol]
- XrdXrootdProtocol: one instance per physical connection (i.e. per client session)
- Client gateway to the files: used to communicate with all the files the client wants to access on that specific server
How does XPROOFD work?
[Diagram: acceptor, scheduler, workers (WN), pollers, XrdJob, links, proofserv, static area]
- XrdProofdProtocol: one instance per physical connection (i.e. per client session)
- Client gateway to proofserv
- A static area keeps all the relevant information about a user and their activities on the cluster
XPROOFD: communication layer
[Diagram: client connected through TXPSocket (xc) to the master; master and slaves 1 … n each run an XrdProofd with XRD pollers (PO); XrdProofd fork()s proofserv on the master and proofslave on each slave.]
Basic ingredients
Client side:
- New class TXPSocket: TSocket interface that understands the new communication protocol
- New class TXProofMgr: reflects the status of a client with respect to a given cluster; starts / attaches sessions, described by TProof instances (no longer unique)
Server side:
- New implementation of XrdProtocol, XrdProofdProtocol: client gateway to the cluster, in one-to-one relation with TXProofMgr; static area describing the persistent information (server lifetime)
- New class XrdProofSrv: proxy to the external process (proofserv), submitted queries, results, …; one per external process
TXPSocket
- Separate thread for receiving messages
- Intensive use of unsolicited messages:
  - normal asynchronous messages (as in Collect)
  - interrupts (no OOB)
  - ping functionality
- Synchronous and asynchronous messages are posted in separate queues
- The interrupt handler is woken up with an internal SIGURG (sent from the reader to the main thread)
- Ping is treated as a special interrupt (level 0)
TXPSocket: reader thread
[Diagram: the reader thread recv()s from the TCP connection and posts events: sync msg and async msg to their queues, interrupts via SIGURG.]
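The routing step performed by the reader thread can be sketched as a small dispatcher. This is a hypothetical, single-threaded toy, not the TXPSocket implementation: `MsgKind`, `Msg` and `ToyDispatcher` are illustrative names. In the real client the reader runs in its own thread (so the queues would need locking) and signals the main thread with SIGURG; here interrupts are only counted, with level 0 standing for ping.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Message categories, modelled on the list above (names hypothetical).
enum class MsgKind { kSync, kAsync, kInterrupt };

struct Msg {
    MsgKind kind;
    int level;          // interrupt level; 0 stands for ping
    std::string body;
};

// Single-threaded toy of the reader thread's routing step: sync and async
// messages go to separate queues, interrupts are handled separately.
class ToyDispatcher {
public:
    void Post(const Msg& m) {
        switch (m.kind) {
        case MsgKind::kSync:  sync_.push_back(m);  break;
        case MsgKind::kAsync: async_.push_back(m); break;
        case MsgKind::kInterrupt:
            if (m.level == 0) ++pings_; else ++interrupts_;
            break;
        }
    }

    // Oldest pending synchronous reply, removed from its queue.
    // The caller must check SyncSize() > 0 first.
    std::string PopSync() {
        std::string body = sync_.front().body;
        sync_.pop_front();
        return body;
    }

    std::size_t SyncSize() const { return sync_.size(); }
    std::size_t AsyncSize() const { return async_.size(); }
    int Pings() const { return pings_; }
    int Interrupts() const { return interrupts_; }

private:
    std::deque<Msg> sync_, async_;
    int pings_ = 0;
    int interrupts_ = 0;
};
```

Keeping synchronous replies in their own queue means a waiting request never has to skip over unsolicited asynchronous traffic to find its answer.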
XPD: Demo!
Results achieved with the realistic prototype:
- Multiple sessions
- Disconnect / reconnect
- Process: blocking query
- Submit: non-blocking query
- Finalize results from different sessions
- Archive results to /afs using the same daemon as the file server
XPD: what next
- In-depth test of the communication layer: latencies, synchronization problems
- Test with a large, realistic number of slaves
- Alternatives for the internal connection
- Enable authentication
- XROOTD load balancing?
Other studies
Advanced prototype using a communication layer based on memory-mapped message queues (A. Peters, D. Feichtinger):
- Full state in message queues: nice recovery features
- Multi-threaded master: queue insertion, configuration, scheduler, packetizer
- Client frontend
- Slaves split into a supervisor and processors, not attached to a specific user, for better use of resources
Summary
- A lot of activity is going on to improve the PROOF system
- A working prototype with a communication layer based on XROOTD exists: interactive batch, multi-session, reconnect
- Alternative studies may provide good solutions for some issues
- Goal: have the new system in good shape for ROOT05