Presentation transcript: "Data Analysis – Present & Future", Nick Brook (University of Bristol), 4th LHC Symposium, 3rd May 2003.

Slide 1: Data Analysis – Present & Future
Nick Brook, University of Bristol
Generic requirements & introduction; experiment-specific approaches.

Slide 2: Complexity of the Problem
- Detectors: ~2 orders of magnitude more channels than today
- Triggers must correctly choose only 1 event in every 400,000
- High-level triggers are software-based
- Computing resources will not be available in a single location
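For scale: assuming the nominal 40 MHz LHC bunch-crossing rate (an assumption; the slide gives only the selectivity), keeping 1 event in 400,000 means an output rate of order 100 Hz:

    # Rough trigger-rate arithmetic. The 40 MHz crossing rate is an
    # assumption, not stated on the slide; only the 1-in-400,000
    # selectivity is.
    crossing_rate_hz = 40e6
    selectivity = 1.0 / 400000
    print("output rate ~ %.0f Hz" % (crossing_rate_hz * selectivity))  # ~100 Hz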

Slide 3: Complexity of the Problem (continued)
Major challenges are associated with:
- Communication and collaboration at a distance
- Distributed computing resources
- Remote software development and physics analysis

Slide 4: Analysis Software System
Three scales of activity (for reference, a 2 GHz PC ~ 700 SI2000):
- Reconstruction: experiment-wide activity (10^9 events); re-processing 3 times per year, driven by new detector calibrations or understanding; 30 kSI2000·s/event, 1–3 jobs per year.
- Monte Carlo production: 50 kSI2000·s/event.
- Selection: ~20 groups' activity (10^9 → 10^7 events); iterative selection, once per month; trigger-based and physics-based refinements; 0.25 kSI2000·s/event, ~20 jobs per month.
- Analysis: individual activity within a group (10^6–10^8 events), ~25 individuals per group; different physics cuts & MC comparison, ~1 time per day; algorithms applied to data to get results; 0.1 kSI2000·s/event, ~500 jobs per day.
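These units translate into substantial totals. As a back-of-envelope check using only the slide's own numbers, one full reconstruction pass costs over a thousand CPU-years of 2 GHz machines:

    # One reconstruction pass, using the slide's figures.
    events = 1e9                  # experiment-wide activity
    cost = 30e3                   # 30 kSI2000*s per event
    cpu = 700.0                   # slide's conversion: 2 GHz ~ 700 SI2000
    seconds = events * cost / cpu
    print("%.0f CPU-years" % (seconds / (3600 * 24 * 365)))  # ~1360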

Slide 5: Analysis Software System
[Layered diagram] A stable user interface (data management tools, detector/event display, data browser, analysis job wizards) sits on a coherent set of basic tools and mechanisms (generic analysis tools, reconstruction, simulation, software development and installation), built from experiment tools and LCG tools over a Grid framework and the distributed data store & computing infrastructure.

Slide 6: Philosophy
- We want to perform analysis from day 1 (now)!
- Build on Grid tools/concepts to simplify the distributed environment.
[Figure: the Gartner hype cycle, hype vs. time, from the technology trigger through the Peak of Inflated Expectations and Trough of Disillusionment up the Slope of Enlightenment to the Plateau of Productivity.]

Slide 7: Data Challenges & Production Tools
All experiments have well-developed production tools for co-ordinated data challenges; see e.g. the CHEP talks on DIRAC (Distributed Infrastructure with Remote Agent Control). These tools provide management of workflows, job submission, monitoring, book-keeping, …

Slide 8: AliEn
AliEn (ALIce ENvironment) is an attempt to gradually approach and tackle computing problems at the LHC scale and to implement the ALICE computing model. Main features:
- Distributed file catalogue built on top of an RDBMS
- File replica and cache manager with interfaces to MSS (CASTOR, HPSS, HIS, …); AliEnFS, a Linux file system that uses the AliEn file catalogue and replica manager
- SASL-based authentication supporting various mechanisms (including Globus/GSSAPI)
- Resource broker with interfaces to batch systems (LSF, PBS, Condor, BQS, …)
- Various user interfaces: command line, GUI, web portal
- Package manager (dependencies, distribution, …)
- Metadata catalogue
- C/C++/Perl/Java API
- ROOT interface (TAliEn)
- SOAP/web services
- EDG-compatible user interface: common authentication and a compatible JDL (Job Description Language) based on Condor ClassAds; an example JDL is sketched below.
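To illustrate the last point, a minimal EDG-style ClassAd JDL, held here in a Python string (the executable, sandbox file names and requirement are hypothetical illustrations, not from the talk):

    # A minimal EDG-style JDL (Condor ClassAd syntax). All concrete
    # values below are hypothetical illustrations.
    jdl = """
    Executable    = "aliroot_job.sh";
    Arguments     = "--run 1234";
    InputSandbox  = {"aliroot_job.sh"};
    OutputSandbox = {"stdout.log", "stderr.log"};
    Requirements  = other.GlueCEPolicyMaxCPUTime > 720;
    """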

Slide 9: AliEn Architecture
[Architecture diagram] The AliEn core components and services (file & metadata catalogue, database proxy, authentication, resource broker (RB), computing element (CE), storage element (SE), logger, configuration manager, package manager) are written in Perl (a Perl core plus Perl modules) on top of external software: an RDBMS (MySQL) reached through DBI/DBD, LDAP for V.O. packages & commands, and SOAP/XML messaging. Interfaces span low level (C/C++/Perl API, ADBI, FS) to high level (CLI, GUI, web portal, user applications).

Slide 10: AliEn status
- ALICE have deployed a distributed computing environment which meets their experimental needs: simulation & reconstruction, event mixing, and analysis.
- Using open-source components (representing 99% of the code), internet standards (SOAP, XML, PKI, …) and a scripting language (Perl) has been a key element: quick prototyping and very fast development cycles.
- Close to finalising the AliEn architecture and API. OpenAliEn?

Slide 11: PROOF – The Parallel ROOT Facility
- A collaboration between the core ROOT group at CERN and the MIT Heavy Ion Group
- Part of, and based on, the ROOT framework
- Uses ROOT networking and other infrastructure classes heavily
- Currently no external technologies
Motivation:
- Interactive analysis of very large sets of ROOT data files on a cluster of computers
- Speed up query processing by employing parallelism
- Extend from a local cluster to a wide-area "virtual cluster": the Grid
- Analyse a globally distributed data set and get back a "single" result with a "single" query

Slide 12: PROOF – Parallel Script Execution
[Diagram] A local PC runs ROOT against local files (TFile) or remote files (TNetFile); attaching to a remote PROOF cluster, a master proof server distributes the work across slave servers (slave node1 … node4, as listed in proof.conf), each processing its share of the *.root files, with stdout/objects returned to the local session. The progression shown on the slide:

    # plain local execution
    $ root
    root [0] .x ana.C

    # same macro, session attached to PROOF
    $ root
    root [0] .x ana.C
    root [1] gROOT->Proof("remote")

    # processing a tree/chain through PROOF
    $ root
    root [0] tree->Process("ana.C")
    root [1] gROOT->Proof("remote")
    root [2] chain->Process("ana.C")
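The same pattern can be driven from Python through PyROOT; a minimal sketch, assuming a PROOF master called "remote", a tree named "T", and hypothetical file names:

    import ROOT

    # Build a chain over distributed files (tree and file names are
    # hypothetical).
    chain = ROOT.TChain("T")
    chain.Add("root://node1/data/run1.root")
    chain.Add("root://node2/data/run2.root")

    ROOT.gROOT.Proof("remote")   # attach the session to the PROOF master
    chain.Process("ana.C")       # each slave runs ana.C over its part of the chain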

Slide 13: PROOF & the Grid
[Diagram only: how PROOF fits into the Grid environment.]

Slide 14: Gaudi – ATLAS/LHCb software framework
[Architecture diagram] The Application Manager steers the Algorithms; data are reached through transient stores and their services: the Event Data Service (transient event store), the Detector Data Service (transient detector store) and the Histogram Service (transient histogram store), each backed by a Persistency Service and Converters that read and write the data files. Further components: Message Service, JobOptions Service, Particle Properties Service and other services.

Slide 15: GANGA – Gaudi ANd Grid Alliance
[Diagram] The GANGA GUI sits between the GAUDI program (job options, algorithms) and the collective & resource Grid services, with histograms, monitoring and results flowing back to the user.
A joint ATLAS and LHCb project, based on the concept of a Python bus:
- whichever modules are required are plugged in to provide the full functionality of the interface;
- Python glues these modules together, i.e. it allows them to interact and communicate (a minimal sketch of the idea follows).
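A minimal sketch of the software-bus idea (the class and service names here are hypothetical illustrations, not GANGA's actual API): modules publish services on a shared bus and find each other through it, instead of importing each other directly.

    # Hypothetical sketch of a "Python software bus"; not GANGA code.
    class Bus:
        def __init__(self):
            self._services = {}

        def provide(self, name, service):
            self._services[name] = service   # a module publishes a service

        def request(self, name):
            return self._services[name]      # another module looks it up

    bus = Bus()
    # A submission module registers its entry point on the bus ...
    bus.provide("submit", lambda job: "submitted %s" % job)
    # ... and a GUI module uses it without importing that module itself.
    print(bus.request("submit")("myJob"))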

Slide 16: GANGA architecture
[Architecture diagram] The GANGA core module sits on a Python software bus together with an OS module, GaudiPython and PythonROOT (giving access to Athena/GAUDI), the EDG UI, and interfaces to the bookkeeping DB, production DB, job configuration DB and a local job DB. A GUI rides on the same bus. A remote user (client) connects over LAN/WAN through an XML-RPC module talking to an XML-RPC server on the server-side bus, and jobs are dispatched to the GRID and to local resource-management systems (LRMS).

Slide 17: Current Status
- Most of the base classes are developed. Serialization of objects (user jobs) is implemented with the Python pickle module.
- GaudiApplicationHandler can access the configuration DB for some Gaudi applications (Brunel); it is implemented with the xmlrpclib module. Ganga can create user-customised job-options files using this DB.
- DaVinci and AtlFast application handlers are implemented.
- Various LRMS are implemented, allowing jobs to be submitted, and simple monitoring information retrieved, on several batch systems.
- Much of the Grid-related functionality is already implemented in GridJobHandler using EDG testbed 1.4 software; Ganga can submit, monitor, and get output from Grid jobs.
- The JobsRegistry class provides job monitoring via a multithreaded environment based on the Python threading module.
- A GUI is available, using the wxPython extension module.
- An ALPHA release is available.
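The persistence and remote-configuration mechanics named above are plain Python standard-library facilities; a sketch of the pattern (the Job class, file name, server URL and method name are hypothetical, not Ganga's real interfaces):

    import pickle
    import xmlrpclib  # Python 2 standard library, as in Ganga of this era

    # Serialize a user job with pickle, as the slide describes.
    class Job:
        def __init__(self, application, options):
            self.application = application
            self.options = options

    job = Job("DaVinci", "myopts.opts")
    pickle.dump(job, open("job.pkl", "wb"))
    restored = pickle.load(open("job.pkl", "rb"))

    # Query a configuration DB over XML-RPC with xmlrpclib
    # (URL and method name are hypothetical).
    server = xmlrpclib.ServerProxy("http://configdb.example.org:8080")
    options = server.getJobOptions("Brunel", "v17r0")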

Slide 18: CMS analysis/production chain
[Data-flow diagram] Event generation (PYTHIA) → detector simulation (OSCAR), producing detector hits → digitisation (ORCA), mixing in minimum-bias (MB) events and producing digis (raw data per bunch crossing) → reconstruction, L1 and HLT (ORCA), producing DSTs → DST stripping (ORCA) into streams (μ/b, e/γ, JetMet, calibration) → analysis (Iguana/ROOT/PAW) on ntuples (MC info, tracks, etc.) and MC ntuples.

Slide 19: CMS components and data flows
[Diagram] The production system and data repositories (Tier 0/1/2) feed ORCA analysis farms, or a distributed `farm' using grid queues (Tier 1/2). TAG and AOD extraction/conversion/transport services carry TAGs/AODs into RDBMS-based data warehouse(s) and PIAF/PROOF-type analysis farm(s). On the user side (Tier 3/4/5), data-extraction web service(s) and query web service(s) serve a local analysis tool (Iguana/ROOT/…, via a tool plugin module), a web browser and local disk. Three flows are distinguished: production data flow, TAGs/AODs data flow, and physics query flow.

Slide 20: CLARENS – a CMS Grid Portal
- Grid-enables the working environment for physicists' data analysis.
- Clarens consists of a server communicating with various clients via the commodity XML-RPC protocol; this ensures implementation independence. [Diagram: client ↔ RPC over http/https ↔ web server ↔ Clarens service.]
- The server will provide a remote API to Grid tools:
  - the Virtual Data Toolkit: object collection access
  - data movement between Tier centres using GSI-FTP
  - CMS analysis software (ORCA/COBRA)
  - security services provided by the Grid (GSI)
- No Globus is needed on the client side, only a certificate.
- The current prototype is running on the Caltech proto-Tier2.
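Since the wire protocol is plain XML-RPC, a client needs nothing beyond the language's standard library; a minimal Python sketch (the server URL and remote method name are hypothetical illustrations):

    import xmlrpclib  # Python 2 standard library (xmlrpc.client in Python 3)

    # Point at a Clarens server over https; the URL below is hypothetical.
    server = xmlrpclib.ServerProxy("https://tier2.example.edu:8443/clarens")

    # Call a remote method; the name is a hypothetical illustration of
    # the "remote API to Grid tools" idea.
    for name in server.file.ls("/store/jetmet"):
        print(name)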

Slide 21: CLARENS
Several web-services applications have been built on the Clarens web-service architecture:
- Proxy escrow
- Client access from a wide variety of languages: Python, C/C++, Java applications, and Java/JavaScript browser-based clients
- Access to JetMET data via SQL2ROOT
- ROOT access to remote data files
- Access to files managed by the San Diego Supercomputer Center storage resource broker (SRB)

Slide 22: Summary
- All 4 experiments have successfully "managed" distributed production: many lessons learnt, not only by the experiments but as useful feedback to the middleware providers, and a large degree of automation achieved.
- The experiments are moving on to the next challenge, analysis: chaotic, unmanaged access to data & resources.
- Tools are already developed, or being developed, to aid "Joe Bloggs" physicists.
- Success will be measured in terms of:
  - simplicity, stability & effectiveness
  - access to resources
  - management of, and access to, data
  - ease of development of user applications

