1 Computing in HEP: An Introduction to Data Analysis in High Energy Physics. Max Sang, Applications for Physics Infrastructure Group, IT Division, CERN, Geneva (max.sang@cern.ch)

2 Aug 2001, Max Sang, CERN/IT, max.sang@cern.ch. Introduction to HEP
- Accelerators produce high-intensity, high-energy beams of particles such as protons or electrons.
- Detectors are huge, multi-layered electronic devices constructed around the points where the beams collide with targets or with other beams.
- They are planned and constructed by multinational collaborations of hundreds of people over several years.
- Once operational, they run for years (e.g. the LEP programme, 1989-2000).

3 The Large Hadron Collider
- 27 km circumference, 100 m below the surface, at CERN
- First beam 2006
- Eight underground caverns for detectors

4 CMS
- Under construction now; ready 2006
- 21 m long, 15 m diameter, 12500 tons
- As much iron as the Eiffel Tower
- 1900 physicists from 31 countries

5 Introduction to HEP (II)
- 'Events' are like photographs of individual subatomic interactions, taken by the detectors.
- Events are produced at high rates (kHz-MHz) for months at a time with minimal human intervention; analysis continues for years.
- The fundamental physics processes are quantum-mechanical (probabilistic). Consecutive events are uncorrelated, but processes occur at a wide range of frequencies; some are very rare, and some are more interesting than others...

6 Introduction to HEP (III)
- Data are grouped into runs, periods, and years. Calibrations, detector faults, beam conditions, etc. are associated with particular time periods, e.g. "The calorimeter was off during run 1234".
- 'Event generators' simulate the collisions and produce the final-state particles.
- These are processed by simulated detectors to produce 'Monte Carlo data' for comparison with what we see in the real thing: an iterative process of comparison, tuning, and model verification.

7 Extracting the Data
- The passage of particles through detector components produces ionisation, which is amplified to a detectable level.
- Front-end electronics turn pulses into digits.
- Hardware processing turns digits into 'hits'.
- Software turns hits into 'tracks', 'clusters', etc.
- A multi-level trigger/filter decides which events to keep (sometimes only one event in 10^7).
- 'Online reconstruction' -> storage.
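The digits -> hits -> trigger chain above can be sketched as a toy pipeline. All names, thresholds and probabilities here are illustrative inventions, not any real detector's trigger logic:

```python
import random

def make_event(n_channels=16):
    """Front-end digitisation: one pulse height per detector channel.
    Gaussian noise, plus a large pulse on ~10% of channels ('signal')."""
    return [random.gauss(0, 1) + (5 if random.random() < 0.1 else 0)
            for _ in range(n_channels)]

def digits_to_hits(digits, threshold=3.0):
    """Hardware step: keep (channel, pulse) pairs above threshold."""
    return [(ch, d) for ch, d in enumerate(digits) if d > threshold]

def trigger(hits, min_hits=2):
    """Toy trigger/filter: keep events with enough hits."""
    return len(hits) >= min_hits

random.seed(42)
kept = [hits for hits in (digits_to_hits(make_event()) for _ in range(10000))
        if trigger(hits)]
print(f"kept {len(kept)} of 10000 events")
```

A real multi-level trigger rejects far more aggressively (down to one event in 10^7), but the shape is the same: each stage reduces the data volume before the next, more expensive stage runs.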

8 The LEP Era (Started 1989)
- Four detectors (about 300 people each):
  - 50 kHz collision rate -> 5 Hz storage rate.
  - Event size ~100 kB, reconstructed by a small farm of O(10) very high-end workstations.
- < 500 GB/year/experiment:
  - Stored on tape (with disk caching) at CERN.
  - Analysed on mainframes by remote batch jobs.
  - Ntuples (~100 MB) returned to the user for further (interactive) analysis and calculation. Plots produced for presentations and papers.

9 The LHC Era (Starts 2006)
- Four detectors (6k people in total):
  - 50 MHz collision rate -> 100 Hz storage rate.
  - 500 GB/s raw data rate after triggering.
  - Event size 1-2 MB, reconstructed by a farm of 1k PCs.
- 1 PB/year/experiment in 2007, increasing rapidly. Total by 2015 for all detectors = 100 PB.
- Searches may look for single events in 10^7. Every user (in 30 countries) will want to eat millions of events at a single sitting, with reasonably democratic data access.
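The petabyte-per-year figure follows from simple arithmetic. A back-of-the-envelope check, assuming roughly 10^7 seconds of data-taking per year (a common HEP rule of thumb, not a number from the slide):

```python
# Back-of-the-envelope data volume for one LHC experiment.
storage_rate_hz = 100       # events written per second, after triggering
event_size_bytes = 1.5e6    # "1-2 MB" per event; take the middle
seconds_per_year = 1e7      # assumed effective running time per year

bytes_per_year = storage_rate_hz * event_size_bytes * seconds_per_year
print(f"{bytes_per_year / 1e15:.1f} PB/year")  # → 1.5 PB/year
```

Which lands in the same ballpark as the "1 PB/year/experiment" quoted above.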

10 Physicists are also Programmers
- All data analysis is done using computers.
- The physicists are all 'programmers', but almost none of them have any formal CS training:
  - Some are very experienced (usually in F77) and write lots of code for reconstruction, triggering, etc.
  - Others write more modest programs for their own data analysis.
  - Some are fresh graduate students who have never written a line of code.
- Our job is to help them do physics.

11 What Software do they Need?
- Experiment-specific code:
  - Triggering, data acquisition, slow controls, reconstruction, new 'physics code'.
  - Mostly written by the experimentalists without assistance.
- Event generators:
  - Highly technical, constantly in flux.
  - Written by phenomenologists.
We don't help with these!

12 What Software do they Need? (II)
- Specialised HEP tools:
  - Detector simulation tools, relativistic kinematics, ...
- General-purpose scientific tools with a HEP slant:
  - Data visualisation, histogramming, ...
- General-purpose technical libraries:
  - Random numbers, matrices, geometry, analytical statistics, 2D and 3D graphics, ...
We do help with these!

13 The Situation in ~1995
- Millions of lines of F77, some of it very technical.
- Thousands of person-years of debugging.
- Users know and love/hate the software, and they don't want to change.
- A serious and unavoidable maintenance commitment for old code: F77 is here to stay!
- Shrinking manpower in IT Division.
- Not long until the start of the LHC programme: change now, or wait until 2020!

14 The Old Software
Largely home-grown in the 70s and 80s:
- Persistent storage and memory management: ZEBRA
- Code management: PATCHY
- Scripting: KUIP/COMIS
- Histograms and ntuples: HBOOK
- Detector simulation: GEANT 3
- Fitting and minimisation: MINUIT
- Mathematics, random numbers, kinematics: MATHLIB
- Graphics: HIGZ/HPLOT
- Visualisation and interactive analysis: PAW

15 The Anaphe Project
- Provide a modern, object-oriented, more flexible and more powerful replacement for CERNLIB, with fewer people and in less time.
- Identify areas where commercial and/or open-source products can (or must) be used instead of home-grown solutions.
- Concentrate effort on HEP-specific tasks.
- Use object-oriented techniques and plan for very long-term maintenance and evolution.
- Detector simulation is a separate (very big) project.

16 Commodity Solutions
Luckily, computing has also evolved. What can we get off the shelf?
- Open-source tools:
  - Code management (CVS)
  - Graphics (Qt, OpenGL)
  - Scripting (Python, Perl)
- Commercial products:
  - Persistency (Objectivity OODB)
  - Mathematics (NAG library, 'CERN edition')

17 HEP Community Developments
Not everything is being done solely at CERN!
- CLHEP: C++ class libraries for HEP
  - Random numbers
  - 3D geometry, vectors, matrices, kinematics
  - Units and dimensions
  - Generic HEP classes (particles, decay chains, etc.)
- Generators being moved (slowly) to C++.
- The competition: JAS, Open Scientist, Root.

18 Anaphe C++ Libraries (I)
- Fitting: FML (Fitting and Minimisation Library)
  - A flexible, extensible library based on the Gemini engine.
  - Gemini: a core fitting engine based on NAG or MINUIT.
- Histograms: HTL (Histogram Template Library)
  - Histograms are statistical distributions of measured quantities, the workhorse of HEP analysis. They must be flexible, extensible and very efficient.
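To make concrete what a histogram library provides, here is a minimal fixed-bin histogram in Python. This is an illustrative sketch only, not HTL's actual C++ API:

```python
class Histogram1D:
    """Minimal 1D histogram: fill values, read back per-bin counts."""
    def __init__(self, nbins, lo, hi):
        self.nbins, self.lo, self.hi = nbins, lo, hi
        self.counts = [0] * nbins
        self.underflow = self.overflow = 0

    def fill(self, x, weight=1):
        """Add one measurement; out-of-range values go to under/overflow."""
        if x < self.lo:
            self.underflow += weight
        elif x >= self.hi:
            self.overflow += weight
        else:
            i = int((x - self.lo) / (self.hi - self.lo) * self.nbins)
            self.counts[i] += weight

    def entries(self):
        return sum(self.counts) + self.underflow + self.overflow

h = Histogram1D(10, 0.0, 100.0)
for x in [5, 15, 15, 99, 150]:
    h.fill(x)
print(h.counts[1], h.overflow, h.entries())  # → 2 1 5
```

An efficient library version adds weighted errors, 2D/3D variants and projections, but the fill/read cycle above is the core of every HEP histogramming tool.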

19 Anaphe C++ Libraries (II)
- Qplotter: graphics package
  - For drawing histograms and more.
  - Based on Qt (a superset of Motif).
- NtupleTag:
  - Extends the concept of an ntuple (~ a static table of data).
  - New columns can be added as you work.
  - Can navigate back to the original events.
  - Smart clustering of data.
  - See Zsolt's presentation...

20 Interactive Analysis
- Analysis in HEP = 'data mining':
  - Extract parameters from large multi-dimensional samples.
- Typical tasks:
  - Plot one or more variables with cuts on yet others, exploring the variable space.
  - Perform statistical tests on distributions (fitting, moments, etc.).
  - Produce histograms etc. for papers or talks.

21 Interactive Analysis (II)
- Almost all analyses begin as interactive 'playing' with the data and progress organically to large, complex, CPU-intensive procedures.
- Step 1: single commands to a script interpreter, e.g. "plot x for all events with y > 5".
- Step 2: multi-command scripts/macros.
- Step 3: procedures are translated into C++ functions and called interactively.
- Step 4: the user builds new libraries and interacts with them through the command line (etc.).
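In a Python-embedded interpreter of the Lizard kind, the Step 1 command "plot x for all events with y > 5" reduces to a one-line cut. A generic sketch; the event list and field names are invented for illustration:

```python
# Hypothetical events: each a dict of measured quantities x and y.
events = [{"x": 1.2, "y": 7.0},
          {"x": 3.4, "y": 2.0},
          {"x": 0.5, "y": 9.1}]

# "plot x for all events with y > 5": apply the cut, collect x.
x_passing = [ev["x"] for ev in events if ev["y"] > 5]
print(x_passing)  # → [1.2, 0.5]
```

Step 2 is simply this line saved in a macro file; Steps 3 and 4 move the same cut into compiled C++ once it is too slow to run interpreted over millions of events.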

22 Interactive Analysis (III)
- The progression from command line, to macro, to compiled library should be smooth and simple.
- Doing the easy things should be easy, to allow rapid development and prototyping of algorithms.
- Doing complex things then becomes significantly easier than starting from scratch in C++.
- Distributed analysis must also be possible (see Kuba's talk).

23 Lizard (I)
- An interactive environment for data analysis using the other Anaphe components:
  - First prototype (with limited functionality) available since CHEP 2000.
  - Redesign started in April 2000.
  - Beta version October 2000.
  - Full version out since June 2001.
  - Much more work and testing to do, but already approaching (and in places surpassing) PAW functionality.
- Embedded in Python.

24 Lizard (II)
- Architecture:
  - Everything interacts with everything else through abstract interfaces, so the implementations are hidden.
  - 'Commander' C++ classes load the implementation classes at run time and become proxies for them.
  - SWIG is used to generate 'shadow' classes from the Commander header files. These are compiled into the Python library and become accessible as new Python objects.
  - Swapping components at run time becomes trivial.
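The commander-as-proxy idea can be sketched in pure Python. All class names here are invented for illustration; in Lizard the Commanders are C++ classes whose SWIG-generated shadows play the role shown below:

```python
class HistogramFactory:
    """Abstract interface: callers program against this, never an implementation."""
    def create_1d(self, nbins, lo, hi):
        raise NotImplementedError

class NativeFactory(HistogramFactory):
    def create_1d(self, nbins, lo, hi):
        return f"native histogram [{nbins} bins, {lo}-{hi}]"

class RemoteFactory(HistogramFactory):
    def create_1d(self, nbins, lo, hi):
        return f"remote histogram [{nbins} bins, {lo}-{hi}]"

class Commander:
    """Proxy that holds an implementation chosen at run time and forwards to it."""
    def __init__(self, impl):
        self._impl = impl

    def swap(self, impl):
        # Swapping components at run time is just rebinding the target.
        self._impl = impl

    def create_1d(self, *args):
        return self._impl.create_1d(*args)

cmd = Commander(NativeFactory())
print(cmd.create_1d(100, 0.0, 1.0))  # → native histogram [100 bins, 0.0-1.0]
cmd.swap(RemoteFactory())
print(cmd.create_1d(100, 0.0, 1.0))  # → remote histogram [100 bins, 0.0-1.0]
```

Because user code only ever touches the Commander, nothing downstream notices when the implementation behind it changes.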

25 Lizard Screenshot [screenshot of a Lizard session, not reproduced in this transcript]

26 Behind the Scenes [architecture diagram]: User -> Python -> Controller -> shadow classes (automatically generated by SWIG) -> C++ interfaces (the AIDA interfaces) -> C++ implementations (the Anaphe implementations)

27 AIDA
- The use of abstract interfaces promotes weak coupling between components.
- The AIDA (Abstract Interfaces for Data Analysis) project extends this to community-wide standard interfaces, which will allow the use of C++ components in Java and vice versa.
- Developers only need to learn one way of interacting with a 'histogram', which then works with all compliant implementations.
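The "one interface, many compliant implementations" pattern looks like the following Python analogue. The real AIDA interfaces are specified for C++ and Java, and the class and method names below are simplified inventions:

```python
from abc import ABC, abstractmethod

class IHistogram1D(ABC):
    """Abstract histogram interface: analysis code written against this
    works with any compliant implementation."""
    @abstractmethod
    def fill(self, x, weight=1.0): ...
    @abstractmethod
    def entries(self): ...

class ListHistogram(IHistogram1D):
    """One possible implementation; callers never depend on it directly."""
    def __init__(self):
        self._fills = []
    def fill(self, x, weight=1.0):
        self._fills.append((x, weight))
    def entries(self):
        return len(self._fills)

def analyse(h, data):
    """Works unchanged with any IHistogram1D implementation."""
    for x in data:
        h.fill(x)
    return h.entries()

print(analyse(ListHistogram(), [1.0, 2.0, 3.0]))  # → 3
```

Swapping ListHistogram for any other compliant class leaves analyse() untouched, which is exactly the weak coupling the slide describes.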

28 Summary
- HEP has (and has always had) serious computing requirements.
- The old model (F77 monoliths) is no longer workable in the LHC era.
- New software in C++ and Java uses modern software design to plan for the long term.
- Anaphe is CERN IT Division's contribution: flexible, extensible, modular, efficient.
- The LHC is coming and we must be ready!

29 Further Information
- More about the detectors and HEP in general:
  - http://cmsinfo.cern.ch
  - http://cern.ch/atlas
- CERN IT Division:
  - http://cern.ch/IT
- The Anaphe project:
  - http://cern.ch/Anaphe

