Computing and LHCb. Raja Nandakumar.

Presentation transcript:

1 Computing and LHCb
Raja Nandakumar

2 The LHCb experiment
- The Universe is made of matter
- Still not clear why
- Andrei Sakharov’s theory of CP violation
- Study CP violation
- Indirect evidence of new physics
- There are many other questions (of course)
- The LHCb experiment has been built
- Hope to answer some of these questions

3 The LHCb detector
[Photos] February 2002: cavern ready for detector installation. August 2008.

4 How the data looks

5 The detector records …
- >1 million channels of data every bunch crossing
- 25 ns between bunch crossings
- Trigger reduces this to about 2000 events/sec
- ~7 million events / hour
- 25 KB raw event size
- 4.3 TB/day
- Not as much as ATLAS / CMS but still …
- Assuming continuous operation
- Breaks for fills, etc.
- These events will need to be farmed out of CERN
- Reconstructed and stripped at Tier-1s
- Then replicated to all LHCb Tier-1 sites
- Finally available for user analysis
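The figures on this slide are mutually consistent if the 25 KB is read as the size of one raw event. A minimal Python check, assuming decimal units (1 KB = 1000 bytes) and continuous running; the function names are illustrative and not part of any LHCb software:

```python
# Back-of-envelope check of the slide's data-rate figures.
# Assumptions (not stated on the slide): 25 KB is the size of ONE raw event,
# units are decimal (1 KB = 1000 bytes), and the detector runs continuously.

TRIGGER_RATE_HZ = 2000      # events per second after the trigger
EVENT_SIZE_BYTES = 25_000   # raw size of one event


def events_per_hour(rate_hz: int) -> int:
    """Number of triggered events recorded in one hour."""
    return rate_hz * 3600


def bytes_per_day(rate_hz: int, event_size_bytes: int) -> int:
    """Raw data volume written in 24 hours of continuous running."""
    return rate_hz * event_size_bytes * 86_400


if __name__ == "__main__":
    print(f"{events_per_hour(TRIGGER_RATE_HZ) / 1e6:.1f} million events/hour")
    print(f"{TRIGGER_RATE_HZ * EVENT_SIZE_BYTES / 1e6:.0f} MB/s")
    print(f"{bytes_per_day(TRIGGER_RATE_HZ, EVENT_SIZE_BYTES) / 1e12:.2f} TB/day")
```

Under these assumptions the event count comes to 7.2 million per hour and the volume to 4.32 TB per day, in line with the rounded values quoted on the slide.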

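6 The LHCb computing model
[Diagram of the data flow; labels include CERN and FTS]
- Production (T2/T1/T0): simulation + digitization, producing .digi
- Reconstruction (T1 / T0): .digi in, .rdst out
- Stripping (T1 / T0): .rdst in, .dst out
- .dst distributed between T1 / T0 via FTS
- User analysis (T1/T0)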

7 LHCb job submission
- Computing distributed all over the world
- Particle physics is collaborative across institutes in various nations
- Both CPU and storage available at various sites
- Welcome to the world of grid computing
- Take advantage of distributed resources
- Set up a framework for other disciplines also
- Fault-tolerant job execution
- Also used by Medicine, Chemistry, Space science, …
- LHCb interface: DIRAC

8 What the user sees …
- Submit job to the “grid”
- Ganga (ATLAS/LHCb)
- Sometimes needs a lot of persuasion
- Usually the job comes back successful
- On occasion problems seen
- Frequently wrong parameters, code, …
- Correct and resubmit

9 What the user does not see …

10 Requirements of DIRAC
- Fault tolerance
- Retries
- Duplication
- Failover
- Guard against possible grid problems …
- Network, timeouts
- Drive failures
- Systems hacked
- Bugs in code
- If it cannot go wrong, it still will
- Caching
- Watchdogs
- Logs
- Overloaded machine, service
- Thread safety
- Fire, cooling problems
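The retry and failover items above can be illustrated with a small generic sketch. This is not DIRAC code: `run_with_failover`, its parameters and the endpoint names are hypothetical, chosen only to show the pattern of retrying an operation a few times before failing over to an alternative service.

```python
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def run_with_failover(
    operation: Callable[[str], T],
    endpoints: Sequence[str],
    retries_per_endpoint: int = 3,
    delay_s: float = 0.0,
) -> T:
    """Try `operation` on each endpoint in order.

    Each endpoint is retried `retries_per_endpoint` times before the next
    one is tried (failover). Raises RuntimeError if every attempt fails.
    """
    errors: List[str] = []
    for endpoint in endpoints:
        for attempt in range(1, retries_per_endpoint + 1):
            try:
                return operation(endpoint)
            except Exception as exc:  # broad on purpose: any failure triggers a retry
                errors.append(f"{endpoint} attempt {attempt}: {exc}")
                time.sleep(delay_s)  # optional pause before the next attempt
    raise RuntimeError("all endpoints failed:\n" + "\n".join(errors))


if __name__ == "__main__":
    def upload(endpoint: str) -> str:
        # Hypothetical operation: the primary service is down, the backup works.
        if endpoint == "primary":
            raise IOError("connection timed out")
        return f"stored via {endpoint}"

    print(run_with_failover(upload, ["primary", "backup"], retries_per_endpoint=2))
```

The collected error list doubles as a log of what went wrong, which matches the slide's emphasis on keeping logs alongside retries.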

11 Submitting jobs on the grid
- Two ways of submitting jobs
- Push jobs out to a site’s batch system
- The grid is a simple multiple batch system
- Job waits at the site until it runs
- Lose control of jobs when they leave us (LHCb)
- Many things can change in the time between job submission and running
- We only see the batch systems / queues
- We do not see the status of the grid in real time
- Cause of low success rate – previous experience
- Load on site
- Site temporary downtime
- Change in job priority within the experiment
- Pull jobs into the site
- Pilot jobs

12 Pilot jobs
- “Wrapper” jobs
- Submitted to a site
- If site is available, free & there are waiting jobs
- Pilot job returns information at current time
- Job may have resource requirements too …
- Look at local environment and request job from DIRAC
- DIRAC returns job with highest priority matching available resource
- Internal job prioritisation within DIRAC
- Has latest information on experiment priorities
- Exit after a short delay if no matching job found
- Have fine-grained (level of worker node) view of the grid
- Very high job success rate
- Pioneered by LHCb
- Very simple requirements for sites
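The pull model described on this slide can be sketched in a few lines. This is a toy model, not DIRAC's matcher: the `Job` and `TaskQueue` classes, their fields (`platform`, `min_cpu_s`) and the `run_pilot` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    job_id: int
    priority: int     # higher value = more urgent
    min_cpu_s: int    # CPU time the job needs
    platform: str     # platform the job's software was built for


@dataclass
class TaskQueue:
    """Central queue: holds waiting jobs and matches them to pilot requests."""
    waiting: List[Job] = field(default_factory=list)

    def add(self, job: Job) -> None:
        self.waiting.append(job)

    def request_job(self, platform: str, cpu_left_s: int) -> Optional[Job]:
        """Return the highest-priority waiting job that fits the resources
        the pilot reports, or None if nothing matches."""
        candidates = [
            j for j in self.waiting
            if j.platform == platform and j.min_cpu_s <= cpu_left_s
        ]
        if not candidates:
            return None
        best = max(candidates, key=lambda j: j.priority)
        self.waiting.remove(best)
        return best


def run_pilot(queue: TaskQueue, platform: str, cpu_left_s: int) -> str:
    """A pilot inspects its local environment, then pulls a matching job."""
    job = queue.request_job(platform, cpu_left_s)
    if job is None:
        return "no matching job, pilot exits"
    return f"running job {job.job_id}"


if __name__ == "__main__":
    q = TaskQueue()
    q.add(Job(1, priority=5, min_cpu_s=3600, platform="slc4"))
    q.add(Job(2, priority=9, min_cpu_s=86_400, platform="slc4"))
    q.add(Job(3, priority=7, min_cpu_s=1800, platform="slc4"))
    # The pilot has only two hours left, so job 2 does not fit.
    print(run_pilot(q, platform="slc4", cpu_left_s=7200))
```

Because matching happens at the moment the pilot asks, the decision uses the priorities and the worker-node state that hold at run time rather than at submission time.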

13
- Does all on previous slide
- Refinements still needed (as always)
- Job prioritisation still static
- Dynamic job prioritisation on the way
- Basic logs all in place
- Not everything easy to view for user / shifter
- Being improved
- More improvements in resilience upcoming
- DIRAC portal: http://lhcbweb.pic.es
- All needed information for LHCb users
- Locating data, job monitoring, …
- Restricted information for outsiders
- Grid privacy issues
- Ganga + DIRAC the only official LHCb grid interface
- Will support any reasonable use case

14 Successes …
- A single machine is the DIRAC server
- No particular load issues seen

15 Analysis also going on
Comparison of different Monte Carlo

16 The occasional problem
- Black hole worker nodes
- Bad environment that cannot match jobs
- Sink for our pilot jobs
- Once a sink for production jobs also
- Migration from SL3 to SL4
- Introduce short sleep time before pilot exits
- DoS attack on CERN servers
- Software being downloaded from CERN
- Was done if software was not available locally
- Now users do not install software
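One way to see why a short sleep before the pilot exits helps against black-hole nodes: a slot that can never match a job frees itself almost immediately, so the batch system keeps feeding it new pilots. A rough model of that drain rate follows; the function name and all timings are hypothetical and not taken from the slides.

```python
def pilots_drained_per_hour(pilot_startup_s: float, exit_delay_s: float) -> float:
    """Pilots that one always-failing batch slot consumes per hour.

    Each pilot occupies the slot for its startup time plus the sleep it
    performs before exiting when no job could be matched.
    """
    cycle_s = pilot_startup_s + exit_delay_s
    if cycle_s <= 0:
        raise ValueError("total cycle time must be positive")
    return 3600.0 / cycle_s


if __name__ == "__main__":
    # Hypothetical timings: 10 s to start a pilot, with and without a 290 s sleep.
    print(pilots_drained_per_hour(10, 0))
    print(pilots_drained_per_hour(10, 290))
```

In this model the sleep does not fix the broken node; it only limits how fast the node can absorb pilots while the underlying problem is found.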

17 We do not understand …
- Very very preliminary
- Still working on understanding this
- “Same” class of CPUs at different sites
[Plot: CPU time scaled to the median for the CPU class]
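If the plot label is read as CPU time divided by the median CPU time for the same CPU class, the scaling could be computed as sketched below. This reading is an assumption, and the site names, CPU class names and timings in the example are invented purely for illustration; they are not results from the talk.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple

Record = Tuple[str, str, float]  # (site, cpu_class, cpu_time_s)


def scale_to_class_median(records: List[Record]) -> List[Record]:
    """Divide each job's CPU time by the median CPU time of its CPU class.

    A scaled value of 1.0 means the job took the typical time for that
    class of CPU; values far from 1.0 at one site flag something odd there.
    """
    times_by_class: Dict[str, List[float]] = defaultdict(list)
    for _site, cpu_class, cpu_time in records:
        times_by_class[cpu_class].append(cpu_time)
    medians = {cls: median(times) for cls, times in times_by_class.items()}
    return [(site, cls, t / medians[cls]) for site, cls, t in records]


if __name__ == "__main__":
    # Invented example values, for illustration only.
    jobs = [
        ("SiteA", "class-x", 100.0),
        ("SiteB", "class-x", 200.0),
        ("SiteC", "class-x", 300.0),
    ]
    for site, cls, scaled in scale_to_class_median(jobs):
        print(site, cls, scaled)
```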

18 Now over to ATLAS …

