Big Computing in High Energy Physics

Presentation transcript:

Big Computing in High Energy Physics
Second Annual Texas A&M Research Computing Symposium
David Toback
Department of Physics and Astronomy
Mitchell Institute for Fundamental Physics and Astronomy
June 2018

Outline
- Particle physics and big computing
- Big collaborations and their needs: what makes it hard? What makes it easy?
- Individual experiments: collider physics (LHC/CMS), dark matter (CDMS), phenomenology
- How we get our needs met: the Brazos cluster
- Accomplishments and status
- Lessons learned and requirements for the future
- Conclusions

Particle Physics and Big Computing
- High Energy Physics = Particle Physics: all activities within the Mitchell Institute, including both theory and experiment, as well as string theory and astronomy
- Big picture of our science: making discoveries at the interface of astronomy, cosmology, and particle physics
- Big Computing/Big Data needs come from four user groups:
  - CMS Experiment at the Large Hadron Collider (LHC)
  - Dark matter searches with the CDMS Experiment
  - High energy phenomenology
  - Other (mostly the CDF experiment at Fermilab, and the astronomy group)

What Makes It Easy? What Makes It Hard?
Advantages:
- Physics goals are well defined
- Algorithms are well defined, and much of the code is common
- We already benefit from lots of high-quality interactions with scientists around the world
- Lots of worldwide infrastructure and brain power for data storage, data management, shipping, and processing
- Massively parallel data (high throughput, not high performance)
Disadvantages:
- Need more disk space than we have (VERY big data)
- Need more computing than we have (VERY big computing)
- Need to move the data around the world for analysis, for thousands of scientists at hundreds of institutions
- Political and security issues
- Need to support lots of code we didn't write
Being able to bring, store, and process lots of events locally is a major competitive science advantage: better students, more and higher-profile results, more funding, etc. National labs provide huge resources, but they aren't enough.

CMS Experiment at the LHC (Discovery of the Higgs)
- Collider physics at CERN and Fermilab has often been among the biggest computing drivers in the world (it brought us the WWW and still drives Grid computing worldwide)
- The experiments use a 3-tiered, distributed computing model on the Open Science Grid to handle tens of petabytes of data and hundreds of millions of CPU-hours each month
- Taking data now
- At A&M: we run jobs as one of many Grid sites in an international collaboration (we are a Tier 3)
- 4 faculty, with ~20 postdocs and students

Dark Matter Searches with the CDMS Experiment
- Much smaller experiment (~100 scientists), but the same computing issues
- Only hundreds of TB, and just millions of CPU-hours per month
- 3 faculty, ~15 postdocs and students
- Today: most computing is done at A&M, most data storage at the Stanford Linear Accelerator Center (SLAC); big roles in computing project management
- Future: the experiment will start taking data again in 2020, producing tens of TB per month; event processing will be shared with national labs (SLAC, Fermilab, SNOLab, and PNNL), and we will be a Tier 2 center

Particle Theory / Phenomenology
- Particle phenomenologists at MIST do extensive theoretical calculations and VERY large simulations of experiments:
  - Collider data, to see what can be discovered at the LHC
  - Dark matter detectors, for coherent neutrino scattering experiments
  - The interface between astronomy, cosmology, and particle physics
- Just need high throughput, a few GB of memory, and a few tens of TB
- Jobs don't talk to each other (embarrassingly parallel; see the sketch below)
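To make this workload pattern concrete, here is a minimal sketch of how such an embarrassingly parallel scan might be launched, with one independent batch job per parameter point. It assumes a Slurm-style `sbatch` command and a hypothetical driver script `run_scan_point.py`; the actual scheduler, memory limits, and scripts used on Brazos are not specified in these slides.

```python
# Minimal sketch of an embarrassingly parallel parameter scan (illustrative only).
# Assumptions not from the slides: a Slurm-style "sbatch" command is available,
# and "run_scan_point.py" is a hypothetical simulation driver.
import subprocess
from pathlib import Path

MASS_POINTS = [100 + 25 * i for i in range(40)]  # hypothetical scan grid (GeV)

def write_job(mass: int, workdir: Path) -> Path:
    """Write a one-task batch script; each job is fully independent."""
    script = workdir / f"scan_m{mass}.sh"
    script.write_text(
        "#!/bin/bash\n"
        "#SBATCH --job-name=pheno_scan\n"
        "#SBATCH --mem=4G\n"          # a few GB of memory per job
        "#SBATCH --time=24:00:00\n"
        f"python run_scan_point.py --mass {mass} --out m{mass}.root\n"
    )
    return script

def main() -> None:
    workdir = Path("jobs")
    workdir.mkdir(exist_ok=True)
    for mass in MASS_POINTS:
        # Submit each point as its own job; no communication between jobs.
        subprocess.run(["sbatch", str(write_job(mass, workdir))], check=True)

if __name__ == "__main__":
    main()
```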

Overview of Brazos* and Why We Use It (and Not Somewhere Else)
- HEP is not well suited to the existing supercomputing at A&M because of experimental requirements:
  - Jobs and data come from around the world (Grid computing / Open Science Grid)
  - Firewall issues for external users
  - Automated data distribution, and local jobs regularly accessing remote databases
- Well matched to the Brazos cluster: high THROUGHPUT, not high PERFORMANCE; we just run LOTS of independent jobs on multi-TB datasets (a submission sketch follows below)
- Have gotten a big bang for our buck by cobbling together money to become a stakeholder:
  - Purchased ~$300k of computing/disk over the last few years from the Department of Energy, ARRA, NHARP, the Mitchell Institute, and the College of Science
  - 336 compute nodes / 3,992 cores in the cluster; the Institute owns 800 cores and can run on other cores opportunistically (priority over idle!)
  - ~300 TB of disk; the Institute owns about 150 TB and can use extra space if available
- Can get ~1 TB/hour from Fermilab and ~0.75 TB/hour from SLAC
- Links to sites around the world (CERN in Switzerland, DESY in Germany, CNAF and Pisa in Italy, FNAL and Caltech in the US, the UK, Korea, France, etc.)
- Accepts jobs from the Open Science Grid (OSG)
- Excellent support from the admins: Johnson runs a well-oiled machine of a team!
- More detail on how we run: http://collider.physics.tamu.edu/mitchcomp
*Special thanks to Mike Hall and Steve Johnson for their leadership and management
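As an illustration of the "LOTS of independent jobs on a multi-TB dataset" pattern, here is a minimal submission sketch. It assumes an HTCondor scheduler (the batch technology commonly used for Open Science Grid jobs); the executable `run_analysis.sh`, the dataset path, and the memory request are hypothetical placeholders rather than the actual Mitchell Institute workflow.

```python
# Minimal sketch of a high-throughput submission: one HTCondor job per input
# file. The executable, dataset path, and memory request are placeholders.
import subprocess
from pathlib import Path

SUBMIT_TEMPLATE = """\
universe       = vanilla
executable     = run_analysis.sh
arguments      = $(input_file)
output         = logs/$(Cluster).$(Process).out
error          = logs/$(Cluster).$(Process).err
log            = logs/$(Cluster).log
request_memory = 2 GB
queue input_file from filelist.txt
"""

def main() -> None:
    Path("logs").mkdir(exist_ok=True)
    # One line per file in the (multi-TB) dataset; every job is independent.
    files = sorted(Path("/data/hepx/dataset").glob("*.root"))  # hypothetical path
    Path("filelist.txt").write_text("\n".join(str(f) for f in files) + "\n")
    Path("analysis.sub").write_text(SUBMIT_TEMPLATE)
    subprocess.run(["condor_submit", "analysis.sub"], check=True)

if __name__ == "__main__":
    main()
```

The `queue input_file from filelist.txt` line creates one job per input file, which is exactly what makes this workload high throughput rather than high performance.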

Fun Plots About How Well We've Done
[Plot: Cumulative CPU-hours. More than 50M core-hours used!]
[Plot: CPU-hours per month. Picked up speed with a new operating system and new sharing rules; many months over 1.5M core-hours/month.]

More Fun Numbers: Sharing the Wealth
- Best single month: 1.95M total core-hours used
- Katrina Colletti (CDMS): 1,628,972 core-hours
- Lots of individual users: 7 over 1M integrated core-hours, 35 over 100k, and more than 70 over 1k
- Well spread over dark matter, LHC, phenomenology, and other

Students
- Half a dozen PhDs have already been earned using the system, with many more graduations coming in the next year or so
- Two of my students who helped bring up the system have gone on to careers in scientific computing:
  - Mike Mason: now a computing professional at Los Alamos
  - Vaikunth Thukral: now a computing professional at the Stanford Linear Accelerator Center (SLAC), working on LSST (astronomy)

Some Lessons Learned
- Monitoring how quickly data gets transferred can reveal bad spots in the network, both locally and around the world; we found multiple bad/flaky boxes in Dallas using perfSONAR
- Monitoring how many jobs each user has running tells you how well the batch system is doing fair-share and load balancing (see the sketch below). This is much harder than it looks, especially since some users are very "bursty": they don't know exactly when they will need to run, but when they do, they have big needs NOW (telling them to plan doesn't help)
- Experts who know both the software and the administration are a huge win; it is useful to have users go through local software experts (my students) as the first line of defense before bugging admins at A&M and elsewhere around the world
- It is hard to compete with national labs, which are better set up for collaborative work since they trust collaborations, but we can't rely on them alone:
  - Upside of working at a lab: much more collective disk and CPU, and important data stored locally
  - Downside: no one gets much of the disk or CPU (most of our users could use both, but choose to work locally if they can); need something now? Usually too bad
  - The labs also strike a different balance between security and the ability to get work done, which is difficult
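As a deliberately simple illustration of the per-user job monitoring mentioned above, the sketch below tallies running jobs by user. It assumes a Slurm-style `squeue` command; the actual batch system and fair-share tooling on Brazos are not specified in these slides.

```python
# Minimal sketch: count running jobs per user to eyeball fair-share behavior.
# Assumes a Slurm-style "squeue"; the real cluster may use a different scheduler.
import subprocess
from collections import Counter

def running_jobs_per_user() -> Counter:
    # -h: no header, -t R: running jobs only, -o %u: print only the username
    out = subprocess.run(
        ["squeue", "-h", "-t", "R", "-o", "%u"],
        capture_output=True, text=True, check=True,
    ).stdout
    return Counter(user for user in out.split() if user)

if __name__ == "__main__":
    for user, njobs in running_jobs_per_user().most_common():
        print(f"{user:15s} {njobs:6d} running jobs")
```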

Online Monitoring
- Constantly interrogate the system: are the disks up? Are jobs running? Are small data transfers working?
- Run short dummy jobs for various test cases: local jobs, automated jobs accepted from outside, and jobs that write off-site
- Automated alarms go to the "first line of defense" team, but can also be sent to the admin team; they send email as well as turning the monitoring page red (a sketch of this probe-and-alert loop follows below)
- More detail about our monitoring: http://hepx.brazos.tamu.edu/all.html
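To make the probe-and-alert loop concrete, here is a minimal sketch of such a health check. The probed paths, the dummy job, and the alert addresses are hypothetical placeholders, not the actual monitoring code behind the page linked above.

```python
# Minimal sketch of a probe-and-alert loop: check disks, run a tiny dummy job,
# and email an alarm on failure. Paths, commands, and addresses are placeholders.
import shutil
import smtplib
import subprocess
from email.message import EmailMessage

ALERT_TO = "hep-monitoring@example.edu"  # placeholder address

def disks_up(paths=("/data", "/scratch")) -> bool:
    """Crude check: the filesystems are mounted and not completely full."""
    try:
        return all(shutil.disk_usage(p).free > 0 for p in paths)
    except OSError:
        return False

def dummy_job_ok() -> bool:
    """Run a short test job and require it to finish quickly and cleanly."""
    try:
        return subprocess.run(["/bin/true"], timeout=60).returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False

def send_alarm(subject: str) -> None:
    msg = EmailMessage()
    msg["To"] = ALERT_TO
    msg["From"] = "monitor@example.edu"
    msg["Subject"] = subject
    msg.set_content("Automated monitoring alarm; see the status page for details.")
    with smtplib.SMTP("localhost") as smtp:  # assumes a local mail relay
        smtp.send_message(msg)

if __name__ == "__main__":
    checks = [("disks", disks_up()), ("dummy job", dummy_job_ok())]
    failures = [name for name, ok in checks if not ok]
    if failures:
        send_alarm("Monitoring FAILURE: " + ", ".join(failures))
```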

Conclusions
- High energy particle physicists are (and have been) leaders in Big Data/Big Computing for decades
- Local users continue this tradition at A&M by effectively using the Brazos cluster for our high-throughput needs; we are happy to help others
- Want to help us?
  - The ability to get data here quickly has ameliorated short-term problems for now, but we need much more disk and CPU
  - We have used Brazos and Ada for simulations, but Ada has limited our ability to store data and run jobs; getting Ada on the OSG might make us try again, and priority over idle would make it worth it
  - The amount of red tape to get jobs in, and to allow our non-A&M colleagues to run, has been significant (but not insurmountable) and is slowly getting better
  - Provide software support in addition to admins
- Bottom line: we have been happy with the Brazos cluster (thanks, admins!), as it helped us discover the Higgs boson, and we hope it will be well supported as our needs grow

Abstract
High energy particle physicists are (and have been) leaders in Big Data/Big Computing for decades. In this talk we focus on the big collaborations (including the Large Hadron Collider, which recently discovered the Higgs boson) and their needs, as well as how we work with the rest of our collaborators doing dark matter searches, astronomy, and large-scale theoretical calculations and simulations. We discuss our use of the Brazos cluster for the bulk of our computing needs: it has allowed us both to cope with our high-throughput requirements and to handle the issues of working with collaborators, data, and software from around the world in a grid computing environment. Finally, we present some results on how well things have worked, along with comments about what has worked and what would be helpful in the future.