Extending the ATLAS PanDA Workload Management System for New Big Data Applications
Alexei Klimentov, Brookhaven National Laboratory
XXIV International Symposium on Nuclear Electronics and Computing (NEC 2013), September 12, 2013, Varna

Main topics
- Introduction
- Large Hadron Collider at CERN
- ATLAS experiment
- ATLAS Computing Model and the Big Data experiment
- Computing challenges
- PanDA: a Workload Management System for Big Data

Enter a New Era in Fundamental Science
The Large Hadron Collider (LHC), one of the largest and truly global scientific projects ever built, marks an exciting turning point in particle physics.
- Exploration of a new energy frontier: proton-proton and heavy-ion collisions at centre-of-mass energies up to 14 TeV
- LHC ring: 27 km circumference
- Experiments around the ring: ATLAS, CMS, ALICE, LHCb, TOTEM, LHCf, MoEDAL

Proton-Proton Collisions at the LHC
The LHC delivered billions of collision events to the experiments from proton-proton and proton-lead collisions in the Run 1 period (2009-2013).
- Collisions every 50 ns, i.e. a 20 MHz crossing rate
- 1.6 x 10^11 protons per bunch at a peak luminosity of ~0.8 x 10^34 cm^-2 s^-1
- ~35 pp interactions per crossing (pile-up), giving ~10^9 pp interactions per second
- ~1600 charged particles produced in each collision: an enormous challenge for the detectors and for data collection, storage and analysis
- Raw data rate from an LHC detector: 1 PB/s, which translates into petabytes of data recorded world-wide on the Grid
The challenge of processing and analyzing the data and producing timely physics results was substantial, but in the end it was a great success.
[Figure: candidate Higgs decay to four electrons recorded by ATLAS in 2012.]

Our Task
We use experiments to inquire about what "reality" (nature) does; theory is how we describe it, and we intend to fill the gap between the two.
ATLAS physics goals:
- Explore the high-energy frontier of particle physics
- Search for new physics: the Higgs boson and its properties; physics beyond the Standard Model (SUSY, Dark Matter, extra dimensions, Dark Energy, etc.)
- Precision measurements of Standard Model parameters
"The goal is to understand in the most general; that's usually also the simplest." - A. Eddington

ATLAS Experiment at CERN
A Toroidal LHC ApparatuS (ATLAS) is one of the six particle detector experiments at the Large Hadron Collider (LHC) at CERN, and one of its two general-purpose detectors.
- The project involves more than 3000 scientists and engineers from 174 universities and laboratories in 38 countries, including more than 1200 students
- ATLAS is 44 metres long and 25 metres in diameter and weighs about 7,000 tonnes: about half as big as the Notre Dame Cathedral in Paris, and as heavy as the Eiffel Tower or a hundred 747 jets
[Photo: the ATLAS Collaboration in front of Building 40 at CERN (6 floors).]

ATLAS: a Big Data Experiment
150 million sensors deliver data…

Some numbers from ATLAS data
- Rate of events streaming out of the High-Level Trigger farm: ~400 Hz; each event has a size of the order of 1.5 MB
- About 10^7 events in total per day; with roughly 170 "physics" days per year that gives about 10^9 events/year, i.e. a few PB of "prompt" processing
- Reconstruction time per event on a standard CPU: < 30 s (on a CERN batch node), increasing with pile-up (more combinatorics in the tracking)
- Simulation of a few billion events, mostly done at computing centres outside CERN; simulation is very CPU intensive
- ~4 million lines of code (reconstruction and simulation), ~1000 software developers in ATLAS
A back-of-the-envelope check of these rates is sketched below.
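A minimal sketch (not from the talk) checking that 400 Hz and 1.5 MB/event are consistent with ~10^7 events/day and a few PB of prompt data per year; the ~30% effective live fraction standing in for the LHC duty cycle is an assumption.

```python
# Rough consistency check of the quoted ATLAS numbers (illustrative sketch only).
rate_hz = 400          # events/s streaming out of the High-Level Trigger
event_size_mb = 1.5    # MB per event
live_fraction = 0.3    # assumed effective live fraction standing in for the LHC duty cycle
physics_days = 170     # "physics" days per year

events_per_day = rate_hz * 86_400 * live_fraction         # ~1e7 events/day
events_per_year = events_per_day * physics_days           # ~2e9 events/year
prompt_volume_pb = events_per_year * event_size_mb / 1e9  # MB -> PB, a few PB/year

print(f"{events_per_day:.1e} events/day, {events_per_year:.1e} events/year, "
      f"~{prompt_volume_pb:.1f} PB of prompt data per year")
```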

Reduce the Data Volume in Stages: Higgs Selection Using the Trigger
- Level 1: not all information available, coarse granularity
- Level 2: reconstruct events; improved ability to reject events
- Level 3: high-quality reconstruction algorithms, using information from all detectors
- Output rate: 400 Hz

ATLAS High-Level Trigger (Part)
- A total of 15,000 cores in 1,500 machines
- Reduce the data volume in stages: select ONLY 'interesting' events
- Initial data rate: 40,000,000 events/s; selected and stored: 400 events/s (written to a disk buffer and then to the Grid)
We really do throw away 99.9999% of LHC data before writing it to persistent storage.

Data Analysis Chain
- Collect data from many channels on many sub-detectors (millions)
- Decide to read out everything or throw the event away (trigger)
- Build the event (put the information together), store the data
- Reconstruct the data, then analyze it
Analysis starts with the output of reconstruction:
- Apply an event selection based on reconstructed object quantities
- Estimate the efficiency of the selection and the background after selection
- Make plots; do the same with a simulation; correct data for detector effects
- Make final plots and compare data with theory
The data volume shrinks from the ~TB scale at reconstruction to the ~kB scale of the final plots.
Two main types of physics analysis at the LHC:
- Searching for new particles: searches are statistically limited, so more data is the way to improve them; if nothing new is seen, set limits on what has been excluded
- Making precision measurements: precision is often limited by systematic uncertainties; precision measurements of Standard Model parameters allow important tests of the consistency of the theory

Like looking for a single drop of water from the Geneva Jet d'Eau over 2+ days.

The ATLAS Data Challenge
- 800,000,000 proton-proton interactions per second
- 0.0002 Higgs per second
- ~150,000,000 electronic channels
Starting from this event, we are looking for this "signature": a selectivity of 1 in 10^13, like looking for 1 person in a thousand world populations, or for a needle in 20 million haystacks! (A quick check of the arithmetic follows.)
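A one-line check of the quoted selectivity, using only the two rates given on the slide:

```python
# Quick check of the quoted selectivity (illustrative arithmetic only).
pp_interactions_per_s = 8e8   # 800,000,000 proton-proton interactions per second
higgs_per_s = 2e-4            # 0.0002 Higgs per second
selectivity = higgs_per_s / pp_interactions_per_s
print(f"1 Higgs in every {1/selectivity:.0e} interactions")  # ~4e12, i.e. of order 1 in 10^13
```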

Data Volume and Data Storage
- 1 event of 1.5 MB x a rate of 400 Hz, taking into account the LHC duty cycle, gives on the order of 3 PB per year per experiment
- ATLAS total data storage: 130+ PB, distributed between O(100) computing centres
"New" physics is rare: interesting events are like a single drop from the Jet d'Eau.

Big Data in the arts and humanities
Letter of Benjamin Franklin to Lord Kames, April 11, 1767. Franklin warned a British official what would happen if the English kept trying to control the colonists by imposing taxes like the Stamp Act: they would revolt. The political, scientific and literary papers of Franklin comprise approximately 40 volumes containing approximately 100,000 documents.

Big Data in the arts and humanities
George W. Bush Presidential Library: 200 million e-mails, 4 million photographs. (A. Prescott slide)

Big Data Has Arrived at an Almost Unimaginable Scale
- Business e-mails sent per year: 3,000 PB
- Content uploaded to Facebook each year: 182 PB
- ATLAS managed data volume: 130 PB
- Google search index: 98 PB
- ATLAS annual data volume: 30 PB
- Health records: 30 PB
- LHC annual data volume: 15 PB
- YouTube: 15 PB
(Other entries in the chart: Library of Congress, climate data, Nasdaq, US census.)
Source: http://www.wired.com/magazine/2013/04/bigdata

ATLAS Computing Challenges
A lot of data in a highly distributed environment: petabytes of data to be treated and analyzed.
- The ATLAS detector generates about 1 PB of raw data per second; most of it is filtered out in real time by the trigger system, and interesting events are recorded for further reconstruction and analysis
- As of 2013 ATLAS manages ~130 PB of data, distributed world-wide to O(100) computing centres and analyzed by O(1000) physicists
- Expected rate of data influx into the ATLAS Grid: ~40 PB of data per year in 2015
- Very large international collaboration: 174 institutes and universities from 38 countries; thousands of physicists analyze the data
- ATLAS uses the grid computing paradigm to organize distributed resources
- A few years ago ATLAS started a Cloud Computing R&D project to explore virtualization and clouds, gaining experience with commercial (Amazon, Google), academic and national cloud platforms
- Now we are evaluating how high-performance computers and supercomputers can be used for data processing and analysis

ATLAS Computing Model
The LHC experiments rely on distributed computing resources: the Worldwide LHC Computing Grid (WLCG), a global solution based on Grid technologies and middleware.
- Tiered structure: Tier-0 (CERN), 11 Tier-1s, 140 Tier-2s
- Capacity: 350,000 CPU cores, 200 PB of disk space, 200 PB of tape space
- In ATLAS, sites are grouped into clouds for organizational reasons
- Possible communications: optical private network T0-T1 and T1-T1; national networks T1-T2
- Restricted communications: inter-cloud T1-T2 and inter-cloud T2-T2
(K. De slide)

ATLAS Data Volume
[Plot: ATLAS data volume on the Grid sites (multiple replicas) in TB, broken down by format into raw, simulated and derived data, reaching 131.5 PB at the end of Run 1.]
Data distribution patterns are discipline dependent: the ATLAS RAW data volume is ~3 PB/year, while the ATLAS data volume on Grid sites is 131.5 PB.

How does 3 PB become 130+ PB?
- Duplicate the raw data (not such a bad idea)
- Add a similar volume of simulated data (essential)
- Make a rich set of derived data products (some of them larger than the raw data)
- Re-create the derived data products whenever the software has been significantly improved (several times a year) and keep the old versions for quite a while
- Place up to 15§ copies of the derived data around the world so that when you "send jobs to the data" you can send them almost anywhere
- Do the math! (An illustrative calculation follows.)
§ now far fewer copies due to demand-driven temporary replication, but much more reliance on wide-area networks
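A rough illustration of how the factors above multiply up; every multiplier in this sketch is an assumption chosen only to show the mechanism, not an official ATLAS bookkeeping number.

```python
# "Do the math" sketch: how a few PB/year of RAW can turn into >100 PB on the Grid.
# All multipliers below are illustrative assumptions, not official ATLAS figures.
raw_pb_per_year = 3.0

raw_with_replica = raw_pb_per_year * 2        # duplicate the raw data
simulated = raw_pb_per_year * 1.0             # a similar volume of simulated data
derived_per_pass = raw_pb_per_year * 1.0      # derived products, comparable in size to raw
passes_kept = 2                               # old versions kept "for quite a while"
derived_replicas = 10                         # historically up to ~15 copies world-wide

derived_total = derived_per_pass * passes_kept * derived_replicas
total_per_year = raw_with_replica + simulated + derived_total
print(f"~{total_per_year:.0f} PB accumulated per year of running")  # tens of PB/year -> 130+ PB over Run 1
```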

Processing the Experiment Big Data
The simplest solution for processing LHC data is to use data affinity for the jobs: data is staged to the site where the compute resources are located, and analysis code accesses it from local, site-resident storage.
However, in a distributed computing environment we do not have enough disk space to host all our data at every Grid site, so we distribute (pre-place) our data across our sites. The popularity of data sets is difficult to predict in advance, so the computing capacity at a site might not match the demand for certain data sets.
Different approaches are being implemented:
- Dynamic and/or on-demand data replication. Dynamic: if certain data is popular on the Grid (i.e. processed and/or analyzed often), make additional copies at other Grid sites (up to 15 replicas); a sketch of this idea follows below. On demand: a user can request a local or additional data copy.
- Remote access: popular data can be accessed remotely.
Both approaches share the underlying scenario that puts the WAN between the data and the executing analysis code.
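A hypothetical sketch of the popularity-driven ("dynamic") replication idea described above; the threshold, site attributes and function names are invented for illustration and are not PanDA/DDM code.

```python
# Hypothetical sketch of popularity-driven dynamic replication (not actual PanDA/DDM code).
MAX_REPLICAS = 15          # upper bound on replicas quoted in the talk
POPULARITY_THRESHOLD = 50  # assumed number of accesses per week that marks a dataset as "popular"

def plan_extra_replicas(dataset, accesses_last_week, current_replicas, candidate_sites):
    """Return the sites where an additional copy of a popular dataset could be placed."""
    if accesses_last_week < POPULARITY_THRESHOLD or current_replicas >= MAX_REPLICAS:
        return []  # not popular enough, or already widely replicated
    # Prefer sites with free disk and idle CPU so that "jobs go to data" keeps working there.
    ranked = sorted(candidate_sites,
                    key=lambda s: (s["free_disk_tb"], s["idle_cores"]), reverse=True)
    n_new = min(MAX_REPLICAS - current_replicas, 2)  # add a couple of copies at a time
    return [site["name"] for site in ranked[:n_new]]

# Example: a heavily accessed dataset with only two replicas gets two more copies.
sites = [{"name": "SITE_A", "free_disk_tb": 500, "idle_cores": 2000},
         {"name": "SITE_B", "free_disk_tb": 80,  "idle_cores": 4000},
         {"name": "SITE_C", "free_disk_tb": 900, "idle_cores": 100}]
print(plan_extra_replicas("data12_8TeV.EXAMPLE_DATASET", accesses_last_week=120,
                          current_replicas=2, candidate_sites=sites))
```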

PanDA: Production and Data Analysis System
ATLAS computational resources are managed by the PanDA Workload Management System (WMS).
- The PanDA (Production and Data Analysis) project was started in fall 2005 by the BNL and UTA groups: an automated yet flexible workload management system which can optimally make distributed resources accessible to all users
- Adopted as the ATLAS-wide WMS in 2008 (first LHC data in 2009) for all computing applications; adopted by AMS in 2012, and in pre-production by CMS
- Through PanDA, physicists see a single computing facility that is used to run all data processing for the experiment, even though the data centers are physically scattered all over the world
- PanDA is flexible: it insulates physicists from hardware, middleware and the complexities of the underlying systems, and adapts to evolving hardware and network configurations
- Major groups of PanDA jobs: central computing tasks, automatically scheduled and executed; physics group production tasks, carried out by groups of physicists of varying size; and user analysis tasks
- PanDA now successfully manages O(10^2) sites, O(10^5) cores, O(10^8) jobs per year and O(10^3) users

PanDA Philosophy
PanDA Workload Management System design goals:
- Deliver transparency of data processing in a distributed computing environment
- Achieve a high level of automation to reduce operational effort
- Flexibility in adapting to evolving hardware, computing technologies and network configurations
- Scalable to the experiment requirements
- Support diverse and changing middleware
- Insulate the user from hardware, middleware, and all other complexities of the underlying system
- Unified system for central Monte Carlo production and user data analysis; support the custom workflows of individual physicists
- Incremental and adaptive software development

PanDA: ATLAS Workload Management System
[Architecture diagram: production managers define tasks in the task/job repository (Production DB); the bamboo submitter sends production jobs, and end-users send analysis jobs, to the PanDA server over https; the pilot scheduler (autopyfactory) submits pilots via condor-g to EGEE/EGI and OSG resources, and the ARC interface (aCT) submits to NDGF; pilots running on the worker nodes pull jobs from the PanDA server over https; data handling goes through the Data Management System (DQ2) and the Local Replica Catalogs (LFC) at the sites. A conceptual sketch of the pilot pull model follows.]
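A conceptual sketch of the pilot "pull" model shown in the diagram, assuming a placeholder server URL, simplified payload fields and stub stage-in/stage-out helpers; this is illustrative only and is not the real PanDA pilot code.

```python
# Conceptual sketch of the pilot "pull" model; endpoints, payload fields and helpers are
# placeholders invented for illustration, not the real PanDA pilot code.
import subprocess
import requests  # generic HTTP client standing in for the pilot's https layer

PANDA_SERVER = "https://pandaserver.example.org"  # placeholder URL
SITE_NAME = "EXAMPLE_SITE"

def stage_in(files):   # placeholder: the real pilot uses the site storage / DDM tools
    print("staging in", files)

def stage_out(files):  # placeholder for output upload and registration
    print("staging out", files)

def run_pilot():
    """Land on a worker node, pull jobs from the server until none match, report results."""
    while True:
        job = requests.get(f"{PANDA_SERVER}/getJob",
                           params={"site": SITE_NAME}, timeout=60).json()
        if not job:  # nothing brokered to this site: exit and release the batch slot
            break
        stage_in(job["input_files"])
        rc = subprocess.call(job["command"], shell=True)  # run the payload transformation
        stage_out(job["output_files"])
        requests.post(f"{PANDA_SERVER}/updateJob",
                      data={"jobID": job["jobID"],
                            "state": "finished" if rc == 0 else "failed"})
```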

ATLAS Distributed Computing: Number of Concurrently Running PanDA Jobs (daily average), Aug 2012 - Aug 2013
[Plot: concurrently running jobs, split into analysis and MC simulation, peaking around 150k.]
- Includes central production and data (re)processing, plus user and group analysis, on the WLCG Grid
- Running on ~100,000 cores worldwide, consuming at peak 0.2 petaflops
- Available resources are fully used/stressed

ATLAS Distributed Computing: Number of Completed PanDA Jobs (daily average), Aug 2012 - Aug 2013
[Plot: completed jobs per day by job type, with analysis around the 1M mark and a maximum of 1.7M jobs/day.]

ATLAS Distributed Computing: Data Transfer Volume in TB (weekly average), Aug 2012 - Aug 2013
[Plot: weekly transfer volume reaching ~6 PB. CERN-to-BNL data transfer time: on average 3.7 hours to export data from CERN.]

PanDA's Success
- PanDA was able to cope with the increasing LHC luminosity and ATLAS data-taking rate, and adapted to the evolution of the ATLAS computing model
- Two leading experiments in HEP and astro-particle physics (CMS and AMS) have chosen PanDA as their workload management system for data processing and analysis; ALICE is interested in evaluating PanDA for Grid MC production and Leadership Computing Facilities
- PanDA was chosen as a core component of the Common Analysis Framework by the CERN-IT/ATLAS/CMS project
- PanDA was cited in the document "Fact sheet: Big Data across the Federal Government", prepared by the Executive Office of the President of the United States, as an example of successful technology already in place at the time of the "Big Data Research and Development Initiative" announcement

Evolving PanDA for Advanced Scientific Computing
- The proposal "Next Generation Workload Management and Analysis System for Big Data" (Big PanDA) was submitted to DoE ASCR in April 2012; the DoE ASCR- and HEP-funded project started in September 2012
- Goal: generalization of PanDA as a meta-application, providing location transparency of processing and data management, for HEP and other data-intensive sciences and a wider exascale community
- Other efforts: PanDA (US ATLAS funded project); networking (Advanced Network Services)
There are three dimensions to the evolution of PanDA:
- Making PanDA available beyond ATLAS and High Energy Physics
- Extending beyond the Grid (Leadership Computing Facilities, clouds, university clusters)
- Integration of the network as a resource in workload management

"Big PanDA" work plan
- Factorizing the code: factorizing the core components of PanDA to enable adoption by a wide range of exascale scientific communities
- Extending the scope: evolving PanDA to support extreme-scale computing clouds and Leadership Computing Facilities
- Leveraging intelligent networks: integrating network services and real-time data access into the PanDA workflow
Three-year plan:
- Year 1: set up the collaboration, define algorithms and metrics
- Year 2: prototyping and implementation
- Year 3: production and operations

"Big PanDA": Factorizing the Core
Evolving the PanDA pilot:
- Until recently the pilot was ATLAS specific, with lots of code relevant only for ATLAS; to meet the needs of the Common Analysis Framework project, the pilot is being refactored
- Experiments as plug-ins: new experiment-specific classes enable better organization of the code, e.g. containing methods for how a job should be set up, metadata and site information handling, etc., that are unique to each experiment (a conceptual sketch follows); CMS experiment classes have been implemented
- Changes are being introduced gradually, to avoid affecting current production
PanDA instance on Amazon Elastic Compute Cloud (EC2):
- Port of the PanDA database back end to MySQL; VO independent; it will be used as a test bed for non-LHC experiments
- A PanDA instance with all functionalities is installed and running at EC2; the database migration from Oracle to MySQL is finished; the instance is VO independent
- LSST MC production is the first use case for the new instance
- The next step will be refactoring the PanDA monitoring package
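A hypothetical illustration of the "experiments as plug-ins" idea; the class and method names here are invented for this sketch and do not reproduce the actual refactored pilot API.

```python
# Hypothetical illustration of "experiments as plug-ins" in a refactored pilot;
# class and method names are invented for this sketch, not the actual pilot code.
class ExperimentPlugin:
    """Base class: experiment-agnostic pilot logic calls these hooks."""
    def setup_job(self, job):             # prepare the environment for the payload
        raise NotImplementedError
    def handle_metadata(self, job, out):  # experiment-specific metadata treatment
        raise NotImplementedError

class ATLASPlugin(ExperimentPlugin):
    def setup_job(self, job):
        return f"source $ATLAS_SW_AREA/setup.sh; {job['command']}"
    def handle_metadata(self, job, out):
        return {"events": out.get("nEvents", 0), "guid": out.get("GUID")}

class CMSPlugin(ExperimentPlugin):
    def setup_job(self, job):
        return f"source $CMS_SW_AREA/cmsset_default.sh; {job['command']}"
    def handle_metadata(self, job, out):
        return {"lumi_sections": out.get("lumis", [])}

PLUGINS = {"ATLAS": ATLASPlugin, "CMS": CMSPlugin}

def get_plugin(vo_name: str) -> ExperimentPlugin:
    # The generic pilot picks the experiment-specific behaviour at run time.
    return PLUGINS[vo_name]()
```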

"Big PanDA": Extending the Scope
Why are we looking for opportunistic resources?
- A more demanding LHC environment after 2014: higher energy, more complex collisions
- We plan to record, process and analyze more data, motivated by physics (Higgs coupling measurements, physics beyond the Standard Model: SUSY, Dark Matter, extra dimensions, Dark Energy, etc.)
- The demands on computing resources to accommodate the Run 2 physics needs increase; HEP now risks compromising physics because of a lack of computing resources$, which has not been true for ~20 years
Several avenues to explore in the next years: optimizing/changing our workflows, both in analysis and on the Grid; High-Performance Computing and Leadership Computing Facilities; research and commercial clouds; "ATLAS@home".
A common characteristic of opportunistic resources is that we have to be agile in how we use them:
- Quick onto them (software, data and workloads) when they become available
- Quick off them when they are about to disappear
- Robust against their disappearing under us with no notice
- Use them until they disappear: don't allow holes with unused cycles, fill them with fine-grained workloads
$ I. Bird (WLCG Leader), presentation at the HPC and supercomputing workshop for Future Science Applications (BNL, June 2013)

Spikes in demand for computational resources can significantly exceed the available ATLAS Grid resources; lack of resources slows down the pace of discovery.

"Big PanDA": Extending the Scope - Google Compute Engine (GCE) Preview Project
- Google allocated additional resources to ATLAS for free: ~5M CPU hours, 4000 cores for about 2 months (the original preview allocation was 1k cores)
- Resources are organized as an HTCondor-based PanDA queue: CentOS 6 based custom-built images with SL5 compatibility libraries to run ATLAS software; the Condor head node and proxies are at BNL; output is exported to the BNL SE
- Work on capturing the GCE setup in Puppet
- Transparent inclusion of cloud resources into the ATLAS Grid: the idea was to test long-term stability while running a cloud cluster similar in size to a Tier-2 site in ATLAS
- Intended for CPU-intensive Monte Carlo simulation workloads; planned as a production-type run, delivered to ATLAS as a resource and not as an R&D platform
- We also tested a high-performance PROOF-based analysis cluster

Running PanDA on Google Compute Engine
- We ran for about 8 weeks (2 weeks were planned for scaling up)
- Very stable running on the cloud side: GCE was rock solid; most problems we had were on the ATLAS side
- We ran computationally intensive jobs: physics event generators, fast detector simulation, full detector simulation
- Completed 458,000 jobs; generated and processed about 214M events
- Reached a throughput of 15k jobs per day; most job failures occurred during the start-up and scale-up phase
[Plot: failed and finished jobs over time.]

"Big PanDA" for Leadership Computing Facilities
Expanding PanDA from the Grid to Leadership Class Facilities (LCF) will require significant changes in our system, because each LCF is unique:
- Unique architecture and hardware
- Specialized OS, "weak" worker nodes, limited memory per worker node; code cross-compilation is typically required
- Unique job submission systems and unique security environment
- Pilot submission to a worker node is typically not feasible: a pilot/agent per supercomputer or per queue model is needed
Tests on BlueGene at BNL and ANL; Geant4 port to BG/P. PanDA project at the Oak Ridge National Laboratory LCF, Titan.

Leadership Computing Facilities: Titan (slide from Ken Read)

"Big PanDA" Project at the ORNL LCF
Get experience with all relevant aspects of the platform and workload:
- Job submission mechanism
- Job output handling
- Local storage system details
- Outside transfer details
- Security environment
- Adjusting the monitoring model
Develop an appropriate pilot/agent model for Titan. Collaboration between ANL, BNL, ORNL, SLAC, UTA and UTK; a cross-disciplinary project spanning HEP, Nuclear Physics and High-Performance Computing.

"Big PanDA"/ORNL Common Project: ATLAS PanDA Coming to the Oak Ridge Leadership Computing Facility

"Big PanDA" at ORNL: PanDA on the Oak Ridge Leadership Computing Facility
- PanDA and multicore jobs scenario (D. Oleynik)
- PanDA deployment at OLCF was discussed and agreed, including the AIMS project component
- Cyber-security issues were discussed for both the near and the longer term, in discussion with OLCF Operations
- ROOT-based analysis is being tested
- Payloads for the Titan CE (followed by discussion in ATLAS)

Adding Network Awareness to PanDA
For a decade the LHC computing model was based on the MONARC model, which assumes poor networking:
- Connections are seen as insufficient or unreliable, so data needs to be pre-placed; data comes from specific places
- Hierarchy of functionality and capability; Grid sites are organized in "clouds" in ATLAS; sites have specific functions
- Nothing can happen using remote resources while a job is running
The canonical HEP strategy is "jobs go to data":
- Data are partitioned between sites; some sites are more important (and get more important data) than others; replicas are planned
- A dataset (a collection of files produced under the same conditions and with the same software) is the unit of replication; data and replica catalogs are needed to broker jobs
- An analysis job that requires data from several sites triggers data replication and consolidation at one site, or is split into several jobs running at all those sites; a data analysis job must wait for all of its data to be present at the site
- The situation can easily degrade into a complex n-to-m matching problem
- In this static data distribution scenario there was no need to consider the network as a resource in the WMS
New networking capabilities and initiatives have appeared in the last two years (e.g. LHCONE):
- Extensive standardized network performance monitoring (perfSONAR)
- Traffic engineering capabilities: rerouting of high-impact flows onto separate infrastructure
- Intelligent networking: Virtual Network On Demand, dynamic circuits, ...
...together with dramatic changes in computing models: the strict hierarchy of connections becomes more of a mesh, with data access over the wide area and no division in functionality between sites.
We would like to benefit from these new networking capabilities and integrate networking services with PanDA: we are starting to consider the network as a resource in a similar way to CPUs and data storage.

Network as a Resource
Optimal site selection for running PanDA jobs: take network capability into account in job assignment and task brokerage.
Assigned -> activated job workflow:
- The number of assigned jobs depends on the number of running jobs: can we use the network status to adjust the rate up or down?
- Jobs are reassigned if a transfer times out (fixed duration): can knowledge of the network status help reduce the timeout?
Task brokerage currently considers:
- Free disk space at the Tier-1
- Availability of the input dataset (a set of files)
- The amount of CPU resources, i.e. the number of running jobs in the cloud (a static information system is not used)
- Downtime at the Tier-1
- Already queued tasks with equal or higher priorities (a high-priority task can jump over low-priority tasks)
Can knowledge of the network help? Can we consider the availability of the network as a resource, the way we consider storage and CPU resources? What kind of information is useful? Can we fold similar network factors into the brokerage? (A sketch of such a network-aware ranking follows.)
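A hypothetical sketch of how a network metric (e.g. a perfSONAR-derived throughput estimate) could be folded into site ranking alongside CPU, disk and data locality; the weights, field names and capping are assumptions for illustration, not the actual brokerage algorithm.

```python
# Hypothetical sketch of network-aware brokerage: fold a network metric into the site
# ranking alongside CPU, disk and data locality. Weights and field names are invented.
def rank_sites(sites, input_dataset):
    """Score candidate sites for a task; higher is better."""
    scored = []
    for site in sites:
        if site["in_downtime"]:
            continue  # skip sites in downtime, as the current brokerage already does
        has_data = input_dataset in site["datasets"]
        network_term = 1.5 * min(site["inbound_gbps"] / 10.0, 1.0)  # capped network bonus
        score = (2.0 * site["free_job_slots"] / 1000.0   # available CPU
                 + 1.0 * site["free_disk_tb"] / 100.0    # free disk
                 + (3.0 if has_data else network_term))  # local data, else rely on the network
        scored.append((score, site["name"]))
    return [name for _, name in sorted(scored, reverse=True)]

sites = [
    {"name": "T1_X", "in_downtime": False, "free_job_slots": 800, "free_disk_tb": 300,
     "datasets": {"mc12_example"}, "inbound_gbps": 9.0},
    {"name": "T2_Y", "in_downtime": False, "free_job_slots": 2500, "free_disk_tb": 50,
     "datasets": set(), "inbound_gbps": 20.0},
]
print(rank_sites(sites, "mc12_example"))  # T1_X wins: it already holds the input data
```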

Intelligent Network Services and PanDA
Example: quickly re-running a prior workflow (a bug was found in a reconstruction algorithm). Site A has enough job slots but no input data; the input data are distributed between sites B, C and D, but those sites have a backlog of jobs.
- Jobs may be sent to site A and, at the same time, virtual circuits connecting sites B, C and D to site A will be built; VNOD will make sure that such virtual circuits have sufficient bandwidth reservation
- Or the data can be accessed remotely (if connectivity between the sites is reliable, and this information is available from perfSONAR)
- In the canonical approach the data would have to be replicated to site A first
HEP computing is often described as an example of a parallel workflow. That is correct at the scale of a worker node: a worker node does not communicate with other worker nodes during job execution. But the large-scale global workflow is highly interconnected, because each job typically does not produce an end result by itself; often the data produced by one job serve as input to the next job in the workflow. PanDA manages this workflow extremely well (1M jobs/day in ATLAS). The new intelligent services will allow the needed data transport channels to be created dynamically, on demand.

Intelligent Network Services and PanDA
In BigPanDA we will use information on how much bandwidth is available and can be reserved before data movement is initiated. In the task definition the user will specify the data volume to be transferred and the deadline by which the task should be completed. The calculations of (i) how much bandwidth to reserve, (ii) when to reserve it, and (iii) along what path to reserve it will be carried out by Virtual Network On Demand (VNOD). The basic relation behind such a reservation is sketched below.
[Diagram (A. Petrosyan): measurement sources (sonar tests, perfSONAR, XRootD) feed raw and historical data into the Site Status Board; averaged network data for ATLAS sites go into the Grid Information System; averaged network data for PanDA sites go into SchedConfigDB, which is used by the brokerage / site selection module of PanDA.]
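A minimal sketch of the bandwidth-reservation arithmetic implied above; the actual path and scheduling decisions are made by VNOD, this only shows the volume-versus-deadline relation.

```python
# Illustrative sketch of the bandwidth-reservation relation (the real decisions on when and
# along which path to reserve are made by VNOD; this only shows volume vs. deadline).
def required_bandwidth_gbps(volume_tb: float, hours_to_deadline: float) -> float:
    """Minimum sustained bandwidth needed to move volume_tb before the task deadline."""
    bits = volume_tb * 8e12               # TB -> bits
    seconds = hours_to_deadline * 3600.0
    return bits / seconds / 1e9           # Gb/s

# Example: a task whose 100 TB of input must be in place within 24 hours.
print(f"{required_bandwidth_gbps(100, 24):.1f} Gb/s sustained")  # ~9.3 Gb/s
```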

Conclusions
- The performance of ATLAS Distributed Computing and Software was a great success in LHC Run 1; the challenge of processing and analyzing the data and producing timely physics results was substantial, but in the end it was a great success
- ASCR gave us a great opportunity to evolve PanDA beyond ATLAS and HEP and to start the BigPanDA project; the project team has been set up
- The work on extending PanDA to Leadership Computing Facilities has started
- Large-scale PanDA deployments on commercial clouds are already producing valuable results
- There is strong interest from several experiments (disciplines) and scientific centers in a joint project

Acknowledgements
Many thanks to J. Boyd, K. De, B. Kersevan, T. Maeno, R. Mount, P. Nilsson, D. Oleynik, S. Panitkin, A. Petrosyan, A. Prescott, K. Read and T. Wenaus for slides and materials used in this talk.