Future of Distributed Production in US Facilities Kaushik De Univ. of Texas at Arlington US ATLAS Distributed Facility Workshop, Santa Cruz November 13, 2012

Presentation transcript:

Future of Distributed Production in US Facilities Kaushik De Univ. of Texas at Arlington US ATLAS Distributed Facility Workshop, Santa Cruz November 13, 2012

Background
- Distributed production requires many different ATLAS-specific software components and applications
  - Athena and Transformations – core software
  - ProdSys – task management system
  - AMI – production tags and metadata
  - PanDA – job execution system
  - DQ2 – data management system
  - Monitoring of tasks, data and jobs
- They utilize common tools like Globus, VDT, XRootD, dCache, CVMFS, … deployed at our facilities

Overview
- Many distributed production components used in ATLAS are being upgraded after ~5 years of continuous use
- This talk focuses on their evolution:
  - Athena on many fronts: AthenaMP, Athena64, AthenaGPU, AthenaPhi, Athena event service
  - trf -> tf
  - DQ2 -> Rucio
  - ProdSys -> ProdSys II
  - PanDA -> CAF
  - PanDA -> BigData
  - New monitoring capabilities

AthenaXX
- Many future paths for Athena are driven by hardware – will not talk about them here
- Interesting topic for distributed production – event service
  - Basic unit of measurement in HEP is events – not bits, bytes or files
  - Multi-core is the new paradigm (same as the old one)
  - Caching technologies may be best optimized at event level
- Started discussions during SW week for event service
  - Client-server architecture in Athena desirable long term
  - PanDA server with Athena client will be first step to try
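The client-server idea behind an event service can be sketched as a server that hands out event ranges on request, so that the unit of work is a range of events, not a file. This is a minimal illustration only: the class name EventRangeServer and its request method are hypothetical and are not taken from PanDA or Athena.

```python
from typing import Optional, Tuple


class EventRangeServer:
    """Hypothetical dispenser of event ranges (not a PanDA or Athena API)."""

    def __init__(self, total_events: int) -> None:
        self._next = 0              # index of the first unassigned event
        self._total = total_events  # number of events in the input

    def request(self, max_events: int) -> Optional[Tuple[int, int]]:
        """Return an inclusive (first, last) event range, or None when done."""
        if self._next >= self._total:
            return None
        first = self._next
        last = min(first + max_events, self._total) - 1
        self._next = last + 1
        return first, last


# A client (for example one worker process) keeps asking until nothing is left.
server = EventRangeServer(total_events=10)
ranges = []
while True:
    event_range = server.request(max_events=4)
    if event_range is None:
        break
    ranges.append(event_range)
```

Because clients pull work in small ranges, a slow or short-lived client simply asks for fewer events. That is one reason event-level scheduling may suit multi-core and cached workflows.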

Job Transforms
- Job transforms – trf – workflow wrapper around Athena
  - All production jobs use trf
- Most major ATLAS workloads are supported
  - Including multi-step jobs
  - New workloads like overlay, FTK … are being added
- Major changes underway
  - See recent talks by Graeme Stewart
    - &resId=0&materialId=slides&confId=
    - alId=slides&confId=
- Highlights of future changes in next few slides

[Slides 6-18: figures only; no text recovered except a link fragment on slide 14: &resId=2&materialId=slides&confId=169697]

What is ProdSys
- Task management system
  - Interface to request production tasks
  - Generate jobs for execution by PanDA
  - Manage task completion
- Consisting of many scripts
  - Web interface for task request
  - Bulk task submission interface
  - Auto generation of jobs from tasks
  - Scripts for task completion
  - Interacts with AMI and DQ2
- And add-ons
  - Task-list creation scripts developed by production managers
  - Task monitoring

Current System
[Diagram; labels: Production Manager, Submits, Tasks, ProdSys, Jobs, Bamboo, PanDA, User]

What is ProdSys II
- Split ProdSys into two parts
  - DEfT – task request and task definition
    - Some components will be taken from current ProdSys
  - JeDi – dynamic job definition and task execution
    - Integrated with PanDA (replaces Bamboo)
    - Will also be the engine for user analysis tasks
- Need to work closely with Transforms & Rucio groups
  - All three systems should evolve together
- Integration with monitoring
  - Will be planned from the beginning

Future System
[Diagram; labels: Production Manager, DEfT, JeDi, PanDA, User]

DEfT
- Key features
  - Web UI for simplified interactive task request
  - Task request system based on physics requirements
  - Managers/users insulated from execution details
  - Deprecate/remove script-based task submission
  - Error checking of task requests
  - Built-in authentication and approval mechanisms
  - Creates task according to a new simplified schema

Tasks, Meta-tasks, Basket-tasks
- New extensions to the concept of task
- Task – basic unit
  - Input dataset -> Output dataset
- Meta-task – chain of tasks, which will be auto-generated
  - Manager/user makes single request
  - Successive processing steps (transforms) created by DEfT
  - Intermediate steps in chain may be specified as transient
- Basket-task – group of related tasks (e.g. same tag)
  - Manager/user can define basket of tasks
  - Manager/user makes single request for execution
- Ability to clone tasks, meta-tasks and basket-tasks
  - From previous tasks, meta-tasks and basket-tasks
  - Or from predefined templates
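One way to picture these three concepts is as small data structures. The sketch below is an assumption made for illustration, not the DEfT schema: the names Task, BasketTask, build_meta_task and clone_task, the field names, and the dataset naming rule are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Task:
    """Basic unit: one transform turning an input dataset into an output dataset."""
    transform: str
    input_dataset: str
    output_dataset: str
    transient: bool = False  # True for intermediate output that need not be kept


@dataclass(frozen=True)
class BasketTask:
    """Group of related tasks requested together, e.g. sharing a tag."""
    tag: str
    members: Tuple[Task, ...]


def build_meta_task(input_dataset: str, transforms: List[str]) -> List[Task]:
    """Expand a single request into a chain of tasks (a meta-task).

    Each step reads the previous step's output. All outputs except the
    last are marked transient. The naming rule (append the transform
    name) is a placeholder, not an ATLAS convention.
    """
    chain: List[Task] = []
    current = input_dataset
    for index, transform in enumerate(transforms):
        output = f"{current}.{transform}"
        is_last = index == len(transforms) - 1
        chain.append(Task(transform, current, output, transient=not is_last))
        current = output
    return chain


def clone_task(task: Task, **overrides) -> Task:
    """Create a new task from a previous one, changing only the given fields."""
    return replace(task, **overrides)


chain = build_meta_task("sample.evgen", ["simul", "reco"])
basket = BasketTask(tag="example_tag", members=tuple(chain))
rerun = clone_task(chain[-1], input_dataset="other.simul")
```

Here the manager supplies only the starting dataset and the list of steps. The chain, including which outputs are transient, is derived from that single request.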

JeDi
- Key features
  - JeDi will be core component of PanDA
  - Generate jobs dynamically from DEfT tasks
    - Jobs are defined to match execution environment and specified constraints (e.g. number of cores, duration, file size, dataset size…)
    - Number of events varies per job
    - Jobs are not predefined with fixed number of events – key feature
  - PanDA responsible for optimal task execution
  - PanDA responsible for task completion
    - Auto-merging if requested
  - Data will be collected by PanDA to optimize job execution and completion (expanded concept of scout jobs)
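The point that jobs are not predefined with a fixed number of events can be illustrated with a short sketch. It assumes that a per-event processing time has already been measured (for instance by scout jobs) and that each execution slot advertises a core count and a walltime limit. The names Slot, events_for_slot and define_jobs are hypothetical, and the sizing rule is a simplification, not the JeDi algorithm.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Slot:
    """Hypothetical description of where a job will run."""
    cores: int
    max_walltime_s: int


def events_for_slot(slot: Slot, sec_per_event: float) -> int:
    """Events that fit in the slot, assuming perfect scaling over cores."""
    return max(1, int(slot.cores * slot.max_walltime_s / sec_per_event))


def define_jobs(total_events: int, slots: List[Slot],
                sec_per_event: float) -> List[Tuple[int, int]]:
    """Cut a task into (first_event, n_events) jobs, one per available slot.

    Job size follows the slot, so the number of events varies per job.
    Events left over when the slots run out stay unassigned until more
    slots become available.
    """
    jobs: List[Tuple[int, int]] = []
    first = 0
    for slot in slots:
        if first >= total_events:
            break
        n_events = min(events_for_slot(slot, sec_per_event),
                       total_events - first)
        jobs.append((first, n_events))
        first += n_events
    return jobs


# One single-core slot and two 8-core slots, one hour each, 36 s per event.
slots = [Slot(1, 3600), Slot(8, 3600), Slot(8, 3600)]
jobs = define_jobs(total_events=1000, slots=slots, sec_per_event=36.0)
```

With these numbers the single-core slot receives 100 events, the first 8-core slot 800, and the last slot the remaining 100.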

Common Analysis Framework
- Task force to evaluate suitability of PanDA for an LHC common user analysis framework
- Latest report: sessionId=19&resId=1&materialId=slides&confId=

Conclusion
- Many updates/improvements planned
  - Some applications will be completely re-written
  - But based on past 5 years of LHC experience
- Plans and teams are in place
  - Will lead to better software running at facilities
  - Waiting for current LHC run to end
- Stay tuned for more