Panda Monitoring, Job Information, Performance Collection Kaushik De (UT Arlington), Torre Wenaus (BNL) OSG All Hands Consortium Meeting March 3, 2008.

Presentation transcript:

Panda Monitoring, Job Information, Performance Collection Kaushik De (UT Arlington), Torre Wenaus (BNL) OSG All Hands Consortium Meeting March 3, 2008

Torre Wenaus, BNL 2 Panda Basics
Workload management system for Production ANd Distributed Analysis
Developed by U.S. ATLAS, now adopted ATLAS-wide
- Launched 8/05 to achieve scalable data-driven WMS
- In production since 12/05
- Integrated with data management
- Pilot-based ‘CPU harvesting’
- Analysis as well as production
- Automation, monitoring, low operations manpower
- Insulate users (end- and VO-) from grid complexity, problems
- Lower entry threshold
- OSG program since 9/06
- VO-neutral
- Condor integration
- Cautious in its dependencies
- Proven components

Torre Wenaus, BNL 3 Operations Monitoring Panda Monitoring Link

Torre Wenaus, BNL 4 Workflow Monitoring

Torre Wenaus, BNL 5 Error Reporting, Tracking

Torre Wenaus, BNL 6

7 Job Information Detailed information from job specification schema

Torre Wenaus, BNL 8 User Level Monitoring - ‘My Panda’

Torre Wenaus, BNL 9 Non-ATLAS OSG Usage
Currently the CHARMM protein folding application. Soliciting others
User/VO does
- job submission, using simple http-based Python client
- pilot submission, such that pilots carry their DN identity
- queue group (tag) organization they require
BNL/ATLAS/OSG provides
- Panda service/DB infrastructure; same as used by US ATLAS
- Panda monitoring, VO customization possible
- Configured machine(s) for VO pilot submission (Madison)
- Support from ~3 FTE pool at BNL
- Future: data management and data-driven workflow
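The http-based job submission mentioned on this slide could look roughly like the sketch below. It is only an illustration: the server URL and the field names (`jobName`, `transformation`, `queueTag`) are hypothetical placeholders, not the actual Panda client interface, and the request is built but deliberately not sent.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request

# Hypothetical endpoint; the real Panda server URL is not given in the slides.
PANDA_SERVER_URL = "https://panda.example.org/server/submitJob"


def build_job_request(job_name, transformation, queue_tag):
    """Encode a job definition as an HTTP POST request (not sent here)."""
    # Field names are illustrative assumptions, not the real job schema.
    packet = {
        "jobName": job_name,
        "transformation": transformation,
        "queueTag": queue_tag,
    }
    body = urlencode(packet).encode("utf-8")
    return Request(PANDA_SERVER_URL, data=body, method="POST")


request = build_job_request("charmm-fold-001", "run_charmm.sh", "charmm")
print(request.get_method(), request.full_url)
print(parse_qs(request.data.decode("utf-8")))
```

A real client would also need to authenticate, for example with the grid certificate whose DN the slide refers to; that part is omitted here.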

Torre Wenaus, BNL 10 Usage Accounting
- Accounted by ‘Panda site’
- Corresponding to queue(s) at a physical site, or a VO
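Accounting by ‘Panda site’ amounts to grouping job records by a site label and summing usage. The sketch below shows one way to do that; the record fields (`panda_site`, `cpu_seconds`) and the site names are assumptions made for illustration, not the actual Panda DB schema.

```python
from collections import defaultdict


def usage_by_site(job_records):
    """Sum job counts and CPU hours per Panda site label."""
    totals = defaultdict(lambda: {"jobs": 0, "cpu_hours": 0.0})
    for rec in job_records:
        site = totals[rec["panda_site"]]
        site["jobs"] += 1
        site["cpu_hours"] += rec["cpu_seconds"] / 3600.0
    return dict(totals)


# Hypothetical records: a site label may stand for a physical site or a VO.
records = [
    {"panda_site": "SITE_A", "cpu_seconds": 7200},
    {"panda_site": "SITE_A", "cpu_seconds": 3600},
    {"panda_site": "VO_CHARMM", "cpu_seconds": 1800},
]
summary = usage_by_site(records)
for name, totals in sorted(summary.items()):
    print(name, totals["jobs"], totals["cpu_hours"])
```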

Torre Wenaus, BNL 11 Queue Info DB
- Site/queue status, configuration info
- Loaded from various sources: grid info services, data management configuration, Panda configuration
- Automatic control of current queue status from BDII
- Or, operator-driven queue status via http interface
- Data for intelligent brokerage: e.g. available releases, memory
- Site performance statistics gathering
- The basis of dynamic brokerage
- Dynamic pilot rate controls
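As a rough illustration of how queue information can feed brokerage, the sketch below filters queues by status, installed release, and memory. The `QueueInfo` fields, the `"online"` status value, and the queue and release names are hypothetical; the slides do not describe the actual schema or brokerage algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class QueueInfo:
    """Illustrative queue record; field names are assumptions."""
    name: str
    status: str
    memory_mb: int
    releases: set = field(default_factory=set)


def eligible_queues(queues, release, min_memory_mb):
    """Return names of queues that are online and satisfy the job's needs."""
    return [
        q.name
        for q in queues
        if q.status == "online"
        and release in q.releases
        and q.memory_mb >= min_memory_mb
    ]


queues = [
    QueueInfo("QUEUE_A", "online", 2048, {"rel-1", "rel-2"}),
    QueueInfo("QUEUE_B", "offline", 4096, {"rel-2"}),
    QueueInfo("QUEUE_C", "online", 1024, {"rel-2"}),
]
print(eligible_queues(queues, "rel-2", 2000))
```

Here only `QUEUE_A` qualifies: `QUEUE_B` is offline and `QUEUE_C` has too little memory.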

Torre Wenaus, BNL 12 Panda Monitoring Usage by OSG
- The obvious way: use Panda WMS! As CHARMM does
- But can monitoring be used independently of Panda doing the workload management?
- Some motivations:
  - Uniform OSG-wide usage monitoring/reporting/job diagnostics
  - Managing resource controls and quotas: usage data gathering; defining, applying and enforcing quotas
  - VO-specific data reporting and presentation
- Answer is yes, quite easily, if there is interest
- Through simple http-based data submission to Panda DBs
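The quota motivation above (gather usage, define quotas, enforce them) can be sketched as a simple comparison of gathered usage against per-VO limits. The function name, the CPU-hour unit, and the rule that a VO without a defined quota is unrestricted are all assumptions for illustration; the slides do not specify a quota policy.

```python
def within_quota(usage_cpu_hours, quota_cpu_hours):
    """Map each VO to True if its gathered usage is below its quota.

    VOs with no quota defined are treated as unrestricted (an assumption).
    """
    decisions = {}
    for vo, used in usage_cpu_hours.items():
        limit = quota_cpu_hours.get(vo)
        decisions[vo] = limit is None or used < limit
    return decisions


# Hypothetical usage figures and quotas, in CPU hours.
usage = {"VO_A": 120.0, "VO_B": 500.0, "VO_C": 10.0}
quotas = {"VO_A": 200.0, "VO_B": 400.0}
print(within_quota(usage, quotas))
```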

Torre Wenaus, BNL 13 Panda Monitoring Outside Panda
- Panda job submission interface is based on http’ing an info packet defining the job to the Panda server
- Could use the same interface to define a job to Panda for monitoring purposes only
- Job status updates would be sent to Panda the same way
- So current job state, job time per state etc. can be recorded
- Because the Panda DB schema remains unchanged, Panda monitoring works out of the box
- While also being customizable based on VO-specific job info sent with the job definition
- Similarly, the usage reporting and performance summarizing tools would be available
- Give us a guinea pig VO/application and we can try this out
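Recording "job time per state" from a stream of status updates can be sketched as below. The update format (a timestamp in seconds paired with a state name) and the state names themselves are illustrative assumptions, not the format actually sent to the Panda server.

```python
def time_per_state(updates):
    """Compute seconds spent in each state.

    `updates` is a time-ordered list of (timestamp_seconds, state) pairs.
    The final state has no end time yet, so it gets no duration.
    """
    durations = {}
    for (start, state), (end, _next_state) in zip(updates, updates[1:]):
        durations[state] = durations.get(state, 0) + (end - start)
    return durations


# Hypothetical sequence of status updates for one job.
updates = [(0, "defined"), (30, "activated"), (90, "running"), (690, "finished")]
current_state = updates[-1][1]
print(current_state, time_per_state(updates))
```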

Torre Wenaus, BNL 14 Site Performance Probes
- Primarily, Panda monitoring tracks/reports actual workflow of VO-specific applications
- Recent extensions include data management and release installation applications (for ATLAS)
- Further extension could be VO-specific test probes
- Special pilot jobs probe VO-specific functionalities
- Run at regular intervals at all sites
- Panda monitoring provides integrated interface to site performance through pilot probes
- On our ToDo list for ATLAS; could become a generic tool for all VOs
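One simple way to summarize probe results into a per-site performance figure is a pass rate, sketched below. The result format (site name paired with a pass/fail flag) and the site names are hypothetical; the slides describe the probes only as a planned extension.

```python
def probe_success_rate(results):
    """Return the fraction of passed probes per site.

    `results` is an iterable of (site, passed) pairs, one per probe run.
    """
    counts = {}
    for site, passed in results:
        ok, total = counts.get(site, (0, 0))
        counts[site] = (ok + int(passed), total + 1)
    return {site: ok / total for site, (ok, total) in counts.items()}


# Hypothetical probe outcomes collected at regular intervals.
results = [
    ("SITE_A", True), ("SITE_A", True), ("SITE_A", False), ("SITE_A", True),
    ("SITE_B", False), ("SITE_B", True),
]
rates = probe_success_rate(results)
print(rates)
```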