OSG Area Coordinator’s Report: Workload Management
March 25th, 2010
Maxim Potekhin, BNL
631-344-3621


2 Workload Management: Overview

Areas of activity: ITB, Engagement, upgrade of Panda monitoring

Current initiatives:
• Panda/ITB:
  • Expanded ITB job submission to as many sites as we could (see the next slide for status)
  • ATLAS: ran a new release of ATLAS software on the ITB platform for validation
• Engagement:
  • DUSEL Long Baseline Neutrino Experiment (LBNE) and Daya Bay: working out file system access on RACF (they run on the RHIC side of the BNL cluster)
  • Ongoing support of CHARMM and expansion of job submission to more sites; a new project is starting
  • SBGRID: performed the initial Panda setup, starting testing
  • UW Milwaukee: the project did not get funding, so the effort was canceled
• Upgrade of the Panda monitoring system:
  • Renewed focus in ATLAS, weekly meetings
  • VOs have expressed interest in a simpler, leaner monitoring interface, which we need to accommodate
• Misc:
  • Panda section of the “OSG Job Submission Document” in preparation

3 Workload Management: Panda/ITB

Panda ITB status:
• Standing by for setup at 3 sites; 5 sites are operational
• Allows convenient monitoring of ITB job execution from one central Web location (illustrated in the sketch after this slide)
• Access to logs and output files stored in the Panda Monitor makes the process self-documenting
• Better overall organization
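
To illustrate what monitoring "from one central Web location" means in practice, here is a minimal sketch of a script polling a job-summary page; the URL, query parameters, and site names are hypothetical placeholders, not the actual Panda Monitor interface.

    import json
    import urllib.parse
    import urllib.request

    # Hypothetical monitor endpoint; the real Panda Monitor URL and API differ.
    MONITOR_URL = "https://pandamon.example.org/jobs"

    def fetch_itb_job_summary(site):
        """Fetch the monitor's JSON summary of jobs at one ITB site."""
        query = urllib.parse.urlencode({"site": site, "format": "json"})
        with urllib.request.urlopen(MONITOR_URL + "?" + query) as resp:
            return json.load(resp)

    for site in ("ITB_SITE_A", "ITB_SITE_B"):  # placeholder site names
        summary = fetch_itb_job_summary(site)
        print(site, summary.get("running", 0), "running,",
              summary.get("finished", 0), "finished")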

4 Workload Management: Engagement

Setting up Panda for feasibility tests by SBGRID:
• Currently a test on a limited number of priority sites
• Does not require a significant diversion of effort on either the SBGRID or the OSG side
• Useful discussions at the All-Hands Meeting at FNAL
• Initial Panda setup done since then
• Testing to commence within days

CHARMM:
• Currently ramping up to production levels
• Need 250k CPU-hours for the current run (see the rough wall-time estimate after this slide)

Daya Bay/LBNE (2 groups at BNL):
• Resolving site, pilot generator, and file system setup issues (a slightly different mechanism is required in Panda)
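
For scale, the 250k CPU-hour requirement converts to wall time as follows; the concurrent slot counts are assumptions for illustration, not CHARMM's actual allocation.

    # Back-of-the-envelope: wall time for 250k CPU-hours at assumed slot counts.
    cpu_hours_needed = 250_000
    for slots in (500, 1000, 2000):  # assumed concurrent job slots
        wall_days = cpu_hours_needed / slots / 24.0
        print("%5d slots -> ~%.1f days of wall time" % (slots, wall_days))
    # Output:
    #   500 slots -> ~20.8 days
    #  1000 slots -> ~10.4 days
    #  2000 slots -> ~5.2 days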

5 Workload Management: Panda Monitoring Upgrade

Cooperation with CERN:
• Personnel changes and a renewed focus on the design of the Dashboard and on the Panda data feed into it
• Weekly meetings held
• Earlier investigation of the Google Web Toolkit indicates that a different platform needs to be chosen (too much overhead and too steep a learning curve)
• Our approach has always been that any development effort we undertake in the monitoring area will serve not only ATLAS but the other VOs using Panda as well
• Based on our previous design work, we continue with the AJAX/JSON approach for the API and view Panda monitoring as a Web service delivering data on demand to multiple clients and services
• In a nutshell, our preferred solution is Django serving JSON (see the sketch after this slide)
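
A minimal sketch of what "Django serving JSON" implies, assuming a hypothetical job-summary view; the route, field names, and placeholder numbers are illustrative, not the actual Panda monitoring schema.

    import json
    from django.http import HttpResponse

    def job_summary(request):
        """Serve a job-state summary as JSON, usable by any AJAX client."""
        # A real view would query the Panda job tables; this canned payload
        # only shows the shape of the service.
        payload = {
            "site": request.GET.get("site", "all"),
            "jobs": {"running": 120, "finished": 4512, "failed": 37},  # placeholders
        }
        return HttpResponse(json.dumps(payload), content_type="application/json")

Serving plain JSON over HTTP decouples the data service from any particular presentation, which is what lets the same monitor back the ATLAS Dashboard, VO-specific pages, and command-line clients alike.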

6 Workload Management: Overview

Accomplishments since the last report:
• Completed the Panda/ITB setup (the so-called Panda Robot) and are ready to quickly expand to more sites as they become available
• Supported the new CHARMM production run
• Started working with SBGRID: Panda setup done for the main SBGRID sites, testing commencing
• Panda section of the “OSG Job Submission Document” close to final (effort led by D. Fraser)
• Better documentation of Panda usage for non-ATLAS VOs

Issues / concerns:
• Carryover from the previous report: the Panda monitoring upgrade is likely to have a scope that will require more manpower to get it right, and the upgrade remains a priority according to the feedback we receive
• Need for a specialized setup for Daya Bay/LBNE