OSG Area Coordinator’s Report: Workload Management
April 20th, 2011
Maxim Potekhin, BNL
631-344-3621

Summary of Workload Management: Panda

WBS item : Panda Pilot interface with site authentication/authorization systems
- Resolved a number of configuration/permission issues on various sites used by Atlas; the glexec-capable pilot is ready for production

WBS item : lightweight data movement
- Completed as a gridFTP plug-in for existing clients (Daya Bay); see the sketch after this slide
- Looking toward Globus for future implementation and integration (substantial progress there; spoke with Steve Tuecke at the OSG All-Hands Meeting in March 2011); further planning will depend on demand (coordinating with Atlas)

WBS item 2.2.4: Panda Monitoring
- As indicated in the previous report, the monitoring effort was reshaped to use and enhance previously existing components rather than attempt a complete overhaul, and is now handled by Atlas personnel
- Components previously created for this work item may be reused, but for now the item needs to be struck from the WBS (or marked accordingly) and the effort formally transferred, as is already the case de facto
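The lightweight data-mover deliverable above wraps gridFTP for existing clients. As a rough illustration only (not the actual plug-in code), the sketch below shows how such a mover might shell out to the standard globus-url-copy client from Python; the function name, option choices, and example URLs are assumptions, and it presumes the Globus client tools are installed and a valid grid proxy is present.

    # Illustrative sketch of a gridFTP-based file mover (not the actual Daya Bay plug-in).
    # Assumes globus-url-copy is on the PATH and a valid grid proxy exists.
    import subprocess

    def gridftp_copy(src_url, dst_url, parallel_streams=4):
        """Copy one file between local/gridFTP URLs via globus-url-copy; raise on failure."""
        cmd = [
            'globus-url-copy',
            '-p', str(parallel_streams),   # number of parallel TCP streams
            '-cd',                         # create destination directories if needed
            src_url,
            dst_url,
        ]
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = proc.communicate()
        if proc.returncode != 0:
            raise RuntimeError('gridFTP transfer failed: %s' % err.decode('utf-8', 'replace').strip())
        return out

    # Hypothetical usage:
    # gridftp_copy('file:///data/run123.dat',
    #              'gsiftp://gridftp.example.org/dayabay/run123.dat')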

Summary of Workload Management: Panda

WBS item : support of Daya Bay/LBNE
- Currently running production of approximately 1500 jobs per week on two Condor pools at BNL
- Working on Condor issues (pilots not matching certain nodes); some tuning needed
- Actively monitoring disk space on the submission node and on the logger web service node

WBS item 2.2.6: Scalability of the Panda DB
- Significant progress in configuring and stress-testing a noSQL solution (Cassandra) on two different test facilities, one at CERN and one at BNL
- The design of the data and indexes has been modified based on observed query patterns (illustrated in the sketch after this slide)
- Deployed a three-node cluster at BNL and loaded a large portion of Panda job data (one year's worth)
- Collected metrics and, where possible, compared them with similar queries run against the production Oracle DB instance at CERN
- Analysis is still under way; the aim is to determine whether more horizontal scaling is needed, potentially with a large number of smaller nodes
- Good level of support from RACF at BNL; collaborating with Atlas personnel at CERN
- Now an official Task Force mandate for R&D within the Atlas Distributed Computing organization, with specific deliverables later in 2011 (essentially a working database)
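As an illustration of the query-pattern-driven data and index design mentioned under item 2.2.6, the sketch below (not the actual schema or loader used in the study) stores job records keyed by PandaID in one Cassandra column family and maintains a per-day "index" row in a second column family, using the era-appropriate pycassa client. The keyspace, column family, and attribute names are assumptions, and both column families are presumed to already exist on the cluster.

    # Minimal illustration (not the actual Panda/Cassandra schema) of storing job
    # records keyed by PandaID plus a per-day index row that serves the common
    # "jobs modified on day X" query without scanning the whole job store.
    import pycassa

    pool = pycassa.ConnectionPool('PandaTest', server_list=['cassandra-node1:9160'])
    jobs = pycassa.ColumnFamily(pool, 'jobs')                # row key: PandaID
    jobs_by_day = pycassa.ColumnFamily(pool, 'jobs_by_day')  # row key: YYYY-MM-DD

    def store_job(panda_id, attributes, modification_day):
        """Write one job record and register it under its modification day."""
        jobs.insert(str(panda_id), attributes)
        # Column name = PandaID, empty value: the day row acts as a manual index.
        jobs_by_day.insert(modification_day, {str(panda_id): ''})

    def jobs_for_day(day, limit=1000):
        """Fetch up to 'limit' job records modified on the given day via the index row."""
        index_row = jobs_by_day.get(day, column_count=limit)
        return [jobs.get(panda_id) for panda_id in index_row]

    store_job('1234567890', {'jobStatus': 'finished', 'computingSite': 'BNL_ATLAS_1'},
              '2011-04-20')
    print(jobs_for_day('2011-04-20'))

The benefit of the manual index row is that a time-sliced monitoring query reads a single wide row instead of scanning all job rows, which is the kind of restructuring that observed query patterns tend to drive.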