Executive Director Report, Executive Board, April 26, 2007

Presentation transcript:

Executive Director Report, Executive Board, 4/26/07. Things under control; things out of control.

Under Control: OSG is operating. We are solving problems. Usage is increasing overall. Our main stakeholders are getting the throughput they need. We are meeting some of our major milestones. We are getting a small but steady stream of new contributors. We have an active technical team both in the project and the consortium.

Throughput: Gratia reporting is steadily increasing. Is this due to more coverage or to more throughput? Not all of the reported jobs are OSG jobs; we need better matching to CEs and better filters. Daily/weekly reports help, but they are not sufficient. (Look at the Gratia daily report.) [Chart: CPU wallclock hours/week, last 6 months]
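
A minimal sketch, in Python, of the kind of CE matching and filtering mentioned above: it keeps only accounting records whose gatekeeper host appears on a list of registered OSG CEs before summing wallclock hours per VO. The record fields (Host, VOName, WallDuration) and the hostnames are illustrative assumptions, not the actual Gratia schema or site list.

```python
# Hypothetical sketch: restrict Gratia-style job records to registered OSG CEs
# before aggregating, so jobs reported through shared probes on non-OSG
# gatekeepers do not inflate the totals. Field names and hostnames are
# illustrative, not the real Gratia schema.

from collections import defaultdict

# Illustrative subset of CE hostnames registered in OSG.
OSG_CES = {
    "cmsosgce.fnal.gov",
    "osgserv01.slac.stanford.edu",
}

def is_osg_job(record):
    """Keep only records whose gatekeeper host is a registered OSG CE."""
    host = record.get("Host", "").lower()
    return any(host.endswith(ce) for ce in OSG_CES)

def wallclock_hours_by_vo(records):
    """Sum wallclock hours per VO, counting jobs on OSG CEs only."""
    totals = defaultdict(float)
    for rec in records:
        if is_osg_job(rec):
            totals[rec.get("VOName", "unknown")] += rec.get("WallDuration", 0.0) / 3600.0
    return dict(totals)

if __name__ == "__main__":
    sample = [
        {"Host": "cmsosgce.fnal.gov", "VOName": "cms", "WallDuration": 7200.0},
        {"Host": "lcg-ce.example.ch", "VOName": "cms", "WallDuration": 3600.0},  # non-OSG CE, dropped
    ]
    print(wallclock_hours_by_vo(sample))  # {'cms': 2.0}
```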

The Sites (84, including 8 SEs): AGLT2, GRASE-NYU-BENCH, NERSC-PDSF, UARK_ACE, ASGC_OSG, GRASE-RIT-GCLUSTER, NWICG-NotreDame, UC_ATLAS_MWT2, BNL_ATLAS_1, GRASE-SB-SBNYSGRID, OSG_INSTALL_TEST_2, UC_Teraport, BNL_ATLAS_2, GRASE-SU-CLUSTER04, OSG_LIGO_PSU, UCR-HEP, BU_ATLAS_Tier2, GRASE-UR-NEBULA, OU_OCHEP_SWT2, UCSandiegoOSG-Prod-SE, CIT_CMS_T2, GROW-PROD, OU_OSCER_OSG, UCSDT2, CIT_CMS_T2:srm_v1, GROW-UNI-P, OUHEP_OSG, UF-HPC, DARTMOUTH, HAMPTONU, PROD_SLAC, UFlorida-IHEPA, FIU-PG, HEPGRID_UERJ, Purdue-Lear, UFlorida-PG, FNAL_FERMIGRID, isuhep, Purdue-Physics, UIC_PHYSICS, FNAL_GPFARM, IU_ATLAS_Tier2, Purdue-RCAC, UNM_HPC, FSU-HEP, Lehigh_coral, Rice, USATLAS_dCache_at_BNL, gpnjayhawk, LTU_CCT, SMU_PHY, USCMS-FNAL-WC1-CE, GRASE-ALBANY-NYS, LTU_OSG, SPRACE, USCMS-FNAL-WC1-SE, GRASE-BINGHAMTON, MCGILL_HEP, SPRACE-SE, UTA-DPCC, GRASE-CCR-U2, MIT_CMS, STAR-Bham, UVA-HEP, GRASE-CORNELL-CTCNYSGRID, MIT_CMS:srm_v1, STAR-BNL, UVA-sunfire, GRASE-GENESEO-OSG, MWT2_IU, STAR-SAO_PAULO, UWMadisonCMS, GRASE-GENESEO-ROCKS, MWT2_UC, STAR-WSU, UWMadisonCMS-SE, GRASE-MARIST-nysgrid11, Nebraska, T2_Nebraska_Storage, UWMilwaukee, GRASE-NU-CARTMAN, NERSC-Davinci, TTU-ANTAEUS, Vanderbilt.

Daily Success/Failure reports: a summary of the job exit status (midnight to midnight, Central time). For Condor the value used is taken from 'ExitCode' and NOT from 'ExitStatus'. The report includes EGEE VOs running on the CMS Tier-1 site (shared with OSG), which is clearly one problem among many.
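
A minimal sketch of the exit-status rule described above, assuming plain Python dictionaries stand in for Condor history ClassAds: the job's return value is read from 'ExitCode', and 'ExitStatus' is deliberately ignored. The ExitBySignal handling is an added assumption about how signalled jobs would be counted, not something stated on the slide.

```python
# Hypothetical sketch of the daily-report classification: the job outcome is
# taken from the Condor 'ExitCode' attribute, NOT from 'ExitStatus'. The input
# dictionaries stand in for Condor history ClassAds.

def classify(job):
    """Return 'success', 'failure', or 'unknown' for one job record."""
    if job.get("ExitBySignal", False):
        return "failure"                    # killed by a signal, never a success
    exit_code = job.get("ExitCode")         # the user job's return value
    if exit_code is None:
        return "unknown"                    # attribute missing from the record
    return "success" if exit_code == 0 else "failure"

def daily_summary(jobs):
    """Count jobs per outcome for one midnight-to-midnight window."""
    counts = {"success": 0, "failure": 0, "unknown": 0}
    for job in jobs:
        counts[classify(job)] += 1
    return counts

if __name__ == "__main__":
    jobs = [
        {"ExitCode": 0, "ExitBySignal": False},
        {"ExitCode": 1, "ExitBySignal": False},
        {"ExitBySignal": True},
    ]
    print(daily_summary(jobs))  # {'success': 1, 'failure': 2, 'unknown': 0}
```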

Usage by owners: daily report; we need to make this correct.

Milestones from Feb 2007 at JOT, updated for the Apr EB (WBS / Name / Date):
- Define Operational Metrics for Year 1 (1/1)
- Release Security Plan (1/1)
- Release OSG (/27/)
- Production use of OSG by one additional science community (3/31)
- OSG-TeraGrid software using common Globus and Condor releases (4/2)
- Complete deployment and registration of 15 Storage Resources using srm/dCache from VDT (6/10)
- Release OSG (/15/)
- Report on Operational Metrics for Year 1 (9/1)
- Production use of OSG by a 2nd additional science community (9/28/07)
Status annotations on the slide: √; draft under review; provisioning and final testing in progress; √; ITB starting tests now; SRM V1.1 *. Numbered notes: 1) Still in draft. 2) Not yet met? 3) Is this true? 4) How do we meet this?

Upcoming Science Milestones: the metrics we are working on.
- Support for CMS job throughput of 50K jobs/day (Würthwein, 4/1/07)
- Support for CMS job throughput of 75K jobs/day (Würthwein, 5/1/07)
- Support for CMS job throughput of 100K jobs/day (Würthwein, 6/1/07)
- CMS site validations using the SAM infrastructure (Würthwein, 6/1/07)
- Support for ATLAS throughput of 20-30K (Jim Shank, 7/3/07)
- LIGO: Binary Inspiral Analysis runs on OSG (Warren Anderson, 6/15/07)
- ATLAS: Validation of OSG infrastructure and extensions in full-chain production challenge (Jim Shank, 6/15/07)
- CMS: Full support for opportunistic use of OSG resources for MC production and data processing (Lothar Bauerdick, 6/15/07)
- STAR: Migration of >80% of simulation to OSG (Jerome Lauret, 6/15/07)
- CDF: Full use of OSG for MC (Ashutosh Kotwal, 6/15/07)
- D0: Full use of OSG sites for D0 reprocessing in 2007, in progress (Brad Abbott, 6/15/07)
- SDSS: Fit all spectra beyond data release 5, QSO fitting project, plus now DES simulations/data transfer (Chris Stoughton, 6/15/07)

LIGO: Binary Inspiral Analysis runs on OSG. The first two milestones have already been met. What needs to happen to meet this one? Does it include the analysis being accepted by the LIGO Scientific Collaboration?

Ready to be asked about success?
- ATLAS: Validation of OSG infrastructure and extensions in full-chain production challenge.
- CMS: Full support for opportunistic use of OSG resources for MC production and data processing.
- CDF: Full use of OSG for MC.
- SDSS: Fit all spectra beyond data release 5, QSO fitting project (plus now DES simulations/data transfer).

STAR: Migration of >80% of simulation to OSG. The Troubleshooting team and STAR have put a lot of effort into solving the end-to-end data problems between LBNL and BNL. The STAR SRM endpoint inside the BNL firewall is causing problems that are not getting solved; can STAR piggyback on the ATLAS SRM endpoints outside the firewall? The activity to get STAR applications running on non-DRM SEs succeeded on FermiGrid, with quite some effort; we will write this up. A phone meeting last week with Jerome laid out a plan for STAR to integrate their grid scheduler with the simulation application by mid-May.
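
As an illustration of the firewall question above, a quick reachability probe of an SRM endpoint's service port from a remote host can show whether the connection is blocked; this is only a sketch, and the hostname and port below are placeholders, not actual STAR or ATLAS endpoints.

```python
# Hypothetical sketch: probe whether an SRM endpoint's service port is
# reachable from a remote host, as a first check on firewall blocking.
# The hostname and port are placeholders, not actual STAR or ATLAS endpoints.

import socket

def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    endpoint = ("srm.example.bnl.gov", 8443)   # placeholder SRM host and port
    state = "reachable" if port_open(*endpoint) else "blocked or unreachable"
    print(f"{endpoint[0]}:{endpoint[1]} is {state}")
```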

D0: Full use of OSG sites for D0 reprocessing in 2007. With continued focus, D0 estimates they will finish reprocessing at the end of May. While they are getting much better throughput from OSG sites than expected, the D0 end-to-end infrastructure is still not able to reach full efficiency; analysis of failures, and effort to mitigate them, is lacking.

Getting VOs to the point where they can rely on successfully running on sites that report the VO as supported. Engage: this is a perfect example of how "disconnected" our infrastructure is, when getting a VO and a site connected takes "high powers" to be involved (Miron). We must make this a priority now and assign an "owner" of the problem whose only job is to solve it. Who and what are we going to drop?

The Sites (84, including 8 SEs): repeat of the site list shown earlier.

Accuracy and correctness of accounting information. Why can't we fix the "unknowns" and the errors in VO names ("uscms-cms", "engagement-engage", "LIGO-ligo")? We need follow-up on the daily and weekly reports, and easier access to longer-term throughput reports. Do we need a full-day, sub-group review to focus on the Gratia information and deliverables? Do we need Philippe to report to the ET meetings weekly?
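
One plausible way to attack the VO-name errors listed above is a normalization pass over the accounting records before they are aggregated; this is a sketch only, and the alias table and the handling of empty names are illustrative assumptions, not the actual Gratia configuration.

```python
# Hypothetical sketch: fold reported VO-name variants onto canonical names so
# that "uscms-cms", "engagement-engage" and "LIGO-ligo" are accounted under one
# VO, and empty values become an explicit "unknown". The alias table is
# illustrative, not the actual Gratia configuration.

VO_ALIASES = {
    "uscms-cms": "cms",
    "uscms": "cms",
    "engagement-engage": "engage",
    "engagement": "engage",
    "ligo-ligo": "ligo",
}

def normalize_vo(raw_name):
    """Map a reported VO name onto its canonical form."""
    name = (raw_name or "").strip().lower()
    if not name:
        return "unknown"
    return VO_ALIASES.get(name, name)

if __name__ == "__main__":
    for reported in ["uscms-cms", "LIGO-ligo", "", "atlas"]:
        print(f"{reported!r:>15} -> {normalize_vo(reported)}")
```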

Sum of requests, OSG's actual capacity, and planning to meet the demands:
- D0: Continues to need ~1500 CPUs until the end of May. Just need to keep them stable.
- Engage: 500 CPU-hours a day, or 500 job slots continuously when needed. See next slide.
- CompBioGrid: From the Ops meeting minutes: 13 sites that they can access with jobmanager-fork. 300 batch slots at peak is their goal. They will send the GOC a list of sites where they can currently run successfully. It will be two to three months before they get up to a production level and start dealing with issues.
- CMS: Ramping up production now. Focused peak use planned for July and August. Will include opportunistic use of non-CMS sites. What are the actual needs? How will this affect use of CMS sites by other VOs?
- ATLAS: Jim Shank says all ATLAS sites will be fully occupied by ATLAS for the foreseeable future.
- CHARMM: Need to track expected vs. delivered throughput; it can't be unbounded.
We need to keep this table up to date, make expectations and constraints clear to the VOs, and review it at ET meetings with more quantitative information, as sketched below. Will I do this?
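
A minimal sketch of the kind of quantitative comparison suggested above: requested versus delivered capacity per VO, worst ratio first. The D0 and Engage request figures echo the table; the delivered figures and the CompBioGrid request are invented placeholders for the example.

```python
# Hypothetical sketch: compare each VO's requested capacity with what was
# delivered over a review period, worst ratio first, to make the ET-meeting
# review quantitative. All figures are illustrative placeholders.

requests = {            # requested job slots (or CPUs) per VO
    "D0": 1500,
    "Engage": 500,
    "CompBioGrid": 300,
}

delivered = {           # average slots actually delivered (invented numbers)
    "D0": 1350,
    "Engage": 220,
    "CompBioGrid": 40,
}

def review(requests, delivered):
    """Print per-VO delivered/requested ratios, worst first."""
    rows = []
    for vo, asked in requests.items():
        got = delivered.get(vo, 0)
        ratio = got / asked if asked else 0.0
        rows.append((ratio, vo, asked, got))
    for ratio, vo, asked, got in sorted(rows):
        print(f"{vo:12} requested {asked:5d}  delivered {got:5d}  ({ratio:4.0%})")

if __name__ == "__main__":
    review(requests, delivered)
```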

Knowing and dealing with policies. We are starting to gather policies from sites for D0 now. The plan is to take small steps. Does this give the effort enough bandwidth and priority?

Other things?