Executive Director Report
Ruth Pordes
OSG Council Meeting, August 5th, 2008
Summary
We have met the performance and functionality needs for initial LHC data taking.
We have released the baseline production version of the OSG software.
We have an energetic, solid team: committed, open and collaborative.
LIGO, D0 and Engage have made progress in production usage, efficiency and usability.
The Structural Biology Grid (SBGrid) is a potentially significant new campus and user entrant (we need to understand its expectations).
Outline
Accomplishments, each one generating a question
Interactions with the Joint Oversight Team
Security Report
Deliverables to CCRC’08, the ATLAS Full Dress Rehearsal (FDR-2) and the CMS Computing & Analysis Challenge (CSA08) were met.
Can the OSG-specific contributions be separated out? This is currently “hard”.
Up to 120 TB/day of transfers over a month on >100 links, many with a Tier-2 end-point.
Robust job management and execution on Tier-2s, with job throughputs of >40,000/day.
End-to-End Data Transfer Throughput (ATLAS): 1.0 GByte/sec
Scaled to ~450 users across ~30 Tier-2s.
Simultaneous with end-to-end cosmic ray running.
Simultaneous simulation production of >10 M events/week.
Mix of Production and Analysis Jobs (CMS): 80,000/day
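The transfer figures above mix daily volumes (TB/day) with instantaneous rates (GByte/sec). Below is a minimal Python sketch for converting between the two. It assumes decimal units (1 TB = 1,000 GB) and averages over a full 24-hour day. The function names are illustrative only.

```python
# Average-rate conversions for the transfer and job figures on this slide.
# Assumptions: decimal units (1 TB = 1,000 GB) and a full 24-hour day.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def tb_per_day_to_gb_per_s(tb_per_day: float) -> float:
    """Average rate in GB/s for a daily volume given in TB/day."""
    return tb_per_day * 1000 / SECONDS_PER_DAY


def gb_per_s_to_tb_per_day(gb_per_s: float) -> float:
    """Daily volume in TB/day for a rate sustained over 24 hours."""
    return gb_per_s * SECONDS_PER_DAY / 1000


def per_day_to_per_second(count_per_day: float) -> float:
    """Average number of events (e.g. jobs) per second for a daily count."""
    return count_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"120 TB/day      ~ {tb_per_day_to_gb_per_s(120):.2f} GB/s")
    print(f"1.0 GB/s        ~ {gb_per_s_to_tb_per_day(1.0):.1f} TB/day")
    print(f"80,000 jobs/day ~ {per_day_to_per_second(80_000):.2f} jobs/s")
```

Under these assumptions, 120 TB/day averages about 1.39 GB/s, while 1.0 GByte/sec sustained for a full day is 86.4 TB/day.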
LIGO usage is increasing following a focus on WS-GRAM deployment/testing for …
Note: the majority of use migrated from Nebraska to BNL to Purdue. Why?
Chemistry: Andrew Schultz, University at Buffalo
An application to model the virial coefficients of water.
A research highlight/publication is anticipated this summer.
78,000 jobs consuming an average of about 100 CPU days/day over 6 months.
The user is registered to the NYSGrid VO, so how do we know this is chemistry? Would running as part of the Engage VO help?
Computational Biology: Protein Folding/Structure
An assistant professor and 2 students running fairly steadily.
~620,000 CPU wallclock hours in 2008 (an average of 120 CPU days/day for ~210 days).
A research highlight is expected in the next few months.
When does a set of individuals become a community/VO?
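The chemistry and protein-folding slides quote usage both as average load (CPU days/day) and as totals (jobs, CPU hours). Below is a minimal Python sketch that cross-checks the two. It assumes a constant average load and takes "6 months" as 182 days. The helper names are hypothetical.

```python
# Consistency check between CPU-days/day averages and total CPU hours.
# Assumption: "6 months" is taken as 182 days; helper names are illustrative.
HOURS_PER_DAY = 24


def total_cpu_hours(avg_cpu_days_per_day: float, days: float) -> float:
    """CPU wallclock hours delivered by a constant average load over `days`."""
    return avg_cpu_days_per_day * days * HOURS_PER_DAY


def average_cpu_days_per_day(cpu_hours: float, days: float) -> float:
    """Average load (CPU days per day) implied by a total of CPU hours."""
    return cpu_hours / HOURS_PER_DAY / days


if __name__ == "__main__":
    # Protein folding: 120 CPU days/day for ~210 days.
    print(total_cpu_hours(120, 210))  # 604800
    # Working backwards from the quoted ~620,000 hours over ~210 days.
    print(round(average_cpu_days_per_day(620_000, 210), 1))  # 123.0
    # Chemistry: ~100 CPU days/day over 6 months (~182 days assumed).
    print(total_cpu_hours(100, 182))  # 436800
```

At 120 CPU days/day for 210 days, the total is 604,800 hours, which agrees with the quoted ~620,000. Read the other way, the quoted total implies about 123 CPU days/day.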
Engage use is growing, and use continues to be cyclic. More sites were used in 2008 than in 2007.

Site                 CPU Days
MIT_CMS                 8,344
USCMS-FNAL-WC1-CE       4,201
UCSDT2                  1,975
UCR-HEP                 1,749
FNAL_CDFOSG_2           1,745
BNL_ATLAS_1             1,625
FNAL_GPFARM             1,437
FNAL_CDFOSG_3           1,286
CIT_CMS_T2              1,218
Purdue-RCAC             1,210
FNAL_DZEROOSG_2         1,209
FNAL_DZEROOSG_1         1,109
FNAL_GPGRID_1           1,047
NERSC-Jacquard            958
TTU-ANTAEUS               658
FNAL_CDFOSG_4             588
UCLA_Saxon_Tier3          441
Nebraska                  380
GLOW                      280
SBGrid-Harvard-East       220
OCI-NSF                   185
UFlorida-HPC              154
Clemson-IT                121
UWMilwaukee               105
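This sketch totals the Engage CPU-days table on this slide and ranks each site's share. It is a minimal Python illustration that covers only the sites listed, so it assumes nothing about sites the table may omit.

```python
# CPU days per site, transcribed from the Engage table on this slide.
from typing import Dict, List, Tuple

ENGAGE_CPU_DAYS: Dict[str, int] = {
    "MIT_CMS": 8344, "USCMS-FNAL-WC1-CE": 4201, "UCSDT2": 1975,
    "UCR-HEP": 1749, "FNAL_CDFOSG_2": 1745, "BNL_ATLAS_1": 1625,
    "FNAL_GPFARM": 1437, "FNAL_CDFOSG_3": 1286, "CIT_CMS_T2": 1218,
    "Purdue-RCAC": 1210, "FNAL_DZEROOSG_2": 1209, "FNAL_DZEROOSG_1": 1109,
    "FNAL_GPGRID_1": 1047, "NERSC-Jacquard": 958, "TTU-ANTAEUS": 658,
    "FNAL_CDFOSG_4": 588, "UCLA_Saxon_Tier3": 441, "Nebraska": 380,
    "GLOW": 280, "SBGrid-Harvard-East": 220, "OCI-NSF": 185,
    "UFlorida-HPC": 154, "Clemson-IT": 121, "UWMilwaukee": 105,
}


def site_shares(usage: Dict[str, int]) -> List[Tuple[str, float]]:
    """Return (site, fraction of total CPU days), largest share first."""
    total = sum(usage.values())
    return sorted(
        ((site, days / total) for site, days in usage.items()),
        key=lambda item: item[1],
        reverse=True,
    )


if __name__ == "__main__":
    total = sum(ENGAGE_CPU_DAYS.values())
    print(f"{len(ENGAGE_CPU_DAYS)} sites, {total:,} CPU days in total")
    for site, fraction in site_shares(ENGAGE_CPU_DAYS)[:5]:
        print(f"{site:<20} {fraction:6.1%}")
```

The 24 listed sites total 32,245 CPU days, and MIT_CMS alone accounts for about 26% of that.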
Request from D0 for access to local storage resources to improve efficiency
The Council asked the ET to help.
With OSG 1.0, opportunistic and reserved storage is more widely supported.
US ATLAS and US CMS Tier-2s offered to let D0 use up to 1 TB/site across 3 ATLAS and 3 CMS sites.
With help from the OSG site admins, the users group and the storage team, D0 throughput increased from 3.6 M to 5 M and then 4 M events/week over the past 2 weeks.
Efficiency is >50% on some days but can drop to ~28% (as it did last weekend).
D0 and OSG continue to track down specific problems.
D0 is gathering efficiency plots over time: d0.fnal.gov/~snow/jobscan/effs.html
How should OSG organize and prioritize ongoing support/help for robust, effective use by “grown-up” VOs?
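The D0 slide quotes efficiency figures without defining the metric. A common definition for grid jobs is aggregate CPU time divided by wallclock time. That definition is assumed here, and the source does not confirm it. Below is a minimal Python sketch using hypothetical job records.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class JobRecord:
    """Hypothetical per-job accounting record (field names are illustrative)."""
    cpu_seconds: float   # CPU time actually consumed by the job
    wall_seconds: float  # wallclock time the job held its batch slot


def cpu_efficiency(jobs: Iterable[JobRecord]) -> float:
    """Aggregate CPU efficiency: total CPU time / total wallclock time.

    A job that sits idle waiting for input (e.g. from remote storage)
    accumulates wallclock time without CPU time, which lowers the ratio.
    """
    jobs = list(jobs)
    wall = sum(j.wall_seconds for j in jobs)
    if wall == 0:
        return 0.0
    return sum(j.cpu_seconds for j in jobs) / wall


if __name__ == "__main__":
    # One job busy most of the hour, one mostly waiting on data.
    day = [JobRecord(3000, 3600), JobRecord(600, 3600)]
    print(f"{cpu_efficiency(day):.0%}")  # 50%
```

Under this definition, local storage would be expected to raise efficiency because jobs spend less wallclock time waiting on data.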
Recorded use: ~15,000 CPU-weeks/week.
Is it a lack of availability, ability or need that keeps the cycles freed up by reduced local LHC use from being used?
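A CPU-week is one CPU used for one week, so a rate quoted in CPU-weeks per week reduces to an average number of continuously busy CPUs. Below is a worked check, assuming "CPU-week" means one fully used CPU (core) for seven days:

```latex
\[
\bar{N}_{\mathrm{CPU}} \approx \frac{15{,}000\ \text{CPU-weeks}}{1\ \text{week}} = 15{,}000\ \text{CPUs},
\qquad
15{,}000\ \tfrac{\text{CPU-weeks}}{\text{week}} \times 7\ \tfrac{\text{days}}{\text{week}} = 105{,}000\ \tfrac{\text{CPU-days}}{\text{week}}
\]
```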
Major software release in June 2008: OSG 1.0
We expect to maintain this as the main baseline software version.
Many sites upgraded quickly: a sign of confidence or of timeliness?
Testing included configuration and simple tests of opportunistic storage (dCache, BeStMan).
The release enables US LHC Tier-2 site availability reporting to the WLCG, with the first official reports for July.
Infrastructure Accomplishments
Significantly improved local site monitoring tools (RSV).
A well-received set of storage tools was released for administrators.
Initial use of opportunistic storage by VOs.
Improved administrative information capabilities (OIM).
Support for additional LHC Tier-3s:
US CMS: UCLA, UMD, FlTech, UIC
US ATLAS: UIUC, UWisc-Madison, Iowa State (earlier?)
Production Information (OIM, RSV)
Number of production CEs: 86
Number of production SRM v2 SEs: 17 (4 BeStMan)
Jump in the number of support tickets for OSG 1.0 configuration: we need to ensure all sites are configured correctly for information, monitoring and accounting.
Interactions with the Joint Oversight Team
Summary of Interactions with the JOT
New program managers:
Don Petravick, DOE HEP
Susan Turnbull, DOE ASCR
Visit by the core Executive (Miron, Chander, Ruth) to DOE and NSF in June:
Fred Johnson (ASCR) and Susan in the morning.
Moishe, Marv and Susan in the afternoon.
Action item: institute regular phone meetings between the JOT, OSG management, and US ATLAS and US CMS S&C management.
First EJOT phone meeting
Don: “Goal to understand the reliance of the LHC experiments on the OSG, and to understand the status of that reliance.”
Discussed the kinds of items to include in a future work agenda:
Experience at Run II shows that more effort is required when experiments start up.
Another meeting is planned in 2-3 months.
Security Report
Mine Altunay, FNAL, OSG Security Officer
For the OSG Security Team: Doug Olson (Deputy Security Officer, LBNL), Jim Basney (NCSA), Ron Cudzewicz (FNAL)
Change in the OSG-JSPG relationship
OSG has no mandatory acceptance of JSPG policies.
We contribute to the working groups to make policies as uniform as reasonable, and we give feedback.
This has been agreed to by OSG, EGEE and WLCG.
We work with US LHC S&C and WLCG on OSG policies, and to communicate (and agree on) differences from the policies recommended by JSPG.
Contact: Dave Kelsey, WLCG Security Coordinator
List of OSG Policies
VO AUP Template: OSG has a template AUP policy that member VOs are required to fill out.
VO User Registration and Management Template: OSG has a template policy that member VOs can fill out.
OSG Security Incident Handling and Response Plan: The JSPG’s incident handling and response policy is based on an earlier OSG policy, so the two policies are compatible. Work is needed to address cross-grid coordination. Also, EGEE has a separate policy/procedure on software vulnerabilities, while OSG does not.
Grid Acceptable Use Policy: The OSG and JSPG policies are identical.
Approval of Certificate Authorities: The OSG-specific policy complies with the JSPG policy.
Service Agreement: An OSG-specific policy with no equivalent JSPG policy.
Policy on Grid Pilot Jobs: The JSPG policy is based on a Fermilab policy, which will provide the basis for the OSG policy. OSG has not produced a specific document yet.
Privacy Policy: This is an OSG-only policy. It has been sent to the OSG EB, which has provided feedback.
VO Operations Policy: OSG has sent comments in response to the JSPG’s final call. There is no OSG-specific version of this policy.
Site Operations Policy: OSG approved the JSPG policy. OSG has not produced an OSG-specific policy yet.
Traceability and Logging Policy: OSG has sent comments on the JSPG policy. OSG does not have any formally approved auditing requirements beyond those in the AUPs and accounting.
VO Registration Policy and Site Registration Policy: OSG has not yet reviewed these two policies. JSPG will start working on these documents soon, and OSG will send comments later.
We are working with VOs on the appropriate detail and contents.
Recent Iranian CA discussions and questions
The OSG EB (Kent and Bill) raised the issue of the Export Administration Regulations (EAR) with respect to the Iranian CA.
With help from FNAL security, we determined that site size determines the threshold.
We did a rough survey of OSG sites: no university sites are near this threshold.
Such policies were, are and remain the responsibility of the site.
We will work to understand the issue for communication purposes, but OSG has no responsibility here.
For clarity, should we add a sentence to our AUP stating that OSG does not collect citizenship information?
Recent Incidents/Alerts
A root-level compromise at one site; no grid incident was detected.
Take-home messages: this was a good test for VO security officers, and it is very important for VOs to identify their security officers.
The alert at a 2nd site was a miscommunication caused by a system administrator’s misconfiguration.
The EGEE security challenge resulted in a poor score for USCMS.
The confusion over which policies to follow is being addressed: it is now agreed with the EGEE security officer that OSG will be involved in the next challenge.
Whether this will be an EGEE or a WLCG challenge is still to be clarified.
Standing Issue: Partner grids with multiple VOs and sub-VOs
Identifying which VO/application is the job submitter: we currently report the partner grid as the VO.
How can we provide finer granularity?
How can we provide a VO usage policy?
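One possible route to finer granularity is to attribute usage by VOMS FQAN instead of by partner grid. This rests on two assumptions. First, partner-grid jobs carry VOMS proxies whose FQANs encode sub-groups (`/vo/group/.../Role=.../Capability=...`). Second, accounting records keep the FQAN. The source confirms neither, and the `chemistry` sub-group under NYSGrid is hypothetical. Below is a minimal Python sketch, not a description of current OSG accounting.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def parse_fqan(fqan: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split a VOMS FQAN into (vo, sub-group path, role).

    '/nysgrid/chemistry/Role=NULL/Capability=NULL'
        -> ('nysgrid', '/nysgrid/chemistry', None)
    """
    parts = [p for p in fqan.split("/") if p]
    groups = [p for p in parts if "=" not in p]          # VO and sub-groups
    attrs = dict(p.split("=", 1) for p in parts if "=" in p)  # Role, Capability
    vo = groups[0] if groups else ""
    group = "/" + "/".join(groups) if len(groups) > 1 else None
    role = attrs.get("Role")
    if role == "NULL":  # VOMS writes NULL when no role is set
        role = None
    return vo, group, role


def usage_by_group(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum CPU hours per sub-group, falling back to the VO if there is none."""
    totals: Dict[str, float] = defaultdict(float)
    for fqan, cpu_hours in records:
        vo, group, _role = parse_fqan(fqan)
        totals[group or "/" + vo] += cpu_hours
    return dict(totals)


if __name__ == "__main__":
    records = [
        ("/nysgrid/chemistry/Role=NULL/Capability=NULL", 10.0),  # hypothetical
        ("/nysgrid/chemistry", 5.0),
        ("/nysgrid", 2.0),
    ]
    print(usage_by_group(records))  # {'/nysgrid/chemistry': 15.0, '/nysgrid': 2.0}
```

Falling back to the VO keeps jobs without sub-group attributes visible, instead of dropping them from the report.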