OSG Security Program Review
OSG Security Team
M. Altunay (FNAL, OSG Security Officer), D. Olson (LBNL), Ron Cudzewicz (FNAL), J. Basney (NCSA), Anand Padmanabhan (NCSA), Aashish Sharma (NCSA)
February 2009
Review Charge Questions
- Purpose: to assess the effectiveness of the OSG Security Program
- Questions to the reviewers (and, in turn, to the OSG security team):
  - Assess possible loss of scientific opportunity due to security issues.
  - Assess our assertion that we strive to maintain an open, self-administered environment that does not introduce “high cost” risk.
  - Give us advice on areas of our program that would benefit from change, clarification, or a rethink.
  - Is the effort assigned to the OSG security program appropriate given the impact of potential disruptions?
  - Is the OSG security program scoped appropriately and evolving in the right direction, both operationally and technically?
  - Are the principles of the OSG security program appropriate in terms of the division of responsibilities and authorities between the members of the OSG Consortium?
02/09 OSG Security Review
OSG Security Program
Based on OSG principles:
- Autonomy of OSG members
  - OSG security does not impose mandates on sites or VOs
- Openness, diversity
  - Anyone is welcome to be a member
- OSG as middleman
  - We work as the middleman between the site and the VO
- Integrated Security Management
  - Everyone contributes
- Self-protecting software / minimum impact
- Symmetry between members
- Mutual benefit
  - Benefit to the site, to the VO, to the software providers
Autonomy
- VOs (users), sites, and software providers are autonomous
  - They are responsible for their own decisions and policies
- We make recommendations, educate, and raise awareness
- We do NOT make decisions for our members
  - Example: each site decides which CAs to install; sites “cooperate” during incident response
- OSG is bottom-up, NOT top-down
- Advantages:
  - Diverse members
  - Members feel embraced and part of something larger
- Disadvantages:
  - Complexity!
  - Where to draw the line? (covered later, under policies)
Principle: Integrated Security Management
- Everyone contributes to security, NOT only the security team!
- Incorporate security into everyone’s duties
  - Site coordination: tell sites about security practices and policies
  - Engagement: help new users learn and use security practices
  - Operations: secure communication channels with VOs and sites, security tickets, monitoring to help the security team
  - Software configuration: have secure configurations by default
- Consortium members
  - VOs: responsible for VO services security and users’ practices
  - Sites: responsible for site security
- Security team is responsible for
  - Reviewing, evaluating, and making recommendations on:
    - OSG staff, VO and site staff, users
    - Software and infrastructure
  - Operational security
  - Training, educating, and disseminating security practices
Realization of Integrated Security Mgmt
- OSG Security management plan (Doc id#389)
  - Describes each core OSG area’s security responsibilities
  - Contains Security Test & Evaluations (ST&E) for each core area
  - Revised each year based on results
- 2008 survey: sent out to area coordinators in January 09
Security Program of Work
- Operational Security
- Policies, Procedures
- ST&E
- Software security
- Education, Training
Security Test and Evaluation
- Targeted towards OSG core assets
  - OSG-run services: BDII, VDT caches, Gratia, RSV…
- Each service owner responds to the controls
  - Sample control: classify the data types contained in the service; describe backup plans, retention duration, and deletion of the data
- Archive of the 2008 results
- Switched to SurveyGizmo this year
- Goal: broaden the scope of the controls to include VO and site services
  - E.g. check whether the GUMS service uses SSL with VOMS
  - Check whether VOMS has a valid certificate, and the lifetime of voms-proxies
Operational Security
Our activities:
- keep an active dialogue with our members
- evaluate the security of our software stack
- release timely patches for identified vulnerabilities
- observe the practices of our VOs and sites
- send alerts when abnormalities are found
- continually perform fire drills to measure readiness and security awareness
Operational Security
- Monitoring the security infrastructure
  - RSV probes
    - CA distribution service: content (checked against IGTF) and availability
    - CA/CRL availability
    - Supported VOs, mismatched user certificates
    - VOMS/GUMS connection frequency (on the sites), frequency of downloads
    - VOMS-issued proxy lifetimes
    - VOMS/GUMS SSL handshakes
  - Some probes are planned to run daily, some once a month
  - Need more monitoring tools (more later)
- Check for unusual activity (inspect log files, track proxies, unusual job submissions)
- Need a clear separation between what the security team monitors and what the operations team monitors
- What do our peer grids monitor?
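As an illustration of the "content against IGTF" check on the CA distribution service, here is a minimal sketch in Python. It is not the RSV probe itself: the function name `compare_ca_sets` and the CA identifiers are hypothetical, and a real probe would first have to obtain both lists (for example from the installed certificates directory and the reference distribution).

```python
def compare_ca_sets(installed, reference):
    """Compare a site's installed CAs with a reference list.

    Returns the CAs missing from the site and the CAs that are
    installed but absent from the reference list.
    """
    installed, reference = set(installed), set(reference)
    return {
        "missing": sorted(reference - installed),
        "unexpected": sorted(installed - reference),
    }


# Hypothetical CA identifiers, for illustration only.
reference_cas = ["ca-alpha", "ca-beta", "ca-gamma"]
site_cas = ["ca-alpha", "ca-gamma", "ca-local"]

report = compare_ca_sets(site_cas, reference_cas)
print(report)
```

An "unexpected" entry is not necessarily a problem, since sites decide for themselves which CAs to install; the sketch only reports the difference.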
Software Security
- Software stack
  - Software vulnerabilities
    - Reported by the provider, users, ITB, or the security team
    - Security team determines the fix plan
  - Methodical testing of the software (more on this later)
    - Still deciding which tools to buy, who will do the work, and who will work on the vulnerabilities found
- Watching vulnerability bulletin boards
  - On a best-effort basis
  - Third-party software and bugs are very hard to track
Incident Response
- Each area leader is the first line of defense
  - Sites, VOs, Support Centers: minimum of two contacts
  - Contact info is stored centrally in the OSG Information Database (OIM)
  - Contact info is verified by the security team at least annually
- Immediate reporting to OSG Operations, 24x7
  - Operations has the security team’s 24x7 contact information
- Security team’s incident response procedure
  - Gather data from all related parties: sites, VOs, software providers, support centers, core OSG staff
  - Determine the impact of the incident and mitigation plans
  - Inform related parties
  - Coordinate the flow of information
Incident Response
- Advise short-term and long-term mitigation plans
  - Work with the VDT and operations teams on releases or patches
  - Open vulnerability tickets with the software provider if needed
- Monitor whether the advice is adopted; contact unresponsive parties
- Close the incident when the mitigation plan is completed
- Post-mortem incident analysis with the executive team
Security Incidents in

| Incident name | Severity | First Response by OSG | Scope | Total Time Spent |
|---|---|---|---|---|
| Debian Openssl | High | Next day | All sites | 2 weeks |
| U Michigan compromise | High | Same day | U of Michigan and Atlas sites | 3 days |
| DNS cache poisoning | Medium | Same day | All sites | 25 days |
| SSH compromise | High | Same day | All sites | 14 days |
| Twiki compromise | Medium | Next day | OSG twiki server | 3 days |
| TeraGrid | Low | Same day | All sites | 7 days |
| INFN | High | Same day | dCache sites | 6 |
| Possible certificate compromise | High | Same day | All sites | 2 days |
| PakistaniGrid CA failure | Medium | Next day | All sites | 20 days |
Incident Response and Communications
- Problem faced: poor coordination with peer grids and the non-grid HEP community during incident response
- Solution 1: the security team developed close ties with the TeraGrid security team and the EGEE (WLCG) security team
  - Joint lists, immediate alerts and updates
  - Two members of the OSG incident response team are members of the TeraGrid incident response team
  - Building a GRID-SEC community for all other peer grids
  - Will need to evolve as National Grids emerge in WLCG in the upcoming years
- Solution 2: the security team joined REN-ISAC to plug in to the non-grid community
  - Research and Education Networking Information Sharing and Analysis Center
  - Includes many universities
  - All information flow is kept confidential
Operational Security
Current scope:
- Risk assessment and threat model
  - Blueprint planned
  - Currently, threat levels are all evaluated as medium (loss of 1 week of production and/or 200K) or low
  - No high-level risk (a month or more than a week)
  - We have no attacker community
- Checklist for site and VO admins
  - Enables them to measure themselves
- Monitoring tools
- Methodical software testing
Policies & Procedures
- Security team participates in policy and standards bodies
  - IGTF/TAGPMA
  - JSPG (WLCG’s security policy body)
- Policies are discussed with the suitable core area leaders
  - Do not expect area leaders to be security experts
  - Extract the impact of the policy, explain it in simple language, and gather responses on feasibility
  - E.g. a policy regarding VO operations is explained to the OSG VO coordinator, who gathers the VOs’ feedback and sends it back to the security team
- Security team does not make policy decisions in isolation
  - Gives related parties a chance to refuse
  - Conflicts: send the policies, with the security team’s recommendations, to the OSG EB and OSG Council
Education & Training
- Hands-on tutorials, presentations at the AHM and the Site Admins meeting
- We need a systematic and sustainable approach:
  - Creating security lectures to be taught at a site admin grid school
  - Open office hours during and after incidents
Back to Charge Questions
Assess “possible loss” of science due to security
- Slide 14: last year’s incidents
  - No OSG-wide shutdown, no loss of production
- “Possible loss”
  - Risk assessment document (under review)
  - No high-level risk, due to the lack of an attacker community
  - Mostly medium-level risk and medium-level loss (medium loss: loss of a week or 200K)
  - Working on the threat model (blueprint meeting scheduled for April 28)
Effort given to Security Program
FTEs currently:
- 0.25: ST&E annual controls
- 0.8: security officer, anything needed
- 1: temporary help with developing monitoring tools
- 0.5: incident response and education/training
- 0.5: DOEGrids, policy & procedures, operational
Total: 3.05 FTE, of which 1 FTE is temporary
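The total can be checked with a few lines of Python. This is only a sketch of the arithmetic: the dictionary keys are shorthand labels introduced here, and `Decimal` is used so that the sum of the fractional FTEs is exact.

```python
from decimal import Decimal

# FTE allocations as listed on the slide; key names are shorthand.
effort = {
    "st_e_annual_controls": Decimal("0.25"),
    "security_officer": Decimal("0.8"),
    "monitoring_tools_temp": Decimal("1"),
    "incident_response_and_training": Decimal("0.5"),
    "doegrids_policy_operational": Decimal("0.5"),
}

total = sum(effort.values(), Decimal("0"))
permanent = total - effort["monitoring_tools_temp"]

print(f"total FTE: {total}, excluding temporary help: {permanent}")
```

Excluding the temporary position, the standing effort is 2.05 FTE.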
Software & Security
Vulnerability discovery procedure:
- Reporter opens a ticket with GOC or VDT
- Security team receives the ticket and then:
  - Informs the software provider
  - Rates the severity (high: ASAP; medium: 1 week; low)
  - Determines the impact of the fix on the OSG community
  - Negotiates the timeframe with VDT
  - Prepares an announcement
- If a fix plan is needed, create one with operations and VDT (how to propagate the fix, which services will be affected, how to monitor the install, etc.)
- Once the fix is released, close the ticket
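The severity rating in the procedure above maps to a target fix timeframe. A minimal Python sketch of that mapping follows. The names `FIX_WINDOW` and `fix_deadline` are hypothetical, "ASAP" is modelled as a zero-day window purely as an assumption, and "low" has no timeframe because the slide does not state one.

```python
from datetime import date, timedelta
from typing import Optional

# Target fix window per severity. "high" is ASAP on the slide; treating
# it as a same-day target is an assumption made for this sketch.
FIX_WINDOW = {
    "high": timedelta(days=0),
    "medium": timedelta(weeks=1),
    "low": None,  # no timeframe given in the source
}


def fix_deadline(severity: str, reported: date) -> Optional[date]:
    """Return the target fix date, or None if no timeframe is defined."""
    window = FIX_WINDOW[severity.lower()]
    return None if window is None else reported + window


print(fix_deadline("medium", date(2009, 2, 2)))
```

In practice the timeframe is negotiated with VDT, so a computed date like this would be a starting point rather than a commitment.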
Software security
- VDT has 70+ components
- Had a security meeting with the software providers
  - Mutual expectations from both sides
  - A draft document summarizing our findings has been posted
  - Topics: work on vulnerabilities, vulnerability announcements, security-related config variables, authN/authZ libraries, monitoring third-party bugs (hard because…)
- Software certification/testing process
  - No security testing at the moment
  - Dependent on the developer’s expertise in security
  - Usually not sufficient despite good intentions
Software security
Current focus: methodical security testing
- Tool-based testing: fortitude or Coverity
  - Which one to buy, and who will look after the reported errors, are undetermined
- Human-based testing: possible, but costly in time
- Building and accessing a security test-bed
- What are other peer grids doing? Security certification? How?
- Black-box testing (very simple tests)
  - Test the authN and authZ mechanisms with wrong inputs and absent CAs & CRLs
  - Search for open ports
Software & Security
New operational tools (development work)
- Current focus: authorization service (banning service)
  - Centrally and locally managed
    - Central banning list pushed to sites
    - Local banning service managed solely by the site
  - Banning based on cert serial numbers
  - Refusing long-lived proxies
- Central CA & CRL distribution
- Developing monitoring tools
  - GUMS and VOMS handshake frequency
  - VOMS–GUMS service certs, secure connections
  - Monitoring CAs
- Tools for incident monitoring: logging and proxy tracking
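The split between a centrally pushed banning list and a site-managed local list can be sketched in Python as below. This is an illustration only, not the banning service's design: the function names and sample entries are hypothetical, and keying each entry on the pair (issuer, serial number) is an assumption added here because X.509 serial numbers are only unique per issuing CA.

```python
def effective_ban_list(central, local):
    """Combine the centrally pushed list with the site's own list."""
    return set(central) | set(local)


def is_banned(issuer, serial, ban_list):
    """Check a certificate against the combined ban list."""
    return (issuer, serial) in ban_list


# Hypothetical entries: (issuer DN, certificate serial number).
central_bans = [("/DC=org/DC=example/CN=Example CA", 0x1A2B)]
local_bans = [("/DC=org/DC=example/CN=Example CA", 0x3C4D)]

bans = effective_ban_list(central_bans, local_bans)
print(is_banned("/DC=org/DC=example/CN=Example CA", 0x1A2B, bans))
```

Taking the union means a site can add bans of its own; whether a site may also override a central ban is a policy question the sketch leaves open.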
Policies & Procedures
- New procedures OSG will need:
  - How to include a new CA (non-IGTF accredited)
  - How to remove a CA
- New policies
  - Core services CA acceptance policy (only for OSG-operated services)
  - Banning sites or VOs (where to draw the line)
    - How many failures on a monitoring service are grounds for expulsion?
    - What other behaviors?
Security Infrastructure Future
- Embrace diverse authN methods
  - Non-accredited CAs, Shibboleth CAs?
  - Can we use Shib portals or OpenID portals to generate certs?
  - However, NO change in the underlying X.509 infrastructure
  - Goal: make cert management transparent to the user
  - Goal: move ID vetting to organizations?
- Proxy lifetimes
  - Proxy renewal is not in our plans
  - Instead, sites will get a “knob” to refuse proxies longer than a week
  - A week would be enough for OSG jobs to finish
  - Sites will ban proxies
  - If jobs get longer, we will either increase the proxy lifetime or provide a renewal service
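The proxy-lifetime “knob” could behave roughly like the Python sketch below. It is an assumption-laden illustration: the names `MAX_PROXY_LIFETIME` and `accept_proxy` are hypothetical, "lifetime" is taken to mean the span between the proxy's start and end of validity, and a real implementation would read those times from the proxy certificate itself.

```python
from datetime import datetime, timedelta, timezone

# Site-configurable limit; one week is the value discussed on the slide.
MAX_PROXY_LIFETIME = timedelta(weeks=1)


def accept_proxy(not_before, not_after, now, max_lifetime=MAX_PROXY_LIFETIME):
    """Accept a proxy only if it is currently valid and not too long-lived."""
    if not (not_before <= now < not_after):
        return False  # not yet valid, or already expired
    return (not_after - not_before) <= max_lifetime


now = datetime(2009, 2, 10, tzinfo=timezone.utc)
short_proxy = accept_proxy(now - timedelta(days=1), now + timedelta(days=3), now)
long_proxy = accept_proxy(now - timedelta(days=1), now + timedelta(days=30), now)
print(short_proxy, long_proxy)
```

Passing `max_lifetime` as a parameter reflects the idea that the limit is a per-site setting and could be raised if jobs get longer.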
AuthZ
- Not as universally accepted as X.509
- Several mapping and authZ services: GUMS, edg, SCAS, EGEE’s new Authz Service…
  - None of which works with virtual workspaces or web services (UNIX dependency)
- On the VO side: Grouper, VOMS…
Priorities
- AuthN priorities
  - Understanding the RA workflow and auditing the cert approval process
  - Auditing of non-accredited CAs (Purdue CA)
- Authorization
  - Banning process
Last Year’s Priorities
- ST&E completion for the first 6 months
- Identifying core OSG policies and services
  - Core assets list
  - List of policies we approved
- Incident response process defined
- Secure communication channels with security contacts
- Proxy cleanup survey
Incident Response
- REN-ISAC, coordination with TG and EGEE
- Roadmap for the security team
  - Needed operational tools (banning tool, cert mgmt tool)
  - Jointly listed tools with EGEE (banning tool)
Priorities
- Operational security
  - ST&E, and revise our security plan
  - Auditing core services (Cert RA)
  - Incident response containment (banning tool)
- Procedures
  - Making security advisories
  - Auditing new CAs
Policies
- Focus on implementing and reviewing existing policies rather than getting new ones approved
- Focus will be the OSG registration process
Risk Assessment Plan
- Blueprint 4/28-29
  - Threat model and risk assessment
  - ET will be in attendance
  - Risks and the loss associated
  - Revise the old assessment plan