David Groep Nikhef Amsterdam PDP & Grid Traceability in the face of Clouds EGI-GEANT Symposium – cloud security track With grateful thanks for the input.

Slides:

Advertisements

Similar presentations

EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks MyProxy and EGEE Ludek Matyska and Daniel.

Advertisements

Overview of local security issues in Campus Grid environments Bruce Beckles University of Cambridge Computing Service.

EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI AAI in EGI Status and Evolution Peter Solagna Senior Operations Manager

Science Gateway Security Recommendations Jim Basney Von Welch This material is based upon work supported by the.

CoreGRID Workpackage 5 Virtual Institute on Grid Information and Monitoring Services Authorizing Grid Resource Access and Consumption Erik Elmroth, Michał.

WLCG Cloud Traceability Working Group progress Ian Collier Pre-GDB Amsterdam 10th March 2015.

Chapter 1  Introduction 1 Overview  What is a secure computer system?  Concerns of a secure system o Data: Privacy, Integrity, Availability o Users:

SOA Security Chapter 12 SOA for Dummies. Outline User Authentication/ authorization Authenticating Software and Data Auditing and the Enterprise Service.

WLCG Security TEG, risks and Identity Management David Kelsey GridPP28, Manchester 18 Apr 2012.

Website Hardening HUIT IT Security | Sep

LHC Experiment Dashboard Main areas covered by the Experiment Dashboard: Data processing monitoring (job monitoring) Data transfer monitoring Site/service.

EGI-Engage Recent Experiences in Operational Security: Incident prevention and incident handling in the EGI and WLCG infrastructure.

What if you suspect a security incident or software vulnerability? What if you suspect a security incident at your site? DON’T PANIC Immediately inform:

INFSO-RI Enabling Grids for E-sciencE Incident Response Policies and Procedures Carlos Fuentes

EGI-InSPIRE RI EGI-InSPIRE RI EGI-InSPIRE EGI services for the long tail of science Peter Solagna Senior Operations.

EGI-Engage Recent Experiences in Operational Security: Incident prevention and incident handling in the EGI and WLCG infrastructure.

What if you suspect a security incident or software vulnerability? What if you suspect a security incident at your site? DON’T PANIC Immediately inform:

Cloud Use Cases, Required Standards, and Roadmaps Excerpts From Cloud Computing Use Cases White Paper

EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI EGI Federated Cloud F2F Security Issues in the cloud Introduction Linda Cornwall,

WLCG Security: A Trust Framework for Security Collaboration among Infrastructures David Kelsey (STFC-RAL, UK) CHEP2013, Amsterdam 17 Oct 2013.

WLCG Cloud Traceability Working Group face to face report Ian Collier 11 February 2015.

EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks David Kelsey RAL/STFC,

UKI ROC/GridPP/EGEE Security Mingchao Ma Oxford 22 October 2008.

LCG Pilot Jobs + glexec John Gordon, STFC-RAL GDB 7 November 2007.

EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI EGI Federated Cloud Security - what is needed Linda Cornwall (STFC) and the.

Summary of AAAA Information David Kelsey Infrastructure Policy Group, Singapore, 15 Sep 2008.

EGI-Engage Recent Experiences in Operational Security: Incident prevention and incident handling in the EGI and WLCG infrastructure.

INFSO-RI Enabling Grids for E-sciencE EGEE SA1 in EGEE-II – Overview Ian Bird IT Department CERN, Switzerland EGEE.

Trusted Virtual Machine Images a step towards Cloud Computing for HEP? Tony Cass on behalf of the HEPiX Virtualisation Working Group October 19 th 2010.

Security Policy Update David Kelsey UK HEP Sysman, RAL 1 Jul 2011.

David Groep Nikhef Amsterdam PDP & Grid Some Comments on “Problem description for non-proliferation issues in Grids” Joint Security Policy Group 7 December.

A Trust Framework for Security Collaboration among Infrastructures David Kelsey (STFC-RAL, UK) 1 st WISE, Barcelona 20 Oct 2015.

Module 12: Responding to Security Incidents. Overview Introduction to Auditing and Incident Response Designing an Audit Policy Designing an Incident Response.

IOTA AP Towards Differentiated Identity Assurance David Groep, Nikhef supported by the Netherlands e-Infrastructure and SURFsara.

A Trust Framework for Security Collaboration among Infrastructures David Kelsey (STFC-RAL, UK) WLCG GDB, CERN 10 Jul 2013.

PRESENTATION TITLE Presented by: Xxxx Xxxxx. Providence Health & Services Very large Catholic healthcare system 33 hospitals in AK, CA, MT, OR, WA 65,000.

Reflections “from around the block.” (Security) Ian Neilson GridPP Security Officer STFC RAL.

DTI Mission – 29 June LCG Security Ian Neilson LCG Security Officer Grid Deployment Group CERN.

Security Policy: From EGEE to EGI David Kelsey (STFC-RAL) 21 Sep 2009 EGEE’09, Barcelona david.kelsey at stfc.ac.uk.

EGI-InSPIRE RI EGI EGI-InSPIRE RI Service Operations Security Policy the new generalised site operations security policy.

Evolving Security in WLCG Ian Collier, STFC Rutherford Appleton Laboratory Group info (if required) 1 st February 2016, WLCG Workshop Lisbon.

EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI EGI SPG future work EGI Technical Forum Lyon, 21 Sep 2011 David Kelsey, STFC/RAL.

Role Of Network IDS in Network Perimeter Defense.

An Active Security Infrastructure for Grids Stuart Kenny*, Brian Coghlan Trinity College Dublin.

 Introduction  Tripwire For Servers  Tripwire Manager  Tripwire For Network Devices  Working Of Tripwire  Advantages  Conclusion.

JSPG Update David Kelsey MWSG, Zurich 31 Mar 2009.

Planning for LCG Emergencies HEPiX, Fall 2005 SLAC, 13 October 2005 David Kelsey CCLRC/RAL, UK

EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI Draft Security Virtualisation Policy (for Romain Wartel – CERN) EGI Technical.

Ian Collier, STFC, Romain Wartel, CERN Maintaining Traceability in an Evolving Distributed Computing Environment Introduction Security.

David Groep Nikhef Amsterdam PDP & Grid Bring the WLCG federation Home Extending your trust options beyond bottom-up identity by collaborating with global.

Breaking the frontiers of the Grid R. Graciani EGI TF 2012.

EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI EGI Federated Cloud and Software Vulnerabilities Linda Cornwall, STFC 20.

Traceability WLCG GDB Amsterdam, 7 March 2016 David Kelsey STFC/RAL.

EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Security aspects (based on Romain Wartel’s.

Cyber Security Issues in HEP and NP Grids Bob Cowles — SLAC NC August 2004.

EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI Questionnaires to Cloud technology providers and sites Linda Cornwall, STFC,

Trusted Virtual Machine Images the HEPiX Point of View Tony Cass October 21 st 2011.

Logging and Monitoring. Motivation Attacks are common (see David's talk) – Sophisticated – hard to reveal, (still) quite limited in our environment –

Cloud Security Session: Introduction 25 Sep 2014Cloud Security, Kelsey1 David Kelsey (STFC-RAL) EGI-Geant Symposium Amsterdam 25 Sep 2014.

Security Policy Update WLCG GDB CERN, 11 June 2008 David Kelsey STFC/RAL

Ian Bird, CERN WLCG Project Leader Amsterdam, 24 th January 2012.

SCI & Sirtfi David Kelsey (STFC-RAL) EGI Conference, Lisbon 19 May 2015.

Ian Bird GDB Meeting CERN 9 September 2003

Hot Topics:Mobility in the Cloud

Romain Wartel EGEE08 Conference, Istanbul, 23rd September 2008

WLCG Collaboration Workshop;

Assessing Combined Assurance

Assessing Combined Assurance

Update - Security Policies

Federated Incident Response

Presentation transcript:

David Groep Nikhef Amsterdam PDP & Grid Traceability in the face of Clouds EGI-GEANT Symposium – cloud security track With grateful thanks for the input from Romain Wartel, CERN and wLCG Sven Gabriel, Nikhef and EGI Ian Collier, STFC/RAL

David Groep Nikhef Amsterdam PDP & Grid wLCG experience Incidents happen on a regular basis, per year Attacks continue to improve over the years ◦ More and more sophisticated  For example, Zeus (Windows botnet) used to steal HEP accounts  No easy or public mean to detect modern malware ◦ No longer a side-effect of being connected to the Internet  State-of-the-art malware used against WLCG  Attackers being arrested for attacking WLCG resources ◦ No reduction of the severity or # of incidents in the recent years  Yet most of them follow the same pattern  We have now built the necessary expertise and have experience Slide content courtesy Romain Wartel, CERN and wLCG

David Groep Nikhef Amsterdam PDP & Grid “Be able to answer the basic questions who, what, where, and when concerning any incident.” Prevent re-occurances of the incident Prevent a ‘waterbed effect’ in our federated infrastructure ‘in building our infrastructure to federate we also help miscreants spread through federated access – so we now also need rapid, coordinated, and federated response’ Larger federation  larger risk of (apparent) ‘insider actions’ The Traceability Premise

David Groep Nikhef Amsterdam PDP & Grid Record (‘who, what, when, where’) ◦ at minimum be able to identify the source of all actions (executables, file transfers, portal jobs) and the individual who initiated them ◦ traceability commensurate with scope of action and React ◦ sufficiently fine-grained controls, such as blocking the originating user and monitoring ◦ communicate controls information rapidly throughout the federation (resource centres, users, communities) and only then Recover ◦ understand the cause and to fix any problems before re-enabling access for the user Traceability for the HTC platform

David Groep Nikhef Amsterdam PDP & Grid A policy framework Number of security policies apply to participants: ◦ Important operational security: ◦ Security Incident Response Policy   “A security incident is the act of violating an explicit or implied security policy “  Report suspected incidents locally and to the infrastructure  “Perform appropriate investigations and forensics and share the results with the incident coordinator”  “Aim at preserving the privacy of involved participants and identities” ◦ Traceability and Logging Policy  Slide content courtesy Romain Wartel, CERN and wLCG

David Groep Nikhef Amsterdam PDP & Grid Idea: understand and prevent incidents* Requirements: ◦ Software MUST produce application logs:  Source of any action  Initiator of any action ◦ Logs MUST be collected centrally [resource centre] ◦ Logs MUST be kept 180 days Sites currently know what to do in order to be able to answer who, what, where & when Current EGI/wLCG Security Traceability and Logging Policy

David Groep Nikhef Amsterdam PDP & Grid Forensics & trace analysis capabilities scarce ◦ Mostly at the larger resource centres and with a few specialised institutes and individuals ◦ Logs and audit records needed for experts to work on ◦ Collaborate widely with the trusted community to maintain integrity of our ecosystem at large Capabilities

David Groep Nikhef Amsterdam PDP & Grid With new service classes (like IaaS clouds) our ‘attack surface’ increases ◦ Record?  we now need traceability capabilities for all access methods  with expertise for forensics and analysis ◦ React?  controlling access for suspected miscreants  both to the innards of the VM as well as to the ‘external controls’ (management interfaces, KVM console, networks, …) ◦ Recover?  different entities now responsible for the resolution  But re-enabling any service should wait for full resolution! Beyond the HTC platform offering

David Groep Nikhef Amsterdam PDP & Grid We cannot implement traceability in exactly the same way Sites can log observable behaviour ◦ VM launched at such and such a time ◦ Network connection to such and such an address at a certain time ◦ Etc. Sites can no longer see ◦ Credential used to run workload(s) inside VMs ◦ Detailed application logs from within past VMs CAN isolate running VMs for analysis ‘Sites’ can’t do it all Slide content courtesy Ian Collier, STFC/RAL

David Groep Nikhef Amsterdam PDP & Grid New territory VOs (research communities) will have to participate in incident response to provide the missing information. Are VOs going to maintain detailed central application logs and retain them? Could sites provide a central syslog service for VMs run at their site? ◦ But that would not help for public cloud work ◦ Perhaps just for some nodes Many more issues and questions Slide content courtesy Ian Collier, STFC/RAL

David Groep Nikhef Amsterdam PDP & Grid wLCG already faced distributed responsibility Distributed traceability in practice? End User Community overlay services Resource Centre 1 Resource Centre II Resource Centre III Overlay prepositioned jobs (“container”) Request VO service to execute their task Preplaced container retrieves user payload

David Groep Nikhef Amsterdam PDP & Grid For a test, a fake-malicious user payload was submitted through community container portal … Common & multiple-use containers made tracing impossible for the VO – and the VO-CSIRT existed (unique!) and was involved After a week (!) the intruder was not yet found Remarkable resources would have be needed for a proper response Retention times for needed logs are to short (<30 days). “It would have taken O(1 week) to scan all input sources for the offending code” Exercising traceability Slide content inspired by Sven Gabriel, Nikhef

David Groep Nikhef Amsterdam PDP & Grid Next steps We need to address this before workflows become too firmly established. ◦ Easier to build in early than to add on afterwards ◦ Traceability requires specific design at every level Working group (sites, communities, and users) ◦ Test different approaches to filling traceability gaps ◦ Update guidelines ◦ Disseminate ◦ Exercise the system – with planned and unscheduled challenges Slide content inspired by Ian Collier, STFC/RAL