Security in an Agile Infrastructure CERN Computer Security Team 2013/12/12.

Why? In the “old” CC world, we had a good level of security (with some areas of “bricolage” and suboptimal configurations). Many have worked hard to keep this level up. This has spared us from major incidents in recent years. And from major consequences.

Well done so far we have. On average, there has been one root compromise per month in the WLCG during the last ten years. Passwords are harvested in bulk. These compromises periodically affected CERN users; CERN power users/admins are as exposed as anybody else.

The Consequences (I)
- Loss of reputation: Downtimes can have adverse effects (one minute of AFS downtime already caused a stir within BE, as in 2011).
- Damage to data: Misconfigurations or successful attacks can lead to data deletion (as happened recently, with 15 TB of CMS EOS data lost).
- Manipulation of source code or configuration files: A compromise of Git or PuppetDB can render our whole software base compromised. The cost of going through all recent commits and code changes is enormous (fortunately, CERN has been spared so far, but Linux itself was hit hard in 2011).
- Reinstallation of services: Even in an Agile Infrastructure, reinstallation requires a lot of due diligence, thorough testing, and involvement of the whole service team (LXPLUS/LXBATCH are regularly reinstalled after critical vulnerabilities have been published).

The Consequences (II)
- Change of passwords: Password exposure always requires re-initialisation of passwords and, as many dependencies are not always understood, requires care (changing Oracle passwords is always a challenge for the DB owners and for BE/CO).
- Change of secrets: Every secret that leaks needs to be changed. Some are “easily” renewable, others are not (e.g. the LXPLUS SSHD private key).
- Forensics: Investigating the cause of an incident requires the involvement of the Security Team, but also of the affected service providers. This can take time and usually triggers other costs (like reinstallation and password changes).

Restart. The new Agile Infrastructure is a game changer. Lots of new tools, lots of side effects, lots of paradigm changes. Partially we gain control, partially we lose control. Are we back to square one?

New Security Needs Vincent (IT/CSO)

Some Working Areas
- Puppet “Facts” executed as root: Puppet module owners (150+ from all over CERN) can run arbitrary commands as root on any host just by defining “facts”. JIRA AI-3081 by Gavin. Pursuing upstream. Assessing the cost to fix in parallel.
- Secrets exposed by Git: ai-admins owning a VM (170+ from all over CERN) can access the secret keys and passwords used by any other host, as hiera-gpg is inherently broken. JIRA AI-3266 with Ben & Vincent, going for “Teigi” at probably low cost.
- Puppet environment hijack: ai-admins can add an “override” to existing environments and become root. A git-hook might help; under discussion.
Puppet usage within CERN is expected to grow: the more systems and functionality are added, the more shared modules will be created and the more people will be added to these e-groups. For sure, these won’t only be members of the IT department. And for sure, some have already left their AI role…
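
To illustrate the “git-hook might help” idea, here is a minimal sketch of the check a server-side hook could run before accepting a push. It is not the mechanism discussed at CERN: the directory layout, the `OWNERS` map, the user names and the function names are all hypothetical, and wiring it into an actual hook (reading the pushed refs and listing changed files) is left out.

```python
from fnmatch import fnmatchcase

# Hypothetical ownership map: path pattern -> users allowed to change it.
# Neither the layout nor the names come from the slides.
OWNERS = {
    "environments/production/*": {"alice"},
    "environments/qa/*": {"alice", "bob"},
    "modules/afs/*": {"carol"},
}


def allowed_users(path, owners=OWNERS):
    """Return the users allowed to modify `path` (empty set if no rule matches)."""
    users = set()
    for pattern, members in owners.items():
        # fnmatchcase treats "/" like any other character, so "*" spans subdirectories.
        if fnmatchcase(path, pattern):
            users |= members
    return users


def rejected_paths(user, changed_paths, owners=OWNERS):
    """Return the changed paths `user` may not touch (deny by default)."""
    return [p for p in changed_paths if user not in allowed_users(p, owners)]


if __name__ == "__main__":
    push = ["environments/qa/site.yaml", "environments/production/site.yaml"]
    print(rejected_paths("bob", push))
```

A hook built on this would refuse the push whenever `rejected_paths` returns a non-empty list. Paths without any matching rule are rejected for everyone, so a forgotten rule errs on the side of blocking.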

The Threat
- Finger-trouble by any ai-admin when modifying facts can affect a multitude of CC services (it would not be the first time that unintentional mistakes caused downtime).
- Frustrated colleagues would have an open door to take revenge against the IT department, an experiment or CERN (this has already happened, e.g. in BE and GS).
- Malicious attackers can take over the CC. All they need is the password of any ai-admin and the patience to understand our environment (passwords are regularly lost to sophisticated attacks at CERN and other sites).
Given that it takes an average of six months before incidents are discovered, if any CERN account with admin rights is abused, a complete reinstallation of the CC will become mandatory to be sure that the incident is contained and resolved.

Reflect. How much Agile Security do we want? While we might all agree that having too much security is not optimal and probably too expensive, having too little security implies significant hidden costs to be borne by the IT department and CERN should a compromise occur.

Act! Maintain an Agile security footprint as we had for the “old” CC. A new infrastructure comes with new security needs (and with new costs). We must continue to invest in order to be spared from major incidents. And from major consequences.

On Short Term
Meet regularly to discuss/decide on Agile Security measures
  - Already started bottom-up, but needs to be widened in scope.
  - See also the next slides.
Harden hosts under our control
  - Default toggle “on” for tight iptables, RPMverify and syslog.
  - Preferred “on” for netlog & Snoopy (but we have some digestion troubles at the moment).
Identify who needs full default access to all nodes
  - (PF or FP will panic if they learn that “most IT” has access to their data.)
  - Procurement? 1st-line HW support? CC operators? CC admins?
  - Align with the CC access policy and require a justified & permanent need.
  - How is IPMI/root access done? P2P? SSH via [LX|LXVO|AI]ADM (as in the past)?
Deploy 2FA AuthN for Foreman/Judy and AIADMs
  - Required for everyone? Only super-users? How to deal with API calls?
  - Both Intranet/Internet or only Internet? Does that make sense?
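
The slides do not say which second factor is meant. As one possible illustration only, the sketch below assumes time-based one-time passwords (TOTP, RFC 6238, built on HOTP from RFC 4226) and shows how a server could verify a submitted code. The function names, the 30-second step and the one-step tolerance window are assumptions, not a description of the CERN deployment.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP value for `counter`, using HMAC-SHA1."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, now: float, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP over the number of `step`-second intervals since the epoch."""
    return hotp(secret, int(now // step), digits)


def verify(secret: bytes, candidate: str, now=None,
           step: int = 30, window: int = 1, digits: int = 6) -> bool:
    """Accept codes from the current interval and `window` intervals either side."""
    if now is None:
        now = time.time()
    counter = int(now // step)
    return any(
        hmac.compare_digest(hotp(secret, counter + delta, digits), candidate)
        for delta in range(-window, window + 1)
        if counter + delta >= 0
    )


if __name__ == "__main__":
    shared_secret = b"12345678901234567890"  # RFC test secret, never use in production
    print(totp(shared_secret, time.time()))
```

The tolerance window trades a little security for robustness against clock drift between token and server. How such a check would be applied to non-interactive API calls remains the open question raised on the slide.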

On the Long Run: Agile Infrastructure
Review security & perform impact analysis of all Agile components & tools
  - (5 remote code execution vulnerabilities for Puppet reported in the 1st half of 2013!)
  - Discuss findings at the Thursday meetings; turn valid ones into JIRA tickets.
Compartmentalize access to hostgroups & modules
  - Reduce the wide rights of “ai-admins”.
  - Move towards a system of least privilege where hostgroup admins can do everything to their hostgroups, but nothing outside…
  - …and where modules can only be altered by their admins.
Have traceability on all(?) actions
Deploy logging infrastructure for VMs
  - Elasticsearch and Storm to the rescue(?!).
  - Can we really scale up to such a size???
  - Pending with the Security Team, but help/ideas appreciated!
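
The least-privilege and traceability goals above can be sketched together: every request is checked against the requester's own hostgroups, and every decision, allowed or denied, leaves a record. This is a minimal illustration under assumed names. `HOSTGROUP_ADMINS`, the users and the record fields are hypothetical, and a plain Python list stands in for whatever logging backend would actually be used.

```python
import json
from datetime import datetime, timezone

# Hypothetical assignment of admins to hostgroups (not taken from the slides).
HOSTGROUP_ADMINS = {
    "lxplus": {"alice"},
    "eos": {"bob"},
}


def is_authorized(user, hostgroup, admins=HOSTGROUP_ADMINS):
    """Least privilege: a user may act only on hostgroups they administer."""
    return user in admins.get(hostgroup, set())


def audit_record(user, action, hostgroup, allowed, when=None):
    """Build one JSON audit line; denied requests are recorded too."""
    when = when or datetime.now(timezone.utc)
    return json.dumps(
        {
            "time": when.isoformat(),
            "user": user,
            "action": action,
            "hostgroup": hostgroup,
            "allowed": allowed,
        },
        sort_keys=True,
    )


def request(user, action, hostgroup, log, admins=HOSTGROUP_ADMINS):
    """Check the request, append an audit record to `log`, return the decision."""
    allowed = is_authorized(user, hostgroup, admins)
    log.append(audit_record(user, action, hostgroup, allowed))
    return allowed


if __name__ == "__main__":
    trail = []
    request("alice", "reboot", "lxplus", trail)
    request("alice", "reboot", "eos", trail)
    print("\n".join(trail))
```

Writing one self-describing JSON line per decision keeps the trail easy to ship to a central store later, whichever one is chosen.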

On the Long Run: OpenStack
Review security of CERN’s OpenStack deployment
  - Perform gap analysis to
  - Prioritize, agree, deploy.
  - Identify areas of improvement.
  - Provide Security Baseline for Hypervisors and VMs. (5 “important” CVEs for KVM in 2013!)
Keep traceability of user-controlled VMs
  - Log contact information, date of usage, assigned IP, …
  - Have snapshots of VMs at start, end, in between (?!) and archive them.
Deploy VM segregation
  - Separate critical services from user-controlled VMs (avoid having mail, EDH, DFS on the same HV as VOboxes).
  - Keep homogeneous services (like LXBATCH) together (in order to simplify outer perimeter firewall & gate rules).
  - LCG vs GPN.
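
The VM segregation rule above can be checked mechanically once each VM is labelled. The sketch below assumes a flat list of placements with a hypothetical two-value category (`"critical"` or `"user"`) and reports hypervisors that host both kinds. It does not query OpenStack, and the VM and hypervisor names are invented for illustration.

```python
from collections import defaultdict


def segregation_violations(placements):
    """Return hypervisors hosting both critical and user-controlled VMs.

    `placements` is an iterable of (vm_name, hypervisor, category) tuples,
    where category is "critical" or "user" (a hypothetical labelling).
    """
    by_hv = defaultdict(lambda: {"critical": [], "user": []})
    for vm, hypervisor, category in placements:
        by_hv[hypervisor][category].append(vm)
    return {
        hv: {kind: sorted(vms) for kind, vms in groups.items()}
        for hv, groups in by_hv.items()
        if groups["critical"] and groups["user"]
    }


if __name__ == "__main__":
    example = [
        ("mail01", "hv1", "critical"),
        ("vobox07", "hv1", "user"),
        ("edh01", "hv2", "critical"),
        ("batch01", "hv3", "user"),
    ]
    print(segregation_violations(example))
```

In this example only `hv1` is reported, since it is the one hypervisor mixing a critical service with a user-controlled VM.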

John, please answer me! Thanks!