A Large Scale Exploratory Analysis of Software Vulnerability Life Cycles Muhammad Shahzad Dept. of Computer Science and Engineering Michigan State University Joint work with Muhammad Zubair Shafiq and Alex X. Liu

ICSE 2012, Zürich

Software Vulnerabilities
• A software vulnerability is a weakness in software that allows attackers to compromise the security of a system.
• An exploit is a means of taking advantage of a software vulnerability to compromise the security of a system.
  ─ In the form of a piece of software or a sequence of commands.
• A patch is a means of fixing the vulnerability so that the exploit becomes ineffective.
• Vulnerability lifecycle

Why Study the Software Vulnerability Lifecycle
• Software vendors are adversely affected by vulnerability announcements.
  ─ Lost money: a vendor loses 0.63% of its market value on the disclosure date [Telang and Wattal 2007]
  ─ Lost reputation
• Goal: to find out how the software industry is doing with respect to vulnerabilities

Data Set
• Sources
  ─ National Vulnerability Database (NVD)
  ─ Open Source Vulnerability Database (OSVDB)
  ─ Vulnerability data by Frei et al. (FVDB)
• Vulnerabilities
  ─ 9667 vulnerabilities with patch dates
  ─ vulnerabilities with exploit dates
• Software vendors
  ─ Over 11 thousand vendors and 17 thousand products

Vulnerability Information
• Risk score: low, medium, or high
  ─ Derived from the Common Vulnerability Scoring System (CVSS)
• Access vector: local, adjacent network, or network
  ─ Where attackers can launch the attack from
• Access complexity: low, medium, or high
  ─ Complexity of the attack that exploits the vulnerability
• Integrity impact: none, partial, or complete
  ─ Impact of the attack that exploits the vulnerability
• Disclosure date: when the vulnerability is disclosed
• Exploit date: when an exploit becomes available
• Patch date: when a patch becomes available
• Text description of the vulnerability
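The record fields listed above can be held in a small data structure. This is a minimal sketch. The class name, the field names, and the use of plain strings for the categorical values are all hypothetical. None of them is the schema of NVD, OSVDB, or FVDB. The two helper methods compute the day offsets t_ed and t_pd that the talk uses for exploitation and patching behavior.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Vulnerability:
    # Hypothetical field names; categorical values kept as plain strings
    cve_id: str
    risk: str               # "low", "medium", or "high"
    access_vector: str      # "local", "adjacent network", or "network"
    access_complexity: str  # "low", "medium", or "high"
    integrity_impact: str   # "none", "partial", or "complete"
    disclosure: date
    exploit: Optional[date]  # None if no exploit date is recorded
    patch: Optional[date]    # None if no patch date is recorded
    description: str

    def t_ed(self) -> Optional[int]:
        """Exploit date minus disclosure date, in days (None if unknown)."""
        if self.exploit is None:
            return None
        return (self.exploit - self.disclosure).days

    def t_pd(self) -> Optional[int]:
        """Patch date minus disclosure date, in days (None if unknown)."""
        if self.patch is None:
            return None
        return (self.patch - self.disclosure).days


# Hypothetical record, not taken from NVD, OSVDB, or FVDB
example = Vulnerability(
    cve_id="CVE-xxxx-yyyy",
    risk="high",
    access_vector="network",
    access_complexity="low",
    integrity_impact="partial",
    disclosure=date(2008, 1, 10),
    exploit=date(2008, 1, 10),
    patch=None,
    description="Buffer overflow allows remote attackers to execute arbitrary code",
)
print(example.t_ed(), example.t_pd())  # 0 None
```

Optional dates are used because many records lack an exploit date, a patch date, or both.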

Vulnerability Disclosure Rate

Access Vector

Access Complexity

Integrity Impact

Evolution of Different Types of Vulnerabilities

Vulnerability Clustering
• The data set does not record the vulnerability type.
• The total number of vulnerability types is unknown.
• Solution: use clustering to determine both the vulnerability types and how many types there are.
  ─ Extracted relevant keywords from the text descriptions
  ─ Used the keywords as features for clustering
  ─ Obtained 7 clusters
    ● EXE (Executables)
    ● DoS (Denial of Service)
    ● BO (Buffer Overflow)
    ● SQL injection
    ● XSS (Cross-Site Scripting)
    ● PHP
    ● Misc

Vulnerability Evolution by Type

Evolution of Exploitation Behavior

t_ed = Exploit Date - Disclosure Date (in days)
• t_ed < 0
  ─ 2.8% of vulnerabilities
• t_ed = 0
  ─ 88.2% of vulnerabilities
• t_ed > 0
  ─ 9% of vulnerabilities
  ─ Sub-ranges
    ● 0 < t_ed ≤ 7: exploit released within a week after disclosure
    ● 7 < t_ed ≤ 30: exploit released after the first week but within a month
    ● t_ed > 30: exploit released more than a month after disclosure
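The ranges above can be expressed as a small classifier. This is a sketch that assumes t_ed is measured in whole calendar days. The function name `t_ed_range` and the example dates are hypothetical.

```python
from datetime import date


def t_ed_range(disclosure: date, exploit: date) -> str:
    """Map one vulnerability to the t_ed range used on the slide."""
    t_ed = (exploit - disclosure).days  # whole days between the two dates
    if t_ed < 0:
        return "t_ed < 0"          # exploit available before disclosure
    if t_ed == 0:
        return "t_ed = 0"          # exploit on the disclosure date
    if t_ed <= 7:
        return "0 < t_ed <= 7"     # within a week
    if t_ed <= 30:
        return "7 < t_ed <= 30"    # after the first week, within a month
    return "t_ed > 30"             # more than a month later


# Hypothetical dates, for illustration only
print(t_ed_range(date(2010, 3, 1), date(2010, 3, 1)))   # t_ed = 0
print(t_ed_range(date(2010, 3, 1), date(2010, 3, 20)))  # 7 < t_ed <= 30
print(t_ed_range(date(2010, 3, 1), date(2010, 2, 25)))  # t_ed < 0
```

The boundaries are inclusive on the upper end, matching the "≤ 7" and "≤ 30" on the slide.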

Evolution of Aggregate Exploitation Behavior

Evolution of Exploitation Behavior by Vendor

Evolution of Exploitation Behavior by Product

Evolution of Patching Behavior

t_pd = Patch Date - Disclosure Date (in days)
• t_pd < 0
  ─ 10.1% of vulnerabilities
    ● Greater than the corresponding 2.8% for t_ed < 0
• t_pd = 0
  ─ 62.2% of vulnerabilities
    ● Less than the corresponding 88.2% for t_ed = 0
• t_pd > 0
  ─ 27.7% of vulnerabilities
  ─ Sub-ranges
    ● 0 < t_pd ≤ 7: patch released within a week after disclosure
    ● 7 < t_pd ≤ 30: patch released after the first week but within a month
    ● t_pd > 30: patch released more than a month after disclosure
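Percentages like 10.1% / 62.2% / 27.7% come from tallying the sign of t_pd over all vulnerabilities that have a patch date. This sketch assumes day granularity. The function name and the four records are hypothetical and are not drawn from the data set.

```python
from collections import Counter
from datetime import date


def sign_breakdown(pairs):
    """Percentage of vulnerabilities with t_pd < 0, = 0, and > 0.

    pairs: iterable of (disclosure_date, patch_date) tuples.
    """
    counts = Counter()
    for disclosed, patched in pairs:
        t_pd = (patched - disclosed).days
        key = "<0" if t_pd < 0 else ("=0" if t_pd == 0 else ">0")
        counts[key] += 1
    total = sum(counts.values())
    return {k: 100.0 * counts[k] / total for k in ("<0", "=0", ">0")}


# Hypothetical records, not from the data set
records = [
    (date(2009, 5, 4), date(2009, 5, 1)),   # patched before disclosure
    (date(2009, 5, 4), date(2009, 5, 4)),   # patched on the disclosure date
    (date(2009, 5, 4), date(2009, 5, 4)),
    (date(2009, 5, 4), date(2009, 6, 15)),  # patched 42 days after disclosure
]
print(sign_breakdown(records))  # {'<0': 25.0, '=0': 50.0, '>0': 25.0}
```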

Evolution of Aggregate Patching Behavior

Evolution of Patching Behavior by Vendor

Evolution of Patching Behavior by Product

Conclusions
• The number of vulnerabilities disclosed each year has stopped increasing since 2006
• The percentage of remotely exploitable vulnerabilities has gradually increased to over 80%
• The access complexity of vulnerabilities has also been increasing
• Closed source vendors are faster at patching vulnerabilities
• Since 2008, vendors have become very agile in patching vulnerabilities
• Still, on average, hackers take less time to exploit a vulnerability than vendors take to patch it

Questions?

BACKUP SLIDES

Evolution of Exploitation Behavior by Type

Evolution of Patching Behavior by Type

Data Sources

Interesting Patterns Mined Using Association Rules
• Attributes used for association rule mining
  ─ Vendor name, product name, vulnerability type, risk, t_ed, t_pd
• For Microsoft, the majority of high-risk vulnerabilities are exploited on the disclosure date
  ─ vnd=Microsoft type=XSS risk=H → t_ed=0
• For Sun's Solaris, medium-risk vulnerabilities are exploited within a week of disclosure
  ─ vnd=Sun prod=Solaris risk=M → 0<t_ed≤7
• For Mozilla, the rules show that hackers are very quick to exploit vulnerabilities that have not yet been patched, but very slow to exploit vulnerabilities that have already been patched
  ─ vnd=Mozilla prod=Firefox type=BO t_pd=0 → t_ed>30
  ─ vnd=Mozilla prod=Firefox type=BO 7<t_pd≤30 → t_ed=0
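The slides do not name the mining algorithm or the support and confidence thresholds. The sketch below therefore only scores one given rule with the two standard measures, support and confidence. It does not search for rules the way Apriori would. It assumes each record has already been reduced to attribute=value pairs, with t_ed and t_pd mapped to their range labels. The function name and the records are hypothetical.

```python
def rule_stats(records, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    def matches(record, conditions):
        return all(record.get(attr) == value for attr, value in conditions.items())

    n_antecedent = sum(1 for r in records if matches(r, antecedent))
    n_both = sum(1 for r in records if matches(r, antecedent) and matches(r, consequent))
    support = n_both / len(records)                              # fraction of all records matching both sides
    confidence = n_both / n_antecedent if n_antecedent else 0.0  # P(consequent | antecedent)
    return support, confidence


# Hypothetical, pre-binned records (t_pd already mapped to its range label)
rows = [
    {"vnd": "Microsoft", "prod": "Windows", "type": "BO", "t_pd": "=0"},
    {"vnd": "Microsoft", "prod": "Windows", "type": "BO", "t_pd": "=0"},
    {"vnd": "Microsoft", "prod": "Windows", "type": "BO", "t_pd": ">30"},
    {"vnd": "Microsoft", "prod": "IE", "type": "BO", "t_pd": ">30"},
]
support, confidence = rule_stats(
    rows, {"vnd": "Microsoft", "prod": "Windows", "type": "BO"}, {"t_pd": "=0"}
)
print(support, round(confidence, 3))  # 0.5 0.667
```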

Interesting Patterns Mined Using Association Rules
• Microsoft patches vulnerabilities in Windows faster than in its other products
  ─ vnd=Microsoft prod=Windows type=BO → t_pd=0
  ─ vnd=Microsoft prod=IE type=BO → t_pd>30
• For Mozilla, BO and EXE vulnerabilities are patched very quickly
  ─ vnd=Mozilla prod=SeaMonkey type=BO → t_pd=0

Implications
• Observations from this study have important implications for
  ─ Software design
  ─ Code development practices
  ─ Customer assessment of vendors and products

Software Design
• Analysis of access requirements, functionality, and risk level
  ─ can reveal inherent flaws in the software design process
  ─ For example, if a particular software series has abundant BO vulnerabilities
    ● this indicates a lack of sanity checks in socket and read routines
• DoS vulnerabilities
  ─ In Solaris: 38.85% of all exploited vulnerabilities
  ─ In OS X: only 11.7% of all exploited vulnerabilities
  ─ Solaris is more susceptible to DoS attacks
  ─ Solaris developers need to take additional steps to prevent DoS attacks

Code Development Practices
• Analysis of vulnerability life cycles can reveal insights into code development and testing practices
  ─ For example, we observed that the percentage of vulnerabilities with t_pd > 0 is significantly greater for open source vendors than for closed source vendors
  ─ This indicates that open source projects dedicate fewer resources to security than closed source vendors do

Customer Assessment of Vendors and Products
• This analysis can be used in product assessment, certification, and security recommendations to customers
• For example,
  ─ Sun should be preferred if a vendor's patch response is of prime importance
  ─ Mac OS X should be used if a customer's infrastructure has low tolerance for DoS attacks
  ─ Solaris should be used if a customer wants to be robust against BO attacks

Proposed Methodology
• Preprocess the data
  ─ Extract relevant keywords from the text descriptions
  ─ Represent each vulnerability in terms of the keywords
• Data mining
  ─ Cluster the vulnerabilities
  ─ Identify the type of vulnerabilities in each cluster
• Post-processing
  ─ Assign each vulnerability a type

Preprocessing
• Clustering requires attributes
• Representative keywords in the text can act as attributes
  ─ Take all words in all text descriptions
  ─ Compare the words with everyday news articles
  ─ Remove the matching words
  ─ Manually go through the remaining words
  ─ Remove the words that are non-technical
  ─ This leaves 608 keywords
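The filtering steps above can be sketched as two set differences. This assumes a simple lowercase word tokenizer. It also assumes the news-article vocabulary and the manually rejected words are supplied as plain sets. The slides do not say how either was built, so both inputs, the function name, and the example descriptions are hypothetical.

```python
import re


def candidate_keywords(descriptions, news_vocabulary, non_technical):
    """Words in vulnerability descriptions that survive both filters.

    news_vocabulary: words seen in a background news corpus (hypothetical input).
    non_technical: words rejected during the manual pass (hypothetical input).
    """
    words = set()
    for text in descriptions:
        words.update(re.findall(r"[a-z]+", text.lower()))  # crude tokenizer: runs of letters
    remaining = words - news_vocabulary       # drop words that also occur in news articles
    return sorted(remaining - non_technical)  # drop manually rejected words


descriptions = [
    "Buffer overflow in the FTP server allows remote attackers to execute arbitrary code",
    "SQL injection in login.php allows remote attackers to execute arbitrary SQL commands",
]
news_words = {"in", "the", "server", "allows", "remote", "attackers", "to",
              "execute", "code", "login", "commands"}
print(candidate_keywords(descriptions, news_words, {"arbitrary"}))
# ['buffer', 'ftp', 'injection', 'overflow', 'php', 'sql']
```

On the real corpus the manual pass is the expensive step, and this sketch does not model it.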

Preprocessing
• Each vulnerability is a data point
  ─ 608 binary attributes

                   Denial  Service  Buffer  …  Overflow
  CVE-xxxx-yyyy      0        0       1     …     1
  CVE-xxxx-yyyy      1        0       0     …     1
  CVE-xxxx-yyyy      0        1       0     …     0
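Each row of the table above is a presence/absence vector over the keyword list. This is a minimal sketch of that encoding. It assumes the same lowercase word tokenization for descriptions and keywords. The four-keyword list stands in for the full 608, and the function name is hypothetical.

```python
import re


def encode(description, keywords):
    """Binary attribute vector: 1 if the keyword appears in the description, else 0."""
    tokens = set(re.findall(r"[a-z]+", description.lower()))
    return [1 if kw in tokens else 0 for kw in keywords]


keywords = ["denial", "service", "buffer", "overflow"]  # 4 of the 608 keywords, for illustration
print(encode("Buffer overflow allows remote code execution", keywords))  # [0, 0, 1, 1]
```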

Clustering: Scheme
• Selection of the clustering scheme
  ─ The same vulnerability type occurs across different vendors
  ─ E.g., buffer overflow vulnerabilities
    ● can be subdivided into Apple BO, Microsoft BO
• Hierarchical clustering is more suitable than partitional clustering
  ─ Ward's method
    ● Less susceptible to noise
    ● Does not break large clusters
    ● Keeps the SSE (sum of squared errors) small
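The "SSE is small" property follows from how Ward's method chooses merges. In its standard textbook form (squared Euclidean distance), each step merges the pair of clusters A and B whose union increases the total within-cluster SSE the least. Here μ denotes a cluster centroid. For 0/1 vectors, ‖x − y‖² equals the number of attributes on which x and y differ, which is the unnormalized Hamming distance. The slides do not say exactly how the Matlab runs combined Ward with the chosen distance, so the formula below is only the generic criterion.

```latex
\Delta(A,B)
  = \sum_{x \in A \cup B} \lVert x - \mu_{A \cup B} \rVert^2
  - \sum_{x \in A} \lVert x - \mu_A \rVert^2
  - \sum_{x \in B} \lVert x - \mu_B \rVert^2
  = \frac{|A|\,|B|}{|A| + |B|} \, \lVert \mu_A - \mu_B \rVert^2
```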

Clustering: Distance Measure
• Desired: Jaccard
  ─ Not implemented in Weka; problems in Matlab
• Used: Hamming
  ─ Not implemented in Weka; available in Matlab
• Euclidean not used
  ─ The data are asymmetric binary (only the presence of a keyword is informative, not its absence)
• Cosine not used
  ─ Values in many cases become very small but nonzero
  ─ Matlab does not handle them and reports an error
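Why asymmetric binary data matters can be seen by comparing the two measures directly. The definitions below follow the normalized forms used by MATLAB's `pdist`. Hamming is the fraction of coordinates that differ. Jaccard is one minus the Jaccard coefficient, so coordinates that are 0 in both vectors are ignored. The example vectors are hypothetical. Handling two all-zero vectors as distance 0 is a choice made in this sketch.

```python
def hamming(u, v):
    """Fraction of attributes on which two binary vectors differ."""
    return sum(a != b for a, b in zip(u, v)) / len(u)


def jaccard(u, v):
    """1 - |both 1| / |at least one 1|; attributes that are 0 in both are ignored."""
    either = sum(1 for a, b in zip(u, v) if a or b)
    if either == 0:
        return 0.0  # two all-zero vectors: treated as identical in this sketch
    both = sum(1 for a, b in zip(u, v) if a and b)
    return 1 - both / either


# Two hypothetical descriptions sharing one of the three keywords they use
u = [1, 1, 0] + [0] * 7
v = [1, 0, 1] + [0] * 7
print(hamming(u, v), jaccard(u, v))              # 0.2 and about 0.667

# Pad both to 608 attributes with keywords that neither description contains
u608, v608 = u + [0] * 598, v + [0] * 598
print(hamming(u608, v608), jaccard(u608, v608))  # about 0.0033 and about 0.667
```

With 608 mostly-zero attributes, the Hamming distance between any two descriptions is dominated by shared zeros and shrinks toward 0. The Jaccard distance is unaffected by them, which is why it is the desired measure.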

Clustering: Challenges
• Hierarchical clustering uses an N × N proximity matrix (N = number of vulnerabilities)
  ─ Requires about 15.9 GB of RAM in Matlab
• Solution
  ─ Sampling
  ─ 10 sample files randomly generated
    ● 5% sampling rate
• If the data set has valid clusters, each random sample should produce the same centroids
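The memory problem and the effect of sampling come down to simple arithmetic. A dense N × N matrix of 8-byte doubles needs 8N² bytes. Storing only the N(N − 1)/2 pairwise distances, as a condensed distance vector does, roughly halves that. A 5% sample shrinks the matrix by a factor of 0.05² = 1/400. In the sketch below, N = 40,000 is a hypothetical size rather than the paper's count. Drawing each file independently without replacement is an assumption, since the slides do not say whether samples overlap. All function names are hypothetical.

```python
import random


def full_matrix_gib(n: int, bytes_per_entry: int = 8) -> float:
    """Memory for a dense n-by-n matrix of doubles, in GiB."""
    return n * n * bytes_per_entry / 2**30


def condensed_gib(n: int, bytes_per_entry: int = 8) -> float:
    """Memory for only the n(n-1)/2 pairwise distances, in GiB."""
    return n * (n - 1) // 2 * bytes_per_entry / 2**30


def draw_samples(ids, rate=0.05, n_samples=10, seed=0):
    """Draw n_samples random subsets, each holding `rate` of the ids (no repeats within a subset)."""
    rng = random.Random(seed)
    k = round(rate * len(ids))
    return [rng.sample(ids, k) for _ in range(n_samples)]


n = 40_000  # hypothetical data set size, not the paper's count
print(round(full_matrix_gib(n), 2))             # 11.92
print(round(condensed_gib(n), 2))               # 5.96
samples = draw_samples(list(range(n)))
print(len(samples), len(samples[0]))            # 10 2000
print(round(full_matrix_gib(2000) * 1024, 1))   # 30.5 (MiB per sample)
```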

Clustering: Centroids
• 608 attributes
  ─ Value of each attribute: 0 or 1
  ─ Data points lie at the vertices of the 608-dimensional unit hypercube
• Take one cluster at a time and find its centroid
  ─ The value of each of the 608 attributes lies in [0,1]
  ─ A value close to 1 means the keyword occurs in a large fraction of the cluster's data points, and a value close to 0 means it rarely does
  ─ Keep the attributes whose value is greater than 0.8
    ● i.e., keywords that appear in the descriptions of over 80% of the vulnerabilities in the cluster
  ─ E.g., in one cluster
    ● Denial, Service: represent DoS attacks
• This yields the centroids
  ─ The dominant keywords indicate the type of each cluster
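The centroid step above reduces to per-attribute means followed by a threshold. This sketch uses the 0.8 cutoff from the slide. The function names, the four-keyword list, and the five-point cluster are hypothetical.

```python
def centroid(points):
    """Per-attribute mean of binary vectors; each value lies in [0, 1]."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def dominant_keywords(points, keywords, threshold=0.8):
    """Keywords whose centroid value exceeds the threshold."""
    c = centroid(points)
    return [kw for kw, value in zip(keywords, c) if value > threshold]


keywords = ["denial", "service", "buffer", "overflow"]
cluster = [  # hypothetical cluster of five vulnerabilities
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
print(centroid(cluster))                     # [1.0, 1.0, 0.2, 0.2]
print(dominant_keywords(cluster, keywords))  # ['denial', 'service']
```

The comparison is strict, so a keyword present in exactly 80% of a cluster's descriptions is not kept, matching "over 80%" on the slide.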

Clustering: Number of Clusters
• There is no universal way of determining the exact number of clusters
• Visualize the dendrogram
  ─ Decide the appropriate number of clusters

Hierarchical Clustering
[Dendrogram of one sample; leaf labels include SQL, XSS, EXE, DoS, BO, PHP, Misc, Local, and sub-types such as A-EXE, US-EXE, A-BO, C-EXE]

Clustering: Remaining Samples
• The analysis above was performed on one sample
• The same analysis was repeated on the remaining 9 samples
• Centroids obtained from all 10 samples are shown next

Clustering: Intensity Plot of Proximity Matrix

Final Clustering
• We have all 7 centroids
  ─ Assign each data point to its nearest centroid
  ─ Size of each cluster after assigning the points:

    PHP     SQL     BO      XSS     EXE     DoS     Misc
    8.32%   11.2%   10.2%   12.3%   7.25%   14.2%   36.6%
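The slides do not say which distance was used for the final nearest-centroid assignment. This sketch assumes L1 distance between a 0/1 point and a centroid of attribute frequencies. By linearity of expectation, that L1 distance equals the expected number of mismatching attributes against a randomly chosen member of the cluster, which makes it a natural counterpart to the Hamming distance used for clustering. The centroids, points, and function names are hypothetical.

```python
from collections import Counter


def l1(point, center):
    """Sum of absolute differences between a binary point and a centroid."""
    return sum(abs(a - b) for a, b in zip(point, center))


def assign(points, centroids):
    """Index of the nearest centroid (L1 distance) for each point."""
    return [min(range(len(centroids)), key=lambda j: l1(p, centroids[j])) for p in points]


def cluster_shares(labels, names):
    """Percentage of points assigned to each named cluster."""
    counts = Counter(labels)
    return {names[j]: 100 * counts[j] / len(labels) for j in range(len(names))}


# Hypothetical centroids over the keywords [denial, service, buffer, overflow]
names = ["DoS", "BO"]
centroids = [[1.0, 1.0, 0.2, 0.2], [0.1, 0.0, 1.0, 1.0]]
points = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 1]]
labels = assign(points, centroids)
print(labels)                         # [0, 1, 1]
print(cluster_shares(labels, names))  # DoS about 33.3, BO about 66.7
```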

Post-Processing
• Evolution of different types of vulnerabilities
• Evolution of different vulnerability types within vendors
• Evolution of the exploitation behavior of hackers
• Evolution of the patching behavior of vendors