Fabian Yamaguchi, University of Göttingen; Markus Lottmann, Technische Universität Berlin; Konrad Rieck, University of Göttingen. 28th ACSAC (December 2012)

Presentation transcript:

Generalized Vulnerability Extrapolation using Abstract Syntax Trees
Outstanding paper award

Outline
Introduction
Vulnerability Extrapolation
Evaluation
Limitations
2013/1/29 A SEMINAR AT ADVANCED DEFENSE LAB

Introduction
The discovery of vulnerabilities in source code is a central issue of computer security. Much of the existing research, however, is limited to specific conditions and types of vulnerabilities. In practice, the discovery of vulnerabilities still rests mainly on tedious manual auditing that requires considerable time and expertise. Instead of striving for an automated solution, we aim to make manual auditing more effective by guiding the search for vulnerabilities.

Contributions
Generalized vulnerability extrapolation
Structural comparison of code
Evaluation and case studies

Vulnerability Extrapolation
The concept of vulnerability extrapolation builds on the observation that source code often contains several vulnerabilities linked to the same flawed programming patterns. Given a known vulnerability, it is thus often possible to discover previously unknown vulnerabilities by finding functions that share a similar code structure.
This approach has two advantages: (1) it is general and not limited to any specific vulnerability type; (2) the extrapolation does not hinge on any involved analysis machinery.

Schematic Overview [figure]

Robust AST Extraction
Our parser is based on a single grammar definition for the ANTLR parser generator [23] and is publicly available [link].
[Figure legend: API node, syntax node]

Embedding of ASTs in a Vector Space
We describe the AST of each function in our code base using a set of subtrees S. We experiment with the following three definitions of this set:
API nodes: the set S simply consists of all individual API nodes.
API subtrees: the set S is defined as all subtrees of depth D in the code base that contain at least one API node.
API/S subtrees: the set S consists of all subtrees of depth D that contain at least one API or syntax node.
In the following, we fix the depth of subtrees to D = 3.
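The three definitions of S can be illustrated with a small sketch. It is not the authors' implementation: the `Node` class, the `kind` tags, the mode names, and the choice to cut each subtree off after D levels (keeping shorter subtrees near the leaves) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical AST node; kind is "api", "syntax", or "other"."""
    label: str
    kind: str = "other"
    children: List["Node"] = field(default_factory=list)


def walk(node):
    """Yield every node of the tree in pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)


def truncate(node, depth):
    """Serialize the subtree rooted at node, cut off after `depth` levels."""
    if depth == 1 or not node.children:
        return (node.label,)
    return (node.label,) + tuple(truncate(c, depth - 1) for c in node.children)


def kinds_within(node, depth):
    """Collect the node kinds found in the first `depth` levels below node."""
    found = {node.kind}
    if depth > 1:
        for child in node.children:
            found |= kinds_within(child, depth - 1)
    return found


def extract(root, mode, depth=3):
    """Return the features of one function under one of the three definitions."""
    features = []
    for node in walk(root):
        if mode == "api_nodes":
            if node.kind == "api":
                features.append((node.label,))
            continue
        wanted = {"api"} if mode == "api_subtrees" else {"api", "syntax"}
        if kinds_within(node, depth) & wanted:
            features.append(truncate(node, depth))
    return features


# Toy function body: an if-statement with two API calls, then a return.
root = Node("func", "other", [
    Node("if", "syntax", [
        Node("call: malloc", "api", [Node("arg")]),
        Node("call: memcpy", "api"),
    ]),
    Node("return", "syntax"),
])

api_nodes = extract(root, "api_nodes")
api_subtrees = extract(root, "api_subtrees")
api_s_subtrees = extract(root, "api_s_subtrees")
```

In this toy tree the `return` statement contains no API node, so it contributes a feature only under the API/S definition.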

Converting ASTs to Vectors
[Figure: sparse matrix M with dimensions |S| and |X|, labeled Function 1, Function 2, ..., Function |X|; nonzero entries are marked *]
w_s: TF-IDF weighting [link]
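A minimal sketch of how such a weighted matrix could be assembled is shown below. The slide names TF-IDF but does not give the exact formula, so the variant used here (raw count times log(N / document frequency)) is an assumption, as are the function name `build_matrix` and the toy feature lists. Rows correspond to subtrees and columns to functions.

```python
import math
from collections import Counter


def build_matrix(functions):
    """functions maps a function name to its list of subtree features.

    Returns (subtrees, names, M), where M[i][j] is the weight of
    subtree i in function j.
    """
    names = sorted(functions)
    counts = {name: Counter(functions[name]) for name in names}
    subtrees = sorted({s for c in counts.values() for s in c}, key=repr)

    # Assumed IDF variant: log(number of functions / functions containing s).
    n_funcs = len(names)
    doc_freq = {s: sum(1 for name in names if s in counts[name]) for s in subtrees}
    idf = {s: math.log(n_funcs / doc_freq[s]) for s in subtrees}

    M = [[counts[name][s] * idf[s] for name in names] for s in subtrees]
    return subtrees, names, M


# Toy code base with three functions (feature names are hypothetical).
subtrees, names, M = build_matrix({
    "f1": ["malloc", "memcpy", "memcpy"],
    "f2": ["malloc", "strlen"],
    "f3": ["malloc"],
})
```

A subtree that occurs in every function, such as `malloc` here, receives weight zero under this variant, which reflects the intent of down-weighting features that do not discriminate between functions.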

Identification of Structural Patterns
However, we cannot yet compare functions with respect to more involved patterns. For example, the code base of a server application may contain functions related to network communication, message parsing and thread scheduling. It would be better to compare the functions with respect to these functionalities rather than looking at the plain subtrees of the ASTs.
Latent semantic analysis is a classic technique of natural language processing (NLP) that is used for identifying topics in text documents [link]. It determines dominant directions in the vector space. We refer to these directions of related subtrees as structural patterns.

Obtaining Directions
We obtain these d directions by performing a singular value decomposition (SVD) of the matrix M [link].
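Written out, the decomposition takes the standard truncated-SVD form below. The notation is a sketch: it assumes M has one row per subtree in S and one column per function in X, which matches the slides' description of U as holding the structural patterns in its columns and of V as holding one row per function.

```latex
M \approx U \Sigma V^{\top}, \qquad
U \in \mathbb{R}^{|S| \times d}, \quad
\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_d), \quad
V \in \mathbb{R}^{|X| \times d}, \qquad
\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_d \ge 0
```

Under this reading, the columns of U are the d directions, and row j of V describes function j in terms of these directions, up to the scaling in Σ.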

Extrapolation of Vulnerabilities
Three activities can be performed to assist code auditing.
Vulnerability extrapolation: finding structurally similar functions is as simple as comparing the rows of V using a suitable measure, such as the cosine distance [link].
Code base decomposition: the matrix U, which stores the most prevalent structural patterns in its columns, gives important insight into the structure of the code base.
Detection of unusual functions.
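The extrapolation step can be sketched as a ranking by cosine distance over the rows of V. The helper names, the two-dimensional toy rows, and the function names are hypothetical; a real V would have one row per function in the code base and d columns.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cos(angle between a and b); 1.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def rank_similar(V, names, query):
    """Rank all other functions by distance to the known-vulnerable one."""
    q = V[names.index(query)]
    scored = [
        (cosine_distance(q, row), name)
        for name, row in zip(names, V)
        if name != query
    ]
    return sorted(scored)


# Toy rows of V: two parser-like functions and one unrelated function.
names = ["parse_a", "parse_b", "sched"]
V = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
ranking = rank_similar(V, names, "parse_a")
```

An auditor would then inspect the functions at the top of `ranking` first, since they are structurally closest to the known vulnerable function.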

Evaluation
For the evaluation we consider four popular open-source projects.
LibTIFF [link] is a library for processing images in the TIFF format.
Size: 1,292 functions and 52,650 lines of code.
A version of the library contains a stack-based buffer overflow in the parsing of TLV elements (CVE [link]).
Candidate functions are all parsers for TLV elements.

Evaluation (cont.)
Pidgin [link] is a client for instant messaging that implements several communication protocols.
Size: 11,505 functions and 272,866 lines of code.
A version of the client contains a vulnerability in the implementation of the AIM protocol (CVE [link]).
Candidate functions are all AIM protocol handlers that convert incoming binary messages to strings.

Evaluation (cont.)
FFmpeg [link] is a library for conversion of audio and video streams.
Size: 6,941 functions with a total of 298,723 lines of code.
During the decoding of video frames in version 0.6, indices are incorrectly computed (CVE [link]).
Candidate functions are all video decoding routines that write decoded video frames to a pixel buffer.

Evaluation (cont.)
Asterisk [link] is a framework for Voice-over-IP communication.
Size: 8,155 functions and 283,883 lines of code.
A version of the framework contains a vulnerability (CVE [link]) that allows a remote attacker to corrupt the memory of the server via a crafted packet.
Candidate functions are all functions that read incoming packets from UDP/TCP sockets.
We thoroughly inspect each code base and manually label all candidate functions, that is, all functions that potentially contain the same vulnerability. This manual analysis process required several weeks of work.

Quantitative Evaluation
The number of extracted structural patterns is not a critical parameter for vulnerability extrapolation. In the following case studies, we fix this parameter to 70.

Quantitative Evaluation (cont.)

Qualitative Evaluation (Case Study)
In a case study with FFmpeg and Pidgin, we now demonstrate the practical merit of vulnerability extrapolation and show how our method plays the key role in identifying 8 zero-day vulnerabilities. We have conducted two further studies with Pidgin and Asterisk, uncovering 2 more zero-day vulnerabilities. For the sake of brevity, however, we omit these case studies here.

Case Study: FFmpeg
CVE and further vulnerabilities, two of which were zero-day.

Case Study: FFmpeg (cont.)

Case Study: Pidgin
CVE and further vulnerabilities, six of which were zero-day.

Case Study: Pidgin (cont.)

Limitations
The method only identifies potentially vulnerable code. Due to Rice’s theorem [link], however, a generic discovery of vulnerabilities is impossible anyway.
The method requires the existence of a starting vulnerability.
Complex flaws that span several functions across a code base can be difficult for our method to detect.

Q & A