1 Datalog Applied to Software Security Vulnerabilities Datalog Data-Flow Analysis.



2 The Story
- I long thought that Datalog was a dead issue.
- My colleague Monica Lam discovered Datalog and has been using it to write efficient algorithms for important data-flow questions.
- So Datalog gets new life, but not as a database query language.

3 Outline
1. Vulnerabilities.
   - Buffer overflow.
   - SQL injection.
2. Datalog --- review.
3. Andersen's pointer analysis.
   - Explanation.
   - Expression in Datalog.

4 Buffer Overflow
- User provides a string that is later copied to a buffer of fixed length.
- Buffer is located on the run-time stack, so it is near the place where the return address is held.
- String is so long it overwrites the return address.
- Return actually goes to intruder's code.
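The overwrite described on this slide can be imitated in a small model. The sketch below is not real memory corruption: it assumes a made-up frame layout (an 8-byte buffer immediately followed by a 4-byte return address), and `BUF_LEN`, `make_frame`, `unchecked_copy`, and `checked_copy` are hypothetical names chosen for illustration.

```python
import struct

BUF_LEN = 8  # hypothetical fixed buffer length


def make_frame(return_address):
    """Model a frame: BUF_LEN buffer bytes, then a 4-byte return address."""
    return bytearray(BUF_LEN) + bytearray(struct.pack("<I", return_address))


def return_address(frame):
    """Read back the 4-byte return address stored after the buffer."""
    return struct.unpack("<I", bytes(frame[BUF_LEN:BUF_LEN + 4]))[0]


def unchecked_copy(frame, data):
    """Copy with no check against BUF_LEN, in the spirit of strcpy."""
    n = min(len(data), len(frame))  # the model contains only this one frame
    frame[:n] = data[:n]


def checked_copy(frame, data):
    """Copy only if the data fits in the buffer."""
    if len(data) > BUF_LEN:
        raise ValueError("input longer than buffer")
    frame[:len(data)] = data


if __name__ == "__main__":
    frame = make_frame(0x1000)
    attack = b"A" * BUF_LEN + struct.pack("<I", 0xDEADBEEF)
    unchecked_copy(frame, attack)
    print(hex(return_address(frame)))
```

The point of the model is only that a length check placed before the copy is what separates the safe case from the unsafe one, which is the property the data-flow analysis later tries to establish.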

5 The Data-Flow Issue
- Suppose we encounter strcpy(b,a) in a program.
- Is a a string that was obtained by reading input?
- Has some function already checked that a is not too long to fit in buffer b?
- Input and check could have occurred anywhere.

6 SQL Injection
- SQL queries are often constructed by programs, e.g., via JDBC.
- These queries may take constants from user input.
- Careless code can allow quite unexpected queries to be constructed and executed.

7 Example: SQL Injection
- Relation Accounts(name, passwd, acct).
- Web interface: get name and password from user, store in strings n and p, issue query, display account number.
    SELECT acct FROM Accounts
    WHERE name = :n AND passwd = :p

8 User (Who Is Not Yanni) Types
Name: Yanni' --
Password: who cares?
Your account number is ...

9 The Query Executed
    SELECT acct FROM Accounts
    WHERE name = 'Yanni' --' AND passwd = 'who cares?'
Everything after -- is treated as a comment, so the password is never checked.
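The attack can be reproduced with Python's built-in sqlite3 module. This is a minimal sketch under assumptions: the stored password and account value (`'secret'`, `'ACCT-1'`) and the function names are invented for illustration, and the injected input uses `--`, the standard SQL line-comment marker. It contrasts a query built by string concatenation with a parameterized query.

```python
import sqlite3


def make_db():
    """Create an in-memory Accounts(name, passwd, acct) table with one row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Accounts (name TEXT, passwd TEXT, acct TEXT)")
    conn.execute("INSERT INTO Accounts VALUES ('Yanni', 'secret', 'ACCT-1')")
    return conn


def lookup_unsafe(conn, n, p):
    """Paste user input directly into the query text (vulnerable)."""
    query = ("SELECT acct FROM Accounts WHERE name = '" + n +
             "' AND passwd = '" + p + "'")
    return conn.execute(query).fetchall()


def lookup_safe(conn, n, p):
    """Pass user input as bound parameters, never as query text."""
    query = "SELECT acct FROM Accounts WHERE name = ? AND passwd = ?"
    return conn.execute(query, (n, p)).fetchall()


if __name__ == "__main__":
    conn = make_db()
    print(lookup_unsafe(conn, "Yanni' --", "who cares?"))
    print(lookup_safe(conn, "Yanni' --", "who cares?"))
```

In the unsafe version the quote in the name closes the string literal early and the comment marker hides the password test. In the safe version the whole input is compared as a single value, so no account matches.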

10 The Data-Flow Issue
- If a string is passed to JDBC or another query interface, did (part of) the string come from the user?
- If so, has it been checked for validity?
- Again, the string could have come from anywhere in the program.

11 Outline
1. Vulnerabilities.
   - Buffer overflow.
   - SQL injection.
2. Datalog --- review.
3. Andersen's pointer analysis.
   - Explanation.
   - Expression in Datalog.

12 Datalog --- (1)
Atom = p(A,B,C)
- p is the predicate.
- Arguments A, B, C are variables or constants.
Literal = Atom or NOT Atom
Rule = Atom :- Literal & … & Literal
- The body (right of :-): for each assignment of values to variables that makes all these literals true ...
- The head (left of :-): ... make this atom true.

13 Example: Datalog Rules
    path(X,Y) :- edge(X,Y)
    path(X,Y) :- path(X,Z) & path(Z,Y)
"There is a path from X to Y if there is an edge from X to Y, or if there is some Z such that there are paths from X to Z and from Z to Y."
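The two path rules can be evaluated by hand-written Python, as a rough sketch. It assumes relations are represented as sets of tuples; `compute_paths` is a hypothetical name, and the loop simply reapplies both rules until no new tuple appears.

```python
def compute_paths(edges):
    """Naive evaluation of:
         path(X,Y) :- edge(X,Y)
         path(X,Y) :- path(X,Z) & path(Z,Y)
    """
    path = set()
    while True:
        # Rule 1: every edge is a path.
        derived = set(edges)
        # Rule 2: join path with itself on the shared variable Z.
        derived |= {(x, y)
                    for (x, z) in path
                    for (z2, y) in path
                    if z == z2}
        if derived <= path:  # nothing new was derived: fixpoint reached
            return path
        path |= derived


if __name__ == "__main__":
    print(sorted(compute_paths({(1, 2), (2, 3), (3, 4)})))
```

The two set expressions correspond to the two rules: taking their union is the "or" between rules, and the equality test on Z inside the comprehension is the join between subgoals.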

14 Datalog --- (2)
- Intuition: subgoals in the body are combined by "and" (strictly speaking: "join").
- Intuition: multiple rules for a predicate (head) are combined by "or."

15 Datalog --- (3)
- Predicates are implemented by relations.
- Each tuple, or assignment of values to the arguments, represents a ground atom (true or false proposition).

16 EDB Vs. IDB Predicates
- Some predicates are given as input data.
  - Called EDB, or extensional database predicates.
- Others are defined by the rules only.
  - Called IDB, or intensional database predicates.

17 Datalog for Program Analysis
- Inspect the code to produce the EDB.
  - Example: userDef(s,i) = "string s is assigned a user-supplied value at position i."
- Compute IDB relations that tell what you need to know about how values flow through the program.
  - Example: unsafe(x,i) = "string x at position i has a value that is derived from the user."

18 Iterative Algorithm for Datalog
- Start with the EDB predicates = "whatever the code dictates," and with all IDB predicates empty.
- Repeatedly examine the bodies of the rules, and see what new IDB facts can be discovered from the EDB and existing IDB facts.
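The iterative algorithm can be illustrated with the userDef and unsafe predicates introduced earlier. The slides do not give rules for unsafe, so the two rules in the docstring, the extra EDB relation `assign(x, y, j)` ("at position j, x is assigned the value of y"), and the function name are all assumptions made for this sketch. The second rule also ignores later overwrites of a variable, so it is a deliberate over-approximation.

```python
def find_unsafe(user_def, assign):
    """Iterate these assumed rules to a fixpoint:
         unsafe(X,I) :- userDef(X,I)
         unsafe(X,J) :- assign(X,Y,J) & unsafe(Y,I) & I < J
    Returns the unsafe relation and the number of rounds that added facts.
    """
    unsafe = set()  # the IDB relation starts empty
    rounds = 0
    while True:
        derived = set(user_def)
        derived |= {(x, j)
                    for (x, y, j) in assign
                    for (y2, i) in unsafe
                    if y == y2 and i < j}
        new_facts = derived - unsafe
        if not new_facts:
            return unsafe, rounds
        unsafe |= new_facts
        rounds += 1


if __name__ == "__main__":
    user_def = {("s", 1)}                                     # s read from user
    assign = {("t", "s", 2), ("q", "t", 3), ("r", "c", 4)}    # t = s; q = t; r = c
    print(find_unsafe(user_def, assign))
```

Each pass of the loop examines every rule body against the facts known so far, which is the round structure the slide describes. Here r never becomes unsafe because nothing user-derived flows into c.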

19 Outline
1. Vulnerabilities.
   - Buffer overflow.
   - SQL injection.
2. Datalog --- review.
3. Andersen's pointer analysis.
   - Explanation.
   - Expression in Datalog.

20 Andersen's Pointer Analysis
- Computes Java object references.
- Includes "where can this string have come from?"
- Cast of characters:
  1. Local variables, which point to:
  2. Heap objects, which may have fields that are references to other heap objects.

21 Representing Heap Objects
- A heap object is named by the statement in which it is created.
- Note that many run-time objects may have the same name.
- Example: h: T v = new T; says variable v can point to (one of) the heap object(s) created by statement h.
(Figure: v points to h.)

22 Other Relevant Statements
- v.f = w makes the f field of the heap object h pointed to by v point to what variable w points to.
(Figure: variables v and w, heap objects h, g, and i, with field edges labeled f.)

23 Other Statements --- (2)
- v = w.f makes v point to what the f field of the heap object h pointed to by w points to.
(Figure: variables v and w, heap objects h, g, and i, with a field edge labeled f.)

24 Other Statements --- (3)
- v = w makes v point to whatever w points to.
  - Interprocedural analysis: also models copying an actual parameter to the corresponding formal, or a return value to a variable.
(Figure: variables v and w, heap object h.)

25 EDB Relations
- The facts about the statements in the program and what they do to pointers are accumulated and placed in several EDB relations.
- Example: there would be an EDB relation copy(To,From) whose tuples are the pairs (v,w) such that there is a copy statement v=w.

26 Convention for EDB
- Instead of using EDB relations for the various statement forms, we shall simply use the quoted statement itself to stand for an atom derived from the statement.
- Example: "v=w" stands for copy(v,w).

27 IDB Relations
- pts(V,H) will get the set of pairs (v,h) such that variable v can point to heap object h.
- hpts(H1,F,H2) will get the set of triples (h,f,g) such that the field f of heap object h can point to heap object g.

28 Datalog Rules
1. pts(V,H) :- "H: V = new T"
2. pts(V,H) :- "V=W" & pts(W,H)
3. pts(V,H) :- "V=W.F" & pts(W,G) & hpts(G,F,H)
4. hpts(H,F,G) :- "V.F=W" & pts(V,H) & pts(W,G)

29 Example
    T p(T x) {
        h: T a = new T;
        a.f = x;
        return a;
    }
    void main() {
        g: T b = new T;
        b = p(b);
        b = b.f;
    }

30 Apply Rules Recursively --- Round 1
    T p(T x) { h: T a = new T; a.f = x; return a; }
    void main() { g: T b = new T; b = p(b); b = b.f; }
Facts: pts(a,h), pts(b,g)

31 Apply Rules Recursively --- Round 2
    T p(T x) { h: T a = new T; a.f = x; return a; }
    void main() { g: T b = new T; b = p(b); b = b.f; }
Facts: pts(a,h), pts(b,g), pts(b,h), pts(x,g)

32 Apply Rules Recursively --- Round 3
    T p(T x) { h: T a = new T; a.f = x; return a; }
    void main() { g: T b = new T; b = p(b); b = b.f; }
Facts: pts(a,h), pts(b,g), pts(x,g), pts(b,h), hpts(h,f,g), pts(x,h)

33 Apply Rules Recursively --- Round 4
    T p(T x) { h: T a = new T; a.f = x; return a; }
    void main() { g: T b = new T; b = p(b); b = b.f; }
Facts: pts(a,h), pts(b,g), pts(x,g), pts(b,h), pts(x,h), hpts(h,f,g), hpts(h,f,h)
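The four rounds above can be reproduced mechanically. The sketch below is one possible encoding, not the implementation used in the original work: the EDB relation names `alloc`, `copy`, `load`, and `store` are assumptions standing in for the quoted statements, and the call b = p(b) is modeled as two copies (x = b for the parameter, b = a for the return value).

```python
def andersen(alloc, copy, load, store):
    """Round-by-round evaluation of the four rules.
       alloc: (v, h)     for  h: T v = new T
       copy:  (v, w)     for  v = w
       load:  (v, w, f)  for  v = w.f
       store: (v, f, w)  for  v.f = w
    Returns pts, hpts, and the new facts found in each round.
    """
    pts, hpts, rounds = set(), set(), []
    while True:
        d_pts = set(alloc)                                        # rule 1
        d_pts |= {(v, h) for (v, w) in copy
                  for (w2, h) in pts if w == w2}                  # rule 2
        d_pts |= {(v, h) for (v, w, f) in load
                  for (w2, g) in pts if w == w2
                  for (g2, f2, h) in hpts
                  if g == g2 and f == f2}                         # rule 3
        d_hpts = {(h, f, g) for (v, f, w) in store
                  for (v2, h) in pts if v == v2
                  for (w2, g) in pts if w == w2}                  # rule 4
        new_pts, new_hpts = d_pts - pts, d_hpts - hpts
        if not new_pts and not new_hpts:
            return pts, hpts, rounds
        pts |= new_pts
        hpts |= new_hpts
        rounds.append((new_pts, new_hpts))


if __name__ == "__main__":
    pts, hpts, rounds = andersen(
        alloc={("a", "h"), ("b", "g")},
        copy={("x", "b"), ("b", "a")},   # parameter passing and return
        load={("b", "b", "f")},          # b = b.f
        store={("a", "f", "x")},         # a.f = x
    )
    for number, (new_pts, new_hpts) in enumerate(rounds, start=1):
        print(number, sorted(new_pts), sorted(new_hpts))
```

Every rule body in a round is evaluated against the facts from the previous round only, which is why hpts(h,f,h) does not appear until round 4: it needs pts(x,h), which is first derived in round 3.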

34 The Rest of the Story...
- More complex analysis adds arguments to predicates that distinguish:
  1. Flow: where in the code you are.
  2. Context: what sequence of functions appears on the run-time stack.

35 Catching Security Holes
- Since strings in Java are really reference variables, the analysis (with flow and context) lets us know exactly where a string that is used could have been created.
- Dangerous ones were read from the user.

36 Computational Complexity
- Analysis (with flow and context) for million-line programs is beyond the state of the art.
- Whaley and Lam show efficiency enhancements from:
  1. Seminaive (incremental) evaluation of Datalog.
  2. Binary Decision Diagrams (BDDs).
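Seminaive evaluation can be sketched on the path rules from the Datalog review. This illustrates only the general idea (each round joins just the facts that were new in the previous round against everything known so far); it is not Whaley and Lam's BDD-based implementation, and `compute_paths_seminaive` is a hypothetical name.

```python
def compute_paths_seminaive(edges):
    """Seminaive evaluation of:
         path(X,Y) :- edge(X,Y)
         path(X,Y) :- path(X,Z) & path(Z,Y)
    """
    path = set(edges)    # rule 1 needs to be applied only once
    delta = set(edges)   # facts discovered in the most recent round
    while delta:
        # A new derivation must use at least one new fact, on the left ...
        candidates = {(x, y)
                      for (x, z) in delta
                      for (z2, y) in path
                      if z == z2}
        # ... or on the right side of the join.
        candidates |= {(x, y)
                       for (x, z) in path
                       for (z2, y) in delta
                       if z == z2}
        delta = candidates - path
        path |= delta
    return path


if __name__ == "__main__":
    print(sorted(compute_paths_seminaive({(1, 2), (2, 3), (3, 4)})))
```

Compared with re-joining the whole path relation with itself every round, this avoids rediscovering facts whose supporting tuples were all known earlier, and that saving grows with the size of the relations.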

37 Research Suggestions
1. Scaling up to "Windows"-sized programs.
2. Handle weakly typed languages (e.g., C) where anything can be a pointer.
3. Framing defense against new attacks as data-flow problems.