Elsa/Oink/Cqual++: Open-Source Static Analysis for C++
Scott McPeak, Daniel Wilkerson; work with Rob Johnson
CodeCon 2006



Goals
Build extensible infrastructure to
Find certain categories of bugs
  – Exhaustively, within some constraints
At compile time
In real-world C and C++ programs
Using composable analyses

Components
Elkhound: Generalized LR Parser Generator
Elsa: C++ Parser
Oink: Whole-program dataflow
Cqual++: Type qualifier analysis

Elkhound: GLR Parser Generator
GLR eliminates the pain of LALR(1)
  – Unbounded lookahead
  – Allows ambiguous grammars!
10x faster than other GLR implementations
  – Novel combination of GLR and LALR(1)
User-defined disambiguation
  – Early: during parsing
  – Late: after generating AST w/ambiguities


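Example: ‘>’ ambiguity
new C + 4 > + 5 ;
[Figure: two candidate parses of this expression, each grouping tokens into Expr and Type nodes. A callout marks the unparenthesized ‘>’ symbol; one parse is labeled Correct and the other Incorrect.]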

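Example: Type vs. Variable
In C & C++, it is sometimes hard to tell whether a name refers to a type or a variable
(a) & (b)
This parses either as Expr & Expr (a bitwise AND of two expressions) or as (Type) Expr (a cast applied to the address of b).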

Example: Type vs. Variable
In C & C++, it is sometimes hard to tell whether a name refers to a type or a variable
int a;                  // hidden
class C {
  int f(int b) { return (a) & (b); }
  typedef int a;        // visible
};
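To make the two readings concrete, here is a small sketch in which the same token sequence `(a) & (b)` compiles both ways. The namespaces and the choice of `std::uintptr_t` for the type case are assumptions made for illustration; they are not from the slides.

```cpp
#include <cassert>
#include <cstdint>

namespace a_is_variable {
    int a = 6;
    // Here (a) & (b) is a bitwise AND of two int expressions.
    int f(int b) { return (a) & (b); }
}

namespace a_is_type {
    typedef std::uintptr_t a;
    // Here (a) & (b) is a C-style cast of the address &(b) to the type a.
    a f(int b) { return (a) & (b); }
}

// Usage: identical source text, two different meanings.
void checkExample() {
    assert(a_is_variable::f(3) == 2);  // 6 & 3
    assert(a_is_type::f(3) != 0);      // the address of b, as an integer
}
```

A parser cannot choose between these without knowing what `a` names, which is why a GLR parser can keep both alternatives until type checking.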

Elsa: Extensible C++ Front-end
Parses ANSI C++ with GNU extensions
Uses GLR to handle the ambiguities
Extensible components:
  – flex lexer
  – Elkhound parser
  – AST defined with custom tool
  – Type checker

The Elsa Block Diagram
preproc’d source -> Lexer -> token stream -> Parser -> possibly ambiguous AST -> Type Checker -> annotated unambiguous AST -> Post Process -> final AST
No lexer feedback hack!

Extending the Syntax
ANSI or GNU? Both!
  – Declarative language
  – Extend simply by concatenating
ANSI Base:
  nonterm ConditionalExp {
    -> Exp {...}
    -> Exp "?" Exp ":" Exp {...}
  }
GNU Extension:
  nonterm ConditionalExp {
    -> Exp "?" ":" Exp {...}
  }
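A minimal sketch of what "extend by concatenating" could mean for a grammar held in memory: productions for a nonterminal that appears in both specifications are simply appended. The `Grammar` model and the function names are hypothetical and are not Elkhound's actual data structures.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: each nonterminal maps to a list of right-hand sides.
using Production = std::vector<std::string>;
using Grammar = std::map<std::string, std::vector<Production>>;

// Append the extension's productions to whatever the base already defines.
Grammar concatenate(Grammar base, const Grammar& extension) {
    for (const auto& entry : extension) {
        std::vector<Production>& dest = base[entry.first];
        dest.insert(dest.end(), entry.second.begin(), entry.second.end());
    }
    return base;
}

Grammar ansiBase() {
    Grammar g;
    g["ConditionalExp"].push_back(Production{"Exp"});
    g["ConditionalExp"].push_back(Production{"Exp", "?", "Exp", ":", "Exp"});
    return g;
}

Grammar gnuExtension() {
    Grammar g;
    g["ConditionalExp"].push_back(Production{"Exp", "?", ":", "Exp"});
    return g;
}

// Usage: the combined grammar has all three ConditionalExp productions.
void checkExample() {
    Grammar both = concatenate(ansiBase(), gnuExtension());
    assert(both["ConditionalExp"].size() == 3);
}
```

Because a GLR parser tolerates conflicts, appending productions this way does not require reworking the base grammar to keep it LALR(1).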

Declarative Abstract Syntax
class Statement (SourceLoc loc) {
  -> S_compound(ASTList stmts);
  -> S_if(Condition cond, Statement thenBranch, Statement elseBranch);
  -> S_while(Condition cond, Statement body);
  //...
}
Callouts on the slide: superclass name; superclass ctor parameter; subclass names; subclass ctor parameter; subclass ctor list parameter
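As a rough illustration of what such a declaration stands for, the sketch below hand-writes a C++ class hierarchy with the same superclass, subclasses, and constructor parameters. It is an assumption about the general shape only: `SourceLoc` and `Condition` are stand-in types, `kindName()` is invented for the example, and the output of Elsa's AST tool will differ.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct SourceLoc { int line; };          // hypothetical stand-in
struct Condition { std::string text; };  // hypothetical stand-in

class Statement {                        // superclass, ctor takes loc
public:
    explicit Statement(SourceLoc loc) : loc(loc) {}
    virtual ~Statement() = default;
    virtual const char* kindName() const = 0;
    SourceLoc loc;
};
using StmtPtr = std::unique_ptr<Statement>;

class S_compound : public Statement {    // list-valued ctor parameter
public:
    S_compound(SourceLoc loc, std::vector<StmtPtr> stmts)
        : Statement(loc), stmts(std::move(stmts)) {}
    const char* kindName() const override { return "S_compound"; }
    std::vector<StmtPtr> stmts;
};

class S_if : public Statement {
public:
    S_if(SourceLoc loc, Condition cond, StmtPtr thenBranch, StmtPtr elseBranch)
        : Statement(loc), cond(std::move(cond)),
          thenBranch(std::move(thenBranch)), elseBranch(std::move(elseBranch)) {}
    const char* kindName() const override { return "S_if"; }
    Condition cond;
    StmtPtr thenBranch, elseBranch;
};

class S_while : public Statement {
public:
    S_while(SourceLoc loc, Condition cond, StmtPtr body)
        : Statement(loc), cond(std::move(cond)), body(std::move(body)) {}
    const char* kindName() const override { return "S_while"; }
    Condition cond;
    StmtPtr body;
};

// Usage: build `while (x) {}` and inspect it.
void checkExample() {
    StmtPtr body(new S_compound(SourceLoc{1}, std::vector<StmtPtr>()));
    S_while loop(SourceLoc{1}, Condition{"x"}, std::move(body));
    assert(std::string(loop.kindName()) == "S_while");
    assert(std::string(loop.body->kindName()) == "S_compound");
}
```

The declarative form on the slide carries the same information in five lines, which is the argument for generating this boilerplate rather than writing it by hand.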

Extending the Abstract Syntax
ANSI or GNU? Both!
  – Declarative language
  – Extend simply by concatenating
ANSI Base:
  class Statement {
    -> S_decl(Declaration decl);
    -> S_expr(Expression expr);
    -> S_if(...);
    -> S_for(...);
    ...
  }
GNU Extension (GNU nested functions):
  class Statement {
    -> S_function(Function f);
  }

Semantic Analysis
Disambiguate
Compute types
Resolve overloading
Insert implicit conversions
Instantiate templates

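Disambiguation
Ambiguous syntax example:
return (x)(y);
[Figure: AST for this statement. S_return has an expr edge leading to two alternatives, E_cast and E_funCall, joined by an ambiguity link. Node labels: TypeId x, E_variable y. Edge labels: expr, type, func, arg.]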
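One way to picture late disambiguation is a linked list of alternative subtrees from which the type checker keeps the one consistent with the symbol table. The sketch below is a simplification under that assumption; the `Expression` struct and `resolve` function are hypothetical and much smaller than Elsa's real nodes.

```cpp
#include <cassert>
#include <set>
#include <string>

enum class ExprKind { E_cast, E_funCall };

// One alternative parse; `ambiguity` links to the next alternative for the
// same source text, or is nullptr when there are no more.
struct Expression {
    ExprKind kind;
    std::string head;            // the x in (x)(y)
    const Expression* ambiguity;
};

// Keep E_cast when the head names a type, otherwise keep E_funCall.
const Expression* resolve(const Expression* alternatives,
                          const std::set<std::string>& typeNames) {
    for (const Expression* e = alternatives; e != nullptr; e = e->ambiguity) {
        bool headIsType = typeNames.count(e->head) != 0;
        if (headIsType == (e->kind == ExprKind::E_cast)) return e;
    }
    return nullptr;  // no alternative survived
}

// Usage: the two parses of `(x)(y)`, resolved in two different scopes.
void checkExample() {
    Expression call{ExprKind::E_funCall, "x", nullptr};
    Expression cast{ExprKind::E_cast, "x", &call};
    std::set<std::string> xIsType;
    xIsType.insert("x");
    std::set<std::string> xIsVariable;
    assert(resolve(&cast, xIsType)->kind == ExprKind::E_cast);
    assert(resolve(&cast, xIsVariable)->kind == ExprKind::E_funCall);
}
```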

Lowered Output: Simplified C++
Original or Lowered output can be printed
Lowering always done:
  – Templates are instantiated
  – Implicit type conversions inserted
Lowering optionally done:
  – Implicit member functions created
  – Implicit ctor/dtor calls inserted

C++ or XML, In and Out
[Diagram: Elsa reads C++ or XML and writes C++ or XML.]
First pass renders to a canonical form.
Serialization commutes with lowering.

Cqual++: Dataflow
Dataflow Analysis on Type Qualifiers
Successor to Cqual: Jeff Foster, Alex Aiken
char $tainted *getenv();
void printf(char $untainted *fmt, ...);
int main() {
  char *x = getenv("foo");
  printf(x);
}
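The getenv/printf example can be read as a reachability question on a graph of qualifier flows: is there a path from `$tainted` to `$untainted`? The sketch below is a toy model under that reading. The `FlowGraph` class and the node names are invented for illustration and do not reflect Cqual++'s internal representation.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

class FlowGraph {
public:
    void addFlow(const std::string& from, const std::string& to) {
        edges[from].push_back(to);
    }
    // Breadth-first search: can data at `source` reach `sink`?
    bool reaches(const std::string& source, const std::string& sink) const {
        std::set<std::string> seen;
        std::queue<std::string> work;
        seen.insert(source);
        work.push(source);
        while (!work.empty()) {
            std::string node = work.front();
            work.pop();
            if (node == sink) return true;
            auto it = edges.find(node);
            if (it == edges.end()) continue;
            for (const std::string& next : it->second)
                if (seen.insert(next).second) work.push(next);
        }
        return false;
    }
private:
    std::map<std::string, std::vector<std::string>> edges;
};

// Flows for: char *x = getenv("foo"); printf(x);
FlowGraph vulnerableProgram() {
    FlowGraph g;
    g.addFlow("$tainted", "getenv.return");
    g.addFlow("getenv.return", "x");
    g.addFlow("x", "printf.fmt");
    g.addFlow("printf.fmt", "$untainted");
    return g;
}

// Flows for: char *x = getenv("foo"); printf("%s", x);
FlowGraph fixedProgram() {
    FlowGraph g;
    g.addFlow("$tainted", "getenv.return");
    g.addFlow("getenv.return", "x");
    g.addFlow("x", "printf.varargs");   // x is no longer the format string
    g.addFlow("printf.fmt", "$untainted");
    return g;
}

void checkExample() {
    assert(vulnerableProgram().reaches("$tainted", "$untainted"));
    assert(!fixedProgram().reaches("$tainted", "$untainted"));
}
```

Only the source (`getenv`) and the sink (`printf`) carry annotations; the qualifier on `x` is inferred from the edges.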

Feature: Polymorphic Dataflow
int f(int x) { return x; }
int main() {
  int $tainted t = ...;
  int a = f(t);
  int $untainted u = f(3);
}
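The point of the example is that a monomorphic analysis merges both calls to `f`, so taint from `t` appears to reach `u`. The sketch below contrasts the two modes by giving each call site its own copy of `f`'s nodes in the polymorphic case. Copying per call site is a simplification chosen for brevity and is an assumption here, not a description of how Cqual++ implements polymorphism.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using Edges = std::map<std::string, std::set<std::string>>;

// Depth-first reachability over the flow edges.
bool reaches(const Edges& g, const std::string& from, const std::string& to) {
    std::set<std::string> seen;
    std::vector<std::string> stack(1, from);
    while (!stack.empty()) {
        std::string node = stack.back();
        stack.pop_back();
        if (node == to) return true;
        if (!seen.insert(node).second) continue;
        auto it = g.find(node);
        if (it != g.end())
            stack.insert(stack.end(), it->second.begin(), it->second.end());
    }
    return false;
}

// Build the flow edges for the slide's program.
Edges analyze(bool polymorphic) {
    Edges g;
    // In polymorphic mode every call site gets a private copy of f's nodes.
    auto inst = [&](const std::string& node, int site) {
        return polymorphic ? node + "@" + std::to_string(site) : node;
    };
    g["$tainted"].insert("t");
    // call site 1: int a = f(t);
    g["t"].insert(inst("f.x", 1));
    g[inst("f.x", 1)].insert(inst("f.ret", 1));
    g[inst("f.ret", 1)].insert("a");
    // call site 2: int $untainted u = f(3);
    g["3"].insert(inst("f.x", 2));
    g[inst("f.x", 2)].insert(inst("f.ret", 2));
    g[inst("f.ret", 2)].insert("u");
    g["u"].insert("$untainted");
    return g;
}

void checkExample() {
    // Monomorphic: a spurious path through the shared f.x / f.ret nodes.
    assert(reaches(analyze(false), "$tainted", "$untainted"));
    // Polymorphic: the two calls are kept apart, so no warning.
    assert(!reaches(analyze(true), "$tainted", "$untainted"));
}
```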

Feature: “Funky Qualifiers”: Fake Function Bodies
char $_1_2 *strcat(char $_1_2 *dest, const char $_1 *src);
int main() {
  char $tainted *x;
  char $untainted *y;
  strcat(y, x);
}
{1} ⊆ {1,2}
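A sketch of one plausible reading of the numbered qualifiers: the digits after `$_` form a label set, and data may flow from one qualifier variable into another when the first set is a subset of the second. That subset rule is an assumption here, as are the names `Labels` and `flowsTo`.

```cpp
#include <algorithm>
#include <cassert>
#include <set>

using Labels = std::set<int>;

// Assumed rule: $_S may flow into $_T when S is a subset of T.
bool flowsTo(const Labels& from, const Labels& to) {
    return std::includes(to.begin(), to.end(), from.begin(), from.end());
}

// Usage: the strcat signature from the slide.
void checkExample() {
    Labels src;            // const char $_1 *src
    src.insert(1);
    Labels dest;           // char $_1_2 *dest (and the return value)
    dest.insert(1);
    dest.insert(2);
    assert(flowsTo(src, dest));    // src flows into dest
    assert(!flowsTo(dest, src));   // dest does not flow back into src
}
```

Under this reading, the annotated prototype acts as a fake body for `strcat`: calling `strcat(y, x)` with a tainted `x` and an untainted `y` produces a flow from tainted to untainted without analyzing the library source.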

Feature: Separate Compilation for Scalability
“Compile” each file to a dataflow graph
  – only flow behavior between external symbols matters
  – compress by finding smaller graph with same flow behavior; typically saves factor of 12
“Link” each graph
  – AST is gone at linking so we save even more space
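A naive sketch of the compression idea: keep one edge per pair of external symbols connected by some path, and drop every internal node. This is only meant to show that flow behavior between externals is preserved. It is not the algorithm Oink uses, and a plain closure like this is not guaranteed to be the smallest equivalent graph.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using Graph = std::map<std::string, std::set<std::string>>;

// All nodes reachable from `start` by one or more edges.
std::set<std::string> reachableFrom(const Graph& g, const std::string& start) {
    std::set<std::string> seen;
    std::vector<std::string> stack;
    auto pushSuccessors = [&](const std::string& n) {
        auto it = g.find(n);
        if (it != g.end())
            stack.insert(stack.end(), it->second.begin(), it->second.end());
    };
    pushSuccessors(start);
    while (!stack.empty()) {
        std::string n = stack.back();
        stack.pop_back();
        if (seen.insert(n).second) pushSuccessors(n);
    }
    return seen;
}

// Summarize a per-file graph using only its external symbols.
Graph compress(const Graph& g, const std::set<std::string>& externals) {
    Graph out;
    for (const std::string& ext : externals)
        for (const std::string& n : reachableFrom(g, ext))
            if (n != ext && externals.count(n)) out[ext].insert(n);
    return out;
}

// Usage: two internal temporaries disappear, the flows survive.
void checkExample() {
    Graph g;
    g["src"].insert("tmp1");
    g["tmp1"].insert("tmp2");
    g["tmp2"].insert("dst");
    g["other"].insert("tmp2");
    std::set<std::string> externals;
    externals.insert("src");
    externals.insert("dst");
    externals.insert("other");
    Graph small = compress(g, externals);
    assert(small["src"].count("dst") == 1);
    assert(small["other"].count("dst") == 1);
    assert(small.count("tmp1") == 0);
}
```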

Non-Feature: Cqual++ Is Not Flow-Sensitive
q = p;
... time passes ...
p->s = read_from_network();
use_in_untrusting_way(p->s);
// does p == q still??
q->s = "innocuous";
use_in_trusting_way(p->s);
$tainted??

What Exactly Is ‘Data-Flow’?
char *launderString(char *in) {
  int len = strlen(in);
  char *out = malloc(len+1);
  for (int i=0; i<len; ++i) {
    out[i] = 0;
    for (int j=0; j<8; ++j)
      if (in[i] & (1<<j)) out[i] |= (1<<j);
  }
  out[len] = '\0';
  return out;
}
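The laundering function never assigns anything derived from `in` to `out`; it only tests bits of `in` and sets constant bits in `out`. The C++ restatement below, which uses `std::string` instead of `malloc` (a substitution made for this sketch), shows that the result is nevertheless an exact copy, so an analysis that follows only assignments would miss the flow.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Copy `in` bit by bit without ever assigning a value computed from `in`.
std::string launderString(const std::string& in) {
    std::string out(in.size(), '\0');
    for (std::size_t i = 0; i < in.size(); ++i)
        for (int j = 0; j < 8; ++j)
            if (in[i] & (1 << j))            // `in` is only used in a condition
                out[i] = static_cast<char>(out[i] | (1 << j));
    return out;
}

// Usage: the "laundered" string is still a working format-string attack.
void checkExample() {
    assert(launderString("%n%n") == "%n%n");
}
```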

Application: Finding Format-String Vulnerabilities
printf() is an interpreter; the format string is a program
  – %n writes the number of bytes written so far to the memory pointed to by the arg
  – ex: printf("stuff%n", p) means *p = 5
If there is no argument p, printf() writes through some pointer on the stack
  – do not allow untrusted data in the first arg to printf
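A small sketch of the `%n` behavior and of the usual fix, using `std::snprintf` into a local buffer so nothing is printed. It assumes a C library that still honors `%n` for a constant format string, which some hardened platforms do not. The function names are made up for the example.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// "stuff" is 5 bytes, so %n stores 5 through the pointer argument.
int bytesBeforePercentN() {
    char buf[32];
    int count = -1;
    std::snprintf(buf, sizeof buf, "stuff%n", &count);
    return count;
}

// Safe pattern: untrusted text is passed as data, never as the format.
std::string formatAsData(const char* untrusted) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%s", untrusted);
    return std::string(buf);
}

void checkExample() {
    assert(bytesBeforePercentN() == 5);
    // The directives are copied literally instead of being interpreted.
    assert(formatAsData("%n%n") == "%n%n");
}
```

Passing untrusted input as the first argument instead would let the input choose the directives, which is the flow from tainted data to an untainted parameter that the analysis reports.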

Application: Finding User-Kernel Vulnerabilities
Kernel must check that user pointers are valid
  – must point to memory mapped into user process’s address space
  – otherwise the user could manipulate the kernel data
This is also a dataflow/taint analysis

Rob’s Cqual Linux User-Kernel Results
(kernel version missing), full config: 7 bugs, 275 false pos.
(kernel version missing), full config: 6 bugs, 264 false pos.
Including other trials on the same kernels:
  – found 17 different security vulnerabilities
  – found bugs missed by other tools and by manual inspection
  – all but one bug confirmed exploitable
  – significant “bug churn” across kernel versions

Linus’s “Sparse” Tool for User-Kernel Vulnerabilities
Linus also has a tool using type qualifiers
  – it requires manual annotation of every var
In contrast, Cqual++ infers the qualifiers
  – only sources and sinks need be annotated
  – and any “sanitizer” functions
Linus says this “is not the C way”
  – ok, he can write all the annotations

Future Application: Finding Character-Set Confusions
Microsoft confusing ASCII and UCS2
Mozilla has 20-ish different character sets
  – they should only flow together through conversion functions
  – if array sizes differ, confusions can be a security hole too

Oink Vision: Composable Analysis Tools
Compilers refuse to compile bugs
  – well, some classes of bugs
  – and you may have to wait until tomorrow morning to find out
Correctness analysis is expected as part of any compiler toolchain
The analyses are composable and extensible