Checking System Rules Using System-Specific, Programmer-Written Compiler Extensions. Dawson Engler, Benjamin Chelf, Andy Chou and Seth Hallem


Checking System Rules Using System-Specific, Programmer-Written Compiler Extensions
Dawson Engler, Benjamin Chelf, Andy Chou and Seth Hallem
Presented by: Erez Louidor

Traditional Methods for Correctness Checking
Formal verification
- Models are difficult and costly to construct.
- Models need maintenance.
- Models sometimes suffer from over-simplification.
Can be used to rule out deviant behaviors.

Traditional Methods for Correctness Checking (cont.)
Testing
Dynamic
- The number of execution paths grows exponentially with code size:
  - Thorough testing requires writing many tests.
  - Thorough testing takes a lot of time.
- Requires running the tested code.
Can be used to check properties that are very difficult to check by other (static) means, e.g. real-time constraints.

Traditional Methods for Correctness Checking (cont.)
Manual inspection (code review)
- Error prone.
- Impractical to perform thoroughly in large systems; usually done only on “critical” sub-systems.
Specifying the property to check is usually easy.

Static Analysis with Meta Compilation
Compilers can be used to check general restrictions statically:
- “A function can be called only with the parameter types it was declared with.”
- “A function cannot change a ‘const’ object.”
Programming languages are usually too general to express system-specific restrictions:
- “A function that returns with an error must free the resources it acquired.”
- “Check every user-supplied pointer for validity before dereferencing it.”
- “A blocking function must not be called when interrupts are disabled.”

Static Analysis with Meta Compilation (cont.)
Solution: extend the compiler to check system-specific rules.
- Use a meta-language to write the compiler extension, and a meta-compiler to compile the extension.

Static Analysis with Meta Compilation (cont.)
Compilers work with the code itself: no need to construct and maintain a model.
Static:
  Can examine all execution paths.
  Does not require running the code.
  - Some properties are very hard or impossible to check.
Scales well to large systems.
- Can be used to find bugs, not to guarantee their absence.
- Can produce many false warnings.

A Meta Compilation System
Extensions are written in Metal:
- A high-level language based on the state-machine abstraction.
Metal extensions are compiled with the metal compiler.
The compiled extensions are dynamically linked into xg++, a C++ compiler built on top of g++.
The system is “still under development” and is not publicly available.

Example 1: A Metal Interrupt Checker

    sm check_interrupts {
      // Variables used in patterns.
      decl { unsigned } flags;

      // Patterns to specify enable/disable functions.
      pat enable = { sti(); }
                 | { restore_flags(flags); } ;
      pat disable = { cli(); } ;

      // States. The first state is the initial state.
      is_enabled:
          disable ==> is_disabled
        | enable ==> { err("double enable"); }
      ;
      is_disabled:
          enable ==> is_enabled
        | disable ==> { err("double disable"); }
        // Matches when the SM reaches the end of a path in this state.
        | $end_of_path$ ==> { err("exiting with interrupts disabled!"); }
      ;
    }

Metal Overview
A Metal program describes an extended state machine by transition rules.
- Each transition rule specifies:
  - A source state.
  - A pattern.
  - An optional action, specified as C code.
  - An optional destination state.
- If any code matches the pattern and the current state is the source state, metal executes the action and updates the current state to the destination state.
By default, this state machine is applied down every execution path.

Example 1: A Metal Interrupt Checker (cont.)

    /* From Linux kernel drivers/block/raid5.c */
    static struct buffer_head*
    get_free_buffer(struct stripe_head* sh, int b_size)
    {
        struct buffer_head* bh;
        unsigned long flags;

        save_flags(flags);
        cli();
        if ((bh = sh->buffer_pool) == NULL)
            return NULL;    /* returns with interrupts still disabled */
        sh->buffer_pool = bh->b_next;
        bh->b_size = b_size;
        restore_flags(flags);
        return bh;
    }

An extended version of this checker found 82 bugs in the Linux kernel code.
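To make the checker's behaviour on get_free_buffer concrete, the following is a minimal Python simulation of the check_interrupts state machine. It is only a sketch: the event names ("disable", "enable"), the TRANSITIONS table and check_path are hypothetical stand-ins for Metal's pattern matching, and each path is written out by hand rather than extracted from C code.

```python
END = "$end_of_path$"

# (source state, event) -> destination state, or an "error: ..." message.
# Mirrors the rules of check_interrupts; the encoding itself is an assumption.
TRANSITIONS = {
    ("is_enabled", "disable"): "is_disabled",
    ("is_enabled", "enable"): "error: double enable",
    ("is_disabled", "enable"): "is_enabled",
    ("is_disabled", "disable"): "error: double disable",
    ("is_disabled", END): "error: exiting with interrupts disabled",
}

def check_path(events):
    """Run the state machine down one execution path and collect errors."""
    state = "is_enabled"              # the first state is the initial state
    errors = []
    for event in list(events) + [END]:
        result = TRANSITIONS.get((state, event))
        if result is None:
            continue                  # no rule matches: state is unchanged
        if result.startswith("error:"):
            errors.append(result)     # error action, no destination state
        else:
            state = result
    return errors

# The two paths through get_free_buffer, reduced to interrupt events.
early_return_path = ["disable"]            # cli(); return NULL;
normal_path = ["disable", "enable"]        # cli(); ...; restore_flags(flags);

print(check_path(early_return_path))
print(check_path(normal_path))
```

Only the early-return path reaches the end of the function in state is_disabled, which is the bug the slide highlights.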

Example 2: A static memory allocation checker

    sm null_checker {
      decl { scalar } sz;
      decl { const int } retv;
      decl { any_ptr } v1;
      state decl { any_ptr } v;

      start, v.all:
          { ((v = (any)malloc(sz)) == 0) } ==> true=v.null, false=v.not_null
        | { ((v = (any)malloc(sz)) != 0) } ==> true=v.not_null, false=v.null
        | { v = (any)malloc(sz) } ==> v.unknown
      ;
      v.unknown, v.null, v.not_null:
          { (v == 0) } ==> true=v.null, false=v.not_null
        | { (v != 0) } ==> true=v.not_null, false=v.null
      ;
      v.unknown, v.not_null:
          { return retv; } ==>
              { if (mgk_int_cst(retv) < 0) err("Error path leak!"); }
      ;

Example 2: A static memory allocation checker (cont.)

      v.null, v.unknown:
          { *(any *)v } ==> { err("Using pointer illegally!"); }
      ;
      v.unknown, v.null, v.not_null:
          { free(v); } ==> v.freed
      ;
      v.freed:
          { free(v); } ==> { err("Dup free!"); }
        | { v } ==> { err("Use-after-free!"); }
      ;
      v.all:
          { v = v1 } ==> v.stop
      ;
    }

Example 2: A static memory allocation checker (cont.)
An extended version of this checker found 132 errors in the Linux kernel code, 61 of which were false warnings:
- The checker does not handle variable copies.
- It does not detect when a clean-up routine would free the memory.

    /* Drivers/isdn/pcbit:pcbit_init_dev */
    …
    free(dev);
    iounmap((unsigned char*)dev->sh_mem);
    release_mem_region(dev->ph_mem, 4096);
    …

The Analysis Algorithm
The algorithm traverses all paths in the flow graph of the source code.
The algorithm’s state is a list of state-values:
- A global state-value (‘start’ in the previous example).
- A set of variable-instance state-values:
  - Each variable instance is bound to a different program object, e.g. there may be several v’s, each bound to a different pointer.
  - Binding a state-value to a variable adds an instance state to the set (e.g. setting v’s state-value to ‘not_null’).
  - Binding the special state-value ‘stop’ to a variable instance removes that instance from the set of instance states and stops tracking that program object.
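The per-variable part of the state can be sketched in Python as a dictionary from program object to state-value. This is an illustrative simulation of the free-related rules of null_checker only; the event encoding (op, variable name) and the function check_frees are assumptions made for the example, not part of Metal or xg++.

```python
def check_frees(events):
    """Track one state-value per variable instance along a single path.

    events: list of (op, var) pairs, op in {"malloc", "free", "use", "assign"}.
    "assign" stands for the rule { v = v1 } ==> v.stop.
    """
    instances = {}    # variable -> state-value; absent means "not tracked"
    errors = []
    for op, var in events:
        state = instances.get(var)
        if op == "malloc":
            instances[var] = "unknown"       # binding creates an instance
        elif state is None:
            continue                         # untracked object: no rule applies
        elif op == "assign":
            del instances[var]               # 'stop': drop the instance
        elif op == "free":
            if state == "freed":
                errors.append(f"{var}: Dup free!")
            else:
                instances[var] = "freed"
        elif op == "use" and state == "freed":
            errors.append(f"{var}: Use-after-free!")
    return errors

# Two pointers are tracked independently of each other.
print(check_frees([("malloc", "a"), ("malloc", "b"),
                   ("free", "a"), ("use", "b"), ("free", "a")]))
```

Because each instance carries its own state-value, freeing a has no effect on later uses of b, and reassigning a variable simply removes it from the dictionary.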

The Analysis Algorithm (cont.)
At each node in the graph, and for each transition rule, the algorithm iterates over all the state-values in the current state, looking for an applicable transition.
For each node, the algorithm keeps a list of the states in which it has visited that node. Upon reaching a node in an already visited state, the algorithm backtracks.
- Deals with loops (assuming the SM has a finite number of states).
- Speeds up the analysis by pruning redundant paths.
- Assumes the SM is deterministic.
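The caching of visited (node, state) pairs can be illustrated with a small depth-first traversal in Python. This is a sketch under simplifying assumptions: the state is a single global state-value, each flow-graph node carries at most one event, and the names explore, RULES, CFG and EVENTS are hypothetical.

```python
END = "$end_of_path$"

# (state, event) -> (destination state or None, error message or None)
RULES = {
    ("is_enabled", "disable"): ("is_disabled", None),
    ("is_enabled", "enable"): (None, "double enable"),
    ("is_disabled", "enable"): ("is_enabled", None),
    ("is_disabled", "disable"): (None, "double disable"),
    ("is_disabled", END): (None, "exiting with interrupts disabled"),
}

def explore(cfg, events, rules, entry, start_state):
    """Apply the SM down every path; prune (node, state) pairs seen before."""
    visited, errors, visits = set(), set(), 0
    stack = [(entry, start_state)]

    def step(state, event, node):
        rule = rules.get((state, event))
        if rule is None:
            return state
        dest, message = rule
        if message:
            errors.add((node, message))
        return dest or state

    while stack:
        node, state = stack.pop()
        if (node, state) in visited:
            continue                      # already explored: backtrack
        visited.add((node, state))
        visits += 1
        state = step(state, events.get(node), node)
        successors = cfg.get(node, [])
        if not successors:
            step(state, END, node)        # end of an execution path
        stack.extend((succ, state) for succ in successors)
    return errors, visits

# A function with a loop and an early return inside the loop body.
CFG = {"cli": ["loop"], "loop": ["body", "restore"],
       "body": ["loop", "early_return"], "early_return": [],
       "restore": ["return"], "return": []}
EVENTS = {"cli": "disable", "restore": "enable"}

print(explore(CFG, EVENTS, RULES, "cli", "is_enabled"))
```

The back edge from body to loop arrives in a state already recorded for loop, so the traversal stops there instead of unrolling the loop indefinitely.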

Example 3: Assertions
Metal extensions can also operate in a linear, flow-insensitive mode.

    sm Assert flow_insensitive {
      decl { any } expr, x, y, z;
      decl { any_call } any_fcall;
      decl { any_args } args;

      // Find all assert calls. Then apply SM to "expr" in state "in_assert".
      start:
          { assert(expr); } ==> { mgk_expr_recurse(expr, in_assert); }
      ;
      // Find all side-effects. A function call is considered a side-effect.
      in_assert:
          { any_fcall(args) } ==> { err("function call"); }
        | { x = y } ==> { err("assignment"); }   // '=' also matches '+=', '-=', etc.
        | { z++ } ==> { err("post-increment"); }
        | { --z } ==> { err("pre-decrement"); }
      ;
      // We should also check for ++z, z--.
    }
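The same flow-insensitive idea can be tried out on Python source with the standard ast module. This is an analogue rather than a port: it inspects Python assert statements instead of C assert calls, Python has no ++ or -- operators, and the only in-expression assignment is the walrus operator (Python 3.8 or later). The function name assert_side_effects and the sample source are made up for the illustration.

```python
import ast

def assert_side_effects(source):
    """Report calls and assignments inside assert conditions, ignoring flow."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assert):
            continue
        # Recurse into the asserted expression only, as mgk_expr_recurse does.
        for sub in ast.walk(node.test):
            if isinstance(sub, ast.Call):
                findings.append((node.lineno, "function call"))
            elif isinstance(sub, ast.NamedExpr):
                findings.append((node.lineno, "assignment"))
    return sorted(findings)

SAMPLE = (
    "assert x > 0\n"
    "assert pop(stack) == 1\n"
    "assert (n := n + 1) < 10\n"
)

print(assert_side_effects(SAMPLE))
```

As in the Metal version, every function call is treated as a potential side effect, which is conservative and will flag pure helper functions as well.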

Example 3: Assertions (cont.)
The authors also wrote a flow-sensitive metal static assertion checker that tries to find false assertions at compile time.
- Hindered by the primitive dataflow analysis of xg++.
- Found 5 errors in the FLASH cache-coherence code.

Possible Restrictions
Using similar checkers it is possible to check restrictions of the following forms:
- Never/Always do X
  - Never use floating-point in the kernel.
  - Never allocate more than xxx bytes on the stack.
  - Always allocate as much storage as an object needs.
- Never/Always do X before/after Y
  - Check user pointers before using them in the kernel.
- In situation X do (not do) Y
  - If interrupts are disabled, do not block.

Global Analysis
So far, every checker used local analysis:
- The analysis was confined to the function scope.
Many system rules are context dependent and apply globally across function boundaries:
- “A blocking function must not be called when interrupts are disabled.”

Global Analysis (cont.)
The xg++ system described in the paper does not support global analysis well.
To check the above rule in the Linux kernel, the authors used xg++ to compute a list of all possibly-blocking functions:
- A function is possibly-blocking if it either blocks directly or starts a call chain containing a function that blocks directly.
They then applied a metal local-analysis extension to every function, checking that it does not call a possibly-blocking function while interrupts are disabled.
This method found 123 bugs and produced 8 false warnings.
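The first pass, computing the possibly-blocking functions, amounts to a fixed-point closure over the call graph. The Python sketch below assumes the call graph is already available as a dictionary; the function names in it are invented for the example and do not come from the paper.

```python
def possibly_blocking(call_graph, directly_blocking):
    """Return every function that blocks directly or via some call chain.

    call_graph: caller -> iterable of callees.
    """
    blocking = set(directly_blocking)
    changed = True
    while changed:                        # iterate until nothing new is added
        changed = False
        for caller, callees in call_graph.items():
            if caller not in blocking and blocking & set(callees):
                blocking.add(caller)
                changed = True
    return blocking

# Hypothetical call graph; only "schedule" blocks directly.
CALL_GRAPH = {
    "read_block": {"wait_on_buffer"},
    "wait_on_buffer": {"schedule"},
    "update_stats": {"inc_counter"},
    "inc_counter": set(),
}
BLOCKING = possibly_blocking(CALL_GRAPH, {"schedule"})

# Second pass (local): flag calls made while interrupts are disabled.
calls_with_interrupts_off = ["inc_counter", "read_block"]
print([f for f in calls_with_interrupts_off if f in BLOCKING])
```

Splitting the work this way keeps the second pass purely local: each function is checked on its own against the precomputed set, which fits the single-function scope of the checkers shown earlier.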

Optimizing with Meta-Compilation
Domain-specific static analysis can be used for finding optimization opportunities:
- The optimizers built into compilers are inherently general, and therefore may be too conservative for a specific system.
- Some optimizations are system-specific.
The authors used metal extensions to find hundreds of optimization opportunities in the FLASH machine cache-coherence code.