Finding and Understanding Bugs in C Compilers
Xuejun Yang, Yang Chen, Eric Eide, John Regehr
University of Utah

C compilers should be correct
– Part of trusted computing base
– Used to compile OS and safety-critical applications

But sometimes compilers are incorrect
– Fail to compile a valid program
– Generate wrong code

Contributions
Developed Csmith, a random C program generator that is expressive and generates unambiguous code
Used Csmith to find 382 bugs in widely used C compilers
– Most of the bugs have been fixed

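[Figure: the random generator, Csmith, emits a C program. The program is compiled by gcc -O0, gcc -O2, clang -Os, … and the results are put to a vote that separates the majority from the minority.]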
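The voting step in the diagram can be sketched in C. This is a minimal illustration, not Csmith's actual test driver: it assumes each compiled variant's output has already been reduced to a 32-bit checksum, and the function names `majority_checksum` and `flag_minority` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: find the checksum reported by a strict majority of
 * the n compiled variants. Returns 1 and stores it in *winner, or returns 0
 * when no strict majority exists. */
int majority_checksum(const uint32_t *sums, size_t n, uint32_t *winner)
{
    assert(sums != NULL && winner != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t votes = 0;
        for (size_t j = 0; j < n; j++) {
            if (sums[j] == sums[i])
                votes++;
        }
        if (2 * votes > n) {
            *winner = sums[i];
            return 1;
        }
    }
    return 0;
}

/* Sets suspect[i] to 1 for every variant that disagrees with the majority.
 * Returns the number of suspects, or -1 when the vote is inconclusive. */
int flag_minority(const uint32_t *sums, size_t n, int *suspect)
{
    uint32_t winner;
    int count = 0;

    if (!majority_checksum(sums, n, &winner))
        return -1;
    for (size_t i = 0; i < n; i++) {
        suspect[i] = (sums[i] != winner);
        count += suspect[i];
    }
    return count;
}
```

A variant flagged as a suspect is only a candidate wrong-code bug. The vote is meaningful because the generated program is unambiguous, so every correct compiler must produce the same result.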

Why Csmith Works

Unambiguous: avoid undefined or unspecified behaviors that create ambiguous meanings of a program
– Integer undefined behavior
– Use without initialization
– Unspecified evaluation order
– Use of dangling pointer
– Null pointer dereference
– OOB array access

Expressiveness: support most commonly used C features
– Integer operations
– Loops (with break/continue)
– Conditionals
– Function calls
– Const and volatile
– Structs and bitfields
– Pointers and arrays
– Goto

Avoiding Undefined/Unspecified Behaviors

Problem                         Generation Time Solution               Run Time Solution
Integer undefined behaviors     Constant folding/propagation,          Safe math wrappers
                                algebraic simplification
Use without initialization      Explicit initializers
OOB array access                Force index within range               Take modulus
Null pointer dereference        Inter-procedural points-to analysis
Use of dangling pointers        Inter-procedural points-to analysis
Unspecified evaluation order    Inter-procedural effect analysis
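The two run-time solutions in the table, safe math wrappers and taking the modulus of an array index, might look roughly like the following. This is a sketch under assumptions: the names `safe_add_int`, `safe_div_int`, and `wrap_index` are hypothetical, and returning the first operand when the operation would be undefined is a choice made for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Signed addition that never overflows. When a + b would overflow,
 * the first operand is returned instead (a choice made for this sketch). */
int safe_add_int(int a, int b)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return a;
    return a + b;
}

/* Signed division that avoids both undefined cases:
 * division by zero and INT_MIN / -1. */
int safe_div_int(int a, int b)
{
    if (b == 0 || (a == INT_MIN && b == -1))
        return a;
    return a / b;
}

/* "Take modulus": map any index into the valid range [0, len). */
size_t wrap_index(size_t i, size_t len)
{
    assert(len > 0);
    return i % len;
}
```

Because every compiler sees the same wrapper code, the fallback value is part of the program's defined meaning, and all correct compilers still have to agree on the result.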

[Figure sequence: Code Generator and Generation Time Analyzer. The code generator builds an assignment with an LHS and an RHS (a call to func_2, …). It first proposes *q; the analyzer validates the candidate and answers "no", so the candidate is discarded. It then proposes *p; validation answers "yes" and the analyzer updates its facts.]
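The propose, validate, and update-facts cycle shown in these slides can be sketched as a small C model. All types and names here are hypothetical, the facts are reduced to "null" or "valid" per pointer, and candidates are tried in a fixed order so the behavior is deterministic, whereas a real generator would choose them at random.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_PTRS 2

/* Simplified points-to fact for each pointer variable. */
typedef enum { PT_NULL, PT_VALID } PointsTo;

/* The kinds of statement the toy generator may propose. */
typedef enum { STMT_DEREF_WRITE, STMT_TAKE_ADDRESS, STMT_SET_NULL } StmtKind;

typedef struct { StmtKind kind; int ptr; } Stmt;
typedef struct { PointsTo fact[NUM_PTRS]; } Facts;

/* Analyzer check: a dereference is accepted only if the pointer is
 * known to be valid at this program point. */
bool validate(const Facts *facts, Stmt s)
{
    assert(s.ptr >= 0 && s.ptr < NUM_PTRS);
    if (s.kind == STMT_DEREF_WRITE)
        return facts->fact[s.ptr] == PT_VALID;
    return true;
}

/* After a statement is accepted, record its effect on the facts. */
void update_facts(Facts *facts, Stmt s)
{
    if (s.kind == STMT_TAKE_ADDRESS)
        facts->fact[s.ptr] = PT_VALID;
    else if (s.kind == STMT_SET_NULL)
        facts->fact[s.ptr] = PT_NULL;
}

/* Try candidates in order and commit the first one that validates.
 * Returns its index, or -1 if every candidate is rejected. */
int generate_statement(Facts *facts, const Stmt *candidates, int n)
{
    for (int i = 0; i < n; i++) {
        if (validate(facts, candidates[i])) {
            update_facts(facts, candidates[i]);
            return i;
        }
    }
    return -1;
}
```

With pointer 0 playing the role of q (null) and pointer 1 the role of p (valid), a dereference of q is rejected and the dereference of p is accepted, mirroring the "no" and "yes" outcomes in the figure.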

Bugs reported from March 2008 to present

Compiler                                                        Bugs reported (fixed)
GCC                                                             104 (86)
LLVM                                                            228 (221)
Others (CompCert, icc, armcc, tcc, cil, suncc, open64, etc.)    50
Total                                                           382

GCC: accounts for 1% of the total valid GCC bugs reported in the same period
LLVM: accounts for 3.5% of the total valid LLVM bugs reported in the same period

Do they matter?
– 25 priority 1 bugs for GCC
– 8 of our bugs were re-reported by others

Bug Distribution Across Compiler Stages

                GCC    LLVM
Front end         1      11
Middle end       71      93
Back end         28      78
Unclassified      4      46
Total           104     228

[Figures: Coverage of GCC; Coverage of LLVM/Clang]

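Common Compiler Bug Pattern

[Figure: a compiler optimization runs an analysis, then a safety check of the form if (condition1 && condition2). On Y the transformation is applied; on N it is skipped. The annotation marks a missing safety condition in the check.]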
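As a toy instance of this pattern, consider replacing a signed division by a power of two with a logical right shift. The example is an assumption chosen for illustration and is not taken from any of the reported bugs; the function names are hypothetical. The rewrite is valid only when both conditions hold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* condition1: the divisor is a positive power of two. */
bool is_power_of_two(int32_t d)
{
    return d > 0 && (d & (d - 1)) == 0;
}

/* Exact base-2 logarithm of a positive power of two. */
unsigned log2_exact(int32_t d)
{
    unsigned k = 0;
    while (d > 1) {
        d /= 2;
        k++;
    }
    return k;
}

/* Safety check guarding the transformation. */
bool shift_is_safe(int32_t x, int32_t d)
{
    return is_power_of_two(d)   /* condition1 */
        && x >= 0;              /* condition2: easy to forget */
}

/* "Optimized" division: use a shift when the check passes,
 * otherwise fall back to the ordinary division. */
int32_t divide(int32_t x, int32_t d)
{
    assert(d != 0);
    assert(!(x == INT32_MIN && d == -1));
    if (shift_is_safe(x, d))
        return (int32_t)((uint32_t)x >> log2_exact(d));
    return x / d;
}
```

If condition2 were dropped from `shift_is_safe`, `divide(-3, 2)` would take the shift path and return 2147483646 instead of -1: the transformation itself is fine, but the guard is incomplete.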

CompCert Bugs
Certified C compiler
11 bugs reported
– All in the unproved front end or back end
– No bugs in the proved part

Developing compiler optimizations within a proof framework is helpful for compiler correctness

Conclusion
By randomly generating expressive and unambiguous test cases, we have found, and continue to find, compiler bugs effectively

Csmith is open source: