Improving Reliability of Compilers


Improving Reliability of Compilers Nuno Lopes Joint work with David Menendez, Santosh Nagarakatte, John Regehr, Sarah Winkler

Compilers… Frontend → Optimization 1 → … → Optimization n → Backend → machine code (100101010…)

Why care about Compilers? Today's software goes through at least one compiler; correctness and safety depend on compilers. (Stack: Applications → OS → Compiler → Hardware)

Pressure for better Compilers: improve performance, reduce code size, reduce energy consumption.

Compilers do deliver. LLVM 3.2 introduced a loop vectorizer: performance improvements of 10–300% in benchmarks. Visual Studio 2015 Update 3: > 10% on SPEC CPU 2000/2006.

Compilers also have Bugs
Csmith [PLDI’11]: 79 bugs in GCC (25 P1); 202 bugs in LLVM.
Orion [PLDI’14]: 40 wrong-code bugs in GCC; 42 wrong-code bugs in LLVM.
Last week: 476 open wrong-code bug reports in GCC (out of 9,930); 22 open wrong-code bug reports in LLVM (out of 7,002).

Buggy Compilers = Security Bugs. CVE-2006-1902: GCC 4.1/4.2 (fold-const.c) had a bug that could remove valid pointer comparisons, which removed bounds checks in, e.g., Samba.

Churn in Compilers’ code. gcc: [chart of code churn]

March to Disaster?
1974: idea of using compilers to introduce backdoors (US Air Force report on Multics)
1984: 1st compiler backdoor, by Ken Thompson
2006: 1st CVE report on a compiler introducing a security vulnerability
2015: academics introduce a backdoor in “sudo” through a known bug in LLVM
Next: a backdoor in Windows/Linux?
(timeline: 1970–2020)

Motivation: pressure to improve compilers → more bugs in compilers → potential crashes and vulnerabilities in compiled software.

Software Development Practices
Unit tests, end-to-end tests: since ever?
Fuzzing, static analysis: since the 2000s (SLAM, PREfast, SAGE, Z3…)
Verification of pseudo-code: ?
…

Software vs Compiler Development Practices
Software: unit tests, end-to-end tests, fuzzing, static analysis, verification of pseudo-code, …
Compilers: unit tests, end-to-end tests, fuzzing (Csmith, the industry standard [PLDI’11]), static analysis (?), semi-automated verification (CompCert, 2008), …

Why no Static Analysis for Compilers? Static analysis for general software targets safety (i.e., no crashes); compiler correctness needs functional verification. Safety checking is still an open research problem, and functional verification is harder.

“Static Analysis” for Compilers
Compiler verification: verify the compiler once and for all (e.g., CompCert, Alive [PLDI’15]).
Compiler validation: verify the output of the compiler on every compilation (utcTV).
For new code: specify in a DSL and verify automatically. For legacy code: validation.

Optimizations are Easy to Get Wrong (c and d are constants)

Source:
  int a = x << c;
  int b = a / d;
Target:
  int t = d / (1 << c);
  int b = x / t;

School math vs compiler math: the source computes x · 2^c / d; the target computes x / (d / 2^c), which on paper equals x · 2^c / d.

Optimizations are Easy to Get Wrong

int a = x << c; int b = a / d;  →  int t = d / (1 << c); int b = x / t;

ERROR: Domain of definedness of Target is smaller than Source's for i4 %b
Example:
%X i4 = 0x0 (0)
c i4 = 0x3 (3)
d i4 = 0x7 (7)
%a i4 = 0x0 (0)
(1 << c) i4 = 0x8 (8, -8)
%t i4 = 0x0 (0)
Source value: 0x0 (0)
Target value: undef

LLVM bug #21245. Even people that do this every day get it wrong.
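The failure is easy to replay outside the compiler. A minimal C++ sketch (using plain int rather than the 4-bit values in the trace; source/target are illustrative names): with x = 0, c = 3, d = 7 the source is well defined and yields 0, while the rewritten code divides by zero.

```cpp
#include <cassert>
#include <optional>

// Source pattern: b = (x << c) / d. nullopt models undefined behavior.
std::optional<int> source(int x, int c, int d) {
    if (d == 0) return std::nullopt;   // division by zero is undefined
    return (x << c) / d;
}

// Rewritten pattern: t = d / (1 << c); b = x / t
std::optional<int> target(int x, int c, int d) {
    int t = d / (1 << c);
    if (t == 0) return std::nullopt;   // the rewrite can introduce a division by zero
    return x / t;
}
```

With d = 7 and c = 3, t becomes 7 / 8 = 0, so the target divides by zero: exactly the shrunken “domain of definedness” in the error message.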

Implementing Peephole Optimizers
{
  Value *Op1C = Op1;
  BinaryOperator *BO = dyn_cast<BinaryOperator>(Op0);
  if (!BO || (BO->getOpcode() != Instruction::UDiv &&
              BO->getOpcode() != Instruction::SDiv)) {
    Op1C = Op0;
    BO = dyn_cast<BinaryOperator>(Op1);
  }

  Value *Neg = dyn_castNegVal(Op1C);
  if (BO && BO->hasOneUse() &&
      (BO->getOperand(1) == Op1C || BO->getOperand(1) == Neg) &&
      (BO->getOpcode() == Instruction::UDiv ||
       BO->getOpcode() == Instruction::SDiv)) {
    Value *Op0BO = BO->getOperand(0), *Op1BO = BO->getOperand(1);

    // If the division is exact, X % Y is zero, so we end up with X or -X.
    if (PossiblyExactOperator *SDiv = dyn_cast<PossiblyExactOperator>(BO))
      if (SDiv->isExact()) {
        if (Op1BO == Op1C)
          return ReplaceInstUsesWith(I, Op0BO);
        return BinaryOperator::CreateNeg(Op0BO);
      }

    Value *Rem;
    if (BO->getOpcode() == Instruction::UDiv)
      Rem = Builder->CreateURem(Op0BO, Op1BO);
    else
      Rem = Builder->CreateSRem(Op0BO, Op1BO);
    Rem->takeName(BO);

    if (Op1BO == Op1C)
      return BinaryOperator::CreateSub(Op0BO, Rem);
    return BinaryOperator::CreateSub(Rem, Op0BO);
  }
}

Alive New language and tool for: Specifying peephole optimizations Proving them correct (or generate a counterexample) Generating C++ code for a compiler Design point: both practical and formal

A Simple Peephole Optimization
int f(int x, int y) { return (x / y) * y; }
Compile to LLVM IR:
define i32 @f(i32 %x, i32 %y) {
  %1 = sdiv i32 %x, %y
  %2 = mul i32 %1, %y
  ret i32 %2
}
Optimize:
define i32 @f(i32 %x, i32 %y) {
  %1 = srem i32 %x, %y
  %2 = sub i32 %x, %1
  ret i32 %2
}

A Simple Peephole Optimization
define i32 @f(i32 %x, i32 %y) {
  %1 = sdiv i32 %x, %y
  %2 = mul i32 %1, %y
  ret i32 %2
}
=>
%1 = srem i32 %x, %y
%2 = sub i32 %x, %1

A Simple Peephole Optimization
%1 = sdiv i32 %x, %y
%2 = mul i32 %1, %y
=>
%t = srem i32 %x, %y
%2 = sub i32 %x, %t

A Simple Peephole Optimization
%1 = sdiv %x, %y
%2 = mul %1, %y
=>
%t = srem %x, %y
%2 = sub %x, %t

A Simple Peephole Optimization
Name: sdiv general
%1 = sdiv %x, %y
%2 = mul %1, %y
=>
%t = srem %x, %y
%2 = sub %x, %t

Name: sdiv exact
%1 = sdiv exact %x, %y
%2 = mul %1, %y
=>
%2 = %x

Alive Language
Precondition:
  Pre: C2 % (1<<C1) == 0
Source template:
  %s = shl nsw %X, C1
  %r = sdiv %s, C2
=>
Target template:
  %r = sdiv %X, C2/(1<<C1)
Predicates in preconditions may be the result of a dataflow analysis.

Alive Language
Pre: C2 % (1<<C1) == 0
%s = shl nsw %X, C1
%r = sdiv %s, C2
=>
%r = sdiv %X, C2/(1<<C1)
Generalized from LLVM IR: symbolic constants (C1, C2) and implicit types.
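Alive proves such transformations with an SMT solver over all feasible types; for intuition, the same refinement claim can be checked by brute force at one small width. A toy C++ sketch over i8 (helper names and the use of nullopt to model undefined behavior are this sketch's conventions, not Alive's):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Source template over i8: %s = shl nsw %X, C1 ; %r = sdiv %s, C2.
// nullopt models undefined behavior (over-shift, nsw overflow, division UB).
std::optional<int8_t> src(int8_t x, int c1, int8_t c2) {
    if (c1 < 0 || c1 >= 8) return std::nullopt;     // shift by >= bitwidth is undefined
    int wide = int(x) * (1 << c1);                  // shl nsw == exact multiply by 2^C1
    if (wide != int8_t(wide)) return std::nullopt;  // nsw: signed overflow is undefined
    int8_t s = int8_t(wide);
    if (c2 == 0 || (s == INT8_MIN && c2 == -1)) return std::nullopt;
    return int8_t(s / c2);
}

// Target template: %r = sdiv %X, C2/(1<<C1)
std::optional<int8_t> tgt(int8_t x, int c1, int8_t c2) {
    int8_t d = int8_t(c2 / (1 << c1));
    if (d == 0 || (x == INT8_MIN && d == -1)) return std::nullopt;
    return int8_t(x / d);
}

// Wherever the precondition holds and the source is defined,
// the target must be defined and produce the same value.
bool check_all() {
    for (int x = -128; x < 128; ++x)
        for (int c1 = 0; c1 < 8; ++c1)
            for (int c2 = -128; c2 < 128; ++c2) {
                if (c2 == 0 || c2 % (1 << c1) != 0) continue;  // precondition
                auto s = src(int8_t(x), c1, int8_t(c2));
                if (!s) continue;                  // source UB: target unconstrained
                auto t = tgt(int8_t(x), c1, int8_t(c2));
                if (!t || *t != *s) return false;
            }
    return true;
}
```

Exhaustive checking is only feasible at small widths; the point of the SMT encoding is to get the same guarantee for every bitwidth at once.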

Refinement Constraints
Alive transformation → typing constraints → refinement constraints → C++

Type Inference
Encode type constraints in SMT: operands have the same type; the result of trunc has fewer bits than its argument.
Find all solutions to the constraints, up to a bounded bitwidth (e.g., 64).
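As a toy illustration of "find all solutions", the typing constraints of a single trunc can be solved by plain enumeration instead of SMT (the function name and the small bound are illustrative):

```cpp
#include <utility>
#include <vector>

// Enumerate all (argument width, result width) pairs for `%r = trunc %x`
// up to max_width, keeping those satisfying the trunc constraint:
// the result has strictly fewer bits than the argument.
std::vector<std::pair<int, int>> trunc_typings(int max_width) {
    std::vector<std::pair<int, int>> out;
    for (int wx = 1; wx <= max_width; ++wx)
        for (int wr = 1; wr < wx; ++wr)
            out.push_back({wx, wr});
    return out;
}
```

Each solution yields one concrete typing of the optimization that must then pass the refinement check.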

Correctness Criteria
Target invokes undefined behavior only when the source does.
Result of target = result of source when the source does not invoke undefined behavior.
Final memory states are equivalent.
LLVM has 3 types of UB: poison values, undef values, and true UB.
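Applied to the (x / y) * y ⇒ x - x % y rewrite from the earlier slides, the first two criteria can be checked exhaustively at 8 bits (a sketch; nullopt stands in for undefined behavior, and memory is irrelevant for this optimization):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Source: (x / y) * y, over i8. nullopt models sdiv's undefined behavior.
std::optional<int8_t> source(int8_t x, int8_t y) {
    if (y == 0 || (x == INT8_MIN && y == -1)) return std::nullopt;
    return int8_t(int8_t(x / y) * y);
}

// Target: x - x % y. srem has the same UB conditions as sdiv.
std::optional<int8_t> target(int8_t x, int8_t y) {
    if (y == 0 || (x == INT8_MIN && y == -1)) return std::nullopt;
    return int8_t(x - int8_t(x % y));
}

// Criterion 1: target is UB only when source is.
// Criterion 2: values agree whenever the source is defined.
bool refines() {
    for (int x = -128; x < 128; ++x)
        for (int y = -128; y < 128; ++y) {
            auto s = source(int8_t(x), int8_t(y));
            if (!s) continue;                  // source UB: target unconstrained
            auto t = target(int8_t(x), int8_t(y));
            if (!t || *t != *s) return false;  // target must be defined and equal
        }
    return true;
}
```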

The story of a new optimization
August 2014: a developer wrote a new optimization, 40 lines of code, that improves benchmarks:
3.8% on perlbmk (SPEC CPU 2000)
1% on perlbench (SPEC CPU 2006)
1.2% on perlbench (SPEC CPU 2006) w/ LTO+PGO

The story of a new optimization
The first patch was wrong:

Pre: isPowerOf2(C1 ^ C2)
%x = add %A, C1
%i = icmp ult %x, C3
%y = add %A, C2
%j = icmp ult %y, C3
%r = or %i, %j
=>
%and = and %A, ~(C1 ^ C2)
%lhs = add %and, umax(C1, C2)
%r = icmp ult %lhs, C3

ERROR: Mismatch in values of %r
Example:
%A i4 = 0x0 (0)
C1 i4 = 0xA (10, -6)
C3 i4 = 0x5 (5)
C2 i4 = 0x2 (2)
%x i4 = 0xA (10, -6)
%i i1 = 0x0 (0)
%y i4 = 0x2 (2)
%j i1 = 0x1 (1, -1)
%and i4 = 0x0 (0)
%lhs i4 = 0xA (10, -6)
Source value: 0x1 (1, -1)
Target value: 0x0 (0)
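The reported counterexample is easy to replay by hand. A C++ sketch modeling the i4 values as masked bytes (helper names add4/ult4/src/tgt are illustrative): with %A = 0, C1 = 10, C2 = 2, C3 = 5 the precondition holds (10 ^ 2 = 8, a power of two), yet source and target disagree.

```cpp
#include <cassert>
#include <cstdint>

// i4 arithmetic: values live in 0..15; ult is LLVM's unsigned comparison.
static uint8_t add4(uint8_t a, uint8_t b) { return uint8_t((a + b) & 0xF); }
static bool ult4(uint8_t a, uint8_t b) { return a < b; }

// Source: %r = (%A + C1 <u C3) || (%A + C2 <u C3)
bool src(uint8_t A, uint8_t C1, uint8_t C2, uint8_t C3) {
    return ult4(add4(A, C1), C3) || ult4(add4(A, C2), C3);
}

// First (buggy) patch: %r = ((%A & ~(C1 ^ C2)) + umax(C1, C2)) <u C3
bool tgt(uint8_t A, uint8_t C1, uint8_t C2, uint8_t C3) {
    uint8_t masked = uint8_t(A & (~(C1 ^ C2) & 0xF));
    uint8_t umax = C1 > C2 ? C1 : C2;
    return ult4(add4(masked, umax), C3);
}
```

Tracing through: 0 + 2 = 2 <u 5 makes the source true, but the target masks %A to 0, adds umax = 10, and 10 <u 5 is false.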

The story of a new optimization
The second patch was wrong. The third patch was correct, and it still fires on the benchmarks!

Pre: C1 u> C3 && C2 u> C3 && isPowerOf2(C1 ^ C2) &&
     isPowerOf2(-C1 ^ -C2) && (-C1 ^ -C2) == ((C3-C1) ^ (C3-C2)) &&
     abs(C1-C2) u> C3
%x = add %A, C1
%i = icmp ult %x, C3
%y = add %A, C2
%j = icmp ult %y, C3
%r = or %i, %j
=>
%and = and %A, ~(C1^C2)
%lhs = add %and, umax(C1,C2)
%r = icmp ult %lhs, C3

Experiments
Translated > 300 optimizations from LLVM's InstCombine to Alive. Found 8 bugs; the remaining optimizations were proved correct.
Automatic optimal postcondition strengthening: significantly better than developers.
Replaced InstCombine with automatically generated code.

InstCombine: Stats per File
File               # opts.   # translated   # bugs
AddSub             67        49             2
AndOrXor           165       131
Calls              80        -
Casts              77
Combining          63
Compares           245
LoadStoreAlloca    28        17
MulDivRem          65        44             6
PHI                12
Select             74        52
Shifts             43        41
SimplifyDemanded   75
VectorOps          34
Total              1,028     334            8
14% wrong! Found bugs even after all the fuzzing.

Optimal Attribute Inference
Pre: C1 % C2 == 0
%m = mul nsw %X, C1
%r = sdiv %m, C2
=>
%r = mul nsw %X, C1/C2
nsw states that the operation will not result in a signed overflow. Developers are not confident in adding/propagating attributes.

Optimal Attribute Inference
Weakened 1 precondition. Strengthened the postcondition for 70 (21%) of the optimizations; 40% for AddSub, MulDivRem, and Shifts.
Postconditions state, e.g., when an operation will not overflow. Developers are not confident in adding/propagating attributes.

Long Tail of Optimizations: SPEC gives poor code coverage.

Alive is Useful! Released as open source in Fall 2014. In use by developers across 6 companies. Has already caught dozens of bugs in new patches. There are talks about replacing InstCombine.

utcTV: Validation for UTC
Frontend → Optimization 1 → … → Optimization n → Backend → machine code
utcTV checks each optimization step and reports OK / Bug.

utcTV: Validation for UTC
New compiler switch: /d2Verify. Validates optimizers at the IR/IL level. No full correctness guarantee (yet) for some cases.

utcTV: Early Results
Benchmark   Lines of code   Compile with /d2Verify   Slowdown
bzip2       7k              5 min                    106x
gcc         754k            8 hours                  186x
gzip        9k              2 min                    70x
sqlite3     189k            1 h 20 min               234x
Z3          500k            17 hours                 32x
Note: 32-bit, single-threaded compiler

Conclusion
Today's software passes through at least one compiler. Compilers can introduce bugs and security vulnerabilities into otherwise correct programs. We need reliable compilers; Alive and utcTV are first steps in this direction. Can we replicate the success of static analysis?