Lecture 24: Hiding Exploit in Compilers (bootstrapping, self-generating code, tombstone diagrams)

Presentation transcript:

Lecture 24: Hiding Exploit in Compilers
bootstrapping, self-generating code, tombstone diagrams
Ras Bodik, Mangpo and Ali
Hack Your Language! CS164: Introduction to Programming Languages and Compilers, Spring 2013, UC Berkeley

Outline
Ken Thompson's "Reflections on Trusting Trust"
- You can teach a compiler to propagate an exploit
- General lessons on bootstrapping a compiler
Tombstone diagrams
- a visual notation for explaining bootstrapping
- a Datalog implementation
Reading (optional):
- Reflections on Trusting Trust
- A Formalism for Translator Interactions

Reflections on Trusting Trust
Ken Thompson, Turing Award, 1983
- a Berkeley graduate
- maybe :-) a former cs164 student
- best known for his work on Unix
- we also know him for the RE-to-NFA algorithm

The DNA: a self-replicating program

A self-replicating program P
It seems that P prints itself twice. Why?
    char s[] = { ' ', '0', ' ', '}', ';', '\n', '\n', '/', '*', ' ', 'T', …, 0 };
    /* The string is a representation of the body of this program
     * from '0' to the end. */
    main() {
        int i;
        printf("char s[] = {\n");                     // 0:
        for (i=0; s[i]; i++) printf("%d, ", s[i]);    // 1: print s once
        printf("%s", s);                              // 2: print s again
    }

What is printed by each print statement
    char s[] = { ' ', '0', ' ', '}', ';', '\n', '\n', '/', '*', ' ', 'T', …, 0 };
    /* The string is a representation of the body of this program
     * from '0' to the end. */
    main() {
        int i;
        printf("char s[] = {\n");
        for (i=0; s[i]; i++) printf("%d, ", s[i]);
        printf("%s", s);
    }
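The two uses of s can be tried on a short string. The following is a minimal sketch, not the slide's program: the helper name `replicate` and its buffer interface are assumptions made for illustration, and it collects into a buffer what the three print statements would send to standard output. It assumes an ASCII platform for the character codes.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper, not from the slides: writes into `out` what the
 * three print statements of P would print for a given data string `s`.
 * Returns the number of characters written (excluding the final NUL),
 * or -1 if `out` is too small. */
int replicate(const char *s, char *out, size_t cap)
{
    assert(s != NULL && out != NULL);
    size_t used = 0;
    int n;

    /* 0: the fixed header of the array declaration */
    n = snprintf(out + used, cap - used, "char s[] = {\n");
    if (n < 0 || (size_t)n >= cap - used) return -1;
    used += (size_t)n;

    /* 1: s printed once, as data (one decimal code per character) */
    for (size_t i = 0; s[i]; i++) {
        n = snprintf(out + used, cap - used, "%d, ", s[i]);
        if (n < 0 || (size_t)n >= cap - used) return -1;
        used += (size_t)n;
    }

    /* 2: s printed again, this time as program text */
    n = snprintf(out + used, cap - used, "%s", s);
    if (n < 0 || (size_t)n >= cap - used) return -1;
    used += (size_t)n;

    return (int)used;
}
```

Called with the six-character string " 0 };\n", the helper produces the header, then the codes 32, 48, 32, 125, 59, 10 (each followed by a comma and a space), then the text " 0 };" and a newline. The text copy is what closes the array that the numeric copy opened, which is why the data has to appear twice.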

Analysis and lesson

Intermezzo: Tombstone diagrams

Four tombstone diagrams
[Figure: the four tombstone shapes, labeled f, L, M, L1, L2 and "compiler"]
- a program computing a function f, written in language L
- an interpreter for language L, written in language M
- a machine executing language M
- a compiler written in language M, translating programs in language L1 to programs in language L2

The interpreter rules
    interpreter(L1, L3) :- interpreter(L1, L4), interpreter(L4, L3).
    interpreter(L1, L3) :- compiler(L4, L3, L5), interpreter(L1, L4), machine(L5).

The compiler rules
    compiler(L1, L2, L3) :- compiler(L1, L2, L4), interpreter(L4, L3).
    compiler(L1, L2, L3) :- compiler(L4, L3, L5), compiler(L1, L2, L4), machine(L5).
[Figure: tombstone diagrams for the compiler rules; labels include cc, cc.c, cc.exe, C, M, L1, L2, L3, L4]

How is Java compiled and executed

Bootstrapping: growing a language in a portable fashion

A portable C compiler
A compiler for C can compile itself
- because the compiler is written in C
It is therefore portable to other platforms.
- Just recompile it on the new platform.

An example of a portable feature
How the compiler translates escaped char literals:
    …
    c = next();
    if (c != '\\') return c;
    c = next();
    if (c == '\\') return '\\';
    if (c == 'n') return '\n';
    …
Note that this is portable code:
- '\n' is 0x0a on an ASCII platform but 0x15 on EBCDIC
- the same compiler code will work correctly on both
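The fragment above can be made runnable with a small input cursor. This is a sketch under assumptions: the `struct input` type, the body of `next`, the name `read_char` and the -1 result for unknown escapes are all invented for illustration; only the sequence of tests comes from the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cursor over the input; the slide's next() is assumed to
 * return the next input character. */
struct input { const char *text; size_t pos; };

int next(struct input *in)
{
    assert(in != NULL && in->text != NULL);
    char c = in->text[in->pos];
    if (c != '\0') in->pos++;          /* never advance past the end */
    return (unsigned char)c;
}

/* Translates one (possibly escaped) character, following the slide.
 * Returns -1 for an escape this sketch does not know (an assumption). */
int read_char(struct input *in)
{
    int c = next(in);
    if (c != '\\') return c;           /* ordinary character */
    c = next(in);
    if (c == '\\') return '\\';
    if (c == 'n')  return '\n';        /* value chosen by the host compiler */
    return -1;
}
```

Nothing in `read_char` names the numeric code of a newline. The value comes from whichever compiler builds this source, which is what makes the code portable.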

the bootstrapping problem
You want to extend the C language with the '\v' literal
- '\v' can be n on one machine and m on another
- in the compiler code, you want to use the portable expression '\v' rather than hardcode n or m
- you want the compiler to look like this
    c = next();
    if (c != '\\') return c;
    c = next();
    if (c == '\\') return '\\';
    if (c == 'n') return '\n';
    if (c == 'v') return '\v';

solving the bootstrapping problem
Your compiled (.exe) compiler does not accept \v, so you teach it:
- write this code first, compile it, and make it your binary C compiler
  now your exe compiler accepts \v in input programs
- then edit 11 to '\v' in the compiler source code
  now your compiler source code is portable
- how about other platforms?
    c = next();
    if (c != '\\') return c;
    c = next();
    if (c == '\\') return '\\';
    if (c == 'n') return '\n';
    if (c == 'v') return 11;

discussion
By compiling '\v' into 11 just once, we taught the compiler forever that '\v' == 11 (on that platform).
The term "taught" is not too much of a stretch
- no matter how many times you now recompile the compiler, it will perpetuate the knowledge
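The two versions of the compiler source can be put side by side. This is a sketch that assumes an ASCII platform, where the vertical tab is 11; the function names and the -1 result for unknown escapes are invented for illustration.

```c
#include <assert.h>

/* Stage 1 source: the old compiler binary does not know '\v', so the
 * source has to spell out the code. 11 is the ASCII value (assumption:
 * ASCII platform). */
int escape_value_stage1(int c)
{
    assert(c >= 0);
    if (c == '\\') return '\\';
    if (c == 'n')  return '\n';
    if (c == 'v')  return 11;
    return -1;
}

/* Stage 2 source: once a binary built from stage 1 is installed, the
 * source can use '\v' itself. The number is no longer in the source;
 * it is supplied by the compiler that compiles this function. */
int escape_value_stage2(int c)
{
    assert(c >= 0);
    if (c == '\\') return '\\';
    if (c == 'n')  return '\n';
    if (c == 'v')  return '\v';
    return -1;
}
```

On an ASCII platform both functions return the same value for 'v', although only the first one mentions 11. In the second, the knowledge lives in the compiler binary and is passed on at every recompilation.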

This bootstrapping with tombstones
Naming:
- C: the C language without \v
- Cv: C with support for \v
Step 1: write a compiler for Cv in C by extending the existing C compiler
    c = next();
    if (c != '\\') return c;
    c = next();
    if (c == '\\') return '\\';
    if (c == 'n') return '\n';
    if (c == 'v') return 11;
Step 2: compile the step 1 compiler; now we have an executable compiler for Cv
Step 3: write a compiler for Cv in Cv; really, just replace 11 with the portable '\v'
    c = next();
    if (c != '\\') return c;
    c = next();
    if (c == '\\') return '\\';
    if (c == 'n') return '\n';
    if (c == 'v') return '\v';
[Figure: tombstone diagrams for the three steps; labels include cc, cc.c, cc.exe, C, Cv, M]

We can use Datalog deduction to see this process
    machine(m).         # we have a machine m
    compiler(c,m,m).    # cc.exe: we have an m-executable compiler from C to machine m
    compiler(c,m,c).    # cc.c: we also have the source code of this same compiler
    compiler(cv,m,c).   # cc.v: we write the compiler for Cv in C
    compiler(L1, L2, L3) :- compiler(L1, L2, L4), interpreter(L4, L3).
    interpreter(L1, L3) :- interpreter(L1, L4), interpreter(L4, L3).
    compiler(L1, L2, L3) :- machine(L5), compiler(L4, L3, L5), compiler(L1, L2, L4).
    interpreter(L1, L3) :- machine(L5), compiler(L4, L3, L5), interpreter(L1, L4).
You can now ask whether it is possible to obtain an executable compiler from Cv to m:
    ?- compiler(cv, m, m).
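The deduction can also be carried out by hand-rolled code. The sketch below is not a Datalog engine: it hard-codes the four rules above and applies them naively until no new fact appears. The type and function names (`struct facts`, `derive`, `saturate`) and the encoding of languages as an enum are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum lang { C, CV, M, NLANG };   /* c, cv and the machine language m */

struct facts {
    bool machine[NLANG];
    bool interp[NLANG][NLANG];        /* interpreter(L, WrittenIn)      */
    bool comp[NLANG][NLANG][NLANG];   /* compiler(From, To, WrittenIn)  */
};

/* Sets *slot if it is not already set; reports whether it changed. */
static bool derive(bool *slot)
{
    if (*slot) return false;
    *slot = true;
    return true;
}

/* Applies the four rules until no new fact appears (naive fixed point). */
void saturate(struct facts *f)
{
    assert(f != NULL);
    bool changed = true;
    while (changed) {
        changed = false;
        for (int l1 = 0; l1 < NLANG; l1++) {
            for (int l3 = 0; l3 < NLANG; l3++) {
                for (int l4 = 0; l4 < NLANG; l4++) {
                    /* interpreter(L1,L3) :- interpreter(L1,L4), interpreter(L4,L3). */
                    if (f->interp[l1][l4] && f->interp[l4][l3])
                        changed |= derive(&f->interp[l1][l3]);
                    for (int l5 = 0; l5 < NLANG; l5++) {
                        /* shared premise: machine(L5), compiler(L4,L3,L5) */
                        bool run = f->machine[l5] && f->comp[l4][l3][l5];
                        /* interpreter(L1,L3) :- ..., interpreter(L1,L4). */
                        if (run && f->interp[l1][l4])
                            changed |= derive(&f->interp[l1][l3]);
                        /* compiler(L1,L2,L3) :- ..., compiler(L1,L2,L4). */
                        for (int l2 = 0; l2 < NLANG; l2++)
                            if (run && f->comp[l1][l2][l4])
                                changed |= derive(&f->comp[l1][l2][l3]);
                    }
                    /* compiler(L1,L2,L3) :- compiler(L1,L2,L4), interpreter(L4,L3). */
                    for (int l2 = 0; l2 < NLANG; l2++)
                        if (f->comp[l1][l2][l4] && f->interp[l4][l3])
                            changed |= derive(&f->comp[l1][l2][l3]);
                }
            }
        }
    }
}
```

To pose the query, start from a zeroed `struct facts`, set `machine[M]`, `comp[C][M][M]`, `comp[C][M][C]` and `comp[CV][M][C]`, and call `saturate`. Afterwards `comp[CV][M][M]` is set: cc.exe running on m compiles the Cv compiler written in C into an executable one.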

The Black Belt: creating a perpetual backdoor for super-user access

Stage I: Hack login.c
Login: utility that checks passwords and grants credentials.
    if (hash(pswd) == stored[user]) { grant access }
Thompson created a backdoor into Unix:
- make login.c grant access to any user (including the superuser)
- condition: a magic password (Ken Thompson's) is entered
    /* if 'kt' open sesame */
    if (hash(pswd) == stored[user] || hash(pswd) == L) { grant access to 'user' }
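The difference between the two checks is a single extra disjunct. The sketch below is a toy model only: `toy_hash`, `struct account` and both function names are invented for illustration, and the hash is a simple stand-in, not what any real login utility uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy string hash, a stand-in for whatever login really uses. */
unsigned long long toy_hash(const char *s)
{
    assert(s != NULL);
    unsigned long long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Hypothetical account record: the stored hash plays the role of stored[user]. */
struct account { const char *user; unsigned long long stored; };

/* The honest check: grant access only if the password hash matches. */
bool login_clean(const struct account *a, const char *pswd)
{
    assert(a != NULL);
    return toy_hash(pswd) == a->stored;
}

/* The Stage I hack: one extra disjunct accepts a magic password
 * for any user, whatever that user's stored hash is. */
bool login_hacked(const struct account *a, const char *pswd,
                  unsigned long long magic)
{
    assert(a != NULL);
    return toy_hash(pswd) == a->stored || toy_hash(pswd) == magic;
}
```

The extra comparison sits in plain sight in the source, which is the weakness the next slides address.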

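Stage I: Hack login.c
- add exploit: edit login.c to insert this code block: if 'kt' open sesame
- install as the system login utility
[Figure: tombstone diagrams; login (in C) is edited into login*, which cc (C to M, running on M) compiles into the executable login*]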

Stage I limitations
Someone will soon notice the exploit in login.c.
After all, login.c is written in C (it is readable).
Also, it is security-critical code, so it will be audited.
==> We need to hide the exploit somewhere else.

Stage II: Hack the C compiler (cc.c)
Assume that compile() compiles a line of source code
    compile(char s[]) { … }
login.c passes through this procedure when compiled.
He who controls the compiler …

Stage II
This is a routine that compiles one line of source code
    compile(char s[]) {
        if (match(s, " ")) { compile(" "); return; }
    }

Stage II: Hack cc.c
[Figure: tombstone diagrams showing the hack applied to cc (C to M, written in C)]
clean up: remove exploit from cc.c to hide it from auditors.

Stage II: Hack cc.c
[Figure: tombstone diagrams; labels include cc, cc*, hack, login, login*, C, M]
- install as system compiler
- install as system login utility

Stage II limitations
Eventually the compiler will be recompiled.
Using the clean cc.c will produce a clean cc.exe.
The exploit will be lost.

Stage III
We want the exploit in the compiler to self-reproduce.
    compile(char s[]) {
        if (match(s, " ")) { compile("suitably edited function"); return; }
        if (match(s, " ")) { compile("magic text"); return; }
        ...
    }
What is magic text? It reproduces the inserted hack.
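The control flow of the two pattern checks can be modeled on plain strings. This is a toy sketch, not Thompson's code: the pattern strings and replacement texts are placeholders invented here (the slides elide the real ones), `strstr` stands in for match(), and the function returns the text that would be handed to the real compile step instead of compiling anything.

```c
#include <assert.h>
#include <string.h>

/* Placeholders for illustration only; the slides elide the real patterns
 * and the replacement text. */
#define LOGIN_PATTERN    "LOGIN-CHECK"
#define COMPILER_PATTERN "COMPILE-ROUTINE"
#define LOGIN_HACK       "LOGIN-CHECK plus magic password"
#define COMPILER_HACK    "COMPILE-ROUTINE plus both pattern checks"

/* Returns the text the hacked compiler really translates for source line s. */
const char *hacked_source_for(const char *s)
{
    assert(s != NULL);
    if (strstr(s, LOGIN_PATTERN))    return LOGIN_HACK;     /* login source    */
    if (strstr(s, COMPILER_PATTERN)) return COMPILER_HACK;  /* compiler source */
    return s;                                               /* everything else */
}
```

Every other line passes through unchanged, so ordinary programs compile as before. The second branch is the one that survives recompilation from clean source: the text it substitutes has to contain both checks again.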

Stage III
[Figure: tombstone diagrams showing the hack applied to cc (C to M, written in C)]
clean up: remove exploit

Stage III
[Figure: tombstone diagrams; labels include cc, cc**, hack, C, M]
install as system compiler

Your exercise
- Figure out what magic text needs to be exactly.
- How resilient is Thompson's technique to changes in the compiler source code? Will it work when someone entirely rewrites cc.c or login.c?
- In what other ways can the exploit be lost or discovered?

Summary
- PL knowledge is useful beyond language design and implementation
- It helps programmers understand the behavior of their code
- Compiler techniques can help to address other problems like security (big research area)
- Safety and security are hard
  - Assumptions must be explicit