Cs3102: Theory of Computation Class 21: Undecidability in Theory and Practice Spring 2010 University of Virginia David Evans Exam 2: Out at end of class.

Slides:



Advertisements
Similar presentations
David Evans cs302: Theory of Computation University of Virginia Computer Science Lecture 17: ProvingUndecidability.
Advertisements

CS 345: Chapter 9 Algorithmic Universality and Its Robustness
THE CHURCH-TURING T H E S I S “ TURING MACHINES” Pages COMPUTABILITY THEORY.
Introduction to Information Security ROP – Recitation 5 nirkrako at post.tau.ac.il itamarg at post.tau.ac.il.
1 CHAPTER 8 BUFFER OVERFLOW. 2 Introduction One of the more advanced attack techniques is the buffer overflow attack Buffer Overflows occurs when software.
Nathan Brunelle Department of Computer Science University of Virginia Theory of Computation CS3102 – Spring 2014 A tale.
CS 310 – Fall 2006 Pacific University CS310 Turing Machines Section 3.1 November 6, 2006.
Assembly תרגול 8 פונקציות והתקפת buffer.. Procedures (Functions) A procedure call involves passing both data and control from one part of the code to.
Andrea Bittau, Adam Belay, Ali Mashtizadeh, David Maziéres, Dan Boneh
Class 19: Undecidability in Theory and Practice David Evans cs302: Theory of Computation University of Virginia Computer.
CS5371 Theory of Computation Lecture 12: Computability III (Decidable Languages relating to DFA, NFA, and CFG)
David Evans CS150: Computer Science University of Virginia Computer Science Lecture 28: Implementing Interpreters.
Cs3102: Theory of Computation Class 20: Busy Beavers Spring 2010 University of Virginia David Evans Office hours: I am not able to hold my Thursday morning.
David Evans CS201j: Engineering Software University of Virginia Computer Science Lecture 18: 0xCAFEBABE (Java Byte Codes)
Cs3102: Theory of Computation Class 17: Undecidable Languages Spring 2010 University of Virginia David Evans.
Remaining Topics Decidability Concept 4.1 The Halting Problem 4.2
Security Exploiting Overflows. Introduction r See the following link for more info: operating-systems-and-applications-in-
Complexity and Computability Theory I Lecture #13 Instructor: Rina Zviel-Girshin Lea Epstein Yael Moses.
Cs3102: Theory of Computation Class 18: Proving Undecidability Spring 2010 University of Virginia David Evans.
Lecture 6: Buffer Overflow CS 436/636/736 Spring 2014 Nitesh Saxena *Adopted from a previous lecture by Aleph One (Smashing the Stack for Fun and Profit)
Introduction CS 104: Applied C++ What is Programming? For some given problem: __________ a solution for it -- identify, organize & store the problem's.
University of Washington Today Memory layout Buffer overflow, worms, and viruses 1.
Class 37: Computability in Theory and Practice cs1120 Fall 2011 David Evans 21 November 2011 cs1120 Fall 2011 David Evans 21 November 2011.
David Evans University of Virginia cs1120 Fall Lecture 38: Modeling Computing.
Mitigation of Buffer Overflow Attacks
Introduction to CS Theory Lecture 15 –Turing Machines Piotr Faliszewski
Automatic Diagnosis and Response to Memory Corruption Vulnerabilities Presenter: Jianyong Dai Jun Xu, Peng Ning, Chongkyung Kil, Yan Zhai, Chris Bookhot.
University of Washington Today Happy Monday! HW2 due, how is Lab 3 going? Today we’ll go over:  Address space layout  Input buffers on the stack  Overflowing.
Buffer Overflow CS461/ECE422 Spring Reading Material Based on Chapter 11 of the text.
Phrase-structure grammar A phrase-structure grammar is a quadruple G = (V, T, P, S) where V is a finite set of symbols called nonterminals, T is a set.
THE CHURCH-TURING T H E S I S “ TURING MACHINES” Part 1 – Pages COMPUTABILITY THEORY.
Overflows & Exploits. In the beginning 11/02/1988 Robert Morris, Jr., a graduate student in Computer Science at Cornell, wrote an experimental, self-replicating,
Lecture 8: Buffer Overflow CS 436/636/736 Spring 2013 Nitesh Saxena *Adopted from a previous lecture by Aleph One (Smashing the Stack for Fun and Profit)
Cs3102: Theory of Computation Class 8: Non-Context-Free Languages Spring 2010 University of Virginia David Evans.
Introduction 1 (Read Chap. 1) What is Programming? For some given problem: design a solution for it -- identify, organize & store the problem's data --
1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.
Introduction to Information Security ROP – Recitation 5.
1 Turing’s Thesis. 2 Turing’s thesis: Any computation carried out by mechanical means can be performed by a Turing Machine (1930)
1 Understanding Pointers Buffer Overflow. 2 Outline Understanding Pointers Buffer Overflow Suggested reading –Chap 3.10, 3.12.
AES Encryption FIPS 197, November 26, Bit Block Encryption Key Lengths 128, 192, 256 Number of Rounds Key Length Rounds Block.
1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.
Where’s the FEEB?: Effectiveness of Instruction Set Randomization Nora Sovarel, David Evans, Nate Paul University of Virginia Computer Science USENIX Security.
COMP1070/2002/lec1/H.Melikian COMP1070 Lecture #2 Computers and Computer Languages Some terminology What is Software? Operating Systems.
CSCI 2670 Introduction to Theory of Computing October 13, 2005.
Slides by Kent Seamons and Tim van der Horst Last Updated: Nov 11, 2011.
1 Section 13.1 Turing Machines A Turing machine (TM) is a simple computer that has an infinite amount of storage in the form of cells on an infinite tape.
1 Turing Machines and Equivalent Models Section 13.1 Turing Machines.
Announcements You will receive your scores back for Assignment 2 this week. You will have an opportunity to correct your code and resubmit it for partial.
Automata & Formal Languages, Feodor F. Dragan, Kent State University 1 CHAPTER 3 The Church-Turing Thesis Contents Turing Machines definitions, examples,
Announcements Assignment 2 Out Today Quiz today - so I need to shut up at 4:25 1.
1 8.4 Extensions to the Basic TM Extended TM’s to be studied: Multitape Turing machine Nondeterministic Turing machine The above extensions make no increase.
Software Engineering Algorithms, Compilers, & Lifecycle.
David Evans CS150: Computer Science University of Virginia Computer Science Class 32: Computability in Theory and Practice.
Chapter 9 Turing Machines What would happen if we change the stack in Pushdown Automata into some other storage device? Truing Machines, which maintains.
Computer Software 1.
Computability. Turing Machines Read input letter & tape letter, write tape letter, move left or right.
Introduction to Information Security
Mitigation against Buffer Overflow Attacks
Buffer Overflow Buffer overflows are possible because C doesn’t check array boundaries Buffer overflows are dangerous because buffers for user input are.
The Hardware/Software Interface CSE351 Winter 2013
CS216: Program and Data Representation
CSE 311 Foundations of Computing I
Introduction to Information Security
Combinations COURSE 3 LESSON 11-3
CS 465 Buffer Overflow Slides by Kent Seamons and Tim van der Horst
Make an Organized List and Simulate a Problem
Defeating Instruction Set Randomization Nora Sovarel
Format String.
Machine Level Representation of Programs (IV)
Lecture 24: Metalinguistics CS200: Computer Science
Presentation transcript:

cs3102: Theory of Computation Class 21: Undecidability in Theory and Practice Spring 2010 University of Virginia David Evans Exam 2: Out at end of class today, due Tuesday at 2:01pm

Menu Turing-equivalent Grammar Universal Programming Languages “Return-Oriented Programming” Problems Computers Can and Cannot Solve What does undecidability mean for problems real people (not just CS theorists and Busy Beavers) care about?

Models of Computation MachineReplacement Grammar Finite Automata (Class 2-5) Regular Grammar ( A  aB ) Pushdown Automata (add a stack) (Classes 6-7) Context-free Grammar (Classes 7-9) ( A  BC ) Turing machine (add an infinite tape) (Classes 14-20) Adapted from Class 1: ?

Unrestricted Grammar aXbY  cZd Right and left sides of a grammar rule can be any sequence of terminals and nonterminals. How can we prove unrestricted grammars are equivalent to a Turing machine?

Simulation Proof Show that we can simulate every Unrestricted Grammar with some TM. Show that we can simulate every TM with some Unrestricted Grammar. Fairly easy, but tedious (not shown): design a TM that does the grammar replacements by writing on the tape.

Simulating TM with UG

(No replacement rules with R on left side)

Initial Configuration... q0q0 w0w0 w1w1 w2w2

Models of Computation MachineReplacement Grammar Finite Automata (Class 2-5) Regular Grammar ( A  aB ) Pushdown Automata (add a stack) (Classes 6-7) Context-free Grammar (Classes 7-9) ( A  BC ) Turing machine (add an infinite tape) (Classes 14-20) Unrestricted Grammar (Class 21) (  )

Universal Programming Language Definition: a programming language that can describe every algorithm. Equivalently: a programming language that can simulate every Turing Machine. Equivalently: a programming language in which you can implement a Universal Turing Machine.

Which of these are Universal Programming Languages? Python Java C++ C# HTML Scheme Ruby COBOL Fortran JavaScript PostScript BASIC x86 TeX PDF

Proofs BASIC, C, C++, C#, Fortran, Java, JavaScript, PDF, PostScript, Python, Ruby, Scheme, TeX, etc. are universal – Proof: implement a TM simulator in the PL HTML (before HTML5) is not universal: – Proof: show some algorithm that cannot be implemented in HTML An infinite loop – HTML5 might be a universal programming language! (Proof is worth challenge bonus.)

Why is it impossible for a programming language to be both universal and resource-constrained? Resource-constrained means it is possible to determine an upper bound on the resources any program in the language can consume.

All universal programming language are equivalent in power: they can all simulate a TM, which can carry out any mechanical algorithm.

Why so many equally powerful programming languages?

Proliferation of Universal PLs “Aesthetics” – Some people like :=, others prefer =. – Some people think whitespace shouldn’t matter (e.g., Java), others think programs should be formatted like they mean (e.g., Python) – Some people like goto, others like throw. Expressiveness vs. Simplicity – Hard to write programs in SUBLEQ Expressiveness vs. “Truthiness” – How much you can say with a little code vs. how likely it is your code means what you think it does

Programming Language Design Space Expressiveness “Truthiness” Scheme Python Java C low high Spec# Ada strict typing, static more mistake prone less mistake prone public class HelloWorld { public static void main(String[] args) { System.out.println ("Hello!"); } print ("Hello!") (display “Hello!”) x86

Do most x86 programs contain Universal Turing Machines? Hovav Shacham. The Geometry of Innocent Flesh on the Bone: Return-into-libc without Function Calls (on the x86). CCS [Paper link]

CS 3330 Condensed Instruction Pointer (ip) Fetch Instruction Update ip load Execute Instruction Stack Return Address Memory Execution Stack (not like PDA stack)

CS 2150 Really Condensed x86 programs are just sequences of bytes (1 byte = 8 bits = 2 hex characters) Instructions are encoded using variable-length (1-15 bytes) First byte is opcode that identifies the type of instruction bb ef be ad de MOV $0xDEADBEEF, %eax 5-byte instruction that writes the constant 0xDEADBEEF into register %eax c3 RET 1-byte instruction that returns (jumps to the return address that is stored in a location on the stack) eb fe JMP -2 2-byte instruction that moves the instruction pointer back 2

Sequences of Instructions f7 c f c3 (Example from Shacham’s paper) test $0x , %edi setnzb -61(%ebp) ret inc %ebp xchg %ebp, %eax movl $0x0f000000, (%edi)

“Return-Oriented Programming” Return Address 3 Return Address 1 Return Address 2 Return Address 6 Return Address 4 Return Address 5 … Execution Stack 41 AC 16 FA A C 2D 43 B3 47 4C B8 C9 9F BB E9 4E 75 D5 A5 64 5F 5A B6 4B 61 A8 B4 AE B CA 20 F5 F2 16 C2 B4 8D DC B1 AD BB DA 39 B3 23 7D B3 BD D2 87 D6 D2 C5 67 8B BE 5F 09 BB B8 F7 EF E 8F 5E 4C C1 66 C1 1D B7 C F9 CD 82 2F 93 C2 10 5D DD 21 4D 16 F4 8E 36 7B 7D 91 C7 D3 E1 49 DB A5 FE A4 61 5C 5D E4 8C 8D 6C 33 C3 46 7E 27 F F6 F9 B0 E8 B F 6B FB C8 07 4B 9A F7 FC 1D 8D 0D D B D7 9B 0F 5B 8D A7 CE F B F DA B C7 D F6 C7 2C 1F EF B7 56 2D F0 1B 39 2F FB 42 6F DD F7 A1 5D 83 C C3 B9 E7 BA DF B7 DD 28 5C 62 6F 2F 17 9F D1 51 EC 82 0C 40 7B F B7 E0 C7 0B 5A 03 CB 3A BE 38 C2 DB EA A7 E6 8C 0B B8 0A C2 BA 54 1D CF D8 76 B1 DD F2 4E DF 3F C6 FF A7 BF 4B 89 D4 11 2F 3F 4F F7 93 B5 CB 6A AA 01 E1 E6 …

Injecting Malicious Code int main (void) { int x = 9; char s[4]; gets(s); printf ("s is: %s\n“, s); printf ("x is: %d\n“, x); } Stack s[0] s[1] s[2] s[3] x return address C Program a b c d e f g h...

Buffer Overflows int main (void) { int x = 9; char s[4]; gets(s); printf ("s is: %s\n“, s); printf ("x is: %d\n“, x); } > gcc -o bounds bounds.c > bounds abcdefghijkl s is: abcdefghijkl x is: 9 > bounds abcdefghijklm s is: abcdefghijklmn x is: > bounds abcdefghijkln s is: abcdefghijkln x is: > bounds aaa... [a few thousand characters] crashes shell (User input) = 0x6d = 0x6e Note: your results may vary (depending on machine, compiler, what else is running, time of day, etc.). This is what makes C fun! What does this kind of mistake look like in a popular server?

Code Red

Defenses Use a type-safe programming language (e.g., bounds checking) Write-xor-Execute pages – When the OS loads a page into memory, it is marked as either executable or writable: can’t be both – Hence: attacker can inject all the code it wants on the stack, but can’t jump to it and execute it

“Return-Oriented Programming” Return Address 3 Return Address 1 Return Address 2 Return Address 6 Return Address 4 Return Address 5 … Execution Stack 41 AC 16 FA A C 2D 43 B3 47 4C B8 C9 9F BB E9 4E 75 D5 A5 64 5F 5A B6 4B 61 A8 B4 AE B CA 20 F5 F2 16 C2 B4 8D DC B1 AD BB DA 39 B3 23 7D B3 BD D2 87 D6 D2 C5 67 8B BE 5F 09 BB B8 F7 EF E 8F 5E 4C C1 66 C1 1D B7 C F9 CD 82 2F 93 C2 10 5D DD 21 4D 16 F4 8E 36 7B 7D 91 C7 D3 E1 49 DB A5 FE A4 61 5C 5D E4 8C 8D 6C 33 C3 46 7E 27 F F6 F9 B0 E8 B F 6B FB C8 07 4B 9A F7 FC 1D 8D 0D D B D7 9B 0F 5B 8D A7 CE F B F DA B C7 D F6 C7 2C 1F EF B7 56 2D F0 1B 39 2F FB 42 6F DD F7 A1 5D 83 C C3 B9 E7 BA DF B7 DD 28 5C 62 6F 2F 17 9F D1 51 EC 82 0C 40 7B F B7 E0 C7 0B 5A 03 CB 3A BE 38 C2 DB EA A7 E6 8C 0B B8 0A C2 BA 54 1D CF D8 76 B1 DD F2 4E DF 3F C6 FF A7 BF 4B 89 D4 11 2F 3F 4F F7 93 B5 CB 6A AA 01 E1 E6 … Defeats WoX defense! Attacker can run any code they want by finding a Turing-complete set of “gadgets” it in your program and jumping to them!

Does it really work? Likelihood of finding enough gadgets in “random” bytes to make Turing-complete libc (C library included in nearly all Unix programs) contains more than enough (18MB ~ expect to have ~ RET (c3) instructions) Demonstration of attack on voting machine:

Vulnerability Detection Input: an x86 program P Output: True if there is some input w, such that running P on w allows the attacker to overwrite return addresses on stack; False otherwise.

Example: Morris Internet Worm (1988) P = fingerd – Program used to query user status (running on most Unix servers) isVulnerable(P)? Yes, for w = “ nop 400 pushl $68732f pushl $6e69622f movl sp,r10 pushl $0 pushl $0 pushl r10 pushl $3 movl sp,ap chmk $3b ” – Worm infected several thousand computers (~10% of Internet in 1988)

Vulnerability Detection Input: an x86 program P Output: True if there is some input w, such that running P on w allows the attacker to overwrite return addresses on stack; False otherwise.

Vulnerability Detection is Undecidable

“Solving” Undecidable Problems Undecidable means there is no program that 1. Always gives the correct answer, and 2. Always terminates Must give up one of these: – Giving up #2 is not acceptable in most cases – Must give up #1: cannot be correct on all inputs Or change the problem – e.g., modify P to make it invulnerable, etc.

“Impossibility” of Vulnerability Detection

Actual Vulnerability Detectors Sometimes give the wrong answer: – “False positive”: say P is a vulnerable when it isn’t – “False negative”: say P is safe when it is Heuristics to find common errors Heuristics to rank-order possible problems Can Microsoft squash 63,000 bugs in Windows 2000? … Overall, there are more than 65,000 "potential issues" that could emerge as problems, as discovered by Microsoft's Prefix tool. Microsoft is estimating that 28,000 of these are likely to be "real" problems. Can Microsoft squash 63,000 bugs in Windows 2000? … Overall, there are more than 65,000 "potential issues" that could emerge as problems, as discovered by Microsoft's Prefix tool. Microsoft is estimating that 28,000 of these are likely to be "real" problems.

Computability in Theory and Practice (Intellectual Computability Discussion on TV)

Ali G Problem Input: a list of numbers (mostly 9s) Output: the product of the numbers Is L ALIG decidable? L ALIG = { | each k i represents a number and p represents a number that is the product of all the k i s. numbers } Yes. It is easy to see a simple algorithm (e.g., elementary school multiplication) that decides it. Can real computers solve it?

Ali G was Right! Theory assumes ideal computers: – Unlimited, perfect memory – Unlimited (finite) time Real computers have: – Limited memory, time, power outages, flaky programming languages, etc. – There are many decidable problems we cannot solve with real computer: the actual inputs do matter (in practice, but not in theory!)

Charge Exam 2 out now Due at beginning of class, Tuesday It has some pretty tough questions (and no really easy questions): don’t get stressed out if you can’t answer everything