Static Analysis of String Encoders and Decoders
Presented by: Loris D’Antoni
Joint work with: Margus Veanes

Motivation
String encoders and decoders are:
– Ubiquitous: they transform Unicode text files on the Internet into in-memory representations of text
– Hard to write: they use unintuitive logic to achieve efficiency
– Hard to verify: the state space is big and the alphabets are very large (2^16 elements). Previous techniques blow up even on small decoders.

A simple example: BASE64 encoder
– 3 bytes → 4 Base64 characters
– The decoder is similar (every 4 characters encode 3 bytes)
– Uses bit manipulations to be efficient
How do we model it and prove it correct?
[Figure: encoding the text "Man" (rows: text content, bytes, bit pattern, index, Base64-encoded) gives "TWFu"]
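The figure's numbers can be reproduced directly. Below is a minimal Python sketch, assuming the standard Base64 alphabet (the helper name `split_group` is ours, not from the talk): three bytes are packed into 24 bits and read back as four 6-bit indices.

```python
# Standard Base64 alphabet: index 0..63 -> 'A'-'Z', 'a'-'z', '0'-'9', '+', '/'
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def split_group(b1: int, b2: int, b3: int) -> list:
    """Pack three bytes into 24 bits and cut them into four 6-bit indices."""
    bits = (b1 << 16) | (b2 << 8) | b3
    return [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]


text = b"Man"
print(" ".join(format(b, "08b") for b in text))  # 01001101 01100001 01101110
indices = split_group(*text)
print(indices)                                   # [19, 22, 5, 46]
print("".join(ALPHABET[i] for i in indices))     # TWFu
```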

What properties do we check?
Encoder and decoder, denoted by E and D:
– E∘D = I and D∘E = I
– dom(E) = bytes, dom(D) = Base64
We need:
– Equivalence checking
– Function composition (our model should be closed under composition)
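Before any symbolic machinery, the property E∘D = I can be tested by brute force. The sketch below is only a bounded test, not the talk's analysis: it uses Python's `base64` module as stand-ins for E and D, assumes E∘D means "apply E, then D", and checks every byte string up to a given length. The number of inputs grows as 256^n, so length 3 alone already means about 16.8 million cases, which is why exhaustive testing does not scale.

```python
import base64
import itertools


def E(data: bytes) -> bytes:
    """Encoder: bytes -> Base64 text (stand-in: Python's standard library)."""
    return base64.b64encode(data)


def D(text: bytes) -> bytes:
    """Decoder: Base64 text -> bytes; validate=True rejects non-alphabet characters."""
    return base64.b64decode(text, validate=True)


def compose(f, g):
    """Left-to-right composition, as assumed for E∘D: apply f first, then g."""
    return lambda x: g(f(x))


def check_identity_bounded(max_len: int) -> bool:
    """Test (E∘D)(x) == x for every byte string x with len(x) <= max_len."""
    round_trip = compose(E, D)
    for n in range(max_len + 1):
        for tup in itertools.product(range(256), repeat=n):
            x = bytes(tup)
            if round_trip(x) != x:
                return False
    return True


print(check_identity_bounded(2))  # True (checks 1 + 256 + 65,536 inputs)
```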

Bek code
  program base64encode(input){
    return iter(x in input)[q:=0;r:=0;]{
      case (x>0xFF): raise InvalidCharacter;
      case (q==0): yield (base64(x>>2)); q:=1; r:=(x&3)<<4;
      case (q==1): yield (base64(r|(x>>4))); q:=2; r:=(x&0xF)<<2;
      case (q==2): yield (base64((r|(x>>6))), base64(x&0x3F)); q:=0; r:=0;
      end case (q==1): yield (base64(r),'=','=');
      end case (q==2): yield (base64(r),'=');
    };
  }
How do we analyze this code?
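To see what the Bek program computes, here is a line-by-line transliteration into Python. It is a sketch: `base64_char` and `InvalidCharacter` are our stand-ins for the Bek built-ins. The control variable q and the register r become ordinary local variables, and the two `end case` branches become a flush step after the loop.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


class InvalidCharacter(Exception):
    """Stand-in for the Bek program's `raise InvalidCharacter` (symbols > 0xFF)."""


def base64_char(v: int) -> str:
    """Stand-in for Bek's base64(...): 6-bit value -> Base64 character."""
    return ALPHABET[v]


def base64encode(data) -> str:
    """Transliteration of the Bek program: q is the control state, r the register."""
    q, r = 0, 0
    out = []
    for x in data:
        if x > 0xFF:
            raise InvalidCharacter(x)
        if q == 0:
            out.append(base64_char(x >> 2))
            q, r = 1, (x & 3) << 4
        elif q == 1:
            out.append(base64_char(r | (x >> 4)))   # uses the old r
            q, r = 2, (x & 0xF) << 2
        else:  # q == 2
            out += [base64_char(r | (x >> 6)), base64_char(x & 0x3F)]
            q, r = 0, 0
    # The Bek "end case" branches: flush the pending bits and pad with '='.
    if q == 1:
        out += [base64_char(r), "=", "="]
    elif q == 2:
        out += [base64_char(r), "="]
    return "".join(out)


print(base64encode(b"Man"), base64encode(b"Ma"), base64encode(b"M"))  # TWFu TWE= TQ==
```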

Trust me! It is tricky!
[12/12/12 11:35:49 PM] Margus Veanes: I think it is doable, smth that is like ([A-Z2-7]{4}... )*
[12/12/12 11:35:57 PM] Loris D'Antoni: ok ill try
[12/12/12 11:36:22 PM] Margus Veanes: then you can try to see the difference compared to the domain of the decoder
[12/12/12 11:37:42 PM] Loris D'Antoni: it seems that also on this counterex it doesn't work
[12/12/12 11:37:43 PM] Loris D'Antoni: DP2A====
[12/12/12 11:37:50 PM] Loris D'Antoni: which maybe it's a bad one in this sense
[12/12/12 11:37:52 PM] Loris D'Antoni: ill check now
[12/12/12 11:40:45 PM] Margus Veanes: actually the domain of the decoder looks wrong, it allows 8 and 9
[12/12/12 11:40:58 PM] Loris D'Antoni: yeh i fixed that in my version
…COUPLE OF HACKS LATER…
[12/13/12 12:24:02 AM] Loris D'Antoni: ok, found bug and fixed it, now proved them correct. Will work on others tomorrow. Was very silly but hard to spot
[12/13/12 12:24:35 AM] Margus Veanes: ... this is why the analysis we can do is useful :-)
[12/13/12 12:24:45 AM] Loris D'Antoni: yeh i was mapping
[12/13/12 12:24:46 AM] Loris D'Antoni: ==> 2..7
[12/13/12 12:24:58 AM] Loris D'Antoni: instead of ==> '2'..'7'
Brief DEMO

Attempt 1: Finite Transducers
[Figure: on input "Man", edges M / [] and a / [] lead through states "M" and "Ma", and the edge n / [T,W,F,u] emits the encoding; …]
– Finite set of states
– Each transition reads an input symbol and outputs a sequence of symbols
– Mapping from strings into strings
– Blue (final) states are those for which the mapping is defined
– 2^8 edges out of every state and 2^16 states
– Decidable equivalence and closure under composition
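A finite transducer for the encoder can only "remember" bytes by putting them into its state, as in the figure. This sketch is our own construction and assumes padding is not modelled, so only inputs whose length is a multiple of 3 are accepted. The transition function `delta` is given implicitly, and the code counts states and edges instead of materialising them.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def delta(state: tuple, x: int):
    """One FT transition: the state is the tuple of bytes read so far in the
    current 3-byte group, so the state itself remembers the input."""
    buf = state + (x,)
    if len(buf) < 3:
        return buf, ""                 # like M / [] and a / [] in the figure
    b1, b2, b3 = buf
    bits = (b1 << 16) | (b2 << 8) | b3
    out = "".join(ALPHABET[(bits >> s) & 0x3F] for s in (18, 12, 6, 0))
    return (), out                     # like n / [T,W,F,u]


def run(data: bytes):
    """Run the FT. Assumption: only the initial state () is final."""
    state, out = (), []
    for x in data:
        state, o = delta(state, x)
        out.append(o)
    return "".join(out) if state == () else None


# States: (), (b1,), (b1, b2); every state has one edge per byte value.
num_states = 1 + 2**8 + 2**16
num_edges = num_states * 2**8
print(run(b"Man"), run(b"Ma"), num_states, num_edges)  # TWFu None 65793 16843008
```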

Attempt 2: Symbolic Finite Transducers [POPL12]
[Figure: states "M" and "Ma", with edges λx. x==‘M’ / [λx. x>>2] and λx. x==‘a’ / [λx. x>>4, …]; …]
– Guards are predicates over any decidable theory instead of single characters
– Output is a function of the input
– In this case the theory of bit-vectors is used, which better reflects the operations of the implementation
– Analysis is still decidable (equivalence, composition)
– We did not improve much: still state explosion
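Symbolic guards collapse many concrete edges into one, but the control state must still record whatever the encoder has to remember. In this hypothetical SFT (our construction, not the one in the figure), the pending bits are part of the state name (q, r) and each edge is labelled by a Python predicate: 84 symbolic edges replace 21 × 256 concrete ones, yet there is still one state per possible value of the pending bits.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def build_sft():
    """Base64 encoder as an SFT: edges are (source, guard, outputs, target),
    and the pending bits r are baked into the state name (q, r)."""
    ts = []
    for k in range(4):                      # (0,0): low 2 bits of x become pending
        ts.append(((0, 0), lambda x, k=k: (x & 3) == k,
                   [lambda x: x >> 2], (1, k << 4)))
    for r in range(0, 64, 16):              # (1,r): low 4 bits of x become pending
        for k in range(16):
            ts.append(((1, r), lambda x, k=k: (x & 0xF) == k,
                       [lambda x, r=r: r | (x >> 4)], (2, k << 2)))
    for r in range(0, 64, 4):               # (2,r): flush and return to (0,0)
        ts.append(((2, r), lambda x: True,
                   [lambda x, r=r: r | (x >> 6), lambda x: x & 0x3F], (0, 0)))
    return ts


def run_sft(ts, data):
    """Run the SFT from (0,0); assumption: only (0,0) is final."""
    state, out = (0, 0), []
    for x in data:
        for src, guard, outs, dst in ts:
            if src == state and guard(x):   # guards out of a state are disjoint
                out.extend(ALPHABET[f(x)] for f in outs)
                state = dst
                break
        else:
            return None                     # no enabled edge: input rejected
    return "".join(out) if state == (0, 0) else None


ts = build_sft()
states = {src for src, _, _, _ in ts}
print(run_sft(ts, b"Man"), len(states), len(ts))  # TWFu 21 84
```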

Attempt 3: Symbolic Transducers [POPL12]
[Figure: three states with one register r:
 0 → 1: True / [x>>2], r := (x&3)<<4
 1 → 2: True / [r|(x>>4)], r := (x&0xF)<<2
 2 → 0: True / [r|(x>>6), x&0x3F], r := 0]
– Registers can store values and are updated in transitions
– Guards and outputs can inspect and use the register value
– The logic is the same as in the implementation!
– No state explosion!
– Closed under sequential composition
– Analysis (equivalence) is undecidable in general…
We need a way to eliminate the registers.
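With a register, the encoder shrinks to the three transitions of the figure. A minimal sketch of a deterministic ST in Python, assuming first-match rule selection and that only the initial state is final (padding is again left out); guards, outputs, and updates are functions of the input x and the register r.

```python
from typing import Callable, List, NamedTuple


class STRule(NamedTuple):
    src: int
    guard: Callable[[int, int], bool]         # predicate over input x and register r
    outputs: List[Callable[[int, int], int]]  # output terms over (x, r)
    update: Callable[[int, int], int]         # new register value
    dst: int


# The three transitions of the figure; all guards are True.
BASE64_ST = [
    STRule(0, lambda x, r: True, [lambda x, r: x >> 2], lambda x, r: (x & 3) << 4, 1),
    STRule(1, lambda x, r: True, [lambda x, r: r | (x >> 4)], lambda x, r: (x & 0xF) << 2, 2),
    STRule(2, lambda x, r: True, [lambda x, r: r | (x >> 6), lambda x, r: x & 0x3F],
           lambda x, r: 0, 0),
]


def run_st(rules, data, q0=0, r0=0):
    """Run a deterministic ST; outputs use the register value *before* the update."""
    q, r, out = q0, r0, []
    for x in data:
        rule = next((t for t in rules if t.src == q and t.guard(x, r)), None)
        if rule is None:
            return None
        out.extend(f(x, r) for f in rule.outputs)
        q, r = rule.dst, rule.update(x, r)
    return out if q == q0 else None           # assumption: only q0 is final


print(run_st(BASE64_ST, b"Man"))  # [19, 22, 5, 46]
```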

Register Elimination: the naïve way
[Figure: ST → SFT. The ST has states 0, 1, 2 and edges x / [x>>2], r := (x&3)<<4; x / [r|(x>>4)], r := (x&0xF)<<2; x / [r|(x>>6), x&0x3F], r := 0. Enumeration yields an SFT with states such as "M" and "Ma" and edges M / [M>>2], a / [((M&3)<<4)|(a>>4)], n / [((a&0xF)<<2)|(n>>6), n&0x3F], …]
– Via enumeration: state explosion, but automatic
– Can do analysis, but very slow…
– Doesn’t work if the alphabet is infinite: a waste of symbolic analysis
We need a better model.
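The naive elimination can be automated by exploring every reachable (state, register) pair. This sketch is our own and reuses the encoder's logic as a plain Python function, `st_step` (a hypothetical helper). It has to iterate over the whole alphabet, which is exactly why the approach cannot work for infinite alphabets; for bytes it finds 21 states and 21 × 256 edges.

```python
from collections import deque


def st_step(q: int, r: int, x: int):
    """The base64 ST: (state, register, input) -> (outputs, next state, next register)."""
    if q == 0:
        return [x >> 2], 1, (x & 3) << 4
    if q == 1:
        return [r | (x >> 4)], 2, (x & 0xF) << 2
    return [r | (x >> 6), x & 0x3F], 0, 0


def eliminate_by_enumeration(alphabet):
    """Naive register elimination: every reachable (state, register) pair
    becomes a state of a register-free transducer, found by trying every
    symbol of the (necessarily finite) alphabet."""
    start = (0, 0)
    edges = {}                              # ((q, r), x) -> (outputs, (q2, r2))
    seen, todo = {start}, deque([start])
    while todo:
        q, r = todo.popleft()
        for x in alphabet:
            outs, q2, r2 = st_step(q, r, x)
            edges[((q, r), x)] = (outs, (q2, r2))
            if (q2, r2) not in seen:
                seen.add((q2, r2))
                todo.append((q2, r2))
    return seen, edges


states, edges = eliminate_by_enumeration(range(256))
print(len(states), len(edges))  # 21 5376
```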

Extended Symbolic Finite Transducers
[Figure: a single state 0 with the loop [x1,x2,x3] / [x1>>2, ((x1&3)<<4)|(x2>>4), ((x2&0xF)<<2)|(x3>>6), x3&0x3F]]
– Transitions read sequences of symbols
– The output is a function of all 3 symbols
– No state explosion!
– Analysis can be done for several interesting cases (in particular for encoders)
But how do we pass from STs to ESFTs?
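An ESFT rule reads k symbols at once and computes each output from all of them. A minimal sketch of a deterministic ESFT runner, assuming first-match rule selection; the single rule reproduces the figure's self-loop on state 0.

```python
from typing import Callable, List, NamedTuple, Sequence


class ESFTRule(NamedTuple):
    src: int
    k: int                                         # number of symbols read at once
    guard: Callable[[Sequence[int]], bool]         # predicate over [x1..xk]
    outputs: List[Callable[[Sequence[int]], int]]  # output terms over [x1..xk]
    dst: int


# The single-state base64 ESFT from the figure.
BASE64_ESFT = [
    ESFTRule(0, 3, lambda xs: True, [
        lambda xs: xs[0] >> 2,
        lambda xs: ((xs[0] & 3) << 4) | (xs[1] >> 4),
        lambda xs: ((xs[1] & 0xF) << 2) | (xs[2] >> 6),
        lambda xs: xs[2] & 0x3F,
    ], 0),
]


def run_esft(rules, data, q0=0, finals=(0,)):
    """Fire the first rule whose source matches and whose guard holds on the
    next k symbols; undefined (None) if no rule applies."""
    q, i, out = q0, 0, []
    while i < len(data):
        for t in rules:
            chunk = data[i:i + t.k]
            if t.src == q and len(chunk) == t.k and t.guard(chunk):
                out.extend(f(chunk) for f in t.outputs)
                q, i = t.dst, i + t.k
                break
        else:
            return None
    return out if q in finals else None


print(run_esft(BASE64_ESFT, b"Man"))  # [19, 22, 5, 46]
```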

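Register Elimination: the good way (1/2)
ST:
 0 → 1: x / [x>>2], r := (x&3)<<4
 1 → 2: x / [r|(x>>4)], r := (x&0xF)<<2
 2 → 0: x / [r|(x>>6), x&0x3F], r := 0
Eliminating state 2 merges its incoming and outgoing transitions into one transition that reads two symbols:
 0 → 1: x / [x>>2], r := (x&3)<<4
 1 → 0: [x1,x2] / [r|(x1>>4), ((x1&0xF)<<2)|(x2>>6), x2&0x3F], r := 0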

Register Elimination: the good way (2/2)
 0 → 1: x / [x>>2], r := (x&3)<<4
 1 → 0: [x1,x2] / [r|(x1>>4), ((x1&0xF)<<2)|(x2>>6), x2&0x3F], r := 0
Eliminating state 1 in the same way yields an ESFT with a single state:
 0 → 0: [x1,x2,x3] / [x1>>2, ((x1&3)<<4)|(x2>>4), ((x2&0xF)<<2)|(x3>>6), x3&0x3F]
– Fast and supports infinite alphabets
– Not always possible, but works for encoders/decoders
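The two elimination steps can be mimicked by fusing consecutive transitions. In this sketch, Python closures stand in for the symbolic terms that the real algorithm manipulates, guards are omitted because they are all True here, and `merge` is our hypothetical helper rather than the paper's procedure. Fusing rule 1 → 2 with 2 → 0 gives the two-symbol rule of step 1/2; fusing 0 → 1 with the result gives the three-symbol loop of step 2/2.

```python
from typing import Callable, List, NamedTuple, Sequence


class Rule(NamedTuple):
    src: int
    k: int                                           # symbols read
    outs: Callable[[Sequence[int], int], List[int]]  # (symbols, register) -> outputs
    upd: Callable[[Sequence[int], int], int]         # (symbols, register) -> register
    dst: int


# The base64 ST (every guard is True, so guards are left out).
ST = [
    Rule(0, 1, lambda xs, r: [xs[0] >> 2], lambda xs, r: (xs[0] & 3) << 4, 1),
    Rule(1, 1, lambda xs, r: [r | (xs[0] >> 4)], lambda xs, r: (xs[0] & 0xF) << 2, 2),
    Rule(2, 1, lambda xs, r: [r | (xs[0] >> 6), xs[0] & 0x3F], lambda xs, r: 0, 0),
]


def merge(t1: Rule, t2: Rule) -> Rule:
    """Fuse t1: p -> q with t2: q -> s into one rule p -> s reading t1.k + t2.k
    symbols. The intermediate register value is recomputed, not stored."""
    assert t1.dst == t2.src
    k1 = t1.k

    def outs(xs, r):
        return t1.outs(xs[:k1], r) + t2.outs(xs[k1:], t1.upd(xs[:k1], r))

    def upd(xs, r):
        return t2.upd(xs[k1:], t1.upd(xs[:k1], r))

    return Rule(t1.src, k1 + t2.k, outs, upd, t2.dst)


step1 = merge(ST[1], ST[2])  # 1/2: eliminate state 2 -> rule 1 -> 0 reading [x1, x2]
loop = merge(ST[0], step1)   # 2/2: eliminate state 1 -> rule 0 -> 0 reading [x1, x2, x3]


def esft_outputs(xs):
    """The loop resets r to 0 and r starts at 0, so fixing r = 0 removes the register."""
    return loop.outs(xs, 0)


print(step1.k, loop.k, esft_outputs(b"Man"))  # 2 3 [19, 22, 5, 46]
```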

Composition of ESFTs
ESFT E, ESFT D → ST E′, ST D′ (use registers to remember values) → ST E′∘D′ (uses ST closure under composition) → ESFT E∘D (register elimination)
ESFTs are not closed under composition in general.
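The pipeline can be sketched for the Base64 encoder and a matching decoder over 6-bit values. This is an illustration under stated assumptions: each single-state ESFT rule that reads k symbols becomes an ST whose register buffers the first k−1 symbols, and two STs are composed by feeding every output of the first into the second. The helper names (`esft_rule_to_st`, `compose_st`, `encode3`, `decode4`) are hypothetical, and the final register-elimination step back to an ESFT is not shown; in general it can fail.

```python
from typing import Callable, List, Sequence, Tuple


def esft_rule_to_st(k: int, outputs: Callable[[Sequence[int]], List[int]]):
    """Simulate a one-state ESFT rule that reads k symbols by an ST with
    states 0..k-1 whose register buffers the symbols read so far."""
    def step(q: int, reg: Tuple[int, ...], x: int):
        buf = reg + (x,)
        if q < k - 1:
            return [], q + 1, buf          # remember x, emit nothing yet
        return outputs(buf), 0, ()         # k-th symbol: emit, clear the buffer
    return step


def compose_st(step1, step2):
    """Sequential composition: every output of step1 is fed to step2; the
    composed ST pairs the control states and the registers."""
    def step(q, reg, x):
        (q1, q2), (r1, r2) = q, reg
        outs1, q1, r1 = step1(q1, r1, x)
        outs2 = []
        for y in outs1:
            o, q2, r2 = step2(q2, r2, y)
            outs2.extend(o)
        return outs2, (q1, q2), (r1, r2)
    return step


def run(step, data, q0, r0):
    """Run an ST; assumption: only the initial state is final."""
    q, reg, out = q0, r0, []
    for x in data:
        o, q, reg = step(q, reg, x)
        out.extend(o)
    return out if q == q0 else None


def encode3(xs):   # ESFT E: 3 bytes -> 4 six-bit values
    x1, x2, x3 = xs
    return [x1 >> 2, ((x1 & 3) << 4) | (x2 >> 4), ((x2 & 0xF) << 2) | (x3 >> 6), x3 & 0x3F]


def decode4(ys):   # ESFT D: 4 six-bit values -> 3 bytes
    y1, y2, y3, y4 = ys
    return [(y1 << 2) | (y2 >> 4), ((y2 & 0xF) << 4) | (y3 >> 2), ((y3 & 3) << 6) | y4]


E_st = esft_rule_to_st(3, encode3)
D_st = esft_rule_to_st(4, decode4)
ED_st = compose_st(E_st, D_st)       # E' o D': encode, then decode
print(run(ED_st, b"Man", (0, 0), ((), ())))  # [77, 97, 110]
```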

Equivalence Semi-Decision Procedure
First we check equivalence on the domain intersection (hard), then we check domain equivalence (easier in this case).
We build a product transducer:
[Figure: an ESFT with the loop (λ(x1,x2).True)/[x1,x2]; an SFT with states 0 and 1 and edges λ(x).True/[x]; their product, with the edge λ(x1,x2).True/([x1,x2],[x1,x2])]
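The two phases can be illustrated with a bounded check. This sketch swaps the symbolic product construction for plain enumeration of short inputs, so it is a test rather than a decision procedure. The transducers are hypothetical identity functions modelled as partial Python functions that return None outside their domain: one reads pairs [x1,x2], like the ESFT in the figure, and the others read one symbol at a time.

```python
from itertools import product


def pairs_identity(xs):
    """Identity reading [x1, x2] per step: defined on even lengths only."""
    return tuple(xs) if len(xs) % 2 == 0 else None


def alternating_identity(xs):
    """Identity reading one symbol per step, alternating between states 0
    and 1; only state 0 is final, so it is also defined on even lengths."""
    q, out = 0, []
    for x in xs:
        out.append(x)
        q = 1 - q
    return tuple(out) if q == 0 else None


def single_identity(xs):
    """One-state identity reading one symbol per step: defined everywhere."""
    return tuple(xs)


def bounded_equivalence(f, g, alphabet, max_len):
    """Phase 1: outputs agree on the domain intersection.
    Phase 2: the domains coincide. Returns (phase1_ok, phase2_ok)."""
    phase1 = phase2 = True
    for n in range(max_len + 1):
        for xs in product(alphabet, repeat=n):
            a, b = f(xs), g(xs)
            if a is not None and b is not None and a != b:
                phase1 = False
            if (a is None) != (b is None):
                phase2 = False
    return phase1, phase2


print(bounded_equivalence(pairs_identity, alternating_identity, [0, 1], 4))  # (True, True)
print(bounded_equivalence(pairs_identity, single_identity, [0, 1], 4))       # (True, False)
```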

Unicode Case Study
We analyzed a UTF-8/UTF-16 encoder (E) and decoder (D).
Test                                 Running time
Dom(E) = UTF16                       47 ms
Dom(E∘D) = UTF16                     109 ms
Dom(D) = UTF8                        156 ms
Dom(D∘E) = UTF8                      320 ms
E∘D = Identity (naive)               82,000 ms
D∘E = Identity (naive)               134,000 ms
E∘D = Identity (new algorithm)       123 ms
D∘E = Identity (new algorithm)       215 ms
Complete analysis in less than a second.
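Judging from the row Dom(E) = UTF16, the encoder E reads UTF-16 code units and emits UTF-8. A hand-written Python sketch of such an encoder (not the implementation analyzed in the talk; `utf16_to_utf8` and `code_units` are our names) makes the domain concrete: unpaired surrogates fall outside it, and the function returns None for them, the way a partial transducer is undefined there.

```python
from typing import List, Optional, Sequence


def utf16_to_utf8(units: Sequence[int]) -> Optional[bytes]:
    """Encode UTF-16 code units as UTF-8. Returns None outside the domain
    (unpaired surrogates), like a partial transducer."""
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:                          # high surrogate
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                return None
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:                        # unpaired low surrogate
            return None
        else:
            cp = u
            i += 1
        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:
            out += bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
        else:
            out += bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    return bytes(out)


def code_units(s: str) -> List[int]:
    """UTF-16 code units of s (little-endian, no BOM)."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[j:j + 2], "little") for j in range(0, len(data), 2)]


s = "a\u20ac\U0001F600"
print(utf16_to_utf8(code_units(s)) == s.encode("utf-8"), utf16_to_utf8([0xD800]))  # True None
```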

Result Summary
– ESFTs: a new transducer model for representing encoders and decoders
– A new register elimination algorithm from STs to ESFTs, independent of the input alphabet
– Correctness analysis of real programs: Unicode and Base64 encoders and decoders
– Automatic JavaScript code generation of the verified code
Check it out! Transducers are cool!!

Future Work
– Understand the theory of ESFTs (coming soon): composition closure, equivalence…
– Extend the model to tree transformations, which are widely used in NLP
– Analyze more complex scenarios, such as list-manipulating programs

Thank you
Loris D’Antoni