Static Analysis of String Encoders and Decoders Presented By: Loris D’Antoni Joint work with: Margus Veanes.


1 Static Analysis of String Encoders and Decoders Presented By: Loris D’Antoni Joint work with: Margus Veanes

2 Motivation
String encoders and decoders are:
– Ubiquitous: they transform Unicode text files on the Internet into the in-memory representation of text
– Hard to write: they use unintuitive logic in order to be efficient
– Hard to verify: the state space is big and alphabets are very big (2^16 elements). Previous techniques blow up even for small decoders.

3 A simple example: BASE64 encoder
– 3 bytes → 4 Base64 characters
– The decoder is similar (every 4 characters encode 3 bytes)
– Uses bit manipulations to be efficient
– How do we model it and prove it correct?

Text content     M         a         n
Bytes            77        97        110
Bit pattern      01001101  01100001  01101110
Index            19    22    5     46
Base64 encoded   T     W     F     u
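The table above can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper names (`bit_pattern`, `sextets`) are made up for this example, and the alphabet is the standard Base64 alphabet.

```python
# Standard Base64 alphabet: index 0..63 -> character.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def bit_pattern(data: bytes) -> str:
    """Concatenate the 8-bit binary form of every byte."""
    return "".join(f"{b:08b}" for b in data)

def sextets(bits: str) -> list:
    """Split a bit string into 6-bit groups and read each as an index."""
    return [int(bits[i:i + 6], 2) for i in range(0, len(bits), 6)]

data = b"Man"
bits = bit_pattern(data)                          # 24 bits for 3 bytes
indices = sextets(bits)                           # 4 indices of 6 bits each
encoded = "".join(ALPHABET[i] for i in indices)   # 4 Base64 characters
```

Each row of the table corresponds to one of the intermediate values: `list(data)`, `bits`, `indices`, and `encoded`.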

4 What Properties do we check?
Encoder and decoder are denoted by E and D.
– E o D = I
– D o E = I
– dom(E) = bytes
– dom(D) = Base64 bytes
We need:
– Equivalence checking
– Function composition (our model should be closed under composition)
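To make these properties concrete, here is a brute-force sketch that tests the round trip on short inputs. It uses Python's standard `base64` module as a stand-in for E and D (an assumption; the talk analyzes Bek programs, not this library), and it is testing, not the static analysis the talk is about. The helper `round_trip_holds` is a hypothetical name.

```python
import base64
import itertools

def E(data: bytes) -> bytes:
    """Stand-in encoder: Python's stdlib Base64 encoder."""
    return base64.b64encode(data)

def D(text: bytes) -> bytes:
    """Stand-in decoder: rejects characters outside the Base64 alphabet."""
    return base64.b64decode(text, validate=True)

def round_trip_holds(max_len: int) -> bool:
    """Check that decoding an encoding gives back x for all byte strings up to max_len."""
    for length in range(max_len + 1):
        for combo in itertools.product(range(256), repeat=length):
            x = bytes(combo)
            if D(E(x)) != x:
                return False
    return True
```

Exhaustive testing only covers a bounded set of inputs, which is why the slides ask for equivalence checking on the whole domain. The domain conditions matter too: the stdlib decoder is lenient about the unused low bits of the last character, so `D(b"TR==")` and `D(b"TQ==")` both give `b"M"`, and re-encoding does not return `b"TR=="`.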

5 Bek code
program base64encode(input){
  return iter(x in input)[q:=0;r:=0;]{
    case (x>0xFF): raise InvalidCharacter;
    case (q==0): yield (base64(x>>2)); q:=1; r:=(x&3)<<4;
    case (q==1): yield (base64(r|(x>>4))); q:=2; r:=(x&0xF)<<2;
    case (q==2): yield (base64((r|(x>>6))), base64(x&0x3F)); q:=0; r:=0;
    end case (q==1): yield (base64(r),'=','=');
    end case (q==2): yield (base64(r),'=');
  };
}
How do we analyze this code?
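For readers unfamiliar with Bek, the program above can be read as the following Python loop. This is a hand-written port for illustration, not output of the Bek tool: `q` and `r` play the same roles as in the Bek code, the `base64(...)` function is modeled as a lookup in the standard alphabet, and `InvalidCharacter` is mapped to `ValueError`.

```python
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def base64encode(data) -> str:
    """Port of the Bek program: q is the position in the 3-byte group,
    r holds the leftover bits of the previous byte."""
    out = []
    q, r = 0, 0
    for x in data:
        if x > 0xFF:
            raise ValueError("InvalidCharacter")
        if q == 0:
            out.append(ALPHABET[x >> 2])
            q, r = 1, (x & 3) << 4
        elif q == 1:
            out.append(ALPHABET[r | (x >> 4)])
            q, r = 2, (x & 0xF) << 2
        else:  # q == 2
            out.append(ALPHABET[r | (x >> 6)])
            out.append(ALPHABET[x & 0x3F])
            q, r = 0, 0
    # The "end case" rules: flush leftover bits and pad.
    if q == 1:
        out.append(ALPHABET[r] + "==")
    elif q == 2:
        out.append(ALPHABET[r] + "=")
    return "".join(out)
```

Calling `base64encode(b"Man")` follows the three cases in order and yields the four characters from the earlier table.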

6 Trust me! It is tricky!
[12/12/12 11:35:49 PM] Margus Veanes: I think it is doable, smth that is like ([A-Z2-7]{4}... )*
[12/12/12 11:35:57 PM] Loris D'Antoni: ok ill try
[12/12/12 11:36:22 PM] Margus Veanes: then you can ry to see the difference compared to the domain of the decoder
[12/12/12 11:37:42 PM] Loris D'Antoni: it seems that also on this counterex it doesn't work
[12/12/12 11:37:43 PM] Loris D'Antoni: DP2A====
[12/12/12 11:37:50 PM] Loris D'Antoni: which maybe it's a bad one in this sense
[12/12/12 11:37:52 PM] Loris D'Antoni: ill check now
[12/12/12 11:40:45 PM] Margus Veanes: actually the domain of the decoder looks wrong, it allows 8 and 9
[12/12/12 11:40:46 PM] Margus Veanes: http://www.rise4fun.com/Bek/Cy3
[12/12/12 11:40:58 PM] Loris D'Antoni: yeh i fixed that in my version
…COUPLE OF HACKS LATER…
[12/13/12 12:24:02 AM] Loris D'Antoni: ok, found bug and fixed it, now proved them correct. Will work on others tomorrow. Was very silly but hard to spot
[12/13/12 12:24:35 AM] Margus Veanes:... this is why the analysis we can do is useful :-)
[12/13/12 12:24:45 AM] Loris D'Antoni: yeh i was mapping
[12/13/12 12:24:46 AM] Loris D'Antoni: 26..31 ==> 2..7
[12/13/12 12:24:58 AM] Loris D'Antoni: instead of 26..31 ==> '2'..'7'
Brief DEMO

7 Attempt 1: Finite Transducers
[Diagram: states M, Ma, … with transitions M / [], a / [], n / [TWFu], …..]
– Finite set of states
– Each transition reads an input symbol and outputs a sequence of symbols
– Mapping from strings into strings
– Blue state (final): a state for which the mapping is defined
– 2^8 edges out of every state and 2^16 states
– Decidable equivalence and closure under composition

8 Attempt 2: Symbolic Finite Transducers [POPL12]
[Diagram: states M, Ma, … with transitions λx. x==‘M’ / [λx. x>>2], λx. x==‘a’ / [λx. x>>4,…], …..]
– Guards are predicates over any decidable theory instead of single characters
– Output is a function of the input
– In this case uses the theory of bit-vectors; supports symbolic updates such as bit-vector operations
– Better reflects implementation operations
– Analysis is still decidable (equivalence, composition)
– We did not improve much: still state explosion
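A symbolic transition can be pictured as a guard predicate paired with output functions. The sketch below is a made-up, one-state example (splitting each byte into two nibbles), not a transducer from the talk; `SymbolicTransition` and `run_sft` are hypothetical names. It only shows how one symbolic edge stands in for 256 concrete edges.

```python
from typing import Callable, List, NamedTuple

class SymbolicTransition(NamedTuple):
    source: int
    target: int
    guard: Callable[[int], bool]            # predicate over the input symbol
    outputs: List[Callable[[int], int]]     # each output is a function of the input

# Hypothetical one-state SFT: every byte x is mapped to [x>>4, x&0xF].
NIBBLE_SFT = [
    SymbolicTransition(
        source=0,
        target=0,
        guard=lambda x: 0 <= x <= 0xFF,
        outputs=[lambda x: x >> 4, lambda x: x & 0xF],
    )
]

def run_sft(transitions, data, start=0):
    """Run an SFT on a sequence of ints; fail if no guard matches."""
    state, out = start, []
    for x in data:
        for t in transitions:
            if t.source == state and t.guard(x):
                out.extend(f(x) for f in t.outputs)
                state = t.target
                break
        else:
            raise ValueError(f"no transition from state {state} on {x}")
    return out
```

Because an SFT has no registers, anything it needs to remember about earlier symbols must be encoded in the state, which is where the state explosion for Base64 comes from.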

9 Attempt 3: Symbolic Transducers [POPL12]
[Diagram: states 0, 1, 2 with transitions
  0 → 1: True / [x>>2], r := (x&3)<<4
  1 → 2: True / [r|(x>>4)], r := (x&0xF)<<2
  2 → 0: True / [r|(x>>6), x&0x3F], r := 0]
– Registers can store values and are updated in transitions
– Inputs and outputs can inspect and use the register value
– Logic is the same as for the implementation!!
– No state explosion!!
– Closed under sequential composition
– Analysis (equivalence) is undecidable in general…
– We need a way to eliminate the registers
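The three transitions above can be written down as a table and run by a small interpreter. This is a sketch under assumptions: the tuple layout and the name `run_st` are invented here, guards are all `True` as on the slide, and the end-of-input padding rules from the Bek code are left out.

```python
# state -> (target state, output functions of (x, r), register update of (x, r))
ST = {
    0: (1, [lambda x, r: x >> 2],
           lambda x, r: (x & 3) << 4),
    1: (2, [lambda x, r: r | (x >> 4)],
           lambda x, r: (x & 0xF) << 2),
    2: (0, [lambda x, r: r | (x >> 6), lambda x, r: x & 0x3F],
           lambda x, r: 0),
}

def run_st(data, state=0, r=0):
    """Run the symbolic transducer; returns the list of Base64 indices."""
    out = []
    for x in data:
        target, outputs, update = ST[state]
        out.extend(f(x, r) for f in outputs)   # outputs read the old register
        r = update(x, r)                        # then the register is updated
        state = target
    return out
```

Three states and one register are enough, independent of the alphabet size. The price is that the register makes equivalence undecidable in general.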

10 Register Elimination: the naïve way
ST:
  0 → 1: x / [x>>2], r := (x&3)<<4
  1 → 2: x / [r|(x>>4)], r := (x&0xF)<<2
  2 → 0: x / [r|(x>>6), x&0x3F], r := 0
SFT (states M, Ma, …):
  M / [M>>2]
  a / [((M&3)<<4)|(a>>4)]
  n / [((a&0xF)<<2)|(n>>6), n&0x3F]
  …..
– Via enumeration: state explosion, but automatic
– Can do analysis, but very slow…
– Doesn’t work if the alphabet is infinite: a waste of symbolic analysis
– We need a better model

11 A simple example: BASE64
– 3 bytes → 4 Base64 characters
– The decoder is similar (every 4 characters encode 3 bytes)
– Uses bit manipulations to be efficient
– How do we model it and prove it correct?

Text content     M         a         n
Byte             77        97        110
Bit pattern      01001101  01100001  01101110
Index            19    22    5     46
Base64 encoded   T     W     F     u

12 Extended Symbolic Finite Transducers
[Diagram: state 0 with a single transition
  0 → 0: [x1,x2,x3] / [x1>>2, ((x1&3)<<4)|(x2>>4), ((x2&0xF)<<2)|(x3>>6), x3&0x3F]]
– Read sequences of symbols
– Output is a function of all the 3 symbols
– No state explosion
– Analysis can be done for several interesting cases (in particular for encoders)
– But, how do we pass from STs to ESFTs?
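The single ESFT transition can be tried out directly. In this sketch `esft_rule` is the transition from the slide, while `run_esft` is a hypothetical driver that only handles inputs whose length is a multiple of 3, since the slide does not show the rules for leftover bytes.

```python
def esft_rule(x1: int, x2: int, x3: int) -> list:
    """The lookahead-3 transition: four outputs from three input symbols."""
    return [
        x1 >> 2,
        ((x1 & 3) << 4) | (x2 >> 4),
        ((x2 & 0xF) << 2) | (x3 >> 6),
        x3 & 0x3F,
    ]

def run_esft(data: bytes) -> list:
    """Apply the rule to consecutive 3-byte groups (no padding rules here)."""
    if len(data) % 3 != 0:
        raise ValueError("this sketch only covers whole 3-byte groups")
    out = []
    for i in range(0, len(data), 3):
        out.extend(esft_rule(data[i], data[i + 1], data[i + 2]))
    return out
```

There is one state and no register: everything the outputs need is in the three symbols read by the transition.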

13 Register Elimination: the good way 1/2
ST:
  0 → 1: x / [x>>2], r := (x&3)<<4
  1 → 2: x / [r|(x>>4)], r := (x&0xF)<<2
  2 → 0: x / [r|(x>>6), x&0x3F], r := 0
After merging the transitions 1 → 2 and 2 → 0 (on the way from ST to ESFT):
  0 → 1: x / [x>>2], r := (x&3)<<4
  1 → 0: [x1,x2] / [r|(x1>>4), ((x1&0xF)<<2)|(x2>>6), x2&0x3F], r := 0

14 Register Elimination: the good way 2/2
Before:
  0 → 1: x / [x>>2], r := (x&3)<<4
  1 → 0: [x1,x2] / [r|(x1>>4), ((x1&0xF)<<2)|(x2>>6), x2&0x3F], r := 0
After merging the two remaining transitions:
  0 → 0: [x1,x2,x3] / [x1>>2, ((x1&3)<<4)|(x2>>4), ((x2&0xF)<<2)|(x3>>6), x3&0x3F]
– Fast and supports infinite alphabets
– Not always possible, but works for encoders/decoders
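The two merging steps can be imitated with function composition: the register update of the first transition is fed into the outputs of the second. This is only a sketch of the substitution idea, not the algorithm from the talk, which works on symbolic formulas rather than Python closures. `Transition` and `merge` are hypothetical names.

```python
from typing import Callable, List, NamedTuple, Sequence

class Transition(NamedTuple):
    lookahead: int                                    # number of symbols read
    output: Callable[[Sequence[int], int], List[int]]  # (symbols, r) -> outputs
    update: Callable[[Sequence[int], int], int]        # (symbols, r) -> new r

def merge(first: Transition, second: Transition) -> Transition:
    """Fuse two consecutive transitions into one with a longer lookahead."""
    k = first.lookahead

    def output(xs, r):
        r_mid = first.update(xs[:k], r)   # register value between the two steps
        return first.output(xs[:k], r) + second.output(xs[k:], r_mid)

    def update(xs, r):
        return second.update(xs[k:], first.update(xs[:k], r))

    return Transition(k + second.lookahead, output, update)

# The three ST transitions of the Base64 encoder.
t01 = Transition(1, lambda xs, r: [xs[0] >> 2],
                 lambda xs, r: (xs[0] & 3) << 4)
t12 = Transition(1, lambda xs, r: [r | (xs[0] >> 4)],
                 lambda xs, r: (xs[0] & 0xF) << 2)
t20 = Transition(1, lambda xs, r: [r | (xs[0] >> 6), xs[0] & 0x3F],
                 lambda xs, r: 0)

t10 = merge(t12, t20)   # step 1/2: lookahead 2, still reads r
t00 = merge(t01, t10)   # step 2/2: lookahead 3, r no longer matters
```

After the second merge, the output no longer depends on the incoming register value and the register is always reset to 0, so it can be dropped. That is the sense in which the register has been eliminated for this encoder.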

15 Composition of ESFTs
ESFTs are not closed under composition in general, so composition goes through STs:
– ESFT E, ESFT D → ST E’, ST D’: use of registers to remember values
– ST E’, ST D’ → ST E’oD’: uses ST closure under composition
– ST E’oD’ → ESFT EoD: register elimination

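16 Equivalence Semi-Decision Procedure
– First we check equivalence on the domain intersection (hard), then we check domain equivalence (easier in this case).
– We build a product transducer.
[Diagram: one transducer with transition (λ(x1,x2).True) / [x1,x2]; one transducer with states 1 and 0 and transition λ(x).True / [x]; product state 0,1 with transition λ(x1,x2).True / ([x1,x2],[x1,x2])]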

17 Unicode Case Study
We analyzed UTF8 to UTF16 encoder (E) and decoder (D)

Test                              Running time
Dom(E) = UTF16                    47 ms
Dom(EoD) = UTF16                  109 ms
Dom(D) = UTF8                     156 ms
Dom(DoE) = UTF8                   320 ms
EoD = Identity (naive)            82,000 ms
DoE = Identity (naive)            134,000 ms
EoD = Identity (new algorithm)    123 ms
DoE = Identity (new algorithm)    215 ms

Complete analysis in less than a second

18 Result Summary
– ESFTs: a new transducer model for representing encoders and decoders
– A new register elimination algorithm from STs to ESFTs, independent of the input alphabet
– Correctness analysis of real programs: Unicode and Base64 encoders and decoders
– Automatic JavaScript code generation of the verified code
– Check it out: http://rise4fun.com/Bek/
Transducers are cool!!

19 Future Work
– Understand the theory of ESFTs (coming soon): composition closure, equivalence…
– Extend the model to tree transformations, which are widely used in NLP
– Analyze more complex scenarios, such as list-manipulating programs

20 Thank you
Loris D’Antoni, lorisdan@cis.upenn.edu
Questions?
