Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans.


1 cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans

2 Menu Today: – Preparing for Exam 1 – Language class for Deterministic PDAs – Applications of DFAs Thursday: – Exam Review (if you send questions and/or topics) – Applications of probabilistic DFAs and Grammars

3 Exam 1 In class, next Tuesday, 2 March Covers: Classes 1-9 (10 and 11) Sipser Ch 0-2 Problem Sets 1-3 + Comments Exam 1 Note: unlike nearly all other sets we draw in this class, all of these sets are finite, and the size (roughly) represents the relative size.

4 What’s on the Exam?
– Definitions: language, problem, sets
– Constructing and understanding computing models: finite automata (DFA, NFA), pushdown automata (DPDA, NPDA), grammars (Context-Free Grammar)
– Language classes (Regular and Context-Free): show a language is in the class; show a language is not in the class; prove or disprove a closure property
– Proof methods: proof by induction, proof by construction
– Understand and use the pumping lemmas for RL and CFL
Sample exam on website should give you a good idea what to expect. Your exam will probably also have “what’s wrong with this proof” questions.

5 Exam 1 Notesheet For Exam 1, you may use only: – Your own brain and body – A low-tech writing instrument (pen or pencil) – A single page (both sides) of notes that you create You may work with others to create your notes page.

6 Admiral Grace Hopper John von Neumann Albert Einstein

7 Exam Help Available Office Hours: – Thursdays, 8:30-9:30am – Thursdays, after class – Fridays, 10-11:30am (Sonali in Stacks) – Mondays, 1:15-3pm TA’s Exam Review Session – This Sunday, 5-6:30pm, Olsson 228E

8 [Diagram of nested language classes: All Languages; Context-Free (CFG or NPDA); Regular Languages (DFA, NFA, RE, RG); Finite Languages. Example languages marked: w, a^n, a^n b^n c^n, ww] Where are the languages recognized by a Deterministic PDA?

9 Proving Set Equivalence A = B ⇔ A ⊆ B and B ⊆ A. Sets A and B are equivalent if A is a subset of B and B is a subset of A.

10 Proving Formalism Equivalence

11

12 Proving Formalism Non-Equivalence

13 [Diagram: All Languages; Context-Free (CFG or NPDA); Regular Languages (DFA, NFA, RE, RG); example language a^n b^n] Which of these could be true?

14 [Two candidate diagrams, each with the labels: Regular Languages (DFA, NFA, RE, RG); Context-Free (NPDA); DPDA] How can we distinguish these two plausible possibilities?

15 [Two candidate diagrams, each with the labels: Regular Languages (DFA, NFA, RE, RG); Context-Free (NPDA); DPDA] How can we distinguish these two plausible possibilities? To show the DPDA languages are a strict subset of the context-free languages: find some language A that can be recognized by some NPDA but not by any DPDA. To show the two classes are equal: prove by construction that for any NPDA, there is a DPDA that recognizes the same language.

16

17 [NPDA diagram; transition labels read "input, pop → push"] ε, ε → $; a, ε → +; ε, ε → ε; b, + → ε; ε, $ → ε; ε, ε → ε; b, + → ε; b, ε → ε; ε, $ → ε
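The transition labels above suggest an NPDA that pushes one + per a and then nondeterministically chooses between two branches: one that pops a + for every b, and one that pops a + for every two b's. A minimal Python sketch of that idea follows. It assumes A = {a^i b^i} ∪ {a^i b^(2i)} with i ≥ 0, which is inferred from the later proof slides rather than stated on this one; the function names `run_branch` and `in_A` are hypothetical. Nondeterminism is simulated by simply trying both branches.

```python
def run_branch(word, bs_per_plus):
    """Simulate one branch of the (assumed) NPDA deterministically.

    Each '+' on the stack must be paid for by exactly bs_per_plus b's.
    """
    stack = ["$"]                      # bottom-of-stack marker
    pos = 0
    # Phase 1: push one '+' for every leading a.
    while pos < len(word) and word[pos] == "a":
        stack.append("+")
        pos += 1
    # Phase 2: pop one '+' per group of bs_per_plus b's.
    while stack[-1] == "+":
        if word[pos:pos + bs_per_plus] != "b" * bs_per_plus:
            return False
        stack.pop()
        pos += bs_per_plus
    # Accept only if '$' is exposed exactly when the input is exhausted.
    return pos == len(word)


def in_A(word):
    """Accept if either nondeterministic branch accepts."""
    return run_branch(word, 1) or run_branch(word, 2)
```

For example, `in_A("aabb")` and `in_A("aabbbb")` both hold, while `in_A("aabbb")` does not. A deterministic machine cannot try both branches this way, which is the intuition the proof on the following slides makes precise.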

18 Proof by contradiction: Assume there is a DPDA that recognizes A. Show how to construct an NPDA that recognizes some language we know is not context free. Proved by construction: We showed an NPDA that recognizes A.

19 Proof by contradiction. Suppose there is a DPDA M that recognizes A. It must be in an accept state only after processing a^i b^i and a^i b^(2i). [Diagram of M, with transitions of the form a, α → β and b, α → β: 2i transitions, consuming a^i b^i, reach the first accept state; then i more transitions of the form b, α → β, consuming b^i, reach the second.] Construct M’: copy all the states on the second half, replacing b with c (transitions of the form c, α → β). What is the language of M’?

20 Proof by contradiction. Suppose there is a DPDA M that recognizes A. It must be in an accept state only after processing a^i b^i and a^i b^(2i). [Diagram of M, with transitions of the form a, α → β and b, α → β.] Construct M’: copy all the states on the second half, replacing b with c (transitions of the form c, α → β). The language of M’ is not a context-free language! We have a contradiction: if A is in L(DPDA), we could use the DPDA that recognizes A to construct a PDA that recognizes a non-context-free language! Hence, A must not be in L(DPDA).

21 [Diagram: All Languages; Context-Free (CFG or NPDA); Deterministic Context-Free Languages, recognized by a DPDA (or DCFG); Regular Languages (DFA, NFA, RE, RG); example languages a^n b^n and A] Regular Languages ⊂ Deterministic Context-Free Languages ⊂ Context-Free Languages

22 DFAs in Practice

23 Malware Scanner
W32.Bolzano.Gen: 576a222bd2c20400558b4c240cd9ffff07fbffffff{0-2}5c4e544c445200{0-2}5c57494e4e545c73797374656d33325c6e746f736b726e6c2e65786500{0-29}3b4658
W32.MyLife.E: 7a6172793230*40656d61696c2e636f6d
Note: These are the signatures from ClamAV, an open source virus scanner.
[Diagram labels: Files, Network Traffic]
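Signatures of this shape are regular expressions over bytes, so they can be matched by a finite automaton. The sketch below translates such a signature into a Python bytes regex. It assumes that `*` means "any number of arbitrary bytes" and `{m-n}` means "between m and n arbitrary bytes", and it handles only those two wildcards; `signature_to_regex` is a hypothetical helper, not part of any scanner's API.

```python
import re


def signature_to_regex(sig):
    """Translate a hex signature with '*' and '{m-n}' wildcards into a bytes regex."""
    parts = []
    i = 0
    while i < len(sig):
        if sig[i] == "*":
            # Assumed meaning: any number of arbitrary bytes.
            parts.append(b".*")
            i += 1
        elif sig[i] == "{":
            # Assumed meaning: between low and high arbitrary bytes.
            close = sig.index("}", i)
            low, high = sig[i + 1:close].split("-")
            parts.append(b".{%d,%d}" % (int(low), int(high)))
            i = close + 1
        else:
            # Two hex digits encode one literal byte.
            parts.append(re.escape(bytes.fromhex(sig[i:i + 2])))
            i += 2
    # DOTALL so that '.' also matches the newline byte.
    return re.compile(b"".join(parts), re.DOTALL)


# The W32.MyLife.E signature decodes to b"zary20" ... b"@email.com".
MYLIFE = signature_to_regex("7a6172793230*40656d61696c2e636f6d")
```

With this, `MYLIFE.search(b"... zary20 anything @email.com ...")` finds a match, while input containing only `b"zary20"` does not.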

24 String Matching DFA: q0 -t-> q1 -r-> q2 -u-> q3 -t-> q4 -h-> q5. Text: “We hold these truths to be self-evident, that …” How much work is it to scan a string of length N for a signature?
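One way to see the cost is to build the matching DFA explicitly: after construction, scanning takes exactly one transition per input character, so N steps for a text of length N. The sketch below is a straightforward (not optimized) construction in which state k means "the last k characters read equal the first k characters of the pattern". The names `build_match_dfa` and `scan` are illustrative, and the alphabet is taken from the text and pattern for simplicity.

```python
def build_match_dfa(pattern, alphabet):
    """Return delta[state][char] -> state for a substring-matching DFA."""
    m = len(pattern)
    delta = [dict() for _ in range(m + 1)]
    for state in range(m + 1):
        for ch in alphabet:
            # New state: longest prefix of pattern that is a suffix of
            # (the matched prefix so far) + ch.
            seen = pattern[:state] + ch
            k = min(m, state + 1)
            while k > 0 and not seen.endswith(pattern[:k]):
                k -= 1
            delta[state][ch] = k
    return delta


def scan(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    delta = build_match_dfa(pattern, set(text) | set(pattern))
    state, hits = 0, []
    for i, ch in enumerate(text):      # one DFA step per character
        state = delta[state][ch]
        if state == len(pattern):      # accepting state reached
            hits.append(i - len(pattern) + 1)
    return hits
```

For the slide's example, `scan("We hold these truths to be self-evident", "truth")` reports a single match starting at index 14.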

25 Faster String Matching DFA: q0 -t-> q1 -r-> q2 -u-> q3 -t-> q4 -h-> q5. Text: “We hold these truths to be self-evident, that …” Checks against the pattern truth, starting from its last character: s[4] = h? s[10] = h? s[9] = t? s[8] = u? Skip table: a, b, c, d, e, f, g, i, j, k, l, m, n, o, p, q, r, s, v, w, x, y, z: 6; h: 0; r: 4; t: 1; u: 2
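The skipping idea can be sketched with the Horspool simplification of Boyer-Moore, which uses only a per-character skip table. This is a substitute technique chosen for brevity, not necessarily the exact variant on the slide: it uses 0-based indexing and the common convention that a character absent from the pattern allows a shift of the full pattern length, so its table values need not match the slide's numbers. The function names are hypothetical.

```python
def build_skip_table(pattern):
    """Map each character in pattern[:-1] to its distance from the pattern's end."""
    m = len(pattern)
    # Later occurrences overwrite earlier ones, keeping the smallest shift.
    return {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}


def horspool_search(text, pattern):
    """Return (index of first match or -1, number of character comparisons)."""
    m, n = len(pattern), len(text)
    skip = build_skip_table(pattern)
    comparisons = 0
    end = m - 1                        # text index aligned with pattern's last char
    while end < n:
        j = 0
        # Compare right to left, starting at the pattern's last character.
        while j < m:
            comparisons += 1
            if text[end - j] != pattern[m - 1 - j]:
                break
            j += 1
        if j == m:
            return end - m + 1, comparisons
        # Shift by the skip for the text character under the pattern's end;
        # characters not in the table allow a full-length shift.
        end += skip.get(text[end], m)
    return -1, comparisons
```

On the slide's sentence the search finds "truth" at index 14 while examining far fewer characters than the text contains, which is the point of skipping.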

26 DFA / Skipping DFA Is a “Skipping DFA” still a DFA? (That is, does it still only accept the Regular Languages?)

27 J. Strother Moore (UT Austin) Boyer-Moore Fast String Searching Algorithm (1977) Best case: N/(w+1) comparisons where N is the length of the text and w is the length of the search string Is this fast enough for a malware scanner?

28 Virus Detection Total number of signatures: 720,033 Nate Paul’s study Can we scan one input for many possible malware signatures quickly?

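29 Combining DFAs? Regular languages closed under union: [NFA diagram: a new start state q0 with ε-transitions to q_A0 and q_B0, the start states of the two machines, which continue on a to q_A1 and q_B1, …] How many states are there now?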
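To get a feel for the state count, here is a sketch of the product construction, which builds a DFA for the union directly (an alternative to the ε-transition NFA on the slide). Each DFA is represented, by assumption, as a tuple `(delta, start, accepting)` with a total transition dictionary `delta[(state, ch)]`; `union_dfa` and `accepts` are illustrative names. The result can have up to |Q_A| × |Q_B| states, so combining many machines this way multiplies state counts quickly.

```python
def union_dfa(a, b, alphabet):
    """Product construction: accept when either component DFA accepts."""
    delta_a, start_a, accept_a = a
    delta_b, start_b, accept_b = b
    start = (start_a, start_b)
    delta, accepting = {}, set()
    frontier, seen = [start], {start}
    while frontier:                    # explore only reachable state pairs
        state = frontier.pop()
        qa, qb = state
        if qa in accept_a or qb in accept_b:
            accepting.add(state)
        for ch in alphabet:
            nxt = (delta_a[(qa, ch)], delta_b[(qb, ch)])
            delta[(state, ch)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return delta, start, accepting, seen


def accepts(delta, start, accepting, word):
    state = start
    for ch in word:
        state = delta[(state, ch)]
    return state in accepting


# Example machines over {a, b}: "ends in a" and "has even length".
ENDS_IN_A = ({(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0}, 0, {1})
EVEN_LEN = ({(0, "a"): 1, (0, "b"): 1, (1, "a"): 0, (1, "b"): 0}, 0, {0})
DELTA, START, ACCEPTING, STATES = union_dfa(ENDS_IN_A, EVEN_LEN, "ab")
```

Here two 2-state machines yield a 4-state product; the same multiplication applied across hundreds of thousands of signatures is what makes a naive combination impractical.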

30 Signatures
First byte : Set of signatures
00000000 : ~720000/256
00000001 : ~720000/256
00000010 : ~720000/256
…
11111111 : ~720000/256

31 Try a Trie [Trie diagram: root q0 has edges 0x00, 0x01, 0x02, …, 0xFF to states q00, q01, q02, …, qFF; each of those has edges 0x00, 0x01, 0x02, …, 0xFF to states such as q0000, q0001, q0002, …, q01FF] 720000/(256*256) ~ 11 Alfred V. Aho and Margaret J. Corasick, 1975
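The trie idea, combined with failure links in the style of Aho and Corasick, lets one pass over the input check all signatures at once. The following is a compact sketch for literal byte patterns only (no wildcards); the names `build_automaton` and `scan_all` and the list-based representation are choices made for this illustration, not a description of any particular scanner.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie edges, failure links, and output lists for byte patterns."""
    goto, fail, out = [{}], [0], [[]]  # state 0 is the trie root
    for p in patterns:
        state = 0
        for byte in p:
            if byte not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        out[state].append(p)
    # Breadth-first pass: a state's failure link points to the longest
    # proper suffix of its path that is also a path in the trie.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for byte, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and byte not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(byte, 0)
            # Inherit patterns that end at the failure state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def scan_all(data, automaton):
    """Return (start_index, pattern) for every pattern occurrence in data."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, byte in enumerate(data):
        while state and byte not in goto[state]:
            state = fail[state]
        state = goto[state].get(byte, 0)
        for p in out[state]:
            hits.append((i - len(p) + 1, p))
    return hits
```

For instance, building the automaton for `[b"he", b"she", b"his", b"hers"]` and scanning `b"ushers"` reports `she` at index 1 and both `he` and `hers` at index 2, all in a single left-to-right pass.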

32 Scanner Demo http://www.virustotal.com

33 Evasive Malware Metamorphic Code: as virus propagates, each new copy is different How hard is it to automatically modify code without changing its behavior?

34 Detecting Evasive Malware Less exact signatures (e.g., W32.MyLife.E: 7a6172793230*40656d61696c2e636f6d) – Dangerous – start matching benign programs if you’re not careful! Behavioral signatures: match the behavior, not the program text – Undecidable in general (we’ll see in a few weeks) – Expensive and difficult in practice (but done by all decent scanners)

35 Faster String Scanning

36 Charge We focus on DFAs, NFAs, PDAs, CFGs, etc. as abstract models: number of states, time to process, etc. don’t matter. There are lots of real applications of these models, but in practice, what matters is different. If you have topics you want me to review, post comments (on today’s class announcement) by 5pm tomorrow.

