Published by Sylvia Harris. Modified over 9 years ago.
1
cs3102: Theory of Computation
Class 10: DFAs in Practice
Spring 2010, University of Virginia
David Evans
2
Menu
Today:
– Preparing for Exam 1
– Language class for Deterministic PDAs
– Applications of DFAs
Thursday:
– Exam Review (if you send questions and/or topics)
– Applications of probabilistic DFAs and Grammars
3
Exam 1
In class, next Tuesday, 2 March
Covers: Classes 1-9 (10 and 11), Sipser Ch 0-2, Problem Sets 1-3 + Comments
Note: unlike nearly all other sets we draw in this class, all of these sets are finite, and the size (roughly) represents the relative size.
4
What’s on the Exam?
Definitions
– Language, problem, sets
Constructing and understanding computing models
– Finite automata (DFA, NFA)
– Pushdown automata (DPDA, NPDA)
– Grammars (Context-Free Grammar)
Language Classes: Regular and Context Free
– Show a language is in the class
– Show a language is not in the class
– Prove or disprove a closure property
Proof Methods
– Proof by Induction
– Proof by Construction
– Understand and use the pumping lemmas for RL and CFL
The sample exam on the website should give you a good idea of what to expect.
Your exam will probably also have “what’s wrong with this proof” questions.
5
Exam 1 Notesheet
For Exam 1, you may use only:
– Your own brain and body
– A low-tech writing instrument (pen or pencil)
– A single page (both sides) of notes that you create
You may work with others to create your notes page.
6
Admiral Grace Hopper, John von Neumann, Albert Einstein
7
Exam Help Available
Office Hours:
– Thursdays, 8:30-9:30am
– Thursdays, after class
– Fridays, 10-11:30am (Sonali in Stacks)
– Mondays, 1:15-3pm
TA’s Exam Review Session:
– This Sunday, 5-6:30pm, Olsson 228E
8
Language classes (nested diagram): All Languages; Context-Free (CFG or NPDA); Regular Languages (DFA, NFA, RE, RG); Finite Languages
Example languages marked in the diagram: w, a^n, a^n b^n c^n, ww
Where are the languages recognized by a Deterministic PDA?
9
Proving Set Equivalence
A = B ⇔ A ⊆ B and B ⊆ A
Sets A and B are equivalent if A is a subset of B and B is a subset of A.
10
Proving Formalism Equivalence
12
Proving Formalism Non-Equivalence
13
Language classes (nested diagram): All Languages; Context-Free (CFG or NPDA); Regular Languages (DFA, NFA, RE, RG)
Example language marked in the diagram: a^n b^n
Which of these could be true?
14
Two candidate diagrams, each relating Regular Languages (DFA, NFA, RE, RG), Context-Free (NPDA), and DPDA.
How can we distinguish these two plausible possibilities?
15
Two candidate diagrams, each relating Regular Languages (DFA, NFA, RE, RG), Context-Free (NPDA), and DPDA.
How can we distinguish these two plausible possibilities?
– To show the DPDA languages are a proper subset of the context-free languages: find some language A that can be recognized by some NPDA but not by any DPDA.
– To show the two classes are the same: prove by construction that for any NPDA, there is a DPDA that recognizes the same language.
17
NPDA state diagram (figure). Transition labels, written as input, pop → push:
ε, ε → $
a, ε → +
ε, ε → ε
b, + → ε
ε, $ → ε
ε, ε → ε
b, + → ε
b, ε → ε
ε, $ → ε
18
A is not recognized by any DPDA. Proof by contradiction: assume there is a DPDA that recognizes A, then show how to construct an NPDA that recognizes some language we know is not context-free.
A is recognized by an NPDA. Proved by construction: we showed an NPDA that recognizes A.
19
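Proof by contradiction. Suppose there is a DPDA M that recognizes A. It must be in an accept state only after processing a^i b^i and a^i b^2i. Because M is deterministic, its run on a^i b^2i begins with exactly its run on a^i b^i.
Diagram of M: … a, α → β … b, α → β … (2i transitions, consuming a^i b^i, reaching an accept state) … b, α → β … (i more transitions, consuming b^i, reaching an accept state)
Construct M’: copy all the states in the second half, replacing b with c: … a, α → β … b, α → β … c, α → β …
What is the language of M’?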
20
Proof by contradiction. Suppose there is a DPDA M that recognizes A. It must be in an accept state only after processing a^i b^i and a^i b^2i.
Diagram of M: … a, α → β … b, α → β …
Construct M’: copy all the states in the second half, replacing b with c: … a, α → β … b, α → β … c, α → β …
The language of M’ is not a context-free language!
We have a contradiction: if A is in L(DPDA), we could use the DPDA that recognizes A to construct a PDA that recognizes a non-context-free language. Hence, A must not be in L(DPDA).
21
Language classes (nested diagram): All Languages; Context-Free Languages (CFG or NPDA); Deterministic Context-Free Languages (recognized by a DPDA, or DCFG); Regular Languages (DFA, NFA, RE, RG)
Example languages marked in the diagram: a^n b^n, A
Regular Languages ⊂ Deterministic Context-Free Languages ⊂ Context-Free Languages
22
DFAs in Practice
23
Malware Scanner
W32.Bolzano.Gen: 576a222bd2c20400558b4c240cd9ffff07fbffffff{0-2}5c4e544c445200{0-2}5c57494e4e545c73797374656d33325c6e746f736b726e6c2e65786500{0-29}3b4658
W32.MyLife.E: 7a6172793230*40656d61696c2e636f6d
Note: These are the signatures from ClamAV, an open source virus scanner.
Inputs shown in the diagram: Files, Network Traffic
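The signatures above are hex-encoded byte strings with two kinds of gaps: `*` (any number of bytes) and `{n-m}` (between n and m bytes). As a minimal sketch of what such a signature describes, the hypothetical helper `signature_to_regex` below translates one into a Python bytes regular expression. The helper name and the translation are assumptions made for illustration, not ClamAV's implementation, and Python's `re` engine is a backtracking matcher rather than a DFA.

```python
import re

def signature_to_regex(signature):
    """Translate a hex signature with '*' and '{n-m}' gaps into a bytes regex."""
    parts = []
    i = 0
    while i < len(signature):
        ch = signature[i]
        if ch == "*":
            # '*' stands for any number of arbitrary bytes.
            parts.append(b".*")
            i += 1
        elif ch == "{":
            # '{n-m}' stands for between n and m arbitrary bytes.
            close = signature.index("}", i)
            low, high = signature[i + 1:close].split("-")
            parts.append(b".{%d,%d}" % (int(low), int(high)))
            i = close + 1
        else:
            # Two hex digits encode one literal byte.
            parts.append(re.escape(bytes.fromhex(signature[i:i + 2])))
            i += 2
    # DOTALL makes '.' match every byte value, including 0x0a.
    return re.compile(b"".join(parts), re.DOTALL)

mylife = signature_to_regex("7a6172793230*40656d61696c2e636f6d")
print(bool(mylife.search(b"...zary20 and later @email.com...")))
```

Decoded, the W32.MyLife.E signature is the bytes `zary20`, then anything, then `@email.com`, which shows how little of the file a signature with a `*` gap actually pins down.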
24
String Matching
DFA: q0 -t-> q1 -r-> q2 -u-> q3 -t-> q4 -h-> q5
Input: We hold these truths to be self-evident, that …
How much work is it to scan a string of length N for a signature?
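A minimal sketch of the DFA scan for a single pattern follows. State k means "the last k characters read are the first k characters of the pattern", so `truth` needs six states, q0 through q5. The function names and the table-building method (a simple, unoptimized prefix check) are assumptions for illustration. Once the table is built, the scan takes exactly one transition per input character, so N steps for a string of length N.

```python
def build_matcher_dfa(pattern, alphabet):
    """Return delta, where delta[state][symbol] is the next state."""
    m = len(pattern)
    delta = [{} for _ in range(m + 1)]
    for state in range(m + 1):
        for symbol in alphabet:
            # Next state = length of the longest pattern prefix that is a
            # suffix of (characters matched so far + symbol).
            candidate = pattern[:state] + symbol
            k = min(m, len(candidate))
            while k > 0 and not candidate.endswith(pattern[:k]):
                k -= 1
            delta[state][symbol] = k
    return delta

def scan_with_dfa(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    delta = build_matcher_dfa(pattern, set(text) | set(pattern))
    accept = len(pattern)
    state = 0
    matches = []
    for position, symbol in enumerate(text):
        state = delta[state][symbol]      # one transition per input character
        if state == accept:
            matches.append(position - accept + 1)
    return matches

print(scan_with_dfa("We hold these truths to be self-evident, that", "truth"))
```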
25
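Faster String Matching
DFA: q0 -t-> q1 -r-> q2 -u-> q3 -t-> q4 -h-> q5
Input: We hold these truths to be self-evident, that …
Check the text character aligned with the last letter of truth first: s[4] = h? If that character does not occur in truth at all, no match can overlap it, so skip ahead five positions: s[9] = h? When the character is h, compare backward through the pattern: s[8] = t? s[7] = u? …
Skip table:
a, b, c, d, e, f, g, i, j, k, l, m, n, o, p, q, s, v, w, x, y, z: 5
h: 0
r: 3
t: 1
u: 2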
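The skipping idea can be sketched as follows. This is the simplified Horspool-style variant of Boyer-Moore, chosen here because it uses only the single skip table described on the slide; the function names are assumptions. For `truth` the sketch computes a skip of 5 for any character not in the pattern, and 3, 2, 1, 0 for r, u, t, h.

```python
def build_skip_table(pattern):
    """Distance from each character's last occurrence to the end of the pattern."""
    last = len(pattern) - 1
    return {ch: last - i for i, ch in enumerate(pattern)}

def skip_search(text, pattern):
    """Return (match start indices, number of text characters examined)."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    skip = build_skip_table(pattern)
    # Shift used after the window's last character matched pattern[-1]:
    # line up the previous occurrence of that character, or jump a full window.
    after_last = next(
        (m - 1 - i for i in range(m - 2, -1, -1) if pattern[i] == pattern[-1]), m
    )
    matches, probes = [], 0
    end = m - 1                      # text index aligned with the pattern's last char
    while end < len(text):
        probes += 1
        shift = skip.get(text[end], m)
        if shift == 0:
            # Last character matches: compare the rest right to left.
            j = m - 2
            while j >= 0:
                probes += 1
                if text[end - (m - 1) + j] != pattern[j]:
                    break
                j -= 1
            if j < 0:
                matches.append(end - m + 1)
            shift = after_last
        end += shift
    return matches, probes

print(build_skip_table("truth"))
print(skip_search("We hold these truths to be self-evident, that", "truth")[0])
```

When no character of the text occurs in the pattern, every probe skips a full pattern length, so a text of length N is scanned with about N/w probes for a pattern of length w.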
26
DFA / Skipping DFA Is a “Skipping DFA” still a DFA? (That is, does it still only accept the Regular Languages?)
27
J. Strother Moore (UT Austin)
Boyer-Moore Fast String Searching Algorithm (1977)
Best case: about N/w comparisons, where N is the length of the text and w is the length of the search string
Is this fast enough for a malware scanner?
28
Virus Detection Total number of signatures: 720,033 Nate Paul’s study Can we scan one input for many possible malware signatures quickly?
29
Combining DFAs?
Regular languages are closed under union: add a new start state q0 with ε-transitions to the start states qA0 and qB0 of the two machines (qA0 -a-> qA1 …, qB0 -a-> qB1 …).
How many states are there now?
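The ε-transition construction above yields an NFA with one more state than the two machines combined. To get a single DFA instead, one standard alternative is the product construction, sketched below: the combined machine tracks one state of each DFA at once and accepts if either component accepts. The dictionary layout and the two toy DFAs are assumptions made for this example. The number of states can reach |Q_A| × |Q_B|, which is why combining hundreds of thousands of signature DFAs this way is not obviously practical.

```python
def union_dfa(dfa_a, dfa_b, alphabet):
    """Product construction restricted to states reachable from the start."""
    start = (dfa_a["start"], dfa_b["start"])
    delta, accept = {}, set()
    seen = {start}
    frontier = [start]
    while frontier:
        state = frontier.pop()
        qa, qb = state
        if qa in dfa_a["accept"] or qb in dfa_b["accept"]:
            accept.add(state)
        for symbol in alphabet:
            nxt = (dfa_a["delta"][qa][symbol], dfa_b["delta"][qb][symbol])
            delta[(state, symbol)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return {"start": start, "delta": delta, "accept": accept, "states": seen}

def accepts(dfa, text):
    state = dfa["start"]
    for symbol in text:
        state = dfa["delta"][(state, symbol)]
    return state in dfa["accept"]

# Toy machine A: strings containing at least one 'a'.
contains_a = {"start": 0, "accept": {1},
              "delta": {0: {"a": 1, "b": 0}, 1: {"a": 1, "b": 1}}}
# Toy machine B: strings of even length.
even_length = {"start": 0, "accept": {0},
               "delta": {0: {"a": 1, "b": 1}, 1: {"a": 0, "b": 0}}}

combined = union_dfa(contains_a, even_length, "ab")
print(len(combined["states"]))
```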
30
Signatures
First byte: Set of signatures
00000000: ~720000/256
00000001: ~720000/256
00000010: ~720000/256
…
11111111: ~720000/256
31
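Try a Trie
Level 1: q0 has transitions on 0x00, 0x01, 0x02, …, 0xFF to q00, q01, q02, …, qFF
Level 2: each of those states has transitions on 0x00, 0x01, 0x02, …, 0xFF (q0000, q0001, q0002, …, q01FF)
Signatures remaining after two bytes: 720000/(256*256) ~ 11
Alfred V. Aho and Margaret J. Corasick, 1975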
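A sketch of the kind of automaton the Aho and Corasick citation refers to: a trie of all the patterns, plus a failure link from each node to the longest proper suffix of its path that is also a path in the trie. With those links the input is scanned once, whatever the number of signatures. The function names are assumptions, the sketch assumes non-empty patterns with no gaps, and the toy signatures in the usage lines are made up, not real malware signatures.

```python
from collections import deque

def build_automaton(signatures):
    """Build trie and failure links. Nodes are integers; node 0 is the root."""
    goto = [{}]      # goto[node][byte] -> child node
    fail = [0]       # fail[node] -> node for the longest proper suffix in the trie
    output = [[]]    # output[node] -> names of signatures that end at this node
    for name, pattern in signatures.items():
        node = 0
        for byte in pattern:
            if byte not in goto[node]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[node][byte] = len(goto) - 1
            node = goto[node][byte]
        output[node].append(name)
    # Breadth-first order guarantees a node's failure target is finished first.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for byte, child in goto[node].items():
            f = fail[node]
            while f and byte not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(byte, 0)
            output[child] = output[child] + output[fail[child]]
            queue.append(child)
    return goto, fail, output

def scan_all(data, automaton):
    """Return (end index, signature name) for every match, in one pass over data."""
    goto, fail, output = automaton
    node, hits = 0, []
    for position, byte in enumerate(data):
        while node and byte not in goto[node]:
            node = fail[node]
        node = goto[node].get(byte, 0)
        for name in output[node]:
            hits.append((position, name))
    return hits

toy = build_automaton({"he": b"he", "she": b"she", "hers": b"hers"})
print(scan_all(b"ushers", toy))
```

A dictionary per node keeps the sketch short; a real scanner would more likely use a dense 256-entry table for the first levels, as the slide's diagram suggests.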
32
Scanner Demo http://www.virustotal.com
33
Evasive Malware Metamorphic Code: as virus propagates, each new copy is different How hard is it to automatically modify code without changing its behavior?
34
Detecting Evasive Malware
Less exact signatures (e.g., W32.MyLife.E: 7a6172793230*40656d61696c2e636f6d)
– Dangerous: start matching benign programs if you’re not careful!
Behavioral signatures: match the behavior, not the program text
– Undecidable in general (we’ll see in a few weeks)
– Expensive and difficult in practice (but done by all decent scanners)
35
Faster String Scanning
36
Charge
We focus on DFAs, NFAs, PDAs, CFGs, etc. as abstract models: number of states, time to process, etc. don’t matter.
There are lots of real applications of these models, but in practice, what matters is different.
If you have topics you want me to review, post comments (on today’s class announcement) by 5pm tomorrow.