cs3102: Theory of Computation. Class 10: DFAs in Practice. Spring 2010, University of Virginia, David Evans.

Presentation transcript:


Menu
Today:
– Preparing for Exam 1
– Language class for Deterministic PDAs
– Applications of DFAs
Thursday:
– Exam Review (if you send questions and/or topics)
– Applications of probabilistic DFAs and Grammars

Exam 1
In class, next Tuesday, 2 March
Covers:
[Figure: diagram of overlapping sets labeled Classes 1-9 (10 and 11); Sipser Ch 0-2; Problem Sets Comments; Exam 1.]
Note: unlike nearly all other sets we draw in this class, all of these sets are finite, and the size (roughly) represents the relative size.

What’s on the Exam?
Definitions
– Language, problem, sets
Constructing and understanding computing models
– Finite automata (DFA, NFA)
– Pushdown automata (DPDA, NPDA)
– Grammars (Context-Free Grammar)
Language Classes: Regular and Context Free
– Show a language is in the class
– Show a language is not in the class
– Prove or disprove a closure property
Proof Methods
– Proof by Induction
– Proof by Construction
– Understand and use the pumping lemmas for RL and CFL
The sample exam on the website should give you a good idea of what to expect.
Your exam will probably also have “what’s wrong with this proof” questions.

Exam 1 Notesheet
For Exam 1, you may use only:
– Your own brain and body
– A low-tech writing instrument (pen or pencil)
– A single page (both sides) of notes that you create
You may work with others to create your notes page.

Admiral Grace Hopper John von Neumann Albert Einstein

Exam Help Available
Office Hours:
– Thursdays, 8:30-9:30am
– Thursdays, after class
– Fridays, 10-11:30am (Sonali in Stacks)
– Mondays, 1:15-3pm
TA’s Exam Review Session
– This Sunday, 5-6:30pm, Olsson 228E

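[Figure: nested language classes, from outermost to innermost: All Languages; Context-Free (CFG or NPDA); Regular Languages (DFA, NFA, RE, RG); Finite Languages. Example languages marked: $w$, $a^n$, $a^n b^n c^n$, $ww$.]
Where are the languages recognized by a Deterministic PDA?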

Proving Set Equivalence
$A = B \iff A \subseteq B \text{ and } B \subseteq A$
Sets A and B are equivalent if A is a subset of B and B is a subset of A.
[Figure: Venn diagrams illustrating $A \subseteq B$ and $B \subseteq A$.]

Proving Formalism Equivalence

Proving Formalism Non-Equivalence

[Figure: All Languages; Context-Free (CFG or NPDA); Regular Languages (DFA, NFA, RE, RG); the language $a^n b^n$ is marked.]
Which of these could be true?

[Figure: two candidate diagrams, each showing Regular Languages (DFA, NFA, RE, RG), Context-Free (NPDA), and DPDA. In one, the DPDA languages are a proper subset of the context-free languages, with a language A lying outside the DPDA languages; in the other, the DPDA languages are the same as the context-free languages.]
How can we distinguish these two plausible possibilities?
– Find some language A that can be recognized by some NPDA but not by any DPDA.
– Prove by construction: for any NPDA, there is a DPDA that recognizes the same language.

[Figure: NPDA that recognizes A. Transition labels: ε, ε → $; a, ε → +; ε, ε → ε; b, + → ε; ε, $ → ε; ε, ε → ε; b, + → ε; b, ε → ε; ε, $ → ε]

A is recognized by an NPDA. Proved by construction: we showed an NPDA that recognizes A.
A is not recognized by any DPDA. Proof by contradiction: assume there is a DPDA that recognizes A, and show how to construct an NPDA that recognizes some language we know is not context-free.
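A rough sketch of why nondeterminism helps here, assuming (from the proof on the following slides) that A is the set of strings $a^i b^i$ together with the strings $a^i b^{2i}$. The function names `accepts_branch` and `accepts_A` are illustrative only, not from the slides. Each branch behaves like one deterministic stack machine; the NPDA's initial guess is modeled by simply trying both branches.

```python
def accepts_branch(s, bs_per_a):
    """Simulate one branch: push '+' per 'a', then pop one '+' per
    bs_per_a consecutive 'b's. Accept if the stack is back to '$'
    exactly when the input is exhausted."""
    stack = ["$"]
    pos = 0
    # Phase 1: read a's, pushing a marker for each.
    while pos < len(s) and s[pos] == "a":
        stack.append("+")
        pos += 1
    # Phase 2: for each marker, consume bs_per_a b's.
    while stack[-1] == "+":
        if s[pos:pos + bs_per_a] != "b" * bs_per_a:
            return False
        stack.pop()
        pos += bs_per_a
    # Accept only if the bottom marker is exposed and no input remains.
    return pos == len(s)


def accepts_A(s):
    # The nondeterministic guess: one b per a, or two b's per a.
    return accepts_branch(s, 1) or accepts_branch(s, 2)
```

A single deterministic machine cannot make this guess up front, which is the intuition the contradiction argument makes precise.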

Proof by contradiction. Suppose there is a DPDA M that recognizes A. It must be in an accept state only after processing $a^i b^i$ and $a^i b^{2i}$.
[Figure: M, with transitions labeled a, α → β and b, α → β. The first half takes 2i transitions, consuming $a^i b^i$; the second half takes i transitions labeled b, α → β, consuming $b^i$.]
Construct M′: copy all the states on the second half, replacing b with c, so the copied transitions are labeled c, α → β.
What is the language of M′? Not a context-free language!
We have a contradiction: if A is in L(DPDA), we could use the DPDA that recognizes A to construct an NPDA that recognizes a non-context-free language! Hence, A must not be in L(DPDA).

[Figure: nested language classes, from outermost to innermost: All Languages; Context-Free Languages (CFG or NPDA); Deterministic Context-Free Languages (recognized by a DPDA, or DCFG); Regular Languages (DFA, NFA, RE, RG). $a^n b^n$ is deterministic context-free; A is context-free but not deterministic context-free.]

DFAs in Practice

Malware Scanner
W32.Bolzano.Gen: 576a222bd2c b4c240cd9ffff 07fbffffff{0-2}5c4e544c445200{0-2} 5c57494e4e545c d 33325c6e746f736b726e6c2e {0-29}3b4658
W32.MyLife.E: 7a *40656d 61696c2e636f6d
Note: These are signatures from ClamAV, an open source virus scanner.
[Figure labels: Files, Network Traffic]

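String Matching
[Figure: DFA with states q0, q1, q2, q3, q4, q5 and transitions labeled t, r, u, t, h, matching the signature "truth".]
Input: We hold these truths to be self-evident, that …
How much work is it to scan a string of length N for a signature?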
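To make the cost question concrete, here is a minimal sketch of a matching DFA for a single signature. The names `build_match_dfa` and `scan` are illustrative, not from the slides, and the table is built in a deliberately simple (not the fastest) way. State k means the last k characters read match the first k characters of the pattern, so the scan does one table lookup per input character.

```python
def build_match_dfa(pattern):
    """Return delta, where delta[state][ch] is the next state.
    State k means: the longest suffix of the input read so far that is
    a prefix of pattern has length k. State len(pattern) is accepting."""
    alphabet = set(pattern)
    delta = [dict() for _ in range(len(pattern) + 1)]
    for state in range(len(pattern) + 1):
        for ch in alphabet:
            seen = pattern[:state] + ch
            # Longest prefix of pattern that is a suffix of what was seen.
            k = min(len(pattern), len(seen))
            while k > 0 and not seen.endswith(pattern[:k]):
                k -= 1
            delta[state][ch] = k
    return delta


def scan(text, pattern):
    """Return the index of the first match of pattern in text, or -1."""
    delta = build_match_dfa(pattern)
    state = 0
    for i, ch in enumerate(text):
        # Characters that do not occur in the pattern send us back to q0.
        state = delta[state].get(ch, 0)
        if state == len(pattern):
            return i - len(pattern) + 1
    return -1
```

Once the table is built, the scan looks at each of the N input characters exactly once.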

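Faster String Matching
[Figure: the DFA for "truth" (states q0 through q5) scanning "We hold these truths to be self-evident, that …", comparing the pattern against the text from right to left.]
s[5] = h? s[10] = h? s[9] = t? s[8] = u?
Skip table for "truth":
letters not in the pattern (a, b, c, d, e, f, g, i, j, k, l, m, n, o, p, q, s, v, w, x, y, z): 5
h: 0
r: 3
t: 1
u: 2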

DFA / Skipping DFA Is a “Skipping DFA” still a DFA? (That is, does it still only accept the Regular Languages?)

J. Strother Moore (UT Austin)
Boyer-Moore Fast String Searching Algorithm (1977)
Best case: about N/w comparisons, where N is the length of the text and w is the length of the search string.
Is this fast enough for a malware scanner?
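A hedged sketch of the skipping idea, using the Horspool simplification (only a bad-character shift table) rather than the full Boyer-Moore algorithm. The function name `horspool_search` and the comparison counter are illustrative additions. The pattern is compared right to left, and on a mismatch the window slides by an amount looked up from the text character under the window's last position.

```python
def horspool_search(text, pattern):
    """Return (index of first match or -1, number of character comparisons).
    Assumes a non-empty pattern."""
    m, n = len(pattern), len(text)
    # Shift for each character: distance from its last occurrence
    # (excluding the final pattern position) to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    comparisons = 0
    pos = 0
    while pos + m <= n:
        j = m - 1
        # Compare right to left within the current window.
        while j >= 0:
            comparisons += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            return pos, comparisons
        # Characters absent from the table allow a full shift of m.
        pos += shift.get(text[pos + m - 1], m)
    return -1, comparisons
```

When most text characters do not occur in the pattern, most windows are rejected after a single comparison and the window jumps a full pattern length.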

Virus Detection Total number of signatures: 720,033 Nate Paul’s study Can we scan one input for many possible malware signatures quickly?

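Combining DFAs?
Regular languages are closed under union:
[Figure: a new start state q0 with ε-transitions to q_A0 and q_B0, the start states of machines A and B; each machine then continues with its own transitions (a to q_A1, a to q_B1, …).]
How many states are there now?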

Signatures
[Table: First byte | Set of signatures. Each row, one per possible first byte, holds roughly ~720000/256 signatures.]

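Try a Trie
[Figure: trie over signature bytes. The root q0 branches on the first byte (0x00, 0x01, 0x02, …, 0xFF) to states q00, q01, q02, …, qFF; the next level branches on the second byte (0x00, 0x01, 0x02, …, 0xFF) to states q0000, q0001, q0002, …, q01FF.]
~720000/(256*256) ~ 11
Alfred V. Aho and Margaret J. Corasick, 1975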
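A compact sketch of a trie-based multi-signature scanner in the style of Aho and Corasick. This is not ClamAV's implementation; the names `build_automaton` and `scan_signatures` are illustrative, and the example signatures in the usage note are toy values, not real malware signatures. The trie shares common prefixes among signatures, and failure links let the scan continue without backing up in the input.

```python
from collections import deque


def build_automaton(signatures):
    """Build trie transitions (goto), failure links (fail), and the list
    of signatures recognized on reaching each state (out)."""
    goto, fail, out = [{}], [0], [[]]
    for sig in signatures:
        state = 0
        for byte in sig:
            if byte not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        out[state].append(sig)
    # Breadth-first pass: depth-1 states keep fail = 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for byte, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and byte not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(byte, 0)
            # Inherit matches that end at the failure state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def scan_signatures(data, automaton):
    """Return (start_index, signature) for every match in data."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, byte in enumerate(data):
        while state and byte not in goto[state]:
            state = fail[state]
        state = goto[state].get(byte, 0)
        for sig in out[state]:
            hits.append((i - len(sig) + 1, sig))
    return hits
```

For example, with the toy signatures `[b"he", b"she", b"his", b"hers"]`, scanning `b"ushers"` reports `she` at index 1 and both `he` and `hers` at index 2, all in a single pass over the input.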

Scanner Demo

Evasive Malware Metamorphic Code: as virus propagates, each new copy is different How hard is it to automatically modify code without changing its behavior?

Detecting Evasive Malware
Less exact signatures (e.g., W32.MyLife.E: 7a *40656d61696c2e636f6d)
– Dangerous: start matching benign programs if you’re not careful!
Behavioral signatures: match the behavior, not the program text
– Undecidable in general (we’ll see in a few weeks)
– Expensive and difficult in practice (but done by all decent scanners)

Faster String Scanning

Charge
We focus on DFAs, NFAs, PDAs, CFGs, etc. as abstract models: number of states, time to process, etc. don’t matter.
There are lots of real applications of these models, but in practice, what matters is different.
If you have topics you want me to review, post comments (on today’s class announcement) by 5pm tomorrow.