Text search and closure properties


Text search and closure properties
Andrej Bogdanov
CSCI 3130: Automata theory and formal languages
The Chinese University of Hong Kong, Fall 2010
http://www.cse.cuhk.edu.hk/~andrejb/csc3130

Text search

The program grep
grep -E regexp file
Searches for the occurrence of patterns matching a regular expression

regexp     language               meaning
cat|12     {cat, 12}              union
[abc]      {a, b, c}              shorthand for a|b|c
[ab][12]   {a1, a2, b1, b2}       concatenation
(ab)*      {ε, ab, abab, ...}     star
[ab]?      {ε, a, b}              zero or one
(cat)+     {cat, catcat, ...}     one or more
[ab]{2}    {aa, ab, ba, bb}       {n} copies
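The table can be spot-checked by enumerating short strings. The sketch below uses Python's `re` module as a stand-in for `grep -E` (an assumption: the two share the syntax of the operators shown here, but they are different tools), and the helper name `language` is made up for illustration.

```python
import re
from itertools import product


def language(pattern, alphabet, max_len):
    """Strings over `alphabet` of length <= max_len that `pattern` matches in full."""
    compiled = re.compile(pattern)
    matched = set()
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if compiled.fullmatch(s):
                matched.add(s)
    return matched


# Each call lists the part of the language that fits within max_len.
print(sorted(language(r"[ab][12]", "ab12", 2)))
print(sorted(language(r"[ab]?", "ab", 2)))
print(sorted(language(r"[ab]{2}", "ab", 2)))
print(sorted(language(r"(ab)*", "ab", 4)))
```

Note that `fullmatch` tests whether the whole string is in the language, whereas grep looks for the pattern anywhere in a line.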

Searching with grep
Words containing savor or savour:
grep -E 'savou?r' words
outsavor savor savored savorer savorily savoriness savoringly savorless savorous savorsome savory savour unsavored unsavoredly unsavoredness unsavorily unsavoriness unsavory
Words with 5 consecutive a or b:
grep -E '[ab]{5}' words
grabbable
Words matching zo+zo+:
grep -E 'zo+zo+' words
zoozoo

More grep commands
.       any symbol
[a-z]   anything in a range
\<      beginning of word (the beginning of the line when the file has one word per line)
$       end of line
Crossword clues:
1 If a DFA is too hard, I do an ...
2 CSCI 3130 homework makes me ...
3 If a DFA likes it, it is ...
4 $10000000 = $10?
5 I study 3130 hard because it will make me ...
[Figure: crossword grid; the legible entries include nfa, suffer, regular, and star]
grep -E '\<.ff.u..t' words

How do you look for...
Words that start in cat and have another cat:
grep -E '\<cat.*cat' words
Words with at least ten vowels?
grep -E '([aeiouy].*){10}' words
Words without any vowels? ([^R] means: does not contain R)
grep -E '\<[^AEIOUYaeiouy]*$' words
Words with exactly ten vowels?
grep -E '\<[^AEIOUYaeiouy]*([aeiouy][^AEIOUYaeiouy]*){10}$' words
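The "exactly n vowels" pattern is easier to test with a small n. This is a minimal sketch in Python's `re` module rather than grep, with `fullmatch` playing the role of the `\<` and `$` anchors; the function name `has_exactly_n_vowels` is hypothetical.

```python
import re

VOWELS = "aeiouy"
NON_VOWEL = "[^AEIOUYaeiouy]"


def has_exactly_n_vowels(word, n):
    """True if `word` is non-vowels, then n blocks of (one vowel + non-vowels)."""
    # Same shape as the grep pattern: [^V]*([v][^V]*){n}
    pattern = rf"{NON_VOWEL}*([{VOWELS}]{NON_VOWEL}*){{{n}}}"
    return re.fullmatch(pattern, word) is not None


print(has_exactly_n_vowels("education", 5))
print(has_exactly_n_vowels("education", 4))
```

As in the grep command, only lowercase vowels are counted, so a word containing an uppercase vowel matches for no value of n.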

How grep (could) work
regular expression → NFA → NFA without ε → DFA, which is then run on the input text file

Differences:
                       in class             in grep
[ab]?, a+, (cat){3}    not allowed          allowed
input handling         matches whole input  looks for pattern
output                 accept/reject        finds pattern

Implementation of grep
How do you handle expressions like:

[ab]?       zero or one          → ()|[ab]        R? → ε|R
(cat)+      one or more          → (cat)(cat)*    R+ → RR*
a{3}        {n} copies           → aaa            R{n} → RR...R (n times)
[^aeiouy]   not containing any   → ?
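One way to gain confidence in these rewrites is to compare both sides on every short string. The sketch below does that with Python's `re` module (again a stand-in for grep, not its implementation); `same_language` is a made-up helper, and agreement up to a length bound is evidence, not a proof.

```python
import re
from itertools import product


def same_language(p1, p2, alphabet, max_len):
    """True if p1 and p2 accept the same strings over `alphabet` up to max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(re.fullmatch(p1, s)) != bool(re.fullmatch(p2, s)):
                return False
    return True


print(same_language(r"[ab]?", r"()|[ab]", "ab", 3))        # R?   vs  ε|R
print(same_language(r"(cat)+", r"(cat)(cat)*", "cat", 6))  # R+   vs  RR*
print(same_language(r"a{3}", r"aaa", "a", 5))              # R{3} vs  RRR
```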

Closure properties

Example
The language L of strings that end in 101 is regular: (0+1)*101
How about the language L̄ of strings that do not end in 101?

Example
Hint: w does not end in 101 if and only if it ends in 000, 001, 010, 011, 100, 110 or 111, or it has length 0, 1, or 2
So L̄ can be described by the regular expression
(0+1)*(000+001+010+011+100+110+111) + ε + (0 + 1) + (0 + 1)(0 + 1)
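The hint can be checked by brute force. The sketch below translates the expression into Python `re` syntax (`+` becomes `|`, ε becomes an empty group), using the seven three-symbol endings other than 101, and compares it with a direct "does not end in 101" test; the name `disagreements` is illustrative.

```python
import re
from itertools import product

# (0+1)*(000+001+010+011+100+110+111) + ε + (0+1) + (0+1)(0+1)
COMPLEMENT = r"(0|1)*(000|001|010|011|100|110|111)|()|(0|1)|(0|1)(0|1)"


def disagreements(max_len):
    """Binary strings up to max_len where the regex and the direct test differ."""
    bad = []
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            w = "".join(bits)
            in_regex = re.fullmatch(COMPLEMENT, w) is not None
            if in_regex != (not w.endswith("101")):
                bad.append(w)
    return bad


print(disagreements(8))
```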

Complement
The complement L̄ of a language L is the set of all strings that are not in L
Examples (Σ = {0, 1}):
L1 = all strings that end in 101
L̄1 = all strings that do not end in 101 = all strings that end in 000, …, 111 (except 101) or have length 0, 1, or 2
L2 = 1* = {ε, 1, 11, 111, …}
L̄2 = all strings that contain at least one 0 = (0 + 1)*0(0 + 1)*

Example
The language L of strings that contain 101 is regular: (0+1)*101(0+1)*
How about the language L̄ of strings that do not contain 101?
You can write a regular expression, but it is a lot of work!

Closure under complement
If L is a regular language, so is L̄.
To argue this, we can use any of the equivalent definitions for regular languages: regular expression, NFA, DFA
The DFA definition will be most convenient: we assume L has a DFA, and show L̄ also has a DFA

Arguing closure under complement
Suppose L is regular, then it has a DFA M, which accepts L
Now consider the DFA M' with the accepting and rejecting states of M reversed
M' accepts exactly the strings not in L; this is exactly L̄
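The construction is short enough to run. Below is a minimal sketch, assuming a dictionary-based DFA representation and a hand-built four-state DFA for "ends in 101" (both are illustrative choices, not taken from the slides).

```python
from dataclasses import dataclass


@dataclass
class DFA:
    states: set
    delta: dict      # (state, symbol) -> state
    start: str
    accepting: set

    def accepts(self, w):
        state = self.start
        for symbol in w:
            state = self.delta[state, symbol]
        return state in self.accepting


def complement(m):
    """Same DFA with accepting and rejecting states swapped."""
    return DFA(m.states, m.delta, m.start, m.states - m.accepting)


# State qk remembers that the last k symbols read are the first k symbols of 101.
ENDS_IN_101 = DFA(
    states={"q0", "q1", "q2", "q3"},
    delta={
        ("q0", "0"): "q0", ("q0", "1"): "q1",
        ("q1", "0"): "q2", ("q1", "1"): "q1",
        ("q2", "0"): "q0", ("q2", "1"): "q3",
        ("q3", "0"): "q2", ("q3", "1"): "q1",
    },
    start="q0",
    accepting={"q3"},
)
NOT_ENDS_IN_101 = complement(ENDS_IN_101)

print(ENDS_IN_101.accepts("0101"), NOT_ENDS_IN_101.accepts("0101"))
```

The swap works because a DFA ends in exactly one state on every input, so each string is accepted by exactly one of the two machines.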

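Food for thought
Can we do the same thing with an NFA? NO!
[Figure: an NFA for (0+1)*10 with states q0, q1, q2, where q0 loops on 0, 1 and q2 is accepting; after swapping accepting and rejecting states the NFA accepts (0+1)*, which is not the complement]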
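The failure can be seen by simulating the NFA. The transition table below is an assumed reading of the diagram (q0 loops on 0 and 1, q0 goes to q1 on 1, q1 goes to q2 on 0), and `nfa_accepts` is an illustrative helper that tracks the set of reachable states.

```python
def nfa_accepts(delta, start, accepting, w):
    """Subset simulation: accept if any reachable state after reading w is accepting."""
    current = {start}
    for symbol in w:
        current = {t for s in current for t in delta.get((s, symbol), set())}
    return bool(current & accepting)


STATES = {"q0", "q1", "q2"}
DELTA = {
    ("q0", "0"): {"q0"},
    ("q0", "1"): {"q0", "q1"},
    ("q1", "0"): {"q2"},
}
ACCEPTING = {"q2"}            # NFA for (0+1)*10
SWAPPED = STATES - ACCEPTING  # accepting and rejecting states exchanged

# The string 10 is accepted by both machines, so the swap is not a complement.
print(nfa_accepts(DELTA, "q0", ACCEPTING, "10"))
print(nfa_accepts(DELTA, "q0", SWAPPED, "10"))
```

An NFA accepts when some computation path accepts, so after the swap a string is rejected only if every path ends in an old accepting state. Here q0 is always reachable, which is why the swapped machine accepts everything.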

Intersection
The intersection L ∩ L' is the set of strings that are in both L and L'
Examples:
L = (0 + 1)*11, L' = 1*, L ∩ L' = 1*11
L = (0 + 1)*10, L' = 1*, L ∩ L' = ∅
If L, L' are regular, is L ∩ L' also regular?

Closure under intersection
If L and L' are regular languages, so is L ∩ L'.
To argue this, we can use any of the equivalent definitions for regular languages: regular expression, NFA, DFA
Suppose L and L' have DFAs, call them M and M'
Goal: Construct a DFA (or NFA) for L ∩ L'

An example
M: L = even number of 0s
M': L' = odd number of 1s
[Figure: the DFAs M and M', and the combined DFA with states (r0, s0), (r0, s1), (r1, s0), (r1, s1)]
L ∩ L' = even number of 0s and odd number of 1s

Closure under intersection

                   M and M'                 DFA for L ∩ L'
states             Q = {r1, ..., rn}        Q × Q' = {(r1, s1), (r1, s2), ..., (r2, s1), ..., (rn, sn')}
                   Q' = {s1, ..., sn'}
start state        ri for M, sj for M'      (ri, sj)
accepting states   F for M, F' for M'       F × F' = {(ri, sj): ri ∈ F, sj ∈ F'}

Whenever M is in state ri and M' is in state sj, the DFA for L ∩ L' will be in state (ri, sj)

Closure under intersection

                   M and M'                    DFA for L ∩ L'
transitions        ri → rj on a in M           (ri, si') → (rj, sj') on a
                   si' → sj' on a in M'
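The states, start state, accepting states and transitions above translate directly into code. This is a sketch under an assumed dictionary representation of DFAs; the function names and the two example machines (even number of 0s, odd number of 1s) are written out here for illustration.

```python
from itertools import product


def product_dfa(m1, m2, alphabet):
    """DFA for the intersection: run m1 and m2 in lockstep on pairs of states."""
    states = set(product(m1["states"], m2["states"]))
    delta = {
        ((r, s), a): (m1["delta"][r, a], m2["delta"][s, a])
        for (r, s) in states
        for a in alphabet
    }
    accepting = {(r, s) for (r, s) in states
                 if r in m1["accepting"] and s in m2["accepting"]}
    return {"states": states, "delta": delta,
            "start": (m1["start"], m2["start"]), "accepting": accepting}


def accepts(m, w):
    state = m["start"]
    for a in w:
        state = m["delta"][state, a]
    return state in m["accepting"]


EVEN_ZEROS = {  # r0 = even number of 0s so far, r1 = odd
    "states": {"r0", "r1"}, "start": "r0", "accepting": {"r0"},
    "delta": {("r0", "0"): "r1", ("r0", "1"): "r0",
              ("r1", "0"): "r0", ("r1", "1"): "r1"},
}
ODD_ONES = {  # s0 = even number of 1s so far, s1 = odd
    "states": {"s0", "s1"}, "start": "s0", "accepting": {"s1"},
    "delta": {("s0", "0"): "s0", ("s0", "1"): "s1",
              ("s1", "0"): "s1", ("s1", "1"): "s0"},
}
BOTH = product_dfa(EVEN_ZEROS, ODD_ONES, "01")

print(sorted(BOTH["states"]))
print(accepts(BOTH, "001"), accepts(BOTH, "01"))
```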

Reversal
The reversal w^R of a string w is w written backwards: w = cave, w^R = evac
The reversal L^R of a language L is the language obtained by reversing all its strings: L = {cat, dog}, L^R = {tac, god}

Reversal of regular languages
L = all strings that end in 101 is regular: (0+1)*101
How about L^R? This is the language of all strings beginning in 101
Yes, because it is represented by 101(0+1)*

Closure under reversal
If L is a regular language, so is L^R.
How do we argue? regular expression, NFA, or DFA

Arguing closure under reversal
Take a regular expression E for L
We will show how to reverse E
A regular expression can be of the following types:
The special symbols ∅ and ε
Alphabet symbols like a, b
The union, concatenation, or star of simpler expressions

Proof of closure under reversal

regular expression E     reversal E^R
∅                        ∅
ε                        ε
a (alphabet symbol)      a
E1 + E2                  E1^R + E2^R
E1E2                     E2^R E1^R
E1*                      (E1^R)*
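The table is a recursive definition, so it can be written as a recursive function. The sketch below represents regular expressions as nested tuples (an encoding chosen here for illustration) and converts them to Python `re` syntax only to test the result on short strings.

```python
import re

EMPTY = ("empty",)   # the empty language ∅
EPS = ("eps",)       # the empty string ε


def sym(a): return ("sym", a)
def union(e1, e2): return ("union", e1, e2)
def concat(e1, e2): return ("concat", e1, e2)
def star(e1): return ("star", e1)


def reverse(e):
    """Apply the reversal rules, one case per row of the table."""
    kind = e[0]
    if kind in ("empty", "eps", "sym"):
        return e
    if kind == "union":
        return union(reverse(e[1]), reverse(e[2]))
    if kind == "concat":
        return concat(reverse(e[2]), reverse(e[1]))  # order of the parts flips
    if kind == "star":
        return star(reverse(e[1]))
    raise ValueError(f"unknown expression type: {kind}")


def to_python_regex(e):
    """Translate the tuple encoding into Python re syntax (used for testing only)."""
    kind = e[0]
    if kind == "empty":
        return "(?!)"    # a lookahead that never succeeds, so nothing matches
    if kind == "eps":
        return "(?:)"
    if kind == "sym":
        return re.escape(e[1])
    if kind == "union":
        return f"(?:{to_python_regex(e[1])}|{to_python_regex(e[2])})"
    if kind == "concat":
        return f"(?:{to_python_regex(e[1])}{to_python_regex(e[2])})"
    return f"(?:{to_python_regex(e[1])})*"


# E = (0+1)*101, built as (((0+1)* 1) 0) 1
E = concat(concat(concat(star(union(sym("0"), sym("1"))), sym("1")), sym("0")), sym("1"))
E_R = reverse(E)

print(bool(re.fullmatch(to_python_regex(E), "00101")))
print(bool(re.fullmatch(to_python_regex(E_R), "10100")))
```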

A question
L^DUP = {ww: w ∈ L}
Ex. L = {cat, dog}, L^DUP = {catcat, dogdog}
If L is regular, is L^DUP also regular?
How do we argue? regular expression, NFA, or DFA

A question
Let's try with regular expression: is L^DUP = LL?
L = {a, b}: L^DUP = {aa, bb}, but LL = {aa, ab, ba, bb}, so L^DUP ≠ LL
Let's try with NFA:
[Figure: states q0 and q1 around a copy of the NFA for L]
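For finite languages the two operations are easy to compute side by side. A minimal sketch, with `dup` and `concat` as illustrative names for L^DUP and concatenation:

```python
def dup(language):
    """L^DUP: every string of L followed by a copy of itself."""
    return {w + w for w in language}


def concat(l1, l2):
    """Concatenation: any string of l1 followed by any string of l2."""
    return {u + v for u in l1 for v in l2}


L = {"a", "b"}
print(sorted(dup(L)))
print(sorted(concat(L, L)))
```

Concatenation picks the two halves independently, while L^DUP forces them to be the same string, so L^DUP is only a subset of LL.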

An example
L = 0*1 = {1, 01, 001, 0001, ...} is regular
L^DUP = {11, 0101, 001001, 00010001, ...} = {0^n 1 0^n 1: n ≥ 0}
Let's try to design an NFA for L^DUP

An example
L^DUP = {11, 0101, 001001, 00010001, ...} = {0^n 1 0^n 1: n ≥ 0}
[Figure: attempted automaton for L^DUP, with labels 1, 01, 001, 0001, ...]