Theory of computing, part 1. 1Introduction 2Theoretical background Biochemistry/molecular biology 3Theoretical background computer science 4History of.


Theory of computing, part 1

1 Introduction; 2 Theoretical background: biochemistry/molecular biology; 3 Theoretical background: computer science; 4 History of the field; 5 Splicing systems; 6 P systems; 7 Hairpins; 8 Detection techniques; 9 Micro technology introduction; 10 Microchips and fluidics; 11 Self assembly; 12 Regulatory networks; 13 Molecular motors; 14 DNA nanowires; 15 Protein computers; 16 DNA computing: summary; 17 Presentation of essay and discussion. Course outline

Old computers

Abacus

Charles Babbage: born December 26, 1791 in Teignmouth, Devonshire, UK; died 1871, London. Known to some as the Father of Computing for his contributions to the basic design of the computer through his Analytical Engine. The difference engine (1832)

The ENIAC machine occupied a room 30x50 feet. The controls are at the left, and a small part of the output device is seen at the right. ENIAC (1946)

The IBM 360 was a revolutionary advance in computer system architecture, enabling a family of computers covering a wide range of price and performance. IBM 360 (1964)

Books

Introduction

Finite state machines (automata): pattern recognition, simple circuits (e.g. elevators, sliding doors). Automata with stack memory (pushdown automata): parsing computer languages. Automata with limited tape memory. Automata with infinite tape memory: called 'Turing machines', the most powerful of these models and, by the Church-Turing thesis, believed capable of computing anything that is computable. Models of computing

 Regular grammars  Context free grammars  Context sensitive grammars  Unrestricted grammars Chomsky hierarchy of grammars

Computers can be made to recognize, or accept, the strings of a language. There is a correspondence between the power of the computing model and the complexity of the languages that it can recognize! Finite automata accept only the languages of regular grammars. Pushdown automata can also accept the languages of context free grammars. Turing machines can accept the languages of all grammars. Computers can recognise languages

Languages

A set is a collection of things called its elements. If x is an element of set S, we can write this: x ∈ S. A set can be represented by naming all its elements, for example: S = {x, y, z}. There is no particular order or arrangement of the elements, and it doesn’t matter if some appear more than once. These are all the same set: {x, y, z} = {y, x, z} = {y, y, x, z, z}. Sets

A set with no elements is called the empty set, or a null set. It is denoted by ∅ = {}. If every element of set A is also an element of set B, then A is called a subset of B: A ⊆ B. The union of two sets is the set with all elements which appear in either set: C = A ∪ B. The intersection of two sets is the set with all the elements which appear in both sets: C = A ∩ B. Combining sets

A string is a sequence of symbols placed next to each other in juxtaposition. The symbols which make up a string are taken from a finite set called an alphabet. E.g. {a, b, c} is the alphabet for the string abbacb. A string with no symbols is called the empty string and is denoted ε. If Σ is an alphabet, the infinite set of all strings made up from Σ is denoted Σ*. E.g., if Σ = {a}, then Σ* = {ε, a, aa, aaa, …}. Alphabet and strings

 A language is a set of strings.  If Σ is an alphabet, then a language over Σ is a collection of strings whose components come from Σ.  So Σ* is the biggest possible language over Σ, and every other language over Σ is a subset of Σ*. Languages

Four simple examples of languages over an alphabet Σ are the sets ∅, {ε}, Σ, and Σ*. For example, if Σ = {a} then these four simple languages over Σ are ∅, {ε}, {a}, and {ε, a, aa, aaa, …}. Recall that ε is the empty string, so {ε} is the language containing only the empty string, while ∅ is the empty set. Σ* is an infinite set. Examples of languages

 The alphabet is Σ = {a,b,c,d,e…x,y,z}  The English language is made of strings formed from Σ: e.g. fun, excitement.  We could define the English Language as the set of strings over Σ which appear in the Oxford English dictionary (but it is clearly not a unique definition). Example: English

The natural operation of concatenation of strings places two strings in juxtaposition. For example, the concatenation of the two strings aab and ba is the string aabba. Use the name cat to denote this operation: cat(aab, ba) = aabba. Concatenation

 Languages are sets of strings, so they can be combined by the usual set operations of union, intersection, difference, and complement.  Also we can combine two languages L and M by forming the set of all concatenations of strings in L with strings in M. Combining languages

This new language is called the product of L and M and is denoted by L · M. A formal definition can be given as follows: L · M = {cat(s, t) | s ∈ L and t ∈ M}. For example, if L = {ab, ac} and M = {a, bc, abc}, then the product L · M is the language L · M = {aba, abbc, ababc, aca, acbc, acabc}. Products of languages
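The definition of the product can be tried out directly on finite languages. The following is a minimal Python sketch, assuming languages are represented as sets of strings; the function names `cat` and `product` are chosen here for illustration and mirror the notation on the slide.

```python
def cat(s, t):
    """Concatenate two strings, as in cat(aab, ba) = aabba."""
    return s + t


def product(L, M):
    """Product L · M: every string of L followed by every string of M."""
    return {cat(s, t) for s in L for t in M}


L = {"ab", "ac"}
M = {"a", "bc", "abc"}

LM = product(L, M)
ML = product(M, L)

print(sorted(LM))   # the six strings listed on the slide
print(LM == ML)     # compares L · M with M · L
```

Because the empty string is `""` in Python, `product(L, {""})` gives back `L`, and `product(L, set())` gives the empty language.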

The following simple properties hold for any language L: L · {ε} = {ε} · L = L and L · ∅ = ∅ · L = ∅. The product is not commutative. In other words, we can find two languages L and M such that L · M ≠ M · L. The product is associative. In other words, if L, M, and N are languages, then L · (M · N) = (L · M) · N. Properties of products

If L is a language, then the product L · L is denoted by L^2. The language product L^n for every n ∈ {0, 1, 2, …} is defined as follows: L^0 = {ε}, and L^n = L · L^(n-1) if n > 0. Powers of languages

For example, if L = {a, bb} then the first few powers of L are: L^0 = {ε}; L^1 = L = {a, bb}; L^2 = L · L = {aa, abb, bba, bbbb}; L^3 = L · L^2 = {aaa, aabb, abba, abbbb, bbaa, bbabb, bbbba, bbbbbb}. Example
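The recursive definition of L^n translates almost word for word into code. This is a small Python sketch under the assumption that languages are finite sets of strings and that the empty string is written `""`; `product` and `power` are illustrative names.

```python
def product(L, M):
    """Product L · M of two finite languages."""
    return {s + t for s in L for t in M}


def power(L, n):
    """L^n, using L^0 = {empty string} and L^n = L · L^(n-1)."""
    result = {""}          # L^0
    for _ in range(n):
        result = product(L, result)
    return result


L = {"a", "bb"}
for n in range(4):
    print(n, sorted(power(L, n)))
```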

If L is a language over Σ (i.e. L ⊆ Σ*) then the closure of L is the language denoted by L* and is defined as follows: L* = L^0 ∪ L^1 ∪ L^2 ∪ …. The positive closure of L is the language denoted by L+ and defined as follows: L+ = L^1 ∪ L^2 ∪ L^3 ∪ …. Closure of a language

It follows that L* = L+ ∪ {ε}. But it’s not necessarily true that L+ = L* − {ε}. For example, if we let our alphabet be Σ = {a} and our language be L = {ε, a}, then L+ = L*. Can you find a condition on a language L such that L+ = L* − {ε}? L* vs. L+
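Since L* and L+ are usually infinite, they can only be explored up to a length bound. The sketch below is one way to do that in Python; `closure_up_to` is a hypothetical helper that collects the strings of L^1, L^2, … no longer than `max_len`, which is enough to compare L* and L+ on the two cases discussed on the slide.

```python
def closure_up_to(L, max_len, positive=False):
    """Strings of L* (or of L+ if positive=True) with length <= max_len.

    Taking max(1, max_len) powers suffices: a string of length <= max_len
    has at most max_len non-empty factors, and extra empty factors are
    only possible when the empty string itself belongs to L.
    """
    result = set() if positive else {""}   # L^0 contributes only to L*
    current = {""}                         # L^0
    for _ in range(max(1, max_len)):
        current = {s + t for s in current for t in L if len(s + t) <= max_len}
        result |= current
    return result


with_empty = {"", "a"}     # contains the empty string
without_empty = {"a"}      # does not

print(closure_up_to(with_empty, 3, positive=True) == closure_up_to(with_empty, 3))
print(closure_up_to(without_empty, 3, positive=True)
      == closure_up_to(without_empty, 3) - {""})
```

The bounded comparison is only evidence, not a proof, but it points at the role played by the empty string.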

The closure of Σ coincides with our definition of Σ* as the set of all strings over Σ. In other words, we have a nice representation of Σ* as follows: Σ* = Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ …. Closure of an alphabet

Let L and M be languages over the alphabet Σ. Then: a) {ε}* = ∅* = {ε}; b) L* = L* · L* = (L*)*; c) ε ∈ L if and only if L+ = L*; d) (L* · M*)* = (L* ∪ M*)* = (L ∪ M)*; e) L · (M · L)* = (L · M)* · L. Properties of closure

Grammars

A grammar is a set of rules used to define the structure of the strings in a language. If L is a language over an alphabet Σ, then a grammar for L consists of a set of grammar rules of the following form: α → β, where α and β denote strings of symbols taken from Σ and from a set of grammar symbols (non-terminals) that is disjoint from Σ. Grammars

A grammar rule α → β is often called a production, and it can be read in any of several ways as follows: replace α by β; α produces β; α rewrites to β; α reduces to β. Productions

Every grammar has a special grammar symbol called a start symbol, and there must be at least one production with left side consisting of only the start symbol. For example, if S is the start symbol for a grammar, then there must be at least one production of the form S → β. Where to begin ……

1. An alphabet N of grammar symbols called non-terminals. (Usually upper case letters.) 2. An alphabet T of symbols called terminals. (Identical to the alphabet of the resulting language.) 3. A specific non-terminal called the start symbol. (Usually S.) 4. A finite set of productions of the form α → β, where α and β are strings over the alphabet N ∪ T. The 4 parts of a grammar

Let Σ = {a, b, c}. Then a grammar for the language Σ* can be described by the following four productions: S → ε, S → aS, S → bS, S → cS. Or in shorthand: S → ε | aS | bS | cS. S can be replaced by either ε, or aS, or bS, or cS. Example

If G is a grammar, then the language of G is the set of language strings derived from the start symbol of G. The language of G is denoted by: L(G) Grammar specifies the language

If G is a grammar with start symbol S and set of terminals T, then the language of G is the following set: L(G) = {s | s ∈ T* and S ⇒+ s}, where S ⇒+ s means that s can be derived from S in one or more steps. Grammar specifies the language

If the language is finite, then a grammar can consist of all productions of the form S → w for each string w in the language. For example, the language {a, ab} can be described by the grammar S → a | ab. Finite languages

 If the language is infinite, then some production or sequence of productions must be used repeatedly to construct the derivations.  Notice that there is no bound on the length of strings in an infinite language.  Therefore there is no bound on the number of derivation steps used to derive the strings.  If the grammar has n productions, then any derivation consisting of n + 1 steps must use some production twice Infinite languages

For example, the infinite language {a^n b | n ≥ 0} can be described by the grammar S → b | aS. To derive the string a^n b, use the production S → aS repeatedly (n times, to be exact) and then stop the derivation by using the production S → b. The production S → aS allows us to say: if S derives w, then it also derives aw. Infinite languages
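Derivations like the one above can be enumerated mechanically. The following Python sketch is one possible way to list the short strings a grammar produces; the representation (a dict from non-terminal to right-hand sides, upper-case letters as non-terminals, `""` for the empty string) and the name `generate` are assumptions made for this illustration. It always rewrites the leftmost non-terminal and prunes forms with too many terminals, so it assumes the grammar cannot keep growing a form without adding terminals.

```python
from collections import deque


def generate(productions, start="S", max_len=4):
    """Terminal strings of length <= max_len derivable from the start symbol."""
    found = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost non-terminal, if any.
        idx = next((i for i, ch in enumerate(form) if ch.isupper()), None)
        if idx is None:
            found.add(form)          # no non-terminals left: a language string
            continue
        for rhs in productions[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(1 for ch in new if not ch.isupper())
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found


grammar = {"S": ["b", "aS"]}         # S → b | aS
print(sorted(generate(grammar, max_len=4), key=len))
```

Each use of S → aS adds one a, and the single use of S → b ends the derivation, which is why the output consists of strings of the form a^n b.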

A production is called recursive if its left side occurs on its right side. For example, the production S → aS is recursive. A production S → α is indirectly recursive if S derives (in two or more steps) a sentential form that contains S. Recursion

For example, suppose we have the following grammar: S → b | aA, A → c | bS. The productions S → aA and A → bS are both indirectly recursive: S ⇒ aA ⇒ abS and A ⇒ bS ⇒ baA. Indirect recursion

 A grammar is recursive if it contains either a recursive production or an indirectly recursive production.  A grammar for an infinite language must be recursive!  However, a given language can have many grammars which could produce it. Recursion

Suppose M and N are languages. We can describe them with grammars that have disjoint sets of nonterminals. Assign the start symbols for the grammars of M and N to be A and B, respectively: M: A → …, N: B → …. Then we have the following rules for creating new languages and grammars. Combining grammars

The union of the two languages, M ∪ N, starts with the two productions S → A | B followed by the grammars of M and N: A → …, B → …. Combining grammars, union rule

Similarly, the language M · N starts with the production S → AB followed by, as above, A → …, B → …. Combining grammars, product rule

Finally, the grammar for the closure of a language, M*, starts with the production S → AS | ε followed by the grammar of M: A → …. Combining grammars, closure rule

Suppose we want to write a grammar for the following language: L = {ε, a, b, aa, bb, …, a^n, b^n, …}. L is the union of the two languages M = {a^n | n ∈ ℕ} and N = {b^n | n ∈ ℕ}. Example, union

Thus we can write a grammar for L as follows: S → A | B (union rule), A → ε | aA (grammar for M), B → ε | bB (grammar for N). Example, union

Suppose we want to write a grammar for the following language: L = {a^m b^n | m, n ∈ ℕ}. L is the product of the two languages M = {a^m | m ∈ ℕ} and N = {b^n | n ∈ ℕ}. Example, product

Thus we can write a grammar for L as follows: S → AB (product rule), A → ε | aA (grammar for M), B → ε | bB (grammar for N). Example, product

Suppose we want to construct the language L of all possible strings made up from zero or more occurrences of aa or bb. L = {aa, bb}* = M* M = {aa, bb} Example, closure

So we can write a grammar for L as follows: S → AS | ε (closure rule), A → aa | bb (grammar for {aa, bb}). Example, closure

We can simplify this grammar. Replace the occurrence of A in S → AS by the right side of A → aa to obtain the production S → aaS. Replace A in S → AS by the right side of A → bb to obtain the production S → bbS. This allows us to write the grammar in simplified form as: S → aaS | bbS | ε. An equivalent grammar

Language                         Grammar
{a, ab, abb, abbb}               S → a | ab | abb | abbb
{ε, a, aa, aaa, …}               S → aS | ε
{b, bbb, bbbbb, …, b^(2n+1), …}  S → bbS | b
{b, abc, aabcc, …, a^n b c^n, …} S → aSc | b
{ac, abc, abbc, …, a b^n c, …}   S → aBc, B → bB | ε
Some simple grammars

Regular languages

There are many possible ways of describing the regular languages:  Languages that are accepted by some finite automaton  Languages that are inductively formed from combining very simple languages  Those described by a regular expression  Any language produced by a grammar with a special, very restricted form What is a regular language?

We start with a very simple basis of languages and build more complex ones by combining them in particular ways. Basis: ∅, {ε} and {a} are regular languages for all a ∈ Σ. Induction: If L and M are regular languages, then the following languages are also regular: L ∪ M, L · M and L*. Building a regular language

For example, the basis of the definition gives us the following four regular languages over the alphabet Σ = {a, b}: ∅, {ε}, {a}, {b}. Sample building blocks

Regular languages over {a, b}. Language {ε, b}: we can write it as the union of the two regular languages {ε} and {b}: {ε, b} = {ε} ∪ {b}. Example 1

Language {a, ab}: we can write it as the product of the two regular languages {a} and {ε, b}: {a, ab} = {a} · {ε, b}. Example 2

Language {ε, b, bb, …, b^n, …}: it's just the closure of the regular language {b}: {b}* = {ε, b, bb, …, b^n, …}. Example 3

{a, ab, abb, …, ab^n, …} = {a} · {ε, b, bb, …, b^n, …} = {a} · {b}*. Example 4

{ε, a, b, aa, bb, …, a^n, …, b^m, …} = {a}* ∪ {b}*. Example 5

 A regular expression is basically a shorthand way of showing how a regular language is built from the basis  The symbols are nearly identical to those used to construct the languages, and any given expression has a language closely associated with it  For each regular expression E there is a regular language L(E) Regular expressions

The symbols of the regular expressions are distinct from those of the languages.
Regular expression    Language
∅                     L(∅) = ∅
ε                     L(ε) = {ε}
a                     L(a) = {a}
Basis of regular expressions and languages. Regular expressions versus languages

There are two binary operations on regular expressions (+ and ·) and one unary operator (*).
Regular expression    Language
R + S                 L(R + S) = L(R) ∪ L(S)
R · S, RS             L(R · S) = L(R) · L(S)
R*                    L(R*) = L(R)*
These are closely associated with the union, product and closure operations on the corresponding languages. Operators on regular expressions

Like the languages they represent, regular expressions can be built up inductively to form new regular expressions. Basis: ∅, ε and a are regular expressions for all a ∈ Σ. Induction: If R and S are regular expressions, then the following are also regular expressions: (R), R + S, R · S and R*. Building regular expressions

For example, here are a few of the infinitely many regular expressions over the alphabet Σ = {a, b}: ∅, ε, a, b, ε + b, b*, a + (b · a), (a + b) · a, a · b*, a* + b*. Regular expressions

To avoid using too many parentheses, we assume that the operations have the following hierarchy: * highest (do it first), then ·, then + lowest (do it last). For example, the regular expression a + b · a* can be written in fully parenthesized form as (a + (b · (a*))). Order of operations

Use juxtaposition instead of · whenever no confusion arises. For example, we can write the preceding expression as a + ba*. This expression is basically shorthand for the regular language {a} ∪ ({b} · ({a}*)). So you can see why it is useful to write an expression instead! Implicit products

Find the language of the regular expression a + bc*. L(a + bc*) = L(a) ∪ L(bc*) = L(a) ∪ (L(b) · L(c*)) = L(a) ∪ (L(b) · L(c)*) = {a} ∪ ({b} · {c}*) = {a} ∪ ({b} · {ε, c, c^2, …, c^n, …}) = {a} ∪ {b, bc, bc^2, …, bc^n, …} = {a, b, bc, bc^2, …, bc^n, …}. Example
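The same language can be checked with a practical regular expression engine. The sketch below uses Python's standard `re` module and assumes the usual translation of the notation: the union operator + becomes `|`, juxtaposition and * stay as they are. The helper name `in_language` is chosen for illustration.

```python
import re

# a + bc*  in the slides' notation corresponds to  a|bc*  in Python syntax.
PATTERN = re.compile(r"a|bc*")


def in_language(s):
    """True if the whole string s belongs to L(a + bc*)."""
    return PATTERN.fullmatch(s) is not None


for s in ["a", "b", "bc", "bccc", "", "ab", "c", "abc"]:
    print(repr(s), in_language(s))
```

`fullmatch` is used rather than `search` because membership in a language is a statement about the entire string, not about a substring.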

Many infinite languages are easily seen to be regular. For example, the language {a, aa, aaa, …, a^n, …} is regular because it can be written as the regular language {a} · {a}*, which is represented by the regular expression aa*. Regular language

The slightly more complicated language {ε, a, b, ab, abb, abbb, …, ab^n, …} is also regular because it can be represented by the regular expression ε + b + ab*. However, not all infinite languages are regular! Regular language

 Distinct regular expressions do not always represent distinct languages.  For example, the regular expressions a + b and b + a are different, but they both represent the same language, L(a + b) = L(b + a) = {a, b} Regular language

We say that regular expressions R and S are equal if L(R) = L(S) and we denote this equality by writing the following familiar relation: R = S Regular language

For example, we know that L (a + b) = {a, b} = {b, a} = L (b + a) Therefore we can write a + b = b + a We also have the equality (a + b) + (a + b) = a + b Regular language

Additive (+) properties: R + T = T + R; R + ∅ = ∅ + R = R; R + R = R; (R + S) + T = R + (S + T). These follow simply from the properties of the union of sets. Properties of regular expressions

Product (·) properties: R∅ = ∅R = ∅; Rε = εR = R; (RS)T = R(ST). Distributive properties: R(S + T) = RS + RT; (S + T)R = SR + TR. Properties of regular expressions

∅* = ε* = ε; R* = R*R* = (R*)* = R + R*; RR* = R*R; R(SR)* = (RS)*R; (R + S)* = (R*S*)* = (R* + S*)* = R*(SR*)*. Closure properties

Show that (ε + a + b)* = a*(ba*)*. (ε + a + b)* = (a + b)* (+ property) = a*(ba*)* (closure property). Example
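An identity between regular expressions can be sanity-checked by comparing the two languages on all short strings. This Python sketch does so for (a + b)* and a*(ba*)* using the standard `re` and `itertools` modules; `language_up_to` is a hypothetical helper, and agreement up to a length bound is supporting evidence rather than a proof.

```python
import re
from itertools import product


def language_up_to(pattern, alphabet, max_len):
    """All strings over alphabet, of length <= max_len, matched entirely by pattern."""
    regex = re.compile(pattern)
    matched = set()
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            s = "".join(letters)
            if regex.fullmatch(s):
                matched.add(s)
    return matched


lhs = language_up_to(r"(?:a|b)*", "ab", 6)      # (a + b)*
rhs = language_up_to(r"a*(?:ba*)*", "ab", 6)    # a*(ba*)*
print(lhs == rhs, len(lhs))
```

The intuition behind the right-hand side is that any string over {a, b} is a block of a's followed by zero or more groups, each consisting of one b and the a's that follow it.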

Regular grammars

A regular grammar is one where each production takes one of the following forms: S → ε, S → w, S → T, S → wT, where the capital letters are non-terminals and w is a non-empty string of terminals. Regular grammar

Only one nonterminal can appear on the right side of a production. It must appear at the right end of the right side. Therefore the productions A → aBc and S → TU are not part of a regular grammar, but the production A → abcA is. Regular grammar
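The restriction on regular grammars is simple enough to test mechanically. Below is a minimal Python sketch, assuming productions are given as a pair of strings in which upper-case letters are non-terminals, lower-case letters are terminals and `""` stands for the empty string; `is_regular_production` is an illustrative name.

```python
def is_regular_production(lhs, rhs):
    """True if lhs → rhs has one of the forms S → ε, S → w, S → T, S → wT."""
    # The left side must be a single non-terminal.
    if len(lhs) != 1 or not lhs.isupper():
        return False
    # Drop at most one non-terminal, and only from the right end.
    body = rhs[:-1] if rhs and rhs[-1].isupper() else rhs
    # Whatever remains must consist of terminals only.
    return all(ch.islower() for ch in body)


for lhs, rhs in [("A", "abcA"), ("S", ""), ("S", "T"), ("A", "aBc"), ("S", "TU")]:
    print(f"{lhs} → {rhs or 'ε'}:", is_regular_production(lhs, rhs))
```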

Finite automata

Introduction; Deterministic finite automata (DFA’s); Non-deterministic finite automata (NFA’s); NFA’s to DFA’s; Simplifying DFA’s; Regular expressions ↔ finite automata. Outline

Consider the control system for a one-way swinging door. There are two states: Open and Closed. It has two inputs: person detected at position A and person detected at position B. If the door is closed, it should open only if a person is detected at A but not at B. The door should close only if no one is detected. Automatic one way door

States: Open, Closed.
Closed → Open: A, no B
Open → Closed: no A or B
Open → Open: A and B; A, no B; B, no A
Closed → Closed: A and B; B, no A; no A or B
Control schematic
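The door controller is small enough to write out as a two-state machine. The following Python sketch follows the rules stated for the door (open only on "A, no B", close only when no one is detected); the function names and the encoding of sensor readings as pairs of booleans are assumptions made for this illustration.

```python
def next_state(state, at_a, at_b):
    """One step of the door controller for sensor readings at A and B."""
    if state == "closed":
        # Open only if someone is at A and nobody is at B.
        return "open" if (at_a and not at_b) else "closed"
    # An open door closes only when nobody is detected at all.
    return "closed" if not (at_a or at_b) else "open"


def run(readings, state="closed"):
    """Feed a sequence of (at_a, at_b) readings and record every state visited."""
    history = [state]
    for at_a, at_b in readings:
        state = next_state(state, at_a, at_b)
        history.append(state)
    return history


# Someone approaches at A, walks through to B, then leaves.
print(run([(True, False), (True, True), (False, True), (False, False)]))
```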

 A finite automaton is usually represented like this as a directed graph  Two parts of a directed graph: The states (also called nodes or vertices) The edges with arrows which represent the allowed transitions  One state is usually picked out as the starting point  For so-called ‘accepting automata,’ some states are chosen to be final states Finite automaton

 The input data are represented by a string over some alphabet and it determines how the machine progresses from state to state.  Beginning in the start state, the characters of the input string cause the machine to change from one state to another.  Accepting automata give only yes or no answers, depending on whether they end up in a ‘final state.’ Strings which end in a final state are accepted by the automaton. Strings and automata

The labeled graph in the figure above represents a FA over the alphabet Σ = {a, b} with start state 0 and final state 3. Final states are denoted by a double circle. Example

 The previous graph was an example of a deterministic finite automaton – every node had two edges (a and b) coming out  A DFA over a finite alphabet Σ is a finite directed graph with the property that each node emits one labeled edge for each distinct element of Σ. Deterministic finite automata (DFA’s)

 A DFA accepts a string w in Σ* if there is a path from the start state to some final state such that w is the concatenation of the labels on the edges of the path.  Otherwise, the DFA rejects w.  The set of all strings accepted by a DFA M is called the language of M and is denoted by L(M) More formally

Construct a DFA to recognize the regular language represented by the regular expression (a + b)* over the alphabet Σ = {a, b}. This is the set {a, b}* of all strings over {a, b}. This can be recognised by Example: (a+b)*

 Find a DFA to recognize the language represented by the regular expression a(a + b)* over the alphabet Σ = {a, b}.  This is the set of all strings in Σ* which begin with a. One possible DFA is: Example: a(a+b)*

 Build a DFA to recognize the regular language represented by the regular expression (a + b)*abb over the alphabet Σ = {a, b}.  The language is the set of strings that begin with anything, but must end with the string abb.  Effectively, we’re looking for strings which have a particular pattern to them Example: pattern recognition

The diagram below shows a DFA to recognize this language. If in state 1: the last character was a If in state 2: the last two symbols were ab If in state 3: the last three were abb Solution: (a+b)*abb

State transition function

We can also represent a DFA by a state transition function, which we'll denote by T, where any state transition of the form i -a-> j (an edge labelled a from state i to state j) is represented by: T(i, a) = j. To describe a full DFA we need to know: what states there are, which are the start and final ones, and the set of transitions between them. State transition function
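A transition function maps naturally onto a lookup table. The sketch below simulates a DFA for (a + b)*abb in Python; since the diagram is not part of this transcript, the table `T` is an assumed reconstruction, chosen to agree with the description of states 1, 2 and 3 given for that example (last symbol a, last two symbols ab, last three symbols abb).

```python
# T[(i, a)] = j means: in state i, reading symbol a, move to state j.
T = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}
START = 0
FINAL = {3}


def accepts(w):
    """Run the DFA on w and report whether it stops in a final state."""
    state = START
    for symbol in w:
        state = T[(state, symbol)]
    return state in FINAL


for w in ["abb", "aababb", "ab", "abba", ""]:
    print(repr(w), accepts(w))
```

Every state has exactly one entry per symbol of the alphabet, which is what makes the machine deterministic.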

 The class of regular languages is exactly the same as the class of languages accepted by DFAs!  Kleene (1956)  For any regular language, we can find a DFA which recognizes it! Regular languages

DFA’s are very often used for pattern matching, e.g. searching for words/structures in strings. This is used often in UNIX, particularly by the grep command, which searches for combinations of strings and wildcards (*, ?). The name grep comes from the ed editor command g/re/p: globally search for a regular expression and print the matching lines. DFA’s are also used to design and check simple circuits, to verify protocols, etc. They are of use whenever significant memory is not required. Applications of DFA’s

DFA’s are called deterministic because, following any input string, we know exactly which state the machine is in and the path it took to get there. For NFA’s, sometimes there is more than one direction we can go with the same input character. Non-determinism can occur because, following a particular string, the machine could be in many possible states, or could have taken different paths to end at the same state! Non-deterministic finite automata

A non-deterministic finite automaton (NFA) over an alphabet Σ is a finite directed graph with each node having zero or more edges. Each edge is labelled either with a letter from Σ or with ε. Multiple edges may be emitted from the same node with the same label. Some letters may not have an edge associated with them. Strings following such paths are not recognised. NFA’s

If an edge is labelled with the empty string ε, then we can travel the edge without consuming an input letter. Effectively we could be in either state, and so the possible paths could branch. If there are two edges with the same label, we can take either path. An NFA recognises a string if any one of the possible states it could be in after reading the string is a final state. Otherwise, it rejects the string. Non-determinism

NFA for a*a (top) and DFA for a*a (bottom): why is the top an NFA while the bottom is a DFA? NFA’s versus DFA’s

Draw two NFAs to recognize the language of the regular expression ab + a*a. This NFA has an ε edge, which allows us to travel to state 2 without consuming an input letter. The upper path corresponds to ab and the lower one to a*a. Example

This NFA also recognizes the same language. Perhaps it's easier to see this by considering the equality ab + a*a = ab + aa* An equivalent NFA

Since there may be non-determinism, we'll let the values of this function be sets of states. For example, if there are no edges from state k labelled with a, we'll write T(k, a) = ∅. If there are three edges from state k, all labelled with a, going to states i, j and k, we'll write T(k, a) = {i, j, k}. NFA transition functions
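With set-valued transitions, an NFA can be simulated by tracking the set of states it could currently be in. The Python sketch below does this for one possible NFA recognising ab + a*a; the state numbering and the edges in `T` are hypothetical (the figures are not part of this transcript), and the empty string `""` is used as the label of ε edges.

```python
EPS = ""   # label used here for ε edges

# T[(state, label)] is a SET of states; a missing entry means the empty set.
T = {
    (0, "a"): {1},       # upper path: a then b
    (1, "b"): {2},
    (0, EPS): {3},       # ε edge into the lower path
    (3, "a"): {3, 4},    # lower path: a*a (two edges with the same label)
}
START = 0
FINAL = {2, 4}


def eps_closure(states):
    """All states reachable from the given ones using ε edges only."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in T.get((q, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def accepts(w):
    """Accept if any state the NFA could be in after reading w is final."""
    current = eps_closure({START})
    for symbol in w:
        moved = set()
        for q in current:
            moved |= T.get((q, symbol), set())
        current = eps_closure(moved)
    return bool(current & FINAL)


for w in ["ab", "a", "aaa", "", "b", "aab"]:
    print(repr(w), accepts(w))
```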

 All digital computers are deterministic; quantum computers may be another story!  The usual mechanism for deterministic computers is to try one particular path and to backtrack to the last decision point if that path proves poor.  Parallel computers make non-determinism almost realizable. We can let each process make a random choice at each branch point, thereby exploring many possible trees. Comments on non-determinism

 The class of regular languages is exactly the same as the class of languages accepted by NFAs!  Rabin and Scott (1959)  Just like for DFA’s!  Every NFA has an equivalent DFA which recognises the same language. Some facts

• We prove the equivalence of NFA’s and DFA’s by showing how, for any NFA, to construct a DFA which recognises the same language.
• Generally the DFA will have more states than the NFA. If the NFA has n states, then the DFA could have as many as 2^n states!
• Example: if the NFA has three states A, B and C, the DFA could have eight: ∅, {A}, {B}, {C}, {A, B}, {A, C}, {B, C}, {A, B, C}.
• These correspond to the sets of states the NFA could be in after reading any string.
From NFA’s to DFA’s
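
The 2^n bound is just the number of subsets of the NFA's state set. A small sketch that enumerates them (the helper name `powerset` is an assumption of this example):

```python
from itertools import combinations


def powerset(states):
    """Return every subset of `states` as a frozenset, smallest first."""
    items = sorted(states)
    return [
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    ]


# The three-state example: candidate DFA states for an NFA with states A, B, C.
candidates = powerset({"A", "B", "C"})
```

Here `len(candidates)` is 2 ** 3. In practice many of these subsets are never reachable, so the constructed DFA is often much smaller.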

• Begin in the NFA start state, which could be a set of several states if it is connected to any others by Λ edges.
• Determine the set of possible NFA states you could be in after receiving each character. Each set is a new DFA state, and is connected to the start state by that character.
• Repeat for each new DFA state, exploring the possible results for each character, until the system is closed (no new sets of states appear).
• DFA final states are any that contain an NFA final state.
DFA construction

The start state is A, but following an a you could be in A or B; following a b you could only be in state A.
[Figure: NFA with states A, B, C, and the resulting DFA with states A, {A,B}, {A,C}]
Example (a+b)*ab
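
The construction can be sketched as a breadth-first search over sets of NFA states. The NFA table below is an assumed encoding of the (a+b)*ab example (A loops on a and b, A goes to B on a, B goes to C on b); it has no Λ edges, so this sketch omits the closure step.

```python
from collections import deque

# Assumed encoding of the example NFA; missing keys mean the empty set.
NFA = {
    ("A", "a"): {"A", "B"},
    ("A", "b"): {"A"},
    ("B", "b"): {"C"},
}
ALPHABET = "ab"
NFA_START, NFA_FINAL = "A", {"C"}


def subset_construction():
    """Build the DFA whose states are the reachable sets of NFA states."""
    start = frozenset({NFA_START})
    dfa, seen, queue = {}, {start}, deque([start])
    while queue:
        current = queue.popleft()
        for ch in ALPHABET:
            # Union of the NFA moves from every state in the current set.
            target = frozenset(
                t for s in current for t in NFA.get((s, ch), ())
            )
            dfa[(current, ch)] = target
            if target not in seen:      # a new DFA state: explore it later
                seen.add(target)
                queue.append(target)
    finals = {state for state in seen if state & NFA_FINAL}
    return start, dfa, finals, seen


start, dfa, finals, dfa_states = subset_construction()
```

For this NFA only three of the eight possible subsets are reachable: {A}, {A,B} and {A,C}, with {A,C} as the single final state.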

• Regular expressions represent the regular languages.
• DFA’s recognize the regular languages.
• NFA’s also recognize the regular languages.
Summary

• So far, we’ve introduced two kinds of automata: deterministic and non-deterministic.
• We’ve shown that we can find a DFA to recognise any language that a given NFA recognises.
• We’ve asserted that both DFA’s and NFA’s recognise the regular languages, which themselves are represented by regular expressions.
• We prove this by construction, by showing how any regular expression can be made into an NFA and vice versa.
Finite automata

• Given a regular expression, we can find an automaton which recognises its language.
• Start the algorithm with a machine that has a start state, a single final state, and an edge from the start state to the final state labelled with the given regular expression, then apply the following steps:
Regular expressions → finite automata

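1. If an edge is labelled with ∅, then erase the edge.
2. Transform any diagram like [an edge from i to j labelled R + S] into the diagram [two edges from i to j, one labelled R and one labelled S].
Four step algorithm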

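3. Transform any diagram like [an edge from i to j labelled RS] into the diagram [an edge labelled R from i to a new state, followed by an edge labelled S from the new state to j].
Four step algorithm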

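4. Transform any diagram like [an edge from i to j labelled R*] into the diagram [a Λ edge from i to a new state, a loop labelled R on the new state, and a Λ edge from the new state to j].
Four step algorithm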

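Construct an NFA for the regular expression a* + ab.
Start with [figure: a single edge from the start state to the final state, labelled a* + ab].
Apply rule 2 [figure: two edges from the start state to the final state, labelled a* and ab].
Example a*+ab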

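Apply rule 4 to a* [figure: the a* edge becomes a Λ edge into a new state with a loop labelled a, followed by a Λ edge to the final state; the ab edge is unchanged].
Apply rule 3 to ab [figure: the ab edge becomes an edge labelled a into a new state, followed by an edge labelled b to the final state].
Example a*+ab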
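
The four rules can be sketched as a recursive expansion of one labelled edge. The slide diagrams are not available in this text, so the sketch assumes the usual reading of the rules (rule 2 splits a sum, rule 3 splits a product, rule 4 expands a star). The tuple encoding of regular expressions and the function name are assumptions made for illustration.

```python
import itertools

# Assumed encoding: ("empty",), ("sym", "a"), ("union", R, S),
# ("cat", R, S), ("star", R). The label "" stands for a Λ edge.


def regex_to_nfa(regex):
    """Return the NFA edges (source, label, target); 0 is start, 1 is final."""
    fresh = itertools.count(2)   # supplies new state numbers
    edges = []

    def expand(i, r, j):
        kind = r[0]
        if kind == "empty":      # rule 1: an edge labelled ∅ is erased
            return
        if kind == "union":      # rule 2: R + S becomes two parallel edges
            expand(i, r[1], j)
            expand(i, r[2], j)
        elif kind == "cat":      # rule 3: RS goes through a new state
            k = next(fresh)
            expand(i, r[1], k)
            expand(k, r[2], j)
        elif kind == "star":     # rule 4: Λ in, loop on R, Λ out
            k = next(fresh)
            edges.append((i, "", k))
            expand(k, r[1], k)
            edges.append((k, "", j))
        else:                    # "sym": a single letter, nothing to expand
            edges.append((i, r[1], j))

    expand(0, regex, 1)
    return edges


# a* + ab
example = ("union", ("star", ("sym", "a")),
           ("cat", ("sym", "a"), ("sym", "b")))
nfa_edges = regex_to_nfa(example)
```

For a* + ab this yields five edges: a Λ edge into a new state with a loop on a, a Λ edge out to the final state, and the two-edge path a then b.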

1. Create a new start state s, and draw a new edge labelled with Λ from s to the original start state.
2. Create a new final state f, and draw new edges labelled with Λ from all the original final states to f.
Finite automata → regular expressions

3. For each pair of states i and j that have more than one edge from i to j, replace all the edges from i to j by a single edge labelled with the regular expression formed by the sum of the labels on each of the edges from i to j.
4. Construct a sequence of new machines by eliminating one state at a time until the only states remaining are s and f.
Finite automata → regular expressions

As each state is eliminated, a new machine is constructed from the previous machine as follows:
• Let old(i, j) denote the label on edge ⟨i, j⟩ of the current machine. If no edge exists, label it ∅.
• Assume that we wish to eliminate state k. For each pair of edges ⟨i, k⟩ (incoming edge) and ⟨k, j⟩ (outgoing edge) we create a new edge label new(i, j).
Eliminating states

• The label of this new edge is given by:
  new(i, j) = old(i, j) + old(i, k) old(k, k)* old(k, j)
• All other edges, not involving state k, remain the same:
  new(i, j) = old(i, j)
After eliminating all states except s and f, we wind up with a two-state machine with the single edge ⟨s, f⟩ labelled with the desired regular expression new(s, f).
Eliminate state k
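
The update rule new(i, j) = old(i, j) + old(i, k) old(k, k)* old(k, j) can be sketched with regular expressions held as strings. In this sketch `None` stands for ∅ and the string "Λ" for the empty string; the helper names and the example machine (the (a+b)*ab NFA with s and f already added, parallel edges already merged) are assumptions made for illustration.

```python
LAMBDA = "Λ"


def top_level_plus(r):
    """True if r contains a + outside all parentheses."""
    depth = 0
    for ch in r:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "+" and depth == 0:
            return True
    return False


def union(r, s):
    if r is None:
        return s
    if s is None:
        return r
    return r if r == s else f"{r}+{s}"


def concat(r, s):
    if r is None or s is None:          # ∅ absorbs concatenation
        return None
    if r == LAMBDA:
        return s
    if s == LAMBDA:
        return r
    wrap = lambda x: f"({x})" if top_level_plus(x) else x
    return wrap(r) + wrap(s)


def star(r):
    if r is None or r == LAMBDA:        # ∅* and Λ* are both Λ
        return LAMBDA
    return f"{r}*" if len(r) == 1 else f"({r})*"


def eliminate(edges, order, start, final):
    """Remove the states in `order`; return the label left on (start, final)."""
    old = dict(edges)
    for k in order:
        loop = star(old.get((k, k)))
        nodes = {i for i, _ in old} | {j for _, j in old}
        nodes.discard(k)
        new = {}
        for i in nodes:
            for j in nodes:
                via = concat(concat(old.get((i, k)), loop), old.get((k, j)))
                label = union(old.get((i, j)), via)
                if label is not None:
                    new[(i, j)] = label
        old = new
    return old.get((start, final))


EDGES = {("s", "A"): LAMBDA, ("A", "A"): "a+b", ("A", "B"): "a",
         ("B", "C"): "b", ("C", "f"): LAMBDA}
result = eliminate(EDGES, ["B", "C", "A"], "s", "f")
```

The order of elimination can change how the resulting expression is written, but not the language it denotes.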

[Figure: initial DFA]
Steps 1 and 2: add start and final states [figure].
Example

[Figures: eliminate state 2 (no path to f); eliminate state 0; eliminate state 1; final regular expression]
Example

• Sometimes our constructions lead to more complicated automata than we need, having more states than are really necessary.
• Next, we look for ways of making DFA’s with a minimum number of states.
• Myhill-Nerode theorem: ‘Every regular language has a unique* minimum state DFA’
* up to a simple renaming of the states
Finding simpler automata

Two steps to minimizing a DFA:
1. Discover which, if any, pairs of states are indistinguishable. Two states, s and t, are equivalent if for all possible strings w, T(s, w) and T(t, w) are either both final or both non-final.
2. Combine all equivalent states into a single state, modifying the transition functions appropriately.
Finding minimum state DFA

States 1 and 2 are indistinguishable! Starting in either, b* is rejected and anything with a in it is accepted.
[Figure: DFA over {a, b} containing states 1 and 2]
Consider the DFA

1. Remove all inaccessible states, those with no path to them from the start state.
2. Construct a grid of pairs of states.
3. Begin by marking those pairs which are clearly distinguishable, where one is final and the other non-final.
4. Next mark every pair which, on the same input, leads to a pair already marked as distinguishable. Repeat until a full pass over the pairs marks nothing new.
5. The remaining pairs are indistinguishable.
Part 1, finding indistinguishable pairs

1. Construct a new DFA where any pairs of indistinguishable states form a single state in the new DFA.
2. The start state will be the state containing the original start state.
3. The final states will be those which contain original final states.
4. The transitions will be the full set of transitions from the original states (these should all be consistent).
Part 2, construct minimum DFA

[Figure: DFA over {a, b} with states 0 to 4]
What are the distinguishable pairs of states? Clearly, {0, 4}, {1, 4}, {2, 4} and {3, 4} are all distinguishable because 4 is final but none of the others are.
Example

• We eliminate these as possible indistinguishable pairs.
• Next consider {0, 1}. With input a, this becomes {3, 4} which is distinguishable, so {0, 1} is as well.
• Similarly, we can show {0, 2} and {0, 3} are also distinguishable, leading to the modified grid (x = distinguishable, ? = still undecided):
      0  1  2  3
  1   x
  2   x  ?
  3   x  ?  ?
  4   x  x  x  x
Grid of pairs of state

We are left with:
  {1, 2}: given a → {4, 4}, given b → {2, 1}
  {2, 3}: given a → {4, 4}, given b → {1, 2}
  {1, 3}: given a → {4, 4}, given b → {2, 2}
These do not lead to pairs we know to be distinguishable, and are therefore indistinguishable!
Remaining pairs

• States 1, 2, and 3 are all indistinguishable, thus the minimal DFA will have three states: {0}, {1, 2, 3} and {4}.
• Since originally T(0, a) = 3 and T(0, b) = 1, the new transitions are T(0, a) = T(0, b) = {1, 2, 3}.
• Similarly, T({1, 2, 3}, a) = 4 and T({1, 2, 3}, b) = {1, 2, 3}.
• Finally, as before, T(4, a) = 4 and T(4, b) = 4.
Construct minimal DFA

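The resulting DFA is much simpler:
[Figure: three-state DFA with states 0, {1, 2, 3} and 4; 0 goes to {1, 2, 3} on a or b, {1, 2, 3} loops on b and goes to 4 on a, and 4 loops on a or b]
This recognises the language of the regular expression (a + b) b* a (a + b)*.
This is the simplest DFA which will recognise this language!
Resulting minimal DFA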
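
The grid-marking procedure can be sketched on this example. The transition table below is read off the pair calculations in the slides (for instance T(0, a) = 3, T(0, b) = 1, and T(1, b) = 2, T(2, b) = 1); taking 0 as the start state and the function names are assumptions of this sketch. All five states are accessible, so step 1 of Part 1 is skipped.

```python
from itertools import combinations

T = {
    0: {"a": 3, "b": 1},
    1: {"a": 4, "b": 2},
    2: {"a": 4, "b": 1},
    3: {"a": 4, "b": 2},
    4: {"a": 4, "b": 4},
}
FINAL = {4}


def indistinguishable_pairs():
    """Mark distinguishable pairs until nothing changes; return the rest."""
    pairs = [frozenset(p) for p in combinations(T, 2)]
    # Initially mark pairs with exactly one final state.
    marked = {p for p in pairs if len(p & FINAL) == 1}
    changed = True
    while changed:
        changed = False
        for p in pairs:
            if p in marked:
                continue
            s, t = tuple(p)
            for ch in "ab":
                q = frozenset({T[s][ch], T[t][ch]})
                if len(q) == 2 and q in marked:   # leads to a marked pair
                    marked.add(p)
                    changed = True
                    break
    return [p for p in pairs if p not in marked]


def merge(pairs):
    """Group states connected by indistinguishable pairs into blocks."""
    block = {s: {s} for s in T}
    for p in pairs:
        s, t = tuple(p)
        merged = block[s] | block[t]
        for member in merged:
            block[member] = merged
    return {frozenset(b) for b in block.values()}


blocks = merge(indistinguishable_pairs())
```

With this table the unmarked pairs are {1, 2}, {1, 3} and {2, 3}, and the blocks are {0}, {1, 2, 3} and {4}, matching the three states of the minimal DFA.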

• We now have many equivalent ways of representing regular languages: DFA’s, NFA’s, regular expressions and regular grammars.
• We can also now simply(?!) move between these various representations.
• We’ll see next lecture that the automata representation leads to a simple way of recognising some languages which are not regular.
• We’ll also begin to consider more powerful language types and correspondingly more powerful computing models!
Conclusions

M = (Q, Σ, δ, q0, F)
  Q = states: a finite set
  Σ = alphabet: a finite set
  δ = transition function: a total function Q × Σ → Q
  q0 = initial/starting state: q0 ∈ Q
  F = final states: F ⊆ Q
Formal definition
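
The five-tuple maps directly onto a small data structure. The sketch below checks the three side conditions (q0 ∈ Q, F ⊆ Q, δ total on Q × Σ) when a machine is built; the class name, field names and the three-state example machine are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    states: frozenset      # Q
    alphabet: frozenset    # Σ
    delta: dict            # δ, keyed by (state, letter)
    start: object          # q0
    finals: frozenset      # F

    def __post_init__(self):
        assert self.start in self.states, "q0 must be in Q"
        assert self.finals <= self.states, "F must be a subset of Q"
        for q in self.states:
            for a in self.alphabet:
                # δ must be total: defined for every state and letter.
                assert self.delta[(q, a)] in self.states

    def accepts(self, word):
        """Follow δ from q0 letter by letter; accept if we end in F."""
        q = self.start
        for a in word:
            q = self.delta[(q, a)]
        return q in self.finals


# Hypothetical three-state machine for (a + b) b* a (a + b)*.
M = DFA(
    states=frozenset({0, 1, 2}),
    alphabet=frozenset("ab"),
    delta={(0, "a"): 1, (0, "b"): 1,
           (1, "a"): 2, (1, "b"): 1,
           (2, "a"): 2, (2, "b"): 2},
    start=0,
    finals=frozenset({2}),
)
```

Because δ is total, `accepts` never gets stuck: every input over the alphabet ends in exactly one state.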