Recognising Languages We will tackle the problem of defining languages by considering how we could recognise them. Problem: Is there a method of recognising.

Presentation transcript:

Recognising Languages
We will tackle the problem of defining languages by considering how we could recognise them. Problem: is there a method of recognising infinite languages? That is, given a language description and a string, is there an algorithm which will answer yes or no correctly? We will define an abstract machine which takes a candidate string and produces the answer yes or no. The abstract machine will be the specification of the language.

Finite State Automata
A finite state automaton is an abstract model of a simple machine (or computer). The machine can be in a finite number of states. It receives symbols as input, and the result of receiving a particular input in a particular state moves the machine to a specified new state. Certain states are finishing states, and if the machine is in one of those states when the input ends, it has ended successfully (that is, it has accepted the input).
Example: [Figure: state diagram of the example automaton, with arcs labelled a and b]

Formal definition of FSAs
We'll present the general case. In practice, we'll focus on a subset of simple FSAs, known as deterministic FSAs (DFSAs).

FSA: Formal Definition
A Finite State Automaton (FSA) is a 5-tuple (Q, I, F, T, E) where:
  Q = states: a finite set;
  I = initial states: a nonempty subset of Q;
  F = final states: a subset of Q;
  T = an alphabet;
  E = edges: a subset of Q × (T ∪ {λ}) × Q.
An FSA can be represented by a labelled, directed graph: a set of nodes (some marked final or initial), plus directed arcs (arrows) between nodes, where each arc has a label from the alphabet (or λ).
Example: formal definition of A1
  Q = {1, 2, 3, 4}
  I = {1}
  F = {4}
  T = {a, b}
  E = { (1,a,2), (1,b,4), (2,a,3), (2,b,4), (3,a,3), (3,b,3), (4,a,2), (4,b,4) }
[Figure: state diagram of A1]
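The 5-tuple can be written down directly as a data structure. The following is a minimal sketch in Python, not part of the original slides: the names FSA, LAMBDA and is_well_formed are illustrative assumptions, and the empty string is assumed to stand for the label λ.

```python
from typing import NamedTuple

LAMBDA = ""  # assumption: the empty string stands for the label λ


class FSA(NamedTuple):
    Q: frozenset  # states
    I: frozenset  # initial states
    F: frozenset  # final states
    T: frozenset  # alphabet
    E: frozenset  # edges, each a triple (start state, label, end state)


def is_well_formed(m: FSA) -> bool:
    """Check the side conditions of the definition of an FSA."""
    labels = m.T | {LAMBDA}
    return (
        len(m.I) > 0          # I is nonempty
        and m.I <= m.Q        # I is a subset of Q
        and m.F <= m.Q        # F is a subset of Q
        and all(q in m.Q and t in labels and p in m.Q for (q, t, p) in m.E)
    )


# The example automaton A1 from the slide.
A1 = FSA(
    Q=frozenset({1, 2, 3, 4}),
    I=frozenset({1}),
    F=frozenset({4}),
    T=frozenset({"a", "b"}),
    E=frozenset({
        (1, "a", 2), (1, "b", 4), (2, "a", 3), (2, "b", 4),
        (3, "a", 3), (3, "b", 3), (4, "a", 2), (4, "b", 4),
    }),
)
```

Calling is_well_formed(A1) checks that A1 satisfies every condition in the definition.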

What does it mean to accept a string/language?
If (x,a,y) is an edge, x is its start state and y is its end state.
A path is a sequence of edges such that the end state of one is the start state of the next. Example: path p1 = (2,b,4), (4,a,2), (2,a,3).
A path is successful if the start state of the first edge is an initial state, and the end state of the last is a final state. Example: path p2 = (1,b,4), (4,a,2), (2,b,4), (4,b,4).
The label of a path is the sequence of its edge labels. Example: label(p1) = baa.
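These definitions can be checked mechanically on the paths p1 and p2 of A1. The sketch below is illustrative only: the function names are assumptions, and it handles non-empty paths only, as the definitions above do.

```python
# Edges, initial states and final states of A1, as defined on the slides.
E = {
    (1, "a", 2), (1, "b", 4), (2, "a", 3), (2, "b", 4),
    (3, "a", 3), (3, "b", 3), (4, "a", 2), (4, "b", 4),
}
I, F = {1}, {4}


def is_path(edges):
    """A non-empty sequence of edges of A1 in which each edge starts where the previous one ends."""
    return (
        len(edges) > 0
        and all(e in E for e in edges)
        and all(edges[i][2] == edges[i + 1][0] for i in range(len(edges) - 1))
    )


def is_successful(edges):
    """A path from an initial state to a final state."""
    return is_path(edges) and edges[0][0] in I and edges[-1][2] in F


def label(edges):
    """Concatenate the edge labels of a path."""
    return "".join(t for (_, t, _) in edges)


p1 = [(2, "b", 4), (4, "a", 2), (2, "a", 3)]
p2 = [(1, "b", 4), (4, "a", 2), (2, "b", 4), (4, "b", 4)]
```

Here p1 is a path but not a successful one (it starts in state 2 and ends in state 3), while p2 is successful.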

What does it mean to accept a string/language?
A string is accepted by an FSA if it is the label of a successful path. Let A be an FSA. The language accepted by A is the set of strings accepted by A, denoted L(A).
Example: babb = label(p2) is accepted by A1. The language accepted by A1 is the set of strings of a's and b's which end in b, and in which no two a's are adjacent.
[Figure: state diagram of A1]
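The description of L(A1) can be tested by brute force. The sketch below is an illustration under stated assumptions, not the slides' own method: accepts follows every possible path at once by tracking the set of states reachable so far (A1 has no λ edges), and described encodes the claimed language. Both function names are hypothetical.

```python
from itertools import product

# Edges, initial states and final states of A1, as defined on the slides.
E = {
    (1, "a", 2), (1, "b", 4), (2, "a", 3), (2, "b", 4),
    (3, "a", 3), (3, "b", 3), (4, "a", 2), (4, "b", 4),
}
I, F = {1}, {4}


def accepts(w):
    """True if w is the label of some successful path in A1."""
    current = set(I)
    for t in w:
        # All states reachable from the current ones by an edge labelled t.
        current = {p for (q, s, p) in E if q in current and s == t}
    return bool(current & F)


def described(w):
    """The claimed language: ends in b and has no two adjacent a's."""
    return w.endswith("b") and "aa" not in w


# Compare the two on every string over {a, b} of length at most 8.
agree = all(
    accepts("".join(w)) == described("".join(w))
    for n in range(9)
    for w in product("ab", repeat=n)
)
```

A check up to a fixed length is evidence, not a proof, that the description matches L(A1).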

Some simple examples (assuming determinism)
1. Draw an FSA to accept the set of bitstrings starting with 0.
2. Draw an FSA to accept the set of bitstrings ending with 0.
3. Draw an FSA to accept the set of bitstrings containing a sequence 00.
4. Can you draw an FSA to accept the set of bitstrings that contain equal numbers of 0s and 1s?

L1. Bitstrings starting with 0
[Figure: state diagram of an FSA for L1; the state labels q1 and q2 survive from the original]
Can you make a smaller FSA that accepts the same language?

L2. Bitstrings ending with 0
[Figure: state diagram of an FSA for L2; the state labels q1 and q2 survive from the original]
At home: can you find a smaller FSA that accepts the same language?

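L3. Bitstrings containing 00
[Figure: state diagram of an FSA for L3, with arcs labelled 0 and 1; the state labels q1 and q3 survive from the original]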
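One possible DFSA for L3 can be written as a transition table. This is a sketch under assumptions and not necessarily the automaton drawn on the slide: the state names "start", "seen0" and "found" are hypothetical, and the table is checked against the plain definition of L3 by brute force.

```python
from itertools import product

# Hypothetical three-state DFSA for L3 (bitstrings containing 00).
DELTA = {
    ("start", "0"): "seen0",   # one 0 read, not yet two in a row
    ("start", "1"): "start",
    ("seen0", "0"): "found",   # second consecutive 0
    ("seen0", "1"): "start",   # the run of 0s is broken
    ("found", "0"): "found",   # once 00 has been seen, stay here
    ("found", "1"): "found",
}
INITIAL, FINAL = "start", {"found"}


def accepts_L3(w):
    """Run the DFSA on w; every transition is defined for bitstrings."""
    q = INITIAL
    for t in w:
        q = DELTA[(q, t)]
    return q in FINAL


# Compare with the definition of L3 on all bitstrings of length at most 8.
agree = all(
    accepts_L3("".join(w)) == ("00" in "".join(w))
    for n in range(9)
    for w in product("01", repeat=n)
)
```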

L4. Bitstrings with equal numbers of 0 and 1
This cannot be done: FSAs are not powerful enough. Later we shall meet automata that can do it.
CS3518: some problems cannot be solved by any automaton. That course covers (among other things) computability.

Recognition Algorithm
Problem: given a DFSA, A = (Q,I,F,T,E), and a string w, determine whether w ∈ L(A).
Note: denote the current state by q, and the current input symbol by t. Write δ(q,t) for the set of states p such that (q,t,p) ∈ E. Since A is deterministic, δ(q,t) will always be a singleton set or will be undefined. If it is undefined, denote it by ⊥ (where ⊥ ∉ Q).
Algorithm:
  Add symbol # (not in T) to end of w.
  q := initial state
  t := first symbol of w#
  while (t ≠ # and q ≠ ⊥)
  begin
    q := δ(q,t)
    t := next symbol of w#
  end
  return ((t == #) & (q ∈ F))
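The recognition algorithm translates almost line for line into code. The following is a minimal sketch with some assumptions: the transition function is a dictionary built from the edge set, Python's None plays the role of the undefined state, and the end of the for loop replaces the end marker #. The function name recognise is illustrative.

```python
def recognise(edges, initial, final, w):
    """Decide whether the DFSA with the given edges accepts the string w."""
    # Transition function: (state, symbol) -> next state.
    delta = {(q, t): p for (q, t, p) in edges}
    q = initial
    for t in w:
        q = delta.get((q, t))  # None if the transition is undefined
        if q is None:
            return False       # stuck before the input ended
    return q in final          # input exhausted: accept iff in a final state


# The example automaton A1 from the slides (it is deterministic).
E = {
    (1, "a", 2), (1, "b", 4), (2, "a", 3), (2, "b", 4),
    (3, "a", 3), (3, "b", 3), (4, "a", 2), (4, "b", 4),
}

result_babb = recognise(E, 1, {4}, "babb")
result_baa = recognise(E, 1, {4}, "baa")
```

The algorithm reads each symbol once, so its running time is linear in the length of w.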

Why study NDFSAs?
It might appear that NDFSAs are useless. We'll soon see that they are not: λ transitions will be useful.

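Minimum Size of FSAs
Let A = (Q,I,F,T,E) be a DFSA.
Definition: for any two strings x, y in T*, x and y are distinguishable w.r.t. A if there is a string z ∈ T* s.t. exactly one of xz and yz is in L(A). We say that z distinguishes x and y w.r.t. A.
This means that with x and y as input, A must end in different states: A has to distinguish x and y in order to give the right results for xz and yz. (If A ended in the same state after x and after y then, being deterministic, it would also end in the same state after xz and after yz, and so would accept both or neither.) This is used to prove the next result.
Theorem (proof omitted): let L ⊆ T*. If there is a set of n elements of T* s.t. any two of its elements are distinguishable w.r.t. L (that is, w.r.t. any automaton accepting L), then any DFSA that recognises L must have at least n states (if the DFSA is partial, the undefined state ⊥ counts as one of them).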

Applying the theorem (1)
L3 (above) = the set of bitstrings containing 00.
Distinguishable: all of {11,10,00}.
  {11,10}: 100 is in L3, 110 is out (z = 0);
  {11,00}: 001 is in L3, 111 is out (z = 1);
  {10,00}: 001 is in L3, 101 is out (z = 1).
{11,10,00} has 3 elements; hence the DFSA for L3 requires at least 3 states.
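Distinguishing strings can also be searched for mechanically. The sketch below is illustrative: the helper distinguishing_suffix and its max_len bound are assumptions, and the search tries suffixes z in order of length until exactly one of xz and yz is in the language. It may return a shorter z than the one chosen on the slide.

```python
from itertools import product


def in_L3(w):
    """Membership in L3: bitstrings containing 00."""
    return "00" in w


def distinguishing_suffix(x, y, in_lang, alphabet="01", max_len=4):
    """Return a shortest z (up to max_len) with exactly one of xz, yz in the language, else None."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            z = "".join(letters)
            if in_lang(x + z) != in_lang(y + z):
                return z
    return None


witnesses = {
    (x, y): distinguishing_suffix(x, y, in_L3)
    for (x, y) in [("11", "10"), ("11", "00"), ("10", "00")]
}
```

For the pair (11, 10) the search finds z = 0, the same witness as on the slide. For the two pairs involving 00 it finds the empty string, since 00 is already in L3 while 11 and 10 are not.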

Applying the theorem (2)
L4 again: equal numbers of 0 and 1.
  n=2: {01,001} ⇒ need 2 states
  n=3: {01,001,0001} ⇒ need 3 states
  n=4: {01,001,0001,00001} ⇒ need 4 states
  …
For any finite n, there's a set of n elements that are pairwise distinguishable ⇒ an FSA for L4 would need more than finitely many states (which is not permitted)!
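The pattern behind these sets can be spelled out and checked. In the sketch below (illustrative names, not from the slides), the i-th string is i zeros followed by a single 1, and the suffix made of i-1 ones balances that string and no other member of the set.

```python
def in_L4(w):
    """Membership in L4: equal numbers of 0s and 1s."""
    return w.count("0") == w.count("1")


def family(n):
    """The set {01, 001, 0001, ...} with n elements."""
    return ["0" * i + "1" for i in range(1, n + 1)]


def pairwise_distinguishable(n):
    """Check that every pair in family(n) is distinguished by some suffix."""
    xs = family(n)
    for i, x in enumerate(xs, start=1):
        z = "1" * (i - 1)  # x + z has i zeros and i ones
        for y in xs:
            if y != x and in_L4(x + z) == in_L4(y + z):
                return False
    return True


checked = all(pairwise_distinguishable(n) for n in range(1, 30))
```

Since the check succeeds for every n tried, and the argument in the comments works for any n, no finite number of states is enough.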

A taste of the theory of Formal Languages
This theorem tells you something about the kind of automaton (in terms of its number of states) that's required given a particular kind of problem (i.e., a particular kind of language). It also tells you that certain languages cannot be accepted by any FSA.

Moving beyond DFSAs
The above definition of an FSA allows nondeterminism. So far, we have not exploited an FSA's ability to behave in nondeterministic ways. Let's see how an FSA can be nondeterministic.

Non-Determinism
1. From one state there could be a number of edges with the same label. ⇒ We have to remember possible branching points as we trace out a path, and investigate all branches.
2. Some of the edges could be labelled with λ, the empty string. ⇒ How does this affect our algorithm?
3. There may be more than one initial state. ⇒ Where do we start?
[Figure: state diagram of a nondeterministic FSA with arcs labelled a, b and λ]
Read ab (= aλb). What state do you end up in?
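All three sources of nondeterminism can be handled by tracking a set of current states instead of a single state. The following is a sketch under assumptions: the empty string stands for λ, the function names are illustrative, and the small automaton N is a hypothetical example, not the one in the lost figure.

```python
LAMBDA = ""  # assumption: the empty string stands for the label λ


def closure(states, E):
    """All states reachable from the given ones using only λ edges."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for (x, t, y) in E:
            if x == q and t == LAMBDA and y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def accepts(I, F, E, w):
    """True if w is the label of some successful path (λ labels contribute nothing)."""
    current = closure(I, E)               # point 3: start in all initial states
    for t in w:
        # point 1: follow every edge labelled t from every current state
        step = {y for (x, s, y) in E if x in current and s == t}
        current = closure(step, E)        # point 2: then take any λ edges
    return bool(current & set(F))


# Hypothetical NDFSA: two a-edges leave state 1, and (2, λ, 3) is a λ edge.
N_I, N_F = {1}, {4}
N_E = {(1, "a", 1), (1, "a", 2), (2, LAMBDA, 3), (3, "b", 4)}
```

On this example, reading ab can end in state 4 via the path (1,a,2), (2,λ,3), (3,b,4), so ab is accepted.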

Deterministic FSA
An FSA is deterministic if: (i) there are no λ-labelled edges; (ii) for any pair of state and symbol (q,t), there is at most one edge (q,t,p); and (iii) there is only one initial state. All three conditions must hold.
DFSA and NDFSA stand for deterministic and non-deterministic FSA respectively.
At home: revise the old definition of FSA so that it becomes the definition of a DFSA.
Note: acceptance (of a string or a language) has been defined in a declarative (i.e., non-procedural) way.
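The three conditions are easy to test on a concrete automaton. The sketch below is illustrative: the function name is an assumption, and the empty string is assumed to stand for λ.

```python
LAMBDA = ""  # assumption: the empty string stands for the label λ


def is_deterministic(I, E):
    """Check conditions (i), (ii) and (iii) for an FSA with initial states I and edges E."""
    if len(set(I)) != 1:                  # (iii) exactly one initial state
        return False
    seen = set()
    for (q, t, p) in set(E):
        if t == LAMBDA:                   # (i) no λ-labelled edges
            return False
        if (q, t) in seen:                # (ii) at most one edge per (state, symbol)
            return False
        seen.add((q, t))
    return True


# A1 from the slides, and a hypothetical automaton with a λ edge.
A1_E = {
    (1, "a", 2), (1, "b", 4), (2, "a", 3), (2, "b", 4),
    (3, "a", 3), (3, "b", 3), (4, "a", 2), (4, "b", 4),
}
N_E = {(1, "a", 1), (1, "a", 2), (2, LAMBDA, 3), (3, "b", 4)}
```

A1 passes all three conditions, while the hypothetical automaton fails on both (i) and (ii).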

Equivalence of DFSAs and NDFSAs
Theorem: DFSA = NDFSA. Let L be a language. L is accepted by an NDFSA iff L is accepted by a DFSA.
Algorithm: NDFSA -> DFSA (proof omitted here). There is an algorithm to create a DFSA equivalent to a given NDFSA (i.e. one that accepts the same language). The reverse direction is trivial, since every DFSA is itself an FSA.
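The slides omit the algorithm. One standard way to do the conversion is the subset construction, in which each state of the DFSA is a set of NDFSA states. The sketch below uses that technique as an assumption about what is intended: the function names, the use of the empty string for λ, and the example automaton N are all illustrative.

```python
from itertools import product

LAMBDA = ""  # assumption: the empty string stands for the label λ


def closure(states, E):
    """All states reachable from the given ones using only λ edges."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for (x, t, y) in E:
            if x == q and t == LAMBDA and y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def accepts(I, F, E, w):
    """Set-of-states simulation; works for NDFSAs and DFSAs alike."""
    current = closure(I, E)
    for t in w:
        current = closure({y for (x, s, y) in E if x in current and s == t}, E)
    return bool(current & set(F))


def determinise(I, F, T, E):
    """Subset construction: returns (states, initial states, final states, edges) of a DFSA."""
    start = frozenset(closure(I, E))
    states, todo, edges = {start}, [start], set()
    while todo:
        S = todo.pop()
        for t in T:
            target = frozenset(
                closure({y for (x, s, y) in E if x in S and s == t}, E)
            )
            if not target:
                continue  # no move possible: leave the transition undefined
            edges.add((S, t, target))
            if target not in states:
                states.add(target)
                todo.append(target)
    final = {S for S in states if S & set(F)}
    return states, {start}, final, edges


# Hypothetical NDFSA with a branching a-edge and a λ edge.
N_I, N_F, N_T = {1}, {4}, {"a", "b"}
N_E = {(1, "a", 1), (1, "a", 2), (2, LAMBDA, 3), (3, "b", 4)}
D_Q, D_I, D_F, D_E = determinise(N_I, N_F, N_T, N_E)

# Both automata should agree on every string of length at most 6.
agree = all(
    accepts(N_I, N_F, N_E, "".join(w)) == accepts(D_I, D_F, D_E, "".join(w))
    for n in range(7)
    for w in product("ab", repeat=n)
)
```

Only the subsets reachable from the start are built, so the result is often much smaller than the worst case of 2^|Q| states.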