Functional Design and Programming Lecture 10: Regular expressions and finite state machines.

Literature
- These notes
- Randal C. Nelson’s notes on finite automata and regular expressions: /csc_173/fa/

Exercises
- Consider:
  - alphabetic identifiers in Standard ML;
  - file names that end with “.txt”;
  - words in a text that contain the subsequence “s”, “e”, “c”, “r”, “e”, “t”;
  - comments in XML.
- For each of the above:
  - write a regular expression that generates the strings;
  - give a finite state automaton that recognizes the strings.
- See the home page for more exercises.

Overview
- Regular expressions
- Finite state automata
- Applications of finite automata and regular expressions
- Regular expressions and context-free grammars
- Construction of finite automata
- Implementation of deterministic finite automata

Regular expressions
- Expressions that describe (possibly infinite) sets of strings.
- Examples:
  - .*\.sml: strings ending with “.sml”
  - .*glei.*: strings containing the substring “glei”

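Regular expressions: Definition
Reg. exp.: set denoted
- $\varepsilon$: $\{\varepsilon\}$
- $a$: $\{a\}$
- $RQ$: $\{st : s \in L(R),\ t \in L(Q)\}$
- $R \mid Q$: $L(R) \cup L(Q)$
- $R^*$: $\{s_1 s_2 \cdots s_n : n \ge 0,\ s_i \in L(R)\}$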
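As an illustration, the definition can be transcribed into an SML datatype together with a membership test. This is a sketch, not the lecture's method: it decides whether $s \in L(R)$ using Brzozowski derivatives (a technique these notes do not cover), it adds an `Empty` constructor for the empty set because the derivative method needs one, and all names (`regex`, `nullable`, `deriv`, `matches`) are hypothetical.

```sml
(* Regular expressions over characters, one constructor per row of the table.
   Empty (denoting the empty set) is an addition needed by the derivative method. *)
datatype regex =
    Empty                    (* the empty set                    *)
  | Eps                      (* the set containing only ""       *)
  | Sym of char              (* {a}                              *)
  | Seq of regex * regex     (* RQ                               *)
  | Alt of regex * regex     (* R|Q                              *)
  | Star of regex            (* R*                               *)

(* nullable r is true iff the empty string is in L(r) *)
fun nullable Empty = false
  | nullable Eps = true
  | nullable (Sym _) = false
  | nullable (Seq (r, q)) = nullable r andalso nullable q
  | nullable (Alt (r, q)) = nullable r orelse nullable q
  | nullable (Star _) = true

(* deriv a r denotes the set { t : a t is in L(r) } *)
fun deriv _ Empty = Empty
  | deriv _ Eps = Empty
  | deriv a (Sym b) = if a = b then Eps else Empty
  | deriv a (Seq (r, q)) =
      let val d = Seq (deriv a r, q)
      in if nullable r then Alt (d, deriv a q) else d end
  | deriv a (Alt (r, q)) = Alt (deriv a r, deriv a q)
  | deriv a (Star r) = Seq (deriv a r, Star r)

(* s is in L(r) iff the derivative of r by all characters of s is nullable *)
fun matches r s =
  nullable (List.foldl (fn (c, r') => deriv c r') r (explode s))
```

For example, `matches (Seq (Star (Alt (Sym #"a", Sym #"b")), Sym #"c")) "abac"` evaluates to `true`. The derivatives are not simplified here, so expressions can grow with the length of the input; a practical matcher would simplify them or compile the expression to a DFA.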

Derived regular expressions
Reg. exp.: definition
- $R^+$: $RR^*$
- $R?$: $R \mid \varepsilon$
- [a-z]: $a \mid b \mid \cdots \mid z$
- [^a-z]: the disjunction of all symbols except $a, b, \ldots, z$
- . (dot): the disjunction of all symbols
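Since the derived forms are only abbreviations, they can be sketched in SML as plain functions that build core expressions. Assumptions: the constructor and function names are hypothetical, and [^a-z] and . need an explicit finite alphabet, passed here as a character list `sigma`.

```sml
(* Core regular expressions (constructor names are hypothetical) *)
datatype regex =
    Eps
  | Sym of char
  | Seq of regex * regex
  | Alt of regex * regex
  | Star of regex

(* R+ = R R* *)
fun plus r = Seq (r, Star r)

(* R? = R | epsilon *)
fun opt r = Alt (r, Eps)

(* Disjunction of a non-empty list of symbols: a | b | ... | z *)
fun anyOf [c] = Sym c
  | anyOf (c :: cs) = Alt (Sym c, anyOf cs)
  | anyOf [] = raise Fail "anyOf: empty character class"

(* The characters lo, ..., hi in code order *)
fun charsBetween lo hi =
  List.tabulate (Char.ord hi - Char.ord lo + 1,
                 fn i => Char.chr (Char.ord lo + i))

(* [lo-hi] *)
fun range lo hi = anyOf (charsBetween lo hi)

(* [^lo-hi]: every symbol of the alphabet sigma outside lo..hi *)
fun notRange sigma lo hi =
  anyOf (List.filter
           (fn c => Char.ord c < Char.ord lo orelse Char.ord c > Char.ord hi)
           sigma)

(* . : every symbol of the alphabet sigma *)
fun dot sigma = anyOf sigma
```

For example, `range #"a" #"c"` builds `Alt (Sym #"a", Alt (Sym #"b", Sym #"c"))`.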

Finite State Machines
- Finite state automaton: description of an abstract machine with a finite number of states and with transitions between states on input symbols.
- Finite state transducer: like a finite state automaton, but additionally with output symbols on transitions.
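A minimal SML sketch of a finite state transducer, assuming the states form a small datatype and each transition emits a (possibly empty) output string. The blank-squeezing example and all names are hypothetical.

```sml
(* States of a transducer that replaces each run of blanks by one blank *)
datatype state = Normal | AfterBlank

(* One transition: the next state and the output emitted on it *)
fun step (Normal, #" ") = (AfterBlank, " ")
  | step (AfterBlank, #" ") = (AfterBlank, "")
  | step (_, c) = (Normal, String.str c)

(* Run from the start state Normal, concatenating the outputs *)
fun transduce input =
  let
    fun go (_, [], acc) = String.concat (List.rev acc)
      | go (q, c :: cs, acc) =
          let val (q', out) = step (q, c)
          in go (q', cs, out :: acc) end
  in
    go (Normal, explode input, [])
  end
```

Dropping the outputs (keeping only `step`'s first component) turns this back into a plain finite state automaton.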

Finite State Automata
- A finite state automaton is a 5-tuple consisting of:
  - a finite set $\Sigma$ of characters or symbols (the alphabet),
  - a finite set $Q$ of states,
  - a start state $q_0 \in Q$,
  - a subset $F \subseteq Q$ of accepting (or final) states,
  - a set of transitions $(q, a, q')$ with $q, q' \in Q$ and $a \in \Sigma$, written $q \xrightarrow{a} q'$.

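Finite state automata
- An FSM accepts a string $s = a_1 a_2 \cdots a_n$ if there is a sequence of transitions $q_0 \xrightarrow{a_1} q_1 \xrightarrow{a_2} \cdots \xrightarrow{a_n} q_n$ that starts in the start state $q_0$ and ends in a final state $q_n \in F$.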
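A minimal SML sketch of the 5-tuple and of acceptance, assuming states are integers and symbols are characters (the record field names are hypothetical). Rather than guessing one transition sequence, it tracks the set of all states reachable after each input symbol, so it also works for automata that are not deterministic.

```sml
(* The 5-tuple (Sigma, Q, q0, F, transitions) *)
type fsa = {
  sigma : char list,                (* alphabet                *)
  states : int list,                (* Q                       *)
  start : int,                      (* q0                      *)
  final : int list,                 (* F, a subset of Q        *)
  trans : (int * char * int) list   (* transitions (q, a, q')  *)
}

fun member x xs = List.exists (fn y => y = x) xs

(* All states reachable from some state in qs by one a-transition *)
fun step (m : fsa) qs a =
  List.foldl
    (fn ((q, b, q'), acc) =>
       if b = a andalso member q qs andalso not (member q' acc)
       then q' :: acc else acc)
    [] (#trans m)

(* s is accepted iff some transition sequence labelled by s leads from
   q0 to a final state, i.e. iff a final state is among the reachable ones *)
fun accepts (m : fsa) s =
  let val qs = List.foldl (fn (a, qs) => step m qs a) [#start m] (explode s)
  in List.exists (fn q => member q (#final m)) qs end

(* Example: strings over {a, b} ending in "ab" (nondeterministic in state 0) *)
val endsAb : fsa =
  { sigma = [#"a", #"b"], states = [0, 1, 2], start = 0, final = [2],
    trans = [(0, #"a", 0), (0, #"b", 0), (0, #"a", 1), (1, #"b", 2)] }
```

Here `accepts endsAb "aab"` is `true` and `accepts endsAb "aba"` is `false`.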

Finite state automata...
- An FSM recognizes the language (set of strings) $L \subseteq \Sigma^*$ that it accepts.
- An FSM is deterministic if no state has more than one transition on any given symbol.
- Theorem: The same class of languages over $\Sigma$ is definable (recognizable) by finite state machines, by deterministic finite state machines and by regular expressions.
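The determinism condition can be checked directly on a list of transition triples. A hypothetical SML sketch, assuming integer states and character symbols:

```sml
(* Transitions (q, a, q'), with int states and char symbols *)
type transition = int * char * int

(* Deterministic iff no state has two different transitions on the same
   symbol; repeated identical triples count as one transition *)
fun isDeterministic (ts : transition list) =
  let
    fun conflicts (q, a, q') =
      List.exists (fn (p, b, p') => p = q andalso b = a andalso p' <> q') ts
  in
    not (List.exists conflicts ts)
  end
```

The check is quadratic in the number of transitions, which is fine for small examples.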

Applications of regular expressions and FSMs
- Text searching and processing
- State-based protocols
- Dialog/interaction control
- Hardware verification
- Protocol verification
- Programming language processing
- Natural language processing

Regular expressions and context-free grammars
- Regular expressions can be understood as restricted CFGs: a regular expression is a CFG extended with * (Kleene closure) in which no nonterminal is defined (mutually) recursively.
- Regular definitions: a sequence of definitions $r_i = R_i$ for variables $r_i$ such that each $R_i$ is a regular expression with possible occurrences of $r_1, \ldots, r_{i-1}$.
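Because no $r_i$ may refer to itself or to a later variable, a sequence of regular definitions can always be expanded by substitution into ordinary regular expressions. A hypothetical SML sketch (the `Var` constructor, the `NotRegular` exception and `expand` are illustrative names) that performs this expansion and rejects any reference that is not to an earlier definition:

```sml
(* Regular expressions plus references to previously defined variables *)
datatype regex =
    Eps
  | Sym of char
  | Var of string                  (* occurrence of an earlier r_j *)
  | Seq of regex * regex
  | Alt of regex * regex
  | Star of regex

exception NotRegular of string

(* Expand r_1 = R_1, ..., r_n = R_n in order. Each R_i may mention only
   r_1, ..., r_(i-1); anything else, including recursion, raises NotRegular.
   The results contain no Var. *)
fun expand (defs : (string * regex) list) =
  let
    fun subst env (Var x) =
          (case List.find (fn (y, _) => y = x) env of
               SOME (_, r) => r
             | NONE => raise NotRegular x)
      | subst env (Seq (r, q)) = Seq (subst env r, subst env q)
      | subst env (Alt (r, q)) = Alt (subst env r, subst env q)
      | subst env (Star r) = Star (subst env r)
      | subst _ r = r
    (* env holds already expanded definitions, most recent first *)
    fun go (env, []) = List.rev env
      | go (env, (x, r) :: rest) = go ((x, subst env r) :: env, rest)
  in
    go ([], defs)
  end
```

A recursive definition such as r = ε | a r is rejected, even though its language a* is regular; that language must be written with * instead.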

Regular expressions and finite state automata
- Regular expressions are often a convenient way to specify a desired language.
- Deterministic finite state machines are a good model for an efficient implementation that recognizes the language.

Construction of finite automata
- Regular expression → nondeterministic finite state automaton (NFA): Thompson’s construction
- NFA → deterministic finite state automaton (DFA): subset construction
- DFA → NFA: trivial (every DFA is an NFA)
- Finite automaton → regular expression: path algebra construction
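The subset construction can be sketched in SML as follows, assuming an NFA without ε-transitions, with integer states and character symbols. Each DFA state stands for a set of NFA states; only sets reachable from $\{q_0\}$ are generated, and an empty successor set is simply left out, which gives a partial DFA. All names and representations are hypothetical, and Thompson's construction and the path algebra construction are not shown.

```sml
type nfa = { sigma : char list, start : int, final : int list,
             trans : (int * char * int) list }

(* DFA state i represents the NFA-state set at position i of sets *)
type dfa = { sets : int list list, final : int list,
             trans : (int * char * int) list }

fun member x xs = List.exists (fn y => y = x) xs

(* State sets are sorted duplicate-free lists, so equal sets compare equal *)
fun insert (x : int) [] = [x]
  | insert x (y :: ys) =
      if x < y then x :: y :: ys
      else if x = y then y :: ys
      else y :: insert x ys

(* NFA states reachable from the set qs by one a-transition *)
fun move (n : nfa) qs a =
  List.foldl (fn ((q, b, q'), acc) =>
                 if b = a andalso member q qs then insert q' acc else acc)
             [] (#trans n)

(* Position of s in ss, if present *)
fun indexOf s ss =
  let fun go (_, []) = NONE
        | go (i, t :: ts) = if t = s then SOME i else go (i + 1, ts)
  in go (0, ss) end

fun subset (n : nfa) : dfa =
  let
    (* sets: discovered sets in order; todo: indices of sets to expand *)
    fun loop (sets, [], ts) = (sets, ts)
      | loop (sets, i :: todo, ts) =
          let
            val qs = List.nth (sets, i)
            fun addSym (a, (ss, td, acc)) =
              let val qs' = move n qs a
              in
                if null qs' then (ss, td, acc)   (* no transition *)
                else case indexOf qs' ss of
                         SOME j => (ss, td, (i, a, j) :: acc)
                       | NONE =>
                           let val j = length ss
                           in (ss @ [qs'], td @ [j], (i, a, j) :: acc) end
              end
            val (sets', todo', ts') = List.foldl addSym (sets, todo, ts) (#sigma n)
          in
            loop (sets', todo', ts')
          end
    val (sets, ts) = loop ([[#start n]], [0], [])
    fun isFinal i = List.exists (fn q => member q (#final n)) (List.nth (sets, i))
  in
    { sets = sets,
      final = List.filter isFinal (List.tabulate (length sets, fn i => i)),
      trans = List.rev ts }
  end

(* Example: NFA for strings over {a, b} ending in "ab" *)
val endsAb : nfa =
  { sigma = [#"a", #"b"], start = 0, final = [2],
    trans = [(0, #"a", 0), (0, #"b", 0), (0, #"a", 1), (1, #"b", 2)] }
```

For `endsAb` this yields three DFA states, for the NFA-state sets {0}, {0, 1} and {0, 2}, with only the last one accepting.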

Implementation of deterministic FSMs (1)
- Table-based implementation:
  - Represent states by indexes 0, ..., n-1.
  - Represent characters by indexes 0, ..., m-1 (e.g., m = 256).
  - Represent transitions by a two-dimensional table T (a vector of vectors of indices, or a 2-dimensional array/vector) such that T(q, a) = SOME q’ if (q, a, q’) is a transition, and NONE otherwise. [SML: Vector.sub (Vector.sub (T, q), a)]
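A sketch of the table-based representation in SML, assuming state 0 is the start state and characters are indexed by `Char.ord` (so m = 256). The automaton for strings ending with “.sml” and all names are hypothetical examples.

```sml
(* T as a vector of rows, one per state; each row has 256 int option entries *)
type table = int option vector vector

(* Run from state 0; reject as soon as a transition is missing *)
fun run (t : table) (accepting : int -> bool) s =
  let
    fun go (q, []) = accepting q
      | go (q, c :: cs) =
          case Vector.sub (Vector.sub (t, q), Char.ord c) of
              SOME q' => go (q', cs)
            | NONE => false
  in
    go (0, explode s)
  end

(* Example: strings ending with ".sml". State i (0..4) is the length of the
   longest suffix of the input read so far that is a prefix of ".sml". *)
val pattern = ".sml"
val smlTable : table =
  Vector.tabulate (5, fn q =>
    Vector.tabulate (256, fn a =>
      let val c = Char.chr a
      in
        if q < 4 andalso c = String.sub (pattern, q) then SOME (q + 1)
        else if c = #"." then SOME 1
        else SOME 0
      end))

fun endsWithSml s = run smlTable (fn q => q = 4) s
```

This particular table happens to be total (every entry is `SOME`); a partial automaton would store `NONE` for missing transitions, which `run` treats as rejection.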

Implementation of deterministic FSMs (2)
- Sparse table-based implementation:
  - Represent states by indexes 0, ..., n-1.
  - Represent transitions by a one-dimensional vector T of association lists such that lookup (Vector.sub (T, q), a) = SOME q’ if T(q, a) = q’.
  - Optimization: use a better data structure than association lists (e.g., hash tables or search trees).
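A sketch of the sparse representation in SML. The notes use `lookup` without defining it, so the version here is a hypothetical helper; state 0 is assumed to be the start state, and the digit-string automaton is an illustrative example.

```sml
(* One association list per state, mapping a symbol to its successor;
   a symbol without an entry has no transition *)
type sparse = (char * int) list vector

(* Linear search in an association list (hypothetical helper) *)
fun lookup [] _ = NONE
  | lookup ((b, q') :: rest) a = if a = b then SOME q' else lookup rest a

fun run (t : sparse) (accepting : int -> bool) s =
  let
    fun go (q, []) = accepting q
      | go (q, c :: cs) =
          case lookup (Vector.sub (t, q)) c of
              SOME q' => go (q', cs)
            | NONE => false
  in
    go (0, explode s)
  end

(* Example: non-empty strings of decimal digits; state 1 is accepting *)
val digits = List.tabulate (10, fn i => Char.chr (Char.ord #"0" + i))
val numberTable : sparse =
  Vector.fromList [ map (fn c => (c, 1)) digits,    (* state 0: start *)
                    map (fn c => (c, 1)) digits ]   (* state 1        *)

fun isNumber s = run numberTable (fn q => q = 1) s
```

Each step costs time linear in the number of outgoing transitions of the current state, which is what the suggested hash tables or search trees would improve.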

Implementation of deterministic FSMs (3)
- Functional implementation: datatype S = STATE of (symbol -> S) * bool.
  - Represent states by values of type S.
  - Transitions are represented as part of the state (first component).
  - Whether a state is accepting or not is represented by the second component.
  - Execute a transition by applying the first component of the state: fun trans (STATE (t, _)) a = t a
- Note: Intuitively, trans corresponds to a curried version of $T : Q \times \Sigma \to Q$, that is, a function of type $Q \to \Sigma \to Q$. It can (in principle) be implemented more efficiently than the uncurried version.
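A runnable version of the functional representation, following the datatype S but with characters as symbols (an assumption). The example states for strings ending with “.sml” are hypothetical; because SML only allows recursive bindings of functions, each state is produced by a function of type unit -> S.

```sml
(* A state is its transition function together with an "accepting" flag *)
datatype S = STATE of (char -> S) * bool

(* Executing a transition applies the first component *)
fun trans (STATE (t, _)) a = t a
fun accepting (STATE (_, f)) = f

(* Run from state q over the whole input, then test acceptance *)
fun run q s = accepting (List.foldl (fn (c, q) => trans q c) q (explode s))

(* Example: strings ending with ".sml", as mutually recursive state builders *)
fun restart c = if c = #"." then q1 () else q0 ()
and q0 () = STATE (restart, false)
and q1 () = STATE (fn c => if c = #"s" then q2 () else restart c, false)
and q2 () = STATE (fn c => if c = #"m" then q3 () else restart c, false)
and q3 () = STATE (fn c => if c = #"l" then q4 () else restart c, false)
and q4 () = STATE (restart, true)

fun endsWithSml s = run (q0 ()) s
```

In this sketch each transition builds a fresh STATE value; sharing the state values (for example with references or lazy values) would avoid that allocation.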

Other problems
- Minimize a DFA.
- Decide whether two DFAs are equivalent.
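Equivalence can be decided with the product construction, sketched here in SML (minimization is not shown). Assumptions: both DFAs are total, use the same alphabet (taken from the first one) and have finitely many reachable states; the record fields, `equivalent` and the example automata are hypothetical.

```sml
(* A total DFA: alphabet, start state, acceptance test, transition function *)
type dfa = { sigma : char list, start : int,
             accepting : int -> bool, delta : int * char -> int }

(* Equivalent iff no reachable pair (p, q) of the product automaton has
   exactly one accepting component *)
fun equivalent (m1 : dfa) (m2 : dfa) =
  let
    fun seen x xs = List.exists (fn y => y = x) xs
    (* Depth-first search over reachable pairs of states *)
    fun search ([], _) = true
      | search ((p, q) :: todo, visited) =
          if #accepting m1 p <> #accepting m2 q then false
          else
            let
              fun add (a, (td, vs)) =
                let val next = (#delta m1 (p, a), #delta m2 (q, a))
                in if seen next vs then (td, vs) else (next :: td, next :: vs) end
              val (todo', visited') = List.foldl add (todo, visited) (#sigma m1)
            in
              search (todo', visited')
            end
    val s0 = (#start m1, #start m2)
  in
    search ([s0], [s0])
  end

(* Examples over {a, b}: an even number of a's, counted mod 2 and mod 4,
   and a number of a's divisible by 4 *)
val evenA : dfa = { sigma = [#"a", #"b"], start = 0, accepting = fn q => q = 0,
                    delta = fn (q, c) => if c = #"a" then 1 - q else q }
val evenA4 : dfa = { sigma = [#"a", #"b"], start = 0,
                     accepting = fn q => q mod 2 = 0,
                     delta = fn (q, c) => if c = #"a" then (q + 1) mod 4 else q }
val fourA : dfa = { sigma = [#"a", #"b"], start = 0, accepting = fn q => q = 0,
                    delta = fn (q, c) => if c = #"a" then (q + 1) mod 4 else q }
```

Here `equivalent evenA evenA4` is `true`, while `equivalent evenA fourA` is `false` (the string "aa" separates them).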