
Regular Expressions CIS 361

Fundamental problems: We need finite descriptions of infinite sets of strings, and a way to discover and specify their “regularity”. The set of languages over a finite alphabet is uncountable, while the set of finite descriptions is countable, so some languages cannot be described at all.

A language L is regular if there exists a finite acceptor for it. Any language that is described by a regular expression can be accepted by some finite automaton.
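As an illustration of a finite acceptor, here is a minimal Python sketch of a two-state DFA for bit strings with an even number of zeros, a language the notes also describe by the regular expression 1*(01*01*)*. The state names and the dictionary encoding of the transitions are assumptions made for this sketch; the final loop checks, only up to a bounded length, that the automaton and Python's `re` module agree.

```python
import itertools
import re

# Transition table of a complete DFA over {0, 1}; the state name records
# whether an even or an odd number of 0s has been read so far.
DELTA = {
    ("even", "0"): "odd", ("even", "1"): "even",
    ("odd", "0"): "even", ("odd", "1"): "odd",
}

def accepts(word, start="even", favorable=frozenset({"even"})):
    """Run the DFA on `word` and report whether it ends in a favorable state."""
    state = start
    for ch in word:
        state = DELTA[(state, ch)]
    return state in favorable

# The finite acceptor and the regular expression 1*(01*01*)* agree on every
# bit string of length <= 8 (a bounded check, not a proof).
for n in range(9):
    for w in map("".join, itertools.product("01", repeat=n)):
        assert accepts(w) == (re.fullmatch(r"1*(01*01*)*", w) is not None)
```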

Regular expressions are combinations of strings of symbols from some alphabet, parentheses, and the operators ∪, ·, and *:
- ∪ (written U in the examples) is union (some literature uses +).
- · (or nothing) is concatenation.
- * is star closure, or Kleene star: superscripted repetition, 0 or more times.
- + is closure: superscripted repetition, 1 or more times.
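For readers who want to experiment, the operators above map onto Python's `re` syntax as sketched below; `matches` is a hypothetical helper name, and `re.fullmatch` is used so that a pattern must describe the whole string, as a regular expression describes whole strings of its language. Note that `re` writes union as `|`, not U or +.

```python
import re

def matches(pattern, word):
    """True if the whole word is in the language of `pattern` (Python re syntax)."""
    return re.fullmatch(pattern, word) is not None

# Union a U b is written a|b in Python's re module.
assert matches(r"a|b", "b") and not matches(r"a|b", "ab")
# Concatenation is juxtaposition, as in these notes.
assert matches(r"ab", "ab") and not matches(r"ab", "ba")
# Kleene star: zero or more repetitions, so the empty string is included.
assert matches(r"a*", "") and matches(r"a*", "aaa")
# Plus: one or more repetitions, so the empty string is excluded.
assert not matches(r"a+", "") and matches(r"a+", "aaa")
```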

Specifying Lexical Structure Using Regular Expressions: Given some alphabet Σ (a set of symbols), regular expressions are built from:
- ε, the empty string
- any letter from Σ
- r₁r₂: a string matching r₁ followed by a string matching r₂ (concatenation)
- r₁ ∪ r₂ (also written r₁ + r₂): either regular expression r₁ or r₂ (union)
- r*: iterated sequence and choice, ε | r | rr | …
- parentheses, to indicate grouping/precedence
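The building blocks can be mirrored as a small syntax tree, one node type per construction, with parentheses becoming the shape of the tree. The sketch below is an assumption-laden illustration (the class names `Eps`, `Sym`, `Cat`, `Alt`, `Star` and the function `lang` are hypothetical): since a language may be infinite, `lang(r, n)` enumerates only the strings of length at most n.

```python
from dataclasses import dataclass

# Hypothetical AST node classes, one per way of building a regular expression.
@dataclass(frozen=True)
class Eps:          # ε, the empty string
    pass

@dataclass(frozen=True)
class Sym:          # a single letter from Σ
    c: str

@dataclass(frozen=True)
class Cat:          # r1 r2, concatenation
    left: object
    right: object

@dataclass(frozen=True)
class Alt:          # r1 U r2, union
    left: object
    right: object

@dataclass(frozen=True)
class Star:         # r*, zero or more repetitions
    inner: object

def lang(r, n):
    """Return every string of length <= n in the language of r."""
    if isinstance(r, Eps):
        return {""}
    if isinstance(r, Sym):
        return {r.c} if n >= 1 else set()
    if isinstance(r, Alt):
        return lang(r.left, n) | lang(r.right, n)
    if isinstance(r, Cat):
        return {u + v for u in lang(r.left, n) for v in lang(r.right, n)
                if len(u) + len(v) <= n}
    if isinstance(r, Star):
        pieces = lang(r.inner, n) - {""}
        result, frontier = {""}, {""}
        while frontier:  # append one more piece per round until nothing new fits
            frontier = {u + v for u in frontier for v in pieces
                        if len(u) + len(v) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression node: {r!r}")

# (a U b)* a : strings over {a, b} that end in "a"
EXAMPLE = Cat(Star(Alt(Sym("a"), Sym("b"))), Sym("a"))
print(sorted(lang(EXAMPLE, 2)))  # ['a', 'aa', 'ba']
```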

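Operations: union, complement, intersection, difference, concatenation, and repetition (Kleene star and the plus operator). Strictly, these act on the languages that regular expressions describe, and each of them maps regular languages to regular languages.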

Union, L ∪ M: The union of two regular expressions Q and R is Q ∪ R. In terms of automata A and B accepting L and M, respectively: create a new initial state q and connect it to the initial states of A and B by ε-transitions. The favorable (accepting) states of A and B remain favorable.
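A minimal sketch of the union construction, assuming an ε-NFA is encoded as a tuple (initial state, set of favorable states, dict mapping (state, label) to a set of next states), with the empty string as the ε label. The helpers `eps_closure`, `accepts`, `symbol`, and `tag` are hypothetical names; `tag` renames states so that A and B cannot collide before they are joined.

```python
EPS = ""  # label reserved for ε-transitions

def eps_closure(states, delta):
    """All states reachable from `states` using only ε-transitions."""
    seen, stack = set(states), list(states)
    while stack:
        for t in delta.get((stack.pop(), EPS), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def accepts(nfa, word):
    """Simulate an ε-NFA encoded as (initial state, favorable states, delta)."""
    start, favorable, delta = nfa
    current = eps_closure({start}, delta)
    for ch in word:
        current = eps_closure({t for s in current for t in delta.get((s, ch), ())}, delta)
    return bool(current & favorable)

def symbol(c):
    """Two-state NFA accepting exactly the one-letter string c."""
    return 0, {1}, {(0, c): {1}}

def tag(nfa, prefix):
    """Rename each state s to (prefix, s) so two automata never share a state."""
    start, favorable, delta = nfa
    renamed = {((prefix, s), a): {(prefix, t) for t in ts} for (s, a), ts in delta.items()}
    return (prefix, start), {(prefix, f) for f in favorable}, renamed

def union(a, b):
    sa, fa, da = tag(a, "A")
    sb, fb, db = tag(b, "B")
    q = "q"  # the new initial state, with ε-transitions to both old initial states
    return q, fa | fb, {**da, **db, (q, EPS): {sa, sb}}

A_OR_B = union(symbol("a"), symbol("b"))
```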

Complement, Σ* − L: To construct a regular expression for the complement of a language L given by a regular expression:
1. Take an automaton that accepts the strings of L.
2. Convert it to a deterministic automaton that is complete, i.e. every state has a transition on every symbol (add a non-favorable dead state if needed).
3. Flip favorable and non-favorable states.
4. Construct a regular expression for the strings accepted by the updated automaton.
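The flipping step can be sketched in Python, assuming a complete DFA encoded as (initial state, favorable states, transition dict). The example automaton for "bit strings with at least one 1" and the helper names are assumptions for illustration; the loop checks, up to length 6, that the flipped automaton accepts exactly 0*.

```python
import itertools
import re

# Complete DFA over {0, 1} for "at least one 1":
# state 0 = no 1 read yet, state 1 = at least one 1 read.
STATES = {0, 1}
AT_LEAST_ONE_1 = (0, {1}, {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 1})

def dfa_accepts(dfa, word):
    start, favorable, delta = dfa
    state = start
    for ch in word:
        state = delta[(state, ch)]  # complete DFA: every (state, symbol) pair is defined
    return state in favorable

def complement(dfa, states):
    """Flip favorable and non-favorable states; valid only for a complete DFA."""
    start, favorable, delta = dfa
    return start, states - favorable, delta

NO_1S = complement(AT_LEAST_ONE_1, STATES)

# Bounded sanity check against the expected answer 0*.
for n in range(7):
    for w in map("".join, itertools.product("01", repeat=n)):
        assert dfa_accepts(NO_1S, w) == (re.fullmatch(r"0*", w) is not None)
```

Flipping the states of an NFA, or of a DFA with missing transitions, does not in general yield the complement, which is why the conversion to a deterministic automaton comes first.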

Complement examples over the bit alphabet {0, 1}:
- Complement of bit strings with at least one “1” = bit strings containing no “1”s = 0*
- Complement of bit strings with exactly one “1” = bit strings containing no “1”s U bit strings with at least two “1”s = 0* U (0* 1 0* 1 0*)(0 U 1)*
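Both answers can be spot-checked by brute force. This sketch translates them into Python `re` syntax (U becomes `|`, (0 U 1) becomes `[01]`) and compares each pattern with a direct description of the complement on every bit string up to length 8; the helper names are hypothetical, and agreement on short strings is evidence, not a proof.

```python
import itertools
import re

NOT_AT_LEAST_ONE_1 = r"0*"
NOT_EXACTLY_ONE_1 = r"0*|(0*10*10*)[01]*"

def bit_strings(max_len):
    """Yield every bit string of length 0 through max_len."""
    for n in range(max_len + 1):
        for chars in itertools.product("01", repeat=n):
            yield "".join(chars)

def check(pattern, predicate, max_len=8):
    """True if `pattern` matches exactly the strings satisfying `predicate`."""
    return all((re.fullmatch(pattern, w) is not None) == predicate(w)
               for w in bit_strings(max_len))

assert check(NOT_AT_LEAST_ONE_1, lambda w: "1" not in w)
assert check(NOT_EXACTLY_ONE_1, lambda w: w.count("1") != 1)
```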

Intersection, L ∩ M: Apply De Morgan's law. L ∩ M is the complement of the union of the complements of L and M: L ∩ M = Σ* − ((Σ* − L) ∪ (Σ* − M)).
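De Morgan's law can be illustrated on a finite slice of Σ* (all bit strings up to a fixed length), where complements are taken relative to that slice. The two example languages below are arbitrary assumptions chosen for the demonstration.

```python
import itertools

# A finite slice of Σ* for Σ = {0, 1}: every bit string of length <= MAX_LEN.
MAX_LEN = 6
SIGMA_STAR = {"".join(p) for n in range(MAX_LEN + 1)
              for p in itertools.product("01", repeat=n)}

def complement(lang):
    """Complement relative to the finite slice SIGMA_STAR."""
    return SIGMA_STAR - lang

# Two arbitrary example languages (assumptions for the demo).
L = {w for w in SIGMA_STAR if w.startswith("1")}  # starts with 1
M = {w for w in SIGMA_STAR if len(w) % 2 == 0}    # even length

via_de_morgan = complement(complement(L) | complement(M))
assert via_de_morgan == L & M
```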

Difference, L − M: This can be expressed as the intersection of the languages L and Σ* − M.
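The same finite-slice idea shows the difference identity; again the slice of Σ* and the two example languages are assumptions for illustration only.

```python
import itertools

MAX_LEN = 6
SIGMA_STAR = {"".join(p) for n in range(MAX_LEN + 1)
              for p in itertools.product("01", repeat=n)}

def complement(lang):
    """Complement relative to the finite slice SIGMA_STAR."""
    return SIGMA_STAR - lang

L = {w for w in SIGMA_STAR if "11" in w}       # contains 11 (example language)
M = {w for w in SIGMA_STAR if w.endswith("0")}  # ends in 0 (example language)

difference = L & complement(M)  # L ∩ (Σ* − M)
assert difference == L - M
```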

Concatenation: The concatenation of strings u and v over alphabet Σ is the string uv. Languages L₁ and L₂ concatenated: L₁L₂ = {uv | u ∈ L₁, v ∈ L₂}. This can be extended to any finite number of languages.
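For finite languages the definition translates directly into a set comprehension, and the extension to any finite number of languages is a fold starting from {ε}. This is a sketch with hypothetical function names; the example shows that different pairs (u, v) can produce the same string.

```python
from functools import reduce

def concat(l1, l2):
    """L1 L2 = {uv | u in L1, v in L2}, for finite languages given as sets of strings."""
    return {u + v for u in l1 for v in l2}

def concat_all(*langs):
    """Concatenate any finite number of languages; with no arguments the result is {ε}."""
    return reduce(concat, langs, {""})

print(sorted(concat({"a", "ab"}, {"b", ""})))  # ['a', 'ab', 'abb']
```

Here "ab" arises both as a·b and as ab·ε, and the set keeps it once.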

Concatenation, LM: In terms of automata, the algorithm connects every favorable state of L to the initial state of M by an arrow labeled ε. The initial state of L becomes the initial state of the new automaton, the favorable states of L become non-favorable, and the favorable states of M become the favorable states of the new automaton.
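A minimal sketch of the concatenation construction, using an assumed ε-NFA encoding (initial state, favorable states, dict from (state, label) to next states, with "" as the ε label). The helper names are hypothetical and are repeated here so the block runs on its own.

```python
EPS = ""  # label reserved for ε-transitions

def eps_closure(states, delta):
    """All states reachable from `states` using only ε-transitions."""
    seen, stack = set(states), list(states)
    while stack:
        for t in delta.get((stack.pop(), EPS), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def accepts(nfa, word):
    """Simulate an ε-NFA encoded as (initial state, favorable states, delta)."""
    start, favorable, delta = nfa
    current = eps_closure({start}, delta)
    for ch in word:
        current = eps_closure({t for s in current for t in delta.get((s, ch), ())}, delta)
    return bool(current & favorable)

def symbol(c):
    """Two-state NFA accepting exactly the one-letter string c."""
    return 0, {1}, {(0, c): {1}}

def tag(nfa, prefix):
    """Rename each state s to (prefix, s) so two automata never share a state."""
    start, favorable, delta = nfa
    renamed = {((prefix, s), a): {(prefix, t) for t in ts} for (s, a), ts in delta.items()}
    return (prefix, start), {(prefix, f) for f in favorable}, renamed

def concat(a, b):
    sa, fa, da = tag(a, "A")
    sb, fb, db = tag(b, "B")
    delta = {**da, **db}
    for f in fa:  # ε-arrow from each favorable state of L to the initial state of M
        delta[(f, EPS)] = delta.get((f, EPS), set()) | {sb}
    return sa, fb, delta  # L's initial state is kept; only M's favorable states stay favorable

AB = concat(symbol("a"), symbol("b"))
```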

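Kleene star, L*: In terms of automata, connect every favorable state of L to the initial state of L by a transition labeled ε. Then create a new initial state s, make it favorable (so that ε is accepted), and connect it to the old initial state by an ε-transition. The favorable states of L stay favorable: if s were the only favorable state, no nonempty string could be accepted, since no transition leads back to s.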
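The star construction, sketched under the same assumed ε-NFA encoding (initial state, favorable states, transition dict, "" for ε). The hand-written automaton `AB` for the single string "ab" and all helper names are hypothetical; the new state `"s"` cannot clash with the renamed tuple states.

```python
EPS = ""  # label reserved for ε-transitions

def eps_closure(states, delta):
    """All states reachable from `states` using only ε-transitions."""
    seen, stack = set(states), list(states)
    while stack:
        for t in delta.get((stack.pop(), EPS), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def accepts(nfa, word):
    """Simulate an ε-NFA encoded as (initial state, favorable states, delta)."""
    start, favorable, delta = nfa
    current = eps_closure({start}, delta)
    for ch in word:
        current = eps_closure({t for s in current for t in delta.get((s, ch), ())}, delta)
    return bool(current & favorable)

def tag(nfa, prefix):
    """Rename each state s to (prefix, s) so the new state cannot collide."""
    start, favorable, delta = nfa
    renamed = {((prefix, s), a): {(prefix, t) for t in ts} for (s, a), ts in delta.items()}
    return (prefix, start), {(prefix, f) for f in favorable}, renamed

def star(a):
    sa, fa, da = tag(a, "A")
    delta = dict(da)
    for f in fa:  # ε back to L's initial state allows further repetitions
        delta[(f, EPS)] = delta.get((f, EPS), set()) | {sa}
    s = "s"  # new initial state, favorable so that ε is accepted
    delta[(s, EPS)] = {sa}
    return s, fa | {s}, delta

AB = (0, {2}, {(0, "a"): {1}, (1, "b"): {2}})  # accepts exactly "ab"
AB_STAR = star(AB)
```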

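Plus, L+: In terms of automata, connect every favorable state of L to the initial state of L by a transition labeled ε. That's it: every accepted run passes through L one or more times before it reaches a favorable state, so no new initial state is needed (unlike L*, the construction does not add ε to the language).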
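The plus construction needs only the ε back-arrows, as this sketch shows under the same assumed encoding (the `AB` automaton and helper names are hypothetical). Unlike the star version, the empty string is rejected because `AB` itself does not accept it.

```python
EPS = ""  # label reserved for ε-transitions

def eps_closure(states, delta):
    """All states reachable from `states` using only ε-transitions."""
    seen, stack = set(states), list(states)
    while stack:
        for t in delta.get((stack.pop(), EPS), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def accepts(nfa, word):
    """Simulate an ε-NFA encoded as (initial state, favorable states, delta)."""
    start, favorable, delta = nfa
    current = eps_closure({start}, delta)
    for ch in word:
        current = eps_closure({t for s in current for t in delta.get((s, ch), ())}, delta)
    return bool(current & favorable)

def plus(a):
    start, favorable, delta = a
    new_delta = dict(delta)
    for f in favorable:  # ε from each favorable state back to the initial state
        new_delta[(f, EPS)] = new_delta.get((f, EPS), set()) | {start}
    return start, set(favorable), new_delta

AB = (0, {2}, {(0, "a"): {1}, (1, "b"): {2}})  # accepts exactly "ab"
AB_PLUS = plus(AB)
```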

Naming Languages: Regular sets can be named using their derivation from the seed elements by the closure operations. Regular expressions formalize this approach: regular sets relate to regular expressions as numbers relate to numerals, or semantics to syntax.

Example: regular expressions for strings over {a, b} containing at least one “a”.
- Focus on the one “a”: (a U b)*a(a U b)*
- Focus on the leftmost “a”: b*a(a U b)*
- Focus on the “a”s: b*ab*(ab*)*
- Further optimization: b*(ab*)+
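The four answers can be compared with the intended language by brute force. This sketch rewrites them in Python `re` syntax (U becomes `|`) and checks each against the condition "contains an a" for every string over {a, b} up to length 8; the dictionary keys and helper name are hypothetical labels.

```python
import itertools
import re

CANDIDATES = {
    "focus on the one a": r"(a|b)*a(a|b)*",
    "focus on the leftmost a": r"b*a(a|b)*",
    "focus on the a's": r"b*ab*(ab*)*",
    "further optimization": r"b*(ab*)+",
}

def words(max_len):
    """Yield every string over {a, b} of length 0 through max_len."""
    for n in range(max_len + 1):
        for chars in itertools.product("ab", repeat=n):
            yield "".join(chars)

for name, pattern in CANDIDATES.items():
    for w in words(8):
        assert (re.fullmatch(pattern, w) is not None) == ("a" in w), (name, w)
```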

Equivalence of regular expressions: Two regular expressions are equivalent if they represent the same regular set.
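A bounded search can refute equivalence by finding the shortest string on which two expressions disagree; finding none up to some length does not prove equivalence (that needs, for example, a comparison of minimized DFAs). The sketch below uses Python `re` syntax, and `first_difference` is a hypothetical helper.

```python
import itertools
import re

def first_difference(r1, r2, alphabet, max_len):
    """Return the shortest string (up to max_len) on which r1 and r2 disagree, or None."""
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            w = "".join(chars)
            if (re.fullmatch(r1, w) is None) != (re.fullmatch(r2, w) is None):
                return w
    return None

print(first_difference(r"(a|b)*", r"(a*b*)*", "ab", 8))  # None (no difference found)
print(first_difference(r"a*b*", r"(a|b)*", "ab", 8))     # ba
```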

Concept of Language Generated by Regular Expressions: The set of all strings generated by a regular expression is the language of the regular expression. In general, a language may be (countably) infinite. A string in a language is often called a token.
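In a scanner, each token class is the language of one regular expression. The sketch below is a hypothetical lexer built with Python's `re` named groups; the token class names and the whitespace-skipping rule are assumptions, and the patterns are borrowed from the identifier and natural-number examples in these notes. It tries the alternatives in order at each position, a simplification of the longest-match rule many real scanners use.

```python
import re

# Hypothetical token classes; each is the language of one regular expression.
TOKEN_SPEC = [
    ("NUMBER", r"[1-9][0-9]*"),            # (1 U … U 9)(0 U … U 9)*
    ("IDENT", r"[A-Za-z][A-Za-z0-9_]*"),   # letter, then letters, digits, or _
    ("SKIP", r"[ \t]+"),                   # whitespace between tokens
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Split text into (token class, lexeme) pairs, dropping whitespace."""
    pos, tokens = 0, []
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("count 42 x_1"))
```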

Examples of Languages and Regular Expressions

Σ = {0, 1, .} (here “.” is a symbol of Σ, not the concatenation operator):
- (0 U 1)*.(0 U 1)*: binary floating-point numbers
- (00)*: even-length all-zero strings
- 1*(01*01*)*: strings with an even number of zeros

Σ = {A, …, Z, a, …, z, 0, …, 9, _}:
- (A U … U z)(A U … U z U 0 U … U 9 U _)*: identifiers
- (1 U … U 9)(0 U … U 9)*: natural numbers (no negatives; no leading zeros, so 0 itself is excluded)
- (0 U 1 U 2)*: trinary (base 3) numbers
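The same examples can be written in Python `re` syntax, as sketched below; the constant names are hypothetical. One pitfall worth noting: in `re`, an unescaped `.` matches any character, so the literal point of the floating-point example must be written `\.`. The loop spot-checks the even-number-of-zeros expression against a direct count.

```python
import itertools
import re

FLOAT = r"[01]*\.[01]*"            # (0 U 1)*.(0 U 1)*, with '.' escaped
EVEN_LENGTH_ZEROS = r"(00)*"       # (00)*
EVEN_NUMBER_OF_ZEROS = r"1*(01*01*)*"
IDENTIFIER = r"[A-Za-z][A-Za-z0-9_]*"
NATURAL = r"[1-9][0-9]*"           # excludes 0, as the expression in the notes does
TRINARY = r"[012]*"

def is_in(pattern, w):
    """True if the whole string w is in the language of `pattern`."""
    return re.fullmatch(pattern, w) is not None

for n in range(9):
    for w in map("".join, itertools.product("01", repeat=n)):
        assert is_in(EVEN_NUMBER_OF_ZEROS, w) == (w.count("0") % 2 == 0)
```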

Finite-State Automata: An automaton consists of an alphabet Σ; a set of states, with an initial state and accepting states; and transitions between states, labeled with symbol(s).
Figure (not reproduced): an automaton accepting (0 | 1)*.(0|1)*
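Since the figure is lost, here is one possible complete DFA for (0 | 1)*.(0|1)*, sketched in Python. The state names `int`, `frac`, and `dead` are hypothetical; the dead state absorbs a second point, so every (state, symbol) pair has a transition. The loop compares the automaton with Python's `re` on all strings over {0, 1, .} up to length 5.

```python
import itertools
import re

# "int": before the point, "frac": after the point (accepting), "dead": a second point was read.
START = "int"
FAVORABLE = {"frac"}
DELTA = {
    ("int", "0"): "int", ("int", "1"): "int", ("int", "."): "frac",
    ("frac", "0"): "frac", ("frac", "1"): "frac", ("frac", "."): "dead",
    ("dead", "0"): "dead", ("dead", "1"): "dead", ("dead", "."): "dead",
}

def accepts(word):
    """Run the DFA over the alphabet {0, 1, .}."""
    state = START
    for ch in word:
        state = DELTA[(state, ch)]
    return state in FAVORABLE

for n in range(6):
    for w in map("".join, itertools.product("01.", repeat=n)):
        assert accepts(w) == (re.fullmatch(r"[01]*\.[01]*", w) is not None)
```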