Regular expressions Module 04.3 COP4020 – Programming Language Concepts Dr. Manuel E. Bermudez.

Topics
Define Regular Expressions
Conversion from Right-Linear Grammar to Regular Expression

Regular expressions
A compact, easy-to-read language description.
Use operators to denote the language constructors described earlier, to build complex languages from simple atomic ones.

Regular expressions
Definition: A regular expression over an alphabet Σ is recursively defined as follows:
ø denotes the language ø.
ε denotes the language {ε}.
a denotes the language {a}, for all a ∈ Σ.
(P + Q) denotes L(P) ∪ L(Q), where P, Q are r.e.'s.
(PQ) denotes L(P)·L(Q), where P, Q are r.e.'s.
P* denotes L(P)*, where P is an r.e.
To prevent excessive parentheses, we assume left associativity and the following operator precedence: * (highest), · , + (lowest).
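The recursive definition can be read almost directly as a program. The following is a minimal Python sketch, not part of the original slides: it assumes a hypothetical tuple encoding of regular expressions (`"empty"`, `"eps"`, `"sym"`, `"union"`, `"cat"`, `"star"`) and computes only the strings of length at most `n`, since L(P)* is infinite in general.

```python
def lang(e, n):
    """Strings of length <= n in the language denoted by the r.e. `e`.

    `e` is a nested tuple (a hypothetical encoding chosen for this sketch):
    ("empty",), ("eps",), ("sym", a), ("union", P, Q), ("cat", P, Q), ("star", P).
    """
    kind = e[0]
    if kind == "empty":                      # ø denotes ø
        return set()
    if kind == "eps":                        # ε denotes {ε}
        return {""}
    if kind == "sym":                        # a denotes {a}
        return {e[1]} if len(e[1]) <= n else set()
    if kind == "union":                      # (P + Q) denotes L(P) ∪ L(Q)
        return lang(e[1], n) | lang(e[2], n)
    if kind == "cat":                        # (PQ) denotes L(P)·L(Q)
        return {x + y for x in lang(e[1], n) for y in lang(e[2], n)
                if len(x + y) <= n}
    if kind == "star":                       # P* denotes L(P)*
        base = lang(e[1], n)
        result, frontier = {""}, {""}
        while frontier:                      # append one more factor per round
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node kind: {kind}")


# (0 + 1)*1, written out in the tuple encoding
ends_with_one = ("cat",
                 ("star", ("union", ("sym", "0"), ("sym", "1"))),
                 ("sym", "1"))
print(sorted(lang(ends_with_one, 2)))  # → ['01', '1', '11']
```

The length bound is only there to make the star case terminate; the definition itself places no such limit.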

Regular expressions
Examples:
(0 + 1)*: any string of 0's and 1's.
(0 + 1)*1: any string of 0's and 1's, ending with a 1.
1*01*: any string of 1's with a single 0 inserted.
Letter (Letter + Digit)*: an identifier.
Digit Digit*: an integer.
Quote Char* Quote: a string. †
# Char* Eoln: a comment. †
{Char*}: another comment. †
† Assuming that Char does not contain quotes, eoln's, or } .
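Several of these examples can be tried with Python's standard `re` module. This is a rough sketch under stated assumptions: Python writes union as `|` rather than `+`, and `Letter` and `Digit` are taken here to mean ASCII letters and decimal digits, which the slides do not specify. The dictionary keys and the `matches` helper are names invented for the illustration.

```python
import re

# Slide notation on the left, an assumed Python equivalent on the right.
patterns = {
    "binary": r"(0|1)*",                      # (0 + 1)*
    "ends_with_1": r"(0|1)*1",                # (0 + 1)*1
    "single_0": r"1*01*",                     # 1*01*
    "identifier": r"[A-Za-z][A-Za-z0-9]*",    # Letter (Letter + Digit)*
    "integer": r"[0-9][0-9]*",                # Digit Digit*
}


def matches(name, text):
    """True if the whole of `text` belongs to the named language."""
    return re.fullmatch(patterns[name], text) is not None


print(matches("ends_with_1", "0101"))   # → True
print(matches("single_0", "1001"))      # → False (two 0's)
print(matches("identifier", "27x"))     # → False (starts with a digit)
```

`re.fullmatch` is used rather than `re.search` because a regular expression in the formal sense describes whole strings, not substrings.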

Regular expressions
Additional Regular Expression Operators:
a+ = aa* (one or more a's)
a? = a + ε (one or zero a's, i.e. a is optional)
a list b = a (b a)* (a list of a's, separated by b's)
Examples:
Syntax for a function call: Name '(' Expression list ',' ')'
Identifier:
Floating-point constant:
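Each additional operator is shorthand for a combination of the basic ones, which a small sketch can make concrete. The helpers below (`plus`, `opt`, `list_of`) are hypothetical names that build Python `re` patterns by expanding the definitions literally. `NAME` and `EXPR` are simplified stand-ins assumed for the illustration; a real Expression is not a regular language.

```python
import re


def plus(a):
    """a+ = aa*"""
    return f"(?:{a})(?:{a})*"


def opt(a):
    """a? = a + ε (an empty alternative stands for ε)"""
    return f"(?:{a}|)"


def list_of(a, b):
    """a list b = a (b a)*"""
    return f"(?:{a})(?:(?:{b})(?:{a}))*"


# Assumed stand-ins, not taken from the slides.
NAME = r"[A-Za-z][A-Za-z0-9]*"
EXPR = r"[A-Za-z0-9]+"

# Name '(' Expression list ',' ')'
CALL = NAME + r"\(" + list_of(EXPR, ",") + r"\)"

print(re.fullmatch(CALL, "f(x,1,y)") is not None)  # → True
print(re.fullmatch(CALL, "f(x,)") is not None)     # → False (trailing comma)
print(re.fullmatch(CALL, "f()") is not None)       # → False (a list has at least one item)
```

Note that `a list b` never matches an empty list, so a call with no arguments would need `opt(list_of(...))`.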

Regular expressions
Conversion from Right-linear grammars to regular expressions
Grammar:
S → aS
  → bR
  → ε
R → aS
S → aS means L(S) ⊇ {a}·L(S)
S → bR means L(S) ⊇ {b}·L(R)
S → ε means L(S) ⊇ {ε}
Together, they mean that L(S) = {a}·L(S) + {b}·L(R) + {ε}, or S = aS + bR + ε.
Similarly, R → aS means L(R) = {a}·L(S), or R = aS.
Thus, we have a system of simultaneous equations:
S = aS + bR + ε
R = aS
The variables are the nonterminals.

Regular expressions
Solving a system of simultaneous equations.
S = aS + bR + ε
R = aS
Back substitute R = aS:
S = aS + baS + ε
S = (a + ba) S + ε
What to do with equations of the form X = αX + β?

Regular expressions
Equations of the form: X = αX + β
β ∈ L(X), so αβ ∈ L(X), ααβ ∈ L(X), αααβ ∈ L(X), …
Therefore, L(X) = α*β.
In our case,
S = (a + ba) S + ε
S = (a + ba)* ε
S = (a + ba)*
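The membership argument above can also be seen by substituting the equation into itself repeatedly. The derivation below is a sketch of that standard unrolling, not taken from the slides; it uses only the distributive laws of catenation over +.

```latex
\begin{align*}
X &= \alpha X + \beta \\
  &= \alpha(\alpha X + \beta) + \beta
   = \alpha^{2} X + \alpha\beta + \beta \\
  &= \alpha^{3} X + \alpha^{2}\beta + \alpha\beta + \beta \\
  &\;\;\vdots \\
X &= (\varepsilon + \alpha + \alpha^{2} + \alpha^{3} + \cdots)\,\beta
   = \alpha^{*}\beta
\end{align*}
```

This result is commonly known as Arden's rule. When ε is not in L(α), α*β is the only solution of the equation; otherwise it is the smallest one.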

Regular expressions
Conversion from Right-linear grammars to regular expressions
1. Set up equations: A = α1 + α2 + … + αn if
A → α1
  → α2
  . . .
  → αn

Regular expressions
2. If an equation is of the form X = α, and X does not appear in α, then replace every occurrence of X with α in all other equations, and delete equation X = α.
3. If an equation is of the form X = αX + β, and X does not occur in α or β, then replace the equation with X = α*β.
Note: Some algebraic manipulations may be needed to obtain the form X = αX + β.
Important: Catenation is not commutative!
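The substitution and star rules can be mechanized. The following is a minimal Python sketch under several assumptions, not the procedure as the slides present it: each equation is stored as a hypothetical pair (coefficients, constant) meaning X = Σ coefficient·Y + constant, expressions are Python `re` pattern strings, `""` stands for ε, and `None` stands for ø. Nonterminals other than the start symbol are eliminated one at a time, applying the star rule first whenever X appears on its own right-hand side.

```python
import re


def union(p, q):
    """P + Q, where None stands for the empty language ø."""
    if p is None:
        return q
    if q is None:
        return p
    return p if p == q else f"(?:{p}|{q})"


def concat(p, q):
    """PQ. Catenation with ø gives ø; the operand order is preserved."""
    if p is None or q is None:
        return None
    return p + q


def star(p):
    """P*. Both ø* and ε* denote {ε}, written here as ''."""
    return "" if not p else f"(?:{p})*"


def solve(equations, start):
    """Return a pattern for `start`, or None if its language is empty."""
    eqs = {x: (dict(c), k) for x, (c, k) in equations.items()}
    order = [x for x in eqs if x != start] + [start]
    for x in order:
        coeffs, const = eqs.pop(x)
        if x in coeffs:                       # X = αX + β  becomes  X = α*β
            loop = star(coeffs.pop(x))
            coeffs = {y: concat(loop, c) for y, c in coeffs.items()}
            const = concat(loop, const)
        if x == start:                        # every other variable is gone
            return const
        for z, (zc, zk) in list(eqs.items()): # substitute X everywhere else
            if x not in zc:
                continue
            g = zc.pop(x)
            for y, c in coeffs.items():
                zc[y] = union(zc.get(y), concat(g, c))
            eqs[z] = (zc, union(zk, concat(g, const)))


# S = aS + bR + ε,  R = aS
system = {
    "S": ({"S": "a", "R": "b"}, ""),
    "R": ({"S": "a"}, None),
}
pattern = solve(system, "S")                  # a pattern equivalent to (a + ba)*
print(re.fullmatch(pattern, "abaa") is not None)  # → True
print(re.fullmatch(pattern, "bb") is not None)    # → False
```

Coefficients are always placed to the left of what they multiply, which is where the warning about catenation not being commutative matters. The resulting pattern is correct but not simplified.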

Regular expressions
Example:
S → a        R → abaU      U → aS
  → bU         → U           → b
  → bR
Equations:
S = a + bU + bR
R = abaU + U = (aba + ε) U
U = aS + b
Back substitute R:
S = a + bU + b(aba + ε) U

Regular expressions
S = a + bU + b(aba + ε) U
U = aS + b
Back substitute U:
S = a + b(aS + b) + b(aba + ε)(aS + b)
  = a + baS + bb + babaaS + babab + baS + bb
  = a + baS + bb + babaaS + babab      (the repeated terms baS and bb are dropped, since P + P = P)
  = (ba + babaa) S + (a + bb + babab)
and therefore
S = (ba + babaa)*(a + bb + babab)
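A result like this can be sanity-checked by brute force. The sketch below, which is an illustration rather than part of the slides, encodes the example grammar (S → a | bU | bR, R → abaU | U, U → aS | b) as three mutually recursive recognizers and compares them with the derived expression on every string over {a, b} up to a chosen length. The function names and the `agree` helper are invented for the example.

```python
import re
from itertools import product


def S(s):
    # S → a | bU | bR
    return s == "a" or (s.startswith("b") and (U(s[1:]) or R(s[1:])))


def R(s):
    # R → abaU | U
    return (s.startswith("aba") and U(s[3:])) or U(s)


def U(s):
    # U → aS | b
    return (s.startswith("a") and S(s[1:])) or s == "b"


# (ba + babaa)*(a + bb + babab), with + written as | for Python
RESULT = r"(ba|babaa)*(a|bb|babab)"


def agree(max_len):
    """True if grammar and expression accept the same strings up to max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            s = "".join(letters)
            if S(s) != (re.fullmatch(RESULT, s) is not None):
                return False
    return True


print(agree(10))  # → True
```

Agreement on short strings is evidence, not proof; the algebraic derivation is what establishes the equality.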

Regular expressions
Summarizing: [diagram of conversions among RGR, RGL, RE, NFA, DFA, and Minimum DFA; legend: Done, Coming Up …]