Specifying Languages Our aim is to be able to specify languages for use in the computer. The sketch of an FSA is easy for us to understand, but difficult.

Slides:



Advertisements
Similar presentations
Specifying Languages Our aim is to be able to specify languages for use in the computer. The sketch of the FSA is easy for us to understand, but difficult.
Advertisements

Specifying Languages Our aim is to be able to specify languages for use in the computer. The sketch of an FSA is easy for us to understand, but difficult.
22C:19 Discrete Structures Induction and Recursion Fall 2014 Sukumar Ghosh.
Deterministic Finite Automata (DFA)
COMP-421 Compiler Design Presented by Dr Ioanna Dionysiou.
Copyright © Cengage Learning. All rights reserved. CHAPTER 5 SEQUENCES, MATHEMATICAL INDUCTION, AND RECURSION SEQUENCES, MATHEMATICAL INDUCTION, AND RECURSION.
Recursive Definitions and Structural Induction
Induction and Recursion. Odd Powers Are Odd Fact: If m is odd and n is odd, then nm is odd. Proposition: for an odd number m, m k is odd for all non-negative.
Induction and recursion
Finite Automata Great Theoretical Ideas In Computer Science Anupam Gupta Danny Sleator CS Fall 2010 Lecture 20Oct 28, 2010Carnegie Mellon University.
1 Introduction to Computability Theory Lecture3: Regular Expressions Prof. Amos Israeli.
CSE115/ENGR160 Discrete Mathematics 04/03/12 Ming-Hsuan Yang UC Merced 1.
Regular Languages Sequential Machine Theory Prof. K. J. Hintz Department of Electrical and Computer Engineering Lecture 3 Comments, additions and modifications.
Regular Languages Sequential Machine Theory Prof. K. J. Hintz Department of Electrical and Computer Engineering Lecture 3 Comments, additions and modifications.
CSE115/ENGR160 Discrete Mathematics 03/31/11
CS5371 Theory of Computation Lecture 4: Automata Theory II (DFA = NFA, Regular Language)
Lecture 1 String and Language. String string is a finite sequence of symbols. For example, string ( s, t, r, i, n, g) CS4384 ( C, S, 4, 3, 8) (1,
Great Theoretical Ideas in Computer Science.
Induction and recursion
Formal Language Finite set of alphabets Σ: e.g., {0, 1}, {a, b, c}, { ‘{‘, ‘}’ } Language L is a subset of strings on Σ, e.g., {00, 110, 01} a finite language,
Copyright © Cengage Learning. All rights reserved.
Discrete Mathematics CS 2610 March 26, 2009 Skip: structural induction generalized induction Skip section 4.5.
Chapter 2 Languages.
Induction and recursion
Chapter 4: Induction and Recursion
Section 5.3. Section Summary Recursively Defined Functions Recursively Defined Sets and Structures Structural Induction.
1 Chapter 1 Automata: the Methods & the Madness Angkor Wat, Cambodia.
Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Machine-independent code improvement Target code generation Machine-specific.
Introduction to CS Theory Lecture 3 – Regular Languages Piotr Faliszewski
1 Regular Expressions. 2 Regular expressions describe regular languages Example: describes the language.
1 Chapter 1 Introduction to the Theory of Computation.
Sequences – Page 1CSCI 1900 – Discrete Structures CSCI 1900 Discrete Structures Sequences Reading: Kolman, Section 1.3.
Lexical Analysis I Specifying Tokens Lecture 2 CS 4318/5531 Spring 2010 Apan Qasem Texas State University *some slides adopted from Cooper and Torczon.
Great Theoretical Ideas in Computer Science.
Chapter 5: Sequences, Mathematical Induction, and Recursion 5.9 General Recursive Definitions and Structural Induction 1 Erickson.
Review: Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Intermediate code generator Code optimizer Code generator Symbol.
Regular Expressions and Languages A regular expression is a notation to represent languages, i.e. a set of strings, where the set is either finite or contains.
Theory of Computation, Feodor F. Dragan, Kent State University 1 TheoryofComputation Spring, 2015 (Feodor F. Dragan) Department of Computer Science Kent.
ICS 253: Discrete Structures I Induction and Recursion King Fahd University of Petroleum & Minerals Information & Computer Science Department.
Overview of Previous Lesson(s) Over View  Symbol tables are data structures that are used by compilers to hold information about source-program constructs.
CS 103 Discrete Structures Lecture 13 Induction and Recursion (1)
CS 203: Introduction to Formal Languages and Automata
Chapter 3 Regular Expressions, Nondeterminism, and Kleene’s Theorem Copyright © 2011 The McGraw-Hill Companies, Inc. Permission required for reproduction.
Recognising Languages We will tackle the problem of defining languages by considering how we could recognise them. Problem: Is there a method of recognising.
Deterministic Finite Automata COMPSCI 102 Lecture 2.
CS 208: Computing Theory Assoc. Prof. Dr. Brahim Hnich Faculty of Computer Sciences Izmir University of Economics.
Mathematical Induction Section 5.1. Climbing an Infinite Ladder Suppose we have an infinite ladder: 1.We can reach the first rung of the ladder. 2.If.
Classifications LanguageGrammarAutomaton Regular, right- linear Right-linear, left-linear DFA, NFA Context-free PDA Context- sensitive LBA Recursively.
November 2003Computational Morphology III1 CSA405: Advanced Topics in NLP Xerox Notation.
Discrete Structures ICS252 Chapter 5 Lecture 2. Languages and Grammars prepared By sabiha begum.
CompSci 102 Discrete Math for Computer Science March 13, 2012 Prof. Rodger Slides modified from Rosen.
Great Theoretical Ideas in Computer Science for Some.
Chapter 5 Finite Automata Finite State Automata n Capable of recognizing numerous symbol patterns, the class of regular languages n Suitable for.
Finite Automata Great Theoretical Ideas In Computer Science Victor Adamchik Danny Sleator CS Spring 2010 Lecture 20Mar 30, 2010Carnegie Mellon.
 2004 SDU Lecture4 Regular Expressions.  2004 SDU 2 Regular expressions A third way to view regular languages. Say that R is a regular expression if.
Chapter 5 With Question/Answer Animations 1. Chapter Summary Mathematical Induction - Sec 5.1 Strong Induction and Well-Ordering - Sec 5.2 Lecture 18.
Lecture 02: Theory of Automata:2014 Asif Nawaz Theory of Automata.
Lecture 03: Theory of Automata:2014 Asif Nawaz Theory of Automata.
Unix RE’s Text Processing Lexical Analysis.   RE’s appear in many systems, often private software that needs a simple language to describe sequences.
Section Recursion 2  Recursion – defining an object (or function, algorithm, etc.) in terms of itself.  Recursion can be used to define sequences.
 2004 SDU Uniquely Decodable Code 1.Related Notions 2.Determining UDC 3.Kraft Inequality.
Introduction to Automata Theory
CSE 105 theory of computation
CSE 105 theory of computation
Introduction to Automata Theory
Review: Compiler Phases:
Specifying Languages Our aim is to be able to specify languages for use in the computer. The sketch of the FSA is easy for us to understand, but difficult.
Compiler Construction
CSE 105 theory of computation
CSE 105 theory of computation
Presentation transcript:

Specifying Languages Our aim is to be able to specify languages for use in the computer. The sketch of an FSA is easy for us to understand, but difficult to input to a computer. The formal description is unpleasant, lengthy, and difficult to write. What we need is a simpler method of specifying languages - something we can write down in a line of text. Something like 0(0+1)*... We call the Regular Expressions

Remember Propositional Logic Some strings of symbols are well-formed formulas of Propositional Logic; others are not So, Propositional Logic is a formal language The strings in the language have a meaning (i.e., semantics) All of these things will also be true of the language of Regular Expressions (Added complication: the purpose of a Regular Expressions is: to define a formal language)

Syntax Definition of Propositional Logic (CS2010) General recursive definition for “statements”: –All proposition letters (P, Q, R, S, etc.) are statements –If α and β are statements then (not α) is a statement (α or β) is a statement (α and β) is a statement –Nothing else is a statement not, or, and are called connectives 3 CS102 2

Formula induction: a trivial example Theorem: every statement contains at least 1 proposition letter Proof by formula induction: Base step: Is it true that the theorem holds for all statements with 0 connectives? Induction step: If the theorem holds for all statements with n connectives, does it follow that it holds for all statements with n+1?

Formula induction: a trivial example Theorem: every statement contains at least 1 proposition letter Proof by formula induction: Base step: Is it true that the theorem holds for all statements with 0 connectives? YES, because they contain 1 Induction step: If the theorem holds for all statements with n connectives, does it follow that it holds for all statements with n+1? YES, because connectives are added, never removed So: theorem holds for any finite number of connectives

Formula induction: 2 nd example Theorem: every statement contains an even number of brackets Proof by (a slightly different!) formula induction: Base step: Is it true that the theorem holds for all statements with 0 connectives? Yes, 0 is even. Induction step: If the theorem holds for all statements with fewer than n connectives, does it follow that it holds for all statements with n connectives? Yes, for proof see next page. It follows that the theorem holds for statements containing any number of connectives (i.e., for all statements)

Formula induction: 2 nd example Proof of the induction step: Suppose the theorem holds for all statements with fewer than n connectives. Now consider any statement with n connectives. Call this statement A. Following the syntax definition, A can have 3 forms: –If A = (not B). Then B has an even number, say k, of brackets (because B contains fewer than n connectives!) hence A has k+2, which is even. –If A = (B or C). Then B has even number, say k, of brackets and C has even number, say m, brackets, hence A has k+m+2 brackets, which is even. –If A = (B and C). Then reason analogously.

Back to Regular Expressions (REs): –syntax and semantics of the language of REs –a proof (with formula induction) about all REs

Regular Expressions (i) denotes { },  denotes {}, t denotes {t} for t  T; (ii) if r and s are regular expressions denoting languages R and S, then (r + s) denoting R + S, (rs) denoting RS, and (r*) denoting R* are regular expressions; (iii) nothing else is a regular expression over T. Let T be an alphabet. A regular expression over T defines a language over T as follows: A language L (over T) is a regular language (over T) if there is a regular expression denoting it. Note: we can omit most of the brackets assuming this precedence rule: * > concatenation > +. For example, we can write ((a*) (b + (c + d))) as a* (b + c + d)

Regular Languages: getting started What language is denoted by a* (b + c + d)

Regular Languages: getting started What language is denoted by a* (b + c + d) A sequence of 0 or more a, followed by b or c or d

Regular Languages: getting started What language is denoted by a* (b + c + d) Please enumerate the first 10 elements of the language using lexical ordering

Regular Languages: getting started What language is denoted by a* (b + c + d) Enumerate the first 10 elements of the language using lexical ordering b,c,d,ab,ac,ad,aab,aac,aad,aaab

Regular Languages: getting started What language is denoted by a* (b + c + d) Enumerate the first 10 elements of the language using lexical ordering b,c,d,ab,ac,ad,aab,aac,aad,aaab With dictionary ordering?

Regular Languages: getting started What language is denoted by a* (b + c + d) Enumerate the first 10 elements of the language using lexical ordering b,c,d,ab,ac,ad,aab,aac,aad,aaab With dictionary ordering? ab, aab, aaab, aaaab, aaaaab, aaaaaab, aaaaaaab, aaaaaaaab, aaaaaaaaab, aaaaaaaaaab

Regular Languages Examples: (i) If a ε T then aT* denotes the language consisting of all strings over T starting with a. (ii) (0+1)(0+1)* denotes...

Regular Languages Examples: (i) If a ε T then aT* denotes the language consisting of all strings over T starting with a. (ii) (0+1)(0+1)* denotes the set of all (nonempty) bitstrings.

Regular Languages Examples: (i) If a ε T then aT* denotes the language consisting of all strings over T starting with a. (ii) (iii) (1+01)*( +0) denotes...

Regular Languages Examples: (i) If a ε T then aT* denotes the language consisting of all strings over T starting with a. (ii) (iii) (1+01)*( +0) denotes the set of all bitstrings not containing two adjacent 0's

Regular Languages Write as a regular expression: 1.The language of strings that contain one a and any number (including 0) of b’s. 2.The language of bitstrings that have 00 as a substring 3.The language of bitstrings having either 00 or 11 as substring.

Regular Languages Write as a regular expression: 1.The language of strings that contain one a and any number (including 0) of b’s: b*ab*

Regular Languages Write as a regular expression: 1.The language of strings that contain one a and any number (including 0) of b’s: b*ab* 2.The language of bitstrings that have 00 as a substring: (0+1)* 00 (0+1)*

Regular Languages Write as a regular expression: 1.The language of strings that contain one a and any number (including 0) of b’s: b*ab* 2.The language of bitstrings that have 00 as a substring: (0+1)* 00 (0+1)* 3.The language of bitstrings having either 00 or 11 as substring: (0+1)* (00+11) (0+1)*

Regular Languages Write as a regular expression: 4. The language of bitstrings that contain exactly one 1 and exactly two The language of bitstrings that contain the same number of 0 and 1

Regular Languages Write as a regular expression: 4. The language of bitstrings that contain exactly one 1 and exactly two

Regular Languages Write as a regular expression: 4. The language of bitstrings that contain exactly one 1 and exactly two The language of bitstrings that contain the same number of 0 and 1 No solution!

Applications of Regular Expressions 1. Searching for strings of characters in UNIX in ex and other editors, and using grep and egrep. Note: ex and grep use restricted versions of regular expressions. Example: the ex command /a*[abc]/ means find any line containing a substring starting with any number of a's followed by an a, b, or c. 2. Lexical analysis, the initial phase in compiling, divides the source program into "tokens". The definition of how to divide up the source code is given by regular expressions.

Regular Languages Theorem: (proof is trivial) If A and B are regular languages, then so are: A + B, AB and A* Additional notations: if T is an alphabet, then T also denotes the regular language consisting of strings over T of length 1. t n denotes ttt...t n times.

Properties of Regular Languages Theorem: If A and B are regular languages, so are A  B and A'. Theorem: Any finite language is regular.

Various additional notations, e.g.: If r and s are regular expressions denoting languages R and S, then (r + ) denoting R + is a regular expression If T is an alphabet, then T also denotes the regular language consisting of strings over T of length 1. t n denotes ttt...t n times. These additions do not extend the expressive power of regular expressions and will not be considered when we prove things …