CMSC 330 Exercise: Write a Ruby function that takes an array of names in “Last, First Middle” format and returns the same list in “First Middle Last” format.

Slides:



Advertisements
Similar presentations
Chapter 6 Languages: finite state machines
Advertisements

Regular Expressions Finite State Automaton. Programming Languages2 Regular expressions  Terminology on Formal languages: –alphabet : a finite set of.
YES-NO machines Finite State Automata as language recognizers.
1 Introduction to Computability Theory Lecture3: Regular Expressions Prof. Amos Israeli.
Finite Automata Great Theoretical Ideas In Computer Science Anupam Gupta Danny Sleator CS Fall 2010 Lecture 20Oct 28, 2010Carnegie Mellon University.
1 Introduction to Computability Theory Lecture4: Regular Expressions Prof. Amos Israeli.
1 Introduction to Computability Theory Lecture3: Regular Expressions Prof. Amos Israeli.
Lexical Analysis III Recognizing Tokens Lecture 4 CS 4318/5331 Apan Qasem Texas State University Spring 2015.
CS5371 Theory of Computation
Regular Languages Sequential Machine Theory Prof. K. J. Hintz Department of Electrical and Computer Engineering Lecture 3 Comments, additions and modifications.
Regular Languages Sequential Machine Theory Prof. K. J. Hintz Department of Electrical and Computer Engineering Lecture 3 Comments, additions and modifications.
Fall 2005 CSE 467/567 1 Formal languages regular expressions regular languages finite state machines.
CS5371 Theory of Computation Lecture 6: Automata Theory IV (Regular Expression = NFA = DFA)
Regular expressions Sipser 1.3 (pages 63-76). CS 311 Fall Looks familiar…
CS5371 Theory of Computation Lecture 4: Automata Theory II (DFA = NFA, Regular Language)
Topics Automata Theory Grammars and Languages Complexities
1 Regular Expressions/Languages Regular languages –Inductive definitions –Regular expressions syntax semantics Not covered in lecture.
Regular Languages A language is regular over  if it can be built from ;, {  }, and { a } for every a 2 , using operators union ( [ ), concatenation.
Theory Of Automata By Dr. MM Alam
Chapter 2 Languages.
Regular Expressions. Notation to specify a language –Declarative –Sort of like a programming language. Fundamental in some languages like perl and applications.
CSC312 Automata Theory Lecture # 2 Languages.
Lecture Two: Formal Languages Formal Languages, Lecture 2, slide 1 Amjad Ali.
Introduction to Theory of Automata
Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Machine-independent code improvement Target code generation Machine-specific.
CMSC 330: Organization of Programming Languages Theory of Regular Expressions.
Introduction to CS Theory Lecture 3 – Regular Languages Piotr Faliszewski
1 Language Definitions Lecture # 2. Defining Languages The languages can be defined in different ways, such as Descriptive definition, Recursive definition,
1 INFO 2950 Prof. Carla Gomes Module Modeling Computation: Language Recognition Rosen, Chapter 12.4.
1 Chapter 1 Introduction to the Theory of Computation.
Grammars CPSC 5135.
COMP 3438 – Part II - Lecture 2: Lexical Analysis (I) Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ. 1.
Lexical Analysis I Specifying Tokens Lecture 2 CS 4318/5531 Spring 2010 Apan Qasem Texas State University *some slides adopted from Cooper and Torczon.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 2 Mälardalen University 2006.
L ECTURE 3 Chapter 4 Regular Expressions. I MPORTANT T ERMS Regular Expressions Regular Languages Finite Representations.
COMP313A Programming Languages Lexical Analysis. Lecture Outline Lexical Analysis The language of Lexical Analysis Regular Expressions.
1 Module 14 Regular languages –Inductive definitions –Regular expressions syntax semantics.
CMSC 330: Organization of Programming Languages Context-Free Grammars.
1 Module 31 Closure Properties for CFL’s –Kleene Closure –Union –Concatenation CFL’s versus regular languages –regular languages subset of CFL.
MA/CSSE 474 Theory of Computation Decision Problems DFSMs.
Review: Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Intermediate code generator Code optimizer Code generator Symbol.
MA/CSSE 474 Theory of Computation Regular Expressions Intro.
Strings and Languages CS 130: Theory of Computation HMU textbook, Chapter 1 (Sec 1.5)
Specifying Languages Our aim is to be able to specify languages for use in the computer. The sketch of an FSA is easy for us to understand, but difficult.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 2 Mälardalen University 2010.
CMSC 330: Organization of Programming Languages Theory of Regular Expressions Finite Automata.
CSC312 Automata Theory Lecture # 3 Languages-II. Formal Language A formal language is a set of words—that is, strings of symbols drawn from a common alphabet.
CS 203: Introduction to Formal Languages and Automata
Recursive Definations Regular Expressions Ch # 4 by Cohen
CS 208: Computing Theory Assoc. Prof. Dr. Brahim Hnich Faculty of Computer Sciences Izmir University of Economics.
UNIT - I Formal Language and Regular Expressions: Languages Definition regular expressions Regular sets identity rules. Finite Automata: DFA NFA NFA with.
November 2003Computational Morphology III1 CSA405: Advanced Topics in NLP Xerox Notation.
Strings and Languages Denning, Section 2.7. Alphabet An alphabet V is a finite nonempty set of symbols. Each symbol is a non- divisible or atomic object.
Mathematical Foundations of Computer Science Chapter 3: Regular Languages and Regular Grammars.
Transparency No. 2-1 Formal Language and Automata Theory Homework 2.
Finite Automata Great Theoretical Ideas In Computer Science Victor Adamchik Danny Sleator CS Spring 2010 Lecture 20Mar 30, 2010Carnegie Mellon.
CS 154 Formal Languages and Computability February 11 Class Meeting Department of Computer Science San Jose State University Spring 2016 Instructor: Ron.
 2004 SDU Lecture4 Regular Expressions.  2004 SDU 2 Regular expressions A third way to view regular languages. Say that R is a regular expression if.
Week 13 - Wednesday.  What did we talk about last time?  Exam 3  Before review:  Graphing functions  Rules for manipulating asymptotic bounds  Computing.
1 Strings and Languages Lecture 2-3 Ref. Handout p12-17.
Set, Alphabets, Strings, and Languages. The regular languages. Clouser properties of regular sets. Finite State Automata. Types of Finite State Automata.
Topic 3: Automata Theory 1. OutlineOutline Finite state machine, Regular expressions, DFA, NDFA, and their equivalence, Grammars and Chomsky hierarchy.
Deterministic Finite-State Machine (or Deterministic Finite Automaton) A DFA is a 5-tuple, (S, Σ, T, s, A), consisting of: S: a finite set of states Σ:
Theory of Computation Lecture #
Formal Language & Automata Theory
LANGUAGES Prepared by: Paridah Samsuri Dept. of Software Engineering
Assume that p, q, and r are in
Review: Compiler Phases:
Chapter 1 Introduction to the Theory of Computation
Languages Fall 2018.
Presentation transcript:

CMSC 330 Exercise: Write a Ruby function that takes an array of names in “Last, First Middle” format and returns the same list in “First Middle Last” format. CHALLENGE: Can you do it in a single line?

CMSC 330: Organization of Programming Languages Theory of Regular Expressions

CMSC 3303 Switching gears That’s it for the basics of Ruby –If you need other material for your projects, come to office hours or check out the documentation Next up: How do regular expressions work? –Mixture of a very practical tool (string matching) and some nice theory –Stephen Cole Kleene and Ken Thompson (1950’s) –A great computer science result

CMSC 3304 Thinking about Regular Expressions (REs) What does a regular expression represent? –Just a (possibly large) set of strings What are the basic components of REs? –Is there a minimal set of constructs? –E.g., we saw that e+ is the same as ee* How are REs implemented? –We’ll see how to build a structure to parse REs First, some definitions...

CMSC 3305 Definition: Alphabet An alphabet is a finite set of symbols –Usually denoted Σ Example alphabets: –Binary: –Decimal: –Alphanumeric: –Greek: Σ = {0,1} Σ = {0,1,2,3,4,5,6,7,8,9} Σ = {0-9,a-z,A-Z} Σ = {α,β,γ,δ,ε,ζ,η,θ,ι,κ,λ,μ,ν, ξ,ο,π,ρ,σ,ς,τ,υ,φ,χ,ψ,ω}

CMSC 3306 Definition: String A string is a finite sequence of symbols from Σ –ε is the empty string –|s| is the length of string s |Hello| = 5, |ε| = 0 –Note: Ø is the empty set (with 0 elements) Ø = { } Ø ≠ { ε }

CMSC 3307 Definition: Concatenation Concatenation is indicated by juxtaposition –If s 1 = super and s 2 = hero, then s 1 s 2 = superhero –Sometimes also written s 1 ·s 2 –For any string s, we have sε = εs = s –You can concatenate strings from different alphabets, then the new alphabet is the union of the originals: If s 1 = super  Σ 1 = {s,u,p,e,r} and s 2 = hero  Σ 2 = {h,e,r,o}, then s 1 s 2 = superhero  Σ 3 = {e,h,o,p,r,s,u}

CMSC 3308 Definition: Language A language is a set of strings over an alphabet Example: The set of all strings over Σ –Often written Σ* Example: The set of phone numbers over the alphabet Σ = {0, 1, 2, 3, 4, 5, 6, 7, 9, (, ), -} –Give an example element of this language –Are all strings over the alphabet in the language? –Is there a Ruby regular expression for this language? /\(\d{3,3}\)\d{3,3}-\d{4,4}/ No (123)

CMSC 3309 Languages (cont’d) Example: The set of strings of length 0 over the alphabet Σ = {a, b, c} –{s | s ∊ Σ* and |s| = 0} = {ε} ≠ Ø Example: The set of all valid Ruby programs –Is there a regular expression for this language? Can REs represent all possible languages? –The answer turns out to be no! –The languages represented by regular expressions are called, appropriately, the regular languages No. Matching (an arbitrary number of) brackets so that they are balanced is impossible. { { { … } } }

CMSC Operations on Languages Let Σ be an alphabet and let L, L 1, L 2 be languages over Σ Concatenation L 1 L 2 is defined as –L 1 L 2 = {xy | x ∊ L 1 and y ∊ L 2 } –Example: L 1 = {“hi”, “bye”}, L 2 = {“1”, “2”} L 1 L 2 = {“hi1”, “hi2”, “bye1”, “bye2”} Union is defined as –L 1 ∪ L 2 = { x | x ∊ L 1 or x ∊ L 2 } –Example: L 1 = {“hi”, “bye”}, L 2 = {“1”, “2”} L 1 ∪ L 2 = {“hi”, “bye”, “1”, “2”}

CMSC Operations on Languages (cont’d) Define L n inductively as –L 0 = {ε} –L n = LL n-1 for n > 0 In other words, –L 1 = LL 0 = L{ε} = L –L 2 = LL 1 = LL –L 3 = LL 2 = LLL –...

CMSC Examples of L n Let L = {a, b, c} Then –L 0 = {ε} –L 1 = {a, b, c} –L 2 = {aa, ab, ac, ba, bb, bc, ca, cb, cc}

Operations on Languages (cont’d) Kleene Closure: L* –Includes all elements in the languages L 0, L 1, L 2, etc. Example: Let L = {a} –Then, L* = {ε,a,aa,aaa,…} CMSC 330

Definition of Regexps Given an alphabet Σ, the regular expressions of Σ are defined inductively as: CMSC 330 regular expressiondenotes language ØØ ε{ε} each element σ ∊ Σ {σ}

Definition of Regexps (cont’d) Let A and B be regular expressions denoting languages L A and L B, respectively: There are no other regular expressions over Σ We use ()’s as needed for grouping CMSC 330 regular expressiondenotes language ABLALBLALB (A|B)L A U L B A*LA*LA*

CMSC Regexp Precedence Operation precedence (high to low): –Kleene closure: "*" –Concatenation –Union: "|" –Grouping: "(" and ")"

CMSC The Language Denoted by an RE For a regular expression e, we will write [[e]] to mean the language denoted by e –[[a]] = {a} –[[(a|b)]] = {a, b} If s ∊  [[re]], we say that re accepts, describes, or recognizes s.

CMSC Regular Languages The languages that can be described using regular expressions are the regular languages or regular sets Not all languages are regular –Examples (without proof): The set of palindromes over Σ –reads the same backward or forward {a n b n | n > 0 } (a n = sequence of n a’s) Almost all programming languages are not regular –But aspects of them sometimes are (e.g., identifiers) –Regular expressions are commonly used in parsing tools

CMSC Ruby Regular Expressions Almost all of the features we’ve seen for Ruby REs can be reduced to this formal definition –/Ruby/ – concatenation of single-character REs –/(Ruby|Regular)/ – union –/(Ruby)*/ – Kleene closure –/(Ruby)+/ – same as (Ruby)(Ruby)* –/(Ruby)?/ – same as (ε|(Ruby)) (// is ε) –/[a-z]/ – same as (a|b|c|...|z) –/ [^0-9]/ – same as (a|b|c|...) for a,b,c,... ∈ Σ - {0..9} –^, $ – correspond to extra characters in alphabet

Many Types of Regular Expressions POSIX Basic POSIX Extended Perl Ruby The basic theory is the same, and most constructs can be reduced to combinations of grouping and the three operations: 1.Concatenation 2.Union 3.Closure CMSC 330

21 Example 1 All strings over Σ = {a, b, c} such that all the a’s are first, the b’s are next, and the c’s last –Example: aaabbbbccc but not abcb Regexp: –This is a valid regexp because: a is a regexp ([[a]] = {a}) a* is a regexp ([[a*]] = {ε, a, aa,...}) Similarly for b* and c* So a*b*c* is a regular expression (Remember that we need to check this way because regular expressions are defined inductively.) a*b*c*

CMSC Which Strings Does a*b*c* Recognize? aabbbcc Yes; aa ∊  [[a*]], bbb ∊  [[b*]], and cc ∊  [[c*]], so entire string is in [[a*b*c*]] abb Yes, abb = abbε, and ε ∊ [[c*]] ac Yes ε aacbc No abcd No -- outside the language

CMSC Example 2 All strings over Σ = {a, b, c} Regexp: Other regular expressions for the same language? –(c|b|a)* –(a*|b*|c*)* –(a*b*c*)* –((a|b|c)*|abc) –etc. (a|b|c)*

CMSC Example 3 All whole numbers containing the substring 330 Regular expression: What if we want to get rid of leading 0’s? ( (1|...|9)(0|1|...|9)*330(0|1|...|9)* | 330(0|1|...|9)* ) Challenge: What about all whole numbers not containing the substring 330? (0|1|...|9)*330(0|1|...|9)*

CMSC Example 4 What language does this regular expression recognize? –( (1|ε)(0|1|...|9) | (2(0|1|2|3)) ) : (0|1|...|5)(0|1|...|9) All valid times written in 24-hour format –10:17 –23:59 –0:45 –8:30

CMSC Two More Examples (000|00|1)* –Any string of 0's and 1's with no single 0’s (00|0000)* –Strings with an even number of 0’s –Notice that some strings can be accepted more than one way = 00·00·00 = 00·0000 = 0000·00 –How else could we express this language? (00)* (00|000000)* (00|0000|000000)* etc…

CMSC Practice Give the regular expressions for the following languages: All valid DNA strings (including only ACGT and appearing in multiples of 3) All binary strings containing an even length (2 or greater) substring of all 1’s All binary strings containing exactly two 1’s All binary strings that start and end with the same number