Languages and Strings Chapter 2. (1) Lexical analysis: Scan the program and break it up into variable names, numbers, etc. (2) Parsing: Create a tree.

Slides:



Advertisements
Similar presentations
CS3012 Formal Languages and Compilers Frank Guerin Room 227 Lectures Monday11:00MT3 Tuesday9:00105 St. Marys Tutorials Thursday13:00 Thursday 14:00definitely.
Advertisements

Formal Languages Languages: English, Spanish,... PASCAL, C,... Problem: How do we define a language? i.e. what sentences belong to a language? e.g.Large.
COMP-421 Compiler Design Presented by Dr Ioanna Dionysiou.
1 Languages. 2 A language is a set of strings String: A sequence of letters Examples: “cat”, “dog”, “house”, … Defined over an alphabet: Languages.
Regular Languages Sequential Machine Theory Prof. K. J. Hintz Department of Electrical and Computer Engineering Lecture 3 Comments, additions and modifications.
CS 310 – Fall 2006 Pacific University CS310 Strings, String Operators, and Languages Sections: August 30, 2006.
Regular Languages Sequential Machine Theory Prof. K. J. Hintz Department of Electrical and Computer Engineering Lecture 3 Comments, additions and modifications.
1 Languages and Finite Automata or how to talk to machines...
Normal forms for Context-Free Grammars
CS 3240 – Chuck Allison.  A model of computation  A very simple, manual computer (we draw pictures!)  Our machines: automata  1) Finite automata (“finite-state.
Topics Automata Theory Grammars and Languages Complexities
1 Regular Expressions/Languages Regular languages –Inductive definitions –Regular expressions syntax semantics Not covered in lecture.
Discrete Mathematics CS 2610 March 26, 2009 Skip: structural induction generalized induction Skip section 4.5.
Lexical Analysis CSE 340 – Principles of Programming Languages Fall 2015 Adam Doupé Arizona State University
CMPS 3223 Theory of Computation Automata, Computability, & Complexity by Elaine Rich ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Slides provided.
CS/IT 138 THEORY OF COMPUTATION Chapter 1 Introduction to the Theory of Computation.
CSC312 Automata Theory Lecture # 2 Languages.
CSC312 Automata Theory Lecture # 2 Languages.
1 Chapter 1 Automata: the Methods & the Madness Angkor Wat, Cambodia.
Formal Methods in SE Theory of Automata Qasiar Javaid Assistant Professor Lecture # 06.
Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Machine-independent code improvement Target code generation Machine-specific.
1 CSSE 350 Automata, Formal Languages, and Computability.
Two examples English-Words English-Sentences alphabet S ={a,b,c,d,…}
Introduction to CS Theory Lecture 3 – Regular Languages Piotr Faliszewski
1 Regular Expressions. 2 Regular expressions describe regular languages Example: describes the language.
MA/CSSE 474 Theory of Computation Functions, Closure, Decision Problems.
1 Chapter 1 Introduction to the Theory of Computation.
1 Languages. 2 A language is a set of strings String: A sequence of letters Examples: “cat”, “dog”, “house”, … Defined over an alphabet:
1 Exercise: Prove that the set S = { π : π is a permutation of {1, 2, 3, …, n} for some integer n ≥ 1 } is countable.
1 Module 14 Regular languages –Inductive definitions –Regular expressions syntax semantics.
Review: Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Intermediate code generator Code optimizer Code generator Symbol.
MA/CSSE 474 Theory of Computation Regular Expressions Intro.
Regular Expressions and Languages A regular expression is a notation to represent languages, i.e. a set of strings, where the set is either finite or contains.
Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2.
Strings and Languages CS 130: Theory of Computation HMU textbook, Chapter 1 (Sec 1.5)
Chapter 4 Pumping Lemma Properties of Regular Languages Decidable questions on Regular Languages.
Why Study the Theory of Computation? Implementations come and go. Chapter 1.
Regular Expressions Chapter 6. Regular Languages Regular Language Regular Expression Finite State Machine Recognizes or Accepts Generates.
Overview of Previous Lesson(s) Over View  Symbol tables are data structures that are used by compilers to hold information about source-program constructs.
CS 203: Introduction to Formal Languages and Automata
Recursive Definations Regular Expressions Ch # 4 by Cohen
Languages and Strings Chapter 2.
Formal Languages and Grammars
Lecture 2 Overview Topics What I forgot from last lecture Proof techniques continued Alphabets, strings, languages Automata June 2, 2015 CSCE 355 Foundations.
Strings and Languages Denning, Section 2.7. Alphabet An alphabet V is a finite nonempty set of symbols. Each symbol is a non- divisible or atomic object.
1 Strings and Languages Lecture 2-3 Ref. Handout p12-17.
Lecture 02: Theory of Automata:2014 Asif Nawaz Theory of Automata.
Set, Alphabets, Strings, and Languages. The regular languages. Clouser properties of regular sets. Finite State Automata. Types of Finite State Automata.
1 Let S = { π : π is a permutation of {1, 2, 3, …, n} for some integer n ≥ 1 }. (a) List the elements of S for n= 1, 2, and 3. (b) Prove that the set S.
MA/CSSE 474 Theory of Computation How many regular/non-regular languages are there? Closure properties of Regular Languages (if there is time) Pumping.
MA/CSSE 474 Theory of Computation Languages, prefixes, sets, cardinality, functions.
Week 14 - Friday.  What did we talk about last time?  Simplifying FSAs  Quotient automata.
Languages and Strings Chapter 2. (1) Lexical analysis: Scan the program and break it up into variable names, numbers, etc. (2) Parsing: Create a tree.
MA/CSSE 474 Theory of Computation Regular Expressions Intro.
Languages.
Lexical Analysis CSE 340 – Principles of Programming Languages
Languages Prof. Busch - LSU.
Languages Costas Busch - LSU.
Formal Language & Automata Theory
Theory of Computation Theory of computation is mainly concerned with the study of how problems can be solved using algorithms.  Therefore, we can infer.
Languages, prefixes, sets, cardinality, functions
CMPS 3223 Theory of Computation
Assume that p, q, and r are in
Review: Compiler Phases:
Languages, prefixes, sets, cardinality, functions
Chapter 1 Introduction to the Theory of Computation
CSC312 Automata Theory Lecture # 2 Languages.
MA/CSSE 474 Theory of Computation
Languages Fall 2018.
CSCE 355 Foundations of Computation
Presentation transcript:

Languages and Strings Chapter 2

(1) Lexical analysis: Scan the program and break it up into variable names, numbers, etc. (2) Parsing: Create a tree that corresponds to the sequence of operations that should be executed, e.g., / (3) Optimization: Realize that we can skip the first assignment since the value is never used and that we can precompute the arithmetic expression, since it contains only constants. (4) Termination: Decide whether the program is guaranteed to halt. (5) Interpretation: Figure out what (if anything) useful it does. Let's Look at Some Problems int alpha, beta; alpha = 3; beta = (2 + 5) / 10;

A Framework for Analyzing Problems We need a single framework in which we can analyze a very diverse set of problems. The framework we will use is Language Recognition A language is a (possibly infinite) set of finite length strings over a finite alphabet.

Strings A string is a finite sequence, possibly empty, of symbols drawn from some alphabet .  is the empty string.  * is the set of all possible strings over an alphabet . Alphabet nameAlphabet symbolsExample strings The English alphabet { a, b, c, …, z } , aabbcg, aaaaa The binary alphabet { 0, 1 } , 0, A star alphabet{ ,, , , ,  } ,,  A music alphabet { , , , , , , } , 

Functions on Strings Counting: |s| is the number of symbols in s. |  | = 0 | | = 7 # c (s) is the number of times that c occurs in s. # a ( abbaaa ) = 4.

More Functions on Strings Concatenation: st is the concatenation of s and t. If x = good and y = bye, then xy = goodbye. Note that |xy| = |x| + |y|.  is the identity for concatenation of strings. So:  x (x  =  x = x). Concatenation is associative. So:  s, t, w ((st)w = s(tw)).

More Functions on Strings Replication: For each string w and each natural number i, the string w i is: w 0 =  w i+1 = w i w Examples: a 3 = aaa ( bye ) 2 = byebye a 0 b 3 = bbb

More Functions on Strings Reverse: For each string w, w R is defined as: if |w| = 0 then w R = w =  if |w|  1 then:  a   (  u   * (w = ua)). So define w R = a u R.

Concatenation and Reverse of Strings Theorem: If w and x are strings, then (w x) R = x R w R. Example: ( nametag ) R = ( tag ) R ( name ) R = gateman

Concatenation and Reverse of Strings Proof: By induction on |x|: |x| = 0: Then x = , and (wx) R = (w  ) R = (w) R =  w R =  R w R = x R w R.  n  0 (((|x| = n)  ((w x) R = x R w R ))  ((|x| = n + 1)  ((w x) R = x R w R ))): Consider any string x, where |x| = n + 1. Then x = u a for some character a and |u| = n. So: (w x) R = (w (u a)) R rewrite x as ua = ((w u) a) R associativity of concatenation = a (w u) R definition of reversal = a (u R w R )induction hypothesis = (a u R ) w R associativity of concatenation = (ua) R w R definition of reversal = x R w R rewrite ua as x

Relations on Strings aaa is a substring of aaabbbaaa aaaaaa is not a substring of aaabbbaaa aaa is a proper substring of aaabbbaaa Every string is a substring of itself.  is a substring of every string.

The Prefix Relations s is a prefix of t iff:  x   * (t = sx). s is a proper prefix of t iff: s is a prefix of t and s  t. Examples: The prefixes of abba are: , a, ab, abb, abba. The proper prefixes of abba are: , a, ab, abb. Every string is a prefix of itself.  is a prefix of every string.

The Suffix Relations s is a suffix of t iff:  x   * (t = xs). s is a proper suffix of t iff: s is a suffix of t and s  t. Examples: The suffixes of abba are: , a, ba, bba, abba. The proper suffixes of abba are: , a, ba, bba. Every string is a suffix of itself.  is a suffix of every string.

Defining a Language A language is a (finite or infinite) set of strings over a finite alphabet . Examples: Let  = { a, b } Some languages over  : , {  }, { a, b }, { , a, aa, aaa, aaaa, aaaaa } The language  * contains an infinite number of strings, including: , a, b, ab, ababaa.

Example Language Definitions L = {x  { a, b }* : all a ’s precede all b ’s} , a, aa, aabbb, and bb are in L. aba, ba, and abc are not in L. What about: , a, aa, and bb ?

Example Language Definitions L = {x :  y  { a, b }* : x = y a } Simple English description:

More Example Language Definitions L = {} =  L = {  }

English not Well-Defined L = {w: w is a sentence in English}. Examples: Kerry hit the ball. Colorless green ideas sleep furiously. The window needs fixed. Ball the Stacy hit blue.

A Halting Problem Language L = {w: w is a C program that halts on all inputs}. Well specified. Useful

Languages Are Sets Computational definition: Generator (enumerator) Recognizer

Enumeration Enumeration: Arbitrary order More useful: lexicographic order Shortest first Within a length, dictionary order The lexicographic enumeration of: {w  { a, b }* : |w| is even} :

How Large is a Language? The smallest language over any  is , with cardinality 0. The largest is  *. How big is it?

How Large is a Language? Theorem: If    then  * is countably infinite. Proof: The elements of  * can be lexicographically enumerated by the following procedure: Enumerate all strings of length 0, then length 1, then length 2, and so forth. Within the strings of a given length, enumerate them in dictionary order. This enumeration is infinite since there is no longest string in  *. Since there exists an infinite enumeration of  *, it is countably infinite.

How Large is a Language? So the smallest language has cardinality 0. The largest is countably infinite. So every language is either finite or countably infinite.

How Many Languages Are There? Theorem: If    then the set of languages over  is uncountably infinite. Proof: The set of languages defined on  is P (  *).  * is countably infinite. If S is a countably infinite set, P (S) is uncountably infinite. So P (  *) is uncountably infinite.

Functions on Languages Set operations Union Intersection Complement Language operations Concatenation Kleene star

Concatenation of Languages If L 1 and L 2 are languages over  : L 1 L 2 = {w   * :  s  L 1 (  t  L 2 (w = st))} Examples: L 1 = { cat, dog } L 2 = { apple, pear } L 1 L 2 ={ catapple, catpear, dogapple, dogpear }

Concatenation of Languages {  } is the identity for concatenation: L{  } = {  }L = L  is a zero for concatenation: L  =  L = 

Kleene Star Kleene operator or Kleene closure L* = {  }  {w   * :  k  1 (  w 1, w 2, … w k  L (w = w 1 w 2 … w k ))} In other words, L* is the set of strings that can be formed by concatenating together zero of more strings in L Is closed under the concatenation Example: L = { dog, cat, fish } L* = { , dog, cat, fish, dogdog, dogcat, fishcatfish, fishdogdogfishcat, …} If L = , L* = ? If L = {  }, L* = ?

Stephen Cole Kleene 1909 – 1994, mathematical logician One of many distinguished students (e.g., Alan Turing) of Alonzo Church (lambda calculus) at Princeton. Best known as a founder of the branch of mathematical logic known as recursion theory. Also invented regular expressions. Kleene pronounced his last name KLAY-nee. `kli:ni and `kli:n are common mispronunciations. His son, Ken Kleene, wrote: "As far as I am aware this pronunciation is incorrect in all known languages. I believe that this novel pronunciation was invented by my father. " Kleeneness is next to Godelness Cleanliness is next to Godliness A pic of pic by me

The + Operator Sometimes useful to require at least one element of L be selected L + = L L* L + = L* - {  } iff   L L + is the closure of L under concatenation. recall closer of L needs to contain L

Concatenation and Reverse of Languages Theorem: (L 1 L 2 ) R = L 2 R L 1 R. Proof:  x (  y ((xy) R = y R x R )) Theorem 2.1 (L 1 L 2 ) R = {(xy) R : x  L 1 and y  L 2 }Definition of concatenation of languages = {y R x R : x  L 1 and y  L 2 }Lines 1 and 2 = L 2 R L 1 R Definition of concatenation of languages

What About Meaning? A n B n = { a n b n : n  0}. Do these strings mean anything? But some languages are useful because thery strings have meanings. English, Chinese, Java, C++, Perl Need a precise way to map each string to its meaning (semantics).

Semantic Interpretation Functions A semantic interpretation function assigns meanings to the strings of a language. Language is mostly infinite, no way to map each string Must define a function that knows the meanings of basic units and can compose those meanings according to some fixed set of rules, to build meanings for larger expressions Compositional semantic interpretation function The semantic interpretation function for English is arguably compositional. I gave him a fizding I’m going to give him a piece of my mind

Semantic Interpretation Functions When we define a formal language for a specific purpose, we design it so that there exists a compositional semantic interpretation function For example, C++ and Java One significant property of semantic interpretation functions for useful languages is that they are generally not one-to-one “Chocolate, please”, “I’d like chocolate”, “I’ll have chocolate” … “x++”, “x = x + 1”