Context Free Grammars.

Slides:



Advertisements
Similar presentations
So far... A language is a set of strings over an alphabet. We have defined languages by: (i) regular expressions (ii) finite state automata Both (i) and.
Advertisements

C O N T E X T - F R E E LANGUAGES ( use a grammar to describe a language) 1.
CS5371 Theory of Computation
Foundations of (Theoretical) Computer Science Chapter 2 Lecture Notes (Section 2.1: Context-Free Grammars) David Martin With some.
79 Regular Expression Regular expressions over an alphabet  are defined recursively as follows. (1) Ø, which denotes the empty set, is a regular expression.
Normal forms for Context-Free Grammars
Transparency No. P2C1-1 Formal Language and Automata Theory Part II Pushdown Automata and Context-Free Languages.
Specifying Languages CS 480/680 – Comparative Languages.
COP4020 Programming Languages
Context-Free Grammars Chapter 3. 2 Context-Free Grammars and Languages n Defn A context-free grammar is a quadruple (V, , P, S), where  V is.
Lecture 21: Languages and Grammars. Natural Language vs. Formal Language.
Chapter 4 Context-Free Languages Copyright © 2011 The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 1.
Theory Of Automata By Dr. MM Alam
Formal Grammars Denning, Sections 3.3 to 3.6. Formal Grammar, Defined A formal grammar G is a four-tuple G = (N,T,P,  ), where N is a finite nonempty.
Lecture 16 Oct 18 Context-Free Languages (CFL) - basic definitions Examples.
::ICS 804:: Theory of Computation - Ibrahim Otieno SCI/ICT Building Rm. G15.
CS/IT 138 THEORY OF COMPUTATION Chapter 1 Introduction to the Theory of Computation.
Context-free Grammars Example : S   Shortened notation : S  aSaS   | aSa | bSb S  bSb Which strings can be generated from S ? [Section 6.1]
A sentence (S) is composed of a noun phrase (NP) and a verb phrase (VP). A noun phrase may be composed of a determiner (D/DET) and a noun (N). A noun phrase.
Languages, Grammars, and Regular Expressions Chuck Cusack Based partly on Chapter 11 of “Discrete Mathematics and its Applications,” 5 th edition, by Kenneth.
Lecture # 19. Example Consider the following CFG ∑ = {a, b} Consider the following CFG ∑ = {a, b} 1. S  aSa | bSb | a | b | Λ The above CFG generates.
Chapter 5 Context-Free Grammars
Grammars CPSC 5135.
PART I: overview material
Languages & Grammars. Grammars  A set of rules which govern the structure of a language Fritz Fritz The dog The dog ate ate left left.
So far... A language is a set of strings over an alphabet. We have defined languages by: (i) regular expressions (ii) finite state automata Both (i) and.
Lecture # 5 Pumping Lemma & Grammar
Complexity and Computability Theory I Lecture #9 Instructor: Rina Zviel-Girshin Lea Epstein.
Regular Grammars Chapter 7. Regular Grammars A regular grammar G is a quadruple (V, , R, S), where: ● V is the rule alphabet, which contains nonterminals.
CMSC 330: Organization of Programming Languages Context-Free Grammars.
Parsing Introduction Syntactic Analysis I. Parsing Introduction 2 The Role of the Parser The Syntactic Analyzer, or Parser, is the heart of the front.
CMSC 330: Organization of Programming Languages
Lecture 11 Theory of AUTOMATA
1 A well-parenthesized string is a string with the same number of (‘s as )’s which has the property that every prefix of the string has at least as many.
CS 3240 – Chapter 5. LanguageMachineGrammar RegularFinite AutomatonRegular Expression, Regular Grammar Context-FreePushdown AutomatonContext-Free Grammar.
CS 208: Computing Theory Assoc. Prof. Dr. Brahim Hnich Faculty of Computer Sciences Izmir University of Economics.
1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.
Re-enter Chomsky More about grammars. 2 Parse trees S  A B A  aA | a B  bB | b Consider L = { a m b n | m, n > 0 } (one/more a ’s followed by one/more.
Grammars Hopcroft, Motawi, Ullman, Chap 5. Grammars Describes underlying rules (syntax) of programming languages Compilers (parsers) are based on such.
Grammars CS 130: Theory of Computation HMU textbook, Chap 5.
Unit-3 Parsing Theory (Syntax Analyzer) PREPARED BY: PROF. HARISH I RATHOD COMPUTER ENGINEERING DEPARTMENT GUJARAT POWER ENGINEERING & RESEARCH INSTITUTE.
1 Chapter 6 Simplification of CFGs and Normal Forms.
Grammars A grammar is a 4-tuple G = (V, T, P, S) where 1)V is a set of nonterminal symbols (also called variables or syntactic categories) 2)T is a finite.
Introduction Finite Automata accept all regular languages and only regular languages Even very simple languages are non regular (  = {a,b}): - {a n b.
Context-Free Languages
CSC312 Automata Theory Lecture # 26 Chapter # 12 by Cohen Context Free Grammars.
Lecture 16: Modeling Computation Languages and Grammars Finite-State Machines with Output Finite-State Machines with No Output.
Lecture 04: Theory of Automata:08 Transition Graphs.
Mathematical Foundations of Computer Science Chapter 3: Regular Languages and Regular Grammars.
Transparency No. 1 Formal Language and Automata Theory Homework 5.
Chomsky Normal Form.
1 A well-parenthesized string is a string with the same number of (‘s as )’s which has the property that every prefix of the string has at least as many.
Compiler Construction Lecture Five: Parsing - Part Two CSC 2103: Compiler Construction Lecture Five: Parsing - Part Two Joyce Nakatumba-Nabende 1.
Lecture 02: Theory of Automata:2014 Asif Nawaz Theory of Automata.
Lecture 03: Theory of Automata:2014 Asif Nawaz Theory of Automata.
Theory of Computation Automata Theory Dr. Ayman Srour.
Syntax Analysis By Noor Dhia Syntax analysis:- Syntax analysis or parsing is the most important phase of a compiler. The syntax analyzer considers.
Recap lecture 31 Context Free Grammar, Terminals, non- terminals, productions, CFG, context Free language, examples.
Context Free Grammars & Parsing CPSC 388 Fall 2001 Ellen Walker Hiram College.
Lecture 17: Theory of Automata:2014 Context Free Grammars.
Context-Free Grammars: an overview
Formal Language & Automata Theory
Even-Even Devise a grammar that generates strings with even number of a’s and even number of b’s.
Theory Of Automata By Dr. MM Alam
CHAPTER 2 Context-Free Languages
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Theory of Computation Lecture #
Teori Bahasa dan Automata Lecture 9: Contex-Free Grammars
Recap lecture 30 Deciding whether two languages are equivalent or not, example, deciding whether an FA accept any string or not, method 3, examples, finiteness.
Context Free Grammars-II
Presentation transcript:

Context Free Grammars

Context Free Grammars Three fundamental areas covered in the book are 1. Theory of Automata 2. Theory of Formal Languages 3. Theory of Turing Machines We have completed the first area. We begin exploring the second area in this chapter.

Chapter topics Syntax As a Method for Defining Languages Symbolism for Generative Grammars Trees Lukasiewicz Notation Ambiguity The Total Language Tree

Syntax as a Method for Defining Languages In Chapter 3 we recursively defined the set of valid arithmetic expressions as follows: Rule 1: Any number is in the set AE. Rule 2: If x and y are in AE, then so are (x), -(x), (x + y), (x - y), (x * y), (x/y), (x ** y) where ** is our notation for exponentiation Note that we use parentheses around every component factor to avoid ambiguity expressions such as 3 + 4 - 5 and 8/4/2. There is a different way for defining the set AE: using a set of substitution rules similar to the grammatical rules.

Defining AE by substitution rules Start → AE AE → (AE + AE) AE → (AE - AE) AE → (AE * AE) AE → (AE/AE) AE → (AE ** AE) AE → (AE) AE → -(AE) AE → AnyNumber

Example We will show that ((3 + 4) * (6 + 7)) is in AE Start  AE  (AE * AE) ((AE + AE) * (AE + AE)) ((3 + 4) * (6 + 7))

Definition of Terms A word that cannot be replaced by anything is called terminal. – In the above example, the terminals are the phrase AnyNumber, and the symbols + - * / ** ( ) A word that must be replaced by other things is called nonterminal. – The nonterminals are Start and AE. The sequence of applications of the rules that produces the finished string of terminals from the starting symbol is called a derivation or a generation of the word. The grammatical rules are referred to as productions.

Symbolism for Generative Grammars Definition: A context-free grammar (CFG) is a collection of three things: 1. An alphabet ∑ of letters called terminals from which we are going to make strings that will be the words of a language. 2. A set of symbols called nonterminals, one of which is the symbol S, standing for “start here”. 3. A finite set of productions of the form: One nonterminal → finite string of terminals and/or nonterminals where the strings of terminals and nonterminals can consist of only terminals, or of only nonterminals, or of any mixture of terminals and nonterminals, or even the empty string. We require that at least one production has the nonterminal S as its left side.

Definition: The language generated by a CFG is the set of all strings of terminals that can be produced from the start symbol S using the productions as substitutions. A language generated by a CFG is called a context-free language (CFL). Notes: The language generated by a CFG is also called the language defined by the CFG, or the language derived from the CFG, or the language produced by the CFG. We insist that nonterminals be designated by capital letters, whereas terminals are designated by lowercase letters and special symbols.

Example Let the only terminal be a and the productions be Prod1 S → aS Prod2 S → Λ If we apply Prod 1 six times and then apply Prod 2, we generate the following: S  aS  aaS  aaaS  aaaaS  aaaaaS  aaaaaaS  aaaaaaΛ = aaaaaa If we apply Prod2 without Prod1, we find that Λ is in the language generated by this CFG. Hence, this CFL is exactly a*. Note: the symbol “→” means “can be replaced by”, whereas the symbol “” means “can develop to”.

The word ab can be generated by the derivation S  aS  abS  abΛ = ab Let the terminals be a and b, the only nonterminal be S, and the productions be Prod1 S → aS Prod2 S → bS Prod3 S → Λ The word ab can be generated by the derivation S  aS  abS  abΛ = ab The word baab can be generated by S  bS  baS  baaS  baabS  baabΛ = baab Clearly, the language generated by the above CFG is (a + b)*. SaS SbS Sa Sb

Example Let the terminals be a and b, the the nonterminal be S and X, and the productions be Prod 1 S → XaaX Prod 2 X → aX Prod 3 X → bX Prod 4 X → Λ We already know from the previous example that the last three productions will generate any possible strings of a’s and b’s from the nonterminal X. Hence, the words generated from S have the form anything aa anything

Hence, the language produced by this CFG is (a + b)*aa(a + b)* which is the language of all words with a double a in them somewhere. For example, the word baabb can be generated by S  XaaX  bXaaX  baaX  baaX  baabX  baabbX  baabbΛ = baabb

Example Consider the CFG: S → aSb S → Λ It is easy to verify that the language generated by this CFG is the non-regular language {anbn}. For example, the word a4b4 is derived by S  aSb  aaSbb  aaaSbbb  aaaaSbbbb  aaaaΛbbbb = aaaabbbb

Derivation and some Symbols If v and w are strings of terminals and non-terminals denotes the derivation of w from v of length steps derivation of w from v in one or more steps derivation of w from v in zero or more steps of application of rules of grammar G.

Sentential Form A string w ε(n U ∑)* is a sentential form of G if there is a derivation A string w is a sentence of G if there is a derivation in G The language of G, denoted by L(G) is the set

Example It is not difficult to show that the following CFG generates the non-regular language {anban}: S → aSa S → b Can you show that the CFG below generates the language PALINDROME, another non-regular language? S → bSb S → a S → Λ

Disjunction Symbol | Let us introduce the symbol | to mean disjunction (or). We use this symbol to combine all the productions that have the same left side. For example, the CFG Prod 1 S → XaaX Prod 2 X → aX Prod 3 X → bX Prod 4 X → Λ can be written more compactly as Prod 1 S → XaaX Prod 2 X → aX|bX|Λ

Example SaSa | aBa BbB | b First production builds equal number of a’s on both sides and recursion is terminated by SaBa Recursion of BbB may add any number of b’s and terminates with Bb L(G) = {anbman n>0, m>0}

example L(G) = {anbmcmd2n | n>0, m>0} Consider relationship between leading a’s and trailing d’s. SaSdd In the middle equal number of b’s and c’s SA AbAc This middle recursion terminates by Abc. Grammar will be SaSdd | aAdd AbAc | bc

Example Consider another CFG SaSb | aSbb | Λ Language defined is L(G) = {anbm | 0 ≤ n≤ m≤ 2n}

Examples Let G S  abScB | Λ B  bB | b L(G) = {(ab)n(cbm)n |n≥0,m>0} G1: SAB AaA | a BbB | Λ And G2: S aS | aB Both generates a+b*

Example A grammar that generates the language consisting of even-length string over {a, b} S  aO | bO | Λ O  aS | bS S and O work as counters i.e. when an S is in a sentential form that marks even number of terminals have been generated Presence of O in a sentential form indicates that an odd number of terminals have been generated. The strategy can be generalized, say for string of length exactly divisible by 3 we need three counters to mark 0, 1, 2 S  aP | bP | Λ P  aQ | bQ Q  aS | bS

Even-Even Devise a grammar that generates strings with even number of a’s and even number of b’s We can use the strategy of previous example i.e. non-terminal represents the current states of the derived string The non-terminals with interpretations are: Non-Terminal Interpetation S Even a’s and Even b’s A Even a’s and Odd b’s B Odd a’s and Even b’s C Odd a’s and Odd b’s

Even-Even contd. The grammar will be S  aB | bA | Λ A  aC | bS B  aS | bC C  aA | bB When S is present, the derived string satisfies the Even-Even condition as S  Λ Construct a CFG over {a, b} generating all strings that do not contain abc?

Remarks We have seen that some regular languages can be generated by CFGs, and some non-regular languages can also be generated by CFGs. In Chapter 13, we will show that ALL regular languages can be generated by CFGs. In Chapter 16, we will see that there is some non-regular language that cannot be generated by any CFG. Thus, the set of languages generated by CFGs is properly larger than the set of regular languages, but properly smaller than the set of all possible languages.

Trees Consider the following CFG: S → AA A → AAA|bA|Ab|a The derivation of the word bbaaaab is as follows: S  AA  bAAAA  bbAaaAb  bbaaaab

We can use a tree diagram to show that derivation process: We start with the symbol S. Every time we use a production to replace a nonterminal by a string, we draw downward lines from the nonterminal to EACH character in the string. Reading from left to right produces the word bbaaaab. Tree diagrams are also called syntax trees, parse trees, generation trees, production trees, or derivation trees.

Lukasiewicz Notation Consider the following CFG for a simplified version of arithmetic expressions: S  S + S | S * S | number where the only nonterminal is S, and the terminals are number together with the symbols +,* . Obviously, the expression 3 + 4 * 5 is a word in the language defined by this CFG; however, it is ambiguous since it is not clear whether it means (3 + 4) * 5 (which is 35), or 3 + (4 * 5) (which is 23). To avoid ambiguity, we often need to use parentheses, or adopt the convention of “hierarchy of operators” (i.e., * is to be executed before +). We now present a new notation that is unambiguous but does not rely on operator hierarchy or on the use of parentheses.

New Notation: Lukasiewicz notation We can now construct a new notation for arithmetic expressions: – We walk around the tree and write down symbols, once each, as we encounter them. – We begin on the left side of the start symbol S and head south. – As we walk around the tree, we always keep our left hand on the tree.

Using the algorithm above, the first derivation tree is converted into the notation: + 3 * 4 5. The second derivation tree is converted into * + 3 4 5.

To evaluate a string in this new notation, we proceed as follows: The above strings are said to be in operator prefix notation because the operator is written in front of the operands it combines. We will show that this notation is unambiguous. Each of the above strings determines a unique calculation without the need for using parentheses or operator hierarchy. To evaluate a string in this new notation, we proceed as follows: – Read the string from left to right. – When we find the first substring of the form operator - operand - operand (or o-o-o for short), we replace these three symbols with the result of the indicated arithmetic calculation. – We then re-scan the string form the left. – We continue this process until there is only one number left, which is the value of the original expression.

Example Consider the expression: + 3 * 4 5:

Consider the second expression: * + 3 4 5:

Example Convert the following arithmetic expression into operator prefix notation: ((1 + 2) * (3 + 4) + 5) * 6. This normal notation is called operator infix notation, with which we need parentheses to avoid ambiguity. Let’s us draw the derivation tree:

Reading around the tree gives the equivalent prefix notation expression: * + * + 1 2 + 3 4 5 6.

Evaluate the String

This operator prefix notation was invented by Lukasiewicz (1878 - 1956) and is often called Polish notation. There is a similar operator postfix notation (also called Polish notation), in which the operation symbols (+, -, ...) come after the operands. This can be derived by tracing around the tree of the other side, keeping our right hand on the tree and then reversing the resultant string. Both these methods of notation are useful for computer science: Compilers often convert infix to prefix and then to assembler code.

Ambiguity- example Consider the language generated by the following CFG: S → AB A → a B → b There are two derivations of the word ab: S  AB  aB  ab or S  AB  Ab  ab

However, These two derivations correspond to the same syntax tree: The word ab is therefore not ambiguous. In general, when all the possible derivation trees are the same for a given word, then the word is unambiguous.

Ambiguity - Definition A CFG is called ambiguous if for at least one word in the language that it generates, there are two possible derivations of the word that correspond to different syntax trees. If a CFG is not ambiguous, it is called unambiguous.

Example The following CFG defines the language of all non-null strings of a’s: S → aS| Sa |a The word a3 can be generated by 4 different trees: