Context Free Grammars Chapter 12.


Compiler: A compiler is a program that converts high-level language code into its equivalent assembly language. Grammar: A grammar is a set of rules by which a valid sentence in a language is constructed. Parsing the sentence: Parsing is the process of analyzing a text, made of a sequence of tokens (e.g. words), to determine its grammatical structure with respect to a given formal grammar.

Semantics: The grammatical rules that involve the meaning of words are called semantics; e.g. in English, the sentence “Buildings sing” makes no sense. Syntactics: The grammatical rules that involve not the meaning of the words but their structure. Context Free Grammar (CFG): general definition A grammar or language based on rules that describe a change in the string without reference to elements not in the string. The concept of a CFG was introduced by the linguist Noam Chomsky in 1956.

CFG Terminology: Terminals: The symbols that cannot be replaced by anything are called terminals. Non-Terminals: The symbols that must be replaced by other things are called non-terminals. e.g. variable = expr; Derivation: The sequence of applications of the rules that produces the finished string of terminals from the starting symbol is called a derivation. Productions: The grammatical rules are often called productions.

Context Free Grammar (CFG): technical definition A CFG is a collection of three things: (1) An alphabet Σ of letters called terminals, from which strings or words of the language are formed. (2) A set of symbols called non-terminals, one of which is the symbol S, standing for “start here”. (3) A finite set of productions of the form: one non-terminal → finite string of terminals and/or non-terminals.

Context Free Grammars By definition a context-free grammar is a finite set of variables (also called non-terminals or syntactic categories - synonym for "variable") each of which represents a language. The languages represented by the variables are described recursively in terms of each other and primitive symbols called terminals. The rules relating the variables are called productions.

Context Free Grammars Example Continuous strings of a's: S → aS S → Λ Strings with at least one double letter: S → ADA A → aA A → bA A → Λ D → aa D → bb

Example S → aA | bX A → bA X → cX

Context Free Language (CFL): The language generated by a CFG is called a Context Free Language (CFL). Note: CFGs can generate all regular languages and some non-regular languages, but not all non-regular languages.

Context Free Grammars A context-free grammar is a collection of three things: (1) An alphabet Σ of letters called terminals, from which strings of the language are generated. (2) A set of symbols called nonterminals, one of which is a symbol S, termed the start symbol. (3) A finite set of productions (production rules) of the form: one nonterminal → finite string of terminals and/or nonterminals. The string of terminals and nonterminals can consist of only terminals, of only nonterminals, of any mixture of terminals and nonterminals, or even of the empty string. A CFG must have at least one production that has the nonterminal S on its left side.

Context Free Grammars Nonterminal / Variables / Syntactic category A symbol that can be substituted by some other symbol(s) Variable because the same non-terminal can have multiple substitutions Terminal A symbol that cannot be substituted further Letters from the alphabet set

Context Free Grammars Conventions for CFG Nonterminals are written in upper case letters Terminal symbols are written in lower case Terminal symbols are also called atomic symbols

Context Free Grammars Terminologies Generation or Derivation The sequence of applications of the rules that produces the finished string of terminals from the starting symbol is called a generation or a derivation of the word Production The grammatical rules are called productions

Context Free Languages The language generated by a CFG is the set of all strings of terminals that can be produced from the start symbol S using the productions as substitutions. A language generated by a CFG is called a Context Free Language (CFL)

Context Free Grammars Non terminals vs. terminals S → X S → Y X → Λ Y → aY Y → bY Y → a Y → b

Context Free Grammars S → XaaX X → aX X → bX X → Λ (a+b)* aa (a+b)*
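As a rough check of the claim that this grammar generates (a+b)*aa(a+b)*, the sketch below enumerates every word of bounded length by leftmost derivation and compares the result with a regular expression. The function name words_up_to, the single-character encoding (upper case for nonterminals, lower case for terminals, the empty string for Λ) and the length cap on working strings are assumptions made for illustration; they are not part of the slides.

```python
import re
from collections import deque
from itertools import product


def words_up_to(productions, start, max_len):
    """Return all terminal words of length <= max_len derivable from start.

    Upper-case letters are nonterminals, lower-case letters are terminals,
    and the empty string stands for the null production (Λ).
    """
    seen, words = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Find the leftmost nonterminal in the working string, if any.
        idx = next((i for i, c in enumerate(form) if c.isupper()), None)
        if idx is None:
            words.add(form)
            continue
        for rhs in productions[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(c.islower() for c in new)
            # Assumed pruning: drop forms with too many terminals, and cap the
            # overall length so that the search is guaranteed to stop.
            if terminals <= max_len and len(new) <= 2 * max_len + 2 and new not in seen:
                seen.add(new)
                queue.append(new)
    return words


grammar = {"S": ["XaaX"], "X": ["aX", "bX", ""]}
generated = words_up_to(grammar, "S", 5)
expected = {
    "".join(p)
    for n in range(6)
    for p in product("ab", repeat=n)
    if re.fullmatch(r"[ab]*aa[ab]*", "".join(p))
}
print(generated == expected)
```

The length cap is generous enough for this grammar, whose working strings never hold more than two nonterminals; a grammar with longer nullable runs would need a larger cap.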

CFG Examples All strings that do not end in ba All strings that contain the substring “bbb” All strings that start and end with different letters

CFG Which languages do these CFGs define S → abS S → ab S → aS S → bb

Context Free Grammars CFG for L = {a^n b^n : n = 0, 1, 2, 3, 4, …}: S → aSb S → Λ S → ab CFG for EQUAL: S → aB S → bA A → a A → aS A → bAA B → b B → bS B → aBB

Context Free Grammars CFG for EQUAL S → aB S → bA A → a A → aS A → bAA B → b B → bS B → aBB Can be compactly written as S → aB | bA A → a | aS | bAA B → b | bS | aBB or, with angle brackets, as <S> ::= a<B> | b<A> <A> ::= a | a<S> | b<A><A> <B> ::= b | b<S> | a<B><B>

Backus-Naur Form This format for writing a CFG is called Backus-Naur Form It is abbreviated as BNF Also called Backus Normal Form It consists of arrows to define productions, vertical bars to present choices (disjunction), and terminals and nonterminals to build a production

Variations in CFG Notations → or ::= for productions <> for nonterminals, or underline the nonterminals Symbol for null: Λ, λ, ε

Context Free Grammars CFG for identifier IDENTIFIER → ALPHA ALPHANUMERIC ALPHA → A | B | … | Z | a | b | c | … | z ALPHANUMERIC → ALPHA ALPHANUMERIC | NUMERIC ALPHANUMERIC | Λ NUMERIC → 0 | 1 | 2 | … | 9

Context Free Grammars CFG for arithmetic expressions <expression> → <expression> + <expression> <expression> → <expression> * <expression> <expression> → <expression> - <expression> <expression> → (<expression>) <expression> → <number>

Context Free Grammars Derivation or Generation S → abS | Λ S ⇒ abS ⇒ ababS ⇒ abababS ⇒ ababab
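The derivation on this slide can be replayed mechanically. The following is a minimal sketch, assuming the grammar S → abS | Λ; the function name derive_ab_repeated is hypothetical and simply records each working string.

```python
def derive_ab_repeated(n):
    """Return the working strings in the derivation of (ab)^n from S → abS | Λ."""
    steps = ["S"]
    for _ in range(n):
        # Apply S → abS to the single S at the right end of the working string.
        steps.append(steps[-1][:-1] + "abS")
    # Finish with S → Λ, which erases the remaining S.
    steps.append(steps[-1][:-1])
    return steps


print(" ⇒ ".join(derive_ab_repeated(3)))
```

Each entry but the last still contains the nonterminal S, so only the final entry is a word of the language.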

Parse Trees A tree format used for the derivation of a string from the CFG Parse tree, Syntax tree, Generation tree, Production tree, Derivation tree Start symbol of the CFG at root Non terminals are represented as nodes Terminals as leaves Every next level of tree is a derivation from a production of CFG The yield of a parse tree is a terminal string held at all the leaves

Parse Trees Examples S → abS | Λ Derivation of abababab [Figure: parse tree rooted at S, each level expanding S into a, b, S]

Derivation Leftmost Derivation: If a word w is generated by a CFG by a certain derivation, and at each step in the derivation a rule of production is applied to the leftmost nonterminal in the working string, then this derivation is called a leftmost derivation. Rightmost Derivation: Likewise, a derivation in which every rule is applied to the rightmost nonterminal in the working string is called a rightmost derivation.

Ambiguity A CFG is called ambiguous if, for at least one word in the language that it generates, there are two possible derivations of the word that correspond to different syntax trees. A CFG which is not ambiguous is called an unambiguous CFG.

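Ambiguous Grammars S → aS | Sa | a is ambiguous: the word aaa has more than one syntax tree. S → aS | a generates the same language unambiguously. [Figure: syntax trees for the derivation of aaa]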
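One way to see the difference between the two grammars for aaa is to count syntax trees. The sketch below is only an illustration: the function count_parse_trees and its single-character symbol encoding are assumptions, and it only handles grammars without Λ-productions or unit productions.

```python
from functools import lru_cache


def count_parse_trees(productions, start, word):
    """Count distinct parse trees of `word`; assumes no Λ or unit productions."""

    @lru_cache(maxsize=None)
    def trees(symbol, i, j):
        # Number of trees rooted at `symbol` whose yield is word[i:j].
        if symbol.islower():
            return 1 if word[i:j] == symbol else 0
        return sum(split(rhs, i, j) for rhs in productions[symbol])

    @lru_cache(maxsize=None)
    def split(rhs, i, j):
        # Number of ways the symbols of `rhs` can jointly yield word[i:j].
        if not rhs:
            return 1 if i == j else 0
        if len(rhs) == 1:
            return trees(rhs, i, j)
        # Every symbol yields at least one letter, so leave room for the rest.
        return sum(
            trees(rhs[0], i, k) * split(rhs[1:], k, j)
            for k in range(i + 1, j - len(rhs) + 2)
        )

    return trees(start, 0, len(word))


ambiguous = {"S": ["aS", "Sa", "a"]}
unambiguous = {"S": ["aS", "a"]}
print(count_parse_trees(ambiguous, "S", "aaa"))
print(count_parse_trees(unambiguous, "S", "aaa"))
```

A count above one for any word shows that the grammar is ambiguous; a count of one for a single word does not by itself prove the grammar unambiguous.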

Total Language Tree A tree with the start symbol at its root and whose nodes are working strings of terminals and nonterminals. The descendants of each node are all the possible results of applying every applicable production to the working string, one at a time. A string of all terminals is a terminal node in the tree.

Total Language Tree S → aa | bX | aXX X → ab | b
S
  aa
  bX
    bab
    bb
  aXX
    aabX
      aabab
      aabb
    abX
      abab
      abb
    aXab
      aabab
      abab
    aXb
      aabb
      abb
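The tree for this grammar can also be built programmatically. This is a minimal sketch, assuming single-character symbols (upper case for nonterminals); the names total_language_tree and terminal_nodes are hypothetical, and the recursion only stops because this particular tree is finite.

```python
def total_language_tree(productions, form):
    """Return the subtrees below `form` as nested dicts keyed by working string.

    Every applicable production is applied to each nonterminal, one at a time.
    The recursion only terminates when the total language tree is finite.
    """
    children = {}
    for i, symbol in enumerate(form):
        if symbol.isupper():
            for rhs in productions[symbol]:
                child = form[:i] + rhs + form[i + 1:]
                children[child] = total_language_tree(productions, child)
    return children


def terminal_nodes(form, subtree):
    """List the terminal nodes (all-terminal strings) below `form`, left to right."""
    if not subtree:
        return [form]
    return [w for child, sub in subtree.items() for w in terminal_nodes(child, sub)]


grammar = {"S": ["aa", "bX", "aXX"], "X": ["ab", "b"]}
tree = total_language_tree(grammar, "S")
words = terminal_nodes("S", tree)
print(sorted(set(words)))
```

Some words appear at more than one terminal node, so the list of leaves is longer than the language itself.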

EBNF BNF grammars are not an ideal notation for communicating the rules to the practicing programmer EBNF provides a complex set of recursive rules

EBNF Notational Extensions An optional element may be indicated by enclosing the element in square brackets [] A choice of alternatives may use the symbol | within a single rule, optionally enclosed in parentheses if needed An arbitrary sequence of instances of an element may be indicated by enclosing the element in braces followed by an asterisk {…}*

EBNF Examples BNF <integer> ::=<number>| +<number> | -<number> <number> ::= <digit> | <number><digit> EBNF <integer>::= [+|-]<digit>{<digit>}*
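The EBNF rule for <integer> maps almost symbol for symbol onto a regular expression. The sketch below assumes that <digit> means the characters 0 to 9; the name is_integer is hypothetical.

```python
import re

# <integer> ::= [+|-]<digit>{<digit>}*
#   [+|-]       optional sign         ->  [+-]?
#   <digit>     one mandatory digit   ->  [0-9]
#   {<digit>}*  any number of digits  ->  [0-9]*
INTEGER = re.compile(r"[+-]?[0-9][0-9]*")


def is_integer(text):
    """Return True if `text` matches the EBNF rule for <integer>."""
    return INTEGER.fullmatch(text) is not None


print(is_integer("-305"), is_integer("4-2"))
```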

Problems CFG for Variable Declaration VarDec → Type Identifier; Type → int | float | double | char Identifier → Alpha Alphanumeric Alpha → a | b | … | z | A | B | … | Z Alphanumeric → Alpha Alphanumeric | Numeric Alphanumeric | Λ Numeric → 0 | 1 | 2 | … | 9

Lukasiewicz Notation Prefix Notation S → S + S| S * S| number 3 + 4 * 5 S → (S + S)|(S * S)| number Derivations by replacement of NT with calculated results Arithmetic Operators are binary having operands already in proper format

Lukasiewicz Notation [Figure: parse trees for 3 + 4 * 5, one read as 3+(4*5) and one as (3+4)*5]

Lukasiewicz Notation The operators no more remain nonterminal S → *| + |number + → ++|+*|+number|*+|**|*number| number+| number*| number number * → ++|+*|+number|*+|**|*number|number+| number*| number number Left most derivation Pre-order traversal of the tree built from this notation gives the expression Evaluation (1+2) * (3+4) * 5 (looking for first o-o-o substring)
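Prefix notation needs no parentheses because each operator is followed by exactly two operands. As a minimal sketch, assuming whitespace-separated tokens and only the binary operators + and *, a recursive evaluator might look like this; the function name evaluate_prefix is hypothetical.

```python
def evaluate_prefix(tokens):
    """Evaluate a prefix (Lukasiewicz) expression given as a list of tokens."""

    def parse(pos):
        token = tokens[pos]
        if token in ("+", "*"):
            # A binary operator is followed by its two operands in prefix form.
            left, pos = parse(pos + 1)
            right, pos = parse(pos)
            return (left + right if token == "+" else left * right), pos
        return int(token), pos + 1

    value, end = parse(0)
    if end != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return value


# The two readings of 3 + 4 * 5, and (1+2) * (3+4) * 5, written in prefix form.
print(evaluate_prefix("+ 3 * 4 5".split()))
print(evaluate_prefix("* + 3 4 5".split()))
print(evaluate_prefix("* * + 1 2 + 3 4 5".split()))
```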

Language Span of CFGs All possible languages can be generated by CFGs All regular languages and some of the non-regular languages can be generated by CFGs Some regular (not all) and some non-regular languages can be generated by the CFGs Which statement is true?

Regular Languages and CFG A semiword is a string of terminals (possibly none) concatenated with exactly one nonterminal on the right. It is of the form (terminal)(terminal)…(terminal)Nonterminal

Regular Languages and CFGs All regular languages are also Context Free Therefore CFGs can be written for all RLs Theorem Given any FA, there is a CFG that generates exactly the same language accepted by the FA. All regular languages are Context Free We will prove this using the Constructive Proof of the Theorem i.e. Reduction of an FA into a CFG describing the same languages

Regular languages and CFGs Conversion Algorithm The nonterminals in the CFG will be all the names of the states in the FA, with the start state renamed S. For every a-edge leading from a state X to a state Y, create the production X → aY, and do the same for b-edges. For loops, add the production X → aX. For every final state X, create the production X → Λ. [Figure: states X and Y joined by an edge labelled a]

Regular Languages and CFG The CFG generated through this procedure generates the same language as accepted by the FA Proof (i) Every word accepted by FA can be generated by CFG (ii) Every word generated by CFG is accepted by FA

Regular Languages and CFG Example [Figure: FA with start state S−, state M and final state F+] S → aM S → bS M → aF M → bS F → aF F → bF F → Λ Derivation of babbaaba through CFG and traversal through FA
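The conversion and the derivation of babbaaba can be sketched in a few lines. The transition table below is read off the productions on this slide; the function names fa_to_cfg and derive, and the dictionary encoding of the FA, are assumptions made for illustration.

```python
def fa_to_cfg(transitions, start, finals):
    """Build CFG productions from a deterministic FA.

    `transitions` maps (state, letter) to the next state. The start state is
    renamed S; the empty string stands for Λ.
    """
    def rename(state):
        return "S" if state == start else state

    productions = {}
    for (state, letter), target in transitions.items():
        # Edge X --letter--> Y becomes the production X → letter Y.
        productions.setdefault(rename(state), []).append(letter + rename(target))
    for state in finals:
        # Every final state X gets the production X → Λ.
        productions.setdefault(rename(state), []).append("")
    return productions


def derive(productions, word):
    """Return the working strings of the derivation that mirrors the FA path."""
    form, steps = "S", ["S"]
    for letter in word:
        rhs = next(r for r in productions[form[-1]] if r and r[0] == letter)
        form = form[:-1] + rhs
        steps.append(form)
    if "" not in productions[form[-1]]:
        raise ValueError("word is not accepted by the FA")
    steps.append(form[:-1])
    return steps


transitions = {
    ("S", "a"): "M", ("S", "b"): "S",
    ("M", "a"): "F", ("M", "b"): "S",
    ("F", "a"): "F", ("F", "b"): "F",
}
cfg = fa_to_cfg(transitions, start="S", finals={"F"})
print(" ⇒ ".join(derive(cfg, "babbaaba")))
```

Each working string is a semiword whose trailing nonterminal names the state the FA is in after reading the terminals so far.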

Regular Languages and CFG FA to CFG Words that contain a double aa All words having different first and last letters

Regular Languages and CFG Can a CFG be converted back to an FA, RE or TG? A constructive algorithm is needed, if possible. Would this algorithm be applicable to all CFGs? What about CFGs defining non-RLs? Failure! FAs cannot be built for non-RLs. Solution Differentiate CFGs defining RLs from those defining non-RLs

Regular Languages and CFGs Theorem If all the productions in a given CFG fit one of the two forms Nonterminal → semiword Nonterminal → word Where word can be null, the language generated by this CFG is regular

Regular languages and CFGs Proof Consider a general CFG of this form: N1 → w1N2 N2 → w2N3 N3 → w3N4 N4 → w4 (there can be many more productions). The Ns are nonterminals while the ws are strings of terminals. Together they form a familiar pattern: the semiword. Draw and label circles for all the Ns and one extra circle labelled +. Mark the S circle with −. For every production of the form Nx → wyNz, draw a directed edge from state Nx to Nz labelled with the word wy. If Nx = Nz, the edge is a loop. For every production of the form Np → wq, draw a directed edge from Np to + and label it with the word wq, even if wq is null.

Regular Languages and CFGs The resultant figure is a transition graph Each path in this TG from – to + corresponds to a word generated by the CFG Conversely derivation of a word from this CFG corresponds to a path in the TG from – to +. The language of this CFG is regular

Regular Grammars A CFG is called a regular grammar if each of its productions is of one of the two forms Nonterminal → semiword Nonterminal → word Example S → aA | bB A → aS | a B → bS | b

Λ Productions Productions of the form N → Λ are called null (Λ) productions. All grammars that generate the Λ string include at least one null production. Some grammars that do not generate the Λ string might still contain null productions, e.g. S → aX X → Λ

Λ Productions Hazards of Λ Productions They create ambiguity in word derivation. They pose problems in some advanced algorithms that follow shortly. Solution Kill them!

Killing Null Productions Theorem If L is a context free language generated by a CFG that includes Λ-productions, then there is a different CFG with no Λ-productions that generates exactly the same language L, with the possible exception of the word Λ.

Killing Λ Productions Constructive Algorithm Identify the null productions. Remove them one by one. For each nonterminal having a null production, add productions in which that nonterminal has been replaced by null. Example S → aSa | bSb | Λ becomes S → aSa | bSb | aa | bb

Killing Λ Productions Problem Identified! S → a | Xb | aYa X → Y | Λ Y → b | X

Killing Λ Productions Nullable Nonterminal In a CFG, a nonterminal N is called nullable if there is a production N → Λ, or there is a derivation that starts at N and leads to Λ (N ⇒ … ⇒ Λ)

Killing Λ Productions Problem Solved !!! Modified Replacement Rule Delete all Λ-productions Add the following productions: For every production X → old string add new productions of the form X → .. Where the right side will account for any modification of the old string that can be formed by deleting all possible subsets of nullable nonterminals while avoiding introduction of a null production in this process

Killing Null Productions Not So Fast! S → Xay | YY | aX | ZYX X → Za | bZ | ZZ | Yb Y → Ya | XY | Λ Z → aX | YYY How could one identify a nullable nonterminal in such a complex grammar? Solution A bucket of blue paint
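The marking of nullable nonterminals can be sketched as a fixed-point iteration: keep marking any nonterminal that has a right side made up entirely of already-marked symbols, until nothing changes. The function name nullable_nonterminals and the encoding (single characters, empty string for Λ) are assumptions for illustration, not the slides' own procedure.

```python
def nullable_nonterminals(productions):
    """Return the set of nullable nonterminals of a CFG."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            if head in nullable:
                continue
            # A head is nullable if some body consists only of nullable
            # symbols; the empty body (Λ) qualifies trivially.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


grammar = {
    "S": ["Xay", "YY", "aX", "ZYX"],
    "X": ["Za", "bZ", "ZZ", "Yb"],
    "Y": ["Ya", "XY", ""],
    "Z": ["aX", "YYY"],
}
print(sorted(nullable_nonterminals(grammar)))
```

In this grammar Y is marked first through Y → Λ, then Z through YYY, X through ZZ and S through YY.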

Example Consider the CFG S → a | Xb | aYa X → Y | Λ Y → b | X
Old nullable production    New production
X → Y                      nothing
X → Λ                      nothing
Y → X                      nothing
S → Xb                     S → b
S → aYa                    S → aa
So the new CFG is S → a | Xb | aa | aYa | b X → Y Y → b | X

Example Consider the CFG S → Xa X → aX | bX | Λ
Old nullable production    New production
S → Xa                     S → a
X → aX                     X → a
X → bX                     X → b
So the new CFG is S → a | Xa X → aX | bX | a | b

Example S → XY X → Zb Y → bW Z → AB W → Z A → aA | bA | Λ B → Ba | Bb | Λ Which nonterminals are nullable? A, B, Z and W

Example Contd.
Old nullable production    New production
X → Zb                     X → b
Y → bW                     Y → b
Z → AB                     Z → A and Z → B
W → Z                      nothing new
A → aA                     A → a
A → bA                     A → b
B → Ba                     B → a
B → Bb                     B → b
So the new CFG is S → XY X → Zb | b Y → bW | b Z → AB | A | B W → Z A → aA | bA | a | b B → Ba | Bb | a | b
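The modified replacement rule used in these examples can be sketched as follows: find the nullable nonterminals, then for every production emit each variant obtained by deleting a subset of its nullable symbols, skipping any variant that would be empty. The function name remove_null_productions and the single-character encoding are assumptions for illustration.

```python
from itertools import product


def remove_null_productions(productions):
    """Return an equivalent grammar (up to the word Λ) without Λ-productions."""
    # Step 1: find the nullable nonterminals by fixed-point iteration.
    nullable, changed = set(), True
    while changed:
        changed = False
        for head, bodies in productions.items():
            if head not in nullable and any(
                all(sym in nullable for sym in body) for body in bodies
            ):
                nullable.add(head)
                changed = True

    # Step 2: for each body, keep or delete every nullable symbol.
    result = {}
    for head, bodies in productions.items():
        new_bodies = []
        for body in bodies:
            options = [(sym, "") if sym in nullable else (sym,) for sym in body]
            for choice in product(*options):
                candidate = "".join(choice)
                # Never introduce a Λ-production, and avoid duplicates.
                if candidate and candidate not in new_bodies:
                    new_bodies.append(candidate)
        result[head] = new_bodies
    return result


grammar = {"S": ["a", "Xb", "aYa"], "X": ["Y", ""], "Y": ["b", "X"]}
print(remove_null_productions(grammar))
```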

Unit Productions A production of the form Nonterminal → one Nonterminal is called a unit production. Unit productions are sometimes required to change the form of a working string: (arbitrary)A(arbitrary) ⇒ (arbitrary)B(arbitrary). Unit productions are also problematic and thus need to be exterminated.

Killing Unit Productions Theorem If there is a CFG for the language L that has no Λ-productions, then there is also a CFG for L with no Λ-productions and no unit productions

Killing Unit Productions Naïve Elimination Rule Eliminate unit productions one by one and replace them with new productions without changing the language being generated by the CFG Infinite loop and no benefit Example S → A |bb A → B | b B → S | a Modified Elimination Rule Eliminate all unit productions simultaneously Look for any sequence of productions that lead to a replacement with a unit production. Replace all such derived unit productions with the final replacement.

Killing Unit Productions Example S → A | bb A → B | b B → S | a Unit Productions S → A A → B B → S Derived Unit Production S → A → B A → B → S B → S → A

Killing Unit Productions New CFG S → bb|b|a A → b|a|bb B → a|bb|b
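The simultaneous elimination can be sketched as a reachability computation: each nonterminal inherits the non-unit productions of every nonterminal it can reach through unit productions alone. The function name remove_unit_productions and the single-character encoding are assumptions, and the sketch presumes the grammar has no Λ-productions, as in the theorem.

```python
def remove_unit_productions(productions):
    """Return an equivalent grammar without unit productions."""

    def is_unit(body):
        # A unit production has exactly one nonterminal on its right side.
        return len(body) == 1 and body.isupper()

    result = {}
    for head in productions:
        # Collect every nonterminal reachable from `head` via unit productions.
        reachable, stack = {head}, [head]
        while stack:
            current = stack.pop()
            for body in productions[current]:
                if is_unit(body) and body not in reachable:
                    reachable.add(body)
                    stack.append(body)
        # `head` inherits all non-unit bodies of the reachable nonterminals.
        bodies = []
        for nonterminal in sorted(reachable):
            for body in productions[nonterminal]:
                if not is_unit(body) and body not in bodies:
                    bodies.append(body)
        result[head] = bodies
    return result


grammar = {"S": ["A", "bb"], "A": ["B", "b"], "B": ["S", "a"]}
print(remove_unit_productions(grammar))
```

Because S, A and B all reach one another through unit productions, each ends up with the same three right sides.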

New Format for CFG Theorem If L is a language generated by some CFG, then there is another CFG that generates all the non-Λ words of L, all of whose productions are of one of the two basic forms Nonterminal → string of only Nonterminals Nonterminal → one terminal

New Format for CFG Proof Suppose a CFG contains the nonterminals S, X1, X2, X3, … and two terminals a and b. Add two new nonterminals A and B and two productions A → a B → b. For every previous production involving terminals, replace each a with the nonterminal A and each b with the nonterminal B. Any production which is already in the desired form should be left untouched, to avoid the introduction of unit productions. All the productions are now of the form Nonterminal → string of only nonterminals Nonterminal → one terminal

New format for CFG Example S → X1 | X2aX2 | aSb | b X1 → X2X2 | b X2 → aX2 | aaX1

Chomsky Normal Form: The Ultimate Target! If a CFG has only productions of the form Nonterminal → string of exactly two nonterminals Nonterminal → one terminal it is said to be in Chomsky Normal Form, or CNF. Theorem For any context free language L, the non-Λ words of the language can be generated by a CFG in CNF

CNF Proof Any CFG can be converted to the following format Nonterminal → strings of Nonterminals or Nonterminal → one terminal For this new CFG modify the productions so that they become in the CNF This conversion requires addition of new nonterminals S → X1X2X3X4 will be converted to S → X1R1 R1 → X2R2 R2 → X3X4
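The splitting step in this proof can be sketched as a small loop that peels off one nonterminal at a time and introduces a fresh nonterminal for the remainder. The function name binarize and the R1, R2, … naming scheme follow the slide's example; treating symbols as list items (so that X1 is one symbol) is an assumption of this sketch.

```python
def binarize(head, symbols, first_index=1):
    """Split head → symbols into productions with exactly two nonterminals.

    Returns a list of (head, [symbol, symbol]) pairs. Productions with at
    most two symbols on the right are returned unchanged.
    """
    rules = []
    index = first_index
    while len(symbols) > 2:
        fresh = f"R{index}"  # fresh nonterminal standing for the remainder
        rules.append((head, [symbols[0], fresh]))
        head, symbols = fresh, symbols[1:]
        index += 1
    rules.append((head, list(symbols)))
    return rules


for left, right in binarize("S", ["X1", "X2", "X3", "X4"]):
    print(left, "→", " ".join(right))
```

When several long productions are converted, a different first_index should be passed for each so that the fresh nonterminals do not collide.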

CNF Example S → aSa | bSb | a | b | aa | bb CNF S → AR1 R1 → SA S → BR3 R3 → SB S → AA S → BB S → b S → a A → a B → b