1
Context Free Grammars Chapter 12
2
Compiler, Grammar and Parsing
Compiler: A compiler is a program that converts high-level language code into equivalent assembly language.
Grammar: A grammar is a set of rules by which valid sentences in a language are constructed.
Parsing the sentence: Parsing is the process of analyzing a text, made up of a sequence of tokens (e.g. words), to determine its grammatical structure with respect to a given formal grammar.
3
Context Free Grammar (CFG): general definition
Semantics: The grammatical rules that involve the meaning of words are called semantics, e.g. in English the sentence “Buildings sing” makes no sense.
Syntactics: The grammatical rules that involve only the structure of a sentence, not the meaning of its words.
CFG: A grammar based on rules that rewrite a symbol in the string without reference to its surrounding context or to elements not in the string.
The concept of CFG was introduced by the linguist Noam Chomsky in 1956.
4
CFG Terminology
Terminals: The symbols that cannot be replaced by anything are called terminals.
Non-terminals: The symbols that must be replaced by other things are called non-terminals, e.g. variable = expr;
Derivation: The sequence of applications of the rules that produces the finished string of terminals from the start symbol is called a derivation.
Productions: The grammatical rules are often called productions.
5
Context Free Grammar (CFG): technical definition
A CFG is a collection of three things:
An alphabet of letters called terminals, from which the strings or words of the language are formed.
A set of symbols called non-terminals, one of which is the symbol S, standing for “start here”.
A finite set of productions of the form
One non-terminal → finite string of terminals and/or non-terminals
6
Context Free Grammars
By definition, a context-free grammar is a finite set of variables (also called non-terminals or syntactic categories), each of which represents a language.
The languages represented by the variables are described recursively in terms of each other and of primitive symbols called terminals.
The rules relating the variables are called productions.
7
Context Free Grammars Example Strings with at least one double letter
Continuous strings of a's
S → aS
S → Λ
Strings with at least one double letter
S → ADA
A → aA
A → bA
A → Λ
D → aa
D → bb
8
Example
S → aA | bX
A → bA
X → cX
9
Context Free Language (CFL): The language generated by a CFG is called a context free language (CFL).
Note: CFGs can generate all regular languages and some non-regular languages, but not all non-regular languages.
Examples:
10
Context Free Grammars
A context-free grammar is a collection of three things:
An alphabet of letters called terminals, from which the strings of the language are generated
A set of symbols called nonterminals, one of which is the symbol S, termed the start symbol
A finite set of productions (production rules) of the form
One nonterminal → finite string of terminals and/or nonterminals
The string on the right side can consist of only terminals, of only nonterminals, of any mixture of the two, or even of the empty string
A CFG must have at least one production that has the nonterminal S on its left side
11
Context Free Grammars
Nonterminal / Variable / Syntactic category
A symbol that can be substituted by some other symbol(s)
Called a variable because the same nonterminal can have multiple substitutions
Terminal
A symbol that cannot be substituted further
Letters from the alphabet set
12
Context Free Grammars
Conventions for CFG
Nonterminals are written in upper case letters
Terminal symbols are written in lower case
Terminal symbols are also called atomic symbols
13
Context Free Grammars
Terminologies
Generation or Derivation
The sequence of applications of the rules that produces the finished string of terminals from the start symbol is called a generation or a derivation of the word
Production
The grammatical rules are called productions
14
Context Free Languages
The language generated by a CFG is the set of all strings of terminals that can be produced from the start symbol S using the productions as substitutions. A language generated by a CFG is called a Context Free Language (CFL)
15
Context Free Grammars
Nonterminals vs. terminals
S → X
S → Y
X → Λ
Y → aY
Y → bY
Y → a
Y → b
16
Context Free Grammars
S → XaaX
X → aX
X → bX
X → Λ
This grammar generates the language of the regular expression (a+b)* aa (a+b)*
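The claim that S → XaaX, X → aX | bX | Λ generates (a+b)* aa (a+b)* can be checked mechanically for short words. The following is a minimal sketch, not part of the original slides: the function names `words_up_to` and `regex_words` are hypothetical, Λ is written as the empty string, and the slide's (a+b)* is written as `[ab]*` in Python's regex syntax.

```python
import re
from collections import deque
from itertools import product

# S → XaaX, X → aX | bX | Λ, with Λ written as the empty string.
GRAMMAR = {"S": ["XaaX"], "X": ["aX", "bX", ""]}

def words_up_to(grammar, start, max_len):
    """Collect every word of length <= max_len by breadth-first search over
    working strings, always rewriting the leftmost nonterminal.
    Assumption: every production either adds a terminal or shortens the
    working string, so the search is finite (true for this grammar)."""
    words, seen, queue = set(), {start}, deque([start])
    while queue:
        working = queue.popleft()
        pos = next((i for i, sym in enumerate(working) if sym in grammar), None)
        if pos is None:              # only terminals left: a finished word
            words.add(working)
            continue
        for body in grammar[working[pos]]:
            new = working[:pos] + body + working[pos + 1:]
            terminal_count = sum(sym not in grammar for sym in new)
            if terminal_count <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words

def regex_words(pattern, max_len, alphabet="ab"):
    """All strings over the alphabet, up to max_len, that match the pattern."""
    candidates = ("".join(p) for n in range(max_len + 1)
                  for p in product(alphabet, repeat=n))
    return {w for w in candidates if re.fullmatch(pattern, w)}

print(sorted(words_up_to(GRAMMAR, "S", 3)))
```

Comparing `words_up_to(GRAMMAR, "S", n)` with `regex_words(r"[ab]*aa[ab]*", n)` for small n is only a sanity check on short words, not a proof of equivalence.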
17
CFG Examples
All strings that don’t end in ba
All strings that contain the substring “bbb”
All strings that start and end with different letters
18
CFG
Which languages do these CFGs define?
Grammar 1: S → abS, S → ab
Grammar 2: S → aS, S → bb
19
Context Free Grammars
CFG for L = {a^n b^n : n = 0, 1, 2, 3, 4, …}
S → aSb
S → Λ
S → ab
CFG for EQUAL
S → aB
S → bA
A → a
A → aS
A → bAA
B → b
B → bS
B → aBB
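The productions S → aSb and S → Λ translate directly into a recursive membership test for {a^n b^n}. This is a small sketch for illustration; the function name `in_anbn` is hypothetical.

```python
def in_anbn(word):
    """Return True if word is a^n b^n for some n >= 0."""
    # S → Λ: the empty word is in the language
    if word == "":
        return True
    # S → aSb: strip one a from the front and one b from the back, then recurse
    return (len(word) >= 2 and word[0] == "a" and word[-1] == "b"
            and in_anbn(word[1:-1]))

for w in ["", "ab", "aabb", "aab", "abab"]:
    print(repr(w), in_anbn(w))
```

Each recursive call undoes exactly one application of S → aSb, so the depth of the recursion equals n.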
20
Context Free Grammars
CFG for EQUAL
S → aB
S → bA
A → a
A → aS
A → bAA
B → b
B → bS
B → aBB
Can be compactly written as
S → aB | bA
A → a | aS | bAA
B → b | bS | aBB
or, with angle brackets and ::=, as
<S> ::= a<B> | b<A>
<A> ::= a | a<S> | b<A><A>
<B> ::= b | b<S> | a<B><B>
21
Backus-Naur Form
This format for writing a CFG is called Backus-Naur Form
It is abbreviated as BNF
Also called Backus Normal Form
It consists of
Arrows to define productions
Vertical bars to present choices (disjunction)
Terminals and nonterminals to build a production
22
Variations in CFG Notations
→ or ::= for productions
<> for nonterminals
Underline the nonterminals
Symbol for null: Λ (other symbols are also used)
23
Context Free Grammars
CFG for identifier
IDENTIFIER → ALPHA ALPHANUMERIC
ALPHA → A | B | … | Z | a | b | c | … | z
ALPHANUMERIC → ALPHA ALPHANUMERIC | NUMERIC ALPHANUMERIC | Λ
NUMERIC → 0 | 1 | 2 | … | 9
24
Context Free Grammars
CFG for arithmetic expressions
<expression> → <expression> + <expression>
<expression> → <expression> * <expression>
<expression> → <expression> - <expression>
<expression> → (<expression>)
<expression> → <number>
25
Context Free Grammars
Derivation or Generation
S → abS | Λ
S ⇒ abS ⇒ ababS ⇒ abababS ⇒ ababab
S ⇒ abS ⇒ ababS ⇒ abab
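A derivation under S → abS | Λ can be traced step by step: apply S → abS some number of times, then finish with S → Λ. The helper below is a hypothetical sketch (the name `derive_ab` is not from the slides) that records each working string.

```python
def derive_ab(n):
    """Working strings in the derivation of (ab)^n under S → abS | Λ."""
    working = "S"
    steps = [working]
    for _ in range(n):
        working = working.replace("S", "abS")   # apply S → abS
        steps.append(working)
    steps.append(working.replace("S", ""))      # apply S → Λ
    return steps

print(" ⇒ ".join(derive_ab(3)))
# prints: S ⇒ abS ⇒ ababS ⇒ abababS ⇒ ababab
```

Because every working string contains exactly one S, the derivation is leftmost and rightmost at the same time.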
26
Parse Trees
A tree format used for the derivation of a string from the CFG
Also called: parse tree, syntax tree, generation tree, production tree, derivation tree
Start symbol of the CFG at the root
Nonterminals are represented as nodes
Terminals as leaves
Every next level of the tree is a derivation from a production of the CFG
The yield of a parse tree is the terminal string read off its leaves from left to right
27
Parse Trees Examples
S → abS | Λ
Derivation of abababab
[Figure: parse tree with S at the root; each S node has children a, b and S]
28
Derivation
Left Most Derivation
If a word w is generated by a CFG by a certain derivation, and at each step in the derivation a rule of production is applied to the leftmost nonterminal in the working string, then this derivation is called a leftmost derivation
Right Most Derivation
Defined analogously: at each step the rule of production is applied to the rightmost nonterminal in the working string
29
Ambiguity
A CFG is called ambiguous if, for at least one word in the language that it generates, there are two possible derivations of the word that correspond to different syntax trees.
A CFG which is not ambiguous is called an unambiguous CFG
30
Ambiguous Grammars
S → aS | Sa | a
Derivation of aaa
S → aS | a
[Figure: parse trees for aaa]
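Ambiguity can be made concrete by counting parse trees. The sketch below is an illustration under stated assumptions: the helper `tree_counter` is hypothetical, symbols are single characters, and the grammar has no Λ-productions and no nonterminal-to-nonterminal unit productions (both grammars on this slide qualify).

```python
from functools import lru_cache

def tree_counter(grammar):
    """Return count(symbol, word): the number of distinct parse trees that
    derive `word` from `symbol`. Every symbol must cover at least one
    letter, which is what makes the recursion terminate."""

    @lru_cache(maxsize=None)
    def count(symbol, word):
        if symbol not in grammar:                  # terminal
            return 1 if word == symbol else 0
        return sum(count_body(body, word) for body in grammar[symbol])

    @lru_cache(maxsize=None)
    def count_body(body, word):
        if len(body) == 1:
            return count(body, word)
        total = 0
        # the first symbol takes the first k letters, the rest of the body the remainder
        for k in range(1, len(word) - len(body) + 2):
            left = count(body[0], word[:k])
            if left:
                total += left * count_body(body[1:], word[k:])
        return total

    return count

ambiguous = tree_counter({"S": ("aS", "Sa", "a")})
unambiguous = tree_counter({"S": ("aS", "a")})
print(ambiguous("S", "aaa"), unambiguous("S", "aaa"))   # prints: 4 1
```

A count greater than 1 for some word shows that the grammar is ambiguous; a count of 1 for the words tried is only evidence, not proof, of unambiguity.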
31
Total Language Tree
A tree with the start symbol at its root, whose nodes are working strings of terminals and nonterminals
The descendants of each node are all the possible results of applying every applicable production to the working string, one at a time
A string of all terminals is a terminal node in the tree
32
Total Language Tree
S → aa | bX | aXX
X → ab | b
S
  aa
  bX
    bab
    bb
  aXX
    aabX: aabab, aabb
    abX: abab, abb
    aXab: aabab, abab
    aXb: aabb, abb
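The total language tree for S → aa | bX | aXX, X → ab | b can be generated by applying every applicable production to every nonterminal occurrence of a node. The names `children`, `show_tree` and `language` below are hypothetical, and the sketch assumes the tree is finite, which holds here because the grammar is not recursive.

```python
GRAMMAR = {"S": ["aa", "bX", "aXX"], "X": ["ab", "b"]}

def children(working):
    """All results of applying one production to one nonterminal occurrence."""
    result = []
    for i, symbol in enumerate(working):
        for body in GRAMMAR.get(symbol, []):
            result.append(working[:i] + body + working[i + 1:])
    return result

def show_tree(node="S", depth=0, lines=None):
    """Indented listing of the total language tree below `node`."""
    lines = [] if lines is None else lines
    lines.append("  " * depth + node)
    for child in children(node):
        show_tree(child, depth + 1, lines)
    return lines

def language(node="S"):
    """The set of all-terminal leaves reachable from `node`."""
    kids = children(node)
    if not kids:                     # no nonterminal left: a terminal node
        return {node}
    return set().union(*(language(kid) for kid in kids))

print("\n".join(show_tree()))
print(sorted(language()))
```

The same word can appear at several leaves (aabab, for example, is reached through both aabX and aXab), so the language is the set of distinct leaves.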
33
EBNF
BNF grammars are not an ideal notation for communicating the rules to the practicing programmer, since simple constructs often need a complex set of recursive rules
EBNF adds notational extensions that express the same rules more compactly
34
EBNF Notational Extensions
An optional element may be indicated by enclosing the element in square brackets []
A choice of alternatives may use the symbol | within a single rule, optionally enclosed in parentheses if needed
An arbitrary sequence of instances of an element may be indicated by enclosing the element in braces followed by an asterisk {…}*
35
EBNF Examples
BNF
<integer> ::= <number> | +<number> | -<number>
<number> ::= <digit> | <number><digit>
EBNF
<integer> ::= [+|-]<digit>{<digit>}*
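The EBNF rule for <integer> maps almost symbol for symbol onto a regular expression: the optional sign [+|-] becomes `[+-]?` and <digit>{<digit>}* becomes `[0-9]+`. A minimal sketch using Python's standard `re` module (the name `is_integer` is hypothetical):

```python
import re

# <integer> ::= [+|-]<digit>{<digit>}*
INTEGER = re.compile(r"[+-]?[0-9]+")

def is_integer(text):
    """True if the whole text is an optionally signed string of digits."""
    return INTEGER.fullmatch(text) is not None

for sample in ["42", "+42", "-7", "+", "4-2", ""]:
    print(repr(sample), is_integer(sample))
```

This works only because the integer language happens to be regular; a grammar such as the one for nested arithmetic expressions has no equivalent regular expression.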
36
Problems
CFG for Variable Declaration
VarDec → Type Identifier;
Type → int | float | double | char
Identifier → Alpha Alphanumeric
Alpha → a | b | … | z | A | B | … | Z
Alphanumeric → Alpha Alphanumeric | Numeric Alphanumeric | Λ
Numeric → 0 | 1 | 2 | … | 9
37
Lukasiewicz Notation
Prefix Notation
S → S + S | S * S | number
3 + 4 * 5
S → (S + S) | (S * S) | number
Derivations by replacement of nonterminals with calculated results
Arithmetic operators are binary, with operands already in proper format
38
Lukasiewicz Notation
[Figure: parse trees built from +, *, 5, 4 and 3 for 3+(4*5) and (3+4)*5]
39
Lukasiewicz Notation
The operators no longer remain terminals; + and * become nonterminals
S → * | + | number
+ → ++ | +* | +number | *+ | ** | *number | number+ | number* | number number
* → ++ | +* | +number | *+ | ** | *number | number+ | number* | number number
Leftmost derivation
Pre-order traversal of the tree built from this notation gives the expression
Evaluation of (1+2) * (3+4) * 5 (looking for the first o-o-o substring)
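The evaluation rule can be sketched in code, reading "o-o-o" as an operator immediately followed by two operands: repeatedly find the first such triple and replace it with its value. The function `eval_prefix` and the token representation (strings for operators, ints for numbers) are assumptions made for this illustration, as is the grouping ((1+2)*(3+4))*5 chosen for the slide's expression.

```python
OPERATORS = {"+": lambda x, y: x + y, "*": lambda x, y: x * y}

def eval_prefix(tokens):
    """Evaluate a prefix expression given as a list of operator strings and ints."""
    tokens = list(tokens)
    while len(tokens) > 1:
        for i in range(len(tokens) - 2):
            op, x, y = tokens[i:i + 3]
            if op in OPERATORS and isinstance(x, int) and isinstance(y, int):
                tokens[i:i + 3] = [OPERATORS[op](x, y)]   # collapse the triple
                break
        else:
            raise ValueError("not a well-formed prefix expression")
    return tokens[0]

print(eval_prefix(["+", 3, "*", 4, 5]))                      # 3+(4*5), prints 23
print(eval_prefix(["*", "+", 3, 4, 5]))                      # (3+4)*5, prints 35
print(eval_prefix(["*", "*", "+", 1, 2, "+", 3, 4, 5]))      # prints 105
```

No parentheses are needed in the token lists: the position of each operator already fixes the grouping, which is the point of the notation.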
40
Language Span of CFGs
All possible languages can be generated by CFGs
All regular languages and some of the non-regular languages can be generated by CFGs
Some regular (not all) and some non-regular languages can be generated by CFGs
Which statement is true?
41
Regular Languages and CFG
A semiword is a string of terminals (possibly none) concatenated with exactly one nonterminal on the right. It is of the form (terminal)(terminal)…(terminal)Nonterminal
42
Regular Languages and CFGs
All regular languages are also context free
Therefore CFGs can be written for all RLs
Theorem
Given any FA, there is a CFG that generates exactly the language accepted by the FA.
We will prove this with a constructive proof, i.e. by reducing an FA to a CFG describing the same language
43
Regular languages and CFGs
Conversion Algorithm
The nonterminals in the CFG will be all the names of the states in the FA, with the start state renamed S.
For every a-edge from a state X to a state Y, create the production X → aY, and do the same for b-edges.
For loops, add the production X → aX.
For every final state X, create the production X → Λ.
[Figure: an a-edge from state X to state Y]
44
Regular Languages and CFG
The CFG produced by this procedure generates the same language as is accepted by the FA
Proof
(i) Every word accepted by the FA can be generated by the CFG
(ii) Every word generated by the CFG is accepted by the FA
45
Regular Languages and CFG
Example
[Figure: FA with start state S-, state M and final state F+, with edges labelled a and b]
S → aM
S → bS
M → aF
M → bS
F → aF
F → bF
F → Λ
Derivation of babbaaba through the CFG and traversal through the FA
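The conversion algorithm is mechanical enough to sketch in a few lines. The transition table below is an assumption reconstructed from the productions on the slide (the FA drawing itself is not available), and the names `fa_to_cfg` and `derivation` are hypothetical.

```python
# Assumed transition table for the example FA, read back from its productions.
TRANSITIONS = {
    ("S", "a"): "M", ("S", "b"): "S",
    ("M", "a"): "F", ("M", "b"): "S",
    ("F", "a"): "F", ("F", "b"): "F",
}
FINALS = {"F"}

def fa_to_cfg(transitions, finals):
    """One production X → aY per edge, plus X → Λ (written "") per final state."""
    productions = {}
    for (state, letter), target in transitions.items():
        productions.setdefault(state, []).append(letter + target)
    for state in sorted(finals):
        productions.setdefault(state, []).append("")
    return productions

def derivation(transitions, finals, word, start="S"):
    """Working strings of the derivation that mirrors the FA's path on `word`."""
    state, steps = start, [start]
    for i, letter in enumerate(word):
        state = transitions[(state, letter)]
        steps.append(word[:i + 1] + state)
    if state not in finals:
        raise ValueError("word is not accepted by the FA")
    steps.append(word)               # final step applies X → Λ
    return steps

print(fa_to_cfg(TRANSITIONS, FINALS))
print(" ⇒ ".join(derivation(TRANSITIONS, FINALS, "babbaaba")))
```

Each working string is a semiword whose trailing nonterminal is the state the FA is in after reading the terminals before it, which is the idea behind both directions of the proof.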
46
Regular Languages and CFG
FA to CFG
Words that contain a double a (aa)
All words having different first and last letters
47
Regular Languages and CFG
Can a CFG be converted back to an FA, RE or TG?
A constructive algorithm is needed, if one is possible
Would this algorithm be applicable to all CFGs?
What about CFGs defining non-RLs? Failure: FAs can't be built for non-RLs
Solution
Differentiate CFGs defining RLs from those defining non-RLs
48
Regular Languages and CFGs
Theorem
If all the productions in a given CFG fit one of the two forms
Nonterminal → semiword
Nonterminal → word
where the word can be null, then the language generated by this CFG is regular
49
Regular languages and CFGs
Proof
Consider a general CFG of this form
N1 → w1N2
N2 → w2N3
N3 → w3N4
N4 → w4
(there can be many more productions)
The Ns are nonterminals while the ws are strings of terminals. Together they form a familiar pattern: semiword.
Draw and label circles for all the Ns and one extra circle labeled with a +. Mark the S circle with a -.
For every production of the form Nx → wyNz, draw a directed edge from state Nx to Nz labelled with the word wy.
If Nx = Nz then the path is a loop.
For every production of the form Np → wq, draw a directed edge from Np to + and label it with the word wq, even if wq is null.
50
Regular Languages and CFGs
The resultant figure is a transition graph
Each path in this TG from - to + corresponds to a word generated by the CFG
Conversely, the derivation of a word from this CFG corresponds to a path in the TG from - to +
Therefore the language of this CFG is regular
51
Regular Grammars
A CFG is called a regular grammar if each of its productions is of one of the two forms
Nonterminal → semiword
Nonterminal → word
Example
S → aA | bB
A → aS | a
B → bS | b
52
Λ Productions
Productions of the form
N → Λ
are called null (Λ) productions
All grammars that generate the Λ string include at least one null production
Some grammars that do not generate the Λ string still might contain null productions
S → aX
X → Λ
53
Λ Productions
Hazards of Λ Productions
Create ambiguity in word derivation
Pose problems in some advanced algorithms following shortly
Solution
Kill Them !!!
54
Killing Null Productions
Theorem
If L is a context free language generated by a CFG that includes Λ-productions, then there is a different CFG that has no Λ-productions and generates exactly the same language L, with the possible exception of only the word Λ.
55
Killing Λ Productions
Constructive Algorithm
Identify the null productions
Remove each of them one by one
For each nonterminal having a null production, add productions where the nonterminal has been replaced by null
Example
S → aSa | bSb | Λ
becomes
S → aSa | bSb | aa | bb
56
Killing Λ Productions
Problem Identified !!!
S → a | Xb | aYa
X → Y | Λ
Y → b | X
57
Killing Λ Productions
Nullable Nonterminal
In a CFG, a nonterminal N is called nullable if
There is a production N → Λ, or
There is a derivation that starts at N and leads to Λ (N ⇒ … ⇒ Λ)
58
Killing Λ Productions
Problem Solved !!!
Modified Replacement Rule
Delete all Λ-productions
Add the following productions: for every production X → old string, add new productions of the form X → …, where the right side accounts for every modification of the old string that can be formed by deleting any possible subset of the nullable nonterminals, while avoiding the introduction of a null production in this process
59
Killing Null Productions
Not So Fast !!!
S → Xay | YY | aX | ZYX
X → Za | bZ | ZZ | Yb
Y → Ya | XY | Λ
Z → aX | YYY
How could one identify a nullable nonterminal in such a complex grammar?
Solution
A bucket of Blue Paint
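One way to read the "blue paint" idea is as a fixed-point computation: paint every nonterminal that has a Λ-production, then keep painting any nonterminal that has a right side made up entirely of painted nonterminals, until nothing changes. The sketch below takes that reading as an assumption; the name `nullable_nonterminals` is hypothetical, symbols are single characters, and Λ is written as the empty string.

```python
GRAMMAR = {
    "S": ["Xay", "YY", "aX", "ZYX"],
    "X": ["Za", "bZ", "ZZ", "Yb"],
    "Y": ["Ya", "XY", ""],
    "Z": ["aX", "YYY"],
}

def nullable_nonterminals(grammar):
    """Repeat until no change: N is nullable if some right side of N
    consists only of nullable nonterminals (the empty right side counts)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            if any(all(symbol in nullable for symbol in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable

print(sorted(nullable_nonterminals(GRAMMAR)))   # prints: ['S', 'X', 'Y', 'Z']
```

Here Y is painted first (Y → Λ), then Z (Z → YYY), then X (X → ZZ) and S (S → YY), so every nonterminal in this grammar turns out to be nullable.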
60
Example
Consider the CFG
S → a | Xb | aYa
X → Y | Λ
Y → b | X
Old nullable production    New production
X → Y                      nothing
X → Λ                      nothing
Y → X                      nothing
S → Xb                     S → b
S → aYa                    S → aa
So the new CFG is
S → a | Xb | aa | aYa | b
X → Y
Y → b | X
61
Example
Consider the CFG
S → Xa
X → aX | bX | Λ
Old nullable production    New production
S → Xa                     S → a
X → aX                     X → a
X → bX                     X → b
So the new CFG is
S → a | Xa
X → aX | bX | a | b
62
Example
S → XY
X → Zb
Y → bW
Z → AB
W → Z
A → aA | bA | Λ
B → Ba | Bb | Λ
Nullable nonterminals are? A, B, Z and W
63
Example Contd.
Old nullable production    New production
X → Zb                     X → b
Y → bW                     Y → b
Z → AB                     Z → A and Z → B
W → Z                      nothing new
A → aA                     A → a
A → bA                     A → b
B → Ba                     B → a
B → Bb                     B → b
So the new CFG is
S → XY
X → Zb | b
Y → bW | b
Z → AB | A | B
W → Z
A → aA | bA | a | b
B → Ba | Bb | a | b
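The modified replacement rule used in these examples can be sketched as follows: find the nullable nonterminals, then for every production emit each variant obtained by deleting some subset of its nullable symbols, skipping the empty result. This is an illustrative sketch, not the slides' own code; the function names are hypothetical, symbols are single characters, and Λ is written as the empty string.

```python
from itertools import product

def nullable_nonterminals(grammar):
    """Fixed point: N is nullable if some right side is all nullable symbols."""
    nullable, changed = set(), True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(
                    all(s in nullable for s in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable

def remove_null_productions(grammar):
    """Delete Λ-productions and add every variant with nullable symbols dropped."""
    nullable = nullable_nonterminals(grammar)
    new_grammar = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            # each nullable symbol may be kept or deleted; other symbols are kept
            options = [(s, "") if s in nullable else (s,) for s in body]
            for choice in product(*options):
                candidate = "".join(choice)
                if candidate and candidate not in new_bodies:   # never add N → Λ
                    new_bodies.append(candidate)
        new_grammar[head] = new_bodies
    return new_grammar

GRAMMAR = {"S": ["a", "Xb", "aYa"], "X": ["Y", ""], "Y": ["b", "X"]}
for head, bodies in remove_null_productions(GRAMMAR).items():
    print(head, "→", " | ".join(bodies))
```

On the first example grammar this yields S → a | Xb | b | aYa | aa, X → Y and Y → b | X, the same productions as in the worked table, possibly in a different order.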
64
Unit Productions
A production of the form
Nonterminal → one Nonterminal
is called a unit production
Unit productions are sometimes required to change the form of a working string, from (arbitrary)A(arbitrary) into (arbitrary)B(arbitrary)
Unit productions are also problematic and thus need to be exterminated
65
Killing Unit Productions
Theorem
If there is a CFG for the language L that has no Λ-productions, then there is also a CFG for L with no Λ-productions and no unit productions
66
Killing Unit Productions
Naïve Elimination Rule
Eliminate unit productions one by one and replace them with new productions without changing the language generated by the CFG
Result: infinite loop and no benefit
Example
S → A | bb
A → B | b
B → S | a
Modified Elimination Rule
Eliminate all unit productions simultaneously
Look for any sequence of productions that leads to a replacement with a unit production
Replace all such derived unit productions with the final replacement
67
Killing Unit Productions
Example
S → A | bb
A → B | b
B → S | a
Unit Productions
S → A
A → B
B → S
Derived Unit Productions
S → A → B gives S → B
A → B → S gives A → S
B → S → A gives B → A
68
Killing Unit Productions
New CFG
S → bb | b | a
A → b | a | bb
B → a | bb | b
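The simultaneous elimination can be sketched by computing, for each nonterminal, every nonterminal reachable through unit productions (direct or derived) and then collecting the non-unit right sides of all of them. This is one possible reading of the modified rule, shown as a sketch; the name `remove_unit_productions` is hypothetical and symbols are assumed to be single characters.

```python
def remove_unit_productions(grammar):
    """Replace unit productions by the non-unit right sides they lead to."""
    def is_unit(body):
        return len(body) == 1 and body in grammar

    new_grammar = {}
    for head in grammar:
        # nonterminals reachable from head by following unit productions
        reachable, stack = {head}, [head]
        while stack:
            current = stack.pop()
            for body in grammar[current]:
                if is_unit(body) and body not in reachable:
                    reachable.add(body)
                    stack.append(body)
        bodies = []
        for nonterminal in grammar:            # fixed order for stable output
            if nonterminal in reachable:
                for body in grammar[nonterminal]:
                    if not is_unit(body) and body not in bodies:
                        bodies.append(body)
        new_grammar[head] = bodies
    return new_grammar

GRAMMAR = {"S": ["A", "bb"], "A": ["B", "b"], "B": ["S", "a"]}
for head, bodies in remove_unit_productions(GRAMMAR).items():
    print(head, "→", " | ".join(bodies))
```

Because S, A and B all reach one another through unit productions, each ends up with the same three right sides bb, b and a, matching the new CFG above.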
69
New Format for CFG
Theorem
If L is a language generated by some CFG, then there is another CFG that generates all the non-Λ words of L, all of whose productions are of one of the two basic forms
Nonterminal → string of only nonterminals
Nonterminal → one terminal
70
New Format for CFG Proof
Suppose a CFG contains nonterminals S, X1, X2, X3, … and two terminals a and b
Add two new nonterminals A and B and two productions
A → a
B → b
For every previous production involving terminals, replace each a with the nonterminal A and each b with the nonterminal B
Any production which is already in the desired form should be left untouched, to avoid the introduction of unit productions
All the productions are now of the form
Nonterminal → string of only nonterminals
Nonterminal → one terminal
71
New Format for CFG
Example
S → X1 | X2aX2 | aSb | b
X1 → X2X2 | b
X2 → aX2 | aaX1
72
Chomsky Normal Form: The Ultimate Target !
If a CFG has only productions of the form
Nonterminal → string of exactly two nonterminals
Nonterminal → one terminal
it is said to be in Chomsky Normal Form, or CNF
Theorem
For any context free language L, the non-Λ words of the language can be generated by a CFG in CNF
73
CNF Proof
Any CFG can be converted to the following format
Nonterminal → string of nonterminals, or
Nonterminal → one terminal
For this new CFG, modify the productions so that they are in CNF
This conversion requires the addition of new nonterminals
S → X1X2X3X4 will be converted to
S → X1R1
R1 → X2R2
R2 → X3X4
74
CNF Example
S → aSa | bSb | a | b | aa | bb
CNF
S → AR1
R1 → SA
S → BR2
R2 → SB
S → AA
S → BB
S → b
S → a
A → a
B → b
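The two conversion steps (replace terminals inside longer right sides by new nonterminals, then split right sides longer than two symbols with helper nonterminals) can be sketched as below. The function `to_cnf`, the tuple representation of right sides and the generated helper names R1, R2, … are assumptions of this sketch; the input grammar is assumed to have no Λ-productions and no unit productions.

```python
def to_cnf(grammar, terminal_names):
    """grammar maps a nonterminal to a list of right sides (tuples of symbols);
    terminal_names maps each terminal to the nonterminal that stands for it."""
    new_grammar = {}
    counter = 0
    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) == 1:                   # already Nonterminal → one terminal
                new_grammar.setdefault(head, []).append(body)
                continue
            # step 1: replace terminals by their stand-in nonterminals
            symbols = [terminal_names.get(symbol, symbol) for symbol in body]
            # step 2: peel off one symbol at a time until two remain
            current = head
            while len(symbols) > 2:
                counter += 1
                helper = f"R{counter}"
                new_grammar.setdefault(current, []).append((symbols[0], helper))
                current, symbols = helper, symbols[1:]
            new_grammar.setdefault(current, []).append(tuple(symbols))
    for terminal, name in terminal_names.items():
        new_grammar.setdefault(name, []).append((terminal,))
    return new_grammar

GRAMMAR = {"S": [("a", "S", "a"), ("b", "S", "b"), ("a",), ("b",),
                 ("a", "a"), ("b", "b")]}
for head, bodies in to_cnf(GRAMMAR, {"a": "A", "b": "B"}).items():
    print(head, "→", " | ".join("".join(body) for body in bodies))
```

Every production in the output has either exactly two nonterminals or exactly one terminal on its right side, which is the defining property of CNF.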