Context-Free Grammars, Chapter 11

Languages and Machines

Rewrite Systems and Grammars
A rewrite system (or production system or rule-based system) is:
● a list of rules, and
● an algorithm for applying them.
Each rule has a left-hand side and a right-hand side.
Example rules: S → aSb, aS → ε, aSb → bSabSa

Simple-rewrite
simple-rewrite(R: rewrite system, w: initial string) =
1. Set working-string to w.
2. Until told by R to halt do:
   Match the lhs of some rule against some part of working-string.
   Replace the matched part of working-string with the rhs of the rule that was matched.
3. Return working-string.
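The simple-rewrite procedure can be sketched in Python. This is only an illustration under stated assumptions: rules are (lhs, rhs) string pairs with a non-empty lhs, the rule and match position are picked at random, and "told by R to halt" is approximated by "no rule matches or a step limit is reached". The names simple_rewrite, max_steps, and rng are hypothetical and do not come from the slides.

```python
import random

def simple_rewrite(rules, w, max_steps=100, rng=None):
    """Rewrite w with (lhs, rhs) rules until no rule matches or max_steps is hit.

    Assumption: halting policy and random rule choice are illustrative only;
    a real rewrite system formalism fixes both.
    """
    rng = rng or random.Random(0)
    working = w
    for _ in range(max_steps):
        # Every (position, lhs, rhs) where the lhs matches part of the working string.
        matches = [(i, lhs, rhs)
                   for lhs, rhs in rules
                   for i in range(len(working) - len(lhs) + 1)
                   if working.startswith(lhs, i)]
        if not matches:
            break  # nothing left to rewrite
        i, lhs, rhs = rng.choice(matches)
        # Replace the matched part with the rhs of the matched rule.
        working = working[:i] + rhs + working[i + len(lhs):]
    return working

# "" plays the role of epsilon.
print(simple_rewrite([("S", "aSb"), ("aS", "")], "SaS"))
```

Because the choice of rule and position is left open, different runs (different seeds) can return different strings for the same w.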

A Rewrite System Formalism
A rewrite system formalism specifies:
● The form of the rules
● How simple-rewrite works:
  ● How to choose rules?
  ● When to quit?

An Example
w = SaS (Is this ONE string or many?)
Rules: [1] S → aSb, [2] aS → ε
● What order to apply the rules?
● When to quit?
SaS ⇒ aSbaS ⇒ baS ⇒ b
SaS ⇒ aSbaS ⇒ aaSbbaS ⇒ abbaS ⇒ abb
The answer to the question is “many”: different rule choices lead to different strings. How many strings does w represent?

Rule-Based Systems
● Expert systems
● Cognitive modeling
● Business practice modeling
● General models of computation
● Grammars

Grammars Define Languages
A grammar is a set of rules that are stated in terms of two alphabets:
● a terminal alphabet, Σ, that contains the symbols that make up the strings in L(G), and
● a nonterminal alphabet, the elements of which will function as working symbols that will be used while the grammar is operating. These symbols will disappear by the time the grammar finishes its job and generates a string.
A grammar has a unique start symbol, often called S.

Using a Grammar to Derive a String
Simple-rewrite (G, S) will generate the strings in L(G). We will use the symbol ⇒ to indicate steps in a derivation.
A derivation could begin with: S ⇒ aSb ⇒ aaSbb ⇒ …

Generating Many Strings
Multiple rules may match.
Given: S → aSb, S → bSa, and S → ε
Derivation so far: S ⇒ aSb ⇒ aaSbb ⇒
Three choices at the next step:
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb (using rule 1),
S ⇒ aSb ⇒ aaSbb ⇒ aabSabb (using rule 2),
S ⇒ aSb ⇒ aaSbb ⇒ aabb (using rule 3).

Generating Many Strings
One rule may match in more than one way.
Given: S → aTTb, T → bTa, and T → ε
Derivation so far: S ⇒ aTTb ⇒
Two choices at the next step:
S ⇒ aTTb ⇒ abTaTb ⇒
S ⇒ aTTb ⇒ aTbTab ⇒

When to Stop
May stop when:
1. The working string no longer contains any nonterminal symbols (including when it is ε). In this case, we say that the working string is generated by the grammar.
Example: S ⇒ aSb ⇒ aaSbb ⇒ aabb

When to Stop
May stop when:
2. There are nonterminal symbols in the working string but none of them appears on the left-hand side of any rule in the grammar. In this case, we have a blocked or non-terminated derivation but no generated string.
Example: Rules: S → aSb, S → bTa, and S → ε
Derivation: S ⇒ aSb ⇒ abTab ⇒ [blocked]

When to Stop
It is possible that neither (1) nor (2) is achieved.
Example: G contains only the rules S → Ba and B → bB, with S the start symbol. Then all derivations proceed as:
S ⇒ Ba ⇒ bBa ⇒ bbBa ⇒ bbbBa ⇒ bbbbBa ⇒ ...
In the last two cases, you have a “bad” grammar!

Context-free Grammars, Languages, and PDAs
[Figure: Context-free Language L; Context-free Grammar; PDA; Accepts]

Recursive Grammar
A grammar is recursive if it contains at least one production (rule), or set of productions, of the following forms:
● S → wSx, where w or x may be empty
● S → wTx, T → uSv
● Any set of rules that begins at a nonterminal S and derives a string containing S
w, x, u, v are elements of V*.
Recursive grammars allow a finite grammar to generate an infinite language!

Self-Embedding Grammar
A grammar is self-embedding if it contains at least one set of productions (rules) of the following form:
S → wTx, T → uSv
where the strings wu and vx are elements of Σ+, that is, not empty.
A self-embedding grammar allows development of non-empty strings on both sides of the embedded nonterminal.

Self-Embedding Grammar Examples
A non-empty string can be formed on both sides of a nonterminal:
● S → aSb
● S → aT, T → Sb, which is equivalent to S → aSb

More Powerful Grammars
Regular grammars must always produce strings one character at a time, moving left to right. But it may be more natural to describe generation more flexibly.
Example 1: L = ab*a
S → aBa, B → ε, B → bB   vs.   S → aB, B → a, B → bB
Example 2: L = {a^n b* a^n : n ≥ 0}
S → B, S → aSa, B → ε, B → bB
Key distinction: Example 1 is not self-embedding.

Context-Free Grammars
No restrictions on the form of the right-hand sides:
S → abDeFGab
But require a single nonterminal on the left-hand side:
S → …   but not   ASB → …

A^nB^n

A^nB^n
S → ε
S → aSb

Balanced Parentheses

Balanced Parentheses
S → ε
S → SS
S → (S)

Context-Free Grammars
A context-free grammar G is a quadruple, (V, Σ, R, S), where:
● V is the rule alphabet, which contains nonterminals and terminals,
● Σ (the set of terminals) is a subset of V,
● R (the set of rules) is a finite subset of (V − Σ) × V*,
● S (the start symbol) is an element of V − Σ.
Example: ({S, a, b}, {a, b}, {S → aSb, S → ε}, S)

Derivations
x ⇒_G y iff x = αAβ and A → γ is in R and y = αγβ.
w_0 ⇒_G w_1 ⇒_G w_2 ⇒_G ... ⇒_G w_n is a derivation in G.
Let ⇒_G* be the reflexive, transitive closure of ⇒_G.
Then the language generated by G, denoted L(G), is: {w ∈ Σ* : S ⇒_G* w}.

An Example Derivation
Example: Let G = ({S, a, b}, {a, b}, {S → aSb, S → ε}, S)
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb
S ⇒* aaabbb
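The one-step relation and the set L(G) can be explored with a small Python sketch. It assumes single-character symbols, rules given as (lhs, rhs) pairs with "" standing for ε, and that every nonterminal appears on some left-hand side; the names one_step and language_up_to are illustrative only. The search prunes forms with too many terminals, which is enough to terminate for the example grammar but is not a general-purpose enumeration method.

```python
from collections import deque

RULES = [("S", "aSb"), ("S", "")]  # S -> aSb, S -> epsilon

def one_step(form, rules):
    """All y with form => y: rewrite one nonterminal occurrence by one rule."""
    out = []
    for i, sym in enumerate(form):
        for lhs, rhs in rules:
            if sym == lhs:
                out.append(form[:i] + rhs + form[i + 1:])
    return out

def language_up_to(rules, start, max_len):
    """Terminal strings w with start =>* w and len(w) <= max_len (breadth-first)."""
    nonterminals = {lhs for lhs, _ in rules}
    seen, queue, result = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        if all(c not in nonterminals for c in form):
            result.add(form)  # no nonterminals left: a generated string
            continue
        for nxt in one_step(form, rules):
            n_terminals = sum(c not in nonterminals for c in nxt)
            if n_terminals <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return result

print(one_step("aSb", RULES))
print(sorted(language_up_to(RULES, "S", 6), key=len))
```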

Definition of a Context-Free Language
A language L is context-free iff it is generated by some context-free grammar G.

Recursive Grammar Rules
A rule is recursive iff it is X → w_1 Y w_2, where Y ⇒* w_3 X w_4 for some w_1, w_2, w_3, and w_4 in V*.
A grammar is recursive iff it contains at least one recursive rule.
Examples:
● S → (S)
● S → (T), T → (S)

Self-Embedding Grammar Rules
A rule in a grammar G is self-embedding iff it is X → w_1 Y w_2, where Y ⇒* w_3 X w_4 and both w_1 w_3 and w_4 w_2 are in Σ+.
A grammar is self-embedding iff it contains at least one self-embedding rule.
Examples:
● S → aSa is self-embedding.
● S → aS is recursive but not self-embedding.
● S → aT, T → Sa is self-embedding.

Where Context-Free Grammars Get Their Power
If a grammar G is not self-embedding then L(G) is regular.
If a language L has the property that every grammar that defines it is self-embedding, then L is not regular.

PalEven = {ww^R : w ∈ {a, b}*}
G = ({S, a, b}, {a, b}, R, S), where:
R = { S → aSa, S → bSb, S → ε }.

Equal Numbers of a’s and b’s
Let L = {w ∈ {a, b}* : #a(w) = #b(w)}.
G = ({S, a, b}, {a, b}, R, S), where:
R = { S → aSb, S → bSa, S → SS, S → ε }.

Arithmetic Expressions
G = (V, Σ, R, E), where
V = {+, *, (, ), id, E},
Σ = {+, *, (, ), id},
R = { E → E + E, E → E * E, E → (E), E → id }

BNF
A notation for writing practical context-free grammars.
The symbol | should be read as “or”.
Example: S → aSb | bSa | SS | ε
Allow a nonterminal symbol to be any sequence of characters surrounded by angle brackets.
Examples of nonterminals:

BNF for a Java Fragment ::= { } | {} ::= | ::= | while ( ) | if ( ) | do while ( ); | ; | return | return | ; 41

Spam Generation
These production rules yield 1,843,200 possible spellings.
How Many Ways Can You Spell, by Brian Hayes, American Scientist, July-August

HTML
<ul>
<li>Item 1, which will include a sublist</li>
<ul>
<li>First item in sublist</li>
<li>Second item in sublist</li>
</ul>
<li>Item 2</li>
</ul>
A grammar:
/* Text is a sequence of elements.
HTMLtext → Element HTMLtext | ε
Element → UL | LI | … (and other kinds of elements that are allowed in the body of an HTML document)
/* The <ul> and </ul> tags must match.
UL → <ul> HTMLtext </ul>
/* The <li> and </li> tags must match.
LI → <li> HTMLtext </li>

English
S → NP VP
NP → the Nominal | a Nominal | Nominal | ProperNoun | NP PP
Nominal → N | Adjs N
N → cat | dogs | bear | girl | chocolate | rifle
ProperNoun → Chris | Fluffy
Adjs → Adj Adjs | Adj
Adj → young | older | smart
VP → V | V NP | VP PP
V → like | likes | thinks | shots | smells
PP → Prep NP
Prep → with

Designing Context-Free Grammars
● Generate related regions together: A^nB^n
● Generate concatenated regions: A → BC
● Generate outside in: A → aAb

Outside-In Structure and RNA Folding

A Grammar for RNA Folding  [1]  C G[.23]  G C[.23]  A U[.23]  U A[.23]  G U[.03]  U G[.03]  … 47

Concatenating Independent Languages
Let L = {a^n b^n c^m : n, m ≥ 0}.
The c^m portion of any string in L is completely independent of the a^n b^n portion, so we should generate the two portions separately and concatenate them together.
G = ({S, N, C, a, b, c}, {a, b, c}, R, S), where:
R = { S → NC, N → aNb, N → ε, C → cC, C → ε }.

L = {a^(n_1) b^(n_1) a^(n_2) b^(n_2) … a^(n_k) b^(n_k) : k ≥ 0 and ∀i (n_i ≥ 0)}
Examples of strings in L: ε, abab, aabbaaabbbabab
Note that L = {a^n b^n : n ≥ 0}*.
G = ({S, M, a, b}, {a, b}, R, S), where:
R = { S → MS, S → ε, M → aMb, M → ε }.

Another Example: Unequal a’s and b’s
L = {a^n b^m : n ≠ m}
G = (V, Σ, R, S), where V = {a, b, S, A, B}, Σ = {a, b}, R =
S → A      /* more a’s than b’s
S → B      /* more b’s than a’s
A → a      /* at least one extra a generated
A → aA
A → aAb
B → b      /* at least one extra b generated
B → Bb
B → aBb

Simplifying Context-Free Grammars
G = ({S, A, B, C, D, a, b}, {a, b}, R, S), where
R = { S → AB | AC
      A → aAb | ε
      B → aA
      C → bCa
      D → AB }

Unproductive Nonterminals
removeunproductive(G: CFG) =
1. G′ = G.
2. Mark every nonterminal symbol in G′ as unproductive.
3. Mark every terminal symbol in G′ as productive.
4. Until one entire pass has been made without any new symbol being marked do:
   For each rule X → α in R do:
      If every symbol in α has been marked as productive and X has not yet been marked as productive then:
         Mark X as productive.
5. Remove from G′ every unproductive symbol.
6. Remove from G′ every rule that contains an unproductive symbol.
7. Return G′.
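The marking loop above can be sketched in Python as follows. The representation is an assumption made for illustration: each rule is an (lhs, rhs) pair of single-character symbols, "" stands for ε, and the function name remove_unproductive is hypothetical. The example grammar is the one from the "Simplifying Context-Free Grammars" slide.

```python
def remove_unproductive(rules, terminals):
    """Return the rules that mention only productive symbols.

    Assumption: rules are (lhs, rhs) pairs of single-character symbols.
    """
    productive = set(terminals)          # step 3: terminals are productive
    changed = True
    while changed:                       # step 4: repeat until a pass marks nothing
        changed = False
        for lhs, rhs in rules:
            # An empty rhs (epsilon) trivially consists of productive symbols.
            if lhs not in productive and all(s in productive for s in rhs):
                productive.add(lhs)
                changed = True
    # steps 5-6: drop every rule containing an unproductive symbol
    return [(lhs, rhs) for lhs, rhs in rules
            if lhs in productive and all(s in productive for s in rhs)]

G = [("S", "AB"), ("S", "AC"), ("A", "aAb"), ("A", ""),
     ("B", "aA"), ("C", "bCa"), ("D", "AB")]
print(remove_unproductive(G, {"a", "b"}))
```

In this grammar C can never derive a terminal string, so C → bCa and S → AC are the rules that disappear.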

Unreachable Nonterminals
removeunreachable(G: CFG) =
1. G′ = G.
2. Mark S as reachable.
3. Mark every other nonterminal symbol as unreachable.
4. Until one entire pass has been made without any new symbol being marked do:
   For each rule X → αAβ (where A ∈ V − Σ) in R do:
      If X has been marked as reachable and A has not then:
         Mark A as reachable.
5. Remove from G′ every unreachable symbol.
6. Remove from G′ every rule with an unreachable symbol on the left-hand side.
7. Return G′.
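A matching Python sketch of removeunreachable, under the same illustrative assumptions: rules are (lhs, rhs) pairs of single-character symbols, the nonterminals are taken to be the symbols that occur on some left-hand side, and remove_unreachable is a hypothetical name. The input is the example grammar after its unproductive rules have been removed.

```python
def remove_unreachable(rules, start):
    """Return the rules whose left-hand side is reachable from start.

    Assumption: nonterminals are exactly the symbols that appear as some lhs.
    """
    nonterminals = {lhs for lhs, _ in rules}
    reachable = {start}                  # step 2: the start symbol is reachable
    changed = True
    while changed:                       # step 4: repeat until a pass marks nothing
        changed = False
        for lhs, rhs in rules:
            if lhs in reachable:
                for sym in rhs:
                    if sym in nonterminals and sym not in reachable:
                        reachable.add(sym)
                        changed = True
    # step 6: drop rules with an unreachable left-hand side
    return [(lhs, rhs) for lhs, rhs in rules if lhs in reachable]

G1 = [("S", "AB"), ("A", "aAb"), ("A", ""), ("B", "aA"), ("D", "AB")]
print(remove_unreachable(G1, "S"))
```

Here D never appears in anything derivable from S, so D → AB is removed.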

Proving the Correctness of a Grammar
A^nB^n = {a^n b^n : n ≥ 0}
G = ({S, a, b}, {a, b}, R, S), R = { S → aSb, S → ε }
● Prove that G generates only strings in L.
● Prove that G generates all the strings in L.

Proving the Correctness of a Grammar
To prove that G generates only strings in L:
Imagine the process by which G generates a string as the following loop:
1. st := S.
2. Until no nonterminals are left in st do:
   2.1. Apply some rule in R to st.
3. Output st.
Then we construct a loop invariant I and show that:
● I is true when the loop begins,
● I is maintained at each step through the loop, and
● I ∧ (st contains only terminal symbols) → st ∈ L.

Proving the Correctness of a Grammar
A^nB^n = {a^n b^n : n ≥ 0}. G = ({S, a, b}, {a, b}, R, S), R = { S → aSb, S → ε }.
● Prove that G generates only strings in L:
Let I = (#a(st) = #b(st)) ∧ (st ∈ a*(S ∪ ε)b*).

Proving the Correctness of a Grammar
A^nB^n = {a^n b^n : n ≥ 0}. G = ({S, a, b}, {a, b}, R, S), R = { S → aSb (1), S → ε (2) }.
● Prove that G generates all the strings in L:
Base case: |w| = 0. Then w = ε, which G generates by applying rule (2).
Prove: If every string in A^nB^n of length k, where k is even, can be generated by G, then every string in A^nB^n of length k + 2 can also be generated.
For any even k, there is exactly one string in A^nB^n of length k: a^(k/2) b^(k/2). There is also only one string of length k + 2, namely a a^(k/2) b^(k/2) b. It can be generated by first applying rule (1) to produce aSb, and then applying to S whatever rule sequence generated a^(k/2) b^(k/2). By the induction hypothesis, such a sequence must exist.

L = {w ∈ {a, b}* : #a(w) = #b(w)}
G = ({S, a, b}, {a, b}, R, S), where:
R = { S → aSb (1)
      S → bSa (2)
      S → SS (3)
      S → ε (4) }.
● Prove that G generates only strings in L:
Let Δ(w) = #a(w) − #b(w).
Let I = st ∈ {a, b, S}* ∧ Δ(st) = 0.

L = {w ∈ {a, b}* : #a(w) = #b(w)}
G = ({S, a, b}, {a, b}, R, S), where:
R = { S → aSb (1), S → bSa (2), S → SS (3), S → ε (4) }.
● Prove that G generates all the strings in L:
Base case: |w| = 0. Then w = ε, which G generates by applying rule (4).
Induction step: if every string in L of length at most k can be generated, then every string w in L of length k + 2 can be. w is one of: axb, bxa, axa, or bxb.
Suppose w is axb or bxa: Apply rule (1) or (2), then whatever sequence generates x.
Suppose w is axa or bxb:

L = {w ∈ {a, b}* : #a(w) = #b(w)}
G = ({S, a, b}, {a, b}, R, S), where:
R = { S → aSb (1), S → bSa (2), S → SS (3), S → ε (4) }.
Suppose w is axa: |w| ≥ 4. We show that w = vy, where v and y are in L, 2 ≤ |v| ≤ k, and 2 ≤ |y| ≤ k.
If that is so, then G can generate w by first applying rule (3) to produce SS, and then generating v from the first S and y from the second S. By the induction hypothesis, it must be possible for it to do that since both v and y have length ≤ k.

L = {w ∈ {a, b}* : #a(w) = #b(w)}
G = ({S, a, b}, {a, b}, R, S), where:
R = { S → aSb (1), S → bSa (2), S → SS (3), S → ε (4) }.
Suppose w is axa: we show that w = vy, where v and y are in L, 2 ≤ |v| ≤ k, and 2 ≤ |y| ≤ k.
Build up w one character at a time. After one character, we have a. Δ(a) = 1. Since w ∈ L, Δ(w) = 0. So Δ(ax) = −1. The value of Δ changes by exactly 1 each time a symbol is added to a string. Since Δ is positive when only a single character has been added and becomes negative by the time the string ax has been built, it must at some point before then have been 0.
Let v be the shortest nonempty prefix of w to have a value of 0 for Δ. Since v is nonempty and only even length strings can have Δ equal to 0, 2 ≤ |v|. Since Δ became 0 sometime before w became ax, v must be at least two characters shorter than w, so |v| ≤ k. Since Δ(v) = 0, v ∈ L.
Since w = vy, we know bounds on the length of y: 2 ≤ |y| ≤ k. Since Δ(w) = 0 and Δ(v) = 0, Δ(y) must also be 0 and so y ∈ L.
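The split w = vy used in this argument can be computed directly. The Python sketch below is only an illustration of the proof step, not part of the original slides; delta and split_point are hypothetical names, and the input is assumed to be a string over {a, b} with equally many a's and b's.

```python
def delta(w):
    """Delta(w) = #a(w) - #b(w)."""
    return w.count("a") - w.count("b")

def split_point(w):
    """Return (v, y) where v is the shortest nonempty prefix of w with delta(v) == 0.

    Assumption: delta(w) == 0, so such a prefix always exists (w itself at worst).
    """
    for i in range(1, len(w) + 1):
        if delta(w[:i]) == 0:
            return w[:i], w[i:]
    raise ValueError("w has no prefix with equal numbers of a's and b's")

# w = axa with x = "bb": the prefix "ab" is the first to balance.
v, y = split_point("abba")
print(v, y, delta(v), delta(y))
```

For a string of the form axa in L, both returned pieces are nonempty and balanced, which is what lets rule (3) split the derivation into two smaller ones.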

Accepting Strings
Regular languages: We care about recognizing patterns and taking appropriate actions.

Structure
Context-free languages: We care about structure.
[Figure: parse tree for id + id * id, with the id leaves labeled 3, 5, and 7.]

Derivations
To capture structure, we must capture the path we took through the grammar. Derivations do that.
Example: S → ε, S → SS, S → (S)
S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ (())S ⇒ (())(S) ⇒ (())()
S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ ((S))(S) ⇒ (())(S) ⇒ (())()
But the order of rule application doesn’t matter.

Derivations
Parse trees capture essential structure:
S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ (())S ⇒ (())(S) ⇒ (())()
S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ ((S))(S) ⇒ (())(S) ⇒ (())()
[Figure: the single parse tree that corresponds to both derivations.]

Parse Trees
A parse tree, derived by a grammar G = (V, Σ, R, S), is a rooted, ordered tree in which:
● Every leaf node is labeled with an element of Σ ∪ {ε},
● The root node is labeled S,
● Every other node is labeled with some element of V − Σ, and
● If m is a nonleaf node labeled X and the children of m are labeled x_1, x_2, …, x_n, then R contains the rule X → x_1 x_2 … x_n.

Structure in English
[Figure: parse tree for “the smart cat smells chocolate”, with nodes S, NP, VP, Nominal, Adjs, Adj, N, and V.]

Generative Capacity
Because parse trees matter, it makes sense, given a grammar G, to distinguish between:
● G’s weak generative capacity, defined to be the set of strings, L(G), that G generates, and
● G’s strong generative capacity, defined to be the set of parse trees that G generates.

Algorithms Care How We Search
Algorithms for generation and recognition must be systematic. They typically use either the leftmost derivation or the rightmost derivation.
[Figure: parse tree for a string of balanced parentheses.]

Derivations of The Smart Cat
A left-most derivation is:
S ⇒ NP VP ⇒ the Nominal VP ⇒ the Adjs N VP ⇒ the Adj N VP ⇒ the smart N VP ⇒ the smart cat VP ⇒ the smart cat V NP ⇒ the smart cat smells NP ⇒ the smart cat smells Nominal ⇒ the smart cat smells N ⇒ the smart cat smells chocolate
A right-most derivation is:
S ⇒ NP VP ⇒ NP V NP ⇒ NP V Nominal ⇒ NP V N ⇒ NP V chocolate ⇒ NP smells chocolate ⇒ the Nominal smells chocolate ⇒ the Adjs N smells chocolate ⇒ the Adjs cat smells chocolate ⇒ the Adj cat smells chocolate ⇒ the smart cat smells chocolate

Derivation is Not Necessarily Unique
The same is true for regular languages too.
Regular expression: (a ∪ b)*a(a ∪ b)*
● choose a from (a ∪ b), choose a from (a ∪ b), choose a
● choose a, choose a from (a ∪ b), choose a from (a ∪ b)
Regular grammar:
S → a, S → bS, S → aS, S → aT
T → a, T → b, T → aT, T → bT

Ambiguity
A grammar is ambiguous iff there is at least one string in L(G) for which G produces more than one parse tree.
For most applications of context-free grammars, this is a problem.

An Arithmetic Expression Grammar
E → E + E
E → E * E
E → (E)
E → id
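To see how ambiguous this grammar is, the number of parse trees for a token sequence can be counted. The Python sketch below is an illustration written for exactly the four rules above (it is not a general parser); count_trees is a hypothetical name and the input is assumed to be a list of tokens drawn from id, +, *, ( and ).

```python
from functools import lru_cache

def count_trees(tokens):
    """Count parse trees for tokens under E -> E+E | E*E | (E) | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        """Number of ways E derives tokens[i:j]."""
        n = 0
        if j - i == 1 and tokens[i] == "id":
            n += 1                                   # E -> id
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            n += count(i + 1, j - 1)                 # E -> (E)
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):              # E -> E + E or E -> E * E
                n += count(i, k) * count(k + 1, j)
        return n

    return count(0, len(tokens))

print(count_trees("id + id * id".split()))
print(count_trees("id + id + id + id".split()))
```

Every choice of the top-level operator gives a different tree, so the count grows quickly with the number of operators.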

Even a Very Simple Grammar Can be Highly Ambiguous
S → ε
S → SS
S → (S)

Inherent Ambiguity
Some languages have the property that every grammar for them is ambiguous. We call such languages inherently ambiguous.
Example: L = {a^n b^n c^m : n, m ≥ 0} ∪ {a^n b^m c^m : n, m ≥ 0}.

Inherent Ambiguity
L = {a^n b^n c^m : n, m ≥ 0} ∪ {a^n b^m c^m : n, m ≥ 0}.
One grammar for L has the rules:
S → S_1 | S_2
S_1 → S_1 c | A      /* Generate all strings in {a^n b^n c^m}.
A → aAb | ε
S_2 → a S_2 | B      /* Generate all strings in {a^n b^m c^m}.
B → bBc | ε
Consider any string of the form a^n b^n c^n. It belongs to both halves of the union, so this grammar derives it both through S_1 and through S_2.
L is inherently ambiguous.

Inherent Ambiguity
Both of the following problems are undecidable:
● Given a context-free grammar G, is G ambiguous?
● Given a context-free language L, is L inherently ambiguous?

But We Can Often Reduce Ambiguity
We can get rid of:
● ε rules like S → ε,
● rules with symmetric right-hand sides, e.g., S → SS, E → E + E,
● rule sets that lead to ambiguous attachment of optional postfixes.

A Highly Ambiguous Grammar
S → ε
S → SS
S → (S)

Resolving the Ambiguity with a Different Grammar
The biggest problem is the ε rule.
A different grammar for the language of balanced parentheses:
S* → ε
S* → S
S → SS
S → (S)
S → ()

Nullable Variables
Examples:
● S → aTa, T → ε
● S → aTa, T → AB, A → ε, B → ε

Nullable Variables
A variable X is nullable iff either:
(1) there is a rule X → ε, or
(2) there is a rule X → PQR… and P, Q, R, … are all nullable.
So compute N, the set of nullable variables, as follows:
1. Set N to the set of variables that satisfy (1).
2. Until an entire pass is made without adding anything to N do:
   Evaluate all other variables with respect to (2). If any variable satisfies (2) and is not in N, insert it.
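The computation of N can be sketched in Python. As an assumption for illustration, rules are (lhs, rhs) pairs of single-character symbols with "" standing for ε, and nullable_variables is a hypothetical name. The example rules are S → aTa, T → ABC, A → aA | C, B → Bb | C, C → c | ε.

```python
def nullable_variables(rules):
    """Return the set N of nullable variables.

    Assumption: rules are (lhs, rhs) pairs; rhs == "" encodes an epsilon rule.
    """
    # Step 1: variables with a direct epsilon rule.
    nullable = {lhs for lhs, rhs in rules if rhs == ""}
    changed = True
    while changed:  # Step 2: repeat until a pass adds nothing
        changed = False
        for lhs, rhs in rules:
            if lhs not in nullable and rhs and all(s in nullable for s in rhs):
                nullable.add(lhs)
                changed = True
    return nullable

R = [("S", "aTa"), ("T", "ABC"), ("A", "aA"), ("A", "C"),
     ("B", "Bb"), ("B", "C"), ("C", "c"), ("C", "")]
print(sorted(nullable_variables(R)))
```

S is not nullable here because its only rule contains the terminal a.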

A General Technique for Getting Rid of ε-Rules
Definition: a rule is modifiable iff it is of the form P → αQβ, for some nullable Q.
removeEps(G: cfg) =
1. Let G′ = G.
2. Find the set N of nullable variables in G′.
3. Repeat until G′ contains no modifiable rules that haven’t been processed:
   Given the rule P → αQβ, where Q ∈ N, add the rule P → αβ if it is not already present and if αβ ≠ ε and if P ≠ αβ.
4. Delete from G′ all rules of the form X → ε.
5. Return G′.
L(G′) = L(G) − {ε}
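A possible Python rendering of removeEps, again assuming rules are (lhs, rhs) pairs of single-character symbols with "" for ε. The name remove_eps and the worklist organisation are illustrative choices; the nullable set is recomputed inline so that the sketch is self-contained. The example rules are S → aTa, T → ABC, A → aA | C, B → Bb | C, C → c | ε.

```python
def remove_eps(rules):
    """Return an epsilon-free rule set generating L(G) minus the empty string."""
    # Step 2: nullable variables (fixed-point computation).
    nullable = {lhs for lhs, rhs in rules if rhs == ""}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in nullable and rhs and all(s in nullable for s in rhs):
                nullable.add(lhs)
                changed = True
    # Step 3: process modifiable rules until none is left unprocessed.
    result = set(rules)
    work = list(rules)
    while work:
        lhs, rhs = work.pop()
        for i, sym in enumerate(rhs):
            if sym in nullable:
                new_rhs = rhs[:i] + rhs[i + 1:]      # drop one nullable occurrence
                new_rule = (lhs, new_rhs)
                if new_rhs != "" and new_rhs != lhs and new_rule not in result:
                    result.add(new_rule)
                    work.append(new_rule)
    # Step 4: delete the epsilon rules themselves.
    return {(lhs, rhs) for lhs, rhs in result if rhs != ""}

R = [("S", "aTa"), ("T", "ABC"), ("A", "aA"), ("A", "C"),
     ("B", "Bb"), ("B", "C"), ("C", "c"), ("C", "")]
for lhs, rhs in sorted(remove_eps(R)):
    print(lhs, "->", rhs)
```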

An Example
G = ({S, T, A, B, C, a, b, c}, {a, b, c}, R, S),
R = { S → aTa
      T → ABC
      A → aA | C
      B → Bb | C
      C → c | ε }

What If ε ∈ L?
atmostoneEps(G: cfg) =
1. G″ = removeEps(G).
2. If S_G is nullable then      /* i.e., ε ∈ L(G)
   2.1 Create in G″ a new start symbol S*.
   2.2 Add to R_G″ the two rules: S* → ε and S* → S_G.
3. Return G″.

But There is Still Ambiguity
S* → ε
S* → S
S → SS
S → (S)
S → ()
What about ()()() ?

Eliminating Symmetric Recursive Rules
S* → ε
S* → S
S → SS
S → (S)
S → ()
Replace S → SS with one of:
S → S S_1      /* force branching to the left
S → S_1 S      /* force branching to the right
So we get:
S* → ε
S* → S
S → S S_1
S → S_1
S_1 → (S)
S_1 → ()

Eliminating Symmetric Recursive Rules
So we get:
S* → ε
S* → S
S → S S_1
S → S_1
S_1 → (S)
S_1 → ()
[Figure: the left-branching parse tree for ()()().]

Arithmetic Expressions
E → E + E
E → E * E
E → (E)
E → id
Problem 1: Associativity
[Figure: parse tree for id * id * id.]

Arithmetic Expressions
E → E + E
E → E * E
E → (E)
E → id
Problem 2: Precedence
[Figure: parse tree for id * id + id.]

Arithmetic Expressions - A Better Way
E → E + T
E → T
T → T * F
T → F
F → (E)
F → id
Examples: id + id * id, id * id * id
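The effect of the layered grammar can be seen in a small parser sketch. Note the swapped-in technique: a recursive-descent parser cannot use the left-recursive rules E → E + T and T → T * F directly, so this Python sketch replaces them with loops that build the same left-branching trees. The function name parse and the tuple-based tree encoding are assumptions made for illustration.

```python
def parse(tokens):
    """Parse tokens for E -> E+T | T, T -> T*F | F, F -> (E) | id.

    Returns nested tuples (op, left, right); left recursion is handled by loops.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}")
        pos += 1

    def expr():                      # E -> E + T | T
        node = term()
        while peek() == "+":
            eat("+")
            node = ("+", node, term())   # left-associative
        return node

    def term():                      # T -> T * F | F
        node = factor()
        while peek() == "*":
            eat("*")
            node = ("*", node, factor())
        return node

    def factor():                    # F -> (E) | id
        if peek() == "(":
            eat("(")
            node = expr()
            eat(")")
            return node
        eat("id")
        return "id"

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r} at position {pos}")
    return tree

print(parse("id + id * id".split()))
print(parse("id * id * id".split()))
```

Because * is introduced one level below + in the grammar, it ends up lower in the tree, which gives it higher precedence; the loops give each operator left associativity.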

The Language of Boolean Logic
G = (V, Σ, R, E), where
V = {¬, ∧, ∨, →, (, ), id, E, E_1, E_2, E_3, E_4},
Σ = {¬, ∧, ∨, →, (, ), id},
R = { E → E → E_1      (the second arrow is the implication operator)
      E → E_1
      E_1 → E_1 ∨ E_2
      E_1 → E_2
      E_2 → E_2 ∧ E_3
      E_2 → E_3
      E_3 → ¬E_3
      E_3 → E_4
      E_4 → (E)
      E_4 → id }

Boolean Logic isn’t Regular
Suppose BL were regular. Then there is a k as specified in the Pumping Theorem.
Let w be a string of length 2k + 1 of the form: w = (^k id )^k
Write w = xyz as in the Pumping Theorem. Since |xy| ≤ k, both x and y fall within the initial run of k (’s, so y = (^p for some p > 0.
Then the string that is identical to w except that it has p additional (’s at the beginning would also be in BL. But it can’t be because the parentheses would be mismatched. So BL is not regular.

Ambiguous Attachment
The dangling else problem:
<stmt> ::= if <cond> then <stmt>
<stmt> ::= if <cond> then <stmt> else <stmt>
Consider: if cond_1 then if cond_2 then st_1 else st_2

::= | | ::= | | … ::= if ( ) else ::= if ( ) else if (cond) else The Java Fix 104

Java Audit Rules Try to Catch These
From the CodePro Audit Rule Set:
Dangling Else
Severity: Medium
Summary: Use blocks to prevent dangling else clauses.
Description: This audit rule finds places in the code where else clauses are not preceded by a block because these can lead to dangling else errors.
Example:
if (a > 0) if (a > 100) b = a - 100; else b = -a;

Proving that G is Unambiguous
A grammar G is unambiguous iff every string derivable in G has a single leftmost derivation.
S* → ε (1)
S* → S (2)
S → S S_1 (3)
S → S_1 (4)
S_1 → (S) (5)
S_1 → () (6)
● S*:
● S_1: If the next two characters to be derived are (), S_1 must expand by rule (6). Otherwise, it must expand by rule (5).

The Proof, Continued
S* → ε (1)
S* → S (2)
S → S S_1 (3)
S → S_1 (4)
S_1 → (S) (5)
S_1 → () (6)
The siblings of m is the smallest set that includes any matched set p adjacent to m and all of p’s siblings.
Example: ( ( ) ( ) ) ( ) ( )
The set () labeled 1 has a single sibling, 2. The set (()()) labeled 5 has two siblings, 3 and 4.

The Proof, Continued
S* → ε (1)
S* → S (2)
S → S S_1 (3)
S → S_1 (4)
S_1 → (S) (5)
S_1 → () (6)
● S:
  ● S must generate a matched set, possibly with siblings.
  ● So the first terminal character in any string that S generates is (. Call the string that starts with that ( and ends with the ) that matches it, s.
  ● S_1 must generate a single matched set with no siblings.
  ● Let n be the number of siblings of s. In order to generate those siblings, S must expand by rule (3) exactly n times before it expands by rule (4).

The Proof, Continued
S* → ε (1)
S* → S (2)
S → S S_1 (3)
S → S_1 (4)
S_1 → (S) (5)
S_1 → () (6)
● S:
Example: ((()())) () () (()()), where s is the first matched set, ((()())).
s has 3 siblings. S must expand by rule (3) 3 times before it uses rule (4).
Let p be the number of occurrences of S_1 to the right of S.
If p < n, S must expand by rule (3).
If p = n, S must expand by rule (4).

Going Too Far
S → NP VP
NP → the Nominal | Nominal | ProperNoun | NP PP
Nominal → N | Adjs N
N → cat | girl | dogs | ball | chocolate | bat
ProperNoun → Chris | Fluffy
Adjs → Adj Adjs | Adj
Adj → young | older | smart
VP → V | V NP | VP PP
V → like | likes | thinks | hits
PP → Prep NP
Prep → with
● Chris likes the girl with the cat.
● Chris shot the bear with a rifle.

Comparing Regular and Context-Free Languages
Regular Languages: ● regular exprs. or regular grammars ● recognize
Context-Free Languages: ● context-free grammars ● parse

A Testimonial
Also, you will be happy to know that I just made use of the context-free grammar skills I learned in your class! I am working on Firefox at IBM this summer and just found an inconsistency between how the native Firefox code and a plugin by Adobe parse SVG path data elements. In order to figure out which code base exhibits the correct behavior I needed to trace through the grammar. Thanks to your class I was able to determine that the bug is in the Adobe plugin. Go OpenSource!