Context-Free Grammars Chapter 11. Languages and Machines.

Presentation transcript:

Context-Free Grammars Chapter 11

Languages and Machines

Rewrite Systems and Grammars A rewrite system (or production system or rule-based system) is: ● a list of rules, and ● an algorithm for applying them. Each rule has a left-hand side and a right-hand side. Example rules: S → aSb, aS → ε, aSb → bSabSa

Simple-rewrite simple-rewrite(R: rewrite system, w: initial string) =
1. Set working-string to w.
2. Until told by R to halt do: match the lhs of some rule against some part of working-string, then replace the matched part of working-string with the rhs of the rule that was matched.
3. Return working-string.
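The simple-rewrite procedure above can be sketched in Python. This is a minimal sketch, not the slides' definition: the names `simple_rewrite`, `first_match`, `choose`, and `max_steps` are hypothetical, and two choices that the slides leave open are assumptions here. The sketch halts when no rule matches, and by default it applies the first candidate match.

```python
def first_match(matches):
    """Default policy (an assumption): take the first candidate match."""
    return matches[0]


def simple_rewrite(rules, w, choose=first_match, max_steps=1000):
    """Rewrite w with rules until nothing matches or max_steps is reached.

    rules: list of (lhs, rhs) string pairs; "" as rhs stands for ε.
    choose: picks one (position, lhs, rhs) triple from the candidates.
    """
    working = w
    for _ in range(max_steps):
        # Collect every place where the lhs of some rule matches.
        matches = [
            (i, lhs, rhs)
            for lhs, rhs in rules
            for i in range(len(working) - len(lhs) + 1)
            if working.startswith(lhs, i)
        ]
        if not matches:
            break  # assumed halting condition: no rule applies
        i, lhs, rhs = choose(matches)
        working = working[:i] + rhs + working[i + len(lhs):]
    return working
```

For example, `simple_rewrite([("aS", ""), ("S", "aSb")], "SaS")` returns `"b"`, while listing the rules in the other order keeps expanding S until `max_steps` runs out. The outcome depends on how rules are chosen, which is why a formalism has to pin that down.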

A Rewrite System Formalism A rewrite system formalism specifies: ● The form of the rules ● How simple-rewrite works: ● How to choose rules? ● When to quit?

An Example w = SaS Rules: [1] S → aSb [2] aS → ε ● What order to apply the rules? ● When to quit?

Rule Based Systems ● Expert systems ● Cognitive modeling ● Business practice modeling ● General models of computation ● Grammars

Grammars Define Languages A grammar is a set of rules that are stated in terms of two alphabets: a terminal alphabet, Σ, that contains the symbols that make up the strings in L(G), and a nonterminal alphabet, the elements of which will function as working symbols that will be used while the grammar is operating. These symbols will disappear by the time the grammar finishes its job and generates a string. A grammar has a unique start symbol, often called S.

Using a Grammar to Derive a String Simple-rewrite (G, S) will generate the strings in L(G). We will use the symbol ⇒ to indicate steps in a derivation. A derivation could begin with: S ⇒ aSb ⇒ aaSbb ⇒ …

Generating Many Strings Multiple rules may match. Given: S → aSb, S → bSa, and S → ε. Derivation so far: S ⇒ aSb ⇒ aaSbb ⇒ Three choices at the next step:
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb (using rule 1),
S ⇒ aSb ⇒ aaSbb ⇒ aabSabb (using rule 2),
S ⇒ aSb ⇒ aaSbb ⇒ aabb (using rule 3).

Generating Many Strings One rule may match in more than one way. Given: S → aTTb, T → bTa, and T → ε. Derivation so far: S ⇒ aTTb ⇒ Two choices at the next step:
S ⇒ aTTb ⇒ abTaTb ⇒
S ⇒ aTTb ⇒ aTbTab ⇒
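Both kinds of choice (several rules matching, and one rule matching in several places) can be listed mechanically. The following is a small sketch; the function name `one_step` and the representation of rules as (lhs, rhs) string pairs, with "" standing for ε, are assumptions made for illustration.

```python
def one_step(working, rules):
    """Return every string reachable from `working` by one rule application.

    Results are ordered by rule, then by match position, left to right.
    """
    results = []
    for lhs, rhs in rules:
        start = working.find(lhs)
        while start != -1:
            results.append(working[:start] + rhs + working[start + len(lhs):])
            start = working.find(lhs, start + 1)
    return results
```

With the first rule set, `one_step("aaSbb", [("S", "aSb"), ("S", "bSa"), ("S", "")])` gives the three continuations `aaaSbbb`, `aabSabb`, and `aabb`. With the single rule T → bTa, `one_step("aTTb", [("T", "bTa")])` gives `abTaTb` and `aTbTab`.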

When to Stop May stop when: 1. The working string no longer contains any nonterminal symbols (including when it is ε). In this case, we say that the working string is generated by the grammar. Example: S ⇒ aSb ⇒ aaSbb ⇒ aabb

When to Stop May stop when: 2. There are nonterminal symbols in the working string but none of them appears on the left-hand side of any rule in the grammar. In this case, we have a blocked or non-terminated derivation but no generated string. Example: Rules: S → aSb, S → bTa, and S → ε. Derivations: S ⇒ aSb ⇒ abTab ⇒ [blocked]

When to Stop It is possible that neither (1) nor (2) is achieved. Example: G contains only the rules S → Ba and B → bB, with S the start symbol. Then all derivations proceed as: S ⇒ Ba ⇒ bBa ⇒ bbBa ⇒ bbbBa ⇒ bbbbBa ⇒ ...
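The three situations (generated, blocked, or still able to continue) can be told apart by inspecting the working string. This sketch assumes context-free rules whose left-hand side is a single character; `status` and its return labels are hypothetical names.

```python
def status(working, rules, nonterminals):
    """Classify a working string as 'generated', 'blocked', or 'continue'."""
    # Case 1: no nonterminals left (this includes the empty string, ε).
    if not any(sym in nonterminals for sym in working):
        return "generated"
    # Case 2: nonterminals remain, but none is the lhs of any rule.
    lhs_symbols = {lhs for lhs, _ in rules}
    if not any(sym in lhs_symbols for sym in working):
        return "blocked"
    # Otherwise some rule still applies.
    return "continue"
```

For the grammar S → Ba, B → bB, every working string in a derivation is classified as `"continue"`, which matches the observation that neither stopping condition is ever reached.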

Context-free Grammars, Languages, and PDAs [Diagram: a context-free language L, a context-free grammar for L, and a PDA that accepts L.]

More Powerful Grammars Regular grammars must always produce strings one character at a time, moving left to right. But it may be more natural to describe generation more flexibly.
Example 1: L = ab*a. S → aBa, B → ε, B → bB vs. S → aB, B → a, B → bB
Example 2: L = {a^n b* a^n : n ≥ 0}. S → B, S → aSa, B → ε, B → bB

Context-Free Grammars No restrictions on the form of the right-hand sides: S → abDeFGab. But require a single nonterminal on the left-hand side: a rule may start with "S →" but not with "ASB →".

A^nB^n S → ε, S → aSb

Balanced Parentheses S → ε, S → SS, S → (S)

Context-Free Grammars A context-free grammar G is a quadruple, (V, Σ, R, S), where: ● V is the rule alphabet, which contains nonterminals and terminals, ● Σ (the set of terminals) is a subset of V, ● R (the set of rules) is a finite subset of (V − Σ) × V*, ● S (the start symbol) is an element of V − Σ. Example: ({S, a, b}, {a, b}, {S → aSb, S → ε}, S)

Derivations x ⇒_G y iff x = αAβ, A → γ is in R, and y = αγβ. w_0 ⇒_G w_1 ⇒_G w_2 ⇒_G … ⇒_G w_n is a derivation in G. Let ⇒_G* be the reflexive, transitive closure of ⇒_G. Then the language generated by G, denoted L(G), is: {w ∈ Σ* : S ⇒_G* w}.

An Example Derivation Example: Let G = ({S, a, b}, {a, b}, {S → aSb, S → ε}, S). S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb, so S ⇒* aaabbb
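To see L(G) for this grammar concretely, one can enumerate derivations breadth-first and collect the terminal strings up to a length bound. This is a sketch under stated assumptions: `generate` is a hypothetical name, nonterminals are single characters that appear as keys of `rules`, and only the leftmost nonterminal is expanded. The search is guaranteed to finish for grammars like this one, where the number of nonterminals in a working string cannot grow without bound. A rule such as S → SS would need an extra bound.

```python
from collections import deque


def generate(rules, start, max_len):
    """Return all strings of L(G) with length <= max_len.

    rules maps each nonterminal to a list of right-hand sides ("" is ε).
    """
    seen, found = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, if any.
        pos = next((i for i, s in enumerate(form) if s in rules), None)
        if pos is None:
            found.add(form)  # no nonterminals left: a generated string
            continue
        for rhs in rules[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            # Terminals never disappear, so prune forms that are too long.
            terminals = sum(1 for s in new if s not in rules)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found
```

Calling `generate({"S": ["aSb", ""]}, "S", 6)` yields the empty string, `ab`, `aabb`, and `aaabbb`, the last being the string derived on the slide.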

Definition of a Context-Free Language A language L is context-free iff it is generated by some context-free grammar G.

Recursive Grammar Rules A rule is recursive iff it is X → w_1Yw_2, where Y ⇒* w_3Xw_4 for some w_1, w_2, w_3, and w_4 in V*. A grammar is recursive iff it contains at least one recursive rule. Examples: 1. S → (S) 2. S → (T), T → (S)

Self-Embedding Grammar Rules A rule in a grammar G is self-embedding iff it is X → w_1Yw_2, where Y ⇒* w_3Xw_4 and both w_1w_3 and w_4w_2 are in Σ+. A grammar is self-embedding iff it contains at least one self-embedding rule. Examples: ● S → aSa is self-embedding ● S → aS is recursive but not self-embedding ● S → aT, T → Sa is self-embedding

Context-Free Grammars and Regular Languages If a grammar G is not self-embedding then L(G) is regular. If a language L has the property that every grammar that defines it is self-embedding, then L is not regular.

PalEven = {ww^R : w ∈ {a, b}*}

G = ({S, a, b}, {a, b}, R, S), where: R = {S → aSa, S → bSb, S → ε}.
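Because each rule of this grammar wraps S in a matching pair of symbols, membership can be checked by undoing the rules from the outside in. The function below is an illustrative sketch (the name `in_pal_even` is hypothetical), with one branch per rule.

```python
def in_pal_even(w):
    """True iff S =>* w under S -> aSa | bSb | ε."""
    if w == "":
        return True                      # S -> ε
    if len(w) >= 2 and w[0] == w[-1] and w[0] in "ab":
        return in_pal_even(w[1:-1])      # undo S -> aSa or S -> bSb
    return False
```

For example, `in_pal_even("abba")` is true, while `in_pal_even("aba")` is false because odd-length palindromes are not of the form ww^R.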

Equal Numbers of a's and b's Let L = {w ∈ {a, b}* : #a(w) = #b(w)}. G = ({S, a, b}, {a, b}, R, S), where: R = {S → aSb, S → bSa, S → SS, S → ε}.
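One direction of correctness for this grammar is easy to spot-check: every rule adds one a and one b, or neither, so every generated string has equal counts. The sketch below draws random derivations to illustrate that. The names `RULES` and `random_derivation` and the depth cap `max_depth` are assumptions; the cap forces S → ε so that the expansion always terminates.

```python
import random

RULES = ["aSb", "bSa", "SS", ""]  # right-hand sides for S; "" is ε


def random_derivation(rng, depth=0, max_depth=8):
    """Expand S with randomly chosen rules and return the generated string."""
    if depth >= max_depth:
        return ""  # force S -> ε so the recursion stops
    rhs = rng.choice(RULES)
    return "".join(
        random_derivation(rng, depth + 1, max_depth) if sym == "S" else sym
        for sym in rhs
    )
```

Checking `w.count("a") == w.count("b")` for `w = random_derivation(random.Random(seed))` over many seeds should never fail. This does not show the other direction, that every string with equal counts is generated.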

Arithmetic Expressions G = (V, Σ, R, E), where V = {+, *, (, ), id, E}, Σ = {+, *, (, ), id}, R = {E → E + E, E → E * E, E → (E), E → id}

BNF – Backus Naur Form A notation for writing practical context-free grammars, named for John Backus and Peter Naur. The symbol | should be read as “or”. Example: S → aSb | bSa | SS | ε. Allow a nonterminal symbol to be any sequence of characters surrounded by angle brackets. Examples of nonterminals: (the angle-bracketed examples are missing from this transcript)

BNF for a Java Fragment (the angle-bracketed nonterminal names are missing from this transcript)
::= { } | {}
::= |
::= | while ( ) | if ( ) | do while ( ); | ; | return; | return ; | ;

Spam Generation These production rules yield 1,843,200 possible spellings. Source: "How Many Ways Can You Spell" by Brian Hayes, American Scientist, July-August.

HTML Example (a nested unordered list; the tags are missing from this transcript):
● Item 1, which will include a sublist
    ● First item in sublist
    ● Second item in sublist
● Item 2
A grammar:
/* Text is a sequence of elements.
HTMLtext → Element HTMLtext | ε
Element → UL | LI | … (and other kinds of elements that are allowed in the body of an HTML document)
/* The <ul> and </ul> tags must match.
UL → <ul> HTMLtext </ul>
/* The <li> and </li> tags must match.
LI → <li> HTMLtext </li>

English
S → NP VP
NP → the Nominal | a Nominal | Nominal | ProperNoun | NP PP
Nominal → N | Adjs N
N → cat | dogs | bear | girl | chocolate | rifle
ProperNoun → Chris | Fluffy
Adjs → Adj Adjs | Adj
Adj → young | older | smart
VP → V | V NP | VP PP
V → like | likes | thinks | shots | smells
PP → Prep NP
Prep → with

Designing Context-Free Grammars ● Generate related regions together: A^nB^n ● Generate concatenated regions: A → BC ● Generate outside in: A → aAb

Concatenating Independent Languages Let L = {a^n b^n c^m : n, m ≥ 0}. The c^m portion of any string in L is completely independent of the a^n b^n portion, so we should generate the two portions separately and concatenate them together. G = ({S, N, C, a, b, c}, {a, b, c}, R, S), where: R = {S → NC, N → aNb, N → ε, C → cC, C → ε}.
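The same decomposition carries over to recognition: one routine per nonterminal, with S calling N and then C. The sketch below is a small recursive-descent recognizer written for this particular grammar. It is not a general parsing method, and the names `parse_N`, `parse_C`, and `in_L` are hypothetical. It works here because seeing an a is enough to decide that N → aNb must be used.

```python
def parse_N(w, i):
    """Try N -> aNb | ε from index i; return the index after N, or None."""
    if i < len(w) and w[i] == "a":           # predict N -> aNb
        j = parse_N(w, i + 1)
        if j is not None and j < len(w) and w[j] == "b":
            return j + 1
        return None                           # the a has no matching b
    return i                                  # N -> ε


def parse_C(w, i):
    """Apply C -> cC | ε from index i; return the index after C."""
    while i < len(w) and w[i] == "c":
        i += 1
    return i


def in_L(w):
    """True iff S -> NC derives all of w."""
    j = parse_N(w, 0)
    return j is not None and parse_C(w, j) == len(w)
```

For instance, `in_L("aabbc")` and `in_L("ccc")` are true, while `in_L("aab")` and `in_L("abab")` are false.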

L = {a^(n_1) b^(n_1) a^(n_2) b^(n_2) … a^(n_k) b^(n_k) : k ≥ 0 and ∀i (n_i ≥ 0)} Examples of strings in L: ε, abab, aabbaaabbbabab. Note that L = {a^n b^n : n ≥ 0}*. G = ({S, M, a, b}, {a, b}, R, S), where: R = {S → MS, S → ε, M → aMb, M → ε}.

Another Example: Unequal a's and b's L = {a^n b^m : n ≠ m} G = (V, Σ, R, S), where V = {a, b, S, A, B}, Σ = {a, b}, R =
S → A /* more a's than b's
S → B /* more b's than a's
A → a /* at least one extra a generated
A → aA
A → aAb
B → b /* at least one extra b generated
B → Bb
B → aBb

Accepting Strings Regular languages: We care about recognizing patterns and taking appropriate actions.

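Context free languages: We care about structure. [Figure: parse tree with root E and children E + E; the left E derives id (3), and the right E expands to E * E, whose leaves are id (5) and id (7).]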

Derivations To capture structure, we must capture the path we took through the grammar. Derivations do that. Example: S → ε, S → SS, S → (S)
S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ (())S ⇒ (())(S) ⇒ (())()
S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ ((S))(S) ⇒ (())(S) ⇒ (())()
But the order of rule application doesn’t matter.

Derivations Parse trees capture essential structure:
S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ (())S ⇒ (())(S) ⇒ (())()
S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ ((S))(S) ⇒ (())(S) ⇒ (())()
[Figure: the parse tree for (())() that corresponds to both derivations.]

Parse Trees A parse tree, derived by a grammar G = (V, Σ, R, S), is a rooted, ordered tree in which: ● Every leaf node is labeled with an element of Σ ∪ {ε}, ● The root node is labeled S, ● Every other node is labeled with some element of V − Σ, and ● If m is a nonleaf node labeled X and the children of m are labeled x_1, x_2, …, x_n, then R contains the rule X → x_1 x_2 … x_n.
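The definition can be made concrete with a small tree type plus two helpers: one reads off the string a tree derives (its yield), and one checks each nonleaf node against the rules. This is an illustrative sketch for the balanced-parentheses grammar. `Node`, `tree_yield`, `conforms`, `RULES`, and `EXAMPLE` are hypothetical names, and writing the rule S → ε as a right-hand side containing a single ε leaf is an encoding choice made here.

```python
from dataclasses import dataclass, field

EPS = "ε"


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def tree_yield(node):
    """Concatenate the leaf labels left to right, skipping ε leaves."""
    if not node.children:
        return "" if node.label == EPS else node.label
    return "".join(tree_yield(child) for child in node.children)


def conforms(node, rules, nonterminals):
    """Check that every nonleaf node and its children match a rule."""
    if not node.children:
        return node.label not in nonterminals  # leaves: terminals or ε
    rhs = tuple(child.label for child in node.children)
    return (node.label, rhs) in rules and all(
        conforms(child, rules, nonterminals) for child in node.children
    )


# S -> ε, S -> SS, S -> (S)
RULES = {("S", (EPS,)), ("S", ("S", "S")), ("S", ("(", "S", ")"))}


def S(*kids):
    return Node("S", list(kids))


# A parse tree for (())(): root S -> SS, left S -> (S) -> ((S)), right S -> (S).
EXAMPLE = S(
    S(Node("("), S(Node("("), S(Node(EPS)), Node(")")), Node(")")),
    S(Node("("), S(Node(EPS)), Node(")")),
)
```

Here `tree_yield(EXAMPLE)` is `"(())()"` and `conforms(EXAMPLE, RULES, {"S"})` is true. The root-label condition (`EXAMPLE.label == "S"`) is checked separately.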

Structure in English [Figure: parse tree for "the smart cat smells chocolate", using the nodes S, NP, VP, Nominal, Adjs, Adj, N, and V.]

Generative Capacity Because parse trees matter, it makes sense, given a grammar G, to distinguish between: ● G’s weak generative capacity, defined to be the set of strings, L(G), that G generates, and ● G’s strong generative capacity, defined to be the set of parse trees that G generates.

Algorithms Care How We Search Algorithms for generation and recognition must be systematic. They typically use either the leftmost derivation or the rightmost derivation. [Figure: parse tree of a balanced-parentheses string.]

Derivations of The Smart Cat
A left-most derivation is: S ⇒ NP VP ⇒ the Nominal VP ⇒ the Adjs N VP ⇒ the Adj N VP ⇒ the smart N VP ⇒ the smart cat VP ⇒ the smart cat V NP ⇒ the smart cat smells NP ⇒ the smart cat smells Nominal ⇒ the smart cat smells N ⇒ the smart cat smells chocolate
A right-most derivation is: S ⇒ NP VP ⇒ NP V NP ⇒ NP V Nominal ⇒ NP V N ⇒ NP V chocolate ⇒ NP smells chocolate ⇒ the Nominal smells chocolate ⇒ the Adjs N smells chocolate ⇒ the Adjs cat smells chocolate ⇒ the Adj cat smells chocolate ⇒ the smart cat smells chocolate

Derivation is Not Necessarily Unique This is True for Regular Languages Too
Regular expression: (a ∪ b)*a(a ∪ b)*
● choose a from (a ∪ b), choose a from (a ∪ b), choose a
● choose a, choose a from (a ∪ b), choose a from (a ∪ b)
Regular grammar: S → a, S → bS, S → aS, S → aT, T → a, T → b, T → aT, T → bT

Ambiguity A grammar G is ambiguous iff there is at least one string in L(G) for which G produces more than one parse tree. For most applications of context-free grammars, this is a problem.

An Arithmetic Expression Grammar E → E + E, E → E * E, E → (E), E → id
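The ambiguity of this grammar can be measured by counting the parse trees of a token sequence. The sketch below does this with memoized recursion over token spans, with one case per rule. `count_parse_trees` is a hypothetical name, and the input is assumed to be a list of tokens drawn from +, *, (, ), and id.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count the parse trees of `tokens` under E -> E+E | E*E | (E) | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        """Number of parse trees deriving tokens[i:j] from E."""
        if i >= j:
            return 0
        total = 0
        if j - i == 1 and tokens[i] == "id":
            total += 1                            # E -> id
        if tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)          # E -> (E)
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):           # E -> E + E, E -> E * E
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))
```

`count_parse_trees("id + id * id".split())` returns 2, one tree grouping the sum first and one grouping the product first, which is exactly the kind of ambiguity that matters when the tree determines how the expression is evaluated.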

Even a Very Simple Grammar Can be Highly Ambiguous S → ε, S → SS, S → (S)

Inherent Ambiguity Some languages have the property that every grammar for them is ambiguous. We call such languages inherently ambiguous. Example: L = {a^n b^n c^m : n, m ≥ 0} ∪ {a^n b^m c^m : n, m ≥ 0}.

Inherent Ambiguity L = {a^n b^n c^m : n, m ≥ 0} ∪ {a^n b^m c^m : n, m ≥ 0}. One grammar for L has the rules:
S → S_1 | S_2
S_1 → S_1 c | A /* Generate all strings in {a^n b^n c^m}.
A → aAb | ε
S_2 → a S_2 | B /* Generate all strings in {a^n b^m c^m}.
B → bBc | ε
Consider any string of the form a^n b^n c^n. It belongs to both halves of the union, so this grammar derives it in two ways, once through S_1 and once through S_2. L is inherently ambiguous.
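The overlap between the two halves of L can be checked directly. In the sketch below, `via_S1` tests membership in {a^n b^n c^m} and `via_S2` tests membership in {a^n b^m c^m}. These names and the helper `block_lengths` are hypothetical, and the code tests language membership with a regular expression rather than running the grammar.

```python
import re


def block_lengths(w):
    """Return (#a, #b, #c) if w has the shape a*b*c*, else None."""
    m = re.fullmatch(r"(a*)(b*)(c*)", w)
    return None if m is None else tuple(len(g) for g in m.groups())


def via_S1(w):
    """True iff w is in {a^n b^n c^m}, the part generated through S_1."""
    blocks = block_lengths(w)
    return blocks is not None and blocks[0] == blocks[1]


def via_S2(w):
    """True iff w is in {a^n b^m c^m}, the part generated through S_2."""
    blocks = block_lengths(w)
    return blocks is not None and blocks[1] == blocks[2]
```

For `"aabbcc"` both predicates hold, so the grammar above has a derivation through each of S_1 and S_2. For `"aabbc"` only `via_S1` holds, and for `"abbcc"` only `via_S2` holds. This shows that this particular grammar is ambiguous on a^n b^n c^n; that every grammar for L is ambiguous is a stronger claim that the code does not establish.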

Inherent Ambiguity Both of the following problems are undecidable: ● Given a context-free grammar G, is G ambiguous? ● Given a context-free language L, is L inherently ambiguous? But We Can Often Reduce Ambiguity