Published by Corinne Dexter. Modified over 9 years ago.
1
Grammars, Languages and Parse Trees
2
Language
Let V be an alphabet or vocabulary
V* is the set of all strings over V
A language L is a subset of V*, i.e., L ⊆ V*
L may be finite or infinite
Programming language
–Set of all possible programs (each a valid, possibly very long, string)
–Programs with syntax errors are not in the set
–Infinite number of programs
3
Language Representation
Finite language
–Enumerate all sentences
Infinite language
–Cannot be specified by enumeration
–Use a generative device, i.e., a grammar
–A grammar specifies the set of all legal sentences
–A grammar is defined recursively (or inductively)
4
Sample Grammar
Simple arithmetic expressions (E)
Basis Rules:
–A Variable is an E
–An Integer is an E
Inductive Rules:
–If E1 and E2 are Es, so is (E1 + E2)
–If E1 and E2 are Es, so is (E1 * E2)
Examples: x, y, 3, 12, (x + y), (z * (x + y)), ((z * (x + y)) + 12)
5
Production Rules
Use symbols (aka syntactical categories) and meta-symbols to define basis and inductive rules
For our example:
Basis Rules:
–E → V
–E → I
Inductive Rules:
–E → (E + E)
–E → (E * E)
6
Formal Definition of a Grammar
G = (V_N, V_T, S, Φ), where
– V_N, V_T are the sets of non-terminal and terminal symbols
– S ∈ V_N is a start symbol
– Φ is a finite set of relations from (V_T ∪ V_N)+ to (V_T ∪ V_N)*
An element (α, β) of Φ is written as α → β and is called a production rule or a rewrite rule
7
Sample Grammar Revisited
1. E → V | I | (E + E) | (E * E)
2. V → L | VL | VD
3. I → D | ID
4. D → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
5. L → x | y | z
V_N: E, V, I, D, L
V_T: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, x, y, z, (, ), +, *
S = E
Φ: rules 1-5
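The four components of this grammar can be written down as plain data. The sketch below is one possible encoding, not part of the original slides: the names PRODUCTIONS, START, NONTERMINALS and TERMINALS are hypothetical, and every symbol is assumed to be a single character. Computing the terminals from the rules also picks up the punctuation symbols (, ), + and * that appear in rule 1.

```python
# Hypothetical encoding of the sample grammar; names are illustrative only.
# Each right-hand side is a tuple of one-character symbols.
PRODUCTIONS = {
    "E": [("V",), ("I",), ("(", "E", "+", "E", ")"), ("(", "E", "*", "E", ")")],
    "V": [("L",), ("V", "L"), ("V", "D")],
    "I": [("D",), ("I", "D")],
    "D": [(d,) for d in "0123456789"],
    "L": [(c,) for c in "xyz"],
}
START = "E"

# Non-terminals are exactly the symbols that have rules.
NONTERMINALS = set(PRODUCTIONS)

# Terminals are all right-hand-side symbols that are not non-terminals.
TERMINALS = {
    symbol
    for alternatives in PRODUCTIONS.values()
    for rhs in alternatives
    for symbol in rhs
} - NONTERMINALS

print(sorted(NONTERMINALS))
print(sorted(TERMINALS))
```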
8
Another Simple Grammar
Symbols:
S: sentence
V: verb
O: object
A: article
N: noun
SP: subject phrase
VP: verb phrase
NP: noun phrase
Rules:
S → SP VP
SP → A N
A → a | the
N → monkey | banana | tree
VP → V O
V → ate | climbs
O → NP
NP → A N
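Because none of these rules is recursive, the language is finite and can be enumerated. The following is a minimal sketch of that enumeration; the RULES dictionary and the expand function are hypothetical names chosen for illustration.

```python
from itertools import product

# The rules of the sentence grammar; anything without an entry is a terminal word.
RULES = {
    "S": [["SP", "VP"]],
    "SP": [["A", "N"]],
    "A": [["a"], ["the"]],
    "N": [["monkey"], ["banana"], ["tree"]],
    "VP": [["V", "O"]],
    "V": [["ate"], ["climbs"]],
    "O": [["NP"]],
    "NP": [["A", "N"]],
}

def expand(symbol):
    """Return every word list derivable from symbol (terminates: no recursion in RULES)."""
    if symbol not in RULES:
        return [[symbol]]
    results = []
    for rhs in RULES[symbol]:
        # Combine every expansion of each right-hand-side symbol, left to right.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results

sentences = [" ".join(words) for words in expand("S")]
print(len(sentences))                              # 72
print("a monkey ate a banana" in sentences)        # True
print("a banana climbs the tree" in sentences)     # True
```

The count follows from the rules: 6 subject phrases times 2 verbs times 6 noun phrases. Both printed sentences are syntactically valid, whether or not they make sense.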
9
Context-Free Grammar
A context-free grammar is a grammar with the following restriction:
– The set of productions Φ is a finite set of relations from V_N to (V_T ∪ V_N)+
– The left hand side of a production is a single non-terminal
– The right hand side of any production cannot be empty
Context-free grammars generate context-free languages.
With slight variations, essentially all programming languages are context-free languages.
We will focus on context-free grammars
10
More Grammars
G1 = (V_N, V_T, S, Φ), where:
V_N = {S, B}
V_T = {a, b, c}
S = S
Φ = { S → aBSc, S → abc, Ba → aB, Bb → bb }
G2 = (V_N, V_T, S, Φ), where:
V_N = {I, L, D}
V_T = {a, b, …, z, 0, 1, …, 9}
S = I
Φ = { I → L | ID | IL, L → a | b | … | z, D → 0 | 1 | … | 9 }
G3 = (V_N, V_T, S, Φ), where:
V_N = {S, A, B}
V_T = {a, b}
S = S
Φ = { S → aA, A → aA | bB, B → bB | ε }
Which are context-free?
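The restriction on context-free productions (a single non-terminal on the left, a non-empty right-hand side) is easy to test mechanically. The sketch below applies it to G1 and G2 only; is_context_free is a hypothetical helper, and productions are assumed to be (left, right) pairs of strings made of one-character symbols.

```python
import string

def is_context_free(productions, nonterminals):
    """Check the two restrictions: single non-terminal on the left, non-empty right."""
    for lhs, rhs in productions:
        if len(lhs) != 1 or lhs not in nonterminals:
            return False
        if len(rhs) == 0:
            return False
    return True

# G1: the rules Ba -> aB and Bb -> bb have two symbols on the left.
G1 = [("S", "aBSc"), ("S", "abc"), ("Ba", "aB"), ("Bb", "bb")]

# G2: identifiers, with the elided alphabets written out in full.
G2 = (
    [("I", "L"), ("I", "ID"), ("I", "IL")]
    + [("L", c) for c in string.ascii_lowercase]
    + [("D", d) for d in string.digits]
)

print(is_context_free(G1, {"S", "B"}))        # False
print(is_context_free(G2, {"I", "L", "D"}))   # True
```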
11
Direct Derivative
Let G = (V_N, V_T, S, Φ) be a grammar
Let α, β ∈ (V_N ∪ V_T)*
β is said to be a direct derivative of α, written α ⇒ β, if there are strings φ1 and φ2 (possibly empty) such that:
α = φ1 L φ2, β = φ1 λ φ2, L ∈ V_N and L → λ is a production of G
We go from α to β using a single rule
12
Examples of Direct Derivatives
G = (V_N, V_T, S, Φ), where:
V_N = {I, L, D}
V_T = {a, b, …, z, 0, 1, …, 9}
S = I
Φ = { I → L | ID | IL, L → a | b | … | z, D → 0 | 1 | … | 9 }

α     β     Rule Used   φ1        φ2
I     L     I → L       (empty)   (empty)
Ib    Lb    I → L       (empty)   b
Lb    ab    L → a       (empty)   b
IDD   I0D   D → 0       I         D
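The definition of a direct derivative translates into a short function: pick one production, pick one occurrence of its left-hand side, and rewrite it. This is a minimal sketch under the assumption that sentential forms are plain strings of one-character symbols; direct_derivatives and PRODUCTIONS are hypothetical names.

```python
import string

# The identifier grammar: I -> L | ID | IL, L -> a..z, D -> 0..9
PRODUCTIONS = (
    [("I", "L"), ("I", "ID"), ("I", "IL")]
    + [("L", c) for c in string.ascii_lowercase]
    + [("D", d) for d in string.digits]
)

def direct_derivatives(alpha, productions=PRODUCTIONS):
    """Return every string reachable from alpha by applying exactly one rule once."""
    results = set()
    for lhs, rhs in productions:
        start = alpha.find(lhs)
        while start != -1:
            # alpha = prefix + lhs + suffix  becomes  prefix + rhs + suffix
            results.add(alpha[:start] + rhs + alpha[start + len(lhs):])
            start = alpha.find(lhs, start + 1)
    return results

print("L" in direct_derivatives("I"))       # True
print("Lb" in direct_derivatives("Ib"))     # True
print("ab" in direct_derivatives("Lb"))     # True
print("I0D" in direct_derivatives("IDD"))   # True
```

The four checks correspond to the four rows of the table; the prefix and suffix in the comment play the roles of the two surrounding strings in the definition.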
13
Derivation
Let G = (V_N, V_T, S, Φ) be a grammar
A string α produces ω, or ω reduces to α, or ω is a derivation of α, written α ⇒+ ω, if α ⇒ ω directly or there are strings φ1, …, φn (n≥1) such that:
α ⇒ φ1 ⇒ φ2 ⇒ … ⇒ φn-1 ⇒ φn ⇒ ω
We go from α to ω using one or more rules
14
Example of Derivation
1. E → V | I | (E + E) | (E * E)
2. V → L | VL | VD
3. I → D | ID
4. D → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
5. L → x | y | z
Is ( ( z * ( x + y ) ) + 12 ) in the language?
E ⇒ ( E + E )
⇒ ( ( E * E ) + E )
⇒ ( ( E * ( E + E ) ) + E )
⇒+ ( ( V * ( V + V ) ) + I )
⇒+ ( ( L * ( L + L ) ) + ID )
⇒+ ( ( z * ( x + y ) ) + DD )
⇒+ ( ( z * ( x + y ) ) + 12 )
(The lines marked ⇒+ apply more than one rule at once.)
How about:
–( x + 2 )
–( 21 * ( x4 + 7 ) )
–3 * z
–2y
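The "How about" strings can be checked with a small recognizer that follows the five rules directly. This is only a sketch, not the parsing algorithm the slides go on to develop; parse_e and in_language are hypothetical names, and blanks are assumed to be layout only.

```python
LETTERS = "xyz"
DIGITS = "0123456789"

def parse_e(s, i):
    """Try to match an E starting at s[i]; return the index just past it, or -1."""
    if i >= len(s):
        return -1
    c = s[i]
    if c in LETTERS:                     # V -> L | VL | VD
        i += 1
        while i < len(s) and s[i] in LETTERS + DIGITS:
            i += 1
        return i
    if c in DIGITS:                      # I -> D | ID
        i += 1
        while i < len(s) and s[i] in DIGITS:
            i += 1
        return i
    if c == "(":                         # E -> (E + E) | (E * E)
        j = parse_e(s, i + 1)
        if j == -1 or j >= len(s) or s[j] not in "+*":
            return -1
        k = parse_e(s, j + 1)
        if k == -1 or k >= len(s) or s[k] != ")":
            return -1
        return k + 1
    return -1

def in_language(text):
    s = text.replace(" ", "")            # assumption: blanks carry no meaning
    return parse_e(s, 0) == len(s)

print(in_language("( x + 2 )"))               # True
print(in_language("( 21 * ( x4 + 7 ) )"))     # True
print(in_language("3 * z"))                   # False: no enclosing parentheses
print(in_language("2y"))                      # False: a variable must start with a letter
```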
15
Grammar-generated Language
If G is a grammar with start symbol S, a sentential form is any derivative of S
The language L generated by a grammar G is the set of all sentential forms whose symbols are all terminals:
L(G) = { σ | S ⇒+ σ and σ ∈ V_T* }
16
Example of Language
Let G = (V_N, V_T, S, Φ), where:
V_N = {I, L, D}
V_T = {a, b, …, z, 0, 1, …, 9}
S = I
Φ = { I → L | ID | IL, L → a | b | … | z, D → 0 | 1 | … | 9 }
L(G) = {abc12, x, m934897773645, a1b2c3, …}
I ⇒ ID ⇒ IDD ⇒ ILDD ⇒ ILLDD ⇒ LLLDD ⇒ aLLDD ⇒ abLDD ⇒ abcDD ⇒ abc1D ⇒ abc12
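L(G) is infinite, but the sentences up to a given length can be collected by exploring sentential forms breadth-first from the start symbol. The sketch below assumes string-valued sentential forms and uses the hypothetical name sentences_up_to; it relies on the fact that no rule of this grammar shortens a form, so anything longer than the bound can be discarded.

```python
from collections import deque
import string

PRODUCTIONS = (
    [("I", "L"), ("I", "ID"), ("I", "IL")]
    + [("L", c) for c in string.ascii_lowercase]
    + [("D", d) for d in string.digits]
)
NONTERMINALS = {"I", "L", "D"}

def sentences_up_to(max_len, start="I"):
    """Collect all terminal strings of length <= max_len derivable from start."""
    seen = {start}
    queue = deque([start])
    found = set()
    while queue:
        form = queue.popleft()
        if not any(symbol in NONTERMINALS for symbol in form):
            found.add(form)              # all symbols are terminals: a sentence
            continue
        for lhs, rhs in PRODUCTIONS:
            pos = form.find(lhs)
            while pos != -1:
                new = form[:pos] + rhs + form[pos + len(lhs):]
                # No rule shortens a form, so longer forms can never shrink back.
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                pos = form.find(lhs, pos + 1)
    return found

short = sentences_up_to(2)
print(len(short))        # 962 = 26 letters + 26 * 36 two-character identifiers
print("x" in short)      # True
print("1a" in short)     # False: an identifier cannot start with a digit
```

The number of sentential forms grows quickly with the bound, so this brute-force search is only practical for very short strings.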
17
Syntax Analysis: Parsing
The parse of a sentence is the construction of a derivation for that sentence
The parsing of a sentence results in
– acceptance or rejection
– and, if acceptance, then also a parse tree
We are looking for an algorithm to parse a sentence (i.e., to parse a program) and produce a parse tree
18
Parse Trees
A parse tree is composed of
– interior nodes representing elements of V_N
– leaf nodes representing elements of V_T
For each interior node N, the transition from N to its children represents the application of one production rule
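One way to make this concrete is to store a parse tree as nested tuples and check that every interior node matches a production. The representation below is an assumption made for illustration: an interior node is a tuple (label, child, child, ...), a leaf is a one-character string, and is_valid and frontier are hypothetical helpers.

```python
PRODUCTIONS = {
    "E": {("V",), ("I",), ("(", "E", "+", "E", ")"), ("(", "E", "*", "E", ")")},
    "V": {("L",), ("V", "L"), ("V", "D")},
    "I": {("D",), ("I", "D")},
    "D": {(d,) for d in "0123456789"},
    "L": {(c,) for c in "xyz"},
}

def label(node):
    """A leaf is its own label; an interior node stores its label first."""
    return node if isinstance(node, str) else node[0]

def is_valid(node):
    """True if leaves are terminals and each interior node applies one production."""
    if isinstance(node, str):
        return node not in PRODUCTIONS
    children = node[1:]
    rhs = tuple(label(child) for child in children)
    return rhs in PRODUCTIONS.get(node[0], set()) and all(is_valid(c) for c in children)

def frontier(node):
    """Concatenate the leaves from left to right."""
    if isinstance(node, str):
        return node
    return "".join(frontier(child) for child in node[1:])

# Parse tree for the sentence (x+12)
tree = (
    "E", "(",
    ("E", ("V", ("L", "x"))),
    "+",
    ("E", ("I", ("I", ("D", "1")), ("D", "2"))),
    ")",
)

print(is_valid(tree))                    # True
print(frontier(tree))                    # (x+12)
print(is_valid(("E", ("L", "x"))))       # False: E -> L is not a production
```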
19
Parse Tree Construction
Top-down
– Start with the root (start symbol)
– Proceed downward to leaves using productions
Bottom-up
– Start from leaves
– Proceed upward to the root
Although these seem like reasonable approaches to develop a parsing algorithm, we'll see later that neither is ideal; we'll find a better way!
20
Top down
1. A → V | I | (A + A) | (A * A)
2. V → L | VL | VD
3. I → D | ID
4. D → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
5. L → x | y | z
Sentence: ( ( z * ( x + y ) ) + 12 )
Levels of the parse tree, from the root down to the leaves:
A
( A + A )
( ( A * A ) + A )
( ( A * ( A + A ) ) + I )
( ( V * ( V + V ) ) + I D )
( ( L * ( L + L ) ) + D D )
( ( z * ( x + y ) ) + 1 2 )
21
Bottom up
1. A → V | I | (A + A) | (A * A)
2. V → L | VL | VD
3. I → D | ID
4. D → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
5. L → x | y | z
Sentence: ( ( z * ( x + y ) ) + 12 )
Levels of the parse tree, from the leaves up to the root:
( ( z * ( x + y ) ) + 1 2 )
( ( L * ( L + L ) ) + D D )
( ( V * ( V + V ) ) + I D )
( ( A * ( A + A ) ) + I )
( ( A * A ) + A )
( A + A )
A
22
Lexical Analyzer and Parser
Lexical analyzers
–Input: symbols of length 1
–Output: classified tokens
Parsers
–Input: classified tokens
–Output: parse tree (i.e., syntactically correct program)
A syntactically correct program will run. Will it do what you want?
[a monkey ate a banana / a banana climbs the tree]
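The division of labour can be illustrated with a tiny lexical analyzer for the arithmetic-expression grammar. This is a sketch built on Python's standard re module; the token class names (VARIABLE, INTEGER, and so on) are hypothetical, and whitespace is assumed to be skipped.

```python
import re

# Token classes for the expression grammar; the names are illustrative only.
TOKEN_SPEC = [
    ("VARIABLE", r"[xyz][xyz0-9]*"),   # V -> L | VL | VD
    ("INTEGER", r"[0-9]+"),            # I -> D | ID
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("PLUS", r"\+"),
    ("TIMES", r"\*"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Turn a string of single characters into a list of (class, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("(x4 + 12)"))
# [('LPAREN', '('), ('VARIABLE', 'x4'), ('PLUS', '+'), ('INTEGER', '12'), ('RPAREN', ')')]
print(tokenize("2y"))
# [('INTEGER', '2'), ('VARIABLE', 'y')]
```

The second call shows why both stages are needed: the lexical analyzer happily classifies 2y as two tokens, and it is the parser that rejects an integer followed directly by a variable.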
23
Backus-Naur Form (BNF)
A traditional meta-language to represent grammars for programming languages
– Every non-terminal is enclosed in < and >
– Instead of the symbol →, we use ::=
Example
I → L | ID | IL
L → a | b | … | z
D → 0 | 1 | … | 9
becomes
<I> ::= <L> | <I><D> | <I><L>
<L> ::= a | b | … | z
<D> ::= 0 | 1 | … | 9
WHY?
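The translation into BNF is mechanical, as the following sketch suggests. The function to_bnf is a hypothetical helper, symbols are assumed to be single characters, and the letter and digit alphabets are cut down to a few entries only to keep the output short.

```python
def to_bnf(productions, nonterminals):
    """Render each rule as a BNF line: <X> ::= alt | alt | ..."""
    lines = []
    for lhs, alternatives in productions.items():
        rendered = []
        for rhs in alternatives:
            # Wrap non-terminals in angle brackets; leave terminals as they are.
            rendered.append("".join(f"<{s}>" if s in nonterminals else s for s in rhs))
        lines.append(f"<{lhs}> ::= " + " | ".join(rendered))
    return lines

# Abbreviated identifier grammar (assumption: only three letters and two digits).
GRAMMAR = {
    "I": ["L", "ID", "IL"],
    "L": ["a", "b", "c"],
    "D": ["0", "1"],
}

for line in to_bnf(GRAMMAR, set(GRAMMAR)):
    print(line)
# <I> ::= <L> | <I><D> | <I><L>
# <L> ::= a | b | c
# <D> ::= 0 | 1
```

One thing the brackets buy is that non-terminals no longer have to be single capital letters, so a rule name can be a whole word and capital letters remain available as terminals.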