1 Grammars, Languages and Parse Trees

2 Language
Let V be an alphabet (or vocabulary).
$V^*$ is the set of all strings over V.
A language L is a subset of $V^*$, i.e., $L \subseteq V^*$.
L may be finite or infinite.
Programming language:
– The set of all possible programs (valid, possibly very long strings)
– Programs with syntax errors are not in the set
– Infinitely many programs
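
A small Python sketch (not part of the original slides) can make $V^*$ and $L \subseteq V^*$ concrete. It lists the strings of $V^*$ by increasing length and keeps those that pass a membership test. The alphabet {a, b}, the sample language "strings with as many a's as b's", and the function names are assumptions chosen only for illustration.

```python
from itertools import product

def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len, shortest first.

    V* itself is infinite, so only a finite prefix of it can ever be listed.
    """
    for n in range(max_len + 1):
        for letters in product(sorted(alphabet), repeat=n):
            yield "".join(letters)

def in_sample_language(s):
    # Hypothetical language L: strings with equally many a's and b's.
    return s.count("a") == s.count("b")

V = {"a", "b"}
prefix_of_V_star = list(strings_up_to(V, 2))
L_prefix = [s for s in prefix_of_V_star if in_sample_language(s)]
print(prefix_of_V_star)  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
print(L_prefix)          # ['', 'ab', 'ba']
```

The language is just a filter over $V^*$: every string in `L_prefix` also appears in `prefix_of_V_star`.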

3 Language Representation
Finite language
– Enumerate all sentences
Infinite language
– Cannot be specified by enumeration
– Use a generative device, i.e., a grammar, which specifies the set of all legal sentences and is defined recursively (or inductively)

4 Sample Grammar
Simple arithmetic expressions (E)
Basis rules:
– A variable is an E
– An integer is an E
Inductive rules:
– If $E_1$ and $E_2$ are Es, so is $(E_1 + E_2)$
– If $E_1$ and $E_2$ are Es, so is $(E_1 * E_2)$
Examples: x, y, 3, 12, (x + y), (z * (x + y)), ((z * (x + y)) + 12)

5 Production Rules
Use symbols (also known as syntactic categories) and meta-symbols to define the basis and inductive rules. For our example:
Basis rules: E → V, E → I
Inductive rules: E → (E + E), E → (E * E)
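
The basis and inductive rules can be run directly as a generator: start from the basis elements and repeatedly apply the inductive rules. This Python sketch assumes a tiny basis (variables x and y, integer 3) so the output stays small; the real basis contains every variable and every integer. The function name and the `depth` parameter are hypothetical.

```python
def expressions(depth, variables=("x", "y"), integers=("3",)):
    """Return the set of Es buildable with at most `depth` rounds of the inductive rules."""
    # Basis rules: every variable and every integer is an E.
    current = set(variables) | set(integers)
    for _ in range(depth):
        new = set(current)
        # Inductive rules: if E1 and E2 are Es, so are (E1 + E2) and (E1 * E2).
        for e1 in current:
            for e2 in current:
                new.add(f"({e1} + {e2})")
                new.add(f"({e1} * {e2})")
        current = new
    return current

print(len(expressions(1)))                  # 21
print("((x + y) * 3)" in expressions(2))    # True
```

Each extra round of the inductive rules allows one more level of parentheses, which is why the language is infinite even though the basis is finite.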

6 Formal Definition of a Grammar
$G = (V_N, V_T, S, \Phi)$, where
– $V_N$ and $V_T$ are the sets of non-terminal and terminal symbols
– $S \in V_N$ is the start symbol
– $\Phi$ is a finite set of relations from $(V_T \cup V_N)^+$ to $(V_T \cup V_N)^*$
An element $(\alpha, \beta)$ of $\Phi$ is written as $\alpha \to \beta$ and is called a production rule or a rewrite rule.

7 Sample Grammar Revisited
1. E → V | I | (E + E) | (E * E)
2. V → L | VL | VD
3. I → D | ID
4. D → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
5. L → x | y | z
$V_N$: E, V, I, D, L
$V_T$: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, x, y, z, (, ), +, *
S = E
$\Phi$: rules 1–5
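
The four-tuple $G = (V_N, V_T, S, \Phi)$ maps naturally onto a small record type. The Python sketch below is one possible encoding, assuming single-character symbols; the class name `Grammar`, its `check` method and the `alternatives` helper are hypothetical. Note that (, ), + and * must be listed as terminals, because rule 1 uses them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    """G = (V_N, V_T, S, Phi), assuming every symbol is a single character."""
    nonterminals: frozenset
    terminals: frozenset
    start: str
    productions: tuple  # pairs (alpha, beta) meaning alpha -> beta

    def check(self):
        """Raise ValueError if the tuple violates the formal definition."""
        if self.nonterminals & self.terminals:
            raise ValueError("V_N and V_T must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("S must be a member of V_N")
        vocab = self.nonterminals | self.terminals
        for alpha, beta in self.productions:
            # alpha must be non-empty ((V_T ∪ V_N)+); beta may be empty ((V_T ∪ V_N)*).
            if not alpha or not set(alpha) <= vocab or not set(beta) <= vocab:
                raise ValueError(f"bad production {alpha!r} -> {beta!r}")

def alternatives(lhs, *rhss):
    """Expand 'lhs -> r1 | r2 | ...' into separate (lhs, r) pairs."""
    return tuple((lhs, r) for r in rhss)

DIGITS = "0123456789"
LETTERS = "xyz"
# Blanks inside "(E+E)" are omitted: they are not symbols of the grammar.
expr_grammar = Grammar(
    nonterminals=frozenset("EVIDL"),
    terminals=frozenset(DIGITS + LETTERS + "()+*"),
    start="E",
    productions=(
        alternatives("E", "V", "I", "(E+E)", "(E*E)")
        + alternatives("V", "L", "VL", "VD")
        + alternatives("I", "D", "ID")
        + alternatives("D", *DIGITS)
        + alternatives("L", *LETTERS)
    ),
)
expr_grammar.check()                  # passes: S = E is in V_N, every symbol is in V_N ∪ V_T
print(len(expr_grammar.productions))  # 22  (4 + 3 + 2 + 10 + 3)
```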

8 Another Simple Grammar
Symbols: S: sentence, V: verb, O: object, A: article, N: noun, SP: subject phrase, VP: verb phrase, NP: noun phrase
Rules:
S → SP VP
SP → A N
A → a | the
N → monkey | banana | tree
VP → V O
V → ate | climbs
O → NP
NP → A N
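
No rule of this sentence grammar is recursive, so its language is finite and can be listed exhaustively (the "enumerate all sentences" case from slide 3). The Python sketch below does exactly that; the dictionary encoding of the rules and the helper name `expand` are assumptions of the sketch, not notation from the slides.

```python
from itertools import product

# Sentence grammar from the slide; keys are non-terminals, any other string is a terminal word.
RULES = {
    "S": [["SP", "VP"]],
    "SP": [["A", "N"]],
    "A": [["a"], ["the"]],
    "N": [["monkey"], ["banana"], ["tree"]],
    "VP": [["V", "O"]],
    "V": [["ate"], ["climbs"]],
    "O": [["NP"]],
    "NP": [["A", "N"]],
}

def expand(symbol):
    """Return every word sequence derivable from `symbol` (terminates: no rule is recursive)."""
    if symbol not in RULES:
        return [[symbol]]  # a terminal word derives only itself
    results = []
    for rhs in RULES[symbol]:
        # Combine one derivation of each right-hand-side symbol, left to right.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results

sentences = {" ".join(words) for words in expand("S")}
print(len(sentences))                           # 72
print("a monkey ate a banana" in sentences)     # True
print("a banana climbs the tree" in sentences)  # True
```

The count follows from the rules: 6 subject phrases (2 articles × 3 nouns) times 12 verb phrases (2 verbs × 6 noun phrases).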

9 Context-Free Grammar
A context-free grammar is a grammar with the following restriction:
– The relation $\Phi$ is a finite set of relations from $V_N$ to $(V_T \cup V_N)^+$
– The left-hand side of a production is a single non-terminal
– The right-hand side of any production cannot be empty
Context-free grammars generate context-free languages. With slight variations, the syntax of essentially all programming languages can be described by context-free grammars. We will focus on context-free grammars.

10 More Grammars
$G_1 = (V_N, V_T, S, \Phi)$, where:
$V_N = \{S, B\}$, $V_T = \{a, b, c\}$, $S = S$,
$\Phi = \{S \to aBSc,\ S \to abc,\ Ba \to aB,\ Bb \to bb\}$
$G_2 = (V_N, V_T, S, \Phi)$, where:
$V_N = \{I, L, D\}$, $V_T = \{a, b, \ldots, z, 0, 1, \ldots, 9\}$, $S = I$,
$\Phi = \{I \to L \mid ID \mid IL,\ L \to a \mid b \mid \cdots \mid z,\ D \to 0 \mid 1 \mid \cdots \mid 9\}$
$G_3 = (V_N, V_T, S, \Phi)$, where:
$V_N = \{S, A, B\}$, $V_T = \{a, b\}$, $S = S$,
$\Phi = \{S \to aA,\ A \to aA \mid bB,\ B \to bB \mid \varepsilon\}$
Which are context-free?
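
The restriction on slide 9 can be checked mechanically. This Python sketch tests each production of $G_1$, $G_2$ and $G_3$ against it; encoding a grammar as a pair (non-terminal set, list of (lhs, rhs) strings) and writing ε as the empty string are assumptions of the sketch. Under slide 9's definition an ε-rule disqualifies a grammar; many textbooks relax this and allow ε-rules, in which case $G_3$ would also count as context-free.

```python
import string

def is_context_free(nonterminals, productions):
    """Slide 9's restriction: each left-hand side is exactly one non-terminal
    and each right-hand side is non-empty (so an epsilon rule fails)."""
    return all(len(lhs) == 1 and lhs in nonterminals and len(rhs) > 0
               for lhs, rhs in productions)

G1 = ({"S", "B"}, [("S", "aBSc"), ("S", "abc"), ("Ba", "aB"), ("Bb", "bb")])
G2 = ({"I", "L", "D"},
      [("I", "L"), ("I", "ID"), ("I", "IL")]
      + [("L", c) for c in string.ascii_lowercase]
      + [("D", d) for d in string.digits])
G3 = ({"S", "A", "B"},
      [("S", "aA"), ("A", "aA"), ("A", "bB"), ("B", "bB"), ("B", "")])  # "" stands for ε

for name, (vn, phi) in [("G1", G1), ("G2", G2), ("G3", G3)]:
    print(name, is_context_free(vn, phi))
# G1 False   (Ba -> aB has two symbols on the left)
# G2 True
# G3 False   (B -> ε has an empty right-hand side)
```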

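11 Direct Derivative
Let $G = (V_N, V_T, S, \Phi)$ be a grammar, and let $\alpha, \beta \in (V_N \cup V_T)^*$.
$\beta$ is said to be a direct derivative of $\alpha$, written $\alpha \Rightarrow \beta$, if there are strings $\varphi_1$ and $\varphi_2$ such that $\alpha = \varphi_1 L \varphi_2$, $\beta = \varphi_1 \lambda \varphi_2$, $L \in V_N$, and $L \to \lambda$ is a production of G.
We go from $\alpha$ to $\beta$ using a single rule.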

12 Examples of Direct Derivatives
$G = (V_N, V_T, S, \Phi)$, where:
$V_N = \{I, L, D\}$, $V_T = \{a, b, \ldots, z, 0, 1, \ldots, 9\}$, $S = I$,
$\Phi = \{I \to L \mid ID \mid IL,\ L \to a \mid b \mid \cdots \mid z,\ D \to 0 \mid 1 \mid \cdots \mid 9\}$

α      β      Rule used   φ1    φ2
I      L      I → L       ε     ε
Ib     Lb     I → L       ε     b
Lb     ab     L → a       ε     b
IDD    I0D    D → 0       I     D
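
The definition translates into a short search: try every production at every position where its left-hand side occurs, splitting $\alpha$ into $\varphi_1 L \varphi_2$. This Python sketch assumes single-character symbols and lists only the productions the table needs; the function name is hypothetical. Because it treats a left-hand side as a substring, a single non-terminal works exactly as in the definition.

```python
def direct_derivatives(alpha, productions):
    """Yield (beta, rule, phi1, phi2) for every beta with alpha => beta.

    Each step rewrites one occurrence of a left-hand side L, so that
    alpha = phi1 L phi2 and beta = phi1 lambda phi2.
    """
    for lhs, rhs in productions:
        start = alpha.find(lhs)
        while start != -1:
            phi1, phi2 = alpha[:start], alpha[start + len(lhs):]
            yield phi1 + rhs + phi2, f"{lhs} -> {rhs}", phi1, phi2
            start = alpha.find(lhs, start + 1)

# Identifier grammar from the slide, restricted to the rules the examples use.
PHI = [("I", "L"), ("I", "ID"), ("I", "IL"), ("L", "a"), ("D", "0")]

# Re-derive each row of the table: which rule, phi1 and phi2 take alpha to beta?
for alpha, beta in [("I", "L"), ("Ib", "Lb"), ("Lb", "ab"), ("IDD", "I0D")]:
    for b, rule, phi1, phi2 in direct_derivatives(alpha, PHI):
        if b == beta:
            print(f"{alpha} => {beta}   rule {rule}   phi1={phi1!r}   phi2={phi2!r}")
```

For IDD the rule D → 0 also applies at the second D, giving ID0; the filter keeps only the split that produces I0D, namely $\varphi_1$ = I and $\varphi_2$ = D.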

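13 Derivation
Let $G = (V_N, V_T, S, \Phi)$ be a grammar. A string $\alpha$ produces $\omega$ (equivalently, $\omega$ reduces to $\alpha$, or $\omega$ is a derivation of $\alpha$), written $\alpha \Rightarrow^+ \omega$, if there are strings $\varphi_1, \ldots, \varphi_n$ ($n \ge 1$) such that:
$\alpha \Rightarrow \varphi_1 \Rightarrow \varphi_2 \Rightarrow \cdots \Rightarrow \varphi_{n-1} \Rightarrow \varphi_n = \omega$
We go from $\alpha$ to $\omega$ by applying one or more rules, one at a time.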

14 Example of Derivation
1. E → V | I | (E + E) | (E * E)
2. V → L | VL | VD
3. I → D | ID
4. D → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
5. L → x | y | z
Can we derive ((z * (x + y)) + 12)?
E ⇒ (E + E)
⇒ ((E * E) + E)
⇒ ((E * (E + E)) + E)
⇒+ ((V * (V + V)) + I)
⇒+ ((L * (L + L)) + ID)
⇒+ ((z * (x + y)) + DD)
⇒+ ((z * (x + y)) + 12)
(Steps marked ⇒+ apply several rules at once.)
How about: (x + 2), (21 * (x4 + 7)), 3 * z, 2y?
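
The "How about" questions can be answered mechanically. Below is a hand-written recursive-descent recognizer in Python, a top-down technique chosen here for illustration rather than taken from the slides; all names are hypothetical. Rules 2 and 3 are left-recursive (V → VL, I → ID), which plain recursive descent cannot follow directly, so the sketch reads identifiers and integers with loops that accept the same strings.

```python
LETTERS = "xyz"
DIGITS = "0123456789"

def is_expression(text):
    """Accept exactly the sentences of E -> V | I | (E + E) | (E * E)."""
    s = "".join(text.split())  # blanks are not symbols of the grammar
    pos = 0

    def at(chars):
        """True if the next input character is one of `chars`."""
        return pos < len(s) and s[pos] in chars

    def parse_e():
        nonlocal pos
        if at(LETTERS):                      # E -> V; V is a letter, then letters or digits
            pos += 1
            while at(LETTERS + DIGITS):
                pos += 1
            return True
        if at(DIGITS):                       # E -> I; I is one or more digits
            while at(DIGITS):
                pos += 1
            return True
        if at("("):                          # E -> (E + E) | (E * E)
            pos += 1
            if not parse_e() or not at("+*"):
                return False
            pos += 1
            if not parse_e() or not at(")"):
                return False
            pos += 1
            return True
        return False

    # The whole input must be consumed, not just a prefix of it.
    return parse_e() and pos == len(s)

for candidate in ["(x + 2)", "(21 * (x4 + 7))", "3 * z", "2y"]:
    print(candidate, is_expression(candidate))
# (x + 2) True
# (21 * (x4 + 7)) True
# 3 * z False
# 2y False
```

3 * z fails because the grammar only builds products inside parentheses; 2y fails because a variable must start with a letter and an integer contains only digits.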

15 Grammar-Generated Language
If G is a grammar with start symbol S, a sentential form is any derivative of S.
The language L(G) generated by a grammar G is the set of all sentential forms whose symbols are all terminals:
$L(G) = \{\sigma \mid S \Rightarrow^+ \sigma \text{ and } \sigma \in V_T^*\}$
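
The set-builder definition suggests an algorithm: explore sentential forms breadth-first from S and keep those made only of terminals. This Python sketch assumes a context-free grammar without ε-rules, so sentential forms never shrink and any form longer than the bound can be discarded. To keep the output short it uses the identifier grammar with a hypothetical reduced alphabet (letters a, b and digits 0, 1).

```python
from collections import deque

def language_up_to(start, nonterminals, productions, max_len):
    """Collect every sigma with start =>+ sigma, sigma all terminals, len(sigma) <= max_len.

    Assumes each right-hand side is non-empty, so forms never get shorter.
    """
    seen = {start}
    queue = deque([start])
    sentences = set()
    while queue:
        form = queue.popleft()
        if not any(sym in nonterminals for sym in form):
            sentences.add(form)  # form is in V_T*: a sentence of L(G)
            continue
        for i, sym in enumerate(form):
            for lhs, rhs in productions:
                if sym == lhs:
                    new = form[:i] + rhs + form[i + 1:]  # one direct derivative
                    if len(new) <= max_len and new not in seen:
                        seen.add(new)
                        queue.append(new)
    return sentences

PHI = [("I", "L"), ("I", "ID"), ("I", "IL"),
       ("L", "a"), ("L", "b"), ("D", "0"), ("D", "1")]
print(sorted(language_up_to("I", {"I", "L", "D"}, PHI, 2)))
# ['a', 'a0', 'a1', 'aa', 'ab', 'b', 'b0', 'b1', 'ba', 'bb']
```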

16 Example of Language
Let $G = (V_N, V_T, S, \Phi)$, where:
$V_N = \{I, L, D\}$, $V_T = \{a, b, \ldots, z, 0, 1, \ldots, 9\}$, $S = I$,
$\Phi = \{I \to L \mid ID \mid IL,\ L \to a \mid b \mid \cdots \mid z,\ D \to 0 \mid 1 \mid \cdots \mid 9\}$
L(G) = {abc12, x, m934897773645, a1b2c3, …}
For example, abc12 is derived as:
I ⇒ ID ⇒ IDD ⇒ ILDD ⇒ ILLDD ⇒ LLLDD ⇒ aLLDD ⇒ abLDD ⇒ abcDD ⇒ abc1D ⇒ abc12
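
A claimed derivation can be verified step by step against the definition of a direct derivative. This Python sketch checks the derivation of abc12 above; it assumes single-character symbols and, for brevity, includes only the letters and digits that the example uses. The function names are hypothetical.

```python
def is_direct_derivative(alpha, beta, productions):
    """True if beta results from rewriting one occurrence of one left-hand side in alpha."""
    for lhs, rhs in productions:
        i = alpha.find(lhs)
        while i != -1:
            if alpha[:i] + rhs + alpha[i + len(lhs):] == beta:
                return True
            i = alpha.find(lhs, i + 1)
    return False

def is_derivation(steps, productions):
    """True if steps = [alpha, phi_1, ..., omega] has at least one step, each a direct derivative."""
    return len(steps) >= 2 and all(
        is_direct_derivative(a, b, productions) for a, b in zip(steps, steps[1:]))

# Identifier grammar, restricted to the letters a-c and digits 1-2 used below.
PHI = ([("I", "L"), ("I", "ID"), ("I", "IL")]
       + [("L", c) for c in "abc"] + [("D", d) for d in "12"])

chain = ["I", "ID", "IDD", "ILDD", "ILLDD", "LLLDD", "aLLDD",
         "abLDD", "abcDD", "abc1D", "abc12"]
print(is_derivation(chain, PHI))           # True: I =>+ abc12
print(is_derivation(["I", "abc12"], PHI))  # False: not a single step
```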

17 Syntax Analysis: Parsing
The parse of a sentence is the construction of a derivation for that sentence.
The parsing of a sentence results in
– acceptance or rejection
– and, if the sentence is accepted, a parse tree
We are looking for an algorithm to parse a sentence (i.e., to parse a program) and produce a parse tree.

18 Parse Trees
A parse tree is composed of
– interior nodes representing elements of $V_N$
– leaf nodes representing elements of $V_T$
For each interior node N, the transition from N to its children represents the application of one production rule.
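
Accepting or rejecting a sentence and building its parse tree can happen in one pass. This Python sketch extends a top-down recognizer for the expression grammar of slide 14 to return a tree, or None on rejection. The node layout (a pair of non-terminal and child list, with one-character strings as leaves) and all names are assumptions. The left-recursive rules V → VL | VD and I → ID are built with loops that nest each new node on the left, so every interior node still corresponds to exactly one production.

```python
LETTERS = "xyz"
DIGITS = "0123456789"

def parse(text):
    """Return a parse tree for E, or None if the sentence is rejected.

    Interior node: (non_terminal, [children]); leaf: a one-character terminal.
    """
    s = "".join(text.split())
    pos = 0

    def at(chars):
        return pos < len(s) and s[pos] in chars

    def leaf(nt):
        """Build the node for nt -> terminal (L -> letter or D -> digit)."""
        nonlocal pos
        pos += 1
        return (nt, [s[pos - 1]])

    def parse_e():
        nonlocal pos
        if at(LETTERS):
            v = ("V", [leaf("L")])                                   # V -> L
            while at(LETTERS + DIGITS):
                v = ("V", [v, leaf("L" if at(LETTERS) else "D")])    # V -> VL | VD
            return ("E", [v])                                        # E -> V
        if at(DIGITS):
            i = ("I", [leaf("D")])                                   # I -> D
            while at(DIGITS):
                i = ("I", [i, leaf("D")])                            # I -> ID
            return ("E", [i])                                        # E -> I
        if at("("):
            pos += 1
            left = parse_e()
            if left is None or not at("+*"):
                return None
            op = s[pos]
            pos += 1
            right = parse_e()
            if right is None or not at(")"):
                return None
            pos += 1
            return ("E", ["(", left, op, right, ")"])                # E -> (E + E) | (E * E)
        return None

    tree = parse_e()
    return tree if tree is not None and pos == len(s) else None

def tree_yield(tree):
    """Concatenate the leaves from left to right."""
    if isinstance(tree, str):
        return tree
    _, children = tree
    return "".join(tree_yield(child) for child in children)

print(parse("12"))  # ('E', [('I', [('I', [('D', ['1'])]), ('D', ['2'])])])
print(tree_yield(parse("((z * (x + y)) + 12)")))  # ((z*(x+y))+12)
print(parse("2y"))  # None
```

Reading the leaves back gives the original sentence (without blanks), which is a quick sanity check that the tree covers the whole input.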

19 Parse Tree Construction
Top-down
– Start with the root (start symbol)
– Proceed downward to the leaves using productions
Bottom-up
– Start from the leaves
– Proceed upward to the root
Although these seem like reasonable approaches to developing a parsing algorithm, we'll see later that neither is ideal, so we'll find a better way!

20 Top down
1. A → V | I | (A + A) | (A * A)
2. V → L | VL | VD
3. I → D | ID
4. D → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
5. L → x | y | z
Building the tree for ((z * (x + y)) + 12) from the root (start symbol) down:
A
(A + A)
((A * A) + A)
((A * (A + A)) + I)
((V * (V + V)) + ID)
((L * (L + L)) + DD)
((z * (x + y)) + 12)

21 Bottom up
1. A → V | I | (A + A) | (A * A)
2. V → L | VL | VD
3. I → D | ID
4. D → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
5. L → x | y | z
Building the tree for ((z * (x + y)) + 12) from the leaves up:
((z * (x + y)) + 12)
((L * (L + L)) + DD)
((V * (V + V)) + ID)
((A * (A + A)) + I)
((A * A) + A)
(A + A)
A
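
The bottom-up direction can be sketched as a shift-reduce recognizer: shift input symbols onto a stack and, whenever the top of the stack matches a right-hand side (a handle), replace it with the left-hand side. This Python sketch is hand-coded for the A grammar above, not an LR parser generated from tables. Its handle tests, including a one-symbol lookahead that decides when a name or number has ended, are assumptions specific to this grammar, and all names are hypothetical. The recorded reductions, read in order, climb from the leaves toward the root, as in the slide.

```python
LETTERS, DIGITS = set("xyz"), set("0123456789")
TERMINALS = LETTERS | DIGITS | set("()+*")

def shift_reduce(text):
    """Bottom-up recognizer for the A grammar; returns (accepted, list of reductions)."""
    tokens = [c for c in text if not c.isspace()]
    if any(c not in TERMINALS for c in tokens):
        return False, []
    stack, reductions = [], []
    i = 0  # index of the next unread input symbol

    def find_handle():
        """Return (lhs, length) for a handle on top of the stack, or None."""
        top = stack[-1]
        prev = stack[-2] if len(stack) >= 2 else None
        nxt = tokens[i] if i < len(tokens) else None
        if top in LETTERS:
            return "L", 1                                   # L -> x | y | z
        if top in DIGITS:
            return "D", 1                                   # D -> 0 | ... | 9
        if top == "L":
            return ("V", 2) if prev == "V" else ("V", 1)    # V -> VL | L
        if top == "D" and prev == "V":
            return "V", 2                                   # V -> VD
        if top == "D":
            return ("I", 2) if prev == "I" else ("I", 1)    # I -> ID | D
        if top == "V" and nxt not in LETTERS | DIGITS:
            return "A", 1                                   # A -> V, once the name has ended
        if top == "I" and nxt not in DIGITS:
            return "A", 1                                   # A -> I, once the number has ended
        if (top == ")" and len(stack) >= 5 and stack[-5] == "("
                and stack[-4] == "A" and stack[-3] in ("+", "*") and stack[-2] == "A"):
            return "A", 5                                   # A -> (A + A) | (A * A)
        return None

    while True:
        handle = find_handle() if stack else None
        if handle:                                          # reduce
            lhs, n = handle
            reductions.append(" ".join(stack[-n:]) + " -> " + lhs)
            del stack[-n:]
            stack.append(lhs)
        elif i < len(tokens):                               # shift
            stack.append(tokens[i])
            i += 1
        else:
            break
    return stack == ["A"], reductions

accepted, steps = shift_reduce("((z * (x + y)) + 12)")
print(accepted)                 # True
print(steps[0], "|", steps[-1]) # z -> L | ( A + A ) -> A
print(shift_reduce("3 * z")[0]) # False
```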

22 Lexical Analyzer and Parser
Lexical analyzers
– Input: symbols of length 1 (single characters)
– Output: classified tokens
Parsers
– Input: classified tokens
– Output: parse tree (i.e., a syntactically correct program)
A syntactically correct program may run, but will it do what you want? Compare "a monkey ate a banana" with "a banana climbs the tree": both are sentences of the grammar on slide 8, yet only the first makes sense.
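
A lexical analyzer for the expression language of the earlier slides can be sketched with Python's standard `re` module: one combined pattern with a named group per token class, matched repeatedly from the current position. The token class names are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical token classes for the expression grammar (rules 1-5).
TOKEN_SPEC = [
    ("VARIABLE", r"[xyz][xyz0-9]*"),   # V -> L | VL | VD
    ("INTEGER", r"[0-9]+"),            # I -> D | ID
    ("OPERATOR", r"[+*]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),                  # blanks separate tokens but are discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Group single characters into classified tokens; raise on an unexpected character."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("(x4 + 12)"))
# [('LPAREN', '('), ('VARIABLE', 'x4'), ('OPERATOR', '+'), ('INTEGER', '12'), ('RPAREN', ')')]
```

The lexer happily turns 2y into an INTEGER followed by a VARIABLE; rejecting that sequence is the parser's job, which illustrates the division of labor on this slide.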

23 Backus-Naur Form (BNF)
A traditional meta-language for representing the grammars of programming languages:
– Every non-terminal is enclosed in < >
– Instead of the symbol →, we use ::=
Example:
I → L | ID | IL
L → a | b | … | z
D → 0 | 1 | … | 9
becomes
<I> ::= <L> | <I><D> | <I><L>
<L> ::= a | b | … | z
<D> ::= 0 | 1 | … | 9
Why?
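
The translation into BNF is mechanical. This Python sketch prints a grammar stored as a dictionary in BNF. The dictionary layout and the function name are assumptions, and the single-letter non-terminal names are kept so that the output lines up with the rules above.

```python
import string

def to_bnf(grammar):
    """Render {non_terminal: [alternative, ...]} as BNF text.

    Each alternative is a list of symbols; any symbol that is a key of
    `grammar` is a non-terminal and is written inside angle brackets.
    """
    def show(symbol):
        return f"<{symbol}>" if symbol in grammar else symbol

    lines = []
    for lhs, alternatives in grammar.items():
        rhs = " | ".join("".join(show(s) for s in alt) for alt in alternatives)
        lines.append(f"<{lhs}> ::= {rhs}")  # ::= replaces the arrow
    return "\n".join(lines)

identifier_grammar = {
    "I": [["L"], ["I", "D"], ["I", "L"]],
    "L": [[c] for c in string.ascii_lowercase],
    "D": [[d] for d in string.digits],
}
print(to_bnf(identifier_grammar).splitlines()[0])  # <I> ::= <L> | <I><D> | <I><L>
```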

