Discrete Maths 242/ , Semester 2
13. Grammars

Objectives
to introduce grammars and show their importance for defining programming languages;
to show the connection between REs and grammars.
Overview
1. Why Grammars?
2. Languages
3. Using a Grammar
4. Parse Trees
5. Ambiguous Grammars
6. Kinds of Grammars
7. More Information
1. Why Grammars?
Grammars are the standard way of defining programming languages. Tools exist for semi-automatically translating grammars into the lexers and parsers of a compiler (e.g. JavaCC, lex, yacc, ANTLR); this saves weeks of work.
2. Languages
We use a natural language to communicate:
its grammar rules are very complex;
the rules don’t cover important things.
We use a formal language to define a programming language:
its grammar rules are fairly simple;
the rules cover almost everything.
A formal language is a set of legal strings.
The strings are legal if they correctly use the language’s alphabet and grammar rules. The alphabet is often called the language’s terminal symbols (or terminals).
Example 1
Alphabet (terminals) = {1, 2, 3}
Using the grammar rules (not shown here; see later), the language is:
L1 = {11, 12, 13, 21, 22, 23, 31, 32, 33}
L1 is the set of strings of length 2.
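Since the grammar rules for L1 are not shown yet, this minimal sketch builds L1 straight from its description, treating a language as nothing more than a collection of strings over the alphabet. The helper name `strings_of_length` is our own.

```python
from itertools import product

# Alphabet (terminal symbols) of Example 1
ALPHABET = ["1", "2", "3"]

def strings_of_length(alphabet, n):
    """Return every string of exactly n symbols drawn from alphabet."""
    return ["".join(p) for p in product(alphabet, repeat=n)]

L1 = strings_of_length(ALPHABET, 2)
print(L1)
# ['11', '12', '13', '21', '22', '23', '31', '32', '33']
```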
Example 2
Terminals = {1, 2, 3}
Using different grammar rules, the language is:
L2 = {111, 222, 333}
L2 is the set of strings of length 3 in which all the terminals are the same.
Example 3
Terminals = {1, 2, 3}
Using different grammar rules, the language is:
L3 = {2, 12, 22, 32, 112, 122, 132, ...}
L3 is the set of strings whose numerical value is divisible by 2 (with these terminals, exactly the strings that end in 2).
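L2 and L3 can likewise be described by membership tests. The sketch below writes one predicate per language (the names `in_L2` and `in_L3` are hypothetical) and lists the shortest members of L3, which should match the start of the set given above.

```python
from itertools import product

ALPHABET = "123"

def in_L2(s: str) -> bool:
    """L2: strings of length 3 in which all the terminals are the same."""
    return len(s) == 3 and set(s) <= set(ALPHABET) and len(set(s)) == 1

def in_L3(s: str) -> bool:
    """L3: non-empty strings over {1, 2, 3} whose numerical value is divisible by 2."""
    return len(s) > 0 and set(s) <= set(ALPHABET) and int(s) % 2 == 0

# Members of L3 with at most 3 symbols, shortest first.
L3_prefix = [
    "".join(p)
    for n in range(1, 4)
    for p in product(ALPHABET, repeat=n)
    if in_L3("".join(p))
]
print(L3_prefix[:7])  # ['2', '12', '22', '32', '112', '122', '132']
```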
3. Using a Grammar
A grammar is a notation for defining a language, and is made from 4 parts:
the terminal symbols;
the syntactic categories (nonterminal symbols), e.g. statement, expression, noun, verb;
the grammar rules (productions), e.g. A => B1 B2 ... Bn;
the starting nonterminal, the top-most syntactic category for this grammar.
We define a grammar G as a 4-tuple:
G = (T, N, P, S)
where T = terminal symbols, N = nonterminal symbols, P = productions, S = starting nonterminal.
3.1. Example 1
Consider the grammar:
T = {0, 1}
N = {S, R}
P = { S => 0, S => 0 R, R => 1 S }
S is the starting nonterminal.
The right hand sides of productions usually use a mix of terminals and nonterminals.
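One way to hold the 4-tuple G = (T, N, P, S) in a program is sketched below for this grammar. The `Grammar` class, the choice of a dict from nonterminal to a list of right-hand sides, and the `check` helper are all our own assumptions, not a standard representation.

```python
from typing import NamedTuple

class Grammar(NamedTuple):
    """A grammar G = (T, N, P, S)."""
    T: frozenset  # terminal symbols
    N: frozenset  # nonterminal symbols
    P: dict       # productions: nonterminal -> list of right-hand sides
    S: str        # starting nonterminal

# Example 1: S => 0, S => 0 R, R => 1 S
# Each right-hand side is a tuple of symbols (terminals and nonterminals mixed).
G1 = Grammar(
    T=frozenset({"0", "1"}),
    N=frozenset({"S", "R"}),
    P={"S": [("0",), ("0", "R")], "R": [("1", "S")]},
    S="S",
)

def check(g: Grammar) -> None:
    """Sanity checks that the four parts of g fit together."""
    assert g.T.isdisjoint(g.N), "terminals and nonterminals must not overlap"
    assert g.S in g.N, "the starting symbol must be a nonterminal"
    for lhs, bodies in g.P.items():
        assert lhs in g.N, f"left-hand side {lhs} is not a nonterminal"
        for body in bodies:
            assert all(sym in g.T or sym in g.N for sym in body)

check(G1)
```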
Is “01010” in the language? Start with an S rule:

Rule        String generated
            S
S => 0 R    0 R
R => 1 S    0 1 S
S => 0 R    0 1 0 R
R => 1 S    0 1 0 1 S
S => 0      0 1 0 1 0

No more rules can be applied since there are no more nonterminals left in the string. Yes, it is in the language.
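The table above is one derivation picked by hand. A program can instead try every choice: the sketch below expands the leftmost nonterminal in all possible ways, breadth first, and collects the terminal strings up to a length limit. It assumes each grammar symbol is a single character; `generate` and `P` are illustrative names.

```python
from collections import deque

# Productions of Example 1 (S is the starting nonterminal), one character per symbol.
P = {"S": ["0", "0R"], "R": ["1S"]}

def generate(productions, start, max_len):
    """Breadth-first leftmost derivations: every terminal string of length <= max_len."""
    found = set()
    queue = deque([start])
    seen = {start}
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, or None if the form is all terminals.
        i = next((k for k, c in enumerate(form) if c in productions), None)
        if i is None:
            found.add(form)
            continue
        for body in productions[form[i]]:
            new = form[:i] + body + form[i + 1:]
            # No rule here shrinks a form, so forms longer than max_len can be dropped.
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

L = generate(P, "S", 7)
print(sorted(L, key=len))  # ['0', '010', '01010', '0101010']
print("01010" in L)        # True
```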
Example 2
Consider the grammar:
T = {a, b, c, d, z}
N = {S, R, U, V}
P = { S => R U z | z
      R => a | b R
      U => d V U | c
      V => b | c }
S is the starting nonterminal.
The notation X => Y | Z is shorthand for the two rules:
X => Y
X => Z
Read ‘|’ as ‘or’.
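Expanding the ‘|’ shorthand is purely mechanical, as this small sketch shows. It assumes rules are written as text with `=>` and that ‘|’ is never itself a terminal; `expand_alternatives` is a hypothetical helper.

```python
def expand_alternatives(rule: str) -> list[str]:
    """Split 'X => Y | Z | ...' into the separate rules 'X => Y', 'X => Z', ..."""
    lhs, rhs = (part.strip() for part in rule.split("=>", 1))
    return [f"{lhs} => {alt.strip()}" for alt in rhs.split("|")]

for r in expand_alternatives("S => R U z | z"):
    print(r)
# S => R U z
# S => z
```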
Is “adbdbcz” in the language?

Rule          String generated
              S
S => R U z    R U z
R => a        a U z
U => d V U    a d V U z
V => b        a d b U z
U => d V U    a d b d V U z
V => b        a d b d b U z
U => c        a d b d b c z

Yes! This grammar has choices about how to rewrite the string (e.g. U => d V U or U => c), and we must pick the rules that lead to the target string.
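Is “abdbcz” in the language?

Rule          String generated
              S
S => R U z    R U z
R => a        a U z
which U rule?

No. The string starts with ‘a’, so R must be rewritten using R => a (R => b R would start the string with ‘b’, and S => z gives only “z”). Then U must be replaced by something beginning with a ‘b’, but the only U rules are U => d V U | c.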
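For this particular grammar the next input character always decides which rule to use, so the hand analysis above can be automated by a recursive-descent recognizer: one method per nonterminal. This is a swapped-in parsing technique used only as a sketch; the class and method names are our own.

```python
class Recognizer:
    """Recursive-descent recognizer for Example 2:
        S => R U z | z
        R => a | b R
        U => d V U | c
        V => b | c
    """

    def __init__(self, text: str):
        self.text = text
        self.pos = 0

    def peek(self) -> str:
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expect(self, ch: str) -> None:
        if self.peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {self.pos}, found {self.peek()!r}")
        self.pos += 1

    def S(self) -> None:
        if self.peek() == "z":      # S => z
            self.expect("z")
        else:                       # S => R U z
            self.R()
            self.U()
            self.expect("z")

    def R(self) -> None:
        if self.peek() == "b":      # R => b R
            self.expect("b")
            self.R()
        else:                       # R => a
            self.expect("a")

    def U(self) -> None:
        if self.peek() == "d":      # U => d V U
            self.expect("d")
            self.V()
            self.U()
        else:                       # U => c
            self.expect("c")

    def V(self) -> None:
        if self.peek() == "b":      # V => b
            self.expect("b")
        else:                       # V => c
            self.expect("c")

def accepts(text: str) -> bool:
    r = Recognizer(text)
    try:
        r.S()
    except SyntaxError:
        return False
    return r.pos == len(text)       # the whole string must be consumed

print(accepts("adbdbcz"))  # True
print(accepts("abdbcz"))   # False: after R => a, U cannot start with 'b'
```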
3.2. BNF
BNF (Backus Normal Form, or Backus-Naur Form) is a shorthand notation for productions.
We have already used ‘|’:
X => Y1 | Y2 | ... | Yn
John Backus (1924 – 2007), Peter Naur (1928 – 2016)
In extended BNF (EBNF), X => Y [Z] is shorthand for the two rules:
X => Y
X => Y Z
[Z] means 0 or 1 occurrences of Z.
X => Y { Z } is shorthand for an infinite number of rules:
X => Y
X => Y Z
X => Y Z Z
X => Y Z Z Z
...
{ Z } means 0 or more occurrences of Z.
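These two shorthands correspond to the RE operators ‘?’ and ‘*’. The sketch below expands them back into plain productions; because { Z } stands for infinitely many rules, `expand_repeat` stops after a chosen number of copies (`max_reps`, our own parameter). All three function names are hypothetical.

```python
def expand_optional(x, y, z):
    """X => Y [Z] stands for the two rules X => Y and X => Y Z."""
    return [(x, y), (x, y + z)]

def expand_repeat(x, y, z, max_reps):
    """X => Y { Z } stands for X => Y, X => Y Z, X => Y Z Z, ...
    That list is infinite, so this sketch stops after max_reps copies of Z."""
    return [(x, y + z * k) for k in range(max_reps + 1)]

def show(rules):
    """Print-friendly form: ('X', ('Y', 'Z')) becomes 'X => Y Z'."""
    return [f"{lhs} => {' '.join(body)}" for lhs, body in rules]

# Right-hand sides are tuples of symbols.
print(show(expand_optional("X", ("Y",), ("Z",))))
# ['X => Y', 'X => Y Z']
print(show(expand_repeat("X", ("Y",), ("Z",), 3)))
# ['X => Y', 'X => Y Z', 'X => Y Z Z', 'X => Y Z Z Z']
```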
3.3. A Grammar for Expressions
Consider the grammar:
T = { 0, 1, 2, ..., 9, +, -, *, /, (, ) }
N = { Expr, Number }
P = { Expr => Number
      Expr => ( Expr )
      Expr => Expr + Expr | Expr - Expr | Expr * Expr | Expr / Expr }
Expr is the starting nonterminal.
Defining Number
The RE definition for a number is:
number = digit digit*
digit = [0-9]
The productions for Number are:
Number => Digit { Digit }
Digit => 0 | 1 | 2 | 3 | ... | 9
or
Number => Number Digit | Digit
Digit => 0 | 1 | 2 | 3 | ... | 9
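The RE and the recursive productions should accept exactly the same strings. This sketch compares Python's `re` module (RE version) with a recursive function that follows Number => Number Digit | Digit (grammar version); `is_number` is our own name.

```python
import re

# RE version: number = digit digit*, digit = [0-9]
NUMBER_RE = re.compile(r"[0-9][0-9]*")

DIGITS = set("0123456789")

def is_number(s: str) -> bool:
    """Grammar version: Number => Number Digit | Digit."""
    if len(s) == 1:
        return s in DIGITS                     # Number => Digit
    # Number => Number Digit: the last symbol is a Digit and the rest is a Number
    return len(s) > 1 and s[-1] in DIGITS and is_number(s[:-1])

for s in ["7", "125", "", "12a", "-3"]:
    print(repr(s), is_number(s), NUMBER_RE.fullmatch(s) is not None)
# '7' True True
# '125' True True
# '' False False
# '12a' False False
# '-3' False False
```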
Using Productions
Expand Expr into (125-2)*3:
Expr => Expr * Expr
     => ( Expr ) * Expr
     => ( Expr - Expr ) * Expr
     => ( Number - Number ) * Number
     ...
     => ( 125 - 2 ) * 3
Expand Number into 125:
Number => Number Digit
       => Number Digit Digit
       => Digit Digit Digit
       => ... => 1 2 5
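The same expansion can be produced for any digit string. This sketch lists the sentential forms of one derivation: it applies Number => Number Digit until there is one Digit per digit (minus one), then Number => Digit, then replaces each Digit from left to right. The function name `derive_number` is hypothetical.

```python
def derive_number(digits: str) -> list[str]:
    """Sentential forms of one derivation of `digits` from Number using
    Number => Number Digit | Digit and Digit => 0 | 1 | ... | 9."""
    if not digits or any(c not in "0123456789" for c in digits):
        raise ValueError("expected a non-empty string of decimal digits")
    n = len(digits)
    forms = ["Number"]
    for k in range(1, n):                  # Number => Number Digit, n - 1 times
        forms.append(" ".join(["Number"] + ["Digit"] * k))
    symbols = ["Digit"] * n                # Number => Digit
    forms.append(" ".join(symbols))
    for i, d in enumerate(digits):         # Digit => d, leftmost Digit first
        symbols[i] = d
        forms.append(" ".join(symbols))
    return forms

for form in derive_number("125"):
    print(form)
# Number
# Number Digit
# Number Digit Digit
# Digit Digit Digit
# 1 Digit Digit
# 1 2 Digit
# 1 2 5
```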
3.4. Grammars are not Unique
Two grammars that do the same thing (e is the empty string):
Balanced => e
Balanced => ( Balanced ) Balanced
and:
Balanced => e
Balanced => ( Balanced )
Balanced => Balanced Balanced
Both generate the same strings, e.g. (()(())), (), e, (()()).
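A quick way to gain confidence that two grammars generate the same strings is to test both on every short string. The sketch below writes one memoized recognizer per grammar (`g1`, `g2` are our names), each trying every way its productions could split the string, and checks that they agree on all strings of ‘(’ and ‘)’ up to length 10. Agreement up to a length is evidence, not a proof.

```python
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def g1(s: str) -> bool:
    """Balanced => e | ( Balanced ) Balanced"""
    if s == "":
        return True
    if s[0] != "(":
        return False
    # Try every ')' that could close the leading '('.
    return any(
        s[k] == ")" and g1(s[1:k]) and g1(s[k + 1:])
        for k in range(1, len(s))
    )

@lru_cache(maxsize=None)
def g2(s: str) -> bool:
    """Balanced => e | ( Balanced ) | Balanced Balanced"""
    if s == "":
        return True
    if s[0] == "(" and s[-1] == ")" and g2(s[1:-1]):
        return True
    # Balanced Balanced with both halves non-empty (an empty half adds nothing).
    return any(g2(s[:k]) and g2(s[k:]) for k in range(1, len(s)))

for n in range(11):
    for chars in product("()", repeat=n):
        s = "".join(chars)
        assert g1(s) == g2(s), s

print([g1(s) and g2(s) for s in ["(()(()))", "()", "", "(()())"]])
# [True, True, True, True]
```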
4. Parse Trees A parse tree is a graphical way of showing how productions are used to generate a string. Data structures representing parse trees are used inside compilers to store information about the program being compiled.
Example 1
Consider the grammar:
T = { a, b }
N = { S }
P = { S => S S | a S b | a b | b a }
S is the starting nonterminal.
Parse Tree for “aabbba”
At each step, expand one nonterminal leaf of the tree:
The root of the tree is the start symbol S.
Expand the root using S => S S.
Expand the left S using S => a S b.
Expand the new inner S using S => a b.
Expand the right S using S => b a.
Stop when there are no more nonterminals in leaf positions. Writing each node's children in brackets, the finished tree is:
S( S( a S( a b ) b ) S( b a ) )
Read off the string by reading the leaves left to right: a a b b b a.
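Compilers store parse trees as linked nodes. This sketch builds the tree for “aabbba” by hand with a small `Node` class (our own design: a symbol plus a list of children) and reads off its leaves left to right.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A parse-tree node: a grammar symbol plus its children (none for a leaf)."""
    symbol: str
    children: list["Node"] = field(default_factory=list)

def leaf(sym: str) -> Node:
    return Node(sym)

def yield_of(node: Node) -> str:
    """Read off the leaves from left to right."""
    if not node.children:
        return node.symbol
    return "".join(yield_of(c) for c in node.children)

def show(node: Node) -> str:
    """Bracketed form, e.g. S(a S(a b) b)."""
    if not node.children:
        return node.symbol
    return f"{node.symbol}(" + " ".join(show(c) for c in node.children) + ")"

# S => S S; left S => a S b; inner S => a b; right S => b a
tree = Node("S", [
    Node("S", [leaf("a"), Node("S", [leaf("a"), leaf("b")]), leaf("b")]),
    Node("S", [leaf("b"), leaf("a")]),
])

print(show(tree))      # S(S(a S(a b) b) S(b a))
print(yield_of(tree))  # aabbba
```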
Example 2
Consider the grammar:
T = { a, +, *, (, ) }
N = { E, T, F }
P = { E => T | T + E
      T => F | F * T
      F => a | ( E ) }
E is the starting nonterminal.
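Is “a+a*a” in the Language?
Expand the root E using E => T + E.
Expand the left T using T => F.
Continue expanding until every leaf is a terminal:
E( T( F( a ) ) + E( T( F( a ) * T( F( a ) ) ) ) )
The leaves, read left to right, spell a+a*a, so the string is in the language.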
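Because each choice in this grammar can be made by looking at the next character, the parse tree can also be built by a recursive-descent parser, sketched below with one nested function per nonterminal. The function names and the bracketed output format are our own choices; the grammar is the one above.

```python
def parse(text: str) -> str:
    """Recursive-descent parser for the grammar
        E => T | T + E
        T => F | F * T
        F => a | ( E )
    Returns the parse tree in bracketed form, or raises SyntaxError."""
    pos = 0

    def peek() -> str:
        return text[pos] if pos < len(text) else ""

    def expect(ch: str) -> None:
        nonlocal pos
        if peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {pos}")
        pos += 1

    def E() -> str:
        left = T()
        if peek() == "+":            # E => T + E
            expect("+")
            return f"E({left} + {E()})"
        return f"E({left})"          # E => T

    def T() -> str:
        left = F()
        if peek() == "*":            # T => F * T
            expect("*")
            return f"T({left} * {T()})"
        return f"T({left})"          # T => F

    def F() -> str:
        if peek() == "(":            # F => ( E )
            expect("(")
            inner = E()
            expect(")")
            return f"F(( {inner} ))"
        expect("a")                  # F => a
        return "F(a)"

    tree = E()
    if pos != len(text):
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    return tree

print(parse("a+a*a"))
# E(T(F(a)) + E(T(F(a) * T(F(a)))))
```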
5. Ambiguous Grammars
A grammar is ambiguous when some string can be represented by more than one parse tree. This means that the string has more than one “meaning” in the language.
For example, a variant of the last grammar:
P = { E => E + E | E * E | ( E ) | a }
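Parse Trees for “a+a*a”
Under this grammar the string has two parse trees:
E( E( a ) + E( E( a ) * E( a ) ) )   (the left hand tree)
E( E( E( a ) + E( a ) ) * E( a ) )   (the right hand tree)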
The two parse trees allow a string like “5+5*5” to be read in two different ways:
5 + 25 = 30 (the left hand tree)
10 * 5 = 50 (the right hand tree)
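Ambiguity can be checked by brute force: enumerate every parse tree of a token sequence under the ambiguous grammar and compute what each tree means. In this sketch (function names are our own) any number token plays the role of a, each tree is shown fully bracketed, and `lru_cache` memoizes the results for each span of tokens.

```python
from functools import lru_cache

def all_parses(tokens: tuple[str, ...]) -> list[tuple[str, int]]:
    """Every parse tree of `tokens` under the ambiguous grammar
        E => E + E | E * E | ( E ) | a
    where a number token plays the role of a. Each tree is returned as a
    fully bracketed string together with the value it computes."""

    @lru_cache(maxsize=None)
    def parses(i: int, j: int) -> tuple[tuple[str, int], ...]:
        span = tokens[i:j]
        out = []
        if len(span) == 1 and span[0].isdigit():               # E => a
            out.append((span[0], int(span[0])))
        if len(span) >= 3 and span[0] == "(" and span[-1] == ")":
            out.extend(parses(i + 1, j - 1))                   # E => ( E ): grouping only
        for k in range(i + 1, j - 1):                          # E => E op E, op at position k
            op = tokens[k]
            if op in ("+", "*"):
                for left_text, left_val in parses(i, k):
                    for right_text, right_val in parses(k + 1, j):
                        value = left_val + right_val if op == "+" else left_val * right_val
                        out.append((f"({left_text} {op} {right_text})", value))
        return tuple(out)

    return list(parses(0, len(tokens)))

for text, value in all_parses(("5", "+", "5", "*", "5")):
    print(text, "=", value)
# (5 + (5 * 5)) = 30
# ((5 + 5) * 5) = 50
```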
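Why is Ambiguity Bad?
In a programming language, a string with more than one meaning means that the compiler and run-time system will not know how to process it. E.g. in C:
x = 5 + 5 * 5;  // what is the value in x?
With an ambiguous expression grammar, x could be 30 or 50. C's real grammar avoids this by encoding operator precedence (much like the E, T, F grammar of section 4), so x is 30.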
6. Kinds of Grammars
There are 4 main kinds of grammar, of increasing expressive power:
regular (type 3) grammars
context-free (type 2) grammars
context-sensitive (type 1) grammars
unrestricted (type 0) grammars
They vary in the kinds of productions they allow. This classification is known as the Chomsky hierarchy, after Avram Noam Chomsky (1928 – ).
6.1. Regular Grammars
Every production is of the form:
A => a | a B | e
where A, B are nonterminals, a is a terminal, and e is the empty string. For example: S => wT, T => xT, T => a.
These are sometimes called right linear rules because if a nonterminal appears in the rule body, then it must appear last.
Regular grammars are equivalent to REs (and also to automata).
An Equivalence Diagram
Regular grammars, automata and REs all have the same expressive power.
Example
Integer => + UInt | - UInt | 0 Digits | 1 Digits | ... | 9 Digits
UInt => 0 Digits | 1 Digits | ... | 9 Digits
Digits => 0 Digits | 1 Digits | ... | 9 Digits | e
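The equivalence with automata can be seen directly: each nonterminal of a right-linear grammar becomes an NFA state, each rule A => a B becomes a transition from A to B on a, and A => e makes A accepting. The sketch below runs the Integer grammar this way and compares it with an RE we wrote ourselves for the same language, `[+-]?[0-9]+` (an assumption, not part of the original example).

```python
import re
from itertools import product

DIGITS = "0123456789"

# Right-linear rules of the Integer example: nonterminal -> list of (terminal, next nonterminal).
RULES = {
    "Integer": [("+", "UInt"), ("-", "UInt")] + [(d, "Digits") for d in DIGITS],
    "UInt":    [(d, "Digits") for d in DIGITS],
    "Digits":  [(d, "Digits") for d in DIGITS],
}
NULLABLE = {"Digits"}  # nonterminals with an e rule (Digits => e)

def derives(start: str, s: str) -> bool:
    """Run the grammar as an NFA: a rule A => a B is a transition from A to B on a,
    and A => e makes A an accepting state. (Rules of the form A => a would need one
    extra accepting state; this grammar has none.)"""
    states = {start}
    for ch in s:
        states = {B for A in states for a, B in RULES.get(A, []) if a == ch}
    return bool(states & NULLABLE)

# Our own RE for the same language: an optional sign, then one or more digits.
INTEGER_RE = re.compile(r"[+-]?[0-9]+")

# Compare the two on every string of up to 4 symbols over a small test alphabet.
for n in range(5):
    for chars in product("+-1a", repeat=n):
        s = "".join(chars)
        assert derives("Integer", s) == (INTEGER_RE.fullmatch(s) is not None), s

print(derives("Integer", "-42"), derives("Integer", "+"), derives("Integer", "4-2"))
# True False False
```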
6.2. Context-Free Grammars
Every production is of the form:
A => δ
where A is a nonterminal and δ can be any number of nonterminals or terminals. For example: A => a, A => aBcd, B => ae.
Most of our examples have been context-free grammars. They are used widely to define programming languages, and they subsume regular grammars.
6.3. Context-Sensitive Grammars
Every production is of the form:
α => δ
where α and δ can contain any number of terminals and nonterminals, but:
α must contain at least 1 nonterminal;
size(δ) >= size(α);
δ cannot be e.
For example: A => a, 11A => aB2d, B2 => ae.
Context-sensitive rules allow the grammar to specify a context for a rewrite.
E.g. with the rule A1a0 => 1b00, the string 2A1a01 becomes 21b001.
Context-sensitive grammars are more powerful than context-free grammars because of this context ability.
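A single context-sensitive rewrite is just a substring replacement: the rule fires only where its whole left-hand side, context included, appears. This sketch assumes one character per symbol; `rewrite_once` is a hypothetical helper that applies a rule at its leftmost match.

```python
def rewrite_once(form: str, lhs: str, rhs: str) -> str:
    """Apply the rule lhs => rhs at the leftmost place where lhs occurs in form."""
    i = form.find(lhs)
    if i < 0:
        raise ValueError(f"rule {lhs} => {rhs} does not apply to {form}")
    return form[:i] + rhs + form[i + len(lhs):]

print(rewrite_once("2A1a01", "A1a0", "1b00"))  # 21b001
```

Note that the same A in a different context, such as 2A11, cannot be rewritten by this rule, so `rewrite_once` raises `ValueError` there.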
Example
The language E = {012, 001122, 000111222, ... }, or, in brief, E = {0^n 1^n 2^n | n >= 1}, cannot be generated by any context-free grammar; it needs a context-sensitive grammar:
S => 0 A 1 2 | 0 1 2
A => 0 A 1 C | 0 1 C
C 1 => 1 C
C 2 => 2 2
Rewrite S to 001122:
S => 0 A 1 2        (using S => 0 A 1 2)
  => 0 0 1 C 1 2    (using A => 0 1 C)
  => 0 0 1 1 C 2    (using C 1 => 1 C)
  => 0 0 1 1 2 2    (using C 2 => 2 2)
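Since no context-sensitive rule makes a string shorter, every sentential form longer than the target length can be thrown away, so a bounded search finds all short strings of the language. The sketch below applies every rule at every position, breadth first, assuming one character per symbol; `language_up_to` is our own name.

```python
from collections import deque

# Context-sensitive grammar for { 0^n 1^n 2^n | n >= 1 }, one character per symbol.
RULES = [
    ("S", "0A12"), ("S", "012"),
    ("A", "0A1C"), ("A", "01C"),
    ("C1", "1C"),  ("C2", "22"),
]
NONTERMINALS = set("SAC")

def language_up_to(max_len: int) -> set[str]:
    """All terminal strings of length <= max_len derivable from S.
    No rule shortens a string (size(δ) >= size(α)), so any sentential form
    longer than max_len can be discarded without losing results."""
    seen = {"S"}
    queue = deque(["S"])
    result = set()
    while queue:
        form = queue.popleft()
        if not NONTERMINALS & set(form):
            result.add(form)
            continue
        for lhs, rhs in RULES:
            start = form.find(lhs)
            while start != -1:                 # try every position where lhs occurs
                new = form[:start] + rhs + form[start + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                start = form.find(lhs, start + 1)
    return result

print(sorted(language_up_to(9), key=len))  # ['012', '001122', '000111222']
```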
6.4. Unrestricted Grammars
Every production is of the form:
α => δ
where α and δ can contain any number of terminals and nonterminals, and α must contain at least 1 nonterminal.
There are no restrictions on size(δ): it may be smaller than size(α), and δ can be e.
For example: A => e, 11A => a, B2 => aeA.
Also called phrase-structure grammars; more general than context-sensitive grammars.
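Example
The language E = {e, 012, 001122, 000111222, ... }, or, in brief, E = {0^n 1^n 2^n | n >= 0}, contains the empty string e, which a context-sensitive grammar (as defined above) cannot generate, so it needs an unrestricted grammar:
S => 0 A 1 2 | e
A => 0 A 1 C | e
C 1 => 1 C
C 2 => 2 2
The new features are the e-productions S => e and A => e.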
Rewrite S to 012:
S => 0 A 1 2     (using S => 0 A 1 2)
  => 0 1 2       (using A => e)
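A derivation can be checked mechanically by confirming that each step follows from the previous one by a single rule. This sketch encodes the unrestricted grammar with the empty Python string standing for e, and uses a hypothetical helper `is_step` to verify the two-step derivation above. The step 0A12 => 012 shrinks the string, which the context-sensitive rules forbid.

```python
# Unrestricted grammar for { 0^n 1^n 2^n | n >= 0 }; "" stands for e.
RULES = [
    ("S", "0A12"), ("S", ""),
    ("A", "0A1C"), ("A", ""),
    ("C1", "1C"),  ("C2", "22"),
]

def is_step(before: str, after: str) -> bool:
    """True if `after` follows from `before` by one rule applied at one position."""
    for lhs, rhs in RULES:
        i = before.find(lhs)
        while i != -1:
            if before[:i] + rhs + before[i + len(lhs):] == after:
                return True
            i = before.find(lhs, i + 1)
    return False

derivation = ["S", "0A12", "012"]   # S => 0 A 1 2 => 0 1 2 (using A => e)
print(all(is_step(a, b) for a, b in zip(derivation, derivation[1:])))  # True
print(is_step("S", ""))             # True: S => e gives the empty string (n = 0)
```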
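6.5. Why so many Grammar Kinds?
More powerful grammars are more expressive, but they are also harder to implement efficiently: there is a trade-off between power and implementation. For example, membership in a regular language can be decided in linear time by a finite automaton, and in a context-free language in cubic time (e.g. by the CYK algorithm), whereas for unrestricted grammars membership is undecidable in general.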
For example, most compilers have two grammar-based components:
the lexical analyzer uses REs (regular grammars) to recognize basic syntactic categories (tokens) such as identifier and number;
the syntax analyzer uses context-free grammars to deal with complex syntactic categories such as loops and expressions.
Lexical and Syntax Analyzers
[Figure: inside the compiler, the program text file arrives as characters ('i' 'n' 't' ' ' 'x' '=' '4' '3' ';' ...); the lexical analyzer groups them into tokens (int, x, =, 43, ;); the syntax analyzer builds a parse tree from the tokens; the parse tree is passed on to code generation.]
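The lexical analyzer stage in the figure can be sketched with Python's `re` module: each token class is an RE, and the analyzer repeatedly matches the first class that fits at the current position. The token class names (`KEYWORD`, `ID`, `NUMBER`, `ASSIGN`, `SEMI`, `SKIP`) and the tiny token set are hypothetical, chosen just to cover the characters in the figure.

```python
import re

# Hypothetical token classes for a tiny C-like fragment; each one is a regular expression.
TOKEN_SPEC = [
    ("KEYWORD", r"\bint\b"),
    ("ID",      r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER",  r"[0-9]+"),
    ("ASSIGN",  r"="),
    ("SEMI",    r";"),
    ("SKIP",    r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text: str) -> list[tuple[str, str]]:
    """Group characters into (token class, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("int x=43;"))
# [('KEYWORD', 'int'), ('ID', 'x'), ('ASSIGN', '='), ('NUMBER', '43'), ('SEMI', ';')]
```

The resulting token list, not the raw characters, is what the syntax analyzer would then parse with a context-free grammar.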
7. More Information
Discrete Mathematics and its Applications, Kenneth H. Rosen, McGraw Hill, 2007, 7th edition, chapter 13, section 13.1.