Download presentation
Presentation is loading. Please wait.
1
Programming Languages
Grammars
2
Why Study This Chapter? To declare variables and constants
To be able to create identifiers To declare variables and constants To write assignment and I/O statements To design and write simple programs
3
Introduction Syntax: the form or structure of the expressions, statements, and program units Semantics: the meaning of the expressions, statements, and program units
4
Language Syntax and semantics provide a language’s definition
Users of a language definition Other language designers Implementers Programmers (the users of the language)
5
Syntax of a Languages Motivation for using a subset of C: Grammar
Language (pages) Reference Pascal 5 Jensen & Wirth C Kernighan & Richie C Stroustrup Java 14 Gosling, et. al. The grammar fits on one page, so it’s a far better tool for studying language design.
6
Examples of poor syntax
In FORTRAN, variables don’t have to be declared Therefore, every misspelling is a new variable In Pascal, semicolons go between statements Therefore, adding a statement to a block involves adding a semicolon to the previous line
7
Grammar a description of a language - can be used for generation or, given the grammar, a language recognizer (known as a parser) can be created
8
Formal Methods of Describing Syntax
Context-Free Grammars (CFGs) Developed by Noam Chomsky in the mid-1950s Language generators, meant to describe the syntax of natural languages Define a class of languages called context-free languages Backus-Naur Form (1959) or Backus Normal Form (BNF) Invented by John Backus to describe Algol 58 BNF is equivalent to context-free grammars
9
C Grammar: Statements Program int main ( ) { Declarations Statements } Declarations { Declaration } Declaration Type Identifier [ [ Integer ] ] { , Identifier [ [ Integer ] ] } ; Type int | bool | float | char Statements { Statement } Statement ; | Block | Assignment | IfStatement | WhileStatement Block { Statements } Assignment Identifier [ [ Expression ] ] = Expression ; IfStatement if ( Expression ) Statement [ else Statement ] WhileStatement while ( Expression ) Statement
10
C Grammar: Expressions
Expression Conjunction { || Conjunction } Conjunction Equality { && Equality } Equality Relation [ EquOp Relation ] EquOp == | != Relation Addition [ RelOp Addition ] RelOp < | <= | > | >= Addition Term { AddOp Term } AddOp + | - Term Factor { MulOp Factor } MulOp * | / | % Factor [ UnaryOp ] Primary UnaryOp - | ! Primary Identifier [ [ Expression ] ] | Literal | ( Expression ) | Type ( Expression )
11
Describing Syntax: Terminology
A sentence is a string of characters over some alphabet A language is a set of sentences A lexeme is the lowest level syntactic unit of a language (e.g., *, sum, begin) A token is a category of lexemes (e.g., identifier)
12
Lexeme & Token Lexeme Token Example (in Java): index = 2 * count + 17;
lowest level syntactic unit in the language Token a language category for the lexemes Example (in Java): index = 2 * count + 17; Lexeme: Token: index identifier = assignment_operator integer_literal * mult_operator count identifier addition_operator integer_literal ; semicolon
13
Formal Definition of Languages
Recognizers A recognition device reads input strings of the language and decides whether the input strings belong to the language Example: syntax analysis part of a compiler Generators A device that generates sentences of a language One can determine if the syntax of a particular sentence is correct by comparing it to the structure of the generator
14
Grammars A meta-language is a language used to define other languages
A grammar is a meta-language used to define the syntax of a language. It consists of: Finite set of terminal symbols Finite set of non-terminal symbols Finite set of production rules Start symbol Language = (possibly infinite) set of all sequences of symbols that can be derived by applying production rules starting from the start symbol Backus-Naur Form (BNF)
15
Describing Syntax Backus-Naur Form and Context-Free Grammars (BNF)
Most widely known method for describing programming language syntax Extended BNF Improves readability and writability of BNF
16
Backus-Naur Form (BNF)
Invented by John Backus to describe Algol 58 BNF is equivalent to context-free grammars BNF is a meta-language used to describe another language In BNF, abstractions are used to represent classes of syntactic structures--they act like syntactic variables (also called non-terminal symbols)
17
Example: Binary Digits
Consider the grammar: binaryDigit 0 binaryDigit 1 or equivalently: binaryDigit 0 | 1 Here, | is a metacharacter that separates alternatives.
18
Example: Decimal Numbers
Grammar for unsigned decimal integers Terminal symbols: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 Non-terminal symbols: Digit, Integer Production rules: Integer Digit | Integer Digit Digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 Start symbol: Integer Can derive any unsigned integer using this grammar Language = set of all unsigned decimal integers
19
Derivation of 352 as an Integer
Production rules: Integer Digit | Integer Digit Digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 Integer Integer Digit Integer 2 Integer Digit 2 Integer 5 2 Digit 5 2 3 5 2 Rightmost derivation At each step, the rightmost non-terminal is replaced
20
Parse Trees A hierarchical structure displaying the derivation of an expression in some grammar Leaf nodes are terminals, non-leaf nodes are non-terminals Parser Process which takes a sentence and breaks it into its component parts, deriving a parse tree If the parser cannot generate a parse tree, then the sentence is not legal If the parser can generate two or more parse trees for the same sentence, then the grammar is ambiguous You might recall parse trees from your grammar school days when you were learning English grammar. Here’s a very simple example of a parse tree in English. Here’s an example of an assignment statement broken down as a parse tree. Notice that each node has the lexeme and its associated (recognized or identified) token.
21
Parse Tree for 352 as an Integer
22
Grammar and Parse Tree <assign> <id> = <expr>
<id> A | B | C <expr> <id> + <expr> | <id> * <expr> | (<expr>) | <id> <assign> <id> = <expr> A <id> * <expr> B ( <expr> ) <id> <expr> A <id> C Above we see how the derivation is used to create the parse tree. Each step of the derivation provides a new level (depth) in the tree. The highest level starts with <assign>. We now match the rule <assign> <id> = <expr> and so the next level of the parse tree as <id> = <expr>. Next, we use the rule <id> A. The next rule is <expr> <id> * <expr>. We place A (on the left side of the tree) and <id> * <expr> (on the right side of the tree) at the same level because the two rules were applied independently. As we will see, the level that a rule was applied will have an implication on the order of precedence in terms of when the operation is physically performed by the computer. Above, we apply + first (after getting the values of A & C), then *, and finally =. Parse tree for the derivation <assign> <id> = <expr> A = <expr> A = <id> * <expr> A = B * <expr> A = B * (<expr>) A = B * (<id> + <expr>) A = B * (A + <expr>) A = B * (A + <id> ) A = B * (A + C)
23
Abstract Syntax Tree for If Statement
) exp true ( if else elsePart return if true st return
24
Arithmetic Expression Grammar
The following grammar defines the language of arithmetic expressions with 1-digit integers, addition, and subtraction. Expr Expr + Term | Expr – Term | Term Term 0 | ... | 9 | ( Expr )
25
Parse of the String 5-4+3
26
BNF Fundamentals Non-terminals: BNF abstractions
Terminals: lexemes and tokens Grammar: a collection of rules Examples of BNF rules: <ident_list> → identifier | identifier, <ident_list> <if_stmt> → if <logic_expr> then <stmt>
27
A derivation from the grammar
<program> => begin <stmt_list> end => begin <stmt> ; <stmt>_list> end => begin <var> = <expression> ; <stmt_list> end => begin A = <expression> ; <stmt_list> end => begin A = <var> + <var> ; <stmt_list> end => begin A = B + <var> ; <stmt_list> end => begin A = B + C ; <stmt_list> end => begin A = B + C ; <stmt> end => begin A = B + C ; <var> = <expression> end => begin A = B + C ; B = <expression> end => begin A = B + C ; B = <var> end => begin A = B + C ; B = C end So, the program begin A = B + C; B = C end is legal We generate a leftmost derivation, where the next rule applied is applied to the leftmost non-terminal symbol (which is why we worked on the first assignment statement before we generated the second <stmt>)
28
Regular Expressions x character x \x escaped character, e.g., \n
{ name } reference to a name M | N M or N M N M followed by N M* 0 or more occurrences of M M+ 1 or more occurrences of M [x1 … xn] One of x1 … xn Example: [aeiou] – vowels, [0-9] - digits
29
An Example Grammar <program> <stmts>
<stmts> <stmt> | <stmt> ; <stmts> <stmt> <var> = <expr> <var> a | b | c | d <expr> <term> + <term> | <term> - <term> <term> <var> | const
30
Derivation A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols), e.g., <program> => <stmts> => <stmt> => <var> = <expr> => a = <expr> => a = <term> + <term> => a = <var> + <term> => a = b + <term> => a = b + const
31
Examples S SS | (S)S | () | S SS (S)SS (S)S(S)S (S)S(())S
((())())(()) E E O E | (E) | id O + | - | * | / E E O E (E) O E (E O E) O E * ((E O E) O E) O E ((id O E)) O E) O E ((id + E)) O E) O E ((id + id)) O E) O E * ((id + id)) * id) + id
32
Leftmost Derivation Rightmost Derivation
Each step of the derivation is a replacement of the leftmost nonterminals in a sentential form. E E O E (E) O E (E O E) O E (id O E) O E (id + E) O E (id + id) O E (id + id) * E (id + id) * id Each step of the derivation is a replacement of the rightmost nonterminals in a sentential form. E E O E E O id E * id (E) * id (E O E) * id (E O id) * id (E + id) * id (id + id) * id
33
Parse Tree A hierarchical representation of a derivation
<program> <stmts> <stmt> <var> = <expr> a <term> + <term> <var> const b
34
CFG For Floating Point Numbers
::= stands for production rule; <…> are non-terminals; | represents alternatives for the right-hand side of a production rule Sample parse tree:
35
Ambiguity in Expressions
Which operation is to be done first? solved by precedence An operator with higher precedence is done before one with lower precedence. An operator with higher precedence is placed in a rule (logically) further from the start symbol. solved by associativity If an operator is right-associative (or left-associative), an operand in between 2 operators is associated to the operator to the right (left). Right-associated : W + (X + (Y + Z)) Left-associated : ((W + X) + Y) + Z
36
Associativity and Precedence
A grammar can be used to define associativity and precedence among the operators in an expression. E.g., + and - are left-associative operators in mathematics; * and / have higher precedence than + and - . Consider the grammar G1: Expr -> Expr + Term | Expr – Term | Term Term -> Term * Factor | Term / Factor | Term % Factor | Factor Factor -> Primary ** Factor | Primary Primary -> 0 | ... | 9 | ( Expr )
37
Precedence (cont’d) E E + E | E - E | F F F * F | F / F | X
X ( E ) | id (id1 + id2) * id3 * id4 * F X + E id4 id3 ) ( id1 id2
38
Precedence E E + E E E - E E E * E E E / E E id E F
F F * F F F / F F id + E * id3 id2 id1 E + * F id3 id2 id1
39
Associativity and Precedence
for Grammar Precedence Associativity Operators 3 right ** left * / % \ left Note: These relationships are shown by the structure of the parse tree: highest precedence at the bottom, and left-associativity on the left at each level.
40
Associativity of Operators
Operator associativity can also be indicated by a grammar <expr> -> <expr> + <expr> | const (ambiguous) <expr> -> <expr> + const | const (unambiguous) <expr> <expr> <expr> + const <expr> + const const
41
Associativity Left-associative operators E E + F | E - F | F
/ F X + E id3 ) ( * id1 id4 id2 Left-associative operators E E + F | E - F | F F F * X | F / X | X X ( E ) | id (id1 + id2) * id3 / id4 = (((id1 + id2) * id3) / id4)
42
Ambiguity in Grammars A grammar is ambiguous if and only if it generates a sentential form that has two or more distinct parse trees
43
Example of Dangling Else
With which ‘if’ does the following ‘else’ associate? if (x < 0) if (y < 0) y = y - 1; else y = 0; Answer: either one!
44
An Ambiguous Grammar <assign> <id> := <expr>
<id> A | B | C <expr> <expr> + <expr> | <expr> * <expr> | (<expr>) | <id> With this grammar, the sentence <assign> A = B + A * C has two distinct parse trees see next slide The reason this is important is that the second parse tree represents an interpretation of the expression where + has higher precedence than * which would give us an incorrect answer We need to ensure that * has a higher precedence than + so that the meaning of the above statement is really A = B + (A * C). Our grammar is ambiguous however, so that in fact the previous statement could possible be interpreted as A = (B + A) * C if we perform leftmost derivations. We see this on the next slide.
45
Two Parse Trees for A = B + A * C
<assign> <id> = <expr> A <expr> * <expr> <expr> <expr> <id> <id> <id> C B A <assign> <id> = <expr> A <expr> <expr> <id> <expr> * <expr> B <id> <id> A C The lower down the operator in the parse tree, the higher its precedence so on the left, * has a higher precedence than + (which is as it should be) but the tree on the right is incorrect, essentially being A = (B + A) * C even though there are no parentheses specified to alter the precedence
46
An Ambiguous Expression Grammar
<expr> <expr> <op> <expr> | const <op> / | - <expr> <expr> <expr> <op> <expr> <expr> <op> <op> <expr> <expr> <op> <expr> <expr> <op> <expr> const - const / const const - const / const
47
Example: Ambiguous Grammar
E E + E E E - E E E * E E E / E E id + E * id3 id2 id1 * E id2 + id1 id3
48
Solving the dangling else ambiguity
Algol 60, C, C++: associate each else with closest if; use {} or begin…end to override. Algol 68, Modula, Ada: use explicit delimiter to end every conditional (e.g., if…fi) Java: rewrite the grammar to limit what can appear in a conditional: IfThenStatement -> if ( Expression ) Statement IfThenElseStatement -> if ( Expression ) StatementNoShortIf else Statement The category StatementNoShortIf includes all except IfThenStatement.
49
Ambiguity in Expressions
Which operation is to be done first? solved by precedence An operator with higher precedence is done before one with lower precedence. An operator with higher precedence is placed in a rule (logically) further from the start symbol. solved by associativity If an operator is right-associative (or left-associative), an operand in between 2 operators is associated to the operator to the right (left). Right-associated : W + (X + (Y + Z)) Left-associated : ((W + X) + Y) + Z
50
An Unambiguous Expression Grammar
If we use the parse tree to indicate precedence levels of the operators, we cannot have ambiguity <expr> <expr> - <term> | <term> <term> <term> / const| const <expr> <expr> - <term> <term> <term> / const const const
51
3 common extensions to BNF:
Extended BNF Grammars 3 common extensions to BNF: [ ] - used to denote optional elements (saves some space so that we don’t have to enumerate options as separate possibilities) { } - used to indicate 0 or more instances ( ) - for a list of choices These extensions are added to a BNF Grammar for convenience allowing us to shorten the grammar EBNF is merely a convenience for helping someone write a grammar without having to spend a lot of time enumerating all of the cases.
52
Extended BNF Optional parts are placed in brackets [ ]
<proc_call> -> ident [(<expr_list>)] Alternative parts of RHSs are placed inside parentheses and separated via vertical bars <term> → <term> (+|-) const Repetitions (0 or more) are placed inside braces { } <ident> → letter {letter|digit}
53
BNF and EBNF BNF <expr> <expr> + <term> EBNF
<term> <term> * <factor> | <term> / <factor> | <factor> EBNF <expr> <term> {(+ | -) <term>} <term> <factor> {(* | /) <factor>}
54
Extended Backus-Naur Form (EBNF)
Kleene’s Star/ Kleene’s Closure Seq ::= St {; St} Seq ::= {St ;} St Optional Part IfSt ::= if ( E ) St [else St] E ::= F [+ E ] | F [- E]
55
Recent Variations in EBNF
Alternative RHSs are put on separate lines Use of a colon (::=) instead of => Use of opt for optional parts Use of oneof for choices e.g., AssignmentOperator one of = *= /= %= += -= <<= >>= &= |= ̂=
56
Finite State Automata Set of states
Usually represented as graph nodes Input alphabet + unique “end of program” symbol State transition function Usually represented as directed graph edges (arcs) Automaton is deterministic if, for each state and each input symbol, there is at most one outgoing arc from the state labeled with the input symbol Unique start state One or more final (accepting) states
57
Syntax Diagrams Graphical representation of EBNF rules Examples
nonterminals: terminals: sequences and choices: Examples X ::= (E) | id Seq ::= {St ;} St E ::= F [+ E ] IfSt id ( ) E id St ; F + E
58
DFA for C Identifiers
59
Syntax Diagram for Expressions with Addition
60
Semantics of Bit Pattern : 0101110
Number (if base 2) (if base 10) Fraction 23/64 Unary encoding “13” ASCII Character “.” Switch positions ON-OFF-ON-… Binary string “ ”
61
Motivation for Precise Specification
Capture subtly different semantics for the same syntax in various languages. Arrays and strings. Parameter passing mechanisms. Scoping rules. Primitive types vs Composite types. Type equivalence.
62
Attribute Grammars (AGs)
An attribute grammar is a device used to describe more of the structure of a programming language than can be described with a CFG. It is an extension to a CFG. Attribute grammars (AGs) have additions to CFGs to carry some semantic info on parse tree nodes
63
Attribute Grammars Defined
X: grammar symbol A(X): a set of attributes associated to X. A(X) consists of two disjoint sets S(X) and I(X). S(X): Synthesized attributes, are used to pass semantic information up a parse tree. I(X): Inherited attributes, are used to pass semantic information down and across a parse tree. Let X0 X1 ... Xn be a rule. Functions of the form S(X0) = f(A(X1), ... , A(Xn)) define synthesized attributes of X0. The value of a synthesized attribute on a parse tree node depends only on that node’s children node. Functions of the form I(Xj) = f(A(X0), ... , A(Xn)), for 1 <= j <= n, define inherited attributes. The value of a inherited attribute on a parse tree node depends only on that node’s parent node, and those of its sibling nodes.
64
Synthesized Attributes
N . val := 0 N . len := 1 N . val := 1 N . val := N. val N . len := N. len + 1 N . val := N. val + 2^ N. len N . len := N. len + 1 N -> 0 N -> 1 N -> 0 N N -> 1 N Right recursive rules; std. semantics
65
Attribute Grammars Defined (cont’d)
A Predicate function has the form of a Boolean expression on the union of the attribute set {A(X0), ... , A(Xn)} and a set of literal attribute values. Intrinsic attributes: synthesied attributes of leaf nodes whose values are determined outside the parse tree. If all the attribute values in a parse tree have been computed, the tree is said to be fully attributed. Although in practice it is not always done this way, it is convenient to think of attribute values as being computed after the complete unattributed parse tree has been constructed by the compiler.
66
Inherited Attributes Declaration and Use
{ int i, j, k; i := i + j + j; } <assign-stm> -> <var> := <expr> <var>.env := <assign-stm>.env <expr>.env := <assign-stm>.env
67
Inherited Attributes Coercion Code Generation 5.0 + 2
coerce_int_to_real Determination of un-initialized variables Determination of reachable non-terminals Evaluation of an expression containing variables
68
Attribute Grammars: An Example
This example illustrates how an AG can be used to check the type rules of a simple assignment statement. A, B, C : variable name, they can be int or real RHS of the assignment can be a variable or an expression. When there are two variables on RHS, they need not be same type. When operand types are not the same, the type of the expression is always real. When operand types are the same, the type of the expression is that of operand. The type of LHS must match the type of RHS. The types of operands in RHS can be mixed, but the assignment is valid only if the target and the value resulting from RHS have the same type.
69
Fully Attributed Parse Tree
<assign> expected_type = real actual_type = real <var> = <expr> expected_type actual_type = int <var>[2] + <var>[3] expected_type = real actual_type = real expected_type = real actual_type = real A A B 57
70
Quiz Convert EBNF to BNF S A { b A} A a [b] A
71
Quiz Prove that the grammar is ambiguous <S> <A>
<A> <A> * <A> | <id> <id> x | y | z
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.