1
Chapter 3 Describing Syntax and Semantics
2
Chapter 3: Describing Syntax and Semantics
Objectives: - Introduction - The general problem of describing syntax - Formal methods of describing syntax - Denotational semantics - Axiomatic semantics - Operational semantics
3
3.1 Introduction
A language is a set of sentences or statements. Sentences or statements are the valid strings of a language. They consist of characters from the language's alphabet, sequenced in a way that is consistent with the grammar of the language. Who must use language definitions? - Language designers - Implementers - Programmers (the users of the language) A formal description of the language is essential in learning, writing, and implementing the language. Thus the description must be precise and understandable.
4
- Languages are described by their syntaxes and semantics
What is the meaning of syntax? It means the form or structure of the expressions, statements, and program units. The syntax rules of a language specify which strings of characters from the language’s alphabet are in the language. What is the meaning of semantics? It means the meaning of the expressions, statements, and program units. - Syntax describes what the language looks like. - Semantics specifies, in a formal way, what a particular construct actually does. - Syntax is much easier to describe than semantics.
5
3.2 The General Problem of Describing Syntax: Terminology
Formal descriptions of the syntax of programming languages, for simplicity's sake, often do not include descriptions of the lowest-level syntactic units. These small units are called lexemes. A lexeme is the basic unit identified during lexical analysis. It is a sequence of characters from the language's alphabet, e.g. numeric literals, operators (+), and special words (begin). Lexemes are partitioned into groups. For example, the names of variables, methods, classes, and so forth in a programming language form a group called identifiers. Each lexeme group is represented by a name, or a token. A token of a language is a category of its lexemes, and a lexeme is an instance of a token. For example, an identifier is a token that can have lexemes, or instances, such as counter and total (variable names).
6
Syntax – the form of the expressions, statements, and program units
Semantics - the meaning of the expressions, statements, and program units. Ex: while (<Boolean_expr>)<statement> The semantics of this statement form is that when the current value of the Boolean expression is true, the embedded statement is executed. The form of a statement should strongly suggest what the statement is meant to accomplish.
7
Example: consider the following Java statement index = 2 * count + 17;
The lexemes and tokens of this statement are represented in the following table:
Lexemes    Tokens
index      identifier
=          equal_sign
2          int_literal
*          mult_op
count      identifier
+          plus_op
17         int_literal
;          semicolon
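The lexeme/token split for this statement can be sketched as a small scanner. This is only an illustration: the token names follow the slide's table (with `plus_op` assumed for `+`), while the regular expressions and the function name `tokenize` are assumptions, not part of the original material.

```python
import re

# Token names follow the slide's table; the patterns themselves are assumptions.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("equal_sign", r"="),
    ("mult_op", r"\*"),
    ("plus_op", r"\+"),
    ("semicolon", r";"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (lexeme, token) pairs for the given source text."""
    pairs = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "skip":          # whitespace separates lexemes but is not one
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs
```

Calling `tokenize("index = 2 * count + 17;")` yields one pair per lexeme, each tagged with the token (category) it is an instance of.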
8
3.2.1 Language Recognizer 3.2.2 Language Generators
In general, languages can be formally defined in two distinct ways: A language can be (1) generated or (2) recognized. 3.2.1 Language Recognizer A recognizer of a language distinguishes the strings that are in the language from those that are not. The lexical analyzer and the parser of a compiler are recognizers for the language the compiler translates. The lexical analyzer recognizes tokens, and the parser recognizes the syntactic structure. 3.2.2 Language Generators A generator produces the valid sentences of a language. In some cases it is more useful than a recognizer, since we can “watch and learn” from the sentences it generates.
9
3.3 Formal Methods of Describing Syntax
The formal language that is used to describe the syntax of programming languages is called a grammar. 3.3.1 BNF (Backus-Naur Form) and Context-Free Grammars BNF is a widely accepted way to describe the syntax of a programming language. Context-Free Grammars Regular expressions and context-free grammars are both useful in describing the syntax of a programming language. A regular expression describes how a token is formed from characters of the alphabet, and a context-free grammar determines how the tokens are put together into a valid sentence of the language.
10
3.3 Formal Methods of Describing Syntax
Origins of Backus-Naur Form (BNF) John Backus and Peter Naur used BNF to describe ALGOL 58 and ALGOL 60, and BNF is nearly identical to context-free grammars. BNF Fundamentals A metalanguage is a language that is used to describe another language. BNF is a metalanguage for programming languages. BNF consists of rules (or productions). A rule has a left-hand side (LHS), the abstraction being defined, and a right-hand side (RHS), its definition. For example, a Java assignment statement could be defined as follows: <assign> → <var> = <expression>
11
Nonterminals are often enclosed in angle brackets < >
The abstractions in a BNF description, or grammar, are often called nonterminal symbols, or simply nonterminals. Lexemes and tokens of the rules are called terminal symbols, or simply terminals. A rule has a left-hand side (LHS), which is a nonterminal, and a right-hand side (RHS), which is a string of terminals and/or nonterminals, i.e. a mixture of tokens, lexemes, and references to other abstractions. Nonterminals are often enclosed in angle brackets < >. A BNF description, or grammar, is a collection of rules. A rule indicates that wherever the nonterminal on its LHS appears, it can be replaced with the RHS, just like expanding a nonterminal into its children in a tree.
12
<if_stmt> → if ( <logic_expr> ) <stmt>
An abstraction (or nonterminal symbol) can have more than one RHS. e.g.
<if_stmt> → if ( <logic_expr> ) <stmt>
          | if ( <logic_expr> ) <stmt> else <stmt>
Describing Lists Recursion is used in BNF to describe lists. e.g.
<ident_list> → ident | ident, <ident_list>
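The recursive list rule maps directly onto a recursive function. The following is a minimal sketch, assuming the input has already been reduced to a list of token strings in which every identifier appears as the string "ident"; the function name `is_ident_list` is hypothetical.

```python
def is_ident_list(tokens):
    """Recognize <ident_list> -> ident | ident , <ident_list> on a list of tokens."""
    if not tokens or tokens[0] != "ident":
        return False
    if len(tokens) == 1:
        return True                      # first alternative: a single ident
    # second alternative: ident , <ident_list> (the recursion handles the rest)
    return tokens[1] == "," and is_ident_list(tokens[2:])
```

Each recursive call consumes one `ident ,` pair, so a list of any length is accepted by repeatedly applying the second alternative and finishing with the first.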
13
3.3.1.5 Grammars and Derivations
BNF is a generator of the language. The sentences of a language can be generated from the start symbol by applying a series of rules to it. - A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols). - Replacing a nonterminal with different RHSs may derive different sentences. - Each string of symbols during a derivation is a sentential form. - A sentence is a sentential form that has only terminal symbols. - If the nonterminal replaced at every step is the leftmost one in the sentential form, the derivation is a leftmost derivation. However, the set of sentences generated is not affected by the derivation order.
14
Example (3.1) A grammar for a small language
<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> = <expression>
<var> → A | B | C
<expression> → <var> + <var> | <var> - <var> | <var>
A derivation of a program in this language follows:
<program> => begin <stmt_list> end
          => begin <stmt> ; <stmt_list> end
          => begin <var> = <expression> ; <stmt_list> end
          => begin A = <expression> ; <stmt_list> end
          => begin A = <var> + <var> ; <stmt_list> end
          => begin A = B + <var> ; <stmt_list> end
          => begin A = B + C ; <stmt_list> end
          => begin A = B + C ; <stmt> end
          => begin A = B + C ; <var> = <expression> end
          => begin A = B + C ; B = <expression> end
          => begin A = B + C ; B = <var> end
          => begin A = B + C ; B = C end
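The derivation of Example 3.1 can be replayed mechanically. The sketch below encodes the grammar as a dictionary and always expands the leftmost nonterminal. The representation, the function name `leftmost_derivation`, and the `CHOICES` list (the index of the RHS alternative picked at each step) are assumptions made for illustration.

```python
# Grammar of Example 3.1: each nonterminal maps to its list of alternative RHSs.
GRAMMAR = {
    "<program>": [["begin", "<stmt_list>", "end"]],
    "<stmt_list>": [["<stmt>"], ["<stmt>", ";", "<stmt_list>"]],
    "<stmt>": [["<var>", "=", "<expression>"]],
    "<var>": [["A"], ["B"], ["C"]],
    "<expression>": [["<var>", "+", "<var>"], ["<var>", "-", "<var>"], ["<var>"]],
}


def leftmost_derivation(start, choices):
    """Yield each sentential form; choices[i] selects the RHS used at step i."""
    form = [start]
    yield " ".join(form)
    for choice in choices:
        # Locate the leftmost nonterminal in the current sentential form.
        index = next((i for i, symbol in enumerate(form) if symbol in GRAMMAR), None)
        if index is None:
            raise ValueError("no nonterminal left to expand")
        form = form[:index] + GRAMMAR[form[index]][choice] + form[index + 1:]
        yield " ".join(form)


# Alternative indices that reproduce the derivation of "begin A = B + C ; B = C end".
CHOICES = [0, 1, 0, 0, 0, 1, 2, 0, 0, 1, 2, 2]
```

Iterating over `leftmost_derivation("<program>", CHOICES)` produces the thirteen sentential forms of the derivation in order, ending with the sentence itself. A different `CHOICES` list derives a different sentence from the same grammar.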
15
Example (3.2) A grammar for simple assignment statements
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>
A = B * (A + C) is generated by the leftmost derivation:
<assign> => <id> = <expr>
         => A = <expr>
         => A = <id> * <expr>
         => A = B * <expr>
         => A = B * ( <expr> )
         => A = B * ( <id> + <expr> )
         => A = B * ( A + <expr> )
         => A = B * ( A + <id> )
         => A = B * ( A + C )
16
A parse tree for the simple statement: A = B * (A + C)
Parse Trees A derivation can be represented by a hierarchical structure called a parse tree. The root of the tree is the start symbol, and applying a rule corresponds to expanding a nonterminal in the tree into its children. A parse tree for the simple statement A = B * (A + C), with children indented under their parent:
<assign>
  <id>
    A
  =
  <expr>
    <id>
      B
    *
    <expr>
      (
      <expr>
        <id>
          A
        +
        <expr>
          <id>
            C
      )
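A parse tree like the one for A = B * (A + C) can be built by a small recursive-descent parser for the grammar of Example 3.2. This is a sketch under stated assumptions: the input is a list of single-character tokens, the tree is represented as nested tuples, and the function names are hypothetical.

```python
def parse_assign(tokens):
    """Parse <assign> -> <id> = <expr>; return a nested-tuple parse tree."""
    tree_id, pos = parse_id(tokens, 0)
    if pos >= len(tokens) or tokens[pos] != "=":
        raise SyntaxError("expected '='")
    tree_expr, pos = parse_expr(tokens, pos + 1)
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return ("<assign>", tree_id, "=", tree_expr)


def parse_id(tokens, pos):
    """Parse <id> -> A | B | C; return (subtree, next position)."""
    if pos < len(tokens) and tokens[pos] in ("A", "B", "C"):
        return ("<id>", tokens[pos]), pos + 1
    raise SyntaxError("expected A, B, or C")


def parse_expr(tokens, pos):
    """Parse <expr>; return (subtree, next position)."""
    # <expr> -> ( <expr> )
    if pos < len(tokens) and tokens[pos] == "(":
        inner, pos = parse_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return ("<expr>", "(", inner, ")"), pos + 1
    # <expr> -> <id> + <expr> | <id> * <expr> | <id>
    left, pos = parse_id(tokens, pos)
    if pos < len(tokens) and tokens[pos] in ("+", "*"):
        op = tokens[pos]
        right, pos = parse_expr(tokens, pos + 1)
        return ("<expr>", left, op, right), pos
    return ("<expr>", left), pos
```

For example, `parse_assign(list("A=B*(A+C)"))` returns a tuple whose first element is the node label and whose remaining elements are its children, mirroring the tree structure. Note that this grammar does not generate a sentence such as A = (A) + B, so the parser rejects it.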
17
Ambiguity A grammar is ambiguous if, for some sentence, there is more than one parse tree, i.e., there are two distinct leftmost derivations of the same sentence. Two distinct parse trees (generated by the grammar on the next slide) for the same sentence, A = B + C * A:
Tree 1 (+ at the root of the expression, so C * A is grouped first)
<assign>
  <id>
    A
  =
  <expr>
    <expr>
      <id>
        B
    +
    <expr>
      <expr>
        <id>
          C
      *
      <expr>
        <id>
          A
Tree 2 (* at the root of the expression, so B + C is grouped first)
<assign>
  <id>
    A
  =
  <expr>
    <expr>
      <expr>
        <id>
          B
      +
      <expr>
        <id>
          C
    *
    <expr>
      <id>
        A
18
An Ambiguous Grammar
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id>
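One way to see the ambiguity concretely is to count how many parse trees the <expr> rule of this grammar admits for a given token sequence. The following is a minimal sketch, assuming single-character tokens; the function name `count_expr_trees` and the memoized span-counting approach are illustrative choices, not taken from the slides.

```python
from functools import lru_cache


def count_expr_trees(tokens):
    """Count the parse trees of <expr> for the whole token sequence."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        """Number of parse trees deriving tokens[i:j] from <expr>."""
        if j - i == 1:
            # <expr> -> <id>, with <id> -> A | B | C
            return 1 if tokens[i] in ("A", "B", "C") else 0
        total = 0
        # <expr> -> ( <expr> )
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)
        # <expr> -> <expr> + <expr> | <expr> * <expr>: try each operator as the root
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))
```

A result greater than 1 for any sentence shows that the grammar is ambiguous. For the right-hand side of A = B + C * A, `count_expr_trees(list("B+C*A"))` finds two trees, one with + at the root and one with * at the root.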
19
<expr> → <expr> - <term> | <term>
Operator Precedence Operator precedence can be maintained by modifying a grammar so that operators with higher precedence are grouped with their operands first, so that they appear lower in the parse tree. If the grammar forces the parse tree to reflect the precedence levels of the operators, this source of ambiguity is removed. <expr> → <expr> - <term> | <term> <term> → <term> / const | const
20
An Unambiguous Grammar
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> + <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → ( <expr> ) | <id>
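The effect of the <expr>, <term>, and <factor> layers can be made visible with a small parser that returns a fully parenthesized string showing how operands are grouped. This is a sketch, not a definitive implementation: the left-recursive rules are written as loops (a standard recursive-descent technique, which preserves the left-to-right grouping), tokens are assumed to be single characters, and the function name `group` is hypothetical.

```python
def group(tokens):
    """Return a fully parenthesized string showing how <expr> groups its operands."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def expr():
        # <expr> -> <expr> + <term> | <term>, with the left recursion written as a loop
        left = term()
        while peek() == "+":
            advance()
            left = f"({left} + {term()})"
        return left

    def term():
        # <term> -> <term> * <factor> | <factor>
        left = factor()
        while peek() == "*":
            advance()
            left = f"({left} * {factor()})"
        return left

    def factor():
        # <factor> -> ( <expr> ) | <id>
        token = peek()
        if token == "(":
            advance()
            inner = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return inner
        if token in ("A", "B", "C"):
            advance()
            return token
        raise SyntaxError(f"unexpected token {token!r}")

    result = expr()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return result
```

Because * is handled one level below +, `group(list("B+C*A"))` groups C * A before adding B, and there is only one way to parse the sentence.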
21
3.3.1.9 Associativity of Operators
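Operator associativity can also be indicated by a grammar. <expr> → <expr> + <expr> | const (ambiguous) <expr> → <expr> + const | const (unambiguous) In the unambiguous rule the left recursion makes + left associative: every application of the rule has the form <expr> + const, so earlier additions sit lower in the parse tree.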
22
3.3.2 Extended BNF
- EBNF has the same expressive power as BNF. - The extensions are optional constructs (parts), repetition, and multiple choice, very much like regular expressions. - Optional parts are placed in brackets ([ ]) <if_stmt> → if (<expression>) <statement> [else <statement>] - Alternative parts of RHSs are placed in parentheses and separated with vertical bars <term> → <term> (* | / | %) <factor> - Repetitions (0 or more) are placed inside braces ({ }) <ident_list> → <identifier> {, <identifier>}
23
3.3.2 Extended BNF (contd.)
BNF
<expr> → <expr> + <term> | <expr> - <term> | <term>
<term> → <term> * <factor> | <term> / <factor> | <factor>
EBNF
<expr> → <term> {(+ | -) <term>}
<term> → <factor> {(* | /) <factor>}
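The EBNF form translates almost line for line into code: each pair of braces becomes a `while` loop and each parenthesized choice becomes a test on the next token. The sketch below evaluates arithmetic expressions this way. Since the slide does not define <factor>, the rule <factor> → integer | ( <expr> ) is an assumption, as are the function name `evaluate` and the use of Python's `/` for division.

```python
import re


def evaluate(text):
    """Evaluate an arithmetic expression using the EBNF rules for <expr> and <term>."""
    tokens = re.findall(r"\d+|[-+*/()]", text)   # other characters are ignored here
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():
        # <expr> -> <term> {(+ | -) <term>}: the braces become a while loop
        value = term()
        while peek() in ("+", "-"):
            if take() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():
        # <term> -> <factor> {(* | /) <factor>}
        value = factor()
        while peek() in ("*", "/"):
            if take() == "*":
                value *= factor()
            else:
                value /= factor()
        return value

    def factor():
        # Assumed rule (not in the slide): <factor> -> integer | ( <expr> )
        token = take()
        if token == "(":
            value = expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is not None and token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")

    result = expr()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return result
```

The loops accumulate from left to right, so `evaluate("8 - 3 - 2")` computes (8 - 3) - 2, matching the left associativity that the left-recursive BNF rules express.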
24
3.4 Attribute Grammars An attribute grammar is an extension of a context-free grammar. It is used to describe more of the structure of a programming language than can be described with a context-free grammar. The extension allows certain language rules to be described, such as type compatibility. 3.4.1 Static Semantics - There are many restrictions on programming languages that are either difficult or impossible to describe in BNF. However, they can be described when we add attributes to the terminals and nonterminals in BNF. - These added attributes can be computed at compile time, thus the name static semantics.
25
3.4 Attribute Grammars 3.4.1 Static Semantics (contd.)
- Some rules cannot be specified in a context-free grammar but can be tested at compile time, e.g. the rule that a variable must be declared before it is referenced. - Other rules can be specified in the grammar of a language, but would unnecessarily complicate the grammar, e.g. the rule in Java that a string literal cannot be assigned to a variable that was declared to be of type int.
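Both kinds of static-semantics rules can be checked with a symbol table that is filled in as declarations are seen. The sketch below is a simplified illustration rather than an attribute grammar: the tuple encoding of statements, the function name `check_static_semantics`, and the error messages are all assumptions.

```python
def check_static_semantics(statements):
    """Return a list of error messages for a list of statement tuples.

    Each statement is ("declare", type_name, variable) or ("assign", variable, value).
    """
    declared = {}          # symbol table: variable name -> declared type
    errors = []
    for statement in statements:
        if statement[0] == "declare":
            _, var_type, name = statement
            declared[name] = var_type
        elif statement[0] == "assign":
            _, name, value = statement
            if name not in declared:
                # Rule 1: a variable must be declared before it is referenced.
                errors.append(f"{name} is used before it is declared")
            elif declared[name] == "int" and isinstance(value, str):
                # Rule 2: a string literal cannot be assigned to an int variable.
                errors.append(f"cannot assign a string literal to int variable {name}")
    return errors
```

Both checks run without executing the program, which is what makes them static. For example, a program that assigns to `total` without declaring it and then assigns `"hello"` to an `int` variable `count` produces one message for each violation.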
26
3.5 Describing the meaning of Programs: Dynamic Semantics
- The dynamic semantics of a program is the meaning of its expressions, statements, and program units. Dynamic semantics determines the meanings of programming constructs during their execution. - There is no single widely accepted notation or formalism for describing semantics. - There are several needs for a methodology and notation for semantics: programmers need to know what statements mean, and compiler writers must know exactly what language constructs do. - Three methods are used to describe semantics formally: - Operational semantics - Axiomatic semantics - Denotational semantics
27
Operational Semantics
The idea behind operational semantics is to describe the meaning of a statement or program by specifying the effects of running it on a machine.
28
3.5.2 Denotational Semantics
- It is the most rigorous widely known method for describing the meaning of programs. - Based on recursive function theory. - The most abstract semantics description method. - Originally developed by Scott and Strachey (1970). - The process of building a denotational specification for a language (not necessarily easy): - Define a mathematical object for each language entity. - Define a function that maps instances of the language entities onto instances of the corresponding mathematical objects. - The method is named denotational because the mathematical objects denote the meaning of their corresponding syntactic entities.
29
The difference between denotational and operational semantics:
In operational semantics, programming language constructs are translated into simpler programming language constructs. In denotational semantics, programming language constructs are mapped to mathematical objects.
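The idea of mapping a construct to a mathematical object can be sketched for arithmetic expressions: the meaning of an expression is taken to be a function from program states to integers. This is an illustrative assumption, not the formal notation of denotational semantics. The tuple encoding of expressions and the function name `meaning` are hypothetical.

```python
def meaning(expr):
    """Map an expression to a function from states (dicts) to integers.

    An expression is an int literal, a variable name (str), or a tuple
    (operator, left, right) with operator "+" or "*".
    """
    if isinstance(expr, int):
        return lambda state: expr                 # a literal denotes a constant function
    if isinstance(expr, str):
        return lambda state: state[expr]          # a variable denotes a lookup in the state
    op, left, right = expr
    m_left, m_right = meaning(left), meaning(right)
    if op == "+":
        return lambda state: m_left(state) + m_right(state)
    if op == "*":
        return lambda state: m_left(state) * m_right(state)
    raise ValueError(f"unknown operator {op!r}")
```

The meaning of a compound expression is built only from the meanings of its parts. For instance, `meaning(("+", ("*", 2, "count"), 17))` is a function that, applied to the state `{"count": 4}`, yields 25.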
30
The logical expressions are called predicates, or assertions.
An assertion immediately preceding a program statement describes the constraints on the program variables at that point in the program (precondition). An assertion immediately following a statement describes the new constraints on those variables (and possibly others) after execution of the statement (postcondition). Precondition and postcondition assertions are presented in braces to distinguish them from parts of program statements: {x > 10} sum = 2 * x + 1 {sum > 1} Developing an axiomatic description or proof of a given program requires that every statement in the program have both a precondition and a postcondition.
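The triple {x > 10} sum = 2 * x + 1 {sum > 1} can be spot-checked by running the statement on sample states. The sketch below is a finite test, not a proof, and everything in it (the dictionary representation of a state, the names `holds`, `assign_sum`, `P`, `Q`, and the sample range) is an assumption for illustration.

```python
def holds(precondition, statement, postcondition, states):
    """Check {P} S {Q} on every sample state; a finite test, not a proof."""
    for state in states:
        if precondition(state):
            result = statement(dict(state))   # run S on a copy of the state
            if not postcondition(result):
                return False
    return True


def assign_sum(state):
    """The statement sum = 2 * x + 1, modeled as a state transformer."""
    state["sum"] = 2 * state["x"] + 1
    return state


def P(state):
    return state["x"] > 10        # precondition {x > 10}


def Q(state):
    return state["sum"] > 1       # postcondition {sum > 1}


SAMPLE_STATES = [{"x": x} for x in range(-50, 51)]
```

Here `holds(P, assign_sum, Q, SAMPLE_STATES)` succeeds, while a precondition that is too weak, such as x > -5, fails because x = -4 gives sum = -7. An axiomatic proof establishes the same property for all states by logical inference instead of sampling.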
31
Axiomatic Semantics Axiomatic semantics is based on mathematical logic. Axiomatic semantics was defined in conjunction with the development of a method to prove the correctness of programs, rather than to directly specify the meaning of a program. Each statement of a program is both preceded and followed by a logical expression that specifies constraints on program variables. Simple Boolean expressions are often adequate to express the constraints.
32
Example: Correctness of a program segment
Show that the program segment y = 2; z = x + y is correct with respect to the precondition {P}: x = 1 and the postcondition {Q}: z = 3. Solution: Suppose that P is true, so that x = 1 as the program begins. Then y is assigned 2, and z is assigned the sum of x and y, which is 3. Hence the program segment is correct: {x = 1} y = 2; z = x + y {z = 3}, thus {P} statement {Q} is true.
33
Summary
- BNF and context-free grammars are equivalent metalanguages, well suited for describing the syntax of programming languages. - There are three primary methods of semantics description: operational, axiomatic, and denotational.