1 Syntax and Semantics
The Purpose of Syntax
Problem of Describing Syntax
Formal Methods of Describing Syntax
Derivations and Parse Trees
Sebesta Chapter 3
2 What is Syntax and Semantics
Syntax and semantics define a programming language (PL)
Syntax
  – form or structure of program units: expressions, statements, declarations, etc.
Semantics
  – meaning of program units: expressions, statements, declarations, etc.
Why do we need language definitions?
  – to design a language
  – to implement a compiler/interpreter
  – to write a program (use the language)
3 Syntax Elements
A sentence is
  – a string of characters over some alphabet
A language is
  – a set of sentences
A lexeme is
  – the lowest-level syntactic unit of a language, e.g., *, public, totalCount
A token is
  – a category of lexemes, e.g., identifier
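The lexeme/token distinction can be sketched in a few lines of Python. The token categories and regular expressions below are assumptions chosen for illustration (only the lexemes `*`, `public`, `totalCount` and the category `identifier` come from the slide); a real language defines its own.

```python
import re

# Hypothetical token categories; the order matters because keywords
# would otherwise be classified as identifiers.
TOKEN_SPEC = [
    ("keyword", r"\b(?:public|while|if|else)\b"),
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("int_literal", r"\d+"),
    ("operator", r"[*+\-/=]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return (token, lexeme) pairs; raise ValueError on an unknown character."""
    pairs = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "skip":  # whitespace separates lexemes but is not one
            pairs.append((match.lastgroup, match.group()))
        pos = match.end()
    return pairs


PAIRS = tokenize("public totalCount * 2")
```

Here `PAIRS` holds one pair per lexeme, so `totalCount` is reported as an `identifier` and `*` as an `operator`.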
4 Describing Syntax
Recognizers
  – read an input string over the alphabet of the language and decide whether it belongs to the language (i.e., whether it is a sentence)
  – are used in compilers
  – see Chapter 4 for details
Generators
  – produce sentences of a language
  – a sentence is syntactically correct if it can be generated by the generator
5 Backus-Naur Form (BNF)
BNF is a meta-language
  – i.e., a language used to describe another language
  – invented by John Backus to describe ALGOL 58
  – used by Peter Naur to describe ALGOL 60
BNF is equivalent to context-free grammars
A BNF grammar is defined by
  – a set of terminal symbols
  – a set of nonterminal symbols
  – a set of rules
  – a start symbol (one of the nonterminal symbols)
6 BNF Elements
Terminal symbols
  – are the lexemes of the target PL, e.g., while, (, )
Nonterminal symbols
  – represent classes of syntactic structures
  – act like syntactic variables, e.g., <while_stmt>
Rules
  – define how a nonterminal symbol can be developed into a sequence of nonterminal and terminal symbols
  – e.g., <while_stmt> → while ( <logic_expr> ) <stmt>
7 BNF Rules
A rule has
  – a left-hand side (LHS)
  – then →
  – a right-hand side (RHS)
There can be several rules for one LHS
  <stmt> → <single_stmt>
  <stmt> → begin <stmt_list> end
Syntactic lists are described using recursion
  <ident_list> → ident
  <ident_list> → ident, <ident_list>
A grammar is
  – a finite nonempty set of rules
8 EBNF
Extended BNF (EBNF)
  – is most often used
  – avoids having numerous rules for the same LHS
Extra meta-symbols (in addition to →)
  – [ … ] enclosed symbols are optional (0 or 1 times)
    e.g., <if_stmt> → if ( <expr> ) <stmt> [ else <stmt> ]
  – { … } enclosed symbols can be repeated (0 to n times)
    e.g., <ident_list> → ident { , ident }
  – … | … choice of one of the symbol sequences separated by |
    e.g., <stmt> → <single_stmt> | begin <stmt_list> end
  – ( … ) groups the enclosed symbols
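9 BNF vs. EBNF
BNF
  <expr> → <expr> + <term>
  <expr> → <expr> - <term>
  <expr> → <term>
  <term> → <term> * <factor>
  <term> → <term> / <factor>
  <term> → <factor>
  <factor> → <exp> ** <factor>
  <factor> → <exp>
  <exp> → ( <expr> )
  <exp> → id
EBNF
  <expr> → <term> { ( + | - ) <term> }
  <term> → <factor> { ( * | / ) <factor> }
  <factor> → <exp> [ ** <factor> ]
  <exp> → ( <expr> ) | id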
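One way to see how the EBNF meta-symbols map onto code is a recursive-descent recognizer: each rule becomes a function, `{ … }` becomes a `while` loop and `[ … ]` becomes an `if`. This is a minimal sketch, assuming the four EBNF rules use the nonterminal names `expr`, `term`, `factor` and `exp` (the names are an assumption) and that `id` stands for any identifier.

```python
import re


def tokenize(text):
    # '**' is listed before '*' so the two-character operator stays whole.
    tokens = re.findall(r"\*\*|[-+*/()]|[A-Za-z_]\w*", text)
    if "".join(tokens) != "".join(text.split()):
        raise ValueError("input contains characters outside the alphabet")
    return tokens


class Recognizer:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expr(self):    # expr → term { ( + | - ) term }
        self.term()
        while self.peek() in ("+", "-"):
            self.pos += 1
            self.term()

    def term(self):    # term → factor { ( * | / ) factor }
        self.factor()
        while self.peek() in ("*", "/"):
            self.pos += 1
            self.factor()

    def factor(self):  # factor → exp [ ** factor ]
        self.exp()
        if self.peek() == "**":
            self.pos += 1
            self.factor()

    def exp(self):     # exp → ( expr ) | id
        token = self.peek()
        if token == "(":
            self.pos += 1
            self.expr()
            if self.peek() != ")":
                raise SyntaxError(f"expected ')', found {self.peek()!r}")
            self.pos += 1
        elif token is not None and (token[0].isalpha() or token[0] == "_"):
            self.pos += 1
        else:
            raise SyntaxError(f"unexpected token {token!r}")


def is_sentence(text):
    """True if text can be generated from expr, False otherwise."""
    try:
        recognizer = Recognizer(tokenize(text))
        recognizer.expr()
        return recognizer.peek() is None  # all input must be consumed
    except (SyntaxError, ValueError):
        return False
```

The left-recursive BNF rules cannot be coded this way directly, because `expr` would call itself without consuming input; the iterative EBNF form avoids that.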
10 Augmented EBNF
Another meta-symbol
  – = (equal) instead of →
Meta-symbols for repetitions
  – + means one or more times
  – * means zero or more times
  – = + ( | ) *
Rules can use iteration instead of recursion
  – e.g., <stmt_list> → <stmt> | <stmt> ; <stmt_list>
  – can be formulated as <stmt_list> = <stmt> ( ; <stmt> )*
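The equivalence of the recursive and the iterative formulation of a list rule can be checked mechanically. The sketch below assumes a statement list of the shape "statement, then zero or more `;` statement" and, for brevity, treats a whole statement as the single token `"stmt"`; both simplifications and all function names are assumptions made for illustration.

```python
from itertools import product


def accepts_recursive(tokens):
    """List rule written with recursion: one stmt, or stmt ; followed by a list."""
    if not tokens or tokens[0] != "stmt":
        return False
    if len(tokens) == 1:
        return True
    return tokens[1] == ";" and accepts_recursive(tokens[2:])


def accepts_iterative(tokens):
    """List rule written with iteration: stmt ( ; stmt )*"""
    if not tokens or tokens[0] != "stmt":
        return False
    pos = 1
    while pos < len(tokens):  # each pass consumes one "; stmt" group
        if tokens[pos] != ";" or pos + 1 >= len(tokens) or tokens[pos + 1] != "stmt":
            return False
        pos += 2
    return True


def agree_up_to(max_len):
    """Compare both recognizers on every token sequence up to max_len tokens."""
    for n in range(max_len + 1):
        for tokens in product(("stmt", ";"), repeat=n):
            if accepts_recursive(tokens) != accepts_iterative(tokens):
                return False
    return True
```

An exhaustive comparison over short inputs is not a proof, but it is a cheap sanity check when a recursive rule is rewritten by hand.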
11 Context-Free Grammar
Context-Free Grammars (CFG)
  – defined by Noam Chomsky
  – meant to describe the syntax of natural languages
Context-Free Grammar G = (S, T, N, P)
  S = start symbol (an element of N)
  T = set of terminal symbols (lexemes and tokens)
  N = set of nonterminal symbols (abstractions)
  P = production rules (definition of an LHS abstraction using an RHS)
A sentence
  – a sequence of terminal symbols that can be derived from S
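The four components of G can be written down as a small data structure. This is only a sketch: the class name `Grammar`, its field names and the toy identifier-list grammar are assumptions made for illustration; the checks encode the usual requirements that T and N are disjoint, that S is in N, and that every rule has a nonterminal on its LHS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    start: str               # S
    terminals: frozenset     # T
    nonterminals: frozenset  # N
    productions: tuple       # P as (lhs, rhs) pairs; rhs is a tuple of symbols

    def __post_init__(self):
        if self.terminals & self.nonterminals:
            raise ValueError("T and N must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("S must be a nonterminal")
        symbols = self.terminals | self.nonterminals
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                raise ValueError(f"LHS {lhs!r} is not a nonterminal")
            unknown = [s for s in rhs if s not in symbols]
            if unknown:
                raise ValueError(f"unknown symbols in RHS: {unknown}")

    def is_terminal_string(self, symbols):
        """True if the sequence contains terminal symbols only."""
        return all(s in self.terminals for s in symbols)


# Hypothetical toy grammar for a comma-separated identifier list.
TOY = Grammar(
    start="<list>",
    terminals=frozenset({"ident", ","}),
    nonterminals=frozenset({"<list>"}),
    productions=(
        ("<list>", ("ident",)),
        ("<list>", ("ident", ",", "<list>")),
    ),
)
```

Note that `is_terminal_string` only checks the alphabet; deciding whether a terminal string is actually a sentence of the language needs a recognizer.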
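12 A Small Language in EBNF
<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> = <expression>
<expression> → <term> + <term> | <term> - <term>
<term> → <var> | const
<var> → a | b | c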
13 Derivation
A derivation is
  – a repeated application of rules
    starting with the start symbol
    substituting a nonterminal (LHS) by the RHS of one of its rules
    ending with a sentence (all terminal symbols)
Every string of symbols in the derivation is
  – a sentential form
A sentence is
  – a sentential form with only terminal symbols
14 Derivation Types
A leftmost derivation
  – the leftmost nonterminal in each sentential form is expanded first
A rightmost derivation
  – the rightmost nonterminal in each sentential form is expanded first
A mixed derivation
  – an arbitrary nonterminal is expanded
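The difference between the derivation types can be made concrete with a small script that expands either the leftmost or the rightmost nonterminal of each sentential form. It is a sketch under stated assumptions: the grammar is the small begin/end language of this deck written out with assumed nonterminal names, and `derive` with its `choices` parameter (the index of the alternative to apply at each step) is a hypothetical helper.

```python
# Assumed nonterminal names; each nonterminal maps to its list of alternatives.
GRAMMAR = {
    "<program>": [["begin", "<stmt_list>", "end"]],
    "<stmt_list>": [["<stmt>"], ["<stmt>", ";", "<stmt_list>"]],
    "<stmt>": [["<var>", "=", "<expression>"]],
    "<expression>": [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>": [["<var>"], ["const"]],
    "<var>": [["a"], ["b"], ["c"]],
}


def derive(start, choices, leftmost=True):
    """Return the list of sentential forms produced by applying choices in order."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        positions = [i for i, symbol in enumerate(form) if symbol in GRAMMAR]
        if not positions:
            raise ValueError("no nonterminal left to expand")
        index = positions[0] if leftmost else positions[-1]
        form = form[:index] + GRAMMAR[form[index]][choice] + form[index + 1:]
        steps.append(" ".join(form))
    return steps


# Two derivations of the same sentence: begin c = a - b end
LEFT = derive("<program>", [0, 0, 0, 2, 1, 0, 0, 0, 1], leftmost=True)
RIGHT = derive("<program>", [0, 0, 0, 1, 0, 1, 0, 0, 2], leftmost=False)
```

Both lists end in the same sentence but pass through different sentential forms, e.g. `begin c = <expression> end` appears only in `LEFT`.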
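15 Derivation Example
<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> = <expression>
<expression> → <term> + <term> | <term> - <term>
<term> → <var> | const
<var> → a | b | c

<program> => begin <stmt_list> end
          => begin <stmt> end
          => begin <var> = <expression> end
          => begin a = <expression> end
          => begin a = <term> + <term> end
          => begin a = <var> + <term> end
          => begin a = b + <term> end
          => begin a = b + const end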
16 Questions
In the preceding slide:
1. Is the derivation a leftmost or a rightmost derivation?
2. State the "opposite" derivation, i.e., if it is a leftmost derivation, give the rightmost one, or vice versa.
3. What are the terminal symbols of the language, what are the nonterminal symbols, and what is the start symbol?
4. Change a rule so that begin a = - b + const end is a legal sentence.
17 Parse Tree
A parse tree is
  – a hierarchical representation of a derivation
[Figure: parse tree whose leaves spell the sentence begin a = b + const end]
18 Simple Assignment Language
EBNF Grammar
<assign> → <id> = <expr>
<expr> → <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>
<id> → a | b | c
Parse tree of the sentence: a = b * (a + c)
[Figure: parse tree of a = b * (a + c)]
19 Ambiguous Grammars
A grammar is ambiguous
  – if and only if it generates a sentential form that has two or more distinct parse trees
  – e.g.,
<assign> → <id> = <expr>
<expr> → <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id>
<id> → a | b | c
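20 Two Distinct Parse Trees
add-first parse tree of a = b + c * d
[Figure: parse tree in which b + c is grouped first and the result is multiplied by d]
multiply-first parse tree of a = b + c * d
[Figure: parse tree in which c * d is grouped first and the result is added to b]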
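The two trees can also be enumerated by a program. The sketch below assumes an ambiguous expression grammar in which both operands of `+` and `*` are again expressions, ignores parentheses for brevity, and represents a tree as a nested `(left, operator, right)` tuple; the function names and the sample values for b, c and d are assumptions made for illustration.

```python
def parse_trees(tokens):
    """All parse trees of an operand/operator sequence when any operator may be the root."""
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    for i in range(1, len(tokens), 2):  # operators sit at the odd positions
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                trees.append((left, tokens[i], right))
    return trees


def evaluate(tree, env):
    """Evaluate a tree bottom-up, looking up identifiers in env."""
    if isinstance(tree, str):
        return env[tree]
    left, op, right = tree
    a, b = evaluate(left, env), evaluate(right, env)
    return a + b if op == "+" else a * b


TREES = parse_trees(["b", "+", "c", "*", "d"])
VALUES = [evaluate(tree, {"b": 1, "c": 2, "d": 3}) for tree in TREES]
# TREES  → [("b", "+", ("c", "*", "d")), (("b", "+", "c"), "*", "d")]
# VALUES → [7, 9]
```

Two trees for one sentence mean two possible meanings, which is why ambiguity matters to a compiler and not only to the grammar writer.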
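21 An Unambiguous Expression Grammar
The same language can be defined with an unambiguous grammar!
<assign> → <id> = <expr>
<expr> → <expr> + <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → ( <expr> ) | <id>
<id> → a | b | c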
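22 Precedence Through Grammar
A grammar can enforce the precedence of operators
  – the parse tree shows how (lower levels are evaluated first)
  – e.g.,
<expr> → <expr> + <term> | <term>
<term> → <term> * const | const
[Figure: parse tree of an expression with +, * and three const operands; the * subtree lies below the + node]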
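How a layered grammar fixes precedence can be sketched with a tiny parser that builds the tree and then evaluates it bottom-up. The sketch assumes a two-level grammar in which `+` is introduced by an expression rule and `*` by a term rule, realises the left recursion as a loop, and uses Python integers as the `const` tokens; the function names and the nonterminal names in the comments are assumptions.

```python
def parse(tokens):
    """Parse a whole token list into a tree of ("op", left, right) tuples."""
    tree, pos = parse_expr(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree


def parse_expr(tokens, pos):
    # expr → expr + term | term, with the left recursion turned into a loop
    tree, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_term(tokens, pos + 1)
        tree = ("+", tree, right)
    return tree, pos


def parse_term(tokens, pos):
    # term → term * const | const
    tree, pos = parse_const(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "*":
        right, pos = parse_const(tokens, pos + 1)
        tree = ("*", tree, right)
    return tree, pos


def parse_const(tokens, pos):
    if pos >= len(tokens) or not isinstance(tokens[pos], int):
        raise SyntaxError(f"constant expected at position {pos}")
    return tokens[pos], pos + 1


def evaluate(tree):
    """Evaluate children before their parent, so deeper nodes are computed first."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a * b


TREE = parse([2, "+", 3, "*", 4])
# TREE → ("+", 2, ("*", 3, 4)); evaluate(TREE) → 14
```

Because `*` can only appear inside a term, it always ends up deeper in the tree than `+`, regardless of the order in which the operators occur in the input.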