CSCI 3370: Principles of Programming Languages Syntax (cont.) Dr. Vamsi Paruchuri University of Central Arkansas vparuchuri@uca.edu
BNF Fundamentals Non-terminals: BNF abstractions Terminals: lexemes and tokens Grammar: a collection of rules Examples of BNF rules: <ident_list> → identifier | identifier, <ident_list> <if_stmt> → if <logic_expr> then <stmt>
BNF Rules A rule has a left-hand side (LHS) and a right-hand side (RHS), and consists of terminal and nonterminal symbols A grammar is a finite nonempty set of rules An abstraction (or nonterminal symbol) can have more than one RHS <stmt> → <single_stmt> | begin <stmt_list> end
Describing Lists Syntactic lists are described using recursion <ident_list> → ident | ident, <ident_list> A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols)
An Example Grammar
<program> → <stmts>
<stmts> → <stmt> | <stmt> ; <stmts>
<stmt> → <var> = <expr>
<var> → a | b | c | d
<expr> → <term> + <term> | <term> - <term>
<term> → <var> | const
An Example Derivation <program> => <stmts> => <stmt> => <var> = <expr> => a = <expr> => a = <term> + <term> => a = <var> + <term> => a = b + <term> => a = b + const
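The derivation above can be replayed mechanically. The following is a minimal sketch, assuming the example grammar is stored as a Python dictionary; the names `GRAMMAR` and `leftmost_derivation` and the encoding of each step as an alternative index are illustrative choices, not part of the slides.

```python
# Example grammar: each nonterminal maps to a list of alternatives,
# and each alternative is a list of symbols.
GRAMMAR = {
    "<program>": [["<stmts>"]],
    "<stmts>": [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>": [["<var>", "=", "<expr>"]],
    "<var>": [["a"], ["b"], ["c"], ["d"]],
    "<expr>": [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>": [["<var>"], ["const"]],
}


def leftmost_derivation(choices, start="<program>"):
    """Expand the leftmost nonterminal once per entry in `choices`.

    Each entry is the index of the alternative to apply.
    Returns every sentential form produced, as strings.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Locate the leftmost nonterminal in the current sentential form.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(" ".join(form))
    return steps


# Alternative indices that reproduce the derivation of "a = b + const".
steps = leftmost_derivation([0, 0, 0, 0, 0, 0, 1, 1])
for sentential_form in steps:
    print("=>", sentential_form)
```

Every entry of `steps` except the last still contains a nonterminal, so each is a sentential form but only the final one is a sentence.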
Derivation Every string of symbols in the derivation is a sentential form A sentence is a sentential form that has only terminal symbols A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded A derivation may be neither leftmost nor rightmost
Parse Tree
A hierarchical representation of a derivation
Consider a = b + const, derived with the example grammar:
<program>
  <stmts>
    <stmt>
      <var>
        a
      =
      <expr>
        <term>
          <var>
            b
        +
        <term>
          const
Ambiguity in Grammars A grammar is ambiguous if and only if it generates a sentential form that has two or more distinct parse trees
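An Ambiguous Expression Grammar
<expr> → <expr> <op> <expr> | const
<op> → / | -
Two distinct parse trees for const - const / const:
Tree 1 (the subtraction is grouped first)
<expr>
  <expr>
    <expr> const
    <op> -
    <expr> const
  <op> /
  <expr> const
Tree 2 (the division is grouped first)
<expr>
  <expr> const
  <op> -
  <expr>
    <expr> const
    <op> /
    <expr> const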
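Why ambiguity matters can be seen by evaluating the two parse trees. This is a small sketch, assuming each tree is encoded as a nested tuple `(left, op, right)` and that the three `const` leaves are the arbitrary values 8, 4 and 2; neither the encoding nor the values come from the slides.

```python
def evaluate(tree):
    """Evaluate a parse tree given as a number or a (left, op, right) tuple."""
    if isinstance(tree, (int, float)):
        return tree
    left, op, right = tree
    lhs, rhs = evaluate(left), evaluate(right)
    return lhs - rhs if op == "-" else lhs / rhs


# Two parse trees for the same sentence: const - const / const
subtraction_first = ((8, "-", 4), "/", 2)
division_first = (8, "-", (4, "/", 2))

print(evaluate(subtraction_first))
print(evaluate(division_first))
```

The same token sequence yields two different values, which is why a grammar used to define a language should assign each sentence a single parse tree.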
An Unambiguous Expression Grammar
If we use the parse tree to indicate precedence levels of the operators, we cannot have ambiguity
<expr> → <expr> - <term> | <term>
<term> → <term> / const | const
Parse tree for const - const / const:
<expr>
  <expr>
    <term>
      const
  -
  <term>
    <term>
      const
    /
    const
Associativity of Operators
Operator associativity can also be indicated by a grammar
<expr> → <expr> + <expr> | const (ambiguous)
<expr> → <expr> + const | const (unambiguous)
Left recursive: the LHS nonterminal appears at the left end of the RHS, e.g. <expr> → <expr> + <term>
Right recursive: the LHS nonterminal appears at the right end of the RHS, e.g. <expr> → <term> + <expr>
Parse tree for const + const + const using the unambiguous (left-recursive) rule:
<expr>
  <expr>
    <expr>
      const
    +
    const
  +
  const
Extended BNF Optional parts are placed in brackets [ ] <proc_call> -> ident [(<expr_list>)] Alternative parts of RHSs are placed inside parentheses and separated via vertical bars <term> → <term> (+|-) const Repetitions (zero or more) are placed inside braces { } <ident> → letter {letter|digit}
BNF and EBNF
BNF
<expr> → <expr> + <term> | <expr> - <term> | <term>
<term> → <term> * <factor> | <term> / <factor> | <factor>
EBNF
<expr> → <term> {(+ | -) <term>}
<term> → <factor> {(* | /) <factor>}
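The EBNF form maps almost directly onto code: each `{ ... }` repetition becomes a loop. The following recursive-descent sketch assumes that `<factor>` is just an unsigned integer literal (the slides do not define it) and evaluates while parsing; all function names are illustrative.

```python
import re


def tokenize(text):
    """Split input into integer literals and operators (other characters are ignored)."""
    return re.findall(r"\d+|[-+*/]", text)


def parse_factor(tokens, pos):
    # Assumption: <factor> is an unsigned integer literal.
    if pos >= len(tokens) or not tokens[pos].isdigit():
        raise SyntaxError(f"expected a number at token {pos}")
    return int(tokens[pos]), pos + 1


def parse_term(tokens, pos):
    # <term> -> <factor> {(* | /) <factor>}
    value, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "/"):
        op = tokens[pos]
        rhs, pos = parse_factor(tokens, pos + 1)
        value = value * rhs if op == "*" else value / rhs
    return value, pos


def parse_expr(tokens, pos=0):
    # <expr> -> <term> {(+ | -) <term>}
    value, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        rhs, pos = parse_term(tokens, pos + 1)
        value = value + rhs if op == "+" else value - rhs
    return value, pos


def evaluate(text):
    """Parse and evaluate a complete expression."""
    tokens = tokenize(text)
    value, pos = parse_expr(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return value
```

Because each loop folds operands into `value` from left to right, the operators come out left associative, matching the left-recursive BNF rules, and `parse_term` being called from `parse_expr` gives `*` and `/` higher precedence than `+` and `-`.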
Semantics
Static semantics: attribute grammars, examples, computing attribute values, status
Dynamic semantics: operational semantics, axiomatic semantics (examples, evaluation), denotational semantics
Static Semantics Used to define things about PLs that are hard or impossible to define with BNF (hard: type compatibility; impossible: declare before use) Can be determined at compile time, hence the term static Often specified using natural language descriptions, which are imprecise A better approach is to use attribute grammars (Knuth, 1968)
Attribute Grammars Carry some semantic information along through the parse tree Useful for static semantic specification and static semantic checking in compilers An attribute grammar is a CFG G = (S, N, T, P) with the following additions: for each grammar symbol x there is a set A(x) of attribute values; each production rule has a set of functions that define certain attributes of the nonterminals in the rule; each production rule has a (possibly empty) set of predicates to check for attribute consistency Valid derivations have predicates true for each node
Attribute Grammars (continued) Synthesized attributes are determined from nodes of children in parse tree if X0 -> X1 ... Xn is a rule, then S(X0) = f(A(X1), ..., A(Xn)) pass semantic information up the tree Inherited attributes are determined from parent and siblings I(Xj) = f(A(X0), ..., A(Xn)) often, just X0 ... Xj-1 siblings to left in parse tree pass semantic information down the tree
Attribute Grammars (continued) Intrinsic attributes synthesized attributes of leaves of parse tree determined from outside tree e.g., symbol table
Attribute Grammars: Definition Let X0 → X1 ... Xn be a rule Functions of the form S(X0) = f(A(X1), ... , A(Xn)) define synthesized attributes Functions of the form I(Xj) = f(A(X0), ... , A(Xn)), for 1 <= j <= n, define inherited attributes Initially, there are intrinsic attributes on the leaves
An Example The name on the end of an Ada procedure must match the procedure's name: Syntax rule: <proc_def> → procedure <proc_name>[1] <proc_body> end <proc_name>[2] Predicate: <proc_name>[1].string == <proc_name>[2].string
Another Example
Syntax:
<assign> → <var> = <expr>
<expr> → <var> + <var> | <var>
<var> → A | B | C
actual_type: synthesized for <var> and <expr>
expected_type: inherited for <expr>
Attribute Grammars (continued) Think of attributes as variables in the parse tree, whose values are calculated at compile time (conceptually, after the parse tree is built) Example attributes: actual_type is intrinsic for variables and is determined from the types of child nodes for <expr>; expected_type, for <expr>, is determined by the type of the variable on the LHS of an assignment statement, for example
Attribute Grammar (continued)
Syntax rule: <expr> → <var>[1] + <var>[2]
Semantic rule: <expr>.actual_type ← <var>[1].actual_type
Predicates: <var>[1].actual_type == <var>[2].actual_type; <expr>.expected_type == <expr>.actual_type
Syntax rule: <var> → id
Semantic rule: <var>.actual_type ← lookup(<var>.string)
An attribute grammar for simple assignment statements
1 Syntax rule: <assign> → <var> = <expr>
  Semantic rule: <expr>.expected_type ← <var>.actual_type
2 Syntax rule: <expr> → <var>[1] + <var>[2]
  Semantic rule: <expr>.actual_type ← if (<var>[1].actual_type = int) and (<var>[2].actual_type = int) then int else real
  Predicate: <expr>.actual_type == <expr>.expected_type
3 Syntax rule: <expr> → <var>
  Semantic rule: <expr>.actual_type ← <var>.actual_type
  Predicate: <expr>.actual_type == <expr>.expected_type
4 Syntax rule: <var> → A | B | C
  Semantic rule: <var>.actual_type ← look-up(<var>.string)
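The four rules can be exercised with a short sketch. It assumes a hypothetical symbol table in which A is real and B and C are int (the slides do not give the declarations), and it collapses the parse tree into a target variable plus a list of one or two operand variables.

```python
# Hypothetical declarations; the slides do not say which types A, B and C have.
SYMBOL_TABLE = {"A": "real", "B": "int", "C": "int"}


def lookup(name):
    """Rule 4: the intrinsic actual_type of a variable comes from the symbol table."""
    return SYMBOL_TABLE[name]


def check_assign(target, operands):
    """Decorate <assign> -> <var> = <expr>, where <expr> is <var> or <var> + <var>."""
    # Rule 1: expected_type is inherited from the variable on the left-hand side.
    expected_type = lookup(target)
    operand_types = [lookup(name) for name in operands]
    if len(operand_types) == 2:
        # Rule 2: int only if both operands are int, otherwise real.
        actual_type = "int" if operand_types == ["int", "int"] else "real"
    else:
        # Rule 3: a lone variable passes its type up unchanged.
        actual_type = operand_types[0]
    # Predicate of rules 2 and 3.
    return {
        "expected_type": expected_type,
        "actual_type": actual_type,
        "ok": actual_type == expected_type,
    }


print(check_assign("A", ["A", "B"]))  # A = A + B
print(check_assign("B", ["A", "B"]))  # B = A + B
```

Under these assumed declarations, A = A + B satisfies the predicate while B = A + B does not, because the expression's synthesized type is real but the inherited expected type is int.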
Parse tree for A = A + B
Flow of Attributes
Fully Attributed Parse Tree
Attribute Grammars (continued) How are attribute values computed? If all attributes were inherited, the tree could be decorated in top-down order. If all attributes were synthesized, the tree could be decorated in bottom-up order. In many cases, both kinds of attributes are used, and it is some combination of top-down and bottom-up that must be used.
Status of Attribute Grammars Well-defined, well-understood formalism used for several practical compilers Grammars for real languages can become very large and cumbersome and take significant amounts of computing time to evaluate Very valuable in a less formal way for actual compiler construction
Dynamic Semantics Describe the meaning of PL constructs No single widely accepted way of defining Three approaches used operational semantics axiomatic semantics denotational semantics All are still in research stage, rather than practical use most real compilers use ad-hoc methods
Operational Semantics Describe the meaning of a program by executing its statements on a machine, actual or simulated The change of state of the machine (values in memory, registers, etc.) defines the meaning Could use an actual hardware machine: too expensive Could use a software interpreter: too complicated, because of underlying machine complexity, and not transportable
Operational Semantics To use operational semantics for a high-level language, a virtual machine is needed A hardware pure interpreter would be too expensive A software pure interpreter also has problems The detailed characteristics of the particular computer would make actions difficult to understand Such a semantic definition would be machine-dependent
Operational Semantics (continued) A better alternative: A complete computer simulation The process: Build a translator (translates source code to the machine code of an idealized computer) Build a simulator for the idealized computer Evaluation of operational semantics: Good if used informally (language manuals, etc.) Extremely complex if used formally VDL description of semantics of PL/I was several hundred pages long
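The simulator half of this process can be sketched in a few lines. The instruction set below (`add` and `if_lt_goto`) is hypothetical and invented for illustration; the translation of the high-level loop into these instructions is done by hand rather than by a translator.

```python
def simulate(program, state):
    """Run a program for a hypothetical idealized machine and return the final state.

    Instructions (invented for this sketch):
      ("add", dst, src, k)                 dst = src + k
      ("if_lt_goto", var, limit, target)   jump to index `target` if var < limit
    """
    state = dict(state)  # work on a copy so the caller's state is untouched
    pc = 0               # program counter
    while pc < len(program):
        instr = program[pc]
        if instr[0] == "add":
            _, dst, src, k = instr
            state[dst] = state[src] + k
            pc += 1
        elif instr[0] == "if_lt_goto":
            _, var, limit, target = instr
            pc = target if state[var] < limit else pc + 1
        else:
            raise ValueError(f"unknown instruction {instr[0]!r}")
    return state


# Hand translation of: repeat total = total + 2; i = i + 1 until i >= 3
loop = [
    ("add", "total", "total", 2),
    ("add", "i", "i", 1),
    ("if_lt_goto", "i", 3, 0),
]
final_state = simulate(loop, {"i": 0, "total": 0})
print(final_state)
```

The meaning of the loop is read off as the change from the initial state to `final_state`, which is the operational view: a construct means whatever it does to the machine state.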
Axiomatic Semantics Based on formal logic (predicate calculus) Original purpose: formal program verification Axioms or inference rules are defined for each statement type in the language (to allow transformations of expressions to other expressions) The expressions are called assertions
The idea “Compute a number y whose square is less than the input x” We have to write a program P such that y*y < x But what if x = -4? There is no program computing y!!
The idea (continued) “If the input x is a positive number then compute a number y whose square is less than the input x” We need to talk about the states before and after the execution of the program P { x>0 } P { y*y < x }
Axiomatic Semantics (continued) An assertion before a statement (a precondition) states the relationships and constraints among variables that are true at that point in execution An assertion following a statement is a postcondition A weakest precondition is the least restrictive precondition that will guarantee the postcondition An assertion R is said to be weaker than assertion P if the truth of P implies the truth of R, written P→R.
Axiomatic Semantics Form Pre-, post form: {P} statement {Q} An example a = b + 1 {a > 1} One possible precondition: {b > 10} Weakest precondition: {b > 0 }
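The difference between a precondition and the weakest precondition can be checked by brute force. The sketch below assumes integer values of b restricted to a small range chosen for illustration, so it is evidence rather than a proof; the names are not from the slides.

```python
def postcondition_holds(b):
    """Execute a = b + 1 and report whether the postcondition a > 1 holds."""
    a = b + 1
    return a > 1


domain = range(-50, 51)  # assumed finite sample of integer states

# All starting states from which the statement establishes the postcondition.
good_states = {b for b in domain if postcondition_holds(b)}

weakest = {b for b in domain if b > 0}    # candidate weakest precondition
stronger = {b for b in domain if b > 10}  # another valid precondition

print(good_states == weakest)   # b > 0 describes exactly the good states
print(stronger < weakest)       # b > 10 describes only some of them
```

Both assertions are valid preconditions, but b > 10 rules out states such as b = 5 from which the statement would still succeed, which is what makes b > 0 the less restrictive one.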
Program Proof Process The postcondition for the entire program is the desired result Work back through the program to the first statement. If the precondition on the first statement is the same as the program specification, the program is correct. The program proving game is played as follows: We know what program construct C we are using. We know what assertion Q we want to be true after C terminates. We use the proof system to find out what absolutely must be true before executing C and nothing more, that is we find the weakest precondition of C that will yield Q, wp(C,Q). Then we know that if we execute C in any state P such that P→wp(C,Q), then Q will be true when C terminates.
Axiomatic Semantics: Axioms An axiom for assignment statements (x = E): {Q[x→E]} x = E {Q}, where Q[x→E] is Q with every occurrence of x replaced by E The Rule of Consequence, by example: from {x > 3} x = x - 3 {x > 0}, (x > 5) => (x > 3), and (x > 0) => (x > 0), infer {x > 5} x = x - 3 {x > 0} The rule permits strengthening the antecedent (precondition) and weakening the consequent (postcondition)
Axiomatic Semantics: Axioms An inference rule for sequences: from {P1} S1 {P2} and {P2} S2 {P3}, infer {P1} S1; S2 {P3} Program Proofs – Validating simple programs
Example Compute the precondition for the assignment statement x = 2 * y - 3 {x > 25} The weakest precondition is computed by substituting 2 * y - 3 for x in the postcondition: 2 * y - 3 > 25, which simplifies to y > 14
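The substitution step of the assignment axiom can be mechanized. This sketch uses Python's standard `ast` module to replace a variable inside a postcondition written as Python expression text; it assumes Python 3.9 or later for `ast.unparse`, performs no algebraic simplification, and the name `wp_assign` is illustrative.

```python
import ast


class _Substitute(ast.NodeTransformer):
    """Replace every occurrence of one variable name with a given expression tree."""

    def __init__(self, name, replacement):
        self.name = name
        self.replacement = replacement

    def visit_Name(self, node):
        # The replacement itself is not revisited, so x = x + y - 3 is handled safely.
        return self.replacement if node.id == self.name else node


def wp_assign(var, expr, post):
    """Return the postcondition `post` with `var` replaced by `expr`, as source text."""
    post_tree = ast.parse(post, mode="eval")
    expr_tree = ast.parse(expr, mode="eval").body
    new_tree = _Substitute(var, expr_tree).visit(post_tree)
    return ast.unparse(new_tree.body)


pre = wp_assign("x", "2 * y - 3", "x > 25")
print(pre)

# The unsimplified precondition agrees with y > 14 on sample values.
print(eval(pre, {"y": 14}), eval(pre, {"y": 15}))
```

The result is the raw substituted assertion; reducing it to y > 14 is ordinary algebra that this sketch leaves to the reader.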
Example What if the left side of the assignment also appears in the right side? x = x + y - 3 {x > 10} The weakest precondition is x + y - 3 > 10, that is, y > 13 - x The appearance of x on both sides has no effect on the process of computing the precondition.
Example Consider the following sequence and postcondition: y = 3 * x + 1; x = y + 3; {x < 10} The precondition for the last assignment statement is y + 3 < 10, that is, y < 7, which is used as the postcondition for the first statement. The precondition for the first statement, and hence for the whole sequence, can now be computed: 3 * x + 1 < 7, that is, x < 2
Correctness Proof
{x = 0 and y = 0} x := x + 1; y := y + 1; {x = y}
wp(y := y + 1, {x = y}) = {x = y + 1}
wp(x := x + 1, {x = y + 1}) = {x + 1 = y + 1}
wp(x := x + 1; y := y + 1, {x = y}) = {x = y}
{x = 0 and y = 0} => {x = y}
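Working backward through a sequence amounts to composing weakest preconditions. The sketch below takes a semantic view, assuming assertions are Python functions from a state dictionary to a boolean and that the precondition of an assignment holds in a state exactly when the postcondition holds in the updated state; the names are illustrative.

```python
def wp_assign(var, expr, post):
    """Weakest precondition of `var = expr` with respect to `post`.

    `expr` maps a state dict to a value; `post` maps a state dict to a bool.
    """
    return lambda state: post({**state, var: expr(state)})


# Postcondition of the sequence: x < 10
post = lambda s: s["x"] < 10

# Last statement first: x = y + 3
pre_second = wp_assign("x", lambda s: s["y"] + 3, post)

# Then the first statement: y = 3 * x + 1
pre_sequence = wp_assign("y", lambda s: 3 * s["x"] + 1, pre_second)

for x in (0, 1, 2, 3):
    print(x, pre_sequence({"x": x, "y": 0}))
```

On the sampled states the composed precondition is true exactly when x < 2, in agreement with the hand calculation, and the initial value of y does not matter because the first statement overwrites it.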
Evaluation of Axiomatic Semantics Developing axioms or inference rules for all of the statements in a language is difficult It is a good tool for correctness proofs, and an excellent framework for reasoning about programs Its usefulness in describing the meaning of a programming language is limited for language users and compiler writers
Denotational Semantics Based on recursive function theory The most abstract semantics description method Originally developed by Scott and Strachey (1970)
Denotational Semantics (continued) The process of building a denotational specification for a language Define a mathematical object for each language entity Define a function that maps instances of the language entities onto instances of the corresponding mathematical objects The meaning of language constructs is defined by only the values of the program's variables
Denotational Semantics vs Operational Semantics In operational semantics, the state changes are defined by coded algorithms In denotational semantics, the state changes are defined by rigorous mathematical functions
Denotational Semantics: Program State The state of a program is the values of all its current variables s = {<i1, v1>, <i2, v2>, …, <in, vn>} Let VARMAP be a function that, when given a variable name and a state, returns the current value of the variable VARMAP(ij, s) = vj (the value paired with ij in state s)
Binary Numbers
<bin_num> → '0' | '1' | <bin_num> '0' | <bin_num> '1'
Mbin('0') = 0
Mbin('1') = 1
Mbin(<bin_num> '0') = 2 * Mbin(<bin_num>)
Mbin(<bin_num> '1') = 2 * Mbin(<bin_num>) + 1
Decimal Numbers
<dec_num> → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | <dec_num> (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9)
Mdec('0') = 0, Mdec('1') = 1, …, Mdec('9') = 9
Mdec(<dec_num> '0') = 10 * Mdec(<dec_num>)
Mdec(<dec_num> '1') = 10 * Mdec(<dec_num>) + 1
…
Mdec(<dec_num> '9') = 10 * Mdec(<dec_num>) + 9
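The binary-number mapping translates directly into a recursive function, one case per grammar alternative. This is a minimal sketch, assuming numerals are given as Python strings of the characters 0 and 1; the name `m_bin` is illustrative.

```python
def m_bin(numeral):
    """Map a binary numeral (a string of '0' and '1') to the integer it denotes."""
    if numeral == "" or any(ch not in "01" for ch in numeral):
        raise ValueError(f"not a binary numeral: {numeral!r}")
    if numeral == "0":
        return 0
    if numeral == "1":
        return 1
    # <bin_num> '0' and <bin_num> '1': double the meaning of the prefix,
    # then add the value of the last digit.
    prefix, last = numeral[:-1], numeral[-1]
    return 2 * m_bin(prefix) + (1 if last == "1" else 0)


print(m_bin("110"))
```

The function never inspects a machine representation; it only follows the structure of the numeral, which is the sense in which the meaning is defined by a mathematical function over syntax.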
Expressions
Map expressions onto Z ∪ {error}
We assume expressions are decimal numbers, variables, or binary expressions having one arithmetic operator and two operands, each of which can be an expression described by the following BNF
<expr> → <dec_num> | <var> | <binary_expr>
<binary_expr> → <left_expr> (+ | *) <right_expr>
<left_expr> → <dec_num> | <var>
<right_expr> → <dec_num> | <var>
Semantics (cont.)
Me(<expr>, s) = case <expr> of
  <dec_num> => Mdec(<dec_num>, s)
  <var> => if VARMAP(<var>, s) == undef then error else VARMAP(<var>, s)
  <binary_expr> =>
    if (Me(<binary_expr>.<left_expr>, s) == undef OR Me(<binary_expr>.<right_expr>, s) == undef) then error
    else if (<binary_expr>.<operator> == '+') then Me(<binary_expr>.<left_expr>, s) + Me(<binary_expr>.<right_expr>, s)
    else Me(<binary_expr>.<left_expr>, s) * ...
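The expression mapping can be sketched as follows. The sketch assumes a state is a Python dictionary from variable names to integers, a missing name stands for an undefined variable, a decimal numeral is a string of digits, and a binary expression is a tuple `(left, operator, right)`; it reports the string "error" where the slide's definition yields error. None of these encodings come from the slides.

```python
DIGITS = "0123456789"
ERROR = "error"


def varmap(name, state):
    """Return the value paired with `name` in `state`, or None if undefined."""
    return state.get(name)


def m_dec(numeral):
    """Map a non-empty string of decimal digits to the integer it denotes."""
    if len(numeral) == 1:
        return DIGITS.index(numeral)
    return 10 * m_dec(numeral[:-1]) + DIGITS.index(numeral[-1])


def is_numeral(expr):
    return isinstance(expr, str) and expr != "" and all(ch in DIGITS for ch in expr)


def m_e(expr, state):
    """Map an expression and a state to an integer or ERROR."""
    if is_numeral(expr):
        return m_dec(expr)
    if isinstance(expr, str):  # a variable name
        value = varmap(expr, state)
        return ERROR if value is None else value
    left, operator, right = expr  # a binary expression
    lhs, rhs = m_e(left, state), m_e(right, state)
    if lhs == ERROR or rhs == ERROR:
        return ERROR
    return lhs + rhs if operator == "+" else lhs * rhs


state = {"x": 3}
print(m_e(("x", "+", "12"), state))
print(m_e(("x", "*", "y"), state))
```

In this sketch an error in either operand propagates to the whole expression, so the result always lies in the integers or is the error value, as the target set Z ∪ {error} requires.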
Evaluation of Denotational Semantics Can be used to prove the correctness of programs Provides a rigorous way to think about programs Can be an aid to language design Has been used in compiler generation systems Because of its complexity, it is of little use to language users
Summary BNF and context-free grammars are equivalent meta-languages Well-suited for describing the syntax of programming languages An attribute grammar is a descriptive formalism that can describe both the syntax and the semantics of a language Three primary methods of semantics description: operational, axiomatic, denotational