Syntax (2)
Clite Grammar (1): lexical level
  Identifier → Letter { Letter | Digit }
  Letter → a | b | ... | z | A | B | ... | Z
  Digit → 0 | 1 | ... | 9
  Literal → Integer | Boolean | Float | Char
  Integer → Digit { Digit }
  Boolean → true | false
  Float → Integer . Integer
  Char → ' ASCIIChar '
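The lexical rules above are regular, so each one can be written as a regular expression. The following is a minimal sketch, not part of Clite itself: the constant names and the helper classify_literal are hypothetical, chosen only for illustration.

```python
import re

# Regular expressions transcribed from the Clite lexical rules.
# All names here are illustrative assumptions, not part of Clite.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")   # Letter { Letter | Digit }
INTEGER = re.compile(r"[0-9]+")                    # Digit { Digit }
FLOAT = re.compile(r"[0-9]+\.[0-9]+")              # Integer . Integer
BOOLEAN = re.compile(r"true|false")
CHAR = re.compile(r"'[\x00-\x7f]'")                # one ASCII character in quotes


def classify_literal(text):
    """Return the literal class of text, or None if it is not a literal."""
    for name, pattern in (("Float", FLOAT), ("Integer", INTEGER),
                          ("Boolean", BOOLEAN), ("Char", CHAR)):
        # fullmatch requires the whole string to match the rule.
        if pattern.fullmatch(text):
            return name
    return None
```

Note that true also matches IDENTIFIER: the lexical grammar alone cannot tell a keyword or Boolean literal from an identifier.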
Issues Not Addressed by this Grammar
  Comments
  Whitespace
  Distinguishing one token <= from two tokens < =
  Distinguishing identifiers from keywords like if
Lexical Syntax (Lexer)
  Input: a stream of characters from the ASCII set, typed by a programmer
  Output: a stream of tokens
Classes of Tokens
  Identifiers: Stack, x, i, push
  Literals: 123, 'x', 3.25, true
  Keywords: bool false true char int float if else while main
  Operators: = || && == != < <= > >= + - * / !
  Punctuation: ; , { } ( )
Whitespace
  Whitespace is any of: space, tab, end-of-line character (or characters), character sequence inside a comment
  No token may contain embedded whitespace (unless it is a character or string literal)
  Example:
  >=   one token
  > =  two tokens
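The difference between >= and > = can be sketched with a scanner that skips whitespace between tokens and always tries the longest operator first. This is only an illustration: the function name scan_operators is hypothetical, the scanner handles operators only, and it does not treat comments as whitespace.

```python
# Operators from the token classes, with two-character operators listed
# before their one-character prefixes so the longest match wins.
OPERATORS = ["||", "&&", "==", "!=", "<=", ">=",
             "=", "<", ">", "+", "-", "*", "/", "!"]


def scan_operators(text):
    """Split a string of operators into tokens, skipping whitespace."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i] in " \t\r\n":
            i += 1              # whitespace separates tokens, never joins them
            continue
        for op in OPERATORS:
            if text.startswith(op, i):
                tokens.append(op)
                i += len(op)
                break
        else:
            raise ValueError(f"unexpected character {text[i]!r} at position {i}")
    return tokens
```

With this sketch, scan_operators(">=") yields one token and scan_operators("> =") yields two, because the embedded space ends the first token.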
Whitespace Examples in Pascal
  while a < b do    legal: the spacing separates the tokens
  whilea < b do     whilea is a valid identifier token, so "whilea <" is an invalid statement prefix
Keywords and Identifiers
  An identifier and a keyword are lexically the same: if is a keyword, yet it also matches the Identifier rule
  In most languages keywords are reserved and cannot be used as identifiers
  main in C and C++ is not a reserved word, but it is a special identifier
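Because keywords and identifiers share one lexical shape, a common approach is to scan a word with the Identifier rule and then look it up in a keyword table. The sketch below assumes the Clite keyword list from the token classes; the names KEYWORDS, WORD, and classify_word are hypothetical.

```python
import re

# Keyword list taken from the Clite token classes; treating every entry
# as reserved is an assumption of this sketch.
KEYWORDS = {"bool", "false", "true", "char", "int", "float",
            "if", "else", "while", "main"}
WORD = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def classify_word(lexeme):
    """Classify a lexeme that has the lexical shape of an identifier."""
    if not WORD.fullmatch(lexeme):
        raise ValueError(f"{lexeme!r} is not a word")
    # The lexical rule cannot separate the two; the table lookup does.
    return "Keyword" if lexeme in KEYWORDS else "Identifier"
```

For example, classify_word("if") gives "Keyword", while classify_word("iff") gives "Identifier".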
Integer and Float Values
  Range: not limited by the Clite grammar
  Storage: not specified by the grammar
  Both limits and storage space are semantic issues, not syntactic ones
Concrete Syntax (Parser)
  Based on a BNF/EBNF grammar
  Input: tokens
  Output: a concrete syntax (parse) tree or an abstract syntax tree
Concrete Syntax of Clite
  Metabraces { } imply left associativity
  Metabrackets [ ] make EquOp and RelOp non-associative
  In C++, the expression if (a < x < b) is not equivalent to if (a < x && x < b), yet it compiles without error!
  Clite differs from C/C++: fewer operators; equality and relational operators are non-associative
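To see why a < x < b is a trap in C++, it helps to evaluate it the way C/C++ does: left to right, with the first comparison producing 0 or 1. Python's own a < x < b is a chained comparison that does mean the range test, so the sketch below emulates the C behavior explicitly. Both function names are hypothetical.

```python
def c_style_chain(a, x, b):
    """Evaluate a < x < b as C/C++ does: (a < x) < b."""
    first = int(a < x)      # a relational expression in C yields 0 or 1
    return first < b        # that 0 or 1 is then compared with b


def intended_range_test(a, x, b):
    """What the programmer usually means: a < x && x < b."""
    return a < x and x < b
```

With a = 1, x = 5, b = 3, the C-style evaluation is true (1 < 3) although 5 does not lie between 1 and 3. Making the relational operators non-associative, as Clite does, turns this expression into a syntax error instead.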
Clite Grammar (2): Expressions

  Operator       Associativity
  Unary - !      none
  * /            left
  + -            left
  < <= > >=      none
  == !=          none
  &&             left
  ||             left

  Expression → Conjunction { || Conjunction }
  Conjunction → Equality { && Equality }
  Equality → Relation [ EquOp Relation ]
  EquOp → == | !=
  Relation → Addition [ RelOp Addition ]
  RelOp → < | <= | > | >=
  Addition → Term { AddOp Term }
  AddOp → + | -
  Term → Factor { MulOp Factor }
  MulOp → * | / | %
  Factor → [ UnaryOp ] Primary
  UnaryOp → - | !
  Primary → Identifier | Literal | ( Expression ) | Type ( Expression )
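The expression grammar maps almost directly onto a recursive descent parser: a rule of the form X → Y { op Y } becomes a loop that builds a left-leaning tree, and a rule of the form X → Y [ op Y ] becomes a single optional step. The sketch below is one possible rendering under simplifying assumptions: all class and function names are hypothetical, trees are plain tuples, Char and Boolean literals are not distinguished from identifiers, and the cast form Type ( Expression ) is omitted.

```python
import re

TOKEN = re.compile(
    r"\s*(\d+\.\d+|\d+|[A-Za-z][A-Za-z0-9]*|\|\||&&|==|!=|<=|>=|[-+*/%!<>()])")


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"bad character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens, self.pos = tokenize(text), 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def left_assoc(self, operand, ops):
        # X -> Y { op Y }: each new operator wraps the tree built so far.
        tree = operand()
        while self.peek() in ops:
            op = self.advance()
            tree = (op, tree, operand())
        return tree

    def non_assoc(self, operand, ops):
        # X -> Y [ op Y ]: at most one operator is accepted.
        tree = operand()
        if self.peek() in ops:
            op = self.advance()
            tree = (op, tree, operand())
        return tree

    def expression(self):
        return self.left_assoc(self.conjunction, ("||",))

    def conjunction(self):
        return self.left_assoc(self.equality, ("&&",))

    def equality(self):
        return self.non_assoc(self.relation, ("==", "!="))

    def relation(self):
        return self.non_assoc(self.addition, ("<", "<=", ">", ">="))

    def addition(self):
        return self.left_assoc(self.term, ("+", "-"))

    def term(self):
        return self.left_assoc(self.factor, ("*", "/", "%"))

    def factor(self):
        if self.peek() in ("-", "!"):
            op = self.advance()
            return (op, self.primary())
        return self.primary()

    def primary(self):
        tok = self.advance()
        if tok == "(":
            tree = self.expression()
            if self.advance() != ")":
                raise SyntaxError("expected )")
            return tree
        if tok is None or not tok[0].isalnum():
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok              # identifier or numeric literal


def parse(text):
    parser = Parser(text)
    tree = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

Under these assumptions, parse("a - b - c") groups to the left as (a - b) - c, while parse("a < x < b") raises SyntaxError because Relation admits only one RelOp.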
Clite Grammar (3): Statements
  (Quoted braces '{' and '}' are literal tokens; unquoted braces are EBNF metabraces.)
  Program → int main ( ) '{' Declarations Statements '}'
  Declarations → { Declaration }
  Declaration → Type Identifier { , Identifier }
  Type → int | bool | float | char
  Statements → { Statement }
  Statement → ; | Block | Assignment | IfStatement | WhileStatement
  Block → '{' Statements '}'
  Assignment → Identifier = Expression ;
  IfStatement → if ( Expression ) Statement [ else Statement ]
  WhileStatement → while ( Expression ) Statement
Abstract Syntax
  Removes "syntactic redundancies" and keeps the essential elements of a language.
  Pascal:  while i < n do begin i := i + 1; end;
  C/C++:   while (i < n) { i = i + 1; }
  The only essential information: it is a loop; the loop test is i < n; the body increments the current value of i.
Parse and Abstract Syntax Trees
  A parse tree is inefficient: many of its nodes add no information
  The shape of the parse tree reveals the meaning of the program
  So we want a tree that removes the inefficiency and keeps the shape:
  Remove separator/punctuation terminal symbols
  Remove all trivial root nonterminals
  Replace each remaining nonterminal with the operator terminal among its leaves
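The three steps can be sketched as a single recursive pass over a parse tree. This is an illustration under stated assumptions, not a definitive algorithm: a nonterminal is represented as a (label, children) pair, a terminal as a plain string, and the names PUNCTUATION, simplify, and parse_tree are hypothetical. Only binary "operand operator operand" nodes are rewritten in step 3.

```python
PUNCTUATION = {";", ",", "(", ")", "{", "}"}


def simplify(node):
    """Reduce a parse tree to an abstract-syntax-like tree of tuples."""
    if isinstance(node, str):           # terminals are kept as they are
        return node
    label, children = node
    # Step 1: drop separator and punctuation terminals.
    kept = [simplify(c) for c in children
            if not (isinstance(c, str) and c in PUNCTUATION)]
    # Step 2: a nonterminal with a single child adds no information.
    if len(kept) == 1:
        return kept[0]
    # Step 3: replace "operand operator operand" by its operator,
    # keeping the two operands as children.
    if len(kept) == 3 and isinstance(kept[1], str):
        return (kept[1], kept[0], kept[2])
    return (label, kept)


# Parse tree for z = x + 2*y; following the Clite expression grammar.
parse_tree = ("Assignment", [
    "z", "=",
    ("Expression", [("Conjunction", [("Equality", [("Relation", [
        ("Addition", [
            ("Term", [("Factor", [("Primary", ["x"])])]),
            ("AddOp", ["+"]),
            ("Term", [
                ("Factor", [("Primary", ["2"])]),
                ("MulOp", ["*"]),
                ("Factor", [("Primary", ["y"])]),
            ]),
        ]),
    ])])])]),
    ";",
])
```

Calling simplify(parse_tree) collapses the long chain Expression, Conjunction, Equality, Relation into nothing and leaves a small tree whose interior nodes are the operators =, + and *.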
Example: z = x + 2*y;
  [Figure: parse tree and abstract syntax tree for this statement]
Partial Abstract Syntax of Clite
  Assignment = Variable target; Expression source
  Expression = Variable | Value | Binary | Unary
  Variable = String id
  Value = Integer value
  Binary = Operator op; Expression term1, term2
  Unary = Operator op; Expression term
  Operator = + | - | * | / | !
Example Abstract Syntax Tree for z = x + 2*y
  Assignment
    target: Variable z
    source: Binary
      op: Operator +
      term1: Variable x
      term2: Binary
        op: Operator *
        term1: Value 2
        term2: Variable y
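The partial abstract syntax translates naturally into one class per rule, with the fields named as in the rules. The sketch below uses Python dataclasses as one possible encoding; representing Operator as a plain string and the helper show are assumptions made for brevity.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Variable:
    id: str


@dataclass
class Value:
    value: int


@dataclass
class Binary:
    op: str                 # Operator is simplified to a string here
    term1: "Expression"
    term2: "Expression"


@dataclass
class Unary:
    op: str
    term: "Expression"


Expression = Union[Variable, Value, Binary, Unary]


@dataclass
class Assignment:
    target: Variable
    source: Expression


# Abstract syntax tree for z = x + 2*y
tree = Assignment(
    target=Variable("z"),
    source=Binary("+", Variable("x"), Binary("*", Value(2), Variable("y"))),
)


def show(node):
    """Render a tree as fully parenthesized text (hypothetical helper)."""
    if isinstance(node, Assignment):
        return f"{show(node.target)} = {show(node.source)}"
    if isinstance(node, Binary):
        return f"({show(node.term1)} {node.op} {show(node.term2)})"
    if isinstance(node, Unary):
        return f"({node.op}{show(node.term)})"
    if isinstance(node, Variable):
        return node.id
    return str(node.value)
```

The parentheses printed by show make the tree shape visible: the multiplication is nested inside the addition even though the source text contained no parentheses.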