Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth.

Slides:



Advertisements
Similar presentations
Programming Languages Third Edition Chapter 6 Syntax.
Advertisements

ICE1341 Programming Languages Spring 2005 Lecture #6 Lecture #6 In-Young Ko iko.AT. icu.ac.kr iko.AT. icu.ac.kr Information and Communications University.
BNF. What is BNF? BNF stands for “Backus-Naur Form,” after the people who invented it BNF is a metalanguage--a language used to describe another language.
Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 2 Syntax A language that is simple to parse.
Chapter 2 Syntax. Syntax The syntax of a programming language specifies the structure of the language The lexical structure specifies how words can be.
ISBN Chapter 3 Describing Syntax and Semantics.
CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax Lecture 2 - Syntax, Spring CSE3302 Programming Languages, UT-Arlington ©Chengkai.
Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 2 Syntax A language that is simple to parse.
Chapter 3 Describing Syntax and Semantics Sections 1-3.
Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 2 Syntax A language that is simple to parse.
Programming Languages 2nd edition Tucker and Noonan
Slide 1 Chapter 2-b Syntax, Semantics. Slide 2 Syntax, Semantics - Definition The syntax of a programming language is the form of its expressions, statements.
Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 2 Syntax A language that is simple to parse.
UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering CSCE 330 Programming Language Structures Chapter 2: Syntax Fall 2009 Marco.
Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 3 Lexical and Syntactic Analysis Syntactic.
Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth.
1 Syntax and Semantics The Purpose of Syntax Problem of Describing Syntax Formal Methods of Describing Syntax Derivations and Parse Trees Sebesta Chapter.
The College of Saint Rose CIS 433 – Programming Languages David Goldschmidt, Ph.D. from Concepts of Programming Languages, 9th edition by Robert W. Sebesta,
Building lexical and syntactic analyzers
UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering CSCE 330 Programming Language Structures Syntax (Slides mainly based on Tucker.
Chapter 1 Introduction Dr. Frank Lee. 1.1 Why Study Compiler? To write more efficient code in a high-level language To provide solid foundation in parsing.
Syntax – Intro and Overview CS331. Syntax Syntax defines what is grammatically valid in a programming language –Set of grammatical rules –E.g. in English,
Winter 2007SEG2101 Chapter 71 Chapter 7 Introduction to Languages and Compiler.
Syntax Specification and BNF © Allan C. Milne Abertay University v
CS Describing Syntax CS 3360 Spring 2012 Sec Adapted from Addison Wesley’s lecture notes (Copyright © 2004 Pearson Addison Wesley)
Context-Free Grammars and Parsing
Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 2 Syntax A language that is simple to parse.
C H A P T E R TWO Syntax and Semantic.
ISBN Chapter 3 Describing Syntax and Semantics.
1 Syntax In Text: Chapter 3. 2 Chapter 3: Syntax and Semantics Outline Syntax: Recognizer vs. generator BNF EBNF.
Dr. Philip Cannata 1 Lexical and Syntactic Analysis Chomsky Grammar Hierarchy Lexical Analysis – Tokenizing Syntactic Analysis – Parsing Hmm Concrete Syntax.
Introduction to Programming Languages S1.3.1Bina © 1998 Liran & Ofir Introduction to Programming Languages Programming in C.
CPS 506 Comparative Programming Languages Syntax Specification.
CSCE 330 Programming Language Structures Syntax (Slides mainly based on Tucker and Noonan) Fall 2012 A language that is simple to parse for the compiler.
Syntax and Semantics Structure of programming languages.
Chapter 3 Describing Syntax and Semantics
Syntax The Structure of a Language. Lexical Structure The structure of the tokens of a programming language The scanner takes a sequence of characters.
Syntax (2).
Syntax and Grammars.
Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 2 Syntax A language that is simple to parse.
C H A P T E R T W O Syntax and Semantic. 2 Introduction Who must use language definitions? Other language designers Implementors Programmers (the users.
Structure of programming languages
Chapter 3 – Describing Syntax CSCE 343. Syntax vs. Semantics Syntax: The form or structure of the expressions, statements, and program units. Semantics:
Syntax(1). 2 Syntax  The syntax of a programming language is a precise description of all its grammatically correct programs.  Levels of syntax Lexical.
Chapter 3: Describing Syntax and Semantics
Chapter 3 – Describing Syntax
CSCE 330 Programming Language Structures Syntax (Slides mainly based on Tucker and Noonan) Fall 2012 A language that is simple to parse for the compiler.
Programming Languages 2nd edition Tucker and Noonan
Programming Languages
Chapter 3 – Describing Syntax
Syntax (1).
What does it mean? Notes from Robert Sebesta Programming Languages
Syntax versus Semantics
Compiler Construction (CS-636)
CSE 3302 Programming Languages
Compiler Designs and Constructions
Compiler Design 4. Language Grammars
Some CFL Practicalities
Programming Languages 2nd edition Tucker and Noonan
Programming Languages 2nd edition Tucker and Noonan
Chapter 2: A Simple One Pass Compiler
R.Rajkumar Asst.Professor CSE
Lecture 4: Lexical Analysis & Chomsky Hierarchy
Programming Languages 2nd edition Tucker and Noonan
BNF 23-Feb-19.
Chapter 3 Describing Syntax and Semantics.
BNF 9-Apr-19.
High-Level Programming Language
Programming Languages 2nd edition Tucker and Noonan
COMPILER CONSTRUCTION
Presentation transcript:

Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth

2.1 Grammars Backus-Naur Form Derivations Parse Trees Associativity and Precedence Ambiguous Grammars 2.2 Extended BNF 2.3 Syntax of a Small Language: Clite Lexical Syntax Concrete Syntax 2.4 Compilers and Interpreters 2.5 Linking Syntax and Semantics Abstract Syntax Abstract Syntax Trees Abstract Syntax of Clite

Expr  Expr + Term | Expr – Term | Term Term  Term * Factor | Term / Factor | Term % Factor | Factor Factor  Primary ** Factor | Primary Primary  0 |... | 9 | ( Expr ) Red indicates a terminal of G 1, blue indicates meta-symbols (symbols that aren’t part of the language but are used to describe the language) Example: 10 * ( 5 – 3)

 Motivation for using a subset of C: Grammar Language (pages) Reference Pascal5Jensen & Wirth C 6Kernighan & Richie C++22Stroustrup Java14Gosling, et. al.  The Clite grammar fits on one page (next 3 slides), so it’s a far better tool for studying language design.

Program  int main ( ) { Declarations Statements } Declarations  { Declaration } Declaration  Type Identifier [ [ Integer ] ] {, Identifier [ [ Integer ] ] } ; Type  int | bool | float | char Statements  { Statement } Statement  ; | Block | Assignment | IfStatement | WhileStatement Block  { Statements } Assignment  Identifier [ [ Expression ] ] = Expression ; IfStatement  if ( Expression ) Statement [ else Statement ] WhileStatement  while ( Expression ) Statement

Expression  Conjunction { || Conjunction } Conjunction  Equality { && Equality } Equality  Relation [ EquOp Relation ] EquOp  == | != Relation  Addition [ RelOp Addition ] RelOp  | >= Addition  Term { AddOp Term } AddOp  + | - Term  Factor { MulOp Factor } MulOp  * | / | % Factor  [ UnaryOp ] Primary UnaryOp  - | ! Primary  Identifier [ [ Expression ] ] | Literal | ( Expression ) | Type ( Expression )

Identifier  Letter { Letter | Digit } Letter  a | b |... | z | A | B |... | Z Digit  0 | 1 |... | 9 Literal  Integer | Boolean | Float | Char Integer  Digit { Digit } Boolean  true | false Float  Integer. Integer Char  ‘ ASCII Char ‘ (ASCII Char is the set of ASCII characters)

13 grammar rules – compare to 4 pages, for C++ Metabraces { }(0 or more) is interpreted to mean left associativity; e.g., ◦ Addition → Term { AddOp Term } AddOp → + | - Metabrackets [ ] (optional) means an addition can only be followed by one or no relational operators plus another addition. Relation  Addition [ RelOp Addition ] ◦ RelOp  | >= ◦ (no a > b > c for example)

Comments The significance of whitespace Distinguishing one token <= from two tokens < = Distinguishing identifiers from keywords like if

 The Clite grammar has two levels ◦ lexical level (described by lexical syntax) ◦ syntactic level(described by concrete syntax)  They correspond to two separate parts of a compiler.  The issues on the previous slide are lexical issues.

 Examples of lexical entities (tokens): ◦ Identifiers e.g., numbr1, X ◦ Literalse.g., 123, 'x', 3.25, true ◦ Keywords bool char else false float if int main true while ◦ Operators = || && == != >= + - * / ! % ◦ Punctuation ;, { } ( ) ◦ Chare.g., ‘?’

 Whitespace is any space, tab, end-of-line character (or characters), or character sequence inside a comment  No token may contain embedded whitespace  (unless it is a character or string literal)  Example: >= one token > = two tokens

 while ( a <= b) legal - spacing between tokens  while(a<=b) also legal - spacing not needed  while (a < = b) no lexical errors but illegal syntactically – lexer would identify tokens while, (, a, <, =, b, )

 Clite uses // comment style of C++  Not defined in Clite grammar (but could be)  Instead, it’s defined outside the grammar  The use of whitespace to differentiate between one and two character operators is also defined outside the grammar.

Sequence of letters and digits, starting with a letter  “if” is an identifier which also is a keyword Keywords versus reserved words: ◦ Keyword: predefined by the language ◦ Reserved word: can only be used as defined. ◦ In most languages all keywords are also reserved, but in a few; e.g., Pascal, a subset of the keyword identifiers are predefined but not reserved (and can be redefined by the programmer). Implications? Flexibility, confusion …

 Concrete syntax of a language is the set of rules for writing correct programs  The structure of a specific program can be represented by a parse tree, based on the concrete syntax of the language, using the stream of Tokens identified during lexical analysis  The root of the parse tree is the Start Symbol of the language (Program, in Clite).

Clite’s expression rules are non- ambiguous with respect to precedence and associativity ◦ Rule ordering defines precedence; rule format defines associativity. C/C++ expression grammar definition is ambiguous – precedence and associativity are specified separately.

Clite OperatorAssociativity Unary - ! none * / left + - left >= none (i.e., no a < b <= c ) == != none && left || left

 … are non-associative. (an idea borrowed from Ada)  Why is this important? In C & C++, the expression: if (10 < x < 20) is not equivalent to if (10 < x && x < 20) But it is error-free! So, what does it mean?

 Grammar rules don’t specify the operand types to be used with various operators; e.g., is true + 13 a legal expression? What is the type of the expression ?  These are type and semantic issues, not lexical or syntax.

Lexical Analyzer (lexer) Syntactic Analyzer (parser) Semantic Analyzer Code Optimizer Code Generator Tokens Abstract Syntax Machine Code Intermediate Code (IC) Source Program Intermediate Code (IC)

 Input: characters (the program)  Output: tokens & token type  Lexical grammars are simpler than syntax grammars  Often generated automatically by lexical analyzer generating programs

Often based on BNF/EBNF grammar Input: tokens Output: abstract syntax tree or some other representation of the program Abstract syntax: similar to a concrete parse tree but with punctuation, many nonterminals discarded

Typical tasks: ◦ Check that all identifiers are declared ◦ Perform type checking for expressions, assignments, … ◦ Insert implied conversion operators (i.e., make them explicit) Context free grammars can’t express the semantic rules that are needed for this phase of translation. Output: Intermediate code (IC) tree, modified abstract syntax tree representation.

 Purpose: Improve the run-time performance of the object code ◦ Usually, to make it run faster ◦ Other possibilities: reduce amount of storage required  Drawback: optimization is time- consuming; slows down debugging  Output: Intermediate code, similar to abstract syntax notation; closer to machine code

Evaluate constant expressions at compile- time In-line expansion (of function calls) Loop unrolling Reorder code to improve cache performance Eliminate common sub-expressions Eliminate unnecessary code Store local variables/intermediate results in registers rather than on the stack or elsewhere in memory 

Output: machine code Instruction selection, register management “Peephole” optimization: look at a small segment of machine code, make it more efficient; e.g., ◦ x = y; →→ load y, R0 store R0, x z = x * 2; load x, R0 // redundant code mul R0, #2

 Replaces last 2 phases of a compiler with direct execution  Input: ◦ Mixed: generates & uses intermediate code/abstract syntax ◦ Pure: start from stream of ASCII characters each time a statement is executed  Mixed interpreters ◦ Java, Perl, Python, Haskell, Scheme  Pure interpreters: ◦ most Basics, shell commands

 Source code: a = x + y;  Compiler-generated object code: load r0, x; add r0, y; store r0, a;  Will be executed later, with remainder of program  Interpreter: Call an interpretive routine to actually perform the operation. e.g., add(x, y, a);

 It’s not the case that the lexical analyzer identifies all the tokens, and then the parser analyzes all the tokens, and then the type/semantic analysis is performed.  Instead, parser repeatedly contacts lexer to get another token  As tokens are received they either do or don’t match the expected syntax ◦ If there’s a match, perform any type or semantic testing, possibly generate int. code, call for another token.

 Output of parser: the concrete parse tree is large – probably more than needed for next phase ◦ The compiler usually produces some more compact representation of the program  Example: Fig. 2.9 (page 46)

 The shape of the parse tree reveals the meaning of the program.  So as output of syntax analysis we want a tree that removes its inefficiency and keeps its meaning. ◦ Remove separator/punctuation terminal symbols ◦ Remove all trivial nonterminals ◦ Replace remaining nonterminals with leaf terminals  Example: Fig. 2.10

Removes unnecessary details but keeps the essential language elements; e.g., consider the following two equivalent loops: PascalC++ while j < n do beginwhile (j < n) { j := j + 1; j = j + 1; end;} Essential information: 1) it is a loop, 2) its terminating condition is j >= n, and 3) its body increments the current value of j.

Purpose: an intermediate form of the source code Generated by the parser during syntax analysis Used during type checking/semantic analysis Abstract syntax rules are defined by the compiler (or interpreter). One concrete syntax can have several abstract syntaxes associated with it, depending on the design of the translator.

 LHS = RHS  LHS names an abstract syntax class  RHS either  (1) gives a list of one or more alternatives  or  (2) lists the essential elements of the syntax class  Compare to production rules in concrete grammar

Assignment = Variable target ; Expression source Expression = Variable| Value | Binary | Unary Variable = String id Value = Integer value Binary = Operator op ; Expression term1, term2 Unary = UnaryOp op ; Expression term Operator = +| - | * | / | ! Priority? Associativity? ….

Concrete: Assignment  Identifier [ [ Expression ] ] = Expression ; Abstract: Assignment = Variable target ; Expression source

target source Binary Operator Variable Value + 2 y * x z

op term1 term2 Binary node opterm Unary node Binaries and unaries represent information that can be used for later processing

Assignment = Variable target ; Expression source Expression = VariableRef | Value | Binary | Unary VariableRef = Variable | ArrayRef Variable = String id ArrayRef = String id ; Expression index Value = IntValue | BoolValue | FloatValue | CharValue Binary = Operator op ; Expression term1, term2 Unary = UnaryOp op ; Expression term Operator = ArithmeticOp | RelationalOp | BooleanOp IntValue = Integer intValue …

abstract class Expression { } abstract class VariableRef extends Expression { } class Variable extends VariableRef { String id; } class ArrayRef extends VariableRef { String id; Expression index} class Value extends Expression { … } class Binary extends Expression { Operator op; Expression term1, term2; } class Unary extends Expression { UnaryOp op; Expression term; }

 Lexical syntax: small, simple, defines language tokens  Concrete syntax: detailed, specific, defines correct programs, used to direct parsing algorithms (language specific, not implementation specific)  Abstract syntax: simpler than concrete, used to describe the structure of the intermediate code (implementation specific, not language specific) NOT INTENDED TO BE USED FOR PARSING  Semantics: program “meaning”, or runtime behavior

 A syntax is ambiguous if a portion of a program has two or more possible interpretations (parse trees)  Non-ambiguous grammars can be written but some ambiguity may be tolerated to reduce grammar size.  Operator associativity and precedence can be defined by the concrete syntax  Compilers generate code for later execution while interpreters execute program statements as they are analyzed.