Structure of programming languages

Slides:

Advertisements

Similar presentations

Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth.

Advertisements

Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 2 Syntax A language that is simple to parse.

Chapter 2 Syntax. Syntax The syntax of a programming language specifies the structure of the language The lexical structure specifies how words can be.

ISBN Chapter 3 Describing Syntax and Semantics.

Chapter 3 Describing Syntax and Semantics Sections 1-3.

Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 2 Syntax A language that is simple to parse.

Programming Languages 2nd edition Tucker and Noonan

Chapter 3 Describing Syntax and Semantics Sections 1-3.

Slide 1 Chapter 2-b Syntax, Semantics. Slide 2 Syntax, Semantics - Definition The syntax of a programming language is the form of its expressions, statements.

Chapter 3 Describing Syntax and Semantics Sections 1-3.

Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 2 Syntax A language that is simple to parse.

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering CSCE 330 Programming Language Structures Chapter 2: Syntax Fall 2009 Marco.

Dr. Muhammed Al-Mulhem 1ICS ICS 535 Design and Implementation of Programming Languages Part 1 Fundamentals (Chapter 4) Compilers and Syntax.

ISBN Chapter 3 Describing Syntax and Semantics.

S YNTAX. Outline Programming Language Specification Lexical Structure of PLs Syntactic Structure of PLs Context-Free Grammar / BNF Parse Trees Abstract.

(2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies.

Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth.

Describing Syntax and Semantics

1 Syntax and Semantics The Purpose of Syntax Problem of Describing Syntax Formal Methods of Describing Syntax Derivations and Parse Trees Sebesta Chapter.

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering CSCE 330 Programming Language Structures Syntax (Slides mainly based on Tucker.

Syntax – Intro and Overview CS331. Syntax Syntax defines what is grammatically valid in a programming language –Set of grammatical rules –E.g. in English,

CS 355 – PROGRAMMING LANGUAGES Dr. X. Topics Introduction The General Problem of Describing Syntax Formal Methods of Describing Syntax.

1 Chapter 3 Describing Syntax and Semantics. 3.1 Introduction Providing a concise yet understandable description of a programming language is difficult.

Context-Free Grammars

CS Describing Syntax CS 3360 Spring 2012 Sec Adapted from Addison Wesley’s lecture notes (Copyright © 2004 Pearson Addison Wesley)

Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 2 Syntax A language that is simple to parse.

Grammars CPSC 5135.

3-1 Chapter 3: Describing Syntax and Semantics Introduction Terminology Formal Methods of Describing Syntax Attribute Grammars – Static Semantics Describing.

C H A P T E R TWO Syntax and Semantic.

ISBN Chapter 3 Describing Syntax and Semantics.

TextBook Concepts of Programming Languages, Robert W. Sebesta, (10th edition), Addison-Wesley Publishing Company CSCI18 - Concepts of Programming languages.

1 Syntax In Text: Chapter 3. 2 Chapter 3: Syntax and Semantics Outline Syntax: Recognizer vs. generator BNF EBNF.

Dr. Philip Cannata 1 Lexical and Syntactic Analysis Chomsky Grammar Hierarchy Lexical Analysis – Tokenizing Syntactic Analysis – Parsing Hmm Concrete Syntax.

CFG1 CSC 4181Compiler Construction Context-Free Grammars Using grammars in parsers.

CPS 506 Comparative Programming Languages Syntax Specification.

CSCE 330 Programming Language Structures Syntax (Slides mainly based on Tucker and Noonan) Fall 2012 A language that is simple to parse for the compiler.

Syntax and Semantics Structure of programming languages.

Chapter 3 Describing Syntax and Semantics

ISBN Chapter 3 Describing Syntax and Semantics.

Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 2 Syntax A language that is simple to parse.

C H A P T E R T W O Syntax and Semantic. 2 Introduction Who must use language definitions? Other language designers Implementors Programmers (the users.

1 CS Programming Languages Class 04 September 5, 2000.

Copyright © 2006 Addison-Wesley. All rights reserved.1-1 ICS 410: Programming Languages Chapter 3 : Describing Syntax and Semantics Syntax.

Chapter 3 – Describing Syntax CSCE 343. Syntax vs. Semantics Syntax: The form or structure of the expressions, statements, and program units. Semantics:

Syntax(1). 2 Syntax  The syntax of a programming language is a precise description of all its grammatically correct programs.  Levels of syntax Lexical.

Describing Syntax and Semantics Chapter 3: Describing Syntax and Semantics Lectures # 6.

Chapter 3: Describing Syntax and Semantics

Chapter 3 – Describing Syntax

Describing Syntax and Semantics

CSCE 330 Programming Language Structures Syntax (Slides mainly based on Tucker and Noonan) Fall 2012 A language that is simple to parse for the compiler.

Describing Syntax and Semantics

Chapter 3 Context-Free Grammar and Parsing

Programming Languages

Chapter 3 – Describing Syntax

Context-Free Grammars

Context-Free Grammars

Programming Languages 2nd edition Tucker and Noonan

Context-Free Grammars

Programming Languages 2nd edition Tucker and Noonan

CSC 4181Compiler Construction Context-Free Grammars

R.Rajkumar Asst.Professor CSE

CSC 4181 Compiler Construction Context-Free Grammars

Context-Free Grammars

Chapter 3 Describing Syntax and Semantics.

Context-Free Grammars

Programming Languages 2nd edition Tucker and Noonan

COMPILER CONSTRUCTION

Presentation transcript:

Structure of programming languages Syntax and Semantics

Introduction Syntax: the form or structure of the expressions, statements, and program units Semantics: the meaning of the expressions, statements, and program units

Introduction Syntax and semantics provide a language’s definition Users of a language definition Other language designers Implementers Programmers (the users of the language)

Syntax of a Languages Motivation for using a subset of C: Grammar Language (pages) Reference Pascal 5 Jensen & Wirth C 6 Kernighan & Richie C++ 22 Stroustrup Java 14 Gosling, et. al. The grammar fits on one page, so it’s a far better tool for studying language design.

C Grammar: Statements Program  int main ( ) { Declarations Statements } Declarations  { Declaration } Declaration  Type Identifier [ [ Integer ] ] { , Identifier [ [ Integer ] ] } ; Type  int | bool | float | char Statements  { Statement } Statement  ; | Block | Assignment | IfStatement | WhileStatement Block  { Statements } Assignment  Identifier [ [ Expression ] ] = Expression ; IfStatement  if ( Expression ) Statement [ else Statement ] WhileStatement  while ( Expression ) Statement

C Grammar: Expressions Expression  Conjunction { || Conjunction } Conjunction  Equality { && Equality } Equality  Relation [ EquOp Relation ] EquOp  == | != Relation  Addition [ RelOp Addition ] RelOp  < | <= | > | >= Addition  Term { AddOp Term } AddOp  + | - Term  Factor { MulOp Factor } MulOp  * | / | % Factor  [ UnaryOp ] Primary UnaryOp  - | ! Primary  Identifier [ [ Expression ] ] | Literal | ( Expression ) | Type ( Expression )

Describing Syntax: Terminology A sentence is a string of characters over some alphabet A language is a set of sentences A lexeme is the lowest level syntactic unit of a language (e.g., *, sum, begin) A token is a category of lexemes (e.g., identifier)

Formal Definition of Languages Recognizers A recognition device reads input strings of the language and decides whether the input strings belong to the language Example: syntax analysis part of a compiler Generators A device that generates sentences of a language One can determine if the syntax of a particular sentence is correct by comparing it to the structure of the generator

Lexical Analysis The process of converting a character stream into a corresponding sequence of meaningful symbols (called tokens or lexemes) is called tokenizing, lexing or lexical analysis. A program that performs this process is called a tokenizer, lexer, or scanner. In Scheme, we tokenize (set! x (+ x 1)) as ( set! x ( + x 1 ) )‏ Similarly, in Java, we tokenize System.out.println("Hello World!"); as System . out . println ( "Hello World!" ) ;

Lexical Analysis Lexical analyzer splits it into tokens Token = sequence of characters (symbolic name) representing a single terminal symbol Identifiers: myVariable … Literals: 123 5.67 true … Keywords: char sizeof … Operators: + - * / … Punctuation: ; , } { … Discards whitespace and comments

Examples of Tokens in C Tokens Lexemes identifier Age, grade,Temp, zone, q1 number 3.1416, -498127,987.76412097 string “A cat sat on a mat.”, “90183654” open parentheses ( close parentheses ) Semicolon ; reserved word if IF, if, If, iF

Examples of Tokens in C Lexical analyzer usually represents each token by a unique integer code “+” { return(PLUS); } // PLUS = 401 “-” { return(MINUS); } // MINUS = 402 “*” { return(MULT); } // MULT = 403 “/” { return(DIV); } // DIV = 404 Some tokens require regular expressions [a-zA-Z_][a-zA-Z0-9_]* { return (ID); } // identifier [1-9][0-9]* { return(DECIMALINT); } 0[0-7]* { return(OCTALINT); } (0x|0X)[0-9a-fA-F]+ { return(HEXINT); }

Reserved Keywords in C auto, break, case, char, const, continue, default, do, double, else, enum, extern, float, for, goto, if, int, long, register, return, short, signed, sizeof, static, struct, switch, typedef, union, unsigned, void, volatile, wchar_t, while C++ added a bunch: bool, catch, class, dynamic_cast, inline, private, protected, public, static_cast, template, this, virtual and others Each keyword is mapped to its own token

Whitespace Whitespace is any space, tab, end-of-line character (or characters), or character sequence inside a comment No token may contain embedded whitespace (unless it is a character or string literal) Example: >= one token > = two tokens

Redefining Identifiers can be dangerous program confusing; const true = false; begin if (a<b) = true then f(a) else …

Parsing Process Call the scanner to get tokens Build a parse tree from the stream of tokens A parse tree shows the syntactic structure of the source program. Add information about identifiers in the symbol table Report error, when found, and recover from the error

Grammars A meta-language is a language used to define other languages A grammar is a meta-language used to define the syntax of a language. It consists of: Finite set of terminal symbols Finite set of non-terminal symbols Finite set of production rules Start symbol Language = (possibly infinite) set of all sequences of symbols that can be derived by applying production rules starting from the start symbol Backus-Naur Form (BNF)

Describing Syntax Backus-Naur Form and Context-Free Grammars (BNF) Most widely known method for describing programming language syntax Extended BNF Improves readability and writability of BNF

Backus-Naur Form (BNF) Invented by John Backus to describe Algol 58 BNF is equivalent to context-free grammars BNF is a meta-language used to describe another language In BNF, abstractions are used to represent classes of syntactic structures--they act like syntactic variables (also called non-terminal symbols)

Example: Binary Digits Consider the grammar: binaryDigit  0 binaryDigit  1 or equivalently: binaryDigit  0 | 1 Here, | is a metacharacter that separates alternatives.

Example: Decimal Numbers Grammar for unsigned decimal integers Terminal symbols: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 Non-terminal symbols: Digit, Integer Production rules: Integer  Digit | Integer Digit Digit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 Start symbol: Integer Can derive any unsigned integer using this grammar Language = set of all unsigned decimal integers

Derivation of 352 as an Integer Production rules: Integer  Digit | Integer Digit Digit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 Integer  Integer Digit  Integer 2  Integer Digit 2  Integer 5 2  Digit 5 2  3 5 2 Rightmost derivation At each step, the rightmost non-terminal is replaced

Parse Tree A labeled tree in which the interior nodes are labeled by nonterminals leaf nodes are labeled by terminals the children of an interior node represent a replacement of the associated nonterminal in a derivation corresponding to a derivation E + id E F *

Parse Tree for 352 as an Integer

Abstract Syntax Tree for If Statement ) exp true ( if else elsePart return if true st return

Arithmetic Expression Grammar The following grammar defines the language of arithmetic expressions with 1-digit integers, addition, and subtraction. Expr  Expr + Term | Expr – Term | Term Term  0 | ... | 9 | ( Expr )

Parse of the String 5-4+3

BNF Fundamentals Non-terminals: BNF abstractions Terminals: lexemes and tokens Grammar: a collection of rules Examples of BNF rules: <ident_list> → identifier | identifier, <ident_list> <if_stmt> → if <logic_expr> then <stmt>

Regular Expressions x character x \x escaped character, e.g., \n { name } reference to a name M | N M or N M N M followed by N M* 0 or more occurrences of M M+ 1 or more occurrences of M [x1 … xn] One of x1 … xn Example: [aeiou] – vowels, [0-9] - digits

An Example Grammar <program>  <stmts> <stmts>  <stmt> | <stmt> ; <stmts> <stmt>  <var> = <expr> <var>  a | b | c | d <expr>  <term> + <term> | <term> - <term> <term>  <var> | const

Derivation A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols), e.g., <program> => <stmts> => <stmt> => <var> = <expr> => a = <expr> => a = <term> + <term> => a = <var> + <term> => a = b + <term> => a = b + const

Examples S  SS | (S)S | () | S  SS  (S)SS (S)S(S)S  (S)S(())S  ((())())(()) E  E O E | (E) | id O  + | - | * | / E  E O E  (E) O E  (E O E) O E * ((E O E) O E) O E  ((id O E)) O E) O E  ((id + E)) O E) O E  ((id + id)) O E) O E  * ((id + id)) * id) + id

Leftmost Derivation Rightmost Derivation Each step of the derivation is a replacement of the leftmost nonterminals in a sentential form. E  E O E  (E) O E  (E O E) O E  (id O E) O E  (id + E) O E  (id + id) O E  (id + id) * E  (id + id) * id Each step of the derivation is a replacement of the rightmost nonterminals in a sentential form. E  E O E  E O id  E * id  (E) * id  (E O E) * id  (E O id) * id  (E + id) * id  (id + id) * id

Parse Tree A hierarchical representation of a derivation <program> <stmts> <stmt> <var> = <expr> a <term> + <term> <var> const b

CFG For Floating Point Numbers ::= stands for production rule; <…> are non-terminals; | represents alternatives for the right-hand side of a production rule Sample parse tree:

Ambiguity in Expressions Which operation is to be done first? solved by precedence An operator with higher precedence is done before one with lower precedence. An operator with higher precedence is placed in a rule (logically) further from the start symbol. solved by associativity If an operator is right-associative (or left-associative), an operand in between 2 operators is associated to the operator to the right (left). Right-associated : W + (X + (Y + Z)) Left-associated : ((W + X) + Y) + Z

Associativity and Precedence A grammar can be used to define associativity and precedence among the operators in an expression. E.g., + and - are left-associative operators in mathematics; * and / have higher precedence than + and - . Consider the grammar G1: Expr -> Expr + Term | Expr – Term | Term Term -> Term * Factor | Term / Factor | Term % Factor | Factor Factor -> Primary ** Factor | Primary Primary -> 0 | ... | 9 | ( Expr )

Precedence (cont’d) E  E + E | E - E | F F  F * F | F / F | X X  ( E ) | id (id1 + id2) * id3 * id4 * F X + E id4 id3 ) ( id1 id2

Precedence E  E + E E  E - E E  E * E E  E / E E  id E  F F  F * F F  F / F F  id + E * id3 id2 id1 E + * F id3 id2 id1

Associativity and Precedence for Grammar Precedence Associativity Operators 3 right ** 2 left * / % \ 1 left + - Note: These relationships are shown by the structure of the parse tree: highest precedence at the bottom, and left-associativity on the left at each level.

Associativity of Operators Operator associativity can also be indicated by a grammar <expr> -> <expr> + <expr> | const (ambiguous) <expr> -> <expr> + const | const (unambiguous) <expr> <expr> <expr> + const <expr> + const const

Associativity Left-associative operators E  E + F | E - F | F / F X + E id3 ) ( * id1 id4 id2 Left-associative operators E  E + F | E - F | F F  F * X | F / X | X X  ( E ) | id (id1 + id2) * id3 / id4 = (((id1 + id2) * id3) / id4)

Ambiguity in Grammars A grammar is ambiguous if and only if it generates a sentential form that has two or more distinct parse trees

Example of Dangling Else With which ‘if’ does the following ‘else’ associate? if (x < 0) if (y < 0) y = y - 1; else y = 0; Answer: either one!

An Ambiguous Expression Grammar <expr>  <expr> <op> <expr> | const <op>  / | - <expr> <expr> <expr> <op> <expr> <expr> <op> <op> <expr> <expr> <op> <expr> <expr> <op> <expr> const - const / const const - const / const

Example: Ambiguous Grammar E  E + E E  E - E E  E * E E  E / E E  id + E * id3 id2 id1 * E id2 + id1 id3

Solving the dangling else ambiguity Algol 60, C, C++: associate each else with closest if; use {} or begin…end to override. Algol 68, Modula, Ada: use explicit delimiter to end every conditional (e.g., if…fi) Java: rewrite the grammar to limit what can appear in a conditional: IfThenStatement -> if ( Expression ) Statement IfThenElseStatement -> if ( Expression ) StatementNoShortIf else Statement The category StatementNoShortIf includes all except IfThenStatement.

Ambiguity in Expressions Which operation is to be done first? solved by precedence An operator with higher precedence is done before one with lower precedence. An operator with higher precedence is placed in a rule (logically) further from the start symbol. solved by associativity If an operator is right-associative (or left-associative), an operand in between 2 operators is associated to the operator to the right (left). Right-associated : W + (X + (Y + Z)) Left-associated : ((W + X) + Y) + Z

An Unambiguous Expression Grammar If we use the parse tree to indicate precedence levels of the operators, we cannot have ambiguity <expr>  <expr> - <term> | <term> <term>  <term> / const| const <expr> <expr> - <term> <term> <term> / const const const

Extended BNF Optional parts are placed in brackets [ ] <proc_call> -> ident [(<expr_list>)] Alternative parts of RHSs are placed inside parentheses and separated via vertical bars <term> → <term> (+|-) const Repetitions (0 or more) are placed inside braces { } <ident> → letter {letter|digit}

BNF and EBNF BNF <expr>  <expr> + <term> EBNF <term>  <term> * <factor> | <term> / <factor> | <factor> EBNF <expr>  <term> {(+ | -) <term>} <term>  <factor> {(* | /) <factor>}

Extended Backus-Naur Form (EBNF) Kleene’s Star/ Kleene’s Closure Seq ::= St {; St} Seq ::= {St ;} St Optional Part IfSt ::= if ( E ) St [else St] E ::= F [+ E ] | F [- E]

Finite State Automata Set of states Usually represented as graph nodes Input alphabet + unique “end of program” symbol State transition function Usually represented as directed graph edges (arcs) Automaton is deterministic if, for each state and each input symbol, there is at most one outgoing arc from the state labeled with the input symbol Unique start state One or more final (accepting) states

Syntax Diagrams Graphical representation of EBNF rules Examples nonterminals: terminals: sequences and choices: Examples X ::= (E) | id Seq ::= {St ;} St E ::= F [+ E ] IfSt id ( ) E id St ; F + E

DFA for C Identifiers

Syntax Diagram for Expressions with Addition

Quiz Convert EBNF to BNF S A { b A} A  a [b] A

Quiz Prove that the grammar is ambiguous <S>  <A> <A>  <A> * <A> | <id> <id>  x | y | z