CSE 3302 Programming Languages

Slides:



Advertisements
Similar presentations
Chapter Chapter Summary Languages and Grammars Finite-State Machines with Output Finite-State Machines with No Output Language Recognition Turing.
Advertisements

Chapter 2 Syntax. Syntax The syntax of a programming language specifies the structure of the language The lexical structure specifies how words can be.
ISBN Chapter 3 Describing Syntax and Semantics.
CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax Lecture 2 - Syntax, Spring CSE3302 Programming Languages, UT-Arlington ©Chengkai.
CS 330 Programming Languages 09 / 13 / 2007 Instructor: Michael Eckmann.
Chapter 3 Program translation1 Chapt. 3 Language Translation Syntax and Semantics Translation phases Formal translation models.
Yu-Chen Kuo1 Chapter 2 A Simple One-Pass Compiler.
Chapter 3: Formal Translation Models
Chapter 4 - Syntax Programming Languages:
COP4020 Programming Languages
S YNTAX. Outline Programming Language Specification Lexical Structure of PLs Syntactic Structure of PLs Context-Free Grammar / BNF Parse Trees Abstract.
1 Syntax and Semantics The Purpose of Syntax Problem of Describing Syntax Formal Methods of Describing Syntax Derivations and Parse Trees Sebesta Chapter.
Syntax & Semantic Introduction Organization of Language Description Abstract Syntax Formal Syntax The Way of Writing Grammars Formal Semantic.
Software II: Principles of Programming Languages
Syntax and Backus Naur Form
1 Chapter 3 Describing Syntax and Semantics. 3.1 Introduction Providing a concise yet understandable description of a programming language is difficult.
Programming Languages Third Edition Chapter 6 Syntax.
ProgrammingLanguages Programming Languages Language Syntax This lecture introduces the the lexical structure of programming languages; the context-free.
C H A P T E R TWO Syntax and Semantic.
ISBN Chapter 3 Describing Syntax and Semantics.
CPS 506 Comparative Programming Languages Syntax Specification.
Chapter 3 Describing Syntax and Semantics
Syntax The Structure of a Language. Lexical Structure The structure of the tokens of a programming language The scanner takes a sequence of characters.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 3: Introduction to Syntactic Analysis.
Compiler Construction By: Muhammad Nadeem Edited By: M. Bilal Qureshi.
Syntax Analysis - Parsing Compiler Design Lecture (01/28/98) Computer Science Rensselaer Polytechnic.
C H A P T E R T W O Syntax and Semantic. 2 Introduction Who must use language definitions? Other language designers Implementors Programmers (the users.
CS 326 Programming Languages, Concepts and Implementation Instructor: Mircea Nicolescu Lecture 3.
Chapter 3 – Describing Syntax CSCE 343. Syntax vs. Semantics Syntax: The form or structure of the expressions, statements, and program units. Semantics:
Describing Syntax and Semantics Chapter 3: Describing Syntax and Semantics Lectures # 6.
CSE 3302 Programming Languages
CS 3304 Comparative Languages
Chapter 3 – Describing Syntax
Describing Syntax and Semantics
PROGRAMMING LANGUAGES
A Simple Syntax-Directed Translator
CS 326 Programming Languages, Concepts and Implementation
Introduction to Parsing
CS510 Compiler Lecture 4.
Chapter 3 Context-Free Grammar and Parsing
Chapter 3 – Describing Syntax
Concepts of Programming Languages
Syntax (1).
What does it mean? Notes from Robert Sebesta Programming Languages
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
PROGRAMMING LANGUAGES
Syntax versus Semantics
CS 363 Comparative Programming Languages
CSE 3302 Programming Languages
Compiler Design 4. Language Grammars
Programming Languages 2nd edition Tucker and Noonan
COP4020 Programming Languages
CSC 4181Compiler Construction Context-Free Grammars
R.Rajkumar Asst.Professor CSE
CS 3304 Comparative Languages
Lecture 4: Lexical Analysis & Chomsky Hierarchy
CS 3304 Comparative Languages
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
CSC 4181 Compiler Construction Context-Free Grammars
Chapter 3 Describing Syntax and Semantics.
BNF 9-Apr-19.
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
Discrete Maths 13. Grammars Objectives
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
Programming Languages 2nd edition Tucker and Noonan
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
COMPILER CONSTRUCTION
Presentation transcript:

CSE 3302 Programming Languages Syntax Chengkai Li, Weimin He Spring 2008 Lecture 3 - Syntax, Spring 2017 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

Phases of Compilation [Programming Language Pragmatics, by Michael Scott] Lecture 3 - Syntax, Spring 2017 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

Syntax and Semantics Defining a programming language: Specifications of syntax Syntax – structure (form) of programs (the form a program in the language must take). Specifications of semantics Semantics - the meaning of programs Precise definition, without ambiguity Given a program, there is only one unique interpretation. Lecture 3 - Syntax, Spring 2017 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

Purpose of Describing Syntax and Semantics For language designers: Convey the design principles of the language For language implementers: Define precisely what to be implemented For language programmers: Describe the language that is to be used How to describe? Natural language: ambiguous Formal ways: especially for syntax Lecture 3 - Syntax, Spring 2017 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

Semantics “Colorless green ideas sleep furiously” https://en.wikipedia.org/wiki/Colorless_green_ideas_sleep_furiously Chomsky created this as an example of a syntactically correct sentence that lacks any semantic value or quality other than being nonsensical. Lecture 3 - Syntax, Spring 2017 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

Scanning and Parsing Lexical Structure: The structure of tokens (words) scanning phase (lexical analysis) : scanner/lexer recognize tokens from characters Syntactical Structure: The structure of programs parsing phase (syntax analysis) : parser determines the syntactic structure Eg identifier is a token that can have lexemes such a s sum and total Examples of tokens and lexemes on p 106 of statement (on previous page) index = 2 * count + 17; character stream scanner (lexical analysis) token stream parser (syntax analysis) parse tree Lecture 3 - Syntax, Spring 2017 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

Tokens Tokens (words): Building blocks of programs Reserved words (keywords): e.g., if,while,int,return Literals/constants: numeric literal: 42 string literal: "hello" Special symbols: e.g., “;”, “<=”, “+” Identifiers: e.g., x24, monthly_balance, putchar Lecture 3 - Syntax, Spring 2017 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

JSON Example http://json.org/example.html {"menu": { "id": "file", "value": "File", "popup": { "menuitem": [ {"value": "New", "onclick": "CreateNewDoc()"}, {"value": "Open", "onclick": "OpenDoc()"}, {"value": "Close", "onclick": "CloseDoc()"} ] } }} Lecture 3 - Syntax, Spring 2017 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

Syntax example – JSON JavaScript Object Notation (http://www.json.org/) Object Array Value String Number Lecture 3 - Syntax, Spring 2017 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

Syntax Example - JSON An object is an unordered set of name/value pairs. An object begins with { (left brace) and ends with } (right brace). Each name is followed by : (colon) and the name/value pairs are separated by , (comma). An array is an ordered collection of values. An array begins with [ (left bracket) and ends with ] (right bracket). Values are separated by , (comma). Lecture 3 - Syntax, Spring 2017 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

Syntax Example - JSON A value can be a string in double quotes, or a number, or true or false or null, or an object or an array. These structures can be nested. Lecture 3 - Syntax, Spring 2017 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

Syntax Example - JSON A string is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes. A character is represented as a single character string. A string is very much like a C or Java string. Lecture 3 - Syntax, Spring 2017 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

Syntax Example - JSON A number is very much like a C or Java number, except that the octal and hexadecimal formats are not used. Whitespace can be inserted between any pair of tokens. Excepting a few encoding details, that completely describes the language. Lecture 3 - Syntax, Spring 2017 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

Reserved words vs. Predefined identifiers cannot be redefined. e.g., double if; is illegal. Predefined identifiers: have initial meaning allow redefinition (not a good idea in practice) e.g., String,Object,System,Integer in Java [Demonstration] Use the Keyword example in Eclipse Lecture 3 - Syntax, Spring 2017 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

Principle of Longest Substring doif vs. do if; x12 vs. x 12 The longest possible string of characters is collected into a single token. An exception: FORTRAN DO 99 I = 1.10 (the same as DO99I=1.10) DO 99 I = 1, 10 Lecture 3 - Syntax, Spring 2017 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

White Space Principle of longest substring requires that tokens are separated by white space. White space (token delimiters): Blanks, newlines, tabs ignored except that they separate tokens Free-format language: format has no effect on the program structure Most languages are free format One exception: python Lecture 3 - Syntax, Spring 2017 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

Indentation in Python #error: first line indented def perm(l): for i in range(len(l)): s = l[:i] + l[i+1:] p = perm(l[:i] + l[i+1:]) for x in p: r.append(l[i:i+1] + x) return r #error: not indented #error: unexpected indent #error: inconsistent dedent def perm(l): for i in range(len(l)): s = l[:i] + l[i+1:] p = perm(l[:i] + l[i+1:]) for x in p: r.append(l[i:i+1] + x) return r Lecture 3 - Syntax, Spring 2017 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

Regular Expression A form for representing sets of strings Description of patterns of characters Basic operations: Concatenation Repetition Selection Lecture 3 - Syntax, Spring 2017 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

Regular Expression Name RE epsilon  symbol a concatenation AB selection A | B repetition A* Shortcuts: A+ = AA* A? = A| [a-z] = (a|b|...|z) [a-z][a-z0-9]* (a|b)*aa(a|b)* [0-9]+(\.[0-9]+)? epsilon represents an empty string This is not comprehensive, refer to your RegEx tool’s manual Lecture 3 - Syntax, Spring 2017 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

Tasks of a Scanner Recognizes keywords (tokens) Recognizes special characters Recognizes identifiers, integers, reals, decimals, strings, etc. Ignores white spaces and comments Lecture 3 - Syntax, Spring 2017 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

Scanner regular expression description of the tokens → (Lex or JLex) scanner of a language Example: Figure 4.1 (page 82) Lecture 3 - Syntax, Spring 2017 21 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He 2008

Scanning and Parsing Lexical Structure: The structure of tokens (words) Syntactical Structure: The structure of programs regular expression grammar Eg identifier is a token that can have lexemes such a s sum and total Examples of tokens and lexemes on p 106 of statement (on previous page) index = 2 * count + 17; character stream scanner (lexical analysis) token stream parser (syntax analysis) parse tree Lecture 3 - Syntax, Spring 2017 22 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He 2008

Grammar Example: (1) sentence ® noun-phrase verb-phrase . (2) noun-phrase ® article noun (3) article ® a | the (4) noun ® girl | dog (5) verb-phrase ® verb noun-phrase (6) verb ® sees | pets Figure 4.2 (page 83) Lecture 3 - Syntax, Spring 2017 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

Grammar Language: the programs (character streams) allowed Grammar rules (productions): "produce" the language left-hand side, right-hand side nonterminals (structured names): noun-phrase verb-phrase terminals (tokens): . dog metasymbols: ® (“consists of”) | (choice) start symbol: the nonterminal that stands for the entire structure (sentence, program). sentence E.g., if-statement ® if (expression) statement else statement Lecture 3 - Syntax, Spring 2017 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

Grammars Produce Languages Language: the set of strings (of terminals) that can be generated from the start symbol by derivation: sentence  noun-phrase verb-phrase . (rule 1)  article noun verb-phrase . (rule 2)  the noun verb-phrase . (rule 3)  the girl verb-phrase . (rule 4)  the girl verb noun-phrase . (rule 5)  the girl sees noun-phrase . (rule 6)  the girl sees article noun . (rule 2) the girl sees a noun . (rule 3) the girl sees a dog . (rule 4) Lecture 3 - Syntax, Spring 2017 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

Context-Free Grammar Context-Free Grammars (CFG) Noam Chomsky, 1950s. Define context-free languages. Four components: terminals, nonterminals, one start symbol, productions (left-hand side: one single nonterminal) Lecture 3 - Syntax, Spring 2017 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

What does “Context-Free” mean? Left-hand side of a production is always one single nonterminal: The nonterminal is replaced by the corresponding right-hand side, no matter where the nonterminal appears. (i.e., there is no context in such replacement/derivation.) Context-sensitive grammar (context-sensitive languages) Why context-free? English example of context: Grace invited Ada over for dinner, she said thanks. Lecture 3 - Syntax, Spring 2017 27 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, 2007

Backus-Naur Form(BNF) A meta language used to describe CFG John Backus/Peter Naur: for describing the syntax of Algol60. BNF is formal and precise Lecture 3 - Syntax, Spring 2017 28 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

BNF Example 1 E → ID | NUM | E*E | E/E | E+E | E-E | (E) ID → a | b |…|z NUM → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 Lecture 3 - Syntax, Spring 2017 29 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

BNF Example 2 S → if E then S else S | begin S L | print E L → end E → NUM = NUM NUM → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 Lecture 3 - Syntax, Spring 2017 30 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

Parse Tree Represents the derivation steps from start symbol to the string Given the derivations used in the parsing of an input sequence, a parse tree has the start symbol as the root the terminals of the input sequence as leafs for each production A → X1 X2 ... Xn used in a derivation, a node A with children X1 X2 ... Xn Lecture 3 - Syntax, Spring 2017 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

Parse Tree Example 1 CFG: expr number Input Sequence: 234 number digit expr → expr + expr | expr * expr | (expr) | number number → number digit |digit digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 expr number Input Sequence: 234 number digit number digit 4 digit 3 2 Lecture 3 - Syntax, Spring 2017 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

Parse Tree Example 2 CFG: expr number digit 3 * 4 5 + expr → expr + expr | expr * expr | (expr) | number number → number digit |digit digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 expr number digit 3 * 4 5 + Input Sequence: 3+4*5 Lecture 3 - Syntax, Spring 2017 33 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

Abstract Syntax Tree Parse Tree: still tedious, all terminals and nonterminals in a derivation are included in the tree. Abstract Syntax Tree: Remove ``unnecessary’’ terminals and nonterminals Still completely determine syntactic structure. number digit 2 3 4 4 3 2 Lecture 3 - Syntax, Spring 2017 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

AST Example 1 expr number digit 3 * 4 5 + 3 * 4 5 + Lecture 3 - Syntax, Spring 2017 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

AST Example 2 If-statement if ( expr ) statement else If-statement Lecture 3 - Syntax, Spring 2017 36 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008