Comp 311 Principles of Programming Languages Lecture 2 Syntax Corky Cartwright August 26, 2009

Syntax: The Boring Part of Programming Languages

Programs are represented by sequences of symbols. These symbols are represented as sequences of characters that can be typed on a keyboard (ASCII). What about Unicode? To analyze or execute the programs written in a language, we must translate the ASCII representation of a program to a higher-level tree representation. This process, called parsing, conveniently breaks into two parts:
– lexical analysis, and
– context-free parsing (often simply called parsing).

Lexical Analysis

Consider this sequence of characters: begin middle end. What are the smallest meaningful pieces of syntax in this phrase? The process of converting a character stream into a corresponding sequence of meaningful symbols (called tokens or lexemes) is called tokenizing, lexing, or lexical analysis. A program that performs this process is called a tokenizer, lexer, or scanner.

In Scheme, we tokenize (set! x (+ x 1)) as

    ( set! x ( + x 1 ) )

Similarly, in Java, we tokenize System.out.println("Hello World!"); as

    System . out . println ( "Hello World!" ) ;

Lexical Analysis, cont.

Tokenizing is straightforward for most languages because it can be performed by a finite automaton [regular grammar] (Fortran is an exception!).
– The rules governing this process are a (very boring) part of the language definition.
Parsing a stream of tokens into a structural description of a program (typically a tree) is harder.
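As a concrete illustration of tokenizing with regular expressions (one way of realizing such a finite automaton), here is a minimal sketch in Java. It is an assumption-laden illustration, not course code: it only splits the input into parentheses and whitespace-delimited words, whereas a real lexer would also classify each token (number, identifier, keyword, ...).

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Minimal regex-based tokenizer sketch for s-expression-like input.
    public class SimpleLexer {
        // A token is a parenthesis or a maximal run of non-space, non-paren characters.
        private static final Pattern TOKEN = Pattern.compile("\\s*(\\(|\\)|[^\\s()]+)");

        public static List<String> tokenize(String input) {
            List<String> tokens = new ArrayList<>();
            Matcher m = TOKEN.matcher(input);
            int pos = 0;
            int end = input.length();
            while (pos < end) {
                m.region(pos, end);
                if (!m.lookingAt()) break;   // only trailing whitespace remains
                tokens.add(m.group(1));      // keep the token, drop the leading whitespace
                pos = m.end();
            }
            return tokens;
        }

        public static void main(String[] args) {
            System.out.println(tokenize("(set! x (+ x 1))"));
            // prints: [(, set!, x, (, +, x, 1, ), )]
        }
    }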

Parsing

Consider the Java statement:  x = x + 1;  where x is an int variable. The grammar for Java stipulates (among other things):
– The assignment operator = may be preceded by an identifier and must be followed by an expression.
– An expression may be two expressions separated by a binary operator, such as +.
– An assignment expression can serve as a statement if it is followed by the terminator symbol ;.
Given all of the rules of this grammar, we can deduce that the sequence of characters (tokens)  x = x + 1;  is a legal program statement.
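To make that deduction concrete, here is one simplified grammar fragment in the same BNF notation used later in this lecture (an illustrative sketch, not the actual Java grammar):

    Stmt ::= Exp ;
    Exp  ::= Var = Exp | Exp + Exp | Var | Num

Under these rules, x = x + 1; parses as a Stmt whose Exp is an assignment Var = Exp, where the right-hand Exp is in turn the sum Exp + Exp built from the Var x and the Num 1.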

Parsing Token Streams into Trees

Consider the following ways to express an assignment operation:

    x = x + 1
    x := x + 1
    (set! x (+ x 1))

Which of these do you prefer? It should not matter very much. To eliminate the irrelevant syntactic details, we can create a data representation that formulates program syntax as trees. For instance, the abstract syntax for the assignment code given above could be

    (make-assignment <rep. of x> <rep. of x + 1>)

or

    new Assignment(<rep. of x>, <rep. of x + 1>)

where the angle brackets stand for the subtrees representing the variable and the expression.
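A minimal standalone Java sketch of such an abstract-syntax node (the class and field names here are assumptions for illustration, not course-supplied code):

    // Sketch of an assignment node in abstract syntax: it records the variable
    // being assigned and the expression whose value it receives.
    abstract class Exp { }

    class Var extends Exp {
        final String name;
        Var(String name) { this.name = name; }
    }

    class Assignment extends Exp {
        final Var lhs;   // the variable being assigned, e.g. x
        final Exp rhs;   // the right-hand-side expression, e.g. the tree for x + 1
        Assignment(Var lhs, Exp rhs) { this.lhs = lhs; this.rhs = rhs; }
    }

With such classes, all three concrete notations above map to the same Assignment tree.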

A Simple Example

    Exp ::= Num | Var | (Exp Exp) | (lambda Var Exp)

Num is the set of numeric constants (given in the lexer specification). Var is the set of variable names (given in the lexer specification). To represent this syntax as trees (abstract syntax) in Scheme:

    ; exp := (make-num number) + (make-var symbol) + (make-app exp exp) +
    ;        (make-proc symbol exp)
    (define-struct num (n))
    (define-struct var (s))
    (define-struct app (rator rand))
    (define-struct proc (param body))   ;; param is a symbol, not a var

– app represents a function application
– proc represents a function definition

In Java, we represent the same data definition using the composite pattern, as sketched below.
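A minimal sketch of that composite-pattern encoding in Java; the class and field names simply mirror the Scheme structs above, and the concrete types chosen for the fields are assumptions for illustration:

    // Composite pattern: one abstract class for the syntactic category Exp,
    // one concrete subclass per grammar alternative.
    abstract class Exp { }

    class Num extends Exp {
        final double n;            // numeric constant
        Num(double n) { this.n = n; }
    }

    class Var extends Exp {
        final String s;            // variable name
        Var(String s) { this.s = s; }
    }

    class App extends Exp {
        final Exp rator, rand;     // function application: (rator rand)
        App(Exp rator, Exp rand) { this.rator = rator; this.rand = rand; }
    }

    class Proc extends Exp {
        final String param;        // the parameter is a name (symbol), not a Var node
        final Exp body;            // function definition: (lambda param body)
        Proc(String param, Exp body) { this.param = param; this.body = body; }
    }

For example, the concrete program (lambda x (f x)) corresponds to the tree new Proc("x", new App(new Var("f"), new Var("x"))).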

Top Down (Predictive) Parsing

Idea: design the grammar so that, starting from the root of the parse tree, we can always tell which rule to use next by looking ahead some small number k of tokens (formalized as LL(k) parsing). Such a parser can easily be implemented by hand by writing one recursive procedure for each syntactic category (non-terminal symbol); the code in each procedure matches the token pattern of the right-hand side of the rule for that category against the token stream. This approach to writing a parser is called recursive descent. A helpful conceptual aid is the syntax diagram, a pictorial notation for context-free grammars. Recursive descent and syntax diagrams are discussed in the next lecture.
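As a rough preview, here is a hedged sketch of what recursive descent looks like for the small Exp grammar above. It reuses the Exp/Num/Var/App/Proc classes from the composite-pattern sketch and a token list such as the one produced by the lexer sketch; it also glosses over details a real parser must handle (error reporting, treating lambda as a reserved word, checking that a parameter really is a variable token).

    import java.util.Iterator;
    import java.util.List;

    // Recursive-descent sketch for  Exp ::= Num | Var | (Exp Exp) | (lambda Var Exp).
    // One procedure (parseExp) corresponds to the one nonterminal Exp; it decides
    // which rule to apply by inspecting the next unconsumed token.
    class SimpleParser {
        private final Iterator<String> tokens;
        private String lookahead;             // next unconsumed token, or null at end of input

        SimpleParser(List<String> tokenList) {
            this.tokens = tokenList.iterator();
            advance();
        }

        private void advance() {
            lookahead = tokens.hasNext() ? tokens.next() : null;
        }

        private void expect(String tok) {
            if (!tok.equals(lookahead))
                throw new RuntimeException("expected " + tok + " but saw " + lookahead);
            advance();
        }

        Exp parseExp() {
            if ("(".equals(lookahead)) {            // either an application or a lambda
                advance();
                Exp result;
                if ("lambda".equals(lookahead)) {   // (lambda Var Exp)
                    advance();
                    String param = lookahead;       // parameter name
                    advance();
                    result = new Proc(param, parseExp());
                } else {                            // (Exp Exp)
                    Exp rator = parseExp();
                    Exp rand = parseExp();
                    result = new App(rator, rand);
                }
                expect(")");
                return result;
            } else if (lookahead != null && lookahead.matches("\\d+")) {   // Num
                Exp n = new Num(Double.parseDouble(lookahead));
                advance();
                return n;
            } else {                                // Var
                Exp v = new Var(lookahead);
                advance();
                return v;
            }
        }
    }

For instance, new SimpleParser(SimpleLexer.tokenize("(lambda x (f x))")).parseExp() would build the tree new Proc("x", new App(new Var("f"), new Var("x"))).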