Syntax Analysis (Chapter 4)

Course Overview
PART I: overview material
1. Introduction
2. Language processors (tombstone diagrams, bootstrapping)
3. Architecture of a compiler
PART II: inside a compiler
4. Syntax analysis
5. Contextual analysis
6. Runtime organization
7. Code generation
PART III: conclusion
8. Interpretation
9. Review

The “Phases” of a Compiler
Source Program → Syntax Analysis → Abstract Syntax Tree → Contextual Analysis → Decorated Abstract Syntax Tree → Code Generation → Object Code
The analysis phases also produce Error Reports. This chapter covers the Syntax Analysis phase.

In Chapter 4
– Syntax analysis
   – Scanning: recognize “words” or “tokens” in the input
   – Parsing: recognize the structure of the program
      – Different parsing strategies
      – How to construct a recursive descent parser
   – AST construction
– Use of theoretical “tools”:
   – Regular expressions and finite-state machines
   – Grammars
   – Extended BNF notation
   – First sets and follow sets

Syntax Analysis
The “job” of syntax analysis is to read the source program (a text file) and determine its structure. Subphases:
– Scanning
– Parsing
– Constructing an internal representation of the source text that shows its structure (usually an AST)
Note: A single-pass compiler usually does not explicitly construct an AST.

Multi-Pass Compiler
Dependency diagram of a typical multi-pass compiler: the Compiler Driver calls the Syntactic Analyzer, the Contextual Analyzer, and the Code Generator in turn.
– Syntactic Analyzer (this chapter): input Source Text, output AST
– Contextual Analyzer: input AST, output Decorated AST
– Code Generator: input Decorated AST, output Object Code
A multi-pass compiler makes several passes over the program. The output of each phase is stored in a data structure and used by the subsequent phases.

Syntax Analysis (dataflow chart)
Source Program (stream of characters) → Scanner → stream of “tokens” → Parser → Abstract Syntax Tree
Both the scanner and the parser can produce Error Reports.

(1) Scan: Divide Input into Tokens
An example Mini-Triangle source program:
    let var y: Integer
    in !new year
       y := y+1
The scanner converts it into this stream of tokens (the comment !new year produces no token):
    let | var | ident. y | colon : | ident. Integer | in | ident. y | becomes := | ident. y | op. + | intlit 1 | eot
Tokens are “words” in the input, for example keywords, operators, identifiers, literals, etc.
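The scanning step can be sketched as a hand-written scanner in Python. This is a minimal sketch under stated assumptions, not the course's scanner: the `Token` class, the `scan` function, the token kind names, and the keyword set are illustrative choices, and operators are simplified to single characters. As in the example, a `!` comment runs to the end of the line and produces no token.

```python
from dataclasses import dataclass

# Assumed Mini-Triangle keyword set; illustrative, not definitive.
KEYWORDS = {"begin", "const", "do", "else", "end", "if", "in",
            "let", "then", "var", "while"}
OPERATOR_CHARS = set("+-*/<>=\\")  # simplification: single-character operators


@dataclass
class Token:
    kind: str      # "let", "var", ..., "ident", "intlit", "op", "colon", "becomes", "eot"
    spelling: str  # the characters that make up the token


def scan(source: str) -> list[Token]:
    """Split Mini-Triangle source text into tokens, skipping blanks and ! comments."""
    tokens: list[Token] = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch.isspace():                      # blanks separate tokens
            i += 1
        elif ch == "!":                       # comment: skip to end of line
            while i < n and source[i] != "\n":
                i += 1
        elif ch.isalpha():                    # identifier or keyword
            j = i
            while j < n and source[j].isalnum():
                j += 1
            word = source[i:j]
            tokens.append(Token(word if word in KEYWORDS else "ident", word))
            i = j
        elif ch.isdigit():                    # integer literal
            j = i
            while j < n and source[j].isdigit():
                j += 1
            tokens.append(Token("intlit", source[i:j]))
            i = j
        elif source.startswith(":=", i):      # must be checked before ":"
            tokens.append(Token("becomes", ":="))
            i += 2
        elif ch in ":;()":
            kinds = {":": "colon", ";": "semicolon", "(": "lparen", ")": "rparen"}
            tokens.append(Token(kinds[ch], ch))
            i += 1
        elif ch in OPERATOR_CHARS:
            tokens.append(Token("op", ch))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r} at offset {i}")
    tokens.append(Token("eot", ""))           # explicit end-of-text token
    return tokens


program = "let var y: Integer\nin !new year\n   y := y+1\n"
print([(t.kind, t.spelling) for t in scan(program)])
```

Scanning the example program yields the same token kinds as listed above, ending with `eot`. Checking for `:=` before `:` is what keeps `becomes` from being split into a colon followed by an `=` operator.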

(2) Parse: Determine the Structure of the Program
The parser analyzes the structure of the token stream with respect to the grammar of the language.
[Figure: parse tree over the tokens let var id. y col. : id. Integer in id. y bec. := id. y op. + intlit 1 eot, with nodes Program, single-Command, Declaration, single-Declaration, Type Denoter, V-Name, Expression, primary-Exp, Ident, Op., and Int.Lit]

(3) AST Construction
[Figure: AST for the example program, with node types Program, LetCommand, VarDecl, SimpleType, AssignCommand, SimpleVar, BinaryExpr, VNameExp, and Int.Expr, and leaves Ident y, Ident Integer, Op +, and Int.Lit 1]
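One way to picture the AST is as nested Python dataclasses, one class per node type named in the figure. The nesting below is a plausible reading of that figure, and the field names (`decl`, `body`, `target`, `value`, ...) are hypothetical; a full compiler would use more general node types (any Declaration, any Command), which this sketch simplifies away.

```python
from dataclasses import dataclass


@dataclass
class Ident:
    spelling: str


@dataclass
class Op:
    spelling: str


@dataclass
class IntLit:
    spelling: str


@dataclass
class SimpleType:        # type denoter naming a type, e.g. Integer
    name: Ident


@dataclass
class VarDecl:           # var <name> : <type>
    name: Ident
    type_denoter: SimpleType


@dataclass
class SimpleVar:         # V-name consisting of a single identifier
    name: Ident


@dataclass
class VNameExp:          # expression that reads the value of a V-name
    vname: SimpleVar


@dataclass
class IntExpr:           # expression consisting of an integer literal
    literal: IntLit


@dataclass
class BinaryExpr:        # <left> <op> <right>
    left: object
    op: Op
    right: object


@dataclass
class AssignCommand:     # <target> := <value>
    target: SimpleVar
    value: object


@dataclass
class LetCommand:        # let <decl> in <body>
    decl: VarDecl
    body: AssignCommand


@dataclass
class Program:
    command: LetCommand


# AST for:  let var y: Integer in y := y+1
ast = Program(
    LetCommand(
        VarDecl(Ident("y"), SimpleType(Ident("Integer"))),
        AssignCommand(
            SimpleVar(Ident("y")),
            BinaryExpr(VNameExp(SimpleVar(Ident("y"))), Op("+"), IntExpr(IntLit("1"))),
        ),
    )
)
print(ast)
```

Unlike the parse tree, the AST keeps no nodes for punctuation such as `:` or `:=`, or for grammar-only categories such as primary-Exp.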

Grammars
RECAP:
– The syntax of a language can be specified by means of a CFG (context-free grammar).
– A CFG can be expressed in BNF (Backus-Naur Form).
Example: Mini-Triangle grammar in BNF
    Program        ::= single-Command
    Command        ::= single-Command
                     | Command ; single-Command
    single-Command ::= V-name := Expression
                     | begin Command end
                     | ...

Grammars (continued)
For our convenience, we will use EBNF (“Extended BNF”) rather than plain BNF.
EBNF = BNF + regular expressions
Example: Mini-Triangle in EBNF
    Program        ::= single-Command
    Command        ::= (single-Command ;)* single-Command
    single-Command ::= V-name := Expression
                     | begin Command end
                     | ...
Here * means “0 or more occurrences of”.

Regular Expressions
REs are a notation for expressing a set of strings of terminal symbols. Different kinds of RE:
    ε       generates only the empty string
    t       generates only the string t
    X Y     generates any string xy such that x is generated by X and y is generated by Y
    X | Y   generates any string generated either by X or by Y
    X*      generates the concatenation of zero or more strings generated by X
    (X)     used for grouping

RE: Examples
What sets of strings does each of the following REs generate?
1. ε
2. M(r|s)“.”
3. (foo|bar)*
4. (foo|bar)(foo|bar)*
5. (0|1|2|3|4|5|6|7|8|9)*
6. 0|(1|..|9)(0|1|..|9)*
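The answers can be checked mechanically with Python's `re` module, assuming its notation: `|`, `*` and parentheses mean the same as above, while `.` must be escaped to stand for a literal dot. `re.fullmatch` is used so that the whole string, not just a prefix, must be generated. The `EXAMPLES` table and `generates` helper are illustrative names.

```python
import re

# Python spellings of the six REs above.
EXAMPLES = {
    1: r"",                                             # ε
    2: r"M(r|s)\.",                                     # "." escaped: literal dot
    3: r"(foo|bar)*",
    4: r"(foo|bar)(foo|bar)*",
    5: r"(0|1|2|3|4|5|6|7|8|9)*",
    6: r"0|(1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)*",
}


def generates(n: int, s: str) -> bool:
    """True if the whole string s is in the language of RE number n."""
    return re.fullmatch(EXAMPLES[n], s) is not None


for s in ["", "0", "007", "120"]:
    print(repr(s), generates(5, s), generates(6, s))
```

Briefly: (1) only the empty string; (2) `Mr.` and `Ms.`; (3) any sequence of `foo`s and `bar`s, including the empty one; (4) the same but non-empty; (5) any string of digits, including the empty string; (6) decimal numerals without leading zeros.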

Regular Expressions (continued)
The “languages” that can be defined by REs and CFGs have been extensively studied by theoretical computer scientists. Some important conclusions and terminology:
– RE is a “weaker” formalism than CFG: any language expressible by an RE can be expressed by a CFG, but not the other way around!
– The languages expressible as REs are called regular languages.
– Generally, a language that exhibits “self-embedding” cannot be expressed by an RE.
– Programming languages exhibit self-embedding. (Examples: an expression can contain another expression, and a command can contain another command.)

Extended BNF
Extended BNF combines BNF with REs. A production in EBNF looks like
    LHS ::= RHS
where LHS is a non-terminal symbol and RHS is an extended regular expression. An extended RE is just like a regular expression except that it is composed of both the terminals and the non-terminals of the grammar.
Simply put, EBNF adds these notations to BNF:
– (...) for grouping
– * for denoting “0 or more repetitions of …”

Extended BNF: an Example
Example: a simple expression language
    Expression ::= PrimaryExp (Operator PrimaryExp)*
    PrimaryExp ::= Literal | Identifier | ( Expression )
    Identifier ::= Letter (Letter|Digit)*
    Literal    ::= Digit Digit*
    Letter     ::= a | b | c | ... | z
    Digit      ::= 0 | 1 | 2 | 3 | 4 | ... | 9
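A recursive-descent parser follows directly from this EBNF: one method per non-terminal, a loop for `*`, and a one-character look-ahead to choose among `|` alternatives. The Python sketch below makes two assumptions the slide does not: `Operator ::= + | - | * | /` (Operator is left undefined above), and the input contains no blanks. The class and method names are illustrative.

```python
LETTERS = "abcdefghijklmnopqrstuvwxyz"
DIGITS = "0123456789"
OPERATORS = "+-*/"   # assumption: the grammar above leaves Operator undefined


class Parser:
    """Recursive-descent parser for the EBNF expression language.

    Each character of the input is one terminal (no blanks allowed).
    Trees are tuples: ("id", name), ("lit", digits) or (op, left, right).
    """

    def __init__(self, text: str):
        self.text = text
        self.pos = 0

    def current(self) -> str:
        """The next unread character, or "" at end of input."""
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def at(self, chars: str) -> bool:
        ch = self.current()
        return ch != "" and ch in chars    # guard: "" in chars is always True

    def expect(self, ch: str) -> None:
        if self.current() != ch:
            raise SyntaxError(f"expected {ch!r} at position {self.pos}")
        self.pos += 1

    def take_while(self, chars: str) -> str:
        start = self.pos
        while self.at(chars):
            self.pos += 1
        return self.text[start:self.pos]

    # Expression ::= PrimaryExp (Operator PrimaryExp)*
    def parse_expression(self):
        tree = self.parse_primary()
        while self.at(OPERATORS):          # the EBNF * becomes a loop
            op = self.current()
            self.pos += 1
            tree = (op, tree, self.parse_primary())   # groups left to right
        return tree

    # PrimaryExp ::= Literal | Identifier | ( Expression )
    def parse_primary(self):
        if self.at(DIGITS):                # Literal ::= Digit Digit*
            return ("lit", self.take_while(DIGITS))
        if self.at(LETTERS):               # Identifier ::= Letter (Letter|Digit)*
            return ("id", self.take_while(LETTERS + DIGITS))
        if self.at("("):
            self.expect("(")
            tree = self.parse_expression()
            self.expect(")")
            return tree
        raise SyntaxError(f"unexpected {self.current()!r} at position {self.pos}")


def parse(text: str):
    parser = Parser(text)
    tree = parser.parse_expression()
    if parser.pos != len(text):
        raise SyntaxError(f"unexpected {parser.current()!r} at position {parser.pos}")
    return tree


print(parse("a+(b1*42)"))
```

Because the grammar gives all operators equal precedence, `1+2*3` is grouped as `(1+2)*3`; precedence would have to be encoded in the grammar itself.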

A Little Bit of Useful Theory
We will now look at a few useful bits of theory. These will be necessary later when we implement parsers.
– Grammar transformations: a grammar can be transformed in a number of ways without changing its meaning (i.e., its language, the set of strings that it generates).
– The definition and computation of starter sets (first sets), follow sets, and nullable symbols.

Grammar Transformations
Left factorization
    X Y | X Z   can be rewritten as   X ( Y | Z )
Example:
    single-Command ::= V-name := Expression
                     | if Expression then single-Command
                     | if Expression then single-Command else single-Command
becomes
    single-Command ::= V-name := Expression
                     | if Expression then single-Command ( ε | else single-Command )
Here X = if Expression then single-Command, Y = ε, and Z = else single-Command.
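The benefit of left factoring shows up in a parser: after the shared prefix `if Expression then single-Command`, a single token of look-ahead decides between the ε branch and the `else` branch. A minimal Python sketch follows; to keep it short it assumes (hypothetically) that V-name and Expression are each a single identifier token, and the helper names `peek`, `expect` and `parse_name` are illustrative.

```python
RESERVED = {"if", "then", "else"}


def peek(tokens: list[str], pos: int) -> str:
    return tokens[pos] if pos < len(tokens) else "<eot>"


def expect(tokens: list[str], pos: int, spelling: str) -> int:
    if peek(tokens, pos) != spelling:
        raise SyntaxError(f"expected {spelling!r} at token {pos}, found {peek(tokens, pos)!r}")
    return pos + 1


def parse_name(tokens: list[str], pos: int) -> tuple[str, int]:
    """Hypothetical stand-in for V-name and Expression: a single identifier."""
    tok = peek(tokens, pos)
    if not tok.isidentifier() or tok in RESERVED:
        raise SyntaxError(f"expected a name at token {pos}, found {tok!r}")
    return tok, pos + 1


def parse_single_command(tokens: list[str], pos: int = 0) -> tuple[tuple, int]:
    # single-Command ::= V-name := Expression
    #                  | if Expression then single-Command ( ε | else single-Command )
    if peek(tokens, pos) == "if":
        cond, pos = parse_name(tokens, pos + 1)
        pos = expect(tokens, pos, "then")
        then_part, pos = parse_single_command(tokens, pos)
        # After the shared prefix, one look-ahead token picks the branch:
        # "else" selects the else branch, anything else selects the ε branch.
        if peek(tokens, pos) == "else":
            else_part, pos = parse_single_command(tokens, pos + 1)
            return ("if", cond, then_part, else_part), pos
        return ("if", cond, then_part, None), pos
    target, pos = parse_name(tokens, pos)
    pos = expect(tokens, pos, ":=")
    value, pos = parse_name(tokens, pos)
    return ("assign", target, value), pos


print(parse_single_command("if a then if b then x := y else x := z".split()))
```

In the nested example, `else` attaches to the inner `if`, because the inner call sees `else` first and consumes it. That is the usual resolution of the dangling-else problem.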

Grammar Transformations (continued)
Elimination of left recursion
    N ::= X | N Y   can be rewritten as   N ::= X Y*
Example:
    Identifier ::= Letter
                 | Identifier Letter
                 | Identifier Digit
Left-factoring the two recursive alternatives gives
    Identifier ::= Letter
                 | Identifier (Letter|Digit)
and eliminating the left recursion (with X = Letter and Y = (Letter|Digit)) gives
    Identifier ::= Letter (Letter|Digit)*
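The transformation from N ::= X | N Y to N ::= X Y* is mechanical enough to code. This Python sketch handles only immediate left recursion and represents each alternative as a list of symbol names; the function names and the textual EBNF output format are assumptions for illustration.

```python
def _alt_text(symbols: list[str]) -> str:
    return " ".join(symbols) if symbols else "ε"


def _choice(alternatives: list[list[str]], starred: bool) -> str:
    """Render alternatives as one EBNF term, adding parentheses where needed."""
    texts = [_alt_text(a) for a in alternatives]
    if len(texts) == 1 and (not starred or len(alternatives[0]) <= 1):
        return texts[0]
    return "(" + " | ".join(texts) + ")"


def eliminate_left_recursion(n: str, alternatives: list[list[str]]) -> str:
    """Rewrite N ::= X1 | ... | Xm | N Y1 | ... | N Yk
    as      N ::= (X1 | ... | Xm) (Y1 | ... | Yk)*
    Only immediate left recursion (N as the first symbol) is handled."""
    xs = [alt for alt in alternatives if not alt or alt[0] != n]
    ys = [alt[1:] for alt in alternatives if alt and alt[0] == n]
    if not ys:                                   # no left recursion: unchanged
        return f"{n} ::= " + " | ".join(_alt_text(a) for a in xs)
    if not xs:
        raise ValueError(f"{n} has no alternative that does not start with {n}")
    return f"{n} ::= {_choice(xs, starred=False)} {_choice(ys, starred=True)}*"


print(eliminate_left_recursion(
    "Identifier", [["Letter"], ["Identifier", "Letter"], ["Identifier", "Digit"]]))
print(eliminate_left_recursion(
    "Command", [["single-Command"], ["Command", ";", "single-Command"]]))
```

Applied to `Command ::= single-Command | Command ; single-Command`, it gives `Command ::= single-Command (; single-Command)*`, which generates the same strings as the EBNF rule `(single-Command ;)* single-Command` used earlier.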

Grammar Transformations (continued)
Substitution of non-terminal symbols
Given N ::= X, a production M ::= α N β can be rewritten as M ::= α X β.
Example:
    single-Command ::= for controlVar := Expression direction Expression do single-Command
    direction      ::= to | downto
becomes
    single-Command ::= for controlVar := Expression (to|downto) Expression do single-Command

Starter Sets (a.k.a. First Sets)
Informal definition: the starter set of an RE X is the set of terminal symbols that can occur as the start of any string generated by X.
Example: starters[("+" | - | ε)(0 | 1 | … | 9)+] = {+, -, 0, 1, …, 9}
Formal definition:
    starters[ε]     = { }
    starters[t]     = {t}                          (where t is any terminal symbol)
    starters[X Y]   = starters[X]                  (if X doesn't generate ε)
    starters[X Y]   = starters[X] ∪ starters[Y]    (if X generates ε)
    starters[X | Y] = starters[X] ∪ starters[Y]
    starters[X*]    = starters[X]
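The formal definition translates case by case into a recursive function over a small RE syntax tree. In this Python sketch the node classes `Eps`, `T`, `Seq`, `Alt` and `Star` are hypothetical names for ε, terminals, concatenation, alternation and repetition; `generates_empty` decides the "X generates ε" side conditions. The example's one-or-more digits are written as `digit digit*`.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Eps:            # ε
    pass


@dataclass(frozen=True)
class T:              # a terminal symbol t
    symbol: str


@dataclass(frozen=True)
class Seq:            # X Y
    left: object
    right: object


@dataclass(frozen=True)
class Alt:            # X | Y
    left: object
    right: object


@dataclass(frozen=True)
class Star:           # X*
    body: object


def generates_empty(x) -> bool:
    """Does RE x generate the empty string ε?"""
    if isinstance(x, (Eps, Star)):
        return True
    if isinstance(x, T):
        return False
    if isinstance(x, Seq):
        return generates_empty(x.left) and generates_empty(x.right)
    if isinstance(x, Alt):
        return generates_empty(x.left) or generates_empty(x.right)
    raise TypeError(f"not a regular expression: {x!r}")


def starters(x) -> set[str]:
    """Starter set of RE x, following the formal definition case by case."""
    if isinstance(x, Eps):
        return set()
    if isinstance(x, T):
        return {x.symbol}
    if isinstance(x, Seq):
        if generates_empty(x.left):
            return starters(x.left) | starters(x.right)
        return starters(x.left)
    if isinstance(x, Alt):
        return starters(x.left) | starters(x.right)
    if isinstance(x, Star):
        return starters(x.body)
    raise TypeError(f"not a regular expression: {x!r}")


digit = reduce(Alt, [T(d) for d in "0123456789"])   # (0 | 1 | … | 9)
sign = Alt(Alt(T("+"), T("-")), Eps())              # ("+" | - | ε)
number = Seq(sign, Seq(digit, Star(digit)))         # one or more digits as digit digit*
print(sorted(starters(number)))
```

Because `sign` generates ε, the Seq rule adds the digits' starters to `{+, -}`, which reproduces the example's result.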

Derivations
A derivation step replaces a non-terminal by the right-hand side of one of its productions. Example grammar:
    S ::= E
    E ::= T | E + T
    T ::= i | ( E )
Derivation of i + i:
    S => E => E + T => T + T => i + T => i + i
This is a left-most derivation (it replaces the left-most non-terminal at each step). Can you find the corresponding right-most derivation? Can you find a derivation that is neither left-most nor right-most?
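Left-most and right-most derivations differ only in which non-terminal is replaced at each step, which a short Python sketch makes concrete. The grammar is the one above; `derive` and the encoding of each step as an alternative index are illustrative assumptions.

```python
GRAMMAR = {
    "S": [["E"]],
    "E": [["T"], ["E", "+", "T"]],
    "T": [["i"], ["(", "E", ")"]],
}


def derive(choices: list[int], leftmost: bool = True, start: str = "S") -> str:
    """Apply productions step by step and return the derivation as text.

    choices[k] is the index of the alternative used at step k; it is applied
    to the left-most (or, if leftmost=False, the right-most) non-terminal.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        positions = [k for k, sym in enumerate(form) if sym in GRAMMAR]
        if not positions:
            raise ValueError("no non-terminal left to replace")
        k = positions[0] if leftmost else positions[-1]
        form = form[:k] + GRAMMAR[form[k]][choice] + form[k + 1:]
        steps.append(" ".join(form))
    return " => ".join(steps)


print(derive([0, 1, 0, 0, 0]))                   # left-most derivation of i + i
print(derive([0, 1, 0, 0, 0], leftmost=False))   # right-most derivation of i + i
```

With the same production choices, the first call reproduces the left-most derivation above, and the second gives a right-most one: S => E => E + T => E + i => T + i => i + i.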

Sentential Forms
A sentential form is a sequence of grammar symbols that can be derived from the start symbol. A sentence is a sentential form that contains only terminal symbols, that is, a string that can be generated using the grammar.
In the derivation S => E => E + T => T + T => i + T => i + i, every step is a sentential form; only the last one, i + i, is a sentence.

Ambiguous Grammars
A grammar is ambiguous if some sentence has more than one distinct parse tree. Equivalently, a grammar is ambiguous if some sentence has more than one left-most derivation, or more than one right-most derivation.
    S ::= E
    E ::= i | ( E ) | E + E
Does i + i demonstrate the ambiguity?
    E => E + E => i + E => i + i
No: this is its only left-most derivation.
Does i + i + i demonstrate an ambiguity?
    E => E + E => i + E => i + E + E => i + i + E => i + i + i
    E => E + E => E + E + E => i + E + E => i + i + E => i + i + i
Yes: these two left-most derivations are distinct. The first groups the sentence as i + (i + i), the second as (i + i) + i.
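Ambiguity can also be checked by counting parse trees. The Python sketch below counts them for E ::= i | ( E ) | E + E by trying every way a token sequence can match each alternative; `count_trees` and `trees` are illustrative names, and the tokens are assumed to be separated by blanks.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_trees(tokens: tuple[str, ...]) -> int:
    """Number of distinct parse trees for E ::= i | ( E ) | E + E."""
    total = 0
    if tokens == ("i",):                                 # E ::= i
        total += 1
    if len(tokens) >= 2 and tokens[0] == "(" and tokens[-1] == ")":
        total += count_trees(tokens[1:-1])               # E ::= ( E )
    for k, tok in enumerate(tokens):                     # E ::= E + E, split at each +
        if tok == "+":
            total += count_trees(tokens[:k]) * count_trees(tokens[k + 1:])
    return total


def trees(sentence: str) -> int:
    return count_trees(tuple(sentence.split()))


for s in ["i + i", "i + i + i", "( i + i ) + i", "i + i + i + i"]:
    print(s, "->", trees(s))
```

It reports 1 tree for i + i and 2 for i + i + i, confirming the ambiguity; the parentheses in ( i + i ) + i bring the count back to 1. Counts grow quickly with longer sums (5 for i + i + i + i).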

Augmented Grammars
We augment grammars to ensure that we can recognize and handle the end of the input string:
    S ::= E
    E ::= i | ( E ) | E + E
becomes
    S' ::= S $
    S  ::= E
    E  ::= i | ( E ) | E + E
Here $ denotes the end-of-file token.

Nullable, First Sets (Starter Sets), and Follow Sets
– A non-terminal is nullable if it derives the empty string.
– First(N), or starters(N), is the set of all terminals that can begin a sentence derived from N.
– Follow(N) is the set of terminals that can follow N in some sentential form.
Next we will see algorithms to compute each of these.
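The three definitions lend themselves to one fixed-point computation: start with empty sets and keep applying the rules until nothing changes. This Python sketch is one common formulation, not necessarily the algorithm the following slides present. The grammar format (a dict from non-terminal to alternatives, with `[]` for an empty alternative) is an assumption, and it is run on the augmented form of the derivation grammar S ::= E, E ::= T | E + T, T ::= i | ( E ), so that `$` seeds Follow(S).

```python
def analyze(grammar: dict[str, list[list[str]]]):
    """Compute nullable non-terminals, First sets and Follow sets by iterating
    to a fixed point. Symbols that are not keys of `grammar` are terminals."""
    nonterminals = set(grammar)
    nullable: set[str] = set()
    first = {n: set() for n in nonterminals}
    follow = {n: set() for n in nonterminals}

    def first_of(symbols: list[str]) -> tuple[set[str], bool]:
        """Starters of a symbol sequence, and whether the whole sequence is nullable."""
        result: set[str] = set()
        for sym in symbols:
            if sym not in nonterminals:        # a terminal ends the scan
                result.add(sym)
                return result, False
            result |= first[sym]
            if sym not in nullable:
                return result, False
        return result, True

    changed = True
    while changed:                             # repeat until no set grows
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                starts, rhs_nullable = first_of(rhs)
                if rhs_nullable and lhs not in nullable:
                    nullable.add(lhs)
                    changed = True
                if not starts <= first[lhs]:
                    first[lhs] |= starts
                    changed = True
                for k, sym in enumerate(rhs):  # what can follow each non-terminal in rhs
                    if sym in nonterminals:
                        rest, rest_nullable = first_of(rhs[k + 1:])
                        if rest_nullable:
                            rest = rest | follow[lhs]
                        if not rest <= follow[sym]:
                            follow[sym] |= rest
                            changed = True
    return nullable, first, follow


augmented = {
    "S'": [["S", "$"]],
    "S": [["E"]],
    "E": [["T"], ["E", "+", "T"]],
    "T": [["i"], ["(", "E", ")"]],
}
nullable, first, follow = analyze(augmented)
print(sorted(first["E"]), sorted(follow["E"]), sorted(follow["T"]))
```

For this grammar no non-terminal is nullable, First(E) = {i, (}, and Follow(E) = Follow(T) = {$, +, )}.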