Syntax Analysis Mooly Sagiv Textbook:Modern Compiler Design Chapter 2.2 (Partial)

Slides:



Advertisements
Similar presentations
Compilation (Semester A, 2013/14) Lecture 6a: Syntax (Bottom–up parsing) Noam Rinetzky 1 Slides credit: Roman Manevich, Mooly Sagiv, Eran Yahav.
Advertisements

Abstract Syntax Mooly Sagiv html:// 1.
1 JavaCUP JavaCUP (Construct Useful Parser) is a parser generator Produce a parser written in java, itself is also written in Java; There are many parser.
Recap Roman Manevich Mooly Sagiv. Outline Subjects Studied Questions & Answers.
Mooly Sagiv and Roman Manevich School of Computer Science
Recap Mooly Sagiv. Outline Subjects Studied Questions & Answers.
Bottom-Up Syntax Analysis Mooly Sagiv Textbook:Modern Compiler Design Chapter (modified)
Bottom-Up Syntax Analysis Mooly Sagiv html:// Textbook:Modern Compiler Design Chapter
Context-Free Grammars Lecture 7
Syntax Analysis Mooly Sagiv html:// Textbook:Modern Compiler Implementation in C Chapter 3.
Compiler Construction Parsing Rina Zviel-Girshin and Ohad Shacham School of Computer Science Tel-Aviv University.
ISBN Chapter 4 Lexical and Syntax Analysis The Parsing Problem Recursive-Descent Parsing.
Syntax Analysis Mooly Sagiv html:// Textbook:Modern Compiler Design Chapter 2.2 (Partial) Hashlama 11:00-14:00.
Bottom-Up Syntax Analysis Mooly Sagiv & Greta Yorsh Textbook:Modern Compiler Design Chapter (modified)
1 Foundations of Software Design Lecture 23: Finite Automata and Context-Free Grammars Marti Hearst Fall 2002.
Parsing — Part II (Ambiguity, Top-down parsing, Left-recursion Removal)
Yu-Chen Kuo1 Chapter 2 A Simple One-Pass Compiler.
Bottom-Up Syntax Analysis Mooly Sagiv & Greta Yorsh Textbook:Modern Compiler Design Chapter (modified)
Bottom-Up Syntax Analysis Mooly Sagiv html:// Textbook:Modern Compiler Implementation in C Chapter 3.
Compiler Construction Parsing II Ran Shaham and Ohad Shacham School of Computer Science Tel-Aviv University.
Compiler Construction Parsing I Ran Shaham and Ohad Shacham School of Computer Science Tel-Aviv University.
(2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies.
1 Introduction to Parsing Lecture 5. 2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations.
1 Abstract Syntax Tree--motivation The parse tree –contains too much detail e.g. unnecessary terminals such as parentheses –depends heavily on the structure.
CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars.
BİL 744 Derleyici Gerçekleştirimi (Compiler Design)1 Syntax Analyzer Syntax Analyzer creates the syntactic structure of the given source program. This.
Automated Parser Generation (via CUP)CUP 1. High-level structure JFlexjavac Lexer spec Lexical analyzer text tokens.java CUPjavac Parser spec.javaParser.
PART I: overview material
1 Syntax In Text: Chapter 3. 2 Chapter 3: Syntax and Semantics Outline Syntax: Recognizer vs. generator BNF EBNF.
Syntax Analysis Mooly Sagiv Textbook:Modern Compiler Design Chapter 2.2 (Partial) 1.
11 Chapter 4 Grammars and Parsing Grammar Grammars, or more precisely, context-free grammars, are the formalism for describing the structure of.
Bernd Fischer RW713: Compiler and Software Language Engineering.
CPS 506 Comparative Programming Languages Syntax Specification.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 3: Introduction to Syntactic Analysis.
Compiler Principles Fall Compiler Principles Lecture 6: Parsing part 5 Roman Manevich Ben-Gurion University.
Top-Down Parsing CS 671 January 29, CS 671 – Spring Where Are We? Source code: if (b==0) a = “Hi”; Token Stream: if (b == 0) a = “Hi”; Abstract.
Overview of Previous Lesson(s) Over View  In our compiler model, the parser obtains a string of tokens from the lexical analyzer & verifies that the.
1 A Simple Syntax-Directed Translator CS308 Compiler Theory.
Compiler Principles Winter Compiler Principles Syntax Analysis (Parsing) – Part 3 Mayer Goldberg and Roman Manevich Ben-Gurion University.
Mooly Sagiv and Roman Manevich School of Computer Science
Compiler Principles Fall Compiler Principles Lecture 5: Parsing part 4 Roman Manevich Ben-Gurion University.
CSE 5317/4305 L3: Parsing #11 Parsing #1 Leonidas Fegaras.
LECTURE 4 Syntax. SPECIFYING SYNTAX Programming languages must be very well defined – there’s no room for ambiguity. Language designers must use formal.
1 Introduction to Parsing. 2 Outline l Regular languages revisited l Parser overview Context-free grammars (CFG ’ s) l Derivations.
Parser: CFG, BNF Backus-Naur Form is notational variant of Context Free Grammar. Invented to specify syntax of ALGOL in late 1950’s Uses ::= to indicate.
1 Topic #4: Syntactic Analysis (Parsing) CSC 338 – Compiler Design and implementation Dr. Mohamed Ben Othman ( )
Bottom Up Parsing CS 671 January 31, CS 671 – Spring Where Are We? Finished Top-Down Parsing Starting Bottom-Up Parsing Lexical Analysis.
COMP 3438 – Part II-Lecture 6 Syntax Analysis III Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ.
Bottom-up parsing. Bottom-up parsing builds a parse tree from the leaves (terminals) to the start symbol int E T * TE+ T (4) (2) (3) (5) (1) int*+ E 
Spring 16 CSCI 4430, A Milanova 1 Announcements HW1 will be out this evening Due Monday, 2/8 Submit in HW Server AND at start of class on 2/8 A review.
CMSC 330: Organization of Programming Languages Pushdown Automata Parsing.
Syntax Analysis Or Parsing. A.K.A. Syntax Analysis –Recognize sentences in a language. –Discover the structure of a document/program. –Construct (implicitly.
Chapter 3 – Describing Syntax CSCE 343. Syntax vs. Semantics Syntax: The form or structure of the expressions, statements, and program units. Semantics:
Compiler Principles Fall Compiler Principles Lecture 5: Parsing part 4 Roman Manevich Ben-Gurion University.
Chapter 3 – Describing Syntax
Programming Languages Translator
CS510 Compiler Lecture 4.
Chapter 3 Context-Free Grammar and Parsing
Textbook:Modern Compiler Design
Syntax Analysis Chapter 4.
Lecture 3: Syntax Analysis: Top-Down parsing Noam Rinetzky
Bottom-Up Syntax Analysis
4 (c) parsing.
Fall Compiler Principles Lecture 4: Parsing part 3
Top-Down Parsing CS 671 January 29, 2008.
Lecture 7: Introduction to Parsing (Syntax Analysis)
CSC 4181Compiler Construction Context-Free Grammars
Fall Compiler Principles Lecture 4: Parsing part 3
CSC 4181 Compiler Construction Context-Free Grammars
Faculty of Computer Science and Information System
Presentation transcript:

Syntax Analysis Mooly Sagiv Textbook:Modern Compiler Design Chapter 2.2 (Partial)

A motivating example Create a desk calculator Challenges –Non trivial syntax –Recursive expressions (semantics) Operator precedence

import java_cup.runtime.*; % %cup %eofval{ return sym.EOF; %eofval} NUMBER=[0-9]+ % ”+” { return new Symbol(sym.PLUS); } ”-” { return new Symbol(sym.MINUS); } ”*” { return new Symbol(sym.MULT); } ”/” { return new Symbol(sym.DIV); } ”(” { return new Symbol(sym.LPAREN); } ”)” { return new Symbol(sym.RPAREN); } {NUMBER} { return new Symbol(sym.NUMBER, new Integer(yytext())); } \n { }. { } Solution (lexical analysis) Parser gets terminals from the Lexer

terminal Integer NUMBER; terminal PLUS,MINUS,MULT,DIV; terminal LPAREN, RPAREN; terminal UMINUS; non terminal Integer expr; precedence left PLUS, MINUS; precedence left DIV, MULT; Precedence left UMINUS; % expr ::= expr:e1 PLUS expr:e2 {: RESULT = new Integer(e1.intValue() + e2.intValue()); :} | expr:e1 MINUS expr:e2 {: RESULT = new Integer(e1.intValue() - e2.intValue()); :} | expr:e1 MULT expr:e2 {: RESULT = new Integer(e1.intValue() * e2.intValue()); :} | expr:e1 DIV expr:e2 {: RESULT = new Integer(e1.intValue() / e2.intValue()); :} | MINUS expr:e1 %prec UMINUS {: RESULT = new Integer(0 - e1.intValue(); :} | LPAREN expr:e1 RPAREN {: RESULT = e1; :} | NUMBER:n {: RESULT = n; :}

Solution (syntax analysis) // input * 3 calc <input 22

Subjects The task of syntax analysis Automatic generation Error handling Context Free Grammars Ambiguous Grammars Top-Down vs. Bottom-Up parsing Bottom-up Parsing (next lesson)

Basic Compiler Phases Source program (string) Fin. Assembly lexical analysis syntax analysis semantic analysis Tokens Abstract syntax tree Front-End Back-End

input –Sequence of tokens output –Abstract Syntax Tree Report syntax errors unbalanced parenthesizes [Create “symbol-table” ] [Create pretty-printed version of the program] In some cases the tree need not be generated (one-pass compilers) Syntax Analysis (Parsing)

Report and locate the error Diagnose the error Correct the error Recover from the error in order to discover more errors –without reporting too many “strange” errors Handling Syntax Errors

Example a := a * ( b + c * d ;

The Valid Prefix Property For every prefix tokens –t 1, t 2, …, t i that the parser identifies as legal: there exists tokens t i+1, t i+2, …, t n such that t 1, t 2, …, t n is a syntactically valid program If every token is considered as single character: –For every prefix word u that the parser identifies as legal: there exists w such that –u.w is a valid program

Error Diagnosis Line number –may be far from the actual error The current token The expected tokens Parser configuration

Error Recovery Becomes less important in interactive environments Example heuristics: –Search for a semi-column and ignore the statement –Try to “replace” tokens for common errors – Refrain from reporting 3 subsequent errors Globally optimal solutions –For every input w, find a valid program w’ with a “minimal-distance” from w

Why use context free grammars for defining PL syntax? Captures program structure (hierarchy) Employ formal theory results Automatically create “efficient” parsers

Context Free Grammar (Review) What is a grammar Derivations and Parsing Trees Ambiguous grammars Resolving ambiguity

Context Free Grammars Non-terminals –Start non-terminal Terminals (tokens) Context Free Rules  Symbol Symbol … Symbol

Example Context Free Grammar 1 S  S ; S 2 S  id := E 3 S  print (L) 4 E  id 5 E  num 6 E  E + E 7 E  (S, E) 8 L  E 9 L  L, E

Derivations Show that a sentence is in the grammar (valid program) –Start with the start symbol –Repeatedly replace one of the non-terminals by a right-hand side of a production –Stop when the sentence contains terminals only Rightmost derivation Leftmost derivation

Example Derivations 1 S  S ; S 2 S  id := E 3 S  print (L) 4 E  id 5 E  num 6 E  E + E 7 E  (S, E) 8 L  E 9 L  L, E S S ; S S ; id := E id := E ; id := E id := num ; id := E id := num ; id := E + E id := num ; id := E + num id := num ; id :=num + num a56b7716

Parse Trees The trace of a derivation Every internal node is labeled by a non-terminal Each symbol is connected to the deriving non- terminal

Example Parse Tree S S ; S S ; id := E id := E ; id := E id := num ; id := E id := num ; id := E + E id := num ; id := E + num id := num ; id :=num + num ; ss s id:=Eid:=E id:=EE+E num

Ambiguous Grammars Two leftmost derivations Two rightmost derivations Two parse trees

A Grammar for Arithmetic Expressions 1 E  E + E 2 E  E * E 3 E  id 4 E  (E)

Drawbacks of Ambiguous Grammars Ambiguous semantics Parsing complexity May affect other phases

Non Ambiguous Grammar for Arithmetic Expressions Ambiguous grammar 1E  E + T 2E  T 3T  T * F 4T  F 5 F  id 6 F  (E) 1 E  E + E 2 E  E * E 3 E  id 4 E  (E)

Non Ambiguous Grammars for Arithmetic Expressions Ambiguous grammar 1E  E + T 2E  T 3T  T * F 4T  F 5 F  id 6 F  (E) 1 E  E + E 2 E  E * E 3 E  id 4 E  (E) 1E  E * T 2E  T 3T  F + T 4T  F 5 F  id 6 F  (E)

Efficient Parsers Pushdown automata Deterministic Report an error as soon as the input is not a prefix of a valid program Not usable for all context free grammars cup context free grammar tokens parser “Ambiguity errors” parse tree

Designing a parser language design context-free grammar design Cup parser (Java program)

Kinds of Parsers Top-Down (Predictive Parsing) LL –Construct parse tree in a top-down matter –Find the leftmost derivation –For every non-terminal and token predict the next production –Preorder tree traversal Bottom-Up LR –Construct parse tree in a bottom-up manner –Find the rightmost derivation in a reverse order –For every potential right hand side and token decide when a production is found –Postorder tree traversal

Top-Down Parsing 1 t 1 t 2 input

Bottom-Up Parsing t 1 t 2 t 4 t 5 t 6 t 7 t 8 input 1 2 3

Example Grammar for Predictive LL Top- Down Parsing expression  digit | ‘(‘ expression operator expression ‘)’ operator  ‘+’ | ‘*’ digit  ‘0’ | ‘1’ | ‘2’ | ‘3’ | ‘4’ | ‘5’ | ‘6’ | ‘7’ | ‘8’ | ‘9’

static int Parse_Expression(Expression **expr_p) { Expression *expr = *expr_p = new_expression() ; /* try to parse a digit */ if (Token.class == DIGIT) { expr->type=‘D’; expr->value=Token.repr –’0’; get_next_token(); return 1; } /* try parse parenthesized expression */ if (Token.class == ‘(‘) { expr->type=‘P’; get_next_token(); if (!Parse_Expression(&expr->left)) Error(“missing expression”); if (!Parse_Operator(&expr->oper)) Error(“missing operator”); if (Token.class != ‘)’) Error(“missing )”); get_next_token(); return 1; } return 0; }

Parsing Expressions Try every alternative production –For P  A 1 A 2 … A n | B 1 B 2 … B m –If A 1 succeeds Call A 2 If A 2 succeeds –Call A 3 If A 2 fails report an error –Otherwise try B 1 Recursive descent parsing Can be applied for certain grammars Generalization: LL1 parsing

int P(...) { /* try parse the alternative P  A 1 A 2... A n */ if (A 1 (...)) { if (!A 2 ()) Error(“Missing A 2 ”); if (!A 3 ()) Error(“Missing A 3 ”);.. if (!A n ()) Error(Missing A n ”); return 1; } /* try parse the alternative P  B 1 B 2... B m */ if (B 1 (...)) { if (!B 2 ()) Error(“Missing B 2 ”); if (!B 3 ()) Error(“Missing B 3 ”);.. if (!B m ()) Error(Missing B m ”); return 1; } return 0;

Predictive Parser for Arithmetic Expressions Grammar C-code? 1E  E + T 2E  T 3T  T * F 4T  F 5 F  id 6 F  (E)

Summary Context free grammars provide a natural way to define the syntax of programming languages Ambiguity may be resolved Predictive parsing is natural –Good error messages –Natural error recovery –But not expressive enough But LR bottom-up parsing is more expressible