Functional Design and Programming Lecture 9: Lexical analysis and parsing.

Literature
• Paulson, chap. 9: Lexical analysis (9.1), Functional parsing ( )

Exercises
• Paulson, chap. 9: , 9.8
• Write a parser for XML elements (see home page).

Parsing/Unparsing
• Purpose: Decoding flat (string) representations into structured data (parsing) and encoding structured data as flat strings (unparsing).
• Reasons:
  - Data is read (and written) using operating system routines (“read 25 bytes from file XYZ”).
  - A universal format is needed for all kinds of data, e.g., to allow editing with a text editor.

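Language processor architecture
character stream -> scanner -> token stream -> parser -> abstract syntax tree -> transformer(s) -> unparser -> character stream
Example:
  - character stream: “<H1> My title </H1>”
  - token stream: [LANGLE, ID “H1”, RANGLE, ID “ My title”, LSLASH, ID “ H1”, RANGLE]
  - abstract syntax tree: element (stag “H1”, contents “ My title”, etag “H1”)
  - character stream after transformation and unparsing: “<H1> MY TITLE </H1>”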

Lexical analysis (scanning, lexing, tokenizing)
• Purpose: Turning a character stream into a stream of tokens.
• Reasons:
  - Makes parsing easier by taking care of ‘low-level’ concerns such as eliminating whitespace.
  - Efficient preprocessing and compression of the input to the parser.
  - Unbounded lookahead into the input stream (in contrast to most parsers).
  - Well-founded theoretical basis and tool support (regular expressions and finite state machines).
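
As an illustration of the scanning step, here is a minimal hand-written scanner sketch in Standard ML for arithmetic expressions. The token datatype and the function name scan are assumptions made for this example; they are not taken from Paulson's lexer, and a generated scanner would normally be driven by regular expressions instead.

```sml
(* Hypothetical token type for arithmetic expressions. *)
datatype token = Var of string | Const of int | Sym of string

(* scan : string -> token list
   Drops whitespace and groups characters into tokens. *)
fun scan s =
  let
    (* Split a list into its longest prefix satisfying p and the rest. *)
    fun span p [] = ([], [])
      | span p (c :: cs) =
          if p c
          then let val (xs, rest) = span p cs in (c :: xs, rest) end
          else ([], c :: cs)

    fun go [] = []
      | go (c :: cs) =
          if Char.isSpace c then go cs
          else if Char.isAlpha c then
            let val (name, rest) = span Char.isAlphaNum (c :: cs)
            in Var (implode name) :: go rest end
          else if Char.isDigit c then
            let val (ds, rest) = span Char.isDigit (c :: cs)
            in Const (valOf (Int.fromString (implode ds))) :: go rest end
          else if Char.contains "+-*/()" c then Sym (str c) :: go cs
          else raise Fail ("unexpected character: " ^ str c)
  in
    go (explode s)
  end
```

For example, scan "x + y / 15" yields [Var "x", Sym "+", Var "y", Sym "/", Const 15]: the whitespace is gone and the parser only ever sees whole tokens.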

Context-free grammars (CFGs)
• A context-free grammar G describes a language (a set of strings).
• G = (T, N, P, S) where
  - T: set of terminal symbols
  - N: set of nonterminal symbols
  - P: set of productions
  - S: start symbol (a particular nonterminal symbol)

CFGs: Example
T = { +, -, *, /, (, ), Var, Const }
N = { Exp, Term, Factor }
S = Exp
Exp ::= Exp + Term | Exp - Term | Term
Term ::= Term * Factor | Term / Factor | Factor
Factor ::= Var | Const | ( Exp )

CFGs: Example...
Input: “x + y / 15 - x * x”
Token stream: [Var, +, Var, /, Const, -, Var, *, Var]
[Figure: parse tree of the token stream, with Exp, Term and Factor nodes]

Parsing
• Purpose: Turning a stream of tokens into the tree structure described by a grammar.
• Reasons:
  - Checking that the input is well-formed (according to the given grammar).
  - Producing a parse tree or abstract syntax tree to recover the tree structure in the input.
  - Processing the parse tree according to the grammar.

Parsing combinators
• Idea: For each terminal or nonterminal M there is a function
  f_M : token list -> T * token list (= T phrase)
  such that f_M consumes tokens from the front of its argument until it has reduced them to M, and then returns a value of type T for M together with the remaining tokens.

Parsing primitives
• Terminals:
  Var: string phrase
  Const: int phrase
  $: string -> string phrase (for keywords)

Parsing primitives...
• Parsing combinators:
  empty: ('a list) phrase
  ||: 'a phrase * 'a phrase -> 'a phrase
  --: 'a phrase * 'b phrase -> ('a * 'b) phrase
  >>: 'a phrase * ('a -> 'b) -> 'b phrase
• Derived combinators:
  repeat: 'a phrase -> 'a list phrase
  $--: 'a phrase * 'b phrase -> 'b phrase
  --$: 'a phrase * 'b phrase -> 'a phrase

Parsing precedences
infix 6 $-- --$
infix 5 --
infix 3 >>
infix 0 ||
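
One way these primitives and combinators could be implemented is sketched below. The token datatype, the SynError exception and the sum example are assumptions made for this illustration; the types follow the slides, where $-- and --$ take a phrase on both sides.

```sml
(* Hypothetical token type and error exception for this sketch. *)
datatype token = Id of string | Num of int | Key of string
exception SynError of string

infix 6 $-- --$
infix 5 --
infix 3 >>
infix 0 ||

(* Terminals: each takes a token list and returns (value, remaining tokens). *)
fun Var (Id s :: toks) = (s, toks)
  | Var _ = raise SynError "identifier expected"
fun Const (Num n :: toks) = (n, toks)
  | Const _ = raise SynError "constant expected"
fun $ k (Key k' :: toks) =
      if k = k' then (k, toks) else raise SynError ("expected " ^ k)
  | $ k _ = raise SynError ("expected " ^ k)

(* Succeeds without consuming input. *)
fun empty toks = ([], toks)

(* Alternative: try p; if it fails, try q on the same input. *)
fun (p || q) toks = p toks handle SynError _ => q toks

(* Sequence: run p, then q on what p left over. *)
fun (p -- q) toks =
  let val (a, toks1) = p toks
      val (b, toks2) = q toks1
  in ((a, b), toks2) end

(* Apply f to the result of p. *)
fun (p >> f) toks =
  let val (a, toks1) = p toks in (f a, toks1) end

(* Derived combinators. *)
fun repeat p toks = ((p -- repeat p >> (op ::)) || empty) toks
fun (p $-- q) toks = (p -- q >> (fn (_, b) => b)) toks
fun (p --$ q) toks = (p -- q >> (fn (a, _) => a)) toks

(* Example phrase: Const ("+" Const)*, summed up on the fly. *)
val sum = Const -- repeat ($"+" $-- Const) >> (fn (n, ns) => foldl (op +) n ns)
```

With these definitions, sum [Num 1, Key "+", Num 2, Key "+", Num 3] evaluates to (6, []). Failure is signalled by an exception here, so || commits to the first alternative that succeeds and never revisits it.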

Problems with combinatory parsers
• Left-recursion:
  - Problem: Left-recursive grammars make parsers go into an infinite loop.
  - Remedy: Transform the grammar to eliminate left-recursion.
• Mutual recursion:
  - Problem (SML-specific!): Cannot use val-declarations and combinator applications only.
  - Remedy: Use fun-declarations for the mutually recursive parts of a grammar.

Parsing problems...
The example grammar is left-recursive:
  Exp ::= Exp '+' Term | Exp '-' Term | Term
  Term ::= Term '*' Factor | Term '/' Factor | Factor
  Factor ::= Var | Const | '(' Exp ')'
Eliminate left-recursion:
  Binop1 ::= '+' | '-'
  Binop2 ::= '*' | '/'
  Factor ::= Var | Const | '(' Exp ')'
  Term ::= Factor (Binop2 Factor)*
  Exp ::= Term (Binop1 Term)*

Data type for abstract syntax trees
type binop = string
datatype expAST = EXP of termAST * (binop * termAST) list
and termAST = TERM of factorAST * (binop * factorAST) list
and factorAST = VAR of string | CONST of int | PARENEXP of expAST

Parser: example (first try)
val binop1 = $"+" || $"-"
val binop2 = $"*" || $"/"
val factor = Var >> VAR
          || Const >> CONST
          || $"(" $-- exp --$ $")" >> PARENEXP
val term = factor -- repeat (binop2 -- factor) >> TERM
val exp = term -- repeat (binop1 -- term) >> EXP
PROBLEM: Doesn’t work! These definitions are intended to be mutually recursive, but are not: factor refers to exp before exp has been declared, and plain val-declarations cannot refer to each other recursively.

Parser: example (second try)
val binop1 = $"+" || $"-"
val binop2 = $"*" || $"/"
fun factor toks = ( Var >> VAR
                 || Const >> CONST
                 || $"(" $-- exp --$ $")" >> PARENEXP ) toks
and term toks = (factor -- repeat (binop2 -- factor) >> TERM) toks
and exp toks = (term -- repeat (binop1 -- term) >> EXP) toks

Operator precedence parsing (overview)
• When processing operator expressions, a parser has to decide whether to reduce (stop the current phrase parser and return its result) or shift (continue the current phrase parse).
• Operator precedence parsing: Associate a precedence (binding strength) with each operator, remember the precedence of the last operator processed, and determine whether to reduce or shift depending on the precedence of the next operator.
• See Paulson, pp
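
The shift/reduce decision can be sketched with precedence climbing, one common form of operator precedence parsing. This is not Paulson's code: the token and exp datatypes, the precedence table and the function names are assumptions made for this example, and all operators are treated as left-associative.

```sml
(* Hypothetical tokens and syntax trees for this sketch. *)
datatype token = Num of int | Op of string
datatype exp = Lit of int | Bin of string * exp * exp

(* Precedence (binding strength) of each operator. *)
fun prec "+" = 1
  | prec "-" = 1
  | prec "*" = 2
  | prec "/" = 2
  | prec s = raise Fail ("unknown operator " ^ s)

(* Parse a single operand. *)
fun atom (Num n :: toks) = (Lit n, toks)
  | atom _ = raise Fail "operand expected"

(* loop minPrec (left, toks): extend `left` for as long as the next
   operator binds at least as tightly as minPrec. *)
fun loop minPrec (left, Op s :: toks) =
      if prec s < minPrec
      then (left, Op s :: toks)                     (* reduce: return to caller *)
      else
        let
          (* shift: parse the right operand, letting only tighter operators in *)
          val (right, toks') = loop (prec s + 1) (atom toks)
        in
          loop minPrec (Bin (s, left, right), toks')
        end
  | loop _ (left, toks) = (left, toks)

fun parse toks = loop 0 (atom toks)
```

For the tokens of 1 + 2 * 3 - 4, parse returns the tree Bin ("-", Bin ("+", Lit 1, Bin ("*", Lit 2, Lit 3)), Lit 4) with no tokens left over. The minPrec argument plays the role of the remembered precedence of the last operator processed.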

Backtracking parsing (overview)
• There may be more than one way of parsing an expression.
• Backtracking parsing: Construct a lazy list of all possible parses of a token stream. Continue the parse with the first of those and try to find a complete parse for the whole token stream; if that fails, backtrack to the second in the list and repeat.
• See Paulson, pp
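
A minimal sketch of this idea, assuming a hand-rolled lazy list type seq and hypothetical combinator definitions (not Paulson's): each parser returns every possible (result, remaining tokens) pair, and later alternatives are only computed when an earlier one turns out not to lead to a complete parse.

```sml
(* Lazy list: the tail is a suspended computation. *)
datatype 'a seq = Nil | Cons of 'a * (unit -> 'a seq)

(* Lazy append: the second sequence is only forced when needed. *)
fun append (Nil, ys) = ys ()
  | append (Cons (x, xf), ys) = Cons (x, fn () => append (xf (), ys))

(* Lazily apply f to every element and concatenate the results. *)
fun concatMap f Nil = Nil
  | concatMap f (Cons (x, xf)) = append (f x, fn () => concatMap f (xf ()))

infix 5 --
infix 3 >>
infix 0 ||

(* Accept one token equal to t. *)
fun tok t (x :: xs) = if x = t then Cons ((x, xs), fn () => Nil) else Nil
  | tok _ [] = Nil

(* Alternative: all parses of p, followed by all parses of q. *)
fun (p || q) toks = append (p toks, fn () => q toks)

(* Sequence: for every parse of p, every parse of q on the remaining tokens. *)
fun (p -- q) toks =
  concatMap (fn (a, rest) =>
    concatMap (fn (b, rest') => Cons (((a, b), rest'), fn () => Nil)) (q rest))
    (p toks)

fun (p >> f) toks =
  concatMap (fn (a, rest) => Cons ((f a, rest), fn () => Nil)) (p toks)

(* First parse that consumes the whole input, if any. *)
fun firstComplete Nil = NONE
  | firstComplete (Cons ((v, []), _)) = SOME v
  | firstComplete (Cons (_, xf)) = firstComplete (xf ())

(* Ambiguous prefix: either one "a" or two, followed by a final "a". *)
val short = tok "a" >> (fn _ => 1)
val long = tok "a" -- tok "a" >> (fn _ => 2)
val p = (short || long) -- tok "a" >> (fn (n, _) => n)
```

On ["a", "a", "a"] the first alternative (short) leaves one token unconsumed, so firstComplete moves on to the parse that uses long and returns SOME 2. On ["a", "a"] the first alternative already gives a complete parse, and the result is SOME 1.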

Recursive-descent parsing (overview)
• Write one parser for each grammatical category (as in combinatory parsing).
• Process the token stream as in combinatory parsers, except for alternatives.
• Process alternatives as follows: Look at the next token (the first token of the remaining token stream) and choose the phrase parser on the basis of that token.
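
The following sketch shows what a recursive-descent parser for the left-recursion-free expression grammar might look like. The token and ast datatypes and the helper names termRest and expRest are assumptions made for this example; the resulting tree is a simplified binary tree rather than the expAST type from the earlier slide.

```sml
(* Hypothetical tokens and a simplified syntax tree for this sketch. *)
datatype token = Var of string | Const of int | Sym of string
datatype ast = V of string | C of int | Bin of string * ast * ast

(* One function per grammatical category; each alternative is chosen
   by inspecting the next token only. *)
fun factor (Var x :: toks) = (V x, toks)
  | factor (Const n :: toks) = (C n, toks)
  | factor (Sym "(" :: toks) =
      (case exp toks of
         (e, Sym ")" :: rest) => (e, rest)
       | _ => raise Fail "expected )")
  | factor _ = raise Fail "expected variable, constant or ("

(* Term ::= Factor (Binop2 Factor)* *)
and term toks = termRest (factor toks)
and termRest (left, Sym s :: toks) =
      if s = "*" orelse s = "/" then
        let val (right, rest) = factor toks
        in termRest (Bin (s, left, right), rest) end
      else (left, Sym s :: toks)
  | termRest other = other

(* Exp ::= Term (Binop1 Term)* *)
and exp toks = expRest (term toks)
and expRest (left, Sym s :: toks) =
      if s = "+" orelse s = "-" then
        let val (right, rest) = term toks
        in expRest (Bin (s, left, right), rest) end
      else (left, Sym s :: toks)
  | expRest other = other
```

For the tokens of x + y / 15, exp returns (Bin ("+", V "x", Bin ("/", V "y", C 15)), []). Because every choice is made from a single token of lookahead, no alternative is ever retried.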

LL-parsing and LR-parsing (overview)
• Use tools to generate parsers from grammar specifications.
• The tool produces a table that guides a push-down automaton through parsing actions (“shift”, “reduce”).
• LL-parsing: Predictive (basically recursive-descent parsing in table-driven form).
• LR-parsing (incl. SLR- and LALR-parsing): (Virtual) parallel execution of phrase parsers.
• Problems: Lookahead is bounded in practice, and the approach is at times unwieldy.