Functional Design and Programming Lecture 9: Lexical analysis and parsing.

1 Functional Design and Programming Lecture 9: Lexical analysis and parsing

2 Literature
Paulson, chap. 9:
  Lexical analysis (9.1)
  Functional parsing (9.2-9.4)

3 Exercises
Paulson, chap. 9:
  9.1-9.2
  9.3-9.6, 9.8
Write a parser for XML elements (see home page).

4 Parsing/Unparsing
Purpose: Encoding structured data into flat (string) representations (unparsing) and decoding it again (parsing).
Reasons:
  Data is read (and written) using operating system routines (“read 25 bytes from file XYZ”).
  A universal format is needed for all kinds of data, e.g., to allow editing with a text editor.

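5 Language processor architecture
Pipeline: character stream -> scanner -> token stream -> parser -> abstract syntax tree -> transformer(s) -> ... -> unparser -> character stream
Example:
  character stream (input): “ My title ”
  token stream: [LANGLE, ID “H1”, RANGLE, ID “ My title”, LSLASH, ID “ H1”, RANGLE]
  abstract syntax tree: element with children stag “H1”, contents “ My title”, etag “H1”
  character stream (output): “ MY TITLE ”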

6 Lexical analysis (Scanning, lexing, tokenizing)
Purpose: Turning a character stream into a stream of tokens.
Reasons:
  Making parsing easier by taking care of ‘low-level’ concerns such as eliminating whitespace.
  Efficient preprocessing and compression of input to the parser.
  Unbounded lookahead into the input stream (in contrast to most parsers).
  Well-founded theoretical basis and tool support (regular expressions and finite state machines).
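The scanning step can be illustrated with a minimal sketch in Standard ML. The token type, its constructor names (KEY, ID, NUM) and the helper span are assumptions made for this example and do not come from the slides; every non-alphanumeric, non-space character is treated as a one-character keyword.

```sml
(* Hypothetical token type; the constructor names are assumptions. *)
datatype token = KEY of string | ID of string | NUM of int

(* Split a list into its longest prefix satisfying p and the remainder. *)
fun span p [] = ([], [])
  | span p (x :: xs) =
      if p x then
        let val (ys, rest) = span p xs in (x :: ys, rest) end
      else ([], x :: xs)

(* Turn a character list into a token list, dropping whitespace. *)
fun scan [] = []
  | scan (c :: cs) =
      if Char.isSpace c then scan cs
      else if Char.isDigit c then
        let val (ds, rest) = span Char.isDigit (c :: cs)
        in NUM (valOf (Int.fromString (implode ds))) :: scan rest end
      else if Char.isAlpha c then
        let val (ls, rest) = span Char.isAlphaNum (c :: cs)
        in ID (implode ls) :: scan rest end
      else KEY (str c) :: scan cs

(* Usage: whitespace disappears, identifiers and numbers become single tokens. *)
val tokens = scan (explode "x + y / 15 - x * x")
```

A production scanner would normally be derived from regular expressions; this hand-written version only shows how low-level concerns are removed before parsing starts.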

7 Context-free Grammars (CFGs)
A context-free grammar G describes a language (a set of strings).
G = (T, N, P, S) where
  T: set of terminal symbols
  N: set of nonterminal symbols
  P: set of productions
  S: start symbol (a particular nonterminal symbol)

8 CFGs: Example
T = { +, -, *, /, (, ), Var, Const }
N = { Exp, Term, Factor }
S = Exp
Exp ::= Exp + Term | Exp - Term | Term
Term ::= Term * Factor | Term / Factor | Factor
Factor ::= Var | Const | ( Exp )

9 CFGs: Example...
Input string: “x + y / 15 - x * x”
Token stream: [Var, +, Var, /, Const, -, Var, *, Var]
[Figure: parse tree for the token stream, built from Factor, Term and Exp nodes]

10 Parsing
Purpose: Turning a stream of tokens into the tree structure described by a grammar.
Reasons:
  Checking that the input is well-formed (according to the given grammar).
  Producing a parse tree or abstract syntax tree to recover the tree structure in the input.
  Processing the parse tree according to the grammar.

11 Parsing combinators
Idea: For each terminal or nonterminal M there is a function
  f_M : token list -> T * token list (= T phrase)
such that f_M consumes tokens from the front of its argument until it has reduced them to M, and then returns a value of type T for M together with the remaining tokens.

12 Parsing primitives
Terminals:
  Var: string phrase
  Const: int phrase
  $: string -> string phrase (for keywords)

13 Parsing primitives...
Parsing combinators:
  empty: ('a list) phrase
  ||: 'a phrase * 'a phrase -> 'a phrase
  --: 'a phrase * 'b phrase -> ('a * 'b) phrase
  >>: 'a phrase * ('a -> 'b) -> 'b phrase
Derived combinators:
  repeat: 'a phrase -> 'a list phrase
  $--: 'a phrase * 'b phrase -> 'b phrase
  --$: 'a phrase * 'b phrase -> 'a phrase

14 Parsing precedences
infix 6 $-- --$
infix 5 --
infix 3 >>
infix 0 ||
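One possible implementation of the primitives and combinators listed above is sketched here, in the style of Paulson's chapter 9. The token type (KEY, ID, NUM) and the exception SyntaxErr are assumptions made for this sketch; the slides give only the types and fixities. Alternatives are tried in order, and a failing parser signals failure by raising the exception.

```sml
(* Fixities as on the slide; they must be declared before the operators are defined. *)
infix 6 $-- --$
infix 5 --
infix 3 >>
infix 0 ||

(* Assumed token type and error exception; neither is given on the slides. *)
datatype token = KEY of string | ID of string | NUM of int
exception SyntaxErr of string

type 'a phrase = token list -> 'a * token list

(* Terminals *)
fun Var (ID x :: toks) = (x, toks)
  | Var _ = raise SyntaxErr "variable expected"
fun Const (NUM n :: toks) = (n, toks)
  | Const _ = raise SyntaxErr "constant expected"
fun $ a (KEY b :: toks) = if a = b then (a, toks) else raise SyntaxErr a
  | $ a _ = raise SyntaxErr a

(* Primitive combinators *)
fun empty toks = ([], toks)
fun (ph1 || ph2) toks = ph1 toks handle SyntaxErr _ => ph2 toks
fun (ph1 -- ph2) toks =
  let val (x, toks2) = ph1 toks
      val (y, toks3) = ph2 toks2
  in ((x, y), toks3) end
fun (ph >> f) toks =
  let val (x, toks2) = ph toks in (f x, toks2) end

(* Derived combinators *)
fun repeat ph toks = (ph -- repeat ph >> (op ::) || empty) toks
fun (ph1 $-- ph2) toks = (ph1 -- ph2 >> (fn (_, y) => y)) toks
fun (ph1 --$ ph2) toks = (ph1 -- ph2 >> (fn (x, _) => x)) toks

(* Usage: parse "x + 1", keeping the variable and the constant but not the keyword. *)
val (result, rest) = (Var --$ $"+" -- Const) [ID "x", KEY "+", NUM 1]
```

Because --$ binds tighter than --, the usage line groups as (Var --$ $"+") -- Const, which is why no parentheses are needed.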

15 Problems with combinatory parsers
Left-recursion:
  Problem: Left-recursive grammars make parsers go into an infinite loop.
  Remedy: Transform the grammar to eliminate left-recursion.
Mutual recursion:
  Problem (SML-specific!): Mutually recursive parsers cannot be defined using val-declarations and combinator applications only.
  Remedy: Use fun-declarations for the mutually recursive parts of a grammar.

16 Parsing problems...
Example grammar is left-recursive:
  Exp ::= Exp ‘+’ Term | Exp ‘-’ Term | Term
  Term ::= Term ‘*’ Factor | Term ‘/’ Factor | Factor
  Factor ::= Var | Const | ‘(’ Exp ‘)’
Eliminate left-recursion:
  Binop1 ::= ‘+’ | ‘-’
  Binop2 ::= ‘*’ | ‘/’
  Factor ::= Var | Const | ‘(’ Exp ‘)’
  Term ::= Factor (Binop2 Factor)*
  Exp ::= Term (Binop1 Term)*

17 Data type for abstract syntax trees
type binop = string
datatype expAST = EXP of termAST * (binop * termAST) list
and termAST = TERM of factorAST * (binop * factorAST) list
and factorAST = VAR of string
              | CONST of int
              | PARENEXP of expAST

18 Parser: example (first try)
val binop1 = $"+" || $"-"
val binop2 = $"*" || $"/"
val factor =    Var >> VAR
             || Const >> CONST
             || $"(" $-- exp --$ $")" >> PARENEXP
val term = factor -- repeat (binop2 -- factor) >> TERM
val exp = term -- repeat (binop1 -- term) >> EXP
PROBLEM: Doesn't work! These definitions are intended to be mutually recursive, but val-declarations are not recursive: exp is not yet defined at the point where factor uses it.

19 Parser: example (second try)
val binop1 = $"+" || $"-"
val binop2 = $"*" || $"/"
fun factor toks = (   Var >> VAR
                   || Const >> CONST
                   || $"(" $-- exp --$ $")" >> PARENEXP ) toks
and term toks = (factor -- repeat (binop2 -- factor) >> TERM) toks
and exp toks = (term -- repeat (binop1 -- term) >> EXP) toks
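To see the second try run end to end, the following self-contained sketch combines it with one possible definition of the combinators. The token type (KEY, ID, NUM), the exception SyntaxErr and the combinator bodies are assumptions made for this sketch; the slides give only their types.

```sml
infix 6 $-- --$
infix 5 --
infix 3 >>
infix 0 ||

(* Assumed token type and error exception. *)
datatype token = KEY of string | ID of string | NUM of int
exception SyntaxErr of string

(* Assumed combinator definitions matching the types on the slides. *)
fun Var (ID x :: toks) = (x, toks)
  | Var _ = raise SyntaxErr "variable expected"
fun Const (NUM n :: toks) = (n, toks)
  | Const _ = raise SyntaxErr "constant expected"
fun $ a (KEY b :: toks) = if a = b then (a, toks) else raise SyntaxErr a
  | $ a _ = raise SyntaxErr a
fun empty toks = ([], toks)
fun (ph1 || ph2) toks = ph1 toks handle SyntaxErr _ => ph2 toks
fun (ph1 -- ph2) toks =
  let val (x, toks2) = ph1 toks
      val (y, toks3) = ph2 toks2
  in ((x, y), toks3) end
fun (ph >> f) toks =
  let val (x, toks2) = ph toks in (f x, toks2) end
fun repeat ph toks = (ph -- repeat ph >> (op ::) || empty) toks
fun (ph1 $-- ph2) toks = (ph1 -- ph2 >> (fn (_, y) => y)) toks
fun (ph1 --$ ph2) toks = (ph1 -- ph2 >> (fn (x, _) => x)) toks

(* Abstract syntax from the slides. *)
type binop = string
datatype expAST = EXP of termAST * (binop * termAST) list
and termAST = TERM of factorAST * (binop * factorAST) list
and factorAST = VAR of string | CONST of int | PARENEXP of expAST

(* The parser: fun ... and ... makes the three phrases mutually recursive. *)
val binop1 = $"+" || $"-"
val binop2 = $"*" || $"/"
fun factor toks = (   Var >> VAR
                   || Const >> CONST
                   || $"(" $-- exp --$ $")" >> PARENEXP ) toks
and term toks = (factor -- repeat (binop2 -- factor) >> TERM) toks
and exp toks = (term -- repeat (binop1 -- term) >> EXP) toks

(* Usage: tokens for "x + y / 15". *)
val (tree, rest) = exp [ID "x", KEY "+", ID "y", KEY "/", NUM 15]
```

The explicit toks argument matters: it delays the evaluation of each body until tokens arrive, so the recursive references to exp, term and factor do not loop while the parsers are being defined.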

20 Operator precedence parsing (overview)
When processing operator expressions, a parser has to decide whether to reduce (stop the current phrase parser and return its result) or shift (continue the current phrase parse).
Operator precedence parsing: Associate a precedence (binding strength) with each operator, remember the precedence of the last operator processed, and decide whether to reduce or shift depending on the precedence of the next operator.
See Paulson, pp. 364-366
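The shift/reduce decision can be sketched as follows. This is an illustration under assumptions, not the code from Paulson: the token type, the binary AST (BIN), the precedence table prec and the function names atom, over and expr are all hypothetical. All operators are treated as left-associative.

```sml
(* Assumed token and tree types for this sketch. *)
datatype token = KEY of string | ID of string | NUM of int
datatype ast = VAR of string | CONST of int | BIN of string * ast * ast
exception SyntaxErr of string

(* Hypothetical precedence table: larger binds tighter, ~1 means "not an operator". *)
fun prec "+" = 1
  | prec "-" = 1
  | prec "*" = 2
  | prec "/" = 2
  | prec _ = ~1

(* Operands: variables and constants only, to keep the sketch short. *)
fun atom (ID x :: toks) = (VAR x, toks)
  | atom (NUM n :: toks) = (CONST n, toks)
  | atom _ = raise SyntaxErr "operand expected"

(* over k (left, toks): left is the phrase parsed so far and k is the
   precedence of the last operator processed. *)
fun over k (left, KEY a :: toks) =
      if prec a <= k then
        (left, KEY a :: toks)                 (* reduce: return what we have *)
      else
        let val (right, toks2) = over (prec a) (atom toks)  (* shift *)
        in over k (BIN (a, left, right), toks2) end
  | over _ (left, toks) = (left, toks)

fun expr toks = over 0 (atom toks)

(* Usage: tokens for "x - y * 2"; the multiplication ends up below the subtraction. *)
val (tree, rest) = expr [ID "x", KEY "-", ID "y", KEY "*", NUM 2]
```

Using <= rather than < in the comparison is what makes operators of equal precedence group to the left.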

21 Backtracking parsing (overview)
There may be more than one way of parsing an expression.
Backtracking parsing: Construct a lazy list of all possible parses of a token stream. Continue the parse with the first of those and try to find a complete parse for the whole token stream; if that fails, backtrack to the second in the list and repeat.
See Paulson, pp. 366-367
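A minimal sketch of the idea follows. As a simplifying assumption it uses ordinary (eager) SML lists instead of lazy lists, so all parses are computed up front; the token type and the helper parseAll are also assumptions made for this example. Each phrase returns every way of parsing a prefix of the input, and failure is the empty list.

```sml
infix 5 --
infix 3 >>
infix 0 ||

(* Assumed token type. *)
datatype token = KEY of string | ID of string | NUM of int

(* A backtracking phrase returns all possible parses, not just one. *)
type 'a phrase = token list -> ('a * token list) list

fun Var (ID x :: toks) = [(x, toks)]
  | Var _ = []
fun $ a (KEY b :: toks) = if a = b then [(a, toks)] else []
  | $ _ _ = []

(* Alternatives: collect the parses of both branches. *)
fun (ph1 || ph2) toks = ph1 toks @ ph2 toks

(* Sequencing: continue every parse of ph1 with every parse of ph2. *)
fun (ph1 -- ph2) toks =
  List.concat
    (map (fn (x, toks2) =>
            map (fn (y, toks3) => ((x, y), toks3)) (ph2 toks2))
         (ph1 toks))

fun (ph >> f) toks = map (fn (x, toks2) => (f x, toks2)) (ph toks)

(* Return the first parse that consumes the whole token stream, if any. *)
fun parseAll ph toks =
  case List.filter (fn (_, rest) => null rest) (ph toks) of
      (x, _) :: _ => SOME x
    | [] => NONE

(* The short alternative comes first, so a complete parse requires backtracking. *)
val sum =    Var >> (fn x => [x])
          || Var -- $"+" -- Var >> (fn ((x, _), y) => [x, y])

val parsed = parseAll sum [ID "x", KEY "+", ID "y"]
```

With a combinator that commits to the first successful alternative, sum would stop after "x" and leave "+ y" unconsumed; here the second alternative is still available.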

22 Recursive-descent parsing (overview)
Write one parser for each grammatical category (as in combinatory parsing).
Process the token stream as in combinatory parsers, except for alternatives.
Process alternatives as follows:
  Look at the next token (the first token of the remaining token stream).
  Choose the phrase parser on the basis of that token.
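The following sketch applies this scheme to the expression grammar without left-recursion. The token type, the binary AST (BIN) and the helper names termRest and expRest are assumptions made for this example. Pattern matching on the first token plays the role of the lookahead.

```sml
(* Assumed token and tree types for this sketch. *)
datatype token = KEY of string | ID of string | NUM of int
datatype ast = VAR of string | CONST of int | BIN of string * ast * ast
exception SyntaxErr of string

(* Factor ::= Var | Const | "(" Exp ")"
   The next token alone selects the alternative. *)
fun factor (ID x :: toks) = (VAR x, toks)
  | factor (NUM n :: toks) = (CONST n, toks)
  | factor (KEY "(" :: toks) =
      (case exp toks of
           (e, KEY ")" :: rest) => (e, rest)
         | _ => raise SyntaxErr "\")\" expected")
  | factor _ = raise SyntaxErr "factor expected"

(* Term ::= Factor (Binop2 Factor)* *)
and term toks = termRest (factor toks)
and termRest (left, KEY a :: toks) =
      if a = "*" orelse a = "/" then
        let val (right, rest) = factor toks
        in termRest (BIN (a, left, right), rest) end
      else (left, KEY a :: toks)
  | termRest (left, toks) = (left, toks)

(* Exp ::= Term (Binop1 Term)* *)
and exp toks = expRest (term toks)
and expRest (left, KEY a :: toks) =
      if a = "+" orelse a = "-" then
        let val (right, rest) = term toks
        in expRest (BIN (a, left, right), rest) end
      else (left, KEY a :: toks)
  | expRest (left, toks) = (left, toks)

(* Usage: tokens for "(x + 1) * y". *)
val (tree, rest) =
  exp [KEY "(", ID "x", KEY "+", NUM 1, KEY ")", KEY "*", ID "y"]
```

No alternative is ever retried: once the first token has selected a branch, a failure inside that branch is reported as a syntax error.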

23 LL-parsing and LR-parsing (overview)
Use tools to generate parsers from grammar specifications.
The tool produces a table that guides a push-down automaton through parsing actions (“shift”, “reduce”).
LL-parsing: Predictive (basically recursive-descent parsing in table-driven form).
LR-parsing (incl. SLR- and LALR-parsing): (Virtual) parallel execution of phrase parsers.
Problems: Lookahead is bounded in practice; at times unwieldy.

