1
Functional Design and Programming Lecture 9: Lexical analysis and parsing
2
Literature
Paulson, chap. 9:
  Lexical analysis (9.1)
  Functional parsing (9.2-9.4)
3
Exercises
Paulson, chap. 9: 9.1-9.2, 9.3-9.6, 9.8
Write a parser for XML elements (see home page).
4
Parsing/Unparsing
Purpose: Unparsing encodes structured data as a flat (string) representation; parsing decodes that representation back into structured data.
Reasons:
  Data is read (and written) using operating system routines (“read 25 bytes from file XYZ”).
  A universal format is needed for all kinds of data, e.g., to allow editing with a text editor.
5
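Language processor architecture
Stages: scanner -> parser -> transformer(s) -> unparser
Data: character stream -> token stream -> abstract syntax tree -> character stream
Example:
  character stream (input): “<H1> My title</H1>”
  token stream: [LANGLE, ID “H1”, RANGLE, ID “ My title”, LSLASH, ID “H1”, RANGLE]
  abstract syntax tree: element (stag “H1”, contents “ My title”, etag “H1”)
  character stream (output): “<H1> MY TITLE</H1>”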
6
Lexical analysis (scanning, lexing, tokenizing)
Purpose: Turning a character stream into a stream of tokens.
Reasons:
  Makes parsing easier by taking care of ‘low-level’ concerns such as eliminating whitespace.
  Efficient preprocessing and compression of the input to the parser.
  Unbounded lookahead into the input stream (in contrast to most parsers).
  Well-founded theoretical basis and tool support (regular expressions and finite state machines).
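A minimal sketch of such a scanner for the arithmetic examples used later in the lecture. The token type (Id, Num, Key) and the function name scan are assumptions made for this illustration, not definitions from the slides; every character that is neither a letter, a digit, nor whitespace is treated as a one-character symbol.

```sml
(* Hypothetical token type for this sketch (not from the slides). *)
datatype token = Id of string | Num of int | Key of string

(* scan : string -> token list
   Drops whitespace, groups letters/digits into identifiers,
   groups digits into integer constants, and turns every other
   character into a one-character Key token. *)
fun scan (s : string) : token list =
  let
    (* Split off the longest prefix whose characters satisfy p. *)
    fun span p [] = ([], [])
      | span p (c :: cs) =
          if p c
          then let val (front, rest) = span p cs in (c :: front, rest) end
          else ([], c :: cs)

    (* Convert a list of digit characters to an integer. *)
    fun digitsToInt ds =
      List.foldl (fn (d, n) => 10 * n + (Char.ord d - Char.ord #"0")) 0 ds

    fun loop [] = []
      | loop (c :: cs) =
          if Char.isSpace c then loop cs
          else if Char.isAlpha c then
            let val (name, rest) = span Char.isAlphaNum (c :: cs)
            in Id (String.implode name) :: loop rest end
          else if Char.isDigit c then
            let val (ds, rest) = span Char.isDigit (c :: cs)
            in Num (digitsToInt ds) :: loop rest end
          else Key (String.str c) :: loop cs
  in
    loop (String.explode s)
  end

(* The expression used in the CFG example. *)
val toks = scan "x + y / 15 - x * x"
```

The parser never sees whitespace, which is the kind of low-level concern the scanner is meant to absorb.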
7
Context-free Grammars (CFGs)
A context-free grammar G describes a language (a set of strings).
G = (T, N, P, S) where
  T: set of terminal symbols
  N: set of nonterminal symbols
  P: set of productions
  S: start symbol (a particular nonterminal symbol)
8
CFGs: Example
T = { +, -, *, /, (, ), Var, Const }
N = { Exp, Term, Factor }
S = Exp
Productions:
  Exp ::= Exp + Term | Exp - Term | Term
  Term ::= Term * Factor | Term / Factor | Factor
  Factor ::= Var | Const | ( Exp )
9
CFGs: Example...
Input: “x + y / 15 - x * x”
Token stream: [Var, +, Var, /, Const, -, Var, *, Var]
[Figure: parse tree over the token stream, built from Factor, Term and Exp nodes]
10
Parsing
Purpose: Turning a stream of tokens into a tree structure described by a grammar.
Reasons:
  Checking that the input is well-formed (according to the given grammar).
  Producing a parse tree or abstract syntax tree to recover the tree structure in the input.
  Processing the parse tree according to the grammar.
11
Parsing combinators
Idea: For each terminal or nonterminal M there is a function
  f_M : token list -> T * token list   (= T phrase)
such that f_M takes tokens from the front of its argument until it has reduced them to M, and then returns a value of type T for M together with the remaining tokens.
12
Parsing primitives
Terminals:
  Var: string phrase
  Const: int phrase
  $: string -> string phrase   (for keywords)
13
Parsing primitives...
Parsing combinators:
  empty: ('a list) phrase
  ||: 'a phrase * 'a phrase -> 'a phrase
  --: 'a phrase * 'b phrase -> ('a * 'b) phrase
  >>: 'a phrase * ('a -> 'b) -> 'b phrase
Derived combinators:
  repeat: 'a phrase -> 'a list phrase
  $--: 'a phrase * 'b phrase -> 'b phrase
  --$: 'a phrase * 'b phrase -> 'a phrase
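One possible implementation of the primitives and combinators with the types listed above, given as a sketch. The token type and the exception SyntaxErr used to signal failure are assumptions for this illustration; the slides give only the types, and the infix directives are the ones from the precedence slide.

```sml
(* Assumed token type and failure exception (not from the slides). *)
datatype token = Key of string | Id of string | Num of int
exception SyntaxErr of string

(* A phrase consumes a prefix of the tokens and returns a value plus the rest. *)
type 'a phrase = token list -> 'a * token list

infix 6 $-- --$
infix 5 --
infix 3 >>
infix 0 ||

(* Terminals *)
fun Var (Id x :: toks) = (x, toks)
  | Var _ = raise SyntaxErr "identifier expected"

fun Const (Num n :: toks) = (n, toks)
  | Const _ = raise SyntaxErr "constant expected"

fun $ k (Key k' :: toks) =
      if k = k' then (k, toks) else raise SyntaxErr ("expected " ^ k)
  | $ k _ = raise SyntaxErr ("expected " ^ k)

(* empty: succeeds without consuming input *)
fun empty toks = ([], toks)

(* alternative: try ph1; if it fails, try ph2 on the same input *)
fun (ph1 || ph2) toks = ph1 toks handle SyntaxErr _ => ph2 toks

(* sequence: ph1 followed by ph2, pairing the results *)
fun (ph1 -- ph2) toks =
  let val (x, toks2) = ph1 toks
      val (y, toks3) = ph2 toks2
  in ((x, y), toks3) end

(* application: transform the result of ph with f *)
fun (ph >> f) toks =
  let val (x, toks2) = ph toks in (f x, toks2) end

(* zero or more repetitions of ph *)
fun repeat ph toks = (ph -- repeat ph >> (op ::) || empty) toks

(* sequence that keeps only the right (resp. left) result *)
fun ph1 $-- ph2 = ph1 -- ph2 >> (fn (_, y) => y)
fun ph1 --$ ph2 = ph1 -- ph2 >> (fn (x, _) => x)

(* Usage: a variable followed by any number of "+ variable". *)
val sum : (string * string list) phrase = Var -- repeat ($"+" $-- Var)
val result = sum [Id "x", Key "+", Id "y", Key "+", Id "z"]
```

Note that repeat takes toks as an explicit argument; without it, the recursive call repeat ph would be evaluated eagerly and never terminate.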
14
Parsing precedences
infix 6 $-- --$
infix 5 --
infix 3 >>
infix 0 ||
15
Problems with combinatory parsers
Left-recursion:
  Problem: Left-recursive grammars make parsers go into an infinite loop.
  Remedy: Transform the grammar to eliminate left-recursion.
Mutual recursion:
  Problem (SML-specific!): Mutually recursive parsers cannot be defined using val-declarations and combinator applications alone.
  Remedy: Use fun-declarations for the mutually recursive parts of a grammar.
16
Parsing problems...
Example grammar is left-recursive:
  Exp ::= Exp ‘+’ Term | Exp ‘-’ Term | Term
  Term ::= Term ‘*’ Factor | Term ‘/’ Factor | Factor
  Factor ::= Var | Const | ‘(’ Exp ‘)’
Eliminate left-recursion:
  Binop1 ::= ‘+’ | ‘-’
  Binop2 ::= ‘*’ | ‘/’
  Factor ::= Var | Const | ‘(’ Exp ‘)’
  Term ::= Factor (Binop2 Factor)*
  Exp ::= Term (Binop1 Term)*
17
Data type for abstract syntax trees
type binop = string
datatype expAST = EXP of termAST * (binop * termAST) list
and termAST = TERM of factorAST * (binop * factorAST) list
and factorAST = VAR of string | CONST of int | PARENEXP of expAST
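To illustrate how values of this data type are built and traversed, here is a sketch of an unparser that turns an abstract syntax tree back into a string. The function names (tail, unparseExp, unparseTerm, unparseFactor) are hypothetical, and the spacing around operators is an arbitrary choice.

```sml
type binop = string
datatype expAST = EXP of termAST * (binop * termAST) list
and termAST = TERM of factorAST * (binop * factorAST) list
and factorAST = VAR of string | CONST of int | PARENEXP of expAST

(* Render " operator operand" for each pair in a repetition list. *)
fun tail show rest =
  String.concat (map (fn (oper, x) => " " ^ oper ^ " " ^ show x) rest)

(* One unparsing function per syntactic category, mutually recursive
   just like the data type itself. *)
fun unparseExp (EXP (t, ts)) = unparseTerm t ^ tail unparseTerm ts
and unparseTerm (TERM (f, fs)) = unparseFactor f ^ tail unparseFactor fs
and unparseFactor (VAR x) = x
  | unparseFactor (CONST n) = Int.toString n
  | unparseFactor (PARENEXP e) = "(" ^ unparseExp e ^ ")"

(* The tree for "x + y / 15". *)
val sample =
  EXP (TERM (VAR "x", []),
       [("+", TERM (VAR "y", [("/", CONST 15)]))])

val text = unparseExp sample
```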
18
Parser: example (first try)
val binop1 = $"+" || $"-"
val binop2 = $"*" || $"/"
val factor = Var >> VAR
          || Const >> CONST
          || $"(" $-- exp --$ $")" >> PARENEXP
val term = factor -- repeat (binop2 -- factor) >> TERM
val exp = term -- repeat (binop1 -- term) >> EXP
PROBLEM: Doesn’t work! These definitions are intended to be mutually recursive, but are not: factor refers to exp before exp has been declared.
19
Parser: example (second try)
val binop1 = $"+" || $"-"
val binop2 = $"*" || $"/"
fun factor toks = ( Var >> VAR
                 || Const >> CONST
                 || $"(" $-- exp --$ $")" >> PARENEXP ) toks
and term toks = (factor -- repeat (binop2 -- factor) >> TERM) toks
and exp toks = (term -- repeat (binop1 -- term) >> EXP) toks
20
Operator precedence parsing (overview)
When processing operator expressions, a parser has to decide whether to reduce (stop the current phrase parser and return its result) or shift (continue the current phrase parse).
Operator precedence parsing: Associate a precedence (binding strength) with each operator, remember the precedence of the last operator processed, and decide whether to reduce or shift depending on the precedence of the next operator.
See Paulson, pp. 364-366.
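The shift/reduce decision can be sketched with precedence climbing, a common way to implement operator precedence parsing; it is not necessarily the formulation Paulson uses. The token type, the tree type ast, the precedence table prec, and all function names are assumptions for this illustration.

```sml
(* Assumed token and tree types (not from the slides). *)
datatype token = Id of string | Num of int | Key of string
datatype ast = V of string | C of int | Bin of string * ast * ast
exception SyntaxErr of string

(* Hypothetical precedence table; non-operators get ~1. *)
fun prec "+" = 1
  | prec "-" = 1
  | prec "*" = 2
  | prec "/" = 2
  | prec _ = ~1

(* atom: an operand, i.e. a variable, a constant or a parenthesized expression *)
fun atom (Id x :: toks) = (V x, toks)
  | atom (Num n :: toks) = (C n, toks)
  | atom (Key "(" :: toks) =
      (case parseExp 0 toks of
           (e, Key ")" :: rest) => (e, rest)
         | _ => raise SyntaxErr "expected )")
  | atom _ = raise SyntaxErr "operand expected"

(* climb minPrec (lhs, toks):
   shift  - the next operator binds at least as strongly as minPrec:
            parse its right operand and continue with the combined tree;
   reduce - otherwise: stop and return lhs to the caller. *)
and climb minPrec (lhs, Key k :: toks) =
      if prec k >= minPrec then
        let val (rhs, rest) = parseExp (prec k + 1) toks
        in climb minPrec (Bin (k, lhs, rhs), rest) end
      else (lhs, Key k :: toks)
  | climb _ (lhs, toks) = (lhs, toks)

(* parseExp minPrec: an operand followed by operators of precedence >= minPrec *)
and parseExp minPrec toks = climb minPrec (atom toks)

(* "x + y / 15 - x * x" *)
val (tree, rest) =
  parseExp 0 [Id "x", Key "+", Id "y", Key "/", Num 15,
              Key "-", Id "x", Key "*", Id "x"]
```

Parsing the right operand with prec k + 1 makes all four operators left-associative: an operator of equal precedence to the right causes a reduce rather than a shift.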
21
Backtracking parsing (overview)
There may be more than one way of parsing an expression.
Backtracking parsing: Construct a lazy list of all possible parses of a token stream. Continue the parse with the first of those and try to find a complete parse for the whole token stream; if that fails, backtrack to the second in the list and repeat.
See Paulson, pp. 366-367.
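A simplified sketch of the idea, using ordinary (strict) lists where the slide calls for lazy lists, so all alternatives are computed eagerly instead of on demand. For brevity the parsers work on characters rather than tokens; the type parser and all names are assumptions for this illustration.

```sml
(* A backtracking parser returns every possible (result, remaining input) pair;
   the empty list means failure. *)
type 'a parser = char list -> ('a * char list) list

infix 5 --
infix 3 >>
infix 0 ||

(* item c: accept exactly the character c *)
fun item c (x :: xs) = if x = c then [(c, xs)] else []
  | item _ [] = []

(* alternative: collect the parses of both branches *)
fun (p || q) inp = p inp @ q inp

(* sequence: run q on the remaining input of every parse of p *)
fun (p -- q) inp =
  List.concat
    (map (fn (x, rest) => map (fn (y, rest') => ((x, y), rest')) (q rest))
         (p inp))

(* application: transform each result with f *)
fun (p >> f) inp = map (fn (x, rest) => (f x, rest)) (p inp)

(* Keep only the parses that consume the whole input. *)
fun parses p s =
  List.mapPartial (fn (x, []) => SOME x | _ => NONE) (p (String.explode s))

val a : char parser = item #"a"

(* oneOrTwo matches "aa" (tried first) or "a". *)
val oneOrTwo : string parser =
  (a -- a >> (fn _ => "aa")) || (a >> (fn _ => "a"))

(* On "aaa" the first choice "aa" leaves too little input for the final
   "aa", so the only complete parse comes from the second choice "a". *)
val whole = oneOrTwo -- (a -- a)
val complete = parses whole "aaa"
```

A parser that committed to the first successful alternative of oneOrTwo would fail on this input; keeping all alternatives is what allows the backtracking.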
22
Recursive-descent parsing (overview)
Write one parser for each grammatical category (as in combinatory parsing).
Process the token stream as in combinatory parsers, except for alternatives.
Process alternatives as follows:
  Look at the next token (the first token of the remaining token stream).
  Choose the phrase parser on the basis of that token.
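A sketch of a recursive-descent parser for the expression grammar without left-recursion, building the abstract syntax trees defined earlier in the lecture. Each alternative is chosen by pattern matching on the next token. The token type, the exception SyntaxErr, and the helper names factors and terms are assumptions for this illustration.

```sml
(* Assumed token type and failure exception (not from the slides). *)
datatype token = Id of string | Num of int | Key of string
exception SyntaxErr of string

type binop = string
datatype expAST = EXP of termAST * (binop * termAST) list
and termAST = TERM of factorAST * (binop * factorAST) list
and factorAST = VAR of string | CONST of int | PARENEXP of expAST

(* Factor ::= Var | Const | '(' Exp ')' : the first token selects the alternative *)
fun factor (Id x :: toks) = (VAR x, toks)
  | factor (Num n :: toks) = (CONST n, toks)
  | factor (Key "(" :: toks) =
      (case exp toks of
           (e, Key ")" :: rest) => (PARENEXP e, rest)
         | _ => raise SyntaxErr "expected )")
  | factor _ = raise SyntaxErr "factor expected"

(* (Binop2 Factor)* : continue only if the next token is * or / *)
and factors (Key k :: toks) =
      if k = "*" orelse k = "/" then
        let val (f, rest) = factor toks
            val (fs, rest') = factors rest
        in ((k, f) :: fs, rest') end
      else ([], Key k :: toks)
  | factors toks = ([], toks)

(* Term ::= Factor (Binop2 Factor)* *)
and term toks =
  let val (f, rest) = factor toks
      val (fs, rest') = factors rest
  in (TERM (f, fs), rest') end

(* (Binop1 Term)* : continue only if the next token is + or - *)
and terms (Key k :: toks) =
      if k = "+" orelse k = "-" then
        let val (t, rest) = term toks
            val (ts, rest') = terms rest
        in ((k, t) :: ts, rest') end
      else ([], Key k :: toks)
  | terms toks = ([], toks)

(* Exp ::= Term (Binop1 Term)* *)
and exp toks =
  let val (t, rest) = term toks
      val (ts, rest') = terms rest
  in (EXP (t, ts), rest') end

(* "x + y / 15" *)
val (tree, rest) = exp [Id "x", Key "+", Id "y", Key "/", Num 15]
```

Because the decision is made from a single token, no alternative is ever retried; this works here since the alternatives of each category start with distinct tokens.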
23
LL-parsing and LR-parsing (overview)
Use tools to generate parsers from grammar specifications. The tool produces a table that guides a push-down automaton through parsing actions (“shift”, “reduce”).
LL-parsing: Predictive (basically recursive-descent parsing in table-driven form).
LR-parsing (incl. SLR- and LALR-parsing): (Virtual) parallel execution of phrase parsers.
Problems: Lookahead is bounded in practice; at times unwieldy.