CSCE 314: Programming Languages Dr. Dylan Shell

Slides:



Advertisements
Similar presentations
0 LECTURE 5 LIST COMPREHENSIONS Graham Hutton University of Nottingham.
Advertisements

What is a Parser? A parser is a program that analyses a piece of text to determine its syntactic structure  3 means 23+4.
9/27/2006Prof. Hilfinger, Lecture 141 Syntax-Directed Translation Lecture 14 (adapted from slides by R. Bodik)
Context-Free Grammars Lecture 7
1 Introduction to Parsing Lecture 5. 2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations.
Lexical Analysis - An Introduction. The Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source.
Lexical Analysis - An Introduction Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at.
Lesson 3 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
Functional Programming guest lecture by Tim Sheard Parsing in Haskell Defining Parsing Combinators.
Advanced Functional Programming Tim Sheard 1 Lecture 14 Advanced Functional Programming Tim Sheard Oregon Graduate Institute of Science & Technology Lecture.
0 PROGRAMMING IN HASKELL Chapter 9 - Higher-Order Functions, Functional Parsers.
Advanced Programming Andrew Black and Tim Sheard Lecture 11 Parsing Combinators.
© M. Winter COSC 4P41 – Functional Programming Programming with actions Why is I/O an issue? I/O is a kind of side-effect. Example: Suppose there.
CPS 506 Comparative Programming Languages Syntax Specification.
Compiler Construction By: Muhammad Nadeem Edited By: M. Bilal Qureshi.
Overview of Previous Lesson(s) Over View  In our compiler model, the parser obtains a string of tokens from the lexical analyzer & verifies that the.
Unit-3 Parsing Theory (Syntax Analyzer) PREPARED BY: PROF. HARISH I RATHOD COMPUTER ENGINEERING DEPARTMENT GUJARAT POWER ENGINEERING & RESEARCH INSTITUTE.
Parser: CFG, BNF Backus-Naur Form is notational variant of Context Free Grammar. Invented to specify syntax of ALGOL in late 1950’s Uses ::= to indicate.
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 1 Ahmed Ezzat.
1 Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5.
Introduction to Parsing
CSE 3302 Programming Languages
What is a Parser? A parser is a program that analyses a piece of text to determine its syntactic structure  3 means 23+4.
Chapter 3 – Describing Syntax
Announcements Today: the Parsing Domain-specific Language
String is a synonym for the type [Char].
lec02-parserCFG May 8, 2018 Syntax Analyzer
Instructor: Laura Kallmeyer
Lexical Analysis.
Introduction to Parsing
Parsing & Context-Free Grammars
Programming Languages Translator
CS510 Compiler Lecture 4.
Introduction to Parsing (adapted from CS 164 at Berkeley)
Chapter 2 Scanning – Part 1 June 10, 2018 Prof. Abdelaziz Khamis.
Textbook:Modern Compiler Design
PROGRAMMING IN HASKELL
A lightening tour in 45 minutes
What does it mean? Notes from Robert Sebesta Programming Languages
Simplifications of Context-Free Grammars
Haskell.
Compiler Construction
CS 363 Comparative Programming Languages
4 (c) parsing.
Basic Program Analysis: AST
Syntax Analysis Sections :.
Lexical and Syntax Analysis
PROGRAMMING IN HASKELL
PROGRAMMING IN HASKELL
PROGRAMMING IN HASKELL
CSE401 Introduction to Compiler Construction
Chapter 2: A Simple One Pass Compiler
Lecture 7: Introduction to Parsing (Syntax Analysis)
R.Rajkumar Asst.Professor CSE
PROGRAMMING IN HASKELL
Lecture 4: Lexical Analysis & Chomsky Hierarchy
LL and Recursive-Descent Parsing Hal Perkins Autumn 2011
Introduction to Parsing
Introduction to Parsing
PROGRAMMING IN HASKELL
CSCE 314: Programming Languages Dr. Dylan Shell
Higher Order Functions
PROGRAMMING IN HASKELL
Programming Languages
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
CSCE 314: Programming Languages Dr. Dylan Shell
lec02-parserCFG May 27, 2019 Syntax Analyzer
PROGRAMMING IN HASKELL
Faculty of Computer Science and Information System
Parsing CSCI 432 Computer Science Theory
Presentation transcript:

CSCE 314: Programming Languages Dr. Dylan Shell Functional Parsers

explicit in our data structure What is a Parser? A parser is a program that takes a text (set of tokens) and determines its syntactic structure. String or [Token] syntactic structure Parser Means 2∗3+4 Structure is made explicit in our data structure

The Parser Type In a functional language such as Haskell, parsers can naturally be viewed as functions. A parser is a function that takes a string and returns some form of tree. type Parser = String → Tree However, a parser might not require all of its input string, so we also return any unused input: type Parser = String → (Tree,String) A string might be parsable in many ways, including none, so we generalize to a list of results: type Parser = String → [(Tree,String)]

Furthermore, a parser might not always produce a tree, so we generalize the result to a value of any type: type Parser res = String → [(res,String)] Finally, a parser might take token streams of symbols instead of character streams: type TokenParser symb res = [symb] → [(res,[symb])] Note: For simplicity, we will only consider parsers that either fail and return the empty list of results, or succeed and return a singleton list.

Approach and Rationale We are going produce code that will parse complex expressions. The code will be built (like most functional programming) by building from the bottom upwards. This requires a temporary shift in view because we look and think of parsed output from the top down.

Approach and Rationale We are going produce code that will parse complex expressions. The code will be built (like most functional programming) by building from the bottom upwards. This requires a temporary shift in view because we look and think of parsed output from the top down.

Approach and Rationale

⟨protocol⟩::= http|https|ftp ⟨port⟩::= 0|1|2|⋯|65535 ⟨host⟩ ::= ⟨char⟩⟨char⟩* ⟨sub-domain⟩ ::= ⟨⟨char⟩*.⟩⟨sub-domain⟩|”” ⟨domain⟩::= com|edu|gov|⋯ ⟨host-domain-name⟩ ::= ⟨host⟩.⟨sub-domain⟩⟨domain⟩ ⋮

Approach and Rationale We are going produce code that will parse complex expressions. The code will be built (like most functional programming) by building from the bottom upwards. This requires a temporary shift in view because we look and think of parsed output from the top down. We start with very simple parsers When we will glue these together in two ways: One after the other: first an X then a Y afterwards Choice: either an X or a Y. Then this process can be applied recursively. ⟨protocol⟩,⟨domain⟩, etc. ⟨sub-domain⟩⟨domain⟩ 2|25|255|2555

Approach and Rationale We are going produce code that will parse complex expressions. The code will be built (like most functional programming) by building from the bottom upwards. This requires a temporary shift in view because we look and think of parsed output from the top down. We start with very simple parsers When we will glue these together in two ways: One after the other: first an X then a Y afterwards Choice: either an X or a Y. Then this process can be applied recursively. High Level Overview We want to be able to write a parser that closely resembles the grammatical structure the defines the language we’re parsing. The approach described next, is involved, but it allows us to do that! ⟨protocol⟩,⟨domain⟩, etc. ⟨sub-domain⟩⟨domain⟩ 2|25|255|2555

Basic Parsers (Building Blocks) The parser symbolDot consumes the first character and succeeds if it is a dot, and fails otherwise: symbolDot :: Parser Char :: String -> [(Char, String)] :: [Char] -> [(Char, [Char])] symbolDot (x:xs) | (x == ‘.’) = [(x, xs)] | otherwise = [] Example: *Main> symbolDot “.com” [(‘.’, “com”)]

Basic Parsers (Building Blocks) The parser item fails if the input is empty, and consumes the first character otherwise: item :: Parser Char :: String -> [(Char, String)] :: [Char] -> [(Char, [Char])] item = \inp -> case inp of [] -> [] (x:xs) -> [(x,xs)] Example: *Main> item "parse this" [('p',"arse this")]

The parser failure always fails: The parser return v always succeeds, returning the value v without consuming any input: return :: a -> Parser a return v = \inp -> [(v,inp)] The parser failure always fails: failure :: Parser a failure = \inp -> [] Example: *Main> Main.return 7 "parse this" [(7,"parse this")] *Main> failure "parse this" []

We can make it more explicit by letting the function parse apply a parser to a string: parse :: Parser a → String → [(a,String)] parse p inp = p inp -- essentially id function Example: *Main> parse item "parse this" [('p',"arse this")]

Choice What if we have to backtrack? First try to parse p, then q? The parser p +++ q behaves as the parser p if p succeeds, and as the parser q otherwise. (+++) :: Parser a -> Parser a -> Parser a p +++ q = \inp -> case p inp of [] -> parse q inp [(v,out)] -> [(v,out)] Example: *Main> parse failure "abc" [] *Main> parse (failure +++ item) "abc" [('a',"bc")]

Examples > parse item "" [] > parse item "abc" [('a',"bc")] > parse failure "abc" > parse (return 1) "abc" [(1,"abc")] > parse (item +++ return 'd') "abc" > parse (failure +++ return 'd') "abc" [('d',"abc")]

Note: The library file Parsing.hs is available on the course webpage. The Parser type is a monad, a mathematical structure that has proved useful for modeling many different kinds of computations.

“The value of x is generated by the item parser.” Sequencing Often we want to sequence parsers, e.g., the following grammar: ⟨if-stmt⟩ ::= if (⟨expr⟩) then ⟨stmt⟩ First parse if, then (, then ⟨expr⟩, … A sequence of parsers can be combined as a single composite parser using the keyword do. For example: Meaning: “The value of x is generated by the item parser.” p :: Parser (Char,Char) p = do x ← item item y ← item return (x,y)

Note: Each parser must begin in precisely the same column. That is, the layout rule applies. The values returned by intermediate parsers are discarded by default, but if required can be named using the ← operator. The value returned by the last parser is the value returned by the sequence as a whole.

If any parser in a sequence of parsers fails, then the sequence as a whole fails. For example: > parse p "abcdef" [((’a’,’c’),"def")] > parse p "ab" [] The do notation is not specific to the Parser type, but can be used with any monadic type.

The “Monadic” Way Parser sequencing operator fails if p fails (>>=) :: Parser a -> (a -> Parser b) -> Parser b p >>= f = \inp -> case parse p inp of [] -> [] [(v, out)] -> parse (f v) out p >>= f fails if p fails otherwise applies f to the result of p this results in a new parser, which is then applied Example > parse ((failure +++ item) >>= (\_ -> item)) "abc" [('b',"c")]

Sequencing Typical parser structure Using do notation p1 >>= \v1 -> p2 >>= \v2 -> . . . pn >>= \vn -> return (f v1 v2 . . . vn) do v1 <- p1 v2 <- p2 . . . vn <- pn return (f v1 v2 . . . vn) If some vi is not needed, vi <- pi can be written as pi, which corresponds to pi >>= \_ -> ...

Example Typical parser structure Using do notation rev3 = item >>= \v1 -> item >>= \v2 -> item >>= \_ -> item >>= \v3 -> return $ reverse (v1:v2:v3:[]) rev3 = do v1 <- item v2 <- item item v3 <- item return $ reverse (v1:v2:v3:[]) > rev3 “abcdef” [(“dba”,”ef”)] > (rev3 >>= (\_ -> item)) “abcde” [(‘e’,””)] > (rev3 >>= (\_ -> item)) “abcd” []

Key benefit: The result of first parse is available for the subsequent parsers parse (item >>= (\x -> item >>= (\y -> return (y:[x])))) “ab” [(“ba”,””)]

Derived Primitives Parsing a character that satisfies a predicate: sat :: (Char -> Bool) -> Parser Char sat p = do x <- item if p x then return x else failure Examples > parse (sat (==‘a’)) “abc” [(‘a’,”bc”)] > parse (sat (==‘b’)) “abc” [] > parse (sat isLower) “abc” > parse (sat isUpper) “abc”

Derived Parsers from sat digit, letter, alphanum :: Parser Char digit = sat isDigit letter = sat isAlpha alphanum = sat isAlphaNum lower, upper :: Parser Char lower = sat isLower upper = sat isUpper char :: Char → Parser Char char x = sat (== x)

To accept a particular string Use sequencing recursively: string :: String -> Parser String string [] = return [] string (x:xs) = do char x string xs return (x:xs) Entire parse fails if any of the recursive calls fail > parse (string "if [") "if (a<b) return;" [] > parse (string "if (") "if (a<b) return;" [("if (","a<b) return;")]

many applies the same parser many times many :: Parser a -> Parser [a] many p = many1 p +++ return [] many1 :: Parser a -> Parser [a] many1 p = do v <- p vs <- many p return (v:vs) > parse (many digit) "123ab" [("123","ab")] > parse (many digit) "ab123ab" [("","ab123ab")] > parse (many alphanum) "ab123ab" [("ab123ab","")] Examples

Example We can now define a parser that consumes a list of one or more digits of correct format from a string: p :: Parser String p = do char '[' d ← digit ds ← many (do char ',' digit) char ']' return (d:ds) > parse p "[1,2,3,4]" [("1234","")] > parse p "[1,2,3,4" [] Note: More sophisticated parsing libraries can indicate and/or recover from errors in the input string.

Arithmetic Expressions Consider a simple form of expressions built up from single digits using the operations of addition + and multiplication *, together with parentheses. We also assume that: * and + associate to the right. * has higher priority than +.

Example: Parsing a token space :: Parser () space = many (sat isSpace) >> return () token :: Parser a -> Parser a token p = space >> p >>= \v -> space >> return v identifier :: Parser String identifier = token ident ident :: Parser String ident = sat isLower >>= \x -> many (sat isAlphaNum) >>= \xs -> return (x:xs)

Formally, the syntax of such expressions is defined by the following context free grammar: expr → term '+' expr | term term → factor '*' term | factor factor → digit | '(' expr ')' digit → '0' | '1' | … | '9'

However, for reasons of efficiency, it is important to factorize the rules for expr and term: expr → term ('+' expr | ε) term → factor ('*' term | ε) Note: The symbol ε denotes the empty string.

It is now easy to translate the grammar into a parser that evaluates expressions, by simply rewriting the grammar rules using the parsing primitives. That is, we have: expr → term ('+' expr | ε) term → factor ('*' term | ε) expr :: Parser Int expr = do t ← term do char '+' e ← expr return (t + e) +++ return t

term → factor ('*' term | ε) expr → term ('+' expr | ε) term → factor ('*' term | ε) term :: Parser Int term = do f ← factor do char '*' t ← term return (f * t) +++ return f factor :: Parser Int factor = do d ← digit return (digitToInt d) +++ do char '(' e ← expr char ')' return e

then we try out some examples: Finally, if we define eval :: String → Int eval xs = fst (head (parse expr xs)) then we try out some examples: > eval "2*3+4" 10 > eval "2*(3+4)" 14 > eval "2+5-" 7 > eval "+5-" *** Exception: Prelude.head: empty list