Layered Combinator Parsers with a Unique State
Pieter Koopman, Rinus Plasmeijer
Nijmegen, The Netherlands

Parser Combinators (Pieter Koopman)
Slide 2: Overview
conventional parser combinators
requirements
new combinators
system architecture
new parser combinators
separate scanner and parser
error handling

Slide 3: parser combinators
Non-deterministic, list of results:
    :: Parser s r :== [s] -> [ ParseResult s r ]
    :: ParseResult s r :== ([s],r)
fail and yield:
    fail = \ss = []
    yield r = \ss = [(ss,r)]
recognize a symbol:
    satisfy :: (s->Bool) -> Parser s s
    satisfy f = p
    where p [s:ss] | f s = [(ss,s)]
          p _ = []
    symbol sym :== satisfy ((==) sym)
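
The definitions on this slide can be tried out roughly as follows. This is a minimal sketch, assuming a Clean 2.x system with StdEnv. The module name `basicparsers` and the sample inputs are illustrative assumptions, the lambdas use `->` where the slide writes `=`, and `symbol` is written as a function with an `Eq` constraint instead of a macro.

```clean
module basicparsers          // assumed name; save as basicparsers.icl

import StdEnv

// A parser maps the input to a list of (remaining input, result) pairs.
:: Parser s r      :== [s] -> [ParseResult s r]
:: ParseResult s r :== ([s], r)

// Always fails: the list of results is empty.
fail :: Parser s r
fail = \ss -> []

// Always succeeds with r and consumes no input.
yield :: r -> Parser s r
yield r = \ss -> [(ss, r)]

// Accepts the first symbol if it satisfies the predicate f.
satisfy :: (s -> Bool) -> Parser s s
satisfy f = p
where
    p [s:ss]
        | f s = [(ss, s)]
    p _       = []

// Accepts exactly the given symbol.
symbol :: s -> Parser s s | Eq s
symbol sym = satisfy ((==) sym)

// Three sample runs: a match, a mismatch, and yield.
Start :: ([([Char], Char)], [([Char], Char)], [([Char], Int)])
Start = (symbol 'a' ['abc'], symbol 'a' ['xyz'], yield 42 ['abc'])
```

The first component should hold a single result pairing the rest of the input with `'a'`, the second should be the empty list, and the third should return the input unchanged together with 42.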

Slide 4: parser combinators 2
sequence-combinators:
    ( ) infixr 6 :: (Parser s r) (r->Parser s t) -> Parser s t
    ( ) p1 p2 = \ss1 = [ tuple \\ (ss2,r1) <- p1 ss1, tuple <- p2 r1 ss2 ]
    ( ) infixl 6 :: (Parser s (r->t)) (Parser s r) -> Parser s t
    ( ) p1 p2 = \ss1 = [ (ss3,f r) \\ (ss2,f) <- p1 ss1, (ss3,r) <- p2 ss2 ]
choose-combinator:
    ( ) infixr 4 :: (Parser s r) (Parser s r) -> Parser s r
    ( ) p1 p2 = \ss = p1 ss ++ p2 ss
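
A small sketch of the monadic sequence combinator and the choice combinator in use. The operator symbols are not visible in the slide text, so the names `<&>` and `<|>` below are assumptions chosen for illustration, as are the module name, `digitPair`, and the sample inputs.

```clean
module seqchoice             // assumed name; save as seqchoice.icl

import StdEnv

:: Parser s r :== [s] -> [([s], r)]

yield :: r -> Parser s r
yield r = \ss -> [(ss, r)]

satisfy :: (s -> Bool) -> Parser s s
satisfy f = p
where
    p [s:ss]
        | f s = [(ss, s)]
    p _       = []

// Sequence: the second parser is built from the result of the first.
// The name <&> is an assumption.
(<&>) infixr 6 :: (Parser s r) (r -> Parser s t) -> Parser s t
(<&>) p1 p2 = \ss1 -> [ tuple \\ (ss2, r1) <- p1 ss1, tuple <- p2 r1 ss2 ]

// Choice: collect the results of both alternatives.
// The name <|> is an assumption.
(<|>) infixr 4 :: (Parser s r) (Parser s r) -> Parser s r
(<|>) p1 p2 = \ss -> p1 ss ++ p2 ss

// Two digits in a row, returned as a pair.
digitPair :: Parser Char (Char, Char)
digitPair = satisfy isDigit <&> (\a -> satisfy isDigit <&> (\b -> yield (a, b)))

Start :: ([([Char], (Char, Char))], [([Char], Char)])
Start = (digitPair ['42x'], (satisfy isDigit <|> satisfy isAlpha) ['4a'])
```

Because choice concatenates result lists, a grammar with overlapping alternatives produces several results. That is the non-determinism the later slides try to avoid.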

Slide 5: parser combinators 3
some useful abbreviations:
    infixr 7
    f p :== yield f p
    ( ) infixl 6
    ( ) p1 p2 :== (\h p1 p2

Slide 6: parser combinators 4
Kleene star:
    star p = p star p yield []
    plus p = p star p
parsing an identifier:
    identifier :: Parser Char String
    identifier = satisfy isAlpha star (satisfy isAlphanum)
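
The repetition combinators can be sketched with only `yield`, a sequence operator, and a choice operator. The operator names `<&>` and `<|>` are assumptions (the slide text does not show them), and the slide most likely uses a dedicated cons-combinator where this sketch spells the list construction out by hand.

```clean
module kleene                // assumed name; save as kleene.icl

import StdEnv

:: Parser s r :== [s] -> [([s], r)]

yield :: r -> Parser s r
yield r = \ss -> [(ss, r)]

satisfy :: (s -> Bool) -> Parser s s
satisfy f = p
where
    p [s:ss]
        | f s = [(ss, s)]
    p _       = []

(<&>) infixr 6 :: (Parser s r) (r -> Parser s t) -> Parser s t
(<&>) p1 p2 = \ss1 -> [ tuple \\ (ss2, r1) <- p1 ss1, tuple <- p2 r1 ss2 ]

(<|>) infixr 4 :: (Parser s r) (Parser s r) -> Parser s r
(<|>) p1 p2 = \ss -> p1 ss ++ p2 ss

// Zero or more occurrences of p; longer matches come first in the result list.
star :: (Parser s r) -> Parser s [r]
star p = (p <&> (\r -> star p <&> (\rs -> yield [r:rs]))) <|> yield []

// One or more occurrences of p.
plus :: (Parser s r) -> Parser s [r]
plus p = p <&> (\r -> star p <&> (\rs -> yield [r:rs]))

// A letter followed by letters or digits.
identifier :: Parser Char String
identifier = satisfy isAlpha <&> (\c -> star (satisfy isAlphanum) <&> (\cs -> yield (toString [c:cs])))

// Take the first (longest) of all possible identifiers.
Start :: ([Char], String)
Start = hd (identifier ['ab1 x'])
```

Applied to `['ab1 x']`, `identifier` should return three results, one for each of the prefixes "ab1", "ab" and "a". `hd` selects the longest one.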

Slide 8: parser combinators 5
context sensitive parsers, e.g. twice the same character:
    doubleChar = satisfy isAlpha \c -> symbol c
arbitrary look ahead:
    lookAhead = symbol 'a' +> symbol 'b' symbol 'a' +> symbol 'c'
    star (satisfy isSpace) +> symbol 'a' symbol 'x'
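
A possible runnable reading of the two examples. The operator names `<&>` and `<|>` are assumptions, since only `+>` is visible in the slide text, and `+>` is assumed to run both parsers and keep the result of the second.

```clean
module lookahead             // assumed name; save as lookahead.icl

import StdEnv

:: Parser s r :== [s] -> [([s], r)]

satisfy :: (s -> Bool) -> Parser s s
satisfy f = p
where
    p [s:ss]
        | f s = [(ss, s)]
    p _       = []

symbol :: s -> Parser s s | Eq s
symbol sym = satisfy ((==) sym)

(<&>) infixr 6 :: (Parser s r) (r -> Parser s t) -> Parser s t
(<&>) p1 p2 = \ss1 -> [ tuple \\ (ss2, r1) <- p1 ss1, tuple <- p2 r1 ss2 ]

// Sequence that drops the first result (assumed meaning of +>).
(+>) infixl 6 :: (Parser s r) (Parser s t) -> Parser s t
(+>) p1 p2 = \ss1 -> [ tuple \\ (ss2, _) <- p1 ss1, tuple <- p2 ss2 ]

(<|>) infixr 4 :: (Parser s r) (Parser s r) -> Parser s r
(<|>) p1 p2 = \ss -> p1 ss ++ p2 ss

// Context sensitive: the second parser depends on the character just read.
doubleChar :: Parser Char Char
doubleChar = satisfy isAlpha <&> (\c -> symbol c)

// Both alternatives start with 'a'; the choice is made one symbol later.
lookAhead :: Parser Char Char
lookAhead = (symbol 'a' +> symbol 'b') <|> (symbol 'a' +> symbol 'c')

Start :: ([([Char], Char)], [([Char], Char)], [([Char], Char)])
Start = (doubleChar ['aab'], doubleChar ['ab'], lookAhead ['ac'])
```

`doubleChar` should succeed on `['aab']` and fail on `['ab']`. `lookAhead ['ac']` should succeed through the second alternative after the first one has consumed `'a'` and then failed.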

Slide 9: properties of combinators
+ concise and clear parsers
+ full power of the functional programming language (fpl) available
+ context sensitive
+ arbitrary look-ahead
+ can be efficient: continuations (IFL '98)
- no error handling (messages & recovery)
- no unique symbol tables
- a separate scanner yields problems: it scans the entire file before the parser starts

Slide 10: Requirements
parse state with:
    error file
    notion of position
    user-defined extension, e.g. symbol table
possibility to add a separate scanner
efficient implementation: continuations
for programming languages we want a single result (deterministic grammar)

Slide 11: Uniqueness
files and windows that should be single-threaded are unique in Clean:
    fwritec :: Char *File -> *File
data structures can be updated destructively when they are unique
only unique arrays can be changed
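
To illustrate single-threaded use of a unique file, here is a minimal sketch that writes to the console with `fwritec`. The module name and the helper `writeOk` are assumptions made for this example. `stdio`, `fwritec` and `fclose` come from StdEnv's StdFile.

```clean
module uniquefile            // assumed name; save as uniquefile.icl

import StdEnv

// Each fwritec consumes the unique file and returns a new unique file,
// so the old version can never be used again.
writeOk :: *File -> *File
writeOk file
    # file = fwritec 'o' file
    # file = fwritec 'k' file
    = file

Start :: *World -> *World
Start world
    # (console, world) = stdio world          // open the console as a unique file
    # console          = writeOk console
    # (_, world)       = fclose console world // hand the file back to the world
    = world
```

Using `file` twice, for instance by passing the same version to two `fwritec` calls, should be rejected by the uniqueness type checker. That is the property the parser state on the following slides relies on.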

Slide 12: System-architecture
replace the list of symbols by a structure containing:
    actual input
    position
    error administration
    user-defined part of the state
use a type constructor class to allow multiple levels

Slide 13: Type constructor class
Reading a symbol:
    class PSread ps s st :: (*ps s *st) -> (s, *ps s *st)
Copying the state is not allowed; use functions to manipulate the input:
    class PSsplit ps s st :: (s, *ps s *st) -> (s, *ps s *st)
    class PSback ps s st :: (s, *ps s *st) -> (s, *ps s *st)
    class PSclear ps s st :: (s, *ps s *st) -> (s, *ps s *st)
Minimal parser state (requires Clean 2.0):
    class ParserState ps symbol state | PSread, PSsplit, PSback, PSclear ps symbol state

Slide 14: New parser combinators
Parsers have three arguments:
1. success-continuation: determines the action upon success
    SuccCont :== Item FailCont State -> (Result, State)
2. fail-continuation: specifies what to do if the parser fails
    FailCont :== State -> (Result, State)
3. current input state
    State :== (Symbol, ParserState)

Slide 15: New parser combinators 2
yield and fail apply the appropriate continuation:
    yield r = \succ fail tuple = succ r fail tuple
    failComb = \succ fail tuple = fail tuple
sequence of parsers, change the continuation:
    p1 p2 = \sc fc t -> p1 (\a _ -> p2 a sc fc) fc t
choice, change the continuations:
    ( ) p1 p2 = \succ fail tuple = p1 (\r f t = succ r fail (PSclear t)) (\t2 = p2 succ fail (PSback t2)) (PSsplit tuple)
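
The continuation-passing scheme can be sketched in a simplified setting. This is not the implementation from the slides. The state here is an ordinary (non-unique) list of symbols, so choice can simply restart the second parser on the saved input and needs no `PSsplit`, `PSback` or `PSclear`. All names (`CParser`, `cYield`, `cSeq`, `cOr`, `run`, and so on) are assumptions made for this sketch.

```clean
module contparser            // assumed name; save as contparser.icl

import StdEnv

:: Result r       = Success r | Failure
:: State s        :== [s]                       // simplification: plain list input
:: FailCont s r   :== (State s) -> (Result r, State s)
:: SuccCont s a r :== a (FailCont s r) (State s) -> (Result r, State s)
:: CParser s a r  :== (SuccCont s a r) (FailCont s r) (State s) -> (Result r, State s)

// Succeed: call the success continuation with the result.
cYield :: a -> CParser s a r
cYield a = \sc fc st -> sc a fc st

// Fail: call the fail continuation.
cFail :: CParser s a r
cFail = \sc fc st -> fc st

cSatisfy :: (s -> Bool) -> CParser s s r
cSatisfy f = p
where
    p sc fc [s:ss]
        | f s  = sc s fc ss
    p sc fc st = fc st

// Sequence: the success continuation of p1 runs p2 (as on the slide).
cSeq :: (CParser s a r) (a -> CParser s b r) -> CParser s b r
cSeq p1 p2 = \sc fc st -> p1 (\a _ -> p2 a sc fc) fc st

// Choice: if p1 fails, try p2 on the original input.
cOr :: (CParser s a r) (CParser s a r) -> CParser s a r
cOr p1 p2 = \sc fc st -> p1 sc (\_ -> p2 sc fc st) st

// Run a parser with final continuations that build the result.
run :: (CParser s a a) (State s) -> (Result a, State s)
run p st = p (\a _ rest -> (Success a, rest)) (\rest -> (Failure, rest)) st

Start :: ((Result Char, [Char]), (Result Char, [Char]))
Start = ( run (cSeq (cSatisfy isAlpha) (\c -> cSatisfy ((==) c))) ['aab']
        , run (cOr (cSatisfy isDigit) (cSatisfy isAlpha)) ['x1']
        )
```

Each parser returns exactly one `(Result, State)` pair, so no list of results is built. With a unique state the saved input `st` could not be reused in `cOr`, which is presumably why the slide threads the state through `PSsplit`, `PSback` and `PSclear` instead.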

Slide 16: string input
a very simple instance of ParserState:
    :: *StringInput symbol state =
        { si_string :: String     // string holds input
        , si_pos    :: Int        // index of current char
        , si_hist   :: [Int]      // to remember old positions
        , si_state  :: state      // user-defined extension
        , si_error  :: ErrorState
        }
    instance PSread StringInput Char state where
        PSread si=:{si_string,si_pos} = (si_string.[si_pos], {si & si_pos = si_pos+1})
    instance PSsplit StringInput Char state where
        PSsplit (c,si=:{si_pos,si_hist}) = (c, {si & si_hist = [si_pos:si_hist]})
    instance PSback StringInput Char state where
        PSback (_,si=:{si_string,si_hist=[h:t]}) = (si_string.[h-1], {si & si_pos = h, si_hist = t})

Slide 17: Separate scanner and parser
sometimes it is convenient to have a separate scanner, e.g. to implement the offside rule
the tasks of scanner and parser are similar, so use the same combinators
due to the type constructor class we can nest parser states

Slide 18: a simple scanner
use of combinators doesn't change
produces tokens (algebraic datatype)
    scanner = skipSpace +> ( generateOffsideToken satisfy isAlpha star (satisfy isAlphanum) plus (satisfy isDigit) symbol '=' symbol '(' symbol ')' K CloseToken )

Slide 19: generating offside tokens
use an ordinary parse function
    generateOffsideToken
        = pAcc getCol \col ->          // get current column
          pAcc getOffside \os_col ->   // get offside position
          handleOS col os_col
    where
        handleOS col os_col
            | EndGroupGenerated os_col
                | col < os_col = pApp popOffside (yield EndOfGroupToken)
                = pApp ClearEndGroup failComb
            | col <= os_col = pApp SetEndGroup (yield EndOfDefToken)
            = failComb

Slide 20: Parser state for nesting
parser state contains the scanner and its state
    :: *NestedInput token state = E..ps sym scanState:
        { ni_scanSt  :: (ps sym scanState)
        , ni_scanner :: (ps sym scanState) -> *(token, ps sym scanState)
        , ni_buffer  :: [token]
        , ni_history :: [[token]]
        , ni_state   :: state
        }
can be nested to any depth
we can, but do not have to, use this

Slide 21: Parser state for nesting 2
[Diagram with the labels: NestedInput, *File, *ErrorState, *OffsideState, ScanState, scanner, *HashTable]

Slide 22: Parser state for nesting 3
apply the scanner to read a token
    instance PSread NestedState token state where
        PSread ns=:{ns_scanner, ns_scanSt}
            # (tok, state) = ns_scanner ns_scanSt
            = (tok, {ns & ns_scanSt = state})
here, we ignored the buffer
define instances for the other functions in class ParserState

Slide 23: error handling
general error correction is difficult:
    correct simple errors
    skip to new definition otherwise
good error messages:
    location: position in file
    what are we parsing: stack of contexts
    Error [t.icl,20,[caseAlt,Expression]]: ) expected instead of =

Slide 24: error handling 2
basic error generation:
    parseError expected val = \succ fail (t,ps) =
        let msg = toString expected +++ " expected instead of " +++ toString t
        in succ val fail (PSerror msg (PSread ps))
useful primitives:
    wantSymbol sym = symbol sym parseError sym sym
    want p msg value = p parseError msg value
    skipToSymbol sym = symbol sym parseError sym sym +> star (satisfy ((<>) sym)) +> symbol sym

Slide 25: Parser
Parsing expressions
    pExpression = "Expression" ::>
        match mBasicValue
        pIdentifier
        symbol CaseToken +> pDeter pCompoundExpression star pCaseAlt <+ skipToSymbol EndOfGroupToken
        symbol OpenToken +> pCompoundExpression <+ wantSymbol CloseToken

Slide 26: identifiers in hashtable
use a parse function
the hashtable is the user-defined state in ParserState
    pIdentifier = match mIdentToken \ident = pAccSt (putNameInHashTable ident) \name={app_symb=UnknownSymbol name, app_args=[]}
the function pAccSt applies a function to the user-defined state

Slide 27: limitations of this approach
syntax is specified by parse functions: the grammar is not a data structure
no detection of left recursion: runtime error instead of a nice message
no automatic left-factoring: do it by hand, or accept runtime overhead
    p1 = p q1 p q2
    p2 = p (q1 q2)
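
The left-factoring remark can be illustrated with the list-of-results parsers from the first slides. In this sketch the operator names `+>` (sequence that keeps the second result) and `<|>` (choice), and the parser names `unfactored` and `factored`, are assumptions made for the example.

```clean
module leftfactor            // assumed name; save as leftfactor.icl

import StdEnv

:: Parser s r :== [s] -> [([s], r)]

satisfy :: (s -> Bool) -> Parser s s
satisfy f = p
where
    p [s:ss]
        | f s = [(ss, s)]
    p _       = []

symbol :: s -> Parser s s | Eq s
symbol sym = satisfy ((==) sym)

(+>) infixl 6 :: (Parser s r) (Parser s t) -> Parser s t
(+>) p1 p2 = \ss1 -> [ tuple \\ (ss2, _) <- p1 ss1, tuple <- p2 ss2 ]

(<|>) infixr 4 :: (Parser s r) (Parser s r) -> Parser s r
(<|>) p1 p2 = \ss -> p1 ss ++ p2 ss

// The common prefix (symbol 'a') is parsed once per alternative.
unfactored :: Parser Char Char
unfactored = (symbol 'a' +> symbol 'b') <|> (symbol 'a' +> symbol 'c')

// Left-factored by hand: the common prefix is parsed only once.
factored :: Parser Char Char
factored = symbol 'a' +> (symbol 'b' <|> symbol 'c')

// Both forms accept the same input with the same result.
Start :: Bool
Start = unfactored ['ac'] == factored ['ac']
```

Both parsers return the same results, so `Start` should evaluate to `True`. The difference lies only in how often the common prefix is parsed, which the combinators cannot detect because the grammar is hidden inside functions.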

Slide 28: discussion
old advantages: concise, fpl-power, arbitrary look-ahead, context sensitive
new advantages:
    unique and extendable parser state
    one or more layers
    decent error handling; simple error correction can be added
    still efficient, overhead < 2
    non-determinism only when needed