Exploring and visualizing Generalized LL parsing Name: Bram Cappers “An algorithm must be seen in order to be believed” - D. Knuth
Agenda
- Preliminaries
- Motivation and research questions
- Exploring GLL
- Visualizing GLL
  - Concept
  - Difficulties
- Conclusions
Department of Mathematics and Computer Science, 05/08/2014
Parsing
Parsing: notation
- Terminal
- Nonterminal
- Grammar
- Production rule
- Alternates
- Input string
- Parse tree
Top-down parsing: notation
- Partial derivation
- Derivation
Generalized parsing
- Supports a larger class of grammars: context-free grammars
- Returns multiple derivations: one for every possible interpretation
- Why?
  - Grammar modularity
  - Perform disambiguation outside the grammar
Motivation and research questions
- Generalized LL (GLL) parsing:
  - Top-down parser that returns multiple derivations
  - Introduced by E. Scott and A. Johnstone
  - Discovers derivations in parallel
- GLL parsers:
  - Clear code structure
  - Easy to generate automatically
  - Efficient
- Object-Oriented GLL (OOGLL):
  - Improves the extensibility of the parser
Motivation and research questions (2)
- Generalized LL (GLL) parsing is a relatively new technology
- Several mechanisms are not implemented yet:
  - Error reporting/recovery
  - Support for the EBNF format
- Requires knowledge about the control flow
- Steep learning curve:
  - Control flow is abstracted by means of "descriptors"
  - Difficult to distinguish derivations
  - The introduction of shared data structures blurs the relationship with LL parsing
Research questions
- How can we obtain a GLL parser by starting from a traditional LL parser?
  - Identify the core problems of LL parsing vs. GLL
- Can we explain the control flow of a GLL parser by means of visualization techniques?
  - How does a GLL parser work?
Introduction: LL, a recognizer
Consider: S ::= a S a | B; B ::= b;
- One function per production rule
- Use the call stack to store where the parser must continue
- One-to-one correspondence between grammar and code
Grammar slots: S ::= • a S a, S ::= a • S a, S ::= a S • a, S ::= • B, S ::= B •, B ::= • b, B ::= b •
Introduction: LL, a recognizer (trace)
Consider: S ::= a S a | B; B ::= b; with input string aba
- Start in S with an empty stack.
- Choose S ::= a S a and match a; call S, pushing S ::= a S • a on the stack.
- In the nested S, choose S ::= B; call B, pushing S ::= B • on the stack.
- B matches b and returns; S ::= B • is popped.
- The nested S returns; S ::= a S • a is popped and the final a is matched.
- The stack is empty and the input is consumed: aba is recognized, with parse tree S(a, S(B(b)), a).
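The recognizer traced above can be sketched in a few lines of Python. This is a minimal sketch, not the code from the talk: the names `Recognizer`, `parse_S`, `parse_B` and `recognize` are made up for illustration, and the alternate of S is chosen with one symbol of lookahead, which works here because the two alternates start with different terminals.

```python
class Recognizer:
    """Recursive-descent recognizer for S ::= a S a | B ; B ::= b (illustrative)."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        # "$" stands for end of input (an assumption of this sketch).
        return self.text[self.pos] if self.pos < len(self.text) else "$"

    def match(self, terminal):
        if self.peek() != terminal:
            raise SyntaxError(f"expected {terminal!r} at position {self.pos}")
        self.pos += 1

    def parse_S(self):
        if self.peek() == "a":   # S ::= . a S a
            self.match("a")
            self.parse_S()       # the call stack remembers slot S ::= a S . a
            self.match("a")
        else:                    # S ::= . B
            self.parse_B()       # the call stack remembers slot S ::= B .

    def parse_B(self):
        self.match("b")          # B ::= . b


def recognize(text):
    recognizer = Recognizer(text)
    try:
        recognizer.parse_S()
    except SyntaxError:
        return False
    return recognizer.pos == len(text)   # all input must be consumed
```

Each function body mirrors the right-hand side of its production rule, and the "where to continue" information lives only in the host language's call stack.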
Issues of LLRD parsing
1. No support for left recursion
  - Example: E ::= E "+ 1" | "1"
  - The call stack does not support or detect cycles
  - Stack: E ::= E • "+ 1", E ::= E • "+ 1", ...
2. No support for non-determinism
  - Can only return one derivation
  - Can only store one stack
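The left-recursion issue can be reproduced with a naive translation of E ::= E "+ 1" | "1" into a function. This is only a sketch under assumptions: `parse_E` and `loops_forever` are hypothetical names, and Python's recursion limit stands in for the call stack overflowing.

```python
def parse_E(text, pos):
    """Naive translation of E ::= E "+ 1" | "1"; returns the end position."""
    # Alternate 1: E ::= . E "+ 1". The recursive call is made before any
    # input is consumed, so it repeats with exactly the same arguments.
    end = parse_E(text, pos)
    if text.startswith("+ 1", end):
        return end + 3
    # Alternate 2: E ::= . "1" is never reached.
    if text.startswith("1", pos):
        return pos + 1
    raise SyntaxError("no alternate matches")


def loops_forever(text):
    """Report whether parsing exhausted the call stack (illustrative helper)."""
    try:
        parse_E(text, 0)
    except RecursionError:
        return True
    return False
```

The call stack keeps pushing the same return slot E ::= E • "+ 1" at the same input position and has no way of noticing the cycle.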
Step 1: Take explicit control
Solve issue 1:
- Do not use the call stack to store grammar slots; replace it with our own data structure
- Manually handle the control flow
- Split the code into one function per grammar slot, e.g. S ::= • a S a, S ::= a • S a, S ::= a S • a
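One possible shape of this step, for the grammar S ::= a S a | B; B ::= b, is sketched below. It is an illustration under assumptions rather than the talk's code: the function name `recognize_explicit`, the slot labels and the pseudo-slot `"return"` are invented here, and each branch of the dispatch loop stands in for one function per grammar slot.

```python
def recognize_explicit(text):
    """LL recognizer for S ::= a S a | B ; B ::= b with an explicit stack."""
    stack = []    # our own stack of return slots, replacing the call stack
    pos = 0
    slot = "S"    # label of the code fragment to run next
    while True:
        tok = text[pos] if pos < len(text) else "$"
        if slot == "S":                    # pick an alternate of S
            slot = "S ::= . a S a" if tok == "a" else "S ::= . B"
        elif slot == "S ::= . a S a":      # tok is known to be "a" here
            pos += 1
            stack.append("S ::= a S . a")  # where to continue after the call
            slot = "S"
        elif slot == "S ::= a S . a":
            if tok != "a":
                return False
            pos += 1
            slot = "return"
        elif slot == "S ::= . B":
            stack.append("S ::= B .")
            slot = "B ::= . b"
        elif slot == "S ::= B .":
            slot = "return"
        elif slot == "B ::= . b":
            if tok != "b":
                return False
            pos += 1
            slot = "return"
        else:                              # "return": pop and continue there
            if not stack:
                return pos == len(text)
            slot = stack.pop()
```

Because the stack is now ordinary data, it can be inspected, copied or shared, which the next steps rely on.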
Step 2: Support multiple derivations
Solve issue 2:
- GLL: "parallelization of LLRD parsers"
- Concept: summarize every derivation as a triple (1, 2, 3)
  1. The function (grammar slot) where the parser left off
  2. The stack corresponding to the derivation
  3. The position up to which the input string has been parsed
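The triple idea can be sketched as a worklist of (slot, stack, position) entries, where every alternate is tried instead of choosing one. The sketch assumes the non-left-recursive grammar S ::= A S d | a S | ɛ; A ::= a that appears later in the talk; the encoding of a slot as (nonterminal, alternate index, dot position) and the name `recognize_triples` are assumptions of this example. Each triple carries its own complete stack, so nothing is shared yet.

```python
GRAMMAR = {
    "S": [["A", "S", "d"], ["a", "S"], []],   # S ::= A S d | a S | epsilon
    "A": [["a"]],                             # A ::= a
}


def recognize_triples(text, start="S"):
    """Explore all derivations as (slot, stack, position) triples."""
    todo = [((start, i, 0), (), 0) for i in range(len(GRAMMAR[start]))]
    seen = set(todo)

    def add(triple):
        if triple not in seen:        # never process the same triple twice
            seen.add(triple)
            todo.append(triple)

    while todo:
        (head, alt, dot), stack, pos = todo.pop()
        body = GRAMMAR[head][alt]
        if dot == len(body):          # end of an alternate: return
            if stack:
                add((stack[-1], stack[:-1], pos))
            elif pos == len(text):
                return True           # start symbol done, input consumed
            continue
        symbol = body[dot]
        after = (head, alt, dot + 1)
        if symbol in GRAMMAR:         # nonterminal: push return slot, try all
            for i in range(len(GRAMMAR[symbol])):
                add(((symbol, i, 0), stack + (after,), pos))
        elif text.startswith(symbol, pos):    # terminal: consume it
            add((after, stack, pos + len(symbol)))
    return False
```

With a left-recursive grammar this version would still not terminate, because the per-derivation stacks would grow without bound; that is the problem the shared stack structure addresses.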
GSS (graph-structured stack)
- Stacks have a lot in common
- Combine multiple stacks into one data structure
- Replace the stack in the triple with a GSS node
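How one node can stand for several stacks is sketched below. The class `GSSNode`, the helper `stacks` and the example labels are illustrative assumptions; the helper only works on acyclic graphs, whereas a real GSS may contain cycles.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class GSSNode:
    label: str                                   # return slot stored here
    parents: List["GSSNode"] = field(default_factory=list)


def stacks(node):
    """List every stack (top first) that this node represents (acyclic only)."""
    if not node.parents:
        return [[node.label]]
    return [[node.label] + rest
            for parent in node.parents
            for rest in stacks(parent)]


# Two derivations that differ only in the bottom part of their stacks
# share a single top node.
bottom = GSSNode("$")
via_first = GSSNode("S ::= A S . d", [bottom])
via_second = GSSNode("S ::= a S .", [bottom])
shared_top = GSSNode("S ::= A S . d", [via_first, via_second])
```

A triple that refers to `shared_top` covers both stacks at once, so work done from that node is done once for both derivations.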
Descriptor
- Descriptor: a triple with a GSS node
- A "job description" in which one or more derivations are interested
- Example: (S ::= • a S a, S ::= a S • a, aba)
  - To parse: the "a S" of "a S a", starting from aba
  - Upon success, continue with S ::= a S • a
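Putting descriptors and the GSS together gives a GLL-style recognizer. The sketch below is a simplified interpretation in the spirit of Scott and Johnstone's algorithm, not the OOGLL implementation: it interprets the grammar instead of generating code, it only recognizes (no derivations are built), and all names (`recognize_gll`, `call`, `pop`, `add`) are assumptions. It uses the left-recursive grammar E ::= E "+ 1" | "1" from the issues slide.

```python
from collections import defaultdict

GRAMMAR = {"E": [["E", "+ 1"], ["1"]]}   # E ::= E "+ 1" | "1"


def recognize_gll(text, start="E"):
    root = ("$", 0)               # bottom GSS node
    edges = defaultdict(set)      # GSS node -> parent nodes
    popped = defaultdict(set)     # GSS node -> positions where it was popped
    todo, seen = [], set()        # descriptors to process / already created
    accepted = False

    def add(slot, node, pos):
        descriptor = (slot, node, pos)
        if descriptor not in seen:
            seen.add(descriptor)
            todo.append(descriptor)

    def call(nonterminal, return_slot, node, pos):
        child = (return_slot, pos)            # GSS node: return slot + position
        if node not in edges[child]:
            edges[child].add(node)
            for end in list(popped[child]):   # callee already finished earlier
                add(return_slot, node, end)
        for i in range(len(GRAMMAR[nonterminal])):
            add((nonterminal, i, 0), child, pos)

    def pop(node, pos):
        nonlocal accepted
        if node == root:
            accepted = accepted or pos == len(text)
            return
        popped[node].add(pos)
        return_slot, _ = node
        for parent in list(edges[node]):
            add(return_slot, parent, pos)

    for i in range(len(GRAMMAR[start])):
        add((start, i, 0), root, 0)
    while todo:
        (head, alt, dot), node, pos = todo.pop()
        body = GRAMMAR[head][alt]
        if dot == len(body):                  # end of alternate
            pop(node, pos)
        elif body[dot] in GRAMMAR:            # nonterminal
            call(body[dot], (head, alt, dot + 1), node, pos)
        elif text.startswith(body[dot], pos):  # terminal
            add((head, alt, dot + 1), node, pos + len(body[dot]))
    return accepted
```

The left-recursive call creates a GSS node with an edge to itself instead of an ever-growing stack, and the `seen` set ensures each descriptor is processed only once, which is why this version terminates where the recursive-descent translation does not.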
Why do we want to explore/visualize generalized parsing?
- GLR/GLL discover derivations in parallel
  - Difficult to distinguish derivations
- Manual control flow through descriptors
  - Descriptors are generated everywhere
  - Difficult to reason about why descriptors are created at certain positions
  - Large number of descriptors
Why GLL?
- OOGLL is suitable to realize the visualization
- Conceptually easier to understand
  - Top-down parsing is less complex to visualize
Goal of the visualization
- Step-by-step visualization of GLL
- Reduce the learning curve as much as possible
  - Educational purposes
  - Explanation of LL vs. GLL parsing
  - Speed up the development of GLL extensions
- In particular:
  - Visualize how the data structures evolve over time
  - Explain why and when descriptors are created
Visualizing GLL parsing
Grammar: S ::= A S d | a S | ɛ; A ::= a
Metaphor: grammar "walking"
- Algorithm: walker
- Derivation: "tree"
- Stack: "path from walker to root"
- Start: start symbol of the grammar
- Destination: point without substitutions
Visualizing GLL parsing (2)
- Summarize derivations in one tree
  - Improves scalability
- Descriptor: "point of continuation" for one or more derivations
  - Sitting walker: descriptor still to be processed
Visualizing sharing
- Walking routes in parallel
  - One walker per derivation
- Detecting sharing
  - Same input position
  - Same walker position
- Postponed sharing: descriptor processed "too soon"
  - Notify the user with a message
Descriptors paradox
- Visualizing a descriptor: an independent chunk of work in which one or more derivations are interested
- Visualizing control flow: show the progress of the derivations discovered so far
- How to properly visualize a descriptor if not all derivations have been discovered?
Difficulties
- Reversion technology
  - Parallel changes over multiple views are hard to follow
  - Control the algorithm with "tape-recording" functionality
- Ambiguities
  - Visualize the overlap in derivations
- Level of detail
  - Many special cases in the GLL algorithm
  - Dynamic behaviour of descriptors over time
  - Descriptors paradox
Conclusions on visualization
Advantages:
- Simple concept to understand
- Clear relationship between LL and GLL
- Concept is applicable to top-down parsing in general
Disadvantages:
- Visualization at runtime is insufficient to visualize descriptors consistently
- Not suitable for highly ambiguous grammars