Extreme Grammaring: Development of an industrial-strength ISO VDM-SL Grammar
Introduction
  Need to build a parser for VDM from a VDM-SL grammar (VooDooM project)
  Although parsing belongs to the well-studied area of compilers, grammars have always been treated as the “ugly duckling”
  Solution: Extreme Programming + Engineering of Grammars = Extreme Grammaring
Background
  VDM
    The Vienna Development Method (VDM) is one of the most mature formal methods
    Primarily intended for formal specification and development of the functional aspects of software systems
  The importance of a VDM-SL grammar
    Documentation
    Building a parser (metric generators, language generators, ...)
Starting point
  Previous work
    VDM-SL grammar in Happy + Alex
  Some problems
    State of the art (hacking vs. engineering)
    Grammar was encoded directly
    Difficult to maintain/change (300 rules)
    Lack of tool support...
Principles of Grammar Engineering
  Introduced by Lämmel in “Towards an engineering discipline for grammaware”
  1. Start from specifications - base-line grammar
  2. Implement by customization - technology, implementation
  3. Separate concerns - modularization
  4. Enable evolution - minimize impact of changes
  5. Ensure quality - metrics, correctness
  6. Automate - traceability and scalability
Extreme Programming
  1. The Planning Game - scope, priorities, technical estimates
  2. Small Releases - very short release cycle
  3. Metaphor - shared story
  4. Simple Design - remove complexity when found
  5. Testing - continuous unit testing, test-first design
  6. Refactoring - restructure without functionality changes
  7. Pair Programming - two programmers, one machine
  8. Collective Ownership - change each other's code (anytime)
  9. Continuous Integration - build and test
  10. 40-Hour Week - work more = produce less
  11. On-Site Customer - user in team
  12. Coding Standards - no irrelevant personal preferences
Extreme Grammaring
  1. The Planning Game - scope, priorities, technical estimates
  2. Small Releases - very short release cycle
  3. Metaphor - shared story
  4. Simple Design - remove complexity when found
  5. Testing - continuous unit testing, test-first design
  6. Refactoring - restructure without functionality changes
  7. Pair Programming - two programmers, one machine
  8. Collective Ownership - change each other's code (anytime)
  9. Continuous Integration - build and test
  10. 40-Hour Week - work more = produce less
  11. On-Site Customer - user in team
  12. Coding Standards - no irrelevant personal preferences
The Planning Game
  Scope
    Strictly follow the ISO VDM-SL grammar specification
  Priorities
    1. Disambiguate types
    2. Disambiguate full grammar
    3. Tree construction
  Technical estimates
    Not defined...
Small Releases
  Planned releases (completed):
    Grammar typed from standard
    Disambiguated grammar
    AST construction
  Future releases
    Haskell front-end (finished)
    Java front-end
Testing
  White box
    Structural testing
    Full visibility into how the system works
  Black box
    Functional or behavioral testing
    Only the external interface is available
Grammar Unit Testing
  Unit test
    Tests a single method, function, etc.
    Different types of unit tests:
      Parsing (succeeds, fails)
      Well-formedness of the tree
  Test suite
    Combination of all unit tests
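The "parsing succeeds / parsing fails" kind of unit test can be sketched as follows. This is only an illustration in Python, not the parse-unit tool: `ParseTest`, `run_suite` and the stand-in parser `toy_parse` are hypothetical names, and the toy language (comma-separated identifiers) is an assumption made to keep the example self-contained.

```python
import re
from typing import Callable, List, NamedTuple


class ParseTest(NamedTuple):
    name: str
    text: str
    should_parse: bool  # True: parsing must succeed; False: it must fail


def toy_parse(text: str) -> bool:
    """Stand-in for a real parser: accepts comma-separated lowercase identifiers."""
    return re.fullmatch(r"\s*[a-z]+(\s*,\s*[a-z]+)*\s*", text) is not None


def run_suite(parse: Callable[[str], bool], tests: List[ParseTest]) -> List[str]:
    """Return the names of the tests whose outcome differs from the expectation."""
    return [t.name for t in tests if parse(t.text) != t.should_parse]


# The test suite is simply the combination of all unit tests.
SUITE = [
    ParseTest("single identifier", "a", True),
    ParseTest("identifier list", "a, b, c", True),
    ParseTest("trailing comma", "a, b,", False),
    ParseTest("empty input", "", False),
]

failures = run_suite(toy_parse, SUITE)
```

Negative tests matter as much as positive ones: a parser that accepts everything passes every "succeeds" test and is only caught by the "fails" tests.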
Test Coverage
  Rule coverage
    Introduced by Purdom (1972)
    Exercises all rules of a grammar
    Simple measure, but it does not cover all cases
  Context-dependent rule coverage
    Introduced by Lämmel in “Grammar Testing”
    Generalization of the above in which the context is taken into account
    No known implementations
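Rule coverage can be computed as the share of grammar rules exercised by at least one test. The sketch below is a minimal Python illustration, not the SdfCoverage tool; the function name, the rule identifiers and the numbers are all made up for the example.

```python
from typing import Dict, Set


def rule_coverage(all_rules: Set[str], used_by_test: Dict[str, Set[str]]) -> float:
    """Percentage of grammar rules exercised by at least one test."""
    if not all_rules:
        return 0.0
    covered = set().union(*used_by_test.values()) & all_rules
    return 100.0 * len(covered) / len(all_rules)


# Hypothetical data: which rules each test input exercised while parsing.
used = {"t1": {"r1", "r2"}, "t2": {"r2"}}

before = rule_coverage({"r1", "r2", "r3", "r4"}, used)
# The same test suite against a grammar that gained one rule.
after = rule_coverage({"r1", "r2", "r3", "r4", "r5"}, used)
```

Because the denominator is the total number of rules, an unchanged test suite reports lower coverage whenever the grammar gains rules.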
Test Coverage Metrics
  KP - Kernel Productions
  KPr - Kernel Priorities
  S - States
  RSa - Rule Size average
  RSm - Rule Size maximum
  RC - Rule Coverage Percentage
  Table columns: version, KP, KPr, S, RSa, RSm, RC (%)
Test Coverage Metrics (2)
  Although the “Generics” test suite does not change, its coverage gets lower because the grammar changes (injections, total number of rules)
  The “expressions” and “functiontypes” test suites were only added in version.
  version   Generics   expressions   functiontypes   All
            %          0 %                           52%
            %          24%           6%              54%
            %          25%           5%              56%
Refactoring
  Semantics-preserving transformations
  Studied by Lämmel in “Grammar Adaptation”
  Operators:
    preserve - replace a phrase by an equivalent one
    fold - replace a phrase by the non-terminal that defines it
    unfold - replace a non-terminal by its definition
    introduce - introduction of non-terminals
    eliminate - elimination of non-terminals
    rename - renaming of non-terminals
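As an illustration of one such operator, the Python sketch below unfolds a non-terminal, that is, replaces its occurrences by its definition. It is not taken from “Grammar Adaptation”; the grammar representation (a dictionary from non-terminal to a list of alternatives) and the restriction to single-alternative definitions are simplifying assumptions.

```python
from typing import Dict, List

# A grammar maps each non-terminal to its alternatives; terminals are quoted.
Grammar = Dict[str, List[List[str]]]


def unfold(grammar: Grammar, nonterminal: str) -> Grammar:
    """Replace every use of `nonterminal` in other rules by its definition."""
    definition = grammar[nonterminal]
    if len(definition) != 1:
        raise ValueError("this sketch only unfolds single-alternative non-terminals")
    body = definition[0]
    result: Grammar = {}
    for lhs, alternatives in grammar.items():
        if lhs == nonterminal:
            # The definition itself is kept unchanged.
            result[lhs] = [list(alt) for alt in alternatives]
            continue
        result[lhs] = [
            [sym for s in alt for sym in (body if s == nonterminal else [s])]
            for alt in alternatives
        ]
    return result


grammar = {
    "A": [["B"], ["C"]],
    "B": [['"b"']],
    "C": [['"c"'], ['"c"', '","', "C"]],
}
unfolded = unfold(grammar, "B")
```

The recognized language is unchanged: only the shape of the rules for `A` differs, which is what makes the transformation a refactoring.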
Continuous Integration
  The integration test suite is a set of generic real-world examples
    Only 52% coverage
  Examples are difficult to find
    Most of the examples use language extensions
  Examples:
    Found on the Internet
    Extracted with a pre-processor from literate programs
Coding Standards
  Nothing found about the subject
  The following can be applied:
    Limit the number of children in a rule
    Limit the number of alternatives in a rule
    Prefer some kinds of constructs over others
    Convention for non-terminal names
    Convention for syntax specification
    Limit module size...
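The first two conventions lend themselves to automatic checking. The Python sketch below is a hypothetical checker: the function name, the grammar representation and the default limits are assumptions chosen for illustration, not values proposed in the talk.

```python
from typing import Dict, List

# A grammar maps each non-terminal to its alternatives; terminals are quoted.
Grammar = Dict[str, List[List[str]]]


def check_conventions(grammar: Grammar, max_children: int = 6,
                      max_alternatives: int = 8) -> List[str]:
    """Report rules that exceed the (hypothetical) size limits."""
    warnings: List[str] = []
    for lhs, alternatives in grammar.items():
        if len(alternatives) > max_alternatives:
            warnings.append(
                f"{lhs}: {len(alternatives)} alternatives (limit {max_alternatives})")
        for alt in alternatives:
            if len(alt) > max_children:
                warnings.append(
                    f"{lhs}: rule with {len(alt)} children (limit {max_children})")
    return warnings


toy = {
    "Stmt": [
        ["Id", '":="', "Expr"],
        ['"if"', "Expr", '"then"', "Stmt", '"else"', "Stmt"],
    ],
}
report = check_conventions(toy, max_children=4, max_alternatives=1)
```

Run as part of the test battery, such a check turns a convention into something that fails visibly instead of relying on review.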
Technological Alternatives
  Most parser technologies are too restrictive
    Lex + Yacc uses LALR(1)
    ANTLR uses LL(k)
  They have other problems, such as:
    Lexical, context-free and abstract syntax are specified separately
    Difficult to disambiguate (left-recursive “demon”)
    Grammars are technology dependent
  Solution: Generalized LR parsing using SDF grammars
Supporting the Methodology
  SDF - Syntax Definition Formalism
    Purely declarative
    Very expressive, with natural and concise syntax
    Modular structure
  Supported by Scannerless Generalized LR Parsing
    Supports compositional grammars
    Allows parsing with ambiguities (allows earlier testing)
    Disambiguation is separated from the grammar using priority and associativity rules
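The idea of parsing with ambiguities first and filtering afterwards can be illustrated with a toy example. The Python sketch below is not how SGLR or SDF filters are implemented: it enumerates every parse tree of `a - b - c` under the ambiguous rule E ::= E "-" E | id and then applies a left-associativity filter as a separate step. All names are hypothetical.

```python
from typing import List, Tuple, Union

# A tree is either an operand or a pair (left operand, right operand) of "-".
Tree = Union[str, Tuple["Tree", "Tree"]]


def all_trees(operands: List[str]) -> List[Tree]:
    """Every parse tree of `a - b - ...` under E ::= E "-" E | id."""
    if len(operands) == 1:
        return [operands[0]]
    trees: List[Tree] = []
    for split in range(1, len(operands)):
        for left in all_trees(operands[:split]):
            for right in all_trees(operands[split:]):
                trees.append((left, right))
    return trees


def is_left_associative(tree: Tree) -> bool:
    """Filter: reject any tree with a subtraction as a right operand."""
    if isinstance(tree, str):
        return True
    left, right = tree
    return isinstance(right, str) and is_left_associative(left)


ambiguous = all_trees(["a", "b", "c"])
filtered = [t for t in ambiguous if is_left_associative(t)]
```

The grammar rule stays untouched; only the filter decides which tree survives, which is why an ambiguous grammar can already be tested before any disambiguation work is done.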
SDF - Technology
  Parsing: sdf2table, sglr
  Testing: parse-unit, ambtracker, SdfCoverage
  Tree visualization: tree2graph, graph2dot
  Transformation: trm2baf, implodePT
  Haskell generation: Sdf2Haskell (AST, pretty printer)
Syntax Definition Formalism
  Optional: "?"
  Repetition: "*", "+"
    Simple, e.g.: Identifier 2+
    With separators, e.g.: { Identifier "," }+
  Alternative: "|"
SDF - Example
  BNF:
    A ::= B | C
    B ::= "b"
    C ::= "c" | "c" "," C
  SDF:
    B -> A
    C -> A
    "b" -> B
    { "c" "," }+ -> C
  Both grammars recognize a single "b" or a list of "c" separated by ","
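To make the recognized language concrete, here is a small hand-written recognizer in Python. It is only a sketch: it works on a list of tokens (an assumption, since SDF parsing is scannerless) and the function name is hypothetical.

```python
from typing import List


def recognizes(tokens: List[str]) -> bool:
    """A ::= B | C, with B ::= "b" and C ::= "c" | "c" "," C."""
    if tokens == ["b"]:
        return True
    # C: "c" at even positions, "," at odd positions, ending on a "c",
    # so the number of tokens must be odd.
    if len(tokens) % 2 == 0:
        return False
    return all(tok == ("c" if i % 2 == 0 else ",") for i, tok in enumerate(tokens))
```

For example, `recognizes(["c", ",", "c"])` is accepted, while `recognizes(["c", ","])` and `recognizes(["b", ",", "b"])` are rejected.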
Setting up the bases
  Hard copy of the ISO VDM-SL standard (ISO/IEC )
  Initial test suite
    Real-world examples (loc: 1970)
    Exercises from Formal Methods course (loc: 507)
  Software:
    CVS to keep track of all changes
    parse-unit (SDF unit testing tool)
    Sdf2 software bundle (sdf2table, sglr)
    SdfCoverage
    Standard Unix tools (text editor, make, ...)
Development cycle
  1. Initial grammar
  2. Correction
    1. Correct grammar rules
    2. Correct test suite
  3. Disambiguation
    1. Add filters
    2. Change grammar shape
  Steps 2 and 3 should make heavy use of testing
Grammar correction
  1. Isolate problem
    Source location
    Grammar rules involved
  2. Correct grammar
    Change syntax (test suite)
    Run to verify test succeeds
    Run entire test battery
  3. Commit change
    Document change in message
Grammar Disambiguation
  1. Isolate problem
    Source location
    Grammar rules involved
  2. Create unit test
    Captures error
    Run to guarantee this
  3. Correct grammar
    Add disambiguation filter (change syntax)
    Run to verify unit test succeeds
    Run entire test battery
  4. Commit change
    Document change in message
Grammar Metrics
  Simple metrics
    Total number of terminals (average per rule)
    Total number of non-terminals (average per rule)
  Complex metrics
    Introduced by Malloy in “A metrics suite for grammar-based software”
    McCabe cyclomatic complexity
    Halstead effort...
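The simple metrics can be sketched in a few lines of Python. The assumptions here are mine: terminals are the quoted symbols, counts are occurrences on right-hand sides, and "rule" means a single alternative. The function and key names are hypothetical.

```python
from typing import Dict, List

# A grammar maps each non-terminal to its alternatives; terminals are quoted.
Grammar = Dict[str, List[List[str]]]


def is_terminal(symbol: str) -> bool:
    return symbol.startswith('"')


def simple_metrics(grammar: Grammar) -> Dict[str, float]:
    """Count terminal and non-terminal occurrences, in total and per rule."""
    productions = [alt for alts in grammar.values() for alt in alts]
    terminals = sum(is_terminal(s) for alt in productions for s in alt)
    nonterminals = sum(not is_terminal(s) for alt in productions for s in alt)
    count = len(productions)
    return {
        "productions": count,
        "terminals": terminals,
        "nonterminals": nonterminals,
        "avg_terminals": terminals / count if count else 0.0,
        "avg_nonterminals": nonterminals / count if count else 0.0,
    }


grammar = {
    "A": [["B"], ["C"]],
    "B": [['"b"']],
    "C": [['"c"'], ['"c"', '","', "C"]],
}
metrics = simple_metrics(grammar)
```

Tracked per grammar version, such numbers make it visible whether a correction or disambiguation step grew or shrank the grammar.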
Problems found
  The ISO document has ambiguities in its specification
  Syntax
    Expressions:
      Apply vs. RecordConstructor
      Apply vs. IsDefTypeExpr
    EqualsDefinition CallStatement vs. Expression
  Lexical
    Quotes are allowed in strings and in characters
Future plans
  Short-term (VDM parser “clients”):
    VooDooM
    Formal methods projects
    MCZ
    Objectifier
    Camila revival?
  Long-term
    Topic open for discussion...
What’s next?
  Test set completion (cover the remaining 44%)
    Test generation
    Add examples manually
  Analyze the rules that were not covered
    Try to find pathologies
  Compute grammar metrics
  Test the methodology by developing other grammars
Conclusion
  Work was completed in only 3 weeks
  A complete grammar of ISO VDM-SL is publicly available for the first time (parser)
  A strong methodology for grammar development was defined
  Grammar testing was put into practice
    Different types of tests
    Test coverage
Thank you!
Questions / Discussion