Open Source Compiler Construction (for the JVM)

Tom Lee
Shine Technologies
OSCON Java 2011

Overview

- About the Java Virtual Machine
- About Scala
- About Apache BCEL
- A Generalized Compiler Architecture
- Introducing “Awesome”
- Writing a Stub Code Generator for “Awesome”
- Abstract Syntax Trees (ASTs)
- Parsing “Awesome” with Scala
- What Next?

About the Java Virtual Machine

- Stack-based machine.
  - Operands are pushed onto a stack and operated upon by opcodes.
- Heavily influenced by “Java The Language”.
- Allows interop between various source languages.
- JVM bytecode is architecture-independent.
- Multiple implementations.
  - Oracle / HotSpot
  - Apache / Harmony
  - OpenJDK
  - Many more...
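
To make the stack model concrete, here is a minimal sketch in Scala. The `Op` types and the `run` function are hypothetical illustration names, not part of the JVM or of any library; they only mimic how opcodes such as `iadd` and `imul` pop their operands and push a result.

```scala
object StackMachineSketch {
  // Hypothetical mini instruction set, loosely modelled on JVM opcodes
  // such as bipush, iadd and imul. These are not real JVM classes.
  sealed trait Op
  case class Push(value: Int) extends Op
  case object Add extends Op
  case object Mul extends Op

  // Runs the program against an operand stack (head of the list = top of
  // the stack) and returns whatever is on top when the program ends.
  def run(program: List[Op]): Int = {
    val finalStack = program.foldLeft(List.empty[Int]) {
      case (stack, Push(v))      => v :: stack
      case (b :: a :: rest, Add) => (a + b) :: rest
      case (b :: a :: rest, Mul) => (a * b) :: rest
      case (stack, op)           => sys.error(s"stack underflow at $op with $stack")
    }
    finalStack.head
  }
}
```

For example, `run(List(Push(1), Push(2), Push(3), Mul, Add))` evaluates `1 + 2 * 3` and returns 7.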

About Scala

- http://www.scala-lang.org/
- Functional/OOP hybrid programming language.
  - First class functions
  - Pattern matching
  - Classes & traits
  - Etc.
- Runs on the JVM.
  - And .NET.
- Own standard library (on top of the Java standard library).

About Apache BCEL

- http://jakarta.apache.org/bcel/
- “Bytecode Engineering Library”
- Emit JVM bytecode with a reasonably straightforward API.
- Also supports reading class files, modifying classes, etc.
  - But we won't use that stuff here.

A Generalized Compiler Architecture

- Scanner matches patterns in source code, outputs tokens.
- Parser organizes tokens into an Abstract Syntax Tree (AST).
  - Or a Parse Tree.
  - Semantic checks or optimizations of the AST may occur here.
- Code Generator traverses the AST to produce target code.
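
As a rough illustration of the first stage, the sketch below turns source text into tokens with a regular expression. The `Token` types and `scan` function are hypothetical names chosen for this example; the sketch also silently skips characters it does not recognize, where a real scanner would report an error.

```scala
object ScannerSketch {
  // Hypothetical token types for a tiny arithmetic language.
  sealed trait Token
  case class NumberToken(text: String) extends Token
  case class SymbolToken(text: String) extends Token

  // A token is either a run of digits or a single operator/semicolon.
  private val tokenPattern = """[0-9]+|[-+*/;]""".r

  // Finds every match of the token pattern, left to right, and wraps
  // each one in the matching token type. Whitespace never matches the
  // pattern, so it is dropped.
  def scan(source: String): List[Token] =
    tokenPattern.findAllIn(source).toList.map { text =>
      if (text.head.isDigit) NumberToken(text) else SymbolToken(text)
    }
}
```

The parser stage would then consume this token list and build a tree from it.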

Introducing “Awesome”

    program    ::= (expression ';')*
    expression ::= sum
    sum        ::= product (('+' | '-') expression)?
    product    ::= number (('*' | '/') expression)?
    number     ::= /[0-9]+/

- The “Awesome” compiler will generate JVM bytecode to display each parsed expression.
- (Awesome, eh?)

Writing a Stub Code Generator for “Awesome”

- Work backwards and write a stub code generator first.
  - Immediate feedback!
  - A solid foundation on which to build the rest of the compiler.
- Generate a “Hello World” class file with BCEL.
- We'll make the code generator do “real” stuff later.
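
A stub generator along these lines might look like the sketch below. It is not the code from the talk: it assumes a BCEL 5.2-style API on the classpath (`org.apache.bcel.Constants`, `InstructionConstants`), and the class name `HelloAwesome` and object name `StubCodeGenSketch` are hypothetical.

```scala
import org.apache.bcel.Constants
import org.apache.bcel.classfile.JavaClass
import org.apache.bcel.generic._

object StubCodeGenSketch {
  val ClassName = "HelloAwesome" // hypothetical name for the generated class

  // Builds a class whose static main method prints "Hello World".
  def buildClass(): JavaClass = {
    val cg = new ClassGen(ClassName, "java.lang.Object", "<generated>",
      Constants.ACC_PUBLIC | Constants.ACC_SUPER, null)
    val cp = cg.getConstantPool
    val il = new InstructionList
    val factory = new InstructionFactory(cg)

    // System.out.println("Hello World"); return;
    il.append(factory.createGetStatic("java.lang.System", "out",
      new ObjectType("java.io.PrintStream")))
    il.append(new PUSH(cp, "Hello World"))
    il.append(factory.createInvoke("java.io.PrintStream", "println",
      Type.VOID, Array[Type](Type.STRING), Constants.INVOKEVIRTUAL))
    il.append(InstructionConstants.RETURN)

    // public static void main(String[] args)
    val mg = new MethodGen(Constants.ACC_PUBLIC | Constants.ACC_STATIC,
      Type.VOID, Array[Type](new ArrayType(Type.STRING, 1)), Array("args"),
      "main", ClassName, il, cp)
    mg.setMaxStack()
    mg.setMaxLocals()
    cg.addMethod(mg.getMethod)
    il.dispose()
    cg.getJavaClass
  }

  // Writes HelloAwesome.class into the current directory.
  def main(args: Array[String]): Unit =
    buildClass().dump(ClassName + ".class")
}
```

After running the generator, `java HelloAwesome` in the same directory should execute the emitted class. The generated class has no constructor, which is acceptable here because only the static `main` method is ever invoked.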

Abstract Syntax Trees (ASTs)

- A logical, in-memory representation of the source program.
- Constructed by the parser.
- Semantic checks and optimizations possible at the AST level.
  - Outside the scope of this presentation, sorry!
- Let's add the beginnings of an AST to our compiler.
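
One plausible shape for such an AST in Scala is a sealed trait with case classes, sketched below. The names `Expr`, `Num`, `BinOp`, `show` and `emit` are assumptions made for this example, not the talk's actual code. `iadd`, `isub`, `imul` and `idiv` are real JVM mnemonics, while `push` is a placeholder for whichever constant-loading opcode a real generator would choose.

```scala
object AstSketch {
  // Hypothetical AST for arithmetic expressions.
  sealed trait Expr
  case class Num(value: Int) extends Expr
  case class BinOp(op: Char, left: Expr, right: Expr) extends Expr

  // Renders an expression back to fully parenthesized source text.
  def show(expr: Expr): String = expr match {
    case Num(v)          => v.toString
    case BinOp(op, l, r) => s"(${show(l)} $op ${show(r)})"
  }

  // Post-order walk: emit code for both operands, then the operator.
  // This is the order a stack machine needs.
  def emit(expr: Expr): List[String] = expr match {
    case Num(v) => List(s"push $v")
    case BinOp(op, l, r) =>
      val opcode = op match {
        case '+'   => "iadd"
        case '-'   => "isub"
        case '*'   => "imul"
        case '/'   => "idiv"
        case other => sys.error(s"unknown operator: $other")
      }
      (emit(l) ++ emit(r)) :+ opcode
  }
}
```

For the tree `BinOp('+', Num(1), BinOp('*', Num(2), Num(3)))`, `show` returns `(1 + (2 * 3))` and `emit` returns `List("push 1", "push 2", "push 3", "imul", "iadd")`.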

Parsing “Awesome” with Scala

- Use parser combinators.
  - Combine small parsing functions together to describe a language.
  - They cover both the scanner and the parser stages of the generalized compiler architecture.
  - Describe languages using something of a pseudo-EBNF.
- You can write your own parsers too.
  - e.g. for string literals.
- We'll use built-in parsers to identify integers and semicolons.

The “Awesome” Grammar Revisited

    program    ::= (expression ';')*
    expression ::= sum
    sum        ::= product (('+' | '-') expression)?
    product    ::= number (('*' | '/') expression)?
    number     ::= /[0-9]+/
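
The grammar maps almost line for line onto Scala parser combinators. The sketch below is an assumption about how that could look, not the talk's actual code. It relies on `scala.util.parsing.combinator.RegexParsers`, which shipped with the standard library in Scala 2.9 and 2.10 and is a separate `scala-parser-combinators` module from 2.11 onward. The AST names and `parseProgram` are hypothetical.

```scala
import scala.util.parsing.combinator.RegexParsers

object AwesomeParserSketch extends RegexParsers {
  // Hypothetical AST types for this example.
  sealed trait Expr
  case class Num(value: Int) extends Expr
  case class BinOp(op: String, left: Expr, right: Expr) extends Expr

  // program ::= (expression ';')*
  def program: Parser[List[Expr]] = rep(expression <~ ";")

  // expression ::= sum
  def expression: Parser[Expr] = sum

  // sum ::= product (('+' | '-') expression)?
  def sum: Parser[Expr] = product ~ opt(("+" | "-") ~ expression) ^^ {
    case left ~ Some(op ~ right) => BinOp(op, left, right)
    case left ~ None             => left
  }

  // product ::= number (('*' | '/') expression)?
  def product: Parser[Expr] = number ~ opt(("*" | "/") ~ expression) ^^ {
    case left ~ Some(op ~ right) => BinOp(op, left, right)
    case left ~ None             => left
  }

  // number ::= /[0-9]+/
  def number: Parser[Expr] = """[0-9]+""".r ^^ { digits => Num(digits.toInt) }

  // Parses a whole program, returning the error message on failure.
  def parseProgram(source: String): Either[String, List[Expr]] =
    parseAll(program, source) match {
      case Success(exprs, _)  => Right(exprs)
      case failure: NoSuccess => Left(failure.msg)
    }
}
```

With this sketch, `parseProgram("1 + 2 * 3;")` yields `Right(List(BinOp("+", Num(1), BinOp("*", Num(2), Num(3)))))`. Because both `sum` and `product` recurse through `expression`, the grammar read literally groups `2 * 3 + 4` as `2 * (3 + 4)`, which is worth keeping in mind when extending it.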

What Next?

- Variables?
- Conditional logic?
- For/while loops?
- Function calls?

Summary

- Write the code generator first for instant gratification.
- Use Scala parser combinators to build an AST from your input.
- Update the code generator to walk the AST, emitting equivalent bytecode for each node using BCEL.
- Iterate & add new features!

That's All, Folks!

All code from this presentation will be available from http://github.com/thomaslee/oscon2011-awesome

Shine Technologies
- www: http://www.shinetech.com
- github: shinetech
- twitter: @realshinetech

Tom Lee
- www: http://tomlee.co
- email: me@tomlee.co
- github: thomaslee
- twitter: @tglee