Introduction
Traditional view of a compiler
- a tool to translate high-level (imperative) code into optimized machine code
- a large, complicated black box; details for experts only
- focus on parsing (historically) and on optimization (recently)
- examples: gcc, llvm, ...
Traditional reasons to study compilers
- application of theory: regular expressions, context-free grammars, graph-based optimization algorithms, type theory, ...
- interesting algorithms that are applicable elsewhere: pattern matching, natural language understanding, ...
- a challenging software development project
- helps in understanding programming languages: why does C not have nested functions? "It is good for you."
Why really study compilers today? Many tools use "little" languages!

A Graphviz dot description of a finite-state machine:

    digraph finite_state_machine {
        rankdir=LR;
        size="8,5"
        node [shape = doublecircle]; LR_0 LR_3 LR_4 LR_8;
        node [shape = circle];
        LR_0 -> LR_2 [ label = "SS(B)" ];
        LR_0 -> LR_1 [ label = "SS(S)" ];
        LR_1 -> LR_3 [ label = "S($end)" ];
        LR_2 -> LR_6 [ label = "SS(b)" ];
        LR_2 -> LR_5 [ label = "SS(a)" ];
        LR_2 -> LR_4 [ label = "S(A)" ];
        LR_5 -> LR_7 [ label = "S(b)" ];
        LR_5 -> LR_5 [ label = "S(a)" ];
        LR_6 -> LR_6 [ label = "S(b)" ];
        LR_6 -> LR_5 [ label = "S(a)" ];
        LR_7 -> LR_8 [ label = "S(b)" ];
        LR_7 -> LR_5 [ label = "S(a)" ];
        LR_8 -> LR_6 [ label = "S(b)" ];
        LR_8 -> LR_5 [ label = "S(a)" ];
    }

A Makefile:

    all: hello
    hello: main.o factorial.o hello.o
    	g++ main.o factorial.o hello.o -o hello
    main.o: main.cpp
    	g++ -c main.cpp
    factorial.o: factorial.cpp
    	g++ -c factorial.cpp
    hello.o: hello.cpp
    	g++ -c hello.cpp
    clean:
    	rm -rf *o hello

You might write a "little" compiler!
Why really study compilers today? Many tools use program analysis techniques!
- smell detection
- program verification
- software visualization
You might write a program analysis!
Why really study compilers today? Many tools use program transformation!
- model-driven engineering
- program verification
- refactoring
You might write a program transformation!
Compiler-like tools: programs can be executed in different ways.
- Source Program -> Interpreter
- Source Program -> Bytecode Compiler -> Bytecode -> Bytecode Interpreter
- Source Program -> Compiler -> Machine Language
Compiler-like tools: programs can be processed in different ways.
- Source Program -> Interpreter
- Source Program -> Analyzer
- Source Program -> Transpiler -> Target Program
Note:
- transpiler: translates from one high-level language to another (source-to-source translator)
- cross-compiler: runs on one machine but generates code for another machine
Compilers are split into phases: Source Program -> Compiler -> Machine Language.
- analysis (front end): lexical analysis ("scanning"), syntax analysis ("parsing"), contextual analysis ("semantic analysis")
- synthesis (back end): intermediate code generation ("middle end"), object code generation
Bad example of phase dependency: C variable declarations [need to check]
=> Phases simplify compiler structure
Compilers are split into many phases. We will look at the highlighted (red) phases in some detail. [Appel, 2002]
Phases interact via data structures.
- text -> lexical analysis -> tokens
- tokens -> syntax analysis -> (abstract) syntax tree
- AST -> contextual analysis -> decorated AST + symbol table
- decorated AST -> intermediate code generation -> intermediate code
- intermediate code -> synthesis -> object code
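As a minimal sketch of the first phase interface (token kinds and names are my own, not from the slides), lexical analysis can be seen as a function from program text to a token list:

```python
import re

def lex(text):
    """Lexical analysis: program text -> token stream (kind, lexeme)."""
    spec = [("NUM", r"\d+"), ("ID", r"[A-Za-z_]\w*"),
            ("OP", r"[-+*/=]"), ("WS", r"\s+")]
    pattern = "|".join(f"(?P<{kind}>{rx})" for kind, rx in spec)
    return [(m.lastgroup, m.group())
            for m in re.finditer(pattern, text)
            if m.lastgroup != "WS"]   # whitespace is dropped, not passed on

tokens = lex("x = 1 + 2")
# tokens == [('ID', 'x'), ('OP', '='), ('NUM', '1'), ('OP', '+'), ('NUM', '2')]
```

The token list is exactly the data structure handed from the scanner to the parser.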
Compiler construction methods
- stepwise construction
- composition
- cross-compilation
- bootstrapping
- compiler compilers: tools for compiler generation
  - scanner (regular expressions)
  - parser (context-free grammars)
  - attribute evaluation (attribute grammars)
  - code generator (code templates, tree patterns)
  - interpreter (formal semantics)
T-diagrams
T-diagrams abstractly visualize compilers. Write [S -> T]_I for a compiler with
- S: source language
- T: target language
- I: implementation language (must eventually be a machine language M)
T-diagrams compose: applying an [I -> J]_M compiler (running on M) to an [S -> T]_I compiler yields the same compiler implemented in J:
    [S -> T]_I + [I -> J]_M = [S -> T]_J
Composition
Given T1, an I -> M compiler in M, construct T2 as an S -> M compiler in M.
Solution:
- construct T0 as an S -> M compiler in I
- apply T1 to T0:
    [S -> M]_I + [I -> M]_M = [S -> M]_M
      (T0)         (T1)         (T2)
Cross compilation
Given T1, an S -> M compiler in S, and T2, an S -> N compiler in N, construct T3 as an S -> M compiler in M.
Solution:
- apply T2 to T1 (running on N), which yields a cross-compiler:
    [S -> M]_S + [S -> N]_N = [S -> M]_N
- apply the cross-compiler to T1 again (also on N):
    [S -> M]_S + [S -> M]_N = [S -> M]_M
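The composition rule behind these constructions can be modeled in a few lines of Python (the representation and names are my own, not from the slides): a T-diagram is a (source, target, implementation) triple, and running one compiler on another rewrites the implementation language.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class T:
    """A T-diagram: a compiler from `src` to `tgt`, implemented in `impl`."""
    src: str
    tgt: str
    impl: str

def apply(tool: T, program: T) -> T:
    """Run `tool` on `program`: the tool must read the language the
    program is implemented in, and translates it to the tool's target."""
    assert tool.src == program.impl, "tool cannot read this implementation language"
    return T(program.src, program.tgt, tool.tgt)

# The cross-compilation construction from the slide:
T1 = T("S", "M", "S")    # S -> M compiler written in S
T2 = T("S", "N", "N")    # S -> N compiler running on N
cross = apply(T2, T1)    # S -> M compiler running on N (a cross-compiler)
T3 = apply(cross, T1)    # S -> M compiler running on M
```

Evaluating this model confirms the slide: `cross` is [S -> M]_N and `T3` is [S -> M]_M.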
Bootstrapping
Course Organization
Learning objectives
- knowledge of the basic terms and concepts
- understanding of commonly used methods
- experience with compiler construction tools
- ability to learn new techniques as they emerge
- understanding of the relation between language design and implementation
You should be able to implement a program analysis or transformation tool!
Lecture Schedule (I)
Introduction
- compiler structure and phases, bootstrapping
- course organization
Lexical Analysis
- regex -> NFA -> DFA -> minimal DFA
- the lexing process
- grep, awk, lex/JFlex, Antlr, quex (not table driven), vlex (visualization)
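To illustrate the NFA -> DFA step of the schedule, here is a minimal sketch of the subset construction in Python (the NFA encoding and all names are my own, not course material):

```python
def eclose(states, nfa):
    """Epsilon-closure: all NFA states reachable via epsilon ("") moves."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in nfa.get(s, {}).get("", set()) - seen:
            seen.add(t)
            stack.append(t)
    return frozenset(seen)

def subset_construction(nfa, start, accepting, alphabet):
    """Subset construction: each DFA state is a set of NFA states."""
    d0 = eclose({start}, nfa)
    dfa, todo = {}, [d0]
    while todo:
        S = todo.pop()
        if S in dfa:
            continue
        dfa[S] = {}
        for a in alphabet:
            moved = set()
            for s in S:
                moved |= nfa.get(s, {}).get(a, set())
            dfa[S][a] = eclose(moved, nfa)
            todo.append(dfa[S][a])
    finals = {S for S in dfa if S & accepting}
    return dfa, d0, finals

# NFA for (a|b)*ab: loop on a/b in state 0, guess "a" into state 1, then "b".
nfa = {0: {"a": {0, 1}, "b": {0}}, 1: {"b": {2}}}
dfa, start, finals = subset_construction(nfa, 0, {2}, "ab")

def accepts(word):
    state = start
    for c in word:
        state = dfa[state][c]
    return state in finals
```

The resulting DFA is deterministic but not necessarily minimal; minimization is the next step in the pipeline.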
Lecture Schedule (II)
Parsing I: LL methods
- left-factorization, left-recursion elimination
- JavaCC, Antlr
Parsing II: LR methods
- LR(0), SLR(1), LALR(1), LR(1)
- yacc
Parsing III: Other parsing methods
- parser combinators
- GLL, (S)GLR, CYK, Tomita, Earley
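To make the LL idea concrete, here is a small recursive-descent parser/evaluator sketch in Python, with left recursion already eliminated (the grammar E -> T (('+'|'-') T)*, T -> F ('*' F)*, F -> num | '(' E ')' and all names are my own, not from the course material):

```python
import re

def parse(src):
    """Recursive-descent (LL) parsing and evaluation of arithmetic."""
    toks = re.findall(r"\d+|[-+*()]", src) + ["$"]
    pos = 0
    def peek():
        return toks[pos]
    def eat(expected=None):
        nonlocal pos
        tok = toks[pos]
        assert expected is None or tok == expected, f"expected {expected}, got {tok}"
        pos += 1
        return tok
    def expr():                      # E -> T (('+'|'-') T)*
        v = term()
        while peek() in ("+", "-"):
            op = eat()
            v = v + term() if op == "+" else v - term()
        return v
    def term():                      # T -> F ('*' F)*
        v = factor()
        while peek() == "*":
            eat("*")
            v *= factor()
        return v
    def factor():                    # F -> num | '(' E ')'
        if peek() == "(":
            eat("(")
            v = expr()
            eat(")")
            return v
        return int(eat())
    v = expr()
    eat("$")                         # the whole input must be consumed
    return v
```

The `while` loops are exactly what left-recursion elimination buys: the left-recursive rule E -> E '+' T becomes iteration instead of infinite recursion.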
Lecture Schedule (III)
Abstract Syntax Trees
- Antlr tree building templates
Pretty printing
- box language
Symbol tables and name binding
- NaBL
Attribute evaluation
- inherited vs. synthesized attributes
- yacc, Antlr
Language embeddings and IDEs
- staging, meta-programming
- Spoofax, MPS
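A minimal sketch of the inherited/synthesized distinction (the AST encoding and names are my own): `depth` is an inherited attribute, passed down from the parent, while `value` is a synthesized attribute, computed bottom-up from the children.

```python
def evaluate(node, depth=0):
    """Evaluate attributes over an AST of nested tuples like ('+', l, r)."""
    if isinstance(node, int):                 # leaf: a literal
        return {"depth": depth, "value": node}
    op, left, right = node
    l = evaluate(left, depth + 1)             # inherited: depth flows downward
    r = evaluate(right, depth + 1)
    value = (l["value"] + r["value"] if op == "+"
             else l["value"] * r["value"])    # synthesized: value flows upward
    return {"depth": depth, "value": value}

attrs = evaluate(("+", 1, ("*", 2, 3)))       # attrs["value"] == 7
```

Tools differ in what they support: yacc only handles synthesized attributes directly, which is why the distinction matters in practice.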
Lecture Schedule (IV)
Runtime data organization
Intermediate code representation and generation
- stack machines, three-address code, IR trees, code generation templates
Machine code generation
- IR tree tiling, register allocation
- BURG-like tools
Optimization
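As a sketch of code generation for a stack machine (the instruction set and encoding are my own), a post-order walk of the AST emits instructions, and a small loop executes them on an operand stack:

```python
def gen(ast, code):
    """Emit stack-machine code for a nested-tuple AST like ('+', l, r)."""
    if isinstance(ast, int):
        code.append(("PUSH", ast))
    else:
        op, left, right = ast
        gen(left, code)            # post-order: operands first,
        gen(right, code)
        code.append(("ADD",) if op == "+" else ("MUL",))  # then the operator
    return code

def run(code):
    """A tiny stack-machine interpreter."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if instr[0] == "ADD" else a * b)
    return stack.pop()

# (1 + 2) * 3
code = gen(("*", ("+", 1, 2), 3), [])
result = run(code)
```

The same post-order walk with fresh temporaries instead of a stack would yield three-address code, the other intermediate representation named on the slide.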
Course organization
- slides and other material posted on the course web page (yes, to be done...)
- weekly lectures (1-3 hours), incl. discussions about papers, problems, solutions, ...
- reading assignments: original research papers, background material
- "learning by doing": install and use tools
Evaluation: continuous assessment
- no final exam
- regular assignments ("every week")
- each assignment counts the same
- mostly practical
Textbooks (I)
- A. V. Aho, M. S. Lam, R. Sethi & J. D. Ullman, Compilers: Principles, Techniques, and Tools (2nd edition), Addison-Wesley, 2007. In-depth textbook.
- Andrew W. Appel, Modern Compiler Implementation in Java (2nd edition), Cambridge University Press, 2002. More practical approach.
- S. Muchnick, Advanced Compiler Design and Implementation, Morgan Kaufmann, 1997. Focus on optimization techniques.
Textbooks (II)
- N. Wirth, Compiler Construction, Addison-Wesley, 1996. Classical approach to building compilers manually. Available at www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf
- R. Mak, Writing Compilers and Interpreters (3rd edition), Wiley, 2009. Object-oriented approach to building compilers manually.
Textbooks (III)
- M. Voelter, DSL Engineering, 2013. Modern view of software language engineering. Available at www.dslbook.org
- M. Fowler, Domain-Specific Languages, Addison-Wesley, 2010. UML-y approach to software language engineering.
- T. Parr, The Definitive ANTLR Reference: Building Domain-Specific Languages, The Pragmatic Bookshelf, 2007. Handbook for a widely used tool.