Introduction to Compilers. Jianlin Feng, School of Software, SUN YAT-SEN UNIVERSITY
Computers run 0/1 strings (machine language programs). Example: a machine language program that adds two numbers, in which the first 4 bits of each instruction hold the opcode and the last 12 bits hold the operands. Source: Louden and Lambert's book, Programming Languages.
Programmers write more readable character strings. Example: an assembly language program that adds two numbers, from Louden and Lambert's book.
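The 4-bit opcode / 12-bit operand layout can be illustrated with a small decoder. This is a minimal sketch: the function name `decode` and the sample instruction word are hypothetical and are not taken from the book's example program.

```python
def decode(instruction):
    """Split a 16-bit instruction string into (opcode, operands)."""
    if len(instruction) != 16 or set(instruction) - {"0", "1"}:
        raise ValueError("expected a string of exactly 16 bits")
    opcode = int(instruction[:4], 2)    # first 4 bits
    operands = int(instruction[4:], 2)  # last 12 bits
    return opcode, operands


# Hypothetical instruction word (an assumption, not from the book).
word = "0010" + "000000000100"
opcode, operands = decode(word)
print(opcode, operands)
```

What a given opcode means depends on the machine, so the sketch only separates the two fields and does not interpret them.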
Even more readable character strings: high-level languages. Imperative languages specify HOW: Fortran, ALGOL, PASCAL, C, C++, Java. Declarative languages specify WHAT: SQL, ML, Prolog.
Models of Computation in Languages. Underlying most programming languages is a model of computation. Procedural: Fortran (1957). Functional: Lisp (1958). Object oriented: Simula (1967). Logic: Prolog (1972). Relational algebra: SQL (1974). Source: A. V. Aho, Lectures of Programming Languages and Translators.
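The HOW versus WHAT distinction can be sketched inside a single language. Python is not one of the declarative languages listed above, so the second function only approximates the declarative style; the task (squaring the even numbers) and both function names are made up for illustration.

```python
def evens_squared_imperative(values):
    # Imperative style: spell out HOW, step by step, with explicit state.
    result = []
    for v in values:
        if v % 2 == 0:
            result.append(v * v)
    return result


def evens_squared_declarative(values):
    # Declarative flavour: describe WHAT is wanted and leave the
    # iteration details to the language.
    return [v * v for v in values if v % 2 == 0]


numbers = [3, 8, 1, 12, 7]
print(evens_squared_imperative(numbers))
print(evens_squared_declarative(numbers))
```

Both functions compute the same list; only the way the computation is expressed differs.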
Programming Languages Evolve: Java as an Example. Java 1.0 (1996): object-oriented; the language of choice for internet applet programs. Java 8 (2014): responds to a changing computing background (multicore processors and big-data processing); Java 8 streams support a database-query style of programming, and Java 8 incorporates many ideas from functional programming.
What is a compiler? A compiler is a translator between computers and programmers. More generally, a compiler is a translator between source strings and target strings: between assembly language and Fortran, between Java and Java bytecode, between Java and SQL.
Assembly language vs Fortran Source: Stephen A. Edwards. Lectures of Programming Languages and Translators
The Structure of a Compiler: 1. Lexical Analysis; 2. Syntax Analysis (or Parsing); 3. Semantic Analysis; 4. Intermediate Code Generation; 5. Code Optimization; 6. Code Generation.
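As an illustration of the first phase only, lexical analysis can be sketched with regular expressions. The token names and the tiny token set below are assumptions chosen for the example, not a definition taken from the lecture.

```python
import re

# Hypothetical token classes for a tiny expression language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID", r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("OP", r"[+\-*/]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Turn a character string into a list of (kind, text) tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {text!r} at {match.start()}")
        tokens.append((kind, text))
    return tokens


print(tokenize("x = a + 2 * b"))
```

The later phases consume this token list rather than the raw characters, which is why lexical analysis comes first in the pipeline.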
Translation of an assignment statement (1)
Translation of an assignment statement (2)
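Since the figures for these two slides did not survive as text, the following is only a rough sketch of how an assignment statement might pass through parsing and intermediate code generation. The sample statement, the grammar, and the three-address output format are all assumptions made for illustration; semantic analysis, optimization, and target code generation are omitted.

```python
import re


def tokenize(src):
    # Identifiers, integer literals, and single-character symbols.
    return re.findall(r"[A-Za-z_]\w*|\d+|[=+\-*/()]", src)


class Parser:
    """Recursive-descent parser producing nested tuples (op, left, right)."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, found {tok!r}")
        self.pos += 1
        return tok

    def assignment(self):  # assignment -> ID '=' expr
        target = self.take()
        if not re.fullmatch(r"[A-Za-z_]\w*", target):
            raise SyntaxError(f"cannot assign to {target!r}")
        self.take("=")
        tree = ("=", target, self.expr())
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):  # expr -> term (('+' | '-') term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            node = (self.take(), node, self.term())
        return node

    def term(self):  # term -> factor (('*' | '/') factor)*
        node = self.factor()
        while self.peek() in ("*", "/"):
            node = (self.take(), node, self.factor())
        return node

    def factor(self):  # factor -> ID | NUMBER | '(' expr ')'
        tok = self.take()
        if tok == "(":
            node = self.expr()
            self.take(")")
            return node
        if tok in "+-*/=)":
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok


def gen(node, code, counter):
    """Emit three-address code for an expression tree; return its result name."""
    if isinstance(node, str):
        return node
    op, left, right = node
    lhs, rhs = gen(left, code, counter), gen(right, code, counter)
    counter[0] += 1
    temp = f"t{counter[0]}"
    code.append(f"{temp} = {lhs} {op} {rhs}")
    return temp


def compile_assignment(src):
    _, target, rhs = Parser(tokenize(src)).assignment()
    code = []
    result = gen(rhs, code, [0])
    code.append(f"{target} = {result}")
    return code


# Hypothetical statement chosen for illustration.
for line in compile_assignment("position = initial + rate * 60"):
    print(line)
```

The parser encodes operator precedence in the grammar (multiplication binds tighter than addition), so the multiplication is emitted before the addition.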
Translation of SQL query. Query: SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5. The query can be converted to relational algebra: $\pi_{sname}(\sigma_{bid=100 \wedge rating>5}(Reserves \bowtie Sailors))$. The relational algebra expression converts to a tree in which joins form the branches; here Reserves and Sailors are joined on sid=sid, followed by the selections bid=100 and rating > 5, and then the projection on sname. Each operator has implementation choices. Operators can also be applied in a different order!
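The point that operators can be applied in different orders can be checked on toy data. In this minimal sketch the table rows and the helper names `select`, `project`, and `join` are made up for illustration; real systems use far more efficient join algorithms than the nested loop shown.

```python
# Toy tables; every row is invented for the example.
sailors = [
    {"sid": 1, "sname": "Ann", "rating": 7},
    {"sid": 2, "sname": "Bob", "rating": 3},
    {"sid": 3, "sname": "Cal", "rating": 9},
]
reserves = [
    {"sid": 1, "bid": 100},
    {"sid": 2, "bid": 100},
    {"sid": 3, "bid": 101},
]


def select(rows, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, column):
    """Relational projection onto a single column."""
    return [row[column] for row in rows]


def join(left, right, key):
    """Equi-join on a shared column, written as a simple nested loop."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def plan_select_after_join():
    joined = join(reserves, sailors, "sid")
    kept = select(joined, lambda r: r["bid"] == 100 and r["rating"] > 5)
    return project(kept, "sname")


def plan_select_before_join():
    r = select(reserves, lambda row: row["bid"] == 100)
    s = select(sailors, lambda row: row["rating"] > 5)
    return project(join(r, s, "sid"), "sname")


print(plan_select_after_join())
print(plan_select_before_join())
```

Both plans return the same names, but the second joins smaller inputs, which is the kind of difference a query optimizer exploits.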
Cost-based Query Sub-System. Components: Query Parser; Query Optimizer (Plan Generator, Plan Cost Estimator); Query Executor; Catalog Manager (Schema, Statistics). Input: queries, e.g. Select * From Blah B Where B.blah = blah. Usually there is a heuristics-based rewriting step before the cost-based steps.
Motivating Example. Query: SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5. Plan: Sailors and Reserves are joined on sid=sid using page-oriented nested loops; the selections bid=100 and rating > 5 and the projection on sname follow the join (on the fly). Cost: 500 + 500*1000 I/Os. By no means the worst plan! But it misses several opportunities: selections could be "pushed" down, and no use is made of indexes. Goal of optimization: find faster plans that compute the same answer.
Alternative Plans – Push Selects (No Indexes). Original plan, 500,500 IOs: Sailors and Reserves joined on sid=sid (page-oriented nested loops), with bid=100, rating > 5, and the projection on sname applied on the fly. Alternative plan, 250,500 IOs: Sailors and Reserves joined on sid=sid (page-oriented nested loops), with rating > 5 (on the fly), bid=100 (on the fly), and the projection on sname.
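The two totals on this slide are consistent with the usual cost formula for a page-oriented nested loops join: read the outer relation once and scan the inner relation once per outer page. The sketch below reproduces them under stated assumptions: Sailors occupies 500 pages and is the outer relation, Reserves occupies 1000 pages, and the pushed-down selection rating > 5 keeps half of the Sailors pages. None of these figures appear in the slide text; they are assumptions chosen because they match the totals.

```python
def page_nested_loops_cost(outer_pages, inner_pages):
    """I/Os to scan the outer once plus the inner once per outer page."""
    return outer_pages + outer_pages * inner_pages


def cost_with_outer_selection(outer_pages, inner_pages, selectivity):
    """Same join, but only the outer pages surviving a pushed-down
    selection trigger a scan of the inner relation."""
    surviving_pages = int(outer_pages * selectivity)
    return outer_pages + surviving_pages * inner_pages


SAILORS_PAGES = 500        # assumption, not stated on the slide
RESERVES_PAGES = 1000      # assumption, not stated on the slide
RATING_SELECTIVITY = 0.5   # assumption: half of Sailors have rating > 5

baseline = page_nested_loops_cost(SAILORS_PAGES, RESERVES_PAGES)
pushed = cost_with_outer_selection(SAILORS_PAGES, RESERVES_PAGES, RATING_SELECTIVITY)
print(baseline, pushed)
```

Under these assumptions the selection halves the number of inner scans, so the total drops from 500,500 to 250,500 I/Os while the answer stays the same.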
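Alternative Plans – Push Selects (No Indexes). First plan, 250,500 IOs: Sailors and Reserves joined on sid=sid (page-oriented nested loops), with rating > 5 (on the fly), bid=100 (on the fly), and the projection on sname. Second plan, 4250 IOs: Sailors and Reserves joined on sid=sid (page-oriented nested loops), with bid = 100 (on the fly), rating > 5 (on the fly), and the projection on sname; a "*10" factor also appears in the slide's cost annotation.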