01 – Overview of Compilers
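Figure: Compiler. Compile: the compiler translates the source program into an object program. Execute: the object program reads the program input and produces the program results.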
Figure: Interpreter. The interpreter takes the source program and the program input and executes the program directly to produce the program results.
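Figure: T-diagram notation for a program P written in language L, a machine M, and a translator from source language S to target language T written in language L. The examples involve a sort program, the languages Python, C++, and Java, the JVM, and the machines x86, MIPS, ARM, PowerPC, and SPARC. Two matching rules apply: a program can run on a machine only if the program's language matches the machine (an x86 sort program cannot run on a SPARC machine), and a translator running on machine M can translate program P only if P is written in the translator's source language S and the translator itself is written in M's language.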
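Figure: Compile: a C++-to-x86 compiler running on an x86 machine translates a C++ sort program into an x86 sort program. Execute: the x86 sort program runs on an x86 machine.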
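Figure: Cross-compilation. A C++-to-ARM compiler running on a PowerPC machine translates the C++ sort program into an ARM sort program, which is downloaded to and executed on an ARM machine.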
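Figure: A C++-to-C translator followed by a C-to-x86 compiler is functionally equivalent to a C++-to-x86 compiler; the C++ sort program is translated first into C and then into x86.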
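The matching rules and the two-stage composition above can be modeled in a few lines. This Java sketch is illustrative only: the classes Translator and TDiagrams and their methods are hypothetical names, and languages and machines are both represented as plain strings (so an "x86" machine runs programs written in "x86").

```java
/** Hypothetical model of a translator T-diagram: source S, target T, implementation language L. */
final class Translator {
    final String source;
    final String target;
    final String implementation;

    Translator(String source, String target, String implementation) {
        this.source = source;
        this.target = target;
        this.implementation = implementation;
    }
}

final class TDiagrams {
    /** A program (or translator) runs on a machine only if it is written in that machine's language. */
    static boolean canRun(String language, String machine) {
        return language.equals(machine);
    }

    /** Runs translator t on machine to translate a program; returns the output program's language. */
    static String compile(String programLanguage, Translator t, String machine) {
        if (!canRun(t.implementation, machine)) {
            throw new IllegalArgumentException("translator is written in " + t.implementation + ", not " + machine);
        }
        if (!programLanguage.equals(t.source)) {
            throw new IllegalArgumentException("program is written in " + programLanguage + ", not " + t.source);
        }
        return t.target;
    }

    /** Chains two translators; the result behaves like a direct first.source-to-second.target translator. */
    static Translator compose(Translator first, Translator second) {
        if (!first.target.equals(second.source)) {
            throw new IllegalArgumentException("output of the first stage must be the input of the second");
        }
        if (!first.implementation.equals(second.implementation)) {
            throw new IllegalArgumentException("this sketch requires both stages to run on the same machine");
        }
        return new Translator(first.source, second.target, first.implementation);
    }
}
```

For example, compile("C++", new Translator("C++", "ARM", "PowerPC"), "PowerPC") returns "ARM", mirroring the cross-compilation figure: the output can then run only on an ARM machine.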
Figure: Bootstrapping a C# compiler for machine M.
Want: a C#-to-M compiler that runs on M. Have: a C-to-M compiler that runs on M.
Step 1: Write a compiler for C#/0, a subset of C#, in C. Compile it with the existing C compiler to get a C#/0-to-M compiler that runs on M.
Step 2: Write a C#-to-M compiler in C#/0. Compile it using the compiler from step 1 to get a C#-to-M compiler that runs on M.
Step 3: Rewrite the compiler to improve efficiency and compile it using the existing compiler; the resulting version is more efficient than the previous one.
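Figure: T-diagram notation for an interpreter for language S written in language L, with examples involving Basic, Java, Lisp, x86, the JVM, and x86-64.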
Figure: A Basic sort program executed by a Basic interpreter running on an x86 machine.
Figure: Compile: the Java compiler translates a Java program P into JVM code. Execute: the JVM code runs on the Java Virtual Machine (JVM), which itself runs on an x86 machine.
Figure: You will write this compiler: a CPRL-to-CPRLVM/A compiler written in Java. Use the Java compiler to compile your CPRL compiler; the compiled version of your compiler runs on the JVM on an x86 machine.
Figure: The compiled version of your compiler (running on the JVM on an x86 machine) translates HelloWorld in CPRL into HelloWorld in assembly language (CPRLVM/A). The provided assembler translates that into HelloWorld in CPRLVM machine language, which the provided emulator (CPRLVM, running on the JVM on an x86 machine) executes, printing Hello.
02 – Structure of Compilers
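Figure: Structure of a compiler. The front end consists of the lexical analyzer (a.k.a. scanner), the syntax analyzer (a.k.a. parser), and the constraint analyzer; the back end consists of the code generator and the optimizer. The identifier table and the error handler support these phases, which together translate the source program into the object program.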
Figure: The scanner converts y := x + 1 into the tokens id [“y”, (1, 1)], := [(1, 3)], id [“x”, (1, 6)], + [(1, 8)], and literal [“1”, (1, 10)].
Figure: The parser converts this token sequence into an abstract syntax tree with := at the root, the identifier y as its left child, and + as its right child; the children of + are the identifier x and the literal 1.
Figure: The code generator translates the tree for y := x + 1 into the CPRLVM instructions LDLADDR 16, LDLADDR 12, LOADW, LDCINT 1, ADD, STOREW.
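A tree walk that produces this instruction sequence might look like the following Java sketch. It assumes, as the figure suggests, that y and x have relative addresses 16 and 12; the classes Expr, VarRef, IntConst, Add, and CodeGen are hypothetical and not the course's actual AST or code generator. The target's address is pushed first, the right-hand side is evaluated, and STOREW stores the value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal expression nodes (not the course's AST classes). */
interface Expr {}

final class VarRef implements Expr {
    final String name;
    VarRef(String name) { this.name = name; }
}

final class IntConst implements Expr {
    final int value;
    IntConst(int value) { this.value = value; }
}

final class Add implements Expr {
    final Expr left, right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
}

final class CodeGen {
    private final Map<String, Integer> relativeAddress; // assumed: variable name -> relative address
    private final List<String> code = new ArrayList<>();

    CodeGen(Map<String, Integer> relativeAddress) { this.relativeAddress = relativeAddress; }

    /** Emits code for "target := value". */
    List<String> assignment(String target, Expr value) {
        code.add("LDLADDR " + addressOf(target)); // push the address of the target variable
        emit(value);                              // push the value of the right-hand side
        code.add("STOREW");                       // pop value and address, store the word
        return code;
    }

    private void emit(Expr e) {
        if (e instanceof VarRef) {
            code.add("LDLADDR " + addressOf(((VarRef) e).name)); // push the variable's address
            code.add("LOADW");                                    // replace it with the word stored there
        } else if (e instanceof IntConst) {
            code.add("LDCINT " + ((IntConst) e).value);
        } else if (e instanceof Add) {
            Add a = (Add) e;
            emit(a.left);
            emit(a.right);
            code.add("ADD");
        } else {
            throw new IllegalArgumentException("unknown expression: " + e);
        }
    }

    private int addressOf(String name) {
        Integer address = relativeAddress.get(name);
        if (address == null) throw new IllegalArgumentException("undeclared variable: " + name);
        return address;
    }
}
```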
Figure: The optimizer replaces the instruction pair LDCINT 1, ADD with the single instruction INC, producing LDLADDR 16, LDLADDR 12, LOADW, INC, STOREW.
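The replacement in the figure is a peephole optimization: the optimizer slides a two-instruction window over the code and rewrites a known pattern. This Java sketch handles only that one pattern; the class name Peephole and the representation of instructions as strings are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class Peephole {
    /** Replaces each "LDCINT 1" immediately followed by "ADD" with the single instruction "INC". */
    static List<String> optimize(List<String> code) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < code.size(); i++) {
            boolean matches = i + 1 < code.size()
                    && code.get(i).equals("LDCINT 1")
                    && code.get(i + 1).equals("ADD");
            if (matches) {
                result.add("INC");
                i++; // skip the ADD that was folded into INC
            } else {
                result.add(code.get(i));
            }
        }
        return result;
    }
}
```

Other constants (for example LDCINT 2 followed by ADD) are left unchanged, since INC adds exactly 1.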
03 – Context-Free Grammars
Figure: Nested sets of programs. All possible programs (valid and invalid) contain all programs with valid syntax, which in turn contain all valid programs (valid syntax and satisfying the contextual constraints).
Figure: Syntax diagram for loopStmt, with the terminals while, loop, end, and ; and the nonterminals expression and statements.
Figure: Parse tree for an expression built from x, +, 10, *, and y, using the nonterminals simpleExpression, term, factor, addingOp, multiplyingOp, varId, constValue, and literal.
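Figure: Two parse trees for 2 + 3 * 4 using expr, op, and intLit. In one, multiplication has higher precedence; in the other, addition has higher precedence. Because the same string has two parse trees, this grammar is ambiguous.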
Figure: Parse tree for 2 + 3 * 4 using expr, term, and intLit. Note that “*” has higher precedence than “+” in this grammar.
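One way to see how the layered grammar gives “*” higher precedence is a recursive-descent evaluator with one method per nonterminal. This Java sketch assumes the grammar expr = term { "+" term } and term = intLit { "*" intLit }, which is a reconstruction rather than the figure's exact rules, and it assumes tokens are separated by spaces. The class name ExprEvaluator is hypothetical.

```java
final class ExprEvaluator {
    private final String[] tokens;
    private int pos = 0;

    private ExprEvaluator(String text) { tokens = text.trim().split("\\s+"); }

    static int evaluate(String text) {
        ExprEvaluator e = new ExprEvaluator(text);
        int value = e.expr();
        if (e.pos != e.tokens.length) throw new IllegalArgumentException("unexpected token: " + e.tokens[e.pos]);
        return value;
    }

    // expr = term { "+" term }
    private int expr() {
        int value = term();
        while (pos < tokens.length && tokens[pos].equals("+")) {
            pos++;
            value += term();
        }
        return value;
    }

    // term = intLit { "*" intLit }: products are finished before expr() sees a "+"
    private int term() {
        int value = intLit();
        while (pos < tokens.length && tokens[pos].equals("*")) {
            pos++;
            value *= intLit();
        }
        return value;
    }

    private int intLit() {
        if (pos >= tokens.length) throw new IllegalArgumentException("expected integer literal");
        return Integer.parseInt(tokens[pos++]);
    }
}
```

ExprEvaluator.evaluate("2 + 3 * 4") yields 14, not 20, because term() consumes 3 * 4 before expr() performs the addition.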
Figure: Abstract syntax for binaryExpr, composed of expressions and an operator.
Figure: Abstract syntax for whileStmt, composed of a booleanExpr and statements.
04 – Definition of CPRL
05 – Lexical Analysis
Figure: The scanner reads the characters of the source code file containing y := x + 10 ('y' ' ' ':' '=' ' ' 'x' ' ' '+' ' ' '1' '0') and returns the tokens id [“y”, (1, 1)], := [(1, 3)], id [“x”, (1, 6)], + [(1, 8)], and literal [“10”, (1, 10)].
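The following Java sketch shows how a scanner can group characters into tokens while tracking each token's (line, column) position. It recognizes only identifiers, integer literals, :=, and +, which is just enough for this example; the class MiniScanner and its string output format (chosen to match the figure) are assumptions, not the CPRL scanner's actual interface.

```java
import java.util.ArrayList;
import java.util.List;

final class MiniScanner {
    /** Returns one string per token, e.g. id ["y", (1, 1)]; lines and columns start at 1. */
    static List<String> scan(String source) {
        List<String> tokens = new ArrayList<>();
        int line = 1, column = 1, i = 0;
        while (i < source.length()) {
            char ch = source.charAt(i);
            int startColumn = column; // position of the token's first character
            if (ch == '\n') {
                line++;
                column = 1;
                i++;
            } else if (Character.isWhitespace(ch)) {
                column++;
                i++;
            } else if (Character.isLetter(ch)) {
                int start = i;
                while (i < source.length() && Character.isLetterOrDigit(source.charAt(i))) { i++; column++; }
                tokens.add("id [\"" + source.substring(start, i) + "\", (" + line + ", " + startColumn + ")]");
            } else if (Character.isDigit(ch)) {
                int start = i;
                while (i < source.length() && Character.isDigit(source.charAt(i))) { i++; column++; }
                tokens.add("literal [\"" + source.substring(start, i) + "\", (" + line + ", " + startColumn + ")]");
            } else if (ch == ':' && i + 1 < source.length() && source.charAt(i + 1) == '=') {
                tokens.add(":= [(" + line + ", " + startColumn + ")]");
                i += 2;
                column += 2;
            } else if (ch == '+') {
                tokens.add("+ [(" + line + ", " + startColumn + ")]");
                i++;
                column++;
            } else {
                throw new IllegalArgumentException("unexpected character '" + ch + "' at (" + line + ", " + column + ")");
            }
        }
        return tokens;
    }
}
```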
06 – Syntax Analysis
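Figure: The parser converts the sequence of tokens returned by the scanner (id [“y”, (1, 1)], := [(1, 3)], id [“x”, (1, 6)], + [(1, 8)], literal [“1”, (1, 10)]) into an abstract syntax tree with := at the root, y as its left child, and + as its right child; the children of + are x and 1.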
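A recursive-descent parser for this kind of statement can be sketched in Java as follows. It assumes the tiny grammar assignment = id ":=" operand { "+" operand }, takes the token texts as separate strings rather than full token objects, and returns the tree in a parenthesized prefix form; the class MiniParser and that output format are hypothetical.

```java
final class MiniParser {
    private final String[] tokens;
    private int pos = 0;

    private MiniParser(String[] tokens) { this.tokens = tokens; }

    /** Parses "id := operand { + operand }" and returns the tree, e.g. (:= y (+ x 1)). */
    static String parseAssignment(String... tokens) {
        MiniParser p = new MiniParser(tokens);
        String variable = p.identifier();
        p.match(":=");
        String expression = p.expression();
        if (p.pos != tokens.length) throw new IllegalArgumentException("unexpected token: " + tokens[p.pos]);
        return "(:= " + variable + " " + expression + ")";
    }

    private String expression() {
        String tree = operand();
        while (pos < tokens.length && tokens[pos].equals("+")) {
            pos++;
            tree = "(+ " + tree + " " + operand() + ")"; // left-associative: a + b + 1 is (a + b) + 1
        }
        return tree;
    }

    private String operand() {
        String t = next();
        if (Character.isLetter(t.charAt(0)) || Character.isDigit(t.charAt(0))) return t;
        throw new IllegalArgumentException("expected identifier or literal, found " + t);
    }

    private String identifier() {
        String t = next();
        if (!Character.isLetter(t.charAt(0))) throw new IllegalArgumentException("expected identifier, found " + t);
        return t;
    }

    private void match(String expected) {
        String t = next();
        if (!t.equals(expected)) throw new IllegalArgumentException("expected " + expected + ", found " + t);
    }

    private String next() {
        if (pos >= tokens.length) throw new IllegalArgumentException("unexpected end of input");
        return tokens[pos++];
    }
}
```

Building the + node inside the loop makes repeated additions group to the left, which is the usual convention for arithmetic operators.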
07 – Error Handling
08 – Abstract Syntax Trees
Figure: AST node AssignmentStmt, with children Variable and Expression.
Figure: AST node LoopStmt, with an Expression child and zero or more Statements (*).
Figure: AST node BinaryExpr, with children Expression (leftOperand), Token (operator), and Expression (rightOperand).
Figure: Partial AST class hierarchy with root class AST, showing Program, Declaration, Statement, Expression, ConstDecl, VarDecl, InitialDecl, SubprogramDecl (FunctionDecl, ProcedureDecl), IfStmt, LoopStmt, BinaryExpr, AddingExpr, RelationalExpr, and Literal.
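Figure: AST objects for a small program. Program has a DeclarativePart and a StatementPart. The declarative part contains a SingleVarDecl (identifier: x, varType: Integer, scopeLevel: PROGRAM). The statement part contains an AssignmentStmt (position: (4, 5)) whose variable is a Variable (position: (4, 3)) and whose expression is a ConstValue (literal: 5), and a WritelnStmt containing a NamedValue (decl, position: (5, 11)).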
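The node shapes in these figures translate naturally into Java classes. The sketch below is a simplified, hypothetical version: every node stores its source position as in the object diagram, BinaryExpr keeps leftOperand, operator, and rightOperand (with the operator Token simplified to a String), and LoopStmt treats its expression as optional (null when absent), which is an assumption of this sketch.

```java
import java.util.List;

/** Base class: every node records its source position (line, column). */
abstract class AST {
    final int line, column;
    AST(int line, int column) { this.line = line; this.column = column; }
}

abstract class Expression extends AST {
    Expression(int line, int column) { super(line, column); }
}

abstract class Statement extends AST {
    Statement(int line, int column) { super(line, column); }
}

final class Variable extends Expression {
    final String name;
    Variable(String name, int line, int column) { super(line, column); this.name = name; }
    @Override public String toString() { return name; }
}

final class ConstValue extends Expression {
    final int literal;
    ConstValue(int literal, int line, int column) { super(line, column); this.literal = literal; }
    @Override public String toString() { return Integer.toString(literal); }
}

final class BinaryExpr extends Expression {
    final Expression leftOperand, rightOperand;
    final String operator; // simplified: a real compiler would keep the operator Token
    BinaryExpr(Expression leftOperand, String operator, Expression rightOperand, int line, int column) {
        super(line, column);
        this.leftOperand = leftOperand;
        this.operator = operator;
        this.rightOperand = rightOperand;
    }
    @Override public String toString() { return "(" + leftOperand + " " + operator + " " + rightOperand + ")"; }
}

final class AssignmentStmt extends Statement {
    final Variable variable;
    final Expression expression;
    AssignmentStmt(Variable variable, Expression expression, int line, int column) {
        super(line, column);
        this.variable = variable;
        this.expression = expression;
    }
    @Override public String toString() { return variable + " := " + expression + ";"; }
}

final class LoopStmt extends Statement {
    final Expression whileExpr;       // assumed optional: null when there is no while clause
    final List<Statement> statements; // zero or more statements
    LoopStmt(Expression whileExpr, List<Statement> statements, int line, int column) {
        super(line, column);
        this.whileExpr = whileExpr;
        this.statements = statements;
    }
    @Override public String toString() {
        StringBuilder sb = new StringBuilder();
        if (whileExpr != null) sb.append("while ").append(whileExpr).append(' ');
        sb.append("loop");
        for (Statement s : statements) sb.append(' ').append(s);
        return sb.append(" end loop;").toString();
    }
}
```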
09 – Constraint Analysis
Figure: Stack diagram showing m, n, c, a, and b relative to BP.
10 – CPRLVM
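Figure: CPRLVM memory organization. The program code occupies the low-numbered memory addresses, followed by the variable stack and then free space (unused memory) toward the high-numbered addresses; the stack grows toward the higher addresses. The registers PC, BP, SP, and SB point into this memory.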
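Figure: Memory for a program with variables m, n, and c at relative addresses 0, 4, and 8. The machine instructions for the program (… LDLADDR … HALT) occupy low memory and are followed by the variables, with SB and BP at the start of the variable area, SP at its top, and unused memory beyond. The figure labels memory addresses 87, 88, 92, 96, and 97 and shows PC = 10.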
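To make the instruction sequences from the earlier chapters concrete, here is a toy stack machine in Java that executes them against byte-addressable memory with 4-byte words. It is a simplified, hypothetical sketch rather than the CPRLVM: it assumes LDLADDR pushes BP plus its operand, and for brevity it keeps the operand stack in a separate Java ArrayDeque instead of in the same memory as the variables.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical toy stack machine loosely modeled on the CPRLVM instructions in the figures. */
class MiniVM {
    private final ByteBuffer memory;                         // byte-addressable memory, 4-byte words
    private final Deque<Integer> stack = new ArrayDeque<>(); // simplification: operand stack outside memory
    private final int bp;                                    // base pointer used by LDLADDR

    MiniVM(int memorySize, int bp) {
        this.memory = ByteBuffer.allocate(memorySize);
        this.bp = bp;
    }

    int loadWord(int address) { return memory.getInt(address); }

    void storeWord(int address, int value) { memory.putInt(address, value); }

    void run(List<String> program) {
        for (String line : program) {
            String[] parts = line.trim().split("\\s+");
            switch (parts[0]) {
                case "LDLADDR": stack.push(bp + Integer.parseInt(parts[1])); break; // push BP + offset
                case "LDCINT":  stack.push(Integer.parseInt(parts[1])); break;      // push a constant
                case "LOADW":   stack.push(loadWord(stack.pop())); break;           // address -> word
                case "ADD": {
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(left + right);
                    break;
                }
                case "INC":     stack.push(stack.pop() + 1); break;
                case "STOREW": {
                    int value = stack.pop();   // value was pushed last
                    int address = stack.pop(); // address was pushed first
                    storeWord(address, value);
                    break;
                }
                case "HALT": return;
                default: throw new IllegalArgumentException("unknown instruction: " + parts[0]);
            }
        }
    }
}
```

Running LDLADDR 16, LDLADDR 12, LOADW, LDCINT 1, ADD, STOREW with BP = 0 and the word 41 at address 12 leaves 42 at address 16; the optimized sequence using INC gives the same result.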
11 – Code Generation
12 – Optimization
Figure: A peephole, the small window of consecutive instructions that the optimizer examines.
13 – Subprograms
Figure: AST class hierarchy for subprograms, showing Declaration, Statement, ProcedureCallStmt, ReturnStmt, Expression, FunctionCall, SubprogramDecl, FunctionDecl, ProcedureDecl, and ParameterDecl.
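Figure: Activation record for a subprogram with parameters a and b and no return value part, laid out relative to BP. The parameter part holds a at offset -8 and b at offset -4; BP points to the context part (dynamic link and return address); the local variable part (containing n) and the temporary part follow, with SP at the top. Offsets 4, 8, and 12 and the values 2 and 5 also appear in the figure.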
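The offsets in this layout can be computed mechanically. This Java sketch assumes 4-byte Integer parameters and an 8-byte context part (a 4-byte dynamic link and a 4-byte return address); those sizes and the names FrameLayout, parameterOffsets, and localOffsets are assumptions for illustration. Parameters sit below BP at negative offsets, with the last parameter closest to BP; local variables follow the context part at positive offsets.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FrameLayout {
    static final int CONTEXT_SIZE = 8; // assumption: dynamic link + return address, 4 bytes each

    /** Parameters sit just below BP, so offsets are negative; the last parameter is closest to BP. */
    static Map<String, Integer> parameterOffsets(String[] names, int[] sizes) {
        Map<String, Integer> offsets = new LinkedHashMap<>();
        int offset = 0;
        for (int i = names.length - 1; i >= 0; i--) {
            offset -= sizes[i];
            offsets.put(names[i], offset);
        }
        return offsets;
    }

    /** Local variables follow the context part, so offsets start at CONTEXT_SIZE and increase. */
    static Map<String, Integer> localOffsets(String[] names, int[] sizes) {
        Map<String, Integer> offsets = new LinkedHashMap<>();
        int offset = CONTEXT_SIZE;
        for (int i = 0; i < names.length; i++) {
            offsets.put(names[i], offset);
            offset += sizes[i];
        }
        return offsets;
    }
}
```

With parameters a and b of 4 bytes each, this gives a at -8 and b at -4, matching the figure. A function would additionally need a return value part below its parameters, which this sketch omits.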
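Figure: Runtime memory, drawn with memory address 0 at the top and higher-numbered memory addresses below. The program and subprogram instructions come first (PC points into them); the runtime stack, which grows downward in the figure, starts at SB and holds the program's block followed by the activation records for the first and second calls to P, with BP and SP marking the current activation record and the top of the stack.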
14 – Arrays
Figure: Array assignment under reference semantics (references are copied) versus value semantics (array values are copied).
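Java itself gives arrays reference semantics, so the difference is easy to demonstrate: plain assignment copies only the reference, while clone() copies the elements and so behaves like value semantics. The class ArraySemantics and its method names are hypothetical.

```java
final class ArraySemantics {
    /** Reference semantics: assignment copies the reference, so both names denote one array. */
    static int[] referenceCopyThenModify(int[] a) {
        int[] b = a; // copies the reference only
        b[0] = 99;   // the change is visible through a
        return a;
    }

    /** Value semantics (emulated): clone() copies the elements, so a is unaffected. */
    static int[] valueCopyThenModify(int[] a) {
        int[] b = a.clone(); // copies the array value
        b[0] = 99;           // a is unchanged
        return a;
    }
}
```

After referenceCopyThenModify(new int[] {1, 2, 3}), the original array's first element is 99; after valueCopyThenModify it is still 1.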
Figure: AST class hierarchy for arrays, showing Declaration, Expression, Variable, InitialDecl, ArrayTypeDecl, Type, and ArrayType.
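Figure: Array elements a[0], a[1], a[2], … stored consecutively at addresses a + 0*4, a + 1*4, a + 2*4, …, where a is the address of the array and each element occupies 4 bytes.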
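The address arithmetic in the figure generalizes to address(a[i]) = address(a) + i * elementSize. The following Java sketch computes it; the class ArrayAddressing, its parameters, and the bounds check (which raises an exception for an index outside 0 to length - 1) are additions for illustration rather than a description of how the course's compiler handles indexing.

```java
final class ArrayAddressing {
    /**
     * Address of a[index] when the array starts at baseAddress and each element
     * occupies elementSize bytes (4 in the figure). Valid indices are 0 .. length - 1.
     */
    static int elementAddress(int baseAddress, int index, int elementSize, int length) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + " not in 0.." + (length - 1));
        }
        return baseAddress + index * elementSize;
    }
}
```

For an array of three 4-byte elements starting at address 100, a[0], a[1], and a[2] are at 100, 104, and 108.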