Published by Dorothy Lynn Crawford. Modified over 9 years ago.
1
Introduction CPSC 388 Ellen Walker Hiram College
2
Why Learn About Compilers? Practical application of important computer science theory Ties together computer architecture and programming Useful tools for developing language interpreters –Not just programming languages!
3
Computer Languages Machine language –Binary numbers stored in memory –Bits correspond directly to machine actions Assembly language –A “symbolic face” for machine language –Line-for-line translation High-level language (our goal!) –Closer to human expressions of problems, e.g. mathematical notation
4
Assembler vs. HLL
Assembler:
Ldi $r1, 2 -- put the value 2 in R1
Sto $r1, x -- store that value in X
HLL:
X = 2;
5
Characteristics of HLLs Easier to learn (and remember) Machine independent –No knowledge of architecture needed –… as long as there is a compiler for that machine!
6
Early Milestones FORTRAN (Formula Translation) –IBM (John Backus) 1954-1957 –First widely used high-level language, with the first optimizing compiler Chomsky Hierarchy (1950s) –Formal description of natural language structure –Ranks languages according to the complexity of their grammar
7
Chomsky Hierarchy Type 3: Regular languages –Too simple for programming languages –Good for tokens, e.g. numbers Type 2: Context Free languages –Standard representation of programming languages Type 1: Context Sensitive Languages Type 0: Unrestricted
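The gap between Type 3 and Type 2 can be made concrete with a small sketch. The number pattern below is an illustrative assumption (the slides do not define one), and `is_number` and `is_balanced` are hypothetical helper names. A regular expression is enough for a numeric token, while nested brackets need unbounded memory, which a stack supplies here.

```python
import re

# Illustrative token pattern (an assumption, not from the slides):
# an unsigned integer with an optional fractional part.
NUMBER = re.compile(r"[0-9]+(\.[0-9]+)?")


def is_number(text):
    """Regular language: a regular expression is sufficient."""
    return NUMBER.fullmatch(text) is not None


def is_balanced(text):
    """Nested brackets: context-free but not regular, so a stack is used."""
    pairs = {")": "(", "]": "["}
    stack = []
    for ch in text:
        if ch in "([":
            stack.append(ch)
        elif ch in pairs:
            # A closing bracket must match the most recent opening bracket.
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack
```

This is why scanners are typically built from regular expressions while parsers need a grammar: expression nesting cannot be tracked with a fixed, finite number of states.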
8
Another View of the Hierarchy
[Figure: nested sets, with RL inside CFL inside CSL]
9
Formal Language & Automata Theory Machines that recognize each language class –Turing Machine (computable languages) –Push-down Automaton (context-free languages) –Finite Automaton (regular languages) Use machines to prove that a given language belongs to a class Formally prove that a given language does not belong to a class
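As a rough illustration of a finite automaton recognizing a regular language, the sketch below simulates a table-driven DFA for unsigned numbers such as 42 or 3.14. The state names, the transition table, and the function names are all assumptions made for this example.

```python
# Hypothetical DFA: states "start", "int", "dot", "frac".
# A missing table entry means the automaton rejects.
TRANSITIONS = {
    ("start", "digit"): "int",
    ("int", "digit"): "int",
    ("int", "dot"): "dot",
    ("dot", "digit"): "frac",
    ("frac", "digit"): "frac",
}
ACCEPTING = {"int", "frac"}


def char_class(ch):
    """Map a character to the input symbol used in the transition table."""
    if ch in "0123456789":
        return "digit"
    if ch == ".":
        return "dot"
    return "other"


def accepts(text):
    """Run the DFA over text and report whether it ends in an accepting state."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return False
    return state in ACCEPTING
```

Membership in the class is shown by constructing the machine; the state set is fixed in advance, which is exactly the limitation that keeps such a machine from tracking arbitrary nesting.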
10
Practical Applications of Theory Translate from grammar to formal machine description Implement the formal machine to parse the language Tools: –Scanner Generator (RL / FA): LEX, FLEX –Parser Generator (CFL / PDA): YACC, Bison
11
Beyond Parsing Code generation Optimization –Techniques to “mindlessly” improve code –Usually after code generation –Rarely “optimal”, simply better
12
Phases of a Compiler Scanner -> tokens Parser -> syntax tree Semantic Analyzer -> annotated tree Source code optimizer -> intermediate code Code generator -> target code Target code optimizer -> better target code
13
Additional Tables Symbol table –Tracks all variable names and other symbols that will have to be mapped to addresses later Literal table –Tracks literals (such as numbers and strings) that will have to be stored along with the eventual program
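A minimal sketch of the two tables, assuming a simple in-memory design. The class names, the `intern` method, and the record fields are hypothetical; real compilers store considerably more per entry (scope, address, and so on).

```python
class SymbolTable:
    """Maps each identifier to an attribute record, in first-seen order."""

    def __init__(self):
        self._entries = {}

    def intern(self, name):
        """Return the record for name, creating it the first time it is seen."""
        if name not in self._entries:
            self._entries[name] = {
                "name": name,
                "index": len(self._entries),
                "type": None,  # filled in later, e.g. by semantic analysis
            }
        return self._entries[name]

    def __len__(self):
        return len(self._entries)


class LiteralTable:
    """Stores each distinct literal once and returns its slot index."""

    def __init__(self):
        self._slots = []
        self._index = {}

    def intern(self, value):
        # Key on the type as well, so that 4 and 4.0 get separate slots.
        key = (type(value).__name__, value)
        if key not in self._index:
            self._index[key] = len(self._slots)
            self._slots.append(value)
        return self._index[key]
```

Interning `a`, `j`, and `a` again would leave two symbol records; interning the literal 4 twice returns the same slot both times.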
14
Scanner Read a stream of characters Perform lexical analysis to generate tokens Update symbol and literal tables as needed Example: Input: a[j] = 4 + 2 Tokens: ID Lbrack ID Rbrack EQL NUM PLUS NUM
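The scanning step can be sketched with Python's `re` module. The token names follow the slide; the regular expressions, the `SKIP` pseudo-token for whitespace, and the `scan` function itself are assumptions made for illustration, and table updates are left out to keep the sketch short.

```python
import re

# Token names follow the slide; the patterns are assumptions.
TOKEN_SPEC = [
    ("NUM", r"[0-9]+"),
    ("ID", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("Lbrack", r"\["),
    ("Rbrack", r"\]"),
    ("EQL", r"="),
    ("PLUS", r"\+"),
    ("SKIP", r"\s+"),  # whitespace is matched but not reported
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def scan(source):
    """Turn a character string into a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {source[pos]!r} at position {pos}"
            )
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling `scan` on an array assignment of the same shape as the slide's example yields the token kinds ID Lbrack ID Rbrack EQL NUM PLUS NUM, in that order.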
15
Parser Performs syntax analysis Relates the sequence of tokens to the grammar Builds a tree that represents this relationship, the parse tree
16
Partial Grammar
assign-expr -> expr = expr
array-expr -> ID [ expr ]
expr -> array-expr
expr -> expr + expr
expr -> ID
expr -> NUM
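A hedged sketch of a hand-written parser for this grammar. As written, the rule `expr -> expr + expr` is left-recursive and ambiguous, so the sketch swaps in an equivalent iterative form (a term followed by zero or more `+ term`), which makes addition left-associative. The nested-tuple tree, the `(kind, text)` token format, and all function names are assumptions, and the result is a compact tree closer to an abstract syntax tree than to a full parse tree.

```python
def parse(tokens):
    """Parse a list of (kind, text) tokens into a nested-tuple tree."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else "EOF"

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, found {peek()} at token {pos}")
        text = tokens[pos][1]
        pos += 1
        return text

    def term():
        # term -> NUM | ID | ID [ expr ]
        if peek() == "NUM":
            return ("NUM", expect("NUM"))
        name = expect("ID")
        if peek() == "Lbrack":
            expect("Lbrack")
            index = expr()
            expect("Rbrack")
            return ("array-expr", ("ID", name), index)
        return ("ID", name)

    def expr():
        # expr -> term { + term }, replacing the left-recursive rule
        left = term()
        while peek() == "PLUS":
            expect("PLUS")
            left = ("add-expr", left, term())
        return left

    def assign_expr():
        # assign-expr -> expr = expr
        target = expr()
        expect("EQL")
        value = expr()
        return ("assign-expr", target, value)

    tree = assign_expr()
    if peek() != "EOF":
        raise SyntaxError(f"unexpected {peek()} at token {pos}")
    return tree
```

A parser generator such as YACC or Bison would instead accept the grammar more or less as written, with precedence and associativity declarations resolving the ambiguity.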
17
Example Parse
[Figure: parse tree rooted at assign-expression; node labels include expression, array-expression, add-expression, ID, NUM, and the tokens [ ] = +]
18
Abstract Syntax Tree
[Figure: abstract syntax tree rooted at assign-expression; node labels include expression, array-expression, add-expression, ID, NUM]
19
Semantic Analyzer Determine the meaning (not structure) of the program This is “compile-time” or static semantics only Example: a[j] = 4 + 2 –a refers to an array location –a contains integers –j is an integer –j is in the range of the array (not checked in C) Parse or Syntax tree is “decorated” with this information
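The static checks listed above can be sketched as a recursive walk that computes the type each tree node would be decorated with. The nested-tuple tree shape, the `DECLARED` dictionary standing in for symbol-table information, and the function name `type_of` are assumptions for this example. As in C, the sketch does not check that the index is within the array's range.

```python
# Hypothetical declarations: a is an array of int, j is an int.
DECLARED = {"a": ("array", "int"), "j": "int"}


def type_of(node, declared=DECLARED):
    """Return the static type of a tree node, raising TypeError on misuse."""
    kind = node[0]
    if kind == "NUM":
        return "int"
    if kind == "ID":
        if node[1] not in declared:
            raise TypeError(f"undeclared identifier {node[1]!r}")
        return declared[node[1]]
    if kind == "array-expr":
        base = type_of(node[1], declared)
        index = type_of(node[2], declared)
        if not (isinstance(base, tuple) and base[0] == "array"):
            raise TypeError("subscripted value is not an array")
        if index != "int":
            raise TypeError("array index must be an integer")
        return base[1]  # element type of the array
    if kind == "add-expr":
        left = type_of(node[1], declared)
        right = type_of(node[2], declared)
        if left != "int" or right != "int":
            raise TypeError("operands of + must be integers")
        return "int"
    if kind == "assign-expr":
        target = type_of(node[1], declared)
        value = type_of(node[2], declared)
        if target != value:
            raise TypeError(f"cannot assign {value} to {target}")
        return target
    raise ValueError(f"unknown node kind {kind!r}")
```

A fuller analyzer would attach these types to the tree nodes rather than only returning them, and would also verify that the assignment target is something that can be assigned to.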
20
Source Code Optimizer Simplify and improve the source code by applying rules –Constant folding: replace “4+2” by 6 –Combine common sub-expressions –Reordering expressions (often prior to constant folding) –Etc. Result: modified, decorated syntax tree or Intermediate Representation
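Constant folding can be sketched as a bottom-up tree rewrite. The nested-tuple tree (for example `("add-expr", ("NUM", "4"), ("NUM", "2"))`) and the function name `fold` are assumptions for illustration; only integer addition is handled.

```python
def fold(node):
    """Return a copy of the tree with constant additions evaluated."""
    kind = node[0]
    if kind in ("NUM", "ID"):
        return node  # leaves are already as simple as possible
    # Fold the children first so nested constants collapse bottom-up.
    children = tuple(fold(child) for child in node[1:])
    if kind == "add-expr" and all(child[0] == "NUM" for child in children):
        total = int(children[0][1]) + int(children[1][1])
        return ("NUM", str(total))
    return (kind,) + children
```

Applied to an `add-expr` over the constants 4 and 2, the subtree is replaced by the single node `("NUM", "6")`, while an addition involving an identifier is left unchanged.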
21
Code Generator
Generates code for the target machine
Example:
–MOV R0, j (value of j into R0)
–MUL R0, 2 (2*j in R0; an int is 2 words)
–MOV R1, &a (address of a in R1)
–ADD R1, R0 (a+2*j in R1, the address of a[j])
–MOV *R1, 6 (6 into the address in R1)
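A deliberately narrow sketch of code generation for this one statement shape, using the slide's instruction notation. The nested-tuple tree, the function name `gen_assign`, the fixed register choice (R0, R1), and the `word_size` parameter are assumptions; a real code generator would allocate registers and handle arbitrary expressions.

```python
def gen_assign(tree, word_size=2):
    """Emit instructions for `array[ID] = NUM` in the slide's notation."""
    kind, target, value = tree
    if kind != "assign-expr" or target[0] != "array-expr" or value[0] != "NUM":
        raise NotImplementedError("sketch handles only array[ID] = NUM")
    _, (_, array_name), index = target
    if index[0] != "ID":
        raise NotImplementedError("sketch handles only identifier indexes")
    return [
        f"MOV R0, {index[1]}",      # value of the index variable into R0
        f"MUL R0, {word_size}",     # scale the index by the element size
        f"MOV R1, &{array_name}",   # base address of the array into R1
        "ADD R1, R0",               # R1 now holds the element's address
        f"MOV *R1, {value[1]}",     # store the constant at that address
    ]
```

The input is assumed to be the tree after constant folding, so the right-hand side is already a single `NUM` node.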
22
Target Code Optimizer
Apply rules to improve machine code
Example:
–MOV R0, j
–SHL R0 (shift to multiply by 2)
–MOV &a[R0], 6 (use a more complex machine instruction to replace simpler ones)
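The two rewrites in this example can be sketched as a small peephole pass over a list of instruction strings. The function name `peephole` and the textual pattern matching are assumptions. The second rule also assumes the address register is not needed afterwards, which a real optimizer would have to verify.

```python
import re


def peephole(code):
    """Apply two illustrative rewrite rules to a list of instruction strings."""
    # Rule 1: multiplying by 2 becomes a left shift.
    code = [re.sub(r"^MUL (R\d+), 2$", r"SHL \1", line) for line in code]

    # Rule 2: fuse "load base, add index, store" into one indexed store.
    # Assumes the base register is dead after the store.
    fused = re.compile(r"MOV (R\d+), &(\w+)\nADD \1, (R\d+)\nMOV \*\1, (\w+)")
    out = []
    i = 0
    while i < len(code):
        window = "\n".join(code[i:i + 3])
        m = fused.fullmatch(window)
        if m:
            out.append(f"MOV &{m.group(2)}[{m.group(3)}], {m.group(4)}")
            i += 3
        else:
            out.append(code[i])
            i += 1
    return out
```

Given the five-instruction sequence from the code generator slide, this pass yields the three instructions shown in the example: `MOV R0, j`, `SHL R0`, and `MOV &a[R0], 6`.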
23
Major Data Structures Tokens Syntax Tree Symbol Table Literal Table Intermediate Code Temporary files
24
Structuring a Compiler Analysis vs. Synthesis –Analysis = understanding the source code –Synthesis = generating the target code Front end vs. Back end –Front end: parsing & intermediate code generation (target machine-independent) –Back end: target code generation Optimization included in both parts
25
Multiple Passes Each pass processes the source code once –One pass per phase –One pass for several phases –One pass for entire compilation Language definition can preclude one-pass compilation
26
Runtime Environments Static (e.g. FORTRAN) –No pointers, no dynamic allocation, no recursion –All memory allocation done prior to execution Stack-based (e.g. C family) –Stack for nested allocation (call/return) –Heap for random allocation (new) Fully dynamic (LISP) –Allocation is automatic (not in source code) –Garbage collection required
27
Error Handling Each phase finds and handles its own types of errors –Scanning: lexical errors such as 1o1 (invalid ID) –Parsing: syntax errors –Semantic Analysis: type errors Runtime errors handled by the runtime environment –Exception handling by programmer often allowed
28
Compiling the Compiler Using machine language –Immediately executable, hard to write –Necessary for the first (FORTRAN) compiler Using a language with an existing compiler and the same target machine Using the language to be compiled (bootstrapping)
29
Bootstrapping Write a “quick & dirty” compiler for a subset of the language (using machine language or another available HLL) Write a complete compiler in the language subset Compile the complete compiler using the “quick & dirty” compiler