1
Overview of Compiler Design
CIS 631, CSE 691, CIS 400, CSE 400 Compiler Design
Dr. Nancy McCracken
January 15, 2008
2
Compilers
Compilers translate from a source language (typically a high-level language) to a functionally equivalent target language (typically the machine code of a particular machine or of a machine-independent virtual machine).
Compilers for high-level programming languages are among the larger and more complex pieces of software
–Original languages included Fortran and Cobol
    Often multi-pass compilers (to facilitate memory reuse)
–Compiler development helped in better programming language design
    Early development focused on syntactic analysis
–Commercially, compilers are developed by very large software groups
    Current focus is on optimization and smart use of resources for modern RISC architectures
3
Motivation to learn about compilers
General background of a good software engineer
–Increases understanding of language semantics
–Understanding what machine code is generated for language constructs helps to understand performance issues for languages
–Helps to understand good language design
–Opportunity for a non-trivial programming project
Can apply compiler techniques to many other language translation applications:
–Translating JavaDoc comments to HTML
–Collating responses from an email survey
–Implementing a server that uses a protocol like HTTP or IMAP
–Printers use parsing to render PostScript files
–Translating from a hardware description language to the schematic of a circuit
–Some aspects of natural language processing, such as a spam filter
4
Dynamic Structure of a Compiler
Front end (analysis)
character stream: val = 10 * val + i
lexical analysis (scanning)
token stream (token number, token value):
    1 (ident)   "val"
    3 (assign)  -
    2 (number)  10
    4 (times)   -
    1 (ident)   "val"
    5 (plus)    -
    1 (ident)   "i"
syntax analysis (parsing)
syntax tree: the tokens ident = number * ident + ident grouped into Term, Expression and Statement nodes
5
Dynamic Structure of a Compiler
Front end
syntax tree: the tokens ident = number * ident + ident grouped into Term, Expression and Statement nodes
semantic analysis (type checking, ...)
intermediate representation: syntax tree, symbol table, or three address code (TAC), ...
Back end (synthesis)
optimization
code generation
machine code: const 10, load 1, mul, ...
6
Compiler versus Interpreter
Compiler translates to machine code
    source code -> scanner -> parser -> ... -> code generator -> machine code -> loader
Variant: interpretation of intermediate code
    source code -> ... compiler ... -> intermediate code (e.g. Java bytecode) -> VM
–source code is translated into the code of a virtual machine (VM)
–VM interprets the code, simulating the physical machine
Interpreter executes source code "directly"
    source code -> scanner -> parser -> interpretation
–statements in a loop are scanned and parsed again and again
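The intermediate-code variant can be sketched in a few lines of Python. This is only an illustration under stated assumptions: expressions are nested tuples such as `("+", left, right)`, and the instruction names `const`, `load`, `mul` echo the fragment on the earlier slide, while `add`, the tuple encoding, and loading variables by name (rather than by slot number) are hypothetical choices made here.

```python
# Hypothetical stack-machine instruction names for the two supported operators.
OPS = {"+": "add", "*": "mul"}

def compile_expr(node, code):
    """Translate an expression tree into stack-machine code (done once)."""
    if isinstance(node, int):
        code.append(("const", node))       # push a constant
    elif isinstance(node, str):
        code.append(("load", node))        # push the value of a variable
    else:
        op, left, right = node
        compile_expr(left, code)
        compile_expr(right, code)
        code.append((OPS[op],))            # operate on the two topmost values
    return code

def run(code, variables):
    """Interpret the intermediate code, as a VM would (can be done many times)."""
    stack = []
    for instr in code:
        name = instr[0]
        if name == "const":
            stack.append(instr[1])
        elif name == "load":
            stack.append(variables[instr[1]])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if name == "add" else left * right)
    return stack.pop()

code = compile_expr(("+", ("*", 10, "val"), "i"), [])
print(code)
print(run(code, {"val": 3, "i": 4}))
```

The point of the split is that `compile_expr` runs once, while `run` can execute the same code repeatedly without scanning or parsing the source again.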
7
Static Structure of a Compiler
parser & sem. analysis: "main program", directs the whole compilation
scanner: provides tokens from the source code
symbol table: maintains information about declared names and types
code generation: generates machine code
Diagram legend: uses, data flow
8
Lexical Analysis
Stream of characters is grouped into tokens
Examples of tokens are identifiers, reserved words, integers, doubles or floats, delimiters, operators and special symbols
Example: int a; a = a + 2;
    int   reserved word
    a     identifier
    ;     special symbol
    a     identifier
    =     operator
    a     identifier
    +     operator
    2     integer constant
    ;     special symbol
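A scanner for the example above might look like the following minimal sketch. The regular expressions, the `RESERVED` set and the function name `scan` are assumptions made for illustration; only the token category names come from the slide.

```python
import re

# Hypothetical reserved-word list; a real language defines its own.
RESERVED = {"int", "double", "if", "else", "while"}

# (category, regular expression); order matters: doubles before integers.
TOKEN_SPEC = [
    ("double constant", r"\d+\.\d+"),
    ("integer constant", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("operator", r"[=+\-*/]"),
    ("special symbol", r"[;(){}]"),
    ("skip", r"\s+"),
]
PATTERN = re.compile(
    "|".join(f"(?P<g{i}>{rx})" for i, (_, rx) in enumerate(TOKEN_SPEC))
)

def scan(source):
    """Group a character stream into (text, category) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = PATTERN.match(source, pos)
        if match is None:
            raise SyntaxError(f"invalid character {source[pos]!r} at offset {pos}")
        kind = TOKEN_SPEC[int(match.lastgroup[1:])][0]
        text = match.group()
        pos = match.end()
        if kind == "skip":
            continue                      # whitespace separates tokens
        if kind == "identifier" and text in RESERVED:
            kind = "reserved word"        # reserved words look like identifiers
        tokens.append((text, kind))
    return tokens

for text, kind in scan("int a; a = a + 2;"):
    print(f"{text:5}{kind}")
```

Recognizing reserved words by first scanning an identifier and then looking it up in a set is a common shortcut; it keeps the regular expressions small.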
9
Syntax Analysis or Parsing
Parsing uses a context-free grammar of valid programming language structures to find the structure of the input
Result of parsing usually represented by a syntax tree
Example of grammar rules:
    expression -> expression + expression | … | variable | constant
    variable -> identifier
    constant -> intconstant | doubleconstant | …
Example parse tree for a = a + 2:
        =
       / \
      a   +
         / \
        a   2
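One way to turn such a grammar into code is recursive descent. The sketch below is an assumption-laden illustration, not the course's parser: the ambiguous rule `expression -> expression + expression` is replaced by the layered rules shown in the comments (so that `*` binds tighter than `+`), trees are nested tuples, and the names `tokenize`, `Parser` and `parse` are hypothetical.

```python
import re

def tokenize(text):
    # Identifiers, integer constants and a few single-character symbols.
    # Anything else is silently skipped in this sketch.
    return re.findall(r"[A-Za-z_]\w*|\d+|[=+*()]", text)

def is_identifier(token):
    return token[0].isalpha() or token[0] == "_"

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, found {token!r}")
        self.pos += 1
        return token

    def statement(self):            # statement -> identifier = expression
        target = self.take()
        if not is_identifier(target):
            raise SyntaxError(f"cannot assign to {target!r}")
        self.take("=")
        value = self.expression()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return ("=", target, value)

    def expression(self):           # expression -> term { + term }
        node = self.term()
        while self.peek() == "+":
            self.take("+")
            node = ("+", node, self.term())
        return node

    def term(self):                 # term -> factor { * factor }
        node = self.factor()
        while self.peek() == "*":
            self.take("*")
            node = ("*", node, self.factor())
        return node

    def factor(self):               # factor -> identifier | intconstant | ( expression )
        token = self.take()
        if token == "(":
            node = self.expression()
            self.take(")")
            return node
        if token.isdigit():
            return int(token)
        if is_identifier(token):
            return token
        raise SyntaxError(f"unexpected token {token!r}")

def parse(text):
    return Parser(tokenize(text)).statement()

print(parse("a = a + 2"))           # ('=', 'a', ('+', 'a', 2))
print(parse("val = 10 * val + i"))  # ('=', 'val', ('+', ('*', 10, 'val'), 'i'))
```

Each grammar rule becomes one method, which is why the grammar has to be free of left recursion before it can be used this way.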
10
Semantic Analysis
Parse tree is checked for things that violate the semantic rules of the language
–Semantic rules may be written with an attribute grammar
Examples:
–Using undeclared variables
–Function called with improper arguments
    Number and type of arguments
–Array variables used without array syntax
–Type checking of operator arguments
–Left hand side of an assignment must be a variable (sometimes called an L-value)
–...
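Three of these checks (undeclared variables, operator type checking, and the L-value rule) can be sketched as a recursive walk over a syntax tree. The tree encoding (nested tuples, variables as strings), the type names, the strict rule that both operands must have the same type, and the function name `check` are all assumptions made for this illustration.

```python
def check(node, declared, errors):
    """Return the type of a tree node, appending messages to `errors`.

    `declared` maps variable names to type names, e.g. {"a": "int"}.
    Returns None when the type is unknown because of an earlier error.
    """
    if isinstance(node, int):
        return "int"
    if isinstance(node, float):
        return "double"
    if isinstance(node, str):                       # use of a variable
        if node not in declared:
            errors.append(f"undeclared variable '{node}'")
            return None
        return declared[node]
    op, left, right = node
    if op == "=" and not isinstance(left, str):
        errors.append("left-hand side of assignment is not a variable")
    left_type = check(left, declared, errors)
    right_type = check(right, declared, errors)
    if left_type is None or right_type is None:
        return None                                 # already reported; avoid cascades
    if left_type != right_type:
        errors.append(f"type mismatch for '{op}': {left_type} vs {right_type}")
        return None
    return left_type

errors = []
check(("=", "a", ("+", "b", 2)), {"a": "int"}, errors)
print(errors)   # ["undeclared variable 'b'"]
```

Returning `None` after an error is a small design choice that keeps one mistake from producing a chain of follow-on messages.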
11
Intermediate Code Generation
An intermediate code representation is used to further break the program down into something from which machine code is easy to generate
Typical choices include
–Annotated parse trees
–Three Address Code (TAC), an abstract machine language
Example statements:
    if (a <= b) { a = a - c; }
    c = b * c;
Resulting TAC:
    _t1 = a > b
    if _t1 goto L0
    _t2 = a - c
    a = _t2
    L0: _t3 = b * c
    c = _t3
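The way an expression is flattened into temporaries, as in `_t3 = b * c` followed by `c = _t3`, can be sketched as a post-order walk of the syntax tree. The tuple encoding of trees and the name `to_tac` are assumptions for illustration; control flow (the `if ... goto` part) is left out of this sketch.

```python
import itertools

def to_tac(statement):
    """Flatten ("=", target, expression) into three-address instructions."""
    code = []
    temps = itertools.count(1)          # numbering for _t1, _t2, ...

    def emit(node):
        if not isinstance(node, tuple):
            return str(node)            # variable or constant: usable directly
        op, left, right = node
        left_name = emit(left)          # operands first (post-order)
        right_name = emit(right)
        temp = f"_t{next(temps)}"
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    _, target, expression = statement
    code.append(f"{target} = {emit(expression)}")
    return code

for line in to_tac(("=", "val", ("+", ("*", 10, "val"), "i"))):
    print(line)
# _t1 = 10 * val
# _t2 = _t1 + i
# val = _t2
```

Every instruction has at most one operator and three addresses (two operands and a result), which is what makes the later translation to machine instructions straightforward.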
12
Code Optimization
Compiler converts the intermediate representation to another one that attempts to be smaller and faster.
Typical optimizations:
–Suppressing code generation for unreachable segments
–Getting rid of unused variables
–Eliminating multiplication by 1 and addition of 0
–Loop optimization: e.g. moving computations whose operands are not modified in the loop out of the loop
–Common sub-expression elimination
–...
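As a small illustration of one item in this list, the sketch below removes multiplications by 1 and additions of 0 from an expression tree and, as an extra that the slide does not mention, folds operations on two constants. It assumes integer arithmetic, nested-tuple trees, and only the operators `+` and `*`; the name `simplify` is hypothetical.

```python
FOLD = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

def simplify(node):
    """Return an equivalent, possibly smaller, expression tree."""
    if not isinstance(node, tuple):
        return node                          # variable or constant
    op, left, right = node
    left, right = simplify(left), simplify(right)
    if isinstance(left, int) and isinstance(right, int) and op in FOLD:
        return FOLD[op](left, right)         # constant folding (extra)
    if op == "+":
        if left == 0:
            return right                     # 0 + x  ->  x
        if right == 0:
            return left                      # x + 0  ->  x
    if op == "*":
        if left == 1:
            return right                     # 1 * x  ->  x
        if right == 1:
            return left                      # x * 1  ->  x
    return (op, left, right)

print(simplify(("+", ("*", "x", 1), 0)))     # x
print(simplify(("*", ("+", 2, 3), "y")))     # ('*', 5, 'y')
```

Simplifying the children before the parent lets one rewrite expose another, as in the first call where `x * 1` must become `x` before `x + 0` can be recognized.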
13
Object Code Generation
The target program is generated in the machine language of the target architecture.
–Memory locations are selected for each variable
–Instructions are chosen for each operation
–Individual tree nodes or TAC instructions are translated into sequences of machine language instructions that perform the same task
Typical machine language instructions include things like
–Load register
–Add register to memory location
–Store register to memory
–...
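A deliberately naive sketch of this translation step follows. It assumes TAC lines of the form `x = y op z` or `x = y`, a hypothetical one-register machine with `LOAD`, `ADD`, `SUB`, `MUL` and `STORE` instructions, and variable names standing in for memory locations; none of these details come from the slides.

```python
# Hypothetical mapping from TAC operators to machine instructions.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}

def generate(tac_lines):
    """Translate each TAC line into load / operate / store instructions."""
    code = []
    for line in tac_lines:
        target, _, rhs = line.partition(" = ")
        parts = rhs.split()
        code.append(f"LOAD R0, {parts[0]}")                      # first operand into the register
        if len(parts) == 3:                                      # y op z
            code.append(f"{OPCODES[parts[1]]} R0, {parts[2]}")   # combine with the second operand
        code.append(f"STORE R0, {target}")                       # result back to memory
    return code

for instr in generate(["_t3 = b * c", "c = _t3"]):
    print(instr)
# LOAD R0, b
# MUL R0, c
# STORE R0, _t3
# LOAD R0, _t3
# STORE R0, c
```

Translating each line in isolation is simple but leaves obvious redundancy, such as storing `_t3` and immediately loading it again; removing that is the job of a later optimization pass.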
14
Object Code Optimization
It is possible to have another code optimization phase that transforms the object code into more efficient object code.
These optimizations use features of the hardware itself to make efficient use of processors and registers.
–Specialized instructions
–Pipelining
–Branch prediction and peephole optimizations
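A peephole optimization looks at a short window of adjacent instructions and replaces wasteful patterns. The sketch below handles a single pattern, a `LOAD` that immediately follows a `STORE` of the same register to the same location. The instruction syntax is hypothetical, and the sketch assumes that no jump can land between the two instructions.

```python
def peephole(code):
    """Drop 'LOAD r, x' when it directly follows 'STORE r, x'."""
    result = []
    for instr in code:
        if result and instr.startswith("LOAD "):
            previous = result[-1]
            same_operands = previous[len("STORE "):] == instr[len("LOAD "):]
            if previous.startswith("STORE ") and same_operands:
                continue            # the register already holds this value
        result.append(instr)
    return result

before = ["LOAD R0, b", "MUL R0, c", "STORE R0, _t3", "LOAD R0, _t3", "STORE R0, c"]
for instr in peephole(before):
    print(instr)
# LOAD R0, b
# MUL R0, c
# STORE R0, _t3
# STORE R0, c
```

If `_t3` is never read again, a separate dead-store pass could also remove `STORE R0, _t3`, but that needs information about later uses that a two-instruction window does not have.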
15
Symbol Table
Symbol table management is a part of the compiler that interacts with several of the phases
–Identifiers are found in lexical analysis and placed in the symbol table
–During syntactic and semantic analysis, type and scope information is added
–During code generation, type information is used to determine what instructions to use
–During optimization, the results of liveness analysis may be kept in the symbol table
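A minimal sketch of a symbol table with nested scopes is shown below. The class and method names, and the choice to store a type name and a nesting level per entry, are assumptions for illustration; real symbol tables usually record more (addresses, parameter lists, and so on).

```python
class SymbolTable:
    """Stack of scopes; each scope maps a name to its recorded attributes."""

    def __init__(self):
        self.scopes = [{}]                  # scopes[0] is the global scope

    def open_scope(self):
        self.scopes.append({})              # e.g. on entering a method

    def close_scope(self):
        self.scopes.pop()                   # local names become invisible

    def declare(self, name, type_name):
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"'{name}' already declared in this scope")
        scope[name] = {"type": type_name, "level": len(self.scopes) - 1}

    def lookup(self, name):
        for scope in reversed(self.scopes): # innermost declaration wins
            if name in scope:
                return scope[name]
        raise NameError(f"'{name}' is not declared")

table = SymbolTable()
table.declare("val", "Table")               # global variable
table.open_scope()
table.declare("x", "int")                   # local variable
print(table.lookup("x"))                    # {'type': 'int', 'level': 1}
print(table.lookup("val"))                  # {'type': 'Table', 'level': 0}
table.close_scope()
```

Searching from the innermost scope outward is what gives local declarations priority over global ones with the same name.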
16
Error Handling
Error handling and reporting also occur across many phases
–Lexical analyzer reports invalid character sequences
–Syntactic analyzer reports invalid token sequences
–Semantic analyzer reports type and scope errors, and the like
The compiler may be able to continue with some errors, but other errors may stop the process
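Continuing after an error can be illustrated at the lexical level: instead of stopping at the first invalid character, the scanner records a message with its position and moves on. This is a sketch under assumptions; the set of accepted symbols, the message format and the name `scan_with_errors` are hypothetical.

```python
SYMBOLS = "=+-*/;(){}[]<>!.,"               # hypothetical set of valid symbols

def scan_with_errors(source):
    """Return (tokens, errors); invalid characters are reported and skipped."""
    tokens, errors = [], []
    for line_no, line in enumerate(source.splitlines(), start=1):
        col = 0
        while col < len(line):
            ch = line[col]
            if ch.isspace():
                col += 1
            elif ch.isalnum() or ch == "_":
                start = col
                while col < len(line) and (line[col].isalnum() or line[col] == "_"):
                    col += 1
                tokens.append(line[start:col])
            elif ch in SYMBOLS:
                tokens.append(ch)
                col += 1
            else:
                errors.append(
                    f"line {line_no}, column {col + 1}: invalid character {ch!r}"
                )
                col += 1                    # skip it and keep scanning
    return tokens, errors

tokens, errors = scan_with_errors("a = a @ 2;\nb = $1;")
print(tokens)
for message in errors:
    print(message)
# line 1, column 7: invalid character '@'
# line 2, column 5: invalid character '$'
```

Reporting several errors in one run saves the programmer repeated compile cycles, at the cost of some messages being consequences of earlier ones.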
17
Compiler Project
Choose a source language
–Needs to be large enough to have many of the interesting language features/problems for compiling
–Needs to be small enough to implement in a semester
–Tentatively choosing a subset called MicroJava
Choose a target language
–Needs to be either a real assembly language for a machine with an assembler or a virtual machine language with an interpreter
–Tentatively choosing a slightly modified Java VM with an interpreter
18
Example MicroJava Program
    program P                          // main program; no separate compilation
      final int size = 10;
      class Table {                    // classes (without methods)
        int[] pos;
        int[] neg;
      }
      Table val;                       // global variables
    {
      void main()
        int x, i;                      // local variables
      { //---------- initialize val ----------
        val = new Table;
        val.pos = new int[size];
        val.neg = new int[size];
        i = 0;
        while (i < size) {
          val.pos[i] = 0; val.neg[i] = 0;
          i = i + 1;
        }
        //---------- read values ----------
        read(x);
        while (x != 0) {
          if (x > 0) val.pos[x] = val.pos[x] + 1;
          else if (x < 0) val.neg[-x] = val.neg[-x] + 1;
          read(x);
        }
      }
    }
19
References for these slides
Wirth, Compiler Construction, chapters 1 and 2
Course notes from H. Mössenböck, System Specification and Compiler Construction, http://www.ssw.uni-linz.ac.at/Misc/CC/
–Also notes on MicroJava
Course notes from Jerry Cain, Compilers, http://www.stanford.edu/class/cs143/
General references:
–Aho, A., Sethi, R., Ullman, J., Compilers: Principles, Techniques and Tools, Addison-Wesley, 1986. (also new edition 2006)
–Steven Muchnick, Advanced Compiler Design and Implementation, Morgan Kaufmann, 1997.
–Keith Cooper and Linda Torczon, Engineering a Compiler, Morgan Kaufmann, 2003.