Published by Charles Snow. Modified over 9 years ago.
1
Joey Paquet, 2000, 2002, 2007, 2008
Concordia University
Department of Computer Science
COMP 442/6421 Compiler Design
2
Course Description
Instructor
– Name: Dr. Joey Paquet
– Office: EV-3-221
– Phone: 7831
– e-mail: paquet@cse.concordia.ca
– Web: www.cse.concordia.ca/~paquet
3
Course Description
Topic
– Compiler organization and implementation.
– Lexical, syntax and semantic analysis. Code generation.
Outline
– Design and implementation of a simple compiler.
– Lectures related to the project.
4
Course Description
Grading
– Assignments (4): 40%
– Final Examination: 30%
– Final Project: 30%
Late assignment penalty: 50% per working day.
Assignments and project are graded on: Correctness, Completeness, Design, Style, Documentation.
5
Project Description
Design and coding of a simple compiler
– Individual work
– Divided into four assignments
– Final project is graded at the end of the semester, during a final demonstration
– Testing is VERY important and is your responsibility
6
Project Description
A complete compiler is a fairly complex and large program: from 10,000 to 1,000,000 lines of code.
Programming one will push you beyond your limits.
It uses most of the elements of the theoretical foundations of Computer Science.
It will probably be the most complex program you have ever written.
7
Introduction to Compilation
A compiler is a translation system.
It translates programs written in a high-level language into a lower-level language, generally machine (binary) language.
[Diagram: source code (source language) -> compiler (translator) -> target code (target language)]
8
Introduction to Compilation
The only language that the processor understands is binary.

000100 0001 0011 1111
a      b    c    d

a: Operation code for register addition (from a symbol table)
b: First operand (R1)
c: Second operand (R3)
d: Third operand (R15)
9
Introduction to Compilation
Assembly language is the first higher-level programming language.

000100000100111111
Add R1,R3,R15

There is a one-to-one correspondence between lines of assembly code and machine code instructions.
An op-code table is sufficient to translate assembly language into machine code.
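The table-driven translation described on this slide can be sketched in a few lines of Python. This is only an illustration: the field widths (a 6-bit op-code followed by three 4-bit register numbers) are read off the bit pattern shown in the slides, while the `OPCODES` dictionary and the `assemble` function name are assumptions made for the example.

```python
# Hypothetical op-code table; only the Add entry is taken from the slides.
OPCODES = {"Add": "000100"}

def assemble(line: str) -> str:
    """Translate one assembly line such as 'Add R1,R3,R15' into a bit string."""
    mnemonic, operands = line.split(maxsplit=1)
    bits = OPCODES[mnemonic]              # op-code looked up in the table
    for reg in operands.split(","):
        number = int(reg.strip().lstrip("R"))
        bits += format(number, "04b")     # each register is a 4-bit field
    return bits

print(assemble("Add R1,R3,R15"))
```

Because each assembly line maps to exactly one machine instruction, no parsing beyond splitting the line into fields is needed.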
10
Introduction to Compilation
Compared to binary, assembly language greatly improved the productivity of programmers. Why?
Though a great improvement, it is not ideal:
– Not easy to write
– Even less easy to read and understand
– Extremely architecture-dependent
11
Introduction to Compilation
A compiler translates a given high-level language into assembly or machine code.

High-level statement:
X=Y+Z;

Assembly:
L  3,Y    Load working register with Y
A  3,Z    Add Z to working register
ST 3,X    Store the result in X

Machine code:
00001001001011
00010010010101
00100100101001
12
FORTRAN: The first compiler
The problems with assembly led to the development of the first compiler: FORTRAN.
Stands for FORmula TRANslation.
Developed between 1954 and 1957 at IBM by a team led by John Backus.
This was an incredible feat, as the theory of compilation was not available at the time.
13
Paving down the road
In parallel, Noam Chomsky was investigating the structure of natural languages.
His studies led the way to the classification of languages according to their complexity (aka the Chomsky hierarchy).
This was used by various theoreticians in the 1960s and early 1970s to design a fairly complete set of solutions to the parsing problem. These solutions have been used ever since.
As the parsing solutions became well understood, efforts were devoted to the development of parser generators.
The most commonly known is YACC (Yet Another Compiler Compiler), developed by Steve Johnson in 1975 for the Unix system.
14
Compilation vs. Interpretation
A compiler translates high-level instructions into machine code.
An interpreter uses the computer to execute the program directly, statement by statement.
– Advantage: immediate response
– Drawbacks: inefficient with loops, restricted to single-file programs.
15
Compiler’s Environment
Building an executable from multiple files
[Diagram: source code -> compiler -> object code -> linker -> executable code; the linker also takes run-time libraries and compiled modules as input]
16
Phases of a Compiler
[Diagram, front-end: source code -> lexical analysis -> token stream -> syntactic analysis -> syntax tree -> semantic analysis -> annotated tree -> high-level optimization -> intermediate code]
[Diagram, back-end: intermediate code -> target code generation -> target code -> low-level optimization -> optimized target code]
17
Lexical analysis
Transforms the initial stream of characters into a stream of tokens
– keywords: while, to, do, int, main
– identifiers: i, max, total, i1, i2
– literals: 123, 12.34, "Hello"
– operators: +, *, and, >, <
– punctuation: {, }, [, ], ;
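As a rough illustration of how a character stream becomes a token stream, here is a small regular-expression tokenizer in Python. It is a sketch under assumptions rather than the course's scanner: the token classes follow the list above, but the exact patterns, the `tokenize` name, and the inclusion of `=` as an operator are choices made for this example.

```python
import re

KEYWORDS = {"while", "to", "do", "int", "main"}
WORD_OPERATORS = {"and"}

# Order matters inside "literal": floats are tried before integers.
TOKEN_SPEC = [
    ("literal",     r'\d+\.\d+|\d+|"[^"]*"'),
    ("identifier",  r"[A-Za-z_]\w*"),
    ("operator",    r"[+*<>=]"),       # '=' is an assumption, not in the slide's list
    ("punctuation", r"[{}\[\];]"),
    ("skip",        r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (kind, text) pairs for the given source string."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue                    # whitespace separates tokens but is not one
        if kind == "identifier":
            # Keywords and word operators look like identifiers; reclassify them.
            if text in KEYWORDS:
                kind = "keyword"
            elif text in WORD_OPERATORS:
                kind = "operator"
        tokens.append((kind, text))
    return tokens

print(tokenize("while i < 123 do total = total + 12.34;"))
```

Matching keywords as identifiers first and reclassifying them afterwards keeps the patterns simple and avoids treating a name such as `donut` as the keyword `do`.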
18
Syntactic analysis
Attempts to build a valid parse tree from the grammatical description of the language.
[Figure: parse tree for the statement "Distance = rate * time;". The root S has children id, =, E and ; and the node E expands to E * E.]
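To make the idea of building a tree from a grammar concrete, here is a small recursive-descent sketch in Python for statements of the form `id = E ;`. It is not the method prescribed by the slides: the rule `E -> E * E` from the figure is replaced here by the loop-friendly form `E -> id ('*' id)*`, and the function name and the tuple-based tree shape are assumptions for the example.

```python
def parse_statement(tokens):
    """Parse S -> id '=' E ';' and return the tree as nested tuples."""
    pos = 0

    def expect(text=None):
        # Consume the next token, optionally checking that it equals `text`.
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        if text is not None and token != text:
            raise SyntaxError(f"expected {text!r}, found {token!r}")
        pos += 1
        return token

    def identifier():
        token = expect()
        if not token.isidentifier():
            raise SyntaxError(f"expected identifier, found {token!r}")
        return ("id", token)

    def expression():
        # E -> id ('*' id)*, building a left-associative tree
        node = identifier()
        while pos < len(tokens) and tokens[pos] == "*":
            expect("*")
            node = ("*", node, identifier())
        return node

    target = identifier()
    expect("=")
    value = expression()
    expect(";")
    if pos != len(tokens):
        raise SyntaxError("trailing tokens after ';'")
    return ("=", target, value)

print(parse_statement(["Distance", "=", "rate", "*", "time", ";"]))
```

A token sequence that does not fit the grammar, such as `Distance = ;`, raises `SyntaxError` instead of producing a tree.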
19
Semantic Analysis
The semantics of a program is its meaning.
It is possible to have a syntactically valid program that does not have any meaning.
Semantic analysis has two parts:
– Semantic checking: Validating the semantics of a syntactically valid program and gathering information about the meaning of its constituents (attributes).
– Semantic translation: Giving a meaning to a program using a pre-established language, typically a syntax tree decorated with attributes. This is often called an intermediate representation.
20
Semantic Translation: example
Breaks the statements into small pieces corresponding roughly to machine instructions.

Source statement:
x = a*y+z;

Intermediate code:
t1 = a*y;
t2 = t1+z;
x = t2;
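The breakdown shown on this slide can be reproduced with a short post-order walk over an expression tree. The Python sketch below assumes a tree made of nested tuples `(operator, left, right)` with variable names as strings; that representation and the function name `to_three_address` are illustrative assumptions, not part of the course material.

```python
import itertools

def to_three_address(target, tree):
    """Flatten an expression tree into statements with one operator each."""
    code = []
    counter = itertools.count(1)

    def emit(node):
        if isinstance(node, str):       # a variable name needs no temporary
            return node
        op, left, right = node
        left_name = emit(left)          # operands are translated first
        right_name = emit(right)
        temp = f"t{next(counter)}"
        code.append(f"{temp} = {left_name}{op}{right_name};")
        return temp

    result = emit(tree)
    code.append(f"{target} = {result};")
    return code

# x = a*y+z, with * binding tighter than +
for statement in to_three_address("x", ("+", ("*", "a", "y"), "z")):
    print(statement)
```

Each generated statement has at most one operator, which is what makes it correspond roughly to a single machine instruction.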
21
High-Level Optimization
The generated intermediate representation is often inefficient because of bad structure or redundancy.
This kind of optimization is not bound to the target machine’s architecture.

Before:
t1 = a*y;
t2 = t1+z;
x = t2;

After:
t1 = a*y;
x = t1+z;
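One simple way to obtain the shorter sequence on this slide is to fold a copy of a temporary into the statement that computed it. The slide does not name a specific technique, so the Python sketch below is an assumed example of such a step; it treats names of the form `tN` as temporaries and only folds when the temporary is not mentioned in any later statement.

```python
import re

def split_statement(stmt):
    """Split 'dest = expr;' into (dest, expr)."""
    dest, expr = stmt.rstrip(";").split("=", 1)
    return dest.strip(), expr.strip()

def fold_copies(code):
    """Rewrite 'tN = expr; x = tN;' as 'x = expr;' when tN is not needed afterwards."""
    result = []
    for index, stmt in enumerate(code):
        dest, expr = split_statement(stmt)
        if result:
            prev_dest, prev_expr = split_statement(result[-1])
            is_temp = re.fullmatch(r"t\d+", prev_dest) is not None
            used_later = any(
                re.search(rf"\b{re.escape(prev_dest)}\b", later)
                for later in code[index + 1:]
            )
            if expr == prev_dest and is_temp and not used_later:
                result[-1] = f"{dest} = {prev_expr};"   # merge the two statements
                continue
        result.append(stmt)
    return result

for statement in fold_copies(["t1 = a*y;", "t2 = t1+z;", "x = t2;"]):
    print(statement)
```

Nothing in this step refers to registers or instruction sets, which is why it belongs to the machine-independent part of the compiler.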
22
Target Code Generation
Translates the optimized intermediate representation into the target code (normally machine language or assembler).

Intermediate code:
t1 = a*y;
x = t1+z;

Target code:
LE  4,a    a in register 4
ME  4,y    multiply by y
AE  4,z    add z
STE 4,x    store register 4 in x
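The translation on this slide can be mimicked with a very small single-register generator. The Python sketch below is an illustration under strong assumptions: it only handles chains of binary statements in which each temporary `tN` is consumed as the left operand of the very next statement, and the `generate` function and its register-tracking scheme are invented for the example. The mnemonics LE, ME, AE and STE are the ones shown on the slide.

```python
import re

ARITHMETIC = {"*": "ME", "+": "AE"}     # operator -> mnemonic, as on the slide

def generate(code, register=4):
    """Emit load/operate/store instructions using a single working register."""
    output = []
    in_register = None                  # name whose value is currently in the register
    for stmt in code:
        dest, expr = [part.strip() for part in stmt.rstrip(";").split("=", 1)]
        op = next(symbol for symbol in ARITHMETIC if symbol in expr)
        left, right = [part.strip() for part in expr.split(op)]
        if left != in_register:         # reuse the register when the value is already there
            output.append(f"LE {register},{left}")
        output.append(f"{ARITHMETIC[op]} {register},{right}")
        in_register = dest
        if re.fullmatch(r"t\d+", dest) is None:
            output.append(f"STE {register},{dest}")   # only program variables go to memory
    return output

for instruction in generate(["t1 = a*y;", "x = t1+z;"]):
    print(instruction)
```

Keeping `t1` in the register instead of storing and reloading it is what lets two intermediate statements become four instructions rather than six.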
23
Passes, Front End and Back End
A pass consists of reading a high-level version of the program and writing a new lower-level version.
Several passes are often needed:
– To resolve forward references
– To limit the memory used by the different phases.
24
Low-Level Optimization
The generated target code is analyzed for inefficiencies such as dead code or code redundancy.
Care is taken to exploit the CPU’s capabilities as much as possible.
This phase is heavily architecture-dependent.
A lot of research is still being done in this very complex area.
25
Passes, Front End and Back End
The front-end is composed of: lexical, syntactic and semantic analysis, and high-level optimization.
In most compilers, most of the front-end is driven by the syntactic analyzer.
It calls the lexical analyzer for tokens and generates an abstract syntax tree when syntactic elements are recognized.
The generated tree (or other intermediate representation) is then analyzed and optimized in a separate process.
The front-end has little or no concern with the target machine.
26
Passes, Front End and Back End
The back-end is composed of: code generation and low-level optimization.
It uses the intermediate representation generated by the front-end to generate target machine code.
It is heavily dependent on the target machine.
It is independent of the programming language compiled.
27
System Support
Symbol table
– Central repository of identifiers (variable or function names) used in the compiled program.
– Contains information such as the data type or value in the case of constants.
– Used to identify undeclared or multiply declared identifiers, as well as type mismatches.
– Provides temporary variables for intermediate code generation.
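The four roles listed above map naturally onto a dictionary-backed class. The Python sketch below is a minimal illustration, assuming a single flat scope; the class name, method names and the choice of exceptions are all hypothetical and not taken from the course project.

```python
class SymbolTable:
    """Flat (single-scope) symbol table keyed by identifier name."""

    def __init__(self):
        self._entries = {}
        self._temp_count = 0

    def declare(self, name, data_type, value=None):
        # Detects multiply declared identifiers.
        if name in self._entries:
            raise NameError(f"multiply declared identifier: {name}")
        self._entries[name] = {"type": data_type, "value": value}

    def lookup(self, name):
        # Detects undeclared identifiers.
        if name not in self._entries:
            raise NameError(f"undeclared identifier: {name}")
        return self._entries[name]

    def check_assignment(self, target, source):
        # Detects type mismatches between two declared identifiers.
        if self.lookup(target)["type"] != self.lookup(source)["type"]:
            raise TypeError(f"type mismatch between {target} and {source}")

    def new_temp(self, data_type):
        # Provides a fresh temporary for intermediate code generation.
        self._temp_count += 1
        name = f"t{self._temp_count}"
        self.declare(name, data_type)
        return name


table = SymbolTable()
table.declare("max", "int", value=10)
table.declare("total", "int")
table.check_assignment("total", "max")
print(table.new_temp("int"))
```

A real compiler would usually extend this with nested scopes, which this sketch deliberately leaves out.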
28
System Support
Error handling procedures
– Implement the compiler’s response to errors in the code it is compiling.
– Provide useful insight to the user about where the error is and what it is.
– Should find all errors in the whole program.
– Can attempt to correct some errors and only give a warning.
29
System Support
Run-time system
– Some programming language concepts raise the need for dynamic memory allocation. What are they?
– The running program must then be able to manage its own memory use.
– Some will require a stack, others a heap. These are managed by the run-time system.
30
Writing of Early Compilers
The first C compiler
[Diagram, step 1: minimal C compiler source -> assembler -> executable C compiler (minimal)]
[Diagram, step 2: full C compiler source -> C compiler (minimal) -> executable C compiler (full)]
31
Writing Cross-Compilers
A Unix-Macintosh C cross compiler
[Diagram, step 1: Mac C compiler source code in Unix C -> Unix C compiler -> Mac C compiler usable on Unix]
[Diagram, step 2: Mac C compiler source code in Unix C -> Mac C compiler usable on Unix -> Mac C compiler usable on Mac]
32
Writing Retargetable Compilers
Two methods:
– Make a strict distinction between front-end and back-end, then use different back-ends.
– Generate code for a virtual machine, then build a compiler or interpreter to translate virtual machine code to a specific machine code. That is what we do in the project.
33
Summary
The first compiler was the assembler, a one-to-one direct translator.
Complex compilers were written incrementally, first using assemblers.
The core compilation techniques have been well known since the 1960s and early 1970s.
34
Summary
The compilation process is divided into phases.
The input of a phase is the output of the previous phase.
It can be seen as a pipeline, where the phases are filters that successively transform the input program into an executable.