Compiler Construction 410/510, Week 1 – Lecture 1 (31 slides): Introduction, The Textbook, Assessment, Overview

Slide 2: The Big Picture
In this course we will be constructing a compiler!
– Moving from a High Level Language (HLL) to a Low Level Language (LLL)
Compilers are complex programs: typically more than 10,000 lines of code.
They integrate aspects of many different areas of CS:
– formal language theory, algorithms, data structures, HLLs and LLLs (obviously), user interaction (error reporting)

Slide 3: What is a Compiler?
A compiler is a specialisation of a language translator. Usually in CS:
– the Source is a high-level programming language
– the Target is machine code for a microprocessor
[Diagram: L1 (Source, e.g. C) → compiler → L2 (Target, e.g. an x86 processor)]

Slide 4: Applications of Compiler Techniques
Potential Source languages include:
– natural languages (English, French, …)
– circuit layout languages
– mark-up languages (HTML, XML, …)
– command-line languages (e.g. an SQL interface)
Potential Target languages include:
– natural languages
– printer drivers
– mark-up languages, e.g. an HTML to RTF converter
Such translators could involve many of the aspects we will cover in compiler construction.

Slide 5: Compilers for Programming Languages
If we had one compiler for each {Source, Target} pair, we would have a lot of compilers!
[Diagram: every Source language connected by its own compiler to every Target language]
Source languages: C, Prolog, Java, Lisp, Haskell, C++, C#, Fortran, Pascal, Sather
Target languages: x86 (MMX), JVM, PowerPC 750 (G3), ARM, SPARC, AMD K6
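
Counting the languages listed on this slide (10 sources, 6 targets), one dedicated compiler per pair would mean:

```latex
\underbrace{10}_{\text{source languages}} \times \underbrace{6}_{\text{target languages}} = 60 \text{ compilers},
\qquad \text{and in general } m \times n \text{ compilers for } m \text{ sources and } n \text{ targets.}
```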

Slide 6: Modularity for Code Generation
[Diagram: Source → Intermediate Representation → separate code generators for x86, ARM and G4]
This is the basis of compiler portability (see 'man gcc', which lists the different target machines).

Slide 7: Modularity for Source Languages?
[Diagram: Sources (C, Java, Prolog) → compilers → Intermediate Representation → Targets]
Typically a compiler compiles only one source language, but the techniques used are very similar and are shared across different compilers.

Slide 8: Typical Compiler
[Diagram: Source → Front-end (Analysis) → Intermediate Representation → Back-end (Synthesis) → Target; diagram labels: front-end covered in the course now, back-end from week 6]
The Intermediate Representation is independent of both the Source and the Target languages. Ideally:
– for a new Source language, we can add a new front-end to an existing back-end
– for a new Target language, we can add a new back-end to an existing front-end

Slide 9: Front End
Knowledge about the source language:
– lexical structure (tokens)
– syntax: programming constructs such as conditionals, iteration, etc.
– semantics: type checking
Error reporting:
– the UI component
– often basic (and unhelpful!)
– may vary depending on whether the compiler is part of an IDE or standalone
[Diagram: Source program → Lexical analyser → Syntax analyser → Semantic analyser, all using the Symbol table and the Error Handler]

Slide 10: Lexical Analysis
Lexical tasks the compiler has to perform on the program

    int max = 20, x; read(x); if ( x <= max ) print('ok'); else print('too big');

– group together the 3 characters 'max' to form the single variable identifier max
– group together the 2 characters '<=' to form the single relational operator <= (less than or equal to)

Slide 11: Syntactic Analysis
– Recognise the if … then … else structure
– Group the x <= max into a single expression with a relational operator
– Recognise the format of the variable declaration list, so that x is correctly declared to be an int
– Likewise loops, program blocks (begin … end), arithmetic expressions, etc.

Slide 12: Semantic Analysis
– Check that x <= max is a sensible thing to do: if x were a boolean and max a string, we would have a type error
– Check that the '20' is in fact an integer and so can be assigned to an int
– And also (this can be split over several phases): keep a note of all the variables used, so that every use of a variable refers to the same value (in memory)

Slide 13: Data Structures
– The source file is a stream of text
– The compiler groups this text into larger units drawn from a limited set (tokens)
– Nearly all programming constructs can be represented as tree structures
[Diagram: an if-statement tree whose children are 'if', a Boolean expression, a statement, 'else' and a statement; the Boolean expression is itself a relational operator over two expressions]

Slide 14: Data Structures
– Lexical Analyser: a stream of tokens (an enumerated type), e.g. NUMBER OPERATOR NUMBER
– Syntax Analyser / Parser: a tree of the program structure
[Diagram: a program node with children if_statement, assignment, while_loop, output_statement]

Slide 15: Back-end
Knowledge about the target processor / virtual machine:
– instruction set
– 'costs' of different op-codes and instructions
– registers
– memory
[Diagram: Semantic analyser → Intermediate code generator → Code optimiser → Code generator, all using the Symbol table manager and the Error handler]

Slide 16: Putting it Together
[Diagram, the Compiler: Source program → Lexical analyser → Syntax analyser → Semantic analyser → Intermediate code generator → Code optimiser → Code generator, all using the Symbol table and the Error Handler]
[Diagram, a language-processing system: Skeletal source program → preprocessor → Source program → compiler → Target assembly program → assembler → Relocatable machine code → Loader / link-editor → Absolute machine code]

Slide 17: Grammars
We define/describe high-level languages with grammars. A grammar $G = (T, N, P, S)$ consists of:
– $T$, a set of Terminals
– $N$, a set of Non-terminals, with $N \cap T = \emptyset$
– $P$, a set of Productions $\alpha \to \beta$, where $\alpha$ and $\beta$ are strings over $T \cup N$, i.e. $\alpha, \beta \in (T \cup N)^*$
– $S \in N$, a special Non-terminal, the Start symbol
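
As an illustration (an example grammar made up for this purpose, not taken from the slides), here is a tiny grammar for sums such as the `a + 1` used later in this lecture, with a derivation of the token sequence id + num that a lexer would produce for it. The Non-terminal is called $F$ rather than $T$ to avoid a clash with the set of Terminals; $N \cap T = \emptyset$ holds, and since every production has a single Non-terminal on its left-hand side the grammar is context-free.

```latex
\begin{aligned}
T &= \{\mathit{id},\ \mathit{num},\ +\}, \qquad N = \{E, F\}, \qquad S = E\\
P &= \{\, E \to E + F,\ \ E \to F,\ \ F \to \mathit{id},\ \ F \to \mathit{num} \,\}\\
E &\Rightarrow E + F \Rightarrow F + F \Rightarrow \mathit{id} + F \Rightarrow \mathit{id} + \mathit{num}
\end{aligned}
```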

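Slide 18: Chomsky's Grammar Hierarchy
[Diagram: nested classes; each class of languages is contained in the next, Type 3 ⊂ Type 2 ⊂ Type 1 ⊂ Type 0]
– Type 3: Regular Grammar
– Type 2: Context-Free Grammar
– Type 1: Context-Sensitive Grammar
– Type 0: Unrestricted Grammar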

Slide 19: Grammars
Type 0 (unrestricted):
– $\alpha \to \beta$
– $\alpha$ and $\beta$ are unrestricted sequences; $\alpha$ is not null
– languages formed from Type 0 grammars can be recognised by non-deterministic Turing machines
Type 1 (context-sensitive):
– $\alpha A \beta \to \alpha \gamma \beta$, with $\gamma$ non-empty
– the Non-terminal $A$ becomes $\gamma$ in the context $\alpha \ldots \beta$
– complex for computer analysis

Slide 20: Grammars
Type 2 (context-free):
– $A \to \gamma$, where $A$ is a Non-terminal and $\gamma \in (T \cup N)^*$ (can be empty)
– equivalent to a push-down automaton
Type 3 (regular):
– $A \to wB$ and $A \to w$ (right linear), where $w$ is a string of Terminals and $A$ and $B$ are Non-terminals
– equivalent to a finite state automaton

Slide 21: In a Compiler
Use the least powerful grammars that let us successfully cope with high-level programming languages (and process them efficiently):
– regular grammars (= regular expressions) in the Lexical Analysis phase: 'recognise the words'
– context-free grammars in the Syntax Analysis phase: 'recognise the phrases'; we define our HLL as a grammar over the output of the Lexical Analysis (the tokens)
– deal with context sensitivity in the Semantic Analysis phase

Slide 22: Overall Front-End View
[Diagram: Source program (text file) → Lexical Analyser (regular grammar; Flex) → tokens → Syntax Analyser (context-free grammar; Bison) → tree structure → Semantic Analyser → type-safe tree structure → Intermediate Representation (tree / linearized tree) → Back-end]

Slide 23: The Textbook
Compilers: Principles, Techniques and Tools, Aho, Sethi & Ullman, Addison-Wesley ('The Dragon Book')

Slide 24: Assessment
Building a compiler for a new language:
– Front-end: lexical analysis, parsing
– Back-end: generating assembler code
Some of the work is formal and some practical; the formal work is concentrated at the front-end.

Slide 25: Programming & Tools
– Lexical analyser generator: lex / flex
– Parser generator: yacc / bison
– C / C++: to implement the remainder of the compiler
– Unix environment: make files will be useful for coordinating lex and yacc

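Slide 26: Instant Compilation
Consider the program:

    int main(void) { int a = 3; a = a + 1; }

Given a reasonably sensible assembly language, a hand-compilation might be:

    LDA #3      ; load the constant 3
    STA 1       ; store it at address 1 (the variable a)
    LDA 1       ; load a
    ADD a, #1   ; add the constant 1
    STA 1       ; store the result back into a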

Slide 27: … and an Instant Compiler Could Look Like This

    switch (source_code_construct) {
    case INT_DEC:                     /* int a = <value>; */
        print("LDA #", INT.value);
        print("STA 1");
        break;
    case INT_ADD:                     /* a = a + <value>; */
        print("LDA 1");
        print("ADD a,#", ADD.value);
        print("STA 1");
        break;
    } /* end switch */

Slide 28: The Problems …
– Not efficient: the whole program could simply be LDA #4; STA 1
– Only works for 1 variable
– Only works at one location in memory (usually we let the assembler deal with symbolic addresses)
– Only has 2 programming constructs!
– Not even slightly portable: 1 instruction set and 1 source language

Slide 29: More Problems …
– No error reporting (what about type checking?)
– Assumes:
  – the program is correct
  – programming language constructs are recognised (int a = 3 → INT_DEC)
  – access to values (INT.value, ADD.value)
  – a 1:1 relationship between integers and memory locations

Slide 30: Solutions
We can view compilers as a solution to all of these problems, e.g.:
– only compile correct programs to object code
– recognise all constructs in the language
– improve the efficiency of code (execution speed, memory usage)
– give meaningful error messages to the user
– cope with different target architectures

Slide 31: Why Are Compilers Called Compilers?
In early compilers, one of the main tasks was connecting the object program to standard library functions and I/O devices, i.e. collecting information from different sources (e.g. libraries), which is OS- and processor-dependent.
This is now performed by 'linkers'.
Compile: 'construct by collecting from different sources'.

