Compiler Construction Lawate p m
Administrative info. Instructor: Vana Doufexi. E-mail: vdoufexi@cs.northwestern.edu. Office: Ford Building, #2-229. Hours: e-mail to set up an appointment. Teaching Assistant: TBA.
Administrative info. Course webpage: http://www.cs.northwestern.edu/academics/courses/322 contains news, staff information, lecture notes and other handouts, homeworks and manuals, policies, grades, newsgroup info, and useful links. Newsgroup: name cs.322, nntp news.cs.northwestern.edu.
What is a compiler? A program that reads a program written in some language and translates it into a program written in some other language. Examples: Modula-2 to C, Java to bytecodes, COOL to MIPS code. How was the first compiler created?
Why study compilers? Application of a wide range of theoretical techniques: data structures, theory of computation, algorithms, computer architecture. Good SW engineering experience. Better understanding of programming languages.
Features of compilers. Correctness: preserve the meaning of the code. Speed of target code. Speed of compilation. Good error reporting/handling. Cooperation with the debugger. Support for separate compilation.
Compiler structure: source code -> Front End -> IR -> Back End -> target code. Use an intermediate representation (IR). Why?
Compiler Structure. Front end: recognize legal/illegal programs; report/handle errors; generate IR. The process can be automated. Back end: translate IR into target code; instruction selection; register allocation; instruction scheduling. Many of these problems are NP-complete, so use approximations.
Compiler Structure. Optimization goals: improve the running time of the generated code; improve space, power consumption, etc. How? Perform a number of transformations on the IR, in multiple passes. Important: preserve the meaning of the code.
The Front End. Scanning (a.k.a. lexical analysis): recognize "words" (tokens). Parsing (a.k.a. syntax analysis): check syntax. Semantic analysis: examine meaning (e.g. type checking). Other issues: symbol table (to keep track of identifiers); error detection/reporting/recovery.
The Scanner. Its job: given a character stream, recognize words (tokens), e.g. x = 1 becomes IDENTIFIER EQUAL INTEGER; collect identifier information, e.g. IDENTIFIER corresponds to a lexeme (the actual word x) and its type (acquired from the declaration of x); ignore white space and comments; report errors. Good news: the process can be automated.
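The scanner's job described above can be sketched with Python's standard `re` module. This is a minimal illustration, not the course's scanner: the token names IDENTIFIER, EQUAL and INTEGER come from the slide, while the `#` comment syntax, the `TOKEN_SPEC` table and the `scan` function are assumptions made for the example.

```python
import re

# Token patterns, tried in order. IDENTIFIER, EQUAL and INTEGER follow the slide;
# the '#'-to-end-of-line comment syntax is an assumption for this sketch.
TOKEN_SPEC = [
    ("COMMENT", r"#[^\n]*"),
    ("WHITESPACE", r"\s+"),
    ("INTEGER", r"[0-9]+"),
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),
    ("EQUAL", r"="),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(text):
    """Return (token_kind, lexeme) pairs, skipping white space and comments."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            # Report errors: no pattern matches at this position.
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind = match.lastgroup
        if kind not in ("WHITESPACE", "COMMENT"):
            tokens.append((kind, match.group()))
        pos = match.end()
    return tokens


print(scan("x = 1  # assign"))
# [('IDENTIFIER', 'x'), ('EQUAL', '='), ('INTEGER', '1')]
```

Keeping the lexeme next to the token kind is what lets later phases look up the identifier x in the symbol table.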
The Parser. Its job: check and verify syntax based on specified syntax rules, e.g. IDENTIFIER LPAREN RPAREN make up an EXPRESSION (coming soon: how context-free grammars specify syntax); report errors; build IR, often a syntax tree. Good news: the process can be automated.
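A hand-written recursive-descent parser is one way to picture what the parser does with the token stream. The sketch below assumes a tiny hypothetical grammar (statement -> IDENTIFIER EQUAL expression; expression -> INTEGER | IDENTIFIER | IDENTIFIER LPAREN RPAREN); only the rule IDENTIFIER LPAREN RPAREN comes from the slide, and the tuple-based tree shape is an illustrative choice.

```python
def parse_statement(tokens):
    """Parse `IDENTIFIER EQUAL expression` and return a small syntax tree.

    `tokens` is a list of (token_kind, lexeme) pairs. The grammar is hypothetical.
    """
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            found = peek() or "end of input"
            raise SyntaxError(f"expected {kind}, found {found}")
        lexeme = tokens[pos][1]
        pos += 1
        return lexeme

    def expression():
        # expression -> INTEGER | IDENTIFIER | IDENTIFIER LPAREN RPAREN
        if peek() == "INTEGER":
            return ("int", expect("INTEGER"))
        name = expect("IDENTIFIER")
        if peek() == "LPAREN":
            expect("LPAREN")
            expect("RPAREN")
            return ("call", name)
        return ("var", name)

    target = expect("IDENTIFIER")
    expect("EQUAL")
    tree = ("assign", target, expression())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos][0]}")
    return tree


tokens = [("IDENTIFIER", "x"), ("EQUAL", "="),
          ("IDENTIFIER", "f"), ("LPAREN", "("), ("RPAREN", ")")]
print(parse_statement(tokens))  # ('assign', 'x', ('call', 'f'))
```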
Semantic analysis. Its job: check the meaning of the program, e.g. in x=y, is y defined before being used? Are x and y declared? Are the types of x and y such that you can assign one to the other? Meaning may depend on context. Report errors.
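The checks listed for x=y can be sketched against a symbol table that maps declared identifiers to types. The type names, the int-to-float widening rule and the function names below are assumptions for illustration; a real language defines its own assignability rules.

```python
# Hypothetical rule: same-type assignment is allowed, and an int value
# may additionally be assigned to a float variable.
WIDENING = {("float", "int")}  # (target type, source type)


def assignable(target_type, source_type):
    return target_type == source_type or (target_type, source_type) in WIDENING


def check_assignment(target, source, symbols):
    """Return a list of error messages for the assignment `target = source`.

    `symbols` maps each declared identifier to its type name.
    """
    errors = [f"'{name}' is not declared"
              for name in (target, source) if name not in symbols]
    if not errors and not assignable(symbols[target], symbols[source]):
        errors.append(
            f"cannot assign {symbols[source]} '{source}' to {symbols[target]} '{target}'"
        )
    return errors


symbols = {"x": "float", "y": "int", "s": "string"}
print(check_assignment("x", "y", symbols))  # []
print(check_assignment("y", "s", symbols))
print(check_assignment("x", "z", symbols))
```

The second call reports a type mismatch and the third an undeclared identifier; both are collected rather than raised, so that several errors can be reported in one run.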
IRs. Graphical, e.g. parse tree, DAG. Linear, e.g. three-address code. Hybrid, e.g. linear for blocks of straight-line code, a graph to connect blocks. Low-level or high-level.
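To make the difference between a graphical and a linear IR concrete, the sketch below lowers a small expression tree into three-address code. The tuple encoding of the tree, the temporary names t1, t2, ... and the function `to_three_address` are assumptions for the example.

```python
import itertools


def to_three_address(tree):
    """Lower an expression tree to three-address code.

    A tree is either a variable name (str) or a tuple (op, left, right).
    Returns (list of instructions, name holding the final result).
    """
    code = []
    counter = itertools.count(1)

    def emit(node):
        if isinstance(node, str):
            return node  # a leaf already names its value
        op, left, right = node
        left_name = emit(left)
        right_name = emit(right)
        temp = f"t{next(counter)}"
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = emit(tree)
    return code, result


# Graphical form of a + b * c, with * binding tighter than +.
code, result = to_three_address(("+", "a", ("*", "b", "c")))
print("\n".join(code))
# t1 = b * c
# t2 = a + t1
```

Each instruction has at most one operator and names its result, which is what makes the linear form convenient for later passes.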
The scanning process. Main goal: recognize words. How? By recognizing patterns, e.g. an identifier is a sequence of letters or digits that starts with a letter. Lexical patterns form a regular language. Regular languages are described using regular expressions (REs). Can we create an automatic RE recognizer? Yes! (Hold that thought.)
The scanning process. Definition: regular expressions (over alphabet Σ). ε is an RE denoting {ε}. If a ∈ Σ, then a is an RE denoting {a}. If r and s are REs, then: (r) is an RE denoting L(r); r|s is an RE denoting L(r) ∪ L(s); rs is an RE denoting L(r)L(s); r* is an RE denoting the Kleene closure of L(r). Property: REs are closed under many operations. This allows us to build complex REs.
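The inductive definition of REs can be read directly as a recursive membership test. The sketch below follows the definition case by case; the tuple encoding ("eps", "sym", "alt", "cat", "star") is an assumption, and the brute-force splitting is exponential, so it illustrates the semantics rather than how a scanner would actually match.

```python
def matches(r, w):
    """Return True if the string w is in L(r), following the definition of REs."""
    kind = r[0]
    if kind == "eps":   # epsilon denotes the language containing only the empty string
        return w == ""
    if kind == "sym":   # a single symbol a denotes {a}
        return w == r[1]
    if kind == "alt":   # r|s denotes the union of L(r) and L(s)
        return matches(r[1], w) or matches(r[2], w)
    if kind == "cat":   # rs denotes L(r)L(s): try every split point
        return any(matches(r[1], w[:i]) and matches(r[2], w[i:])
                   for i in range(len(w) + 1))
    if kind == "star":  # r* is the Kleene closure: zero or more non-empty pieces
        return w == "" or any(matches(r[1], w[:i]) and matches(r, w[i:])
                              for i in range(1, len(w) + 1))
    raise ValueError(f"unknown RE node {kind!r}")


# Identifier over a toy alphabet: letter (letter | digit)*
letter = ("alt", ("sym", "a"), ("sym", "b"))
digit = ("alt", ("sym", "0"), ("sym", "1"))
identifier = ("cat", letter, ("star", ("alt", letter, digit)))

print(matches(identifier, "a01"))  # True
print(matches(identifier, "0a"))   # False
```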
The scanning process. Definition: a Deterministic Finite Automaton (DFA) is a five-tuple (Σ, S, δ, s0, F) where Σ is the alphabet, S is the set of states, δ is the transition function (S × Σ → S), s0 is the starting state, and F is the set of final states (F ⊆ S). Notation: use a transition diagram to describe a DFA. DFAs are equivalent to REs. Hey! We just came up with a recognizer!
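A DFA translates almost directly into a table-driven recognizer. The sketch below encodes a two-state DFA for the identifier pattern (a letter followed by letters or digits). The state names, the grouping of characters into classes, and the use of a missing table entry as an implicit error state are assumptions of this example; the formal definition has a total transition function.

```python
def char_class(ch):
    """Collapse the alphabet into the classes the transition table uses."""
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"


START, IN_ID = "s0", "s1"

# Transition function as a table: (state, character class) -> next state.
# Missing entries stand for an implicit error state.
DELTA = {
    (START, "letter"): IN_ID,
    (IN_ID, "letter"): IN_ID,
    (IN_ID, "digit"): IN_ID,
}
FINAL = {IN_ID}


def accepts(word):
    state = START
    for ch in word:
        state = DELTA.get((state, char_class(ch)))
        if state is None:
            return False  # fell into the error state
    return state in FINAL


print(accepts("x1"))  # True
print(accepts("1x"))  # False
```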
The scanning process. Goal: automate the process. Idea: start with an RE and build a DFA. How? Build a non-deterministic finite automaton (Thompson's construction); convert that to a deterministic one (subset construction); minimize the DFA (Hopcroft's algorithm); implement it. Existing scanner generator: flex.
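The NFA-to-DFA step of this pipeline can be sketched as follows. This is a minimal subset construction under stated assumptions: the NFA is written by hand (it is not produced by Thompson's construction), it recognizes (a|b)*abb, epsilon moves are labelled with `None`, and the dictionary encoding and function names are choices made for the example. Minimization is not shown.

```python
from collections import deque

EPS = None  # label used in this sketch for epsilon moves


def epsilon_closure(states, nfa):
    """All NFA states reachable from `states` using only epsilon moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def subset_construction(nfa, start, finals, alphabet):
    """Build a DFA whose states are sets of NFA states."""
    d_start = epsilon_closure({start}, nfa)
    delta = {}
    seen = {d_start}
    work = deque([d_start])
    while work:
        current = work.popleft()
        for sym in alphabet:
            moved = set()
            for s in current:
                moved |= set(nfa.get((s, sym), ()))
            if not moved:
                continue  # no transition: implicit error state
            target = epsilon_closure(moved, nfa)
            delta[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                work.append(target)
    d_finals = {d for d in seen if d & finals}
    return d_start, delta, d_finals


def dfa_accepts(word, d_start, delta, d_finals):
    state = d_start
    for ch in word:
        state = delta.get((state, ch))
        if state is None:
            return False
    return state in d_finals


# Hand-written NFA for (a|b)*abb: (state, label) -> set of next states.
NFA = {
    (0, EPS): {1},
    (1, "a"): {1, 2},
    (1, "b"): {1},
    (2, "b"): {3},
    (3, "b"): {4},
}
d_start, delta, d_finals = subset_construction(NFA, start=0, finals={4}, alphabet="ab")
print(dfa_accepts("aabb", d_start, delta, d_finals))  # True
print(dfa_accepts("ab", d_start, delta, d_finals))    # False
```

In practice a scanner generator such as flex performs these steps from the RE specification, so the tables are not written by hand.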