Intro to compilers Based on end of Ch. 1 and start of Ch. 2 of textbook, plus a few additional references.


Announcements Homework 1 is due Wednesday. Email a PDF, or submit a paper copy in class. Be sure to at least glance at the syllabus, and send questions my way. No class Friday; instead, please read Chapter 2.1 of the textbook and look at the first two pages of 2.2.

Compilation The process by which programming languages are turned into assembly or machine code. Compilation is a central topic in programming languages: compilers are essentially translators, so they must semantically understand the code. Output: assembly, machine code, or some intermediate language (e.g., Java → bytecode).

Compilation phases

Front end phases The first 3 phases are known as the "front end", where the goal is to figure out the meaning of the program. The last 3 are the "back end", and are used to construct an equivalent target program in the output language. The two halves are split to keep things independent: the front end can be shared between different target systems, and the back end can be shared between different source languages.

First phase: scanning Divides the program into "tokens", the smallest meaningful units; this saves time, since character-by-character processing is slow. We can tune the scanner better if its job is simple, and it also saves a great deal of complexity for later stages. You can design a parser to take characters instead of tokens as input, but it isn't pretty. Theoretically, scanning is recognition of a regular language, e.g., via a deterministic finite automaton (DFA).

Phase 2: parsing Parsing is recognition of a context-free language, done via something called a pushdown automaton (PDA). Parsing discovers the "context-free" structure of the program. Informally, it finds the structure you can describe with syntax diagrams.

Phase 3: semantic analysis Semantic analysis is the discovery of meaning in the program The compiler actually does what is called STATIC semantic analysis. That's the meaning that can be figured out at compile time Some things (e.g., array subscript out of bounds) can't be figured out until run time. Things like that are part of the program's DYNAMIC semantics

Phase 4: intermediate form An intermediate form (IF) is generated after semantic analysis (if the program passes all checks). IFs are often chosen for machine independence, ease of optimization, or compactness. Note: these goals are somewhat contradictory! IFs often resemble machine code for some imaginary idealized machine, e.g., a stack machine or a machine with arbitrarily many registers. Many compilers actually move the code through more than one IF.

Bottom phases Optimization takes an intermediate-code program and produces another one that does the same thing faster, or in less space. The optimization phase is optional, but is often key for speed. The code generation phase produces assembly language or (sometimes) machine language.

Bottom phases (cont) Certain machine-specific optimizations (use of special instructions or addressing modes, etc.) may be performed during or after target code generation Symbol table: all phases rely on a symbol table that keeps track of all the identifiers in the program and what the compiler knows about them This symbol table may be retained (in some form) for use by a debugger, even after compilation has completed

An Overview of Compilation Lexical and Syntax Analysis: back to our GCD program (in C)

int main() {
    int i = getint(), j = getint();
    while (i != j) {
        if (i > j) i = i - j;
        else j = j - i;
    }
    putint(i);
}

Copyright © 2009 Elsevier

An Overview of Compilation Lexical and Syntax Analysis GCD Program Tokens Scanning (lexical analysis) groups characters into tokens, the smallest meaningful units of the program int main ( ) { int i = getint ( ) , j = getint ( ) ; while ( i != j ) { if ( i > j ) i = i - j ; else j = j - i ; } putint ( i ) ; Copyright © 2009 Elsevier

An Overview of Compilation Context-Free Grammar and Parsing Parsing organizes tokens into a parse tree that represents higher-level constructs in terms of their constituents Potentially recursive rules known as a context-free grammar define the ways in which these tokens can combine Copyright © 2009 Elsevier

An Overview of Compilation Context-Free Grammar and Parsing Example (while loop in C):

iteration-statement → while ( expression ) statement

statement, in turn, is often a list enclosed in braces:

statement → compound-statement
compound-statement → { block-item-list opt }

where

block-item-list opt → block-item-list
block-item-list opt → ϵ

and

block-item-list → block-item
block-item-list → block-item-list block-item
block-item → declaration
block-item → statement

Copyright © 2009 Elsevier

An Overview of Compilation Example: our GCD program's parse tree (figure; the subtrees labeled A and B are expanded on later slides) Copyright © 2009 Elsevier

An Overview of Compilation Context-Free Grammar and Parsing (continued) Copyright © 2009 Elsevier

An Overview of Compilation Context-Free Grammar and Parsing (continued): the parse subtrees labeled A and B (figure) Copyright © 2009 Elsevier

Ch. 2 – a deeper look We’ll take a deeper look at scanning and parsing, the two parts of the “front end” of this process Each has deeper ties to theoretical models of computation, and useful concepts like regular expressions You may have seen these if you’ve done string manipulations.

Regular expressions A regular expression is defined (recursively) as one of:
- a single character
- the empty string, ε
- two regular expressions concatenated
- two regular expressions connected by an "or", usually written x | y
- zero or more copies of a regular expression, written x* and called the Kleene star

Regular languages Regular languages are then the class of languages which can be described by a regular expression Example: L = 0*10* Another: L = (1|0)*

More regular languages Exercise: Give the regular expression for the language of binary strings that begin with a 0 and end with a 1 Exercise (a bit harder): Give the regular expression for the language of binary strings that start with a 0 and have odd length

A more realistic example Unsigned numbers in Pascal: Examples: 4, or 82.3, or 5.23e-26 Formally:

Another view: DFAs Regular languages are also precisely the set of languages that can be accepted by a deterministic finite automaton (DFA). Formally, a DFA consists of:
- a set of states
- an input alphabet
- a start state
- a set of accept states
- a transition function: given a state and an input symbol, it outputs another state

DFAs More often, we’ll just draw a picture (like in graph theory) Example:

DFAs What regular language does the following DFA accept?

DFA examples What’s the DFA for the regular language: 1(0|1)*0 What’s the regular language accepted by this DFA?

Scanning Recall scanner is responsible for tokenizing source removing comments (often) dealing with pragmas (i.e., significant comments) saving text of identifiers, numbers, strings saving source locations (file, line, column) for error messages Copyright © 2009 Elsevier

Scanning Suppose we are building an ad-hoc (hand-written) scanner for Pascal: We read the characters one at a time with look-ahead. If the character is one of the one-character tokens { ( ) [ ] < > , ; = + - etc. } we announce that token. If it is a ., we look at the next character: if that is also a dot, we announce .. (the subrange operator); otherwise, we announce . and reuse the look-ahead. Copyright © 2009 Elsevier

Scanning If it is a <, we look at the next character: if that is a =, we announce <=; otherwise, we announce < and reuse the look-ahead, etc. If it is a letter, we keep reading letters and digits (and maybe underscores) until we can't anymore; then we check to see if it is a reserved word. Copyright © 2009 Elsevier

Scanning If it is a digit, we keep reading until we find a non-digit: if that is not a ., we announce an integer; otherwise, we keep looking for a real number. If the character after the . is not a digit, we announce an integer and reuse the . and the look-ahead. Copyright © 2009 Elsevier

Scanning Pictorial representation of a scanner for calculator tokens, in the form of a finite automaton Copyright © 2009 Elsevier

Coding DFAs (scanners) That's all well and good, but how do we program this stuff? A bunch of if/switch/case statements A table and driver (flex or other tools) Both have merits, and are described further in the book. We'll mainly use the second route in homework, simply because there are many good tools out there.

Scanners Writing a pure DFA as a set of nested case statements is a surprisingly useful programming technique, though it's often easier to use perl, awk, or sed; for details see Figure 2.4 in the text. Table-driven DFAs are what lex and scangen produce: lex (flex) in the form of C code (this will be an upcoming homework); scangen in the form of numeric tables and a separate driver (for details see Figure 2.12).

Next week We’ll see a bit more about DFAs, and introduce NFAs. This is the rest of section 2.2, if you want to look ahead a bit. By the end of the week, we’ll move to discussing one table-driven DFA, flex, and have the first programming assignment over it the week after.