Compiler Construction 2011, Lecture 2

http://lara.epfl.ch
http://tiny.cc/compilers
Drawing Hands, M.C. Escher, 1948

Staff:
Viktor Kuncak – lectures
Etienne Kneuss and Philippe Suter – labs
Eva Darulova and Giuliano Losa – exercises
Regis Blanc – assistant
Yvette Gallay – secretary

Reminder

Register on:
- IS Academia
- Moodle – so you can get our emails
- our repository (reachable from the course page)

Please form your lab groups.

Compiler Construction

source code (e.g. Scala, Java, C) – easy to write:

  i = 0
  while (i < 10) {
    a[i] = 7*i+3
    i = i + 1
  }

The compiler (scalac, gcc) passes the program through a pipeline of phases:

  characters --(lexer)--> words --(parser)--> trees --> type check --> optimizer (data-flow graphs) --> code gen

For the program above, the lexer turns the character stream "i = 0 LF w h i l e ..." into the token stream "i = 0 while ( i < 10 ) ...", and the parser builds trees such as assign(i, 0) and while(i < 10, assign(a[i], 7*i+3)).

machine code (e.g. x86, ARM, JVM) – efficient to execute:

  mov R1, #0
  mov R2, #40
  mov R3, #3
  jmp +12
  mov (a+R1), R3
  add R1, R1, #4
  add R3, R3, #7
  cmp R1, R2
  blt -16

source code:

  id3 = 0
  while (id3 < 10) {
    println("", id3);
    id3 = id3 + 1
  }

The lexer turns the stream of characters "i d 3 = 0 LF w ..." into the stream of words (tokens) "id3 = 0 while ( id3 < 10 ...", which the parser then turns into trees.

The lexer is specified using regular expressions. It groups characters into tokens and classifies them into token classes.

Today: Lexical Analysis

Summary:
- a lexical analyzer maps a stream of characters into a stream of tokens
- while doing that, it typically needs only bounded memory
- we can specify tokens for a lexical analyzer using regular expressions
- it is not difficult to construct a lexical analyzer manually; we give an example
- for manually constructed analyzers, we often use the first character to decide on the token class; this motivates the notion first(L) = { a | aw in L }
- we follow the maximal munch rule: the lexical analyzer should eagerly accept the longest token that it can recognize from the current point
- it is possible to automate the construction of lexical analyzers; the starting point is the conversion of regular expressions to automata
- tools that automate this construction are part of compiler-compilers, such as JavaCC, described in the Tiger book
- automated construction of lexical analyzers from regular expressions is an example of compilation for a domain-specific language
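The maximal munch rule can be sketched as a tiny table-driven lexer: at each position, take the longest prefix matched by any token rule, breaking ties by rule priority. This is a sketch, not the course lexer; the token classes and patterns below are illustrative and cover only a fragment of the While language.

```scala
import scala.util.matching.Regex

// priority-ordered token rules (earlier rule wins ties in length)
val rules: List[(Regex, String)] = List(
  ("""while|if|else|println""".r, "KEYWORD"),
  ("""[a-zA-Z][a-zA-Z0-9]*""".r, "ID"),
  ("""[0-9]+""".r, "INT"),
  ("""==""".r, "EQ"),
  ("""=""".r, "ASSIGN"),
  ("""\s+""".r, "SKIP")
)

def lex(input: String): List[(String, String)] = {
  var rest = input
  val out = scala.collection.mutable.ListBuffer.empty[(String, String)]
  while (rest.nonEmpty) {
    // collect the prefix match (if any) of every rule at the current point
    val matches = rules.flatMap { case (re, cls) =>
      re.findPrefixOf(rest).map(m => (m, cls))
    }
    if (matches.isEmpty) sys.error(s"no token at: $rest")
    // maximal munch: longest match wins; maxBy keeps the first (highest-priority) maximum
    val (lexeme, cls) = matches.maxBy(_._1.length)
    if (cls != "SKIP") out += ((lexeme, cls))
    rest = rest.drop(lexeme.length)
  }
  out.toList
}
```

Note how longest match resolves keyword-versus-identifier cases: "while" is a KEYWORD, but "while1" lexes as a single ID because the identifier match is longer.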

While Language – Idea
- small language used to illustrate key concepts
- also used in your first lab – an interpreter; later labs will use a more complex language
- we continue to use While in lectures
- 'while' and 'if' are the control statements
- no procedures, no exceptions
- the only variables are of 'int' type
- no variable declarations; variables are initially zero
- no objects, pointers, arrays

While Language – Example Programs

Does this program terminate for every initial value of x? (Collatz conjecture – open)

  x = 13;
  while (x > 1) {
    println("x=", x);
    if (x % 2 == 0) {
      x = x / 2;
    } else {
      x = 3 * x + 1;
    }
  }

Nested loop:

  while (i < 100) {
    j = i + 1;
    while (j < 100) {
      println(" ", i);
      println(",", j);
      j = j + 1;
    }
    i = i + 1;
  }

Tokens (Words) of the While Language

Regular expressions:

  Ident        ::= letter (letter | digit)*
  integerConst ::= digit digit*
  stringConst  ::= " AnySymbolExceptQuote* "

  keywords:        if else while println
  special symbols: ( ) && < == + - * / % ! { } ; ,

  letter ::= a | b | c | ... | z | A | B | C | ... | Z
  digit  ::= 0 | 1 | ... | 8 | 9

Regular Expressions: Definition

One way to denote (often infinite) languages. A regular expression is an expression built from:
- the empty language
- {ε}, denoted just ε
- {a} for a in Σ, denoted simply by a
- union, denoted | (or, sometimes, +)
- concatenation, written as multiplication or by juxtaposition
- Kleene star *

Identifiers: letter (letter | digit)* (letter, digit are shorthands from before)
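As a quick check, the identifier expression letter (letter | digit)* corresponds to the following pattern in JVM regex syntax (a sketch; the helper name is ours):

```scala
// letter (letter | digit)* written as a JVM regex pattern
val identifierPattern = "[a-zA-Z][a-zA-Z0-9]*"

// String.matches requires the pattern to cover the whole string
def isIdentifier(s: String): Boolean = s.matches(identifierPattern)
```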

History: Kleene (from Wikipedia) Stephen Cole Kleene  (January 5, 1909, Hartford, Connecticut, United States – January 25, 1994, Madison, Wisconsin) was an American mathematician who helped lay the foundations for theoretical computer science. One of many distinguished students of Alonzo Church, Kleene, along with Alan Turing, Emil Post, and others, is best known as a founder of the branch of mathematical logic known as recursion theory. Kleene's work grounds the study of which functions are computable. A number of mathematical concepts are named after him: Kleene hierarchy, Kleene algebra, the Kleene star (Kleene closure), Kleene's recursion theorem and the Kleene fixpoint theorem. He also invented regular expressions, and was a leading American advocate of mathematical intuitionism.

Manually Constructing Lexers


Lexer Input and Output

The lexer turns a stream of characters "i d 3 = 0 LF w ..." into a stream of tokens "id3 = 0 while ( id3 < 10 ...".

Stream of Char-s (lazy List[Char]):

  class CharStream(fileName: String) {
    val file = new BufferedReader(new FileReader(fileName))
    var current: Char = ' '
    var eof: Boolean = false

    def next = {
      if (eof) throw EndOfInput("reading " + file)
      val c = file.read()
      eof = (c == -1)
      current = c.asInstanceOf[Char]
    }

    next
  }

Stream of Token-s:

  sealed abstract class Token
  case class ID(content: String) extends Token   // "id3"
  case class IntConst(value: Int) extends Token  // 10
  case class AssignEQ() extends Token            // '='
  case class CompareEQ() extends Token           // '=='
  case class MUL() extends Token                 // '*'
  case class PLUS() extends Token                // '+'
  case class LEQ() extends Token                 // '<='
  case class OPAREN() extends Token              // '('
  case class CPAREN() extends Token              // ')'
  ...
  case class IF() extends Token                  // 'if'
  case class WHILE() extends Token               // 'while'
  case class EOF() extends Token                 // end of file

  class Lexer(ch: CharStream) {
    var current: Token = _
    def next: Unit = {
      // lexer code here
    }
  }

Identifiers and Keywords

regular expression for identifiers: letter (letter | digit)*

  if (isLetter) {
    b = new StringBuffer
    while (isLetter || isDigit) {
      b.append(ch.current)
      ch.next
    }
    keywords.get(b.toString) match {
      case None     => token = ID(b.toString)
      case Some(kw) => token = kw
    }
  }

Keywords look like identifiers, but are simply listed as keywords in the language definition. keywords is a constant Map from strings to keyword tokens; if a word is not in the map, it is an ordinary identifier.
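The keyword-map pattern can be sketched in isolation; IF, WHILE, and ID here are stand-ins for the token classes above:

```scala
sealed trait Token
case class ID(name: String) extends Token
case object IF extends Token
case object WHILE extends Token

// a constant map from strings to keyword tokens
val keywords: Map[String, Token] = Map("if" -> IF, "while" -> WHILE)

// anything not in the map is an ordinary identifier
def classify(word: String): Token = keywords.getOrElse(word, ID(word))
```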

Integer Constants

regular expression for integers: digit digit*

  if (isDigit) {
    k = 0
    while (isDigit) {
      k = 10 * k + (ch.current - '0')
      ch.next
    }
    token = IntConst(k)
  }
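The digit-accumulation step can be checked on its own. This is a sketch: the lexer above consumes characters one at a time, while here we simply fold over a digit string (the helper name is ours).

```scala
// build the integer value of a digit string, most significant digit first:
// k starts at 0 and each digit d updates it to 10 * k + d
def intValue(digits: String): Int =
  digits.foldLeft(0)((k, c) => 10 * k + (c - '0'))
```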

Decision Tree to Map Symbols to Tokens

  ch.current match {
    case '(' => { current = OPAREN; ch.next; return }
    case ')' => { current = CPAREN; ch.next; return }
    case '+' => { current = PLUS; ch.next; return }
    case '/' => { current = DIV; ch.next; return }
    case '*' => { current = MUL; ch.next; return }
    case '=' => {
      ch.next
      if (ch.current == '=') { ch.next; current = CompareEQ; return }
      else { current = AssignEQ; return }
    }
    case '<' => {
      ch.next
      if (ch.current == '=') { ch.next; current = LEQ; return }
      else { current = LESS; return }
    }
  }

Skipping Comments

  if (ch.current == '/') {
    ch.next
    while (!isEOL && !isEOF) {
      ch.next
    }
  }

Nested comments?  /* foo /* bar */ baz */
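The nested-comments question can be answered with a depth counter: increment on /*, decrement on */, and emit characters only at depth zero. This is a sketch operating on a whole string rather than the CharStream above, and the function name is ours.

```scala
// strip /* ... */ comments, allowing nesting via a depth counter
def stripComments(src: String): String = {
  val out = new StringBuilder
  var i = 0
  var depth = 0
  while (i < src.length) {
    if (i + 1 < src.length && src(i) == '/' && src(i + 1) == '*') {
      depth += 1; i += 2            // comment opens: go one level deeper
    } else if (depth > 0 && i + 1 < src.length && src(i) == '*' && src(i + 1) == '/') {
      depth -= 1; i += 2            // comment closes: come back up one level
    } else {
      if (depth == 0) out += src(i) // keep characters only outside comments
      i += 1
    }
  }
  out.toString
}
```

With a single counter instead of a boolean flag, the example from the slide, /* foo /* bar */ baz */, is skipped in its entirety rather than ending at the first */.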

Further Important Topics
- longest match rule
- combining the pieces together
- computing first symbols for regular expressions
- example of a tiny lexical analyzer: see the wiki

Computing first symbols

Computing nullable expressions
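Both computations can be sketched over a small regular-expression AST (the constructor names are ours): nullable(r) says whether ε is in L(r), and first(r) = { a | aw in L(r) } as defined earlier, using nullable to decide whether a concatenation's first symbols can also come from its right part.

```scala
// a small AST for regular expressions
sealed trait Regex
case object EmptySet extends Regex                 // the empty language
case object Eps extends Regex                      // {ε}
case class Chr(c: Char) extends Regex              // {c}
case class Union(l: Regex, r: Regex) extends Regex
case class Concat(l: Regex, r: Regex) extends Regex
case class Star(r: Regex) extends Regex

// nullable(r): does L(r) contain the empty string?
def nullable(r: Regex): Boolean = r match {
  case EmptySet      => false
  case Eps           => true
  case Chr(_)        => false
  case Union(l, rr)  => nullable(l) || nullable(rr)
  case Concat(l, rr) => nullable(l) && nullable(rr)
  case Star(_)       => true
}

// first(r) = { a | a w ∈ L(r) }
def first(r: Regex): Set[Char] = r match {
  case EmptySet      => Set()
  case Eps           => Set()
  case Chr(c)        => Set(c)
  case Union(l, rr)  => first(l) ++ first(rr)
  case Concat(l, rr) => if (nullable(l)) first(l) ++ first(rr) else first(l)
  case Star(rr)      => first(rr)
}
```

For example, with letters restricted to {a, b} and the digit 0, the identifier expression letter (letter | digit)* has first = {a, b} and is not nullable.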

Automating Construction of Lexers

Example in JavaCC

  TOKEN: {
    <IDENTIFIER: <LETTER> (<LETTER> | <DIGIT> | "_")* >
  | <CONSTANT: <DIGIT> (<DIGIT>)* >
  | <LETTER: ["a"-"z"] | ["A"-"Z"]>
  | <DIGIT: ["0"-"9"]>
  }

  SKIP: {
    " " | "\n" | "\t"
  }

From this specification, the lexer code is generated automatically.

Finite Automaton

Kinds of finite automata:
- deterministic
- non-deterministic
- with epsilon transitions
- with regular expressions on edges

Interpretation of Non-Determinism

For a given string, some paths in the automaton lead to accepting states and some to rejecting states. Does the automaton accept? Yes, if there exists an accepting path.

Continued in the next lecture.
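The existential reading of acceptance can be sketched by simulating all paths at once, tracking the set of currently reachable states (a sketch; the state numbering and transition encoding are ours):

```scala
// an NFA accepts a word if some path from a start state ends in an accepting state;
// we simulate all paths simultaneously via the set of reachable states
case class NFA(start: Set[Int], delta: (Int, Char) => Set[Int], accepting: Set[Int]) {
  def accepts(w: String): Boolean = {
    val reached = w.foldLeft(start)((states, c) => states.flatMap(s => delta(s, c)))
    reached.exists(accepting)
  }
}

// example: strings over {a, b} ending in "ab"
val endsInAb = NFA(
  start = Set(0),
  delta = {
    case (0, 'a') => Set(0, 1) // non-deterministic guess: this 'a' starts the final "ab"
    case (0, 'b') => Set(0)
    case (1, 'b') => Set(2)
    case _        => Set()
  },
  accepting = Set(2)
)
```

On input "aab" the reachable sets evolve as {0} → {0,1} → {0,1} → {0,2}; since state 2 is accepting, some path accepts, so the automaton accepts.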