Presentation is loading. Please wait.

Presentation is loading. Please wait.

Compiler Construction 2011, Lecture 2

Similar presentations


Presentation on theme: "Compiler Construction 2011, Lecture 2"— Presentation transcript:

1 Compiler Construction 2011, Lecture 2
Drawing Hands M.C. Escher, 1948 Compiler Construction 2011, Lecture 2 Staff: Viktor Kuncak – Lectures Etienne Kneuss and Philippe Suter – {labs} Eva Darulova and Giuliano Losa – Exercises Regis Blanc – assistant Yvette Gallay – secretary Good starting 4min seemed to soon, but it’s good to start questions Minute 8 - quite dynamic. Good examples minute 14.5 – questions stressed in the start – better later Every 15 minutes some questions – and pause map rhoteric  real questions (write on a piece of paper, or tell to person next to you) Group discussions unclear 30 minutes is good time to introduce the course ----- merging both slides and board worked well initially, not so good later when asking questions, if they don’t answer, then ask them to discuss with a colleague Look less towards the blackboard 15-20min attention span Organization of the blackboard 2-5min to solve the small subproblem Good structure of the lecture vk: give questions after every lecture Grading percentage- but also give examples Did you understand the question Relounching the question

2 Reminder Register on: IS academia Moodle – so you can get our emails
Our wonderful repository (reachable from course page) So please form the groups

3 (e.g. x86, arm, JVM) efficient to execute
Compiler i=0 while (i < 10) { a[i] = 7*i+3 i = i + 1 } source code (e.g. Scala, Java,C) easy to write Construction data-flow graphs Compiler (scalac, gcc) optimizer i = 0 LF w h i l e i = 0 while ( i < 10 ) lexer parser assign while i 0 + * 3 7 i a[i] < 10 type check code gen characters words trees mov R1,#0 mov R2,#40 mov R3,#3 jmp +12 mov (a+R1),R3 add R1, R1, #4 add R3, R3, #7 cmp R1, R2 blt -16 machine code (e.g. x86, arm, JVM) efficient to execute

4 println(“”,id3); id3 = id3 + 1 } source code Construction
Compiler Id3 = 0 while (id3 < 10) { println(“”,id3); id3 = id3 + 1 } source code Construction Compiler (scalac, gcc) i d3 = 0 LF w id3 = 0 while ( id3 < 10 ) lexer parser assign while i 0 + * 3 7 i a[i] < 10 characters words (tokens) trees Lexer is specified using regular expressions. Groups characters into tokens and classifies them into token classes.

5 Today: Lexical Analysis. Summary:
lexical analyzer maps a stream of characters into a stream of tokens while doing that, it typically needs only bounded memory we can specify tokens for a lexical analyzers using regular expressions it is not difficult to construct a lexical analyzer manually we give an example for manually constructed analyzers, we often use the first character to decide on token class; a notion first(L) = { a | aw in L } we follow the maximal munch rule: lexical analyzer should eagerly accept the longest token that it can recognize from the current point it is possible to automate the construction of lexical analyzers; the starting point is conversion of regular expressions to automata tools that automate this construction are part of compiler-compilers, such as JavaCC described in the Tiger book automated construction of lexical analyzers from regular expressions is an example of compilation for a domain-specific language

6 While Language – Idea Small language used to illustrate key concepts
Also used in your first lab – interpreter later labs will use a more complex language we continue to use while in lectures ‘while’ and ‘if’ are the control statements no procedures, no exceptions the only variables are of ‘int’ type no variable declarations, they are initially zero no objects, pointers, arrays

7 While Language – Example Programs
x = 13; while (x > 1) {   println("x=", x);   if (x % 2 == 0) {     x = x / 2;   } else {     x = 3 * x + 1;   } } while (i < 100) { j = i + 1; while (j < 100) { println(“ “,i); println(“,”,j); j = j + 1; } i = i + 1; Does the program terminate for every initial value of x? (Collatz conjecture - open) Nested loop

8 Tokens (Words) of the While Language
Ident ::= letter (letter | digit)* integerConst ::= digit digit* stringConst ::= “ AnySymbolExceptQuote* “ keywords if else while println special symbols ( ) && < == + - * / % ! - { } ; , letter ::= a | b | c | … | z | A | B | C | … | Z digit ::= 0 | 1 | … | 8 | 9 regular expressions

9 Regular Expressions: Definition
One way to denote (often infinite) languages Regular expression is an expression built from: empty language  {ε}, denoted just ε {a} for a in Σ, denoted simply by a union, denoted | or, sometimes + concatenation, as multiplication or nothing Kleene star * Identifiers: letter (letter | digit)* (letter,digit are shorthands from before)

10 History: Kleene (from Wikipedia)
Stephen Cole Kleene  (January 5, 1909, Hartford, Connecticut, United States – January 25, 1994, Madison, Wisconsin) was an American mathematician who helped lay the foundations for theoretical computer science. One of many distinguished students of Alonzo Church, Kleene, along with Alan Turing, Emil Post, and others, is best known as a founder of the branch of mathematical logic known as recursion theory. Kleene's work grounds the study of which functions are computable. A number of mathematical concepts are named after him: Kleene hierarchy, Kleene algebra, the Kleene star (Kleene closure), Kleene's recursion theorem and the Kleene fixpoint theorem. He also invented regular expressions, and was a leading American advocate of mathematical intuitionism.

11 Manually Constructing Lexers

12 println(“”,id3); id3 = id3 + 1 } source code Construction
Compiler Id3 = 0 while (id3 < 10) { println(“”,id3); id3 = id3 + 1 } source code Construction Compiler (scalac, gcc) i d3 = 0 LF w id3 = 0 while ( id3 < 10 ) lexer parser assign while i 0 + * 3 7 i a[i] < 10 characters words (tokens) trees Lexer is specified using regular expressions. Groups characters into tokens and classifies them into token classes.

13 Lexer input and Output Stream of Char-s Stream of Token-s lexer
class CharStream(fileName : String){ val file = new BufferedReader( new FileReader(fileName)) var current : Char = ' ' var eof : Boolean = false def next = { if (eof) throw EndOfInput("reading" + file) val c = file.read() eof = (c == -1) current = c.asInstanceOf[Char] } next Stream of Token-s sealed abstract class Token case class ID(content : String) // “id3” extends Token case class IntConst(value : Int) // 10 extends Token case class AssignEQ() ‘=‘ extends Token case class CompareEQ // ‘==‘ case class MUL() extends Token // ‘*’ case class PLUS() extends Token // + case clas LEQ extends Token // ‘<=‘ case class OPAREN extends Token //( case class CPAREN extends Token //) ... case class IF() extends Token // ‘if’ case class WHILE() extends Token case class EOF() extends Token // End Of File i d3 = 0 LF w id3 = 0 while ( id3 < 10 ) lexer class Lexer(ch : CharStream) { var current : Token def next : Unit = { lexer code here }

14 Identifiers and Keywords
regular expression for identifiers: letter (letter|digit)* if (isLetter) { b = new StringBuffer while (isLetter || isDigit) { b.append(ch.current) ch.next } } keywords.lookup(b.toString) { case None => token=ID(b.toString) case Some(kw) => token=kw } Keywords look like identifiers, but are simply indicated as keywords in language definition A constant Map from strings to keyword tokens if not in map, then it is ordinary identifier

15 Integer Constants regular expression for integers: digit digit*
if (isDigit) { k = while (isDigit) { k = ch.next } token = IntConst(k) } Keywords look like identifiers, but are simply indicated as keywords in language definition A constant Map from strings to keyword tokens if not in map, then it is ordinary identifier

16 Decision Tree to Map Symbols to Tokens
ch.current match { case '(' => {current = OPAREN; ch.next; return} case ')' => {current = CPAREN; ch.next; return} case '+' => {current = PLUS; ch.next; return} case '/' => {current = DIV; ch.next; return} case '*' => {current = MUL; ch.next; return} case '=' => { ch.next if (ch.current=='=') {ch.next; current = CompareEQ; return} else {current = AssignEQ; return} } case '<' => { if (ch.current=='=') {ch.next; current = LEQ; return} else {current = LESS; return}

17 Skipping Comments if (ch.current='/') { ch.next while (!isEOL && !isEOF) { } Nested comments? /* foo /* bar */ baz */

18 Further Important Topics
Longest Match Rule Combining pieces together computing first symbols for regular expressions Example of tiny lexical analyzer see wiki

19 Computing first symbols

20 Computing nullable expressions

21 Automating Construction of Lexers

22 Example in javacc TOKEN: { <IDENTIFIER: <LETTER> (<LETTER> | <DIGIT> | "_")* > | <CONSTANT: <DIGIT> (<DIGIT>)* > | <LETTER: ["a"-"z"] | ["A"-"Z"]> | <DIGIT: ["0"-"9"]> } SKIP: { " " | "\n" | "\t"

23 Finite Automaton Kinds of finite automata: deterministic
non-deterministic with epsilon transition with regular expressions on edges

24 Interpretation of Non-Determinism
For a given string, some paths in automaton lead to accepting, some to rejecting states Does the automaton accept? yes, if there exists an accepting path Continued in next lecture


Download ppt "Compiler Construction 2011, Lecture 2"

Similar presentations


Ads by Google