Compiler construction in4020 – lecture 2 Koen Langendoen Delft University of Technology The Netherlands.



Summary of lecture 1
- compiler is a structured toolbox
  - front-end: program text → annotated AST
  - back-end: annotated AST → executable code
- lexical analysis: program text → tokens
  - token specifications
  - implementation by hand
[Figure: compiler structure: program in some source language → front-end (analysis) → semantic representation → back-end (synthesis) → executable code for target machine]

Quiz 2.7 What does the regular expression a?* mean? And a** ? Are these expressions erroneous? Are they ambiguous?

Overview
- generating a lexical analyzer
  - generic methods
  - specific tool: lex
[Figure: program text → lexical analysis → tokens → syntax analysis → AST → context handling → annotated AST; a scanner generator produces the lexical analyzer from a token description]

Token description
- (f)lex: scanner generator for UNIX
  - token description → C code
- format of the lex input file:
    definitions      regular descriptions
    %%
    rules            regular expressions + actions
    %%
    user code        auxiliary C-code

Lex description to recognize integers
- an integer is a non-empty sequence of digits, optionally followed by a letter denoting the base class (b for binary and o for octal)
    base → [bo]
    integer → digit+ base?
- rule = expr + action
- {} signal application of a description

    %{
    #include "lex.h"
    %}
    base    [bo]
    digit   [0-9]
    %%
    {digit}+{base}?    {return INTEGER;}
    %%

Lex resulting C-code

    char yytext[];      /* token representation */
    int yylex(void);    /* returns type of next token */

- wrapper function to add token attributes

    %%
    \n    {line_number++;}
    %%
    #include <string.h>    /* strdup */

    void get_next_token(void) {
        Token.class = yylex();
        if (Token.class == 0) {
            /* yylex() returns 0 at end of input */
            Token.class = EOF; Token.repr = " "; return;
        }
        Token.pos.line_number = line_number;
        Token.repr = strdup(yytext);
    }

Automatic generation
[Figure: program text → lexical analysis → tokens → syntax analysis → AST → context handling → annotated AST; the scanner generator turns the token description into a finite state automaton, drawn with states S0, S1, S2, S3 and transitions on digit and '.']

Finite-state automaton
- recognize input character by character
- transfer between states
- FSA
  - initial state S0
  - set of accepting states
  - transition function: State × Char → State
[Figure: S0 -'i'-> S1 -'f'-> S2]

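FSA examples
    integral_number → [0-9]+
    fixed_point_number → [0-9]* '.' [0-9]+
[Figure: integral_number: S0 -digit-> S1, S1 -digit-> S1]
[Figure: fixed_point_number: S0 -digit-> S0, S0 -'.'-> S2, S2 -digit-> S3, S3 -digit-> S3]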

Concurrent recognition
    integral_number → [0-9]+
    fixed_point_number → [0-9]* '.' [0-9]+
- recognize both tokens in one pass
- naïve approach: merge initial states
- correct approach: share common prefix transitions
[Figure: combined FSA: S0 -digit-> S1, S1 -digit-> S1, S0 -'.'-> S2, S1 -'.'-> S2, S2 -digit-> S3, S3 -digit-> S3]

FSA implementation: transition table
- concurrent recognition of integers and fixed point numbers

    state   digit   dot   other   recognized token
    S0      S1      S2    -
    S1      S1      S2    -       integer
    S2      S3      -     -
    S3      S3      -     -       fixed point

[Figure: S0 -digit-> S1, S1 -digit-> S1, S0 -'.'-> S2, S1 -'.'-> S2, S2 -digit-> S3, S3 -digit-> S3]
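A transition table like the one on this slide can be driven by a very small loop. The following is a minimal sketch in C, not the code from the lecture: the names `recognize`, `column_of`, `next_state` and `accepts` are made up for illustration, and the longest-match policy (remember the last accepting state seen) is an assumption that the slide does not spell out.

```c
#include <ctype.h>

/* Hypothetical names; the table contents follow the slide. */
enum token { TOK_NONE, TOK_INTEGER, TOK_FIXED_POINT };
enum { COL_DIGIT, COL_DOT, COL_OTHER, N_COLS };
enum { S0, S1, S2, S3, N_STATES, NO_STATE = -1 };

static const int next_state[N_STATES][N_COLS] = {
    /*        digit  dot       other    */
    /* S0 */ { S1,   S2,       NO_STATE },
    /* S1 */ { S1,   S2,       NO_STATE },
    /* S2 */ { S3,   NO_STATE, NO_STATE },
    /* S3 */ { S3,   NO_STATE, NO_STATE },
};

/* Token recognized in each state (TOK_NONE = not accepting). */
static const enum token accepts[N_STATES] = {
    TOK_NONE, TOK_INTEGER, TOK_NONE, TOK_FIXED_POINT
};

/* Map a character to its table column. */
static int column_of(char ch) {
    if (isdigit((unsigned char)ch)) return COL_DIGIT;
    if (ch == '.') return COL_DOT;
    return COL_OTHER;
}

/* Run the FSA from the start of input. Returns the token of the longest
 * accepted prefix and stores that prefix's length in *length
 * (TOK_NONE and 0 if no prefix is accepted). */
enum token recognize(const char *input, int *length) {
    int state = S0;
    enum token best = TOK_NONE;
    *length = 0;
    for (int i = 0; input[i] != '\0'; i++) {
        state = next_state[state][column_of(input[i])];
        if (state == NO_STATE) break;          /* empty table entry: stop */
        if (accepts[state] != TOK_NONE) {      /* remember last accepting state */
            best = accepts[state];
            *length = i + 1;
        }
    }
    return best;
}
```

With this sketch, `recognize("3.14x", &n)` yields `TOK_FIXED_POINT` with `n == 4`, and `recognize("12.", &n)` falls back to `TOK_INTEGER` with `n == 2`, because S2 is not an accepting state.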

FSA exercise (6 min.)
- draw an FSA to recognize integers
    base → [bo]
    integer → digit+ base?
- draw an FSA to recognize the regular expression (a|b)*bab

Answers
[Figure: integer: S0 -digit-> S1, S1 -digit-> S1, S1 -[bo]-> S2]
[Figure: (a|b)*bab: four states S0, S1, S2, S3 with transitions on a and b]
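One way to check a drawn answer is to code the automaton and feed it strings. The sketch below is an assumption-laden illustration rather than the slide's solution: the function names are invented, and for (a|b)*bab the state numbering (state k = the last k characters read form a prefix of "bab") is one possible labelling that need not match the labels in the figure.

```c
#include <ctype.h>

/* integer -> digit+ base?   with base -> [bo]
 * States: 0 = start, 1 = digits seen, 2 = base letter seen. */
int matches_integer(const char *s) {
    int state = 0;
    for (; *s != '\0'; s++) {
        unsigned char ch = (unsigned char)*s;
        if (isdigit(ch) && state <= 1)
            state = 1;                          /* digit: start or continue */
        else if ((ch == 'b' || ch == 'o') && state == 1)
            state = 2;                          /* optional base, at most once */
        else
            return 0;                           /* no transition */
    }
    return state == 1 || state == 2;            /* both are accepting */
}

/* (a|b)*bab as a deterministic FSA. */
int matches_bab(const char *s) {
    /* next[state][0] = move on 'a', next[state][1] = move on 'b' */
    static const int next[4][2] = {
        { 0, 1 },   /* nothing matched */
        { 2, 1 },   /* "b" matched     */
        { 0, 3 },   /* "ba" matched    */
        { 2, 1 },   /* "bab" matched   */
    };
    int state = 0;
    for (; *s != '\0'; s++) {
        if (*s != 'a' && *s != 'b') return 0;   /* outside the alphabet */
        state = next[state][*s == 'b'];
    }
    return state == 3;
}
```

For example, `matches_integer("101b")` and `matches_bab("ababab")` both return 1, while `matches_integer("1b2")` and `matches_bab("babb")` return 0.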

Break

Automatic generation: description → FSA
- start with the initial set (S0) of all token descriptions to be recognized
- for each character (ch), find the set (S_ch) of descriptions that can start with ch
- extend the FSA with the transition (S0, ch, S_ch)
- repeat adding transitions (to S_ch) until no new set is generated

Dotted items
- keeping track of matched characters in a token description T → R, where R is a regular expression
- dotted item: T → α • β
  - α: the part of R already matched by the input
  - β: the part of R still to be matched

Types of dotted items
- shift item: dot in front of a basic pattern
    if → • 'i' 'f'
    if → 'i' • 'f'
    identifier → • [a-z] [a-z0-9]*
- reduce item: dot at the end
    if → 'i' 'f' •
    identifier → [a-z] [a-z0-9]* •
- non-basic item: dot in front of a repeated pattern or parenthesis
    identifier → [a-z] • [a-z0-9]*

Character moves (on input character c)
    T → α • c β                      ⇒   T → α c • β
    T → α • [class] β   (c ∈ class)  ⇒   T → α [class] • β
    T → α • . β                      ⇒   T → α . • β

ε moves
    T → α • (R)* β    ⇒   T → α (R)* • β     and   T → α (• R)* β
    T → α (R •)* β    ⇒   T → α (R)* • β     and   T → α (• R)* β
    T → α • (R)? β    ⇒   T → α (R)? • β     and   T → α (• R)? β
    T → α (R •)? β    ⇒   T → α (R)? • β

ε moves
    T → α • (R)+ β    ⇒   T → α (• R)+ β
    T → α (R •)+ β    ⇒   T → α (R)+ • β     and   T → α (• R)+ β
    T → α • (R1|R2|…) β    ⇒   T → α (• R1|R2|…) β,   T → α (R1| • R2|…) β,   …
    T → α (R1 • |R2|…) β   ⇒   T → α (R1|R2|…) • β    (likewise for R2, …)

FSA construction
- a state corresponds to a set of basic items
- a character move yields a new set
- expand non-basic items into basic items using ε moves
- see if the resulting set was produced before; if not, introduce a new state
- add the transition

Example FSA construction
- tokens
    integer:      I → (D)+
    fixed-point:  F → (D)* '.' (D)+
- initial state
    I → • (D)+
    F → • (D)* '.' (D)+
- ε moves give S0:
    I → (• D)+
    F → (• D)* '.' (D)+
    F → (D)* • '.' (D)+

Example FSA construction: character moves
- S0 -D-> S1: the character move gives I → (D •)+ and F → (D •)* '.' (D)+; ε moves give S1:
    I → (D)+ •
    I → (• D)+
    F → (• D)* '.' (D)+
    F → (D)* • '.' (D)+
- S0 -'.'-> S2 and S1 -'.'-> S2: the character move gives F → (D)* '.' • (D)+; ε moves give S2:
    F → (D)* '.' (• D)+
- S2 -D-> S3: the character move gives F → (D)* '.' (D •)+; ε moves give S3:
    F → (D)* '.' (D)+ •
    F → (D)* '.' (• D)+
- S1 -D-> S1 and S3 -D-> S3 reproduce existing sets, so no new states are added
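The construction for this example can be sketched as a worklist over item sets. The code below is a simplified illustration under stated assumptions, not the lecture's algorithm in full: the six basic items are numbered by hand (`I0` … `F3` are invented names), and the effect of a character move followed by ε expansion is precomputed per item in `item_move` instead of being derived from the regular expressions. Only the "seen this set before?" loop is computed.

```c
/* Character classes and hand-numbered basic items (assumed names). */
enum { CH_DIGIT, CH_DOT, N_CLASSES };
enum {
    I0,   /* I -> (. D)+                */
    I1,   /* I -> (D)+ .       (reduce) */
    F0,   /* F -> (. D)* '.' (D)+       */
    F1,   /* F -> (D)* . '.' (D)+       */
    F2,   /* F -> (D)* '.' (. D)+       */
    F3,   /* F -> (D)* '.' (D)+ . (reduce) */
    N_ITEMS
};
#define BIT(i) (1u << (i))
#define MAX_STATES 64            /* 2^N_ITEMS item sets at most */

/* Basic items reached from one item by a character move plus the
 * epsilon moves that follow it (worked out by hand; 0 = no move). */
static const unsigned item_move[N_ITEMS][N_CLASSES] = {
    [I0] = { [CH_DIGIT] = BIT(I0) | BIT(I1) },
    [F0] = { [CH_DIGIT] = BIT(F0) | BIT(F1) },
    [F1] = { [CH_DOT]   = BIT(F2) },
    [F2] = { [CH_DIGIT] = BIT(F2) | BIT(F3) },
};

static unsigned state_items[MAX_STATES];        /* item set of each state */
static int next_state[MAX_STATES][N_CLASSES];   /* -1 = no transition */
static int n_states;

/* Return the state for this item set, creating it if it is new. */
static int find_or_add(unsigned items) {
    for (int s = 0; s < n_states; s++)
        if (state_items[s] == items) return s;
    state_items[n_states] = items;
    return n_states++;
}

/* Build the FSA; returns the number of states generated. */
int build_fsa(void) {
    n_states = 0;
    find_or_add(BIT(I0) | BIT(F0) | BIT(F1));   /* S0 after epsilon moves */
    for (int s = 0; s < n_states; s++) {        /* n_states grows as we go */
        for (int c = 0; c < N_CLASSES; c++) {
            unsigned target = 0;
            for (int i = 0; i < N_ITEMS; i++)
                if (state_items[s] & BIT(i)) target |= item_move[i][c];
            next_state[s][c] = target ? find_or_add(target) : -1;
        }
    }
    return n_states;
}
```

Calling `build_fsa()` produces four states, numbered in the order they are discovered, which here coincides with S0 to S3 on the slide. A real generator would compute `item_move` from the token descriptions using the character and ε move rules.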

Exercise FSA construction (7 min.)
- draw the FSA (with item sets) for recognizing an identifier:
    identifier → letter (letter_or_digit_or_und* letter_or_digit+)?
- extend the above FSA to recognize the keyword 'if' as well:
    if → 'i' 'f'

Answers
- S0:
    ID → • L ((LDU)* (LD)+)?
- S1:
    ID → L ((LDU)* (LD)+)? •
    ID → L ((• LDU)* (LD)+)?
    ID → L ((LDU)* (• LD)+)?
- S2:
    ID → L ((• LDU)* (LD)+)?
    ID → L ((LDU)* (• LD)+)?
- S3, S4: added for the keyword 'if'
- accepting states: S1 and S4
[Figure: FSA with states S0 to S4 and transitions labelled L, LD, U, 'i' and 'f']

Transition table compression

    state   'i'   'f'   L     D     U     recognized token
    S0      S3    S1    S1    -     -
    S1      S1    S1    S1    S1    S2    identifier
    S2      S1    S1    S1    S1    S2
    S3      S1    S4    S1    S1    S2
    S4      S1    S1    S1    S1    S2    keyword if

- redundant rows
- empty transitions
- row displacement
[Figure: the rows of S0 to S4 overlaid into a single array by row displacement]
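Row displacement can be sketched as follows: each row is shifted to the first offset at which its non-empty entries land on free slots of one shared array, and a parallel `check` array records which row owns each slot. This is a minimal first-fit sketch with invented names (`pack_rows`, `lookup`, `fsa_table`). To keep it short it packs the sparser integer / fixed-point table from the earlier slide rather than the identifier table, and it does not merge redundant rows.

```c
#define N_ROWS 4
#define N_COLS 3                      /* digit, dot, other */
#define EMPTY (-1)
#define PACKED_SIZE (N_ROWS * N_COLS) /* worst case: no overlap at all */

/* Uncompressed table: rows S0..S3, EMPTY = no transition. */
static const int fsa_table[N_ROWS][N_COLS] = {
    { 1, 2,     EMPTY },
    { 1, 2,     EMPTY },
    { 3, EMPTY, EMPTY },
    { 3, EMPTY, EMPTY },
};

static int displacement[N_ROWS];      /* offset of each row in entry[] */
static int entry[PACKED_SIZE];        /* packed transitions */
static int check[PACKED_SIZE];        /* row owning each slot, or EMPTY */

/* Does the row fit at this offset without hitting an occupied slot? */
static int fits(const int row[N_COLS], int disp) {
    for (int c = 0; c < N_COLS; c++)
        if (row[c] != EMPTY && check[disp + c] != EMPTY) return 0;
    return 1;
}

/* First-fit packing; row r always fits at offset r * N_COLS at the latest,
 * so the search never runs past PACKED_SIZE. */
void pack_rows(const int table[N_ROWS][N_COLS]) {
    for (int i = 0; i < PACKED_SIZE; i++) entry[i] = check[i] = EMPTY;
    for (int r = 0; r < N_ROWS; r++) {
        int disp = 0;
        while (!fits(table[r], disp)) disp++;
        displacement[r] = disp;
        for (int c = 0; c < N_COLS; c++) {
            if (table[r][c] != EMPTY) {
                entry[disp + c] = table[r][c];
                check[disp + c] = r;
            }
        }
    }
}

/* Look up a transition in the packed form. */
int lookup(int row, int col) {
    int pos = displacement[row] + col;
    return check[pos] == row ? entry[pos] : EMPTY;
}
```

After `pack_rows(fsa_table)`, `lookup(row, col)` returns the same value as `fsa_table[row][col]` for every entry, and the four rows occupy six slots instead of twelve. The `check` array is what lets `lookup` tell an empty transition apart from a slot that belongs to another row.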

Summary: generating a lexical analyzer
- tool: lex
  - token descriptions + actions
  - wrapper interface
- FSA construction
  - dotted items
  - character moves
  - ε moves
[Figure: program text → lexical analysis → tokens → syntax analysis → AST → context handling → annotated AST; a scanner generator produces the lexical analyzer from a token description]

Homework
- study sections –
  - lexical identification of tokens
  - symbol tables
  - macro processing
- print handout lecture 3 [blackboard]
- find a partner for the "practicum"
- register your group send to