Scanner Scanner 2301373 Introduction to Compilers.

Slides:



Advertisements
Similar presentations
Lex -- a Lexical Analyzer Generator (by M.E. Lesk and Eric. Schmidt) –Given tokens specified as regular expressions, Lex automatically generates a routine.
Advertisements

COMP-421 Compiler Design Presented by Dr Ioanna Dionysiou.
CSc 453 Lexical Analysis (Scanning)
Chapter 3 Lexical Analysis Yu-Chen Kuo.
 Lex helps to specify lexical analyzers by specifying regular expression  i/p notation for lex tool is lex language and the tool itself is refered to.
Winter 2007SEG2101 Chapter 81 Chapter 8 Lexical Analysis.
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -1- Compiler Construction Principles & Implementation.
Lexical Analysis Recognize tokens and ignore white spaces, comments
Lexical Analysis The Scanner Scanner 1. Introduction A scanner, sometimes called a lexical analyzer A scanner : – gets a stream of characters (source.
Scanner 1. Introduction A scanner, sometimes called a lexical analyzer A scanner : – gets a stream of characters (source program) – divides it into tokens.
1 Scanning Aaron Bloomfield CS 415 Fall Parsing & Scanning In real compilers the recognizer is split into two phases –Scanner: translate input.
Chapter 3 Lexical Analysis
Topic #3: Lexical Analysis
CPSC 388 – Compiler Design and Construction Scanners – Finite State Automata.
1 Flex. 2 Flex A Lexical Analyzer Generator  generates a scanner procedure directly, with regular expressions and user-written procedures Steps to using.
Lexical Analysis Natawut Nupairoj, Ph.D.
Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Machine-independent code improvement Target code generation Machine-specific.
Lexical Analysis - An Introduction. The Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source.
Lexical Analysis - An Introduction Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at.
어휘분석 (Lexical Analysis). Overview Main task: to read input characters and group them into “ tokens. ” Secondary tasks: –Skip comments and whitespace;
Lexical Analysis - An Introduction Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at.
Lecture # 3 Chapter #3: Lexical Analysis. Role of Lexical Analyzer It is the first phase of compiler Its main task is to read the input characters and.
Lexical Analysis Hira Waseem Lecture
Review: Regular expression: –How do we define it? Given an alphabet, Base case: – is a regular expression that denote { }, the set that contains the empty.
Topic #3: Lexical Analysis EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.
2. Scanning College of Information and Communications Prof. Heejin Park.
Lexical Analyzer (Checker)
COMP313A Programming Languages Lexical Analysis. Lecture Outline Lexical Analysis The language of Lexical Analysis Regular Expressions.
COP 4620 / 5625 Programming Language Translation / Compiler Writing Fall 2003 Lecture 3, 09/11/2003 Prof. Roy Levow.
CS30003: Compilers Lexical Analysis Lecture Date: 05/08/13 Submission By: DHANJIT DAS, 11CS10012.
TRANSITION DIAGRAM BASED LEXICAL ANALYZER and FINITE AUTOMATA Class date : 12 August, 2013 Prepared by : Karimgailiu R Panmei Roll no. : 11CS10020 GROUP.
1 November 1, November 1, 2015November 1, 2015November 1, 2015 Azusa, CA Sheldon X. Liang Ph. D. Computer Science at Azusa Pacific University Azusa.
CS 536 Fall Scanner Construction  Given a single string, automata and regular expressions retuned a Boolean answer: a given string is/is not in.
Lexical Analyzer in Perspective
Review: Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Intermediate code generator Code optimizer Code generator Symbol.
IN LINE FUNCTION AND MACRO Macro is processed at precompilation time. An Inline function is processed at compilation time. Example : let us consider this.
Lexical Analysis S. M. Farhad. Input Buffering Speedup the reading the source program Look one or more characters beyond the next lexeme There are many.
Overview of Previous Lesson(s) Over View  Symbol tables are data structures that are used by compilers to hold information about source-program constructs.
Scanner Introduction to Compilers 1 Scanner.
Compiler Construction By: Muhammad Nadeem Edited By: M. Bilal Qureshi.
Lexical Analysis - Scanner- Contd Computer Science Rensselaer Polytechnic Compiler Design Lecture 3(01/21/98)
The Role of Lexical Analyzer
Lexical Analysis (Scanning) Lexical Analysis (Scanning)
CSC3315 (Spring 2009)1 CSC 3315 Lexical and Syntax Analysis Hamid Harroud School of Science and Engineering, Akhawayn University
Chapter 2 Scanning. Dr.Manal AbdulazizCS463 Ch22 The Scanning Process Lexical analysis or scanning has the task of reading the source program as a file.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 1 Ahmed Ezzat.
Chapter 2-II Scanning Sung-Dong Kim Dept. of Computer Engineering, Hansung University.
Lecture 2 Compiler Design Lexical Analysis By lecturer Noor Dhia
Compiler Chapter 4. Lexical Analysis Dept. of Computer Engineering, Hansung University, Sung-Dong Kim.
System Software Theory (5KS03).
Lexical Analyzer in Perspective
Lecture 2 Lexical Analysis
Chapter 3 Lexical Analysis.
Chapter 2 Scanning – Part 1 June 10, 2018 Prof. Abdelaziz Khamis.
CSc 453 Lexical Analysis (Scanning)
Short introduction to compilers
Compilers Welcome to a journey to CS419 Lecture5: Lexical Analysis:
CSc 453 Lexical Analysis (Scanning)
פרק 3 ניתוח לקסיקאלי תורת הקומפילציה איתן אביאור.
Chapter 3: Lexical Analysis
Review: Compiler Phases:
Recognition of Tokens.
R.Rajkumar Asst.Professor CSE
Scanner Scanner Introduction to Compilers.
Scanner Scanner Introduction to Compilers.
Scanner Scanner Introduction to Compilers.
Scanner Scanner Introduction to Compilers.
Scanner Scanner Introduction to Compilers.
CSc 453 Lexical Analysis (Scanning)
Presentation transcript:

Scanner Scanner 2301373 Introduction to Compilers

Outline Introduction How to construct a scanner Regular expressions describing tokens FA recognizing tokens Implementing a DFA Error Handling Buffering 2301373 Introduction to Compilers Scanner

Introduction A scanner, sometimes called a lexical analyzer gets a stream of characters (source program) divides it into tokens Tokens are units that are meaningful in the source language. Lexemes are strings which match the patterns of tokens. 2301373 Introduction to Compilers Scanner

Examples of Tokens in C Tokens Lexemes identifier Age, grade,Temp, zone, q1 number 3.1416, -498127,987.76412097 string “A cat sat on a mat.”, “90183654” open parentheses ( close parentheses ) Semicolon ; reserved word if IF, if, If, iF 2301373 Introduction to Compilers Scanner

Scanning When a token is found: Examples of attributes It is passed to the next phase of compiler. Sometimes values associated with the token, called attributes, need to be calculated. Some tokens, together with their attributes, must be stored in the symbol/literal table. it is necessary to check if the token is already in the table Examples of attributes Attributes of a variable are name, address, type, etc. An attribute of a numeric constant is its value. 2301373 Introduction to Compilers Scanner

How to construct a scanner Define tokens in the source language. Describe the patterns allowed for tokens. Write regular expressions describing the patterns. Construct an FA for each pattern. Combine all FA’s which results in an NFA. Convert NFA into DFA Write a program simulating the DFA. 2301373 Introduction to Compilers Scanner

Regular Expression l f a character or symbol in the alphabet : an empty string : an empty set if r and s are regular expressions r | s r s r * (r ) l f 2301373 Introduction to Compilers Scanner

Extension of regular expr. [a-z] any character in a range from a to z . any character r + one or more repetition r ? optional subexpression ~(a | b | c), [^abc] any single character NOT in the set 2301373 Introduction to Compilers Scanner

Examples of Patterns l (a | A) = the set {a, A} [0-9]+ = (0 |1 |...| 9) (0 |1 |...| 9)* (0-9)? = (0 | 1 |...| 9 | ) [A-Za-z] = (A |B |...| Z |a |b |...| z) A . = the string with A following by any one symbol ~[0-9] = [^0123456789] = any character which is not 0, 1, ..., 9 l 2301373 Introduction to Compilers Scanner

Describing Patterns of Tokens reservedIF = (IF| if| If| iF) = (I|i)(F|f) letter = [a-zA-Z] digit =[0-9] identifier = letter (letter|digit)* numeric = (+|-)? digit+ (. digit+)? (E (+|-)? digit+)? Comments { (~})* } /* ([^*]*[^/]*)* */ ;(~newline)* newline 2301373 Introduction to Compilers Scanner

Disambiguating Rules IF is an identifier or a reserved word? A reserved word cannot be used as identifier. A keyword can also be identifier. <= is < and = or <=? Principle of longest substring When a string can be either a single token or a sequence of tokens, single-token interpretation is preferred. 2301373 Introduction to Compilers Scanner

FA Recognizing Tokens Identifier Numeric Comment . ~/ / * ~* letter letter,digit E digit . +,-,e ~/ / * ~* 2301373 Introduction to Compilers Scanner

Combining FA’s Identifiers Reserved words Combined E,e L,l S,s I,i F,f letter letter,digit E,e L,l S,s I,i F,f other letter letter,digit E,e L,l S,s I,i F,f 2301373 Introduction to Compilers Scanner

Lookahead letter, digit I,i F,f Return ID [other] Return IF 2301373 Introduction to Compilers Scanner

Implementing DFA nested-if transition table E,e L,l S,s I,i F,f letter,digit E,e L,l S,s I,i F,f [other] Return IF Return ID Return ELSE nested-if transition table 2301373 Introduction to Compilers Scanner

Nested IF letter, digit other 1 4 letter digit 2 … other 3 … switch (state) { case 0: { if isletter(nxt) state=1; elseif isdigit(nxt) state=2; else state=3; break; } case 1: { if isletVdig(nxt) else state=4; … letter, digit other 1 4 letter digit 2 … other 3 … 2301373 Introduction to Compilers Scanner

Transition table St ch 1 2 3 … letter .. digit 4 letter, digit other 1 1 2 3 … letter .. digit 4 letter, digit other 1 4 letter digit 2 … other 3 … 2301373 Introduction to Compilers Scanner

Simulating a DFA initialize current_state=start while (not final(current_state)) { next_state=dfa(current_state, next) current_state=next_state; } 2301373 Introduction to Compilers Scanner

Error Handling Delete an extraneous character Insert a missing character Replace an incorrect character by a correct character Transposing two adjacent characters 2301373 Introduction to Compilers Scanner

Delete an extraneous character . +,-,e digit +,-,e digit digit E % error digit digit digit 2301373 Introduction to Compilers Scanner

Insert a missing character . +,-,e digit +,-,e digit digit E +,-,e digit digit digit error 2301373 Introduction to Compilers Scanner

Replace an incorrect character . +,-,e . digit digit E +,-,e digit digit : digit error digit 2301373 Introduction to Compilers Scanner

Transpose adjacent characters > = error Correct token: >= 2301373 Introduction to Compilers Scanner

Buffering Single Buffer Buffer Pair Sentinel 2301373 Introduction to Compilers Scanner

Single Buffer forward begin found reload The first part of the token will be lost if it is not stored somewhere else ! 2301373 Introduction to Compilers Scanner

Buffer Pairs A buffer is reloaded when forward pointer reaches the end of the other buffer. Similar for the second half of the buffer. Check twice for the end of buffer if the pointer is not at the end of the first buffer! 2301373 Introduction to Compilers Scanner

Sentinel For the buffer pair, it must be checked twice for each move of the forward pointer if the pointer is at the end of a buffer. reload EOF sentinel Using sentinel, it must be checked only once for most of the moves of the forward pointer. 2301373 Introduction to Compilers Scanner