Lexical Analysis: The Scanner


Introduction

A scanner, sometimes called a lexical analyzer:
– gets a stream of characters (the source program)
– divides it into tokens

Tokens are units that are meaningful in the source language.
Lexemes are strings that match the patterns of tokens.

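Examples of Tokens in C

Token               Lexemes
identifier          Age, grade, Temp, zone, q1
number              3.1416
string              “A cat sat on a mat.”, “ ”
open parenthesis    (
close parenthesis   )
semicolon           ;
reserved word if    IF, if, If, iF

Note: the last row treats the reserved word as case-insensitive, which is the convention these slides use. In standard C only lowercase if is a keyword.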

Scanning

When a token is found:
– It is passed to the next phase of the compiler.
– Sometimes values associated with the token, called attributes, need to be calculated.
– Some tokens, together with their attributes, must be stored in the symbol/literal table. Before storing one, it is necessary to check whether the token is already in the table.

Examples of attributes:
– Attributes of a variable are its name, address, type, etc.
– An attribute of a numeric constant is its value.
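The table check described above can be sketched as follows. This is a minimal illustration, not the slides' implementation: the names Token, SymbolTable, intern, and make_number are all hypothetical, and the table is keyed on the lexeme alone.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    kind: str                                   # token class, e.g. "identifier"
    lexeme: str                                 # the matched source text
    attrs: dict = field(default_factory=dict)   # computed attributes


class SymbolTable:
    """Maps each distinct lexeme to a single table index (hypothetical design)."""

    def __init__(self):
        self._index = {}    # lexeme -> position in self.entries
        self.entries = []   # stored tokens, in order of first appearance

    def intern(self, token):
        # Check whether the token is already in the table before adding it.
        if token.lexeme not in self._index:
            self._index[token.lexeme] = len(self.entries)
            self.entries.append(token)
        return self._index[token.lexeme]


def make_number(lexeme):
    # The attribute of a numeric constant is its value.
    return Token("number", lexeme, {"value": float(lexeme)})


table = SymbolTable()
first = table.intern(Token("identifier", "grade"))
second = table.intern(make_number("3.1416"))
again = table.intern(Token("identifier", "grade"))   # already present: same index
```

Here the second appearance of grade returns the index of the first one instead of creating a new entry.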

How to construct a scanner

1. Define the tokens in the source language.
2. Describe the patterns allowed for the tokens.
3. Write regular expressions describing the patterns.
4. Construct an FA for each pattern.
5. Combine all the FAs, which results in an NFA.
6. Convert the NFA into a DFA.
7. Write a program simulating the DFA.

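Regular Expression

A regular expression is one of:
– a character or symbol in the alphabet
– ε, the empty string
– ∅, the empty set

If r and s are regular expressions, then so are:
– r | s (alternation)
– r s (concatenation)
– r* (zero or more repetitions)
– (r) (grouping)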

Extensions of regular expressions

[a-z]                   any character in the range from a to z
.                       any single character
r+                      one or more repetitions of r
r?                      optional subexpression (zero or one r)
~(a | b | c), [^abc]    any single character NOT in the set

Examples of Patterns

(a | A)    = the set {a, A}
[0-9]+     = (0 | 1 | ... | 9) (0 | 1 | ... | 9)*
[0-9]?     = (0 | 1 | ... | 9 | ε)
[A-Za-z]   = (A | B | ... | Z | a | b | ... | z)
A.         = the strings consisting of A followed by any one symbol
~[0-9]     = [^0-9] = any character that is not 0, 1, ..., 9
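These equivalences can be spot-checked with Python's re module. The sketch below assumes re syntax is close enough to the slide notation for these cases. The helper same_language and the sample strings are made up for illustration, and agreeing on a handful of samples is evidence, not a proof of equivalence.

```python
import re


def same_language(p1, p2, samples):
    """True if both patterns accept exactly the same strings among `samples`."""
    return all(
        bool(re.fullmatch(p1, s)) == bool(re.fullmatch(p2, s)) for s in samples
    )


samples = ["", "0", "7", "42", "007", "a", "A", "4a", "Ab", "b"]

checks = {
    "(a | A) vs [aA]": same_language(r"(?:a|A)", r"[aA]", samples),
    "[0-9]+ vs digit digit*": same_language(r"[0-9]+", r"[0-9][0-9]*", samples),
    "[0-9]? vs (digit | empty)": same_language(r"[0-9]?", r"(?:[0-9]|)", samples),
}

# "A." is A followed by exactly one more symbol.
# (In Python's re, "." does not match a newline unless re.DOTALL is set.)
a_then_any = [s for s in samples if re.fullmatch(r"A.", s)]
```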

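Describing Patterns of Tokens

reservedIF = (IF | if | If | iF) = (I|i)(F|f)
letter     = [a-zA-Z]
digit      = [0-9]
identifier = letter (letter | digit)*
numeric    = (+|-)? digit+ (. digit+)? (E (+|-)? digit+)?
             (here "." is a literal decimal point, not "any character")

Comments:
– { (~})* }                // from tiny C grammar
– /* ([^*]*[^/]*)* */      // C-style comments
– ;(~newline)* newline     // Assembly lang comments

Note: as written, the C-style pattern also accepts text that contains an inner */, because [^/]* can match a * and the next [^*]* can match the / that follows it. A stricter form is /* ([^*] | *+ [^*/])* *+ /.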
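As an illustration, the token patterns above can be transcribed into Python's re syntax. This is a sketch under assumptions: the constant names and the matches helper are invented here, the literal "." is escaped, and C_COMMENT uses a commonly used stricter comment pattern in place of the slide's, since the slide's pattern also accepts an inner */.

```python
import re

LETTER = r"[a-zA-Z]"
DIGIT = r"[0-9]"
IDENTIFIER = rf"{LETTER}(?:{LETTER}|{DIGIT})*"
# In the slide's numeric pattern the "." is a literal decimal point.
NUMERIC = rf"[+-]?{DIGIT}+(?:\.{DIGIT}+)?(?:E[+-]?{DIGIT}+)?"
RESERVED_IF = r"[Ii][Ff]"
# Stricter C-style comment: the body may not contain "*/".
C_COMMENT = r"/\*(?:[^*]|\*+[^*/])*\*+/"


def matches(pattern, text):
    """True if the whole of `text` is a lexeme of `pattern`."""
    return re.fullmatch(pattern, text) is not None


examples = {
    "q1 is an identifier": matches(IDENTIFIER, "q1"),
    "1q is not an identifier": not matches(IDENTIFIER, "1q"),
    "3.1416 is numeric": matches(NUMERIC, "3.1416"),
    "-2E+10 is numeric": matches(NUMERIC, "-2E+10"),
    "iF is the reserved word": matches(RESERVED_IF, "iF"),
    "comment with inner stars": matches(C_COMMENT, "/* a ** b */"),
    "comment stops at first */": not matches(C_COMMENT, "/* a */ b */"),
}
```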

Disambiguating Rules

Is IF an identifier or a reserved word?
– A reserved word cannot be used as an identifier.
– A keyword that is not reserved can also be used as an identifier.

Is <= the two tokens < and =, or the single token <=?
– Principle of longest substring: when a string can be either a single token or a sequence of tokens, the single-token interpretation is preferred.
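Both rules can be sketched in a small regex-driven tokenizer. This is an illustrative sketch, not the slides' scanner: the token names, TOKEN_SPECS, RESERVED, and tokenize are all assumptions. It tries every pattern at the current position, keeps the longest match, and then reclassifies identifiers that are reserved words.

```python
import re

# (token name, compiled pattern) pairs; every name here is illustrative.
TOKEN_SPECS = [
    ("LE", re.compile(r"<=")),
    ("LT", re.compile(r"<")),
    ("ASSIGN", re.compile(r"=")),
    ("ID", re.compile(r"[a-zA-Z][a-zA-Z0-9]*")),
    ("NUM", re.compile(r"[0-9]+")),
]
RESERVED = {"IF", "ELSE"}   # compared case-insensitively, as in the slides


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        # Principle of longest substring: keep the longest match at `pos`.
        best_name, best_lexeme = None, ""
        for name, pattern in TOKEN_SPECS:
            m = pattern.match(text, pos)
            if m and len(m.group()) > len(best_lexeme):
                best_name, best_lexeme = name, m.group()
        if best_name is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        # A reserved word matches the identifier pattern but is never an ID.
        if best_name == "ID" and best_lexeme.upper() in RESERVED:
            best_name = best_lexeme.upper()
        tokens.append((best_name, best_lexeme))
        pos += len(best_lexeme)
    return tokens
```

With this sketch, "<=" becomes one LE token, while "< =" (with a space) becomes LT followed by ASSIGN, and "iffy" stays an identifier because the longest match is taken before the reserved-word check.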

Nondeterministic Finite Automata

A nondeterministic finite automaton (NFA) is a mathematical model that consists of:
1. A set of states S
2. A set of input symbols Σ
3. A transition function that maps state/symbol pairs to a set of states: S × (Σ ∪ {ε}) → 2^S
4. A special state s0 called the start state
5. A set of final states F (a subset of S)

INPUT: a string
OUTPUT: yes or no

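Example NFA

S = {0, 1, 2, 3}    s0 = 0    Σ = {a, b}    F = {3}

Diagram: 0 --a--> 1 --b--> 2 --b--> 3, with a self-loop on state 0 labeled a, b.

Transition table:

STATE    a         b      ε
0        {0, 1}    {0}    -
1        -         {2}    -
2        -         {3}    -
3        -         -      -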

NFA Execution

An NFA says "yes" for an input string if there is some path from the start state to some final state along which all input has been processed.

    NFA(int s0, int input_element) {
        if (all input processed and s0 is a final state) return Yes;
        /* Do not answer No yet: an ε-move may still reach a final state. */
        if (input remains)
            for all states s1 in transition(s0, input[input_element])
                if (NFA(s1, input_element + 1) == Yes) return Yes;
        for all states s1 in transition(s0, ε)
            if (NFA(s1, input_element) == Yes) return Yes;
        return No;
    }

Uses backtracking to search all possible paths. If the NFA contains a cycle of ε-transitions, the search also needs a record of visited states to guarantee termination.
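A runnable version of this backtracking search might look like the sketch below. It assumes the example NFA is the usual one for (a|b)*abb, with state 0 looping on a and b and then a, b, b leading to state 3. The names NFA_TABLE and nfa_accepts are hypothetical, and the seen parameter is an addition that guards against ε-cycles.

```python
# Transition table: state -> {symbol -> set of next states}; "" stands for ε.
# Assumed to be the example NFA for (a|b)*abb.
NFA_TABLE = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
    3: {},
}
START, FINAL = 0, {3}


def nfa_accepts(state, text, pos=0, seen=frozenset()):
    """Backtracking search over all paths from `state` that consume text[pos:]."""
    if pos == len(text) and state in FINAL:
        return True
    moves = NFA_TABLE[state]
    if pos < len(text):
        # Try every state reachable on the next input symbol.
        for nxt in moves.get(text[pos], ()):
            if nfa_accepts(nxt, text, pos + 1):
                return True
    # ε-moves consume no input; `seen` holds the states already visited at
    # this position so that an ε-cycle cannot recurse forever.
    for nxt in moves.get("", ()):
        if nxt not in seen and nfa_accepts(nxt, text, pos, seen | {state}):
            return True
    return False
```

For example, nfa_accepts(START, "aabb") explores the path that stays in state 0 on the first a before finding the accepting path through states 1, 2, and 3. The worst-case running time of this approach is exponential in the input length, which is one motivation for converting to a DFA.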

Deterministic Finite Automata

A deterministic finite automaton (DFA) is a mathematical model that consists of:
1. A set of states S
2. A set of input symbols Σ
3. A transition function that maps state/symbol pairs to a single state: S × Σ → S
4. A special state s0 called the start state
5. A set of final states F (a subset of S)

INPUT: a string
OUTPUT: yes or no
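The "convert NFA into DFA" step is commonly done with the subset construction, in which each DFA state is a set of NFA states. The slides do not name the technique, so the sketch below is one standard way to do it rather than the author's method. It assumes the example NFA for (a|b)*abb, and every identifier in it is hypothetical.

```python
from collections import deque

# Assumed example NFA for (a|b)*abb; "" would label ε-moves (none here).
NFA = {0: {"a": {0, 1}, "b": {0}}, 1: {"b": {2}}, 2: {"b": {3}}, 3: {}}
NFA_START, NFA_FINAL, ALPHABET = 0, {3}, "ab"


def epsilon_closure(states):
    """All NFA states reachable from `states` using only ε-moves."""
    stack, closure = list(states), set(states)
    while stack:
        for nxt in NFA[stack.pop()].get("", ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return frozenset(closure)


def subset_construction():
    """Return (start, transitions, finals) of a DFA equivalent to NFA."""
    start = epsilon_closure({NFA_START})
    transitions, finals = {}, set()
    queue, seen = deque([start]), {start}
    while queue:
        current = queue.popleft()
        if current & NFA_FINAL:          # contains at least one NFA final state
            finals.add(current)
        for symbol in ALPHABET:
            moved = set()
            for s in current:
                moved |= NFA[s].get(symbol, set())
            target = epsilon_closure(moved)
            transitions[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return start, transitions, finals


def dfa_accepts(text):
    """Simulate the constructed DFA; `text` must use only symbols in ALPHABET."""
    start, transitions, finals = subset_construction()
    state = start
    for ch in text:
        state = transitions[(state, ch)]   # exactly one next state: no backtracking
    return state in finals
```

For this NFA the construction yields four DFA states: {0}, {0,1}, {0,2}, and {0,3}, the last being the only final state.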

FA Recognizing Tokens

[Figure: finite automata recognizing identifier, numeric, and comment tokens]

Examples from textbook Section 2.3

identifier = letter (letter | digit)*

Combining FA's

[Figure: an FA for identifiers, an FA for the reserved words IF and ELSE, and the combined FA]

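Lookahead

[Figure: FA with [other] transitions leading to states that return IF or return ID]

The bracketed [other] label marks a lookahead character: it is examined to decide that the token has ended, but it is not consumed as part of the token.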

Implementing DFA

– nested if
– transition table

[Figure: combined FA for identifiers and the reserved words IF and ELSE, with [other] transitions to states that return IF, return ID, and return ELSE]

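Nested IF

    switch (state) {
        case 0:
            /* start state: classify the first character */
            if (isletter(nxt)) state = 1;
            else if (isdigit(nxt)) state = 2;
            else state = 3;
            break;
        case 1:
            /* inside an identifier: isletVdig tests "letter or digit" */
            if (isletVdig(nxt)) state = 1;
            else state = 4;
            break;
        …
    }

[Figure: state diagram for this code. State 0 goes to state 1 on a letter, to state 2 on a digit, and to state 3 on other; state 1 loops on letter or digit and goes to state 4 on other; …]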

Transition table

           State
Char       0    1    2    3    …
letter     1    1    .    .
digit      2    1    .    .
other      3    4

[Figure: the same state diagram as for the nested-if version: states 0 and 1 with transitions on letter, digit, and other; …]
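A table-driven version of the same automaton can be sketched as follows. Only the table entries visible on the slide are used. The meaning given to state 4 ("identifier recognized, lookahead not consumed") is an assumption, and char_class, TABLE, ID_DONE, and scan_identifier are hypothetical names.

```python
def char_class(ch):
    """Map a character to its table row (ASCII letters and digits only)."""
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"


# TABLE[state][class] -> next state; only the entries visible on the slide.
TABLE = {
    0: {"letter": 1, "digit": 2, "other": 3},
    1: {"letter": 1, "digit": 1, "other": 4},
}
ID_DONE = 4   # assumption: state 4 means "identifier recognized"


def scan_identifier(text, pos=0):
    """Run the DFA from `pos`; return (lexeme, next_pos), or None if no
    identifier starts at `pos`."""
    state, i = 0, pos
    while state in TABLE:
        # End of input is treated like an "other" character.
        cls = char_class(text[i]) if i < len(text) else "other"
        state = TABLE[state][cls]
        if state in TABLE:
            i += 1   # consume the character only while still inside the token
    # The character that led to state 4 was only looked at, not consumed.
    return (text[pos:i], i) if state == ID_DONE else None
```

Compared with the nested-if form, the control code stays the same when the automaton changes; only the table needs to be edited.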