CPSC 388 – Compiler Design and Construction Scanners – Finite State Automata.

Compiler Organization
- Lexical Analyzer (Scanner)
- Syntax Analyzer (Parser)
- Semantic Analyzer
- Intermediate Code Generator
- Optimizer
- Code Generator
- Symbol Table

The Scanner
- Input: characters from the source program.
- Groups characters into lexemes: sequences of characters that "go together".
- Output: tokens (plus possibly some additional information).
- The scanner also discovers lexical errors (e.g., erroneous characters such as # in Java).
- Each time the scanner's nextToken() method is called, it finds the longest sequence of characters in the input stream, starting with the current character, that corresponds to a lexeme, and returns the corresponding token.
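The "longest sequence" rule for nextToken() can be sketched roughly as follows. This is only an illustration in Python, not the code a scanner generator would emit: the token names (INT, ID, LE, LT, WS) and the helper names next_token and scan are assumptions made up for this example.

```python
import re

# Hypothetical token set, chosen only for illustration (not from the slides).
TOKEN_PATTERNS = [
    ("INT", re.compile(r"[0-9]+")),
    ("ID", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("LE", re.compile(r"<=")),
    ("LT", re.compile(r"<")),
    ("WS", re.compile(r"[ \t\n]+")),
]


def next_token(text, pos):
    """Return (token_name, lexeme, new_pos) for the longest lexeme starting at pos."""
    best = None
    for name, pattern in TOKEN_PATTERNS:
        m = pattern.match(text, pos)
        # Keep the candidate that consumes the most characters.
        if m and (best is None or m.end() > best[2]):
            best = (name, m.group(), m.end())
    if best is None:
        # No pattern matches the current character: a lexical error.
        raise ValueError(f"lexical error at position {pos}: {text[pos]!r}")
    return best


def scan(text):
    """Call next_token repeatedly, dropping whitespace tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        name, lexeme, pos = next_token(text, pos)
        if name != "WS":
            tokens.append((name, lexeme))
    return tokens
```

With these assumed patterns, the input "<=" yields one LE token rather than LT followed by an error, because the longer match wins. A character such as # matches no pattern and is reported as a lexical error.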

Scanner Generators
- Scanner generators make scanners (you don't need to hand-code a scanner).
- Lex and Flex create C source code for a scanner.
- JFlex creates Java source code for a scanner.
- The input to a scanner generator is a file containing (among other things) regular expressions.

Scanner Generator
- Input: a .jlex file containing regular expressions.
- Output: a .java file containing scanner code.
To understand regular expressions you need to understand finite-state automata.

Finite State Automata
- A compiler recognizes legal programs in some (source) language.
- A finite-state machine recognizes legal strings in some language.
- The input is a sequence of characters.
- The output is to accept or reject the input.

Example FSM
- Nodes are states.
- Edges (arrows) are transitions, each labeled with a single character. A single edge labeled "letter" stands for 52 edges labeled 'a', 'b', ..., 'z', 'A', ..., 'Z' (similarly for "digit").
- S is the start state; every FSA has exactly one (a standard convention is to label the start state "S").
- A is a final state. By convention, final states are drawn using a double circle, and non-final states are drawn using single circles. An FSA may have more than one final state.
Diagram: S --[letter]--> A; A --[letter, digit]--> A (self-loop).

Applying FSA to Input
- The FSA starts in its start state.
- If there is an edge out of the current state whose label matches the current input character, the FSA moves to the state pointed to by that edge and "consumes" that character; otherwise, it gets stuck.
- The FSA stops when it gets stuck or when it has consumed all of the input characters.
- An input string is accepted by an FSA if both of the following hold:
  - the entire string is consumed (the machine did not get stuck), and
  - the machine ends in a final state.
- The language defined by an FSA is the set of strings accepted by the FSA.
Diagram: S --[letter]--> A; A --[letter, digit]--> A (self-loop).
Sample inputs: aX23, Y1aBss, c, 1AbeR6, 343, A?
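The rules above can be tried out on the letter/digit FSA with a short simulation. This is a minimal sketch in Python; the names char_class, EDGES, FINAL and accepts are assumptions made for this example, and "letter" is taken to mean the 52 ASCII letters.

```python
def char_class(ch):
    """Map a character to the edge label it matches, or None if it matches no edge."""
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return None


# Edges of the example FSA: S --letter--> A, A --letter,digit--> A.
EDGES = {
    ("S", "letter"): "A",
    ("A", "letter"): "A",
    ("A", "digit"): "A",
}
FINAL = {"A"}


def accepts(text):
    """Run the FSA on text; accept only if all input is consumed in a final state."""
    state = "S"
    for ch in text:
        nxt = EDGES.get((state, char_class(ch)))
        if nxt is None:
            return False  # the machine is stuck, so the string is rejected
        state = nxt
    return state in FINAL


for sample in ["aX23", "Y1aBss", "c", "1AbeR6", "343", "A?"]:
    print(sample, "accept" if accepts(sample) else "reject")
```

Under these assumptions the first three sample inputs are accepted. 1AbeR6 and 343 get stuck in S on the leading digit, and A? gets stuck in A on the question mark.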

Try It
- Question 1: Write a finite-state automaton that accepts Java identifiers (one or more letters, digits, underscores, or dollar signs, not starting with a digit).
- Question 2: Write a finite-state automaton that accepts only Java identifiers that do not end with an underscore.

Another Example FSA
- This FSA accepts integers with an optional plus or minus sign.
Diagram: S --[+, -]--> A; S --[digit]--> B; A --[digit]--> B; B --[digit]--> B (self-loop).

FSA Formal Definition (5-tuple)
- Q: a finite set of states
- Σ: the alphabet of the automaton (a finite set of characters used to label edges)
- δ: the state transition function, δ(state i, character) → state j
- q: the start state (a member of Q)
- F: the set of final states (a subset of Q)

Transition Table for δ(state i, character) → state j

State    +    -    digit
S        A    A    B
A                  B
B                  B

(A blank entry means there is no transition: the machine gets stuck.)
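The 5-tuple and its transition table can be written down directly as data. The sketch below does this in Python for the signed-integer FSA; the FSA class, its field names, and is_well_formed are assumptions made for illustration, and B is taken to be the only final state since the machine is meant to accept integers.

```python
from typing import NamedTuple


class FSA(NamedTuple):
    states: frozenset    # Q
    alphabet: frozenset  # Sigma (here: edge labels, with "digit" standing for 0-9)
    delta: dict          # (state, symbol) -> state; a missing key means "stuck"
    start: str           # q
    finals: frozenset    # F


def is_well_formed(m):
    """Check that q is in Q, F is a subset of Q, and delta only mentions Q and Sigma."""
    return (
        m.start in m.states
        and m.finals <= m.states
        and all(
            s in m.states and c in m.alphabet and t in m.states
            for (s, c), t in m.delta.items()
        )
    )


# The signed-integer FSA, transcribed from the transition table.
SIGNED_INT = FSA(
    states=frozenset({"S", "A", "B"}),
    alphabet=frozenset({"+", "-", "digit"}),
    delta={
        ("S", "+"): "A",
        ("S", "-"): "A",
        ("S", "digit"): "B",
        ("A", "digit"): "B",
        ("B", "digit"): "B",
    },
    start="S",
    finals=frozenset({"B"}),  # assumption: B is the only final state
)
```

Storing δ as a dictionary keyed by (state, symbol) keeps the blank table entries implicit: a lookup that finds no key corresponds to the machine getting stuck.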

Types of FSA
- Deterministic (DFA): no state has more than one outgoing edge with the same label.
- Non-deterministic (NFA): a state may have more than one outgoing edge with the same label. Edges may also be labeled with ε, the empty string; the FSA can take an ε-transition without looking at the current input character.

Example NFA
- This NFA accepts integers with an optional plus or minus sign.
- A string is accepted by an NFA if there exists a sequence of moves, starting in the start state and ending in a final state, that consumes the entire string.
Diagram: S --[+, -]--> A; S --[ε]--> A; A --[digit]--> B; B --[digit]--> B (self-loop).

Consider scanning +75:

After scanning    Can be in state
(nothing)         S    A
+                 A    -stuck-
+7                B    -stuck-
+75               B    -stuck-

Result: accept the input (one sequence of moves consumes all of +75 and ends in B).
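The "can be in state" column can be computed by tracking the set of states the NFA might be in after each character. The following Python sketch assumes the edges described for this NFA (S to A on + or -, S to A on ε, A to B and B to B on a digit) with B as the final state; the names EPS, NFA_EDGES, eps_closure, trace and nfa_accepts are made up for the example.

```python
EPS = ""  # marker used here for an epsilon edge

# Assumed edges of the example NFA: (state, label) -> set of target states.
NFA_EDGES = {
    ("S", "+"): {"A"},
    ("S", "-"): {"A"},
    ("S", EPS): {"A"},
    ("A", "digit"): {"B"},
    ("B", "digit"): {"B"},
}
NFA_START = "S"
NFA_FINAL = {"B"}


def symbol_of(ch):
    """Map an input character to an edge label."""
    return "digit" if ch in "0123456789" else ch


def eps_closure(states):
    """All states reachable from `states` using only epsilon edges."""
    closure = set(states)
    work = list(states)
    while work:
        s = work.pop()
        for t in NFA_EDGES.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                work.append(t)
    return closure


def trace(text):
    """Return the sorted list of possible states after each prefix of text."""
    current = eps_closure({NFA_START})
    history = [sorted(current)]
    for ch in text:
        moved = set()
        for s in current:
            moved |= NFA_EDGES.get((s, symbol_of(ch)), set())
        current = eps_closure(moved)
        history.append(sorted(current))
    return history


def nfa_accepts(text):
    """Accept if some sequence of moves consumes text and ends in a final state."""
    return bool(set(trace(text)[-1]) & NFA_FINAL)


print(trace("+75"))  # possible states after "", "+", "+7", "+75"
```

Paths that get stuck simply drop out of the set, so an empty set means every sequence of moves is stuck and the input is rejected.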

NFA, DFA equivalence
For every non-deterministic finite-state automaton M, there exists a deterministic automaton M' such that M and M' accept the same language.
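The slide states the equivalence without a construction. The usual way to build M' from M is the subset construction, in which each DFA state is a set of NFA states. The Python sketch below applies it to the signed-integer NFA under the same assumed edges as before; all function and variable names are hypothetical.

```python
EPS = ""  # marker used here for an epsilon edge


def eps_closure(edges, states):
    """All NFA states reachable from `states` using only epsilon edges."""
    closure = set(states)
    work = list(states)
    while work:
        s = work.pop()
        for t in edges.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                work.append(t)
    return closure


def subset_construction(edges, start, finals, alphabet):
    """Build a DFA whose states are frozensets of NFA states."""
    start_set = frozenset(eps_closure(edges, {start}))
    dfa_delta = {}
    dfa_finals = set()
    seen = {start_set}
    work = [start_set]
    while work:
        current = work.pop()
        # A DFA state is final if it contains at least one final NFA state.
        if current & finals:
            dfa_finals.add(current)
        for sym in alphabet:
            moved = set()
            for s in current:
                moved |= edges.get((s, sym), set())
            if not moved:
                continue  # no edge on sym: the DFA is stuck here as well
            target = frozenset(eps_closure(edges, moved))
            dfa_delta[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                work.append(target)
    return start_set, dfa_delta, dfa_finals


# Assumed edges of the signed-integer NFA.
NFA_EDGES = {
    ("S", "+"): {"A"},
    ("S", "-"): {"A"},
    ("S", EPS): {"A"},
    ("A", "digit"): {"B"},
    ("B", "digit"): {"B"},
}
dfa_start, dfa_delta, dfa_finals = subset_construction(
    NFA_EDGES, "S", {"B"}, ["+", "-", "digit"]
)
```

For this NFA the result has three reachable states, {S, A}, {A} and {B}, with the same transitions as the deterministic signed-integer FSA shown earlier.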

Programming a DFA
- Use a table.

current_state = S (start state)
Repeat:
    read next character
    use table to update current_state
Until machine gets stuck (reject) or entire input is read
If current_state is one of the final states:
    accept
Else:
    reject

State    +    -    digit
S        A    A    B
A                  B
B                  B
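The pseudocode above translates almost line for line into a table-driven loop. Here is a minimal Python sketch for the signed-integer DFA; TABLE, column and run_dfa are names chosen for this example, and B is assumed to be the only final state.

```python
# Transition table: one row per state, one column per character class.
# A missing column corresponds to a blank table entry (the machine gets stuck).
TABLE = {
    "S": {"+": "A", "-": "A", "digit": "B"},
    "A": {"digit": "B"},
    "B": {"digit": "B"},
}
START = "S"
FINAL_STATES = {"B"}  # assumption: B is the only final state


def column(ch):
    """Pick the table column for an input character."""
    return "digit" if ch.isascii() and ch.isdigit() else ch


def run_dfa(text):
    """Return True to accept, False to reject."""
    current_state = START
    for ch in text:  # read next character
        current_state = TABLE[current_state].get(column(ch))
        if current_state is None:
            return False  # stuck: reject
    # The entire input was read; accept only in a final state.
    return current_state in FINAL_STATES
```

For example, run_dfa("+75") accepts, while run_dfa("+") rejects because the machine stops in A, which is not final, and run_dfa("7+") rejects because B has no entry for +.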