Chapter Four: DFA Applications

Slides:



Advertisements
Similar presentations
MOD III. Input / Output Streams Byte streams Programs use byte streams to perform input and output of 8-bit bytes. This Stream handles the 8-bit.
Advertisements

Chapter Three: Closure Properties for Regular Languages
Compiler construction in4020 – lecture 2 Koen Langendoen Delft University of Technology The Netherlands.
Formal Language, chapter 4, slide 1Copyright © 2007 by Adam Webber Chapter Four: DFA Applications.
CSc 453 Lexical Analysis (Scanning)
Lexical Analysis (4.2) Programming Languages Hiram College Ellen Walker.
Unit 201 FILE IO Types of files Opening a text file for reading Reading from a text file Opening a text file for writing/appending Writing/appending to.
Strings and File I/O. Strings Java String objects are immutable Common methods include: –boolean equalsIgnoreCase(String str) –String toLowerCase() –String.
1 The First Step Learning objectives write Java programs that display text on the screen. distinguish between the eight built-in scalar types of Java;
1 Scanning Aaron Bloomfield CS 415 Fall Parsing & Scanning In real compilers the recognizer is split into two phases –Scanner: translate input.
1 Course Lectures Available on line:
Abstract Data Types (ADTs) and data structures: terminology and definitions A type is a collection of values. For example, the boolean type consists of.
Lexical Analysis - An Introduction. The Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source.
Lexical Analysis - An Introduction Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at.
Basic Java Programming CSCI 392 Week Two. Stuff that is the same as C++ for loops and while loops for (int i=0; i
Java means Coffee Java Coffee Beans The name “JAVA” was taken from a cup of coffee.
Chapter 9 1 Chapter 9 – Part 1 l Overview of Streams and File I/O l Text File I/O l Binary File I/O l File Objects and File Names Streams and File I/O.
Fall 2006Costas Busch - RPI1 Deterministic Finite Automaton (DFA) Input Tape “Accept” or “Reject” String Finite Automaton Output.
 JAVA Compilation and Interpretation  JAVA Platform Independence  Building First JAVA Program  Escapes Sequences  Display text with printf  Data.
Chapter 2: Java Fundamentals
1 Recitation 8. 2 Outline Goals of this recitation: 1.Learn about loading files 2.Learn about command line arguments 3.Review of Exceptions.
1 Week 12 l Overview of Streams and File I/O l Text File I/O Streams and File I/O.
1 Assignment #1 is due on Friday. Any questions?.
1 Prove the following languages over Σ={0,1} are regular by giving regular expressions for them: 1. {w contains two or more 0’s} 2. {|w| = 3k for some.
1 Languages and Compilers (SProg og Oversættere) Lexical analysis.
CSc 453 Lexical Analysis (Scanning)
5-Dec-15 Sequential Files and Streams. 2 File Handling. File Concept.
Java Language Basics By Keywords Keywords of Java are given below – abstract continue for new switch assert *** default goto * package.
Chapter 9 1 Chapter 9 – Part 2 l Overview of Streams and File I/O l Text File I/O l Binary File I/O l File Objects and File Names Streams and File I/O.
CSI 3125, Preliminaries, page 1 Java I/O. CSI 3125, Preliminaries, page 2 Java I/O Java I/O (Input and Output) is used to process the input and produce.
Computing Machinery Chapter 4: Finite State Machines.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 2: Lexical Analysis.
Overview of Previous Lesson(s) Over View  A token is a pair consisting of a token name and an optional attribute value.  A pattern is a description.
A data type in a programming language is a set of data with values having predefined characteristics.data The language usually specifies:  the range.
Strings and File I/O. Strings Java String objects are immutable Common methods include: –boolean equalsIgnoreCase(String str) –String toLowerCase() –String.
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 1 Ahmed Ezzat.
Object Oriented Programming Lecture 2: BallWorld.
 It is a pure oops language and a high level language.  It was developed at sun microsystems by James Gosling.
Costas Busch - LSU1 Deterministic Finite Automata And Regular Languages.
Lecture Three: Finite Automata Finite Automata, Lecture 3, slide 1 Amjad Ali.
Information and Computer Sciences University of Hawaii, Manoa
Chapter 2 Scanning – Part 1 June 10, 2018 Prof. Abdelaziz Khamis.
Lexical Analysis (Sections )
CSc 453 Lexical Analysis (Scanning)
Compilers Welcome to a journey to CS419 Lecture5: Lexical Analysis:
CS 153: Concepts of Compiler Design October 17 Class Meeting
CSc 453 Lexical Analysis (Scanning)
RegExps & DFAs CS 536.
Introduction to Exceptions in Java
CHAPTER 5 JAVA FILE INPUT/OUTPUT
I/O Basics.
Two issues in lexical analysis
Starting JavaProgramming
Lexical Analysis - An Introduction
Introduction to Java Programming
Deterministic Finite Automata And Regular Languages Prof. Busch - LSU.
Lecture 5: Lexical Analysis III: The final bits
CS 3304 Comparative Languages
Exception Handling in Java
Chapter Two: Finite Automata
Chapter 2: Java Fundamentals
Lexical Analysis - An Introduction
Exceptions.
Data Structures and Algorithms for Information Processing
Exception Objects An exception is an abnormal condition that arises in a code sequence at rum time. Exception is a way of signaling serious problem.
Exceptions.
CMPE 152: Compiler Design March 19 Class Meeting
CSc 453 Lexical Analysis (Scanning)
A type is a collection of values
Presentation transcript:

Chapter Four: DFA Applications Formal Language, chapter 4, slide 1

We have seen how DFAs can be used to define formal languages We have seen how DFAs can be used to define formal languages. In addition to this formal use, DFAs have practical applications. DFA-based pieces of code lie at the heart of many commonly used computer programs. Formal Language, chapter 4, slide 2

Outline 4.1 DFA Applications 4.2 A DFA-Based Text Filter in Java 4.3 Table-Driven Alternatives Formal Language, chapter 4, slide 3

DFA Applications Programming language processing Scanning phase: dividing source file into "tokens" (keywords, identifiers, constants, etc.), skipping whitespace and comments Command language processing Typed command languages often require the same kind of treatment Text pattern matching Unix tools like awk, egrep, and sed, mail systems like ProcMail, database systems like MySQL, and many others Formal Language, chapter 4, slide 4

More DFA Applications Signal processing Speech processing and other signal processing systems use finite state models to transform the incoming signal Controllers for finite-state systems Hardware and software A wide range of applications, from industrial processes to video games Formal Language, chapter 4, slide 5

Outline 4.1 DFA Applications 4.2 A DFA-Based Text Filter in Java 4.3 Table-Driven Alternatives Formal Language, chapter 4, slide 6

The Mod3 DFA, Revisited We saw that this DFA accepts a language of binary strings that encode numbers divisible by 3 We will implement it in Java We will need one more state, since our natural alphabet is Unicode, not {0,1} Formal Language, chapter 4, slide 7

The Mod3 DFA, Modified Here, Σ is the Unicode character set The DFA enters the non-accepting trap state on any symbol other than 0 or 1 Formal Language, chapter 4, slide 8

/. A deterministic finite-state automaton that /** * A deterministic finite-state automaton that * recognizes strings that are binary * representations of integers that are divisible * by 3. Leading zeros are permitted, and the * empty string is taken as a representation for 0 * (along with "0", "00", and so on). */ public class Mod3 { /* * Constants q0 through q3 represent states, and * a private int holds the current state code. */ private static final int q0 = 0; private static final int q1 = 1; private static final int q2 = 2; private static final int q3 = 3; private int state; Formal Language, chapter 4, slide 9

static private int delta(int s, char c) { switch (s) { case q0: switch (c) { case '0': return q0; case '1': return q1; default: return q3; } case q1: switch (c) { case '0': return q2; case '1': return q0; default: return q3; } case q2: switch (c) { case '0': return q1; case '1': return q2; default: return q3; } default: return q3; } } Formal Language, chapter 4, slide 10

/. Reset the current state to the start state /** * Reset the current state to the start state. */ public void reset() { state = q0; } /** * Make one transition on each char in the given * string. * @param in the String to use */ public void process(String in) { for (int i = 0; i < in.length(); i++) { char c = in.charAt(i); state = delta(state, c); } } Formal Language, chapter 4, slide 11

/. Test whether the DFA accepted the string /** * Test whether the DFA accepted the string. * @return true iff the final state was accepting */ public boolean accepted() { return state==q0; } }   Usage example: Mod3 m = new Mod3(); m.reset(); m.process(s); if (m.accepted()) ... Formal Language, chapter 4, slide 12

import java.io.*; /** * A Java application to demonstrate the Mod3 class by * using it to filter the standard input stream. Those * lines that are accepted by Mod3 are echoed to the * standard output. */ public class Mod3Filter { public static void main(String[] args) throws IOException { Mod3 m = new Mod3(); // the DFA BufferedReader in = // standard input new BufferedReader(new InputStreamReader(System.in)); Formal Language, chapter 4, slide 13

// Read and echo lines until EOF. String s = in. readLine(); while (s // Read and echo lines until EOF. String s = in.readLine(); while (s!=null) { m.reset(); m.process(s); if (m.accepted()) System.out.println(s); s = in.readLine(); } } } Formal Language, chapter 4, slide 14

C:\>java Mod3Filter < numbers C:\>type numbers 000 001 010 011 100 101 110 111 1000 1001 1010   C:\>java Mod3Filter < numbers C:\> Formal Language, chapter 4, slide 15

Outline 4.1 DFA Applications 4.2 A DFA-Based Text Filter in Java 4.3 Table-Driven Alternatives Formal Language, chapter 4, slide 16

Making Delta A Table We might want to encode delta as a two- dimensional array Avoids method invocation overhead Then process could look like this: static void process(String in) { for (int i = 0; i < in.length(); i++) { char c = in.charAt(i); state = delta[state, c]; } } Formal Language, chapter 4, slide 17

Keeping The Array Small If delta[state,c] is indexed by state and symbol, it will be big: 4 by 65536! And almost all entries will be 3 Instead, we could index it by state and integer, 0 or 1 Then we could use exception handling when the array index is out of bounds Formal Language, chapter 4, slide 18

/. The transition function represented as an array /* * The transition function represented as an array. * The next state from current state s and character c * is at delta[s,c-'0']. */ static private int[][] delta = {{q0,q1},{q2,q0},{q1,q2},{q3,q3}}; /** * Make one transition on each char in the given * string. * @param in the String to use */ public void process(String in) { for (int i = 0; i < in.length(); i++) { char c = in.charAt(i); try { state = delta[state][c-'0']; } catch (ArrayIndexOutOfBoundsException ex) { state = q3; } } } Formal Language, chapter 4, slide 19

Tradeoffs Function or table? Truncated table or full table? By hand, a truncated table is easier Automatically generated systems generally produce the full table, so the same process can be used for different DFAs Table representation We used an int for every entry: wasteful! Could have used a byte, or even just two bits Time/space tradeoff: table compression saves space but slows down access Formal Language, chapter 4, slide 20