Lexical Analysis (4.2) Programming Languages Hiram College Ellen Walker.


Lexical Analysis is Pattern Matching
From a sequence of characters to a sequence of lexemes, e.g.
– “public static void main(char[] args)” -> public, static, void, main, (, char, [, ], args, )
– Patterns are simpler (easy grammars), e.g.
  <id> -> <letter> | <letter><id>
  <letter> -> a | b | c | … | z

Regular Grammars
Subset of Context Free Grammars
The right-hand side of every rule contains at most one non-terminal symbol (or can be rewritten so it does…)

Rewritten Grammar for ID
Original:
  <id> -> <letter> | <letter><id>
  <letter> -> a | b | c | … | z
Rewrite:
  <id> -> (a | b | c | … | z) | (a | b | c | … | z)<id>
Fully expanded (52 rules):
  <id> -> a | b | c | … | z | a<id> | b<id> | c<id> | … | z<id>

Parsing using a Regular Grammar
1. Transform the grammar into a state machine
2. Implement the state machine in a computer program
   – By hand
   – Automatically, using table-lookup
3. Run this program on input strings

What is a State Machine?
State machine abstraction
– At any time, the process is in a “state”
– Each time an “event” happens, the process takes an “action” and goes to the next state
– We can describe the entire algorithm as a diagram where each state has an arrow for each event/action pair to the next appropriate state

State Machine for a Kitten
[Diagram] States: Happy, Hungry, Sleeping. Transitions (event / action): Food available / Eat; Toys available / Play; X hrs passed / Awaken.
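The event/action abstraction can be sketched in a few lines of Java. The states and the event/action pairs below are the ones named on the slide; which state each arrow leaves and enters, and the starting state, are assumptions made only for illustration. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical kitten state machine. States and event/action pairs come from
// the slide; the source and target state of each arrow are ASSUMED.
final class KittenMachine {
    enum State { HAPPY, HUNGRY, SLEEPING }
    enum Event { FOOD_AVAILABLE, TOYS_AVAILABLE, HOURS_PASSED }

    private State state = State.HUNGRY;               // assumed starting state
    private final List<String> actions = new ArrayList<>();

    State state() { return state; }
    List<String> actions() { return actions; }

    // Take the action for (state, event) and move to the next state.
    // Pairs with no arrow are ignored: no action, no state change.
    void handle(Event event) {
        if (state == State.HUNGRY && event == Event.FOOD_AVAILABLE) {
            actions.add("Eat");
            state = State.HAPPY;
        } else if (state == State.HAPPY && event == Event.TOYS_AVAILABLE) {
            actions.add("Play");
            state = State.SLEEPING;
        } else if (state == State.SLEEPING && event == Event.HOURS_PASSED) {
            actions.add("Awaken");
            state = State.HUNGRY;
        }
    }

    public static void main(String[] args) {
        KittenMachine kitten = new KittenMachine();
        kitten.handle(Event.FOOD_AVAILABLE);
        kitten.handle(Event.TOYS_AVAILABLE);
        kitten.handle(Event.HOURS_PASSED);
        System.out.println(kitten.actions() + " -> " + kitten.state());
    }
}
```

Each branch of handle() corresponds to one arrow of the diagram: the condition is the arrow's source state and event, and the body is its action and target state.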

State Machine for a Language
Each “event” processes an input symbol
Two important special states
– Initial state: state the machine is in before the first symbol
– Final state: state the machine is in whenever the sequence of symbols up to now is in the language

Transforming a Regular Grammar to a State Machine
– Put the grammar into a form so every rule is <A> -> symbol <B> (or <A> -> symbol)
– Make a state for each nonterminal
– Make a transition (arrow) for each rule. The transition goes from <A> to <B> based on the symbol.
– The start symbol of the grammar is the initial state.
– There is one final state, which is the target of every rule that has no nonterminal on the right.

State Machine Example
<id> -> a | b | a<id> | b<id>
Two states: id (initial) and f (final)
Example: aabba
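A minimal sketch of this two-state machine in Java, assuming the grammar is `<id> -> a | b | a<id> | b<id>`. Because reading a or b in state id may either stay in id or move to f, the sketch tracks the set of states the machine could be in. The class and method names are hypothetical.

```java
import java.util.EnumSet;
import java.util.Set;

// Sketch of the two-state machine for <id> -> a | b | a<id> | b<id>.
final class IdNfa {
    enum State { ID, F }   // ID is the initial state, F is the final state

    // From ID, reading 'a' or 'b' can stay in ID (rules a<id>, b<id>)
    // or move to F (rules a, b). F has no outgoing transitions.
    static Set<State> step(State from, char symbol) {
        boolean letter = symbol == 'a' || symbol == 'b';
        if (from == State.ID && letter) {
            return EnumSet.of(State.ID, State.F);
        }
        return EnumSet.noneOf(State.class);
    }

    // Track every state the machine could be in after each symbol.
    static boolean accepts(String input) {
        Set<State> current = EnumSet.of(State.ID);
        for (char symbol : input.toCharArray()) {
            Set<State> next = EnumSet.noneOf(State.class);
            for (State s : current) {
                next.addAll(step(s, symbol));
            }
            current = next;
        }
        return current.contains(State.F);
    }

    public static void main(String[] args) {
        System.out.println(accepts("aabba"));  // the slide's example string
        System.out.println(accepts(""));       // empty input never reaches F
        System.out.println(accepts("abc"));    // 'c' has no transition
    }
}
```

On aabba the set of possible states is {ID, F} after every symbol, so the string is accepted.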

Simpler State Machine
This is a cleaner version of the other machine. Each (state, character) combination has only one next state. It is called a DFA (deterministic finite automaton).
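One way to picture the deterministic version is as a table with exactly one entry per (state, character) pair. The sketch below is an assumption about what such a machine looks like for strings of one or more a's and b's; it is not taken from the slide's diagram, and the names are hypothetical.

```java
// Table-driven DFA sketch for one or more characters from {a, b}.
final class IdDfa {
    static final int START = 0;   // initial state, not final
    static final int ID = 1;      // final state
    static final String CHARS = "ab";

    // TABLE[state][column]: exactly one next state per (state, character).
    static final int[][] TABLE = {
        { ID, ID },   // from START on 'a', 'b'
        { ID, ID },   // from ID on 'a', 'b'
    };

    static boolean accepts(String input) {
        int state = START;
        for (char c : input.toCharArray()) {
            int column = CHARS.indexOf(c);
            if (column < 0) {
                return false;             // character outside the alphabet
            }
            state = TABLE[state][column];
        }
        return state == ID;
    }

    public static void main(String[] args) {
        System.out.println(accepts("aabba"));
        System.out.println(accepts(""));
    }
}
```

Unlike the two-state machine with a choice on every letter, this loop holds a single current state, so no set of alternatives is needed.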

Lexical Analysis for Integer Expressions

From DFA to Program
– Method doScan() reads tokens from an input stream (assume System.in for now) and creates a list of them in order.
– Method lex(s) scans and returns a single Token from a stream.
– A Token consists of a type (e.g. INT) and a string (e.g. “1234”)
09/15/10
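A Token as described here could be as small as the following sketch. The slides do not show the class itself, so the field names, the getLexeme() accessor and toString() are assumptions; getType() matches the call used later in doScan().

```java
// Minimal immutable Token sketch: a type code plus the matched string.
final class Token {
    private final int type;        // e.g. the INT or ID state constant
    private final String lexeme;   // e.g. "1234"

    Token(int type, String lexeme) {
        this.type = type;
        this.lexeme = lexeme;
    }

    int getType() { return type; }
    String getLexeme() { return lexeme; }

    @Override
    public String toString() {
        return "Token(" + type + ", \"" + lexeme + "\")";
    }

    public static void main(String[] args) {
        Token t = new Token(1, "1234");   // 1 stands for INT in this sketch
        System.out.println(t);
    }
}
```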

Defining Constants
// Number all the states
public static final int NUMSTATES = 4;  // table rows: START, INT, ID, UNK (ERR is never used as a row index)
public static final int START = 0;
public static final int INT = 1;
public static final int ID = 2;
public static final int UNK = 3;
public static final int ERR = 4;

Constructing Transition Table (in constructor)
String chars = "01234abcdef+-()";
// Rows are states, columns are indexes into chars
int[][] tt = new int[NUMSTATES][chars.length()];
tt[ID][5] = ID;      // 'a'
tt[ID][6] = ID;      // 'b'
tt[START][5] = ID;   // 'a'
tt[START][1] = INT;  // '1'
// … etc …
tt[ID][0] = ERR;     // '0'
// … etc …
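Writing every entry by hand is tedious, so the table can also be filled by character class. The sketch below agrees with the four entries shown on the slide; the remaining entries (digits continue INT, letters continue ID, operators and parentheses go from START to UNK, everything else is ERR) are assumptions. Note that a new int array is zero-filled, and 0 is START, so the sketch first sets every entry to ERR.

```java
import java.util.Arrays;

// Sketch: fill the transition table by character class instead of entry by entry.
final class TransitionTable {
    static final int NUMSTATES = 4;
    static final int START = 0, INT = 1, ID = 2, UNK = 3, ERR = 4;
    static final String CHARS = "01234abcdef+-()";

    static int[][] build() {
        int[][] tt = new int[NUMSTATES][CHARS.length()];
        for (int[] row : tt) {
            Arrays.fill(row, ERR);           // default: no transition
        }
        for (int col = 0; col < CHARS.length(); col++) {
            char c = CHARS.charAt(col);
            if (Character.isDigit(c)) {
                tt[START][col] = INT;        // a digit starts an integer
                tt[INT][col] = INT;          // more digits extend it
            } else if (Character.isLetter(c)) {
                tt[START][col] = ID;         // a letter starts an identifier
                tt[ID][col] = ID;            // more letters extend it
            } else {
                tt[START][col] = UNK;        // ASSUMED: one-character symbol
            }
        }
        return tt;
    }

    public static void main(String[] args) {
        int[][] tt = build();
        System.out.println(tt[START][CHARS.indexOf('a')] == ID);
    }
}
```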

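Recognizing Final States
// For this grammar, all states but ERR are final
// Usually, this method is a bit more complex
// (named isFinal because final is a reserved word in Java)
static boolean isFinal(int state) {
    return (state != ERR);
}

Lex Method
// Read one token from the input. Scanner stands for any character-level
// reader with a nextChar() method; java.util.Scanner does not provide one.
public static Token lex(Scanner s) {
    // initialize variables
    StringBuilder lexeme = new StringBuilder();
    int state = START;
    int oldstate = START;
    char ch = s.nextChar();
    …

Lex Method (cont’d)
    // loop through characters, updating state
    while (state != ERR) {
        oldstate = state;
        state = tt[oldstate][chars.indexOf(ch)];
        if (state != ERR) {
            // ch extends the token: keep it and read the next character
            lexeme.append(ch);
            ch = s.nextChar();
        }
    }
    // ch now holds the first character after the token; a complete
    // implementation must keep it as lookahead for the next call

Lex Method (cont’d)
    // return the token
    if (isFinal(oldstate))  // valid token
        return new Token(oldstate, lexeme.toString());
    else  // not a valid token – return the chars
        return new Token(ERR, lexeme.toString());
} // end of lex()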

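From DFA to Program (cont’d)
// Scanner is the same character-level reader used by lex();
// peek() is assumed to report whether any input remains.
public static boolean doScan() {
    Scanner s = new Scanner(System.in);
    while (s.peek()) {  // not EOF
        // removes whitespace
        eatWhitespace(s);
        Token token = lex(s);
        tokens.add(token);
        if (token.getType() == ERR) return false;
    }
    return true;
}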
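Putting the pieces together, here is a self-contained sketch of the table-driven scanner. It is not the slides' code: it swaps the unspecified character-level Scanner for java.io.PushbackReader, fills the table by character class (an assumption, as is the use of UNK for operators), treats START as non-final so an unrecognized first character yields an ERR token, and returns the token list instead of a boolean.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TableScanner {
    static final int NUMSTATES = 4;
    static final int START = 0, INT = 1, ID = 2, UNK = 3, ERR = 4;
    static final String CHARS = "01234abcdef+-()";
    static final int[][] TT = buildTable();

    static final class Token {
        final int type;
        final String lexeme;
        Token(int type, String lexeme) { this.type = type; this.lexeme = lexeme; }
    }

    static int[][] buildTable() {
        int[][] tt = new int[NUMSTATES][CHARS.length()];
        for (int[] row : tt) Arrays.fill(row, ERR);
        for (int col = 0; col < CHARS.length(); col++) {
            char c = CHARS.charAt(col);
            if (Character.isDigit(c)) { tt[START][col] = INT; tt[INT][col] = INT; }
            else if (Character.isLetter(c)) { tt[START][col] = ID; tt[ID][col] = ID; }
            else { tt[START][col] = UNK; }   // ASSUMED: one-character symbols
        }
        return tt;
    }

    // Read the longest token starting at the current position. The character
    // that ends the token is pushed back so the next call sees it.
    static Token lex(PushbackReader in) throws IOException {
        StringBuilder lexeme = new StringBuilder();
        int state = START;
        int c;
        while ((c = in.read()) != -1) {
            int col = CHARS.indexOf(c);
            int next = (col < 0) ? ERR : TT[state][col];
            if (next == ERR) {
                if (state == START) lexeme.append((char) c);  // report the bad character
                else in.unread(c);                            // belongs to the next token
                break;
            }
            lexeme.append((char) c);
            state = next;
        }
        return new Token(state == START ? ERR : state, lexeme.toString());
    }

    // Scan a whole string; stops after the first ERR token.
    static List<Token> scan(String source) {
        List<Token> tokens = new ArrayList<>();
        try (PushbackReader in = new PushbackReader(new StringReader(source))) {
            while (true) {
                int c = in.read();
                while (c != -1 && Character.isWhitespace(c)) c = in.read();  // eat whitespace
                if (c == -1) return tokens;
                in.unread(c);
                Token t = lex(in);
                tokens.add(t);
                if (t.type == ERR) return tokens;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (Token t : scan("abc 123+(de)")) {
            System.out.println(t.type + " " + t.lexeme);
        }
    }
}
```

Pushing the terminating character back is what keeps a token such as 123 from swallowing the + that follows it.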

Another Program (pp )
– Programmed in C (no classes)
– Global variables instead of class variables (used in many functions, e.g. charClass)
– Token (int) and lexeme (string) unconnected
– States and transitions are implicit
– Lex() is a big case statement
– Many special purpose functions, e.g. getChar(), addChar(), lookup() executing portions of DFA
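To contrast with the table-driven version, the following Java sketch shows the "implicit states" style: there is no transition table, and the position in the control flow plays the role of the state. It is not the C program the slide refers to. The names, the character classes, and the rule that identifiers may contain digits after the first letter are all assumptions for illustration.

```java
// Sketch of a lexer whose states are implicit in a switch on character class.
final class SwitchLexer {
    enum CharClass { LETTER, DIGIT, OTHER }
    enum TokenType { ID, INT, UNKNOWN }

    static final class Token {
        final TokenType type;
        final String lexeme;
        Token(TokenType type, String lexeme) { this.type = type; this.lexeme = lexeme; }
    }

    private final String input;
    private int pos = 0;

    SwitchLexer(String input) { this.input = input; }

    static CharClass classify(char c) {
        if (Character.isLetter(c)) return CharClass.LETTER;
        if (Character.isDigit(c)) return CharClass.DIGIT;
        return CharClass.OTHER;
    }

    // Return the next token, or null at end of input.
    Token lex() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) pos++;
        if (pos >= input.length()) return null;
        int start = pos;
        switch (classify(input.charAt(pos))) {
            case LETTER:   // "in an identifier": consume letters and digits
                while (pos < input.length() && classify(input.charAt(pos)) != CharClass.OTHER) pos++;
                return new Token(TokenType.ID, input.substring(start, pos));
            case DIGIT:    // "in an integer": consume digits only
                while (pos < input.length() && classify(input.charAt(pos)) == CharClass.DIGIT) pos++;
                return new Token(TokenType.INT, input.substring(start, pos));
            default:       // any other character is a one-character token
                pos++;
                return new Token(TokenType.UNKNOWN, input.substring(start, pos));
        }
    }

    public static void main(String[] args) {
        SwitchLexer lexer = new SwitchLexer("sum1 + 47");
        for (Token t = lexer.lex(); t != null; t = lexer.lex()) {
            System.out.println(t.type + " " + t.lexeme);
        }
    }
}
```

Each case of the switch stands in for one state of the DFA, and each inner loop for that state's self-transition.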