1 November 1, 2015 1 November 1, 2015November 1, 2015November 1, 2015 Azusa, CA Sheldon X. Liang Ph. D. Computer Science at Azusa Pacific University Azusa.

Slides:



Advertisements
Similar presentations
Nondeterministic Finite Automata CS 130: Theory of Computation HMU textbook, Chapter 2 (Sec 2.3 & 2.5)
Advertisements

COMP-421 Compiler Design Presented by Dr Ioanna Dionysiou.
From Cooper & Torczon1 The Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source language?
Regular Expressions Finite State Automaton. Programming Languages2 Regular expressions  Terminology on Formal languages: –alphabet : a finite set of.
LEXICAL ANALYSIS Phung Hua Nguyen University of Technology 2006.
Lexical Analyzer Second lecture. Compiler Construction Outline Informal sketch of lexical analysis Identifies tokens in input string Issues in lexical.
CS-338 Compiler Design Dr. Syed Noman Hasany Assistant Professor College of Computer, Qassim University.
Winter 2007SEG2101 Chapter 81 Chapter 8 Lexical Analysis.
Lexical Analysis III Recognizing Tokens Lecture 4 CS 4318/5331 Apan Qasem Texas State University Spring 2015.
1 The scanning process Main goal: recognize words/tokens Snapshot: At any point in time, the scanner has read some input and is on the way to identifying.
1 Chapter 2: Scanning 朱治平. Scanner (or Lexical Analyzer) the interface between source & compiler could be a separate pass and places its output on an.
Compiler Construction
Lexical Analysis Recognize tokens and ignore white spaces, comments
Scanner Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source language? Is the.
Topic #3: Lexical Analysis
CPSC 388 – Compiler Design and Construction Scanners – Finite State Automata.
1 October 2, October 2, 2015October 2, 2015October 2, 2015 Azusa, CA Sheldon X. Liang Ph. D. Computer Science at Azusa Pacific University Azusa.
CS-5800 Theory of Computation II PROJECT PRESENTATION By Quincy Campbell & Sandeep Ravikanti.
Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Machine-independent code improvement Target code generation Machine-specific.
Lexical Analysis - An Introduction. The Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source.
Lexical Analysis - An Introduction Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at.
By: Er. Sukhwinder kaur.  What is Automata Theory? What is Automata Theory?  Alphabet and Strings Alphabet and Strings  Empty String Empty String 
어휘분석 (Lexical Analysis). Overview Main task: to read input characters and group them into “ tokens. ” Secondary tasks: –Skip comments and whitespace;
Lexical Analysis - An Introduction Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at.
Lecture # 3 Chapter #3: Lexical Analysis. Role of Lexical Analyzer It is the first phase of compiler Its main task is to read the input characters and.
Topic #3: Lexical Analysis EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.
COMP 3438 – Part II - Lecture 2: Lexical Analysis (I) Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ. 1.
Lexical Analysis I Specifying Tokens Lecture 2 CS 4318/5531 Spring 2010 Apan Qasem Texas State University *some slides adopted from Cooper and Torczon.
Lexical Analyzer (Checker)
May 31, May 31, 2016May 31, 2016May 31, 2016 Azusa, CA Sheldon X. Liang Ph. D. Computer Science at Azusa Pacific University Azusa Pacific University,
TRANSITION DIAGRAM BASED LEXICAL ANALYZER and FINITE AUTOMATA Class date : 12 August, 2013 Prepared by : Karimgailiu R Panmei Roll no. : 11CS10020 GROUP.
Compiler Construction 2 주 강의 Lexical Analysis. “get next token” is a command sent from the parser to the lexical analyzer. On receipt of the command,
Lexical Analyzer in Perspective
Lexical Analysis: Finite Automata CS 471 September 5, 2007.
Chapter 3 Chang Chi-Chung The Role of the Lexical Analyzer Lexical Analyzer Parser Source Program Token Symbol Table getNextToken error.
1 Lexical Analysis and Lexical Analyzer Generators Chapter 3 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University,
Review: Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Intermediate code generator Code optimizer Code generator Symbol.
1.  It is the first phase of compiler.  In computer science, lexical analysis is the process of converting a sequence of characters into a sequence.
Fall 2003CS416 Compiler Design1 Lexical Analyzer Lexical Analyzer reads the source program character by character to produce tokens. Normally a lexical.
Overview of Previous Lesson(s) Over View  Symbol tables are data structures that are used by compilers to hold information about source-program constructs.
CSC3315 (Spring 2009)1 CSC 3315 Lexical and Syntax Analysis Hamid Harroud School of Science and Engineering, Akhawayn University
1 January 18, January 18, 2016January 18, 2016January 18, 2016 Azusa, CA Sheldon X. Liang Ph. D. Computer Science at Azusa Pacific University Azusa.
1st Phase Lexical Analysis
UNIT - I Formal Language and Regular Expressions: Languages Definition regular expressions Regular sets identity rules. Finite Automata: DFA NFA NFA with.
1 February 23, February 23, 2016February 23, 2016February 23, 2016 Azusa, CA Sheldon X. Liang Ph. D. Computer Science at Azusa Pacific University.
1 February 23, February 23, 2016February 23, 2016February 23, 2016 Azusa, CA Sheldon X. Liang Ph. D. Computer Science at Azusa Pacific University.
using Deterministic Finite Automata & Nondeterministic Finite Automata
Overview of Previous Lesson(s) Over View  A token is a pair consisting of a token name and an optional attribute value.  A pattern is a description.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 1 Ahmed Ezzat.
1 Compiler Construction Vana Doufexi office CS dept.
Deterministic Finite Automata Nondeterministic Finite Automata.
CS412/413 Introduction to Compilers Radu Rugina Lecture 3: Finite Automata 25 Jan 02.
Lecture 2 Compiler Design Lexical Analysis By lecturer Noor Dhia
COMP 3438 – Part II - Lecture 3 Lexical Analysis II Par III: Finite Automata Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ. 1.
Topic 3: Automata Theory 1. OutlineOutline Finite state machine, Regular expressions, DFA, NDFA, and their equivalence, Grammars and Chomsky hierarchy.
WELCOME TO A JOURNEY TO CS419 Dr. Hussien Sharaf Dr. Mohammad Nassef Department of Computer Science, Faculty of Computers and Information, Cairo University.
Lexical Analyzer in Perspective
CS510 Compiler Lecture 2.
Chapter 3 Lexical Analysis.
CSc 453 Lexical Analysis (Scanning)
CSc 453 Lexical Analysis (Scanning)
Recognizer for a Language
Chapter 3: Lexical Analysis
Lexical Analysis and Lexical Analyzer Generators
Review: Compiler Phases:
Compiler Construction
COMPILERS LECTURE(6-Aug-13)
Compiler Construction
CSc 453 Lexical Analysis (Scanning)
Presentation transcript:

1 November 1, November 1, 2015November 1, 2015November 1, 2015 Azusa, CA Sheldon X. Liang Ph. D. Computer Science at Azusa Pacific University Azusa Pacific University, Azusa, CA 91702, Tel: (800) Department of Computer Science, CS400 Compiler Construction

2 The Reason Why Lexical Analysis is a Separate Phase Simplifies the design of the compiler –LL(1) or LR(1) parsing with 1 token lookahead would not be possible (multiple characters/tokens to match) Provides efficient implementation –Systematic techniques to implement lexical analyzers by hand or automatically from specifications –Stream buffering methods to scan input Improves portability –Non-standard symbols and alternate character encodings can be normalized (e.g. trigraphs) November 1, Azusa Pacific University, Azusa, CA 91702, Tel: (800) Department of Computer Science, CS400 Compiler Construction

3 Interaction of the Lexical Analyzer with the Parser Lexical Analyzer Parser Source Program Token, tokenval Symbol Table Get next token error November 1, Azusa Pacific University, Azusa, CA 91702, Tel: (800) Department of Computer Science, CS400 Compiler Construction

4 Attributes of Tokens Lexical analyzer y := *x Parser token tokenval (token attribute) November 1, Azusa Pacific University, Azusa, CA 91702, Tel: (800) Department of Computer Science, CS400 Compiler Construction

5  Formalization November 1, Azusa Pacific University, Azusa, CA 91702, Tel: (800) Department of Computer Science, CS400 Compiler Construction Lexical Analysis & Lexical Analyzer Generators  Regular Expressions  Finite Automata  RE  Conversion  FA  Lexer Design

6 November 1, Azusa Pacific University, Azusa, CA 91702, Tel: (800) Department of Computer Science, CS400 Compiler Construction Keep in mind following questions Token –Lexical units –Atom parse element –Abstracted in syntax: e.g. Id Lexeme –Specific string making up token –Value / attribute related to a token –Concrete in language, e.g., Amt Spec of patterns for tokens –Alphabet  - a finite set –String s - a finite sequence from  –Language – a specific set of strings

7 Tokens, Patterns, and Lexemes A token is a classification of lexical units –For example: id and num Lexemes are the specific character strings that make up a token –For example: abc and 123 Patterns are rules describing the set of lexemes belonging to a token –For example: “letter followed by letters and digits” and “non-empty sequence of digits” November 1, Azusa Pacific University, Azusa, CA 91702, Tel: (800) Department of Computer Science, CS400 Compiler Construction

8 An alphabet  is a finite set of symbols (characters) A string s is a finite sequence of symbols from  –  s  denotes the length of string s –  denotes the empty string, thus  = 0 A language is a specific set of strings over some fixed alphabet  November 1, Azusa Pacific University, Azusa, CA 91702, Tel: (800) Department of Computer Science, CS400 Compiler Construction Specification of Patterns for Tokens: Definitions

9 Specification of Patterns for Tokens: String Operations The concatenation of two strings x and y is denoted by xy The exponentation of a string s is defined by s 0 =  s i = s i-1 s for i > 0 note that s  =  s = s November 1, Azusa Pacific University, Azusa, CA 91702, Tel: (800) Department of Computer Science, CS400 Compiler Construction

10 Union L  M = {s  s  L or s  M} Concatenation LM = {xy  x  L and y  M} Exponentiation L 0 = {  }; L i = L i-1 L Kleene closure L * =  i=0,…,  L i Positive closure L + =  i=1,…,  L i November 1, Azusa Pacific University, Azusa, CA 91702, Tel: (800) Department of Computer Science, CS400 Compiler Construction Specification of Patterns for Tokens: Language Operations

11 Basis symbols: –  is a regular expression denoting language {  } –a   is a regular expression denoting {a} If r and s are regular expressions denoting languages L(r) and M(s) respectively, then –r  s is a regular expression denoting L(r)  M(s) –rs is a regular expression denoting L(r)M(s) –r * is a regular expression denoting L(r) * –(r) is a regular expression denoting L(r) A language defined by a regular expression is called a regular set November 1, Azusa Pacific University, Azusa, CA 91702, Tel: (800) Department of Computer Science, CS400 Compiler Construction Specification of Patterns for Tokens: Regular Expressions

12 Nondeterministic Finite Automata An NFA is a 5-tuple (S, , , s 0, F) where S is a finite set of states  is a finite set of symbols, the alphabet  is a mapping from S  to a set of states s 0  S is the start state F  S is the set of accepting (or final) states November 1, Azusa Pacific University, Azusa, CA 91702, Tel: (800) Department of Computer Science, CS400 Compiler Construction

13 Conversion of an NFA into a DFA The subset construction algorithm converts an NFA into a DFA using:  -closure(s) = {s}  {t  s   …   t}  -closure(T) =  s  T  -closure(s) move(T,a) = {t  s  a t and s  T} The algorithm produces: Dstates is the set of states of the new DFA consisting of sets of states of the NFA Dtran is the transition table of the new DFA November 1, Azusa Pacific University, Azusa, CA 91702, Tel: (800) Department of Computer Science, CS400 Compiler Construction

14 November 1, Azusa Pacific University, Azusa, CA 91702, Tel: (800) Department of Computer Science, CS400 Compiler Construction Got it with following questions Tokens –L–Lexical units –A–Atom parse element –A–Abstracted in syntax: e.g. Id Lexeme –S–Specific string making up token –V–Value / attribute related to a token –C–Concrete in language, e.g., Amt Spec of patterns for tokens –A–Alphabet  - a finite set –S–String s - a finite sequence from  –L–Language – a specific set of strings

15 Thank you very much! Questions? November 1, Azusa Pacific University, Azusa, CA 91702, Tel: (800) Department of Computer Science, CS400 Compiler Construction