Chapter 3: Lexical Analysis

345CS Chapter 3: Lexical Analysis

Lexical Analysis
  - Basic Concepts & Regular Expressions
      - What does a Lexical Analyzer do?
      - How does it Work?
      - Formalizing Token Definition & Recognition
  - LEX - A Lexical Analyzer Generator (Defer)
  - Reviewing Finite Automata Concepts
      - Non-Deterministic and Deterministic FA
      - Conversion Process
          - Regular Expressions to NFA
          - NFA to DFA
  - Relating NFAs/DFAs/Conversion to Lexical Analysis
  - Concluding Remarks / Looking Ahead

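Lexical Analyzer in Perspective
  [Figure: source program -> lexical analyzer -> (token) -> parser; the parser sends "get next token" back to the lexical analyzer; both boxes are connected to the symbol table]
  - Important Issue: What are Responsibilities of each Box?
  - Focus on Lexical Analyzer and Parser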

Lexical Analyzer in Perspective
  LEXICAL ANALYZER
    - Scan Input
    - Remove WS, NL, ...
    - Identify Tokens
    - Create Symbol Table
    - Insert Tokens into ST
    - Generate Errors
    - Send Tokens to Parser
  PARSER
    - Perform Syntax Analysis
    - Actions Dictated by Token Order
    - Update Symbol Table Entries
    - Create Abstract Rep. of Source
    - Generate Errors
    - And More... (We'll see later)

What Factors Have Influenced the Functional Division of Labor?
  - Separation of Lexical Analysis from Parsing Presents a Simpler Conceptual Model
      - From a Software Engineering Perspective, Division Emphasizes High Cohesion and Low Coupling
      - Implies Well Specified => Parallel Implementation
  - Separation Increases Compiler Efficiency (I/O Techniques to Enhance Lexical Analysis)
  - Separation Promotes Portability
      - This is critical today, when platforms (OSs and Hardware) are numerous and varied!
      - Emergence of Platform Independence - Java

Introducing Basic Terminology
  What are Major Terms for Lexical Analysis?
  - TOKEN
      - A classification for a common set of strings
      - Examples Include <Identifier>, <number>, etc.
  - PATTERN
      - The rules which characterize the set of strings for a token
      - Recall File and OS Wildcards ([A-Z]*.*)
  - LEXEME
      - Actual sequence of characters that matches pattern and is classified by a token
      - Identifiers: x, count, name, etc.

Introducing Basic Terminology

  Token      Sample Lexemes          Informal Description of Pattern
  -----      --------------          -------------------------------
  const      const                   const
  if         if                      if
  relation   <, <=, =, <>, >, >=     < or <= or = or <> or >= or >
  id         pi, count, D2           letter followed by letters and digits
  num        3.1416, 0, 6.02E23      any numeric constant
  literal    "core dumped"           any characters between " and " except "

  - Token: Classifies Pattern
  - Sample Lexemes: Actual values are critical. Info is:
      1. Stored in symbol table
      2. Returned to parser
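
The token / pattern / lexeme relationship in the table can be sketched with regular expressions. This is only an illustrative sketch: the names TOKEN_SPEC and tokenize are hypothetical, the regexes are one possible reading of the informal patterns (for example, the exponent form of num is assumed from the sample 6.02E23), and keywords are simply tried before the general id pattern.

```python
import re

# Hypothetical patterns modeled on the table; order matters because the
# first pattern that matches at the current position wins.
TOKEN_SPEC = [
    ("const",    re.compile(r"const\b")),
    ("if",       re.compile(r"if\b")),
    ("relation", re.compile(r"<=|<>|>=|<|=|>")),
    ("num",      re.compile(r"\d+(?:\.\d+)?(?:E[+-]?\d+)?")),
    ("id",       re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("literal",  re.compile(r'"[^"]*"')),
    ("ws",       re.compile(r"\s+")),   # white space is matched, then dropped
]

def tokenize(text):
    """Return a list of (token, lexeme) pairs for the given text."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, regex in TOKEN_SPEC:
            match = regex.match(text, pos)
            if match:
                if name != "ws":
                    tokens.append((name, match.group()))
                pos = match.end()
                break
        else:
            # No pattern matches a prefix of the remaining input.
            raise ValueError(f"no token matches at position {pos}")
    return tokens

if __name__ == "__main__":
    for token, lexeme in tokenize('if count <= 6.02E23'):
        print(token, repr(lexeme))
```

Each pair carries both pieces of information from the slide: the token (what the parser cares about) and the lexeme (the actual value that would be stored in the symbol table).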

Handling Lexical Errors
  - Error Handling is very localized, with Respect to Input Source
  - For example: whil ( x := 0 ) do
    generates no lexical errors in PASCAL
  - In what Situations do Errors Occur?
      - Prefix of remaining input doesn't match any defined token
  - Possible error recovery actions:
      - Deleting or Inserting Input Characters
      - Replacing or Transposing Characters
      - Or, skip over to next separator to "ignore" problem
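
One of the recovery actions listed above, deleting an input character, can be sketched as follows. The token set here is a deliberately tiny, hypothetical one (PATTERNS and scan_with_recovery are illustrative names, not part of the slides), chosen only so the Pascal-like example scans.

```python
import re

# Hypothetical, minimal token set used only for this illustration.
PATTERNS = [
    ("id",  re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("num", re.compile(r"\d+")),
    ("op",  re.compile(r":=|[()=<>]")),
    ("ws",  re.compile(r"\s+")),
]

def scan_with_recovery(text):
    """Scan text; on a lexical error, delete one character and continue."""
    tokens, errors = [], []
    pos = 0
    while pos < len(text):
        for name, regex in PATTERNS:
            match = regex.match(text, pos)
            if match:
                if name != "ws":
                    tokens.append((name, match.group()))
                pos = match.end()
                break
        else:
            # No token matches a prefix of the remaining input:
            # record the error, drop the offending character, keep going.
            errors.append((pos, text[pos]))
            pos += 1
    return tokens, errors

if __name__ == "__main__":
    print(scan_with_recovery("whil ( x := 0 ) do"))
    print(scan_with_recovery("x := 0 # 1"))
```

The misspelled keyword whil is scanned as an ordinary identifier, so the first call reports no lexical errors; detecting that mistake is left to the parser. Only the stray # in the second call is a lexical error.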

Designing Efficient Lex Analyzers
  - Is efficiency an issue?
  - 3 Lexical Analyzer construction techniques: how do they address efficiency?
      1. Lexical Analyzer Generator
      2. Hand-Code / High Level Language (I/O facilitated by the language)
      3. Hand-Code / Assembly Language (explicitly manage I/O)
  - In Each Technique ...
      - Who handles efficiency?
      - How is it handled?

I/O - Key For Successful Lexical Analysis
  - Character-at-a-time I/O
  - Block / Buffered I/O

Block/Buffered I/O
  - Utilize Block of memory
  - Stage data from source to buffer block at a time
  - Maintain two blocks - Why (Recall OS)?
      - Asynchronous I/O for 1 block
      - While Lexical Analysis on 2nd block
  - Tradeoffs?
  [Figure: Block 1 | Block 2, with a pointer (ptr...) into the buffer. Block 1: "When done, issue I/O". Block 2: "Still Process token in 2nd block"]

Algorithm: Buffered I/O with Sentinels
  [Figure: two-block buffer holding E = M * C * * 2, with an eof sentinel closing each block and a further eof after the input. Pointers: lexeme beginning, and forward (scans ahead to find pattern match). The span between them is the current token.]

  forward := forward + 1 ;
  if forward is at eof then begin
      if forward at end of first half then begin
          reload second half ;
          forward := forward + 1
      end
      else if forward at end of second half then begin
          reload first half ;
          move forward to beginning of first half
      end
      else /* eof within buffer signifying end of input */
          terminate lexical analysis
  end

  - 2nd eof => no more input!
  - Block I/O: the algorithm performs the I/O's. We can still have getchar & ungetchar.
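
The sentinel algorithm above can be sketched in Python as follows. This is a rough sketch under stated assumptions, not a definitive implementation: the class name SentinelBuffer, the helper read_all, the tiny block size, and the choice of the NUL character as the eof sentinel are all hypothetical, and the lexeme-beginning pointer is omitted for brevity.

```python
import io

EOF = "\0"  # assumed sentinel; the source text must not contain this character

class SentinelBuffer:
    """Two-half input buffer with an eof sentinel closing each half."""

    def __init__(self, stream, n=4):
        self.stream = stream
        self.n = n
        # Layout: [first half: n chars][EOF][second half: n chars][EOF]
        self.buf = [EOF] * (2 * n + 2)
        self._reload(0)
        self.forward = -1  # advance() increments before it reads

    def _reload(self, start):
        """Read up to n characters into the half beginning at `start`."""
        chunk = self.stream.read(self.n)
        for i in range(self.n):
            # A short read leaves EOF inside the half, marking end of input.
            self.buf[start + i] = chunk[i] if i < len(chunk) else EOF

    def advance(self):
        """Move forward one character; return it, or None at end of input."""
        self.forward += 1
        if self.buf[self.forward] == EOF:
            if self.forward == self.n:              # end of first half
                self._reload(self.n + 1)
                self.forward += 1
            elif self.forward == 2 * self.n + 1:    # end of second half
                self._reload(0)
                self.forward = 0
            else:                                   # eof within buffer
                return None
            if self.buf[self.forward] == EOF:       # reloaded half is empty
                return None
        return self.buf[self.forward]

def read_all(text, n=4):
    """Drive the buffer over `text` and return everything it yields."""
    buf = SentinelBuffer(io.StringIO(text), n)
    out = []
    while (ch := buf.advance()) is not None:
        out.append(ch)
    return "".join(out)

if __name__ == "__main__":
    print(read_all("E = M * C ** 2"))
```

The point of the sentinel is that the common case costs a single comparison per character (is it eof?); the checks for which half has ended run only when a sentinel is actually hit. The extra test after a reload is an addition in this sketch to cover input that ends exactly on a block boundary.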