Lexical Analysis (Sections )

Slides:



Advertisements
Similar presentations
COS 320 Compilers David Walker. Outline Last Week –Introduction to ML Today: –Lexical Analysis –Reading: Chapter 2 of Appel.
Advertisements

Lexical Analysis Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!
From Cooper & Torczon1 The Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source language?
Lexical Analysis - Scanner Computer Science Rensselaer Polytechnic Compiler Design Lecture 2.
1 COMP 144 Programming Language Concepts Felix Hernandez-Campos Lecture 3: Lexical Analysis COMP 144 Programming Language Concepts Spring 2002 Felix Hernandez-Campos.
1 The scanning process Main goal: recognize words/tokens Snapshot: At any point in time, the scanner has read some input and is on the way to identifying.
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -1- Compiler Construction Principles & Implementation.
COS 320 Compilers David Walker. Outline Last Week –Introduction to ML Today: –Lexical Analysis –Reading: Chapter 2 of Appel.
1 Chapter 3 Scanning – Theory and Practice. 2 Overview Formal notations for specifying the precise structure of tokens are necessary  Quoted string in.
1 COMP 144 Programming Language Concepts Felix Hernandez-Campos Lecture 4: Syntax Specification COMP 144 Programming Language Concepts Spring 2002 Felix.
Scanner Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source language? Is the.
1 Scanning Aaron Bloomfield CS 415 Fall Parsing & Scanning In real compilers the recognizer is split into two phases –Scanner: translate input.
CPSC 388 – Compiler Design and Construction Scanners – Finite State Automata.
CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. 1 Chapter 4 Chapter 4 Lexical analysis.
1 Lexical Analysis - An Introduction Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at.
1 Syntax Specification Regular Expressions. 2 Phases of Compilation.
1 Chapter 3 Scanning – Theory and Practice. 2 Overview Formal notations for specifying the precise structure of tokens are necessary –Quoted string in.
PZ02B Programming Language design and Implementation -4th Edition Copyright©Prentice Hall, PZ02B - Regular grammars Programming Language Design.
Compiler Construction Lexical Analysis. The word lexical means textual or verbal or literal. The lexical analysis implemented in the “SCANNER” module.
Lexical Analysis - An Introduction Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at.
Lexical Analysis - An Introduction Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at.
Lecture # 3 Chapter #3: Lexical Analysis. Role of Lexical Analyzer It is the first phase of compiler Its main task is to read the input characters and.
Lexical Analysis I Specifying Tokens Lecture 2 CS 4318/5531 Spring 2010 Apan Qasem Texas State University *some slides adopted from Cooper and Torczon.
D. M. Akbar Hussain: Department of Software & Media Technology 1 Compiler is tool: which translate notations from one system to another, usually from source.
COMP3190: Principle of Programming Languages DFA and its equivalent, scanner.
1 Syntax Specification (Sections ) CSCI 431 Programming Languages Fall 2003 A modification of slides developed by Felix Hernandez-Campos at UNC.
1 Languages and Compilers (SProg og Oversættere) Lexical analysis.
Lexical Analysis: Finite Automata CS 471 September 5, 2007.
CSc 453 Lexical Analysis (Scanning)
The Role of Lexical Analyzer
CSC3315 (Spring 2009)1 CSC 3315 Lexical and Syntax Analysis Hamid Harroud School of Science and Engineering, Akhawayn University
using Deterministic Finite Automata & Nondeterministic Finite Automata
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 1 Ahmed Ezzat.
LECTURE 5 Scanning. SYNTAX ANALYSIS We know from our previous lectures that the process of verifying the syntax of the program is performed in two stages:
1 Compiler Construction Vana Doufexi office CS dept.
Spring 16 CSCI 4430, A Milanova 1 Announcements HW1 will be out this evening Due Monday, 2/8 Submit in HW Server AND at start of class on 2/8 A review.
Deterministic Finite Automata Nondeterministic Finite Automata.
COMP3190: Principle of Programming Languages DFA and its equivalent, scanner.
Lecture 2 Compiler Design Lexical Analysis By lecturer Noor Dhia
Department of Software & Media Technology
CS 3304 Comparative Languages
More on scanning: NFAs and Flex
Intro to compilers Based on end of Ch. 1 and start of Ch. 2 of textbook, plus a few additional references.
A Simple Syntax-Directed Translator
Lexical Analysis.
Chapter 3 Lexical Analysis.
CS 326 Programming Languages, Concepts and Implementation
Chapter 2 :: Programming Language Syntax
Chapter 2 Scanning – Part 1 June 10, 2018 Prof. Abdelaziz Khamis.
CSc 453 Lexical Analysis (Scanning)
CSc 453 Lexical Analysis (Scanning)
CSE 105 theory of computation
PROGRAMMING LANGUAGES
Formal Language Theory
CPSC 388 – Compiler Design and Construction
4 Lexical analysis.
Lexical Analysis - An Introduction
Review: Compiler Phases:
Lecture 5: Lexical Analysis III: The final bits
Lexical Analysis Lecture 3-4 Prof. Necula CS 164 Lecture 3.
CS 3304 Comparative Languages
CS 3304 Comparative Languages
Chapter 2 :: Programming Language Syntax
Lexical Analysis - An Introduction
Chapter 2 :: Programming Language Syntax
High-Level Programming Language
Compiler Construction
Lecture 5 Scanning.
Intro to compilers Based on end of Ch. 1 and start of Ch. 2 of textbook, plus a few additional references.
CSc 453 Lexical Analysis (Scanning)
Presentation transcript:

Lexical Analysis (Sections 2.1.2-2.1.3) CSCI 431 Programming Languages Fall 2003 Lexical Analysis (Sections 2.1.2-2.1.3) A modification of slides developed by Felix Hernandez-Campos at UNC Chapel Hill

Phases of Compilation

Specification of Programming Languages PLs require precise definitions (i.e. no ambiguity) Language form (Syntax) Language meaning (Semantics) Consequently, PLs are specified using formal notation: Formal syntax Tokens Grammar Formal semantics

Phases of Compilation

Scanner Main task: identify tokens Basic building blocks of programs E.g. keywords, identifiers, numbers, punctuation marks Other tasks: remove comments, deal with pragmas, save source locations Desk calculator language example: read A sum := A + 3.45e-3 write sum write sum / 2 Makes parsing much easier (the parsers looks for relations between tokens) Other tasks: remove comments

Formal definition of tokens A set of tokens is a set of strings over an alphabet {read, write, +, -, *, /, :=, 1, 2, …, 10, …, 3.45e-3, …} A set of tokens is a regular set that can be defined by comprehension using a regular expression For every regular set, there is a deterministic finite automaton (DFA) that can recognize it i.e. determine whether a string belongs to the set or not Scanners extract tokens from source code in the same way DFAs determine membership

Regular Expressions A regular expression (RE) is: A single character The empty string,  The concatenation of two regular expressions Notation: RE1 RE2 (i.e. RE1 followed by RE2) The union of two regular expressions Notation: RE1 | RE2 The closure of a regular expression Notation: RE* * is known as the Kleene star * represents the concatenation of 0 or more strings

Token Definition Example Numeric literals in Pascal Definition of the token unsigned_number digit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 unsigned_integer  digit digit* unsigned_number  unsigned_integer ( ( . unsigned_integer ) |  ) ( ( e ( + | – | ) unsigned_integer ) |  ) Recursion is not allowed! Notice the use of parentheses to avoid ambiguity

Scanning Pascal scanner Pseudo-code Accept the longest possible token (keyword while, p. 5) One character at a time (but we may need look ahead e.g. dotdot for Pascal’s subranges)

DFAs Scanners are deterministic finite automata (DFAs) With some hacks Simplified (keywords and identifiers) Lookup problems (but we may need look ahead e.g. dotdot for Pascal’s subranges)

Difficulties Keywords and variable names Look-ahead Pascal’s ranges [1..10] FORTRAN’s example DO 5 I=1,25 => Loop 25 times up to label 5 DO 5 I=1.25 => Assign 1.25 to DO5I NASA’s Mariner 1 (apocryphal?) Pragmas: significant comments Compiler options Run-time check (array out of bounds) Code improvements Profiling Hints for the compiler

Outline of the Scanner

Scanner Generators Scanners generators: E.g. lex, flex These programs take regular expressions as their input and return a program (i.e. a scanner) that can extract tokens from a stream of characters Automatic generation of scanners

Table-driven scanner Lexical errors

Scanners and String Processing Scanning is a common task in programming String processing E.g. reading configuration files, processing log files,… StringTokenizer and StreamTokenizer in Java Regular expressions in Perl and other scripting languages