Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 3 Lexical and Syntactic Analysis Syntactic.

Slides:



Advertisements
Similar presentations
4b Lexical analysis Finite Automata
Advertisements

Lexical Analysis Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!
From Cooper & Torczon1 The Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source language?
CSc 453 Lexical Analysis (Scanning)
Lexical Analysis (4.2) Programming Languages Hiram College Ellen Walker.
Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 2 Syntax A language that is simple to parse.
Winter 2007SEG2101 Chapter 81 Chapter 8 Lexical Analysis.
Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 3 Lexical and Syntactic Analysis Syntactic.
1 Foundations of Software Design Lecture 24: Compilers, Lexers, and Parsers; Intro to Graphs Marti Hearst Fall 2002.
Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 2 Syntax A language that is simple to parse.
1 CMPSC 160 Translation of Programming Languages Fall 2002 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Lecture-Module #4 Lexical.
Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 2 Syntax A language that is simple to parse.
UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering CSCE 330 Programming Language Structures Chapter 3: Lexical and Syntactic Analysis.
Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 3 Lexical and Syntactic Analysis Syntactic.
Finite Automata Chapter 5. Formal Language Definitions Why need formal definitions of language –Define a precise, unambiguous and uniform interpretation.
Automata and Regular Expression Discrete Mathematics and Its Applications Baojian Hua
Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 3 Lexical and Syntactic Analysis Syntactic.
1 Scanning Aaron Bloomfield CS 415 Fall Parsing & Scanning In real compilers the recognizer is split into two phases –Scanner: translate input.
Lee CSCE 314 TAMU 1 CSCE 314 Programming Languages Syntactic Analysis Dr. Hyunyoung Lee.
Topic #3: Lexical Analysis
Building lexical and syntactic analyzers
CPSC 388 – Compiler Design and Construction Scanners – Finite State Automata.
Chapter 1 Introduction Dr. Frank Lee. 1.1 Why Study Compiler? To write more efficient code in a high-level language To provide solid foundation in parsing.
1 Chapter 3 Scanning – Theory and Practice. 2 Overview Formal notations for specifying the precise structure of tokens are necessary –Quoted string in.
1 Outline Informal sketch of lexical analysis –Identifies tokens in input string Issues in lexical analysis –Lookahead –Ambiguities Specifying lexers –Regular.
Lexical Analysis - An Introduction. The Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source.
Lexical Analyzer (Checker)
4b 4b Lexical analysis Finite Automata. Finite Automata (FA) FA also called Finite State Machine (FSM) –Abstract model of a computing entity. –Decides.
SCRIBE SUBMISSION GROUP 8 Date: 7/8/2013 By – IKHAR SUSHRUT MEGHSHYAM 11CS10017 Lexical Analyser Constructing Tokens State-Transition Diagram S-T Diagrams.
D. M. Akbar Hussain: Department of Software & Media Technology 1 Compiler is tool: which translate notations from one system to another, usually from source.
COP 4620 / 5625 Programming Language Translation / Compiler Writing Fall 2003 Lecture 3, 09/11/2003 Prof. Roy Levow.
CS412/413 Introduction to Compilers Radu Rugina Lecture 4: Lexical Analyzers 28 Jan 02.
Dr. Philip Cannata 1 Lexical and Syntactic Analysis Chomsky Grammar Hierarchy Lexical Analysis – Tokenizing Syntactic Analysis – Parsing Hmm Concrete Syntax.
Group 4 Java Compiler Group Members: Atul Singh(Y6127) Manish Agrawal(Y6241) Mayank Sachan(Y6253) Sudeept Sinha(Y6483)
1 Languages and Compilers (SProg og Oversættere) Lexical analysis.
Lexical Analysis: Finite Automata CS 471 September 5, 2007.
Syntax and Semantics Structure of programming languages.
IN LINE FUNCTION AND MACRO Macro is processed at precompilation time. An Inline function is processed at compilation time. Example : let us consider this.
CSc 453 Lexical Analysis (Scanning)
Joey Paquet, 2000, Lecture 2 Lexical Analysis.
Overview of Previous Lesson(s) Over View  Symbol tables are data structures that are used by compilers to hold information about source-program constructs.
Syntax (2).
The Role of Lexical Analyzer
Lexical Analysis (Scanning) Lexical Analysis (Scanning)
CSC3315 (Spring 2009)1 CSC 3315 Lexical and Syntax Analysis Hamid Harroud School of Science and Engineering, Akhawayn University
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 2: Lexical Analysis.
Overview of Compilation Prepared by Manuel E. Bermúdez, Ph.D. Associate Professor University of Florida Programming Language Principles Lecture 2.
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 1 Ahmed Ezzat.
LECTURE 5 Scanning. SYNTAX ANALYSIS We know from our previous lectures that the process of verifying the syntax of the program is performed in two stages:
Lecture 2 Compiler Design Lexical Analysis By lecturer Noor Dhia
Department of Software & Media Technology
Programming Languages 2nd edition Tucker and Noonan
Lecture 2 Lexical Analysis
Chapter 3 Lexical Analysis.
Chapter 2 Scanning – Part 1 June 10, 2018 Prof. Abdelaziz Khamis.
CSc 453 Lexical Analysis (Scanning)
Finite-State Machines (FSMs)
Compilers Welcome to a journey to CS419 Lecture5: Lexical Analysis:
CSc 453 Lexical Analysis (Scanning)
Finite-State Machines (FSMs)
Lexical analysis Jakub Yaghob
Two issues in lexical analysis
Programming Languages 2nd edition Tucker and Noonan
Introduction to Java Programming
Lecture 4: Lexical Analysis & Chomsky Hierarchy
4b Lexical analysis Finite Automata
Programming Languages 2nd edition Tucker and Noonan
4b Lexical analysis Finite Automata
Syntactic sugar causes cancer of the semicolon.
CSc 453 Lexical Analysis (Scanning)
Presentation transcript:

Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 3 Lexical and Syntactic Analysis Syntactic sugar causes cancer of the semicolon. A. Perlis

Copyright © 2006 The McGraw-Hill Companies, Inc. Contents 3.1 Chomsky Hierarchy 3.2 Lexical Analysis 3.3 Syntactic Analysis

Copyright © 2006 The McGraw-Hill Companies, Inc. Lexical Analysis Purpose: transform program representation Input: printable Ascii characters Output: tokens Discard: whitespace, comments Defn: A token is a logically cohesive sequence of characters representing a single symbol.

Copyright © 2006 The McGraw-Hill Companies, Inc. Example Tokens Identifiers Literals: 123, 5.67, 'x', true Keywords: bool char... Operators: + - * /... Punctuation: ;, ( ) { }

Copyright © 2006 The McGraw-Hill Companies, Inc. Other Sequences Whitespace: space tab Comments // any-char* end-of-line End-of-line End-of-file

Copyright © 2006 The McGraw-Hill Companies, Inc. Why a Separate Phase? Simpler, faster machine model than parser 75% of time spent in lexer for non-optimizing compiler Differences in character sets End of line convention differs

Copyright © 2006 The McGraw-Hill Companies, Inc. Regular Expressions RegExprMeaning xa character x \xan escaped character, e.g., \n { name }a reference to a name M | NM or N M NM followed by N M*zero or more occurrences of M

Copyright © 2006 The McGraw-Hill Companies, Inc. RegExprMeaning M+One or more occurrences of M M?Zero or one occurrence of M [aeiou]the set of vowels [0-9]the set of digits.Any single character

Copyright © 2006 The McGraw-Hill Companies, Inc. Clite Lexical Syntax CategoryDefinition anyChar[ -~] Letter[a-zA-Z] Digit[0-9] Whitespace[ \t] Eol\n Eof\004

Copyright © 2006 The McGraw-Hill Companies, Inc. CategoryDefinition Keywordbool | char | else | false | float | if | int | main | true | while Identifier{Letter}({Letter} | {Digit})* integerLit{Digit}+ floatLit{Digit}+\.{Digit}+ charLit‘{anyChar}’

Copyright © 2006 The McGraw-Hill Companies, Inc. CategoryDefinition Operator = | || | && | == | != | | >= | + | - | * | / | ! | [ | ] Separator : |. | { | } | ( | ) Comment // ({anyChar} | {Whitespace})* {eol}

Copyright © 2006 The McGraw-Hill Companies, Inc. Generators Input: usually regular expression Output: table (slow), code C/C++: Lex, Flex Java: JLex

Copyright © 2006 The McGraw-Hill Companies, Inc. Finite State Automata Set of states: representation – graph nodes Input alphabet + unique end symbol State transition function Labelled (using alphabet) arcs in graph Unique start state One or more final states

Copyright © 2006 The McGraw-Hill Companies, Inc. Deterministic FSA Defn: A finite state automaton is deterministic if for each state and each input symbol, there is at most one outgoing arc from the state labeled with the input symbol.

Copyright © 2006 The McGraw-Hill Companies, Inc. A Finite State Automaton for Identifiers

Copyright © 2006 The McGraw-Hill Companies, Inc. Definitions A configuration on an fsa consists of a state and the remaining input. A move consists of traversing the arc exiting the state that corresponds to the leftmost input symbol, thereby consuming it. If no such arc, then: –If no input and state is final, then accept. –Otherwise, error.

Copyright © 2006 The McGraw-Hill Companies, Inc. An input is accepted if, starting with the start state, the automaton consumes all the input and halts in a final state.

Copyright © 2006 The McGraw-Hill Companies, Inc. Example (S, a2i$) ├ (I, 2i$) ├ (I, i$) ├ (I, $) ├ (F, ) Thus: (S, a2i$) ├* (F, )

Copyright © 2006 The McGraw-Hill Companies, Inc. Some Conventions Explicit terminator used only for program as a whole, not each token. An unlabeled arc represents any other valid input symbol. Recognition of a token ends in a final state. Recognition of a non-token transitions back to start state.

Copyright © 2006 The McGraw-Hill Companies, Inc. Recognition of end symbol (end of file) ends in a final state. Automaton must be deterministic. –Drop keywords; handle separately. –Must consider all sequences with a common prefix together.

Copyright © 2006 The McGraw-Hill Companies, Inc.

Lexer Code Parser calls lexer whenever it needs a new token. Lexer must remember where it left off. Greedy consumption goes 1 character too far –peek function –pushback function –no symbol consumed by start state

Copyright © 2006 The McGraw-Hill Companies, Inc. From Design to Code private char ch = ‘ ‘; public Token next ( ) { do { switch (ch) {... } } while (true); }

Copyright © 2006 The McGraw-Hill Companies, Inc. Remarks Loop only exited when a token is found Loop exited via a return statement. Variable ch must be global. Initialized to a space character. Exact nature of a Token irrelevant to design.

Copyright © 2006 The McGraw-Hill Companies, Inc. Translation Rules Traversing an arc from A to B: –If labeled with x: test ch == x –If unlabeled: else/default part of if/switch. If only arc, no test need be performed. –Get next character if A is not start state

Copyright © 2006 The McGraw-Hill Companies, Inc. A node with an arc to itself is a do-while. –Condition corresponds to whichever arc is labeled.

Copyright © 2006 The McGraw-Hill Companies, Inc. Otherwise the move is translated to a if/switch: –Each arc is a separate case. –Unlabeled arc is default case. A sequence of transitions becomes a sequence of translated statements.

Copyright © 2006 The McGraw-Hill Companies, Inc. A complex diagram is translated by boxing its components so that each box is one node. –Translate each box using an outside-in strategy.

Copyright © 2006 The McGraw-Hill Companies, Inc. private boolean isLetter(char c) { return ch >= ‘a’ && ch <= ‘z’ || ch >= ‘A’ && ch <= ‘Z’; }

Copyright © 2006 The McGraw-Hill Companies, Inc. private String concat(String set) { StringBuffer r = new StringBuffer(“”); do { r.append(ch); ch = nextChar( ); } while (set.indexOf(ch) >= 0); return r.toString( ); }

Copyright © 2006 The McGraw-Hill Companies, Inc. public Token next( ) { do { if (isLetter(ch) { // ident or keyword String spelling = concat(letters+digits); return Token.keyword(spelling); } else if (isDigit(ch)) { // int or float literal String number = concat(digits); if (ch != ‘.’) return Token.mkIntLiteral(number); number += concat(digits); return Token.mkFloatLiteral(number);

Copyright © 2006 The McGraw-Hill Companies, Inc. } else switch (ch) { case ‘ ‘: case ‘\t’: case ‘\r’: case eolnCh: ch = nextCh( ); break; case eofCh: return Token.eofTok; case ‘+’: ch = nextChar( ); return Token.plusTok; … case ‘&’: check(‘&’); return Token.andTok; case ‘=‘: return chkOpt(‘=‘, Token.assignTok, Token.eqeqTok);

Copyright © 2006 The McGraw-Hill Companies, Inc. Source Tokens // a first program // with 2 comments int main ( ) { char c; int i; c = 'h'; i = c + 3; } // main int main ( ) { char Identifierc ;