Lecture 4: Lexical Analysis & Chomsky Hierarchy

Presentation transcript:

Lecture 4: Lexical Analysis & Chomsky Hierarchy (Revised based on Tucker’s slides) 12/10/2018 Lecture 4: Lexical Analysis & Chomsky Grammar

Revisit Expression Grammar
Consider the following grammar for Assignment:
  Assignment -> ID '=' Exp
  Exp -> Exp + Term | Term
  Term -> Term * Integer | Integer | ID
  Integer -> 0 | 1 | … | 9 | 0 Integer | 1 Integer | … | 9 Integer
  ID -> a | b | … | z | a ID | b ID | … | z ID
Exercise: build a parse tree for abc = x + y
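As a rough sketch of the parse-tree exercise, the Python below builds a tree for an assignment under the grammar above. Several things here are assumptions rather than part of the slides: the left-recursive rules for Exp and Term are handled with loops (a common rewrite for hand-written parsers), the Integer and ID rules are collapsed into regular expressions so each lexeme appears as a single leaf, and the function names `tokenize`, `parse_assignment`, `parse_exp`, and `parse_term` are hypothetical.

```python
import re

# Lexical rules (ID, Integer) collapsed into regular expressions; an assumption.
TOKEN = re.compile(r"\s*(?:(?P<ID>[a-z]+)|(?P<Integer>[0-9]+)|(?P<Op>[=+*]))")

def tokenize(text):
    """Split the input into (kind, lexeme) pairs; raise on anything else."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

def parse_term(tokens):
    """Term -> Term * Integer | Integer | ID (left recursion unrolled into a loop)."""
    if not tokens or tokens[0][0] not in ("ID", "Integer"):
        raise SyntaxError("expected Integer or ID")
    tree, rest = ("Term", tokens[0]), tokens[1:]
    while rest and rest[0] == ("Op", "*"):
        if len(rest) < 2 or rest[1][0] != "Integer":
            raise SyntaxError("expected Integer after '*'")
        tree, rest = ("Term", tree, "*", rest[1]), rest[2:]
    return tree, rest

def parse_exp(tokens):
    """Exp -> Exp + Term | Term (left recursion unrolled into a loop)."""
    term, rest = parse_term(tokens)
    tree = ("Exp", term)
    while rest and rest[0] == ("Op", "+"):
        right, rest = parse_term(rest[1:])
        tree = ("Exp", tree, "+", right)
    return tree, rest

def parse_assignment(tokens):
    """Assignment -> ID '=' Exp"""
    if len(tokens) < 3 or tokens[0][0] != "ID" or tokens[1] != ("Op", "="):
        raise SyntaxError("expected ID '=' Exp")
    tree, rest = parse_exp(tokens[2:])
    if rest:
        raise SyntaxError(f"unexpected token {rest[0][1]!r}")
    return ("Assignment", tokens[0], "=", tree)

print(parse_assignment(tokenize("abc = x + y")))
```

Each nested tuple is one interior node of the tree, labelled with the nonterminal that was expanded; the loops keep the trees left-leaning, which matches the left-recursive rules.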

Levels of Syntax
  Lexical syntax = all the basic symbols of the language (names, values, operators, etc.)
  Concrete syntax = rules for writing expressions, statements, and programs.
  Abstract syntax = internal representation of the program, favoring content over form.

Expression Grammar: Concrete and Lexical Syntax
For the following grammar:
  Concrete syntax:
    Assignment -> ID '=' Exp
    Exp -> Exp + Term | Term
    Term -> Term * Integer | Integer | ID
  Lexical syntax:
    Integer -> 0 | 1 | … | 9 | 0 Integer | 1 Integer | … | 9 Integer
    ID -> a | b | … | z | a ID | b ID | … | z ID

Regular Grammar
  Simplest; least powerful
  Concentrates on the lexical syntax
  Right regular grammar (ω ∈ T*, B ∈ N, a ∈ T):
    A → ωB
    A → ε (ε is the empty string)
    A → a

Regular Grammar (continued)
  Left regular grammar (ω ∈ T*, B ∈ N, a ∈ T):
    A → Bω
    A → ε
    A → a
  A regular grammar is either a left regular grammar or a right regular grammar.
  Consider the following grammar:
    S → aA
    A → Sb
    S → ε
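The grammar above mixes a right-linear rule (S → aA) with a left-linear rule (A → Sb), so it is neither a left regular nor a right regular grammar. The sketch below enumerates what it derives; the function name `derive` and the step bound are illustrative choices, not part of the slides.

```python
def derive(max_steps):
    """Return the terminal strings derivable from S in at most max_steps rule applications."""
    rules = {"S": ["aA", ""], "A": ["Sb"]}   # S -> aA | ε,  A -> Sb
    results, frontier = set(), {"S"}
    for _ in range(max_steps):
        next_frontier = set()
        for form in frontier:
            # Every sentential form kept in the frontier holds exactly one nonterminal.
            idx = next(i for i, ch in enumerate(form) if ch in rules)
            for rhs in rules[form[idx]]:
                new = form[:idx] + rhs + form[idx + 1:]
                if any(ch in rules for ch in new):
                    next_frontier.add(new)
                else:
                    results.add(new)
        frontier = next_frontier
    return sorted(results, key=len)

print(derive(7))
```

Every derived string has the form aⁿ bⁿ, a language no regular grammar can generate, which is why mixing the two rule shapes is not allowed.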

Regular Grammars
  Equivalent to:
    Regular expression
    Finite-state automaton
  Used in construction of tokenizers
  Less powerful than context-free grammars
  Not regular languages: { aⁿ bⁿ | n ≥ 1 } and { aᵐ bⁿ | 1 ≤ m ≤ n }
    i.e., cannot balance: ( ), { }, begin end

Compilers & Interpreters
  Source Program → Lexical Analyzer → Tokens → Syntactic Analyzer → Abstract Syntax → Semantic Analyzer → Intermediate Code (IC) → Code Optimizer → Intermediate Code (IC) → Code Generator → Machine Code
  Syntactic Analyzer: find syntax errors
  Semantic Analyzer: find semantic errors

Lexical Analysis
  Purpose: transform program representation
  Input: printable ASCII characters
  Output: tokens (terminals T)
  Discard: whitespace, comments
  Defn: A token is a logically cohesive sequence of characters representing a single symbol.
  A token corresponds to a terminal symbol in the CFG.

Example Tokens
  Identifiers
  Literals: 123, 5.67, 'x', true
  Keywords: bool char ...
  Operators: + - * / ...
  Punctuation: ; , ( ) { }

Other Sequences
  Whitespace: space tab
  Comments: // any-char* end-of-line
  End-of-line
  End-of-file
  All of the above can be defined by a context-free grammar.

Regular Expressions
  RegExpr     Meaning
  x           a character x
  \x          an escaped character, e.g., \n
  { name }    a reference to a name
  M | N       M or N
  M N         M followed by N
  M*          zero or more occurrences of M

Regular Expressions (continued)
  RegExpr     Meaning
  M+          one or more occurrences of M
  M?          zero or one occurrence of M
  [aeiou]     the set of vowels
  [0-9]       the set of digits
  .           any single character
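Most of this notation carries over to Python's `re` module, which makes it easy to experiment. The sketch below is only an illustration: `{ name }` references are a lex-style feature that `re` does not have (braces mean repetition counts there), so a hypothetical helper `expand` substitutes named definitions with `str.format`, and the definition names are chosen for the example. Note also that `.` in Python does not match a newline by default.

```python
import re

# (pattern, text, expected result of a full match)
CHECKS = [
    (r"a|b", "b", True),               # M | N
    (r"ab", "ab", True),               # M N
    (r"(ab)*", "", True),              # M*: zero occurrences allowed
    (r"(ab)+", "", False),             # M+: at least one occurrence
    (r"colou?r", "color", True),       # M?
    (r"[aeiou][0-9].", "e7!", True),   # vowel, digit, any single character
    (r"\n", "\n", True),               # escaped character
]
for pattern, text, expected in CHECKS:
    assert (re.fullmatch(pattern, text) is not None) == expected

def expand(pattern, definitions):
    """Replace each {name} reference by its definition (fails on literal braces)."""
    return pattern.format(**definitions)

DEFINITIONS = {"Letter": "[a-zA-Z]", "Digit": "[0-9]"}
IDENTIFIER = expand("{Letter}({Letter}|{Digit})*", DEFINITIONS)
print(IDENTIFIER)
print(bool(re.fullmatch(IDENTIFIER, "x25")))
```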

Clite Lexical Syntax
  Category      Definition
  anyChar       [ -~]
  Letter        [a-zA-Z]
  Digit         [0-9]
  Whitespace    [ \t]
  Eol           \n
  Eof           \004

Clite Lexical Syntax (continued)
  Category      Definition
  Identifier    {Letter}({Letter} | {Digit})*
  integerLit    {Digit}+
  floatLit      {Digit}+\.{Digit}+
  charLit       '{anyChar}'

Clite Lexical Syntax (continued)
  Category      Definition
  Operator      = | || | && | == | != | < | <= | > | >= | + | - | * | / | ! | [ | ]
  Separator     ; | , | { | } | ( | )
  Comment       // ({anyChar} | {Whitespace})* {Eol}
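The category tables can be turned into a small tokenizer almost mechanically. The following is a minimal sketch under stated assumptions, not the Clite reference lexer: it uses Python's `re` with one named group per category, takes the separator set from the Example Tokens slide (; , ( ) { }), leaves the comment's closing end-of-line to a separate Eol rule, and does not distinguish keywords from identifiers. The names `TOKEN_SPEC`, `MASTER`, `DISCARD`, and `tokenize` are hypothetical.

```python
import re

# Order matters: the first alternative that matches at a position wins.
TOKEN_SPEC = [
    ("Comment",    r"//[ -~\t]*"),
    ("floatLit",   r"[0-9]+\.[0-9]+"),
    ("integerLit", r"[0-9]+"),
    ("charLit",    r"'[ -~]'"),
    ("Identifier", r"[a-zA-Z][a-zA-Z0-9]*"),
    ("Operator",   r"\|\||&&|==|!=|<=|>=|[=<>+\-*/!\[\]]"),
    ("Separator",  r"[;,{}()]"),
    ("Whitespace", r"[ \t]+"),
    ("Eol",        r"\n"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))
DISCARD = {"Comment", "Whitespace", "Eol"}

def tokenize(source):
    """Yield (category, lexeme) pairs, discarding whitespace and comments."""
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"illegal character {source[pos]!r} at offset {pos}")
        if m.lastgroup not in DISCARD:
            yield m.lastgroup, m.group()
        pos = m.end()

for token in tokenize("x1 = 3.14 + 'c'; // done\n"):
    print(token)
```

Putting floatLit before integerLit and the two-character operators before the one-character ones is what gives the longest match in this simple first-match scheme; a generated lexer would apply a longest-match rule instead.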

Generators
  Input: usually regular expression
  Output: table (slow), code
  C/C++: Lex, Flex
  Java: JLex

Chomsky Hierarchy
  Regular grammar -- least powerful
  Context-free grammar (BNF)
  Context-sensitive grammar
  Unrestricted grammar

Context-free Grammars
  BNF is a stylized form of CFG
  Equivalent to a pushdown automaton
  For a wide class of unambiguous CFGs, there are table-driven, linear-time parsers
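The stack is what a pushdown automaton has and a finite-state automaton lacks. As an informal illustration (a hand-written stand-in, not a formal PDA construction), the sketch below uses a stack to check the kind of bracket balancing that regular grammars cannot express; the function name `balanced` is hypothetical.

```python
def balanced(text):
    """Return True if ( ) and { } in text are properly nested; other characters are ignored."""
    closers = {")": "(", "}": "{"}
    stack = []
    for ch in text:
        if ch in "({":
            stack.append(ch)              # remember the pending opener
        elif ch in closers:
            if not stack or stack.pop() != closers[ch]:
                return False              # closer without a matching opener
    return not stack                      # leftover openers mean imbalance

print(balanced("{ f(a, (b)) }"))
print(balanced("(}"))
```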

Context-Sensitive Grammars
  Production: α → β, with |α| ≤ |β| and α, β ∈ (N ∪ T)*
  i.e., the left-hand side can be composed of strings of terminals and nonterminals
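To make the |α| ≤ |β| condition concrete, here is a sketch built on a classic textbook grammar for { aⁿ bⁿ cⁿ | n ≥ 1 }. The grammar is not taken from the slides, and the names `RULES` and `generate` are illustrative. Because no rule shrinks a sentential form, a breadth-first search bounded by length is guaranteed to terminate.

```python
from collections import deque

# Noncontracting rules (|lhs| <= |rhs|); uppercase letters are nonterminals.
RULES = [("S", "abc"), ("S", "aSBc"), ("cB", "Bc"), ("bB", "bb")]

def generate(max_len):
    """Breadth-first search over sentential forms no longer than max_len."""
    seen, queue, words = {"S"}, deque(["S"]), set()
    while queue:
        form = queue.popleft()
        if form.islower():                # only terminals left
            words.add(form)
            continue
        for lhs, rhs in RULES:
            start = form.find(lhs)
            while start != -1:            # apply the rule at every position
                new = form[:start] + rhs + form[start + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                start = form.find(lhs, start + 1)
    return sorted(words, key=len)

print(generate(9))
```

The rules cB → Bc and bB → bb have more than one symbol on the left-hand side, which is exactly what a context-free grammar does not permit.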

Regular Expression Exercise
Describe the languages denoted by the following REs:
  0(0|1)*0
  ((ε|0)1*)*
  (0|1)*0(0|1)(0|1)
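One way to build intuition before describing each language is to list its short members. The sketch below does this with Python's `re`; it is an exploration aid, not a solution, and it assumes the second expression's ε can be written as an empty alternative. The names `EXPRESSIONS` and `members` are hypothetical.

```python
import re
from itertools import product

# The three expressions from the exercise; ε is written as an empty alternative.
EXPRESSIONS = [r"0(0|1)*0", r"((|0)1*)*", r"(0|1)*0(0|1)(0|1)"]

def members(pattern, max_len):
    """List the binary strings of length <= max_len that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [
        "".join(bits)
        for n in range(max_len + 1)
        for bits in product("01", repeat=n)
        if compiled.fullmatch("".join(bits))
    ]

for pattern in EXPRESSIONS:
    print(pattern, members(pattern, 4))
```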

Regular Expression Exercise
Consider a small language using only the letters “z” and “o” and the slash character “/”. A comment in this language starts with “/o” and ends after the very next “o/”. Comments do not nest. (The regular-expression notations that can be used include A|B, AB, A*, and A+.)
  Valid: /o/zzzz/oo/, /ozz/oz////o/
  Invalid: /o/, /ozzzooo/zzzo/

Regular Expression Exercise
Consider a small language using only the letters “z” and “o” and the slash character “/”. A comment in this language starts with “/o” and ends after the very next “o/”. Comments do not nest. (The regular-expression notations that can be used include A|B, AB, A*, and A+.)
Candidate regular expressions:
  /o(o*z|/)*o+/
  /o(o|z|/)*o/
  /o/*(o*z/*)*o+/
  /o(/|oz|oo)*o+/
  /o(/*o*z)*/*o+/
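A quick way to screen the candidates is to run them against the sample comments from the previous slide. The sketch below does that with Python's `re`; the names `CANDIDATES`, `VALID`, `INVALID`, and `consistent` are illustrative. Agreeing with four samples is a necessary condition only, so a candidate that passes still needs an argument that it never allows an earlier “o/” inside the comment.

```python
import re

CANDIDATES = [
    r"/o(o*z|/)*o+/",
    r"/o(o|z|/)*o/",
    r"/o/*(o*z/*)*o+/",
    r"/o(/|oz|oo)*o+/",
    r"/o(/*o*z)*/*o+/",
]
VALID = ["/o/zzzz/oo/", "/ozz/oz////o/"]
INVALID = ["/o/", "/ozzzooo/zzzo/"]

def consistent(pattern):
    """True if the pattern accepts every VALID sample and rejects every INVALID one."""
    def accepts(s):
        return re.fullmatch(pattern, s) is not None
    return all(accepts(s) for s in VALID) and not any(accepts(s) for s in INVALID)

for pattern in CANDIDATES:
    print(pattern, consistent(pattern))
```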

Regular Expression Exercise
Give a regular expression for the strings of 0’s and 1’s that satisfy each of the following conditions:
  all binary strings except the empty string
  contains at least three 1s
  does not contain the substring 110
  length is at least 1 and at most 3
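Answers to exercises like these can be checked by brute force: compare a proposed expression with a direct test of the condition on every short binary string. The expressions below are one possible set of answers, offered as a sketch rather than the intended solutions; the names `CASES` and `counterexample` are hypothetical, and agreement up to length 8 is evidence, not proof.

```python
import re
from itertools import product

# Proposed expression paired with a direct predicate for the same condition.
CASES = {
    "non-empty":              (r"(0|1)+",                      lambda s: len(s) >= 1),
    "at least three 1s":      (r"(0|1)*1(0|1)*1(0|1)*1(0|1)*", lambda s: s.count("1") >= 3),
    "no substring 110":       (r"(0|10)*1*",                   lambda s: "110" not in s),
    "length between 1 and 3": (r"(0|1)(0|1)?(0|1)?",           lambda s: 1 <= len(s) <= 3),
}

def counterexample(pattern, predicate, max_len=8):
    """Return a binary string on which pattern and predicate disagree, or None."""
    compiled = re.compile(pattern)
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            s = "".join(bits)
            if (compiled.fullmatch(s) is not None) != predicate(s):
                return s
    return None

for name, (pattern, predicate) in CASES.items():
    print(name, pattern, counterexample(pattern, predicate))
```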