Context-free grammars (CFGs)

Slides:



Advertisements
Similar presentations
C O N T E X T - F R E E LANGUAGES ( use a grammar to describe a language) 1.
Advertisements

Lecture # 8 Chapter # 4: Syntax Analysis. Practice Context Free Grammars a) CFG generating alternating sequence of 0’s and 1’s b) CFG in which no consecutive.
Translator Architecture Code Generator ParserTokenizer string of characters (source code) string of tokens abstract program string of integers (object.
Context-Free Grammars Lecture 7
1 Foundations of Software Design Lecture 23: Finite Automata and Context-Free Grammars Marti Hearst Fall 2002.
1 CMPSC 160 Translation of Programming Languages Fall 2002 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Lecture-Module #5 Introduction.
CPSC Compiler Tutorial 3 Parser. Parsing The syntax of most programming languages can be specified by a Context-free Grammar (CGF) Parsing: Given.
2.2 A Simple Syntax-Directed Translator Syntax-Directed Translation 2.4 Parsing 2.5 A Translator for Simple Expressions 2.6 Lexical Analysis.
1 Introduction to Parsing Lecture 5. 2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations.
CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars.
Grammars CPSC 5135.
Lecture # 9 Chap 4: Ambiguous Grammar. 2 Chomsky Hierarchy: Language Classification A grammar G is said to be – Regular if it is right linear where each.
Chapter 2. Design of a Simple Compiler J. H. Wang Sep. 21, 2015.
Introduction to Parsing
Context-free grammars. Roadmap Last time – Regex == DFA – JLex for generating Lexers This time – CFGs, the underlying abstraction for Parsers.
Chapter 3 Context-Free Grammars and Parsing. The Parsing Process sequence of tokens syntax tree parser Duties of parser: Determine correct syntax Build.
1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.
Unit-3 Parsing Theory (Syntax Analyzer) PREPARED BY: PROF. HARISH I RATHOD COMPUTER ENGINEERING DEPARTMENT GUJARAT POWER ENGINEERING & RESEARCH INSTITUTE.
Syntax Analyzer (Parser)
CSC312 Automata Theory Lecture # 26 Chapter # 12 by Cohen Context Free Grammars.
Context-free grammars. Roadmap Last time – Regex == DFA – JLex for generating Lexers This time – CFGs, the underlying abstraction for Parsers.
1 Introduction to Parsing. 2 Outline l Regular languages revisited l Parser overview Context-free grammars (CFG ’ s) l Derivations.
Introduction to Parsing
Describing Syntax and Semantics Chapter 3: Describing Syntax and Semantics Lectures # 6.
CS 3304 Comparative Languages
Chapter 3 – Describing Syntax
Describing Syntax and Semantics
lec02-parserCFG May 8, 2018 Syntax Analyzer
Describing Syntax and Semantics
A Simple Syntax-Directed Translator
Parsing & Context-Free Grammars
Context-Free Grammars: an overview
Programming Languages Translator
CS510 Compiler Lecture 4.
Introduction to Parsing (adapted from CS 164 at Berkeley)
Chapter 3 – Describing Syntax
Syntax Specification and Analysis
What does it mean? Notes from Robert Sebesta Programming Languages
Linear Bounded Automata LBAs
Automata and Languages What do these have in common?
Syntax versus Semantics
CS 363 Comparative Programming Languages
Syntax Analysis Sections :.
Parsing Techniques.
CSE 3302 Programming Languages
Parsing & Context-Free Grammars Hal Perkins Autumn 2011
(Slides copied liberally from Ruth Anderson, Hal Perkins and others)
Lexical Analysis & Syntactic Analysis
Programming Language Syntax 2
Chapter 2: A Simple One Pass Compiler
Lecture 7: Introduction to Parsing (Syntax Analysis)
CHAPTER 2 Context-Free Languages
R.Rajkumar Asst.Professor CSE
CS 3304 Comparative Languages
Lecture 4: Lexical Analysis & Chomsky Hierarchy
Syntax-Directed Translation
CS 3304 Comparative Languages
Introduction to Parsing
Introduction to Parsing
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Teori Bahasa dan Automata Lecture 9: Contex-Free Grammars
Chapter 3 Describing Syntax and Semantics.
BNF 9-Apr-19.
High-Level Programming Language
Parsing & Context-Free Grammars Hal Perkins Summer 2004
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
lec02-parserCFG May 27, 2019 Syntax Analyzer
Parsing & Context-Free Grammars Hal Perkins Autumn 2005
COMPILER CONSTRUCTION
Faculty of Computer Science and Information System
Presentation transcript:

Context-free grammars (CFGs)

regexp : JLex :: CFG : Java CUP Roadmap Last time RegExp == DFA Jlex: a tool for generating (Java code for) a lexer/scanner Mainly a collection of 〈regexp, action〉 pairs This time CFGs, the underlying abstraction for parsers Next week Java CUP: a tool for generating (Java code for) a parser Mainly a collection of 〈CFG-rule, action〉 pairs regexp : JLex :: CFG : Java CUP

RegExps Are Great! Perfect for tokenizing a language However, they have some limitations Can only define a limited family of languages Cannot use a RegExp to specify all the programming constructs we need No notion of structure Let’s explore both of these issues

Limitations of RegExps Cannot handle “matching” E.g., language of balanced parentheses L() = { (n )n where n > 0} No DFA exists for this language Intuition: A given FSM only has a fixed, finite amount of memory For an FSM, memory = the states With a fixed, finite amount of memory, how could an FSM remember how many “(“ characters it has seen?

Theorem: No RegExp/DFA can describe the language L() Proof by contradiction: Suppose that there exists a DFA A for L() and A has N states A has to accept the string (N )N with some path q0q1…qN…q2N+1 By the pigeonhole principle some state has to repeat: qi = qj for some i<j<N Therefore the run q0q1…qiqj+1…qN…q2N+1 is also accepting A accepts the string (N-(j-i) )N∉L(), which is a contradiction!

Limitations of RegExps: No Structure Our Enhanced-RegExp scanner can emit a stream of tokens: X = Y + Z … but this doesn’t really enforce any order of operations ID ASSIGN ID PLUS ID

The Chomsky Hierarchy LANGUAGE CLASS: power efficiency Recursively enumerable Context-Sensitive Turing machine Context-Free Happy medium? Regular FSM Noam Chomsky

Context Free Grammars (CFGs) A set of (recursive) rewriting rules to generate patterns of strings Can envision a “parse tree” that keeps structure

CFG: Intuition S → ‘(‘ S ‘)’ S S ( S ) A rule that says that you can rewrite S to be an S surrounded by a single set of parenthesis Before applying rule After applying rule S S ( S )

Context Free Grammars (CFGs) A CFG is a 4-tuple (N,Σ,P,S) N is a set of non-terminals, e.g., A, B, S, … Σ is the set of terminals P is a set of production rules S∈N is the initial non-terminal symbol (“start symbol”)

Context Free Grammars (CFGs) Placeholder / interior nodes in the parse tree A CFG is a 4-tuple (N,Σ,P,S) N is a set of non-terminals, e.g., A, B, S… Σ is the set of terminals P is a set of production rules S (in N) is the initial non-terminal symbol Tokens from scanner Rules for deriving strings If not otherwise specified, use the non-terminal that appears on the LHS of the first production as the start

Production Syntax LHS → RHS Examples: S  ‘(‘ S ‘)’ S  ε Expression: Sequence of terminals and nonterminals Single nonterminal symbol Examples: S  ‘(‘ S ‘)’ S  ε

Production Shorthand Nonterm → expression Nonterm→ ε equivalently: | ε Nonterm → expression | ε S  ‘(‘ S ‘)’ S  ε S  ‘(‘ S ‘)’ | ε S  ‘(‘ S ‘)’ | ε

Derivations To derive a string: Start by setting “Current Sequence” to the start symbol Repeatedly, Find a Nonterminal X in the Current Sequence Find a production of the form X→α “Apply” the production: create a new “current sequence” in which α replaces X Stop when there are no more non-terminals This process derives a string of terminal symbols

Derivation Syntax We’ll use the symbol “⇒” for “derives” We’ll use the symbol “ + ” for “derives in one or more steps” (also written as “ ⇒ + ”) We’ll use the symbol “ ∗ ” for “derives in zero or more steps” (also written as “ ⇒ ∗ ”)

An Example Grammar

An Example Grammar Terminals begin end semicolon assign id plus

An Example Grammar For readability, bold and lowercase Terminals begin end semicolon assign id plus

An Example Grammar For readability, bold and lowercase Terminals begin end semicolon assign id plus Program boundary

An Example Grammar For readability, bold and lowercase Terminals begin end semicolon assign id plus Program boundary Represents “;” Separates statements

An Example Grammar For readability, bold and lowercase Terminals begin end semicolon assign id plus Program boundary Represents “;” Separates statements Represents “=“ in an assignment statement

An Example Grammar For readability, bold and lowercase Terminals begin end semicolon assign id plus Program boundary Represents “;” Separates statements Represents “=“ in an assignment statement Identifier / variable name

An Example Grammar For readability, bold and lowercase Terminals begin end semicolon assign id plus Program boundary Represents “;” Separates statements Represents “=“ in an assignment statement Identifier / variable name Represents “+“ operator in an expression

An Example Grammar For readability, bold and lowercase Terminals begin end semicolon assign id plus Nonterminals Prog Stmts Stmt Expr

An Example Grammar For readability, bold and lowercase Terminals begin end semicolon assign id plus For readability, Italics and UpperCamelCase Nonterminals Prog Stmts Stmt Expr

An Example Grammar For readability, bold and lowercase Terminals begin end semicolon assign id plus For readability, Italics and UpperCamelCase Nonterminals Prog Stmts Stmt Expr Root of the parse tree

An Example Grammar For readability, bold and lowercase Terminals begin end semicolon assign id plus For readability, Italics and UpperCamelCase Nonterminals Prog Stmts Stmt Expr Root of the parse tree List of statements

An Example Grammar For readability, bold and lowercase Terminals begin end semicolon assign id plus For readability, Italics and UpperCamelCase Nonterminals Prog Stmts Stmt Expr Root of the parse tree List of statements A single statement

An Example Grammar For readability, bold and lowercase Terminals begin end semicolon assign id plus For readability, Italics and UpperCamelCase Nonterminals Prog Stmts Stmt Expr Root of the parse tree List of statements A single statement A mathematical expression

An Example Grammar For readability, bold and lowercase Terminals begin end semicolon assign id plus Defines the syntax of legal programs Productions Prog → begin Stmts end Stmts → Stmts semicolon Stmt | Stmt Stmt → id assign Expr Expr → id | Expr plus id For readability, Italics and UpperCamelCase Nonterminals Prog Stmts Stmt Expr

Productions Prog → begin Stmts end Stmts → Stmts semicolon Stmt | Stmt Stmt → id assign Expr Expr → id | Expr plus id

Productions Prog → begin Stmts end Stmts → Stmts semicolon Stmt | Stmt Stmt → id assign Expr Expr → id | Expr plus id Derivation Sequence

Productions Prog → begin Stmts end Stmts → Stmts semicolon Stmt | Stmt Stmt → id assign Expr Expr → id | Expr plus id Parse Tree Derivation Sequence

Productions Prog → begin Stmts end Stmts → Stmts semicolon Stmt | Stmt Stmt → id assign Expr Expr → id | Expr plus id Parse Tree Derivation Sequence Key terminal Nonterminal Rule used

Productions Prog → begin Stmts end Stmts → Stmts semicolon Stmt | Stmt Stmt → id assign Expr Expr → id | Expr plus id Parse Tree Prog Derivation Sequence Prog Key terminal Nonterminal Rule used

Productions Prog → begin Stmts end Stmts → Stmts semicolon Stmt | Stmt Stmt → id assign Expr Expr → id | Expr plus id Parse Tree Prog Derivation Sequence Prog begin Stmts end 1 Key terminal Nonterminal Rule used

Productions Prog → begin Stmts end Stmts → Stmts semicolon Stmt | Stmt Stmt → id assign Expr Expr → id | Expr plus id Parse Tree Prog begin Stmts end Derivation Sequence Prog begin Stmts end 1 Key terminal Nonterminal Rule used

Productions Prog → begin Stmts end Stmts → Stmts semicolon Stmt | Stmt Stmt → id assign Expr Expr → id | Expr plus id Parse Tree Prog begin Stmts end Stmts semicolon Stmt Derivation Sequence Prog begin Stmts end begin Stmts semicolon Stmt end 1 2 Key terminal Nonterminal Rule used

Productions Prog → begin Stmts end Stmts → Stmts semicolon Stmt | Stmt Stmt → id assign Expr Expr → id | Expr plus id Parse Tree Prog begin Stmts end Stmts semicolon Stmt Stmt Derivation Sequence Prog begin Stmts end begin Stmts semicolon Stmt end begin Stmt semicolon Stmt end 1 2 3 Key terminal Nonterminal Rule used

Productions Prog → begin Stmts end Stmts → Stmts semicolon Stmt | Stmt Stmt → id assign Expr Expr → id | Expr plus id Parse Tree Prog begin Stmts end Stmts semicolon Stmt Stmt Derivation Sequence Prog begin Stmts end begin Stmts semicolon Stmt end begin Stmt semicolon Stmt end begin id assign Expr semicolon Stmt end id assign Expr 1 2 3 4 Key terminal Nonterminal Rule used

Productions Prog → begin Stmts end Stmts → Stmts semicolon Stmt | Stmt Stmt → id assign Expr Expr → id | Expr plus id Parse Tree Prog begin Stmts end Stmts semicolon Stmt Stmt id assign Expr Derivation Sequence Prog begin Stmts end begin Stmts semicolon Stmt end begin Stmt semicolon Stmt end begin id assign Expr semicolon Stmt end begin id assign Expr semicolon id assign Expr end id assign Expr 1 2 3 4 Key 4 terminal Nonterminal Rule used

Productions Prog → begin Stmts end Stmts → Stmts semicolon Stmt | Stmt Stmt → id assign Expr Expr → id | Expr plus id Parse Tree Prog begin Stmts end Stmts semicolon Stmt Stmt id assign Expr Derivation Sequence Prog begin Stmts end begin Stmts semicolon Stmt end begin Stmt semicolon Stmt end begin id assign Expr semicolon Stmt end begin id assign Expr semicolon id assign Expr end begin id assign id semicolon id assign Expr end id assign Expr 1 2 id 3 4 Key 4 terminal 5 Nonterminal Rule used

Productions Prog → begin Stmts end Stmts → Stmts semicolon Stmt | Stmt Stmt → id assign Expr Expr → id | Expr plus id Parse Tree Prog begin Stmts end Stmts semicolon Stmt Stmt id assign Expr Derivation Sequence Prog begin Stmts end begin Stmts semicolon Stmt end begin Stmt semicolon Stmt end begin id assign Expr semicolon Stmt end begin id assign Expr semicolon id assign Expr end begin id assign id semicolon id assign Expr end begin id assign id semicolon id assign Expr plus id end id assign Expr Expr plus id 1 2 id 3 4 Key 4 terminal 5 6 Nonterminal Rule used

Productions Prog → begin Stmts end Stmts → Stmts semicolon Stmt | Stmt Stmt → id assign Expr Expr → id | Expr plus id Parse Tree Prog begin Stmts end Stmts semicolon Stmt Stmt id assign Expr Derivation Sequence Prog begin Stmts end begin Stmts semicolon Stmt end begin Stmt semicolon Stmt end begin id assign Expr semicolon Stmt end begin id assign Expr semicolon id assign Expr end begin id assign id semicolon id assign Expr end begin id assign id semicolon id assign Expr plus id end begin id assign id semicolon id assign id plus id end id assign Expr Expr plus id 1 2 id id 3 4 Key 4 terminal 5 6 Nonterminal 5 Rule used

A five minute introduction Makefile

Makefiles: Motivation Typing the series of commands to generate our code can be tedious Multiple steps that depend on each other Somewhat complicated commands May not need to rebuild everything Makefiles solve these issues Record a series of commands in a script-like DSL Specify dependency rules and Make generates the results

Makefiles: Basic Structure <target>: <dependency list> <command to satisfy target> Example Example.class: Example.java IO.class javac Example.java IO.class: IO.java javac IO.java (tab)

Makefiles: Basic Structure <target>: <dependency list> <command to satisfy target> Example Example.class: Example.java IO.class javac Example.java IO.class: IO.java javac IO.java (tab)

Makefiles: Basic Structure <target>: <dependency list> <command to satisfy target> Example Example.class: Example.java IO.class javac Example.java IO.class: IO.java javac IO.java (tab) Example.class depends on example.java and IO.class

Makefiles: Basic Structure <target>: <dependency list> <command to satisfy target> Example Example.class: Example.java IO.class javac Example.java IO.class: IO.java javac IO.java (tab) Example.class depends on example.java and IO.class Example.class is generated by javac Example.java

Makefiles: Dependencies Example.class Example Example.class: Example.java IO.class javac Example.java IO.class: IO.java javac IO.java Example.java IO.class IO.java

Makefiles: Dependencies Internal Dependency graph Example.class Example Example.class: Example.java IO.class javac Example.java IO.class: IO.java javac IO.java Example.java IO.class IO.java

Makefiles: Dependencies Internal Dependency graph Example.class Example Example.class: Example.java IO.class javac Example.java IO.class: IO.java javac IO.java Example.java IO.class A file is rebuilt if one of it’s dependencies changes IO.java

Makefiles: Variables You can thread common configuration values through your makefile

Makefiles: Variables You can thread common configuration values through your makefile Example JC = /s/std/bin/javac JFLAGS = -g

Makefiles: Variables You can thread common configuration values through your makefile Example JC = /s/std/bin/javac JFLAGS = -g Build for debug

Makefiles: Variables You can thread common configuration values through your makefile Example JC = /s/std/bin/javac JFLAGS = -g Example.class: Example.java IO.class $(JC) $(JFLAGS) Example.java IO.class: IO.java $(JC) $(JFLAGS) IO.java Build for debug

Makefiles: Phony Targets You can run commands via make Write a target with no dependencies (called phony) Will cause it to execute the command every time Example clean: rm –f *.class test: java –cp . Test.class

Makefiles: Phony Targets You can run commands via make Write a target with no dependencies (called phony) Will cause it to execute the command every time Example clean: rm –f *.class test: java –cp . Test.class

Makefiles: Phony Targets You can run commands via make Write a target with no dependencies (called phony) Will cause it to execute the command every time Example clean: rm –f *.class test: java –cp . Test.class

Recap We’ve defined context-free grammars More powerful than regular expressions Learned a bit about makefiles Next time: we’ll look at grammars in more detail