Syntax Analysis CSE 340 – Principles of Programming Languages

Slides:



Advertisements
Similar presentations
Request Dispatching for Cheap Energy Prices in Cloud Data Centers
Advertisements

SpringerLink Training Kit
Luminosity measurements at Hadron Colliders
From Word Embeddings To Document Distances
Choosing a Dental Plan Student Name
Virtual Environments and Computer Graphics
Chương 1: CÁC PHƯƠNG THỨC GIAO DỊCH TRÊN THỊ TRƯỜNG THẾ GIỚI
THỰC TIỄN KINH DOANH TRONG CỘNG ĐỒNG KINH TẾ ASEAN –
D. Phát triển thương hiệu
NHỮNG VẤN ĐỀ NỔI BẬT CỦA NỀN KINH TẾ VIỆT NAM GIAI ĐOẠN
Điều trị chống huyết khối trong tai biến mạch máu não
BÖnh Parkinson PGS.TS.BS NGUYỄN TRỌNG HƯNG BỆNH VIỆN LÃO KHOA TRUNG ƯƠNG TRƯỜNG ĐẠI HỌC Y HÀ NỘI Bác Ninh 2013.
Nasal Cannula X particulate mask
Evolving Architecture for Beyond the Standard Model
HF NOISE FILTERS PERFORMANCE
Electronics for Pedestrians – Passive Components –
Parameterization of Tabulated BRDFs Ian Mallett (me), Cem Yuksel
L-Systems and Affine Transformations
CMSC423: Bioinformatic Algorithms, Databases and Tools
Some aspect concerning the LMDZ dynamical core and its use
Bayesian Confidence Limits and Intervals
实习总结 (Internship Summary)
Current State of Japanese Economy under Negative Interest Rate and Proposed Remedies Naoyuki Yoshino Dean Asian Development Bank Institute Professor Emeritus,
Front End Electronics for SOI Monolithic Pixel Sensor
Face Recognition Monday, February 1, 2016.
Solving Rubik's Cube By: Etai Nativ.
CS284 Paper Presentation Arpad Kovacs
انتقال حرارت 2 خانم خسرویار.
Summer Student Program First results
Theoretical Results on Neutrinos
HERMESでのHard Exclusive生成過程による 核子内クォーク全角運動量についての研究
Wavelet Coherence & Cross-Wavelet Transform
yaSpMV: Yet Another SpMV Framework on GPUs
Creating Synthetic Microdata for Higher Educational Use in Japan: Reproduction of Distribution Type based on the Descriptive Statistics Kiyomi Shirakawa.
MOCLA02 Design of a Compact L-­band Transverse Deflecting Cavity with Arbitrary Polarizations for the SACLA Injector Sep. 14th, 2015 H. Maesaka, T. Asaka,
Hui Wang†*, Canturk Isci‡, Lavanya Subramanian*,
Fuel cell development program for electric vehicle
Overview of TST-2 Experiment
Optomechanics with atoms
داده کاوی سئوالات نمونه
Inter-system biases estimation in multi-GNSS relative positioning with GPS and Galileo Cecile Deprez and Rene Warnant University of Liege, Belgium  
ლექცია 4 - ფული და ინფლაცია
10. predavanje Novac i financijski sustav
Wissenschaftliche Aussprache zur Dissertation
FLUORECENCE MICROSCOPY SUPERRESOLUTION BLINK MICROSCOPY ON THE BASIS OF ENGINEERED DARK STATES* *Christian Steinhauer, Carsten Forthmann, Jan Vogelsang,
Particle acceleration during the gamma-ray flares of the Crab Nebular
Interpretations of the Derivative Gottfried Wilhelm Leibniz
Advisor: Chiuyuan Chen Student: Shao-Chun Lin
Widow Rockfish Assessment
SiW-ECAL Beam Test 2015 Kick-Off meeting
On Robust Neighbor Discovery in Mobile Wireless Networks
Chapter 6 并发:死锁和饥饿 Operating Systems: Internals and Design Principles
You NEED your book!!! Frequency Distribution
Y V =0 a V =V0 x b b V =0 z
Fairness-oriented Scheduling Support for Multicore Systems
Climate-Energy-Policy Interaction
Hui Wang†*, Canturk Isci‡, Lavanya Subramanian*,
Ch48 Statistics by Chtan FYHSKulai
The ABCD matrix for parabolic reflectors and its application to astigmatism free four-mirror cavities.
Measure Twice and Cut Once: Robust Dynamic Voltage Scaling for FPGAs
Online Learning: An Introduction
Factor Based Index of Systemic Stress (FISS)
What is Chemistry? Chemistry is: the study of matter & the changes it undergoes Composition Structure Properties Energy changes.
THE BERRY PHASE OF A BOGOLIUBOV QUASIPARTICLE IN AN ABRIKOSOV VORTEX*
Quantum-classical transition in optical twin beams and experimental applications to quantum metrology Ivano Ruo-Berchera Frascati.
The Toroidal Sporadic Source: Understanding Temporal Variations
FW 3.4: More Circle Practice
ارائه یک روش حل مبتنی بر استراتژی های تکاملی گروه بندی برای حل مسئله بسته بندی اقلام در ظروف
Decision Procedures Christoph M. Wintersteiger 9/11/2017 3:14 PM
Limits on Anomalous WWγ and WWZ Couplings from DØ
Presentation transcript:

Syntax Analysis CSE 340 – Principles of Programming Languages Fall 2016 Adam Doupé Arizona State University http://adamdoupe.com

Syntax Analysis The goal of syntax analysis is to transform the sequence of tokens from the lexer into something useful However, we need a way to specify and check if the sequence of tokens is valid NUM PLUS NUM DECIMAL DOT NUM ID DOT ID DOT DOT DOT NUM ID DOT ID

Using Regular Expressions PROGRAM = STATEMENT* STATEMENT = EXPRESSION | IF_STMT | WHILE_STMT | … OP = + | - | * | / EXPRESSION = (NUM | ID | DECIMAL) OP (NUM | ID | DECIMAL) 5 + 10 foo - bar 1 + 2 + 3

Using Regular Expressions Regular expressions are not sufficient to capture all programming constructs We will not go into the details in this class, but the reason is that regular languages (the set of all languages that can be described by regular expressions) cannot express languages with properties that we care about How to write a regular expression for matching parenthesis? L(R) = {𝜺, (), (()), ((())), …} Regular expressions (as we have defined them in this class) have no concept of counting (to ensure balanced parenthesis), therefore it is impossible to create R

Context-Free Grammars Syntax for context-free grammars Each row is called a production Non-terminals on the left Right arrow Non-terminals and terminals on the right Non-terminals will start with an upper case in our examples, terminals will be lowercase and are tokens S will typically be the starting non-terminal Example for matching parenthesis S → 𝜺 S → ( S ) Can also write more succinctly by combining production rules with the same starting non-terminals S→ ( S ) | 𝜺

CFG Example S→ ( S ) | 𝜺 Derivations of the CFG S⇒𝜺 S⇒ ( S ) ⇒ ( 𝜺 ) ⇒ () S ⇒ ( S) ⇒ ( ( S ) ) ⇒ ( ( 𝜺 ) ) ⇒ (())

CFG Example Exp→ Exp + Exp Exp→ Exp * Exp Exp→ NUM Exp ⇒ Exp * Exp ⇒ Exp * 3 ⇒ Exp + Exp * 3 ⇒ Exp + 2 * 3 ⇒ 1 + 2 * 3

CFG Example Exp→ Exp + Exp Exp→ Exp * Exp Exp→ NUM Exp ⇒ Exp * Exp ⇒ Exp * 3 ⇒ Exp + Exp * 3 ⇒ Exp + 2 * 3 ⇒ 1 + 2 * 3 TODO: Change this at some points to be neither a leftmost or a rightmost derivation

Leftmost Derivation Always expand the leftmost nonterminal Exp→ Exp + Exp Exp→ Exp * Exp Exp→ NUM Is this a leftmost derivation? Exp ⇒ Exp * Exp ⇒ Exp * 3 ⇒ Exp + Exp * 3 ⇒ Exp + 2 * 3 ⇒ 1 + 2 * 3 Exp ⇒ Exp * Exp ⇒ Exp + Exp * Exp ⇒ 1 + Exp * Exp ⇒ 1 + 2 * Exp ⇒ 1 + 2 * 3

Rightmost Derivation Always expand the rightmost nonterminal Exp→ Exp + Exp Exp→ Exp * Exp Exp→ NUM Exp ⇒ Exp * Exp ⇒ Exp * 3 ⇒ Exp + Exp * 3 ⇒ Exp + 2 * 3 ⇒ 1 + 2 * 3

Parse Tree We can also represent derivations using a parse tree May sound familiar Parse Tree Bytes Tokens Lexer Parser Source

Parse Tree Exp ⇒ Exp * Exp ⇒ Exp * 3 ⇒ Exp + Exp * 3 ⇒ Exp + 2 * 3 ⇒1 + 2 * 3 Exp Exp * Exp What computation does this parse tree represent? Exp + Exp 3 1 2

Parsing Derivations and parse tree can show how to generate strings that are in the language described by the grammar However, we need to turn a sequence of tokens into a parse tree Parsing is the process of determining the derivation or parse tree from a sequence of tokens Two major parsing problems: Ambiguous grammars Efficient parsing

Ambiguous Grammars Exp→ Exp + Exp Exp→ Exp * Exp Exp→ NUM How to parse 1 + 2 * 3? Exp ⇒ Exp * Exp ⇒ Exp + Exp * Exp ⇒ 1 + Exp * Exp ⇒ 1 + 2 * Exp ⇒ 1 + 2 * 3 Exp ⇒ Exp + Exp ⇒ 1 + Exp ⇒ 1 + Exp * Exp ⇒ 1 + 2 * Exp ⇒ 1 + 2 * 3

Ambiguous Grammars 1 + 2 * 3 Exp * 3 + 2 1 Exp Exp + Exp 1 Exp * Exp 2

Ambiguous Grammars A grammar is ambiguous if there exists two different leftmost derivations, or two different rightmost derivations, or two different parse trees for any string in the grammar Is English ambiguous? I saw a man on a hill with a telescope. Ambiguity is not desirable in a programming language Unlike in English, we don't want the compiler to read your mind and try to infer what you meant Fix

Parsing Approaches Various ways to turn strings into parse tree Bottom-up parsing, where you start from the terminals and work your way up Top-down parsing, where you start from the starting non-terminal and work your way down In this class, we will focus exclusively on top-down parsing We need to get into efficient parsing first to discuss the homework assignment

Top-Down Parsing S → A | B | C A → a B → Bb | b C → Cc | 𝜺 parse_S() { t_type = getToken() if (t_type == a) { ungetToken() parse_A() check_eof() } else if (t_type == b) { parse_B() else if (t_type == c) { ungetToken() parse_C() check_eof() } else if (t_type == eof) { // do EOF stuff else { syntax_error() Fix this

Predictive Recursive Descent Parsers Predictive recursive descent parser are efficient top-down parsers Efficient because they only look at next token, no backtracking/guessing To determine if a language allows a predictive recursive descent parser, we need to define the following functions FIRST(α), where α is a sequence of grammar symbols (non-terminals, terminals, and 𝜺) FIRST(α) returns the set of terminals and 𝜺 that begin strings derived from α FOLLOW(A), where A is a non-terminal FOLLOW(A) returns the set of terminals and $ (end of file) that can appear immediately after the non-terminal A

FIRST() Example S → A | B | C A → a B → Bb | b C → Cc | 𝜺 FIRST(S) = { a, b, c, 𝜺 } FIRST(A) = { a } FIRST(B) = { b } FIRST(C) = { 𝜺, c } When doing the animations, don't show S at start

Calculating FIRST(α) First, start out with empty FIRST() sets for all non-terminals in the grammar Then, apply the following rules until the FIRST() sets do not change: FIRST(x) = { x } if x is a terminal FIRST(𝜺) = { 𝜺 } If A → Bα is a production rule, then add FIRST(B) – { 𝜺 } to FIRST(A) If A → B0B1B2…BiBi+1…Bk and 𝜺 ∈ FIRST(B0) and 𝜺 ∈ FIRST(B1) and 𝜺 ∈ FIRST(B2) and … and 𝜺 ∈ FIRST(Bi), then add FIRST(Bi+1) – { 𝜺 } to FIRST(A) If A → B0B1B2…Bk and 𝜺 ∈ FIRST(B0) and 𝜺 ∈ FIRST(B1) and 𝜺 ∈ FIRST(B2) and … and 𝜺 ∈ FIRST(Bk), then add 𝜺 to FIRST(A)

Calculating FIRST Sets S → ABCD A → CD | aA B → b C → cC | 𝜺 D → dD | 𝜺 INITIAL FIRST(S) = {} FIRST(S) = { } FIRST(S) = { a } FIRST(S) = { a, c, d, b} FIRST(S) = { a, c, d, b } FIRST(A) = {} FIRST(A) = { a } FIRST(A) = { a, c, d, 𝜺 } FIRST(B) = {} FIRST(B) = { b } FIRST(C) = {} FIRST(C) = { c, 𝜺 } FIRST(D) = {} FIRST(D) = { d, 𝜺 }

Calculating FIRST Sets S → ABCD A → CD | aA B → b C → cC | 𝜺 D → dD | 𝜺 INITIAL FIRST(S) = {} FIRST(S) = { } FIRST(S) = { a } FIRST(S) = { a, c, d, b} FIRST(S) = { a, c, d, b } FIRST(A) = {} FIRST(A) = { a } FIRST(A) = { a, c, d, 𝜺 } FIRST(B) = {} FIRST(B) = { b } FIRST(C) = {} FIRST(C) = { c, 𝜺 } FIRST(D) = {} FIRST(D) = { d, 𝜺 }

Calculating FIRST Sets S → ABCD A → CD | aA B → b C → cC | 𝜺 D → dD | 𝜺 INITIAL FIRST(S) = {} FIRST(S) = { } FIRST(S) = { a } FIRST(S) = { a, c, d, b} FIRST(S) = { a, c, d, b } FIRST(A) = {} FIRST(A) = { a } FIRST(A) = { a, c, d, 𝜺 } FIRST(B) = {} FIRST(B) = { b } FIRST(C) = {} FIRST(C) = { c, 𝜺 } FIRST(D) = {} FIRST(D) = { d, 𝜺 }

S → ABCD A → CD | aA B → b C → cC | 𝜺 D → dD | 𝜺 FIRST(x) = { x } if x is a terminal FIRST(𝜺) = { 𝜺 } If A → Bα is a production rule, then add FIRST(B) – { 𝜺 } to FIRST(A) If A → B0B1B2…BiBi+1…Bk and 𝜺 ∈ FIRST(B0) and 𝜺 ∈ FIRST(B1) and 𝜺 ∈ FIRST(B2) and … and 𝜺 ∈ FIRST(Bi), then add FIRST(Bi+1) – { 𝜺 } to FIRST(A) If A → B0B1B2…Bk and 𝜺 ∈ FIRST(B0) and 𝜺 ∈ FIRST(B1) and 𝜺 ∈ FIRST(B2) and … and 𝜺 ∈ FIRST(Bk), then add ∈ to FIRST(A) S → ABCD A → CD | aA B → b C → cC | 𝜺 D → dD | 𝜺 INITIAL FIRST(S) = {} FIRST(S) = { } FIRST(S) = { a } FIRST(S) = { a, c, d, b} FIRST(S) = { a, c, d, b } FIRST(A) = {} FIRST(A) = { a } FIRST(A) = { a, c, d, 𝜺 } FIRST(B) = {} FIRST(B) = { b } FIRST(C) = {} FIRST(C) = { c, 𝜺 } FIRST(D) = {} FIRST(D) = { d, 𝜺 }

FOLLOW() Example FOLLOW(A), where A is a non-terminal, returns the set of terminals and $ (end of file) that can appear immediately after the non-terminal A S → A | B | C A → a B → Bb | b C → Cc | 𝜺 FOLLOW(S) = { $ } FOLLOW(A) = { $ } FOLLOW(B) = { b, $ } FOLLOW(C) = { c, $ }

Calculating FOLLOW(A) First, calculate FIRST sets. Then, initialize empty FOLLOW sets for all non-terminals in the grammar Finally, apply the following rules until the FOLLOW sets do not change: If S is the starting symbol of the grammar, then add $ to FOLLOW(S) If B → αA, then add FOLLOW(B) to FOLLOW(A) If B → αAC0C1C2…Ck and 𝜺 ∈ FIRST(C0) and 𝜺 ∈ FIRST(C1) and 𝜺 ∈FIRST(C2) and … and 𝜺 ∈ FIRST(Ck), then add FOLLOW(B) to FOLLOW(A) If B → αAC0C1C2…Ck, then add FIRST(C0) – { 𝜺 } to FOLLOW(A) If B → αAC0C1C2…CiCi+1…Ck and 𝜺 ∈ FIRST(C0) and 𝜺 ∈ FIRST(C1) and 𝜺 ∈FIRST(C2) and … and 𝜺 ∈ FIRST(Ci), then add FIRST(Ci+1) – { 𝜺 } to FOLLOW(A)

Calculating FOLLOW Sets S → ABCD A → CD | aA B → b C → cC | 𝜺 D → dD | 𝜺 FIRST(S) = { a, c, d, b } FIRST(A) = { a, c, d, 𝜺 } FIRST(B) = { b } FIRST(C) = { c, 𝜺 } FIRST(D) = { d, 𝜺 } INITIAL FOLLOW(S) = {} FOLLOW(S) = { $ } FOLLOW(A) = {} FOLLOW(A) = { b } FOLLOW(B) = {} FOLLOW(B) = { $, c, d } FOLLOW(C) = {} FOLLOW(C) = { $, d, b } FOLLOW(D) = {} FOLLOW(D) = { $, b }

Calculating FOLLOW Sets S → ABCD A → CD | aA B → b C → cC | 𝜺 D → dD | 𝜺 FIRST(S) = { a, c, d, b } FIRST(A) = { a, c, d, 𝜺 } FIRST(B) = { b } FIRST(C) = { c, 𝜺 } FIRST(D) = { d, 𝜺 } INITIAL FOLLOW(S) = {} FOLLOW(S) = { $ } FOLLOW(A) = {} FOLLOW(A) = { b } FOLLOW(B) = {} FOLLOW(B) = { $, c, d } FOLLOW(C) = {} FOLLOW(C) = { $, d, b } FOLLOW(D) = {} FOLLOW(D) = { $, b }

Calculating FOLLOW Sets S → ABCD A → CD | aA B → b C → cC | 𝜺 D → dD | 𝜺 FIRST(S) = { a, c, d, b } FIRST(A) = { a, c, d, 𝜺 } FIRST(B) = { b } FIRST(C) = { c, 𝜺 } FIRST(D) = { d, 𝜺 } INITIAL FOLLOW(S) = {} FOLLOW(S) = { $ } FOLLOW(A) = {} FOLLOW(A) = { b } FOLLOW(B) = {} FOLLOW(B) = { $, c, d } FOLLOW(C) = {} FOLLOW(C) = { $, d, b } FOLLOW(D) = {} FOLLOW(D) = { $, b }

If S is the starting symbol of the grammar, then add $ to FOLLOW(S) If B → αA, then add FOLLOW(B) to FOLLOW(A) If B → αAC0C1C2…Ck and 𝜺 ∈ FIRST(C0) and 𝜺 ∈ FIRST(C1) and 𝜺 ∈FIRST(C2) and … and 𝜺 ∈ FIRST(Ck), then add FOLLOW(B) to FOLLOW(A) If B → αAC0C1C2…Ck, then add FIRST(C0) – { 𝜺 } to FOLLOW(A) If B → αAC0C1C2…CiCi+1…Ck and 𝜺 ∈ FIRST(C0) and 𝜺 ∈ FIRST(C1) and 𝜺 ∈FIRST(C2) and … and 𝜺 ∈ FIRST(Ci), then add FIRST(Ci+1) – { 𝜺 } to FOLLOW(A) S → ABCD A → CD | aA B → b C → cC | 𝜺 D → dD | 𝜺 FIRST(S) = { a, c, d, b } FIRST(A) = { a, c, d, 𝜺 } FIRST(B) = { b } FIRST(C) = { c, 𝜺 } FIRST(D) = { d, 𝜺 } INITIAL FOLLOW(S) = {} FOLLOW(S) = { $ } FOLLOW(A) = {} FOLLOW(A) = { b } FOLLOW(B) = {} FOLLOW(B) = { $, c, d } FOLLOW(C) = {} FOLLOW(C) = { $, d, b } FOLLOW(D) = {} FOLLOW(D) = { $, b }

If S is the starting symbol of the grammar, then add $ to FOLLOW(S) If B → αA, then add FOLLOW(B) to FOLLOW(A) If B → αAC0C1C2…Ck and 𝜺 ∈ FIRST(C0) and 𝜺 ∈ FIRST(C1) and 𝜺 ∈FIRST(C2) and … and 𝜺 ∈ FIRST(Ck), then add FOLLOW(B) to FOLLOW(A) If B → αAC0C1C2…Ck, then add FIRST(C0) – { 𝜺 } to FOLLOW(A) If B → αAC0C1C2…CiCi+1…Ck and 𝜺 ∈ FIRST(C0) and 𝜺 ∈ FIRST(C1) and 𝜺 ∈FIRST(C2) and … and 𝜺 ∈ FIRST(Ci), then add FIRST(Ci+1) – { 𝜺 } to FOLLOW(A) S → ABCD A → CD | aA B → b C → cC | 𝜺 D → dD | 𝜺 FIRST(S) = { a, c, d, b } FIRST(A) = { a, c, d, 𝜺 } FIRST(B) = { b } FIRST(C) = { c, 𝜺 } FIRST(D) = { d, 𝜺 } INITIAL FOLLOW(S) = {} FOLLOW(S) = { $ } FOLLOW(A) = {} FOLLOW(A) = { b } FOLLOW(B) = {} FOLLOW(B) = { $, c, d } FOLLOW(C) = {} FOLLOW(C) = { $, d, b } FOLLOW(D) = {} FOLLOW(D) = { $, b }

Predictive Recursive Descent Parsers At each parsing step, there is only one grammar rule that can be chosen, and there is no need for backtracking The conditions for a predictive parser are both of the following If A → α and A → β, then FIRST(α) ∩ FIRST(β) = ∅ If 𝜺 ∈ FIRST(A), then FIRST(A) ∩ FOLLOW(A) = ∅

Creating a Predictive Recursive Descent Parser Create a CFG Calculate FIRST and FOLLOW sets Prove that CFG allows a Predictive Recursive Descent Parser Write the predictive recursive descent parser using the FIRST and FOLLOW sets

Email Addresses How to parse/validate email addresses? name @ domain.tld Turns out, it is not so simple "cse 340"@example.com customer/department=shipping@example.com "Abc@def"@example.com "Abc\@def"@example.com "Abc\"@example.com"@example.com test "example @hello" <test@example.com> In fact, a company called Mailgun, which provides email services as an API, released an open-source tool to validate email addresses, based on their experience with real-world email How did they implement their parser? A recursive descent parser https://github.com/mailgun/flanker

Email Address CFG quoted-string atom dot-atom whitespace Address → Name-addr-rfc | Name-addr-lax | Addr-spec Name-addr-rfc → Display-name-rfc Angle-addr-rfc | Angle-addr-rfc Display-name-rfc → Word Display-name-rfc-list | whitespace Word Display-name-rfc-list Display-name-rfc-list → whitespace Word Display-name-rfc-list | epsilon Angle-addr-rfc → < Addr-spec > | whitespace < Addr-spec > | whitespace < Addr-spec > whitespace | < Addr-spec > whitespace Name-addr-lax → Display-name-lax Angle-addr-lax | Angle-addr-lax Display-name-lax → whitespace Word Display-name-lax-list whitespace | Word Display-name-lax-list whitespace Display-name-lax-list → whitespace Word Display-name-lax-list | epsilon Angle-addr-lax → Addr-spec | Addr-spec whitespace Addr-spec → Local-part @ Domain | whitespace Local-part @ Domain | whitespace Local-part @ Domain whitespace | Local-part @ Domain whitespace Local-part → dot-atom | quoted-string Domain → dot-atom Word → atom | quoted-string CFG taken from https://github.com/mailgun/flanker

Simplified Email Address CFG quoted-string (q-s) atom dot-atom (d-a) quoted-string-at (q-s-a) dot-atom-at (d-a-a) Address → Name-addr | Addr-spec Name-addr → Display-name Angle-addr | Angle-addr Display-name → Word Display-name-list Display-name-list → Word Display-name-list | 𝜺 Angle-addr → < Addr-spec > Addr-spec → d-a-a Domain | q-s-a Domain Domain → d-a Word → atom | q-s

Address → Name-addr | Addr-spec Name-addr → Display-name Angle-addr | Angle-addr Display-name → Word Display-name-list Display-name-list → Word Display-name-list | 𝜺 Angle-addr → < Addr-spec > Addr-spec → d-a-a Domain | q-s-a Domain Domain → d-a Word → atom | q-s FIRST INITIAL Address {} { d-a-a, q-s-a } { d-a-a, q-s-a, < } { d-a-a, q-s-a, <, atom, q-s } Name-addr { < } { <, atom, q-s } Display-name { atom, q-s } Display-name-list { 𝜺 } { 𝜺, atom, q-s } Angle-addr Addr-spec Domain { d-a } Word

Address → Name-addr | Addr-spec Name-addr → Display-name Angle-addr | Angle-addr Display-name → Word Display-name-list Display-name-list → Word Display-name-list | 𝜺 Angle-addr → < Addr-spec > Addr-spec → d-a-a Domain | q-s-a Domain Domain → d-a Word → atom | q-s FIRST INITIAL Address {} { d-a-a, q-s-a } { d-a-a, q-s-a, < } { d-a-a, q-s-a, <, atom, q-s } Name-addr { < } { <, atom, q-s } Display-name { atom, q-s } Display-name-list { 𝜺 } { 𝜺, atom, q-s } Angle-addr Addr-spec Domain { d-a } Word

Address → Name-addr | Addr-spec Name-addr → Display-name Angle-addr | Angle-addr Display-name → Word Display-name-list Display-name-list → Word Display-name-list | 𝜺 Angle-addr → < Addr-spec > Addr-spec → d-a-a Domain | q-s-a Domain Domain → d-a Word → atom | q-s FIRST(Address) = { d-a-a, q-s-a, <, atom, q-s } FIRST(Name-addr) = { <, atom, q-s } FIRST(Display-name) = { atom, q-s } FIRST(Display-name-list) = { 𝜺, atom, q-s } FIRST(Angle-addr) = { < } FIRST(Addr-spec) = { d-a-a, q-s-a } FIRST(Domain) = { d-a } FIRST(Word) = { atom, q-s } FOLLOW INITIAL Address {} { $ } Name-addr Display-name { < } Display-name-list Angle-addr Addr-spec { $, > } Domain Word { atom, q-s, < }

Address → Name-addr | Addr-spec Name-addr → Display-name Angle-addr | Angle-addr Display-name → Word Display-name-list Display-name-list → Word Display-name-list | 𝜺 Angle-addr → < Addr-spec > Addr-spec → d-a-a Domain | q-s-a Domain Domain → d-a Word → atom | q-s FIRST(Name-addr) ∩ FIRST(Addr-spec) FIRST(Display-name Angle-addr) ∩ FIRST(Angle-addr) FIRST(Word Display-name-list) ∩ FIRST(𝜺) FIRST(d-a-a Domain) ∩ FIRST(q-s-a Domain) FIRST(atom) ∩ FIRST(q-s) FIRST(Display-name-list) ∩ FOLLOW(Display-name-list) FIRST(Address) = { d-a-a, q-s-a, <, atom, q-s } FIRST(Name-addr) = { <, atom, q-s } FIRST(Display-name) = { atom, q-s } FIRST(Display-name-list) = { 𝜺, atom, q-s } FIRST(Angle-addr) = { < } FIRST(Addr-spec) = { d-a-a, q-s-a } FIRST(Domain) = { d-a } FIRST(Word) = { atom, q-s } FOLLOW(Address) = { $ } FOLLOW(Name-addr) = { $ } FOLLOW(Display-name) = { < } FOLLOW(Display-name-list) = { < } FOLLOW(Angle-addr) = { $ } FOLLOW(Addr-spec) = { $, > } FOLLOW(Domain) = { $, > } FOLLOW(Word) = { atom, q-s, < }

Address → Name-addr | Addr-spec FOLLOW(Address) = { $ } FIRST(Address) = { d-a-a, q-s-a, <, atom, FOLLOW(Name-addr) = { $ } q-s } FOLLOW(Addr-spec) = { $, > } FIRST(Name-addr) = { <, atom, q-s } FIRST(Addr-spec) = { d-a-a, q-s-a } parse_Address() { t_type = getToken(); // Check FIRST(Name-addr) if (t_type == < || t_type == atom || t_type == q-s ) { ungetToken(); parse_Name-addr(); printf("Address -> Name-addr"); } // Check FIRST(Addr-spec) else if (t_type == d-a-a || t_type == q-s-a) { parse_Addr-spec(); printf("Address -> Addr-spec"); else { syntax_error();

Name-addr → Display-name Angle-addr | Angle-addr FOLLOW(Name-addr) = { $ } FOLLOW(Display-name) = { < } FIRST(Name-addr) = { <, atom, q-s } FOLLOW(Angle-addr) = { $ } FIRST(Display-name) = { atom, q-s } FIRST(Angle-addr) = { < } parse_Name-addr() { t_type = getToken(); // Check FIRST(Display-name Angle-addr) if (t_type == atom || t_type == q-s) { ungetToken(); parse_Display-name(); parse_Angle-addr(); printf("Name-addr -> Display-name Angle-addr"); } // Check FIRST(Angle-addr) else if (t_type == <) { printf("Name-addr -> Angle-addr"); else { syntax_error();

Display-name → Word Display-name-list FOLLOW(Display-name-list) = { < } FIRST(Display-name) = { atom, q-s } FOLLOW(Word) = { atom, q-s, < } FIRST(Display-name-list) = { 𝜺, atom, q-s } FIRST(Word) = { atom, q-s } FOLLOW(Display-name) = { < } parse_Display-name() { t_type = getToken(); // Check FIRST(Word Display-name-list) if (t_type == atom || t_type == q-s) { ungetToken(); parse_Word(); parse_Display-name-list(); printf("Display-name -> Word Display-name-list"); } else { syntax_error();

Display-name-list → Word Display-name-list | 𝜺 FOLLOW(Word) = { atom, q-s, < } FIRST(Display-name-list) = { 𝜺, atom, q-s } FIRST(Word) = { atom, q-s } FOLLOW(Display-name-list) = { < } parse_Display-name-list() { t_type = getToken(); // Check FIRST( Word Display-name-list) if (t_type == atom || t_type == q-s) { ungetToken(); parse_Word(); parse_Display-name-list(); printf("Display-name-list -> Word Display-name-list"); } // Check FOLLOW(Display-name-list) else if (t_type == <) { printf("Display-name-list -> 𝜺"); else { syntax_error(); }

Angle-addr → < Addr-spec > FOLLOW(Addr-spec) = { $, > } FIRST(Angle-addr) = { < } FIRST(Addr-spec) = { d-a-a, q-s-a } FOLLOW(Angle-addr) = { $ } parse_Angle-addr() { t_type = getToken(); // Check FIRST(< Addr-spec >) if (t_type == <) { // ungetToken()? parse_Addr-spec(); if (t_type != >) { syntax_error(); } printf("Angle-addr -> < Addr-spec >"); else {

Addr-spec → d-a-a Domain | q-s-a Domain FOLLOW(Domain) = { $, > } FIRST(Addr-spec) = { d-a-a, q-s-a } FIRST(Domain) = { d-a } FOLLOW(Addr-spec) = { $, > } parse_Addr-spec() { t_type = getToken(); // Check FIRST(d-a-a Domain) if (t_type == d-a-a) { // ungetToken()? parse_Domain(); printf("Addr-spec -> d-a-a Domain"); } // Check FIRST(q-s-a Domain) else if (t_type == q-s-a) { printf("Addr-spec -> q-s-a Domain"); else { syntax_error(); }

Domain → d-a FIRST(Domain) = { d-a } FOLLOW(Domain) = { $, > } parse_Domain() { t_type = getToken(); // Check FIRST(d-a) if (t_type == d-a) { printf("Domain -> d-a"); } else { syntax_error();

Word → atom | q-s FIRST(Word) = { atom, q-s } FOLLOW(Word) = { atom, q-s, < } parse_Word() { t_type = getToken(); // Check FIRST(atom) if (t_type == atom) { printf("Word -> atom"); } // Check FIRST(q-s) else if (t_type == q-s) { printf("Word -> q-s"); else { syntax_error();

Predictive Recursive Descent Parsers For every non-terminal A in the grammar, create a function called parse_A For each production rule A → α (where α is a sequence of terminals and non-terminals), if getToken() ∈ FIRST(α) then choose the production rule A → α For every terminal and non-terminal a in α, if a is a non-terminal call parse_a, if a is a terminal check that getToken() == a If 𝜺 ∈ FIRST(α), then check that getToken() ∈ FOLLOW(A), then choose the production A → α If getToken() ∉ FIRST(A), then syntax_error(), unless 𝜺 ∈ FIRST(A), then getToken() ∉ FOLLOW(A) is syntax_error()