The meaning of it all! The role of finite automata and grammars in compiler design.



Compiler-Compilers! There are many applications of automata and grammars inside and outside computer science; the main applications within computer science are in the area of compiler design. Suppose we have designed (on paper) a new programming language with nice features. We have worked out its syntax and the way its constructs should behave. Now all we need is a compiler for this language, but it is too complex to write a compiler from scratch! What we can do instead is use theoretical tools like automata and grammars, which recognize/generate strings of symbols of various kinds*, to formally specify the syntax of the new programming language. Such a formal specification (plus other details) can be fed to "magical programs" known as compiler-compilers, which automatically generate a compiler for the language! Without these theoretical tools we would have to spell out the syntax of the language in, say, plain English, which would not be precise enough; a program would find it hard to "understand" such a description well enough to generate a compiler from it. (* A program can be viewed as a (very long!) string that adheres to certain rules dictated by the programming language.)

Admiral Grace Hopper, Pioneer of compiler design

Lexical Analysis
The lexical analyzer takes the raw stream of characters, e.g. for(i=0;i<=10;i++), and produces a stream of tokens: for is a keyword, i is an identifier, 10 is a constant, and so on. Identifiers are entered into a symbol table.
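
As an illustration of what such a token stream might look like in code, here is a minimal C sketch; the Token type and the category names are hypothetical, not taken from the slides.

#include <stdio.h>

/* Hypothetical token representation, for illustration only. */
typedef enum { KEYWORD, IDENTIFIER, CONSTANT, PUNCTUATION, OPERATOR } TokenKind;

typedef struct {
    TokenKind kind;
    const char *lexeme;
} Token;

int main(void) {
    /* The token stream a lexical analyzer might emit for "for(i=0;i<=10;i++)". */
    Token tokens[] = {
        { KEYWORD,     "for" }, { PUNCTUATION, "("  },
        { IDENTIFIER,  "i"   }, { OPERATOR,    "="  }, { CONSTANT,   "0"  },
        { PUNCTUATION, ";"   }, { IDENTIFIER,  "i"  }, { OPERATOR,   "<=" },
        { CONSTANT,    "10"  }, { PUNCTUATION, ";"  }, { IDENTIFIER, "i"  },
        { OPERATOR,    "++"  }, { PUNCTUATION, ")"  },
    };
    int count = (int)(sizeof tokens / sizeof tokens[0]);
    for (int n = 0; n < count; n++)
        printf("%d: %s\n", tokens[n].kind, tokens[n].lexeme);
    return 0;
}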

Parsing
'Parse': to relate. The slide's diagram shows the parse tree for the statement for(i=0; i<=10; i++): a FOR-statement node whose children are an assignment statement (id i = const 0), a condition expression (id i <= const 10), an increment expression (id i ++), and the body statement.

Finite state automata as lexical analysers
Automaton for recognizing keywords: one branch of states spells out each keyword letter by letter (e.g. W-H-I-L-E, F-O-R), and from the last letter a transition on any character other than a letter/digit leads to an accepting state.
Automaton for recognizing identifiers: from the start state 0, a letter leads to state 1, which loops on letters and digits; a character other than a letter/digit leads from state 1 to the accepting state 2.
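
A hedged sketch of how the identifier automaton could be encoded as a table-driven recognizer in C; the state numbering follows the slide, but the function names and input classes are illustrative.

#include <ctype.h>

/* Input classes: 0 = letter, 1 = digit, 2 = anything else. */
static int input_class(char c) {
    if (isalpha((unsigned char)c)) return 0;
    if (isdigit((unsigned char)c)) return 1;
    return 2;
}

/* Transition table for the identifier automaton (states 0, 1, 2 as on the slide). */
static const int transition[3][3] = {
    /* letter digit other */
    {  1,     -1,   -1 },   /* state 0: must start with a letter            */
    {  1,      1,    2 },   /* state 1: loop on letter/digit, else accept   */
    { -1,     -1,   -1 },   /* state 2: accepting, no further moves         */
};

/* Returns 1 if the whole string s is an identifier.  The scan must end in
   state 1; reaching state 2 means a delimiter appeared inside the string. */
int is_identifier(const char *s) {
    int state = 0;
    for (; *s != '\0'; s++) {
        state = transition[state][input_class(*s)];
        if (state < 0) return 0;
    }
    return state == 1;
}

The table-driven style complements the hand-coded version on the next slide: to recognize a different token, only the table changes, not the driver loop.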

Converting a finite state automaton into a computer program
(Automaton for recognizing identifiers, with states A, B, C: a letter takes A to B; B loops on letters and digits; any character other than a letter/digit takes B to C.)

A: Read next_char
   If next_char is a letter goto B
   else FAIL( )
B: Read next_char
   If next_char is either a letter or a digit goto B
   else goto C

Note: Instead of using "A" and "B" as labels for GOTO statements, one could use them as the names of individual functions/procedures that can be invoked. FAIL( ) is a function that "puts back" the character just read and starts up the next transition diagram.
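
A hedged, runnable C rendering of the goto-style code above; the use of getchar/ungetc for next_char and FAIL( ), and the surrounding driver, are illustrative assumptions, since the slide only sketches the transitions.

#include <ctype.h>
#include <stdio.h>

/* Goto-style recognizer for identifiers, following the slide's states A, B, C.
   Reads characters from stdin; on a non-letter/digit, the character is
   "put back" (ungetc) so that the next recognizer can start from it. */
int recognize_identifier(void) {
    int next_char;
A:  next_char = getchar();
    if (isalpha(next_char)) goto B;
    if (next_char != EOF) ungetc(next_char, stdin);   /* FAIL(): put the character back */
    return 0;
B:  next_char = getchar();
    if (isalnum(next_char)) goto B;
    /* state C: accept */
    if (next_char != EOF) ungetc(next_char, stdin);   /* put back the delimiter */
    return 1;
}

int main(void) {
    printf("identifier? %s\n", recognize_identifier() ? "yes" : "no");
    return 0;
}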

Grammars as syntax specification tools
Finite state automata are used to describe tokens. Grammars are much more "expressive" than finite state automata and can be used to describe more complicated syntactic structures in a program, for instance the syntax of a FOR statement in the C language (a sketch follows below). Grammars, however, only describe/generate strings. We also need a process which, given an input string (a statement in a program, say), decides whether or not it is derivable from a given grammar. Such a process is known as parsing.
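
As an illustration only (a hypothetical sketch, not the grammar used by any actual C compiler), the syntax of a FOR statement could be specified with production rules such as:

for_stmt → for ( assign_stmt ; exp ; exp ) stmt
stmt → for_stmt | assign_stmt | ...
assign_stmt → id = exp

A real grammar would need many more rules, but the point is that rules like these pin the syntax down precisely, in a way a plain-English description cannot.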

Types of Parsing
Grammar: S → aAcBe, A → Ab | b, B → d. Input string: abbcde.
(i) Bottom-up: reducing the input string to the start symbol. We take a "chunk" of the input string and REDUCE it to (replace it with) the symbol on the LHS of a production rule. In other words, the parse tree is constructed by beginning at the leaves and working up towards the root.
(ii) Top-down: "expanding" the start symbol down to the input string. We EXPAND the start symbol (according to the production rules of the given grammar), and subsequently every non-terminal symbol that occurs in the expansion*, till we arrive at the input string.
(* Technically, each such intermediate string is called a sentential form.)

Shift-Reduce: a bottom-up parsing technique
Grammar: S → aAcBe, A → Ab | b, B → d. Input: abbcde.
We shift symbols from the input string (from left to right) onto a stack, so that the "chunk" of symbols matching the RHS of a production, which is to be reduced to the corresponding LHS, will eventually appear on top of the stack. (The chunk getting reduced is referred to as the "handle".)
Stack contents as the parse proceeds:
$ a          (shift a)
$ a b        (shift b)
$ a A        (reduce b → A)
$ a A b      (shift b)
$ a A        (reduce Ab → A)
$ a A c      (shift c)
$ a A c d    (shift d)
$ a A c B    (reduce d → B)
$ a A c B e  (shift e)
$ S          (reduce aAcBe → S)
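
The following is a toy C sketch of the shift-reduce mechanics for exactly this grammar and input. It reduces whenever some suffix of the stack matches the RHS of a production (trying longer right-hand sides first), otherwise it shifts. This greedy choice of handle happens to work for abbcde but is not a correct strategy in general, as the next slides explain; all names here are illustrative.

#include <stdio.h>
#include <string.h>

/* Grammar: S -> aAcBe   A -> Ab | b   B -> d */
struct rule { const char *rhs; char lhs; };
static const struct rule rules[] = {
    { "aAcBe", 'S' }, { "Ab", 'A' }, { "b", 'A' }, { "d", 'B' },
};

int main(void) {
    const char *input = "abbcde";
    char stack[64] = "";
    size_t ip = 0;

    while (1) {
        size_t top = strlen(stack);
        int reduced = 0;
        for (int r = 0; r < 4 && !reduced; r++) {          /* try to reduce a handle */
            size_t len = strlen(rules[r].rhs);
            if (top >= len && strcmp(stack + top - len, rules[r].rhs) == 0) {
                stack[top - len] = rules[r].lhs;            /* replace handle by LHS  */
                stack[top - len + 1] = '\0';
                printf("reduce %s -> %c   stack: $%s\n", rules[r].rhs, rules[r].lhs, stack);
                reduced = 1;
            }
        }
        if (reduced) continue;
        if (input[ip] == '\0') break;                       /* nothing left to shift  */
        stack[top] = input[ip++];                           /* shift next input symbol */
        stack[top + 1] = '\0';
        printf("shift  %c          stack: $%s\n", stack[top], stack);
    }
    printf("%s\n", strcmp(stack, "S") == 0 ? "accepted" : "stuck");
    return 0;
}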

What is a handle?
A substring of the input string that matches the RHS of a production, replacing which (by the corresponding LHS) would eventually lead to a reduction to the start symbol, is called a handle.
Grammar: S → aAcBe, A → Ab | b, B → d.
A rightmost derivation (RMD): S ⇒ aAcBe ⇒ aAcde ⇒ aAbcde ⇒ abbcde.
In a rightmost derivation, non-terminal symbols on the right get expanded before those on the left. When we do this in reverse (now reducing symbols, not expanding them), pieces of the string on the left get reduced before those on the right. Bottom-up parsing can therefore be viewed as an RMD performed in the reverse direction.

The problem with discovering handles
Discovering the handle may not always be easy! There may be more than one substring on top of the stack that matches the RHS of a production.
Grammar: S → aAcBe, A → Ab | b, B → d. Input: abbcde.
With the stack at $ a A b, two reductions are possible: reduce Ab → A (giving $ a A) or reduce the b on top to A (giving $ a A A). The second choice is fatal: there is no way aAAcde can ever be reduced to S. (When we make an incorrect choice of handle we get stuck half-way through, before we can arrive at the start symbol.)

The problem with discovering handles (continued)
In the exercises we did, we decided when to shift and when to reduce symbols ourselves, using our own cleverness! However, these decisions can (and must) be made automatically by the parser program, in tune with the given grammar. The well-known LR parser does exactly this; it is beyond our present scope.

Top down parsing
Formal: construct the parse tree (for the input) by beginning at the root and creating the nodes of the tree in preorder. In other words, it is an attempt to find a leftmost derivation for the input string.
Informal: instead of starting with the input string and reducing it to the start symbol (by replacing "chunks" of it with non-terminal symbols), we begin with the start symbol itself and ask: "How can I expand this in order to arrive (eventually) at the input string?" We ask the same question for every non-terminal symbol occurring in the resulting expansions. We choose an appropriate expansion of a non-terminal by glancing at the input, i.e. by taking cues from the symbol currently being scanned (and perhaps the next few symbols).

Top down parsing: an example
Grammar: S → cAd, A → ab | a. Input: cad.
(i) Start with S. Only one expansion is possible: S → cAd. The leading c matches the first symbol of the input, so move on.
(ii) Now, how to expand A? Try every expansion one by one! Trying A → ab: the a matches, but b does not match the d in the input. Mismatch! So try another expansion.
(iii) Trying A → a: the a matches, so move on.
(iv) Finally, the d of cAd matches the last input symbol. We're done!

Top down parsing: an example (continued)
A program to do top-down parsing might use a separate procedure for every non-terminal:

function S( )
{
    if input_symbol = 'c' then {
        ADVANCE( );
        if A( ) then {
            if input_symbol = 'd' then {
                ADVANCE( );
                return TRUE;
            }
        }
    }
    return FALSE;
}

function A( )
{
    isave = input_pointer;
    if input_symbol = 'a' then {
        ADVANCE( );
        if input_symbol = 'b' then {
            ADVANCE( );
            return TRUE;
        }
    }
    input_pointer = isave;    /* try the second expansion */
    if input_symbol = 'a' then {
        ADVANCE( );
        return TRUE;
    }
    return FALSE;
}
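
Below is a hedged, self-contained C version of the same parser, so the pseudocode above can actually be run; the names input, pos and parse are illustrative, not from the slides.

#include <stdio.h>
#include <string.h>

/* Recursive-descent parser for the grammar  S -> cAd,  A -> ab | a,
   following the pseudocode above. */
static const char *input;
static int pos;

static int A(void) {
    int isave = pos;                        /* remember where A started        */
    if (input[pos] == 'a') {                /* first try the expansion A -> ab */
        pos++;
        if (input[pos] == 'b') { pos++; return 1; }
    }
    pos = isave;                            /* backtrack, then try A -> a      */
    if (input[pos] == 'a') { pos++; return 1; }
    return 0;
}

static int S(void) {
    if (input[pos] == 'c') {                /* S -> cAd */
        pos++;
        if (A() && input[pos] == 'd') { pos++; return 1; }
    }
    return 0;
}

static int parse(const char *s) {
    input = s;
    pos = 0;
    return S() && pos == (int)strlen(s);    /* all input must be consumed */
}

int main(void) {
    printf("cad  -> %s\n", parse("cad")  ? "accepted" : "rejected");
    printf("cabd -> %s\n", parse("cabd") ? "accepted" : "rejected");
    return 0;
}

Because this version tries A → ab before A → a, it also accepts cabd; the next slide shows what goes wrong when the alternatives are tried in the opposite order.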

Problems with this approach
(i) Order in which the expansions are tried.
Grammar: S → cAd, A → a | ab. Input: cabd.
When A → a is tried first, the a matches and A( ) returns success; S then expects d but sees b: a mismatch. The d belongs to the expansion of S, so a new expansion for S (not of A) will be tried, in vain. Hence cabd will be rejected as invalid, though it is actually valid.
Remedy: rewrite the grammar so that no non-terminal has more than one expansion sharing the same "prefix"; use "left factoring" to realise this (a small example follows below).
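
For instance (a sketch; the slide deck's own left-factoring example comes later), A → a | ab can be left-factored by pulling out the common prefix a:

A → a A'
A' → b | ε

Now the parser commits to the shared a first and decides between b and ε by looking at the next input symbol, so the order in which the alternatives are listed no longer matters.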

Problems with this approach
(ii) Left recursion.
A production rule of the form A → Aα exhibits (immediate) left recursion. More generally, a grammar has left recursion if A "yields" Aα, i.e. if Aα can be derived from A in one or more steps.
Why is left recursion dangerous for top-down parsing? Because the function A( ) corresponding to the non-terminal A will be forced to invoke itself repeatedly and endlessly, without consuming any input.
Remedy: eliminate left recursion from the grammar!
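
To see why, here is a hedged C sketch of what a recursive-descent procedure for a left-recursive rule such as E → E + T | T would look like; the very first thing E( ) does is call E( ) again, before consuming a single input symbol.

int T(void);          /* parses a T; definition omitted in this sketch */

/* E -> E + T | T.  The first alternative begins with E itself, so E()
   calls E() before reading any input: unbounded recursion. */
int E(void) {
    if (E()) {        /* try E -> E + T : recurses forever              */
        /* ... would go on to match '+' and then call T() here          */
    }
    return T();       /* ... otherwise try E -> T                       */
}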

Eliminating (immediate) left recursion
A → Aα | β     becomes     A → βA',   A' → αA' | ε
(Note that A derives β followed by zero or more α's: A ⇒ Aα ⇒ Aαα ⇒ Aααα ⇒ βααα, and the transformed grammar generates exactly the same strings.)
e.g. the expression grammar
E → E + T | T
T → T * F | F
F → ( E ) | id
becomes
E → TE'        E' → + TE' | ε
T → FT'        T' → * FT' | ε
F → ( E ) | id
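
With the left recursion removed, a recursive-descent parser can be written directly from the transformed grammar. Below is a hedged C sketch; it uses single characters as tokens, with 'i' standing for id, and the function names simply mirror the non-terminals (all of this is illustrative, not from the slides).

#include <stdio.h>
#include <string.h>

/* Recursive descent for:
     E  -> T E'      E' -> + T E' | epsilon
     T  -> F T'      T' -> * F T' | epsilon
     F  -> ( E ) | id          ('i' stands for id) */
static const char *src;
static int pos;

static int E(void);

static int F(void) {
    if (src[pos] == 'i') { pos++; return 1; }           /* F -> id      */
    if (src[pos] == '(') {                              /* F -> ( E )   */
        pos++;
        if (E() && src[pos] == ')') { pos++; return 1; }
    }
    return 0;
}

static int Tprime(void) {
    if (src[pos] == '*') { pos++; return F() && Tprime(); }   /* T' -> * F T'  */
    return 1;                                                 /* T' -> epsilon */
}

static int T(void) { return F() && Tprime(); }                /* T -> F T'     */

static int Eprime(void) {
    if (src[pos] == '+') { pos++; return T() && Eprime(); }   /* E' -> + T E'  */
    return 1;                                                 /* E' -> epsilon */
}

static int E(void) { return T() && Eprime(); }                /* E -> T E'     */

int main(void) {
    const char *tests[] = { "i+i*i", "(i+i)*i", "i+*i" };
    for (int n = 0; n < 3; n++) {
        src = tests[n];
        pos = 0;
        int ok = E() && pos == (int)strlen(src);
        printf("%-10s %s\n", tests[n], ok ? "accepted" : "rejected");
    }
    return 0;
}

No procedure here ever calls itself before consuming input, so the endless recursion of the original grammar cannot occur.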

Left factoring
A → αβ | αγ     becomes     A → αA',   A' → β | γ
e.g.
S → iCtS | iCtSeS | a
C → b
becomes
S → iCtSS' | a
S' → eS | ε
C → b
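
As a final illustration, here is a hedged C sketch of how the left-factored grammar fits a recursive-descent parser: after the common prefix iCtS has been matched, S'( ) decides between eS and ε by looking at a single input symbol. Single-character tokens i, t, e, a, b stand for the grammar's terminals; all names are illustrative.

#include <stdio.h>
#include <string.h>

/* Recursive descent for the left-factored grammar:
     S  -> i C t S S' | a
     S' -> e S | epsilon
     C  -> b                   */
static const char *src;
static int pos;

static int S(void);

static int C(void) { if (src[pos] == 'b') { pos++; return 1; } return 0; }

static int Sprime(void) {
    if (src[pos] == 'e') { pos++; return S(); }   /* S' -> e S      */
    return 1;                                     /* S' -> epsilon  */
}

static int S(void) {
    if (src[pos] == 'a') { pos++; return 1; }     /* S -> a          */
    if (src[pos] == 'i') {                        /* S -> i C t S S' */
        pos++;
        if (C() && src[pos] == 't') {
            pos++;
            return S() && Sprime();
        }
    }
    return 0;
}

int main(void) {
    const char *tests[] = { "ibta", "ibtaea", "ibtibtaea" };
    for (int n = 0; n < 3; n++) {
        src = tests[n];
        pos = 0;
        printf("%-10s %s\n", tests[n],
               (S() && pos == (int)strlen(src)) ? "accepted" : "rejected");
    }
    return 0;
}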