Spring 16 CSCI 4430, A Milanova 1 Announcements HW1 will be out this evening Due Monday, 2/8 Submit in HW Server AND at start of class on 2/8 A review.

Presentation transcript:

Spring 16 CSCI 4430, A Milanova. Announcements: HW1 will be out this evening. Due Monday, 2/8. Submit in HW Server AND at start of class on 2/8. A review of regular expressions, context-free grammars, derivation, parsing and ambiguity. Lecture notes available online before class: go to Schedule.

Programming Language Syntax Read: Scott, Chapter 2.1 and 2.2

Lecture Outline: Formal languages, Regular expressions, Context-free grammars, Derivation, Parse, Parse trees, Ambiguity, Scanning.

Last Class: Compiler. Phases, with the form passed between them: character stream → Scanner → token stream → Parser → parse tree → Semantic analysis and intermediate code generation → abstract syntax tree or intermediate form → Machine-independent code improvement → modified intermediate form → Code generation → target language (assembler) → Machine-dependent code improvement → modified target language.

Syntax and Semantics. Syntax is the form or structure of expressions, statements, and program units of a given language. Syntax of a Java while statement: while ( boolean_expr ) statement. Partial syntax of an if statement: if ( boolean_expr ) statement. Semantics is the meaning of expressions, statements and program units of a given language. Semantics of while ( boolean_expr ) statement: execute statement repeatedly (0 or more times) as long as boolean_expr evaluates to true.

Formal Languages. Theoretical foundations: Automata Theory. A language is a set of strings (also called sentences) over a finite alphabet. A generator is a set of rules that generate the strings in the language. A recognizer reads input strings and determines whether they belong to the language. Languages are characterized by the complexity of generation/recognition rules, e.g., regular languages and context-free languages.

Question: What are the classes of formal languages? The Chomsky hierarchy: Regular languages, Context-free languages, Context-sensitive languages, Recursively enumerable languages.

Formal Languages. Recognizers become more complex as languages become more complex. Regular languages: describe PL tokens (e.g., keywords, identifiers, numeric literals); generated by Regular Expressions; recognized by a Finite Automaton (scanner). Context-free languages: describe more complex PL constructs (e.g., expressions and statements); generated by a Context-free Grammar; recognized by a Push-down Automaton (parser). Even more complex constructs.

Formal Languages. Main application of formal languages: enable proof of relative difficulty of certain computational problems. Our focus: formal languages provide the formalism for describing PL constructs. Most compelling application of formal languages! Building a scanner. Building a parser. Central issue: build efficient, linear-time parsers.

Regular Expressions. Simplest structure. Formalism to describe the simplest programming language constructs, the tokens: each symbol (e.g., "+", "-") is a token; an identifier (e.g., rate, initial) is a token; a numeric constant (e.g., 59) is a token; etc. Recognized by a finite automaton.

Regular Expressions. A Regular Expression is one of the following:
  A character, e.g., a
  The empty string, denoted by ε
  Two regular expressions next to each other, R1 R2, meaning any string generated by R1 followed by (concatenated with) any string generated by R2
  Two regular expressions separated by |, R1 | R2, meaning any string generated by R1 or any string generated by R2

Question: What is the language defined by the regular expression (a | b) (a a | b b)? Answer: {aaa, abb, baa, bbb}. We saw concatenation and alternation. What operation is still missing?
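The answer to this question can be checked mechanically. The following is a minimal sketch using Python's standard `re` module; the helper name `language` and the length bound are illustrative assumptions, not part of the slides. It enumerates every string over {a, b} up to a small length and keeps those the expression matches in full.

```python
import re
from itertools import product

# The slide's expression (a | b) (a a | b b), written without the spaces.
PATTERN = re.compile(r"(a|b)(aa|bb)")

def language(pattern, alphabet="ab", max_len=4):
    """Return every string over `alphabet` of length <= max_len that the
    pattern matches in full (fullmatch, not a prefix or substring match)."""
    matches = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if pattern.fullmatch(s):
                matches.append(s)
    return matches

print(language(PATTERN))  # ['aaa', 'abb', 'baa', 'bbb']
```

Because the expression has no Kleene star, its language is finite, so a small length bound is enough to list all of it.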

Regular Expressions. A Regular Expression is one of the following:
  A character, e.g., a
  The empty string, denoted by ε
  R1 R2
  R1 | R2
  A regular expression followed by a Kleene star, R*, meaning the concatenation of zero or more strings generated by R. E.g., a* generates { ε, a, aa, aaa, … }

Regular Expressions: Operator precedence. Kleene * has highest precedence, followed by concatenation, followed by alternation |. E.g., a b | c generates … E.g., a b* generates …

Question: What is the language defined by the regular expression (0 | 1)* 1? Answer: all strings of 0s and 1s that end with 1. What about 0* (1 0* 1 0*)*?
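One way to explore questions like these is to compare a regular expression against a plain-English predicate on all short binary strings. The sketch below does that with Python's `re` module. The helper names and the length bound are assumptions made for illustration, and the "even number of 1s" reading of the second expression is a conjecture that the check supports only up to the tested length; it is not a proof.

```python
import re
from itertools import product

ENDS_WITH_ONE = re.compile(r"(0|1)*1")     # (0 | 1)* 1
SECOND = re.compile(r"0*(10*10*)*")        # 0* (1 0* 1 0*)*

def binary_strings(max_len):
    """Yield every string over {0, 1} of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

def agrees(pattern, predicate, max_len=8):
    """True if full regex matching and the predicate agree on all short strings."""
    return all(
        bool(pattern.fullmatch(s)) == predicate(s)
        for s in binary_strings(max_len)
    )

print(agrees(ENDS_WITH_ONE, lambda s: s.endswith("1")))
print(agrees(SECOND, lambda s: s.count("1") % 2 == 0))
```

Both checks print True for strings up to length 8, which is consistent with the stated answer for the first expression and with the even-number-of-1s reading of the second.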

Regular Expressions in Programming Languages. Describe tokens. Let
  letter → a|b|c| … |z
  digit → 1|2|3|4|5|6|7|8|9|0
Which token is this?
  1. letter ( letter | digit )* ?
  2. digit digit* ?
  3. digit* . digit digit* ?

Regular Expressions in Programming Languages. Which token is this:
  number → integer | real
  real → integer exponent | decimal ( exponent | ε )
  decimal → digit* ( . digit | digit . ) digit*
  exponent → ( e | E ) ( + | - | ε ) integer
  integer → digit digit*
  digit → 1|2|3|4|5|6|7|8|9|0
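The regular definitions for number translate almost symbol for symbol into a pattern for Python's `re` module. This is a sketch under the assumption that ε is written as an empty alternative and that each named definition is simply substituted into the ones that use it; the variable names mirror the slide, and the sample strings are made up for illustration.

```python
import re

# Each constant mirrors one definition; an empty alternative plays the role of ε.
DIGIT = r"[0-9]"
INTEGER = rf"{DIGIT}{DIGIT}*"
DECIMAL = rf"{DIGIT}*(\.{DIGIT}|{DIGIT}\.){DIGIT}*"
EXPONENT = rf"(e|E)(\+|-|){INTEGER}"
REAL = rf"({INTEGER}{EXPONENT}|{DECIMAL}({EXPONENT}|))"
NUMBER = re.compile(rf"({INTEGER}|{REAL})")

def is_number(text):
    """True if the whole string is generated by the `number` definition."""
    return NUMBER.fullmatch(text) is not None

for sample in ["42", "3.14", ".5", "5.", "1e10", "6.02E+23", ".", "e5", "1e"]:
    print(f"{sample!r}: {is_number(sample)}")
```

The first six samples are accepted and the last three are rejected: a lone "." has no digit next to the point, "e5" has no integer or decimal part before the exponent, and "1e" has no integer after it.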

Context-Free Grammars. Unfortunately, regular languages cannot specify all constructs in programming. E.g., can we write a regular expression that specifies valid arithmetic expressions such as id * ( id + id * ( number - id ) )? Among other things, we need to ensure that parentheses are matched! The answer is no. We need context-free languages and context-free grammars!

Grammar. A grammar is a formalism to describe the strings of a (formal) language. A grammar consists of a set of terminals, a set of nonterminals, a set of productions, and a start symbol. Terminals are the characters in the alphabet. Nonterminals represent language constructs. Productions are rules for forming syntactically correct constructs. The start symbol tells where to start applying the rules.

Notation. Specification of identifier:
Regular expression: letter ( letter | digit )*
BNF:
  <digit> ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0
  <letter> ::= a | b | c | … | x | y | z
  <id> ::= <letter> | <id><letter> | <id><digit>
Textbook and slides (also BNF):
  digit → 1|2|3|4|5|6|7|8|9|0
  letter → a|b|c|d|…|z
  id → letter | id letter | id digit
Nonterminals are shown in italic. Terminals are shown in typewriter.

Regular Grammars. Regular grammars generate regular languages. The rules in regular grammars are of the form: each left-hand side (lhs) has exactly one nonterminal; each right-hand side (rhs) is one of the following: a single terminal symbol, or a single nonterminal symbol, or a nonterminal followed by a terminal. E.g., for 1 2* | 0+:
  S → A | B
  A → 1 | A 2
  B → 0 | B 0

Question: Is this a regular grammar?
  S → 0 A
  A → S 1
  S → ε

Lecture Outline: Formal languages, Regular expressions, Context-free grammars, Derivation, Parse, Parse trees, Ambiguity, Scanning.

Context-free Grammars (CFGs). Context-free grammars generate context-free languages. Most of what we need in programming languages can be specified with CFGs. Context-free grammars have rules of the form: each left-hand side has exactly one nonterminal; each right-hand side contains an arbitrary sequence of terminals and/or nonterminals. A context-free grammar, e.g., for 0^n 1^n, n ≥ 1:
  S → 0 S 1
  S → 0 1
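The two productions for 0^n 1^n can be used both as a generator and as a recognizer. The sketch below is illustrative only: the function names `derive` and `in_language` are assumptions, and sentential forms are represented as plain strings with the single nonterminal written as the letter S.

```python
def derive(n):
    """Generate 0^n 1^n (n >= 1): apply S -> 0 S 1 (n - 1) times, then
    S -> 0 1. Returns the list of sentential forms, starting from 'S'."""
    if n < 1:
        raise ValueError("n must be at least 1")
    forms = ["S"]
    current = "S"
    for _ in range(n - 1):
        current = current.replace("S", "0S1")   # S -> 0 S 1
        forms.append(current)
    current = current.replace("S", "01")        # S -> 0 1
    forms.append(current)
    return forms

def in_language(s):
    """Recognize 0^n 1^n, n >= 1, by undoing productions from the outside in."""
    if s == "01":                               # matches S -> 0 1
        return True
    # Otherwise the string must look like 0 <something in the language> 1.
    return len(s) >= 4 and s[0] == "0" and s[-1] == "1" and in_language(s[1:-1])

print(derive(3))  # ['S', '0S1', '00S11', '000111']
```

The recognizer needs to remember how many 0s it has seen in order to match them with 1s, which is the kind of unbounded counting a finite automaton cannot do.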

Question: Can you give examples of a non-context-free language? E.g., a^n b^m c^n d^m, n ≥ 1, m ≥ 1. E.g., wcw where w is in (0|1)*. E.g., a^n b^n c^n, n ≥ 1 (canonical example).

Context-free Grammars. Can be used to generate strings in the context-free language (derivation). Can be used to recognize well-formed strings in the context-free language (parse). We are concerned with two special CFGs, called LL and LR grammars.

Derivation. Simple context-free grammar for expressions:
  expr → id | ( expr ) | expr op expr
  op → + | *
We can generate (derive) expressions:
  expr ⇒ expr op expr ⇒ expr op id ⇒ expr + id ⇒ expr op expr + id ⇒ expr op id + id ⇒ expr * id + id ⇒ id * id + id
Each intermediate step is a sentential form; the final string is the sentence, string or yield.

Derivation. A derivation is the process that starts from the start symbol, and at each step, replaces a nonterminal with the right-hand side of a production. E.g., expr op expr derives expr op id: we replaced the right expr (underlined on the slide) with id due to production expr → id. An intermediate sentence is called a sentential form. E.g., expr op id is a sentential form.

Derivation. The resulting sentence is called the yield. E.g., id*id+id is the yield of our derivation. What is a left-most derivation? One that replaces the left-most nonterminal in the sentential form at each step. What is a right-most derivation? One that replaces the right-most nonterminal in the sentential form at each step. There are derivations that are neither left-most nor right-most.

Question: What kind of derivation is this?
  expr ⇒ expr op expr ⇒ expr op id ⇒ expr + id ⇒ expr op expr + id ⇒ expr op id + id ⇒ expr * id + id ⇒ id * id + id

Question: What kind of derivation is this?
  expr ⇒ expr op expr ⇒ expr op id ⇒ expr + id ⇒ expr op expr + id ⇒ id op expr + id ⇒ id op id + id ⇒ id * id + id
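The definitions of left-most and right-most derivation can be applied mechanically. The sketch below is an illustration, not part of the slides: the names `GRAMMAR`, `replaced_positions` and `classify` are assumptions, and sentential forms are written as space-separated symbols. For each step it finds which nonterminal occurrence was rewritten and compares its position with the left-most and right-most nonterminals of the form.

```python
GRAMMAR = {
    "expr": [["id"], ["(", "expr", ")"], ["expr", "op", "expr"]],
    "op": [["+"], ["*"]],
}

def replaced_positions(form, nxt):
    """Positions in `form` whose nonterminal, rewritten by one production,
    turns `form` into `nxt`."""
    positions = []
    for i, sym in enumerate(form):
        for rhs in GRAMMAR.get(sym, []):
            if form[:i] + rhs + form[i + 1:] == nxt:
                positions.append(i)
    return positions

def classify(derivation):
    """Return 'left-most', 'right-most', 'both' or 'neither'."""
    forms = [step.split() for step in derivation]
    leftmost = rightmost = True
    for form, nxt in zip(forms, forms[1:]):
        nonterminals = [i for i, sym in enumerate(form) if sym in GRAMMAR]
        positions = replaced_positions(form, nxt)
        if not positions:
            raise ValueError("not a single derivation step: " + " ".join(form))
        leftmost = leftmost and nonterminals[0] in positions
        rightmost = rightmost and nonterminals[-1] in positions
    if leftmost and rightmost:
        return "both"
    if leftmost:
        return "left-most"
    if rightmost:
        return "right-most"
    return "neither"

FIRST = ["expr", "expr op expr", "expr op id", "expr + id", "expr op expr + id",
         "expr op id + id", "expr * id + id", "id * id + id"]
SECOND = ["expr", "expr op expr", "expr op id", "expr + id", "expr op expr + id",
          "id op expr + id", "id op id + id", "id * id + id"]

print(classify(FIRST))   # right-most
print(classify(SECOND))  # neither
```

On the two derivations from the questions, the first always rewrites the right-most nonterminal, while the second mixes right-most and left-most steps.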

Parse. Recall our context-free grammar for expressions:
  expr → id | ( expr ) | expr op expr
  op → + | *
A parse is the reverse of a derivation:
  id * id + id ⇒ expr * id + id ⇒ expr op id + id ⇒ expr op expr + id ⇒ expr + id ⇒ expr op id ⇒ expr op expr ⇒ expr

Parse. A parse starts with the string of terminals, and at each step, replaces the right-hand side (rhs) of a production with the left-hand side (lhs) of that production. E.g., … ⇒ expr op expr + id ⇒ expr + id. Here we replaced expr op expr (the rhs of production expr → expr op expr) with expr (the lhs of the production).

Parse Tree.
  expr → id | ( expr ) | expr op expr
  op → + | *
Internal nodes are nonterminals. Children are the rhs of a rule for that nonterminal. Leaf nodes are terminals.
  expr ⇒ expr op expr ⇒ expr op id ⇒ expr + id ⇒ expr op expr + id ⇒ expr op id + id ⇒ expr * id + id ⇒ id * id + id
[Figure: parse tree for this derivation. The root expr has children expr, op (+) and expr (id); the left expr child in turn has children expr (id), op (*) and expr (id).]

Ambiguity. A grammar is ambiguous if some string can be generated by two or more distinct parse trees. There is no algorithm which can tell if an arbitrary context-free grammar is ambiguous. Ambiguity arises in programming language grammars: arithmetic expressions; if-then-else (the dangling else problem). Ambiguity is bad.

Ambiguity.
  expr → id | ( expr ) | expr op expr
  op → + | *
How many parse trees for id * id + id? Which one is "correct"?
[Figure: Tree 1 and Tree 2, two parse trees for id * id + id. One groups id * id first; the other groups id + id first.]

Ambiguity.
  expr → id | ( expr ) | expr op expr
  op → + | *
How many parse trees for id + id + id? Which one is "correct"?
[Figure: Tree 1 and Tree 2, two parse trees for id + id + id. One groups the left id + id first; the other groups the right id + id first.]
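The number of parse trees the ambiguous grammar assigns to a token string can be counted without drawing them. The sketch below is a small dynamic-programming count written for this particular grammar; the function name `count_parse_trees` is an assumption, and tokens are assumed to be pre-split strings such as "id", "+", "*", "(" and ")".

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees rooted at expr for the grammar
         expr -> id | ( expr ) | expr op expr
         op   -> + | *
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        """Number of ways expr derives tokens[lo:hi]."""
        span = tokens[lo:hi]
        total = 0
        if span == ("id",):                              # expr -> id
            total += 1
        if len(span) >= 3 and span[0] == "(" and span[-1] == ")":
            total += count(lo + 1, hi - 1)               # expr -> ( expr )
        for k in range(lo + 1, hi - 1):                  # expr -> expr op expr
            if tokens[k] in ("+", "*"):
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))

print(count_parse_trees("id * id + id".split()))       # 2
print(count_parse_trees("id + id + id".split()))       # 2
print(count_parse_trees("id + id + id + id".split()))  # 5
```

Each choice of top-level operator splits the string into a left and a right subexpression, so the counts multiply, and the number of trees grows quickly as operators are added.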

Handling Ambiguity. Our ambiguous grammar, slightly simplified:
  expr → id | ( expr ) | expr + expr | expr * expr
Rewrite the grammar into an unambiguous one:
  expr → expr + term | term
  term → term * factor | factor
  factor → id | ( expr )
This forces left associativity of + and *, and forces higher precedence of * over +.

Rewriting Expression Grammars: Intuition.
  expr → id | ( expr ) | expr + expr | expr * expr

Rewriting Expression Grammars: Intuition. E.g., look at id + id*id*id + id + id*id.
[Figure: the expression viewed as a sum of terms (id, id*id*id, id, id*id), with expr + term nodes peeling off one term at a time from the right.]

Rewriting Expression Grammars: Intuition. Another new nonterminal, factor, and productions:
  term → term * factor | factor
  factor → id | ( expr )
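The effect of the expr / term / factor layering can be seen by building trees with a small hand-written parser. This is a sketch under stated assumptions: the left-recursive productions are implemented with loops (a common recursive-descent technique, not something the slides prescribe), the result is a simplified operator tree rather than a full parse tree, and the names `parse`, `peek` and `advance` are illustrative.

```python
def parse(tokens):
    """Parse a token list with the unambiguous grammar
         expr   -> expr + term | term
         term   -> term * factor | factor
         factor -> id | ( expr )
    and return nested tuples (operator, left, right)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def expr():
        node = term()
        while peek() == "+":          # each further term joins on the left
            advance()
            node = ("+", node, term())
        return node

    def term():
        node = factor()
        while peek() == "*":          # each further factor joins on the left
            advance()
            node = ("*", node, factor())
        return node

    def factor():
        tok = peek()
        if tok == "id":
            advance()
            return "id"
        if tok == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

print(parse("id + id * id".split()))  # ('+', 'id', ('*', 'id', 'id'))
print(parse("id + id + id".split()))  # ('+', ('+', 'id', 'id'), 'id')
```

Because * is only introduced below term, a product always ends up nested inside a sum, and the loops group repeated operators to the left; each input has exactly one tree.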

Lecture Outline: Formal languages, Regular expressions, Context-free grammars, Derivation, Parse, Parse trees, Ambiguity, Scanning.

Scanner groups characters into tokens. Scanner simplifies the job of the Parser. Scanner is essentially a Finite Automaton. Regular expressions specify the syntax of tokens. Scanner recognizes the tokens in the program. Example: the Scanner turns position = initial + rate * 60; into id = id + id * 60, which is passed to the Parser.

Question: Why do most programming languages disallow nested multi-line comments?

Calculator Language Tokens:
  times → *
  plus → +
  id → letter ( letter | digit )*
except for read and write, which are keywords (keywords are tokens as well).

Ad-hoc Scanner for Calculator language:
  skip any initial white space (space, tab, newline)
  if current_char in { +, * }
      return corresponding single-character token (plus or times)
  if current_char is a letter
      read any additional letters and digits
      check to see if the resulting string is read or write
      if so then return the corresponding token
      else return id
  else announce an ERROR
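The ad-hoc scanner outlined above can be written out as a short runnable function. This is a minimal sketch, assuming that letters are the lowercase ASCII letters a to z, that tokens are returned as (kind, lexeme) pairs, and that an error is reported by raising ValueError; the name `scan` and these representation choices are illustrative, not from the slides.

```python
import string

KEYWORDS = {"read", "write"}
LETTERS = set(string.ascii_lowercase)   # assumption: letter means a..z
DIGITS = set(string.digits)

def scan(text):
    """Return the list of (kind, lexeme) tokens in `text`."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in " \t\n":                       # skip white space
            i += 1
        elif ch == "+":
            tokens.append(("plus", ch))
            i += 1
        elif ch == "*":
            tokens.append(("times", ch))
            i += 1
        elif ch in LETTERS:
            start = i
            # read any additional letters and digits
            while i < len(text) and (text[i] in LETTERS or text[i] in DIGITS):
                i += 1
            lexeme = text[start:i]
            # read and write are keywords; everything else is an id
            kind = lexeme if lexeme in KEYWORDS else "id"
            tokens.append((kind, lexeme))
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    return tokens

print(scan("read rate * x1 + write"))
```

For the sample input this yields the tokens read, id (rate), times, id (x1), plus and write, in that order.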

The Scanner as a DFA.
[Figure: DFA with a Start state; edge labels include space, tab, newline; *; letter; and letter, digit.]

Building a Scanner. Scanners are (usually) automatically generated from regular expressions. Step 1: from a Regular Expression to an NFA. Step 2: from an NFA to a DFA. Step 3: minimizing the DFA. The lex/flex utilities generate scanner code. Scanner code explicitly captures the states and transitions of the DFA.

Control-flow-based Scanning:
  state := 1
  loop
      read current_char
      case state of
          1: case current_char of
                 '+', '*', 'a'…'z' …
          2: case current_char of …

Table-driven Scanning. Sketch of transition table. See Scott, page for details.
[Table sketch: columns are space/tab/newline, *, +, digit, letter, other; entries shown include times, plus, id, space.]
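A table-driven scanner keeps the DFA in data and uses one generic loop to drive it. The sketch below is an illustration for the calculator tokens only: the state names, the `TABLE` and `ACCEPTING` dictionaries and the function names are assumptions and do not reproduce the table in Scott's book. Character classes follow the column headings of the sketch (space/tab/newline, *, +, digit, letter, other).

```python
import string

KEYWORDS = {"read", "write"}

def char_class(ch):
    """Map a character to a column of the transition table."""
    if ch in " \t\n":
        return "space"
    if ch == "*":
        return "star"
    if ch == "+":
        return "plus"
    if ch in string.digits:
        return "digit"
    if ch in string.ascii_lowercase:
        return "letter"
    return "other"

# TABLE[state][char_class] -> next state; a missing entry means "no move".
TABLE = {
    "start": {"space": "in_space", "star": "saw_star",
              "plus": "saw_plus", "letter": "in_id"},
    "in_space": {"space": "in_space"},
    "saw_star": {},
    "saw_plus": {},
    "in_id": {"letter": "in_id", "digit": "in_id"},
}
# Token recognized when the DFA stops in an accepting state.
ACCEPTING = {"in_space": "space", "saw_star": "times",
             "saw_plus": "plus", "in_id": "id"}

def scan(text):
    """Return (kind, lexeme) tokens, always taking the longest possible match."""
    tokens = []
    pos = 0
    while pos < len(text):
        state, start = "start", pos
        while pos < len(text):
            nxt = TABLE[state].get(char_class(text[pos]))
            if nxt is None:
                break
            state = nxt
            pos += 1
        if state not in ACCEPTING:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, lexeme = ACCEPTING[state], text[start:pos]
        if kind == "space":
            continue                      # white space is recognized, then dropped
        if kind == "id" and lexeme in KEYWORDS:
            kind = lexeme                 # read and write are keywords
        tokens.append((kind, lexeme))
    return tokens

print(scan("read rate * x1 + write"))
```

Adding a token kind means adding rows and entries to the table rather than new branches in the code, which is what makes this form suitable for automatic generation.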

Summary. Formal Languages in Programming: Regular Expressions; Context-free Grammars (Derivation, Parse, Parse tree, Ambiguity); Scanning.

Group Exercise.
  expr → expr × expr | expr ^ expr | id
How many parse trees for id×id^id×id? No need to draw them all. Rewrite this grammar into an equivalent unambiguous grammar where:
  ^ has higher precedence than ×
  ^ is right-associative
  × is left-associative

Next Class: We will cover Chapter and from Scott's book.