Parsing 1 of 4: Grammars and Parse Trees

COMP261 Lecture 16: Grammars and Parse Trees (Parsing 1 of 4)

Parsing text
Making sense of "structured text":
  Compiling/analysing programs (javac, g++, Python, etc.)
  Rendering web sites (Chrome, Mozilla Firefox, etc.)
  Processing database queries (PostgreSQL, MySQL, etc.)
  Domain-specific languages
  Checking grammar in a word processor
  Understanding natural language
A parser is a program that reads structured text.
(This is not a whole lecture: 30 mins max.)

What is “structured text”?
  while ( A[k] != x ) { k++; }
  <html><head><title>My Web Page</title></head> <body> <p>Thank you for viewing my silly site!</p> </body></html>
  DELETE FROM DomesticStudentsFor2017 WHERE mark = 'E';

Not “structured text”
  I KEEP six honest serving-men
  (They taught me all I knew);
  Their names are What and Why and When
  And How and Where and Who.
  I send them over land and sea,
  I send them east and west;
  But after they have worked for me,
  I give them all a rest.

  I let them rest from nine till five,
  For I am busy then,
  As well as breakfast, lunch, and tea,
  For they are hungry men.
  But different folk have different views;
  I know a person small –
  She keeps ten million serving-men,
  Who get no rest at all!

  She sends 'em abroad on her own affairs,
  From the second she opens her eyes –
  One million Hows, two million Wheres,
  And seven million Whys!

  Rudyard Kipling, "The Elephant's Child"

What is “structured text”?
We must be able to describe the structure of texts:
  A URL is a string of the form http://name/name/… where each name is a string of letters, digits and dots.
  An email address is a string of the form id@id.id… where each id is a string of letters and digits.
  A while statement is a string of the form while ( exp ) stmt where exp is a valid expression and stmt is a valid statement.
  A statement is either a while statement, an if statement, ….
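As a rough illustration, the URL and email descriptions above are simple enough to write as regular expressions. The sketch below does that in Java with java.util.regex.Pattern; the class name StructureCheckSketch and the methods isUrl and isEmail are hypothetical names chosen for this example, and the patterns follow only the simplified descriptions on this slide, not the real URL or email standards.

```java
import java.util.regex.Pattern;

class StructureCheckSketch {
    // Assumption: "name" = one or more letters, digits or dots (as on the slide).
    static final Pattern URL =
        Pattern.compile("http://[A-Za-z0-9.]+(/[A-Za-z0-9.]+)*");

    // Assumption: "id" = one or more letters or digits; at least one ".id" after the "@id".
    static final Pattern EMAIL =
        Pattern.compile("[A-Za-z0-9]+@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)+");

    /** True if the whole string has the form http://name/name/... */
    static boolean isUrl(String s) {
        return URL.matcher(s).matches();
    }

    /** True if the whole string has the form id@id.id... */
    static boolean isEmail(String s) {
        return EMAIL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isUrl("http://ecs.vuw.ac.nz/lindsay/home.html")); // true
        System.out.println(isEmail("lindsay@ecs.vuw.ac.nz"));                // true
        System.out.println(isEmail("lindsay@ecs"));                          // false
    }
}
```

The while statement is different: a statement can contain another statement nested to any depth, and regular expressions cannot describe that kind of nesting, which is why a grammar is needed.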

What is “structured text”?
We must also be able to describe the structure of a particular text so we can extract its components.
  http://ecs.vuw.ac.nz/lindsay/home.html
  lindsay@ecs.vuw.ac.nz
  while ( A[k] != x ) { k++; }

Describing structured text: Grammars
A grammar is a set of rules describing the structure of strings in a formal language (a set of strings).
The rules describe how to form strings from the language's alphabet, and name its structural components.
  whileStmt ::= "while" "(" condition ")" statement
A rule can show alternative forms:
  statement ::= whileStmt | ifStmt | …
A grammar only describes the form of allowable strings, not their meaning.
The official name is context-free grammar: look it up!

Example: Java statements (simplified!)
  statement ::= variable "=" exp ";"
              | "if" "(" exp ")" statement [ "else" statement ]
              | "while" "(" exp ")" statement
              | "do" statement "while" "(" exp ")" ";"
              | "{" [ statement ]* "}"
              | …
  exp ::= variable | constant | …
Exercise: look online for the full Java grammar.

Example: A simple HTML grammar
  HTMLFILE ::= "<html>" [ HEAD ] BODY "</html>"
  HEAD ::= "<head>" TITLE "</head>"
  TITLE ::= "<title>" TEXT "</title>"
  BODY ::= "<body>" [ BODYTAG ]* "</body>"
  BODYTAG ::= H1TAG | PTAG | OLTAG | ULTAG
  H1TAG ::= "<h1>" TEXT "</h1>"
  PTAG ::= "<p>" TEXT "</p>"
  OLTAG ::= "<ol>" [ LITAG ]+ "</ol>"
  ULTAG ::= "<ul>" [ LITAG ]+ "</ul>"
  LITAG ::= "<li>" TEXT "</li>"
  TEXT ::= sequence of characters other than < and >

Nonterminals
Names of structural components of strings (not part of the text).
Defined by rules.
The top-level nonterminal (start symbol) usually comes first.
In the HTML grammar above, the nonterminals are HTMLFILE, HEAD, TITLE, BODY, BODYTAG, H1TAG, PTAG, OLTAG, ULTAG, LITAG and TEXT.
Notation:
  |        = "or"
  [ … ]    = "optional"
  [ … ]*   = "any number of times" (including zero)
  [ … ]+   = "one or more times"

Terminals
Literal strings or patterns of characters that can occur in texts.
In the HTML grammar above, the literal strings are enclosed in double quote marks, for example "<html>" and "</li>". The character pattern on the right-hand side of the TEXT rule is also a terminal.

Using the Grammar
Given some text:
  <head><title> Today</title></head>
  <body><h1> My Day </h1>
  <ul><li>meeting</li><li> lecture </li></ul>
  <p> parsing stuff</p>
  </body>
  </html>
Is it a valid piece of HTML? Does it conform to the grammar rules?
What is the structure? (Needed in order to process it.)
  What are the components?
  What types are the components?
  How are they related?

What kind of structure?
The structure of a text is hierarchical and can be described by an ordered tree:
  Leaves correspond to terminals.
  Internal nodes are labelled with nonterminals.
  The root is labelled with the start symbol.
  Each internal node and its children correspond to a grammar rule (or an alternative in a grammar rule).
  The text consists of all the terminals on the fringe of the tree, read left to right.
A concrete syntax tree or parse tree represents the syntactic structure of a string according to some formal grammar, showing all the components of the rules.
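The tree description above can be sketched in a few lines of Java. This is only an illustration under assumptions: the class ParseTreeSketch, its Node type and the methods fringe and example are hypothetical names, not code from the course. It builds the parse tree for a small list according to the ULTAG and LITAG rules, then collects the leaves from left to right to recover the original text.

```java
import java.util.ArrayList;
import java.util.List;

class ParseTreeSketch {
    /** A tree node: internal nodes carry a nonterminal, leaves carry a terminal. */
    static class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            for (Node kid : kids) {
                children.add(kid);
            }
        }
    }

    /** Returns the labels of the leaves (the fringe), in left-to-right order. */
    static List<String> fringe(Node root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node node, List<String> out) {
        if (node.children.isEmpty()) {
            out.add(node.label);          // a leaf: one terminal of the text
            return;
        }
        for (Node child : node.children) {
            collect(child, out);          // keep the children's order
        }
    }

    /** Parse tree for <ul><li>meeting</li><li>lecture</li></ul>. */
    static Node example() {
        Node first = new Node("LITAG",
            new Node("<li>"), new Node("TEXT", new Node("meeting")), new Node("</li>"));
        Node second = new Node("LITAG",
            new Node("<li>"), new Node("TEXT", new Node("lecture")), new Node("</li>"));
        return new Node("ULTAG", new Node("<ul>"), first, second, new Node("</ul>"));
    }

    public static void main(String[] args) {
        // Joining the fringe gives back the text the tree describes.
        System.out.println(String.join("", fringe(example())));
    }
}
```

Running main prints <ul><li>meeting</li><li>lecture</li></ul>, which is the text whose structure the tree records.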

Concrete Parse Tree
HTMLFILE
  <html>
  HEAD
    TITLE
      <title>
      TEXT
        Today
      </title>
  BODY
    BODYTAG
      H1TAG
        <h1>
        TEXT
          My Day
        </h1>
    BODYTAG
      ULTAG
        <ul>
        LITAG
          <li>
          TEXT
            meeting
          </li>
        LITAG
          <li>
          TEXT
            lecture
          </li>
        </ul>
    BODYTAG
      PTAG
        <p>
        TEXT
          parsing stuff
        </p>
  </html>

Grammars define possible parse trees
Each grammar rule defines possible structures that may occur in a parse tree.
  H1TAG ::= "<h1>" TEXT "</h1>"
  BODYTAG ::= H1TAG | PTAG | OLTAG | ULTAG
  HTMLFILE ::= "<html>" [ HEAD ] BODY "</html>"

Is that too much information?
Yes! We know that every HEAD has "<head>" and "</head>" terminals; we only care about which TITLE it contains, and within the title only about the unknown string part.
An abstract syntax tree (AST) represents the abstract syntactic structure of the text.
  Each node of the tree denotes a construct occurring in the text.
  The syntax is 'abstract' in that it does not represent all the elements of the full syntax.
  Only keep things that are semantically meaningful.

Abstract Syntax Tree
[Figure: the parse tree for the example page, rooted at HTMLFILE, still showing the tag terminals such as <html>, <title>, <h1>, <ul>, <li> and <p> alongside the TEXT leaves.]

Abstract Syntax Tree (AST)
Can delete some of the nodes to avoid chains of singletons.
HTMLFILE
  HEAD
    TITLE
      TEXT
        Today
  BODY
    BODYTAG
      H1TAG
        TEXT
          My Day
    BODYTAG
      ULTAG
        LITAG
          TEXT
            meeting
        LITAG
          TEXT
            lecture
    BODYTAG
      PTAG
        TEXT
          parsing stuff
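One possible way to get from a concrete tree to an abstract one is sketched below in Java. It is a minimal sketch under assumptions: AstSketch, Node, lit, text, rule, toAbstract and show are hypothetical names, and each node is assumed to carry a flag saying whether it is a fixed literal terminal such as "<head>". The sketch only drops those literal terminals; it does not collapse chains of singleton nodes.

```java
import java.util.ArrayList;
import java.util.List;

class AstSketch {
    static class Node {
        final String label;
        final boolean literal;   // true for fixed terminals such as "<head>"
        final List<Node> children = new ArrayList<>();

        Node(String label, boolean literal, Node... kids) {
            this.label = label;
            this.literal = literal;
            for (Node kid : kids) {
                children.add(kid);
            }
        }
    }

    static Node lit(String s)  { return new Node(s, true); }    // fixed tag text
    static Node text(String s) { return new Node(s, false); }   // variable content
    static Node rule(String name, Node... kids) { return new Node(name, false, kids); }

    /** Copies the tree, leaving out every literal terminal. */
    static Node toAbstract(Node node) {
        Node copy = new Node(node.label, node.literal);
        for (Node child : node.children) {
            if (!child.literal) {
                copy.children.add(toAbstract(child));
            }
        }
        return copy;
    }

    /** Renders a tree as label(child, child, ...). */
    static String show(Node node) {
        if (node.children.isEmpty()) {
            return node.label;
        }
        List<String> parts = new ArrayList<>();
        for (Node child : node.children) {
            parts.add(show(child));
        }
        return node.label + "(" + String.join(", ", parts) + ")";
    }

    /** Concrete tree for <head><title>Today</title></head>. */
    static Node example() {
        Node title = rule("TITLE", lit("<title>"), rule("TEXT", text("Today")), lit("</title>"));
        return rule("HEAD", lit("<head>"), title, lit("</head>"));
    }

    public static void main(String[] args) {
        System.out.println(show(example()));
        System.out.println(show(toAbstract(example())));
    }
}
```

The first line printed is HEAD(<head>, TITLE(<title>, TEXT(Today), </title>), </head>) and the second is HEAD(TITLE(TEXT(Today))): the tags are gone, and only the part that depends on the input remains.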

How do we write programs to do this?
The process of getting from the input string to the parse tree consists of two steps:
  Lexical analysis: convert a sequence of characters into a sequence of tokens. Note that java.util.Scanner allows us to do lexical analysis with great ease!
  Syntactic analysis or parsing: analyse a text, made of a sequence of tokens, to determine its grammatical structure with respect to a given grammar.
The assignment will require you to write a recursive descent parser, which is discussed in the next lecture.
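The lexical analysis step can be tried out with a few lines of Java. The following is a minimal sketch, assuming the tokens in the input are already separated by whitespace, so that the default delimiter of java.util.Scanner is enough; the class LexerSketch and the method tokens are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class LexerSketch {
    /** Splits the input into tokens, using Scanner's default whitespace delimiter. */
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints: [while, (, A[k], !=, x, ), {, k++;, }]
        System.out.println(tokens("while ( A[k] != x ) { k++; }"));
    }
}
```

With only whitespace as the separator, A[k] and k++; each come out as a single token. A lexer for a real language would need a finer splitting rule, which Scanner supports through its useDelimiter method.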