COMP261 2015 Parsing 1 of 4 Lecture 21. Parsing text Making sense of "structured text": Compiling programs (javac, g++, Python etc) Rendering web sites.

Slides:



Advertisements
Similar presentations
Translator Architecture Code Generator ParserTokenizer string of characters (source code) string of tokens abstract program string of integers (object.
Advertisements

May 2006CLINT-LN Parsing1 Computational Linguistics Introduction Approaches to Parsing.
Working with Groups - an introduction to facilitation.
CSE 425: Semantic Analysis Semantic Analysis Allows rigorous specification of a program’s meaning –Lets (parts of) programming languages be proven correct.
Chapter 2 Syntax. Syntax The syntax of a programming language specifies the structure of the language The lexical structure specifies how words can be.
GRAMMAR & PARSING (Syntactic Analysis) NLP- WEEK 4.
The Query Compiler Varun Sud ID: 104. Agenda Parsing  Syntax analysis and Parse Trees.  Grammar for a simple subset of SQL  Base Syntactic Categories.
CS 330 Programming Languages 09 / 13 / 2007 Instructor: Michael Eckmann.
Context-Free Grammars Lecture 7
1 Foundations of Software Design Lecture 23: Finite Automata and Context-Free Grammars Marti Hearst Fall 2002.
Parsing — Part II (Ambiguity, Top-down parsing, Left-recursion Removal)
Chapter 2 A Simple Compiler
(2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies.
PPL Syntax & Formal Semantics Lecture Notes: Chapter 2.
PPL Syntax & Formal Semantics Lecture Notes: Chapter 2.
Abstract Syntax Trees Lecture 14 Wed, Mar 3, 2004.
The College of Saint Rose CIS 433 – Programming Languages David Goldschmidt, Ph.D. from Concepts of Programming Languages, 9th edition by Robert W. Sebesta,
2.2 A Simple Syntax-Directed Translator Syntax-Directed Translation 2.4 Parsing 2.5 A Translator for Simple Expressions 2.6 Lexical Analysis.
Syntax & Semantic Introduction Organization of Language Description Abstract Syntax Formal Syntax The Way of Writing Grammars Formal Semantic.
Chpater 3. Outline The definition of Syntax The Definition of Semantic Most Common Methods of Describing Syntax.
CSI 3120, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next.
COMP Parsing 2 of 4 Lecture 22. How do we write programs to do this? The process of getting from the input string to the parse tree consists of.
LANGUAGE DESCRIPTION: SYNTACTIC STRUCTURE
1 Syntax In Text: Chapter 3. 2 Chapter 3: Syntax and Semantics Outline Syntax: Recognizer vs. generator BNF EBNF.
Parsing Lecture 5 Fri, Jan 28, Syntax Analysis The syntax of a language is described by a context-free grammar. Each grammar rule has the form A.
CPS 506 Comparative Programming Languages Syntax Specification.
Daisy Arias Math 382/Lab November 16, 2010 Fall 2010.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 3: Introduction to Syntactic Analysis.
Top-down Parsing lecture slides from C OMP 412 Rice University Houston, Texas, Fall 2001.
Croeso Welcome. Working with Groups - a brief introduction to facilitation.
Syntax Analysis - Parsing Compiler Design Lecture (01/28/98) Computer Science Rensselaer Polytechnic.
Grammars Hopcroft, Motawi, Ullman, Chap 5. Grammars Describes underlying rules (syntax) of programming languages Compilers (parsers) are based on such.
Top-down Parsing. 2 Parsing Techniques Top-down parsers (LL(1), recursive descent) Start at the root of the parse tree and grow toward leaves Pick a production.
1 A Simple Syntax-Directed Translator CS308 Compiler Theory.
Syntax Analysis – Part I EECS 483 – Lecture 4 University of Michigan Monday, September 17, 2006.
CSC312 Automata Theory Lecture # 26 Chapter # 12 by Cohen Context Free Grammars.
Programming Languages and Design Lecture 2 Syntax Specifications of Programming Languages Instructor: Li Ma Department of Computer Science Texas Southern.
Sahar Mosleh California State University San MarcosPage 1 Finite State Machine.
C H A P T E R T W O Linking Syntax And Semantics Programming Languages – Principles and Paradigms by Allen Tucker, Robert Noonan.
Overview of Previous Lesson(s) Over View 3 Model of a Compiler Front End.
Chapter 4: Syntax analysis Syntax analysis is done by the parser. –Detects whether the program is written following the grammar rules and reports syntax.
Comp 311 Principles of Programming Languages Lecture 2 Syntax Corky Cartwright August 26, 2009.
PPL Syntax & Formal Semantics Lecture Notes: Chapter 2.
Syntax Analysis By Noor Dhia Syntax analysis:- Syntax analysis or parsing is the most important phase of a compiler. The syntax analyzer considers.
Chapter 3 – Describing Syntax CSCE 343. Syntax vs. Semantics Syntax: The form or structure of the expressions, statements, and program units. Semantics:
Parsing 1 of 4: Grammars and Parse Trees
Chapter 3 – Describing Syntax
Parsing 2 of 4: Scanner and Parsing
Building AST's for RPAL Programs
A Simple Syntax-Directed Translator
CS 404 Introduction to Compiler Design
CS510 Compiler Lecture 4.
Syntax Analysis Chapter 4.
Abstract Syntax Trees Lecture 14 Mon, Feb 28, 2005.
Formal Language Theory
Welsh Critical Care Improvement Programme
ENERGY 211 / CME 211 Lecture 15 October 22, 2008.
Chapter 2: A Simple One Pass Compiler
Lecture 7: Introduction to Parsing (Syntax Analysis)
CSC 4181Compiler Construction Context-Free Grammars
Lecture 4: Lexical Analysis & Chomsky Hierarchy
George Evangelinos | Senior Learning Technologist
FROM VALENCIA TO BEIJING.
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Building AST's for RPAL Programs
CSC 4181 Compiler Construction Context-Free Grammars
Tuesday Pick up the handout, Emancipation: A Life Fable from the handout station. Chunk, read, and begin to summarize your sections When you have finished.
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
COMPILER CONSTRUCTION
Faculty of Computer Science and Information System
Presentation transcript:

COMP Parsing 1 of 4 Lecture 21

Parsing text Making sense of "structured text": Compiling programs (javac, g++, Python etc) Rendering web sites (Chrome, Mozilla Firefox, etc) Processing database queries (PostgreSQL, MySQL, etc) Checking grammar Understanding natural language

What is “structured text” My Web Page Thank you for viewing my silly site! DELETE FROM DomesticStudentsFor2012 WHERE mark = ‘E’;

Not “structured text” I KEEP six honest serving-men (they taught me all I knew); Their names are What and Why and When and How and Where and Who. I send them over land and sea, I send them east and west; But after they have worked for me, I give them all a rest. I let them rest from nine till five, for I am busy then, As well as breakfast, lunch, and tea, for they are hungry men. But different folk have different views; I know a person small— She keeps ten million serving-men, who get no rest at all! She sends 'em abroad on her own affairs, From the second she opens her eyes— One million Hows, two million Wheres, And seven million Whys! The Elephant's child, Rudyard Kipling

What kind of text we really mean? Text is “structured” if it can be described using a grammar: A grammar is a set of rules of a specific kind, for forming strings in a formal language. The rules describe how to form strings from the language’s alphabet that are valid according to the language’s syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context – only their form.

A grammar example A simple html grammar: HTMLFILE ::= “ ” [ HEAD ] BODY “ ” HEAD ::= “ ” TITLE “ ” TITLE ::= “ ” TEXT “ ” BODY ::= “ ” [ BODYTAG ]* “ ” BODYTAG ::= H1TAG | PTAG | OLTAG | ULTAG H1TAG ::= “ ” TEXT “ ” PTAG ::= “ ” TEXT “ ” OLTAG ::= “ ” [ LITAG ]+ “ ” ULTAG ::= “ ” [ LITAG ]+ “ ” LITAG ::= “ ” TEXT “ ” TEXT ::= sequence of characters other than

Nonterminals elements of the grammar that are not part of the text defined by rules. HTMLFILE ::= “ ” [ HEAD ] BODY “ ” HEAD ::= “ ” TITLE “ ” TITLE ::= “ ” TEXT “ ” BODY ::= “ ” [ BODYTAG ]* “ ” BODYTAG ::= H1TAG | PTAG | OLTAG | ULTAG H1TAG ::= “ ” TEXT “ ” PTAG ::= “ ” TEXT “ ” OLTAG ::= “ ” [ LITAG ]+ “ ” ULTAG ::= “ ” [ LITAG ]+ “ ” LITAG ::= “ ” TEXT “ ” TEXT ::= sequence of characters other than Top level nonterminal usually first | = "or" [NT ] = "optional" [NT ]* = "any number of times" [NT ]+ = "one or more times"

Terminals literal strings or patterns of characters HTMLFILE ::= “ ” [ HEAD ] BODY “ ” HEAD ::= “ ” TITLE “ ” TITLE ::= “ ” TEXT “ ” BODY ::= “ ” [ BODYTAG ]* “ ” BODYTAG ::= H1TAG | PTAG | OLTAG | ULTAG H1TAG ::= “ ” TEXT “ ” PTAG ::= “ ” TEXT “ ” OLTAG ::= “ ” [ LITAG ]+ “ ” ULTAG ::= “ ” [ LITAG ]+ “ ” LITAG ::= “ ” TEXT “ ” TEXT ::= sequence of characters other than

Using the Grammar Given some text: Today My Day meeting lecture parsing stuff Is it a valid piece of HTML? –Does it conform to the grammar rules? What is the structure? (Needed in order to process it) –what are the components? –what types are the components? –how are they related?

What kind of structure? Text conforming to a grammar has a tree structure –Ordered tree – order of the children matters –Each node in the tree and its children correspond to a grammar rule –Each internal node labeled by the nonterminal on LHS of rule –Leaves correspond to terminals. A concrete parse tree represents the syntactic structure of a string according to some formal grammar, showing all the components of the rules An abstract syntax tree leaves out elements of the rules that are not essential to the structure.

Concrete Parse Tree: HTMLFILE HEAD BODY TITLE BODYTAG H1TAG My Day ULTAG PTAG TEXT parsing stuff LITAG meeting Today lecture TEXT

Is that too much information? Yes! For example, we know that every HEAD will contain “ ” and “ ” terminals, we only care about what TITLE there is and only the unknown string part of that title. Definition: 1.An abstract syntax tree (AST) is a tree representation of the abstract syntactic structure of the text. 2.Each node of the tree denotes a construct occurring in the text. 3.The syntax is ‘abstract’ in that it does not represent all the elements of the full syntax.

Abstract Syntax Tree : HTMLFILE HEAD BODY TITLE BODYTAG H1TAG My Day ULTAG PTAG TEXT parsing stuff LITAG meeting Today lecture TEXT

Abstract Syntax Tree (AST) HTMLFILE HEAD BODY TITLE Today BODYTAG H1TAG My Day ULTAG PTAG TEXT parsing stuff LITAG meeting lecture TEXT

How do we write programs to do this? The process of getting from the input string to the parse tree consists of two steps: 1.Lexical analysis: the process of converting a sequence of characters into a sequence of tokens. Note that java.util.Scanner allows us to do lexical analysis with great ease! 2.Syntactic analysis or parsing: the process of analysing text, made of a sequence of tokens to determine its grammatical structure with respect to a given grammar. Assignment will require you to write a recursive descent parser discussed in the next lecture!