LING/C SC/PSYC 438/538 Lecture 17 Sandiway Fong



Last Time
Talked about:
– 1. Declarative (logical) reading of grammar rules
– 2. Prolog query: s(String,[]).
   Case 1. String is known: Is String ∈ L(G)?
   Case 2. String is unknown: Enumerate L(G)
– 3. Different search strategies
   Prolog's (left-to-right) depth-first search
   Iterative deepening
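The two query modes can be tried on any small DCG. The sketch below uses the sheeptalk grammar that appears later in this lecture. The helper name by_length/1 is an assumption introduced here, and enumerating by increasing string length is only a simple stand-in for iterative deepening, not necessarily the strategy shown in class.

```prolog
% Sheeptalk DCG: b a a+ !
s --> [b], [a], a, [!].
a --> [a].          % base case
a --> [a], a.       % recursive case

% Hypothetical helper: enumerate strings shortest-first by fixing the
% list length before parsing (length/2 tries 0, 1, 2, ... on backtracking).
by_length(String) :- length(String, _), s(String, []).

% Case 1, String known (set membership):
%   ?- s([b,a,a,a,!], []).
% Case 2, String unknown (set enumeration; type ; for more answers):
%   ?- s(String, []).
%   ?- by_length(String).
```

Both enumeration queries produce [b,a,a,!] as their first answer. For grammars where plain depth-first search runs away down one branch, bounding the length first keeps the enumeration fair.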

Beyond Regular Languages
Beyond regular languages
– $a^n b^n$ = {ab, aabb, aaabbb, aaaabbbb, ...}, $n \ge 1$
– is not a regular language
That means no FSA, RE or RG can be built for this set
1. We only have a finite number of states to play with …
2. We're only allowed simple free iteration (looping)
3. Pumping Lemma proof
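For reference, a standard pumping-lemma argument for this language runs as follows. This is a textbook sketch; the symbols $p$, $w$, $x$, $y$, $z$ and $k$ are introduced here and do not come from the slide.

```latex
Suppose $L = \{a^n b^n \mid n \ge 1\}$ were regular, with pumping length $p$.
Take $w = a^p b^p \in L$. Any split $w = xyz$ with $|xy| \le p$ and $|y| \ge 1$
forces $y = a^k$ for some $k \ge 1$. Pumping once gives
\[
  x y^2 z = a^{p+k} b^{p} \notin L ,
\]
contradicting the lemma, so $L$ is not regular.
```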

Beyond Regular Languages
Language
– $a^n b^n$ = {ab, aabb, aaabbb, aaaabbbb, ...}, $n \ge 1$
A regular grammar extended to allow both left and right recursive rules can accept/generate it:
1. a --> [a], b.
2. b --> [b].
3. b --> a, [b].
Example:
– Set membership
– Set enumeration
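The set membership and set enumeration queries for rules 1 to 3 can be reproduced roughly as below. This is a sketch that assumes the start symbol is a, which the slide does not state explicitly.

```prolog
% Rules 1-3 from the slide. The nonterminals a and b share their names
% with the terminals, but DCG terminals always appear in list brackets.
a --> [a], b.
b --> [b].
b --> a, [b].

% Set membership (assumed start symbol: a):
%   ?- a([a,a,b,b], []).      % succeeds
%   ?- a([a,a,b], []).        % fails
% Set enumeration (type ; for more answers):
%   ?- a(String, []).
%   String = [a,b] ;
%   String = [a,a,b,b] ;
%   String = [a,a,a,b,b,b] ...
```

Rule 3 is not left recursive in practice, because every call to a consumes an [a] before it reaches b again, so both queries make progress under Prolog's depth-first search.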

Beyond Regular Languages
Language
– $a^n b^n$ = {ab, aabb, aaabbb, aaaabbbb, ...}, $n \ge 1$
A regular grammar extended to allow both left and right recursive rules can accept/generate it:
1. a --> [a], b.
2. b --> [b].
3. b --> a, [b].
Intuition:
– grammar implements the stacking of partial trees balanced for a's and b's
[Figure: partial trees built from nonterminals A, B and terminals a, b]

Beyond Regular Languages
Language
– $a^n b^n$ = {ab, aabb, aaabbb, aaaabbbb, ...}, $n \ge 1$
A regular grammar extended to allow both left and right recursive rules can accept/generate it:
1. a --> [a], b.
2. b --> [b].
3. b --> a, [b].
A type-2 or context-free grammar (CFG) has no restrictions on what can go on the RHS of a grammar rule
Note:
– CFGs still have a single nonterminal limit for the LHS of a rule
Example:
1. s --> [a], [b].
2. s --> [a], s, [b].
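The two-rule CFG in the example can be loaded and queried directly. The queries in the comments are illustrative and not taken from the slide.

```prolog
s --> [a], [b].       % base case: ab
s --> [a], s, [b].    % wrap one more a ... b around a smaller s

% ?- s([a,a,a,b,b,b], []).    % succeeds
% ?- s([a,a,b], []).          % fails
% ?- s(String, []).           % enumerates [a,b], [a,a,b,b], ...
```

Compared with the three-rule version, the centre-embedded s makes the matching of each a with its b visible in a single rule.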

Extra Argument: Parse Tree
Recovering a parse tree
– when we want Prolog to return more than just true/false answers
– in case of true, we can compute a syntax tree representation of the parse
– by adding an extra argument to nonterminals
– applies to all grammar rules (not just regular grammars)
Example
– sheeptalk again
– DCG (non-regular, context-free rules):
   s --> [b], [a], a, [!].
   a --> [a]. (base case)
   a --> [a], a. (recursive case)
[Figure: parse tree for the string b a a a !, rooted at s]

Extra Argument: Parse Tree
Tree:
[Figure: parse tree for the string b a a a !, rooted at s]
Prolog data structure:
– term
– hierarchical
– allows sequencing of arguments
– functor(arg_1,..,arg_n)
– each arg_i could be another term or simple atom
Tree as a term: s(b,a,a(a,a(a)),!)
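To make the term structure concrete, the built-in predicates =../2, functor/3 and arg/3 can take the tree apart. The predicate name tree/1 is a hypothetical wrapper added for this sketch.

```prolog
% The parse tree from the slide, stored as a Prolog term.
% tree/1 is a name made up for this example.
tree(s(b, a, a(a, a(a)), !)).

% Split a term into its functor name and argument list:
%   ?- tree(T), T =.. [Name|Args].
%   Name = s, Args = [b, a, a(a, a(a)), !].
% Ask for the functor name and the number of arguments:
%   ?- tree(T), functor(T, Name, Arity).
%   Name = s, Arity = 4.
% Pick out the third argument, itself a term (the subtree for a):
%   ?- tree(T), arg(3, T, Sub).
%   Sub = a(a, a(a)).
```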

Extra Arguments: Parse Tree
DCG
– s --> [b],[a], a, [!].
– a --> [a]. (base case)
– a --> [a], a. (right recursive case)
Idea: for each nonterminal, add an argument to store its subtree
base case
– a --> [a].
– a(subtree) --> [a].
– a(a(a)) --> [a].
recursive case
– a --> [a], a.
– a(subtree) --> [a], a(subtree).
– a(a(a,A)) --> [a], a(A).
[Figure: parse tree] s(b,a,a(a,a(a)),!)

Extra Arguments: Parse Tree
Prolog grammar
– s --> [b], [a], a, [!].
– a --> [a]. (base case)
– a --> [a], a. (right recursive case)
base and recursive cases
– a(a(a)) --> [a].
– a(a(a,A)) --> [a], a(A).
start symbol case
– s --> [b], [a], a, [!].
– s(tree) --> [b], [a], a(subtree), [!].
– s(s(b,a,A,!)) --> [b], [a], a(A), [!].
[Figure: parse tree] s(b,a,a(a,a(a)),!)

Extra Arguments: Parse Tree
Prolog grammar
– s --> [b], [a], a, [!].
– a --> [a]. (base case)
– a --> [a], a. (right recursive case)
Equivalent Prolog grammar computing a parse
– s(s(b,a,A,!)) --> [b], [a], a(A), [!].
– a(a(a)) --> [a].
– a(a(a,A)) --> [a], a(A).
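Loaded as a file, the parse-computing grammar returns the tree as a binding. The sketch below calls it through phrase/2; the variable name Tree is chosen here for illustration.

```prolog
s(s(b,a,A,!)) --> [b], [a], a(A), [!].
a(a(a))       --> [a].              % base case: subtree for a single a
a(a(a,A))     --> [a], a(A).        % recursive case: a plus the rest

% ?- phrase(s(Tree), [b,a,a,a,!]).
% Tree = s(b, a, a(a, a(a)), !).
% ?- phrase(s(Tree), [b,a,a,!]).
% Tree = s(b, a, a(a), !).
```

phrase(s(Tree), List) is equivalent to the query s(Tree, List, []); the extra argument comes first, followed by the two list arguments that the DCG translation adds.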

Extra Arguments
Extra arguments are powerful
– they allow us to impose (grammatical) constraints and change the expressive power of the system (if used as memory)
Example:
– $\{a^n b^n c^n \mid n>0\}$ is not context-free (it is context-sensitive)

Extra arguments
A context-free grammar (CFG) + extra argument (EA) for the context-sensitive language $\{a^n b^n c^n \mid n>0\}$:
1. s(s(A,A,A)) --> a(A), b(A), c(A).
2. a(a(a)) --> [a].
3. a(a(a,X)) --> [a], a(X).
4. b(a(a)) --> [b].
5. b(a(a,X)) --> [b], b(X).
6. c(a(a)) --> [c].
7. c(a(a,X)) --> [c], c(X).

Extra arguments
A CFG+EA grammar for $\{a^n b^n c^n \mid n>0\}$:
– Set membership question

Extra arguments
A CFG+EA grammar for $\{a^n b^n c^n \mid n>0\}$:
– Set enumeration
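Both question types can be put to the CFG+EA grammar as follows. This is a sketch; the queries and variable names in the comments are illustrative rather than the ones on the original slides.

```prolog
% b//1 and c//1 build their argument with the same a(...) functor as
% a//1, so the shared variable A in s forces all three counts to match.
s(s(A,A,A)) --> a(A), b(A), c(A).
a(a(a))   --> [a].
a(a(a,X)) --> [a], a(X).
b(a(a))   --> [b].
b(a(a,X)) --> [b], b(X).
c(a(a))   --> [c].
c(a(a,X)) --> [c], c(X).

% Set membership:
%   ?- phrase(s(_), [a,a,b,b,c,c]).     % succeeds
%   ?- phrase(s(_), [a,a,b,c,c]).       % fails
% Set enumeration (type ; for more answers):
%   ?- phrase(s(_), String).
%   String = [a,b,c] ;
%   String = [a,a,b,b,c,c] ...
```

The extra argument acts as a unary counter: a(a) stands for one symbol and each extra layer a(a,X) adds one more, which is the "memory" that takes the grammar beyond context-free power.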

Another grammar for $\{a^n b^n c^n \mid n>0\}$
Use Prolog's arithmetic predicates.
{ … } embeds Prolog code inside grammar rules
– These goals are neither nonterminal nor terminal symbols.
– When used in grammar rules, they must be enclosed within curly braces.

Another Grammar for $\{a^n b^n c^n \mid n>0\}$
Explicit computation of the number of a's using arithmetic.
{ … } embeds Prolog code inside grammar rules

Another Grammar for $\{a^n b^n c^n \mid n>0\}$
Parsing the a's

Another Grammar for $\{a^n b^n c^n \mid n>0\}$
Computing the b's

Another Grammar for $\{a^n b^n c^n \mid n>0\}$
Computing the c's

Another grammar for $\{a^n b^n c^n \mid n>0\}$
Grammar is “correct” but not so efficient…
– consider string [a,a,b,b,b,b,b,b,b,c,c]
   s --> a(X), b(X), c(X).
   a(1) --> [a].
   a(N) --> [a], a(M), {N is M+1}.
   b(1) --> [b].
   b(N) --> [b], b(M), {N is M+1}.
   c(1) --> [c].
   c(N) --> [c], c(M), {N is M+1}.
– counts upwards
– could change to count down
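One way the count-down change might look is sketched below. This variant is an assumption, since the slide only hints at it: a//1 still counts upwards because the number of a's is unknown until they have been read, while b//1 and c//1 receive that number already bound and count down to 1.

```prolog
s --> a(N), b(N), c(N).

% Counts upwards: N is computed after the recursive call returns M.
a(1) --> [a].
a(N) --> [a], a(M), { N is M+1 }.

% Count-down variants (not from the slide): N is bound on entry, so the
% rule stops after exactly N symbols instead of reading every b or c
% first and checking the total afterwards.
b(1) --> [b].
b(N) --> { N > 1, M is N-1 }, [b], b(M).
c(1) --> [c].
c(N) --> { N > 1, M is N-1 }, [c], c(M).

% ?- phrase(s, [a,a,b,b,c,c]).               % succeeds
% ?- phrase(s, [a,a,b,b,b,b,b,b,b,c,c]).     % fails after reading two b's
```

With the count-up rules, b(X) called with X = 2 on the seven-b string tries every possible run of b's and rejects each total only after the arithmetic; the count-down rules commit to two b's and fail as soon as c meets a third b.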

A context-sensitive grammar for $\{a^n b^n c^n \mid n>0\}$
Context-sensitive grammar has rules of the form LHS → RHS
– such that both LHS and RHS can be arbitrary strings of terminals and non-terminals, and
– |RHS| ≥ |LHS| (exception: S → ε, S not in RHS)
This is almost a normal Prolog DCG:
– (but rules 5 & 6 contain more than one symbol on the LHS: a non-terminal followed by a terminal):
1. s --> [a,b,c].
2. s --> [a],a,[b,c].
3. a --> [a,b], c.
4. a --> [a],a,[b],c.
5. c,[b] --> [b], c.
6. c,[c] --> [c,c].
Rules 5 and 6 are responsible for shuffling the c's to the end

A context-sensitive grammar for $\{a^n b^n c^n \mid n>0\}$
DCG rules:
1. s --> [a,b,c].
2. s --> [a],a,[b,c].
3. a --> [a,b], c.
4. a --> [a],a,[b],c.
5. c,[b] --> [b], c.
6. c,[c] --> [c,c].
Translated Prolog clauses:
?- listing([s,a,c]).
1. s([a, b, c|A], A).
2. s([a|A], C) :- a(A, B), B=[b, c|C].
3. a([a, b|A], B) :- c(A, B).
4. a([a|A], D) :- a(A, B), B=[b|C], c(C, D).
5. c(A, C) :- A=[b|B], c(B, D), C=[b|D].
6. c([c, c|A], [c|A]).
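The six rules load as written in SWI-Prolog, which accepts a terminal list after the head nonterminal as pushback. The sketch below adds only comments and sample queries; exact variable names in the listing output may differ between Prolog versions.

```prolog
s --> [a,b,c].
s --> [a], a, [b,c].
a --> [a,b], c.
a --> [a], a, [b], c.
c, [b] --> [b], c.     % pushback: read b then c, put the b back on the input
c, [c] --> [c,c].      % pushback: read c c, put one c back on the input

% Inspect the translated clauses:
%   ?- listing([s,a,c]).
% Membership queries:
%   ?- phrase(s, [a,a,a,b,b,b,c,c,c]).   % succeeds
%   ?- phrase(s, [a,a,b,b,b,c,c,c]).     % fails
```

Only membership queries are shown here: with an unbound string, rule 5 can recurse before anything constrains its input, so plain enumeration with this grammar may not terminate.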

A context-sensitive grammar for $\{a^n b^n c^n \mid n>0\}$
Derivation of [a,a,a,b,b,b,c,c,c] using DCG rules 1 to 6:
1. s
2. [a],a,[b,c]
3. [a],[a],a,[b],c,[b,c]
4. [a],[a],[a,b],c,[b],c,[b,c]
5. [a],[a],[a,b],[b],c,c,[b,c]
6. [a],[a],[a,b],[b],c,[b],c,[c]
7. [a],[a],[a,b],[b],[b],c,c,[c]
8. [a],[a],[a,b],[b],[b],c,[c,c]
9. [a],[a],[a,b],[b],[b],[c,c,c]
10. [a,a,a,b,b,b,c,c,c]

A context-sensitive grammar for $\{a^n b^n c^n \mid n>0\}$
Trace using the translated clauses 1 to 6 for s, a and c:
1. s([a,a,a,b,b,b,c,c,c],[])
   1. a([a,a,b,b,b,c,c,c],B)
      1. a([a,b,b,b,c,c,c],B)
         1. c([b,b,c,c,c],B)
            1. c([b,c,c,c],D)
               1. c([c,c,c],D)
               2. => c([c,c,c],[c,c])
               3. C=[b|[c,c]]
            2. => c([b,c,c,c],[b,c,c])
            3. C=[b|[b,c,c]]
         2. => c([b,b,c,c,c],[b,b,c,c])
      2. => a([a,b,b,b,c,c,c],[b,b,c,c])
      3. [b,b,c,c]=[b|C] (C=[b,c,c])
      4. c([b,c,c],D)
         1. c([c,c],D)
         2. => c([c,c],[c])
         3. C=[b|[c]]
      5. => c([b,c,c],[b,c])
   2. => a([a,a,b,b,b,c,c,c],[b,c])
   3. [b,c]=[b,c|[]]

A context-sensitive grammar for $\{a^n b^n c^n \mid n>0\}$
Trace using the translated clauses 1 to 6 for s, a and c:
1. s([a,a,b,b,b,c,c,c],[])
   1. a([a,b,b,b,c,c,c],B)
      1. c([b,b,c,c,c],B)
         1. c([b,c,c,c],D)
            1. c([c,c,c],D)
            2. => c([c,c,c],[c,c])
            3. C=[b|[c,c]]
         2. => c([b,c,c,c],[b,c,c])
         3. C=[b|[b,c,c]]
      2. => c([b,b,c,c,c],[b,b,c,c])
   2. => a([a,b,b,b,c,c,c],[b,b,c,c])
   3. [b,b,c,c]=[b,c|[]] FAIL

A context-sensitive grammar for $\{a^n b^n c^n \mid n>0\}$
Trace using the translated clauses 1 to 6 for s, a and c:
1. s([a,a,a,b,b,c,c,c],[])
   1. a([a,a,b,b,c,c,c],B)
      1. a([a,b,b,c,c,c],B)
         1. c([b,c,c,c],B)
            1. c([c,c,c],D)
            2. => c([c,c,c],[c,c])
            3. C=[b|[c,c]]
         2. => c([b,c,c,c],[b,c,c])
      2. => a([a,b,b,c,c,c],[b,c,c])
      3. [b,c,c]=[b|C] (C=[c,c])
      4. c([c,c],D)
      5. => c([c,c],[c])
   2. => a([a,a,b,b,c,c,c],[c])
   3. [c]=[b,c|[]] FAIL

A context-sensitive grammar for $\{a^n b^n c^n \mid n>0\}$
Trace using the translated clauses 1 to 6 for s, a and c:
1. s([a,a,a,b,b,b,c,c],[])
   1. a([a,a,b,b,b,c,c],B)
      1. a([a,b,b,b,c,c],B)
         1. c([b,b,c,c],B)
            1. c([b,c,c],D)
               1. c([c,c],D)
               2. => c([c,c],[c])
               3. C=[b|[c]]
            2. => c([b,c,c],[b,c])
            3. C=[b|[b,c]]
         2. => c([b,b,c,c],[b,b,c])
      2. => a([a,b,b,b,c,c],[b,b,c])
      3. [b,b,c]=[b|C] (C=[b,c])
      4. c([b,c],D)
         1. c([c],D) FAIL

Natural Language Parsing
Syntax trees are a big deal in NLP
Stanford Parser
– Uses probabilistic rules learnt from a Treebank corpus
We do a lot with Treebanks in the follow-on course to this one (LING 581, Spring)