LING/C SC/PSYC 438/538 Lecture 17 Sandiway Fong.



Today's Topics
Homework 10 graded
Homework 12 out today, due next Monday at midnight

Regular Languages
Three formalisms, same expressive power:
  Regular expressions
  Finite State Automata
  Regular Grammars (we’ll look at this case using Prolog)

Chomsky Hierarchy
division of grammar into subclasses partitioned by “generative power/capacity”
Type-0: General rewrite rules
  Turing-complete, powerful enough to encode any computer program
  can simulate a Turing machine
  anything that’s “computable” can be simulated using a Turing machine
Type-1: Context-sensitive rules
  weaker, but still very powerful
  e.g. a^n b^n c^n
Type-2: Context-free rules
  weaker still
  e.g. a^n b^n
  Pushdown Automata (PDA)
Type-3: Regular grammar rules
  very restricted
  Regular Expressions, e.g. a+b+
  Finite State Automata (FSA)
Natural languages?
[Figure: tape with a read/write head (move left / stop / move right); labels: finite state transducer, read, write]

[Figure: Chomsky Hierarchy diagram. Labels: Type-3 (Regular Grammars, FSA, Regular Expressions), Type-2, Type-1, DCG = Type-0]

Prolog Grammar Rule System
known as “Definite Clause Grammars” (DCG)
  based on type-2 restrictions (context-free grammars)
  but with extensions (powerful enough to encode the hierarchy all the way up to type-0)
Prolog was originally designed (1970s) to also support natural language processing
we’ll start with the bottom of the hierarchy, i.e. the least powerful: regular grammars (type-3)

Definite Clause Grammars (DCG)
Background: a “typical” formal grammar contains 4 things <N,T,P,S>
  a set of non-terminal symbols (N)
    these symbols will be expanded or rewritten by the rules
  a set of terminal symbols (T)
    these symbols cannot be expanded
  production rules (P) of the form LHS → RHS
    in regular and CF grammars, LHS must be a single non-terminal symbol
    RHS: a sequence of terminal and non-terminal symbols, possibly with restrictions, e.g. for regular grammars
  a designated start symbol (S)
    a non-terminal to start the derivation
Language: the set of terminal strings generated by <N,T,P,S>, e.g. through a top-down derivation

Definite Clause Grammars (DCG)
Example grammar (regular):
  S → aB
  B → aB
  B → bC
  B → b
  C → bC
  C → b
Background: a “typical” formal grammar contains 4 things <N,T,P,S>
  a set of non-terminal symbols (N)
  a set of terminal symbols (T)
  production rules (P) of the form LHS → RHS
  a designated start symbol (S)
Notes:
  Start symbol: S
  Non-terminals: {S,B,C} (uppercase letters)
  Terminals: {a,b} (lowercase letters)

Definite Clause Grammars (DCG)
Example:
  Formal grammar    DCG format
  S → aB            s --> [a],b.
  B → aB            b --> [a],b.
  B → bC            b --> [b],c.
  B → b             b --> [b].
  C → bC            c --> [b],c.
  C → b             c --> [b].
Notes:
  Start symbol: S
  Non-terminals: {S,B,C} (uppercase letters)
  Terminals: {a,b} (lowercase letters)
DCG format:
  both terminals and non-terminal symbols begin with lowercase letters
    variables begin with an uppercase letter (or underscore)
  --> is the rewrite symbol
  terminals are enclosed in square brackets (list notation)
  nonterminals don’t have square brackets surrounding them
  the comma (,) represents the concatenation symbol
  a period (.) is required at the end of every DCG rule

Regular Grammars
Regular or Chomsky hierarchy type-3 grammars are a class of formal grammars with a restricted RHS
  LHS → RHS: “LHS rewrites/expands to RHS”
  every rule has at most a single non-terminal, and (possibly) a single terminal, on the right-hand side
Canonical Forms:
  x --> y, [t].     x --> [t].     (left recursive)
or
  x --> [t], y.     x --> [t].     (right recursive)
where x and y are non-terminal symbols and t (enclosed in square brackets) represents a terminal symbol.
Note: can’t mix these two forms (and still have a regular grammar)!
  can’t have both left and right recursive rules in the same grammar
Terminology: also called “left/right linear”

Definite Clause Grammars (DCG)
  1. s --> [a],b.
  2. b --> [a],b.
  3. b --> [b],c.
  4. b --> [b].
  5. c --> [b],c.
  6. c --> [b].
What language does our regular grammar generate?
  one or more a’s followed by one or more b’s
By writing the grammar in Prolog, we have a ready-made recognizer program
  no need to write a separate grammar rule interpreter (in this case)
Example queries:
  ?- s([a,a,b,b,b],[]).
  Yes
  ?- s([a,b,a],[]).
  No
Note: the query uses the start symbol s with two arguments:
  (1) the sequence (as a list) to be recognized, and
  (2) the empty list []
Prolog lists: in square brackets, separated by commas, e.g. [a] [a,b,c]
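The recognizer described above can be tried directly. The following is a minimal sketch, assuming SWI-Prolog; the helper name recognize/1 is hypothetical and not part of the lecture. It uses the standard built-in phrase/2, which calls the start symbol and supplies the empty-list second argument for us.

```prolog
% Regular grammar: one or more a's followed by one or more b's.
s --> [a], b.
b --> [a], b.
b --> [b], c.
b --> [b].
c --> [b], c.
c --> [b].

% recognize(+Tokens): hypothetical helper (not from the slides).
% phrase(s, Tokens) behaves like the query s(Tokens, []).
recognize(Tokens) :- phrase(s, Tokens).
```

After loading the file, the query s([a,a,b,b,b],[]) succeeds and recognize([a,b,a]) fails, matching the two example queries on the slide.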

Prolog lists revisited
Perl lists:
  @list = ("a", "b", "c");
  @list = qw(a b c);
  @list = ();
Python lists:
  list = ["a", "b", "c"]
  list = []
Prolog lists:
  List = [a, b, c]             (List is a variable)
  List = [a|[b|[c|[]]]]        (a = head, tail = [b|[c|[]]])
  List = []
Mixed notation:
  [a|[b,c]]
  [a,b|[c]]
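To make the head/tail notation concrete, here is a small sketch, assuming any standard Prolog system. The predicate names same_list/0 and head_tail/3 are made up for illustration.

```prolog
% same_list/0 succeeds because all four notations denote the same list.
same_list :-
    L1 = [a, b, c],
    L2 = [a|[b|[c|[]]]],
    L3 = [a|[b, c]],
    L4 = [a, b|[c]],
    L1 == L2, L2 == L3, L3 == L4.

% head_tail(+List, -Head, -Tail): split a non-empty list using [H|T].
head_tail([Head|Tail], Head, Tail).
```

For example, the query head_tail([a,b,c], H, T) binds H = a and T = [b, c], while head_tail([], H, T) fails because the empty list has no head.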

Regular Grammars
Tree representation. Example:
  ?- s([a,a,b],[]).
  true
Derivation:
  s
  [a], b           (rule 1)
  [a],[a],b        (rule 2)
  [a],[a],[b]      (rule 4)    all terminals, so we stop
There’s a choice of rules for nonterminal b:
  Prolog tries the first rule
  through backtracking, it can try other choices
Our regular grammar:
  1. s --> [a],b.
  2. b --> [a],b.
  3. b --> [b],c.
  4. b --> [b].
  5. c --> [b],c.
  6. c --> [b].
Using trace, we can observe the progress of the derivation…

Regular Grammars
Tree representation. Example:
  ?- s([a,a,b,b,b],[]).
Derivation:
  s
  [a], b                  (rule 1)
  [a],[a],b               (rule 2)
  [a],[a],[b],c           (rule 3)
  [a],[a],[b],[b],c       (rule 5)
  [a],[a],[b],[b],[b]     (rule 6)
Grammar:
  1. s --> [a],b.
  2. b --> [a],b.
  3. b --> [b],c.
  4. b --> [b].
  5. c --> [b],c.
  6. c --> [b].
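One way to have Prolog report which rules a derivation used is to give each nonterminal an extra argument that collects rule numbers. This is only a sketch, assuming SWI-Prolog; the extra argument is a DCG extension added here for illustration and is not part of the plain regular grammar on the slide.

```prolog
% Each nonterminal carries the list of rule numbers used, in the
% order a top-down, left-to-right derivation applies them.
s([1|Rules]) --> [a], b(Rules).
b([2|Rules]) --> [a], b(Rules).
b([3|Rules]) --> [b], c(Rules).
b([4])       --> [b].
c([5|Rules]) --> [b], c(Rules).
c([6])       --> [b].
```

The query phrase(s(Rules), [a,a,b]) gives Rules = [1, 2, 4], and phrase(s(Rules), [a,a,b,b,b]) gives Rules = [1, 2, 3, 5, 6], the same rule sequences as the two derivations worked through by hand.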

Prolog Derivations
Prolog’s computation rule:
  try the first matching rule in the database (remember others for backtracking)
  backtrack if the matching rule leads to failure (or if asked for more solutions)
    undo and try the next matching rule
For grammars:
  top-down, left-to-right derivations
  left-to-right = expand leftmost nonterminal first
  leftmost expansion done recursively = depth-first
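Backtracking can be observed by asking for every solution of a bounded query. The sketch below assumes SWI-Prolog and uses the six-rule grammar from the earlier slides; strings_of_length/2 is a hypothetical helper. Fixing the list length first keeps the search finite.

```prolog
s --> [a], b.
b --> [a], b.
b --> [b], c.
b --> [b].
c --> [b], c.
c --> [b].

% strings_of_length(+N, -Strings): all strings of exactly N tokens
% that the grammar generates, in the order backtracking finds them.
strings_of_length(N, Strings) :-
    length(Tokens, N),                 % a list of N unbound variables
    findall(Tokens, phrase(s, Tokens), Strings).
```

The query strings_of_length(3, Strings) yields Strings = [[a,a,b],[a,b,b]]: rule 2 is tried before rules 3 and 4, so the string with two a's is found first.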

Prolog Derivations
For a top-down derivation, logically, we have:
  Choice: about which rule to use for nonterminals b and c
  No choice: about which nonterminal to expand next
Grammar:
  1. s --> [a],b.
  2. b --> [a],b.
  3. b --> [b],c.
  4. b --> [b].
  5. c --> [b],c.
  6. c --> [b].
Bottom-up derivation for [a,a,b,b]:
  [a],[a],[b],[b]
  [a],[a],[b],c      (rule 6)
  [a],[a],b          (rule 3)
  [a],b              (rule 2)
  s                  (rule 1)
Prolog doesn’t give you bottom-up derivations for free … you’d have to program it up separately
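As a rough illustration of what "programming it up separately" could look like, here is a naive bottom-up recognizer, assuming SWI-Prolog. It is a sketch, not the lecture's method: the rule/2 encoding, the nt/1 wrapper that distinguishes nonterminal b from terminal b, and the predicate name reduces_to_s/1 are all assumptions made for this example.

```prolog
:- use_module(library(lists)).   % append/3

% rule(LHS, RHS): the six grammar rules; nonterminals are wrapped in nt/1.
rule(s, [a, nt(b)]).
rule(b, [a, nt(b)]).
rule(b, [b, nt(c)]).
rule(b, [b]).
rule(c, [b, nt(c)]).
rule(c, [b]).

% reduces_to_s(+Form): Form can be reduced, rule by rule, to the start symbol.
reduces_to_s([nt(s)]).
reduces_to_s(Form) :-
    rule(LHS, RHS),
    append(Pre, Rest, Form),              % pick a position in Form
    append(RHS, Post, Rest),              % RHS occurs at that position
    append(Pre, [nt(LHS)|Post], Reduced), % replace RHS by LHS
    reduces_to_s(Reduced).
```

Every reduction removes one terminal from the form, so the search always terminates. The query once(reduces_to_s([a,a,b,b])) succeeds, and reduces_to_s([a,b,a]) fails.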

SWI Prolog
Grammar rules:
  1. s --> [a],b.
  2. b --> [a],b.
  3. b --> [b],c.
  4. b --> [b].
  5. c --> [b],c.
  6. c --> [b].
Grammar rules are translated into Prolog rules when the program is loaded.
This solves the mystery of why we have to type two arguments with the nonterminal at the command prompt.
Recall list notation: [1|[2,3,4]] = [1,2,3,4]
Translated rules:
  1. s([a|A], B) :- b(A, B).
  2. b([a|A], B) :- b(A, B).
  3. b([b|A], B) :- c(A, B).
  4. b([b|A], A).
  5. c([b|A], B) :- c(A, B).
  6. c([b|A], A).
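The role of the second argument becomes clearer when it is left as a variable. The sketch below assumes any standard Prolog and writes the translated clauses out by hand in the style shown on the slide, rather than relying on what a particular SWI-Prolog version actually generates.

```prolog
% Hand-written Prolog clauses equivalent to the six DCG rules.
% First argument: the input list; second argument: what is left over
% after the nonterminal has consumed its part of the input.
s([a|A], B) :- b(A, B).
b([a|A], B) :- b(A, B).
b([b|A], B) :- c(A, B).
b([b|A], A).
c([b|A], B) :- c(A, B).
c([b|A], A).
```

The query s([a,a,b,x,y], Rest) succeeds with Rest = [x, y]: the grammar consumes a a b and hands back the unconsumed remainder. Passing [] as the second argument therefore demands that the whole list be consumed.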

Regex, FSA and a Regular Grammar
Textbook Exercise: find a RE for [language description not preserved in this transcript]
Examples (* denotes string not in the language):
  *ab
  *ba
  bab
  λ (empty string)
  bb
  *baba
  babab

Regex, FSA and a Regular Grammar
Draw a FSA and convert it to a RE:
[Figure (PowerPoint animation): FSA with states 1, 2, 3, 4; start state 1; arcs labeled b, a and ε]
Ungraded exercise: could use the technique shown in a previous class: eliminate one state at a time
  b* | b ab+ = b*|b(ab+)+

Regex, FSA and a Regular Grammar
Regular Grammar in Prolog. Let’s begin with something like:
  s --> [b], b.
  b --> [b].
  b --> [b], b.
  (grammar generates bb+)
Add rule:
  b --> [a], b.
  (grammar generates b+(a|b)*b)
Classroom Exercise: How would you fix this grammar so it can handle the specified language?

Regex, FSA and a Regular Grammar
Regular Grammar in Prolog notation:
  s --> [].          (s = “start state”)
  s --> [b], b.      (b = “seen a b”)
  s --> [b], s.
  b --> [a], c.      (c = “expect a b”)
  c --> [b].
  c --> [b], b.
  c --> [b], c.
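The grammar can be checked against the example strings from the textbook exercise. A minimal sketch, assuming SWI-Prolog; in_language/1 is a hypothetical helper name. Note that b is used both as a nonterminal (unbracketed) and as a terminal ([b]), which DCG notation keeps apart.

```prolog
s --> [].
s --> [b], b.
s --> [b], s.
b --> [a], c.
c --> [b].
c --> [b], b.
c --> [b], c.

% in_language(+Tokens): hypothetical helper wrapping the start symbol.
in_language(Tokens) :- phrase(s, Tokens).
```

With this loaded, in_language/1 succeeds for [], [b,b], [b,a,b] and [b,a,b,a,b], and fails for [a,b], [b,a] and [b,a,b,a], in line with the starred and unstarred examples.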

Regex, FSA and a Regular Grammar
Compare the FSA with our Regular Grammar (RG):
  s --> [].          (s = “start state”)
  s --> [b], b.      (b = “seen a b”)
  s --> [b], s.
  b --> [a], c.      (c = “expect a b”)
  c --> [b].
  c --> [b], b.
  c --> [b], c.
[Figure: FSA with states 1, 2, 3, 4; start state 1; arcs labeled b, a and ε]
There is a straightforward correspondence between right recursive RGs and FSA

Regex, FSA and a Regular Grammar
Informally, we can convert a RG to a FSA by treating non-terminals as states and introducing (new) states for rules of the form x --> [a].
  s --> [].
  s --> [b], b.
  s --> [b], s.
  b --> [a], c.
  c --> [b].
  c --> [b], b.
  c --> [b], c.
[Figure (PowerPoint animation): FSA built up in order of the RG rules; states s, b, c, e; start state s; arcs labeled a and b]
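The informal conversion can be sketched as a table of arcs plus a small simulator, assuming any standard Prolog. The predicate names arc/3, final/1, accepts/2 and fsa_accepts/1 are assumptions for this example; the state e is the new state introduced for the rule c --> [b], and s is marked final because of the rule s --> [].

```prolog
% arc(State, Symbol, NextState): one arc per rule with a terminal.
arc(s, b, b).     % s --> [b], b.
arc(s, b, s).     % s --> [b], s.
arc(b, a, c).     % b --> [a], c.
arc(c, b, e).     % c --> [b].      (e is the new end state)
arc(c, b, b).     % c --> [b], b.
arc(c, b, c).     % c --> [b], c.

final(s).         % s --> [].
final(e).

% accepts(+State, +Tokens): Tokens lead from State to a final state.
% The automaton is nondeterministic; backtracking explores the choices.
accepts(State, []) :- final(State).
accepts(State, [Symbol|Rest]) :-
    arc(State, Symbol, Next),
    accepts(Next, Rest).

% fsa_accepts(+Tokens): run the automaton from start state s.
fsa_accepts(Tokens) :- accepts(s, Tokens).
```

Running fsa_accepts/1 on the textbook examples gives the same answers as the grammar: [b,a,b] and [b,b] are accepted, while [b,a,b,a] is rejected.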

Homework 12
Question 1: write a Prolog regular grammar for the following language:
  Σ = {a, b, c}
  L = strings from Σ* with an odd number of a's, an odd number of c's, and an even number of b's
  (assume 0 is even; observe the canonical forms for regular grammars; don't use any empty category rules)
How many grammar rules do you have?
Submit your program and sample runs.
Examples (* denotes string not in the language):
  babc        *babcb
  abcb        *abc
  cbba        *cbbca
  accaccac    *accacca
  ac          *aa
              *λ (empty string)

Homework 12
Question 2: note that the order of the rules matters.
What happens when you run the following Prolog query on your grammar?
  ?- s(String,[]).
  (assume here s is the start symbol)
Can you reorder your grammar rules so that s(String,[]) returns strings in the language?
Show your grammar and run(s).

Homework 12 Question 3: is it possible to arrange your grammar rules so that s(String,[]) enumerates the possible strings in the language? Show your grammar and run(s) if it's possible. Explain why not if it's not possible.