LING 438/538 Computational Linguistics Sandiway Fong Lecture 5: 9/5.

Slides:



Advertisements
Similar presentations
CSA2050: DCG I1 CSA2050 Introduction to Computational Linguistics Lecture 8 Definite Clause Grammars.
Advertisements

C O N T E X T - F R E E LANGUAGES ( use a grammar to describe a language) 1.
LING 388: Language and Computers Sandiway Fong Lecture 9: 9/27.
LING 388: Language and Computers Sandiway Fong 9/29 Lecture 11.
ICE1341 Programming Languages Spring 2005 Lecture #5 Lecture #5 In-Young Ko iko.AT. icu.ac.kr iko.AT. icu.ac.kr Information and Communications University.
LING 438/538 Computational Linguistics Sandiway Fong Lecture 8: 9/29.
LING 438/538 Computational Linguistics Sandiway Fong Lecture 3: 8/29.
LING 388: Language and Computers Sandiway Fong Lecture 9: 9/22.
LING 438/538 Computational Linguistics Sandiway Fong Lecture 7: 9/12.
1 Introduction to Computability Theory Lecture5: Context Free Languages Prof. Amos Israeli.
CS5371 Theory of Computation
LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27.
LING 438/538 Computational Linguistics Sandiway Fong Lecture 9: 9/21.
LING 364: Introduction to Formal Semantics
LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 8: 9/18.
LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 7: 9/11.
LING 388 Language and Computers Lecture 8 9/25/03 Sandiway FONG.
LING 388: Language and Computers Sandiway Fong Lecture 28: 12/6.
CS 330 Programming Languages 09 / 13 / 2007 Instructor: Michael Eckmann.
LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 6: 9/6.
CS 330 Programming Languages 09 / 18 / 2007 Instructor: Michael Eckmann.
LING 388: Language and Computers Sandiway Fong Lecture 6: 9/13.
LING 388: Language and Computers Sandiway Fong Lecture 11: 10/3.
LING 438/538 Computational Linguistics Sandiway Fong Lecture 6: 9/7.
January 14, 2015CS21 Lecture 51 CS21 Decidability and Tractability Lecture 5 January 14, 2015.
LING 388 Language and Computers Take-Home Final Examination 12/9/03 Sandiway FONG.
LING 388: Language and Computers Sandiway Fong Lecture 28: 12/5.
LING 438/538 Computational Linguistics Sandiway Fong Lecture 12: 10/5.
LING 388: Language and Computers Sandiway Fong Lecture 17: 10/25.
79 Regular Expression Regular expressions over an alphabet  are defined recursively as follows. (1) Ø, which denotes the empty set, is a regular expression.
LING 388 Language and Computers Lecture 11 10/7/03 Sandiway FONG.
LING 388 Language and Computers Lecture 9 9/30/03 Sandiway FONG.
Chapter 3: Formal Translation Models
Table-driven parsing Parsing performed by a finite state machine. Parsing algorithm is language-independent. FSM driven by table (s) generated automatically.
LING 364: Introduction to Formal Semantics Lecture 5 January 26th.
LING 388: Language and Computers Sandiway Fong Lecture 13: 10/10.
Finite State Machines Data Structures and Algorithms for Information Processing 1.
LING 388 Language and Computers Lecture 6 9/18/03 Sandiway FONG.
LING 388: Language and Computers Sandiway Fong Lecture 8.
LING 388: Language and Computers Sandiway Fong 10/4 Lecture 12.
Problem of the DAY Create a regular context-free grammar that generates L= {w  {a,b}* : the number of a’s in w is not divisible by 3} Hint: start by designing.
LING 388: Language and Computers Sandiway Fong Lecture 7.
LING 388: Language and Computers Sandiway Fong Lecture 30 12/8.
Design contex-free grammars that generate: L 1 = { u v : u ∈ {a,b}*, v ∈ {a, c}*, and |u| ≤ |v| ≤ 3 |u| }. L 2 = { a p b q c p a r b 2r : p, q, r ≥ 0 }
1 Introduction to Regular Expressions EELS Meeting, Dec Tom Horton Dept. of Computer Science Univ. of Virginia
LING 388: Language and Computers Sandiway Fong Lecture 6: 9/15.
Lecture # 5 Pumping Lemma & Grammar
TextBook Concepts of Programming Languages, Robert W. Sebesta, (10th edition), Addison-Wesley Publishing Company CSCI18 - Concepts of Programming languages.
Copyright © by Curt Hill Grammar Types The Chomsky Hierarchy BNF and Derivation Trees.
Parsing Introduction Syntactic Analysis I. Parsing Introduction 2 The Role of the Parser The Syntactic Analyzer, or Parser, is the heart of the front.
Introduction to Parsing
LING 388: Language and Computers Sandiway Fong 9/27 Lecture 10.
3.2 Semantics. 2 Semantics Attribute Grammars The Meanings of Programs: Semantics Sebesta Chapter 3.
Chapter 3 Part II Describing Syntax and Semantics.
LING/C SC/PSYC 438/538 Lecture 13 Sandiway Fong. Administrivia Reading Homework – Chapter 3 of JM: Words and Transducers.
9.7: Chomsky Hierarchy.
1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.
Foundations of (Theoretical) Computer Science Chapter 2 Lecture Notes (Section 2.2: Pushdown Automata) Prof. Karen Daniels, Fall 2010 with acknowledgement.
LING/C SC/PSYC 438/538 Lecture 15 Sandiway Fong. Did you install SWI Prolog?
CSC312 Automata Theory Lecture # 26 Chapter # 12 by Cohen Context Free Grammars.
LING/C SC/PSYC 438/538 Lecture 16 Sandiway Fong. SWI Prolog Grammar rules are translated when the program is loaded into Prolog rules. Solves the mystery.
Lecture 6: Context-Free Languages
Week 14 - Friday.  What did we talk about last time?  Simplifying FSAs  Quotient automata.
CSCI 2670 Introduction to Theory of Computing September 16, 2004.
LING/C SC/PSYC 438/538 Lecture 17 Sandiway Fong. Last Time Talked about: – 1. Declarative (logical) reading of grammar rules – 2. Prolog query: s(String,[]).
Lecture 17: Theory of Automata:2014 Context Free Grammars.
Modeling Arithmetic, Computation, and Languages Mathematical Structures for Computer Science Chapter 8 Copyright © 2006 W.H. Freeman & Co.MSCS SlidesAlgebraic.
LING/C SC/PSYC 438/538 Lecture 17 Sandiway Fong.
LING/C SC/PSYC 438/538 Lecture 21 Sandiway Fong.
LING/C SC/PSYC 438/538 Lecture 20 Sandiway Fong.
Presentation transcript:

LING 438/538 Computational Linguistics Sandiway Fong Lecture 5: 9/5

Administrivia Homework 1 –due today by midnight your answers should come in one file your name should appear at the top acceptable formats: –.txt (plain text),.doc (Microsoft Word),.pdf (Adobe) –I will send out acknowledgment of receipt of homework tonight –I posted the following on the listserv see also Prolog Computation Summary slide for LING 388 Lecture 4 edu/~sandiway/ling388p /lecture4.pdf

So far... A quick introduction to Prolog –a new style of writing definitions that are also computer programs Concepts –database DB (assert facts, closed world assumption) –Prolog “if” rule (:-) –recursion (base case, recursive case) –atoms, variables, lists (comma-delimited, head/tail) –negation (“failure to prove”, no negative facts in DB) –computation tree (matching the DB, finite and infinite)

Today’s Topic Grammar rules –builtin Prolog support for writing grammars

Road Map This week –grammar rules –regular grammars Reading assignment –(looking ahead) –textbook –Chapter 2: Regular Expressions and Automata –pg. 22–56

Chomsky Hierarchy division of grammar into subclasses partitioned by “ generative power/capacity ” Type-0 General rewrite rules –Turing-complete, powerful enough to encode any computer program –can simulate a Turing machine –anything that ’ s “ computable ” can be simulated using a Turing machine –Type-1 Context-sensitive rules –weaker, but still very power – a n b n c n Type-2 Context-free rules weaker still a n b n Pushdown Automata (PDA) –Type-3 Regular grammar rules –very restricted – Regular Expressions a + b + – Finite State Automata (FSA) Turing machine: artist’s conception from Wikipedia tape read/write head finite state machine

Chomsky Hierarchy Regular Grammars FSA Regular Expressions DCG = Type-0 Type-3 Type-2 Type-1

Prolog Grammar Rule System known as “Definite Clause Grammars” (DCG) –based on type-2 (context-free grammars) –but with extensions –powerful enough to encode the hierarchy all the way up to type-0 –we’ll start with the bottom of the hierarchy i.e. the least powerful regular grammars (type-3)

DefiniteClause Grammars (DCG) some facts –built-in feature not an accident: Prolog was originally designed to (also) support natural language processing –notation resembles grammar rules used by linguists and computer scientists most importantly –uses the same Prolog computation rule –(you already know from homework 1) –match the database –from 1st rule on down...

DefiniteClause Grammars (DCG) background –a “typical” formal grammar contains 4 things – a set of non-terminal symbols (N) –appear on the left-hand-side (LHS) of production rules –these symbols will be expanded by the rules a set of terminal symbols (T) –appear on the right-hand-side (RHS) of production rules –consequently, these symbols cannot be expanded production rules (P) of the form –LHS  RHS –In regular and CF grammars, LHS must be a single non-terminal symbol –RHS: a sequence of terminal and non-terminal symbols: possibly with restrictions, e.g. for regular grammars a designated start symbol (S) –a non-terminal to start the derivation

DefiniteClause Grammars (DCG) Background a “typical” formal grammar contains 4 things –a set of non-terminal symbols (N) –a set of terminal symbols (T) –production rules (P) of the form LHS  RHS –a designated start symbol (S) Example S  aB B  aB B  bC B  b C  bC C  b Notes: Start symbol: S Non-terminals: {S,B,C} (uppercase letters) Terminals: {a,b} (lowercase letters)

DefiniteClause Grammars (DCG) Example –Formal grammarProlog format –S  aB s --> [a],b. –B  aB b --> [a],b. –B  bC b --> [b],c. –B  b b --> [b]. –C  bC c --> [b],c. –C  b c --> [b]. Notes: –Start symbol: S –Non-terminals: {S,B,C} (uppercase letters) –Terminals: {a,b} (lowercase letters) Prolog format: both terminals and non- terminal symbols begin with lowercase letters –Variables begin with an uppercase letter (or underscore) --> is the rewrite symbol terminals are enclosed in square brackets (list notation) nonterminals don ’ t have square brackets surrounding them the comma (,: and) represents the concatenation symbol a period (. ) is required at the end of every DCG rule

DefiniteClause Grammars (DCG) What language does our grammar generate? by writing the grammar in Prolog, we have a ready-made recognizer program –no need to write a separate parser (in this case) Example queries –?- s([a,a,b,b,b],[]). Yes –?- s([a,b,a],[]). No Note: –Query uses the start symbol s with two arguments: –(1) sequence (as a list) to be recognized and –(2) the empty list [] One or more a’s followed by one or more b’s

DefiniteClause Grammars (DCG) (simplified) computation tree –?- s([a,a,b,b,b],[]). (rule 1) ?- b([a,b,b,b],[]). (rule 2) –?- b([b,b,b],[]). (rule 3) »?- c([b,b],[]). (rule 5) » ?- c([b],[]). (rule 6) » Yes –?- s([a,b,a],[]). (rule 1) ?- b([b,a],[]). (rule 3 ) –?- c([a],[]). –No Prolog’s computation rule allows it to handle ambiguity –; looks for more possible derivations 1. s --> [a],b. 2. b --> [a],b. 3. b --> [b],c. 4. b --> [b]. 5. c --> [b],c. 6. c --> [b].

DefiniteClause Grammars (DCG) Language: Sheeptalk –ba! –baa! –baaa! –ba...a! not in language –b! –bab! –!

DefiniteClause Grammars (DCG) language: Sheeptalk –ba! –baa! –baaa! –ba...a! grammar –s --> [b], a, [‘!’]. –a --> [a]. (base case) –a --> a, [a]. (recursive case) BTW, you can’t assert grammar rules ?- assert(a --> [b]). Yes ?- listing. :- dynamic (-->)/2. a-->[b]. need to load them in from a file!

DefiniteClause Grammars (DCG) computation rule also allows for infinite loops –just like with regular Prolog rules example (sheeptalk) –s --> [b], a, [‘!’]. –a --> a, [a]. (left recursive rule) –a --> [a]. (simplified) computation tree –?- s([b,a,a,’!’],[]). ?- a([a,a,’!’],L). ?- L - [] = [‘!’]. (pseudo-code, ‘ - ’ is list difference,TBE) –?- a([a,a,’!’],L). (left recursive rule) ? - a([a,a,’!’],L’). (left recursive rule) –?- a([a,a,’!’],L”). (left recursive rule) »… try grammar with 3rd rule coming before 2nd rule or substituting the right recursive counterpart of rule 2 a --> [a], a.

Definite Clause Grammars (DCG) other features –right-hand-side can be empty e.g. a --> []. useful for grammar writing: encode empty categories e.g. NP  ε NP  λ np --> []. –no designated start symbol e.g. subgrammar c from the first example generates the language: one or more b’s other features –add arguments to non-terminals e.g. s(N) --> [a], b(1,N). can pass arguments up and down the derivation construct parse trees check feature structures –insert regular Prolog queries into the middle of a grammar rule using { } e.g. b(M,N) --> [a], {addOne(M,P)}, b(P,N). allows us to insert code to count the number of a’s here or compute anything we like … general rewrite rule (type-0) power

Definite Clause Grammars (DCG) the relation between DCG rules and Prolog is one of –“syntactic sugar” –notation is for convenience of the grammar rule writer only actually, DCG rules are translated automatically into regular Prolog rules when loaded into the database computation rule works on the translated rules directly –not the DCG rules this is (unfortunately) non-transparent –using the debugger, you will see the translated rules rather than the easy-to-read source DCG rules

Definite Clause Grammars (DCG) each DCG rule corresponds to a Prolog rule –named using the left-hand-side of the DCG rule example: –s --> [a], b. –becomes –s(L1,L3) :- ‘C’(L1,a,L2), b(L2,L3). –where L1, L2, L3 are lists S spans the difference between L1 and L3 [a] spans the difference between L1 and L2 b spans the difference between L2 and L3 [a],b spans the difference between L1 and L3 (same as S ) ‘C’(L1,a,L2) refers to a SWI-Prolog built-in named ‘C’ that holds if a is the difference between L1 and L2