Part II. Statistical NLP Advanced Artificial Intelligence Probabilistic DCGs and SLPs Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Lars Schmidt-Thieme Many slides taken from Kristian Kersting

Overview  Expressive power of PCFGs, HMMs, BNs still limited First order logic is more expressive  Why not combine logic with probabilities ? Probabilistic logic learning Statistical relational learning  One example in NLP Probabilistic DCGs (I.e. PCFGs + Unification)

Context
 One of the key open questions of artificial intelligence concerns "probabilistic logic learning", i.e. the integration of probabilistic reasoning with first order logic representations and machine learning
 Sometimes called Statistical Relational Learning

So far  We have largely been looking at probabilistic representations and ways of learning these from data BNs, HMMs, PCFGs  Now, we are going to look at their expressive power, and make traditional probabilistic representations more expressive using logic Probabilistic First Order Logics Lift BNs, HMMs, PCFGs to more expressive frameworks Upgrade also the underlying algorithms

Prob. Definite Clause Grammars
 Recall: Prob. Regular Grammars, Prob. Context-Free Grammars
 What about Prob. Turing Machines? Or Prob. Grammars?
  Combine PCFGs with unification
  A more general language exists: stochastic logic programs (SLPs)
   Prolog + PCFGs

Probabilistic Context Free Grammars
1.0 : S -> NP, VP
1.0 : NP -> Art, Noun
0.6 : Art -> a
0.4 : Art -> the
0.1 : Noun -> turtle
0.1 : Noun -> turtles
…
0.5 : VP -> Verb
0.5 : VP -> Verb, NP
0.05 : Verb -> sleep
0.05 : Verb -> sleeps
…
Example: "The turtle sleeps" parses as [S [NP [Art the] [Noun turtle]] [VP [Verb sleeps]]]
P(parse tree) = 1 x 1 x .5 x .1 x .4 x .05 = 0.001
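The same fragment can also be written down as an ordinary (non-probabilistic) Prolog DCG. The following is a minimal sketch of the grammar above, runnable in standard Prolog; the probability labels are kept only as comments, since plain DCGs carry no probabilities:

    % DCG sketch of the slide's grammar; labels shown as comments only.
    s --> np, vp.            % 1.0
    np --> art, noun.        % 1.0
    art --> [a].             % 0.6
    art --> [the].           % 0.4
    noun --> [turtle].       % 0.1
    noun --> [turtles].      % 0.1
    vp --> verb.             % 0.5
    vp --> verb, np.         % 0.5
    verb --> [sleep].        % 0.05
    verb --> [sleeps].       % 0.05

    % ?- phrase(s, [the, turtle, sleeps]).
    % true.

Multiplying the labels of the rules used in this parse (S, NP, VP -> Verb, Art -> the, Noun -> turtle, Verb -> sleeps) reproduces the value on the slide: 1 x 1 x .5 x .4 x .1 x .05 = 0.001.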

We defined

PCFGs

Probabilistic Definite Clause Grammar
1.0 : S -> NP(Num), VP(Num)
1.0 : NP(Num) -> Art(Num), Noun(Num)
0.6 : Art(sing) -> a
0.2 : Art(sing) -> the
0.2 : Art(plur) -> the
0.1 : Noun(sing) -> turtle
0.1 : Noun(plur) -> turtles
…
0.5 : VP(Num) -> Verb(Num)
0.5 : VP(Num) -> Verb(Num), NP(Num)
0.05 : Verb(sing) -> sleeps
0.05 : Verb(plur) -> sleep
…
Example: "The turtle sleeps" derives as [S [NP(s) [Art(s) the] [Noun(s) turtle]] [VP(s) [Verb(s) sleeps]]]
P(derivation tree) = 1 x 1 x .5 x .1 x .2 x .05 = 0.0005
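The agreement version is, in the same spirit, an ordinary DCG with one extra argument; the shared Num variable is exactly what unification adds on top of the PCFG. A minimal sketch, with labels again only as comments:

    % DCG sketch with a number-agreement argument.
    s --> np(Num), vp(Num).               % 1.0
    np(Num) --> art(Num), noun(Num).      % 1.0
    art(sing) --> [a].                    % 0.6
    art(sing) --> [the].                  % 0.2
    art(plur) --> [the].                  % 0.2
    noun(sing) --> [turtle].              % 0.1
    noun(plur) --> [turtles].             % 0.1
    vp(Num) --> verb(Num).                % 0.5
    vp(Num) --> verb(Num), np(Num).       % 0.5
    verb(sing) --> [sleeps].              % 0.05
    verb(plur) --> [sleep].               % 0.05

    % ?- phrase(s, [the, turtle, sleeps]).    % succeeds (sing agreement)
    % ?- phrase(s, [the, turtles, sleep]).    % succeeds (plur agreement)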

In SLP notation  P(s([the,turtles,sleep],[])) = 1/6, the product of the probability labels of the clauses used in the derivation

Stochastic Logic Programs
 Correspondence between CFGs and SLPs:
  Symbols - Predicates
  Rules - Clauses
  Derivations - SLD-derivations/Proofs
 So, a stochastic logic program is an annotated logic program: each clause has an associated probability label, and the sum of the probability labels for the clauses defining a particular predicate is equal to 1.
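To make the correspondence concrete, this is how the turtle fragment looks after the usual DCG-to-clause translation with difference lists; in SLP notation the labels in the comments would be written in front of each clause (the labels are the ones from the PDCG slide above, and the noun and verb labels only sum to 1 once the rules hidden behind "…" are included):

    % Definite-clause (difference-list) form of the PDCG fragment.
    s(S0, S)       :- np(Num, S0, S1), vp(Num, S1, S).      % 1.0
    np(Num, S0, S) :- art(Num, S0, S1), noun(Num, S1, S).   % 1.0
    art(sing, [a|S], S).                                    % 0.6
    art(sing, [the|S], S).                                  % 0.2
    art(plur, [the|S], S).                                  % 0.2
    noun(sing, [turtle|S], S).                              % 0.1
    noun(plur, [turtles|S], S).                             % 0.1
    vp(Num, S0, S) :- verb(Num, S0, S).                     % 0.5
    vp(Num, S0, S) :- verb(Num, S0, S1), np(Num, S1, S).    % 0.5
    verb(sing, [sleeps|S], S).                              % 0.05
    verb(plur, [sleep|S], S).                               % 0.05

    % ?- s([the, turtles, sleep], []).
    % true.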

Probabilistic Definite Clause Grammar
1.0 : S -> NP(Num), VP(Num)
1.0 : NP(Num) -> Art(Num), Noun(Num)
0.6 : Art(sing) -> a
0.2 : Art(sing) -> the
0.1 : Noun(sing) -> turtle
0.1 : Noun(plur) -> turtles
…
0.5 : VP(Num) -> Verb(Num)
0.5 : VP(Num) -> Verb(Num), NP(Num)
0.05 : Verb(sing) -> sleeps
0.05 : Verb(plur) -> sleep
…
Example: "The turtle sleeps", P(derivation tree) = 1 x 1 x .5 x .1 x .2 x .05
What about "A turtles sleeps"?
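With the agreement DCG sketched above, the question can be answered by a single query: "a" forces Art(sing) while "turtles" forces Noun(plur), sing and plur do not unify, so there is no derivation and the sentence gets probability 0.

    % ?- phrase(s, [a, turtles, sleeps]).
    % false.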

PDCGs

PDCGs : distributions
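A standard way to spell out the distribution defined by an SLP/PDCG (following Cussens; the original slide's exact formulation may differ) is: the probability of a derivation is the product of the labels of the clauses it uses, and, since unification can make derivations fail, the mass of the successful derivations is renormalised. With \nu_c(d) the number of times clause c is used in derivation d, and R(a) the set of refutations (successful derivations) of atom a:

    P(d) = \prod_c p(c)^{\nu_c(d)}
    P(a) = \frac{\sum_{d \in R(a)} P(d)}{\sum_{a'} \sum_{d \in R(a')} P(d)}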

PCFGs : distributions
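For PCFGs the corresponding definitions are the standard ones (again, assumed rather than taken from the slide): with f(r,t) the number of times rule r is used in parse tree t,

    P(t) = \prod_{r} p(r)^{f(r,t)},    P(w) = \sum_{t : yield(t) = w} P(t)

where for every nonterminal A the rule probabilities satisfy \sum_{A \to \beta} p(A \to \beta) = 1. Unlike the PDCG/SLP case, no renormalisation over failed derivations is needed, because a context-free expansion cannot fail.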

Sampling  PRGs, PCFGs, PDCGs, and SLPs can also be used for sampling sentences, ground atoms that follow from the program  Rather straightforward. Consider PDCGs: Probabilistically explore parse-tree At each step, select possible resolvents using the probability labels attached to clauses If derivation succeeds, return corresponding sentence If derivation fails, then restart.

Questions we can ask (and answer) about PDCGs and SLPs: what is the probability of a given sentence (or of a derivation for it), and how can the probability labels be estimated from observed sentences?

Answers  The algorithmic answers to these questions, again extend those of PCFGs and HMMs, in particular, Tabling is used (to record probabilities of partial proofs/parse trees and intermediate results) Failure Adjusted EM (FAM) is used to solve parameter re-estimation problem  Only sentences observed  Unobserved: Possible refutations and derivations for observed sentences  Therefore EM, e.g., Failure Adjusted Maximisation (Cussens) and more recently (Sato) for PRISM  Topic of recent research

Answers  There is a decent implementation of these ideas in the system PRISM by Taisuke Sato for a variant of SLPs/PDCGs 

Structure Learning
 From proof trees: De Raedt et al., AAAI 05
  Learn from proof trees (parse trees) instead of from sentences
  Proof trees carry much more information
  Upgrades the idea of tree-bank grammars and PCFGs
 Given: a set of proof trees
 Find: a PDCG that maximizes the likelihood

Initial Rule Set DCG (read off the proof tree of "The turtle sleeps")
S -> NP(s), VP(s)
NP(s) -> Art(s), Noun(s)
VP(s) -> Verb(s)
Art(s) -> the
Noun(s) -> turtle
Verb(s) -> sleeps
How to get the variables back?

Learning PDCGs from Proof Trees
 Based on the tree-bank grammar idea, e.g. the Penn Tree Bank
 Key algorithm:
  Let S be the set of all (instantiated) rules that occur in an example proof tree
  Initialize the parameters
  Repeat as long as the score of S improves:
   Generalize S
   Estimate the parameters of S using Cussens' FAM (which can be simplified, since the proofs are now observed; see the sketch below)
  Output S
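When the proofs themselves are observed, the expectation step is unnecessary and the estimate reduces to relative frequencies (a sketch of the simplification mentioned above, assuming the standard maximum-likelihood form): with \nu_c(t) the number of times rule c occurs in proof tree t,

    p(c) = \frac{\sum_t \nu_c(t)}{\sum_{c' \in def(pred(c))} \sum_t \nu_{c'}(t)}

i.e. each rule's label is its count divided by the total count of the rules defining the same predicate; for instance, a predicate whose two rules occur 30 and 20 times gets the labels 0.6 and 0.4.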

Generalizing Rules in SLPs
 Generalization in logic: take two rules for the same predicate and replace them by their lgg under θ-subsumption (Plotkin)
  Example:
   department(cs,nebel) -> prof(nebel), in(cs), course(ai), lect(nebel,ai).
   department(cs,burgard) -> prof(burgard), in(cs), course(ai), lect(burgard,ai).
  Induce:
   department(cs,P) -> prof(P), in(cs), course(ai), lect(P,ai).
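A minimal sketch of the term-level part of this operation (Plotkin's least general generalization, computed by anti-unification) in Prolog; lgg/5 and lgg_args/5 are names made up for this sketch, and the full clause-level lgg under θ-subsumption additionally pairs up the compatible body literals of the two rules:

    :- use_module(library(lists)).

    % lgg(+T1, +T2, -G, +Map0, -Map): G generalizes T1 and T2; the map remembers
    % which pairs of differing subterms have already been replaced by which variable.
    lgg(T1, T2, T1, M, M) :- T1 == T2, !.
    lgg(T1, T2, G, M0, M) :-
        nonvar(T1), nonvar(T2),
        T1 =.. [F|A1], T2 =.. [F|A2],
        length(A1, N), length(A2, N), !,
        lgg_args(A1, A2, AG, M0, M),
        G =.. [F|AG].
    lgg(T1, T2, V, M, M) :- member(map(X, Y, V), M), X == T1, Y == T2, !.
    lgg(T1, T2, V, M, [map(T1, T2, V)|M]).

    lgg_args([], [], [], M, M).
    lgg_args([A|As], [B|Bs], [G|Gs], M0, M) :-
        lgg(A, B, G, M0, M1),
        lgg_args(As, Bs, Gs, M1, M).

    % ?- lgg(department(cs,nebel), department(cs,burgard), G, [], _).
    % G = department(cs, _P).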

Strong logical constraints
 Replacing the rules r1 and r2 by their lgg should preserve the proofs/parse trees!
 So, two rules r1 and r2 should only be generalized when there is a one-to-one mapping (with corresponding substitutions) between the literals in r1, r2 and lgg(r1,r2)
 Exclude, for example:
  father(j,a) -> m(j), f(a), parent(j,a)
  father(j,t) -> m(j), m(t), parent(j,t)
 Gives: father(j,P) -> m(j), m(X), parent(j,P)

Experiment

In all experiments the correct structure was induced!

Conclusions  SLPs and PDCGs extend PCFGs as a representation  Proof-trees for PDCGs and PCFGs correspond to parse- trees in PCFGs  Also : a lot of related work in Statistical Relational Learning and Probabilistic Inductive Logic Programming Combining Graphical Models and Prob. Grammars with ideas (such as unification and relations) from first order logic Many approaches -  Bayesian Nets (and Bayesian Logic Programms)  Markov Networks (and Markov Logic Networks)  … Current research topic  EU project APRIL II