Sum-Product Networks CS886 Topics in Natural Language Processing

Sum-Product Networks CS886 Topics in Natural Language Processing Guest Lecture by Pascal Poupart University of Waterloo July 22, 2015

Outline
- What is a Sum-Product Network?
- Inference
- Language modeling

What is a Sum-Product Network? (Poon and Domingos, UAI 2011)
- Acyclic directed graph of sums and products
- Leaves can be indicator variables or univariate distributions
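To make the definition concrete, here is a minimal sketch (not from the slides; the structure and all weights are illustrative) of evaluating a tiny SPN over two binary variables X1 and X2, with indicator leaves, product nodes, and weighted sum nodes:

```python
class Indicator:
    def __init__(self, var, value):
        self.var, self.value = var, value
    def eval(self, evidence):
        # An unobserved variable is marginalized out by letting
        # both of its indicators evaluate to 1.
        if self.var not in evidence:
            return 1.0
        return 1.0 if evidence[self.var] == self.value else 0.0

class Product:
    def __init__(self, children):
        self.children = children
    def eval(self, evidence):
        result = 1.0
        for c in self.children:
            result *= c.eval(evidence)
        return result

class Sum:
    def __init__(self, weighted_children):  # list of (weight, child)
        self.weighted_children = weighted_children
    def eval(self, evidence):
        return sum(w * c.eval(evidence) for w, c in self.weighted_children)

# A mixture of two factorized components over X1 and X2 (made-up weights).
x1t, x1f = Indicator('X1', True), Indicator('X1', False)
x2t, x2f = Indicator('X2', True), Indicator('X2', False)
left  = Product([Sum([(0.9, x1t), (0.1, x1f)]), Sum([(0.3, x2t), (0.7, x2f)])])
right = Product([Sum([(0.2, x1t), (0.8, x1f)]), Sum([(0.6, x2t), (0.4, x2f)])])
root  = Sum([(0.6, left), (0.4, right)])

print(root.eval({'X1': True, 'X2': False}))  # joint Pr(X1=true, X2=false)
```

One bottom-up pass through the graph evaluates the network at the given evidence.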

Two Views
- Deep architecture with clear semantics
- Tractable probabilistic graphical model

Deep Architecture
- A specific type of deep neural network whose activation function is the product
- Advantage: clear semantics and well-understood theory

Probabilistic Graphical Models
- Bayesian network: graphical view of direct dependencies; inference is #P-hard (intractable)
- Markov network: graphical view of correlations; inference is #P-hard (intractable)
- Sum-product network: graphical view of a computation; inference is in P (tractable)

Probabilistic Inference
- An SPN represents a joint distribution over a set of random variables
- Example: Pr(X1 = true, X2 = false)

Marginal Inference
- Example: Pr(X2 = false)

Conditional Inference
- Example: Pr(X1 = true | X2 = false) = Pr(X1 = true, X2 = false) / Pr(X2 = false)
- Hence any inference query can be answered in two bottom-up passes of the network
- Linear complexity!
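The joint/marginal/conditional pattern can be sketched compactly (an illustrative two-variable SPN with made-up weights, not the network from the slides). Setting all indicators of an unobserved variable to 1 marginalizes it out, so each query is a single upward pass and a conditional is the ratio of two passes:

```python
def spn(x1t, x1f, x2t, x2f):
    # Root sum over two factorized components; arguments are the
    # indicator values for X1=true, X1=false, X2=true, X2=false.
    left  = (0.9 * x1t + 0.1 * x1f) * (0.3 * x2t + 0.7 * x2f)
    right = (0.2 * x1t + 0.8 * x1f) * (0.6 * x2t + 0.4 * x2f)
    return 0.6 * left + 0.4 * right

joint    = spn(1, 0, 0, 1)   # Pr(X1=true, X2=false)
marginal = spn(1, 1, 0, 1)   # Pr(X2=false): both X1 indicators set to 1
print(joint / marginal)      # Pr(X1=true | X2=false)
```

Both passes touch every edge once, hence the linear complexity noted above.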

Semantics
A valid SPN encodes a hierarchical mixture distribution:
- Sum nodes: hidden variables (mixture)
- Product nodes: factorization (independence)

Definitions
- The scope of a node is the set of variables that appear in the sub-SPN rooted at the node
- An SPN is decomposable when each product node has children with disjoint scopes
- An SPN is complete when each sum node has children with identical scopes
- A decomposable and complete SPN is a valid SPN
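These definitions translate directly into a validity check. The sketch below (an illustrative node encoding, not from the slides) computes scopes bottom-up and tests completeness at sum nodes and decomposability at product nodes:

```python
# Nodes encoded as tuples: ('leaf', var), ('sum', children), ('prod', children).

def scope(node):
    # Scope = set of variables appearing in the sub-SPN rooted at the node.
    if node[0] == 'leaf':
        return frozenset([node[1]])
    return frozenset().union(*(scope(c) for c in node[1]))

def is_valid(node):
    if node[0] == 'leaf':
        return True
    children = node[1]
    if not all(is_valid(c) for c in children):
        return False
    scopes = [scope(c) for c in children]
    if node[0] == 'sum':
        # Completeness: all children of a sum node share the same scope.
        return all(s == scopes[0] for s in scopes)
    # Decomposability: children of a product node have pairwise disjoint scopes.
    return len(frozenset().union(*scopes)) == sum(len(s) for s in scopes)

leaf = lambda v: ('leaf', v)
good = ('sum', [('prod', [leaf('X1'), leaf('X2')]),
                ('prod', [leaf('X1'), leaf('X2')])])
bad  = ('sum', [('prod', [leaf('X1'), leaf('X1')]),  # non-disjoint product
                leaf('X2')])                         # mismatched sum scopes
print(is_valid(good), is_valid(bad))  # True False
```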

Relationship with Bayes Nets
- Any SPN can be converted into a bipartite Bayesian network (Zhao, Melibari, Poupart; ICML 2015)

Parameter Learning
- Given data (a matrix of instances by attributes), estimate the weights
- Methods: expectation-maximization, gradient descent
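As a toy illustration of gradient-based weight learning (the components, weights, and data below are all invented for the example), here the single root mixture weight w of a two-component SPN is fitted to a small data set by ascending the log-likelihood gradient:

```python
def component(p1, p2, x1, x2):
    # Factorized Bernoulli component over two binary variables.
    return (p1 if x1 else 1 - p1) * (p2 if x2 else 1 - p2)

def likelihood(w, x1, x2):
    # Root sum node: mixture of two fixed factorized components.
    return w * component(0.9, 0.3, x1, x2) + (1 - w) * component(0.2, 0.6, x1, x2)

data = [(True, False), (True, True), (True, False), (False, False)]
w, lr = 0.5, 0.1
for _ in range(200):
    # d/dw of the log-likelihood, summed over the data set.
    grad = sum((component(0.9, 0.3, *d) - component(0.2, 0.6, *d)) / likelihood(w, *d)
               for d in data)
    w = min(max(w + lr * grad, 1e-6), 1 - 1e-6)  # keep w a valid mixture weight

print(w)  # learned root weight, close to its maximum-likelihood value
```

In a full SPN every sum node carries such weights; EM instead updates them from expected counts of the corresponding hidden mixture variable.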

Structure Learning
Alternate between:
- Clustering instances: creates sum nodes
- Partitioning variables: creates product nodes
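This alternation is the core of LearnSPN-style algorithms (Gens and Domingos, 2013). The skeleton below is a heavily simplified sketch: the two split procedures are trivial stand-ins (real implementations use, e.g., EM clustering and pairwise independence tests), but the recursion shows where sum and product nodes come from:

```python
def split_variables(vars_):
    # Stand-in: split the variable set in half (real methods test independence).
    mid = len(vars_) // 2
    return vars_[:mid], vars_[mid:]

def cluster_instances(data, vars_):
    # Stand-in: group instances by the value of the first in-scope variable
    # (real methods use e.g. EM or incremental hard clustering).
    groups = {}
    for row in data:
        groups.setdefault(row[vars_[0]], []).append(row)
    return list(groups.values())

def learn_spn(data, vars_):
    # data: list of tuples over all variables; vars_: column indices in scope.
    if len(vars_) == 1:
        p = sum(row[vars_[0]] for row in data) / len(data)
        return ('leaf', vars_[0], p)          # univariate Bernoulli leaf
    clusters = cluster_instances(data, vars_)
    if len(clusters) > 1:                     # instances cluster: sum node
        weights = [len(c) / len(data) for c in clusters]
        return ('sum', [(w, learn_spn(c, vars_)) for w, c in zip(weights, clusters)])
    left, right = split_variables(vars_)      # otherwise: product node
    return ('prod', [learn_spn(data, left), learn_spn(data, right)])

data = [(1, 0, 1), (1, 0, 0), (0, 1, 1), (0, 1, 0)]
root = learn_spn(data, [0, 1, 2])
print(root[0])  # 'sum'
```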

Applications
- Image completion (Poon, Domingos; 2011)
- Activity recognition (Amer, Todorovic; 2012)
- Language modeling (Cheng et al.; 2014)
- Speech modeling (Peharz et al.; 2014)

Language Model
- An SPN-based n-gram model
- Fixed structure
- Discriminative weight learning by gradient descent

Results
- From Cheng et al., 2014

Conclusion
- Sum-Product Networks: a deep architecture with clear semantics and a tractable probabilistic graphical model
- Work in progress at Waterloo:
  - Improved structure learning: H. Zhao
  - Online parameter learning: H. Zhao, A. Rashwan
  - SPNs for sequence data: M. Melibari
  - Decision SPNs: M. Melibari
- Open problem: thorough comparison of SPNs to other deep networks