Dependency Networks for Inference, Collaborative Filtering, and Data Visualization. Heckerman et al., Microsoft Research. Journal of Machine Learning Research.

Similar presentations
A Tutorial on Learning with Bayesian Networks

Naïve Bayes. Bayesian Reasoning Bayesian reasoning provides a probabilistic approach to inference. It is based on the assumption that the quantities of.
Variational Methods for Graphical Models Michael I. Jordan, Zoubin Ghahramani, Tommi S. Jaakkola, Lawrence K. Saul. Presented by: Afsaneh Shirazi.
1 Some Comments on Sebastiani et al Nature Genetics 37(4)2005.
Rutgers CS440, Fall 2003 Review session. Rutgers CS440, Fall 2003 Topics Final will cover the following topics (after midterm): 1.Uncertainty & introduction.
Dynamic Bayesian Networks (DBNs)
Gibbs Sampling Qianji Zheng Oct. 5th, 2010.
Introduction of Probabilistic Reasoning and Bayesian Networks
EE462 MLCV Lecture Introduction of Graphical Models Markov Random Fields Segmentation Tae-Kyun Kim 1.
CUSTOMER NEEDS ELICITATION FOR PRODUCT CUSTOMIZATION Yue Wang Advisor: Prof. Tseng Advanced Manufacturing Institute Hong Kong University of Science and.
Introduction to Sampling based inference and MCMC Ata Kaban School of Computer Science The University of Birmingham.
Hidden Markov Models M. Vijay Venkatesh. Outline Introduction Graphical Model Parameterization Inference Summary.
Learning Markov Network Structure with Decision Trees Daniel Lowd University of Oregon Jesse Davis Katholieke Universiteit Leuven Joint work with:
CPSC 322, Lecture 26Slide 1 Reasoning Under Uncertainty: Belief Networks Computer Science cpsc322, Lecture 27 (Textbook Chpt 6.3) March, 16, 2009.
Software Engineering Laboratory1 Introduction of Bayesian Network 4 / 20 / 2005 CSE634 Data Mining Prof. Anita Wasilewska Hiroo Kusaba.
Lecture 14: Collaborative Filtering Based on Breese, J., Heckerman, D., and Kadie, C. (1998). Empirical analysis of predictive algorithms for collaborative.
Data Mining Techniques Outline
Regulatory Network (Part II) 11/05/07. Methods Linear –PCA (Raychaudhuri et al. 2000) –NIR (Gardner et al. 2003) Nonlinear –Bayesian network (Friedman.
Learning with Bayesian Networks David Heckerman Presented by Colin Rickert.
Graphical Models Lei Tang. Review of Graphical Models Directed Graph (DAG, Bayesian Network, Belief Network) Typically used to represent causal relationship.
1st ProTIC Network School - Tandil, April. Bayesian Learning 5.1 Introduction: Bayesian learning algorithms calculate explicit probabilities.
Ranking by Odds Ratio A Probability Model Approach let be a Boolean random variable: document d is relevant to query q otherwise Consider document d as.
Today Logistic Regression Decision Trees Redux Graphical Models
CPSC 422, Lecture 18Slide 1 Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 18 Feb, 25, 2015 Slide Sources Raymond J. Mooney University of.
Cristina Manfredotti D.I.S.Co. Università di Milano - Bicocca An Introduction to the Use of Bayesian Network to Analyze Gene Expression Data Cristina Manfredotti.
1 Learning with Bayesian Networks Author: David Heckerman Presented by Yan Zhang April
Machine Learning CUNY Graduate Center Lecture 21: Graphical Models.
Dependency networks Sushmita Roy BMI/CS 576 Nov 26 th, 2013.
Mean Field Inference in Dependency Networks: An Empirical Study Daniel Lowd and Arash Shamaei University of Oregon.
Machine Learning1 Machine Learning: Summary Greg Grudic CSCI-4830.
Learning Structure in Bayes Nets (Typically also learn CPTs here) Given the set of random variables (features), the space of all possible networks.
Bayesian networks Classification, segmentation, time series prediction and more. Website: Twitter:
Comparison of Bayesian Neural Networks with TMVA classifiers Richa Sharma, Vipin Bhatnagar Panjab University, Chandigarh India-CMS March, 2009 Meeting,
Bayesian Networks for Data Mining David Heckerman Microsoft Research (Data Mining and Knowledge Discovery 1, (1997))
Siddhartha Shakya1 Estimation Of Distribution Algorithm based on Markov Random Fields Siddhartha Shakya School Of Computing The Robert Gordon.
Ensemble Learning Spring 2009 Ben-Gurion University of the Negev.
Collaborative Filtering  Introduction  Search or Content based Method  User-Based Collaborative Filtering  Item-to-Item Collaborative Filtering  Using.
Inference Complexity As Learning Bias Daniel Lowd Dept. of Computer and Information Science University of Oregon Joint work with Pedro Domingos.
Module networks Sushmita Roy BMI/CS 576 Nov 18 th & 20th, 2014.
Learning the Structure of Related Tasks Presented by Lihan He Machine Learning Reading Group Duke University 02/03/2006 A. Niculescu-Mizil, R. Caruana.
Chapter 11 Statistical Techniques. Data Warehouse and Data Mining Chapter 11 2 Chapter Objectives  Understand when linear regression is an appropriate.
Slides for “Data Mining” by I. H. Witten and E. Frank.
An Introduction to Variational Methods for Graphical Models
Decision Trees Binary output – easily extendible to multiple output classes. Takes a set of attributes for a given situation or object and outputs a yes/no.
CPSC 422, Lecture 11Slide 1 Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 11 Oct, 2, 2015.
Intelligent Database Systems Lab Advisor : Dr.Hsu Graduate : Keng-Wei Chang Author : Lian Yan and David J. Miller 國立雲林科技大學 National Yunlin University of.
Dependency networks Sushmita Roy BMI/CS 576 Nov 25 th, 2014.
Dependency Networks for Collaborative Filtering and Data Visualization, UAI-2000. Presented by: Kyu-Baek Hwang (황규백).
Indexing Correlated Probabilistic Databases Bhargav Kanagal, Amol Deshpande University of Maryland, College Park, USA SIGMOD Presented.
Probabilistic models Jouni Tuomisto THL. Outline Deterministic models with probabilistic parameters Hierarchical Bayesian models Bayesian belief nets.
Bayesian networks and their application in circuit reliability estimation Erin Taylor.
ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition Objectives: Occam’s Razor No Free Lunch Theorem Minimum.
Learning and Acting with Bayes Nets Chapter 20.. Page 2 === A Network and a Training Data.
1 Param. Learning (MLE) Structure Learning The Good Graphical Models – Carlos Guestrin Carnegie Mellon University October 1 st, 2008 Readings: K&F:
1 CMSC 671 Fall 2001 Class #20 – Thursday, November 8.
Hybrid Intelligent Systems for Network Security Lane Thames Georgia Institute of Technology Savannah, GA
Pattern Recognition and Machine Learning
Bayesian Optimization Algorithm, Decision Graphs, and Occam’s Razor Martin Pelikan, David E. Goldberg, and Kumara Sastry IlliGAL Report No May.
Today Graphical Models Representing conditional dependence graphically
1 Relational Factor Graphs Lin Liao Joint work with Dieter Fox.
Markov Networks: Theory and Applications Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208
An Algorithm to Learn the Structure of a Bayesian Network Çiğdem Gündüz Olcay Taner Yıldız Ethem Alpaydın Computer Engineering Taner Bilgiç Industrial.
Bayesian Belief Network AI Contents t Introduction t Bayesian Network t KDD Data.
Learning Deep Generative Models by Ruslan Salakhutdinov
Multimodal Learning with Deep Boltzmann Machines
Markov Properties of Directed Acyclic Graphs
CSCI 5822 Probabilistic Models of Human and Machine Learning
Markov Random Fields Presented by: Vladan Radosavljevic.
Learning Probabilistic Graphical Models Overview Learning Problems.
Using Bayesian Network in the Construction of a Bi-level Multi-classifier. A Case Study Using Intensive Care Unit Patients Data B. Sierra, N. Serrano,
Presentation transcript:

Dependency Networks for Inference, Collaborative Filtering, and Data Visualization. Heckerman et al., Microsoft Research. Journal of Machine Learning Research 1, 2000.

2 Contents
- Introduction
- Dependency Networks
- Probabilistic Inference
- Collaborative Filtering
- Experimental Results

3 Introduction
- Representation of a dependency network: a (possibly cyclic) directed graph together with a collection of regressions or classifications among variables, combined using the machinery of Gibbs sampling.
- Advantages: computationally efficient algorithms; useful for encoding and displaying predictive relationships; useful for the task of predicting preferences (collaborative filtering); useful for answering probabilistic queries.
- Disadvantages: not useful for encoding causal relationships; difficult to construct using a knowledge-based approach.

4 Dependency Networks
- Bayesian networks vs. dependency networks.
- A consistent dependency network for X is a pair (G, P), where P is a set of conditional probability distributions. Each local distribution can be obtained from the joint distribution p(x).
- Cf. a Markov network (U, Φ), where Φ is a set of potential functions; the conditional distributions in P are easier to interpret than the potentials in Φ.
- A Markov network (undirected graphical model) and a consistent dependency network have the same representational power.
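In symbols, the consistency condition the slide refers to can be written as follows (pa_i denotes the parents of X_i in G):

```latex
% A dependency network (G, P) for X = (X_1, ..., X_n) is consistent with the
% joint distribution p(x) when every local distribution equals the
% corresponding full conditional of the joint:
p_i(x_i \mid \mathrm{pa}_i) \;=\; p(x_i \mid \mathbf{x} \setminus x_i),
\qquad i = 1, \dots, n .
```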

5 Probabilistic Inference
- Probabilistic inference: given a graphical model for X = (Y, Z), where Y is a set of target variables and Z is a set of input variables, what is p(y|z) or p(x) (density estimation)?
- Given a consistent dependency network for X, probabilistic inference can be done by converting it to a Markov network, triangulating, and then applying a standard algorithm such as the junction-tree algorithm.
- Here, Gibbs sampling is considered for recovering the joint distribution p(x) of X.

6 Probabilistic Inference
- Calculation of p(x) with the ordered Gibbs sampler: initialize each variable, then resample each X_i from p(x_i | x \ x_i) in the order X_1, ..., X_n.
- An ordered Gibbs sampler recovers the joint distribution for X.
- Calculation of p(y|z), naive approach: run the Gibbs sampler directly and use only the samples with Z = z to compute the estimate. If either p(y|z) or p(z) is small, many iterations are required.
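A minimal sketch of the ordered Gibbs sampler, assuming binary variables; `p_local` is a hypothetical list of callables returning p(X_i = 1 | x \ x_i) (e.g., learned local models), not an API from the paper:

```python
import random

def ordered_gibbs(p_local, n_vars, n_iter=20000, burn_in=2000, seed=0):
    r"""Estimate the joint distribution p(x) of binary variables X_1..X_n by
    ordered Gibbs sampling over a dependency network.

    p_local[i](x) must return p(X_i = 1 | x \ x_i), where x is the full
    current state (a list of 0/1 values); the entry at index i is ignored.
    """
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n_vars)]   # arbitrary initialization
    counts = {}
    for t in range(n_iter):
        for i in range(n_vars):                      # fixed order X_1, ..., X_n
            x[i] = 1 if rng.random() < p_local[i](x) else 0
        if t >= burn_in:
            key = tuple(x)
            counts[key] = counts.get(key, 0) + 1
    total = sum(counts.values())
    return {state: c / total for state, c in counts.items()}
```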

7 Probabilistic Inference
- Calculation of p(y|z):
- For small p(z): use the modified ordered Gibbs sampler (fix Z = z throughout the ordered Gibbs sampling).
- For small p(y|z) (Y contains many variables): use the dependency network structure to avoid some of the Gibbs sampling. E.g., for [X_1, X_2 → X_3], X_1 can be determined with no Gibbs sampling (cf. step 7 in Algorithm 1), while X_2 and X_3 can each be determined by a modified Gibbs sampler with target x_2 and x_3, respectively.
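A sketch of the modified sampler, reusing the conventions of the previous sketch (binary variables, hypothetical `p_local`); the only change is that evidence variables are clamped and never resampled:

```python
import random

def modified_ordered_gibbs(p_local, n_vars, evidence, n_iter=20000, burn_in=2000, seed=0):
    """Ordered Gibbs sampling with the evidence variables Z clamped to z.

    evidence maps variable index -> observed 0/1 value; clamped variables are
    never resampled, so every retained sample is consistent with Z = z.
    """
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n_vars)]
    for i, v in evidence.items():
        x[i] = v
    samples = []
    for t in range(n_iter):
        for i in range(n_vars):
            if i in evidence:            # skip clamped evidence variables
                continue
            x[i] = 1 if rng.random() < p_local[i](x) else 0
        if t >= burn_in:
            samples.append(tuple(x))
    return samples                       # estimate p(y | z) from these samples
```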

8 Probabilistic Inference

9 Dependency Network
- Extension of consistent dependency networks: for computational reasons, independently estimate each local distribution using a classification algorithm. (Feature selection in the classification process also determines the structure of the dependency network.)
- Drawback: structural inconsistency due to heuristic search and finite-data effects (e.g., x is a parent of y but not vice versa).
- A large sample size will tend to overcome this inconsistency.
- The ordered Gibbs sampler yields a joint distribution for the domain whether or not the local distributions are consistent (the "ordered pseudo-Gibbs sampler").

10 Dependency Network
- Conditions for consistency: a minimal consistent dependency network for a positive distribution p(x) must be bi-directional. (A consistent dependency network is minimal if, for every node and each of its parents, the node is not independent of that parent given the remaining parents.)
- Decision trees for the local probabilities: X_i is the target variable and X \ X_i are the input variables. Parameter prior: uniform; structure prior: κ^f, where f is the number of free parameters. A Bayesian score is used to learn the tree structure.

11 Dependency Network
- Decision trees for the local probabilities (cont'd):
- Start from a single root node.
- Repeatedly replace a leaf node with a binary split on some input variable, as long as the score increases (a schematic sketch follows).
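A schematic sketch of this greedy growth, not the paper's implementation: it assumes binary variables, assumes the Bayesian score (uniform parameter prior, structure prior κ^f) decomposes over leaves so a split can be evaluated locally, and takes a user-supplied `leaf_score` function:

```python
def grow_tree(rows, target, inputs, leaf_score):
    """Greedy top-down growth of a probabilistic decision tree for p(target | inputs).

    rows: list of dicts mapping variable names to 0/1 values.
    leaf_score(rows, target): assumed to return the (leaf-decomposable) Bayesian
    score of modelling the target at a single leaf from rows.
    Returns ('leaf', rows) or ('split', var, subtree_for_0, subtree_for_1).
    """
    base = leaf_score(rows, target)
    best = None
    for v in inputs:                                   # try every binary split
        rows0 = [r for r in rows if r[v] == 0]
        rows1 = [r for r in rows if r[v] == 1]
        if not rows0 or not rows1:
            continue
        gain = leaf_score(rows0, target) + leaf_score(rows1, target) - base
        if gain > 0 and (best is None or gain > best[0]):
            best = (gain, v, rows0, rows1)
    if best is None:                                   # no split increases the score
        return ('leaf', rows)
    _, v, rows0, rows1 = best
    rest = [u for u in inputs if u != v]
    return ('split', v,
            grow_tree(rows0, target, rest, leaf_score),
            grow_tree(rows1, target, rest, leaf_score))
```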

12 Dependency Network

13 Dependency Network

14 Collaborative Filtering
- Preference prediction based on a user's history: predicting which products a person will buy, given the items already purchased.
- Express each item as a binary variable (Breese et al., 1998):
- Use this data set to learn a Bayesian network for the joint distribution of these variables.
- Given a new user's preferences x, use the Bayesian network to estimate p(x_i = 1 | x \ x_i = 0) for each product X_i not yet purchased.
- Return a list of recommended items ranked by these estimates.
- This method outperforms memory-based and cluster-based methods (Breese et al.).
- In this paper, p(x_i = 1 | x \ x_i = 0) = p(x_i = 1 | pa_i), a direct lookup in the dependency network.
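A minimal sketch of the lookup-based recommendation step, again assuming the hypothetical `p_local` interface for the learned local distributions (not code from the paper):

```python
def recommend(p_local, user_x, purchased):
    """Rank not-yet-purchased items by p(x_i = 1 | pa_i), read directly from
    the dependency network's local distributions (no Gibbs sampling needed).

    p_local[i](x): returns p(X_i = 1 | pa_i) evaluated on the full preference
    vector x (only X_i's parents are actually inspected).
    user_x: the new user's 0/1 preference vector.
    purchased: set of item indices the user already owns.
    """
    scores = {i: p_local[i](user_x)
              for i in range(len(user_x)) if i not in purchased}
    return sorted(scores, key=scores.get, reverse=True)   # best recommendations first
```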

15 Experimental Results
- Accuracy evaluation: Score(x^1, ..., x^N | model) = -[ Σ_{i=1}^{N} log2 p(x^i | model) ] / (nN), the average number of bits needed to encode the observation of a variable in the test set. This measures how well the dependency network predicts an entire case.
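A small sketch of computing this score, assuming base-2 logarithms (the slide speaks of bits) and a hypothetical `log2_prob` callable for the model's case probabilities:

```python
def log_score(test_cases, log2_prob):
    """Average number of bits per variable observation on the test set:
    -(1 / (n * N)) * sum_i log2 p(x^i | model), with N test cases of n variables each.

    log2_prob(case) is assumed to return log2 p(case | model).
    """
    N = len(test_cases)
    n = len(test_cases[0])
    return -sum(log2_prob(case) for case in test_cases) / (n * N)
```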

16
- Criterion: a user's expected utility for a list of recommendations. (P(k): the probability that a user will examine the k-th item on the recommendation list.)
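The slide leaves P(k) unspecified; as an assumption, the exponential-decay form from Breese et al. (1998) is shown below, where α is the "half-life" rank and δ_k indicates whether the k-th recommended item is actually preferred:

```latex
% Expected utility of a ranked recommendation list (Breese et al. 1998 form,
% stated here as an assumption; the slide does not give the formula):
\mathrm{E[utility]} \;=\; \sum_{k} \delta_k \, P(k),
\qquad
P(k) \;=\; 2^{-(k-1)/(\alpha-1)} .
```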

17

18 Data sets
- Sewell/Shah: college plans of high-school seniors.
- WAM: women's preferences for a career in mathematics.
- Digits: images of handwritten digits.
- Nielsen: TV shows watched five or more times by users.
- MS.COM: visits to areas of the MS.com site by its users.
- MSNBC: whether visitors to MSNBC read each of the 1001 most popular stories on the site.