Learning in Bayesian Networks

Learning in Bayesian Networks

The Learning Problem
– Known Structure, Complete Data
– Known Structure, Incomplete Data
– Unknown Structure, Complete Data
– Unknown Structure, Incomplete Data

Known Structure Complete Data

Known Structure Incomplete Data

Unknown Structure Complete Data

Unknown Structure Incomplete Data

Known Structure: two different learning methods, A and B, applied to the same data may produce two different sets of CPTs (CPTs A and CPTs B) for the same structure.

Known Structure: the structure together with CPTs A defines a distribution Pr_A, and together with CPTs B a distribution Pr_B. Which probability distribution should we choose? Common criterion: choose the distribution that maximizes the likelihood of the data.

Known Structure: given data D = d_1, …, d_m,
– Likelihood of the data given Pr_A: Pr_A(D) = Pr_A(d_1) × … × Pr_A(d_m)
– Likelihood of the data given Pr_B: Pr_B(D) = Pr_B(d_1) × … × Pr_B(d_m)
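
To make the criterion concrete, here is a minimal sketch of comparing two candidate parameterizations by data likelihood. The `prob_a` / `prob_b` functions are hypothetical stand-ins for whatever routine evaluates Pr(d) for a complete record under a given set of CPTs; they are not part of the slides.

```python
import math

def log_likelihood(data, prob):
    """log Pr(D) = log Pr(d_1) + ... + log Pr(d_m); summing logs avoids underflow."""
    return sum(math.log(prob(d)) for d in data)

# Hypothetical usage: prefer the parameterization with the higher data likelihood.
# best = "A" if log_likelihood(D, prob_a) > log_likelihood(D, prob_b) else "B"
```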

Maximizing Likelihood of Data
– Complete data: there is a unique set of CPTs that maximizes the likelihood of the data.
– Incomplete data: there is no unique set of CPTs that maximizes the likelihood of the data.

Known Structure, Complete Data: given data D = d_1, …, d_6, each parameter is estimated from counts, e.g.
θ(d | b, c) = (number of data points d_i containing d, b, c) / (number of data points d_i containing b, c)

Known Structure, Complete Data: every CPT parameter of the network is estimated from the data D in this way.
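
A minimal sketch of the counting estimate above, assuming each data point is a dict mapping variable names to values; the names `child` and `parents` are illustrative, not from the slides.

```python
from collections import Counter

def estimate_cpt(data, child, parents):
    """Maximum-likelihood CPT from complete data:
    theta(x | u) = #(records with X=x and U=u) / #(records with U=u)."""
    family_counts = Counter()   # counts of (parent instantiation u, child value x)
    parent_counts = Counter()   # counts of parent instantiation u alone
    for record in data:
        u = tuple(record[p] for p in parents)
        family_counts[(u, record[child])] += 1
        parent_counts[u] += 1
    return {(u, x): n / parent_counts[u] for (u, x), n in family_counts.items()}

# e.g. theta(d | b, c) for the example network:
# cpt = estimate_cpt(data, child="D", parents=["B", "C"])
```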

Complexity. Network with:
– Nodes: n
– Parameters: k
– Data points: m
Time complexity: O(mk) (straightforward implementation)
Space complexity: O(k + mn) (parameter count + space for data)

Known Structure, Incomplete Data: parameters at iteration i+1 are estimated using the CPTs at iteration i, via expected counts under Pr_i:
θ_{i+1}(x | u) = Σ_j Pr_i(x, u | d_j) / Σ_j Pr_i(u | d_j)
Pr_0 corresponds to the initial Bayesian network (random CPTs).

Known Structure, Incomplete Data: EM Algorithm (Expectation-Maximization)
– Initialize the CPTs to random values.
– Repeat until convergence:
  – Estimate the parameters using the current CPTs (E-step).
  – Update the CPTs using these estimates (M-step).
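
A minimal sketch of that loop, assuming a hypothetical `network` object that exposes the pieces the slides rely on (random initial CPTs, the family instantiations (x, u), posterior family probabilities given a record, and the data log-likelihood); none of these helper names come from the slides.

```python
def em(network, data, max_iters=100, tol=1e-4):
    """EM for CPT learning with incomplete data (sketch)."""
    cpts = network.random_cpts()              # Pr_0: random initial CPTs
    prev_ll = float("-inf")
    for _ in range(max_iters):
        # E-step: expected counts for every family instantiation (x, u).
        num, den = {}, {}
        for x, u in network.family_instantiations():
            num[(x, u)] = sum(network.posterior(cpts, record, x, u) for record in data)
            den[u] = den.get(u, 0.0) + num[(x, u)]
        # M-step: normalize expected counts into new CPT entries theta(x | u).
        cpts = {(x, u): num[(x, u)] / den[u] for (x, u) in num}
        # Likelihood never decreases; stop when the improvement becomes very small.
        ll = network.log_likelihood(cpts, data)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return cpts
```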

EM Algorithm
– The likelihood of the data cannot decrease after an iteration.
– The algorithm is not guaranteed to return the network that globally maximizes the likelihood of the data; it is only guaranteed to reach a local maximum, which is why random restarts are used.
– The algorithm is stopped when the change in likelihood, or the change in parameters, becomes very small.

Complexity. Network with:
– Nodes: n
– Parameters: k
– Data points: m
– Treewidth: w
Time complexity (per iteration): O(m k n 2^w) (straightforward implementation)
Space complexity: O(k + nm + n 2^w) (parameter count + space for data + space for inference)

Collaborative Filtering: collaborative filtering (CF) finds items of interest to a user based on the preferences of other, similar users. It assumes that human behavior is predictable.

Where is it used?
– E-commerce: recommend products based on previous purchases or click-stream behavior (e.g., Amazon.com)
– Information sites: rate items based on previous user ratings (e.g., MovieLens, Jester)

CF: an example user/title vote matrix (missing votes marked "-"):
User     Title 1  Title 2  Title 3  Title 4
John        5        -        3        2
Sam         -        4        1        5
Cindy       3        -        5        -
Bob         ?        ?        ?        ?   (active user: votes to be predicted)

Memory-based Algorithms
– Use the entire database of user ratings to make predictions.
– Find users with similar voting histories to the active user.
– Use these users' votes to predict ratings for items not voted on by the active user.
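
A minimal correlation-weighted sketch in the spirit of the memory-based algorithms compared later (the Correlation method of Breese et al.); vote histories are assumed to be dicts mapping title to rating, and the helper names are illustrative.

```python
import math

def pearson(a_votes, i_votes):
    """Pearson correlation over the titles both users have voted on."""
    common = set(a_votes) & set(i_votes)
    if len(common) < 2:
        return 0.0
    ma = sum(a_votes[j] for j in common) / len(common)
    mi = sum(i_votes[j] for j in common) / len(common)
    cov = sum((a_votes[j] - ma) * (i_votes[j] - mi) for j in common)
    da = math.sqrt(sum((a_votes[j] - ma) ** 2 for j in common))
    di = math.sqrt(sum((i_votes[j] - mi) ** 2 for j in common))
    return cov / (da * di) if da and di else 0.0

def predict(active, others, item):
    """Predict the active user's vote on `item` as a correlation-weighted
    deviation from each similar user's mean vote."""
    mean_a = sum(active.values()) / len(active)
    num = den = 0.0
    for votes in others:
        if item not in votes:
            continue
        w = pearson(active, votes)
        mean_i = sum(votes.values()) / len(votes)
        num += w * (votes[item] - mean_i)
        den += abs(w)
    return mean_a + num / den if den else mean_a
```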

Model-based Algorithms
– Construct a model from the vote database.
– Use the model to predict the active user's ratings.

Bayesian Clustering
– Use a naïve Bayes network to model the vote database.
– m vote variables, one for each title, representing discrete vote values.
– One "cluster" variable, representing user personality types.

Naïve Bayes structure: a single cluster variable C is the parent of all vote variables V_1, V_2, V_3, …, V_m.

Inference
– Evidence: known votes v_k for titles k ∈ I
– Query: title j for which we need to predict the vote
– Expected value of the vote: E[v_j] = Σ_{h=1}^{w} h · Pr(v_j = h | v_k : k ∈ I)
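
A minimal sketch of that computation under the naïve Bayes model, assuming the CPTs are stored as plain dicts (cpts["C"][c] = Pr(C = c) and cpts[k][c][v] = Pr(V_k = v | C = c)); this representation is an assumption for illustration, not the slides' implementation.

```python
def expected_vote(cpts, evidence, j, vote_values):
    """E[v_j] = sum_h h * Pr(v_j = h | v_k, k in I) under the naive Bayes model."""
    # Posterior over the cluster variable C given the observed votes.
    post = {c: cpts["C"][c] for c in cpts["C"]}
    for k, v in evidence.items():
        for c in post:
            post[c] *= cpts[k][c][v]
    z = sum(post.values())
    post = {c: p / z for c, p in post.items()}
    # Expected value of the queried vote, marginalizing over C.
    return sum(h * sum(post[c] * cpts[j][c][h] for c in post) for h in vote_values)
```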

Learning
– Simplified Expectation-Maximization (EM) algorithm with partial data.
– Initialize the CPTs with random values subject to the following constraints:
  Pr(c) = θ_c with Σ_c θ_c = 1, and Pr(v_k | c) = θ_{v_k|c} with Σ_{v_k} θ_{v_k|c} = 1.
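
A minimal sketch of that random initialization, using the same dict layout as the inference sketch above; normalizing each random vector enforces the two constraints.

```python
import random

def random_cpts(num_clusters, titles, vote_values):
    """Random initial CPTs satisfying sum_c Pr(c) = 1 and
    sum_v Pr(v_k = v | c) = 1 for every title k and cluster c."""
    def normalized(n):
        xs = [random.random() for _ in range(n)]
        s = sum(xs)
        return [x / s for x in xs]

    cpts = {"C": dict(zip(range(num_clusters), normalized(num_clusters)))}
    for k in titles:
        cpts[k] = {c: dict(zip(vote_values, normalized(len(vote_values))))
                   for c in range(num_clusters)}
    return cpts
```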

Datasets
– MovieLens: 943 users; 1,682 titles; 100,000 votes (1–5); explicit voting.
– MS Web (website visits): 610 users; 294 titles; 8,275 votes (0/1); treating null votes as 0 gives 179,340 votes; implicit voting.

Learning curve for MovieLens Dataset

Protocols
– The user database is divided into an 80% training set and a 20% test set.
– One by one, select a user from the test set to be the active user.
– Predict some of their votes based on their remaining votes.

All-But-One and Given-{Two, Five, Ten}: for each active user, the observed votes I_a are split into evidence votes and query votes. All-But-One reveals every vote except one, which is queried; Given-Two, Given-Five, and Given-Ten reveal only two, five, or ten votes as evidence and query all the rest.
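
A minimal sketch of these splits for a single test user, assuming votes are stored as a dict from title to rating; the function names are illustrative.

```python
import random

def given_n_split(user_votes, n):
    """Given-N protocol: reveal n randomly chosen votes as evidence,
    hold the rest out as query items to be predicted."""
    items = list(user_votes)
    random.shuffle(items)
    observed = {j: user_votes[j] for j in items[:n]}
    held_out = {j: user_votes[j] for j in items[n:]}
    return observed, held_out

def all_but_one_split(user_votes):
    """All-But-One protocol: hold out a single vote, reveal all the others."""
    held = random.choice(list(user_votes))
    observed = {j: v for j, v in user_votes.items() if j != held}
    return observed, {held: user_votes[held]}
```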

Evaluation Metrics
– Average absolute deviation
– Ranked scoring
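
Minimal sketches of the two metrics for a single test user, assuming predictions and actual votes are dicts keyed by title; the ranked score follows the half-life utility of Breese et al. (1998), with the per-user normalization and the default/neutral vote value left as assumptions.

```python
def average_absolute_deviation(predictions, actual):
    """Mean |predicted - actual| over the held-out votes that were predicted."""
    pairs = [(predictions[j], actual[j]) for j in actual if j in predictions]
    return sum(abs(p - a) for p, a in pairs) / len(pairs)

def ranked_score(predictions, actual, default_vote, half_life=5):
    """Half-life utility of a ranked recommendation list: items are sorted by
    predicted vote, and an item's utility decays by a factor of 2 every
    `half_life` positions down the list."""
    ranked = sorted(predictions, key=predictions.get, reverse=True)
    return sum(max(actual.get(item, default_vote) - default_vote, 0)
               / 2 ** (rank / (half_life - 1))
               for rank, item in enumerate(ranked))
```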

Results: experiments were run 5 times and averaged. MovieLens: scores for the Correlation, VecSim, and BC(9) algorithms under the Given-Two, Given-Five, Given-Ten, and All-But-One protocols.

MS Web: scores for the Correlation, VecSim, and BC(9) algorithms under the same four protocols.

Computational Issues
– Prediction time: 10 minutes per experiment (memory-based); 2 minutes (model-based)
– Learning time: 20 minutes per iteration
– n: number of data points; m: number of titles; w: number of votes per title; |C|: number of personality types

Algorithm       Prediction Time   Learning Time    Space
Memory-based    O(n·m)            N/A              O(n·m)
Model-based     O(|C|·m)          O(n·m·|C|·w)     O(|C|·m·w)

Demo of SamIam
– Building networks: nodes, edges, CPTs
– Inference: posterior marginals, MPE, MAP
– Learning: EM
– Sensitivity engine