Random Walks and Semi-Supervised Learning
Longin Jan Latecki

Based on:
Xiaojin Zhu. Semi-Supervised Learning with Graphs. PhD thesis, CMU-LTI-05-192, Carnegie Mellon University, May 2005.
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report, Stanford InfoLab.
Ya Xu. Semi-Supervised Learning on Graphs: A Statistical Approach. PhD thesis, Stanford University, May 2010.

A random walk propagates information on a graph based on the similarity of graph nodes. Examples: a graph of web pages; graphs of handwritten digits.

Structure of real graphs. One of the most essential characteristics is sparsity: the total number of edges in real graphs, particularly in large ones, usually grows linearly with the number of nodes n (instead of n^2, as in a fully connected graph). In the four real graphs tabulated in the source, the average degrees are below ten even though the graphs have thousands of nodes. Sparsity makes it possible to scale many prediction algorithms to very large graphs. Another key property is the degree distribution, which gives the probability that a randomly picked node has a certain degree. It typically follows a power-law distribution, meaning that only a very small fraction of nodes have very large degrees.

Semi-Supervised Learning: large amounts of unlabeled training data, but only a limited amount of labeled data.

Graph Construction Example. Vertices are objects; the edge weights are determined by shape similarity.

A random walk propagates similarities on the data manifold.

Key idea of random walk label propagation. Pairwise similarity lacks accuracy in some cases; similarity propagated among objects helps reveal their intrinsic relations. In the slide's example, the two horses are so different that the dog looks more similar to the left horse. However, their similarity is revealed in the context of the other horses: a random walk allows us to find paths that link them.

Problem Setup. Let {(x_1, y_1), ..., (x_l, y_l)} be the labeled data, with y in {1, ..., C}, and {x_{l+1}, ..., x_{l+u}} the unlabeled data; usually l << u. Hence we have a total of n = l + u data points. We will often use L and U to denote the sets of labeled and unlabeled data. We assume the number of classes C is known, and that all classes are present in the labeled data. We study the transductive problem of finding the labels for U; the inductive problem of finding labels for points outside of L and U will not be discussed. Intuitively, we want data points that are similar to have the same label. We create a graph whose nodes are all the data points, both labeled and unlabeled. The edge between nodes i and j represents their similarity. We assume the graph is fully connected with the weights w_ij = exp(−‖x_i − x_j‖^2 / α^2), where α is a bandwidth hyperparameter.
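As an illustration, here is a minimal NumPy sketch of this graph construction; the function names are my own, and zeroing the diagonal (no self-loops) is a choice the slides do not state.

```python
import numpy as np

def gaussian_weights(X, alpha=1.0):
    """Fully connected similarity graph: w_ij = exp(-||x_i - x_j||^2 / alpha^2)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / alpha**2)
    np.fill_diagonal(W, 0.0)  # no self-loops (an assumption, not from the slides)
    return W

def transition_matrix(W):
    """Row-normalize W into the random-walk transition matrix P = D^{-1} W."""
    return W / W.sum(axis=1, keepdims=True)
```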

Label Propagation Algorithm. Assume for simplicity that there are only two label classes, C = {0, 1}. Y_L is the l × 1 vector of labels of the labeled points. f is an n × 1 vector of label probabilities, e.g., entry f(i) is the probability that y_i = 1. We require f(i) = y_i if i belongs to L; the initialization of f for the unlabeled points is not important. Let P = D^{-1}W be the transition matrix obtained by row-normalizing the weight matrix W. The label propagation algorithm repeats two steps until convergence: propagate, f ← P f; then clamp the labeled data, f(i) = y_i for all i in L. Equivalently, only the unlabeled part needs to be updated: f_U ← P_UU f_U + P_UL Y_L, where P_UU and P_UL denote the blocks of P indexed by (U, U) and (U, L).
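A sketch of the iteration in the same NumPy setting, with binary labels as assumed above; the fixed iteration count stands in for a proper convergence test.

```python
import numpy as np

def label_propagation(P, Y_L, labeled_idx, n_iter=1000):
    """Iterate f <- P f, clamping the labeled entries after every step.

    P           : n x n row-stochastic transition matrix
    Y_L         : length-l array of 0/1 labels for the labeled points
    labeled_idx : indices of the labeled points in the graph
    """
    n = P.shape[0]
    f = np.zeros(n)               # initialization on U is unimportant
    f[labeled_idx] = Y_L
    for _ in range(n_iter):
        f = P @ f                 # propagate
        f[labeled_idx] = Y_L      # clamp: f(i) = y_i for i in L
    return f
```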

Random Walk Interpretation. Imagine a random walk on the graph: starting from an unlabeled node i, we move to a node j with probability P_ij after one step, and the walk stops when we hit a labeled node. Then f(i) is the probability that the random walk, starting from node i, hits a labeled node with label 1. Here the labeled nodes are viewed as absorbing states. This interpretation requires that the random walk is irreducible and aperiodic.
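This interpretation can be checked by simulation; a rough Monte Carlo sketch, with sampling details that are my own choices.

```python
import numpy as np

def hitting_probability(P, Y_L, labeled_idx, start, n_walks=5000, seed=0):
    """Estimate f(start): the probability that a walk from `start`
    is absorbed at a labeled node carrying label 1."""
    rng = np.random.default_rng(seed)
    label_of = dict(zip(labeled_idx, Y_L))
    hits = 0
    for _ in range(n_walks):
        node = start
        while node not in label_of:                # walk until absorbed
            node = rng.choice(len(P), p=P[node])
        hits += label_of[node]
    return hits / n_walks
```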

Random Walk Hitting Probability. According to the hitting-probability result (slide 30 of the Markov Process lecture), we need to solve f = P f with the labeled entries held fixed. Since f_L is fixed to the given labels Y_L, we only need to find f_U: f_U = (I − P_UU)^{-1} P_UL Y_L. The fact that the matrix inverse exists follows from the next slides.

For the Label Propagation algorithm we want to solve the fixed-point equation f_U = P_UU f_U + P_UL Y_L. Iterating the update n times gives f_U^{(n)} = (P_UU)^n f_U^{(0)} + (Σ_{k=0}^{n−1} (P_UU)^k) P_UL Y_L. Since P is row-normalized and P_UU is a sub-matrix of P that omits the columns of the labeled nodes, the sum of each row of P_UU is smaller than 1. Hence (P_UU)^n → 0 as n → ∞, the initial value of f_U is forgotten, and we obtain the closed-form solution f_U = (I − P_UU)^{-1} P_UL Y_L.
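The closed form is direct to implement; a sketch in the same notation, using a linear solve rather than an explicit inverse.

```python
import numpy as np

def closed_form_f_U(P, Y_L, labeled_idx, unlabeled_idx):
    """f_U = (I - P_UU)^{-1} P_UL Y_L, the fixed point of the propagation."""
    P_UU = P[np.ix_(unlabeled_idx, unlabeled_idx)]
    P_UL = P[np.ix_(unlabeled_idx, labeled_idx)]
    I = np.eye(len(unlabeled_idx))
    return np.linalg.solve(I - P_UU, P_UL @ Y_L)  # solve, not an explicit inverse
```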

The identity (I − A)^{-1} = Σ_{k=0}^{∞} A^k holds if and only if the maximum of the absolute values of the eigenvalues of A (the spectral radius of A) is smaller than 1. Since we assume that A has nonnegative entries, this maximum is smaller than or equal to the maximum of the row-wise sums of the matrix A. Therefore, it is sufficient to show that the sum of each row of A = P_UU is smaller than 1.
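A quick numerical illustration of this Neumann-series identity; the 2 × 2 matrix is an arbitrary example with nonnegative entries and row sums below 1.

```python
import numpy as np

A = np.array([[0.2, 0.3],
              [0.1, 0.4]])   # nonnegative, every row sums to less than 1

neumann = sum(np.linalg.matrix_power(A, k) for k in range(200))
direct = np.linalg.inv(np.eye(2) - A)
print(np.allclose(neumann, direct))   # True: (I - A)^{-1} = sum_k A^k
```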

Energy Minimization Interpretation. The solution f minimizes the energy function E(f) = ½ Σ_{i,j} w_ij (f(i) − f(j))^2 over all functions f, under the constraint that f(i) = y_i if i belongs to L. This energy can be equivalently written as E(f) = f^T L f, where L = D − W is the graph Laplacian and D is a diagonal matrix with D_ii = Σ_j w_ij. The solution f also minimizes, for large λ > 0, the unconstrained objective f^T L f + λ Σ_{i∈L} (f(i) − Y^{(0)}(i))^2, where Y^{(0)} = Y_L on the labeled points.
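The equivalence of the pairwise energy and the quadratic form f^T L f can be verified numerically; a minimal check on random symmetric weights.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
W = rng.random((n, n)); W = (W + W.T) / 2   # random symmetric weight matrix
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W              # unnormalized graph Laplacian
f = rng.random(n)

pairwise = 0.5 * sum(W[i, j] * (f[i] - f[j]) ** 2
                     for i in range(n) for j in range(n))
print(np.allclose(f @ L @ f, pairwise))     # True
```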

More on the Graph Laplacian. We are given an undirected graph G = (V, E, W), where V is a set of n vertices, E is the set of edges, and W is a symmetric n × n weight (or affinity) matrix with all nonnegative entries such that w(i, j) = 0 if and only if there is no edge between nodes i and j. We consider two different definitions of the graph Laplacian: the unnormalized Laplacian L = D − W, and the normalized Laplacian D^{-1/2} (D − W) D^{-1/2} = I − D^{-1/2} W D^{-1/2}, where D is the diagonal degree matrix with D_ii = Σ_j w(i, j).

Example of a Graph Laplacian. Compute the Laplacian for the graph shown on the slide (the figure does not survive in this transcript); a worked stand-in follows below.
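Since the slide's graph is unavailable, the following sketch assumes a 4-node path graph with unit edge weights and computes both Laplacian variants.

```python
import numpy as np

# Assumed example: path graph 1 - 2 - 3 - 4 with unit edge weights.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=1))

L_unnorm = D - W                               # L = D - W
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
L_norm = D_inv_sqrt @ L_unnorm @ D_inv_sqrt    # I - D^{-1/2} W D^{-1/2}

print(L_unnorm)
print(L_norm)
```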

The graph Laplacian has the following properties: (a) for every vector f, f^T L f = ½ Σ_{i,j} w(i, j) (f(i) − f(j))^2; (b) L is symmetric and positive semi-definite; (c) the smallest eigenvalue of L is 0, and a corresponding eigenvector is the constant one vector 1.

Proof: (a) f^T L f = f^T D f − f^T W f = Σ_i D_ii f(i)^2 − Σ_{i,j} w(i, j) f(i) f(j) = ½ (Σ_i D_ii f(i)^2 − 2 Σ_{i,j} w(i, j) f(i) f(j) + Σ_j D_jj f(j)^2) = ½ Σ_{i,j} w(i, j) (f(i) − f(j))^2. (b) follows directly from (a): the quadratic form is nonnegative, and L is symmetric since W and D are. (c) is a trivial result of (a) and the following: L·1 = D·1 − W·1 = 0, since each row of W sums to the corresponding diagonal entry of D.
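A numerical check of properties (b) and (c) on the same assumed path graph.

```python
import numpy as np

W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W

eigvals, eigvecs = np.linalg.eigh(L)     # symmetric L: real eigenvalues, ascending
print(np.all(eigvals >= -1e-12))         # (b) positive semi-definite
print(np.isclose(eigvals[0], 0.0))       # (c) smallest eigenvalue is 0 ...
print(np.allclose(L @ np.ones(4), 0.0))  # ... with eigenvector the constant vector
```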