Published by Jeffrey Heath. Modified over 9 years ago.
1
Active Learning for Networked Data Based on Non-progressive Diffusion Model. Zhilin Yang, Jie Tang, Bin Xu, Chunxiao Xing. Dept. of Computer Science and Technology, Tsinghua University, China
2
An Example
3
[Figure: instances connected by correlation edges]
4
An Example. [Figure: instances connected by correlation edges, all labels unknown (?)] Task: classify each instance into {+1, -1}.
5
An Example. [Figure: instances connected by correlation edges; some instances labeled +1, the others unknown (?)]
6
An Example. [Figure: instances connected by correlation edges; some instances labeled +1, the others unknown (?)] Query for a label.
7
An Example. [Figure: the instance graph after the query; some instances labeled +1, the others unknown (?)]
8
Problem: Active Learning for Networked Data. [Figure: instances connected by correlation edges; some instances labeled +1, the others unknown (?)] Challenge: it is expensive to query for labels! Questions: Which instances should we select to query? How many instances do we need to query to obtain an accurate classifier?
9
Challenges of Active Learning for Networked Data: How can we leverage the network correlation among instances? How can we query in batch mode?
10
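Batch Mode Active Learning for Networked Data. Given a graph $G = (V^U, V^L, \mathbf{y}^L, E, X)$ with unlabeled instances $V^U$, labeled instances $V^L$, labels of labeled instances $\mathbf{y}^L$, edges $E$, and feature matrix $X$, our objective is
$$\max_{V^S \subseteq V^U} Q(V^S) \quad \text{subject to} \quad |V^S| \le k,$$
where $V^S$ is a subset of unlabeled instances, $Q$ is the utility function, and $k$ is the labeling budget.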
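To pin down what is being maximized, here is a minimal brute-force sketch in Python. It assumes the utility Q is an arbitrary set function supplied by the caller and enumerates every subset of at most k unlabeled instances. The names `best_batch` and `coverage`, and the toy coverage utility itself, are hypothetical illustrations, not the method proposed in the talk; enumeration is exponential, so this is only useful on toy graphs.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Hashable, Iterable, Tuple


def best_batch(
    unlabeled: Iterable[Hashable],
    utility: Callable[[FrozenSet[Hashable]], float],
    k: int,
) -> Tuple[FrozenSet[Hashable], float]:
    """Return the subset S of the unlabeled instances with |S| <= k that
    maximizes utility(S), found by exhaustive enumeration (toy sizes only)."""
    pool = list(unlabeled)
    best_set: FrozenSet[Hashable] = frozenset()
    best_val = utility(best_set)
    for size in range(1, k + 1):
        for combo in combinations(pool, size):
            candidate = frozenset(combo)
            value = utility(candidate)
            if value > best_val:  # strict '>' keeps the first maximizer found
                best_set, best_val = candidate, value
    return best_set, best_val


# Hypothetical utility: number of instances that are queried or adjacent to a queried one.
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]


def coverage(selected: FrozenSet[Hashable]) -> float:
    covered = set(selected)
    for u, v in EDGES:
        if u in selected:
            covered.add(v)
        if v in selected:
            covered.add(u)
    return float(len(covered))


batch, value = best_batch(["a", "b", "c", "d", "e"], coverage, k=2)
print(sorted(batch), value)
```

On the five-node path, a budget of two queries is enough to cover every instance.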
11
Factor Graph Model. [Figure: factor graph whose variable nodes (unknown labels, ?) are connected through factor nodes]
12
Factor Graph Model: the joint probability, the local factor function, the edge factor function, and the log likelihood of labeled instances.
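The slide names these quantities without their formulas. The Python sketch below assumes a common log-linear parameterization, which is an assumption and not taken from the slide: labels y_i in {+1, -1}, a local factor f(x_i, y_i) = exp(y_i * w^T x_i), and an edge factor g(y_i, y_j) = exp(beta * y_i * y_j). The joint probability is the normalized product of all factors, and the log likelihood of the labeled instances sums that probability over every labeling consistent with the observed labels. Everything is computed by enumerating all 2^n labelings, so it only works for tiny graphs; all function names are hypothetical.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

LABELS = (1, -1)  # binary labels y_i in {+1, -1}


def log_score(y: Sequence[int], X: List[List[float]], w: List[float],
              edges: List[Tuple[int, int]], beta: float) -> float:
    """Log of the unnormalized joint: sum of log local factors plus log edge factors.
    Assumed forms: f(x_i, y_i) = exp(y_i * w.x_i), g(y_i, y_j) = exp(beta * y_i * y_j)."""
    s = sum(yi * sum(wk * xk for wk, xk in zip(w, X[i])) for i, yi in enumerate(y))
    s += sum(beta * y[i] * y[j] for i, j in edges)
    return s


def logsumexp(vals: List[float]) -> float:
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def log_partition(X, w, edges, beta) -> float:
    # log Z, summing over all 2^n labelings
    return logsumexp([log_score(y, X, w, edges, beta)
                      for y in itertools.product(LABELS, repeat=len(X))])


def joint_prob(y, X, w, edges, beta) -> float:
    """p(y | G) = (1/Z) * prod_i f(x_i, y_i) * prod_(i,j) g(y_i, y_j)."""
    return math.exp(log_score(y, X, w, edges, beta) - log_partition(X, w, edges, beta))


def labeled_log_likelihood(labels: Dict[int, int], X, w, edges, beta) -> float:
    """log of the sum of p(y | G) over labelings y that agree with the observed labels."""
    consistent = [
        log_score(y, X, w, edges, beta)
        for y in itertools.product(LABELS, repeat=len(X))
        if all(y[i] == lab for i, lab in labels.items())
    ]
    return logsumexp(consistent) - log_partition(X, w, edges, beta)


X = [[1.0], [0.5], [-1.0]]  # one feature per instance
w = [0.8]
edges = [(0, 1), (1, 2)]
beta = 0.5
print(labeled_log_likelihood({0: 1}, X, w, edges, beta))
```

Learning would adjust w and beta to increase this log likelihood; exact enumeration is what becomes intractable on real graphs.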
13
Factor Graph Model Learning: the parameters are learned by gradient descent. The expectations required by the gradient are calculated with Loopy Belief Propagation (LBP), which passes messages from variables to factors and messages from factors to variables.
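A minimal sum-product LBP sketch in Python for binary labels is shown below. It assumes one local factor per variable, given directly as the numbers `local[i][y]`, and an edge factor g(y_i, y_j) = exp(beta * y_i * y_j) on every edge; these factor forms and all names are assumptions for illustration. It alternates the two message types named on the slide and returns approximate marginals, from which the expectation E[y_i] = P(y_i = +1) - P(y_i = -1) follows.

```python
import math
from typing import Dict, List, Tuple

LABELS = (1, -1)
Msg = Dict[int, float]


def normalize(msg: Msg) -> Msg:
    z = sum(msg.values())
    return {y: v / z for y, v in msg.items()}


def loopy_bp(local: List[Msg], edges: List[Tuple[int, int]], beta: float,
             iters: int = 50) -> List[Msg]:
    """Sum-product LBP with one local factor per variable and one edge factor per edge."""
    n = len(local)
    edges_of: Dict[int, List[int]] = {i: [] for i in range(n)}
    for e, (i, j) in enumerate(edges):
        edges_of[i].append(e)
        edges_of[j].append(e)

    def g(a: int, b: int) -> float:  # assumed edge factor
        return math.exp(beta * a * b)

    # Factor-to-variable messages nu[(edge, variable)], initialized uniform.
    nu = {(e, v): {y: 1.0 for y in LABELS} for e, (i, j) in enumerate(edges) for v in (i, j)}
    for _ in range(iters):
        # Variable-to-factor: local factor times messages from all *other* edge factors.
        mu = {}
        for e, (i, j) in enumerate(edges):
            for v in (i, j):
                msg = {}
                for y in LABELS:
                    val = local[v][y]
                    for e2 in edges_of[v]:
                        if e2 != e:
                            val *= nu[(e2, v)][y]
                    msg[y] = val
                mu[(v, e)] = normalize(msg)
        # Factor-to-variable: sum out the other endpoint of the edge factor.
        nu = {
            (e, dst): normalize({yd: sum(g(ys, yd) * mu[(src, e)][ys] for ys in LABELS)
                                 for yd in LABELS})
            for e, (i, j) in enumerate(edges)
            for src, dst in ((i, j), (j, i))
        }
    beliefs = []
    for v in range(n):
        b = dict(local[v])
        for e in edges_of[v]:
            for y in LABELS:
                b[y] *= nu[(e, v)][y]
        beliefs.append(normalize(b))
    return beliefs


local = [{1: 2.0, -1: 1.0}, {1: 1.0, -1: 1.0}, {1: 0.5, -1: 1.5}]
edges = [(0, 1), (1, 2)]
beliefs = loopy_bp(local, edges, beta=0.7)
expectations = [b[1] - b[-1] for b in beliefs]  # E[y_i] under the approximate marginals
print(expectations)
```

On a tree-structured graph such as this three-node chain, the messages converge to the exact marginals; on graphs with cycles LBP is only an approximation and is not guaranteed to converge.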
14
Question: How should we select instances from the factor graph for active learning?
15
Basic principle: Maximize the Ripple Effects. [Figure: instance graph, all labels unknown (?)]
16
Maximize the Ripple Effects. [Figure: one instance queried and labeled +1, the others unknown (?)] Labeling information is propagated.
18
Maximize the Ripple Effects. [Figure: one instance labeled +1, the others unknown (?)] Labeling information is propagated, and statistical bias is propagated. How can we model the propagation process in an unlabeled network?
19
Diffusion Models: the Linear Threshold Model, in its progressive and non-progressive variants. [Figure: linear threshold diffusion]
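The difference between the two variants can be seen in a small simulation of the linear threshold model, sketched in Python below. It assumes synchronous updates, incoming edge weights `in_weights[v][u]`, and a per-node threshold: a node is active at step t+1 if the total weight from its neighbors that are active at step t reaches its threshold. In the progressive variant an active node stays active forever; in the non-progressive variant every node, including the seeds, is re-evaluated at each step. The function and variable names are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, List


def lt_simulate(
    in_weights: Dict[Hashable, Dict[Hashable, float]],
    thresholds: Dict[Hashable, float],
    seeds: Iterable[Hashable],
    steps: int,
    progressive: bool,
) -> List[FrozenSet[Hashable]]:
    """Synchronous linear threshold diffusion; returns the active set at every step."""
    active = frozenset(seeds)
    history = [active]
    for _ in range(steps):
        # Progressive: active nodes stay active. Non-progressive: every node is re-evaluated.
        nxt = set(active) if progressive else set()
        for v, nbrs in in_weights.items():
            pressure = sum(wt for u, wt in nbrs.items() if u in active)
            if pressure >= thresholds[v]:
                nxt.add(v)
        active = frozenset(nxt)
        history.append(active)
    return history


# Path 0 - 1 - 2; each node's incoming weight is split evenly over its neighbors.
PATH = {0: {1: 1.0}, 1: {0: 0.5, 2: 0.5}, 2: {1: 1.0}}
THRESH = {v: 0.5 for v in PATH}

for mode in (True, False):
    states = lt_simulate(PATH, THRESH, seeds={0}, steps=3, progressive=mode)
    print("progressive" if mode else "non-progressive", [sorted(s) for s in states])
```

With seed {0}, the progressive model activates every node by step 2, whereas the non-progressive model oscillates between {1} and {0, 2}: activation is not permanent.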
20
Maximize the Ripple Effects. [Figure: one instance labeled +1, the others unknown (?)] Labeling information is propagated, and statistical bias is propagated. Will each instance be dominated by labeling information (active) or by statistical bias (inactive)? Based on the non-progressive diffusion model, we maximize the number of activated instances in the end. We aim to activate the most uncertain instances!
21
Instantiate the Problem: Active Learning Based on the Non-Progressive Diffusion Model. The objective is the number of activated instances, subject to these constraints: all queried instances are initially activated; we activate the most uncertain instances; activation follows the non-progressive diffusion.
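As an illustration only, the instantiated objective can be sketched as a set function: start with the queried instances active, run non-progressive linear threshold updates for a fixed horizon, and count the active instances at the end. In the Python sketch below, the fixed horizon, the synchronous updates, the optional `targets` set (for example, the most uncertain instances), and all names are assumptions; the exhaustive search is not the algorithm from the talk.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Iterable, Optional, Set, Tuple

Weights = Dict[Hashable, Dict[Hashable, float]]


def activated_at_end(in_weights: Weights, thresholds: Dict[Hashable, float],
                     queried: Iterable[Hashable], horizon: int,
                     targets: Optional[Set[Hashable]] = None) -> int:
    """Q(S): activate the queried instances initially, run synchronous non-progressive
    linear threshold updates for `horizon` steps, then count the active instances
    (only those in `targets`, when given)."""
    active = frozenset(queried)
    for _ in range(horizon):
        active = frozenset(
            v for v, nbrs in in_weights.items()
            if sum(wt for u, wt in nbrs.items() if u in active) >= thresholds[v]
        )
    counted = active if targets is None else active & targets
    return len(counted)


def best_query_set(in_weights: Weights, thresholds: Dict[Hashable, float], k: int,
                   horizon: int, targets: Optional[Set[Hashable]] = None
                   ) -> Tuple[FrozenSet[Hashable], int]:
    """Exhaustive search over query sets of size <= k (toy graphs only)."""
    best: FrozenSet[Hashable] = frozenset()
    best_val = activated_at_end(in_weights, thresholds, best, horizon, targets)
    for size in range(1, k + 1):
        for combo in combinations(list(in_weights), size):
            val = activated_at_end(in_weights, thresholds, combo, horizon, targets)
            if val > best_val:
                best, best_val = frozenset(combo), val
    return best, best_val


# Star graph: hub 0 with leaves 1, 2, 3; each node's incoming weights sum to 1.
STAR = {0: {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}, 1: {0: 1.0}, 2: {0: 1.0}, 3: {0: 1.0}}
THRESH = {v: 0.5 for v in STAR}
print(best_query_set(STAR, THRESH, k=2, horizon=4))
```

In this star graph, querying two leaves beats querying the hub at horizon 4. Because synchronous non-progressive dynamics can oscillate, the answer depends on the chosen horizon.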
22
Reduce the Problem The original problem The reduced problem Constraints are inherited. Reduction procedure
23
Algorithm The reduced problem The key idea
24
Algorithm
25
Theoretical Analysis. Convergence (Lemma 1): the algorithm will converge within time. Correctness. Approximation Ratio.
26
Experiments
Datasets:
Dataset | #Variable nodes | #Factor nodes
Coauthor | 6,096 | 24,468
Slashdot | 370 | 1,686
Mobile | 314 | 513
Enron | 100 | 236
Comparison methods:
- Batch Mode Active Learning (BMAL), proposed by Shi et al.
- Influence Maximization Selection (IMS), proposed by Zhuang et al.
- Maximum Uncertainty (MU)
- Random (RAN)
- Max Coverage (MaxCo), our method
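The two simplest baselines, MU and RAN, are easy to state in code. The Python sketch below assumes the model already provides a marginal P(y_v = +1) for every unlabeled instance (for example, from LBP) and uses binary entropy as the uncertainty score; the entropy measure and the function names are assumptions, since the slide does not specify how MU measures uncertainty.

```python
import math
import random
from typing import Dict, Hashable, Iterable, List, Optional


def binary_entropy(p: float) -> float:
    """Entropy (in nats) of a binary label with P(y = +1) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def max_uncertainty(marginals: Dict[Hashable, float], k: int) -> List[Hashable]:
    """MU: pick the k instances whose predicted label is most uncertain."""
    ranked = sorted(marginals, key=lambda v: binary_entropy(marginals[v]), reverse=True)
    return ranked[:k]


def random_selection(candidates: Iterable[Hashable], k: int,
                     seed: Optional[int] = None) -> List[Hashable]:
    """RAN: pick k distinct instances uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(list(candidates), k)


marginals = {"a": 0.9, "b": 0.5, "c": 0.3, "d": 0.02}
print(max_uncertainty(marginals, 2))
print(random_selection(marginals, 2, seed=0))
```

Both baselines score instances independently, so neither accounts for the correlation between instances.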
27
Experiments Performance
28
Related Work
Active Learning for Networked Data
- H. Zhuang, J. Tang, W. Tang, T. Lou, A. Chin, and X. Wang. Actively learning to infer social ties.
- L. Shi, Y. Zhao, and J. Tang. Batch mode active learning for networked data.
- Q. Gu and J. Han. Towards active learning on graphs: an error bound minimization approach.
- O. Martinez and G. Tsechpenakis. Integration of active learning in a collaborative CRF.
Diffusion Model
- M. Fazli, M. Ghodsi, J. Habibi, P. J. Khalilabadi, V. Mirrokni, and S. S. Sadeghabad. On the non-progressive spread of influence through social networks.
- D. Kempe, J. Kleinberg, and E. Tardos. Maximizing the spread of influence through a social network.
29
Conclusion
- Connect active learning for networked data to the non-progressive diffusion model, and precisely formulate the problem.
- Propose an algorithm to solve the problem.
- Theoretically guarantee the convergence, correctness, and approximation ratio of the algorithm.
- Empirically evaluate the performance of the algorithm on four datasets of different genres.
30
Future work: Consider active learning for networked data in a streaming setting, where the data distribution and network structure change over time.
31
About Me: Zhilin Yang, kimiyoung@yeah.net. 3rd-year undergraduate at Tsinghua Univ. Applying for PhD programs this year. Data Mining & Machine Learning.
32
Thanks! kimiyoung@yeah.net