1 Heat Diffusion Model and its Applications Haixuan Yang Term Presentation Dec 2, 2005.



2 Outline
Introduction
Heat Diffusion Model
Heat Diffusion Classifiers
Heat Diffusion Ranking
Predictive Random Graph Ranking
Experiments
Conclusions and Future Work

3 Introduction - heat diffusion
Heat diffusion is a physical phenomenon: in a medium, heat always flows from positions with high temperature to positions with low temperature.
The heat kernel describes the amount of heat that one point receives from another point.
The way heat diffuses varies as the underlying geometry varies.

4 Introduction - related work
Kondor &amp; Lafferty (NIPS2002)
 Construct a diffusion kernel on a graph
 Handle discrete attributes
 Apply to a large margin classifier
 Achieve good performance in accuracy on 5 data sets from UCI
Lafferty &amp; Lebanon (JMLR2005)
 Construct a diffusion kernel on a special manifold
 Handle continuous attributes
 Restrict to text classification
 Apply to SVM
 Achieve good performance in accuracy on WebKB and Reuters
Belkin &amp; Niyogi (Neural Computation 2003)
 Reduce dimension by heat kernel and local distance
Tenenbaum et al. (Science 2000)
 Reduce dimension by local distance

5 Introduction – the ideas adopted
Similarity between heat diffusion and density: heat diffuses in the same way as the Gaussian density in the ideal case where the manifold is Euclidean space. The way heat diffuses on a manifold can therefore be understood as a generalization of the Gaussian density from Euclidean space to the manifold.
Local information is relatively accurate on a nonlinear manifold: learn local information from the k nearest neighbors.
The direct (straight-line) distance may not be accurate; a curve along the manifold may measure the distance better.

6 Introduction – different ideas
The manifold is unknown in most cases, and the solution is unknown even for a known manifold; the explicit form of the approximation to the heat kernel in (Lafferty &amp; Lebanon, JMLR2005) is a rare case.
We establish the heat diffusion equation directly on a graph: either the K nearest neighbor graph or the link graph, which is considered an approximation to the unknown manifold.
This always yields an explicit form, in any case.
For classification, the solution directly forms a classifier; for ranking, the heat kernel is applied to Web pages.

7 Heat Diffusion Model - Notations
G=(V,E): a given directed graph, where V={1,2,…,n} and E={(i,j): there is an edge from i to j}.
f_i(t): the heat at node i at time t.
RH(i,j,t,Δt): the amount of heat that node i receives from its antecedent j during the period [t, t+Δt].
DH(i,t,Δt): the amount of heat that node i diffuses to its subsequent nodes during the period [t, t+Δt].

8 Heat Diffusion Model - assumptions
RH(i,j,t,Δt) is proportional to the time period Δt.
RH(i,j,t,Δt) is proportional to the heat f_j(t) at node j.
RH(i,j,t,Δt) is zero if there is no link from j to i.
DH(i,t,Δt) is proportional to the time period Δt.
DH(i,t,Δt) is proportional to the heat f_i(t) at node i.
RH(i,j,t,Δt) is inversely proportional to the out-degree of j (the heat j diffuses is shared among its out-links).

9 Heat Diffusion Model - solution
The heat difference between f_i(t+Δt) and f_i(t) can be expressed as:
f_i(t+Δt) - f_i(t) = γΔt ( Σ_{j:(j,i)∈E} f_j(t)/d_j - f_i(t) ), where d_j is the out-degree of node j.
In matrix form: f(t+Δt) - f(t) = γΔt H f(t), where H_ij = 1/d_j if (j,i)∈E, H_ii = -1 if node i has out-links, and H_ij = 0 otherwise; for simplicity we merge the proportionality constants into the single thermal conductivity γ.
Letting Δt tend to zero, the above equation becomes df(t)/dt = γ H f(t), whose solution is f(t) = e^{γtH} f(0).
Especially, we have f(1) = e^{γH} f(0).
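As a concrete check of the closed-form solution, the sketch below builds H for a small unweighted directed graph and evaluates f(t) = e^{γtH} f(0) with SciPy's matrix exponential. The exact normalization of H used here (a 1/d_j share per out-link, -1 on the diagonal of diffusing nodes) is an assumption for illustration, not necessarily the paper's exact matrix.

```python
import numpy as np
from scipy.linalg import expm

def heat_matrix(n, edges):
    """Build H for an unweighted directed graph of n nodes.

    H[i, j] = 1/d_j if there is an edge (j, i), where d_j is the
    out-degree of j; H[i, i] = -1 for nodes with out-links; 0 otherwise.
    """
    H = np.zeros((n, n))
    out_deg = np.zeros(n)
    for j, i in edges:
        out_deg[j] += 1
    for j, i in edges:
        H[i, j] += 1.0 / out_deg[j]
    for j in range(n):
        if out_deg[j] > 0:
            H[j, j] = -1.0
    return H

def heat_at(H, f0, gamma, t=1.0):
    """Closed-form solution f(t) = exp(gamma * t * H) f(0)."""
    return expm(gamma * t * H) @ f0

# Node 0 diffuses its unit heat toward nodes 1 and 2.
H = heat_matrix(3, [(0, 1), (0, 2)])
f = heat_at(H, np.array([1.0, 0.0, 0.0]), gamma=1.0)
```

Because every column of H sums to zero, the total heat Σ_i f_i(t) is conserved over time, which is a quick sanity check on any implementation.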

10 Heat Diffusion Model – weighted graph
For weighted graphs, the heat difference between f_i(t+Δt) and f_i(t) takes the same form, with the per-edge share 1/d_j replaced by a share determined by the edge weights, so that f(t+Δt) - f(t) = γΔt H f(t) with H built from the weights.
The solution is again expressed as f(t) = e^{γtH} f(0).

11 Heat Diffusion Classifiers - Illustration
The first heat diffusion; the second heat diffusion.
NHDC: Non-propagating Heat Diffusion Classifier. PHDC: Propagating Heat Diffusion Classifier.

12 Heat Diffusion Classifiers - Illustration

13 Heat Diffusion Classifiers - Illustration

14 Heat Diffusion Classifiers - Illustration
The test point is labeled by comparing the heat received from the A class with the heat received from the B class (e.g., 0.08).

15 Heat Diffusion Classifiers - algorithm - Step 1 [Construct neighborhood graph]
 Define graph G over all data points, in both the training data set and the test data set.
 Add an edge from j to i if j is one of the K nearest neighbors of i.
 Set the edge weight w(i,j) = d(i,j) if j is one of the K nearest neighbors of i, where d(i,j) is the Euclidean distance between point i and point j.

16 Heat Diffusion Classifiers - algorithm - Step 2 [Compute the Heat Kernel]
 Compute H for NHDC from the weighted neighborhood graph.
 Compute e^{γH} for PHDC, using the equation e^{γH} = lim_{N→∞} (I + (γ/N)H)^N.

17 Heat Diffusion Classifiers - algorithm - Step 3 [Compute the Heat Distribution]
For each class c:
 Set f(0): nodes labeled by class c have an initial unit heat at time 0; all other nodes have no heat at time 0.
 Compute the heat distribution: in PHDC, use f(1) = e^{γH} f(0); in NHDC, use f(1) = (I + γH) f(0).

18 Heat Diffusion Classifiers - algorithm - Step 4 [Classify the nodes]
 From the last step we have a heat distribution for each class; each node in the test data set is then classified to the class from which it receives the most heat.
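The four steps above can be sketched end to end as follows. The precise entries of H are not preserved in this transcript, so the Gaussian weighting e^{-d(i,j)²/β} on the K nearest neighbors and the NHDC kernel I + γH used here are assumptions for illustration.

```python
import numpy as np
from scipy.linalg import expm

def nhdc_phdc(X_train, y_train, X_test, K=5, beta=1.0, gamma=1.0,
              propagating=False):
    """Sketch of Steps 1-4: KNN graph, heat kernel, diffusion, labeling.

    Assumed H (not necessarily the paper's exact matrix):
    H[i, j] = exp(-d(i, j)^2 / beta) for the K nearest neighbors j of i.
    """
    X = np.vstack([X_train, X_test])                # Step 1: all points
    n, n_train = len(X), len(X_train)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    H = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D[i])[1:K + 1]            # K nearest, skip self
        H[i, nbrs] = np.exp(-D[i, nbrs] ** 2 / beta)
    # Step 2: PHDC uses the matrix exponential, NHDC a single step.
    kernel = expm(gamma * H) if propagating else (np.eye(n) + gamma * H)
    classes = np.unique(y_train)
    heat = np.zeros((len(classes), n))
    for c_idx, c in enumerate(classes):             # Step 3: diffuse per class
        f0 = np.zeros(n)
        f0[:n_train][y_train == c] = 1.0            # unit heat on class c
        heat[c_idx] = kernel @ f0
    # Step 4: label each test point by the class it receives most heat from.
    return classes[np.argmax(heat[:, n_train:], axis=0)]
```

On two well-separated clusters, both the non-propagating and propagating variants recover the obvious labels.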

19 Heat Diffusion Classifiers - Connections with other models
The Parzen window approach (when the window function takes the normal form) is a special case of NHDC.
It is a non-parametric method for probability density estimation: for each class k, the class-conditional density is estimated by a sum of normal windows centered at the training points of class k.
Using Bayes' rule, assign x to the class whose posterior value is maximal.

20 Heat Diffusion Classifiers - Connections with other models
The Parzen window approach (when the window function takes the normal form) is a special case of NHDC.
In our model, let K = n-1; the graph constructed in Step 1 is then a complete graph, and the entries of H become the Gaussian terms e^{-d(x_p,x_q)²/β}.
Using the heat equation f(t) = Hf(0), the heat that x_p receives from the data points in class k is proportional to the Parzen window density estimate for class k.

21 Heat Diffusion Classifiers - Connections with other models
KNN is a special case of NHDC.
KNN: assign each test point to the class with the maximal number of representatives among its K nearest neighbors.

22 Heat Diffusion Classifiers - Connections with other models
KNN is a special case of NHDC.
In our model, let β tend to infinity; the entries of H then all tend to 1, so H becomes the 0/1 adjacency matrix of the K nearest neighbor graph.
Using the heat equation f(t) = Hf(0), the heat that x_p receives from the data points in class q equals the number of cases of class q among its K nearest neighbors.

23 Heat Diffusion Classifiers - Connections with other models
PHDC can approximate NHDC: if γ is small, then e^{γH} = I + γH + (γ²/2)H² + … ≈ I + γH.
Since the identity matrix has no effect on the heat distribution, PHDC and NHDC have similar classification accuracy when γ is small.
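The approximation argument can be verified numerically: for small γ, e^{γH} is entrywise close to I + γH, and the gap grows with γ. A minimal check:

```python
import numpy as np
from scipy.linalg import expm

def approximation_gap(H, gamma):
    """Largest entrywise difference between exp(gamma*H) and I + gamma*H."""
    n = H.shape[0]
    return np.abs(expm(gamma * H) - (np.eye(n) + gamma * H)).max()

# Small example matrix with zero column sums, as in the diffusion model.
H = np.array([[-1.0, 0.5],
              [ 1.0, -0.5]])
small_gap = approximation_gap(H, 0.01)   # tiny for small gamma
large_gap = approximation_gap(H, 1.0)    # noticeable for gamma = 1
```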

24 Heat Diffusion Classifiers - Connections with other models
PHDC → NHDC when γ is small; NHDC → KNN when β is infinity; NHDC → PWA when K = n-1.

25 Heat Diffusion Ranking - motivation
The Web pages are considered to be drawn from an unknown manifold.
The link structure forms a directed graph, which is considered an approximation to the unknown manifold.
The heat kernel established on the Web graph represents the relationship between Web pages: when there are more paths from page j to page i, i receives more heat from j; when the path length from j to i is shorter, i receives more heat from j.

26 Heat Diffusion Ranking - algorithm
Let V be the set of Web pages; if there is a link from j to i, there is an edge (j,i).
If the graph is static: compute the matrix H, then compute e^{γH} or its approximation (I + (γ/N)H)^N. The element in row i, column j is the amount of heat that i receives from j from time 0 to 1, and is used to measure the similarity from j to i.
If the graph is a random graph generated by the first stage of Predictive Random Graph Ranking: compute the matrix R (the expectation of H over the random graph), then compute e^{γR} or (I + (γ/N)R)^N.
The algorithm is called DiffusionRank.
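A minimal sketch of DiffusionRank on a static link graph, using the discrete approximation (I + (γ/N)H)^N. The uniform initial heat distribution used here is an assumption for illustration; the slide does not preserve the initial vector.

```python
import numpy as np

def diffusion_rank(n, edges, gamma=1.0, N=20):
    """DiffusionRank sketch via the approximation (I + (gamma/N) H)^N.

    H[i, j] = 1/d_j for a link (j, i), with d_j the out-degree of j,
    and H[j, j] = -1 for pages with out-links.  Returns the heat each
    page receives starting from uniform heat (an assumed f(0)).
    """
    H = np.zeros((n, n))
    d = np.zeros(n)
    for j, i in edges:
        d[j] += 1
    for j, i in edges:
        H[i, j] += 1.0 / d[j]
    for j in range(n):
        if d[j] > 0:
            H[j, j] = -1.0
    kernel = np.linalg.matrix_power(np.eye(n) + (gamma / N) * H, N)
    return kernel @ np.full(n, 1.0 / n)

# Pages 1 and 2 both link to page 0, so page 0 accumulates the most heat.
scores = diffusion_rank(3, [(1, 0), (2, 0)])
```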

27 Heat Diffusion Ranking - advantages
Its solution has two forms, both of which are closed form.
Its solution is not symmetric, which better models the relative nature of similarity.
It can naturally be employed to detect group-group relations.
It can be used for anti-manipulation.

28 Predictive Random Graph Ranking - motivation
To improve the accuracy of DiffusionRank, we need to model the Web graph more accurately, as a random graph:
The Web is dynamic.
The observer is partial.
Links are different.
The random graph model can also improve other ranking algorithms, and is hence called the Predictive Random Graph Ranking framework.

29 Predictive Random Graph Ranking - framework
Random Graph Generation Stage: engages the temporal, spatial and local link information to construct a random graph.
Random Graph Ranking Stage: takes the random graph output and calculates the ranking result with a candidate ranking algorithm.

30 Predictive Random Graph Ranking – first stage
The Web is dynamic: predict the early Web structure as a random graph – the Temporal Web Prediction Model.
The observer is partial: different Web graphs G_i = (V_i, E_i) are obtained by N different observers (crawlers). A random graph RG=(V,P) is constructed by p(i,j) = n(i,j)/N, where n(i,j) is the number of graphs in which the link (i,j) appears.
Links are different: as an example, a random graph RG=(V,P) can be constructed by letting p(i,j) depend on the link position, where j is the k(i,j)-th out-link from i.
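The observer-combination rule p(i,j) = n(i,j)/N, where n(i,j) counts how many of the N crawled graphs contain the link, can be sketched directly:

```python
import numpy as np

def random_graph(n, observed_graphs):
    """Combine N observers' edge lists into RG = (V, P).

    p(i, j) = n(i, j) / N, where n(i, j) is the number of observed
    graphs in which the link (i, j) appears.
    """
    N = len(observed_graphs)
    P = np.zeros((n, n))
    for edges in observed_graphs:
        for i, j in edges:
            P[i, j] += 1.0 / N
    return P

# Two crawlers: both saw link (0, 1), only one saw link (1, 2).
P = random_graph(3, [[(0, 1), (1, 2)], [(0, 1)]])
```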

31 Predictive Random Graph Ranking – Temporal Web Prediction Model
From the viewpoint of a crawler, the Web is dynamic, and there are many dangling nodes (pages that either have no out-link or have no known out-link). We classify dangling nodes:
Dangling nodes of class 1 (DNC1) – those that have been found but have not been visited.
Dangling nodes of class 2 (DNC2) – those that have been tried but not visited successfully.
Dangling nodes of class 3 (DNC3) – those that have been visited successfully but from which no out-link is found.

32 Predictive Random Graph Ranking – Temporal Web Prediction Model
Suppose that all the nodes V can be partitioned into three subsets: C, D_1 and D_2.
C denotes the set of all non-dangling nodes (crawled successfully, with at least one out-link);
D_1 denotes the set of all dangling nodes of class 3;
D_2 denotes the set of all dangling nodes of class 1.
For each node v in V, the real in-degree of v is not known.

33 Predictive Random Graph Ranking – Temporal Web Prediction Model
We predict the real in-degree of v by the number of found links from C to v.
Assumption: the number of found links from C to v is proportional to the real number of links from V to v.
The difference between the real in-degree and the predicted in-degree is distributed uniformly over the nodes in D_2.

34 Predictive Random Graph Ranking – Temporal Web Prediction Model
Models the missing information from unvisited nodes to nodes in V: from D_2 to V.
Models the known link information as in Page (1998): from C to V.
Models the user's behavior as in Kamvar (2003) when facing dangling nodes of class 3: from D_1 to V.
n: the number of nodes in V; m: the number of nodes in C; m_1: the number of nodes in D_1.

35 Predictive Random Graph Ranking – second stage
On a random graph RG=(V,P): DiffusionRank.

36 Predictive Random Graph Ranking – second stage
On a random graph RG=(V,P): PageRank, Common Neighbor, Jaccard's Coefficient, SimRank.

37 Experiments – Heat Diffusion Classifiers
2 artificial data sets (Spiral-100 and Spiral-1000) and 6 data sets from UCI.
Compared with the Parzen window approach (the window function takes the normal form) and KNN.
The result is the average over ten-fold cross validation.

38 Experiments - Heat Diffusion Classifiers Experimental Setup
Experimental Environments
 Hardware: Nix Dual Intel Xeon 2.2GHz
 OS: Linux Kernel smp (RedHat 7.3)
 Developing tool: C
Data Description
In Credit-g, the 13 discrete variables are ignored, since we only consider the continuous variables.

Dataset       Cases   Classes   Variables
Spiral-100
Spiral-1000
Credit-g       1000         2          7*
Diabetes        768         2           8
Glass           214         6           9
Iris            150         3           4
Sonar
Vehicle         846         4          18

39 Experiments - Heat Diffusion Classifiers Parameters Setting
Table of parameter settings (K and 1/β for NHDC; K and γ for PHDC; K for KNN; the window width for PWA) on each of the eight data sets.

40 Experiments - Heat Diffusion Classifiers Results
Table of classification accuracies of NHDC, PHDC, KNN and PWA on the eight data sets.

41 Experiments – Predictive Random Graph Ranking Data
Synthetic Web graph: follows a power law.
Real Web graph: within cuhk.edu.hk.
Growth tables over time t: number of nodes V(t) and number of links T(t) for both graphs.

42 Experiments – Predictive Random Graph Ranking Methodology
For each algorithm A, we have two versions, denoted A and PreA:
 A – the original version.
 PreA – the version with the Temporal Web Prediction Model.
For each data series and each algorithm A, we obtain 22 ranking results: A_1, A_2, …, A_11 and PreA_1, PreA_2, …, PreA_11.
We compare the early results with the final result A_11 by two measures:
 Value Difference
 Order Difference
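The slide names the two comparison measures without preserving their definitions, so the sketch below assumes value difference is the L1 distance between two ranking-value vectors and order difference is the number of page pairs the two rankings order differently; both definitions are assumptions.

```python
import numpy as np

def value_difference(a, b):
    """Assumed: sum of absolute differences between two ranking vectors."""
    return float(np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)).sum())

def order_difference(a, b):
    """Assumed: number of page pairs ordered differently by the two rankings."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, count = len(a), 0
    for i in range(n):
        for j in range(i + 1, n):
            # A discordant pair: the two rankings disagree on who is higher.
            if (a[i] - a[j]) * (b[i] - b[j]) < 0:
                count += 1
    return count
```

Under these definitions, an early result A_k matches the final result A_11 exactly when both measures are zero.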

43 Experiments – Predictive Random Graph Ranking Set Up
For PageRank and PrePageRank:
 α = 0.85,
 g is the uniform distribution.
For DiffusionRank and PreDiffusionRank:
 use the discrete diffusion kernel,
 γ = 1, N = 20.

44 Experiments – PageRank – synthetic data

45 Experiments – PageRank – real data

46 Experiments – DiffusionRank – synthetic data

47 Experiments – DiffusionRank – real data

48 Conclusions
Both NHDC and PHDC outperform KNN and the Parzen Window Approach in accuracy on these 8 data sets.
PHDC outperforms NHDC in accuracy on these 8 data sets.
DiffusionRank is another candidate ranking algorithm.
The Temporal Web Prediction Model is effective for PageRank and DiffusionRank.
The Predictive Random Graph Ranking framework extends the scope of some original ranking techniques.

49 Future Work
Approximate the manifold more accurately.
Apply the non-symmetric heat kernel to SVM.
Further investigate partial observers and weighted links.

50 Q & A