1
NHDC and PHDC: Non-propagating and Propagating Heat Diffusion Classifiers
Haixuan Yang, Irwin King, & Michael R. Lyu, ICONIP 2005
2
Outline
Introduction
Graph Heat Diffusion Model
NHDC and PHDC Algorithms
Connections with Other Models
Experiments
Conclusions and Future Work
This is the outline. First, we give an introduction: we discuss related work on heat diffusion and manifold learning, and show which ideas we inherit and which ideas we treat differently. After presenting two illustrations, we describe the Graph Heat Diffusion Model. Based on the Graph Heat Diffusion Model, we present two classifiers: NHDC (Non-propagating Heat Diffusion Classifier) and PHDC (Propagating Heat Diffusion Classifier). In NHDC, the heat diffuses once; in PHDC, the heat diffuses infinitely many times. Following the two classifiers, we show the connections with other models: PHDC is a generalization of NHDC, while NHDC generalizes both KNN and the Parzen window approach when the window function takes the normal form. After that, we present the experimental results, and finally we draw conclusions and show the future work.
3
Introduction
Kondor & Lafferty (NIPS 2002)
Construct a diffusion kernel on a graph
Handle discrete attributes
Apply to a large margin classifier
Achieve good performance in accuracy on 5 data sets from UCI
Lafferty & Lebanon (JMLR 2005)
Construct a diffusion kernel on a special manifold
Handle continuous attributes
Restrict to text classification
Apply to SVM
Achieve good performance in accuracy on WebKB and Reuters
Belkin & Niyogi (Neural Computation 2003)
Reduce dimension by heat kernel and local distance
Tenenbaum et al. (Science 2000)
Reduce dimension by local distance
Heat diffusion is a physical phenomenon: in a medium, heat always flows from places with high temperature to places with low temperature. Recently, it has been used in kernel construction and dimension reduction. In NIPS 2002, Kondor & Lafferty construct a diffusion kernel on a graph; it handles discrete attributes, is applied to a large margin classifier, and achieves good accuracy on 5 data sets from UCI. In JMLR 2005, Lafferty & Lebanon construct a diffusion kernel on a special manifold; it handles continuous attributes and is applied to SVM, restricted to the text classification problem, achieving good accuracy on WebKB and Reuters. In Neural Computation 2003, Belkin & Niyogi propose to reduce dimension by the heat kernel and local distances. In Science 2000, Tenenbaum et al. reduce dimension by local distances without using the heat kernel.
4
Introduction The ideas we inherit
The curve may better measure the distance
Direct distance may not be accurate
The ideas we inherit:
Similarity between heat diffusion and density. Heat diffuses in the same way as the Gaussian density in the ideal case when the manifold is the Euclidean space. The way heat diffuses on a manifold can be understood as a generalization of the Gaussian density from Euclidean space to the manifold.
Local information is relatively accurate in a nonlinear manifold.
Learn local information by k nearest neighbors.
We inherit three ideas from the literature. The first is the similarity between heat diffusion and density: heat diffuses in the same way as the Gaussian density in the ideal case when the manifold is the Euclidean space, as shown in the formula below, where x and y are two points and t is the time; the formula describes the amount of heat that y receives from a unit heat source at x during a time period t. The Gaussian density takes a similar form. Besides, we have an intuition: a point to which heat diffuses rapidly is one with high density. The second idea concerns local information. A nonlinear manifold is a complex geometric structure; we can simply imagine it as a twisted surface. Local information is relatively accurate on a nonlinear manifold. For example, on a sphere, the direct distance between two points x and y may not be the best way to measure the difference between them; the path along the surface may be better. When the distance between x and y is small, there is little difference between the direct distance and the curve distance. See the manifold in the picture: the direct distance may not be accurate, and the curve may better measure the distance. The third idea is that we also learn local information by nearest neighbors, as Belkin & Niyogi and Tenenbaum et al. do.
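The formula itself is not reproduced in this transcript; the standard heat kernel on d-dimensional Euclidean space, which is presumably what the slide shows, has the Gaussian form:

```latex
K_t(x, y) \;=\; \frac{1}{(4\pi t)^{d/2}} \exp\!\left(-\frac{\lVert x - y \rVert^2}{4t}\right)
```

so the amount of heat a unit source at x delivers to y in time t falls off with the squared distance, exactly like a Gaussian density centered at x.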
5
Introduction The ideas we treat differently: Unknown manifold in most cases.
Unknown solution for the known manifold. The explicit form of the approximation to the heat kernel in (Lafferty & Lebanon, JMLR 2005) is a rare case. Establish the heat diffusion equation directly on a graph formed by the K nearest neighbors. Always have an explicit form in any case. Form a classifier from the solution directly. There are five different considerations in our paper. First, in most cases the underlying manifold is unknown. Second, even if the manifold is known, it is difficult to find the heat kernel or to approximate it; the explicit form of the approximation to the heat kernel in (Lafferty & Lebanon, JMLR 2005) is a rare case. These are the first two points. The third point is that we establish the heat diffusion equation directly on a graph formed by the K nearest neighbors; the neighborhood graph is considered an approximation of the unknown manifold. See the picture: the graph may approximate the manifold, and heat diffusion on the graph is considered an approximation of heat diffusion on the manifold. The fourth point is that we always have an explicit form for the heat kernel. The last point is that we construct the classifier directly instead of applying the kernel to a large margin classifier.
6
Illustration This is an illustration of our classifier PHDC. There are 13 circles representing 13 data points, and there are two classes: the data of the dark class are filled with small dark spots, and the data of the blue class are filled with small blue spots. When a new data point arrives with no label, it receives heat from its three nearest neighbors. The heat diffusion goes on from high temperature to low temperature; high temperature is represented here by more small spots. Finally, at a given time, the new data point has received 7 dark spots and 3 blue spots, so we classify it into the dark class.
The first heat diffusion
The second heat diffusion
7
Illustration This is another illustration. There are 4 data points with unknown labels. Class A is labeled with circles, and class B is labeled with diamonds.
8
Illustration This is the graph constructed from the 3 nearest neighbors.
9
Illustration
Heat received from class A: 0.018
Heat received from class B: 0.016
Heat received from class A: 0.002
Heat received from class B: 0.08
Finally, the right unknown data point receives 0.018 units of heat from class A and 0.016 from class B, so we classify it into class A. The left unknown data point receives 0.002 units of heat from class A and 0.08 from class B, so we classify it into class B, which is a wrong label.
10
Graph Heat Diffusion Model-Notations
G=(V,E,W), a given directed weighted graph, where V={1,2,…,n}, E={(i,j): if there is an edge from i to j}, W=( w(i,j) ) is the weight matrix. f(i,t): the heat at node i at time t. M(i,j,t,Δt): the amount of heat that node i receives from its neighbor j during a period Δt starting at time t. Now we begin to establish the Graph Heat Diffusion Model. First we introduce some notation. G=(V,E,W) is a given directed weighted graph, where V contains n nodes numbered from 1 to n, E is the set of edges, and W=( w(i,j) ) is the weight matrix. The edge (i,j) is imagined as a pipe that connects i and j, and w(i,j) is the pipe length. Let f(i,t) be the heat at node i at time t. During a period Δt starting at time t, node i receives M(i,j,t,Δt) amount of heat from its neighbor j.
11
Graph Heat Diffusion Model-Assumption
Suppose that M(i,j,t,Δt) is proportional to the time period Δt. Suppose that M(i,j,t,Δt) is proportional to the heat difference f(j,t)-f(i,t). Moreover, the heat flows from j to i through the pipe, and the heat diffuses in the pipe in the same way as it does in Euclidean space, as described before. We have three assumptions. First, we suppose that M(i,j,t,Δt) is proportional to the time period Δt. Second, we suppose that M(i,j,t,Δt) is proportional to the heat difference f(j,t)-f(i,t). Finally, we assume that the heat flows from j to i through the pipe and diffuses in the pipe in the same way as it does in Euclidean space, as described before. Then we obtain the heat flow equation.
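Combining the three assumptions, the per-edge flow has roughly the following form. This is a sketch only: the Gaussian factor exp(-w(i,j)²/β) is our reading of "diffuses in the pipe as in Euclidean space", and the slide's actual equation may use different constants:

```latex
M(i, j, t, \Delta t) \;=\; \alpha\, \exp\!\left(-\frac{w(i,j)^2}{\beta}\right) \bigl(f(j,t) - f(i,t)\bigr)\, \Delta t
```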
12
Graph Heat Diffusion Model-Solution
The heat difference between f(i,t+Δt) and f(i,t) can be expressed as: It can be expressed in matrix form: Letting Δt tend to zero, the above equation becomes: We have three steps to find the solution. The heat difference at node i from time t to t+Δt is equal to the heat that it receives from all its antecedents; this is shown in the first equation. Then we express it in matrix form, as shown in the second equation. Finally, letting Δt tend to zero, we obtain the solution, as shown in the third equation, where f(t) is the heat distribution at time t, f(0) is the initial heat distribution, and α, γ, and t are parameters of the model.
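In matrix notation, the three steps look roughly as follows (same caveats as the sketch above; here R is an assumed matrix collecting the weighted in-flow terms and the diagonal out-flow terms, and the exact parametrization in α and γ may differ from the slide):

```latex
f(t+\Delta t) - f(t) \;=\; \alpha R\, f(t)\, \Delta t
\quad\xrightarrow{\;\Delta t \to 0\;}\quad
\frac{d f(t)}{d t} \;=\; \alpha R\, f(t)
\quad\Longrightarrow\quad
f(t) \;=\; e^{\alpha t R} f(0)
```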
13
NHDC and PHDC algorithm - Step 1
[Construct the neighborhood graph] Define a graph G over all data points, both in the training data set and in the test data set. Add an edge from j to i if j is one of the K nearest neighbors of i. Set the edge weight w(i,j)=d(i,j) if j is one of the K nearest neighbors of i, where d(i,j) is the Euclidean distance between point i and point j. Then we describe the two classifiers, NHDC and PHDC; each consists of 4 steps. In Step 1, we construct the neighborhood graph as defined above.
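A minimal sketch of Step 1 in Python (numpy is assumed to be available; the function and variable names are ours, not from the paper):

```python
import numpy as np

def build_knn_graph(X, K):
    """Directed K-nearest-neighbor graph over all points (train + test).

    Returns W, an n x n matrix with W[i, j] = d(i, j) if j is one of the
    K nearest neighbors of i, and 0 otherwise (no self-edges).
    """
    n = X.shape[0]
    # Pairwise Euclidean distances.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    W = np.zeros((n, n))
    for i in range(n):
        # Exclude the point itself, then take the K closest points.
        order = np.argsort(D[i])
        neighbors = [j for j in order if j != i][:K]
        for j in neighbors:
            W[i, j] = D[i, j]   # edge from j to i, weight = distance d(i, j)
    return W
```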
14
NHDC and PHDC algorithm - Step 2
[Compute the Heat Kernel] Compute H for NHDC using the first equation; compute the heat kernel for PHDC using the second equation. This is the second step: we use the equations obtained from the solution of the Graph Heat Diffusion Model to construct the heat kernel.
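The slide's equations for the kernels are shown as images and are not reproduced in this transcript, so the following is only an illustrative sketch: it assumes Gaussian edge weights exp(-d(i,j)²/β) normalized per node, and contrasts a one-step (NHDC) kernel with a matrix-exponential (PHDC) kernel. The paper's exact normalization and parametrization may differ.

```python
import numpy as np
from scipy.linalg import expm

def heat_kernels(W, beta, gamma):
    """Return assumed NHDC (one-step) and PHDC (propagating) kernels.

    W[i, j] > 0 means there is an edge from j to i with length W[i, j].
    """
    n = W.shape[0]
    S = np.where(W > 0, np.exp(-(W ** 2) / beta), 0.0)   # assumed Gaussian pipe weights
    row_sums = S.sum(axis=1, keepdims=True)
    S = np.divide(S, row_sums, out=np.zeros_like(S), where=row_sums > 0)
    R = S - np.eye(n)                 # in-flow minus out-flow
    H_nhdc = np.eye(n) + gamma * R    # heat diffuses once
    H_phdc = expm(gamma * R)          # heat diffuses continuously (infinitely many steps)
    return H_nhdc, H_phdc
```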
15
NHDC and PHDC algorithm - Step 3
[Compute the Heat Distribution] For each class c, set f(0) so that nodes labeled by class c have an initial unit heat at time 0 and all other nodes have no heat at time 0, then compute the heat distribution. In PHDC, use the first equation to compute the heat distribution; in NHDC, use the second equation. This is the third step. It contains two substeps for each class c. First, we set the initial heat distribution: nodes labeled by class c have an initial unit heat at time 0, and all other nodes have no heat at time 0. Second, we use the solution to find the heat distribution at time t: for PHDC we use the first equation, and for NHDC we use the second equation.
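A sketch of Step 3 using a kernel H from the previous step (variable names are ours):

```python
import numpy as np

def class_heat(H, labels, classes):
    """For each class c, put a unit of heat on every training node labeled c
    (zero elsewhere) and diffuse it through the kernel H.

    labels[i] is the class of node i, or None for test nodes.
    Returns heat[c_index, i] = heat received by node i from class c.
    """
    n = H.shape[0]
    heat = np.zeros((len(classes), n))
    for ci, c in enumerate(classes):
        f0 = np.array([1.0 if labels[i] == c else 0.0 for i in range(n)])
        heat[ci] = H @ f0            # f(t) = H f(0)
    return heat
```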
16
NHDC and PHDC algorithm - Step 4
[Classify the nodes] From the last step, we get the heat distribution for each class; then, for each node in the test data set, classify it to the class from which it receives the most heat. Finally, each test data point is classified to the class from which it receives the most heat.
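And Step 4, assigning each test node to the class that sent it the most heat (continuing the hypothetical helpers above):

```python
import numpy as np

def classify(heat, classes, test_indices):
    """Assign each test node to the class from which it received the most heat."""
    return {i: classes[int(np.argmax(heat[:, i]))] for i in test_indices}
```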
17
Connections with other models
The Parzen window approach (when the window function takes the normal form) is a special case of NHDC. It is a non-parametric method for probability density estimation: for each class k, estimate the class-conditional density, apply Bayes rule, and assign x to the class whose value is maximal. Next we show the connections with other models. First we show the connection between the Parzen window approach and NHDC: we want to show that the Parzen window approach (when the window function takes the normal form) is a special case of NHDC. In the Parzen window approach, the probability density is estimated as in the first equation below. Specifically, the class-conditional density for class k is estimated as in the second equation. Using the Bayes rule, the conditional probability p(c_k|x) is expressed as in the third equation. Then x is assigned to the class k for which p(c_k|x) is maximal among all the classes.
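For reference, with a normal (Gaussian) window of width σ on d-dimensional data, the standard Parzen window estimates referred to above are:

```latex
p(x) \;=\; \frac{1}{n} \sum_{i=1}^{n} \frac{1}{(2\pi\sigma^2)^{d/2}}
  \exp\!\left(-\frac{\lVert x - x_i \rVert^2}{2\sigma^2}\right), \qquad
p(x \mid c_k) \;=\; \frac{1}{n_k} \sum_{x_i \in c_k} \frac{1}{(2\pi\sigma^2)^{d/2}}
  \exp\!\left(-\frac{\lVert x - x_i \rVert^2}{2\sigma^2}\right), \qquad
p(c_k \mid x) \;=\; \frac{p(x \mid c_k)\, P(c_k)}{p(x)}
```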
18
Connections with other models
The Parzen window approach (when the window function takes the normal form) is a special case of NHDC. In our model, let K=n-1; then the graph constructed in Step 1 is a complete graph, and the matrix H takes the form shown in the first equation. In NHDC, letting K=n-1, where n is the number of data points, the matrix H is expressed in the first equation. Using the heat equation f(t)=Hf(0), the heat that x_p receives from the data points in class k is expressed in the second equation. It has a similar expression to the Parzen window approach except for a constant, and the constant has no effect on the classifier. So we can say that, when the window function takes the normal form, the Parzen window approach is a special case of NHDC.
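Concretely, under the Gaussian edge weights assumed in the earlier sketches, with K = n-1 the heat that x_p receives from class k in one diffusion step is of the form (normalization constants omitted; an illustration, not the slide's exact equation):

```latex
\text{heat}_k(x_p) \;\propto\; \sum_{x_q \in c_k} \exp\!\left(-\frac{d(x_p, x_q)^2}{\beta}\right)
```

This is proportional to p(x_p | c_k) P(c_k) for a Parzen estimate with a normal window of width σ² = β/2 and prior P(c_k) = n_k/n, so maximizing the received heat over classes gives the same decision as the Parzen window classifier.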
19
Connections with other models
KNN is a special case of NHDC. KNN: for each test data point, assign it to the class that has the maximal number of members among its K nearest neighbors. Here we show the connection between KNN and NHDC. In KNN, the test data point is assigned to the class that appears most often among its K nearest neighbors.
20
Connections with other models
KNN is a special case of NHDC. In our model, let β tend to infinity; then the matrix H becomes the form shown in the first equation. In NHDC, if β tends to infinity, the matrix H is shown in the first equation. Using the heat equation f(t)=Hf(0), the amount of heat that x_p receives from class q is shown in the second equation, where K_q is the number of cases in class q among its K nearest neighbors. NHDC then has the same classification ability as KNN, because KNN classifies data according to K_q.
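In that limit every remaining edge carries the same weight, so under the normalization used in the earlier sketches the heat that x_p receives from class q becomes simply the fraction of its K nearest neighbors that belong to class q:

```latex
\text{heat}_q(x_p) \;=\; \frac{K_q}{K}, \qquad
K_q = \left|\{\, x \in \text{KNN}(x_p) : x \in c_q \,\}\right|
```

and choosing the class with the most heat is exactly the KNN rule.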
21
Connections with other models
PHDC can approximate NHDC. If γ is small, then the matrix exponential can be approximated as shown below. Since the identity matrix has no effect on the heat distribution, PHDC and NHDC have similar classification accuracy when γ is small. Here we show the connection between PHDC and NHDC: if γ is small, then we have the approximation, and since the identity matrix has no effect on the heat distribution, PHDC and NHDC have similar classification accuracy when γ is small.
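This is the first-order Taylor expansion of the matrix exponential; writing the PHDC kernel as e^{γR} in the notation of the earlier sketches:

```latex
e^{\gamma R} \;=\; I + \gamma R + \frac{\gamma^2}{2!} R^2 + \cdots \;\approx\; I + \gamma R
\quad\text{for small } \gamma
```

The identity term only returns each node its own initial heat, which is zero for unlabeled test nodes, so it does not affect which class sends a test node the most heat.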
22
Connections with other models
PHDC reduces to NHDC when γ is small; NHDC reduces to KNN when β is infinity; NHDC reduces to PWA when K=n-1. After the above discussion, we find that NHDC generalizes both KNN and PWA (the Parzen window approach), while PHDC generalizes NHDC.
23
Experiments 2 artificial data sets and 6 data sets from UCI
Spiral-100 Spiral-1000
Compare with the Parzen window approach (the window function takes the normal form) and KNN. The results are the average of ten-fold cross validation. For the experiments, we use 2 artificial data sets and 6 data sets from UCI. The right figure shows the 1000 data points on two spirals; there are two classes, the red one and the blue one. The left figure shows 100 data points drawn from the 1000 data points in the right figure. We compare NHDC and PHDC with the Parzen window approach (the window function takes the normal form) and KNN. The results shown later are the average of ten-fold cross validation.
24
Experiments Experimental Setup Data Description
Experimental Environments
Hardware: Nix Dual Intel Xeon 2.2GHz
OS: Linux Kernel smp (RedHat 7.3)
Development tool: C
Data Description
In Credit-g, the 13 discrete variables are ignored since we only consider the continuous variables.
Dataset      Cases  Classes  Variables
Spiral-100     100     2        3
Spiral-1000   1000     2        3
Credit-g      1000     2        7*
Diabetes       768     2        8
Glass          214     6        9
Iris           150     3        4
Sonar          208     2       60
Vehicle        846     4       18
We show the experimental setup and the data.
25
Experiments Parameter Settings
Columns (per the slide header): K and 1/β for NHDC; K, 1/β, and γ for PHDC; K for KNN; 1/β for PWA
Spiral-100: 8 150 0.01 7 100
Spiral-1000: 5 0.10 250
Credit-g: 13 11 0.02 31 50
Diabetes: 33 34 0.05 300
Glass: 40 1750 38 1500 0.27 3 7500
Iris: 15 0.47 350
Sonar: 24 1650 1200 0.41 1150
Vehicle: 10 600 0.11 650
We show the parameter settings in this slide. These parameters achieve the best ten-fold cross-validation results. We search K between 3 and 40, 1/β from 0 upward at a step of 50, and γ between 0 and 0.5 at a step of 0.01.
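The search itself could be organized as a grid search scored by ten-fold cross validation; here is a minimal sketch (scikit-learn's KFold is assumed for the splits, and build_knn_graph, heat_kernels, class_heat, and classify are the hypothetical helpers from the earlier sketches, not functions from the paper):

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validated_accuracy(X, y, K, beta, gamma, use_phdc):
    """Average ten-fold cross-validation accuracy for one parameter setting."""
    classes = sorted(set(y))
    W = build_knn_graph(X, K)                  # graph over all points (transductive)
    H_nhdc, H_phdc = heat_kernels(W, beta, gamma)
    H = H_phdc if use_phdc else H_nhdc
    accuracies = []
    for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
        train_set = set(train_idx)
        labels = [y[i] if i in train_set else None for i in range(len(y))]
        heat = class_heat(H, labels, classes)
        predictions = classify(heat, classes, test_idx)
        accuracies.append(np.mean([predictions[i] == y[i] for i in test_idx]))
    return float(np.mean(accuracies))

# Search grid from the slide: K from 3 to 40, gamma from 0 to 0.5 at a step of 0.01,
# and 1/beta searched upward from 0 at a step of 50 (the upper bound is not shown).
```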
26
Experiments Results
Columns: NHDC, PHDC, KNN, PWA (accuracy, %)
Spiral-100: 84 67 83
Spiral-1000: 99.6 99.8 99.3 99.7
Credit-g: 76.1 76.06 75.59 72.35
Diabetes: 76.3 76.22 75.78 74.96
Glass: 72.99 73.12 70.64 71.56
Iris: 97.36 97.79 97.07
Sonar: 88.75 89.07 82.86 88.28
Vehicle: 72.90 72.93 71.41 72.45
These are the experimental results. The first column is the name of the data set, and the remaining columns are the algorithms. The results shown are the average of ten-fold cross validation. For each data set, the best result is shown in red on the slide. From this table, we can see that PHDC performs best on 6 of the data sets.
27
Conclusions and Future Work
Both NHDC and PHDC outperform KNN and the Parzen Window Approach in accuracy on these 8 data sets. PHDC outperforms NHDC in accuracy on these 8 data sets. Future Work: approximate the manifold more accurately; apply the kernel to SVM. Both NHDC and PHDC outperform KNN and the Parzen Window Approach in accuracy on these 8 data sets, and PHDC outperforms NHDC in accuracy on these 8 data sets. However, there is still room to develop the method further: for example, we could approximate the manifold more accurately, and apply the kernel to SVM.