Graph Analysis by Persistent Homology


Graph Analysis by Persistent Homology Dingkang Wang

Background, Objectives, Preprocessing, Distance Metrics, Simplices Extraction & Comparison, Results

Background
What is graph analysis? Characterize the structure of a large graph in terms of its nodes and edges, and make it possible to compare two large graphs.
Commonly used methods: degree distribution; diameter and shortest-path distance distribution; community structure. In my project: persistent homology.
Network analysis is a large and active research area. Most previous network analysis relies on traditional metrics such as the degree distribution and the graph diameter. However, these metrics are vulnerable to noisy data points and may not reveal hidden differences between graphs. Topological analysis by persistent homology is a newer tool for network analysis. It is mostly used for data embedded in Euclidean space, but it can still be applied to a node-link network once pairwise distances between nodes have been computed. The method is stable, meaning that a small perturbation of the input does not change the output diagram significantly, and it can find topological features of a graph that other methods do not show.

Objectives
Part I: Characterize the structure of graphs in different categories.
Part II: Find out the “interesting” year in the Senate voting graphs.
Datasets (Stanford Network Analysis Project): Slashdot social network; arXiv High Energy Physics paper citation network; Gnutella peer-to-peer network from August 31, 2002.

Preprocessing
Denoising by Jaccard index: In my project, I define the Jaccard index of an edge (u, v), or equivalently of two connected nodes u and v, as
$J(u, v) = \frac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|}$
where N(u) is the set of neighbors of node u. We then set a threshold: an edge whose Jaccard index is below the threshold is treated as a noise edge and removed. Intuitively, the similarity between the two endpoints is too low, so there should not be an edge between them.
Landmark sampling when the input graph is too large: (1) Pick the first landmark at random. (2) Repeatedly pick as the next landmark the node farthest from the landmarks already chosen, until there are enough landmarks. (3) Assign every other node to one of the landmarks based on distance. (4) Connect two landmarks when their communities are connected.
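The two preprocessing steps can be sketched in a few lines of Python. This is an illustrative sketch, not the project's code: it assumes an undirected, connected graph stored as a dict of neighbor sets, takes N(u) to exclude u itself, uses hop count (BFS) as the distance, and replaces the random first landmark with an explicit `first` argument so the result is reproducible. All function names are hypothetical.

```python
from collections import deque

def jaccard_index(adj, u, v):
    """Jaccard index of the neighbor sets N(u) and N(v) (assumed to exclude u and v themselves)."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def denoise(adj, threshold):
    """Copy of the graph that keeps only edges whose Jaccard index is at least `threshold`."""
    kept = {u: set() for u in adj}
    for u in adj:
        for v in adj[u]:
            if jaccard_index(adj, u, v) >= threshold:
                kept[u].add(v)
    return kept

def bfs_distances(adj, source):
    """Hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def farthest_point_landmarks(adj, k, first):
    """Greedy sampling: each new landmark is the node farthest from all chosen landmarks."""
    landmarks = [first]
    nearest = bfs_distances(adj, first)  # node -> distance to its closest landmark
    while len(landmarks) < k:
        candidate = max(nearest, key=nearest.get)
        if nearest[candidate] == 0:      # every reachable node is already a landmark
            break
        landmarks.append(candidate)
        for node, d in bfs_distances(adj, candidate).items():
            if d < nearest.get(node, float("inf")):
                nearest[node] = d
    return landmarks

def landmark_graph(adj, landmarks):
    """Assign each node to its closest landmark and link landmarks whose communities touch."""
    owner, best = {}, {}
    for lm in landmarks:
        for node, d in bfs_distances(adj, lm).items():
            if d < best.get(node, float("inf")):
                best[node], owner[node] = d, lm
    edges = set()
    for u in adj:
        for v in adj[u]:
            if u in owner and v in owner and owner[u] != owner[v]:
                edges.add(frozenset((owner[u], owner[v])))
    return owner, edges
```

With open neighborhoods, an edge that lies in no triangle gets a Jaccard index of 0 and is removed by any positive threshold. Whether the project used open or closed neighborhoods is not stated on the slide.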

Distance Metrics
Shortest path distance versus diffusion distance: easier to calculate; more sensitive; lack of variety.
Diffusion distance:
Step 1: Define a pairwise similarity matrix between points, for example using a Gaussian kernel with width σ².
Step 2: Apply a diagonal normalization matrix to turn it into a transition probability matrix.
Step 3: Calculate the distance with the specific choice w(y) = 1/φ0(y) for the weight function, which takes into account the (empirical) local density of the points and puts more weight on low-density points.
This distance is robust to noise, since the distance between two points depends on all possible paths of length t between the points. The parameters are hard to choose, so we try different values and pick a suitable one.
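The three diffusion-distance steps can be sketched as follows. This is a minimal pure-Python sketch under stated assumptions, not the project's implementation: the input is a symmetric, non-negative similarity matrix `weights` with no all-zero row (how the project built this matrix from a graph is not stated), `t` is the number of diffusion steps, and φ0 is taken to be the stationary distribution of the random walk, which for a symmetric matrix is proportional to the row sums. The function names are hypothetical.

```python
def mat_mul(a, b):
    """Plain-Python product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def diffusion_distances(weights, t):
    """Pairwise diffusion distances after t steps of the random walk defined by `weights`."""
    n = len(weights)
    degree = [sum(row) for row in weights]   # diagonal normalization terms
    total = sum(degree)

    # Step 2: row-normalize the similarity matrix into transition probabilities.
    p = [[weights[i][j] / degree[i] for j in range(n)] for i in range(n)]

    # t-step transition probabilities P^t, starting from the identity matrix.
    p_t = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        p_t = mat_mul(p_t, p)

    # Stationary distribution phi0; the weight function is w(z) = 1 / phi0(z).
    phi0 = [d / total for d in degree]

    # Step 3: weighted L2 distance between the t-step distributions of x and y.
    dist = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            squared = sum((p_t[x][z] - p_t[y][z]) ** 2 / phi0[z] for z in range(n))
            dist[x][y] = squared ** 0.5
    return dist
```

Calling the function with several values of `t` is one way to run the parameter search mentioned above. Larger `t` smooths the distances, so nodes that are well connected through many paths move closer together.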

Simplices Extraction & Diagram Comparison
Using the Rips complex. Using Phat to generate persistence diagrams. Using two different metrics: bottleneck distance and Wasserstein distance.
In this step, we extract simplices only up to dimension 3, i.e., nodes, edges, triangles and tetrahedra, because of the time complexity. We also need to record the birth time of each simplex; more specifically, we use the Rips complex. In a Rips complex, a k-simplex is added once all of its boundary simplices are present. For example, take 3 nodes a, b, c whose edges ab, bc, ac have weights 4, 5, 6. The edges are added at times 4, 5 and 6. Once all the edges have been added, the triangle abc is added, so abc enters at time 6.
The bottleneck distance used in my project can be defined as follows:
$d_B(X, Y) = \inf_{\gamma : X \to Y} \sup_{x \in X} \lVert x - \gamma(x) \rVert_\infty$
In other words, it finds a bijection between the two sets, mapping x in X to γ(x) in Y, that minimizes the maximum of the pairwise distances. When number of two node …
The Wasserstein distance, which is more stable than the bottleneck distance, can be defined as follows:
$W_p(X, Y) = \left( \inf_{\gamma : X \to Y} \sum_{x \in X} \lVert x - \gamma(x) \rVert_\infty^p \right)^{1/p}$
The Wasserstein distance uses the sum instead of the maximum of the pairwise distances, so in some sense it is less sensitive to outliers.
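To make the birth-time rule and the two diagram metrics concrete, here is a small Python sketch. It is illustrative only and does not use or mimic Phat. It makes these assumptions: vertices are born at time 0, a simplex is born at the largest weight among its edges, and unmatched diagram points may be matched to the diagonal at cost (death - birth) / 2, which is the usual convention for diagrams of different sizes. The distance routine tries every matching, so it is only practical for a handful of points. All names are hypothetical.

```python
from itertools import combinations, permutations

def rips_filtration(edge_weight, max_dim=3):
    """List of (birth, simplex) pairs up to `max_dim`, sorted by birth time.

    `edge_weight` maps frozenset({u, v}) to the weight of edge uv.
    """
    nodes = sorted({u for edge in edge_weight for u in edge})
    simplices = [(0.0, (v,)) for v in nodes]       # vertices assumed born at time 0
    for size in range(2, max_dim + 2):             # size = dimension + 1
        for simplex in combinations(nodes, size):
            pairs = [frozenset(p) for p in combinations(simplex, 2)]
            if all(p in edge_weight for p in pairs):
                # A simplex appears once its last (heaviest) edge has appeared.
                simplices.append((max(edge_weight[p] for p in pairs), simplex))
    simplices.sort(key=lambda s: (s[0], len(s[1])))
    return simplices

def diagram_distances(X, Y, p=1):
    """Brute-force (bottleneck, p-Wasserstein) distances between two small diagrams.

    X and Y are lists of (birth, death) points; the ground metric is L-infinity.
    """
    n, m = len(X), len(Y)
    size = n + m
    if size == 0:
        return 0.0, 0.0
    # Rows >= n and columns >= m are diagonal slots that absorb unmatched points.
    cost = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < n and j < m:
                cost[i][j] = max(abs(X[i][0] - Y[j][0]), abs(X[i][1] - Y[j][1]))
            elif i < n:
                cost[i][j] = (X[i][1] - X[i][0]) / 2
            elif j < m:
                cost[i][j] = (Y[j][1] - Y[j][0]) / 2
    best_max, best_sum = float("inf"), float("inf")
    for perm in permutations(range(size)):
        costs = [cost[i][perm[i]] for i in range(size)]
        best_max = min(best_max, max(costs))
        best_sum = min(best_sum, sum(c ** p for c in costs))
    return best_max, best_sum ** (1 / p)
```

For the three-node example above, `rips_filtration` places ab, bc and ac at times 4, 5 and 6 and the triangle abc at time 6. The pair returned by `diagram_distances` shows the difference between the metrics: the first value depends only on the worst matched pair, the second on all of them.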

Results


Thank you! Any questions?