Network Classification Using Adjacency Matrix Embeddings and Deep Learning
Ke (Kevin) Wu1,2, Philip Watters1, Malik Magdon-Ismail1
1Department of Computer Science, Rensselaer Polytechnic Institute, Troy, New York 12180
2Quantcast Corporation, 201 3rd Street, #2, San Francisco, California 94103
kevinwu.work@yahoo.com, magdon@gmail.com

A Natural Problem: Graph Classification. Given a small piece of a large parent network, is it possible to identify the parent network? At what scale do different types of networks become distinguishable? (Or: “Are all social networks structurally similar?”, Hashmi et al., ASONAM 2012.) What are the optimal method and features for the classification?

Problem Formulation

Data. We obtained five real-world network graphs from the Stanford Network Analysis Project (SNAP). The five graph domains we focused on were social networks (Facebook), citation networks (HEP-PH), web graphs, road networks (PA roadNet), and Wikipedia networks. Sub-networks of each size were obtained by random-walk sampling.
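As a rough illustration of the sampling step, the sketch below grows a sub-network by a simple random walk using networkx. The function name and the restart-on-dead-end rule are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch of random-walk sub-network sampling (assumed procedure).
import random
import networkx as nx

def random_walk_sample(G, n_nodes, seed=None):
    """Walk G from a random start node until n_nodes distinct nodes
    are visited, then return the induced subgraph."""
    rng = random.Random(seed)
    n_nodes = min(n_nodes, G.number_of_nodes())
    current = rng.choice(list(G.nodes))
    visited = {current}
    while len(visited) < n_nodes:
        neighbors = list(G.neighbors(current))
        if not neighbors:                       # dead end: restart the walk
            current = rng.choice(list(G.nodes))
            continue
        current = rng.choice(neighbors)
        visited.add(current)
    return G.subgraph(visited).copy()

# e.g., a 32-node sample from a loaded SNAP graph G:
# G = nx.read_edgelist("facebook_combined.txt")   # hypothetical file name
# sub = random_walk_sample(G, 32)
```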

Examples (32 nodes): sample sub-networks from the Citation, Facebook, Web, Wikipedia, and Road Net domains.

Graph Kernels vs. Topological Features. Graph kernels are widely used in graph classification, but the computation time for popular kernel functions ranges from O(d^3) to O(d^6), where d is the number of nodes (vertices), and training with a kernel method costs O(n^2) in the number of training graphs n. Feature-based methods may have better scalability, but topological features require extra effort to design.

Classic Topological Features

Classic Topological Features. We ran logistic regression (LR) and random forest (RF) on each data set using the classic features. For both methods we used a fixed set of hyperparameters: for LR, the regularization constant was 1.0; for RF, 500 trees were used, chosen after a convergence test on the number of trees.
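The stated hyperparameters map directly onto scikit-learn. A minimal sketch, assuming X holds the classic topological features and y the parent-network labels:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Fixed hyperparameters from the slide: C=1.0 for LR, 500 trees for RF
lr = LogisticRegression(C=1.0)
rf = RandomForestClassifier(n_estimators=500)

# X (classic topological features) and y (domain labels) assumed prepared:
# print(cross_val_score(lr, X, y, cv=5).mean())
# print(cross_val_score(rf, X, y, cv=5).mean())
```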

“Naïve Feature”: the Adjacency Matrix. The adjacency matrix contains all the information about the network. Contrary to topological features, which provide a “lossy” description of the object (the network), the adjacency matrix provides a complex but “lossless” description. It looks just like a picture. But there are up to d! different adjacency matrices for the same network, and networks may have different sizes.
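The d! ambiguity is just node relabeling: A and PAP^T describe the same graph. A small numpy illustration (for exposition only):

```python
# Permuting node labels gives a different adjacency matrix for the
# same graph: A and P A P^T are isomorphic.
import numpy as np

rng = np.random.default_rng(0)
d = 5
A = rng.integers(0, 2, size=(d, d))
A = np.triu(A, 1)
A = A + A.T                     # random simple undirected graph

perm = rng.permutation(d)
P = np.eye(d, dtype=int)[perm]  # permutation matrix
B = P @ A @ P.T                 # relabeled graph: same structure, new matrix

print(np.array_equal(A, B))     # typically False, yet A and B are isomorphic
```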

Adjacency matrices

BFS-based Ordering Scheme. The goal is to form “better patterns” in the adjacency matrix. Start with the node with the highest degree, with ties broken by the largest k-neighborhood. Once this node is decided, the next node in the ordering is the node with the shortest path to the first already-ordered node, with ties broken by the shortest path to the second already-ordered node, and so on.
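A simplified sketch of such a BFS-based ordering, starting from the highest-degree node and visiting neighbors in decreasing-degree order; the finer tie-breaking rules above (k-neighborhood size, distances to earlier-ordered nodes) are deliberately omitted:

```python
# Simplified BFS node ordering in the spirit of the scheme above.
from collections import deque
import networkx as nx

def bfs_ordering(G):
    start = max(G.nodes, key=G.degree)      # highest-degree node first
    order, seen = [], {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        order.append(u)
        # visit unseen neighbors in decreasing-degree order
        for v in sorted(G.neighbors(u), key=G.degree, reverse=True):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    order += [v for v in G.nodes if v not in seen]  # remaining components
    return order

def ordered_adjacency(G):
    """Adjacency matrix with rows/columns in the BFS ordering."""
    return nx.to_numpy_array(G, nodelist=bfs_ordering(G))
```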

Properties of the Ordering Algorithm. Nodes with the same parent must be adjacent in the adjacency matrix. A parent P and its first child C are separated in the adjacency matrix by a bounded range [D_PP, D_PP + D_Pcousin], ordered by D_P, where D_P, D_PP, and D_Pcousin are the degrees of P, P's parent, and P's cousins.

Ordered Adjacency Matrices

Variable-Sized Networks. Topological features are to some extent scale-insensitive, but no machine learning method so far deals well with inputs of different dimensions. We therefore normalize adjacency matrices to a fixed size by padding or resizing, as sketched below.
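A minimal numpy sketch of the two size-normalization strategies used in the results (padding and resizing); nearest-neighbor resampling here is an assumption about how resizing could be done:

```python
import numpy as np

def pad_adjacency(A, size):
    """Embed a d x d matrix in the top-left of a size x size zero matrix."""
    out = np.zeros((size, size), dtype=A.dtype)
    d = A.shape[0]
    out[:d, :d] = A
    return out

def resize_adjacency(A, size):
    """Nearest-neighbor resample of A onto a size x size grid."""
    d = A.shape[0]
    idx = np.arange(size) * d // size
    return A[np.ix_(idx, idx)]

# A_fixed = pad_adjacency(A, 32)      # 16x16 -> 32x32 by zero-padding
# A_fixed = resize_adjacency(A, 32)   # 16x16 -> 32x32 by resampling
```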

Real-World Networks

Deep Learning: Stacked Denoising Autoencoder. The network uses the corrupted input to reconstruct the original one. Pretraining provides better initial weights, and fine-tuning specializes the network to the task.

Autoencoder. Developed by Bengio et al. (2007) and Vincent et al. (2008). Successfully used for pre-training deep networks and hierarchical feature learning. [Figure: neural network vs. autoencoder]

Denoising Autoencoder. Developed in 2008 by Vincent et al. Successfully used for deep learning and hierarchical feature learning. [Figure: neural network vs. denoising autoencoder]
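A minimal single-layer denoising autoencoder in PyTorch, assuming flattened 32x32 adjacency matrices as input; the paper's stacked architecture, pretraining schedule, and fine-tuning are not reproduced here:

```python
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    def __init__(self, d_in=1024, d_hidden=256, corruption=0.2):
        super().__init__()
        self.enc = nn.Linear(d_in, d_hidden)
        self.dec = nn.Linear(d_hidden, d_in)
        self.corruption = corruption

    def forward(self, x):
        # masking noise: randomly zero out a fraction of the input entries
        noisy = x * (torch.rand_like(x) > self.corruption).float()
        h = torch.sigmoid(self.enc(noisy))   # learned hidden representation
        return torch.sigmoid(self.dec(h))    # reconstruction of the clean x

model = DenoisingAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()                       # adjacency entries are 0/1

# One training step on a batch `x` of flattened adjacency matrices:
# optimizer.zero_grad()
# loss = loss_fn(model(x), x)                # reconstruct the *clean* input
# loss.backward()
# optimizer.step()
```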

Results. With no corruption, deep learning on the adjacency matrix performs best. Deep learning performs better than the classical methods when (1) the networks are small and (2) networks of different sizes are combined.

Method   Feature  8x8    16x16  32x32  16&32 (padding)  16&32 (resizing)  16&32 (combined)
DL(0.0)  Adjmat   0.557  0.735  0.820  0.804            0.796
DL(0.2)  Adjmat   0.527  0.728  0.800  0.793            0.799             0.801
DL(0.5)  Adjmat   0.540  0.718  0.823  0.789            0.802
LR       Classic  0.542  0.705  0.780  0.771            0.768             0.798
RF       Classic  0.518  0.698  0.765  0.758
         Classic  0.548  0.706  0.830  0.753            0.530             0.726  0.855

Performance Plateau. Designing topological descriptors requires extra effort, and it is an “endless” process. New features are needed, but what are they?

Is this ordering better than others?

Conclusion. We proposed a novel image embedding of adjacency matrices which can accommodate graphs of different sizes through padding or resizing (or a combination). Our results indicate that the classical feature approach, rich with domain expertise, and the plug-and-play approach, which uses our image embedding of the topology together with deep learning, perform comparably. This is extremely promising for applying our image embedding to network domains where domain-expert features may not be available.

Future Directions. Better orderings? Theoretical support for which kind(s) of ordering algorithms are better. Other applications (drug discovery/cheminformatics). More advanced deep learning algorithms. A better denoising mechanism: corrupt the edges or corrupt the nodes? …

8x8 RF, classic features. Confusion matrix over C (citation), F (facebook), R (roadnet), W (web), P (wikipedia); rows = true class, columns = predicted:
C: 34 11 21 23
F: 62 2 19 6
R: 9 1 87 3
W: 16 17 4 53
P: 25 10 35

32x32 LR, classic features. Confusion matrix over C, F, R, W, P; rows = true class, columns = predicted:
C: 72 2 1 23
F: 95 3
R: 100
W: 11 12 69 7
P: 13 4 83

Thank you

8x8 LR, classic features. There are a total of 12346 distinguishable graphs on 8 unlabeled nodes (http://oeis.org/A000088). Confusion matrix over C, F, R, W, P; rows = true class, columns = predicted:
C: 40 10 24 16 7
F: 68 4 5 9
R: 1 89
W: 13 30 38
P: 14 23 6 41

32x32 RF, classic features. Confusion matrix over C, F, R, W, P; rows = true class, columns = predicted:
C: 68 3 2 5 22
F: 1 96
R: 100
W: 8 4 82
P: 11 85

Important Features. Even though each feature is designed with a precise mathematical definition, for a problem of this complexity it is still quite hard to interpret the results.

Adjacency Matrices for Real-World Networks