Algorithms for drawing large graphs

Algorithms for drawing large graphs Yehuda Koren The Weizmann Institute of Science

Graphs A graph consists of nodes and edges. The nodes model entities; the edge set models a binary relationship on the nodes. Edges may be weighted, reflecting similarities/distances between the respective nodes.

Graph Drawing Find an aesthetic layout of the graph that clearly conveys its structure. Technically: assign a location for each node and a route for each edge, so that the resulting drawing is “nice”. Example: V = {1,2,3,4,5,6}, E = {(1,2),(2,3),(1,4),(1,5),(3,4),(3,5),(4,5),(4,6),(5,6)}

Drawing conventions Four common conventions: Circular, Orthogonal, Hierarchical, and Force-Directed (variously node oriented, edge oriented, hierarchy oriented, or clustering oriented). Pictures from: www.tomsawyer.com. We concentrate on Force-Directed graph drawing (the most general).

Force-directed graph drawing The graph drawing problem is ill-defined! Which layout is nicer: “I have a clear structure!” or “I am a colorful maze!”? (Layouts by Tom Sawyer; energies 2.23x10^6 and 1.77x10^321.) An energy model is associated with the graph layouts; low energy states correspond to nice layouts. …now we have a well-defined problem

Force-directed graph drawing Graph drawing = energy minimization. Hence, the drawing algorithm is an iterative optimization process, moving from an initial (random) layout through successive iterations to the final (nice) layout. Convergence to the global minimum is not guaranteed! Aesthetic properties: Proximity preservation: similar nodes are drawn close together. Symmetry preservation: isomorphic sub-graphs are drawn identically. No external influences: “Let the graph speak for itself”

Example of F.D. method: Spring Embedder [Eades84, Fruchterman-Reingold91] Replace edges with springs (zero rest length) --- attractive forces Replace vertices with electrically charged particles, repelling each other --- repulsive forces Start with a random placement of the vertices, then just let the system go…
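The spring-embedder recipe above can be sketched in a few lines. This is an illustrative toy, not the exact Eades or Fruchterman-Reingold formulation: the function name, force constants, and the plain gradient step are all my own choices.

```python
import math
import random

def spring_layout(nodes, edges, iters=200, k_attract=0.1, k_repulse=0.05, step=0.05):
    """Toy spring embedder: springs of zero rest length on edges
    (attraction), charged particles on all node pairs (repulsion)."""
    random.seed(0)  # reproducible "random placement of the vertices"
    pos = {v: [random.random(), random.random()] for v in nodes}
    edge_set = {frozenset(e) for e in edges}
    for _ in range(iters):
        disp = {v: [0.0, 0.0] for v in nodes}
        for u in nodes:
            for v in nodes:
                if u == v:
                    continue
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k_repulse / d ** 2          # repulsion between every pair
                if frozenset((u, v)) in edge_set:
                    f -= k_attract * d          # spring pulls edge endpoints together
                disp[u][0] += f * dx / d
                disp[u][1] += f * dy / d
        for v in nodes:                          # "let the system go", one small step
            pos[v][0] += step * disp[v][0]
            pos[v][1] += step * disp[v][1]
    return pos

# Path graph 1 - 2 - 3: repulsion should push the two endpoints furthest apart
pos = spring_layout([1, 2, 3], [(1, 2), (2, 3)])
```

At equilibrium each spring settles where attraction balances repulsion, so the non-adjacent endpoints of the path end up roughly twice as far apart as the edge endpoints.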

[Kaufmann and Wagner, 2001] “let go”

Force directed methods in 3-D Drawing by Aaron Quigley

Should I show hierarchy? ACE [Carmel,Harel,Koren’02]

Sometimes drawing edges is not important… Visualization of odorous chemicals (300 measurements) (by ACE) Preservation of the clustering decomposition Outlier detection

Outline of this talk Force directed methods and large graphs. Multi-scale acceleration of force directed methods. Hall’s graph drawing method (a particular force-directed method). ACE: a multi-scale acceleration of Hall’s method. High dimensional embedding: a new approach to graph drawing. Examples and comparison. (Chart: no. of nodes drawn in a minute, on a log scale from 10^0 to 10^6, for force directed, multi-scale, Hall, ACE, and high-dimensional embedding.)

Scaling with large graphs Traditional force-directed methods are limited to a few hundred nodes Problems when drawing large graphs: Visualization issue: not enough drawing area Cures: dynamic navigation, clustering, fish-eye view, hyperbolic space,… Algorithmic issue: convergence to a nice layout is too slow We concentrate on the algorithmic issue, i.e., the computational complexity (mainly time).

Force-directed methods: complexity Complexity per single iteration is O(n^2): the energy contains at least one term for each node pair (repulsive forces). The estimated number of iterations to convergence is O(n). Overall time complexity is ~O(n^3). Force directed methods do not scale up well to large graphs! A particularly interesting approach: Multi-scale graph drawing [Hadany-Harel 99, Harel-Koren 00]; also: [Walshaw 00, Gajer-Goodrich-Kobourov 00]

Multi-Scale Graph Aesthetics A graph should be “nice” on all scales Large scale aesthetics refer to phenomena related to large areas of the picture, disregarding its micro structure Local aesthetics are limited to small areas of the drawing

A nice layout vs. a globally nice layout: in a globally nice layout, vertices are allowed to deviate from their location in a nice layout only by a limited amount; it expresses large scale aesthetics. A globally nice layout can be generated from a nice layout by putting closely drawn vertices at the same location, thus coarsening the graph. A globally nice layout, or, maybe, a nice layout of a coarse graph?? Both!!!

Multi scale graph drawing Multi-scale representation of the graph: a series of coarse graphs that approximate the original graph (e.g., 1275 nodes coarsened to 425, 145, then 50 nodes, and extended back). The layout of a coarser graph is used as an initial layout for the finer graph. Gain no. 1: convergence within few iterations (<< O(n)): global characteristics of the drawing were already determined in the coarser graphs, so only local refinement is needed. Gain no. 2: fast execution of a single iteration (<< O(n^2)): we neglect long distance forces.

Coarsening Goal: reduce the size of the graph while keeping its crucial structure. Several possibilities in practice… A candidate is edge contraction: in the fine graph, choose edges to contract; contracting them yields the coarse graph.

Properties of multi-scale F.D. graph drawing Running times are significantly improved: 10^4-node graphs are drawn in around 1 minute. The ability to converge to the true global minimum is improved. Convergence to the global minimum is still not guaranteed.

Hall’s model [K.M. Hall, 1970] The optimal layout minimizes a weighted sum of squared edge lengths: each edge (i,j) contributes its weight times the squared Euclidean distance between i and j, so heavier edges are shorter. Subject to the constraints: the variance of the drawing is fixed (a global repulsive force); all axes have equal variance; the axes of the drawing are uncorrelated (balanced aspect ratio). The complexity of Hall’s energy is linear (O(|E|)), compared with the quadratic complexity (O(n^2)) of traditional models.
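The slide's formulas were lost in transcription; a standard written form of Hall's model (a reconstruction, with x, y the coordinate vectors, w_ij the edge weights, and L the Laplacian introduced on the next slides) is:

```latex
\min_{x,y}\; E(x,y) \;=\; \sum_{(i,j)\in E} w_{ij}\left[(x_i-x_j)^2+(y_i-y_j)^2\right]
\;=\; x^{\mathsf T}Lx \;+\; y^{\mathsf T}Ly,
\qquad \text{s.t.}\quad \operatorname{Var}(x)=\operatorname{Var}(y)=1,
\quad x^{\mathsf T}y = 0.
```

The fixed variance plays the role of the global repulsive force, and the uncorrelated-axes constraint keeps the aspect ratio balanced.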

Advantages of Hall’s model Linear time for a single iteration of optimization process The global optimizer can be efficiently computed! Hall’s model facilitates a rigorous multi-scale process We need to define the Laplacian…

Laplacian Given a weighted graph with n nodes, with the w_ij being the weights, the Laplacian of the graph is the matrix L, where L_ii = sum_j w_ij and L_ij = -w_ij for i != j (in matrix terms, L = D - W, with D the diagonal degree matrix).

Properties of the Laplacian: A symmetric matrix Sum of each row is 0 All eigenvalues are non-negative Zero eigenvalue with associated eigenvector (1,1,…,1)
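These four properties are easy to verify numerically; a minimal sketch (the helper name and the example weights are mine):

```python
import numpy as np

def laplacian(W):
    """Graph Laplacian L = D - W for a symmetric weight matrix W with
    zero diagonal: L[i][i] = sum_j w_ij, L[i][j] = -w_ij for i != j."""
    return np.diag(W.sum(axis=1)) - W

# Weighted path graph 0 - 1 - 2 (illustrative weights)
W = np.array([[0.0, 2.0, 0.0],
              [2.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
L = laplacian(W)

assert np.allclose(L, L.T)                 # symmetric matrix
assert np.allclose(L.sum(axis=1), 0.0)     # sum of each row is 0
assert np.linalg.eigvalsh(L).min() > -1e-9 # all eigenvalues non-negative
assert np.allclose(L @ np.ones(3), 0.0)    # (1,1,...,1) has eigenvalue 0
```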

Optimizer of Hall’s model For simplicity we assume a 2-D drawing. The coordinates of node i, (x_i, y_i), are determined by two vectors x and y. Claim: in the optimal layout of Hall’s model, x is the eigenvector of the Laplacian with the smallest positive eigenvalue, and y is the eigenvector of the Laplacian with the second smallest positive eigenvalue. To draw the graph, we have to compute low eigenvectors of the Laplacian.
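For small graphs the claim can be demonstrated with a dense eigen-solver. The sketch below is illustrative (it is not ACE); it assumes a connected graph, so the only eigenvalue below the two we want is the trivial 0.

```python
import numpy as np

def hall_layout(W):
    """Spectral (Hall) layout: x and y are the Laplacian eigenvectors
    with the smallest and second-smallest positive eigenvalues, i.e.
    columns 1 and 2 of eigh's ascending ordering for a connected graph."""
    L = np.diag(W.sum(axis=1)) - W
    evals, evecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    return evecs[:, 1], evecs[:, 2]

# 6-cycle: a highly symmetric graph, so the spectral layout
# places all nodes at equal radius (a regular hexagon)
n = 6
W = np.zeros((n, n))
for i in range(n):
    W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0
x, y = hall_layout(W)
```

The layout of the cycle comes out as a regular polygon, illustrating the slide's later point about excellent expression of symmetries.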

The ACE Algorithm (joint work with L. Carmel and D. Harel) Regular eigen-solvers encounter real difficulties with 10^5-node graphs. We propose a multi scale algorithm for computing low eigenvectors of the Laplacian: ACE – Algebraic multigrid Computation of Eigenvectors. Two orders of magnitude improvement over past multi-scale / force-directed methods.

ACE algorithm Input: A graph with n nodes The graph is represented by its Laplacian, L If n is small enough: compute the low eigenvectors of L directly Otherwise…

ACE algorithm Input: a graph with n nodes. Construct an interpolation operator. What is this?? The interpolation operator is a way to derive a drawing of n nodes (a fine drawing) from a drawing of m nodes (a coarse drawing), m < n.

ACE algorithm Input: A graph with n nodes Construct an interpolation operator: Create coarse graph of m nodes Typically, m = n / 2 More details later…

ACE algorithm Input: a graph with n nodes. Construct an interpolation operator. Create a coarse graph of m nodes. Recursively, build a layout of the coarse graph. Interpolate, yielding a layout of the fine graph: a smart initialization. Refine using iterative solvers (Power-Iteration, RQI) that benefit from the smart initialization; the result is the final drawing.

How to coarsen The key component is the interpolation operator. Criteria for choosing a high quality interpolation operator: interpolated drawings of high quality; fast interpolation. In practice, the interpolation operator is an n x m matrix.

How to coarsen Important requirement: the cost of the coarse drawing = the cost of its interpolated fine drawing (the optimal coarse solution and the optimal interpolated solution have the same costs). The solution of the coarse problem is then the optimal drawing within a subspace of the fine problem. This is achieved using a careful construction of the coarse graph: in practice, the coarse graph is constructed using the interpolation operator, matrix multiplication and a “mass matrix”.

Aesthetic properties of results The quality of the results depends on the appropriateness of Hall’s model. Hall’s model is distinguished by its simple form and also by its convergence to a global minimum. For many graphs, traditional force directed methods will provide better drawings (e.g., trees). Preservation of global structure; excellent expression of symmetries.

Results (4elt, |V| = 15606, |E| = 45878) Each node is placed around the weighted center of its neighbors; note the dense areas. ACE vs. multi-scale f.d.

Results (Dwa512, |V| = 512, |E| = 1004) Shows the clustering structure of the drawing and the symmetry preservation. ACE vs. multi-scale f.d.

Guidelines for multi-scale graph drawing Define formally what a nice graph layout is (spring embedder, MDS, Hall, …). Choose an optimization method (gradient descent, Gauss-Seidel, simulated annealing). Construct a method for coarsening and interpolation. Optimize the layout on multiple scales.

A new approach: Graph Drawing by High-Dimensional Embedding (joint work with D. Harel)

A New Approach to Graph-Drawing First stage: Embed the graph in a very high dimension (e.g., 50-D). Utilize the flexibility of the high dimension to simplify the layout creation Second stage: Project the graph onto the 2-D plane using PCA, a well known mathematical process

Advantages Running time is linear in the graph size; in practice, comparable to ACE (10^5-node graphs are drawn in 2-3 sec; 10^6-node graphs in under a minute). No iterative optimization process; insensitive to the “initial placement”. Simple implementation. Side effect: provides excellent means for interactive exploration of large graphs.

First Stage: Embedding the Graph in a High Dimension

Choose m pivot nodes, uniformly distributed over the graph. Here, m = 50 (a typical number, independent of |V|), on a 33x33 grid (1089 nodes).

How to Choose m Pivots “Uniformly”? Choose the first pivot, p1, at random. The i-th pivot, pi, is the node furthest away from the already chosen pivots {p1, p2, …, pi-1}. This is a known 2-approximation to the k-Center problem.
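The farthest-point rule above is a few lines of code on top of BFS. A minimal sketch (function names, the adjacency-dict representation, and the deterministic `first` pivot are my choices; the slide picks the first pivot at random):

```python
from collections import deque

def bfs_distances(adj, src):
    """Graph-theoretic (hop) distances from src, via BFS."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def choose_pivots(adj, m, first):
    """Farthest-point sampling: each new pivot maximizes its distance
    to the already chosen pivots (2-approximation of k-Center)."""
    pivots = [first]
    mindist = bfs_distances(adj, first)   # distance to nearest chosen pivot
    for _ in range(m - 1):
        p = max(adj, key=lambda v: mindist[v])
        pivots.append(p)
        for v, d in bfs_distances(adj, p).items():
            mindist[v] = min(mindist[v], d)
    return pivots

# Path graph 0-1-2-3-4: starting from 0, the farthest node is 4, then the middle
pivots = choose_pivots({0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}, 3, first=0)
```

Each of the m rounds costs one BFS, so pivot selection runs in O(m(|V| + |E|)) overall.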

The m Dimensional Drawing Draw the graph in m dimensions by associating each axis with a pivot node. Axis i shows the graph from the “viewpoint” of pi, the i-th pivot node: first pi itself, then pi’s neighbors, then the nodes whose graph-theoretic distance from pi is 2, 3, …, d. Thus, the i-th coordinate of node v is the graph-theoretic distance between v and pi.
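Concretely, the m-dimensional embedding is one BFS per pivot; each node's coordinate vector lists its distances to the pivots. A self-contained sketch (the function name and dict-of-lists output are illustrative):

```python
from collections import deque

def high_dim_embedding(adj, pivots):
    """Embed each node v as [d(v, p1), ..., d(v, pm)], where d is the
    graph-theoretic (BFS) distance and p1..pm are the pivot nodes."""
    coords = {v: [] for v in adj}
    for p in pivots:
        dist = {p: 0}
        q = deque([p])
        while q:                       # BFS from the pivot
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        for v in adj:                  # axis "from the viewpoint of p"
            coords[v].append(dist[v])
    return coords

# 4-cycle 0-1-2-3 with pivots 0 and 2: opposite nodes get mirrored coordinates
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
coords = high_dim_embedding(adj, [0, 2])
```

With m pivots this is m BFS traversals, i.e. O(m(|V| + |E|)) time, which is the source of the linear running time claimed earlier.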

Second Stage: Projecting Onto a Low Dimension

Principal Components Analysis (PCA) A fast and straightforward procedure taken from multivariate analysis. The data is projected in a way that maximizes its variance, thereby minimizing the information loss. Very useful for finding the “best viewpoint” for projecting the drawing.
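The projection step can be sketched with a thin SVD of the centered data (an equivalent way to get the principal components). The function name and the toy 3-D "embedding" below are illustrative, not the paper's implementation:

```python
import numpy as np

def pca_project(X, k=2):
    """Project the rows of X onto the top-k principal components:
    center the data, then project onto the leading right singular
    vectors of the centered matrix (the directions of maximal variance)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt: principal axes
    return Xc @ Vt[:k].T

# Toy high-dimensional embedding: 4 nodes in 3-D, projected to the 2-D plane
X = np.array([[0.0, 2.0, 1.0],
              [1.0, 1.0, 1.0],
              [2.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
Y = pca_project(X, k=2)
```

Here the centered data happens to lie in a low-dimensional subspace, so the 2-D projection preserves all of the variance; in general it keeps as much as any 2-D viewpoint can.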

Demonstration of PCA First Principal Component

Results (Crack, |V | = 10240, |E| = 30380) ACE Multi-scale f.d. High Dim. Embedding

Zooming-in on Regions of Interest Change the viewpoint for exploring local regions by performing PCA on a selected portion of the graph. This reveals new properties that are hidden in the full drawing!!

Comparison: High-Dimensional Embedding vs. ACE vs. multi-scale force-directed.
Running time in practice: 10^6 nodes/minute vs. 10^4 nodes/minute.
Time complexity: O(|V|+|E|) vs. convergence that depends on the graph’s structure.
Drawing quality: high.
Drawing robustness: optimal up to randomization / optimal / may converge to a poor local minimum.
High dimensionality: essentially the same running time vs. increased running time.
Zoom-in: available vs. not available.
Symmetry: good / excellent.
Aspect ratio: essentially balanced vs. no guarantee.
Trees: impossible / difficult / moderate.
No winner!!

The End