
Visual Analysis of Large Graphs Using (X, Y)-clustering and Hybrid Visualizations
V. Batagelj, W. Didimo, G. Liotta, P. Palladino, M. Patrignani (Univ. Ljubljana, Univ. Perugia, Univ. Roma Tre)
In Proc. IEEE Pacific Visualization 2010

Outline
– The problem of visualizing large graphs
– State of the art
– Our contribution
– Conclusions and open problems

The problem of visualizing large graphs
Some major issues in the visualization of large graphs:
– Readability: optimization of aesthetic criteria
– Scalability: fast computation
– Visual complexity: interaction tools that allow users to limit the amount of information displayed on the screen
  – overview of the graph
  – details on demand
  – preservation of the user's mental map

State of the art
Readability: there are many effective algorithms that are computationally fast for relatively small and sparse graphs (see the graph drawing book of Di Battista, Eades, Tamassia, Tollis, 1999)

State of the art
Scalability: there are some fast graph drawing algorithms based on physical or algebraic models, but the resulting drawings have high visual complexity and do not allow detailed views (see the survey of Hachul and Jünger, 2007)

State of the art
Visual complexity: draw the whole graph and then interact with it, e.g. with focus+context techniques such as fisheye views or hyperbolic layouts; these techniques were conceived for tree-like graphs (see the survey of Herman, Melançon, Marshall, 2000)

State of the art
Interactive approaches for visualizing and exploring large graphs:
– the graph is visualized incrementally or at different levels of detail
– strong interaction between the user and the drawing

Interactive Approaches
Bottom-up strategies: the graph is visualized a piece at a time
– topological window moving through the canvas (Eades et al., 1997)
– Limits: no overview; preserving the user's mental map is difficult

Interactive Approaches
Bottom-up strategies: the graph is visualized a piece at a time
– incremental enhancement of the drawing (e.g. Carmignani et al., 2002)
– Limits: no overview; preserving the user's mental map is difficult without degrading readability

Interactive Approaches
Top-down approaches
– the graph is clustered (vertices are grouped together)
– a simplified view is shown (overview)
– the user interactively explores the clusters (detailed views)

Interactive Approaches
Top-down strategies
– the graph is clustered (vertices are grouped together)
– a simplified view is shown
– the user interactively explores the clusters
Limits
– someone/something has to define clustering rules
– existing clustering algorithms do not guarantee properties on the graph of clusters

Our contribution
A top-down approach with these ingredients:
– a new clustering framework
– a new clustering algorithm within the framework
– hybrid visualizations
A system: VHyXY
Some case studies

Basic Terminology: Clustering
– G=(V, E): graph with vertex set V and edge set E
– A cluster of G=(V, E) is a subset of V
– A clustering C of G is a set of disjoint clusters of G

Basic Terminology: Clustering
The graph of clusters H(G, C) is the graph obtained by collapsing each cluster of C into a single vertex and by replacing multiple edges with a single one
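The collapse described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `graph_of_clusters` and the `("cluster", i)` labels are hypothetical, and vertices that belong to no cluster are kept as ordinary vertices of H(G, C).

```python
def graph_of_clusters(vertices, edges, clusters):
    """Return (vertex set, edge set) of H(G, C) for an undirected graph.

    vertices -- iterable of hashable vertex names
    edges    -- iterable of (u, v) pairs
    clusters -- list of pairwise disjoint vertex sets
    """
    # Map each clustered vertex to a label standing for its cluster.
    label = {}
    for i, cluster in enumerate(clusters):
        for v in cluster:
            if v in label:
                raise ValueError(f"clusters are not disjoint: {v!r}")
            label[v] = ("cluster", i)

    # Unclustered vertices keep their own name.
    h_vertices = {label.get(v, v) for v in vertices}

    h_edges = set()
    for u, v in edges:
        a, b = label.get(u, u), label.get(v, v)
        if a != b:  # edges inside a cluster vanish
            # Storing frozensets in a set keeps one copy of parallel edges.
            h_edges.add(frozenset((a, b)))
    return h_vertices, h_edges


# Hypothetical example: a triangle {1, 2, 3} collapsed into one vertex.
demo_vertices = [1, 2, 3, 4, 5]
demo_edges = [(1, 2), (2, 3), (3, 1), (3, 4), (2, 4), (4, 5), (1, 5)]
demo_h_vertices, demo_h_edges = graph_of_clusters(demo_vertices, demo_edges, [{1, 2, 3}])
```

In the example, the edges (3, 4) and (2, 4) both become the single edge between the cluster vertex and vertex 4.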


A new clustering framework
Clustering algorithms usually detect groups of highly connected vertices without taking care of the graph of clusters
We adopt a new framework for the design of automatic clustering algorithms that guarantee:
– desired properties for the clusters
– desired properties for the graph of clusters

The (X,Y)-clustering
X and Y are two classes of graphs with certain properties
G is called an (X,Y)-graph if there exists a clustering of G such that:
– each cluster induces a subgraph that belongs to Y
– the graph of clusters belongs to X

(X,Y)-graph example
Let X be the class of cycles and let Y be the class of K_4
The graph is a (cycle, K_4)-graph
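To make the definition concrete, the following Python sketch checks whether a given clustering certifies that a graph is a (cycle, K_4)-graph. The test graph (three K_4 subgraphs joined in a ring) is a made-up stand-in for the graph drawn on the slide, and all function names are hypothetical. Note that the sketch only verifies a clustering supplied by the caller; it does not search for one.

```python
from itertools import combinations


def build_adj(edges):
    """Adjacency sets of an undirected graph given as a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def induces_k4(adj, cluster):
    """True if the cluster has 4 vertices that are pairwise adjacent."""
    return len(cluster) == 4 and all(v in adj[u] for u, v in combinations(cluster, 2))


def collapse(adj, clusters):
    """Adjacency sets of the graph of clusters (unclustered vertices are kept)."""
    label = {v: ("cluster", i) for i, c in enumerate(clusters) for v in c}
    h = {label.get(v, v): set() for v in adj}
    for u in adj:
        for v in adj[u]:
            a, b = label.get(u, u), label.get(v, v)
            if a != b:
                h[a].add(b)
                h[b].add(a)
    return h


def is_cycle(adj):
    """True if the graph is connected, has >= 3 vertices, and all degrees are 2."""
    if len(adj) < 3 or any(len(nbrs) != 2 for nbrs in adj.values()):
        return False
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)


def certifies_cycle_k4(adj, clusters):
    """True if every cluster induces a K_4 and the graph of clusters is a cycle."""
    return all(induces_k4(adj, c) for c in clusters) and is_cycle(collapse(adj, clusters))


# Hypothetical example: three K_4 blocks on vertices 0..11, linked in a ring.
blocks = [set(range(0, 4)), set(range(4, 8)), set(range(8, 12))]
ring_edges = [e for b in blocks for e in combinations(sorted(b), 2)]
ring_edges += [(3, 4), (7, 8), (11, 0)]
ring_adj = build_adj(ring_edges)
```

With all three blocks as clusters the check succeeds; with only the first two it fails, because the unclustered vertices of the third block keep degrees other than 2 in the graph of clusters.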

Interesting combinations
X is some class of sparse graphs:
– planar graphs, cycles, trees, paths, …
Y is some class of highly connected graphs:
– cliques, subgraphs with high-degree vertices, …
One can think of using different visual paradigms and algorithms for drawing the graph of clusters and the subgraph induced by each cluster (hybrid visualization)

Remark on (X,Y)-clustering
(X, Y)-clustering was previously defined by Brandenburg (GD 1997), but his model requires that every vertex belongs to some cluster
This requirement poses severe practical limitations, so our model drops it: a vertex may belong to no cluster

The (X,Y)-clustering problem
Problem: Given a graph G and two desired classes X and Y, is G an (X,Y)-graph?
This problem is NP-hard in general
Theorem: Deciding whether G is a (planar, k-clique)-graph for a desired k ≥ 5 is NP-hard
This result motivates us to look for some relaxation of cliques

K-core components
The subgraph induced by a cluster is a k-core component if it is a maximal connected subgraph such that every vertex has degree at least k within the subgraph
[Figure: examples of a 5-core component and a 4-core component]
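A minimal Python sketch of how k-core components can be found for a fixed k: repeatedly delete vertices whose remaining degree is below k, then take the connected components of what survives. The function name and the small test graph are hypothetical and not taken from the slides.

```python
def k_core_components(adj, k):
    """Return the k-core components of an undirected graph as a list of vertex sets.

    adj -- dict mapping each vertex to the set of its neighbours
    """
    # Peeling phase: remove vertices until every survivor has degree >= k.
    alive = set(adj)
    degree = {v: len(adj[v]) for v in adj}
    queue = [v for v in adj if degree[v] < k]
    while queue:
        v = queue.pop()
        if v not in alive:
            continue  # already removed via an earlier queue entry
        alive.remove(v)
        for w in adj[v]:
            if w in alive:
                degree[w] -= 1
                if degree[w] < k:
                    queue.append(w)

    # Component phase: connected components of the surviving subgraph.
    components = []
    unvisited = set(alive)
    while unvisited:
        root = unvisited.pop()
        comp, stack = {root}, [root]
        while stack:
            for w in adj[stack.pop()]:
                if w in unvisited:
                    unvisited.remove(w)
                    comp.add(w)
                    stack.append(w)
        components.append(comp)
    return components


# Hypothetical example: a K_4 on {0,1,2,3}, a path 3-4-5, and a triangle {6,7,8}
# attached through the edge 5-6.
core_demo_adj = {
    0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4},
    4: {3, 5}, 5: {4, 6}, 6: {5, 7, 8}, 7: {6, 8}, 8: {6, 7},
}
```

For this graph, k = 3 leaves only the K_4, k = 2 keeps the whole graph as one component, and k = 4 leaves nothing.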

(Planar, k-core component)-graphs
We investigate (X,Y)-graphs G such that:
– X is the class of planar graphs
– Y is the class of k-core components of G
In particular, for a given k > 0, one can ask whether G is a (planar, k-core component)-graph
– this decision problem can be solved in polynomial time
– we give a polynomial-time algorithm that finds the maximum k for which G is a (planar, k-core component)-graph, and that computes the corresponding clustering

Properties of (planar, k-core component)-graphs
The union of all k-core components of G is called the k-core of G (the k-core of G, if it exists, is unique)
Property. If G has the k-core G_k (for some k ≥ 1), then G has the (k−1)-core G_(k−1) and G_k ⊆ G_(k−1)
Lemma. If G is a (planar, k-core component)-graph then it is a (planar, (k−1)-core component)-graph

Proof of the lemma
[Figure sequence: clusters V_1 and V_2 of the clustering C and the corresponding vertices u(V_1), u(V_2) of H(G, C); clusters V_1' and V_2' of the clustering C' and the corresponding vertices u(V_1'), u(V_2') of H(G, C')]

Clustering Algorithm
Theorem: Let G be a graph with n vertices and m edges. There exists an O((n+m) log n)-time algorithm that computes the maximum k for which G is a (planar, k-core component)-graph, and the corresponding clustering
Steps of the algorithm:
1. Compute core numbers for the vertices
2. Perform a binary search on core numbers
3. For each graph of clusters, test its planarity
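The three steps can be sketched in Python as follows. This is a rough illustration, not the authors' implementation: all names are hypothetical, the class test for the graph of clusters is passed in as a function, and the demo plugs in a forest test instead of a planarity test only so that the sketch needs no external library (a planarity test would be passed the same way). The binary search relies on the monotonicity stated in the lemma: if a value of k works, so does k−1.

```python
import heapq


def core_numbers(adj):
    """Step 1: core number of each vertex (vertices must be mutually orderable)."""
    degree = {v: len(adj[v]) for v in adj}
    heap = [(d, v) for v, d in degree.items()]
    heapq.heapify(heap)
    core, current = {}, 0
    while heap:
        d, v = heapq.heappop(heap)
        if v in core or d != degree[v]:
            continue  # stale heap entry
        current = max(current, d)
        core[v] = current
        for w in adj[v]:
            if w not in core:
                degree[w] -= 1
                heapq.heappush(heap, (degree[w], w))
    return core


def components(adj, keep):
    """Connected components of the subgraph induced by the vertex set `keep`."""
    unvisited, result = set(keep), []
    while unvisited:
        root = unvisited.pop()
        comp, stack = {root}, [root]
        while stack:
            for w in adj[stack.pop()]:
                if w in unvisited:
                    unvisited.remove(w)
                    comp.add(w)
                    stack.append(w)
        result.append(comp)
    return result


def collapse(adj, clusters):
    """Adjacency sets of the graph of clusters (unclustered vertices are kept)."""
    label = {v: ("cluster", i) for i, c in enumerate(clusters) for v in c}
    h = {label.get(v, v): set() for v in adj}
    for u in adj:
        for v in adj[u]:
            a, b = label.get(u, u), label.get(v, v)
            if a != b:
                h[a].add(b)
                h[b].add(a)
    return h


def max_k_clustering(adj, in_class_x):
    """Steps 2 and 3: largest k whose k-core components give a graph of clusters in X."""
    core = core_numbers(adj)
    lo, hi, best = 1, max(core.values(), default=0), None
    while lo <= hi:
        mid = (lo + hi) // 2
        # The k-core is induced by the vertices with core number >= k.
        clusters = components(adj, {v for v in adj if core[v] >= mid})
        if in_class_x(collapse(adj, clusters)):
            best, lo = (mid, clusters), mid + 1
        else:
            hi = mid - 1
    return best


def is_forest(adj):
    """Stand-in class test for the demo: a graph is a forest iff m = n - #components."""
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return m == len(adj) - len(components(adj, adj))


# Hypothetical example: two K_4 blocks linked through vertices 8 and 9.
pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),
         (3, 8), (8, 4), (7, 9), (9, 0)]
alg_adj = {v: set() for v in range(10)}
for a, b in pairs:
    alg_adj[a].add(b)
    alg_adj[b].add(a)
alg_result = max_k_clustering(alg_adj, is_forest)
```

In the demo, k = 3 gives two clusters whose graph of clusters is a 4-cycle (not a forest), so the search settles on k = 2, where the whole graph is a single cluster.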

Algorithm animation
Compute the core number of each vertex, i.e., the maximum k for which there exists a k-core that contains the vertex


Hybrid Visualizations
The (X, Y)-clustering technique can be used to design hybrid visualizations
– combination of different drawing conventions for different parts of the graph
– Example: node-link representation for sparse subgraphs, matrix-based representation for dense subgraphs
– Highly readable drawings for the graph of clusters (which is always planar)

Matrix-based representation
– vertices are rows and columns
– edges are cells
The ordering of vertices in rows/columns may strongly affect the number of crossings in the drawing
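As a small illustration of "vertices are rows and columns, edges are cells", the Python sketch below builds the 0/1 matrix of a cluster for a chosen vertex ordering. The function name and the three-vertex example are hypothetical; the sketch only shows that the ordering changes the layout of the cells and does not model edge crossings.

```python
def adjacency_matrix(adj, order):
    """0/1 matrix whose rows and columns follow `order`; cell [i][j] is 1 for an edge."""
    index = {v: i for i, v in enumerate(order)}
    matrix = [[0] * len(order) for _ in order]
    for u in order:
        for v in adj[u]:
            if v in index:  # ignore neighbours outside the cluster
                matrix[index[u]][index[v]] = 1
    return matrix


# Hypothetical cluster: the path a-b-c.
path_adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
matrix_abc = adjacency_matrix(path_adj, ["a", "b", "c"])
matrix_bac = adjacency_matrix(path_adj, ["b", "a", "c"])
```

The two orderings describe the same subgraph, but the filled cells sit in different rows and columns.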

Crossings minimization heuristic
[Figure sequence: a matrix whose rows are labeled vertex1 to vertex20; in the last frame the rows vertex12 to vertex15 have been moved to the top]

Remark about hybrid visualizations
A hybrid visualization that combines node-link and matrix-based representations was previously used in the literature (Henry et al., NodeTrix)
In that work, clusters are manually defined:
– no automatic clustering
– no automatic ordering of rows and columns

The System VHyXY
VHyXY integrates the clustering algorithm and hybrid visualizations
– X-class chooser (e.g., planar, forest)
– Y-class chooser (e.g., k-core component)
– Filters on edge weights
– Specific drawing algorithms for each component

User interface

Case Study: Co-authorship networks
DBLP: on-line database of publications in Computer Science
VHyXY allows the user to query DBLP on a specific topic
– It retrieves data about all papers on that topic (looking at the titles of the papers)
– It builds a network where authors are vertices and there is an edge between two authors if they share a paper (edge weight = number of shared papers)
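The network construction can be sketched in Python as below. The sketch assumes the papers have already been retrieved and are given as lists of author names; it does not query DBLP, and the function names and the author names "A" to "D" are placeholders. The second function mirrors the edge-weight filter offered by the system.

```python
from collections import Counter
from itertools import combinations


def coauthorship_network(papers):
    """Map each unordered author pair to the number of papers they share.

    papers -- iterable of author lists, one list per paper
    """
    weights = Counter()
    for authors in papers:
        # Sorting gives each pair a canonical (a, b) key; set() drops duplicates.
        for a, b in combinations(sorted(set(authors)), 2):
            weights[(a, b)] += 1
    return weights


def filter_edges(weights, threshold):
    """Keep only the edges whose weight is strictly greater than `threshold`."""
    return {edge: w for edge, w in weights.items() if w > threshold}


# Placeholder data: four papers with made-up author names.
demo_papers = [["A", "B"], ["A", "B", "C"], ["B", "A"], ["C", "D"]]
demo_weights = coauthorship_network(demo_papers)
demo_filtered = filter_edges(demo_weights, 2)
```

Here "A" and "B" share three papers, so their edge is the only one that survives a filter of weight > 2.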

Co-authorship network for “orthogonal drawing”

Hybrid visualizations: a matrix and a circular drawing inside an orthogonal layout

Larger network for “graph drawing”: 114 vertices and 494 edges

Same network with edge filtering (weight > 2)

Clustering algorithm performance

Index name              Value (0-1)
Graph clustering        0.62
Coverage                0.56
Clustering performance  0.94
Clustering error        0.06

Index name              Value (0-1)
Graph clustering        0.64
Coverage                0.37
Clustering performance
Clustering error

– Graph clustering: a property of the graph; the higher the value, the better the clustering can be
– Coverage: how well the computed clusters cover the edges of the whole graph
– Performance: counts the number of “correctly interpreted pairs of nodes” in the graph
– Error: 1 − performance
[Brandes et al., “Engineering graph clustering: Models and experimental evaluation”, ACM Journal of Experimental Algorithmics, 2007]
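For readers who want to see how such indices are computed, here is a hedged Python sketch of coverage and performance using their usual definitions: coverage is the fraction of edges that lie inside clusters, and performance is the fraction of vertex pairs that are either adjacent within a cluster or non-adjacent across clusters. Treating each unclustered vertex as its own singleton cluster is an assumption of this sketch; the slides do not say how such vertices were handled. The "graph clustering" index is not computed here.

```python
from itertools import combinations


def coverage_and_performance(vertices, edges, clusters):
    """Return (coverage, performance) of a clustering of a simple undirected graph."""
    vertices = list(vertices)
    cluster_of = {v: i for i, c in enumerate(clusters) for v in c}
    # Assumption: a vertex in no cluster counts as a singleton cluster.
    for v in vertices:
        cluster_of.setdefault(v, ("single", v))

    edge_set = {frozenset(e) for e in edges}
    # Edges whose two endpoints fall in the same cluster.
    intra = sum(1 for e in edge_set if len({cluster_of[v] for v in e}) == 1)
    # Non-adjacent pairs that the clustering correctly separates.
    separated = sum(
        1
        for u, v in combinations(vertices, 2)
        if frozenset((u, v)) not in edge_set and cluster_of[u] != cluster_of[v]
    )
    n = len(vertices)
    coverage = intra / len(edge_set)
    performance = (intra + separated) / (n * (n - 1) / 2)
    return coverage, performance


# Hypothetical example: two triangles joined by the single edge 2-3.
tri_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
tri_cov, tri_perf = coverage_and_performance(range(6), tri_edges, [{0, 1, 2}, {3, 4, 5}])
```

In the example, 6 of the 7 edges are inside clusters, and 14 of the 15 vertex pairs are interpreted correctly; the error is then 1 minus the performance.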

Open problems
Explore additional X-classes or Y-classes for which polynomial-time clustering algorithms exist
– X: forest, path, outerplanar, …
– Y: relaxations of cliques, …
Extend our techniques to
– multi-level clustering (hierarchical clustering)
– overlapping clusters
Experiment with the system on a larger set of application domains
– biological networks, criminal networks, …