Visualization of Graph Data
CS 4390/5390 Data Visualization
Shirley Moore, Instructor
October 6,
Graphs and Trees
A graph is a set of nodes (vertices) connected by links (edges).
Links can be directed or undirected.
Two nodes are adjacent if they are connected by a link.
Two edges are adjacent if they share a common node.
Nodes and links can both have attributes.
A path from node a to node b is a sequence of adjacent edges leading from a to b.
A cycle is a path that begins and ends at the same node.
A graph is connected if there is a path between every pair of nodes.
A tree is a connected acyclic graph.
If there are n nodes, what is the maximum number of links
– in a directed graph?
– in an undirected graph?
– in a directed tree?
– in an undirected tree?
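The tree definition (connected and acyclic) can be checked mechanically. This is a minimal Python sketch, assuming an undirected graph whose nodes are numbered 0 to n-1 and whose links are (u, v) pairs; `is_tree` is a hypothetical helper, not a library function. It uses a union-find structure: a link whose endpoints are already connected closes a cycle, and the graph is connected when every node ends up in a single set.

```python
def is_tree(n, edges):
    """Return True if the undirected graph on nodes 0..n-1 is a tree."""
    parent = list(range(n))  # union-find: each node starts in its own set

    def find(x):
        # Follow parent pointers to the set representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # u and v were already connected: this link closes a cycle
        parent[ru] = rv  # merge the two sets
        components -= 1
    return components == 1  # connected iff all nodes ended up in one set


print(is_tree(4, [(0, 1), (1, 2), (1, 3)]))  # True
print(is_tree(3, [(0, 1), (1, 2), (2, 0)]))  # False (cycle)
print(is_tree(4, [(0, 1), (2, 3)]))          # False (not connected)
```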
Graph Analytics (slide courtesy of John Feo, PNNL)
Scientific Grids vs. Data Informatics Graphs (slide courtesy of John Feo, PNNL)
Slide courtesy of Mathieu Bastian
National Security (slide courtesy of Mathieu Bastian)
Public Health
Small Graphs
Medium Graphs
Large Graphs
Implicit vs. Explicit
Graph Analytics
Idiom Choices
Triangular-vertical node-link layout
What: Tree dataset
Why: Hierarchical relationships, topology analysis tasks
How: Vertical spatial position shows depth in the tree; horizontal spatial position is an artifact of the layout algorithm
Scale: A few dozen nodes
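One simple way to produce a triangular-vertical layout: depth sets the vertical position, leaves are placed left to right in traversal order, and each parent is centered over its children. This is a minimal Python sketch, not the Reingold-Tilford algorithm; `layered_tree_layout` and the dict-of-children input format are assumptions for illustration. The x values carry no data meaning, which is the sense in which horizontal position is an artifact of the algorithm.

```python
def layered_tree_layout(tree, root):
    """Return {node: (x, y)} with y = depth and x = leaf order.

    `tree` maps each internal node to the list of its children (hypothetical format).
    Leaves get consecutive integer x positions; each parent is centered
    between its first and last child.
    """
    pos = {}
    next_leaf_x = 0

    def place(node, depth):
        nonlocal next_leaf_x
        children = tree.get(node, [])
        if not children:
            x = next_leaf_x  # leaf: take the next free horizontal slot
            next_leaf_x += 1
        else:
            child_xs = [place(child, depth + 1) for child in children]
            x = (child_xs[0] + child_xs[-1]) / 2  # center over the children
        pos[node] = (x, depth)
        return x

    place(root, 0)
    return pos


print(layered_tree_layout({"A": ["B", "C"], "B": ["D", "E"]}, "A"))
# {'D': (0, 2), 'E': (1, 2), 'B': (0.5, 1), 'C': (2, 1), 'A': (1.25, 0)}
```

Because leaves only ever move right, subtrees never overlap, but the layout can be wider than necessary; tidier algorithms such as Reingold-Tilford pack subtrees more tightly.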
Spline-radial Layout
What: Tree dataset
How:
– Depth encoded as distance from center of circle
– Links drawn as smoothly curving splines
– Reingold-Tilford layout algorithm
Scale: A few hundred nodes
Example written in D3:
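The radial encoding can be obtained from any layered tree layout by reinterpreting its coordinates: depth becomes distance from the center, and horizontal position becomes an angle. A minimal Python sketch, assuming layered positions as a `{node: (x, depth)}` dict and a hypothetical `width` parameter (for example, the number of leaves) that is mapped onto one full turn. It only computes node positions; drawing the links as curved splines is left to the rendering code.

```python
import math


def to_radial(layered, width):
    """Convert layered (x, depth) positions to Cartesian points on concentric circles.

    Depth becomes the radius; x becomes an angle, with `width`
    (hypothetical parameter) corresponding to 360 degrees.
    """
    radial = {}
    for node, (x, depth) in layered.items():
        angle = 2 * math.pi * x / width
        radial[node] = (depth * math.cos(angle), depth * math.sin(angle))
    return radial


# Layered positions for a small tree: root A, children B and C, grandchildren D and E.
layered = {"A": (1.25, 0), "B": (0.5, 1), "C": (2, 1), "D": (0, 2), "E": (1, 2)}
radial = to_radial(layered, width=3)
for node, (px, py) in radial.items():
    print(node, "radius", round(math.hypot(px, py), 2))
```

The root sits at the center, and each deeper level lands on a larger circle, so more of the available area goes to the levels that usually have the most nodes.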
D3 Tree Layout
Example: d3js_11.html
Tree layout: representative of the D3 hierarchy layout
– Produces node-link diagrams of trees using the Reingold-Tilford “tidy” algorithm
– Can take input data in JSON (JavaScript Object Notation) format
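Tree data for D3's hierarchy layouts is commonly written as nested JSON objects in which each node lists its children under a `children` key, the key `d3.hierarchy` reads by default. The Python sketch below only illustrates the shape of such data and a depth-first walk over it; the `name` field and the sample tree are made up for illustration and are not taken from d3js_11.html.

```python
import json

# Hypothetical example data: each node has a "name" and an optional "children" list.
TREE_JSON = """
{"name": "root",
 "children": [
   {"name": "a", "children": [{"name": "a1"}, {"name": "a2"}]},
   {"name": "b"}
 ]}
"""


def walk(node, depth=0):
    """Yield (name, depth) for every node, visiting parents before their children."""
    yield node["name"], depth
    for child in node.get("children", []):  # leaves simply have no "children" key
        yield from walk(child, depth + 1)


root = json.loads(TREE_JSON)
for name, depth in walk(root):
    print("  " * depth + name)  # indented outline of the hierarchy
```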
Brainstorming Exercise 1
How could we scale tree layouts to more than a few hundred nodes?
– Possible strategy: use 3D. Why or why not?
Collapsible Tree Layout
Example in D3:
Treemap
Examples:
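A treemap shows a tree by nesting rectangles, with each node's area proportional to its size. The simplest tiling, slice-and-dice, splits each rectangle among the children and alternates the slicing direction at each level. The Python sketch below assumes a nested-dict tree with a `name` on every node and a positive `value` on every leaf (hypothetical field names); libraries often default to other tilings, such as squarified treemaps, that keep rectangles closer to square.

```python
def size(node):
    """A leaf's size is its "value"; an internal node's size is the sum over its children."""
    children = node.get("children", [])
    if not children:
        return node["value"]
    return sum(size(child) for child in children)


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Return (name, x, y, w, h) rectangles for every node of the tree.

    Each rectangle is split among the node's children in proportion to their
    sizes (assumed positive), left-to-right at even depths, top-to-bottom at odd depths.
    """
    if out is None:
        out = []
    out.append((node["name"], x, y, w, h))
    children = node.get("children", [])
    total = sum(size(child) for child in children)
    offset = 0.0  # fraction of the parent rectangle already used
    for child in children:
        frac = size(child) / total
        if depth % 2 == 0:
            slice_and_dice(child, x + offset * w, y, frac * w, h, depth + 1, out)
        else:
            slice_and_dice(child, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out


tree = {"name": "root", "children": [
    {"name": "a", "value": 2},
    {"name": "b", "children": [{"name": "b1", "value": 1}, {"name": "b2", "value": 1}]},
]}
for rect in slice_and_dice(tree, 0, 0, 1, 1):
    print(rect)
```

Here `a` takes the left half of the unit square, and `b` takes the right half, which is then cut horizontally between `b1` and `b2`.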
General Graph Layouts
Also called network layouts
Do not directly use spatial position to encode attribute values
Layout algorithms try to minimize the number of edge crossings and node overlaps
May use size and color encodings for node and link attributes
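Edge crossings, one of the quantities these layout algorithms try to minimize, can be counted directly from a straight-line drawing. A minimal Python sketch, assuming node positions in a dict and links as (u, v) pairs; only proper crossings count, so links that share an endpoint or merely touch are ignored. Both function names are hypothetical.

```python
def segments_cross(p1, p2, p3, p4):
    """True if segment p1-p2 properly crosses segment p3-p4."""
    def orient(a, b, c):
        # Sign of the cross product: which side of line a-b point c lies on.
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

    # Each segment's endpoints must lie strictly on opposite sides of the other.
    return (orient(p3, p4, p1) * orient(p3, p4, p2) < 0
            and orient(p1, p2, p3) * orient(p1, p2, p4) < 0)


def count_crossings(pos, edges):
    """Count pairs of links that cross in a straight-line drawing."""
    count = 0
    for i in range(len(edges)):
        for j in range(i + 1, len(edges)):
            a, b = edges[i]
            c, d = edges[j]
            if len({a, b, c, d}) < 4:
                continue  # links sharing a node meet there; not a crossing
            if segments_cross(pos[a], pos[b], pos[c], pos[d]):
                count += 1
    return count


# The same 4-cycle drawn two ways: as a square, and with two nodes swapped.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
square = {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (0, 1)}
twisted = {0: (0, 0), 1: (1, 1), 2: (1, 0), 3: (0, 1)}
print(count_crossings(square, cycle), count_crossings(twisted, cycle))  # 0 1
```

The graph is identical in both drawings; only the node positions differ, which is exactly what a layout algorithm controls.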
Force-Directed Placement
Widely used for node-link network layout
Positions network elements according to a simulation of physical forces, e.g.:
– Nodes push away from each other
– Links act like springs that draw their endpoints closer
Can start by placing nodes randomly and iterating to gradually improve the layout
Disadvantages
– Clusters may be artifacts of the algorithm
– Layout may be nondeterministic
– May get stuck in a local minimum energy configuration
– Doesn't scale past a few hundred nodes
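The force model above can be sketched with the Fruchterman-Reingold formulation: every pair of nodes repels with force k^2/d, every link attracts its endpoints with force d^2/k, and each node moves along its net force by at most a "temperature" that shrinks over the iterations. This is a minimal Python sketch with hypothetical parameter values (`k`, the starting temperature, the iteration count). The repulsion loop visits every pair of nodes, so each iteration costs O(n^2), one reason plain force-directed placement does not scale.

```python
import math
import random


def force_directed_layout(n, edges, iterations=200, k=0.3, seed=0):
    """Fruchterman-Reingold-style layout sketch; returns a list of (x, y) per node."""
    rng = random.Random(seed)  # seeded so the random start is repeatable
    pos = [[rng.random(), rng.random()] for _ in range(n)]
    t0 = 0.1  # starting temperature: maximum move per iteration
    t = t0
    for it in range(iterations):
        disp = [[0.0, 0.0] for _ in range(n)]
        # Repulsion: every pair of nodes pushes apart with force k^2 / d.
        for i in range(n):
            for j in range(i + 1, n):
                dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[i][0] += dx / d * f
                disp[i][1] += dy / d * f
                disp[j][0] -= dx / d * f
                disp[j][1] -= dy / d * f
        # Attraction: each link pulls its endpoints together with force d^2 / k.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node along its net force, capped by the current temperature.
        for i in range(n):
            length = max(math.hypot(disp[i][0], disp[i][1]), 1e-9)
            step = min(length, t)
            pos[i][0] += disp[i][0] / length * step
            pos[i][1] += disp[i][1] / length * step
        t = t0 * (1 - (it + 1) / iterations)  # linear cooling
    return [tuple(p) for p in pos]


# A 3-node path 0-1-2 tends to straighten out, with node 1 in the middle.
print(force_directed_layout(3, [(0, 1), (1, 2)]))
```

With a fixed `seed` the result is repeatable; changing the seed changes the random start and can give a visibly different final layout, which is the nondeterminism noted above.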
What-Why-How for Force-Directed Placement
Scalable Force-Directed Placement (sfdp)
Multilevel approach that transforms the network into a hierarchy of successively simpler networks
Algorithm: lay out the coarsest network first, then improve the layout using successively more complex versions
Examples: Graphviz software:
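The coarsening half of a multilevel approach can be sketched as follows. This Python version merges pairs of neighboring nodes found by a greedy matching and repeats until the graph is small; it is a stand-in chosen for simplicity, not the coarsening scheme Graphviz's sfdp actually uses, and `coarsen` and `build_hierarchy` are hypothetical names. The layout half (lay out the smallest graph, copy each coarse node's position to the nodes merged into it via `parent`, then refine with force-directed steps) is omitted.

```python
def coarsen(n, edges):
    """One coarsening level: merge matched pairs of neighbors into single nodes.

    Returns (coarse_n, coarse_edges, parent) where parent[v] is the coarse
    node that fine node v was merged into.
    """
    neighbors = [set() for _ in range(n)]
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    parent = [-1] * n
    coarse_n = 0
    for u in range(n):
        if parent[u] != -1:
            continue  # already merged into an earlier coarse node
        parent[u] = coarse_n
        # Greedily pair u with its first still-unmatched neighbor, if any.
        for v in sorted(neighbors[u]):
            if parent[v] == -1:
                parent[v] = coarse_n
                break
        coarse_n += 1

    # Keep links between different coarse nodes; drop duplicates and collapsed links.
    coarse_edges = sorted({tuple(sorted((parent[u], parent[v])))
                           for u, v in edges if parent[u] != parent[v]})
    return coarse_n, coarse_edges, parent


def build_hierarchy(n, edges, min_nodes=2):
    """Coarsen repeatedly; return the list of (n, edges) levels, finest first."""
    levels = [(n, list(edges))]
    while n > min_nodes:
        coarse_n, coarse_edges, _ = coarsen(n, edges)
        if coarse_n == n:
            break  # nothing could be merged (e.g. no links left)
        n, edges = coarse_n, coarse_edges
        levels.append((n, edges))
    return levels


# An 8-node path 0-1-2-...-7 coarsens to 4 nodes, then 2.
path = [(i, i + 1) for i in range(7)]
print([size for size, _ in build_hierarchy(8, path)])  # [8, 4, 2]
```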
Adjacency Matrix View
Example:
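In a matrix view, row i and column j are filled when nodes i and j are linked, and the order of rows and columns is a free choice that strongly affects what patterns are visible. A minimal Python sketch, assuming nodes numbered 0 to n-1 and links as (u, v) pairs; `adjacency_matrix`, `render`, and the text rendering with `#` and `.` are hypothetical. Two triangles joined by one link appear as two filled blocks along the diagonal under the natural ordering, and the blocks disappear when the same nodes are interleaved.

```python
def adjacency_matrix(n, edges, order=None, directed=False):
    """Build an n x n 0/1 matrix; row and column k stand for node order[k].

    Reordering rows and columns does not change the graph, only the picture.
    """
    order = list(range(n)) if order is None else list(order)
    index = {node: k for k, node in enumerate(order)}
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        if not directed:
            matrix[index[v]][index[u]] = 1  # undirected links are symmetric
    return matrix


def render(matrix):
    """Text rendering: '#' for a link, '.' for no link."""
    return "\n".join("".join("#" if cell else "." for cell in row) for row in matrix)


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the link 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(render(adjacency_matrix(6, edges)))
print()
print(render(adjacency_matrix(6, edges, order=[0, 3, 1, 4, 2, 5])))
```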
Characteristic Patterns in Node-link and Matrix Views
Brainstorming Exercise 2
Which graph analysis tasks are better supported by the node-link view, and which are better supported by the matrix view?
How does the answer change as the size of the graph increases?
Graph Visualization Tools
Sigma.js: JavaScript library
Gephi: open source graph visualization platform
Many more!
Preparation for Next Class
Keep working with D3; use the tutorials on the D3.js wiki
Implement interaction in your parallel coordinates visualization for Lab 3
Decide which datasets to use for Lab 3
Grad students and extra credit for undergrads: research k-means clustering
Looking Ahead
Quest (Quiz/Test) on Wed, Oct. 15
Course exam on Wed, Nov