Graph Analytics Primer James Abello, MSCS Director, Computer Science Department, Busch Campus
What is a Combinatorial Graph or Network G = (V, E)? A collection of vertices V and “pairs” E of elements from V, called edges. Ex: V = {a, 1, 3, b, z}, E = {{a,1}, {b,3}, {z,1}, {a,b}, {a,z}, {b,z}}. Pictorially? Graphs can have “weights”, “labels”, or “time stamps” on the vertices and edges, or more complicated meta-information. Edges may have directions.
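A minimal sketch of the example graph in Python, assuming an adjacency-set representation. The names `V`, `E`, `adjacency`, and `edge_weight` are illustrative choices, and the weight value is made up.

```python
# Vertices and undirected edges from the example on this slide.
V = {"a", 1, 3, "b", "z"}
E = {
    frozenset({"a", 1}),
    frozenset({"b", 3}),
    frozenset({"z", 1}),
    frozenset({"a", "b"}),
    frozenset({"a", "z"}),
    frozenset({"b", "z"}),
}

# Adjacency sets: each vertex maps to the set of its neighbors.
adjacency = {v: set() for v in V}
for edge in E:
    u, w = tuple(edge)
    adjacency[u].add(w)
    adjacency[w].add(u)

# Optional meta-information (weights, labels, time stamps) can be kept in
# dictionaries keyed by edge or vertex. The value below is hypothetical.
edge_weight = {frozenset({"a", 1}): 2.5}
```

Using `frozenset` for edges makes {a,1} and {1,a} the same edge; a directed graph would use ordered tuples instead.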
Examples: The Web, The Internet, Phone Calls, Maps, Co-Occurrence, Paragraphs in Books, Family Trees, Authors and Papers, Airports and Flights, Clicks on Web Sites, Friendship Networks, Social Media, Biological Networks, Event Collections, Diseases – Symptoms – Treatments, Images (2D, 3D), …
Graph Sources: News, Scientific Publications, Astronomical Observations, Pictures, Social Events, Biology, Physics, Mathematics, Social Sciences, Health Care Data, Medicine, …
A helpful metaphor: how to go from point a to point b, effectively and efficiently.
Typical Tasks a. Create or define graphs of interest from your data sets. b. Data access and graph formation. c. Identify the questions you want to answer. d. Compute graph statistics: |V|, |E|, connectivity, average degree, degree distribution, density, longest shortest path (diameter), landmarks, most central vertices, ….
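Several of the statistics in step d can be computed directly from adjacency sets. The sketch below is a minimal illustration on a hypothetical four-vertex graph; the graph and all names are assumptions, and the all-pairs BFS used for the diameter only suits small graphs.

```python
from collections import Counter, deque

# Toy undirected graph as adjacency sets (a hypothetical example).
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

def bfs_distances(adj, source):
    """Hop distance from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

n = len(graph)                                        # |V|
m = sum(len(nbrs) for nbrs in graph.values()) // 2    # |E|
average_degree = 2 * m / n
density = 2 * m / (n * (n - 1))
degree_distribution = Counter(len(nbrs) for nbrs in graph.values())
# Connected if a BFS from any one vertex reaches all vertices.
is_connected = len(bfs_distances(graph, "a")) == n
# Diameter: the longest shortest path, via BFS from every vertex.
diameter = max(max(bfs_distances(graph, v).values()) for v in graph)
```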
Typical Tasks (cont.) e. Define a “similarity” between the vertices. f. Partition the vertices according to the similarity measure of interest (this has been called “Clustering”; the more fashionable name today is “Unsupervised Learning”). g. Interpret the clusters.
Typical Tasks (cont.) h. (Feedback loop) Incorporate this information back into your data and run modified algorithms of interest. i. Summarize the findings, and publish them or incorporate them into processes of interest.
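Steps e and f can be sketched as follows. This is one possible choice among many, not a method prescribed by the slides: the similarity is assumed to be the Jaccard index of closed neighborhoods, and the partition groups vertices connected by chains of sufficiently similar pairs. The toy graph, the threshold, and all names are hypothetical.

```python
from itertools import combinations

# Toy undirected graph (hypothetical): two triangles joined by the edge c-d.
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}

def jaccard(adj, u, v):
    """Jaccard similarity of closed neighborhoods (each vertex includes itself)."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return len(nu & nv) / len(nu | nv)

def cluster(adj, threshold):
    """Group vertices linked by chains of pairs with similarity >= threshold."""
    parent = {v: v for v in adj}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in combinations(adj, 2):
        if jaccard(adj, u, v) >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for v in adj:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())

clusters = cluster(graph, threshold=0.5)
```

On this toy graph the two triangles end up in separate clusters, because the only pair that spans them (c and d) shares too few neighbors to reach the threshold.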
Main Issues a. Define typical scenarios. b. How is the graph consumed by a user? Text? Visual interface? On demand? On a desktop? Special device? c. What are the interactivity requirements? d. How can we amplify a human user's understanding of the graph data? (Maps are good examples of success stories.) e. How do we access the “satellite” information associated with the graph data? f. Is the graph and its associated data public?
A system at work: an example of the current capabilities of existing graph manipulation systems. Go to the Atlas demo: https://fredhohman.com/papers/atlas
Atlas: Local Graph Exploration in a Global Context. IUI 2019. Fred Hohman (@fredhohman, Georgia Tech), James Abello (Rutgers), Varun Bezzam (Georgia Tech), Polo Chau (Georgia Tech)
Graph Sensemaking: Global View (free exploration) and Local View (targeted exploration)
Graph Sensemaking: Important Structure and Important Nodes
Data Mining vs. HCI (Human-Computer Interaction)
Data Mining: automatic; summarization, clustering, classification; millions of nodes.
HCI: user-driven, iterative; interactive, visualization; thousands of nodes.
The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing. Sahu, et al. VLDB, 2017.
Atlas (bit.ly/atlas-iui): interactive graph exploration via scalable edge decomposition. Separate graph into graph layers; reveal peculiar subgraph; visualize local + global structure.
[Figure: step-by-step peeling example. Panels labeled peel = 1, peel = 2, and peel = 3 show numeric labels being removed round by round; the final panels show the result split into graph layer 1, graph layer 2, and graph layer 3, with vertex clones marking vertices that appear in more than one layer.]
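The peeling idea illustrated above can be sketched in a few lines. This is an illustrative reading of edge decomposition into layers, not necessarily the Atlas implementation: it assumes the rule "repeatedly compute k-core numbers, give the edges of the highest core the peel value k, remove them, and repeat". All function names are hypothetical, and the core-number routine is a simple unoptimized version.

```python
from collections import defaultdict

def core_numbers(adj):
    """Core number of every vertex, by repeatedly removing a min-degree vertex."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core

def edge_layers(edges):
    """Assign each edge a peel value; edges sharing a value form one layer."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    peel = {}
    while any(adj.values()):              # loop while some edge remains
        core = core_numbers(adj)
        k = max(core.values())
        top = {v for v, c in core.items() if c == k}
        for u in top:                     # remove every edge inside the top core
            for v in list(adj[u]):
                if v in top:
                    peel[frozenset((u, v))] = k
                    adj[u].discard(v)
                    adj[v].discard(u)
    return peel

def vertex_clones(peel):
    """Vertices whose edges fall into more than one layer."""
    layers_of = defaultdict(set)
    for edge, k in peel.items():
        for v in edge:
            layers_of[v].add(k)
    return {v for v, ks in layers_of.items() if len(ks) > 1}

# Hypothetical input: a triangle (1, 2, 3) with a pendant edge 3-4.
example_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
peel_values = edge_layers(example_edges)
clones = vertex_clones(peel_values)
```

In this toy input the triangle's edges land in layer 2 and the pendant edge in layer 1, so vertex 3 appears in both layers and is the only vertex that would need a clone.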
Graph                Vertices  Edges  Time (s)  Layers
Google+              24K       39K    ~0        10
arXiv astro-ph       19K       198K   -         47
Amazon               335K      925K   -         6
US Patents           3.8M      17M    11        41
Wikipedia (German)   3.2M      82M    225       320
Orkut                3.1M      117M   92        91
Time complexity: O(#edges x #layers), with #layers << #edges
GPU + dynamic graph data structure -> 4x - 8x speedup over ParK. Scalable K-Core Decomposition for Static Graphs Using a Dynamic Graph Data Structure. Alok Tripathy, Fred Hohman, Duen Horng (Polo) Chau, Oded Green. IEEE International Conference on Big Data. Seattle, WA, USA, 2018.
Demo: Understanding Word Embedding Graph. Nodes: 66K words from Wikipedia. Edges: 214K (connect words with small distance). [Figure labels: families of birds; caeciliidae (worm-like amphibians); families of sea snails]
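A minimal sketch of how such a graph could be formed from embeddings: connect two words when their vectors are close. The slide does not say which distance or cutoff was used, so cosine distance, the `max_distance` value, and the three toy vectors below are all assumptions for illustration.

```python
import math
from itertools import combinations

def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length, nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def embedding_graph(vectors, max_distance):
    """Undirected edges between all word pairs at distance <= max_distance."""
    return {
        frozenset((w1, w2))
        for w1, w2 in combinations(vectors, 2)
        if cosine_distance(vectors[w1], vectors[w2]) <= max_distance
    }

# Made-up 3-dimensional vectors; real embeddings have far more dimensions.
toy_vectors = {
    "sparrow": (0.9, 0.1, 0.0),
    "finch": (0.8, 0.2, 0.0),
    "snail": (0.0, 0.1, 0.9),
}
word_edges = embedding_graph(toy_vectors, max_distance=0.1)
```

The all-pairs loop is quadratic in the number of words; at the scale of 66K words a nearest-neighbor index would be the more practical choice.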
User Study. Goal: use Atlas to spot interesting patterns, mimicking their own work. Graph analysts: Researcher, Symantec; Researcher, NASA; Systems engineer, NASA (all PhDs + use graphs daily or weekly). Graphs: Yelp Reviews Network; SEC Insider Trading Graph; GloVe Word Embed. Graph. Procedure: Intro questionnaire → Atlas tutorial → Study → Exit questionnaire
User Study Findings
• 3D for overview, 2D for details: 3D useful for intro to new data → get a “feel” for the graph; Graph Ribbon + Layers view used more precisely; Show nearest neighbors used frequently
• Identifying and linking meaningful graph substructures: vertex clones as traversal mechanism between layers
• Application to anomaly detection: “…analysis (using [both] vertex clones and layers) naturally reveals potentially anomalous substructures and vertices. This is highly useful from a cybersecurity perspective.”
Future Work
• Automatically suggest interesting layers
• Dynamic graph decomposition visualization
• Visual scalability (e.g., super-noding, edge bundling, graph motif)
Atlas: Local Graph Exploration in a Global Context. bit.ly/atlas-iui Thanks! Fred Hohman (@fredhohman, fredhohman@gatech.edu), James Abello (abelloj@cs.rutgers.edu), Varun Bezzam (varun.bezzam@gatech.edu), Polo Chau (polo@gatech.edu). [Figure labels: families of birds; caeciliidae (worm-like amphibians); families of sea snails; families of land creatures] We thank the anonymous reviewers for their constructive feedback.