James Abello, MSCS Director Computer Science Department Busch Campus


Graph Analytics Primer. James Abello, MSCS Director, Computer Science Department, Busch Campus

What is a Combinatorial Graph or Network G = (V, E)? A collection of vertices V and a collection E of "pairs" of elements from V, called edges. Ex: V = {a, 1, 3, b, z}, E = {{a,1}, {b,3}, {z,1}, {a,b}, {a,z}, {b,z}}. Pictorially? Graphs can have "weights", "labels", or "time stamps" on the vertices and edges, or more complicated meta-information. Edges may have directions.
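The example on this slide can be written down directly. The following is a minimal sketch, assuming the vertex names are stored as strings and the graph is undirected and unweighted; the helper name `build_adjacency` is hypothetical.

```python
# Vertex and edge sets copied from the slide's example (names stored as strings).
V = {"a", "1", "3", "b", "z"}
E = [{"a", "1"}, {"b", "3"}, {"z", "1"}, {"a", "b"}, {"a", "z"}, {"b", "z"}]


def build_adjacency(vertices, edges):
    """Return a dict mapping each vertex to the set of its neighbors."""
    adj = {v: set() for v in vertices}
    for edge in edges:
        u, w = tuple(edge)  # each edge is an unordered pair of distinct vertices
        adj[u].add(w)
        adj[w].add(u)
    return adj


adj = build_adjacency(V, E)
for v in sorted(adj):
    print(v, "->", sorted(adj[v]))
```

Weights, labels, or time stamps could be attached by mapping each edge to a record instead of storing bare pairs.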

Examples: The Web, The Internet, Phone Calls, Maps, Co-Occurrence, Paragraphs in Books, Family Trees, Authors and Papers, Airports and Flights, Clicks on Web Sites, Friendship Networks, Social Media, Biological Networks, Event Collections, Diseases – Symptoms – Treatments, Images (2D, 3D), …

Graph Sources: News, Scientific Publications, Astronomical Observations, Pictures, Social Events, Biology, Physics, Mathematics, Social Sciences, Health Care Data, Medicine, …

A helpful metaphor: how to go from point a to point b effectively and efficiently.

Typical Tasks a. Create or define graphs of interest from your data sets. b. Data access and graph formation. c. Identify the questions you want to answer. d. Compute graph statistics: |V|, |E|, connectivity, average degree, degree distribution, density, diameter (the longest shortest path), landmarks, most central vertices, …
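Several of the statistics in task d can be computed with breadth-first search alone. The following is a minimal sketch, assuming an undirected, unweighted graph stored as a dict of neighbor sets; the function name `graph_statistics` is hypothetical, "most central" is approximated by highest degree, and the all-pairs BFS is only practical for small graphs.

```python
from collections import Counter, deque


def graph_statistics(adj):
    """Basic statistics for an undirected graph given as {vertex: set_of_neighbors}."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge is counted twice
    degrees = {v: len(nbrs) for v, nbrs in adj.items()}

    def bfs_distances(source):
        """Hop distances from source to every vertex reachable from it."""
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        return dist

    seen, components, diameter = set(), 0, 0
    for v in adj:
        dist = bfs_distances(v)
        # Largest shortest-path distance seen so far (within components).
        diameter = max(diameter, max(dist.values()))
        if v not in seen:
            components += 1
            seen.update(dist)

    max_degree = max(degrees.values(), default=0)
    return {
        "num_vertices": n,
        "num_edges": m,
        "average_degree": 2 * m / n if n else 0.0,
        "density": 2 * m / (n * (n - 1)) if n > 1 else 0.0,
        "degree_distribution": dict(Counter(degrees.values())),
        "connected_components": components,
        "diameter": diameter,
        "highest_degree_vertices": sorted(v for v, d in degrees.items() if d == max_degree),
    }


# The example graph from the definition slide.
adj = {
    "a": {"1", "b", "z"},
    "1": {"a", "z"},
    "3": {"b"},
    "b": {"3", "a", "z"},
    "z": {"1", "a", "b"},
}
stats = graph_statistics(adj)
print(stats)
```

For a disconnected graph this sketch reports the largest diameter among the connected components.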

Typical Tasks (cont.) e. Define a "similarity" between the vertices. f. Partition the vertices according to the similarity measure of interest (this has been called "Clustering"; the more fashionable name today is "Unsupervised Learning"). g. Interpret the clusters.

Typical Tasks (cont.) h. (Feedback loop) Incorporate this information back into your data and run modified algorithms of interest. i. Summarize the findings, publish them, or incorporate them into processes of interest.
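Tasks e and f (define a similarity, then partition the vertices by it) can be illustrated with one simple choice of similarity. The following is a minimal sketch, assuming Jaccard similarity of closed neighborhoods and a single-linkage style grouping in which any two vertices at or above a threshold end up in the same cluster. The slides do not prescribe a measure; the function names and the threshold value are hypothetical.

```python
from itertools import combinations


def jaccard(adj, u, v):
    """Jaccard similarity of the closed neighborhoods N[u] and N[v]."""
    a = adj[u] | {u}
    b = adj[v] | {v}
    return len(a & b) / len(a | b)


def cluster_by_similarity(adj, threshold):
    """Group vertices: any pair with similarity >= threshold shares a cluster."""
    parent = {v: v for v in adj}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in combinations(adj, 2):
        if jaccard(adj, u, v) >= threshold:
            parent[find(u)] = find(v)

    clusters = {}
    for v in adj:
        clusters.setdefault(find(v), set()).add(v)
    return sorted(sorted(c) for c in clusters.values())


# Two triangles joined by the bridge edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
clusters = cluster_by_similarity(adj, threshold=0.5)
print(clusters)
```

On this toy graph the bridge endpoints 2 and 3 have similarity 1/3, below the threshold, so the two triangles come out as separate clusters. Interpreting the clusters and feeding them back into the data (tasks g and h) remain manual steps.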

Main Issues a. Define typical scenarios. b. How is the graph consumed by a user? Text? Visual interface? On demand? On a desktop? On a special device? c. What are the interactivity requirements? d. How can we amplify a human user's understanding of the graph data? (Maps are good examples of success stories.) e. How do we access the "satellite" information associated with the graph data? f. Are the graph and its associated data public?

A system at work: an example of the current capabilities of graph manipulation systems in existence today. Go to the Atlas demo: https://fredhohman.com/papers/atlas

Atlas: Local Graph Exploration in a Global Context. IUI 2019. Fred Hohman (@fredhohman, Georgia Tech), James Abello (Rutgers), Varun Bezzam (Georgia Tech), Polo Chau (Georgia Tech)

Graph Sensemaking: Global View and Local View; Free Exploration and Targeted Exploration

Graph Sensemaking: Important Structure and Important Nodes

Data Mining: automatic; summarization, clustering, classification; millions of nodes.
HCI (Human-Computer Interaction): user-driven, iterative; interactive, visualization; thousands of nodes.
The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing. Sahu, et al. VLDB, 2017.


Atlas: interactive graph exploration via scalable edge decomposition (bit.ly/atlas-iui). Separate the graph into graph layers; reveal peculiar subgraphs; visualize local + global structure.

[Figure sequence: an example graph with edges labeled by peel value, peeled in rounds (peel = 1, 2, 3) and separated into graph layer 1, graph layer 2, and graph layer 3, with vertex clones marking vertices that appear in more than one layer.]
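One way to separate a graph into layers by peeling can be sketched as follows. This is a simplified illustration under stated assumptions, not the Atlas implementation: each round computes core numbers, assigns the edges among the vertices of maximum core number k to layer k, removes them, and repeats on what is left. The function names are hypothetical, the graph is assumed simple (no self-loops or parallel edges), and the quadratic core-number routine is chosen for brevity rather than speed.

```python
def core_numbers(adj):
    """Return {vertex: core number} by repeatedly removing a minimum-degree vertex."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])  # core number never decreases along the removal order
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


def edge_layers(edges):
    """Assign every edge to a layer (peel value); returns {layer: set_of_edges}."""
    remaining = {frozenset(e) for e in edges}
    layers = {}
    while remaining:
        adj = {}
        for e in remaining:
            u, v = tuple(e)
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        core = core_numbers(adj)
        k = max(core.values())
        top = {v for v, c in core.items() if c == k}
        peeled = {e for e in remaining if e <= top}  # edges inside the max core
        layers[k] = peeled
        remaining -= peeled
    return layers


def vertex_clones(layers):
    """Vertices that appear in more than one layer."""
    count = {}
    for layer_edges in layers.values():
        for v in set().union(*layer_edges):
            count[v] = count.get(v, 0) + 1
    return {v for v, c in count.items() if c > 1}


# A 4-clique on a, b, c, d; a triangle d, e, f sharing d; a pendant edge a-g.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("f", "d"), ("a", "g")]
layers = edge_layers(edges)
clones = vertex_clones(layers)
print({k: len(es) for k, es in layers.items()}, sorted(clones))
```

In this toy graph the clique forms layer 3, the triangle layer 2, and the pendant edge layer 1; vertices a and d each appear in two layers, which is what makes them clones.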

| Graph | Vertices | Edges | Time (s) | Layers |
|---|---|---|---|---|
| Google+ | 24K | 39K | ~0 | 10 |
| arXiv astro-ph | 19K | 198K | | 47 |
| Amazon | 335K | 925K | | 6 |
| US Patents | 3.8M | 17M | 11 | 41 |
| Wikipedia (German) | 3.2M | 82M | 225 | 320 |
| Orkut | 3.1M | 117M | 92 | 91 |

Time complexity: O(#edges × #layers), where #layers << #edges.

Scalable K-Core Decomposition for Static Graphs Using a Dynamic Graph Data Structure. Alok Tripathy, Fred Hohman, Duen Horng (Polo) Chau, Oded Green. IEEE International Conference on Big Data. Seattle, WA, USA, 2018. GPU + dynamic graph data structure -> 4x - 8x speedup over ParK.

Demo: Understanding a Word Embedding Graph. Nodes: 66K words from Wikipedia. Edges: 214K (connect words with small distance). Annotated clusters: families of birds; families of sea snails; caeciliidae (worm-like amphibians).
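The edge rule on this slide ("connect words with small distance") can be sketched on toy data. The slide does not say which distance or cutoff was used, so cosine distance, the `max_distance` value, the function names, and the three-dimensional vectors below are all assumptions made for illustration.

```python
import math
from itertools import combinations


def cosine_distance(x, y):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norm


def embedding_graph(vectors, max_distance):
    """Connect every pair of words whose embedding distance is at most max_distance."""
    edges = set()
    for u, v in combinations(sorted(vectors), 2):
        if cosine_distance(vectors[u], vectors[v]) <= max_distance:
            edges.add((u, v))
    return edges


# Made-up 3-dimensional vectors standing in for real word embeddings.
toy_vectors = {
    "sparrow": (0.9, 0.1, 0.0),
    "finch": (0.8, 0.2, 0.0),
    "whelk": (0.0, 0.1, 0.9),
    "limpet": (0.1, 0.0, 0.9),
}
edges = embedding_graph(toy_vectors, max_distance=0.1)
print(sorted(edges))
```

The all-pairs loop is quadratic in the number of words, so a graph on the scale of 66K words would call for an approximate nearest-neighbor index instead.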

User Study. Goal: use Atlas to spot interesting patterns, mimicking their own work. Graph analysts: Researcher, Symantec; Researcher, NASA; Systems engineer, NASA (all PhDs + use graphs daily or weekly). Graphs: Yelp Reviews Network; SEC Insider Trading Graph; GloVe Word Embedding Graph. Procedure: Intro questionnaire → Atlas tutorial → Study → Exit questionnaire.

User Study Findings
• 3D for overview, 2D for details. 3D useful for intro to new data → get a "feel" for the graph. Graph Ribbon + Layers view used more precisely. Show nearest neighbors used frequently.
• Identifying and linking meaningful graph substructures. Vertex clones as traversal mechanism between layers.
• Application to anomaly detection. "…analysis (using [both] vertex clones and layers) naturally reveals potentially anomalous substructures and vertices. This is highly useful from a cybersecurity perspective."

Future Work
• Automatically suggest interesting layers
• Dynamic graph decomposition visualization
• Visual scalability (e.g., super-noding, edge bundling, graph motifs)

Atlas: Local Graph Exploration in a Global Context (bit.ly/atlas-iui). Thanks! Fred Hohman (@fredhohman, fredhohman@gatech.edu), James Abello (abelloj@cs.rutgers.edu), Varun Bezzam (varun.bezzam@gatech.edu), Polo Chau (polo@gatech.edu). We thank the anonymous reviewers for their constructive feedback.
[Figure labels: families of birds; families of sea snails; families of land creatures; caeciliidae (worm-like amphibians).]