Mining Networks through Visual Analytics

Presentation transcript:

Mining Networks through Visual Analytics: Incremental Hypothesis Building and Validation Thanks. I thought the workshop was a good opportunity to present our work on graph visualization. We are not mining persons, although we do contribute to graph mining as part of the visualization process. David Auber Romain Bourqui Guy Melançon CNRS LaBRI UMR 5800 & INRIA Futurs – GRAVITÉ Bordeaux, France

peacokmaps.com

“A picture is worth a thousand words” Chinese proverb (?) InfoVis CyberInfraStructure – Pajek

There would also be the most naive drawing, which places each leaf at a different x-coordinate. That is the saving that Wetherell/Reingold and Tilford/Walker make. Tulip – BubbleTree

“It’s all visual” R. Feynman (Nobel prize in Physics) Tulip Graph Viz Framework

Internet traffic

“The purpose of computing is insight, not numbers” R. Hamming (1973) Voronoï Treemaps

Cushion Treemaps

“Visualization uses computer graphics to help provide insight on complicated problems, models or systems” “Scientific visualization is exploring data and information graphically, gaining understanding and insights into the data” R.A. Earnshaw (a pioneer in computer graphics, 1973) Munzner’s Hyperbolic Browser

Tulip – Sugiyama Layout

Visualize? Inselberg – « creator » of parallel coordinates « Insight through images » « Goal: Visual Model to Help our Intuition » « Involves: Geometry, Cognition, Art ? »

Visualize?

Visual graph mining related to security issues “Recognize” structural properties Identify key actors Identify their neighborhood Community structure Connectivity between communities … “Chess players recognize patterns” I guess the issues I will discuss relate to security just as Tishby’s.

Example from NCTC data Extracted about 8000 incidents from WITS Identified terrorists groups when possible (directly or through AFP) Identified countries where incidents took place Added territorial information (continents, world regions) to help organize the overall map

Example from NCTC data About 8000 incidents 9419 nodes 18486 edges Layout is time consuming Does not provide clue about structure Filter out incidents with no identified group

Example from NCTC data Interactivity « Play » with network Apply various metrics Attribute-based node filtering Tulip Graph Viz Framework Open source Plug-in architecture www.tulip-software.org

Massive data Information big bang - Project « How much information », Berkeley University In 2001, about 1 exabyte (1 million terabytes) of data is generated annually worldwide, including 99.997% available only in digital form In 2003: each individual produces about 800 megabytes per year We want to go for larger, more complex datasets. I certainly don’t have to bring in arguments to convince you of those needs. The figures reported by the Berkeley project or by Keim speak for themselves.

Massive data 100 million FedEx transactions / day 150 million VISA transactions / day 300 million long-distance calls / day over AT&T’s network 35 billion e-mails / day worldwide 600 billion IP packets / day over the DE-CIX backbone Keim, VIEW Workshop 2006

Visualization and Moore’s law Daniel Keim - Keynote Address, VIEW 2006

Visualization and Moore’s law Issues that won’t be solved by hardware only Design interaction together with visualization Understand how and why visualization pays Collaborate with other fields Integrate visualization together with other technology NIH-NSF Visualization Research Challenges Report, 2006

Added value of visual and interactive mining KDD Panel «  The Perfect Data Mining Tool » [Ankerst 2002] The human eye is an excellent tool for spotting natural patterns Getting rid of the human in the loop? Wrong decision! Increase human participation through visualization in the data exploration and knowledge discovery processes

« Sense making loop » J. Thomas – Visual Analytics Initiative

« Visualization mantras » Visual Information Seeking Mantra Overview, Zoom-in / Filter, and Details on Demand (Shneiderman, 1996) Visual Analytics Mantra Analyse first, Show the Important, Zoom, filter and analyse, Details on demand (Keim 2006)

Visualization “pipeline” A designer’s view on the visualization process

Visualize? Protein interaction network (yeast); Barabási 2000

Organize data prior to visualization Layer or hierarchize data based on: node/edge metrics (eigenvalues, centralities, …) topological feature detection Use relevant drawing methods Combine with interaction

Case study: ITA 2000 passenger air traffic Cities connect through direct flights Edge weights: number of passengers Questions: Read motivations of carriers through organization of the network? Territorial logic? Political? Economical?

TopoLayout – (Topological) Feature-based Hierarchization Search the graph for components of growing complexity Subtrees Biconnected components (« blocks ») Grid-like « Clusters »

TopoLayout – (Topological) Feature-based Hierarchization Search the graph for components of growing complexity Subtrees Biconnected components Grid-like « Clusters » Need to identify articulation points (“pivots”) The graph builds into a “tree of biconnected components”
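
Finding the articulation points ("pivots") mentioned above can be sketched with a standard depth-first search that tracks discovery times and low-links. This is a minimal illustration, not the TopoLayout implementation: the function name `articulation_points` and the adjacency-dictionary input are assumptions made for the example, and the graph is assumed to be simple and undirected.

```python
def articulation_points(adj):
    """Return the set of articulation points of a simple undirected graph.

    adj maps each node to an iterable of its neighbours (hypothetical input
    format chosen for this sketch). Uses a recursive depth-first search, so
    it is only suitable for graphs of moderate depth.
    """
    disc, low, points = {}, {}, set()
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        children = 0
        for v in adj[u]:
            if v == parent:
                continue
            if v in disc:
                # Back edge: u can reach an earlier node without its parent.
                low[u] = min(low[u], disc[v])
            else:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # The subtree under v cannot climb above u: u separates it.
                if parent is not None and low[v] >= disc[u]:
                    points.add(u)
        # A DFS root is a pivot only if it has several DFS children.
        if parent is None and children > 1:
            points.add(u)

    for node in adj:
        if node not in disc:
            dfs(node, None)
    return points


# Two triangles sharing node "c": removing "c" disconnects the graph.
example = {
    "a": ["b", "c"], "b": ["a", "c"],
    "c": ["a", "b", "d", "e"],
    "d": ["c", "e"], "e": ["c", "d"],
}
pivots = articulation_points(example)
```

Removing the pivots splits the graph into its blocks, which is what lets the blocks be arranged as a tree of biconnected components.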

TopoLayout – (Topological) Feature-based Hierarchization Search the graph for components of growing complexity Subtrees Biconnected components (« blocks ») Grid-like (eigenvalues) « Clusters »

TopoLayout Components naturally organize as a hierarchy through the search process

TopoLayout + interaction: Grouse Explore the graph by unfolding/folding the hierarchy The user’s navigation triggers layout of components Higher level graphs (quotient graphs) are built from metanodes Improve readability / Fewer visual elements Faster layout, based on topology of quotient graph
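
Building a quotient graph from metanodes can be sketched as follows. This is an illustrative assumption about the data layout, not Grouse's or Tulip's API: `quotient_graph` and `cluster_of` are hypothetical names, and cluster labels are assumed to be mutually comparable (for example all strings).

```python
from collections import Counter


def quotient_graph(edges, cluster_of):
    """Collapse each cluster into one metanode.

    edges: iterable of (u, v) pairs of an undirected graph.
    cluster_of: dict mapping every node to its cluster label.
    Returns (metanodes, metaedges), where metaedges maps a sorted pair of
    cluster labels to the number of original edges running between them.
    """
    metanodes = set(cluster_of.values())
    metaedges = Counter()
    for u, v in edges:
        a, b = cluster_of[u], cluster_of[v]
        if a != b:  # edges inside a cluster disappear into the metanode
            metaedges[tuple(sorted((a, b)))] += 1
    return metanodes, dict(metaedges)


edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 3), (1, 4)]
cluster_of = {1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
metanodes, metaedges = quotient_graph(edges, cluster_of)
```

The quotient graph here has two metanodes and a single metaedge, so laying it out is much cheaper than laying out the six original edges.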

TopoLayout + interaction: Grouse Multilevel hierarchy: recursive grouping of metanodes

TopoLayout + interaction: Grouse Multilevel Hierarchy for Abstraction: Cut

Multilevel navigation of small world networks Small world networks: social networks, web graphs, transportation networks (ITA), … Small world networks organize into several levels (hierarchy) [Adamic, Huberman] Idea: capture the hierarchy and use it as a navigation paradigm

Small world networks Centralities Bottleneck passageways Network organizes around those « pivot » nodes

Small world networks Centralities Global indices: betweenness centrality, eigenvalue centrality (betweenness centrality has high computational cost) Prefer local indices: degree, edge strength

Small world networks Edge strength: proportion of cycles containing an edge (length 3 and 4) For an edge $e = (u, v)$: $M_u = N_u \setminus N_v$, $M_v = N_v \setminus N_u$, $W_{uv}$ (Jaccard 1912) (Tanimoto 1958) Auber et al. 2003 Radicchi et al. 2004
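
The neighbourhood sets on this slide can be computed directly. The sketch below covers only the length-3 (triangle) part of edge strength, as a Jaccard-style ratio; the length-4 part is omitted. Two things are assumptions, since the transcript only names the symbols: that W_uv is the set of common neighbours N_u ∩ N_v, and that the endpoints u and v are excluded from the neighbourhoods. The function names are hypothetical.

```python
def edge_neighbourhoods(adj, u, v):
    """Return (M_u, M_v, W_uv) for the edge (u, v).

    adj maps each node to a set of neighbours. The endpoints themselves are
    removed from the neighbourhoods (assumption made for this sketch).
    """
    n_u = set(adj[u]) - {u, v}
    n_v = set(adj[v]) - {u, v}
    m_u = n_u - n_v    # neighbours of u only
    m_v = n_v - n_u    # neighbours of v only
    w_uv = n_u & n_v   # common neighbours (assumed meaning of W_uv)
    return m_u, m_v, w_uv


def triangle_strength(adj, u, v):
    """Share of the neighbours of u and v that close a triangle with (u, v)."""
    m_u, m_v, w_uv = edge_neighbourhoods(adj, u, v)
    total = len(m_u) + len(m_v) + len(w_uv)
    return len(w_uv) / total if total else 0.0


# A triangle a-b-c with a pendant node d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
strong = triangle_strength(adj, "a", "b")
weak = triangle_strength(adj, "c", "d")
```

Only the neighbourhoods of u and v are touched, which is why the index is local: changing an edge elsewhere in the graph leaves the value untouched.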

Small world networks Edge strength Costs linear time if degree is bounded, otherwise quadratic …

Small world networks Edge strength Cost still lower than that of most centralities (local versus global indices) Incremental: a local modification of the graph requires only local recomputation

Community structure of small world networks Filter out weak edges Capture components Infer quotient graph (metanodes) Recurse over each component
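
The first two steps (filter out weak edges, capture components) can be sketched as below. This is a minimal illustration under stated assumptions: edges carry a precomputed strength, "weak" means strictly below a given threshold, and the name `components_after_filter` is hypothetical.

```python
def components_after_filter(nodes, weighted_edges, threshold):
    """Drop edges weaker than threshold, then return connected components.

    nodes: list of node identifiers.
    weighted_edges: iterable of (u, v, strength) triples.
    Returns a list of sets, in order of first appearance in nodes.
    """
    adj = {n: set() for n in nodes}
    for u, v, strength in weighted_edges:
        if strength >= threshold:  # keep only strong edges
            adj[u].add(v)
            adj[v].add(u)

    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:  # iterative depth-first traversal
            n = stack.pop()
            comp.add(n)
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        comps.append(comp)
    return comps


nodes = [1, 2, 3, 4, 5]
weighted_edges = [(1, 2, 0.9), (2, 3, 0.8), (3, 4, 0.1), (4, 5, 0.7)]
communities = components_after_filter(nodes, weighted_edges, 0.5)
```

Each returned component becomes a metanode of the quotient graph, and the same procedure can then be applied inside each component.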

Community structure of small world networks Filter out weak edges: Q. What threshold to choose? A. Best possible one (!) Use quality criteria MQ (modularity quality)

“Quality” criteria MQ $C = (C_1, C_2, \ldots, C_p)$ is a clustering of a graph $G$ $MQ(C; G) =$ [equation shown on slide]
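
The MQ formula itself is not legible in the transcript. The sketch below therefore assumes a commonly used form of MQ for undirected graphs: the mean edge density inside clusters minus the mean edge density between pairs of clusters, which stays within [-1, 1]. Treat it as an assumption, not as the authors' exact definition. Singleton clusters are given an internal density of 0 by convention in this sketch, and the function name `mq` is hypothetical.

```python
from itertools import combinations


def mq(clusters, edges):
    """Assumed MQ: mean intra-cluster density minus mean inter-cluster density.

    clusters: non-empty list of disjoint node sets covering every edge endpoint.
    edges: iterable of (u, v) pairs of a simple undirected graph.
    """
    clusters = [set(c) for c in clusters]
    label = {n: i for i, c in enumerate(clusters) for n in c}
    p = len(clusters)
    intra = [0] * p
    inter = {}
    for u, v in edges:
        a, b = label[u], label[v]
        if a == b:
            intra[a] += 1
        else:
            key = (min(a, b), max(a, b))
            inter[key] = inter.get(key, 0) + 1

    def intra_density(i):
        n = len(clusters[i])
        # Edges present over possible node pairs; 0 for a singleton.
        return intra[i] / (n * (n - 1) / 2) if n > 1 else 0.0

    cohesion = sum(intra_density(i) for i in range(p)) / p
    if p < 2:
        return cohesion
    coupling = sum(
        inter.get((i, j), 0) / (len(clusters[i]) * len(clusters[j]))
        for i, j in combinations(range(p), 2)
    ) / (p * (p - 1) / 2)
    return cohesion - coupling


# Two triangles joined by a single bridge edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
split = mq([{1, 2, 3}, {4, 5, 6}], edges)
lumped = mq([{1, 2, 3, 4, 5, 6}], edges)
```

Under this assumed definition, splitting the example into its two triangles scores higher than keeping everything in one cluster, which matches the intuition that a good clustering is dense inside and sparse between clusters.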

MQ / Nice properties MQ varies over a bounded interval [-1, 1] MQ behaves like a Gaussian distribution

Challenge: find the best possible clustering (according to MQ) Exhaustive search: intractable Optimization, search algorithms (hill climbing, genetic algorithms, bio-mimetics, …): costly Heuristic: exploit node/edge centralities Filter out weak edges Tickmark possible values for edges Find threshold with best MQ
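
The heuristic can be sketched end to end: take every distinct edge strength as a candidate threshold, filter, collect components, and keep the threshold whose clustering scores best. As before this is a hedged sketch: the MQ used is an assumed density-based form (the transcript does not preserve the formula), and all function names are hypothetical.

```python
from itertools import combinations


def components(nodes, edges):
    """Connected components of an undirected graph, as a list of sets."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            comp.add(n)
            for m in adj[n] - seen:
                seen.add(m)
                stack.append(m)
        comps.append(comp)
    return comps


def mq(clusters, edges):
    """Assumed MQ: mean intra-cluster density minus mean inter-cluster density."""
    label = {n: i for i, c in enumerate(clusters) for n in c}
    p = len(clusters)
    intra, inter = [0] * p, {}
    for u, v in edges:
        a, b = sorted((label[u], label[v]))
        if a == b:
            intra[a] += 1
        else:
            inter[(a, b)] = inter.get((a, b), 0) + 1
    sizes = [len(c) for c in clusters]
    cohesion = sum(
        intra[i] / (s * (s - 1) / 2) if s > 1 else 0.0
        for i, s in enumerate(sizes)
    ) / p
    if p < 2:
        return cohesion
    coupling = sum(
        inter.get((i, j), 0) / (sizes[i] * sizes[j])
        for i, j in combinations(range(p), 2)
    ) / (p * (p - 1) / 2)
    return cohesion - coupling


def best_threshold(nodes, weighted_edges):
    """Try each distinct strength as a threshold; return (t, score, clusters)."""
    all_edges = [(u, v) for u, v, _ in weighted_edges]
    best = None
    for t in sorted({w for _, _, w in weighted_edges}):
        kept = [(u, v) for u, v, w in weighted_edges if w >= t]
        clusters = components(nodes, kept)
        score = mq(clusters, all_edges)  # score against the full graph
        if best is None or score > best[1]:
            best = (t, score, clusters)
    return best


nodes = [1, 2, 3, 4, 5, 6]
weighted_edges = [
    (1, 2, 1.0), (1, 3, 0.5), (2, 3, 0.5),
    (4, 5, 0.5), (4, 6, 0.5), (5, 6, 1.0),
    (3, 4, 0.0),  # weak bridge between the two triangles
]
threshold, score, clusters = best_threshold(nodes, weighted_edges)
```

Only as many clusterings are evaluated as there are distinct strength values, which is what makes this cheaper than a general search over all clusterings.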

Filter / Threshold Candidate threshold values θ = 1, 2, 3, 4, 5 yield clusterings C1, C2, C3, C4, C5, written C(θ)

Hierarchical organization of the network The procedure can be iterated to produce a hierarchy of clusters Strength of edges is recomputed at each stage Threshold is locally chosen for each component

MQ / Extension To take into account the relative size of clusters (MQ also naturally extends to fuzzy clustering) $MQ(C; G) =$ [equation shown on slide]

MQ / Extension Extend to various classes of graphs (where F stands for any adequate edge density function) $MQ(C; G) =$ [equation shown on slide]

Conclusion – Future work MQ / Extension to graph hierarchies

MQ / Extension to graph hierarchies Inspired by attribute grammars

Conclusion – Future work Study dynamic networks Streamed / Time-stamped networks Incremental/local computation/adjustment of edge metrics (local metrics) MQ (or other possible quality criteria)

Conclusion Interaction is the real added value of visualization Must combine with other mining techniques Insert combination in “sense making loop”

Conclusion We are open to and interested in collaborating with colleagues from other areas, adopting different perspectives Learning / Mining / … Experts / Corporate organizations / Final users Any ideas for a different multilevel clustering criterion/approach?

Conclusion We are open to and interested in collaborating with colleagues from other areas, with other perspectives Learning / Mining / … Experts / Corporate organizations / Final users Go visit Tulip’s website and download the software (I’m here until Friday if you need a coach!) www.tulip-software.org Guy.Melancon@labri.fr

Credits LaBRI UMR 5800, Bordeaux -- Equipe GRAVITÉ / INRIA Futurs Guy Melançon Maylis Delest David Auber Patrick Mary Tulip Graph Viz Framework www.tulip-software.org R. Bourqui, U Bx, FR D. Archambault, UBC, CA T. Munzner, UBC, CA @labri.fr