1
Mining Networks through Visual Analytics
Incremental Hypothesis Building and Validation
Thanks. I thought the workshop was a good opportunity to present our work on graph visualization. We are not mining persons, although we do contribute to graph mining as part of the visualization process.
David Auber, Romain Bourqui, Guy Melançon
CNRS LaBRI UMR 5800 & INRIA Futurs – GRAVITÉ, Bordeaux, France
2
peacokmaps.com
3
“A picture is worth a thousand words” Chinese proverb (?)
InfoVis CyberInfraStructure – Pajek
4
There would also be the most naive drawing, which places each leaf at a different x-coordinate. That is the saving that Wetherell/Reingold and Tilford/Walker make.
Tulip – BubbleTree
5
“It’s all visual” R. Feynman (Nobel prize in Physics)
Tulip Graph Viz Framework
6
Internet traffic
7
“The purpose of computing is insight, not numbers” R. Hamming (1973)
Voronoï Treemaps
8
Cushion Treemaps
9
“Visualization uses computer graphics to help provide insight on complicated problems, models or systems”
“Scientific visualization is exploring data and information graphically, gaining understanding and insights into the data”
R.A. Earnshaw (a pioneer in computer graphics, 1973)
Munzner’s Hyperbolic Browser
10
Tulip – Sugiyama Layout
11
Visualize? Inselberg – « creator » of parallel coordinates
« Insight through images » « Goal: Visual Model to Help our Intuition » « Involves: Geometry, Cognition, Art ? »
12
Visualize?
13
Visual graph mining related to security issues
“Recognize” structural properties
Identify key actors
Identify their neighborhood
Community structure
Connectivity between communities
…
“Chess players recognize patterns”
I guess the issues I will discuss relate to security just as Tishby’s.
14
Example from NCTC data Extracted about 8000 incidents from WITS
Identified terrorist groups when possible (directly or through AFP)
Identified countries where incidents took place
Added territorial information (continents, world regions) to help organize the overall map
15
Example from NCTC data About 8000 incidents 9419 nodes 18486 edges
Layout is time-consuming
Provides no clue about structure
Filter out incidents with no identified group
16
Example from NCTC data Interactivity Tulip Graph Viz Framework
« Play » with network Apply various metrics Attribute-based node filtering Tulip Graph Viz Framework Opensource Plug-in architecture
17
Massive data
Information big bang - « How much information » project, Berkeley University
In 2001, about 1 exabyte (1 million terabytes) of data is generated annually worldwide, including % available only in digital form
In 2003: each individual produces about 800 megabytes per year
We want to go for larger, more complex datasets. I certainly don’t have to bring in arguments to convince you of those needs. The figures reported by the Berkeley project or by Keim speak for themselves.
18
Massive data 100 million FedEx transactions / day
150 million VISA transactions / day
300 million long distance calls / day over ATT’s network
35 billion s / day over the world
600 billion IP packets / day over DE-CIX backbone
Keim, VIEW Workshop 2006
19
Visualization and Moore’s law
Daniel Keim - Keynote Address, VIEW 2006
20
Visualization and Moore’s law
Issues that won’t be solved by hardware only
Design interaction together with visualization
Understand how and why visualization pays
Collaborate with other fields
Integrate visualization together with other technology
NIH-NSF Visualization Research Challenges Report, 2006
21
Added value of visual and interactive mining
KDD Panel « The Perfect Data Mining Tool » [Ankerst 2002] The human eye is an excellent tool for spotting natural patterns Getting rid of the human in the loop? Wrong decision! Increase human participation through visualization in the data exploration and knowledge discovery processes
22
« Sense making loop » J. Thomas – Visual Analytics Initiative
23
« Visualization mantras »
Visual Information Seeking Mantra: Overview, Zoom-in / Filter, and Details on Demand (Shneiderman, 1996)
Visual Analytics Mantra: Analyse first, Show the Important, Zoom, filter and analyse, Details on demand (Keim 2006)
24
Visualization “pipeline”
A designer’s view on the visualization process
25
Visualize?
Protein interaction network (yeast); Barabási 2000
26
Organize data prior to visualization
Layer or hierarchize data based on: node/edge metrics (eigenvalues, centralities, …) topological feature detection Use relevant drawing methods Combine with interaction
27
Case study: ITA 2000 passenger air traffic
Cities connect through direct flights Edge weights: number of passengers Questions: Read motivations of carriers through organization of the network? Territorial logic? Political? Economical?
29
TopoLayout – (Topological) Feature-based Hierarchization
Search the graph for components of growing complexity Subtrees Biconnected components (« blocks ») Grid-like « Clusters »
31
TopoLayout – (Topological) Feature-based Hierarchization
Search the graph for components of growing complexity Subtrees Biconnected components Grid-like « Clusters »
32
TopoLayout – (Topological) Feature-based Hierarchization
Search the graph for components of growing complexity Subtrees Biconnected components Grid-like « Clusters » Need to identify articulation points (“pivots”) The graph builds into a “tree of biconnected components”
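Finding the articulation points ("pivots") can be sketched with the classical depth-first search that tracks low-link values. This is a minimal illustration, not TopoLayout's own code; the function name `articulation_points` and the adjacency-dict input are assumptions made for the example.

```python
def articulation_points(adj):
    """Return the set of articulation points of an undirected graph.

    adj maps each node to an iterable of its neighbours (hypothetical input
    format chosen for this sketch). Recursive, so suited to small graphs.
    """
    index = {}    # discovery order of each node
    low = {}      # lowest discovery index reachable from the node's subtree
    points = set()
    counter = 0

    def visit(node, parent):
        nonlocal counter
        index[node] = low[node] = counter
        counter += 1
        children = 0
        for nb in adj[node]:
            if nb not in index:
                children += 1
                visit(nb, node)
                low[node] = min(low[node], low[nb])
                # A non-root node is a pivot if some child subtree
                # cannot reach above it without passing through it.
                if parent is not None and low[nb] >= index[node]:
                    points.add(node)
            elif nb != parent:
                low[node] = min(low[node], index[nb])
        # The DFS root is a pivot only if it has several DFS children.
        if parent is None and children > 1:
            points.add(node)

    for node in adj:
        if node not in index:
            visit(node, None)
    return points
```

Removing the returned nodes splits the graph into its blocks, which is what lets the graph be read as a tree of biconnected components.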
33
TopoLayout – (Topological) Feature-based Hierarchization
Search the graph for components of growing complexity Subtrees Biconnected components (« blocks ») Grid-like (eigenvalues) « Clusters »
35
TopoLayout Components naturally organize as a hierarchy through the search process
36
TopoLayout + interaction: Grouse
Explore the graph by unfolding/folding the hierarchy
The user’s navigation triggers layout of components
Higher level graphs (quotient graphs) are built from metanodes
Improved readability / fewer visual elements
Faster layout, based on topology of quotient graph
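Building a quotient graph from metanodes can be sketched as follows. This is an illustrative sketch only, not Grouse's implementation; `quotient_graph` and the `cluster_of` mapping are hypothetical names introduced for the example.

```python
def quotient_graph(edges, cluster_of):
    """Collapse each cluster into one metanode.

    edges: iterable of (u, v) pairs of an undirected graph.
    cluster_of: dict mapping every node to its cluster label.
    Returns a dict mapping each pair of cluster labels (ordered tuple)
    to the number of original edges running between the two clusters.
    """
    meta_edges = {}
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu == cv:
            continue  # intra-cluster edge: hidden inside the metanode
        key = (cu, cv) if cu <= cv else (cv, cu)
        meta_edges[key] = meta_edges.get(key, 0) + 1
    return meta_edges
```

Only the metanodes and the aggregated edges need to be laid out at a given level, which is why the folded view has fewer visual elements.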
37
TopoLayout + interaction: Grouse
Multilevel hierarchy: recursive grouping of metanodes
39
TopoLayout + interaction: Grouse
Multilevel Hierarchy for Abstraction: Cut
40
Multilevel navigation of small world networks
Small world networks: social networks, web graphs, transportation networks (ITA), … Small world networks organize into several levels (hierarchy) [Adamic, Huberman] Idea: capture the hierarchy and use it as a navigation paradigm
41
Small world networks Centralities Bottleneck passageways
The network organizes around these « pivot » nodes
42
Small world networks Centralities
Betweenness centrality has a high computational cost (global)
Global indices: betweenness centrality, eigenvalue centrality
Prefer a local index: degree, edge strength
43
Small world networks
Edge strength: proportion of cycles (of length 3 and 4) containing an edge
For an edge $e = (u, v)$: $M_u = N_u \setminus N_v$, $M_v = N_v \setminus N_u$, $W_{uv}$
(Jaccard 1912) (Tanimoto 1958) Auber et al. 2003; Radicchi et al. 2004
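A minimal sketch of the length-3 part of the edge strength, written as a Jaccard-style ratio. Two things here are assumptions rather than content of the slide: $W_{uv}$ is taken to be the set of common neighbours $N_u \cap N_v$, and the endpoints $u$ and $v$ are excluded from each other's neighbourhoods. The length-4 cycle term is left out.

```python
def edge_strength_3(adj, u, v):
    """Share of the neighbourhood of edge (u, v) that closes a triangle.

    adj maps each node to the set of its neighbours.
    Assumption: W_uv is the set of common neighbours of u and v.
    """
    nu = adj[u] - {v}      # neighbours of u, other endpoint excluded
    nv = adj[v] - {u}      # neighbours of v, other endpoint excluded
    w = nu & nv            # W_uv: each common neighbour closes a 3-cycle
    mu = nu - nv           # M_u = N_u \ N_v
    mv = nv - nu           # M_v = N_v \ N_u
    total = len(mu) + len(w) + len(mv)
    return len(w) / total if total else 0.0
```

An edge inside a tight group scores close to 1, while an edge leading to a pendant node or bridging two groups scores close to 0.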
44
Small world networks Edge strength
Costs linear time if the degree is bounded, otherwise quadratic …
Figure labels: edge $e = (u, v)$, $M_u = N_u \setminus N_v$, $M_v = N_v \setminus N_u$, $W_{uv}$
45
Small world networks Edge strength
Cost still lower than most centralities (local versus global indices)
Incremental: a local modification of the graph requires only local recomputation
Figure labels: edge $e = (u, v)$, $M_u = N_u \setminus N_v$, $M_v = N_v \setminus N_u$, $W_{uv}$
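The locality argument can be illustrated with a short sketch. It assumes a strength that depends only on the neighbourhoods of an edge's two endpoints (the length-3 case); with length-4 cycles the affected region would be wider. The function name `edges_to_recompute` is hypothetical.

```python
def edges_to_recompute(adj, u, v):
    """Edges whose neighbourhood-based strength may change once edge (u, v)
    has been added or removed.

    adj maps each node to the set of its neighbours, already updated.
    Only N_u and N_v changed, so only edges touching u or v are affected.
    """
    affected = set()
    for x in (u, v):
        for nb in adj[x]:
            affected.add(frozenset((x, nb)))
    return affected
```

Every other edge keeps its previous value, which is what makes the metric attractive for streamed or time-stamped networks.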
46
Community structure of small world networks
Filter out weak edges Capture components Infer quotient graph (metanodes) Recurse over each component
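The first two steps (filter out weak edges, capture components) can be sketched as below. This is a minimal illustration; `components_after_filter` and the `(u, v, strength)` triple format are assumptions made for the example.

```python
def components_after_filter(nodes, weighted_edges, threshold):
    """Drop edges whose strength is below threshold, then return the
    connected components of what remains, as a list of sets.

    nodes: list of node identifiers.
    weighted_edges: iterable of (u, v, strength) triples.
    """
    adj = {n: set() for n in nodes}
    for u, v, strength in weighted_edges:
        if strength >= threshold:      # keep only the strong edges
            adj[u].add(v)
            adj[v].add(u)
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        comp = set()
        while stack:                   # iterative depth-first traversal
            node = stack.pop()
            comp.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(comp)
    return components
```

Each returned component becomes a metanode of the quotient graph, and the same procedure can then be applied again inside each component.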
48
Community structure of small world networks
Filter out weak edges:
Q. What threshold to choose?
A. The best possible one (!)
Use the quality criterion MQ (modularity quality)
49
“Quality” criterion MQ
C = (C1, C2, …, Cp) is a clustering of a graph G
MQ(C; G) = (formula shown as an image on the slide; figure labels: C1, C2, …, Cp along two axes)
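The formula itself is not legible in this transcript. The sketch below assumes MQ is the modularization quality in the style of Mancoridis et al.: mean intra-cluster edge density minus mean inter-cluster edge density, for an undirected simple graph. That reading is an assumption, as are the density normalisations and the choice of density 0 for singleton clusters.

```python
from itertools import combinations


def mq(clusters, edges):
    """Assumed MQ: mean intra-cluster density minus mean inter-cluster density.

    clusters: non-empty list of disjoint node sets covering the graph.
    edges: iterable of (u, v) pairs of an undirected simple graph.
    """
    clusters = [set(c) for c in clusters]
    label = {n: i for i, c in enumerate(clusters) for n in c}
    p = len(clusters)
    intra = [0] * p
    inter = {}
    for u, v in edges:
        i, j = label[u], label[v]
        if i == j:
            intra[i] += 1
        else:
            key = (min(i, j), max(i, j))
            inter[key] = inter.get(key, 0) + 1

    # Intra-cluster term: edges present over node pairs available.
    intra_density = 0.0
    for i, c in enumerate(clusters):
        pairs = len(c) * (len(c) - 1) / 2
        if pairs:                      # singleton clusters contribute 0
            intra_density += intra[i] / pairs
    intra_density /= p

    # Inter-cluster term: averaged over all pairs of clusters.
    inter_density = 0.0
    if p > 1:
        for i, j in combinations(range(p), 2):
            size = len(clusters[i]) * len(clusters[j])
            inter_density += inter.get((i, j), 0) / size
        inter_density /= p * (p - 1) / 2
    return intra_density - inter_density
```

Both terms lie in [0, 1], so under this reading the score stays within [-1, 1]: it is high when clusters are dense inside and sparsely connected to each other.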
50
MQ / Nice properties
MQ varies over a bounded interval [-1, 1]
MQ behaves like a Gaussian distribution
51
MQ / Nice properties
MQ behaves like a Gaussian distribution
52
Challenge: find the best possible clustering (according to MQ)
Exhaustive search: intractable
Optimization, search algorithms (hill climbing, genetic algorithms, bio-mimetics, …): costly
Heuristic: exploit node/edge centralities
Filter out weak edges
Tickmark possible values for edges
Find threshold with best MQ
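The heuristic can be sketched as a scan over candidate thresholds, scoring each resulting clustering. This is a hedged illustration: the helper names are hypothetical, the candidates are simply the distinct strength values, and the MQ used is an assumed form (mean intra-cluster density minus mean inter-cluster density), not necessarily the exact formula of the talk.

```python
from itertools import combinations


def components(nodes, edges):
    """Connected components of an undirected graph, as a list of sets."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, result = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for nb in adj[node] - seen:
                seen.add(nb)
                stack.append(nb)
        result.append(comp)
    return result


def mq(clusters, edges):
    """Assumed MQ: mean intra-cluster minus mean inter-cluster density."""
    label = {n: i for i, c in enumerate(clusters) for n in c}
    p = len(clusters)
    intra, inter = [0] * p, {}
    for u, v in edges:
        i, j = sorted((label[u], label[v]))
        if i == j:
            intra[i] += 1
        else:
            inter[(i, j)] = inter.get((i, j), 0) + 1
    a = sum(intra[i] / (len(c) * (len(c) - 1) / 2)
            for i, c in enumerate(clusters) if len(c) > 1) / p
    if p < 2:
        return a
    b = sum(inter.get((i, j), 0) / (len(clusters[i]) * len(clusters[j]))
            for i, j in combinations(range(p), 2)) / (p * (p - 1) / 2)
    return a - b


def best_threshold(nodes, weighted_edges):
    """Try each distinct strength as a threshold; keep the best MQ.

    Returns (threshold, score, clusters). Ties keep the lowest threshold.
    """
    all_edges = [(u, v) for u, v, _ in weighted_edges]
    best = None
    for t in sorted({s for _, _, s in weighted_edges}):
        kept = [(u, v) for u, v, s in weighted_edges if s >= t]
        clusters = components(nodes, kept)
        score = mq(clusters, all_edges)   # scored against the full graph
        if best is None or score > best[1]:
            best = (t, score, clusters)
    return best
```

The scan costs one component search and one MQ evaluation per candidate value, which is far cheaper than exploring the space of all clusterings.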
53
Filter / Threshold 1 2 5 4 3
55
Filter / Threshold C1 C2 C3 C4 C5 C()
56
Hierarchical organization of the network
The procedure can be iterated to produce a hierarchy of clusters Strength of edges is recomputed at each stage Threshold is locally chosen for each component
57
MQ / Extension
To take into account the relative size of clusters
(MQ also naturally extends to fuzzy clustering)
58
MQ / Extension
Extend to various classes of graphs (where F stands for any adequate edge density function)
59
Conclusion – Future work
MQ / Extension to graph hierarchies
60
MQ / Extension to graph hierarchies
Inspired by attribute grammars
61
Conclusion – Future work
Study dynamic networks
Streamed / time-stamped networks
Incremental/local computation/adjustment of edge metrics (local metrics) and of MQ (or other possible quality criteria)
62
Conclusion Interaction is the real added value of visualization
Must combine with other mining techniques Insert combination in “sense making loop”
63
Conclusion
We are open to, and interested in, collaborating with colleagues from other areas who adopt different perspectives
Learning / Mining / …
Experts / Corporate organizations / Final users
Any idea for a different multilevel clustering criterion/approach?
64
Conclusion
We are open to, and interested in, collaborating with colleagues from other areas, with other perspectives
Learning / Mining / …
Experts / Corporate organizations / Final users
Go visit Tulip’s website and download the software (I’m here until Friday if you need a coach!)
65
Credits LaBRI UMR 5800, Bordeaux -- Equipe GRAVITÉ / INRIA Futurs
Guy Melançon, Maylis Delest, David Auber, Patrick Mary
Tulip Graph Viz Framework
R. Bourqui, U Bx, FR; D. Archambault, UBC, CA; T. Munzner, UBC, CA