Weighted flow graphs for statistics
Edwin de Jonge
NTTS, February 2009
Statistics and flows
Many official statistics are flow data:
–Demography
–Migration
–International trade
But also balance systems:
–System of National Accounts (SNA)
–Energy balance
Statistics and visualisation
Visualisation exploits the visual system to:
–Reveal and highlight patterns in data (trends, correlation, distribution)
Most common visualisations:
–Line and bar charts
–Scatter and bubble plots
–Choropleth maps
Flow visualisation
Many official statistics are flow data
–But not presented as flows!
A flow diagram is a weighted directed graph
–G = (V, E, w), with vertices V, directed edges E and edge weights w
–Little visualisation research exists for weighted directed graphs
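The graph G = (V, E, w) from this slide can be written down directly as a data structure. The following is a minimal sketch, assuming a dictionary keyed by (origin, destination) pairs; the node names and weights are hypothetical toy values, not data from the talk.

```python
from collections import defaultdict

# Hypothetical toy data: (origin, destination) -> flow size (the weight w).
edges = {
    ("A", "B"): 120,
    ("B", "A"): 80,   # counter flow of A -> B
    ("A", "C"): 15,
}

def build_graph(weighted_edges):
    """Return (V, E, w) for a weighted directed graph."""
    vertices = sorted({node for edge in weighted_edges for node in edge})
    edge_list = sorted(weighted_edges)
    weights = dict(weighted_edges)
    return vertices, edge_list, weights

def out_flow(weighted_edges):
    """Sum the outgoing weight per origin node."""
    totals = defaultdict(int)
    for (origin, _destination), weight in weighted_edges.items():
        totals[origin] += weight
    return dict(totals)

V, E, w = build_graph(edges)
print(V, E, w)
print(out_flow(edges))
```

Because the weights are the statistical data, any drawing of this graph has to encode w visually, for example as arrow width.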
Flow visualisation (2)
Options:
–Standard node and edge visualisation
  –Not a real option: does not encode the weights (= the data)
–Sankey diagrams
  –Very good for energy statistics etc.!
–Cartographic flows
  –Arrows on a cartographic map
Cartographic flows
Flow maps:
–Many are hand made
–Flow routing is hard
–Number of flows is limited to 50
–Most are unidirectional
Computer-generated cartographic flow layout is still scarce
Experiment: large flow map
Most statistical datasets are large!
Experiment to visualise:
–Thousands of flows
–Bidirectional flows: every flow may have a counter flow
It should:
–Give an overview of all flows
–Show the main flows
–Reveal flow patterns
Experiment: internal migration
Migration between 459 municipalities in the Netherlands
Migration is a matrix M(i, j), with i, j = 1..N
–m_ij = migration from i to j
Large number of flows, and they are bidirectional
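To make the matrix notation concrete, here is a small sketch that turns a migration matrix into a list of directed flows and looks up the counter flow of a given flow. The 3x3 matrix is a hypothetical stand-in for the real 459x459 one; for N = 459 there are N(N-1) = 210,222 possible directed pairs.

```python
# Hypothetical 3x3 migration matrix: M[i][j] = number of people moving from i to j.
M = [
    [0, 12, 3],
    [7, 0, 0],
    [1, 5, 0],
]

def flows_from_matrix(matrix):
    """List every non-zero off-diagonal cell as (origin, destination, size)."""
    n = len(matrix)
    return [
        (i, j, matrix[i][j])
        for i in range(n)
        for j in range(n)
        if i != j and matrix[i][j] > 0
    ]

def counter_flow(matrix, i, j):
    """Size of the flow in the opposite direction, from j to i."""
    return matrix[j][i]

flows = flows_from_matrix(M)
possible = len(M) * (len(M) - 1)  # N(N-1) directed pairs
print(flows)
print(f"{len(flows)} of {possible} possible flows are non-zero")
```

Keeping flows as (origin, destination, size) tuples makes later steps such as scaling and filtering simple list operations.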
Experiment: internal migration (2)
Data summary:
–60,000 movements (of the 210,000)
–Mean = 10, Max = 2880, Median = 2: the distribution is skewed!
Technology:
–Google Earth, KML file
–Generate arrows as polygons in KML
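Generating an arrow as a KML polygon could look roughly like the sketch below. It is not the author's implementation: the arrow shape, the `head_frac` parameter and the function names are assumptions made for illustration, and longitude/latitude are treated as planar coordinates, which is only a rough approximation. The KML elements used (`Placemark`, `Polygon`, `altitudeMode`, `outerBoundaryIs`, `LinearRing`, `coordinates`) are standard KML 2.2.

```python
import math
from xml.sax.saxutils import escape

def arrow_ring(lon1, lat1, lon2, lat2, width, head_frac=0.2):
    """Outline of an arrow from point 1 to point 2 as a closed ring of (lon, lat)."""
    dx, dy = lon2 - lon1, lat2 - lat1
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("origin and destination coincide")
    ux, uy = dx / length, dy / length   # unit vector along the arrow
    px, py = -uy, ux                    # unit vector perpendicular to it
    neck_lon = lon2 - ux * length * head_frac
    neck_lat = lat2 - uy * length * head_frac
    half = width / 2
    ring = [
        (lon1 + px * half, lat1 + py * half),           # tail, left
        (neck_lon + px * half, neck_lat + py * half),   # shaft end, left
        (neck_lon + px * width, neck_lat + py * width), # head corner, left
        (lon2, lat2),                                   # tip
        (neck_lon - px * width, neck_lat - py * width), # head corner, right
        (neck_lon - px * half, neck_lat - py * half),   # shaft end, right
        (lon1 - px * half, lat1 - py * half),           # tail, right
    ]
    ring.append(ring[0])  # a KML LinearRing repeats its first point at the end
    return ring

def arrow_placemark(name, ring, altitude=0.0):
    """Wrap a ring in a KML Placemark; coordinates are lon,lat,alt triples."""
    coords = " ".join(f"{lon:.5f},{lat:.5f},{altitude:.1f}" for lon, lat in ring)
    return (
        f"<Placemark><name>{escape(name)}</name>"
        "<Polygon><altitudeMode>relativeToGround</altitudeMode>"
        "<outerBoundaryIs><LinearRing>"
        f"<coordinates>{coords}</coordinates>"
        "</LinearRing></outerBoundaryIs></Polygon></Placemark>"
    )

def kml_document(placemarks):
    """Combine placemarks into one KML document string."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
        + "".join(placemarks)
        + "</Document></kml>"
    )

# Approximate coordinates for Amsterdam and Rotterdam; the width is in degrees.
ring = arrow_ring(4.90, 52.37, 4.48, 51.92, width=0.02)
doc = kml_document([arrow_placemark("Amsterdam to Rotterdam", ring)])
print(doc)
```

Writing `doc` to a `.kml` file should give something Google Earth can open, although a real map would need a proper projection and styling.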
Naive implementation
Too many arrows
Visual clutter:
–No overview
–No main flows
–No flow patterns
Naive implementation (2)
Visual encoding
Use visual encoding to reduce clutter
Arrow properties:
–Width: logarithmic scale
  –Encodes size of flows
–Transparency: logarithmic scale
  –Reduces visual clutter
–Height: linear scale
  –Focus on main flows
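The three encodings on this slide can be sketched as simple scale functions. The output ranges below (width in degrees, opacity 0-255, height in metres) are hypothetical choices for illustration, not values from the talk; only the maximum of 2880 comes from the data summary. KML colours are written as `aabbggrr` hex strings, so the opacity goes first.

```python
import math

MAX_FLOW = 2880  # maximum flow size reported in the data summary

def log_scale(value, vmax, lo, hi):
    """Map a value in [1, vmax] to [lo, hi] on a logarithmic scale."""
    if value < 1:
        raise ValueError("log scale expects values >= 1")
    t = math.log(value) / math.log(vmax)
    return lo + t * (hi - lo)

def linear_scale(value, vmax, lo, hi):
    """Map a value in [0, vmax] to [lo, hi] on a linear scale."""
    return lo + (value / vmax) * (hi - lo)

def encode(flow, vmax=MAX_FLOW):
    """Return (width, alpha, height, kml_color) for one flow size."""
    width = log_scale(flow, vmax, 0.002, 0.03)        # assumed range, degrees
    alpha = round(log_scale(flow, vmax, 40, 255))     # assumed range, 0-255 opacity
    height = linear_scale(flow, vmax, 0.0, 20000.0)   # assumed range, metres
    kml_color = f"{alpha:02x}0000ff"                  # aabbggrr: red, variable opacity
    return width, alpha, height, kml_color

for size in (1, 2, 10, 2880):
    print(size, encode(size))
```

With a median of 2 and a maximum of 2880, the logarithmic scales keep small flows visible but faint, while the linear height lifts only the few largest flows clearly above the rest.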
User interaction / Results
Use user interaction to filter data
–User can select regions (but not flows)
Results:
–Clear overview of overall flows
–Main flows are visible
–Non-local flows are also visible
–But no other patterns!
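Filtering by selected regions amounts to keeping only the flows that start or end in one of those regions. A minimal sketch, assuming flows are stored as (origin, destination, size) tuples; the flow sizes below are made up for illustration.

```python
# Hypothetical flows: (origin, destination, number of people).
flows = [
    ("Amsterdam", "Utrecht", 120),
    ("Utrecht", "Amsterdam", 90),
    ("Amsterdam", "Haarlem", 50),
    ("Groningen", "Utrecht", 8),
]

def flows_touching(all_flows, selected_regions):
    """Keep flows whose origin or destination is one of the selected regions."""
    selected = set(selected_regions)
    return [
        flow for flow in all_flows
        if flow[0] in selected or flow[1] in selected
    ]

print(flows_touching(flows, {"Haarlem"}))
print(flows_touching(flows, {"Utrecht"}))
```

Selecting individual flows would need a filter on the (origin, destination) pair instead of on a single region.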
Discussion
Result is OK, but should be further improved
–Better user interaction
  –Google Earth (GE) user interaction is very limited
  –Select and filter for flows
–Reveal patterns in flow data
  –Use cluster techniques to group flows
  –Use cluster techniques to group regions
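Once regions have been grouped, flows can be aggregated between groups. The sketch below shows only that aggregation step: the grouping is taken as given (here by province) and no clustering algorithm is implemented. The flow sizes are hypothetical.

```python
from collections import defaultdict

# Hypothetical flows between municipalities: (origin, destination, size).
flows = [
    ("Amsterdam", "Utrecht", 120),
    ("Haarlem", "Utrecht", 30),
    ("Amsterdam", "Haarlem", 50),
    ("Utrecht", "Amsterdam", 90),
]

# Assumed grouping of municipalities into provinces; a clustering
# technique could produce such a mapping instead.
group_of = {
    "Amsterdam": "Noord-Holland",
    "Haarlem": "Noord-Holland",
    "Utrecht": "Utrecht (province)",
}

def aggregate_flows(all_flows, grouping):
    """Sum flow sizes between groups, dropping flows that stay inside a group."""
    totals = defaultdict(int)
    for origin, destination, size in all_flows:
        g_from, g_to = grouping[origin], grouping[destination]
        if g_from != g_to:
            totals[(g_from, g_to)] += size
    return dict(totals)

grouped = aggregate_flows(flows, group_of)
print(grouped)
```

Aggregating this way reduces the number of arrows to draw, at the cost of hiding the flows within each group.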