Scaling Charts with Design and GPUs. Leo Meyerovich, CEO of Graphistry.com | UC Berkeley
Visibility
Visibility through design + speed
Histogram of Voter Turnout by Town. [Figure: x-axis is voter turnout (0% to 100%), y-axis is number of towns.] Most towns had ~40% of people vote. Annotation: ballot box stuffing?
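A minimal sketch of the binning behind a turnout histogram like this one. The function name `turnoutHistogram`, the bin count, and the sample values are hypothetical and are not taken from the talk.

```javascript
// Count towns per turnout bin. Turnout values are fractions in [0, 1].
// `turnoutHistogram` and the sample data below are illustrative assumptions.
function turnoutHistogram(turnouts, binCount) {
  const counts = new Uint32Array(binCount);
  for (const t of turnouts) {
    // Clamp so that a turnout of exactly 100% lands in the last bin.
    const bin = Math.min(binCount - 1, Math.floor(t * binCount));
    counts[bin] += 1;
  }
  return counts;
}

// Made-up sample: most towns near 40%, two towns near 100%.
const sampleTurnouts = [0.38, 0.41, 0.40, 0.44, 0.35, 0.99, 1.0];
const sampleCounts = turnoutHistogram(sampleTurnouts, 4); // four bins of 25% each
```

In a chart like the one on the slide, an isolated bump in the highest bin is the kind of feature that would prompt the "ballot box stuffing?" question.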
Legend: Opposition, Incumbent. Each tiny square shows a town's size (area) and vote (color).
Filter for towns with high turnout
Tag suspicious towns in black
For visibility: speed + design
Problem: Plot 10+ Time Series Signals
Design: 200 Time Series Signals [time axis: 0 s to 100 s]
Speed: Pan/Zoom Interactions [zoomed time axis: 37 s to 38 s]
CPU Bottlenecks: naïve and offline. Stages: Transform, Parse, Layout, Render. [Timeline: 0 ms to 1600 ms; real-time is 30 ms.]
Optimize: Binary Data, Multicore Layout, GPU Render. Stages: Prep, Layout, Render. [Timeline: 0 ms to 1600 ms.] Real-time interaction; stream from server at 12+ MB/s.
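One reading of "binary data" is that the numbers travel in a packed binary layout, so the client can view the bytes as a typed array instead of parsing text. The sketch below assumes interleaved float32 (time, value) pairs. That wire format and the names `encodeSeries` and `decodeSeries` are hypothetical, not the format used in the talk.

```javascript
// Assumed wire format (hypothetical): interleaved float32 pairs [t0, v0, t1, v1, ...].
function encodeSeries(points) {
  const packed = new Float32Array(points.length * 2);
  points.forEach(([t, v], i) => {
    packed[2 * i] = t;
    packed[2 * i + 1] = v;
  });
  return packed.buffer; // ArrayBuffer, e.g. the body of a binary response
}

// Decoding is only a typed view over the bytes: no text parsing and no
// per-point object allocation.
function decodeSeries(buffer) {
  return new Float32Array(buffer);
}

const wire = encodeSeries([[0, 1.5], [1, 2.5], [2, 0.25]]);
const series = decodeSeries(wire);
```

A typed array like `series` is also the form that GPU upload calls accept, which is one reason a binary format pairs well with GPU rendering.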
Graphs: Placing Nodes and Edges
Direct Feedback on Settings
Uber: Trip Start to End
Direct Edge Placement: Overplotting
Speed + Design: Edge Bundling
Web
Bare Metal in the Browser: Sequential | SIMD (4 lanes) | Multicore (4+ cores) | GPU (1024 lanes). [Diagram label: 5X]
SUPERCONDUCTOR: Parallel JS Viz Engine
[Architecture diagram. BROWSER: a webpage (HTML data, CSS styling, JS script) passes through Parser, Selectors, Layout, Renderer, and the JavaScript VM to produce Pixels. SUPERCONDUCTOR.js: a data viz (data, styling, widgets) passes through the Compiler to Parser.js, Selectors.CL, Layout.CL, and Renderer.GL, running on the GPU.]
Leaf Layout as Parallel Tree Traversals. [Diagram: one traversal computes w,h, another computes x,y, …; annotated with logical spawns and logical joins.] Parallelism in each traversal!
1. Works for all data sets
2. Compiler: CSS → Schedule
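A minimal sketch of layout as two tree traversals: sizes (w,h) computed bottom-up, then positions (x,y) computed top-down. The layout rule here (children placed left to right) and the names `computeSizes`, `computePositions`, `leaf`, and `box` are assumptions for illustration. The talk's compiler derives its own schedule from the styling rules.

```javascript
// Hypothetical layout rule (not from the talk): a node lays out its children
// left to right, so its width is the sum of the child widths and its height
// is that of the tallest child. Leaves carry their own w and h.

// Traversal 1 (bottom-up): sizes flow from the leaves to the root.
function computeSizes(node) {
  if (node.children.length === 0) return;
  node.w = 0;
  node.h = 0;
  for (const child of node.children) {
    // Sibling subtrees do not depend on each other, so these recursive calls
    // could run in parallel; they run sequentially here.
    computeSizes(child);
    node.w += child.w;
    node.h = Math.max(node.h, child.h);
  }
}

// Traversal 2 (top-down): positions flow from the root to the leaves.
function computePositions(node, x, y) {
  node.x = x;
  node.y = y;
  let cursor = x;
  for (const child of node.children) {
    computePositions(child, cursor, y);
    cursor += child.w;
  }
}

const leaf = (w, h) => ({ w, h, x: 0, y: 0, children: [] });
const box = (...children) => ({ w: 0, h: 0, x: 0, y: 0, children });

const inner = box(leaf(3, 8), leaf(4, 2));
const root = box(leaf(10, 5), inner);
computeSizes(root);
computePositions(root, 0, 0);
```

The second traversal only reads sizes that the first one finished writing, which is what lets the work be scheduled as a sequence of whole-tree passes.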
GPU Traversals: Flat & Level-Synchronous
Level-synchronous: a parallel for loop over each tree level (level 1 … level n).
Flat: nodes in arrays, one array per attribute (w, h, x, y).
Compiler handles transform of code & data
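A sketch of the flat, level-synchronous form under stated assumptions: nodes are stored breadth-first, there is one typed array per attribute, and the layout rule is a made-up one (children placed left to right). The helper `forEachInLevel` is a sequential stand-in for what would be a parallel loop or GPU kernel launch. None of these names come from the talk.

```javascript
// Hypothetical tree stored breadth-first, so each level is a contiguous index
// range and the children of a node are adjacent:
//   node 0 (root) -> nodes 1, 2;  node 2 -> nodes 3, 4
const levelStart = [0, 1, 3, 5]; // level k spans [levelStart[k], levelStart[k + 1])
const firstChild = Int32Array.of(1, -1, 3, -1, -1);
const childCount = Int32Array.of(2, 0, 2, 0, 0);

// One array per attribute; leaves start with their own sizes.
const w = Float32Array.of(0, 10, 0, 3, 4);
const h = Float32Array.of(0, 5, 0, 8, 2);
const x = new Float32Array(5);
const y = new Float32Array(5);

// Run `body` once per node of a level. Iterations within a level touch
// disjoint outputs, so a GPU could run them as one kernel launch.
function forEachInLevel(level, body) {
  for (let i = levelStart[level]; i < levelStart[level + 1]; i++) body(i);
}

const levels = levelStart.length - 1;

// Bottom-up pass: deepest level first; each node reads only its children
// and writes only its own w and h.
for (let level = levels - 1; level >= 0; level--) {
  forEachInLevel(level, (i) => {
    if (childCount[i] === 0) return;
    let width = 0;
    let height = 0;
    for (let c = firstChild[i]; c < firstChild[i] + childCount[i]; c++) {
      width += w[c];
      height = Math.max(height, h[c]);
    }
    w[i] = width;
    h[i] = height;
  });
}

// Top-down pass: root level first; each node writes only its own children's
// positions.
for (let level = 0; level < levels; level++) {
  forEachInLevel(level, (i) => {
    let cursor = x[i];
    for (let c = firstChild[i]; c < firstChild[i] + childCount[i]; c++) {
      x[c] = cursor;
      y[c] = y[i];
      cursor += w[c];
    }
  });
}
```

The only synchronization points are between levels, which is the sense of "level-synchronous" assumed here.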
More Scalable Designs: Immens (Stanford), Nanocubes (AT&T), MapD (MIT), Abstract Rendering (Continuum), Synerscope
Achieve data visibility through hardware-accelerated designs (and deploy on the web)
Visualize Magnitudes More Data in the Browser. Leo Meyerovich, CEO of Graphistry.com | UC Berkeley
Today’s Supercomputer-in-a-Pocket. [Diagram of a phone: CPU core 1 with 4-way SIMD, L1d: 32KB, and Prefetch Engine 1; L2: 1MB; RAM: 2GB; …-way SIMT GPGPU core 1.] Phone: 16-lane CPU, 1024-lane GPU. Challenge: Parallelize Data Visualization
Problem: Dynamic Memory Allocation on GPU?
Render calls in the diagram: circ(…), square(…), rect(…); …, line(…); …, rect(…); …, oval(…)
    function circ(x, y, r) {
      // Dynamic allocation: the buffer size depends on the runtime value of r.
      const buffer = new Array(r * 10);
      for (let i = 0; i < r * 10; i++) buffer[i] = Math.cos(i);
    }
Dynamic Allocation as SIMD Traversals
[Diagram: an allocation pass, allocCirc(…) 4, allocRect(…) 6, allocLine(…) 6, allocRect(…) 7, followed by a fill pass, fillCirc(…), fillRect(…), fillLine(…), fillRect(…).]
1. Prefix sum for needed space
2. Allocate buffers
3. Distribute offsets & compute
4. Give OpenGL buffer pointer
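The four steps can be sketched as follows. The shape list, the per-shape sizes, and the names `exclusiveScan` and `vertexBuffer` are made up for illustration. The scan is written sequentially, whereas a GPU version would use a parallel scan.

```javascript
// Hypothetical shapes; the sizes are illustrative, not the values on the slide.
const shapes = [
  { kind: "circ", size: 4 },
  { kind: "rect", size: 2 },
  { kind: "line", size: 3 },
  { kind: "rect", size: 1 },
];

// Step 1: exclusive prefix sum over the requested sizes. offsets[i] is where
// shape i starts; the final running total is the space needed overall.
function exclusiveScan(sizes) {
  const offsets = new Uint32Array(sizes.length);
  let total = 0;
  for (let i = 0; i < sizes.length; i++) {
    offsets[i] = total;
    total += sizes[i];
  }
  return { offsets, total };
}

const { offsets, total } = exclusiveScan(shapes.map((s) => s.size));

// Step 2: a single allocation for all shapes instead of one per shape.
const vertexBuffer = new Float32Array(total);

// Step 3: each shape fills only its own slice, so the fills are independent
// and could run in parallel.
shapes.forEach((shape, i) => {
  for (let j = 0; j < shape.size; j++) {
    vertexBuffer[offsets[i] + j] = Math.cos(j); // placeholder geometry
  }
});

// Step 4 (not shown): hand vertexBuffer to the renderer as a vertex buffer.
```

Splitting each draw call into a size-reporting pass and a fill pass is what removes the per-call `new Array(...)`: every size is known before any memory is allocated.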
CPU vs. GPU for Election Treemap: 5 traversals over 100K nodes. WebCL: 30X; WebCL: 70X; COMBINED: 54X!