1
Visualizing the Legislature
Mugizi Robert Rwebangira
Howard University - Systems and Computer Science
October 29, 2010
2
Technological Trends
- More Processing Power (Moore’s Law)
- More Bandwidth
- More Storage
3
BIG DATA
- 1,200 exabytes (about 1.2 billion terabytes) of data generated in 2008
- More than was generated in the first 6,000 years of human history
- Growing at 60% per year
Source: The Economist, “The Data Deluge”, February 26, 2010
4
Problem
- Storing this data
- Processing this data
- Understanding this data
5
High n, High d
- Large n: many data points
- Large d: many dimensions (features) per data point
6
The Challenge
- Classical statistics is geared towards the “large n, small d” case.
- “Curse of dimensionality”: many algorithms have running times exponential in the number of dimensions.
7
One Solution
- Data visualization: try to picture the data in a way convenient for humans.
- Related to “dimensionality reduction”: go from large d to small d while sacrificing as little accuracy of representation as possible.
8
APPLICATION: POLITICS
- Getting data on congressional votes used to be difficult; it is now easily downloadable from a government web site.
- For example, the US Senate takes about 600 votes a session.
- Question: how do we present this data in a useful way?
9
The Data
Senator Clinton   YNANYYAYAYNANN…
Senator Obama     YAYNAYYAYNNANN…
Senator McCain    NANYANNANYYAYY…
Y = Aye, N = Nay, A = Absent
10
Solution: Math
- We want to PROJECT these points into 2 dimensions while preserving the features of the dataset (i.e., similar senators should be close together).
- Encoding: Yes => 1, No => -1, Absent => 0
- So we get 100 vectors with entries in {-1, 0, 1}, i.e., 100 points in 600-dimensional space:
Senator Clinton:  1 -1 0 -1 1 1 0…
Senator Obama:    1 0 1 -1 0 1 1 0…
Senator McCain:   -1 0 -1 1 0 -1 -1…
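A minimal Python sketch of this encoding, assuming each senator's record is a string of Y/N/A characters as on the previous slide. The three strings are only the visible prefixes shown there (not full voting records), and the function names are hypothetical.

```python
import math

# Encoding from the slide: Yes -> 1, No -> -1, Absent -> 0
VOTE_CODES = {"Y": 1, "N": -1, "A": 0}

def encode_votes(record):
    """Turn a string such as 'YNA...' into a list of {-1, 0, 1} values."""
    return [VOTE_CODES[c] for c in record]

def euclidean_distance(u, v):
    """Euclidean distance between two senators' vote vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Visible prefixes of the vote strings (illustration only, not full records)
records = {
    "Clinton": "YNANYYAYAYNANN",
    "Obama": "YAYNAYYAYNNANN",
    "McCain": "NANYANNANYYAYY",
}
X = {name: encode_votes(r) for name, r in records.items()}

print(X["Clinton"][:7])                                         # [1, -1, 0, -1, 1, 1, 0]
print(round(euclidean_distance(X["Clinton"], X["Obama"]), 3))   # 3.162
print(round(euclidean_distance(X["Clinton"], X["McCain"]), 3))  # 5.477
```

On these short prefixes Clinton is closer to Obama (distance √10) than to McCain (√30), which is the kind of structure a good 2-D picture should keep.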
11
Dataset: Senate Roll Call
- 110th Senate: 2007-2008
- d = 634 votes
- n = 102 senators
- Data can be obtained from here: http://voteview.com/senate110.htm
12
Random Linear Projection
- Very simple technique with sophisticated mathematical underpinnings.
- Johnson-Lindenstrauss Lemma: any n points in high-dimensional space can be randomly projected into O(log n) dimensions without “too much” distortion of their pairwise distances.
- Underlying reason: the “concentration of measure” phenomenon.
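For reference, one standard formulation of the lemma, written with the row-vector convention Y = X * R used in these slides (the constant hidden in the O(·) differs between proofs):

```latex
\textbf{Johnson-Lindenstrauss.} For any $0 < \varepsilon < 1$ and any points
$x_1, \dots, x_n \in \mathbb{R}^d$, there is a linear map
$f : \mathbb{R}^d \to \mathbb{R}^k$ with $k = O(\varepsilon^{-2} \log n)$ such that
\[
(1 - \varepsilon)\,\lVert x_i - x_j \rVert^2
\;\le\; \lVert f(x_i) - f(x_j) \rVert^2
\;\le\; (1 + \varepsilon)\,\lVert x_i - x_j \rVert^2
\quad \text{for all } i, j.
\]
Taking $f(x) = \tfrac{1}{\sqrt{k}}\, x R$, where the $d \times k$ matrix $R$ has
entries drawn i.i.d.\ from $N(0,1)$, satisfies this with high probability.
```

The required k grows only with log n and does not depend on d. The flip side is that the guarantee is loose at k = 2, so a 2-D random projection is best read as a fast heuristic for plotting rather than a distance-preserving embedding.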
13
Random Linear Projection
- Take the n x d data matrix X.
- Create a d x 2 matrix R of random numbers drawn from the N(0,1) distribution.
- Compute the projection Y = X * R (an n x 2 matrix).
- VERY fast: O(nd) time for a fixed target dimension of 2.
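The three steps above, sketched in pure Python. This is a minimal illustration assuming X is stored as a list of rows; `random_projection` is a hypothetical name, and the toy input reuses the visible vote-string prefixes from the earlier slide rather than the real dataset.

```python
import random

def random_projection(X, k=2, seed=None):
    """Project an n x d matrix X (a list of rows) to n x k via Y = X R.

    R is a d x k matrix whose entries are drawn i.i.d. from N(0, 1),
    as on the slide. No 1/sqrt(k) scaling is applied; it would only
    rescale the plot.
    """
    rng = random.Random(seed)
    d = len(X[0])
    R = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(d)]
    # Y[i][j] = sum over t of X[i][t] * R[t][j]
    return [[sum(x_t * R[t][j] for t, x_t in enumerate(row)) for j in range(k)]
            for row in X]

# Toy input: the visible vote-string prefixes, encoded Y -> 1, N -> -1, A -> 0
codes = {"Y": 1, "N": -1, "A": 0}
names = ["Clinton", "Obama", "McCain"]
strings = ["YNANYYAYAYNANN", "YAYNAYYAYNNANN", "NANYANNANYYAYY"]
X = [[codes[c] for c in s] for s in strings]

Y = random_projection(X, k=2, seed=42)
for name, (y1, y2) in zip(names, Y):
    print(f"{name:8s} {y1:8.3f} {y2:8.3f}")
```

Because the map is linear and, on these prefixes, McCain's vector is exactly the negation of Obama's, their projected points are mirror images through the origin for any R. Different seeds give different pictures; only the rough distance structure is preserved.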
15
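Principal Component Analysis
- Let X be the n x d data matrix, where n = number of senators and d = number of votes.
- We want to compute an n x 2 matrix Y.
- PCA computes the Y whose first column captures as much of the variance in the data as possible, and whose second column captures as much of the remaining variance as possible.
- Can be computed by Singular Value Decomposition.
- Can be thought of as the “optimal” LINEAR projection: it minimizes the squared reconstruction error among all projections onto 2 dimensions.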
16
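Principal Component Analysis (cont.)
- Center each column of X (subtract its mean), then take the Singular Value Decomposition $X = U \Sigma V^T$.
- Then $Y = X V_2 = U_2 \Sigma_2$, where $V_2$ holds the first two columns of $V$ (the top two principal directions).
- Can be computed quickly: $O(n^2 d)$ time when $n \le d$, e.g. by forming the $n \times n$ matrix $X X^T$ in $O(n^2 d)$ and eigendecomposing it in $O(n^3)$.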
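A minimal pure-Python sketch of PCA, assuming the rows of X are senators' vote vectors. It finds the first two columns of V by power iteration on $X^T X$ instead of computing a full SVD; that is a swapped-in method chosen only to keep the example dependency-free, and the function names are hypothetical. For real data a library routine such as NumPy's `numpy.linalg.svd` would be the usual choice, since this loop is slow at n = 102, d = 634.

```python
import math
import random

def center_columns(X):
    """Subtract each column's mean; PCA works on column-centered data."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]

def _orthonormalize(v, basis):
    """Gram-Schmidt against the unit vectors in basis, then normalize v."""
    for u in basis:
        dot = sum(a * b for a, b in zip(v, u))
        v = [a - dot * b for a, b in zip(v, u)]
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 1e-12 else v

def principal_directions(Xc, k=2, iters=500, seed=0):
    """Top-k principal directions (first k columns of V) by power iteration.

    Each step multiplies v by Xc^T Xc without forming that d x d matrix.
    Assumes Xc has rank >= k. Directions are only defined up to sign.
    """
    rng = random.Random(seed)
    d = len(Xc[0])
    found = []
    for _ in range(k):
        v = _orthonormalize([rng.gauss(0.0, 1.0) for _ in range(d)], found)
        for _ in range(iters):
            s = [sum(a * b for a, b in zip(row, v)) for row in Xc]              # Xc v
            w = [sum(si * row[j] for si, row in zip(s, Xc)) for j in range(d)]  # Xc^T (Xc v)
            v = _orthonormalize(w, found)  # stay orthogonal to earlier directions
        found.append(v)
    return found

def pca_2d(X):
    """Y = Xc V_2: each row's coordinates along the top two principal directions."""
    Xc = center_columns(X)
    V2 = principal_directions(Xc, k=2)
    return [[sum(a * b for a, b in zip(row, v)) for v in V2] for row in Xc]

toy = [[1, 0], [-1, 0], [0, 0.5], [0, -0.5]]
print(pca_2d(toy))
```

On the toy data the spread along the first axis is larger, so it becomes the first principal direction; the signs of the output columns may flip from run to run of different seeds, which does not change the picture.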
17
Results
18
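Multidimensional Scaling
- Compute the “optimal” low-dimensional configuration: the one whose pairwise distances best match the original Euclidean distances.
- Finding the exact optimum is hard in general (the usual objective is non-convex), but we can compute good approximations.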
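The slide does not say which approximation was used. As one concrete example, the sketch below implements classical (Torgerson) MDS, a standard closed-form variant: square the distances, double-center them, and keep the top eigenvectors. It assumes the input distances are Euclidean, and the function names are hypothetical. Iterative stress minimization (e.g. SMACOF) is another common choice. With Euclidean input distances, classical MDS reproduces the PCA configuration up to sign.

```python
import math
import random

def euclidean_distance_matrix(X):
    """Pairwise Euclidean distances between the rows of X."""
    return [[math.dist(a, b) for b in X] for a in X]

def _orthonormalize(v, basis):
    """Gram-Schmidt against the unit vectors in basis, then normalize v."""
    for u in basis:
        dot = sum(a * b for a, b in zip(v, u))
        v = [a - dot * b for a, b in zip(v, u)]
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 1e-12 else v

def classical_mds(D, k=2, iters=1000, seed=0):
    """Classical MDS from an n x n matrix D of Euclidean distances.

    Builds B = -1/2 J D^2 J (J = centering matrix), finds its top-k
    eigenvectors by power iteration, and scales them by sqrt(eigenvalue).
    Euclidean D makes B positive semidefinite, so the largest eigenvalues
    dominate the iteration.
    """
    n = len(D)
    sq = [[dij * dij for dij in row] for row in D]
    means = [sum(row) / n for row in sq]  # row means = column means (D is symmetric)
    grand = sum(means) / n
    B = [[-0.5 * (sq[i][j] - means[i] - means[j] + grand) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    vecs, vals = [], []
    for _ in range(k):
        v = _orthonormalize([rng.gauss(0.0, 1.0) for _ in range(n)], vecs)
        for _ in range(iters):
            v = _orthonormalize([sum(b * x for b, x in zip(row, v)) for row in B], vecs)
        # Rayleigh quotient v^T B v is the eigenvalue for a unit eigenvector v
        lam = sum(vi * sum(b * x for b, x in zip(row, v)) for vi, row in zip(v, B))
        vecs.append(v)
        vals.append(max(lam, 0.0))  # guard against tiny negative round-off
    return [[math.sqrt(lam) * v[i] for v, lam in zip(vecs, vals)] for i in range(n)]

pts = [[0, 0], [3, 0], [0, 4], [3, 4]]
print(classical_mds(euclidean_distance_matrix(pts)))
```

For points that already live in 2 dimensions, as in this toy rectangle, the output reproduces every pairwise distance up to round-off; for the 634-dimensional vote vectors it gives the best 2-D fit in this classical sense.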
19
19
20
Related Work
- Persi Diaconis: using MDS on legislator data
21
Other Ideas
- 3-D projections?
- Immersive projections? (Minority Report!)
22
Conclusions
- Computational visualization techniques will only become more important.
- Many applications, e.g., in business intelligence and bioinformatics.
23
Questions?