FODAVA-Lead Education, Community Building, and Research:
Dimension Reduction and Data Reduction: Foundations for Interactive Visualization
Haesun Park
School of Computational Science and Engineering
Georgia Institute of Technology
FODAVA Review Meeting, Dec. 9, 2010
Challenges in Analyzing High Dimensional Massive Data on Visual Analytics System
- Screen space and visual perception: low dimensionality and the number of available pixels are fundamentally limiting constraints
- High dimensional data: effective dimension reduction
- Large data sets: informative representation of data
- Speed: necessary for real-time, interactive use
  - Scalable algorithms
  - Adaptive algorithms
Development of Fundamental Theory and Algorithms in Data Representations and Transformations to enable Visual Understanding
FODAVA-Lead Research Topics
- Dimension Reduction
  - Dimension reduction with prior info/interpretability constraints
  - Manifold learning
- Informative Presentation of Large Scale Data
  - Sparse recovery by L1 penalty
  - Clustering, semi-supervised clustering
  - Multi-resolution data approximation
- Fast Algorithms
  - Large-scale optimization/matrix decompositions
  - Adaptive updating algorithms for dynamic and time-varying data, and interactive vis.
- Data Fusion
  - Fusion of different types of data from various sources
  - Fusion of different uncertainty level
- Integration with DAVA systems
  - Testbed, Jigsaw, iVisClassifier, iVisClustering, ..
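As an illustration of the "sparse recovery by L1 penalty" topic, the following is a minimal sketch of L1-penalized least squares solved by iterative soft-thresholding (ISTA). The solver choice, the function names `soft_threshold` and `ista_lasso`, and the synthetic data are assumptions made for this example; the slides do not say which algorithm the FODAVA team uses.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista_lasso(A, b, lam, n_iter=1000):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 by iterative soft-thresholding."""
    # Step size 1/L, where L = ||A||_2^2 is the Lipschitz constant of the gradient.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)              # gradient of the least-squares term
        x = soft_threshold(x - grad / L, lam / L)
    return x

# Synthetic example (sizes and values are illustrative assumptions).
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 30))
x_true = np.zeros(30)
x_true[[2, 11, 25]] = [3.0, -2.0, 4.0]        # only three nonzero coefficients
b = A @ x_true
x_hat = ista_lasso(A, b, lam=0.1)
print(np.round(x_hat[[2, 11, 25]], 3))
```

The L1 penalty drives most coefficients to exactly zero, which is what makes the recovered representation compact enough to present on a limited display.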
FODAVA-Lead Research Presentation
- H. Park: Overview of the FODAVA-lead research; FODAVA Test-bed; Two-stage method for 2D/3D representation of clustered data; InteractiveVisualClassifier; InteractiveVisualClustering; Info space alignments for information fusion (multi-language document analysis)
- A. Gray: Nonlinear dimension reduction (manifold learning); Fast computation of neighborhood graphs; Fast optimizations for SVMs
- V. Koltchinskii: Low rank matrix estimation and kernel learning on graphs; Sparse recovery; Multiple kernel learning and fusion of data with heterogeneous types (multi-language document analysis)
- J. Stasko: Improved analytical capabilities in JIGSAW; Interplay between math/comp and interactive visualization
- R. Monteiro: Sparse Principal Component Analysis and Feature selection based on L1 regularized optimization (POSTER)
FODAVA Research Test Bed for High Dimensional Massive Data
- Open source software
- Integrates foundational results from FODAVA teams as well as other widely utilized methods (e.g. PCA)
- Easily accessible to a wide community of researchers
- Makes methods/algorithms readily available to VA research community and relevant to applications
- Identifies effective methods for specific problems (evaluation)
- A base for specialized VA systems (e.g. iVisClassifier, iVisClustering)
[Diagram labels: FODAVA Fundamental Research; Applications; Test Bed]
Modules in FODAVA Test Bed
[Diagram labels, in extraction order]
- Vector Rep. of Raw Data: Text, Image, Audio, …
- Informative Representation and Transformation
- Visual Representation
- Dimension Reduction (2D/3D), Temporal Trend, Uncertainty, Anomaly/Outlier, Causal relationship, Zoom in/out by dynamic updating, …
- Clustering, Summarization, Regression, Multi-Resolution Data Reduction, Multiple Kernel Learning, …
- Label, Similarity, Density, Missing value, …
- Interactive Analysis
iVisClassifier [VAST10] (J. Choo, H. Lee, J. Kim, HP)
- Interactive visual classification system using supervised dimension reduction
  - Biometric recognition
  - Text classification
  - Search space reduction
iVisClustering (H. Lee, J. Kihm, J. Choo, J. Stasko, HP)
- Interactive visual clustering system using topic modeling (LDA, here latent Dirichlet allocation) for text clustering
Two-stage Linear Discriminant Analysis for 2D/3D Representation of Clustered Data and Computational Zooming in/out [VAST09, J. Choo, S. Bohn, HP]
$\max\,(G^T S_b G)$, $\min\,(G^T S_w G)$ & $\max \operatorname{trace}\big((G^T S_w G)^{-1}(G^T S_b G)\big)$
Regularization in LDA
[Figure panels: Small regularization; Large regularization]
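The trace criterion above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' two-stage implementation: it assumes that $S_w$ and $S_b$ are the usual within-class and between-class scatter matrices, that regularization means replacing $S_w$ with $S_w + \gamma I$, and that the columns of $G$ are the leading generalized eigenvectors. The names `scatter_matrices`, `regularized_lda`, `trace_criterion`, and `gamma` are hypothetical.

```python
import numpy as np

def scatter_matrices(X, y):
    """Return within-class (S_w) and between-class (S_b) scatter matrices."""
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        S_w += (Xc - mean_c).T @ (Xc - mean_c)
        S_b += len(Xc) * np.outer(mean_c - mean_all, mean_c - mean_all)
    return S_w, S_b

def regularized_lda(X, y, n_components=2, gamma=1e-3):
    """Columns of G solve S_b g = lambda (S_w + gamma I) g for the largest lambda."""
    S_w, S_b = scatter_matrices(X, y)
    d = X.shape[1]
    # Whiten with the Cholesky factor so the problem becomes symmetric.
    L_inv = np.linalg.inv(np.linalg.cholesky(S_w + gamma * np.eye(d)))
    A = L_inv @ S_b @ L_inv.T
    vals, vecs = np.linalg.eigh((A + A.T) / 2)
    top = np.argsort(vals)[::-1][:n_components]
    return L_inv.T @ vecs[:, top]

def trace_criterion(G, S_w, S_b, gamma):
    """trace((G^T (S_w + gamma I) G)^{-1} (G^T S_b G))."""
    within = G.T @ (S_w + gamma * np.eye(S_w.shape[0])) @ G
    return np.trace(np.linalg.solve(within, G.T @ S_b @ G))

# Synthetic clustered data: 3 clusters in 10 dimensions (illustrative only).
rng = np.random.default_rng(0)
centers = 3.0 * rng.standard_normal((3, 10))
X = np.vstack([c + rng.standard_normal((40, 10)) for c in centers])
y = np.repeat(np.arange(3), 40)

G = regularized_lda(X, y, n_components=2, gamma=1e-3)
Y2d = X @ G   # 2D coordinates that could be drawn as a scatter plot
```

With a small `gamma` the projection is driven mostly by $S_w$ and tends to show tight clusters; with a large `gamma` the term $\gamma I$ dominates $S_w$ and the directions approach the leading eigenvectors of $S_b$.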
2D Visualization of Clustered Image and Audio Data
[Figure panels: Spoken Letters (Audio): PCA, Rank-2 LDA; Handwritten Digits (Image): PCA, Rank-2 LDA]
iVisClassifier: Computational Zoom-in
- Views: LDA scatter plot, Cluster level PC, Bases view and Heat Map
- Applying LDA recursively on the selected subset of data
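The idea of applying LDA recursively on a selected subset can be sketched as follows. This is an assumption-laden illustration rather than iVisClassifier code: `lda_basis`, `zoom_in`, and `separation` are hypothetical helpers, the LDA is a regularized variant solved as a generalized eigenproblem, and the data are synthetic, with two clusters that nearly coincide in the global view.

```python
import numpy as np

def lda_basis(X, y, n_components=2, gamma=1e-3):
    """Regularized LDA basis from the generalized eigenproblem of (S_b, S_w + gamma I)."""
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_w += (Xc - mc).T @ (Xc - mc)
        S_b += len(Xc) * np.outer(mc - mean_all, mc - mean_all)
    L_inv = np.linalg.inv(np.linalg.cholesky(S_w + gamma * np.eye(d)))
    A = L_inv @ S_b @ L_inv.T
    vals, vecs = np.linalg.eigh((A + A.T) / 2)
    return L_inv.T @ vecs[:, np.argsort(vals)[::-1][:n_components]]

def zoom_in(X, y, selected_classes, n_components=2, gamma=1e-3):
    """Recompute LDA using only the points whose class was selected by the user."""
    mask = np.isin(y, selected_classes)
    X_sub, y_sub = X[mask], y[mask]
    return X_sub @ lda_basis(X_sub, y_sub, n_components, gamma), y_sub

def separation(Y, y, a, b):
    """Centroid distance between classes a and b, relative to their spread."""
    Ya, Yb = Y[y == a], Y[y == b]
    spread = np.sqrt(0.5 * (Ya.var(axis=0).sum() + Yb.var(axis=0).sum()))
    return np.linalg.norm(Ya.mean(axis=0) - Yb.mean(axis=0)) / spread

# Classes 0 and 1 are close together; classes 2 and 3 are far away.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0, 0],
                    [0, 0, 0, 2, 0],
                    [20, 0, 0, 0, 0],
                    [0, 20, 0, 0, 0]], dtype=float)
X = np.vstack([c + 0.5 * rng.standard_normal((50, 5)) for c in centers])
y = np.repeat(np.arange(4), 50)

Y_global = X @ lda_basis(X, y)                         # global view
Y_zoom, y_zoom = zoom_in(X, y, selected_classes=[0, 1])  # zoomed view
print(separation(Y_global, y, 0, 1), separation(Y_zoom, y_zoom, 0, 1))
```

In the global view the two distant clusters dominate the between-class scatter, so classes 0 and 1 overlap; refitting on the selection lets the projection spend its dimensions on the difference that matters within the subset.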
iVisClassifier: Cooperative Filtering (Poster and Demo)
- Utilizing brushing-and-linking
Fusion based on Information Space Alignment (J. Choo, S. Bohn, G. Nakamura, A. White, HP)
- Want: Unified vector representations of heterogeneous data sets
- Utilize: Reference correspondence information between data pairs, cluster correspondence, etc.
- Multi-lingual iVisClassifier
- Two conflicting criteria: maximize alignment and minimize deformation
- Existing methods: Constrained Laplacian Eigenmap, Parafac2, Procrustes analysis, …
[Diagram labels: Data set A (English); Data set B (Spanish); Fused data sets]
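Of the existing methods listed, Procrustes analysis is the simplest to sketch. The example below is a minimal orthogonal Procrustes alignment fitted on reference pairs only and then applied to the whole data set. It assumes both data sets already live in spaces of the same dimension, and the names `fit_procrustes` and `apply_procrustes` are hypothetical. Because the map is a rotation or reflection plus a translation, it introduces no deformation at all, which places it at one extreme of the alignment versus deformation trade-off.

```python
import numpy as np

def fit_procrustes(B_ref, A_ref):
    """Find orthogonal R and the two means so that (B_ref - mu_b) @ R + mu_a ~ A_ref."""
    mu_a, mu_b = A_ref.mean(axis=0), B_ref.mean(axis=0)
    # Minimizing ||(B_ref - mu_b) R - (A_ref - mu_a)||_F over orthogonal R
    # is solved by the SVD of the cross-covariance matrix.
    U, _, Vt = np.linalg.svd((B_ref - mu_b).T @ (A_ref - mu_a))
    return U @ Vt, mu_a, mu_b

def apply_procrustes(B, R, mu_a, mu_b):
    """Map every point of B into the coordinate system of A."""
    return (B - mu_b) @ R + mu_a

# Synthetic example: B is a rotated and shifted copy of A (illustrative only).
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 3))
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # random orthogonal matrix
B = A @ Q + np.array([1.0, -2.0, 0.5])

ref = np.arange(10)                                 # indices of known reference pairs
R, mu_a, mu_b = fit_procrustes(B[ref], A[ref])
B_aligned = apply_procrustes(B, R, mu_a, mu_b)
print(np.abs(B_aligned - A).max())
```

Real heterogeneous data sets rarely differ by a rigid motion alone, which is why methods that allow some deformation in exchange for better alignment are of interest.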
Graph Embedding Approach (POSTER)
1. Represent each data matrix as a graph
2. Add zero-length edges between reference point pairs
3. Apply graph embedding algorithm, e.g., nonmetric multidimensional scaling (preserving rank order of distances):
$$\min \sum \big(d^{f}_{A}(i,j) - \dot{d}_{A}(i,j)\big)^2 + \sum \big(d^{f}_{B}(i,j) - \dot{d}_{B}(i,j)\big)^2 + \mu \sum \big(d^{f}_{AB}(r,r) - \dot{d}_{AB}(r,r)\big)^2$$
subject to $\dot{d}_{AB}(r,r) < \dot{d}_{A}(i,j)$, $\dot{d}_{AB}(r,r) < \dot{d}_{B}(i,j)$ for $1 \le r \le R$ and $i \ne j$, where $\dot{d}$ denotes rank orders
[Diagram labels: Data sets; Similarity graph; Fused data; Matrix representation of graphs]
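The three steps can be sketched end to end under simplifying assumptions. To stay short and dependency-free, this example uses a complete Euclidean-distance graph for each data set instead of a similarity (neighborhood) graph, and it swaps in classical metric MDS on graph shortest-path distances in place of the constrained nonmetric MDS named on the slide. All function names are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def fused_graph_distances(A, B, ref_pairs):
    """Steps 1 and 2: one graph per data set, joined by zero-length reference edges."""
    n_a, n = len(A), len(A) + len(B)
    D = np.full((n, n), np.inf)                 # no edge between A and B by default
    D[:n_a, :n_a] = pairwise_distances(A)
    D[n_a:, n_a:] = pairwise_distances(B)
    for i, j in ref_pairs:                      # A[i] and B[j] are known to correspond
        D[i, n_a + j] = D[n_a + j, i] = 0.0
    for k in range(n):                          # Floyd-Warshall shortest paths
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D

def classical_mds(D, n_components=2):
    """Step 3 (swapped-in embedding): classical MDS via double centering."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    K = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh((K + K.T) / 2)
    top = np.argsort(vals)[::-1][:n_components]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

# Two data sets with different dimensionality (illustrative only).
rng = np.random.default_rng(0)
A = rng.standard_normal((12, 3))
B = rng.standard_normal((12, 5))
ref_pairs = [(0, 0), (1, 1), (2, 2), (3, 3)]

D = fused_graph_distances(A, B, ref_pairs)
Z = classical_mds(D, n_components=2)
Z_a, Z_b = Z[:12], Z[12:]                       # fused coordinates of each data set
```

Because a zero-length edge gives both endpoints identical shortest-path distances to every other node, each reference pair lands on the same point in the fused space. The nonmetric formulation on the slide relaxes this through the weight µ and the rank-order constraints.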
Evaluation: Cross-domain Retrieval
[Figure panels: English-Spanish Documents; Document (Eng)-Phoneme Data. Measures: Deformation; Alignment. Methods compared: Parafac2, Nonmetric MDS, Metric MDS, Laplacian Eig., Procrustes. Axis label: K in K-NN in fused space]
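One plausible way to score cross-domain retrieval as a function of K is sketched below: for each item in data set A, check whether its true counterpart in data set B is among its K nearest neighbors from B in the fused space. The slide does not define the measure it plots, so this definition, the function name `cross_domain_recall_at_k`, and the convention that row i of A corresponds to row i of B are all assumptions.

```python
import numpy as np

def cross_domain_recall_at_k(Z_a, Z_b, k):
    """Fraction of A items whose counterpart in B (same row index)
    is among the k nearest B items in the fused space."""
    dists = np.linalg.norm(Z_a[:, None, :] - Z_b[None, :, :], axis=-1)
    nearest = np.argsort(dists, axis=1)[:, :k]          # k nearest B items per A item
    hits = (nearest == np.arange(len(Z_a))[:, None]).any(axis=1)
    return hits.mean()

# Toy fused coordinates (illustrative only).
Z_a = np.array([[0.0], [1.0], [2.0], [3.0]])
Z_b_good = Z_a + 0.1        # counterparts sit right next to each other
Z_b_bad = Z_a[::-1]         # counterparts are mismatched

for k in (1, 2, 4):
    print(k, cross_domain_recall_at_k(Z_a, Z_b_good, k),
          cross_domain_recall_at_k(Z_a, Z_b_bad, k))
```

Sweeping K traces out a curve per fusion method, which is one way a comparison across methods like the one on this slide could be produced.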
Summary / Future Research
- Informative 2D/3D Representation of Data
  - Clustered Data: Two-stage dimension reduction methods effective for a wide range of problems
  - Interpretable Dimension Reduction for nonnegative data: NMF
  - Customized Fast Algorithms for 2D/3D Reduction needed
  - Dynamic Updating methods for Efficient and Interactive Visualization
- Visual Analytic Methods for Foundational Problems
  - Classification
  - Clustering
  - Information Fusion via Space Alignment
- FODAVA Research Test bed and VA System Development
- Sparse methods with L1 regularization
  - Sparse Solution for Regression
  - Sparse PCA (with Renato Monteiro)
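The NMF item in the summary can be illustrated with a minimal sketch. It uses the classical multiplicative update rules for the Frobenius-norm objective, chosen here only for brevity; the slides do not state which NMF algorithm the team uses, and the function name `nmf_multiplicative` and the synthetic data are assumptions.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Approximate nonnegative V (m x n) as W @ H with W >= 0 and H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + 0.1      # strictly positive initialization
    H = rng.random((rank, n)) + 0.1
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic nonnegative data of exact rank 2 (illustrative only).
rng = np.random.default_rng(1)
V = rng.random((20, 2)) @ rng.random((2, 15))

W, H = nmf_multiplicative(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(rel_err)
```

Because both factors are nonnegative, each column of `W` can be read as an additive part (for example, a topic in text data) and each column of `H` as the mixture weights of one data item, which is the sense in which the reduction is interpretable.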