




1 FODAVA-Lead Research
Dimension Reduction and Data Reduction: Foundations for Interactive Visualization
Haesun Park, Division of Computational Science and Engineering, Georgia Institute of Technology
FODAVA Review Meeting, Dec. 2009

2 Challenges in Analyzing High Dimensional Massive Data on a Visual Analytics System
Screen space and visual perception: the low display dimension and the limited number of available pixels are fundamentally limiting constraints
High dimensional data: effective dimension reduction
Large data sets: informative representation of the data
Speed, necessary for real-time interactive use: scalable algorithms, adaptive algorithms
Goal: development of fundamental theory and algorithms in data representations and transformations to enable visual understanding

3 FODAVA-Lead Research Topics
Dimension reduction: dimension reduction with prior information/interpretability constraints; manifold learning
Informative representation of large-scale data: sparse recovery by L1 penalty; clustering and semi-supervised clustering; multi-resolution data approximation
Fast algorithms: large-scale optimization/matrix decompositions; adaptive updating algorithms for dynamic, time-varying data and interactive visualization
Data fusion: fusion of different types of data from various sources; fusion of data with different uncertainty levels
Integration with DAVA systems

4 FODAVA-Lead Presentations
H. Park: Overview of proposed FODAVA research, introduction to the FODAVA Test Bed, dimension reduction of clustered data for effective representation, application to text, image, and audio data sets
A. Gray: Nonlinear dimension reduction (manifold learning), fast data analysis algorithms, formulation of problems as large-scale optimization problems (SDP)
V. Koltchinskii: Multiple kernel learning methods for fusion of data with heterogeneous types, sparse representation
R. Monteiro: Convex optimization, SDP, a novel approach to dimension reduction, compressed sensing and sparse representation
J. Stasko: Visual Analytics System demo, interplay between mathematics/computation and interactive visualization

5 Test Bed for Visual Analytics of High Dimensional Massive Data
Open source software
Integrates results from mathematics, statistics, and numerical algorithms/optimization across the FODAVA teams
Easily accessible to a wide community of researchers
Makes theory/algorithms relevant and readily available to the VA and applications community
Identifies effective methods for specific problems (evaluation)
(Diagram: FODAVA Fundamental Research, Test Bed, Applications)

6 Modules in a Data and Visual Analytics System for High Dimensional Massive Data
Data representation & transformation tasks: classification, clustering, regression, dimension reduction, density estimation, retrieval of similar items, automatic summarization, ...
(Diagram: raw data and data in input space flow through mathematical, statistical, and computational methods to visual representation and interaction, supporting analytical reasoning)

7 Modules in the FODAVA Test Bed
Vector representation of raw data: text, image, audio, ...
Informative representation and transformation: clustering, summarization, regression, multi-resolution data reduction, ...; using label, similarity, density, and missing-value information
Visual representation: dimension reduction (2D/3D), temporal trend, uncertainty, anomaly/outlier, causal relationship, zoom in/out by dynamic updating, ...
Interactive analysis

8 Research in Data Representations and Transformations (by H. Park's group)
2D/3D representation of data with prior information (J. Choo, J. Kim, K. Balasubramanian): two-stage dimension reduction for clustered data; Nonnegative Matrix Factorization (NMF) and Nonnegative Tensor Factorization (NTF) for nonnegative data
Clustering and classification (J. Kim, D. Kuang): new clustering algorithms based on NMF; semi-supervised clustering based on NMF
Sparse representation of data (J. Kim, V. Koltchinskii, R. Monteiro): sparse solutions for regression; sparse PCA
FODAVA Test Bed development (J. Choo, J. Kihm, H. Lee)

9 Nonnegativity Preserving Dimension Reduction
Nonnegative Matrix Factorization (NMF) (Paatero & Tapper 94, Lee & Seung NATURE 99, Pauca et al. SIAM DM 04, Hoyer 04, Lin 05, Berry 06, Kim & Park 06 Bioinformatics, Kim & Park 08 SIAM Journal on Matrix Analysis and Applications, ...)
A ≈ WH: min || A - WH ||_F subject to W >= 0, H >= 0
Why nonnegativity constraints? A trade-off of better approximation vs. better representation/interpretation; nonnegativity constraints are often physically meaningful, making interpretation of the analysis results possible
Fastest algorithm for NMF, with theoretical convergence (J. Kim and H. Park, ICDM 08), NMF/ANLS: iterate the following with an active-set-type method:
fixing W, solve min_{H>=0} || WH - A ||_F
fixing H, solve min_{W>=0} || H^T W^T - A^T ||_F
Sparse NMF can be used as a clustering algorithm

10 2D Representation: Utilize Cluster Structure if Known
2D representation of a 700 x 1000 data set with 7 clusters: LDA vs. SVD vs. PCA
(Figure panels: LDA+PCA(2), SVD(2), PCA(2))

11 Optimal Dimension Reducing Transformation
High quality clusters have small trace(S_w) and large trace(S_b)
Want F s.t. trace(F^T S_w F) is minimized and trace(F^T S_b F) is maximized:
max trace((F^T S_w F)^-1 (F^T S_b F)) → LDA (Fisher 36, Rao 48), LDA/GSVD (Park et al.)
max trace(F^T S_b F) with F^T F = I → Orthogonal Centroid (Park et al. 03)
max trace(F^T (S_w + S_b) F) with F^T F = I → PCA (Hotelling 33)
max trace(F^T A A^T F) with F^T F = I → LSI (Deerwester et al. 90)
Can easily be non-linearized using kernel functions
Optimal reduced dimension >> 3 in general
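The LDA trace-ratio criterion above reduces to a generalized eigenproblem, S_b f = lambda S_w f. A minimal sketch, assuming S_w is nonsingular (the LDA/GSVD variant on the slide handles the singular case); the function name lda_directions is ours:

```python
import numpy as np
from scipy.linalg import eigh  # solves the generalized symmetric eigenproblem


def lda_directions(X, labels, dim=2):
    """Sketch: maximize trace((F^T Sw F)^-1 (F^T Sb F)) by taking the
    top generalized eigenvectors of Sb f = lambda Sw f."""
    classes = np.unique(labels)
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in classes:
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)        # within-class scatter
        diff = (mc - mu)[:, None]
        Sb += len(Xc) * (diff @ diff.T)      # between-class scatter
    vals, vecs = eigh(Sb, Sw)                # ascending eigenvalues
    return vecs[:, ::-1][:, :dim]            # columns of the reducing F
```

Note that at most k-1 eigenvalues are nonzero for k clusters, which is why the two-stage schemes on the next slide are needed to reach exactly 2D.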

12 Two-stage Dimension Reduction for 2D Visualization of Clustered Data (J. Choo, S. Bohn, H. Park, VAST 09)
LDA + LDA = Rank-2 LDA
LDA + PCA
OCM + PCA
OCM + Rank-2 PCA on S_b = Rank-2 PCA on S_b (In-Spire)
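One of the two-stage pipelines above, OCM + PCA, is simple enough to sketch with NumPy alone: stage one projects onto an orthonormal basis of the cluster-centroid matrix (Orthogonal Centroid Method), stage two takes the top two principal components of the result. This is our illustrative reading of the scheme, not the Test Bed code; the function name ocm_plus_pca_2d is ours.

```python
import numpy as np


def ocm_plus_pca_2d(X, labels):
    """Sketch of OCM + PCA: reduce n x d data X to 2D using the cluster
    structure given by labels."""
    classes = np.unique(labels)
    # Stage 1 (OCM): orthonormal basis of the d x k centroid matrix via QR
    C = np.column_stack([X[labels == c].mean(axis=0) for c in classes])
    Q, _ = np.linalg.qr(C)
    Z = X @ Q                                  # n x k reduced data
    # Stage 2 (rank-2 PCA): top two principal components of the reduced data
    Zc = Z - Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:2].T                       # n x 2 coordinates for plotting
```

The LDA + PCA variant has the same shape, with the OCM basis replaced by LDA's discriminant directions.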

13 2D Visualization: Newsgroups
2D visualization of the Newsgroups data (21,347 dimensions, 770 items, 11 clusters)
Cluster labels: g: talk.politics.guns, p: talk.politics.misc, c: soc.religion.christian, r: talk.religion.misc, p: comp.sys.ibm.pc.hardware, a: comp.sys.mac.hardware, y: sci.crypt, d: sci.med, e: sci.electronics, f: misc.forsale, b: rec.sport.baseball
(Figure panels: Rank-2 LDA, LDA + PCA, OCM + PCA, Rank-2 PCA on S_b)

14 2D Visualization of Clustered Text, Image, and Audio Data
Medline data (text), LDA+PCA; cluster labels: h: heart attack, c: colon cancer, o: oral cancer, d: diabetes, t: tooth decay
(Figure panels: Rank-2 LDA vs. PCA for the Medline (text), facial (image), and spoken-letters (audio) data sets)

15 Visual Facial Recognizer: A Test Bed Application
Weizmann face data: (352 x 512 pixels each) x (28 persons x 52 images each)
Significant variations in angle, illumination, and facial expression
Problem: no data analytic algorithm alone is perfect, e.g., accuracy comparison:
PCA: 60%, LDA: 75%, TensorFaces: 69%, h-LDA: 81%

16 Visual Facial Recognizer: A Test Bed Application
Visually reduce the human's search space to efficiently utilize human visual recognition, e.g., Test Bed visualization of the Weizmann images using Rank-2 LDA

17 Summary / Future Research
Informative 2D/3D representation of data
Clustered data: two-stage dimension reduction methods effective for a wide range of problems
Interpretable dimension reduction for nonnegative data: NMF; new clustering algorithms based on NMF; semi-supervised clustering based on NMF; extension to tensors for time-series data
Customized fast algorithms for 2D/3D reduction needed
Dynamic updating methods for efficient and interactive visualization
Sparse methods with L1 regularization: sparse solutions for regression, sparse PCA
FODAVA Test Bed development
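The sparse regression item above, min_x 0.5 ||Ax - b||^2 + lambda ||x||_1, can be illustrated with a proximal-gradient (ISTA) solver. The slides do not name a particular algorithm, so this soft-thresholding sketch and the function name lasso_ista are our illustrative choices.

```python
import numpy as np


def lasso_ista(A, b, lam, n_iter=500):
    """Sketch: L1-regularized regression min_x 0.5*||Ax - b||^2 + lam*||x||_1
    solved by ISTA (gradient step on the smooth part, then soft-thresholding)."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = A.T @ (A @ x - b)                # gradient of 0.5*||Ax - b||^2
        z = x - g / L
        # Proximal operator of (lam/L)*||.||_1: componentwise soft-thresholding
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return x
```

Larger lam drives more coefficients exactly to zero, which is the sparsity the slide refers to; for lam above the maximum entry of |A^T b| the solution is identically zero.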

