FODAVA-Lead Education, Community Building, and Research: Dimension Reduction and Data Reduction: Foundations for Interactive Visualization
Haesun Park


FODAVA-Lead Education, Community Building, and Research: Dimension Reduction and Data Reduction: Foundations for Interactive Visualization
Haesun Park, School of Computational Science and Engineering, Georgia Institute of Technology
FODAVA Review Meeting, Dec. 9, 2010

Challenges in Analyzing High-Dimensional Massive Data on Visual Analytics Systems

- Screen space and visual perception: the low dimensionality of the screen and the limited number of available pixels are fundamentally limiting constraints
- High-dimensional data: requires effective dimension reduction
- Large data sets: require informative representation of the data
- Speed, necessary for real-time, interactive use: scalable algorithms; adaptive algorithms

Goal: development of fundamental theory and algorithms in data representations and transformations to enable visual understanding.

FODAVA-Lead Research Topics

- Dimension reduction: dimension reduction with prior information/interpretability constraints; manifold learning
- Informative representation of large-scale data: sparse recovery via an L1 penalty; clustering and semi-supervised clustering; multi-resolution data approximation
- Fast algorithms: large-scale optimization and matrix decompositions; adaptive updating algorithms for dynamic and time-varying data and for interactive visualization
- Data fusion: fusion of different types of data from various sources; fusion of data with different uncertainty levels
- Integration with DAVA systems: test bed, Jigsaw, iVisClassifier, iVisClustering, …
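The "sparse recovery via an L1 penalty" item can be illustrated with iterative soft-thresholding (ISTA), a standard solver for the L1-regularized least-squares (lasso) problem. This is a generic sketch, not the project's own code; `soft_threshold` and `ista` are hypothetical helper names introduced here for illustration.

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of the L1 penalty: shrinks each entry toward zero."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def ista(A, b, lam, n_iter=500):
    """ISTA for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        # Gradient step on the quadratic term, then soft-thresholding.
        x = soft_threshold(x - A.T @ (A @ x - b) / L, lam / L)
    return x

# Recover a 3-sparse signal from 50 random linear measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
x_true = np.zeros(100)
x_true[[3, 40, 77]] = [2.0, -1.5, 1.0]
b = A @ x_true
x_hat = ista(A, b, lam=0.1)
```

With far fewer measurements than dimensions, the L1 penalty still recovers the support of the sparse signal, which is the property the research topic above exploits.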

FODAVA-Lead Research Presentations

- H. Park: overview of the FODAVA-lead research and the FODAVA test bed; two-stage method for 2D/3D representation of clustered data; interactive visual classification and clustering systems (iVisClassifier, iVisClustering); information-space alignment for information fusion (multi-language document analysis)
- A. Gray: nonlinear dimension reduction (manifold learning); fast computation of neighborhood graphs; fast optimization for SVMs
- V. Koltchinskii: low-rank matrix estimation and kernel learning on graphs; sparse recovery; multiple kernel learning and fusion of data with heterogeneous types (multi-language document analysis)
- J. Stasko: improved analytical capabilities in Jigsaw; interplay between mathematics/computation and interactive visualization
- R. Monteiro: sparse principal component analysis and feature selection based on L1-regularized optimization (poster)

FODAVA Research Test Bed for High-Dimensional Massive Data

- Open-source software
- Integrates foundational results from the FODAVA teams as well as other widely utilized methods (e.g., PCA)
- Easily accessible to a wide community of researchers
- Makes methods/algorithms readily available to the VA research community and relevant to applications
- Identifies effective methods for specific problems (evaluation)
- A base for specialized VA systems (e.g., iVisClassifier, iVisClustering)

[Diagram: FODAVA fundamental research feeding the test bed, which in turn feeds applications]
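PCA, cited above as one of the widely utilized baseline methods, reduces to a centered SVD. The following is a minimal illustrative sketch, not test-bed code; `pca_project` is a hypothetical helper name.

```python
import numpy as np

def pca_project(X, k=2):
    """Project the rows of X onto their top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)                       # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                          # n x k coordinates for plotting

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 50))                # 200 points in 50 dimensions
Y = pca_project(X, k=2)                           # 2D coordinates for the screen
```

The two output columns are uncorrelated and ordered by decreasing variance, which is why PCA is a natural default for 2D screen layouts.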

Interactive Analysis Modules in the FODAVA Test Bed

[Diagram: pipeline from vector representations of raw data (text, image, audio, …), through informative representation and transformation (clustering, summarization, regression, multi-resolution data reduction, multiple kernel learning, …, using labels, similarity, density, missing values, …), to visual representation (2D/3D dimension reduction, temporal trends, uncertainty, anomaly/outlier detection, causal relationships, zoom in/out by dynamic updating, …)]

iVisClassifier [VAST '10] (J. Choo, H. Lee, J. Kim, H. Park)
- Interactive visual classification system using supervised dimension reduction
- Applications: biometric recognition, text classification, search-space reduction

iVisClustering (H. Lee, J. Kihm, J. Choo, J. Stasko, H. Park)
- Interactive visual clustering system using topic modeling (LDA) for text clustering

Two-Stage Linear Discriminant Analysis for 2D/3D Representation of Clustered Data and Computational Zooming In/Out [VAST '09, J. Choo, S. Bohn, H. Park]

The projection G should maximize between-class scatter while minimizing within-class scatter:

  max_G trace(G^T S_b G)  and  min_G trace(G^T S_w G),
  combined as  max_G trace((G^T S_w G)^{-1} (G^T S_b G))

Regularization in LDA:
[Figures: 2D layouts obtained with small vs. large regularization]
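The trace criterion above can be solved as a generalized eigenproblem, with a small ridge term standing in for the regularization the slide mentions. This is a generic rank-2 LDA sketch under those assumptions, not the paper's two-stage algorithm; `lda_2d` is a hypothetical helper name.

```python
import numpy as np

def lda_2d(X, labels, reg=1e-3):
    """Rank-2 LDA: maximize trace((G^T Sw G)^{-1} (G^T Sb G)) over G (d x 2).

    Solved via symmetric whitening of Sw + reg*I followed by an
    eigendecomposition; `reg` is the regularization term in regularized LDA.
    """
    d = X.shape[1]
    mu = X.mean(axis=0)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)                  # within-class scatter
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)     # between-class scatter
    evals, evecs = np.linalg.eigh(Sw + reg * np.eye(d))
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T       # (Sw + reg*I)^(-1/2)
    lam, V = np.linalg.eigh(W @ Sb @ W)
    G = W @ V[:, ::-1][:, :2]                          # top-2 discriminant directions
    return X @ G

# Three well-separated Gaussian classes in 10 dimensions, projected to 2D.
rng = np.random.default_rng(2)
X = np.vstack([rng.standard_normal((40, 10)) + 4 * np.eye(10)[c] for c in range(3)])
labels = np.repeat(np.arange(3), 40)
Y = lda_2d(X, labels)
```

Because LDA uses the cluster labels, the 2D layout keeps the classes visually separated, which PCA does not guarantee.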

2D Visualization of Clustered Image and Audio Data

[Figures: PCA vs. rank-2 LDA projections of spoken letters (audio) and handwritten digits (image)]

iVisClassifier: Computational Zoom-In

- Views: LDA scatter plot, cluster-level PC, basis view, and heat map
- Zooming in applies LDA recursively on the selected subset of the data

iVisClassifier: Cooperative Filtering (poster and demo)

- Utilizes brushing-and-linking across views

Fusion Based on Information-Space Alignment (J. Choo, S. Bohn, G. Nakamura, A. White, H. Park)

- Want: unified vector representations of heterogeneous data sets
- Utilize: reference correspondence information between data pairs, cluster correspondences, etc.
- Example: data set A (English) and data set B (Spanish) fused into one space, enabling a multi-lingual iVisClassifier
- Two conflicting criteria: maximize alignment and minimize deformation
- Existing methods: constrained Laplacian eigenmaps, PARAFAC2, Procrustes analysis, …
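Of the existing methods listed, Procrustes analysis is the simplest to sketch: given corresponding reference points in two spaces, it finds the rotation (after centering) that best aligns one onto the other. A minimal orthogonal-Procrustes sketch, assuming row i of A corresponds to row i of B; `procrustes_align` is a hypothetical helper name.

```python
import numpy as np

def procrustes_align(A, B):
    """Align A onto B: find the orthogonal R minimizing ||A0 R - B0||_F,
    where A0, B0 are the centered point sets (rows are corresponding points)."""
    A0 = A - A.mean(axis=0)
    B0 = B - B.mean(axis=0)
    U, _, Vt = np.linalg.svd(A0.T @ B0)   # polar factor gives the optimal rotation
    R = U @ Vt
    return A0 @ R + B.mean(axis=0)        # rotate, then translate onto B

# A is a rotated and shifted copy of B; alignment should recover B exactly.
rng = np.random.default_rng(3)
B = rng.standard_normal((30, 2))
theta = 0.7
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
A = B @ R_true.T + np.array([5.0, -2.0])
A_aligned = procrustes_align(A, B)
```

Because Procrustes only rotates and translates, it achieves zero deformation of each data set, at the cost of a weaker alignment than methods that also warp the spaces: exactly the trade-off named on the slide.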

Graph Embedding Approach

1. Represent each data matrix as a similarity graph
2. Add zero-length edges between reference point pairs
3. Apply a graph embedding algorithm to the matrix representation of the graphs to obtain the fused data

For example, nonmetric multidimensional scaling (preserving the rank order of distances) solves

  min  Σ_{i≠j} (d_A^f(i,j) − ḋ_A(i,j))² + Σ_{i≠j} (d_B^f(i,j) − ḋ_B(i,j))² + µ Σ_r (d_AB^f(r,r) − ḋ_AB(r,r))²
  subject to  ḋ_AB(r,r) < ḋ_A(i,j),  ḋ_AB(r,r) < ḋ_B(i,j)  for 1 ≤ r ≤ R and i ≠ j,

where d^f denotes distances in the fused space and ḋ the rank orders. (Poster)
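The nonmetric MDS above is an ordinal variant of classical (metric) MDS, which embeds points so that their Euclidean distances reproduce a given distance matrix. A minimal classical-MDS sketch for context, not the slide's constrained formulation; `classical_mds` is a hypothetical helper name.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (metric) MDS: embed n points in k dimensions so that
    pairwise Euclidean distances approximate the n x n distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # double-centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # Gram matrix of the centered points
    vals, vecs = np.linalg.eigh(B)
    vals = vals[::-1][:k]                    # top-k eigenvalues
    vecs = vecs[:, ::-1][:, :k]              # and eigenvectors
    return vecs * np.sqrt(np.maximum(vals, 0.0))

# Distances of genuinely 2D points are reproduced exactly (up to rotation).
rng = np.random.default_rng(4)
X = rng.standard_normal((20, 2))
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
Y = classical_mds(D, k=2)
D_hat = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
```

The nonmetric version replaces the target distances with monotone transforms fitted to their rank order, which is what makes it robust when the two data sets' distance scales are not comparable.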

Evaluation: Cross-Domain Retrieval

[Figures: retrieval performance vs. K in K-NN in the fused space, on English-Spanish documents and document(English)-phoneme data, trading off deformation against alignment; methods compared: PARAFAC2, nonmetric MDS, metric MDS, Laplacian eigenmaps, Procrustes]
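The cross-domain retrieval metric behind such plots can be computed simply: for each point in one domain, check whether its true counterpart in the other domain appears among its K nearest neighbors in the fused space. A minimal sketch under that assumption; `knn_retrieval_accuracy` is a hypothetical helper name, and the data below is synthetic.

```python
import numpy as np

def knn_retrieval_accuracy(A, B, K=5):
    """Fraction of rows of A whose true counterpart B[i] is among the K
    nearest rows of B (row i of A corresponds to row i of B)."""
    D = np.linalg.norm(A[:, None] - B[None, :], axis=-1)  # cross-domain distances
    ranks = np.argsort(D, axis=1)[:, :K]                  # K nearest B-indices per A-row
    return np.mean([i in ranks[i] for i in range(len(A))])

# Synthetic "fused" embeddings: domain A is a slightly perturbed copy of B,
# mimicking a well-aligned fusion.
rng = np.random.default_rng(5)
B = rng.standard_normal((100, 2))
A = B + 0.05 * rng.standard_normal((100, 2))
acc = knn_retrieval_accuracy(A, B, K=5)
```

A well-aligned fusion scores near 1.0 even for small K, while a poor alignment needs a large K before counterparts are found, which is why the evaluation sweeps K.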

Summary / Future Research

- Informative 2D/3D representation of data
  - Clustered data: two-stage dimension reduction methods, effective for a wide range of problems
  - Interpretable dimension reduction for nonnegative data: NMF
  - Customized fast algorithms for 2D/3D reduction needed
  - Dynamic updating methods for efficient and interactive visualization
- Visual analytic methods for foundational problems: classification, clustering, information fusion by space alignment
- FODAVA research test bed and VA system development
- Sparse methods with L1 regularization: sparse solutions for regression; sparse PCA (with Renato Monteiro)