FODAVA-Lead Research Dimension Reduction and Data Reduction: Foundations for Interactive Visualization. Haesun Park, Division of Computational Science and Engineering, Georgia Institute of Technology.

Similar presentations
Scaling Multivariate Statistics to Massive Data Algorithmic problems and approaches Alexander Gray Georgia Institute of Technology

Nonnegative Matrix Factorization with Sparseness Constraints S. Race MA591R.
Active Appearance Models
Feature Selection as Relevant Information Encoding Naftali Tishby School of Computer Science and Engineering The Hebrew University, Jerusalem, Israel NIPS.
Graph Embedding and Extensions: A General Framework for Dimensionality Reduction Keywords: Dimensionality reduction, manifold learning, subspace learning,
FODAVA-Lead: Dimension Reduction and Data Reduction: Foundations for Visualization Haesun Park Division of Computational Science and Engineering College.
Data Visualization STAT 890, STAT 442, CM 462
Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis Haesun Park Georgia Institute of Technology, Atlanta, GA, USA (joint work.
Unsupervised Feature Selection for Multi-Cluster Data Deng Cai et al, KDD 2010 Presenter: Yunchao Gong Dept. Computer Science, UNC Chapel Hill.
DIMENSIONALITY REDUCTION BY RANDOM PROJECTION AND LATENT SEMANTIC INDEXING Jessica Lin and Dimitrios Gunopulos Ângelo Cardoso IST/UTL December
Graph Based Semi- Supervised Learning Fei Wang Department of Statistical Science Cornell University.
Principal Component Analysis
Dimension reduction : PCA and Clustering Agnieszka S. Juncker Slides: Christopher Workman and Agnieszka S. Juncker Center for Biological Sequence Analysis.
Dimension reduction : PCA and Clustering Slides by Agnieszka Juncker and Chris Workman.
Dimension reduction : PCA and Clustering Christopher Workman Center for Biological Sequence Analysis DTU.
Learning the parts of objects by nonnegative matrix factorization D.D. Lee from Bell Lab H.S. Seung from MIT Presenter: Zhipeng Zhao.
3D Geometry for Computer Graphics
A Theory of Locally Low Dimensional Light Transport Dhruv Mahajan (Columbia University) Ira Kemelmacher-Shlizerman (Weizmann Institute) Ravi Ramamoorthi.
Effective Dimension Reduction with Prior Knowledge Haesun Park Division of Computational Science and Eng. College of Computing Georgia Institute of Technology.
A Sparsification Approach for Temporal Graphical Model Decomposition Ning Ruan Kent State University Joint work with Ruoming Jin (KSU), Victor Lee (KSU)
Informatics and Mathematical Modelling / Intelligent Signal Processing ISCAS Morten Mørup Approximate L0 constrained NMF/NTF Morten Mørup Informatics.
Machine Learning in Simulation-Based Analysis 1 Li-C. Wang, Malgorzata Marek-Sadowska University of California, Santa Barbara.
CS Machine Learning. What is Machine Learning? Adapt to / learn from data  To optimize a performance function Can be used to:  Extract knowledge.
Visual Analytics for Interactive Exploration of Large-Scale Documents via Nonnegative Matrix Factorization Jaegul Choo*, Barry L. Drake †, and Haesun Park*
Joint Image Clustering and Labeling by Matrix Factorization
Graph-based Analytics
Enhancing Tensor Subspace Learning by Element Rearrangement
Informatics and Mathematical Modelling / Intelligent Signal Processing 1 EUSIPCO’09 27 August 2009 Tuning Pruning in Sparse Non-negative Matrix Factorization.
Gwangju Institute of Science and Technology Intelligent Design and Graphics Laboratory Multi-scale tensor voting for feature extraction from unstructured.
Jaegul Choo1*, Changhyun Lee1, Chandan K. Reddy2, and Haesun Park1
Mining Discriminative Components With Low-Rank and Sparsity Constraints for Face Recognition Qiang Zhang, Baoxin Li Computer Science and Engineering Arizona.
1 Information Retrieval through Various Approximate Matrix Decompositions Kathryn Linehan Advisor: Dr. Dianne O’Leary.
Feature extraction 1.Introduction 2.T-test 3.Signal Noise Ratio (SNR) 4.Linear Correlation Coefficient (LCC) 5.Principle component analysis (PCA) 6.Linear.
Non Negative Matrix Factorization
Graph Embedding: A General Framework for Dimensionality Reduction Dong XU School of Computer Engineering Nanyang Technological University
David S. Ebert David S. Ebert Visual Analytics to Enable Discovery and Decision Making: Potential, Challenges, and.
Stochastic Linear Programming by Series of Monte-Carlo Estimators Leonidas SAKALAUSKAS Institute of Mathematics&Informatics Vilnius, Lithuania
FODAVA-Lead Education, Community Building, and Research: Dimension Reduction and Data Reduction: Foundations for Interactive Visualization Haesun Park.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
Enhancing Interactive Visual Data Analysis by Statistical Functionality Jürgen Platzer VRVis Research Center Vienna, Austria.
SAND C 1/17 Coupled Matrix Factorizations using Optimization Daniel M. Dunlavy, Tamara G. Kolda, Evrim Acar Sandia National Laboratories SIAM Conference.
Center for Evolutionary Functional Genomics Large-Scale Sparse Logistic Regression Jieping Ye Arizona State University Joint work with Jun Liu and Jianhui.
MURI: Integrated Fusion, Performance Prediction, and Sensor Management for Automatic Target Exploitation 1 Dynamic Sensor Resource Management for ATE MURI.
Computer Vision Lab. SNU Young Ki Baik Nonlinear Dimensionality Reduction Approach (ISOMAP, LLE)
Blind Information Processing: Microarray Data Hyejin Kim, Dukhee KimSeungjin Choi Department of Computer Science and Engineering, Department of Chemical.
Machine Learning Extract from various presentations: University of Nebraska, Scott, Freund, Domingo, Hong,
A Clustering Method Based on Nonnegative Matrix Factorization for Text Mining Farial Shahnaz.
EE4-62 MLCV Lecture Face Recognition – Subspace/Manifold Learning Tae-Kyun Kim 1 EE4-62 MLCV.
The Interplay Between Mathematics/Computation and Analytics Haesun Park Division of Computational Science and Engineering Georgia Institute of Technology.
Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project.
A Convergent Solution to Tensor Subspace Learning.
CoNMF: Exploiting User Comments for Clustering Web2.0 Items Presenter: He Xiangnan 28 June School of Computing National.
GENDER AND AGE RECOGNITION FOR VIDEO ANALYTICS SOLUTION PRESENTED BY: SUBHASH REDDY JOLAPURAM.
Non-negative Matrix Factorization
Multimedia Systems and Communication Research Multimedia Systems and Communication Research Department of Electrical and Computer Engineering Multimedia.
Principal Component Analysis and Linear Discriminant Analysis for Feature Reduction Jieping Ye Department of Computer Science and Engineering Arizona State.
2D-LDA: A statistical linear discriminant analysis for image matrix
The KDD Process for Extracting Useful Knowledge from Volumes of Data Fayyad, Piatetsky-Shapiro, and Smyth Ian Kim SWHIG Seminar.
SketchVisor: Robust Network Measurement for Software Packet Processing
Multiplicative updates for L1-regularized regression
School of Computer Science & Engineering
Glenn Fung, Murat Dundar, Bharat Rao and Jinbo Bi
Machine Learning Basics
Outline Multilinear Analysis
Principal Nested Spheres Analysis
Introduction PCA (Principal Component Analysis) Characteristics:
Dimension reduction : PCA and Clustering
Research Institute for Future Media Computing
Learning Incoherent Sparse and Low-Rank Patterns from Multiple Tasks
Non-Negative Matrix Factorization
Presentation transcript:

FODAVA-Lead Research: Dimension Reduction and Data Reduction: Foundations for Interactive Visualization
Haesun Park, Division of Computational Science and Engineering, Georgia Institute of Technology
FODAVA Review Meeting, Dec. 2009

Challenges in Analyzing High-Dimensional Massive Data on a Visual Analytics System
- Screen space and visual perception: low dimension and the number of available pixels are fundamentally limiting constraints
- High-dimensional data: effective dimension reduction
- Large data sets: informative representation of data
- Speed, necessary for real-time, interactive use: scalable algorithms; adaptive algorithms
- Goal: development of fundamental theory and algorithms in data representations and transformations to enable visual understanding

FODAVA-Lead Research Topics
- Dimension reduction: dimension reduction with prior information/interpretability constraints; manifold learning
- Informative presentation of large-scale data: sparse recovery by L1 penalty; clustering and semi-supervised clustering; multi-resolution data approximation
- Fast algorithms: large-scale optimization/matrix decompositions; adaptive updating algorithms for dynamic, time-varying data and interactive visualization
- Data fusion: fusion of different types of data from various sources; fusion of different uncertainty levels
- Integration with DAVA systems

FODAVA-Lead Presentations
- H. Park: overview of proposed FODAVA research; introduction to the FODAVA test bed; dimension reduction of clustered data for effective representation; applications to text, image, and audio data sets
- A. Gray: nonlinear dimension reduction (manifold learning); fast data analysis algorithms; formulation of problems as large-scale optimization problems (SDP)
- V. Koltchinskii: multiple kernel learning method for fusion of data with heterogeneous types; sparse representation
- R. Monteiro: convex optimization; SDP; a novel approach to dimension reduction; compressed sensing and sparse representation
- J. Stasko: Visual Analytics System demo; interplay between math/computation and interactive visualization

Test Bed for Visual Analytics of High-Dimensional Massive Data
- Open-source software
- Integrates results from mathematics, statistics, and numerical algorithms/optimization across FODAVA teams
- Easily accessible to a wide community of researchers
- Makes theory/algorithms relevant and readily available to the VA and applications community
- Identifies effective methods for specific problems (evaluation)
- Pipeline (diagram): FODAVA fundamental research → test bed → applications

Modules in a Data and Visual Analytics System for High-Dimensional Massive Data
- Data representation and transformation tasks: classification, clustering, regression, dimension reduction, density estimation, retrieval of similar items, automatic summarization, ...
- Mathematical, statistical, and computational methods
- Pipeline (diagram): raw data → data in input space → visual representation and interaction → analytical reasoning

Modules in the FODAVA Test Bed
- Vector representation of raw data: text, image, audio, ...
- Informative representation and transformation: clustering, summarization, regression, multi-resolution data reduction, ...; using label, similarity, density, and missing-value information
- Visual representation: dimension reduction (2D/3D), temporal trend, uncertainty, anomaly/outlier, causal relationship, zoom in/out by dynamic updating, ...
- Interactive analysis

Research in Data Representations and Transformations (H. Park's group)
- 2D/3D representation of data with prior information (J. Choo, J. Kim, K. Balasubramanian): clustered data, via two-stage dimension reduction; nonnegative data, via nonnegative matrix factorization (NMF) and nonnegative tensor factorization (NTF)
- Clustering and classification (J. Kim, D. Kuang): new clustering algorithms based on NMF; semi-supervised clustering based on NMF
- Sparse representation of data (J. Kim, V. Koltchinskii, R. Monteiro): sparse solution for regression; sparse PCA
- FODAVA test bed development (J. Choo, J. Kihm, H. Lee)

Nonnegativity-Preserving Dimension Reduction: Nonnegative Matrix Factorization (NMF)
(Paatero & Tapper 94; Lee & Seung, Nature 99; Pauca et al., SIAM DM 04; Hoyer 04; Lin 05; Berry 06; Kim & Park, Bioinformatics 06; Kim & Park, SIAM Journal on Matrix Analysis and Applications 08; ...)
- A ~= W H: min_{W >= 0, H >= 0} ||A - WH||_F
- Why nonnegativity constraints? Better approximation vs. better representation/interpretation: nonnegative constraints are often physically meaningful, so interpretation of the analysis results is possible
- Fastest algorithm for NMF, with theoretical convergence (J. Kim and H. Park, ICDM 08). NMF/ANLS iterates the following with an active-set-type method:
  - fixing W, solve min_{H >= 0} ||W H - A||_F
  - fixing H, solve min_{W >= 0} ||H^T W^T - A^T||_F
- Sparse NMF can be used as a clustering algorithm
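The NMF/ANLS iteration can be sketched in a few lines. This is a minimal illustration that solves each nonnegative least squares subproblem with SciPy's generic `nnls` solver, not the block active-set method of Kim and Park; the function name and all parameters are illustrative.

```python
# Minimal NMF/ANLS sketch: alternate two nonnegative least squares solves.
import numpy as np
from scipy.optimize import nnls

def nmf_anls(A, k, n_iter=50, seed=0):
    """Approximate nonnegative A (m x n) as W (m x k) @ H (k x n)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = np.zeros((k, n))
    for _ in range(n_iter):
        # Fix W, solve min_{H >= 0} ||W H - A||_F, one column of H at a time.
        H = np.column_stack([nnls(W, A[:, j])[0] for j in range(n)])
        # Fix H, solve min_{W >= 0} ||H^T W^T - A^T||_F, one row of W at a time.
        W = np.vstack([nnls(H.T, A[i, :])[0] for i in range(m)])
    return W, H

A = np.abs(np.random.default_rng(1).random((20, 12)))
W, H = nmf_anls(A, k=3)
err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
```

Each subproblem is convex, which is what gives the alternating scheme its convergence guarantees; the practical speed in the cited work comes from solving many columns' active sets jointly rather than one `nnls` call per column.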

2D Representation: Utilize Cluster Structure if Known
2D representation of 700 x 1000 data with 7 clusters: LDA+PCA(2) vs. SVD(2) vs. PCA(2)

Optimal Dimension-Reducing Transformation
- High-quality clusters have small trace(S_w) and large trace(S_b)
- Want F s.t. trace(F^T S_w F) is minimized and trace(F^T S_b F) is maximized
- max trace((F^T S_w F)^{-1} (F^T S_b F)) → LDA (Fisher 36, Rao 48), LDA/GSVD (Park et al.)
- max trace(F^T S_b F) with F^T F = I → Orthogonal Centroid (Park et al. 03)
- max trace(F^T (S_w + S_b) F) with F^T F = I → PCA (Hotelling 33)
- max trace(F^T A A^T F) with F^T F = I → LSI (Deerwester et al. 90)
- All can easily be nonlinearized using kernel functions
- The optimal reduced dimension is in general much larger than 3
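The scatter-matrix criteria above can be made concrete with a small sketch: for labeled data it forms S_w and S_b and takes F from the generalized eigenproblem S_b x = lambda S_w x (classical LDA). The small ridge term is an illustrative stand-in for the GSVD-based handling of a singular S_w; the function name and data are assumptions.

```python
# Sketch of trace-ratio LDA: build S_w and S_b, then solve S_b x = lambda S_w x.
import numpy as np
from scipy.linalg import eigh

def lda_transform(X, y, dim=2, reg=1e-6):
    """X: n x d data, y: n labels; returns a d x dim transformation F."""
    d = X.shape[1]
    mu = X.mean(axis=0)
    Sw = np.zeros((d, d))   # within-cluster scatter
    Sb = np.zeros((d, d))   # between-cluster scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)
    # Generalized symmetric eigenproblem; ridge keeps S_w positive definite.
    vals, vecs = eigh(Sb, Sw + reg * np.eye(d))
    return vecs[:, ::-1][:, :dim]   # eigenvectors of the largest eigenvalues

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(30, 5)) for c in (0.0, 1.0, 2.0)])
y = np.repeat([0, 1, 2], 30)
F = lda_transform(X, y)
Z = X @ F   # 2D coordinates that preserve the cluster structure
```

The other criteria on the slide differ only in the matrix pair passed to the eigensolver: Orthogonal Centroid and PCA replace the pair (S_b, S_w) with (S_b, I) and (S_w + S_b, I) respectively.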

Two-Stage Dimension Reduction for 2D Visualization of Clustered Data (J. Choo, S. Bohn, H. Park, VAST 09)
- LDA + LDA = rank-2 LDA
- LDA + PCA
- OCM + PCA
- OCM + rank-2 PCA on S_b = rank-2 PCA on S_b (In-Spire)
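As an illustration of the two-stage idea, the LDA + PCA pipeline can be sketched with off-the-shelf scikit-learn components, a stand-in for the rank-2 LDA implementation described in the VAST 09 paper; the toy data and all parameter values are assumptions.

```python
# Two-stage reduction sketch: LDA to (#clusters - 1) dims, then PCA to 2D.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Toy clustered data: 7 clusters of 100 points in 50 dimensions.
X = np.vstack([rng.normal(loc=i, scale=0.5, size=(100, 50)) for i in range(7)])
y = np.repeat(np.arange(7), 100)

# Stage 1: supervised reduction to at most (#clusters - 1) = 6 dimensions.
Z = LinearDiscriminantAnalysis(n_components=6).fit_transform(X, y)
# Stage 2: unsupervised PCA picks the 2 highest-variance screen directions.
coords2d = PCA(n_components=2).fit_transform(Z)
```

Swapping the stage-1 method (LDA vs. Orthogonal Centroid) and the stage-2 method (LDA vs. PCA) yields the four combinations listed on the slide.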

2D Visualization: Newsgroups
2D visualization of the Newsgroups data (21347 dimensions, 770 items, 11 clusters). Cluster labels: g: talk.politics.guns, p: talk.politics.misc, c: soc.religion.christian, r: talk.religion.misc, p: comp.sys.ibm.pc.hardware, a: comp.sys.mac.hardware, y: sci.crypt, d: sci.med, e: sci.electronics, f: misc.forsale, b: rec.sport.baseball. Methods compared: rank-2 LDA, LDA + PCA, OCM + PCA, rank-2 PCA on S_b

2D Visualization of Clustered Text, Image, and Audio Data
(Figure: rank-2 LDA vs. PCA, shown for Medline data (text, via LDA+PCA), facial data (image), and spoken letters (audio). Medline cluster labels: h: heart attack, c: colon cancer, o: oral cancer, d: diabetes, t: tooth decay.)

Visual Facial Recognizer: A Test Bed Application
- Weizmann face data: (352 x 512 pixels each) x (28 persons x 52 images each)
- Significant variations in angle, illumination, and facial expression
- Problem: no data analytic algorithm alone is perfect, e.g., accuracy comparison:
  PCA: 60%; LDA: 75%; TensorFaces: 69%; h-LDA: 81%

Visual Facial Recognizer: A Test Bed Application
- Visually reduce the human's search space → efficiently utilize human visual recognition
- Example: test bed visualization of the Weizmann images using rank-2 LDA

Summary / Future Research
- Informative 2D/3D representation of data
  - Clustered data: two-stage dimension reduction methods, effective for a wide range of problems
  - Interpretable dimension reduction for nonnegative data: NMF; new clustering algorithms based on NMF; semi-supervised clustering based on NMF
  - Extension to tensors for time-series data
- Customized fast algorithms for 2D/3D reduction needed
- Dynamic updating methods for efficient and interactive visualization
- Sparse methods with L1 regularization: sparse solution for regression; sparse PCA
- FODAVA test bed development