Neighboring Feature Clustering. Authors: Z. Wang, W. Zheng, Y. Wang, J. Ford, F. Makedon, J. Pearlman. Presenter: Prof. Fillia Makedon, Dartmouth College.


1 Neighboring Feature Clustering. Authors: Z. Wang, W. Zheng, Y. Wang, J. Ford, F. Makedon, J. Pearlman. Presenter: Prof. Fillia Makedon, Dartmouth College.

2 What is Neighboring Feature Clustering? Given an m × n matrix M, where m is the number of samples and n the number of (ordered) features, the goal is to find an intrinsic partition of the features, based on their characteristics, such that each cluster is a contiguous block of features. We assume the features have a natural ordering that is relevant to the problem being solved. – E.g., in spectral datasets, the relevant characteristic is the correlation between neighboring features. – For example, if features 1 and 10 belong to the same cluster, then features 2 through 9 must also belong to that cluster.
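To make the contiguity constraint concrete, here is a minimal Python sketch (not from the original slides; the function name boundaries_to_labels is ours): a clustering of n ordered features is fully determined by the indices at which new clusters start, so every cluster is a contiguous block by construction.

import numpy as np

def boundaries_to_labels(n_features, starts):
    """Expand cluster start indices into per-feature cluster labels.

    starts: sorted indices where a new cluster begins, e.g. [0, 4, 9] means
    features 0-3 form cluster 0, features 4-8 cluster 1, features 9.. cluster 2.
    Contiguity is automatic: a cluster can never skip over a feature.
    """
    labels = np.zeros(n_features, dtype=int)
    for cluster_id, start in enumerate(starts):
        labels[start:] = cluster_id
    return labels

# 12 ordered features partitioned into three contiguous clusters.
print(boundaries_to_labels(12, [0, 4, 9]))   # [0 0 0 0 1 1 1 1 1 2 2 2]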

3 MR spectral features and DNA copy number. MR spectral features are highly redundant, suggesting that the data lie in a low-dimensional space (i.e., the number of effective degrees of freedom is much smaller than the number of measured features). Neighboring spectral features of MR spectra are highly correlated. Using NFC, we can partition the features into clusters; each cluster can then be represented by a single feature, reducing the dimensionality. The same idea can potentially be applied to DNA copy number analysis as well (see slide 11).

4 Using the MDL method to solve NFC. Reduce NFC to a one-dimensional piece-wise linear approximation problem: given a sequence of n one-dimensional points, find the optimal step-function-like line segments that fit the points (Fig. 1). Piece-wise linear approximation [3][4] is usually posed in 2D; here we use the same concept in a 1D setting. We then use the minimum description length (MDL) method [2] to solve this reduced problem.

5 Minimum Description Length (MDL). The transformation (as in [1]) yields a 1D piece-wise linear approximation problem: represent all the points by a small number of line segments (two in the example of Fig. 1). There is a trade-off between approximation accuracy and the number of line segments; MDL provides a principled way to strike this compromise (MDL is defined in more detail on slide 15).

6 Outline. The problem – spectral data; the abstract problem. Related work – HMM-based, partial-sum-based, and maximum-likelihood-based methods. Our approach – reduction to a 1D piece-wise linear approximation; the MDL approach.

7 Reducing NFC to a 1D Piece-Wise Linear Approximation Problem. Let C denote the correlation coefficient matrix of M, and let C* be the strictly upper triangular matrix derived from 1 − |C| (entries near 0 imply high correlation between the corresponding two features). For features i to j (1 ≤ i ≤ j ≤ n), the submatrix C*_{i:j, i:j} captures their pairwise correlations. We use its entries (excluding the diagonal and lower-triangular entries) as the points to be explained by a line segment in the 1D piece-wise linear approximation problem. The objective is to find the optimal piece-wise line segments to fit these derived points. Points near 0 mean high correlation; since we want to enforce high correlation within a cluster, each segment is always fixed at 0 rather than fitted freely.
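This construction can be sketched in a few lines of Python (illustrative code, not from the paper; dissimilarity_matrix and block_points are our own names): compute the correlation matrix over the columns of M, transform it to 1 − |C|, keep the strictly upper triangular part, and collect the entries of a feature block as the 1D points that a zero-valued segment has to explain.

import numpy as np

def dissimilarity_matrix(M):
    """C* = strictly upper triangular part of 1 - |corrcoef|; near 0 means highly correlated."""
    C = np.corrcoef(M, rowvar=False)     # n x n correlation matrix of the features (columns)
    return np.triu(1.0 - np.abs(C), k=1)

def block_points(Cstar, i, j):
    """1D points generated by the candidate cluster of features i..j (0-based, inclusive)."""
    block = Cstar[i:j + 1, i:j + 1]
    rows, cols = np.triu_indices(j - i + 1, k=1)   # skip diagonal and lower triangle
    return block[rows, cols]

# Toy data: 50 samples, 6 ordered features.
rng = np.random.default_rng(0)
M = rng.normal(size=(50, 6))
Cstar = dissimilarity_matrix(M)
print(block_points(Cstar, 0, 2))   # the three pairwise values for features 0, 1, 2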

8 Example. Suppose a candidate cluster yields points that all lie around 0.3. In ordinary piece-wise linear approximation, it would be better to use 0.3 as the approximation; in NFC, however, we want to penalize points that stray from 0 (i.e., weak correlation), so we still use 0 as the approximation. Also unlike the usual 1D piece-wise linear approximation problem, the reduced problem has dynamic points, because the points for a segment are created on the fly from the submatrix of the candidate cluster.
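A tiny numeric illustration of this point (hypothetical values, not from the slides): fitting the segment at the sample mean minimizes the squared error, but pinning it at 0 is what exposes a weakly correlated block.

import numpy as np

points = np.array([0.28, 0.31, 0.33, 0.29])       # 1 - |corr| values clustered around 0.3

sse_mean = np.sum((points - points.mean()) ** 2)  # usual piece-wise linear fit
sse_zero = np.sum(points ** 2)                    # NFC: the segment is pinned at 0

print(round(sse_mean, 4))   # ~0.0015 -> the mean explains the points almost perfectly
print(round(sse_zero, 4))   # ~0.3675 -> the large cost reveals that the correlations are weak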

9 Spectral data. MR spectral data – high-dimensional data points – spectral features are highly redundant (highly correlated) – the task is to find neighboring features with high correlation in a spectral dataset, such as an MR spectral dataset. [Fig. 1: high-dimensional data points, intensity vs. frequency. Fig. 2: correlation coefficient matrix; both axes index the features (dimensions).]

10 Problem. Finding a low-dimensional representation of the data (as motivated on slide 3) to escape the curse of dimensionality. We extract an abstract problem: Neighboring Feature Clustering (NFC) – features are ordered, and each cluster contains only neighboring features – find an optimal clustering according to a chosen criterion.

11 Another application (with variation). Array comparative genomic hybridization (aCGH) is used to detect copy number alterations. aCGH data are noisy, so they are typically smoothed and then segmented. [Fig. 3: aCGH technology. Fig. 4: aCGH data (smoothed); the X axis is the log ratio. Fig. 5: aCGH data (segmented); the X axis is the log ratio.]

12 Related work. An algorithm addressing a similar problem – Baumgartner, et al., "Unsupervised feature dimension reduction for classification of MR spectra", Magnetic Resonance Imaging, 22, 2004. An extensive literature on the reduced problem – Teh, et al., "On the detection of dominant points on digital curves", IEEE PAMI, 11(8), 1989 – statistical methods (next slide). [Fig. 6: 1D piece-wise approximation]

13 Related work: statistical methods. HMM-based – Fridlyand, et al., "Hidden Markov models approach to the analysis of array CGH data", J. Multivariate Anal., 90, 2004. Partial-sum-based – Lipson, et al., "Efficient Calculation of Interval Scores for DNA Copy Number Data Analysis", RECOMB 2005. Maximum-likelihood-based – Picard, et al., "A statistical approach for array CGH data analysis", BMC Bioinformatics, 6:27, 2005.

14 Framework of the proposed method (Fig. 7): 1. Compute the correlation coefficient matrix from the spectra (intensity vs. frequency). 2. Consider each pair of features (i, j), i.e., each candidate cluster of features i through j. 3. Compute the revised MDL code length for each candidate cluster. 4. Assemble these values into a code length matrix with entries C_{i,j} for 1 ≤ i < j ≤ n. 5. Find the optimal clustering as a shortest path through this matrix, computed by dynamic programming.

15 Minimum Description Length. An information criterion is a model selection scheme; common information criteria are the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and Minimum Description Length (MDL). MDL encodes both the model and the data given the model; the balance between model complexity and goodness of fit is expressed in terms of total code length. [Fig. 6: 1D piece-wise approximation]
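In its standard two-part form (a generic statement of the MDL principle, not a formula taken from these slides), the quantity minimized over candidate models M for data D is the total code length

    L(M, D) = L(M) + L(D | M)

where L(M) encodes the model itself (here, the cluster boundary and the Gaussian standard deviation of slide 16) and L(D | M) encodes the data under that model.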

16 Encoding the model and the data given the model. For each pair of features (n·(n−1)/2 pairs in total): – encoding the model: the cluster boundary and the Gaussian parameter (standard deviation); – encoding the data given the model: the residuals d of the points under the Gaussian model. [Fig. 8: encoding the data given the model for each feature pair]
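A Python sketch of one such code length computation (our own simplification, not the exact encoding used in the paper: the boundary and sigma are each charged a fixed number of bits, and the data are charged their Gaussian negative log-likelihood in bits):

import numpy as np

PARAM_BITS = 32.0   # assumed fixed cost, in bits, per model parameter (illustrative choice)

def cluster_code_length(points):
    """Two-part MDL code length for one candidate cluster (sketch).

    Model: the points (entries of 1 - |C| for the cluster) are treated as
    i.i.d. zero-mean Gaussian residuals with standard deviation sigma.
    """
    points = np.asarray(points, dtype=float)
    if points.size == 0:
        return PARAM_BITS                                  # only the boundary to encode
    sigma2 = max(np.mean(points ** 2), 1e-24)              # zero-mean Gaussian variance estimate
    # L(data | model): Gaussian negative log-likelihood, converted from nats to bits
    nll_bits = 0.5 * points.size * np.log2(2 * np.pi * sigma2) \
               + np.sum(points ** 2) / (2 * sigma2 * np.log(2))
    # L(model): one boundary position plus sigma
    return 2 * PARAM_BITS + nll_bits

# A tightly correlated block (points near 0) costs fewer bits than a loose one.
print(cluster_code_length([0.01, 0.02, 0.015]))
print(cluster_code_length([0.30, 0.25, 0.35]))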

17 Minimizing the code length. Given the code length matrix (entries C_{i,j} for 1 ≤ i < j ≤ n), the optimal clustering is found as a shortest path, defined by a recursive function and computed with dynamic programming. [Fig. 9: an alternative graph representation of the matrix C, with nodes 1, 2, 3, …, n−1, n and edge weights C_{i,j}. Fig. 10: the recursive function for the shortest path]
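A minimal, self-contained Python version of this dynamic program (our own formulation, consistent with but not copied from Fig. 10; a toy cost matrix stands in for the real code length matrix): best[j] is the minimum total code length for clustering the first j features, and each candidate last cluster is charged its own code length.

import numpy as np

def shortest_path_clustering(cost):
    """cost[i][j] = code length of making features i..j (inclusive) one cluster.

    Returns (minimum total code length, clusters as (start, end) index pairs).
    Runs in O(n^2) time with dynamic programming.
    """
    n = cost.shape[0]
    best = np.full(n + 1, np.inf)    # best[j]: optimal code length for features 0..j-1
    best[0] = 0.0
    back = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        for i in range(j):                       # last cluster is features i..j-1
            total = best[i] + cost[i, j - 1]
            if total < best[j]:
                best[j], back[j] = total, i
    # Recover the cluster boundaries by walking the back-pointers.
    clusters, j = [], n
    while j > 0:
        clusters.append((back[j], j - 1))
        j = back[j]
    return best[n], clusters[::-1]

# Toy cost matrix for 4 features: cheap to merge 0-1 and 2-3, expensive to mix them.
cost = np.array([[1.0, 1.5, 9.0, 9.0],
                 [0.0, 1.0, 9.0, 9.0],
                 [0.0, 0.0, 1.0, 1.5],
                 [0.0, 0.0, 0.0, 1.0]])
print(shortest_path_clustering(cost))   # expected: clusters (0, 1) and (2, 3)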

18 Results. We test the method on simulated data. [Fig. 11: the revised correlation matrix and the computed code length matrix]