Clustering Features in High-Throughput Proteomic Data Richard Pelikan (or what’s left of him) BIOINF 2054 April 29 2005.


Outline
• Brief overview of clinical proteomics
• What do we intend to achieve with machine learning?
• Modelling profiles through mixture models
• Evaluation
• Conclusions

What is proteomics anyway?
Proteomics – the study of proteins and how they affect one's state of health.
• Think genomics, but with proteins instead of genes.
• It may be much more difficult to map the human proteome than it was to map the human genome.
• A relatively new field of research: lots of techniques, lots of ideas, only 25 hours in a day.

Why is proteomics useful?
Primary reason: efficient, early detection and diagnosis of disease.
• Invasive techniques such as biopsies are relatively high-risk, not to mention expensive!
• Proteomic profiling allows for a non- or minimally invasive way of detecting a malady in a patient.
• It is more affordable (for now), allowing for more opportunities for screening.
Alternative reason: prediction of response or non-response to a treatment.
• Oftentimes, getting the treatment is worse than simply living with the disease.
• This allows for a screening process to determine which treatment is best for a particular patient.

OK, I'm interested. How does proteomics work?
It's spectrometry, my dear Watson.
(Diagram of a mass spectrometer: chip with sample spots, lens, laser, vacuum tube, detector, and the resulting spectral view.)

Proteomic Profiles
Some examples from pancreatic cancer patients. In this dataset:
• 57 healthy patients (controls)
• 59 cancer patients (cases)
The dataset is from UPCI. (Example spectra shown for a control and a case.)

Feature Reduction
Proteomic profiles can have anywhere from 15,000 to 370,000 intensities reported.
• The pancreatic dataset has 60,270 m/z values.
• That is too many for a statistical ML model to parameterize each intensity individually.
The goal of feature reduction is to select the parts of profiles which are the most informative about class membership.
• Feature = an individual intensity measurement.
• Some features may be redundant.
• Some features may be noise.
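If one wanted to prototype the filtering idea just described, a simple univariate score per m/z value is a common choice. The sketch below is only an illustration, not the method used in these slides; the matrix X and labels y are hypothetical stand-ins for the pancreatic profiles, and the choice of k is arbitrary.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical stand-in for the pancreatic dataset: 116 profiles x 60,270 m/z intensities.
rng = np.random.default_rng(0)
X = rng.normal(size=(116, 60270))
y = np.array([0] * 57 + [1] * 59)           # 57 controls, 59 cases

# Keep the k intensities whose class-conditional means differ most (ANOVA F-score).
selector = SelectKBest(score_func=f_classif, k=500)
X_reduced = selector.fit_transform(X, y)
print(X_reduced.shape)                      # (116, 500)
```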

Feature Construction
As opposed to the feature filtering approaches above, a new set of features can be constructed to represent the profiles. Techniques such as Principal Component Analysis (PCA) or Independent Component Analysis (ICA) are suited to this task.
PCA finds projections of the high-dimensional proteomic data into a low-dimensional subspace. The projection retains maximal variance, so that enough dispersion remains between the classes for a decision boundary to be drawn. An additional benefit of PCA is that it takes sets of correlated features and constructs new, orthogonal features (components) that are uncorrelated, yet explain most of the variance in the data.
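A minimal PCA sketch of this feature-construction idea, again with a random stand-in matrix in place of the real profiles; n_components=10 is an illustrative choice, not a value from the slides.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(116, 2000))            # stand-in for the (wide) profile matrix

# Project onto the top principal components: uncorrelated new features
# that retain as much of the variance as possible.
pca = PCA(n_components=10)
X_pca = pca.fit_transform(X)
print(X_pca.shape)                          # (116, 10)
print(pca.explained_variance_ratio_.sum())  # fraction of variance retained
```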

Creating clustered relations

Mixture Models
Let X = {x_1, …, x_n} be a set of n datapoints. Assume each x is generated from a mixture of m components M = {c_1, …, c_m}, so that
P(x) = Σ_j P(c_j) P(x | c_j).
This is a mixture model with m components.

Determining component responsibility
Using Bayes' theorem,
P(c_j | x) = P(x | c_j) P(c_j) / Σ_k P(x | c_k) P(c_k).
Interpret P(c_j) as the prior probability of component j being "turned on".
Interpret P(x | c_j) as a basis function which describes the behavior of x given by the component c_j.
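The Bayes computation above can be written in a few lines of numpy. Everything here (the toy data, the unit-covariance Gaussian basis functions, the priors) is an illustrative assumption, not taken from the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Toy setup: n = 100 datapoints in 2 dimensions, m = 3 components.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
means = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 1.5]])
priors = np.array([0.5, 0.3, 0.2])                      # P(c_j)

# Basis functions P(x | c_j): here, unit-covariance Gaussian densities.
lik = np.column_stack([multivariate_normal(mean=mu).pdf(X) for mu in means])

# Bayes' theorem: P(c_j | x) = P(x | c_j) P(c_j) / sum_k P(x | c_k) P(c_k).
joint = lik * priors
responsibilities = joint / joint.sum(axis=1, keepdims=True)
print(responsibilities.shape)               # (100, 3): one "clustered feature" per component
```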

Component Responsibility = Clustering Idea: Use the component responsibilities as features

Changing the basis functions
Easy thing to do: say x is generated from a mixture of m Gaussians, so that P(x | c_j) = N(x; μ_j, Σ_j).
Plug this back into the mixture model equation to get the "Mixture of Gaussians" model:
P(x) = Σ_j P(c_j) N(x; μ_j, Σ_j).

Mixture of Gaussians
Computation of the posterior P(c_j | x) depends on μ_j and Σ_j.
• It may not assign proper "credit" to the j-th component.
Solution: incorporate a hidden indicator variable z_j, which indicates whether or not x was generated by component c_j.
Interpretation: z_j ∈ {0, 1}, and its expected value given x is the responsibility P(c_j | x).

Mixture of Gaussians & EM Algorithm
Since z is unknown, we use the EM algorithm: the E-step computes the expected values of z (the responsibilities), and the iterations increase the observed-data likelihood (ODL).
In the M-step, we calculate the most likely values for the parameters of the m components.

Mixture of Gaussians: M-Step
With γ_ij = P(c_j | x_i) the responsibilities from the E-step, the updates are:
Mean: μ_j = Σ_i γ_ij x_i / Σ_i γ_ij
(Co)Variance: Σ_j = Σ_i γ_ij (x_i − μ_j)(x_i − μ_j)^T / Σ_i γ_ij
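The full EM loop for a mixture of Gaussians can be sketched directly from the E-step and M-step formulas above. This is a generic toy implementation under assumed initialization choices (random means, shared starting covariance, small ridge for numerical stability), not the presenter's code.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, size=(60, 2)), rng.normal(2, 1, size=(60, 2))])
n, d = X.shape
m = 2                                             # number of components

# Initialization
priors = np.full(m, 1.0 / m)
means = X[rng.choice(n, m, replace=False)]
covs = np.stack([np.cov(X.T) for _ in range(m)])

for _ in range(50):
    # E-step: responsibilities gamma_ij = P(c_j | x_i)
    lik = np.column_stack([multivariate_normal(means[j], covs[j]).pdf(X) for j in range(m)])
    gamma = lik * priors
    gamma /= gamma.sum(axis=1, keepdims=True)

    # M-step: re-estimate priors, means and covariances
    Nj = gamma.sum(axis=0)                        # effective counts per component
    priors = Nj / n
    means = (gamma.T @ X) / Nj[:, None]
    for j in range(m):
        diff = X - means[j]
        covs[j] = (gamma[:, j, None] * diff).T @ diff / Nj[j] + 1e-6 * np.eye(d)

features = gamma                                  # responsibilities used as clustered features
print(features[:3].round(3))
```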

Slight modification…
Assume that the Gaussian components are all hyperspherical, that is, Σ_j = σ²I, and let the responsibility z_c collapse to a hard indicator I(c | x) (e.g. by letting σ → 0).
The result? The K-means algorithm.
The features? The values where I(c | x) = 1, i.e. the hard cluster assignments.
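As a sketch of this hard-assignment limit, K-means cluster indicators can be built with scikit-learn; the one-hot encoding below plays the role of the I(c | x) features described above, with toy data standing in for the profiles and an arbitrary number of clusters.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(116, 20))                  # stand-in for reduced profiles

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)

# Hard assignments: I(c | x) is 1 for the winning cluster and 0 otherwise,
# giving a one-hot "clustered feature" vector per profile.
indicators = np.eye(kmeans.n_clusters)[kmeans.labels_]
print(indicators[:3])
```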

Maximum-Likelihood (ML) Factor Analysis
Now let x be a linear combination of j factors z = {z_1, …, z_j} plus some noise u:
x = Λz + u.
Columns of Λ represent sources from which x is generated.
• This is "normal" factor analysis.
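A brief sketch of maximum-likelihood factor analysis as implemented in scikit-learn, with a random stand-in for the profiles; the recovered factor scores and loading matrix correspond to z and Λ in the x = Λz + u model above, and the number of factors is an arbitrary choice.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(116, 200))                 # stand-in for (already reduced) profiles

# x = Lambda z + u: estimate j latent factors and use the factor scores as features.
fa = FactorAnalysis(n_components=5, random_state=0)
Z = fa.fit_transform(X)                         # estimated factors z for each profile
Lambda = fa.components_.T                       # loading matrix (columns = sources)
print(Z.shape, Lambda.shape)                    # (116, 5) (200, 5)
```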

Mixture of Factor Analyzers
Let x be generated from the z factors, but allow the factors to spread across m loading matrices. Here, the component c_j is something of an indicator variable, so we search for E_{c,z}[c_j, z | x].
The features are then computed as the weighted posteriors of c_j conditioned on x, with P(z | x) as the weight.

Evaluation
Step 1: Divide the data into training and testing sets.
Step 2: Compute the clustered features on the training set.
Step 3: Map the samples in the testing set onto the clusters learned from the training set.
Step 4: Classify the samples using an SVM.
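These four steps map onto a short scikit-learn pipeline. The sketch below uses a Gaussian mixture for the clustered features and random stand-in data; the number of components, the covariance type, the split ratio, and the SVM kernel are all illustrative assumptions rather than the settings used in the talk.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Stand-in for already-reduced profiles: 116 samples, 57 controls / 59 cases.
rng = np.random.default_rng(0)
X = rng.normal(size=(116, 30))
y = np.array([0] * 57 + [1] * 59)

# Step 1: divide the data into training and testing sets.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

# Step 2: compute clustered features on the training set only.
gmm = GaussianMixture(n_components=4, covariance_type="diag", random_state=0).fit(X_tr)

# Step 3: map both sets onto the learned clusters (responsibilities as features).
F_tr, F_te = gmm.predict_proba(X_tr), gmm.predict_proba(X_te)

# Step 4: classify with an SVM and evaluate on the held-out set.
svm = SVC(kernel="linear").fit(F_tr, y_tr)
print(accuracy_score(y_te, svm.predict(F_te)))
```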

Mixture of Gaussians: results (plot omitted)

K-means: results (plot omitted)

Mixture of Factor Analyzers: results (plot omitted)

Summary & Comparison
• PCA is given as a baseline for "good performance".
• Mixture of Gaussians does well, but its behavior as more features are added is still unclear.
• K-means is somewhat competitive.
• MFA is likely too complicated for this task.

Conclusions
There are many ways to cluster features in order to discover regulated sources.
• Sources can be examined for domain-specific importance.
• Choosing the number of sources is an open problem.
Still, the performance of these techniques was not substantially better than simple PCA.
• Save yourself time and effort: go with a simple model.