Alignment and classification of time series gene expression in clinical studies, by Tien-ho Lin, Naftali Kaminski and Ziv Bar-Joseph.



Outline
Introduction
HMM for aligning time series gene expression
– Generative training of HMM
– Discriminative training of HMM

Introduction A growing number of expression datasets are measured as time series: – Utilize their unique features (temporal evolution of profiles) – Address their unique challenges (different response rates of patients in the same class)

Introduction We use HMMs with fewer states than time points, leading to an alignment of the different patient response rates. We develop a discriminative HMM classifier instead of the traditional generative HMM.

HMM for aligning time series gene expression Because the response rate differs and varies from patient to patient, we align each patient's time series gene expression to a common profile. For a classification task we generate two such HMMs, one for good responders and one for poor responders. To avoid overfitting, the covariance matrix is assumed to be diagonal.

HMM for aligning time series gene expression Three state-space topologies are considered. A state i has transitions to states i+1, i+2, …, i+J, where J is the maximum jump step. The first and third topologies can be used to align patients by modifying their transition probabilities based on the observed expression data.
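As a rough illustration (not the authors' code), a left-to-right topology with a maximum jump of J states could be initialized as below; the uniform row initialization and the inclusion of self-transitions are assumptions.

```python
import numpy as np

def left_to_right_transitions(n_states, max_jump):
    """Left-to-right transition matrix: state i may move to states
    i, i+1, ..., i+max_jump (self-transitions kept here as one possible
    variant of the topologies described above). Rows sum to 1."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states):
        last = min(i + max_jump, n_states - 1)
        A[i, i:last + 1] = 1.0           # allowed forward jumps
        A[i] /= A[i].sum()               # uniform initialization (assumption)
    return A

print(left_to_right_transitions(n_states=6, max_jump=2).round(2))
```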

HMM for aligning time series gene expression Time series gene expression of K patients: we measure the expression of G genes for each patient at T time points. An HMM with a multivariate Gaussian emission probability is trained for each class.
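Because the covariance matrix is diagonal, the per-state emission density factorizes over genes. A minimal sketch of the log emission probability, with illustrative variable names:

```python
import numpy as np

def log_emission(x, mu, sigma):
    """Log-probability of an expression vector x (length G) under a state's
    diagonal-covariance Gaussian with per-gene means mu and SDs sigma.
    With a diagonal covariance the genes are independent given the state,
    so the log-density is a sum over genes."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - mu)**2 / (2 * sigma**2))
```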

HMM for aligning time series gene expression Notation: a_ij is the transition probability from state i to state j. μ_j and σ_j are the mean and SD of the Gaussian distribution of state j; the mean and SD of gene g in state j are denoted μ_jg and σ_jg. γ_t^k(j) is the posterior probability of state j at time t of observation (patient) k, and ξ_t^k(i,j) is the probability of a transition from state i to state j at time t of observation k.

Generative training of HMM Given labeled expression data we can learn the parameters of an HMM using the Baum-Welch algorithm. Class assignment is based on maximum conditional likelihood. MLE is optimal if the true model is indeed the assumed HMM and there is infinite data. We would rather focus on the differences between positive and negative data than on the most visible features of each class.
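As an illustration of generative training and maximum-likelihood class assignment, here is a sketch using the generic hmmlearn library (not the authors' implementation); the data shapes, state count and the equal-class-prior assumption are mine.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM   # generic HMM library, not the paper's code

def train_generative(X, lengths, n_states):
    """Fit one class HMM by Baum-Welch (EM).
    X: stacked (sum(lengths), G) expression matrix for all patients of a class;
    lengths: number of time points per patient."""
    model = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
    model.fit(X, lengths)
    return model

def classify(x_patient, model_good, model_poor):
    """Assign the class whose HMM yields the higher log-likelihood
    (maximum conditional likelihood under equal class priors)."""
    return 1 if model_good.score(x_patient) > model_poor.score(x_patient) else 2
```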

Discriminative training of HMM Discriminative models are a class of models used in machine learning for modeling the dependence of an unobserved variable y on an observed variable x. Within a statistical framework, this is done by modeling the conditional probability distribution P(y|x), which can be used for predicting y from x. The HMMs for both classes are learned concurrently, and the parameters of one model are affected by the parameters estimated for the other model.

Discriminative training of HMM To model the difference between positive and negative examples, we need to optimize a discriminative criterion. We use the MMIE (maximum mutual information estimation) objective function, where C_k is the class (1 or 2) of patient k.
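The objective itself appeared only as an image on the original slide; as a hedged reconstruction, the standard MMIE (conditional likelihood) criterion it corresponds to is

F_{\mathrm{MMIE}}(\lambda_1, \lambda_2) = \sum_{k=1}^{K} \log \frac{P(O^{k} \mid \lambda_{C_k})\, P(C_k)}{\sum_{c \in \{1,2\}} P(O^{k} \mid \lambda_{c})\, P(c)}

where O^k is the observation sequence of patient k and λ_c is the HMM of class c. The denominator is the likelihood under the combined "denominator model" λ_den described on the next slide.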

Discriminative training of HMM The denominator is represented by the likelihood of a combined HMM, λ_den, which is called the denominator model. During training, the denominator model is constructed in each iteration after the class HMMs λ_1 and λ_2 are updated. While updating one class, the HMM for that class is called the numerator model.

Discriminative training of HMM E-step: the state and transition posteriors are computed for both the numerator and the denominator model. This estimation is similar to the one in the Baum-Welch algorithm.
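The E-step quantities (shown only as images on the slide) are the standard forward-backward posteriors, evaluated under both the numerator and the denominator model:

\gamma_t(j) = \frac{\alpha_t(j)\,\beta_t(j)}{\sum_{j'} \alpha_t(j')\,\beta_t(j')}, \qquad \xi_t(i,j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}

where α and β are the forward and backward variables and b_j(·) is the Gaussian emission density of state j.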

Discriminative training of HMM M-step: MMIE updates the parameters by moving them toward the positive examples and away from the denominator model.
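The update formula was an image on the slide; a hedged sketch of the extended Baum-Welch style mean update typically used for MMIE (D_E is the smoothing constant discussed on the next slide) is

\hat{\mu}_{jg} = \frac{\sum_{k,t} \gamma^{\mathrm{num},k}_{t}(j)\, x^{k}_{tg} \;-\; \sum_{k,t} \gamma^{\mathrm{den},k}_{t}(j)\, x^{k}_{tg} \;+\; D_E\, \mu_{jg}}{\sum_{k,t} \gamma^{\mathrm{num},k}_{t}(j) \;-\; \sum_{k,t} \gamma^{\mathrm{den},k}_{t}(j) \;+\; D_E}

Analogous updates apply to the variances and, with D_T, to the transition probabilities.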

Discriminative training of HMM – Smoothing constants D_E and D_T need to be added to both the numerator terms and the denominator terms to avoid negative transition probabilities or negative variances in the emission probabilities. – If the smoothing constants are too small, the update may not increase the (discriminative) objective function, but if they are too large, convergence will be too slow. – Empirically, twice the lower bound leads to fast convergence.

Gene selection for time series expression classification Gene selection is critical in clinical expression classification: – The number of patients is small compared to the number of genes, resulting in overfitting. – The small subset of genes that discriminates between the classes can lead to biomarker discovery.

Gene selection for time series expression classification There are two primary approaches: – The "wrapper" approach evaluates the classifier on different feature subsets and searches the space of all possible feature subsets. – The "filter" approach does not rely on the underlying classifier, but instead uses a simpler criterion to filter out irrelevant features.

Gene selection for time series expression classification We use a backward stepwise feature selection method that utilizes the alignment to the HMM profiles and is based on the recursive feature elimination (RFE) algorithm, termed HMM-RFE. – Train the classifier, eliminate the feature whose contribution to the discrimination is minimal, and repeat iteratively until the stopping criterion is met.
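A schematic of the HMM-RFE loop; train_models, gene_contribution and the stopping rule are hypothetical placeholders, not the authors' implementation.

```python
def hmm_rfe(genes, train_models, gene_contribution, min_genes=10):
    """Backward stepwise elimination: repeatedly train the two class HMMs on
    the current gene set, score each gene's contribution to discrimination,
    and drop the least useful gene until `min_genes` remain (illustrative
    stopping rule)."""
    selected = list(genes)
    while len(selected) > min_genes:
        model_good, model_poor = train_models(selected)          # retrain classifier
        scores = {g: gene_contribution(g, model_good, model_poor)
                  for g in selected}                              # e.g. d_g per gene
        worst = min(scores, key=scores.get)                       # least discriminative
        selected.remove(worst)
    return selected
```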

Gene selection for time series expression classification Since the covariance matrix is diagonal, gene expression levels are independent given the hidden states. Thus, if the states are known, the likelihood can be decomposed into terms involving each gene separately. We define d_g, the contribution of gene g to the log odds, as follows.
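The definition of d_g appeared only as an image on the slide; a hedged reconstruction consistent with the per-gene decomposition above, using the state assignments obtained by aligning patient k to each class model, is

d_g = \sum_{k=1}^{K} \sum_{t=1}^{T_k} \Big[ \log \mathcal{N}\big(x^{k}_{tg};\, \mu^{(C_k)}_{\hat{s}^{k}_t g},\, \sigma^{(C_k)}_{\hat{s}^{k}_t g}\big) \;-\; \log \mathcal{N}\big(x^{k}_{tg};\, \mu^{(\bar{C}_k)}_{\tilde{s}^{k}_t g},\, \sigma^{(\bar{C}_k)}_{\tilde{s}^{k}_t g}\big) \Big]

where \hat{s}^{k}_t and \tilde{s}^{k}_t denote the aligned states of patient k under its own class model and under the other class model, respectively.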

Gene selection for time series expression classification

Results The average expression of four genes in MS patients treated with IFNβ.

Results Simulated dataset – 100 patients: 50 were in Class 1 (good responders) and 50 in Class 2 (poor responders). – 100 genes were measured for each patient, with a maximum of 8 time points per patient. – For each gene g, the Class 1 response profile is generated by randomly selecting a segment of a sine wave, denoted as a function of time.

Results –10 out of the 100 genes to be differential, the other 90 where assigned the same valude for Class 2. a g is +5 or -5, and b g is uniformly selected at random between -0.1 and 0.3.

Results – A scaling value s_k between 0.5 and 1.5 for each patient k.
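A data generator consistent with these slides could look like the sketch below; the sine-segment lengths, the functional form of the Class 2 shift, the noise level, the number of time points per patient and the interpretation of s_k as a time-axis scaling are all assumptions, since the corresponding equations appeared only as images.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_diff, max_T = 100, 10, 8

# Class 1 profile per gene: a random segment of a sine wave
# (phase and segment-length distributions are assumptions).
phase  = rng.uniform(0, 2 * np.pi, n_genes)
length = rng.uniform(0.5, 2 * np.pi, n_genes)

def class1_profile(g, t):                 # t scaled to [0, 1]
    return np.sin(phase[g] + t * length[g])

# The first n_diff genes are differential; a_g in {+5, -5} and b_g ~ U(-0.1, 0.3)
# come from the slides, but the form of the Class 2 shift below is assumed.
a = rng.choice([+5.0, -5.0], n_diff)
b = rng.uniform(-0.1, 0.3, n_diff)

def class2_profile(g, t):
    if g < n_diff:
        return class1_profile(g, t) + a[g] * t + b[g]   # assumed shift form
    return class1_profile(g, t)                          # identical to Class 1

def simulate_patient(cls):
    """One patient's (T x G) expression matrix; s_k scales the time axis to
    mimic a patient-specific response rate (interpretation assumed)."""
    s_k = rng.uniform(0.5, 1.5)
    T = rng.integers(4, max_T + 1)                        # time points (assumption)
    t = np.clip(np.linspace(0, 1, T) * s_k, 0, 1)
    profile = class1_profile if cls == 1 else class2_profile
    noise = rng.normal(0, 0.1, size=(T, n_genes))         # noise level assumed
    return np.array([[profile(g, ti) for g in range(n_genes)] for ti in t]) + noise

good = [simulate_patient(1) for _ in range(50)]           # Class 1: good responders
poor = [simulate_patient(2) for _ in range(50)]           # Class 2: poor responders
```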

Results

MS dataset