Spectral Algorithms for Learning HMMs and Tree HMMs for Epigenetics Data
Kevin C. Chen, Rutgers University
Joint work with Jimin Song (Rutgers/Palantir), Kamalika Chaudhuri and Chicheng Zhang (UCSD)

Human Genome-wide Association Studies
~12,000 human disease SNPs are known, and follow-up studies of these SNPs are needed to advance personalized medicine. ~90% of these disease SNPs are not in genes.

Chromatin State Annotation via HMMs
Input: a set of chromatin marks in a cell type.
Output: hidden states (i.e., colors) representing biological features such as enhancers, promoters, etc.

Major Human Epigenomics Projects
Raw data = 3000X coverage of the human genome = 10 trillion bases.

Overview of the Talk
1. Spectral learning of HMMs for predicting chromatin states in a single cell type
   - Hsu, Kakade, Zhang, COLT 2009
   - Anandkumar, Ge, Hsu, Kakade, Telgarsky (AGHKT), Journal of Machine Learning Research 2014
   - Song and Chen, Genome Biology 2015
2. Extension to multiple cell types using HMMs with very large but structured state spaces
   - Biesinger et al., BMC Bioinformatics 2013
   - Zhang, Song, Chaudhuri, Chen, NIPS 2015
   - Song, Zhang, Chaudhuri, Chen, in review

Input and Output
Input: a single long sequence where at each position we observe a binary vector (one component per chromatin mark).
- We treat each binary vector as a single character when learning the parameters (see the encoding sketch below).
- But we display the learned parameters using the marginal probabilities of each component of the vector.
- For Tree HMMs, we have multiple such sequences, all related by a fixed tree.
Output: HMM parameters and a segmentation of the sequence.
- For Tree HMMs: multiple sets of HMM parameters and a segmentation for each sequence.
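As a minimal sketch of this encoding (hypothetical helper name, not the authors' code), a binary vector of M marks can be packed into one integer symbol in {0, ..., 2^M - 1}:

```python
import numpy as np

def encode_marks(mark_matrix):
    """Pack each row of a binary (positions x marks) matrix into a
    single integer symbol; an M-mark vector becomes a character in
    {0, ..., 2**M - 1}.  (Hypothetical helper for illustration.)"""
    mark_matrix = np.asarray(mark_matrix, dtype=np.int64)
    weights = 1 << np.arange(mark_matrix.shape[1])  # powers of two
    return mark_matrix @ weights

# Example: 8 chromatin marks -> an alphabet of 2**8 = 256 characters.
marks = np.random.randint(0, 2, size=(1000, 8))
symbols = encode_marks(marks)  # shape (1000,), values in 0..255
```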

Spectral Learning for HMMs
Expectation-Maximization is slow and only finds a local optimum, so we use the Method of Moments instead:
- The 2nd moment is a matrix of counts of pairs of consecutive observations.
- The 3rd moment is a tensor of counts of triples of consecutive observations.
(A sketch of these moment estimates follows.)
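A minimal sketch of the empirical moments (hypothetical helper; a small alphabet keeps the demo light, whereas the talk's alphabet has 256 characters):

```python
import numpy as np

def empirical_moments(symbols, n):
    """Normalized co-occurrence counts of consecutive observations
    from a single sequence of integer symbols in {0, ..., n-1}."""
    pairs = np.zeros((n, n))       # P[i, j] ~ Pr(x_t = i, x_{t+1} = j)
    triples = np.zeros((n, n, n))  # T[i, j, k] ~ Pr(x_t = i, ..., x_{t+2} = k)
    for a, b in zip(symbols, symbols[1:]):
        pairs[a, b] += 1
    for a, b, c in zip(symbols, symbols[1:], symbols[2:]):
        triples[a, b, c] += 1
    return pairs / pairs.sum(), triples / triples.sum()

symbols = np.random.randint(0, 16, size=10_000)  # stand-in data, n = 16
pairs, triples = empirical_moments(symbols, 16)
```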

Symmetrization, Orthogonalization and Tensor Decomposition
Assuming the observation matrices O_i are full rank, we can find linear transformations that symmetrize the pairs matrix and the triples tensor. Then:
- Orthogonalize the triples tensor using an SVD of the pairs matrix.
- Perform orthogonal tensor decomposition (see the sketch below).
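A compact sketch of the whitening and tensor power-iteration steps on synthetic stand-in moments (hypothetical helper names; the symmetrization step itself and the full deflation loop are omitted):

```python
import numpy as np

def whiten(pairs, m):
    """n x m whitening matrix W with W.T @ pairs @ W ~ I_m, from a
    rank-m SVD of the (already symmetrized) pairs matrix."""
    U, s, _ = np.linalg.svd(pairs)
    return U[:, :m] / np.sqrt(s[:m])

def tensor_power_iteration(T, n_iter=200, seed=0):
    """One robust eigenpair of a symmetric m x m x m tensor via
    repeated tensor-vector contraction and renormalization."""
    v = np.random.default_rng(seed).standard_normal(T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        v = np.einsum('ijk,j,k->i', T, v, v)
        v /= np.linalg.norm(v)
    return np.einsum('ijk,i,j,k->', T, v, v, v), v

# Synthetic rank-m symmetric moments standing in for the real ones.
n, m = 16, 4
rng = np.random.default_rng(1)
A, w = rng.random((n, m)), rng.random(m)
pairs = np.einsum('r,ar,br->ab', w, A, A)
triples = np.einsum('r,ar,br,cr->abc', w, A, A, A)

# Orthogonalize the triples tensor with an SVD of the pairs matrix,
# then extract one component (deflate and repeat for the rest).
W = whiten(pairs, m)
T_white = np.einsum('abc,ai,bj,ck->ijk', triples, W, W, W)
lam, v = tensor_power_iteration(T_white)
```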

Running Time Comparisons
The workstation version of Segway took 5 days to train the parameters.

Empirical Sample Complexity
- Theoretically, the sample complexity depends on the distribution of observations in the genome.
- Our data is power-law distributed with an exponent of ~2, which gives a low theoretical sample complexity (a rough diagnostic for this is sketched below).
- Empirically, prediction accuracy plateaus after using ~50% of the genome for training.
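One crude way to eyeball such an exponent from the encoded symbols (a rank-frequency regression, not a rigorous maximum-likelihood power-law fit; hypothetical helper):

```python
import numpy as np

def zipf_slope(symbols):
    """Slope of log(count) vs. log(rank) over the symbol frequencies.
    A Zipf slope near 1 corresponds to a probability distribution
    with a power-law exponent near 2."""
    counts = np.sort(np.bincount(symbols))[::-1].astype(float)
    counts = counts[counts > 0]
    ranks = np.arange(1, len(counts) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(counts), 1)
    return -slope
```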

Comparison of Marginal Emission Matrices

Enrichment of Disease-Associated Variants
Disease SNPs are more strongly enriched in chromatin state 20, which was found only by spectral learning.

Using the spectral algorithm as an initializer for a local search algorithm (EM)
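One way to wire this up (a sketch assuming hmmlearn >= 0.2.8 for CategoricalHMM; hmmlearn is an illustrative choice here, not necessarily the authors' implementation, and the spectral estimates below are uniform placeholders):

```python
import numpy as np
from hmmlearn import hmm  # assumed dependency

m, n = 6, 256  # hidden states and observed characters, as in the talk

# Placeholders for the spectral estimates of the start, transition and
# emission probabilities; in practice these come from the tensor
# decomposition step.
pi0 = np.full(m, 1.0 / m)
T0 = np.full((m, m), 1.0 / m)
O0 = np.full((m, n), 1.0 / n)

# init_params="" keeps hmmlearn from overwriting the spectral estimates.
model = hmm.CategoricalHMM(n_components=m, init_params="", n_iter=50)
model.startprob_, model.transmat_, model.emissionprob_ = pi0, T0, O0

# Baum-Welch (EM) refinement from the spectral starting point.
symbols = np.random.randint(0, n, size=10_000)  # stand-in data
model.fit(symbols.reshape(-1, 1))
states = model.predict(symbols.reshape(-1, 1))  # the segmentation
```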

Precision-recall curve for experimentally validated biological features

HMMs with Tree-Structured Hidden States
Next we consider HMMs whose hidden states are structured by a tree (e.g., a tree relating cell types). In our method, Spectral-Tree, we introduce three algorithmic ideas:
1. Separating root-to-leaf paths
2. Skeletensor construction
3. Product projections

Idea 1 – Separating Root-to-Leaf Paths
Let D be the number of nodes in the tree (e.g., 9). The naive algorithm runs an HMM with m^D possible states and n^D possible observations, for a run time of Ω(n^D m^D). Instead, we consider each root-to-leaf path separately (see the sketch below). This gives a run time of O(D N m^{3d} + D n^{2d} m^d), where d is the length of the longest root-to-leaf path (e.g., 2 or 3) and N is the number of samples.
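For concreteness, here is how the path decomposition might be enumerated (hypothetical tree representation with a parent map; not the authors' code):

```python
def root_to_leaf_paths(parent):
    """Enumerate root-to-leaf paths in a tree given as {child: parent};
    the root is the unique node whose parent is None."""
    children = {}
    for child, par in parent.items():
        if par is not None:
            children.setdefault(par, []).append(child)

    def walk(node, path):
        path = path + [node]
        if node not in children:  # leaf: one complete path
            yield path
        else:
            for c in children[node]:
                yield from walk(c, path)

    root = next(node for node, par in parent.items() if par is None)
    return list(walk(root, []))

# A small hypothetical cell-type tree: D = 5 nodes, longest path d = 3.
tree = {"root": None, "A": "root", "B": "root", "A1": "A", "A2": "A"}
print(root_to_leaf_paths(tree))
# [['root', 'A', 'A1'], ['root', 'A', 'A2'], ['root', 'B']]
```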

Idea 2 – Skeletensor Construction
We capture the effect of the root-to-leaf path and project it from a tensor of dimension n^d down to a tensor of dimension n (i.e., the size of one node), called the skeletensor. The symmetrization matrices take a particular form (given as an equation on the slide).

Idea 3 – Product Projections
We avoid constructing the n-dimensional (e.g., 256) tensors by projecting them down to dimension m (e.g., 6), as sketched below. This brings the runtime down to O(D N m^{2d} + D n^2 m + D m^{3d}). The key fact is that the observations at the nodes are conditionally independent given the hidden states.
These three algorithmic ideas work well when:
- the tree has low depth, and
- the number of observations is much larger than the number of hidden states (n >> m).
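A sketch of the trick in its simplest form (hypothetical helper; U stands in for an n x m projection matrix, e.g., the top-m left singular vectors of the pairs matrix): project each observation first, so only m x m x m statistics are ever accumulated.

```python
import numpy as np

def projected_triples(symbols, U):
    """Accumulate the projected third-moment tensor directly in the
    m-dimensional space, without ever forming the n x n x n tensor:
    each observed symbol a contributes its projected row U[a]."""
    m = U.shape[1]
    T = np.zeros((m, m, m))
    for a, b, c in zip(symbols, symbols[1:], symbols[2:]):
        T += np.einsum('i,j,k->ijk', U[a], U[b], U[c])
    return T / (len(symbols) - 2)

n, m = 256, 6
U = np.linalg.qr(np.random.randn(n, m))[0]     # stand-in projection
symbols = np.random.randint(0, n, size=5_000)  # stand-in data
small = projected_triples(symbols, U)          # shape (6, 6, 6)
```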

Comparison of Running Times
- The naive exponential algorithm ran out of memory.
- A Structured Mean Field (SMF) variational Expectation-Maximization method, which additionally constrains every node to share the same parameters, took 13 hours.
- Our algorithm took 22 minutes.

Comparison of Emission Parameters

Prediction accuracy for experimentally validated promoters (F1 scores)

Cell Type Specific Emission Parameters

Spectral Learning is Well-Suited for Genomics
1. Large sample sizes in genomics
   - One copy of the human genome = 3 billion bases.
   - Efficient algorithms are needed.
   - There are enough samples to compute Method of Moments estimators.
2. Biological interpretability
   - Simple models (e.g., Hidden Markov Models) seem quite close to the biology.
   - Parameter estimation and interpretation are important; Neural Networks and Observable Operator Models do not allow for this.
3. Class imbalance is pervasive in genomics
   - Often, a few hidden states explain most of the data, i.e., high class imbalance.
   - Spectral learning is more robust to class imbalance than maximum likelihood.