Multi-task learning approaches to modeling context-specific networks


Multi-task learning approaches to modeling context-specific networks Sushmita Roy sroy@biostat.wisc.edu Computational Network Biology Biostatistics & Medical Informatics 826 Computer Sciences 838 https://compnetbiocourse.discovery.wisc.edu Oct 18th 2016

Strategies for capturing dynamics in networks
- Skeleton network-based approaches (Oct 13th)
- Dynamic Bayesian Networks (Oct 13th/18th)
- Multi-task learning approaches (Oct 20th)
- Input/Output Hidden Markov Models (Oct 25th)

Goals for today
- Graphical Gaussian Models (GGMs)
- Defining multi-task learning for dynamic network inference
- Learning multiple GGMs for hierarchically related tasks: the Gene Network Analysis Tool (GNAT), Pierson et al., 2015, PLOS Computational Biology
- Applications to inference of tissue-specific networks

Recall the univariate Gaussian distribution. The Gaussian distribution is defined by two parameters: the mean \mu and the standard deviation \sigma.
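For reference, the density this slide refers to (the slide's equation did not survive the transcript; this is the standard form):

P(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)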

A multivariate Gaussian distribution extends the univariate distribution to higher dimensions (p, in our case). As in the univariate case, we have two parameters: the mean, a p-dimensional vector \mu, and the covariance, a p x p matrix \Sigma. Each entry of the covariance matrix specifies the variance of, or covariance between, a pair of dimensions.
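The corresponding p-variate density, with mean vector \mu and covariance matrix \Sigma (the standard form; the slide's equation was an image):

P(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu) \right)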

A two-dimensional Gaussian distribution: the probability density is determined by the mean vector and the covariance matrix, with variances on the diagonal and covariances off the diagonal. (Image from Mathworks.com.)

Graphical Gaussian Models (GGMs). A GGM is an undirected probabilistic graphical model whose graph structure encodes conditional independencies among the variables. The GGM assumes that X is drawn from a p-variate Gaussian distribution with mean \mu and covariance \Sigma. The graph structure specifies the zero pattern in the inverse covariance (precision) matrix: zero entries in the inverse imply the absence of the corresponding edge in the graph.
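The precise statement of the zero-pattern property, with \Theta = \Sigma^{-1} denoting the precision matrix:

X_i \perp X_j \mid X_{\text{rest}} \iff \Theta_{ij} = 0 \iff \text{no edge between } i \text{ and } j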

Absence of edges and the zero pattern of the precision matrix. Why does a zero entry in the precision matrix correspond to a missing edge? To see this, we need a few rules about the matrix trace.

Matrix trace properties. The trace of a p x p square matrix M is the sum of its diagonal elements: \mathrm{tr}(M) = \sum_i M_{ii}. For a product of two matrices, the trace is invariant to the order of multiplication: \mathrm{tr}(AB) = \mathrm{tr}(BA). For a scalar a, \mathrm{tr}(aM) = a\,\mathrm{tr}(M). The trace is additive: \mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B).

Joint probability of a sample from a GGM. It is easier to work with the log of the density.
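Written in terms of the precision matrix \Theta = \Sigma^{-1}, the log density of a single sample x is:

\log P(x \mid \mu, \Theta) = \frac{1}{2}\log|\Theta| - \frac{1}{2}(x-\mu)^T \Theta\, (x-\mu) - \frac{p}{2}\log(2\pi)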

Joint probability of a sample from a GGM (contd). The quadratic term in the log density can be rewritten as a sum over pairs of variables; the term for the pair x_i, x_j is weighted by \Theta_{ij} and is 0 when \Theta_{ij} = 0, so that pair makes no contribution.
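The expansion referenced on this slide:

(x-\mu)^T \Theta\, (x-\mu) = \sum_{i=1}^{p} \sum_{j=1}^{p} \Theta_{ij}\,(x_i-\mu_i)(x_j-\mu_j)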

Data likelihood from a GGM. The data likelihood of a dataset D = {x_1, ..., x_m} with m different samples from a GGM is the product of the per-sample densities (a sum in log space).
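Assuming the m samples are independent, the log-likelihood is the sum of the per-sample log densities:

\log P(D \mid \mu, \Theta) = \frac{m}{2}\log|\Theta| - \frac{1}{2}\sum_{i=1}^{m} (x_i-\mu)^T \Theta\, (x_i-\mu) + \text{const}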

Rewriting the GGM likelihood with a trace. Using the properties of the trace, the sum of quadratic terms collapses into a single trace involving the sample covariance matrix. This formulation is convenient because we can now think of the entries of \Theta as weights to be chosen to maximize the objective.
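Concretely, using \mathrm{tr}(AB) = \mathrm{tr}(BA) and additivity of the trace, with S = \frac{1}{m}\sum_i (x_i-\mu)(x_i-\mu)^T the sample covariance:

\sum_{i=1}^{m} (x_i-\mu)^T \Theta\, (x_i-\mu) = \sum_{i=1}^{m} \mathrm{tr}\left( \Theta\,(x_i-\mu)(x_i-\mu)^T \right) = m\,\mathrm{tr}(\Theta S)

so the log-likelihood is, up to constants, \frac{m}{2}\left( \log|\Theta| - \mathrm{tr}(\Theta S) \right).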

Learning a Graphical Gaussian Model. Learning the structure of a GGM entails estimating which entries in the inverse of the covariance matrix are non-zero; these correspond to direct dependencies between pairs of random variables.

Learning a GGM requires solving the maximum-likelihood optimization problem \max_\Theta \log|\Theta| - \mathrm{tr}(S\Theta). If we want the inverse covariance to be sparse, we can add a regularization term, giving \max_\Theta \log|\Theta| - \mathrm{tr}(S\Theta) - \lambda\|\Theta\|_1. This is the idea behind the Graphical LASSO algorithm and also the GNAT approach.
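A minimal sketch of this sparse estimation step in Python, using scikit-learn's GraphicalLasso; the toy data, variable names, and alpha value are illustrative, not from the lecture:

import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
expr = rng.standard_normal((50, 10))   # 50 samples x 10 genes (toy data)

model = GraphicalLasso(alpha=0.1)      # alpha is the L1 penalty strength (lambda)
model.fit(expr)

precision = model.precision_           # estimated inverse covariance (Theta)
# Non-zero off-diagonal entries of Theta correspond to edges in the GGM
edges = np.argwhere(np.triu(np.abs(precision) > 1e-8, k=1))
print(edges)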

Goals for today
- Define multi-task learning for dynamic network inference
- Graphical Gaussian Models (GGMs)
- Learning multiple GGMs for hierarchically related tasks: the Gene Network Analysis Tool (GNAT), Pierson et al., 2015, PLOS Computational Biology
- Applications to inference of tissue-specific networks

Consider the following problem. Suppose we had N time points, conditions, or cell types. We can measure p different entities in each cell type/individual, and we can repeat this experiment several (m) times. We wish to identify, for each cell type/individual, the network that produces the p different measurements.

Multi-task learning (MTL). Suppose we have T different tasks that we want to solve; for example, each task could be a regression task predicting the expression level of a gene from the expression of its regulators. Suppose we also know that the tasks are related. Multi-task learning aims to solve these T tasks simultaneously while sharing information between them; different MTL frameworks share information in different ways. MTL is especially useful when each task has few samples. We will look at a particular example of MTL.

GNAT. Given: gene expression measurements from multiple tissues, with several samples per tissue, and a tissue hierarchy relating the tissues. Do: learn a gene co-expression network for each tissue. The naive approach is to learn the co-expression network in each tissue independently, but some tissues have only about two dozen samples (n << p). The key idea of GNAT is to exploit the tissue hierarchy to share information between the tissue-specific co-expression networks. Pierson et al., PLOS Computational Biology, 2015.

GNAT. Each tissue's gene network is a co-expression network: a Graphical Gaussian Model (GGM). Learning a GGM is equivalent to estimating the non-zero entries in the inverse of the covariance matrix (the precision matrix). Information is shared along the hierarchy by constraining the precision matrices of tissues that are close on the hierarchy to be more similar to each other.

Hierarchically related GGM learning tasks: a hierarchy with leaf tasks 1-4, each measuring the same p genes with m1, m2, m3, m4 samples respectively, and internal nodes 5-7; a precision matrix is estimated at every node.

GNAT objective function. The objective combines, for each task, the data likelihood, a sparsity penalty on the precision matrix, and a penalty that encourages similarity with the precision matrix of the parent node in the hierarchy. K is the total number of tasks and mk is the number of samples in task k. GNAT does not optimize this objective directly, but rather applies a two-step iterative algorithm.
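A hedged reconstruction of the objective (following the penalized-likelihood form described on this slide; the exact weighting of the terms should be checked against Pierson et al.):

\max_{\Theta_1,\ldots,\Theta_K} \; \sum_{k=1}^{K} m_k \left( \log|\Theta_k| - \mathrm{tr}(S_k \Theta_k) \right) \;-\; \lambda_1 \sum_{k} \|\Theta_k\|_1 \;-\; \lambda_2 \sum_{k} \|\Theta_k - \Theta_{\mathrm{par}(k)}\|_F^2

where S_k is the sample covariance for task k and par(k) is the parent of k in the hierarchy.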

Two-step iterative algorithm in GNAT. For each dataset/tissue at a leaf node k, learn an initial precision matrix \Theta_k. Then repeat until convergence: (1) optimize the internal matrices \Theta_p for all ancestral nodes p, keeping the leaf matrices fixed; because of the L2 penalty, this step can be computed analytically; (2) optimize the leaf matrices \Theta_k using their combined objective function. A sketch of this alternation appears after this slide.
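A minimal sketch of the alternating scheme, assuming a binary hierarchy given as a dict mapping each internal node to its two children, and a hypothetical fit_leaf() routine that solves the penalized graphical-lasso subproblem for one leaf (both are illustrative assumptions, not GNAT's actual code):

def fit_hierarchy(leaf_ids, internal_ids, children, fit_leaf, init, n_iter=20):
    """Alternate between analytic updates of the internal (ancestral) precision
    matrices and penalized refits of the leaf precision matrices.
    children: dict mapping each internal node to its (left, right) children.
    fit_leaf: callable(leaf_id, parent_theta) -> new Theta for that leaf.
    init: dict of initial precision matrices (numpy arrays) for the leaves."""
    theta = dict(init)
    parent_of = {c: p for p, cs in children.items() for c in cs}
    for _ in range(n_iter):
        # Step 1: each internal node is set to the average of its children
        # (the closed-form update under the L2 similarity penalty);
        # internal_ids is assumed ordered from the leaves upward.
        for p in internal_ids:
            left, right = children[p]
            theta[p] = 0.5 * (theta[left] + theta[right])
        # Step 2: refit each leaf, penalizing distance to its parent's matrix.
        for k in leaf_ids:
            theta[k] = fit_leaf(k, theta[parent_of[k]])
    return theta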

Updating the ancestral nodes. To estimate the ancestral precision matrices, we differentiate the objective with respect to each \Theta_p and set the derivative to zero. It turns out that the optimal ancestral matrix is the average of its child matrices: \Theta_p = (\Theta_{left(p)} + \Theta_{right(p)}) / 2, where left(p) and right(p) are the left and right children of p.

Key steps of the GNAT algorithm:
1. Define the tissue hierarchy based on gene expression levels.
2. Learn a co-expression network in each leaf tissue.
3. Infer the networks at the internal nodes and update the leaf networks; the network at an internal node is defined by an "average" of the networks of its children.
4. Report the final inferred networks.

Results. The authors ask whether sharing information helps, first on simulated data, and then by applying GNAT to multi-tissue expression data from the GTEx consortium: 35 different human tissues, with the hierarchy learned from the expression matrix.

Simulation experiment. The data from each tissue are split into five different sets; networks are learned from four of the five sets per tissue, and the data likelihood is assessed on the held-out set. Baselines: learning each tissue independently, and merging all datasets to learn a single model. This is repeated for three gene sets.

Does sharing information help? For each of the three gene sets, test data likelihood is computed using 5-fold cross-validation. The likelihood of the single merged network was too low to be shown.

Tissue hierarchy used. The hierarchy was created using hierarchical clustering: the mean expression of each gene was computed per tissue, and tissues with similar mean expression vectors were merged into clusters. Lower branching points represent clusters with more similar gene expression patterns. Many biologically plausible clusters are apparent: the brain and non-brain clusters, and clusters for the basal ganglia, cortex, adipose tissue, heart, artery, and skin.

Biological assessment of the networks. Pairs of genes predicted to be connected were shown to be co-expressed in third-party expression databases, including a tissue-specific expression database; genes predicted to be linked in a specific tissue were 10 times more likely to be co-expressed in that tissue. The authors also tested whether genes linked in the networks were associated with shared biological functions: genes that shared a function were linked 94% more often than genes not sharing a function.

Examining tissue-specific properties of the networks. In the figure, the brighter the green, the more highly expressed the gene; blue circles denote transcription factors (TFs). Transcription factors specific to a tissue tend to have many connections and to connect to other genes specific to that tissue.

Defining tissue-specific and shared gene modules. A clustering algorithm is used to group genes into modules based on the graph structure (we will see algorithms for this type of graph clustering later in the course). For each module, conservation in other tissues is assessed based on the fraction of its links that are present among the same genes in other tissues.

Modules captured tissue-specific functions. For example, an important immune-related module is associated with the blood-specific transcription factor GATA3; GATA3 and RUNX3 coordinately interact with other tissue-specific genes.

Take-away points:
- Graphical Gaussian models can be used to capture direct dependencies.
- Learning a GGM entails estimating the precision matrix.
- Dynamics of networks: how networks change across different tissues.
- GNAT: a multi-task learning approach to learn tissue-specific networks. Each task maps to learning one GGM, and information is shared between tasks using the hierarchy. GNAT generalizes well and infers biologically meaningful associations.