Identifying co-regulation using Probabilistic Relational Models, by Christoforos Anagnostopoulos (BA Mathematics, Cambridge University; MSc Informatics, Edinburgh University)


Identifying co-regulation using Probabilistic Relational Models by Christoforos Anagnostopoulos BA Mathematics, Cambridge University MSc Informatics, Edinburgh University supervised by Dirk Husmeier

General Problem Bringing together disparate data sources: Promoter sequence data ...ACGTTAAGCCAT... ...GGCATGAATCCC...

General Problem Bringing together disparate data sources: Promoter sequence data ...ACGTTAAGCCAT... ...GGCATGAATCCC... mRNA Gene expression data gene 1: overexpressed gene 2: overexpressed...

General Problem Bringing together disparate data sources: Promoter sequence data ...ACGTTAAGCCAT... ...GGCATGAATCCC... mRNA Gene expression data gene 1: overexpressed gene 2: overexpressed... Proteins Protein interaction data (protein 1 / protein 2 / ORF 1 / ORF 2): AAC1 / TIM10 / YMR056C / YHR005CA; AAD6 / YNL201C / YFL056C / YNL201C

Our data Promoter sequence data ...ACGTTAAGCCAT... ...GGCATGAATCCC... mRNA Gene expression data gene 1: overexpressed gene 2: overexpressed...

Bayesian Modelling Framework Bayesian Networks

Bayesian Modelling Framework Bayesian Networks Conditional Independence Assumptions Factorisation of the Joint Probability Distribution UNIFIED TRAINING
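The factorisation this slide refers to is the standard Bayesian-network decomposition, in which each variable depends only on its parents in the graph:

```latex
P(X_1, \dots, X_n) \;=\; \prod_{i=1}^{n} P\big(X_i \mid \mathrm{Pa}(X_i)\big)
```

It is exactly this factorisation that makes unified training of the whole model tractable.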

Bayesian Modelling Framework Bayesian Networks Probabilistic Relational Models

Aims for this presentation: 1. Briefly present the Segal model and the main criticisms offered in the thesis 2. Briefly introduce PRMs 3. Outline directions for future work

The Segal Model Cluster genes into transcriptional modules... Module 1 gene Module 2 ?

The Segal Model Module 1 gene Module 2 P(M = 1)P(M = 2)

The Segal Model How to determine P(M = 1)? Module 1 gene P(M = 1)

The Segal Model How to determine P(M = 1)? Module 1 Motif Profile motif 3: active motif 4: very active motif 16: very active motif 29: slightly active gene

The Segal Model How to determine P(M = 1)? Module 1 Motif Profile motif 3: active motif 4: very active motif 16: very active motif 29: slightly active Predicted Expression Levels Array 1: overexpressed Array 2: overexpressed Array 3: underexpressed... gene

The Segal Model How to determine P(M = 1)? Module 1 Motif Profile motif 3: active motif 4: very active motif 16: very active motif 29: slightly active Predicted Expression Levels Array 1: overexpressed Array 2: overexpressed Array 3: underexpressed... gene P(M = 1)

The Segal model PROMOTER SEQUENCE

The Segal model PROMOTER SEQUENCE MOTIF PRESENCE

The Segal model PROMOTER SEQUENCE MOTIF PRESENCE MOTIF MODEL

The Segal model MOTIF PRESENCE MODULE ASSIGNMENT

The Segal model MOTIF PRESENCE MODULE ASSIGNMENT REGULATION MODEL

The Segal model MODULE ASSIGNMENT EXPRESSION DATA

The Segal model MODULE ASSIGNMENT EXPRESSION DATA EXPRESSION MODEL

Learning via hard EM HIDDEN

Learning via hard EM Initialise hidden variables

Learning via hard EM Initialise hidden variables Set parameters to Maximum Likelihood

Learning via hard EM Initialise hidden variables Set parameters to Maximum Likelihood Set hidden values to their most probable value given the parameters (hard EM)

Learning via hard EM Initialise hidden variables Set parameters to Maximum Likelihood Set hidden values to their most probable value given the parameters (hard EM)
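A minimal sketch of this hard-EM loop, assuming for illustration that each module's expression model is a per-module mean profile with unit variance (a simplification of the Segal expression model; the function and variable names are hypothetical):

```python
import numpy as np

def hard_em(X, n_modules, n_iter=50, seed=0):
    """Hard EM: alternate ML parameter fits with hard assignments.

    X: (genes x arrays) expression matrix. Each module is modelled as a
    mean profile with unit variance, so the M-step is a per-module mean
    and the hard E-step a nearest-mean assignment.
    """
    rng = np.random.default_rng(seed)
    # Initialise hidden variables: random but balanced module assignments
    z = rng.permutation(np.arange(X.shape[0]) % n_modules)
    for _ in range(n_iter):
        # Set parameters to Maximum Likelihood (per-module means);
        # fall back to the global mean if a module is empty
        mu = np.stack([X[z == m].mean(axis=0) if np.any(z == m)
                       else X.mean(axis=0) for m in range(n_modules)])
        # Set hidden values to their most probable value (hard E-step)
        dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        z_new = dists.argmin(axis=1)
        if np.array_equal(z_new, z):
            break                      # assignments stable: converged
        z = z_new
    return z, mu
```

Hard EM trades the posterior averaging of full EM for speed: each gene is committed to a single module at every iteration rather than fractionally assigned.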

Motif Model OBJECTIVE: Learn the motif so as to discriminate between genes for which the Regulation variable is "on" (r = 1) and genes for which it is "off" (r = 0).

Motif Model – scoring scheme...CATTCC......TGACAA... high score: low score:

Motif Model – scoring scheme...CATTCC......TGACAA... high score: low score:...AGTCCATTCCGCCTCAAG... high scoring subsequences

Motif Model – scoring scheme...CATTCC......TGACAA... high score: low score:...AGTCCATTCCGCCTCAAG... high scoring subsequences low scoring (background) subsequences

Motif Model – scoring scheme...CATTCC......TGACAA... high score: low score:...AGTCCATTCCGCCTCAAG... high scoring subsequences low scoring (background) subsequences promoter sequence scoring

Motif Model SCORING SCHEME P(g.r = true | g.S, w) w: parameter set; can be taken to represent motifs

Motif Model SCORING SCHEME P(g.r = true | g.S, w) w: parameter set; can be taken to represent motifs Maximum Likelihood setting → most discriminatory motif
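A sketch of the kind of PSSM scoring the scheme relies on, assuming (hypothetically) that a promoter's score is the log-odds of its best window against a uniform background; `best_window_score` and its arguments are illustrative names, not the thesis's actual API:

```python
import numpy as np

BASES = {"A": 0, "C": 1, "G": 2, "T": 3}

def best_window_score(seq, pssm, background=0.25):
    """Log-odds score of the highest-scoring subsequence under a PSSM.

    pssm: (motif_length x 4) matrix of per-position base probabilities
    (playing the role of the parameter set w); the background is a
    uniform base model.
    """
    k = pssm.shape[0]
    best = -np.inf
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Sum of per-position log-odds against the background
        score = sum(np.log(pssm[j, BASES[b]] / background)
                    for j, b in enumerate(window))
        best = max(best, score)
    return best
```

Windows that score high under the PSSM relative to the background are the "high scoring subsequences" of the previous slides; everything else falls to the background model.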

Motif Model – overfitting TRUE PSSM

Motif Model – overfitting typical motif:...TTT.CATTCC... high score TRUE PSSM

Motif Model – overfitting typical motif:...TTT.CATTCC... high score TRUE PSSM INFERRED PSSM Can triple the score!

Regulation Model For each module m and each motif i, we estimate the association u_mi. P(g.M = m | g.R) is proportional to exp( Σ_i u_mi · g.r_i )

Regulation Model: Geometrical Interpretation The (u_mi)_i define separating hyperplanes. The classification criterion is the inner product: each datapoint is given the label of the hyperplane it lies furthest from, on its positive side.
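The classifier described above amounts to a softmax over inner products; a minimal sketch (function and variable names are illustrative):

```python
import numpy as np

def module_posterior(r, U):
    """Softmax over modules: P(M = m | r) ∝ exp(u_m · r).

    r: binary regulation vector (one entry per motif);
    U: (modules x motifs) matrix of association weights u_mi.
    The predicted module is the hyperplane whose positive side the
    point lies furthest from, i.e. the argmax of the inner products.
    """
    logits = U @ r                 # inner products u_m · r
    logits = logits - logits.max() # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()
```

The argmax of the logits recovers the hyperplane criterion; note that when the data are pairwise linearly separable, maximum likelihood can drive the u_mi to infinity, which is the divergence problem of the next slide.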

Regulation Model: Divergence and Overfitting Pairwise linear separability leads to overconfident classification. Method A: dampen the parameters (e.g. a Gaussian prior) Method B: make the dataset linearly inseparable by augmentation

Erroneous interpretation of the parameters Segal et al claim that: when u_mi = 0, motif i is inactive in module m; when u_mi > 0 for all i, m, only the presence of motifs is significant, not their absence

Erroneous interpretation of the parameters Segal et al claim that: when u_mi = 0, motif i is inactive in module m; when u_mi > 0 for all i, m, only the presence of motifs is significant, not their absence. These claims contradict the normalisation conditions!

Sparsity TRUE PROCESS INFERRED PROCESS

Sparsity Reconceptualise the problem: Sparsity can be understood as pruning. Pruning can improve generalisation performance (it deals with overfitting both by damping and by decreasing the degrees of freedom). Pruning ought not to be seen as a combinatorial problem, but can be dealt with via appropriate prior distributions.

Sparsity: the Laplacian How to prune using a prior: choose a prior with a simple discontinuity at the origin, so that the penalty term does not vanish near the origin; every time a parameter crosses the origin, establish whether it will escape the origin or is trapped in Brownian motion around it; if trapped, force both its gradient and value to 0 and freeze it. One can actively look for nearby zeros to accelerate the pruning rate.
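A sketch of one such pruning step under the Laplacian (L1) prior, following the recipe above; the update rule and escape test are an illustrative reconstruction, not the thesis's exact algorithm:

```python
import numpy as np

def laplacian_prune_step(u, grad_loglik, lam, lr=0.1):
    """One gradient step under a Laplacian (L1) prior, with pruning.

    The penalty gradient is ±lam away from the origin and does not
    vanish there. A parameter that crosses zero is trapped at the
    origin; a frozen parameter stays at zero unless the likelihood
    gradient is strong enough (|grad| > lam) to escape.
    """
    u = np.asarray(u, dtype=float).copy()
    frozen = u == 0.0
    # Penalised gradient step: likelihood gradient minus L1 penalty
    new = u + lr * (grad_loglik - lam * np.sign(u))
    crossed = np.sign(new) * np.sign(u) < 0
    new[crossed] = 0.0                          # trapped at the origin...
    escape = frozen & (np.abs(grad_loglik) > lam)
    new[frozen & ~escape] = 0.0                 # ...and stays frozen,
    new[escape] = lr * (grad_loglik[escape]     # unless it can escape
                        - lam * np.sign(grad_loglik[escape]))
    return new
```

Because the penalty does not vanish at zero, weights that the likelihood only weakly supports are driven exactly to zero and stay there, which is what produces the sparse, interpretable weight matrices of the results slides.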

Results: generalisation performance Synthetic Dataset with 49 motifs, 20 modules and 1800 datapoints

Results: interpretability TRUE MODULE STRUCTURE DEFAULT MODEL: LEARNT WEIGHTS LAPLACIAN PRIOR MODEL: LEARNT WEIGHTS

Regrets: BIOLOGICAL DATA

Aims for this presentation: 1. Briefly present the Segal model and the main criticisms offered in the thesis 2. Briefly introduce PRMs 3. Outline directions for future work

Probabilistic Relational Models How to model context-specific regulation? Need to cluster the experiments...

Probabilistic Relational Models Variable A can vary with genes but not with experiments

Probabilistic Relational Models We now have variability with experiments but also with genes!

Probabilistic Relational Models Variability with experiments as required but too many dependencies

Probabilistic Relational Models Variability with experiments as required provided we constrain the parameters of the probability distributions P(E|A) to be equal

Probabilistic Relational Models Resulting BN is essentially UNIQUE. But derivation: VAGUE, COMPLICATED, UNSYSTEMATIC

Probabilistic Relational Models GENES g.S 1, g.S 2,... g.R 1, g.R 2,... g.M g.E 1, g.E 2,... this variable cannot be considered an attribute of a gene, because it has attributes of its own that are gene-independent

Probabilistic Relational Models GENES g.S 1, g.S 2,... g.R 1, g.R 2,... g.M g.E 1, g.E 2,...

Probabilistic Relational Models GENES g.S 1, g.S 2,... g.R 1, g.R 2,... g.M g.E 1, g.E 2,... EXPERIMENTS e.Cycle_Phase e.Dye_Type

Probabilistic Relational Models GENES g.S 1, g.S 2,... g.R 1, g.R 2,... g.M g.E 1, g.E 2,... EXPERIMENTS e.Cycle_Phase e.Dye_Type An expression measurement is an attribute of both a gene and an experiment.

Probabilistic Relational Models GENES g.S 1, g.S 2,... g.R 1, g.R 2,... g.M g.E 1, g.E 2,... EXPERIMENTS e.Cycle_Phase e.Dye_Type MEASUREMENTS m(e,g).Level
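The relational schema on this slide can be sketched as plain record types, with the expression level keyed by the (gene, experiment) pair rather than attached to the gene alone; class and field names are illustrative:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Gene:
    name: str
    sequence: str                                    # g.S: promoter sequence
    regulation: List[bool] = field(default_factory=list)  # g.R_i per motif
    module: int = 0                                  # g.M: module assignment

@dataclass
class Experiment:
    name: str
    cycle_phase: str                                 # e.Cycle_Phase
    dye_type: str                                    # e.Dye_Type

@dataclass
class Measurement:
    level: float                                     # m(e, g).Level

class ExpressionData:
    """An expression level is an attribute of a (gene, experiment) pair."""
    def __init__(self):
        self.measurements: Dict[Tuple[str, str], Measurement] = {}

    def record(self, g: Gene, e: Experiment, level: float):
        self.measurements[(g.name, e.name)] = Measurement(level)
```

This is the sense in which a PRM is a higher-level description: the schema fixes which objects carry which attributes, and the ground Bayesian network is generated from it for any particular dataset.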

Examples of PRMs - 1 Segal et al, “From Promoter Sequence to Gene Expression”


Examples of PRMs - 2 Segal et al, “Decomposing gene expression into cellular processes”


Probabilistic Relational Models PRM = { BN 1, BN 2, BN 3,... } Given Dataset 1: PRM = BN 1. Given Dataset 2: PRM = BN 2. Relational schema: higher-level description of data. PRM: higher-level description of BNs.

Probabilistic Relational Models Relational vs flat data structures: natural generalisation (knowledge carries over); expandability; richer semantics (better interpretability); no loss in coherence. Personal opinion (not tested yet): not entirely natural as a generalisation; some loss in interpretability; some loss in coherence.

Aims for this presentation: 1. Briefly present the Segal model and the main criticisms offered in the thesis 2. Briefly introduce PRMs 3. Outline directions for future work

Future research 1. Improve the learning algorithm: 'soften' it by exploiting sparsity; systematise dynamic addition / deletion

Future research 2. Model Selection Techniques improve interpretability learn the optimal number of modules in our model

Future research 2. Model Selection Techniques improve interpretability learn the optimal number of modules in our model Are such methods consistent? Do they carry over just as well in PRMs?

Future research 3. Fine-tune the Laplacian regulariser to fit the skewing of the model

Future research 4. The choice of encoding the question into a BN/PRM is only partly determined by the domain. Are there any general 'rules' about how to restrict the choice so as to promote interpretability?

Future research 5. Explore methods to express structural, nonquantifiable prior beliefs about the biological domain using Bayesian tools.

Summary: 1. Briefly presented the Segal model and the main observations offered in the thesis 2. Briefly introduced PRMs 3. Hinted at directions for future work