Consistent probabilistic outputs for protein function prediction William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University of Washington

Outline: motivation and background; methods (shared base method, reconciliation methods); results.

The problem. Given: protein sequence, knockout phenotype, gene expression profile, protein-protein interactions, and phylogenetic profile. Predict: a probability for every term in the Gene Ontology. Challenges: heterogeneous data, missing data, multiple labels per gene, structured output.

Consistent predictions. "Cytoplasmic membrane-bound vesicle" (GO: ) is a "cytoplasmic vesicle" (GO: ); therefore the probability that protein X is a cytoplasmic membrane-bound vesicle must be less than or equal to the probability that protein X is a cytoplasmic vesicle.
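Stated as an inequality (a restatement of the slide; the notation Y_i for term membership is mine): if GO term i is a child of term j under the "is a" relation, then for a protein with data X,

```latex
\[
  p(Y_i = 1 \mid X) \;\le\; p(Y_j = 1 \mid X)
  \qquad \text{for every ``is a'' edge } i \to j \text{ in the ontology.}
\]
```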

Data sets

Kernels

SVM → Naïve Bayes. [Pipeline diagram: Data 1 ... Data 33 → SVM/AL 1 ... SVM/AL 33 → Probability 1 ... Probability 33 → product, plus Bayes' rule → a single probability. Distribution labels in the diagram: Gaussian, asymmetric Laplace.]
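A sketch of the "product, plus Bayes' rule" step as I read the diagram (the conditional-independence assumption across the 33 data sets is what makes this naïve Bayes; notation is mine): with class-conditional densities fitted to each SVM output x_1, ..., x_33,

```latex
\[
  p(Y = 1 \mid x_1, \dots, x_{33})
  \;=\;
  \frac{p(Y = 1)\,\prod_{k=1}^{33} p(x_k \mid Y = 1)}
       {\sum_{y \in \{0,1\}} p(Y = y)\,\prod_{k=1}^{33} p(x_k \mid Y = y)},
\]
```

where the Gaussian and asymmetric Laplace densities named in the diagram play the role of the class-conditional terms p(x_k | Y = y).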

SVM → logistic regression. [Pipeline diagram: Data 1 ... Data 33 → SVM 1 ... SVM 33 → logistic regressors → Predict 1 ... Predict 33 → a single probability.]
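A minimal sketch of the per-term calibration step, assuming a Platt-style fit of a one-dimensional logistic regressor to SVM decision values (the function name and the use of scikit-learn are illustrative, not taken from the slide):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def calibrate_svm_scores(svm_scores, labels):
    """Fit a 1-D logistic regressor mapping SVM margins to probabilities.

    svm_scores: (n_proteins,) decision values from one SVM for one GO term.
    labels:     (n_proteins,) 1 if the protein is annotated with the term, else 0.
    """
    model = LogisticRegression()
    model.fit(np.asarray(svm_scores).reshape(-1, 1), labels)
    return model

# Illustrative usage: probabilities for held-out proteins.
# calibrator = calibrate_svm_scores(train_scores, train_labels)
# probs = calibrator.predict_proba(test_scores.reshape(-1, 1))[:, 1]
```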

Reconciliation methods: 3 heuristic methods, 3 Bayesian networks, 1 cascaded logistic regression, 3 projection methods.

Heuristic methods Max: Report the maximum probability of self and all descendants. And: Report the product of probabilities of all ancestors and self. Or: Compute the probability that at least one descendant of the GO term is “on,” assuming independence. All three methods use probabilities estimated by logistic regression.
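A minimal sketch of the three heuristics, assuming the ontology is given as parent/child adjacency dictionaries and `p` maps each GO term to its logistic-regression probability (all names are illustrative; whether "Or" includes the term itself is my assumption):

```python
def ancestors(term, parents):
    """All ancestors of `term` under the is-a relation (transitive closure)."""
    seen, stack = set(), list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen

def descendants(term, children):
    """All descendants of `term` (transitive closure of the child relation)."""
    seen, stack = set(), list(children.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(children.get(t, ()))
    return seen

def reconcile_heuristic(p, parents, children, method="max"):
    """Apply one of the three heuristics to raw per-term probabilities `p`."""
    out = {}
    for term, prob in p.items():
        if method == "max":
            # Max: maximum probability of self and all descendants.
            out[term] = max([prob] + [p[d] for d in descendants(term, children) if d in p])
        elif method == "and":
            # And: product of the probabilities of self and all ancestors.
            value = prob
            for a in ancestors(term, parents):
                value *= p.get(a, 1.0)
            out[term] = value
        elif method == "or":
            # Or: probability that at least one term in the subtree is "on",
            # assuming independence (here the term itself is included).
            q = 1.0 - prob
            for d in descendants(term, children):
                q *= 1.0 - p.get(d, 0.0)
            out[term] = 1.0 - q
    return out
```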

Bayesian network Belief propagation on a graphical model with the topology of the GO. Given Y i, the distribution of each SVM output X i is modeled as an independent asymmetric Laplace distribution. Solved using a variational inference algorithm. “Flipped” variant: reverse the directionality of edges in the graph.
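As a sketch of the model the slide appears to describe (notation is mine): with Y_i the hidden membership indicator for GO term i and X_i the corresponding SVM output,

```latex
\[
  p(X, Y) \;=\; \prod_{i} p\!\left(Y_i \mid Y_{\mathrm{pa}(i)}\right)\,
                          p\!\left(X_i \mid Y_i\right),
\]
```

where pa(i) denotes the parents of term i in the GO graph and each p(X_i | Y_i) is an independent asymmetric Laplace density; the flipped variant defines pa(i) on the graph with reversed edges.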

Cascaded logistic regression Fit a logistic regression to the SVM output only for those proteins that belong to all parent terms. Models the conditional distribution of the term, given all parents. The final probability is the product of these conditionals:
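A hedged reconstruction of the product formula, which did not survive the transcript (writing anc(i) for the ancestors of term i):

```latex
\[
  p(Y_i = 1 \mid X)
  \;=\;
  \prod_{j \,\in\, \mathrm{anc}(i) \cup \{i\}}
    p\!\left(Y_j = 1 \,\middle|\, Y_{\mathrm{pa}(j)} = \mathbf{1},\, X\right),
\]
```

where each factor is the conditional fitted by the corresponding logistic regressor.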

Isotonic regression. Consider the squared Euclidean distance between two sets of probabilities. Find the set of probabilities closest to the logistic regression values that satisfies all the inequality constraints.
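In symbols (a sketch; p denotes the logistic-regression probabilities and q the reconciled ones):

```latex
\[
  \hat{q} \;=\; \operatorname*{arg\,min}_{q} \;\sum_i \left(q_i - p_i\right)^2
  \quad \text{subject to} \quad
  q_i \le q_j \ \text{for every ``is a'' edge } i \to j .
\]
```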

Kullback-Leibler projection: projection onto the set of distributions which factorize according to the ontology graph. Two variants, depending on the directions of the edges.
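As a sketch (the direction of the divergence is my assumption, not stated in the transcript): letting F(G) be the set of distributions that factorize according to the ontology graph G,

```latex
\[
  \hat{q} \;=\; \operatorname*{arg\,min}_{q \,\in\, \mathcal{F}(G)}
                \mathrm{KL}\!\left(p \,\middle\|\, q\right),
\]
```

with the two variants corresponding to G taken with its original or reversed edge directions.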

Hybrid method: replace the Bayesian log posterior for Y_i with the marginal log posterior (likelihood ratios) obtained from the logistic regression. Uses discriminative posteriors from logistic regression, but still uses a structural prior. (Diagram labels: BPAL, KLP, BPLR.)

Axes of evaluation: ontology (biological process, cellular compartment, molecular function); term size (3-10 proteins, 11-30 proteins, – proteins, – proteins); evaluation mode (joint evaluation, per protein, per term); recall (1%, 10%, 50%, 80%).

Legend: belief propagation, asymmetric Laplace; belief propagation, asymmetric Laplace, flipped; belief propagation, logistic regression; cascaded logistic regression; isotonic regression; logistic regression; Kullback-Leibler projection; Kullback-Leibler projection, flipped; naïve Bayes, asymmetric Laplace.

Joint evaluation, biological process ontology, large terms. Axes: precision = TP/(TP+FP) versus recall = TP/(TP+FN).

Biological process ontology

Molecular function ontology

Cellular compartment ontology

Conclusions: Joint evaluation. Reconciliation does not always help. Isotonic regression performs well overall, especially for recall > 20%. For lower recall values, both Kullback-Leibler projection methods work well.

Average precision per protein Biological process All term sizes

Biological process

Statistical significance Biological process Large terms

Biological process Large terms

Biological process: – proteins, 435 proteins, 239 proteins, 100 proteins.

Molecular function: – proteins, 142 proteins, 111 proteins, 35 proteins.

Cellular component: – proteins, 135 proteins, 171 proteins, 278 proteins.

Conclusions: per protein. Several methods perform well: unreconciled logistic regression, unreconciled naïve Bayes, isotonic regression, belief propagation with asymmetric Laplace. For small terms: for molecular function and biological process, we do not observe many significant differences; for cellular components, belief propagation with logistic regression works well.

Average precision per term Biological process All term sizes

Biological process: – terms, 435 terms, 239 terms, 100 terms.

Molecular function: – terms, 142 terms, 111 terms, 35 terms.

Cellular component: – terms, 97 terms, 48 terms, 30 terms.

Conclusions Reconciliation does not always help. Isotonic regression (IR) performs well overall. For small biological process and molecular function terms, it is less clear that IR is one of the best methods.

Acknowledgments Guillaume Obozinski Charles Grant Gert Lanckriet Michael Jordan The mousefunc organizers Tim Hughes Lourdes Pena-Castillo Fritz Roth Gabriel Berriz Frank Gibbons

Per term for small terms Biological process Molecular function Cellular component