CS 6293 Advanced Topics: Translational Bioinformatics Intro & Ch2 - Data-Driven View of Disease Biology Jianhua Ruan.


Road map
What is translational bioinformatics
Probability and statistics background
Data-driven view of disease biology
– Bayesian inference
– Network of functionally related genes
– Evaluation of the network

What is translational bioinformatics?
– Advances in biological technology (for high-throughput data collection) and computing technology (for cheap, efficient large-scale data storage, processing, and management) have shifted modern biomedical research towards integrative and translational approaches.
– Translational medical research: the process of moving discoveries and innovations generated during laboratory research and preclinical studies into trials and studies in humans, leading to improved diagnosis, prognosis, and treatment.
– Barriers to translating our molecular understanding into technologies that impact patients: understanding health market size and forces, the regulatory milieu, how to harden a technology for routine use, and how to navigate an increasingly complex intellectual property landscape.
Connecting the stuff of molecular biology to the clinical world.
– The book chapters in this PLoS Comput Biol collection deal mostly with computational methodologies that are likely to have an impact on clinical research and practice.

Topic 1: Network-based understanding of disease mechanisms
– Chapter 2: Data-Driven View of Disease Biology
– Chapter 4: Protein Interactions and Disease
– Chapter 5: Network Biology Approach to Complex Diseases
– Chapter 15: Disease Gene Prioritization

Topic 2: Drug design / discovery using computational / systems approaches
– Chapter 3: Small Molecules and Disease
– Chapter 7: Pharmacogenomics
– Chapter 17: Bioimage Informatics for Systems Pharmacology

Topic 3: Genome sequencing and disease
– Chapter 6: Structural Variation and Medical Genomics
– Chapter 12: Human Microbiome Analysis
– Chapter 14: Cancer Genome Analysis

Topic 4: Automated knowledge discovery and representation
– Chapter 8: Biological Knowledge Assembly and Interpretation
– Chapter 9: Analyses Using Disease Ontologies
– Chapter 13: Mining Electronic Health Records in the Genomics Era
– Chapter 16: Text Mining for Translational Bioinformatics

Ch2: Data-Driven View of Disease Biology
Diverse genome-scale datasets exist:
– Genome sequences
– Microarrays
– Genome-wide association studies
– RNA interference screens
– Proteomics databases
– Databases of gene functions, pathways, chemicals, protein interactions, etc.
These promise a systems-level understanding of disease mechanisms:
– Modeling (understanding)
– Inference (making predictions)
Integration is the key challenge:
– Experimental noise
– Biological heterogeneity, e.g. source of material: cells in culture or biopsied tissues?
– Computational heterogeneity, e.g. data format: discrete or continuous?

Bayesian Inference
– A powerful tool for making predictions from experimental evidence
– Simple yet elegant probabilistic theory
– Easy to understand and implement
– Data-driven modeling: no explicit assumptions about the underlying biological mechanisms

Probability Basics
Definition (informal):
– Probabilities are numbers assigned to events that indicate "how likely" it is that the event will occur when a random experiment is performed.
– A probability law for a random experiment is a rule that assigns probabilities to the events of the experiment.
– The sample space S of a random experiment is the set of all possible outcomes.

Example (axioms): 0 ≤ P(A_i) ≤ 1 for every event A_i, and P(S) = 1.

Random variable
A random variable is a function from the sample space to the space of possible values of the variable.
– When we toss a coin, the number of heads we see is a random variable.
– Can be discrete or continuous:
  the resulting number after rolling a die;
  the weight of an individual.

Cumulative distribution function (cdf)
The cumulative distribution function F_X(x) of a random variable X is defined as the probability of the event {X ≤ x}:
F_X(x) = P(X ≤ x) for −∞ < x < +∞

Probability density function (pdf)
The probability density function of a continuous random variable X, if it exists, is defined as the derivative of F_X(x):
f_X(x) = dF_X(x)/dx
For discrete random variables, the equivalent of the pdf is the probability mass function (pmf):
p_X(x) = P(X = x)

Probability density function vs. probability
What is the probability that somebody weighs exactly 200 lb? The figure shows a density of about 0.62 at 200 lb, but that is not a probability: for a continuous variable, the probability of any exact value is 0.
The right question is: what is the probability that somebody's weight falls in an interval around 200 lb (say, 199.5 to 200.5 lb)?
The probability mass function, in contrast, gives true probabilities: for a fair die, the chance of any face is 1/6.
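
A quick numerical illustration of the density-vs-probability distinction; a minimal sketch, where the weight distribution's mean and standard deviation are made-up values, not from the slides:

```python
from scipy.stats import norm

# Hypothetical weight distribution in lb (parameters invented for illustration).
weight = norm(loc=180, scale=25)

print(weight.pdf(200))                        # density at 200 lb: NOT a probability
print(weight.cdf(200.5) - weight.cdf(199.5))  # P(199.5 <= weight <= 200.5)
```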

Some common distributions
Discrete:
– Binomial
– Multinomial
– Geometric
– Hypergeometric
– Poisson
Continuous:
– Normal (Gaussian)
– Uniform
– Extreme value distribution (EVD)
– Gamma
– Beta
– …

Probabilistic Calculus
If A and B are mutually exclusive:
– P(A ∪ B) = P(A) + P(B)
Thus: P(not(A)) = P(A^c) = 1 − P(A)

Probabilistic Calculus
In general: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Conditional probability
The joint probability of two events A and B, P(A ∩ B) or simply P(A, B), is the probability that events A and B occur at the same time.
The conditional probability P(B|A) is the probability that B occurs given that A occurred.
P(A | B) = P(A ∩ B) / P(B)

Example
Roll a die.
– If I tell you the number is less than 4,
– what is the probability of an even number?
P(d = even | d < 4) = P(d = even ∩ d < 4) / P(d < 4)
= P(d = 2) / P(d = 1, 2, or 3) = (1/6) / (3/6) = 1/3
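
This can be checked by brute-force enumeration of the sample space; a minimal sketch:

```python
from fractions import Fraction

omega = range(1, 7)  # sample space of a fair six-sided die

def prob(event):
    # P(event) under a uniform distribution on omega
    return Fraction(sum(1 for d in omega if event(d)), len(omega))

p_joint = prob(lambda d: d % 2 == 0 and d < 4)  # P(even and d < 4) = 1/6
p_cond = p_joint / prob(lambda d: d < 4)        # conditional probability
print(p_cond)                                   # 1/3
```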

Independence
P(A | B) = P(A ∩ B) / P(B)  =>  P(A ∩ B) = P(B) P(A | B)
A and B are independent iff:
– P(A ∩ B) = P(A) P(B)
– That is, P(A) = P(A | B)
This also implies P(B) = P(B | A):
– P(A ∩ B) = P(B) P(A | B) = P(A) P(B | A)

Examples
Are the events d = even and d < 4 independent?
– P(d = even and d < 4) = 1/6
– P(d = even) = 1/2
– P(d < 4) = 1/2
– 1/2 * 1/2 = 1/4 ≠ 1/6, so they are not independent.
If your die actually has 8 faces, are d = even and d < 5 independent?
Are "even in the first roll" and "even in the second roll" independent?
For a playing card, are the suit and rank independent?
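
A small sketch to check these by enumeration; the independence test simply compares P(A ∩ B) with P(A) P(B):

```python
from fractions import Fraction

def prob(omega, event):
    return Fraction(sum(1 for d in omega if event(d)), len(omega))

def independent(omega, a, b):
    # A and B are independent iff P(A and B) == P(A) * P(B)
    return prob(omega, lambda d: a(d) and b(d)) == prob(omega, a) * prob(omega, b)

print(independent(range(1, 7), lambda d: d % 2 == 0, lambda d: d < 4))  # False: 1/6 != 1/4
print(independent(range(1, 9), lambda d: d % 2 == 0, lambda d: d < 5))  # True: 1/4 == 1/2 * 1/2
```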

Theorem of total probability
Let B_1, B_2, …, B_N be mutually exclusive events whose union equals the sample space S. We refer to these sets as a partition of S.
An event A can be represented as:
A = (A ∩ B_1) ∪ (A ∩ B_2) ∪ … ∪ (A ∩ B_N)
Since B_1, B_2, …, B_N are mutually exclusive:
P(A) = P(A ∩ B_1) + P(A ∩ B_2) + … + P(A ∩ B_N)
and therefore
P(A) = P(A|B_1) P(B_1) + P(A|B_2) P(B_2) + … + P(A|B_N) P(B_N) = Σ_i P(A | B_i) P(B_i)

Example
Roll a loaded die: 50% of the time d = 6, and 10% of the time for each of 1 to 5.
What is the probability of getting an even number?
P(even) = P(even | d < 6) P(d < 6) + P(even | d = 6) P(d = 6)
= 2/5 * 0.5 + 1 * 0.5 = 0.7
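
A quick check of this computation; the dictionary below encodes the loaded die from the slide:

```python
# Loaded die: P(6) = 0.5, P(d) = 0.1 for d = 1..5.
p = {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}

# Direct sum over even faces...
print(sum(p[d] for d in p if d % 2 == 0))        # 0.7

# ...agrees with the partition {d < 6, d = 6}:
p_even_lt6 = (p[2] + p[4]) / (1 - p[6])          # P(even | d < 6) = 2/5
print(p_even_lt6 * (1 - p[6]) + 1.0 * p[6])      # 0.7
```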

Another example
We have a box of dice: 99% of them are fair, with probability 1/6 for each face; 1% are loaded so that a six comes up 50% of the time.
We pick a die at random and roll it. What is the probability we will get a six?
P(six) = P(six | fair) P(fair) + P(six | loaded) P(loaded)
= 1/6 * 0.99 + 0.5 * 0.01 = 0.17 > 1/6
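
The same total-probability computation in a couple of lines:

```python
p_fair, p_loaded = 0.99, 0.01
p_six = (1 / 6) * p_fair + 0.5 * p_loaded  # total probability over die types
print(round(p_six, 3))                     # 0.17
```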

Bayes theorem
P(A ∩ B) = P(B) P(A | B) = P(A) P(B | A)
=> P(B | A) = P(A | B) P(B) / P(A)
– P(B | A): posterior probability of B
– P(A | B): likelihood
– P(B): prior of B
– P(A): normalizing constant
This is known as Bayes' theorem (or Bayes' rule), and it is (one of) the most useful relations in probability and statistics. It is the fundamental relation in statistical pattern recognition.

Bayes theorem (cont'd)
Given B_1, B_2, …, B_N, a partition of the sample space S, suppose that event A occurs; what is the probability of event B_j?
P(B_j | A) = P(A | B_j) P(B_j) / P(A) = P(A | B_j) P(B_j) / Σ_i P(A | B_i) P(B_i)
B_j: different models.
Having observed A, should you choose the model that maximizes P(B_j | A) or P(A | B_j)? It depends on how much you know about the B_j!
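
A generic sketch of Bayes' rule over a partition of models, reusing the fair/loaded die box from the previous slide as the example:

```python
def posteriors(prior, likelihood):
    """P(B_j | A) for every model B_j in a partition.

    prior[j] = P(B_j); likelihood[j] = P(A | B_j)."""
    joint = {j: likelihood[j] * prior[j] for j in prior}
    p_a = sum(joint.values())  # P(A), by the theorem of total probability
    return {j: v / p_a for j, v in joint.items()}

# Which die did we pick, given that we rolled a six?
print(posteriors({'fair': 0.99, 'loaded': 0.01},
                 {'fair': 1 / 6, 'loaded': 0.5}))
# {'fair': ~0.97, 'loaded': ~0.03}
```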

Example: Prosecutor's fallacy
– A crime happened.
– The suspect left no evidence except some hair.
– The police got his DNA from the hair, and an expert matched the DNA with that of a suspect.
– The expert said that both the false-positive and false-negative rates are 10^-6.
Can this be used as evidence of guilt against the suspect?

Prosecutor's fallacy
P(match | innocent) = 10^-6
P(no match | guilty) = 10^-6
P(match | guilty) ≈ 1
P(no match | innocent) ≈ 1
P(guilty | match) = ?

Prosecutor's fallacy
P(g | m) = P(m | g) P(g) / P(m) ≈ P(g) / P(m)
– P(g): the probability of being guilty given no other evidence
– P(m): the probability of a DNA match
How do we get these two numbers?
– We don't really care about P(m):
– we want to compare two models, P(g | m) and P(i | m).

Prosecutor's fallacy
P(i | m) = P(m | i) P(i) / P(m) = 10^-6 * P(i) / P(m)
Therefore P(i | m) / P(g | m) = 10^-6 * P(i) / P(g), with P(i) + P(g) = 1.
It is clear, therefore, that whether we can conclude the suspect is guilty depends on the prior probability P(i).
How do you get P(i)?

Prosecutor's fallacy
How do you get P(i)? It depends on what other information you have about the suspect.
Say the suspect has no other connection to the crime, and the overall crime rate is 10^-7. That is a reasonable prior for P(g):
P(g) = 10^-7, P(i) ≈ 1
P(i | m) / P(g | m) = 10^-6 * P(i) / P(g) = 10^-6 / 10^-7 = 10

Likelihood-ratio (LR) test
P(observation | model1) / P(observation | model2): the likelihood ratio.
Often take the logarithm: log(P(m | i) / P(m | g)), the log-likelihood ratio (score), also called the log-odds ratio (score).
Bayesian model selection:
log(P(model1 | observation) / P(model2 | observation)) = LLR + log P(model1) − log P(model2)

Prosecutor's fallacy
P(i | m) / P(g | m) = 10^-6 / 10^-7 = 10
Therefore, the suspect is more likely to be innocent than guilty, given only the DNA evidence.
We can also explicitly calculate P(i | m):
P(m) = P(m|i) P(i) + P(m|g) P(g) = 10^-6 * (1 − 10^-7) + 1 * 10^-7 ≈ 1.1 x 10^-6
P(i | m) = P(m | i) P(i) / P(m) ≈ 10^-6 / (1.1 x 10^-6) = 1 / 1.1 ≈ 0.91
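
The whole calculation as a sketch, using the prior and error rates assumed on the preceding slides:

```python
p_g = 1e-7      # prior probability of guilt: the overall crime rate
p_i = 1 - p_g
p_m_i = 1e-6    # P(match | innocent), the false-positive rate
p_m_g = 1.0     # P(match | guilty), approximately 1

p_m = p_m_i * p_i + p_m_g * p_g  # total probability of a match
print(p_m)                       # ~1.1e-6
print(p_m_i * p_i / p_m)         # P(innocent | match) ~ 0.91
```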

Prosecutor's fallacy
If you have other evidence, P(g) could be much larger than the average crime rate; in that case, the DNA test may give you higher confidence.
How do we choose the prior?
– It is subjective, yet important.
– There have been long-standing debates about Bayesian statistics: some strongly support it, some strongly oppose it, and there is growing interest in many fields.
– There is no controversy, however, about conditional probability itself.
– If all priors are equal, decisions based on Bayesian inference and on the likelihood-ratio test are equivalent.

Another example
A test for a rare disease reports a positive result for 99.5% of people with the disease, and a negative result for 99.9% of those without it. The disease is present in the population at a rate of 1 in 100,000.
What is P(disease | positive test)?
What is P(disease | negative test)?
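
A sketch of the answers under the numbers just given (sensitivity 99.5%, specificity 99.9%, prevalence 1/100,000):

```python
prevalence = 1e-5  # 1 in 100,000
sens = 0.995       # P(positive | disease)
spec = 0.999       # P(negative | no disease)

p_pos = sens * prevalence + (1 - spec) * (1 - prevalence)
print(sens * prevalence / p_pos)        # P(disease | positive) ~ 0.0098
p_neg = (1 - sens) * prevalence + spec * (1 - prevalence)
print((1 - sens) * prevalence / p_neg)  # P(disease | negative) ~ 5e-8
```

Even with a positive result, the posterior probability of disease is only about 1%, because the disease is so rare.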

Yet another example
Recall the casino's box of dice: 99% fair, 1% loaded (a six 50% of the time).
We said that if we randomly pick a die and roll it, we have a 17% chance of getting a six.
If we get 3 sixes in a row, what is the chance that the die is loaded? How about 5 sixes in a row?

P(loaded | 3 sixes in a row)
= P(3 sixes in a row | loaded) P(loaded) / P(3 sixes in a row)
= 0.5^3 * 0.01 / (0.5^3 * 0.01 + (1/6)^3 * 0.99) ≈ 0.21
P(loaded | 5 sixes in a row)
= P(5 sixes in a row | loaded) P(loaded) / P(5 sixes in a row)
= 0.5^5 * 0.01 / (0.5^5 * 0.01 + (1/6)^5 * 0.99) ≈ 0.71
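
The same posterior as a small function of the number of consecutive sixes:

```python
def p_loaded_given_sixes(n, prior_loaded=0.01):
    # Bayes rule with two models: fair vs. loaded die
    like_loaded = 0.5 ** n * prior_loaded
    like_fair = (1 / 6) ** n * (1 - prior_loaded)
    return like_loaded / (like_loaded + like_fair)

print(round(p_loaded_given_sixes(3), 2))  # 0.21
print(round(p_loaded_given_sixes(5), 2))  # 0.71
```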

Relation to the multiple testing problem
When searching a DNA sequence against a database, you get a high score with a significant p-value.
P(unrelated | high score) / P(related | high score)
= [P(high score | unrelated) / P(high score | related)] * [P(unrelated) / P(related)]
The first factor (the likelihood ratio) is small: P(high score | unrelated) is much smaller than P(high score | related).
But the database is huge, and most sequences should be unrelated, so P(unrelated) is much larger than P(related).

Combining Diverse Data Using Bayesian Inference
We want to calculate the probability that a gene of unknown function is involved in a disease.
– Collect positive and negative genes (a gold standard).
– Measure their activities under three hypothetical conditions.

Figure 1. Potential distributions of experimental results obtained for datasets collected under three different conditions. (Greene CS, Troyanskaya OG (2012) Chapter 2: Data-Driven View of Disease Biology. PLoS Comput Biol 8(12).)
A higher score in condition A and a lower score in condition C suggest involvement in disease.
P(involved in disease | experimental data)?

Table 1. A contingency table for the experimental results for Condition A. (Greene CS, Troyanskaya OG (2012) Chapter 2: Data-Driven View of Disease Biology. PLoS Comput Biol 8(12).)

Probability that a gene i is involved in disease given the experimental results for gene i:
P(D_i | E_i) = P(E_i | D_i) P(D_i) / P(E_i)
where P(D_i) is the prior, P(E_i | D_i) the likelihood, and P(E_i) the normalizing factor.

Prior

Combining datasets using Naïve Bayes
Assuming the datasets are conditionally independent given the class (see the sketch below):
P(D | E_B, E_C) ∝ P(E_B, E_C | D) P(D) = P(E_B | D) P(E_C | D) P(D)
P(~D | E_B, E_C) ∝ P(E_B, E_C | ~D) P(~D) = P(E_B | ~D) P(E_C | ~D) P(~D)
Normalize using P(D | E_B, E_C) + P(~D | E_B, E_C) = 1.
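
A minimal naive Bayes combination sketch; the likelihood values in the example call are hypothetical, not taken from the chapter:

```python
def naive_bayes(p_d, evidence):
    """Posterior P(D | E_1, ..., E_k), assuming the E_k are conditionally
    independent given the class.

    evidence: list of (P(E_k | D), P(E_k | ~D)) pairs, one per dataset."""
    num_d, num_nd = p_d, 1.0 - p_d
    for l_d, l_nd in evidence:
        num_d *= l_d
        num_nd *= l_nd
    return num_d / (num_d + num_nd)  # normalize so the two posteriors sum to 1

# Hypothetical likelihoods for evidence from conditions B and C:
print(naive_bayes(0.05, [(0.8, 0.3), (0.6, 0.2)]))  # ~0.30
```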

Define a Gold Standard (training samples) for the gene-gene network
Positive examples: genes within the same biological process
– Relies on expert-selected Gene Ontology terms, e.g.:
  biological regulation
  response to stimulus
  cell-matrix adhesion involved in tangential migration using cell-cell interactions
  response to DNA damage stimulus
  aldehyde metabolism
  …
Negative examples: random gene pairs
– Assuming most gene pairs are not related

Building a Network of Functionally Related Genes
P(FR_ij | E_ij) ∝ P(E_ij | FR_ij) P(FR_ij)
E_ij: evidence (score) for a functional relationship between gene i and gene j from a particular dataset.
For some datasets, e.g. physical interaction data, obtaining E_ij is trivial.
In general, E_ij can be calculated using gene-wise correlation.

Fisher's z-transformation
Start from the Pearson correlation coefficient r between two gene expression profiles.
Fisher's z-transformation: z = (1/2) ln((1 + r) / (1 − r))
Purpose: stabilizing the variance (z is approximately normally distributed). (Source: Wikipedia)
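
A sketch of the transformation; the formula is the standard Fisher z, which is equivalent to the inverse hyperbolic tangent:

```python
import math

def fisher_z(r):
    # z = (1/2) ln((1 + r) / (1 - r)) = atanh(r); approximately
    # variance-stabilizing for the Pearson correlation coefficient
    return 0.5 * math.log((1 + r) / (1 - r))

print(fisher_z(0.9))    # ~1.472
print(math.atanh(0.9))  # same value
```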

Figure 4. The highest and lowest contributing datasets for the pair of APOE and PLTP. (Greene CS, Troyanskaya OG (2012) Chapter 2: Data-Driven View of Disease Biology. PLoS Comput Biol 8(12).)

Figure 5. The diseases that are significantly connected to APOE through the guilt-by-association strategy used in HEFalMp. (Greene CS, Troyanskaya OG (2012) Chapter 2: Data-Driven View of Disease Biology. PLoS Comput Biol 8(12).)
Uses Fisher's exact test.

Figure 6. The genes that are most significantly connected to Alzheimer disease genes using the HEFalMp network and OMIM disease gene annotations. (Greene CS, Troyanskaya OG (2012) Chapter 2: Data-Driven View of Disease Biology. PLoS Comput Biol 8(12).)

Evaluating Functional Relationship Networks
– TPR vs. FPR plot (ROC curve) and AUC (see the sketch below)
– Separate the gold standard into training and testing sets
– Cross-validation
– Literature evaluation
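
A self-contained AUC sketch using the rank-sum identity; the labels and scores in the example are toy values, not real gold-standard data:

```python
def auc(labels, scores):
    """AUC via the rank-sum identity: the probability that a randomly chosen
    positive example scores higher than a randomly chosen negative example."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy gold standard: 1 = functionally related pair, 0 = random pair.
print(auc([1, 1, 1, 0, 0, 0], [0.9, 0.7, 0.6, 0.5, 0.4, 0.2]))  # 1.0
```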

Summary
We talked about:
– Probability / statistics background
– Bayesian inference methods to integrate multiple large-scale, noisy datasets to predict gene-disease associations and gene-gene associations
– Networks useful for discovering novel gene functions and directing experimental follow-ups
Advantages over curated literature or analysis based on a single dataset.
Limited by the availability / quality of gold standard data.