Published by Savannah Fitzpatrick. Modified over 10 years ago.
1
Bayesian Modelling of Differential Gene Expression
Lewin A (1), Richardson S (1), Marshall C (1), Glazier A (2) and Aitman T (2) (2006), Biometrics 62, 1-9.
(1) Imperial College Dept. Epidemiology
(2) Imperial College Microarray Centre
2
Outline
- Introduction to microarrays and differential expression
- Bayesian hierarchical model for differential expression
- Decision rules
- Predictive model checks
- Gene Ontology analysis for differentially expressed genes
- Further work
3
Microarrays measure gene expression (mRNA)
(1) Array contains thousands of spots. Millions of strands of DNA of known sequence are fixed to each spot.
(2) Sample (unknown sequences of cDNA) is labelled with fluorescent dye.
(3) Matching sequences of DNA and cDNA hybridize together.
(4) Array is washed, so only matching samples are left (the fluorescent spots show which).
[Figure: DNA strand TGCT paired with cDNA strand ACGA. Pictures courtesy of Affymetrix.]
4
Microarray experiment to find genes associated with Cd36
Cd36: gene known to be important in insulin resistance (Aitman et al 1999, Nature Genet 21:76-83)
Biological Question: find genes which are expressed differently between animals with and without Cd36.
Microarray Data
- 3 SHR compared with 3 transgenic rats (with Cd36)
- 3 wildtype (normal) mice compared with 3 mice with Cd36 knocked out
- 12000 genes on each array
6
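Bayesian hierarchical model for differential expression
1st level
$y_{g1r} \mid \alpha_g, \delta_g, \sigma_{g1} \sim N(\alpha_g - \tfrac{1}{2}\delta_g + \beta_{r(g)1},\ \sigma_{g1}^2)$
$y_{g2r} \mid \alpha_g, \delta_g, \sigma_{g2} \sim N(\alpha_g + \tfrac{1}{2}\delta_g + \beta_{r(g)2},\ \sigma_{g2}^2)$
- $y_{gsr}$: log gene expression
- $\alpha_g$: overall gene expression (fixed effect)
- $\delta_g$: differential effect for gene g between 2 conditions (fixed effect or mixture prior)
- $\beta_{r(g)s}$: array effect or normalisation (function of $\alpha_g$)
- $\sigma_{gs}^2$: variance for each gene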
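To make the first-level model concrete, here is a minimal simulation sketch for a single gene. The function name, the parameter values and the constant `beta` are assumptions for illustration; on the slide the array effect is a function of the overall expression level, which this sketch does not model.

```python
import random

def simulate_gene(alpha_g, delta_g, sigma_g1, sigma_g2, n_reps=3, beta=0.0, seed=0):
    """Draw log-expression replicates for one gene under two conditions.

    beta is a constant stand-in for the array effect (an assumption;
    the slide defines it as a function of alpha_g).
    """
    rng = random.Random(seed)
    mean1 = alpha_g - 0.5 * delta_g + beta  # condition 1 mean
    mean2 = alpha_g + 0.5 * delta_g + beta  # condition 2 mean
    y1 = [rng.gauss(mean1, sigma_g1) for _ in range(n_reps)]
    y2 = [rng.gauss(mean2, sigma_g2) for _ in range(n_reps)]
    return y1, y2

# Hypothetical values: overall level 7, log fold change 1, small noise.
y1, y2 = simulate_gene(alpha_g=7.0, delta_g=1.0, sigma_g1=0.2, sigma_g2=0.2)
crude_delta = sum(y2) / len(y2) - sum(y1) / len(y1)  # difference of condition means
```

With only 3 replicates per condition the crude difference of means is noisy, which is the motivation for sharing information across genes at the next level.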
7
Prior for gene variances
2nd level
$\sigma_{gs}^2 \mid \mu_s, \tau_s \sim \mathrm{logNorm}(\mu_s, \tau_s)$
Hyper-parameters $\mu_s$ and $\tau_s$ can be influential, so these are estimated in the model.
3rd level
$\mu_s \sim N(c, d)$
$\tau_s \sim \mathrm{Gamma}(e, f)$
Variances are estimated using information from all measurements (~12000 x 3) rather than just 3.
[Figure: 3 wildtype mice]
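The second and third levels can be sketched as a forward simulation. The slide does not say how the distributions are parametrised, so this sketch assumes that d is a variance, f is a rate, and $\tau_s$ is a precision on the log scale; the function name and the hyper-parameter values are hypothetical.

```python
import math
import random

def draw_gene_variances(n_genes, c, d, e, f, seed=0):
    """Draw exchangeable gene variances from the hierarchical prior.

    Parametrisation is an assumption: d is a variance, f is a rate,
    and tau is a precision for log(sigma^2).
    """
    rng = random.Random(seed)
    mu = rng.gauss(c, math.sqrt(d))     # 3rd level: mu ~ N(c, d)
    tau = rng.gammavariate(e, 1.0 / f)  # 3rd level: tau ~ Gamma(e, f); stdlib takes a scale
    sd_log = 1.0 / math.sqrt(tau)       # precision converted to a standard deviation
    # 2nd level: every gene variance comes from the same log-normal
    return [rng.lognormvariate(mu, sd_log) for _ in range(n_genes)]

variances = draw_gene_variances(n_genes=12000, c=-2.0, d=1.0, e=2.0, f=2.0)
```

Because all genes share the same $\mu_s$ and $\tau_s$, each variance estimate borrows strength from the others.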
8
Prior for array effects (Normalization)
Spline Curve
$\beta_{r(g)s}$ = quadratic in $\alpha_g$ for $a_{rs(k-1)} \le \alpha_g \le a_{rs(k)}$, with coeff $(b_{rsk}^{(1)}, b_{rsk}^{(2)})$, k = 1, …, #breakpoints
- Locations of break points not fixed
- Must do sensitivity checks on # break points
[Figure: spline curve with break points $a_0$, $a_1$, $a_2$, $a_3$]
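A piecewise-quadratic curve with two coefficients per segment can be evaluated as sketched below. The slide does not give the exact parametrisation, so this is one possible reading: each segment contributes a linear and a quadratic term measured from its left break point, and continuity is enforced by accumulating the rise over earlier segments. The function name and the `offset` parameter are hypothetical.

```python
from bisect import bisect_right

def array_effect(alpha, breaks, coeffs, offset=0.0):
    """Evaluate a continuous piecewise-quadratic curve at alpha.

    breaks: increasing break points a_0 < a_1 < ... < a_K
    coeffs: one (b1, b2) pair per segment k = 1..K
    offset: curve value at a_0 (hypothetical extra parameter)
    """
    # Clamp alpha to the range covered by the break points.
    alpha = min(max(alpha, breaks[0]), breaks[-1])
    # Segment index k in 1..K such that a_{k-1} <= alpha <= a_k.
    k = min(bisect_right(breaks, alpha), len(breaks) - 1)
    value = offset
    # Add the full rise over each earlier segment to keep the curve continuous.
    for j in range(1, k):
        width = breaks[j] - breaks[j - 1]
        b1, b2 = coeffs[j - 1]
        value += b1 * width + b2 * width ** 2
    b1, b2 = coeffs[k - 1]
    x = alpha - breaks[k - 1]
    return value + b1 * x + b2 * x ** 2

# Hypothetical two-segment curve: linear on [0, 1], then quadratic on [1, 2].
breaks = [0.0, 1.0, 2.0]
coeffs = [(1.0, 0.0), (0.0, 1.0)]
effect = array_effect(1.5, breaks, coeffs)
```

In the model the break points and coefficients are unknowns with priors; here they are fixed only to show how the curve is evaluated.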
9
Array effect as function of gene effect
[Figure: loess fit compared with Bayesian posterior mean]
10
Decision Rules for Inference: Fixed Effects Model
Inference on δ
(1) $d_g = E(\delta_g \mid \text{data})$: posterior mean, like a point estimate of log fold change.
    Decision Rule: gene g is DE if $|d_g| > \delta_{cut}$
(2) $p_g = P(|\delta_g| > \delta_{cut} \mid \text{data})$: posterior probability (incorporates uncertainty).
    Decision Rule: gene g is DE if $p_g > p_{cut}$
$\delta_{cut}$ expresses biological interest; $p_{cut}$ expresses statistical confidence.
This allows the biologist to specify what size of effect is interesting (not just statistical significance).
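Both decision rules can be computed from posterior draws of $\delta_g$. The sketch below assumes such draws are already available (for example from an MCMC run); the function name, the sample values and the cut-offs are made up for illustration.

```python
def decision_rules(delta_samples, delta_cut, p_cut):
    """Apply both decision rules to posterior draws of delta for one gene."""
    n = len(delta_samples)
    d_g = sum(delta_samples) / n                              # posterior mean
    p_g = sum(abs(s) > delta_cut for s in delta_samples) / n  # P(|delta| > cut | data)
    return {
        "d_g": d_g,
        "p_g": p_g,
        "rule1": abs(d_g) > delta_cut,  # point-estimate rule
        "rule2": p_g > p_cut,           # posterior-probability rule
    }

samples = [0.9, 1.1, 0.6, 0.8, 0.5]  # made-up posterior draws
result = decision_rules(samples, delta_cut=0.7, p_cut=0.8)
```

In this toy case the posterior mean exceeds the cut-off, so rule (1) calls the gene DE, while only 3 of 5 draws exceed it, so rule (2) does not. That gap is the uncertainty the second rule accounts for.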
11
Illustration of decision rule
$p_g = P(|\delta_g| > \log(2) \text{ and } \alpha_g > 4 \mid \text{data})$
[Figure: 3 wildtype v. 3 knock-out mice. Legend: x marks genes with $p_g > 0.8$; Δ marks genes with t-statistic > 2.78 (95% CI)]
13
Predictive Model Checks
Key Points
- Predict new data from the model (using the posterior distribution)
- Get Bayesian p-value for each gene
- Use all genes together (1000s) to assess model fit (p-value distribution close to Uniform if model is good)
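The key points above can be sketched as two small helpers: one computes a Bayesian p-value for a gene from predicted copies of a test statistic, the other bins the p-values of all genes so their distribution can be compared with a Uniform. The function names and the choice of a one-sided "at least as large" comparison are assumptions.

```python
def bayesian_p_value(observed_stat, predicted_stats):
    """Share of predicted statistics at least as large as the observed one."""
    hits = sum(s >= observed_stat for s in predicted_stats)
    return hits / len(predicted_stats)

def histogram(p_values, n_bins=10):
    """Count p-values per equal-width bin on [0, 1]."""
    counts = [0] * n_bins
    for p in p_values:
        # p == 1.0 falls in the last bin
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts

# Made-up example: one observed statistic against four predicted copies.
p = bayesian_p_value(2.0, [1.0, 2.5, 3.0, 1.5])
bins = histogram([0.05, 0.15, 0.95, 1.0])
```

With thousands of genes, roughly equal bin counts suggest an adequate model, while a pile-up near 0 or 1 suggests misfit.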
14
Mixed Predictive Checks
[Diagram: graphical model with nodes μ, τ; σ_g; σ_g pred; ybar_g; S_g; post. pred. S_g; mixed pred. S_g]
Mixed prediction is less conservative than posterior prediction
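The difference between the two kinds of prediction can be sketched as follows. Posterior prediction reuses each posterior draw of the gene's own variance, whereas mixed prediction first draws a new variance from the hyper-parameters and then predicts data from it. The function names are hypothetical, S_g is taken to be a sample variance, and τ is treated as a precision on the log scale; all three are assumptions.

```python
import math
import random

def predict_s2(sigma2, n_reps, rng):
    """Simulate n_reps replicates with variance sigma2 and return their sample variance."""
    ys = [rng.gauss(0.0, math.sqrt(sigma2)) for _ in range(n_reps)]
    mean = sum(ys) / n_reps
    return sum((y - mean) ** 2 for y in ys) / (n_reps - 1)

def posterior_predictive_s2(sigma2_draws, n_reps, rng):
    """Predict from posterior draws of the gene's own variance."""
    return [predict_s2(s2, n_reps, rng) for s2 in sigma2_draws]

def mixed_predictive_s2(mu_tau_draws, n_reps, rng):
    """Predict via a fresh variance drawn from the hyper-parameters."""
    out = []
    for mu, tau in mu_tau_draws:
        # New variance from the log-normal; tau as a precision is an assumption.
        sigma2_pred = rng.lognormvariate(mu, 1.0 / math.sqrt(tau))
        out.append(predict_s2(sigma2_pred, n_reps, rng))
    return out

rng = random.Random(1)
post = posterior_predictive_s2([0.04, 0.05, 0.03], n_reps=3, rng=rng)
mixed = mixed_predictive_s2([(-3.0, 4.0), (-3.1, 3.5), (-2.9, 4.2)], n_reps=3, rng=rng)
```

The mixed version bypasses the gene's own fitted variance, so the observed data for that gene influence the prediction less directly.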
15
Bayesian predictive p-values
17
Gene Ontology: network of terms
- Directed Acyclic Graph
- ~16,000 terms
- Links connect more general to more specific terms
[Figure: picture from Gene Ontology website]
18
Annotations of genes to a node
- Each term may have 1000s of genes annotated (or none)
- A gene may be annotated to several GO terms
- A gene annotated to term A is also annotated to all ancestors of A
[Figure: picture from Gene Ontology website]
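The rule that an annotation to term A implies annotation to all ancestors of A can be sketched as a graph traversal. The term and gene names below are hypothetical placeholders, not real GO identifiers.

```python
def ancestors(term, parents):
    """Collect every ancestor of a term in a directed acyclic graph."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def propagate(direct, parents):
    """Extend each gene's direct annotations with all ancestor terms."""
    full = {}
    for gene, terms in direct.items():
        closure = set(terms)
        for term in terms:
            closure |= ancestors(term, parents)
        full[gene] = closure
    return full

# Hypothetical DAG: term C has two parents, A and B, which share a root.
parents = {"C": ["A", "B"], "A": ["root"], "B": ["root"]}
direct = {"gene1": {"C"}, "gene2": {"A"}}
full = propagate(direct, parents)
```

Because a term can have several parents, the traversal tracks visited terms so shared ancestors are counted once.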
19
GO annotations of genes associated with the insulin-resistance gene Cd36
Compare GO annotations of genes most and least differentially expressed
- Most differentially expressed: $p_g > 0.5$ (280 genes)
- Least differentially expressed: $p_g < 0.2$ (11171 genes)
20
GO annotations of genes associated with the insulin-resistance gene Cd36
- Use Fisher's test to compare GO annotations of genes most and least differentially expressed (one test for each GO term)
- None significant with simple multiple testing adjustment, but there are many dependencies
- Inflammatory response recently found to be important in insulin resistance
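A per-term test of this kind can be sketched with a one-sided hypergeometric tail, which is the calculation behind Fisher's exact test for over-representation. The function name and the annotation counts are hypothetical; only the list sizes (280 and 11171) come from the slides. The Bonferroni step is an assumption about what "simple multiple testing adjustment" means.

```python
from math import comb

def fisher_one_sided(k, n_most, m, n_least):
    """P(at least k annotated genes land in the most-DE list) under the null.

    k: annotated genes in the most-DE list;  n_most: size of that list
    m: annotated genes in the least-DE list; n_least: size of that list
    """
    total = n_most + n_least
    annotated = k + m
    upper = min(annotated, n_most)
    tail = sum(
        comb(annotated, x) * comb(total - annotated, n_most - x)
        for x in range(k, upper + 1)
    )
    return tail / comb(total, n_most)

# Hypothetical counts for one GO term: 10 of 280 versus 100 of 11171.
p_term = fisher_one_sided(10, 280, 100, 11171)
n_terms = 1000  # hypothetical number of GO terms tested
p_adjusted = min(1.0, p_term * n_terms)  # Bonferroni, assumed adjustment
```

Terms in the ontology overlap heavily, so an adjustment that treats the tests as independent is conservative here, which matches the slide's remark about dependencies.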
21
Summary of work in Biometrics paper
- Bayesian hierarchical model is flexible and estimates variances robustly
- Predictive model checks show exchangeable prior is good for gene variances
- Useful to find GO terms over-represented in the most differentially-expressed genes
23
BGmix: mixture model for differential expression
- Group genes into 3 classes: non-DE, over-expressed, under-expressed
- Estimation and classification are simultaneous
- Change the prior on the differential expression parameters $\delta_g$
24
BGmix: mixture model for differential expression
Choice of Null Distribution
- True log fold changes = 0
- Nugget null: true log fold changes = small but not necessarily zero
Choice of DE genes distributions
- Gammas
- Uniforms
- Normal
25
BGmix: mixture model for differential expression
Outputs
- Point estimates (and s.d.) of log fold changes (stabilised and smoothed)
- Posterior probability for gene to be in each group
- Estimate of proportion of differentially expressed genes based on grouping (parameter of model)
26
BGmix: mixture model for differential expression
Obtaining gene lists
- Threshold on posterior probabilities (if posterior probability of classification in the null < threshold, gene is DE)
- Estimate of False Discovery Rate for any gene list (estimate = average of posterior probabilities). Very simple estimate!
Choice of decision rule
- Bayes Rule
- Fix False Discovery Rate
- More complex rules for mixture of 3 components
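The thresholding rule and the simple False Discovery Rate estimate can be sketched together. The function name, gene names and probabilities are made up; the sketch takes "posterior probabilities" in the FDR estimate to mean the posterior probabilities of the null for the selected genes.

```python
def gene_list_with_fdr(null_probs, threshold):
    """Select genes whose posterior null probability is below threshold.

    Returns the sorted gene list and the estimated FDR, taken as the
    average posterior null probability over the selected genes.
    """
    selected = {g: p for g, p in null_probs.items() if p < threshold}
    if not selected:
        return [], 0.0
    fdr = sum(selected.values()) / len(selected)
    return sorted(selected), fdr

# Made-up posterior probabilities of belonging to the null component.
null_probs = {"g1": 0.01, "g2": 0.2, "g3": 0.6, "g4": 0.05}
genes, fdr = gene_list_with_fdr(null_probs, threshold=0.5)
```

Fixing the False Discovery Rate instead would mean lowering or raising the threshold until this average reaches the target value.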
27
Predictive Checks for Mixture Model
- Model checks for differential expression parameters $\delta_g$
- More complex for mixture model
- Important point: we check each mixture component separately
[Diagram: graphical model with nodes μ, τ; σ_g; σ_g pred; η; w; z_g; ybar_g; S_g; mixed pred. ybar_g; mixed pred. S_g]
28
Bayesian p-values for Mixture Model
[Figure panels: simulated data from incorrect model; simulated data from correct model]
29
Acknowledgements
Co-authors
- Sylvia Richardson, Clare Marshall (IC Epidemiology)
- Tim Aitman, Anne-Marie Glazier (IC Microarray Centre)
Collaborators on BGX Grant
- Anne-Mette Hein, Natalia Bochkina (IC Epidemiology)
- Helen Causton (IC Microarray Centre)
- Peter Green (Bristol)
BBSRC Exploiting Genomics Grant
30
Papers and Software
Software:
- WinBUGS code for model in Biometrics paper
- BGmix (R package) includes mixture model
Papers:
- BGmix paper, submitted
- Paper on predictive checks for mixture prior, in preparation
http://www.bgx.org.uk/