Published by Gyles Reeves. Modified over 9 years ago.
1
Multiple Species Gene Finding using Gibbs Sampling
Sourav Chatterji, Lior Pachter
University of California, Berkeley
2
Multiple Species Comparative Gene Finding (with Alignment)
McAuliffe et al. (2004), Siepel et al. (2004)
4
Multiple Species Comparative Gene Finding (without Alignment)
5
Gibbs Sampling for Biological Sequence Analysis
- Introduced by Lawrence et al. (1993) for motif detection
- Extensions:
  - Multiple motifs in a sequence
  - Multiple types of motifs
- Applications:
  - Alignment
  - Linkage analysis
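The motif-detection sampler of Lawrence et al. (1993) resamples one sequence's motif location at a time, given a motif model built from the current sites in all other sequences. The following is a minimal sketch, not the original implementation. It assumes uppercase DNA input (ACGT only), exactly one motif of known width per sequence, a uniform background and fixed pseudocounts. The function name and parameters are illustrative. Each candidate start is weighted by the ratio of its motif likelihood to its background likelihood.

```python
import math
import random

ALPHABET = "ACGT"


def site_sampler(seqs, w, iters=200, pseudo=0.5, seed=0):
    """Gibbs motif (site) sampler in the spirit of Lawrence et al. (1993).

    Assumes exactly one motif of width w per sequence and a uniform
    background; returns the sampled motif start position in each sequence.
    """
    rng = random.Random(seed)
    # Random initial motif start in every sequence.
    pos = [rng.randrange(len(s) - w + 1) for s in seqs]
    for _ in range(iters):
        for i, s in enumerate(seqs):
            # Position weight matrix from the current sites of all OTHER sequences.
            counts = [{a: pseudo for a in ALPHABET} for _ in range(w)]
            for j, other in enumerate(seqs):
                if j != i:
                    for k in range(w):
                        counts[k][other[pos[j] + k]] += 1
            total = (len(seqs) - 1) + pseudo * len(ALPHABET)
            # Log of (motif likelihood / background likelihood) for each start.
            log_ratios = [
                sum(math.log(counts[k][s[start + k]] / total) - math.log(0.25)
                    for k in range(w))
                for start in range(len(s) - w + 1)
            ]
            top = max(log_ratios)  # subtract the max to avoid overflow in exp
            weights = [math.exp(r - top) for r in log_ratios]
            # Resample this sequence's site from its conditional distribution.
            pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return pos


seqs = ["TTGACGTATAAT", "CCTATAATGGCA", "GCGCTATAATCC"]
print(site_sampler(seqs, w=6))
```

The extensions listed above (several motifs per sequence, several motif types) would replace the single `pos[i]` per sequence with richer state, but each update still samples one piece of the state given the rest.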
6
Gibbs Sampling
Aim: To sample from the joint distribution $p(x_1, x_2, \ldots, x_n)$ when it is easy to sample from the conditional distributions $p(x_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$ but not from the joint distribution itself.
Method: Iteratively, for $t = 1, 2, \ldots$ and $i = 1, \ldots, n$, sample $x_i^t$ from the conditional distribution $p(x_i \mid x_1^t, \ldots, x_{i-1}^t, x_{i+1}^{t-1}, \ldots, x_n^{t-1})$.
Theorem: For discrete distributions that assign positive probability to every configuration, the distribution of $(x_1^t, x_2^t, \ldots, x_n^t)$ converges to $p(x_1, x_2, \ldots, x_n)$ as $t \to \infty$.
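The update rule can be illustrated on a toy case. The sketch below uses a hypothetical joint table over two binary variables. The table is small enough to sample from directly, so it only demonstrates the mechanics. The code performs systematic-scan Gibbs updates, drawing each variable from its conditional given the current value of the other, and compares the empirical frequencies with the target. Every entry of the table is positive, so the chain can reach every configuration.

```python
import random
from collections import Counter

# A hypothetical joint distribution p(x1, x2) over two binary variables.
JOINT = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}


def sample_conditional(rng, index, state):
    """Sample x_index from p(x_index | the other variable), read off JOINT."""
    weights = []
    for value in (0, 1):
        candidate = list(state)
        candidate[index] = value
        weights.append(JOINT[tuple(candidate)])
    # random.choices normalizes the weights, which is exactly the division
    # by the marginal probability of the conditioning variable.
    return rng.choices((0, 1), weights=weights)[0]


def gibbs(n_iter=50000, burn_in=1000, seed=1):
    rng = random.Random(seed)
    state = [0, 0]
    counts = Counter()
    for t in range(n_iter):
        # One sweep: update x1 given x2, then x2 given the new x1.
        for i in range(2):
            state[i] = sample_conditional(rng, i, state)
        if t >= burn_in:
            counts[tuple(state)] += 1
    kept = n_iter - burn_in
    return {k: counts[k] / kept for k in JOINT}


print(gibbs())  # empirical frequencies, close to JOINT
```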
7
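Connection to HMMs
[Figure: HMM with hidden states $Z_1, Z_2, \ldots, Z_m$ and observations $Y_1, Y_2, \ldots, Y_m$; edges $Z_i \to Z_{i+1}$ are labeled $s$ and edges $Z_i \to Y_i$ are labeled $t$]
$s$ = transition probabilities; $t$ = output probabilities
- Difficult to sample from $P(Z \mid Y)$
- Easy to sample the parameters from $P(s, t \mid Z, Y)$
- Easy to sample $Z$ from $P(Z \mid s, t, Y)$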
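The two "easy" conditionals can be combined into a sampler that alternates between them. The following is a minimal sketch with several assumptions not taken from the slides: a discrete HMM with integer-coded states and symbols, a uniform initial-state distribution, and Dirichlet(1) priors on each row of the transition matrix `s` and the output matrix `t`. Parameters are drawn from their Dirichlet posterior given the counts in (Z, Y). Z is drawn by forward filtering, backward sampling. All function names are hypothetical.

```python
import random


def sample_dirichlet(rng, alphas):
    """Draw from Dirichlet(alphas) by normalizing independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_params(rng, Z, Y, n_states, n_symbols, prior=1.0):
    """Draw s and t from P(s, t | Z, Y), assuming Dirichlet(prior) priors per row."""
    trans = [[prior] * n_states for _ in range(n_states)]
    emit = [[prior] * n_symbols for _ in range(n_states)]
    for a, b in zip(Z, Z[1:]):
        trans[a][b] += 1
    for z, y in zip(Z, Y):
        emit[z][y] += 1
    s = [sample_dirichlet(rng, row) for row in trans]
    t = [sample_dirichlet(rng, row) for row in emit]
    return s, t


def sample_states(rng, Y, init, s, t):
    """Draw Z from P(Z | s, t, Y) by forward filtering, backward sampling."""
    n = len(init)
    # Forward pass: alpha[i][k] is P(Z_i = k | Y_1..Y_i), normalized per step.
    prev = [init[k] * t[k][Y[0]] for k in range(n)]
    alpha = [[p / sum(prev) for p in prev]]
    for y in Y[1:]:
        cur = [sum(alpha[-1][j] * s[j][k] for j in range(n)) * t[k][y]
               for k in range(n)]
        alpha.append([c / sum(cur) for c in cur])
    # Backward pass: sample Z_m, then each Z_i given the sampled Z_{i+1}.
    Z = [0] * len(Y)
    Z[-1] = rng.choices(range(n), weights=alpha[-1])[0]
    for i in range(len(Y) - 2, -1, -1):
        weights = [alpha[i][j] * s[j][Z[i + 1]] for j in range(n)]
        Z[i] = rng.choices(range(n), weights=weights)[0]
    return Z


def gibbs_hmm(Y, n_states, n_symbols, n_iter=100, seed=0):
    rng = random.Random(seed)
    init = [1.0 / n_states] * n_states  # assumed uniform initial distribution
    Z = [rng.randrange(n_states) for _ in Y]
    for _ in range(n_iter):
        s, t = sample_params(rng, Z, Y, n_states, n_symbols)
        Z = sample_states(rng, Y, init, s, t)
    return Z, s, t


Z, s, t = gibbs_hmm([0, 1, 1, 0, 2, 2, 1], n_states=2, n_symbols=3)
print(Z)
```

Because the sampler targets the joint posterior of (Z, s, t) given Y, keeping only Z from each sweep gives draws whose long-run distribution is $P(Z \mid Y)$, the quantity that is hard to sample directly.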
8
Gibbs Sampling for Gene Finding
9
Initial Predictions
10
Gibbs Sampling for Gene Finding
Sample $Z_1$ from $P(Z_1 \mid Z_{[-1]}, Y)$, where $Z_{[-1]}$ denotes the current predictions for all sequences other than sequence 1
11
Gibbs Sampling for Gene Finding
Sample $Z_2$ from $P(Z_2 \mid Z_{[-2]}, Y)$
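The slides do not spell out how this conditional is computed. One reading, analogous to the Lawrence et al. site sampler, treats it as a two-step update. First, re-estimate the gene-model parameters from the current annotations of the other sequences. Then sample sequence $i$'s annotation from the resulting HMM, for example by forward filtering, backward sampling. The sketch below covers only the first, leave-one-out estimation step. It assumes integer-coded states and symbols and a fixed pseudocount. The function name is hypothetical.

```python
def estimate_params_excluding(i, Zs, Ys, n_states, n_symbols, pseudocount=1.0):
    """Estimate HMM parameters from every sequence except sequence i.

    Zs[j] is the current state path (annotation) of sequence j and Ys[j] its
    observed symbols. Returns (transition, emission) matrices built from
    counts plus a pseudocount, each row normalized to sum to 1.
    """
    trans = [[pseudocount] * n_states for _ in range(n_states)]
    emit = [[pseudocount] * n_symbols for _ in range(n_states)]
    for j, (Z, Y) in enumerate(zip(Zs, Ys)):
        if j == i:
            continue  # leave out the sequence whose annotation is being resampled
        for a, b in zip(Z, Z[1:]):
            trans[a][b] += 1
        for z, y in zip(Z, Y):
            emit[z][y] += 1

    def normalize(rows):
        out = []
        for row in rows:
            total = sum(row)
            out.append([c / total for c in row])
        return out

    return normalize(trans), normalize(emit)


# Two toy "species" with 2 states and 2 symbols; estimate for resampling sequence 0.
Zs = [[0, 0, 1], [1, 1, 1]]
Ys = [[0, 1, 1], [0, 0, 0]]
print(estimate_params_excluding(0, Zs, Ys, n_states=2, n_symbols=2))
```

One sweep of the sampler repeats this for $i = 1, 2, \ldots$ in turn, so each sequence is re-predicted using evidence from the others.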
12
Additional Details
Issues in the Gibbs sampling method:
- Gibbs sampling assumes the sequences are generated independently by an HMM; the method needs to be generalized to a tree topology.
- Learn parameters from a subset of sequences that are roughly equidistant from each other: human, mouse, dog and cow.
- Things get messy when there are multiple genes; multiple sets of parameters must be handled.
- Make use of an approximate alignment.
- Boost scores using a phylo-HMM model.
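The slides do not describe how the phylo-HMM scores are computed. In a typical phylo-HMM, each state emits a whole alignment column. The emission probability is the column's likelihood under a phylogenetic model on the species tree, computed with Felsenstein's pruning algorithm. The following is a minimal sketch, assuming the Jukes-Cantor substitution model, a uniform base distribution at the root, and a hypothetical four-species tree with made-up branch lengths (in expected substitutions per site).

```python
import math

BASES = "ACGT"

# Hypothetical tree: an internal node is a list of (child, branch_length)
# pairs; a leaf is a species name. Branch lengths are illustrative only.
TREE = [
    ([("human", 0.1), ("mouse", 0.3)], 0.05),
    ([("dog", 0.15), ("cow", 0.15)], 0.05),
]


def jc_prob(same, t):
    """Jukes-Cantor P(end base | start base) along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def partial_likelihoods(node, column):
    """Felsenstein pruning: P(leaves below node | base x at node) for each x."""
    if isinstance(node, str):  # leaf: indicator of the observed base
        return {x: 1.0 if x == column[node] else 0.0 for x in BASES}
    result = {x: 1.0 for x in BASES}
    for child, branch_length in node:
        child_lik = partial_likelihoods(child, column)
        for x in BASES:
            result[x] *= sum(jc_prob(x == y, branch_length) * child_lik[y]
                             for y in BASES)
    return result


def column_likelihood(tree, column):
    """Likelihood of one alignment column, with uniform base frequencies at the root."""
    root = partial_likelihoods(tree, column)
    return sum(0.25 * root[x] for x in BASES)


print(column_likelihood(TREE, {"human": "A", "mouse": "A", "dog": "A", "cow": "G"}))
```

In a phylo-HMM, each state (for example, coding versus non-coding) would have its own substitution model or branch scaling, so the same column gets different emission probabilities in different states.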
13
Results
2060 exons predicted
- Exon-level sensitivity: 23.2%
- Exon-level specificity: 46.7%
- 28.5% of predicted exons partially overlap with true exons
- Nucleotide-level sensitivity: 42.8%
- Nucleotide-level specificity: 82.1%
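These measures follow the gene-finding convention in which sensitivity is TP/(TP+FN) and specificity is TP/(TP+FP), the fraction of predictions that are correct. The sketch below makes two assumptions: exons are inclusive (start, end) coordinate pairs on one strand, and a predicted exon counts as correct at the exon level only if both boundaries match a true exon exactly.

```python
def exon_level(true_exons, predicted_exons):
    """Exon-level (sensitivity, specificity); exact boundary matches only."""
    correct = len(set(true_exons) & set(predicted_exons))
    return correct / len(true_exons), correct / len(predicted_exons)


def nucleotide_level(true_exons, predicted_exons):
    """Nucleotide-level (sensitivity, specificity) over coding bases."""
    def bases(exons):
        # Inclusive (start, end) intervals expanded into sets of positions.
        return {p for start, end in exons for p in range(start, end + 1)}

    true_bases, predicted_bases = bases(true_exons), bases(predicted_exons)
    tp = len(true_bases & predicted_bases)
    return tp / len(true_bases), tp / len(predicted_bases)


true_exons = [(100, 199), (300, 399)]
predicted = [(100, 199), (310, 399)]  # second exon has a wrong start boundary
print(exon_level(true_exons, predicted))        # (0.5, 0.5)
print(nucleotide_level(true_exons, predicted))  # (0.95, 1.0)
```

In this toy example, one misplaced boundary halves both exon-level scores but costs only 10 of 200 bases. This is one way nucleotide-level accuracy can be much higher than exon-level accuracy.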
14
Results
- Nucleotide-level results are much better than exon-level results.
- Better splice site models are needed, probably multiple-species splice site models.
- Low sensitivity: is it the alignment?
15
Analysis of results (novel genes)
Statistics of transcripts overlapping with novel VEGA genes: 223 exons predicted
- Exon-level sensitivity: 24.8% (78 of 315 true exons are predicted correctly)
- Exon-level specificity: 35.0% (78 of the 223 predicted exons are correct)
- An additional 24.7% of predicted exons partially overlap with true exons
- Nucleotide-level sensitivity: 56.6%
- Nucleotide-level specificity: 62.9%
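The exon-level percentages can be checked from the counts given. They also confirm that "specificity" here means the fraction of predictions that are correct, not the true-negative rate:

$$
\mathrm{Sn}_{\text{exon}} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} = \frac{78}{315} \approx 0.248,
\qquad
\mathrm{Sp}_{\text{exon}} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}} = \frac{78}{223} \approx 0.350
$$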