Published by Clifton Anthony. Modified over 9 years ago.
1
ECFG for Gene Identification using DART
Yuri Bendana, Sharon Chao, Karsten Temme
2
Eukaryotic Gene Structure Figure 4-14 from Lodish et al., Molecular Cell Biology, 2004. Adapted from Figure 1.4 in Graur and Li, Fundamentals of Molecular Evolution, 2000.
3
Evogene
Pedersen and Hein, 2003.
EHMM = HMM + Evolutionary Tree
Gene structure model
Region-specific evolutionary models
EM and ML estimation of parameters using Baum-Welch and Powell from annotated human/mouse alignments
Gaps in the MSA are treated as missing data
MAP estimate of gene structure using Viterbi
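The Viterbi step can be illustrated with a minimal sketch. This is not Evogene's code: the two states, the toy probabilities, and the function name `viterbi` are all assumptions made for illustration. In an EHMM each emission would be the likelihood of an alignment column under a state-specific evolutionary model on the tree; here it is simplified to a single nucleotide per position.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most probable state path for a non-empty obs (log-space)."""
    # best[t][s]: log-probability of the best path ending in state s at position t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state that maximises the path score into s
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][obs[t]]
            back[t][s] = prev
    # Trace back from the best final state
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

def logs(probs):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in probs.items()}

# Hypothetical toy model: two states with made-up parameters
states = ["coding", "noncoding"]
log_start = logs({"coding": 0.5, "noncoding": 0.5})
log_trans = {
    "coding": logs({"coding": 0.9, "noncoding": 0.1}),
    "noncoding": logs({"coding": 0.1, "noncoding": 0.9}),
}
log_emit = {
    "coding": logs({"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}),
    "noncoding": logs({"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}),
}

print(viterbi("ATATGCGCGC", states, log_start, log_trans, log_emit))
```

Working in log-space avoids numerical underflow on long sequences, which matters for gene-length alignments.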
4
Evogene EHMM
Phase 1 and 2 introns model frameshift: inner codon interrupted by the intron
Alignment column(s) are generated for each state visited
HKY/Goldman-Yang evolutionary models used for nt/codons
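For reference, the nucleotide model named above can be written down compactly. The sketch below builds an HKY85 rate matrix from equilibrium frequencies and a transition/transversion ratio. The function names and the example values of `pi` and `kappa` are illustrative assumptions, and the matrix is left unnormalised (not rescaled to one expected substitution per unit time).

```python
NUCS = "ACGT"
PURINES = {"A", "G"}

def is_transition(a, b):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    return (a in PURINES) == (b in PURINES)

def hky_rate_matrix(pi, kappa):
    """HKY85: rate a->b is kappa*pi[b] for transitions, pi[b] for transversions."""
    q = {}
    for a in NUCS:
        row = {}
        for b in NUCS:
            if a != b:
                row[b] = pi[b] * (kappa if is_transition(a, b) else 1.0)
        # Diagonal entry makes each row sum to zero
        row[a] = -sum(row.values())
        q[a] = row
    return q

# Hypothetical parameter values, for illustration only
pi = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}
q = hky_rate_matrix(pi, kappa=2.0)
for a in NUCS:
    print(a, [round(q[a][b], 3) for b in NUCS])
```

The matrix satisfies detailed balance, pi[a] * q[a][b] == pi[b] * q[b][a], so the resulting substitution process is time-reversible.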
5
Evogene Results
116 human/mouse orthologs used for training and testing
Prediction improves when inputting MSA versus single sequence
6
DART
DNA, Amino, and RNA Tests [Holmes]
ECFG = SCFG + Evolutionary Tree
xgram, xfold, xprot programs
  xgram - generic grammar
  xfold - built-in nt grammar
  xprot - built-in aa grammar
7
Xgram Workflow
Grammar, MSA + Tree -> Xgram -> Annotated MSA
8
Xgram Implementation
Grammar Format
  Terminal alphabet
  Markov chains
  Production rules for nonterminals
    Null states
    Bifurcation states
    Emit states
EM for estimating parameters for the evolutionary grammar
MAP for alignment annotations
9
Xfold Codon Grammar
State diagram: Start, Null (NUC), Forward (0 1 2), Reverse (2 1 0)
Null -> NUC Null’
Null’ -> Fwd | Rev | End
Fwd -> POS1 POS2 POS3 Fwd’    Codon: “0” “1” “2”
Rev -> ~POS3 ~POS2 ~POS1 Rev’    Codon: “2” “1” “0”
# Stockholm 1.0
Seq1 ATGGAA….
Seq2 ATGACG….
#=GC Codon 012012….210210
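The Codon annotation line can be read back mechanically: a run "012" marks a forward-strand codon and "210" a reverse-strand codon. The sketch below is a hypothetical helper, not part of DART. It assumes that any other character in the annotation marks a non-coding column, and that "~" in the Rev rule denotes the complementary nucleotide, so a reverse codon is read as the reverse complement of its three columns.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def codon_strands(annotation):
    """Return (start_column, strand) for each codon in a Codon annotation string."""
    codons = []
    i = 0
    while i + 3 <= len(annotation):
        chunk = annotation[i:i + 3]
        if chunk == "012":
            codons.append((i, "+"))  # forward-strand codon
            i += 3
        elif chunk == "210":
            codons.append((i, "-"))  # reverse-strand codon
            i += 3
        else:
            i += 1  # assumed non-coding column; skip it
    return codons

def read_codon(seq, start, strand):
    """Read one codon from an ungapped sequence row, respecting the strand."""
    triplet = seq[start:start + 3]
    if strand == "+":
        return triplet
    # Reverse strand: complement each base and reverse the order
    return "".join(COMPLEMENT[base] for base in reversed(triplet))

# Made-up example row and annotation, for illustration only
annotation = "012012210210"
row = "ATGGAATTCCAT"
for start, strand in codon_strands(annotation):
    print(start, strand, read_codon(row, start, strand))
```

Gap characters in a real alignment row would need handling before the complement lookup; this sketch leaves that out.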
10
Codon model extensions
Adapt to match Evogene model
  Start/stop translation codons
  Splicing acceptor/donor sites
  Frameshift introns
Extensions to Evogene model
  5’ and 3’ UTR
  Promoter region
11
Testing methods
Verify DART performance vs Evogene
Mouse/human alignments as training data
mreB/actin genes for model growth
  Introns
  Intergenic