ECFG for Gene Identification using DART
Yuri Bendana, Sharon Chao, Karsten Temme

1 ECFG for Gene Identification using DART
Yuri Bendana, Sharon Chao, Karsten Temme

2 Eukaryotic Gene Structure
Figure 4-14 from Lodish et al., Molecular Cell Biology, 2004. Adapted from Figure 1.4 in Graur and Li, Fundamentals of Molecular Evolution, 2000.

3 Evogene
- Pedersen and Hein, 2003
- EHMM = HMM + Evolutionary Tree
- Gene structure model
- Region-specific evolutionary models
- Parameters estimated by EM (Baum-Welch) and ML (Powell's method) from annotated human/mouse alignments
- Gaps in the MSA are treated as missing data
- MAP estimate of gene structure using Viterbi
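The MAP decoding step named on this slide can be illustrated with a minimal log-space Viterbi sketch. It assumes a hypothetical two-state model ("exon" and "intron") that emits one nucleotide per step with made-up probabilities; in Evogene each state instead emits whole alignment columns scored on an evolutionary tree, which this sketch leaves out.

```python
import math

# Hypothetical toy parameters: GC-rich "exon" vs AT-rich "intron" states.
STATES = ("exon", "intron")
START = {"exon": 0.5, "intron": 0.5}
TRANS = {
    "exon": {"exon": 0.9, "intron": 0.1},
    "intron": {"exon": 0.1, "intron": 0.9},
}
EMIT = {
    "exon": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
    "intron": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
}


def viterbi(seq, states, start_p, trans_p, emit_p):
    """Return (most probable state path, its log-probability) for seq."""
    log = math.log
    # score[s]: best log-probability of any path over the prefix seen so far ending in s
    score = {s: log(start_p[s]) + log(emit_p[s][seq[0]]) for s in states}
    backptrs = []
    for sym in seq[1:]:
        new_score, ptr = {}, {}
        for s in states:
            # predecessor state that maximises the path score into s
            prev = max(states, key=lambda r: score[r] + log(trans_p[r][s]))
            new_score[s] = score[prev] + log(trans_p[prev][s]) + log(emit_p[s][sym])
            ptr[s] = prev
        score = new_score
        backptrs.append(ptr)
    # trace back from the best final state
    last = max(states, key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]
```

With these toy numbers, `viterbi("GCGCGCATATAT", STATES, START, TRANS, EMIT)` labels the first six positions "exon" and the last six "intron": one state switch costs less than emitting six bases from the less likely state. Working in log space avoids numerical underflow on long alignments.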

4 Evogene EHMM
- Phase 1 and 2 introns model a frame shift: the intron falls inside a codon and interrupts it
- Alignment column(s) are generated for each state visited
- HKY (nucleotide) and Goldman-Yang (codon) evolutionary models
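The HKY model named on this slide is defined by an instantaneous rate matrix. The sketch below builds the HKY85 matrix from base frequencies `pi` and a transition/transversion ratio `kappa`; scaling it to one expected substitution per unit time is a common convention assumed here, not something the slide states. The Goldman-Yang codon model applies the same idea over the 61 sense codons and is not shown.

```python
NUCS = "ACGT"
PURINES = frozenset("AG")


def is_transition(a, b):
    """A<->G and C<->T are transitions; every other change is a transversion."""
    return a != b and ((a in PURINES) == (b in PURINES))


def hky_rate_matrix(pi, kappa):
    """HKY85 instantaneous rate matrix as a nested dict q[from][to].

    pi: equilibrium base frequencies (summing to 1); kappa: transition/transversion
    rate ratio. Rows sum to zero, and the matrix is scaled to one expected
    substitution per unit time (an assumed convention).
    """
    q = {}
    for a in NUCS:
        # off-diagonal rates: target frequency, boosted by kappa for transitions
        row = {b: pi[b] * (kappa if is_transition(a, b) else 1.0)
               for b in NUCS if b != a}
        row[a] = -sum(row.values())  # diagonal makes the row sum to zero
        q[a] = row
    # expected substitution rate at equilibrium: -sum_a pi_a * q_aa
    rate = -sum(pi[a] * q[a][a] for a in NUCS)
    return {a: {b: q[a][b] / rate for b in NUCS} for a in NUCS}
```

Because every off-diagonal rate has the form `pi[b]` times a symmetric factor, the chain is time-reversible (`pi[a] * q[a][b] == pi[b] * q[b][a]`), which is what allows a tree likelihood to be evaluated without knowing where the root lies.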

5 Evogene Results
- 116 human/mouse orthologs used for training and testing
- Prediction improves when the input is an MSA rather than a single sequence

6 DART
- DNA, Amino, and RNA Tests [Holmes]
- ECFG = SCFG + Evolutionary Tree
- Programs: xgram, xfold, xprot
  - xgram: generic grammar
  - xfold: built-in nucleotide grammar
  - xprot: built-in amino-acid grammar
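The "+ Evolutionary Tree" part means that each alignment column emitted by the grammar is scored by its likelihood on a phylogeny. A minimal sketch of that computation is Felsenstein's pruning algorithm. For simplicity it hard-codes the Jukes-Cantor model (a swapped-in assumption; the grammar format on slide 8 instead lists its own Markov chains), uses uniform root frequencies, represents the tree as hypothetical nested `(child, branch_length)` tuples, and treats gaps as missing data, as Evogene does (slide 3).

```python
import math

NUCS = "ACGT"


def jc_prob(a, b, t):
    """Jukes-Cantor probability of base a becoming b along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e


def partial_likelihood(node, column):
    """Felsenstein pruning: P(leaf data below node | state x at node) for each x.

    A leaf is a name (str); an internal node is a tuple of (child, branch_length).
    """
    if isinstance(node, str):
        base = column.get(node, "-")
        if base not in NUCS:  # gap or unknown character: missing data
            return {x: 1.0 for x in NUCS}
        return {x: 1.0 if x == base else 0.0 for x in NUCS}
    result = {x: 1.0 for x in NUCS}
    for child, t in node:
        child_l = partial_likelihood(child, column)
        for x in NUCS:
            # sum over the unobserved state y at the child
            result[x] *= sum(jc_prob(x, y, t) * child_l[y] for y in NUCS)
    return result


def column_likelihood(tree, column):
    """Likelihood of one alignment column, assuming uniform root frequencies."""
    root = partial_likelihood(tree, column)
    return sum(0.25 * root[x] for x in NUCS)
```

For a two-leaf human/mouse tree such as `(("human", 0.1), ("mouse", 0.1))`, a column with identical bases scores higher than one with different bases, and a column of gaps scores exactly 1, so missing data neither rewards nor penalises any state.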

7 Xgram Workflow
Grammar, MSA + Tree → Xgram → Annotated MSA

8 Xgram Implementation
- Grammar format
  - Terminal alphabet
  - Markov chains
  - Production rules for nonterminals
    - Null states
    - Bifurcation states
    - Emit states
- EM for estimating the parameters of the evolutionary grammar
- MAP for alignment annotations
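The difference between null, bifurcation, and emit states can be sketched by classifying productions by their shape. This is a hypothetical illustration: the `Rule` type and `classify` function are not xgram's grammar syntax, and treating any rule with a terminal on its right-hand side as an emit rule is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical production: lhs -> rhs (rhs may be empty for an end rule)."""
    lhs: str
    rhs: Tuple[str, ...]


def classify(rule: Rule, nonterminals: FrozenSet[str]) -> str:
    """Label a production as 'emit', 'bifurcation' or 'null' by its shape."""
    terminals = [s for s in rule.rhs if s not in nonterminals]
    nts = [s for s in rule.rhs if s in nonterminals]
    if terminals:
        return "emit"  # generates alignment column(s)
    if len(nts) == 2:
        return "bifurcation"  # splits the alignment into two sub-parses
    if len(nts) <= 1:
        return "null"  # silent transition, or end of the parse
    raise ValueError(f"unsupported rule shape: {rule}")
```

Applied to the codon grammar on slide 9, `Null -> NUC Null'` is an emit rule, while `Null' -> Fwd` is a null rule; a bifurcation such as `S -> L R` is what lets an SCFG describe nested structure that an HMM cannot.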

9 Xfold Codon Grammar
[State diagram: Start, Null (emits NUC), Forward (codon positions 0 1 2), Reverse (codon positions 2 1 0)]
Production rules:
    Null  -> NUC Null'
    Null' -> Fwd | Rev | End
    Fwd   -> POS1 POS2 POS3 Fwd'      Codon: "0" "1" "2"
    Rev   -> ~POS3 ~POS2 ~POS1 Rev'   Codon: "2" "1" "0"
Example annotated alignment:
    # STOCKHOLM 1.0
    Seq1 ATGGAA…
    Seq2 ATGACG…
    #=GC Codon 012012…210210
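How a parse through this grammar turns into the `#=GC Codon` line can be sketched as follows. Forward codons contribute "012" and reverse codons "210", as on the slide; the "." label for Null columns, the function names, and the example sequences are hypothetical, and xfold's actual output may differ.

```python
FWD_LABELS = "012"  # codon positions on the forward strand (Fwd rule)
REV_LABELS = "210"  # reverse strand: ~POS3 ~POS2 ~POS1 (Rev rule)
NULL_LABEL = "."    # hypothetical label for non-coding (Null) columns


def codon_annotation(segments):
    """Build a Codon annotation string from (state, count) segments.

    'Fwd' and 'Rev' counts are in codons; 'Null' counts are in columns.
    """
    labels = {"Fwd": FWD_LABELS, "Rev": REV_LABELS, "Null": NULL_LABEL}
    parts = []
    for state, count in segments:
        if state not in labels:
            raise ValueError(f"unknown state: {state}")
        parts.append(labels[state] * count)
    return "".join(parts)


def stockholm_with_codons(seqs, segments):
    """Render a minimal Stockholm alignment with a '#=GC Codon' line."""
    annot = codon_annotation(segments)
    if any(len(s) != len(annot) for s in seqs.values()):
        raise ValueError("annotation length must match alignment length")
    tag = "#=GC Codon"
    width = max(len(name) for name in [*seqs, tag])
    lines = ["# STOCKHOLM 1.0"]
    lines += [f"{name:<{width}} {seq}" for name, seq in seqs.items()]
    lines.append(f"{tag:<{width}} {annot}")
    lines.append("//")  # Stockholm record terminator
    return "\n".join(lines)
```

For example, `codon_annotation([("Fwd", 2), ("Rev", 2)])` gives `"012012210210"`: two forward codons followed by two reverse-strand codons, matching the pattern at the two ends of the slide's annotation line.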

10 Codon model extensions
- Adapt to match the Evogene model
  - Start/stop translation codons
  - Splicing acceptor/donor sites
  - Frameshift introns
- Extensions to the Evogene model
  - 5' and 3' UTR
  - Promoter region
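The start/stop codons and splice sites listed above are the signals the extended grammar would have to represent. As a crude sketch, candidate positions can be found by exact matching, assuming the standard genetic code (start ATG; stops TAA, TAG, TGA) and the canonical GT-AG splice rule. A grammar would score these signals probabilistically rather than match them exactly; `find_signals` is a hypothetical helper.

```python
START_CODONS = {"ATG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_signals(seq):
    """Return 0-based positions of candidate gene-structure signals in seq."""
    seq = seq.upper()
    signals = {"start": [], "stop": [], "donor": [], "acceptor": []}
    for i in range(len(seq)):
        # slices near the end are shorter and simply fail to match
        codon, dinuc = seq[i:i + 3], seq[i:i + 2]
        if codon in START_CODONS:
            signals["start"].append(i)
        if codon in STOP_CODONS:
            signals["stop"].append(i)
        if dinuc == "GT":
            signals["donor"].append(i)  # an intron could begin here
        elif dinuc == "AG":
            signals["acceptor"].append(i)  # an intron could end with this AG
    return signals
```

The output shows why exact matching is not enough: even a short sequence such as `"ATGGTAAGTAG"` contains two candidate donors and two candidate acceptors, so the model must weigh these against the surrounding exon and intron evidence.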

11 Testing methods
- Verify DART performance vs Evogene
  - Mouse/human alignments as training data
- mreB/actin genes for model growth
  - Introns
  - Intergenic regions
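One way to carry out the DART vs Evogene comparison is nucleotide-level accuracy against the annotation, as commonly reported for gene finders. In the sketch below, "C" marks a coding column and any other character marks a non-coding one (a hypothetical labelling convention). Following gene-finding usage, "specificity" here means TP / (TP + FP), which is what other fields call precision.

```python
def nucleotide_accuracy(predicted, annotated, coding="C"):
    """Nucleotide-level sensitivity and specificity of a coding labelling.

    Sn = TP / (TP + FN); Sp = TP / (TP + FP), the gene-finding convention.
    """
    if len(predicted) != len(annotated):
        raise ValueError("label strings must have equal length")
    pairs = list(zip(predicted, annotated))
    tp = sum(p == coding and a == coding for p, a in pairs)
    fp = sum(p == coding and a != coding for p, a in pairs)
    fn = sum(p != coding and a == coding for p, a in pairs)
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    return sn, sp
```

For instance, a prediction of `"CCCNNN"` against an annotation of `"CCCCNN"` misses one coding column but calls nothing falsely, giving Sn = 0.75 and Sp = 1.0. Reporting both numbers matters because a predictor can trade one for the other.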

