Genome Annotation Assessment in Drosophila melanogaster by Reese, M. G., et al. Summary by: Joe Reardon Swathi Appachi Max Masnick Summary of.


1 Genome Annotation Assessment in Drosophila melanogaster, by Reese, M. G., et al.
Summary by: Joe Reardon, Swathi Appachi, Max Masnick
Summary of

2 Complexity of Eukaryotic Genomes
Complexity of genomic data:
–Transposons
–Both strands of DNA may code
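Because either strand can carry a gene, a gene finder has to search the reverse complement as well as the sequence it is given. The Python sketch below illustrates this with a naive open-reading-frame (ORF) scan. The start and stop codons are those of the standard genetic code; `find_orfs`, `reverse_complement` and the `min_codons` cutoff are hypothetical names chosen for illustration. Real eukaryotic gene finders must also model introns, which this sketch ignores.

```python
# Minimal sketch: scan BOTH strands of a DNA sequence for open reading frames.
# Assumptions (not from the slides): start codon ATG, stop codons TAA/TAG/TGA,
# and a hypothetical minimum ORF length (in codons, stop codon included).

COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[str, int, int]]:
    """Return (strand, start, end) per ORF; coordinates are 0-based, end-exclusive,
    and always expressed on the forward strand."""
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            i = frame
            while i + 3 <= n:
                if s[i:i + 3] == "ATG":
                    # Extend codon by codon until an in-frame stop codon.
                    j = i + 3
                    while j + 3 <= n and s[j:j + 3] not in STOPS:
                        j += 3
                    if j + 3 <= n and (j + 3 - i) // 3 >= min_codons:
                        end = j + 3  # include the stop codon
                        if strand == "+":
                            orfs.append((strand, i, end))
                        else:
                            # Map reverse-strand coordinates back onto the forward strand.
                            orfs.append((strand, n - end, n - i))
                        i = end
                        continue
                i += 3
    return orfs
```

For example, `ATGAAATAG` yields an ORF on the "+" strand, while its reverse complement `CTATTTCAT` yields the same ORF reported on the "-" strand, since the coding sequence now lies on the opposite strand.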

3 Levels of Genome Annotation Quality Assessment
Base level:
  A T C G T A C C C A T G
  Y N N N Y Y Y Y Y Y Y N
Exon level
Whole gene level:
–Whether all of a gene's exons are properly identified and assembled
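The three levels can be made concrete with sensitivity and specificity. This sketch assumes the definitions commonly used in gene-finding evaluations (sensitivity = TP/(TP+FN), specificity = TP/(TP+FP)); the slide itself gives no formulas, and the function names and the half-open (start, end) exon representation are hypothetical.

```python
# Sketch: base-, exon- and gene-level accuracy of a prediction against a reference.
# Assumption: exons are half-open (start, end) intervals on one strand, and
# "specificity" follows the gene-finding convention TP / (TP + FP).

Exon = tuple[int, int]


def coding_bases(exons: list[Exon]) -> set[int]:
    """Set of nucleotide positions covered by any exon."""
    return {pos for start, end in exons for pos in range(start, end)}


def base_level(reference: list[Exon], predicted: list[Exon]) -> tuple[float, float]:
    """(sensitivity, specificity), counted nucleotide by nucleotide."""
    ref, pred = coding_bases(reference), coding_bases(predicted)
    tp = len(ref & pred)  # coding in both
    fn = len(ref - pred)  # coding in the reference, missed by the prediction
    fp = len(pred - ref)  # predicted coding, absent from the reference
    sn = tp / (tp + fn) if ref else 0.0
    sp = tp / (tp + fp) if pred else 0.0
    return sn, sp


def exon_level(reference: list[Exon], predicted: list[Exon]) -> tuple[float, float]:
    """An exon counts as correct only if both of its boundaries match exactly."""
    exact = set(reference) & set(predicted)
    sn = len(exact) / len(set(reference)) if reference else 0.0
    sp = len(exact) / len(set(predicted)) if predicted else 0.0
    return sn, sp


def gene_correct(reference: list[Exon], predicted: list[Exon]) -> bool:
    """Whole-gene level: every exon found exactly, with no extra exons."""
    return sorted(reference) == sorted(predicted)
```

With reference exons (0, 10) and (20, 30) and a prediction of (0, 10) and (22, 30), base-level sensitivity is 0.9 and specificity 1.0, yet only one of two exons is exact and the gene as a whole is wrong. This shows why the three levels are reported separately.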

4 Impediments to Gene-Finder Quality Assessment
Underlying biology is still poorly understood
cDNA libraries must be very complete; generating a complete library often requires multiple passes
*Diagram courtesy of University of Miami, http://fig.cox.miami.edu/~cmallery/150/gene/sf16x5.jpg

5 Impediments to Gene-Finder Quality Assessment, cont'd
Even the most experienced experts make errors
–Example: 4 "genes" were found to be untranslated regions
Genome annotation software often identifies genes that the experts missed

6 Approaches to Locating Genomic Features
Comparison to cDNA libraries
–Problem: can only compare against libraries that already exist, and cDNA libraries for the target organism probably don't exist
–Highly effective, though
Protein homology (utilizing Swiss-Prot, BLAT, etc.)
–Ineffective overall
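cDNA comparison is informative because a mature transcript is the genome with its introns spliced out, so placing the cDNA back on the genome reveals the exon boundaries. The toy below (a hypothetical `map_cdna` helper) greedily places the longest matching cDNA block at or after the previous one. It assumes error-free sequences and ignores splice-site signals (such as the canonical GT...AG intron ends), so it is only a sketch of the idea, not how production spliced aligners work.

```python
from typing import Optional

# Toy cDNA-to-genome mapping: consecutive cDNA blocks that land on separated
# genomic stretches are read as exons; the genomic gaps between them as introns.
# Hypothetical helper; assumes exact matches and no sequencing errors.


def map_cdna(genome: str, cdna: str, min_block: int = 3) -> Optional[list[tuple[int, int]]]:
    """Return genomic (start, end) blocks for the cDNA, or None if it cannot be placed."""
    exons = []
    g = 0  # genome position where the next block may start
    c = 0  # next unplaced cDNA position
    while c < len(cdna):
        best_len, best_pos = 0, -1
        # Try the longest remaining cDNA prefix first, shrinking down to min_block.
        for k in range(len(cdna) - c, min_block - 1, -1):
            pos = genome.find(cdna[c:c + k], g)
            if pos != -1:
                best_len, best_pos = k, pos
                break
        if best_len == 0:
            return None  # no block of at least min_block bases matches
        exons.append((best_pos, best_pos + best_len))
        g = best_pos + best_len
        c += best_len
    return exons
```

Greedy extension can misplace a boundary when the first bases of an intron happen to match the start of the next exon; real aligners resolve such ties with splice-site models and scoring, which is one reason the slide notes the approach depends on good library data.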

7 Approaches to Locating Genomic Features, cont'd
Hidden Markov Models:
–Complex statistical analyses
–Assign probabilities to nucleotides having certain functions (exon, intron, promoter, suppressor, etc.); combine these probabilities to determine the functions of specific regions of the genome
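A minimal illustration of the HMM idea: a two-state model (exon vs. intron) decoded with the Viterbi algorithm, which finds the single most probable label for every nucleotide jointly. The states, the GC-rich/AT-rich emission probabilities and the transition probabilities are invented for the example; real gene finders use many more states (splice sites, codon positions, etc.) with parameters trained on annotated genomes.

```python
import math

# Toy two-state HMM. All probabilities are made-up illustrations,
# not trained on Drosophila (or any) data.
STATES = ("exon", "intron")
START = {"exon": 0.5, "intron": 0.5}
TRANS = {
    "exon": {"exon": 0.9, "intron": 0.1},
    "intron": {"exon": 0.1, "intron": 0.9},
}
EMIT = {
    "exon": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},    # assumed GC-rich
    "intron": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},  # assumed AT-rich
}


def viterbi(seq: str) -> list[str]:
    """Most probable state path for seq, computed in log space to avoid underflow."""
    if not seq:
        return []
    log = math.log
    # score[s]: best log-probability of any path ending in state s at the current base
    score = {s: log(START[s]) + log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[i][s]: best predecessor of state s at base i + 1
    for base in seq[1:]:
        prev_best, new_score = {}, {}
        for s in STATES:
            p = max(STATES, key=lambda r: score[r] + log(TRANS[r][s]))
            prev_best[s] = p
            new_score[s] = score[p] + log(TRANS[p][s]) + log(EMIT[s][base])
        back.append(prev_best)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for prev_best in reversed(back):
        state = prev_best[state]
        path.append(state)
    return path[::-1]
```

On `GCGCGCATATATAT`, the decoded path labels the GC-rich first six bases as exon and the AT-rich remainder as intron: per-base evidence is aggregated, and the transition probabilities penalize switching labels too often.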

8 Promoters, Repeats
Identifying promoters:
1. Site-specific identification (binding sites)
2. Statistical identification (similar to HMMs)
3. Locate the gene and then guess
Repeat sequences:
–Must be identifiable even with point mutations, insertions/deletions, etc.
–Useful for determining evolutionary significance
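Both binding sites and repeats have to be found despite point mutations and small insertions/deletions. One standard way to do this is approximate string matching with an edit-distance threshold. The sketch below uses Sellers' dynamic-programming algorithm; the TATA-box-like motif `TATAAA` serves only as an example, and `approximate_matches` is a hypothetical helper name.

```python
# Sellers' algorithm: find where `pattern` occurs in `text` with at most
# `max_edits` substitutions, insertions or deletions (semi-global edit distance).


def approximate_matches(pattern: str, text: str, max_edits: int) -> list[int]:
    """Return end positions (exclusive) in text of matches within max_edits."""
    m = len(pattern)
    # col[i]: edit distance between pattern[:i] and the best text substring
    # ending at the current text position (one DP column, updated in place).
    col = list(range(m + 1))
    hits = []
    for j, t in enumerate(text, start=1):
        prev_diag = col[0]  # previous column's value at row i - 1
        col[0] = 0          # a match may start anywhere in the text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == t else 1
            new = min(col[i] + 1,        # extra base in the text
                      col[i - 1] + 1,    # pattern base missing from the text
                      prev_diag + cost)  # match or substitution
            prev_diag, col[i] = col[i], new
        if col[m] <= max_edits:
            hits.append(j)
    return hits
```

Only end positions are reported; recovering where each match starts would need a traceback. With zero edits the motif is found only where it occurs exactly; allowing one edit also finds a copy with a single deleted base.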

9 And the Winner Is…
Genie EST: most effective gene finder overall; relies on EST (Expressed Sequence Tag) data, which are short, single-pass sequence reads from cDNA clones
Genie: identifies fewer genes, but has fewer false positives

10 Best Gene Annotation Programs, continued (table from Reese et al.)

11 Conclusions
The field is still in its infancy
As the amount of genome data continues to grow exponentially, genome annotation software will become increasingly important
As their quality improves, researchers will rely more and more on programs like Genie for annotation
Illustration courtesy of GenBank, http://www.ncbi.nlm.nih.gov/Genbank/index.html

