Genome Annotation Assessment in Drosophila melanogaster by Reese, M. G., et al. Summary by: Joe Reardon, Swathi Appachi, Max Masnick. Summary of.



Complexity of Eukaryotic Genomes
• Complexity of genomic data:
– Transposons
– Both strands of DNA may code

Levels of Genome Annotation Quality Assessment
• Base Level:
  A T C G T A C C C A T G
  Y N N N Y Y Y Y Y Y Y N
• Exon Level:
• Whole Gene Level:
– Whether all of a gene’s exons are properly identified and assembled
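The base-level and exon-level comparisons can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used by Reese et al. It assumes that the Y/N row above marks each base as coding or non-coding, and it uses the convention common in gene-finding assessments: sensitivity = TP / (TP + FN) and specificity = TP / (TP + FP). The function names, the predicted string, and the exon coordinates are hypothetical.

```python
def base_level_metrics(annotated, predicted):
    """Compare per-base coding calls ('Y' = coding, 'N' = non-coding).

    Returns (sensitivity, specificity), where specificity follows the
    gene-finding convention TP / (TP + FP).
    """
    if len(annotated) != len(predicted):
        raise ValueError("annotation and prediction must cover the same bases")
    pairs = list(zip(annotated, predicted))
    tp = sum(a == "Y" and p == "Y" for a, p in pairs)  # coding, called coding
    fn = sum(a == "Y" and p == "N" for a, p in pairs)  # coding, missed
    fp = sum(a == "N" and p == "Y" for a, p in pairs)  # non-coding, called coding
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, specificity


def exon_level_metrics(annotated_exons, predicted_exons):
    """Count an exon as correct only if both boundaries match exactly.

    Exons are (start, end) tuples; returns (sensitivity, specificity).
    """
    annotated_set = set(annotated_exons)
    predicted_set = set(predicted_exons)
    correct = len(annotated_set & predicted_set)
    sensitivity = correct / len(annotated_set) if annotated_set else 0.0
    specificity = correct / len(predicted_set) if predicted_set else 0.0
    return sensitivity, specificity


# Annotation row taken from the slide; the prediction is made up for illustration.
annotated = "YNNNYYYYYYYN"
predicted = "YYNNYYYYYNNN"
base_sn, base_sp = base_level_metrics(annotated, predicted)

# Hypothetical exon coordinates: one of the two annotated exons is predicted exactly.
exon_sn, exon_sp = exon_level_metrics([(1, 1), (5, 11)], [(1, 2), (5, 11)])
```

A prediction can score well at the base level and still do poorly at the exon level, because a single misplaced boundary makes the whole exon count as wrong.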

Impediments to Gene-Finder Quality Assessment
• Underlying biology is still poorly understood
• cDNA libraries must be very complete, which often requires multiple passes to generate a complete library
*Diagram courtesy of University of Miami,

Impediments to Gene-Finder Quality Assessment, cont’d
• Even the most experienced experts make errors
– Example: 4 “genes” were found to be untranslated regions
• Genome annotation software often identifies genes that the experts missed

Approaches to Locating Genomic Features
• Comparison to cDNA libraries
– Problem: Can only compare to existing libraries; cDNA libraries for the target organism probably don’t exist
– Highly effective, though
• Protein homology (utilizing SwissPROT, BLAT, etc.)
– Ineffective overall

Approaches to Locating Genomic Features, cont’d
• Hidden Markov Models:
– Complex statistical analyses
– Assign probabilities to nucleotides having certain functions (exon, intron, promoter, suppressor, etc.); compute probabilities in aggregate to determine functions of specific regions of the genome
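A toy example may make the HMM idea more concrete. The sketch below is not Genie or any other program from the assessment: it is a two-state model (exon, intron) decoded with the Viterbi algorithm. Every probability in it is invented for illustration, with the "exon" state simply made slightly GC-rich. Real gene finders use many more states (splice sites, promoters, codon positions) and parameters trained on annotated sequence.

```python
import math

# All parameters below are hypothetical, chosen only to illustrate the method.
STATES = ("exon", "intron")
START = {"exon": 0.5, "intron": 0.5}
TRANS = {
    "exon": {"exon": 0.9, "intron": 0.1},
    "intron": {"exon": 0.1, "intron": 0.9},
}
EMIT = {
    "exon": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "intron": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def viterbi(seq):
    """Return the most probable state label for each nucleotide in seq."""
    if not seq:
        return []
    # Log-probability of the best path ending in each state at the first base.
    scores = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_scores = {}
        pointers = {}
        for state in STATES:
            # Best previous state from which to enter `state`.
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][state]))
            new_scores[state] = (
                scores[prev] + math.log(TRANS[prev][state]) + math.log(EMIT[state][base])
            )
            pointers[state] = prev
        backpointers.append(pointers)
        scores = new_scores
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


labels = viterbi("GCGCGCGCGCATATATATAT")
```

Working in log space avoids numerical underflow on long sequences, which matters once the input is a chromosome-sized region and not a 20-base toy string.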

Promoters, Repeats
• Identifying Promoters:
1. Site-specific identification (binding sites)
2. Statistical identification (similar to HMM)
3. Locate gene and then guess
• Repeat Sequences
– Must be able to identify even with point mutations, insertions/deletions, etc.
– Useful for determining evolutionary significance
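The requirement that repeats be found despite point mutations and insertions/deletions amounts to approximate string matching. The sketch below shows one standard way to do that, an edit-distance dynamic program in which a match may start anywhere in the sequence. It is an illustration only and not the method of any repeat finder in the assessment; the function name and the example sequences are hypothetical.

```python
def approximate_matches(motif, sequence, max_edits):
    """Find where copies of motif end in sequence, allowing a few edits.

    Returns every position j (1-based) such that some substring of
    sequence ending at j is within max_edits substitutions, insertions,
    or deletions of motif.
    """
    # Row 0 is all zeros: a match may begin at any position for free.
    prev = [0] * (len(sequence) + 1)
    for i, motif_base in enumerate(motif, start=1):
        curr = [i] + [0] * len(sequence)
        for j, seq_base in enumerate(sequence, start=1):
            curr[j] = min(
                prev[j - 1] + (motif_base != seq_base),  # match or point mutation
                prev[j] + 1,                             # motif base missing from sequence
                curr[j - 1] + 1,                         # extra base inserted in sequence
            )
        prev = curr
    return [j for j in range(1, len(sequence) + 1) if prev[j] <= max_edits]


exact = approximate_matches("ACGT", "TTACGTTT", 0)    # unmutated copy
mutated = approximate_matches("ACGT", "TTACCTTT", 1)  # copy with one point mutation
deleted = approximate_matches("ACGT", "TTAGTTT", 1)   # copy with one base deleted
```

The cost is proportional to motif length times sequence length, which is acceptable for short motifs; genome-scale repeat detection typically relies on indexed or seeded searches instead.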

And the Winner Is…
• Genie EST: most effective overall gene finder; relies on EST (Expressed Sequence Tag) data (somewhat like cDNA data)
• Genie: identifies fewer genes, but has fewer false positives

Best Gene Annotation Programs, continued (Table from Reese, et al)

Conclusions
• Field is still in its infancy
• As the amount of genome data continues to grow exponentially, genome annotation software will grow in importance.
• Researchers will rely on programs like Genie for annotations as quality improves.
Illustration courtesy of GenBank,