Why Manual Genome Annotation?
Even the best gene predictors and genome annotation pipelines rarely exceed accuracies of 80% at the exon level, meaning that most gene annotations contain at least one mis-annotated exon. (Yandell and Ence, 2012, Nature Reviews Genetics)
Automated annotation is often not good enough for genes you really care about!

Yandell and Ence, 2012, Nature Reviews Genetics lab.org/publications/pdf/euk_genome_annotation_review.pdf

Different lines of evidence go into modern gene annotation pipelines:
1. Computational prediction (Open Reading Frames, etc.)
2. Evidence-based prediction (ESTs, RNA-seq, etc.)
3. Homology-based prediction (BLAST, etc.)
These are synthesized into a consensus gene annotation, which still may be wrong!

Bees (Order Hymenoptera, Family Apidae)
Western Honey Bee (Apis mellifera)
Common Eastern Bumble Bee (Bombus impatiens)
Buff-Tailed Bumble Bee (Bombus terrestris)
Dwarf Asian Honey Bee (Apis florea)

Cytochrome P450 monooxygenase enzymes
NADPH + H+ + O2 + R-H → NADP+ + H2O + R-OH
Classification: CYP 3 A 4 *15 A-B
3: family (>40% amino acid sequence homology)
A: sub-family (>55% amino acid sequence homology)
4: isoenzyme
*15 A-B: allele
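The naming scheme above can be read mechanically. The following is a minimal sketch, assuming names follow the pattern CYP + family number + sub-family letter(s) + isoenzyme number, optionally followed by "*" and an allele label; the function name `parse_cyp_name` and the regular expression are illustrative, not part of any official nomenclature tool.

```python
import re

# Assumed pattern: "CYP", family number, sub-family letter(s),
# isoenzyme number, and an optional allele label after "*".
CYP_NAME = re.compile(r"^CYP(\d+)([A-Z]+)(\d+)(?:\*(\S+))?$")


def parse_cyp_name(name):
    """Split a CYP name into its parts; return None if it does not match."""
    match = CYP_NAME.match(name.strip().upper())
    if match is None:
        return None
    family, subfamily, isoenzyme, allele = match.groups()
    return {
        "family": int(family),
        "subfamily": subfamily,
        "isoenzyme": int(isoenzyme),
        "allele": allele,  # None when there is no "*allele" suffix
    }


if __name__ == "__main__":
    for name in ["CYP3A4*15", "CYP9R1", "CYP6AS3", "CYP4G11"]:
        print(name, parse_cyp_name(name))
```

Two genes that share the family number but differ in sub-family letters (for example CYP6AS3 and CYP6BD1) fall on either side of the sub-family threshold given above.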

Chemical signalling??? (pheromone synthesis and breakdown)
Detoxication (toxin and pesticide metabolism)
Hormone synthesis (highly conserved orthologs) + Detoxication

Repeats

Intron splice sites are highly conserved

P450s: ~500 amino acids (~1500 nucleotides)
Highly conserved heme-binding site (cysteine)

Basic Annotation Rules
CDS Start: amino acid M, nucleotides ATG
CDS Stop: amino acid *, nucleotides TAA/TAG/TGA
Translation frames: Frame 1, Frame 2, Frame 3
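These rules can be checked with a few lines of code. The sketch below translates a nucleotide string in each of the three forward frames using the standard genetic code; the example sequence is made up for illustration, and the helper name `translate` is an assumption rather than a function from any annotation tool.

```python
from itertools import product

# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame=1):
    """Translate seq in forward frame 1, 2, or 3; unknown codons become 'X'."""
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2, or 3")
    seq = seq.upper()
    start = frame - 1
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


if __name__ == "__main__":
    example = "ATGGAATAA"  # hypothetical: ATG (M), GAA (E), TAA (stop)
    for frame in (1, 2, 3):
        print(f"Frame {frame}: {translate(example, frame)}")
```

Only frame 1 of the example begins with M and ends with *, which is what a complete CDS should look like.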

Intron splice sites GT-AG

Find: “(\w)” Replace with: “\1 ”
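As a rough illustration of what this find-and-replace pair does, here is the same substitution in Python. The slide does not say what it is used for; one plausible use (an assumption) is spacing out residues so they can be lined up against the nucleotide sequence. The number of spaces in the replacement may have been collapsed in the transcript, so one space is assumed here, and `space_out` is a hypothetical helper name.

```python
import re


def space_out(text):
    """Insert a space after every word character, as in (\\w) -> \\1 + space."""
    return re.sub(r"(\w)", r"\1 ", text)


if __name__ == "__main__":
    peptide = "MEK"  # hypothetical three-residue peptide
    print(repr(space_out(peptide)))
```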

‘GT’ intron donor site

‘AG’ intron acceptor site

‘GT’ intron donor site: 1 nucleotide “G” left over for the next codon = Phase 1 intron

‘AG’ intron acceptor site: 2 nucleotides “AA” before the first full codon
Combine with “G” on exon 2
Make the codon “GAA” for glutamic acid (E)
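The phase bookkeeping in this example can be sketched in code. The exon and intron sequences below are invented for illustration; only the GT/AG boundaries, the leftover “G”, and the “AA” that completes the codon come from the slides. The function names are assumptions, not part of any annotation package.

```python
def has_canonical_splice_sites(intron):
    """True if the intron starts with the GT donor and ends with the AG acceptor."""
    intron = intron.upper()
    return intron.startswith("GT") and intron.endswith("AG")


def intron_phase(upstream_cds):
    """Number of bases of the interrupted codon that lie before the intron.

    Assumes upstream_cds is in frame, i.e. it starts at the first base of a codon.
    """
    return len(upstream_cds) % 3


def split_codon(upstream_exon, downstream_exon, phase):
    """Rebuild the codon interrupted by the intron (empty string for phase 0)."""
    if phase == 0:
        return ""
    return upstream_exon[-phase:] + downstream_exon[:3 - phase]


if __name__ == "__main__":
    upstream = "ATGG"          # hypothetical: ATG plus one leftover G
    intron = "GTAAGTTTTCAG"    # hypothetical intron with GT...AG boundaries
    downstream = "AATAA"       # hypothetical: AA completes the codon, then TAA

    phase = intron_phase(upstream)
    print("canonical sites:", has_canonical_splice_sites(intron))
    print("phase:", phase)
    print("split codon:", split_codon(upstream, downstream, phase))
```

With one leftover base upstream the intron is phase 1, and joining “G” to “AA” gives “GAA”, matching the glutamic acid in the example above.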

This start looks good!

Jamboree!
Search for paralogs using one of these genes from Apis mellifera in the protein database on GenBank (e.g. CYP9R1 AND Apis mellifera):
CYP9R1
CYP6AS3
CYP6BD1
CYP6AQ1
CYP4G11
Use BLASTP to find predicted paralogs in the NCBI “nr” database. Select one of the following bees for the Organism:
Apis florea
Bombus impatiens
Bombus terrestris
Megachile rotundata
Copy and paste verified amino acid sequences (FASTA formatted) into a text file:

Add comments to the header and include a gene identifier.
Send to me at:
Thanks!!
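For anyone assembling the text file by script rather than by hand, a FASTA record with a gene identifier and comments in the header might be built as follows. This is only a sketch: the identifier, comment text, and sequence are placeholders, and the header layout (identifier, a space, then free-text comments) is an assumption rather than a required submission format.

```python
def fasta_record(gene_id, comment, sequence, width=60):
    """Return one FASTA record: '>' header line, then the sequence wrapped to width."""
    header = f">{gene_id} {comment}".rstrip()
    sequence = "".join(sequence.split()).upper()  # drop whitespace and line breaks
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + lines) + "\n"


if __name__ == "__main__":
    record = fasta_record(
        gene_id="CYP9R1_example",               # placeholder identifier
        comment="manually checked start/stop",  # placeholder comment
        sequence="MEK" * 30,                    # placeholder, not a real P450
    )
    print(record, end="")
```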