Gene Finding Genome Annotation

Gene finding is a cornerstone of genomic analysis
– Genome content and organization
– Differential expression analysis
– Epigenomics
– Population biology & evolution
– Medical genomics

Basic Approaches
Computational
– Absolute rules: start and stop codons
– Statistical probabilities: which codon is a true start? Intron splice junctions, codon usage
Experimental
– Comparison with known genes/proteins (BLAST)
– Expressed sequence tags
– RNAseq data

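Computational Gene Prediction
Statistical properties of protein-coding genes differ from those of non-coding sequence
– Long ORFs: in random sequence with equal base frequencies, stop codons should occur on average 3 times in every 64 codons (~1/21), so long stretches without a stop codon are unlikely to arise by chance
– Codon bias: synonymous codons are not used equally, for example:

Codon   Amino acid   %
ACA     Thr          24.6
ACC     Thr          35.5
ACG     Thr          28.4
ACU     Thr          11.4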
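The two statistics above can be illustrated with a small sketch. It assumes DNA input (T rather than the U shown in the codon table), the standard stop codons TAA/TAG/TGA, forward strand only, and equal base frequencies for the 3/64 figure. The function names are hypothetical and are not taken from SNAP or AUGUSTUS.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def stop_fraction_uniform():
    """Fraction of the 64 codons that are stops (equal base frequencies assumed)."""
    codons = ["".join(c) for c in product("ACGT", repeat=3)]
    return sum(c in STOP_CODONS for c in codons) / len(codons)


def longest_orf(seq):
    """Return (start, end, frame) of the longest ATG...stop ORF on the forward strand.

    Coordinates are 0-based, end-exclusive; (0, 0, 0) means no ORF was found.
    """
    seq = seq.upper()
    best = (0, 0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3, frame)
                start = None  # look for the next ORF in this frame
    return best


def codon_usage(cds, codons):
    """Percent usage of each listed codon within an in-frame coding sequence."""
    counts = {c: 0 for c in codons}
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in counts:
            counts[codon] += 1
    total = sum(counts.values())
    return {c: (100.0 * n / total if total else 0.0) for c, n in counts.items()}


if __name__ == "__main__":
    print(stop_fraction_uniform())          # 3 of 64 codons
    print(longest_orf("CCATGAAATTTTAGGG"))  # toy sequence, not real data
    print(codon_usage("ACAACCACCACG", ["ACA", "ACC", "ACG", "ACT"]))
```

Real gene finders go well beyond this: they score both strands, allow introns, and use codon statistics trained on known genes of the target organism.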

Gene features tend to occur in specific sequence contexts (from Korf 2004)
a. Splice acceptor sites
b. Splice donor sites
c. Translation starts
d. Splice acceptor sites for A. thaliana genes predicted using C. elegans parameters

Many of the ab initio gene finders use Hidden Markov Models (HMMs)
HMMs
– Contain parameters defining the probabilities that specific gene features occur in different sequence contexts
They can be used to predict
– Transcription start sites
– Intron splice junctions
– Poly-A addition sites
– Promoters
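To make the HMM idea concrete, here is a deliberately tiny sketch: a two-state model (N = non-coding, C = coding) decoded with the Viterbi algorithm. All probabilities below are invented for illustration (AT-rich non-coding, GC-rich coding). They are not parameters from SNAP, AUGUSTUS, or any real organism, and real gene finders use many more states (exons, introns, splice sites, and so on).

```python
import math

# Hypothetical toy parameters, chosen only to make the example readable.
STATES = ("N", "C")
START = {"N": 0.5, "C": 0.5}
TRANS = {
    "N": {"N": 0.9, "C": 0.1},
    "C": {"N": 0.1, "C": 0.9},
}
EMIT = {
    "N": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
    "C": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
}


def viterbi(seq):
    """Return the most probable state path for seq as a string of 'N'/'C'."""
    if not seq:
        return ""
    seq = seq.upper()
    # Log-probabilities avoid numerical underflow on long sequences.
    scores = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best previous state from which to enter state s.
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (
                scores[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            )
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    print(viterbi("ATATATGCGCGC"))
```

With these toy parameters the decoder labels the AT-rich half as non-coding and the GC-rich half as coding. Training a gene finder on a new genome amounts to estimating parameters like these from known genes of that organism.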

Standard practice is to perform gene predictions with multiple programs
We will run two programs in today’s exercise:
– SNAP: Korf (2004) Gene finding in novel genomes. BMC Bioinformatics 5:59
– AUGUSTUS: Stanke et al. (2004) AUGUSTUS: a web server for gene finding in eukaryotes. Nucl. Acids Research 32:W309

Gene validation
Independent evidence that our candidate gene is, in fact, a gene
– Conserved protein motifs
– BLAST matches
– Expressed sequence tags
– RNAseq reads

For today’s exercise
We will use the following evidence:
– Genes/proteins already identified in M. oryzae (many are well supported by BLAST, EST and other transcriptomic data)
– Splice junction information from the RNAseq mapping that we performed yesterday
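One way to picture how splice junction evidence supports a gene model: derive the introns implied by a predicted exon structure and check each against the junctions seen in the RNAseq mapping. The sketch below assumes exons and junctions are both given as 1-based, inclusive (start, end) pairs on the same sequence and strand. Junction file conventions differ between mapping tools, so coordinates would need converting first. Function names and coordinates are hypothetical.

```python
def introns_from_exons(exons):
    """Return the introns implied by a list of (start, end) exons.

    Coordinates are assumed 1-based and inclusive, so the intron between
    two exons runs from the base after one exon to the base before the next.
    """
    exons = sorted(exons)
    return [
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(exons, exons[1:])
    ]


def junction_support(exons, observed_junctions):
    """Split a model's introns into (supported, unsupported) by RNAseq junctions."""
    observed = set(observed_junctions)
    introns = introns_from_exons(exons)
    supported = [i for i in introns if i in observed]
    unsupported = [i for i in introns if i not in observed]
    return supported, unsupported


if __name__ == "__main__":
    # Toy gene model and junction set, invented for illustration.
    model = [(100, 200), (301, 400), (501, 600)]
    junctions = {(201, 300), (401, 450)}
    print(junction_support(model, junctions))
```

A model whose introns all match observed junctions is better supported than one with unmatched introns, although lowly expressed genes may simply lack reads.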

Information overload!!!
Results from:
– SNAP
– AUGUSTUS
– Magnaporthe genes
– Magnaporthe proteins
– RNAseq mapping data
How are we going to make sense of these highly redundant datasets?

Enter…MAKER
– Synthesizes multiple forms of gene prediction data (predictions and evidence)
– Outputs a single, consistent set of genes and gene models, including quality values
– Uses a standard gene annotation format: GFF3 (related to the GTF format used yesterday)
– Results can be imported into a genome browser

GFF3 format
Nine tab-separated columns:

1       2        3      4       5     6       7        8       9
seqid   source   type   start   end   score   strand   phase   attributes
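As a rough illustration of the nine columns, the sketch below parses one GFF3 feature line into a dictionary. It assumes a well-formed, tab-separated line, treats "." as "undefined" for score and phase, and splits the attributes column on ";" and "=". The function name and the example line are hypothetical, and comment or directive lines (starting with "#") are not handled.

```python
from urllib.parse import unquote

GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)


def parse_gff3_line(line):
    """Parse a single GFF3 feature line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GFF3_COLUMNS):
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])  # 1-based, inclusive
    record["end"] = int(record["end"])
    record["score"] = None if record["score"] == "." else float(record["score"])
    record["phase"] = None if record["phase"] == "." else int(record["phase"])
    attributes = {}
    for pair in record["attributes"].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            # GFF3 percent-encodes reserved characters in attribute values.
            attributes[unquote(key)] = unquote(value)
    record["attributes"] = attributes
    return record


if __name__ == "__main__":
    # Invented example line, not output from MAKER.
    example = "chr1\tmaker\tgene\t1000\t2500\t.\t+\t.\tID=gene1;Name=example%20gene"
    print(parse_gff3_line(example))
```

For real analyses an established GFF3 parsing library is usually a safer choice than hand-rolled splitting.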

Gene finding is an iterative process
[Diagram labels: SNAP, AUGUSTUS, HMM, gene models, BLAST matches, ESTs, MAKER]
