Annotation of eukaryotic genomes

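Annotation of eukaryotic genomes
[Figure: the path from genomic DNA to function. Genomic DNA -> (transcription) -> unprocessed RNA -> (RNA processing) -> mature mRNA (5' cap "Gm3", poly(A) tail "AAAAAAA") -> (translation) -> nascent polypeptide -> (folding) -> active enzyme -> function (reactant A -> product B). Annotation steps marked on the figure: ab initio gene prediction, comparative gene prediction, functional identification.]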

Genome analysis overview: C. elegans

Gene finding: ab initio
What features of an ORF can we use?
Size - long open reading frames
DNA composition - codon usage / third-position codon bias
Other features - Kozak sequence (CCGCCAUGG), ribosome binding sites, termination signals (stop codons), splice junction boundaries
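The first two features (ORF size and third-position composition) can be sketched in a few lines. This is a minimal illustration, assuming the standard genetic code, ATG as the only start codon and a hypothetical cut-off of 100 codons; it scans each strand in three frames and ignores introns, so on eukaryotic DNA it can only find single-exon stretches. `find_orfs` and `gc3` are names chosen for this sketch, not part of any tool in the slides.

```python
# Sketch: long-ORF scan plus third-position GC (GC3) as a codon-bias measure.
# Assumes the standard genetic code and ATG as the only start codon.

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list:
    """Return (strand, start, end) for ATG...stop ORFs of at least min_codons.

    Coordinates are 0-based and half-open on the scanned strand; hits on the
    '-' strand are reported in reverse-complement coordinates. The codon
    count includes the stop codon. min_codons=100 is a hypothetical cut-off.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG after the previous stop: longest ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs


def gc3(cds: str) -> float:
    """Fraction of G or C at third codon positions of an in-frame CDS."""
    thirds = cds.upper()[2::3]
    if not thirds:
        return 0.0
    return sum(base in "GC" for base in thirds) / len(thirds)
```

For example, `find_orfs("ATGAAATAG", min_codons=3)` returns `[("+", 0, 9)]`, and `gc3("ATGGCCAAG")` returns `1.0`. Real ab initio gene finders combine such signals statistically rather than applying hard cut-offs.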

Gene finding: comparative
Use knowledge of known coding sequences to identify regions of genomic DNA by similarity to:
transcribed DNA sequence
peptide sequence
related genomic sequence
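A toy sketch of the comparative idea for the transcript case: index the genomic DNA by k-mers, look up every k-mer of a known transcript, and group the exact shared "seeds" by diagonal. The function names and the k = 11 default are assumptions for illustration; real tools extend, score and chain such seeds instead of stopping at exact matches.

```python
from collections import defaultdict


def kmer_index(seq: str, k: int) -> dict:
    """Map every k-mer in seq to the list of positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def shared_seeds(transcript: str, genomic: str, k: int = 11) -> list:
    """(transcript_pos, genomic_pos) pairs where an exact k-mer is shared."""
    index = kmer_index(genomic.upper(), k)
    t = transcript.upper()
    hits = []
    for i in range(len(t) - k + 1):
        for j in index.get(t[i:i + k], ()):
            hits.append((i, j))
    return hits


def seeds_by_diagonal(hits: list) -> dict:
    """Group hits by diagonal (genomic_pos - transcript_pos).

    Seeds from one exon share a diagonal; an intron shifts the genomic
    offset, so each exon of a spliced transcript lands on its own diagonal.
    """
    diagonals = defaultdict(list)
    for i, j in hits:
        diagonals[j - i].append((i, j))
    return dict(diagonals)
```

With a two-exon transcript, the seeds fall on diagonal 0 (first exon) and on a diagonal equal to the intron length (second exon), which is how similarity to a transcript exposes the exon/intron structure of the genomic region.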


Artemis display for S. pombe cosmid

Methods for searching
Pairwise alignments: matching a query sequence against a database of subject sequences
Needleman & Wunsch - global alignment
Smith-Waterman - local alignment
FastA
BLAST
Others: SSAHA, WABA
See Chapter 7 of Developing Bioinformatics Computer Skills
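The difference between the two dynamic-programming methods fits in one function: Needleman & Wunsch keeps negative scores and reads the answer from the bottom-right cell, while Smith-Waterman clamps every cell at zero and reports the best cell anywhere. This is a score-only sketch, assuming a simple match/mismatch scheme and a linear gap penalty (the default values are arbitrary); it has no traceback and none of the heuristics that make FastA or BLAST fast.

```python
def align_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                gap: int = -2, local: bool = False) -> int:
    """Needleman-Wunsch (global) or Smith-Waterman (local) alignment score.

    Linear gap penalty; returns only the optimal score, keeping two DP rows.
    """
    cols = len(b) + 1
    # Row 0: leading gaps for global alignment, zeros for local alignment.
    prev = [0 if local else j * gap for j in range(cols)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap] + [0] * (cols - 1)
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                score = max(score, 0)  # Smith-Waterman: restart instead of going negative
                best = max(best, score)
            curr[j] = score
        prev = curr
    return best if local else prev[-1]
```

For example, `align_score("TTACGTTT", "ACGT")` is `-4` (four matches, four gaps), while the local version finds the embedded `ACGT` and scores `4`.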

BLAST - local similarity searches
BLAST (Basic Local Alignment Search Tool) is the workhorse of genome annotation, due to its early optimisation for the UNIX platform
It underlies most of the web-based servers world-wide
Comes in many flavours:
BLASTN - DNA against DNA
BLASTX - DNA (translated) against protein
BLASTP - protein against protein
TBLASTN - protein against DNA (translated)
TBLASTX - DNA against DNA, both translated and compared at the peptide level
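The translated flavours (BLASTX, TBLASTN, TBLASTX) rest on six-frame translation: three reading frames on each strand. A minimal sketch, assuming the standard genetic code (NCBI table 1) and labelling frames +1..+3 and -1..-3 (conventions for numbering the reverse frames vary between tools):

```python
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str, frame: int = 0) -> str:
    """Translate one reading frame; '*' marks stops, 'X' unknown codons."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(frame, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> dict:
    """Peptide for each of the three forward and three reverse frames."""
    rc = reverse_complement(seq)
    frames = {}
    for f in range(3):
        frames[f"+{f + 1}"] = translate(seq, f)
        frames[f"-{f + 1}"] = translate(rc, f)
    return frames
```

BLASTX applies this to the DNA query and compares each frame with protein sequences, TBLASTN applies it to the DNA database, and TBLASTX to both sides. For `"ATGGCCTAA"`, frame `+1` is `"MA*"` and frame `-1` is `"LGH"`.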

BLAST - results
BLAST returns high-scoring segment pairs (HSPs), each with a score and a p-value.
BLAST output files can be large and difficult to interpret, so we need tools to make sense of the data, both to filter/process the file and to visualise the resulting multiple sequence alignments.
MSPcrunch - a post-processor for BLAST with a number of different output types
BioPerl - modules for handling sequences and BLAST output
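As a small Python stand-in for the filtering step (not MSPcrunch or BioPerl), the sketch below reads tabular BLAST output and keeps the best-scoring HSP per query below an E-value cut-off. It assumes the 12 default columns of BLAST+ `-outfmt 6`; the 1e-10 threshold and the sample lines are made up for illustration.

```python
import csv
import io
from dataclasses import dataclass

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class HSP:
    query: str
    subject: str
    pident: float
    qstart: int  # 1-based; for minus-strand hits sstart > send
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def parse_tabular(handle):
    """Yield one HSP per non-comment line of tab-separated BLAST output."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        r = dict(zip(FIELDS, row))
        yield HSP(r["qseqid"], r["sseqid"], float(r["pident"]),
                  int(r["qstart"]), int(r["qend"]), int(r["sstart"]),
                  int(r["send"]), float(r["evalue"]), float(r["bitscore"]))


def best_hits(hsps, max_evalue: float = 1e-10) -> dict:
    """Highest-bitscore HSP per query among those with evalue <= max_evalue."""
    best = {}
    for h in hsps:
        if h.evalue > max_evalue:
            continue
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return best


if __name__ == "__main__":
    # Made-up example lines, not real search results.
    sample = io.StringIO(
        "contig1\tprotA\t98.5\t200\t3\t0\t1\t600\t1\t200\t1e-100\t400\n"
        "contig2\tprotC\t30.0\t50\t35\t1\t1\t150\t1\t50\t0.5\t20.1\n"
    )
    for query, hsp in best_hits(parse_tabular(sample)).items():
        print(query, hsp.subject, hsp.evalue)
```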

Standard similarity searches for first-pass annotation
genomic DNA v transcript data - BLASTN / EST_GENOME, TBLASTX
genomic DNA v genomic DNA - BLASTN
genomic DNA v non-redundant protein data - BLASTX
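One way to script the BLAST part of this first pass is to build BLAST+ command lines from a small table of comparisons. This sketch only builds the argument lists; the database names (`est_db`, `genomic_db`) and the E-value cut-off are hypothetical placeholders for locally formatted databases, and EST_GENOME is left out because it is a separate program.

```python
# (comparison, BLAST+ program, database); database names are hypothetical.
FIRST_PASS_SEARCHES = [
    ("genomic DNA v transcript data", "blastn", "est_db"),
    ("genomic DNA v transcript data", "tblastx", "est_db"),
    ("genomic DNA v genomic DNA", "blastn", "genomic_db"),
    ("genomic DNA v non-redundant protein data", "blastx", "nr"),
]


def blast_commands(query_fasta: str, evalue: float = 1e-5) -> list:
    """Build (but do not run) BLAST+ argument lists for first-pass searches.

    Tabular output (-outfmt 6) is requested so hits can be post-processed.
    """
    commands = []
    for _comparison, program, db in FIRST_PASS_SEARCHES:
        out = f"{query_fasta}.{program}.{db}.tsv"
        commands.append([program, "-query", query_fasta, "-db", db,
                         "-evalue", str(evalue), "-outfmt", "6", "-out", out])
    return commands
```

With BLAST+ installed and the databases formatted, each list could be passed to `subprocess.run(cmd, check=True)`; `cosmid.fa` in the checks below is a placeholder file name.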

Data for gene prediction
EST/mRNA - intra-species matches
TBLASTX - inter-species matches
BLASTX - intra-species matches
BLASTX - inter-species matches
Coding measures - genefinder, hexamer
Splice sites - consensus sequences
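A hexamer coding measure compares how often each in-frame hexamer occurs in known coding sequence versus non-coding sequence, and scores a candidate region by the summed log-likelihood ratio. The sketch below is a simplified illustration, not genefinder's method: it counts hexamers at codon-aligned positions (step 3) in both training sets and uses add-one pseudocounts; the training data would have to come from curated genes.

```python
import math
from collections import Counter

N_HEXAMERS = 4 ** 6  # number of possible DNA hexamers, for pseudocounts


def inframe_hexamers(seq: str) -> list:
    """Hexamers starting at codon boundaries (positions 0, 3, 6, ...)."""
    seq = seq.upper()
    return [seq[i:i + 6] for i in range(0, len(seq) - 5, 3)]


def train(seqs) -> Counter:
    """Count in-frame hexamers over a set of training sequences."""
    counts = Counter()
    for s in seqs:
        counts.update(inframe_hexamers(s))
    return counts


def hexamer_score(seq: str, coding: Counter, noncoding: Counter) -> float:
    """Sum of log(P_coding / P_noncoding); > 0 suggests coding in this frame."""
    c_total = sum(coding.values())
    n_total = sum(noncoding.values())
    score = 0.0
    for h in inframe_hexamers(seq):
        p_c = (coding[h] + 1) / (c_total + N_HEXAMERS)
        p_n = (noncoding[h] + 1) / (n_total + N_HEXAMERS)
        score += math.log(p_c / p_n)
    return score
```

Scoring each of the three frames separately turns this into a frame-aware coding measure; in practice it is one signal among several, alongside splice-site consensus matches and similarity data.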

Multiple sequence alignments in ACEDB

Manual review of gene predictions
Check concordance with transcript data
Check concordance with peptide similarity data
Check splice site usage (intron/exon boundaries)
A set of human-appraised gene predictions. The translations of the CDS sequences are used for protein feature analysis and initial assignment (ID, function).
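Some of these review checks can be automated before a curator looks at the model. A minimal sketch, assuming a forward-strand gene model given as 0-based half-open exon coordinates, canonical GT...AG introns only (so minor-class GC-AG or AT-AC introns would be flagged), and an optional, hypothetical set of transcript-supported intron coordinates; it also returns the CDS translation used downstream.

```python
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES),
                       AMINO_ACIDS))


def translate(cds: str) -> str:
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X")
                   for i in range(0, len(cds) - 2, 3))


def review_gene_model(genomic: str, exons, supported_introns=None):
    """Return (problems, protein) for a '+' strand gene model.

    exons: list of (start, end), 0-based half-open genomic coordinates.
    supported_introns: optional set of (start, end) introns seen in transcripts.
    """
    g = genomic.upper()
    exons = sorted(exons)
    problems = []
    for (_, end1), (start2, _) in zip(exons, exons[1:]):
        intron = g[end1:start2]
        if not (intron.startswith("GT") and intron.endswith("AG")):
            problems.append(f"non-canonical intron {end1}-{start2}: "
                            f"{intron[:2]}..{intron[-2:]}")
        if supported_introns is not None and (end1, start2) not in supported_introns:
            problems.append(f"intron {end1}-{start2} not supported by transcript data")
    cds = "".join(g[s:e] for s, e in exons)
    protein = translate(cds)
    if len(cds) % 3:
        problems.append("CDS length is not a multiple of 3")
    if not cds.startswith("ATG"):
        problems.append("CDS does not start with ATG")
    if "*" in protein[:-1]:
        problems.append("internal stop codon")
    if not protein.endswith("*"):
        problems.append("CDS does not end with a stop codon")
    return problems, protein
```

An empty problem list only means the model passes these mechanical checks; concordance with peptide similarity data still needs the alignment evidence reviewed by eye.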