What do you do with a whole genome sequence?

Translate it into all six reading frames...

Identify all of the stop codons...

And the start codons... You can then identify all open reading frames (ORFs). But are they all real genes?
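
The three steps above (six frames, stop codons, start codons) can be sketched as a naive ORF scan. This is only an illustration, not a gene finder: the function names, the `min_codons` cutoff, and the choice to accept ATG as the only start codon are assumptions made for the example. It reads codons directly rather than translating them to amino acids.

```python
# Illustrative sketch: naive six-frame ORF scan (not a real gene finder).
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG"}  # assumption: ATG only; GTG and TTG could be added


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=2):
    """Return (strand, frame, start, end, orf_sequence) for each ORF found.

    Coordinates are 0-based offsets on the strand being read. An ORF runs
    from the first start codon seen after the previous stop through the
    next in-frame stop codon; min_codons counts the stop codon too.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None
    return orfs


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATAGCC"):
        print(orf)
```

Every ORF this returns is only a candidate, which is exactly why the question "are they all real genes?" needs a statistical gene modeler to answer.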

Three major prokaryotic gene modelers:
Generation uses predominantly 6-mer statistics to recognize coding regions; it uses a proximity rule-based start call with ATG and GTG as potential starts.
Glimmer uses interpolated Markov models (IMMs) to identify the coding regions; it uses ATG, GTG, and TTG as potential starts.
Critica uses blastn to produce alignments from the entire dataset and derives dicodon statistics to recognize coding sequences; it uses a Shine-Dalgarno (SD) sensor with ATG, GTG, and TTG as potential starts.
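
To give a feel for what "k-mer statistics" means, here is a deliberately simplified log-odds score. It is not the algorithm used by Generation, Glimmer, or Critica; the function names, the add-one smoothing, and the toy training sequences are all assumptions for illustration.

```python
# Illustrative sketch: k-mer log-odds coding score (not any real tool's method).
import math
from collections import Counter
from itertools import product


def kmer_counts(seqs, k):
    """Count every overlapping k-mer in a list of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def log_odds_table(coding, noncoding, k=6):
    """Log-odds of each k-mer in coding vs. noncoding training sequences.

    Add-one smoothing keeps unseen k-mers from producing log(0).
    """
    c, n = kmer_counts(coding, k), kmer_counts(noncoding, k)
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    c_total = sum(c[m] for m in kmers) + len(kmers)
    n_total = sum(n[m] for m in kmers) + len(kmers)
    return {
        m: math.log((c[m] + 1) / c_total) - math.log((n[m] + 1) / n_total)
        for m in kmers
    }


def coding_score(seq, table, k=6):
    """Sum the log-odds of every k-mer in seq; higher means more coding-like."""
    return sum(table.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    # Toy training data, invented for the example.
    table = log_odds_table(["ATGGCTGCTGCTGCTTAA"], ["ATATATATATATATATAT"], k=3)
    print(coding_score("GCTGCTGCT", table, k=3))
    print(coding_score("ATATATATA", table, k=3))
```

A real modeler is trained on many known genes and models codon position as well, but the idea of scoring a candidate ORF against coding and noncoding statistics is the same.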

Now what? BLAST the genes to assign functions based on similarity to known genes.

BLAST: Basic Local Alignment Search Tool
BLAST finds regions of local similarity between sequences.
>my favorite gene
Atgtcgctagctagctsctagctag
Database of many gene sequences (GenBank is one example)
Answers the questions: Is there a match? And how good is it?
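
"Local similarity" can be made concrete with the Smith-Waterman recurrence, the exact local alignment method that BLAST approximates with faster heuristics. The sketch below is not BLAST itself, and the scoring values (match, mismatch, gap) are arbitrary assumptions chosen for the example.

```python
# Illustrative sketch: Smith-Waterman local alignment score (not BLAST).
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between a and b.

    Uses a linear gap penalty and keeps only two rows of the scoring
    matrix, so it reports the score but not the alignment itself.
    """
    a, b = a.upper(), b.upper()
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0: that is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("CCCACGTCCC", "GGGACGTGGG"))
```

The score answers "is there a match?"; BLAST additionally reports statistics such as the E-value to answer "how good is it?".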

What are the genes doing?
Function is assigned based on the degree of similarity to an already characterized gene in the database.
Two potential problems with this approach.
Transitive catastrophe:
Gene A: function assigned based on mutant phenotype or biochemical characterization of the protein product
Gene B: from genome sequence, 70% identity to gene A
Gene C: from genome sequence, 60% identity to gene B
Gene D: from genome sequence, 70% identity to gene C
But gene D has only 20% identity to gene A!
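
One way to guard against the transitive catastrophe is to record where each annotation came from and to check the gene directly against the experimentally characterized one. The sketch below uses the A to D chain from the slide; the `Annotation` record, the function names, and the 40% cutoff are hypothetical choices for illustration, not an established rule. It also assumes the chain has no cycles.

```python
# Illustrative sketch: tracing an annotation back to its experimental root.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    gene: str
    source: Optional[str] = None  # gene the function was copied from; None if experimental
    identity_to_source: Optional[float] = None  # percent identity to that source


def annotation_chain(gene, annotations):
    """Follow 'copied from' links back to the experimentally characterized gene."""
    chain = [gene]
    while annotations[gene].source is not None:
        gene = annotations[gene].source
        chain.append(gene)
    return chain


def is_supported(gene, annotations, identity_to_root, min_identity=40.0):
    """Accept an inherited function only if the gene itself resembles the root.

    min_identity is a hypothetical cutoff chosen for this example.
    """
    root = annotation_chain(gene, annotations)[-1]
    return gene == root or identity_to_root[(gene, root)] >= min_identity


if __name__ == "__main__":
    annotations = {
        "A": Annotation("A"),
        "B": Annotation("B", "A", 70.0),
        "C": Annotation("C", "B", 60.0),
        "D": Annotation("D", "C", 70.0),
    }
    # Direct identities to gene A given on the slide (C to A is not given).
    identity_to_root = {("B", "A"): 70.0, ("D", "A"): 20.0}
    print(annotation_chain("D", annotations))
    print(is_supported("D", annotations, identity_to_root))
```

Each link in the chain looks convincing on its own, but only the direct comparison shows that gene D has drifted far from the gene whose function was actually measured.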

We would like to propagate function only to orthologous genes.
Homologs: genes sharing a common origin. Note: two genes are either homologs or they are not; there is no such thing as "% homology" or "more homologous".
Two main kinds of homologs:
Orthologs: genes originating from a single ancestral gene in the last common ancestor of the compared genomes
Paralogs: genes related via duplication

X, Y, and Z are genes in the same family; A, B, and C are three species.

Two more complicated cases:
Xenologs: genes originating from horizontal gene transfer (HGT) of an ortholog in a distant lineage
Pseudoparalogs: homologous genes that appear to be paralogs in a single-genome analysis but have arisen through a combination of vertical and lateral descent

How to identify orthologs
One way: reciprocal BLAST analysis
>Genome A gene 1
AGTGCATGTCCC
>Genome A gene 2
TGTGCGTAGTCCAAA
Database: Genome B
AND
>Genome B gene 1
GGTTTTTACA
>Genome B gene 2
AAACCTCTCTGA
Database: Genome A
ASK: are the two genes each other's best BLAST hit?
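
The "each other's best hit" test is easy to express once both searches have been run. The sketch below assumes the BLAST results have already been parsed into `(query, subject, bit_score)` tuples; that format, the function names, and the scores in the example are hypothetical.

```python
# Illustrative sketch: reciprocal best hits from two sets of parsed BLAST results.
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bit_score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    a_best = best_hits(a_vs_b)
    b_best = best_hits(b_vs_a)
    return sorted((a, b) for a, b in a_best.items() if b_best.get(b) == a)


if __name__ == "__main__":
    # Invented scores for genome A genes searched against genome B, and vice versa.
    a_vs_b = [("A_gene1", "B_gene2", 180.0), ("A_gene1", "B_gene1", 95.0),
              ("A_gene2", "B_gene1", 150.0)]
    b_vs_a = [("B_gene1", "A_gene2", 150.0), ("B_gene1", "A_gene1", 95.0),
              ("B_gene2", "A_gene1", 180.0)]
    print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Pairs that pass are candidate orthologs; a one-directional best hit is not enough.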

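Reciprocal best hits can be confounded by lineage-specific gene loss: if each lineage has lost a different copy of a duplicated gene, the surviving paralogs can be each other's best hits.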

What if there is nothing at all similar in the database? Call it a "hypothetical" gene.
What if it has a match, but only to another hypothetical gene? Call it a "conserved hypothetical" gene.
[Pie chart: share of genes in each functional category: DNA Replication & Repair, Energy Metabolism, Nucleotide Metabolism, Lipid Metabolism, Transcription, Amino Acid Metabolism, Translation, Carbohydrate Metabolism, Transport, Cofactor Metabolism, Conserved Hypothetical, Hypothetical, Unassigned]
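
The naming rule for genes without an informative match can be written as a small decision function. This is a sketch under simple assumptions: hits are given as description strings sorted best-first, and any description containing the word "hypothetical" counts as uninformative. The function name and the example descriptions are made up.

```python
# Illustrative sketch: labeling a gene from the descriptions of its database hits.
def classify_gene(hit_descriptions):
    """Return a label for a predicted gene.

    hit_descriptions: descriptions of significant hits, best first
    (an empty list means nothing similar was found in the database).
    """
    if not hit_descriptions:
        return "hypothetical"
    informative = [d for d in hit_descriptions if "hypothetical" not in d.lower()]
    if not informative:
        return "conserved hypothetical"
    return informative[0]  # assumption: the best informative hit names the function


if __name__ == "__main__":
    print(classify_gene([]))
    print(classify_gene(["hypothetical protein"]))
    print(classify_gene(["hypothetical protein", "DNA polymerase III subunit alpha"]))
```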