Orthology & Paralogy, Alignment & Assembly
Alastair Kerr, Ph.D., WTCCB Bioinformatics Core
[many slides borrowed from various sources]

Overview
- Orthology & Paralogy
  - Definitions and examples
  - Ways to determine an ortholog
  - Pre-calculations: resources
- Alignment & Assembly
  - Differences
  - Key programs for each
  - Jalview example

Homologs
- Have common origins but may or may not have common activity.
- Homologous or not? Often decided by an arbitrary threshold on the level of similarity determined by alignment.
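As a rough illustration of "similarity determined by alignment", the sketch below computes percent identity for two sequences that are already aligned and compares it with a cut-off. The 30% default threshold and the choice to skip gapped columns are assumptions for illustration only; identity is defined in several ways in practice (e.g. dividing by alignment length or by the shorter sequence), and any fixed threshold is as arbitrary as the slide says.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gapped columns are not counted in this definition
        compared += 1
        identical += x == y
    return 100.0 * identical / compared if compared else 0.0


def call_homologous(aligned_a: str, aligned_b: str, threshold: float = 30.0) -> bool:
    """Apply a hypothetical, arbitrary identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, `percent_identity("MATWL", "MATPL")` gives 80.0 (4 of 5 ungapped columns identical).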

Homologs have common ancestry, but the way they are related can vary (i.e. the reasons they have diverged into different sequences can vary):
- Orthologs: homologs produced by speciation. They tend to have similar function.
- Paralogs: homologs produced by gene duplication. They tend to have differing functions.

Orthologous or paralogous homologs
[Figure: an early globin gene undergoes gene duplication into an α-chain gene and a β-chain gene; speciation then gives mouse, human and cattle copies of each. The mouse, human and cattle α genes are orthologs (α); the mouse, human and cattle β genes are orthologs (β); cattle α and cattle β are paralogs (cattle); all are homologs.]
- Orthologs: diverged after speciation; tend to have similar function
- Paralogs: diverged after gene duplication; some functional divergence occurs
- Therefore, for linking similar genes between species, or performing "annotation transfer", identify orthologs
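The globin example can be stated as a rule: two genes are orthologs if the node where their lineages last meet in the gene tree (their last common ancestor) is a speciation event, and paralogs if it is a duplication event. The sketch below hard-codes a simplified version of the globin tree with every internal node already labelled; the `Node` class, the species order ((human, mouse), cattle) and the leaf names are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    event: str = ""  # "speciation" or "duplication" for internal nodes; "" for leaves
    children: list = field(default_factory=list)


def path_to_leaf(node, leaf_name):
    """Return the list of nodes from `node` down to the leaf called `leaf_name`, or None."""
    if not node.children:
        return [node] if node.name == leaf_name else None
    for child in node.children:
        sub_path = path_to_leaf(child, leaf_name)
        if sub_path is not None:
            return [node] + sub_path
    return None


def relationship(root, gene_a, gene_b):
    """Classify two leaves by the event at their last common ancestor."""
    path_a, path_b = path_to_leaf(root, gene_a), path_to_leaf(root, gene_b)
    if path_a is None or path_b is None:
        raise KeyError("gene not found in tree")
    lca = root
    for x, y in zip(path_a, path_b):
        if x is not y:
            break
        lca = x  # deepest node shared by both root-to-leaf paths
    return "orthologs" if lca.event == "speciation" else "paralogs"


def speciate(chain):
    """Simplified species tree ((human, mouse), cattle) for one globin chain (assumption)."""
    return Node(chain, "speciation", [
        Node(chain, "speciation", [Node(f"human {chain}"), Node(f"mouse {chain}")]),
        Node(f"cattle {chain}"),
    ])


# The early globin gene duplicated into alpha and beta chains before these species split.
globin_tree = Node("early globin", "duplication", [speciate("alpha"), speciate("beta")])
```

With this tree, "human alpha" and "cattle alpha" come out as orthologs, while "cattle alpha" and "cattle beta" come out as paralogs.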

True or False?
- A1x is the ortholog, in species x, of A1y?
- A1x is a paralog of A2x?
- A1x is a paralog of A2y?

Identifying Gene/Protein Relationships from Phylogenies
- Orthologs: homologs produced by speciation
  - Gene phylogeny matches organismal phylogeny
- Paralogs: homologs produced by gene duplication
  - Multiple copies of homologs in a given species, or phylogenetic evidence that a gene duplication was involved
  - Lack of match to organismal phylogeny
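When the events in a gene tree are not known in advance, one simple way to flag duplications is the species-overlap rule: an internal node is labelled a duplication if the species found under its child clades overlap (two copies in the same species), otherwise a speciation. This is a swapped-in heuristic rather than the method from these slides; it assumes a correctly rooted tree and can mislabel nodes when genes are missing or the tree is wrong. Leaves are written as hypothetical `gene@species` strings, reusing the A1/A2, x/y naming from the quiz under the assumption that the duplication preceded speciation.

```python
from itertools import combinations


def annotate_events(tree, events):
    """Label internal nodes of a nested-tuple gene tree with the species-overlap rule.

    Leaves are 'gene@species' strings; internal nodes are tuples of subtrees.
    Appends (node, event) pairs to `events` in post-order and returns the
    set of species below `tree`.
    """
    if isinstance(tree, str):
        return {tree.split("@", 1)[1]}
    child_species = [annotate_events(child, events) for child in tree]
    # Two child clades sharing a species imply two gene copies in that species.
    overlap = any(a & b for a, b in combinations(child_species, 2))
    events.append((tree, "duplication" if overlap else "speciation"))
    return set().union(*child_species)
```

For the tree `(("A1@x", "A1@y"), ("A2@x", "A2@y"))` the root is labelled a duplication and both inner nodes speciations.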

Gene Orthology: How to detect?
- Most methods: identify reciprocal best BLAST hits (EGO, COGs,…)
- Example problem:
  - If making comparisons between human and bovine, for example, the bovine gene dataset is still quite incomplete
  - Therefore, the current best hit may be a paralog, with the true ortholog not yet sequenced
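The reciprocal best hit idea fits in a few lines. The sketch assumes BLAST tabular output (`-outfmt 6`), where the query and subject IDs are the first two columns and the bit score is the twelfth; ties are broken by keeping the first hit seen. It can only pair genes present in both datasets, which is exactly the caveat on the slide: with an incomplete genome, the partner it reports may be a paralog.

```python
def read_blast_tab(lines):
    """Yield (query, subject, bitscore) from BLAST tabular (-outfmt 6) lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        yield cols[0], cols[1], float(cols[11])


def best_hits(hits):
    """Map each query to its highest-scoring subject (first hit wins ties)."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)
```

With hypothetical scores where human has paralogs H1 and H2 but only cattle C2 has been sequenced, the pair (H2, C2) is returned and H1 is left without a partner.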

2 Forms in 1 Species [slides from Jonathan Eisen]

2 Forms in 1 Species: Gene Loss
- Gene duplicated in common ancestor
- Loss of one copy in a descendant lineage

Unusual Distribution Pattern

Unusual Distribution: Gene Loss
- Gene present in ancestor
- Gene lost in a descendant lineage

Unusual Distribution: Evolutionary Rate Variation
- Gene present but too diverged to be found
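Gene loss can be quantified under a simple parsimony assumption: if the gene arose once (no horizontal transfer), it was present in the last common ancestor of all species that have it, and each maximal clade below that ancestor that lacks it needs at least one loss. The sketch counts these losses on a species tree written as nested tuples; the tree and species names are hypothetical. It cannot tell true loss apart from a gene that is present but too diverged to be detected.

```python
def leaves(tree):
    """Set of species names in a nested-tuple species tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def origin_clade(tree, present):
    """Smallest clade containing every species that has the gene."""
    if isinstance(tree, str):
        return tree
    for child in tree:
        if present <= leaves(child):
            return origin_clade(child, present)
    return tree


def count_losses(tree, present):
    """Minimum number of losses, assuming a single origin and loss-only changes."""
    present = set(present)
    if not present:
        raise ValueError("gene must be present in at least one species")

    def losses(node):
        if not (leaves(node) & present):
            return 1  # whole clade lacks the gene: one loss on the branch above it
        if isinstance(node, str):
            return 0
        return sum(losses(child) for child in node)

    return losses(origin_clade(tree, present))
```

For the tree `((("A", "B"), "C"), ("D", "E"))` with the gene found only in A, C and E, the minimum is two losses (in B and in D).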

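Ortholog guess via synteny
[Figure: gene order A C B in one genome and A C ? in another; the unidentified gene in the conserved position is the candidate ortholog of B]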
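The synteny argument can be sketched as follows: if a gene's neighbours have known orthologs that sit side by side in the other genome, the gene occupying the matching slot there is a candidate ortholog. The sketch assumes both regions have the same orientation (no inversions) and looks only at immediate neighbours; gene names are hypothetical.

```python
def synteny_candidates(order_1, order_2, orthologs):
    """Suggest orthologs for unassigned genes in order_2 from conserved neighbours.

    `orthologs` maps gene names in order_2 to their known orthologs in order_1.
    """
    pos_1 = {gene: i for i, gene in enumerate(order_1)}
    assigned_1 = set(orthologs.values())
    guesses = {}
    for i, gene in enumerate(order_2):
        if gene in orthologs:
            continue
        expected = set()
        # Left neighbour anchored at j in genome 1: expect this gene at j + 1.
        if i > 0 and order_2[i - 1] in orthologs:
            expected.add(pos_1[orthologs[order_2[i - 1]]] + 1)
        # Right neighbour anchored at j in genome 1: expect this gene at j - 1.
        if i + 1 < len(order_2) and order_2[i + 1] in orthologs:
            expected.add(pos_1[orthologs[order_2[i + 1]]] - 1)
        candidates = {order_1[j] for j in expected
                      if 0 <= j < len(order_1) and order_1[j] not in assigned_1}
        if len(candidates) == 1:  # only report unambiguous guesses
            guesses[gene] = candidates.pop()
    return guesses
```

With genome 1 ordered A C B, genome 2 ordered a c x and known pairs a-A and c-C, the unknown x is proposed as the ortholog of B.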

Syntenic blocks

Alignments and Assemblies
- Alignment
  - ALL sequences from the SAME region
  - Therefore can be useless for:
    - non-overlapping contigs
    - PCR probes/oligos
  - Good for:
    - paralogs/orthologs
    - basis for phylogeny
    - more dissimilar sequences
- Assembly
  - Good for near-identical sequences
  - Read length:
    - Short read [Next Gen Sequencing]
    - Long read [Sanger and 3rd Gen sequencing?]
  - Reference?
    - De novo
    - Guided [reference sequence]
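To make the assembly side concrete, the sketch below merges overlapping reads greedily, repeatedly joining the pair with the longest suffix-prefix overlap. It works only because overlapping reads are near-identical, which is why assembly suits such data. It assumes error-free reads from a single strand and a hypothetical minimum overlap of 3; real assemblers must also handle sequencing errors, repeats and reverse complements.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b` (at least min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len=3):
    """Greedily merge the best-overlapping pair until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads wholly contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            return contigs
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
```

The three reads `ATGGCGT`, `GCGTACG` and `TACGGAT` overlap by four bases each and assemble into the single contig `ATGGCGTACGGAT`.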

ensEMBL calculations demo

OMA Browser demo

Alignment
- Implicit statement: each residue in an aligned column is assumed to derive from the same residue in the last common ancestor [LCA]
- Therefore it is OK to look only at conserved regions, or to mask non-conserved regions
  - Especially for phylogeny
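Masking non-conserved regions can be sketched as a column filter. The thresholds below (most common residue in at least half of the sequences, gaps in at most half) are hypothetical defaults chosen for illustration; dedicated alignment-trimming tools use more careful criteria.

```python
from collections import Counter


def mask_columns(alignment, min_identity=0.5, max_gap_fraction=0.5):
    """Return the alignment restricted to 'conserved' columns.

    A column is kept if gaps make up at most `max_gap_fraction` of it and the
    most frequent residue makes up at least `min_identity` of it.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    n = len(alignment)
    kept = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        if column.count("-") / n <= max_gap_fraction and top / n >= min_identity:
            kept.append(column)
    # Transpose the kept columns back into rows.
    return ["".join(row) for row in zip(*kept)] if kept else [""] * n
```

For `["MAT-KWL", "MAS-RWL", "MGTPEWI"]` the mostly-gapped fourth column and the variable fifth column are removed.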

Alignment Tools
- Faster but less accurate (some better with gaps)
  - Muscle
  - ClustalW/X
  - MAFFT
- Slow but more accurate
  - *-Coffee
    - T: original
    - 3D: uses PDB as guide (structural)
    - M: uses multiple methods
  - Probcons
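One classic way to score a candidate multiple alignment is the sum-of-pairs (SP) score: every pair of characters in every column is scored and the totals are added, so two alignments of the same sequences can be compared. The scores used here (match +1, mismatch -1, residue against gap -2, gap against gap 0) are assumptions; this illustrates the objective and is not the scoring scheme of any tool listed above.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score over all columns and all pairs of sequences.

    Gap-gap pairs score 0, a common convention; scores are illustrative.
    """
    score = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue
            if x == "-" or y == "-":
                score += gap
            elif x == y:
                score += match
            else:
                score += mismatch
    return score
```

Of two alignments of ACGT with ACT, `["ACGT", "AC-T"]` scores 1 and `["ACGT", "A-CT"]` scores -1, so the first is preferred under these scores.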

Alignment Edit Tools
- NEVER use a word processor or Excel to edit alignments
- Jalview (Java Alignment Viewer)
  - Good for editing
  - DAS capable

[Figure: overview of Jalview. Inputs: sequences and alignments in 'standard' formats (FASTA, MSF, CLUSTAL, PILEUP, BLC, PFAM); structures (PDB); trees (Newick); annotation features (GFF, Jalview features, Distributed Annotation System); secondary structure prediction. Functions: multiple sequence alignment; analysis (consensus, conservation & clustering); visualization and Jalview annotation; figure generation (clickable HTML images, line art).]

Jalview DAS Client Functionality
- Queries DAS annotation servers; the query matches the ID to an authority
- Maps features to the local reference frame
- Mouse over for feature name, links and scores
- Groups features by source; feature type == colour
- Highlights feature start-end
- Select specific sources from a filtered list
- Add user-defined sources
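"Map to local reference frame" comes down to converting feature coordinates on an ungapped sequence into columns of the gapped alignment. The sketch assumes 1-based inclusive coordinates and treats '-' and '.' as gap characters; the function names are hypothetical, and this illustrates the idea rather than Jalview's own code.

```python
def residue_to_column(aligned_seq, seq_start=1):
    """Map 1-based residue numbers of the ungapped sequence to 1-based alignment columns.

    `seq_start` is the residue number of the first residue shown in `aligned_seq`.
    """
    mapping = {}
    residue = seq_start
    for column, char in enumerate(aligned_seq, start=1):
        if char not in "-.":  # gap characters do not consume a residue
            mapping[residue] = column
            residue += 1
    return mapping


def feature_columns(aligned_seq, feat_start, feat_end, seq_start=1):
    """Translate a feature's start-end (sequence coordinates) into alignment columns."""
    mapping = residue_to_column(aligned_seq, seq_start)
    return mapping[feat_start], mapping[feat_end]
```

A feature covering residues 2-4 of `MA--TWL` spans alignment columns 2-6, because two gap columns sit inside it.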

Assemblers
- Many free options: examples below
- Long reads
  - STADEN: staden.sf.net
- Next Gen Sequencing
  - Guided: Bowtie, Novoalign, MAQ
  - De novo: Velvet
- 3rd Generation Sequencing
  - ????
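Short-read de novo assemblers such as Velvet are built on de Bruijn graphs. The minimal sketch below builds one from reads (nodes are (k-1)-mers, each distinct k-mer an edge) and spells out the non-branching paths as contigs. It ignores sequencing errors, reverse complements, coverage and isolated cycles, and the small k values in the tests are only for toy inputs.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, each distinct k-mer is an edge."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph):
    """Spell out maximal non-branching paths as contig strings."""
    indegree = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            indegree[target] += 1
    nodes = set(graph) | set(indegree)

    def one_in_one_out(node):
        return indegree[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for node in sorted(nodes):
        if one_in_one_out(node):
            continue  # interior node: reached while walking from a path start
        for nxt in sorted(graph.get(node, ())):
            contig = node + nxt[-1]
            while one_in_one_out(nxt):
                nxt = next(iter(graph[nxt]))
                contig += nxt[-1]
            contigs.append(contig)
    return contigs
```

Two overlapping reads, `ACGTTG` and `GTTGCA`, give a single linear path and therefore one contig, `ACGTTGCA`; a branch in the graph splits the output into separate contigs.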

Post Assembly
- Correction for:
  - Reads mapping to multiple places
  - PCR amplification prior to mapping (duplicate reads)
- Tools and workflows available in our Galaxy platform: demo
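Both corrections can be sketched on simplified mapping records of the form (read ID, chromosome, position, strand). Reads reported at more than one location are dropped. Reads that share chromosome, position and strand are treated as likely PCR duplicates, so only the first is kept. The record layout and single-end reads are assumptions; in practice dedicated tools (for example Picard MarkDuplicates or samtools markdup) and mapping-quality filters do this on SAM/BAM files.

```python
from collections import Counter


def filter_alignments(alignments):
    """Return read IDs kept after removing multi-mapped reads and likely PCR duplicates.

    `alignments` is a list of (read_id, chrom, pos, strand) records, one per reported hit.
    """
    hits_per_read = Counter(record[0] for record in alignments)
    seen_positions = set()
    kept = []
    for read_id, chrom, pos, strand in alignments:
        if hits_per_read[read_id] > 1:
            continue  # read maps to multiple places: placement is ambiguous
        key = (chrom, pos, strand)
        if key in seen_positions:
            continue  # same start and strand as an earlier read: likely PCR duplicate
        seen_positions.add(key)
        kept.append(read_id)
    return kept
```

In the test records, r2 is dropped as a duplicate of r1 and r3 is dropped for mapping to two places, leaving r1 and r4.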