Comparative Genomics.

Overview
Orthologues and paralogues
Protein families
Genome-wide DNA alignments
Syntenic blocks

Comparative Genomics
Allows us to achieve a greater understanding of vertebrate evolution
Tells us what is common and what is unique between different species at the genome level
The function of human genes and other regions may be revealed by studying their counterparts in lower organisms
Helps identify both coding and non-coding genes and regulatory elements

Species in Ensembl
[Figure: phylogenetic tree of vertebrate groups drawn against a geological time scale running from the Cambrian to the Tertiary (570 to 65 million years before present). Labelled groups: mammals (placentals, monotremes, marsupials), birds (passerines, paleognaths, other birds), reptiles (crocodiles, turtles, lizards), amphibians, fishes (teleosts, sharks, rays, Latimeria, bichir/Polypterus, lungfishes), agnathans, non-vertebrates.]

Orthologue / Paralogue Prediction Algorithm
(1) Load the longest translation of each gene from all species used in Ensembl.
(2) Run WUBLASTp+SmithWaterman of every gene against every other (both self and non-self species) in a genome-wise manner.
(3) Build a graph of gene relations based on Best Reciprocal Hits (BRH) and Blast Score Ratio (BSR) values.
(4) Extract the connected components (= single linkage clusters), each cluster representing a gene family.
(5) For each cluster, build a multiple alignment based on the protein sequences using MUSCLE.
(6) For each aligned cluster, build a phylogenetic tree using PHYML. An unrooted tree is obtained at this stage.
(7) Reconcile each gene tree with the species tree using RAP, to call duplication events on internal nodes and to root the tree.
(8) From each gene tree, infer pairwise gene relations of the orthology and paralogy types.
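Steps (3) and (4) can be illustrated with a minimal sketch. It is not the Ensembl pipeline code: the gene names, species labels and scores are invented toy data, the helper names are hypothetical, and only the Best Reciprocal Hit criterion is used (the Blast Score Ratio filter is left out for brevity).

```python
from collections import defaultdict

def best_hits(scores, species_of):
    """For each gene, keep its top-scoring hit in every other species."""
    best = {}
    for (query, target), score in scores.items():
        if species_of[query] == species_of[target]:
            continue  # this sketch ignores within-species hits
        key = (query, species_of[target])
        if key not in best or score > best[key][1]:
            best[key] = (target, score)
    return {key: hit for key, (hit, _) in best.items()}

def reciprocal_pairs(scores, species_of):
    """A pair is a BRH when each gene is the other's best hit."""
    best = best_hits(scores, species_of)
    pairs = set()
    for (query, _), target in best.items():
        if best.get((target, species_of[query])) == query:
            pairs.add(frozenset((query, target)))
    return pairs

def connected_components(genes, pairs):
    """Single-linkage clusters: genes joined by any chain of edges."""
    neighbours = defaultdict(set)
    for pair in pairs:
        a, b = tuple(pair)
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, clusters = set(), []
    for gene in genes:
        if gene in seen:
            continue
        stack, cluster = [gene], set()
        while stack:
            node = stack.pop()
            if node in cluster:
                continue
            cluster.add(node)
            stack.extend(neighbours[node] - cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters

# Toy data (assumed): directional alignment scores, query -> target.
species_of = {"hsA": "human", "hsB": "human",
              "mmA": "mouse", "mmB": "mouse", "drC": "zebrafish"}
scores = {("hsA", "mmA"): 200, ("hsA", "mmB"): 50,
          ("mmA", "hsA"): 190, ("mmA", "hsB"): 40,
          ("hsB", "mmB"): 180, ("hsB", "mmA"): 40, ("hsB", "drC"): 35,
          ("mmB", "hsB"): 175, ("mmB", "hsA"): 50,
          ("drC", "hsA"): 30}

pairs = reciprocal_pairs(scores, species_of)
families = connected_components(list(species_of), pairs)
print(sorted(sorted(f) for f in families))
```

In this toy set the zebrafish gene ends up in a cluster of its own, because its best human hit does not point back to it.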

Homologue Relationships
Orthologues: any gene pairwise relation where the ancestor node is a speciation event
Paralogues: any gene pairwise relation where the ancestor node is a duplication event
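The two definitions can be applied mechanically once a gene tree has its internal nodes labelled. The sketch below assumes a hypothetical Node class and an invented four-gene tree; for every pair of genes it looks up the node where their lineages join and reads off the event type.

```python
from itertools import combinations, product

class Node:
    """Internal node of a reconciled gene tree (hypothetical structure)."""
    def __init__(self, event, children):
        self.event = event        # "speciation" or "duplication"
        self.children = children  # Node objects or gene-name strings (leaves)

def leaves(tree):
    """All gene names below a node; a string is itself a leaf."""
    if isinstance(tree, str):
        return [tree]
    return [gene for child in tree.children for gene in leaves(child)]

def classify_pairs(tree, relations=None):
    """Label each gene pair by the event at its last common ancestor node."""
    if relations is None:
        relations = {}
    if isinstance(tree, str):
        return relations
    label = "orthologue" if tree.event == "speciation" else "paralogue"
    # Genes in different child subtrees have this node as their common ancestor.
    for left, right in combinations(tree.children, 2):
        for a, b in product(leaves(left), leaves(right)):
            relations[frozenset((a, b))] = label
    for child in tree.children:
        classify_pairs(child, relations)
    return relations

# Toy tree (assumed): an ancient duplication followed by a human/mouse
# speciation in each of the two copies.
tree = Node("duplication", [
    Node("speciation", ["human_A", "mouse_A"]),
    Node("speciation", ["human_B", "mouse_B"]),
])
relations = classify_pairs(tree)
for pair, label in sorted((sorted(p), l) for p, l in relations.items()):
    print(pair, label)
```

Note that human_A and mouse_B come out as paralogues even though they are in different species, because their common ancestor node is the duplication.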

Orthologue and Paralogue Types

GeneView

GeneTreeView
MUSCLE protein alignment
GeneTree

GeneTreeView
Speciation node (blue)
Duplication node (red)

Protein Dataset
More than 1,500,000 proteins clustered:
All Ensembl protein predictions from all species supported: ~670,000 protein predictions
All metazoan (animal) proteins in UniProt: ~80,000 UniProt/Swiss-Prot and ~830,000 UniProt/TrEMBL

Clustering Strategy
BLASTP all-versus-all comparison
Markov clustering
For each cluster:
  Calculation of multiple sequence alignments with ClustalW
  Assignment of a consensus description
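The Markov clustering step can be sketched in a few lines. This is a simplified, pure-Python illustration of the expansion and inflation cycle on a small similarity graph, not the MCL program itself; the protein names, the edge weights and the parameter values (inflation, rounds, the 1e-6 cut-off) are assumptions chosen for the example.

```python
def normalise_columns(matrix):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    for j in range(n):
        total = sum(matrix[i][j] for i in range(n))
        for i in range(n):
            matrix[i][j] /= total
    return matrix

def expand(matrix):
    """Expansion: square the matrix, i.e. take two random-walk steps."""
    n = len(matrix)
    return [[sum(matrix[i][k] * matrix[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

def inflate(matrix, power):
    """Inflation: raise entries to a power and renormalise the columns."""
    return normalise_columns([[value ** power for value in row] for row in matrix])

def markov_cluster(names, edges, inflation=2.0, rounds=30):
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]
    for a, b, weight in edges:
        matrix[index[a]][index[b]] = matrix[index[b]][index[a]] = weight
    for i in range(n):
        matrix[i][i] = 1.0  # self-loops, so no column is empty
    matrix = normalise_columns(matrix)
    for _ in range(rounds):
        matrix = inflate(expand(matrix), inflation)
    # Every row that still carries flow lists the members of one cluster.
    clusters = set()
    for row in matrix:
        members = frozenset(names[j] for j, value in enumerate(row) if value > 1e-6)
        if members:
            clusters.add(members)
    return clusters

# Toy similarity graph (assumed): in practice the weights would come from
# the BLASTP all-versus-all scores.
names = ["p1", "p2", "p3", "p4", "p5"]
edges = [("p1", "p2", 1.0), ("p1", "p3", 1.0), ("p2", "p3", 1.0),
         ("p4", "p5", 1.0)]
clusters = markov_cluster(names, edges)
print(sorted(sorted(c) for c in clusters))
```

A higher inflation value makes the procedure cut weak links more aggressively, which generally yields more and smaller families.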

GeneView / TransView / ProtView
Link to FamilyView

FamilyView
Consensus annotation
JalView multiple alignments
Ensembl family members within human
UniProt family members
Ensembl family members in other species

JalView

Whole Genome Alignments
Functional sequences evolve more slowly than non-functional sequences, so sequences that remain conserved may perform a biological function.
Comparing genomic sequences from species at different evolutionary distances allows us to identify:
  Coding genes
  Non-coding genes
  Non-coding regulatory sequences
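As a rough illustration of how conserved stretches can be picked out of an alignment, the sketch below slides a window along two aligned sequences and reports the windows whose percent identity reaches a threshold. Percent identity is used here as a simple stand-in; it is not the conservation score shown in Ensembl, and the sequences, window size and threshold are invented for the example.

```python
def window_identity(seq_a, seq_b, window):
    """Fraction of identical, non-gap columns in each sliding window."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    scores = []
    for start in range(len(seq_a) - window + 1):
        columns = zip(seq_a[start:start + window], seq_b[start:start + window])
        matches = sum(1 for x, y in columns if x == y and x != "-")
        scores.append(matches / window)
    return scores

def conserved_windows(seq_a, seq_b, window=5, threshold=0.8):
    """Start positions of windows whose identity is at or above the threshold."""
    scores = window_identity(seq_a, seq_b, window)
    return [start for start, score in enumerate(scores) if score >= threshold]

# Toy pairwise alignment (assumed): the two rows differ at columns 5 and 6.
human = "ACGTACGTAC"
mouse = "ACGTATTTAC"
scores = window_identity(human, mouse, 5)
hits = conserved_windows(human, mouse, window=5, threshold=0.8)
print(scores)
print(hits)
```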

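Selection of Species for DNA Comparisons
Human vs. chimpanzee: size 3.0 Gbp; time since divergence ~5 MYA; sequence conservation (in coding regions) >99%; aids identification of recently changed sequences and genomic rearrangements
Human vs. mouse: size 2.5 Gbp; time since divergence ~65 MYA; sequence conservation (in coding regions) ~80%; aids identification of both coding and non-coding sequences
Human vs. opossum: size 4.2 Gbp; time since divergence ~150 MYA; sequence conservation (in coding regions) ~70-75%; aids identification of both coding and non-coding sequences
Human vs. pufferfish: size 0.4 Gbp; time since divergence ~450 MYA; sequence conservation (in coding regions) ~65%; aids identification of primarily coding sequences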

Alignment Algorithm
Should find all highly similar regions between two sequences
Should allow for segments without similarity, rearrangements etc.
Issues:
  Heavy process
  Scalability, as more and more genomes are sequenced
  Time constraint

BLASTZ-net, tBLAT and PECAN
BLASTZ-net (comparison at the nucleotide level) is used for species that are evolutionarily close, e.g. human - mouse
Translated BLAT (comparison at the amino acid level) is used for evolutionarily more distant species, e.g. human - zebrafish
PECAN is used for multispecies alignments:
  7 eutherian mammals
  10 amniota vertebrates

BLASTZ-net, tBLAT and PECAN
The Comparative Genomics page (Help & Documentation > Genomic Data > Comparative Genomics) shows the combinations of species for which whole genome alignments have been done.

ContigView
Constrained elements
Conservation score
PECAN alignments
Blastz mouse
tBLAT zebrafish

MultiContigView
Conserved sequences human
Conserved sequences dog

AlignSliceView
Human
Mouse
Dog
Rat

MultiContigView vs. AlignSliceView

AlignView

GeneSeqalignView

Syntenic Blocks
Genome alignments are refined into larger syntenic regions
Alignments are clustered together when the relative distance between them is less than 100 kb and order and orientation are consistent
Any clusters less than 100 kb are discarded
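The clustering rule can be sketched as a greedy pass over the alignments sorted by position. This is an illustration only, not the Ensembl implementation: the Alignment record and the coordinates are invented, overlapping alignments are not handled, only neighbouring alignments are compared, and block length is measured on the first genome, all of which are assumptions.

```python
from typing import NamedTuple

class Alignment(NamedTuple):
    """One pairwise alignment between genome A and genome B (hypothetical record)."""
    chrom_a: str
    start_a: int
    end_a: int
    chrom_b: str
    start_b: int
    end_b: int
    strand: int  # +1 same orientation, -1 inverted

MAX_GAP = 100_000    # alignments closer than this are clustered together
MIN_BLOCK = 100_000  # clusters shorter than this are discarded

def can_extend(prev, nxt):
    """True when nxt continues prev with consistent order and orientation."""
    same_context = (prev.chrom_a, prev.chrom_b, prev.strand) == \
                   (nxt.chrom_a, nxt.chrom_b, nxt.strand)
    if not same_context:
        return False
    gap_a = nxt.start_a - prev.end_a
    # On the reverse strand, genome B coordinates run in the opposite direction.
    gap_b = nxt.start_b - prev.end_b if prev.strand == 1 else prev.start_b - nxt.end_b
    return 0 <= gap_a < MAX_GAP and 0 <= gap_b < MAX_GAP

def syntenic_blocks(alignments):
    blocks, current = [], []
    for aln in sorted(alignments, key=lambda a: (a.chrom_a, a.start_a)):
        if current and not can_extend(current[-1], aln):
            blocks.append(current)
            current = []
        current.append(aln)
    if current:
        blocks.append(current)
    # Keep only blocks spanning at least MIN_BLOCK on genome A.
    return [b for b in blocks if b[-1].end_a - b[0].start_a >= MIN_BLOCK]

# Toy alignments (assumed coordinates).
a1 = Alignment("1", 0, 40_000, "4", 1_000_000, 1_040_000, 1)
a2 = Alignment("1", 60_000, 120_000, "4", 1_050_000, 1_110_000, 1)
a3 = Alignment("1", 500_000, 520_000, "4", 3_000_000, 3_020_000, 1)
blocks = syntenic_blocks([a3, a1, a2])
print(blocks)
```

Here a1 and a2 merge into one 120 kb block, while a3 lies too far away to join them and is dropped because it is shorter than 100 kb on its own.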

SyntenyView
Human chromosome
Orthologues
Mouse chromosomes

CytoView
Syntenic blocks
Orientation
Chromosome

Q & A
Questions and answers