Bioinformatic Tools for Comparative Genomics of Vectors: Comparative Genomics


Overview
- Comparing Genomes
- Homologies and Families
- Sequence Alignments

Comparative Genomics
- Allows us to achieve a greater understanding of vertebrate evolution
- Tells us what is common and what is unique between different species at the genome level
- The function of human genes and other regions may be revealed by studying their counterparts in lower organisms
- Helps identify both coding and non-coding genes and regulatory elements

Sequence Conservation Over Time

Non-Coding Regions
- Large stretches of non-coding regions in vertebrates
- Regulatory regions of:
  - Developmental genes
  - Transcription factors
  - miRNA
(Kikuta et al., Genome Research, May 2007)

Methods of Alignment - Ensembl
- BLASTZ-net (comparison at the nucleotide level) is used for species that are evolutionarily close, e.g. human – mouse
- Translated BLAT (comparison at the amino acid level) is used for evolutionarily more distant species, e.g. human – zebrafish
- PECAN global alignment is used for multispecies alignments

Why Compare Genomes?
- We can better understand evolution / speciation
- We can find important, functional regions of the sequence (codons, promoters, regulatory regions)
- It can help us locate genes in other species that are missing or not well-defined (also through comparison and alignments). Quality control!

Evolution at the DNA Level
…ACTGACATGTACCA…
…AC----CATGCACCA…
[Figure labels: Mutation, Sequence edits, Rearrangements, Deletion, Inversion, Translocation, Duplication]
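The alignment on this slide shows two kinds of sequence edit: substitutions and deletions. A minimal sketch of how the columns of a pairwise alignment can be counted by type is given here. The function name and the example rows are illustrative assumptions; the second row is shortened from the slide so that both rows have the same length, which a column-by-column comparison requires.

```python
def classify_columns(row_a: str, row_b: str) -> dict:
    """Count match, mismatch and gap columns in a pairwise alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            counts["gap"] += 1       # insertion or deletion
        elif a == b:
            counts["match"] += 1     # conserved position
        else:
            counts["mismatch"] += 1  # substitution (point mutation)
    return counts


# Hypothetical equal-length rows adapted from the slide.
row_a = "ACTGACATGTACCA"
row_b = "AC----ATGCACCA"
counts = classify_columns(row_a, row_b)

# Identity over the columns that contain no gap.
ungapped = counts["match"] + counts["mismatch"]
identity = counts["match"] / ungapped
```

Rearrangements (inversion, translocation, duplication) cannot be read off a single pair of aligned rows in this way; they show up as changes in the order or orientation of aligned blocks.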

Comparing Genomes
- Mammals have roughly 3 billion base pairs in their genomes
- Over 98% of human genes are shared with primates, with more than % similarity between genes
- Even the fruit fly shares 60% of its genes with humans! (March 2000)
- Compare human & mouse:
  - 40% of the human genome aligns with mouse
  - 24% of the human genome is missing in mouse (there are also mouse-specific sequences)

Improving Gene Quality
- Comparative genomics predicts one long transcript.

Pseudogene recovery
[Figure labels: chr 3, chr X; human, mouse, rat, dog, cow]
- We find 67 confident cases where a human protein is closer to the ancestor than any extant species in the alignment

How Does Ensembl Predict Homology?
- Uses all the species
- Prediction pipeline:
  - Begins with BLAST and sequence clustering
  - Compares gene relationships to species relationships

BSR: Blast Score Ratio. When two proteins P1 and P2 are compared:
BSR = score(P1, P2) / max(self-score(P1), self-score(P2))
The default threshold used in the initial clustering step is 0.33.
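The ratio can be written out as a small function. This is a sketch only: the function names and the example scores are hypothetical, and treating a BSR equal to the threshold as passing is an assumption, since the slide does not say whether the comparison is strict.

```python
BSR_THRESHOLD = 0.33  # default threshold quoted on the slide


def blast_score_ratio(score_p1_p2: float,
                      self_score_p1: float,
                      self_score_p2: float) -> float:
    """BSR = score(P1, P2) / max(self-score(P1), self-score(P2))."""
    denominator = max(self_score_p1, self_score_p2)
    if denominator <= 0:
        raise ValueError("self-scores must be positive")
    return score_p1_p2 / denominator


def passes_threshold(bsr: float, threshold: float = BSR_THRESHOLD) -> bool:
    # Assumption: a pair is kept when its BSR is at least the threshold.
    return bsr >= threshold


# Hypothetical scores: P1 vs P2 = 120, P1 vs itself = 300, P2 vs itself = 400.
weak = blast_score_ratio(120, 300, 400)    # 120 / 400
strong = blast_score_ratio(200, 300, 400)  # 200 / 400
```

Dividing by the larger self-score keeps the ratio between 0 and 1 for ordinary hits, so a short protein matching a fragment of a long one does not look like a full-length match.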

Orthologue / Paralogue Prediction Algorithm
(1) Load the longest translation of each gene from all species used in Ensembl.
(2) Run WUBLASTp+SW of every gene against every other (both self and non-self species) in a genome-wide manner.
(3) Build a graph of gene relations based on Best Reciprocal Hits (BRH) and Blast Score Ratio (BSR) values.
(4) Extract the connected components (= single linkage clusters), each cluster representing a gene family.
(5) For each cluster, build a multiple alignment based on the protein sequences using MUSCLE.
(6) For each aligned cluster, build a phylogenetic tree using PHYML. An unrooted tree is obtained at this stage.
(7) Reconcile each gene tree with the species tree to call duplication events on internal nodes and root the tree (TreeBeST).
(8) From each gene tree, infer pairwise gene relations of the orthology and paralogy types.
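Steps (3) and (4) amount to finding the connected components of a graph whose edges are the gene pairs that passed the BRH / BSR filters. A minimal union-find sketch of that idea follows; the gene identifiers and edge list are invented for illustration, and this is not the Ensembl pipeline code.

```python
def single_linkage_clusters(genes, edges):
    """Group genes into connected components of the gene-relation graph."""
    parent = {gene: gene for gene in genes}

    def find(gene):
        # Follow parent links to the component root, shortening the path.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    clusters = {}
    for gene in genes:
        clusters.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical genes and the edges kept after BRH / BSR filtering.
genes = ["hsa_A", "mmu_A", "rno_A", "hsa_B", "mmu_B", "dme_C"]
edges = [("hsa_A", "mmu_A"), ("mmu_A", "rno_A"), ("hsa_B", "mmu_B")]
families = single_linkage_clusters(genes, edges)
```

Single linkage means one edge is enough to join two groups: hsa_A and rno_A end up in the same family through mmu_A even though they share no direct edge.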

Species Tree

Species and Gene Trees
- Phylogenetic Tree Reconciliation: the Species/Gene Tree Problem (Dufayard et al., ERCIM News No. 43, October 2000)

Genes/Species Tree reconciliation: TreeBeST

Reconciliation
[Figure: species tree (M, R, H) and unrooted gene tree (M, R, H, M’, R’, H’), with duplication nodes, speciation nodes and gene loss marked]

Viewing Trees in Ensembl
- GeneView page
- GeneTreeView

Types of Homologues
- Orthologs: any pairwise gene relation where the ancestor node is a speciation event
- Paralogs: any pairwise gene relation where the ancestor node is a duplication event
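These two definitions can be applied directly to a reconciled gene tree: for every pair of genes, look at the event on their most recent common ancestor node. The sketch below assumes a rooted binary tree written as nested tuples of the form (event, left, right) with gene names as leaves; that representation and the gene names are illustrative assumptions.

```python
from itertools import product


def leaves(node):
    """Return the gene names below a node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def pairwise_relations(node, relations=None):
    """Label each gene pair by the event at its common ancestor node."""
    if relations is None:
        relations = {}
    if isinstance(node, str):
        return relations
    event, left, right = node
    if event not in ("speciation", "duplication"):
        raise ValueError(f"unknown event: {event}")
    label = "ortholog" if event == "speciation" else "paralog"
    # Genes on opposite sides of this node have it as their ancestor node.
    for a, b in product(leaves(left), leaves(right)):
        relations[frozenset((a, b))] = label
    pairwise_relations(left, relations)
    pairwise_relations(right, relations)
    return relations


# Hypothetical tree: a duplication before the human-mouse split.
tree = ("duplication",
        ("speciation", "human_1", "mouse_1"),
        ("speciation", "human_2", "mouse_2"))
relations = pairwise_relations(tree)
```

In this example human_1 and mouse_1 are orthologs, while human_1 and mouse_2 are paralogs even though they come from different species, because their common ancestor node is the duplication.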

Orthologue and Paralogue Types
- ortholog_one2one
- ortholog_one2many
- ortholog_many2many
- apparent_ortholog_one2one
- within_species_paralog
- between_species_paralog

Ortholog and Paralog types

Orthologues on GeneView
- What is ‘1 to 1’?
- What is ‘1 to many’?
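One way to picture the difference between ‘1 to 1’, ‘1 to many’ and ‘many to many’ is to count, for each orthologous pair between two species, how many partners each gene has in the other species. The sketch below does only that. It is a simplification made for illustration: the gene names are invented, and the real Ensembl types are derived from the gene tree rather than from partner counts.

```python
from collections import defaultdict


def classify_ortholog_pairs(pairs):
    """Label (gene_a, gene_b) ortholog pairs by partner counts."""
    partners_in_b = defaultdict(set)  # gene in species A -> orthologs in B
    partners_in_a = defaultdict(set)  # gene in species B -> orthologs in A
    for a, b in pairs:
        partners_in_b[a].add(b)
        partners_in_a[b].add(a)

    labels = {}
    for a, b in pairs:
        count_b = len(partners_in_b[a])
        count_a = len(partners_in_a[b])
        if count_a == 1 and count_b == 1:
            labels[(a, b)] = "ortholog_one2one"
        elif count_a == 1 or count_b == 1:
            labels[(a, b)] = "ortholog_one2many"
        else:
            labels[(a, b)] = "ortholog_many2many"
    return labels


# Hypothetical human (hs) and mouse (mm) ortholog pairs.
pairs = [
    ("hs_G1", "mm_G1"),
    ("hs_G2", "mm_G2a"), ("hs_G2", "mm_G2b"),
    ("hs_G3a", "mm_G3a"), ("hs_G3a", "mm_G3b"),
    ("hs_G3b", "mm_G3a"), ("hs_G3b", "mm_G3b"),
]
labels = classify_ortholog_pairs(pairs)
```

Here hs_G2 has two mouse orthologs, which is the usual picture after a duplication in the mouse lineage following speciation.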

Protein Families
- How: Cluster proteins for every isoform (transcript) in every species.
- Why:
  - Predict a function for ‘novel’ genes/proteins
  - Understand gene relationships

Protein Dataset
More than 1,800,000 proteins clustered:
- All Ensembl protein predictions from all species supported: 895,070 protein predictions
- All metazoan (animal) proteins in UniProt:
  - 96,030 UniProtKB/Swiss-Prot
  - 892,0208 UniProtKB/TrEMBL

Clustering Strategy
- BLASTP all-versus-all comparison
- Markov clustering
- For each cluster:
  - Calculation of multiple sequence alignments with ClustalW
  - Assignment of a consensus description
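Markov clustering alternates two operations on a column-normalised similarity matrix: expansion (squaring the matrix, which spreads flow along paths) and inflation (raising each entry to a power and re-normalising, which strengthens strong links and weakens weak ones). A toy pure-Python sketch of that loop follows. The function names, the inflation value of 2.0 and the small example matrix are assumptions for illustration; a real run would use the MCL program on BLASTP scores.

```python
def normalise_columns(matrix):
    """Scale every column so that it sums to 1."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] for j in range(n)] for i in range(n)]


def markov_cluster(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster nodes of a symmetric, non-negative similarity matrix."""
    n = len(similarity)
    if n == 0:
        return []
    # Add self-loops so that every column has a non-zero sum.
    m = [[float(similarity[i][j]) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    m = normalise_columns(m)
    for _ in range(max_iter):
        # Expansion: matrix product m * m.
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n))
                     for j in range(n)] for i in range(n)]
        # Inflation: element-wise power, then re-normalise the columns.
        inflated = normalise_columns(
            [[value ** inflation for value in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Rows with a non-zero diagonal are attractors; each one defines a cluster.
    clusters = set()
    for i in range(n):
        if m[i][i] > tol:
            clusters.add(frozenset(j for j in range(n) if m[i][j] > tol))
    return sorted(sorted(cluster) for cluster in clusters)


# Hypothetical similarity matrix: proteins 0-2 and 3-5 form two groups.
similarity = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = markov_cluster(similarity)
```

The example is deliberately trivial (two unconnected groups) so that the result is easy to check by hand; the interesting behaviour of the inflation step appears when weak links connect otherwise dense groups.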

Where are Families shown? ProtView
- Link to FamilyView

Where are Families shown? FamilyView
- Ensembl family members within human
- Ensembl family members in other species
- JalView multiple alignments

Overview
- Comparing Genomes
- Homologies and Families
- Sequence Alignments

Aligning Whole Genomes - Why?
- To identify homologous regions
- To spot problematic gene predictions
- Conserved regions could be functional
- To define syntenic regions (long regions of DNA sequence where order and orientation are highly conserved)

Aligning large genomic sequences
- Should find all highly similar regions between two sequences
- Should allow for segments without similarity, rearrangements, etc.
- Issues:
  - Heavy process
  - Scalability, as more and more genomes are sequenced
  - Time constraint

Whole Genome Multiple Alignments
- Enredo
  - Defines orthology map (co-linear regions)
  - Supports segmental duplications
- Pecan
  - Consistency-based multiple aligner
  - Optimized to cope with long DNA sequences
- Ortheus
  - Ancestral sequence reconstructor
  - Infers the history of insertions and deletions

In ContigView...

Multiple Alignments using PECAN
- Currently 2 sets:
  - 10 amniota vertebrates
  - 7 eutherian mammals
- To come… the fish!

Alignment Strategy
- Use all coding exons
- Get sets of best reciprocal hits
- Create orthology maps
- Build multiple global alignments
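A best reciprocal hit is a pair in which each sequence is the other's top-scoring match. A minimal sketch of that selection is shown here; the exon names and scores are made up, and the input is assumed to be a plain dictionary of (query, subject) scores rather than any particular BLAST output format.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject."""
    best = {}
    for (query, subject), score in scores.items():
        # On ties, the hit seen first is kept.
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def best_reciprocal_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical exon-level scores, human (hs) vs mouse (mm) and back.
hs_vs_mm = {
    ("hs_ex1", "mm_ex1"): 95, ("hs_ex1", "mm_ex2"): 40,
    ("hs_ex2", "mm_ex2"): 88,
    ("hs_ex3", "mm_ex2"): 60,
}
mm_vs_hs = {
    ("mm_ex1", "hs_ex1"): 93,
    ("mm_ex2", "hs_ex2"): 90, ("mm_ex2", "hs_ex3"): 55,
}
brh = best_reciprocal_hits(hs_vs_mm, mm_vs_hs)
```

hs_ex3 is dropped because its best mouse hit, mm_ex2, prefers hs_ex2; requiring the match in both directions is what makes these pairs usable as anchors for an orthology map.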

View Alignments: ContigView
- In the Detailed View Panel:

View Conservation: ContigView
- Click on a Pink Bar for AlignSliceView… export alignments

AlignSliceView

GeneSeqalignView

MultiContigView Comparison of chromosomes in multiple species. (Links from SyntenyView, ContigView, CytoView)

Export Alignments in BioMart
- Choose ‘Compara pairwise alignments’

Syntenic Regions
- Genome alignments are compiled into larger syntenic regions
- Alignments are clustered together when the relative distance between them is less than 100 kb and order and orientation are consistent
- Any clusters less than 100 kb are discarded
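The two 100 kb rules can be sketched as a single pass over alignment blocks sorted along the reference genome. Everything below is an illustrative assumption: the Block fields, the requirement that blocks do not overlap, the restriction to one reference chromosome, and measuring cluster size on the reference only. It is not the Ensembl implementation.

```python
from dataclasses import dataclass

MAX_GAP = 100_000   # blocks closer than this may be joined
MIN_SPAN = 100_000  # clusters shorter than this are discarded


@dataclass
class Block:
    ref_start: int
    ref_end: int
    tgt_chrom: str
    tgt_start: int
    tgt_end: int
    strand: int  # +1 same orientation, -1 inverted


def compatible(prev: Block, cur: Block) -> bool:
    """True if cur can extend the cluster that currently ends with prev."""
    if prev.tgt_chrom != cur.tgt_chrom or prev.strand != cur.strand:
        return False
    ref_gap = cur.ref_start - prev.ref_end
    if prev.strand == 1:
        tgt_gap = cur.tgt_start - prev.tgt_end   # target order ascending
    else:
        tgt_gap = prev.tgt_start - cur.tgt_end   # target order descending
    return 0 <= ref_gap < MAX_GAP and 0 <= tgt_gap < MAX_GAP


def syntenic_regions(blocks):
    """Chain compatible neighbouring blocks, then drop short clusters."""
    regions, current = [], []
    for block in sorted(blocks, key=lambda b: b.ref_start):
        if current and not compatible(current[-1], block):
            regions.append(current)
            current = []
        current.append(block)
    if current:
        regions.append(current)
    return [r for r in regions if r[-1].ref_end - r[0].ref_start >= MIN_SPAN]


# Hypothetical alignment blocks on one reference chromosome.
blocks = [
    Block(0, 40_000, "chr4", 1_000_000, 1_040_000, 1),
    Block(60_000, 130_000, "chr4", 1_055_000, 1_125_000, 1),
    Block(500_000, 520_000, "chr4", 1_200_000, 1_220_000, 1),
    Block(530_000, 560_000, "chr9", 10, 30_010, 1),
]
regions = syntenic_regions(blocks)
```

The first two blocks are joined into one 130 kb region; the third is too far away to join them and too short to survive on its own, and the fourth lies on a different target chromosome.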

Enredo Anchors
- Anchors for mammals: more than 1 anchor per 10 kb
- Supports segmental duplications
- Covers 90% of the human protein coding genes (Hsap-Mmus-Rnor-Cfam-Btau)

SyntenyView
[Figure labels: Human chromosome, Mouse chromosomes, Orthologues]

CytoView
- Syntenic blocks

Summary
- View Homology in pages such as GeneView, ProtView, SyntenyView, GeneTreeView, or BioMart
- View Protein Family information in FamilyView
- View Alignments in ContigView, GeneSeqalignView, or through BioMart