1
Comparative Genomics
2
Overview
Orthologues and paralogues
Protein families
Genome-wide DNA alignments
Syntenic blocks
3
Comparative Genomics
Allows us to achieve a greater understanding of vertebrate evolution
Tells us what is common and what is unique between different species at the genome level
The function of human genes and other regions may be revealed by studying their counterparts in lower organisms
Helps identify both coding and non-coding genes and regulatory elements
4
Species in Ensembl
[Figure: vertebrate phylogeny plotted against geological periods from the Cambrian to the Tertiary (570 to 65 MYBP), with branches for mammals (placentals, monotremes, marsupials), birds (passerines, paleognaths, other birds), reptiles (crocodiles, turtles, lizards), amphibians, fishes (teleosts, sharks, rays, Latimeria, bichir/Polypterus, lungfishes), agnathans and non-vertebrates]
5
Orthologue / Paralogue Prediction Algorithm
(1) Load the longest translation of each gene from all species used in Ensembl.
(2) Run WUBLASTp+SmithWaterman of every gene against every other (both self and non-self species) in a genome-wide manner.
(3) Build a graph of gene relations based on Best Reciprocal Hits (BRH) and Blast Score Ratio (BSR) values.
(4) Extract the connected components (= single linkage clusters), each cluster representing a gene family.
(5) For each cluster, build a multiple alignment based on the protein sequences using MUSCLE.
(6) For each aligned cluster, build a phylogenetic tree using PHYML. An unrooted tree is obtained at this stage.
(7) Reconcile each gene tree with the species tree, using RAP, to call duplication events on internal nodes and root the tree.
(8) From each gene tree, infer pairwise gene relations of orthology and paralogy types.
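Steps (3) and (4) above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Ensembl pipeline: it keeps only best reciprocal hits between different species and ignores the Blast Score Ratio, and the function names, the "species:gene" identifier format and the scores are all made up for the example.

```python
from collections import defaultdict


def best_hits(scores):
    """For each (query gene, hit species), keep the best-scoring hit gene.

    `scores` maps (query_gene, hit_gene) -> alignment score. Gene IDs use a
    hypothetical "species:gene" format so the species can be read off the ID.
    """
    best = {}
    for (query, hit), score in scores.items():
        key = (query, hit.split(":")[0])
        if key not in best or score > best[key][1]:
            best[key] = (hit, score)
    return {key: hit for key, (hit, _score) in best.items()}


def reciprocal_best_hits(scores):
    """Return the set of between-species gene pairs that are each other's best hit."""
    best = best_hits(scores)
    edges = set()
    for (query, hit_species), hit in best.items():
        query_species = query.split(":")[0]
        if query_species == hit_species:
            continue  # within-species hits are ignored in this sketch
        if best.get((hit, query_species)) == query:
            edges.add(frozenset((query, hit)))
    return edges


def connected_components(edges):
    """Single linkage clustering: each connected component is one gene family."""
    graph = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        graph[a].add(b)
        graph[b].add(a)
    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Toy scores (invented): every comparison is listed in both directions.
toy_scores = {
    ("human:A", "mouse:A"): 100, ("mouse:A", "human:A"): 100,
    ("mouse:A", "fish:A"): 80, ("fish:A", "mouse:A"): 80,
    ("human:B", "mouse:B"): 90, ("mouse:B", "human:B"): 90,
    ("human:B", "mouse:A"): 30, ("mouse:A", "human:B"): 30,  # not reciprocal best
}
families = connected_components(reciprocal_best_hits(toy_scores))
print(families)
```

In the toy data human:A and fish:A are never compared directly, yet they end up in the same family because both are linked to mouse:A; that is the single linkage behaviour named in step (4).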
6
Homologue Relationships
Orthologues: any gene pairwise relation where the ancestor node is a speciation event
Paralogues: any gene pairwise relation where the ancestor node is a duplication event
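The two definitions above translate directly into a small tree walk: for any two genes, look at the node where their lineages meet and read off its event type. The sketch below assumes a reconciled, rooted, binary gene tree encoded as nested tuples of the form (event, left, right); that encoding and the gene names are hypothetical and chosen only for illustration.

```python
from itertools import product


def leaves(node):
    """Return the gene names under a node; a leaf is a plain string."""
    if isinstance(node, str):
        return [node]
    _event, left, right = node
    return leaves(left) + leaves(right)


def pairwise_relations(node, relations=None):
    """Label every gene pair by the event at its most recent common ancestor node."""
    if relations is None:
        relations = {}
    if isinstance(node, str):
        return relations
    event, left, right = node
    label = "orthologue" if event == "speciation" else "paralogue"
    # Genes on opposite sides of this node have this node as their ancestor node.
    for a, b in product(leaves(left), leaves(right)):
        relations[frozenset((a, b))] = label
    pairwise_relations(left, relations)
    pairwise_relations(right, relations)
    return relations


# Toy tree (invented): one duplication followed by a speciation in each copy.
toy_tree = (
    "duplication",
    ("speciation", "human:G1", "mouse:G1"),
    ("speciation", "human:G2", "mouse:G2"),
)
relations = pairwise_relations(toy_tree)
for pair, label in sorted((sorted(p), l) for p, l in relations.items()):
    print(pair, label)
```

Note that human:G1 and mouse:G2 come out as paralogues even though they are in different species, because their common ancestor node is the duplication.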
7
Orthologue and Paralogue Types
8
Orthologue and Paralogue types
9
GeneView
10
GeneView
11
GeneTreeView MUSCLE protein alignment GeneTree
12
GeneTreeView Speciation node (blue) Duplication node (red)
13
Protein Dataset
More than 1,500,000 proteins clustered:
All Ensembl protein predictions from all supported species: ~670,000 protein predictions
All metazoan (animal) proteins in UniProt: ~80,000 UniProt/Swiss-Prot and ~830,000 UniProt/TrEMBL
14
Clustering Strategy
BLASTP all-versus-all comparison
Markov clustering
For each cluster:
Calculation of multiple sequence alignments with ClustalW
Assignment of a consensus description
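To give a feel for the Markov clustering step, here is a small pure-Python sketch of the general idea: alternate expansion (matrix squaring) and inflation (element-wise power followed by column normalisation) on a similarity matrix until the flow settles. It is a toy under stated assumptions, not the tool or the settings used for the Ensembl families; the inflation value, iteration count, threshold and the 0/1 similarity matrix are all invented for the example.

```python
def normalize_columns(matrix):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(matrix):
    """Expansion step: square the matrix (flow along two-step paths)."""
    n = len(matrix)
    return [[sum(matrix[i][k] * matrix[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(matrix, power):
    """Inflation step: strengthen strong flows, weaken weak ones, renormalise."""
    return normalize_columns([[value ** power for value in row] for row in matrix])


def markov_cluster(similarity, inflation=2.0, iterations=50):
    """Return clusters as sorted lists of indices into `similarity`."""
    n = len(similarity)
    # Add self loops, as is commonly done before running the iteration.
    matrix = [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
              for i in range(n)]
    matrix = normalize_columns(matrix)
    for _ in range(iterations):
        matrix = inflate(expand(matrix), inflation)
    # Rows that still carry flow are attractors; their non-zero columns form a cluster.
    clusters = set()
    for row in matrix:
        members = frozenset(j for j, value in enumerate(row) if value > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(cluster) for cluster in clusters)


# Toy similarity matrix (invented): two triangles of proteins joined by one hit (2-3).
toy_similarity = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
protein_clusters = markov_cluster(toy_similarity)
print(protein_clusters)
```

Unlike single linkage clustering, which would merge all six proteins through the one bridging hit, this procedure separates the two densely connected groups.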
15
GeneView / TransView / ProtView
Link to FamilyView
16
FamilyView Consensus annotation JalView multiple alignments
Ensembl family members within human UniProt family members Ensembl family members in other species
17
JalView
18
Whole Genome Alignments
Functional sequences evolve more slowly than non-functional sequences. Sequences that remain conserved may therefore perform a biological function.
Comparing genomic sequences from species at different evolutionary distances allows us to identify:
Coding genes
Non-coding genes
Non-coding regulatory sequences
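As a rough illustration of how conservation can point to candidate functional regions, the sketch below scans a pairwise alignment in fixed windows and reports percent identity per window. It is a deliberately simple stand-in and not the conservation scoring used in Ensembl; the function names, window size, threshold and sequences are assumptions invented for the example.

```python
def window_identity(seq_a, seq_b, window=10, step=10):
    """Percent identity per window over two aligned sequences of equal length.

    Gap characters ('-') never count as matches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    results = []
    for start in range(0, len(seq_a) - window + 1, step):
        pairs = zip(seq_a[start:start + window], seq_b[start:start + window])
        matches = sum(1 for a, b in pairs if a == b and a != "-")
        results.append((start, 100.0 * matches / window))
    return results


def conserved_windows(seq_a, seq_b, threshold=80.0, window=10, step=10):
    """Return the start positions of windows at or above the identity threshold."""
    return [start for start, identity in window_identity(seq_a, seq_b, window, step)
            if identity >= threshold]


# Toy alignment (invented): the first 10 columns are identical, the rest differ.
species_1 = "ACGTACGTAC" + "TTTTTTTTTT"
species_2 = "ACGTACGTAC" + "GGAGCAGCAC"
identities = window_identity(species_1, species_2)
print(identities)
print(conserved_windows(species_1, species_2))
```

Only the first window passes the threshold here, so it would be the one flagged for a closer look as a possibly functional region.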
19
Selection of Species for DNA comparisons
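Human vs. Chimpanzee: size 3.0 Gbp; time since divergence ~5 MYA; sequence conservation (in coding regions) >99%; aids identification of recently changed sequences and genomic rearrangements
Human vs. Mouse: size 2.5 Gbp; time since divergence ~65 MYA; sequence conservation (in coding regions) ~80%; aids identification of both coding and non-coding sequences
Human vs. Opossum: size 4.2 Gbp; time since divergence ~150 MYA; sequence conservation (in coding regions) ~70-75%; aids identification of both coding and non-coding sequences
Human vs. Pufferfish: size 0.4 Gbp; time since divergence ~450 MYA; sequence conservation (in coding regions) ~65%; aids identification of primarily coding sequences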
20
Alignment Algorithm
Should find all highly similar regions between two sequences
Should allow for segments without similarity, rearrangements etc.
Issues
Heavy process
Scalability, as more and more genomes are sequenced
Time constraint
21
BLASTZ-net, tBLAT and PECAN
BLASTZ-net (comparison at the nucleotide level) is used for species that are evolutionarily close, e.g. human - mouse
Translated BLAT (comparison at the amino acid level) is used for evolutionarily more distant species, e.g. human - zebrafish
PECAN is used for multispecies alignments:
7 eutherian mammals
10 amniota vertebrates
22
BLASTZ-net, tBLAT and PECAN
The Comparative Genomics page (Help & Documentation > Genomic Data > Comparative Genomics) shows the combinations of species for which whole genome alignments have been done.
23
ContigView Constrained elements Conservation score PECAN alignments
Blastz mouse tBLAT zebrafish
24
MultiContigView Conserved sequences human Conserved sequences dog
25
AlignSliceView Human Mouse Dog Rat
26
MultiContigView vs. AlignSliceView
27
AlignView
28
GeneSeqalignView
29
GeneSeqalignView
30
Syntenic Blocks
Genome alignments are refined into larger syntenic regions
Alignments are clustered together when the relative distance between them is less than 100 kb and order and orientation are consistent
Any clusters less than 100 kb are discarded
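The clustering rule above can be sketched as a single pass over alignments sorted by position. The sketch rests on assumptions the slide does not spell out: "relative distance" is taken as the gap on both genomes, cluster size is measured on the reference genome, and overlapping alignments are not merged. The Alignment class, its field names and the coordinates are hypothetical.

```python
from dataclasses import dataclass

MAX_GAP = 100_000    # alignments closer than this may be clustered
MIN_BLOCK = 100_000  # clusters shorter than this are discarded


@dataclass(frozen=True)
class Alignment:
    ref_start: int
    ref_end: int
    target_chr: str
    target_start: int
    target_end: int
    strand: int  # +1 for same orientation, -1 for inverted


def can_extend(prev, nxt):
    """True if `nxt` continues `prev` with consistent order and orientation."""
    if prev.target_chr != nxt.target_chr or prev.strand != nxt.strand:
        return False
    ref_gap = nxt.ref_start - prev.ref_end
    if prev.strand == 1:
        target_gap = nxt.target_start - prev.target_end
    else:  # on the reverse strand, target coordinates must decrease
        target_gap = prev.target_start - nxt.target_end
    return 0 <= ref_gap < MAX_GAP and 0 <= target_gap < MAX_GAP


def syntenic_blocks(alignments):
    """Group alignments into clusters and drop clusters shorter than MIN_BLOCK."""
    clusters, current = [], []
    for aln in sorted(alignments, key=lambda a: a.ref_start):
        if current and not can_extend(current[-1], aln):
            clusters.append(current)
            current = []
        current.append(aln)
    if current:
        clusters.append(current)
    return [c for c in clusters if c[-1].ref_end - c[0].ref_start >= MIN_BLOCK]


# Toy alignments (invented coordinates).
toy_alignments = [
    Alignment(0, 60_000, "chr5", 1_000_000, 1_060_000, 1),
    Alignment(90_000, 150_000, "chr5", 1_080_000, 1_140_000, 1),
    Alignment(400_000, 430_000, "chr5", 1_500_000, 1_530_000, 1),  # isolated, too short
    Alignment(450_000, 520_000, "chr9", 200_000, 270_000, -1),
    Alignment(540_000, 600_000, "chr9", 120_000, 180_000, -1),
]
blocks = syntenic_blocks(toy_alignments)
for block in blocks:
    print(block[0].target_chr, block[0].ref_start, block[-1].ref_end, len(block))
```

The isolated 30 kb alignment forms its own cluster and is then dropped by the size filter, while the two pairs of neighbouring alignments survive as blocks of 150 kb each.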
31
SyntenyView Human chromosome Orthologues Mouse chromosomes
32
CytoView Syntenic blocks Orientation Chromosome
33
Q & A: Questions and Answers