Comparative Genomics.

1 Comparative Genomics

2 Overview
Orthologues and paralogues
Protein families
Genome-wide DNA alignments
Syntenic blocks

3 Comparative Genomics
Allows us to achieve a greater understanding of vertebrate evolution
Tells us what is common and what is unique between different species at the genome level
The function of human genes and other regions may be revealed by studying their counterparts in lower organisms
Helps identify both coding and non-coding genes and regulatory elements

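4 Species in Ensembl
[Figure: evolutionary tree of the species groups in Ensembl, drawn against a geological time axis in millions of years before present (MYBP): Cambrian (570), Ordovician (505), Silurian (438), Devonian (408), Carboniferous (360), Permian (286), Triassic (245), Jurassic (208), Cretaceous (144), Tertiary (65). Branch labels: placentals, monotremes, marsupials, other birds, paleognaths, passerines, crocodiles, turtles, lizards, amphibians, teleosts, sharks, rays, Latimeria, bichir/Polypterus, lungfishes, agnathans, non-vertebrates; grouped under mammals, birds, reptiles and fishes.]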

5 Orthologue / Paralogue Prediction Algorithm
(1) Load the longest translation of each gene from all species used in Ensembl.
(2) Run WUBLASTp+SmithWaterman of every gene against every other (both self and non-self species) in a genome-wise manner.
(3) Build a graph of gene relations based on Best Reciprocal Hits (BRH) and Blast Score Ratio (BSR) values.
(4) Extract the connected components (= single linkage clusters), each cluster representing a gene family.
(5) For each cluster, build a multiple alignment based on the protein sequences using MUSCLE.
(6) For each aligned cluster, build a phylogenetic tree using PHYML. An unrooted tree is obtained at this stage.
(7) Reconcile each gene tree with the species tree, using RAP, to call duplication events on internal nodes and root the tree.
(8) From each gene tree, infer pairwise gene relations of orthology and paralogy types.
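
Steps (3) and (4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Ensembl pipeline itself: the gene names, the `species|gene` naming convention, the scores and both function names are hypothetical, and the Blast Score Ratio filter is left out so that only Best Reciprocal Hits define the edges.

```python
from collections import defaultdict

def species_of(gene):
    # Assumed naming convention for this sketch: "species|gene_id".
    return gene.split("|")[0]

def best_reciprocal_hits(scores):
    """scores maps (query, target) -> similarity score (higher is better)."""
    best = {}  # (query, target species) -> (score, best target)
    for (query, target), score in scores.items():
        if query == target:
            continue
        key = (query, species_of(target))
        if key not in best or score > best[key][0]:
            best[key] = (score, target)
    edges = set()
    for (query, _), (_, target) in best.items():
        back = best.get((target, species_of(query)))
        if back is not None and back[1] == query:
            edges.add(frozenset((query, target)))  # hit is reciprocal
    return edges

def connected_components(genes, edges):
    """Single linkage clusters: genes joined by any chain of edges."""
    neighbours = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, clusters = set(), []
    for gene in genes:
        if gene in seen:
            continue
        stack, cluster = [gene], set()
        while stack:
            current = stack.pop()
            if current in cluster:
                continue
            cluster.add(current)
            stack.extend(neighbours[current] - cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters

# Made-up scores for three human and two mouse genes.
scores = {
    ("human|A", "mouse|a"): 900, ("mouse|a", "human|A"): 900,
    ("human|A", "mouse|b"): 300, ("mouse|b", "human|A"): 300,
    ("human|B", "mouse|b"): 800, ("mouse|b", "human|B"): 800,
    ("human|C", "mouse|a"): 100, ("mouse|a", "human|C"): 100,
}
genes = sorted({gene for pair in scores for gene in pair})
edges = best_reciprocal_hits(scores)
clusters = connected_components(genes, edges)
```

In this toy input, `human|C` hits `mouse|a`, but the hit is not reciprocal, so `human|C` ends up in a cluster of its own.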

6 Homologue Relationships
Orthologues: any pairwise gene relation where the ancestor node is a speciation event
Paralogues: any pairwise gene relation where the ancestor node is a duplication event
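
The two definitions translate directly into a walk over a reconciled gene tree. The sketch below assumes a hypothetical tree encoding (a leaf is a gene name, an internal node is a tuple `(event, left, right)`), and the example tree and gene names are illustrative only. It does not model the finer orthologue and paralogue subtypes.

```python
from itertools import product

def leaves(node):
    """Return all gene names below a node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)

def pairwise_relations(node, relations=None):
    """Label each gene pair by the event at its last common ancestor node."""
    if relations is None:
        relations = {}
    if isinstance(node, str):
        return relations
    event, left, right = node
    label = "orthologue" if event == "speciation" else "paralogue"
    # Genes on opposite sides of this node have it as their common ancestor.
    for a, b in product(leaves(left), leaves(right)):
        relations[frozenset((a, b))] = label
    pairwise_relations(left, relations)
    pairwise_relations(right, relations)
    return relations

# Illustrative tree: one ancient duplication, then a human/mouse
# speciation in each of the two resulting lineages.
tree = (
    "duplication",
    ("speciation", "human_HBA", "mouse_Hba"),
    ("speciation", "human_HBB", "mouse_Hbb"),
)
relations = pairwise_relations(tree)
```

Here `human_HBA` and `mouse_Hba` come out as orthologues, while `human_HBA` and `mouse_Hbb` come out as paralogues, because their common ancestor node is the duplication.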

7 Orthologue and Paralogue Types

8 Orthologue and Paralogue types

9 GeneView

10 GeneView

11 GeneTreeView
MUSCLE protein alignment
GeneTree

12 GeneTreeView
Speciation node (blue)
Duplication node (red)

13 Protein Dataset
More than 1,500,000 proteins clustered:
All Ensembl protein predictions from all species supported: ~670,000 protein predictions
All metazoan (animal) proteins in UniProt:
  ~80,000 UniProt/Swiss-Prot
  ~830,000 UniProt/TrEMBL

14 Clustering Strategy
BLASTP all-versus-all comparison
Markov clustering
For each cluster:
  Calculation of multiple sequence alignments with ClustalW
  Assignment of a consensus description
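
As a rough illustration of the Markov clustering step, the sketch below alternates expansion (matrix squaring) and inflation (element-wise powering followed by column normalisation) on a small similarity matrix. It is a toy, dense, pure-Python version written for this transcript, not the program used for the Ensembl families. The function names, the inflation value of 2.0, the fixed iteration count and the example matrix are all assumptions.

```python
def normalise_columns(matrix):
    """Scale every column so that it sums to 1."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] for j in range(n)] for i in range(n)]

def multiply(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def markov_cluster(similarity, inflation=2.0, iterations=50, eps=1e-6):
    n = len(similarity)
    # Add self loops so that every column has a non-zero sum.
    m = [[float(similarity[i][j]) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    m = normalise_columns(m)
    for _ in range(iterations):
        m = multiply(m, m)  # expansion: flow spreads along longer walks
        m = normalise_columns(  # inflation: strong flow is reinforced
            [[value ** inflation for value in row] for row in m])
    # Each row that still carries flow names one cluster (its non-zero columns).
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > eps)
        if members:
            clusters.add(members)
    return clusters

# Toy all-versus-all result: proteins 0-2 hit each other, as do 3 and 4.
similarity = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
clusters = markov_cluster(similarity)
```

With real BLASTP output the matrix entries would be derived from the hit scores, and a sparse implementation would be needed at the scale quoted on the previous slide.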

15 GeneView / TransView / ProtView
Link to FamilyView

16 FamilyView
Consensus annotation
JalView multiple alignments
Ensembl family members within human
UniProt family members
Ensembl family members in other species

17 JalView

18 Whole Genome Alignments
Functional sequences evolve more slowly than non-functional sequences; therefore, sequences that remain conserved may perform a biological function.
Comparing genomic sequences from species at different evolutionary distances allows us to identify:
  Coding genes
  Non-coding genes
  Non-coding regulatory sequences

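19 Selection of Species for DNA comparisons
Human vs. Chimpanzee
  Size (Gbp): 3.0
  Time since divergence: ~5 MYA
  Sequence conservation (in coding regions): >99%
  Aids identification of: recently changed sequences and genomic rearrangements
Human vs. Mouse
  Size (Gbp): 2.5
  Time since divergence: ~65 MYA
  Sequence conservation (in coding regions): ~80%
  Aids identification of: both coding and non-coding sequences
Human vs. Opossum
  Size (Gbp): 4.2
  Time since divergence: ~150 MYA
  Sequence conservation (in coding regions): ~70-75%
  Aids identification of: both coding and non-coding sequences
Human vs. Pufferfish
  Size (Gbp): 0.4
  Time since divergence: ~450 MYA
  Sequence conservation (in coding regions): ~65%
  Aids identification of: primarily coding sequences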

20 Alignment Algorithm
Should find all highly similar regions between two sequences
Should allow for segments without similarity, rearrangements, etc.
Issues:
  Heavy process
  Scalability, as more and more genomes are sequenced
  Time constraint

21 BLASTZ-net, tBLAT and PECAN
BLASTZ-net (comparison at the nucleotide level) is used for species that are evolutionarily close, e.g. human - mouse
Translated BLAT (comparison at the amino acid level) is used for evolutionarily more distant species, e.g. human - zebrafish
PECAN is used for multispecies alignments:
  7 eutherian mammals
  10 amniota vertebrates

22 BLASTZ-net, tBLAT and PECAN
The species combinations for which whole genome alignments have been done are shown on the Comparative Genomics page (Help & Documentation > Genomic Data > Comparative Genomics):

23 ContigView
Constrained elements
Conservation score
PECAN alignments
Blastz mouse
tBLAT zebrafish

24 MultiContigView
Conserved sequences human
Conserved sequences dog

25 AlignSliceView
Human
Mouse
Dog
Rat

26 MultiContigView vs. AlignSliceView

27 AlignView

28 GeneSeqalignView

29 GeneSeqalignView

30 Syntenic Blocks
Genome alignments are refined into larger syntenic regions
Alignments are clustered together when the relative distance between them is less than 100 kb and order and orientation are consistent
Any clusters shorter than 100 kb are discarded
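
The clustering rule above can be sketched as a single pass over alignments sorted by their position on the reference genome. The sketch rests on assumptions that the slide does not spell out: the `Alignment` record and its field names are hypothetical, the 100 kb distance is checked on both genomes, the block length is measured on the reference genome, and the coordinates are made up.

```python
from dataclasses import dataclass

MAX_GAP = 100_000    # maximum distance between neighbouring alignments
MIN_BLOCK = 100_000  # clusters shorter than this are discarded

@dataclass
class Alignment:
    ref_start: int
    ref_end: int
    tgt_chr: str
    tgt_start: int
    tgt_end: int
    strand: int  # +1 same orientation, -1 inverted

def can_extend(prev, cur):
    """True if cur continues prev with consistent order and orientation."""
    if cur.tgt_chr != prev.tgt_chr or cur.strand != prev.strand:
        return False
    ref_gap = cur.ref_start - prev.ref_end
    if cur.strand == 1:
        tgt_gap = cur.tgt_start - prev.tgt_end
    else:  # on the reverse strand, target positions must decrease
        tgt_gap = prev.tgt_start - cur.tgt_end
    return 0 <= ref_gap < MAX_GAP and 0 <= tgt_gap < MAX_GAP

def syntenic_blocks(alignments):
    blocks, current = [], []
    for aln in sorted(alignments, key=lambda a: a.ref_start):
        if current and not can_extend(current[-1], aln):
            blocks.append(current)
            current = []
        current.append(aln)
    if current:
        blocks.append(current)
    # Keep only clusters spanning at least MIN_BLOCK on the reference.
    return [b for b in blocks if b[-1].ref_end - b[0].ref_start >= MIN_BLOCK]

alignments = [
    Alignment(0, 60_000, "chr5", 1_000_000, 1_060_000, 1),
    Alignment(80_000, 150_000, "chr5", 1_090_000, 1_160_000, 1),
    Alignment(400_000, 420_000, "chr5", 1_500_000, 1_520_000, 1),
    Alignment(900_000, 980_000, "chr11", 5_200_000, 5_280_000, -1),
    Alignment(1_000_000, 1_070_000, "chr11", 5_100_000, 5_170_000, -1),
]
blocks = syntenic_blocks(alignments)
spans = [(b[0].ref_start, b[-1].ref_end) for b in blocks]
```

In this example the first two alignments merge into one block, the isolated 20 kb alignment is discarded, and the two inverted alignments on `chr11` form a second block.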

31 SyntenyView
Human chromosome
Orthologues
Mouse chromosomes

32 CytoView
Syntenic blocks
Orientation
Chromosome

33 Q & A
Questions and Answers

