Molecular Phylogeny. Phylogeny is the inference of evolutionary relationships. Traditionally, phylogeny relied on the comparison of morphological features.


1 Molecular Phylogeny

2 Phylogeny is the inference of evolutionary relationships. Traditionally, phylogeny relied on the comparison of morphological features between organisms. Today, molecular sequence data are mainly used for phylogenetic analyses. One tree of life: A sketch Darwin made soon after returning from his voyage on HMS Beagle (1831–36) showed his thinking about the diversification of species from a single stock (see Figure, overleaf). This branching, extended by the concept of common descent,

3 Haeckel (1879); Pace (2001)

4 Molecular phylogeny uses trees to depict evolutionary relationships among organisms. These trees are based on DNA and protein sequence data. Pre-molecular analysis: the great apes (chimpanzee, gorilla and orangutan) are separate from the human. Molecular analysis: the chimpanzee is more closely related to the human than the gorilla is. Tree labels: Human, Chimpanzee, Gorilla, Orangutan.

5 What can we learn from a phylogenetic tree?

6 1. Determine the closest relatives of an organism in which we are interested. Example: was the extinct quagga more like a zebra or a horse?

7 Which species are closest to human? Tree labels: Human, Chimpanzee, Gorilla, Orangutan.

8 2. Help to find the relationships between species and identify new species. Example: metagenomics, a new field in genomics that aims to study the genomes recovered from environmental samples. It is a powerful tool for accessing the rich biodiversity of native environmental samples.

9 Incredible microbial diversity in a drop of seawater: 10^6 cells/ml seawater, 10^7 virus particles/ml seawater, >99% uncultivated microbes.

10 Metagenomics workflow: community sample, (extraction bias), community DNA, shear, 3–4 kb shotgun library (cloning bias), paired-end sequence (F / R), composite contig assembly. Example overlapping reads: …ACGGCTGCGTTACATCGATCATTTACGA and ACATCGATCATTTACGATACCATTG…

11 From: “The Sorcerer II Global Ocean Sampling Expedition: Metagenomic Characterization of Viruses within Aquatic Microbial Samples”, Williamson et al., PLOS ONE 2008

12 3. Discover the function of an unknown gene or protein. Tree labels: RBP1_HS, RBP2_pig, RBP_RAT, ALP_HS, ALPEC_BV, ALPA1_RAT, ECBLC, Hypothetical protein X.

13 Relationships can be represented by a phylogenetic tree or dendrogram. Tree labels: A, B, C, D, E, F.

14 Phylogenetic Tree Terminology: a graph composed of nodes and branches. Each branch connects two adjacent nodes. Tree labels: A, B, C, D, E, F, R.

15 Phylogenetic Tree Terminology. Rooted tree (based on a priori knowledge): Human, Chimp, Chicken, Gorilla. Unrooted tree: Human, Chimp, Chicken, Gorilla.

16 Rooted vs. unrooted trees [Figure: two trees with leaves 1, 2 and 3]

17 How can we build a tree with molecular data?
- Trees based on DNA sequences (rRNA)
- Trees based on protein sequences

18 Questions: Can DNA and proteins from the same gene produce different trees? Can different genes have different evolutionary histories? Can different regions of the same gene produce different trees?

19 Methods

20 Approach 1 - Distance methods
Two steps:
- Compute the distance between every pair of sequences from the MSA.
- Find the tree that agrees most with the distance table.
Algorithms:
- Neighbor joining
Approach 2 - State methods
Algorithms:
- Maximum parsimony (MP)
- Maximum likelihood (ML)
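
The first step of the distance approach can be sketched as follows. The slides do not say which distance measure is used, so this example assumes the simple p-distance (the fraction of aligned sites that differ, skipping gapped columns). The function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def p_distance(seq1, seq2, gap="-"):
    """Fraction of differing sites; columns with a gap in either sequence are skipped."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for x, y in zip(seq1, seq2):
        if x == gap or y == gap:
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_table(alignment):
    """Map each pair of sequence names to its p-distance."""
    table = {}
    for name1, name2 in combinations(sorted(alignment), 2):
        table[(name1, name2)] = p_distance(alignment[name1], alignment[name2])
    return table


# Hypothetical toy alignment, not taken from the slides.
toy_alignment = {
    "seq1": "ACGTACGTAC",
    "seq2": "ACGTACGTTC",
    "seq3": "ACG-ACGAAC",
}
toy_distances = distance_table(toy_alignment)
for pair, value in toy_distances.items():
    print(pair, round(value, 3))
```

Real analyses usually correct the raw p-distance with a substitution model; that correction is left out here to keep the sketch short.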

21 Neighbor Joining (NJ): Reconstructs an unrooted tree. Calculates branch lengths. Based on pairwise distances. At each stage, the two nearest nodes of the tree are chosen and defined as neighbors in our tree. This is repeated until all of the nodes are paired together.

22 Star structure. Assumption: divergence of sequences is assumed to occur at a constant rate, so the distance to the root is equal for all sequences (a, b, c, d).

23 Basic Algorithm. Initial star diagram with leaves a, b, c, d. Distance matrix:
     a  b  c  d
a    0  8  7  5
b    8  0  3  9
c    7  3  0  8
d    5  9  8  0

24 Selection step: choose the nodes with the shortest distance and fuse them. Here b and c have the shortest distance (3).
     a  b  c  d
a    0  8  7  5
b    8  0  3  9
c    7  3  0  8
d    5  9  8  0

25 Next step: recalculate the distance from each of the remaining sequences (a and d) to the new node (e), which replaces the fused nodes c and b, and remove the fused nodes from the table.
D(EA) = (D(AC) + D(AB) - D(CB))/2 = (7 + 8 - 3)/2 = 6
D(ED) = (D(DC) + D(DB) - D(CB))/2 = (8 + 9 - 3)/2 = 7
New distance matrix:
     a  d  e
a    0  5  6
d    5  0  7
e    6  7  0
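
A minimal sketch of the selection and recalculation steps, using the four-sequence matrix from the slides and the update rule D(EK) = (D(KI) + D(KJ) - D(IJ))/2. The function name `fuse_closest_pair` and the nested-dictionary layout are assumptions made for illustration.

```python
def fuse_closest_pair(dist, new_name):
    """Fuse the two closest nodes and return (fused pair, reduced distance matrix).

    dist is a symmetric dict of dicts that includes the zero diagonal.
    """
    names = sorted(dist)
    # Selection step: the pair with the smallest distance.
    candidates = [(x, y) for k, x in enumerate(names) for y in names[k + 1:]]
    i, j = min(candidates, key=lambda pair: dist[pair[0]][pair[1]])
    rest = [n for n in names if n not in (i, j)]

    # Keep the distances among the untouched nodes.
    reduced = {n: {m: dist[n][m] for m in rest} for n in rest}
    reduced[new_name] = {new_name: 0.0}

    # Recalculation step: distance from every remaining node to the new node.
    for k in rest:
        d_new = (dist[k][i] + dist[k][j] - dist[i][j]) / 2
        reduced[k][new_name] = d_new
        reduced[new_name][k] = d_new
    return (i, j), reduced


slide_matrix = {
    "a": {"a": 0, "b": 8, "c": 7, "d": 5},
    "b": {"a": 8, "b": 0, "c": 3, "d": 9},
    "c": {"a": 7, "b": 3, "c": 0, "d": 8},
    "d": {"a": 5, "b": 9, "c": 8, "d": 0},
}
pair, reduced = fuse_closest_pair(slide_matrix, "e")
print(pair)               # ('b', 'c')
print(reduced["e"]["a"])  # 6.0
print(reduced["e"]["d"])  # 7.0
```

Calling the function again on `reduced` with a new node name performs the next fusion in the same way.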

26 Next step: in order to get a tree, un-fuse c and b by calculating their distances to the new node (e). Branch labels in the figure: D ce, D de.
     a  d  e
a    0  5  6
d    5  0  7
e    6  7  0
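
The slides do not give a formula for the distances from c and b to the new node e. As an assumption, the sketch below uses the usual neighbor-joining branch-length rule, d(i,u) = d(i,j)/2 + (r_i - r_j)/(2(N-2)), where r_i is the sum of distances from i to all other nodes. The function name is hypothetical.

```python
def branch_lengths_to_new_node(dist, i, j):
    """Return (length of i to new node, length of j to new node).

    Assumes the standard neighbor-joining branch-length formula;
    dist is a symmetric dict of dicts that includes the zero diagonal.
    """
    n = len(dist)
    if n < 3:
        raise ValueError("need at least three nodes")
    r_i = sum(dist[i].values())
    r_j = sum(dist[j].values())
    d_i = dist[i][j] / 2 + (r_i - r_j) / (2 * (n - 2))
    d_j = dist[i][j] - d_i
    return d_i, d_j


slide_matrix = {
    "a": {"a": 0, "b": 8, "c": 7, "d": 5},
    "b": {"a": 8, "b": 0, "c": 3, "d": 9},
    "c": {"a": 7, "b": 3, "c": 0, "d": 8},
    "d": {"a": 5, "b": 9, "c": 8, "d": 0},
}
d_ce, d_be = branch_lengths_to_new_node(slide_matrix, "c", "b")
print(d_ce, d_be)  # 1.0 2.0
```

With these values the tree reproduces the original table: for example, a to c is D(EA) + D(ce) = 6 + 1 = 7, and a to b is 6 + 2 = 8.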

27 Next: a and d are fused into a new node (f). Branch labels in the figure: D ce, D de.
     a  d  e
a    0  5  6
d    5  0  7
e    6  7  0

28 Final: D(EF) = (D(EA) + D(ED) - D(AD))/2 = (6 + 7 - 5)/2 = 4. Branch labels in the figure: D af, D de, D ce, D bf.
     f  e
f    0  4
e    4  0

29 [Figure: the three stages side by side: (1) c and b fused into e; (2) a and d fused into f; (3) the final tree with branch labels D af, D de, D ce, D bf]

30 IMPORTANT: Usually we don’t start from a star diagram. To choose which nodes to fuse, we have to calculate the relative distance matrix (Mij), which represents the relative distance of each node to all other nodes.

31 EXAMPLE
Original distance matrix:
     A    B    C    D    E
B    5
C    4    7
D    7   10    7
E    6    9    6    5
F    8   11    8    9    8
Relative distance matrix (Mij):
     A      B      C      D      E
B    -13
C    -11.5  -11.5
D    -10    -10    -10.5
E    -10    -10    -10.5  -13
F    -10.5  -10.5  -11    -11.5  -11.5
The Mij table is used only to choose the closest pairs, not for calculating the distances.
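
The slides do not spell out how Mij is computed. The sketch below assumes the standard neighbor-joining definition, M(i,j) = d(i,j) - (r_i + r_j)/(N - 2), with r_i the sum of distances from node i to all other nodes. Applied to the original distance matrix of the example, it gives -13 for the pair A and B. The function and variable names are hypothetical.

```python
from itertools import combinations


def relative_distance_matrix(names, dist):
    """Return M[{i, j}] = d(i, j) - (r_i + r_j) / (N - 2) for every pair of nodes."""
    n = len(names)
    if n < 3:
        raise ValueError("need at least three nodes")

    def d(x, y):
        return dist[frozenset((x, y))]

    # r[x]: sum of distances from x to every other node.
    r = {x: sum(d(x, y) for y in names if y != x) for x in names}
    return {
        frozenset((x, y)): d(x, y) - (r[x] + r[y]) / (n - 2)
        for x, y in combinations(names, 2)
    }


names = ["A", "B", "C", "D", "E", "F"]
# Lower triangle of the original distance matrix from the example.
lower_triangle = {
    "B": [5],
    "C": [4, 7],
    "D": [7, 10, 7],
    "E": [6, 9, 6, 5],
    "F": [8, 11, 8, 9, 8],
}
dist = {
    frozenset((row, names[k])): value
    for row, values in lower_triangle.items()
    for k, value in enumerate(values)
}

m = relative_distance_matrix(names, dist)
best = min(m.values())
closest_pairs = sorted(tuple(sorted(p)) for p, v in m.items() if v == best)
print(best, closest_pairs)  # -13.0 [('A', 'B'), ('D', 'E')]
```

Two pairs tie for the smallest value here, so either can be fused first; the actual distances to the new node are still computed from the original matrix, not from Mij.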

32 Advantages and disadvantages of the neighbor-joining method
Advantages
- It is fast and thus suited to large datasets
- It permits lineages with largely different branch lengths
Disadvantages
- Sequence information is reduced
- It gives only one possible tree

33 More problems with phylogenetic trees: It is wrong to assume that branch length is proportional to speciation time (molecular clock). It is wrong to produce a tree based on distance values of the whole alignment.

34 Problems with phylogenetic trees

35 Problems with phylogenetic trees. Tree labels: Bacillus, E. coli, Pseudomonas, Salmonella, Aeromonas, Lechevaliera, Burkholderias.

36 It is wrong to assume that branch length is proportional to speciation time (molecular clock). It is wrong to produce a tree based on distance values of the whole alignment: using different regions of the same alignment may produce different trees. What to do? Use bootstrapping.

37 Bootstrapped tree. Bootstrapping is a method for estimating generalization error based on “resampling”. In the context of phylogenetic trees, it consists of randomly selecting different positions from an alignment and constructing a tree based on these positions. As a result, we get the percentage of times a certain node was formed. Figure labels: highly reliable node; less reliable node.
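
The resampling idea can be sketched as follows. To stay short, this example does not build a full tree per replicate. As a stand-in for "a node was formed", it only records which two sequences come out closest in each resampled alignment. The toy alignment, the function names and that simplification are all assumptions, not part of the slides.

```python
import random
from collections import Counter
from itertools import combinations


def resample_columns(alignment, rng):
    """Draw alignment columns at random, with replacement, keeping the same length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def closest_pair(alignment):
    """Pair of sequences with the fewest mismatches (ties go to the first pair in sorted order)."""
    def mismatches(pair):
        return sum(x != y for x, y in zip(alignment[pair[0]], alignment[pair[1]]))
    return min(combinations(sorted(alignment), 2), key=mismatches)


def bootstrap_support(alignment, replicates=100, seed=0):
    """Percentage of replicates in which each pair was grouped as closest."""
    rng = random.Random(seed)
    counts = Counter(
        closest_pair(resample_columns(alignment, rng)) for _ in range(replicates)
    )
    return {pair: 100 * count / replicates for pair, count in counts.items()}


# Hypothetical toy alignment, not taken from the slides.
toy_alignment = {
    "human": "ACGTACGTACGT",
    "chimp": "ACGTACGTACGA",
    "gorilla": "ACGTTCGTACCA",
    "orangutan": "ATGTTCGAACCA",
}
support = bootstrap_support(toy_alignment, replicates=200, seed=1)
for pair, percent in sorted(support.items(), key=lambda item: -item[1]):
    print(pair, percent)
```

In a real analysis each replicate would be passed to a tree-building method such as neighbor joining, and the support of every internal node would be counted across the replicate trees.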

38 Tools for tree reconstruction
- CLUSTALX (NJ method)
- Phylip (PHYLogeny Inference Package): includes parsimony, distance matrix, and likelihood methods, including bootstrapping
- Phyml (maximum likelihood method)
- More phylogeny programs


40 http://www.phylogeny.fr

