We have shown that: To see what this means in the long run let α=.001 and graph p:


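We see that in this case, as t becomes large, p levels off at a limiting value; in fact, this is the case for any α < 1. From this fact, the probability of the position showing any particular one of the other three nucleotides is (1/3)(1/4) = 1/12 ≈ 0.0833. From these two facts we can gain some insight into the derivation of the Jukes-Cantor scoring matrix. Recall that the score in position (a, b) of the matrix at time t is the log-odds ratio $s(a,b) = \log \frac{P(b \mid a, t)}{q_b}$, where $q_b$ is the background frequency of nucleotide b. If we assume the nucleotides are essentially equally distributed throughout the sequences, then $q_b = 1/4 = 0.25$, which fixes the score $s_1$ for a position that keeps its nucleotide and the score for a position that changes to a different one.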

Multiplying $s_1$ by 10 and rounding, we find that the score for a position having the same nucleotide as it initially had is 5. The score for a position having a different nucleotide lies 9.5 units below the score for an unchanged position, and since 5 - 9.5 = -4.5, it is reasonable to assign a mismatch the value -4. NOTE: One could argue that -5 makes more sense, except that the scoring model would then be no different from the simple counting model, which assigns 1 to a match and -1 to a mismatch. The end result is the familiar Jukes-Cantor scoring matrix, with 5 in every diagonal (match) position and -4 in every off-diagonal (mismatch) position.
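The arithmetic behind the 5 and the 9.5 can be checked with a short sketch. It assumes a base-10 logarithm and a probability of 3/4 for an unchanged position (the mass left over once each of the other three nucleotides takes 1/12); neither is stated explicitly in the transcript, but together they reproduce the quoted figures. The function name `log_odds_score` is illustrative only.

```python
import math

def log_odds_score(p_ab, q_b, scale=10.0):
    """Scaled log-odds score: scale * log10(P(b|a,t) / q_b).

    The base-10 logarithm and the scale factor of 10 are assumptions
    chosen to match the numbers quoted in the text.
    """
    return scale * math.log10(p_ab / q_b)

q = 0.25                      # background frequency of each nucleotide (from the text)
p_other = 1.0 / 12.0          # one specific different nucleotide (from the text)
p_same = 1.0 - 3 * p_other    # assumption: the remaining probability mass, 3/4

match_raw = log_odds_score(p_same, q)        # score before rounding, unchanged position
mismatch_raw = log_odds_score(p_other, q)    # score before rounding, changed position
gap = match_raw - mismatch_raw               # distance between the two scores

print(f"match: {match_raw:.2f}, mismatch: {mismatch_raw:.2f}, gap: {gap:.2f}")
```

Under these assumptions the unrounded match score rounds to 5 and the gap between the two scores is about 9.5, which is where the choice between -4 and -5 for a mismatch arises.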

Returning to phylogenetic distances. We saw that the distance between two genes, pseudogenes, or conserved regions represented by nucleotide sequences $S_1$ and $S_2$ is given by the expression $d_{JC}(S_1, S_2) = -\frac{3}{4} \ln\left(1 - \frac{4}{3} p\right)$, where $p$ is the fraction of sites that disagree when comparing $S_1$ to $S_2$. Exactly the same type of analysis gives the formula for the distance based on the Kimura two-parameter model, shown in K&R on page 67: $d_{K2P} = \frac{1}{2} \ln\left(\frac{1}{1 - 2p - q}\right) + \frac{1}{4} \ln\left(\frac{1}{1 - 2q}\right)$, where in this formula $p$ is the fraction of transitions and $q$ is the fraction of transversions. In fact, we could consider a three-parameter model, and the same type of analysis would yield a distance formula in p, q, and r, the three frequencies of the changes considered by the model. And so the game goes on.
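As a rough illustration of the two distance formulas, the sketch below counts transitions and transversions between two aligned sequences and evaluates the standard Jukes-Cantor and Kimura two-parameter expressions. It assumes equal-length, gap-free sequences over A, C, G, T; the function names and the toy sequences are made up for the example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def mismatch_fractions(s1, s2):
    """Return (transition fraction, transversion fraction) for aligned sequences."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(s1.upper(), s2.upper()):
        if a == b:
            continue
        # A transition stays within purines or within pyrimidines.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    n = len(s1)
    return transitions / n, transversions / n

def jukes_cantor_distance(p):
    """d_JC = -(3/4) ln(1 - 4p/3), with p the fraction of differing sites."""
    return -0.75 * math.log(1 - 4 * p / 3)

def kimura_2p_distance(p, q):
    """d_K2P = (1/2) ln(1/(1 - 2p - q)) + (1/4) ln(1/(1 - 2q))."""
    return 0.5 * math.log(1 / (1 - 2 * p - q)) + 0.25 * math.log(1 / (1 - 2 * q))

# Toy data: one transition (A->G) and one transversion (A->C) in ten sites.
seq1 = "AAAAAAAAAA"
seq2 = "GAAAAAAAAC"
ts, tv = mismatch_fractions(seq1, seq2)
d_jc = jukes_cantor_distance(ts + tv)
d_k2p = kimura_2p_distance(ts, tv)
print(f"transitions={ts}, transversions={tv}, d_JC={d_jc:.4f}, d_K2P={d_k2p:.4f}")
```

Both formulas are undefined once the argument of a logarithm reaches zero (for Jukes-Cantor, when 3/4 or more of the sites differ), so real data would need a guard for saturated comparisons.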

Phylogenetic Trees

An old and controversial question: What is our relationship to the modern species of apes? Consider the following species: gorilla, chimpanzee, orangutan, and gibbon. Which is our closest evolutionary kin? On the other hand, are these species more closely related to each other than they are to us? An examination of sequences for the HindIII restriction enzyme in these and 7 other primates revealed agreement at between 67% and 93% of the positions in the 898 bp sequences.
[Figure: two candidate trees. The first has leaves in the order Human, Chimpanzee, Gorilla, Orangutan, Gibbon; the second has leaves in the order Human, Chimpanzee, Gorilla, Gibbon, Orangutan.]

We can ask and answer some interesting questions based on the construction of the phylogenetic tree. If the following tree is correct, what can we say about our relationship with the apes?
[Figure: tree with leaves Human, Chimpanzee, Gorilla, Orangutan, Gibbon.]
If we accept the other tree from the pair and use the fact that gorillas and chimpanzees are African in origin, while orangutans and gibbons are Asian, what can we deduce as the most likely place for the first appearance of humans?
[Figure: tree with leaves Human, Chimpanzee, Gorilla, Gibbon, Orangutan.]

Unfortunately, knowing the phylogenetic distances does not determine the shape of the tree. For example, consider the following unrooted tree.
[Figure: unrooted tree with leaves A, B, C, D, E.]
We cannot, on this evidence alone, construct a unique rooted version of the tree. For example, both of the following could be deduced.
[Figure: two rooted trees, one with leaves in the order E, A, B, D, C and the other with leaves in the order A, B, D, C, E.]
These trees are topologically as well as biologically different. However, both are possible without further evidence on which to base the construction.
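To make the ambiguity concrete, the sketch below places a root on each edge of an unrooted five-leaf tree and prints the resulting rooted trees in Newick notation. The slide's figure did not survive, so the topology used here (A and B paired, D and E paired, C attached between them) and the internal node names x, y, z are hypothetical.

```python
def subtree(adj, node, parent):
    """Newick string for the part of the tree hanging below `node`."""
    children = [n for n in adj[node] if n != parent]
    if not children:
        return node  # a leaf: only its parent is adjacent
    parts = sorted(subtree(adj, child, node) for child in children)
    return "(" + ",".join(parts) + ")"

def rootings(edges):
    """Root the unrooted tree on every edge; return one Newick string per edge."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    result = []
    for u, v in edges:
        # Placing the root on edge (u, v) splits the tree into two subtrees.
        sides = sorted([subtree(adj, u, v), subtree(adj, v, u)])
        result.append("(" + ",".join(sides) + ");")
    return result

# Hypothetical unrooted topology with leaves A-E and internal nodes x, y, z.
EDGES = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"),
         ("y", "z"), ("D", "z"), ("E", "z")]

trees = rootings(EDGES)
for newick in trees:
    print(newick)
```

An unrooted binary tree with five leaves has seven edges, so the same set of distances is compatible with seven different rooted trees; choosing among them needs extra evidence such as an outgroup.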