Distances.

A natural measure of distance between two sequences should have an evolutionary meaning. One such measure is the number of nucleotide substitutions that have accumulated in the two sequences since they diverged from each other.

To derive a measure of distance, we need to make several simplifying assumptions regarding the probability of substitution of a nucleotide by another.

Jukes & Cantor one-parameter model

Assumption: Substitutions occur with equal probabilities among the four nucleotide types.

Kimura’s two-parameter model

Assumptions: The rate of transitional substitution at each nucleotide site is α per unit time. The rate of each type of transversional substitution is β per unit time.

NUMBER OF NUCLEOTIDE SUBSTITUTIONS BETWEEN TWO DNA SEQUENCES

After two nucleotide sequences diverge from each other, each of them will start accumulating nucleotide substitutions. If two sequences of length N differ from each other at n sites, then the proportion of differences, n/N, is referred to as the degree of divergence or Hamming distance. Degrees of divergence are usually expressed as percentages (n/N × 100%).
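The degree of divergence can be sketched in a few lines of Python. The function name `p_distance` is hypothetical, and the sketch assumes the two sequences are already aligned, ungapped and of equal length.

```python
def p_distance(seq1, seq2):
    """Proportion of sites (n/N) at which two aligned sequences differ."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    # n = number of sites at which the two sequences differ
    n = sum(a != b for a, b in zip(seq1.upper(), seq2.upper()))
    return n / len(seq1)


# Two of the five sites differ, so the degree of divergence is 2/5.
print(f"{p_distance('ATCGG', 'ACCCG'):.0%}")
```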

The observed number of differences is likely to be smaller than the actual number of substitutions due to multiple hits at the same site.

Example (figure): 13 mutations produce only 3 observed differences.

Number of substitutions between two noncoding sequences

The one-parameter model

In this model, it is sufficient to consider only I(t), the probability that the nucleotide at a given site at time t is the same in both sequences.

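Under the one-parameter model, the number of substitutions per site since divergence (K) is estimated as K = -(3/4) ln(1 - (4/3)p), where p is the observed proportion of different nucleotides between the two sequences.

The sampling variance of K is approximately V(K) = p(1 - p) / [L(1 - (4/3)p)^2], where L = number of sites compared in the ungapped alignment between the two sequences.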
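As a minimal sketch, the standard Jukes-Cantor correction and its approximate sampling variance can be computed from p and L as follows. The function name `jc69_distance` is hypothetical. The formulas are the textbook ones for the one-parameter model, K = -(3/4) ln(1 - (4/3)p) and V(K) = p(1 - p) / [L(1 - (4/3)p)^2], and are stated here as an assumption because the slide equations did not survive in the transcript.

```python
import math


def jc69_distance(p, L):
    """Return (K, V) under the one-parameter model.

    p: observed proportion of differing sites
    L: number of sites compared in the ungapped alignment
    """
    if not 0 <= p < 0.75:
        # The correction is undefined once p reaches the saturation value 3/4.
        raise ValueError("p must satisfy 0 <= p < 0.75")
    k = -0.75 * math.log(1 - 4 * p / 3)
    var = p * (1 - p) / (L * (1 - 4 * p / 3) ** 2)
    return k, var


k, var = jc69_distance(p=0.3, L=100)
print(f"K = {k:.4f}, SE = {math.sqrt(var):.4f}")
```

Because of multiple hits at the same site, the corrected K is always at least as large as the observed p.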

The two-parameter model

The differences between two sequences are classified into transitions and transversions.
P = proportion of transitional differences
Q = proportion of transversional differences
Example: ATCGG vs. ACCCG gives P = 0.2 and Q = 0.2.
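The classification into P and Q, and the resulting two-parameter distance, can be sketched as below. The function names are hypothetical. The distance formula K = (1/2) ln[1/(1 - 2P - Q)] + (1/4) ln[1/(1 - 2Q)] is the standard Kimura two-parameter estimator, stated here as an assumption because the slide equation did not survive in the transcript.

```python
import math

PURINES = {"A", "G"}  # C and T are the pyrimidines


def count_p_q(seq1, seq2):
    """Return (P, Q): proportions of transitional and transversional differences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b:
            continue
        # A transition stays within purines or within pyrimidines.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    return transitions / len(seq1), transversions / len(seq1)


def k2p_distance(P, Q):
    """Substitutions per site under the two-parameter model."""
    if 1 - 2 * P - Q <= 0 or 1 - 2 * Q <= 0:
        raise ValueError("P and Q are too large for the correction to be defined")
    return 0.5 * math.log(1 / (1 - 2 * P - Q)) + 0.25 * math.log(1 / (1 - 2 * Q))


P, Q = count_p_q("ATCGG", "ACCCG")
print(P, Q, round(k2p_distance(P, Q), 4))
```

For the ATCGG/ACCCG pair, the T/C difference is a transition and the G/C difference is a transversion, which reproduces P = 0.2 and Q = 0.2.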

Numerical example (2P-model)

- Substitution schemes with more than two parameters.
- Parameter-free substitution schemes.

Number of substitutions between two protein-coding genes

Difficulties with denominator: 1. The classification of a site changes with time: For example, the third position of CGG (Arg) is synonymous. However, if the first position changes to T, then the third position of the resulting codon, TGG (Trp), becomes nonsynonymous.

Difficulties with denominator: 2. Many sites are neither completely synonymous nor completely nonsynonymous. For example, a transition in the third position of GAT (Asp) will be synonymous, while a transversion to either GAG or GAA will alter the amino acid.

Difficulties with numerator: 1. The classification of the change depends on the order in which the substitutions occurred.
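The order dependence can be illustrated by enumerating every pathway between two codons that differ at more than one position. The sketch below is an illustration only: the codon pair CCC/CAA is an example chosen here (not taken from the slides), the function name `pathways` is hypothetical, and pathways through stop codons are not excluded, which real methods usually do.

```python
from itertools import permutations

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def pathways(codon1, codon2):
    """List (path, synonymous, nonsynonymous) for every order of single changes."""
    differing = [i for i in range(3) if codon1[i] != codon2[i]]
    results = []
    for order in permutations(differing):
        current, path = codon1, [codon1]
        syn = nonsyn = 0
        for pos in order:
            nxt = current[:pos] + codon2[pos] + current[pos + 1:]
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                nonsyn += 1
            path.append(nxt)
            current = nxt
        results.append((path, syn, nonsyn))
    return results


for path, syn, nonsyn in pathways("CCC", "CAA"):
    labelled = " -> ".join(f"{c}({CODE[c]})" for c in path)
    print(f"{labelled}: {syn} synonymous, {nonsyn} nonsynonymous")
```

For this pair, one pathway passes through CCA (Pro) and contains one synonymous and one nonsynonymous change, while the other passes through CAC (His) and contains two nonsynonymous changes.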

Difficulties with numerator: 2. Transitions and transversions occur at different frequencies. 3. The type of substitution depends on the type of mutation: transitions result in synonymous substitutions more frequently than transversions do.

Miyata & Yasunaga (1980) and Nei & Gojobori (1986) method

Step 1: Classify nucleotide sites into non-degenerate, twofold degenerate, and fourfold degenerate sites.
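A minimal sketch of this classification step, assuming the standard genetic code: a site is scored by how many of the three possible changes at that position leave the amino acid unchanged. The function name `degeneracy` is hypothetical. Treating the threefold-degenerate third position of the isoleucine codons as twofold is a common simplification adopted here as an assumption.

```python
# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def degeneracy(codon, pos):
    """Classify position pos (0, 1 or 2) of a codon by its degeneracy."""
    synonymous_changes = sum(
        1
        for base in BASES
        if base != codon[pos]
        and CODE[codon[:pos] + base + codon[pos + 1:]] == CODE[codon]
    )
    if synonymous_changes == 0:
        return "non-degenerate"
    if synonymous_changes == 3:
        return "fourfold"
    # One synonymous change, or two (Ile third position, simplified to twofold).
    return "twofold"


for codon in ("CGG", "TGG", "GAT"):
    print(codon, CODE[codon], [degeneracy(codon, pos) for pos in range(3)])
```

The output shows how the class of a site changes with its context: the third position of CGG (Arg) is fourfold degenerate, but the third position of TGG (Trp) is non-degenerate.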

Number of Amino-Acid Replacements between Two Proteins

The observed proportion of different amino acids between the two sequences (p) is p = n/L, where n = number of amino acid differences between the two sequences and L = length of the aligned sequences.

The Poisson model is used to convert p into the number of amino acid replacements per site between two sequences (d): d = -ln(1 - p). The variance of d is estimated as V(d) = p / [L(1 - p)].
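The Poisson correction can be sketched as follows, reading the variance as V(d) = p / [L(1 - p)]. The function name `poisson_distance` and the input values n = 20 and L = 100 are hypothetical, chosen only for illustration.

```python
import math


def poisson_distance(n, L):
    """Return (p, d, V) for n amino acid differences over L aligned sites."""
    if L <= 0 or not 0 <= n < L:
        raise ValueError("need L > 0 and 0 <= n < L")
    p = n / L                  # observed proportion of different amino acids
    d = -math.log(1 - p)       # Poisson-corrected replacements per site
    var = p / (L * (1 - p))    # estimated variance of d
    return p, d, var


p, d, var = poisson_distance(n=20, L=100)
print(f"p = {p:.2f}, d = {d:.4f}, V(d) = {var:.4f}")
```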

How do you detect adaptive evolution at the genetic level?

Theoretical Expectations: deleterious mutations, neutral mutations, advantageous mutations, overdominant mutations.