1 Evolutionary Change in Nucleotide Sequences Dan Graur.

Presentation transcript:

1 Evolutionary Change in Nucleotide Sequences Dan Graur

2 So far, we described the evolutionary process as a series of gene substitutions in which new alleles, each arising as a mutation in a single individual in the population, progressively increase their frequency and ultimately become fixed in the population.

3 We may look at the process from a different point of view. An allele that becomes fixed is different in its sequence from the allele that it replaces. That is, the substitution of a new allele for an old one is the substitution of a new sequence for a previous sequence.

4 If we use a time scale in which one time unit is larger than the time of fixation, then the DNA sequence at any given locus will appear to change with time.
1. actgggggtaaactatcggtatagatcat
2. actgggggttaactatcggtatagatcat
3. actgggggtgaactatcggtatagatcat
4. actgggggtgaactatcggtacagatcat

5 Nucleotide Substitution
1. actgggggtaaactatcggtatagatcat
2. actgggggttaactatcggtatagatcat
3. actgggggtgaactatcggtatagatcat
4. actgggggtgaactatcggtacagatcat
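The four snapshots above can be compared site by site to locate each substitution. The following is a minimal sketch; the helper name `differing_sites` and the variable `snapshots` are illustrative and not part of the original slides.

```python
def differing_sites(seq_a, seq_b):
    """Return the 0-based positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# The four successive states of the locus shown on the slide.
snapshots = [
    "actgggggtaaactatcggtatagatcat",
    "actgggggttaactatcggtatagatcat",
    "actgggggtgaactatcggtatagatcat",
    "actgggggtgaactatcggtacagatcat",
]

# Report every substitution between consecutive snapshots (1-based sites).
for earlier, later in zip(snapshots, snapshots[1:]):
    for pos in differing_sites(earlier, later):
        print(f"site {pos + 1}: {earlier[pos]} -> {later[pos]}")
```

Site 10 changes twice (a to t, then t to g), yet a direct comparison of the first and last sequences shows only one difference at that site.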

6 To study the dynamics of nucleotide substitution, we must make several assumptions regarding the probability of substitution of a nucleotide by another.

7 Jukes & Cantor’s one-parameter model

8 Assumption: Substitutions occur with equal probabilities among the four nucleotide types.

9 If the nucleotide residing at a certain site in a DNA sequence is A at time 0, what is the probability, $P_{A(t)}$, that this site will be occupied by A at time t?

10 Since we start with A, $P_{A(0)} = 1$. At time 1, the probability of still having A at this site is $P_{A(1)} = 1 - 3\alpha$, where $3\alpha$ is the probability of A changing to T, C, or G, and $1 - 3\alpha$ is the probability that A has remained unchanged.

11 To derive the probability of having A at time 2, we consider two possible scenarios:
1. The nucleotide has remained unchanged from time 0 to time 2.
2. The nucleotide has changed to T, C or G at time 1, but has reverted to A at time 2.

12 $P_{A(2)} = (1 - 3\alpha)\,P_{A(1)} + \alpha\,[1 - P_{A(1)}]$

13 The following equation applies to any t and any t+1: $P_{A(t+1)} = (1 - 3\alpha)\,P_{A(t)} + \alpha\,[1 - P_{A(t)}]$

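14 We can rewrite the equation in terms of the amount of change in $P_{A(t)}$ per unit time as: $\Delta P_{A(t)} = P_{A(t+1)} - P_{A(t)} = -3\alpha P_{A(t)} + \alpha\,[1 - P_{A(t)}] = -4\alpha P_{A(t)} + \alpha$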

15 We approximate the discrete-time process by a continuous-time model, by regarding $\Delta P_{A(t)}$ as the rate of change at time t: $\frac{dP_{A(t)}}{dt} = -4\alpha P_{A(t)} + \alpha$

16 The solution is: $P_{A(t)} = \frac{1}{4} + \left(P_{A(0)} - \frac{1}{4}\right) e^{-4\alpha t}$
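To see how closely the continuous-time solution tracks the discrete-time process, the sketch below iterates the one-step rule of the Jukes and Cantor model (probability $1 - 3\alpha$ of staying A, probability $\alpha$ of any other nucleotide reverting to A) and compares it with the standard closed form $\frac{1}{4} + (P_{A(0)} - \frac{1}{4}) e^{-4\alpha t}$. The function names and the value $\alpha = 0.001$ are illustrative assumptions, not taken from the slides.

```python
import math


def p_a_discrete(alpha, steps, p0=1.0):
    """Iterate P(t+1) = (1 - 3*alpha) * P(t) + alpha * (1 - P(t))."""
    p = p0
    for _ in range(steps):
        p = (1 - 3 * alpha) * p + alpha * (1 - p)
    return p


def p_a_continuous(alpha, t, p0=1.0):
    """Closed-form continuous-time approximation of the same process."""
    return 0.25 + (p0 - 0.25) * math.exp(-4 * alpha * t)


alpha = 0.001  # hypothetical substitution probability per time unit
for t in (1, 10, 100, 1000, 10000):
    print(t, round(p_a_discrete(alpha, t), 6), round(p_a_continuous(alpha, t), 6))
```

For small $\alpha$ the two columns stay close, and both approach 1/4 as t grows.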

17 In the Jukes and Cantor model, the probability of each of the four nucleotides at equilibrium ($t = \infty$) is 1/4.

18 So far, we treated $P_{A(t)}$ as a probability. However, $P_{A(t)}$ can also be interpreted as the frequency of A in a DNA sequence at time t. For example, if we start with a sequence made of adenines only, then $P_{A(0)} = 1$, and $P_{A(t)}$ is the expected frequency of A in the sequence at time t. The expected frequency of A in the sequence at equilibrium will be 1/4, and so will the expected frequencies of T, C, and G.

19 After reaching equilibrium no further change in the nucleotide frequencies is expected to occur. However, the actual frequencies of the nucleotides will remain unchanged only in DNA sequences of infinite length. In practice, fluctuations in nucleotide frequencies are likely to occur.
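The size of these fluctuations can be illustrated with a small simulation. The sketch assumes that, at equilibrium, each site is an independent draw from the four nucleotides with probability 1/4 each, so the frequency of A in a sequence of a given length has standard deviation $\sqrt{p(1-p)/\text{length}}$ with $p = 1/4$. The function names and the seed are illustrative.

```python
import math
import random


def simulated_frequency_of_a(length, seed):
    """Frequency of A in one random equilibrium sequence of the given length."""
    rng = random.Random(seed)
    sequence = [rng.choice("ACGT") for _ in range(length)]
    return sequence.count("A") / length


def expected_sd(length, p=0.25):
    """Binomial standard deviation of a nucleotide frequency (independent sites assumed)."""
    return math.sqrt(p * (1 - p) / length)


for length in (100, 10_000, 100_000):
    freq = simulated_frequency_of_a(length, seed=1)
    print(length, round(freq, 4), "expected sd:", round(expected_sd(length), 4))
```

The expected deviation from 1/4 shrinks as the sequence gets longer, and vanishes only in the limit of infinite length.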

20

21 Kimura’s two-parameter model

22 Assumptions: The rate of transitional substitution at each nucleotide site is $\alpha$ per unit time. The rate of each type of transversional substitution is $\beta$ per unit time.

23 α ⁄ β ≈ 5−10

24 If the nucleotide residing at a certain site in a DNA sequence is A at time 0, what is the probability, $P_{AA(t)}$, that this site will be occupied by A at time t?

25 After one time unit the probability of A changing into G is $\alpha$, the probability of A changing into C is $\beta$ and the probability of A changing into T is $\beta$. Thus, the probability of A remaining unchanged after one time unit is: $P_{AA(1)} = 1 - \alpha - 2\beta$

26 To derive the probability of having A at time 2, we consider four possible scenarios:

27 1. A remained unchanged at t = 1 and t = 2

28 2. A changed into G at t = 1 and reverted by a transition to A at t = 2

29 3. A changed into C at t = 1 and reverted by a transversion to A at t = 2

30 4. A changed into T at t = 1 and reverted by a transversion to A at t = 2
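The four scenarios can be summed directly. The sketch below builds the one-step substitution probabilities of the two-parameter model (transition probability $\alpha$, each transversion probability $\beta$, staying put with probability $1 - \alpha - 2\beta$) and adds up the four two-step paths from A back to A. The function names and the numerical values of $\alpha$ and $\beta$ are illustrative assumptions.

```python
PURINES = {"A", "G"}
NUCLEOTIDES = "AGCT"


def k2p_step_matrix(alpha, beta):
    """One-step substitution probabilities under Kimura's two-parameter model."""
    matrix = {}
    for x in NUCLEOTIDES:
        row = {}
        for y in NUCLEOTIDES:
            if x == y:
                row[y] = 1 - alpha - 2 * beta  # no change
            elif (x in PURINES) == (y in PURINES):
                row[y] = alpha  # transition (A<->G or C<->T)
            else:
                row[y] = beta  # transversion
        matrix[x] = row
    return matrix


def p_aa_two_steps(alpha, beta):
    """Sum over the four scenarios: A -> (A, G, C or T) -> A."""
    m = k2p_step_matrix(alpha, beta)
    return sum(m["A"][mid] * m[mid]["A"] for mid in NUCLEOTIDES)


alpha, beta = 0.01, 0.002  # hypothetical values
print(p_aa_two_steps(alpha, beta))
print((1 - alpha - 2 * beta) ** 2 + alpha ** 2 + 2 * beta ** 2)
```

Both printed values agree (up to floating-point rounding), since the four scenarios contribute $(1 - \alpha - 2\beta)^2$, $\alpha^2$, $\beta^2$ and $\beta^2$ respectively.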

31 Three probabilities: $X(t)$ = the probability that a nucleotide at a site at time t is identical to that at time 0: $X(t) = \frac{1}{4} + \frac{1}{4} e^{-4\beta t} + \frac{1}{2} e^{-2(\alpha + \beta) t}$. At equilibrium, the equation reduces to $X(\infty) = 1/4$. Thus, as in the case of Jukes and Cantor's model, the equilibrium frequencies of the four nucleotides are 1/4.

32 Three probabilities: $Y(t)$ = the probability that the initial nucleotide and the nucleotide at time t differ from each other by a transition: $Y(t) = \frac{1}{4} + \frac{1}{4} e^{-4\beta t} - \frac{1}{2} e^{-2(\alpha + \beta) t}$. Because of the symmetry of the substitution scheme, $Y(t) = P_{AG(t)} = P_{GA(t)} = P_{TC(t)} = P_{CT(t)}$.

33 Three probabilities: $Z(t)$ = the probability that the nucleotide at time t and the initial nucleotide differ by a specific type of transversion, given by $Z(t) = \frac{1}{4} - \frac{1}{4} e^{-4\beta t}$

34 Each nucleotide is subject to two types of transversion, but only one type of transition. Therefore, the probability that the initial nucleotide and the nucleotide at time t differ by a transversion is $2Z(t)$, twice the probability of a specific type of transversion. The three probabilities satisfy $X(t) + Y(t) + 2Z(t) = 1$.
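The constraint $X(t) + Y(t) + 2Z(t) = 1$ and the equilibrium value of 1/4 can be checked numerically. The sketch below uses the closed forms usually given for Kimura's two-parameter model, $X = \frac{1}{4} + \frac{1}{4}e^{-4\beta t} + \frac{1}{2}e^{-2(\alpha+\beta)t}$, $Y = \frac{1}{4} + \frac{1}{4}e^{-4\beta t} - \frac{1}{2}e^{-2(\alpha+\beta)t}$ and $Z = \frac{1}{4} - \frac{1}{4}e^{-4\beta t}$. These expressions are stated here as an assumption, and the function name and parameter values are illustrative.

```python
import math


def k2p_probabilities(alpha, beta, t):
    """Return (X, Y, Z): same nucleotide, transition, one specific transversion."""
    decay_transversion = math.exp(-4 * beta * t)
    decay_total = math.exp(-2 * (alpha + beta) * t)
    x = 0.25 + 0.25 * decay_transversion + 0.5 * decay_total
    y = 0.25 + 0.25 * decay_transversion - 0.5 * decay_total
    z = 0.25 - 0.25 * decay_transversion
    return x, y, z


alpha, beta = 0.01, 0.002  # hypothetical rates with alpha / beta = 5
for t in (0, 10, 100, 1000, 10000):
    x, y, z = k2p_probabilities(alpha, beta, t)
    print(t, round(x, 4), round(y, 4), round(z, 4), "sum:", round(x + y + 2 * z, 4))
```

At t = 0 the result is (1, 0, 0); for large t all three values approach 1/4.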

35 Problem with the “t” approach: too long even for Methuselah, who is said to have lived 969 years (Genesis 5:27).

36

37

38

39

40 NUMBER OF NUCLEOTIDE SUBSTITUTIONS BETWEEN TWO DNA SEQUENCES

41 After two nucleotide sequences diverge from each other, each of them will start accumulating nucleotide substitutions. If two sequences of length N differ from each other at n sites, then the proportion of differences, n/N, is referred to as the degree of divergence or Hamming distance. Degrees of divergence are usually expressed as percentages (n/N × 100%).
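As a small illustration of the definition, the sketch below computes n/N for two aligned sequences of equal length. The function name is illustrative, and the two example sequences are the first and last sequences from slide 4 of this presentation.

```python
def degree_of_divergence(seq_a, seq_b):
    """Proportion of sites (n / N) at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    n = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return n / len(seq_a)


first = "actgggggtaaactatcggtatagatcat"
last = "actgggggtgaactatcggtacagatcat"
p = degree_of_divergence(first, last)
print(f"{p:.4f} ({p * 100:.1f}%)")
```

Here n = 2 and N = 29, so the degree of divergence is 2/29, or about 6.9%.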

42

43 The observed number of differences is likely to be smaller than the actual number of substitutions due to multiple hits at the same site.

44 13 substitutions = 3 differences

45

46 Number of substitutions between two noncoding (NOT protein coding) sequences

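47 The one-parameter model. The probability that the two sequences are different at a site at time t is $p = 1 - I(t) = \frac{3}{4}\left(1 - e^{-8\alpha t}\right)$, where $I(t)$ is the probability that the two sequences are identical at that site, $\alpha$ is the probability of a change from one nucleotide to another in one unit time, and t is the time of divergence.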

48 The one-parameter model. Problem: t and $\alpha$ are usually not known. Instead, we compute K, which is the number of substitutions per site since the time of divergence between the two sequences.

49 L = number of sites compared between the two sequences.
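The formulas on these slides did not survive in the transcript. The sketch below uses the standard Jukes and Cantor estimate, $K = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$, together with the usual large-sample approximation of its variance, $V(K) \approx \frac{p(1-p)}{L\left(1 - \frac{4}{3}p\right)^2}$. Both expressions are stated as assumptions about what the slides showed, and the function names are illustrative.

```python
import math


def jc69_distance(p):
    """Estimated substitutions per site, K, from the proportion of differing sites p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75 under the one-parameter model")
    return -0.75 * math.log(1 - 4 * p / 3)


def jc69_variance(p, n_sites):
    """Approximate sampling variance of K for n_sites compared sites (L on the slide)."""
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return p * (1 - p) / (n_sites * (1 - 4 * p / 3) ** 2)


# Hypothetical values: 10% observed differences over 1000 compared sites.
p, n_sites = 0.10, 1000
k = jc69_distance(p)
print(k, math.sqrt(jc69_variance(p, n_sites)))
```

K is always at least as large as p, which reflects the correction for multiple substitutions at the same site.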

50 In the two-parameter model: The differences between two sequences are classified into transitions and transversions.
P = proportion of transitional differences
Q = proportion of transversional differences
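Given P and Q, the number of substitutions per site under the two-parameter model is usually estimated as $K = \frac{1}{2}\ln\frac{1}{1 - 2P - Q} + \frac{1}{4}\ln\frac{1}{1 - 2Q}$. The sketch below assumes this standard formula, since the corresponding slide content is missing from the transcript. The function name and the input values are illustrative.

```python
import math


def k2p_distance(p_transitions, q_transversions):
    """Estimated substitutions per site, K, under Kimura's two-parameter model."""
    a = 1 - 2 * p_transitions - q_transversions
    b = 1 - 2 * q_transversions
    if a <= 0 or b <= 0:
        raise ValueError("P and Q are too large for the estimate to be defined")
    return 0.5 * math.log(1 / a) + 0.25 * math.log(1 / b)


# Hypothetical proportions of transitional and transversional differences.
P, Q = 0.10, 0.05
print(k2p_distance(P, Q))
```

When transitions and transversions are equally likely per type, so that Q = 2P, the estimate coincides with the one-parameter value $-\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$ with p = P + Q.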

51

52

53 Numerical example (2P-model)

54 There are substitution schemes with more than two parameters!