Models for DNA substitution

http://www.stat.rice.edu/~mathbio/Polanski/stat655/

Plan
Basics
Models in discrete time
Models in continuous time
Parameter estimation

Nucleotides
Purines: adenine (A or a), guanine (G or g)
Pyrimidines: cytosine (C or c), thymine (T or t)

Substitution
Transitions (purine → purine, pyrimidine → pyrimidine): A→G, G→A, C→T, T→C
Transversions (purine → pyrimidine, pyrimidine → purine): A→T, T→A, A→C, C→A, G→T, T→G, G→C, C→G
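The classification above can be sketched in a few lines of Python. The function name `classify_substitution` is hypothetical and not part of the slides; it only encodes the purine/pyrimidine rule stated on this slide.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(x: str, y: str) -> str:
    """Classify the substitution x -> y as a transition or a transversion."""
    x, y = x.upper(), y.upper()
    for base in (x, y):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not a DNA nucleotide: {base!r}")
    if x == y:
        return "identical"
    # A transition stays within one chemical class; a transversion changes class.
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return "transition" if same_class else "transversion"
```

For example, `classify_substitution("A", "G")` gives `"transition"` and `classify_substitution("A", "T")` gives `"transversion"`.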

Other
Deletions, insertions
Insertions in reverse order

Hypothesis
Substitution of nucleotides in the evolution of DNA sequences can be modeled by a Markov chain or a Markov process.

Other assumptions
Stationarity
Reversibility

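Transition matrix (rows and columns ordered a, g, c, t)

$$
P = \begin{pmatrix}
p_{aa} & p_{ag} & p_{ac} & p_{at} \\
p_{ga} & p_{gg} & p_{gc} & p_{gt} \\
p_{ca} & p_{cg} & p_{cc} & p_{ct} \\
p_{ta} & p_{tg} & p_{tc} & p_{tt}
\end{pmatrix}
$$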

Models – discrete time

Jukes-Cantor model
All substitutions are equally probable.
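A minimal sketch of a discrete-time Jukes-Cantor matrix, assuming each specific substitution has the same one-step probability. The symbol `alpha` and the function names are assumptions for illustration, not taken from the slides. The sketch also checks numerically that the uniform distribution is left unchanged by the matrix, which is what one would expect from the symmetry of the model.

```python
def jukes_cantor_matrix(alpha: float) -> list[list[float]]:
    """4x4 one-step matrix: alpha off the diagonal, 1 - 3*alpha on it."""
    if not 0.0 <= alpha <= 1.0 / 3.0:
        raise ValueError("alpha must lie in [0, 1/3]")
    return [[1.0 - 3.0 * alpha if i == j else alpha for j in range(4)]
            for i in range(4)]


def left_multiply(pi: list[float], P: list[list[float]]) -> list[float]:
    """Row vector times matrix: one step of the chain applied to distribution pi."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]


P = jukes_cantor_matrix(0.1)
uniform = [0.25, 0.25, 0.25, 0.25]
after_one_step = left_multiply(uniform, P)  # stays (numerically) uniform
```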

Stationary distribution

Spectral decomposition of P^n
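As an illustration of what the spectral decomposition buys, here is a sketch for the Jukes-Cantor case, assuming off-diagonal entries `alpha` (a hypothetical symbol). That matrix has eigenvalue 1 once and eigenvalue 1 - 4*alpha three times, so every entry of P^n is either 1/4 + (3/4)(1 - 4*alpha)^n (diagonal) or 1/4 - (1/4)(1 - 4*alpha)^n (off-diagonal). The code compares this closed form with plain repeated multiplication.

```python
def jc_matrix(alpha):
    """Jukes-Cantor one-step matrix with off-diagonal entries alpha."""
    return [[1 - 3 * alpha if i == j else alpha for j in range(4)]
            for i in range(4)]


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(P, n):
    """P^n by repeated multiplication (fine for small n and 4x4 matrices)."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result


def jc_power_closed_form(alpha, n):
    """P^n from the eigenvalues 1 and 1 - 4*alpha."""
    lam = (1 - 4 * alpha) ** n
    same, diff = 0.25 + 0.75 * lam, 0.25 - 0.25 * lam
    return [[same if i == j else diff for j in range(4)] for i in range(4)]
```

As n grows, (1 - 4*alpha)^n tends to 0 for 0 < alpha < 1/2, so every row of P^n approaches the uniform distribution.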

Remark
When learning and researching Markov models for nucleotide substitution, it helps greatly to use software for symbolic computation, such as Mathematica, Maple, or Scientific Workplace.

Kimura models
α: probability of a transition
β: probability of a specific transversion
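A sketch of the two-parameter Kimura matrix in the nucleotide order a, g, c, t used for the transition matrix earlier. Here `alpha` stands for the probability of a transition and `beta` for the probability of each specific transversion; these names, and the function name, are assumptions for illustration.

```python
ORDER = "agct"
TRANSITION_PAIRS = {("a", "g"), ("g", "a"), ("c", "t"), ("t", "c")}


def kimura_matrix(alpha: float, beta: float) -> list[list[float]]:
    """One-step matrix: alpha for transitions, beta for each transversion."""
    if alpha < 0 or beta < 0 or alpha + 2 * beta > 1:
        raise ValueError("need alpha, beta >= 0 and alpha + 2*beta <= 1")
    P = []
    for x in ORDER:
        row = []
        for y in ORDER:
            if x == y:
                # One transition and two transversions leave each state.
                row.append(1 - alpha - 2 * beta)
            elif (x, y) in TRANSITION_PAIRS:
                row.append(alpha)
            else:
                row.append(beta)
        P.append(row)
    return P
```

The resulting matrix is symmetric, so its columns also sum to 1 and the uniform distribution is stationary.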

Kimura 3ST model
α: probability of A↔G, C↔T
β: probability of A↔C, G↔T
γ: probability of A↔T, C↔G

Stationary distribution

Generalizations of Kimura models
By Ewens:
α: probability of A↔G, C↔T
β: probability of A→C, A→T, G→C, G→T
γ: probability of C→A, T→A, C→G, T→G

Stationary distribution

Spectral decomposition

By Blaisdell:
α: probability of A→G, C→T
β: probability of G→A, T→C
γ: probability of A→C, A→T, G→C, G→T
δ: probability of C→A, T→A, C→G, T→G

Stationary distribution
Remark: this model is not reversible.

Felsenstein model
The probability of substituting any nucleotide by another is proportional to the stationary probability of the substituting nucleotide.
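One way to read this definition is p_ij = u * pi_j for i ≠ j, with the diagonal chosen so that each row sums to 1. The proportionality constant `u` and the function names are assumptions for illustration. The sketch builds such a matrix and lets one check that `pi` is indeed stationary.

```python
def felsenstein_matrix(pi: list[float], u: float) -> list[list[float]]:
    """p_ij = u * pi_j for i != j; p_ii = 1 - u * (1 - pi_i)."""
    if len(pi) != 4 or min(pi) < 0 or abs(sum(pi) - 1.0) > 1e-9:
        raise ValueError("pi must be a probability distribution on 4 states")
    if u < 0 or any(u * (1 - p) > 1 for p in pi):
        raise ValueError("u is too large or negative for a valid matrix")
    return [[1 - u * (1 - pi[i]) if i == j else u * pi[j] for j in range(4)]
            for i in range(4)]


def left_multiply(pi, P):
    """Row vector times matrix."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]


pi = [0.1, 0.2, 0.3, 0.4]  # illustrative frequencies, not from the slides
P = felsenstein_matrix(pi, 0.5)
pi_next = left_multiply(pi, P)  # equals pi up to rounding
```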

Stationary distribution

HKY model (Hasegawa, Kishino, Yano)
Different rates for transitions and transversions

Eigenvalues of P

Left (row) eigenvectors

Right (column) eigenvectors

General 12-parameter model (Tavaré, 1986)

Stationary distribution

Reversibility
A=D, B=G, C=J, E=H, F=K, I=L
Conclusion: the most general reversible model has 12 − 6 = 6 free parameters.
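Reversibility with respect to a stationary distribution pi is usually stated as detailed balance: pi_i * p_ij = pi_j * p_ji for every pair of states. A small sketch of a numerical check follows. The function name and the two example matrices are hypothetical and are not the 12-parameter matrix from the slides.

```python
def is_reversible(P, pi, tol=1e-12):
    """True if pi_i * P[i][j] == pi_j * P[j][i] (within tol) for all i < j."""
    n = len(P)
    return all(abs(pi[i] * P[i][j] - pi[j] * P[j][i]) <= tol
               for i in range(n) for j in range(i + 1, n))


uniform = [0.25, 0.25, 0.25, 0.25]

# Symmetric matrix (Jukes-Cantor type): reversible under the uniform distribution.
symmetric = [[0.7 if i == j else 0.1 for j in range(4)] for i in range(4)]

# Cyclic chain a -> g -> c -> t -> a: uniform is stationary, but flow is one-way.
cyclic = [[0.5, 0.5, 0.0, 0.0],
          [0.0, 0.5, 0.5, 0.0],
          [0.0, 0.0, 0.5, 0.5],
          [0.5, 0.0, 0.0, 0.5]]
```

Here `is_reversible(symmetric, uniform)` holds while `is_reversible(cyclic, uniform)` does not, even though both chains share the same stationary distribution.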

Continuous-time models

Matrix of transition probabilities P(t)
Q: intensity matrix
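For a continuous-time Markov chain the transition probabilities and the intensity matrix are linked by the standard relation P(t) = exp(Qt). The sketch below computes the matrix exponential with a truncated Taylor series plus scaling and squaring, using only the standard library. All function names and the Jukes-Cantor rate `alpha` are assumptions for illustration.

```python
import math


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(M, terms=25):
    """exp(M) via scaling and squaring with a truncated Taylor series."""
    n = len(M)
    norm = max(sum(abs(x) for x in row) for row in M)
    squarings = 0
    while norm > 0.5:  # shrink M until the series converges quickly
        norm /= 2.0
        squarings += 1
    scaled = [[x / 2 ** squarings for x in row] for row in M]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):  # undo the scaling: exp(M) = exp(M / 2^s)^(2^s)
        result = mat_mul(result, result)
    return result


def transition_probabilities(Q, t):
    """P(t) = exp(Q t) for an intensity matrix Q."""
    return mat_exp([[q * t for q in row] for row in Q])


def jc_intensity(alpha):
    """Jukes-Cantor intensity matrix: rate alpha for each specific substitution."""
    return [[-3 * alpha if i == j else alpha for j in range(4)] for i in range(4)]
```

For the Jukes-Cantor intensity matrix the result can be compared with the closed form: diagonal entries 1/4 + (3/4)exp(-4*alpha*t) and off-diagonal entries 1/4 - (1/4)exp(-4*alpha*t).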

Jukes-Cantor model

Spectral decomposition of P(t)

Kimura model

Spectral decomposition of P(t)

Parameter estimation

Jukes-Cantor model
Three things are equivalent due to reversibility:
D1 ← A → D2 (ancestor A with two descendants D1 and D2)
D2 → A → D1
D1 → A → D2

Probability that the nucleotides are different in two descendants
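A sketch of this probability under the continuous-time Jukes-Cantor model, assuming each specific substitution occurs at rate `alpha` and both descendants are separated from the ancestor by time `t` (both symbols are assumptions). By reversibility the two branches can be treated as one path of length 2t, which gives p = (3/4)(1 - exp(-8*alpha*t)). Inverting this yields the familiar Jukes-Cantor distance, the expected number of substitutions per site along that path.

```python
import math


def prob_different(alpha: float, t: float) -> float:
    """Probability that two descendants differ at a site, each at time t from the ancestor."""
    return 0.75 * (1.0 - math.exp(-8.0 * alpha * t))


def jc_distance(p: float) -> float:
    """Expected substitutions per site (6 * alpha * t) recovered from p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The correction matters because p saturates at 3/4 as t grows, while the number of substitutions keeps increasing.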

Estimating p
We have two DNA sequences of length N:
D1: ACAATACAGGGCAGATAGATACAGATAGACACAGACAGAGCAGAGACAG
D2: ACAATACAGGACAGTTAGATACAGATAGACACAGACAGAGCAGAGACAG
p = (number of differences) / N
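The estimate can be computed directly from the two sequences on this slide. The function name `proportion_different` is hypothetical. The sketch assumes the sequences are already aligned and of equal length.

```python
D1 = "ACAATACAGGGCAGATAGATACAGATAGACACAGACAGAGCAGAGACAG"
D2 = "ACAATACAGGACAGTTAGATACAGATAGACACAGACAGAGCAGAGACAG"


def proportion_different(s1: str, s2: str) -> float:
    """Fraction of aligned sites at which the two sequences differ."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(1 for x, y in zip(s1, s2) if x != y)
    return differences / len(s1)


p_hat = proportion_different(D1, D2)
```

For these sequences N = 49 and the sequences differ at 2 sites, so the estimate is 2/49.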

Kimura model
p: probability of two different purines or two different pyrimidines
q: probability of a purine and a pyrimidine
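A sketch of how p and q might be estimated from two aligned sequences by counting sites that differ by a transition or by a transversion. The function name is hypothetical, and the example reuses the two sequences from the Jukes-Cantor estimation slide.

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")

D1 = "ACAATACAGGGCAGATAGATACAGATAGACACAGACAGAGCAGAGACAG"
D2 = "ACAATACAGGACAGTTAGATACAGATAGACACAGACAGAGCAGAGACAG"


def kimura_proportions(s1: str, s2: str) -> tuple[float, float]:
    """Return (p, q): fractions of sites differing by a transition / a transversion."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("sequences must be non-empty and of equal length")
    transitions = transversions = 0
    for x, y in zip(s1.upper(), s2.upper()):
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1    # two different purines or two different pyrimidines
        else:
            transversions += 1  # one purine and one pyrimidine
    n = len(s1)
    return transitions / n, transversions / n


p_hat, q_hat = kimura_proportions(D1, D2)
```

In this pair one difference is G versus A (a transition) and the other is A versus T (a transversion), so both estimates equal 1/49.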