MAT 4830 Mathematical Modeling


MAT 4830 Mathematical Modeling 4.5 Phylogenetic Distances I http://myhome.spu.edu/lauw

Preview Phylogenetic: of or relating to the evolutionary development of organisms. Goal: estimate the total number of mutations (observed and hidden).

Example from 4.1 S0: Ancestral sequence. S1: Descendant of S0. Observed mutations: 2. Actual mutations: 5 (some are hidden mutations).

Distance of Two Sequences We want to define the “distance” between two sequences. It measures the average number of mutations per site that occurred, including the hidden ones.

Distance of Two Sequences Let d(S0,S) be the distance between sequences S0 and S. What properties “should” it have? 1. 2. 3.

Jukes-Cantor Model Assume α is small. Mutations per time step are “rare”.

Jukes-Cantor Model q(t) = conditional probability that the base at time t is the same as the base at time 0

Jukes-Cantor Model q(t) = fraction of sites with no observed mutations

Jukes-Cantor Model p(t) = 1 - q(t) = fraction of sites with observed mutations

Jukes-Cantor Model p can be estimated from the two sequences
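Estimating p amounts to counting the sites at which the two aligned sequences differ and dividing by the sequence length. A minimal Python sketch follows; the function name `observed_fraction` and the two 8-base sequences are hypothetical and are not taken from the slides.

```python
def observed_fraction(s0: str, s1: str) -> float:
    """Fraction of aligned sites at which s0 and s1 differ (an estimate of p)."""
    if len(s0) != len(s1):
        raise ValueError("sequences must have the same length")
    if not s0:
        raise ValueError("sequences must be non-empty")
    # Count the positions where the two bases disagree.
    differing = sum(1 for a, b in zip(s0, s1) if a != b)
    return differing / len(s0)


# Hypothetical sequences for illustration only.
s0 = "ACGTACGT"
s1 = "ACGAACGC"
p_hat = observed_fraction(s0, s1)
print(p_hat)  # 2 differing sites out of 8
```

Note that this counts only observed differences; a site that mutated and then mutated back contributes nothing to the estimate.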

Example from 4.1 Observed mutations: 2

Jukes-Cantor Distance Given p (and t), the J-C distance between two sequences S0 and S1 is defined as $d_{JC}(S_0, S_1) = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$. Why?

Jukes-Cantor Distance
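A sketch of where the logarithm in the Jukes-Cantor distance comes from. It assumes the usual Jukes-Cantor transition matrix (each base stays the same with probability 1 - α per time step and changes to each of the other three bases with probability α/3), which is not reproduced in this transcript. Under that assumption, tα is the expected number of substitutions per site over t steps, and it can be expressed through p alone:

```latex
\begin{align*}
q(t) &= \tfrac14 + \tfrac34\left(1 - \tfrac43\alpha\right)^{t}, \\
p(t) &= 1 - q(t) = \tfrac34 - \tfrac34\left(1 - \tfrac43\alpha\right)^{t}, \\
t &= \frac{\ln\left(1 - \tfrac43 p\right)}{\ln\left(1 - \tfrac43\alpha\right)}, \\
d_{JC} = t\alpha &\approx -\tfrac34 \ln\left(1 - \tfrac43 p\right),
\qquad \text{since } \ln\left(1 - \tfrac43\alpha\right) \approx -\tfrac43\alpha \text{ for small } \alpha.
\end{align*}
```

The last step is where the assumption that α is small is used.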


Example from 4.3 Suppose 40-base ancestral and descendant DNA sequences are given.

Example from 4.3 Observed: 11 substitutions, i.e., 11/40 = 0.275 substitutions per site. Estimated by the Jukes-Cantor distance: 0.3426 substitutions per site, i.e., about 13.7 substitutions over the 40 sites.
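The arithmetic of this example can be checked with a short Python sketch. It assumes the standard Jukes-Cantor distance formula d = -(3/4) ln(1 - (4/3) p); the function name `jukes_cantor_distance` is illustrative only.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor distance for an observed fraction p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 3/4")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


n_sites = 40   # sequence length from the example
observed = 11  # observed substitutions from the example
p = observed / n_sites
d = jukes_cantor_distance(p)

print(round(p, 3))            # 0.275 observed substitutions per site
print(round(d, 4))            # 0.3426 estimated substitutions per site
print(round(d * n_sites, 1))  # 13.7 estimated substitutions in total
```

The estimate exceeds the observed count because it also accounts for hidden substitutions.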

Performance of JC distance (Homework Problem 4) Write a program to simulate the mutations of a sequence for t time steps using the Jukes-Cantor model with parameter α. Count the number of base substitutions that occurred. Compute the Jukes-Cantor distance between the initial and final sequences. Compare the actual number of base substitutions with the estimate from the Jukes-Cantor distance.
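The steps of this problem might be sketched as follows. This is an illustration in Python rather than the course's Maple, and it rests on stated assumptions: at every time step each site mutates independently with probability α, and a mutating site changes to one of the other three bases with equal probability. The function names, the seed, and the parameter values (10,000 sites, α = 0.001, 200 steps) are all hypothetical choices.

```python
import math
import random

BASES = "ACGT"


def simulate_jc(seq, alpha, steps, rng):
    """Mutate seq for `steps` time steps; return (final sequence, substitution count)."""
    current = list(seq)
    substitutions = 0
    for _ in range(steps):
        for i, base in enumerate(current):
            if rng.random() < alpha:
                # Assumed model: switch to one of the three other bases uniformly.
                current[i] = rng.choice([b for b in BASES if b != base])
                substitutions += 1
    return "".join(current), substitutions


def jc_distance(s0, s1):
    """Jukes-Cantor distance computed from the observed fraction of differing sites."""
    p = sum(a != b for a, b in zip(s0, s1)) / len(s0)
    if p >= 0.75:
        raise ValueError("p >= 3/4: Jukes-Cantor distance is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


rng = random.Random(4830)  # fixed seed so the run is repeatable
n_sites, alpha, steps = 10_000, 0.001, 200
initial = "".join(rng.choice(BASES) for _ in range(n_sites))
final, actual = simulate_jc(initial, alpha, steps, rng)

actual_per_site = actual / n_sites
estimated_per_site = jc_distance(initial, final)
print(f"actual substitutions per site:  {actual_per_site:.4f}")
print(f"Jukes-Cantor estimate per site: {estimated_per_site:.4f}")
```

With these parameter choices both numbers should land near tα = 0.2. Shorter sequences or larger tα would be expected to widen the gap between them.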


Maple: Strings Handling II Concatenating two strings

Maple: Strings Handling II However, no “re-assignment”.

Classwork Work on HW #1, 2