Evolutionary Change in Sequences

Presentation transcript:

Evolutionary Change in Sequences
Lecture 2 (Lecture 1 ran slightly long, so a few of its slides are included at the beginning of Lecture 2). Sequences change over long evolutionary times; e.g., mouse and human diverged 100 mya (million years ago). Diverged = last shared a common ancestor. The aim is to study how nucleotide sequences change over time. This requires (mathematical) models, mostly based on simple probability. Species have many of the 'same' genes, but with slightly different sequences; e.g., compare mouse myoglobin to human myoglobin.

Jukes and Cantor one-parameter model
The simplest model of DNA sequence evolution: all substitutions occur with equal probability. There is only one parameter, α, the rate of change of one nucleotide to one particular other nucleotide. 3α = rate of change of a nucleotide to any of the other three.

[Figure: purines A and G, pyrimidines C and T; every arrow between two nucleotides is labeled α]
After one unit of time: probability that A stays as A = 1 − 3α; probability that A changes to T = α.
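The one-step probabilities above can be written as a 4 x 4 matrix and multiplied out to follow a site over several time units. The following is a minimal sketch, assuming discrete time steps and writing alpha for the single substitution rate; the function names are illustrative and do not come from the lecture.

```python
NUCLEOTIDES = "ACGT"  # row/column order used by the matrices below


def jc_step_matrix(alpha):
    """One-time-unit Jukes-Cantor matrix: 1 - 3*alpha on the diagonal, alpha elsewhere."""
    if not 0.0 <= alpha <= 1.0 / 3.0:
        raise ValueError("alpha must lie in [0, 1/3] so that every entry is a probability")
    return [[1.0 - 3.0 * alpha if i == j else alpha for j in range(4)]
            for i in range(4)]


def mat_mul(a, b):
    """Plain matrix product of two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def jc_after(alpha, steps):
    """Substitution probabilities after `steps` time units (repeated one-step products)."""
    result = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    step = jc_step_matrix(alpha)
    for _ in range(steps):
        result = mat_mul(result, step)
    return result


if __name__ == "__main__":
    two_steps = jc_after(0.01, 2)
    a = NUCLEOTIDES.index("A")
    t = NUCLEOTIDES.index("T")
    print("P(A is still A after 2 units):", two_steps[a][a])
    print("P(A has become T after 2 units):", two_steps[a][t])
```

After two time units the probability that A is observed as A includes the case where the site changed and then changed back, which is why it is slightly larger than (1 − 3α) squared.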

Compare the number of nucleotide differences between sequences: fewer differences = shorter time since divergence. There are only four possible states (A, C, T, G), so under this simple model all sequences retain 25% identity at equilibrium. After enough time has passed, relationships can no longer be estimated.
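The approach to 25% identity can be made concrete. Under the one-parameter model in discrete time, two sequences that have each evolved for t time units since their common ancestor are expected to be identical at a fraction 1/4 + 3/4 (1 − 4α)^(2t) of their sites. This closed form is not on the slides; it follows from the one-step probabilities (1 − 3α to stay, α to change), and the function name below is illustrative.

```python
def jc_expected_identity(alpha, t):
    """Expected identity between two sequences, each t time units from their ancestor.

    Assumes the discrete-time one-parameter model with 0 <= alpha <= 1/4,
    so that (1 - 4*alpha) is a non-negative decay factor.
    """
    if not 0.0 <= alpha <= 0.25:
        raise ValueError("this sketch assumes 0 <= alpha <= 1/4")
    if t < 0:
        raise ValueError("t must be non-negative")
    return 0.25 + 0.75 * (1.0 - 4.0 * alpha) ** (2 * t)


if __name__ == "__main__":
    for t in (0, 10, 50, 100, 500):
        print(t, round(jc_expected_identity(0.005, t), 4))
```

The printed values fall from 1.0 towards 0.25 and then stop changing, which is the point at which the number of differences no longer carries information about divergence time.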

Kimura’s two-parameter model
There are two kinds of nucleotide. Purines: A, G. Pyrimidines: C, T. The model says that a purine is more likely to be replaced by another purine than by a pyrimidine (and likewise for pyrimidines). Transitions: replace like with like. Transversions: replace a purine with a pyrimidine, or vice versa. Supported by observation: transitions are more common than transversions.

[Figure: purines A and G, pyrimidines C and T, with arrows between every pair of nucleotides]
Transition (A ↔ G, C ↔ T) = rate of α. Transversion (purine ↔ pyrimidine) = rate of β.
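The two-parameter scheme can be sketched in the same way as the one-parameter model. Here alpha is assumed to label the transition rate and beta the rate of each of the two possible transversions, so the probability of no change in one time unit is 1 − α − 2β. The names below are illustrative and not taken from the lecture.

```python
PURINES = {"A", "G"}
BASES = "ACGT"


def is_transition(x, y):
    """True when x -> y swaps purine for purine or pyrimidine for pyrimidine."""
    return x != y and ((x in PURINES) == (y in PURINES))


def k2p_step_matrix(alpha, beta):
    """One-time-unit matrix for a two-parameter model, as a dict of dicts.

    alpha: probability of the single possible transition per time unit.
    beta:  probability of each of the two possible transversions per time unit.
    """
    if alpha < 0 or beta < 0 or alpha + 2 * beta > 1:
        raise ValueError("need alpha, beta >= 0 and alpha + 2*beta <= 1")
    matrix = {}
    for x in BASES:
        matrix[x] = {}
        for y in BASES:
            if x == y:
                matrix[x][y] = 1.0 - alpha - 2.0 * beta
            elif is_transition(x, y):
                matrix[x][y] = alpha
            else:
                matrix[x][y] = beta
    return matrix


if __name__ == "__main__":
    m = k2p_step_matrix(alpha=0.02, beta=0.005)
    for x in BASES:
        print(x, [round(m[x][y], 3) for y in BASES])
```

Setting alpha equal to beta recovers the one-parameter model, in which every substitution is equally likely.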

Number of substitutions between two DNA sequences
For two sequences of length N, count the number of differences n. n/N = proportion of sites that differ between the sequences (so 1 − n/N is the fraction identity). BUT there is a chance that the same position of the sequence was changed more than once; e.g., if we observe A at position 10 in one sequence and T in the other, the history could be A → T or A → C → T.
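Counting the observed differences between two aligned sequences of equal length can be sketched as follows. The function name and the example sequences are made up for illustration.

```python
def count_differences(seq1, seq2):
    """Return (n, n/N): the number and the proportion of differing sites.

    Assumes the two sequences are already aligned, gap-free, and of equal length N.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    if not seq1:
        raise ValueError("sequences must not be empty")
    n = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return n, n / len(seq1)


if __name__ == "__main__":
    n, proportion = count_differences("ACGTACGTAC", "ACGTTCGTAA")
    print(n, proportion)  # 2 differing sites out of 10
```

This only counts what is visible today; a site that changed twice is counted once at most, and a site that changed back is not counted at all.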

Multiple Hits
[Figure: substitutions accumulating in two sequences over time] 8 substitution events (arrows), but only 6 differences (dots).

Multiple Hits
When the degree of divergence is high, we observe n differences, but these are the result of >> n changes. By simply counting the differences, one can greatly underestimate the amount of divergence between the sequences.
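A standard way to correct for multiple hits under the one-parameter model is the Jukes-Cantor distance, d = −(3/4) ln(1 − (4/3) p), where p is the observed proportion of differing sites. The formula is not given on the slides; the sketch below is offered as an illustration, and the function name is hypothetical.

```python
import math


def jc_distance(p):
    """Estimated substitutions per site from the observed proportion of differences p.

    Uses the Jukes-Cantor correction d = -(3/4) * ln(1 - 4p/3), which is only
    defined for p < 0.75 (the proportion of differences expected at equilibrium).
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    for p in (0.01, 0.1, 0.3, 0.6, 0.7):
        print(p, round(jc_distance(p), 3))
```

For small p the corrected distance is close to p itself; as p approaches 0.75 the estimate grows without bound, which reflects how little information remains once sequences are saturated with changes.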

[Figure: number of differences (y-axis) against time (x-axis); two curves, "Expected (molecular clock)" and "Observed"]

Violations of assumptions of models
The rate of substitution is not always the same for all sites. Some sites are not independent: interacting sites may require complementary mutations (e.g., in a hairpin structure, or in a protein's 3D structure).

CpG dinucleotides
An example of particular sites with a high mutation rate. CpG = C followed by G in the 5’ – 3’ direction. This dinucleotide is particularly susceptible to methylation (addition of a CH3 group) of the C. The methylated C is easily deaminated to T (thymine), which results in a T·G mismatch. The mismatch may be corrected back to CG, or to TA.
5’ ...T G... 3’
3’ ...G C... 5’
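One commonly used way to see the effect of this elevated mutation rate in a sequence is to compare the observed number of CpG dinucleotides with the number expected from the C and G counts alone. The sketch below uses the observed/expected ratio (CpG count x N) / (C count x G count); this measure is not part of the lecture, and the function name and example sequences are illustrative.

```python
def cpg_observed_expected(seq):
    """Observed/expected CpG ratio on one strand: (#CG * N) / (#C * #G).

    Returns 0.0 when the sequence has no C or no G, since no CpG is possible.
    """
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return cpg_count * n / (c_count * g_count)


if __name__ == "__main__":
    print(cpg_observed_expected("ACGTCGAA"))
    print(cpg_observed_expected("ACCTGGAA"))
```

A ratio well below 1 means CpG occurs less often than the base composition alone would predict, which is what loss of methylated C to T over time would produce.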

Molecular Coevolution (Inter-protein)

Molecular Coevolution (Intra-protein)
Amino acids within proteins do not evolve in isolation; rather, they form part of complex intra-protein evolutionary units.

Coevolution leads to phylogenetic mirroring

Coevolution results from coadaptation

Models of coevolution
Models of molecular covariation help us understand how proteins evolve and identify residues that may be functionally or structurally linked.
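As one simple illustration of a covariation measure (not necessarily the kind of model the slides refer to), the mutual information between two columns of a multiple sequence alignment is high when the residue at one position predicts the residue at the other. The function name and the toy columns below are hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    col_a and col_b hold the residues of the same sequences, in the same order,
    at two different alignment positions.
    """
    if len(col_a) != len(col_b) or len(col_a) == 0:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_joint = count / n
        p_independent = (freq_a[a] / n) * (freq_b[b] / n)
        mi += p_joint * math.log2(p_joint / p_independent)
    return mi


if __name__ == "__main__":
    # Toy columns: in the first pair the residues change together.
    print(column_mutual_information("AAGG", "TTCC"))
    # In the second pair they vary independently of each other.
    print(column_mutual_information("AAGG", "TCTC"))
```

Raw mutual information is also inflated by shared ancestry among the sequences, so in practice a high score on its own does not establish a functional or structural link.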