Molecular evidence for endosymbiosis


1 Molecular evidence for endosymbiosis. Perform blastp to investigate sequence similarity among the domains of life. Yeast nuclear genes show more sequence similarity (closer in evolutionary time) with archaeal genes. Yeast mitochondrial genes show more sequence similarity with eubacterial genes.

2 t-test and significance. The t-test determines whether two data sets come from the same population or differ significantly. Calculate the mean and standard deviation of each data set, then derive a weighted (pooled) standard deviation to use in the t-test. Compare the resulting t value to the t-critical value obtained from a t-table or software.
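The calculation described on this slide can be sketched as follows. This is a minimal illustration, assuming two independent samples with roughly equal variances; the function name `pooled_t` and the example data are hypothetical and not taken from the lecture.

```python
import math
from statistics import mean, stdev

def pooled_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for a two-sample pooled t-test."""
    na, nb = len(sample_a), len(sample_b)
    sa, sb = stdev(sample_a), stdev(sample_b)  # sample standard deviations
    # Weighted (pooled) standard deviation: each variance is weighted by its df.
    sp = math.sqrt(((na - 1) * sa**2 + (nb - 1) * sb**2) / (na + nb - 2))
    t = (mean(sample_a) - mean(sample_b)) / (sp * math.sqrt(1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical data, for illustration only.
t_value, df = pooled_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t_value, df)
```

The absolute value of `t_value` is then compared with the t-critical value for `df` degrees of freedom. For 8 degrees of freedom and a two-tailed test at the 0.05 level, the tabulated value is 2.306.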

3 Origins of eukaryotic cells

4 Martin-Muller hypothesis

5 Evidence from phylogenetic relationships

6 Leprae vs. tuberculosis. The Mycobacterium leprae genome (3.2 Mb) is ~50% coding, in contrast to 4.4 Mb and 91% coding for Mycobacterium tuberculosis. Comparing genomes using Mummer: http://www.tigr.org/tigr-scripts/CMR2/webmum/mumplot

7 How Mummer works: Uses suffix trees to create an internal representation of a genome sequence. Identifies maximal unique matches (MUMs); version 1.0 adds sequence 2 to the suffix tree built for sequence 1, whereas version 2.0 streams the second sequence against the tree. Alignment via Smith-Waterman.
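To make the idea of a maximal unique match concrete, here is a brute-force sketch. It is not how Mummer works internally (Mummer uses suffix trees for speed); it only illustrates the definition: a match that cannot be extended in either direction and that occurs exactly once in each sequence. The function names, the `min_len` parameter, and the example sequences are hypothetical.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count

def find_mums(ref, query, min_len=4):
    """Return (ref_start, query_start, length) for each maximal unique match."""
    mums = []
    for i in range(len(ref)):
        for j in range(len(query)):
            if ref[i] != query[j]:
                continue
            # Skip if the match could be extended to the left (not left-maximal).
            if i > 0 and j > 0 and ref[i - 1] == query[j - 1]:
                continue
            # Extend to the right as far as the sequences agree.
            k = 0
            while i + k < len(ref) and j + k < len(query) and ref[i + k] == query[j + k]:
                k += 1
            if k < min_len:
                continue
            match = ref[i:i + k]
            # Keep only matches that are unique in both sequences.
            if count_occurrences(ref, match) == 1 and count_occurrences(query, match) == 1:
                mums.append((i, j, k))
    return mums

print(find_mums("GATTACAGG", "CCATTACATT"))
```

This nested-loop version takes time roughly proportional to the product of the sequence lengths, which is why genome-scale tools rely on suffix trees instead.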

8 Origin of species Mitochondrial DNA and human evolution Evolution of pathogens

9 Phylogeny – data mining by biologists. Molecular phylogenetics uses clustering techniques to discern relationships between different biological sequences.

10 Why phylogenetics? Understand evolutionary history; map pathogen strain diversity for vaccines; assist in epidemiology (dentist and HIV); aid in predicting the function of novel genes; biodiversity; microbial ecology.

11 Changes can occur

12 Observing differences in nucleotides. The simplest measure of distance between two sequences is to count the number of sites where the two sequences differ. Because not all sites are equally likely to change, the same site may undergo repeated substitutions. As time goes by, the number of observed differences between two sequences becomes a less and less accurate estimator of the actual number of substitutions that have occurred.
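The simple count described on this slide, expressed as a proportion of sites, can be sketched as below. The function name `p_distance` and the example sequences are hypothetical; the sketch assumes the two sequences are already aligned and of equal length.

```python
def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return differences / len(seq1)

# Two of the eight aligned sites differ.
print(p_distance("ACGTACGT", "ACGTTCGA"))
```

Because repeated substitutions at one site are counted at most once, this proportion underestimates the true number of substitutions for distantly related sequences.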

13 The relationship between time and substitutions is non-linear

14 Various models have been developed to estimate distance and evolution more accurately. All use the following framework: a probability matrix, where $p_{AC}$ is the probability that a site starting with an A has a C at the end of time interval $t$, etc.; and the base composition of the sequence, where $f_A$ = frequency of A.

15 Jukes-Cantor model. Distance between any two sequences is given by $d = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$, where $p$ is the proportion of nucleotides that differ between the two sequences. All substitutions are equally probable: each off-diagonal position in the probability matrix = $\alpha$; each diagonal position = $1 - 3\alpha$.
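The Jukes-Cantor correction can be sketched directly from the formula on this slide. The function name `jukes_cantor` is hypothetical; the input `p` is the observed proportion of differing sites.

```python
import math

def jukes_cantor(p):
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4/3 * p)."""
    if not 0 <= p < 0.75:
        # At p >= 0.75 the logarithm is undefined: the sequences are saturated.
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - (4.0 / 3.0) * p)

for p in (0.05, 0.1, 0.3, 0.6):
    print(p, round(jukes_cantor(p), 4))
```

The corrected distance is always at least as large as `p`, and the gap widens as `p` grows, which reflects the non-linear relationship between time and observed substitutions.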

16 Kimura's two-parameter model. $d = \frac{1}{2}\ln\left[\frac{1}{1 - 2P - Q}\right] + \frac{1}{4}\ln\left[\frac{1}{1 - 2Q}\right]$, where $P$ and $Q$ are the proportions of sites that differ between the two sequences due to transitions and transversions, respectively. Accounts for transition bias in sequences (transversions are rarer than transitions).
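A minimal sketch of the two-parameter distance follows, assuming two aligned DNA sequences of equal length containing only A, C, G, and T. The function name `kimura_2p` and the example sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def kimura_2p(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        same_class = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
        if same_class:
            transitions += 1    # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    P = transitions / len(seq1)
    Q = transversions / len(seq1)
    return 0.5 * math.log(1 / (1 - 2 * P - Q)) + 0.25 * math.log(1 / (1 - 2 * Q))

# One transition (A->G) and one transversion (A->C) in ten sites.
print(kimura_2p("AAAAAAAAAA", "GAAAAAAAAC"))
```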

17 Evolutionary models

18 Implementing models and building trees

19 Rooted vs. unrooted. Root: the ancestor of all taxa considered. Unrooted: relationships shown without consideration of ancestry. The root is often specified with an outgroup. Outgroup: a distantly related species (e.g., an archaeal species as the outgroup for a set of mammals).

20 Tree building: (1) get protein/RNA/DNA sequences; (2) construct a multiple sequence alignment; (3) compute pairwise distances (if necessary); (4) build the tree (topology and distances); (5) estimate reliability; (6) visualize.

21 Distance methods: UPGMA, neighbor joining

22 Unweighted pair-group method using arithmetic averages (UPGMA). Assumes a constant rate of substitution (evolution). A clustering algorithm that measures the distances between all sequences, merges the closest pair, recalculates distances to the new node as averages, then merges the next closest pair, and repeats until one cluster remains. Usually gives a rooted tree.
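The clustering loop described on this slide can be sketched as follows. This is an illustrative implementation under simple assumptions (a full symmetric distance matrix given as a list of lists, ties broken arbitrarily); the function name `upgma`, the nested-tuple tree format, and the example distances are hypothetical.

```python
def upgma(names, dist):
    """Cluster taxa by UPGMA; return (nested-tuple tree, root height)."""
    n = len(names)
    # Each cluster id maps to (subtree, number of leaves).
    clusters = {i: (names[i], 1) for i in range(n)}
    # Pairwise distances keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    height = 0.0
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        (a, b), d_ab = min(d.items(), key=lambda item: item[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        del d[(a, b)]
        # Distance from the new cluster to each other cluster is the
        # size-weighted average, i.e. the mean over all leaf pairs.
        for c in clusters:
            d_ac = d.pop((min(a, c), max(a, c)))
            d_bc = d.pop((min(b, c), max(b, c)))
            d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        height = d_ab / 2  # the new node sits halfway between the merged clusters
        next_id += 1
    (tree, _size), = clusters.values()
    return tree, height

# Hypothetical distances among four taxa.
names = ["A", "B", "C", "D"]
dist = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
print(upgma(names, dist))
```

Placing each new node at half the merge distance is what encodes the constant-rate assumption: every leaf ends up equally far from the root.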

23 Testing the reliability of trees: interior branch test or bootstrap analysis. Bootstrap analysis: resample the alignment (subsequences, or sequence deletion or replacement), redraw the trees, and count how many times the same branching is recovered. Bootstrap values of 70 (95) or greater are normally considered reliable.
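One common form of the resampling step is to draw alignment columns with replacement, which can be sketched as below. The function name `bootstrap_alignment` and the toy alignment are hypothetical, and the tree-building step that would follow each resample is left out.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    n_sites = len(alignment[0])
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    # Apply the same sampled columns to every sequence so sites stay aligned.
    return ["".join(seq[c] for c in columns) for seq in alignment]

alignment = ["ACGTAC", "ACGTTC", "ATGTAC"]  # toy alignment, illustration only
rng = random.Random(0)                      # seeded for repeatability
for _ in range(3):
    print(bootstrap_alignment(alignment, rng))
```

A tree is then rebuilt from each resampled alignment, and the bootstrap value of a branch is the percentage of those trees in which the branch appears.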

24 Homework due on 10/6: Discovery Questions in Chapter 2: 4, 25-27

