Molecular evidence for endosymbiosis

Presentation transcript:

Molecular evidence for endosymbiosis
- Perform blastp to investigate sequence similarity among the domains of life
- Yeast nuclear genes exhibit more sequence similarity (closer in evolutionary time) with archaeal genes
- Yeast mitochondrial genes exhibit more sequence similarity with eubacterial genes

t-test and significance
- A t-test determines whether two data sets come from the same population or differ significantly
- Calculate the mean and standard deviation of each data set, then derive a pooled (weighted) standard deviation to use in the t statistic
- Compare the result to the critical t value obtained from a t-table or software
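The calculation on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming two independent samples with similar variances. The function name `pooled_t_statistic` and the example numbers are hypothetical and do not come from the lecture.

```python
import math
from statistics import mean, stdev


def pooled_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for two independent samples.

    Uses a pooled standard deviation, i.e. the two sample variances
    weighted by their degrees of freedom.
    """
    n1, n2 = len(a), len(b)
    s1, s2 = stdev(a), stdev(b)  # sample standard deviations (n - 1 denominator)
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical values for illustration only (not real blastp results).
group_1 = [1, 2, 3, 4, 5]
group_2 = [2, 3, 4, 5, 6]
t, df = pooled_t_statistic(group_1, group_2)
print(t, df)
```

The absolute value of `t` is then compared with the critical value for `df` degrees of freedom at the chosen significance level.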

Origins of eukaryotic cells

Martin and Müller hypothesis

Evidence from phylogenetic relationships

Mycobacterium leprae vs. M. tuberculosis
- The M. leprae genome (3.2 Mb) is ~50% coding, in contrast to 4.4 Mb and 91% coding for M. tuberculosis
- Comparing genomes using MUMmer: scripts/CMR2/webmum/mumplot

How MUMmer works
- Uses suffix trees to create an internal representation of a genome sequence
- Identifies maximal unique matches (MUMs); version 2.0 streams sequence 2 against the suffix tree for sequence 1, whereas version 1.0 adds sequence 2 to the suffix tree for sequence 1
- Alignment via Smith-Waterman
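To make the idea of a maximal unique match concrete, here is a brute-force sketch. It is not how MUMmer works internally: MUMmer uses suffix trees, whereas this version simply compares every pair of positions, which is practical only for short strings. The function names and the `min_len` parameter are assumptions made for illustration.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def maximal_unique_matches(a, b, min_len=3):
    """Return (pos_in_a, pos_in_b, match) for every MUM of length >= min_len."""
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip starts that could be extended to the left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the sequences agree.
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            match = a[i:i + k]
            # Keep the match only if it occurs exactly once in each sequence.
            if (k >= min_len
                    and count_occurrences(a, match) == 1
                    and count_occurrences(b, match) == 1):
                mums.append((i, j, match))
    return mums


print(maximal_unique_matches("TTGACCA", "GGGACCT"))
```

A suffix tree finds the same matches far more efficiently, which is what makes whole-genome comparison feasible.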

- Origin of species
- Mitochondrial DNA and human evolution
- Evolution of pathogens

Phylogeny: data mining by biologists
- Molecular phylogenetics uses clustering techniques to discern relationships between different biological sequences

Why phylogenetics?
- Understand evolutionary history
- Map pathogen strain diversity for vaccines
- Assist in epidemiology (Dentist and HIV)
- Aid in prediction of function of novel genes
- Biodiversity
- Microbial ecology

Changes can occur

Observing differences in nucleotides
- The simplest measure of distance between two sequences is the number of sites at which the two sequences differ
- Because not all sites are equally likely to change, the same site may undergo repeated substitutions
- As time goes by, the number of observed differences between two sequences becomes a less and less accurate estimator of the actual number of substitutions that have occurred
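The simplest distance described above, the proportion of differing sites, can be computed as follows. This is a minimal sketch that assumes the two sequences are already aligned, have equal length, and contain no gaps. The name `p_distance` is chosen here for illustration.

```python
def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    if not seq1:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for x, y in zip(seq1, seq2) if x != y)
    return differences / len(seq1)


# Two of the eight sites differ.
print(p_distance("ACGTACGT", "ACGTTCGA"))  # 0.25
```

This raw proportion undercounts substitutions when a site has changed more than once, which is the motivation for the corrected distances that follow.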

The relationship between time and substitutions is non-linear

Various models have been generated to estimate distance and evolution more accurately
- All use the following framework:
  - Probability matrix: p_AC is the probability that a site starting as an A is a C at the end of time interval t, etc.
  - Base composition of the sequence: f_A = frequency of A
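A small sketch may help show what such a probability matrix looks like over time. It assumes the simplest possible case, in which every substitution has the same probability `alpha` per time step. The value 0.001 and all function names are arbitrary choices for illustration, not taken from the lecture.

```python
BASES = "ACGT"


def one_step_matrix(alpha):
    """4x4 matrix for one time step: alpha off the diagonal, 1 - 3*alpha on it."""
    return [[1 - 3 * alpha if i == j else alpha for j in range(4)]
            for i in range(4)]


def multiply(x, y):
    """Plain 4x4 matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def matrix_after(alpha, t):
    """Probability matrix after t time steps (the one-step matrix to the power t)."""
    step = one_step_matrix(alpha)
    result = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for _ in range(t):
        result = multiply(result, step)
    return result


def probability(matrix, start, end):
    """Look up p_(start,end), e.g. probability(m, "A", "C") for p_AC."""
    return matrix[BASES.index(start)][BASES.index(end)]


m = matrix_after(0.001, 100)
print(probability(m, "A", "C"), probability(m, "A", "A"))
```

As `t` grows, every entry approaches 0.25, so the starting base is gradually forgotten.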

Jukes-Cantor model
- Distance between any two sequences is given by: d = -(3/4) ln(1 - (4/3)p)
- p is the proportion of nucleotides that differ between the two sequences
- All substitutions are equally probable
  - Each off-diagonal position in the matrix = α; each diagonal position = 1 - 3α
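The Jukes-Cantor correction is a one-line formula, sketched here under the assumption that p has already been measured from an alignment. The function name is illustrative.

```python
import math


def jukes_cantor_distance(p):
    """Jukes-Cantor distance d = -(3/4) ln(1 - (4/3) p).

    p is the observed proportion of differing sites. The formula is
    undefined for p >= 0.75, where the sequences are saturated.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


for p in (0.05, 0.25, 0.50):
    print(p, round(jukes_cantor_distance(p), 4))
```

The corrected distance is always at least as large as p, and the gap widens as p grows, reflecting the repeated substitutions that a raw count misses.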

Kimura's two-parameter model
- d = (1/2) ln[1/(1 - 2P - Q)] + (1/4) ln[1/(1 - 2Q)]
- P and Q are the proportions of sites that differ between the two sequences due to transitions and transversions, respectively
- Accounts for transition bias in sequences (transversions are rarer than transitions)
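The following sketch counts transitions and transversions in a pair of aligned sequences and applies Kimura's formula. It assumes uppercase DNA with only A, C, G, and T and no gaps. The function name `k2p_distance` is an illustrative choice.

```python
import math

PURINES = {"A", "G"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    transitions = transversions = 0
    for x, y in zip(seq1, seq2):
        if x == y:
            continue
        # Purine <-> purine or pyrimidine <-> pyrimidine is a transition.
        if (x in PURINES) == (y in PURINES):
            transitions += 1
        else:
            transversions += 1
    P = transitions / len(seq1)
    Q = transversions / len(seq1)
    return 0.5 * math.log(1 / (1 - 2 * P - Q)) + 0.25 * math.log(1 / (1 - 2 * Q))


# One transition (A -> G) and one transversion (A -> C) in ten sites.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAC"), 4))
```

Here P = Q = 0.1, so the uncorrected distance would be 0.2 and the Kimura estimate is slightly larger.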

Evolutionary models

Implementing models and building trees

Rooted vs. unrooted
- Rooted: the root is the ancestor of all taxa considered
- Unrooted: shows relationships without consideration of ancestry
- The root is often specified with an outgroup
  - Outgroup: a distantly related species (e.g., an archaeal species when the taxa of interest are mammals)

Tree building
1. Get protein/RNA/DNA sequences
2. Construct multiple sequence alignment
3. Compute pairwise distances (if necessary)
4. Build tree: topology and distances
5. Estimate reliability
6. Visualize

Distance methods
- UPGMA
- Neighbor joining

Unweighted pair-group method using arithmetic averages (UPGMA)
- Assumes a constant rate of substitution across lineages (a molecular clock)
- Clustering algorithm: measure the distances between all sequences, merge the closest pair, recalculate the distances to that new node as averages, then merge the next closest pair; repeat until one cluster remains
- Usually gives a rooted tree
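The clustering loop described above can be sketched as follows. This is a minimal illustration, assuming a symmetric distance matrix given as a list of lists. The function name, the nested-tuple tree representation, and the example distances are all hypothetical.

```python
def upgma(names, matrix):
    """Cluster taxa by UPGMA.

    Returns (tree, merges): tree is a nested tuple of taxon names, and
    merges lists (left, right, node_height) in the order clusters joined.
    """
    clusters = [(name, 1) for name in names]  # (label, number of taxa)
    d = [row[:] for row in matrix]
    merges = []
    while len(clusters) > 1:
        n = len(clusters)
        # Find the closest pair of clusters.
        i, j = min(((r, c) for r in range(n) for c in range(r + 1, n)),
                   key=lambda rc: d[rc[0]][rc[1]])
        (left, size_l), (right, size_r) = clusters[i], clusters[j]
        merges.append((left, right, d[i][j] / 2))
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the merged cluster to each remaining cluster:
        # average weighted by cluster size, so every taxon counts equally.
        new_row = [(size_l * d[i][k] + size_r * d[j][k]) / (size_l + size_r)
                   for k in keep]
        d = [[d[r][c] for c in keep] for r in keep]
        for row, value in zip(d, new_row):
            row.append(value)
        d.append(new_row + [0.0])
        clusters = [clusters[k] for k in keep] + [((left, right), size_l + size_r)]
    return clusters[0][0], merges


# Hypothetical distances between four taxa.
names = ["A", "B", "C", "D"]
matrix = [[0, 2, 4, 6],
          [2, 0, 4, 6],
          [4, 4, 0, 6],
          [6, 6, 6, 0]]
tree, merges = upgma(names, matrix)
print(tree)
print(merges)
```

Each node height is half the distance between the clusters it joins, which is where the constant-rate assumption enters: both descendants are placed equally far from their ancestor.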

Testing the reliability of trees
- Interior branch test or bootstrap analysis
- Bootstrap analysis: resample the alignment (sites are deleted or replaced at random), redraw the tree from each resampled data set, and count how many times the same branching appears
- Bootstrap values of 70 (95) or greater are normally considered reliable
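The resampling step of a bootstrap analysis can be sketched as below. To keep the example short it does not rebuild a full tree for each replicate. As a stated simplification, it asks only whether the closest pair of sequences stays the same, using the proportion of differing sites as the distance. The function names, the seed, and the example alignment are assumptions for illustration.

```python
import random
from itertools import combinations


def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(1 for x, y in zip(seq1, seq2) if x != y) / len(seq1)


def closest_pair(alignment):
    """Names of the two sequences with the smallest p-distance."""
    return min(combinations(sorted(alignment), 2),
               key=lambda pair: p_distance(alignment[pair[0]], alignment[pair[1]]))


def bootstrap_support(alignment, replicates=100, seed=0):
    """Percentage of replicates in which the observed closest pair recurs."""
    rng = random.Random(seed)
    length = len(next(iter(alignment.values())))
    observed = closest_pair(alignment)
    hits = 0
    for _ in range(replicates):
        # Sample alignment columns with replacement.
        columns = [rng.randrange(length) for _ in range(length)]
        resampled = {name: "".join(seq[c] for c in columns)
                     for name, seq in alignment.items()}
        if closest_pair(resampled) == observed:
            hits += 1
    return observed, 100 * hits / replicates


# Hypothetical aligned sequences.
alignment = {"A": "ACGTACGTAC", "B": "ACGTACGTTC", "C": "TCGAACCTTG"}
print(bootstrap_support(alignment))
```

In a real analysis each replicate would be passed through the full tree-building method, and support would be recorded for every interior branch.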

Homework due on 10/6
- Discovery questions in Chapter 2: 4, 25-27