Phylogenetic trees

Phylogenetic Inference
- Taxonomy and phylogenetics
- Phylogenetic trees
- Cladistic versus phenetic analyses
- Homology and homoplasy
- Model of sequence evolution
- Tree building methods
- Phylogenetic networks
- Computer software and demos
DNA/RNA overview

Classifying Organisms
Nomenclature is the science of naming organisms.
- Names allow us to talk about groups of organisms.
- Scientific names were originally descriptive phrases, which was not practical.
Binomial nomenclature
- Developed by Linnaeus, a Swedish naturalist.
- Names are in Latin, formerly the language of science.
- Binomials are names consisting of two parts: the generic name is a noun and the epithet is a descriptive adjective.
- Thus a species' name is two words, e.g. Homo sapiens.
[Image: Carolus Linnaeus (1707-1778)]

Classifying Organisms
Taxonomy is the science of the classification of organisms.
- Taxonomy deals with the naming and ordering of taxa.
- The Linnaean hierarchy:
  1. Kingdom
  2. Division (called Phylum in zoology)
  3. Class
  4. Order
  5. Family
  6. Genus
  7. Species
- The difference between classification and identification
[Figure label: evolutionary distance]

Classifying Organisms
Systematics is the science of how organisms are related and of the evidence for those relationships.
- Systematics is divided primarily into phylogenetics and taxonomy.
- Speciation: the origin of new species from previously existing ones
  - anagenesis: one species changes into another over time
  - cladogenesis: one species splits to make two
[Figure labels: reconstruct evolutionary history; phylogeny]

Phylogenetics
Phylogenetics is the science of the pattern of evolution.
A. Evolutionary biology is the study of the processes that generate diversity, while phylogenetics is the study of the pattern of diversity produced by those processes.
B. The central problem of phylogenetics:
   1. How do we determine the relationships between species?
   2. Use evidence from shared characteristics, not differences
   3. Use homologies, not analogies
   4. Use the derived condition, not the ancestral one
      a. synapomorphy: shared derived characteristic
      b. plesiomorphy: ancestral characteristic
C. Cladistics is phylogenetics based on synapomorphies.
   1. Cladistic classification creates and names taxa based only on synapomorphies.
   2. This is the principle of monophyly
   3. monophyletic, paraphyletic, polyphyletic
   4. Cladistics is now the preferred approach to phylogeny
[Figure: The phylogeny and classification of life as proposed by Haeckel (1866)]

Phylogenetics
- Evolutionary theory states that groups of similar organisms are descended from a common ancestor.
- Phylogenetic systematics is a method of taxonomic classification that groups organisms according to their evolutionary history.
- It was developed by Hennig, a German entomologist, in 1950.
[Image: Willi Hennig (1913-1976)]

Phylogenetics
Who uses phylogenetics? Some examples:
- Evolutionary biologists (e.g. reconstructing the tree of life)
- Systematists (e.g. classification of groups)
- Anthropologists (e.g. origin of human populations)
- Forensics (e.g. transmission of HIV to a rape victim)
- Parasitologists (e.g. phylogeny of parasites, co-evolution)
- Epidemiologists (e.g. reconstruction of disease transmission)
- Genomics/Proteomics (e.g. homology comparison of new proteins)

Phylogenetic trees
The central problem of phylogenetics: how do we determine the relationships between taxa?
In phylogenetic studies, the most convenient way of presenting evolutionary relationships among a group of organisms is the phylogenetic tree.

Phylogenetic trees
- Node: a branchpoint in a tree (a presumed ancestral OTU)
- Branch: defines the relationship between the taxa in terms of descent and ancestry
- Topology: the branching pattern of the tree
- Branch length (scaled trees only): represents the number of changes that have occurred in the branch
- Root: the common ancestor of all taxa
- Clade: a group of two or more taxa or DNA sequences that includes both their common ancestor and all of its descendants
- Operational Taxonomic Unit (OTU): the taxonomic level of sampling selected by the user for a study, such as individuals, populations, species, genera, or bacterial strains
[Figure labels: branch, node, clade, root]
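
The terms above can be made concrete with a small sketch. It assumes a rooted tree stored as nested Python tuples, where a leaf is a string naming an OTU and an internal node is a tuple of child subtrees. The taxa and the helper names (`leaves`, `clades`, `is_monophyletic`, `is_bifurcating`) are hypothetical and chosen only for illustration; branch lengths are not modelled.

```python
# A rooted tree as nested tuples: a leaf is a string (an OTU), an internal
# node is a tuple of its child subtrees. The taxa are illustrative only.
tree = ((("human", "chimp"), "mouse"), ("chicken", "frog"))


def leaves(node):
    """Return the set of OTU names found below a node."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def clades(node):
    """Return the leaf set of every internal node (each is one clade)."""
    if isinstance(node, str):
        return []
    found = [frozenset(leaves(node))]
    for child in node:
        found.extend(clades(child))
    return found


def is_monophyletic(tree, group):
    """True if `group` is exactly the set of descendants of some node."""
    return frozenset(group) in set(clades(tree))


def is_bifurcating(node):
    """True if every internal node has exactly two children."""
    if isinstance(node, str):
        return True
    return len(node) == 2 and all(is_bifurcating(child) for child in node)


print(is_monophyletic(tree, {"human", "chimp"}))   # a node covers exactly these two
print(is_monophyletic(tree, {"human", "mouse"}))   # chimp is missing from the group
print(is_bifurcating(tree))
```

In this representation the topology is the nesting itself, and a multifurcation would simply be a tuple with more than two children.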

Phylogenetic trees
There are many ways of drawing a tree.
[Figure: different drawings of the same tree, shown as equivalent]

Phylogenetic trees
There are many ways of drawing a tree: bifurcation versus multifurcation (e.g. trifurcation).
- Multifurcation (also called polytomy): a node in a tree that connects more than three branches. If the tree is rooted, then one of the branches represents an ancestral lineage and the remaining branches represent descendant lineages.
- Soft multifurcation: represents a lack of resolution because too few data are available for inferring the phylogeny.
- Hard multifurcation: represents the hypothesized simultaneous splitting of several lineages.
[Figure: a bifurcation and a trifurcation]

Phylogenetic trees
Trees can be rooted or unrooted.

Summary
Trees can be scaled or unscaled (with or without branch lengths).

Phylogenetic trees
Exercise: for each tree, decide whether it is rooted or unrooted, and scaled or unscaled.
[Figure: example trees]

Phylogenetic trees
Possible evolutionary trees
Taxa (n)   Unrooted / rooted
2          1 / 1
3          1 / 3
4          3 / 15

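Phylogenetic trees
Possible evolutionary trees
Rooted: $\frac{(2n-3)!}{2^{n-2}\,(n-2)!}$
Unrooted: $\frac{(2n-5)!}{2^{n-3}\,(n-3)!}$ (for $n \ge 3$; a single tree for $n = 2$)
Taxa (n)   Rooted        Unrooted
2          1             1
3          3             1
4          15            3
5          105           15
6          945           105
7          10,395        945
8          135,135       10,395
9          2,027,025     135,135
10         34,459,425    2,027,025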
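
The counts for strictly bifurcating trees can be checked directly from the two formulas. The sketch below is only an illustration; the function names `rooted_trees` and `unrooted_trees` are hypothetical, and the value for two taxa in the unrooted case is set by hand because the formula is not defined there.

```python
from math import factorial


def rooted_trees(n):
    """Number of rooted bifurcating trees for n taxa: (2n-3)! / (2^(n-2) (n-2)!)."""
    if n < 2:
        raise ValueError("need at least two taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def unrooted_trees(n):
    """Number of unrooted bifurcating trees for n taxa: (2n-5)! / (2^(n-3) (n-3)!)."""
    if n < 2:
        raise ValueError("need at least two taxa")
    if n == 2:
        return 1  # the formula is undefined for n = 2; two taxa give one tree
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in range(2, 11):
    print(n, rooted_trees(n), unrooted_trees(n))
```

The number of unrooted trees for n taxa equals the number of rooted trees for n - 1 taxa, which is why searching all topologies exhaustively becomes impractical beyond a handful of sequences.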

Phylogenetic trees
Rooting using outgroup(s)
- The outgroup should be a taxon known to be less closely related to the rest of the taxa (the ingroup) than those taxa are to each other.
- It should ideally be as closely related as possible to the rest of the taxa while still satisfying the above condition.
- The root must be somewhere between the outgroup and the ingroup (either on a node or in a branch).
- Note: the outgroup is not the root or the ancestor itself!

Phylogenetics
What are useful characters? Use homologies, not analogies!
- Homology: common ancestry of two or more character states
- Analogy: similarity of character states not due to shared ancestry
- Homoplasy: a collection of phenomena that leads to similarities in character states for reasons other than inheritance from a common ancestor (e.g. convergence, parallelism, reversal)
Homoplasy is a huge problem in morphological data sets, and in molecular data sets too!
[Figure: Cactaceae and Euphorbiaceae]

Phylogenetics
Molecular data and homoplasy: orthologs vs. paralogs
- When comparing gene sequences, it is important to distinguish between genes that are the same gene in different organisms and genes that are merely similar.
- Orthologs are homologous genes in different species that diverged through speciation; they typically retain the same function.
- Paralogs are homologous genes that are the result of a gene duplication.
- A phylogeny that includes both orthologs and paralogs is likely to be incorrect.
- Sometimes phylogenetic analysis is the best way to determine whether a new gene is an ortholog or a paralog of other known genes.

Phylogenetic methods
Cladistics versus phenetics
Within the field of taxonomy there are two different methods and philosophies of building phylogenetic trees: cladistic and phenetic.
- Both phenetic and cladistic methods rely on data (objective methods):
  - evolution is descent with modification, so the characteristics of organisms hold information about evolutionary relationships
  - objective analysis of character variation is the foundation of modern phylogenetics
- Phenetic methods construct trees (phenograms) by considering the current states of characters without regard to the evolutionary history that brought the species to their current phenotypes.
- Cladistic methods construct trees (cladograms) that rely on assumptions about ancestral relationships as well as on current data.

Phenetics vs. cladistics An example

Phenetics vs. cladistics An example Three hypotheses:

Phenetics vs. cladistics
Phenetics: overall similarity

Phenetics vs. cladistics
Cladistics: shared derived characters

Phenetics vs. cladistics
The difference between the methods is more than academic: consider how different hypotheses might affect a search for natural products.
[Figure: phenetic and cladistic trees side by side]

Phenetics vs. cladistics
Phenetics
- Relies on character data
- Faster algorithms
- Popular for molecular evolution
- Constructs phenograms without recourse to history
- Employs distance methods
- Each character difference is counted equally; large changes have large effects
Cladistics
- Relies on knowledge of ancestral relationships
- Good for physical traits
- Good for deeper levels of taxonomy
- All assumptions are difficult to satisfy for molecular data
- Constructs a cladogram by considering possible evolutionary pathways
- Must specify ancestral and derived sequences
Cladistics is becoming the method of choice. It is considered to be more powerful and to provide more realistic estimates; however, it is slower than phenetic algorithms.

Phylogenetics
Genes vs. species
- Relationships calculated from sequence data represent the relationships between genes. These are not necessarily the same as the relationships between species.
- Your sequence data may not have the same phylogenetic history as the species from which they were isolated.
- Different genes evolve at different speeds, and there is always the possibility of horizontal gene transfer (hybridization, vector-mediated DNA movement, or direct uptake of DNA).

Phylogenetic Inference
- After working with sequences for a while, one develops an intuitive understanding that for a given gene, closely related organisms have similar sequences and more distantly related organisms have more dissimilar sequences. These differences can be quantified.
- Given a set of gene sequences, it should be possible to reconstruct the evolutionary relationship among genes and among organisms.

Phylogenetic Inference
Disclaimers
- Before describing any theoretical or practical aspects of phylogenetics, it is necessary to give some disclaimers.
- This area of computational biology is an intellectual minefield!
- Neither the theory nor the practical applications of any algorithms are universally accepted throughout the scientific community.
- The application of different software packages to a data set is very likely to give different answers; minor changes to a data set are also likely to profoundly change the result.

Phylogenetic Inference
Which gene to use? Different genes will be best suited to solve different problems:
- the RNA genomes of HIV change so quickly that every infected person carries a different strain
- certain enzymes may evolve fast enough to allow phylogeographic studies of species distribution post-glaciation
- mitochondrial DNA has a relatively fast substitution rate (it evolves quickly) and can be used to establish relatively recent divergence
- for establishing "deep phylogeny" we need genes that change very slowly (highly conserved ones)
- different sequences accumulate changes at different rates; choose a level of variation that is appropriate to the group of organisms being studied
- proteins (or protein-coding DNAs) are constrained by natural selection
- some sequences are highly variable (rRNA spacer regions, immunoglobulin genes), while others are highly conserved (actin, rRNA coding regions)
- different regions within a single gene can evolve at different rates (conserved vs. variable domains)

Phylogenetic Inference I
Are there correct trees?
- Despite all of these problems, it is actually quite simple to use computer programs to calculate phylogenetic trees for data sets.
- Provided the data are clean, outgroups are correctly specified, appropriate algorithms are chosen, no assumptions are violated, etc., can the true, correct tree be found and proven to be scientifically valid?
- Unfortunately, it is impossible to ever conclusively state what the "true" tree is for a group of sequences (or a group of organisms); taxonomy is constantly under revision as new data are gathered.

Phylogenetics
What are useful characters?
Use homologies, not analogies!
- Homology: common ancestry of two or more character states
- Analogy: similarity of character states not due to shared ancestry
- Homoplasy: a collection of phenomena that leads to similarities in character states for reasons other than inheritance from a common ancestor (e.g. convergence, parallelism, reversal)
Use the derived condition, not the ancestral one
- Synapomorphy (shared derived character): homologous traits share the same character state because it originated in their immediate common ancestor
- Plesiomorphy (shared ancestral character): homologous traits share the same character state because they are inherited from a common distant ancestor

Phenetics versus cladistics
Within the field of taxonomy there are two different methods and philosophies of building phylogenetic trees: cladistic and phenetic.
- Phenetic methods construct trees (phenograms) by considering the current states of characters without regard to the evolutionary history that brought the species to their current phenotypes; phenograms are based on overall similarity.
- Cladistic methods construct trees (cladograms) that rely on assumptions about ancestral relationships as well as on current data; cladograms are based on character evolution (e.g. shared derived characters).
Cladistics is becoming the method of choice. It is considered to be more powerful and to provide more realistic estimates; however, it is slower than phenetic algorithms.

Genetic Distance
DNA distances
- Distances between pairs of DNA sequences are relatively simple to compute as the sum of all base pair differences between the two sequences.
- Insertions/deletions are generally given a larger weight than replacements (gap penalties).
- It is possible to correct for multiple substitutions at a single site, which are common in distant relationships and at rapidly evolving sites.
- The distance matrix (rectangular or triangular):

7
Rat      0.0000 0.0646 0.1434 0.1456 0.3213 0.3213 0.7018
Mouse    0.0646 0.0000 0.1716 0.1743 0.3253 0.3743 0.7673
Rabbit   0.1434 0.1716 0.0000 0.0649 0.3582 0.3385 0.7522
Human    0.1456 0.1743 0.0649 0.0000 0.3299 0.2915 0.7116
Opossum  0.3213 0.3253 0.3582 0.3299 0.0000 0.3279 0.6653
Chicken  0.3213 0.3743 0.3385 0.2915 0.3279 0.0000 0.5721
Frog     0.7018 0.7673 0.7522 0.7116 0.6653 0.5721 0.0000

- Uncorrected (observed) distance: p-distance, usually expressed as the proportion of compared sites that differ
- Corrected (estimated) distance: d-distance
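
The difference between an observed and a corrected distance can be sketched in a few lines. The slide does not name a correction model, so the Jukes-Cantor formula, d = -3/4 ln(1 - 4p/3), is used here purely as an assumed example of a correction for multiple substitutions. The function names and sequences are hypothetical, and columns containing a gap are simply skipped rather than given a gap penalty.

```python
import math


def p_distance(seq1, seq2):
    """Observed proportion of differing sites between two aligned sequences.

    Assumption of this sketch: alignment columns with a gap ('-') in either
    sequence are ignored instead of being weighted by a gap penalty.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion p."""
    if p >= 0.75:
        raise ValueError("correction is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


# Illustrative sequences only: one difference in eight compared sites.
p = p_distance("ACGTACGT", "ACGTACGA")
print(p, jukes_cantor(p))
```

The corrected value is always at least as large as the observed one, and the gap between them grows as sequences become more divergent.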

Tree building methods
Genetic distance
- Unweighted Pair Group (UPGMA)
- Neighbor-Joining
- Fitch & Margoliash
Character-state
- Maximum Parsimony
- Maximum Likelihood

Tree building (distance based)
UPGMA
- The simplest of the distance methods is UPGMA (Unweighted Pair Group Method using Arithmetic averages).
- Many multiple alignment programs such as PILEUP use a variant of UPGMA to create a dendrogram of DNA sequences, which is then used to guide the multiple alignment algorithm.
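
A minimal sketch of UPGMA clustering follows, assuming the standard formulation: repeatedly join the two closest clusters and set the distance from the new cluster to every other cluster to the size-weighted average of the two old distances. The function name `upgma`, the dictionary layout, and the four-taxon distances are all hypothetical and are not taken from the slides.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa with UPGMA.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    Returns the tree as nested tuples and the list of merges
    (cluster_a, cluster_b, node_height), where node height is half the
    distance between the joined clusters.
    """
    sizes = {name: 1 for name in names}   # cluster label -> number of taxa
    d = dict(dist)                        # working copy of the distances
    merges = []
    while len(sizes) > 1:
        # Find the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))] / 2
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        new = (a, b)
        # Distance to the new cluster is the size-weighted arithmetic average.
        for other in sizes:
            d[frozenset((new, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[new] = size_a + size_b
        merges.append((a, b, height))
    return next(iter(sizes)), merges


# Illustrative distances for four taxa.
names = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 2, frozenset(("A", "C")): 6,
    frozenset(("A", "D")): 10, frozenset(("B", "C")): 6,
    frozenset(("B", "D")): 10, frozenset(("C", "D")): 10,
}
tree, merges = upgma(names, dist)
print(tree)
for a, b, height in merges:
    print(a, b, height)
```

UPGMA assumes a constant rate of change along all lineages, so the resulting tree is rooted and ultrametric; when rates differ between lineages the recovered topology can be misleading.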

UPGMA
[Distance matrix for taxa A, B, C, D, E, F, G: - 63 94 79 111 96 47 67 16 83 100 23 58 89 106 62 107 92 43 20 102]

UPGMA
[Distance matrix for taxa A, B, C, E, F, DG: - 63 94 79 67 16 83 23 58 89 62 84 35 88]

UPGMA
[Distance matrix for taxa A, B, E, F, CDG: - 63 67 16 23 58 62 61 64 74]