The Phylogeny of a Dataset
Andrea K. Thomer & Nicholas M. Weber
Center for Informatics Research in Science and Scholarship
Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign

How do we understand the evolution of digital objects over time?

How do we understand the evolution of digital objects when they are intricately interrelated? (Image courtesy of Steve Worley, NCAR)

Evolution as a tree From

tl;dr
1) Biologists construct evolutionary trees by comparing animals’ traits and inferring how they may have evolved.
2) And there’s a lot of free, open-source software available for this work.

Non-biological evolution: phylogenetic methods have been applied to cornets (Tëmkin & Eldredge, 2007) and to “Little Red Riding Hood” (Tehrani, 2013)
Why not datasets? (Which, like organisms, also often lack explicit documentation…)

A phylogenetic approach helps us:
– Study the evolution of digital objects more rigorously
– Model how digital objects are reworked into new “species”
– Understand what properties of a digital object must be preserved or expressed to facilitate modeling
We ask: in a digital object, what properties lead to evolutionary fitness?

Dataset of datasets: COADS, ICOADS, and its derivatives
(I)COADS = (International) Comprehensive Ocean and Atmosphere Dataset
– Community project bringing together 1000s of marine surface measurements from buoys, ships’ logs, and more
– First release: 1987
– New releases as new datasets are added; now at Release 2.5
– Enormously modified and reused by others in climate science

Towards a more rigorous view of the evolutionary process: anagenesis and phylogenesis
– ICOADS documentation largely describes anagenesis (versioning)
– GCMD* (NASA’s Global Change Master Directory) is one of many potential sources of data on phylogenesis (branching)
   – A keyword search turned up 99 metadata records for versions/derivatives of ICOADS (“specimens”)
   – Metadata include scientific parameters, geographic scope, instruments used, and more
*GCMD metadata quality has known problems, but its value lies in breadth rather than depth
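The slides do not say how the keyword search was run. As a minimal sketch, assuming the metadata records have already been downloaded as Python dictionaries with hypothetical `title` and `summary` fields, a case-insensitive substring match on "COADS" also catches "ICOADS". The helper name `keyword_search` and the sample records are illustrative only.

```python
from typing import Iterable


def keyword_search(records: Iterable[dict], keyword: str,
                   fields: tuple = ("title", "summary")) -> list:
    """Return the records whose chosen text fields contain `keyword` (case-insensitive)."""
    needle = keyword.lower()
    return [
        record for record in records
        if any(needle in str(record.get(name, "")).lower() for name in fields)
    ]


# Hypothetical records for illustration only; these are not real GCMD entries.
sample = [
    {"id": "rec-1", "title": "ICOADS Release 2.5", "summary": "Marine surface observations"},
    {"id": "rec-2", "title": "Sea surface flux fields", "summary": "Derived from COADS"},
    {"id": "rec-3", "title": "Land station temperatures", "summary": "Unrelated archive"},
]
print([record["id"] for record in keyword_search(sample, "coads")])  # ['rec-1', 'rec-2']
```

Substring matching is deliberately crude: it also returns records that mention COADS only in passing, so hits would still need checking by hand before being treated as “specimens”.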

Workflow
1) Download records
2) Create character matrix
3) Create a NEXUS file
4) Assess the tree!
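The slides do not show how the NEXUS file was written. Below is a minimal sketch, assuming the character matrix is already coded as one string of single-character states per dataset; `to_nexus` is a hypothetical helper, and the output is a plain NEXUS DATA block with standard (discrete) characters, the kind of input PAUP* reads. The example matrix values are made up.

```python
def to_nexus(matrix: dict, symbols: str = "0123456789") -> str:
    """Serialize a {taxon name: state string} matrix as a NEXUS DATA block."""
    lengths = {len(states) for states in matrix.values()}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError("every taxon needs the same, non-zero number of characters")
    nchar = lengths.pop()

    def quote(name: str) -> str:
        # NEXUS names with spaces or punctuation must be single-quoted;
        # an embedded single quote is written as two single quotes.
        return "'" + name.replace("'", "''") + "'"

    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"    DIMENSIONS NTAX={len(matrix)} NCHAR={nchar};",
        f'    FORMAT DATATYPE=STANDARD MISSING=? GAP=- SYMBOLS="{symbols}";',
        "    MATRIX",
    ]
    lines += [f"        {quote(name)} {states}" for name, states in matrix.items()]
    lines += ["    ;", "END;"]
    return "\n".join(lines) + "\n"


# Hypothetical three-character matrix, for illustration only.
print(to_nexus({"COADS": "010", "ICOADS 2.5": "01?"}))
```

Quoting every name keeps dataset titles that contain spaces or punctuation legal, and `?` marks characters a record does not report.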

Identifying “characters”
– In phylogenetics: characters are morphological features, DNA, or other measurable qualities
– In ICOADS datasets: we treated each metadata field as a character, and each term as a character state
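Read literally, that coding can be sketched as follows. This is a minimal sketch, not the authors’ code: the field names (`platform`, `region`) and records are hypothetical, states are numbered in order of first appearance, and a missing field is coded `?`. Using single-digit symbols caps each character at ten states.

```python
SYMBOLS = "0123456789"  # one symbol per state; more than ten terms would need another scheme


def build_character_matrix(records: dict, fields: list) -> tuple:
    """Treat each metadata field as a character and each distinct term as a state.

    Returns (matrix, state_maps): `matrix` maps dataset name -> state string,
    `state_maps` maps field -> {term: state number}.
    """
    state_maps = {field: {} for field in fields}
    for record in records.values():
        for field in fields:
            term = record.get(field)
            if term is None or term in state_maps[field]:
                continue
            if len(state_maps[field]) == len(SYMBOLS):
                raise ValueError(f"field {field!r} has more than {len(SYMBOLS)} terms")
            state_maps[field][term] = len(state_maps[field])

    matrix = {}
    for name, record in records.items():
        row = ""
        for field in fields:
            term = record.get(field)
            row += "?" if term is None else SYMBOLS[state_maps[field][term]]
        matrix[name] = row
    return matrix, state_maps


# Hypothetical records: field names and terms are illustrative only.
records = {
    "COADS": {"platform": "ship", "region": "global"},
    "ICOADS 2.5": {"platform": "ship", "region": "global"},
    "Flux product": {"platform": "buoy"},
}
matrix, state_maps = build_character_matrix(records, ["platform", "region"])
print(matrix)  # {'COADS': '00', 'ICOADS 2.5': '00', 'Flux product': '1?'}
```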

– Dates, times, and resolution are “binned” into categories
– Parameters are split into individual categories, and their presence/absence is noted in binary
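Both encodings are simple to sketch. The year cut points and the parameter vocabulary below are hypothetical, since the slides do not give the actual bins or terms; `bin_value` and `presence_absence` are illustrative helper names.

```python
from bisect import bisect_right


def bin_value(value: float, cut_points: list) -> int:
    """Map a value to a bin index: 0 below the first cut point, len(cut_points) at or above the last."""
    return bisect_right(cut_points, value)


def presence_absence(parameters: set, vocabulary: list) -> str:
    """One binary character per vocabulary term: 1 if the record lists it, else 0."""
    return "".join("1" if term in parameters else "0" for term in vocabulary)


# Hypothetical cut points: before 1900, 1900-1949, 1950-1999, 2000 onward.
YEAR_CUTS = [1900, 1950, 2000]
# Hypothetical parameter vocabulary.
VOCABULARY = ["air temperature", "sea surface temperature", "wind speed"]

print(bin_value(1987, YEAR_CUTS))  # 2
print(presence_absence({"sea surface temperature", "wind speed"}, VOCABULARY))  # 011
```

Presence/absence coding turns one multi-valued field into several binary characters, so a record that lists several parameters is not forced into a single state.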

Method:
– Software: PAUP* (Phylogenetic Analysis Using Parsimony *and other methods)
– Maximum Likelihood algorithm (we can talk about that more if people are interested)
Result:
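The slides leave the likelihood model unspecified. To show what a likelihood score for one character on one fixed tree means, here is a minimal sketch of Felsenstein’s pruning algorithm under the Mk model (k equally likely states, symmetric change, branch lengths in expected changes per character). The Mk model is a common choice for discrete morphological characters but is an assumption here; this is not a reproduction of the PAUP* analysis, and the tree and branch lengths are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str | None = None  # dataset label; set on leaves only
    length: float = 0.0      # branch length to the parent node
    children: list[Node] = field(default_factory=list)


def mk_prob(i: int, j: int, t: float, k: int) -> float:
    """Mk-model probability of ending in state j after time t, starting from i (k >= 2)."""
    decay = math.exp(-k * t / (k - 1))
    return 1 / k + (k - 1) / k * decay if i == j else 1 / k - decay / k


def partials(node: Node, states: dict[str, int], k: int) -> list[float]:
    """Pruning step: likelihood of the data below `node`, given each state at `node`."""
    if not node.children:
        observed = states[node.name]
        return [1.0 if s == observed else 0.0 for s in range(k)]
    result = [1.0] * k
    for child in node.children:
        below = partials(child, states, k)
        for s in range(k):
            result[s] *= sum(mk_prob(s, c, child.length, k) * below[c] for c in range(k))
    return result


def site_likelihood(root: Node, states: dict[str, int], k: int = 2) -> float:
    """Likelihood of one character on a fixed tree, with a uniform prior on the root state."""
    return sum(p / k for p in partials(root, states, k))


# Hypothetical tree ((A, B), C) with made-up branch lengths.
tree = Node(children=[
    Node(length=0.1, children=[Node("A", 0.2), Node("B", 0.2)]),
    Node("C", 0.5),
])
print(site_likelihood(tree, {"A": 0, "B": 0, "C": 1}))
```

Summing log site likelihoods over all characters gives a tree’s log-likelihood; a maximum likelihood search then looks for the tree and branch lengths that maximize it, which is the part PAUP* automates.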

Phylogeny of ICOADS datasets
– Each fork = a “speciation event”
– Each group joined at a node = a “clade”
– We annotated the primary clades
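In those terms, every internal node of a rooted tree defines a clade: the set of datasets descended from it. A minimal sketch, assuming the tree is held as nested Python tuples with hypothetical leaf labels rather than read from PAUP* output; `clades` is an illustrative helper name.

```python
from __future__ import annotations

from typing import Union

Tree = Union[str, tuple]  # a leaf label, or a tuple of subtrees


def clades(tree: Tree) -> list[frozenset[str]]:
    """Return the leaf set below each internal node, children before parents."""
    found: list[frozenset[str]] = []

    def walk(node: Tree) -> frozenset[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.append(leaves)
        return leaves

    walk(tree)
    return found


# Hypothetical rooted, fully bifurcating tree of five datasets.
example = (("A", "B"), ("C", ("D", "E")))
for clade in clades(example):
    print(sorted(clade))
# ['A', 'B']
# ['D', 'E']
# ['C', 'D', 'E']
# ['A', 'B', 'C', 'D', 'E']
```

A fully bifurcating tree with n leaves has n - 1 forks, so five datasets yield four clades here, counting the root.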

Related datasets cluster; some clades show up as derived from “ancestral” forms
– Clade 1: original COADS datasets
– Clade 2: ICOADS input datasets
– Clade 3: sea surface flux calculations
– Clade 4: later COADS data products
– Clade 5: COADS derivatives

Why does it matter that digital objects evolve? Or how?
– Digital preservation implications
   – A way to understand the history and contents of a collection
   – Could be used to browse repositories?
   – Could be used to complement citation analysis?
– Offers a lens into the cooperative processes that create objects
   – A way to “read” the interplay of different scientific cultures

Challenges and areas for future work
– What existing statistical models of evolution are most appropriate for this? Or do we need to develop a new one?
– How can existing software be modified for this work?
– How do we show reticulating relationships?

Future work: Phylogenies showing hybridization & ‘spontaneous generation’
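Hybridization and spontaneous generation both break the one-parent-per-node shape of a tree. A minimal sketch, assuming derivation links are recorded as a hypothetical map from each dataset to the datasets it was derived from: a dataset with two or more parents is flagged as a “hybrid”, and one with no recorded parent as an independent origin. `classify` and the example links are illustrative only.

```python
def classify(parents: dict) -> dict:
    """Flag datasets that a strict tree cannot represent.

    `parents` maps each dataset to the list of datasets it was derived from.
    """
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    hybrids = sorted(n for n in nodes if len(parents.get(n, [])) >= 2)
    origins = sorted(n for n in nodes if not parents.get(n))
    return {"hybrids": hybrids, "origins": origins}


# Hypothetical derivation links: C merges A and B; D derives from C; E has no parent.
links = {"C": ["A", "B"], "D": ["C"], "E": []}
print(classify(links))  # {'hybrids': ['C'], 'origins': ['A', 'B', 'E']}
```

A strict rooted tree must report no hybrids and exactly one origin; anything else calls for a network (reticulate) representation rather than a tree.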

Future work: what makes a dataset “fit”?
– Part of ICOADS’s success and proliferation is surely due to low levels of “competition”
– But is some of it due to its open availability?
– How do we test the effects of openness on a dataset’s fitness-for-purpose?

Acknowledgements
Thanks to Julie Allen, Peter Fox, and Steve Worley for feedback, and to our reviewers for excellent comments. Thanks to CIRSS and the DCERC program for funding.

References & Additional Reading
Datasets mentioned in this talk:
Howe, C. J., & Windram, H. F. (2011). Phylomemetics--evolutionary analysis beyond the gene. PLoS Biology, 9(5), e… doi: …/journal.pbio…
O’Brien, M. J., Darwent, J., & Lyman, R. L. (2001). Cladistics is useful for reconstructing archaeological phylogenies: Palaeoindian points from the southeastern United States. Journal of Archaeological Science, 28(10), 1115–1136. doi: …/jasc…
Tehrani, J. J. (2013). The phylogeny of Little Red Riding Hood. PLoS ONE, 8(11), e… doi: …/journal.pone…
Tëmkin, I., & Eldredge, N. (2007). Phylogenetics and material cultural evolution. Current Anthropology, 48(1), 146–154.

Homology
