Tree Evaluation

Tree Evaluation
A question often asked of a data set is whether it contains ‘significant cladistic structure’, that is, whether we can have any confidence that the results of a cladistic analysis are, in some sense, ‘real’ and not just by-products of chance. The concept of cladistic structure can be studied from two viewpoints:
- Assign confidence to the best cladogram as a whole.
- Examine the support afforded to individual clades within the best cladogram.

Confidence of Clades
Several methods have been proposed that attach numerical values to the internal branches of a tree as a measure of the strength of support for those branches and the corresponding groups. These methods include:
- character resampling methods: bootstrap and jackknife
- decay analyses

Bootstrapping
Bootstrapping (bootstrap analysis) is a modern statistical technique that uses computer-intensive random resampling of data to estimate the sampling error or confidence intervals of an estimated parameter.

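Bootstrapping
Characters are resampled with replacement to create many bootstrap replicate data sets. Each replicate has the same number of characters as the original matrix, so some characters appear more than once and others not at all.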
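The resampling step can be illustrated with a minimal Python sketch. The function names `sample_columns` and `apply_columns` and the small binary matrix are hypothetical, invented here for illustration; they are not part of the original slides. The key point is that the same randomly drawn columns are used for every taxon.

```python
import random

def sample_columns(n_chars, rng):
    """Draw n_chars column indices with replacement."""
    return [rng.randrange(n_chars) for _ in range(n_chars)]

def apply_columns(matrix, cols):
    """Build a new matrix by taking the same columns for every taxon."""
    return {taxon: [states[c] for c in cols] for taxon, states in matrix.items()}

# Hypothetical 5-taxon, 10-character binary matrix (invented for illustration).
matrix = {
    "Taxon A": list("0000011111"),
    "Taxon B": list("0000111110"),
    "Taxon C": list("0011100011"),
    "Taxon D": list("1111000001"),
    "Taxon E": list("1110000000"),
}

rng = random.Random(42)  # fixed seed so the run is repeatable
n_chars = len(matrix["Taxon A"])
replicates = [apply_columns(matrix, sample_columns(n_chars, rng)) for _ in range(100)]
print(len(replicates), len(replicates[0]["Taxon A"]))
```

Each replicate would then be passed to whatever tree-building method is in use; that step is outside this sketch.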

Bootstrap – Resampling with Replacement
[Figure: the original data matrix (taxa A to E, characters a to j) and two new data matrices built by resampling its characters with replacement, with character columns g b c e f f b h a d and h f a b g a h c i h.]

Bootstrapping
- Each bootstrap replicate data set is analysed (e.g. with MP, ML or distance methods).
- Agreement among the resulting trees is summarized in a majority-rule consensus tree (usually 50% majority-rule).
- The frequency of occurrence of a group (its bootstrap support) is a measure of support for that group.
- Additional information is given in the partition table.
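The summary step can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure of any particular program: each replicate tree is represented simply as a set of clades, each clade being a `frozenset` of taxon names, and the four toy trees are invented. The names `bootstrap_support` and `majority_rule` are hypothetical.

```python
from collections import Counter

def bootstrap_support(replicate_trees):
    """Fraction of replicate trees in which each clade occurs."""
    counts = Counter(clade for tree in replicate_trees for clade in tree)
    n_trees = len(replicate_trees)
    return {clade: count / n_trees for clade, count in counts.items()}

def majority_rule(support, threshold=0.5):
    """Keep only clades found in more than `threshold` of the trees."""
    return {clade: freq for clade, freq in support.items() if freq > threshold}

# Four invented replicate trees; each tree is a set of clades.
trees = [
    {frozenset("AB"), frozenset("DE")},
    {frozenset("AB"), frozenset("DE")},
    {frozenset("AB"), frozenset("CD")},
    {frozenset("AC"), frozenset("DE")},
]

support = bootstrap_support(trees)
consensus = majority_rule(support)
for clade, freq in sorted(consensus.items(), key=lambda item: sorted(item[0])):
    print("".join(sorted(clade)), f"{freq:.0%}")
```

The full `support` dictionary plays the role of the partition table: it also lists the groups that fall below the majority-rule threshold.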

Bootstrap value
[Figure: tree of taxa A to E labelled with bootstrap values of 55%, 96% and 76%.]

Bootstrapping - an example
[Figure: parsimony bootstrap tree for ciliate SSU rDNA with its partition table of group frequencies. Taxa: Ochromonas (1), Symbiodinium (2), Prorocentrum (3), Euplotes (8), Tetrahymena (9), Loxodes (4), Tracheloraphis (5), Spirostomum (6), Gruberia (7).]

Bootstrap - Interpretation
- High bootstrap support (BS) (e.g. > 75%) indicates strong ‘signal’ in the data.
- Provided we have no evidence of strong misleading signal (e.g. base composition biases, great differences in branch lengths), high BS is likely to reflect strong phylogenetic signal.
- Low BS need not mean that the relationship is false, only that it is poorly supported.

Jackknifing
Jackknifing is very similar to bootstrapping and differs only in the character resampling strategy. Some proportion of the characters is randomly selected and deleted, so each new data matrix is smaller than the original. The jackknife value is obtained by observing the number of characters that must be dropped to collapse a clade.

Jackknifing
If 2 out of 15 informative characters must be dropped to collapse a clade, the jackknife value will be 2/15 ≈ 0.13. Therefore, the higher the jackknife value, the higher the support for the clade. Jackknifing and bootstrapping tend to produce broadly similar results and have similar interpretations.
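The character-deletion step can be sketched in Python in the same style as bootstrap resampling, except that columns are drawn without replacement. The function name `jackknife_replicate`, the `delete_fraction` parameter and the toy matrix are assumptions made for illustration; the slides do not specify what proportion of characters is deleted.

```python
import random

def jackknife_replicate(matrix, delete_fraction, rng):
    """Delete a random fraction of the characters and return (new_matrix, kept_columns)."""
    n_chars = len(next(iter(matrix.values())))
    n_keep = n_chars - round(n_chars * delete_fraction)
    # Sampling without replacement: each surviving column appears exactly once.
    kept = sorted(rng.sample(range(n_chars), n_keep))
    reduced = {taxon: [states[c] for c in kept] for taxon, states in matrix.items()}
    return reduced, kept

# Hypothetical 5-taxon, 10-character binary matrix (invented for illustration).
matrix = {
    "Taxon A": list("0000011111"),
    "Taxon B": list("0000111110"),
    "Taxon C": list("0011100011"),
    "Taxon D": list("1111000001"),
    "Taxon E": list("1110000000"),
}

rng = random.Random(7)  # fixed seed so the run is repeatable
reduced, kept = jackknife_replicate(matrix, delete_fraction=0.3, rng=rng)
print(len(kept), "of", len(matrix["Taxon A"]), "characters kept")
```

Unlike a bootstrap replicate, the reduced matrix has fewer columns than the original and contains no duplicated characters.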

Jackknife
[Figure: tree of taxa A to E labelled with jackknife values.]

Decay analysis
In parsimony analysis, one way to assess support for a group is to see whether the group also occurs in slightly less parsimonious trees. The length difference between the shortest trees that include the group and the shortest trees that exclude it (the extra steps required to overturn the group) is the decay index, or Bremer support. Decay indices for each clade can be determined by saving increasingly less parsimonious trees and producing the corresponding strict consensus trees until the consensus is completely unresolved.
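The length difference that defines the decay index can be written as a short Python sketch. It assumes that the supplied set of trees already contains the shortest trees both with and without the clade of interest; the trees, their lengths and the name `decay_index` are invented for illustration.

```python
def decay_index(scored_trees, clade):
    """Shortest length among trees lacking the clade minus shortest length among trees with it."""
    with_clade = [length for clades, length in scored_trees if clade in clades]
    without_clade = [length for clades, length in scored_trees if clade not in clades]
    if not with_clade or not without_clade:
        raise ValueError("need at least one tree with the clade and one without it")
    return min(without_clade) - min(with_clade)

# Invented trees: (set of clades, tree length in steps).
scored_trees = [
    ({frozenset("AB"), frozenset("DE")}, 100),  # the most parsimonious tree
    ({frozenset("AB"), frozenset("CD")}, 101),
    ({frozenset("BC"), frozenset("DE")}, 103),
    ({frozenset("AC"), frozenset("DE")}, 104),
]

for name in ("AB", "DE"):
    print(name, decay_index(scored_trees, frozenset(name)))
```

In this toy set, three extra steps are needed to lose the group AB but only one to lose DE, so AB is the better supported group.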

Decay Values
[Figure: tree of taxa A to E labelled with decay values of 1, 6 and 4.]

Decay indices - Interpretation
- Generally, the higher the decay index, the better the relative support for a group.
- Decay indices are not scaled to a fixed range (such as 0-100), and it is less clear what counts as an acceptable decay index.
- The magnitudes of decay indices and bootstrap/jackknife values are generally correlated (i.e. they tend to agree).
- Only groups found in all most parsimonious trees have decay indices greater than zero.