Phylogenetic analyses Kirsi Kostamo. The aim: To construct a visual representation (a tree) to describe the assumed evolution occurring between and among.

Slides:



Advertisements
Similar presentations
Computational Molecular Biology Biochem 218 – BioMedical Informatics Doug Brutlag Professor.
Advertisements

Phylogenetic Tree A Phylogeny (Phylogenetic tree) or Evolutionary tree represents the evolutionary relationships among a set of organisms or groups of.
Bioinformatics Phylogenetic analysis and sequence alignment The concept of evolutionary tree Types of phylogenetic trees Measurements of genetic distances.
An Introduction to Phylogenetic Methods
Phylogenetic Analysis
 Aim in building a phylogenetic tree is to use a knowledge of the characters of organisms to build a tree that reflects the relationships between them.
1 General Phylogenetics Points that will be covered in this presentation Tree TerminologyTree Terminology General Points About Phylogenetic TreesGeneral.
Phylogenetic reconstruction
Maximum Likelihood. Likelihood The likelihood is the probability of the data given the model.
IE68 - Biological databases Phylogenetic analysis
Molecular Evolution Revised 29/12/06
Tree Reconstruction.
© Wiley Publishing All Rights Reserved. Phylogeny.
“Inferring Phylogenies” Joseph Felsenstein Excellent reference
BIOE 109 Summer 2009 Lecture 4- Part II Phylogenetic Inference.
UPGMA and FM are distance based methods. UPGMA enforces the Molecular Clock Assumption. FM (Fitch-Margoliash) relieves that restriction, but still enforces.
Input and output. What’s in PHYLIP Programs in PHYLIP allow to do parsimony, distance matrix, and likelihood methods, including bootstrapping and consensus.
CISC667, F05, Lec14, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Phylogenetic Trees (I) Maximum Parsimony.
Maximum Likelihood. Historically the newest method. Popularized by Joseph Felsenstein, Seattle, Washington. Its slow uptake by the scientific community.
Bioinformatics and Phylogenetic Analysis
Branch lengths Branch lengths (3 characters): A C A A C C A A C A C C Sum of branch lengths = total number of changes.
Phylogenetic Analysis. General comments on phylogenetics Phylogenetics is the branch of biology that deals with evolutionary relatedness Uses some measure.
Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.
Phylogenetic Analysis. 2 Phylogenetic Analysis Overview Insight into evolutionary relationships Inferring or estimating these evolutionary relationships.
Phylogenetic trees Sushmita Roy BMI/CS 576
Phylogenetic Analysis
Multiple Sequence Alignments and Phylogeny.  Within a protein sequence, some regions will be more conserved than others. As more conserved,
Maximum parsimony Kai Müller.
Terminology of phylogenetic trees
Molecular phylogenetics
Molecular evidence for endosymbiosis Perform blastp to investigate sequence similarity among domains of life Found yeast nuclear genes exhibit more sequence.
Phylogenetic Analysis Dayong Guo. Introduction Phylogenetics is the study of evolutionary relatedness among various species, populations, or among a set.
Molecular basis of evolution. Goal – to reconstruct the evolutionary history of all organisms in the form of phylogenetic trees. Classical approach: phylogenetic.
Phylogenetics Alexei Drummond. CS Friday quiz: How many rooted binary trees having 20 labeled terminal nodes are there? (A) (B)
1 Dan Graur Molecular Phylogenetics Molecular phylogenetic approaches: 1. distance-matrix (based on distance measures) 2. character-state.
Phylogenetic Analysis. General comments on phylogenetics Phylogenetics is the branch of biology that deals with evolutionary relatedness Uses some measure.
Molecular phylogenetics 1 Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Sections
Computational Biology, Part D Phylogenetic Trees Ramamoorthi Ravi/Robert F. Murphy Copyright  2000, All rights reserved.
BINF6201/8201 Molecular phylogenetic methods
Bioinformatics 2011 Molecular Evolution Revised 29/12/06.
Applied Bioinformatics Week 8 Jens Allmer. Practice I.
Phylogenetic Trees  Importance of phylogenetic trees  What is the phylogenetic analysis  Example of cladistics  Assumptions in cladistics  Frequently.
Chapter 8 Molecular Phylogenetics: Measuring Evolution.
Building phylogenetic trees. Contents Phylogeny Phylogenetic trees How to make a phylogenetic tree from pairwise distances  UPGMA method (+ an example)
Introduction to Phylogenetics
Calculating branch lengths from distances. ABC A B C----- a b c.
Algorithms in Computational Biology11Department of Mathematics & Computer Science Algorithms in Computational Biology Building Phylogenetic Trees.
More statistical stuff CS 394C Feb 6, Today Review of material from Jan 31 Calculating pattern probabilities Why maximum parsimony and UPGMA are.
GENE 3000 Fall 2013 slides wiki. wiki. wiki.
Chapter 10 Phylogenetic Basics. Similarities and divergence between biological sequences are often represented by phylogenetic trees Phylogenetics is.
Why do trees?. Phylogeny 101 OTUsoperational taxonomic units: species, populations, individuals Nodes internal (often ancestors) Nodes external (terminal,
Phylogeny Ch. 7 & 8.
Applied Bioinformatics Week 8 Jens Allmer. Theory I.
Phylogenetics.
Ayesha M.Khan Spring Phylogenetic Basics 2 One central field in biology is to infer the relation between species. Do they possess a common ancestor?
Maximum Parsimony Phenetic (distance based) methods are fast and often accurate but discard data and are not based on explicit character states at each.
Phylogenetic trees. 2 Phylogeny is the inference of evolutionary relationships. Traditionally, phylogeny relied on the comparison of morphological features.
Phylip PHYLIP (the PHYLogeny Inference Package) is a package of programs for inferring phylogenies (evolutionary trees). PHYLIP is the most widely-distributed.
Introduction to Bioinformatics Resources for DNA Barcoding
Evolutionary genomics can now be applied beyond ‘model’ organisms
Phylogenetic basis of systematics
Phylogenetic Inference
Multiple Alignment and Phylogenetic Trees
Methods of molecular phylogeny
Molecular Evolution.
Chapter 19 Molecular Phylogenetics
Bill Bruno Brian Foley Thomas Leitner Theoretical Biology & Biophysics
#30 - Phylogenetics Distance-Based Methods
Phylogeny.
Molecular data assisted morphological analyses
Presentation transcript:

Phylogenetic analyses Kirsi Kostamo

The aim: To construct a visual representation (a tree) to describe the assumed evolution occurring between and among different groups (individuals, populations, species, etc.) and to study the reliability of the consensus tree.

Assumptions Evolution produces dichotomous branching Evolution is simple – the best explanation assumes least mutations

A phylogeographic tree is a mathematical model of evolution

Parts of a phylogenetic tree Node Root Outgroup Ingroup Branch

Tree structure A tree can be also presented in a text format: (A(B(C,D))) The graphic structure can be difficult to interpret (2-dimentional)

Analyses 1. Choosing the sequence type 2. Alignment of sequence data 3. Search for the best tree 4. Evaluation of tree reproducibility

Analyses can be based on: Differences in DNA-sequence structure Distance matrix between sequences Restriction data Allele data

Methods Distance matrix Maximum parsimony Minimum distance

Distance matrix A distance matrix is calculated from the sequence dataset Algorithms: Fitch-Margoliash, Neighbor-Joining or UPGMA in tree building Simple, finds only one tree Somewhat old-fashioned (OK if your alignment is good and evolutionary distances are short)

Maximum parsimony Finds the optimum tree by minimizing the number of evolutionary changes No assumptions on the evolutionary pattern May oversimplify evolution May produce several equally good trees

Maximum likelihood The best tree is found based on assumptions on evolution model Nucleotide models more advanced at the moment than aminoacid models Programs require lot of capacity from the system

Algorithms used for tree searching Exhaustive search: all possibilities → best tree → requires lots of time and computer resources Branch and Bound: a tree is built according to the model given → the tree is compared to the next tree while its constructed → if the first tree is better the second tree is abandoned → third tree… → best possible tree Heuristic Search: only the most likely options → saves time and resources, does not always result in the best tree

Bootstrapping Evaluation of the tree reliability n number of trees are built (n=100/1000/5000) → How many times a certain branch is reproduced Values between (%)

Programs in sequence analyses Kirsi Kostamo

Programs Most programs freeware – can be obtained from the internet Designed to address particular questions – generally you need several small programs for the whole analysis Lots of bugs and restrictions Use Notepad/Textpad if you need to open the files at any time

Quality of sequencing data

Assessing sequence quality Chromas Assess sequence quality, make corrections into the sequence

A Two As or only one?

Chromas Reverse and compliment the sequence Export sequences in plain text in Fasta, EMBL, GenBank or GCG format Copy the sequences in plain text or Fasta format into other software applications

BioEdit Joining different parts of a sequence together (consensus sequence) Sequence alignments (manual vs. ClustalW) Alignments up to sequences Export in GenBank, Fasta, or PHYLIP format

Sequence alignment Finding similar nucleotide composition for further analysis Manually: can take weeks ClustalW Check the alignment made by ClustalW You may have to go back to Chromas to check the sequences once again

Analysing the aligned sequence matrix PHYLIP POY PAUP, GCG And many more... (274 software packages described at one website)

PHYLIP (Phylogeny Inference Package) Available free in Windows/MacOS/Linux systems Parsimony, distance matrix and likelihood methods (bootstrapping and consensus trees) Data can be molecular sequences, gene frequencies, restriction sites and fragments, distance matrices and discrete characters

Visualising trees Treeview You can change the graphic presentation of a tree (cladogram, rectangular cladogram, radial tree, phylogram), but not change the structure of a tree

POY (Phylogenetic Analysis Using Parsimony) Cladistic and phylogenetic analysis using sequence and/or morphological data Finding among all possible trees, those that exhibit minimal edit costs (minimum number of mutations) Is able to assess directly the number of DNA sequence transformations, evolutionary events, required by a tree topology without the use of multiple sequence alignment CSC