Phylogenetic Trees Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See

Presentation transcript:

Phylogenetic Trees Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See for more information. Sets and Dictionaries

Some organisms are more alike than others.

"Nothing in biology makes sense except in the light of evolution." (Theodosius Dobzhansky)

The closer two organisms' DNA, the more recently they had a common ancestor.
Reconstruct their evolutionary tree using a hierarchical clustering algorithm.

Calculate the common ancestor of the two closest organisms, then of the next closest pair, and the next.
Combine organisms with common ancestors, and common ancestors with each other, to construct a complete tree.
Redraw the tree using height to show the number of differences.

Turn this into an algorithm:

    U = {all organisms}
    while U has more than one element:
        a, b = two closest entries in U
        p = common parent of {a, b}
        U = U - {a, b}
        U = U + {p}

The ungrouped set shrinks by one element each time.
Keep track of pairings on the side to draw the tree later.
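The loop above can be sketched directly with a Python set. This is only an illustration under stated assumptions, not the lesson's final program: `cluster`, `center`, and the toy numeric distance are hypothetical names chosen here, and a parent is represented simply as the tuple of its two children.

```python
from itertools import combinations

def cluster(items, distance):
    """Repeatedly merge the two closest entries until one remains.

    `distance(a, b)` is supplied by the caller (an assumption of this
    sketch). Returns the root and the pairings in merge order.
    """
    ungrouped = set(items)
    pairings = []
    while len(ungrouped) > 1:
        # Two closest entries in the ungrouped set.
        a, b = min(combinations(ungrouped, 2),
                   key=lambda pair: distance(*pair))
        parent = (a, b)
        pairings.append(parent)   # kept on the side to draw the tree later
        ungrouped -= {a, b}
        ungrouped |= {parent}     # net effect: the set shrinks by one
    return ungrouped.pop(), pairings

def center(x):
    """Toy stand-in for a real comparison: average of the leaves."""
    if isinstance(x, tuple):
        return (center(x[0]) + center(x[1])) / 2
    return x

root, pairings = cluster([1, 2, 6, 20],
                         lambda a, b: abs(center(a) - center(b)))
```

With four starting entries the loop runs three times, and the last pairing recorded is the root of the tree.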

What does "closest" mean?
Simplest algorithm is unweighted pair-group method using arithmetic averages (UPGMA).

                human  vampire  werewolf  mermaid
    human
    vampire        13
    werewolf        5        6
    mermaid        12       15        29

Closest entries are human (H) and werewolf (W).
Replace them with HW (their common ancestor).
Height is 1/2 the value of the entry: 5/2 = 2.5.
Replace the score for every other entry X with (HX + WX - HW)/2.

          H    V    W    M
    H
    V    13
    W     5    6
    M    12   15   29

After the merge, the table is:

         HW    V    M
    V     7
    M    18   15
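The arithmetic for this first merge can be checked in a few lines. The sketch below uses the single-letter labels from the slide; `merged_score` is a hypothetical helper name introduced only for this illustration.

```python
# Scores from the table, keyed by the single-letter labels on the slide.
d = {('H', 'V'): 13, ('H', 'W'): 5, ('V', 'W'): 6,
     ('H', 'M'): 12, ('M', 'V'): 15, ('M', 'W'): 29}

def merged_score(hx, wx, hw):
    """Score between the new parent HW and another entry X."""
    return (hx + wx - hw) / 2

height_hw = d[('H', 'W')] / 2                                     # 2.5
hw_v = merged_score(d[('H', 'V')], d[('V', 'W')], d[('H', 'W')])  # 7.0
hw_m = merged_score(d[('H', 'M')], d[('M', 'W')], d[('H', 'W')])  # 18.0
```

The score between V and M is untouched by the merge, so it stays at 15.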

Repeat: the closest entries are now HW and V (score 7), so their parent HWV has height 3.5.

         HWV    M
    M     13

And again: HWV and M (score 13) combine into HWVM, with height 6.5.
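As an aside, UPGMA as it is usually described in textbooks updates scores with a size-weighted average of the two merged clusters, rather than the subtractive rule (HX + WX - HW)/2 used on these slides. The sketch below shows that averaging rule for comparison only; `upgma_update` is a hypothetical name and is not part of the lesson's program.

```python
def upgma_update(d_ax, d_bx, size_a, size_b):
    """Size-weighted average distance from the merged cluster A+B to X."""
    return (size_a * d_ax + size_b * d_bx) / (size_a + size_b)

# First merge: H and W are single organisms.
hw_v = upgma_update(13, 6, 1, 1)       # (13 + 6) / 2
hw_m = upgma_update(12, 29, 1, 1)      # (12 + 29) / 2
# Second merge: HW (two organisms) joins V (one organism).
hwv_m = upgma_update(hw_m, 15, 2, 1)   # (2 * 20.5 + 15) / 3
```

On this data both rules pick the same pairs at each step, so the shape of the tree is the same; only the scores and heights differ.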

Final tree (figure): leaves M, V, H, and W, joined through the implied ancestors HW, HWV, and HWVM.

How to translate this into software?
We drew it as a triangular matrix, but the order of the rows and columns is arbitrary.
It's really just a lookup table, so we should think about using a dictionary.
Key: (organism, organism)
  - In alphabetical order to ensure uniqueness
Value: distance

The table of scores becomes:

    {
        ('human',   'mermaid')  : 12,
        ('human',   'vampire')  : 13,
        ('human',   'werewolf') :  5,
        ('mermaid', 'vampire')  : 15,
        ('mermaid', 'werewolf') : 29,
        ('vampire', 'werewolf') :  6
    }
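Because each key is stored once, in alphabetical order, a lookup has to put its two names in that order before indexing. A minimal sketch, where `lookup` is a hypothetical helper name used only for illustration:

```python
scores = {
    ('human', 'mermaid'): 12,
    ('human', 'vampire'): 13,
    ('human', 'werewolf'): 5,
    ('mermaid', 'vampire'): 15,
    ('mermaid', 'werewolf'): 29,
    ('vampire', 'werewolf'): 6,
}

def lookup(scores, a, b):
    """Return the score for a and b, whichever order they are given in."""
    key = (a, b) if a < b else (b, a)
    return scores[key]

forward = lookup(scores, 'human', 'werewolf')
backward = lookup(scores, 'werewolf', 'human')
```

Storing each pair once halves the size of the table and removes any risk of the two orderings disagreeing.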

Write out the algorithm:

    while len(scores) > 0:
        min_pair = find_min_pair(species, scores)
        parent, height = create_new_parent(scores, min_pair)
        print(parent, height)
        old_score = remove_entries(species, scores, min_pair)
        update(species, scores, min_pair, parent, old_score)

Assumes scores are in a dictionary scores, and species names are in a list species.
And yes, we revised this a couple of times…

    def find_min_pair(species, scores):
        '''Find minimum-value pair of species in scores.'''
        min_pair, min_val = None, None
        for pair in combos(species):
            assert pair in scores, \
                'Pair (%s, %s) not in scores' % pair
            if (min_val is None) or (scores[pair] < min_val):
                min_pair, min_val = pair, scores[pair]
        assert min_val is not None, \
            'No minimum value found in scores'
        return min_pair
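If the dictionary is assumed to hold exactly one entry per current pair of species, the same search can be written with the built-in `min`. This is a sketch of an alternative, not the slides' version: `find_min_pair_short` is a hypothetical name, it skips the consistency assertions, and when two scores tie it may return a different pair than the loop above.

```python
def find_min_pair_short(scores):
    """Return the key of the smallest score.

    Assumes `scores` holds exactly the pairs currently in play;
    raises ValueError if the dictionary is empty.
    """
    return min(scores, key=scores.get)

scores = {
    ('human', 'mermaid'): 12,
    ('human', 'vampire'): 13,
    ('human', 'werewolf'): 5,
    ('mermaid', 'vampire'): 15,
    ('mermaid', 'werewolf'): 29,
    ('vampire', 'werewolf'): 6,
}
closest = find_min_pair_short(scores)
```

The explicit loop is longer, but its assertions catch a scores table that has fallen out of step with the species list, which is useful while the program is still being developed.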

    def combos(species):
        '''Generate all combinations of species.'''
        result = []
        for i in range(len(species)):
            for j in range(i+1, len(species)):
                result.append((species[i], species[j]))
        return result

Program so far:

    def combos():
    def find_min_pair():
    if __name__ == '__main__':
        ...main program...
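The standard library's `itertools.combinations` yields the same pairs in the same order, which gives a quick way to check `combos`. The species list below is an assumed example input.

```python
from itertools import combinations

def combos(species):
    '''Generate all combinations of species (as on the slide).'''
    result = []
    for i in range(len(species)):
        for j in range(i + 1, len(species)):
            result.append((species[i], species[j]))
    return result

species = ['human', 'mermaid', 'vampire', 'werewolf']
pairs = combos(species)
same = pairs == list(combinations(species, 2))
```

Four species give six pairs, and because the input list is sorted, every pair comes out in alphabetical order, matching the dictionary keys.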

    def create_new_parent(scores, pair):
        '''Create record for new parent.'''
        parent = '[%s %s]' % pair
        height = scores[pair] / 2.
        return parent, height

Program so far:

    def combos():
    def find_min_pair():
    def create_new_parent():
    if __name__ == '__main__':
        ...main program...

    def remove_entries(species, scores, pair):
        '''Remove species that have been combined.'''
        left, right = pair
        species.remove(left)
        species.remove(right)
        old_score = scores[pair]
        del scores[pair]
        return old_score

Program so far:

    def combos():
    def find_min_pair():
    def create_new_parent():
    def remove_entries():
    if __name__ == '__main__':
        ...main program...

    def update(species, scores, pair, parent, parent_score):
        '''Replace two species from the scores table.'''
        left, right = pair
        for other in species:
            # Remove the old scores against each half of the pair...
            l_score = tidy_up(scores, left, other)
            r_score = tidy_up(scores, right, other)
            # ...and record the score against the new parent instead.
            new_pair = make_pair(parent, other)
            new_score = (l_score + r_score - parent_score)/2.
            scores[new_pair] = new_score
        species.append(parent)
        species.sort()

Program so far:

    def combos():
    def find_min_pair():
    def create_new_parent():
    def remove_entries():
    def update():
    if __name__ == '__main__':
        ...main program...

    def tidy_up(scores, old, other):
        '''Clean out references to old species.'''
        pair = make_pair(old, other)
        score = scores[pair]
        del scores[pair]
        return score

    def make_pair(left, right):
        '''Make an ordered pair of species.'''
        if left < right:
            return (left, right)
        else:
            return (right, left)

Complete program:

    def combos():
    def find_min_pair():
    def create_new_parent():
    def remove_entries():
    def make_pair():
    def tidy_up():
    def update():
    if __name__ == '__main__':
        ...main program...
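A short check of `make_pair` shows how it keeps keys unique. The inputs are assumed examples; the bracketed name is the kind of parent label that `create_new_parent` produces.

```python
def make_pair(left, right):
    '''Make an ordered pair of species (as on the slide).'''
    if left < right:
        return (left, right)
    else:
        return (right, left)

a = make_pair('werewolf', 'human')
b = make_pair('human', 'werewolf')
# Python compares strings character by character, and '[' sorts
# before every lowercase letter, so a parent label comes first.
c = make_pair('mermaid', '[human werewolf]')
```

Whichever order the two names arrive in, the same tuple comes back, so each pair has exactly one possible dictionary key.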

    $ python phylogen.py
    [human werewolf] 2.5
    [[human werewolf] vampire] 3.5
    [[[human werewolf] vampire] mermaid] 6.5

Exercise 1: write unit tests
Exercise 2: reconstruct entire tree
Exercise 3: why does update sort?
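For reference, the pieces can be assembled into one runnable script. The version below is a condensed Python 3 transcription and rests on assumptions: the slides elide the main program, so the `build_tree` wrapper, the starting data, and the choice to collect results in a list instead of printing them inside the loop are all introduced here for illustration.

```python
def combos(species):
    return [(species[i], species[j])
            for i in range(len(species))
            for j in range(i + 1, len(species))]

def make_pair(left, right):
    return (left, right) if left < right else (right, left)

def find_min_pair(species, scores):
    return min(combos(species), key=lambda pair: scores[pair])

def create_new_parent(scores, pair):
    return '[%s %s]' % pair, scores[pair] / 2

def remove_entries(species, scores, pair):
    for name in pair:
        species.remove(name)
    return scores.pop(pair)

def tidy_up(scores, old, other):
    return scores.pop(make_pair(old, other))

def update(species, scores, pair, parent, parent_score):
    left, right = pair
    for other in species:
        l_score = tidy_up(scores, left, other)
        r_score = tidy_up(scores, right, other)
        new_score = (l_score + r_score - parent_score) / 2
        scores[make_pair(parent, other)] = new_score
    species.append(parent)
    species.sort()

def build_tree(species, scores):
    """Hypothetical wrapper: run the main loop on copies of the inputs."""
    species, scores = sorted(species), dict(scores)
    merges = []
    while len(scores) > 0:
        min_pair = find_min_pair(species, scores)
        parent, height = create_new_parent(scores, min_pair)
        merges.append((parent, height))
        old_score = remove_entries(species, scores, min_pair)
        update(species, scores, min_pair, parent, old_score)
    return merges

if __name__ == '__main__':
    # Assumed starting data, taken from the table of scores.
    names = ['human', 'mermaid', 'vampire', 'werewolf']
    table = {('human', 'mermaid'): 12, ('human', 'vampire'): 13,
             ('human', 'werewolf'): 5, ('mermaid', 'vampire'): 15,
             ('mermaid', 'werewolf'): 29, ('vampire', 'werewolf'): 6}
    for parent, height in build_tree(names, table):
        print(parent, height)
```

Returning the merges as a list makes the result easy to check in a unit test, and working on copies means the caller's species list and scores table are left unchanged.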

November 2010 created by Elango Cheran Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See for more information.