Education and Computational Biology
Dean L. Zeller
Kent State University
OCCBIO ’06, July 28-30, 2006


Education of Computational Biology, Slide 2 of 29
“…the great Tree of Life fills with its dead and broken branches the crust of the earth, and covers the surface with its ever-branching and beautiful ramifications.”
Charles Darwin, Father of Evolution

Slide 3 of 29: Initial Inspiration
Colloquium by Dr. Lonnie Welsh on March 15th for the KSU Department of Computer Science: “Extraterrestrials, Cryptanalysis, and Genomes: Perspectives on Bioinformatics Research”
Looking for new perspectives in bioinformatics.
My perspective: educate a younger audience of computational biologists.

Slide 4 of 29: Outline
Goals of research
Evolution trees
Assignment 1 – Atlas of Evolution Trees
Assignment 2 – Atlas of Distance Graphs
Assignment 3 – Phylogeny Reconstruction
Future Work

Slide 5 of 29: Goals of Research
Specific Goals
–Create “teachable” lessons on bioinformatics suitable for a mid-level computer science, mathematics, or biology class.
–Make use of and create more adequate evolution models.
Long Term Goals
–Discover methods of phylogeny reconstruction from a new perspective.
–Educate the next generation of computational biologists.

Slide 6 of 29: Evolution Tree example
Tree inferred by Unweighted Pair Group Method with Arithmetic mean (UPGMA) clustering of the Sarich (1969) immunological distance data set. [Felsenstein, p. 166]

Slide 7 of 29: Evolution Tree example

Slide 8 of 29: Class Assignments
Assignment 1 – Drawing Trees
–The student will use a graphics package to create diagrams of binary evolution trees.
Assignment 2 – Phylogenetic Distance Graphs
–The student will use a graphics package to construct distance graphs (k-leaf powers) for the evolution trees created in Assignment 1.
Assignment 3 – Phylogeny Reconstruction
–The student will demonstrate an algorithm of phylogeny reconstruction from the results of theoretical experiments using the incremental k-leaf power.
(Tested on CS10051 students, Spring 2006)

Slide 9 of 29: Assumptions
Three simplifying assumptions greatly reduce the problem complexity:
1. Redundant nodes are removed.
2. Multiple-split nodes are replaced with isomorphic approximations.
3. Only isomorphically unique trees are considered.

Slide 10 of 29: Assumption #1
Redundant nodes are removed without loss of data.
It is already assumed that the species changes slowly over time, so considering a single intermediate point along the way adds nothing to the problem.
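
Assumption #1 can be illustrated with a minimal sketch. It assumes that a "redundant node" is an internal node with exactly two neighbours (one branch in, one branch out), and that the tree is stored as a dictionary of neighbour sets. The function name `suppress_redundant_nodes` and the node labels are hypothetical, not taken from the presentation.

```python
def suppress_redundant_nodes(adj, keep=()):
    """Return a copy of the tree with every degree-2 node spliced out.

    adj  : dict mapping each node to the set of its neighbours (undirected tree)
    keep : nodes that must never be removed (for example, a designated root)
    """
    tree = {node: set(nbrs) for node, nbrs in adj.items()}
    for node in list(tree):
        if node in keep or len(tree[node]) != 2:
            continue
        # Connect the two neighbours directly and drop the node in between.
        left, right = tree.pop(node)
        tree[left].discard(node)
        tree[right].discard(node)
        tree[left].add(right)
        tree[right].add(left)
    return tree


# Hypothetical example: "m" is a single point along the branch from "a" to "p".
example = {
    "a": {"m"},
    "m": {"a", "p"},
    "p": {"m", "b", "c"},
    "b": {"p"},
    "c": {"p"},
}
simplified = suppress_redundant_nodes(example)
print(simplified)
```

The branching structure among the leaves a, b, and c is unchanged; only the path length through the removed node shrinks, which matters if distances are counted in edges.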

Slide 11 of 29: Assumption #2
Multiple-split nodes are replaced with isomorphic approximations.
Some loss of data, but this greatly reduces the problem complexity.

Slide 12 of 29: Assumption #3
Isomorphically unique trees
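
To give a feel for how much Assumption #3 shrinks the search space, the sketch below counts tree shapes under one possible reading: rooted, strictly binary trees with unlabeled leaves, where swapping the left and right subtree does not produce a new tree. The presentation may count its atlas differently, so the numbers are illustrative only. The function name `count_tree_shapes` is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_tree_shapes(leaves):
    """Count rooted binary tree shapes with `leaves` unlabeled leaves.

    A shape is an unordered pair of subtree shapes, so mirror images
    are counted once.
    """
    if leaves < 1:
        raise ValueError("a tree needs at least one leaf")
    if leaves == 1:
        return 1
    total = 0
    # Split the leaves into a strictly smaller and a strictly larger subtree.
    for small in range(1, (leaves - 1) // 2 + 1):
        total += count_tree_shapes(small) * count_tree_shapes(leaves - small)
    # Equal split: unordered pairs of shapes, repetition allowed.
    if leaves % 2 == 0:
        half = count_tree_shapes(leaves // 2)
        total += half * (half + 1) // 2
    return total


shape_counts = [count_tree_shapes(n) for n in range(1, 9)]
print(shape_counts)  # [1, 1, 1, 2, 3, 6, 11, 23]
```

Under this reading the count grows slowly, which is what makes a hand-drawn atlas of small trees feasible as a class assignment.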

Slide 13 of 29: Assignment 1: Atlas of Evolution Trees
Inspired by An Atlas of Graphs [Read and Wilson, 1999]
Elegant yet simple way to analyze graphs and trees, useful for instructional purposes.
Apply the same style to phylogenies.

Slide 14 of 29: Atlas of Evolution Trees (≤ 5 leaves)

Slide 15 of 29: Atlas of Evolution Trees (6 leaves)

Slide 16 of 29: Assignment 2: Atlas of Distance Graphs (k-leaf powers)
Builds on Assignment 1: create the associated k-leaf powers for each tree.
Useful as a reference for studying the relationship between cliques, k-leaf powers, and k-leaf roots.
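
As a minimal sketch of what Assignment 2 asks students to draw by hand: the k-leaf power of a tree is the graph on the tree's leaves in which two leaves are joined whenever their distance in the tree is at most k. The code below computes it with a breadth-first search from each leaf. The tree, the node names, and the function names `leaf_distances` and `k_leaf_power` are hypothetical illustrations, not material from the presentation.

```python
from collections import deque


def leaf_distances(adj):
    """Return {(x, y): number of edges between leaves x and y} for x < y."""
    leaves = sorted(node for node, nbrs in adj.items() if len(nbrs) == 1)
    dist = {}
    for source in leaves:
        # Breadth-first search gives edge counts from `source` to every node.
        seen = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + 1
                    queue.append(nxt)
        for target in leaves:
            if source < target:
                dist[(source, target)] = seen[target]
    return dist


def k_leaf_power(adj, k):
    """Return the edge set of the k-leaf power of the tree `adj`."""
    return {pair for pair, d in leaf_distances(adj).items() if d <= k}


# Hypothetical tree: leaves a, b hang off u; leaves c, d hang off v; w is the root.
tree = {
    "a": {"u"}, "b": {"u"}, "u": {"a", "b", "w"},
    "c": {"v"}, "d": {"v"}, "v": {"c", "d", "w"},
    "w": {"u", "v"},
}
power_2 = k_leaf_power(tree, 2)
power_4 = k_leaf_power(tree, 4)
print(sorted(power_2))  # [('a', 'b'), ('c', 'd')]
print(len(power_4))     # 6
```

For k = 2 only sibling leaves are joined; by k = 4 every pair of leaves in this small tree is within reach, so the 4-leaf power is the complete graph on four vertices.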

Slide 17 of 29: Atlas of Distance Graphs
[Figure panels: k = 2, k = 3; k = 2, k = 3, k = 4]

Slide 18 of 29: Atlas of Distance Graphs
[Figure panels: k = 2, k = 3, k = 4; k = 2, k = 3, k = 4, k = 5]

Slide 19 of 29: Distance Graph Simulator
[Figure: simulator view with leaves a through i; controls labeled "Graph complete" and k = 2 through k = 8]

Slide 20 of 29: Phylogeny Reconstruction from Binary Genetic Data
Test returns 1 if species x and y are genetically close to a certain degree, and 0 otherwise.
Data collected to form a similarity grid and distance graph (k-leaf power).

Slide 21 of 29: Reconstruction
Step 1 – Difference Summary Table

    a  b  c  d  e  f
a
b         1  0  0  0
c            1  0  0
d               1  1
e                  1
f

Step 2 – k-leaf power
Step 3 – phylogeny (k-leaf root)
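
Step 2 can be sketched in a few lines: each 1 in the summary table becomes an edge of the distance graph. The entries below are the ones recoverable from the table on this slide (species b through f; the row for species a did not survive in this transcript, so it is omitted rather than guessed). The function name `grid_to_graph` is hypothetical.

```python
# Upper-triangular similarity grid: 1 = genetically close, 0 = not close.
grid = {
    ("b", "c"): 1, ("b", "d"): 0, ("b", "e"): 0, ("b", "f"): 0,
    ("c", "d"): 1, ("c", "e"): 0, ("c", "f"): 0,
    ("d", "e"): 1, ("d", "f"): 1,
    ("e", "f"): 1,
}


def grid_to_graph(grid):
    """Turn a pairwise 0/1 grid into an undirected graph of neighbour sets."""
    graph = {}
    for (x, y), close in grid.items():
        # Register both species even if they end up with no edges.
        graph.setdefault(x, set())
        graph.setdefault(y, set())
        if close:
            graph[x].add(y)
            graph[y].add(x)
    return graph


distance_graph = grid_to_graph(grid)
for species in sorted(distance_graph):
    print(species, sorted(distance_graph[species]))
```

Step 3, recovering a tree whose k-leaf power is this graph, is the hard part and is not attempted here.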

Slide 22 of 29: Reconstruction
Linear time solution exists for k = 3 [Brandstädt and Le, 2006]
… and k = 4 [Brandstädt et al, 2006]
An open problem for k ≥ 5
–Severely limits analysis capability.

Slide 23 of 29: Assignment 3: Phylogeny Reconstruction from Discrete Genetic Data
Genetic test returns a discrete value (k = 2, 3, 4, …) denoting the distance between x and y in the tree.
Data collected to form a distance grid.
Create k-leaf powers incrementally.

Slide 24 of 29: Reconstruction
Difference Summary Table

    a  b  c  d  e  f
a
b         3  5  6  6
c            4  5  5
d               3  3
e                  2
f

[Figure panels: k ≤ 2, k ≤ 3, k ≤ 4, k ≤ 5, k ≤ 6]

Slide 25 of 29: Incremental k-leaf power
Distance ≤ 2: direct neighbors
Distance ≤ 3: close relatives
Distance ≤ 4: tree complete
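
The incremental construction can be sketched as follows: start with an empty graph on the species and, for each k in increasing order, add the pairs whose measured distance equals k. The distances are the entries recoverable from the summary table on slide 24 (species b through f; the row for species a is missing from this transcript and is left out). The function name `incremental_leaf_powers` is hypothetical, and this is only an illustration of the idea, not the presentation's simulator.

```python
# Discrete pairwise distances from the slide 24 table (species b..f only).
distances = {
    ("b", "c"): 3, ("b", "d"): 5, ("b", "e"): 6, ("b", "f"): 6,
    ("c", "d"): 4, ("c", "e"): 5, ("c", "f"): 5,
    ("d", "e"): 3, ("d", "f"): 3,
    ("e", "f"): 2,
}


def incremental_leaf_powers(distances):
    """Yield (k, new_edges, all_edges) for k = 2 up to the largest distance."""
    edges = set()
    for k in range(2, max(distances.values()) + 1):
        # Pairs at distance exactly k are the edges that appear at this step.
        new_edges = {pair for pair, d in distances.items() if d == k}
        edges |= new_edges
        yield k, sorted(new_edges), sorted(edges)


steps = list(incremental_leaf_powers(distances))
for k, new_edges, all_edges in steps:
    print(f"k <= {k}: +{new_edges} (total {len(all_edges)} edges)")
```

With these values the first edge to appear is e–f at k = 2, and the edge set at k ≤ 4 is exactly the set of 1-entries in the binary table on slide 21.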

Slide 26 of 29: Literature Review of Related Methods
Additive and Ultrametric Trees [Wu and Chao, 2004]
Minimum Increment Evolution Tree (MEIT) [Wu and Chao, 2004]
Evolutionary Tree Insertion with Minimum Increment (ETIMI) [Wu and Chao, 2004]
Maximum Homeomorphic Agreement Subtree (MHT) [Gąsieniec et al, 1997]
Maximum Agreement Subtree (MAST) [Gąsieniec et al, 1997]
Maximum Inferred Consensus Tree (MICT) [Lingas et al, 1999]
Maximum Inferred Local Consensus Tree (MILCT) [Lingas et al, 1999]
Balanced Randomized Tree Splitting (BRTS) [Kao et al, 1999]
Merging Partial Evolution Trees (MPET) [Lingas et al, 1999]

Slide 27 of 29: Future Work
Additional class assignments
Implement the Phylogeny Reconstruction Simulator using NetworkX
Remove the redundant node and isomorphic approximation assumptions

Slide 28 of 29: References
[Br06a] Brandstädt, A. and V. B. Le (2006). “Structure and Linear Time Recognition of 3-Leaf Powers”, Information Processing Letters (98).
[Br06b] Brandstädt, A., V. B. Le, and R. Sritharan (2005). “Structure and Linear Time Recognition of 4-Leaf Powers”, Unpublished manuscript.
[Fe04] Felsenstein, J. (2004). Inferring Phylogenies, Sinauer Associates, Inc.
[Ga97] Gąsieniec, L., J. Jansson, A. Lingas, and A. Östlin (1997). “On the complexity of computing evolutionary trees”, Proceedings of Computing and Combinatorics, Third Annual International Conference COCOON ’97, Shanghai, China, pp. 134 to 145, Aug 97.
[Ka99] Kao, Y., A. Lingas, and A. Östlin (1999). “Balanced Randomized Tree Splitting with Applications to Evolutionary Tree Constructions”, Proceedings of the 16th Annual Symposium on Theoretical Aspects of Computer Science, Trier, Germany, pp. 184 to 196, March 1999.
[Li99] Lingas, A., H. Olsson, and A. Östlin (1999). “Efficient Merging, Construction, and Maintenance of Evolutionary Trees”, Proceedings of the 26th International Colloquium on Automata, Languages, and Programming (ICALP) ’99, Prague, Czech Republic, pp. 544 to 553, July 1999.
[Re99] Read, R. C. and R. J. Wilson (1999). An Atlas of Graphs, Oxford Science Publications.
[Wu04] Wu, B. Y. and K. M. Chao (2004). Spanning Trees and Optimization Problems, Chapman & Hall/CRC.

Slide 29 of 29: Thank You
The full text of the paper, assignments, this presentation, and student examples are available on the author’s web page: