Incorporating Mutations

Presentation transcript:

Incorporating Mutations
Previously we allowed for gene variants (alleles), but without a model of how they came into being.
Rather than the coalescence of a single gene, we next consider successive generations of gene sets.
Two things to consider:
  Variants of a gene (alleles)
  Variants in allele combinations (sequences)
We begin by treating each independently.
[Figure: gene sets across successive generations Gn, Gn+1, Gn+2, Gn+3, Gn+4]
4/17/2017 Comp 790– Genealogies to Sequences

Infinite Alleles Model
Assumes that all that can be known is whether two alleles are identical or different.
No spatial (i.e. sequence position) or quantitative information about the observed differences is kept.
Only keeps track of how many copies of each allele type exist.
The number of mutations that produced a variant is lost.
Two event types: splits and mutations.
Labels are arbitrary.
[Figure: example history of allele configurations, in extracted order: (A); (A,A); (B)(A); (B)(A); (B)(A,A); (B)(A)(C); (B)(A)(C,C); (B,B)(A)(C,C); (B)(D)(A)(C,C); (B)(D)(A)(C,C); sampled alleles B D A C C]
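The two event types can be sketched as operations on allele counts. This is a minimal illustration, not the course's code: the function names `split` and `mutate` are hypothetical, and the event list is one reading of the figure's sequence of configurations.

```python
from collections import Counter

def split(state, allele):
    """A lineage carrying `allele` is copied: its count goes up by one."""
    if state[allele] == 0:
        raise ValueError(f"allele {allele!r} is not present")
    new_state = Counter(state)
    new_state[allele] += 1
    return new_state

def mutate(state, allele, new_label):
    """One copy of `allele` becomes a brand-new allele type.

    Only identity is tracked, so the new type just gets a fresh,
    arbitrary label; where or how often it mutated is not recorded.
    """
    if state[allele] == 0:
        raise ValueError(f"allele {allele!r} is not present")
    if state[new_label] > 0:
        raise ValueError(f"label {new_label!r} is already in use")
    new_state = Counter(state)
    new_state[allele] -= 1
    if new_state[allele] == 0:
        del new_state[allele]
    new_state[new_label] = 1
    return new_state

# Assumed reading of the figure: (A) -> (A,A) -> (B)(A) -> (B)(A,A)
# -> (B)(A)(C) -> (B)(A)(C,C) -> (B,B)(A)(C,C) -> (B)(D)(A)(C,C)
state = Counter({"A": 1})
state = split(state, "A")
state = mutate(state, "A", "B")
state = split(state, "A")
state = mutate(state, "A", "C")
state = split(state, "C")
state = split(state, "B")
state = mutate(state, "B", "D")
final_state = state
```

Under this reading the final configuration holds one copy each of A, B and D and two copies of C, which agrees with the sampled alleles B D A C C listed in the figure.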

Infinite Sites Model
Assumes mutations are rare events.
Assumes DNA sequences are large.
Multiple mutations at the same site are extremely rare.
The Infinite Sites Model assumes that multiple mutations never occur at the same sequence position.
Thus, all genes are “biallelic”.
[Figure: haplotypes across generations, in extracted order: -0-0-0-0-0-; -1-0-0-0-0- (followed by the label "Lost haplotype"); -1-1-0-0-0-; -0-0-1-0-0-; -1-1-0-1-0-; -1-1-0-0-0-; -1-1-0-1-0-; -0-0-0-0-1-; -0-0-1-0-0-; -0-0-1-0-0-]
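A small sketch of the "one mutation per site" rule, assuming haplotypes are stored as tuples of 0s and 1s. The helper name `mutate_at_new_site` and the particular order of mutations are illustrative assumptions; only the resulting haplotypes are taken from the figure.

```python
def mutate_at_new_site(haplotype, site, used_sites):
    """Return a child haplotype with a mutation at `site`.

    The infinite sites assumption is enforced by refusing any site
    that has already mutated somewhere in the history.
    """
    if site in used_sites:
        raise ValueError(f"site {site} has already mutated")
    used_sites.add(site)
    child = list(haplotype)
    child[site] = 1
    return tuple(child)

used_sites = set()
ancestor = (0, 0, 0, 0, 0)
h_a = mutate_at_new_site(ancestor, 0, used_sites)  # -1-0-0-0-0-
h_b = mutate_at_new_site(h_a, 1, used_sites)       # -1-1-0-0-0-
h_c = mutate_at_new_site(h_b, 3, used_sites)       # -1-1-0-1-0-
h_d = mutate_at_new_site(ancestor, 2, used_sites)  # -0-0-1-0-0-
h_e = mutate_at_new_site(ancestor, 4, used_sites)  # -0-0-0-0-1-
```

Because each site mutates at most once, every site shows only the ancestral state 0 or the derived state 1.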

SNP Panels
Observed haplotypes and SNPs from the previous example.
Under the Infinite Sites Model the haplotype size equals the number of historical mutations.
While sequences can be lost, alleles cannot, in contrast to the Infinite Alleles Model.
SNP Diversity Patterns (SDPs) can be repeated (e.g. S1 and S2).
Since the assignment of 1s and 0s is arbitrary, a SNP and its complement share the same SDP.
For N haplotypes, there are at most 2^(N-1) - 1 “possible” SDPs.
[Table: SNP panel with columns S1 to S5 and rows H1 to H4; the cell values were lost in extraction]
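The 2^(N-1) - 1 bound can be checked by brute force. The sketch below treats a column and its complement as the same SDP and discards the column with no variation. The names `sdp` and `count_possible_sdps` are hypothetical. The panel rows are the four haplotypes that appear to be observed in the earlier figure, and their assignment to H1..H4 is an assumption.

```python
from itertools import product

def sdp(column):
    """Canonical form of a SNP column: flip it so the first entry is 0."""
    column = tuple(column)
    if column[0] == 1:
        return tuple(1 - x for x in column)
    return column

def count_possible_sdps(n):
    """Count distinct SDPs over n haplotypes by enumeration."""
    patterns = {sdp(c) for c in product((0, 1), repeat=n)}
    patterns.discard((0,) * n)  # a column with no variation is not a SNP
    return len(patterns)

# Assumed panel (row order is a guess; the original table was lost).
panel = [
    (1, 1, 0, 0, 0),
    (1, 1, 0, 1, 0),
    (0, 0, 0, 0, 1),
    (0, 0, 1, 0, 0),
]
columns = list(zip(*panel))          # one tuple per SNP S1..S5
panel_sdps = [sdp(c) for c in columns]
```

With this assumed panel, the first two columns reduce to the same canonical pattern, which is consistent with the remark that S1 and S2 repeat an SDP.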

A Different Kind of Tree
Unrooted “perfect” phylogeny.
Nodes correspond to haplotypes (both visible and historical).
Edges correspond to SNPs.
Removal of an edge creates a bipartition.
Tree leaves correspond to mutations (allele variants) that are unique to a sequence, i.e. an SDP with only one minority allele instance, a singleton.
[Figure: tree over the haplotypes -0-0-1-0-0-, -0-0-0-0-0-, -1-0-0-0-0-, -0-0-0-0-1-, -1-1-0-0-0-, -1-1-0-1-0-]
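The edge-to-SNP correspondence can be illustrated on the six haplotypes in the figure. The sketch assumes the tree joins exactly those haplotypes that differ at one site. The figure's actual drawing is not recoverable, so that topology is an assumption, and the helper names are hypothetical.

```python
from collections import deque

NODES = ["00100", "00000", "10000", "00001", "11000", "11010"]

def hamming(a, b):
    """Number of positions at which two haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))

def build_edges(nodes):
    """Map each pair at Hamming distance 1 to the site (SNP) that differs."""
    edges = {}
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            if hamming(a, b) == 1:
                site = next(k for k in range(len(a)) if a[k] != b[k])
                edges[(a, b)] = site
    return edges

def component(nodes, edges, start, removed):
    """Nodes reachable from `start` once the edge `removed` is deleted."""
    adjacency = {n: [] for n in nodes}
    for (a, b) in edges:
        if (a, b) != removed:
            adjacency[a].append(b)
            adjacency[b].append(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen

EDGES = build_edges(NODES)

def bipartition_matches_snp(edge):
    """Check that deleting `edge` splits the nodes by their allele at its SNP."""
    a, _ = edge
    site = EDGES[edge]
    same_allele = {n for n in NODES if n[site] == a[site]}
    return component(NODES, EDGES, a, edge) == same_allele
```

Under this assumed topology each of the five edges carries a different SNP, and cutting an edge separates the haplotypes with a 0 at that SNP from those with a 1.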

Build a Phylogenetic Tree
Assume we only have direct access to observed haplotypes.
Construct a pairwise distance matrix between haplotypes using Hamming distances.
Add the smallest edge between any pair of nodes that does not introduce a loop.
If the smallest distance d is greater than 1, add d-1 “hidden” nodes between the pair so that adjacent nodes have a Hamming distance of 1.
Augment the distance matrix with the new nodes and claim the introduced edges.
Repeat finding the smallest distance and augmenting until the graph is fully connected.
[Figure: SNP panel (columns S1 to S5, rows H1 to H4), successive distance matrices over H1 to H4 and the hidden nodes HA and HB, and the resulting tree over -0-0-1-0-0-, -0-0-0-0-1-, -0-0-0-0-0-, -1-1-0-0-0-, -1-1-0-1-0-, -1-0-0-0-0-; the table and matrix entries were scrambled in extraction]
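The first two steps (the Hamming distance matrix and picking the closest pair) can be sketched as follows. This deliberately stops before the hidden-node step, since the slide does not say which intermediate haplotypes to insert when a pair is more than one mutation apart. The haplotype list and its ordering are assumptions based on the earlier figure, and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of sites at which two haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))

def distance_matrix(haplotypes):
    """Full symmetric matrix of pairwise Hamming distances."""
    return [[hamming(a, b) for b in haplotypes] for a in haplotypes]

def closest_pair(haplotypes):
    """Return (distance, i, j) for the closest pair with i < j."""
    pairs = [
        (hamming(haplotypes[i], haplotypes[j]), i, j)
        for i in range(len(haplotypes))
        for j in range(i + 1, len(haplotypes))
    ]
    return min(pairs)

def hidden_nodes_needed(distance):
    """A pair at distance d needs d - 1 intermediate nodes."""
    return max(distance - 1, 0)

# Assumed observed haplotypes (the original panel values were lost).
observed = ["11000", "11010", "00001", "00100"]
matrix = distance_matrix(observed)
best = closest_pair(observed)
```

For this assumed input the closest pair is already one mutation apart, so the first edge needs no hidden node; the pair `00001` and `00100` is two mutations apart and would need one.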

Four-Gamete Test
Under the infinite sites model, every pair of SNPs exhibits no more than 3 of the 4 possible allele combinations.
This is a direct consequence of allowing only one mutation per site.
Showing that all SNP pairs satisfy the four-gamete test is a necessary and sufficient condition for a perfect phylogeny tree to exist.
[Table: SNP panel with columns S1 to S5 and rows H1 to H4; the cell values were lost in extraction]
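A direct check of the test is short enough to write out. The sketch below examines every pair of SNP columns and reports failure if all four combinations 00, 01, 10 and 11 occur. The function name `four_gamete_ok` is hypothetical, and the first input is the assumed set of observed haplotypes from the earlier figure.

```python
from itertools import combinations

def four_gamete_ok(haplotypes):
    """True if no pair of sites shows all four allele combinations."""
    n_sites = len(haplotypes[0])
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            return False
    return True

# Assumed observed haplotypes (the original panel values were lost).
compatible = ["11000", "11010", "00001", "00100"]
# Two sites showing every combination, so no perfect phylogeny exists.
incompatible = ["00", "01", "10", "11"]

compatible_result = four_gamete_ok(compatible)
incompatible_result = four_gamete_ok(incompatible)
```

The check compares every pair of sites across all haplotypes, so its cost grows with the square of the number of SNPs, which is fine for panels of this size.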

Hard Questions
Which SDPs are compatible with any other SNP?
  Singleton SNPs are compatible with any other SNP.
Given N distinct haplotype sequences resulting from an infinite sites model, what is the minimum number of SDPs?
  N-1 edges are the fewest necessary to connect N haplotypes into a “linear” tree. How many singleton SNPs occur in such a tree? 2.
Given N distinct haplotype sequences resulting from an infinite sites model, what is the maximum number of SDPs?
  2N-3 edges, the number of edges in an unrooted tree with N leaves.

Exercise
Consider the following SNP panel.
Does it satisfy the four-gamete test?
Construct the tree.
Is the SDP 11001^T possible?
[Table: SNP panel with columns S1 to S5 and rows H1 to H5; the cell values were lost in extraction]
4/17/2017 Comp 790– Continuous-Time Coalescence