CSE 280A: Advanced Topics in Computational Molecular Biology

Algorithmic strategies for the reconstruction of mitochondrial DNA phylogeny. CSE 280A: Advanced Topics in Computational Molecular Biology. Harish Nagarajan and Nitin Udpa

Background: Finding the ancestry of human populations is an interesting problem that can yield insights into fields as diverse as anthropology and medicine (e.g., the genetic basis of diseases). A logical way to determine ancestry is to construct a phylogenetic tree. Saturday, October 13, 2018 Harish and Nitin

Why mtDNA? Recombination makes the reconstruction problem almost impossible. mtDNA is maternally inherited and, unlike the autosomal chromosomes, does not undergo recombination: each individual's mtDNA comes from a single parent.

Genographic Project: IBM and National Geographic, The Atlas of the Human Journey. https://www3.nationalgeographic.com/genographic/atlas.html

Algorithmic bottlenecks: Infinite sites assumption: at most one mutation can occur at any site. If this assumption holds, reconstruction is easy: each site splits the individuals into two groups (a dichotomy), and in the ideal case these dichotomies define a tree with mutations on its edges and individuals at its leaves. However, the assumption may not hold, particularly as the time scale of the phylogeny increases.
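Under the infinite sites assumption every pair of sites must be compatible. A standard way to check this on binary SNP data is the four-gamete test: a 0/1 matrix admits an (unrooted) perfect phylogeny exactly when no pair of columns shows all four combinations 00, 01, 10 and 11. The following is a minimal sketch, assuming the SNP matrix is a list of equal-length rows of 0/1 integers (rows are individuals, columns are sites); the function names are hypothetical.

```python
from itertools import combinations


def four_gamete_violation(matrix, i, j):
    """Return True if columns i and j together show all four gametes."""
    gametes = {(row[i], row[j]) for row in matrix}
    return len(gametes) == 4


def incompatible_pairs(matrix):
    """List every pair of columns (i, j), i < j, that fails the four-gamete test."""
    n_cols = len(matrix[0]) if matrix else 0
    return [(i, j) for i, j in combinations(range(n_cols), 2)
            if four_gamete_violation(matrix, i, j)]


# Rows are individuals, columns are sites.
snps = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 1],
]
print(incompatible_pairs(snps))  # [(0, 1), (1, 2)]
```

An empty result means all sites are pairwise compatible, which is the situation in which the dichotomies fit on a single tree; each reported pair is evidence that at least one of its two sites violates the infinite sites assumption.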

Goals of the Project: Develop an efficient algorithm to reconstruct the phylogeny while accounting for recurrent mutations. Scalability and robustness: the algorithm should be able to handle thousands of individuals. Apply the algorithm to data from previous studies (e.g., Torroni et al.) and compare the resulting phylogenies.

Previous Work: Torroni et al., "Harvesting the fruit of the human mtDNA tree" (Trends in Genetics). They reconstructed the phylogeny of 25 individuals from Africa, taking recurrent mutations into account.

Figure: Torroni et al. (2006), TIG.

General Strategy: Fetch the sequences; perform a multiple sequence alignment; generate the SNP matrix; apply the heuristic we develop; generate the tree.
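The "generate SNP matrix" step can be sketched as below. This is a minimal illustration, not the authors' pipeline, and it rests on several assumptions: the sequences are already aligned to equal length, columns containing gaps or ambiguity codes are skipped, only biallelic sites are kept, and each site is coded 0 for the majority base and 1 for the minority base (ties broken alphabetically). Coding against an outgroup or reference sequence would be an equally valid choice. The name `snp_matrix` is hypothetical.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def snp_matrix(aligned):
    """Convert aligned sequences into a binary SNP matrix.

    Returns (positions, matrix): positions are the 0-based alignment columns
    kept, and matrix[r][c] is 0 if sequence r carries the majority base at
    positions[c] and 1 otherwise.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")

    positions, majors = [], []
    for pos in range(length):
        column = [seq[pos].upper() for seq in aligned]
        # Skip columns with gaps or ambiguity codes (a choice of this sketch).
        if not set(column) <= VALID_BASES:
            continue
        counts = Counter(column)
        # Keep only biallelic sites; monomorphic and multi-allelic ones are dropped.
        if len(counts) != 2:
            continue
        # Majority base; sorting first makes ties resolve alphabetically.
        major = max(sorted(counts), key=counts.__getitem__)
        positions.append(pos)
        majors.append(major)

    matrix = [[0 if seq[pos].upper() == major else 1
               for pos, major in zip(positions, majors)]
              for seq in aligned]
    return positions, matrix


print(snp_matrix(["ACGTA", "ACGTT", "GCGTT", "AC-TT"]))
# ([0, 4], [[0, 1], [0, 0], [1, 0], [0, 0]])
```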

The Actual Data Set: We downloaded 3,900+ mtDNA sequences (~16 kb each) from NCBI (http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Search&db=nucleotide&term=Homo%5BOrganism%5D%20AND%20mitochondrion%5BAll%20Fields%5D%20AND%2015000%3A17000%5BSLEN%5D%20NOT%20pseudogene%5BAll%20Fields%5D&dispmax=100&doptcmdl=DocSum).
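The same query can be issued programmatically through NCBI's E-utilities. The sketch below only builds an ESearch request URL from the search term decoded from the link above (`Homo[Organism] AND mitochondrion[All Fields] AND 15000:17000[SLEN] NOT pseudogene[All Fields]`); it does not contact NCBI. The `retmax`/`retstart` defaults and the helper name `esearch_url` are assumptions for illustration, and the returned IDs would still have to be downloaded separately (for example with EFetch in FASTA format).

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Search term decoded from the NCBI query URL on the slide.
TERM = ("Homo[Organism] AND mitochondrion[All Fields] "
        "AND 15000:17000[SLEN] NOT pseudogene[All Fields]")


def esearch_url(term=TERM, retmax=5000, retstart=0):
    """Build an ESearch URL for the nucleotide database.

    retmax/retstart page through the result IDs; the defaults here are
    arbitrary choices for this sketch.
    """
    params = {
        "db": "nucleotide",
        "term": term,
        "retmax": retmax,
        "retstart": retstart,
    }
    # urlencode percent-escapes brackets and colons and turns spaces into '+'.
    return ESEARCH + "?" + urlencode(params)


print(esearch_url(retmax=100))
```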

Perfect Phylogeny: We implemented an algorithm by Dan Gusfield that reconstructs a tree for a sample drawn from a coalescent population. Steps to test for PP: radix sort the columns of the SNP matrix (treating each column as a binary number) and delete any duplicate columns.
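A minimal sketch of the sorting-based perfect phylogeny test in the form given in Gusfield's textbook treatment; it is not the authors' implementation. It assumes a rooted perfect phylogeny whose ancestral sequence is all zeros. Columns are ordered as decreasing binary numbers (top row as most significant bit) and duplicates removed. Then, for every 1-cell (i, j), L(i, j) is the rightmost column k < j where row i also has a 1 (-1 if none). The matrix has a perfect phylogeny exactly when, in each column, all 1-cells share the same L value. Python's built-in sort stands in for the radix sort, so this sketch runs in O(nm log m) rather than linear time; the function names are hypothetical.

```python
def sort_and_dedup(matrix):
    """Order columns as decreasing binary numbers and drop duplicate columns.

    Returns (kept, sorted_matrix): kept lists the original column indices in
    their new order.
    """
    n_cols = len(matrix[0]) if matrix else 0
    first_index = {}
    for c in range(n_cols):
        col = tuple(row[c] for row in matrix)
        first_index.setdefault(col, c)  # keep the first copy of each pattern
    # Equal-length 0/1 tuples compare like binary numbers (first row = most
    # significant bit), so reverse=True gives decreasing order.
    order = sorted(first_index, reverse=True)
    kept = [first_index[col] for col in order]
    sorted_matrix = [[row[c] for c in kept] for row in matrix]
    return kept, sorted_matrix


def has_perfect_phylogeny(matrix):
    """Test for a rooted perfect phylogeny (all-zero ancestor)."""
    _, m = sort_and_dedup(matrix)
    n_cols = len(m[0]) if m else 0
    column_L = [set() for _ in range(n_cols)]
    for row in m:
        last_one = -1  # index of the most recent 1 to the left in this row
        for j, bit in enumerate(row):
            if bit == 1:
                column_L[j].add(last_one)
                last_one = j
    # Every 1-cell in a column must point back to the same column.
    return all(len(values) <= 1 for values in column_L)


print(has_perfect_phylogeny([[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 0]]))  # True
print(has_perfect_phylogeny([[1, 0], [1, 1], [0, 1]]))  # False
```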

Constructing the PP
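Once a matrix passes the test, the tree can be built by walking each row's 1-columns in sorted order from the root, which works because every column then precedes the columns nested inside it. The sketch below assumes the input already has a rooted perfect phylogeny with an all-zero ancestor. It merges identical columns onto a single edge and writes the result as a Newick string, a plain-text tree format that viewers such as TreeView can display. The labels `s<j>` for sites and `ind<r>` for individuals are hypothetical naming choices.

```python
def build_tree(matrix):
    """Build a rooted perfect phylogeny as nested dicts.

    Each node is {"children": {sites: node}, "leaves": [row indices]}, where
    sites is the tuple of original column indices mutated on the edge into
    the child.
    """
    n_cols = len(matrix[0]) if matrix else 0
    groups = {}  # column pattern -> original site indices sharing it
    for c in range(n_cols):
        groups.setdefault(tuple(row[c] for row in matrix), []).append(c)
    # Decreasing binary order puts each column before the columns it contains.
    patterns = sorted(groups, reverse=True)

    root = {"children": {}, "leaves": []}
    for r in range(len(matrix)):
        node = root
        for p in patterns:
            if p[r] == 1:
                key = tuple(groups[p])
                node = node["children"].setdefault(key, {"children": {}, "leaves": []})
        node["leaves"].append(r)  # the individual hangs off the last node reached
    return root


def to_newick(node, label=""):
    """Serialize a node; internal node labels name the mutations above it."""
    parts = [to_newick(child, "_".join(f"s{s}" for s in sites))
             for sites, child in node["children"].items()]
    parts += [f"ind{r}" for r in node["leaves"]]
    return "(" + ",".join(parts) + ")" + label


def newick(matrix):
    return to_newick(build_tree(matrix)) + ";"


print(newick([[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 0]]))
# (((ind0)s1,ind1)s0,(ind2)s2,ind3);
```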

Progress: We ran the PP algorithm on a simulated sample of 200 individuals and 100 sites generated with Hudson's coalescent simulator (ms).
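To feed ms output into the PP code, its haplotype block has to be parsed into a 0/1 matrix. The exact command used is not given on the slide; something like `ms 200 1 -s 100` (200 sampled chromosomes, one replicate, 100 segregating sites) is an assumption. Each replicate in ms output starts with a `//` line, followed by `segsites:`, `positions:` and one 0/1 string per sampled chromosome. The sketch below parses that layout; `parse_ms` and the small `SAMPLE` text are hypothetical and hand-written to mimic the format.

```python
def parse_ms(text):
    """Parse ms output into a list of (positions, matrix) pairs, one per replicate."""
    replicates = []
    # Everything before the first '//' is the command line and random seeds.
    for block in text.split("//")[1:]:
        lines = [ln.strip() for ln in block.strip().splitlines() if ln.strip()]
        segsites = int(lines[0].split(":")[1])
        if segsites == 0:
            replicates.append(([], []))  # no variable sites in this replicate
            continue
        positions = [float(x) for x in lines[1].split(":")[1].split()]
        # Remaining lines are haplotypes: one 0/1 character per segregating site.
        matrix = [[int(ch) for ch in ln] for ln in lines[2:] if set(ln) <= {"0", "1"}]
        replicates.append((positions, matrix))
    return replicates


SAMPLE = """ms 4 1 -s 3
27473 36154 10290

//
segsites: 3
positions: 0.1000 0.4000 0.7000
100
110
001
000
"""

print(parse_ms(SAMPLE))
# [([0.1, 0.4, 0.7], [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 0, 0]])]
```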

Figure: tree generated using TreeView.

Future Work: Use mlcoalsim to generate populations with recurrent mutations (for easy testing of the phylogeny algorithm). Implement Fjola Bjornsdottir's algorithm for aligning the mtDNA sequences. Create a SNP matrix from the mtDNA sequences. Run the PP algorithm on the matrix generated by mlcoalsim.

Accounting for recurrent mutations: Columns that show four-gamete violations with several other columns are candidates for recurrent mutations. The weight of a column is inversely correlated with the likelihood that it carries a recurrent mutation, where weight = the number of columns with an identical distribution across individuals. Separate the SNPs into clades (analogous to a clustering approach) and build a coarse-grained phylogeny at the clade level.
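The first two ideas on this slide can be sketched as a scoring pass. For each column, count how many other columns it fails the four-gamete test with, and record its weight (how many columns share its exact 0/1 pattern). Columns with many conflicts and low weight are then ranked as the most likely recurrent-mutation sites. The threshold `min_conflicts`, the ranking rule and the function names are hypothetical; the slides do not say how the two signals are combined, and the clade-level step is not covered here.

```python
from collections import Counter
from itertools import combinations


def column_scores(matrix):
    """Return (column, conflicts, weight) for every column.

    conflicts: number of other columns this one fails the four-gamete test with.
    weight: number of columns (including itself) with the identical pattern.
    """
    n_cols = len(matrix[0]) if matrix else 0
    cols = [tuple(row[c] for row in matrix) for c in range(n_cols)]
    weight = Counter(cols)
    conflicts = [0] * n_cols
    for i, j in combinations(range(n_cols), 2):
        if len(set(zip(cols[i], cols[j]))) == 4:
            conflicts[i] += 1
            conflicts[j] += 1
    return [(c, conflicts[c], weight[cols[c]]) for c in range(n_cols)]


def recurrent_candidates(matrix, min_conflicts=2):
    """Flag columns with at least min_conflicts violations, most suspicious first."""
    flagged = [s for s in column_scores(matrix) if s[1] >= min_conflicts]
    # More conflicts first; among ties, lighter (less replicated) columns first.
    return sorted(flagged, key=lambda s: (-s[1], s[2], s[0]))


# Columns 0 and 1 are identical; column 3 cuts across the other splits.
m = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
print(recurrent_candidates(m))  # [(3, 3, 1)]
```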

Recurrent mutations (contd.): Implement the aforementioned heuristics on the simulated data, then scale them to the actual SNP matrix generated from the mtDNA sequences.

Acknowledgements: Dr. Vineet Bafna, Fjola Bjornsdottir