Algorithmic strategies for the reconstruction of mitochondrial DNA phylogeny CSE 280A: Advanced Topics in Computational Molecular Biology Harish Nagarajan and Nitin Udpa
Background: Finding the ancestry of human populations is an interesting problem that can yield insights into fields as diverse as anthropology and medicine (e.g., the genetic basis of diseases). A logical method of determining ancestry is to construct a phylogenetic tree. Saturday, October 13, 2018 Harish and Nitin
Why mtDNA? Recombination makes the problem of reconstruction almost impossible. mtDNA is maternally inherited, so, unlike autosomal chromosomes, it does not undergo recombination. Each individual has a unique parent.
Genographic Project (IBM and National Geographic): The Atlas of the Human Journey https://www3.nationalgeographic.com/genographic/atlas.html
Algorithmic bottlenecks: The infinite-sites assumption states that at most one mutation can occur at any site. If this is valid, reconstruction can be done easily: each mutation splits the individuals into carriers and non-carriers, and in the ideal case this set of dichotomies leads to a tree with mutations defined on its edges and individuals as its leaves. However, the assumption may not be valid, particularly as the time scale for the phylogeny increases.
Goals of the Project: Develop an efficient algorithm to reconstruct the phylogeny while accounting for recurrent mutations. The algorithm should be scalable and robust, and able to handle thousands of individuals. Apply the algorithm to the data of previous studies (e.g., Torroni et al.) and compare the phylogenies.
Previous Work: Torroni et al., "Harvesting the fruit of the human mtDNA tree" (Trends in Genetics). They reconstructed the phylogeny of 25 individuals from Africa, taking recurrent mutations into account.
Torroni et al. (2006), TIG
General Strategy: (1) fetch the sequences; (2) perform a multiple sequence alignment; (3) generate the SNP matrix; (4) implement the heuristic developed; (5) generate the tree.
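Step (3) above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the function name `snp_matrix` is hypothetical, the sequences are assumed to be already aligned to equal length, and the first sequence is arbitrarily treated as the reference (state 0), which is an assumption the slides do not make.

```python
from typing import List, Tuple


def snp_matrix(aligned: List[str]) -> Tuple[List[int], List[List[int]]]:
    """Build a binary SNP matrix from equal-length aligned sequences.

    Returns (positions, matrix): positions are the 0-based alignment
    columns kept as SNPs; matrix[i][k] is 0 if individual i matches the
    reference at positions[k] and 1 otherwise.
    Assumption: the first sequence serves as the reference (state 0).
    """
    if not aligned:
        return [], []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")

    reference = aligned[0]
    positions = []
    for j in range(length):
        alleles = {seq[j] for seq in aligned}
        # Keep only biallelic sites made of unambiguous bases;
        # gaps, N's, and sites with three or more alleles are skipped.
        if len(alleles) == 2 and alleles <= set("ACGT"):
            positions.append(j)

    matrix = [[0 if seq[j] == reference[j] else 1 for j in positions]
              for seq in aligned]
    return positions, matrix


if __name__ == "__main__":
    toy = ["ACGT", "ACGA", "TCGA", "ACGT"]
    print(snp_matrix(toy))
```

Restricting the matrix to biallelic sites keeps it binary, which is what the perfect phylogeny test expects; sites with three or more alleles would need separate handling.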
The Actual Data Set: Downloaded 3900+ mtDNA sequences (~16 kb each) from NCBI (http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Search&db=nucleotide&term=Homo%5BOrganism%5D%20AND%20mitochondrion%5BAll%20Fields%5D%20AND%2015000%3A17000%5BSLEN%5D%20NOT%20pseudogene%5BAll%20Fields%5D&dispmax=100&doptcmdl=DocSum).
Perfect Phylogeny: Implemented an algorithm by Dan Gusfield to reconstruct a tree given a coalescent population. Steps to test for a perfect phylogeny (PP): radix sort the columns of the SNP matrix, treating each column as a binary number, and delete any repeats.
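A minimal sketch of the PP test, assuming a binary SNP matrix in which 0 is the ancestral state. The function name `has_perfect_phylogeny` is hypothetical, and Python's built-in sort stands in for the radix sort named above; the check itself follows the usual statement of Gusfield's test (for every 1 entry, the nearest 1 to its left in the same row must sit in the same column for all rows).

```python
from typing import Dict, List, Tuple


def has_perfect_phylogeny(matrix: List[List[int]]) -> bool:
    """Test whether a binary matrix (rows = individuals, columns = sites,
    0 = ancestral state) admits a rooted perfect phylogeny."""
    if not matrix or not matrix[0]:
        return True

    # Columns as tuples: set() deletes repeats, and sorting the tuples in
    # reverse order equals sorting the columns as binary numbers, largest
    # first (row 0 is the most significant bit).
    columns = sorted(set(zip(*matrix)), reverse=True)

    # For each cell (i, j) holding a 1, record the 1-based index of the
    # nearest column to the left that also has a 1 in row i (0 if none).
    cell_prev: Dict[Tuple[int, int], int] = {}
    col_prev = [0] * (len(columns) + 1)  # maximum of cell_prev per column
    for i in range(len(matrix)):
        prev = 0
        for j, col in enumerate(columns, start=1):
            if col[i] == 1:
                cell_prev[(i, j)] = prev
                col_prev[j] = max(col_prev[j], prev)
                prev = j

    # A perfect phylogeny exists iff every 1 in a column agrees on its
    # left neighbour.
    return all(prev == col_prev[j] for (_, j), prev in cell_prev.items())


if __name__ == "__main__":
    print(has_perfect_phylogeny([[1, 1, 0], [1, 0, 0], [0, 0, 1]]))
    print(has_perfect_phylogeny([[1, 0], [0, 1], [1, 1]]))
```

A comparison sort costs an extra logarithmic factor over a radix sort, which is unlikely to matter for a sketch but may for thousands of individuals.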
Constructing the PP
Progress: Ran the PP algorithm on a simulated sample of 200 individuals and 100 sites generated with Hudson's coalescent simulator (ms).
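To feed simulated data into the PP test, the ms output has to be turned into a 0/1 matrix. The sketch below is an assumption about how that could be done, not the code used in the project: `parse_ms` is a hypothetical helper, and it relies on the usual ms text layout in which each replicate starts with a `//` line, followed by `segsites:` and `positions:` lines and then one 0/1 haplotype string per sampled individual.

```python
from typing import List


def parse_ms(text: str) -> List[List[List[int]]]:
    """Parse ms-style output into one 0/1 matrix per replicate.

    Each matrix has one row per sampled individual and one column per
    segregating site. Lines before the first '//' (command line, seeds)
    are ignored, as are the 'segsites:' and 'positions:' lines.
    """
    replicates: List[List[List[int]]] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "//":
            current = []                 # start a new replicate
            replicates.append(current)
        elif current is not None and line and set(line) <= {"0", "1"}:
            current.append([int(ch) for ch in line])
    return replicates


if __name__ == "__main__":
    # Hand-written sample in the assumed layout, not real simulator output.
    sample = """ms 4 1 -s 3
1 2 3

//
segsites: 3
positions: 0.1000 0.5000 0.9000
010
110
001
001
"""
    print(parse_ms(sample))
```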
Generated using TreeView
Future Work: Use mlcoalsim to generate populations with recurrent mutations (for easy testing of the phylogeny algorithm). Implement Fjola Bjornsdottir's algorithm for aligning the mtDNA sequences. Create a SNP matrix from the mtDNA sequences. Run the PP algorithm on the matrix generated by mlcoalsim.
Accounting for recurrent mutations: Columns that have 4-gamete violations with several other columns are candidates for recurrent mutations. The weight of a column is inversely correlated with the likelihood of a recurrent mutation, where weight = number of columns with an identical distribution across individuals. Separate the SNPs into clades, analogous to a clustering approach, to obtain a coarse-grained phylogeny at the clade level.
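The first two ideas above can be sketched as follows. This is an illustrative reading of the heuristic, not the authors' implementation: the function names, the `min_violations` threshold, and the ranking rule are assumptions, and the weight of a column counts the column itself.

```python
from collections import Counter
from itertools import combinations
from typing import List


def column_weights(matrix: List[List[int]]) -> List[int]:
    """Weight of each column = number of columns (itself included) with
    the same 0/1 distribution across individuals."""
    columns = list(zip(*matrix))
    counts = Counter(columns)
    return [counts[col] for col in columns]


def four_gamete_violations(matrix: List[List[int]]) -> List[int]:
    """For each column, count the other columns with which it shows all
    four gametes (00, 01, 10, 11)."""
    columns = list(zip(*matrix))
    violations = [0] * len(columns)
    for a, b in combinations(range(len(columns)), 2):
        if len(set(zip(columns[a], columns[b]))) == 4:
            violations[a] += 1
            violations[b] += 1
    return violations


def recurrent_candidates(matrix: List[List[int]],
                         min_violations: int = 2) -> List[int]:
    """Columns conflicting with at least `min_violations` others, ordered
    by most violations first and lowest weight first (hypothetical rule)."""
    weights = column_weights(matrix)
    violations = four_gamete_violations(matrix)
    picked = [j for j, v in enumerate(violations) if v >= min_violations]
    return sorted(picked, key=lambda j: (-violations[j], weights[j]))


if __name__ == "__main__":
    toy = [
        [1, 1, 0, 1],
        [1, 1, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 0],
        [0, 0, 0, 1],
        [0, 0, 0, 0],
    ]
    print(column_weights(toy), four_gamete_violations(toy),
          recurrent_candidates(toy))
```

The pairwise test is quadratic in the number of columns; collapsing identical columns first (which the weights already require) would reduce the number of pairs to compare.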
Recurrent mutations (contd.): Implement the aforementioned heuristics on the sampled data, then scale them to the actual SNP matrix generated from the mtDNA sequences.
Acknowledgements: Dr. Vineet Bafna, Fjola Bjornsdottir