CSE 280A: Advanced Topics in Computational Molecular Biology


1 Algorithmic strategies for the reconstruction of mitochondrial DNA phylogeny
CSE 280A: Advanced Topics in Computational Molecular Biology Harish Nagarajan and Nitin Udpa

2 Background Finding the ancestry of human populations is an interesting problem that can yield insights into fields as diverse as anthropology and medicine (e.g., the genetic basis of diseases). A logical way to determine ancestry is to construct a phylogenetic tree. Saturday, October 13, 2018 Harish and Nitin

3 Why mtDNA? Recombination makes reconstruction almost impossible, because different segments of a recombining chromosome can have different histories. mtDNA is maternally inherited and, unlike the autosomal chromosomes, does not recombine: each individual's mtDNA comes from a single parent.

4 Genographic Project IBM and National Geographic: The Atlas of the Human Journey

5 Algorithmic bottlenecks
Infinite sites assumption: at most one mutation can occur at any site. If this holds, reconstruction is easy. Each site splits the individuals into a dichotomy, and in the ideal case these dichotomies define a tree with mutations on its edges and individuals at its leaves. However, the assumption may not hold, particularly as the time scale of the phylogeny increases.

6 Goals of the Project Develop an efficient algorithm that reconstructs the phylogeny while accounting for recurrent mutations. Scalability and robustness: the algorithm should handle thousands of individuals. Apply the algorithm to data from previous studies (e.g., Torroni et al.) and compare the resulting phylogenies.

7 Previous Work Torroni et al., "Harvesting the fruit of the human mtDNA tree" (Trends in Genetics), reconstructed the phylogeny of 25 individuals from Africa, taking recurrent mutations into account.

8 Torroni et al. (2006), TIG

9 General Strategy (1) Fetch sequences; (2) perform a multiple sequence alignment; (3) generate the SNP matrix; (4) apply the heuristic we developed; (5) generate the tree.
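Step (3), going from aligned sequences to a binary SNP matrix, can be sketched as follows. This is a minimal illustration and not the authors' code. It rests on several assumptions. The sequences are already aligned to equal length. Any column containing a gap or ambiguity code is skipped. Only biallelic sites are kept. The allele carried by the first sequence is coded 0. The function name `snp_matrix` is hypothetical.

```python
from typing import List, Tuple


def snp_matrix(aligned: List[str]) -> Tuple[List[int], List[List[int]]]:
    """Return (alignment positions, 0/1 matrix) for biallelic sites.

    Assumptions: equal-length aligned input; columns with '-', 'N' or any
    non-ACGT symbol are skipped; the first sequence's allele is coded 0.
    """
    if not aligned:
        return [], []
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    positions: List[int] = []
    columns: List[List[int]] = []
    for pos in range(length):
        site = [s[pos].upper() for s in aligned]
        if any(base not in "ACGT" for base in site):
            continue  # gap or ambiguity code somewhere in this column
        if len(set(site)) != 2:
            continue  # monomorphic or multi-allelic site
        ref = site[0]
        positions.append(pos)
        columns.append([0 if base == ref else 1 for base in site])
    # Transpose: one row per individual, one column per SNP.
    matrix = [list(row) for row in zip(*columns)] if columns else [[] for _ in aligned]
    return positions, matrix
```

For example, `snp_matrix(["ACGTA", "ACTTA", "GCTTA", "AC-TC"])` keeps alignment positions 0 and 4. Position 2 is dropped because of the gap.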

10 The Actual Data Set Downloaded mtDNA sequences (~16 kb) from NCBI. (

11 Perfect Phylogeny Implemented Dan Gusfield's algorithm for reconstructing a tree given a coalescent population. Steps to test for a perfect phylogeny (PP): radix sort the columns of the SNP matrix, reading each column as a binary number and placing the largest first, then delete any repeated columns.
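Below is a minimal sketch of this test in Python. It follows Gusfield's formulation for a binary matrix whose ancestral state is 0 in every column, which gives a rooted perfect phylogeny.

- After sorting and deduplicating, find each entry (p, j) where row p has a 1 in column j.
- For that entry, record the nearest column to the left in which row p also has a 1.
- A perfect phylogeny exists iff, for every column j, all rows with a 1 in j record the same value.

The function names are hypothetical. Python's built-in sort stands in for the radix sort, so this sketch does not achieve the linear O(nm) bound that the radix-sort version gives.

```python
from typing import List, Optional, Tuple

Matrix = List[List[int]]


def sort_and_dedup_columns(m: Matrix) -> List[Tuple[int, ...]]:
    """Distinct columns of m, largest binary number first (row 0 = most significant bit).

    Python's tuple sort stands in for radix sort: for equal-length 0/1 tuples,
    lexicographic order is the same as numeric order.
    """
    n_cols = len(m[0]) if m else 0
    distinct = {tuple(row[j] for row in m) for j in range(n_cols)}
    return sorted(distinct, reverse=True)


def has_perfect_phylogeny(m: Matrix) -> bool:
    """Gusfield-style test for a rooted perfect phylogeny (ancestral state 0)."""
    cols = sort_and_dedup_columns(m)
    # first_seen[j]: left-neighbour value recorded by the first row with a 1 in column j
    first_seen: List[Optional[int]] = [None] * len(cols)
    for p in range(len(m)):
        last = -1  # nearest column to the left where row p has a 1 (-1 = none)
        for j, col in enumerate(cols):
            if col[p] == 1:
                if first_seen[j] is None:
                    first_seen[j] = last
                elif first_seen[j] != last:
                    return False  # rows sharing mutation j disagree on its parent
                last = j
    return True
```

The sort matters because a column whose set of 1s strictly contains another's is a larger binary number. Every ancestral mutation is therefore visited before its descendants.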

12 Constructing the PP
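The same bookkeeping used for the test also yields the tree. The sketch below keeps the same assumptions: a binary matrix, ancestral state 0, and hypothetical function names. It builds the tree in three steps:

- Each sorted, non-zero column becomes a mutation node. Its parent is the common left-neighbour value found by the test, with -1 standing for the root.
- Each individual hangs as a leaf under the last mutation it carries.
- The result is written as a Newick string, a text format that tree viewers such as TreeView read.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[int]]


def build_perfect_phylogeny(m: Matrix) -> Optional[Tuple[Dict[int, int], Dict[int, int]]]:
    """Return (column_parent, row_node), or None if no rooted PP exists.

    Column indices refer to the sorted, deduplicated, non-zero columns.
    column_parent[j] is the mutation above mutation j (-1 = root);
    row_node[p] is the mutation under which individual p hangs (-1 = root).
    """
    n_cols = len(m[0]) if m else 0
    distinct = {tuple(row[j] for row in m) for j in range(n_cols)}
    # Largest binary number first (row 0 = most significant bit); drop all-zero columns.
    cols = [c for c in sorted(distinct, reverse=True) if any(c)]
    column_parent: Dict[int, int] = {}
    row_node: Dict[int, int] = {}
    for p in range(len(m)):
        last = -1  # previous sorted column in which row p has a 1
        for j, col in enumerate(cols):
            if col[p] == 1:
                # setdefault stores `last` the first time; later rows must agree.
                if column_parent.setdefault(j, last) != last:
                    return None
                last = j
        row_node[p] = last
    return column_parent, row_node


def to_newick(column_parent: Dict[int, int], row_node: Dict[int, int]) -> str:
    """Render the tree with internal mutation nodes m<j> and leaves i<p>."""
    child_mutations: Dict[int, List[int]] = {}
    for j, parent in column_parent.items():
        child_mutations.setdefault(parent, []).append(j)
    leaves: Dict[int, List[int]] = {}
    for p, node in row_node.items():
        leaves.setdefault(node, []).append(p)

    def render(node: int) -> str:
        parts = [render(c) for c in sorted(child_mutations.get(node, []))]
        parts += [f"i{p}" for p in sorted(leaves.get(node, []))]
        label = "root" if node == -1 else f"m{node}"
        return f"({','.join(parts)}){label}"

    return render(-1) + ";"
```

For the matrix `[[1,1,0],[1,0,0],[0,0,1],[0,0,0]]`, `to_newick(*build_perfect_phylogeny(...))` gives `(((i0)m1,i1)m0,(i2)m2,i3)root;`. `render` is recursive, so a tree with a very long chain of nested mutations would need an iterative version.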

13 Progress Implemented the PP algorithm and ran it on a simulated sample of 200 individuals and 100 sites generated with Hudson's coalescent simulator (ms).
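The slides do not give the exact ms command. One plausible invocation is `ms 200 1 -s 100`, which requests 200 sampled chromosomes, one replicate and 100 segregating sites.

The sketch below parses ms text output into a 0/1 matrix. It assumes ms's standard layout: a `//` line starts each replicate, followed by `segsites:`, `positions:` and one 0/1 line per chromosome. The name `parse_ms` is hypothetical.

```python
from typing import List, Tuple


def parse_ms(text: str) -> List[Tuple[List[float], List[List[int]]]]:
    """Parse ms output into one (positions, haplotype matrix) pair per replicate."""
    replicates: List[Tuple[List[float], List[List[int]]]] = []
    positions: List[float] = []
    rows: List[List[int]] = []
    in_rep = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("//"):
            if in_rep:
                replicates.append((positions, rows))
            positions, rows, in_rep = [], [], True
        elif not in_rep:
            continue  # the command line and random seeds precede the first '//'
        elif line.startswith("positions:"):
            positions = [float(x) for x in line.split()[1:]]
        elif line and set(line) <= {"0", "1"}:
            rows.append([int(ch) for ch in line])  # one haplotype per line
        # the 'segsites:' line is skipped; its count equals len(positions)
    if in_rep:
        replicates.append((positions, rows))
    return replicates
```

Because ms simulates under the infinite-sites model, with 0 the ancestral and 1 the derived allele, every matrix parsed this way should admit a rooted perfect phylogeny. Testing the recurrent-mutation case therefore needs a different simulator, such as the mlcoalsim named under Future Work.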

14 Generated using TreeView

15 Future Work Use mlcoalsim to generate populations with recurrent mutations, for easy testing of the phylogeny algorithm. Implement Fjola Bjornsdottir's algorithm for aligning the mtDNA sequences. Create a SNP matrix from the mtDNA sequences. Run the PP algorithm on the matrix generated by mlcoalsim.

16 Accounting for recurrent mutations
Columns that have four-gamete violations with several other columns are candidates for recurrent mutations. A column's weight is the number of columns with an identical distribution across individuals. Weight is inversely related to the likelihood that the column reflects a recurrent mutation. Separate the SNPs into clades, analogous to a clustering approach, and build a coarse-grained phylogeny at the clade level.
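The first two steps of this heuristic can be sketched as follows. Two columns fail the four-gamete test when all of 00, 01, 10 and 11 occur among the individuals. Two details are assumptions not spelled out in the slides:

- A column's weight counts the column itself.
- Candidates are ranked by number of conflicts (descending), then by weight (ascending).

All function names are hypothetical, and the clade-level clustering step is not shown.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple

Matrix = List[List[int]]


def _columns(m: Matrix) -> List[Tuple[int, ...]]:
    """Return the columns of m as tuples (one entry per individual)."""
    n_cols = len(m[0]) if m else 0
    return [tuple(row[j] for row in m) for j in range(n_cols)]


def four_gamete_conflicts(m: Matrix) -> List[int]:
    """For each column, count the other columns it fails the four-gamete test with."""
    cols = _columns(m)
    conflicts = [0] * len(cols)
    for a, b in combinations(range(len(cols)), 2):
        if len(set(zip(cols[a], cols[b]))) == 4:  # 00, 01, 10 and 11 all present
            conflicts[a] += 1
            conflicts[b] += 1
    return conflicts


def column_weights(m: Matrix) -> List[int]:
    """Weight = number of columns (itself included) with the same distribution."""
    cols = _columns(m)
    counts = Counter(cols)
    return [counts[c] for c in cols]


def recurrent_candidates(m: Matrix) -> List[int]:
    """Columns with any conflict: most conflicts first, lower weight breaking ties."""
    conflicts = four_gamete_conflicts(m)
    weights = column_weights(m)
    flagged = [j for j, c in enumerate(conflicts) if c > 0]
    return sorted(flagged, key=lambda j: (-conflicts[j], weights[j]))
```

For the rows `[0,0,0]`, `[0,1,1]`, `[1,0,0]`, `[1,1,1]`, the function `recurrent_candidates` returns `[0, 1, 2]`. Column 0 comes first because it conflicts with both copies of the duplicated column and has weight 1.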

17 Recurrent mutations contd...
Implement the aforementioned heuristics on the sampled data, then scale them up to the actual SNP matrix generated from the mtDNA sequences.

18 Acknowledgements Dr. Vineet Bafna, Fjola Bjornsdottir

