1
RECOMBINOMICS: Myth or Reality?
Laxmi Parida
IBM Watson Research, New York, USA
2
IBM Computational Biology Center
RoadMap
1. Motivation
2. Reconstructability (Random Graphs Framework)
3. Reconstruction Algorithm (DSR Algorithm)
4. Conclusion
4
www.nationalgeographic.com/genographic
5
www.ibm.com/genographic
6
Five-year study, launched in April 2005, to address anthropological questions on a global scale using genetics as a tool.
Although fossil records fix human origins in Africa, little is known about the great journey that took Homo sapiens to the far reaches of the earth. How did we, each of us, end up where we are?
Samples from all around the world are being collected, and the mtDNA and Y-chromosome are being sequenced and analyzed.
Phylogeographic question
7
DNA material in use under unilinear transmission
58 million bp; 0.38%; 16,000 bp
8
Missing information in unilinear transmissions (figure; time axis labeled past, present)
9
Table Mountain, Cape Town, South Africa
10
Paradigm Shift in Locus & Analysis
Using recombining DNA sequences. Why?
Nonrecombining DNA gives a partial story:
1. represents only a small part of the genome
2. behaves as a single locus
3. unilinear (exclusively male or female) transmission
Recombining DNA moves towards more complete information.
Challenges:
Computationally very complex
How to comprehend complex reticulations?
11
RoadMap
1. Motivation
2. Reconstructability (Random Graphs Framework)
3. Reconstruction Algorithm (DSR Algorithm)
4. Conclusion
L Parida, Pedigree History: A Reconstructability Perspective using Random-Graphs Framework, under preparation.
12
The Random Graphs Framework
GRAPH DEF:
1. Infinite number of vertices arranged in finite-sized rows
2. Edges introduced via a random process across immediate rows
PROPERTIES: address some topological questions
1. First, identify a probability space
2. Then, pose and address specific questions (such as the expected depth of the LCA)
13
The Random Graphs Framework
1. Infinite number of vertices with a specific organization
2. Edges introduced via a random process satisfying specific rules
3. Address some topological questions
   1. Define a probability space
   2. Pose and answer specific questions (such as the expected depth of the LCA)
Wright-Fisher Model
1. Constant population
2. Non-overlapping generations
3. Panmictic
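A minimal sketch of how one finite fragment of such a random graph could be sampled under the Wright-Fisher assumptions listed above. The function name, the index encoding of the two vertex colours, and the choice of one parent per colour are illustrative assumptions, not taken from the slides.

```python
import random

def sample_pedigree(n, depth, seed=0):
    """Sample `depth` generations of a pedigree with 2n vertices per row.

    Hypothetical encoding: in every row, indices 0..n-1 are one colour
    (say, males) and n..2n-1 the other (females). rows[d][v] is the
    (father, mother) pair, located in row d+1, of vertex v in row d.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(depth):
        # Constant population, non-overlapping generations, panmictic:
        # every vertex draws its two parents uniformly from the row above.
        rows.append([(rng.randrange(n), n + rng.randrange(n))
                     for _ in range(2 * n)])
    return rows

pedigree = sample_pedigree(n=3, depth=5)
vertices = sum(len(row) for row in pedigree)
edges = sum(len(parents) for row in pedigree for parents in row)
print(vertices, edges)  # two edges per sampled vertex
```

Because every vertex contributes exactly two edges, the edge count of any finite fragment grows linearly with the vertex count.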
14
The Random Graphs Framework
15
Properties of this Pedigree Graph
1. DAG (Directed Acyclic Graph)
2. |E| = O(|V|) for any finite fragment; sparse graph … vertex-centric view
3. Focus on the flow of genetic material: relevant pedigree graph
17
Pedigree Graph: G_PG(K,N)
K: number of extant units
2N: population size per generation
Can the model ignore the color of a vertex?
Forbidden Structure
18
Probability Space
Space is non-enumerable
Uniform probability measure? (WF population)
Probability of some event F(h) for a fixed depth h, and take the limit
19
Topological Property of G_PG(K,N)
Least Common Ancestor (LCA) of ALL (K) extant vertices (TMRCA or GMRCA)
How many LCAs?
Expected depth of the shallowest LCA
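The depth of the shallowest common ancestor can be illustrated on a small fixed pedigree. This is only a sketch: the row encoding (each vertex lists its parents in the row above) and the function name are assumptions made for the example, and "least" is read as "shallowest".

```python
def shallowest_common_ancestor_depth(parent_rows, extant):
    """Return (depth, common ancestors) for the first row that holds a
    vertex ancestral to every extant vertex, or None if none is found.

    parent_rows[d][v] is the tuple of parents (indices in row d+1)
    of vertex v in row d; row 0 holds the extant vertices.
    """
    ancestor_sets = [{v} for v in extant]
    for depth, row in enumerate(parent_rows, start=1):
        # Move every ancestor set up by one generation.
        ancestor_sets = [{p for v in s for p in row[v]} for s in ancestor_sets]
        common = set.intersection(*ancestor_sets)
        if common:
            return depth, common
    return None

# Hypothetical pedigree, 4 vertices per row, same wiring in both generations.
wiring = [(0, 2), (0, 3), (1, 2), (1, 3)]
rows = [wiring, wiring]

print(shallowest_common_ancestor_depth(rows, [0, 1]))  # siblings sharing a parent
print(shallowest_common_ancestor_depth(rows, [0, 3]))  # no shared parent
```

Averaging this depth over many sampled instances would give an empirical estimate of the expected depth that the slide asks about.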
20
Infinite number of LCAs in a G_PG(4,3) instance …
In fact, there exist infinitely many such instances!
21
Topological Property of G_PG(K,N)
Least Common Ancestor (LCA) (TMRCA or GMRCA)
How many LCAs?
Expected depth of the shallowest “LCA”
MEASURE OF RECONSTRUCTABILITY
22
Sexual Reproduction (Genetic Exchange) vs Graph Model
Ancestor without ancestry
23
Graph Theory vis-à-vis Population Genetics
1. Graph Theoretic (topological):
   CA: common ancestor
   LCA: Least CA or Shallowest CA
   MRCA: Most Recent CA
   TMRCA: The MRCA
2. Graph Theoretic + Biology (Genetic Exchange):
   CAA: common ancestor-&-ancestry
   LCAA: Least CAA
   GMRCA: Grand MRCA
Unilinear Transmission
24
Different Models as Subgraphs
Pedigree Graph G_PG(K,N): each vertex has 2 parents
1. Red Subgraph G_PTX(K,N), Blue Subgraph G_PTY(K,N) (mtDNA Tree, NRY Tree): each vertex has 1 parent
2. Mixed Subgraph G_PGE(K,N,M) (Genetic Exchange Model, ARG): number of vertices per row no more than KM; each vertex has 1 OR 2 parents; M is the number of completely linked segments in each extant unit
25
Different Models
G_PG(4,8), G_PTY(4,8), G_PGE(4,8,2)
26
Different Models as Subgraphs
Pedigree Graph G_PG(K,N)
1. Red Subgraph G_PTX(K,N), Blue Subgraph G_PTY(K,N)
2. Mixed Subgraph G_PGE(K,N,M)
Figure labels: LCA g GMRCA; LCA h TMRCA; LCA g GMRCA
27
G_PGE(K,N,M) h ARG
Ancestral Recombinations Graph (Griffiths & Marjoram ’97)
Embellish G_PGE(K,N,M) with Genetic Exchanges (GE)
Each extant unit has M segments
No vertex with zero ancestral segments (to extant units)
28
Mixed Subgraph G_PGE(K,N,M)
1. Plausible GE assignment?
2. Can G_PGE(K,N,M) go colorless? Yes, through algorithmic subsampling…
29
Algorithm: Embellish G_PGE(K,N,M)
1. Assign a sequence, s, to an instance, e.g. s = K, (2K), (2K-7), (2K-15), …
2. Construct M sequences s_i. Each s_i is monotonically decreasing; s_i[j] is no bigger than s[j]
3. Associate each s_i with a segment, and each element s_i[j] = k with k randomly selected vertices at depth j
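The three steps above can be sketched as follows. Several choices here are assumptions made only for illustration: "monotonically decreasing" is read as non-increasing, each s_i starts at s[0] (all K extant units carry the segment), values are drawn uniformly, and the function names and the example sequence s are hypothetical.

```python
import random

def build_segment_sequences(s, m, seed=0):
    """Step 2: draw m sequences with s_i[j] <= s[j], each non-increasing."""
    rng = random.Random(seed)
    sequences = []
    for _ in range(m):
        seq = [s[0]]
        for j in range(1, len(s)):
            # Stay below both the previous entry and the row size s[j].
            upper = min(seq[-1], s[j])
            seq.append(rng.randint(1, upper))
        sequences.append(seq)
    return sequences

def assign_vertices(s, sequences, seed=0):
    """Step 3: for s_i[j] = k, pick k distinct vertices at depth j."""
    rng = random.Random(seed)
    return [[sorted(rng.sample(range(s[j]), k)) for j, k in enumerate(seq)]
            for seq in sequences]

s = [4, 8, 6, 3, 1]  # step 1: hypothetical vertex counts per depth, K = 4
seqs = build_segment_sequences(s, m=3, seed=7)
placement = assign_vertices(s, seqs, seed=7)
for seq, rows in zip(seqs, placement):
    print(seq, rows)
```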
30
Algorithm: Constructing seqs…
31
“Topological” Definition of LCAA in G_PGE(K,N,M)
Input: G_PGE(K,N,M) with GE embellishment
LCAA:
1. CA in all M subgraphs (trees)
2. Least such CA
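A small sketch of this definition, assuming each of the M segment trees is given as a child-to-parent map and each vertex is a (depth, index) pair. The representation, the function names, and the two example trees are hypothetical.

```python
def ancestors(parent, leaf):
    """Vertices on the path from leaf to the root, leaf excluded."""
    chain = []
    node = leaf
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

def common_ancestors(parent, extant):
    """CAs of all extant vertices within one segment tree."""
    return set.intersection(*(set(ancestors(parent, v)) for v in extant))

def lcaa(trees, extant):
    """Least (shallowest) vertex that is a CA in every segment tree."""
    shared = set.intersection(*(common_ancestors(t, extant) for t in trees))
    # Vertices are (depth, index), so min() picks the shallowest one.
    return min(shared) if shared else None

extant = [(0, 0), (0, 1), (0, 2)]
tree_1 = {(0, 0): (1, 0), (0, 1): (1, 0), (0, 2): (1, 1),
          (1, 0): (2, 0), (1, 1): (2, 0), (2, 0): (3, 0)}
tree_2 = {(0, 0): (1, 0), (0, 1): (1, 1), (0, 2): (1, 1),
          (1, 0): (2, 1), (1, 1): (2, 1), (2, 1): (3, 0)}

print(min(common_ancestors(tree_1, extant)))  # LCA within tree 1
print(min(common_ancestors(tree_2, extant)))  # LCA within tree 2
print(lcaa([tree_1, tree_2], extant))         # LCAA across both trees
```

In this example each tree has its own LCA at depth 2, but the two differ, so the LCAA sits one generation deeper.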
32
Different Models as Subgraphs
Pedigree Graph G_PG(K,N)
1. Red Subgraph G_PTX(K,N), Blue Subgraph G_PTY(K,N)
2. Mixed Subgraph G_PGE(K,N,M)
Figure labels: LCAA h GMRCA; LCA h TMRCA; LCAA h GMRCA
33
Probability of Instances with Unique LCA/LCAA
Pedigree Graph G_PG(K,N)
1. Red Subgraph G_PTX(K,N), Blue Subgraph G_PTY(K,N)
2. Mixed Subgraph G_PGE(K,N,M)
34
“Topological” Definitions of LCAA
Pedigree Graph G_PG(K,N)
1. Red Subgraph G_PTX(K,N), Blue Subgraph G_PTY(K,N)
2. Mixed Subgraph G_PGE(K,N,M)
Figure labels: GMRCA h LCAA l LCA & lone pair; TMRCA h LCA; GMRCA h LCAA l LCA & lone node
35
Expected Depth E(D) of LCA/LCAA
Pedigree Graph G_PG(K,N)
1. Red Subgraph G_PTX(K,N), Blue Subgraph G_PTY(K,N)
2. Mixed Subgraph G_PGE(K,N,M)
Figure labels: O(N 2 ); O(K); O(KM)
36
RECONSTRUCTABILITY
Pedigree Graph G_PG(K,N)
1. Red Subgraph G_PTX(K,N), Blue Subgraph G_PTY(K,N)
2. Mixed Subgraph G_PGE(K,N,M)
Figure labels: O(N 2 ); O(K); O(KM)
37
Summary: History Reconstruction?
1. Mixed Subgraph models recombinations: only fragments of the chromosome
2. In reality, only a minimal structure (HUD) of the G_PGE(K,N,M) or ARG can be estimated
Forbidden structures …
38
RoadMap
1. Motivation
2. Reconstructability (Random Graph Framework)
3. Reconstruction Algorithm (DSR Algorithm)
4. Conclusion
L Parida, M Mele, F Calafell, J Bertranpetit and Genographic Consortium, Estimating the Ancestral Recombinations Graph (ARG) as Compatible Networks of SNP Patterns, Journal of Computational Biology, vol 15(9), pp 1-22, 2008
L Parida, A Javed, M Mele, F Calafell, J Bertranpetit and Genographic Consortium, Minimizing Recombinations in Consensus Networks for Phylogeographic Studies, BMC Bioinformatics, 2009
39
OUTPUT: Recombinational Landscape (Recotypes)
INPUT: Chromosomes (haplotypes)
40
Our Approach (flowchart with nodes: Granularity g; IRiS; Acceptable p-value? with YES/NO branches; Analyze Results; stages labeled statistical and combinatorial)
M Mele, A Javed, F Calafell, L Parida, J Bertranpetit and Genographic Consortium, Recombination-based genomics: a genetic variation analysis in human populations, under submission.
41
Preprocess: Dimension reduction via Clustering (figure: clustering of items labeled 0 to 24)
42
Analysis Flow (flowchart with nodes: Granularity g; IRiS; Acceptable p-value? with YES/NO branches; Analyze Results; stages labeled statistical and combinatorial)
43
p-value Estimation
44
Comparison of the Randomization Schemes
45
SNP Blocks (granularity g=3)
46
Analysis Flow (flowchart with nodes: Granularity g; IRiS; Acceptable p-value? with YES/NO branches; Analyze Results; stages labeled statistical and combinatorial)
47
IRiS (Identifying Recombinations in Sequences)
Stage Haplotypes: use SNP block patterns
Segment along the length: infer trees
Infer network (ARG)
biological insights; computational insights
L Parida, M Mele, F Calafell, J Bertranpetit and Genographic Consortium, Estimating the Ancestral Recombinations Graph (ARG) as Compatible Networks of SNP Patterns, Journal of Computational Biology, vol 15(9), pp 1-22, 2008
48
Segmentation
12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345
11111111111111111111111111111111111111112222222222222222222222222222222222233333333344444444455555555555555----
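If the second row of the figure is read as one segment label per column (an assumption; the slide does not say so explicitly), the segment boundaries can be recovered by collapsing runs of equal labels. The function name and the short example string are hypothetical.

```python
from itertools import groupby

def label_runs(labels):
    """Collapse a per-column label string into (label, first, last) runs,
    using 1-based column positions like the ruler row on the slide."""
    runs = []
    position = 1
    for label, group in groupby(labels):
        length = len(list(group))
        runs.append((label, position, position + length - 1))
        position += length
    return runs

for run in label_runs("1111222334----"):
    print(run)
```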
49
Segmentation
50
Consensus of Trees
52
Algorithm Design
1. Ensure compatibility of component trees
2. Parsimony model: minimize the number of recombinations
Theorem: The problem is NP-hard.
No polynomial-time algorithm that guarantees optimality exists unless P = NP.
53
DSR Scheme (Dominant-Subdominant-Recombinant)
54
DSR Scheme: Level 1
55
DSR Assignment Rules
1. At most one D per row and column; if no D, at most one S per row and column
2. At most one non-R in the row and column, but not both
56
DSR Assignment Rules
1. Each row and each column has at most one D, ELSE has at most one S
2. A non-R can have other non-Rs either in its row or its column, but NOT both
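One possible reading of these two rules, written as a validity check on a matrix of D/S/R labels. This is a sketch of an interpretation, not the published DSR procedure: in particular, it assumes that a row or column containing a D places no limit on its S entries, and the function name and matrix encoding are hypothetical.

```python
def satisfies_dsr_rules(matrix):
    """Check a matrix of 'D', 'S', 'R' labels against the two rules."""
    rows = [list(r) for r in matrix]
    cols = [list(c) for c in zip(*rows)]

    # Rule 1: at most one D per row/column; without a D, at most one S.
    for line in rows + cols:
        d_count = line.count("D")
        s_count = line.count("S")
        if d_count > 1 or (d_count == 0 and s_count > 1):
            return False

    # Rule 2: a non-R may share its row OR its column with other
    # non-Rs, but not both.
    for i, row in enumerate(rows):
        for j, cell in enumerate(row):
            if cell == "R":
                continue
            others_in_row = sum(c != "R" for c in row) - 1
            others_in_col = sum(c != "R" for c in cols[j]) - 1
            if others_in_row > 0 and others_in_col > 0:
                return False
    return True

print(satisfies_dsr_rules(["DRR", "RSR", "RRD"]))
print(satisfies_dsr_rules(["DSR", "SRR", "RRR"]))
```

The second matrix fails rule 2 under this reading, because its D has another non-R both in its row and in its column.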
57
DSR Scheme: Level 1
58
DSR Scheme: Level 2
59
DSR Scheme: Level 2
60
DSR Scheme: Level 3
61
DSR Scheme: Level 3
62
DSR Scheme: Level 4
63
DSR Scheme: Level 5
64
Mathematical Analysis: Approximation Factor
Greedy DSR Scheme
Z and Y are computable functions of the input
L Parida, A Javed, M Mele, F Calafell, J Bertranpetit and Genographic Consortium, Minimizing Recombinations in Consensus Networks for Phylogeographic Studies, BMC Bioinformatics, 2009
65
Analysis Flow (flowchart with nodes: Granularity g; IRiS; Acceptable p-value? with YES/NO branches; Analyze Results; stages labeled statistical and combinatorial)
66
IRiS Output: RECOTYPE
Recombination vectors
    R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R12 R13 R14 …
s1  1  0  0  0  1  1  1  1  0  0   0   0   1   0   …
s2  0  1  0  1  1  1  0  1  0  0   1   0   0   0   …
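As a small illustration of how such recombination vectors can be compared, the sketch below counts differing and shared events between s1 and s2. It uses only the 14 components visible on the slide (the vectors continue beyond R14), and the choice of Hamming distance is an assumption for the example, not a statement of what IRiS computes.

```python
# First 14 components of the two recotypes shown on the slide.
s1 = [1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
s2 = [0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))

def shared_events(a, b):
    """1-based indices of recombination events present in both samples."""
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x == 1 and y == 1]

print(hamming(s1, s2))        # 6
print(shared_events(s1, s2))  # [5, 6, 8], i.e. R5, R6, R8
```

A matrix of such pairwise distances over all samples is the kind of input a distance-based network or tree on recotypes would need.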
67
Quick Sanity Check: Ultrametric Network on RECOTYPES
68
IRiS (Identifying Recombinations in Sequences)
Stage Haplotypes: use SNP block patterns
Segment along the length: infer trees
Infer network (ARG)
biological insights; computational insights
L Parida, M Mele, F Calafell, J Bertranpetit and Genographic Consortium, Estimating the Ancestral Recombinations Graph (ARG) as Compatible Networks of SNP Patterns, Journal of Computational Biology, vol 15(9), pp 1-22, 2008
IRiS software will be released by the end of summer ’09 (Asif Javed)
69
What’s in a name?
1. Allele-frequency variations between populations are also reflected in the purely recombination-based variations
2. Detects the subcontinental divide from short segments, based on population-level analysis
3. Detects populations from short segments, based on recombination-event analysis
RECOMBIN-OMICS (Jaume Bertranpetit)
RECOMBIN-OMETRICS (Robert Elston)
70
1. Allele-frequency variations between populations are also reflected in the purely recombination-based variations
2. Detects the subcontinental divide from short segments, based on population-level analysis
3. Detects populations from short segments, based on recombination-event analysis
Are we ready for the OMICS / OMETRICS?
o population-specific signals?
o other critical signals?
o anything we didn’t already know?
71
Thank you!!