TB Data Visualization and Correlations in TB Patient Networks
Outline 1. Spoligoforests 2. Correlations in Spoligoforests 3. Patient graphs
1. Spoligoforests
The 3-step algorithm that decides the deletion events in the spoligoforest relies on two assumptions:
a) Hidden Parent Assumption: each spoligotype loses one or more contiguous spacers in a deletion event.
b) Single Inheritance: each spoligotype mutates from exactly one parent spoligotype.
Child node and its possible parents
Each node represents a spoligotype in a spoligoforest. The Hidden Parent Assumption assigns possible parents to a child node. Before Single Inheritance is applied, a node can have multiple parents, meaning that several source spoligotypes could have mutated into the spoligotype of the child node. Single Inheritance then selects the unique, most likely source of mutation.
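The candidate-parent step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes spoligotypes are binary strings ("1" = spacer present, "0" = absent), and it reads "contiguous" as meaning the child has no spacers left between the first and last deleted position. The function names and the 8-bit toy spoligotypes are hypothetical.

```python
def is_possible_parent(parent: str, child: str) -> bool:
    """True if `child` could arise from `parent` by one contiguous deletion.

    Assumption: spoligotypes are equal-length binary strings.
    """
    if len(parent) != len(child) or parent == child:
        return False
    # Positions where the two spoligotypes differ.
    diff = [i for i, (p, c) in enumerate(zip(parent, child)) if p != c]
    # A deletion can only remove spacers: parent has "1" wherever they differ.
    if any(parent[i] != "1" for i in diff):
        return False
    # Contiguity: the child keeps no spacer inside the deleted span.
    first, last = diff[0], diff[-1]
    return "1" not in child[first:last + 1]


def possible_parents(child: str, spoligotypes: list[str]) -> list[str]:
    """All spoligotypes in the data set that may be a parent of `child`."""
    return [s for s in spoligotypes if is_possible_parent(s, child)]


# Toy 8-spacer example (real spoligotypes have 43 spacers).
toy_types = ["11111111", "11100111", "11000111", "10111101"]
candidates = possible_parents("11000111", toy_types)
```

Here the child "11000111" has two candidate parents, so a Single Inheritance rule is still needed to pick one of them.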
1. Spoligoforests - MAKESPOLIGOFOREST algorithm
[Figure: MAKESPOLIGOFOREST algorithm flowchart. Box labels: HPA, SpolHamming, MiruL2, RandomPick, MiruHamming]
CDC DATA
Major lineages: Indo-Oceanic, East-African Indian, East Asian, Euro-American, M. africanum, M. bovis
Genetic Diversity of TB in US
NYC Isolates
Tanaka’s Model
Unambiguous edges (mutations, deletions): after applying the Hidden Parent Assumption, some nodes in the spoligoforest have exactly one candidate parent, so there is no need to apply the Single Inheritance rule to them. Tanaka et al. found that the frequency of deletion lengths along unambiguous edges follows a Zipf distribution.
Tanaka’s Model: Use of Zipf distribution and Single Inheritance
After assigning edge weights to all possible deletions according to this model, Tanaka’s model picks the unique parent by choosing the deletion with maximum weight.
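A rough sketch of this selection rule, under stated assumptions: the weight of a deletion is taken to be an unnormalized Zipf term, length raised to the power of minus the exponent, and the deletion length is counted as the number of spacers lost. The exponent value of 1.0 and all function names are placeholders, not values from Tanaka et al.

```python
def deletion_length(parent: str, child: str) -> int:
    """Number of spacers present in `parent` but absent in `child`."""
    return sum(p == "1" and c == "0" for p, c in zip(parent, child))


def zipf_weight(length: int, exponent: float = 1.0) -> float:
    """Unnormalized Zipf weight; the exponent here is an assumed placeholder."""
    return length ** -exponent


def pick_parent(child: str, candidates: list[str], exponent: float = 1.0) -> str:
    """Choose the candidate parent whose deletion has the largest weight.

    Assumes every candidate differs from `child` (deletion length >= 1).
    On ties, max() keeps the first candidate encountered.
    """
    return max(
        candidates,
        key=lambda p: zipf_weight(deletion_length(p, child), exponent),
    )


# Toy 8-spacer example: deletions of length 3 and 1 compete.
chosen = pick_parent("11000111", ["11111111", "11100111"])
```

With any positive exponent the weight decreases with deletion length, so this sketch favors the parent that requires the shortest deletion.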
2. Correlations in Spoligoforests
Outdegree distribution vs. outdegree: follows a Zipf distribution.
Zipf distribution: preferential attachment (rich-gets-richer model).
Outdegree of a spoligotype in the spoligoforest: the number of spoligotypes this spoligotype can mutate into by a deletion event.
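One way to check for Zipf-like behavior is to count how many spoligotypes have each outdegree and fit a straight line on log-log axes. The sketch below assumes the spoligoforest is available as a list of (parent, child) edges; the edge list and function names are made up for illustration, and a least-squares fit is only a rough diagnostic, not necessarily the fitting method used for these slides.

```python
import math
from collections import Counter


def outdegree_distribution(edges):
    """Map each outdegree k >= 1 to the number of spoligotypes having it.

    Leaves (outdegree 0) are left out because log(0) is undefined.
    """
    outdegree = Counter(parent for parent, _child in edges)
    return Counter(outdegree.values())


def loglog_slope(distribution):
    """Least-squares slope of log(count) against log(outdegree).

    Needs at least two distinct outdegree values.
    """
    xs = [math.log(k) for k in distribution]
    ys = [math.log(n) for n in distribution.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical toy forest: A has three children, B and C have one each.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "E"), ("C", "F")]
dist = outdegree_distribution(toy_edges)
slope = loglog_slope(dist)
```

A clearly negative slope on real data would be consistent with a few spoligotypes having many children and most having few.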
Outdegree distribution vs. Outdegree
Outdegree distribution vs. Outdegree by major lineages
2. Correlations in Spoligoforests
Frequency of deletion length vs. deletion length: follows a Zipf distribution.
Zipf distribution: preferential attachment (rich-gets-richer model).
We take all edges in the spoligoforest into account, in contrast to Tanaka’s model, which uses unambiguous edges only.
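The quantity being plotted can be tabulated with a few lines. As a sketch only: edges are assumed to be (parent, child) pairs of binary spoligotype strings, deletion length is assumed to be the number of spacers lost along an edge, and the toy edges are invented.

```python
from collections import Counter


def spacers_lost(parent: str, child: str) -> int:
    """Spacers present in the parent and absent in the child."""
    return sum(p == "1" and c == "0" for p, c in zip(parent, child))


def deletion_length_frequency(edges):
    """Map each deletion length to the number of edges with that length."""
    return Counter(spacers_lost(parent, child) for parent, child in edges)


# Hypothetical 8-spacer edges covering the whole toy forest.
toy_edges = [
    ("11111111", "11100111"),  # 2 spacers lost
    ("11100111", "11000111"),  # 1 spacer lost
    ("11111111", "10001111"),  # 3 spacers lost
    ("11000111", "10000111"),  # 1 spacer lost
]
freq = deletion_length_frequency(toy_edges)
```

Restricting `toy_edges` to nodes with a single candidate parent would give the unambiguous-edges-only variant.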
Frequency of deletion length vs. deletion length
Patient Graphs – NYC Data
4984 patients
137 countries
793 spoligotypes
2648 RFLPs
3235 distinct genotypes
594 “named” clusters
Patient Graphs – Questions
Is there a patient-pathogen trend that TB transmission follows?
Is the demographic distribution of patients infected by bacteria of the same genotype uneven?
How can we fit a TB transmission and mutation model, given that the environment, such as the location on the world map, affects the transmission of TB?
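The second question could be explored with a simple grouping. This sketch assumes a genotype is the pair (spoligotype, RFLP) and that each patient record carries a country field; the record layout, the helper names, and the toy records are all hypothetical and do not come from the NYC data.

```python
from collections import Counter, defaultdict


def country_counts_by_genotype(patients):
    """Group patients by (spoligotype, RFLP) and count them per country."""
    groups = defaultdict(Counter)
    for patient in patients:
        genotype = (patient["spoligotype"], patient["rflp"])
        groups[genotype][patient["country"]] += 1
    return groups


def largest_country_share(counts):
    """Fraction of a genotype's patients who come from its top country."""
    return max(counts.values()) / sum(counts.values())


# Made-up toy records, for illustration only.
toy_patients = [
    {"spoligotype": "S1", "rflp": "R1", "country": "A"},
    {"spoligotype": "S1", "rflp": "R1", "country": "A"},
    {"spoligotype": "S1", "rflp": "R1", "country": "A"},
    {"spoligotype": "S1", "rflp": "R1", "country": "B"},
    {"spoligotype": "S2", "rflp": "R1", "country": "B"},
]
groups = country_counts_by_genotype(toy_patients)
share = largest_country_share(groups[("S1", "R1")])
```

A share close to 1 for a large cluster would suggest an uneven demographic distribution; a proper answer would need a statistical test against the overall patient population.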
M. bovis
M. africanum
East Asian
East-African Indian
Euro-American
Indo-Oceanic
Named clusters of interest: Cluster 3. Spoligotype: S00030; RFLP: C(3); 166 patients; Euro-American
Named clusters of interest: Cluster 33. Spoligotype: S00034; RFLP: W(18); 21 patients; East Asian, W-Beijing
Named clusters of interest: Cluster 4. Spoligotype: S00009; RFLP: H(2); 99 patients; Euro-American
Named clusters of interest: Cluster 29. Spoligotype: S00034; RFLP: N3(13); 21 patients; East Asian
Questions
Does a high transmission rate in an area increase the likelihood of mutation?
How do MIRUs mutate? Is there a pattern of deletion events, or an assumption such as the Hidden Parent Assumption, for 12-bit MIRU?
Can we map the patterns of mutation events in SNPs of MIRU to 12-bit MIRU?