TB Data Visualization and correlations in TB Patient Networks


1 TB Data Visualization and correlations in TB Patient Networks

2 Outline: 1. Spoligoforests 2. Correlations in Spoligoforests 3. Patient graphs

3 Outline: 1. Spoligoforests 2. Correlations in Spoligoforests 3. Patient graphs

4 1. Spoligoforests
The 3-step algorithm that decides the deletion events in the spoligoforest uses two assumptions:
a) Hidden Parent Assumption: each spoligotype loses one or more contiguous spacers in a deletion event.
b) Single Inheritance: each spoligotype mutates from exactly one parent spoligotype.

5 Child node and its possible parents: The Hidden Parent Assumption assigns possible parents to each child node. Each node represents a spoligotype in the spoligoforest. Before Single Inheritance is applied, a node may have multiple candidate parents, meaning there are multiple possible sources of the mutation that produced the child node's spoligotype. Single Inheritance then selects the unique, most likely source of mutation.
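The candidate-parent step can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' code. It assumes spoligotypes are encoded as strings of '1' (spacer present) and '0' (spacer absent), that spacers are never regained, and that one deletion event means the child keeps no spacer between the first and last lost position. The names `lost_block` and `candidate_parents` and the 8-spacer toy strings are hypothetical.

```python
def lost_block(parent: str, child: str):
    """Return (start, end) of the spacer block lost from parent to child,
    or None if child cannot arise from parent by one contiguous deletion."""
    if len(parent) != len(child) or parent == child:
        return None
    # Assumption: spacers are only lost, never gained.
    if any(p == "0" and c == "1" for p, c in zip(parent, child)):
        return None
    lost = [i for i, (p, c) in enumerate(zip(parent, child))
            if p == "1" and c == "0"]
    start, end = lost[0], lost[-1]
    # Assumption: a single deletion event leaves no spacer inside the block.
    if "1" in child[start:end + 1]:
        return None
    return start, end


def candidate_parents(child: str, spoligotypes):
    """All spoligotypes that could be a parent of child under the rules above."""
    return [s for s in spoligotypes if lost_block(s, child) is not None]


# Toy 8-spacer spoligotypes (hypothetical data).
toy_types = ["11111111", "11100111", "11100011", "10100111", "11101011"]
parents_of_child = candidate_parents("11100011", toy_types)
```

Here the child has three candidate parents, so Single Inheritance would still have to choose one of them.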

6 1. Spoligoforests - MAKESPOLIGOFOREST algorithm

7 MAKESPOLIGOFOREST algorithm (flowchart; step labels: HPA, SpolHamming, MiruL2, RandomPick, MiruHamming)
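One plausible reading of the flowchart labels is a cascade of tie-breakers applied to the candidate parents produced by HPA. The sketch below assumes the order spoligotype Hamming distance, then MIRU Hamming distance, then MIRU L2 (Euclidean) distance, then a random pick. That order is an assumption, since the transcript does not state it. The function names, the (spoligotype, MIRU) pair layout, and the toy data are hypothetical.

```python
import math
import random


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def l2(a, b):
    """Euclidean distance between two MIRU repeat-count vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def keep_closest(candidates, distance):
    """Keep only the candidates that attain the minimum distance."""
    best = min(distance(c) for c in candidates)
    return [c for c in candidates if distance(c) == best]


def pick_parent(child, candidates, rng=random):
    """child and each candidate are (spoligotype, miru) pairs."""
    spol, miru = child
    steps = [
        lambda c: hamming(c[0], spol),   # SpolHamming (assumed first)
        lambda c: hamming(c[1], miru),   # MiruHamming (assumed second)
        lambda c: l2(c[1], miru),        # MiruL2 (assumed third)
    ]
    remaining = list(candidates)
    for distance in steps:
        if len(remaining) == 1:
            break
        remaining = keep_closest(remaining, distance)
    return rng.choice(remaining)         # RandomPick among remaining ties


# Toy data: 8 spacers and 3 MIRU loci instead of 43 and 12.
child = ("11100011", (2, 3, 2))
a = ("11111111", (2, 3, 2))
b = ("11100111", (2, 3, 5))
c = ("11101011", (2, 3, 3))
chosen = pick_parent(child, [a, b, c])
```

In this toy case `b` and `c` tie on both Hamming distances, and the L2 step selects `c` because its MIRU counts are closer to the child's.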

8 CDC DATA

9 Major lineages (figure legend): Indo-Oceanic, East-African Indian, East Asian, Euro-American, M. africanum, M. bovis

11 Genetic Diversity of TB in US

12 NYC Isolates

13 Tanaka’s Model: Unambiguous edges (mutations, deletions): after the Hidden Parent Assumption is applied, some nodes in the spoligoforest have exactly one candidate parent, so the Single Inheritance rule does not need to be applied to them. Tanaka et al. found that the frequency of deletion lengths along unambiguous edges follows a Zipf distribution.

14 Tanaka’s Model: Use of Zipf distribution and Single Inheritance: After edge weights are assigned to all possible deletions according to this model, Tanaka’s model picks the unique parent by choosing the deletion with the maximum weight.
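The maximum-weight choice can be illustrated with a short sketch. It assumes the edge weight is proportional to the deletion length raised to a negative exponent (the Zipf form) and that the deletion length is the number of spacers lost. The exponent value 1.0 is a placeholder, not the value fitted by Tanaka et al., and all function names are hypothetical. The normalizing constant is omitted because only the ranking of candidates matters.

```python
def deletion_length(parent: str, child: str) -> int:
    """Number of spacers present in parent but absent in child."""
    return sum(p == "1" and c == "0" for p, c in zip(parent, child))


def zipf_weight(length: int, exponent: float = 1.0) -> float:
    """Unnormalized Zipf weight; the exponent is a placeholder assumption."""
    return length ** -exponent


def most_likely_parent(child: str, candidates, exponent: float = 1.0):
    """Candidate whose deletion has the maximum weight (first one wins ties)."""
    return max(
        candidates,
        key=lambda p: zipf_weight(deletion_length(p, child), exponent),
    )


# Toy 8-spacer spoligotypes (hypothetical data).
best_parent = most_likely_parent("11100011", ["11111111", "11100111"])
```

With a positive exponent, shorter deletions always receive larger weights, so the candidate that requires the shortest deletion is selected.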

15 Outline: 1. Spoligoforests 2. Correlations in Spoligoforests 3. Patient graphs

16 2. Correlations in Spoligoforests
• Outdegree distribution vs. outdegree: follows a Zipf distribution.
• Zipf distribution: preferential attachment, a rich-get-richer model.
• Outdegree of a spoligotype in the spoligoforest: the number of spoligotypes this spoligotype can mutate into by a deletion event.
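A rough way to check the outdegree claim is to count how many nodes have each outdegree and look at the slope on a log-log scale. The sketch below assumes the forest is given as (parent, child) pairs. The least-squares slope is only a quick diagnostic, not a rigorous power-law fit, and the edge list and function names are hypothetical. Leaf nodes (outdegree 0) are left out because the logarithm of 0 is undefined.

```python
import math
from collections import Counter


def outdegrees(edges):
    """Map each parent node to its number of children."""
    return Counter(parent for parent, _ in edges)


def degree_frequencies(edges):
    """Map each outdegree k >= 1 to the number of nodes with that outdegree."""
    return Counter(outdegrees(edges).values())


def loglog_slope(freq):
    """Least-squares slope of log(frequency) against log(outdegree).
    Needs at least two distinct outdegrees."""
    xs = [math.log(k) for k in freq]
    ys = [math.log(n) for n in freq.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Toy forest (hypothetical): node "a" has 3 children, "b" and "c" have 1 each.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "e"), ("c", "f")]
freq = degree_frequencies(edges)
slope = loglog_slope(freq)
```

A roughly straight line with negative slope on the log-log plot is consistent with a Zipf-like distribution, although it does not prove one.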

17 Outdegree distribution vs. Outdegree

18 Outdegree distribution vs. Outdegree by major lineages

19 2. Correlations in Spoligoforests
• Deletion length frequency vs. deletion length: follows a Zipf distribution.
• Zipf distribution: preferential attachment, a rich-get-richer model.
• We take all edges in the spoligoforest into account, whereas Tanaka’s model uses unambiguous edges only.
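The difference between the two tallies can be made concrete with a small sketch. It assumes each forest edge is a (parent, child) pair of '0'/'1' spoligotype strings, that the deletion length is the number of spacers lost, and that the number of candidate parents per child is known from the Hidden Parent Assumption step. The function names and toy data are hypothetical.

```python
from collections import Counter


def deletion_length(parent: str, child: str) -> int:
    """Number of spacers present in parent but absent in child."""
    return sum(p == "1" and c == "0" for p, c in zip(parent, child))


def deletion_length_tallies(edges, n_candidates):
    """Return (all_edges, unambiguous) Counters of deletion lengths.

    n_candidates maps each child to its number of candidate parents;
    an edge is unambiguous when its child had exactly one candidate."""
    all_edges = Counter(deletion_length(p, c) for p, c in edges)
    unambiguous = Counter(
        deletion_length(p, c) for p, c in edges if n_candidates[c] == 1
    )
    return all_edges, unambiguous


# Toy 8-spacer forest (hypothetical data).
forest_edges = [
    ("11111111", "11100111"),
    ("11100111", "11100011"),
    ("11111111", "01111111"),
    ("11100011", "00000011"),
]
candidate_counts = {"11100111": 1, "11100011": 2, "01111111": 1, "00000011": 3}
all_tally, unambiguous_tally = deletion_length_tallies(forest_edges, candidate_counts)
```

Either tally can then be plotted on a log-log scale to inspect the Zipf-like shape.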

20 Deletion length frequency vs. deletion length

21 Outline: 1. Spoligoforests 2. Correlations in Spoligoforests 3. Patient graphs

22 Patient Graphs – NYC Data
• 4984 Patients
• 137 Countries
• 793 Spoligotypes
• 2648 RFLPs
• 3235 Distinct Genotypes
• 594 “Named” Clusters

23 Patient Graphs – Questions
• Is there a patient-pathogen trend that TB transmission follows?
• Is the demographic distribution of patients infected by bacteria of the same genotype uneven?
• How can we fit a TB transmission and mutation model, given that the environment, such as the location on the world map, affects the transmission of TB?

24 M. bovis

25 M. africanum

26 East Asian

27 East-African Indian

28 Euro-American

29 Indo-Oceanic

30 Named clusters of interest: Cluster 3
• Spoligotype: S00030
• RFLP: C(3)
• 166 patients
• Euro-American

31 Named clusters of interest: Cluster 33
• Spoligotype: S00034
• RFLP: W(18)
• 21 patients
• East Asian
• W-Beijing

32 Named clusters of interest: Cluster 4
• Spoligotype: S00009
• RFLP: H(2)
• 99 patients
• Euro-American

33 Named clusters of interest: Cluster 29
• Spoligotype: S00034
• RFLP: N3(13)
• 21 patients
• East Asian

34 Questions
• Does a high transmission rate in an area increase the likelihood of mutation?
• How do MIRUs mutate? Is there a pattern of deletion events, or an assumption analogous to the Hidden Parent Assumption, for 12-bit MIRU?
• Can we map the patterns of mutation events in SNPs of MIRU to 12-bit MIRU?

