TB Data Visualization and Correlations in TB Patient Networks


Outline
1. Spoligoforests
2. Correlations in Spoligoforests
3. Patient graphs


1. Spoligoforests
• The 3-step algorithm that decides the deletion events in the spoligoforest uses two assumptions:
a) Hidden Parent Assumption: Each spoligotype loses one or more contiguous spacers in a single deletion event.
b) Single Inheritance: Each spoligotype mutates from exactly one parent spoligotype.
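The Hidden Parent Assumption can be illustrated with a small check on binary spacer patterns. This is a minimal sketch, not the authors' implementation: the function name `deleted_block` is hypothetical, spoligotypes are assumed to be equal-length strings of "1" (spacer present) and "0" (spacer absent), and "contiguous" is assumed to mean that the lost positions are adjacent in the pattern.

```python
def deleted_block(parent: str, child: str):
    """Return (start, length) of the single contiguous block of spacers
    lost from parent to child, or None if no such deletion exists.

    Assumption: spoligotypes are equal-length strings of '1' and '0'.
    """
    if len(parent) != len(child):
        return None
    # Positions where the two patterns differ.
    diff = [i for i, (p, c) in enumerate(zip(parent, child)) if p != c]
    if not diff:
        return None  # identical patterns, so no deletion happened
    # Every difference must be a loss (1 -> 0); spacers are never regained.
    if any(parent[i] != "1" for i in diff):
        return None
    # Contiguity (assumed strict): the lost positions form one unbroken run.
    if diff[-1] - diff[0] + 1 != len(diff):
        return None
    return diff[0], len(diff)


# Toy 7-spacer patterns (real spoligotypes are longer).
print(deleted_block("1111111", "1100111"))  # one block of two spacers lost
print(deleted_block("1111111", "1010111"))  # two separate losses: not one event
```

A looser reading of contiguity, in which spacers already absent in the parent may sit inside the deleted block, would need a different final check.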

Child node and its possible parents
The Hidden Parent Assumption assigns possible parents to each child node. Each node represents a spoligotype in the spoligoforest. Before Single Inheritance is applied, a node may have multiple candidate parents, meaning that several source spoligotypes could have produced the child's spoligotype by mutation. Single Inheritance selects the unique, most likely source of mutation.
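The candidate-parent step can be sketched as follows. The names `is_hpa_parent` and `candidate_parents` are hypothetical, the spacer patterns are toy data, and contiguity is again assumed to mean adjacent lost positions; the sketch only shows how a child can end up with more than one possible parent before Single Inheritance is applied.

```python
def is_hpa_parent(parent: str, child: str) -> bool:
    """True if child differs from parent by one contiguous loss of spacers."""
    diff = [i for i, (p, c) in enumerate(zip(parent, child)) if p != c]
    return (
        len(parent) == len(child)
        and bool(diff)
        and all(parent[i] == "1" for i in diff)      # losses only
        and diff[-1] - diff[0] + 1 == len(diff)      # one unbroken run
    )


def candidate_parents(spoligotypes):
    """Map each spoligotype to the list of its possible parents."""
    return {
        child: [p for p in spoligotypes if is_hpa_parent(p, child)]
        for child in spoligotypes
    }


toy = ["11111", "11011", "10011", "10001"]  # toy patterns, not real data
cands = candidate_parents(toy)
# Nodes that still need Single Inheritance to pick one parent.
ambiguous = {c: ps for c, ps in cands.items() if len(ps) > 1}
print(cands)
print(sorted(ambiguous))
```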

1. Spoligoforests - MAKESPOLIGOFOREST algorithm

[Figure: MAKESPOLIGOFOREST algorithm flowchart, with boxes labeled HPA, SpolHamming, MiruL2, RandomPick, MiruHamming]
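The flowchart labels suggest a cascade of tie-breakers applied to the candidate parents produced by HPA. The sketch below is one plausible reading and not the published algorithm: the order of the filters (spoligotype Hamming distance, then MIRU Hamming distance, then MIRU L2 distance, then a random pick), the function names, and the use of one representative MIRU profile per spoligotype are all assumptions.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def squared_l2(a, b):
    """Squared Euclidean distance; it ranks candidates the same way as L2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def keep_closest(cands, score):
    """Keep only the candidates with the minimum score."""
    scores = {p: score(p) for p in cands}
    best = min(scores.values())
    return [p for p in cands if scores[p] == best]


def pick_parent(child, cands, miru, rng):
    """Reduce HPA candidates to a single parent (assumed filter order).

    cands: candidate parent spoligotypes for child (output of the HPA step)
    miru:  dict mapping spoligotype -> tuple of MIRU repeat counts (assumed)
    """
    if not cands:
        return None  # child is a root of the forest
    filters = [
        lambda p: hamming(p, child),                # SpolHamming
        lambda p: hamming(miru[p], miru[child]),    # MiruHamming
        lambda p: squared_l2(miru[p], miru[child])  # MiruL2
    ]
    for score in filters:
        cands = keep_closest(cands, score)
        if len(cands) == 1:
            return cands[0]
    return rng.choice(cands)                        # RandomPick


rng = random.Random(0)
miru = {"10001": (2, 3, 4), "11111": (2, 3, 4), "10011": (2, 3, 5)}  # toy values
parent = pick_parent("10001", ["11111", "10011"], miru, rng)
print(parent)
```

Passing the random generator in explicitly keeps the final tie-break reproducible when a seed is fixed.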

CDC DATA

Major lineages (legend): Indo-Oceanic, East African Indian, East Asian, Euro-American, M. africanum, M. bovis


Genetic Diversity of TB in the US

NYC Isolates

Tanaka’s Model
Unambiguous edges (mutations, deletions): After the Hidden Parent Assumption is applied, some nodes in the spoligoforest have exactly one candidate parent, so the Single Inheritance rule is not needed for them. Tanaka et al. found that the frequency of deletion lengths along these unambiguous edges follows a Zipf distribution.
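One way to check a Zipf-like claim is to fit a straight line to the deletion-length frequencies on log-log axes. The sketch below treats the Zipf distribution as a power law, frequency proportional to length^(-s), and estimates s by ordinary least squares; the function name `fit_zipf_exponent` is hypothetical and the input is synthetic, not the data from the slides.

```python
import math
from collections import Counter


def fit_zipf_exponent(lengths):
    """Estimate s in frequency ~ length**(-s) by least squares on log-log axes.

    lengths: iterable of positive deletion lengths, one entry per edge.
    Needs at least two distinct lengths.
    """
    counts = Counter(lengths)
    ks = sorted(counts)
    xs = [math.log(k) for k in ks]
    ys = [math.log(counts[k]) for k in ks]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return -slope  # a Zipf-like decay has a negative slope


# Synthetic edges: length k occurs 3600 / k**2 times, so s should be about 2.
synthetic = [k for k in range(1, 7) for _ in range(3600 // (k * k))]
s_hat = fit_zipf_exponent(synthetic)
print(round(s_hat, 3))
```

A log-log regression is a quick diagnostic rather than a rigorous test; a maximum-likelihood fit would be the more careful choice on real data.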

Tanaka’s Model: Use of Zipf distribution and Single Inheritance
After assigning edge weights to all possible deletions according to this model, Tanaka’s model picks the unique parent by choosing the deletion with the maximum weight.
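The weighting step can be sketched as below. The exact weight formula is not given on the slide, so this sketch assumes a weight of length^(-s) with a placeholder exponent `s`; the names `zipf_weight` and `pick_parent_by_weight` are hypothetical.

```python
def zipf_weight(length: int, s: float = 1.0) -> float:
    """Assumed Zipf-style weight of a deletion that removes `length` spacers."""
    return length ** -s


def pick_parent_by_weight(child: str, cands, s: float = 1.0) -> str:
    """Choose the candidate parent whose deletion has the maximum weight.

    cands are assumed to be valid HPA candidates, so the number of differing
    positions equals the deletion length. Ties go to the first candidate.
    """
    def deletion_length(parent: str) -> int:
        return sum(p != c for p, c in zip(parent, child))

    return max(cands, key=lambda p: zipf_weight(deletion_length(p), s))


# Toy example: losing one spacer (from 10011) outweighs losing three (from 11111).
chosen = pick_parent_by_weight("10001", ["11111", "10011"])
print(chosen)
```

With any positive exponent the weight decreases with length, so under this assumption the maximum-weight parent is simply the one reached by the shortest deletion.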


2. Correlations in Spoligoforests
• Outdegree distribution vs. outdegree: follows a Zipf distribution.
• Zipf distribution: preferential attachment (rich-get-richer model).
• Outdegree of a spoligotype in the spoligoforest: the number of spoligotypes this spoligotype can mutate into by a deletion event.
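The outdegree distribution can be computed directly from a forest stored as a child-to-parent map. This is a minimal sketch under that assumed representation; the function name `outdegree_distribution` and the node labels are hypothetical.

```python
from collections import Counter


def outdegree_distribution(parent_of):
    """Map each outdegree to the number of nodes that have it.

    parent_of: dict child -> parent, with None as the parent of a root.
    """
    # Outdegree of a node = number of children that name it as their parent.
    outdeg = Counter(p for p in parent_of.values() if p is not None)
    nodes = set(parent_of) | set(outdeg)
    return dict(Counter(outdeg.get(n, 0) for n in nodes))


# Toy forest: A has three children, B has one, and C, D, E are leaves.
forest = {"A": None, "B": "A", "C": "A", "D": "A", "E": "B"}
dist = outdegree_distribution(forest)
print(sorted(dist.items()))
```

Plotting the nonzero outdegrees against their node counts on log-log axes is the usual way to inspect whether the distribution looks Zipf-like.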

Outdegree distribution vs. Outdegree

Outdegree distribution vs. Outdegree by major lineages

2. Correlations in Spoligoforests
• Deletion length frequency vs. deletion length: follows a Zipf distribution.
• Zipf distribution: preferential attachment (rich-get-richer model).
• We take all edges in the spoligoforest into account, whereas Tanaka’s model uses unambiguous edges only.
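Counting deletion lengths over every edge of a finished forest might look like the sketch below. It assumes the forest is a child-to-parent map over binary spacer strings in which each edge is a valid deletion; the function name `deletion_length_counts` is hypothetical and the patterns are toy data.

```python
from collections import Counter


def deletion_length_counts(parent_of):
    """Count how many edges in the forest delete 1, 2, 3, ... spacers.

    parent_of: dict child spoligotype -> parent spoligotype (None for roots).
    """
    counts = Counter()
    for child, parent in parent_of.items():
        if parent is None:
            continue  # roots have no incoming deletion
        # On a valid edge, the differing positions are exactly the lost spacers.
        counts[sum(p != c for p, c in zip(parent, child))] += 1
    return dict(counts)


toy_forest = {
    "11111": None,
    "11011": "11111",
    "10011": "11011",
    "10001": "10011",
    "00111": "11111",
}
length_counts = deletion_length_counts(toy_forest)
print(length_counts)
```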

Deletion length frequency vs. deletion length


Patient Graphs – NYC Data
• 4984 Patients
• 137 Countries
• 793 Spoligotypes
• 2648 RFLPs
• 3235 Distinct Genotypes
• 594 “Named” Clusters

Patient Graphs – Questions
• Does TB transmission follow a patient-pathogen trend?
• Is the demographic distribution of patients infected by bacteria of the same genotype uneven?
• How can we fit a TB transmission and mutation model, given that the environment, such as the location on the world map, affects the transmission of TB?

M. bovis

M. africanum

East Asian

East African Indian

Euro-American

Indo-Oceanic

Named clusters of interest: Cluster 3
• Spoligotype: S00030
• RFLP: C(3)
• 166 patients
• Euro-American

Named clusters of interest: Cluster 33
• Spoligotype: S00034
• RFLP: W(18)
• 21 patients
• East Asian
• W-Beijing

Named clusters of interest: Cluster 4
• Spoligotype: S00009
• RFLP: H(2)
• 99 patients
• Euro-American

Named clusters of interest: Cluster 29
• Spoligotype: S00034
• RFLP: N3(13)
• 21 patients
• East Asian
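Clusters 33 and 29 share spoligotype S00034 but differ in RFLP, which suggests that a genotype is identified by the pair (spoligotype, RFLP). Under that assumption, grouping patients into genotype clusters could be sketched as follows; the function name `cluster_by_genotype` and the patient identifiers are made up for illustration.

```python
from collections import defaultdict


def cluster_by_genotype(patients):
    """Group patient IDs by their (spoligotype, RFLP) genotype.

    patients: iterable of (patient_id, spoligotype, rflp) tuples.
    """
    clusters = defaultdict(list)
    for patient_id, spoligotype, rflp in patients:
        clusters[(spoligotype, rflp)].append(patient_id)
    return dict(clusters)


# Made-up patient IDs; the genotype labels are the ones shown on the slides.
records = [
    ("p1", "S00034", "W(18)"),
    ("p2", "S00034", "N3(13)"),
    ("p3", "S00034", "W(18)"),
    ("p4", "S00030", "C(3)"),
]
clusters = cluster_by_genotype(records)
for genotype, members in clusters.items():
    print(genotype, members)
```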

Questions
• Does a high transmission rate in an area increase the likelihood of mutation?
• How do MIRUs mutate? Is there a pattern of deletion events, or an assumption analogous to the Hidden Parent Assumption, for 12-bit MIRU?
• Can we map the patterns of mutation events in SNPs of MIRU to 12-bit MIRU?