1 Harvard Medical School, Massachusetts Institute of Technology
Inferring Nonstationary Gene Networks from Temporal Gene Expression Data
Hsun-Hsien Chang 1, Jonathan J. Smith 2, Marco F. Ramoni 1
1 Children’s Hospital Informatics Program, Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School
2 Department of Mathematics, Massachusetts Institute of Technology
IEEE Workshop on Signal Processing Systems, October 7, 2010
2 Background
Genetic information flows from DNA to RNA through transcription.
Gene expression is a measure of RNA abundance in cells and reveals gene activity.
Modern microarray technologies can assess the expression of 50K genes in parallel.
3 Clinical Applications
As costs have fallen, more samples can be collected in a single study.
A new clinical application:
–Monitor time-series gene expression in response to drugs, treatments, vaccines, virus infection, etc.
[Figure: gene expression measured at times T1 to T5 for multiple patients in distinct biological conditions.]
4 Time-Series Gene Expression Analysis
Since genes interact with each other in cells, an intriguing analysis is to infer gene networks:
–Detailed models (e.g., differential equations).
–Abstract models (e.g., Boolean networks, in which a gene is either on or off).
–Probabilistic graphical models (e.g., dynamic Bayesian networks).
Probabilistic graphical models:
–Do not require densely sampled data.
–Model expression levels as random variables to handle noisy expression measurements and biological variability.
–Use the inferred networks to make predictions.
5 Data Representation by Bayesian Networks
Bayesian networks are directed acyclic graphs where:
–Nodes correspond to random variables (i.e., expressions of genes, clinical variables).
–Directed arcs encode conditional probabilities of the target (child) nodes given the source (parent) nodes.
[Figure: example directed acyclic graph with nodes A, B, C, D, E.]
Dynamic Bayesian networks have arcs indicating temporal dependency:
–Example: variables X and Y at time T modulate variable Z at time T+1.
–The network model can serve as a prediction tool: given X and Y at time T, it predicts Z at time T+1.
[Figure: arcs from X and Y at time T to Z at time T+1.]
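To make the X, Y, Z example concrete, here is a minimal sketch of a conditional probability table used as a prediction tool. It assumes expression levels have been discretized into "high" and "low" states, which the slides do not specify; the function names `fit_cpt` and `predict` and the sample values are hypothetical, not taken from the study.

```python
from collections import Counter, defaultdict


def fit_cpt(samples):
    """Estimate P(Z at T+1 | X at T, Y at T) by counting (x, y, z) triples."""
    counts = defaultdict(Counter)
    for x, y, z in samples:
        counts[(x, y)][z] += 1
    return {
        parents: {z: n / sum(c.values()) for z, n in c.items()}
        for parents, c in counts.items()
    }


def predict(cpt, x, y):
    """Return the most probable state of Z at T+1 given X and Y at T."""
    dist = cpt[(x, y)]
    return max(dist, key=dist.get)


# Hypothetical discretized expression states, not data from the study.
samples = [
    ("high", "high", "high"),
    ("high", "high", "high"),
    ("high", "high", "low"),
    ("high", "low", "low"),
    ("low", "high", "low"),
    ("low", "low", "low"),
]
cpt = fit_cpt(samples)
prediction = predict(cpt, "high", "high")
```

The arcs from X and Y to Z correspond to the keys of the table: each parent configuration indexes one distribution over the child.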
6 Network Inference Engine
First-order Markov process: data at time T+1 depend only on data at the preceding time T.
For each variable at time T+1, search for the set of variables at time T that has the highest likelihood of modulating its value at T+1.
The search uses a step-wise algorithm.
[Figure: candidate arcs from genes A, B, C, ..., N and clinical variable V at time T to the same variables at time T+1.]
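A step-wise parent search of this kind can be sketched as a greedy forward selection. The slides do not state the scoring function, so this sketch swaps in a BIC score under a linear-Gaussian model as a stand-in; `bic_score`, `stepwise_parents`, the `max_parents` cap, and the simulated data are all assumptions for illustration.

```python
import numpy as np


def bic_score(X_prev, y_next, parents):
    """BIC of a linear-Gaussian model of y_next on the chosen columns of X_prev.

    Higher is better. This score is an assumption, not the one used in the talk.
    """
    n = len(y_next)
    design = np.column_stack([np.ones(n)] + [X_prev[:, j] for j in parents])
    coef, *_ = np.linalg.lstsq(design, y_next, rcond=None)
    resid = y_next - design @ coef
    sigma2 = max(resid @ resid / n, 1e-12)  # guard against log(0)
    log_lik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return log_lik - 0.5 * design.shape[1] * np.log(n)


def stepwise_parents(X_prev, y_next, max_parents=3):
    """Greedily add the time-T variable that most improves the score."""
    parents = []
    best = bic_score(X_prev, y_next, parents)
    while len(parents) < max_parents:
        candidates = [j for j in range(X_prev.shape[1]) if j not in parents]
        if not candidates:
            break
        score, j = max((bic_score(X_prev, y_next, parents + [j]), j) for j in candidates)
        if score <= best:  # no candidate improves the score: stop
            break
        parents.append(j)
        best = score
    return parents, best


# Simulated data (hypothetical): variable at T+1 depends on columns 1 and 3 at T.
rng = np.random.default_rng(0)
X_prev = rng.normal(size=(200, 5))
y_next = 2.0 * X_prev[:, 1] - 1.5 * X_prev[:, 3] + 0.1 * rng.normal(size=200)
parents, score = stepwise_parents(X_prev, y_next)
```

Because the search is greedy, it evaluates far fewer parent sets than an exhaustive search, at the cost of possibly missing the best-scoring set.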
7 Inference of Whole Dynamic Gene Network
Infer a transition network between every pair of consecutive times.
[Figure: variables A, B, C, ..., N, V at times T, T+1, and T+2, with one transition network per pair of consecutive times.]
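One way to organize the data for this step is to split it into one dataset per transition, so that each transition network is learned from its own pair of time points. The sketch below assumes the data sit in an array of shape (patients, times, variables); that layout, the function name `transition_pairs`, and the array sizes are hypothetical.

```python
import numpy as np


def transition_pairs(data):
    """Split an array of shape (patients, times, variables) into one
    (X_prev, X_next) pair per consecutive time transition."""
    n_times = data.shape[1]
    return [(data[:, t, :], data[:, t + 1, :]) for t in range(n_times - 1)]


# Hypothetical sizes: 14 patients, 5 time points, 4 variables.
rng = np.random.default_rng(0)
data = rng.normal(size=(14, 5, 4))
pairs = transition_pairs(data)
```

Each pair can then be handed to a parent search independently, which is what allows the network to differ from one transition to the next.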
8 Parallelize Learning Individual Transition Nets
[Figure: the transition networks T to T+1 and T+1 to T+2 are learned separately, in parallel, and then assembled into the whole network.]
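Since each transition network depends only on its own pair of time points, the learning jobs are independent and can be mapped over a worker pool. The sketch below uses Python's standard `concurrent.futures`; the toy learner `learn_transition` (pick the single most correlated parent) merely stands in for the real parent search, and all names and data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def learn_transition(pair):
    """Toy learner: for each variable at T+1, pick the variable at T with the
    largest absolute correlation. A stand-in for the real parent search."""
    X_prev, X_next = pair
    n_vars = X_prev.shape[1]
    corr = np.corrcoef(X_prev, X_next, rowvar=False)[:n_vars, n_vars:]
    return np.abs(corr).argmax(axis=0).tolist()


def learn_all(pairs, workers=4):
    """Learn every transition network independently on a worker pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(learn_transition, pairs))


# Simulated transitions (hypothetical) whose wiring changes over time.
rng = np.random.default_rng(1)


def make_pair(perm):
    X_prev = rng.normal(size=(50, 3))
    X_next = X_prev[:, perm] + 0.05 * rng.normal(size=(50, 3))
    return X_prev, X_next


pairs = [make_pair([1, 2, 0]), make_pair([2, 0, 1])]
networks = learn_all(pairs)
```

A thread pool keeps the sketch simple; for CPU-bound scoring, a process pool or a cluster scheduler would be the more likely choice. The same map pattern applies one level down, to the parent search of each individual variable within a transition.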
9 Parallelize Parent Searching of Individual Variables
[Figure: within one transition network (T to T+1), the parents of each variable A, B, C, ..., N, V at T+1 are searched in parallel.]
10 Step-by-Step Prediction
[Figure: given data at time T, the network predicts the variables at T+1; given data at T+1, it predicts the variables at T+2.]
11 Forecasting by Initial Data
[Figure: given data at time T only, the network predicts the variables at T+1 and then at T+2.]
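The difference between step-by-step prediction and forecasting can be sketched as follows. For illustration only, each transition network is replaced by a hypothetical linear map (one matrix per transition); the actual model is a Bayesian network, and the names `predict_stepwise` and `forecast` are assumptions.

```python
import numpy as np


def predict_stepwise(models, observed):
    """Step-by-step prediction: the state at each T+1 is predicted from the
    observed data at T."""
    return [A @ observed[t] for t, A in enumerate(models)]


def forecast(models, initial):
    """Forecasting: only the initial data are observed, and each prediction
    is fed forward as the input to the next transition."""
    state, out = initial, []
    for A in models:
        state = A @ state
        out.append(state)
    return out


# Hypothetical transition models for T -> T+1 and T+1 -> T+2.
models = [np.array([[1, 1], [0, 1]]), np.array([[2, 0], [0, 1]])]
observed = [np.array([1, 1]), np.array([5, 5])]  # data at T and at T+1

stepwise = predict_stepwise(models, observed)
forecasted = forecast(models, observed[0])
```

The two agree on the first step and diverge afterwards: forecasting propagates its own predictions, so errors can accumulate, which is consistent with forecasting being the harder task.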
12 Clinical Study: HIV Viral Load Tracking
The global AIDS epidemic is one of the greatest threats to human health, causing 2 million deaths every year.
Viral load (i.e., virus density in blood) is:
–associated with clinical outcomes.
–an indicator of which treatment physicians should provide.
With a tool to predict or forecast the viral load trajectory, physicians could foresee how patients progress to AIDS and allocate the best treatments upfront.
Data: fourteen (12 Africans, 2 Americans) untreated adult patients during acute infection.
[Figure: study timeline from enrollment, with viral load and gene expression measurements.]
13 Dynamic Gene Network of HIV Viral Load
15 Accuracy of HIV Viral Load Tracking
Prediction accuracy:
                             Fitted Validation (Accuracy)   Cross Validation (Robustness)
Dynamic Gene Network         97.8%                          95.8%
Viral Load Auto-Regression   90.1%                          89.5%
Forecasting accuracy:
                             Fitted Validation (Accuracy)   Cross Validation (Robustness)
Dynamic Gene Network         92.9%                          91.8%
Viral Load Auto-Regression   88.7%                          87.0%
16 30 Genes Dynamically Interact with Viral Load
AMY1A: amylase, alpha 1a; salivary
OTOF: otoferlin
TNFAIP6: tumor necrosis factor, alpha-induced protein 6
KIR2DL3: killer cell immunoglobulin-like receptor, two domains, long cytoplasmic tail, 3
NBPF14: neuroblastoma breakpoint family, member 14
OSBP2: oxysterol binding protein 2
IRF7: interferon regulatory factor 7
CFD: complement factor d (adipsin)
HLA-DQA1: major histocompatibility complex, class ii, dq alpha 1
HLA-DRB1: major histocompatibility complex, class ii, dr beta 1
RPS23: ribosomal protein s23
GPR56: g protein-coupled receptor 56
IFI44L: interferon-induced protein 44-like
CCL23: chemokine (c-c motif) ligand 23
KLRC2: killer cell lectin-like receptor subfamily c, member 2
IFIT3: interferon-induced protein with tetratricopeptide repeats 3
SOS1: son of sevenless homolog 1 (drosophila)
G1P2: interferon, alpha-inducible protein (clone ifi-15k)
LOC652775: similar to ig kappa chain v-v region l7 precursor
CCL3L1: chemokine (c-c motif) ligand 3-like 1
MBP: myelin basic protein
S100P: s100 calcium binding protein p
IFITM3: interferon induced transmembrane protein 3 (1-8u)
MX1: myxovirus (influenza virus) resistance 1, interferon-inducible protein p78 (mouse)
HERC5: hect domain and rld 5
NME4: non-metastatic cells 4, protein expressed in
HLA-DQB1: major histocompatibility complex, class ii, dq beta 1
LOC653157: similar to iduronate 2-sulfatase precursor (alpha-l-iduronate sulfate sulfatase) (idursulfase)
LOC643313: similar to hypothetical protein loc284701
RSAD2: radical s-adenosyl methionine domain containing 2
17 Conclusions
A Bayesian network framework to infer dynamic gene networks from time-series gene expression microarrays:
–Does not require densely sampled microarray data.
–Handles measurement noise and biological variability.
–Captures temporal dependency with a first-order Markov process.
–Finds the optimal network model with a parallelized search algorithm.
Application to HIV viral load tracking shows how our method can be used in clinical studies:
–Our network model tracks viral load trajectories with higher accuracy than a viral load auto-regressive model.
–Our model provides candidate gene targets for drug/vaccine development.
18 Acknowledgements
Supported by Center for HIV/AIDS Vaccine Immunology (CHAVI) # U19 AI :
National Institute of Allergy and Infectious Diseases (NIAID)
National Institutes of Health (NIH)
Division of AIDS (DAIDS)
U.S. Department of Health and Human Services (HHS)
19 Stationary Network Inference
All networks between pairs of times are identical.
[Figure: the transitions T to T+1, T+1 to T+2, and T+2 to T+3 for variables A, B, C, ..., N and viral load VL share one network.]
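Under the stationary assumption, every transition shares one network, so the data from all consecutive time pairs can be pooled into a single training set. The sketch below assumes an array of shape (patients, times, variables); the layout, the function name `pool_transitions`, and the sizes are hypothetical.

```python
import numpy as np


def pool_transitions(data):
    """Stack all consecutive (T, T+1) pairs from an array of shape
    (patients, times, variables) into one dataset, as a stationary model assumes."""
    n_vars = data.shape[2]
    X_prev = data[:, :-1, :].reshape(-1, n_vars)
    X_next = data[:, 1:, :].reshape(-1, n_vars)
    return X_prev, X_next


# Hypothetical sizes: 14 patients, 5 time points, 4 variables.
rng = np.random.default_rng(0)
data = rng.normal(size=(14, 5, 4))
X_prev, X_next = pool_transitions(data)
```

Pooling gives each parent search more samples (here 56 rows instead of 14) but forces a single network on all transitions, whereas the nonstationary approach learns a separate network for each pair of times.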
21 Pathway: Immune Response (16/30 genes, p < 10^-6)
[Figure: the 30-gene list of slide 16, with the 16 immune response genes highlighted.]
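The slides do not say which test produced the pathway p-values. A common choice for this kind of over-representation question is a one-sided hypergeometric test, sketched here with the standard library only. The 16 hits among 30 genes come from the slide; the background size and the number of annotated genes are hypothetical placeholders, so the resulting value is illustrative and not a reproduction of the reported p-value.

```python
from math import comb


def enrichment_p_value(hits, selected, annotated, background):
    """Probability of seeing at least `hits` pathway genes among `selected`
    genes drawn without replacement from `background` genes, of which
    `annotated` belong to the pathway (one-sided hypergeometric test)."""
    total = comb(background, selected)
    upper = min(selected, annotated)
    tail = sum(
        comb(annotated, k) * comb(background - annotated, selected - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# 16 of 30 selected genes are in the pathway (from the slide).
# The background of 20000 genes and 1000 annotated genes are hypothetical.
p = enrichment_p_value(hits=16, selected=30, annotated=1000, background=20000)
```

With these placeholder numbers only about 1.5 pathway genes would be expected among 30 by chance, so 16 hits gives a very small p-value.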
22 Pathway: Antiviral Defense (8/30 genes, p < 10^-3)
[Figure: the 30 genes, listed by description, with the 8 antiviral defense genes highlighted.]
23 Pathway: Inflammatory Response (5/30 genes, p < 0.05)
[Figure: the 30 genes, listed by description, with the 5 inflammatory response genes highlighted.]
24 Interferon Family Dominates
[Figure: the 30 genes, listed by description, marked by whether they belong to 3 pathways, 2 pathways, or 1 pathway.]