Making sense of large amounts of molecular data Jason E. McDermott, PhD Research Scientist Computational Biology and Bioinformatics Group Pacific Northwest.


1 Making sense of large amounts of molecular data. Jason E. McDermott, PhD, Research Scientist, Computational Biology and Bioinformatics Group, Pacific Northwest National Laboratory

2 How do components of biological systems interact to produce behavior? (Figure: proteins, nucleic acids, macromolecular complex)

3 Molecular pathways (Figure: mTOR pathway and EGFR pathway; http://biocarta.com)

4 A Mammoth Problem

5 Scientific method overview: hypothesis → experimental design → data generation → analysis/modeling → predictions → interpretation → hypothesis

6 Circumstantial evidence. Traditional experimental approach: a cigarette butt on the street; a neighbor who was an eyewitness to the crime; missing jewelry from the house; fingerprints on the doorknob. High-throughput experimental approach: cigarette sales in the city; testimony from everyone on the block; all diamonds sold over the last year within a 10-mile radius; fingerprints on every surface in the house.

7 Problem: new methods are generating mountains of data; the systems are very complex; traditional methods fail in some cases; progress will be made through better use of this data. Objectives: formulate hypotheses for further investigation; identify gene/protein ‘targets’; identify pathways that drive disease; develop a systems-level biological understanding.

8 What is a ‘target’? ‘Critical nodes’; regulators of important processes; an outcome of modeling (a prediction) that can be used to formulate a hypothesis. What are targets used for? Mechanistic understanding of disease processes; potential biomarkers of disease; potential therapeutic treatments (drug development).

9 Examples I’ll be talking about: bacterial virulence (Salmonella Typhimurium); viral pathogenesis (avian flu and SARS); ovarian cancer. Approaches I’ll be talking about: machine learning; biological networks; data integration.

10 (Figure: Salmonella Typhimurium pathogen and host interaction model. Processes: bacterial detection, host defense, environmental response, virulence activation, bacterial survival, invasion, environmental modulation; pathogen-directed and host-directed effectors. Components: LPS, TLR4, MEK, ERK, Egr-1, pH, Mg2+, ROS/RNS, iNOS, NRAMP, Fe2+, SPI1+, SPI2-T3S, SCV, ssrA/B, phoP/Q, ompR/envZ, ydgT; effectors, e.g. SifA, SlrP, SseJ, SspH2.)

11 Type-III secretion system secreted effectors (Karou Geddes; http://en.wikipedia.org/): SlrP, SspH2, SseI, SseJ, SifA, SifB, SpvB, SseK-1, SopD-1, InvJ, SipC, plus 25 other known effectors and an unknown number of effectors not yet identified.

12 Overview of the SVM-based Identification and Evaluation of Virulence Effectors (SIEVE) Method

13 SVM-based discrimination (Figure: positive and negative examples plotted on two feature dimensions, D1 and D2)
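A toy example can illustrate the idea of SVM-based discrimination. The sketch below trains a linear SVM in plain Python with the Pegasos stochastic sub-gradient method. The two features stand in for the D1/D2 axes on the slide. The data points, the function names `train_linear_svm` and `predict`, and the parameter values are all hypothetical. These slides do not describe SIEVE's actual sequence features or SVM implementation, so this is a sketch of the general technique, not of SIEVE itself.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM trained with Pegasos (stochastic sub-gradient descent on the hinge loss).

    X: list of feature vectors; y: labels, +1 (positive) or -1 (negative).
    Returns (weights, bias). The bias is learned as the weight of a constant
    1.0 feature, so it is regularized too (a common simplification).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # regularization shrink
            if margin < 1:  # hinge loss active: move toward the correct side
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w[:-1], w[-1]

def predict(w, b, x):
    """Classify by the side of the separating hyperplane w.x + b = 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical training set on two features (cf. the D1/D2 axes on the slide).
X = [[2, 2], [3, 2], [2, 3], [3, 3], [-2, -2], [-3, -2], [-2, -3], [-3, -3]]
y = [1, 1, 1, 1, -1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print(predict(w, b, [2.5, 1.5]), predict(w, b, [-1.5, -2.5]))  # → 1 -1
```

Only points that violate the margin move the boundary. This is the property that lets an SVM generalize from a limited set of known positives and negatives.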

14 SIEVE validation using CyaA fusions. McDermott, et al. 2011. Infection and Immunity 79(1):23-32. Niemann, et al. 2011. Infection and Immunity 79(1):33-43.

15 Biological networks. Types of networks: regulatory networks; protein-protein interaction networks; biochemical reaction networks; association networks. In a network, a node is a gene/protein or other component, and an edge is an inferred relationship between components. McDermott JE, et al. 2010. Drug Markers 28(4):253-66.

16 Merging disparate observations of a system to produce a single, more informative view (Figure: data types SNVs, CNVs, mRNA, methylation, protein, phosphorylation, miRNA, and metabolome; analyses include genome comparison, pathway enrichment, LEAP, and network analysis)

17 Can we infer a relationship between two genes or proteins from their expression profiles across a large number of different conditions? (Figure panels A, B, C; labels: gene, conditions, network inference method.) Faith J, et al. 2007. “Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles.” PLoS Biology 5:e8.
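Below is a minimal sketch of expression-based network inference, in the spirit of the context likelihood of relatedness (CLR) method from Faith et al. (2007). Each gene pair's relatedness score is compared with the background distribution of scores for each of the two genes. Only pairs that stand out against both backgrounds become edges. To stay dependency-free, this sketch swaps CLR's mutual information for absolute Pearson correlation. The gene names, profiles, threshold, and function names are hypothetical.

```python
import math
from statistics import mean, pstdev

def pearson(a, b):
    """Pearson correlation of two equal-length profiles (0.0 if one is flat)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0

def clr_like_network(profiles, threshold=1.0):
    """Infer edges from profiles: dict gene -> expression values over shared conditions."""
    genes = sorted(profiles)
    # Raw relatedness score: |Pearson r| (CLR itself uses mutual information).
    score = {g: {h: abs(pearson(profiles[g], profiles[h])) for h in genes if h != g}
             for g in genes}
    # Background for each gene: mean and spread of its scores against all other genes.
    background = {g: (mean(list(score[g].values())), pstdev(list(score[g].values())))
                  for g in genes}

    def z(g, h):
        # How far the g-h score stands above g's background (negative values clipped to 0).
        mu, sd = background[g]
        return max(0.0, (score[g][h] - mu) / sd) if sd > 0 else 0.0

    edges = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            combined = math.sqrt(z(g, h) ** 2 + z(h, g) ** 2)
            if combined >= threshold:  # hypothetical cutoff
                edges[(g, h)] = combined
    return edges

# Hypothetical expression of four genes across four conditions.
profiles = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8],   # tracks A exactly
    "C": [2, 0, 0, 2],   # uncorrelated with A, B, and D
    "D": [0, 4, -2, 2],  # uncorrelated with A, B, and C
}
print(clr_like_network(profiles))
```

Only the A–B pair survives. Normalizing against each gene's own background is what separates this approach from a plain correlation cutoff.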

18 What are networks useful for? Networks can be used for: pretty figures; hypothesis generation; identifying functional modules and their organization; topological identification of critical ‘target’ nodes; predicting future states of the network. Networks are NOT useful for: final mechanistic insight; fine distinctions between types of interactions among components; causality.

19 Hubs and bottlenecks (Yu H, et al. 2007. PLoS Comp Biol 3(4):e59). Hubs: high centrality, highly connected; exert regulatory influences; vulnerable. Bottlenecks: high betweenness; regulate information flow within the network; removal could partition the network.
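Hubs and bottlenecks can be found from network topology alone. The sketch below assumes hubs are ranked by degree and bottlenecks by betweenness centrality. Betweenness is computed with Brandes' algorithm for an unweighted, undirected graph. The toy network, two triangles joined through node X, is hypothetical, as are the function names.

```python
from collections import deque

def undirected_graph(edges):
    """Build an adjacency map (node -> set of neighbours) from an edge list."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def degree(graph):
    # Assumption: hub-ness measured as degree (number of neighbours).
    return {v: len(nbrs) for v, nbrs in graph.items()}

def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        pred = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        dist = dict.fromkeys(graph, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Walk back from the farthest nodes, accumulating pair dependencies.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair is counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}

# Hypothetical network: two triangles joined through node X.
g = undirected_graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "X"),
                      ("X", "D"), ("D", "E"), ("D", "F"), ("E", "F")])
deg, btw = degree(g), betweenness(g)
print(max(deg, key=deg.get), max(btw, key=btw.get))  # → C X
```

Node X has only two neighbours, yet every path between the two triangles runs through it. This makes X the top bottleneck without being a hub. Removing it would partition the network.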

20 Bottlenecks in Salmonella are essential for virulence. McDermott J, et al. 2009. J. Comp. Bio. 16(2):169-180.

21 Discovery of a novel class of effectors by integrating transcriptomic and proteomic networks

22 Respiratory virus pathogenesis. What are the causes of pathogenesis in respiratory viruses? Goal: identify and prioritize potential mediators of pathogenesis that are common to, or unique to, influenza and SARS. Goal: identify and prioritize potential mediators of high-pathogenicity viral infection. Approach: mouse models of infection; transcriptomics; a network-based approach; topological network analysis to define targets; validation studies.

23 Inferred network from SARS-CoV-infected wild-type mice (Figure: Ido1/Tnfrsf1b module; Kepi module)

24 Hypotheses for validation (Figure: knockout (KO) mouse infection; phenotype: survival, death, negative; network: altered, negative)

25 Predicted targets abrogate influenza pathogenesis. Tnfrsf1b (aka Tnfr2): predicted common regulator of influenza and SARS pathogenesis; binds TNF; negatively regulates TNFR1 signaling, which is proinflammatory; promotes endothelial cell activation/migration; involved in activation and proliferation of immune cells. (Figure panels: H5N1 infection; SARS infection.)


27 Biological drivers in ovarian cancer. What genomic characteristics of ovarian cancer are executed at the protein level? Can protein expression be used to identify the most important genomic changes? How can we improve the survival of women with ovarian cancer? Can proteomics provide insight into the biological processes associated with poor survival? Can we use a pathway-based approach to suggest novel therapeutic targets?

28 Proteomics: chemoresistance in ovarian and breast cancer. Tumor samples from The Cancer Genome Atlas (depth of genomic characterization; many tumors); proteomic and phosphoproteomic characterization of these tumors; pathway/network analysis to reveal patterns and biomarkers; integration of the data into a single view of the system.

29 Clustering of proteins and phosphoproteins (Figure panels: proteins; phosphoproteins. Color scale: log2 abundance relative to a universal reference pool. Annotations: iTRAQ batch, proteomic subtypes, transcriptomic subtype.)
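The clustering on this slide uses log2 abundances relative to a universal reference pool. The sketch below computes such log2 ratios and then groups tumors with average-linkage agglomerative clustering on Euclidean distance. The slide does not state which clustering method or distance was used, so both are assumptions. The protein and tumor names, values, and function names are also hypothetical.

```python
import math

def log2_ratios(sample, reference):
    """Log2 abundance of each protein in a tumor relative to the reference pool."""
    return {p: math.log2(sample[p] / reference[p]) for p in sample}

def average_linkage(profiles, k):
    """Agglomerative clustering with average linkage on Euclidean distance.

    profiles: dict tumor -> list of log2 ratios (same protein order for all).
    Merges the closest pair of clusters until k clusters remain.
    """
    clusters = [[name] for name in sorted(profiles)]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        total = sum(math.dist(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

# Hypothetical data: two proteins' log2 ratios, then four tumors to cluster.
print(log2_ratios({"P1": 8.0, "P2": 1.0}, {"P1": 2.0, "P2": 4.0}))  # → {'P1': 2.0, 'P2': -2.0}
tumors = {
    "T1": [2.0, 2.0, -1.0],
    "T2": [2.1, 1.9, -1.2],
    "T3": [-1.0, -2.0, 2.0],
    "T4": [-1.2, -1.8, 2.2],
}
print(average_linkage(tumors, k=2))  # → [['T1', 'T2'], ['T3', 'T4']]
```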

30 A subset of proteins and phosphopeptides correlate with patient survival. Linear regression of abundance versus days-to-death suggests possible correlations with patient survival. (Figure panels: protein abundance; phosphorylation, normalized to abundance.)
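The regression on this slide can be sketched as an ordinary least-squares fit of days-to-death on a protein's log2 abundance, reporting slope, intercept, and Pearson r. The patient values and the function name `linear_fit` are hypothetical. This sketch omits any significance test.

```python
import math

def linear_fit(x, y):
    """Ordinary least squares fit y = slope * x + intercept, plus Pearson r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical patients: log2 protein abundance vs. days to death.
abundance = [-1.0, -0.5, 0.0, 0.5, 1.0]
days_to_death = [1400, 1200, 1000, 800, 600]
slope, intercept, r = linear_fit(abundance, days_to_death)
# A negative slope means higher abundance goes with shorter survival.
print(slope, intercept, r)  # → -400.0 1000.0 -1.0
```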

31 PDGFRB pathway (Figure legend: correlated with short survival; correlated with long survival; mRNA abundance; protein abundance; phosphorylation; not observed; weak correlation)

32 Integrated co-abundance network for ovarian cancer (Figure legend: correlated with short survival; correlated with long survival; protein; phosphorylated protein; mRNA). Module 1 (short survival): AP-1 pathway; NFAT TF pathway. Module 2 (long survival): CD8 T cell receptor downstream pathway; Il12-2 pathway; Il12-STAT4 pathway.

33 Survival analysis from network targets. Kaplan-Meier plots (% survival vs. months of survival) from integrated CNV, mRNA expression, and mutation data. IGKV1-5, LAX1, AMPD1, IGHM, SLAMF7: p-value 0.007. ATF3, DUSP1, FOSB, ZFP36: p-value 0.005.
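Kaplan-Meier curves like those on this slide estimate the fraction of patients still alive over time while accounting for censored follow-up. A minimal product-limit estimator is sketched below. The follow-up times, event flags, and the function name `kaplan_meier` are hypothetical. The slide does not say which test produced its p-values, so none is computed here.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up time per patient (e.g. months).
    events: 1 if death was observed at that time, 0 if the patient was censored.
    Returns (time, survival probability) at each time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Everyone with follow-up time t is still at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Hypothetical cohort: months of follow-up and death (1) / censored (0).
months = [5, 8, 8, 12, 20, 24]
died = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(months, died):
    print(f"{t:>3} months: {100 * s:.1f}% survival")
```

Censored patients do not cause a drop in the curve. They shrink the at-risk denominator for later deaths.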

34 Conclusions: Several effective approaches to big data: machine learning, biological network representation, and data integration. Understanding disease requires system-level views. Relatively simple approaches can yield novel insight. Combining different views of a system can improve insight. Data analysis and modeling are a starting point, not an end point.

35 Acknowledgements: SysBEP (http://www.sysbep.org), NIAID/NIH Y1-AI-8401, PI: Josh Adkins, PNNL. Systems Virology (http://www.systemsvirology.org), NIAID/NIH HHSN272200800060C, PI: Michael Katze, UW. Clinical Proteomics Tumor Analysis Consortium, NCI/NIH 1U24CA160019, PIs: Richard Smith, PNNL; Karin Rodland, PNNL. Many, many people in these and other projects who helped with this work and made it possible.

36 About me. Email: Jason.McDermott@pnnl.gov; About: http://www.jasonya.com/wp/about/; Twitter: @BioDataGanache; Blog: The Mad Scientist’s Confectioner’s Club, http://www.jasonya.com/wp/

