“Analyzing the Human Gut Microbiome Dynamics in Health and Disease Using Supercomputers and Supernetworks” Invited Presentation ESnet CrossConnects Bioinformatics Conference Lawrence Berkeley National Laboratory April 12, 2016 Dr. Larry Smarr Director, California Institute for Telecommunications and Information Technology Harry E. Gruber Professor, Dept. of Computer Science and Engineering Jacobs School of Engineering, UCSD http://lsmarr.calit2.net
Abstract: To truly understand the state of the human body in health or disease, we now realize that we must consider a much more complex system than medical science has considered heretofore. This is because we now know that the human body is host to 100 trillion microorganisms, ten times the number of DNA-bearing cells in the human body, and these microbes contain 300 times as many genes as our human DNA does. The microbial component of our “superorganism” comprises hundreds of species with immense biodiversity. The exponential decrease in the cost of genetic sequencing and supercomputing has enabled scientists to finally "read out" the nature of the changes in the microbial ecology of people in health and with disease. We use the fiber optic network of the Pacific Research Platform to rapidly move these large datasets. To put a more personal face on the “patient of the future,” I have been collecting massive amounts of data from my own body over the last five years, which reveal detailed examples of the episodic excursions of my coupled immune-microbial system. As similar techniques become more widely applied, we can look forward to revolutionary changes in medical practice over the next decade.
From One to a Trillion Data Points Defining Me in 15 Years: The Exponential Rise in Body Data. Figure labels: Microbial Genome Time Series; Human Genome; Improving Body; Human Genome SNPs; Discovering Disease; Blood Biomarker Time Series; Weight
As a Model for the Precision Medicine Initiative, I Have Tracked My Internal Biomarkers To Understand My Body’s Dynamics My Quarterly Blood Draw Calit2 64 Megapixel VROOM
Only One of My Blood Measurements Was Far Out of Range--Indicating Chronic Inflammation 27x Upper Limit Episodic Peaks in Inflammation Followed by Spontaneous Drops Normal Range <1 mg/L C-Reactive Protein (CRP) is a Blood Biomarker for Detecting Presence of Inflammation
Active Inflammatory Bowel Disease Adding Stool Tests Revealed Oscillatory Behavior in an Immune Variable Which is Antibacterial Typical Lactoferrin Value for Active Inflammatory Bowel Disease (IBD) 124x Upper Limit for Healthy Normal Range <7.3 µg/mL Lactoferrin is a Protein Shed from Neutrophils - An Antibacterial that Sequesters Iron
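The "27x" (CRP) and "124x" (lactoferrin) figures are multiples of the upper limit of each normal range. A minimal Python sketch of that arithmetic follows. The function name is hypothetical, and the absolute values it computes are only implied by the multiples and limits shown on the slides; they are not quoted lab results.

```python
def value_from_fold(fold_over_limit, upper_limit):
    """Return the measurement implied by a multiple of the normal upper limit."""
    return fold_over_limit * upper_limit

# CRP: 27x an upper limit of 1 mg/L (both numbers from the slide).
crp_mg_per_l = value_from_fold(27, 1.0)

# Lactoferrin: 124x an upper limit of 7.3 ug/mL (both numbers from the slide).
lactoferrin_ug_per_ml = value_from_fold(124, 7.3)

print(f"Implied CRP peak: {crp_mg_per_l:.1f} mg/L")
print(f"Implied lactoferrin value: {lactoferrin_ug_per_ml:.1f} ug/mL")
```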
Confirming the IBD (Colonic Crohn’s) Hypothesis: Finding the “Smoking Gun” with MRI Imaging Liver I Obtained the MRI Slices From UCSD Medical Services and Converted to Interactive 3D Working With Calit2 Staff Transverse Colon Small Intestine Descending Colon Sigmoid Colon Threading Iliac Arteries Major Kink Diseased Sigmoid Colon MRI Jan 2012 Cross Section Severe Colon Wall Swelling
Why Did I Have an Autoimmune Disease like Crohn’s Disease? "Despite decades of research, the etiology of Crohn's disease remains unknown. Its pathogenesis may involve a complex interplay between host genetics, immune dysfunction, and microbial or environmental factors." --The Role of Microbes in Crohn's Disease, Paul B. Eckburg & David A. Relman, Clin Infect Dis. 44:256-262 (2007). I Have Been Quantifying All Three
23andme is Now Collecting 10,000 IBD Patients’ SNPs. I Found I Had One of the Earliest Known SNPs Associated with Crohn’s Disease (From www.23andme.com): Polymorphism in Interleukin-23 Receptor Gene, 80% Higher Risk of Pro-inflammatory Immune Response. SNPs Associated with CD: ATG16L1, IRGM, NOD2
I Reasoned That The Driver of My Gut Autoimmune Disease Was a Disturbance in My Gut Microbiome Ecology Your Body Has 10 Times As Many Microbe Cells As DNA-Bearing Human Cells 99% of Your DNA Genes Are in Microbe Cells Not Human Cells Inclusion of the “Dark Matter” of the Body Will Radically Alter Medicine
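The "99%" claim can be checked against the ratios given in the abstract (10 microbial cells per DNA-bearing human cell, 300 microbial genes per human gene). The sketch below is an illustrative calculation only; the function name is hypothetical.

```python
def microbial_fraction(microbe_to_human_ratio):
    """Fraction of a combined total that is microbial, given a microbe:human ratio."""
    return microbe_to_human_ratio / (microbe_to_human_ratio + 1)

# 10 microbial cells per DNA-bearing human cell (from the slide).
cell_fraction = microbial_fraction(10)

# 300 microbial genes per human gene (from the abstract).
gene_fraction = microbial_fraction(300)

print(f"Microbial share of cells: {cell_fraction:.1%}")
print(f"Microbial share of genes: {gene_fraction:.1%}")
```

With these ratios the microbial share comes to about 91% of cells and more than 99% of genes, consistent with the slide.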
The Carl Woese Tree of Life Shows That Most Life on Earth is Bacterial Sources: Carl Woese, et al. (1990); Hug, et al., Nature Microbiology
The Human Gut as a Super-Evolutionary Microbial Cauldron. Enormous Density: 1000x Ocean Water. Highly Dynamic Microbial Ecology: Hundreds to Thousands of Species; Horizontal Gene Transfer; Phages. Adaptive Selection Pressures (Immune System): Innate Immune System; Adaptive Immune System; Macrophages and Antimicrobial proteins. Constantly Changing Environmental Pressures: Diet; Antibiotics; Pharmaceuticals
To Map Out the Dynamics of Autoimmune Microbiome Ecology Couples Next Generation Genome Sequencers to Big Data Supercomputers Source: Weizhong Li, UCSD Illumina HiSeq 2000 at JCVI Our Team Used 25 CPU-years to Compute Comparative Gut Microbiomes Starting From 2.7 Trillion DNA Bases of My Samples and Healthy and IBD Controls SDSC Gordon Data Supercomputer
We Gathered Raw Illumina Reads on 275 Humans and Generated a Time Series of My Gut Microbiome Each Sample Has 100-200 Million Illumina Short Reads (100 bases) “Healthy” Individuals Inflammatory Bowel Disease (IBD) Patients 250 Subjects 1 Point in Time 2 Ulcerative Colitis Patients, 6 Points in Time Larry Smarr (Colonic Crohn’s) 7 Points in Time 5 Ileal Crohn’s Patients, 3 Points in Time Total of 27 Billion Reads Or 2.7 Trillion Bases Source: Jerry Sheehan, Calit2 Weizhong Li, Sitao Wu, CRBS, UCSD
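The totals on this slide follow from simple arithmetic, sketched below in Python. The cohort table is an interpretation of the slide's labels (in particular, that "250 Subjects 1 Point in Time" refers to the healthy group), and the variable names are hypothetical.

```python
READ_LENGTH_BASES = 100          # short-read length stated on the slide
TOTAL_READS = 27_000_000_000     # 27 billion reads

total_bases = TOTAL_READS * READ_LENGTH_BASES

# (subjects, time points) per cohort, as read from the slide (assumed grouping).
cohorts = {
    "healthy": (250, 1),
    "ulcerative_colitis": (2, 6),
    "colonic_crohns_ls": (1, 7),
    "ileal_crohns": (5, 3),
}
total_samples = sum(subjects * points for subjects, points in cohorts.values())
mean_reads_per_sample = TOTAL_READS / total_samples

print(f"Total bases: {total_bases:.2e}")
print(f"Total samples: {total_samples}")
print(f"Mean reads per sample: {mean_reads_per_sample / 1e6:.0f} million")
```

Under these assumptions the mean lands near the low end of the 100-200 million reads per sample quoted on the slide, so this is a rough consistency check rather than an exact reconstruction.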
PI: (Weizhong Li, CRBS, UCSD): Computational NextGen Sequencing Pipeline: From Sequence to Taxonomy and Function NIH R01HG005978 (2010-2013, $1.1M)
Results Include Relative Abundance of Hundreds of Microbial Species Average Over 250 Healthy People From NIH Human Microbiome Project Note Log Scale Clostridium difficile
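Relative abundance is each taxon's share of the total, and a log scale is what keeps rare species visible next to dominant ones. The Python sketch below illustrates the idea only: the read counts are invented, and the species names are placeholders rather than results from the pipeline.

```python
import math

def relative_abundance(read_counts):
    """Convert per-taxon read counts into fractions that sum to 1."""
    total = sum(read_counts.values())
    return {taxon: count / total for taxon, count in read_counts.items()}

# Hypothetical read counts, invented for illustration only.
example_counts = {"species_A": 900_000, "species_B": 99_000, "species_C": 1_000}

abundances = relative_abundance(example_counts)

# log10 is undefined at zero, so taxa with no reads are skipped here.
log10_abundances = {t: math.log10(a) for t, a in abundances.items() if a > 0}

for taxon, fraction in abundances.items():
    print(f"{taxon}: {fraction:.4f} (log10 = {log10_abundances[taxon]:.2f})")
```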
Using Microbiome Profiles to Survey 155 Subjects for Unhealthy Candidates
We Found Major State Shifts in Microbial Ecology Phyla Between Healthy and Three Forms of IBD. Most Common Microbial Phyla. Panels: Average HE; Average Ileal Crohn’s Disease; Average Ulcerative Colitis; Average LS Colonic Crohn’s Disease
Time Series Reveals Oscillations in Immune Biomarkers Associated with Time Progression of Autoimmune Disease. Timeline tracks, 2009 to 2015: Immune & Inflammation Variables; Weekly Symptoms; Pharma Therapies; Stool Samples
Larry’s 40 Stool Samples Over 3.5 Years to Rob’s lab on April 30, 2015 In 2016 We Are Extending My Stool Time Series by Collaborating with the UCSD Knight Lab
Precision Medicine: Coupling Longitudinal Phenotypic Changes to Longitudinal Microbiome Evolution Larry Smarr’s Weight Over 15 Years Time Period of 16S Microbial Sequences Source: Larry Smarr, UCSD
Larry Smarr Gut Microbiome Ecology Shifted After Drug Therapy Between Two Time-Stable Equilibriums Correlated to Physical Symptoms Frequent IBD Symptoms Weight Loss 5/1/12 to 12/1/14 Blue Balls on Diagram to the Right Weekly Weight (Red Dots Stool Sample) Few IBD Symptoms Weight Gain 1/1/14 to 1/1/16 Red Balls on Diagram to the Right 12/1/13 to 1/1/14 Antibiotics Prednisone 1/1/12 to 5/1/12 5/1/12 Lialda & Uceris Principal Coordinate Analysis of Microbiome Ecology PCoA by Justine Debelius and Jose Navas, Knight Lab, UCSD Weight Data from Larry Smarr, Calit2, UCSD
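Principal Coordinate Analysis (PCoA) places samples on axes so that distances between points approximate a given sample-to-sample distance matrix. The slide does not state which distance metric or software was used, so the Python sketch below is not the Knight Lab method. It is a minimal illustration that assumes hypothetical distances, recovers only the first axis, and uses power iteration in place of a full eigendecomposition. All function names are hypothetical.

```python
import math

def double_center(dist):
    """Gower-center the squared distances: B = -1/2 * J * D^2 * J."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]

def first_principal_coordinate(dist, iterations=200):
    """Return each sample's position on the first PCoA axis (power iteration)."""
    b = double_center(dist)
    n = len(b)
    vec = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iterations):
        nxt = [sum(b[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            raise ValueError("start vector is orthogonal to the leading axis")
        vec = [x / norm for x in nxt]
    eigenvalue = sum(vec[i] * sum(b[i][j] * vec[j] for j in range(n))
                     for i in range(n))
    if eigenvalue <= 0.0:
        raise ValueError("leading eigenvalue is not positive")
    return [x * math.sqrt(eigenvalue) for x in vec]

# Hypothetical distances between three samples lying on a line at 0, 1 and 3.
example_dist = [[0.0, 1.0, 3.0],
                [1.0, 0.0, 2.0],
                [3.0, 2.0, 0.0]]
axis1 = first_principal_coordinate(example_dist)
print([round(x, 3) for x in axis1])
```

Because the three example samples lie on a line, a single axis reproduces their pairwise distances exactly (up to sign). Real microbiome distance matrices generally need several axes, which is why the diagram on the slide shows points in more than one dimension.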
To Expand IBD Project the Knight/Smarr Labs Were Awarded ~1 CPU-Century Supercomputing Time: 8x Compute Resources Over Prior Study. Smarr Gut Microbiome Time Series: From 7 Samples Over 1.5 Years To 50 Samples Over 4 Years. IBD Patients: From 5 Crohn’s Disease and 2 Ulcerative Colitis Patients to ~100 Patients; 50 Carefully Phenotyped Patients Drawn from Sandborn BioBank; 43 Metagenomes from the RISK Cohort of Newly Diagnosed IBD patients. New Software Suite from Knight Lab: Re-annotation of Reference Genomes, Functional / Taxonomic Variations; Novel Compute-Intensive Assembly Algorithms from Pavel Pevzner
Cancer Genomics Hub (UCSC) Demonstrates Need for SuperNetworks: Large Data Flows to End Users at UCSC, UCB, UCSF, … 1G 8G 30,000 TB Per Year 15G Jan 2016 Data Source: David Haussler, Brad Smith, UCSC
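To relate "30,000 TB Per Year" to network link speeds, the annual volume can be converted to a sustained average rate. The Python sketch below assumes decimal terabytes (10^12 bytes) and a 365-day year; neither assumption is stated on the slide, and the names are hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600   # assumes a 365-day year
BITS_PER_TERABYTE = 8 * 10**12       # assumes decimal terabytes (10^12 bytes)

def average_gbps(terabytes_per_year):
    """Sustained average rate in gigabits per second for a yearly data volume."""
    bits_per_second = terabytes_per_year * BITS_PER_TERABYTE / SECONDS_PER_YEAR
    return bits_per_second / 10**9

cghub_average_gbps = average_gbps(30_000)
print(f"Average rate: {cghub_average_gbps:.2f} Gbps")
```

Under these assumptions the average comes to roughly 7.6 Gbps, the same order of magnitude as the 8G and 15G labels on the chart if those labels are read as Gbps. Peak flows would be higher than the yearly average.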
Building a UC San Diego High Performance Cyberinfrastructure to Support Distributed Integrative Omics FIONA 12 Cores/GPU 128 GB RAM 3.5 TB SSD 48TB Disk 10Gbps NIC Knight Lab 10Gbps Gordon Prism@UCSD Data Oasis 7.5PB, 200GB/s Knight 1024 Cluster In SDSC Co-Lo CHERuB 100Gbps Emperor & Other Vis Tools 64Mpixel Data Analysis Wall 120Gbps 1.3Tbps PRP/ 40Gbps
Based on Community Input and on ESnet’s Science DMZ Concept, NSF Has Funded Over 100 Campuses to Build Local Big Data Freeways. 2012-2015 CC-NIE / CC*IIE / CC*DNI Programs. Map legend: Red, 2012 CC-NIE Awardees; Yellow, 2013 CC-NIE Awardees; Green, 2014 CC*IIE Awardees; Blue, 2015 CC*DNI Awardees; Purple, Multiple Time Awardees. Source: NSF
The Pacific Wave Platform Creates a Regional Science-Driven “Big Data Freeway System” Funded by NSF $5M Oct 2015-2020 PI: Larry Smarr, UC San Diego Calit2 Co-PIs: Camille Crittenden, UC Berkeley CITRIS, Tom DeFanti, UC San Diego Calit2, Philip Papadopoulos, UC San Diego SDSC, Frank Wuerthwein, UC San Diego Physics and SDSC Flash Disk to Flash Disk File Transfer Rate Source: John Hess, CENIC
The Emergence of Precision or P4 Medicine -- Predictive, Preventive, Personalized, Participatory How Will the Quantified Consumer Be Integrated into Healthcare Systems? Lee Hood, Director ISB Systems Biology & Systems Medicine Consumer-Driven Social Networks P4 MEDICINE Digital Revolution Big Data
Thanks to Our Great Team! Calit2@UCSD Future Patient Team: Jerry Sheehan, Tom DeFanti, Joe Keefe, John Graham, Kevin Patrick, Mehrdad Yazdani, Jurgen Schulze, Andrew Prudhomme, Philip Weber, Fred Raab, Ernesto Ramirez. JCVI Team: Karen Nelson, Shibu Yooseph, Manolito Torralba. Ayasdi: Devi Ramanan, Pek Lum. UCSD Metagenomics Team: Weizhong Li, Sitao Wu. SDSC Team: Michael Norman, Mahidhar Tatineni, Robert Sinkovits. Dell/R Systems: Brian Kucic, John Thompson. UCSD Health Sciences Team: David Brenner; Rob Knight Lab (Justine Debelius, Jose Navas, Gail Ackermann, Greg Humphrey); William J. Sandborn Lab (Elisabeth Evans, John Chang, Brigid Boland)