“Creating a High Performance Cyberinfrastructure to Support Analysis of Illumina Metagenomic Data” DNA Day Department of Computer Science and Engineering.

Presentation transcript:

“Creating a High Performance Cyberinfrastructure to Support Analysis of Illumina Metagenomic Data”
DNA Day, Department of Computer Science and Engineering, University of California, San Diego, September 16, 2015
Dr. Larry Smarr, Director, California Institute for Telecommunications and Information Technology; Harry E. Gruber Professor, Dept. of Computer Science and Engineering, Jacobs School of Engineering, UCSD

The National Science Foundation Has Funded Over 100 Campuses to Build “Big Data Freeways”
134 awards, 128 projects, at institutions in all but 4 states

Creating a “Big Data” Plane on Campus: NSF Funded and CHERuB
Phil Papadopoulos, SDSC/Calit2, PI; Mike Norman, SDSC, PI, CHERuB

SDSC Big Data Compute/Storage Facility, Interconnected at Over 1 Tbps
– Comet: VM supercomputer, 2 PF
– Gordon: Big Data supercomputer
– Oasis Data Store: 6000 TB, > 800 Gbps
– Arista router switches the parallel 10 Gbps optical light paths: 128 × 10 Gbps ≈ 1.3 Tbps
Source: Philip Papadopoulos, SDSC/Calit2
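As a quick sanity check on the bandwidth figures, the sketch below computes the aggregate rate of 128 parallel 10 Gbps light paths and converts 800 Gbps to bytes per second (it comes out to the 100 GB/s quoted for Data Oasis). The 1 PB dataset used for the transfer-time estimate is a hypothetical size chosen for illustration, and the estimate ignores protocol and disk overheads.

```python
# Aggregate bandwidth of parallel optical light paths, unit conversion, and an
# idealized transfer time. The 1 PB dataset size is a hypothetical example.

def aggregate_gbps(n_paths: int, gbps_per_path: float) -> float:
    """Total bandwidth of n parallel light paths, in Gbps."""
    return n_paths * gbps_per_path

def gbytes_per_s(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8

def transfer_seconds(n_bytes: float, gbps: float) -> float:
    """Idealized transfer time; ignores protocol overhead and disk limits."""
    return n_bytes * 8 / (gbps * 1e9)

total = aggregate_gbps(128, 10)
print(f"{total:.0f} Gbps = {total / 1000:.2f} Tbps")             # 1280 Gbps = 1.28 Tbps
print(f"800 Gbps = {gbytes_per_s(800):.0f} GB/s")                 # 800 Gbps = 100 GB/s
print(f"1 PB at 800 Gbps: {transfer_seconds(1e15, 800):.0f} s")  # 10000 s
```

At the full 800 Gbps, a petabyte would move in under three hours in this idealized model; real transfers would be slower.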

Will Link Computational Mass Spectrometry and Genome Sequencing Cores to the Big Data Freeway
– ProteoSAFe: compute-intensive discovery MS at the click of a button
– MassIVE: repository and identification platform for all MS data in the world
Source: proteomics.ucsd.edu

IDI Enhanced Cyberinfrastructure Supporting the Knight Lab
– FIONA: 12 cores/GPU, 128 GB RAM, 3.5 TB SSD, 48 TB disk, 10 Gbps NIC
– Knight Lab: 10 Gbps
– Gordon
– Data Oasis: 7.5 PB, 100 GB/s
– Knight 1024 cluster in SDSC co-lo
– CHERuB: 100 Gbps
– Emperor & other vis tools; 64 Mpixel data analysis wall
– Other link rates shown: 120 Gbps, 40 Gbps

The Pacific Wave Platform Creates a Regional Science-Driven “Big Data Freeway System”
Source: John Hess, CENIC. Funded by NSF, $5M, Oct
Figure: flash disk to flash disk file transfer rate

Coupling Supercomputing to Illumina Metagenomics Sequencing
Inflammatory Bowel Disease (IBD) patients:
– 5 ileal Crohn’s patients, 3 points in time
– 2 ulcerative colitis patients, 6 points in time
– Larry Smarr (colonic Crohn’s), 7 points in time
“Healthy” individuals: 250 subjects, 1 point in time
Each sample has … million Illumina short reads (100 bases each)
Total of 27 billion reads, or 2.7 trillion bases
Source: Jerry Sheehan, Calit2; Weizhong Li, Sitao Wu, CRBS, UCSD
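The totals on this slide can be cross-checked with a few lines of arithmetic. This sketch assumes the layout pairs "7 points in time" with the single colonic Crohn's subject and "1 point in time" with the 250 healthy subjects; the per-sample figure it prints is a derived average, not the number from the original slide.

```python
# Back-of-envelope check of the sequencing totals. Cohort sizes and time points
# come from the slide; the pairing of time points with cohorts is an
# interpretation of the slide layout.

READ_LENGTH = 100      # bases per Illumina short read
TOTAL_READS = 27e9     # 27 billion reads

cohorts = {                       # (subjects, time points)
    "ileal_crohns": (5, 3),
    "ulcerative_colitis": (2, 6),
    "colonic_crohns": (1, 7),
    "healthy": (250, 1),
}

n_samples = sum(subjects * points for subjects, points in cohorts.values())
total_bases = TOTAL_READS * READ_LENGTH
mean_reads_per_sample = TOTAL_READS / n_samples

print(n_samples)                     # 284
print(f"{total_bases:.1e} bases")    # 2.7e+12 bases
print(f"{mean_reads_per_sample / 1e6:.0f} million reads per sample on average")
```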

We Created a Reference Database of Known Gut Genomes
NCBI, April 2013:
– 2471 Complete Draft Bacteria & Archaea Genomes
– 2399 complete virus genomes
– 26 complete fungi genomes
– 309 HMP eukaryote reference genomes
Total: 10,741 genomes, ~30 GB of sequences
Now to align our 27 billion reads against the reference database
Source: Weizhong Li, Sitao Wu, CRBS, UCSD
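To illustrate what matching reads against a reference database involves, here is a toy classifier that indexes every k-mer of each reference genome and assigns a read to the genome sharing the most exact k-mers. This is a swapped-in technique for illustration only, not the aligner used in the pipeline on these slides; the genome names and sequences are invented.

```python
# Toy read classifier: assign each read to the reference genome sharing the
# most exact k-mers. Illustrative stand-in only; not the pipeline's aligner.
# Genome names and sequences are hypothetical.
from __future__ import annotations

from collections import Counter, defaultdict

def kmers(seq: str, k: int):
    """Yield every overlapping k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_index(references: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the set of reference genomes that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index

def classify(read: str, index: dict[str, set[str]], k: int) -> str | None:
    """Return the genome with the most shared k-mers, or None if no k-mer hits."""
    hits: Counter = Counter()
    for kmer in kmers(read, k):
        for name in index.get(kmer, ()):
            hits[name] += 1
    if not hits:
        return None
    return hits.most_common(1)[0][0]

refs = {"genome_A": "ACGTACGTTAGCCGATAGGCT", "genome_B": "TTGACCGTAGGACTTACGGAA"}
idx = build_index(refs, k=5)
print(classify("CGTTAGCCGA", idx, k=5))   # genome_A
print(classify("GGGGGGGGGG", idx, k=5))   # None
```

Exact k-mer lookup is fast but tolerates no sequencing errors inside a k-mer; production aligners trade speed for sensitivity differently.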

Computational NextGen Sequencing Pipeline: From Sequence to Taxonomy and Function
PI: Weizhong Li, CRBS, UCSD; NIH R01HG ( , $1.1M)
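One step in going "from sequence to taxonomy" is rolling per-read genome assignments up to a taxonomic rank and normalizing to relative abundance. The sketch below does this at the phylum level; the genome-to-phylum table is hypothetical, and a real pipeline would draw on a curated taxonomy.

```python
# Turn per-read genome assignments into a relative-abundance profile at the
# phylum level. GENOME_TO_PHYLUM is a hypothetical lookup table.
from collections import Counter

GENOME_TO_PHYLUM = {
    "genome_A": "Bacteroidetes",
    "genome_B": "Firmicutes",
    "genome_C": "Proteobacteria",
}

def phylum_profile(assignments):
    """Relative abundance per phylum; unassigned reads (None) are skipped."""
    counts = Counter(GENOME_TO_PHYLUM[g] for g in assignments if g is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {phylum: n / total for phylum, n in counts.items()}

reads = ["genome_A", "genome_A", "genome_B", None, "genome_C", "genome_A"]
print(phylum_profile(reads))
# {'Bacteroidetes': 0.6, 'Firmicutes': 0.2, 'Proteobacteria': 0.2}
```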

Mapping Out the Dynamics of Autoimmune Microbiome Ecology Couples Next-Generation Genome Sequencers to Big Data Supercomputers
Our team used 25 CPU-years to compute comparative gut microbiomes, starting from 2.7 trillion DNA bases from my samples and from healthy and IBD controls
– Illumina HiSeq 2000 at JCVI
– SDSC Gordon Data Supercomputer
Source: Weizhong Li, UCSD
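To put "25 CPU-years" in more familiar units, the sketch converts it to CPU-hours and divides the 2.7 trillion bases by that figure. It assumes a 365.25-day year, and the result is a whole-run average rather than a benchmark of any single step.

```python
# Rough throughput implied by the slide: 2.7 trillion bases processed in
# 25 CPU-years. Assumes a 365.25-day year; the result is an average only.
HOURS_PER_YEAR = 365.25 * 24

cpu_hours = 25 * HOURS_PER_YEAR
bases_per_cpu_hour = 2.7e12 / cpu_hours

print(f"{cpu_hours:,.0f} CPU-hours")                                    # 219,150 CPU-hours
print(f"{bases_per_cpu_hour / 1e6:.1f} million bases per CPU-hour")     # 12.3 million bases per CPU-hour
```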

Next Step: Programmability, Scalability and Reproducibility Using bioKepler (www.kepler-project.org)
– National resources: Gordon, Comet, Stampede, Lonestar
– Cloud resources
– Optimized local cluster resources
Source: Ilkay Altintas, SDSC

We Found Major State Shifts in Microbial Phyla Between Healthy Subjects and Two Forms of IBD
Figure: most common microbial phyla, averaged for HE (healthy), ulcerative colitis, LS, and Crohn’s disease
Annotations: collapse of Bacteroidetes; explosion of Actinobacteria; explosion of Proteobacteria; hybrid of UC and CD; high level of Archaea
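The group bars on this slide are averages of per-sample phylum profiles. A minimal sketch of that averaging follows; a phylum missing from a sample is treated as zero abundance. The two-sample groups and their numbers are invented purely to exercise the code.

```python
# Average per-sample phylum profiles within each group. Absent phyla count as 0.
# The sample profiles below are hypothetical.

def average_profile(profiles):
    """Mean relative abundance per phylum across a list of sample profiles."""
    phyla = sorted({p for prof in profiles for p in prof})
    n = len(profiles)
    return {p: sum(prof.get(p, 0.0) for prof in profiles) / n for p in phyla}

groups = {
    "healthy": [{"Bacteroidetes": 0.6, "Firmicutes": 0.4},
                {"Bacteroidetes": 0.5, "Firmicutes": 0.5}],
    "crohns": [{"Bacteroidetes": 0.1, "Firmicutes": 0.5, "Proteobacteria": 0.4},
               {"Bacteroidetes": 0.2, "Firmicutes": 0.6, "Proteobacteria": 0.2}],
}

for name, profiles in groups.items():
    avg = average_profile(profiles)
    print(name, {p: round(v, 2) for p, v in avg.items()})
```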

Our Relative Abundance Results Across ~300 People Reveal Potential Diagnostic Species
Figure panels: UC 100x Healthy; UC 100x CD; Healthy 100x CD
We produced similar results for ~2500 microbial species
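A contrast such as "UC 100x Healthy" can be found by computing, for each species, the ratio of its mean relative abundance in one group to that in another and keeping those at or above 100. The sketch below assumes a small pseudocount (1e-6, an arbitrary choice) to avoid dividing by zero; the species names and abundances are hypothetical.

```python
# Flag species whose mean relative abundance is at least 100x higher in one
# group than another. The pseudocount value is an assumption; data are made up.

def fold_changes(group_a, group_b, pseudocount=1e-6):
    """Ratio of mean abundance in group_a to group_b for every species."""
    species = set(group_a) | set(group_b)
    return {
        s: (group_a.get(s, 0.0) + pseudocount) / (group_b.get(s, 0.0) + pseudocount)
        for s in species
    }

def enriched(group_a, group_b, threshold=100.0):
    """Species at least `threshold`-fold more abundant in group_a, sorted by name."""
    fc = fold_changes(group_a, group_b)
    return sorted(s for s, ratio in fc.items() if ratio >= threshold)

uc_mean = {"species_x": 0.02, "species_y": 0.010, "species_z": 0.3}
healthy_mean = {"species_x": 0.0001, "species_y": 0.005, "species_z": 0.3}

print(enriched(uc_mean, healthy_mean))   # ['species_x']
```

The pseudocount matters most for species that are absent from one group; a larger value makes the test more conservative.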

Dell Analytics Separates the 4 Patient Types in Our Data Using Our Microbiome Species Data
Groups: Healthy, Ulcerative Colitis, Colonic Crohn’s, Ileal Crohn’s
Source: Thomas Hill, Ph.D., Executive Director Analytics, Dell | Information Management Group, Dell Software
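The slide does not say which method Dell Analytics used to separate the four patient types. As a hedged illustration of the general idea of separating groups by their species profiles, here is a nearest-centroid classifier, chosen only for brevity. The two-species vectors are invented; real profiles would span thousands of species.

```python
# Minimal nearest-centroid classifier over species-abundance vectors.
# Stand-in technique (not the method used by Dell Analytics); data are invented.
import math

def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def fit(labelled):
    """labelled: {class_name: [vector, ...]} -> {class_name: centroid}."""
    return {label: centroid(vecs) for label, vecs in labelled.items()}

def predict(centroids, x):
    """Return the class whose centroid is closest to x (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))

training = {
    "Healthy":            [[0.60, 0.05], [0.55, 0.08]],
    "Ulcerative Colitis": [[0.30, 0.30], [0.25, 0.35]],
    "Colonic Crohn's":    [[0.15, 0.20], [0.20, 0.15]],
    "Ileal Crohn's":      [[0.05, 0.50], [0.10, 0.45]],
}
model = fit(training)
print(predict(model, [0.58, 0.06]))   # Healthy
```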

I Built on Dell Analytics to Show the Dynamic Evolution of My Microbiome Toward and Away From the Healthy State
Groups: Colonic Crohn’s, Healthy, Ileal Crohn’s
My samples (colonic Crohn’s): seven time samples over 1.5 years
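One simple way to express "toward and away from the healthy state" numerically is to measure each time sample's distance to a healthy-group centroid and note whether that distance shrinks or grows from one sample to the next. In the sketch below, only the count of seven time samples comes from the slide; the centroid and sample vectors are hypothetical, and Euclidean distance is an assumed choice of metric.

```python
# Track whether a time series of samples moves toward or away from a healthy
# centroid. The vectors are hypothetical; Euclidean distance is an assumed metric.
import math

def distances_to_reference(samples, reference):
    """Euclidean distance of each time-ordered sample to a reference centroid."""
    return [math.dist(s, reference) for s in samples]

def direction(distances):
    """'toward' if a step shrinks the distance to the reference, else 'away'."""
    return ["toward" if b < a else "away" for a, b in zip(distances, distances[1:])]

healthy_centroid = [0.575, 0.065]
my_samples = [[0.20, 0.15], [0.30, 0.12], [0.45, 0.10], [0.35, 0.14],
              [0.25, 0.18], [0.40, 0.11], [0.50, 0.09]]   # seven hypothetical time points

d = distances_to_reference(my_samples, healthy_centroid)
print([round(x, 3) for x in d])
print(direction(d))
```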

Time Series Reveals Oscillations in Immune Biomarkers Associated With the Time Progression of Autoimmune Disease
Tracked: immune & inflammation variables, weekly symptoms, pharma therapies, stool samples

UC San Diego Will Be Carrying Out a Major Clinical Study of IBD Using These Techniques
Inflammatory Bowel Disease Biobank for healthy and disease patients
Drs. William J. Sandborn, John Chang, & Brigid Boland, UCSD School of Medicine, Division of Gastroenterology
Over 200 enrolled; announced November 7, 2014

Next Step: Knight/Smarr Lab Collaboration
– Smarr gut microbiome time series: from 7 to 50 times over four years
– Healthy human microbiome: use 255+ raw reads from the NIH Human Microbiome Project
– IBD patients: from 5 Crohn’s disease and 2 ulcerative colitis patients to ~100
  – 50 carefully phenotyped patients drawn from the Sandborn BioBank
  – 43 metagenomes from the RISK cohort of newly diagnosed IBD patients
– Illumina reagent grant is key: enables deep metagenomic (and 16S) sequencing at IGM of Smarr + Sandborn samples
– New software suite from Knight Lab
  – Major re-annotation of reference genomes, functional and taxonomic variations
  – Novel assembly algorithms from Pavel Pevzner, very computationally intensive (see talk later this morning)
– Supercomputer grant on SDSC Comet (awarded from XSEDE)
  – From 25 Gordon to 100 Comet core-years
  – Each Comet core: 40 GF peak = 2x a Gordon core, giving an 8X increase in compute
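The 8X figure follows from two factors: 4x more core-years (25 to 100) and 2x the peak per core (40 GF for Comet, implying 20 GF for a Gordon core). The sketch multiplies these out; note that it compares peak rates, which sustained application performance may not match.

```python
# Check of the slide's arithmetic: 4x more core-years, each core with 2x the
# peak of a Gordon core, gives an 8x increase in peak compute.
COMET_PEAK_GF_PER_CORE = 40                            # from the slide
GORDON_PEAK_GF_PER_CORE = COMET_PEAK_GF_PER_CORE / 2   # implied by "2x Gordon core"

gordon_core_years, comet_core_years = 25, 100

speedup = (comet_core_years * COMET_PEAK_GF_PER_CORE) / (
    gordon_core_years * GORDON_PEAK_GF_PER_CORE
)
print(f"{speedup:.0f}x")   # 8x
```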