Advancing the Metagenomics Revolution Invited Talk Symposium #1816, Managing the Exaflood: Enhancing the Value of Networked Data for Science and Society.

Presentation transcript:

Advancing the Metagenomics Revolution Invited Talk Symposium #1816, Managing the Exaflood: Enhancing the Value of Networked Data for Science and Society San Diego, CA February 2010 Dr. Larry Smarr Director, California Institute for Telecommunications and Information Technology Harry E. Gruber Professor, Dept. of Computer Science and Engineering Jacobs School of Engineering, UCSD

Abstract The vast majority of life on earth is microbial. Virtually all ecologies rely on the intricate biochemistry of microbial life to sustain themselves. Historically most research on microbes depended on laboratory cultures, but since 99% of microbes cannot be cultured, it is only recently that modern genetic sequencing techniques have allowed determination of the hundreds to thousands of microbial species present at a specific environmental location. The amount of data specifying the metagenomics of these microbial ecologies is growing explosively as researchers everywhere acquire next-generation sequencing devices. Since many genes are related across microbial species, the community needs repositories in which diverse environmental metagenomics samples can be quickly compared, by either genomic data or environmental metadata. I will give a quantitative example of the computing, storage, software, and networking architecture needed to handle this exponentially growing data flood by describing the Gordon and Betty Moore Foundation-funded Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis (CAMERA) which is hosted by The CAMERA repository currently contains over 500 microbial metagenomics datasets (including Craig Venter's Global Ocean Survey), as well as the full genomes of ~166 marine microbes. Registered end users, over 3000 from 70 countries, can access existing and contribute new metagenomics data either via the web or over novel dedicated 10 Gb/s light paths. Users' BLAST requests transparently activate programs on dedicated and shared parallel computing resources at UCSD. To better support the CAMERA user community, we developed a new component-based cyberinfrastructure, CAMERA Version 2.0.
This new cyberinfrastructure will support future needs for data acquisition, data access through diverse modalities, the addition of externally developed tools, and the orchestration of these tools into reproducible analytical pipelines. The management of remote applications and analyses is accomplished via the Kepler workflow engine which supports the natural interaction of automated computational tools that can then be re-utilized and openly shared. Finally, CAMERA 2.0 includes an effective, flexible, and intuitive user interface that facilitates and enhances the process of collaborative scientific discovery for biosciences. I will conclude by examining future trends in metagenomics data generation, data standardization, and the possible use of cloud computing and storage.

Most of Evolutionary Time Was in the Microbial World. Figure: Tree of Life Derived from 16S rRNA Sequences, with a "You Are Here" marker. Source: Carl Woese, et al.

The New Science of Metagenomics "The emerging field of metagenomics, where the DNA of entire communities of microbes is studied simultaneously, presents the greatest opportunity -- perhaps since the invention of the microscope -- to revolutionize understanding of the microbial world." National Research Council, March 27, 2007. NRC Report: "Metagenomic data should be made publicly available in international archives as rapidly as possible."

Enormous Increase in Scale of Known Genes Over Last Decade
1995, First Microbe Genome: 1.8 Million Bases, 1749 Genes
2007, Ocean Microbial Metagenomics: 6.3 Billion Bases, 5.6 Million Genes
Increase: ~3300x
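As a quick check of the "~3300x" figure, the ratios can be recomputed from the four numbers quoted on the slide. This is only an illustrative sketch; the variable and function names are made up here, and the slide does not say which ratio (bases or genes) the "~3300x" refers to. The two ratios come out at roughly 3500x for bases and roughly 3200x for genes, so the slide's figure sits between them.

```python
# Figures exactly as quoted on the slide; names are illustrative only.
first_microbe_1995 = {"bases": 1.8e6, "genes": 1749}
ocean_metagenome_2007 = {"bases": 6.3e9, "genes": 5.6e6}


def scale_factors(old, new):
    """Return the new/old ratio for each quantity present in `old`."""
    return {key: new[key] / old[key] for key in old}


factors = scale_factors(first_microbe_1995, ocean_metagenome_2007)
for key, ratio in factors.items():
    print(f"{key}: ~{ratio:,.0f}x")
```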

PI: Larry Smarr. Grant Announced January 17, 2006

Calit2 Microbial Metagenomics Cluster: Next Generation Optically Linked Science Data Server. 512 Processors, ~5 Teraflops. ~200 Terabytes Storage (Sun X4500, 10GbE). 1GbE and 10GbE Switched / Routed Core. Source: Phil Papadopoulos, SDSC, Calit2
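To give a feel for why 10GbE matters at this storage scale, here is a back-of-the-envelope sketch of how long it would take to move the full ~200 TB over a single link. The assumptions are mine, not the slide's: decimal terabytes (10^12 bytes), a link running at full line rate, and no protocol or disk overhead. The `efficiency` parameter is a hypothetical knob for derating the link.

```python
def transfer_time_hours(size_terabytes, link_gbps, efficiency=1.0):
    """Ideal time to move `size_terabytes` over a `link_gbps` link.

    Assumes decimal units (1 TB = 1e12 bytes, 1 Gb/s = 1e9 bits/s).
    `efficiency` (0-1] derates the link for overhead; 1.0 means line rate.
    """
    bits = size_terabytes * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


for gbps in (1, 10):
    hours = transfer_time_hours(200, gbps)
    print(f"{gbps:>2} Gb/s: {hours:.1f} hours")
```

Under these idealized assumptions the 10 Gb/s case works out to a little under two days, and the 1 Gb/s case to roughly ten times that.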

Marine Genome Sequencing Project – CAMERA Anchor Dataset Launched March 13, 2007 Measuring the Genetic Diversity of Ocean Microbes Specify Ocean Data Each Sample ~2000 Microbial Species

Moore Foundation Enabled the Full Genome Sequencing of 155+ Marine Microbes

CAMERA Houses the Community's Expanding Environmental Metagenomics Datasets. Rapidly Expanding to Include New Community Datasets. Now Releasing An Additional Dataset Per Week! March 16, 2008

Current CAMERA Interface February 19,

The CAMERA Project Has Established a Global Marine Microbial Metagenomics Cyber-Community 3387 Registered Users From Over 75 Countries

Creating CAMERA Advanced Cyberinfrastructure Service Oriented Architecture Source: CAMERA CTO Mark Ellisman

Metagenomic Data Ingestion Growing Rapidly!
Release                          Number of reads   Number of base pairs
CAMERA 1st release (Mar. 2006)   8.23m             8.67b
CAMERA 1.3 (Dec. 2008)           13.42m            12.35b
CAMERA (Jul. 2009)               36.97m            19.27b
CAMERA * (Dec. 2009)             47.87m            22.08b
* Reference datasets, including the newly released All NCBI Environmental Samples (ENV_NT), were not counted.
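The growth in these figures can be made explicit with a little arithmetic. The sketch below takes the read and base-pair counts as quoted ("m" read as millions, "b" as billions, which is an assumption about the slide's abbreviations) and derives the growth relative to the first release plus the mean read length per release. The function and field names are illustrative only.

```python
# (release label, reads in millions, base pairs in billions), as quoted.
releases = [
    ("Mar. 2006", 8.23, 8.67),
    ("Dec. 2008", 13.42, 12.35),
    ("Jul. 2009", 36.97, 19.27),
    ("Dec. 2009", 47.87, 22.08),
]


def summarize(rows):
    """Return growth vs. the first row and mean read length for each row."""
    _, first_reads, first_bases = rows[0]
    summary = []
    for label, reads_m, bases_b in rows:
        summary.append({
            "release": label,
            "reads_growth": reads_m / first_reads,
            "bases_growth": bases_b / first_bases,
            # billions of bases / millions of reads = bases per read * 1000
            "mean_read_bp": bases_b * 1e9 / (reads_m * 1e6),
        })
    return summary


for row in summarize(releases):
    print(
        f"{row['release']}: reads x{row['reads_growth']:.2f}, "
        f"bases x{row['bases_growth']:.2f}, "
        f"mean read ~{row['mean_read_bp']:.0f} bp"
    )
```

By this calculation the read count grows faster than the base-pair count, so the mean read length implied by the totals falls from release to release. The slide itself does not comment on why.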

Prototyping a Data Acquisition Pipeline: A New Data Submission Paradigm - Metadata First!
1. Investigator submits proposal to GBMF.
2. Investigator submits metadata to CAMERA.
3. CAMERA sends acknowledgement to Investigator, Seq. Group, and GBMF.
4. Seq. Group sends barcoded sample kit to Investigator.
5. Sample is received and sequenced.
6. Seq. Group uploads data to CAMERA (and Investigator).
7. Data and metadata are released in six months.
Metadata is now collected before sequence data (GSC-compliant). Project-ID serves as acceptance-proof.
Webb Miller and Stephan C. Schuster, and Roche / 454 Genome Sequencer. Solexa and SOLiD Next!
Source: Paul Gilna, Calit2
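The "metadata first" idea on this slide, where a Project-ID issued on metadata submission acts as proof of acceptance for the later sequence upload, can be sketched as a small in-memory registry. Everything here is hypothetical: the class names, the `PROJ-0001` ID format, and the metadata fields are invented for illustration and are not CAMERA's actual API or schema.

```python
from dataclasses import dataclass, field
from itertools import count


class SubmissionError(Exception):
    """Raised when a step is attempted out of order."""


@dataclass
class Project:
    project_id: str
    investigator: str
    metadata: dict
    sequence_files: list = field(default_factory=list)


class MetadataFirstRegistry:
    """Hypothetical registry: sequences are accepted only for a known Project-ID."""

    def __init__(self):
        self._counter = count(1)
        self._projects = {}

    def submit_metadata(self, investigator, metadata):
        """Register metadata and return the Project-ID that proves acceptance."""
        if not metadata:
            raise SubmissionError("metadata is required before anything else")
        project_id = f"PROJ-{next(self._counter):04d}"  # made-up ID format
        self._projects[project_id] = Project(project_id, investigator, dict(metadata))
        return project_id

    def upload_sequences(self, project_id, files):
        """Attach sequence files; fails if no metadata was submitted first."""
        project = self._projects.get(project_id)
        if project is None:
            raise SubmissionError(f"no metadata on file for {project_id!r}")
        project.sequence_files.extend(files)
        return project


registry = MetadataFirstRegistry()
pid = registry.submit_metadata(
    "investigator@example.org", {"habitat": "marine", "depth_m": 10}
)
project = registry.upload_sequences(pid, ["reads_001.fastq"])
print(pid, project.sequence_files)
```

The design point is that the upload path has no way to create a project, so sequence data without prior metadata is rejected by construction rather than by convention.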

Conceptual Architecture to Physically Connect Campus Resources Using Fiber Optic Networks. Resources connected at N x 10Gbps: UCSD Storage, OptIPortal, Research Cluster, Digital Collections Manager, PetaScale Data Analysis Facility, HPC System, Cluster Condo, UC Grid Pilot, Research Instrument (DNA Arrays, Mass Spec., Microscopes, Genome Sequencers). Source: Phil Papadopoulos, SDSC/Calit2

The OptIPuter Project: Creating High Resolution Portals Over Dedicated Optical Channels to Global Science Data. Calit2 (UCSD, UCI), SDSC, and UIC Leads; Larry Smarr PI. Univ. Partners: NCSA, USC, SDSU, NW, TA&M, UvA, SARA, KISTI, AIST. Industry: IBM, Sun, Telcordia, Chiaro, Calient, Glimmerglass, Lucent. Now in Sixth and Final Year. Scalable Adaptive Graphics Environment (SAGE). Picture Source: Mark Ellisman, David Lee, Jason Leigh

Visual Analytics--Use of Tiled Display Wall OptIPortal to Interactively View Microbial Genome (5 Million Bases) Acidobacteria bacterium Ellin345 Soil Bacterium 5.6 Mb; ~5000 Genes Source: Raj Singh, UCSD

Use of Tiled Display Wall OptIPortal to Interactively View Microbial Genome Source: Raj Singh, UCSD

MIT's Ed DeLong and Darwin Project Team Using OptIPortal to Analyze 10km Ocean Microbial Simulation. Cross-disciplinary research at MIT, connecting systems biology, microbial ecology, global biogeochemical cycles and climate

Prototyping Next Generation User Access and Analysis Between Calit2 and U Washington. Ginger Armbrust's Diatoms: Micrographs, Chromosomes, Genetic Assembly. iHDTV: 1500 Mbits/sec, Calit2 to UW Research Channel Over NLR. Photo Credit: Alan Decker, Feb. 29, 2008

You Can Download This Presentation at lsmarr.calit2.net