A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research: Seminar Presentation, Princeton Institute for Computational Science and Engineering (PICSciE)

Presentation transcript:

A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research. Seminar Presentation, Princeton Institute for Computational Science and Engineering (PICSciE), Princeton University, Princeton, NJ, December 12, 2011. Dr. Larry Smarr, Director, California Institute for Telecommunications and Information Technology; Harry E. Gruber Professor, Dept. of Computer Science and Engineering, Jacobs School of Engineering, UCSD

Abstract: Campuses are experiencing an enormous increase in the quantity of data generated by scientific instruments and computational clusters and stored in massive data repositories. The shared Internet, engineered to enable interaction with megabyte-sized data objects, is not capable of dealing with the gigabytes to terabytes typical of modern scientific data. Instead, a high performance cyberinfrastructure is emerging to support data-intensive research. Fortunately, multi-channel optical fiber can support both the traditional Internet and this new data utility. I will give examples of early prototypes that integrate data generation, transmission, storage, analysis, visualization, curation, and sharing, driven by applications as diverse as genomics, ocean observatories, and cosmology.

Large Data Challenge: Average Throughput to an End User on the Shared Internet is Only Tens of Mbps (Tested December 2011). Transferring 1 TB: at 50 Mbps, ~2 days; at 10 Gbps, ~15 minutes.
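The arithmetic behind these figures, as a minimal Python sketch (1 TB taken as 10^12 bytes; the two rates come from the slide itself):

```python
# Back-of-envelope transfer times for a 1 TB dataset (1 TB = 1e12 bytes),
# using the two rates on the slide.
for label, rate_bps in [("50 Mbps shared Internet", 50e6),
                        ("10 Gbps dedicated lightpath", 10e9)]:
    seconds = 1e12 * 8 / rate_bps
    print(f"{label}: {seconds / 86400:.2f} days ({seconds / 60:.0f} minutes)")

# 50 Mbps -> 1.85 days (~2 days); 10 Gbps -> ~13 minutes (the slide rounds to 15)
```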

OptIPuter Solution: Give Dedicated Optical Channels (WDM Lambdas) to Data-Intensive Users. Parallel lambdas are driving optical networking the way parallel processors drove 1990s computing; 10 Gbps per user is roughly 100x the shared Internet throughput. Source: Steve Wallach, Chiaro Networks

The Global Lambda Integrated Facility: Creating a Planetary-Scale High Bandwidth Collaboratory of Research Innovation Labs Linked by 10G Dedicated Lambdas

Academic Research OptIPlanet Collaboratory: A 10Gbps End-to-End Lightpath Cloud. [Diagram: 10G lightpaths across the National LambdaRail and campus optical switches link data repositories & clusters, HPC, HD/4K video repositories, live HD/4K video, and local or remote instruments to the end user's OptIPortal.]

The OptIPuter Project: Creating High Resolution Portals Over Dedicated Optical Channels to Global Science Data. Calit2 (UCSD, UCI), SDSC, and UIC leads; Larry Smarr, PI. University partners: NCSA, USC, SDSU, NW, TA&M, UvA, SARA, KISTI, AIST. Industry partners: IBM, Sun, Telcordia, Chiaro, Calient, Glimmerglass, Lucent. The Scalable Adaptive Graphics Environment (SAGE) drives the OptIPortal. Picture source: Mark Ellisman, David Lee, Jason Leigh

MIT's Ed DeLong and the Darwin Project Team Using an OptIPortal to Analyze a 10km Ocean Microbial Simulation: cross-disciplinary research at MIT connecting systems biology, microbial ecology, global biogeochemical cycles, and climate.

AESOP Display Built by Calit2 for KAUST (King Abdullah University of Science & Technology): a 40-tile, 46-inch-diagonal, narrow-bezel AESOP display at KAUST running CGLX.

The Latest OptIPuter Innovation: Quickly Deployable, Nearly Seamless OptIPortables. 45-minute setup and 15-minute tear-down with two people (possible with one). [Image of the shipping case from the Calit2 KAUST lab.]

The OctIPortable Being Checked Out Prior to Shipping to the Calit2/KAUST Booth at SIGGRAPH 2011. Photo: Tom DeFanti

3D Stereo Head-Tracked OptIPortal: NexCAVE, an array of JVC HDTV 3D LCD screens. KAUST NexCAVE = 22.5 megapixels. Source: Tom DeFanti

High Definition Video Connected OptIPortals: Virtual Working Spaces for Data Intensive Research. A 10Gbps link to the NASA Ames Lunar Science Institute, Mountain View, CA; NASA supports two virtual institutes. LifeSize HD, 2010. Source: Falko Kuester, Kai Doerr, Calit2; Michael Sims, Larry Edwards, Estelle Dodson, NASA

Blueprint for the Digital University: Report of the UCSD Research Cyberinfrastructure Design Team (April 2009). A five-year process that begins pilot deployment this year. No data bottlenecks: designed for gigabit/s data flows. research.ucsd.edu/documents/rcidt/RCIDTReportFinal2009.pdf

Calit2 Sunlight OptIPuter Exchange Connects 60 Campus Sites, Each Dedicated at 10Gbps. Source: Maxine Brown, EVL, UIC, OptIPuter Project Manager

UCSD Campus Investment in Fiber Enables Consolidation of Energy Efficient Computing & Storage. [Diagram: N x 10Gb/s campus fiber links OptIPortal tiled display walls, campus lab clusters, digital data collections, and scientific instruments to Triton (petascale data analysis), Gordon (HPD system), the cluster condo, DataOasis central storage, and the GreenLight data center, with 10Gb WAN connections via CENIC, NLR, and I2. Source: Philip Papadopoulos, SDSC, UCSD]

NSF Funds a Big Data Supercomputer: SDSC's Gordon, Dedicated Dec. 5, 2011. A data-intensive supercomputer based on SSD flash memory and virtual shared memory software:
– Emphasizes memory capacity and IOPS over FLOPS
– Each supernode has virtual shared memory: 2 TB RAM aggregate and 8 TB SSD aggregate
– Total machine = 32 supernodes
– 4 PB disk parallel file system with >100 GB/s I/O
Designed to accelerate access to the massive datasets being generated in many fields of science, engineering, medicine, and social science. Source: Mike Norman, Allan Snavely, SDSC
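The aggregates are easy to sanity-check; a short sketch using only the slide's figures:

```python
# Sanity-checking Gordon's aggregate numbers from the slide.
supernodes = 32
ram_tb = 2 * supernodes    # 2 TB RAM per supernode -> 64 TB machine-wide
ssd_tb = 8 * supernodes    # 8 TB SSD per supernode -> 256 TB machine-wide

# Time to sweep the full 4 PB parallel file system at the quoted >100 GB/s:
scan_hours = 4e6 / 100 / 3600          # 4 PB = 4e6 GB
print(f"{ram_tb} TB RAM, {ssd_tb} TB flash, full-disk scan ~{scan_hours:.0f} h")
# -> 64 TB RAM, 256 TB flash, full-disk scan ~11 h
```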

Gordon Bests the Previous Mega-I/O-per-Second Record by 25x

Rapid Evolution of 10GbE Port Prices Makes Campus-Scale 10Gbps CI Affordable
– Chiaro: $80K/port (60 ports max)
– Force 10: $5K/port (40 ports max)
– Arista: $500/port (48 ports); ~$1000/port at 300+ port density
– Arista: $400/port (48 ports)
Port pricing is falling and density is rising, dramatically; the cost of 10GbE is approaching that of cluster HPC interconnects. Source: Philip Papadopoulos, SDSC/Calit2
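A quick comparison of what a large campus fabric would cost at each generation's per-port price; the 300-port target is illustrative (and note the Chiaro and Force 10 chassis could not reach it in a single switch):

```python
# What a 300-port campus fabric would cost at each generation's per-port
# price (prices from the slide; the 300-port fabric size is illustrative).
PORTS = 300
for vendor, per_port in [("Chiaro", 80_000), ("Force 10", 5_000),
                         ("Arista", 500), ("Arista", 400)]:
    print(f"{vendor:>8}: ${per_port:>6,}/port -> ${per_port * PORTS:>11,}")
# Chiaro: $24,000,000 down to Arista: $120,000 -- a 200x drop per port
```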

Arista Enables SDSC's Massively Parallel 10G Switched Data Analysis Resource. Data Oasis procurement (RFP) phases: Phase 0, >8 GB/s sustained today; Phase I, >50 GB/s for Lustre (May 2011); Phase II, >100 GB/s (Feb 2012). [Diagram: the radical change enabled by an Arista 10G switch, connecting Trestles (100 TF), Dash, Gordon, Triton, the OptIPuter, co-location, UCSD RCI, and CENIC/NLR to >50 GB/s of existing commodity storage (1/3 PB + 2000 TB) over 10Gbps links. Source: Philip Papadopoulos, SDSC/Calit2]

The Next Step for Data-Intensive Science: Pioneering the HPC Cloud

Data Oasis: 3 Different Types of Storage
– HPC storage (Lustre-based parallel file system). Purpose: transient storage to support HPC, HPD, and visualization. Access mechanism: Lustre parallel file system client.
– Project storage (traditional file server). Purpose: typical project/user storage needs. Access mechanisms: NFS/CIFS network drives.
– Cloud storage. Purpose: long-term storage of data that will be infrequently accessed. Access mechanisms: S3 interfaces, a Dropbox-like web interface, and CommVault (see the sketch below).
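As an illustration of the cloud tier's S3-style access mechanism, a minimal sketch using the boto3 library; the endpoint URL, credentials, bucket, and file names are hypothetical placeholders, not actual Data Oasis values:

```python
# Minimal sketch of an S3-style archive to the cloud storage tier.
# Endpoint, credentials, and bucket below are hypothetical placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://oasis-cloud.example.edu",  # hypothetical endpoint
    aws_access_key_id="YOUR_KEY",
    aws_secret_access_key="YOUR_SECRET",
)

# Push an infrequently accessed dataset into long-term storage,
# then list it back to confirm the archive landed.
s3.upload_file("results/run42.tar", "lab-archive", "2011/run42.tar")
print(s3.list_objects_v2(Bucket="lab-archive", Prefix="2011/")["KeyCount"])
```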

Examples of Applications Built on UCSD RCI:
– DOE remote use of petascale HPC
– Moore Foundation microbial metagenomics server
– NSF GreenLight instrumented data center
– NIH next generation gene sequencers
– NIH shared scientific instruments

Exploring Cosmology With Supercomputers, Supernetworks, and Supervisualization: a particle/cell hydrodynamic cosmology simulation of the intergalactic medium on a 2 GLyr scale, run on NICS Kraken (XT5) using 16,384 cores. Output: 148 TB of movie output (0.25 TB/file) and 80 TB of diagnostic dumps (8 TB/file). Science: Norman, Harkness, Paschos (SDSC); visualization: Insley (ANL), Wagner (SDSC). ANL * Calit2 * LBNL * NICS * ORNL * SDSC. Source: Mike Norman, SDSC
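For scale, a back-of-envelope pass over the slide's output figures; the 10 Gb/s rate matches the ESnet link described on the next slide:

```python
# Sizing the simulation's outputs and the cost of moving them.
MOVIE_TB, MOVIE_TB_PER_FILE = 148, 0.25
DUMP_TB,  DUMP_TB_PER_FILE  = 80, 8

print(f"movie files: {MOVIE_TB / MOVIE_TB_PER_FILE:.0f}")   # 592 files
print(f"dump files:  {DUMP_TB / DUMP_TB_PER_FILE:.0f}")     # 10 files

# Streaming all 228 TB over a dedicated 10 Gb/s lightpath:
total_bits = (MOVIE_TB + DUMP_TB) * 1e12 * 8
print(f"transfer: ~{total_bits / 10e9 / 3600:.0f} hours at 10 Gb/s")  # ~51 h
```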

Providing End-to-End CI for Petascale End Users: two 64K images from a cosmological simulation of galaxy cluster formation (log of gas temperature and log of gas density). Mike Norman, SDSC, October 10, 2008

Using Supernetworks to Couple the End User's OptIPortal to Remote Supercomputers and Visualization Servers: Real-Time Interactive Volume Rendering Streamed from ANL to SDSC
– Simulation: Kraken Cray XT5 at NICS/ORNL (NSF TeraGrid); 8,256 compute nodes, 99,072 compute cores, 129 TB RAM
– Rendering: Eureka at Argonne NL (DOE); 100 dual quad-core Xeon servers, 200 NVIDIA Quadro FX GPUs in 50 Quadro Plex S4 1U enclosures, 3.2 TB RAM
– Visualization: Calit2/SDSC OptIPortal; 2560 x 1600 pixel LCD panels, 10 NVIDIA Quadro FX 4600 graphics cards, >80 megapixels, 10 Gb/s network throughout
– Network: ESnet 10 Gb/s fiber optic network
ANL * Calit2 * LBNL * NICS * ORNL * SDSC. Source: Mike Norman, Rick Wagner, SDSC

Most of Evolutionary Time Was in the Microbial World. [Figure: tree of life derived from 16S rRNA sequences, annotated "You Are Here." Source: Carl Woese et al.] Earth is a microbial world: for every human cell there are 100 million microbes.

The New Science of Microbial Metagenomics. "The emerging field of metagenomics, where the DNA of entire communities of microbes is studied simultaneously, presents the greatest opportunity (perhaps since the invention of the microscope) to revolutionize understanding of the microbial world." (National Research Council, March 27, 2007). The NRC report also urges that metagenomic data be made publicly available in international archives as rapidly as possible.

Calit2 Microbial Metagenomics Cluster: A Next Generation Optically Linked Science Data Server. 512 processors (~5 teraflops), ~200 TB Sun X4500 storage, and a 1GbE and 10GbE switched/routed core. Grant announced January 17, 2006. Source: Phil Papadopoulos, SDSC, Calit2

Calit2 CAMERA (Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis): Over 4,000 Registered Users From Over 80 Countries

Creating the CAMERA Advanced Cyberinfrastructure: A Service Oriented Architecture. Source: CAMERA CTO Mark Ellisman

The GreenLight Project: Instrumenting the Energy Cost of Computational Science
– Focus on 5 communities with at-scale computing needs: metagenomics, ocean observing, microscopy, bioinformatics, and digital media
– Measure, monitor, and web-publish real-time sensor outputs via service-oriented architectures, allowing researchers anywhere to study computing energy cost
– Enable scientists to explore tactics for maximizing work/watt (see the sketch below)
– Develop middleware that automates the optimal choice of compute/RAM power strategies for the desired greenness
– Data center for the School of Medicine: Illumina next generation sequencer storage and processing
Source: Tom DeFanti, Calit2, GreenLight PI
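A minimal sketch of the work-per-watt bookkeeping that such sensor feeds enable; all job and power figures below are hypothetical, for illustration only:

```python
# Illustrative work-per-watt comparison of the kind GreenLight's
# real-time power sensors enable. All figures below are hypothetical.
def work_per_watt(ops_total: float, runtime_s: float, avg_watts: float) -> float:
    """Operations per joule: total work divided by energy consumed."""
    return ops_total / (avg_watts * runtime_s)

# The same hypothetical 1e15-operation job on two hypothetical platforms:
cpu  = work_per_watt(1e15, runtime_s=86_400, avg_watts=350)
fpga = work_per_watt(1e15, runtime_s=7_200,  avg_watts=120)
print(f"CPU: {cpu:.2e} ops/J   FPGA: {fpga:.2e} ops/J   ratio: {fpga/cpu:.0f}x")
```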

GreenLight Project: Remote Visualization of Data Center

GreenLight Project: [Visualizations of data center airflow dynamics and live fan speeds.]

GreenLight Project: [Visualization of heat distribution, combining heat and fan-speed data and showing a realistic correlation.]

The Cost Per Megabase of DNA Sequencing is Falling Much Faster Than Moore's Law

BGI, the Beijing Genome Institute, is the World's Largest Genomic Institute
– Main facilities in Shenzhen and Hong Kong, China; branch facilities in Copenhagen, Boston, and UC Davis
– 137 Illumina HiSeq 2000 next generation sequencing systems, each generating 25 gigabases/day
– Supported by high performance computing and storage: ~160 TF, 33 TB memory, and large-scale (12 PB) storage
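To put the fleet's output in storage and network terms, a rough calculation from the slide's figures; the 2-bytes-per-base factor (uncompressed FASTQ, one base plus one quality score) is an assumption:

```python
# Aggregate daily output of the sequencer fleet, from the slide's figures.
SEQUENCERS = 137
GBASES_PER_DAY = 25            # per instrument
BYTES_PER_BASE = 2.0           # assumed: uncompressed FASTQ, base + quality

gbases = SEQUENCERS * GBASES_PER_DAY                 # 3,425 Gbases/day
tb_per_day = gbases * 1e9 * BYTES_PER_BASE / 1e12    # ~6.9 TB/day
gbps = tb_per_day * 1e12 * 8 / 86_400 / 1e9          # sustained rate to keep up
print(f"{gbases:,} Gbases/day = {tb_per_day:.1f} TB/day = {gbps:.2f} Gbps sustained")
```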

From 10,000 Human Genomes Sequenced in 2011 to 1 Million by 2015, in Less Than 5,000 sq. ft. (The U.S. has 4 million newborns per year.)

Needed: Interdisciplinary Teams Drawn From Computer Science, Data Analytics, and Genomics

Calit2 Brings Together Computer Science and Bioinformatics: the National Biomedical Computation Resource, an NIH-supported resource center

The GreenLight Project Allows Testing of Novel Architectures on Bioinformatics Algorithms. "Our version of MS-Alignment [a proteomics algorithm] is more than 115x faster than a single core of an Intel Nehalem processor, is more than 15x faster than an eight-core version, and reduces the runtime for a few samples from 24 hours to just a few hours." From "Computational Mass Spectrometry in a Reconfigurable Coherent Co-processing Architecture," IEEE Design & Test of Computers, June 23, 2011, by Yalamarthy (ECE), Coburn (CSE), Gupta (CSE), Edwards (Convey), and Kelly (Convey).

Using UCSD RCI to Store and Analyze Next Generation Sequencer Datasets: stream data from the genomics lab to GreenLight storage, then NFS-mount it over 10Gbps to the Triton compute cluster. Source: Chris Misleh, SOM/Calit2, UCSD

NIH National Center for Microscopy & Imaging Research: An Integrated Infrastructure of Shared Resources. [Diagram: local School of Medicine infrastructure links scientific instruments and end user workstations to the shared infrastructure. Source: Steve Peltier, Mark Ellisman, NCMIR]

UCSD Planned Optical Networked Biomedical Researchers and Instruments. Sites connecting at 10 Gbps: Cellular & Molecular Medicine West and East, National Center for Microscopy & Imaging, Leichtag Biomedical Research, Center for Molecular Genetics, Pharmaceutical Sciences Building, CryoElectron Microscopy Facility, Radiology Imaging Lab, Bioengineering, San Diego Supercomputer Center, and the GreenLight Data Center. Resources connected: microarrays, genome sequencers, mass spectrometry, light and electron microscopes, whole body imagers, computing, and storage.