“Building a Regional 100G Collaboration Infrastructure” Keynote Presentation, CineGrid International Workshop 2015, Calit2’s Qualcomm Institute, University of California, San Diego.

Presentation transcript:

“Building a Regional 100G Collaboration Infrastructure” Keynote Presentation CineGrid International Workshop 2015 Calit2’s Qualcomm Institute University of California, San Diego December 11, 2015 Dr. Larry Smarr Director, California Institute for Telecommunications and Information Technology Harry E. Gruber Professor, Dept. of Computer Science and Engineering Jacobs School of Engineering, UCSD

Vision: Creating a West Coast “Big Data Freeway” Connected by CENIC/Pacific Wave to Internet2 & GLIF Use Lightpaths to Connect All Data Generators and Consumers, Creating a “Big Data” Freeway Integrated With High Performance Global Networks “The Bisection Bandwidth of a Cluster Interconnect, but Deployed on a 20-Campus Scale.” This Vision Has Been Building for 25 Years

Interactive Supercomputing End-to-End Prototype: Using Analog Communications to Prototype the Fiber Optic Future “We’re using satellite technology… to demo what it might be like to have high-speed fiber-optic links between advanced computers in two different geographic locations.” ―Al Gore, Senator, Chair, US Senate Subcommittee on Science, Technology and Space Illinois Boston SIGGRAPH 1989 “What we really have to do is eliminate distance between individuals who want to interact with other people and with other computers.” ―Larry Smarr, Director, NCSA

NSF’s OptIPuter Project: Using Supernetworks to Meet the Needs of Data-Intensive Researchers OptIPortal – Termination Device for the OptIPuter Global Backplane Calit2 (UCSD, UCI), SDSC, and UIC Leads—Larry Smarr PI Univ. Partners: NCSA, USC, SDSU, NW, TA&M, UvA, SARA, KISTI, AIST Industry: IBM, Sun, Telcordia, Chiaro, Calient, Glimmerglass, Lucent $13,500,000 In August 2003, Jason Leigh and his students used RBUDP to blast data from NCSA to SDSC over the TeraGrid DTFnet, achieving 18Gbps file transfer out of the available 20Gbps LS Slide 2005

Integrated “OptIPlatform” Cyberinfrastructure System: A 10Gbps Lightpath Cloud. [Diagram: National LambdaRail, Campus Optical Switch, Data Repositories & Clusters, HPC, HD/4k Video Images, HD/4k Video Cams, End User OptIPortal, 10G Lightpath, HD/4k Telepresence, Instruments] LS 2009 Slide

So Why Don’t We Have a National Big Data Cyberinfrastructure? “Research is being stalled by ‘information overload,’ Mr. Bement said, because data from digital instruments are piling up far faster than researchers can study. In particular, he said, campus networks need to be improved. High-speed data lines crossing the nation are the equivalent of six-lane superhighways, he said. But networks at colleges and universities are not so capable. ‘Those massive conduits are reduced to two-lane roads at most college and university campuses,’ he said. Improving cyberinfrastructure, he said, ‘will transform the capabilities of campus-based scientists.’” -- Arden Bement, Director of the National Science Foundation, May 2005

DOE ESnet’s Science DMZ: A Scalable Network Design Model for Optimizing Science Data Transfers A Science DMZ integrates 4 key concepts into a unified whole: –A network architecture designed for high-performance applications, with the science network distinct from the general-purpose network –The use of dedicated systems for data transfer –Performance measurement and network testing systems that are regularly used to characterize and troubleshoot the network –Security policies and enforcement mechanisms that are tailored for high performance science environments Science DMZ Coined 2010 The DOE ESnet Science DMZ and the NSF “Campus Bridging” Taskforce Report Formed the Basis for the NSF Campus Cyberinfrastructure Network Infrastructure and Engineering (CC-NIE) Program
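The third concept above — regularly measuring and testing the network — can be illustrated with a small sketch. The function and thresholds here are hypothetical; production Science DMZs use dedicated tools such as perfSONAR for this.

```python
# Sketch: flag Science DMZ paths whose measured throughput has degraded.
# `path_ok` and the 90% threshold are illustrative, not part of any real tool.

def path_ok(provisioned_gbps, measured_gbps, threshold=0.9):
    """A path is healthy if it delivers at least `threshold` of its provisioned rate."""
    return measured_gbps >= threshold * provisioned_gbps

# A 10 Gbps DTN-to-DTN path measuring 9.4 Gbps is healthy...
print(path_ok(10, 9.4))   # True
# ...but one measuring 6.2 Gbps warrants troubleshooting.
print(path_ok(10, 6.2))   # False
```

The point of routine measurement is that the second case is caught by monitoring rather than discovered by a frustrated scientist mid-transfer.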

Creating a “Big Data” Freeway on Campus: NSF-Funded CC-NIE Grants and CHERuB. Phil Papadopoulos, SDSC, Calit2, PI ( ); CHERuB: Mike Norman, SDSC, PI

A UCSD Integrated Digital Infrastructure Project for Big Data Requirements of Rob Knight’s Lab – PRP Does This on a Sub-National Scale. [Diagram: Knight Lab connects at 10Gbps via a FIONA (12 Cores/GPU, 128 GB RAM, 3.5 TB SSD, 48TB Disk, 10Gbps NIC) to SDSC co-lo resources — Gordon, Data Oasis (7.5PB, 200GB/s), the Knight 1024 Cluster, and CHERuB (100Gbps) — and to Emperor & Other Vis Tools, including a 64Mpixel Data Analysis Wall; link rates shown: 120Gbps, 40Gbps, 1.3Tbps]

Based on Community Input and on ESnet’s Science DMZ Concept, NSF Has Funded Over 100 Campuses to Build Local Big Data Freeways. [Map legend: Red = 2012 CC-NIE Awardees; Yellow = 2013 CC-NIE Awardees; Green = 2014 CC*IIE Awardees; Blue = 2015 CC*DNI Awardees; Purple = Multiple-Time Awardees] Source: NSF

The Pacific Research Platform Creates a Regional End-to-End Science-Driven “Big Data Freeway System” NSF CC*DNI Grant, $5M, 10/ /2020. PI: Larry Smarr, UC San Diego Calit2. Co-PIs: Camille Crittenden, UC Berkeley CITRIS; Tom DeFanti, UC San Diego Calit2; Philip Papadopoulos, UC San Diego SDSC; Frank Wuerthwein, UC San Diego Physics and SDSC

What About the Cloud? PRP Connects with the 2 NSF Experimental Cloud Grants –Chameleon Through Chicago –CloudLab Through Clemson CENIC/PW Has Multiple 10Gbps into Amazon Web Services –First 10Gbps Connection 5-10 Years Ago –Today, Seven 10Gbps Paths Plus a 100Gbps Path –Peak Usage is <10% –Lots of Room for Experimenting with Big Data –Interest from Microsoft and Google as well Clouds Useful for Lots of Small Data No Business Model for Small Amounts of Really Big Data Also Very High Financial Barriers to Exit
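The slide's claim that there is "lots of room for experimenting" follows directly from its own numbers, which a few lines of arithmetic make concrete:

```python
# Aggregate CENIC/Pacific Wave capacity into Amazon Web Services,
# per the slide: seven 10Gbps paths plus one 100Gbps path,
# with peak usage under 10%.
paths_gbps = [10] * 7 + [100]
total = sum(paths_gbps)
headroom = total * (1 - 0.10)  # capacity still free even at peak usage

print(total)     # 170 Gbps aggregate
print(headroom)  # 153.0 Gbps of room for Big Data experiments
```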

PRP Allows for Multiple Secure Independent Cooperating Research Groups. Any Particular Science Driver Comprises Scientists and Resources at a Subset of Campuses and Resource Centers. We Term These Science Teams, with the Resources and Instruments They Access, Cooperating Research Groups (CRGs). Members of a Specific CRG Trust One Another, But They Do Not Necessarily Trust Other CRGs

FIONA – Flash I/O Network Appliance: Linux PCs Optimized for Big Data. UCOP Rack-Mount Build: FIONAs Are Science DMZ Data Transfer Nodes & Optical Network Termination Devices. UCSD CC-NIE Prism Award & UCOP. Phil Papadopoulos & Tom DeFanti; Joe Keefe & John Graham. Two builds:
– Cost: $8,000 / $20,000
– CPU (Intel Xeon Haswell Multicore): E v3 6-Core / 2x E v3 14-Core
– RAM: 128 GB / 256 GB
– SSD: SATA 3.8 TB
– Network Interface: 10/40GbE Mellanox / 2x40GbE Chelsio+Mellanox
– GPU: NVIDIA Tesla K80
– RAID Drives: 0 to 112TB (add ~$100/TB)
John Graham, Calit2’s QI
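The "add ~$100/TB" RAID option makes FIONA pricing easy to estimate. A minimal sketch, using only the figures on the slide (the function name is illustrative):

```python
def fiona_price(base_usd, raid_tb):
    """Estimated FIONA price: base build plus ~$100 per TB of RAID storage."""
    assert 0 <= raid_tb <= 112, "slide lists 0 to 112 TB of RAID"
    return base_usd + 100 * raid_tb

print(fiona_price(8000, 48))    # 12800: the $8,000 build with 48 TB of RAID
print(fiona_price(20000, 112))  # 31200: the $20,000 build, fully populated
```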

FIONAs as Uniform DTN End Points. [Maps: Existing DTNs as of October 2015; FIONA DTNs] UC FIONAs Funded by UCOP “Momentum” Grant

Ten-Week Sprint to Demonstrate the West Coast Big Data Freeway System: PRPv0. Presented at CENIC 2015, March 9, 2015. FIONA DTNs Now Deployed to All UC Campuses and Most PRP Sites

PRP Timeline. PRPv1 (Years 1 and 2): A Layer 3 System; Completed in 2 Years; Tested, Measured, Optimized, with Multi-Domain Science Data; Bring Many of Our Science Teams Up; Each Community Thus Will Have Its Own Certificate-Based Access to Its Specific Federated Data Infrastructure. PRPv2 (Years 3 to 5): Advanced IPv6-Only Version with Robust Security Features (e.g. Trusted Platform Module Hardware and SDN/SDX Software); Support Rates up to 100Gb/s in Bursts and Streams; Develop Means to Operate a Shared Federation of Caches

Why is PRPv1 Layer 3 Instead of Layer 2 like PRPv0? In the OptIPuter Timeframe, with Rare Exceptions, Routers Could Not Route at 10Gbps, But Could Switch at 10Gbps; Hence, for Performance, L2 Was Preferred. Today Routers Can Route at 100Gbps Without Performance Degradation. Our Prism Arista Switch Routes at 40Gbps Without Dropping Packets or Impacting Performance. The Biggest Advantage of L3 is Scalability via Information Hiding: Details of the End-to-End Pathways Are Not Needed, Simplifying the Workload of the Engineering Staff. Another Advantage of L3 is Engineered Path Redundancy Within the Transport Network. Thus, a 100Gbps Routed Layer 3 Backbone Architecture Has Many Advantages: A Routed Layer 3 Architecture Allows the Backbone to Stay Simple – Big, Fast, and Clean; Campuses Can Use the Connection Without Significant Effort and Complexity on the End Hosts; Network Operators Do Not Need to Focus on Getting Layer 2 to Work and Later Diagnosing End-to-End Problems with Less Than Good Visibility; This Leaves Us Free to Focus on the Applications at the Edges, on the Science Outcomes, and Less on the Backbone Network Itself. These Points from Eli Dart, John Hess, Phil Papadopoulos, Ron Johnson, and Others

Pacific Research Platform Multi-Campus Science Driver Teams: Jupyter Hub; Biomedical (Cancer Genomics Hub/Browser; Microbiome and Integrative ‘Omics; Integrative Structural Biology); Earth Sciences (Data Analysis and Simulation for Earthquakes and Natural Disasters; Climate Modeling: NCAR/UCAR; California/Nevada Regional Climate Data Analysis; CO2 Subsurface Modeling); Particle Physics; Astronomy and Astrophysics (Telescope Surveys; Galaxy Evolution; Gravitational Wave Astronomy); Scalable Visualization, Virtual Reality, and Ultra-Resolution Video

PRP First Application: Distributed IPython/Jupyter Notebooks: Cross-Platform, Browser-Based Application Interleaves Code, Text, & Images. Kernels: IJulia, IHaskell, IFSharp, IRuby, IGo, IScala, IMathics, Ialdor, LuaJIT/Torch, Lua Kernel, IRKernel (for the R language), IErlang, IOCaml, IForth, IPerl, IPerl6, Ioctave, Calico Project kernels implemented in Mono (including Java, IronPython, Boo, Logo, BASIC, and many others), IScilab, IMatlab, ICSharp, Bash, Clojure Kernel, Hy Kernel, Redis Kernel, jove (a kernel for io.js), IJavascript, Calysto Scheme, Calysto Processing, idl_kernel, Mochi Kernel, Lua (used in Splash), Spark Kernel, Skulpt Python Kernel, MetaKernel Bash, MetaKernel Python, Brython Kernel, IVisual VPython Kernel. Source: John Graham, QI

PRP Has Deployed Powerful FIONA Servers at UCSD and UC Berkeley to Create a UC-Jupyter Hub with a 40Gbps Backplane. FIONAs Have GPUs and Can Spawn Jobs to SDSC’s Comet Using the InCommon CILogon Authenticator Module for Jupyter. Deep Learning Libraries Have Been Installed and Run on Applications. Source: John Graham, QI. Jupyter Hub FIONA: 2 x 14-core CPUs, 256GB RAM, 1.2TB FLASH, 3.8TB SSD, Nvidia K80 GPU, Dual 40GbE NICs, and a Trusted Platform Module
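CILogon-based login for a JupyterHub of this kind is configured through the `oauthenticator` package. A minimal sketch of what such a `jupyterhub_config.py` might contain — the hostname and the elided credentials are hypothetical placeholders, and the exact option names should be checked against the oauthenticator documentation:

```python
# jupyterhub_config.py — sketch of CILogon federated login for a JupyterHub.
# `get_config()` is provided by JupyterHub when it loads this file.
c = get_config()

# Authenticate users via CILogon (backed by InCommon campus identities).
c.JupyterHub.authenticator_class = 'oauthenticator.CILogonOAuthenticator'

# Hypothetical hub hostname; must match the callback registered with CILogon.
c.CILogonOAuthenticator.oauth_callback_url = (
    'https://hub.example.edu/hub/oauth_callback'
)

# Client credentials issued by CILogon when the hub is registered (elided).
c.CILogonOAuthenticator.client_id = '...'
c.CILogonOAuthenticator.client_secret = '...'
```

With this in place, a researcher logs in with their home-campus credentials and the hub can then spawn their notebook jobs on shared resources.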

Cancer Genomics Hub (UCSC) is Housed in SDSC CoLo: Large Data Flows to End Users at UCSC, UCB, UCSF, … [Chart: Cumulative TBs of CGH Files Downloaded, Reaching 30 PB; flows of 1G, 8G, 15G] Data Source: David Haussler, Brad Smith, UCSC

Large Hadron Collider Data: Researchers Across Eight California Universities (UCSB, UCSC, UCD, UCR, CSU Fresno, UCI, …) Benefit From Petascale Data & Compute Resources Across PRP. Aggregate Petabytes of Disk Space & Petaflops of Compute; Transparently Compute on Data at Their Home Institutions & Systems at SLAC, NERSC, Caltech, UCSD, SDSC — Data & Compute Resources at SLAC, Caltech, and UCSD & SDSC. Source: Frank Wuerthwein, UCSD Physics; SDSC; co-PI PRP. PRP Builds on SDSC’s LHC-UC Project

Two Automated Telescope Surveys Creating Huge Datasets Will Drive PRP. Survey 1: 300 images per night, 100MB per raw image, 30GB per night (120GB per night when processed at NERSC, increased by 4x). Survey 2: 250 images per night, 530MB per raw image, 150 GB per night (800GB per night when processed at NERSC). Precursors to LSST and NCSA. PRP Allows Researchers to Bring Datasets from NERSC to Their Local Clusters for In-Depth Science Analysis – see UCSC’s Brad Smith Talk. Source: Peter Nugent, Division Deputy for Scientific Engagement, LBL; Professor of Astronomy, UC Berkeley
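The first survey's figures are internally consistent, which a quick check confirms:

```python
# Survey 1 from the slide: 300 images/night at 100 MB per raw image.
images_per_night = 300
mb_per_raw_image = 100

raw_gb_per_night = images_per_night * mb_per_raw_image / 1000
processed_gb_per_night = raw_gb_per_night * 4  # "increased by 4x" at NERSC

print(raw_gb_per_night)        # 30.0 GB/night raw, matching the slide
print(processed_gb_per_night)  # 120.0 GB/night processed, matching the slide
```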

SIO Campus Climate Researchers Need to Download Results from NCAR Remote Supercomputer Simulations to Make Regional Climate Change Forecasts. Dan Cayan, USGS Water Resources Discipline and Scripps Institution of Oceanography, UC San Diego, with much support from Mary Tyree, Mike Dettinger, Guido Franco, and other colleagues. Sponsors: California Energy Commission, NOAA RISA program, California DWR, DOE, NSF. Planning for climate change in California: substantial shifts on top of already high climate variability

Calit2’s Qualcomm Institute Has Established a Pattern Recognition Lab Investigating Using Brain-Inspired Processors. “On the drawing board are collections of 64, 256, 1024, and 4096 chips. ‘It’s only limited by money, not imagination,’ Modha says.” Source: Dr. Dharmendra Modha, Founding Director, IBM Cognitive Computing Group, August 8, 2014

UCSD ECE Professor Ken Kreutz-Delgado Brings the IBM TrueNorth Chip to Calit2’s Qualcomm Institute September 16, 2015

A Brain-Inspired Cyberinstrument: Pattern Recognition Co-Processors Coupled to Today’s von Neumann Processors. “If we think of today’s von Neumann computers as akin to the ‘left-brain’—fast, symbolic, number-crunching calculators—then IBM’s TrueNorth chip can be likened to the ‘right-brain’—slow, sensory, pattern-recognizing machines.” – Dr. Dharmendra Modha, IBM Cognitive Computing. The Pattern Recognition Laboratory’s Cyberinstrument Will Be a PRP Computational Resource Exploring Realtime Pattern Recognition in Streaming Media & Discovering Patterns in Massive Datasets

Collaboration Between EVL’s CAVE2 and Calit2’s VROOM Over 10Gb Wavelength. Source: NTT-Sponsored ON*VECTOR Workshop at Calit2, March 6, 2013

Optical Fibers Link Australian and US Big Data Researchers – Also Korea, Japan, and the Netherlands

Next Step: Use AARnet/PRP to Set Up Planetary-Scale Shared Virtual Worlds: Digital Arena, UTS Sydney; CAVE2, Monash U, Melbourne; CAVE2, EVL, Chicago

The Pacific Research Platform Creates a Regional End-to-End Science-Driven “Big Data Freeway System” Opportunities for Collaboration with CineGrid Systems