High Performance Cyberinfrastructure Enabling Data-Driven Science Supporting Stem Cell Research: Invited Presentation to the Sanford Consortium for Regenerative Medicine

Presentation transcript:

High Performance Cyberinfrastructure Enabling Data-Driven Science Supporting Stem Cell Research
Invited Presentation, Sanford Consortium for Regenerative Medicine
Salk Institute, La Jolla
Larry Smarr, Calit2 & Phil Papadopoulos, SDSC/Calit2
May 13,

Academic Research OptIPlanet Collaboratory: A 10Gbps End-to-End Lightpath Cloud
[Diagram: National LambdaRail and a campus optical switch link data repositories & clusters, HPC, HD/4k video repositories, and local or remote instruments over 10G lightpaths to the end-user OptIPortal, including HD/4k live video.]

Blueprint for the Digital University: Report of the UCSD Research Cyberinfrastructure Design Team (April 2009)
A Five-Year Process Begins Pilot Deployment This Year
No Data Bottlenecks: Design for Gigabit/s Data Flows
research.ucsd.edu/documents/rcidt/RCIDTReportFinal2009.pdf

UCSD Campus Investment in Fiber Enables Consolidation of Energy-Efficient Computing & Storage
[Diagram: N x 10 Gb/s campus fiber links OptIPortal tiled display walls, campus lab clusters, digital data collections, scientific instruments, the GreenLight Data Center, cluster condos, Triton (petascale data analysis), Gordon (HPD system), and DataOasis central storage, with 10 Gb WAN connections to CENIC, NLR, and Internet2.]
Source: Philip Papadopoulos, SDSC, UCSD

Moving to Shared Enterprise Data Storage & Analysis Resources: SDSC Triton Resource & Calit2 GreenLight
– SDSC Large Memory Nodes: 256/512 GB per system, 8 TB total, 128 GB/sec, ~9 TF, x28
– SDSC Shared Resource Cluster: 24 GB per node, 6 TB total, 256 GB/sec, ~20 TF, x256
– SDSC Data Oasis Large-Scale Storage: 2 PB, 50 GB/sec, 3000-6000 disks (Phase 0: 1/3 PB, 8 GB/s)
– Connected to UCSD research labs and Calit2 GreenLight over the Campus Research Network at N x 10 Gb/s
Source: Philip Papadopoulos, SDSC, UCSD

NCMIR's Integrated Infrastructure of Shared Resources
[Diagram: scientific instruments and end-user workstations connect through local SOM infrastructure to the shared infrastructure.]
Source: Steve Peltier, NCMIR

The GreenLight Project: Instrumenting the Energy Cost of Computational Science
Focus on 5 Communities with At-Scale Computing Needs:
– Metagenomics
– Ocean Observing
– Microscopy
– Bioinformatics
– Digital Media
Measure, Monitor, & Web-Publish Real-Time Sensor Outputs
– Via Service-Oriented Architectures
– Allow Researchers Anywhere to Study Computing Energy Cost
– Enable Scientists to Explore Tactics for Maximizing Work/Watt (see the sketch below)
Develop Middleware that Automates Optimal Choice of Compute/RAM Power Strategies for Desired Greenness
Data Center for School of Medicine Illumina Next-Gen Sequencer Storage and Processing
Source: Tom DeFanti, Calit2; GreenLight PI
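The work/Watt idea above can be made concrete with a small calculation: divide the useful work a system completes by the average power its sensors report over the same interval. The following is a minimal Python sketch under assumptions; the sample readings and the `work_per_watt` helper are hypothetical illustrations, not the GreenLight project's actual instrumentation middleware.

```python
# Minimal sketch: estimate work/Watt from periodic power readings.
# The sample data and function names are hypothetical illustrations,
# not the GreenLight project's actual instrumentation API.

def average_power_watts(samples):
    """Mean of (timestamp_sec, watts) sensor samples."""
    return sum(w for _, w in samples) / len(samples)

def work_per_watt(work_units, samples):
    """Work completed per watt of average draw during the run."""
    return work_units / average_power_watts(samples)

# Hypothetical readings from a rack power sensor, sampled once a minute.
power_samples = [(0, 310.0), (60, 335.5), (120, 342.0), (180, 328.4)]
jobs_completed = 42  # e.g., sequence-alignment tasks finished in the window

print(f"Average draw: {average_power_watts(power_samples):.1f} W")
print(f"Work/Watt:    {work_per_watt(jobs_completed, power_samples):.3f} jobs/W")
```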

Next-Generation Genome Sequencers Produce Large Data Sets
Source: Chris Misleh, SOM

The Growing Sequencing Data Load Runs over RCI Connecting GreenLight and Triton
Data from the sequencers are stored in the GreenLight SOM Data Center:
– The data center contains a Cisco Catalyst 6509 connected to the campus RCI at 2 x 10 Gb.
– Attached to the Cisco Catalyst are a 48 x 1 Gb switch and an Arista 7148 switch with 48 x 10 Gb ports.
– The two Sun Disks connect directly to the Arista switch for 10 Gb connectivity.
With our current configuration of two Illumina GAIIx, one GAII, and one HiSeq 2000, we can produce a maximum of 3 TB of data per week. Processing uses a combination of local compute nodes and the Triton resource at SDSC:
– Triton comes in particularly handy when we need to run 30 seqmap/blat/blast jobs. On a standard desktop computer this analysis could take several weeks; on Triton we can submit these jobs in parallel and complete the computation in a fraction of the time, typically within a day (see the sketch below).
In the coming months we will transition another lab to the 10 Gbit Arista switch. In total we will have 6 Sun Disks connected at 10 Gbit speed and mounted via NFS directly on the Triton resource. The new PacBio RS is scheduled to arrive in May and will also use the campus RCI in Leichtag and the SOM GreenLight Data Center.
Source: Chris Misleh, SOM
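As an illustration of the parallel-submission pattern described above, the sketch below generates one batch job per chunk of reads and hands each to a scheduler. It is a minimal Python sketch under assumptions: the PBS-style `qsub` submission, the chunked read filenames, the reference path, and the `blat` command line are placeholders, not Triton's actual queues, paths, or aligner options.

```python
# Minimal sketch of fanning out alignment jobs to a batch scheduler.
# Assumptions: a PBS-style scheduler with `qsub`, pre-split read chunks,
# and a reference file path -- all placeholder names, not Triton's real setup.
import subprocess
from pathlib import Path

REFERENCE = "/projects/ref/hg19.2bit"                     # hypothetical reference path
CHUNKS = sorted(Path("reads_split").glob("chunk_*.fa"))   # pre-split read files

JOB_TEMPLATE = """#!/bin/bash
#PBS -l nodes=1:ppn=8,walltime=08:00:00
#PBS -N blat_{name}
blat {ref} {chunk} {chunk}.psl
"""

for chunk in CHUNKS:
    script = JOB_TEMPLATE.format(name=chunk.stem, ref=REFERENCE, chunk=chunk)
    # Pipe the generated job script to qsub; each chunk becomes one parallel job.
    subprocess.run(["qsub"], input=script, text=True, check=True)

print(f"Submitted {len(CHUNKS)} alignment jobs")
```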

Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis

Calit2 Microbial Metagenomics Cluster: Next-Generation Optically Linked Science Data Server
– 512 Processors, ~5 Teraflops
– ~200 Terabytes Sun X4500 Storage
– 1 GbE and 10 GbE Switched/Routed Core
– Users from 90 Countries
Source: Phil Papadopoulos, SDSC, Calit2

UCSD CI Features Kepler Workflow Technologies: the fully integrated UCSD CI manages the end-to-end lifecycle of massive data, from instruments to analysis to archival (a generic sketch of such a pipeline follows).
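To make the lifecycle idea concrete, here is a minimal Python analogue of an instrument-to-analysis-to-archive pipeline. It is not Kepler itself (Kepler workflows are composed from actors in its own environment); the directory names, placeholder analysis stage, and checksum-based archival step are illustrative assumptions only.

```python
# Generic instrument -> analysis -> archive pipeline, sketched in Python.
# This is an illustrative analogue of the lifecycle such a workflow system manages,
# not Kepler's actor model; all paths and stage logic are hypothetical.
import hashlib
import shutil
from pathlib import Path

INCOMING = Path("instrument_drop")   # where the instrument writes raw files
ANALYZED = Path("analysis_results")
ARCHIVE = Path("archive")

def analyze(raw: Path) -> Path:
    """Placeholder analysis stage: record file size; a real stage would run an aligner, etc."""
    out = ANALYZED / (raw.name + ".summary.txt")
    out.write_text(f"{raw.name}\t{raw.stat().st_size} bytes\n")
    return out

def archive(raw: Path) -> None:
    """Copy the raw file to the archive with a checksum for later integrity checks."""
    digest = hashlib.sha256(raw.read_bytes()).hexdigest()
    shutil.copy2(raw, ARCHIVE / raw.name)
    (ARCHIVE / (raw.name + ".sha256")).write_text(digest + "\n")

for d in (ANALYZED, ARCHIVE):
    d.mkdir(exist_ok=True)

for raw_file in sorted(INCOMING.glob("*")):
    analyze(raw_file)
    archive(raw_file)
```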

NSF Funds a Data-Intensive Track 2 Supercomputer: SDSC's Gordon, Coming Summer 2011
Data-Intensive Supercomputer Based on SSD Flash Memory and Virtual Shared Memory SW
– Emphasizes MEM and IOPS over FLOPS
– Supernode Has Virtual Shared Memory: 2 TB RAM Aggregate, 8 TB SSD Aggregate
– Total Machine = 32 Supernodes (aggregates computed below)
– 4 PB Disk Parallel File System, >100 GB/s I/O
System Designed to Accelerate Access to Massive Databases Being Generated in Many Fields of Science, Engineering, Medicine, and Social Science
Source: Mike Norman, Allan Snavely, SDSC
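From the per-supernode figures on this slide, the machine-wide aggregates follow directly (a back-of-the-envelope check using only the numbers quoted above):

```latex
% Aggregate memory across the full machine, from the per-supernode figures
\begin{align*}
\text{RAM}_{\text{total}} &= 32 \text{ supernodes} \times 2~\mathrm{TB} = 64~\mathrm{TB} \\
\text{SSD}_{\text{total}} &= 32 \text{ supernodes} \times 8~\mathrm{TB} = 256~\mathrm{TB}
\end{align*}
```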

Data Mining Applications Will Benefit from Gordon
– De Novo Genome Assembly from Sequencer Reads & Analysis of Galaxies from Cosmological Simulations & Observations Will Benefit from Large Shared Memory
– Federations of Databases & Interaction Network Analysis for Drug Discovery, Social Science, Biology, Epidemiology, Etc. Will Benefit from Low-Latency I/O from Flash
Source: Mike Norman, SDSC

If Your Data Is Remote, Your Network Better Be Fat
– Data Oasis: 100 GB/sec
– OptIPuter Quartzite Research 10 GbE Network to OptIPuter Partner Labs: 50 Gbit/s (6 GB/sec), >10 Gbit/s each
– Campus Production Research Network to Campus Labs: 20 Gbit/s (2.5 GB/sec), 1 or 10 Gbit/s each
10 Gbit/sec = ~20 Minutes; 10 Mbit/sec = ~10 Days (see the worked example below)
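The minutes-versus-days contrast is simple arithmetic; the short calculation below reproduces it for an assumed ~1.5 TB data set (the slide does not state the size, so that figure is an assumption chosen to match its ~20-minute number, and the slow-link result lands in the same ~10-day range).

```python
# Back-of-the-envelope transfer times for a large data set.
# The 1.5 TB size is an assumption (the slide gives only the times);
# it reproduces the ~20-minute figure at 10 Gbit/s, and the slow-link
# result comes out on the same order as the slide's ~10 days.
DATASET_BYTES = 1.5e12          # assumed ~1.5 TB data set
DATASET_BITS = DATASET_BYTES * 8

for label, rate_bits_per_sec in [("10 Gbit/s", 10e9), ("10 Mbit/s", 10e6)]:
    seconds = DATASET_BITS / rate_bits_per_sec
    print(f"{label}: {seconds / 60:.0f} minutes ({seconds / 86400:.1f} days)")
# 10 Gbit/s: 20 minutes (0.0 days)
# 10 Mbit/s: 20000 minutes (13.9 days)
```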

Calit2 Sunlight OptIPuter Exchange Contains Quartzite
Source: Maxine Brown, EVL, UIC, OptIPuter Project Manager

Rapid Evolution of 10 GbE Port Prices Makes Campus-Scale 10 Gbps CI Affordable
– $80K/port: Chiaro (60 ports max)
– $5K/port: Force10 (40 ports max)
– $500/port: Arista, 48 ports
– ~$1000/port: (300+ ports max)
– $400/port: Arista, 48 ports
Port Pricing Is Falling and Density Is Rising, Dramatically; the Cost of 10 GbE Is Approaching Cluster HPC Interconnects
Source: Philip Papadopoulos, SDSC/Calit2

10G Switched Data Analysis Resource: SDSC's Data Oasis – Scaled Performance
Oasis Procurement (RFP):
– Phase 0: >8 GB/s Sustained Today
– Phase I: >50 GB/sec for Lustre (May 2011)
– Phase II: >100 GB/s (Feb 2012)
[Diagram: OptIPuter, co-location, UCSD RCI, CENIC/NLR, Trestles (100 TF), Dash, Gordon, Triton, and existing commodity storage (1/3 PB) connect over 10 Gbps links to Data Oasis (2000 TB, >50 GB/s); the radical change is enabled by a 10G-capable Arista switch.]
Source: Philip Papadopoulos, SDSC/Calit2

Data Oasis: 3 Different Types of Storage
– HPC Storage (Lustre-Based PFS). Purpose: transient storage to support HPC, HPD, and visualization. Access mechanism: Lustre parallel file system client.
– Project (Traditional File Server) Storage. Purpose: typical project/user storage needs. Access mechanisms: NFS/CIFS network drives.
– Cloud Storage. Purpose: long-term storage of data that will be infrequently accessed. Access mechanisms: S3 interfaces, a Dropbox-esque web interface, CommVault (an example of the S3 route follows).
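As an illustration of the S3-style access mechanism listed for the cloud tier, the sketch below uploads and retrieves an object with boto3 pointed at an S3-compatible endpoint. The endpoint URL, bucket name, object key, and credentials are placeholders, not Data Oasis's actual service details.

```python
# Minimal sketch of using an S3-compatible interface for cloud-tier storage.
# The endpoint, bucket, key, and credentials below are placeholders, not the
# actual Data Oasis service; any S3-compatible object store works the same way.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://cloud-storage.example.edu",  # hypothetical endpoint
    aws_access_key_id="YOUR_KEY",
    aws_secret_access_key="YOUR_SECRET",
)

bucket = "sequencing-archive"  # hypothetical bucket name

# Push an infrequently accessed result set to the long-term tier...
s3.upload_file("run_2011_05/variants.vcf.gz", bucket, "run_2011_05/variants.vcf.gz")

# ...and pull it back later when it is needed again.
s3.download_file(bucket, "run_2011_05/variants.vcf.gz", "variants.vcf.gz")
```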

Campus Now Starting RCI Pilot