The National Center for Genome Analysis Support as a Model Virtual Resource for Biologists: Internet2 Network Infrastructure for the Life Sciences Focused Technical Workshop

Presentation transcript:

The National Center for Genome Analysis Support as a Model Virtual Resource for Biologists
Internet2 Network Infrastructure for the Life Sciences Focused Technical Workshop, Berkeley, CA, July 17-18, 2013
William K. Barnett, Ph.D., National Center for Genome Analysis Support

Summary
- The NGS Big Data Problem
- NCGAS as a National Model
- Current State and Prospects

The Next Generation Sequencing Big Data Problem

Changing genomics data environment
Sequencing is:
- Becoming commoditized at large centers,
- Multiplying at individual labs, and
- Generating (MUCH) more data
Analytical capability lacks:
- Bioinformatics support
- Computational support
- Storage support

Sequencing is getting cheaper Source:

NGS Data growth is outstripping Storage growth Source: Stein Genome Biology :207 doi: /gb

omicsmaps.com listed 931 Next Generation Sequencers in the US on April 29, 2013

Genomics Data are Big
Source: Nancy Spinner Keynote at 2012 Internet2 meeting: http://events.internet2.edu/2012/fall-mm/agenda.cfm?go=session&id= &event=1149

But other disciplines already handle Bigger
- One Degree Imager: 1.4 Petabytes per year
- Large Hadron Collider: 15 Petabytes per year
- The Open Science Grid moves 2 Petabytes per day
- The Square Kilometer Array: 2,000 Petabytes per year in 2013
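To compare these volumes with network speeds, one rough approach is to convert each figure into the average sustained bandwidth it implies. The sketch below is illustrative only: the decimal petabyte (10^15 bytes), the 365-day year, and the function name `sustained_gbps` are assumptions, not part of the presentation.

```python
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # assumption: non-leap year
BYTES_PER_PB = 10**15                     # assumption: decimal petabyte


def sustained_gbps(petabytes, seconds):
    """Average rate, in gigabits per second, needed to move `petabytes` in `seconds`."""
    bits = petabytes * BYTES_PER_PB * 8
    return bits / seconds / 1e9


# Volumes as quoted on the slide, paired with the period they refer to.
projects = [
    ("One Degree Imager", 1.4, SECONDS_PER_YEAR),
    ("Large Hadron Collider", 15, SECONDS_PER_YEAR),
    ("Open Science Grid", 2, SECONDS_PER_DAY),
    ("Square Kilometer Array", 2000, SECONDS_PER_YEAR),
]

for name, petabytes, period in projects:
    print(f"{name}: {sustained_gbps(petabytes, period):.2f} Gbps sustained")
```

These are averages over the whole period; real traffic is bursty, so peak requirements would be higher.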

The National Center for Genome Analysis Support as a National Model

1. Bioinformatics consulting for biologists
2. Large memory clusters for assembly
3. Optimized software for better efficiency
Initially Funded by the National Science Foundation
Collaboration across multiple institutions
Open for business at:

Research networks are now (much) faster than hard drives
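A back-of-the-envelope calculation illustrates the claim. The sketch below compares the time to move one terabyte over a 100 Gbps link (the Internet2 figure quoted elsewhere in this presentation) with the time to read it from a single hard drive. The drive throughput of about 150 MB/s is an assumption chosen for illustration, protocol overhead is ignored, and the names are hypothetical.

```python
def transfer_hours(terabytes, rate_bits_per_sec):
    """Hours needed to move `terabytes` (decimal TB) at a constant bit rate."""
    bits = terabytes * 10**12 * 8
    return bits / rate_bits_per_sec / 3600


NETWORK_BPS = 100e9   # 100 Gbps research network link
DISK_BPS = 150e6 * 8  # assumption: ~150 MB/s sequential read from one hard drive

network_hours = transfer_hours(1, NETWORK_BPS)
disk_hours = transfer_hours(1, DISK_BPS)

print(f"1 TB over the network: {network_hours * 3600:.0f} seconds")
print(f"1 TB from one drive:   {disk_hours:.2f} hours")
print(f"Network is about {disk_hours / network_hours:.0f}x faster")
```

Under these assumptions the storage system, not the network, is the bottleneck, which is the motivation for parallel file systems at both ends of a transfer.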

NCGAS Service Model
[Figure: needs compared with NCGAS offerings. Layers: Hardware Layer, OS Layer, Services Layer, Applications, Bioinformatics, Network Layer. Labels: Infrastructure as a Service, Platform as a Service, Software as a Service, Public Cloud, Supercomputer Cntrs, Professional Admins, Parallel Optimized Envs., Galaxy and Tuned Apps, Expert Consulting, 100 Gbps Internet2.]
Aligned with HIPAA for Clinical research at IU

NCGAS Virtual Instrument
[Figure: diagram with labels NSF-Funded or XSEDE Allocation, Federally Funded, NCGAS Galaxy Portal, POD Galaxy Portal, 5 PB D.C., 6 PB Storage, 5.5 PB Storage, 4 PB Storage, TACC, SDSC, PSC, Mason, POD, IU, Sequencing Center, NCBI, 100 Gig Internet2, 10 Gig NLR.]

How does this work at scale?
1. Researchers use Galaxy to transfer files and run jobs
   - Files transferred to Data Capacitor over Internet2
   - Data Capacitor mounted on several Clusters
   - Data Capacitor mounts reference data from NCBI
2. Workflows execute on best system for that analysis
3. Results delivered through web interfaces and to visualization or other science tools

Rates
For NSF Funded Researchers at IU or XSEDE Allocations:
- Consulting: $0 – currently 2.5 FTE bioinformaticians
- Data Transfer: $0
- Data Storage: $0 – contemplating 25 TB allocation/project
- Computation: $0
Other Federally funded Researchers running on the POD:
- Consulting: $60/hour for short consults
- Data Transfer: $20 per transfer
- Data Storage: $0.10/GB/month
- Computation: $0.09/core hour (128GB/node cluster)
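As a worked example of the POD rates for other federally funded researchers, the sketch below totals a hypothetical project bill. The rates come from the slide; the function name `estimate_pod_cost` and the sample usage figures are invented for illustration and are not an official NCGAS pricing tool.

```python
# Rates from the slide, held in whole cents to avoid floating-point rounding.
CONSULTING_CENTS_PER_HOUR = 6000    # $60/hour for short consults
TRANSFER_CENTS_EACH = 2000          # $20 per transfer
STORAGE_CENTS_PER_GB_MONTH = 10     # $0.10/GB/month
COMPUTE_CENTS_PER_CORE_HOUR = 9     # $0.09/core hour


def estimate_pod_cost(consult_hours, transfers, storage_gb, months, core_hours):
    """Return an estimated total in dollars for work run on the POD."""
    cents = (
        consult_hours * CONSULTING_CENTS_PER_HOUR
        + transfers * TRANSFER_CENTS_EACH
        + storage_gb * months * STORAGE_CENTS_PER_GB_MONTH
        + core_hours * COMPUTE_CENTS_PER_CORE_HOUR
    )
    return cents / 100


# Hypothetical project: 2 consulting hours, 3 transfers,
# 500 GB stored for 6 months, and 10,000 core hours.
total = estimate_pod_cost(2, 3, 500, 6, 10_000)
print(f"Estimated cost: ${total:,.2f}")
```

For this hypothetical project the components are $120 consulting, $60 transfers, $300 storage, and $900 computation.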

Current State and Prospects

What have we done so far?
1. Partnering on 37 research projects, managing 73 TB
2. IU infrastructure aligned with HIPAA
3. XSEDE Tier 2 Service Provider
4. Optimized Trinity with Broad Institute
5. LustreWAN node at U. Hawaii at Hilo
Next:
1. LustreWAN node at NCBI (July 2013)
2. More Science
3. More LustreWAN nodes at sequencing sites

NCGAS Partners Funding From:

Questions?
Bill Barnett
The NCGAS Team at IU: Rich LeDuc, Le-Shin Wu, Thomas Doak, Carrie Ganote

Acknowledgements and Disclaimers
This material is based upon work supported by the National Science Foundation under Grant No. ABI , Craig Stewart, PI. William Barnett, Matthew Hahn, and Michael Lynch, co-PIs. This work was supported in part by the Lilly Endowment, Inc. and the Indiana University Pervasive Technology Institute. Any opinions presented here are those of the presenter(s) and do not necessarily represent the opinions of the National Science Foundation or any other funding agencies.

License Terms
Please cite as: Barnett, William K, The National Center for Genome Analysis Support as a Model Virtual Resource for Biologists, presented at the Internet2 Network Infrastructure for the Life Sciences Focused Technical Workshop. Berkeley, CA. Items indicated with a © are under copyright and used here with permission. Such items may not be reused without permission from the holder of copyright except where license terms noted on a slide permit reuse. Except where otherwise noted, contents of this presentation are copyright 2011 by the Trustees of Indiana University. This document is released under the Creative Commons Attribution 3.0 Unported license. This license includes the following terms: You are free to share – to copy, distribute and transmit the work and to remix – to adapt the work under the following conditions: attribution – you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.

JIM WILLIAMS DIRECTOR, INTERNATIONAL NETWORKING

US DOMESTIC NETWORKING
- Very strong high-performance backbone (100G national footprint) both from ESnet and Internet2
- Performance to your institution and your lab may vary due to local considerations
- Internet2 and ESnet have network related scientist support functions and services

US AND INTERNATIONAL NETWORKING
- NSF funded IRNC program provides 10G connectivity to Europe, Asia and South/Latin America
- Within those regions national networks provide varying degrees of connectivity
- IRNC investigators anxious to assist researchers with their connectivity issues

FUTURES
- Domestic networking in many countries is 100G. International networking moving in that direction.
- ANA-100G supplies 100G TA connectivity between NYC and AMS.
- ESnet interested in extending the 100G ESnet network to Europe (EEX)

SUPPORT IS ALWAYS THE ISSUE
- To contact IRNC investigators:
- To contact Internet2:
- To contact ESnet: