Bio-IT World Asia, June 7, 2012 High Performance Data Management and Computational Architectures for Genomics Research at National and International Scales.

Presentation transcript:

Bio-IT World Asia, June 7, 2012
High Performance Data Management and Computational Architectures for Genomics Research at National and International Scales
William K. Barnett, Ph.D.
Richard LeDuc, Ph.D.
National Center for Genome Analysis Support

National Center for Genome Analysis Support: Summary
- Changing genomics analytical needs
- NCGAS and its mission
- NCGAS cyberinfrastructure
- The 100 Gigabit demonstration
- Scaling genomics analysis
- The NCGAS research model
- Outcomes for life sciences research

National Center for Genome Analysis Support: Changing genomics analytical needs
Next Gen sequencers are generating more data and getting cheaper.
Sequencing is:
- Becoming commoditized at large centers, and
- Multiplying at individual labs
Analytical capacity has not kept up:
- Bioinformatics support
- Computational support (thousand points solution)
- Storage support

National Center for Genome Analysis Support: NCGAS widening the analytical bottleneck
- Funded by National Science Foundation (grant # ABI )
- Large memory clusters for assembly
- Bioinformatics consulting for biologists
- Optimized software for better efficiency
Providing services at:

National Center for Genome Analysis Support: Making it easier for Biologists
- Galaxy interface provides a "user friendly" window to NCGAS resources
- Supports many bioinformatics tools
- Available for both research and instruction
[Figure: users from Common to Rare along a Computational Skills axis running from LOW to HIGH]

National Center for Genome Analysis Support: NCGAS Service Model
Providers: Public Cloud Providers, NCGAS
NCGAS by layer:
- Hardware Layer: Mason (512 GB/node)
- OS Layer: Systems Administration
- Services Layer: Galaxy, Parallelization
- Applications: Hardened Applications and Workflows
- Bioinformatics: Expert Consulting
- Network Layer: 100 Gbps I2
NEED APIs

National Center for Genome Analysis Support: NCGAS Galaxy Applications Model
- Virtual box hosting Galaxy.Indiana.edu: the host for each tool is configured to meet IU needs
- Virtual box hosting Galaxy.NCGAS.org: the host for each tool is configured to meet National needs
- Custom Site Hosting Galaxy.YourSite.???: the host for each tool is configured to meet Your needs
Resources shown: Quarry, Mason, Data Capacitor, RFS

National Center for Genome Analysis Support: NCGAS Workflow Demo at SC 11
STEP 1: data pre-processing, to evaluate and improve the quality of the input sequence
STEP 2: sequence alignment to a known reference genome
STEP 3: SNP detection to scan the alignment result for new polymorphisms
Sites: Bloomington, IN; Seattle, WA
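
The three steps of the workflow can be illustrated with a toy sketch. This is not the software used in the SC 11 demo, which the slide does not name; the function names (`preprocess`, `align`, `call_snps`), the quality threshold, and the sample reads are all hypothetical, and the alignment is a naive ungapped search chosen only to keep the example short.

```python
from collections import Counter

def preprocess(reads, min_quality=20):
    """STEP 1 (toy): keep reads whose mean base quality meets the threshold."""
    return [seq for seq, quals in reads if sum(quals) / len(quals) >= min_quality]

def align(read, reference):
    """STEP 2 (toy): return the reference offset with the fewest mismatches."""
    best_offset, best_mismatches = 0, len(read) + 1
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches < best_mismatches:
            best_offset, best_mismatches = offset, mismatches
    return best_offset

def call_snps(reads, reference, min_support=2):
    """STEP 3 (toy): report (position, ref_base, alt_base) for each
    non-reference base seen in at least min_support aligned reads."""
    support = Counter()
    for read in reads:
        offset = align(read, reference)
        for i, base in enumerate(read):
            if base != reference[offset + i]:
                support[(offset + i, base)] += 1
    return sorted(
        (pos, reference[pos], alt)
        for (pos, alt), count in support.items()
        if count >= min_support
    )

# Hypothetical input: a short reference and three reads with per-base qualities.
reference = "ACGTACGTTAGC"
reads = [
    ("ACGTACCT", [30] * 8),  # good quality, differs from the reference at position 6
    ("GTACCTTA", [30] * 8),  # good quality, overlaps the same position
    ("ACGTAAAA", [5] * 8),   # low quality, dropped in step 1
]

clean_reads = preprocess(reads)
snps = call_snps(clean_reads, reference)
print(snps)
```

Production pipelines use gapped, indexed aligners and statistical variant callers; the sketch only shows how the output of each step feeds the next.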

[Diagram: NCGAS Virtual Genomics Science Instrument. Labeled elements: Mason, IU POD, Data Capacitor, NCBI Reference Data, Lustre WAN File System, FTP, Large Sequencing Center, Smaller Sequencing Centers, International Collaborators via TransPAC and Geant; link rates 10 Gbps and 100 Gbps]

National Center for Genome Analysis Support: This Architecture Scales!
- Commodity Internet (1 Gbps but highly variable)
- Internet2 (100 Gbps)
- NLR to Sequencing Centers (10 Gbps/link)
- IU Data Capacitor (20 Gbps throughput)
- Ultra SCSI 160 Disk (1.2 Gbps, 160 MBps)
- DDR3 SDRAM (51.2 Gbps, 6.4 GBps)
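
To make the comparison concrete, the following sketch estimates how long a dataset would take to move at each quoted rate. The rates come from the slide; the 1 TB dataset size is a hypothetical figure chosen for illustration, and the calculation assumes the full nominal rate is sustained with no protocol overhead, which real transfers will not achieve. (Note that 160 MBps is 1.28 Gbps; the slide's 1.2 Gbps is used as given.)

```python
# Nominal rates in gigabits per second, as quoted on the slide.
LINK_RATES_GBPS = {
    "Commodity Internet": 1.0,
    "Ultra SCSI 160 Disk": 1.2,
    "NLR link to sequencing centers": 10.0,
    "IU Data Capacitor": 20.0,
    "DDR3 SDRAM": 51.2,
    "Internet2": 100.0,
}

def transfer_seconds(size_terabytes: float, rate_gbps: float) -> float:
    """Ideal transfer time: dataset size in bits divided by rate in bits/second."""
    size_bits = size_terabytes * 1e12 * 8  # decimal terabytes to bits
    return size_bits / (rate_gbps * 1e9)

DATASET_TB = 1.0  # hypothetical dataset size, not taken from the slides

for name, rate in sorted(LINK_RATES_GBPS.items(), key=lambda item: item[1]):
    minutes = transfer_seconds(DATASET_TB, rate) / 60
    print(f"{name:32s} {rate:6.1f} Gbps  {minutes:8.1f} min")
```

Under these idealized assumptions, the 100 Gbps network path is faster than both the 20 Gbps file system and the single disk listed on the slide, so the wide-area network would not be the limiting component.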

National Center for Genome Analysis Support: How would this work at scale?
1. Biologists anywhere use Galaxy
2. Sequence data transferred over Research Nets
3. Lustre WAN flows data into Data Capacitor
4. Data Capacitor mounts reference data
5. Results available on Data Capacitor for subsequent analyses (secure to HIPAA standards)

National Center for Genome Analysis Support: Outcomes for Life Sciences Research
- National and international networks have the capacity to handle genomics data.
- Distributed workflow tools lower the bar for biologists to accomplish genomic science.
- NCGAS is an extensible model of a scaled and integrated infrastructure for biological research.
- This model can extend internationally.

National Center for Genome Analysis Support: Thank You
Questions?
Bill Barnett
Rich LeDuc