Genomics, Transcriptomics, and Proteomics: Engaging Biologists Richard LeDuc Manager, NCGAS eScience, Chicago 10/8/2012.

Central Dogma of Molecular Biology
DNA (ATGGC ATAC C): DNA replicates itself
mRNA: DNA is transcribed to RNA
Protein: RNA is translated to protein

Central Dogma of Molecular Biology
DNA: Genomics
mRNA: Transcriptomics
Protein: Proteomics

Tools of the Trade
Instruments
Next-Generation Sequencers
- Illumina
- 454
- PacBio
Mass Spectrometers
- 5 kinds of mass analyzers
- Hybrid analyzers + separation technology
Techniques

Figure credits: Zhao et al., BMC Bioinformatics 2011, 12(Suppl 14):S2. Figure © Vincent Montoya / Wikipedia

Analysis as Data Reduction
Proteomics: Shotgun Bottom-up
- 3.4 GB of instrument data
- 172 MB (x1/20) of unstructured files (5,219 files in 67 folders)
- 13 MB of publishable results (x1/260)
- Improved technology increases the size of the instrument files, but not usually the intermediate or final file sizes.
DNA Sequencing
- Often on the order of x1/2500 from start to finish
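The fold reductions quoted for the proteomics example can be checked with a few lines of arithmetic. This is a minimal sketch, assuming decimal units (1 GB = 1000 MB); the stage names and the `reduction_factors` helper are illustrative and do not come from the talk.

```python
# Sizes quoted on the slide, in megabytes (assumes 1 GB = 1000 MB).
STAGES = [
    ("instrument data", 3400.0),
    ("intermediate files", 172.0),
    ("publishable results", 13.0),
]


def reduction_factors(stages):
    """Return (name, size_mb, fold reduction relative to the first stage)."""
    start = stages[0][1]
    return [(name, size, start / size) for name, size in stages]


for name, size, fold in reduction_factors(STAGES):
    # The slide rounds these factors to roughly x1/20 and x1/260.
    print(f"{name:>20}: {size:8.1f} MB  (about x1/{fold:.0f})")
```

The computed factors land close to the rounded values on the slide, which is the point: storage planning can be driven by the small downstream files rather than the raw instrument output.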

Options for Computational Support
Compute at the Instrument
Supercomputer in a box
- Many commercial vendors are entering with turn-key solutions to specific problems.
- Limited variety of analytic expertise.
Build Your Own Computational Center
- A rack or two, a few servers, and you are good to go.
- Only a subset of HPC skills is present in staff.
Computer Centers

Funded by National Science Foundation
1. Large memory clusters for assembly
2. Bioinformatics consulting for biologists
3. Optimized software for better efficiency
Open for business at:

Making it easier for Biologists
- Web interface to NCGAS resources
- Supports many bioinformatics tools
- Available for both research and instruction
[Figure: Computational Skills axis from LOW to HIGH, labeled Common to Rare]

GALAXY.NCGAS.ORG Model
- Virtual box hosting Galaxy.ncgas.org
- The host for each tool is configured individually
- Diagram labels: Quarry, Mason, Data Capacitor, Archive
- NCGAS establishes tools, hardens them, and moves them into production.
- Custom Galaxy tools can be made for moving data.
- Individual projects can get duplicate boxes, provided they support it themselves.
- Policies on the DC guarantee that untouched data is removed with time.

NCGAS Sandbox Demo at SC 11
STEP 1: data pre-processing, to evaluate and improve the quality of the input sequence
STEP 2: sequence alignment to a known reference genome
STEP 3: SNP detection to scan the alignment result for new polymorphisms
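The three steps can be illustrated with a toy pipeline. This is only a sketch of the general idea, not the software used in the demo: the quality trimming, ungapped alignment, and support-count SNP call below are deliberately naive stand-ins, and every function name, threshold, read, and the reference string are made up for illustration.

```python
from collections import Counter


def trim_low_quality(read, quals, min_q=20):
    """Step 1 (toy): trim trailing bases whose quality score is below min_q."""
    end = len(read)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return read[:end]


def align(read, reference, max_mismatches=1):
    """Step 2 (toy): ungapped alignment; return the best offset or None."""
    if not read:
        return None
    best = None
    for offset in range(len(reference) - len(read) + 1):
        mismatches = sum(1 for a, b in zip(read, reference[offset:]) if a != b)
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (offset, mismatches)
    return None if best is None else best[0]


def call_snps(placed_reads, reference, min_support=2):
    """Step 3 (toy): report (position, ref base, alt base) seen in >= min_support reads."""
    counts = Counter()
    for read, offset in placed_reads:
        for i, base in enumerate(read):
            if base != reference[offset + i]:
                counts[(offset + i, base)] += 1
    return sorted(
        (pos, reference[pos], alt)
        for (pos, alt), n in counts.items()
        if n >= min_support
    )


def run_pipeline(raw_reads, reference):
    """Chain the three steps: trim, align, then call SNPs."""
    placed = []
    for read, quals in raw_reads:
        trimmed = trim_low_quality(read, quals)
        offset = align(trimmed, reference)
        if offset is not None:
            placed.append((trimmed, offset))
    return call_snps(placed, reference)


# Hypothetical data: two reads carry an A->G change at reference position 7.
REFERENCE = "ATGGCATACCGTTAGC"
RAW_READS = [
    ("GCATGCCGAA", [30] * 8 + [5, 5]),  # last two bases are low quality
    ("ATGCCGTT", [30] * 8),
    ("CCGTTAGC", [30] * 8),
]
print(run_pipeline(RAW_READS, REFERENCE))  # [(7, 'A', 'G')]
```

Real workflows replace each function with a dedicated tool and operate on standard file formats; the sketch only shows how the output of each step feeds the next.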

Moving Forward
- Network links: 10 Gbps, 100 Gbps
- NCGAS Mason (free for NSF users)
- IU POD (12 cents per core hour)
- Data Capacitor: NO data storage charges
- Your Friendly Neighborhood Sequencing Center (shown three times)
- Other NCGAS XSEDE Resources…
- Lustre WAN File System
- Globus On-line and other tools
- Optimized Software
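The link speeds and the POD rate on this slide lend themselves to back-of-the-envelope estimates. The sketch below assumes ideal throughput (no protocol overhead or contention) and decimal units; the dataset size and the core and hour counts are hypothetical, and only the 10 Gbps, 100 Gbps, and 12 cents per core hour figures come from the slide.

```python
def transfer_seconds(size_gb, link_gbps):
    """Ideal time in seconds to move size_gb gigabytes over a link_gbps link."""
    gigabits = size_gb * 8  # 8 bits per byte
    return gigabits / link_gbps


def pod_cost_dollars(cores, hours, rate_per_core_hour=0.12):
    """Cost of a job at the quoted rate of 12 cents per core hour."""
    return cores * hours * rate_per_core_hour


DATASET_GB = 1000  # hypothetical 1 TB sequencing run
for gbps in (10, 100):
    seconds = transfer_seconds(DATASET_GB, gbps)
    print(f"{DATASET_GB} GB over {gbps} Gbps: {seconds:.0f} s")

# Hypothetical job: 32 cores for 10 hours.
print(f"32 cores for 10 hours: ${pod_cost_dollars(32, 10):.2f}")
```

Measured transfer rates are normally lower than the nominal link speed, so these numbers are best read as lower bounds on transfer time.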

How would this work at scale?
1. Biologists use Galaxy and other web portals to move data and execute workflows
2. Instrument data transferred across Internet2
3. Data Capacitor flows data into Mason or other computational clusters
4. Data reduction allows “compute in place” to work
5. Data Capacitor mounts or mirrors reference data from NCBI or other sources

In Sum…
- Modern molecular biology, specifically the omics such as genomics, transcriptomics, and proteomics, provides many tools for answering many questions, but no single solution meets all needs.
- The amount of data generated decreases along a workflow. This has implications for both storage and analysis.
- NCGAS can provide a national-scale infrastructure to better serve the needs of biologists who cannot become bioinformaticians to accomplish their research.
- Increasingly specialized skills are needed to provide best-practice solutions at all steps in a workflow.

Thank You
Questions?
Bill Barnett, Rich LeDuc, Le-Shin Wu, Carrie Ganote

NCGAS Cyberinfrastructure at IU
- Mason large memory cluster (512 GB/node)
- Quarry cluster (16 GB/node)
- Data Capacitor (1 PB at 20 Gbps throughput)
- Research File System (RFS) for data storage
- Research Database Cluster for managing data sets
- All interconnected with a high speed internal network (40 Gbps)

Acknowledgements & disclaimer
- This material is based upon work supported by the National Science Foundation under Grants No. ABI
- This work was supported in part by the Lilly Endowment, Inc. and the Indiana University Pervasive Technology Institute.
- Any opinions presented here are those of the presenter(s) and do not necessarily represent the opinions of the National Science Foundation or any other funding agencies.

License terms
Please cite as: LeDuc, R.D., Genomics, Transcriptomics, and Proteomics: Engaging Biologists, presented at Extending High-Performance Computing Beyond its Traditional User Communities, Co-located with the 8th IEEE International Conference on eScience, Chicago, USA, October 8, Available from:
Items indicated with a © are under copyright and used here with permission. Such items may not be reused without permission from the holder of copyright except where license terms noted on a slide permit reuse.
Except where otherwise noted, contents of this presentation are copyright 2011 by the Trustees of Indiana University.
This document is released under the Creative Commons Attribution 3.0 Unported license ( This license includes the following terms: You are free to share – to copy, distribute and transmit the work and to remix – to adapt the work under the following conditions: attribution – you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.