National Center for Genome Analysis Support Leverages XSEDE Resources to Support Life Scientists
William K. Barnett, Ph.D. (Director)
Richard LeDuc, Ph.D. (Manager)
National Center for Genome Analysis Support
XSEDE 2013, San Diego CA, 7/23/2013

Summary: What is NCGAS? What do we do? How do we do it? I will assume you know more about HPC than biology. National Center for Genome Analysis Support: http://ncgas.org

Funded by the National Science Foundation
• Large-memory clusters for assembly
• Bioinformatics consulting for biologists
• Optimized software for better efficiency
• Collaboration across IU, TACC, SDSC, and PSC
Open for business at: http://ncgas.org

Making it easier for Biologists
[Chart: biologists' computational skills, from low (common) to high (rare)]
• Web interface to NCGAS resources
• Supports many bioinformatics tools
• Available for both research and instruction

We provide:
• Large-RAM computational resources
• Appropriate storage
• Data transport assistance
• IT (help-desk-like) support
• Bioinformatics consultation and support

The services announced today include:
• Storage of up to 50 terabytes of research data on IU's Scientific Data Archive tape storage system
• Services for curation and long-term storage of data sets and final results from genome research in IUScholarWorks…
• NCGAS will write letters of commitment for consulting, computation, and data storage resources to include with grant proposals…

Service Continuum Focus

Staffing: 3.7 FTE direct staff
• I'm a biologist with 15 years of software engineering experience
• A Ph.D. computer scientist
• A full-time bioinformatics analyst
• 50% of a Ph.D. genomicist
• 20% of the people above me
But we have direct access to the rest of our partner supercomputing centers.

NCGAS Cyberinfrastructure at IU
• Rockhopper: 11 servers, each with 48 cores and 128 GB RAM
• Mason large-memory cluster: 16 nodes, each with 32 cores and 512 GB RAM per node
• Data Capacitor: 1 PB at 20 Gbps throughput
• SDA: 17(+) PB hierarchical tape archive
• Additional resources through our XSEDE partners via an XRAC allocation
National Center for Genome Analysis Support: http://ncgas.org
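For concreteness, large-memory clusters like Mason are driven through a batch scheduler. The sketch below is hypothetical, not NCGAS documentation: it assumes a TORQUE/PBS-style system, and the queue directives, walltime, and driver script name are all assumptions.

```python
import subprocess
import textwrap

# Hypothetical TORQUE/PBS-style job script requesting one full 32-core,
# 512 GB node; common PBS syntax, not verified NCGAS settings.
script = textwrap.dedent("""\
    #!/bin/bash
    #PBS -N assembly_job
    #PBS -l nodes=1:ppn=32
    #PBS -l walltime=48:00:00
    cd "$PBS_O_WORKDIR"
    ./run_assembly.sh   # hypothetical driver script for the analysis
""")

with open("assembly.pbs", "w") as fh:
    fh.write(script)

# qsub is the standard TORQUE/PBS submission command.
subprocess.run(["qsub", "assembly.pbs"], check=True)
```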

Rockhopper is a Penguin Computing Penguin-On-Demand (POD) supercomputing cloud appliance hosted by Indiana University, a collaborative effort among Penguin Computing, IU, the University of Virginia, the University of California Berkeley, and the University of Michigan. It provides supercomputing cloud services in a secure US facility. Researchers at US institutions of higher education and Federally Funded Research and Development Centers (FFRDCs) can purchase computing time from Penguin Computing and receive access via high-speed national research networks operated by IU. National Center for Genome Analysis Support: http://ncgas.org

Standardized Trinity Analyses National Center for Genome Analysis Support: http://ncgas.org
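To make "standardized" concrete: Trinity assemblies are launched from a fixed command-line recipe. The sketch below is an illustration only, not NCGAS's actual pipeline; the flags shown exist in current Trinity releases (versions from the era of this talk used --JM for the memory cap rather than --max_memory), and the read paths are hypothetical.

```python
import subprocess

# Illustrative Trinity invocation for one paired-end RNA-Seq library.
# Paths are hypothetical; flag spellings vary across Trinity versions.
cmd = [
    "Trinity",
    "--seqType", "fq",            # FASTQ input
    "--left", "reads_1.fq",       # left mates (hypothetical path)
    "--right", "reads_2.fq",      # right mates (hypothetical path)
    "--CPU", "16",                # worker threads
    "--max_memory", "200G",       # memory cap (older releases: --JM)
    "--output", "trinity_out",    # assembly output directory
]
subprocess.run(cmd, check=True)   # raises if the assembly exits non-zero
```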

Who do we serve?

Zhao et al. (2011) BMC Bioinformatics 12(Suppl 14):S2. http://www.biomedcentral.com/1471-2105/12/S14/S2
Haas, B., Papanicolaou, A., Yassour, M., Grabherr, M., Blood, P., Bowden, J., Couger, M., Eccles, D., Li, B., Lieber, M., MacManes, M., Ott, M., Orvis, J., Pochet, N., Strozzi, F., Weeks, N., Westerman, R., William, T., Dewey, C., Henschel, R., LeDuc, R., Friedman, N., and Regev, A. (2013) De novo transcript sequence reconstruction from RNA-Seq using the Trinity platform for reference generation and analysis. Nature Protocols, in press.

The Project: A project represents a single NSF grant, or the equivalent for server-on-demand services. Projects are the inherent, organic organizational structure of biology: researchers know what projects they are on, who else is on the project, what the project is trying to accomplish, and so on. Projects are frequently widely distributed.

Projects “SUGAR”: Schaack lab Undergraduate Genome Analyses at Reed

Projects as of Early July 2013

How do we support Projects?

Support varies by project need. Most projects just need access to the resources and some technical support. Other projects require more staff interaction, up to and including intellectual contributions to the project. Several projects have utilized “private” Galaxy instances.

GALAXY.NCGAS.ORG Model
[Diagram: a virtual box hosting galaxy.ncgas.org in front of the Quarry and Mason clusters, with the Data Capacitor and Archive behind them]
• NCGAS establishes tools, hardens them, and moves them into production
• The host for each tool is configured individually
• Individual projects can get duplicate VMs
• Each project can get 50 TB of archive space for raw data; policies guarantee that untouched data is removed over time
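The slides show Galaxy as a web front end, but a project's private instance can also be driven programmatically. As a hedged illustration (not something the talk describes), the sketch below uses BioBlend, a Python client for Galaxy's REST API; the URL, API key, and file path are hypothetical placeholders.

```python
from bioblend.galaxy import GalaxyInstance

# Hypothetical URL and API key for a project's private Galaxy instance.
gi = GalaxyInstance(url="https://galaxy.ncgas.org", key="YOUR_API_KEY")

# Create a working history and stage a read file into it.
history = gi.histories.create_history(name="rna-seq-assembly")
gi.tools.upload_file("reads_1.fq", history["id"])   # hypothetical local path

# List installed tools matching the name "Trinity".
for tool in gi.tools.get_tools(name="Trinity"):
    print(tool["id"], tool["name"])
```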

Current System
[Diagram: sequencing centers connect over 10 and 100 Gbps links, via the Lustre WAN file system and Globus Online and other tools, to the Data Capacitor, NCGAS Mason (free for NSF users), the IU POD (12 cents per core-hour, no data storage charges), and other NCGAS XSEDE resources]
How this works at scale:
• Biologists use Galaxy to execute workflows
• Sequence data is mounted via Lustre WAN or automatically transferred using Internet2
• The Data Capacitor flows data into Mason or other computational clusters
• The Data Capacitor mounts or mirrors reference data from NCBI and other sources
• Results are delivered through web interfaces and to visualization or other science tools
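As a concrete, hedged illustration of the Globus Online piece: today's Globus Python SDK, which postdates this 2013 talk, submits the same kind of managed transfer shown in the diagram. The endpoint UUIDs, paths, and token below are hypothetical placeholders.

```python
import globus_sdk

# Hypothetical OAuth2 transfer token; in practice it comes from the Globus
# SDK's login flow (see globus_sdk.NativeAppAuthClient).
tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer("TRANSFER_TOKEN")
)

# Hypothetical endpoint UUIDs: a sequencing center and the IU storage system.
tdata = globus_sdk.TransferData(
    tc, "SRC-ENDPOINT-UUID", "DST-ENDPOINT-UUID",
    label="reads to Data Capacitor",
)
tdata.add_item("/seq/run42/reads_1.fq", "/dc/project/reads_1.fq")

task = tc.submit_transfer(tdata)   # Globus manages retries and integrity checks
print("transfer task:", task["task_id"])
```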

Future Direction

NCGAS gives back to XSEDE. NCGAS is a Tier 2 XSEDE Partner. XSEDE allocations are available on Mason (our large-memory cluster): 300,000 SUs of 0.5 TB RAM nodes with NCGAS support software.
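For scale, assuming the common convention that 1 SU = 1 core-hour (allocation accounting varies by resource): 300,000 SUs on Mason's 32-core nodes is 300,000 / 32 ≈ 9,375 whole-node-hours, roughly 390 node-days of 0.5 TB RAM capacity.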

Thank You! Questions? Bill Barnett, Rich LeDuc, Le-Shin Wu, Carrie Ganote, Tom Doak