High Performance Computing for University Medical Research: A Successful Implementation

Presentation transcript:

High Performance Computing for University Medical Research: A Successful Implementation. Dr. Craig A. Stewart, Ph.D., Director, Research and Academic Computing, University Information Technology Services; Director, Information Technology Core, Indiana Genomics Initiative. Dr. Richard Repasky, Ph.D., Bioinformatics Specialist.

License Terms: Please cite this presentation as: Stewart, C.A. and R. Repasky. High Performance Computing for University Medical Research: A Successful Implementation. Presentation. Presented at: Bio-IT World Conference & Expo (Boston, MA, Apr 2007). Available from: Portions of this document that originated from sources outside IU are shown here and used by permission or under licenses indicated within this document. Items indicated with a © are under copyright and used here with permission. Such items may not be reused without permission from the holder of copyright except where license terms noted on a slide permit reuse. Except where otherwise noted, the contents of this presentation are copyright 2007 by the Trustees of Indiana University. This content is released under the Creative Commons Attribution 3.0 Unported license. This license includes the following terms: You are free to share (to copy, distribute and transmit the work) and to remix (to adapt the work) under the following conditions: attribution: you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.

Bioinformatics and Biomedical Research: Bioinformatics, genomics, proteomics, and ____ics all promise to radically change our understanding of biological function and the way biomedical research is done. Traditional biomedical researchers must take advantage of new possibilities. “Post-genomic” research must take advantage of the tremendous store of detailed knowledge held by traditional biomedical researchers.

Anopheles gambiae. Source library: Centers for Disease Control PHIL. Photo credit: Jim Gathany.

IU’s goals for the Indiana Genomics Initiative (INGEN): Build on traditional strengths of the IU School of Medicine. Build on IU's strength in information technology. Add new programs of research made possible by the sequencing of the human genome. Perform the research that will generate new treatments for human disease in the post-genomic era. Improve human health generally and in the State of Indiana particularly. Enhance economic growth in Indiana. INGEN was created by a $105M grant from the Lilly Endowment, Inc. and launched in December 2000. The goal of this talk is to explain how advanced information technology was implemented to help meet these goals.

Outline: Background information about IU. The Indiana Genomics Initiative (INGEN). The INGEN Information Technology Core: facilities, service, some key projects. Status and summary of success factors. Acknowledgements.

IU in a nutshell: $2B annual budget. 8 campuses, 90,000 students, 3,900 faculty. 878 degree programs; > 100 programs ranked within the top 20 of their type nationally. Nation’s second largest school of medicine: 1,347 M.D., Ph.D., and M.D./Ph.D. students. Sole school of medicine in Indiana. Traditional strengths in human genetic diseases (e.g., alcoholism, Huntington's disease) and medical records (Regenstrief Institute).

IU in a nutshell: CIO: Vice President Michael A. McRobbie. ~$100M annual budget. Technology services offered university-wide. Networking: IU operates the Network Operations Center for Abilene. High performance computing: first university in the US to own a 1 TFLOPS supercomputer; the Top 500 list has for the past several years included at least one IU supercomputer.

INGEN Structure. Programs: Bioethics, Genomics, Medical Informatics, Education, Training. Cores: Tech Transfer, Gene Expression, Cell & Protein Expression, Human Expression, Information Technology, Proteomics, Integrated Imaging, In vivo Imaging, Animal.

[Diagram: Indiana Genomics Initiative programs and cores. Labels: Education, Training, Bioinformatics, Medical Informatics, Genomics, Bioethics, Proteomics, Integrated Microscopy, Cell and Protein Expression, Technology Transfer, Information Technology, Drosophila, Genotyping and Gene Expression, Human Expression, In Vivo Imaging, Animal.]

Information Technology Core. Foci: high performance computing; visualization (especially 3D); massive data storage; support for use of all of the above. $6.7M budget for the IT Core. Baseline IT services for the School of Medicine are the responsibility of the School of Medicine CIO.

Challenges for UITS and the INGEN IT Core: Assist traditional biomedical researchers in adopting advanced information technology (massive data storage, visualization, and high performance computing). Assist bioinformatics researchers in the use of advanced computing facilities. Questions we are asked: Why wouldn't it be better just to buy me a newer PC? Questions we ask: What do you do now with computers that you would like to do faster? What would you do if computer resources were not a constraint?

Steps in meeting the challenge: Use INGEN funding to enhance IU’s high performance computing hardware environment. Use INGEN funding to add dedicated staff supporting INGEN researchers. Run proof-of-concept projects showing the advanced capabilities of IU’s IT environment. Conduct outreach to get many people using at least the basic capabilities of IU’s advanced IT environment.

Hardware Environment: I-Light network. High performance computing: IBM SP (TFLOPS class), Sun E10000 (GFLOPS class), and a large, distributed Linux cluster (1.1 TFLOPS). Massive data storage system. Advanced visualization systems: CAVE, John-E-Box.

IBM Research SP (Aries/Orion Complex): Acquired 9/96; expanded in 1998, 1999, 2000, 2001, and 2002 with the help of IU IT Strategic Plan funds, IBM SUR grants, and the INGEN grant from Lilly Endowment, Inc. Geographically distributed at IUB and IUPUI. 632 CPUs, TeraFLOPS. First university-owned supercomputer in the US to exceed 1 TFLOPS processing capacity. Initially 50th, now 112th in the Top 500 supercomputer list. Distributed memory system with shared memory nodes. AIX 5.1; wealth of software including SAS, SPSS, S-Plus, Mathematica, Matlab, Maple, Gaussian, GIS, scientific/numerical libraries, Oracle and DB2, and more.

IBM Research SP (Aries/Orion) ©2000 Tyagan Miller

Sun E10000 (Solar): Acquired 4/00. Shared memory architecture. ~52 GFLOPS; MHz CPUs, 64 GB memory; > 2 TB external disk. Solaris 2.8. Supports some bioinformatics software not available under AIX (e.g., GCG/SeqWeb).

Sun E10000 (Solar) ©2000 Tyagan Miller

Distributed Linux Cluster: AVIDD (Analysis and Visualization of Instrument-Driven Data). 1.1 TFLOPS, 0.5 TB RAM, 10 TB disk. Tuned, configured, and optimized for handling real-time data streams.

Massive Data Storage System: Based on HPSS (High Performance Storage System). 180 TB capacity with existing tapes; total capacity of 480 TB. First distributed HPSS installation, with STK 9310 silos in Bloomington and Indianapolis. Data is replicated automatically between Indianapolis and Bloomington, via I-Light, overnight. This is critical for biomedical data, which is often irreplaceable.
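The value of overnight replication rests on both sites actually holding identical copies. The following is a minimal sketch of how such a check could look, assuming each site's data is visible as an ordinary directory tree. It is not how HPSS performs or verifies replication; the function names `sha256_of` and `unreplicated` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unreplicated(primary, replica):
    """List files under `primary` that are missing from, or differ at, `replica`.

    Both arguments are directory paths (an assumption of this sketch).
    Paths in the result are relative to `primary`, in POSIX form.
    """
    primary, replica = Path(primary), Path(replica)
    problems = []
    for src in sorted(p for p in primary.rglob("*") if p.is_file()):
        rel = src.relative_to(primary)
        dst = replica / rel
        # A file counts as replicated only if it exists and has the same digest.
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            problems.append(rel.as_posix())
    return problems
```

An empty list from `unreplicated(site_a, site_b)` would indicate that every file at the first site has a byte-identical copy at the second. The check is one-directional; files that exist only at the replica are not reported.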

STK Silo ©2000 Tyagan Miller

Advanced Visualization: Advanced Visualization Lab, recognized as a leader in implementation of 3D and other advanced visualization technologies. CAVE: immersive 3D environment. John-E-Box: IU-designed, low-cost passive 3D device. Under construction now, planned for installation in multiple INGEN-affiliated labs.

John-E-Box Invented by John N. Huffman, John C. Huffman, and Eric Wernert

Specific benefits in hardware environment as a result of INGEN funding: Funded a significant fraction of the upgrade of IU’s IBM SP to 1 TFLOPS. Funded addition of the STK silo in Indianapolis (and tapes) to provide redundant storage of data. Funded placement of visualization equipment within the School of Medicine.

So, what now that we have all of this hardware? Strategic relationships with vendors. University Information Technology Services has a history of excellent customer support and long-term, collaborative research. Focus on provision of facilities and services as a competitive advantage. Annual customer satisfaction survey: user satisfaction typically > 95%. These results probably not representative of SoM as of More information available at It’s people (consulting staff) that make the hardware useful for researchers.

INGEN IT Core Support Staff: Visualization programmer, HPC programmer, and bioinformatics database specialist hired to support INGEN. Staff were added to existing management units within UITS, which gives economy of scale (management, exchange of expertise) and assures addition to, rather than substitution for, base-funded consulting support.

So, why is this better than just buying me a new PC? Unique facilities provided by the IT Core: redundant data storage; HPC (better uniprocessor performance, trivially parallel programming, parallel programming); visualization in the research laboratories. Hardcopy document: "INGEN's advanced IT facilities: The least you need to know". Outreach efforts. Demonstration projects.

Example projects: Multiple simultaneous Matlab jobs for brain imaging. Installation of many commercial and open source bioinformatics applications. Site licenses for several commercial packages. Evaluation of several software products that were not implemented.
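Running many independent analysis jobs at once is the "trivially parallel" case: each job needs no communication with the others, so the work simply spreads across processors. The sketch below illustrates the pattern in Python rather than Matlab. The function `analyze_scan` is a hypothetical stand-in for one job, and the synthetic numbers it computes have no scientific meaning.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def analyze_scan(scan_id):
    """Hypothetical stand-in for one independent analysis job.

    Returns (scan_id, score), where score is the mean of a synthetic signal.
    """
    values = [(scan_id * 31 + k) % 17 for k in range(1000)]
    return scan_id, sum(values) / len(values)


def run_all(scan_ids, workers=4, executor_cls=ProcessPoolExecutor):
    """Run every job independently and collect results as {scan_id: score}.

    Jobs share no state, so they can be handed to any pool of workers.
    `executor_cls` is a parameter of this sketch so the same code can use
    processes (the default) or threads.
    """
    with executor_cls(max_workers=workers) as pool:
        return dict(pool.map(analyze_scan, scan_ids))
```

Calling `run_all(range(8))` would process eight jobs across four worker processes. Passing `executor_cls=ThreadPoolExecutor` uses threads instead, which is convenient for quick checks but gives little speedup for CPU-bound Python code.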

Creation of new software: Gamma Knife (Penelope): modified existing version for more precise targeting with IU's Gamma Knife. Karyote (TM) cell model: developed a portion of the code used to model cell function. PiVNs: software to visualize human family trees. 3-DIVE (3D Interactive Volume Explorer). fastDNAml: maximum likelihood phylogenies. Protein Family Annotator: collaborative development with IBM, Inc.

Data Integration: Goal set by the IU School of Medicine: any researcher within the IU School of Medicine should be able to transparently query all relevant public external data sources and all sources internal to the IU School of Medicine to which the researcher has read privileges. IU has more than 1 TB of biomedical data stored in the massive data storage system. There are many public data sources. Different labs were independently downloading, subsetting, and formatting data. Solution: IBM DiscoveryLink, DB2 Information Integrator.

Centralized Life Science Database (CLSD): Based on use of IBM DiscoveryLink (TM) and DB2 Information Integrator (TM). Public data is still downloaded, parsed, and put into a database, but now the process is automated and centralized. Lab data and programs like BLAST are included via DiscoveryLink’s wrappers. Implemented in partnership with IBM Life Sciences via the IU-IBM strategic relationship in the life sciences. IU contributed the writing of data parsers.
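The parse-and-load step described above can be sketched in a few lines. This is an illustration only: it assumes the public data arrives as FASTA-formatted text, and it uses SQLite from the Python standard library in place of DB2 and DiscoveryLink. The table layout and the names `parse_fasta` and `load_records` are hypothetical, not taken from the IU parsers.

```python
import sqlite3


def parse_fasta(text):
    """Yield (identifier, sequence) pairs from FASTA-formatted text.

    The identifier is the first whitespace-delimited token after '>'.
    """
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            parts = line[1:].split()
            header, chunks = (parts[0] if parts else ""), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def load_records(conn, source, records):
    """Insert parsed records into one shared table, keyed by source and id.

    Re-running the load for the same source replaces earlier rows, so a
    scheduled job can refresh the central copy without creating duplicates.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequences ("
        "source TEXT, seq_id TEXT, sequence TEXT, "
        "PRIMARY KEY (source, seq_id))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO sequences VALUES (?, ?, ?)",
        ((source, seq_id, seq) for seq_id, seq in records),
    )
    conn.commit()
```

With one parser per public source feeding the same table, each lab queries a single database instead of maintaining its own downloaded and reformatted copies.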

Status Overall: So far, so good. 108 users of IU’s supercomputers. 104 users of the massive data storage system. Six new software packages created or enhanced; more than 20 packages installed for use by INGEN-affiliated researchers. 1 TB of biomedical data stored in the massive data storage system. Three software packages made available as open source software as a direct result of INGEN. The INGEN IT Core is providing services valued by traditionally trained biomedical researchers as well as researchers in bioinformatics, genomics, proteomics, etc.

Success in meeting goals? Work on the Penelope code for Gamma Knife is likely to be the first major transferable technology development; it stands to improve the efficacy of Gamma Knife treatment at IU. Excellent success in supporting basic research. Development of open source software (licensed under terms similar to the GNU Lesser General Public License) provides opportunities for technology transfer. Participation in grants and industrial partnerships provides economic benefit for IU.

Success factors: Creation of a new position, Chief Information Officer and Associate Dean, within the IU School of Medicine, and significant improvement in basic IT infrastructure within the IU School of Medicine. INGEN has permitted IU to build on excellent IT infrastructure. Dedicated (but not isolated) staff supporting INGEN researchers. Commitment to customer service. Outreach (in the proper formats).

Success factors, cont'd: Scientific collaborations. Strategy research on behalf of the IU School of Medicine. Accountability. Leveraging of industrial partnerships.

Funding Support: This research was supported in part by the Indiana Genomics Initiative (INGEN). The Indiana Genomics Initiative (INGEN) of Indiana University is supported in part by Lilly Endowment Inc. Joint Study Agreement with IBM, Inc., Protein Family Annotator: School of Informatics (M. Dalkilic), Center for Genomics and Bioinformatics (P. Cherbas), Univ. Information Technology Services & INGEN IT Core (C. Stewart). This work was supported in part by Shared University Research grants from IBM, Inc. to Indiana University. This material is based upon work supported by the National Science Foundation under Grant No and Grant No. CDA Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).

Additional Information Further information is available at ingen.iu.edu

Acknowledgements (People): UITS Research and Academic Computing Division managers: Mary Papakhian, David Hart, Stephen Simms, Richard Repasky, Matt Link, John Samuel, Eric Wernert, Anurag Shankar. INGEN staff: Andy Arenson, Chris Garrison, Huian Li, Jagan Lakshmipathy, David Hancock. UITS senior management: Associate Vice President and Dean Christopher Peebles, RAC(Data) Director Gerry Bernbom. Assistance with this presentation: John Herrin, Malinda Lingwall.