
Get the convenience of cloud while keeping your rights – through the IU / Penguin Computing partnership

Craig A. Stewart – Executive Director, Pervasive Technology Institute; Associate Dean, Research Technologies
Matthew Jacobs – Senior Vice President, Corporate Development, Penguin Computing
Barbara Hallock – Senior Systems Analyst, Campus Bridging and Research Infrastructure
Richard Knepper – Manager, Campus Bridging and Research Infrastructure
William K. Barnett – Director, National Center for Genome Analysis Support; Director, Science Community Tools; Associate Director, Center for Applied Cybersecurity Research

Based on: Welch, V.; Sheppard, R.; Lingwall, M.J.; Stewart, C.A. Current structure and past history of US cyberinfrastructure (data set and figures). hdl.handle.net/2022/13136

Adequacy of research CI

Stewart, C.A., D.S. Katz, D.L. Hart, D. Lantrip, D.S. McCaulay and R.L. Moore. Technical Report: Survey of cyberinfrastructure needs and interests of NSF-funded principal investigators. hdl.handle.net/2022/9917

Responses to a question asking whether researchers had sufficient access to cyberinfrastructure resources. The survey was sent to 5,000 researchers selected randomly from the 34,623 researchers funded by NSF as principal investigators; results are based on 1,028 responses.

Clouds look serene enough (photo used under creativecommons.org/licenses/by/2.0)

Cloud computing – the NIST definition

Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics (on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service); three service models (Software as a Service (SaaS), Platform as a Service (PaaS), Infrastructure as a Service (IaaS)); and four deployment models (private cloud, community cloud, public cloud, hybrid cloud).

Key enabling technologies include:
– Fast wide-area networks
– Powerful, inexpensive server computers
– High-performance virtualization for commodity hardware

But is cloud computing all the pundits claim?
– Where are your data?
– What laws prevail over the physical location of your data?
– What license did you agree to? Did you read the license terms? "When you upload or otherwise submit content to our Services, you give Google (and those we work with) a worldwide license to use, host, store, reproduce, modify, create derivative works … communicate, publish, publicly perform, publicly display and distribute such content."
– What is the security (electronic / physical) around your data?
– And how exactly do you get to that cloud, or get things out of it?
– How secure is your provider financially? (The fact that something seems unimaginable, like a cloud provider going out of business abruptly, does not mean it is impossible!)
– If you care about parallel performance, is a cloud provider the right solution?

Above-campus services – not exactly clouds

"We are seeing the early emergence of a meta-university – a transcendent, accessible, empowering, dynamic, communally constructed framework of open materials and platforms on which much of higher education worldwide can be constructed or enhanced." – Charles Vest, president emeritus of MIT, 2006

Goal: achieve economy of scale and retain a reasonable measure of control.

See: Brad Wheeler and Shelton Waggener. Above-Campus Services: Shaping the Promise of Cloud Computing for Higher Education. EDUCAUSE Review, vol. 44, no. 6 (November/December 2009).

Penguin Computing and IU partner for "Cluster as a Service"

Just what it says: Cluster as a Service. The cluster is physically located on IU's campus, in IU's Data Center, and is available to anyone at a .edu institution or an FFRDC (Federally Funded Research and Development Center). To use it:
– Go to podiu.penguincomputing.com
– Fill out the registration form
– Verify via your .edu email address
– Get out your credit card
– Go computing

This builds on Penguin's experience – Penguin currently hosts Life Technologies' BioScope and LifeScope in the cloud (lifescopecloud.com). A sketch of the post-registration workflow follows below.
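As a rough illustration of what "go computing" looks like once you are registered, here is a minimal sketch. The login hostname is hypothetical (use whatever the registration process gives you); the queue-inspection commands are standard SGE, the scheduler listed in the Rockhopper specifications below.

    # Log in to a POD IU head node with your registered ssh key
    # (hostname is illustrative, not a documented address)
    ssh username@pod.iu.example.org

    # List SGE queues and their current load
    qstat -g c

    # Show your own pending and running jobs
    qstat -u "$USER"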

We know where the data are … and they are secure

POD IU (Rockhopper) specifications

Server information
– Architecture: Penguin Computing Altus 1804
– TFLOPS: 4.4
– Clock speed: 2.1 GHz
– Nodes: 11 compute; 2 login; 4 management; 3 servers
– CPUs: 4 x 2.1 GHz 12-core AMD Opteron 6172 processors per compute node
– Memory type: distributed and shared
– Total memory: 1,408 GB
– Memory per node: 128 GB 1333 MHz DDR3 ECC
– Local scratch storage: 6 TB locally attached SATA2
– Cluster scratch: 100 TB Lustre

Further details
– OS: CentOS 5
– Network: QDR (40 Gb/s) InfiniBand; 1 Gb/s Ethernet
– Job management software: SGE
– Job scheduling software: SGE
– Job scheduling policy: fair share
– Access: key-based ssh login to head nodes; remote job control via Penguin's PODShell

A minimal SGE job script for this configuration is sketched below.
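Given the SGE scheduler and the 48-core compute nodes above, a batch job might look like the following sketch. The parallel environment name (openmpi) and the module name are assumptions – these names are site-specific and not documented in this presentation – but the #$ directives are standard SGE.

    #!/bin/bash
    #$ -N example_mpi_job      # job name
    #$ -cwd                    # run in the submission directory
    #$ -pe openmpi 48          # request 48 slots (one full node); PE name assumed
    #$ -l h_rt=01:00:00        # one hour of wall-clock time
    #$ -j y                    # merge stdout and stderr

    # Load an MPI stack (assumes an environment-modules setup)
    module load openmpi

    # SGE sets $NSLOTS to the number of slots actually granted
    mpirun -np $NSLOTS ./my_mpi_program

You would submit this with qsub example_mpi_job.sh and monitor it with qstat; PODShell provides equivalent remote submission without an interactive login.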

Applications on POD IU (Rockhopper)

– COAMPS: Coupled Ocean/Atmosphere Mesoscale Prediction System
– Desmond: a software package developed at D. E. Shaw Research to perform high-speed molecular dynamics simulations of biological systems on conventional commodity clusters
– GAMESS: a program for ab initio molecular quantum chemistry
– Galaxy: an open, web-based platform for data-intensive biomedical research
– GROMACS: a versatile package to perform molecular dynamics, i.e., simulate the Newtonian equations of motion for systems with hundreds to millions of particles
– HMMER: used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments
– Intel: compilers and libraries
– LAMMPS: a classical molecular dynamics code; an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator
– MM5: the PSU/NCAR mesoscale model, a limited-area, nonhydrostatic, terrain-following sigma-coordinate model designed to simulate or predict mesoscale atmospheric circulation. The model is supported by several pre- and post-processing programs, referred to collectively as the MM5 modeling system
– mpiBLAST: a freely available, open-source, parallel implementation of NCBI BLAST
– NAMD: a parallel molecular dynamics code for large biomolecular systems

Applications on POD IU (Rockhopper), continued

– NCBI BLAST: the Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches
– OpenAtom: a highly scalable and portable parallel application for molecular dynamics simulations at the quantum level. It implements the Car-Parrinello ab initio molecular dynamics (CPAIMD) method
– OpenFOAM: the OpenFOAM (Open Field Operation and Manipulation) CFD Toolbox is a free, open-source CFD software package produced by OpenCFD Ltd. It has a large user base across most areas of engineering and science, from both commercial and academic organisations, with an extensive range of features to solve anything from complex fluid flows involving chemical reactions, turbulence and heat transfer, to solid dynamics and electromagnetics
– OpenMPI: InfiniBand-based Message Passing Interface 2 (MPI-2) implementation
– POP: an ocean circulation model derived from earlier models of Bryan, Cox, Semtner, and Chervin, in which depth is used as the vertical coordinate. The model solves the three-dimensional primitive equations for fluid motions on the sphere under hydrostatic and Boussinesq approximations
– Portland Group: compilers
– R: a language and environment for statistical computing and graphics
– WRF: the Weather Research and Forecasting model, a next-generation mesoscale numerical weather prediction system designed to serve both operational forecasting and atmospheric research needs. It features multiple dynamical cores, a three-dimensional variational (3DVAR) data assimilation system, and a software architecture allowing for computational parallelism and system extensibility
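To make the list concrete, here is a hedged sketch of running one of these packages (NCBI BLAST) as an SGE batch job. The database name, module name, and parallel environment name are illustrative assumptions, not documented Rockhopper paths; the blastp flags are standard BLAST+ options.

    #!/bin/bash
    #$ -N blastp_search        # job name
    #$ -cwd                    # run in the submission directory
    #$ -pe smp 4               # four threads on one node; PE name assumed

    # Load BLAST (assumes an environment-modules setup)
    module load ncbi-blast

    # Compare protein queries against a protein database;
    # 'nr.example' stands in for whatever databases are staged on the system
    blastp -query proteins.fasta -db nr.example \
           -evalue 1e-5 -num_threads 4 -out hits.txt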

More about POD – underlying technology

– On-demand HPC system: compute, storage, low-latency fabrics, GPUs; non-virtualized
– Robust software infrastructure: full automation; user and administration space controls; secure and seamless job migration; extensible framework; complete billing infrastructure
– Services: custom product design; site and workflow integration; managed services; application support
– HPC support expertise: skilled HPC administrators; leverages 13 years serving the HPC market
– Connectivity: Internet (150 Mb, burstable to 1 Gb)

Scyld HPC Cloud Management System

Created by POD developers and administrators. Capabilities include:
– Create and manage user and group hierarchies
– Simultaneously manage multiple collocated clusters
– Create customer-facing web portals
– Use web services to integrate with back-end systems
– Deploy HTML5-based cluster management tools
– Securely migrate user workloads
– Efficiently schedule and manage cluster resources
– Create and deploy virtual head nodes for user-specific clusters

Current data centers: Salt Lake City, Indiana University, Mountain View. 1,500 cores (AMD and Intel); 240 TB on-demand storage; 12 million commercial jobs and counting…

Customer examples:
– Replaced an in-house image analysis cluster with POD and co-located storage
– Provides cloud analysis services on POD for worldwide bioinformatics customers
– Replaced Amazon AWS cloud usage with the PODTools workflow migration system
– Nihon ESI provides crash analyses to Honda R&D during Japan's brown-outs

The POD Advantage

– Persistent, customized user environment
– High-speed Intel and AMD compute nodes (physical)
– Fast access to local storage (data guaranteed to be local)
– Highly secure (https, shared-key authentication, IP matching, VPN)
– Billed by the fractional core hour (see the worked example below)
– HPC expertise included (Penguin's core business for many years)
– Cluster software stack included
– Troubleshooting included in support
– Collocated storage options available
– Highly dependable and dynamically scalable
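To illustrate what fractional core-hour billing means in practice, here is a worked example using a purely hypothetical rate (actual POD pricing is not stated in this presentation):

    cost = cores × hours × rate per core-hour
    e.g., a 48-core job running 2.5 hours at an assumed $0.10 per core-hour:
    48 × 2.5 × $0.10 = $12.00

Because billing is fractional, a job that uses 4 cores for 15 minutes is charged for one core-hour of work, not for a whole node or a whole day.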

IU / POD – an example of campus bridging

The goal of campus bridging is virtual proximity. The biggest problems:
– Not enough CI resources are available to most researchers
– When you go from your campus to the national cyberinfrastructure, it can feel like you are falling off a cliff! That's why you need bridging.

More info on campus bridging at pti.iu.edu/campusbridging

IU is collaborating with Penguin Computing to support the national research community in general, and particularly two NSF-funded projects:
– eXtreme Science and Engineering Discovery Environment (XSEDE)
– National Center for Genome Analysis Support

XSEDE and Penguin – part 1

XSEDE (eXtreme Science and Engineering Discovery Environment) is a project, an institution, and a set of services.
– As a project, XSEDE is a five-year, $121 million grant award made by the National Science Foundation (NSF) to the National Center for Supercomputing Applications (NCSA) at the University of Illinois and its partners via an NSF program solicitation. XSEDE is the successor to the NSF-funded TeraGrid project.
– As an institution, XSEDE is a collaboration led by NCSA and 18 partner organizations to deliver a series of instantiations of services, each instantiation being developed through a formal systems engineering process.
– As a set of services, XSEDE integrates supercomputers, visualization and data analysis resources, data collections, and software into a single virtual system for enhancing the productivity of scientists, engineers, social scientists, and humanities experts.

XSEDE and Penguin – part 2

Under TeraGrid, it was never possible to buy "TeraGrid-like" cycles, and many people viewed the allocation process as very slow. XSEDE is speeding up the allocation process considerably. IU is working with Penguin Computing to install the basic open-source XSEDE software environment on Rockhopper. For the first time, it is possible to buy "XSEDE-like" cycles in a matter of minutes using a credit card. In some circumstances this will be a much better way to meet peak needs, or to use startup funds, than buying and installing "clusters in a closet."

NCGAS & POD IU

The National Center for Genome Analysis Support:
– A cyberinfrastructure service center affiliated with the Indiana University Pervasive Technology Institute (pti.iu.edu)
– Dedicated to supporting life science researchers who need computational support for genomics analysis
– Initially funded by the National Science Foundation Advances in Biological Informatics (ABI) program
– Provides access to genomics analysis software on supercomputers customized for genomics studies, including POD IU
– Particularly focused on supporting genome assembly codes (a brief example follows this list), such as:
  – de Bruijn graph methods: SOAPdenovo, Velvet, ABySS
  – Consensus methods: Celera, Newbler, Arachne 2

For more information, see ncgas.org
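As a small illustration of the de Bruijn graph assemblers named above, a minimal Velvet run looks roughly like this. This is a sketch only: the file name and k-mer length are placeholders, and real assemblies need data-dependent parameter tuning.

    # Build the k-mer hash table (k = 31) from interleaved paired-end FASTQ reads
    velveth assembly_dir 31 -fastq -shortPaired reads_interleaved.fq

    # Build and traverse the de Bruijn graph to produce contigs;
    # 'auto' lets Velvet estimate expected coverage and the coverage cutoff
    velvetg assembly_dir -exp_cov auto -cov_cutoff auto

Assemblies like this are memory- and CPU-hungry, which is exactly the kind of workload NCGAS points at large-memory systems and services such as POD IU.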

Summary

IU and its partners are collaborating with Penguin Computing, Inc. to implement a new model of above-campus services that provides many of the advantages of cloud services while avoiding many of the drawbacks.
– The service provided is Cluster as a Service – a real, high-performance supercomputer cluster.
– Access is simple: if you are at a .edu institution or an FFRDC, get out your credit card and go computing.
– As examples of effective campus bridging:
  – This service is supported by the IU National Center for Genome Analysis Support
  – IU is providing the open-source components of the XSEDE software environment to offer an XSEDE-like environment that you can access in minutes with a credit card
– Establishing this partnership was possible through the involvement of our key academic partners: University of California Berkeley, University of Virginia, University of Michigan

For more information…
– podiu.penguincomputing.com
– pti.iu.edu/ci/systems/rockhopper

License terms

Please cite this presentation as: Stewart, C.A., M. Jacobs, B. Hallock, R. Knepper and W.K. Barnett. Get the convenience of cloud while keeping your rights – through the IU / Penguin Computing partnership. Presentation.

Portions of this document that originated from sources outside IU are shown here and used by permission or under licenses indicated within this document. Items indicated with a © are under copyright and used here with permission. Such items may not be reused without permission from the holder of copyright except where license terms noted on a slide permit reuse.

Except where otherwise noted, the contents of this presentation are copyright 2011 by the Trustees of Indiana University. This content is released under the Creative Commons Attribution 3.0 Unported license (creativecommons.org/licenses/by/3.0). This license includes the following terms: You are free to share – to copy, distribute and transmit the work – and to remix – to adapt the work – under the following conditions: attribution – you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.

Thanks

– Penguin Computing, Inc. for their willingness to forge new paths with IU
– Staff of the Research Technologies division of University Information Technology Services, affiliated with the Pervasive Technology Institute, who were involved in the implementation of Rockhopper: George Turner, Robert Henschel, David Y. Hancock, Matthew R. Link, Richard Knepper
– Those involved in campus bridging activities: Guy Almes, Von Welch, Patrick Dreher, Jim Pepin, Dave Jent, Stan Ahalt, Bill Barnett, Therese Miller, Malinda Husk, Maria Morris, Gabrielle Allen, Jennifer Schopf, Ed Seidel
– All of the IU Research Technologies and Pervasive Technology Institute staff who have contributed to the development of IU's advanced cyberinfrastructure and its support
– NSF for funding support (multiple awards, including the OCI award supporting the Extreme Science and Engineering Discovery Environment)
– Lilly Endowment, Inc. and the Indiana University Pervasive Technology Institute

Any opinions presented here are those of the presenter and do not necessarily represent the opinions of the National Science Foundation or any other funding agencies.