Introduction to NMRbox Project and NMRbox Virtual Machine

Presentation transcript:

Introduction to NMRbox Project and NMRbox Virtual Machine
Mark Maciejewski, UConn Health
Thanks to the organizers for giving me a chance to speak today. The title is “Towards reproducible computation for NMR with NMRbox.” The talk is primarily focused on a new virtual machine we are building, pre-configured with software used in all aspects of NMR data processing and analysis. “Think inside the box.”

Outline
Lecture: motivation for the project; NMRbox platform; benefits to users and developers; usage.
Hands on (at the beginning of the tutorial session, Adam): account management; connect to NMRbox VM; set the display resolution; software inventory; file transfers.

The NMRbox project will deliver…
The NMRbox VM, a virtual machine pre-configured with a range of software used in biological NMR. It will provide access to significant computational resources through individual VMs and computational clusters, and advanced training material, which in many cases will be integrated into the VM. The platform will have BMRB integration, annotation of workflows, and interoperability between different NMR software packages. In addition, Bayesian tools will be incorporated into some existing NMR software packages, and an API will be developed for developers to incorporate Bayesian inference into their packages.

Motivation: Abundance of software
Hundreds of packages are cited in BioMagResBank (BMRB) depositions, J. Biomol. NMR, and other journals.
The figure shows a weighted “word cloud” based on software frequency in the BMRB, which covers about 120 packages. As part of the NMRbox project we have identified over 200 packages by looking through J. Biomol. NMR, the BMRB, and simple web searches.

Motivation: Fragmentation
Operating systems. Programming languages. Libraries (for example, BLAS).
Another motivation is the fragmentation of platforms: several different operating systems, programming languages, and libraries. This places an enormous burden on software developers and on end users attempting to install software, and it is a major burden for non-computational experts.

Motivation: Persistency
Platforms become obsolete. Developers graduate. Grants end. Software time bombs.
Many NMR software packages lack persistency for a variety of reasons. Platforms become obsolete. Graduate students or other developers move on and leave the lab, which leaves many programs “as-is” and makes them difficult to keep running on newer platforms. Grants end. Finally, there are software time bombs: to push users to keep their software up to date and avoid issues as platforms and operating systems evolve, developers sometimes put time bombs in their software. While this helps to a degree for actively developed packages, it still means that old versions of the software are not persistent, and it can lead to problems if the developer ends support.

Motivation: Meta-software packages
SHIFTX2, Sparky, Rosetta, MODELLER, NMRPipe, Python scripts.
Another motivation for NMRbox is the growing number of meta-packages, such as a new software package called Compass from Chad Rienstra’s lab. This program attempts to predict the structure of a protein from solid-state spectra. Compass itself is developed as Python scripts, but the workflow relies on NMRPipe, Sparky, Rosetta, SHIFTX2, and MODELLER. The end user now needs to install not only Compass but also the dependent programs, and then configure Compass based on how the other packages were installed. Compass will undoubtedly rely on certain versions of these ancillary programs, which can lead to issues for end users. Combined, these issues make it very difficult for non-experts to utilize NMR software and add to an overly high activation barrier for a researcher to dive into NMR.
Reference: Experimental protein structure verification by scoring with a single, unassigned NMR spectrum. Courtney, Rienstra, et al., Structure, 2015.

Motivation: Computational reproducibility
A computational study is reproducible when it provides the “complete software environment needed to reproduce the figures” (D. Donoho, Stanford).
Obstacles: missing primary empirical data; missing meta-data; missing software (scripts, programs); non-persistence of software; manual interventions.
Read from slide through obstacles.

Challenges
Abundance of software (discovery). Fragmentation of OSes, programming languages, libraries. Persistence of resources. Complexity of design and installation. Reproducibility of results.
Question: How do we address these challenges?
Answer: NMRbox VM.

Deliverables – primary tools
Platform: NMRbox VM, a virtual machine pre-configured with a wide range of software used in biological NMR; significant computational resources.
Data: BMRB integration and richer depositions; metadata management and workflow annotation.
Analytics: Bayesian tools to enhance data analysis and interpretation; API for developers to incorporate Bayesian inference.

Deliverables – community services
Training and Dissemination: workshops, tutorials, and guides; user and developer support.
Driving Biological Projects (DBPs): test beds for NMRbox technology development. What limits your progress?
Collaboration and Service (C&S): apply technologies to challenging biomedical research problems.

NMRbox VM. What’s included?
Acquisition: Agnostic – install all software available.
Access: Persistent – archive all versions.
Content – Software packages: 100+ packages installed (see https://nmrbox.org), covering spectral reconstruction, spectral visualization, automated assignment, structure determination, molecular visualization, validation, chemical shift prediction, dynamics, residual dipolar coupling, meta packages, general purpose tools, and instrument manufacturers’ software.
Read the slide. Note on Agnostic: we are trying to have VMs with a wide range of software. We will work hard to enhance the workflows of the most used software, but at the same time allow everyone access to their “favorite” software. There have also been some efforts lately by developers to release their software as a VM for easier installation, such as NMRPipe. The issue then is that you would need multiple VMs to cover all the software; we hope to have everything under a single umbrella.

NMRbox VM. What’s included?
Content – Productivity Tools: OS (Xubuntu 16.04); over a dozen editors; scientific Python packages; R and R tools; office tools; drawing tools; Octave; shells; browsers; Dropbox.

NMRbox VM. What’s included?
Release 3 features added:
GPUs to support 3D drawing: PyMOL, VMD, Chimera, and others.
GPUs to support CUDA processing: NAMD, others coming soon.
Commercial software: dataChord spectrum Analyst, dataChord spectrum Miner, MestReNova.
Matlab compiled binaries: ALATIS, GUARDD, TITAN.
Virtual on-screen keyboard.
See the release notes at https://nmrbox.org/files/release-notes-version-3-0.pdf

Virtual Machine Terminology
VM: a software-based emulation of a guest computer backed by the physical resources of a host computer, managed by a hypervisor.
Access: local installation (standalone or downloadable); connect to server (PaaS = Platform-as-a-Service).
Advantages: over-subscribe the host computer; snapshot the VM and restore to any point; run multiple OSes on a single computer; “spin up” VMs in minutes; dynamically load balance VMs across multiple hosts; no performance penalties on modern computers.

Standalone NMRbox VM
Diagram labels: host computer; hypervisor; NMRbox (guest); shared folder; OS / NMR software; user accounts.
Just to get a feel for how an end user would interact with a downloadable VM, here is a short animation. The user would download a hypervisor software package such as VirtualBox, then download NMRbox, start the hypervisor, and import the NMRbox VM. Essentially the user would have a fully functional OS pre-configured with a wide variety of software used in NMR data processing and analysis. The user would then need to get their data into the VM, which is a bit trickier with a local VM. The data can reside in a virtual disk, but this is a single flat file to the host OS and can be dangerous. Shared folders work great, but configuring the hypervisor to access them can be tricky at times. USB drives or file servers are the best option but require additional hardware. These issues are resolved with a PaaS version of the VM.

PaaS NMRbox VM
Diagram labels: remote users; authentication server; VM host server; NMRbox VM 1 and NMRbox VM 2 (each with CPU, RAM, NIC); high performance storage; cloud storage; backups; user data; user home folders; NMR software; OS files.
In the PaaS version each user will have their own NMRbox VM spun up on our servers. They will access the VM with a full GUI via RealVNC, or via ssh for advanced users. A key point is that user storage and authentication are separated from the VMs, which allows seamless migration as new versions of the NMRbox VM are released and makes it possible to go back to older versions if needed.

PaaS deployed with enterprise-class resources
100 Gb/s network. 12 VM servers, 480 cores, 3.8 TB memory. 38 NVIDIA GPUs, dramatically increasing graphics performance and CUDA processing. Redundant internal network. Network attached storage, with hundreds of TB available to NMRbox. Ultra-reliable cloud storage in excess of a PB.
The NMRbox VM is being deployed at UConn Health with enterprise-level hardware. The research network has a 100 Gb/s connection to our ISP. That feeds into a 40 Gb/s network fabric connecting all the switches in the datacenter. VM hosts and compute clusters are connected via 10 Gb/s connections, with a separate dedicated 10 Gb/s connection to storage. The VM hosts will run the NMRbox VMs for individuals and developers. Users’ home folders and the files needed to run the VMs are on fast storage with performance similar to a local SSD. We also have access to cloud storage for backups and extra space for user data. The university has a 3 PB geo-dispersed storage system that continues to grow and offers unmatched reliability; it is currently configured for fifteen 9s of reliability.
Users will connect via ssh or RealVNC. RealVNC offers several benefits: it provides a full GUI; it is free and runs on all devices; everything is encrypted; it has built-in file transfer for those not comfortable with scp; it maps your local printer; and it runs in daemon mode, so users just connect and do nothing else.
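For a rough sense of scale, the totals quoted on the slide can be divided out per VM host. This is only a back-of-the-envelope sketch: it assumes that the 480 cores and 3.8 TB of memory are totals across the 12 VM servers, that the hosts are identical, and that 1 TB is taken as 1000 GB. None of these assumptions is stated on the slide.

```python
# Totals quoted on the slide.
VM_SERVERS = 12
TOTAL_CORES = 480
TOTAL_MEMORY_TB = 3.8


def per_host(servers, cores, memory_tb):
    """Average cores and memory (in GB) per host, assuming identical hosts."""
    return cores / servers, memory_tb * 1000 / servers


cores_per_host, memory_gb_per_host = per_host(VM_SERVERS, TOTAL_CORES, TOTAL_MEMORY_TB)
print(f"{cores_per_host:.0f} cores and about {memory_gb_per_host:.0f} GB of memory per host")
```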

VM Requirements for Users
Standalone VM: 64-bit hardware (Windows, OSX, Linux, …), that is, any modern laptop or desktop. Hypervisors shown: Oracle VirtualBox, VMware Workstation, VMware Fusion, VMware Player.
Server-based PaaS VM: ssh or VNC (Windows, OSX, Linux, tablet, phone, 32-bit hardware, …) and a network connection.

Benefits
Users: “zero-configuration”; access; training; computational resources; discovery; persistence; reproducibility; cost.
Developers: single platform; discovery; usage metrics; persistence; community; developer tools; computational resources.
Instructors: access to NMRbox VMs for courses and workshops.

Practical aspects
Large VM model: NMRbox VMs are configured with many cores, high memory, and GPUs. There are multiple users per VM, and each user has two VMs (username.nmrbox.org and username2.nmrbox.org). CPU and memory utilization is restricted to 50% of the full VM. GPUs restrict VM management.
Updates: additional software will be added to “live” NMRbox VMs, with version numbers updated and all states archived. Software versions are updated on major releases. Older major VM releases continue to run with reduced resources at version.nmrbox.org.
Backups: user data is backed up daily.
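The per-user hostname pattern on the slide can be written down as a small helper. This is a minimal sketch: the username `jdoe` is hypothetical, and using the same name as the ssh login is an assumption rather than something the slide states.

```python
def nmrbox_hosts(username):
    """Return the two per-user hostnames, following the pattern on the slide."""
    return [f"{username}.nmrbox.org", f"{username}2.nmrbox.org"]


def ssh_command(username, host):
    """Build an ssh argument list (assumes the login name equals the username)."""
    return ["ssh", f"{username}@{host}"]


# "jdoe" is a made-up username used only for illustration.
for host in nmrbox_hosts("jdoe"):
    print(" ".join(ssh_command("jdoe", host)))
```

The argument list can be passed to `subprocess.run` as-is, or the printed line can be pasted into a terminal.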

Practical aspects
Large memory VM: a large memory VM can be “spun up” for users if needed.
Home folder and archive folder: each user has two home folders, /home/nmrbox/username and /nmr/archive/username.
Google Group: we have started a Google Group at https://groups.google.com. Search for NMRbox to join.
Support: email support@nmrbox.org.
Downloadable version: in final testing.

Practical aspects
Host workshops with NMRbox VMs: the NMRbox team will “spin up” custom VMs to support other workshops.
File permissions and access: home and archive folders are not accessible to others by default. We will set up lab groups if desired. There is a /public folder for quick sharing.
Contact us: suggestions for packages to include; suggestions about the package; issues with the NMRbox platform.
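One way to confirm that a folder is closed to other users is to inspect its permission bits. The helper below is a generic POSIX sketch, not an NMRbox tool. It checks only the classic mode bits, so any ACLs or lab-group arrangements on the real system would not be reflected.

```python
import os
import stat


def is_private(path):
    """True if neither the group nor others have any permission bits on path."""
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


if __name__ == "__main__":
    home = os.path.expanduser("~")
    if os.path.isdir(home):
        print(home, "private:", is_private(home))
```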

NMRbox Usage 500+ Users

NMRbox Usage
package total_runs total_users
rnmrtk 41846863 69
nmrpipe 7104050 186
shiftx2-v110-linux-20160912 482070 3
amber16 215403 24
openbabel-2.4.1 105769 6
hmsIST 101733 44
nmr-scripts 66566 141
cns_solve_1.3 62756 67
mddnmr 51360 62
cns_solve_1.21 28272
nustool 19380 65
xplor-nih-2.43 12322 35
rosetta 7076 28
nmrfam-sparky 5285 97
namd_gpu 4556
namd_cpu 3121 9
shifts-5.1 2119 37
connjurst 2027 56
ensemble 1698
ccpnmr 1197
xplor-nih-2.45 1113 7
molmol 968 34
modelfree 873 16
NMRViewJ 621 57
aria2.3 614
vmd 611 61
NMRFxProcessor 486
Redcat 334 4
chimera 301 21
relax 291 27
pymol-1.8.2.1 262 54
redcraft 251 13
pymol-1.8.6.0 228 26
flexible-meccano 211 12
fmcgui2.5_linux 189 16
TENSORV2_PC9 167 24
cyana-3.97 166
glove 142 11
camera 111
cara 83
INCHI-1 78 14
nmr_wash-1.0.0-linux 68 15
cpmg_fitd9 66 21
pales 63 7
ponderosa 60
nestanmr 57
TREND-1.0 52 8
tinker 48 6
ALATIS 43 17
GISSMO 41
rnmr 37
fastmodelfree 33 5
MestReNova 29 9
ssp 4
BMRB-CS-Rosetta-Submission
topspin 25
nessy
adapt_nmr_enhancer
azara-2.8
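A usage table of this shape (package, total_runs, total_users) is easy to summarize programmatically, for example as runs per user. The sketch below embeds three rows copied from the slide. Treating a lone number as total_runs is an assumption, based on the table appearing to be sorted by runs in descending order; missing values are kept as None rather than guessed.

```python
def parse_usage(rows):
    """Parse 'package total_runs total_users' rows; missing numbers become None."""
    table = []
    for row in rows:
        fields = row.split()
        if not fields:
            continue
        name, numbers = fields[0], fields[1:]
        # Assumption: when only one number survives, it is total_runs.
        runs = int(numbers[0]) if len(numbers) > 0 else None
        users = int(numbers[1]) if len(numbers) > 1 else None
        table.append({"package": name, "total_runs": runs, "total_users": users})
    return table


def runs_per_user(entry):
    """Average runs per user, or None when either value is unavailable."""
    if entry["total_runs"] is None or not entry["total_users"]:
        return None
    return entry["total_runs"] / entry["total_users"]


# Three rows copied from the slide; the last one lacks a user count.
sample = ["rnmrtk 41846863 69", "nmrpipe 7104050 186", "cns_solve_1.21 28272"]
table = parse_usage(sample)
for entry in table:
    print(entry["package"], runs_per_user(entry))
```

Ranking by runs per user rather than by total runs separates packages driven by a few heavy batch users from those with broad interactive use.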

Cite NMRbox
Very important! If you utilize NMRbox in your research, please cite and acknowledge us. Details at https://nmrbox.org
Citation: NMRbox: A Resource for Biomolecular NMR Computation. Maciejewski, M.W., Schuyler, A.D., Gryk, M.R., Moraru, I.I., Romero, P.R., Ulrich, E.L., Eghbalnia, H.R., Livny, M., Delaglio, F., and Hoch, J.C., Biophys J., 112: 1529-1534, 2017. [PMID: 28445744, DOI: 10.1016/j.bpj.2017.03.011]
Acknowledgment: "This study made use of NMRbox: National Center for Biomolecular NMR Data Processing and Analysis, a Biomedical Technology Research Resource (BTRR), which is supported by NIH grant P41GM111135 (NIGMS)."