Cloud Computing for Education and Research

Presentation transcript:

Cloud Computing for Education and Research Customized cloud platform for computing on your terms ! Nirav Merchant (nirav@email.arizona.edu)

Topic Coverage Introduction to cloud computing concepts Challenges and unique features of cloud computing iPlant Atmosphere overview Designing customized infrastructure for research, course work and training material Using Atmosphere (hands on) for collaborative data analysis Explore use of these resources on your own and ask questions !

Cloud Computing Not a singular technology component Not a black box or alien technology Not an “elixir of scalability”, “panacea for Big Data” etc. It cannot keep growing and scaling without planning (and architecting your application) Unfortunate victim of marketing hype Further complicated by use of jargon, TLA, private cloud, community cloud, hybrid cloud …

What is cloud computing ? http://geekandpoke.typepad.com/geekandpoke/2009/03/let-the-clouds-make-your-life-easier.html

Cloud Computing Amazingly flexible technology It’s a platform that comprises many uniquely flexible components (more later) Allows us to create “purpose built appliances” Allows us to finally “script our infrastructure” Allows mixing and matching of components that you need to do your science Opens up many new avenues and approaches for teaching topics which usually require complex (preconfigured) software tools and data

Often overheard: I do my analysis using the “cloud” It’s the close equivalent of saying: I do my research using “science”

Cloud Computing Zen Don’t get frustrated… This is cutting (bleeding) edge technology There will be plenty of WTF#$@ moments Be patient… Instructions/infrastructure keep changing (s/w version) Be flexible… There will be unanticipated issues along the way Be constructive… Use wiki, forums and share knowledge Make everyone’s experience better Be creative… There is more than one way to do it (TIMTOWTDI)

iPlant URLs you should know Wiki.iplantcollaborative.org ask.iplantcollaborative.org www.iplantcollaborative.org

Impromptu survey How many of you use the command line ? How many of you are Windows, Mac, Linux users ? How many of you use HPC ? (or know what HPC is) What resources do you use to teach computing based workshops/training/courses ?

Atmosphere: motivation Standalone GUI-based applications are frequently required for analysis GUI apps are not easy to transform into web apps Need to handle complex software dependencies (e.g. specific BioPerl version and R modules) Users needing full control of their software stack (occasional sudo access) Need to share desktop/applications for collaborative analysis (remote collaborators) Availability of Next Gen map-reduce based algorithms (currently we have limited support)

As a Service models
SaaS: Software as a Service (e.g. Clustering/Assembly is a service)
IaaS: Infrastructure as a Service (get computer time with a credit card and with a Web interface like EC2)
PaaS: Platform as a Service: IaaS plus core software capabilities on which you build SaaS (e.g. Hadoop/MapReduce is a Platform)
Cyberinfrastructure is “Research as a Service”
[Diagram labels: More Flexibility, More Pain, Productivity]
http://salsahpc.indiana.edu

But where do I start ? Not very helpful searching for “cloud computing ” related terms (as you will most likely get bombarded by commercials and advertisements in the first few hits !) NIST: National Institute of Standards and Technology Cloud Computing Synopsis and Recommendations (Special Publication 800-146 : May 2012) http://www.nist.gov/customcf/get_pdf.cfm?pub_id=911075

What it is

Challenges of existing platforms Amazon Web Services (AWS) http://aws.amazon.com/ Flexible and scalable High level of expertise required for configurations Fairly challenging for biologists to master all steps Limited lifecycle management (cost, time)

Steps to get started !

What is Atmosphere ? Self-service cloud infrastructure Designed to make underlying cloud infrastructure easy to use by novice users Built on open source Eucalyptus (OpenStack) Fully integrated into iPlant authentication and storage and HPC capabilities Enables users to build custom images/appliances and share with community Cross-platform desktop access to GUI applications in the cloud (using VNC) Start and stop your analysis (without losing state), much like your laptop (hibernate) Profile your application usage pattern Provide easy web based access to remote resources (compute+data+s/w) VNC stands for Virtual Network Computing

Who is this tutorial designed for ? Users wanting to launch configured images in Atmosphere (like an app store) Software/tools developers for application distribution Prototyping/Testing new software/modules (testing software dependencies, conflicts) Tailored software training setups (custom workshops/laboratory courses etc.) Distribute tasks in the “cloud” Collaborate and share screen/applications Extend compute capabilities of existing applications i.e. utilize iPlant API

Terms and jargon for cloud you should know about Virtual Machine (aka VM) Image (aka VM-image) Instance (running VM) IP address Amazon EC2 (Elastic Compute Cloud) Amazon EBS (Elastic Block Store) Amazon S3 (Simple Storage Service)

The iPlant Collaborative Project Atmosphere™: Custom Cloud Computing API-compatible implementation of Amazon EC2/S3 interfaces Virtualize the execution environment for applications and services Up to 12 core / 48 GB instances Access to Cloud Storage + EBS Run servers, CloudBurst desktop use cases. Big data and the desktop are co-local again! >60 hosted applications in Atmosphere today, including users from USDA, Forest Service, database providers, etc. (30 more for postdocs and grad students for training classes)

Atmosphere: Collaboration iPlant Data Store

Lifecycle

Working together How often do you wish you could show your desktop to the person on the phone/Skype ? Let them navigate the application for you ? They can continue your work while you are away ? Give you a judgment call/review details ? Very doable if you Buy screen sharing software Log into a different application

Distributing Tasks (scaling) You have a large collection (aka BoT: Bag of Tasks) e.g. many FASTA sequences You build an “appliance” and now want to distribute that among many appliances Works well for 1 but how do you feed many ? You REALLY want to add more appliances to finish faster

Makeflow to the Rescue Developed by Doug Thain’s Collaborative Computing Lab at Notre Dame http://www3.nd.edu/~ccl/ Simple way to distribute and manage your workflow/analysis among many computing platforms (appliances) Keeps track of progress, deals with failures and starts where it left off (no repeating completed tasks)
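
The "no repeating completed tasks" behaviour can be pictured with a small sketch. This is not Makeflow's implementation; it only illustrates the make-style idea of skipping any task whose output file already exists. The `Task` type, `run_pending`, `write_marker` and the file names are hypothetical, introduced only for this illustration.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    output: str                    # file the task is expected to create
    action: Callable[[str], None]  # hypothetical callable that writes the output


def run_pending(tasks: List[Task]) -> List[str]:
    """Run only the tasks whose output is missing; return the outputs produced."""
    produced = []
    for task in tasks:
        if os.path.exists(task.output):
            continue  # finished in an earlier run, so it is not repeated
        task.action(task.output)
        produced.append(task.output)
    return produced


def write_marker(path: str) -> None:
    # Stand-in for real work such as an alignment run.
    with open(path, "w") as handle:
        handle.write("done\n")


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    tasks = [Task(os.path.join(workdir, f"out-{n}.txt"), write_marker)
             for n in (10, 20, 30)]
    print(len(run_pending(tasks[:2])))  # first run completes two tasks
    print(len(run_pending(tasks)))      # second run only does the remaining one
```

A real workflow engine also has to handle failed or partially written outputs, which this sketch ignores.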

Why another workflow system Emphasis on simplicity Very easy to integrate with cloud and HPC resources Does not support complex workflows, but handles dependencies in tasks very elegantly Lightweight and portable Even works on local machine and makes full use of multiple cores ! Working on certain tasks locally (important for data intensive apps) Workflow system is VERY extensible using various scripting languages (if you choose)

How does it work ? Your complex task (needs software X, Y, Z) Someone built you a script/program DATA ! [Diagram: how do the task, the script and the data reach many Atmosphere Image/Appliance boxes ?]

Makeflow instructions
out-10-align.fasta : in-10.fasta align.exe
	align.exe -p 10 -i in-10.fasta -o out-10-align.fasta
out-20-align.fasta : in-20.fasta align.exe
	align.exe -p 10 -i in-20.fasta -o out-20-align.fasta
out-30-align.fasta : in-30.fasta align.exe
	align.exe -p 10 -i in-30.fasta -o out-30-align.fasta
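
With a large bag of tasks, rules like these are tedious to write by hand. A minimal sketch of generating them follows, assuming every input is named `in-N.fasta` and that `align.exe` takes the `-p`, `-i` and `-o` options shown on the slide. The function names `makeflow_rule` and `makeflow_text` are hypothetical.

```python
from typing import Iterable


def makeflow_rule(n: int, program: str = "align.exe") -> str:
    """Build one Make-style rule: 'target : sources' plus a tab-indented command."""
    source = f"in-{n}.fasta"
    target = f"out-{n}-align.fasta"
    # The rule line lists every file the command needs, including the program.
    header = f"{target} : {source} {program}"
    # The command line must start with a tab, as in Make.
    command = f"\t{program} -p 10 -i {source} -o {target}"
    return header + "\n" + command


def makeflow_text(numbers: Iterable[int]) -> str:
    """Join one rule per input number into the text of a workflow file."""
    return "\n\n".join(makeflow_rule(n) for n in numbers) + "\n"


if __name__ == "__main__":
    print(makeflow_text([10, 20, 30]), end="")
```

Writing the returned text to a file gives a workflow with one independent rule per input, which is the shape that lets tasks be handed out to many workers.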

Running it Take the makeflow file (previous slide) Run makeflow -f <filename> Launch workers Profit

What happens ? Makeflow instructions + your program (align.exe) + data DATA ! [Diagram: tasks are handed to workers running in several Atmosphere Image/Appliance boxes; inputs in-10.fasta and in-20.fasta, outputs out-10-align.fasta and out-20-align.fasta]

Example

When not to use cloud ! When you need “bare metal” performance (CPU speed, network, data I/O) Your application can support MPI across a large number of compute nodes (> 2) When applications need large memory (>64 GB)

Users of Atmosphere for teaching
Workshops:
Frontiers and Techniques in Plant Sciences, CSHL 2011, 2012
Genotyping by Sequencing, Cornell Computational Biology
Graduate/U. Graduate course work:
BCB 660, Volker Brendel and Amy Toth, Fall 2011, Iowa State University
ISTA 420/520, Nirav Merchant & Eric Lyons, Fall 2012, Univ. of Arizona
Intro. Bioinformatics, Anne Lorraine, Fall 2012, Univ. of North Carolina
Popular community contributed images:
PhytoMorph (Nate Miller, U. Wisconsin)
Twig2Genome (Haibao Tang, JCVI)
Julin Maloof, UC Davis*

Recap on key concepts Purpose built appliances Scriptable infrastructure Scaling multiple self contained tasks Collaborative analysis

Discussion What would you want to build with your custom infrastructure ?

Courses Using Atmosphere

Asian Wild Rice Distribution
P. Huang and B.A. Schaal, Am. J. Botany 99(11). 2012.
The Research: Genetic studies documented geographic subdivision of Asian wild rice (Oryza rufipogon), the progenitor of cultivated Asian rice. Cause unknown. Use species distribution modeling (SDM) to examine environmental factors associated with the spatial and temporal distribution of O. rufipogon. Compare estimated distribution during Last Glacial Maximum (LGM) to genetic data.
Goal of study: Understand the historical distribution of O. rufipogon and its relationship to the current geographical pattern of genetic variation.
Problem: Analysis requires large datasets. The version of MaxEnt they use has a GUI, so they used Atmosphere rather than DE.
iPlant Workshop at BSA, July 2011: Pu Huang (Washington U.) attended. Learned about Atmosphere, iPlant’s cloud computing platform.
Scalable science: 325 records of O. rufipogon sample locations from two sources. iPlant enabled Huang and Schaal to successfully pursue this research.
Results: Present distribution of O. rufipogon (Fig. A). Projected paleodistribution at LGM was separated into disconnected east and west ranges (Fig. B). Consistent with current geographic pattern of genetic variation, with two genetic groups that intergrade (Fig. D). Annual precipitation contributes most to SDM estimates. SDM projections for year 2080 indicate an increasing probability of presence and range expansion (Fig. C). Indicates global warming is less of a threat to this endangered species than other human-mediated factors.
Figure panels: (A) present, (B) Last Glacial Maximum, (C) Future 2080, (D) Genetic variation.

Hands On Lab

Atmosphere Login Visit http://www.iplantcollaborative.org/ Next click on the Atmosphere Login Image (should be about mid page)

Click the Login button and enter your iPlant username and password

Atmosphere Intro screen Getting familiar with the UI

Search for NGS Viewers v3 08/20/2012 (an instance type) and select the purple icon. Give it a name and select the instance size (choose m1.small). By selecting different sizes you will notice project resources change. When ready, press the Launch Instance button.

Understanding Instance Metrics After an image has launched, you can view information about it. Resource Usage Metrics My Resource Usage at the top of the screen shows how much of your quota in CPUs and GB of memory is being used by your running instances. (Seen at the top) Instance Details The Instance Details tab displays important information about the instance, including the ID assigned to the instance when it was launched, name of the image it is using, unique EMI ID, the instance size, the date you launched the image, and the IP address, which you will need when logging in to the instance. Instance Metrics Instance Metrics allow you to drill down into the usage expended for the running image.
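
The quota arithmetic behind the resource usage display can be sketched as follows. The size table below is hypothetical (the CPU and memory values are placeholders, not Atmosphere's published sizes), and `Quota` and `usage` are names invented for this example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# Hypothetical size table: name -> (CPUs, memory in GB). Real sizes may differ.
SIZES: Dict[str, Tuple[int, int]] = {
    "m1.small": (1, 2),
    "m1.medium": (2, 4),
    "m1.large": (4, 8),
}


@dataclass
class Quota:
    cpus: int
    memory_gb: int


def usage(running_sizes: Iterable[str], quota: Quota) -> Dict[str, float]:
    """Sum CPUs and memory over running instances and compare with the quota."""
    sizes = list(running_sizes)
    cpus = sum(SIZES[name][0] for name in sizes)
    memory = sum(SIZES[name][1] for name in sizes)
    return {
        "cpus_used": cpus,
        "cpu_fraction": cpus / quota.cpus,
        "memory_gb_used": memory,
        "memory_fraction": memory / quota.memory_gb,
    }


if __name__ == "__main__":
    print(usage(["m1.small", "m1.large"], Quota(cpus=10, memory_gb=20)))
```

Choosing a larger size raises both sums, which is why the displayed usage changes as different sizes are selected.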

Logging into an Instance Via ssh- If the Shell tab is disabled, you can log into your instance via SSH for your operating system. In your terminal window type: $ ssh your_iplant_username@instance_ip_address For example, mine would look like: $ ssh amercer@128.196.142.48 Enter your iPlant password and you should be logged into your instance
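
When the login step is scripted, for example to print the right command for each team in a workshop, the command can be assembled and checked before use. This is a minimal sketch; `ssh_command` is a hypothetical helper, and it only builds the argument list rather than opening a connection.

```python
import ipaddress
import shlex
from typing import List


def ssh_command(username: str, ip_address: str) -> List[str]:
    """Return the argument list for 'ssh username@ip_address'."""
    ipaddress.ip_address(ip_address)  # raises ValueError for a malformed address
    if not username:
        raise ValueError("username must not be empty")
    return ["ssh", f"{username}@{ip_address}"]


if __name__ == "__main__":
    command = ssh_command("your_iplant_username", "128.196.142.48")
    # shlex.join (Python 3.8+) renders the list as a shell-safe string.
    print(shlex.join(command))
```

The list form can be passed to `subprocess.run` unchanged, which avoids shell quoting problems.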

Terminating an Instance Click instance to terminate in the My Instances list. Either Click the Terminate Instance icon in your My Instances list or Click the Terminate Instance button on the Data tab. Click OK to the warning message.

Requesting More Resources Enter the amount of resources you are requesting. Enter the justification for the request. Click the Request Resources button (right side of page). Your request will be reviewed and you will receive a response within 2 working days.

Reporting an Instance Problem Select the instance which you are having problems with. Click Report Instance. Fill out the Instance Error form. When finished, press the Report this Instance button.

Dealing with technical challenge (Firewall issues)

Logging in via VNC Airport VNC runs a built-in Java VNC viewer from a web browser within the Atmosphere Airport interface and requires Java. This is the more common use. Select the VNC tab If prompted, allow the Java applet to run In the VNC Server field, enter the IP address for your instance, appending :1 after the IP address (should be auto-populated already). Press connect.
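
The `:1` appended to the IP address is a VNC display number. By general VNC convention, display N listens on TCP port 5900 + N, so `:1` means port 5901; whether a given Atmosphere instance exposes exactly that port is an assumption here. The sketch below converts a `host:display` string to a port and offers a simple reachability check that can help when diagnosing firewall issues. Both function names are hypothetical.

```python
import socket
from typing import Tuple

VNC_BASE_PORT = 5900  # by convention, display N listens on TCP port 5900 + N


def parse_vnc_server(server: str) -> Tuple[str, int]:
    """Split 'host:display' into (host, tcp_port)."""
    host, _, display = server.rpartition(":")
    if not host or not display.isdigit():
        raise ValueError(f"expected host:display, got {server!r}")
    return host, VNC_BASE_PORT + int(display)


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print(parse_vnc_server("128.196.142.48:1"))
```

If `port_is_reachable` returns False from your network but the instance is running, a firewall between you and the instance is a likely cause.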

Enter your username and password

Here you have successfully logged in via VNC.

Terminating a VNC session You can terminate a VNC Viewer session either from the VNC tab in Airport or from the VNC Viewer application window. To terminate the session from Airport: Click the 'X' from the My Instances list or from the VNC tab:

Hands on exercise Launching an instance (one per team) Connecting to it (vnc and ssh) using the web browser and vnc client software Launching an application (flapjack/tablet) Installing a new application (optional) Collaborating with other users (sharing your session) Terminating the instance when you are done