Reproducible Environment for Scientific Applications (Lab session) Tak-Lon (Stephen) Wu.


Overview
Introduction
VirtualBox Prepackaged Image
Example 1: Sandbox Hadoop WordCount
Example 2: Cloud Twister WordCount
Exercises
– Sandbox Hadoop/Twister Kmeans
– Cloud Hadoop/Twister Kmeans

Motivations
Background knowledge
– Environment setting
– Different cloud infrastructure tools
– Software dependencies
– Long learning path
Can these complicated steps be automated?
Solution: Salsa Dynamic Provisioning Infrastructure (SalsaDPI)
– A batch-like program

What is SalsaDPI? (Sandbox)
Components: User, configuration file, SalsaDPI Jar, chef-solo, OS, software (S/W), applications
1. Read a configuration file and execute the software run-list
2. Install software
3. Run apps

What is SalsaDPI? (Cloud)
Components: User, configuration file, SalsaDPI Jar, Chef Client, Chef Server, and VMs (each with OS, Chef, S/W, Apps)
1. Bootstrap VMs with a configuration file
2. Retrieve configuration information and request authentication and authorization
3. Authenticated and authorized to execute the software run-list
4. VM(s) information
5. Submit application commands
6. Obtain result
* Chef architecture

What is SalsaDPI? (Cont.)
Chef features
– Install software on demand when starting VMs
– Monitor software installation progress
– Easy to use
SalsaDPI features
– Provide a configurable interface
– Automate Hadoop/Twister/other binary execution
* Chef official website:

Hands-on Session

Online Tutorial page: roduce-intro.html

Prerequisites
– Install VirtualBox on your laptop, then download and import a prepackaged image
– Set up the FutureGrid Eucalyptus environment
– Make sure the shared folder between the host and guest machines is set up correctly
# log in to FutureGrid India headnode i136
$ ssh -i ~/fg_private_key.pem
Replace ~/fg_private_key.pem with the name of your own private key file.

About Pre-packaged Image
It has the following software installed and configured under /root/software/:
– Java JDK
– Chef
– Hadoop
– Twister and ActiveMQ
– HBase
– Pig
– salsaDPI (/root/salsaDPI/)

Important Notes
If activemq.log and kahadb exist in the directory /root/software/apache-activemq-5.4.2/, please remove them. Otherwise, they will cause errors when running sandbox Twister applications.
$ cd /root/software/apache-activemq-5.4.2/
$ ls
activemq.log kahadb
$ rm -rf activemq.log kahadb
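The cleanup above can also be scripted so it is safe to repeat before every sandbox Twister run. The following is a minimal sketch, not part of SalsaDPI; the function name clean_activemq_state is hypothetical, and it only assumes the two entry names given in the note.

```python
import shutil
from pathlib import Path


def clean_activemq_state(activemq_dir):
    """Remove a stale activemq.log and kahadb if present; return the names removed."""
    removed = []
    for name in ("activemq.log", "kahadb"):
        target = Path(activemq_dir) / name
        if target.is_dir():
            # kahadb is normally a directory, so remove it recursively.
            shutil.rmtree(target)
            removed.append(name)
        elif target.exists():
            target.unlink()
            removed.append(name)
    return removed
```

Calling clean_activemq_state("/root/software/apache-activemq-5.4.2") on the prepackaged image would mirror the rm -rf command; when neither entry exists the function simply returns an empty list.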

Examples
Example 1: Sandbox Hadoop WordCount
Example 2: Cloud Twister WordCount
Goals
– Learn and modify the SalsaDPI JSON configuration file
– Run the SalsaDPI Java executable, passing it the configuration file
– n1_chef_sandbox.html
* JSON metadata format example:

Step 1. Open the Conf. File
Locate and open the configuration file.
– /root/salsaDPI/sandbox/templates/sandbox_hadoopTemplate.json
– /root/salsaDPI/sandbox/templates/sandbox_twisterTemplate.json

Step 2. Modify Conf. File
    'applicationParameters': {
        'applicationType':'Hadoop',
        'localPathOfProgramBinary':'/root/salsaDPI/apps/hadoopWordCount.jar',
        'localPathOfProgramInput':'/root/salsaDPI/input/hadoopWordCountInput.txt',
        'localPathOfBinaryDependency':'',
        'programExecuteLocation':'',
        'programArgs':'bin/hadoop jar #_JAR_# #_HDFS_INPUTDIR_# #_HDFS_OUTPUTDIR_#'
    }

A detailed description can be found here:
– n1_chef_sandbox.html
applicationParameters: A JSON object that contains the user-defined application's information.
applicationType: Type of the user-defined application; options: Hadoop or Twister.
localPathOfProgramBinary: Full path of the user-defined Hadoop or Twister compiled jar executable on the working machine.
localPathOfProgramInput: Full path of the user-defined input file on the working machine, normally a plaintext or a *.tar.gz file.
localPathOfBinaryDependency: Full path of the user-defined program dependency file on the working machine, such as the Twister Kmeans initial cluster file.
programExecuteLocation: Path of the Twister program execution script within the Twister package, such as samples/wordcount/bin or samples/kmeans/bin.
twisterInputFilesPreFix: Twister input file prefix. Refer to the provided package: for Twister WordCount the prefix is wc_data, for Twister Kmeans it is km_data.
programArgs: User-defined program execution command.
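The field descriptions above can be turned into a quick pre-flight check before running a job. This is only a sketch in Python: SalsaDPI's own validation is not described in these slides, and the choice of which keys count as required (REQUIRED_KEYS, and the two extra Twister keys) is an assumption made for illustration.

```python
# Assumed minimum set of keys; the slides do not state which fields are mandatory.
REQUIRED_KEYS = ("applicationType", "localPathOfProgramBinary", "programArgs")
ALLOWED_TYPES = ("Hadoop", "Twister")


def check_application_parameters(params):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for key in REQUIRED_KEYS:
        if not params.get(key):
            problems.append("missing or empty: " + key)
    app_type = params.get("applicationType")
    if app_type and app_type not in ALLOWED_TYPES:
        problems.append("applicationType must be Hadoop or Twister")
    if app_type == "Twister":
        # Twister jobs in the examples always set these two fields.
        for key in ("programExecuteLocation", "twisterInputFilesPreFix"):
            if not params.get(key):
                problems.append("Twister jobs need: " + key)
    return problems
```

Passing the Hadoop WordCount parameters from Step 2 yields an empty list, while a Twister block without twisterInputFilesPreFix is reported.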

Sandbox Hadoop WordCount
    {
        // Useful general variables of programArgs for the applicationParameters object:
        // #_JAR_#, #_JOB_ID_#,
        // #_HDFS_INPUTDIR_#, #_HDFS_OUTPUTDIR_#,
        // #_TWISTER_INPUTDIR_#, #_TWISTER_OUTPUTDIR_#, #_TWISTER_PARTITION_FILE_#, #_BINARY_DEPENDENCY_#

        // 'mode':'sandbox', | 'mode':'cloud',
        'mode':'sandbox',

        // chef-solo related parameters
        'chef':{'chefSoloRecipeUrls':' 'chefSoloConfFilePath':'/root/salsaDPI/solo.rb'},

        // ssh passwordless related parameters
        'ssh':{'SSHLoginUsername':'root', 'SSHPrivateKeyPath':'/root/.ssh/id_rsa' },

        // runtime software such as recipe[hadoopSandbox] or recipe[twisterSandbox]
        'softwareRecipes':['recipe[hadoopSandbox]'], // please don't change this line

        // user-defined application parameters
        'applicationParameters':{
            'applicationType':'Hadoop',
            'localPathOfProgramBinary':'/root/salsaDPI/apps/hadoopWordCount.jar',
            'localPathOfProgramInput':'/root/salsaDPI/input/hadoopWordCountInput.txt',
            'localPathOfBinaryDependency':'',
            'programExecuteLocation':'',
            'programArgs':'bin/hadoop jar #_JAR_# #_HDFS_INPUTDIR_# #_HDFS_OUTPUTDIR_#'}
    }
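The programArgs value relies on tokens such as #_JAR_# and #_HDFS_INPUTDIR_#, which SalsaDPI fills in at run time. How SalsaDPI performs that substitution is not shown in the slides; the sketch below only illustrates the idea in Python, and the function name expand_program_args and the example values are hypothetical.

```python
import re

# Matches tokens written as #_NAME_#, where NAME uses capitals and underscores.
TOKEN = re.compile(r"#_([A-Z_]+?)_#")


def expand_program_args(program_args, values):
    """Replace each #_NAME_# token with values['NAME']; unknown tokens stay as they are."""
    def substitute(match):
        return values.get(match.group(1), match.group(0))
    return TOKEN.sub(substitute, program_args)


# Hypothetical values, chosen only to show the shape of the result.
command = expand_program_args(
    "bin/hadoop jar #_JAR_# #_HDFS_INPUTDIR_# #_HDFS_OUTPUTDIR_#",
    {"JAR": "hadoopWordCount.jar", "HDFS_INPUTDIR": "/input", "HDFS_OUTPUTDIR": "/output"},
)
print(command)  # bin/hadoop jar hadoopWordCount.jar /input /output
```

Leaving unknown tokens untouched makes a missing value easy to spot in the expanded command instead of silently producing an empty argument.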

Step 3. Execute SalsaDPI with Conf.
Execute SalsaDPI with the commands:
$ cd ~/salsaDPI
$ java -cp salsaDPI.jar cgl.salsa.salsadpi.Driver
The output will be stored at /salsaDPI_output/ /output/*.

Demo
Demo video
– Video hands-on 1: Sandbox Hadoop WordCount
– YouTube link (1080p)

Examples
Example 1: Sandbox Hadoop WordCount
Example 2: Cloud Twister WordCount
Goals
– Make sure FutureGrid Eucalyptus is set up and the required files are downloaded correctly
– Learn and modify the SalsaDPI JSON configuration file
– Run the SalsaDPI Java executable, passing it the configuration file
– _cloud.html
– For live testing, please make sure your name is here
* JSON metadata format example:

Step 1. Open the Conf. File
Locate and open the configuration file.
– /root/salsaDPI/cloud/templates/cloud_hadoopTemplate.json
– /root/salsaDPI/cloud/templates/cloud_twisterTemplate.json

Step 2. Modify Conf. File
    'eucaInfo':{
        'eucarcFilePath':'#_FullPath_to_eucarc_File_#',
        'eucaImageEmi':'emi-A8F63C29',
        'eucaSSHPublicKey':'#_Euca_Keypair_PublicKeyName_#',
        'eucaVmType':'m1.small',
        'amountOfInstances':2
    },

Step 2. Modify Conf. File (Cont.)
    'ssh': {
        'SSHLoginUsername':'root',
        'SSHPrivateKeyPath':'/root/#_yourPrivatekey_FileName_#'
    },

Step 2. Modify Conf. File (Cont.)
    'applicationParameters': {
        'applicationType':'Twister',
        'localPathOfProgramBinary':'/root/salsaDPI/apps/Twister-WordCount-0.9.jar',
        'localPathOfProgramInput':'/root/salsaDPI/input/twisterWordCountInput.tar.gz',
        'localPathOfBinaryDependency':'',
        'programExecuteLocation':'samples/wordcount/bin',
        'twisterInputFilesPreFix':'wc_data',
        'programArgs':'./run_wc.sh #_TWISTER_PARTITION_FILE_# #_TWISTER_OUTPUTDIR_#/wc.out 4 1'
    }

A detailed description can be found here:
– n2_chef_cloud.html
eucaInfo: A JSON object that contains the Eucalyptus-related information for cloud mode: 'eucarcFilePath', 'eucaImageEmi', 'eucaSSHPublicKey', 'eucaVmType', and 'amountOfInstances'.
eucarcFilePath: Full path to the downloaded eucarc file.
eucaImageEmi: Eucalyptus VM image registered on FutureGrid, e.g. emi-52C93AC2.
eucaSSHPublicKey: Eucalyptus public key name (which you set up during the FutureGrid Eucalyptus setup).
eucaVmType: Eucalyptus VM type, e.g. c1.medium.
amountOfInstances: Number of instances for this job, e.g. 2.
ssh: A JSON object that contains the ssh information: SSHLoginUsername and SSHPrivateKeyPath.
SSHLoginUsername: ssh login username; for cloud mode, it must be root.
SSHPrivateKeyPath: Full path to the ssh private key used to log in to the VM.
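The cloud-mode fields described above lend themselves to a small consistency check before any VMs are requested. The following Python sketch is illustrative only: the function name check_cloud_sections is hypothetical, and treating all five eucaInfo keys as mandatory is an assumption based on the list above rather than on SalsaDPI's code.

```python
EUCA_KEYS = (
    "eucarcFilePath",
    "eucaImageEmi",
    "eucaSSHPublicKey",
    "eucaVmType",
    "amountOfInstances",
)


def check_cloud_sections(config):
    """Return a list of problems in the eucaInfo and ssh sections of a cloud config."""
    problems = []
    euca = config.get("eucaInfo", {})
    for key in EUCA_KEYS:
        if key not in euca:
            problems.append("eucaInfo is missing " + key)
    count = euca.get("amountOfInstances")
    if count is not None and (not isinstance(count, int) or count < 1):
        problems.append("amountOfInstances must be a positive integer")
    ssh = config.get("ssh", {})
    # The description states that the login user must be root in cloud mode.
    if ssh.get("SSHLoginUsername") != "root":
        problems.append("SSHLoginUsername must be root in cloud mode")
    if not ssh.get("SSHPrivateKeyPath"):
        problems.append("ssh is missing SSHPrivateKeyPath")
    return problems
```

A configuration that follows the Step 2 snippets, with the placeholders replaced by real values, should produce an empty list.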

Step 3. Execute SalsaDPI with Conf.
Execute SalsaDPI with the commands:
$ cd ~/salsaDPI
$ java -cp salsaDPI.jar cgl.salsa.salsadpi.Driver
The output will be stored at /salsaDPI_output/ /output/*.

Cloud Twister WordCount
    {
        // Useful general variables of programArgs for the applicationParameters object:
        // #_JAR_#, #_JOB_ID_#,
        // #_HDFS_INPUTDIR_#, #_HDFS_OUTPUTDIR_#,
        // #_TWISTER_INPUTDIR_#, #_TWISTER_OUTPUTDIR_#, #_TWISTER_PARTITION_FILE_#, #_BINARY_DEPENDENCY_#

        // 'mode':'sandbox', | 'mode':'cloud',
        'mode':'cloud',

        // euca cloud parameters
        'eucaInfo':{
            'eucarcFilePath':'/root/eucarc',
            'eucaImageEmi':'emi-A8F63C29',
            'eucaSSHPublicKey':'stephen',
            'eucaVmType':'m1.small',
            'amountOfInstances':2},

        // ssh passwordless related parameters
        'ssh':{'SSHLoginUsername':'root', 'SSHPrivateKeyPath':'/root/stephen.pem' },

        // runtime software such as recipe[hadoopSandbox], recipe[twisterSandbox],
        // recipe[hadoopCloud], and recipe[twisterCloud]
        'softwareRecipes':['recipe[twisterCloud]'],

        // user-defined application parameters
        'applicationParameters':{
            'applicationType':'Twister',
            'localPathOfProgramBinary':'/root/salsaDPI/apps/Twister-WordCount-0.9.jar',
            'localPathOfProgramInput':'/root/salsaDPI/input/twisterWordCountInput.tar.gz',
            'localPathOfBinaryDependency':'',
            'programExecuteLocation':'samples/wordcount/bin',
            'twisterInputFilesPreFix':'wc_data',
            'programArgs':'./run_wc.sh #_TWISTER_PARTITION_FILE_# #_TWISTER_OUTPUTDIR_#/wc.out 4 1'}
    }
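Because the configuration files use single-quoted strings and // comments, a strict JSON parser will reject them. To inspect such a file from a script, one option is to strip the comments and evaluate the rest as a Python literal. This is a sketch under assumptions: it says nothing about how SalsaDPI itself parses the file, it expects each comment to end at a line break, and it does not handle escaped quotes inside strings. The names strip_line_comments and load_config are hypothetical.

```python
import ast


def strip_line_comments(text):
    """Drop // comments that start outside single-quoted strings."""
    out = []
    in_string = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "'":
            in_string = not in_string
            out.append(ch)
            i += 1
        elif not in_string and text.startswith("//", i):
            # Skip to the end of the current line; the newline itself is kept.
            while i < len(text) and text[i] != "\n":
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def load_config(text):
    """Parse a single-quoted, //-commented config into Python dicts, lists, and scalars."""
    return ast.literal_eval(strip_line_comments(text))
```

Tracking whether the scanner is inside a string matters because a value such as a recipe URL contains // and must not be cut off.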

Demo
Demo video
– Video hands-on 2: Cloud Twister WordCount
– YouTube link (1080p)

Twister Kmeans
Modify a Sandbox/Cloud conf. file for Twister Kmeans. Below are hints for the Twister Kmeans conf. file.
    'applicationParameters':{
        'applicationType':'Twister',
        'localPathOfProgramBinary':'#_FullPath_To_TwisterKmeans_JAR_#',
        'localPathOfProgramInput':'#_FullPath_To_TwisterKmeans_Inputs_GZ_File_#',
        'localPathOfBinaryDependency':'#_FullPath_To_TwisterKmeans_InitClusterFile_#',
        'programExecuteLocation':'samples/kmeans/bin',
        'twisterInputFilesPreFix':'km_data',
        'programArgs':'./run_kmeans.sh #_BINARY_DEPENDENCY_# 80 #_TWISTER_PARTITION_FILE_# > #_TWISTER_OUTPUTDIR_#/#_JOB_ID_#.txt'
    }
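In the Twister Kmeans hint, the path fields still hold template placeholders such as #_FullPath_To_TwisterKmeans_JAR_# that you must replace by hand, whereas the tokens inside programArgs are meant to stay. A short Python sketch can list the fields that were forgotten; the function name unfilled_fields is hypothetical, and skipping programArgs is an assumption drawn from the examples in these slides.

```python
import re

# Any #_..._# token; used to spot template placeholders left in a value.
PLACEHOLDER = re.compile(r"#_\w+?_#")


def unfilled_fields(params):
    """Return the sorted names of fields, other than programArgs, that still hold a placeholder."""
    return sorted(
        key
        for key, value in params.items()
        if key != "programArgs"
        and isinstance(value, str)
        and PLACEHOLDER.search(value)
    )
```

Running it on the hint as given reports the three localPath fields, and on a fully edited block it returns an empty list.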

Hadoop Kmeans
Modify a Sandbox/Cloud conf. file for Hadoop Kmeans. The snippet below provides hints for the Kmeans programArgs.
    'applicationParameters': {
        'applicationType':'Hadoop',
        'localPathOfProgramBinary':'#_Path_HadoopKmeans_Jar_#',
        'localPathOfProgramInput':'',
        'localPathOfProgramDB':'',
        'localPathOfBinaryDependency':'',
        'programExecuteLocation':'',
        'programArgs':'bin/hadoop jar #_JAR_# #_JOB_ID_# > ~/#_JOB_ID_#/#_JOB_ID_#.txt'
    }

Thank you

Cloud Hadoop WordCount
    {
        // mode = 'cloud'
        'mode':'cloud',

        // euca cloud parameters
        'eucaInfo':{
            'eucarcFilePath':'/root/eucarc',
            'eucaImageEmi':'emi-A8F63C29',
            'eucaSSHPublicKey':'stephen', // replace stephen with your public key name
            'eucaVmType':'m1.small',
            'amountOfInstances':2},

        'ssh':{'SSHLoginUsername':'root', 'SSHPrivateKeyPath':'/root/stephen.pem'}, // replace stephen.pem with your private key

        'softwareRecipes':['recipe[hadoopCloud]'],

        'applicationParameters':{
            'applicationType':'Hadoop',
            'localPathOfProgramBinary':'/root/salsaDPI/apps/hadoopWordCount.jar',
            'localPathOfProgramInput':'/root/salsaDPI/input/hadoopWordCountInput.txt',
            'localPathOfProgramDB':'',
            'programExecuteLocation':'',
            'programArgs':'bin/hadoop jar #_JAR_# #_HDFS_INPUTDIR_# #_HDFS_OUTPUTDIR_#'}
    }

Demo Cloud Hadoop WordCount /salsaDPI/cloudHadoopWordCount.wmv

Sandbox Twister WordCount
    {
        // mode = 'sandbox'
        'mode':'sandbox',

        // chef solo parameters
        'chef':{'chefSoloRecipeUrls':' 'chefSoloConfFilePath':'/root/solo.rb'},

        'ssh':{'SSHLoginUsername':'root', 'SSHPrivateKeyPath':'/root/.ssh/id_rsa'},

        'softwareRecipes':['recipe[twisterSandbox]'],

        'applicationParameters':{
            'applicationType':'Twister',
            'localPathOfProgramBinary':'/root/salsaDPI/apps/Twister-WordCount-0.9.jar',
            'localPathOfProgramInput':'/root/salsaDPI/input/twisterWordCountInput.tar.gz',
            'localPathOfBinaryDependency':'',
            'localPathOfProgramDB':'',
            'programExecuteLocation':'samples/wordcount/bin',
            'twisterInputFilesPreFix':'wc_data',
            'programArgs':'./run_wc.sh #_TWISTER_PARTITION_FILE_# #_TWISTER_OUTPUTDIR_#/wc.out 4 1'}
    }

Demo Sandbox Twister WordCount /salsaDPI/sandBoxTwisterWordCount.wmv