Tools and Services Workshop: Overview of the iPlant Data Store

iPlant Collaborative Tools and Services Workshop
Presenter's notes: This PowerPoint is designed for a 25-minute presentation (5 minutes of introduction, slides 2-6; 20 minutes for the simple hands-on lab, slides 7-X).

What is "Big Data"?
Big data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target, currently ranging from a few dozen terabytes to many petabytes of data in a single data set. (Wikipedia, http://en.wikipedia.org/wiki/Big_data)
Notes: Wikipedia gives a nice definition. It's important to note that big data is not just a problem in biology, which means we can learn from other disciplines coping with the same problems.

High-Throughput Biology (Not Just Sequence Data)
Genotype: in 11 days, one run generates 4 TB of raw data, 600,000,000,000 bases of DNA sequence (200 human genomes).
Phenotype: in 1 day, 30 camera sets record ~200 movies of dynamic root growth, 4 GB a day.
Notes: High-throughput (HT) data sources in biology are not limited to sequence data. Besides HT phenotyping, many other areas of biology (proteomics, metabolomics, etc.) produce big data.
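The slide's figures can be turned into daily rates with a little arithmetic. This sketch assumes decimal units (1 TB = 1000 GB) and a constant average rate; real instruments do not produce data that evenly, and the one-year projection for root imaging is a hypothetical illustration, not a number from the slide.

```python
# Back-of-the-envelope rates implied by the slide's numbers.
# Assumptions: decimal units (1 TB = 1000 GB) and a constant daily average.

SEQ_RAW_TB = 4.0              # raw data from one 11-day sequencing run (slide)
SEQ_RUN_DAYS = 11             # length of that run (slide)
SEQ_BASES = 600_000_000_000   # bases of DNA sequence produced (slide)
HUMAN_GENOMES = 200           # genome equivalents quoted on the slide
ROOT_GB_PER_DAY = 4.0         # root-growth movies from 30 camera sets (slide)


def gb_per_day(total_tb: float, days: float) -> float:
    """Average output in GB/day for `total_tb` terabytes produced over `days` days."""
    return total_tb * 1000 / days


def bases_per_genome(total_bases: int, genomes: int) -> float:
    """Bases per genome implied by the slide's '200 human genomes' comparison."""
    return total_bases / genomes


def tb_per_year(daily_gb: float) -> float:
    """Hypothetical yearly volume if a daily rate were sustained for 365 days."""
    return daily_gb * 365 / 1000


if __name__ == "__main__":
    print(f"Sequencer average: {gb_per_day(SEQ_RAW_TB, SEQ_RUN_DAYS):.0f} GB/day")
    print(f"Bases per genome: {bases_per_genome(SEQ_BASES, HUMAN_GENOMES):,.0f}")
    print(f"Root imaging over a year: {tb_per_year(ROOT_GB_PER_DAY):.2f} TB")
```

The sequencer averages about 364 GB per day, and 3 billion bases per genome matches the approximate size of a human genome, so the slide's comparison is self-consistent.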

What makes big data different?
Why isn't saving, moving, or copying big data as simple as using the tools we already have?

Quantitative changes in scale introduce qualitative differences and complications.
Notes: We use various technologies in the context of our experimental designs. While the principles stay the same, as the scale we operate on changes we must accommodate newly introduced complications: for example, now you need a light source, now you need to do calibrations, now you need to spend a lot of money!

Some Complications of Big Data
Difficult or slow transfers; expense of storage and backup; difficulty sharing and publishing; metadata; analysis.

Scalable, Reliable, Redundant, High-performance
Access your data from multiple iPlant services. Automatic data backup (redundant between the University of Arizona and the University of Texas). Multiple ways to share data with collaborators. Multi-threaded, high-speed transfers. Default 100 GB allocation; allocations above 1 TB are available with justification. (Slide logos: TeraGrid, XSEDE)
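Before uploading a large data set, it can help to check it against the default allocation. A minimal local sketch, assuming decimal gigabytes and that you already know roughly how much of your allocation is in use; it does not query the Data Store for your actual quota, and the helper names are hypothetical.

```python
import os

# Default allocation quoted on the slide (decimal GB assumed).
# Allocations above 1 TB are granted only with a justification.
DEFAULT_ALLOCATION_BYTES = 100 * 1000**3


def directory_size(root: str) -> int:
    """Total bytes of regular files under `root`, skipping symbolic links."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path) and os.path.isfile(path):
                total += os.path.getsize(path)
    return total


def would_fit(new_bytes: int, used_bytes: int = 0,
              allocation: int = DEFAULT_ALLOCATION_BYTES) -> bool:
    """True if adding `new_bytes` to `used_bytes` stays within `allocation`."""
    return used_bytes + new_bytes <= allocation
```

For example, `would_fit(directory_size("my_run/"), used_bytes=current_usage)` checks a hypothetical local folder `my_run/`, where `current_usage` is whatever usage figure you have for your account.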

iRODS is an open-source data management system. iRODS supports many data-intensive projects, such as NSF TeraGrid and the Large Synoptic Survey Telescope.

There are multiple ways to access the Data Store: through the Discovery Environment; the Davis web interface (data.iplantcollaborative.org); WebDAV; the iDrop stand-alone client; iCommands; and iRODS FUSE (a mounted volume in a Linux environment).
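Because every access route talks to the same iRODS server, an iRODS client library can reach the Data Store too. The sketch below assumes the third-party python-irodsclient package (not one of the access methods listed on the slide) and reuses the host, port, and zone from the iCommands demo in this lab; the user name and home path are hypothetical.

```python
IPLANT_HOST = "data.iplantcollaborative.org"  # host from the iCommands demo
IPLANT_PORT = 1247                            # port from the iCommands demo
IPLANT_ZONE = "iplant"                        # zone from the iCommands demo


def open_session(user: str, password: str):
    """Open an iRODS session with python-irodsclient (assumed to be installed)."""
    # Imported here so list_collection() can be used without the package.
    from irods.session import iRODSSession
    return iRODSSession(host=IPLANT_HOST, port=IPLANT_PORT,
                        user=user, password=password, zone=IPLANT_ZONE)


def list_collection(session, path: str) -> list:
    """Rough equivalent of `ils`: sub-collections end in '/', files show their size."""
    coll = session.collections.get(path)
    entries = [sub.name + "/" for sub in coll.subcollections]
    entries += [f"{obj.name}\t{obj.size}" for obj in coll.data_objects]
    return entries


# Hypothetical usage:
# with open_session("alice", "secret") as session:
#     for line in list_collection(session, "/iplant/home/alice"):
#         print(line)
```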

Some important items we won't see in the demo: Replication
Notes: You won't see these items because you shouldn't have to; they work for you in the background, but they are worth reviewing. How the Data Store infrastructure is laid out:
1. Replication between Arizona and Texas.
2. Each site is connected to local computing resources (AZ-GriD, the TX supercomputer).
3. Both sites are connected to cloud computing resources.
4. All transfers from outside go to Arizona and are replicated to Texas.
5. Texas is often busy with many file transfers because of its supercomputing facility.
Slide labels: Arizona, Texas. A key component of your NSF data management. Worry free!
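The write path in item 4 (external writes land in Arizona, then copy to Texas) can be pictured with a toy model. This is a conceptual illustration only: the class and method names are hypothetical, and the real Data Store performs replication inside iRODS, not in client code.

```python
from collections import deque


class ReplicatedStore:
    """Toy primary/replica store: writes go to the primary, copies are queued."""

    def __init__(self):
        self.primary = {}       # "Arizona" copy: path -> bytes
        self.replica = {}       # "Texas" copy: path -> bytes
        self.pending = deque()  # paths written but not yet replicated

    def put(self, path: str, data: bytes) -> None:
        """Accept an external write on the primary and queue it for replication."""
        self.primary[path] = data
        self.pending.append(path)

    def replicate(self, max_items=None) -> int:
        """Copy queued objects to the replica; a busy replica may take only a few."""
        done = 0
        while self.pending and (max_items is None or done < max_items):
            path = self.pending.popleft()
            self.replica[path] = self.primary[path]
            done += 1
        return done

    def get(self, path: str) -> bytes:
        """Read from the primary, falling back to the replica if the primary lost it."""
        if path in self.primary:
            return self.primary[path]
        return self.replica[path]
```

The queue captures why a busy replica site (item 5) lags behind without blocking uploads: writes succeed on the primary immediately and the copy catches up later.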

Transfer times by copy method:
Source | Destination | Copy method | Time (seconds)
CD | My Computer | cp | 320
Berkeley Server | | scp | 150
External Drive | | | 36
USB2.0 Flash | | | 30
iDS | My Computer | iget | 18 / 15
Close to optimum conditions, a transfer between the University of Arizona and UC Berkeley moved 100 GB in 29 m 15 s (1 GB per 17.5 seconds).
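The "1 GB per 17.5 seconds" figure follows directly from the 100 GB run. A small sketch of that conversion, assuming decimal units (1 GB = 1000 MB) and 8 bits per byte:

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes/seconds duration to seconds."""
    return minutes * 60 + seconds


def throughput(size_gb: float, seconds: float) -> tuple:
    """Return (seconds per GB, MB/s, Mbit/s) for a transfer, in decimal units."""
    sec_per_gb = seconds / size_gb
    mb_per_s = size_gb * 1000 / seconds
    return sec_per_gb, mb_per_s, mb_per_s * 8


if __name__ == "__main__":
    elapsed = to_seconds(29, 15)  # the 100 GB Arizona to Berkeley transfer
    s_gb, mbs, mbits = throughput(100, elapsed)
    print(f"{elapsed} s total, {s_gb:.2f} s/GB, {mbs:.1f} MB/s, {mbits:.0f} Mbit/s")
```

This prints `1755 s total, 17.55 s/GB, 57.0 MB/s, 456 Mbit/s`, consistent with the slide's rounded 17.5 seconds per gigabyte.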

One complication of big data transfers is that you will always be limited by your local connection and by institutional policies. Test your connection at http://www.speedtest.net/
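A speed test reports bandwidth in megabits per second; dividing a file's size in bits by that rate gives a best-case transfer time. A sketch assuming decimal units; the `efficiency` factor is a hypothetical knob for protocol overhead and shared links, not a measured value.

```python
def transfer_seconds(size_gb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Best-case seconds to move `size_gb` GB over a `link_mbps` Mbit/s link."""
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_gb * 1000**3 * 8  # decimal GB to bits
    return bits / (link_mbps * 1_000_000 * efficiency)


if __name__ == "__main__":
    # A 4 GB lab file over a hypothetical 20 Mbit/s connection.
    print(f"{transfer_seconds(4, 20) / 60:.1f} minutes")
```

This prints `26.7 minutes`: no server-side speed can beat the slowest link between you and the Data Store.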

iPlant Data Store Hands-on Lab

iPlant Data Store Lab
By the end of this module you should be able to: upload "large" (3-4 GB) files into the DE; import "large" (3-4 GB) files into the DE using a URL; understand metadata and annotate a file using the AVU (attribute-value-unit) format; share your data with another colleague or user; and get started with iCommands (a command-line interface).
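In the AVU format, each piece of metadata is an attribute-value-unit triple attached to a file or folder. A minimal sketch of that data model; it assumes the `imeta add -d` iCommand as the command-line counterpart of annotating a file in the DE, and the attribute names and file path are hypothetical examples.

```python
import shlex
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AVU:
    """One metadata triple: attribute, value, and an optional unit."""
    attribute: str
    value: str
    unit: Optional[str] = None

    def __post_init__(self):
        # This sketch requires a non-empty attribute and value; the unit is optional.
        if not self.attribute or not self.value:
            raise ValueError("AVU attribute and value must be non-empty")

    def imeta_args(self, path: str) -> List[str]:
        """Argument list for `imeta add -d`, attaching this AVU to a data object."""
        args = ["imeta", "add", "-d", path, self.attribute, self.value]
        if self.unit:
            args.append(self.unit)
        return args


if __name__ == "__main__":
    target = "/iplant/home/alice/reads_R1.fastq"  # hypothetical file
    for avu in (AVU("organism", "Arabidopsis thaliana"),
                AVU("read_length", "100", "bp")):
        print(shlex.join(avu.imeta_args(target)))
```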

Goal: import files into the Data Store, annotate them with metadata, and share them with a colleague.
Task 1: Import a file into the DE from a URL.
Task 2: Import a "large" file using iDrop in the DE.
Task 3: Mark up your files with metadata.
Task 4: Share your data with a colleague or other user.
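The lab performs Tasks 2 and 4 in the DE, but the same steps have iCommands counterparts. A sketch that builds those commands, assuming iCommands are installed and `iinit` has been run; `iput -P -K -N` (progress, checksum verification, thread count) and `ichmod` are real iCommands, while the user names, paths, and helper functions are hypothetical.

```python
import shlex
import subprocess
from typing import List, Optional

# Permission levels accepted by the `ichmod` iCommand.
ICHMOD_LEVELS = {"null", "read", "write", "own"}


def iput_args(local_path: str, dest: str, threads: Optional[int] = None) -> List[str]:
    """Upload with progress (-P) and checksum verification (-K); -N sets threads."""
    args = ["iput", "-P", "-K"]
    if threads is not None:
        args += ["-N", str(threads)]
    return args + [local_path, dest]


def ichmod_args(level: str, user: str, path: str) -> List[str]:
    """Grant `user` access to `path` ('null' removes it)."""
    if level not in ICHMOD_LEVELS:
        raise ValueError(f"unknown permission level: {level!r}")
    return ["ichmod", level, user, path]


def run(args: List[str], dry_run: bool = True) -> str:
    """Return the shell-quoted command; execute it only when dry_run is False."""
    if not dry_run:
        subprocess.run(args, check=True)
    return shlex.join(args)


if __name__ == "__main__":
    home = "/iplant/home/alice"  # hypothetical user
    print(run(iput_args("big_reads.fastq", home, threads=4)))
    print(run(ichmod_args("read", "bob", f"{home}/big_reads.fastq")))
```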

Please log in to the Discovery Environment. Follow along with the instructor, or follow along with the handouts on your own. Do the lab now!

Quick iCommands demo
Commands demonstrated: iinit, ils, iget, iexit
iinit prompts and answers:
Enter the host name (DNS) of the server to connect to: data.iplantcollaborative.org
Enter the port number: 1247
Enter your irods user name: <your iplant login name>
Enter your irods zone: iplant
Enter your current iRODS password: <your iplant password>
Notes: I will be doing a demo using iCommands for Windows. There are other options, but this part requires software installation. Learn more in the online documentation: http://www.iplantcollaborative.org/w_icmds
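In the demo, iinit asks for connection details interactively. With an iRODS 4.x-style client (an assumption; the demo-era client may store settings differently), the same answers can be pre-seeded in a JSON environment file so only the password is prompted. A sketch using the standard 4.x key names; the password is deliberately not stored.

```python
import json
from pathlib import Path
from typing import Optional


def irods_environment(user: str, host: str = "data.iplantcollaborative.org",
                      port: int = 1247, zone: str = "iplant") -> dict:
    """Answers to iinit's prompts, keyed as in an iRODS 4.x environment file."""
    return {
        "irods_host": host,
        "irods_port": port,
        "irods_user_name": user,
        "irods_zone_name": zone,
    }


def write_environment(env: dict, path: Optional[Path] = None) -> Path:
    """Write `env` as JSON, by default to ~/.irods/irods_environment.json."""
    path = path or Path.home() / ".irods" / "irods_environment.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(env, indent=2))
    return path
```

After `write_environment(irods_environment("alice"))` for a hypothetical user `alice`, running `iinit` should ask only for the password.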

iPlant Supports the Life Cycle of Data
[Diagram labels: Markup, Search, Store, Transfer, Share, Collaborate, Visualize, Analyze; Pre-Publication, Post-Publication; Data, Algo1, Algo2, Results A, Results B]
Notes: You've seen an overview of how iPlant manages the life cycle of data.