The iPlant Collaborative Community Cyberinfrastructure for Life Science Tools and Services Workshop iPlant Data Store – Managing Your ‘Big’ Data.

Welcome to the iPlant Data Store
Manage and share your data across iPlant's tools and services.

Working with Big Data
Challenges: the scope and scale of life sciences data continue to grow.
- Big Data: data sets whose size and complexity are beyond the capabilities of commonly used tools to capture, manage, and process the data within a tolerable time frame.
- Big Data: a constantly moving target, currently ranging from a few dozen terabytes to many petabytes of data in single data sets, with different types of data sets potentially deeply intertwined.
Source: Wikipedia

Working with Big Data
Challenges (sequencing example): data generation is cheaper and faster.

Working with Big Data
Challenges: biology encompasses more than sequence data. Biologists work with diverse data types:
- Advanced imaging
- Geospatial
- Network

Working with Big Data
Challenges: changes in data require changes in tools. Changes in scale introduce quantitative and qualitative challenges:
- Transfer: difficult/slow
- Store/backup: expensive
- Share/publish: all of the above
- Analyze: tools are complex; data must be formatted and reformatted
- Understand: don't forget your metadata

iPlant Data Store Overview
The Data Store services all iPlant platforms:
- Access your data from multiple iPlant services
- Automatic backup (redundant between University of Arizona and University of Texas)
- Default 100 GB allocation; allocations larger than 1 TB available with justification

iPlant Data Store Overview
iRODS is critical for effective data management and works under the hood:
- iRODS (integrated Rule-Oriented Data System) is an established, scalable, open-source data management system
- iRODS supports many data-intensive projects
- iRODS abstracts data services from data storage to facilitate executing services across heterogeneous, distributed storage systems
- Avoid reinventing the wheel
- Terminology: Folder = Collection

iPlant Data Store Overview
Benefits: get science done, reproducibility, productivity.
- Store files of any type related to your research
- Access key data sets in the "Data Commons"
- Capture data about data in metadata
- Base your data management plan on iPlant's automatic backup and accessibility
- Take advantage of iRODS high-speed transfers (100 GB in ~30 min)*
- Share any data instantly from within iPlant

iPlant Data Store Overview
Multiple ways to access for varied skill levels:
- Point-and-click: Discovery Environment, iDrop Desktop
- Command line: iCommands

iPlant Data Store Overview
Some important things we will not "see" in the demo:

Data Backups
- Replication between Arizona and Texas
- Key component of your data management
- Worry free!

Data Transfers

Source              Destination    Copy Method    Time (seconds)
CD                  My Computer    cp             320
Berkeley Server     My Computer    scp            150
External Drive      My Computer    cp             36
USB 2.0 Flash       My Computer    cp             30
iPlant Data Store   My Computer    iget           18
My Computer                        cp             15

Close to optimum conditions; transfer between Univ. of Arizona and UC Berkeley. 100 GB: 29m15s, or 1 GB per 17.5 seconds.
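The per-gigabyte figure quoted on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are made up here, and the megabytes-per-second conversion assumes 1 GB = 1000 MB, which the slide does not specify.

```python
def seconds_per_gb(total_gb, minutes, seconds):
    """Average number of seconds needed to move one gigabyte."""
    total_seconds = minutes * 60 + seconds
    return total_seconds / total_gb


def throughput_mb_per_s(total_gb, minutes, seconds, mb_per_gb=1000):
    """Average throughput in MB/s (assumes 1 GB = 1000 MB unless told otherwise)."""
    total_seconds = minutes * 60 + seconds
    return total_gb * mb_per_gb / total_seconds


# Figures from the slide: 100 GB transferred in 29 minutes 15 seconds.
rate = seconds_per_gb(100, 29, 15)
speed = throughput_mb_per_s(100, 29, 15)
print(f"{rate:.2f} seconds per GB, about {speed:.0f} MB/s")
```

The result, roughly 17.5 seconds per gigabyte, agrees with both the quoted rate and the 18-second iget row in the table.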

iPlant Data Store Overview
Some important things we will not "see" in the demo: local connections and institutional policies limit data transfers.

iPlant Data Store Overview
User perspectives and potential applications
Personas: Bench Scientist, Bioinformatician, Core Facilities
- Uploads all of his fastq files along with 50 GB of root growth videos
- Shares his analysis results with his thesis advisor
- Creates a metadata template for assembled genomes her students and collaborators will place in a shared folder
- Uses public links in supplemental materials for her publications
- Develops a script to automate transfer of data to core users
- Uses a shared folder to make large datasets accessible
Images from personas based on: Bioinformatics Curriculum Guidelines: Toward a Definition of Core Competencies PLOS Biology DOI: /journal.pcbi

Hands-on demo iPlant Data Store Overview

Hands-on demo: Managing "Big" Data
By the end of this demo you will know how to:
- Import files from a URL
- Upload/download large files
- Share data
- View and manage file metadata

iPlant Data Store Overview
Hands-on demo: Managing "Big" Data
Packet pages 8-13 (iDrop, Sharing) & (URL Import, iDrop Lite, Managing and Adding Metadata)
Demo Components
1. Upload/download files: via the DE, via iDrop, via iCommands
2. Import files from a URL
3. Share data via a public link via the DE (practice with a neighbor)
4. View, generate, and manage metadata
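For the iCommands route in step 1, uploads and downloads use the iRODS commands iput and iget, each of which accepts -r for recursive transfers of folders. The sketch below only assembles the command lines and does not run them; the helper names and the collection path /zone/home/username are hypothetical placeholders, not values from the workshop.

```python
import shlex


def build_iput(local_path, collection, recursive=False):
    """Build an argument list for uploading with the iRODS `iput` command."""
    cmd = ["iput"]
    if recursive:
        cmd.append("-r")  # recurse into folders
    cmd += [local_path, collection]
    return cmd


def build_iget(data_path, local_path, recursive=False):
    """Build an argument list for downloading with the iRODS `iget` command."""
    cmd = ["iget"]
    if recursive:
        cmd.append("-r")  # recurse into collections
    cmd += [data_path, local_path]
    return cmd


def as_shell(cmd):
    """Render an argument list as a safely quoted shell command string."""
    return " ".join(shlex.quote(part) for part in cmd)


# Hypothetical paths, shown only to illustrate the shape of the commands.
upload = build_iput("fastq_run1", "/zone/home/username/fastq_run1", recursive=True)
print(as_shell(upload))
```

With iCommands installed and configured, the resulting list could be passed to subprocess.run; building it separately makes it easy to review a transfer before starting it.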

Time for Summaries and Tips? iPlant Data Store Overview

Searching in the Discovery Environment
- Basic search bar searches all files and folders where you have permission
- Advanced search allows searching based on metadata, permissions, and share status
- Create auto-updated 'smart' folders based on searches

Summary: Upload and Download
In the Discovery Environment:
- 'Simple', for small files (~5 files, <1.9 GB)
- 'Bulk', for larger files and folders (<10 GB)
- Import from URL (no size limit)
Advantages (+):
- Covers most upload/download/sharing needs
- Point and click
Disadvantages (-):
- Some size/speed limitations
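The size guidance on this slide can be expressed as a small decision helper. This is only a sketch: the function name is invented here, and it assumes the 1.9 GB and 10 GB figures apply to the total size of one upload, which the slide does not state explicitly.

```python
def suggest_de_upload(num_files, total_gb):
    """Suggest a Discovery Environment upload mode from the slide's rough limits.

    Assumption: the limits are treated as totals for a single upload.
    """
    if num_files <= 5 and total_gb < 1.9:
        return "simple"
    if total_gb < 10:
        return "bulk"
    # Larger transfers fall outside the DE upload modes listed on the slide.
    return "too large for DE upload; consider another transfer method"


print(suggest_de_upload(3, 0.5))   # small batch of small files
print(suggest_de_upload(20, 5.0))  # many files, still under 10 GB
print(suggest_de_upload(1, 50.0))  # a single very large file
```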

Tips: Spaces and Special Characters
Many software packages are sensitive to spaces in file names and/or the special characters below. Rename uploaded files before using them in an analysis. This is good advice for any transfer method.
~ ` $ % ^ & * ( ) + = { } [ ] | \ : ; " ' < > , ? /
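Renaming can be automated before upload. The following is a minimal sketch, assuming an underscore is an acceptable replacement; the function name safe_name is illustrative. It is meant for a file name only, not a full path, because the slash is one of the characters it replaces.

```python
import re

# The special characters listed on the slide, plus whitespace.
SPECIAL = "~`$%^&*()+={}[]|\\:;\"'<>,?/"
_UNSAFE = re.compile("[" + re.escape(SPECIAL) + r"\s]+")


def safe_name(name, replacement="_"):
    """Replace each run of spaces or special characters with one replacement."""
    cleaned = _UNSAFE.sub(replacement, name)
    # Drop replacements left dangling at either end of the name.
    return cleaned.strip(replacement)


print(safe_name("my data (v2).fastq"))  # my_data_v2_.fastq
print(safe_name("reads_01.fastq"))      # reads_01.fastq (unchanged)
```

Periods, hyphens, and underscores are left alone, so file extensions survive the renaming.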

Summary: Faster Transfers
iDrop:
- Drag and drop files and folders
- File sizes up to your total allocation
- Fast transfers
- Synchronize folders with the Data Store
Advantages (+):
- Upload/download large or many files
Disadvantages (-):
- Sharing and permission features are more complex

Summary: Sharing Files in the Data Store
Two easy ways to share data from the Discovery Environment:
Discovery Environment sharing
- Share files/folders instantly
- Control access permissions
- Manage sharing between collaborators
Sharing via public link
- No iPlant account required
- Limited to individual files
- URLs are public (less secure, but can be revoked)

Tips
When sharing, use this chart to decide appropriate permissions.
Actions (columns): Read, Download, Metadata, Rename, Move, Delete
Permission levels (rows): Read, Write, Own

Viewing and Editing Metadata in the DE
- User metadata is stored as AVUs (Attribute, Value, Unit)
- Template-based metadata
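An AVU is simply a triple of attribute, value, and unit. The sketch below models that shape as a small Python data class to show how such metadata might be organized before it is entered; the class, the helper, and the example attributes are illustrative assumptions, not part of any iPlant or iRODS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AVU:
    """One metadata entry: an attribute name, its value, and an optional unit."""
    attribute: str
    value: str
    unit: str = ""


def values_for(avus, attribute):
    """Return every value recorded under the given attribute name."""
    return [a.value for a in avus if a.attribute == attribute]


# Hypothetical entries for a sequencing file.
metadata = [
    AVU("organism", "Arabidopsis thaliana"),
    AVU("read_length", "150", "bp"),
    AVU("tissue", "root"),
]

print(values_for(metadata, "read_length"))
```

Keeping the unit as its own field, rather than folding it into the value, keeps numeric values easy to compare later.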

Tips: Metadata in the DE
- Can only use one template at a time (this will change)
- Can create custom metadata templates
- Not just for data management: think reproducibility!

Detailed instructions with videos, manuals, and documentation are in the Learning Center.
Keep asking: ask.iplantcollaborative.org

The iPlant Collaborative is funded by a grant from the National Science Foundation Plant Cyberinfrastructure Program (#DBI ).