CyVerse Data Store: Managing Your ‘Big’ Data

Welcome to the Data Store: manage and share your data across all CyVerse platforms.


Working with Big Data. Challenges: the scope and scale of life sciences data continue to grow. Big data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process within a tolerable elapsed time. Big data sizes are a constantly moving target, currently ranging from a few dozen terabytes (TB) to many petabytes of data in a single data set. Source: “‘Big Data’: Big gaps of knowledge in the field of Internet.” International Journal of Internet Science 7: 1–5.

Working with Big Data. Challenges: data generation is cheaper and faster.

Working with Big Data. Challenge: biology encompasses more than sequence data. Biologists work with and require access to diverse data types: advanced imaging, geospatial, network.

Working with Big Data. Challenges: changes in data require changes in tools. Changes in scale introduce quantitative and qualitative complications: difficult or slow transfers; expense of storage and backup; difficulty sharing and publishing; analysis; metadata (what is metadata?).

Data Store Overview. The Data Store services all CyVerse platforms: access your data from multiple CyVerse services. Automatic backup (redundant between University of Arizona and University of Texas). Default 100 GB allocation; > 1 TB allocations available with justification.

Data Store Overview. Avoid reinventing the wheel: iRODS (integrated Rule-Oriented Data System) is an established, scalable, open-source data management system. iRODS supports many data-intensive projects. iRODS abstracts data services from data storage to facilitate executing services across heterogeneous, distributed storage systems.

Data Store Overview. Benefits: get science done, reproducibility, productivity. Store any type of file related to your research. An evolving “Data Commons” lets you access important datasets. Metadata captures information needed for reproducibility. Automatic backup and accessibility support your data management plan. iRODS makes high-speed transfers possible (100 GB in ~30 min). Share data instantly with collaborators within CyVerse.

Data Store Overview. Multiple ways to access. Command line: iCommands. Point-and-click: Cyberduck, Discovery Environment.

Data Store Overview. Some important things we will not “see” in the demo.
Data backups (Arizona, Texas): a key component of your data management, and worry-free.
Data transfer:
Source            Destination    Copy Method    Time (seconds)
CD                My Computer    cp             320
Berkeley Server   My Computer    scp            150
External Drive    My Computer    cp             36
USB 2.0 Flash     My Computer    cp             30
Data Store        My Computer    iget           18
My Computer                      cp             15
Closer to optimum conditions: transfers between University of Arizona and UC Berkeley took 26m15s for 100 GB and 17.5s for 1 GB.
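To put the two "closer to optimum" timings on a common scale, they can be converted to average throughput. The sketch below is only an illustration: the function names are hypothetical, and it assumes decimal units (1 GB = 1000 MB), which the slide does not specify.

```python
def parse_duration(text: str) -> float:
    """Convert a duration such as '26m15s' or '17.5s' into seconds."""
    minutes, _, rest = text.rpartition("m")
    seconds = float(rest.rstrip("s"))
    return (float(minutes) if minutes else 0.0) * 60 + seconds


def throughput_mb_per_s(size_gb: float, duration: str) -> float:
    """Average throughput in MB/s, assuming 1 GB = 1000 MB (an assumption)."""
    return size_gb * 1000 / parse_duration(duration)


# Timings quoted on the slide for Arizona <-> UC Berkeley transfers.
for size_gb, duration in [(100, "26m15s"), (1, "17.5s")]:
    rate = throughput_mb_per_s(size_gb, duration)
    print(f"{size_gb} GB in {duration}: {rate:.1f} MB/s")
```

Under that assumption, both transfers average roughly 57 to 63 MB/s, so the quoted rate holds up about equally for the small and the large file.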

Data Store Overview. Some important things we will not “see” in the demo: local connections and institutional policies limit data transfer.

Data Store Overview. Hands-on demo.

Data Store Overview. User perspectives and potential applications.
Bench Scientist: Uploads all of his .fastq files along with 50 GB of root growth videos. Shares all his analysis results with his thesis advisor.
Bioinformatician: Created a metadata template for assembled genomes that her students and collaborators will place in a shared folder. Uses public links in the supplemental materials of her publications.
Core Facilities: Developed a script to automate transfer of data to core users. Uses a shared folder to make large datasets accessible.
Images from personas based on: Bioinformatics Curriculum Guidelines: Toward a Definition of Core Competencies. PLOS Computational Biology. DOI: /journal.pcbi

Data Store Overview. Time for summaries and tips.

Tips for any transfer method. Spaces / special characters: many software packages are sensitive to spaces in file names and/or the special characters below. Rename uploaded files before using them in an analysis.
~ ` $ % ^ & * ( ) + = { } [ ] | \ : ; " ' < > , ? /
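Renaming can be automated. The following is a minimal sketch, not a CyVerse tool: `safe_name` and the underscore replacement are hypothetical choices, and the character set is simply the list from the tip above plus the space character. It expects a bare file name rather than a full path, because `/` is treated as a character to replace.

```python
import re

# Characters listed in the tip above, plus the space character.
SPECIAL_CHARS = "~`$%^&*()+={}[]|\\:;\"'<>,?/ "

# One or more consecutive problem characters collapse into one replacement.
_PATTERN = re.compile("[" + re.escape(SPECIAL_CHARS) + "]+")


def safe_name(name: str, replacement: str = "_") -> str:
    """Return a file name with spaces and special characters replaced."""
    cleaned = _PATTERN.sub(replacement, name)
    # Avoid names that start or end with the replacement character.
    return cleaned.strip(replacement)


print(safe_name("sample A&B.fastq"))  # sample_A_B.fastq
```

Collapsing runs of problem characters into a single underscore keeps the result readable; a different replacement can be passed when underscores are unsuitable for a given tool.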

Tips. When sharing, use this chart to decide appropriate permissions.
Permission   Read   Download   Metadata   Rename   Move   Delete
Read         x      x
Write        x      x          x
Own          x      x          x          x        x      x
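The chart can also be read as a lookup: given what a collaborator needs to do, pick the least permissive level that covers it. The sketch below is illustrative only. It assumes the marks in the chart fill the columns from left to right (Read covers read and download, Write adds metadata, Own adds rename, move, and delete), and `minimal_permission` is a hypothetical helper, not part of any CyVerse API.

```python
# Permission levels from least to most permissive, with the actions each
# allows. The mapping assumes the chart's marks fill columns left to right.
PERMISSIONS = {
    "read": {"read", "download"},
    "write": {"read", "download", "metadata"},
    "own": {"read", "download", "metadata", "rename", "move", "delete"},
}
ORDER = ["read", "write", "own"]


def minimal_permission(needed):
    """Return the least permissive level that allows every needed action."""
    needed = set(needed)
    for level in ORDER:
        if needed <= PERMISSIONS[level]:
            return level
    raise ValueError(f"unknown action(s): {sorted(needed - PERMISSIONS['own'])}")


print(minimal_permission({"download"}))              # read
print(minimal_permission({"download", "metadata"}))  # write
print(minimal_permission({"rename"}))                # own
```

Granting the lowest level that works limits the chance that a collaborator renames, moves, or deletes shared data by accident.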

Keep asking: ask.iplantcollaborative.org. Detailed instructions with videos, manuals, and documentation are available in the Learning Center.