Transforming Science Through Data-driven Discovery Tools and Services Workshop Data Store – Managing your ‘Big’ Data Joslynn Lee – Data Science Educator.

Presentation transcript:

Transforming Science Through Data-driven Discovery Tools and Services Workshop Data Store – Managing your ‘Big’ Data Joslynn Lee – Data Science Educator Cold Spring Harbor Laboratory, DNA Learning Center

Welcome to the Data Store
Manage and share your data across all CyVerse platforms.

Working with ‘Big’ Data
Challenges: the scope and scale of life sciences data continue to grow.
- Big data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time.
- Big data sizes are a constantly moving target, currently ranging from a few dozen terabytes (TB) to many petabytes (PB) of data in a single data set.
Source: "'Big Data': Big gaps of knowledge in the field of Internet". International Journal of Internet Science 7: 1–5.

Working with ‘Big’ Data
Challenges: data generation is getting cheaper and faster (sequencing as an example).

Working with ‘Big’ Data
Challenge: biology encompasses more than sequence data.
- Advanced imaging
- Geospatial
- Network
Biologists work with and require access to diverse data types.

Working with ‘Big’ Data
Challenges: changes in data require changes in tools.
- Difficult / slow transfers
- Expense for storage / backup
- Difficult to share and publish
- Analysis
- Metadata (what is metadata?)
Changes in scale introduce quantitative and qualitative complications.

Data Store Overview
The Data Store services all CyVerse platforms.
- Access your data from multiple CyVerse services
- Automatic backup (redundant between University of Arizona and University of Texas)
- Default 100 GB allocation; allocations greater than 1 TB available with justification

Data Store Overview
Avoid reinventing the wheel.
- iRODS (integrated Rule-Oriented Data System) is an established, scalable, open-source data management system
- iRODS supports many data-intensive projects
- iRODS abstracts data services from data storage to facilitate executing services across heterogeneous, distributed storage systems
- Critical for effective data management
- Works under the hood
- Folder = Collection

Data Store Overview
Benefits: get science done, reproducibility, productivity.
- Store any type of file related to your research (for allocations greater than 1 TB, please include a justification in your request)
- Metadata captures information needed for reproducibility
- Automatic backup and accessibility support your project’s data management plan
- iRODS makes high-speed transfers possible (100 GB in ~30 min)
- Share data instantly with collaborators within CyVerse

Data Store Overview
Multiple ways to access for varied skill levels, from point-and-click to command line:
- Discovery Environment (DE)
- Cyberduck
- iCommands

Data Store Overview
Some important things we will not “see” in the demo:
- Local connections and institutional policies limit data transfer

Data Store Overview
Some important things we will not “see” in the demo:

Data backups (Arizona, Texas)
- Key component of your data management
- Worry-free

Data transfer

Source            Destination    Copy method    Time (seconds)
CD                My Computer    cp             320
Berkeley Server   My Computer    scp            150
External Drive    My Computer    cp             36
USB 2.0 Flash     My Computer    cp             30
Data Store        My Computer    iget           18
My Computer                      cp             15

Closer to optimum conditions: transfers between University of Arizona and UC Berkeley took 26m15s for 100 GB and 17.5s for 1 GB.
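To put the two "closer to optimum" timings above on a common scale, the short sketch below converts them to an average throughput. It assumes decimal units (1 GB = 1000 MB), which the slide does not state; the function name is illustrative only.

```python
def throughput_mb_per_s(size_gb: float, seconds: float) -> float:
    """Average transfer rate in MB/s, assuming 1 GB = 1000 MB."""
    return size_gb * 1000 / seconds


# Timings quoted on the slide for University of Arizona <-> UC Berkeley.
large = throughput_mb_per_s(100, 26 * 60 + 15)  # 100 GB in 26m15s
small = throughput_mb_per_s(1, 17.5)            # 1 GB in 17.5s

print(f"100 GB transfer: {large:.1f} MB/s")
print(f"1 GB transfer:   {small:.1f} MB/s")
```

Under that unit assumption, both transfers work out to roughly 60 MB/s, so the quoted rate holds for small and large transfers alike.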

Hands-on Demo
Workshop packet: Data storage that supports the Life Cycle of Data, page 7

Data Store Overview
Hands-on demo: Data Storage that supports the Life Cycle of Data
- Transferring data with Cyberduck: page 10
- Easiest way to share data with CyVerse: page 11
- More Data Store exercises: page
By the end of this demo, you should be able to:
- Import files from a URL
- Upload / download ‘large’ files
- Share data via a public link and via the Discovery Environment
- View and manage file metadata

Data Store Overview
Hands-on demo: Data Storage that supports the Life Cycle of Data
Easiest way to share data with CyVerse: page 10
1. Download Cyberduck for your OS (Mac/Windows/Linux)
2. Download the CyVerse Data Store (iRODS) profile (link in the CyVerse wiki)
3. Open Cyberduck
4. Open the CyVerse Data Store (iRODS) profile
5. Upload a file from your desktop and watch the transfer manager

Data Store Overview
Hands-on demo: Data Storage that supports the Life Cycle of Data
Sharing with other CyVerse users in the CyVerse DE: page 12
1. Collect your neighbor’s CyVerse username
2. Log into the CyVerse Discovery Environment
3. Click on the sharing icon
4. Find your neighbor by typing their username
Notes:
- Share only with CyVerse users; files and folders can both be shared
- Manage permissions and collaborators
- Unshare anytime

Data Store Overview
Hands-on demo: Data Storage that supports the Life Cycle of Data
Import a file into the DE from a URL (database): page 29
Example file: Bos_taurus.UMD dna_rm.chromosome.1.fa.gz from ftp://ftp.ensembl.org/pub/release-67/fasta/bos_taurus/dna/
1. Copy the URL of a zipped fasta file
2. Open the Data App
3. Make a new folder to upload into
4. Upload tab -> Import from URL

Data Store Overview
Hands-on demo: Data Storage that supports the Life Cycle of Data
Metadata: data about a data or analysis file or folder that describes its contents and context.
1. Check the file you want to edit
2. Edit tab -> Edit Metadata
3. Enter attributes and values, for example:
   collection_date: 01/01/2016
   host: human
   strain: Trypanosoma
   collected_by: J. Lee
   temp: 21

Data Store Overview
User perspectives and potential applications (Welch et al)
- Uploads all of his .fastq files along with 50 GB of root growth videos
- Shares all his analysis results with his thesis advisor
- Created a metadata template for assembled genomes that her students and collaborators will place in a shared folder
- Uses public links in the supplemental materials of her publications
- Developed a script to automate transfer of data to core users
- Uses a shared folder to make large datasets accessible
Perspectives shown: Bioinformatician, Core Facilities, Bench Scientist

Data Store Overview Time for Summaries and Tips

Summary: Tips for upload and download in the DE
- ‘Simple’, for small files (~5 files, < 1.8 GB)
- ‘Bulk’, for larger files and folders (< 10 GB)
- Import from URL (no size limit)
Advantages:
- Covers most upload / download / sharing needs
- Point and click
Disadvantage:
- Some size / speed limitations
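The size guidance above can be read as a small decision rule. The sketch below encodes it under stated assumptions: the thresholds are the ones quoted on the slide, but the function name, its parameters, and the fallback to Cyberduck or iCommands for anything at or above 10 GB are illustrative choices, not documented DE behavior.

```python
def suggest_transfer_method(n_files: int, total_gb: float, from_url: bool = False) -> str:
    """Suggest a transfer route using the thresholds quoted on the slide."""
    if from_url:
        # The slide lists no size limit for URL imports.
        return "Import from URL"
    if n_files <= 5 and total_gb < 1.8:
        return "Simple upload"
    if total_gb < 10:
        return "Bulk upload"
    # Assumption: beyond the DE limits, use a dedicated transfer client.
    return "Cyberduck or iCommands"


print(suggest_transfer_method(3, 0.5))    # a few small files
print(suggest_transfer_method(40, 6.0))   # a folder of moderate size
print(suggest_transfer_method(2, 65.0))   # a large sequencing run
```

The "~5 files" figure on the slide is approximate, so the `n_files <= 5` cutoff should be treated as a rough guide rather than a hard limit.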

Summary: Tips for searching in the DE
- Metadata and naming of files are important
- The basic search bar searches all files and folders where you have permission
- Advanced search allows searching based on metadata, permissions, and share status
- Create auto-updated ‘smart’ folders based on searches

Summary: Tips for file names
- Many software packages are sensitive to spaces in file names and/or the special characters below
- Spaces / special characters: ~` $ % ^& *()+ = {}[]|\:;"'<>,?/
- Rename uploaded files before using them in an analysis
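Renaming can be done by hand in the DE, but for many files a small script run before upload is easier. The sketch below is one possible approach, not a CyVerse tool: it replaces each run of spaces or listed special characters with a single underscore. The function name and the choice of underscore are assumptions, and it is meant for base file names only, since "/" is among the characters replaced.

```python
import re

# Special characters listed on the slide, plus any whitespace.
SPECIAL = r"""~`$%^&*()+={}[]|\:;"'<>,?/"""
_UNSAFE = re.compile("[" + re.escape(SPECIAL) + r"\s]+")


def safe_name(name: str) -> str:
    """Replace runs of spaces/special characters with one underscore."""
    return _UNSAFE.sub("_", name).strip("_")


print(safe_name("sample 1&2.fastq"))  # sample_1_2.fastq
print(safe_name("plain.txt"))         # plain.txt
```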

Summary: Faster transfer with Cyberduck
- Free, cross-platform, open-source file transfer program
- Drag and drop files and folders
- Has been extensively tested with large data transfers (60-70 GB) from desktop to Data Store
- Access public and private data with your CyVerse account login
- Data that has been shared with anyone can be accessed anonymously, without the need for an account

Summary: Sharing files in the Data Store
Two easy ways to share data from the DE:
1) Discovery Environment sharing
- Share files / folders instantly
- Control access permissions
- Manage sharing between collaborators
2) Sharing via public link
- No CyVerse account required
- Limited to individual files
- URLs are public (less secure, but can be revoked)

Summary: Tips for sharing files
When sharing, use this chart to decide appropriate permissions for CyVerse:

Permission  Read  Download  Metadata  Rename  Move  Delete
Read        x     x
Write       x     x         x
Own         x     x         x         x       x     x

Scenario 1: Lab PI with no computational biology experience
Scenario 2: Individual who collected data
Scenario 3: You?
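The permission chart can be expressed as a simple lookup, which may help when reasoning about the scenarios. This is only a sketch of the chart as transcribed, assuming the x marks fill the columns from left to right; it is not the Data Store's actual access-control implementation, and the names used are hypothetical.

```python
ACTIONS = ("read", "download", "metadata", "rename", "move", "delete")

# Assumption: x marks in the chart fill the columns left to right.
PERMISSIONS = {
    "read": {"read", "download"},
    "write": {"read", "download", "metadata"},
    "own": set(ACTIONS),
}


def can(permission: str, action: str) -> bool:
    """Return True if the permission level allows the action."""
    return action in PERMISSIONS[permission]


for level in PERMISSIONS:
    allowed = [a for a in ACTIONS if can(level, a)]
    print(f"{level}: {', '.join(allowed)}")
```

For example, a collaborator who only needs to look at results would fit the "read" level, while anyone expected to reorganize or delete files would need "own".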

Summary: Tips for viewing and editing metadata
- User metadata is stored as Attribute Value Unit (AVU)
- Edit -> Metadata -> Edit Metadata
- Choose your template -> Use term guide
- Add your own attributes and values / customize
- Metadata is not just for data management: think reproducibility!
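An Attribute Value Unit triple can be pictured as a small record. The sketch below models AVUs locally for illustration, using attribute names from the hands-on demo; the class, its methods, and the "C" unit on the temperature are assumptions and do not reflect the DE or iRODS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AVU:
    """One metadata entry: attribute, value, and an optional unit."""
    attribute: str
    value: str
    unit: str = ""

    def as_text(self) -> str:
        return f"{self.attribute}: {self.value} {self.unit}".strip()


records = [
    AVU("collection_date", "01/01/2016"),
    AVU("host", "human"),
    AVU("temp", "21", "C"),  # unit is an assumption; the demo gives none
]


def find(entries, attribute):
    """Return all entries with the given attribute name."""
    return [e for e in entries if e.attribute == attribute]


for entry in records:
    print(entry.as_text())
```

Keeping the unit in its own field, rather than inside the value, is what lets a search or template compare values consistently across files.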

Help: ask.iplantcollaborative.org
- Detailed instructions with videos, manuals, and documentation in the CyVerse Wiki
- Search by tag

Executive Team: Parker Antin, Nirav Merchant, Eric Lyons, Matt Vaughn, Doreen Ware, Dave Micklos
CyVerse is supported by the National Science Foundation under Grant No. DBI and DBI
Transforming Science Through Data-driven Discovery