The iPlant Collaborative: Community Cyberinfrastructure for Life Science
Tools and Services Workshop: iPlant Data Store
Welcome to the iPlant Data Store
Manage and share your data across iPlant's tools and services
Working with Big Data: Challenges
"Big data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target currently ranging from a few dozen terabytes to many petabytes of data in a single data set." - Wikipedia
Working with Big Data
Challenges: rapid technological progress
Working with Big Data
Challenges: biology is more than sequence data
Biologists work with and require access to diverse data types
Working with Big Data
Why isn't saving, moving, or copying big data as simple as using the tools we already have?
Working with Big Data
Challenges: moving to a big data mindset
Changes in scale introduce quantitative and qualitative complications:
- Difficult/slow transfers
- Expense for storage/backup
- Difficult to share and publish
- Metadata
- Analysis
iPlant Data Store Overview
The Data Store services all iPlant platforms
- Access your data from multiple iPlant services
- Automatic data backup (redundant between University of Arizona and University of Texas)
- Default 100 GB allocation; allocations over 1 TB available with justification
iPlant Data Store Overview
iRODS is an open-source data management system
- iRODS supports many data-intensive projects, such as NSF TeraGrid and the Large Synoptic Survey Telescope.
- iRODS abstracts data services from data storage to facilitate executing services across heterogeneous, distributed storage systems.
- Avoid reinventing the wheel
iPlant Data Store Overview
Benefits: Get Science Done, Reproducibility, Productivity
- Store any type of file related to your research
- An evolving “Data Commons” lets you access important datasets
- Metadata captures information needed for reproducibility
- Automatic backup and accessibility support your project’s data management plan
- iRODS makes high-speed transfers possible (100 GB in ~30 min)*
- Share data instantly with collaborators within iPlant
iPlant Data Store Overview
Multiple ways to access
Point-and-click:
- Discovery Environment
- iDrop Desktop
Command line:
- iCommands
iPlant Data Store Overview
Some important things we will not “see” in the demo
Replication: Arizona and Texas
- Key component of your NSF data management
- Worry free!
iPlant Data Store Overview
Some important things we will not “see” in the demo

Source               Destination    Copy Method    Time (seconds)
CD                   My Computer    cp             320
Berkeley Server      My Computer    scp            150
External Drive       My Computer    cp             36
USB 2.0 Flash        My Computer    cp             30
iPlant Data Store    My Computer    iget           18
My Computer                         cp             15

Close to optimum conditions; transfer between Univ. of Arizona and UC Berkeley
100 GB: 29m15s (1 GB / 17.5 seconds)
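The arithmetic behind the quoted figures can be checked with a short sketch. It uses only the numbers on the slide; the function names are illustrative, and the conversion to MB/s assumes decimal units (1 GB = 1000 MB), which the slide does not specify. The slide also does not state the file size used for the table rows, so only ratios between rows are computed for them.

```python
# Timings copied from the slide's transfer comparison, in seconds.
# The file size for these rows is not stated on the slide, so we only
# compare rows against each other rather than computing throughput.
TIMINGS_S = {
    "CD -> My Computer (cp)": 320,
    "Berkeley Server -> My Computer (scp)": 150,
    "External Drive -> My Computer (cp)": 36,
    "USB 2.0 Flash -> My Computer (cp)": 30,
    "iPlant Data Store -> My Computer (iget)": 18,
    "My Computer (cp)": 15,
}


def slowdown_vs(baseline_s, timings):
    """Return each timing as a multiple of a baseline timing."""
    return {name: t / baseline_s for name, t in timings.items()}


def seconds_per_gb(total_gb, minutes, seconds):
    """Average seconds needed per GB for a transfer of total_gb."""
    return (minutes * 60 + seconds) / total_gb


def throughput_mb_s(total_gb, minutes, seconds, mb_per_gb=1000):
    """Average throughput in MB/s (assumption: 1 GB = 1000 MB)."""
    return total_gb * mb_per_gb / (minutes * 60 + seconds)


if __name__ == "__main__":
    # Compare every row with the local copy (15 s) as the baseline.
    ratios = slowdown_vs(TIMINGS_S["My Computer (cp)"], TIMINGS_S)
    for name, ratio in ratios.items():
        print(f"{name}: {ratio:.1f}x the local copy time")

    # The 100 GB transfer quoted on the slide: 29 minutes 15 seconds.
    print(f"{seconds_per_gb(100, 29, 15):.2f} s per GB")
    print(f"{throughput_mb_s(100, 29, 15):.1f} MB/s")
```

The 100 GB figure works out to 17.55 seconds per GB, which matches the slide's rounded "1 GB / 17.5 seconds".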
iPlant Data Store Overview
Some important things we will not “see” in the demo
One complication of big data transfers is that you will always be limited by your local connection and institutional policies.
iPlant Data Store Overview
Hands-on demo
By the end of this demo you should be able to:
- Import files from a URL
- Upload and download “large” files
- Share data via a public link and via the Discovery Environment
- View and manage file metadata
iPlant Data Store Overview
User perspectives and possible applications
Bench Scientist:
- Uploads all of his FASTQ files along with 50 GB of root growth videos
- Shares his analysis results with his thesis advisor
Bioinformatician:
- Created a metadata template for assembled genomes that her students and collaborators will place in a shared folder
- Uses public links in the supplemental materials of her publications
Core Facilities:
- Developed a script to automate transfer of data to core users
- Uses a shared folder to make large datasets accessible
Images from personas based on: Bioinformatics Curriculum Guidelines: Toward a Definition of Core Competencies, PLOS Computational Biology, DOI: /journal.pcbi
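As an illustration of the core facility use case, here is a minimal sketch of what a transfer automation script might look like. It is not the script referred to on the slide. It only builds and prints iCommands `iput` command lines as a dry run and does not execute them. The user names, local paths, the `core_deliveries` folder name, and the `/iplant/home` root are all assumptions for illustration; the `-r` (recursive) and `-K` (verify checksum) flags should be confirmed with `iput -h` on your installation.

```python
import shlex
from pathlib import PurePosixPath


def build_iput_command(local_dir, collection, verify_checksum=True):
    """Build an iput argument list for a recursive upload (not executed)."""
    cmd = ["iput", "-r"]  # -r: upload the directory recursively
    if verify_checksum:
        cmd.append("-K")  # -K: verify checksum after upload
    cmd += [str(local_dir), str(collection)]
    return cmd


def plan_transfers(deliveries, home_root="/iplant/home"):
    """Map {user: local_dir} to a list of iput commands, one per user.

    home_root and the 'core_deliveries' folder are hypothetical names
    chosen for this sketch.
    """
    plan = []
    for user, local_dir in sorted(deliveries.items()):
        target = PurePosixPath(home_root) / user / "core_deliveries"
        plan.append(build_iput_command(local_dir, target))
    return plan


if __name__ == "__main__":
    # Hypothetical users and sequencing output directories.
    deliveries = {
        "alice": "/data/run42/alice",
        "bob": "/data/run42/bob",
    }
    for cmd in plan_transfers(deliveries):
        # Dry run: print the command instead of running it.
        print(shlex.join(cmd))
```

Keeping the plan as data makes it easy to review before anything is sent; a real script would pass each argument list to `subprocess.run` once the paths and permissions have been checked.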
Keep asking:
The iPlant Collaborative is funded by a grant from the National Science Foundation Plant Cyberinfrastructure Program (#DBI ).