Infrastructure to enable Data Management Nirav Merchant

Slides:



Advertisements
Similar presentations
NSF Data Management Plan
Advertisements

DSpace: the MIT Libraries Institutional Repository MacKenzie Smith, MIT EDUCAUSE 2003, November 5 th Copyright MacKenzie Smith, This work is the.
USG INFORMATION SECURITY PROGRAM AUDIT: ACHIEVING SUCCESSFUL AUDIT OUTCOMES Cara King Senior IT Auditor, OIAC.
Providing access to your data: Determining your audience Robert R. Downs, PhD NASA Socioeconomic Data and Applications Center (SEDAC) Center for International.
Peter Griffith and Megan McGroddy 4 th NACP All Investigators Meeting February 3, 2013 Expectations and Opportunities for NACP Investigators to Share and.
ASCR Data Science Centers Infrastructure Demonstration S. Canon, N. Desai, M. Ernst, K. Kleese-Van Dam, G. Shipman, B. Tierney.
OVERVIEW & LIBRARY SUPPORT FOR DATA MANAGEMENT/SHARING Jim Van Loon, MSME/MLIS Science Librarian.
December 2008 MRC Data Support Services (DSS) Chris Morris 13 th February 2009 Sharing Research Data: Pioneers, Policies and Protocols The seventh cat.
Repositioning for repositories: making the move to science data management Gerry Ryder CSIRO Information Management & Technology 21 January 2009.
11© 2011 Hitachi Data Systems. All rights reserved. HITACHI DATA DISCOVERY FOR MICROSOFT® SHAREPOINT ® SOLUTION SCALING YOUR SHAREPOINT ENVIRONMENT PRESENTER.
Managing Data with iPlant Introduction to Uploading, Downloading, Sharing, and Metadata in the Data Store.
Data Management Plans PAUL H. BERN, PH.D. APRIL 3, 2014.
Data Management What? Why? How?. 2 What do we mean by … Managing your Research (aka Data) … Ensuring physical integrity of files and helping to preserve.
Chronopolis: Preserving Our Digital Heritage David Minor UC San Diego San Diego Supercomputer Center.
NSF Data Management Plan Requirements Alex Kanous
National Science Foundation: Transforming Undergraduate Education in Science, Technology, Engineering, and Mathematics (TUES)
Institutional Perspective on Credit Systems for Research Data MacKenzie Smith Research Director, MIT Libraries.
Copyright 2006 M.R.Thorley/NERC Mark Thorley, Natural Environment Research Council Research Outputs: Their Access & Preservation A perspective.
1 Sharing Research Data in Hong Kong (position paper) Professor John Bacon-Shone Associate Director, Knowledge Exchange The University of Hong Kong Forum.
Guidance on Preparing a Data Management Plan
DMPTool Expert Resources and Support for Data Management Planning Tao Zhang Michael Witt Purdue University Libraries 1.
Customized cloud platform for computing on your terms !
DATA MANAGEMENT SUPPORT FOR RESEARCHERS …………………………………………
Data Management Plans Bill Michener University Libraries and Biology Dept. University of New Mexico.
Enabling Cloud and Grid Powered Image Phenotyping Nirav Merchant iPlant Collaborative
ACCESS for VALIDITY ACCESS for INNOVATION. Starting January 2011 for NEW proposals Not voluntary – “integral part” of proposal and FastLane Required for.
SCSC 311 Information Systems: hardware and software.
Elements of a Data Management Plan Bill Michener University Libraries University of New Mexico Data Management Practices for.
UVa Library Research Data Services
The iPlant Collaborative Community Cyberinfrastructure for Life Science Tools and Services Workshop iCommands and Other Data Store Resources.
Data Management Planning
A 40 Year Perspective Dr. Frank Scioli NSF-Retired.
The Data Management Plan Amanda Rinehart, Data Librarian Illinois State University The Data Management Plan by Amanda K Rinehart is licensed under a Creative.
Life Cycle Models & Principles Jake Carlson Associate Professor of Library Science Data Services Specialist Purdue University Libraries.
Data Management and Accessibility S.M. Kaye PPPL Research Seminar 12/16/2013.
Changing Implementation of NSF Data Policy Dr. Jennifer M. Schopf, NSF OD/OIA/EPSCoR On behalf of the NSF Data Working Group March 17, 2011 CASC Spring.
Using Biological Cyberinfrastructure Scaling Science and People: Applications in Data Storage, HPC, Cloud Analysis, and Bioinformatics Training Scaling.
The iPlant Collaborative Community Cyberinfrastructure for Life Science Tools and Services Workshop iPlant Data Store.
Choosing Between Data Sharing Repositories for Engineering Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch.
DMPTool and Data Management Basics Hannah Norton July 29, 2014 Image modified from :
Peter Granda Archival Assistant Director / Data Archives and Data Producers: A Cooperative Partnership.
DRAFT EDMC Procedural Directives NOAA Environmental Data Management Committee 12/3/2015 1
Symposium on Global Scientific Data Infrastructures Panel Two: Stakeholder Communities in the DWF Ann Wolpert, Massachusetts Institute of Technology Board.
Elements of a Data Management Plan Bill Michener University of New Mexico
Breakout # 1 – Data Collecting and Making It Available Data definition “ Any information that [environmental] researchers need to accomplish their tasks”
Digital Preservation across the technologies, strategies, open standards & interoperability aspects including the legal issues Pratik Shrivastava Scientist.
The iPlant Collaborative Community Cyberinfrastructure for Life Science Tools and Services Workshop iPlant Data Store – Managing Your ‘Big’ Data.
Marv Adams Chief Information Officer November 29, 2001.
Sync and Exchange Research Data b2drop.eudat.eu This work is licensed under the Creative Commons CC-BY 4.0 licence B2DROP EUDAT’s Personal.
1Mobile Computing Systems © 2001 Carnegie Mellon University Writing a Successful NSF Proposal November 4, 2003 Website: nsf.gov.
Breakout Session Assignments and Goals. Summary of Objectives and Charge to Breakout Groups Desired outcome: a comprehensive vision for NACP Data Management.
Course, Curriculum, and Laboratory Improvement (CCLI) Transforming Undergraduate Education in Science, Technology, Engineering and Mathematics PROGRAM.
Data Management Lesley A. Brown Director of Proposal Development.
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No B 2 DROP User.
Data Management Plans PAUL H. BERN, PH.D. APRIL 3, 2014.
Store and Share Research Data b2share.eudat.eu B2SHARE How to share and store research data using EUDAT’s B2SHARE This work is licensed under.
Office of Science Statement on Digital Data Management Laura Biven, PhD Senior Science and Technology Advisor Office of the Deputy Director for Science.
Providing access to your data: Determining your audience Robert R. Downs, PhD NASA Socioeconomic Data and Applications Center (SEDAC) Center for International.
NSF INCLUDES Inclusion Across the Nation of Learners of Underrepresented Discoverers in Engineering and Science AISL PI Meeting, March 1, 2016 Sylvia M.
C OLLEGE OF A GRICULTURE D ATA C OHORT D ATA M ANAGEMENT P LANNING J ANUARY 27, 2014 Jake Carlson Associate Professor of Library Science / Data Services.
Building PetaScale Applications and Tools on the TeraGrid Workshop December 11-12, 2007 Scott Lathrop and Sergiu Sanielevici.
Data Stewardship Lifecycle A framework for data service professionals Protectors of data.
Training Course on Data Management for Information Professionals and In-Depth Digitization Practicum September 2011, Oostende, Belgium Concepts.
Store and exchange data with colleagues and team Synchronize multiple versions of data Ensure automatic desktop synchronization of large files B2DROP is.
Bob Jones EGEE Technical Director
CFI John R Evans Leaders Fund Digital Data Management
Training Course on Data Management for Information Professionals and In-Depth Digitization Practicum September 2011, Oostende, Belgium Concepts.
Research Data Management
Exchanging Data Management Plans with DDI
Presentation transcript:

Infrastructure to enable Data Management Nirav Merchant

Topic coverage NSF requirements, recent clarifications for data management plan (DPM) for Bio Typical data lifecycle for life sciences Bottlenecks in data sharing and collaboration iPlant infrastructure for Data Management (DM) Utilizing iPlant infrastructure for your projects Demo and Discussion

NSF Requirements The National Science Foundation (NSF) started requiring a data management plan (DMP) for all full proposals submitted, or due, to NSF on or after January 18, 2011 Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of the work under NSF grants.

NSF: DMP Guidelines The DMP should describe how the PI(s) will manage and disseminate data generated by the project in sufficient detail to enable evaluation of the plan (and past performance if any) during the merit review process. Adherence to the proposed DMP will be monitored by BIO Program Directors and Committees of Visitors Updated 06/15/2011

DMP Details Each DMP should address the following questions, as appropriate: What kind of data will be collected, standards employed, and for how long will data be retained? What physical and/or cyber resources and facilities (including third party resources) will be used to store and preserve the data? What data and metadata formats, media and dissemination methods will be used to make the data and metadata available to others? What will be the policies for data sharing and public access (including provisions for protection of privacy, confidentiality, security, intellectual property rights and other rights as appropriate)? What are the rights and obligations of all parties with respect to their roles in and responsibilities for the management and retention of research data (including contingency plans for the departure of key personnel from the project)?

POST-AWARD MANAGEMENT After an award is made, implementation of the DMP will be monitored through the annual and final report process and during evaluation of subsequent proposals. Data management must be reported in subsequent proposals by the PI and CoPIs under “Results of prior NSF support”. Annual project reports required for all NSF multi-year awards must include information about progress made in data management and sharing of research products (e.g., citations of relevant publications, conference proceedings, and other types of data sharing and dissemination)

Final project reports required for all NSF awards should describe the implementation of the DMP including any changes from the original DMP and the following information: The data produced during the award period The data that will be retained after the award expires How the data will be disseminated and verification that it will be available for sharing The format (including community standards) that will be used to make the data – including any metadata – available to others The archival location of the data POST-AWARD MANAGEMENT

Patterns of information use and exchange: Case studies of researchers in the life sciences A report by the Research Information Network (RIN) and the British Library (BL) November

Patterns of information use and exchange: case studies of researchers in the life sciences Researchers use informal and trusted sources of advice from colleagues, rather than institutional service teams, to help identify information sources and resources The use of social networking tools for scientific research purposes is far more limited than expected Data and information sharing activities are mainly driven by needs and benefits perceived as most important by life scientists rather than top-down policies and strategies There are marked differences in the patterns of information use and exchange between research groups active in different areas of the life sciences, reinforcing the need to avoid standardized policy approaches A report by the Research Information Network (RIN) and the British Library November 2009

The data lifecycle Creative Commons Attribution-Non-Commercial-Share-Alike 2.0 UK: England and Wales License

Information flow

Few Life cycle challenges Collaboration driven by and around data Size of data (number of files + size of each file) is rapidly growing Data intensive computing is common Data handling (tracking flow) is getting complex (versioning) Common tools and protocols do not scale well(ftp, sftp, http) More team science (geographically distributed teams) and data sharing Depositing, archiving to multiple sources

iPlant approach to DM Provide access to reliable, scalable data infrastructure Allows easy access, searching, sharing Can be accessed by multiple modes (programs, web based analysis services) Regardless of access method provide a unified view/access Flexible to align it with your information/data life cycle Capable of adopting meta data standards decided by community Deposition/transfer to final archival destination Ability to address the broad spectrum needs for the community (from bench biologists to computational biologists and developers)

iPlant DM Inspirations RIN, BL reports and DMP from many institutions Web 2.0 tools such as DropBox, capabilities for tagging and sharing data (photos etc) Input and guidance from disciplines with long standing DM expertise (Astronomy, Planetary Sciences, National Archives) Desire to bring DM capabilities to your desktop, overcoming network and bandwidth limitations Feedback from community members (although much more needed)

iPlant Data Storage Based on proven underlying technology components (iRODS, future XSEDE file system) Highly redundant and mirrored (between UA and TACC) Breaks the 2Gb barrier for file size over web Capable for parallel transfers, synchronizing multiple sources and destinations To get started visit documentation at: or lant+and+Accessing+that+Data lant+and+Accessing+that+Data

Getting Started How much space do I get (10Gb for start) How much more can I get ? (the steps) How much more do you want ? What tools can I use with it ? Who can see/access it ? (privacy/ownership) How do I tag, search, share my data ? Features tailored for: Individual investigator based projects Multi laboratory projects Advance User/Developers Consortium/large scale distribution

Data access the way you like Multi use web interface like DE (with other added benefits of analysis ) Multiple web based options for managing data Organize, arrange, tag as you like Share via web, tokens (single use etc.), url Integrate you own programs/web apps (demo by Eric) Build you own scripts/tools (Demo by Matt) Attach directly to your mac/pc as shared drive Bidirectional sync using idrop (iPlant DropBox for science)

Demos