
Infrastructure to enable Data Management Nirav Merchant


1 Infrastructure to enable Data Management Nirav Merchant nirav@email.arizona.edu

2 Topic coverage
- NSF requirements and recent clarifications of the data management plan (DMP) for BIO
- Typical data lifecycle for the life sciences
- Bottlenecks in data sharing and collaboration
- iPlant infrastructure for data management (DM)
- Utilizing iPlant infrastructure for your projects
- Demo and discussion

3 NSF Requirements
The National Science Foundation (NSF) began requiring a data management plan (DMP) for all full proposals submitted or due to NSF on or after January 18, 2011.
Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of the work under NSF grants.

4 NSF: DMP Guidelines
The DMP should describe how the PI(s) will manage and disseminate data generated by the project in sufficient detail to enable evaluation of the plan (and past performance, if any) during the merit review process.
Adherence to the proposed DMP will be monitored by BIO Program Directors and Committees of Visitors.
http://www.nsf.gov/bio/pubs/BIODMP061511.pdf (updated 06/15/2011)

5 DMP Details
Each DMP should address the following questions, as appropriate:
- What kinds of data will be collected, what standards will be employed, and for how long will the data be retained?
- What physical and/or cyber resources and facilities (including third-party resources) will be used to store and preserve the data?
- What data and metadata formats, media, and dissemination methods will be used to make the data and metadata available to others?
- What will the policies be for data sharing and public access (including provisions for protection of privacy, confidentiality, security, intellectual property rights, and other rights as appropriate)?
- What are the rights and obligations of all parties with respect to their roles in and responsibilities for the management and retention of research data (including contingency plans for the departure of key personnel from the project)?
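One way to keep these five questions in view while drafting is to record the plan as a structured object with one field per question and flag any that are still blank. The following is a minimal Python sketch; the class and field names are hypothetical and are not part of any NSF or iPlant tool, and the example answers are made up for illustration.

```python
from dataclasses import dataclass, fields
from typing import List

@dataclass
class DataManagementPlan:
    """One field per DMP question (hypothetical field names)."""
    data_types_standards_retention: str = ""
    storage_and_preservation: str = ""
    formats_and_dissemination: str = ""
    sharing_and_access_policies: str = ""
    roles_and_responsibilities: str = ""

def unanswered(plan: DataManagementPlan) -> List[str]:
    """Return the names of questions the draft leaves blank."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]

# Made-up example answers, for illustration only.
draft = DataManagementPlan(
    data_types_standards_retention="Sequence reads (example answer)",
    storage_and_preservation="Mirrored data store (example answer)",
)
print(unanswered(draft))
# ['formats_and_dissemination', 'sharing_and_access_policies', 'roles_and_responsibilities']
```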

6 POST-AWARD MANAGEMENT
After an award is made, implementation of the DMP will be monitored through the annual and final report process and during evaluation of subsequent proposals.
Data management must be reported in subsequent proposals by the PI and co-PIs under “Results of prior NSF support”.
Annual project reports required for all NSF multi-year awards must include information about progress made in data management and sharing of research products (e.g., citations of relevant publications, conference proceedings, and other types of data sharing and dissemination).

7 POST-AWARD MANAGEMENT
Final project reports required for all NSF awards should describe the implementation of the DMP, including any changes from the original DMP, and the following information:
- The data produced during the award period
- The data that will be retained after the award expires
- How the data will be disseminated, and verification that it will be available for sharing
- The format (including community standards) that will be used to make the data, including any metadata, available to others
- The archival location of the data

8 Patterns of information use and exchange: Case studies of researchers in the life sciences A report by the Research Information Network (RIN) and the British Library (BL) November 2009 http://www.rin.ac.uk/system/files/attachments/Patterns_information_use-REPORT_Nov09.pdf

9 Patterns of information use and exchange: case studies of researchers in the life sciences
- Researchers rely on informal, trusted advice from colleagues, rather than on institutional service teams, to identify information sources and resources
- The use of social networking tools for scientific research is far more limited than expected
- Data and information sharing is driven mainly by the needs and benefits that life scientists perceive as most important, rather than by top-down policies and strategies
- Patterns of information use and exchange differ markedly between research groups in different areas of the life sciences, reinforcing the need to avoid standardized policy approaches
A report by the Research Information Network (RIN) and the British Library, November 2009

10 The data lifecycle
(Figure licensed under Creative Commons Attribution-Non-Commercial-Share-Alike 2.0 UK: England and Wales License)

11 Information flow

12 A few lifecycle challenges
- Collaboration is driven by and around data
- The size of data (number of files and size of each file) is growing rapidly
- Data-intensive computing is common
- Data handling (tracking data flow, versioning) is getting complex
- Common tools and protocols (FTP, SFTP, HTTP) do not scale well
- More team science (geographically distributed teams) and data sharing
- Depositing and archiving to multiple sources
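The versioning challenge can be illustrated with content checksums: a new version of a file is recorded only when its bytes actually change. This is a minimal in-memory sketch, assuming SHA-256 digests serve as version identifiers; `VersionHistory` is a hypothetical name, not an iPlant component.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List

def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest, used here as a content-based version ID."""
    return hashlib.sha256(data).hexdigest()

@dataclass
class VersionHistory:
    """Hypothetical per-file version log keyed by content checksum."""
    versions: Dict[str, List[str]] = field(default_factory=dict)

    def record(self, path: str, data: bytes) -> bool:
        """Store a new version only if the content changed; return True if one was added."""
        digest = sha256_of(data)
        history = self.versions.setdefault(path, [])
        if history and history[-1] == digest:
            return False  # identical to the latest version: nothing to record
        history.append(digest)
        return True

history = VersionHistory()
history.record("reads.fastq", b"ACGT")   # first version
history.record("reads.fastq", b"ACGT")   # unchanged, ignored
history.record("reads.fastq", b"ACGTT")  # edited, new version
print(len(history.versions["reads.fastq"]))  # 2
```

Comparing digests rather than timestamps means that a file copied or re-uploaded without changes does not create a spurious new version.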

13 iPlant approach to DM
- Provide access to reliable, scalable data infrastructure
- Allow easy access, searching, and sharing
- Support multiple access modes (programs, web-based analysis services)
- Provide a unified view regardless of access method
- Flexible enough to align with your information/data lifecycle
- Capable of adopting metadata standards decided by the community
- Deposition/transfer to the final archival destination
- Address the broad spectrum of community needs (from bench biologists to computational biologists and developers)
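Tagging and searching can be modeled with attribute-value-unit (AVU) triples, the form iRODS uses for user-defined metadata. The sketch below is an in-memory stand-in written for illustration; `MetadataCatalog` and the example paths are hypothetical and do not reflect the iRODS or iPlant APIs.

```python
from collections import defaultdict
from typing import DefaultDict, List, NamedTuple, Optional, Set

class AVU(NamedTuple):
    """An attribute-value-unit metadata triple."""
    attribute: str
    value: str
    unit: str = ""

class MetadataCatalog:
    """Hypothetical in-memory catalog mapping data paths to AVU tags."""

    def __init__(self) -> None:
        self._avus: DefaultDict[str, Set[AVU]] = defaultdict(set)

    def tag(self, path: str, attribute: str, value: str, unit: str = "") -> None:
        self._avus[path].add(AVU(attribute, value, unit))

    def search(self, attribute: str, value: Optional[str] = None) -> List[str]:
        """Sorted paths carrying the attribute (and the given value, if one is supplied)."""
        return sorted(
            path
            for path, avus in self._avus.items()
            if any(a.attribute == attribute and (value is None or a.value == value)
                   for a in avus)
        )

catalog = MetadataCatalog()
catalog.tag("/demo/maize.fastq", "organism", "Zea mays")
catalog.tag("/demo/maize.fastq", "read_length", "100", "bp")
catalog.tag("/demo/rice.fastq", "organism", "Oryza sativa")
print(catalog.search("organism", "Zea mays"))  # ['/demo/maize.fastq']
```

Keeping the attribute names open-ended is what lets a community adopt its own metadata standard later without changing the storage layer.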

14 iPlant DM Inspirations
- RIN and BL reports, and DMPs from many institutions
- Web 2.0 tools such as Dropbox, and capabilities for tagging and sharing data (photos, etc.)
- Input and guidance from disciplines with long-standing DM expertise (astronomy, planetary sciences, National Archives)
- The desire to bring DM capabilities to your desktop, overcoming network and bandwidth limitations
- Feedback from community members (although much more is needed)

15 iPlant Data Storage
- Based on proven underlying technology components (iRODS, and a future XSEDE file system)
- Highly redundant and mirrored (between UA and TACC)
- Breaks the 2 GB file-size barrier for transfers over the web
- Capable of parallel transfers and of synchronizing multiple sources and destinations
To get started, see the documentation at http://goo.gl/rlS5l or https://pods.iplantcollaborative.org/wiki/display/start/Storing+Your+Data+with+iPlant+and+Accessing+that+Data
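The idea behind parallel transfers can be sketched locally: split a file into fixed-size chunks, copy the chunks concurrently, and confirm the result with a checksum. This Python sketch copies between two local paths only; the chunk size, worker count, and function names are assumptions, and a real iRODS transfer uses its own protocol rather than this code.

```python
import hashlib
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4 * 1024 * 1024  # 4 MiB per chunk (arbitrary choice)

def _copy_range(src: str, dst: str, offset: int, length: int) -> None:
    # Each worker opens its own handles so their seeks do not interfere.
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        fout.seek(offset)
        fout.write(fin.read(length))

def sha256_file(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

def parallel_copy(src: str, dst: str, workers: int = 4, chunk: int = CHUNK) -> str:
    """Copy src to dst in concurrent chunks, then verify; returns the SHA-256 digest."""
    size = os.path.getsize(src)
    with open(dst, "wb") as f:
        f.truncate(size)  # preallocate so workers can write at any offset
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(_copy_range, src, dst, off, min(chunk, size - off))
            for off in range(0, size, chunk)
        ]
        for fut in futures:
            fut.result()  # re-raise any error from a worker
    digest = sha256_file(src)
    if sha256_file(dst) != digest:
        raise OSError("checksum mismatch after transfer")
    return digest

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "reads.fastq")
        with open(src, "wb") as f:
            f.write(os.urandom(10_000))
        print(parallel_copy(src, os.path.join(tmp, "copy.fastq"), chunk=1_000))
```

Verifying the checksum after the chunks land is what makes it safe to split a large file: a missing or corrupted chunk shows up as a mismatch instead of a silently damaged copy.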

16 Getting Started
- How much space do I get? (10 GB to start)
- How much more can I get? (the steps)
- How much more do you want?
- What tools can I use with it?
- Who can see or access it? (privacy/ownership)
- How do I tag, search, and share my data?
Features tailored for:
- Individual-investigator projects
- Multi-laboratory projects
- Advanced users/developers
- Consortium/large-scale distribution
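The "who can see or access it?" and quota questions can be sketched as an access-control list plus a byte counter. The ordered levels below are modeled loosely on the iRODS read/write/own permissions; the `Collection` class is hypothetical, and reading the 10 GB starting allocation as 10 GiB is an assumption.

```python
from enum import IntEnum
from typing import Dict

class Access(IntEnum):
    """Ordered access levels: a higher level implies the lower ones."""
    NONE = 0
    READ = 1
    WRITE = 2
    OWN = 3

# Assumption: the 10 GB starting allocation is read as 10 GiB.
QUOTA_BYTES = 10 * 1024**3

class Collection:
    """Hypothetical shared collection with an ACL and a storage quota."""

    def __init__(self, owner: str) -> None:
        self.acl: Dict[str, Access] = {owner: Access.OWN}
        self.used = 0  # bytes stored so far

    def can(self, user: str, level: Access) -> bool:
        return self.acl.get(user, Access.NONE) >= level

    def grant(self, by: str, user: str, level: Access) -> None:
        # Only an owner may change who can see or modify the collection.
        if not self.can(by, Access.OWN):
            raise PermissionError(f"{by} cannot change sharing")
        self.acl[user] = level

    def add_file(self, user: str, size: int) -> None:
        if not self.can(user, Access.WRITE):
            raise PermissionError(f"{user} has no write access")
        if self.used + size > QUOTA_BYTES:
            raise ValueError("quota exceeded; request more space")
        self.used += size
```

For example, after `c = Collection("alice")` and `c.grant("alice", "bob", Access.READ)`, Bob can read but any `c.add_file("bob", ...)` call raises `PermissionError`.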

17 Data access the way you like
- A multi-use web interface such as the Discovery Environment (DE), with the added benefit of analysis tools
- Multiple web-based options for managing data
- Organize, arrange, and tag as you like
- Share via the web, tokens (single-use, etc.), or URL
- Integrate your own programs/web apps (demo by Eric)
- Build your own scripts/tools (demo by Matt)
- Attach directly to your Mac/PC as a shared drive
- Bidirectional sync using iDrop (iPlant's "Dropbox for science")
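Single-use sharing tokens can be sketched with Python's `secrets` module: each link carries a random token that is discarded the first time it is redeemed, and an expired token is refused. The base URL, the one-hour default lifetime, and the `ShareLinks` class are hypothetical and do not describe how iPlant actually issues tokens.

```python
import secrets
import time
from typing import Dict, Optional, Tuple

class ShareLinks:
    """Hypothetical single-use, expiring share links (not an iPlant API)."""

    def __init__(self, base_url: str = "https://example.org/share/") -> None:
        self.base_url = base_url
        # token -> (data path, expiry time in seconds since the epoch)
        self._tokens: Dict[str, Tuple[str, float]] = {}

    def create(self, path: str, ttl_seconds: float = 3600) -> str:
        token = secrets.token_urlsafe(16)  # 16 random bytes, URL-safe text
        self._tokens[token] = (path, time.time() + ttl_seconds)
        return self.base_url + token

    def redeem(self, token: str) -> Optional[str]:
        # pop() removes the token, so each link works at most once.
        entry = self._tokens.pop(token, None)
        if entry is None:
            return None
        path, expires = entry
        return path if time.time() <= expires else None

links = ShareLinks()
url = links.create("/demo/reads.fastq")
token = url.rsplit("/", 1)[1]
print(links.redeem(token))  # /demo/reads.fastq
print(links.redeem(token))  # None (already used)
```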

18 Demos

