
George A. Komatsoulis, Ph.D.
National Center for Biotechnology Information
National Library of Medicine
National Institutes of Health
U.S. Department of Health and Human Services

The Commons:
  Digital Objects (with identifiers)
  Search (Indexed Metadata and API)
  Computing Platform
  Open APIs
  Software Encapsulation

[Diagram: The Commons (Digital Objects with identifiers; Search with Indexed Metadata and API; Computing Platform) shown with Commons Federation (Infrastructure), BD2K Centers, DDICC (Search), and Existing Resources. Labels: Indexes, Methods, Content.]

[Diagram: Commons Federation (Infrastructure), BD2K Centers, DDICC (Search), and Existing Resources. Labels: Indexes, Methods, Content, Works In, Searches.]

Commons Federation (Infrastructure)

Commons implemented as a federation of ‘conformant’ cloud providers and HPC environments
Funded primarily by providing credits to investigators

Cost effective: only pay for IT support used
Drives competition: better services at lower cost
Supports data sharing by driving science into the Commons
Facilitates public-private partnership
Scalable to most categories of data expected in the next 5 years

Novelty: Never been tried, so we don’t have data about likelihood of success
Cost Models: Predicated on stable or declining prices among providers
  True for the last several years, but we can’t guarantee that it will continue, particularly if there is significant consolidation in industry
Service Providers: Predicated on service providers willing to make the investment to become conformant
  Market research suggests 3-5 providers within 2-3 months of program launch
Persistence: The model is ‘Pay As You Go’, which means if you stop paying it stops going
  Giving investigators an unprecedented level of control over what lives (or dies) in the Commons
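The ‘Pay As You Go’ and cost-model points above can be made concrete with a small sketch. It is not part of the presentation: the function name, the credit amounts, and every price are hypothetical placeholders, chosen only to show how a fixed credit balance translates into months of persistence and how a price increase shortens that period.

```python
def months_funded(credits, storage_tb, price_per_tb_month,
                  compute_hours, price_per_hour):
    """Return the whole months a credit balance covers at fixed monthly usage.

    All prices are hypothetical placeholders, not real provider rates.
    """
    monthly_cost = (storage_tb * price_per_tb_month
                    + compute_hours * price_per_hour)
    if monthly_cost <= 0:
        raise ValueError("monthly cost must be positive")
    return int(credits // monthly_cost)


# Hypothetical award: 10,000 credits, 50 TB stored, 500 compute hours per month.
baseline = months_funded(10_000, 50, 20.0, 500, 1.0)     # 1,500 credits/month
# Same usage after a hypothetical 25% price increase across the board.
after_rise = months_funded(10_000, 50, 25.0, 500, 1.25)  # 1,875 credits/month

print(baseline, after_rise)
```

Under these assumed numbers the same award lasts one month less after the price rise, which is the sense in which the model depends on stable or declining prices.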

Minimum set of requirements for:
  Business relationships (reseller, investigators)
  Interfaces (upload, download, manage, compute)
  Capacity (storage, compute)
  Networking and Connectivity
  Information Assurance
  Authentication and authorization
Likely to be reviewed self-certification in pilot phase
A conformant cloud ≠ an IaaS provider

Likely to evolve into multiple ‘Levels of Compliance’ corresponding to increasing degrees of making data/software meet ‘FAIR’ criteria.
Some of our current thinking for basic compliance:
  Objects are physically or logically available in the Commons
  Objects are indexed with a usable identifier
  Objects have basic search metadata attached to index entries
  Objects have clear access rules
  Objects have basic semantic metadata available
Higher levels could include:
  Objects indexed with standards-based identifiers (ORCID, DOI, etc.)
  Objects are open to the public (or as open as reasonable given data type)
  Objects conform to agreed-upon standards (CDISC, DICOM, etc.)
  Data objects are accessible via standard APIs
  Software is encapsulated (containers, other technology) for easier usage
We want and need your feedback on these matters!
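One way to picture the compliance criteria above is as a checklist evaluated per object. The following is a minimal sketch, not a Commons specification: the class, field names, and level labels are all hypothetical, and the several possible higher levels are collapsed into a single "higher" tier for brevity.

```python
from dataclasses import dataclass


@dataclass
class DigitalObject:
    """Hypothetical record of a Commons object; field names are illustrative."""
    kind: str = "data"                        # "data" or "software"
    available_in_commons: bool = False
    has_usable_identifier: bool = False
    has_search_metadata: bool = False
    has_access_rules: bool = False
    has_semantic_metadata: bool = False
    has_standard_identifier: bool = False     # e.g. a DOI
    is_open: bool = False
    follows_community_standard: bool = False  # e.g. CDISC, DICOM
    has_standard_api: bool = False            # checked for data objects
    is_encapsulated: bool = False             # checked for software


BASIC = ("available_in_commons", "has_usable_identifier",
         "has_search_metadata", "has_access_rules", "has_semantic_metadata")
HIGHER_COMMON = ("has_standard_identifier", "is_open",
                 "follows_community_standard")


def unmet(obj, criteria):
    """List the criteria names that the object does not satisfy."""
    return [name for name in criteria if not getattr(obj, name)]


def compliance_level(obj):
    """Return 'non-compliant', 'basic', or 'higher' for an object."""
    if unmet(obj, BASIC):
        return "non-compliant"
    extra = "has_standard_api" if obj.kind == "data" else "is_encapsulated"
    if unmet(obj, HIGHER_COMMON + (extra,)):
        return "basic"
    return "higher"


basic_fields = {name: True for name in BASIC}
dataset = DigitalObject(kind="data", **basic_fields)
print(compliance_level(dataset), unmet(dataset, HIGHER_COMMON))
```

Returning the list of unmet criteria, rather than only a level, would let an investigator see what separates an object from the next tier.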

Phase 0: Build the plumbing
Phase 1: Pilot the model on a small number of investigators experienced with cloud computing, probably within the context of BD2K awards
Phase 2: Open the Commons credit process to grantees from a subset of NIH Institutes and Centers
Phase 3: Open the process to all NIH grantees

Approved March 23, 2015
“In light of the advances made in security protocols for cloud computing in the past several years and given the expansion in the volume and complexity of genomic data generated by the research community, the National Institutes of Health (NIH) is now allowing investigators to request permission to transfer controlled-access genomic and associated phenotypic data obtained from NIH-designated data repositories under the auspices of the NIH Genomic Data Sharing (GDS) Policy to public or private cloud systems for data storage and analysis.”
Responsibility for ensuring the security and integrity remains with the institution.

Sensor Stream = 500 EB/day; stores 69 TB/day
Collection = 14 EB/day; stores 1 PB/day
Total Data = 14 PB; stored an average of 3.3 TB/day for 10 years!
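As a rough check on the figures above, the arithmetic can be sketched as follows. Decimal SI units (1 EB = 1,000 PB = 1,000,000 TB) and 365-day years are assumptions; the slide does not state its unit or rounding conventions, and the variable names are illustrative.

```python
TB_PER_PB = 1_000        # decimal SI units assumed
TB_PER_EB = 1_000_000


def retained_fraction(stored_tb_per_day, produced_eb_per_day):
    """Fraction of the daily data produced that is actually stored."""
    return stored_tb_per_day / (produced_eb_per_day * TB_PER_EB)


# 69 TB/day kept out of a 500 EB/day sensor stream.
sensor_fraction = retained_fraction(69, 500)
# 1 PB/day kept out of a 14 EB/day collection.
collection_fraction = retained_fraction(1 * TB_PER_PB, 14)
# 3.3 TB/day accumulated over 10 years, expressed in PB.
ten_year_total_pb = 3.3 * 365 * 10 / TB_PER_PB

print(f"{sensor_fraction:.2e} {collection_fraction:.2e} {ten_year_total_pb:.1f} PB")
```

Under these assumptions, both high-rate sources keep only a tiny fraction of what they produce, and 3.3 TB/day over ten years comes to about 12 PB, the same order as the 14 PB total quoted.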

NIH Office of ADDS: Vivien Bonazzi, Ph.D.; Philip Bourne, Ph.D.; Michelle Dunn, Ph.D.; Mark Guyer, Ph.D.; Jennie Larkin, Ph.D.; Leigh Finnegan; Beth Russell
NCBI: Dennis Benson, Ph.D.; Alan Graeff; David Lipman, MD; Jim Ostell, Ph.D.; Don Preuss; Steve Sherry