1
Scientific Data: A View from the US
George O. Strawn nitrd.gov
2
Caveat auditor
The opinions expressed in this talk are those of the speaker, not the U.S. government.
3
Three faces of data
Big Data research initiative
Open access becomes the default for U.S. government data
Public access mandated for "scientific results" supported by the U.S. government
4
Big Data
White House, multi-agency research initiative
Basic research, disciplinary data, education and training, prizes and competitions
Joint solicitation by NIH and NSF
NIH: BD2K program, Associate Director for Data Science
5
Data.gov
Open access to U.S. government data
"Voluntary" data.gov participation has yielded ~100,000 data sets to date
A new version of data.gov, to be unveiled soon, utilizes CKAN, "an open source data portal"
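Since the slide names CKAN as the portal software, a small sketch may help show what machine access to such a catalog can look like. It uses CKAN's Action API `package_search` call. The base address `https://catalog.data.gov`, the helper names, and the sample response are assumptions for illustration only. The sample data is made up and is not taken from the talk or from data.gov.

```python
from urllib.parse import urlencode

# Assumption: the portal exposes the standard CKAN Action API under /api/3/action/.
CKAN_BASE = "https://catalog.data.gov"  # illustrative address, not stated in the talk


def package_search_url(query, rows=5, base=CKAN_BASE):
    """Build a CKAN package_search URL for a free-text query."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base}/api/3/action/package_search?{params}"


def summarize(response):
    """Return (total matches, [(title, sorted resource formats), ...])."""
    if not response.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    result = response["result"]
    rows = []
    for dataset in result["results"]:
        formats = sorted(
            {r["format"] for r in dataset.get("resources", []) if r.get("format")}
        )
        rows.append((dataset.get("title") or dataset.get("name"), formats))
    return result["count"], rows


# Made-up response in the shape package_search returns (hypothetical data sets).
SAMPLE = {
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {
                "name": "example-tide-gauges",
                "title": "Example tide gauge readings",
                "resources": [{"format": "JSON"}, {"format": "CSV"}],
            },
            {"name": "example-untitled", "resources": []},
        ],
    },
}

if __name__ == "__main__":
    print(package_search_url("sea level"))
    print(summarize(SAMPLE))
```

To query a live portal, the URL from `package_search_url` could be fetched with any HTTP client and the decoded JSON passed to `summarize`.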
6
Public Access to Scientific Results
Both journal articles and data
Public access to journal articles pioneered by Harold Varmus at NIH
Semantic access to Medline abstracts pioneered by Tom Rindflesch at NLM
7
Public Access to Scientific Data
Federal agencies have submitted their "initial plans" for public access to scientific data to OSTP
NITRD may host a series of talks by the agencies on their data access plans
Plans for articulating USG scientific data and USG-supported scientific results still in process
8
Some issues regarding data access
Disciplinary versus multi-disciplinary, agency versus multi-agency repositories
Plain (human) access versus semantic (machine) access
A general digital object architecture?
Degrees of openness
9
Digital Object Architecture
An "hour glass" for data? (As the Internet was an hour glass for networks: TCP/IP at the narrow point; many applications above, many implementations below)
10
Digital Object Architecture
Digital Object Data Model & Protocol
  Logical interface to heterogeneous information management and storage systems
  Built-in strong authentication and encryption
Digital Object Repository
  Implements the digital object data model and protocol
  Portal into multiple information and storage systems
  Security is at the object level, and objects can be securely shared
  Current version successfully used by industry and government
Handle System
  Highly scalable identifier resolution system for digital objects
  Provides referential integrity as objects move and environments change
  Proven and in wide use
Digital Object Registry
  Manages metadata records about resources
  Assigns handles to metadata records and resources
  Normalizes organizational boundaries through commonly agreed APIs and metadata models
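The referential-integrity point is easier to see in a toy model: a handle keeps resolving to the same object after the object moves between repositories. This is a minimal in-memory sketch, not the actual Handle System protocol or any real repository API. All class names, the `move` helper, and the handle string are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    handle: str                      # persistent identifier, e.g. "prefix/suffix"
    payload: bytes                   # the data itself
    metadata: dict = field(default_factory=dict)


class ToyRepository:
    """Stores digital objects by handle (stand-in for a digital object repository)."""

    def __init__(self, name):
        self.name = name
        self._objects = {}

    def store(self, obj):
        self._objects[obj.handle] = obj

    def fetch(self, handle):
        return self._objects[handle]

    def remove(self, handle):
        return self._objects.pop(handle)


class ToyHandleResolver:
    """Maps each handle to the repository currently holding the object."""

    def __init__(self):
        self._locations = {}

    def register(self, handle, repository):
        self._locations[handle] = repository

    def resolve(self, handle):
        return self._locations[handle]


def move(handle, resolver, destination):
    """Relocate an object and update the resolver so the handle stays valid."""
    source = resolver.resolve(handle)
    destination.store(source.remove(handle))
    resolver.register(handle, destination)


if __name__ == "__main__":
    agency_a, agency_b = ToyRepository("agency-a"), ToyRepository("agency-b")
    resolver = ToyHandleResolver()
    obj = DigitalObject("example-prefix/dataset-1", b"1,2,3", {"title": "Demo"})
    agency_a.store(obj)
    resolver.register(obj.handle, agency_a)
    move(obj.handle, resolver, agency_b)
    # The same handle now resolves to the new location.
    print(resolver.resolve(obj.handle).name)
```

Clients hold only the handle, never a storage location, so relocation requires updating one resolver entry rather than every reference.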
11
Measuring openness
Ease of discovery (googleable, etc.)
Ease of use
Extent of reusability
Legal matters (e.g., CC license, derived-works friendly, etc.)
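The talk lists criteria but gives no formula, so the following is only one hypothetical way to turn them into a number. It assumes each criterion is rated from 0 to 1 and combines the ratings as a weighted mean. The 0 to 1 scale, the equal default weights, and all names are assumptions, not the speaker's method.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class OpennessRatings:
    """Ratings in [0, 1] for the four criteria named on the slide."""
    discovery: float     # ease of discovery
    use: float           # ease of use
    reusability: float   # extent of reusability
    legal: float         # legal matters (licensing, derived works)


def openness_score(ratings, weights=None):
    """Weighted mean of the ratings; weights default to equal (an assumption)."""
    values = asdict(ratings)
    for name, value in values.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    if weights is None:
        weights = {name: 1.0 for name in values}
    total_weight = sum(weights[name] for name in values)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[name] * values[name] for name in values) / total_weight


if __name__ == "__main__":
    # Hypothetical data set: easy to find, moderately usable, no clear license.
    print(openness_score(OpennessRatings(1.0, 0.5, 0.5, 0.0)))
```

Passing a `weights` dictionary lets a reviewer emphasize one criterion, for example legal reusability, without changing the scale of the result.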
12
Sustainability
Could we duplicate the Internet story?
Public investments create a new activity
The new activity leads to a new industry
The new industry leads to novel use cases
13
In conclusion
Data Intensive Science aspirations are here
Data Intensive Science is slowly emerging
One result will be to make the scientific record into a first-class scientific object