Published by Damian Newton. Modified over 8 years ago.
1
eScience Institute, Edinburgh - 14-06-07
What is Data Anyway? Findings from the StORe Project
John MacColl, Edinburgh University Library
2
Structure
1. Thoughts about data and business models
2. StORe and its findings
3. Curators, publishing and libraries: emerging roles and models
3
The bioscientist's question
Need we polarise (source and outputs)?
- Prose is problematic
- Structured data publication is more accurate
- Relying on the literature Google-ises access to results
- It is not scientific
4
Instead...
- Mark up statistical results (i.e. enrich the data)
- Publish directly on the web in XML databases
- Link to commonly agreed domain ontologies
- (Publish the papers as well)
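The proposal to mark up statistical results and link them to domain ontologies can be illustrated with a minimal Python sketch. The element names (`statisticalResult`, `variable`, `statistic`, `pValue`), the `ontologyRef` attribute, the placeholder ontology URI and the input numbers are all hypothetical; the talk names no schema, ontology or dataset.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: the element/attribute names and this URI are
# illustrative placeholders, not a real schema or ontology.
ONTOLOGY_BASE = "http://example.org/ontology#"


def mark_up_result(test: str, statistic: float, df: int, p_value: float,
                   variable: str, ontology_term: str) -> str:
    """Serialise one statistical result as enriched (marked-up) XML."""
    result = ET.Element("statisticalResult", {"test": test})
    # Link the measured variable to an agreed domain ontology term.
    var = ET.SubElement(result, "variable",
                        {"ontologyRef": ONTOLOGY_BASE + ontology_term})
    var.text = variable
    ET.SubElement(result, "statistic", {"df": str(df)}).text = str(statistic)
    ET.SubElement(result, "pValue").text = str(p_value)
    return ET.tostring(result, encoding="unicode")


# Made-up values, for illustration only.
xml_text = mark_up_result("t-test", 2.31, 48, 0.025, "body mass", "BodyMass")
print(xml_text)
```

Parsing the string back with `ET.fromstring` recovers each value together with its ontology link, which is the kind of machine-readable access the slide contrasts with results reported only in prose.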
5
But...
- Where is the business model?
- Who pays?
- Who does the curation?
- Libraries?
6
StORe: the Mission ('Superware?')
[Diagram: Source | Middleware | Output]
7
The Survey
8
Two-Way Links?
85% support the project aims as potentially advantageous to the conduct of research:
- “Would save so much time, making research more productive”
- “Would allow reanalysis as new methods emerged”
- “Integration of multiple data sets from different publications”
9
Open Access?
- “It should be a requirement that data from publicly funded research is freely available”
- “Restrict access until results are published to prevent data scavenging”
- “A creditable aspiration but without a data administrator this represents a large burden from editing, compiling and sanctioning release”
10
Data management?
- 75% generate and use ‘complex data sets’
- Storage of unique and original research on PCs and laptops is commonplace
- Access influenced by perceived absence of adequate protection and need for interpretation
  - “Data is held on secured CDs in encrypted format with only an identifying code. The codebook is kept physically separate.”
- Data volume and lack of time/experience compound issues of data ownership
  - “I’m not encouraged or discouraged from providing data. It just does not justify the effort.”
11
Repositories?
- Development and use: culture of self-sufficiency; broad range of effectiveness
- Some sophistication, but limited understanding and familiarity also found across the sector
  - “Very happy with what we have in astronomy…please don’t mess with them for the sake of some aesthetic..”
  - “... my understanding of this topic is so limited that giving any answer would be trivial”
12
Metadata?
- Appropriate assignment acknowledged to be:
  - Critical
  - Demanding (intellectually and in the time it requires)
- Consensus on the need for good metadata does not necessarily translate into its provision:
  - High level of self-assignment
13
Scientists and Metadata: the Facts

Who assigns metadata                                          | Count
I decide which terms to use and I assign them                 | 212
Research colleagues assign metadata on the team's behalf      | 55
Research support staff assign metadata on the team's behalf   | 22
Metadata are assigned by library/information services staff   | 4
Metadata are assigned by the repository administrators        | 37
Metadata are generated automatically                          | 63
It is not known who assigns metadata                          | 68
Other (please specify)                                        | 37
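The "high level of self-assignment" noted on the Metadata? slide can be checked against these counts with a short Python calculation. It assumes respondents could choose more than one option (the slide does not say), so the shares are of all responses rather than of respondents; the option labels are shortened from the table.

```python
# Counts transcribed from the "Scientists and Metadata: the Facts" slide.
# Assumption: multiple options could be chosen, so shares are computed over
# all responses, not over respondents.
counts = {
    "Self-assigned by the researcher": 212,
    "Research colleagues": 55,
    "Research support staff": 22,
    "Library/information services staff": 4,
    "Repository administrators": 37,
    "Generated automatically": 63,
    "Not known": 68,
    "Other": 37,
}

total = sum(counts.values())
shares = {who: n / total for who, n in counts.items()}

# Print options from most to least frequent, with their share of responses.
for who, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{who:<38} {counts[who]:>4} {share:6.1%}")
print(f"{'Total responses':<38} {total:>4}")
```

On these counts, self-assignment accounts for 212 of 498 responses (about 42.6%), while library/information services staff account for 4 (about 0.8%).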
14
Support?
- Lack of awareness of available support
- Disinclination to seek support
- Self-reliance with IT matters
- Prefer online or documentary support
- Discipline knowledge essential
- With few exceptions, management of research data not usually associated with librarians
Yet…
15
- Qualified demonstration of expertise in metadata, data preservation or curation
- Examples of uncritical use of sources
- Both access and sharing restricted by lack of confidence in processes
- Browsing, online help and other features need expert support and maintenance
- Need to boost awareness of opportunities from electronic data management
16
Corroboration
Researchers and discovery services: behaviour, perceptions and needs (RIN report, November 2006)
- “…contact with librarians and information professionals is rare”
- “…researchers are generally confident in their [self-taught] abilities…, librarians see them as …relatively unsophisticated”
- “…librarians see it as a problem that they are not reaching all researchers with formal training, whereas most researchers don’t think they need it”
17
Improving the Research Management Lifecycle: Hypothesis
1. Undertake research/experiment
2. Produce data/commence data curation
3. Publish data ← [and/or paper? - bioscientist]
4. Select and organise evidence
5. Write/assemble paper
6. Submit to peer review
7. Apply revisions and produce final draft
8. Publish paper ← Activate data-publication links [Peer review of data?]
18
Two-way links: benefits and risks
Opportunities to:
- Explore a deeper level of detail
- Validate experiments
- Track the use and improvement of research output
- Identify collaborators
- Confirm completeness of information searches
- Supplement published papers
Potential risks from:
- Uncertainty of peer review
- Premature dissemination
- Subversion of the scholarly paper
- Scavenging
- Lack of interpretive data
19
The StORe Pilot Demonstrator
- Allows publication only if data has been deposited or identified
- Groups items (data and publications) as projects
- Accepts projects based on either primary or secondary analysis
  - Primary analysis creates new data
  - Secondary analysis is based on existing data and may or may not create new data
- Source repository: the UK Data Archive
- Output repository: the LSE’s Research Articles Online
- Pilot federation: includes a test institutional repository at the University of Essex
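The pilot's publication rule and its primary/secondary distinction can be sketched as a small Python data model. The `Project` class, its fields and the exact gating rule are hypothetical, inferred from the bullets above rather than taken from StORe's actual implementation.

```python
from dataclasses import dataclass, field
from enum import Enum


class Analysis(Enum):
    PRIMARY = "primary"      # creates new data
    SECONDARY = "secondary"  # based on existing data; may or may not create new data


@dataclass
class Project:
    """Hypothetical model: a project groups data items and publications."""
    title: str
    analysis: Analysis
    data_used: list[str] = field(default_factory=list)      # IDs of existing data identified
    data_produced: list[str] = field(default_factory=list)  # IDs of newly deposited data
    publications: list[str] = field(default_factory=list)

    def can_publish(self) -> bool:
        # Assumption: a primary analysis must deposit its new data; a secondary
        # analysis must at least identify the existing data it used.
        if self.analysis is Analysis.PRIMARY:
            return bool(self.data_produced)
        return bool(self.data_used)

    def add_publication(self, pub_id: str) -> None:
        # Publication is allowed only once the data has been deposited or identified.
        if not self.can_publish():
            raise PermissionError("deposit or identify the data before publishing")
        self.publications.append(pub_id)
```

For example, `Project("Reanalysis", Analysis.SECONDARY)` refuses `add_publication` until an ID is appended to `data_used`; tying the check to the project, rather than to the article alone, mirrors the slide's grouping of data and publications as one unit.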
20
How StORe works
[Diagram: item flow between three federated repositories]
- University of Essex (Institutional): Researchers' Area
- UKDA (Source): UKDA collaborations; items for review (with IDs of data used); UKDA verifies them (acquisition number added) and approves them (ID added); UKDA's own site holds value-added items in its own house style, with additional metadata and ID
- Research Articles Online (Output): LSE collaborations; items for review (with IDs of data used and data produced); LSE approves them (ID added as part of the process); LSE's own site holds peer-reviewed items in its own house style, with additional metadata and ID
21
How StORe works
At present, search is across metadata in Essex.
Find an article and you can:
- find the associated data
- move to the official versions at LSE or UKDA
- list all articles and related items of the author
Find data and you can:
- find all articles associated with them
- e-mail the data owner to request access to approved but embargoed/private data
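The navigation described above amounts to keeping every article-data link indexed in both directions. A minimal Python sketch, assuming simple string IDs; `LinkIndex`, `Dataset`, `owner_email` and the `mailto:` access route are hypothetical names, not StORe's interface.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    dataset_id: str
    owner_email: str         # hypothetical field: whom to ask for access
    embargoed: bool = False  # approved but embargoed/private


class LinkIndex:
    """Hypothetical two-way index between articles and datasets."""

    def __init__(self) -> None:
        self._data_for_article = defaultdict(set)    # article ID -> dataset IDs
        self._articles_for_data = defaultdict(set)   # dataset ID -> article IDs
        self._articles_by_author = defaultdict(set)  # author -> article IDs
        self._datasets = {}                          # dataset ID -> Dataset

    def add_dataset(self, ds: Dataset) -> None:
        self._datasets[ds.dataset_id] = ds

    def add_article(self, article_id: str, author: str) -> None:
        self._articles_by_author[author].add(article_id)

    def link(self, article_id: str, dataset_id: str) -> None:
        # Record the link in both directions so navigation works either way.
        self._data_for_article[article_id].add(dataset_id)
        self._articles_for_data[dataset_id].add(article_id)

    def data_for(self, article_id: str) -> set:
        """Find an article -> find the associated data."""
        return set(self._data_for_article.get(article_id, ()))

    def articles_for(self, dataset_id: str) -> set:
        """Find data -> find all articles associated with them."""
        return set(self._articles_for_data.get(dataset_id, ()))

    def articles_by(self, author: str) -> set:
        """List all articles of an author."""
        return set(self._articles_by_author.get(author, ()))

    def access_route(self, dataset_id: str) -> str:
        """Embargoed/private data: route the user to the owner by e-mail."""
        ds = self._datasets[dataset_id]
        return f"mailto:{ds.owner_email}" if ds.embargoed else "direct"
```

Writing both directions at link time keeps each lookup a single dictionary access, at the cost of updating two maps whenever a link changes.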
22
How StORe works
A user can search across all or specific collections without being logged in, using a simple Google-type search…
23
…or by employing more advanced options
24
Registered users can also view data within:
- solely owned private or public collections
- collaborative collections within a federated source or output repository
- collections to which they are a contributor
25
StORe: the future?
- Essex as a multidisciplinary institutional repository in a large federation of source and output repositories, plus StORe-enabled institutional repositories at other HE/FE institutions
- If all universities had a StORe system, collaborations could be established between the institutional repository/repositories and all relevant data repositories
- Researchers would use StORe as their single route to deposit and associate data and publications
- They would need curators to assist
26
StORe: one approach to data publishing
- Data are deposited in institutional repositories until accepted by the source repository
- The source repository verifies data authenticity
- Publication depends on data deposit and is subject to output repository controls
- Access to non-public objects can be restricted and requires authentication
- Release of data can be embargoed
Curators should know about it and its alternatives.
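The access rules on this slide can be expressed as a single decision function. This Python sketch assumes that an embargo blocks everyone until its end date and that non-public objects are open only to authenticated users on an explicit access list; `DataObject`, `allowed_users` and `can_access` are hypothetical names, not part of StORe.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DataObject:
    """Hypothetical record for a deposited data object."""
    public: bool
    allowed_users: frozenset[str] = frozenset()  # assumed per-object access list
    embargo_until: Optional[date] = None         # release date, if embargoed


def can_access(obj: DataObject, user: Optional[str], today: date) -> bool:
    """Return True if `user` (None = not logged in) may access `obj` today."""
    # Assumption: an embargo withholds release from everyone until its end date.
    if obj.embargo_until is not None and today < obj.embargo_until:
        return False
    if obj.public:
        return True
    # Non-public objects: require authentication and a place on the access list.
    return user is not None and user in obj.allowed_users
```

Checking the embargo first means a release date can be set on any object, public or not, without changing who will eventually be allowed in.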
27
Curators link across paradigms
- Curation embraces appraisal, description, preservation, rights management...
- It requires understanding of science at its domain levels
- Good curation is key to:
  - achieving more rapid discovery
  - achieving more efficient research
  - preventing bad practice
  - preventing lazy supervision
  - preventing sloppy analysis
  - preventing fraudulent claims
28
Scientific Publication
As prose:
- Journals provided good practice for a long time
- This is no longer true, hence reinvention (Open Access, Creative Commons, SPARC, etc.)
- There was a business model, which is now broken
As data:
- Has never existed
- Is beginning to be invented
- There is no business model
29
Libraries?
- Exist to permit sharing
- Protect fragile business models
- Sustain otherwise uneconomic enterprises
- Financial role will be to pay for publication, embracing prose and data, and subsuming curation
- Must help researchers to publish their data
- Because the prose-data hierarchy is being gradually subverted (and peer review will ultimately require it)
30
Thank You