Digital Curation Centre research agenda


Digital Curation Centre research agenda
A centre of expertise in data curation and preservation

Michael Day
Digital Curation Centre
UKOLN, University of Bath
http://www.ukoln.ac.uk/

The Digital Curation Centre: a NIEeS community awareness day, Centre for Mathematical Sciences, Cambridge, UK, 16 June 2005

DCC research

The DCC research team
- Led by Professor Peter Buneman (School of Informatics, University of Edinburgh)
- Distributed throughout all four DCC partner organisations
- Strong links with other DCC components, through multi-team working, etc.

Links with other research groups
- Visitors programme

Research Agenda

The DCC research team is led by Professor Peter Buneman and has four main goals:
- To draw together the various functions of curation, from the traditional archival functions to the maintenance and publication of evolving knowledge as seen in scientific databases.
- To identify, through direct research collaboration and through interaction with the service arm of the DCC, the key projects in which research is needed.
- To conduct research in areas already identified by the partners as crucial to digital curation.
- To institute two-way conduits between research and service, in which practical issues can be drawn to the attention of researchers and the products of research can be tested in practice.

Current research priorities are:
- Data integration and publication
- Performance and optimisation
- Annotation
- Appraisal and long-term preservation
- Socio-economic and legal context: rights, responsibilities and viability
- Cost-benefit analysis of the data curation process
- Security: safe and effective data analysis environments
- Automation of metadata extraction

The DCC hosts a Visitors Programme, in which those engaged in cutting-edge research are brought to the UK to disseminate their findings and engage with DCC staff. See upcoming and previous events for more information. If you have any questions, comments, or offers of collaboration, contact the research team by sending an e-mail to research@dcc.ac.uk.

Research goals (1)
- To draw together the various functions of curation, from the traditional archival functions to the maintenance and publication of evolving knowledge as seen in scientific databases
- To conduct research in areas already identified by the partners as crucial to digital curation

Research goals (2)
- To identify, through direct research collaboration and through interaction with the service arm of the DCC, the key projects in which research is needed
- To institute two-way conduits between research and service, in which practical issues can be drawn to the attention of researchers and the products of research can be tested in practice

Current priorities (1)

Data integration and publication
- Review of techniques
- Publishing data that conforms to a given format or schema

Performance and optimisation
- Safe data analysis environments within data centres
- Initial testbed based on sky survey databases (in collaboration with the Wide Field Astronomy Unit and AstroGrid)

Data integration and publication

Review of techniques
A report is to be delivered within the first year. A special emphasis of this project will be to look at integration techniques in the context of digital preservation metadata.

Enhanced publishing systems
A new form of data integration arises from the need to publish data that conforms to some format or schema. Typically, a community supports several data resources and wants to publish one or more integrated views of these resources. A longer-term research goal is to provide some synthesis between these projects, with the aim of building in constraints, security features and tools for coping with schema evolution.

Performance and optimisation

A safe and effective data analysis environment in the data centre
We propose to address this problem in collaboration with the Wide Field Astronomy Unit and AstroGrid, by developing a testbed system based on the SuperCOSMOS and WFCAM Science Archives, two TB-scale sky survey databases created and curated in Edinburgh.

Deliverables:
- Year 1: A report assessing the scientific requirements for developing a safe data analysis environment and reviewing existing work in this area.
- Year 2: Development and deployment of a testbed data analysis system based on the SuperCOSMOS and WFCAM Science Archives.
- Year 3: The testbed data analysis system will be generalised, in collaboration with partners in other communities (e.g. bioinformatics, geological sciences) where this problem is important.
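As a rough illustration of publishing an integrated view that conforms to a schema, the sketch below maps records from two differently structured resources onto one target record shape and checks each record before it is published. It is a minimal sketch only: the field names, the two source layouts and the functions `from_resource_a`, `from_resource_b`, `conforms` and `publish_view` are all hypothetical and are not taken from any DCC project.

```python
# Hypothetical target schema for the published, integrated view.
TARGET_FIELDS = {"object_id": str, "ra_deg": float, "dec_deg": float}


def from_resource_a(row: dict) -> dict:
    """Map a record from (hypothetical) resource A, which already uses degrees."""
    return {"object_id": f"A:{row['id']}", "ra_deg": row["ra"], "dec_deg": row["dec"]}


def from_resource_b(row: dict) -> dict:
    """Map a record from (hypothetical) resource B, which stores RA in hours."""
    return {
        "object_id": f"B:{row['name']}",
        "ra_deg": row["ra_hours"] * 15.0,  # 1 hour of right ascension = 15 degrees
        "dec_deg": row["dec_deg"],
    }


def conforms(record: dict) -> bool:
    """Check that a record has exactly the target fields with the expected types."""
    return set(record) == set(TARGET_FIELDS) and all(
        isinstance(record[name], kind) for name, kind in TARGET_FIELDS.items()
    )


def publish_view(sources: list) -> list:
    """Build one integrated view from (converter, rows) pairs, rejecting bad records."""
    view = []
    for convert, rows in sources:
        for row in rows:
            record = convert(row)
            if not conforms(record):
                raise ValueError(f"record does not conform to target schema: {record}")
            view.append(record)
    return view


view = publish_view([
    (from_resource_a, [{"id": 1, "ra": 10.5, "dec": -3.0}]),
    (from_resource_b, [{"name": "x9", "ra_hours": 2.0, "dec_deg": 45.0}]),
])
```

Keeping the schema check separate from the per-resource converters means a change to the target schema only has to be expressed once, which is one simple way of thinking about the schema evolution problem mentioned in the notes.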

Current priorities (2)

Performance and optimisation (continued)
- Automated metadata extraction and generation
- Essential for testing the scalability of metadata-based preservation strategies
- Review of tools, assessment of text mining techniques

Annotation
- Survey of the forms of annotation
- Annotation and provenance
- A model for data transformations that maintains annotation and provenance

Automation of metadata extraction
These issues are important in the scientific domain, as well as within the digital library community. In many disciplines vast quantities of data products are archived, and adequate metadata must be made available to make their subsequent retrieval efficient. Our research into this topic will focus on a review of currently available tools for automatic metadata extraction, together with an assessment of how text mining techniques may be applied to this problem.

Deliverables:
- Year 1: A study of automated metadata extraction to aid long-term digital preservation, and the use of text mining techniques in digital curation.
- Year 2: A report summarising existing tools available for automated metadata extraction and their place in the curation process, together with an assessment of what further tools should be developed.

Annotation

Forms of Annotation
A survey is needed of the various forms of annotation. Of particular interest will be the extent to which forms of annotation can be predicted when metadata formats or databases are designed. We know from examples that this is not always possible, and in these cases the difficulty of subsequently attaching annotations will be investigated.

Annotation and Provenance
The grand challenge here is to develop a model for data transformations in which annotation and provenance are fully described. BIO-DAS, for example, is a system in which annotation is carried with provenance of data items. This project will also investigate annotation of special (e.g. spatial) structures and the attachment of annotation to data.
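To make the idea of a transformation that maintains annotation and provenance more concrete, here is a minimal sketch in Python. It is an illustration under simple assumptions, not the model proposed by the DCC or the BIO-DAS design: the `Item` class and the `transform` and `combine` functions are hypothetical, annotations are plain strings, and provenance is just an ordered list of source and step names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A data value carrying its annotations and provenance (hypothetical model)."""
    value: float
    annotations: tuple = ()
    provenance: tuple = ()


def transform(item: Item, fn, step: str) -> Item:
    """Apply fn to one item, keeping its annotations and recording the step."""
    return Item(fn(item.value), item.annotations, item.provenance + (step,))


def combine(items: list, fn, step: str) -> Item:
    """Derive one item from several, merging annotations and provenance in order."""
    # dict.fromkeys removes duplicates while preserving first-seen order.
    annotations = tuple(dict.fromkeys(a for it in items for a in it.annotations))
    sources = tuple(dict.fromkeys(p for it in items for p in it.provenance))
    return Item(fn([it.value for it in items]), annotations, sources + (step,))


a = Item(20.0, ("calibrated",), ("sensor-A",))
b = Item(22.0, ("estimated",), ("sensor-B",))

a_fahrenheit = transform(a, lambda c: c * 9 / 5 + 32, "celsius_to_fahrenheit")
mean = combine([a, b], lambda values: sum(values) / len(values), "mean")
```

In this sketch every derived item can still answer two questions: which remarks were attached to its inputs, and which sources and steps produced it. A fuller model would also need to decide whether an annotation remains valid after a given transformation, which this sketch deliberately ignores.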

Current priorities (3)

Appraisal and long-term preservation
- Appraisal techniques: investigating the applicability and scalability of traditional appraisal techniques in 'data-intensive' contexts
- Dynamic databases: preservation techniques for evolving metadata and databases

Appraisal and long-term preservation

A study of appraisal techniques
This study addresses the questions of when and how to retain and preserve data. It requires a two-way communication of expertise between the "library/archives" and "scientific database" components of the proposal. The flood of raw scientific data will defeat Moore's law, and some form of appraisal of experimental scientific data is essential. This is an area in which library and archival expertise may be of use to scientists.

Development and field-testing of database and dynamic data set preservation software
We also need to archive dynamic data: the fluid datasets that constitute preservation metadata and much scientific data (especially annotation data). Recent work has shown that this kind of data has certain properties that allow all past versions of the database to be efficiently preserved. Tests on existing scientific data sets indicate that all versions of such a database over a year can be stored in an XML file that is typically 10-15% larger than a single version of the data. The frequency of archiving is limited only by the speed of the algorithm, which, in its basic form, is the time to scan the archival file and the most recent version. The method has other useful properties: it interacts well with compression techniques, and, with the appropriate indexing structures, it permits temporal queries on objects in the file.

Development of preservation techniques for evolving metadata and databases
We also need to investigate issues such as how the interpretation (meaning) of database attributes changes when the database is active over very long periods of time. Some e-Science databases will continue to grow either for long but fixed periods (e.g. possibly sky surveys?), or indefinitely (e.g. datasets of genetic information). In that time, it is likely that concepts and attitudes will change, and this may mean the data is interpreted differently. Sometimes this will add value (new meanings discovered in old data); sometimes it will obscure value. As an example from another field, the meaning of a credit in a University student record system will change over time. Explicit recognition must be made of this.
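The idea of keeping all past versions of a database in a single archive that is only slightly larger than one version can be illustrated with a small sketch. The code below is a simplification under stated assumptions, not the algorithm or the XML format referred to above: it assumes each record has a stable key, stores each distinct value once together with the range of versions in which it was present, and uses a hypothetical `VersionArchive` class held in memory rather than a file.

```python
from dataclasses import dataclass, field


@dataclass
class VersionArchive:
    """Hypothetical merged archive: one entry per key, not one copy per version."""

    # key -> list of [value, first_version, last_version] runs
    history: dict = field(default_factory=dict)
    latest: int = 0

    def add_version(self, snapshot: dict) -> int:
        """Merge a full snapshot of the database into the archive."""
        self.latest += 1
        version = self.latest
        for key, value in snapshot.items():
            runs = self.history.setdefault(key, [])
            if runs and runs[-1][0] == value and runs[-1][2] == version - 1:
                runs[-1][2] = version  # unchanged since last version: extend the run
            else:
                runs.append([value, version, version])  # new or changed value
        return version

    def as_of(self, version: int) -> dict:
        """Temporal query: reconstruct the database as it was at a given version."""
        return {
            key: value
            for key, runs in self.history.items()
            for value, first, last in runs
            if first <= version <= last
        }

    def stored_values(self) -> int:
        """Number of values actually stored, across all versions."""
        return sum(len(runs) for runs in self.history.values())


archive = VersionArchive()
archive.add_version({"P1": "kinase", "P2": "ligase"})
archive.add_version({"P1": "kinase", "P2": "ligase (revised)"})
archive.add_version({"P1": "kinase", "P3": "transferase"})
```

Here three versions containing six values in total are held as four stored values, because the unchanged record `P1` is stored once with the version range 1 to 3. When most records are unchanged between versions, the archive grows slowly, which is the intuition behind the small size overhead reported above; the actual figures depend on the data and on the real archiving method.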

Current priorities (4)

Socio-economic and legal contexts
- Networks of trusted repositories: varying preservation role for repositories; roles for co-operation, exchange formats, replication, etc.
- Economic cost-benefit analysis of curation processes: quantifying costs and benefits; testing the economic viability of curation processes

Socio-economic and legal context: rights, responsibilities and viability

The organisational dynamics of a network of trusted repositories
In the near future, there is likely to be a variety of different trusted repositories that will need to interact both with each other and with their designated communities. One of the key roles of the DCC will be to help co-ordinate effort between these organisations.

The first stage of this will be a research study examining the organisational dynamics of trusted repositories and how a future network of repositories might function in the UK higher and further education and research contexts and in the wider global network. This would identify the full range of potential stakeholders and propose ways in which they could co-operate in order to prevent duplication of effort, e.g. on common technical approaches to the curation of digital data, repository certification, etc. The study would also help initiate a debate on the long-term curation role of institution-based repositories and how they might link with services based at national or international level. For example, this might include the replication of data between repositories or the development of policies that deal with institutional impermanence. The project will produce a report on the organisational dynamics of a network of trusted repositories.

Economic cost-benefit analysis of the data curation process
Research is needed to help funding bodies, and others, to quantify the costs as well as the benefits of data curation. The OAIS Reference Model stresses the importance of the "Designated User Community/Communities" and the need to understand their knowledge bases in determining what is needed to preserve and curate information in a useful way. Ontologies and data models form part of this, but just as important are the expected levels of knowledge which users might have, and the "Gödel ends". It is likely that digital curation to the level encouraged by the DCC will imply additional costs, which must be estimated. Benefits are more difficult to quantify; for research funding bodies, publication and citation statistics would be obvious ones. Other measures must be constructed in discussion with the funding bodies and other stakeholders. It is likely that a range of such measures will be needed to suit a variety of types of archives.

Economic analysis and modelling techniques will be applied in this research study. The information obtained will provide evidence for the economic viability of the data curation process and the associated data repositories. Where appropriate, additional expertise in economic analysis and modelling methods will be sought to complement the expertise within the Consortium. The project will produce a preliminary analysis of the economic costs and benefits of the data curation process.

Current priorities (5)

Socio-economic and legal contexts (continued)
- Rights and responsibilities
- The legal contexts of curation, e.g. impacts of the Database Directive
- Complexity of rights held in databases, impacts on aggregation and reuse of data

Rights and responsibilities
Where tools are built specifically to enable aggregation and re-use of existing works and data, issues of IP ownership and exploitation of the results of aggregation become difficult to resolve. Whereas rights management systems may be able to deal with some ownership issues, complexity can quickly develop where different ownership, access and re-use conditions apply. Pressure to release and further share results may be impeded by IPRs. Upstream IPR claims may inhibit downstream exploitation. How could the tools, and more particularly the legal framework, enable research and development whilst at the same time ensuring that the rights of third parties are not compromised? Where might the balance be struck between respecting existing rights and the general public interest in furthering research and development, and what strategies might assist in achieving that goal? The project will produce a scoping report identifying intellectual property rights and responsibilities in the development of digital curation tools and their uses.

Further information
Digital Curation Centre Web site: http://www.dcc.ac.uk/
Contact: research@dcc.ac.uk