Liz Lyon Associate Director, Outreach Chris Rusbridge, DCC Director

Presentation transcript:

UK Digital Curation Centre: One Year On. Liz Lyon, Associate Director, Outreach; Chris Rusbridge, DCC Director. Digital Curation Centre: a centre of support for data curation and preservation

Overview Why is digital curation important? What are the challenges that the DCC faces? About the people and our collaborative approach Addressing the issues How can you contribute to the DCC?

Curation? “maintaining and adding value to a trusted body of digital information for current and future use”

Digital curation continuum For later use? In use now (and the future)? Static Dynamic Data preservation Data curation

Assuring permanent access to the records of science & the humanities? Long term access to primary data Increasing data volumes from eScience and Grid-enabled / cyberinfrastructure applications Changing research paradigm: data-driven science, “big science” Observational data, simulations, large-scale experimentation Multi-media resources, statistical data, surveys, geo-spatial data……

Facilitate “post-processing” and knowledge extraction Enable the acquisition of newly-derived information and knowledge Run complex algorithms over primary datasets Mining (data, text, structures) Modelling (economic, climate, mathematical, biological) Analysis (statistical, lexical, pattern matching, gene) Presentation (visualisation, rendering)

Provide additional functionality beyond digital preservation processes Annotations Gene and protein sequences e-Lab books (Smart Tea Project in chemistry)

The scholarly knowledge cycle: linking research data to publications. eBank UK Project http://www.ukoln.ac.uk/projects/ebank-uk/ [Diagram labels] Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media. Research & e-Science workflows. Data analysis, transformation, mining, modelling. Data curation: databases & databanks. Repositories: institutional, e-prints, subject, data, learning objects. Deposit / self-archiving. Validation. Publication. Peer-reviewed publications: journals, conference proceedings. Linking. Harvesting metadata. Aggregator services: national, commercial. Searching, harvesting, embedding. Presentation services: subject, media-specific, data, commercial portals. Resource discovery, linking, embedding. Emerging policy on open access to data.

DCC people (some of them…) Management & Co-ordination Director Chris Rusbridge (University of Edinburgh) Community Support & Outreach Led by Dr Liz Lyon (UKOLN, University of Bath) Service Definition & Delivery Led by Professor Seamus Ross (HATII [ERPANET], University of Glasgow) Development Led by Dr David Giaretta (Astronomical Software & Services, CCLRC) Research Led by Professor Peter Buneman (Informatics, University of Edinburgh)

The challenges we face Standards Interoperability issues: technical & hopefully soluble Scale Volume and diversity of datasets Culture Bringing communities together Library/information science/archives “document tradition” Domain research (chemists, astronomers, biologists) Computer science (databases) Commercial suppliers (storage technology)

More challenges…… Process Highly-distributed organisation: use collaborative tools Skills Distributed amongst the 4 partners & beyond Engagement Lots of existing work and many significant players Impact Visible & measurable, in the short & long-term Meeting expectations (which are high…..) Of the community and our funders

User requirements analysis Commissioned study Leona Carpenter Reporting now Desk-based research Focus groups Interviews Results will inform research, development, service definition / delivery and outreach Recommendations and priority tasks

Some sound bites… R&D issues: Annotation services, Ontology development, Automating metadata creation, Tools and toolkits, Data Format Description Language, Identifiers, Registries, Economic and cost-benefit studies Advisory services: “Ask-a-Curator”, FAQs, reports, briefings, awareness-raising materials, best practice guidance, Storage media, “Like Erpanet”, advise Government, Research Councils, funding bodies Professional development: Short courses, conferences, seminars, workshops, secondments to DCC and to working repository services Outreach: Leadership for the future, case studies, sharing solutions, collaboration with other partners, international peers, industry links Taxonomy of “Users”

Outline Taxonomy of digital curation users by role 1. Data Creators 2. Data Curators 3. Data Re-users 4. Policy makers, funding bodies, other leaders. Further roles: Data Preservers, Data publishers

Outline Taxonomy by significant function of organisational entity 1. Research 2. Service provision 3. Learning & teaching 4. Funders 5. Policy / strategy makers. Further function: Commercial. “Designated communities”

Service definition & delivery Advisory services Responses to queries—from legal to technical guidance HELPDESK@dcc.ac.uk Site visits (National Institute of Environmental eScience) Information Services Briefing Documents - Freedom of Information by Mags McGinley DIGITAL CURATION MANUAL 20 chapters written by community experts e.g. Metadata written by Michael Day, UKOLN Peer-reviewed Checklist for Compliance with best practices and standards Technology Watch

Services: workshops 2005 Programme Preservation of medical databases: 24-25 May at the Gulbenkian Institute, Lisbon in collaboration with ERPANET & the Wellcome Trust Institutional repositories: 6 July at the University of Cambridge, UK in collaboration with DSpace Cost models in collaboration with the Digital Preservation Coalition July at British Library Persistent identifiers liaising with NISO, summer, UK location tbc

Development approach OAIS (Open Archival Information System) linkage: focus on representation information link to global work on format registries? Concentrate on scientific data formats? Repository Representation Information Standards and Tools Aim for OAIS compliance Persistent identifiers Certification… RLG task force Open development wiki and email list

OAIS Reference Model – Functional Model How relevant to curation?

Representation Net

Representation Information More detail How does this relate to format registries?

High Level View Example of use of Representation Information Labelling

Registry issues? Trusted repository of Representation Information Authenticity of information Access control Certificates/Digests : (are they trustable over the long term?) Findability Persistent IDs What can we rely on? Labels (to support automated processing) Extensibility Distributed
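The digest question above can be made concrete with a small sketch. It is illustrative only and does not show how the DCC registry works: the registry is a plain dictionary, the identifier `ri:example-format` and the functions `register` and `verify` are hypothetical, and SHA-256 is an assumed choice of digest. A digest stored alongside an entry only detects accidental or uncoordinated change, so whether it can be trusted over the long term remains the open question the slide raises.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return a hex digest of the content (SHA-256 is an assumption here)."""
    return hashlib.sha256(data).hexdigest()


def register(registry: dict, identifier: str, rep_info: bytes) -> None:
    """Store Representation Information together with its digest."""
    registry[identifier] = {"content": rep_info, "sha256": digest(rep_info)}


def verify(registry: dict, identifier: str) -> bool:
    """Recompute the digest and compare it with the one recorded at deposit."""
    entry = registry[identifier]
    return digest(entry["content"]) == entry["sha256"]


# Hypothetical usage: deposit an entry, then check it has not changed.
registry = {}
register(registry, "ri:example-format", b"description of a data format")
print(verify(registry, "ri:example-format"))
```

In practice the recorded digest would need to be held or certified separately from the content it protects, which is where the certificate and trust issues listed on the slide come in.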

Registry development Simple PHP prototype Scoping study- unification Formats, standards, tools More robust prototype in development Based on ebXML & JAXR Potentially distributed, cooperative maintenance model

Development Roadmap Registry: complete prototype, link to PRONOM, GDFR etc, handover to service Representation information: describe CCLRC (science) data using EAST, etc Certification work continues Additional tools: metadata extraction Testbeds, interactions with others

Research approaches Publishing & integrating scientific databases ‘Archiving’ past states of volatile databases Database provenance and annotation Organisational dynamics of trusted repositories Automating metadata extraction Cost-benefit analysis of data curation Rights and responsibilities

The database picture Source data Curated data: classified, cleaned, annotated, integrated, cross-linked Source data

Curated Databases are Central Much/most scientific data is now in databases They often do not contain source experimental data. Sometimes just annotation/metadata They borrow extensively from, and refer to, other databases You are now judged by your data as well as your (paper) publications!! These databases are built and maintained with a great deal of human or computational effort. What makes a database? It has internal structure or it changes. Size alone doesn’t qualify

Archiving (preserving) volatile databases How do you preserve something that changes every hour or minute? Important for the scientific record – someone might have cited your data at time t. Current practice Create versions (how often?) Log changes Use diffs Do nothing (common!)
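As a rough illustration of the "log changes" option, the sketch below keeps a change log and replays it to recover the database as it stood at time t. It is a minimal sketch under stated assumptions, not a description of any DCC tool: the class `ChangeLogArchive`, its methods, and the integer timestamps are all hypothetical, and changes are assumed to arrive in time order.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

_MISSING = object()  # marks a deletion, so that None stays a legal value


@dataclass
class ChangeLogArchive:
    """Hypothetical archive that logs changes instead of storing full copies."""
    current: Dict[str, Any] = field(default_factory=dict)
    # Each entry is (time, key, value after the change, or _MISSING if deleted).
    log: List[Tuple[int, str, Any]] = field(default_factory=list)

    def put(self, t: int, key: str, value: Any) -> None:
        self.log.append((t, key, value))
        self.current[key] = value

    def delete(self, t: int, key: str) -> None:
        self.log.append((t, key, _MISSING))
        self.current.pop(key, None)

    def state_at(self, t: int) -> Dict[str, Any]:
        """Replay the log up to and including time t (log assumed time-ordered)."""
        state: Dict[str, Any] = {}
        for when, key, value in self.log:
            if when > t:
                break
            if value is _MISSING:
                state.pop(key, None)
            else:
                state[key] = value
        return state


# Hypothetical usage: recover what a citing author would have seen at time 3.
archive = ChangeLogArchive()
archive.put(1, "P1", "kinase")
archive.put(2, "P2", "unknown")
archive.put(5, "P2", "transporter")
print(archive.state_at(3))
```

Replaying the whole log is slow for long histories; periodic full versions combined with a log between them is one way to trade storage for retrieval time, which is the "how often?" question on the slide.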

Curated databases – some issues Integrating and publishing data so that someone else can use it. Annotating existing data and moving annotations to other databases Provenance: where did this data come from? Archiving: how do you preserve something that is constantly changing?

How do we cite data? A URL or citation to an article is already unsatisfactory. DCC client complaint: “I spend a lot of time searching [electronic documents] for the part that is relevant to the citation.” The problem is much worse when you are citing something in a very large database. How do you use a citation to locate data? How do you ensure that the citation persists? Connections with DB archiving and DOIs
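One way to picture a citation that both locates data and persists is to cite three things together: an identifier for the database, the archived version that was consulted, and a path to the item inside it. The sketch below is an assumption-laden illustration, not an established citation scheme: `DataCitation`, `resolve`, the identifier `exampledb` and the version labels are all hypothetical, and archived versions are modelled as nested dictionaries.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class DataCitation:
    database_id: str        # hypothetical persistent identifier of the database
    version: str            # which archived version was cited
    path: Tuple[str, ...]   # keys leading to the cited item


def resolve(citation: DataCitation,
            archive: Dict[Tuple[str, str], Dict[str, Any]]) -> Any:
    """Follow the citation into the archived version it names."""
    node: Any = archive[(citation.database_id, citation.version)]
    for key in citation.path:
        node = node[key]
    return node


# Hypothetical archive holding two versions of the same database.
archive = {
    ("exampledb", "2005-06-01"): {"proteins": {"P1": {"function": "kinase"}}},
    ("exampledb", "2005-07-01"): {"proteins": {"P1": {"function": "transporter"}}},
}
cite = DataCitation("exampledb", "2005-06-01", ("proteins", "P1", "function"))
print(resolve(cite, archive))
```

The citation resolves to the value as it was when cited, even though the database has since changed, which is why the slide connects citation with database archiving.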

Research approaches Publishing & integrating scientific databases ‘Archiving’ past states of volatile databases Database provenance and annotation Organisational dynamics of trusted repositories Automating metadata extraction Cost-benefit analysis of data curation Rights and responsibilities “Public domain, public interest, public funding” paper Waelde & McGinley

www.dcc.ac.uk

www.ijdc.net Launch planned June/July Peer-reviewed contributions Peter Buneman Editor (research) Production editor Philip Hunter

Sample issue Full papers Invited articles News & views Papers for submission are very welcome!

1st DCC International Conference Location - Bath UK 29-30 September 2005 Keynote speakers Cliff Lynch CNI Graham Cameron European Bio-informatics Institute DCC Research update Social highlights

Associates Network Goals Develop understanding, share best practice, advance research, promote recognition, develop consensus Membership International groups, national bodies, industry partners, funders, research groups, HEIs, FEIs, individuals…… Benefits Early access to R&D outputs, advisory services, training, input to definition and design, community participation Discussion Forum www.dcc.ac.uk Please join us!

[Diagram of organisations] Groupings: Research Councils; HEIs & FE Institutes; International Collaborations; Standards Bodies. Organisations named: CMS-Bristol NIEeS RG BADC BODC ESO IVOA EDG GridPP EGEE Cambridge Leicester Jodrell Bank DPC MIMAS ILRT Council for Museums, Archives & Libraries RDN OCLC So’ton OAI NOF NLA UNC ESA NASA NARA CNES RLG BNSC SDSC NEODC CEH RI NCS CCLRC UKOLN Durham WT-CFG IC Maastricht Oxford DELOS AHDS Microsoft IBM Oracle BT STK DLI (US) NeSC UofE UofG Innogen NHS Capri Dutch NA Swiss NA Urbino Salzburg IBM Almaden JHU CSIRO Caltech CDS NTUA INRIA HUJ UPC Max-Planck IASSIST LDC ACM Data Archive TU Vienna UPenn EBI MRC HGU Kyoto USC GSK Roslin

Acknowledgements Slides from Peter Buneman, David Giaretta and others used with thanks.

How you can help us How does OAIS relate to curation? How do format registries relate to representation information? Who else is working across these areas? What outcomes would you like to see?