Liz Lyon Associate Director, Outreach Chris Rusbridge, DCC Director


1 UK Digital Curation Centre: One Year On
Digital Curation Centre: a centre of support for data curation and preservation
Liz Lyon, Associate Director, Outreach
Chris Rusbridge, DCC Director

2 Overview
Why is digital curation important?
What are the challenges that the DCC faces?
About the people and our collaborative approach
Addressing the issues
How can you contribute to the DCC?

3 Curation? “maintaining and adding value to a trusted body of digital information for current and future use”

4 Digital curation continuum
Data preservation: static, for later use?
Data curation: dynamic, in use now (and the future)?

5 Assuring permanent access to the records of science & the humanities?
Long term access to primary data
Increasing data volumes from eScience and Grid-enabled / cyberinfrastructure applications
Changing research paradigm: data-driven science, “big science”
Observational data, simulations, large-scale experimentation
Multi-media resources, statistical data, surveys, geo-spatial data……

7 Facilitate “post-processing” and knowledge extraction
Enable the acquisition of newly-derived information and knowledge
Run complex algorithms over primary datasets
Mining (data, text, structures)
Modelling (economic, climate, mathematical, biological)
Analysis (statistical, lexical, pattern matching, gene)
Presentation (visualisation, rendering)

9 Provide additional functionality beyond digital preservation processes
Annotations
Gene and protein sequences
e-Lab books (Smart Tea Project in chemistry)

10 Emerging policy on open access to data
[Diagram: The scholarly knowledge cycle: linking research data to publications (eBank UK Project)]
Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media
Research & e-Science workflows
Data analysis, transformation, mining, modelling
Data curation: databases & databanks
Repositories: institutional, e-prints, subject, data, learning objects
Peer-reviewed publications: journals, conference proceedings
Aggregator services: national, commercial
Presentation services: subject, media-specific, data, commercial portals
Flows between the boxes: deposit / self-archiving, validation, publication, linking, harvesting metadata, searching, harvesting, embedding, resource discovery

11 DCC people (some of them…)
Management & Co-ordination: Director Chris Rusbridge (University of Edinburgh)
Community Support & Outreach: led by Dr Liz Lyon (UKOLN, University of Bath)
Service Definition & Delivery: led by Professor Seamus Ross (HATII [ERPANET], University of Glasgow)
Development: led by Dr David Giaretta (Astronomical Software & Services, CCLRC)
Research: led by Professor Peter Buneman (Informatics, University of Edinburgh)

12 The challenges we face
Standards: interoperability issues, technical & hopefully soluble
Scale: volume and diversity of datasets
Culture: bringing communities together
  Library/information science/archives “document tradition”
  Domain research (chemists, astronomers, biologists)
  Computer science (databases)
  Commercial suppliers (storage technology)

13 More challenges……
Process: highly-distributed organisation, use collaborative tools
Skills: distributed amongst the 4 partners & beyond
Engagement: lots of existing work and many significant players
Impact: visible & measurable, in the short & long-term
Meeting expectations (which are high…..): of the community and our funders

14 User requirements analysis
Commissioned study: Leona Carpenter
Reporting now
Desk-based research
Focus groups
Interviews
Results will inform research, development, service definition / delivery and outreach
Recommendations and priority tasks

15 Some sound bytes…
R&D issues: Annotation services, Ontology development, Automating metadata creation, Tools and toolkits, Data Format Description Language, Identifiers, Registries, Economic and cost-benefits studies
Advisory services: “Ask-a-Curator”, FAQs, reports, briefings, awareness-raising materials, best practice guidance, Storage media, “Like Erpanet”, advise Government, Research Councils, funding bodies
Professional development: Short courses, conferences, seminars, workshops, secondments to DCC and to working repository services
Outreach: Leadership for the future, case studies, sharing solutions, collaboration with other partners, international peers, industry links
Taxonomy of “Users”

16 Outline Taxonomy of digital curation users by role
1. Data Creators
2. Data Curators
3. Data Re-users
4. Policy makers, funding bodies, other leaders

17 Outline Taxonomy of digital curation users by role
1. Data Creators
2. Data Curators
3. Data Re-users
4. Policy makers, funding bodies, other leaders
Also shown: Data Preservers, Data publishers

18 Outline Taxonomy by significant function of organisational entity
1. Research
2. Service provision
3. Learning & teaching
4. Funders
5. Policy / strategy makers
“Designated communities”

19 Outline Taxonomy by significant function of organisational entity
1. Research
2. Service provision
3. Learning & teaching
4. Funders
5. Policy / strategy makers
Also shown: Commercial
“Designated communities”

20 Service definition & delivery
Advisory services
  Responses to queries, from legal to technical guidance
  Site visits (National Institute of Environmental eScience)
Information Services
  Briefing Documents: Freedom of Information by Mags McGinley
  Digital Curation Manual: 20 chapters written by community experts, e.g. Metadata written by Michael Day, UKOLN
  Peer-reviewed
  Checklist for Compliance with best practices and standards
  Technology Watch

21 Services: workshops 2005 Programme
Preservation of medical databases: May at the Gulbenkian Institute, Lisbon, in collaboration with ERPANET & the Wellcome Trust
Institutional repositories: 6 July at the University of Cambridge, UK, in collaboration with DSpace
Cost models: in collaboration with the Digital Preservation Coalition, July at British Library
Persistent identifiers: liaising with NISO, summer, UK location tbc

22 Development approach
OAIS (Open Archival Information System) linkage: focus on representation information
Link to global work on format registries?
Concentrate on scientific data formats?
Repository
Representation Information
Standards and Tools
Aim for OAIS compliance
Persistent identifiers
Certification… RLG task force
Open development wiki and list

23 OAIS Reference Model – Functional Model
How relevant to curation?

24 Representation Net

25 Representation Information More detail
How does this relate to format registries?

26 High Level View Example of use of Representation Information Labelling

27 Registry issues?
Trusted repository of Representation Information
Authenticity of information
Access control
Certificates/Digests (are they trustable over the long term?)
Findability
Persistent IDs
What can we rely on?
Labels (to support automated processing)
Extensibility
Distributed
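
The "Certificates/Digests" item can be illustrated with a minimal sketch of a digest-based check. The choice of SHA-256, the function names and the sample bytes are all assumptions made for illustration; the slides do not say which mechanism the DCC registry uses. A digest only detects that stored bytes have changed. It does not by itself settle the slide's question about long-term trust, since the recorded digest must be protected too and the algorithm may weaken over time.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, recorded_digest: str) -> bool:
    """Check stored bytes against a digest recorded at deposit time."""
    return sha256_digest(data) == recorded_digest


# Hypothetical piece of representation information deposited in a registry.
original = b"format description, version 1"
recorded = sha256_digest(original)  # kept alongside the deposited item

intact = is_unchanged(original, recorded)               # bytes as deposited
tampered = is_unchanged(original + b" (edited)", recorded)  # bytes altered later
```

Here `intact` is true and `tampered` is false: any change to the stored bytes produces a different digest.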

28 Registry development
Simple PHP prototype
Scoping study: unification
Formats, standards, tools
More robust prototype in development
Based on ebXML & JAXR
Potentially distributed, cooperative maintenance model

29 Development Roadmap
Registry: complete prototype, link to PRONOM, GDFR etc, handover to service
Representation information: describe CCLRC (science) data using EAST, etc
Certification work continues
Additional tools: metadata extraction
Testbeds, interactions with others

30 Research approaches
Publishing & integrating scientific databases
‘Archiving’ past states of volatile databases
Database provenance and annotation
Organisational dynamics of trusted repositories
Automating metadata extraction
Cost-benefit analysis of data curation
Rights and responsibilities

31 The database picture
[Diagram: source data feeding into curated data]
Source data
Curated data: classified, cleaned, annotated, integrated, cross-linked

32 Curated Databases are Central
Much/most scientific data is now in databases
They often do not contain source experimental data. Sometimes just annotation/metadata
They borrow extensively from, and refer to, other databases
You are now judged by your data as well as your (paper) publications!!
These databases are built and maintained with a great deal of human or computational effort.
What makes a database? It has internal structure or it changes. Size alone doesn’t qualify

33 Archiving (preserving) volatile databases
How do you preserve something that changes every hour or minute?
Important for the scientific record – someone might have cited your data at time t.
Current practice
Create versions (how often?)
Log changes
Use diffs
Do nothing (common!)
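
The "Log changes" practice can be sketched as follows: if every change is logged with its time, the state of the database at any past time t can be rebuilt by replaying the log. This is a minimal illustration under stated assumptions, not the DCC's method; the `ChangeLogArchive` class, the integer timestamps and the record keys are all hypothetical, and the "database" is just a key-value mapping.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

_DELETED = object()  # sentinel marking a deletion in the log


@dataclass
class ChangeLogArchive:
    """Append-only log of changes to a key-value 'database' (hypothetical)."""

    log: List[Tuple[int, str, Any]] = field(default_factory=list)

    def put(self, t: int, key: str, value: Any) -> None:
        """Record that `key` was set to `value` at time `t`."""
        self._append(t, key, value)

    def delete(self, t: int, key: str) -> None:
        """Record that `key` was removed at time `t`."""
        self._append(t, key, _DELETED)

    def _append(self, t: int, key: str, value: Any) -> None:
        if self.log and t < self.log[-1][0]:
            raise ValueError("changes must be logged in time order")
        self.log.append((t, key, value))

    def state_at(self, t: int) -> Dict[str, Any]:
        """Rebuild the database as it stood at time `t` by replaying the log."""
        state: Dict[str, Any] = {}
        for when, key, value in self.log:
            if when > t:
                break  # the log is time-ordered, so later entries can be skipped
            if value is _DELETED:
                state.pop(key, None)
            else:
                state[key] = value
        return state


archive = ChangeLogArchive()
archive.put(1, "rec-1", "first annotation")
archive.put(5, "rec-1", "revised annotation")
archive.put(7, "rec-2", "new record")
archive.delete(9, "rec-2")

state_at_3 = archive.state_at(3)  # what a reader citing the data at t=3 saw
state_at_8 = archive.state_at(8)
state_at_9 = archive.state_at(9)
```

Replaying the whole log is simple but slow for long histories. Combining it with periodic full versions (the "Create versions" practice) would bound the replay cost, at the price of deciding how often to snapshot.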

34 Curated databases – some issues
Integrating and publishing data so that someone else can use it.
Annotating existing data and moving annotations to other databases
Provenance: where did this data come from?
Archiving: how do you preserve something that is constantly changing?

35 How do we cite data?
A URL or citation to an article is already unsatisfactory.
DCC client complaint: “I spend a lot of time searching [electronic documents] for the part that is relevant to the citation.”
The problem is much worse when you are citing something in a very large database.
How do you use a citation to locate data?
How do you ensure that the citation persists?
Connections with DB archiving and DOIs
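
One way to picture the link between citation and database archiving is a citation that names the database, the record and the archived version, so that it can be resolved against a stored snapshot. The sketch below is only an illustration under assumptions: the `DataCitation` class, the `database/record@vN` string form and the in-memory `snapshots` mapping are hypothetical, and the slides do not propose a specific citation scheme.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class DataCitation:
    """A citation to one record in one archived version of a database."""

    database: str  # hypothetical persistent identifier for the database
    record: str    # key of the cited record within that database
    version: int   # archived version the author actually consulted

    def __str__(self) -> str:
        return f"{self.database}/{self.record}@v{self.version}"


def resolve(citation: DataCitation,
            snapshots: Dict[int, Dict[str, str]]) -> Optional[str]:
    """Return the cited value, or None if the version or record is not archived."""
    snapshot = snapshots.get(citation.version)
    if snapshot is None:
        return None
    return snapshot.get(citation.record)


# Hypothetical archived versions of a small database.
snapshots = {
    1: {"rec-1": "draft annotation"},
    2: {"rec-1": "revised annotation"},
}

cite = DataCitation(database="example-db", record="rec-1", version=1)
cited_value = resolve(cite, snapshots)
missing = resolve(DataCitation("example-db", "rec-1", 3), snapshots)
```

The citation points at the exact record rather than the whole database, and it keeps resolving to what the author saw only as long as the cited version stays archived, which is where the connection to database archiving comes in.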

36 Research approaches
Publishing & integrating scientific databases
‘Archiving’ past states of volatile databases
Database provenance and annotation
Organisational dynamics of trusted repositories
Automating metadata extraction
Cost-benefit analysis of data curation
Rights and responsibilities
“Public domain, public interest, public funding” paper, Waelde & McGinley

38 Launch planned June/July
Peer-reviewed contributions
Editor (research): Peter Buneman
Production editor: Philip Hunter

39 Sample issue
Full papers
Invited articles
News & views
Papers for submission are very welcome!

40 1st DCC International Conference
Location: Bath, UK
29-30 September 2005
Keynote speakers: Cliff Lynch (CNI), Graham Cameron (European Bio-informatics Institute)
DCC Research update
Social highlights

41 Associates Network
Goals: Develop understanding, share best practice, advance research, promote recognition, develop consensus
Membership: International groups, national bodies, industry partners, funders, research groups, HEIs, FEIs, individuals……
Benefits: Early access to R&D outputs, advisory services, training, input to definition and design, community participation
Discussion Forum
Please join us!

42 Research Councils, HEIs & FE Institutes, International Collaborations, Standards Bodies
[Diagram of organisations grouped under the four headings above; the grouping is not recoverable from the transcript]
CMS-Bristol NIEeS RG BADC BODC ESO IVOA EDG GridPP EGEE Cambridge Leicester Jodrell Bank DPC MIMAS ILRT Council for Museums, Archives & Libraries RDN OCLC So’ton OAI NOF NLA UNC ESA NASA NARA CNES RLG BNSC SDSC NEODC CEH RI NCS RLG CCLRC UKOLN Durham WT-CFG Leicester IC Maastricht Oxford DELOS AHDS Microsoft IBM Oracle BT STK DPC DLI (US) NeSC UofE UofG Innogen NHS Capri Dutch NA Swiss NA Urbino UNC Salzburg IBM Almaden JHU CSIRO Caltech CDS ESO OCLC NTUA INRIA HUJ UPC Max-Planck MIMAS IASSIST LDC ACM Data Archive TU Vienna UPenn EBI MRC HGU Kyoto USC INRIA GSK Roslin

43 Acknowledgements
Slides from Peter Buneman, David Giaretta and others used with thanks.

44 How you can help us
How does OAIS relate to curation?
How do format registries relate to representation information?
Who else is working across these areas?
What outcomes would you like to see?

