Published by Erika Walker. Modified over 9 years ago.
1
National Digital Repository ®
Preserving the imperfect: reflections from NDAD and elsewhere
Kevin Ashley, Head of Digital Archives Group, ULCC
2
National Digital Repository ® 2007-03-23 Presdb07 - Edinburgh
Overview
Issues that arise when databases are records
Informing (expensive, important) decisions
Tensions between ideal formats and non-ideal data
Representation mechanisms for access control and absent data
Concentrating on R&D issues
4
What is NDAD?
A service for UK government records which exist as ‘structured information’
Contains data + contextual information
Established in 1997 - service in March 1998
First service by a national archive to provide online public access to preserved material
Selection undertaken by National Archives and government departments
Everything else at ULCC: under contract to TNA
6
Preservation
Data transformed to canonical form - originals kept
Paper documentation digitised
Technical metadata produced or transformed
Consistency checks applied:
    For transformation process
    Against original system
    Against published information
    Internal cross-checks
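The consistency checks listed above could be sketched roughly as follows. This is only an illustration, not NDAD's actual tooling: the canonical form (whitespace-stripped fields), the function names and the `published_count` parameter are all assumptions made for the example.

```python
import hashlib

def canonicalise(record):
    """Hypothetical canonical form: strip padding from each field."""
    return tuple(field.strip() for field in record)

def fingerprint(records):
    """Order-sensitive SHA-256 digest over a list of records."""
    digest = hashlib.sha256()
    for record in records:
        digest.update("\x1f".join(record).encode("utf-8"))  # field separator
        digest.update(b"\x1e")                               # record separator
    return digest.hexdigest()

def check_transformation(originals, transformed, published_count=None):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    # Check on the transformation process: no records gained or lost.
    if len(originals) != len(transformed):
        problems.append(
            f"record count changed: {len(originals)} -> {len(transformed)}"
        )
    # Check against the original: re-deriving the canonical form
    # from the kept originals must reproduce the transformed data.
    rederived = [canonicalise(r) for r in originals]
    if fingerprint(rederived) != fingerprint(transformed):
        problems.append("transformed data differs from re-derived canonical form")
    # Check against published information, e.g. a record count in a report.
    if published_count is not None and len(transformed) != published_count:
        problems.append(
            f"published count is {published_count}, found {len(transformed)}"
        )
    return problems

originals = [("  12", "Smith   "), ("   7", "Jones   ")]
transformed = [("12", "Smith"), ("7", "Jones")]
print(check_transformation(originals, transformed, published_count=2))
```

Keeping the originals is what makes the second check possible: the transformation can be re-run and compared at any later date.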
7
Consequences
Preservation far removed from creation
Unlike actively curated systems: preservation and use can take place simultaneously
Multiple use scenarios - more than views
8
Where are the problems?
Management
9
Perfect Preservation Formats?
DDI: XML-based
    Good for survey/social science data
    Not so good for complex relational stuff
    Likes clean data
XML representations
    More flexible
    Not so good when data is unclean
As SQL
    Much metadata or needs another scheme
    Useless for unclean data
10
How bad is bad?
Data out of range is a quality problem, not a preservation problem (e.g. ‘Age’ of 230)
But… Age = -20? Age = B0? Age = Thursday?
All present problems if ‘Age’ is a positive integer in our preservation schema
Date = ‘31 Feb 2007’ is syntactically but not semantically valid
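The distinction drawn above, between values that are merely implausible and values that cannot be stored in the schema type at all, can be illustrated with a small sketch. The category labels, the upper plausibility bound of 130 and the ‘DD Mon YYYY’ date pattern are assumptions chosen for the example, not part of any NDAD schema.

```python
import re
from datetime import date

MONTHS = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
          "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}
DATE_PATTERN = re.compile(r"([0-9]{1,2}) ([A-Z][a-z]{2}) ([0-9]{4})")

def classify_age(raw, high=130):
    """Classify a raw 'Age' value against an assumed non-negative integer type."""
    text = raw.strip()
    if not re.fullmatch(r"-?[0-9]+", text):
        return "not-an-integer"      # e.g. 'B0', 'Thursday': cannot be stored
    value = int(text)
    if value < 0:
        return "violates-schema"     # an integer, but not a valid age type
    if value > high:
        return "out-of-range"        # storable; a data quality question only
    return "ok"

def classify_date(raw):
    """Separate syntactic validity from semantic validity for a date string."""
    match = DATE_PATTERN.fullmatch(raw.strip())
    if match is None or match.group(2) not in MONTHS:
        return "syntax-error"
    day, month, year = int(match.group(1)), MONTHS[match.group(2)], int(match.group(3))
    try:
        date(year, month, day)       # raises ValueError for impossible dates
    except ValueError:
        return "semantic-error"
    return "ok"

for raw in ["230", "-20", "B0", "Thursday"]:
    print(raw, classify_age(raw))
print("31 Feb 2007", classify_date("31 Feb 2007"))
```

Only the first case (230) fits a strict integer column unchanged; the others need a representation that can hold the original text alongside the verdict.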
11
More bad stuff
Absent key fields or mandatory fields
Encoded data that uses bad codes
    If days of week are 1 - 7, what is day 9? Day X?
‘Encoded’ data which is stored translated
1 - 1 mappings that aren’t
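Two of the problems above, codes outside the codebook and supposedly 1 - 1 mappings that are not, are straightforward to detect even though they are hard to represent. The sketch below assumes a hypothetical day-of-week codebook and hypothetical (code, label) pairs observed in the data.

```python
from collections import defaultdict

# Hypothetical codebook: days of the week stored as '1' to '7'.
DAY_CODES = {"1": "Monday", "2": "Tuesday", "3": "Wednesday", "4": "Thursday",
             "5": "Friday", "6": "Saturday", "7": "Sunday"}

def find_bad_codes(values, codebook):
    """Return the distinct values that have no entry in the codebook."""
    return sorted({v for v in values if v not in codebook})

def mapping_violations(pairs):
    """Find breaks in a supposedly one-to-one (code, label) mapping.

    Returns two dicts: codes seen with several labels, and labels
    seen with several codes.
    """
    labels_by_code = defaultdict(set)
    codes_by_label = defaultdict(set)
    for code, label in pairs:
        labels_by_code[code].add(label)
        codes_by_label[label].add(code)
    ambiguous_codes = {c: sorted(ls) for c, ls in labels_by_code.items() if len(ls) > 1}
    ambiguous_labels = {l: sorted(cs) for l, cs in codes_by_label.items() if len(cs) > 1}
    return ambiguous_codes, ambiguous_labels

print(find_bad_codes(["1", "9", "7", "X", "9"], DAY_CODES))
observed = [("A", "Arable"), ("P", "Pasture"), ("A", "Allotment"), ("G", "Pasture")]
print(mapping_violations(observed))
```

Detection is the easy part; the preserved form still has to keep day 9 and day X as they were found.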
12
What’s the problem?
Must preserve errors - their nature is informative
Would like to understand original system behaviour with these errors
Don’t want to use tools that force all fields to be text
Want a datatype like ‘almost always integer’ or ‘often a date’ - and intelligent behaviour when it isn’t.
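One way to picture an ‘almost always integer’ datatype is a value that always keeps the original text and carries a parsed integer only when parsing succeeds. The class and function names below are hypothetical; this is a sketch of the idea, not an existing tool.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LenientInt:
    """An 'almost always integer' cell: raw text is always preserved."""
    raw: str
    value: Optional[int]   # None when the raw text is not an integer

    @classmethod
    def parse(cls, raw):
        text = raw.strip()
        value = int(text) if re.fullmatch(r"[+-]?[0-9]+", text) else None
        return cls(raw, value)

    @property
    def is_valid(self):
        return self.value is not None

def mean_of_valid(cells):
    """Average the cells that parsed; also report how many were skipped."""
    valid = [c.value for c in cells if c.is_valid]
    skipped = len(cells) - len(valid)
    mean = sum(valid) / len(valid) if valid else None
    return mean, skipped

cells = [LenientInt.parse(raw) for raw in ["34", "B0", "Thursday", "-20", " 7 "]]
print(mean_of_valid(cells))          # numeric behaviour where it makes sense
print([c.raw for c in cells])        # the errors themselves are still there
```

The design choice is that numeric operations degrade gracefully and say what they skipped, rather than forcing the whole column to text or rejecting the bad rows.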
13
How does it get that way?
Data validation often in application, not database
Isn’t always well-implemented
People hack around the application
Past migrations were poor
14
Missing and absent values
Common occurrence in survey and experimental data
Different types of ‘missing’:
    No information
    Known to be unreadable
    Refused to answer
    Subject didn’t know
All mechanisms for representation ad-hoc
Knowledge in application, not database
Query engines don’t understand concept
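The different kinds of ‘missing’ listed above can be made explicit rather than left as ad-hoc sentinel values that only the original application understood. In the sketch below the sentinel codes (-9 to -6) are invented for illustration, and all other values are assumed to be clean integers.

```python
from collections import Counter
from enum import Enum

class Missing(Enum):
    NO_INFORMATION = "no information"
    UNREADABLE = "known to be unreadable"
    REFUSED = "refused to answer"
    DID_NOT_KNOW = "subject didn't know"

# Hypothetical ad-hoc sentinel codes of the kind an application might use.
SENTINELS = {"-9": Missing.NO_INFORMATION, "-8": Missing.UNREADABLE,
             "-7": Missing.REFUSED, "-6": Missing.DID_NOT_KNOW}

def decode(raw):
    """Turn a raw cell into either an int or an explicit Missing reason."""
    if raw in SENTINELS:
        return SENTINELS[raw]
    return int(raw)   # assumes every non-sentinel value is a clean integer

def summarise(raws):
    """Separate real values from missing ones, counting each missing reason."""
    values, reasons = [], Counter()
    for raw in raws:
        cell = decode(raw)
        if isinstance(cell, Missing):
            reasons[cell] += 1
        else:
            values.append(cell)
    return values, reasons

values, reasons = summarise(["34", "-9", "-7", "51", "-7"])
print(values, dict(reasons))
```

A query engine unaware of the convention would happily average -9 and -7 in with the real ages, which is the problem the slide describes.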
15
Access: restricted viewing
Diagram labels: People, Trips, Vehicles; ‘Not available until 2050’
16
Access - goal
Duplicate original system
Advanced analysis tools
Simple viewing via a generic tool
Multimedia datatypes
Extensible via object-like design
Traditional database systems not up to task without significant additional effort
Hence much software home-grown
17
New issues from temporal GIS
Temporal GIS allows one system to represent changing features and knowledge
Queries like:
    1. Which features are newer than feature X?
    2. What did area Y look like 10 years ago?
    3. What present-day names correspond to ‘Hetfelle’?
In a preserved temporal GIS:
    What would the answer to question 2 have been if I asked it 5 years ago?
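The last question above needs two time axes: when a feature existed in the world, and when the system knew about it. A minimal sketch of that idea follows; the record layout, field names and sample features are assumptions for illustration and do not describe any particular GIS product.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Assertion:
    feature: str
    area: str
    valid_from: int                  # year the feature came into existence
    valid_to: Optional[int]          # year it ceased to exist (None = still there)
    recorded: int                    # year the system learned of it
    retracted: Optional[int] = None  # year the system withdrew the claim

def features_in_area(assertions, area, valid_year, known_year):
    """What the system, as of known_year, believed was in area in valid_year."""
    found = set()
    for a in assertions:
        if a.area != area:
            continue
        known = a.recorded <= known_year and (
            a.retracted is None or known_year < a.retracted)
        existed = a.valid_from <= valid_year and (
            a.valid_to is None or valid_year < a.valid_to)
        if known and existed:
            found.add(a.feature)
    return sorted(found)

data = [
    Assertion("mill", "Y", valid_from=1950, valid_to=None, recorded=1990),
    # The bridge was built in 1995 but only surveyed and recorded in 2004.
    Assertion("bridge", "Y", valid_from=1995, valid_to=None, recorded=2004),
]

print(features_in_area(data, "Y", valid_year=1997, known_year=2007))
print(features_in_area(data, "Y", valid_year=1997, known_year=2002))
```

The two calls ask about the same year in the world (1997) but different states of knowledge (2007 and 2002), so the second omits the bridge. A preserved system has to retain the recording history, not just the latest state, for the second call to be answerable.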
18
Inconsistencies and errors
Schools census - 4 datasets per year for different school types
But 1976 only has 3 - no nursery schools
Further examination shows files have been merged
Confirmation came from completed census forms held by schools - not by government department
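An internal cross-check of the kind that exposes the 1976 gap can be very simple: compare what was received for each year against what is expected. In the sketch below only ‘nursery’ comes from the slide; the other three school-type names and the catalogue structure are hypothetical.

```python
# Hypothetical set of school types; only 'nursery' is named in the slide.
EXPECTED_TYPES = {"nursery", "primary", "secondary", "special"}

def missing_datasets(catalogue, expected=EXPECTED_TYPES):
    """Map each year to the expected dataset types that were not received."""
    gaps = {}
    for year, received in sorted(catalogue.items()):
        absent = expected - received
        if absent:
            gaps[year] = sorted(absent)
    return gaps

catalogue = {
    1975: {"nursery", "primary", "secondary", "special"},
    1976: {"primary", "secondary", "special"},
    1977: {"nursery", "primary", "secondary", "special"},
}
print(missing_datasets(catalogue))
```

A check like this only flags the anomaly. Establishing that the files had been merged, as the slide notes, took further examination and evidence from outside the data.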
19
Cornell’s DP model