
Tools used for testing and long-term preservation. Bern, 10.4.2003. Terje Pettersen-Dahl, adviser, Department of electronic archives (Elark), National Archives.




1 Tools used for testing and long-term preservation
Bern, 10.4.2003
Terje Pettersen-Dahl, adviser, Department of electronic archives (Elark), National Archives of Norway

2 System types
• Registry-based ERMs
• Specialized case handling systems
• Information systems
[Diagram labels: ArkN3, Arkadukt, Arkade, Noark, ADDMML]

3 Overview
[Diagram labels: Original system; For long-term preservation; Data files; ADDMML file; Access; Arkadukt; Structural description; Arkade; New data files; New ADDMML file; Analysis; Checks; Controls]

4 Choice of method
• Migration.
• Preserving only extracts of data from the databases.
• Extracts in a software- and hardware-independent format.
• In addition to the extract we need technical metadata about the extracts.

5 Metadata
• The metadata has to be standardized.
• The National Archivist has established a national standard for metadata called ADDMML (Archives Data Description and Manipulation Mark-up Language).
  - ADDMML is an XML DTD.

6 Structure in ADDMML
The structure is hierarchical. A simple extract is called a dataset. A dataset can contain one or more files. A file may contain one or more tables. Tables contain fields. Fields may contain codes.
[Diagram labels: Dataset, File, Record-type, Field, Code]
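The hierarchy on this slide can be pictured as nested containers. The sketch below is only an illustration: the class and attribute names are hypothetical and are not ADDMML element names, and it assumes that a "table" corresponds to a record-type in the diagram.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical in-memory model of the hierarchy described on the slide.
# None of these names are taken from the ADDMML DTD.

@dataclass
class Code:
    value: str
    meaning: str

@dataclass
class Field:
    name: str
    codes: List[Code] = field(default_factory=list)

@dataclass
class RecordType:  # assumed to correspond to a "table"
    name: str
    fields: List[Field] = field(default_factory=list)

@dataclass
class DataFile:
    name: str
    record_types: List[RecordType] = field(default_factory=list)

@dataclass
class Dataset:
    name: str
    files: List[DataFile] = field(default_factory=list)

def field_paths(dataset: Dataset) -> List[str]:
    """Return every field as a 'dataset/file/record-type/field' path."""
    return [
        f"{dataset.name}/{data_file.name}/{record_type.name}/{fld.name}"
        for data_file in dataset.files
        for record_type in data_file.record_types
        for fld in record_type.fields
    ]
```

Walking the tree top-down like this mirrors the order dataset, file, record-type, field, code given in the text.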

7 Arkadukt
• Arkadukt produces ADDMML files that are always 100% syntactically correct.
  - This is a must for Arkade.
  - The user does not need to know anything about ADDMML!
• The metadata itself is registered as plain text.
  - Simple registration.
  - Adjusted to the structure in ADDMML.

8 Arkadukt

9 Arkade
• Arkade has the following functions:
  - Conversions
  - Analysis
  - Checks and controls
  - Special functions
• Additionally, some functions can be initiated from Arkade:
  - Creation of a SAS dataset
  - Random quality testing of records

10 Conversion
• Arkade can convert data in several ways. Examples:
  - Convert from one character set to another.
  - Convert from one file format to another.
  - Change the record delimiter.
  - Unpack packed fields.
  - Split repeating groups or record-types into different files.
  - Convert from one field format to another.
• All conversions are initiated by processes in the ADDMML file.
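Two of the conversions listed above, changing the character set and changing the record delimiter, can be sketched as follows. This is a minimal illustration and not Arkade's code (Arkade is written in SAS); the function name and parameters are assumptions.

```python
def convert_records(raw: bytes, src_encoding: str, dst_encoding: str,
                    src_delimiter: str, dst_delimiter: str) -> bytes:
    """Re-encode a file and replace its record delimiter (illustrative helper)."""
    text = raw.decode(src_encoding)
    records = text.split(src_delimiter)
    # A trailing delimiter produces an empty last element; drop it.
    if records and records[-1] == "":
        records.pop()
    converted = "".join(record + dst_delimiter for record in records)
    return converted.encode(dst_encoding)


# Example: ISO 8859-1 with CRLF delimiters to UTF-8 with LF delimiters.
latin1_extract = "Bod\u00f8;1\r\nTroms\u00f8;2\r\n".encode("latin-1")
utf8_extract = convert_records(latin1_extract, "latin-1", "utf-8", "\r\n", "\n")
```

Decoding to text before splitting keeps the delimiter handling independent of the byte encoding.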

11 Analysis
Arkade analyses data on different levels.
• File level
  - Count the total number of records in the file.
  - Count the total number of characters in the file.
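The file-level counts can be sketched in a few lines. Whether delimiters count as characters is not stated on the slide; this sketch assumes they do, and the function name is hypothetical.

```python
def file_level_counts(text: str, record_delimiter: str = "\n") -> dict:
    """Count records and characters in a delimited file (illustrative helper)."""
    records = text.split(record_delimiter)
    # Ignore the empty element produced by a trailing delimiter.
    if records and records[-1] == "":
        records.pop()
    # Assumption: the character count includes the record delimiters.
    return {"records": len(records), "characters": len(text)}
```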

12 Analysis (cont.)
• Record-type level
  - Find the minimum and maximum length of records of this type in the file.
  - Find the number of fields in records of this type, or the minimum and maximum number if the number varies.
  - Count the number of records of this type in the file.
  - Produce sorted frequency lists of the values throughout the file for each field in the record-type.
  - Produce a cross-reference table for two specified fields from the same record-type.
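A frequency list and a cross-reference table for two fields can be sketched with a counter. The sort order (most frequent first, ties by value) and the function names are assumptions made for illustration; records are represented as dictionaries keyed by field name.

```python
from collections import Counter

def frequency_list(records, field_name):
    """Return (value, count) pairs, most frequent first, ties ordered by value."""
    counts = Counter(record[field_name] for record in records)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

def cross_reference(records, field_a, field_b):
    """Count how often each pair of values occurs in two fields of the same record."""
    return Counter((record[field_a], record[field_b]) for record in records)


sample = [
    {"county": "03", "status": "A"},
    {"county": "03", "status": "B"},
    {"county": "11", "status": "A"},
    {"county": "03", "status": "A"},
]
county_frequencies = frequency_list(sample, "county")
county_by_status = cross_reference(sample, "county", "status")
```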

13 Analysis (cont.)
• Field level
  - Count the number of empty (NULL) and non-empty values in the field throughout the file.
  - Find the length and record number of the shortest and longest data value (excluding padding) in the field.
  - Find the minimum and maximum value (including record number) in the field.
  - Produce sorted frequency lists of the values throughout the file in this field.
All analyses are initiated by processes in the ADDMML file.
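The field-level figures above can be sketched as a small profile function. It assumes that a value consisting only of padding counts as empty (NULL) and that values are compared as strings; both are assumptions, as is the function name.

```python
def field_profile(values, padding=" "):
    """Profile one field; `values` holds the raw strings in record order."""
    stripped = [value.strip(padding) for value in values]
    # Pair each non-empty value with its 1-based record number.
    non_empty = [(number, value)
                 for number, value in enumerate(stripped, start=1)
                 if value != ""]
    profile = {"empty": len(values) - len(non_empty),
               "non_empty": len(non_empty)}
    if non_empty:
        # Each entry below is a (record number, value) pair.
        profile["shortest"] = min(non_empty, key=lambda pair: len(pair[1]))
        profile["longest"] = max(non_empty, key=lambda pair: len(pair[1]))
        profile["minimum"] = min(non_empty, key=lambda pair: pair[1])
        profile["maximum"] = max(non_empty, key=lambda pair: pair[1])
    return profile
```

Keeping the record number next to each value makes it possible to report where the extreme values occur, as the slide requires.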

14 Checks and controls
As with analysis, checks are done on different levels.
• File level
  - Check if the given record length is correct.
  - Check if the given number of record-types is correct.
  - Check if the given number of records is correct.
  - Check if the given number of characters is correct.
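A file-level check compares what the metadata declares with what the file actually contains. The sketch below assumes the declared figures are available as a plain dictionary with hypothetical keys (`records`, `characters`, `record_length`); in Arkade they would come from the ADDMML file.

```python
def check_file_level(text, declared, record_delimiter="\n"):
    """Return a list of findings; an empty list means every check passed."""
    records = text.split(record_delimiter)
    if records and records[-1] == "":
        records.pop()
    findings = []
    if "records" in declared and declared["records"] != len(records):
        findings.append(
            f"records: declared {declared['records']}, found {len(records)}")
    # Assumption: the declared character count includes delimiters.
    if "characters" in declared and declared["characters"] != len(text):
        findings.append(
            f"characters: declared {declared['characters']}, found {len(text)}")
    if "record_length" in declared:
        for number, record in enumerate(records, start=1):
            if len(record) != declared["record_length"]:
                findings.append(
                    f"record {number}: length {len(record)}, "
                    f"declared {declared['record_length']}")
    return findings
```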

15 Checks and controls (cont.)
• Record-type level
  - Check whether the primary key is unique and does not contain any empty (NULL) value.
  - Check whether the secondary key is unique and does not occur with an empty (NULL) value.
  - Check whether each foreign key is either empty or exists in the referenced file, and additionally whether the given type of relation is correct.
  - Check if the given record length is correct.
  - Check if the given minimum record length is correct.
  - Check if the given maximum record length is correct.
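The key checks can be sketched as below, with records represented as dictionaries. The function names are hypothetical, and the check of the relation type mentioned on the slide is not covered.

```python
def check_primary_key(records, key):
    """Report (record number, problem) for empty or duplicate key values."""
    seen = set()
    problems = []
    for number, record in enumerate(records, start=1):
        value = record.get(key)
        if value in (None, ""):
            problems.append((number, "empty key"))
        elif value in seen:
            problems.append((number, "duplicate key"))
        else:
            seen.add(value)
    return problems

def check_foreign_key(records, key, referenced_values):
    """Return record numbers whose non-empty foreign key has no match."""
    referenced = set(referenced_values)
    return [number
            for number, record in enumerate(records, start=1)
            if record.get(key) not in (None, "")
            and record[key] not in referenced]
```

An empty foreign key is accepted, in line with the rule that a foreign key is either empty or exists in the referenced file.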

16 Checks and controls (cont.)
• Record-type level (cont.)
  - Check if the given number of fields is correct.
  - Check if the given number of records of this type is correct.
• Field level
  - Check if the given field length is correct.
  - Check if the given minimum field length is correct.
  - Check if the given maximum field length is correct.
  - Check if the given data type and field format is correct.
  - Check whether the field always has a value (no NULL).

17 Checks and controls (cont.)
• Field level (cont.)
  - Check uniqueness.
  - Check the given codes against a specified code set.
All checks are initiated by processes in the ADDMML file.
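Several of the field-level checks (maximum length, mandatory value, uniqueness, code set) can be combined in one pass over the values of a field. This is a sketch under assumed parameter names, not a description of how Arkade organises its checks; an empty string stands for NULL here.

```python
def check_field(values, *, max_length=None, required=False,
                unique=False, allowed_codes=None):
    """Return (record number, problem) pairs for one field."""
    problems = []
    seen = set()
    for number, value in enumerate(values, start=1):
        if value == "":  # assumption: empty string represents NULL
            if required:
                problems.append((number, "missing value"))
            continue
        if max_length is not None and len(value) > max_length:
            problems.append((number, "too long"))
        if allowed_codes is not None and value not in allowed_codes:
            problems.append((number, "unknown code"))
        if unique:
            if value in seen:
                problems.append((number, "not unique"))
            seen.add(value)
    return problems
```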

18 Special functions
Additionally, Arkade has a few special functions:
• Verification of control digits (check digits) in birth numbers.
• Verification of control digits (check digits) in account numbers.
• Adding key fields to record-types where these are not given (the key values are given implicitly by the records' positions relative to each other).
All special functions are initiated by processes in the ADDMML file.
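As an illustration of a control-digit check, the sketch below assumes that "birth number" means the Norwegian 11-digit national identity number, whose last two digits are modulus-11 control digits computed with the publicly documented weights. It says nothing about how Arkade itself implements the check, and the function names are hypothetical.

```python
# Publicly documented modulus-11 weights for the two control digits.
WEIGHTS_K1 = (3, 7, 6, 1, 8, 9, 4, 5, 2)
WEIGHTS_K2 = (5, 4, 3, 2, 7, 6, 5, 4, 3, 2)

def _control_digit(digits, weights):
    remainder = sum(d * w for d, w in zip(digits, weights)) % 11
    # A result of 10 means no valid control digit exists for these digits.
    return (11 - remainder) % 11

def control_digits(first_nine: str):
    """Compute both control digits from the first nine digits."""
    digits = [int(char) for char in first_nine]
    k1 = _control_digit(digits, WEIGHTS_K1)
    k2 = _control_digit(digits + [k1], WEIGHTS_K2)
    return k1, k2

def has_valid_control_digits(number: str) -> bool:
    """Check only the control digits; the date part is not validated."""
    if len(number) != 11 or not (number.isascii() and number.isdigit()):
        return False
    k1, k2 = control_digits(number[:9])
    return 10 not in (k1, k2) and number[9:] == f"{k1}{k2}"
```

The second control digit is computed over the first nine digits plus the first control digit, so an error in either position is detected.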

19 SAS dataset
• Arkade can generate an internal dataset. As Arkade is made in SAS, this internal dataset will be a SAS dataset.
• The SAS dataset can be used further to:
  - Sort tables
  - Do an extract
  - Make statistics
  - Make a basis for a public version.
• Generation of the SAS dataset is initiated from the screen.

20 Random tests
• Arkade can do random tests on the extracts. Examples:
  - Look at the first 100 records only. (The number can vary and is decided by the user.)
  - Look at every 25th record. (Once again the number is decided by the user.)
  - Only test the ADDMML file without doing anything with the extracts.
• Random tests are initiated from the screen.
• Random tests are mainly used to check syntax and conformity in the data files.
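The two sampling patterns mentioned above, the first N records and every k-th record, can be sketched with lazy iterators so that a large extract does not have to be read in full. The function names are hypothetical, and it is an assumption that "every 25th record" means records 25, 50, 75 and so on.

```python
from itertools import islice

def first_n(records, n=100):
    """Yield only the first n records."""
    return islice(records, n)

def every_kth(records, k=25):
    """Yield records number k, 2k, 3k, ... (1-based record numbers)."""
    return islice(records, k - 1, None, k)
```

Both helpers accept any iterable, for example an open file object that yields one record per line.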

21 Arkade

22 Conditions for Arkade
• Arkade depends on a correct ADDMML file.
• To run Arkade there must be data files, and the references to the data files have to be correct.
• Most logical dependencies also have to be correct.

23 ArkN3
• Imports data in the format described in the Noark-3 manual.
• Tests whether the described format is followed.
• Presents cases and registry records.
• Makes it possible to search on different levels.
• Does an analysis of the imported data.

24 International view
[Diagram labels: Dublin Core, ISAD(G), EAD, ADDMML]

25 ISO 15489 and MoReq versus Noark
• These new standards are in close harmony with Norwegian theory and Norwegian requirements.
• But Noark is not a general records management standard.
  - Noark = a detailed application standard, initially for registry systems

26 ISO 15489 and MoReq versus Noark
• Registry and case handling workflow is integrated in Noark:
  1) Registry handling control: follow-up and "sign-off" functions connected to case management (MoReq's workflow functions are related to capture, retention and availability/distribution).
  2) Process management: implements the general specification in MoReq, but is closely related to registry handling and case handling in Noark.
  3) Board handling (described in great detail, but only an option in Noark).

27 ISO 15489 and MoReq versus Noark
• MoReq elements which are given less consideration in Noark:
  - "freezing" of metadata
  - audit trails
  - "robust" metadata capture
• It is necessary to map Noark to MoReq's requirements:
  - It is important for us to have a standard which is related to MoReq.
  - Market considerations (Norwegian suppliers' export opportunities to EU countries, and vice versa).

28 General RM standard
• In addition to Noark there is a need for a general Norwegian RM standard based on MoReq:
  - for systems without registry functions which generate and manage records.
  - E.g. a category for "file" is needed which is more general and liberal than the category "case" in Noark.
  - A general standard is also necessary to avoid discrimination against EU suppliers who offer MoReq-based solutions in Norway.

29 RM standard: possible Norwegian model
[Diagram. Level of requirements: "Must", "Should", "May". Boxes: Basic RM; Basic workflow; Case handling info. in registry; Board handling; Case handling & RM workflow; Specific RM (process management); Other case handling & workflow; Not registry- & Noark-based process mgmt. (doc. & metadata capture and other MoReq-specified functions); Registry- & Noark-based process mgmt.*)]
*) Noark also requires defined levels of functionality in Basic RM

