United Nations Regional Seminar on Census Data Archiving for Africa, Addis Ababa, Ethiopia, September, 2011

Documentation and Cataloguing in Data Archiving
Session 6

What is documentation?
Documentation: comprehensive information on the processes and methods used to produce, archive and disseminate micro-data
o Documentation includes metadata and other information related to the dataset

Role of documentation
o Documentation explains how the data were collected, their content and structure, any manipulation that may have taken place, how to access the data, the terms for their use, etc.
o Documentation provides the context needed to understand and interpret the data: without proper documentation, data are useless
o The further data get from their source, the greater the importance of the documentation (metadata)
o Documentation also allows documents to be reused for future surveys

When to undertake documentation
o Documentation is an incremental process that should be a shared responsibility among various parts of an institution
o Different types of documentation can be added by different people at various stages of an information object’s life cycle
o A common documentation framework should be used by the different actors: the actor closest to the information to be used as documentation/metadata adds that information to the framework

Types of material for documentation
Three broad categories of documentation:
o Explanatory material
o Contextual information
o Cataloguing material

Types of material for documentation
Explanatory material: required to ensure the long-term viability and functionality of a dataset and without which full understanding of the dataset and its contents cannot be achieved
o Data collection methods (data collection process including instruments used, methods employed, and how these were developed)
o Structure of the dataset (information about relationships between individual files or records within the study, e.g., the number of cases and variables in each file and the number of files in the dataset)
o Technical information (computer system used to generate the files; software packages with which the files were created; medium on which the data was stored; and complete list of all data files present in the dataset)

Types of material for documentation (contd.)
Explanatory material (contd.)
o Variables and values, coding and classification schemes (descriptions of all variables (or fields) in the dataset, with explanations of the coding and classifications used, including the codes for blank and missing fields)
o Derived variables (how they were derived)
o Weighting and grossing (procedures should be explained)
o Data source (sources from which the data is derived, e.g., questions used)
o Confidentiality and anonymization (whether the data contain confidential information or have been anonymized, and the implications of either for data usage)

Types of material for documentation (contd.)
Contextual information: the context in which the data was collected, and how it was put to use
o Description of the originating project (why the data collection was felt necessary; who or what was being studied; the geographic and temporal coverage)
o Provenance of the dataset (history of the data collection process, changes and developments that occurred in the data themselves and the methodology, or any adjustments made)
o Serial and time-series datasets, new editions (e.g., descriptions of changes in question text, variable labelling or sampling procedures for repeated cross-section or time-series datasets)

Types of material for documentation (contd.)
Catalogue metadata:
o A sub-set of core data documentation providing standardized structured information explaining the purpose, origin, time reference, geographic location, creator, access conditions and terms of use of data

Metadata standards
o Traditionally, data producers wrote text-based codebooks. To take advantage of web technology, these have been replaced by XML-based codebooks
o Use of metadata standards brings key data documentation together into a single document, creating detailed and structured content about the data. This enhances:
  − Quality of statistical documentation provided to data users
  − Access to the data and semantic interoperability of data sets
o Standards discussed:
  − The Data Documentation Initiative (DDI)
  − Dublin Core Metadata Standard

Metadata standards (contd.)
On XML (eXtensible Markup Language)
  − A way of tagging text for meaning instead of appearance (i.e., XML can be used to organize text by tagging it with meaningful information)
  − Unlike text held in a database, XML text files can be viewed and edited using any standard text editor
  − With appropriate tools, XML files can be searched and queried like a regular database
  − XML documents can be read and transformed by other software applications into user-friendly formats, e.g., spreadsheets, PDF files or web pages
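
The points above about tagging for meaning, querying and transforming XML can be illustrated with a minimal Python sketch using only the standard library. The codebook fragment, its tag names and its attribute names are hypothetical and are not taken from DDI or any other standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical codebook fragment: tag and attribute names are illustrative
# only and do not follow DDI or any other metadata standard.
CODEBOOK_XML = """
<codebook>
  <variable name="SEX" type="categorical">
    <label>Sex of household member</label>
    <category code="1">Male</category>
    <category code="2">Female</category>
  </variable>
  <variable name="AGE" type="numeric">
    <label>Age in completed years</label>
  </variable>
</codebook>
"""

root = ET.fromstring(CODEBOOK_XML)

# Query the document much like a small database: names of all
# variables tagged as categorical.
categorical = [v.get("name") for v in root.findall("variable[@type='categorical']")]

# Transform the same metadata into a tab-separated table that a
# spreadsheet could open.
rows = [(v.get("name"), v.findtext("label")) for v in root.iter("variable")]
table = "\n".join(f"{name}\t{label}" for name, label in rows)

print(categorical)
print(table)
```

Because the tags carry meaning rather than layout, the same file can feed both the query and the table without any change to the source text.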

Metadata standards (contd.)
The Data Documentation Initiative (DDI):
o Is based around the data lifecycle model and provides specifications for a structured framework for organizing the content, presentation, transfer and preservation of metadata in the social and behavioural sciences
o Provides comprehensive metadata on the entire survey process and usage
o Facilitates point-of-origin capture of metadata
o Includes machine-actionable elements to facilitate processing, discovery and analysis

Metadata standards (contd.)
The Data Documentation Initiative (DDI):
o Facilitates reuse of common metadata items because DDI is designed around schemes (lists of items) for commonly reused information within a study, e.g., categories, code schemes, concepts, universe, etc.
  − Items are entered once and used in multiple locations in a DDI document by referencing the item in the list
o Reuse of items supports:
  − Consistency and accuracy of metadata content, thereby minimizing redundancy and discrepancies
  − Internal and external implicit comparisons
  − External registries of concepts, questions, variables, etc.
  − Metadata-driven processing
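
The "enter once, reference many times" idea can be sketched in Python as follows. This is only an illustration of reuse by reference, not DDI syntax: the class names, the scheme identifier `yes_no` and the variables are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeScheme:
    """A reusable list of codes, entered once (hypothetical structure)."""
    scheme_id: str
    codes: dict  # code value -> category label


@dataclass(frozen=True)
class Variable:
    """A variable that points to a code scheme by its identifier."""
    name: str
    label: str
    scheme_ref: str


# The scheme is defined in exactly one place...
SCHEMES = {"yes_no": CodeScheme("yes_no", {1: "Yes", 2: "No", 9: "Not stated"})}

# ...and several variables refer to it instead of repeating the codes.
VARIABLES = [
    Variable("LITERATE", "Can read and write", "yes_no"),
    Variable("ATTENDING", "Currently attending school", "yes_no"),
]


def decode(variable, value):
    """Resolve the reference and return the label for a coded value."""
    return SCHEMES[variable.scheme_ref].codes[value]


print(decode(VARIABLES[0], 1))
print(decode(VARIABLES[1], 9))
```

Since both variables resolve to the same scheme, a correction to a category label is made once and cannot drift between variables.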

Metadata standards (contd.)
The Data Documentation Initiative (DDI):
o Information in DDI schemes can be stored in external registries and used by multiple studies to support:
  − Comparisons within and between studies
  − Organizational consistency through use of agreed content managed in registries
o Designed to support easy interaction with other major standards (Dublin Core, SDMX, ISO/IEC 11179, ISO 19115)
  − Ensures that metadata can be connected to other domains or stages of the lifecycle

Metadata standards (contd.)
Dublin Core Metadata Standard:
o A general purpose metadata standard for describing digital resources related to micro-data
  − Questionnaires
  − Reports
  − Manuals
  − Data processing scripts
  − Programs
  − etc.
o Makes it easy and inexpensive to create descriptive records for information resources while providing for effective retrieval of these resources on the web or other similar networked environment

Metadata standards (contd.)
Dublin Core Metadata Standard:
o Consists of 15 metadata elements:

  Title        Relation     Rights
  Subject      Coverage     Date
  Description  Creator      Format
  Type         Publisher    Identifier
  Source       Contributor  Language
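
As a rough sketch of what a Dublin Core record for a reference document might look like, the Python below builds one with the standard library. The namespace URI is the one published for the 15-element set; the wrapper element name `record`, the helper `dublin_core_record` and all field values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements, in lower case as used in XML element names.
DC_ELEMENTS = {
    "title", "relation", "rights", "subject", "coverage", "date",
    "description", "creator", "format", "type", "publisher",
    "identifier", "source", "contributor", "language",
}


def dublin_core_record(fields):
    """Build a record from {element: value}; rejects unknown element names."""
    record = ET.Element("record")  # wrapper name is an assumption
    for element, value in fields.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        child = ET.SubElement(record, f"{{{DC_NS}}}{element}")
        child.text = value
    return record


# Hypothetical values describing a questionnaire attached to a dataset.
questionnaire = dublin_core_record({
    "title": "Household questionnaire",
    "type": "Text",
    "format": "application/pdf",
    "language": "en",
    "date": "2011",
})

xml_text = ET.tostring(questionnaire, encoding="unicode")
print(xml_text)
```

All 15 elements are optional, so a record may carry only the few that are known for a given resource, as in this example.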

What is cataloguing?
Cataloguing: creation of documentation for a dataset providing standardized structured information so that searchers can easily identify and access datasets according to their needs (title of study, source, year of collection, etc.)

Cataloguing
o Sharing survey micro-data with legitimate users offers many benefits, e.g., for the diversity of research work and for the acceptability and quality of data. Therefore, users should be informed about the existence and characteristics of datasets
o Cataloguing material serves as:
  − A bibliographic record of the dataset, allowing it to be properly acknowledged and cited in publications
  − A formal record for long-term preservation purposes
  − A basic instrument for resource discovery, allowing datasets to be uniquely identified within the collection by providing appropriate information to help secondary users identify the study as useful to their purpose

Cataloguing (contd.)
o Searchable catalogues facilitate finding datasets and related metadata and increase access to datasets
o Use of XML-based metadata standards facilitates the creation of catalogues, as their structure makes them searchable
o Catalogue entries include information on the title of the dataset, data collector(s), dates of data collection, temporal and geographic coverage, methods of data collection, sampling design and frames (if undertaken), and other documentation information. Also variable names, abstracts and key words…
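
A minimal sketch of why structured entries make a catalogue searchable is given below. The two catalogue entries, their field names and the `search` function are hypothetical; a real catalogue would use a metadata standard and a proper search index.

```python
# Hypothetical catalogue entries; titles, years and variables are made up.
CATALOGUE = [
    {
        "title": "Example Population and Housing Census",
        "year": 2007,
        "abstract": "Complete enumeration of persons and housing units.",
        "keywords": ["population", "housing", "migration"],
        "variables": ["SEX", "AGE", "LITERATE"],
    },
    {
        "title": "Example Labour Force Survey",
        "year": 2010,
        "abstract": "Sample survey of employment and unemployment.",
        "keywords": ["employment", "labour"],
        "variables": ["SEX", "AGE", "OCCUPATION"],
    },
]


def search(catalogue, term, field=None):
    """Return titles of entries matching term, in one field or in all text fields."""
    term = term.lower()
    hits = []
    for entry in catalogue:
        if field is None:
            # Full-text search across title, abstract, keywords and variable names.
            parts = [entry["title"], entry["abstract"],
                     *entry["keywords"], *entry["variables"]]
        else:
            value = entry[field]
            parts = value if isinstance(value, list) else [str(value)]
        if any(term in part.lower() for part in parts):
            hits.append(entry["title"])
    return hits


print(search(CATALOGUE, "housing"))
print(search(CATALOGUE, "occupation", field="variables"))
```

The optional `field` argument shows the benefit of structure: the same term can be searched across everything or restricted to variable-level metadata.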

Cataloguing (contd.)
Characteristics of a good survey catalogue, from the user's point of view:
o Compliant with international metadata standards, particularly XML standards
o Provides detailed metadata, including at the variable level
o Provides user-friendly search functionalities (full text search)
o Provides clear information on the policy and procedure for accessing the data
o Provides a list of, and direct access to, reference materials (questionnaires, manuals, reports)
o Includes a "search by topic" compliant with an international thesaurus

Cataloguing (contd.)
Characteristics of a good survey catalogue, from the catalogue administrator's point of view:
o Provides a secure environment for storing and sharing data and metadata
o Provides "user requests" and "user management" tools to receive and respond to data requests and information queries
o Provides a solution for sharing public use files and licensed files
o Generates administrative reports on access requests received/processed; most popular surveys/documents; keywords used for searching data; etc.

Thank You!