Metadata and RIF-CS in plain English SEQld Data Intensive - 30 January 2015 Kathryn Unsworth
2 Comic strip picture of a simple explanation of metadata – an 8 letter word. Not included due to copyright constraints.
3 What is metadata? Metadata is a means of collecting or structuring data about the content of other data Example: catalogue record
4 Timelapse: What is metadata?
5 Basics of metadata
6 Describing publications is easy, but data…? Research data varies widely between and within disciplines: ▪Methods of data collection ▪Number, type and size of data files ▪Electronic/physical resources or combinations of ▪Software/hardware dependencies ▪Contextual information ▪Legal restrictions ▪Ethical restrictions ▪Access restrictions This has led to the proliferation of metadata schemas for data.
7 Metadata standards – why we need them ▪Provide us with a common way of describing information resources ▪Facilitate discoverability of resources ▪Facilitate exchange of data between systems – interoperability
8 So… Requirements for describing datasets efficiently differ somewhat from the metadata used to describe publications, but the same principles are applicable to both.
9 Keep the (re)user in mind when describing datasets
1. To enable a (re)user to find a dataset of which either (A) the data creator, (B) the title of the dataset, or (C) the subject is known
2. To show what is available (D) by a given data creator, (E) on a given subject, (F) in a given format/data type
3. To assist in the choice of a dataset (G) as to temporal and/or spatial coverage (when and where), (H) as to its authority and access (associated publications, access rights)
Based on a customised version of Cutter’s cataloguing objects and the FISO user tasks model: Find, Identify, Select, Obtain
Recent changes to RIF-CS now include access descriptions: open, conditional, restricted
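One way to keep these access points in view while drafting a record is a simple completeness check. The sketch below is illustrative only: the field names (creator, title, subject, format, coverage, access_rights) are hypothetical labels chosen for this example, not RIF-CS element names, and (H) is reduced to access rights for brevity.

```python
# Access points (A)-(H) from the slide, mapped to hypothetical field names.
ACCESS_POINTS = {
    "A/D": "creator",
    "B": "title",
    "C/E": "subject",
    "F": "format",
    "G": "coverage",
    "H": "access_rights",
}


def missing_access_points(record):
    """Return the field names that are absent or empty in a record dict."""
    return [field for field in ACCESS_POINTS.values() if not record.get(field)]


draft = {
    "creator": "Professor Rayne Fall",
    "title": "Daily rainfall observations over the northern Australian tropics",
    "subject": ["Rainfall frequencies"],
    "coverage": "",  # left blank on purpose to show the check
}

print(missing_access_points(draft))  # ['format', 'coverage', 'access_rights']
```

A check like this does not judge the quality of a description; it only flags access points that a (re)user would have nothing to search on.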
10 Descriptive metadata (intellectual content) Administrative metadata (rights, technical, preservation) Structural metadata (relationship between the parts) Discovery/access points
11 Traditional publication (homogeneous) vs data publication (heterogeneous)
Traditional publication | Data publication |
MARC21 | RIF-CS | Example
Item-level description | Collection-level description |
Fields and tags | Elements and attributes | Standards for consistency
655$a “Dataset”
245$a “Historical coastlines (community perspectives) : manuscript and images archive”
700$a “Steve Mullins”
710$a,b “Faculty of Arts, Humanities and Education CQU”
520$a “Photographic images are useful tools for environmental historians and have their place alongside the official documents generated by environmental regulators and managers, and the unofficial written observations and…”
654$a 2103 – Historical Studies
542$f Rights statement: This work is copyright. Permission is given for non-profit electronic viewing, via the Internet. Apart from this, and any use as permitted under the Australian Copyright Act 1968 no part may be reproduced or copied by any process, without written permission.
700$0 ORCID
13 RIF-CS elements, a quick look
14 ISO 2146 and RIF-CS interchange schema ▪ISO 2146 standard: Registry services for libraries and related organizations ▪ANDS implementation describes not just collections but also the researchers and research activities that surround and are linked to a research data collection, and the systems that support data collections – the ‘mesh’ ▪RIF-CS is an XML schema for sharing metadata between a source repository and the ANDS Collections Registry ▪Is reviewed annually by the RIF-CS Advisory Board
15 The structure of RIF-CS (based on ISO 2146 standard)
ISO 2146 object | Description
Collection | an aggregation of physical or digital objects
Party | a person or group
Activity | something occurring over time that generates one or more outputs
Service | a physical or electronic interface that provides its users with benefits such as work done by a party or access to a collection or activity
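To make the four object classes concrete, here is a minimal sketch that wraps one object in a registryObject element using Python's standard library. It is an assumption-laden illustration, not a complete or validated record: the namespace URI and element names reflect common RIF-CS usage and should be checked against the schema version in use, the key is hypothetical, and other elements a real record needs (such as the originating source) are omitted.

```python
import xml.etree.ElementTree as ET

# Namespace commonly used by RIF-CS documents (assumed; verify against your schema version).
RIFCS_NS = "http://ands.org.au/standards/rif-cs/registryObjects"

# The four ISO 2146 object classes named on the slide.
OBJECT_CLASSES = ("collection", "party", "activity", "service")


def make_registry_object(object_class, key, group, object_type):
    """Return a <registryObject> element wrapping one object of the given class."""
    if object_class not in OBJECT_CLASSES:
        raise ValueError(f"unknown object class: {object_class}")
    registry_object = ET.Element(f"{{{RIFCS_NS}}}registryObject", {"group": group})
    ET.SubElement(registry_object, f"{{{RIFCS_NS}}}key").text = key
    ET.SubElement(registry_object, f"{{{RIFCS_NS}}}{object_class}", {"type": object_type})
    return registry_object


record = make_registry_object(
    "collection",
    key="example.org/collection/rainfall",  # hypothetical key
    group="Australian Research Institution",
    object_type="dataset",
)
print(ET.tostring(record, encoding="unicode"))
```

The same function covers a party, activity or service record by changing the first argument, which mirrors how the four classes share one wrapper in the interchange format.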
16 Metadata pathways to RDA ▪Create records manually in the ANDS Registry ▪Create an automatic RIF-CS feed for harvesting by ANDS into RDA ▪Configure your harvest for a schema that is not RIF-CS, e.g. CKAN, ISO 19115 Do you know what method your institution will use?
Name 17 A primary, abbreviated and/or alternative name for the data collection. Rainfall in northern Australia Better… Daily rainfall observations over the northern Australian tropics, November to February,
Description 18 A full or brief description of the data collection. Rainfall observations were taken over a 10-year period across northern Australia. Better… The dataset consists of rainfall observations taken during the wet season in northern Australia over a 10-year period. It is part of an ongoing longitudinal study of weather in the region. Observations are made daily at 157 geographic locations across the area. Data is sent to a central point in Darwin. Measurements are recorded in millimetres: 0.2 to 9; 10 to 24; 25 to 49; 50 to 99; 100 to 149; 150+. Data is recorded in spreadsheets and calculated hourly, daily, weekly and monthly. Statistical analysis of the data was made using Excel.
Subject 19 The subject represents the primary topic or topics covered by the collection. Rainfall Better… (text value = Weather Research & Forecast Model (WRF)) (type = anzsrc) Rainfall frequencies (type = lcsh) Rainfall in northern Australia (type = local)
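As a rough sketch of how typed subjects like those above might be written out, the snippet below attaches each subject value to a collection element with a type attribute naming its vocabulary. The helper name add_subjects is hypothetical, namespaces are left out for brevity, and the vocabulary labels are simply the ones shown on the slide.

```python
import xml.etree.ElementTree as ET


def add_subjects(collection, subjects):
    """Append one <subject type="..."> element per (value, vocabulary) pair."""
    for value, vocabulary in subjects:
        ET.SubElement(collection, "subject", {"type": vocabulary}).text = value
    return collection


collection = ET.Element("collection", {"type": "dataset"})
add_subjects(
    collection,
    [
        ("Rainfall frequencies", "lcsh"),
        ("Rainfall in northern Australia", "local"),
    ],
)
print(ET.tostring(collection, encoding="unicode"))
```

Recording the vocabulary alongside each term is what lets a registry group records by a shared scheme rather than by free text.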
Coverage (spatial) 20 Spatial coverage refers to the geographical area where data was collected or a place which is the subject of a collection. Australia Better… , , , , , (type = kmlPolyCoords) Northern Australia (type = text) AU-NSW (type = iso31662)
Coverage (temporal) 21 Temporal coverage refers to a time period during which data was collected or observations made or a time period that collection is linked to intellectually or thematically Better… November-February, (type = text)
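The two coverage slides can be combined in one element. The following is a minimal sketch, assuming the usual RIF-CS shape of a coverage element holding spatial and temporal children; the element and attribute names should be checked against the schema, and the dates are placeholders invented for the example, not the study's actual dates.

```python
import xml.etree.ElementTree as ET


def make_coverage(place_text, date_from, date_to):
    """Build a <coverage> element with one spatial and one temporal entry."""
    coverage = ET.Element("coverage")
    ET.SubElement(coverage, "spatial", {"type": "text"}).text = place_text
    temporal = ET.SubElement(coverage, "temporal")
    for date_type, value in (("dateFrom", date_from), ("dateTo", date_to)):
        date = ET.SubElement(temporal, "date", {"type": date_type, "dateFormat": "W3CDTF"})
        date.text = value
    return coverage


# Placeholder dates for illustration only.
coverage = make_coverage("Northern Australia", "2000-11-01", "2001-02-28")
print(ET.tostring(coverage, encoding="unicode"))
```

Machine-readable dates sit alongside, not instead of, a text value such as "November-February", which remains useful to human readers.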
Rights 22 Rights held in and over the data collection. copyright Better… Copyright Use of the data is subject to legal, ethical and commercial restrictions. Licensed under Attribution 3.0 Australia (CC BY 3.0)
Access rights 23 Information about access rights to the data collection. Recent changes to RIF-CS now allow for selection of the following access statements: open, conditional or restricted. Restricted. Access to this dataset is restricted Better… Conditional. Access to this data collection is by negotiation with Professor Rayne Fall. Better… Open. Access to this data collection is open.
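Because the access statement is now chosen from a fixed set, it is easy to validate before a record is published. The sketch below assumes the statement is carried as a type attribute on an accessRights element inside rights; that placement and the helper name make_access_rights are assumptions to confirm against the schema.

```python
import xml.etree.ElementTree as ET

# The three access statements named on the slide.
ACCESS_TYPES = {"open", "conditional", "restricted"}


def make_access_rights(access_type, statement):
    """Build <rights><accessRights type="...">statement</accessRights></rights>."""
    if access_type not in ACCESS_TYPES:
        raise ValueError(f"access type must be one of {sorted(ACCESS_TYPES)}")
    rights = ET.Element("rights")
    access = ET.SubElement(rights, "accessRights", {"type": access_type})
    access.text = statement
    return rights


rights = make_access_rights(
    "conditional",
    "Access to this data collection is by negotiation with Professor Rayne Fall.",
)
print(ET.tostring(rights, encoding="unicode"))
```

Pairing the controlled value with a sentence of explanation gives both a filterable facet and guidance a person can act on.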
Identifier 24 Identifiers uniquely identify the collection within the domain of a specified authority; persistent identifiers are preferred. ID: Better… (type = uri) Other examples (type = uri) / (type = hdl)
Location/address 25 The address of the collection (electronic or physical), or another address which enables access to the collection. Australian Research Institution Western Australia or (type = uri) (type = )
Related object (party) 26 A party (person or group) related to the data collection. Not used; if used, relation is incorrect Better… Key: Relation: hasCollector (Professor Rayne Fall - person) Key: Relation: isManagedBy (Australian Research Institution - group) Other examples (NLA party record) (ORCID party record)
Related object (activity) 27 An activity related to the collection Not used or is made up where no real activity exists; if used, relation is incorrect Better… Key: Relation: isOutputOf (Rainfall patterns in the northern Australian tropics during the wet period: a longitudinal study from 1950 onwards) Other example
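Links to parties and activities follow the same pattern: the key of the other record plus a relation type. Here is a minimal sketch, assuming a relatedObject element with key and relation children; the keys are hypothetical stand-ins because the slide's real keys are not shown, and namespaces are omitted.

```python
import xml.etree.ElementTree as ET


def add_related_object(parent, key, relation_type):
    """Append a <relatedObject> pointing at another record by its key."""
    related = ET.SubElement(parent, "relatedObject")
    ET.SubElement(related, "key").text = key
    ET.SubElement(related, "relation", {"type": relation_type})
    return related


collection = ET.Element("collection", {"type": "dataset"})
# Hypothetical keys for the party and activity records.
add_related_object(collection, "example.org/party/rayne-fall", "hasCollector")
add_related_object(collection, "example.org/party/institution", "isManagedBy")
add_related_object(collection, "example.org/activity/rainfall-study", "isOutputOf")
print(ET.tostring(collection, encoding="unicode"))
```

The relation is read from the collection's point of view, so a collection hasCollector a person and isOutputOf an activity; reversing the direction is the kind of incorrect relation the slides warn about.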
Related information 28 Related information that provides contextual information about the data collection. Title: Title not included Identifier: ISSN to the journal (type = publication) Better… Title: Rainfall in the northern Australian tropics: a statistical analysis of rainfall over a 50 year period, Identifier: (type = publication; the identifier is the journal article’s url)
Citation 29 Citation is the preferred form for citing a dataset or collection in a publication or other bibliographic environment. Citation given is to the research publication based on the data, not the collection Better… Fall, R (2011): Daily rainfall observations over the northern Australian tropics, November to February, [place of publication, publisher]. doi: / (type = fullCitation)
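A full citation string in the style of the example above can be assembled from its parts. This is only a sketch of one possible layout: the function name is hypothetical, the publisher and identifier values are placeholders, and institutions may prefer a different citation style.

```python
def format_full_citation(creator, year, title, publisher, identifier):
    """Join the parts of a dataset citation in creator (year): title order."""
    return f"{creator} ({year}): {title}. {publisher}. {identifier}"


citation = format_full_citation(
    creator="Fall, R",
    year=2011,
    title="Daily rainfall observations over the northern Australian tropics",
    publisher="Australian Research Institution",  # placeholder publisher
    identifier="doi:10.0000/example",  # placeholder, not a real DOI
)
print(citation)
```

Building the string from the dataset's own creator, title and identifier helps avoid the mistake noted above of citing the related journal article instead of the collection.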
31 Any questions?
32 This work is licensed under a Creative Commons Attribution 3.0 Australia License ANDS is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy (NCRIS).