A STRATEGIC APPROACH TO METADATA CREATION
Why do people search for data?*
- Exploratory/scoping
- Reuse/secondary data analysis
- Can be a starting point or ad hoc
- Peer review
- Reproduce/extend results
- Repurpose (e.g. for mashups, visualisations, simulations)
- Verify claims (e.g. report findings)
*Not in any order; not exhaustive!
How do people find data?*
- Google
- Ask a colleague
- Find a link to the data in a journal article
- Discipline-specific data journal
- Data registries, e.g. DataBib
- Open data portals, e.g. data.gov.au
- Institutional repositories
- Data repositories, e.g. Dryad
- Project website
- Data discovery services like Research Data Australia
- Library catalogues and databases, e.g. NLA’s Trove
*Not in any order; not exhaustive!
How do people find data?
- Movable feast / changing beast
- No standard practice, universal standard or vocabulary
- Databases are non-exhaustive
- Search methods and terms are driven by why people are looking and how the data is stored
When looking for data, people need to:
- Find
- Identify
- Select
- Obtain
This is our guide for creating metadata records!
What is metadata?
Metadata is a means of collecting or structuring data about the content of other data.
Example: catalogue record
How is metadata useful?
Metadata is useful because it:
- Assists in the discovery of a resource
- Assists users to evaluate a resource
- Provides digital identification for a resource on the internet
It also helps to reduce workload, helps data to transcend people and time, and supports institutional memory.
“Metadata has value for data users, data developers, and organizations. No dataset should be considered complete without accompanying metadata. Data without metadata is useless.”
Source: U.S. Geological Survey - Core Science Analytics and Synthesis – Metadata
Metadata is critical in providing meaning and context to the resource it describes; without it, a resource could remain undiscovered and unidentifiable.
Why do we need metadata standards?
- Provide us with a common way of describing information resources
- Facilitate discoverability of resources
- Facilitate exchange of data between systems
Describing publications is easy, but data...?
Research data varies widely in aspects such as:
- Method of data collection
- Number, type and size of data files
- Electronic/physical resources or a combination
- Related software
- Contextual information
- Legal restrictions
- Ethical restrictions
- Access restrictions
Data itself varies widely between and within disciplines, which has led to a proliferation of metadata schemas for data.
RIF-CS
ISO 2146 objects and their descriptions:
- Collection: an aggregation of physical or digital objects
- Party: a person or group
- Activity: something occurring over time that generates one or more outputs
- Service: a physical or electronic interface that provides its users with benefits such as work done by a party or access to a collection or activity
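To make the four object classes concrete, here is a minimal sketch in Python. The `RegistryObject` class, the record keys and the `group_by_class` helper are hypothetical names invented for illustration; they are not part of RIF-CS or any ANDS tool. The names of the people, institution and study come from the example record used later in this deck.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class ObjectClass(Enum):
    """The four ISO 2146 object classes that RIF-CS describes."""
    COLLECTION = "collection"
    PARTY = "party"
    ACTIVITY = "activity"
    SERVICE = "service"


@dataclass
class RegistryObject:
    """Hypothetical, simplified stand-in for one registry record."""
    key: str
    object_class: ObjectClass
    name: str


def group_by_class(objects):
    """Group records by object class so each kind can be reviewed together."""
    grouped = defaultdict(list)
    for obj in objects:
        grouped[obj.object_class].append(obj)
    return dict(grouped)


# Illustrative records only; the keys are made up for this sketch.
records = [
    RegistryObject("example/collection/1", ObjectClass.COLLECTION,
                   "Daily rainfall observations over the northern Australian tropics"),
    RegistryObject("example/party/1", ObjectClass.PARTY, "Professor Rayne Fall"),
    RegistryObject("example/party/2", ObjectClass.PARTY,
                   "Australian Research Institution"),
    RegistryObject("example/activity/1", ObjectClass.ACTIVITY,
                   "Rainfall patterns in the northern Australian tropics "
                   "during the wet period"),
]

grouped = group_by_class(records)
```

Note that a person and an institution are both parties; the record type, not the object class, distinguishes them.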
RIF-CS
- Create records manually
- Create an automatic feed for harvesting by ANDS into Research Data Australia
Example: Griffith University’s RIF-CS feed:
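For the automated route, a feed is ultimately just XML generated from a local system. The sketch below builds one very small collection record with Python's standard library. The namespace URI and the element and attribute names (`registryObject`, `key`, `originatingSource`, `collection`, `name`, `namePart`, `description`) are written from memory of the RIF-CS schema and should be treated as assumptions to check against the RIF-CS documentation. The function name and all argument values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI as recalled from the RIF-CS schema; verify before use.
RIF_NS = "http://ands.org.au/standards/rif-cs/registryObjects"

# Serialise RIF-CS elements with a default namespace instead of an ns0: prefix.
ET.register_namespace("", RIF_NS)


def _tag(local_name):
    """Return the namespace-qualified tag for a RIF-CS element name."""
    return f"{{{RIF_NS}}}{local_name}"


def build_collection_record(key, group, source, title, description):
    """Build a minimal, illustrative RIF-CS collection record as an XML string."""
    root = ET.Element(_tag("registryObjects"))
    obj = ET.SubElement(root, _tag("registryObject"), {"group": group})
    ET.SubElement(obj, _tag("key")).text = key
    ET.SubElement(obj, _tag("originatingSource")).text = source

    collection = ET.SubElement(obj, _tag("collection"), {"type": "dataset"})
    name = ET.SubElement(collection, _tag("name"), {"type": "primary"})
    ET.SubElement(name, _tag("namePart")).text = title
    ET.SubElement(collection, _tag("description"), {"type": "full"}).text = description
    return ET.tostring(root, encoding="unicode")


# Placeholder values only; none of these identify a real record.
xml_text = build_collection_record(
    key="example/collection/1",
    group="Example Institution",
    source="https://example.org/feed",
    title="Daily rainfall observations over the northern Australian tropics",
    description="Rainfall observations taken during the wet season.",
)
```

A real feed would carry many more elements (subjects, coverage, rights, related objects) and would need validating against the published schema before harvesting.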
What makes a high quality metadata record?
Title
The name/title of the data collection.
Rainfall in northern Australia
VS
Daily rainfall observations over the northern Australian tropics, November to February,
Description
A full or brief description of the data collection.
Rainfall observations were taken over a 10-year period across northern Australia.
VS
The dataset consists of rainfall observations taken during the wet season in northern Australia over a 10-year period. It is part of an ongoing longitudinal study of weather in the region. Observations are made daily at 157 geographic locations across the area. Data is sent to a central point in Darwin. Measurements are recorded in millimetres: 0.2 to 9; 10 to 24; 25 to 49; 50 to 99; 100 to 149; 150+. Data is recorded in spreadsheets and calculated hourly, daily, weekly and monthly. Statistical analysis of the data was made using Excel.
Subject
The subject represents the primary topic or topics covered by the collection.
Rainfall
VS
(text value = Weather Research & Forecast Model (WRF)) (type = anzsrc)
Rainfall frequencies (type = lcsh)
Rainfall in northern Australia (type = local)
Coverage (spatial)
Spatial coverage refers to the geographical area where data was collected or a place which is the subject of a collection.
Australia
VS
, , , , , (type = kmlPolyCoords)
Northern Australia (type = text)
AU-NSW (type = iso31662)
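As we understand it, a kmlPolyCoords value is a space-separated list of longitude,latitude pairs tracing a closed polygon, in the style of KML's coordinates element; check the RIF-CS vocabulary documentation before relying on that. Under that assumption, the sketch below turns a simple bounding box into such a string. The function name is hypothetical, the box is assumed not to cross the antimeridian, and the example numbers are made up rather than taken from the rainfall study.

```python
def bbox_to_kml_poly_coords(west, south, east, north):
    """Return a closed rectangle as 'lon,lat lon,lat ...' (longitude first)."""
    if not (-180 <= west <= east <= 180):
        raise ValueError("longitudes must satisfy -180 <= west <= east <= 180")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")

    # Walk the corners and repeat the first one to close the polygon.
    corners = [
        (west, north),
        (east, north),
        (east, south),
        (west, south),
        (west, north),
    ]
    return " ".join(f"{lon},{lat}" for lon, lat in corners)


# Made-up bounding box, for illustration only.
coords = bbox_to_kml_poly_coords(west=120.0, south=-20.0, east=145.0, north=-10.0)
# coords == "120.0,-10.0 145.0,-10.0 145.0,-20.0 120.0,-20.0 120.0,-10.0"
```

Putting longitude before latitude is the part most often got wrong when coordinates are typed by hand, which is one argument for generating them.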
Coverage (temporal)
Temporal coverage refers to a time period during which data was collected or observations made, or a time period that the collection is linked to intellectually or thematically.
VS
November-February, (type = text)
Description (rights)
Rights held in and over the data collection.
copyright
VS
Copyright
Use of the data is subject to legal, ethical and commercial restrictions.
Licensed under Attribution 3.0 Australia (CC BY 3.0)
Description (access rights)
Information about access rights to the data collection.
access restricted
VS
Access to this data collection is by negotiation with Professor Rayne Fall.
Identifier
Identifiers uniquely identify the collection within the domain of a specified authority; persistent identifiers are preferred.
ID:
VS
(type = uri)
Other examples
(type = uri) / (type = hdl)
Location/address
The address of the collection (electronic or physical), or another address which enables access to the collection.
Australian Research Institution Western Australia
OR
(type = uri)
(type = )
Related object (party)
A party (person or group) related to the data collection.
Not used; if used, relation is incorrect
VS
Key: ari.org.au/researcher:
Relation: hasCollector (Professor Rayne Fall - person)
Key:
Relation: isManagedBy (Australian Research Institution - group)
Other examples
(NLA party record)
(ORCID party record)
Related object (activity)
An activity related to the collection.
Not used or is made up where no real activity exists; if used, relation is incorrect
VS
Key: ari.org.au/activity:
Relation: isOutputOf (Rainfall patterns in the northern Australian tropics during the wet period: a longitudinal study from 1950 onwards)
Other example
Related information
Related information that provides contextual information about the data collection.
Title: Title not included
Identifier: ISSN to the journal (type = publication)
VS
Title: Rainfall in the northern Australian tropics: a statistical analysis of rainfall over a 50 year period,
Identifier: (type = publication; the identifier is the journal article’s URL)
Citation
Citation is the preferred form for citing a dataset or collection in a publication or other bibliographic environment.
Citation given is to the research publication based on the data, not the collection
VS
Fall, R (2011): Daily rainfall observations over the northern Australian tropics, November to February, [place of publication, publisher]. doi: / (type = fullCitation)
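Where citation elements are already held in a local system, a full citation string for the dataset itself can be assembled rather than typed. The sketch below loosely follows the pattern of the example above (creator, year, title, place and publisher, DOI). The function name, the exact punctuation and every value passed in are assumptions for illustration; the DOI shown is a placeholder, not a real identifier.

```python
def format_full_citation(creator, year, title, place, publisher, doi):
    """Assemble a dataset citation string from its parts.

    The punctuation used here is one plausible style, not a prescribed format.
    """
    required = {
        "creator": creator,
        "year": year,
        "title": title,
        "place": place,
        "publisher": publisher,
        "doi": doi,
    }
    missing = [name for name, value in required.items() if not str(value).strip()]
    if missing:
        raise ValueError("missing citation elements: " + ", ".join(missing))
    return f"{creator} ({year}): {title}. {place}, {publisher}. doi: {doi}"


# Placeholder place, publisher and DOI, for illustration only.
citation = format_full_citation(
    creator="Fall, R",
    year=2011,
    title="Daily rainfall observations over the northern Australian tropics",
    place="Example City",
    publisher="Example Publisher",
    doi="10.0000/example",
)
```

The point of the check for missing elements is that an incomplete dataset citation tends to be replaced by a citation to the related paper, which is exactly the poor-quality case described above.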
Poor quality metadata record
High quality metadata record
International initiatives
Hands-on exercise
Practice evaluating the quality of metadata records
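Part of this evaluation can be done as a rough automated first pass before a person reads the record. The sketch below checks a record, held as a plain dictionary, for the elements covered in the preceding slides and flags titles and descriptions that look too brief. The field names, the function name and the word-count thresholds are all assumptions chosen for illustration; they are not ANDS quality rules.

```python
# Field names are hypothetical labels for the elements discussed in this deck.
EXPECTED_FIELDS = [
    "title", "description", "subject", "spatial_coverage", "temporal_coverage",
    "rights", "access_rights", "identifier", "location",
    "related_party", "related_activity", "related_information", "citation",
]


def review_record(record, min_title_words=6, min_description_words=25):
    """Return a list of human-readable issues found in a metadata record."""
    issues = []
    for field in EXPECTED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            issues.append(f"missing: {field}")

    # Arbitrary thresholds: short text is a hint, not proof, of poor quality.
    title = str(record.get("title") or "")
    if title.strip() and len(title.split()) < min_title_words:
        issues.append("title may be too brief")

    description = str(record.get("description") or "")
    if description.strip() and len(description.split()) < min_description_words:
        issues.append("description may be too brief")
    return issues


# The poor-quality examples from the slides above.
poor_record = {
    "title": "Rainfall in northern Australia",
    "description": "Rainfall observations were taken over a 10-year period "
                   "across northern Australia.",
    "subject": "Rainfall",
    "spatial_coverage": "Australia",
    "rights": "copyright",
    "access_rights": "access restricted",
}

issues = review_record(poor_record)
```

A check like this cannot tell whether a relation is correct or a citation points at the dataset rather than the paper, so it complements the manual review instead of replacing it.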
Re-cap
Metadata is critical to:
- discovery of research data collections
- determination of the value of research data collections
- access to research data collections, and
- re-use of research data collections.
The higher the quality of the metadata record, the better. However, quality metadata takes time. Think about manual vs automated means of metadata creation.
Resources
- Content Provider’s Guide
- Technical resources
- RIF-CS documentation
- Schema Guidelines (technical info)
- Controlled Vocabularies
- Quality processes and gold standard records (includes links to examples)