CM[A]R’s “MarLIN” Metadata System - or, how do we discover what data we’ve got?? Tony Rees, Manager, Divisional Data Centre. 3 June 2005. CSIRO (www.csiro.au).

Presentation transcript:

CM[A]R’s “MarLIN” Metadata System - or, how do we discover what data we’ve got?? Tony Rees, Manager, Divisional Data Centre. 3 June 2005. CSIRO Marine Research.

CMR’s “MarLIN” Metadata System

Talk overview
- Data in our overall activities
- Data directories and metadata
- CMR’s “MarLIN” system
- Creating metadata content: roles and issues
- Concluding remarks
- Questions / discussion items

MarLIN stands for:
- Marine Laboratories Information Network (1998)
- Marine & Atmospheric Research Laboratories Information Network (2005 onwards)

The research process (heavily simplified!)

Stages, in order:
- Problem Formulation
- Operationalisation
- Data Acquisition
- Data Appraisal
- Data Interpretation
- Solution / New Knowledge

Linked to the process: Data Sources, Data Sinks

Outputs:
- Scientific publications, reports, products
- Happy customers
- New problems

Data sources and sinks

Data sources
- pre-existing data in CMR / CSIRO holdings
- pre-existing data from external (third party) sources
- new data collection / generation

Data sinks
- project / researcher archives (formal / informal)
- CMR centralised data holdings (e.g. in Data Centre)
- external repositories / customers’ holdings

Data sinks feed back to data sources.

Need for a data catalogue

Requirements
- (1) Need to know what we already have...
  - across our Division
  - across the whole agency
- (2) Others may also need to know what we have (or the subset of it that we are in a position to share)
- (3) Also need to know what exists elsewhere (potential for acquisition of third party data, where it exists)

Solutions
- (1 and 2) Catalogue of our own data assets, using “metadata”
- (3) Other agencies’ catalogues, metadata gateways
  - plus other routes (literature searches, peer-to-peer networking, etc.)

What is this “metadata”?

“Metadata is information about data or other information.” (USGS web site)

“Metadata is data about data. In other words, it is a structured summary of information that describes the data. Metadata includes, but is not restricted to, characteristics such as the content, quality, currency, access and availability of the data.” (ANZLIC Metadata Guidelines v2, 2001)

In practice, metadata is structured summary information about [......]. For example:
- a film guide holds metadata about films, e.g. title, director, genre, cast, running length, story synopsis, language, rating...
- a scientific publications database holds metadata about scientific publications, e.g. author, title, location in journal, publication date, abstract, keywords...
- a data directory (metadata system, in the present context) holds metadata about datasets of value to scientists and other potential users.

Why metadata?
- Structured information collection supports powerful information retrieval
- Efficient: in the first instance it is easier to search / browse the metadata than to obtain and interrogate all the actual resources (i.e., metadata is a surrogate for the resource)
- Metadata is human-readable and text searchable; the resource may not be (e.g. rock specimens, images, music, digital data files...)
- Collection of metadata into metadata systems supports resource discovery (entry point/s for information)
- Captures “corporate memory”: essential information required to understand or re-use the resource (so that information does not reside solely in people’s heads)
- Assists in resource management: knowing what one has is a precursor to managing it well
- Can be used for resource distribution: the enquirer locates the metadata, then is provided with an access point to the data (e.g. a web-based system can hyperlink to any web-accessible data source).

Who uses metadata?
- Science agencies and jurisdictions use it to describe their data holdings, e.g. (in our sphere of interest):
  - Australian Antarctic Division
  - Australian Hydrographic Service
  - BRS
  - Bureau of Meteorology
  - Dept. of Environment and Heritage [EA]
  - Geoscience Australia...
- Jurisdictional directories: ACT, NSW, NT, SA, TAS, VIC, WA
- Overseas examples: some agency-based, some jurisdictional, some national/thematic (e.g. European marine data, international “global change” data, space/satellite data, etc.)
- Frequently, the metadata “push” comes predominantly from the “spatial data” community, i.e. data with a geographic component, but similar principles can be applied to (virtually) any data.

Example: the Australian Spatial Data Directory (ASDD)
- Single gateway (portal) to search 20+ metadata systems around Australia concurrently
- CMR is represented (but no other CSIRO Divisions currently have a metadata system)
- Gateway searching is fairly basic; individual system entry points (e.g. MarLIN) often have more functionality. Both have their place.
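
The gateway idea, one query sent to several metadata systems at once with the hits merged into a single list, can be sketched roughly as follows. This is an illustration only: the node names, the record titles and the functions `search_node` and `gateway_search` are hypothetical, and the sketch does not reproduce the ASDD's actual protocol or interface.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for independent metadata systems ("nodes").
# A real gateway would query remote servers; here each node is a list of titles.
NODES = {
    "node_a": ["Tasman Sea CTD casts", "Tasmanian rock lobster catch records"],
    "node_b": ["Antarctic sea ice extent", "Tasmanian coastal vegetation map"],
    "node_c": ["Daily rainfall observations"],
}


def search_node(name, term):
    """Return (node name, titles containing term), matched case-insensitively."""
    term = term.lower()
    return name, [title for title in NODES[name] if term in title.lower()]


def gateway_search(term):
    """Send the same query to every node concurrently and merge the hits."""
    names = sorted(NODES)
    with ThreadPoolExecutor() as pool:
        # pool.map yields results in the same order as `names`.
        results = list(pool.map(lambda name: search_node(name, term), names))
    return [(node, title) for node, titles in results for title in titles]


if __name__ == "__main__":
    for node, title in gateway_search("tasman"):
        print(node, "|", title)
```

A simple substring match is about as basic as searching gets, which mirrors the point above: the gateway offers breadth, while each node's own entry point can offer richer searches.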

How it works in practice...

The CMR metadata system describes / points to:
- online data files / databases
- documents (digital and non-digital)
- offline data archives
- images (graphics, photos, video)
- specimen collections
- (etc.)

Access is by www users (internal / external), and via metadata gateways and search engines (e.g. ASDD, Google, etc.).

An overseas metadata example
- The UK’s “National Biodiversity Network” holds datasets on species distribution surveys for UK birds, animals, invertebrates, plants, etc.

In other words, metadata supports...
- Dataset discovery, via lists and / or structured searches
- Dataset appraisal (via descriptive information), including
  - what (dataset content)
  - where (dataset spatial footprint), if applicable
  - when (dataset temporal footprint)
  - who by, why, etc.
- Dataset access constraints: who can access, under what conditions
- Dataset location and access point
- Supplementary information, e.g. documentation, images, references, etc.

Metadata standards
- The metadata example just shown uses its own format (no externally recognised standard)
- Standards assist interoperability, e.g.
  - USA: 2 standards currently (a small one, “DIF”, and a large one, “FGDC”)
  - UK / Europe: historically little standardisation, but a new ISO standard (2003) now exists (based on the US “FGDC” model)
  - Australia uses the “ANZLIC” standard (v.2), which is currently pre-ISO; the next version will be ISO compatible.

What’s in the ANZLIC standard?
- Dataset title, ANZLIC identifier
- Custodian organisation and contact
- Abstract, search words, bounding box, and Geographic Extent Name or polygon
- Start / end dates, progress and maintenance status
- Access constraints, stored and available data formats
- Data quality (lineage, positional & attribute accuracy, completeness, and logical consistency)
- Metadata entry (or last update) date
- “Additional metadata” (for anything else)

Note:
- Bounding box and start / end dates support spatial and temporal searching
- “Search words” support structured searches and information retrieval
- Remaining fields are searchable as free text.
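
To illustrate how a bounding box and start / end dates support spatial and temporal searching, here is a minimal sketch. The `Record` class, its field names, the `search` function and the two example records are all assumptions made for illustration; they are not the ANZLIC element names or any real system's data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    """A few ANZLIC-style core elements (field names are illustrative)."""
    title: str
    search_words: tuple
    west: float   # bounding box edges, decimal degrees
    east: float
    south: float
    north: float
    start: date
    end: date


def box_overlaps(rec, west, east, south, north):
    """True if the record's bounding box intersects the query box.

    Simplification: boxes crossing the 180 degree meridian are not handled.
    """
    return (rec.west <= east and rec.east >= west
            and rec.south <= north and rec.north >= south)


def period_overlaps(rec, start, end):
    """True if the record's date range intersects the query period."""
    return rec.start <= end and rec.end >= start


def search(records, box=None, period=None, word=None):
    """Apply whichever of the spatial, temporal and search-word filters are given."""
    hits = []
    for rec in records:
        if box is not None and not box_overlaps(rec, *box):
            continue
        if period is not None and not period_overlaps(rec, *period):
            continue
        if word is not None and word not in rec.search_words:
            continue
        hits.append(rec.title)
    return hits


# Made-up example records, for illustration only.
RECORDS = [
    Record("Example trawl survey", ("FISHERIES",), 140.0, 150.0, -45.0, -38.0,
           date(1990, 1, 1), date(1992, 12, 31)),
    Record("Example tide gauge series", ("OCEANS",), 115.0, 116.0, -33.0, -31.0,
           date(1985, 1, 1), date(2004, 12, 31)),
]
```

For instance, `search(RECORDS, box=(145.0, 155.0, -44.0, -40.0))` tests each record's box against a query box given as (west, east, south, north), and `search(RECORDS, word="OCEANS")` is the structured search-word case.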

What’s missing from the ANZLIC standard (but would be useful)?
- Originator organisation (for data obtained from elsewhere)
- Contributors, Acknowledgements, References
- “Global Project” affiliation (e.g. WOCE, JGOFS, etc.)
- Better keywords (ANZLIC ones are very high level)
- Better geographic footprint (e.g. by grid squares or similar), especially for patchy / irregular sampling patterns
- CMR Project affiliation
- Voyage or survey name (and relevant details)
- Species names (if relevant)
- Data volume and attributes in the dataset, plus information about its local storage environment
- Links to documentation, graphics, and the data itself (where available)

(There are probably other items too, but this is a good start.)

CMR metadata standard = “extended ANZLIC”...

ANZLIC + useful extras = “CMR metadata standard” (1998 onwards)
- informal set of elements of value to our operations
- also, prototype for draft CSIRO metadata standard (2002).
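
One way to picture "core plus extras" is a record whose core part can still be handed to a system that understands only the base standard, while the extras stay available locally. The sketch below assumes this reading; the element names in `ANZLIC_CORE` and `CMR_EXTRAS` are shortened, illustrative labels rather than the official element names, and `split_record` and `missing_core` are hypothetical helpers.

```python
# Illustrative labels only; not the official ANZLIC or CMR element names.
ANZLIC_CORE = {
    "title", "custodian", "abstract", "search_words", "bounding_box",
    "start_date", "end_date", "access_constraints", "data_quality",
}
CMR_EXTRAS = {
    "originator", "global_project", "cmr_project", "voyage",
    "species", "grid_squares", "links",
}


def split_record(record):
    """Split a record (dict) into core, extension and unrecognised parts."""
    core = {k: v for k, v in record.items() if k in ANZLIC_CORE}
    extras = {k: v for k, v in record.items() if k in CMR_EXTRAS}
    unknown = sorted(set(record) - ANZLIC_CORE - CMR_EXTRAS)
    return core, extras, unknown


def missing_core(record):
    """List core elements that the record has not yet filled in."""
    return sorted(ANZLIC_CORE - set(record))


# Made-up example record, for illustration only.
EXAMPLE = {
    "title": "Example voyage catch data",
    "abstract": "Catch composition by station.",
    "voyage": "example_voyage_01",
    "species": ["Example species A"],
    "colour": "blue",  # not in either element set
}

if __name__ == "__main__":
    core, extras, unknown = split_record(EXAMPLE)
    print("core:", sorted(core))
    print("extras:", sorted(extras))
    print("unrecognised:", unknown)
    print("still to fill in:", missing_core(EXAMPLE))
```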

The external context: Australian Government

Commonwealth Statement, via the Office of Spatial Data Management (OSDM). Extract from: AUSTRALIAN GOVERNMENT CUSTODIANSHIP GUIDELINES [for spatial data in this instance]: The Rights and Responsibilities of Spatial Data Custodians

- “1. Various Australian Government agencies hold large amounts of spatial data, and will continue to collect more in the future. To achieve efficient and effective acquisition, management and use of spatial data, custodian agencies will be given policy guidelines setting out custodianship rights and responsibilities.
- A key part of any set of spatial data is the accompanying metadata [...] The custodian of the data is normally the best placed to supply this information.
- The custodian is expected to facilitate efficient and effective use of the government's data, so as to derive maximum benefit from the investment. Thus the metadata must always be readily available, not just for existing users, but for potential users. The custodian should maintain publicised points of contact for enquiries and be in a position to provide the appropriate metadata promptly.”

The external context: CSIRO

Extract from: [draft] CSIRO Scientific Data Management Policy, as submitted to Executive, March 2002 [technically still a “draft awaiting approval”], under Scientific Data Management Roles, Responsibilities and Actions:

Corporate
- “Senior management (CEO, Deputy CEOs, Business Unit Chief Executives) are to foster and encourage the development of a culture within CSIRO where:-
  - the value of scientific data and associated data management is recognised and rewarded;
  - scientific data assets are shared by staff within the Organisation, and where appropriate, with others outside the Organisation.”

Research projects
- “The incorporation of scientific data management objectives into routine R&D planning and development procedures. This should include procedures that will ensure the recording and updating of metadata, ensure the current and future security of the data asset (backup and archiving), ensure the protection of CSIRO’s intellectual property, and resolve issues of data ownership and future access.”

The external context: CSIRO (continued)

Extract from: CSIRO Scientific Data Management Policy, as submitted to Executive, March 2002 [technically still a “draft awaiting approval”], under Scientific Data Management Roles, Responsibilities and Actions:

Individual Officers
- “Adopt a ‘one-CSIRO’ view of the data they collect, analyse, back-up and archive;
- Adopt and utilise the CSIRO Metadata Standard and make the recording of metadata a routine part of their work practices; and
- Ensure the security of CSIRO scientific data assets.”

Comment: the above are mainly “sticks”; “carrots” will be discussed later in the presentation.

What is a “dataset”, in this context?

According to the ISO metadata standard (ISO 19115, 2003):
- “Dataset: an identifiable collection of data. (NOTE - A dataset may be a smaller grouping of data which, though limited by some constraint such as spatial extent or feature type, is located physically within a larger dataset [...] A hardcopy map or chart may be considered a dataset.)”

“In practice” definition: a collection of data sharing common features such as data type, data collection activity or data assembly purpose, management / availability as a discrete unit, etc.

The size of the data “chunks” to be described (aka dataset granularity) is a subjective choice: whatever best suits the data custodian, or is most valuable to prospective data users.

Basically it comes down to a “lumping” or “splitting” decision (or set of guidelines). However, splitting down to the atomic level is probably undesirable in this context, for practical reasons.
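
The “lumping” versus “splitting” choice can be made concrete with a small sketch: the same files yield fewer, larger datasets or more, smaller ones depending on the grouping key. The file list, voyage names and the `group_files` helper are hypothetical and chosen only to show the effect.

```python
from collections import defaultdict

# Hypothetical data files, each tagged with a voyage and a data type.
FILES = [
    ("voyage_01", "CTD", "v01_ctd_a.dat"),
    ("voyage_01", "CTD", "v01_ctd_b.dat"),
    ("voyage_01", "catch", "v01_catch.csv"),
    ("voyage_02", "CTD", "v02_ctd_a.dat"),
]


def group_files(files, key):
    """Group file names into candidate datasets using the supplied key function."""
    datasets = defaultdict(list)
    for voyage, data_type, name in files:
        datasets[key(voyage, data_type)].append(name)
    return dict(datasets)


# Lumping: one dataset (and one metadata record) per voyage.
lumped = group_files(FILES, lambda voyage, data_type: voyage)

# Splitting: one dataset per voyage and data type.
split = group_files(FILES, lambda voyage, data_type: (voyage, data_type))

if __name__ == "__main__":
    print(len(lumped), "records when lumping by voyage")
    print(len(split), "records when splitting by voyage and data type")
```

Splitting all the way down to one record per file would give four records here, which hints at why atomic-level description quickly becomes impractical for real holdings.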

The CMR metadata story so far...
- CMR has the “MarLIN” metadata system, implemented and with ongoing enhancements. It is in-house software, based on the EA original (we can modify it further as needed)
- Records can be “internal”, i.e. CSIRO only (for confidential or third party data), or “public” (open access)
- Currently holds 2,100+ dataset descriptions; c. 1,000 of these describe centrally-held datasets (Data Centre holdings)
- Coverage of other Divisional holdings is patchy at present (a few groups have made an effort, many have not)
- To address this in part, it is proposed to construct “skeleton” (template) records for all Divisional science projects (CMR plus CAR, i.e. future CMAR), to act as a starting point for projects to describe their data holdings
- Also, management / project “buy in” to the concept needs further development.

Searchability
- Structured searches, including browse by subject categories, keywords, projects, custodian (site), voyages, species names, and more
- Free text searches
- Lists of titles, including recent additions / updates
- Search by space and time criteria
- Search for “own” records via the Edit interface
- Search via ASDD (Australia-wide metadata gateway): public records only
- Also, text searches via “Google” etc. will find relevant public records.
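
The difference between a structured (keyword) search and a free text search, and the effect of restricting external routes to public records, can be sketched as below. The records, field names and functions are invented for illustration and say nothing about how MarLIN itself is implemented.

```python
# Made-up records; "public" marks open access, otherwise the record is internal.
RECORDS = [
    {"title": "Example plankton survey",
     "keywords": {"plankton", "biology"},
     "abstract": "Net samples collected by J. Citizen.",
     "public": True},
    {"title": "Example third party bathymetry",
     "keywords": {"bathymetry"},
     "abstract": "Licensed grid, internal use only.",
     "public": False},
    {"title": "Example mooring temperatures",
     "keywords": {"temperature", "moorings"},
     "abstract": "Hourly temperature from a plankton monitoring site.",
     "public": True},
]


def visible(records, internal_user):
    """Internal users see every record; external users see public ones only."""
    return [r for r in records if internal_user or r["public"]]


def browse_by_keyword(records, keyword):
    """Structured search: exact match against each record's keyword set."""
    return [r["title"] for r in records if keyword in r["keywords"]]


def free_text(records, text):
    """Free text search: case-insensitive substring match on title and abstract."""
    text = text.lower()
    return [r["title"] for r in records
            if text in r["title"].lower() or text in r["abstract"].lower()]


if __name__ == "__main__":
    print(browse_by_keyword(RECORDS, "plankton"))
    print(free_text(RECORDS, "plankton"))
    print(free_text(visible(RECORDS, internal_user=False), "licensed"))
```

The keyword browse returns only records deliberately tagged with a term, while free text also picks up incidental mentions (such as a person's name or a specialist term in an abstract), which is why both kinds of search are worth offering.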

Example: search (browse) by taxonomic group

Free text search... (e.g. for specialist terms, person names, etc.)

Full metadata record (start) (etc.)
- Includes grid square representation for spatial “footprint” of datasets (where possible)
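
A grid square footprint can be derived from sample positions by recording which squares contain at least one observation. The sketch below assumes 1-degree squares and made-up positions; the square size, the indexing scheme and the function names are illustrative assumptions, not a description of the scheme MarLIN uses.

```python
import math

# Hypothetical sample positions as (latitude, longitude) in decimal degrees:
# two close together in the south, one far away in the north.
SAMPLE_POINTS = [(-42.9, 147.3), (-42.5, 147.8), (-12.4, 130.8)]


def grid_squares(points, size=1.0):
    """Return the set of (lat_index, lon_index) squares holding at least one point."""
    return {(math.floor(lat / size), math.floor(lon / size)) for lat, lon in points}


def bounding_box_square_count(points, size=1.0):
    """Count the squares covered by the bounding box around all the points."""
    lat_idx = [math.floor(lat / size) for lat, _ in points]
    lon_idx = [math.floor(lon / size) for _, lon in points]
    rows = max(lat_idx) - min(lat_idx) + 1
    cols = max(lon_idx) - min(lon_idx) + 1
    return rows * cols


if __name__ == "__main__":
    print("occupied squares:", sorted(grid_squares(SAMPLE_POINTS)))
    print("squares inside bounding box:", bounding_box_square_count(SAMPLE_POINTS))
```

With patchy sampling like this, the bounding box spans hundreds of squares while only two actually contain data, so a spatial search against the grid squares returns far fewer false matches than one against the box.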

Present MarLIN Edit interface (metadata entry tool) - start (etc.)

Where to find MarLIN

Sample MarLIN usage, 6 months to 11/5/2005
- CMR accesses (searches + page views): 945 / month
- other Australian accesses (non-CMR): 6,000 / month
- international plus “origin unavailable” accesses: 10,000 / month

(NB: excludes search engine hits, which run at over 50,000 / week)

Presumably some people arrive here by accident and others are “just looking”, but a percentage are genuinely after CMR data!

The process also works the other way: persons requesting data from our holdings can be pointed to MarLIN for the relevant metadata.
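
As a quick worked calculation on the monthly figures quoted above (which look rounded, so the results are indicative only), the total and the share from each source can be computed as follows. The dictionary keys are shorthand labels introduced here.

```python
# Monthly access figures as quoted on the slide (search engine hits excluded).
monthly = {
    "CMR": 945,
    "other Australian": 6000,
    "international / origin unavailable": 10000,
}

total = sum(monthly.values())
shares = {source: 100 * count / total for source, count in monthly.items()}

if __name__ == "__main__":
    print(f"total accesses per month: {total}")
    for source, share in shares.items():
        print(f"{source}: {share:.1f}%")
```

On these numbers, accesses from within CMR are a small fraction of the total, so most of the recorded use comes from outside the Division.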

Creating MarLIN Content: roles?

Data Centre staff
- describe Data Centre holdings (centrally managed data)
- maintain the metadata system, provide user assistance, etc.
- may undertake rescue and description of legacy data (where resources are available)
- maintain currency of the Data Centre’s metadata records

Project data custodians
- describe their own data holdings (including data acquired from third parties)
- maintain currency of their own metadata records
- may pass data to central holdings for archiving / online access, with appropriate metadata

Project / RG leaders, Divisional management
- promote / facilitate / monitor project-level metadata activity?

Recap - what are we trying to do here? (activities)
- Describe our data holdings, to the inside and outside world
- Bring together relevant dataset documentation (or pointers to it) in a single, www-accessible location
  - includes an incentive to source, digitise, and web-enable relevant images, text documents, item-level lists, etc.
- Provide a tailored set of search tools which suit our data holdings and audience
- Provide access to our metadata (and data where appropriate) on a self serve, 24/7 basis
- Connect our entered information to the wider world for “discovery” purposes, e.g. to metadata gateways and internet search engines.

Benefits of good metadata to the Organisation (impacts)
- Assists in data management: knowing what we have is a precursor to managing it well
- Assists our researchers to do their job better: plan projects, locate data quickly, assess gaps, reduce duplication of effort / multiple purchases of third party data (should be particularly valuable in a multi-site, multi-Division environment such as CSIRO, to counteract the intentional or unintentional “silo” effect)
- Facilitates information and data access, for internal and external users (e.g. self service via the web, not reliant on human responses)
- Communicates / promotes our resources to the wider world, and demonstrates compliance with relevant policies / legislation
- Auto-generation of relevant data listings, for reporting purposes, etc.
- Captures “corporate memory”: key aspects of the data necessary for its current and future discovery, appraisal, and re-use.

Concluding remarks
- MarLIN content – a “work in progress” (continually moving target...) – some runs on the board, still a way to go...
- Potential “trailblazer” / proof-of-concept for a one-CSIRO system (one day)
  - but still waiting for “buy in” at corporate level to good data management principles and tools
- Good vehicle for promotion / reporting / dissemination of information about Divisional research activities (in tandem with communications, scientific publications, etc.), if that is what we want
- Compliance issues seem to have a back seat at present, may become more significant in future, e.g....
  - obligations on public-funded agencies to describe their data holdings
  - obligations on completed projects to archive and describe their data along with final reports (etc.)

NB: the Data Centre undertakes a range of activities; MarLIN is only one of them...

MarLIN as a component of Data Centre activities
[Diagram; labels as follows]
- DATA SOURCES: “CMR Project Space” (digital project data – finite duration); long term project data stores (“data islands”); historic + non-digital project data; National Facility Research Vessel
- “Data Centre Space” (Data Centre holdings): Divisional Data Warehouse; “MarLIN” metadata system (data catalogue); “Data Trawler” (data access); CAAB (master species dictionary, species distributions, etc.); c-squares (spatial search and retrieval, online mapping); other DC systems (including off line archives); manned data requests service; project data advice / assistance (3 CMR sites)
- External Automated Systems (export to / access via...): OBIS; AODC-JF (2005-6); Oceans Portal (2005-6); ASDD (metadata)
- DATA USERS, Client Groups: CMR Project Staff; CMR Managers; National Facility Users; External Collaborators; General Public

Some questions / discussion points
- Are we on the right track here (at all?)
  - ... are we currently doing enough / too much / not enough in the metadata arena?
  - ... does the system fulfil current user (client) expectations – e.g. Divisional staff users, others?
  - ... how do we determine “metadata usefulness”?
- Who are the “business owners” (beneficiaries) of the concept? Whose problem is it if metadata does not get created? (current / future)
- What is the situation in other CSIRO Divisions?
- How does the upcoming CMR / CAR merger impact on future metadata activities?
- What are the resource implications (i.e., costs) of maintaining accurate and current system content?
  - e.g. AAD model – 1 person [1 x FTE] to oversee system content, nag / assist metadata creators, etc. – for xx science staff

Thank You
To visit MarLIN: go to CMR home page >> Data Centre >> MarLIN
Please contact Data Centre staff at any site for further assistance / information.

CMR’s “MarLIN” Metadata System (supplementary slides)

Potential barriers / obstacles to be overcome
- Lack of incentives (carrots, i.e. rewards, recognition) and / or obligations (sticks) for projects / staff to create metadata
- Perceived lack of usefulness (incomplete coverage, information can be out of date) – researchers use other more “reliable” methods to locate the data they need
- Describing own data is a low-level chore, other tasks have higher priority (or are more interesting, or produce more immediate benefits)
- Some people do not wish to share “their” data anyway, describing it in a publicly accessible system is counter-productive
- A lot of data is not in a re-useable state (e.g. not digitised, incomplete, disorganised, not documented), lack of resources to address this
- Metadata entry tool is cumbersome, and asks too many “hard” questions about the data
- Once created, metadata then has ongoing maintenance overhead associated with it (never ending task...)

Possible solutions...
(1) Lack of carrots / sticks: cultural change – including top-down policy; integration into project workflow
(2) Perceived lack of usefulness (incomplete coverage): increased participation; “skeleton” record for every project
(3) Low priority task: as (1) above; maybe with regular audit / reporting function
(4) Resistance to sharing “own” data: cultural change
(5) Poor data state: better data management practices; targeted resources, for priority legacy datasets?
(6) Metadata entry tool complicated: revamp / upgrade metadata entry tool; user instruction / assistance
(7) Ongoing metadata maintenance requirement: as (1) above; also some Data Centre involvement

Components of the metadata system
[Diagram; labels as follows]
- metadata repository – describes / points to online and offline resources
- metadata entry tool (edit interface) – content creator (internal); re-edit...
- content moderator
- database administrator
- metadata search tool – www user (internal / external)
- remote browse / search capability – metadata gateways, search engines

Sneak preview of “upgraded” MarLIN Edit interface (under construction, May ’05)

National Metadata Infrastructure
[Diagram; labels as follows]
- Agency metadata systems describe / point to agency data: CMR MarLIN / CMR data; DEH EDD / DEH data; BoM / BoM data; GA / GA data; future agency system; etc.
- ASDD: Australian Spatial Data Directory – national cross-agency metadata gateway
- Search via ASDD – search across multiple agencies, basic functionality
- Search via MarLIN – search only CMR holdings, but extra functionality (also view “CMR internal” records not visible to external users)
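The difference between the two search routes on the National Metadata Infrastructure slide can be illustrated with a small sketch. This is not how ASDD or MarLIN is implemented; the `Record` class, the `internal` flag, the function names, and the sample records are all hypothetical, chosen only to show a cross-agency gateway that sees public records versus a local search that also sees internal ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    agency: str             # e.g. "CMR", "BoM"
    title: str
    internal: bool = False  # hypothetical flag: visible to in-house users only


def search_catalogue(records, term, include_internal=False):
    """Case-insensitive title match within a single agency catalogue."""
    term = term.lower()
    return [
        r for r in records
        if term in r.title.lower() and (include_internal or not r.internal)
    ]


def search_gateway(catalogues, term):
    """Gateway-style search: every agency catalogue, public records only."""
    hits = []
    for records in catalogues.values():
        hits.extend(search_catalogue(records, term, include_internal=False))
    return hits


# Made-up sample records, for illustration only.
CATALOGUES = {
    "CMR": [
        Record("CMR", "Voyage temperature profiles"),
        Record("CMR", "Draft temperature logger trial", internal=True),
    ],
    "BoM": [Record("BoM", "Sea surface temperature analyses")],
}

# Cross-agency search via the gateway: broad coverage, public records only.
gateway_hits = search_gateway(CATALOGUES, "temperature")

# Local search of one agency's own system: narrower, but sees internal records.
local_hits = search_catalogue(CATALOGUES["CMR"], "temperature", include_internal=True)
```

In this sketch the gateway trades depth for breadth: it reaches every agency but never returns an internal record, while the local search covers one agency's holdings in full.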