CERIF for Datasets: Background and Key Findings
Workshop, London, 26th July 2013
CERIF slides reproduced from presentations by euroCRIS members: Keith Jeffery, Brigitte Joerg, Anna Clements

C4D workshop, Glasgow & London. July 2013
C4D Summary
• JISC MRD Programme
• Consortium: Sunderland, Glasgow, St Andrews, NERC, EPSRC, DCC and euroCRIS
• “CERIFication” of the metadata about research datasets
• Focus on MEDIN* standard: NERC requirement for *

Datasets & metadata
Datasets have sparked interest in metadata standards that support their:
• Discoverability
• Description
• Usability
• Re-use

For example …
• CKAN
  – Software platform; default schema is DC
• eGMSDescription
  – UK e-Government metadata standard; based on DC
  – ‘flat’ model; single entity (a resource or dataset); keep adding attributes
• DCAT
  – RDF schema vocabulary for PSI (public sector info)
  – Some normalisation; can’t capture different roles/semantics in relationships

Houssos, N., Joerg, B., Matthews, B. A multi-level metadata approach for a Public Sector Information data infrastructure. CRIS2012. Prague, June

… so what about CERIF?
• Common European Research Information Format
• A conceptual model for describing the complete research domain
• A standard for the development, implementation and interoperability of current research information systems (CRIS) and their various applications
• Est. 1991; maintained by …

… and euroCRIS?
• Not-for-profit organisation of experts
  – Research organisations; funders; publishers; systems providers; standards organisations
• 109 institutional, 38 personal & 20 affiliate members (euroCRIS annual report 2012)
• 41 countries; not just Europe
• Main activity is the development, maintenance and implementation of CERIF

euroCRIS: Strategic Partners
In the UK: The CERIF landscape
UK CERIF adoption
• 1/3 of UK HEIs have a CERIF-compliant CRIS*
• Driven by desire to better support research management at the institutional level
• … and streamline reporting to funders
* Source: UKOLN (R. Russell), Adoption of CERIF in Higher Education Institutions in the UK: A Landscape Study, March

[Figure: timeline of the CERIF data model, from CERIF 91 to CERIF 1.5 and the C4D datasets work]
• CERIF 91
  – Entity: PROJECT
  – Networking of DBs; Exchange of Records; EC Recommendation to Member States
• CERIF 2000 Model
  – Entities: CLASSIFICATION, RESULTS, EQUIPMENT, PROJECT, OrgUnit, PERSON, EXPERTISE, Roles
  – Data Model; Multilinguality; Controlled Vocabulary; Roles / Types; User-driven; EC Recommendation to Member States
• CERIF 2006 / 2008 Model
  – Layers: 2ndLevel, Base, Language, Semantics, Link
  – Data Model; Model Normalization; Robust/Consistent Structure; Extensible Structure; Semantic Layer; XML Exchange Specification; Elaboration on Publication; CERIF Core Semantics ( )
• CERIF 1.3, 1.4 (XML), 1.5, CERIF Linked Data
  – Data Model: Infrastructure (Facility, Equipment, Service); Measurement & Indicator; Entities and Link Tables; Geographic Bounding Box
  – CERIF 1.3 Vocabulary: UUIDs, Terms, Schemes
  – CERIF 1.4 new XML format
  – CERIF 1.5 Federated Identifiers
• Acronym: ERGO. Participants: Keith Jeffery, Anne Asserson, Rutherford Appleton Lab, Univ Bergen, many more
• CERIF Data Model: C4D datasets

CERIF Entity Types
• Base Entities
• Result Entities
• Infrastructure Entities
• 2nd Level Entities
• Link Entities
CERIF Features
• Multiple Language
• Semantics
• Measures & Indicators
• Geographic Bounding Box

Person: ID, URI, Gender, FirstNames, OtherNames, FamilyNames, NameVariants, ResearchInterest, Keywords
Project: ID, URI, Acronym, StartDate, EndDate, Title, Abstract, Keywords
OrganisationUnit: ID, URI, Acronym, Name, HeadCount, CurrencyCode, Turnover, ResearchActivity, Keywords

[Figure: CERIF base entities and their attributes]
cfOrganisationUnit: cfID, cfURI, cfAcronym, cfHeadCount, cfCurrencyCode, cfTurnover
cfPerson: cfID, cfURI, cfGender, cfBirthdate
cfProject: cfID, cfURI, cfAcronym, cfStartDate, cfEndDate
Further attribute groups shown: cfTitle, cfAbstract, cfKeywords; cfName, cfKeywords; cfDescription, cfKeywords; cfFamilyNames, cfFirstNames, cfOtherNames, cfNameVariants

ResultProduct: ID, URI
ResultPublication: ID, URI, Title, Subtitle, Abstract, Bibl. Note, PublicationDate, TotalPages, StartPage, EndPage, Keywords
ResultPatent: ID, URI, PatentNumber, Title, CountryCode, RegistrationDate, ApprovalDate, Description, Keywords

[Figure: CERIF result entities and their attributes]
cfResultPublication: cfID, cfURI, cfNumber, PublicationDate, cfStartPage, cfEndPage, cfTotalPages, cfEdition, cfSeries, cfIssue, cfVolume, cfISBN, cfISSN
cfResultPatent: cfID, cfURI, cfPatentNumber, cfCountryCode, cfRegistrationDate, cfApprovalDate
cfResultProduct: cfID, cfURI
Further attribute groups shown: cfTitle, cfAbstract, cfKeywords, cfSubtitle, cfVersionInfo, cfBibliographic Note, cfAbbreviation; cfDescription, cfKeywords, cfName; cfVersionInfo, cfAbstract, cfKeywords, cfName

Advantages of CERIF
• CERIF has many advantages as the canonical model (the research information entities, attributes, associations and semantics) for contextual metadata for datasets:
  – Covers all aspects of research information: researchers, projects, organisations, funding, outputs, equipment, services, and so on;
  – An optimal (relational) architecture allowing the expression of any kind of relation between entities/attributes, with every relation “time-stamped” and semantically defined;
  – Very fine-grained structure, allowing output of the metadata to virtually any format;
  – A separated “semantic layer” allowing the use of multiple (any) controlled vocabularies (classifications, typologies) as well as their cross-linking and mapping;
  – Ability to cope with multiple languages

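The idea of relations that are both "time-stamped" and semantically defined can be illustrated with a minimal Python sketch. This is not the CERIF schema or any CERIF library: the class names (`Role`, `Link`), the scheme name `example-roles`, the role terms and the identifiers are all hypothetical, chosen only to show a role drawn from a controlled vocabulary attached to a dated link between two entities.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Role:
    """A term from a controlled vocabulary (hypothetical scheme and term names)."""
    scheme: str
    term: str


@dataclass(frozen=True)
class Link:
    """A relation between two entities, typed by a role and bounded in time."""
    source_id: str
    target_id: str
    role: Role
    start: date
    end: Optional[date] = None  # None means the relation has no end date yet

    def active_on(self, day: date) -> bool:
        """True if the relation holds on the given day (bounds inclusive)."""
        return self.start <= day and (self.end is None or day <= self.end)


def roles_on(links: Iterable[Link], source_id: str, target_id: str, day: date) -> List[str]:
    """Role terms linking source to target that are active on the given day."""
    return sorted(
        link.role.term
        for link in links
        if link.source_id == source_id
        and link.target_id == target_id
        and link.active_on(day)
    )


# Illustrative data only: one person related to one dataset in two roles over time.
LINKS = [
    Link("person-1", "dataset-1", Role("example-roles", "creator"),
         date(2010, 1, 1), date(2011, 12, 31)),
    Link("person-1", "dataset-1", Role("example-roles", "contact"),
         date(2012, 1, 1)),
]

print(roles_on(LINKS, "person-1", "dataset-1", date(2011, 6, 1)))   # ['creator']
print(roles_on(LINKS, "person-1", "dataset-1", date(2013, 7, 26)))  # ['contact']
```

Because the role lives on the link rather than on either entity, the same person and dataset can be related in several ways at different times, which a 'flat' single-entity model cannot express.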
Mapping to CERIF
• 24 of 30 MEDIN elements mapped to CERIF

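A mapping result such as "24 of 30" can be derived from a crosswalk table. The Python sketch below shows one way to do that, assuming the crosswalk is held as a dictionary from source element to target element (or `None` where no mapping exists). The element names are placeholders, not real MEDIN or CERIF names, and the slide does not say which six elements were left unmapped.

```python
from typing import Dict, List, Optional, Tuple


def coverage(crosswalk: Dict[str, Optional[str]]) -> Tuple[int, int, List[str]]:
    """Return (mapped count, total count, sorted list of unmapped source elements)."""
    unmapped = sorted(src for src, tgt in crosswalk.items() if tgt is None)
    return len(crosswalk) - len(unmapped), len(crosswalk), unmapped


# Placeholder crosswalk: 30 source elements, of which the last 6 have no target.
crosswalk = {
    f"element_{i:02d}": (f"target_{i:02d}" if i <= 24 else None)
    for i in range(1, 31)
}

mapped, total, unmapped = coverage(crosswalk)
print(f"{mapped} of {total} elements mapped ({mapped / total:.0%})")
# prints: 24 of 30 elements mapped (80%)
```

Keeping the unmapped elements as an explicit list makes it easy to review them one by one when deciding whether the target model needs extending.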
DataCite version 3.0
Mandatory: Identifier, Creator, Title, Publisher, Publication Year
Recommended: Subject, Contributor, Dates relevant to work, Resource Type
Optional: Scheme URI, Title Type, Subject Scheme, Related Identifier, Relation Type, Description, GeoLocation, Language of Resource, Alternate Identifier, Related Metadata, Size, Data Format, Version, Rights, Geolocation Place
More work required?

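As a rough illustration of how the mandatory tier might be checked when exporting a dataset record, the Python sketch below tests a dictionary against the five mandatory properties named on the slide. The lower-case key names, the function name `missing_mandatory` and the sample record (including its identifier) are assumptions made for this example; they are not taken from the DataCite schema files or from any C4D output.

```python
from typing import Any, Dict, List

# The five mandatory properties listed on the slide, under hypothetical key names.
MANDATORY = ("identifier", "creator", "title", "publisher", "publication_year")


def missing_mandatory(record: Dict[str, Any]) -> List[str]:
    """Return the mandatory properties that are absent or blank in the record."""
    return [
        name
        for name in MANDATORY
        if not str(record.get(name) or "").strip()
    ]


# Illustrative record only; the identifier is made up.
record = {
    "identifier": "10.1234/example",
    "creator": "A. Researcher",
    "title": "Example dataset",
    "publisher": "",
    "publication_year": 2013,
}

print(missing_mandatory(record))  # ['publisher']
```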
CERIF 1.6 released for testing, 25th July 2013
Mapping to other schemata
• C4D vs RE3Data vs DCI vs DataCite

Key Findings
• CERIF metadata model
  – can be used to record rich metadata about datasets
  – can relate datasets to other pieces of the research landscape
  – can evolve / extend within formal euroCRIS governance structure
BUT …
• Needs testing in production environments
• Is cfResProd appropriate? Not just a research result?
• Ongoing need for agreed vocabularies
  – CASRAI
  – RCUK harmonisation

Case Study: DaMaRo
• Have used C4D as basis for checking whether DataFinder is rich and detailed enough
• Once the C4D profile has been finalised, DaMaRo will embark on implementation of C4D-compliant outputs
• Most fields map to C4D

Next Steps
• Further consultation with euroCRIS/CERIF TG in terms of best approach
• Aiming to achieve most comprehensive set of metadata (incorporating RE3Data, DataCite, etc.)
• Move new Pure model to production (after REF)
• Exporting and importing CERIF-XML from systems; exploring this with
• Aggregation of data into national data register model