Contextual Metadata Jan Dvorak CERIF Task Group euroCRIS Charles University in Prague, CZ InfoScience Praha, CZ The 2013 euroCRIS Seminar :: September 9-10, 2013 in Brussels, Belgium
Research Metadata Discovery metadata for information to be found Serve many specific use-cases, scenarios, niches Many standards Tens of major ones Hundreds of domain-specific standards …… Thousands at experiment level
The Purpose of Metadata Enable the re-use of resources Knowledge stored in publications Data in datasets Functionality in software Participation in events Infrastructure Facilities Equipment Services
Common Grounds Organisations Universities, Research institutes, Hi-tech companies Funding bodies & organisations Publishers Facility operators People Researchers Management
One Domain Research
Consistency Several possible views of the same objects Inconsistencies would be unprofessional (at the very least)
Common Metadata Format? To drive all the discovery metadata views A lingua franca for research
Requirements Complete coverage of research information Interlinked: the context Allow for many perspectives on the research information Accommodate multilinguality: support translations Accept the world keeps changing: record history Declared semantics: definitions rather than terms Formal syntax – machine processable & understandable
… the answer CERIF Common European Research Information Format Common Exchange Research Information Format
CERIF: a concise history CERIF91 – flat file; CERIF 2000 – database structured; CERIF 2006 – semantics moved into Semantic Layer, XML exchange format; CERIF 1.5 (2012) – federated identifiers, XML exchange format polished; CERIF 1.6 (2013) – datasets supported
CERIF: Complete Coverage cfExpertiseAndSkills cfEquipment cfFunding cfFacility cfService cfCitation cfEvent cfLanguage cfCurrency cfCountry cfCurriculumVitae cfPrize cfQualification cfGeographicBoundingBox cfPostalAddress cfElectronicAddress cfPerson cfProject cfOrganisationUnit cfResultPatent cfResultPublication cfResultProduct cfIndicator cfMeasurement cfFederatedIdentifier
CERIF: Many Perspectives Start from any entity: Project – funding, consortium, project team, outputs Publication – authors, publisher, funding Research dataset – creator/contributor, origin project, publications that build upon it Person – outputs, datasets, projects, events, … …… A mesh, a fully connected graph
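The "start from any entity" idea can be sketched as a small link index that is queried from either end. This is only an illustration, not CERIF's actual storage model: the identifiers, role names and the helpers `build_index` and `perspective` are hypothetical.

```python
from collections import defaultdict

# (source, role, target) triples. All identifiers and roles are illustrative only.
LINKS = [
    ("project:P1", "funded-by", "funding:F1"),
    ("person:A", "member-of-team", "project:P1"),
    ("publication:X", "output-of", "project:P1"),
    ("person:A", "author-of", "publication:X"),
    ("dataset:D", "originates-from", "project:P1"),
    ("publication:X", "builds-upon", "dataset:D"),
]


def build_index(links):
    """Index every link under both of its ends so no entity is privileged."""
    index = defaultdict(list)
    for source, role, target in links:
        index[source].append((role, target))
        index[target].append((role, source))
    return index


def perspective(index, entity):
    """Return the set of entities directly linked to `entity`."""
    return {other for _, other in index.get(entity, [])}


INDEX = build_index(LINKS)
project_view = perspective(INDEX, "project:P1")
dataset_view = perspective(INDEX, "dataset:D")
```

The same index answers the project view and the dataset view; which one is "the" view depends only on where the query starts.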
CERIF: Multilinguality Any free-text attribute is treated as possibly multi-valued, each value qualified with a language code and a translation mode (original value, human translation, or machine translation)
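A minimal sketch of such a multi-valued text attribute, assuming a simple in-memory representation. The class names, the one-letter mode codes, the sample titles and the selection rule in `pick` are assumptions made for illustration, not part of the slides.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class TranslationMode(Enum):
    # The one-letter codes are an assumption for this sketch.
    ORIGINAL = "o"
    HUMAN = "h"
    MACHINE = "m"


@dataclass(frozen=True)
class LocalizedText:
    value: str
    lang: str               # language code, e.g. "en"
    mode: TranslationMode


# Assumed preference: original first, then human, then machine translation.
_RANK = {TranslationMode.ORIGINAL: 0, TranslationMode.HUMAN: 1, TranslationMode.MACHINE: 2}


def pick(values: List[LocalizedText], preferred_lang: str) -> Optional[LocalizedText]:
    """Pick the best value in the preferred language, else fall back to an original."""
    matches = [v for v in values if v.lang == preferred_lang]
    if matches:
        return min(matches, key=lambda v: _RANK[v.mode])
    originals = [v for v in values if v.mode is TranslationMode.ORIGINAL]
    return originals[0] if originals else None


# Illustrative values only.
titles = [
    LocalizedText("Kontextová metadata", "cs", TranslationMode.ORIGINAL),
    LocalizedText("Contextual Metadata", "en", TranslationMode.HUMAN),
    LocalizedText("Context metadata", "en", TranslationMode.MACHINE),
]
```

Keeping the translation mode next to each value lets a consumer decide how much to trust it, for example preferring a human translation over a machine one.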
CERIF: Interlinking (Almost) any entity connected to any other entity Most entities connected to themselves “is-part-of / has part” “builds upon / is used by”
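Links from an entity type to itself, such as "is-part-of", form hierarchies that can be walked recursively. The following sketch assumes a plain dictionary of direct part-of links between organisation units. The identifiers and the helper `ancestors` are hypothetical.

```python
from typing import Dict, List

# Direct "is-part-of" links between organisation units (illustrative identifiers).
PART_OF: Dict[str, List[str]] = {
    "dept:Informatics": ["faculty:Science"],
    "faculty:Science": ["university:U"],
}


def ancestors(part_of: Dict[str, List[str]], unit: str) -> List[str]:
    """List every unit that `unit` is directly or indirectly part of."""
    seen: List[str] = []
    stack = [unit]
    while stack:
        current = stack.pop()
        for parent in part_of.get(current, []):
            if parent not in seen:      # also guards against cycles in the data
                seen.append(parent)
                stack.append(parent)
    return seen


chain = ancestors(PART_OF, "dept:Informatics")
```

The inverse reading ("has part") needs no separate data: it is the same link followed in the other direction.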
CERIF: Record History Every relationship records the time interval in which it is/was/will be true. Open ends are represented by an effective ±∞. When something changes: the old relationship is not removed, only its end date is set; a new relationship is inserted, starting now. Historic data accumulates.
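The update rule described here (close the old relationship, insert a new one, delete nothing) can be sketched as follows. The `Link` class, its field names, the half-open interval convention and the use of `datetime.min` / `datetime.max` as the effective ±∞ are assumptions of this sketch.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import List

OPEN_START = datetime.min   # stands in for an effective minus infinity
OPEN_END = datetime.max     # stands in for an effective plus infinity


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    role: str
    start: datetime = OPEN_START
    end: datetime = OPEN_END

    def valid_at(self, when: datetime) -> bool:
        # Half-open interval [start, end) is a choice made for this sketch.
        return self.start <= when < self.end


def change(links: List[Link], old: Link, new_target: str, now: datetime) -> List[Link]:
    """Close `old` at `now` and append its successor; no link is ever removed."""
    closed = replace(old, end=now)
    successor = Link(old.source, new_target, old.role, start=now)
    return [closed if link == old else link for link in links] + [successor]


# Illustrative data: a person moves from one organisation to another.
history = [Link("person:A", "org:A", "employee", start=datetime(2005, 1, 1))]
updated = change(history, history[0], "org:B", datetime(2013, 9, 9))
```

Because the closed link stays in the list, a query for any past date still returns the affiliation that held at that time.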
CERIF: Declared Semantics Terms can be misleading Senior researcher vs. Research associate It’s the real meaning that matters Definition Description Examples
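One way to picture "definitions rather than terms" is a vocabulary in which links refer to a class identifier and the term is only a label. The sketch below is hypothetical throughout: the `RoleClass` fields, the identifiers, the scheme names and the placeholder definitions are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class RoleClass:
    class_id: str                   # stable identifier that links refer to
    scheme: str                     # vocabulary the class belongs to
    term: str                       # human-readable label, may collide across schemes
    definition: str                 # the declared meaning
    examples: Tuple[str, ...] = ()


# Hypothetical vocabulary; definitions are placeholders, not real ones.
VOCABULARY: Dict[str, RoleClass] = {c.class_id: c for c in [
    RoleClass("class-001", "institution-A-roles", "Research associate",
              "Placeholder definition used by institution A."),
    RoleClass("class-002", "institution-B-roles", "Research associate",
              "Placeholder definition used by institution B."),
]}


def same_meaning(class_id_a: str, class_id_b: str) -> bool:
    """Two roles mean the same only if they resolve to the same declared class."""
    return VOCABULARY[class_id_a] is VOCABULARY[class_id_b]
```

Here the two classes share a term but not a meaning, so comparing terms would give the wrong answer while comparing class identifiers does not.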
Research Information Infrastructure Discovery metadata generated from CERIF references Detailed (meta)data