Workshop on the DOI System DOI SYSTEM: DATA MODEL International DOI Foundation
Outline / Key concepts in this section
DOI Data Model and interoperability
Application profiles
Kernel metadata
Metadata declaration
Role of DOI name metadata
Origins of the DOI Data Model
Semantic interoperability
The indecs principles
Applications of indecs
The use of a data dictionary
Example: rights management

Further reading on key concepts in this section
DOI Handbook Chapter 4, DOI Data Model
Factsheet: DOI System and Data Dictionaries

DOI data model
The underlying model of how data within the DOI System relates to other data
– Therefore vital for interoperability
Whereas the Handle System component is needed for every DOI name, the DOI Data Model component has not yet been used to its full extent
– Some applications have used it behind the scenes
Interoperability becomes more important as an economic feature when there are multiple services or multiple uses, which there will be eventually
– Don't design only for today

Application Profile (AP) Framework
Entities are identified by DOI names
The properties of groups of DOI names are defined as APs
APs have one or more Services
Services have definitions
New APs and services may be created or made available
One change to an AP affects all DOI names within that AP
[Figure: DOI names grouped into Application Profiles; each Application Profile links to Service Instances; each Service Instance has a Service Definition]

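The AP framework above can be sketched as a small data structure. This is an illustrative sketch only, not the DOI System's implementation: the class names follow the slide labels, while the field names, the `services_for` helper, the `10.5555/...` DOI names and the `example.org` endpoints are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceDefinition:
    """What a service does (fields are hypothetical)."""
    name: str
    description: str


@dataclass
class ServiceInstance:
    """A concrete service offered under a definition."""
    definition: ServiceDefinition
    endpoint: str  # hypothetical location of the service


@dataclass
class ApplicationProfile:
    """Groups DOI names that share properties and services."""
    name: str
    services: list[ServiceInstance] = field(default_factory=list)
    doi_names: set[str] = field(default_factory=set)


def services_for(doi_name: str, profiles: list[ApplicationProfile]) -> list[str]:
    """Collect the service names a DOI name gets through every AP it belongs to."""
    return sorted(
        {s.definition.name
         for ap in profiles if doi_name in ap.doi_names
         for s in ap.services}
    )


landing = ServiceDefinition("landing-page", "Redirect to a web page")
books = ApplicationProfile(
    "books",
    [ServiceInstance(landing, "https://example.org/landing")],
    {"10.5555/book-1", "10.5555/book-2"},  # hypothetical DOI names
)
profiles = [books]
before = services_for("10.5555/book-1", profiles)

# One change to the AP: add a service once, at profile level.
metadata = ServiceDefinition("metadata", "Return kernel metadata")
books.services.append(ServiceInstance(metadata, "https://example.org/metadata"))
after = {doi: services_for(doi, profiles) for doi in sorted(books.doi_names)}
```

Because services hang off the profile rather than off each DOI name, the single `append` is visible to every DOI name in the profile, which is the "one change to an AP" property the slide describes.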
DOI Application Profile
A DOI Application Profile is a view of DOI names: a mechanism for unity in diversity (what do all these DOI names have in common?).
Based on any interest group's view of a type of creation (a DOI Name User Community).
Functional granularity: create a grouping when you need it.
DOI-APs can overlap: things can be in multiple DOI-APs.
Each DOI-AP has a metadata kernel, a Registration Agency, and a Governance/Development Group.
Zero Set = initial implementation DOI names (just a single URL redirection; zero additional metadata).

[Figure: implementation levels. Initial implementation (Zero App Profile): single redirection (persistent identifier). Full implementation (Defined App Profiles): metadata, multiple resolution, activity tracking. Also labelled: W3C, WIPO, NISO, ISO, IETF, etc.]

DOI Kernel
Each DOI-AP starts from a basic kernel (8 elements) and may add whatever else it needs, as defined by the DOI Name User Community.
A DOI name metadata vocabulary is being developed in tandem with ONIX etc.
– Can/should coincide with or provide sector requirements
The metadata of different DOI-APs will interoperate if their vocabularies are developed within an indecs-based model.

DOI Kernel
Contains critical minimum metadata for basic recognition (but not complete disambiguation).
Standard base vocabulary
A DOI-AP entity (e.g. book) must be analysable in terms of other attributes (e.g. media, mode, content, subject).
Example:
DOI name: /ISBN
resourceIdentifier: ISBN
resourceName: Two for the dough
principalAgent, role: Janet Evanovich, author
structuralType: Physical fixation
character: Text
mode: Visual
referentType: Book

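A kernel record like the example above could be checked for completeness before it is declared. The following is a minimal sketch, not an official schema: the element names are taken from the example slide, while the key spelling `doiName`, the split of "principalAgent, role" into two keys, the helper name and the placeholder identifier values are assumptions (the slide does not show the actual DOI or ISBN values).

```python
# Element names as they appear on the example slide; the exact official
# kernel element list is not given in this section, so treat this as illustrative.
KERNEL_ELEMENTS = (
    "doiName",
    "resourceIdentifier",
    "resourceName",
    "principalAgent",
    "agentRole",
    "structuralType",
    "character",
    "mode",
    "referentType",
)


def missing_kernel_elements(record: dict[str, str]) -> list[str]:
    """Return kernel elements that are absent or blank in a record."""
    return [e for e in KERNEL_ELEMENTS if not record.get(e, "").strip()]


record = {
    "doiName": "10.5555/example-book",         # hypothetical placeholder
    "resourceIdentifier": "ISBN:PLACEHOLDER",  # real value not shown on the slide
    "resourceName": "Two for the dough",
    "principalAgent": "Janet Evanovich",
    "agentRole": "author",
    "structuralType": "Physical fixation",
    "character": "Text",
    "mode": "Visual",
    "referentType": "Book",
}

complete = missing_kernel_elements(record)
incomplete = missing_kernel_elements(
    {k: v for k, v in record.items() if k != "mode"}
)
```

A check of this kind only confirms that each element is present; it says nothing about whether the values come from a controlled vocabulary, which is the job of the data dictionary.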
DOI Kernel as the basis of each application profile
Each Profile can be thought of as built from the kernel + extensions:
[Figure: DOI AP = compulsory kernel + metadata for application]

DOI Kernel as the basis of each application profile
Each DOI-AP can be thought of as built from the kernel + extensions… But the kernel is actually what several APs have in common (compare the different views of a person):
Son; Legal person; Agent; Alien; Scholar; Library user; Composer; Credit card holder; Shoe purchaser; Author; Lottery entrant; Hospital patient; Citizen; Car driver; Rights owner; Marathon runner; Software licensee; Parent; Tax payer; Club member; e-consumer; Bank account holder; Husband; Charity giver; Hotel guest; Speeding ticket recipient; Disney World visitor; Frequent Flyer; Concert-goer; Passenger; Employee; Voter; Dog owner

DOI Kernel as the basis of each application profile
This kernel cannot be logically defined from first principles
In the absence of existing Application Profiles to define this overlap (= kernel), we have made a reasonable estimate from the logical analysis of

[Figure: DOI AP 1, DOI AP 2 and DOI AP 3 each hold metadata for that AP and overlap in the kernel for any DOI name]
DOI-APs: all metadata in well-formed structure

Metadata declarations
WHAT: Base kernel metadata must be declared. DOI-AP-specific metadata is a matter for the DOI Name User Community (Governance Group/Registration Agency) to decide.
HOW: Either local webpage or central repository or both (as decided by User Community rules). Automated access to metadata declaration via Handle data types? XML schemas.

Roles of declared metadata = Functional specification of the DOI kernel
(a) to assign a unique DOI name to the creation [DOI]
(b) to link the DOI name to the principal local identifier of a creation (if any), to enable the integration of DOI name-related applications and metadata with others [resourceIdentifier]
(c) to enable a searcher or application to identify the creation by its most common name and the party or parties responsible for its creation or publication [resourceName, principalAgent, agentRole]

Roles of declared metadata (continued)
(d) to enable a searcher or application to distinguish the fundamental type of creation (abstract, physical, digital or spatio-temporal), and thereby also to distinguish between creations of different types with the same names and creators [structuralType]
(e) to enable a searcher or application to distinguish the mode of the creation (visual, audio, etc.) [mode, character]
(f) to enable a searcher or application to determine to which DOI name user/application set the creation belongs [DOI-AP]

Handles and metadata: a possible development Handle data types could create a way of processing metadata as a distributed database of services: e.g. Data types (and results) must be consistent, so the DOI name Handle data type vocabulary must be developed with great care within indecs-based model. Some data types could be application specific. etc.
Origins of DOI data model
The underlying model of how data within the DOI System relates to other data
Two components
– Data Dictionary + DOI Application Profile Framework
Based on the indecs analysis
– Provides tool for precise description of entity through metadata (and mapping to other schemes).
Met the needs of DOI System development aim: do not reinvent the wheel
DOI System and indecs development were in parallel
DOI Application Profile framework
– Provides means of relating entities: grouping entities and expressing relationships
– A mechanism for grouping DOI names with similar properties

Definitions of metadata
popular... "Metadata is data about data." (Everyone)
logical... "An item of metadata is a relationship that someone claims exists between two entities*." (indecs framework)
functional... "Metadata is the life-blood of e-commerce."
*entity = something which has identity

Three conclusions
#1: All metadata is just a view
e.g. Views of a person: some (generic) ways in which you might be identified in metadata schemes...
Son; Legal person; Agent; Alien; Scholar; Library user; Composer; Credit card holder; Shoe purchaser; Author; Lottery entrant; Hospital patient; Citizen; Car driver; Rights owner; Marathon runner; Software licensee; Parent; Tax payer; Club member; e-consumer; Bank account holder; Husband; Charity giver; Hotel guest; Speeding ticket recipient; Disney World visitor; Frequent Flyer; Concert-goer; Passenger; Employee; Voter; Dog owner
In each of these roles you will have different IDs and attributes.

Three conclusions
#1: All metadata is just a view
Creations are the same. An identifier for a "published article" may refer to...
– A manuscript
– The abstract work
– A draft
– A (class of) physical copy in a publication
– A (class of) digital copy (not in a publication)
– A (class of) digital copy in a publication
– A (class of) digital format
– A specific digital copy
– A (class of) paper copy
– A specific paper copy
– An edition
– A reprint
– A translation
– etc… and many combinations of the above
Similar views apply to other types of creations.

Three conclusions
#1: All metadata is just a view
Views must not be confused for digital content and rights management. Mistaken identity can be catastrophic.
Increasingly, views need to be interoperable, e.g. production workflow, rights, marketing within one business; supply chain transfer; etc.
The need for automated, interoperable views in digital commerce will be enormous.

Three conclusions
#2: (Almost) all terms need identifiers
Each of the values of a view must be defined and identified if other views are to recognize them (what do you mean by an abstract work? an edition? a format? a scholar? a name?)
So views need comprehensive controlled vocabularies (note our reliance on ISO language, territory, currency, time codes).
Automation needs disambiguation. Terms of rights must be unambiguous. Anything may be a term of an agreement.
Emergence of the value of structured ontologies for commerce (like the indecs model).

Three conclusions
#3: Events are the key to interoperability
Most metadata is thing- or people-based: static views, e.g. "a creation"
In the net future, metadata interoperability will be achieved by describing events, relating things and people: dynamic views, e.g. "A created B"
Event descriptions will also be the key to rights metadata (transactions are events)

Meaning
Assigning metadata to a referent, to enable semantic interoperability
– Say what the referent is
– Resolution of an identifier may give the referent; or only metadata; or a manifestation
Semantic:
– Do two identifiers from different schemes actually denote the same referent?
– If A says "owner" and B says "owner", are they referring to the same thing?
– If A says "released" and B says "disseminated", do they mean different things?
Interoperability: the ability for identifiers to be used in services outside the direct control of the issuing assigner
– Identifiers assigned in one context may be encountered, and may be re-used, in another place or time, without consulting the assigner. You can't assume that your assumptions made on assignment will be known to someone else.
Persistence = interoperability with the future

A pointer is not enough
Precisely what is being named?
– Suppose I have here a pdf version of Defoe's Robinson Crusoe issued by Norton. I find an identifier. Is it of:
– All works by Daniel Defoe?
– The work Robinson Crusoe?
– The Norton edition of Robinson Crusoe?
– The pdf version of the Norton edition of…?
– The pdf version of… held on this server…?
Most digital objects of interest have compound form, simultaneously embodying several referents.
Multiple identifiers may be necessary (compare music CDs)

[Figure: two ways to relate metadata schemes such as ONIX and LOM]
Agreed term-by-term mapping or crosswalk: pairs of metadata schemes (e.g. ONIX and LOM) are mapped directly to each other.
Central dictionary: each metadata scheme (e.g. ONIX, LOM, NormanRights) maps its terms to the dictionary. The ONIX term Author and the NormanRights term Writer map to the same entry, so ONIX:Author = NormanRights:Writer.

Metadata interoperability: semantic problems
Mappings are not simple:
– Different names (and languages) for the same thing (Author vs Writer)
– Same name for different things (title, Title)
– Data elements at different levels of speciality (title vs FullTitle, AlternativeTitle)
– Different allowed values for elements (pii vs not pii)
– Data at different levels of granularity (journal_article vs SerialArticleWork/SerialArticleVersion)
– Data in different structures (article as attribute of journal or vice versa)
– Data from different sources (local codes vs ONIX codes)
– Different contextual meaning (DOI name of what…?)
– Different representation (1 title vs n titles)
– Different mandatory requirements (ISSN mandatory vs optional)
– Schemas are being updated all the time
– etc.
To manage all of this requires a coherent structured approach.

DRM
Semantic layer: Rights metadata; Data Dictionary (dictionary = a common base semantic layer)
Communication layer: Rights Expression Language (XrML, XCML, ODRL, etc)
Application layer: Technology platform (DRM systems, Semantic Web)

Semantic = meaning
Does A mean the same as B?
– In practice: does A need a different identifier from B?
– Versions; works and manifestations; editions
For a machine, "A means the same as B" = "A has the same attributes as B"
– Which attributes? The answer is entirely contextual:
– Do A and B belong to the same class for the purposes of …
– The class is defined by a set of attributes (metadata) (RDF, etc)
We group similar things together; what is identified is usually a class
– e.g. the class of all copies of the hardback printed second edition of this book from this publisher = the same ISBN
Ultimately, no one thing is the same as another thing (or they wouldn't be two things)
– "Roughly speaking, to say of two things that they are identical is nonsense, and to say of one thing that it is identical with itself is to say nothing at all."
– Leibniz's Law (no two objects have exactly the same properties)
– A class contains similar things
Automation = logic

Distinguishing different entities
We can always add another attribute to make two like things unlike:
– the class of all copies of the hardback printed second edition of this book from this publisher = the same ISBN;
– the class of all copies of the hardback printed second edition of this book from this publisher with the luxury leather binding = different ISBN
[Figure: items A and B, each described as A4, Blue, 75gm, 320p; labelled SAME, and NOT SAME once a further attribute (L / R) is considered]

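The point that "same" is always relative to a chosen set of attributes can be made concrete with a short sketch. The attribute names and values below are invented for illustration; the only idea taken from the slide is that adding one attribute (here the binding material) splits a class that was previously treated as one.

```python
def same_class(a: dict, b: dict, attributes: list[str]) -> bool:
    """True if a and b agree on every attribute in the chosen view.

    An attribute missing from both records counts as agreement.
    """
    return all(a.get(k) == b.get(k) for k in attributes)


# Hypothetical descriptions of two physical copies.
copy_a = {"work": "this book", "edition": 2, "format": "hardback", "cover": "cloth"}
copy_b = {"work": "this book", "edition": 2, "format": "hardback", "cover": "luxury leather"}

coarse_view = ["work", "edition", "format"]  # one class, so one identifier
fine_view = coarse_view + ["cover"]          # two classes, so two identifiers

same_under_coarse = same_class(copy_a, copy_b, coarse_view)
same_under_fine = same_class(copy_a, copy_b, fine_view)
```

Neither view is the correct one in general; which attributes go into the list is a decision about purpose, which is why no set of metadata elements is definitive for all purposes.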
Distinguishing different entities
Consequence: No set of metadata elements is definitive for all purposes
Practical consideration of purpose = some defined set of attributes:
e.g. Bridgeman v Corel (1999): Bridgeman images were held not copyrightable under UK law, as they were substantially exact reproductions of public domain works albeit in a different medium, nor under US law, which requires a distinguishable variation between two distinct copyright items.

2001: Ontologies and Semantic Web
Ontologies
"Of course, this is not the end of the story, because two databases may use different identifiers for what is in fact the same concept, such as zip code. A program that wants to compare or combine information across the two databases has to know that these two terms are being used to mean the same thing. Ideally, the program must have a way to discover such common meanings for whatever databases it encounters. A solution to this problem is provided by the third basic component of the Semantic Web, collections of information called ontologies."

Ontology approach: deeper view of metadata
The key to defining what is identified logically
– enabling people to use their existing metadata
– Ontologies can deliver data dictionaries suitable for mapping
Fundamental, generic, extensible methods can be used to construct interoperable ontologies, by putting metadata into context:
– attribute: entity, attribute
– relationship: entity, relationship, entity
– context: agent, resource, time and place, linked through a context

Ontology approach: attribute
3 levels of attribution: attribute, relationship, context
1. attribute view: simplest, most direct (entity, attribute)
book
  isbn
  title: Words & Rules
  author: Steven Pinker
  publisher: Weidenfeld
  dateOfPublication: 1999
  placeOfPublication: UK
(values may be strings, IDs etc)

Ontology approach: relationship
3 levels of attribution: attribute, relationship, context
2. association view: richer, more indirect (entity, relationship, entity)
book hasTitle Words & Rules
book hasAuthor Steven Pinker
book hasPublisher Weidenfeld
book hasDateOfPublication 1999
book hasPlaceOfPublication UK
– allows multiple occurrences
– allows ranges of target values
– treats attributes as entities

Ontology approach: context
3 levels of attribution: attribute, relationship, context
3. context view: richest, most indirect (agent, resource, time, place, linked through a context)
publishingEvent hasAgentType publisher Weidenfeld
publishingEvent hasResourceType book
publishingEvent hasTimeType dateOfPublication 1999
publishingEvent hasPlaceType placeOfPublication UK
– most efficient handling of complex metadata

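The three levels of attribution can be read as three encodings of the same facts. The sketch below is one possible reading, not the indecs specification: the helper names and the dictionary layout are assumptions, and the 1999 publication date is the one given in the attribute view.

```python
# 1. Attribute view: a flat record attached to the entity.
book_attributes = {
    "title": "Words & Rules",
    "author": "Steven Pinker",
    "publisher": "Weidenfeld",
    "dateOfPublication": "1999",
    "placeOfPublication": "UK",
}


def to_relationships(subject: str, attributes: dict[str, str]) -> list[tuple[str, str, str]]:
    """2. Relationship view: one (entity, relationship, entity) triple per attribute."""
    return [
        (subject, "has" + key[0].upper() + key[1:], value)
        for key, value in attributes.items()
    ]


def to_publishing_event(attributes: dict[str, str]) -> dict[str, object]:
    """3. Context view: agent, resource, time and place hang off one event."""
    return {
        "context": "publishingEvent",
        "hasAgentType": ("publisher", attributes["publisher"]),
        "hasResourceType": ("book", attributes["title"]),
        "hasTimeType": ("dateOfPublication", attributes["dateOfPublication"]),
        "hasPlaceType": ("placeOfPublication", attributes["placeOfPublication"]),
    }


triples = to_relationships("book", book_attributes)
event = to_publishing_event(book_attributes)
```

Note that the author does not appear in the publishing event: in a context model the author would belong to a separate creating event, which is part of what makes the context view richer and more indirect.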
The framework
Interoperability of Data in E-Commerce Systems project (= MPEG21 Rights Data Dictionary)
Focus on multimedia rights metadata: recognized that rights and descriptive metadata were inseparable.
Produced an event-based reference model/framework (parties, resources, agreements)
indecs: 50% EC funding + consortium members including:
– EDItEUR (international book industry standards/ONIX)
– IFPI (international record industry)
– MPAA (international film industry)
– Various copyright societies and associations
– Various technology providers
– Library and author representatives
– International DOI Foundation
Metadata in networks needs to support interoperability across
– media (e.g. books, serials, audiovisual, software, abstract works)
– functions (e.g. cataloguing, discovery, workflow, rights management)
– levels of metadata (from simple to complex)
– semantic barriers
– linguistic barriers

The framework
Principles:
– Unique Identification: every entity should be uniquely identified within an identified namespace.
– Functional Granularity: it should be possible to identify an entity whenever it needs to be distinguished [1st class].
– Designated Authority: the author of an item of metadata should be securely identified.
– Appropriate Access: everyone requires access to the metadata on which they depend, and privacy and confidentiality for their own metadata from those who are not dependent on it.
Definition of metadata: An item of metadata is a relationship that someone claims to exist between two referents (description).
Delivered:
– Generic data model of e-commerce
– Applicable to all types of intellectual property
– Specifications for supporting services
– Standardisation proposals
– Documentation at
Led to:
– Contextual ontology architecture: contexts, roles, identities

Contextual ontology metamodel overview
[Figure 1: COA MetaModel Overview]
EntityTypes: Context, Agent, Resource, Time, Place. An Entity may have typed relationships with Entities of any kind (including those of its own kind).
AttributeTypes: An Entity may have Attributes of any kind. (Attributes, which are a type of Resource, may have their own Attributes.)
Attributes shown: Descriptor, Name, Identifier, Annotation, Category, Flag, Quantity (illustrative: any Entity or Attribute may have Attributes of any type)
Contextual Relationships: Role
Non-Contextual Relationships: Relator (illustrative: any Type of Entity may relate to any other)
Every Relationship has a Relator Verb

: Defining what is identified
Many individual metadata schemes for specific sectors, applications, etc.; these vary from simple to complex data models
1995+: Dublin Core: need for standardisation on WWW
– 15 (+) elements for output for simple resource description
– Now ISO
Ontology-based activities:
– 1995+: Common Information System CIS (CISAC): rights, music
– 1998: Functional Requirements for Bibliographic Records, FRBR (IFLA): library cataloguing
– : Interoperability of Data in E-Commerce Systems, indecs (multiple partners): generic intellectual property. For "e-commerce" read "automation". Influenced by CIS and FRBR
– 2000: ABC/Harmony: generic events-aware model
– Should enable re-use of existing metadata

e.g. Terms of a Licence as a group of Events
Licensing Event
– Permits (MAY): UseEvent, 1-n
– Prohibits (MUST NOT): UseEvent, 0-n
– Requires (MUST): Payment, Reporting Event etc, 0-n
– Has Exception
– Has Precondition
Event = time, place, entities
This structure allows for whatever level of flexibility or granularity may be required now or in the future.

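The licence structure above, with its cardinalities, can be sketched as a pair of small classes. This is a simplified illustration under stated assumptions: the field names, the `decide` method and the rule that a prohibition overrides a permission are choices made for the example, and exceptions and preconditions are left out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UseEvent:
    """A use described by what is done, to what, and where (hypothetical fields)."""
    verb: str
    resource: str
    place: str


@dataclass
class LicensingEvent:
    permits: list[UseEvent]                                  # MAY, 1-n
    prohibits: list[UseEvent] = field(default_factory=list)  # MUST NOT, 0-n
    requires: list[str] = field(default_factory=list)        # MUST, 0-n

    def __post_init__(self) -> None:
        # Enforce the 1-n cardinality on permitted uses.
        if not self.permits:
            raise ValueError("a licence must permit at least one use (1-n)")

    def decide(self, use: UseEvent) -> str:
        """Classify a use. Assumption: a prohibition overrides a permission."""
        if use in self.prohibits:
            return "prohibited"
        if use in self.permits:
            return "permitted"
        return "not covered"


licence = LicensingEvent(
    permits=[UseEvent("print", "article", "UK"), UseEvent("copy", "article", "UK")],
    prohibits=[UseEvent("copy", "article", "UK")],
    requires=["payment", "reporting"],
)
print_in_uk = licence.decide(UseEvent("print", "article", "UK"))
copy_in_uk = licence.decide(UseEvent("copy", "article", "UK"))
print_in_us = licence.decide(UseEvent("print", "article", "US"))
```

Treating each term as an event with its own time, place and entities is what gives the structure its granularity: a new restriction is another event in a list, not a change to the schema.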
Contextual Ontology usage examples
– ISO MPEG-21 Rights Data Dictionary
– DDEX: Digital Data EXchange (music industry)
– ONIX: Book industry (+) messaging schemas
– ONIX: Rights: ONIX for Licensing Terms, Repertoire and Distribution
– Digital Library Federation: communication of licence terms (ERMI: working with ONIX for licensing terms)
– DOI Data Dictionary
– Rightscom's OntologyX: licensee of early output, plus their own later work
– RDA (Resource Description and Access); next generation of AACR/MARC cataloguing: RDA/ONIX common framework
– ACAP: Automated Content Access Protocol
Consistent with FRBR, ABC-Harmony, OWL, CIDOC CRM, etc

: Defining what is identified
Development of indecs
[Figure: development timeline; in the original, black = what, red = who]
– indecs (2000): indecs Framework Ltd
– CONTECS (2001+): IFPI/RIAA, MPA, IDF, DentsuMMG, Rightscom
– 2004: ISO MPEG21 RDD
– OntologyX: RightsCom (Mi3p etc)
– indecsDD: IDF + ONIX

DOI names to express relationships
DOI name of one item may be related to DOI name of another
– Through multiple resolution, metadata, Application Profiles…
Example: A DOI name of a work could resolve to several available formats, languages, etc.
[Figure: Article DOI Name linked to Chinese version DOI Name 56789]

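Relationships between DOI names can be pictured as a lookup from one DOI name to typed links. The sketch below uses an in-memory table and does not call the Handle System or any DOI resolver; the `10.5555/...` DOI names, the relation names and the `related` function are all hypothetical.

```python
# Hypothetical relationship table: one work, several related DOI names.
RELATED = {
    "10.5555/article": [
        {"relation": "hasTranslation", "language": "zh", "doi": "10.5555/article.zh"},
        {"relation": "hasFormat", "format": "pdf", "doi": "10.5555/article.pdf"},
    ],
}


def related(doi: str, **wanted: str) -> list[str]:
    """Return DOI names related to `doi` whose properties match every wanted value."""
    return [
        link["doi"]
        for link in RELATED.get(doi, [])
        if all(link.get(key) == value for key, value in wanted.items())
    ]


chinese_version = related("10.5555/article", language="zh")
everything = related("10.5555/article")
nothing = related("10.5555/unregistered")
```

The design point is that the link carries metadata (relation type, language, format), so a service can choose among several targets instead of following a single redirection.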
Rights: an example of DOI System potential
DRM: Technical Protection Measures which use RMI
But: simple management WITHOUT technical protection also needs RMI
What is being managed for any rights purpose has to be identified
We need to accommodate existing and new identifier schemes
A consistent approach to all kinds of inter-related entities is necessary:
[Figure: People make Stuff, People use Stuff, People do Deals about Stuff; People = identity management, Stuff = content management, Deals = license management]

Describing rights using data
Primary rights events (claims, deals) are described using pieces of data from all these domains:
Rights Statement (claim): [party] owns [right] in [creation] in [time] and [place]
Rights Agreement (deal): [party] agreed with [party] in [time] and [place] that [event]
Pieces of "rights metadata" used in each rights statement are things which need to be identified
Creations typically have standard identifiers, which may have associated structured data, or which may act as keys to get this data
Other pieces of data also need standard identifiers (time, party..)

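The rights statement template above can be treated as a record whose slots should each hold an identifier. This is an illustrative sketch only: the `scheme:value` convention, the scheme prefixes and the example values are assumptions, and the check is deliberately crude (it only looks for a colon).

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RightsStatement:
    """[party] owns [right] in [creation] in [time] and [place]."""
    party: str
    right: str
    creation: str
    time: str
    place: str

    def render(self) -> str:
        """Fill the slide's template with this statement's values."""
        return (f"{self.party} owns {self.right} in {self.creation} "
                f"in {self.time} and {self.place}")

    def unidentified_slots(self) -> list[str]:
        """Slots whose value is not in the assumed 'scheme:value' identifier form."""
        return [f.name for f in fields(self) if ":" not in getattr(self, f.name)]


# Hypothetical values; the place is given as free text on purpose.
claim = RightsStatement(
    party="party:example-publisher",
    right="right:reproduction",
    creation="doi:10.5555/book-1",
    time="year:2006",
    place="United Kingdom",
)
sentence = claim.render()
needs_identifier = claim.unidentified_slots()
```

Here the creation already has a standard identifier, while the free-text place is flagged: this mirrors the slide's point that the other pieces of data (time, party, place) need standard identifiers too.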
Describing rights using data
Secondary rights events (licences) are also described using pieces of data:
Permission: [party] can [verb] [amount] to [creation] at [time] in [place].
Prohibition: [party] can't [verb] to [creation] at [time] in [place].
Requirement: [party] must [verb] [amount] to [creation/party] at [time] in [place].
Rights Transfer: [party] can [grant right] to [party] in [creation] at [time] in [place].
Pieces of "rights metadata" are used in each rights declaration.

What are these pieces of "rights metadata"?
A mix of data from many sources:
1 Rights events: Statements, agreements, transfers, permissions, prohibitions, requirements, assertions, approvals…
2 Descriptive metadata: Creations, creation types, contributor roles, user roles, tools, classifications, measures…
3 Legal terms: Rights, persons, companies, intellectual property, jurisdictions…
4 Financial metadata: Terms, currencies, conventions…
These sets of "rights metadata" are standardized and maintained in different places.

Distributed rights management
This mix of data from many sources is used in many different places by different people in chains of rights events:
agreement, transfer, statement, agreement, permission, prohibition, permission, assertion, agreement, requirement, etc
e.g. [party] can [verb] [amount] to [creation] at [time] in [place].
A compound entity can be expanded to reveal more data

Distributed rights management
agreement, transfer, statement, agreement, permission, prohibition, permission, assertion, agreement, requirement, etc
Each of these is an information object:
– which needs to be identified (and may be a compound object);
– which may need to link to or use information objects in other databases;
– which should be interoperable

Summary
DOI Data Model and interoperability
Application profiles
Kernel metadata
Metadata declaration
Role of DOI name metadata
Origins of the DOI Data Model
Semantic interoperability
The indecs principles
Applications of indecs
The use of a data dictionary
Example: rights management