An Introduction to Data Modeling with Fedora
Thorny Staples, Fedora Commons, Inc.
Fedora Abstractions: data objects, content models, behaviors of objects, policies about objects, and relationships among objects.
A data object is one unit of content.
Figure: a data object with a persistent ID (PID), reserved datastreams (DC, RELS-EXT, AUDIT, POLICY), and custom datastreams (any type, any number).
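The structure in the figure can be sketched as a small data type. This is a minimal sketch under stated assumptions: the class and method names are hypothetical and are not Fedora's API. In a real repository Fedora manages the reserved datastreams itself, and starting every object with a DC datastream is an assumption made here for illustration.

```python
from dataclasses import dataclass, field

# Reserved datastream IDs as listed in the figure.
RESERVED_IDS = {"DC", "RELS-EXT", "AUDIT", "POLICY"}


@dataclass
class Datastream:
    ds_id: str
    mime_type: str
    content: bytes = b""


@dataclass
class DataObject:
    """Hypothetical model of one unit of content identified by a PID."""

    pid: str
    datastreams: dict[str, Datastream] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Assumption: every object starts with a DC (descriptive metadata) datastream.
        self.datastreams.setdefault("DC", Datastream("DC", "text/xml"))

    def add_custom(self, ds: Datastream) -> None:
        # Custom datastreams may have any type and any number,
        # but they must not reuse a reserved ID.
        if ds.ds_id in RESERVED_IDS:
            raise ValueError(f"{ds.ds_id} is a reserved datastream ID")
        self.datastreams[ds.ds_id] = ds

    def custom_ids(self) -> list[str]:
        return sorted(k for k in self.datastreams if k not in RESERVED_IDS)
```

For example, `DataObject("demo:1").add_custom(Datastream("TIFF", "image/tiff"))` adds one custom datastream next to the reserved DC datastream.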
Content Models: Content models create classes of data objects and are expressed as Cmodel objects. A Cmodel object defines the number and types of datastreams for objects of its class, and it binds to service objects so that data objects of that class inherit the appropriate behaviors.
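To make "defines the number and types of datastreams" concrete, here is a minimal sketch of checking a data object against its class. The `CModel` and `DatastreamSpec` names, the PIDs, and the MIME types are hypothetical; the `service_pids` field stands in for the Cmodel's binding to service objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatastreamSpec:
    ds_id: str
    mime_types: frozenset[str]  # acceptable types for this datastream


@dataclass(frozen=True)
class CModel:
    """Hypothetical class definition for a group of data objects."""

    pid: str
    required: tuple[DatastreamSpec, ...]
    service_pids: tuple[str, ...] = ()  # service objects this class binds to

    def violations(self, datastreams: dict[str, str]) -> list[str]:
        """Compare an object's {ds_id: mime_type} map against this class."""
        problems = []
        for spec in self.required:
            mime = datastreams.get(spec.ds_id)
            if mime is None:
                problems.append(f"missing datastream {spec.ds_id}")
            elif mime not in spec.mime_types:
                problems.append(f"{spec.ds_id} has type {mime}")
        return problems


image_model = CModel(
    "demo:ImageCModel",
    (
        DatastreamSpec("IMAGE", frozenset({"image/tiff", "image/jp2"})),
        DatastreamSpec("THUMB", frozenset({"image/jpeg"})),
    ),
    ("demo:ImageSDef",),
)
```

An object with a TIFF `IMAGE` and a JPEG `THUMB` conforms; extra datastreams such as DC are ignored by this check.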
Optional Object Behaviors: Data objects can have different views or transformations. These are defined as sets of abstract behaviors that different kinds of objects can subscribe to, with corresponding sets of services that specific objects can execute. The business logic is hidden behind the abstraction.
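The separation between an abstract behavior and the services behind it resembles an interface with several implementations. The sketch below shows that idea in plain Python; it is an analogy, not Fedora's mechanism, and every class and method name is hypothetical.

```python
from abc import ABC, abstractmethod


class PreviewBehavior(ABC):
    """Abstract behavior: what a caller may ask for, with no business logic."""

    @abstractmethod
    def get_preview(self, content: str) -> str: ...


class TextService(PreviewBehavior):
    # One concrete service: a preview for plain-text objects.
    def get_preview(self, content: str) -> str:
        return content[:20]


class TableService(PreviewBehavior):
    # Another concrete service: a preview for tabular (CSV) objects.
    def get_preview(self, content: str) -> str:
        header = content.splitlines()[0]
        return "columns: " + header


def disseminate(service: PreviewBehavior, content: str) -> str:
    # The caller depends only on the abstraction, never on the concrete logic.
    return service.get_preview(content)
```

Swapping `TextService` for `TableService` changes what runs without changing how the behavior is requested.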
Figure: data objects, a Cmodel object, a Service Definition object, and a Service Mechanism object, each with a persistent ID (PID), system metadata, and datastreams. The Service Definition object holds service definition metadata; the Service Mechanism object holds service binding metadata (WSDL) that connects to a web service. The links between the objects are labeled "service contract", "service subscription", and "data contract".
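One way to read the figure is as a lookup chain from a data object to the code that executes a behavior. This is a minimal sketch assuming that a data object points to its Cmodel, the Cmodel lists the service definitions it binds to, and one mechanism is registered per (service definition, Cmodel) pair. The dictionaries and PIDs are hypothetical stand-ins for relationships stored in the objects themselves.

```python
# Hypothetical PIDs; each dict stands in for relationships stored in objects.
HAS_MODEL = {"demo:book1": "demo:BookCModel"}
# Cmodel -> service definitions it binds to
HAS_SERVICE = {"demo:BookCModel": ["demo:BookSDef"]}
# (service definition, Cmodel) -> mechanism that implements it for that class
MECHANISMS = {("demo:BookSDef", "demo:BookCModel"): "demo:BookMechanism"}


def resolve(pid: str, sdef: str) -> str:
    """Find the service mechanism that executes `sdef` for data object `pid`."""
    cmodel = HAS_MODEL[pid]
    if sdef not in HAS_SERVICE.get(cmodel, []):
        raise LookupError(f"{cmodel} does not bind to {sdef}")
    return MECHANISMS[(sdef, cmodel)]
```

Because the data object only names its Cmodel, every object of that class gains the behavior without storing any service information itself.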
A behavior call is a URL that contains the object PID, the SDef name, and the method name. Parameters can also be added, as can a date-time stamp to access earlier versions of the object.
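A sketch of assembling such a URL. It assumes the path pattern of the Fedora 3.x REST API dissemination call (`/objects/{pid}/methods/{sdefPid}/{method}`, with an optional `asOfDateTime` parameter); check it against your repository version. The base URL, PIDs, method names, and the `width` parameter are hypothetical.

```python
from datetime import datetime, timezone
from typing import Mapping, Optional
from urllib.parse import quote, urlencode

# Hypothetical base URL of a Fedora 3.x repository.
BASE_URL = "http://localhost:8080/fedora"


def behavior_url(
    pid: str,
    sdef_pid: str,
    method: str,
    params: Optional[Mapping[str, str]] = None,
    as_of: Optional[datetime] = None,
) -> str:
    """Build a behavior URL from object PID + SDef PID + method name."""
    # Keep ':' unescaped because PIDs look like "namespace:id".
    path = (
        f"{BASE_URL}/objects/{quote(pid, safe=':')}"
        f"/methods/{quote(sdef_pid, safe=':')}/{quote(method)}"
    )
    query = dict(params or {})
    if as_of is not None:
        # A date-time stamp selects an earlier version of the object.
        # Naive datetimes are treated as local time by astimezone().
        query["asOfDateTime"] = as_of.astimezone(timezone.utc).strftime(
            "%Y-%m-%dT%H:%M:%SZ"
        )
    return f"{path}?{urlencode(query)}" if query else path
```

For example, `behavior_url("demo:1", "demo:sdef", "getThumbnail")` yields `http://localhost:8080/fedora/objects/demo:1/methods/demo:sdef/getThumbnail`.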
Policies: Policies are machine-enforceable expressions of rules, stating what the rules apply to and whom they affect. Who is affected can be defined in different authorization sources, such as LDAP services. Rules can be as simple as “allow” or “deny”. They can be applied to an object as a whole, to any datastream, to a dissemination, to each API call, and more.
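Fedora writes its policies in XACML; the sketch below is a much simpler stand-in that only shows the shape of an allow/deny decision. It assumes a deny-overrides combination (any matching deny wins, and no match means deny). The `Rule` type, target strings, and role names are hypothetical, and the lookup of a user's roles (for example from LDAP) is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    effect: str  # "allow" or "deny"
    target: str  # e.g. an object PID, "PID/DSID", or an API call name
    role: str    # who is affected, e.g. a group from an authorization source


def is_permitted(rules: list[Rule], user_roles: set[str], target: str) -> bool:
    """Deny-overrides: any matching deny wins; otherwise a matching allow is needed."""
    matching = [r for r in rules if r.target == target and r.role in user_roles]
    if any(r.effect == "deny" for r in matching):
        return False
    return any(r.effect == "allow" for r in matching)


RULES = [
    Rule("allow", "demo:1", "staff"),
    Rule("deny", "demo:1/MASTER", "public"),
    Rule("allow", "demo:1/MASTER", "public"),
]
```

Here the public role is refused the `MASTER` datastream even though an allow rule also matches, because the deny takes precedence.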
Historical Census Object: the 1870 aggregate census file of the US. A character-data representation of the dataset is the master. The object has a datastream used to access an SQL database that is accessible only through the object, and the SQL data can always be rebuilt from the character data. The object also has a DDI codebook containing descriptive and structural metadata about it.
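Treating the character data as the master means the database is disposable. This minimal sketch rebuilds an SQL table from a CSV master using Python's built-in `sqlite3`; the column names and rows are invented placeholders, not the real 1870 census layout.

```python
import csv
import io
import sqlite3

# Invented placeholder master data; the real census file has a different layout.
MASTER = """state,county,population
A,X,100
A,Y,250
B,Z,75
"""


def rebuild_database(master_text: str) -> sqlite3.Connection:
    """Recreate the derived SQL table from the master character data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE census (state TEXT, county TEXT, population INTEGER)")
    reader = csv.DictReader(io.StringIO(master_text))
    conn.executemany(
        "INSERT INTO census VALUES (:state, :county, :population)",
        ({**row, "population": int(row["population"])} for row in reader),
    )
    conn.commit()
    return conn
```

Since the table is always regenerated from the master, a corrupted or outdated database can simply be dropped and rebuilt.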
Relationships Among Objects: Relationships describe adjacency among objects. They are RDF data of the form PID – typeOfRelationship – relatedObjectPID, and they can be used to assemble aggregations of objects.
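In Fedora 3 these triples are stored as RDF/XML in an object's RELS-EXT datastream, with Fedora's relations-external vocabulary and `info:fedora/` URIs for PIDs. The sketch below serializes (typeOfRelationship, relatedObjectPID) pairs in that style with the standard library; the PIDs are hypothetical, and the `rel` prefix is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
REL_NS = "info:fedora/fedora-system:def/relations-external#"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("rel", REL_NS)


def rels_ext(pid: str, relations: list[tuple[str, str]]) -> str:
    """Serialize (typeOfRelationship, relatedObjectPID) pairs for `pid` as RDF/XML."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": f"info:fedora/{pid}"}
    )
    for relation, target in relations:
        # Each child element is one triple: PID - typeOfRelationship - relatedObjectPID.
        ET.SubElement(
            desc,
            f"{{{REL_NS}}}{relation}",
            {f"{{{RDF_NS}}}resource": f"info:fedora/{target}"},
        )
    return ET.tostring(root, encoding="unicode")
```

For example, `rels_ext("demo:page1", [("isMemberOf", "demo:book1")])` records that a page belongs to a book.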
Content Modeling Styles: Atomistic objects have a small number of datastreams, each an expression of the whole, with relationships to other objects. This produces many objects but is much more flexible. Compound objects usually include many datastreams, including information about both the whole and its parts. This forces a mixture of whole and part semantics and makes it more difficult to take advantage of the Fedora abstractions cleanly.
Book Objects: An XML file is the main datastream and represents the book as a whole. Using the atomistic approach, a book with 400 pages would be 401 objects (one per page plus one for the book). Using the compound approach, it would be one object with as many as 1201 datastreams for the image files and the book file.
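The counts above follow from simple arithmetic. The slide does not say how the 1201 datastreams break down; the sketch assumes three image datastreams per page (1200 / 400) plus the single book XML file, and the function names are hypothetical.

```python
def atomistic_objects(pages: int) -> int:
    # One object per page plus one object for the book as a whole.
    return pages + 1


def compound_datastreams(pages: int, images_per_page: int = 3) -> int:
    # Assumption: three image datastreams per page, plus the book XML datastream.
    return pages * images_per_page + 1
```

Either way the number of stored files is similar; the styles differ in whether each page gets its own PID, relationships, and behaviors.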
Objects Representing Aggregations: creating parent objects for complex resources, representing explicit collections, representing implicit collections, and creating digital surrogates for physical entities.
Explicit Aggregations: The parent aggregation object has explicit references to the PIDs of its children. These references can be relationships listed in the object's RELS-EXT datastream, or they can be PIDs embedded in an XML datastream that gives more descriptive context and can explicitly order them.
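RDF triples carry no inherent order, which is one reason to list children in an XML datastream instead. A minimal sketch of reading such a datastream follows; the element and attribute names (`book`, `page`, `order`, `pid`, `label`) and the PIDs are hypothetical, not a Fedora schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical structure datastream of a parent (book) object.
STRUCTURE = """<book pid="demo:book1">
  <page order="2" pid="demo:page2" label="Page 2"/>
  <page order="1" pid="demo:page1" label="Title page"/>
  <page order="3" pid="demo:page3" label="Page 3"/>
</book>"""


def ordered_children(xml_text: str) -> list[str]:
    """Return child PIDs in the order the parent's XML datastream declares."""
    root = ET.fromstring(xml_text)
    pages = sorted(root.findall("page"), key=lambda el: int(el.get("order")))
    return [el.get("pid") for el in pages]
```

The `label` attribute illustrates the "more descriptive context" such a datastream can carry alongside each PID.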
Implicit Aggregations: One object represents the aggregation as a whole, holding information about its meaning and rules for how to find the members. Any number of objects can assert an “isMemberOf” relationship to the PID of the aggregation.
Collections can be expressed as implicit aggregations.
Figure: a collection object and member objects (PID1, PID2, PID3, PID5) linked by isMemberOfCollection relationships; the members are found by querying the Resource Index.
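Finding the members of an implicit collection is a query over relationships rather than a lookup in the collection object. In this sketch an in-memory list of triples stands in for the Resource Index, and the SPARQL string shows the same question in a query language (it is illustrative and is not executed). All PIDs are hypothetical.

```python
# In-memory stand-in for the Resource Index: (subject, predicate, object) triples.
TRIPLES = [
    ("demo:item2", "isMemberOfCollection", "demo:coll1"),
    ("demo:item3", "isMemberOfCollection", "demo:coll1"),
    ("demo:item5", "isMemberOfCollection", "demo:coll1"),
    ("demo:item5", "isMemberOf", "demo:book9"),
    ("demo:item7", "isMemberOfCollection", "demo:coll2"),
]

# Illustrative only (not executed here): the same question as SPARQL.
SPARQL = """
SELECT ?member WHERE {
  ?member <info:fedora/fedora-system:def/relations-external#isMemberOfCollection>
          <info:fedora/demo:coll1> .
}
"""


def members_of(collection_pid: str, triples: list[tuple[str, str, str]]) -> list[str]:
    """Find every object that asserts isMemberOfCollection to the collection."""
    return sorted(
        s for s, p, o in triples if p == "isMemberOfCollection" and o == collection_pid
    )
```

Because the collection object stores no member list, adding an object to the collection only requires a new relationship in that object's own RELS-EXT.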