Published by Geraldine Perry. Modified over 8 years ago.
1
Towards a Digital Library Theory: A Formal Digital Library Ontology Marcos André Gonçalves, Layne T. Watson, and Edward A. Fox Virginia Tech, Blacksburg, VA 24061 USA, fox@vt.edu (For ACM SIGIR Mathematical/Formal Methods in Information Retrieval, MF/IR 2004, Sheffield, UK, Aug. 29, 2004)
2
Outline
Background: The 5S Model
Motivation for this Work
Digital Library Formal Ontology
Taxonomy of DL Services
Applications of the Theory
Conclusions and Future Work
3
Background: The 5S Model
Why 5S?
DLs have not benefited from formal theories as other CS fields have (DB, IR, PL, etc.).
DL construction is difficult and ad hoc, and lacks support for tailoring and customization.
Conceptual modeling, requirements analysis, and methodological approaches are rarely supported in DL development.
There is a lack of DL-specific models, formalisms, and languages.
4
Background: The 5S Model
Informally, DLs can be defined as complex information systems that:
help satisfy info needs of users (societies);
provide info services (scenarios);
organize info in usable ways (structures);
(re)present info in usable ways (spaces);
communicate info with users (streams).
5
Background: The 5S Model
6
Background: 5S and DL formal definitions and compositions (April 2004 TOIS)
7
Background: The 5S Model
Summary of TOIS 2004 Formal Definitions: A digital library is a 10-tuple $(Streams, Structs, Sps, Scs, St2, Coll, Cat, Rep, Serv, Soc)$ in which:
$Streams$ is a set of streams, which are sequences of arbitrary types (e.g., bits, characters, pixels, frames);
$Structs$ is a set of structures, which are tuples $(G, \rho)$, where $G = (V, E)$ is a directed graph and $\rho: (V \cup E) \to L$ is a labeling function;
$Sps$ is a set of spaces, each of which can be a measurable, measure, probability, topological, metric, or vector space.
8
Background: The 5S Model
$Scs = \{sc_1, sc_2, \ldots, sc_d\}$ is a set of scenarios where each $sc_k = \langle e_{1k}(\{p_{1k}\}), e_{2k}(\{p_{2k}\}), \ldots, e_{d_k k}(\{p_{d_k k}\}) \rangle$ is a sequence of events that also can have a number of parameters $\{p_{ik}\}$. Events represent changes in computational states; parameters represent specific locations in a state and respective values.
$St2$ is a set of functions $\Psi: V \times Streams \to (\mathbb{N} \times \mathbb{N})$ that associate nodes of a structure with a pair of natural numbers $(a, b)$ corresponding to a portion (span/segment) of a stream.
$Coll = \{C_1, C_2, \ldots, C_f\}$ is a set of DL collections where each DL collection $C_k = \{do_{1k}, do_{2k}, \ldots, do_{f_k k}\}$ is a set of digital objects. Each digital object $do_k = (h_k, Stm1_k, Stt2_k, \Psi_k)$ is a tuple where $Stm1_k \subseteq Streams$, $Stt2_k \subseteq Structs$, $\Psi_k \subseteq St2$, and $h_k$ is a handle which represents a unique identifier for the object.
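As an illustration only, the digital object definition can be sketched as a small Python data structure. The class names `Structure` and `DigitalObject`, the field names, the example handle, and the choice to treat a span $(a, b)$ as a half-open slice are all assumptions made for this sketch; they do not come from the 5S papers.

```python
from dataclasses import dataclass, field


@dataclass
class Structure:
    """A labeled directed graph: nodes V, edges E, and a labeling function."""
    nodes: set
    edges: set      # set of (source, target) pairs
    labels: dict    # maps nodes (and optionally edges) to labels


@dataclass
class DigitalObject:
    """Hypothetical rendering of do = (handle, streams, structures, spans)."""
    handle: str
    streams: dict       # stream name -> sequence (here: str)
    structures: dict    # structure name -> Structure
    spans: dict = field(default_factory=dict)  # (node, stream name) -> (a, b)

    def segment(self, node, stream_name):
        """Return the portion of a stream associated with a structure node."""
        a, b = self.spans[(node, stream_name)]
        # Assumption: (a, b) is interpreted as the half-open range [a, b).
        return self.streams[stream_name][a:b]


toc = Structure(
    nodes={"title", "body"},
    edges={("title", "body")},
    labels={"title": "Title", "body": "Body"},
)
doc = DigitalObject(
    handle="hdl:example/1",  # hypothetical identifier
    streams={"text": "Title. Body text."},
    structures={"toc": toc},
    spans={("title", "text"): (0, 6), ("body", "text"): (7, 17)},
)
print(doc.segment("title", "text"))
print(doc.segment("body", "text"))
```

Keeping the spans in a mapping separate from both the stream and the structure mirrors the role of $St2$: the same stream can be segmented by several structures without copying it.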
9
Background: The 5S Model
$Cat = \{DM_{C_1}, DM_{C_2}, \ldots, DM_{C_f}\}$ is a set of metadata catalogs for $Coll$ where each metadata catalog $DM_{C_k} = \{(h, mss_{hk})\}$, and $mss_{hk} = \{ms_{hk1}, ms_{hk2}, \ldots, ms_{hkn_{hk}}\}$ is a set of descriptive metadata specifications. Each descriptive metadata specification $ms_{hki}$ is a structure with atomic values (e.g., numbers, dates, strings) associated with nodes.
A repository $Rep = \{(C_i, DM_{C_i})\}$ ($i = 1$ to $f$) is a set of pairs (collection, metadata catalog); it is assumed there exist operations to manipulate the family of pairs (e.g., get, store, delete).
$Serv = \{Se_1, Se_2, \ldots, Se_s\}$ is a set of services where each service $Se_k = \{sc_{1k}, \ldots, sc_{s_k k}\}$ is described by a set of related scenarios.
$Soc = (C, R)$ where $C$ is a set of communities and $R$ is a set of relationships among communities. $SM = \{sm_1, sm_2, \ldots, sm_j\}$ and $Ac = \{ac_1, ac_2, \ldots, ac_r\}$ are two such communities, where the former is a set of service managers responsible for running DL services and the latter is a set of actors that use those services. Being basically an electronic entity, a member $sm_k$ of $SM$ distinguishes itself from actors by defining or implementing a set of operations $\{op_{1k}, op_{2k}, \ldots, op_{nk}\} \subset sm_k$.
10
Background
11
Motivation
Previous definitions emphasize syntactic aspects, i.e., how digital library concepts are composed or built from previously defined concepts.
Complete a formal DL theory by:
Making explicit the implicit relationships that exist among the DL formal concepts defined in [Gonc04];
Providing a set of axiomatic rules that precisely define and constrain the semantics of the relationships;
Categorizing and classifying DL services on the basis of the ontology.
Research questions:
How should DL services be built from the other DL components?
Which are the fundamental and elementary DL services?
How can services be built/composed from other DL services?
We will explore semantic relations and rules of the DL domain by using ontologies.
12
Digital Library Formal Ontology
An ontology is a tuple (Ontol_Concepts, Ontol_Rels) where:
Ontol_Concepts is a family of ontological concepts;
Ontol_Rels is a family of relations.
Relations in Ontol_Rels are operationally realized by one or more rules (e.g., first-order logic axioms) which intensionally specify or constrain which elements of a concept can participate in a relation.
Ontol_Rules is the family of rules of a particular ontology.
13
Digital Library Formal Ontology: Relationships
Intra-Model: Video contains Audio (MM); Metadata Catalog describes Collection (LIS); Probabilistic Space is_a Measure Space; Service extends Service (reuse); Service Manager inherits_from Service Manager (OO).
Inter-Model: Event executes Operation; Actor participates_in Scenario; Service Manager runs Service; Service employs/produces Streams, Structures, Spaces.
14
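Concepts: $\{Se, Sc, e\}$. Key: $Se$ = service; $Sc$ = scenario; $e$ = event.
Relations:
contains $\subseteq Sc \times e$
Symbolic Rule. $\forall x, y\ (x \ \text{contains}\ y \Rightarrow Sc(x) \wedge e(y) \wedge \exists j: (j \in x.Dom \wedge y = x(j)))$
precedes $\subseteq e \times e \times Sc$; happens_before $\subseteq e \times e \times Sc$
Symbolic Rule 1. $\forall x, y, z\ (x \ \text{precedes}_z\ y \Rightarrow e(x) \wedge e(y) \wedge Sc(z) \wedge \exists i, j: (z \ \text{contains}\ x \wedge z \ \text{contains}\ y \wedge x = z(i) \wedge y = z(j) \wedge i + 1 = j))$
Symbolic Rule 2. $\forall x, y, z\ (x \ \text{happens\_before}_z\ y \Rightarrow e(x) \wedge e(y) \wedge Sc(z) \wedge \exists i, j: (z \ \text{contains}\ x \wedge z \ \text{contains}\ y \wedge x = z(i) \wedge y = z(j) \wedge i < j))$
includes $\subseteq (Se \times Se) \cup (Sc \times Sc)$; extends $\subseteq (Se \times Se) \cup (Sc \times Sc)$
Symbolic Rule 1. $\forall x, y\ (x \ \text{includes}\ y \Rightarrow Sc(x) \wedge Sc(y) \wedge (\forall z: e(z) \wedge y \ \text{contains}\ z \Rightarrow x \ \text{contains}\ z) \wedge (\forall p, q: e(p) \wedge e(q) \wedge p \ \text{precedes}_y\ q \Rightarrow p \ \text{precedes}_x\ q))$
Symbolic Rule 2. $\forall x, y\ (x \ \text{extends}\ y \Rightarrow Sc(x) \wedge Sc(y) \wedge (\forall z: e(z) \wedge y \ \text{contains}\ z \Rightarrow x \ \text{contains}\ z) \wedge (\forall p, q: e(p) \wedge e(q) \wedge p \ \text{happens\_before}_y\ q \Rightarrow p \ \text{happens\_before}_x\ q))$
Symbolic Rule 3. $\forall x, y\ (x \ \text{extends}\ y \Rightarrow Se(x) \wedge Se(y) \wedge (y \subset x \vee (x \neq y \wedge \exists p, q: Sc(p) \wedge Sc(q) \wedge p \in x \wedge q \in y \wedge p \ \text{extends}\ q)))$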
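The scenario-level rules on this slide can be read operationally. The following is a minimal Python sketch under stated assumptions: a scenario is modeled as a tuple of event names, the rules are treated as definitions of each relation (the slide only lists them as rules), and the event names in the example are invented for illustration. The service-level rule for extends is not covered.

```python
def contains(scenario, event):
    """Scenario contains event: the event occurs at some position j."""
    return event in scenario


def precedes(x, y, scenario):
    """x immediately precedes y in the scenario: x = z(i), y = z(j), i + 1 = j."""
    return any(
        scenario[i] == x and scenario[i + 1] == y
        for i in range(len(scenario) - 1)
    )


def happens_before(x, y, scenario):
    """x occurs somewhere before y in the scenario: x = z(i), y = z(j), i < j."""
    return any(
        scenario[i] == x and scenario[j] == y
        for i in range(len(scenario))
        for j in range(i + 1, len(scenario))
    )


def includes(x, y):
    """x includes y: all events of y are in x and adjacency in y is kept in x."""
    return all(contains(x, e) for e in y) and all(
        precedes(y[i], y[i + 1], x) for i in range(len(y) - 1)
    )


def extends(x, y):
    """x extends y: all events of y are in x and their relative order is kept."""
    return all(contains(x, e) for e in y) and all(
        happens_before(y[i], y[j], x)
        for i in range(len(y))
        for j in range(i + 1, len(y))
    )


# Hypothetical scenarios, named only for this example.
search = ("submit_query", "match", "rank", "show_results")
search_with_log = ("submit_query", "match", "log", "rank", "show_results")

print(extends(search_with_log, search))   # order preserved, extra event allowed
print(includes(search_with_log, search))  # "match" is no longer adjacent to "rank"
```

The example shows the difference between the two relations: inserting an event in the middle of a scenario keeps the extends relation but breaks includes, which requires the included events to stay adjacent.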
15
Digital Library Formal Ontology
16
Consistency Rules
Catalog-Collection: A complete catalog has at least one set of metadata specifications for each digital object in the collection it describes (surjective partial function). In a consistent catalog, each set of metadata specifications describes (exactly) one digital object in the related collection (total function).
Scenarios-Society: A scenario $x$ is consistent with regard to a set of service managers $Y$ if each operation executed by each event in the scenario is defined in some service manager $y \in Y$.
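These consistency rules translate directly into checks. The sketch below assumes a collection is a set of handles, a catalog is a dictionary from handle to a list of metadata specifications, a scenario is reduced to the list of operation names its events execute, and each service manager is a set of operation names. All function and variable names are hypothetical.

```python
def is_complete(catalog, collection):
    """Every digital object in the collection has at least one metadata spec."""
    return all(len(catalog.get(handle, [])) >= 1 for handle in collection)


def is_consistent(catalog, collection):
    """Every catalog entry describes a digital object that is in the collection.

    Keying the catalog by handle already guarantees that each set of
    specifications is attached to exactly one object.
    """
    return all(handle in collection for handle in catalog)


def scenario_is_consistent(scenario_operations, service_managers):
    """Every operation executed in the scenario is defined in some manager."""
    defined = set().union(*service_managers.values())
    return all(op in defined for op in scenario_operations)


collection = {"h1", "h2", "h3"}
catalog = {"h1": [{"title": "A"}], "h2": [{"title": "B"}]}
managers = {
    "search_manager": {"parse_query", "rank"},
    "ui_manager": {"render"},
}

print(is_complete(catalog, collection))    # "h3" has no metadata
print(is_consistent(catalog, collection))  # no entry points outside the collection
print(scenario_is_consistent(["parse_query", "rank", "render"], managers))
```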
17
Digital Library Formal Ontology
Characterizing employs/produces relationships
In the table, each service is characterized by the parameters (input, output) of the initial and final events of the scenarios that compose that service.
All other previous definitions and keys apply here. That set is complemented with the following definitions:
18
Services Related Definitions
A query $q$ is the representation of a user interest or information need.
$Hyptxt$ is a hypertext, wherein an anchor is a node.
A $log\_entry$ is a descriptive metadata specification about an event of a scenario.
Let $\{do_i\} = \{do_{i1}, do_{i2}, \ldots, do_{in}\}$ be a set of digital objects and $C_t = \{c_1, c_2, \ldots, c_n\}$ be a set of labels for categories. A classifier $class_{C_t}: \{do_i\} \to 2^{C_t}$ is a function that maps a digital object to a set of categories.
A cluster $clu_k = \{do_{1k}, do_{2k}, \ldots, do_{nk}\}$ is a subset of a set of digital objects.
19
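| Service | User input | Other service input | Output |
|---|---|---|---|
| Acquiring | $\{do_i\}$ | $C_i$ | $C_j$ |
| Browsing | anchor | $Hyptxt_k$ | $\{do_i\}$ |
| Cataloging | $do_i$, $ms_{i_k}$ | $(h_i, mss_{i_m})$ | $(h_i, mss_{i_{(m+k)}})$ |
| Classifying | $do_i$ | $class_{C_t}$ | $(do_i, \{c_{k_i}\})$ |
| Clustering | $\{do_i\}$ | X | $\{clu_{k_i}\}$ |
| Expanding (query) | $\{do_i\}$ | $I_{C_i}$, $q_i$ | $q_j$ |
| Indexing | $C_i$ | none | $I_{C_i}$ |
| Linking | $do_i$ | $Hyptxt_k$ | $Hyptxt_{ik}$ |
| Logging | none | $e_i(\{p_i\})$ | $log\_entry_i$ |
| Rating | $do_i$, $ac_j$ | none | $\{(do_i, ac_j, r_k)\}$ |
| Searching | $q$, $C_i$ | $I_{C_i}$ | $\{do_k\}$ |
| Visualizing | $\{do_i\}$ | $tfr_k$ | $sp_{ik}$ |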
20
Applications: A Taxonomy of DL Services
Infrastructure Services: deal with basic concepts such as collections and catalogs.
Repository-Building: create collections (digital objects) and/or catalogs (metadata specifications).
Preservational: generate instances by copying collections (digital objects) or by transforming (converting/translating) objects into different formats for preservation purposes.
Add_Value: either add value/information to collections (digital objects) or connect objects together.
Information Satisfaction Services: deal with higher-level societal requirements.
Key for the next slide:
Fundamental: part of the minimal set of services, i.e., essential to the existence of a DL.
Composite: a DL service that takes input from some other service; otherwise the service is called elementary.
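The elementary/composite distinction can be checked mechanically from the service input/output table. The sketch below is illustrative only: it encodes three rows of that table (Indexing, Rating, Searching) using plain strings for the symbols, and the dictionary layout and function names are assumptions of this example rather than part of the ontology.

```python
# Three rows of the service I/O table, encoded by hand for illustration.
# "other_input" is None where the table says "none".
SERVICE_IO = {
    "Indexing": {"other_input": None, "output": "I_Ci"},
    "Rating": {"other_input": None, "output": "{(do_i, ac_j, r_k)}"},
    "Searching": {"other_input": "I_Ci", "output": "{do_k}"},
}


def kind(service):
    """A service is composite if it takes input from some other service."""
    if SERVICE_IO[service]["other_input"] is None:
        return "elementary"
    return "composite"


def providers(service):
    """Services whose output matches the input this service takes from others."""
    needed = SERVICE_IO[service]["other_input"]
    if needed is None:
        return []
    return sorted(
        name for name, io in SERVICE_IO.items() if io["output"] == needed
    )


for name in SERVICE_IO:
    print(name, kind(name), providers(name))
```

Matching outputs to inputs in this way also exposes the composition structure: Searching consumes the index that Indexing produces, so it depends on Indexing.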
22
Application: A Taxonomy of DL Services
23
DL Services I/O Behavior
The prior figure shows:
Instantiations of the “Services Definition” model;
Inputs and outputs of examples of infrastructure and information satisfaction DL services.
Key: $C_{DL}$ = collection; $I_{C_{DL}}$ = index for collection $C_{DL}$; $\{do_i\}$ = set of digital objects; $Soc$ = society.
24
Applications: A Taxonomy of DL Services
25
Application: Defining Quality in Digital Libraries
Formal theory can help to define “what’s a good digital library” by:
Formally defining metrics of quality for each formal concept (and relationship);
Helping to define and apply numerical measures for these metrics.
Consider this in the Information Life Cycle.
27
Defining Quality in Digital Libraries
28
Metadata specifications and metadata format - completeness
Completeness of metadata specifications refers to the degree to which values are present in the description, according to a metadata standard. As far as an individual property is concerned, only two situations are possible: either a value is assigned to the property in question, or not.
Metric: $Completeness(ms_x) = 1 - \dfrac{\text{number of missing attributes in } ms_x}{\text{total number of attributes of the schema to which } ms_x \text{ conforms}}$
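A minimal sketch of the completeness metric, assuming a metadata specification is represented as a dictionary and the schema as a list of attribute names. Treating `None` and the empty string as "missing", and the four-attribute schema in the example, are assumptions made here for illustration.

```python
def completeness(ms, schema_attributes):
    """1 - (missing attributes in ms / total attributes of the schema)."""
    if not schema_attributes:
        raise ValueError("schema must define at least one attribute")
    # Assumption: an attribute is missing if absent, None, or an empty string.
    missing = sum(1 for attr in schema_attributes if ms.get(attr) in (None, ""))
    return 1 - missing / len(schema_attributes)


# Hypothetical schema and record.
schema = ["title", "creator", "date", "subject"]
record = {"title": "A Thesis", "creator": "J. Doe", "date": ""}

print(completeness(record, schema))  # two of four attributes are missing
```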
29
Defining Quality in Digital Libraries
Metadata specifications and metadata format - completeness
OCLC NDLTD Union Catalog
30
Defining Quality in Digital Libraries
Services - Extensibility and Reusability
A service Y reuses a service X if the behavior of Y incorporates the behavior of X.
A service Y extends a service X if it subsumes the behavior of X and potentially includes additional subflows of events.
Metrics:
$\text{Macro-Reusability}(Serv) = \dfrac{\sum_{se_i \in Serv} reused(se_i)}{|Serv|}$, where $reused(se_i) = 1$ if $\exists\, se_j \in Serv$ such that $se_j$ reuses $se_i$, and $0$ otherwise.
$\text{Micro-Reusability}(Serv) = \dfrac{\sum_{sm_x \in SM,\ se_i \in Serv,\ sm_x \text{ runs } se_i} LOC(sm_x) \cdot reused(se_i)}{\sum_{sm \in SM} LOC(sm)}$, where $LOC$ corresponds to the number of lines of code of a service manager.
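The two reusability metrics can be computed from three pieces of data: the reuses relation among services, the runs relation between service managers and services, and a lines-of-code count per manager. The sketch below uses invented data (four services, three managers) purely to show the arithmetic; the representation as sets and dictionaries and all names are assumptions of this example.

```python
def reused(service, reuses):
    """1 if some service reuses the given service, else 0."""
    return 1 if any(target == service for _, target in reuses) else 0


def macro_reusability(services, reuses):
    """Fraction of services that are reused by at least one other service."""
    return sum(reused(se, reuses) for se in services) / len(services)


def micro_reusability(services, reuses, runs, loc):
    """LOC of managers running reused services over the total LOC."""
    numerator = sum(
        loc[sm] * reused(se, reuses)
        for sm in loc
        for se in services
        if se in runs.get(sm, set())
    )
    return numerator / sum(loc.values())


# Hypothetical data: (reuser, reused) pairs, manager -> services run, manager -> LOC.
services = ["A", "B", "C", "D"]
reuses = {("B", "A"), ("C", "A"), ("D", "C")}
runs = {"m1": {"A"}, "m2": {"B", "C"}, "m3": {"D"}}
loc = {"m1": 100, "m2": 300, "m3": 100}

print(macro_reusability(services, reuses))             # A and C are reused: 2/4
print(micro_reusability(services, reuses, runs, loc))  # (100 + 300) / 500
```

Note that a manager running several reused services is counted once per such service in the numerator, which follows the summation as written; whether that is intended is not stated on the slide.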
31
Defining Quality in Digital Libraries
Services - Extensibility and Reusability
Macro-Reusability = 3/16 = 0.1875
Micro-Reusability = 3630/11910 ≈ 0.305
32
Application: Re-engineering a DL Specification Language
5SL specification language re-engineering: using the relationships to redefine and reorganize the semantics and organization of the XML elements within the several sections of the DL specification.
33
Re-engineering a DL Specification Language
35
5SLGen: Automatic DL Generation
36
Conclusions and Future Work
Presented a DL formal ontology that specifies the semantics of the relationships among the DL concepts, thereby completing a theory for DLs.
Applied the resulting ontology to:
Define a taxonomy of DL services;
Create a quality model for DLs;
Re-engineer a DL specification language.
37
Conclusions and Future Work
Future work includes:
Including pre- and post-conditions in the service behavior analysis;
New applications of the model/theory: new design and generation tools; quality tools; modeling complex heterogeneous/integrated systems; archaeology (ETANA);
Developing theorems and proofs;
Writing books…