Information Network Overlay Architecture Adding Value to Digital Content Carl Lagoze CS 431 – May 4, 2005 Cornell University.


1 Information Network Overlay Architecture Adding Value to Digital Content Carl Lagoze CS 431 – May 4, 2005 Cornell University

2 Overview of the Talk: Digital Libraries for search & access; Beyond Access: Adding value to digital content; Information Network Overlay Architecture; Implementing the Architecture

3 Digital Libraries – Ingest Focus

4 Input Phase Research Questions. Indexing and search: non-textual, cross-language. Preservation. Scale issues: everything becomes hard at mega-scale. OCR: especially non-Roman. Workflow: getting stuff in cheaply and reliably. Intellectual property: hard enough at the intra-national level. Description: metadata issues.

5 Digital Libraries – Federation Phase: Z39.50, Dienst, SDLIP, OAI-PMH, SRW/SRU

6 Federation Phase Research Questions. Heterogeneity. State Maintenance. Reliability: network level, management level. Ranking.

7 We have been very successful!

8 So, are we done? The primary goal of digital libraries has often been misconstrued as providing accessibility to a massive volume of resources. The real opportunity is to reestablish the library as a collaborative place where people learn from each other and organize around ideas and knowledge.

9 Opportunities: Not the same old information flow. Suppliers (Publishers), Intermediaries (Librarians), Consumers

10 …Towards a participatory information environment. Shared Information Context: Producers, Consumers, Experts, Novices, Professionals

11 Data, Information, Knowledge, Wisdom (figure labels: description, IP, preservation, modeling)

12 Digital Libraries: Beyond Search and Access. Build on foundation of near universal access. Provide context for: Content aggregation (combining information entities in novel ways); Knowledge integration (capturing semantic relationships between information entities); Information reuse (allowing secondary, tertiary products); Information transformation (combining information entities with computational services); Collaboration and contribution (blurring the line between authors, publishers, users, experts…).

13 Information Foundation Value-add, customized Projections

14 NSDL Context

15 A bit of NSDL background. Mission: “Improve Science, Math, Engineering education through digital libraries”. Original NSDL solicitation in 1999. Over 180 projects funded. Core integration (Columbia, Cornell, UCAR) charged with providing organizational, technical infrastructure. Funding through 2006. http://www.nsdl.org

16 Existing Metadata-Centric Approach. The metadata repository is a resource for service providers. It holds information about every collection and item known to the NSDL. (Figure labels: Users, Collections, Metadata repository, Services, OAI-PMH)

17 Characteristics of the Metadata Repository: Oracle database; Qualified Dublin Core; item records with collection association; OAI-PMH ingest and exposure; current collection ~ 800,000; metadata quality issues
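To make "OAI-PMH ingest and exposure" concrete, here is a minimal sketch of a harvester's two basic steps: building a ListRecords request and pulling identifiers and Dublin Core titles out of the response. The verb and parameter names come from the OAI-PMH protocol; the base URL, the sample record, and the function names are illustrative assumptions and do not describe the NSDL repository itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a given base URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)

def parse_records(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for rec in root.iter(f"{{{OAI_NS}}}record"):
        ident = rec.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        title = rec.findtext(f".//{{{DC_NS}}}title")
        pairs.append((ident, title))
    return pairs

# Hypothetical response with one item record (made up for illustration).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:item-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample lesson plan</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://repo.example.org/oai"))
    print(parse_records(SAMPLE))
```

A real harvester would also follow resumption tokens and handle deleted records; both are omitted here to keep the sketch short.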

18 Problems in this approach. Mere access does not equate to value (Reeves, Impact of Media and Technology in Schools). Static metadata records don’t capture changing and multiple contexts of use and applicability (Recker and Wiley, Designing Instruction with Learning Objects). Patterns of use, informal opinions, descriptions often more useful than taxonomic classification (Collis and Strijker, Technology and Human Issues in Reusing Learning).

19 Requirements of a New Approach. Represent (directly or by reference) multiple entities (standards; taxonomies; agents, i.e. user profiles and roles; curricula) that are contributed by multiple parties (users as actors; reuse of primary resources for secondary, tertiary products), that are inter-related to express context (applicability to standards; usage in curricula; usage patterns by particular groups/people), and that can be integrated with services and simulations.

20 Some use cases. Multi-sourced rich metadata (beyond DC): provenance issues. Annotations, comments, usage scenarios: quality control. Relations to state standards, curricula: strand maps. Curricula, lesson plans: Instructional Architect, VUE.

21 Information Network Overlay (figure labels: Data Stores, Document Repositories, Databases, Web Resources, Publisher Repositories; Network API; Source Layer, Network Representation Layer, Client Layer)

22 Information Network Instance

23 Translate to Technical Requirements. Rich information objects: integration of local and remote sources; mixed genre. Dynamic information objects: integration with local and distributed services. Graph-based information model: nodes are information objects; edges are relationships among those objects. Access and management API: exposing full functionality for programmatic access. Fine granularity access management.
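The graph-based information model in the requirements above can be sketched in a few lines: nodes are information objects (held locally or by reference), and edges are typed relationships among them. The class names, the `kind` values, and the relation names `alignsTo` and `annotates` are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class InfoObject:
    """A node: an information object, held locally or by reference."""
    node_id: str
    kind: str                          # e.g. "resource", "standard", "annotation"
    source_url: Optional[str] = None   # set when the object lives in a remote source

@dataclass
class InfoNetwork:
    """Nodes plus typed edges stored as (subject, relation, object) tuples."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add(self, obj):
        self.nodes[obj.node_id] = obj

    def relate(self, subject, relation, obj):
        # Edges may only connect objects already known to the network.
        if subject not in self.nodes or obj not in self.nodes:
            raise KeyError("both endpoints must be nodes in the network")
        self.edges.append((subject, relation, obj))

    def related(self, subject, relation):
        """Follow one relation type outward from a node."""
        return [o for s, r, o in self.edges if s == subject and r == relation]

if __name__ == "__main__":
    net = InfoNetwork()
    net.add(InfoObject("res:1", "resource", "http://publisher.example.org/tides"))
    net.add(InfoObject("std:7", "standard"))
    net.add(InfoObject("ann:3", "annotation"))
    net.relate("res:1", "alignsTo", "std:7")    # hypothetical relation name
    net.relate("ann:3", "annotates", "res:1")   # hypothetical relation name
    print(net.related("res:1", "alignsTo"))
```

Keeping the edges outside the nodes means several parties can contribute relationships about a resource without modifying the resource itself.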

24

25 Fedora History. Cornell Research (1997-present): DARPA and NSF-funded research; first reference implementation developed; Distributed, Interoperable Repositories (experiments with CNRI); policy enforcement. First Application (1999-2001): University of Virginia digital library prototype; technical implementation: adapted to web, RDBMS storage; scale/stress testing for 10,000,000 objects. Open Source Software (2002-present): Andrew W. Mellon Foundation grants; technical implementation: XML and web services; Fedora 1.0 (May 2003); Fedora 2.0 (Jan 2005).

26 Fedora Features. Digital Object Model: container for content and metadata; aggregate local and remote content; associate behaviors with objects (integrate content and web services). Relationships: define and query object-to-object relationships. Repository web service: digital object storage; web service APIs (SOAP and REST) to manage, access, search.

27 Objects, Representations, Relationships

28 Fedora Digital Object Model, Component View. Persistent ID (PID): digital object identifier. Reserved Datastreams (key object metadata): Dublin Core (DC), Audit Trail (AUDIT), Relations (RELS-EXT). Datastreams: set of content or metadata items. Disseminators (pointers to service definitions to provide service-mediated views): Default Disseminator, Disseminator.

29 Simple Fedora model for aggregating static content. Representations map to datastreams. Datastreams may be local or surrogates (redirect) to remote data. REST (or SOAP) URLs provide uniform client access to representations.
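The aggregation model on this slide can be sketched as a small data structure: one object, several datastreams, each either holding local bytes or acting as a surrogate that redirects to remote data, all reached through one URL shape. This is an illustrative sketch, not Fedora's implementation; the class names, the `resolve` helper, and the URL pattern in `access_url` are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Datastream:
    ds_id: str
    mime_type: str
    content: Optional[bytes] = None     # local content held by the repository
    remote_url: Optional[str] = None    # surrogate: redirect to remote data

@dataclass
class DigitalObject:
    pid: str
    datastreams: dict = field(default_factory=dict)

    def add(self, ds):
        self.datastreams[ds.ds_id] = ds

def access_url(base, pid, ds_id):
    """Uniform client-facing URL (hypothetical pattern, assumed for this sketch)."""
    return f"{base}/get/{pid}/{ds_id}"

def resolve(obj, ds_id):
    """Return ("redirect", url) for a surrogate, ("content", bytes) for local data."""
    ds = obj.datastreams[ds_id]
    if ds.remote_url is not None:
        return ("redirect", ds.remote_url)
    return ("content", ds.content)

if __name__ == "__main__":
    obj = DigitalObject("demo:1")
    obj.add(Datastream("TEXT", "text/plain", content=b"local abstract"))
    obj.add(Datastream("PDF", "application/pdf",
                       remote_url="http://publisher.example.org/paper.pdf"))
    for ds_id in obj.datastreams:
        print(access_url("http://repo.example.org/fedora", obj.pid, ds_id),
              resolve(obj, ds_id))
```

The client only ever sees the uniform URL; whether the bytes are local or remote is decided inside `resolve`.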

30 Simple Content Aggregation

31 Aggregating local and remote content

32 Dynamic Content. Take advantage of computational services to process content. Representations map to service-based transforms of static data. Opaque at the access level (client sees only representations, not how they are produced). Motivating examples: canonical XML metadata format, XSLT to Dublin Core; document source in TeX, programmatic transform to PDF, PS, HTML, etc.; linkage of data to analysis tools.
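A minimal sketch of the dynamic-content idea: a representation is bound to a transform over a static datastream, and the client asks for representations by name without seeing how they are produced. A plain Python function stands in for the XSLT mentioned on the slide; the record layout (`name`, `author`), the class, and its method names are hypothetical.

```python
import xml.etree.ElementTree as ET

def to_dublin_core(source_xml):
    """Stand-in for an XSLT: map a hypothetical canonical record to DC fields."""
    root = ET.fromstring(source_xml)
    return {"dc:title": root.findtext("name"),
            "dc:creator": root.findtext("author")}

class DynamicObject:
    """Static datastreams plus representations computed on request."""

    def __init__(self, pid, source_xml):
        self.pid = pid
        self._static = {"SOURCE": source_xml}
        self._transforms = {}   # representation name -> (datastream id, callable)

    def bind(self, representation, datastream_id, transform):
        self._transforms[representation] = (datastream_id, transform)

    def representations(self):
        # Clients see one flat list; static and computed look the same.
        return sorted(set(self._static) | set(self._transforms))

    def get(self, representation):
        if representation in self._transforms:
            ds_id, transform = self._transforms[representation]
            return transform(self._static[ds_id])
        return self._static[representation]

if __name__ == "__main__":
    record = "<record><name>Tides</name><author>A. Author</author></record>"
    obj = DynamicObject("demo:2", record)
    obj.bind("DC", "SOURCE", to_dublin_core)
    print(obj.representations())
    print(obj.get("DC"))
```

Swapping the transform (for example, a TeX-to-PDF service) changes nothing on the client side, which is the opacity the slide describes.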

33 Dynamic Representations

34 Expressing Relationships Between Objects. Object-to-object relationships: ontology of common relationships (RDF schema); relationships stored in special datastream (RELS-EXT). Resource Index (RI): RDF-based index of repository (Kowari triple-store). RI Search: powerful querying of graph of inter-related objects; REST-based query interface (using RDQL or ITQL); can be used in dynamic disseminations.
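To illustrate what an RDF-based index of object relationships provides, here is a toy in-memory triple index with wildcard pattern matching. It is a sketch of the general technique only: it does not use Kowari, RDQL, or iTQL, and the class, its methods, and the sample identifiers are assumptions made for the example.

```python
class ResourceIndex:
    """Toy triple index: stores (subject, predicate, object) statements."""

    def __init__(self):
        self._triples = set()

    def add(self, s, p, o):
        self._triples.add((s, p, o))

    def index_object(self, pid, relationships):
        """Load one object's relationships, given as (predicate, target) pairs."""
        for predicate, target in relationships:
            self.add(pid, predicate, target)

    def match(self, s=None, p=None, o=None):
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(t for t in self._triples
                      if (s is None or t[0] == s)
                      and (p is None or t[1] == p)
                      and (o is None or t[2] == o))

if __name__ == "__main__":
    ri = ResourceIndex()
    # Hypothetical objects and relationships, as if read from their RELS-EXT.
    ri.index_object("demo:1", [("isMemberOf", "demo:collection")])
    ri.index_object("demo:2", [("isMemberOf", "demo:collection"),
                               ("isDescriptionOf", "demo:1")])
    # Which objects are members of the collection?
    print([s for s, _, _ in ri.match(p="isMemberOf", o="demo:collection")])
```

A production triple store adds persistent storage and a query language on top, but the pattern-matching core is the same idea.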

35 Uses of Object Relationships. Define collections (e.g., collection objects). Assert semantic relationships among objects. Enable network overlay: surrogate objects referring to external entities; assert relationships among them; assert other relationships (e.g., annotations).

36 Fedora Relationship Ontology (RDFS): isPartOf / hasPart; isMemberOf / hasMember; isDescriptionOf / hasDescription; hasEquivalent; … others
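The paired names on this slide suggest inverse relationships, so a statement in one direction implies one in the other. The sketch below derives those implied triples. Treating each slash-separated pair as inverses and `hasEquivalent` as symmetric are assumptions made for this example; the slide does not state how the ontology or the repository handles them.

```python
# Assumption: each "a / b" pair on the slide names a relationship and its inverse.
INVERSE = {
    "isPartOf": "hasPart",
    "isMemberOf": "hasMember",
    "isDescriptionOf": "hasDescription",
}
INVERSE.update({v: k for k, v in list(INVERSE.items())})
INVERSE["hasEquivalent"] = "hasEquivalent"   # assumption: symmetric

def with_inverses(triples):
    """Return the given triples plus the inverse statement each one implies."""
    base = set(triples)
    out = set(base)
    for s, p, o in base:
        inv = INVERSE.get(p)
        if inv is not None:
            out.add((o, inv, s))
    return out

if __name__ == "__main__":
    stated = [("demo:page1", "isPartOf", "demo:book")]
    for triple in sorted(with_inverses(stated)):
        print(triple)
```

Predicates missing from the table pass through unchanged, so the "… others" relationships would simply gain no inverse here.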

37 Deployment Plans. Production release Phase 1 (July 2005): black box replacement for metadata repository. Future releases: API available at public level; relationship building.

38 Example 1 – Branding: Provenance of Data and Metadata

39 Example 2 – Aggregations: Semantic, Management, etc.

40 Some open questions: Scalability of this model; Management Control – trusted actors; Cross-ontology relationships; Exposing to the user – visualization

41 Concluding Goals. Exploit the increasing ubiquity of digital content. Provide the architecture for adding value to underlying content: aggregation; reuse; integration with computational services.

