Information Network Overlay Architecture Adding Value to Digital Content Carl Lagoze CS 431 – May 4, 2005 Cornell University.



Overview of the Talk
- Digital Libraries for search & access
- Beyond Access: Adding value to digital content
- Information Network Overlay Architecture
- Implementing the Architecture

Digital Libraries – Ingest Focus

Input Phase Research Questions
- Indexing and search
  - non-textual
  - cross language
- Preservation
- Scale issues
  - everything becomes hard at mega-scale
- OCR
  - especially non-Roman
- Workflow
  - getting stuff in cheaply/reliably
- Intellectual property
  - hard enough at intra-national level
- Description
  - Metadata issues

Digital Libraries – Federation Phase
- Z39.50
- Dienst
- SDLIP
- OAI-PMH
- SRW/SRU

Federation Phase Research Questions
- Heterogeneity
- State Maintenance
- Reliability
  - Network level
  - Management level
- Ranking

We have been very successful!

So, are we done?
The primary goal of digital libraries has often been misconstrued as providing accessibility to a massive volume of resources. The real opportunity is to reestablish the library as a collaborative place where people learn from each other and organize around ideas and knowledge.

Opportunities: Not the same old information flow
- Suppliers (Publishers)
- Intermediaries (Librarians)
- Consumers

…Towards a participatory information environment
Shared Information Context: Producers, Consumers, Experts, Novices, Professionals

Data, Information, Knowledge, Wisdom (figure labels: description, IP, preservation, modeling)

Digital Libraries: Beyond Search and Access
- Build on foundation of near universal access
- Provide context for:
  - Content aggregation: combining information entities in novel ways
  - Knowledge integration: capturing semantic relationships between information entities
  - Information reuse: allowing secondary, tertiary products
  - Information transformation: combining information entities with computational services
  - Collaboration and contribution: blurring the line between authors, publishers, users, experts…

Information Foundation Value-add, customized Projections

NSDL Context

A bit of NSDL background
- Mission: “Improve Science, Math, Engineering education through digital libraries”
- Original NSDL solicitation in 1999
- Over 180 projects funded
- Core integration (Columbia, Cornell, UCAR) charged with providing organizational, technical infrastructure
- Funding through

Existing Metadata-Centric Approach
The metadata repository is a resource for service providers. It holds information about every collection and item known to the NSDL.
(Figure labels: Users, Collections, Metadata repository, Services, OAI-PMH)

Characteristics of the Metadata Repository
- Oracle database
- Qualified Dublin Core
- Item records with collection association
- OAI-PMH ingest and exposure
- Current collection ~ 800,000
- Metadata quality issues
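The OAI-PMH ingest listed above comes down to HTTP requests that carry a `verb` parameter. The sketch below shows how a harvester might build a `ListRecords` request. The base URL and set name are hypothetical, and `oai_dc` is used only because every OAI-PMH repository must support it; it is not necessarily the format the NSDL repository harvested.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (illustrative helper)."""
    if resumption_token is not None:
        # In OAI-PMH, resumptionToken is an exclusive argument: it replaces
        # metadataPrefix and set when continuing a partial list.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint and set name.
url = list_records_url("http://repository.example.org/oai", set_spec="physics")
print(url)
```

A real harvester would fetch this URL, parse the XML response, and keep following `resumptionToken` values until the list is complete.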

Problems in this approach
- Mere access does not equate to value
  - Reeves, Impact of Media and Technology in Schools
- Static metadata records don’t capture changing and multiple contexts of use and applicability
  - Recker and Wiley, Designing Instruction with Learning Objects
- Patterns of use, informal opinions, descriptions often more useful than taxonomic classification
  - Collis and Strijker, Technology and Human Issues in Reusing Learning

Requirements of a New Approach
- Represent (directly or by reference) multiple entities,
  - standards
  - taxonomies
  - agents (user profiles and roles)
  - curricula
- that are contributed by multiple parties,
  - users as actors
  - reuse of primary resources for secondary, tertiary products
- that are inter-related to express context,
  - applicability to standards
  - usage in curricula
  - usage patterns by particular groups/people
- and can be integrated with services and simulations

Some use cases
- Multi-sourced rich metadata (beyond DC)
  - Provenance issues
- Annotations, comments, usage scenarios
  - Quality control
- Relations to state standards, curricula
  - Strand maps
- Curricula, lesson plans
  - Instructional architect, VUE

Information Network Overlay
(Figure: Source Layer, Network Representation Layer, and Client Layer; labeled elements include Data Stores, Document Repositories, Databases, Web Resources, Publisher Repositories, and Network API)

Information Network Instance

Translate to Technical Requirements
- Rich information objects
  - Integration of local and remote sources
  - Mixed genre
- Dynamic information objects
  - Integration with local and distributed services
- Graph-based information model
  - Nodes are information objects
  - Edges are relationships among those objects
- Access and management API
  - exposing full functionality for programmatic access
- Fine granularity access management
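The graph-based information model above can be sketched as a small in-memory structure: nodes are information objects and labeled edges are relationships among them. The class name, identifiers, and relationship names here are illustrative assumptions, not part of the talk or of any Fedora API.

```python
from collections import defaultdict


class InformationNetwork:
    """Toy graph: nodes are information objects, edges are named relationships."""

    def __init__(self):
        self.nodes = {}                # object id -> attribute dict
        self.edges = defaultdict(set)  # (subject id, relationship) -> object ids

    def add_object(self, object_id, **attributes):
        self.nodes[object_id] = attributes

    def relate(self, subject, relationship, obj):
        # Both ends must already be known information objects.
        for end in (subject, obj):
            if end not in self.nodes:
                raise KeyError(f"unknown information object: {end}")
        self.edges[(subject, relationship)].add(obj)

    def related(self, subject, relationship):
        """Objects reached from `subject` along `relationship`."""
        return sorted(self.edges.get((subject, relationship), set()))

    def incoming(self, obj, relationship):
        """Subjects that point at `obj` along `relationship`."""
        return sorted(s for (s, r), objs in self.edges.items()
                      if r == relationship and obj in objs)


# Hypothetical objects: a resource, a standard, and a lesson plan that reuses it.
net = InformationNetwork()
net.add_object("resource:1", title="Plate tectonics animation")
net.add_object("standard:1", title="Hypothetical earth-science standard")
net.add_object("lesson:1", title="Hypothetical lesson plan")
net.relate("resource:1", "appliesToStandard", "standard:1")
net.relate("lesson:1", "uses", "resource:1")

print(net.related("lesson:1", "uses"))
print(net.incoming("resource:1", "uses"))
```

Following edges in both directions is what lets a resource carry its context of use (which lesson plans reuse it, which standards it applies to) rather than only a static metadata record.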

Fedora History
- Cornell Research (1997-present)
  - DARPA and NSF-funded research
  - First reference implementation developed
  - Distributed, Interoperable Repositories (experiments with CNRI)
  - Policy Enforcement
- First Application ( )
  - University of Virginia digital library prototype
  - Technical implementation: adapted to web; RDBMS storage
  - Scale/stress testing for 10,000,000 objects
- Open Source Software (2002-present)
  - Andrew W. Mellon Foundation grants
  - Technical implementation: XML and web services
  - Fedora 1.0 (May 2003)
  - Fedora 2.0 (Jan 2005)

Fedora Features
- Digital Object Model
  - Container for content and metadata
  - Aggregate local and remote content
  - Associate behaviors with objects (integrate content and web services)
- Relationships
  - Define and query object-to-object relationships
- Repository web service
  - Digital object storage
  - Web service APIs (SOAP and REST) to manage, access, search

Objects, Representations, Relationships

Fedora Digital Object Model: Component View
- Persistent ID (PID): digital object identifier
- Reserved Datastreams: key object metadata
  - Dublin Core (DC)
  - Audit Trail (AUDIT)
  - Relations (RELS-EXT)
- Datastreams: set of content or metadata items
- Disseminators: pointers to service definitions to provide service-mediated views
  - Default Disseminator
  - Disseminator
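As a rough illustration of the component view, the sketch below models a digital object as a PID plus datastreams and disseminators. The class and field names are assumptions chosen for readability; they are not Fedora's actual classes or its XML serialization. Only the reserved datastream IDs (DC, AUDIT, RELS-EXT) come from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Reserved datastream IDs named on the slide.
RESERVED_DATASTREAM_IDS = ("DC", "AUDIT", "RELS-EXT")


@dataclass
class Datastream:
    ds_id: str
    mime_type: str
    content: Optional[bytes] = None   # set when the bytes are held locally
    remote_url: Optional[str] = None  # set when the datastream is a surrogate

    @property
    def is_remote(self) -> bool:
        return self.remote_url is not None


@dataclass
class DigitalObject:
    pid: str  # persistent identifier
    datastreams: Dict[str, Datastream] = field(default_factory=dict)
    # Disseminator name -> pointer to a service definition (hypothetical form).
    disseminators: Dict[str, str] = field(default_factory=dict)

    def add_datastream(self, ds: Datastream) -> None:
        self.datastreams[ds.ds_id] = ds

    def reserved(self) -> List[str]:
        return [i for i in RESERVED_DATASTREAM_IDS if i in self.datastreams]

    def content_datastreams(self) -> List[str]:
        return [i for i in self.datastreams if i not in RESERVED_DATASTREAM_IDS]


# Hypothetical object with one local metadata stream and one remote image.
obj = DigitalObject(pid="demo:100")
obj.add_datastream(Datastream("DC", "text/xml", content=b"<dc/>"))
obj.add_datastream(Datastream("IMAGE", "image/jpeg",
                              remote_url="http://images.example.org/100.jpg"))
obj.disseminators["default"] = "hypothetical:defaultServiceDefinition"

print(obj.reserved(), obj.content_datastreams())
```

Keeping local and remote datastreams behind the same interface is what allows one object to aggregate content held in the repository with content that lives elsewhere.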

Simple Fedora model for aggregating static content
- Representations map to datastreams
- Datastreams may be local or surrogates (redirect) to remote data
- REST (or SOAP) URLs provide uniform client access to representations

Simple Content Aggregation

Aggregating local and remote content

Dynamic Content
- Take advantage of computational services to process content
- Representations map to service-based transforms of static data
- Opaque at the access level (client sees only representations, not how they are produced)
- Motivating examples
  - Canonical XML metadata format – XSLT to Dublin Core
  - Document source in TeX, programmatic transform to PDF, PS, HTML, etc.
  - Linkage of data to analysis tools
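The idea that a client sees only named representations, not how they are produced, can be sketched as follows. This is a minimal illustration and not Fedora's disseminator mechanism: the class, the sample metadata, and the `to_dublin_core` function (a plain Python function standing in for an XSLT service) are all assumptions.

```python
class DynamicObject:
    """Toy object whose representations are either static or computed."""

    def __init__(self, pid, source):
        self.pid = pid
        self._source = source
        self._representations = {}  # name -> transform function, or None if static

    def register(self, name, transform=None):
        self._representations[name] = transform

    def representations(self):
        return sorted(self._representations)

    def get(self, name):
        # The caller cannot tell whether the result was stored or computed.
        transform = self._representations[name]
        return self._source if transform is None else transform(self._source)


def to_dublin_core(record):
    """Stand-in for an XSLT service mapping a canonical record to Dublin Core."""
    return {"dc:title": record["title"], "dc:creator": record["author"]}


# Hypothetical canonical metadata record.
canonical = {"title": "Plate tectonics animation", "author": "A. Teacher"}
obj = DynamicObject("demo:200", canonical)
obj.register("canonical")           # static: returned as stored
obj.register("dc", to_dublin_core)  # dynamic: computed on request

print(obj.representations())
print(obj.get("dc"))
```

Because access goes through `get` in both cases, a static datastream could later be replaced by a service-based transform without any change on the client side.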

Dynamic Representations

Expressing Relationships Between Objects
- Object-to-object Relationships
  - Ontology of common relationships (RDF schema)
  - Relationships stored in special datastream (RELS-EXT)
- Resource Index (RI)
  - RDF-based index of repository (Kowari triple-store)
- RI Search
  - Powerful querying of graph of inter-related objects
  - REST-based query interface (using RDQL or ITQL)
  - Can be used in dynamic disseminations
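To give a feel for querying a graph of inter-related objects, here is a small pattern matcher over (subject, predicate, object) triples. It is a plain in-memory stand-in, not Kowari, RDQL, or iTQL; the PIDs are hypothetical and the predicates are borrowed from the relationship names used in this talk.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    pattern = (subject, predicate, obj)
    return [t for t in triples
            if all(p is None or p == v for p, v in zip(pattern, t))]


# Hypothetical relationship assertions, as might be indexed from RELS-EXT.
triples = [
    ("demo:1", "isMemberOf", "demo:collection"),
    ("demo:2", "isMemberOf", "demo:collection"),
    ("demo:note", "isDescriptionOf", "demo:1"),
]

# Step 1: which objects are members of the collection?
members = [s for s, _, _ in
           match(triples, predicate="isMemberOf", obj="demo:collection")]

# Step 2: which of those members have at least one description attached?
described = [m for m in members
             if match(triples, predicate="isDescriptionOf", obj=m)]

print(members)
print(described)
```

A real triple store would answer the two-step question in a single query with joined patterns; the two passes here only make the join explicit.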

Uses of Object Relationships
- Define collections (e.g., collection objects)
- Assert semantic relationships among objects
- Enable network overlay
  - Surrogate objects referring to external entities
  - Assert relationships among them
  - Assert other relationships (e.g., annotations)

Fedora Relationship Ontology (RDFS)
- isPartOf / hasPart
- isMemberOf / hasMember
- isDescriptionOf / hasDescription
- hasEquivalent
- … others
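Reading each slash-separated pair above as a relationship and its inverse (an interpretation of the slide, stated here as an assumption), a one-directional assertion can be expanded so that the graph can be traversed from either end. The PIDs are hypothetical, and `hasEquivalent` is left alone because the slide lists no partner for it.

```python
# Pairs from the slide, assumed here to be mutual inverses.
INVERSE_PAIRS = [
    ("isPartOf", "hasPart"),
    ("isMemberOf", "hasMember"),
    ("isDescriptionOf", "hasDescription"),
]

INVERSES = {}
for forward, backward in INVERSE_PAIRS:
    INVERSES[forward] = backward
    INVERSES[backward] = forward


def with_inverses(triples):
    """Return the input triples plus the inverse of each one that has a partner."""
    result = set(triples)
    for subject, predicate, obj in triples:
        inverse = INVERSES.get(predicate)
        if inverse is not None:
            result.add((obj, inverse, subject))
    return result


asserted = {
    ("demo:1", "isMemberOf", "demo:collection"),
    ("demo:1", "hasEquivalent", "demo:9"),
}
expanded = with_inverses(asserted)
for triple in sorted(expanded):
    print(triple)
```

With the inverse present, a collection object can list its members directly, even though the relationship was asserted only on the member.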

Deployment Plans
- Production release Phase 1 – July 2005
  - black box replacement for metadata repository
- Future releases
  - API available at public level
  - Relationship building

Example 1 – Branding Provenance of Data and Metadata

Example 2 – Aggregations Semantic, Management, etc.

Some open questions
- Scalability of this model
- Management Control – trusted actors
- Cross-ontology relationships
- Exposing to the user - visualization

Concluding Goals
- Exploit the increasing ubiquity of digital content
- Provide the architecture for adding value to underlying content
  - Aggregation
  - Reuse
  - Integration with computational services