Published by Chester Neal. Modified over 9 years ago.
1
Building a National Science Digital Library on Fedora
Dean Krafft, Cornell University
dean@cs.cornell.edu
2
Building NSDL on Fedora
Structure of the talk:
- The Fedora-based NSDL Data Repository (NDR) and NSDL 2.0
- Scaling Challenges
- Inspiring Contribution and Collaboration - ExpertVoices
- Other NSDL 2.0 Services and Tools
- Q&A
3
What is the NSDL?
- An NSF-funded $20 million/year program in Science, Technology, Engineering and Mathematics (STEM) education
- A digital library describing over a million carefully selected online STEM resources from over 100 collections (at http://nsdl.org)
- A core integration team (Cornell, UCAR, Columbia) working with 9 “pathways” portals and over 200 NSF grantees
- A large community of researchers, librarians, content providers, developers, students, and teachers
5
Going beyond the card catalog
- NSDL 1.0: Metadata Repository – Oracle-based union catalog of metadata records aggregated with OAI-PMH
- NSDL 2.0: A library that guides not just resource discovery, but resource selection, use, and contribution
  - Supports creating “context” for resources
  - Presents resources in context: in a lesson plan; with ratings; correlated with education standards
  - Supports creating a permanent archive of resources
  - Enables community tools for structuring, evaluation, annotation, contribution, collaboration
- Goal: Create a dynamic, living library
6
NSDL 2.0: NSDL Data Repository
Goals:
- Architecture of participation: service-based, not a monolithic application/single user experience
- Remixable data sources and data transformations
- Harnessing (and capturing) collective intelligence
- A free market of millions of inter-related resources (create the “long tail”)
- Two-way data flow: NSDL ↔ users
Solution: Fedora-based NSDL Data Repository
7
Implementing the NDR with Fedora
- Multiple Object Types:
  - Resources (with local or remote content)
  - Metadata
  - Aggregations (collections)
  - Metadata Providers (branding)
  - Agents
- RDF relationships that use the Fedora Resource Index to support arbitrary graph queries:
  - Structural (part of)
  - Equivalence
  - Annotation
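As a rough illustration of how typed objects plus RDF relationships can answer graph queries, here is a minimal in-memory sketch. It does not use Fedora or its Resource Index. The object identifiers and predicate names (`memberOf`, `partOf`, `metadataFor`, `annotates`, `equivalentTo`, `providedBy`) are assumptions made for the example, not the NDR's actual vocabulary.

```python
from collections import defaultdict

# Hypothetical identifiers and predicate names, for illustration only.
TRIPLES = [
    ("resource:1", "memberOf", "aggregation:physics"),
    ("resource:2", "memberOf", "aggregation:physics"),
    ("resource:2", "partOf", "resource:1"),
    ("metadata:10", "metadataFor", "resource:1"),
    ("metadata:10", "providedBy", "provider:a"),
    ("metadata:11", "annotates", "resource:2"),
    ("resource:3", "equivalentTo", "resource:1"),
]


def build_index(triples):
    """Index triples by (predicate, object) so inbound links are cheap to follow."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        index[(predicate, obj)].add(subject)
    return index


def subjects(index, predicate, obj):
    """Return every subject that has the given predicate pointing at obj."""
    return sorted(index.get((predicate, obj), set()))


def context_for(index, resource):
    """Collect what points at a resource: parts, metadata, annotations, equivalents."""
    return {
        "parts": subjects(index, "partOf", resource),
        "metadata": subjects(index, "metadataFor", resource),
        "annotations": subjects(index, "annotates", resource),
        "equivalents": subjects(index, "equivalentTo", resource),
    }


index = build_index(TRIPLES)
print(context_for(index, "resource:1"))
print(subjects(index, "memberOf", "aggregation:physics"))
```

A real triplestore would index in several directions and accept a query language. The sketch only shows the structural, equivalence, and annotation relationships being followed from one resource.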
9
Draft NDR API Characteristics
- Uses REST calls for all interactions; uses handles (DOIs) for all external references
- Ensures external applications can’t violate the NDR model constraints
- Disseminations allow combining metadata from multiple sources, or related content
- Authentication: Requests signed with private key associated with an agent
- Authorization: Agent can become a metadata provider or aggregator; can create resources
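The idea of signed requests can be sketched as follows. This is not the NDR API. The slide describes signing with a private key tied to an agent, whereas this sketch substitutes a shared-secret HMAC from the Python standard library so that it stays self-contained. The canonical request layout, the path `/resource`, and the handle `hdl:0000/example-agent` are all hypothetical.

```python
import hashlib
import hmac


def canonical_request(method, path, body, agent_handle):
    """Join the request parts into one byte string so both sides sign the same thing."""
    return "\n".join([method.upper(), path, agent_handle, body]).encode("utf-8")


def sign_request(secret, method, path, body, agent_handle):
    """Return a hex signature over the canonical request (HMAC-SHA256 stand-in)."""
    message = canonical_request(method, path, body, agent_handle)
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(secret, method, path, body, agent_handle, signature):
    """Recompute the signature server-side and compare in constant time."""
    expected = sign_request(secret, method, path, body, agent_handle)
    return hmac.compare_digest(expected, signature)


# Hypothetical agent secret and request, for illustration only.
SECRET = b"example-agent-secret"
AGENT = "hdl:0000/example-agent"
signature = sign_request(SECRET, "POST", "/resource", "<resource/>", AGENT)

print(verify_request(SECRET, "POST", "/resource", "<resource/>", AGENT, signature))
print(verify_request(SECRET, "POST", "/resource", "<tampered/>", AGENT, signature))
```

With a true private-key scheme the server would hold only the agent's public key. The flow of canonicalizing, signing, and verifying before authorizing the agent would look much the same.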
10
NDR Architecture
11
An Information Network Overlay
- Think of the NDR as a lens for viewing science content on the net
- Content can be:
  - Local: stored directly in the NDR
  - Remote: accessed through a URL
  - Computed: derived from a database or web service
  - Archived: an older version stored at SDSC
- It all has a repository-based URL
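One way to picture the overlay is a small data model in which every resource has the same kind of repository URL, whatever the origin of its content. The base URL, handles, and source strings below are invented for the example and do not reflect the NDR's real addressing scheme.

```python
from dataclasses import dataclass
from enum import Enum


class ContentKind(Enum):
    LOCAL = "stored directly in the NDR"
    REMOTE = "accessed through a URL"
    COMPUTED = "derived from a database or web service"
    ARCHIVED = "an older version stored at SDSC"


@dataclass(frozen=True)
class Resource:
    handle: str        # hypothetical handle, for illustration only
    kind: ContentKind
    source: str        # where the bytes actually come from


# Hypothetical base URL; the real NDR addressing scheme is not given in the talk.
REPOSITORY_BASE = "http://repository.example.org/get/"


def repository_url(resource):
    """Every resource gets a repository-based URL, regardless of its content kind."""
    return REPOSITORY_BASE + resource.handle


RESOURCES = [
    Resource("hdl:0000/r1", ContentKind.LOCAL, "datastream inside the repository"),
    Resource("hdl:0000/r2", ContentKind.REMOTE, "http://example.org/lesson.html"),
    Resource("hdl:0000/r3", ContentKind.COMPUTED, "query against an external service"),
    Resource("hdl:0000/r4", ContentKind.ARCHIVED, "archived snapshot"),
]

for resource in RESOURCES:
    print(repository_url(resource), "->", resource.kind.value)
```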
12
Network Overlay View
[Diagram labels: User View; API/UI; Repository View with Relations & Annotations; Resources on the Web]
13
Scaling Challenges “You can tell the pioneers by the arrows in their backs”
14
The Resource Index
- First application to build a large-scale triplestore
- Initial NDR design (Sept. ’05): 438 triples per object, total of ~600 million
- Kowari only tested to 200 million triples
- Redesigned to approx 70 triples/object
- Kowari challenges:
  - Memory mapped, requires 64-bit addressing
  - Memory leak in Kowari, fixed by Chris Wilper
  - RI corruption problems – implemented fixes, instituted best practices, monitoring, and verification before backup
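The figures on this slide can be checked with a few lines of arithmetic. The sketch assumes that the total triple count scales linearly with triples per object and that the object count stays the same after the redesign. Neither assumption is stated in the talk.

```python
# Figures quoted on the slide.
TRIPLES_PER_OBJECT_INITIAL = 438
TOTAL_TRIPLES_INITIAL = 600_000_000
KOWARI_TESTED_LIMIT = 200_000_000
TRIPLES_PER_OBJECT_REDESIGNED = 70

# Assumption: total triples = objects * triples per object.
implied_objects = TOTAL_TRIPLES_INITIAL / TRIPLES_PER_OBJECT_INITIAL

# Assumption: the redesign keeps the same number of objects.
redesigned_total = implied_objects * TRIPLES_PER_OBJECT_REDESIGNED

print(f"implied objects:        {implied_objects:,.0f}")
print(f"initial design / limit: {TOTAL_TRIPLES_INITIAL / KOWARI_TESTED_LIMIT:.1f}x")
print(f"redesigned total:       {redesigned_total:,.0f}")
print(f"redesign / limit:       {redesigned_total / KOWARI_TESTED_LIMIT:.2f}x")
```

Under those assumptions the initial design sits at about three times the size Kowari had been tested to, while the redesign lands at roughly half of it.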
15
Loading the repository
- Fedora initially optimized for quick access (i.e. load/modify not so optimal)
- Initial test load of repository (roughly ½ size):
  - over 875,000 metadata records
  - over 2 million digital objects
  - over 163 million RDF triples (lots)
- Initial test load last December took weeks – it got slower as the repository got bigger
16
More Scaling Challenges
- OAI provider worked fine for small repositories, but initial queries didn’t scale – redesigned queries
- Fedora buffers RI updates, flush very expensive – redesigned API, working on Fedora solution that peeks in buffer
- Sockets weren’t being closed quickly enough – fixed
- Initial modify times 26 sec/object - fixed
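The buffered-update problem can be illustrated with a toy index. The class below is a hypothetical sketch, not Fedora's implementation. It contrasts a query that forces a flush before reading with one that "peeks" at pending updates and leaves the buffer alone.

```python
class BufferedIndex:
    """Toy triple index with a write buffer; all names here are hypothetical."""

    def __init__(self):
        self.committed = set()   # triples already written to the index
        self.pending = []        # buffered triples not yet flushed
        self.flush_count = 0     # stands in for the expensive operation

    def add(self, triple):
        """Buffer an update instead of writing it straight to the index."""
        self.pending.append(triple)

    def flush(self):
        """Write buffered triples to the index (the costly step)."""
        self.committed.update(self.pending)
        self.pending.clear()
        self.flush_count += 1

    def query_with_flush(self, predicate):
        """Force a flush so the query sees every update."""
        self.flush()
        return sorted(t for t in self.committed if t[1] == predicate)

    def query_with_peek(self, predicate):
        """Read committed and buffered triples together, without flushing."""
        visible = self.committed.union(self.pending)
        return sorted(t for t in visible if t[1] == predicate)


idx = BufferedIndex()
idx.add(("resource:1", "memberOf", "aggregation:a"))
print(idx.query_with_peek("memberOf"), idx.flush_count)
print(idx.query_with_flush("memberOf"), idx.flush_count)
```

Both queries return the same triples, but only the first avoids the flush. A complete version would also have to account for buffered deletions.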
17
The Good News
- The NDR has intercepted most of the scaling arrows
- Many updates to add multi-threading, fix threading/concurrency problems
- Every Fedora API-M operation tuned to <100ms
- Result: Overall performance has improved 1-2 orders of magnitude for many NDR operations
- Fedora journaling system to support redundant servers nearly complete
- The Fedora team has been highly responsive to every single NDR issue
18
How should we use the NDR?
- The NDR provides powerful capabilities for:
  - Creating context around resources
  - Enabling the NSDL community to directly contribute resources and context
  - Representing a web of relationships among science resources and information about those resources
- How do we use it? Here’s one specific example …
19
ExpertVoices
20
What is Expert Voices?
A system using blogging technology to:
- Support STEM conversations among scientists, teachers and students
- Tie NSDL resources to real-world science news
- Create context for resources to enhance discovery, selection and use
- Enable NSDL community members to become NSDL contributors: of resources, questions, reviews, annotations, and metadata
21
Broadening Participation: An Expert Voices Learning Scenario
- “Hurricane Season Blog” run by a National Weather Service hurricane expert, an Earth Science teacher, and a school media specialist familiar with NSDL resources
- Expert creates an entry for Hurricane Gertrude
  - “On track to hit Ft. Lauderdale in 72 hours”
  - “Currently undergoing eyewall replacement cycle”
  - “Expecting 15 foot storm surge”
- Media specialist adds links to NSDL resources: Hurricane Hunters site, latest satellite photos, and USGS flooding and flood plain site (storm surge context)
- Teacher makes connections to relevant standards and appropriate pedagogy for use by other teachers
- Students experience engaging real-time, real-world applications of science lessons
22
Expert Voices Implementation
- Initial blog system is multi-user WordPress
- WordPress plug-ins provide NDR integration and Shibboleth authentication
- Publication of blog entry creates:
  - Content, as a new resource with simple metadata
  - New NDR resources included in entry
  - New metadata for any referenced resources in content
  - Graph of relationships between entry and all referenced resources
- Blog available as independent RSS feed
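What publishing an entry creates can be sketched as a function that returns the new objects and the relationship graph. This is a hypothetical model, not the WordPress plug-in. The identifier formats and the predicate names `metadataFor` and `references` are assumptions made for the example.

```python
def publish_entry(entry_id, new_resource_urls, existing_resource_ids):
    """Return (objects, triples) that publishing one blog entry might create.

    All identifier formats and predicate names are hypothetical.
    """
    # The entry itself becomes a resource with simple metadata.
    objects = {
        entry_id: {"type": "resource"},
        f"md:{entry_id}": {"type": "metadata"},
    }
    triples = [(f"md:{entry_id}", "metadataFor", entry_id)]

    # Links to content not yet in the repository become new resources.
    new_ids = []
    for n, url in enumerate(new_resource_urls, start=1):
        resource_id = f"{entry_id}/new-{n}"
        objects[resource_id] = {"type": "resource", "url": url}
        new_ids.append(resource_id)

    # Every referenced resource, new or existing, gets new metadata
    # and an edge in the relationship graph.
    for resource_id in new_ids + list(existing_resource_ids):
        metadata_id = f"md:{entry_id}->{resource_id}"
        objects[metadata_id] = {"type": "metadata"}
        triples.append((metadata_id, "metadataFor", resource_id))
        triples.append((entry_id, "references", resource_id))

    return objects, triples


objects, triples = publish_entry(
    "entry:1", ["http://example.org/satellite-photos"], ["resource:7"]
)
for triple in triples:
    print(triple)
```

Existing resources are only referenced and annotated with new metadata. They are not re-created, which is why `resource:7` appears in the triples but not in the returned objects.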
23
NDR Entry for Expert Voices
[Diagram: nodes labeled Blog Entry, New Metadata, New Audience MD, New Resource 1, Existing Resource 2, Metadata Provider (twice), Existing Collection, and Topic-based Blog; edges labeled Referenced, Annotates, Metadata for, Member of, and Inferred relationship between resources]
25
But Expert Voices is just the beginning…
26
NDR Application: OnRamp
- A multi-user, multi-project content management system
- Built on Fedora – content objects can transition to become NDR resources
- Decentralized workflow for the creation and distribution of both simple and complex content – possible first step in general Fedora workflow system
- Disseminates content in multiple publication and online forms
- Delivery estimated 3Q06
28
NDR Application: Integrated Wiki
- A community of approved contributors (e.g. teachers, librarians, scientists) is granted edit access on the OpenNSDL wiki
- New resources and metadata are created as wiki pages and reflected into the NDR
- Non-wiki-based NDR resources and metadata are displayed as read-only wiki pages, subject to comment and linking
- User and project pages organize NDR resources
30
Other applications in development
- Automated grade-level assignment based on vocabulary analysis (SDSC)
- Educational Standards assignment (Syracuse)
- iVia-based Expert-Guided crawl: Tool for Pathways and others to turn websites into resource collections (UC Riverside)
- Automated subject assignment (UC Riverside)
- MyNSDL: Bookmark and tag STEM resources within and outside the NSDL (Cornell)
31
NSDL 2.0 Ecosystem
[Diagram: STEM Collections, Search Service, Archive Service, …, and the Fedora-based NDR, connected by protocols labeled OAI-PMH, HTTP, REST, and NDR API]
32
Summary Fedora and all its capabilities were essential to the creation of NSDL 2.0: a digital library that allows scientists, teachers, librarians, and students to create a unique web of context, contribution, and collaboration around the high-quality STEM education resources at the core of the NSDL. The NDR demonstrates that Fedora is a powerful and flexible tool that can scale to a complex repository with millions of dynamic objects.
33
Acknowledgements
- NSDL NSF Program Officers
  - Lee Zia
  - David McArthur
- NSDL Core Integration Team
  - UCAR: Kaye Howe, PI and Executive Director
  - Cornell: Dean Krafft, PI
  - Columbia: Kate Wittenberg, PI
- Fedora Development Team
  - Cornell: Sandy Payette & Carl Lagoze
  - Univ. of Virginia: Thornton Staples
34
Questions?
35
Contact Information
Dean B. Krafft
Cornell Information Science
301 College Ave.
Ithaca, NY 14850 USA
dean@cs.cornell.edu
This work is licensed under the Creative Commons Attribution-NoDerivs 2.5 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nd/2.5/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.