Building a National Science Digital Library on Fedora Dean Krafft, Cornell University


1 Building a National Science Digital Library on Fedora
Dean Krafft, Cornell University
dean@cs.cornell.edu

2 Building NSDL on Fedora
Structure of the talk:
• The Fedora-based NSDL Data Repository (NDR) and NSDL 2.0
• Scaling Challenges
• Inspiring Contribution and Collaboration - ExpertVoices
• Other NSDL 2.0 Services and Tools
• Q&A

3 What is the NSDL?
• An NSF-funded $20 million/year program in Science, Technology, Engineering and Mathematics (STEM) education
• A digital library describing over a million carefully selected online STEM resources from over 100 collections (at http://nsdl.org)
• A core integration team (Cornell, UCAR, Columbia) working with 9 “pathways” portals and over 200 NSF grantees
• A large community of researchers, librarians, content providers, developers, students, and teachers

4

5 Going beyond the card catalog
• NSDL 1.0: Metadata Repository – Oracle-based union catalog of metadata records aggregated with OAI-PMH
• NSDL 2.0: A library that guides not just resource discovery, but resource selection, use, and contribution
• Supports creating “context” for resources
• Presents resources in context: in a lesson plan; with ratings; correlated with education standards
• Supports creating a permanent archive of resources
• Enables community tools for structuring, evaluation, annotation, contribution, collaboration
• Goal: Create a dynamic, living library

6 NSDL 2.0: NSDL Data Repository
• Goals:
   ◦ Architecture of participation: service-based, not a monolithic application/single user experience
   ◦ Remixable data sources and data transformations
   ◦ Harnessing (and capturing) collective intelligence
   ◦ A free market of millions of inter-related resources (create the “long tail”)
   ◦ Two-way data flow: NSDL ↔ users
• Solution: Fedora-based NSDL Data Repository

7 Implementing the NDR with Fedora
• Multiple Object Types:
   ◦ Resources (with local or remote content)
   ◦ Metadata
   ◦ Aggregations (collections)
   ◦ Metadata Providers (branding)
   ◦ Agents
• RDF relationships that use the Fedora Resource Index to support arbitrary graph queries:
   ◦ Structural (part of)
   ◦ Equivalence
   ◦ Annotation
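
The object types and RDF relationships listed on this slide can be pictured as a small graph of subject-predicate-object triples. The following is a minimal in-memory sketch, not the Fedora Resource Index itself; the identifiers (`res:1`, `agg:earth-science`) and predicate names (`memberOf`, `metadataFor`, `partOf`, and so on) are hypothetical and chosen only for illustration.

```python
# Hypothetical identifiers and predicate names, for illustration only;
# they are not the NDR's actual vocabulary.
TRIPLES = [
    ("res:1", "memberOf", "agg:earth-science"),   # resource in an aggregation
    ("res:2", "memberOf", "agg:earth-science"),
    ("md:10", "metadataFor", "res:1"),            # metadata object describing a resource
    ("md:11", "annotates", "res:1"),              # annotation relationship
    ("res:1", "partOf", "res:3"),                 # structural relationship
    ("res:3", "partOf", "res:4"),
    ("res:1", "equivalentTo", "res:9"),           # equivalence relationship
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def transitive(triples, start, predicate):
    """Follow `predicate` edges from `start` and list every node reached."""
    seen = []
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for _, _, obj in match(triples, s=node, p=predicate):
            if obj not in seen:
                seen.append(obj)
                frontier.append(obj)
    return seen


# Everything that says something about res:1 (metadata and annotations).
about_res1 = [s for s, _, _ in match(TRIPLES, o="res:1")]

# All containers of res:1, following "partOf" across several hops.
containers = transitive(TRIPLES, "res:1", "partOf")
```

A real triplestore answers such patterns with indexes rather than a linear scan; the sketch only shows the shape of the queries that the relationship types make possible.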

8

9 Draft NDR API Characteristics
• Uses REST calls for all interactions; uses handles (DOIs) for all external references
• Ensures external applications can’t violate the NDR model constraints
• Disseminations allow combining metadata from multiple sources, or related content
• Authentication: Requests signed with private key associated with an agent
• Authorization: Agent can become a metadata provider or aggregator; can create resources
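
The slide says requests are signed with a key associated with an agent but does not give the signing scheme. The sketch below shows the general idea of signing and verifying a REST request using only the Python standard library. It substitutes HMAC-SHA256 with a shared secret for the private-key signature the slide describes (a real deployment would use an asymmetric algorithm), and the path, agent handle, and canonical-string layout are all assumptions made for illustration.

```python
import hashlib
import hmac


def canonical_request(method, path, body, agent_handle):
    """Build the byte string that gets signed (hypothetical layout)."""
    body_digest = hashlib.sha256(body).hexdigest()
    return "\n".join([method.upper(), path, agent_handle, body_digest]).encode("utf-8")


def sign_request(key, method, path, body, agent_handle):
    """Sign the canonical request. HMAC stands in for a private-key signature."""
    message = canonical_request(method, path, body, agent_handle)
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_request(key, method, path, body, agent_handle, signature):
    """Server side: recompute the signature and compare in constant time."""
    expected = sign_request(key, method, path, body, agent_handle)
    return hmac.compare_digest(expected, signature)


# Hypothetical agent, key, and endpoint.
agent_key = b"demo-agent-secret"
agent = "hdl:0000/agent-1"
payload = b"<resource><url>http://example.org/lesson</url></resource>"

signature = sign_request(agent_key, "POST", "/resources", payload, agent)
accepted = verify_request(agent_key, "POST", "/resources", payload, agent, signature)
tampered = verify_request(agent_key, "POST", "/resources", payload + b"x", agent, signature)
```

Binding the method, path, agent, and a digest of the body into the signed string means a signature for one request cannot be replayed against a different resource or with altered content.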

10 NDR Architecture

11 An Information Network Overlay
• Think of the NDR as a lens for viewing science content on the net
• Content can be:
   ◦ Local: stored directly in the NDR
   ◦ Remote: accessed through a URL
   ◦ Computed: derived from a database or web service
   ◦ Archived: an older version stored at SDSC
• It all has a repository-based URL

12 Network Overlay View
[Diagram; labels: User View, API/UI, Repository View with Relations & Annotations, Resources on the Web]

13 Scaling Challenges
“You can tell the pioneers by the arrows in their backs”

14 The Resource Index
• First application to build a large-scale triplestore
• Initial NDR design (Sept. ’05):
   ◦ 438 triples per object, total of ~600 million
   ◦ Kowari only tested to 200 million triples
• Redesigned to approx 70 triples/object
• Kowari challenges:
   ◦ Memory mapped, requires 64-bit addressing
   ◦ Memory leak in Kowari, fixed by Chris Wilper
   ◦ RI corruption problems – implemented fixes, instituted best practices, monitoring, and verification before backup
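
The figures on this slide can be checked with a little arithmetic. The sketch below assumes that the ~600 million total is simply 438 triples multiplied by the number of objects, which the slide does not state explicitly, and then estimates what the same number of objects would cost at roughly 70 triples each.

```python
# Figures taken from the slide.
TOTAL_TRIPLES_INITIAL = 600_000_000     # "~600 million"
TRIPLES_PER_OBJECT_INITIAL = 438
TRIPLES_PER_OBJECT_REDESIGN = 70        # "approx 70"
KOWARI_TESTED_LIMIT = 200_000_000       # "only tested to 200 million"

# Assumption: total = triples per object * number of objects.
implied_objects = TOTAL_TRIPLES_INITIAL / TRIPLES_PER_OBJECT_INITIAL

# Same object count under the redesigned model.
redesigned_total = implied_objects * TRIPLES_PER_OBJECT_REDESIGN

# How far the initial design overshot the tested size.
overshoot_factor = TOTAL_TRIPLES_INITIAL / KOWARI_TESTED_LIMIT

print(f"implied objects:   {implied_objects:,.0f}")
print(f"redesigned triples: {redesigned_total:,.0f}")
print(f"initial design vs. tested limit: {overshoot_factor:.1f}x")
```

Under that assumption the initial design implies about 1.37 million objects and three times Kowari's tested size, while the redesign comes to roughly 96 million triples, inside the tested range. For comparison, the half-size test load reported on slide 15 (over 163 million triples for over 2 million objects) works out to roughly 80 triples per object, the same order as the redesigned figure.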

15 Loading the repository
• Fedora initially optimized for quick access (i.e. load/modify not so optimal)
• Initial test load of repository (roughly ½ size):
   ◦ over 875,000 metadata records
   ◦ over 2 million digital objects
   ◦ over 163 million RDF triples (lots)
• Initial test load last December took weeks – it got slower as the repository got bigger

16 More Scaling Challenges
• OAI provider worked fine for small repositories, but initial queries didn’t scale – redesigned queries
• Fedora buffers RI updates, flush very expensive – redesigned API, working on Fedora solution that peeks in buffer
• Sockets weren’t being closed quickly enough – fixed
• Initial modify times 26 sec/object - fixed
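
The buffering point can be illustrated with a toy index. This is a sketch of the general technique only, not Fedora's implementation: updates accumulate in a buffer, a query that needs current results either forces an expensive flush first or "peeks" by merging the buffered changes into its view of the committed data. The class and method names are hypothetical.

```python
class BufferedIndex:
    """Toy triple index with buffered updates (illustrative only)."""

    def __init__(self):
        self.committed = set()        # triples already written to the index
        self.pending_adds = set()     # buffered additions
        self.pending_deletes = set()  # buffered deletions
        self.flush_count = 0          # stands in for the cost of flushing

    def add(self, triple):
        self.pending_deletes.discard(triple)
        self.pending_adds.add(triple)

    def delete(self, triple):
        self.pending_adds.discard(triple)
        self.pending_deletes.add(triple)

    def flush(self):
        """Apply all buffered changes; assumed to be the expensive step."""
        self.committed |= self.pending_adds
        self.committed -= self.pending_deletes
        self.pending_adds.clear()
        self.pending_deletes.clear()
        self.flush_count += 1

    def query_with_flush(self, predicate_fn):
        """Naive approach: pay for a flush before every query."""
        self.flush()
        return {t for t in self.committed if predicate_fn(t)}

    def query_peeking(self, predicate_fn):
        """Peek approach: overlay the buffer on committed data, no flush."""
        visible = (self.committed | self.pending_adds) - self.pending_deletes
        return {t for t in visible if predicate_fn(t)}


idx = BufferedIndex()
idx.add(("res:1", "memberOf", "agg:a"))
idx.flush()
idx.add(("res:2", "memberOf", "agg:a"))
idx.delete(("res:1", "memberOf", "agg:a"))

peeked = idx.query_peeking(lambda t: t[1] == "memberOf")
flushes_after_peek = idx.flush_count
flushed = idx.query_with_flush(lambda t: t[1] == "memberOf")
```

Both queries return the same answer; the peeking version does so without triggering another flush, which is the saving the slide is after.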

17 The Good News
• The NDR has intercepted most of the scaling arrows
• Many updates to add multi-threading, fix threading/concurrency problems
• Every Fedora API-M operation tuned to <100 ms
• Result: Overall performance has improved 1-2 orders of magnitude for many NDR operations
• Fedora journaling system to support redundant servers nearly complete
• The Fedora team has been highly responsive to every single NDR issue

18 How should we use the NDR?
• The NDR provides powerful capabilities for:
   ◦ Creating context around resources
   ◦ Enabling the NSDL community to directly contribute resources and context
   ◦ Representing a web of relationships among science resources and information about those resources
• How do we use it? Here’s one specific example …

19 ExpertVoices

20 What is Expert Voices?
• A system using blogging technology to:
   ◦ Support STEM conversations among scientists, teachers and students
   ◦ Tie NSDL resources to real-world science news
   ◦ Create context for resources to enhance discovery, selection and use
   ◦ Enable NSDL community members to become NSDL contributors: of resources, questions, reviews, annotations, and metadata

21 Broadening Participation: An Expert Voices Learning Scenario
• “Hurricane Season Blog” run by a National Weather Service hurricane expert, an Earth Science teacher, and a school media specialist familiar with NSDL resources
• Expert creates an entry for Hurricane Gertrude
   ◦ “On track to hit Ft. Lauderdale in 72 hours”
   ◦ “Currently undergoing eyewall replacement cycle”
   ◦ “Expecting 15 foot storm surge”
• Media specialist adds links to NSDL resources: Hurricane Hunters site, latest satellite photos, and USGS flooding and flood plain site (storm surge context)
• Teacher makes connections to relevant standards and appropriate pedagogy for use by other teachers
• Students experience engaging real-time, real-world applications of science lessons

22 Expert Voices Implementation
• Initial blog system is multi-user WordPress
• WordPress plug-ins provide NDR integration and Shibboleth authentication
• Publication of blog entry creates:
   ◦ Content, as a new resource with simple metadata
   ◦ New NDR resources included in entry
   ◦ New metadata for any referenced resources in content
   ◦ Graph of relationships between entry and all referenced resources
• Blog available as independent RSS feed
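
The four things created on publication can be sketched as a function that takes a blog entry and returns the new objects and relationships. This is a guess at the shape of the process based only on the bullets above; the function name, identifier formats, and predicate names (`memberOf`, `metadataFor`, `references`) are hypothetical and are not the actual WordPress plug-in or NDR API.

```python
def publish_entry(entry_id, blog_id, new_resources, existing_resources):
    """Return (objects, triples) that publishing one blog entry would create.

    new_resources: resources first introduced by this entry.
    existing_resources: resources already in the repository that the entry cites.
    """
    objects = []
    triples = []

    # 1. The entry itself becomes a resource with simple metadata.
    objects.append(("resource", entry_id))
    entry_md = f"md:{entry_id}"
    objects.append(("metadata", entry_md))
    triples.append((entry_md, "metadataFor", entry_id))
    triples.append((entry_id, "memberOf", blog_id))

    # 2. Resources that appear for the first time are created.
    for rid in new_resources:
        objects.append(("resource", rid))

    # 3 and 4. Every referenced resource, new or existing, gets new metadata
    # contributed by the entry and an edge linking it to the entry.
    for rid in list(new_resources) + list(existing_resources):
        md_id = f"md:{entry_id}:{rid}"
        objects.append(("metadata", md_id))
        triples.append((md_id, "metadataFor", rid))
        triples.append((entry_id, "references", rid))

    return objects, triples


objects, triples = publish_entry(
    "entry:gertrude",
    "blog:hurricane-season",
    new_resources=["res:satellite-photos"],
    existing_resources=["res:usgs-flooding"],
)
referenced = [o for s, p, o in triples if p == "references"]
```

Existing resources are not recreated; they only gain new metadata and a relationship to the entry, which is how a blog post adds context to material already in the library.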

23 NDR Entry for Expert Voices
[Diagram. Node labels: Blog Entry, New Metadata, New Audience MD, New Resource 1, Existing Resource 2, Metadata Provider (twice), Existing Collection, Topic-based Blog. Edge labels: Referenced, Annotates, Metadata for, Member of, Inferred relationship between resources]

24

25 But Expert Voices is just the beginning…

26 NDR Application: OnRamp
• A multi-user, multi-project content management system
• Built on Fedora – content objects can transition to become NDR resources
• Decentralized workflow for the creation and distribution of both simple and complex content – possible first step in general Fedora workflow system
• Disseminates content in multiple publication and online forms
• Delivery estimated 3Q06

27

28 NDR Application: Integrated Wiki
• A community of approved contributors (e.g. teachers, librarians, scientists) is granted edit access on the OpenNSDL wiki
• New resources and metadata are created as wiki pages and reflected into the NDR
• Non-wiki-based NDR resources and metadata are displayed as read-only wiki pages, subject to comment and linking
• User and project pages organize NDR resources

29

30 Other applications in development
• Automated grade-level assignment based on vocabulary analysis (SDSC)
• Educational Standards assignment (Syracuse)
• iVia-based Expert-Guided crawl: Tool for Pathways and others to turn websites into resource collections (UC Riverside)
• Automated subject assignment (UC Riverside)
• MyNSDL: Bookmark and tag STEM resources within and outside the NSDL (Cornell)

31 … NSDL 2.0 Ecosystem
[Diagram; labels: Protocol: OAI-PMH, HTTP, REST, NDR API; STEM Collections; Search Service; Archive Service; Fedora-based NDR]

32 Summary
• Fedora and all its capabilities were essential to the creation of NSDL 2.0: a digital library that allows scientists, teachers, librarians, and students to create a unique web of context, contribution, and collaboration around the high-quality STEM education resources at the core of the NSDL.
• The NDR demonstrates that Fedora is a powerful and flexible tool that can scale to a complex repository with millions of dynamic objects.

33 Acknowledgements
• NSDL NSF Program Officers
   ◦ Lee Zia
   ◦ David McArthur
• NSDL Core Integration Team
   ◦ UCAR: Kaye Howe, PI and Executive Director
   ◦ Cornell: Dean Krafft, PI
   ◦ Columbia: Kate Wittenberg, PI
• Fedora Development Team
   ◦ Cornell: Sandy Payette & Carl Lagoze
   ◦ Univ. of Virginia: Thornton Staples

34 Questions?

35 Contact Information
Dean B. Krafft
Cornell Information Science
301 College Ave.
Ithaca, NY 14850 USA
dean@cs.cornell.edu
This work is licensed under the Creative Commons Attribution-NoDerivs 2.5 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nd/2.5/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.

