Building a National Science Digital Library on Fedora Dean Krafft, Cornell University

Building NSDL on Fedora
Structure of the talk:
- The Fedora-based NSDL Data Repository (NDR) and NSDL 2.0
- Scaling Challenges
- Inspiring Contribution and Collaboration - ExpertVoices
- Other NSDL 2.0 Services and Tools
- Q&A

What is the NSDL?
- An NSF-funded $20 million/year program in Science, Technology, Engineering and Mathematics (STEM) education
- A digital library describing over a million carefully selected online STEM resources from over 100 collections (at
- A core integration team (Cornell, UCAR, Columbia) working with 9 “pathways” portals and over 200 NSF grantees
- A large community of researchers, librarians, content providers, developers, students, and teachers

Going beyond the card catalog
- NSDL 1.0: Metadata Repository – Oracle-based union catalog of metadata records aggregated with OAI-PMH
- NSDL 2.0: A library that guides not just resource discovery, but resource selection, use, and contribution
  - Supports creating “context” for resources
  - Presents resources in context: in a lesson plan; with ratings; correlated with education standards
  - Supports creating a permanent archive of resources
  - Enables community tools for structuring, evaluation, annotation, contribution, collaboration
- Goal: Create a dynamic, living library

NSDL 2.0: NSDL Data Repository
- Goals:
  - Architecture of participation: service-based, not a monolithic application/single user experience
  - Remixable data sources and data transformations
  - Harnessing (and capturing) collective intelligence
  - A free market of millions of inter-related resources (create the “long tail”)
  - Two-way data flow: NSDL ↔ users
- Solution: Fedora-based NSDL Data Repository

Implementing the NDR with Fedora
- Multiple Object Types:
  - Resources (with local or remote content)
  - Metadata
  - Aggregations (collections)
  - Metadata Providers (branding)
  - Agents
- RDF relationships that use the Fedora Resource Index to support arbitrary graph queries:
  - Structural (part of)
  - Equivalence
  - Annotation
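
As a rough illustration of what "arbitrary graph queries" over typed relationships can look like, here is a minimal in-memory sketch. It is not the Fedora Resource Index or its query language; the class `TripleIndex`, the predicate names (`memberOf`, `metadataFor`, `equivalentTo`), and the identifiers are all hypothetical and chosen only for the example.

```python
class TripleIndex:
    """Tiny in-memory stand-in for an RDF triple index (hypothetical, not Fedora's API)."""

    def __init__(self):
        # subject -> set of (predicate, object) pairs
        self._by_subject = {}

    def add(self, subject, predicate, obj):
        self._by_subject.setdefault(subject, set()).add((predicate, obj))

    def objects(self, subject, predicate):
        """All objects linked from `subject` by `predicate`."""
        pairs = self._by_subject.get(subject, ())
        return {o for p, o in pairs if p == predicate}

    def reachable(self, start, predicate):
        """Follow one predicate transitively, e.g. nested 'part of' links."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in self.objects(node, predicate):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


index = TripleIndex()
# Structural relationships (assumed predicate name)
index.add("resource:1", "memberOf", "aggregation:hurricanes")
index.add("aggregation:hurricanes", "memberOf", "aggregation:earth-science")
# Annotation and equivalence relationships (assumed predicate names)
index.add("metadata:7", "metadataFor", "resource:1")
index.add("resource:2", "equivalentTo", "resource:1")

# Every aggregation that directly or indirectly contains resource:1
containing = index.reachable("resource:1", "memberOf")
```

The transitive walk is the part a plain relational union catalog handles poorly, which is presumably why a triplestore-backed index was attractive here.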

Draft NDR API Characteristics
- Uses REST calls for all interactions; uses handles (DOIs) for all external references
- Ensures external applications can’t violate the NDR model constraints
- Disseminations allow combining metadata from multiple sources, or related content
- Authentication: Requests signed with private key associated with an agent
- Authorization: Agent can become a metadata provider or aggregator; can create resources

NDR Architecture

An Information Network Overlay
- Think of the NDR as a lens for viewing science content on the net
- Content can be:
  - Local: stored directly in the NDR
  - Remote: accessed through a URL
  - Computed: derived from a database or web service
  - Archived: an older version stored at SDSC
- It all has a repository-based URL
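
The overlay idea above can be sketched as a small data model in which every resource, whatever its content kind, is addressed through one repository-based URL. This is only an illustration under stated assumptions: `REPOSITORY_BASE`, the `Resource` class, and the handle strings are hypothetical and do not reflect the actual NDR URL scheme.

```python
from dataclasses import dataclass
from enum import Enum


class ContentKind(Enum):
    LOCAL = "local"        # stored directly in the repository
    REMOTE = "remote"      # accessed through a URL
    COMPUTED = "computed"  # derived from a database or web service
    ARCHIVED = "archived"  # an older version held in an archive


# Hypothetical base URL, used only for this sketch
REPOSITORY_BASE = "https://repository.example.org/get"


@dataclass(frozen=True)
class Resource:
    handle: str        # external identifier (assumed format)
    kind: ContentKind
    source: str        # where the bytes actually come from

    def repository_url(self) -> str:
        """Clients always see this URL; `source` stays an internal detail."""
        return f"{REPOSITORY_BASE}/{self.handle}"


resources = [
    Resource("2200/1", ContentKind.LOCAL, "datastream:content"),
    Resource("2200/2", ContentKind.REMOTE, "http://example.org/page.html"),
    Resource("2200/3", ContentKind.COMPUTED, "service:lookup?id=3"),
    Resource("2200/4", ContentKind.ARCHIVED, "archive:snapshot/4"),
]
urls = [r.repository_url() for r in resources]
```

Keeping the content kind out of the URL is the design point: a resource can move from remote to archived without breaking references held by users.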

[Figure: Network Overlay View. Labels: User View; API/UI; Repository View with Relations & Annotations; Resources on the Web]

Scaling Challenges
“You can tell the pioneers by the arrows in their backs”

The Resource Index
- First application to build a large-scale triplestore
- Initial NDR design (Sept. ’05):
  - 438 triples per object, total of ~600 million
  - Kowari only tested to 200 million triples
- Redesigned to approx 70 triples/object
- Kowari challenges:
  - Memory mapped, requires 64-bit addressing
  - Memory leak in Kowari, fixed by Chris Wilper
  - RI corruption problems – implemented fixes, instituted best practices, monitoring, and verification before backup
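
A back-of-the-envelope check of the redesign, using only the approximate figures on the slide. The object count is not stated; it is derived here from 600 million / 438 and should be read as an assumption, as should the idea that the same object count applies after the redesign.

```python
# Figures quoted on the slide (all approximate)
TOTAL_TRIPLES_INITIAL = 600_000_000
TRIPLES_PER_OBJECT_INITIAL = 438
TRIPLES_PER_OBJECT_REDESIGN = 70
KOWARI_TESTED_LIMIT = 200_000_000

# Assumption: the total and per-object figures describe the same object set
implied_objects = TOTAL_TRIPLES_INITIAL / TRIPLES_PER_OBJECT_INITIAL

# Projected triple count for the same objects after the redesign
redesigned_total = implied_objects * TRIPLES_PER_OBJECT_REDESIGN

print(f"implied objects:   {implied_objects:,.0f}")
print(f"redesigned total:  {redesigned_total:,.0f}")
print(f"within tested range: {redesigned_total < KOWARI_TESTED_LIMIT}")
```

Under these assumptions the initial design sat at roughly three times the size Kowari had been tested to, while the redesign lands at about half of it.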

Loading the repository
- Fedora initially optimized for quick access (i.e. load/modify not so optimal)
- Initial test load of repository (roughly ½ size):
  - over 875,000 metadata records
  - over 2 million digital objects
  - over 163 million RDF triples (lots)
- Initial test load last December took weeks – it got slower as the repository got bigger

More Scaling Challenges
- OAI provider worked fine for small repositories, but initial queries didn’t scale – redesigned queries
- Fedora buffers RI updates, flush very expensive – redesigned API, working on Fedora solution that peeks in buffer
- Sockets weren’t being closed quickly enough – fixed
- Initial modify times 26 sec/object – fixed
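
The buffered-update problem can be illustrated with a toy index. This is a sketch of the general technique (answer reads from committed state plus the pending buffer instead of forcing a flush), not Fedora's implementation; `BufferedIndex` and its method names are hypothetical.

```python
class BufferedIndex:
    """Toy write-buffered index (hypothetical; illustrates flush vs. peek)."""

    def __init__(self, flush_threshold=1000):
        self._committed = set()   # triples already written to the index
        self._buffer = []         # pending updates, not yet written
        self._flush_threshold = flush_threshold
        self.flush_count = 0      # stands in for the cost of flushing

    def add(self, triple):
        self._buffer.append(triple)
        if len(self._buffer) >= self._flush_threshold:
            self.flush()

    def flush(self):
        """Write all pending updates; treated as the expensive operation."""
        self._committed.update(self._buffer)
        self._buffer.clear()
        self.flush_count += 1

    def contains_after_flush(self, triple):
        """Read-your-writes by forcing a flush before every query."""
        self.flush()
        return triple in self._committed

    def contains_peeking(self, triple):
        """Read-your-writes by also checking the pending buffer."""
        return triple in self._committed or triple in self._buffer


idx = BufferedIndex()
t = ("resource:1", "memberOf", "aggregation:hurricanes")
idx.add(t)
seen_by_peek = idx.contains_peeking(t)   # no flush needed
flushes_after_peek = idx.flush_count
seen_by_flush = idx.contains_after_flush(t)
flushes_after_flush = idx.flush_count
```

Both query styles see the new triple, but only the peeking version avoids paying for a flush on every read-after-write.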

The Good News
- The NDR has intercepted most of the scaling arrows
- Many updates to add multi-threading, fix threading/concurrency problems
- Every Fedora API-M operation tuned to under 100 ms
- Result: Overall performance has improved 1-2 orders of magnitude for many NDR operations
- Fedora journaling system to support redundant servers nearly complete
- The Fedora team has been highly responsive to every single NDR issue

How should we use the NDR?
- The NDR provides powerful capabilities for:
  - Creating context around resources
  - Enabling the NSDL community to directly contribute resources and context
  - Representing a web of relationships among science resources and information about those resources
- How do we use it? Here’s one specific example …

ExpertVoices

What is Expert Voices?
- A system using blogging technology to:
  - Support STEM conversations among scientists, teachers and students
  - Tie NSDL resources to real-world science news
  - Create context for resources to enhance discovery, selection and use
  - Enable NSDL community members to become NSDL contributors: of resources, questions, reviews, annotations, and metadata

Broadening Participation: An Expert Voices Learning Scenario
- “Hurricane Season Blog” run by a National Weather Service hurricane expert, an Earth Science teacher, and a school media specialist familiar with NSDL resources
- Expert creates an entry for Hurricane Gertrude
  - “On track to hit Ft. Lauderdale in 72 hours”
  - “Currently undergoing eyewall replacement cycle”
  - “Expecting 15 foot storm surge”
- Media specialist adds links to NSDL resources: Hurricane Hunters site, latest satellite photos, and USGS flooding and flood plain site (storm surge context)
- Teacher makes connections to relevant standards and appropriate pedagogy for use by other teachers
- Students experience engaging real-time, real-world applications of science lessons

Expert Voices Implementation
- Initial blog system is multi-user WordPress
- WordPress plug-ins provide NDR integration and Shibboleth authentication
- Publication of blog entry creates:
  - Content, as a new resource with simple metadata
  - New NDR resources included in entry
  - New metadata for any referenced resources in content
  - Graph of relationships between entry and all referenced resources
- Blog available as independent RSS feed
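
The publication step listed above can be sketched as a function that turns one blog entry into a set of new repository objects plus the relationships among them. This is a hypothetical model, not the WordPress plug-in or the NDR API: the function `publish_entry`, the identifier formats, and the predicate names `metadataFor` and `references` are assumptions made for the example.

```python
def publish_entry(entry_id, referenced, existing):
    """Return (new_objects, relationships) for one published blog entry.

    entry_id   -- identifier for the blog entry (assumed format)
    referenced -- resource identifiers linked from the entry text
    existing   -- identifiers already present in the repository
    """
    # The entry itself becomes a new resource with simple metadata
    objects = [("resource", entry_id), ("metadata", f"md:{entry_id}")]
    relations = [(f"md:{entry_id}", "metadataFor", entry_id)]

    for ref in referenced:
        # Resources not yet in the repository are created
        if ref not in existing:
            objects.append(("resource", ref))
        # Every referenced resource gains new metadata from this entry
        md_id = f"md:{entry_id}:{ref}"
        objects.append(("metadata", md_id))
        relations.append((md_id, "metadataFor", ref))
        # The entry is linked to each resource it references
        relations.append((entry_id, "references", ref))

    return objects, relations


objects, relations = publish_entry(
    "blog:hurricane-gertrude",
    referenced=["res:hurricane-hunters", "res:usgs-flooding"],
    existing={"res:usgs-flooding"},
)
```

In this example only `res:hurricane-hunters` is created as a new resource, while both referenced resources receive new metadata and a link back to the entry.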

[Figure: NDR Entry for Expert Voices. Nodes: Blog Entry; New Metadata; New Audience MD; Referenced New Resource 1; Referenced Existing Resource 2; Metadata Provider; Existing Collection; Topic-based Blog. Edge labels: Annotates; Metadata for; Member of; Inferred relationship between resources]

But Expert Voices is just the beginning…

NDR Application: OnRamp
- A multi-user, multi-project content management system
- Built on Fedora – content objects can transition to become NDR resources
- Decentralized workflow for the creation and distribution of both simple and complex content – possible first step in general Fedora workflow system
- Disseminates content in multiple publication and online forms
- Delivery estimated 3Q06

NDR Application: Integrated Wiki
- Community of approved contributors (e.g. teachers, librarians, scientists) are granted edit access on OpenNSDL wiki
- New resources and metadata are created as wiki pages and reflected into the NDR
- Non-wiki-based NDR resources and metadata are displayed as read-only wiki pages, subject to comment and linking
- User and project pages organize NDR resources

Other applications in development
- Automated grade-level assignment based on vocabulary analysis (SDSC)
- Educational Standards assignment (Syracuse)
- iVia-based Expert-Guided crawl: Tool for Pathways and others to turn websites into resource collections (UC Riverside)
- Automated subject assignment (UC Riverside)
- MyNSDL: Bookmark and tag STEM resources within and outside the NSDL (Cornell)

[Figure: NSDL 2.0 Ecosystem. Protocols: OAI-PMH; HTTP; REST; NDR API. Components: STEM Collections; Search Service; Archive Service; Fedora-based NDR]

Summary
- Fedora and all its capabilities were essential to the creation of NSDL 2.0: a digital library that allows scientists, teachers, librarians, and students to create a unique web of context, contribution, and collaboration around the high-quality STEM education resources at the core of the NSDL.
- The NDR demonstrates that Fedora is a powerful and flexible tool that can scale to a complex repository with millions of dynamic objects.

Acknowledgements
- NSDL NSF Program Officers
  - Lee Zia
  - David McArthur
- NSDL Core Integration Team
  - UCAR: Kaye Howe, PI and Executive Director
  - Cornell: Dean Krafft, PI
  - Columbia: Kate Wittenberg, PI
- Fedora Development Team
  - Cornell: Sandy Payette & Carl Lagoze
  - Univ. of Virginia: Thornton Staples

Questions?

Contact Information
Dean B. Krafft
Cornell Information Science
301 College Ave.
Ithaca, NY USA

This work is licensed under the Creative Commons Attribution-NoDerivs 2.5 License. To view a copy of this license, visit or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.