The Technical Infrastructure of the NSDL
Dean Krafft, Cornell University


NSDL Technical Overview
Structure of the talk:
- NSDL 1.0 Architecture and Lessons Learned
- The Fedora-based NSDL Data Repository (NDR) and NSDL 2.0
- Inspiring Contribution and Collaboration - ExpertVoices
- Other NSDL 2.0 Services and Tools
- Q&A

What is the NSDL?
- An NSF-funded $20 million/year program in Science, Technology, Engineering and Mathematics (STEM) education
- A digital library describing over a million carefully selected online STEM resources from over 100 collections (at
- A core integration team (Cornell, UCAR, Columbia) working with 9 “pathways” portals and over 200 NSF grantees
- A large community of researchers, librarians, content providers, developers, students, and teachers

What are the building blocks?
- A distributed set of NSDL collections
- A central repository of aggregated information about Science, Technology, Engineering, and Mathematics digital resources
- A set of services that build on the repository, initially search and archive
- A set of portals that expose NSDL STEM resources to a variety of user communities

Infrastructure overview: NSDL 1.0
Diagram components: STEM Collections on the Web; Central Metadata Repository; Search Service; Archive Service; Collection Registration System; NSDL.org Portal
Protocols shown: OAI-PMH, HTTP, REST, SQL

NSDL 1.0 Ingest
- Create a “union catalog” of Dublin Core metadata records for STEM resources
- Harvest those records from collections using OAI-PMH (openarchives.org)
- Normalize and augment metadata records (change to qualified DC)
- Store records in an Oracle DB and re-serve qualified DC through OAI-PMH
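The harvest step above can be sketched in a few lines. This is a minimal illustration of an OAI-PMH ListRecords exchange, not the NSDL ingest code: the base URL, the sample response, and the function names are assumptions made for the example. The verb, the metadataPrefix and resumptionToken arguments, and the XML namespaces come from the OAI-PMH 2.0 protocol.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for a collection's OAI server."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{{{OAI_NS}}}record"):
        identifier = rec.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        titles = [t.text for t in rec.iter(f"{{{DC_NS}}}title")]
        records.append({"identifier": identifier, "titles": titles})
    token = root.findtext(f".//{{{OAI_NS}}}resumptionToken")
    return records, (token or None)


# Hypothetical response body, shaped like an OAI-PMH ListRecords reply.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2006-01-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Hurricane basics</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_records(SAMPLE)
next_url = list_records_url("http://example.org/oai", resumption_token=token)
```

A real harvester would fetch each URL over HTTP and loop until no resumption token is returned. Normalization to qualified DC and storage would follow as separate steps.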

Metadata Aggregation
Diagram components: Collection OAI Servers (…); Harvest Management / Collection Registration System; Metadata Ingest Service; Metadata Repository (Oracle DB); OAI-PMH Server
Flows shown: OAI-PMH in from collections; NSDL Dublin Core (NSDL DC) out

NSDL 1.0 Search
- Harvest metadata records from MR using OAI-PMH
- Crawl URLs specified in the metadata using Nutch
- Build a search index using metadata plus full-text of available content pages
- Expose search in a web portal at nsdl.org for K-gray access to NSDL resources
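To make the indexing step concrete, here is a toy inverted index that combines each record's metadata with the crawled full text of its page. It is a stand-in written for illustration only: NSDL 1.0 used Lucene and Nutch, and the document layout, URLs, and function names below are assumptions.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of resource URLs whose metadata or text has it.

    `documents` is {url: {"metadata": str, "fulltext": str}}; "fulltext" may be
    missing when the content page could not be crawled.
    """
    index = defaultdict(set)
    for url, doc in documents.items():
        combined = doc["metadata"] + " " + doc.get("fulltext", "")
        for term in tokenize(combined):
            index[term].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical harvested records with crawled page text.
docs = {
    "http://example.org/a": {"metadata": "Hurricane tracking lesson",
                             "fulltext": "storm surge and eyewall replacement"},
    "http://example.org/b": {"metadata": "Tsunami basics",
                             "fulltext": "wave surge"},
}
index = build_index(docs)
hits = search(index, "hurricane surge")
```

The query matches a resource only because metadata and page text are indexed together: "hurricane" comes from the metadata and "surge" from the crawled content.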

Search Architecture
Diagram components: Web Content; Pathway Portals; NSDL.org Portal; REST Query Interface; Lucene Query Engine; Lucene index generator; OAI Harvester (OAI-PMH, from the Metadata Repository); Nutch Harvester (http/ftp, from Web Content); Search and Discovery Server

NSDL 1.0 Lessons
- Rather than one portal for everyone, support communities with common interests: Pathways now provide discipline- and area-specific portals
- Metadata is expensive: unlike traditional libraries, which can share catalog records (e.g. through OCLC), digital collections have metadata of very mixed quality, with unusual and inconsistent coding
- On the good side: the Oracle DB and OAI-PMH server scaled successfully to over 1 million catalog records

NSDL 1.0 Lessons continued
- OAI-provided collections need 3 types of expertise: domain (resources & pedagogy), metadata (vocabulary & formatting), and technical (XML schema, UTF-8, HTTP, OAI-PMH)
- In many cases it took several months from first contact to a successful OAI harvest, and the average harvest failure rate has stayed at 25%-50%, with only 23% of those failures being transient
- Incremental harvesting is fundamental to efficient processing, but problematic: there are issues with persisting deleted records and recovering from partial harvests
- Result: some automation, but high people cost
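The two incremental-harvest problems named above, deleted records and partial harvests, can be illustrated with a short sketch. It assumes that providers mark deletions with status="deleted" on the record header, as OAI-PMH allows, and it stages changes so that a failed harvest leaves the catalog untouched. The staging strategy and all names here are illustrative assumptions, not the NSDL implementation.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def apply_page(catalog, xml_text):
    """Apply one ListRecords page to `catalog`; return the resumption token."""
    root = ET.fromstring(xml_text)
    for header in root.iter(OAI + "header"):
        identifier = header.findtext(OAI + "identifier")
        if header.get("status") == "deleted":
            catalog.pop(identifier, None)  # provider reports the record deleted
        else:
            catalog[identifier] = header.findtext(OAI + "datestamp")
    return root.findtext(".//" + OAI + "resumptionToken") or None


def incremental_harvest(catalog, pages):
    """Stage all pages and commit only if the harvest runs to completion.

    `pages` is an iterable of response bodies; None simulates a failed request.
    Returns (catalog, completed).
    """
    staged = dict(catalog)
    for page in pages:
        if page is None:
            return catalog, False  # partial harvest: keep the old state
        if apply_page(staged, page) is None:
            return staged, True  # no token left: harvest is complete
    return catalog, False  # ran out of pages while a token was pending


WRAP = '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>%s</ListRecords></OAI-PMH>'
PAGE1 = WRAP % ('<record><header><identifier>a</identifier>'
                '<datestamp>2006-02-01</datestamp></header></record>'
                '<resumptionToken>t1</resumptionToken>')
PAGE2 = WRAP % ('<record><header status="deleted"><identifier>b</identifier>'
                '<datestamp>2006-02-02</datestamp></header></record>'
                '<resumptionToken/>')

old = {"b": "2006-01-01"}
new, ok = incremental_harvest(old, [PAGE1, PAGE2])
kept, failed_ok = incremental_harvest(old, [PAGE1, None])
```

Only a completed harvest should advance the `from` date used for the next incremental request. Otherwise records from the failed pages would be skipped permanently.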

NSDL 1.0 Summary
- Metadata Repository was quick to implement using known technologies, but
- Limited model
  - Metadata-centric orientation
  - No content – only metadata
  - Limited relationships – collection/item
- Limits on context, structure, and access
- Severe limits on contribution and collaboration
- One-way data flow: NSDL → Users

Going beyond the card catalog
- Create an NSDL that guides not just resource discovery, but resource selection, use, and contribution
- Supports creating “context” for resources
- Presents resources in context: in a lesson plan; with ratings; correlated with education standards
- Supports creating a permanent archive of resources
- Enables community tools for structuring, evaluation, annotation, contribution, collaboration
- Goal: Create a dynamic, living library

NSDL 2.0: NSDL Data Repository
- Goals:
  - Architecture of participation: service-based, not a monolithic application/single user experience
  - Remixable data sources and data transformations
  - Harnessing (and capturing) collective intelligence
  - A free market of millions of inter-related resources (create the “long tail”)
  - Two-way data flow: NSDL ↔ users
- Solution: Fedora-based NSDL Data Repository

Fedora: the NDR middleware
- A Flexible, Extensible Digital Object Repository Architecture (
- Open source project with $2.2 million in Mellon funding
- Collaboration of Cornell and Univ. of Virginia
- Key funded users include:
  - eSciDoc project (collaboration of the Max Planck Society and FIZ Karlsruhe)
  - VTLS Corp., Harris Corp., Library of Congress
  - Australian Research Repositories Online to the World (ARROW)
  - Royal Library Denmark, National Library, and DTU

The Fedora Vision: A Repository for Rich Information Networks

What is Fedora?
- An architecture, toolkit, and implementation: middleware, not a vertical application
- DSpace in contrast: a vertical application with a fixed workflow targeted at users
- Stores arbitrary internal and external digital objects, disseminations (transformations and combinations), relationships among objects
- Entirely SOAP/REST based, disseminations are URLs
- XML data store; RDBMS cache; RDF triplestore supports relationship queries

Fedora Key Features
- Content aggregation
  - Digital object model to combine information entities in novel ways
- Knowledge integration
  - Ontology-based relationships among objects
- Information reuse
  - Create secondary, tertiary objects
- Information transformation
  - Combine objects with computational services
- Collaboration and contribution
  - Enable annotation, info sharing, workflow, contextualization
- Information management and preservation
  - XML-based object storage
  - Service-oriented architecture; web services
  - Store relationships and service linkages with objects

Fedora Digital Object Model: Component View
- Persistent ID (PID): digital object identifier
- Reserved Datastreams: key object metadata
  - Dublin Core (DC)
  - Audit Trail (AUDIT)
  - Relations (RELS-EXT)
- Datastreams: set of content or metadata items (local or external URL redirects)
- Disseminators (including the Default Disseminator): web-service methods for distributing views of recombined content

Implementing the NDR with Fedora
- Multiple Object Types:
  - Resources (with local or remote content)
  - Metadata
  - Aggregations (collections)
  - Metadata Providers (branding)
  - Agents
- Relationships with arbitrary graph queries:
  - Structural (part of)
  - Equivalence
  - Annotation
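The object types and relationships listed above can be pictured as subject-predicate-object triples with graph queries over them. The sketch below uses a small in-memory store in place of Fedora's RDF triplestore. The predicate names ("memberOf", "metadataFor") and the identifiers are hypothetical and are not taken from the NDR data model.

```python
class TripleStore:
    """A tiny in-memory set of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, s=None, p=None, o=None):
        """Return sorted triples matching the given fields; None is a wildcard."""
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


def members_of(store, aggregation):
    """Collect everything that is transitively a memberOf `aggregation`."""
    found, frontier = set(), [aggregation]
    while frontier:
        current = frontier.pop()
        for subject, _, _ in store.match(p="memberOf", o=current):
            if subject not in found:
                found.add(subject)
                frontier.append(subject)  # nested aggregations are followed too
    return found


store = TripleStore()
store.add("collection:hurricanes", "memberOf", "collection:earth-science")
store.add("resource:1", "memberOf", "collection:hurricanes")
store.add("resource:2", "memberOf", "collection:earth-science")
store.add("md:1", "metadataFor", "resource:1")

members = members_of(store, "collection:earth-science")
metadata = store.match(p="metadataFor", o="resource:1")
```

Here the structural query follows membership through a nested aggregation. Equivalence and annotation relationships could be stored and queried the same way under their own predicates.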

Draft NDR API Characteristics
- Uses REST calls for all interactions
- Specializes Fedora for NDR objects/relationships
- Disseminations allow combining metadata from multiple sources, or related content
- Authentication: Requests signed with private key associated with an agent
- Authorization: Agent can become a metadata provider or aggregator; can create resources
- Documentation being developed at

NDR Architecture

An Information Network Overlay
- Think of the NDR as a lens for viewing science content on the net
- Content can be:
  - Local: stored directly in the NDR
  - Remote: accessed through a URL
  - Computed: derived from a database or web service
  - Archived: an older version stored at SDSC
- It all has a repository-based URL
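One way to picture the overlay is that every resource gets the same kind of repository URL regardless of where its content actually lives. The sketch below assumes a hypothetical base URL and identifier scheme, and it only describes how a request would be satisfied without performing any network access. None of these names come from the NDR API.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    LOCAL = "local"        # stored directly in the repository
    REMOTE = "remote"      # accessed through a URL
    COMPUTED = "computed"  # derived from a database or web service
    ARCHIVED = "archived"  # an older version held in an archive


@dataclass(frozen=True)
class Resource:
    pid: str       # repository identifier (hypothetical format)
    kind: Kind
    location: str  # storage key, remote URL, service URL, or archive key


REPOSITORY_BASE = "http://repository.example.org/get/"  # hypothetical


def repository_url(resource):
    """Every resource is addressed the same way, whatever its kind."""
    return REPOSITORY_BASE + resource.pid


def resolve(resource):
    """Describe how a request for the resource would be satisfied."""
    actions = {
        Kind.LOCAL: "serve stored content from",
        Kind.REMOTE: "redirect to",
        Kind.COMPUTED: "call service at",
        Kind.ARCHIVED: "fetch archived copy from",
    }
    return f"{actions[resource.kind]} {resource.location}"


remote = Resource("res-1", Kind.REMOTE, "http://example.org/hurricanes")
archived = Resource("res-2", Kind.ARCHIVED, "archive-key-17")
urls = [repository_url(remote), repository_url(archived)]
```

Callers see only the repository URL, so a resource could move between kinds (for example from remote to archived) without its address changing.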

Network Overlay View
Diagram layers: User View (API/UI); Repository View with Relations & Annotations; Resources on the Web

Status of the NDR
- Repository in test load
  - over 875,000 metadata records
  - over 2 million digital objects
  - over 163 million RDF triples (lots)
- Scaling challenges
  - moved to 64-bit architecture with 32GB memory
  - need to carefully structure RDF queries
  - can scale current system by factor of five
  - need to move to more powerful triplestore (Oracle)
- Fully operational beta version of the new NDR estimated for June

How should we use the NDR?
- The NDR provides powerful capabilities for:
  - Creating context around resources
  - Enabling the NSDL community to directly contribute resources and context
  - Representing a web of relationships among science resources and information about those resources
- How do we use it? Here’s one specific example …

Issues in STEM Education
- Issue: Need to support scientific inquiry
- Issue: Students need a better understanding of the processes of scientific research
- Issue: Teachers are often under-prepared to teach science and math
- Issue: Scientists need tools to make science and math research more available

Addressing the Needs
- In Response: NSDL is building an educational tool that…
  - Models scientific inquiry and exposes the processes of scientific research
  - Promotes and facilitates conversations between research and education communities
  - Brings content expertise into the classroom to support under-prepared teachers
  - Allows scientists, teachers, and media specialists to collaboratively develop instructional context around NSDL resources

ExpertVoices

What is Expert Voices?
- A system using blogging technology to:
  - Support STEM conversations among scientists, teachers and students
  - Tie NSDL resources to real-world science news
  - Create context for resources to enhance discovery, selection and use
  - Enable NSDL community members to become NSDL contributors: of resources, questions, reviews, annotations, and metadata
- Expert Voices ≠ LiveJournal
  - Contributors are carefully selected, contributions are about science, the process of science, and education

Expert Voices As An Educational Tool
- Topic-based discussion (e.g. tsunamis) with pointers to related resources
- Research outreach (Criterion 2) – explaining and documenting NSF-funded research
- Experts can add resources with topical context to the NSDL
- Resources can be reviewed and annotated
- Question/answer and discussion forum: scientist ↔ teacher ↔ student ↔ librarian

Broadening Participation: An Expert Voices Learning Scenario
- “Hurricane Season Blog” run by a National Weather Service hurricane expert, an Earth Science teacher, and a school media specialist familiar with NSDL resources
- Expert creates an entry for Hurricane Gertrude
  - “On track to hit Ft. Lauderdale in 72 hours”
  - “Currently undergoing eyewall replacement cycle”
  - “Expecting 15 foot storm surge”
- Media specialist adds links to NSDL resources: Hurricane Hunters site, latest satellite photos, and USGS flooding and flood plain site (storm surge context)
- Teacher makes connections to relevant standards and appropriate pedagogy for use by other teachers
- Students experience engaging real-time, real-world applications of science lessons

Broadening Participation: An Expert Voices Outreach Scenario
- NSF grantee: Bioluminescence researcher wants to make research K-12 accessible
  - Creates an Expert Voices conversation
  - Enables his students and researchers to document process and results – how science really works
  - Writes about publications and educational resources (e.g.
  - Adds these to the NSDL, creating audience-level metadata
  - Entries serve as annotations that create K-12 context for the college-level research

Expert Voices Implementation
- Open source multi-user blogging system
- Published entries become NSDL resources
- Owner controls publication of entries and visibility of comments
- Entries can contain linked references to NSDL resources, references to URLs that should become resources, and new resource metadata
- Integrated with NSDL community sign-on

Expert Voices Implementation
- Initial blog system is multi-user WordPress
- WordPress plug-ins provide NDR integration and Shibboleth authentication
- Publication of blog entry creates:
  - Content, as a new resource with simple metadata
  - New NDR resources
  - New metadata for any referenced resources in content
  - Graph of relationships between entry and all referenced resources
- Blog available as independent RSS feed
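The effects of publishing a blog entry, as listed above, can be sketched as a function that returns the new resources and the relationship triples it would create. This is an illustration under stated assumptions rather than the WordPress plug-in's logic. The identifier formats and the predicate names "metadataFor" and "annotates" are hypothetical.

```python
def publish_entry(entry_id, referenced_urls, existing_resources):
    """Return (new_resources, triples) created by publishing one blog entry."""
    new_resources = [entry_id]  # the entry itself becomes a resource
    # Simple metadata describing the entry.
    triples = [("md:" + entry_id, "metadataFor", entry_id)]
    for url in referenced_urls:
        if url not in existing_resources:
            new_resources.append(url)  # referenced URL becomes a new resource
        # New metadata for every referenced resource, whether old or new.
        triples.append(("md:" + entry_id + "#" + url, "metadataFor", url))
        # Relationship between the entry and the referenced resource.
        triples.append((entry_id, "annotates", url))
    return new_resources, triples


new_resources, triples = publish_entry(
    "blog:42",
    ["http://example.org/new", "http://example.org/old"],
    existing_resources={"http://example.org/old"},
)
```

In this example the entry and one previously unknown URL become resources, while the already-known URL only gains new metadata and an annotation link.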

NDR Entry for Expert Voices
Diagram nodes: Blog Entry; New Metadata; New Audience MD; Referenced New Resource 1; Referenced Existing Resource 2; Metadata Provider (two); Existing Collection; Topic-based Blog
Diagram edge labels: Annotates; Metadata for; Member of; Inferred relationship between resources

But Expert Voices is just the beginning…

NDR Application: OnRamp
- NDR-integrated multi-user, multi-project content management system
- Supports NSDL single sign-on and group management
- Decentralized workflow for the creation and distribution of both simple and complex content
- Disseminates content in multiple publication and online forms
- Delivery estimated 3Q06

NDR Application: Instructional Architect
- Created by Mimi Recker and colleagues at Utah State University
- Teacher develops a lesson plan, incorporating NSDL and other resources
- Assigns subject, grade level, ed standard
- Distributes to class or public
- Available now, with NDR integration in design

NDR Application: Integrated Wiki
- A community of approved contributors (e.g. teachers, librarians, scientists) is granted edit access on the OpenNSDL wiki
- New resources and metadata are created as wiki pages and reflected into the NDR
- Non-wiki-based NDR resources and metadata are displayed as read-only wiki pages, subject to comment and linking
- User and project pages organize NDR resources

NDR Application: Content Assignment Tool
- Developed by Anne Diekema, Elizabeth Liddy, et al. at the Syracuse University Center for Natural Language Processing
- Uses text analysis and machine learning to suggest Educational Standards alignment for resources
- Content expert assigns standard, and system learns from the assignment
- Standalone tool available now; standards associated with resources in the NDR by 3Q06

Content Assignment Tool

Other applications in development
- Automated grade-level assignment based on vocabulary analysis (San Diego Supercomputer Center)
- iVia-based Expert-Guided crawl: Tool for Pathways and others to turn websites into resource collections (UC Riverside)
- Automated subject assignment (UC Riverside)
- MyNSDL: Bookmark and tag STEM resources within and outside the NSDL (Cornell)

NSDL 2.0 Ecosystem
Diagram components: STEM Collections (…); Search Service; Archive Service; Fedora-based NDR
Protocols shown: OAI-PMH, HTTP, REST, NDR API

What does this mean for the user?
- NSDL 2.0 applications situate resources in context, aiding both discovery and use
- Users become contributors, adding new resources, ratings, annotations, and organizational structure – frequently as a side effect of using the library
- Specialized portals, tagging, and powerful relationship queries and filtering support user-specific “views” into the library

Summary
- NSDL 1.0 created a large, production digital library of STEM resources for education.
- NSDL 2.0 and its tools allow scientists, mathematicians, teachers, engineers, librarians, and students to create a unique web of context, contribution, and collaboration around the high-quality STEM education resources at the core of the NSDL.

Acknowledgements
- NSDL NSF Program Officers
  - Lee Zia
  - David McArthur
- NSDL Core Integration Team
  - UCAR: Kaye Howe, PI and Executive Director
  - Cornell: Dean Krafft, PI
  - Columbia: Kate Wittenberg, PI
- Fedora Development Team
  - Cornell: Sandy Payette & Carl Lagoze
  - Univ. of Virginia: Thornton Staples

Questions?

Contact Information
Dean B. Krafft
Cornell Information Science
301 College Ave.
Ithaca, NY USA

This work is licensed under the Creative Commons Attribution-NoDerivs 2.5 License. To view a copy of this license, visit or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.