NSDL 2.0: Creating a collaborative digital library Dean Krafft, Cornell University
Structure of the Talk Project Overview and NSDL 1.0 The Fedora-based NSDL Data Repository (NDR) and NSDL 2.0 Inspiring Contribution and Collaboration: ExpertVoices, Soft Matter Wiki, etc. IS Challenges for NSDL Q&A
What is the NSDL? An NSF-funded $20 million/year program in Science, Technology, Engineering and Mathematics (STEM) education A digital library describing nearly two million carefully selected online STEM resources from well over 100 collections (at A core integration team (Cornell, UCAR, Columbia) working with 9 “pathways” portals and over 200 NSF grantees A large community of researchers, librarians, content providers, developers, students, and teachers
Portals to the NSDL
NSDL 1.0 A “Union Catalog” OAI-PMH Harvesting Central Metadata Database OAI Server for catalog Search index of metadata/content Initial K-Gray Portal: nsdl.org
Infrastructure overview: NSDL 1.0 STEM Collections on the Web Central Metadata Repository Search Service Archive Service Collection Registration System NSDL.org Portal Protocol: OAI-PMH HTTP REST SQL
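The harvesting step in this overview can be sketched in a few lines. This is a minimal illustration of an OAI-PMH `ListRecords` request and of reading record identifiers out of a response, not the NSDL harvester itself. The base URL, the sample identifiers, and the function names are hypothetical. Only the verb, the `metadataPrefix`, `set`, and `resumptionToken` arguments, and the XML namespace come from the OAI-PMH protocol.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 response documents.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build a ListRecords request URL for a (hypothetical) OAI-PMH endpoint."""
    if resumption_token:
        # A resumptionToken is an exclusive argument: send it on its own.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_identifiers(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    ids = [h.findtext(OAI_NS + "identifier")
           for h in root.iter(OAI_NS + "header")]
    token = root.findtext(".//" + OAI_NS + "resumptionToken")
    return ids, (token or None)


# Hand-written sample response, trimmed to the elements the parser reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    print(parse_identifiers(SAMPLE))
```

A harvester would loop, requesting the next page with the returned token until no token comes back.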
NSDL 1.0 Lessons Metadata Repository was quick to implement using known technologies, but Limited model Metadata-centric orientation No content – only metadata Limited relationships – collection/item Limits on context, structure, and access Severe limits on contribution and collaboration One-way data flow: NSDL → Users
Photo by Jon Crispin
NSDL 2.0 Create an NSDL that guides not just resource discovery, but: Supports creating “context” for resources Presents resources in context: linked to related concepts; with user ratings; with codes and data Enables community tools for selecting, organizing, evaluating, annotating, contributing, and collaborating Provides two-way data flow: NSDL ↔ users Goal: Create a dynamic, living library
In Architectural terms, create an NSDL Data Repository that Supports storing both content and metadata Allows arbitrary relationships among resource and metadata objects: organization, annotation, citation Accessible through web service architecture of remixable data sources and transformations
The Fedora Vision: A Repository for Rich Information Networks
Fedora: the NDR middleware A Flexible, Extensible Digital Object Repository Architecture ( Open source project with $2.2 million in Mellon funding Collaboration of Cornell and Univ. of Virginia Key funded users include: eSciDoc project (collaboration of the Max Planck Society and FIZ Karlsruhe) Public Library of Science (Topaz Foundation) VTLS Corp., Harris Corp., Library of Congress Australian Research Repositories Online to the World Royal Library Denmark, National Library, and DTU
What is Fedora? An architecture, toolkit, and implementation: middleware, not a vertical application Stores arbitrary internal and external digital objects, disseminations (transformations and combinations), relationships among objects Entirely SOAP/REST based, disseminations are URLs XML data store; RDBMS cache; RDF triplestore supports relationship queries
NSDL Data Repository (NDR) References to roughly 2 million selected STEM resources on the web Sourced metadata statements about those resources A REST API to allow authenticated access by Pathways and providers Support for annotation, aggregation, and other relationships
Sample NDR Objects & Relationships Publication Resource Data Set Metadata Publication Metadata Data Set Resource Code Resource Cites Metadata for Member of Metadata Provider MatForge Collection Soft Matter Collection Member of Cites Metadata for Cornell CCMR MatDL Pathway Selector for Selector for
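One way to picture the objects and relationships on this slide is as subject/predicate/object triples. The sketch below is an in-memory stand-in, not the Fedora triplestore or the NDR data model. The class, the identifiers, and the specific edges are assumptions chosen to resemble the diagram labels (Cites, Metadata for, Member of).

```python
from collections import defaultdict


class TripleStore:
    """Tiny in-memory triple store, indexed by subject and by object."""

    def __init__(self):
        self._by_subject = defaultdict(set)
        self._by_object = defaultdict(set)

    def add(self, subject, predicate, obj):
        self._by_subject[subject].add((predicate, obj))
        self._by_object[obj].add((predicate, subject))

    def objects(self, subject, predicate):
        """Everything `subject` points at through `predicate`."""
        pairs = self._by_subject.get(subject, ())
        return sorted(o for (p, o) in pairs if p == predicate)

    def subjects(self, predicate, obj):
        """Everything that points at `obj` through `predicate`."""
        pairs = self._by_object.get(obj, ())
        return sorted(s for (p, s) in pairs if p == predicate)


# Hypothetical edges; the real diagram's exact arrows are not recoverable here.
store = TripleStore()
store.add("publication_metadata", "metadataFor", "publication")
store.add("dataset_metadata", "metadataFor", "dataset")
store.add("publication", "cites", "dataset")
store.add("publication", "cites", "code")
store.add("publication", "memberOf", "soft_matter_collection")
store.add("dataset", "memberOf", "matforge_collection")

if __name__ == "__main__":
    print(store.objects("publication", "cites"))
    print(store.subjects("metadataFor", "dataset"))
```

Indexing in both directions makes "what does this publication cite?" and "what metadata describes this data set?" equally cheap to answer.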
Draft NDR API Characteristics Uses REST calls for all interactions; uses handles (DOIs) for all external references Ensures external applications can’t violate the NDR model constraints Disseminations allow combining metadata from multiple sources, or related content Authentication: Requests signed with private key associated with an agent Authorization: Agent can become a metadata provider or aggregator; can create resources
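The idea of signing each request can be sketched as follows. Note the substitution: the draft API signs with a private key associated with an agent, while this sketch uses a shared-secret HMAC-SHA256 because it needs only the Python standard library. The canonical string layout, parameter names, and function names are all assumptions, not the NDR API.

```python
import hashlib
import hmac


def canonical_request(method, path, params, timestamp):
    """Join the request parts in a fixed order so both sides sign the same bytes."""
    query = "&".join(f"{key}={params[key]}" for key in sorted(params))
    return "\n".join([method.upper(), path, query, timestamp])


def sign(secret, method, path, params, timestamp):
    """Return a hex HMAC-SHA256 signature (stand-in for a private-key signature)."""
    message = canonical_request(method, path, params, timestamp).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret, method, path, params, timestamp, signature):
    """Recompute the signature and compare in constant time."""
    expected = sign(secret, method, path, params, timestamp)
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    secret = b"agent-secret"  # hypothetical credential for one agent
    params = {"type": "resource", "handle": "example-handle"}
    sig = sign(secret, "post", "/api/addResource", params, "2006-10-01T12:00:00Z")
    print(verify(secret, "post", "/api/addResource", params,
                 "2006-10-01T12:00:00Z", sig))
```

Including a timestamp in the signed string lets a server reject replayed requests. With a real private key, only the `sign` and `verify` bodies would change.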
An Information Network Overlay Think of the NDR as a lens for viewing science content on the net Content can be: Local: stored directly in the NDR Remote: accessed through a URL Computed: derived from a database or web service Archived: an older version stored at SDSC It all has a repository-based URL
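The four kinds of content, each reachable through a repository-based URL, could be modeled roughly as below. This is a sketch only: the base URL, the handle values, the field names, and the URL layout are hypothetical and are not taken from the NDR.

```python
from dataclasses import dataclass
from enum import Enum
from urllib.parse import quote

REPO_BASE = "https://repository.example.org/objects/"  # hypothetical base URL


class Kind(Enum):
    LOCAL = "local"        # stored directly in the repository
    REMOTE = "remote"      # accessed through a URL
    COMPUTED = "computed"  # derived from a database or web service
    ARCHIVED = "archived"  # an older version held in an archive


@dataclass(frozen=True)
class ContentObject:
    handle: str  # external identifier for the object
    kind: Kind
    source: str  # where the bytes actually come from


def repository_url(obj):
    """Every object gets the same style of URL, whatever its kind."""
    return REPO_BASE + quote(obj.handle, safe="")


def describe(obj):
    return f"{repository_url(obj)} ({obj.kind.value}, from {obj.source})"


if __name__ == "__main__":
    page = ContentObject("hdl:2200/abc", Kind.REMOTE, "http://example.org/page")
    print(describe(page))
```

The caller sees one URL scheme, and the repository decides per object whether to serve stored bytes, redirect, compute, or fetch from the archive.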
Network Overlay View User View API/UI Repository View with Relations & Annotations Resources on the Web
NSDL 2.0 Technical Challenges Scaling the RDF triple-store past 200 million triples Constraining RDF queries to be reasonably computable Building meaningful search indices on explicit metadata, annotations, resource content, and relationships
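One simple way to keep relationship queries "reasonably computable" is to cap both how far a traversal may go and how many results it may return. The sketch below shows that idea on a plain adjacency map. It illustrates the constraint, not how the NDR or its triplestore bounds queries, and the function and parameter names are assumptions.

```python
from collections import deque


def bounded_reachable(edges, start, max_depth, max_results):
    """Breadth-first search limited by hop count and by result count.

    Returns (found, limit_hit). `limit_hit` is True when the search stopped
    because it reached `max_results`, so the answer may be incomplete.
    """
    seen = {start}
    found = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand past the hop limit
        for neighbor in edges.get(node, ()):
            if neighbor in seen:
                continue
            seen.add(neighbor)
            found.append(neighbor)
            if len(found) >= max_results:
                return found, True
            queue.append((neighbor, depth + 1))
    return found, False


# Hypothetical relationship graph: a relates to b and c, b to d, d to e.
EDGES = {"a": ["b", "c"], "b": ["d"], "d": ["e"]}

if __name__ == "__main__":
    print(bounded_reachable(EDGES, "a", max_depth=2, max_results=10))
```

Returning the `limit_hit` flag lets a caller tell a complete answer from a truncated one.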
Applying the NDR The NDR provides powerful capabilities for: Creating context around resources Enabling the NSDL community to directly contribute resources and context Representing a web of relationships among science resources and information about those resources How do we use it? Here’s one specific example …
ExpertVoices
What is Expert Voices? A multi-user blogging tool Topic-based discussions (e.g. forensics) with pointers to related resources An outreach tool to explain and document NSF-funded research A way for NSDL community members to become NSDL contributors: of resources, questions, reviews, annotations, metadata A question/answer and discussion forum: scientist ↔ teacher ↔ student ↔ librarian
What isn’t EV? Expert Voices ≠ LiveJournal Contributors are carefully selected, contributions are about science, the process of science, and education Comic by Michael Lalonde/orneryboy.com
Hurricane Floyd/Photo by NASA
Photo by Jon Crispin
Broadening Participation: An Expert Voices Learning Scenario “Hurricane Season Blog” run by a National Weather Service hurricane expert, an Earth Science teacher, and a school media specialist familiar with NSDL Expert creates an entry for Hurricane Gertrude “On track to hit Ft. Lauderdale in 72 hours” “Currently undergoing eyewall replacement cycle” “Expecting 15 foot storm surge” Media specialist adds links to NSDL resources: Hurricane Hunters site, latest satellite photos, and USGS flooding and flood plain web page Teacher makes connections to relevant standards and appropriate pedagogy for use by other teachers Students experience engaging real-time, real-world applications of science lessons
Expert Voices Implementation Open source multi-user blogging system Published entries become NSDL resources Owner controls publication of entries and visibility of comments Entries can contain linked references to NSDL resources, references to URLs that should become resources, and new resource metadata Integrated with NSDL Shibboleth-based community sign-on
But Expert Voices is just the beginning…
Soft Matter Wiki: Planned NDR Integration Community of approved contributors (e.g. teachers, librarians, materials scientists) are granted edit access to Soft Matter wiki New resources and metadata are created as wiki pages and reflected into the NDR Relevant non-wiki-based NDR resources and metadata are displayed as read-only wiki pages, subject to comment and linking User and project pages organize NDR resources
NDR Entry for Soft Matter Wiki Wiki Entry New Metadata New Audience MD Referenced New Resource 1 Referenced Existing Resource 2 Annotates Metadata for Member of Metadata Provider Metadata Provider Existing Collection Soft Matter Wiki Member of Inferred relationship between resources
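The "inferred relationship between resources" in this diagram can be illustrated with a small rule: two resources referenced by the same wiki entry are treated as related. This is an assumed inference rule for illustration, with hypothetical identifiers. The slide does not specify how the NDR derives the relationship.

```python
from itertools import combinations


def infer_related(references):
    """Map {entry: [resources it references]} to sorted pairs of related resources."""
    inferred = set()
    for resources in references.values():
        # Each unordered pair of distinct resources co-referenced by one entry.
        for first, second in combinations(sorted(set(resources)), 2):
            inferred.add((first, second))
    return sorted(inferred)


# Hypothetical data: one wiki entry referencing a new and an existing resource.
REFERENCES = {
    "wiki:entry": ["resource:2-existing", "resource:1-new"],
    "wiki:other": ["resource:1-new"],
}

if __name__ == "__main__":
    print(infer_related(REFERENCES))
```

Sorting each pair before storing it keeps (A, B) and (B, A) from being recorded as two separate relationships.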
MyNSDL: NDR-integrated tagging, bookmarking, and recommendation Based on Connotea open-source folksonomic tagging/bookmarking system Tags and bookmarking structure are reflected back into the NDR Authorized users can “automatically” recommend new NSDL resources simply by tagging them Gives user a personal view of NSDL resources
NDR Application: Content Assignment Tool Developed by Anne Diekema, Elizabeth Liddy, et al. at the Syracuse University Center for Natural Language Processing Uses text analysis and machine learning to suggest Educational Standards alignment for resources Content expert assigns standard, and system learns from the assignment Standalone tool available now; standards associated with resources in the NDR 4Q06
Other applications in development Automated grade-level assignment based on vocabulary analysis (San Diego Supercomputer Center) OnRamp – multi-user, multi-project NDR-integrated content management system Instructional Architect: Lesson plan development for K12 teachers (Utah State) iVia-based Expert-Guided crawl: Tool for Pathways and others to turn websites into resource collections (in development at UC Riverside)
Other proposed applications Moodle Course Management System – courses integrated with NSDL resources Electronic lab notebook – integrating lab notes with code, data sets, and reference materials within the library archival framework
… NSDL 2.0 Ecosystem Protocol: OAI-PMH HTTP REST NDR API STEM Collections Search Service Archive Service Fedora-based NDR
What are the Information Science challenges?
Trust Photo © 2005 Reuters
Contribution
Trust and reputation in NSDL We brand NSDL as a source of “trusted” resources What is our trust mechanism? Transitive trust approval Community rating/filtering/reputation Trusted vs. complete “views” What is the right balance of trust vs. community contribution?
Community Formation Build the tools and they will come? What can we learn from Wikipedia, MySpace, Flickr, and YouTube? How do we leverage existing societies and groupings (NSTA, ACM, AAPT, AAAS)? Is there an NSDL community, or are there many small communities?
Courtesy Kathy Sierra/WickedlySmart.com
Creating Passionate Users How do we help NSDL users “kick ass”? What can we learn from game design? Motivating goal Challenging interaction Meaningful payoff Multiple levels Can we use fun, emotion, seduction, surprise, and visuals – and still be academics?
Courtesy Kathy Sierra/WickedlySmart.com
Photo by Jon Crispin
Challenges of ubiquity Should we target NSDL materials at limited devices (iPods, cell phones)? How does ubiquitous NSDL access change teacher/student interactions? Should we build tools to capture field data from these devices?
Other IS Challenges Personalization: SDI, automated activity analysis, targeted user views Visualizing the library: alternatives to text search for discovery and context Location awareness: specializing library views by physical location
Summary NSDL 2.0 and its tools allow scientists, mathematicians, teachers, engineers, librarians, and students to create a unique web of context, contribution, and collaboration around the high-quality STEM education resources at the core of the NSDL. NSDL CI needs to solve the IS problems that stand between Capability and Reality.
Acknowledgements NSDL NSF Program Officers Lee Zia David McArthur NSDL Core Integration Team UCAR: Kaye Howe, PI and Executive Director Cornell: Dean Krafft, PI Columbia: Kate Wittenberg, PI Fedora Development Team Cornell: Sandy Payette & Carl Lagoze Univ. of Virginia: Thornton Staples
Apology Courtesy Kathy Sierra/WickedlySmart.com
Questions?
Contact Information Dean B. Krafft Cornell Information Science 301 College Ave. Ithaca, NY USA This work is licensed under the Creative Commons Attribution-NoDerivs 2.5 License. To view a copy of this license, visit or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.