Linked Data For Libraries (LD4L)

Linked Data For Libraries (LD4L) Dean B. Krafft March 17, 2014

Overview Intro to the LD4L project Why use Linked Data? LD4L Building Blocks: VIVO LD4L Building Blocks: Hydra LD4L Building Blocks: LibraryCloud/ShelfRank So, what are we actually doing? Summing Up

Linked Data for Libraries (LD4L) On December 5, 2013, the Andrew W. Mellon Foundation made a two-year $999K grant to Cornell, Harvard, and Stanford starting Jan ‘14 Partners will work together to develop an ontology and linked data sources that provide relationships, metadata, and broad context for Scholarly Information Resources Leverages existing work by both the VIVO project and the Hydra Partnership

The Project Team Cornell: Dean Krafft, Jon Corson-Rikert, Brian Lowe, Simeon Warner, and 1.5 new FTE Harvard: David Weinberger, Paul Deschner, and Paolo Ciccarese Stanford: Tom Cramer, Lynn McRae, Naomi Dushay, Philip Schreur, and 1 new FTE

“The goal is to create a Scholarly Resource Semantic Information Store model that works both within individual institutions and through a coordinated, extensible network of Linked Open Data to capture the intellectual value that librarians and other domain experts add to information resources when they describe, annotate, organize, select, and use those resources, together with the social value evident from patterns of usage.”

LD4L can provide a common language for all the rich context around scholarly information resources that cuts across the boundaries of different disciplines, libraries, systems, and countries

Project Outcomes 1) Create a SRSIS (Scholarly Resource Semantic Information Store) ontology that is “sufficiently expressive to encompass traditional catalog metadata from both Cornell and Harvard, the basic linked data elements described in the Stanford Linked Data Workshop Technology Plan, and the usage and other contextual elements from StackLife” 2) Create a SRSIS semantic editing, display, and discovery system based on the “Vitro semantic web platform and the SRSIS ontology, each instance of the system will support the incremental ingest of semantic data from multiple information sources, including the Cornell, Harvard, and Stanford MARC-based catalogs, StackLife, LibGuides, VIVO, Harvard Profiles, CAP, and OAI-PMH metadata providers, among others.” 3) Create a Project Hydra compatible interface to SRSIS, “an ActiveTriples software component that facilitates the easy use of SRSIS and other linked-data within Hydra-based systems.”

Why Use a Linked Data Approach?

The Semantic Web Turn data into a web of simple links Use ontology to explain how things are linked Use reasoning to add new links automatically Be flexible and extensible Reasoning example: sameAs An ontology is a representation of entities and relations … … for a part of reality … … expressed in human and computer interpretable form
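The "Reasoning example: sameAs" bullet can be made concrete with a small sketch. It is plain Python rather than a real reasoner, and every URI in it (the `ex:` identifiers) is a hypothetical placeholder, not an LD4L or VIVO identifier. The idea shown is only that once two URIs are declared equivalent, statements made about either one can be read as statements about both.

```python
# Toy illustration of owl:sameAs reasoning over a set of triples.
# All "ex:" URIs are hypothetical placeholders, not real identifiers.

SAME_AS = "owl:sameAs"

triples = {
    ("ex:cornell/jsmith", "ex:authorOf", "ex:article42"),
    ("ex:harvard/smith-j", "ex:memberOf", "ex:geneticsInstitute"),
    ("ex:cornell/jsmith", SAME_AS, "ex:harvard/smith-j"),
}


def canonical_map(triples):
    """Return a function mapping each node to one representative of its sameAs group."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for s, p, o in triples:
        if p == SAME_AS:
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                # Keep the lexicographically smaller URI as the representative.
                keep, drop = sorted((root_s, root_o))
                parent[drop] = keep
    return find


def merge_same_as(triples):
    """Rewrite triples so that equivalent URIs share one identifier."""
    find = canonical_map(triples)
    return {(find(s), p, find(o)) for s, p, o in triples if p != SAME_AS}


merged = merge_same_as(triples)
for triple in sorted(merged):
    print(triple)
```

After merging, the authorship statement and the membership statement hang off the same identifier, which is the "new link" the reasoning step contributes.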

Hierarchical information organization

Changing the focus to relationships

Benefits of the Semantic Web Approach Focuses on connections Connections have meaning, not just hierarchy Shared topics Human relationships Linkage through events Geographic proximity Temporal alignment Supports many dimensions of nearness

RDF “triples”

Connectivity at the micro level Triples connect subjects with objects via a consistent set of relationships [Diagram: subject "Jane Smith"; predicates "holds position in", "author of", "member of"; objects "Dept. of Genetics", "College of Medicine", "Journal article", "Book chapter", "Book", "Genetics Institute"]
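A minimal sketch of the subject/predicate/object structure, using the labels from the slide. The transcript does not preserve which predicate points at which object, so the pairings below are an assumption made for illustration, and the `match` helper is a hypothetical name rather than part of any library.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Pairings of predicates and objects are assumed for illustration only.
graph = [
    Triple("Jane Smith", "holds position in", "Dept. of Genetics"),
    Triple("Jane Smith", "author of", "Journal article"),
    Triple("Jane Smith", "author of", "Book chapter"),
    Triple("Jane Smith", "member of", "Genetics Institute"),
]


def match(graph, subject=None, predicate=None, obj=None):
    """Return triples matching the given pattern; None acts as a wildcard."""
    return [
        t
        for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]


authored = [t.object for t in match(graph, "Jane Smith", "author of")]
print(authored)
```

Because every statement has the same three-part shape, one pattern-matching function is enough to answer "what did she write?", "who belongs to this institute?", or "how is she related to this department?".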

Using Semantic Web Technology vs. Linked Open Data

What is Linked Open Data (LOD)? Structured information, not just documents with text A common, simple format (triples) Open Available, visible, mine-able Anyone can post, consume, and reuse Linked Directly by reference Indirectly through common references and inference

An HTTP request can return HTML or data
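One common way this works is HTTP content negotiation: the client states the format it wants in the `Accept` header. The sketch below only builds the two requests and does not send them. The URI is a hypothetical placeholder, and whether a given server honors `text/turtle` is an assumption about that server.

```python
from urllib.request import Request

# Hypothetical URI for a person record; not a real endpoint.
PERSON_URI = "http://example.org/individual/n123"


def build_request(uri, want_data):
    """Build a GET request asking for RDF (Turtle) or for an HTML page."""
    accept = "text/turtle" if want_data else "text/html"
    return Request(uri, headers={"Accept": accept})


html_request = build_request(PERSON_URI, want_data=False)
data_request = build_request(PERSON_URI, want_data=True)

print(html_request.get_header("Accept"))
print(data_request.get_header("Accept"))
```

The same URI serves both audiences: a browser gets a page for humans, and a harvester gets triples for machines.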

Commonality among references Shared types foaf: Person, Organization geo: Country bibo: Book, Academic Article, Journal Shared relationships geo: hasBorderWith, isInGroup, hasMember foaf: knows, homepage dc: creator skos: subject Shared instances defined as types and linked by relationships Agrovoc concepts: rice, sustainability DBPedia URIs for places, concepts, events Unique identifiers for researchers and organizations

Why Use LOD for Data Sharing? LOD requires Being willing to make data open Providing enough structure and consistency so data can be linked The goal is a higher return on investment over time http://rww.readwriteweb.netdna-cdn.com/images/semweb_pwc1.png http://www.pwc.com/us/en/technology-forecast/spring2009/semantic-web-technologies.jhtml

The Linked Open Data Cloud The critical next step in the broad accessibility of research, researchers, and research data is to make the underlying metadata and relationships available as linked open data. Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

LD4L Building Blocks: VIVO

What is VIVO? Software: An open-source semantic-web-based researcher and research discovery tool Data: Institution-wide, publicly-visible information about research and researchers Standards: A standard ontology (VIVO data) that interconnects researchers, communities, and campuses using Linked Open Data Community: An open community with strong national and international participation

VIVO connects scientists and scholars with and through their research and scholarship

So – How Does VIVO Actually Work?

VIVO is a Semantic Web application Provides data readable by machines, not just text for humans Provides self-describing data via shared ontologies Defined types Defined relationships Provides search & query augmented by relationships Does simple reasoning to categorize and find associations Teaching faculty = any faculty member teaching a course All researchers involved with any gene associated with breast cancer (through research project, publication, etc.)
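The "teaching faculty" rule above can be sketched as a single inference pass over a set of triples. This is a plain-Python illustration, not how VIVO implements reasoning, and the class and property names (`ex:FacultyMember`, `ex:teaches`, `ex:TeachingFaculty`) are hypothetical stand-ins for ontology terms.

```python
# Toy rule: any faculty member who teaches a course is "teaching faculty".
# All "ex:" names are hypothetical, not actual VIVO ontology terms.

facts = {
    ("ex:jsmith", "rdf:type", "ex:FacultyMember"),
    ("ex:jsmith", "ex:teaches", "ex:course101"),
    ("ex:course101", "rdf:type", "ex:Course"),
    ("ex:alee", "rdf:type", "ex:FacultyMember"),
}


def infer_teaching_faculty(facts):
    """Return the facts plus inferred ex:TeachingFaculty type statements."""
    faculty = {s for s, p, o in facts if p == "rdf:type" and o == "ex:FacultyMember"}
    courses = {s for s, p, o in facts if p == "rdf:type" and o == "ex:Course"}
    inferred = {
        (s, "rdf:type", "ex:TeachingFaculty")
        for s, p, o in facts
        if p == "ex:teaches" and s in faculty and o in courses
    }
    return facts | inferred


enriched = infer_teaching_faculty(facts)
teaching = sorted(s for s, p, o in enriched if o == "ex:TeachingFaculty")
print(teaching)
```

Nobody recorded that the first person is teaching faculty; the category follows from the relationships already in the data.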

The VIVO/Vitro Platform Ingest tools – getting batch data in Ontology editing tools – change what is being described and represented Instance editing tools – Edit instances of any of the things represented in the ontology (people, publications, organizations, etc.) Template/display system – Display instances and sets in a useful way

What does VIVO model? People and more: organizations, grants, programs, projects, publications, events, facilities, and research resources. Relationships among the above: meaningful, bidirectional, navigable context. Links to URIs elsewhere: concepts, identifiers, people, places, organizations, events.

The VIVO ontology Describes people and organizations in the process of doing research, scholarship, and creative activities Stays discipline neutral and supports description and discovery across all disciplines Uses existing domain terminology to describe the content of research Remains modular, flexible, and extensible An ontology is a representation of entities and relations … … for a part of reality … … expressed in human and computer interpretable form

Data, data, data VIVO harvests much of its data automatically from verified sources Reduces the need for manual input of data Provides an integrated and flexible source of publicly visible data at an institutional level External data sources Internal data sources Authoritative data, diverse formats, filter out private information Talk about verified data Talking points: Much of the data in VIVO profiles is ingested from authoritative sources so it is accurate and current, reducing the need for manual input. Private or sensitive information is never imported into VIVO. Only public information will be stored and displayed. Data is housed and maintained at the local institutions. There it can be updated on a regular basis. There are three ways to get data: internal, external, individuals. Internal is authoritative! The rich information in VIVO profiles can be repurposed and shared with other institutional web pages and consumers, reducing cost and increasing efficiencies across the institution. Individuals may also edit and customize their profiles to suit their professional needs

Typical data sources HR – people, appointments Research administration – grants & contracts Registrar – courses Faculty reporting system(s) – publications, service, research areas, awards Events calendar Internal and external news External repositories – e.g., PubMed, Scopus

Complexity of Inputs [Diagram labels: People, Grants, Data, Google Scholar, Center/Dept/Program websites, Research Facilities & Services, Courses, Tech transfer, Publications, VP Research, Univ. Communications, HPC, HR data, Faculty Reporting, Grad School, PubMed, CrossRef, Researcher.gov, arXiv, other databases, NIH RePORTER, Self-editing, Other campuses]

Structured data for visualizations

Linked data indexing for search [Diagram labels: Linked Open Data sources Scripps VIVO, UF VIVO, WashU VIVO, IU VIVO, Ponce VIVO, Cornell Ithaca VIVO, Weill Cornell VIVO, other VIVOs, eagle-i research resources, Harvard Profiles RDF, Iowa Loki RDF, Digital Vita RDF; Solr search index; alternate Solr index; vivosearch.org]

Indexes 125K people and 1.3 million publications

Weill – Publications

VIVO is Extensible

Adding Research Resources and Facilities to VIVO CTSAconnect OHSU, Harvard, Cornell, Florida, Buffalo & Stony Brook eagle-i sister NIH project – Harvard, OHSU, 7 others Facilities, services, techniques, protocols, skills, and research outputs beyond publications Extended ways to represent expertise Improve attribution for data and other contributions to science

Connecting researchers, resources, and clinical activities

Supporting Humanities and Artistic Works Performances of a work Translations Collections and exhibits Steven McCauley and Theodore Lawless, Brown University http://www.vivoweb.org/files/vivo2013/friday_pm/ VIVO-Humanities_McCauley.pdf

The VIVO Community is now over 100 institutions worldwide

5th Annual VIVO Conference August 6-8, 2014 Austin, TX www.vivoconference.org

How does LD4L build on VIVO? The LD4L ontology will use components of the VIVO-ISF ontology LD4L will use VIVO ontology design patterns The basis for SRSIS implementations at each institution will be Vitro plus LD4L ontology The multi-institution LD4L demonstration search will be an adaptation of VIVOsearch.org LD4L will link to existing VIVO data

LD4L Building Blocks: Hydra

http://projecthydra.org Hydra slides courtesy of Tom Cramer

What Is Hydra? A robust repository fronted by feature-rich, tailored applications and workflows (“heads”) One body, many heads Collaboratively built “solution bundles” that can be adapted and modified to suit local needs. A community of developers and adopters extending and enhancing the core If you want to go fast, go alone. If you want to go far, go together. The only way to build a rich & robust solution is to engage a large community of developers. The only way to build a sustainable solution is to spur adoption by a community of institutions w/ vested interest in shared success.

Fundamental Assumption #1 No single system can provide the full range of repository-based solutions for a given institution’s needs, …yet sustainable solutions require a common repository infrastructure.

For Instance… Three systems on a spectrum from simple to complex. ETD Deposit System: generally a single PDF; simple, prescribed workflow; streamlined UI for depositors, reviewers & readers. General Purpose Institutional Repository: heterogeneous file types; simple to complex objects; one- or two-step workflow; general purpose user interfaces. Digitization Workflow System: potentially hundreds of file types per object; complex, branching workflow; sophisticated operator (back office) interfaces. A single application could not effectively cope with these three use cases; however, any institution would want to safeguard the outputs of all these disparate systems in a digital repository for management and preservation. Hydra gives a framework where ONE BODY (the repo) can support MULTIPLE HEADS (tailored applications).

Hydra Heads: Emerging Solution Bundles Institutional Repositories University of Hull University of Virginia Penn State University Images Northwestern University (Digital Image Library) Future development progress will be based on 1) leveraging the existing tools in the ecosystem to assemble new solutions, and 2) ongoing investments in and extensions to the infrastructure.

Hydra Heads: Emerging Solution Bundles Archives & Special Collections Stanford University University of Virginia Rock & Roll Hall of Fame Media Indiana University Northwestern University Rock & Roll Hall of Fame WGBH

Hydra Heads: Emerging Solution Bundles Workflow Management (Digitization, Preservation) Stanford University University of Illinois – Urbana-Champaign Northwestern University Exhibits Notre Dame

Hydra Heads: Emerging Solution Bundles ETDs Stanford University University of Virginia Etc. (Small) Data everyone…

Fundamental Assumption #2 No single institution can resource the development of a full range of solutions on its own, …yet each needs the flexibility to tailor solutions to local demands and workflows.

Hydra Philosophy -- Community An open architecture, with many contributors to a common core Collaboratively built “solution bundles” that can be adapted and modified to suit local needs A community of developers and adopters extending and enhancing the core One framework, many contributors The only way to build a rich & robust solution is to engage a large community of developers. The only way to build a sustainable solution is to spur adoption by a community of institutions w/ vested interest in shared success.

Hydra Philosophy -- Technical Tailored applications and workflows for different content types, contexts and user interactions A common repository infrastructure Flexible, atomistic data models Modular, “Lego brick” services Library of user interaction widgets Easily skinned UI One body, many heads

Technical Framework - Components Fedora provides a durable repository layer to support object management and persistence. Solr provides fast access to indexed information. Blacklight is a Ruby on Rails plugin that sits atop Solr and provides faceted search & tailored views on objects. Hydra Head is a Ruby on Rails plugin that provides create, update and delete actions against Fedora objects.

Major Hydra Components

So What is Hydra? A framework for generating Fedora front-end applications w/ full CRUD functionality That follows a design pattern with common componentry and platforms: Fedora, Ruby on Rails, Solr, Blacklight That supports distinct UIs, content types, workflows, and policies Being developed by a community of 22 partner institutions (and growing)

How does LD4L build on Hydra? We will create an ActiveTriples Hydra component to mimic ActiveFedora We will make it possible to use a TripleStore as a Hydra repository as well as Fedora The new Cornell Blacklight/Solr-based search will index from Cornell’s triple-based SRSIS We will explore using Hydra-based collections as data sources for LD4L data about resources

LD4L Building Blocks: LibraryCloud/ ShelfRank

Provides model for access to library data Includes access to ShelfRank for Harvard Library resources Provides concrete example for creating an ontology for usage Source data for Harvard SRSIS instance

Enough background, what are we actually doing?

Developing Use Cases Currently have draft set of 33 use cases on LD4L public wiki Users include: faculty, student, dean, researcher, librarian, bibliographer, and cataloger Examples: Research guided by community usage; Compose a syllabus; Build a virtual collection; Info-rich maps; Who an author influenced

Identifying Data Sources Bibliographic data: CUL/Harvard/SUL catalogs Person data: VIVO, Stanford CAP, Profiles Usage data: LibraryCloud, Cornell/Stanford circulation, BorrowDirect circulation Collections: Archival EAD, IRs, SharedShelf, Olivia, arbitrary OAI-PMH Annotations: Cornell CuLLR, Stanford DMS, Bloglinks, DBpedia, LibGuides Subjects & Authorities

Assembling the Ontology General: VIVO-ISF; OpenAnnotation; SKOS Bibliographic: BIBFRAME, BIBO, FaBiO Provenance: PROV-O, PAV People/Organizations: FOAF, PROV, Schema.org Licensing: Creative Commons; Dublin Core Terms; Software Ontology Many vocabularies/identifiers: VIAF, Getty, ORCID, ISNI

Project timeline 2014 Jan-June 2014: Initial ontology design; identify data sources; identify external vocabularies; begin SRSIS and Hydra ActiveTriples development July-Dec 2014: Complete initial ontology; complete initial ActiveTriples development; pilot initial data ingests into Vitro-based SRSIS instance at Cornell

Workshop – January 2015 Hold a two-day workshop for 25 attendees from 10-12 interested library, archive, and cultural memory institutions Demonstrate initial prototypes of SRSIS and ontology Obtain feedback on initial ontology design Obtain feedback on overall design and approach Make connections to support participants in piloting this approach at their institutions Understand how institutions see this approach fitting in with their own multi-institutional collaborations and existing cross-institutional efforts such as the Digital Public Library of America, VIVO, and SHARE

Project timeline Jan-June 2015 Pilot SRSIS instances at Harvard and Stanford Populate Cornell SRSIS instance from multiple data sources including MARC catalog records, EAD finding aids, VIVO data, CuLLR, and local digital collections Develop a test instance of the SRSIS Search application harvesting RDF across the three partner institutions Integrate SRSIS with ActiveTriples

Project timeline July-Dec 2015 Implement fully functional SRSIS instances at Cornell, Harvard, and Stanford Public release of open source SRSIS code and ontology Public release of open source ActiveTriples Hydra Component Create public demonstration of SRSIS Search-based discovery and access system across the three SRSIS instances

Summing Up

Work So Far Initial project meeting at Stanford Jan. 30-31 Developed initial set of use cases and data sources Set up initial working groups: Ontology, Engineering, Use Cases, and Outreach and Workshop Planning Identified a list of potential partners Find out more at: https://wiki.duraspace.org/display/ld4l/

Project Outcomes Open source extensible SRSIS ontology compatible with VIVO ontology, BIBFRAME, and other existing library LOD efforts Open source SRSIS semantic editing, display, and discovery system Project Hydra compatible interface to SRSIS, using ActiveTriples to support Blacklight search across multiple SRSIS instances

Questions?