Presentation is loading. Please wait.

Presentation is loading. Please wait.

Linked Data For Libraries (LD4L)

Similar presentations


Presentation on theme: "Linked Data For Libraries (LD4L)"— Presentation transcript:

1 Linked Data For Libraries (LD4L)
Dean B. Krafft March 17, 2014

2 Overview Intro to the LD4L project Why use Linked Data?
LD4L Building Blocks: VIVO LD4L Building Blocks: Hydra LD4L Building Blocks: LibraryCloud/ShelfRank So, what are we actually doing? Summing Up

3 Linked Data for Libraries (LD4L)
On December 5, 2013, the Andrew W. Mellon Foundation made a two-year $999K grant to Cornell, Harvard, and Stanford starting Jan ‘14 Partners will work together to develop an ontology and linked data sources that provide relationships, metadata, and broad context for Scholarly Information Resources Leverages existing work by both the VIVO project and the Hydra Partnership

4 The Project Team Cornell: Dean Krafft, Jon Corson-Rikert, Brian Lowe, Simeon Warner, and 1.5 new FTE Harvard: David Weinberger, Paul Deschner, and Paolo Ciccarese Stanford: Tom Cramer, Lynn McRae, Naomi Dushay, Philip Schreur, and 1 new FTE

5 “The goal is to create a Scholarly Resource Semantic Information Store model that works both within individual institutions and through a coordinated, extensible network of Linked Open Data to capture the intellectual value that librarians and other domain experts add to information resources when they describe, annotate, organize, select, and use those resources, together with the social value evident from patterns of usage.”

6 LD4L can provide a common language for all the rich context around scholarly information resources that cuts across the boundaries of different disciplines, libraries, systems, and countries

7 Project Outcomes Create a SRSIS ontology that is “sufficiently expressive to encompass traditional catalog metadata from both Cornell and Harvard, the basic linked data elements described in the Stanford Linked Data Workshop Technology Plan, and the usage and other contextual elements from StackLife” Create a SRSIS Semantic editing, display, and discovery system based on the “Vitro semantic web platform and the SRSIS ontology, each instance of the system will support the incremental ingest of semantic data from multiple information sources, including the Cornell, Harvard, and Stanford MARC-based catalogs, StackLife, LibGuides, VIVO, Harvard Profiles, CAP, and OAI-PMH metadata providers, among others.” Create a Project Hydra compatible interface to SRSIS, “an ActiveTriples software component that facilitates the easy use of SRSIS and other linked-data within Hydra-based systems.”

8 Why Use a Linked Data Approach?

9 The Semantic Web Turn data into a web of simple links
Use ontology to explain how things are linked Use reasoning to add new links automatically Be flexible and extensible Reasoning example: sameAs An ontology is a representation of entities and relations … … for a part of reality … … expressed in human and computer interpretable form

10 Hierarchical information organization

11 Changing the focus to relationships

12 Benefits of the Semantic Web Approach
Focuses on connections Connections have meaning, not just hierarchy Shared topics Human relationships Linkage through events Geographic proximity Temporal alignment Supports many dimensions of nearness

13 RDF “triples”

14 Connectivity at the micro level
Triples connect subjects with objects via a consistent set of relationships Jane Smith holds position in author of member of Dept. of Genetics College of Medicine Journal article Book chapter Book Genetics Institute Subject Predicate Object

15 Using Semantic Web Technology vs. Linked Open Data

16 What is Linked Open Data (LOD)?
Structured information, not just documents with text A common, simple format (triples) Open Available, visible, mine-able Anyone can post, consume, and reuse Linked Directly by reference Indirectly through common references and inference

17 An HTTP request can return HTML or data

18 Commonality among references
Shared types foaf: Person, Organization geo: Country bibo: Book, Academic Article, Journal Shared relationships geo: hasBorderWith, isInGroup, hasMember foaf: knows, homepage dc: creator skos: subject Shared instances defined as types and linked by relationships Agrovoc concepts: rice, sustainability DBPedia URIs for places, concepts, events Unique identifiers for researchers and organizations

19 Why Use LOD for Data Sharing?
LOD requires Being willing to make data open Providing enough structure and consistency so data can be linked The goal is a higher return on investment over time

20 The Linked Open Data Cloud
The critical next step in the broad accessibility of research, researchers, and research data is to make the underlying metadata and relationships available as linked open data. Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch.

21 LD4L Building Blocks: VIVO

22 What is VIVO? Software: An open-source semantic-web-based researcher and research discovery tool Data: Institution-wide, publicly-visible information about research and researchers Standards: A standard ontology (VIVO data) that interconnects researchers, communities, and campuses using Linked Open Data Community: An open community with strong national and international participation

23 VIVO connects scientists and scholars with and through their research and scholarship

24 So – How Does VIVO Actually Work?

25 VIVO is a Semantic Web application
Provides data readable by machines, not just text for humans Provides self-describing data via shared ontologies Defined types Defined relationships Provides search & query augmented by relationships Does simple reasoning to categorize and find associations Teaching faculty = any faculty member teaching a course All researchers involved with any gene associated with breast cancer (through research project, publication, etc.)

26 The VIVO/Vitro Platform
Ingest tools – getting batch data in Ontology editing tools – change what is being described and represented Instance editing tools – Edit instances of any of the things represented in the ontology (people, publications, organizations, etc.) Template/display system – Display instances and sets in a useful way

27 What does VIVO model? People and more Relationships among the above
Organizations, grants, programs, projects, publications, events, facilities, and research resources Relationships among the above Meaningful Bidirectional Navigable context Links to URIs elsewhere Concepts, identifiers People, places, organizations, events

28 The VIVO ontology Describes people and organizations in the process of doing research, scholarship, and creative activities Stays discipline neutral and supports description and discovery across all disciplines Uses existing domain terminology to describe the content of research Remains modular, flexible, and extensible An ontology is a representation of entities and relations … … for a part of reality … … expressed in human and computer interpretable form

29

30 Data, data, data VIVO harvests much of its data automatically from verified sources Reduces the need for manual input of data Provides an integrated and flexible source of publicly visible data at an institutional level External data sources Internal data sources Authoritative data, diverse formats, filter out private information Talk about verified data Talking points: Much of the data in VIVO profiles is ingested from authoritative sources so it is accurate and current, reducing the need for manual input. Private or sensitive information is never imported into VIVO. Only public information will be stored and displayed. Data is housed and maintained at the local institutions. There it can be updated on a regular basis. There are three ways to get data: internal, external, individuals. Internal is authoritative! The rich information in VIVO profiles can be repurposed and shared with other institutional web pages and consumers, reducing cost and increasing efficiencies across the institution. Individuals may also edit and customize their profiles to suit their professional needs

31 Typical data sources HR – people, appointments
Research administration – grants & contracts Registrar – courses Faculty reporting system(s) publications, service, research areas, awards Events calendar Internal and external news External repositories – e.g., Pubmed, Scopus

32 ResearchFacilities & Services
Complexity of Inputs People Grants Data Google Scholar Center/ Dept/ Program websites ResearchFacilities & Services Courses Tech transfer Publications VP Research Univ. Communications HPC HR data Faculty Reporting Grad School Pubmed Cross Ref Researcher.gov arXiv other databases NIH RePorter Self-editing Other campuses

33

34 Structured data for visualizations

35

36

37 Linked data indexing for search
Scripps VIVO UF VIVO WashU VIVO eagle-I Research resources IU VIVO Harvard Profiles RDF Ponce VIVO Other VIVOs Cornell Ithaca VIVO Solr search index Weill Cornell VIVO Iowa Loki RDF Alter-nate Solr index vivo search.org Digital Vita RDF Linked Open Data

38

39 Indexes 125K people and 1.3 million publications

40 Weill – Publications

41

42

43

44 VIVO is Extensible

45 Adding Research Resources and Facilities to VIVO
CTSAconnect OHSU, Harvard, Cornell, Florida, Buffalo & Stony Brook eagle-i sister NIH project – Harvard, OHSU, 7 others Facilities, services, techniques, protocols, skills, and research outputs beyond publications Extended ways to represent expertise Improve attribution for data and other contributions to science

46 Connecting researchers, resources, and clinical activities

47

48

49

50 Supporting Humanities and Artistic Works
Performances of a work Translations Collections and exhibits Steven McCauley and Theodore Lawless, Brown University VIVO-Humanities_McCauley.pdf

51 The VIVO Community is now over 100 institutions worldwide

52 5th Annual VIVO Conference
August 6-8, 2014 Austin, TX

53 How does LD4L build on VIVO
The LD4L ontology will use components of the VIVO-ISF ontology LD4L will use VIVO ontology design patterns The basis for SRSIS implementations at each institution will be Vitro plus LD4L ontology The multi-institution LD4L demonstration search will be an adaptation of VIVOsearch.org LD4L will link to existing VIVO data

54 LD4L Building Blocks: Hydra

55 Hydra slides courtesy of Tom Cramer

56 What Is Hydra? A robust repository fronted by feature-rich, tailored applications and workflows (“heads”) One body, many heads Collaboratively built “solution bundles” that can be adapted and modified to suit local needs. A community of developers and adopters extending and enhancing the core If you want to go fast, go alone. If you want to go far, go together. The only way to build a rich & robust solution is to engage a large community of developers. The only way to build a sustainable solution is to spur adoption by a community of institutions w/ vested interest in shared success.

57 Fundamental Assumption #1
No single system can provide the full range of repository-based solutions for a given institution’s needs, …yet sustainable solutions require a common repository infrastructure.

58 For Instance… Generally a single PDF Simple, prescribed workflow
ETD Deposit System General Purpose Institutional Repository Digitization Workflow System Simple Complex Generally a single PDF Simple, prescribed workflow Streamlined UI for depositors, reviewers & readers Heterogeneous file types Simple to complex objects One- or two-step workflow General purpose user interfaces Potentially hundreds of files type per object Complex, branching workflow Sophisticated operator (back office) interfaces A single application could not effectively cope with these three use cases; however any institution would want to safeguard the outputs of all these disparate systems in a digital repository for management and preservation. HYDRA gives a framework where ONE BODY (the repo) can support MULTIPLE HEADS (tailored applications)

59 Hydra Heads: Emerging Solution Bundles
Institutional Repositories University of Hull University of Virginia Penn State University Images Northwestern University (Digital Image Library) Future development progress will be 1) based on leveraging the existing toolsin the ecosystem to assemble new solutions, and 2) ongoing investments in and extensions to the infrastructure.

60 Hydra Heads: Emerging Solution Bundles
Archives & Special Collections Stanford University University of Virginia Rock & Roll Hall of Fame Media Indiana University Northwestern University Rock & Roll Hall of Fame WGBH Future development progress will be 1) based on leveraging the existing toolsin the ecosystem to assemble new solutions, and 2) ongoing investments in and extensions to the infrastructure.

61 Hydra Heads: Emerging Solution Bundles
Workflow Management (Digitization, Preservation) Stanford University University of Illinois – Urbana-Champagne Northwestern University Exhibits Notre Dame Future development progress will be 1) based on leveraging the existing toolsin the ecosystem to assemble new solutions, and 2) ongoing investments in and extensions to the infrastructure.

62 Hydra Heads: Emerging Solution Bundles
ETDs Stanford University University of Virginia Etc. (Small) Data everyone… Future development progress will be 1) based on leveraging the existing toolsin the ecosystem to assemble new solutions, and 2) ongoing investments in and extensions to the infrastructure.

63 Fundamental Assumption #2
No single institution can resource the development of a full range of solutions on its own, …yet each needs the flexibility to tailor solutions to local demands and workflows.

64 Hydra Philosophy -- Community
An open architecture, with many contributors to a common core Collaboratively built “solution bundles” that can be adapted and modified to suit local needs A community of developers and adopters extending and enhancing the core One framework, many contributors The only way to build a rich & robust solution is to engage a large community of developers. The only way to build a sustainable solution is to spur adoption by a community of institutions w/ vested interest in shared success.

65 Hydra Philosophy -- Technical
Tailored applications and workflows for different content types, contexts and user interactions A common repository infrastructure Flexible, atomistic data models Modular, “Lego brick” services Library of user interaction widgets Easily skinned UI One body, many heads

66 Technical Framework - Components
Fedora provides a durable repository layer to support object management and persistence Solr, provides fast access to indexed information Blacklight, a Ruby on Rails plugin that sits atop solr and provides faceted search & tailored views on objects Hydra Head, a Ruby on Rails plugin that provides create, update and delete actions against Fedora objects

67 Major Hydra Components

68 So What is Hydra? Framework for generating Fedora front-end applications w/ full CRUD functionality That follows design pattern with common componentry and platforms Fedora, Ruby on Rails, Solr, Blacklight That supports distinct UI’s, content types, workflows, and policies Being developed by a community of 22 partner institutions (and growing)

69 How does LD4L build on Hydra?
We will create an ActiveTriples Hydra component to mimic ActiveFedora We will make it possible to use a TripleStore as a Hydra repository as well as Fedora The new Cornell Blacklight/Solr-based search will index from Cornell’s triple-based SRSIS We will explore using Hydra-based collections as data sources for LD4L data about resources

70 LD4L Building Blocks: LibraryCloud/ ShelfRank

71 Provides model for access to library data
Includes access to ShelfRank for Harvard Library resources Provides concrete example for creating an ontology for usage Source data for Harvard SRSIS instance

72 Enough background, what are we actually doing?

73 Developing Use Cases Currently have draft set of 33 use cases on LD4L public wiki Users include: faculty, student, dean, researcher, librarian, bibliographer, and cataloger Examples: Research guided by community usage; Compose a syllabus; Build a virtual collection; Info-rich maps; Who an author influenced

74 Identifying Data Sources
Bibliographic data: CUL/Harvard/SUL catalogs Person data: VIVO, Stanford CAP, Profiles Usage data: LibraryCloud, Cornell/Stanford circulation, BorrowDirect circulation Collections: Archival EAD, IRs, SharedShelf, Olivia, arbitrary OAI-PMH Annotations: Cornell CuLLR, Stanford DMS, Bloglinks, DBpedia, LibGuides Subjects & Authorities

75 Assembling the Ontology
General: VIVO-ISF; OpenAnnotation; SKOS Bibliographic: BIBFRAME, BIBO, FaBiO Provenance: PROV-O, PAV People/Organizations: FOAF, PROV, Schema.org Licensing: Creative Commons; Dublin Core Terms; Software Ontology Many vocabularies/identifiers: VIAF, Getty, ORCID, ISNI

76 Project timeline 2014 Jan-June 2014: Initial ontology design; identify data sources; identify external vocabularies; begin SRSIS and Hydra ActiveTriples development July-Dec 2014: Complete initial ontology; complete initial ActiveTriples development; pilot initial data ingests into Vitro-based SRSIS instance at Cornell

77 Workshop – January 2015 Hold a two-day workshop for 25 attendees from interested library, archive, and cultural memory institutions Demonstrate initial prototypes of SRSIS and ontology Obtain feedback on initial ontology design Obtain feedback on overall design and approach Make connections to support participants in piloting this approach at their institutions Understand how institutions see this approach fitting in with their own multi-institutional collaborations and existing cross-institutional efforts such as the Digital Public Library of America, VIVO, and SHARE

78 Project timeline Jan-June 2015
Pilot SRSIS instances at Harvard and Stanford Populate Cornell SRSIS instance from multiple data sources including MARC catalog records, EAD finding aids, VIVO data, CuLLR, and local digital collections Develop a test instance of the SRSIS Search application harvesting RDF across the three partner institutions Integrate SRSIS with ActiveTriples

79 Project timeline July-Dec 2015
Implement fully functional SRSIS instances at Cornell, Harvard, and Stanford Public release of open source SRSIS code and ontology Public release of open source ActiveTriples Hydra Component Create public demonstration of SRSIS Search-based discovery and access system across the three SRSIS instances

80 Summing Up

81 Work So Far Initial project meeting at Stanford Jan. 30-31
Developed initial set of use cases and data sources Set up initial working groups: Ontology, Engineering, Use Cases, and Outreach and Workshop Planning Identified a list of potential partners Find out more at:

82 Project Outcomes Open source extensible SRSIS ontology compatible with VIVO ontology, BIBFRAME, and other existing library LOD efforts Open source SRSIS semantic editing, display, and discovery system Project Hydra compatible interface to SRSIS, using ActiveTriples to support Blacklight search across multiple SRSIS instances

83 Questions?


Download ppt "Linked Data For Libraries (LD4L)"

Similar presentations


Ads by Google