Published by Robyn Hubbard. Modified over 9 years ago.
1
A Lightning Case Study: FROM TAXONOMY TO LINKED DATA
Taxonomy Boot Camp 2015 Lightning Session, November 4th, 2015
Bob Kasenchak, Director of Business Development, Access Innovations, Inc.
bob_kasenchak@accessinn.com @taxobob
Copyright 2015 Access Innovations
2
What? Why? How? What Happened?
– PLOS POC project to establish links from terms in the thesaurus to corresponding concepts in DBpedia
– Used one branch for proof-of-concept
– Use DBpedia Spotlight query tool to automatically generate candidates; validate by hand
– Query abstracts from DBpedia and add to thesaurus
– Add backlinks to PLOS in DBpedia
– Results and lessons learned
3
What? Why?
Access Innovations and PLOS POC project to equate terms from the PLOS Thesaurus to open data concepts to:
– Enable addition of information to term records, e.g., definition/abstract
– Move towards Linked Open Data by finding the equivalent concept in DBpedia for each thesaurus term
– Provide linkouts from PLOS Subject Area pages to external references
4
DBpedia: Structured data from Wikipedia
– Abstract/Definition
– Images
– External Links
– Photos
– Foreign language wikis
– Subtopics
– &c.
5
PLOS Subject Area Landing Page
Establish Links to add information… (abstract, external resources, etc.)
6
…By Adding Information to Thesaurus
7
Linked Open Data Cloud
CKAN version: http://blog.okfn.org/2010/09/03/next-version-of-the-linked-open-data-cloud-based-on-ckan/
8
How? Establishing Links
– DBpedia Spotlight API (free) allows you to send text to be matched against DBpedia concepts
– Our input text was single terms from the PLOS Thesaurus
– We hand QC’d all of the results
– Added a custom field in Data Harmony Thesaurus Master Tool to hold abstract/definition information
– Once links were verified, automatically queried abstracts from DBpedia and added them to thesaurus term records
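The abstract lookup for a verified link could be sketched as below. This is a minimal illustration and not the project's actual tooling: the public endpoint `https://dbpedia.org/sparql`, the `format` request parameter, and the function names are assumptions, and only `dbo:abstract` is taken from DBpedia's ontology.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: the public DBpedia SPARQL endpoint; the slides do not name one.
SPARQL_ENDPOINT = "https://dbpedia.org/sparql"


def build_abstract_query(resource_uri: str, lang: str = "en") -> str:
    """Return a SPARQL query for the dbo:abstract of one DBpedia resource."""
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?abstract WHERE {\n"
        f"  <{resource_uri}> dbo:abstract ?abstract .\n"
        f'  FILTER (lang(?abstract) = "{lang}")\n'
        "}"
    )


def extract_abstract(results: dict) -> Optional[str]:
    """Pull the first abstract out of a SPARQL JSON results document."""
    bindings = results.get("results", {}).get("bindings", [])
    return bindings[0]["abstract"]["value"] if bindings else None


def fetch_abstract(resource_uri: str) -> Optional[str]:
    """Query the endpoint (network call) and return the abstract, if any."""
    params = urllib.parse.urlencode({
        "query": build_abstract_query(resource_uri),
        "format": "application/sparql-results+json",
    })
    with urllib.request.urlopen(f"{SPARQL_ENDPOINT}?{params}", timeout=30) as resp:
        return extract_abstract(json.load(resp))
```

The returned string would then be written into the custom abstract/definition field of the term record; that step depends on the thesaurus tool and is not shown.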
9
DBpedia Spotlight: Matching to Thesaurus Terms
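Sending a single thesaurus term to Spotlight might look like the sketch below. The host `api.dbpedia-spotlight.org` is an assumption (the 2015 project may have used a different address), and the function names are hypothetical. Note that the `annotate` call returns one disambiguated resource per matched phrase, so a ranked candidate list would need a different call that is not shown here.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the current public Spotlight endpoint, not necessarily the one used in 2015.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def build_request(term: str, confidence: float = 0.5) -> urllib.request.Request:
    """Build a GET request that asks Spotlight to annotate one term."""
    params = urllib.parse.urlencode({"text": term, "confidence": confidence})
    return urllib.request.Request(
        f"{SPOTLIGHT_URL}?{params}",
        headers={"Accept": "application/json"},
    )


def candidate_uris(payload: dict) -> list:
    """Extract DBpedia URIs; the 'Resources' key is absent when nothing matched."""
    return [resource["@URI"] for resource in payload.get("Resources", [])]


def spotlight_candidates(term: str) -> list:
    """Call Spotlight (network call) and return the matched DBpedia URIs."""
    with urllib.request.urlopen(build_request(term), timeout=30) as resp:
        return candidate_uris(json.load(resp))
```

Every URI returned this way is only a candidate and would still go through the hand QC step.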
10
How? Adding Backlinks to PLOS in DBpedia
This is accomplished by editing Wikipedia…
…but it’s complicated (more on this later)
11
What Happened? Results

Spotlight match?            Total Subject Area Terms   % of Subject Area Terms
Match is top hit            71                         59.7%
Match is in position 2-5    15                         12.6%
No – matched manually       10                          8.4%
No – no match found         21                         17.6%
Yes but false positive       2                          1.7%
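The percentages follow directly from the counts, which sum to 119 terms in the proof-of-concept branch. A short check, with the category keys chosen here only for illustration:

```python
# Counts as reported in the results table.
counts = {
    "top_hit": 71,
    "position_2_to_5": 15,
    "matched_manually": 10,
    "no_match_found": 21,
    "false_positive": 2,
}


def percentages(table: dict) -> dict:
    """Return each count as a percentage of the total, rounded to one decimal."""
    total = sum(table.values())
    return {key: round(100 * value / total, 1) for key, value in table.items()}


total_terms = sum(counts.values())
shares = percentages(counts)
# Share of terms where Spotlight offered the right concept within its top five hits.
top_five_share = round(100 * (counts["top_hit"] + counts["position_2_to_5"]) / total_terms, 1)
```

Here `shares` reproduces the table's last column, and `top_five_share` comes out at 72.3, the portion of terms a reviewer could confirm from Spotlight's first five suggestions.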
12
What Happened? Results
Lessons Learned during Matching Process:
– Your taxonomy is more granular than DBpedia: not every concept will have a match
– Spotlight performs better with a block of text than single terms
  – And our inputs were just terms from the PLOS thesaurus
  – Results will HAVE to be QC’d; fully automating the process is a non-starter from an accuracy standpoint
  – Some of the false automatic matches were hilarious
Overall:
  – Our methodology was basically sound
  – The process is pretty painless but requires QC
13
What Happened? Adding Backlinks
Lessons Learned during Backlinking Process:
– Can’t edit DBpedia directly; this information is crawled from Wikipedia pages
– Added links to some Wikipedia pages experimentally; eventually they should show up in DBpedia
– But there is some question as to the appropriateness of the links (per Wikipedia), even though the PLOS Subject Area pages have stable URIs, relevant content, etc.
– Best option is probably to publish the PLOS vocabulary (in OWL or perhaps SKOS) including the URI for each term, which would link to the URI for each Subject Area page
– Using owl:sameAs instead of dbo:wikiPageExternalLink
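A published term record along these lines could be serialized as Turtle, as in the sketch below. All URIs in the usage are placeholders under `example.org`, not real PLOS identifiers, and the use of `foaf:page` to point at the Subject Area page is an assumption; only `owl:sameAs` and the OWL/SKOS choice come from the slide.

```python
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix owl:  <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n"
)


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def term_to_turtle(term_uri: str, label: str, dbpedia_uri: str, page_uri: str) -> str:
    """Serialize one thesaurus term with its DBpedia link and landing page."""
    return (
        f"<{term_uri}> a skos:Concept ;\n"
        f'    skos:prefLabel "{escape_literal(label)}"@en ;\n'
        f"    owl:sameAs <{dbpedia_uri}> ;\n"
        f"    foaf:page <{page_uri}> .\n"
    )


# Hypothetical identifiers, for illustration only.
example = PREFIXES + "\n" + term_to_turtle(
    "http://example.org/thesaurus/zebrafish",
    "Zebrafish",
    "http://dbpedia.org/resource/Zebrafish",
    "http://example.org/subject-areas/zebrafish",
)
```

Publishing the links this way keeps them under the vocabulary owner's control, so they do not depend on Wikipedia editors accepting an external link.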
14
What Next?
– Present results
– Refine methodology
– Figure out best practices for backlinking
– Apply to entire PLOS thesaurus (~11,000 terms)
– Declare victory
15
Linked Data and Taxonomies
THANKS! ANY QUESTIONS?
Bob Kasenchak, Access Innovations, Inc.
bob_kasenchak@accessinn.com @taxobob