Consuming JSON-LD: Experiments with Primo's Latest Linked Data
Corey Harper
2015-09-05, IGeLU 2015 – Developers Day
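curl | jq | less to the Beta Search REST API
Fetch a search, keep only the "docs" array (jq -C forces colored output), and page through it (less -r passes the color codes through):

    curl "http://bobcatdev.library.nyu.edu/primo_library/libweb/webservices/rest/v1/pnxs?q=any,contains,monkeys&inst=NYU" | jq -C '.docs' | less -r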
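For readers who would rather stay in Ruby, the jq '.docs' step can be sketched with only the standard library. This assumes, as the jq filter implies, that the response is a JSON object with a top-level `docs` array. `extract_docs` and `fetch_docs` are hypothetical helper names, and the bobcatdev URL is a 2015 development endpoint that may no longer respond.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: the Ruby counterpart of `jq '.docs'`.
# Parses a Search API response body and returns its top-level "docs" array
# (an empty array if the key is missing).
def extract_docs(body)
  JSON.parse(body).fetch('docs', [])
end

# Hypothetical helper: run a search and return the docs.
# Assumes the endpoint accepts the same q/inst parameters as the curl example.
def fetch_docs(base_url, query, inst)
  uri = URI(base_url)
  uri.query = URI.encode_www_form(q: query, inst: inst)
  response = Net::HTTP.get_response(uri)
  raise "HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  extract_docs(response.body)
end
```

Usage would look like `fetch_docs('http://bobcatdev.library.nyu.edu/primo_library/libweb/webservices/rest/v1/pnxs', 'any,contains,monkeys', 'NYU')`, with `JSON.pretty_generate` standing in for jq's pretty-printing.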
Ruby RDF & JSON-LD

    require 'rdf'
    require 'json/ld'
    require 'rdf/turtle'
    require 'openssl'

    # Load a single PNX record from the Search API and parse it as JSON-LD
    resource = RDF::Resource(RDF::URI.new("http://bobcatdev.library.nyu.edu/primo_library/libweb/webservices/rest/v1/pnxs/L/nyu_aleph001770007?inst=NYU"))
    graph = RDF::Graph.new << JSON::LD::API.toRdf(resource)

    # Serialize as Turtle; {list prefixes} is a placeholder for your prefix hash
    graph.dump(:ttl, prefixes: {list prefixes})
What just happened?
This doesn't actually work: the response is not _quite_ valid JSON-LD.
- It needs an actual context.
- Add the context yourself, and you get errors when validating/linting, e.g. in the JSON-LD Playground: http://json-ld.org/playground/
- The PNX context: https://github.com/ExLibrisGroup/Primo.PNX-context/blob/master/PNX-context.json
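Adding the context yourself amounts to putting an `@context` key on the record before handing it to a JSON-LD processor. The sketch below does only that step, using Ruby's standard `json` library. `with_context` is a hypothetical helper, and it assumes the API response arrives without a usable `@context`. It does not address the validation errors mentioned above.

```ruby
require 'json'

# The context location given on the slide. Note: this is a GitHub HTML page;
# a JSON-LD processor would need a URL that serves the raw JSON file.
PNX_CONTEXT = 'https://github.com/ExLibrisGroup/Primo.PNX-context/blob/master/PNX-context.json'

# Hypothetical helper: return a copy of a parsed PNX record with an
# "@context" key placed first, leaving any existing context untouched.
def with_context(record, context = PNX_CONTEXT)
  return record if record.key?('@context')

  # Hash#merge keeps the receiver's keys first, so "@context" leads.
  { '@context' => context }.merge(record)
end
```

Something like `JSON.pretty_generate(with_context(JSON.parse(body)))` could then be pasted into the JSON-LD Playground to see what a processor makes of it.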
Still works in jq (but maybe not in JSON Tools?)
Consuming JSON (-LD?): Experiments with (and without) Primo's New RESTful Analytics Search API
Distribution of Titles
Why do this stuff?
- Understanding your collections
- Understanding queries and usage
- Identifying strengths
- Topic modeling
- Clustering
- Recommendation systems
- (Automatic classification?)
On GitHub (also, DPLA examples): https://github.com/chrpr/dpla-analytics/blob/master/primo/REST-api.ipynb
Tree Map of Title Words
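The counts behind a tree map of title words can be sketched as a simple word tally, with each word's count setting the size of its rectangle. This is a minimal Ruby sketch, not the code from the linked notebook; the tokenization rule and the short stopword list are assumptions.

```ruby
# Hypothetical stopword list; a real analysis would use a fuller one.
STOPWORDS = %w[a an and the of in on to for].freeze

# Count words across titles: lowercase, keep runs of letters, drop
# stopwords, and return [word, count] pairs, most frequent first
# (ties broken alphabetically).
def title_word_counts(titles, stopwords: STOPWORDS)
  counts = Hash.new(0)
  titles.each do |title|
    title.downcase.scan(/[[:alpha:]]+/).each do |word|
      counts[word] += 1 unless stopwords.include?(word)
    end
  end
  counts.sort_by { |word, n| [-n, word] }
end
```

For example, `title_word_counts(['The Monkeys of Asia', 'Monkeys and Apes', 'Apes'])` returns `[["apes", 2], ["monkeys", 2], ["asia", 1]]`.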
Data Science Venn Diagram: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
2015-04-18 Harper - Can Metadata be Quantified? - DPLAFest 2015
Analytics Writ Large
- APIs have much potential, but are limited in scope (for now)
- NYU external data warehousing
- Analyze query logs
- Analyze dedup merger
- Analyze URLs within (and outside of) Aleph
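Analyzing URLs within (and outside of) Aleph could start with something as simple as tallying link targets by host. This sketch assumes the URLs have already been extracted from records into a plain list of strings; `host_counts` is a hypothetical helper built on Ruby's standard `uri` library.

```ruby
require 'uri'

# Group URLs (e.g., links pulled from catalog records) by host and count them.
# Unparseable or host-less strings are tallied under "(invalid)".
def host_counts(urls)
  urls.each_with_object(Hash.new(0)) do |url, counts|
    host = begin
      URI.parse(url.strip).host
    rescue URI::InvalidURIError
      nil
    end
    counts[host ? host.downcase : '(invalid)'] += 1
  end
end
```

Sorting the result by count (`host_counts(urls).sort_by { |_, n| -n }`) would surface the most-linked platforms first.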
Toolkit
- Tableau – business intelligence
- R – stats analysis
- Python: SciPy, NumPy, Pandas, etc.; NLTK
- jq, awk, sed, grep, sort, uniq, tr, wc, etc.
Data Quality Control
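One common quality-control measure is field completeness: the share of records in which each field is actually populated. The sketch below is not drawn from the slides. It assumes records have been flattened to Hashes of field => value (real PNX records are nested by section), and `field_completeness` is a hypothetical helper.

```ruby
# For each field name seen in any record, report the fraction of records
# in which that field is present and non-empty.
def field_completeness(records)
  return {} if records.empty?

  fields = records.flat_map(&:keys).uniq
  fields.map do |field|
    filled = records.count do |rec|
      value = rec[field]
      # nil, "" and [] all count as missing
      !(value.nil? || (value.respond_to?(:empty?) && value.empty?))
    end
    [field, filled.fdiv(records.size)]
  end.to_h
end
```

Fields with unexpectedly low ratios are candidates for a closer look at the source data or the normalization rules.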
Duplicate OCLC # Analysis
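A duplicate-OCLC-number check comes down to normalizing the numbers and grouping records that share one. This sketch assumes the input is a list of `[record_id, oclc]` pairs already pulled from the records. The prefix rules follow common OCLC-number conventions (`(OCoLC)`, `ocm`, `ocn`, `on`), and both helpers are hypothetical.

```ruby
# Normalize an OCLC number by stripping an "(OCoLC)" prefix, the common
# ocm/ocn/on prefixes, and leading zeros. Returns nil if anything other
# than digits remains.
def normalize_oclc(raw)
  digits = raw.to_s.strip.sub(/\A\(OCoLC\)/i, '').sub(/\A(ocm|ocn|on)/i, '')
  return nil unless digits.match?(/\A\d+\z/)

  digits.sub(/\A0+(?=\d)/, '')
end

# Given [record_id, oclc] pairs, return { oclc => [record_ids] } for every
# normalized OCLC number shared by more than one distinct record.
def duplicate_oclc(pairs)
  groups = Hash.new { |h, k| h[k] = [] }
  pairs.each do |record_id, raw|
    oclc = normalize_oclc(raw)
    groups[oclc] << record_id if oclc
  end
  groups.select { |_, ids| ids.uniq.size > 1 }
end
```

Normalizing first matters: without it, `(OCoLC)ocm00012345` and `12345` would not be recognized as the same number.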
Collection Management Decisions
- Warehouse combines:
  - Primo dedupmrg & FRBR matches
  - Ebook SUSHI/COUNTER stats
  - Aleph circ stats
- Offsite & de-accessioning decisions
- Regression analysis to demonstrate correlations
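The slide does not say which regression model was used. As a minimal sketch of the idea, a single-predictor least-squares fit with Pearson's r can show whether two warehouse measures move together, for example (hypothetically) ebook usage and print circulation for the same titles. A real analysis would more likely lean on R or a Python stats library.

```ruby
# Ordinary least squares fit of y = intercept + slope * x, plus Pearson's r.
# Inputs are equal-length numeric arrays, e.g. hypothetical per-title
# ebook uses (xs) and print circulation counts (ys).
def simple_regression(xs, ys)
  unless xs.size == ys.size && xs.size >= 2
    raise ArgumentError, 'need equal-length arrays of size >= 2'
  end

  n = xs.size.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n
  sxy = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  sxx = xs.sum { |x| (x - mean_x)**2 }
  syy = ys.sum { |y| (y - mean_y)**2 }
  raise ArgumentError, 'x values are constant' if sxx.zero?

  slope = sxy / sxx
  # r is undefined when y has no variance
  r = syy.zero? ? nil : sxy / Math.sqrt(sxx * syy)
  { slope: slope, intercept: mean_y - slope * mean_x, r: r }
end
```

A correlation like this only demonstrates association; it says nothing about which decision caused which usage pattern.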
CRISP-DM
[Figure: "CRISP-DM Process Diagram" by Kenneth Jensen. Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons: http://commons.wikimedia.org/wiki/File:CRISP-DM_Process_Diagram.png]
Possibilities & Next Steps
- Exploratory data analysis. More.
- Answering questions about data quality
- More topic maps
- Bi- and tri-gram tokenization, hapaxes
- Data cleanup and QA
- Processing incoming batch data
- Integrating with other data streams: Google Analytics, AppDynamics, Kibana
- Cross-system logs, searches, etc.
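Bi- and tri-gram tokenization and hapax counts are small steps beyond plain word counts. A minimal Ruby sketch follows; the tokenization regex is an assumption, and a fuller treatment would use a toolkit such as NLTK.

```ruby
# Split text into lowercase word tokens (letters, digits, apostrophes).
def tokenize(text)
  text.downcase.scan(/[[:alnum:]']+/)
end

# Return all n-grams (as space-joined strings) from a token list;
# empty if there are fewer than n tokens.
def ngrams(tokens, n)
  tokens.each_cons(n).map { |gram| gram.join(' ') }
end

# Hapaxes: tokens that occur exactly once, in order of first appearance.
def hapaxes(tokens)
  counts = Hash.new(0)
  tokens.each { |t| counts[t] += 1 }
  counts.select { |_, c| c == 1 }.keys
end
```

For example, `ngrams(%w[a b c], 2)` returns `["a b", "b c"]`, and in "Monkeys and apes; apes and monkeys, and lemurs" the only hapax is "lemurs".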
Thanks!
corey.harper@nyu.edu
212.998.2479
@chrpr