Natural Language Processing for LODLAM Presented at IGeLU 2014 by Corey A Harper 2014-09-16 A brief intro to machine learning & data science for Libraries.

Presentation transcript:

Natural Language Processing for LODLAM Presented at IGeLU 2014 by Corey A Harper A brief intro to machine learning & data science for Libraries

Harper – IGeLU – NLP 4 LODLAM – Sept 16, 2014. Context, narrative, storytelling: the Library's story, and the Archives' story, but also…

Users' stories. Scholars' stories. Adding context through recombinant metadata.

Scholars' & Users' Stories – Tim Sherratt Also:

Library Authority Data: “Include links to other URIs, so that they can discover more things.” Short of providing and linking to URIs, this *is* authority data. This is what our authority files are for.

Linked data is about context. Authorities provide context, and yet our controlled vocabs are nearly gone because the interfaces to them were broken.

The Death of Browse: Next-Gen Discovery Systems don't make use of Authority Control. “Browse” was/is broken as a UI design. Rich data in Authorities, disconnected from narrative, context, search. Richer “Authority”-type data outside libraries... “Next Gen Next Gen Discovery…

FuzzyWuzzy – awesome fuzzy string matching library from SeatGeek

Slide courtesy of Doug Oard, Univ. of Maryland

Tools – Natural Language Processing: DBPedia Spotlight, Zemanta, Open Calais, Open Refine, DataTXT, AlchemyAPI, FuzzyWuzzy

Where does this lead? We need new interfaces and new tools, for a new kind of cataloger, for knowledge organization experts.

Linked Jazz Back End

Primo PNX and Authorities: Indexing cross references. New Browse functionality. Authority control from Aleph / Alma. What about non-MARC, or non-Aleph data? Matching strings to authorities.

Enter Open Refine

Match strings to vocabularies…

Like LCNAF…

Or Wikipedia

Automated Authority Control?
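
Matching free-text strings to vocabulary headings, as described above, can be sketched in a few lines of Python. This is only an illustration, not the reconciliation logic Open Refine or FuzzyWuzzy actually use: it relies on the standard library's difflib as a stand-in, the sample headings are a tiny illustrative list, and the function names and the 0.6 threshold are assumptions made for the example.

```python
from difflib import SequenceMatcher

# Illustrative sample only; a real run would load headings from LCNAF or similar.
AUTHORITY_HEADINGS = [
    "Ellington, Duke, 1899-1974",
    "Armstrong, Louis, 1901-1971",
    "Fitzgerald, Ella",
]


def normalize(text):
    """Lowercase, replace punctuation with spaces, and sort the tokens so that
    'Duke Ellington' and 'Ellington, Duke' compare as similar strings."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return " ".join(sorted(cleaned.split()))


def best_match(name, headings, threshold=0.6):
    """Return (heading, score) for the closest heading.

    The heading is None when the best score falls below the threshold
    (the threshold value is an assumption, to be tuned per collection).
    """
    scored = [
        (SequenceMatcher(None, normalize(name), normalize(h)).ratio(), h)
        for h in headings
    ]
    score, heading = max(scored)
    return (heading, score) if score >= threshold else (None, score)
```

A call such as best_match("Duke Ellington", AUTHORITY_HEADINGS) returns the Ellington heading with its similarity score. Anything below the threshold comes back as None, which is the case a cataloger would still need to review by hand.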

Open Refine RDF Skeleton

Proposed System Architecture

Hydra Modeling & Architecture. Approaches to provenance: Prov-O, named graphs, named datastreams. “n” nyucore “records”: same properties defined for each; keep data sources separate; merge for display in Blacklight & export to Primo.

Separate Metadata Datastreams. source_metadata, enrich_metadata: reload one or both without affecting the other or native metadata. native_metadata: edited only through Hydra UI; partitioned from external sources.
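
The idea of keeping datastreams separate and merging only for display can be sketched as follows. This is a hypothetical illustration in plain Python, not the Hydra or Fedora implementation. The datastream names come from the slide, while the property names, the sample values, the merge order, and the function name are assumptions.

```python
# Assumed precedence for display: locally edited data first, then the source
# record, then automated enrichment. The slides do not specify an order.
DATASTREAM_ORDER = ("native_metadata", "source_metadata", "enrich_metadata")

# Hypothetical sample object; property names and values are made up.
SAMPLE_OBJECT = {
    "native_metadata": {"title": ["Interview with a jazz musician"]},
    "source_metadata": {
        "title": ["Interview with a jazz musician"],
        "subject": ["Jazz"],
    },
    "enrich_metadata": {"subject": ["Jazz", "Jazz musicians"]},
}


def merge_for_display(datastreams):
    """Merge per-source records into one display record.

    Each value is kept once, paired with the name of the first datastream
    that supplied it, so provenance survives the merge.
    """
    merged = {}
    for name in DATASTREAM_ORDER:
        for prop, values in datastreams.get(name, {}).items():
            entries = merged.setdefault(prop, [])
            for value in values:
                if value not in [v for v, _ in entries]:
                    entries.append((value, name))
    return merged
```

Because the merged record is derived on demand, reloading one datastream (for example, replacing enrich_metadata after a new reconciliation run) changes the display without touching the other two.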

Metadata Provenance

Fedora Datastreams

Blacklight User Interface

Where does this lead? We need new interfaces and new tools, for a new kind of cataloger, for knowledge organization experts.

A Role for Ex Libris. Alma &/or Primo: Named Entity Recognition, Vocabulary Reconciliation, Provenance Management. Primo Central: Named Entity Recognition on full text, Auto Classification.

A bit louder... We need new interfaces. We need enterprise tools, integrated into our metadata management systems, for a new kind of cataloger, for knowledge organization experts.

Simplified Workflow Proposal

More Tools – At Programming Level: Open NLP, Stanford Natural Language Toolkit. Python tools: scikit-learn, pandas, NLTK, SciPy, NumPy.

More Data Science-ey Tools

Data Science Techniques. Feature extraction / feature engineering. Predictive modeling: probabilistic classification, large multi-class problems. Text analytics: vectorization, bags & sets of words, TF/IDF, n-grams, sparse matrices.
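
The text analytics terms above (bags of words, n-grams, TF/IDF, sparse vectors) can be made concrete with a small standard-library sketch. It is a minimal illustration of one common unsmoothed TF-IDF variant, not what scikit-learn or any other library computes by default. The function names are assumptions, and a dict per document stands in for a sparse matrix row.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (joined by spaces) in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def featurize(text, max_n=2):
    """Bag of words plus n-grams up to max_n: feature -> raw count."""
    tokens = text.lower().split()
    features = []
    for n in range(1, max_n + 1):
        features.extend(ngrams(tokens, n))
    return Counter(features)


def tfidf(docs, max_n=2):
    """Return one sparse vector (dict) per document.

    Weight = (count / total features in doc) * log(N / document frequency).
    Only features present in a document get an entry in its dict.
    """
    bags = [featurize(doc, max_n) for doc in docs]
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())
    n_docs = len(docs)
    vectors = []
    for bag in bags:
        total = sum(bag.values())
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return vectors
```

Terms that occur in few documents receive higher weights, and a term that occurs in every document gets a weight of zero. That is the property that makes these vectors useful as features for classification.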

Simple Example – Predict Yelp Star Ratings

Fitting a Model – Naïve Bayes
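
As a rough idea of what fitting a Naïve Bayes model involves, here is a minimal multinomial Naïve Bayes classifier with add-one smoothing, written with the standard library only. It is not the code from the talk, which is not reproduced in this transcript. The class name is an assumption, and the four training reviews are made-up toy data, not Yelp data.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens, with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(count / n_docs)  # log prior
            for word in text.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    likelihood = (self.word_counts[label][word] + 1) / (
                        total_words + len(self.vocab)
                    )
                    score += math.log(likelihood)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up toy reviews with star ratings as class labels.
TOY_REVIEWS = [
    "great food friendly staff",
    "terrible service cold food",
    "great service great food",
    "cold rude terrible",
]
TOY_STARS = [5, 1, 5, 1]
```

Fitting is only counting: NaiveBayes().fit(TOY_REVIEWS, TOY_STARS) stores per-class word counts, and predict picks the class with the highest log probability. In practice a library such as scikit-learn would replace this hand-rolled class.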

Data Science Venn Diagram

Where can we go from here? NER is just the beginning. Feature engineering. Hiring statisticians. Clustering & classification. Vocabulary pruning and engineering. Manageable 10-20k class text classification problems; domain specific. Ex Libris’ activity in this space.

Thanks!