Text Analytics A Tool for Taxonomy Development Tom Reamy Chief Knowledge Architect KAPS Group Program Chair – Text Analytics World Knowledge Architecture.

Presentation transcript:

Slide 1: Text Analytics: A Tool for Taxonomy Development
Tom Reamy, Chief Knowledge Architect, KAPS Group
Program Chair – Text Analytics World
Knowledge Architecture Professional Services

Slide 2: Agenda
• Introduction
• Project: Update the ACM taxonomy after 12+ years
• Information Environment
• Text Mining / Text Analytics
  – Multiple Methods / Reports
• Conclusion

Slide 3: Introduction: KAPS Group
• Knowledge Architecture Professional Services – Network of Consultants
• Applied Theory – Faceted & emotion taxonomies, natural categories
• Services:
  – Strategy – IM & KM – Text Analytics, Social Media, Integration
  – Taxonomy/Text Analytics, Social Media development, consulting
  – Text Analytics Quick Start – Audit, Evaluation, Pilot
• Partners – Smartlogic, Expert System, SAS, SAP, IBM, FAST, Concept Searching, Attensity, Clarabridge, Lexalytics
• Clients: Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, World Bank, Dept. of Transportation, etc.
• Program Chair – Text Analytics World – March 29–April 1 – SF
• Presentations, Articles, White Papers –
• Current – Book – Text Analytics: How to Conquer Information Overload, Get Real Value from Social Media, and Add Smart Text to Big Data

Slide 4: Introduction: Approach
• Is automatic taxonomy development here yet?
• Not yet
• But it is getting closer
• Hybrid:
  – Taxonomists, SMEs, database analysts, text analysts
  – Text mining software – basic text analysis – power
  – Text analytics software – brains
• New taxonomy terms & structure
  – Old = indexing, authors adding tags & keywords
  – New = auto-tagging, applications

Slide 5: Information Environment
• Existing taxonomy: Computing Classification System (CCS)
• Content:
  – Database export of Guide to the Computing Literature bibliographic records (.txt; approximately 7 GB in 58 files)
  – Statistical distribution of CCS categories across the Digital Library and the Guide to the Computing Literature (Excel; 4 files)
  – ACM Digital Library full-text files (PDFs and XML metadata, including CCS categories; approximately 170 GB in 240,000 files)
  – Ralston Encyclopedia of Computer Science (PDFs and HTML of each article with XML metadata, including CCS categories; approximately 350 MB in 1,850 files)

Slide 6: Text Analytics in Taxonomy Development Case Study – Multiple Methods
• Text Mining – terms in documents – frequency, date, source, etc.
  – Text Preparation – create multiple filters
• Quality – important terms, co-occurring terms
• Time savings – the only feasible way to scan the documents
• Clustering – suggested categories, chunking for editors
  – Clustering within clusters – explore
• Entity Extraction – people, organizations, programming languages, hardware/devices, etc.
• Joint Work Sessions – interactive exploration
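The text-mining step on this slide (filter out noise words, then look at important and co-occurring terms) can be sketched in a few lines. This is a minimal illustration, not the tooling used in the ACM project: the `NOISE_WORDS` list, the tokenizer pattern, and the helper names `tokenize` and `term_stats` are all hypothetical, and co-occurrence is counted once per document per unordered pair of terms.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical noise-word filter; in a real project this list would grow with
# each round of working reports as editors flag uninformative terms.
NOISE_WORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "with", "using", "based"}

def tokenize(text: str) -> list[str]:
    """Lowercase the text, split it into word tokens, and drop noise words.

    The pattern keeps '+', '#', and '-' inside tokens so names such as
    C++ or C# survive as single terms.
    """
    tokens = re.findall(r"[a-z][a-z0-9+#-]*", text.lower())
    return [t for t in tokens if t not in NOISE_WORDS]

def term_stats(docs: list[str]) -> tuple[Counter, Counter]:
    """Return (term frequency, co-occurrence counts) across all documents.

    A pair of distinct terms is counted once for each document that contains both.
    """
    freq: Counter = Counter()
    pairs: Counter = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        freq.update(tokens)
        # Sort so each unordered pair always appears in the same (a, b) order.
        pairs.update(combinations(sorted(set(tokens)), 2))
    return freq, pairs

titles = [
    "Query optimization for relational databases",
    "Relational databases and query processing",
    "Parallel query processing on clusters",
]
freq, pairs = term_stats(titles)
print(freq.most_common(3))
print(pairs.most_common(2))
```

Counting pairs per document rather than per sentence is a deliberate simplification; titles and abstracts are short enough that document-level co-occurrence is a reasonable first signal for editors.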

Slide 7: Case Study – Taxonomy Development [image only]

Slide 11: Case Study – Taxonomy Development [image only]

Slide 12: Multiple Sets of Reports
• Keyword Frequency
  – First pass – 3,026
  – Total – 508,941 (get from big database)
  – Sub-totals: pre-1998, by year, by 5-year blocks
  – Map to other variables – journals, authors – basis for communities
• Keywords in abstract/title
• Cluster analysis of keyword-abstract-title
• Search terms in keyword-abstract-title
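The keyword-frequency sub-totals on this slide (pre-1998, then fixed-width year blocks) amount to bucketing each record by year before counting. The sketch below is illustrative only: the input shape, a list of `(year, [author keywords])` pairs, and the helper names `block_label` and `keyword_report` are hypothetical, and starting the 5-year blocks at 1998 is an assumption based on the pre-1998 cut-off.

```python
from collections import Counter, defaultdict

def block_label(year: int, start: int = 1998, width: int = 5) -> str:
    """Map a publication year to a reporting bucket.

    Years before `start` share one 'pre-<start>' bucket; later years fall into
    fixed-width blocks (assumed here to begin at `start`), e.g. 1998-2002.
    """
    if year < start:
        return f"pre-{start}"
    lo = start + ((year - start) // width) * width
    return f"{lo}-{lo + width - 1}"

def keyword_report(records):
    """Count normalized keywords per bucket.

    records: iterable of (year, [keyword, ...]) pairs (hypothetical input shape).
    Returns {bucket label: Counter(keyword -> count)}.
    """
    report = defaultdict(Counter)
    for year, keywords in records:
        # Light normalization so 'Data mining' and 'data mining' are one term.
        report[block_label(year)].update(k.strip().lower() for k in keywords)
    return dict(report)

records = [
    (1996, ["Neural networks"]),
    (1999, ["neural networks", "Data mining"]),
    (2001, ["data mining"]),
]
for bucket, counts in sorted(keyword_report(records).items()):
    print(bucket, counts.most_common())
```

Passing a different `width` (for example 1) gives the by-year view from the same code path.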

Slide 13: Entity Extraction – Company, Internet, Organization, Title
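Commercial text analytics tools do this extraction with far richer rules, but the basic idea of list-driven entity extraction can be shown with a small gazetteer matcher. Everything here is a hypothetical stand-in: the `GAZETTEER` entries, the entity types, and the function names are illustrative, and matching is exact and case-sensitive.

```python
import re

# Hypothetical gazetteer; a real project would load curated entity lists
# rather than hard-code a few names.
GAZETTEER = {
    "Company": ["IBM", "Microsoft", "Sun Microsystems"],
    "Organization": ["ACM", "IEEE", "W3C"],
    "Programming language": ["Java", "C++", "Prolog"],
}

def build_patterns(gazetteer):
    """Compile one case-sensitive regex per entity type.

    Longer names come first so multi-word names win over their prefixes.
    Lookarounds are used instead of \\b so names ending in symbols (C++) match.
    """
    patterns = {}
    for etype, names in gazetteer.items():
        alts = "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))
        patterns[etype] = re.compile(rf"(?<![\w+#])(?:{alts})(?![\w+#])")
    return patterns

PATTERNS = build_patterns(GAZETTEER)

def extract_entities(text):
    """Return sorted (entity type, surface form) pairs found in the text."""
    found = set()
    for etype, pattern in PATTERNS.items():
        found.update((etype, m.group(0)) for m in pattern.finditer(text))
    return sorted(found)

print(extract_entities("IBM and the ACM published a Prolog compiler written in C++."))
```

The boundary checks keep "Java" from matching inside "JavaScript", which is the kind of noise the working reports on the next slide are meant to catch.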

Slide 14: Multiple Methods – Reports
• Spreadsheets – static reports
• Database query reports – create multiple slices, views, filters
• Working reports – eliminate more noise words
• Multiple mapping – extractions, author tags & keywords
• Map – frequency in abstracts, titles, articles
• Search logs – terms and phrases
• Date ranges – trend reports – per term, new words
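The date-range trend reports listed here (frequency per term over time, plus newly appearing words) can be sketched on top of per-period term counts. This is an assumption-laden illustration: the input is taken to be a dict from a sortable period key (such as a year) to a `Counter`, and `new_terms`, `trend`, and the `min_count` threshold are hypothetical names.

```python
from collections import Counter

def new_terms(period_counts, period, min_count=2):
    """Terms seen at least `min_count` times in `period` but in no earlier period.

    period_counts: dict mapping a sortable period key (e.g. a year) to a Counter of terms.
    """
    earlier = set()
    for p, counts in period_counts.items():
        if p < period:
            earlier.update(t for t, c in counts.items() if c > 0)
    current = period_counts.get(period, Counter())
    return sorted(t for t, c in current.items() if c >= min_count and t not in earlier)

def trend(period_counts, term):
    """Frequency of one term in each period, in period order (a simple trend line)."""
    # Counter returns 0 for missing terms, so absent periods show as 0.
    return [(p, period_counts[p][term]) for p in sorted(period_counts)]

counts = {
    2008: Counter({"grid computing": 5, "xml": 4}),
    2009: Counter({"xml": 3, "cloud computing": 1}),
    2010: Counter({"cloud computing": 6, "xml": 2, "mapreduce": 3}),
}
print(new_terms(counts, 2010))
print(trend(counts, "xml"))
```

The `min_count` threshold is there so that one-off spellings or typos do not show up as "new words"; a term counts as new only once it is used repeatedly.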

Slide 17: Conclusions
• Auto-taxonomy not here – yet
• Scale requires a semi-automated solution
• Human effort – initial design, text preparation
  – Now would add more auto-categorization
• Human effort – analysis & refinement of queries, text mining, and taxonomy
• Simple taxonomies are better – part of the information ecosystem
  – Lower levels of terms – into auto-tagging rules
• Early 2015: New Book:
  – Text Analytics: Everything You Need to Know to Conquer Information Overload, Mine Social Media for Real Value, and Turn Big Text Into Big Data
  – The title might be shorter, but it will cover all you need to know
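The conclusion that lower-level terms can move out of the taxonomy and into auto-tagging rules can be illustrated with a very simple rule engine. This is a hedged sketch, not a description of any vendor's rule language: the `RULES` mapping is a hypothetical slice of a simplified taxonomy, and the "total evidence hits must reach `min_hits`" rule is an assumption chosen for clarity.

```python
import re

# Hypothetical slice of a simplified taxonomy: each category keeps its
# narrower terms as rule evidence instead of as separate taxonomy nodes.
RULES = {
    "Information systems": ["database", "query processing", "data mining", "information retrieval"],
    "Computing methodologies": ["machine learning", "neural network", "computer vision"],
}

def auto_tag(text, rules=RULES, min_hits=2):
    """Assign every category whose evidence terms occur at least `min_hits` times in total.

    Matching is case-insensitive on whole words, and a trailing plural 's' is tolerated.
    """
    lowered = text.lower()
    tags = []
    for category, terms in rules.items():
        hits = sum(len(re.findall(rf"\b{re.escape(t)}s?\b", lowered)) for t in terms)
        if hits >= min_hits:
            tags.append(category)
    return tags

print(auto_tag("We study query processing in a distributed database and compare databases."))
print(auto_tag("A neural network for computer vision"))
```

Keeping the visible taxonomy shallow while the detailed vocabulary lives in rules like these is one way to read "simple taxonomies are better"; the threshold guards against tagging a document on a single passing mention.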

Questions?
Tom Reamy
KAPS Group
Knowledge Architecture Professional Services