Taxonomy and Text Analytics Case Studies Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services

Agenda
- Introduction
- Case Studies
  - Application: Faceted Search, Text Analytics
- Text Analytics
  - Elements
  - Approaches
- Project Process
  - Research Foundation
  - Taxonomy and Content
  - Text Analytics Development
- Conclusion

Introduction: KAPS Group
- Knowledge Architecture Professional Services: network of consultants
- Applied theory: faceted & emotion taxonomies, natural categories
- Services:
  - Strategy: IM & KM, Text Analytics, Social Media, Integration
  - Taxonomy/Text Analytics and Social Media development, consulting
  - Text Analytics Quick Start: Audit, Evaluation, Pilot
- Partners: Smartlogic, Expert System, SAS, SAP, IBM, FAST, Concept Searching, Attensity, Clarabridge, Lexalytics
- Clients: Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, World Bank, Dept. of Transportation, etc.
- Program Chair, Text Analytics World
- Presentations, Articles, White Papers –
- Current book: Text Analytics: How to Conquer Information Overload, Get Real Value from Social Media, and Add Smart Text to Big Data

Taxonomy Boot Camp: Case Studies
- DOT: adding text analytics to SharePoint Search
  - Fragmented environment: 51 DOTs, 5-15 districts
  - Project is the main organizational unit; wanted cross-project capability
- GAO and World Bank
  - Search, new enterprise taxonomy, added auto-categorization

Basic Solution: Taxonomy and Facets and Ontology
- Taxonomy of Subjects / Disciplines:
  - Engineering > Bridge Engineering > Bridge Design Standards
- Facets:
  - Organization > Division > Group
  - Clients > Federal > EPA
  - Equipment > Emergency Equipment > Firefighting Equipment
  - Location > District >
  - Items > Construction Tools > Asphalt Rake
  - Materials > Concrete > Mixed Concrete
  - Content Type: Formal Documents > Work Orders
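
Facet values like "Materials > Concrete > Mixed Concrete" are hierarchical paths, and faceted navigation usually rolls document counts up to every ancestor so that a parent value shows everything filed beneath it. The following is a minimal sketch of that roll-up, assuming each document carries a list of ">"-separated facet paths; the function names and sample document IDs are hypothetical, not taken from any particular search product.

```python
from collections import Counter


def parse_path(path: str) -> tuple[str, ...]:
    """Split a 'Parent > Child > Grandchild' facet path into its levels."""
    return tuple(part.strip() for part in path.split(">") if part.strip())


def facet_counts(doc_tags: dict[str, list[str]]) -> Counter:
    """Count documents under every node of every facet path.

    A document tagged 'Materials > Concrete > Mixed Concrete' also counts
    toward 'Materials' and 'Materials > Concrete'.
    """
    counts: Counter = Counter()
    for paths in doc_tags.values():
        nodes: set[tuple[str, ...]] = set()
        for path in paths:
            levels = parse_path(path)
            for depth in range(1, len(levels) + 1):
                nodes.add(levels[:depth])
        counts.update(nodes)  # each node counted at most once per document
    return counts


# Hypothetical documents tagged with paths from the facets above.
docs = {
    "wo-101": ["Materials > Concrete > Mixed Concrete",
               "Equipment > Emergency Equipment > Firefighting Equipment"],
    "wo-102": ["Items > Construction Tools > Asphalt Rake"],
    "std-7": ["Materials > Concrete"],
}
counts = facet_counts(docs)
print(counts[("Materials",)])                                # 2
print(counts[("Materials", "Concrete", "Mixed Concrete")])   # 1
```

Using a per-document set keeps a document that is tagged twice under the same branch from inflating the parent count.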

Discussion


Multi-dimensional and Smart
- Faceted navigation has become the norm
  - Facets require huge amounts of metadata
  - Entity / noun phrase extraction is fundamental
  - Automated, with disambiguation (through categorization)
- Taxonomy plays two roles: subjects/topics and facet structure
  - Complex facets and faceted taxonomies
- Clusters and tag clouds: discovery & exploration
- Auto-categorization: aboutness, subject facets
  - Still fundamental to the search experience
  - InfoApps are only as good as the fundamentals of search
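
Tag clouds support discovery by sizing terms according to how often they occur. Here is a minimal sketch, assuming simple whitespace-and-letter tokenization, a tiny illustrative stopword list, and linear scaling into font-size buckets; `tag_cloud` and its parameters are hypothetical names, and real tools typically add stemming and phrase detection (here "bridge" and "bridges" are counted separately).

```python
import re
from collections import Counter

# A deliberately tiny stopword list; a real deployment would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "for", "to", "in", "on", "is", "with"}


def tag_cloud(texts: list[str], top_n: int = 10, sizes: int = 5) -> dict[str, int]:
    """Map the top_n most frequent non-stopword terms to a font-size bucket.

    Bucket 1 is the smallest size and `sizes` the largest; buckets scale
    linearly between the least and most frequent term in the top_n.
    """
    counts = Counter(
        word
        for text in texts
        for word in re.findall(r"[a-z]+", text.lower())
        if word not in STOPWORDS
    )
    top = counts.most_common(top_n)  # ties keep first-seen order
    if not top:
        return {}
    most, least = top[0][1], top[-1][1]
    spread = (most - least) or 1  # all counts equal: avoid dividing by zero
    return {term: 1 + round((count - least) * (sizes - 1) / spread)
            for term, count in top}


docs = [
    "Bridge design standards for concrete bridges",
    "Concrete mix for bridge decks",
    "Asphalt rake and other construction tools",
]
print(tag_cloud(docs, top_n=3))  # {'bridge': 5, 'concrete': 5, 'design': 1}
```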

Elements of Text Analytics
- Text mining: NLP, statistical, predictive, machine learning
- Extraction: entities (known and unknown), concepts, events
- Semantic technology: ontology, fact extraction
- Sentiment analysis: positive/negative about products, companies, ?
- Auto-categorization
  - Training sets, terms
  - Rules
    - Simple: position in text (title, body, URL)
    - Boolean: full search syntax (AND, OR, NOT)
    - Advanced: DIST(#), ORDDIST(#), PARAGRAPH, SENTENCE
- Platform for multiple features
  - Sentiment, extraction
  - Disambiguation: identification of objects, events, context
  - Distinguish major and minor mentions
  - Model more subtle sentiment
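
To make the advanced rule operators concrete, the sketch below implements DIST-, ORDDIST-, SENTENCE- and PARAGRAPH-style tests in plain Python and combines them with Boolean and/or/not. The semantics are assumptions chosen for illustration (distance counted in token positions, sentences split after ".", "!" or "?", paragraphs split on blank lines); commercial categorization tools define these operators in their own rule syntax, and `bridge_design_rule` is a hypothetical category rule.

```python
import re


def tokens(text: str) -> list[str]:
    """Lower-cased word tokens; punctuation is dropped. Terms must be lower case."""
    return re.findall(r"[a-z0-9]+", text.lower())


def positions(toks: list[str], term: str) -> list[int]:
    return [i for i, tok in enumerate(toks) if tok == term]


def dist(text: str, a: str, b: str, n: int) -> bool:
    """DIST-style test: a and b at most n token positions apart, either order."""
    toks = tokens(text)
    return any(abs(i - j) <= n for i in positions(toks, a) for j in positions(toks, b))


def orddist(text: str, a: str, b: str, n: int) -> bool:
    """ORDDIST-style test: a precedes b, at most n token positions later."""
    toks = tokens(text)
    return any(0 < j - i <= n for i in positions(toks, a) for j in positions(toks, b))


def sentence(text: str, *terms: str) -> bool:
    """SENTENCE-style test: all terms occur together in one sentence."""
    return any(set(terms) <= set(tokens(s)) for s in re.split(r"(?<=[.!?])\s+", text))


def paragraph(text: str, *terms: str) -> bool:
    """PARAGRAPH-style test: all terms occur together in one paragraph."""
    return any(set(terms) <= set(tokens(p)) for p in re.split(r"\n\s*\n", text))


def bridge_design_rule(text: str) -> bool:
    """Hypothetical rule: (ORDDIST or SENTENCE) AND NOT card-game context."""
    return (
        (orddist(text, "bridge", "design", 3) or sentence(text, "bridge", "standards"))
        and not sentence(text, "bridge", "tournament")
    )


text = "The new bridge deck design follows state standards.\n\nAsphalt rakes are listed separately."
print(dist(text, "design", "bridge", 2))        # True
print(orddist(text, "design", "bridge", 5))     # False: order matters
print(paragraph(text, "bridge", "asphalt"))     # False: different paragraphs
print(bridge_design_rule(text))                 # True
```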

Adding Structure to Unstructured Content
- Beyond documents: categorization by corpus, by page, by section, or even by sentence or phrase
- Documents are not unstructured; they have a variety of structures
  - Sections: from specific (“Abstract”) to functional (“Evidence”)
  - Multiple text indicators in a categorization rule
- Corpus: document types/purpose
  - Textual complexity, level of generality
- Applications require sophisticated rules, not just categorization by similarity
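
Treating a document as a set of sections lets a rule weight where a term appears and categorize individual sections rather than the whole file. A minimal sketch, assuming documents arrive already split into named sections and using hypothetical weights (title 3, abstract 2, body 1); the weights and function names are illustrative assumptions, not values from the case studies.

```python
import re

# Hypothetical weights: a match in the title says more about what the
# document is about than a match deep in the body.
SECTION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}


def section_hits(text: str, terms: set[str]) -> int:
    """Number of tokens in this section that are category terms."""
    return sum(1 for word in re.findall(r"[a-z]+", text.lower()) if word in terms)


def score(doc: dict[str, str], terms: set[str]) -> float:
    """Weighted sum of term hits across sections (unknown sections weigh 1.0)."""
    return sum(SECTION_WEIGHTS.get(name, 1.0) * section_hits(text, terms)
               for name, text in doc.items())


def matching_sections(doc: dict[str, str], terms: set[str]) -> list[str]:
    """Categorize below the document level: which sections match on their own."""
    return [name for name, text in doc.items() if section_hits(text, terms) > 0]


doc = {
    "title": "Bridge Design Standards",
    "abstract": "Updated load standards for highway bridges.",
    "body": "Concrete mix guidance for decks and barriers.",
}
terms = {"bridge", "design", "standards"}
print(score(doc, terms))              # 11.0
print(matching_sections(doc, terms))  # ['title', 'abstract']
```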

Research Foundation: Quick Start Step One, Knowledge Audit
- Info problems: what they are, how severe
- Formal process: knowledge audit
  - Contextual & information interviews, content analysis, surveys, focus groups, ethnographic studies, text mining
- Informal process for smaller organizations or a specific application
- Category modeling
  - Cognitive science: how people think (Panda, Monkey, Banana)
  - Natural-level categories mapped to communities and activities
  - Novices prefer higher levels
  - Balance of informativeness and distinctiveness
- Strategic vision: text analytics and the information/knowledge environment

Text Analytics Development: Categorization Process
Start with Taxonomy and Content
- Starter taxonomy
  - If there is no taxonomy, develop (steal) an initial high-level one from textbooks, glossaries, intranet structure
  - Organization structure: facets, not taxonomy
- Analysis of taxonomy: is it suitable for categorization?
  - Structure: not too flat, not too large
  - Orthogonal categories
- Content selection
  - Map of all anticipated content
  - Selection of training sets, if possible
  - Automated selection of training sets: use taxonomy nodes as the first categorization rules, apply them, and get content
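
Two of the steps above lend themselves to a quick script: checking whether a starter taxonomy is too flat or too large, and using node labels as first-pass rules to pull candidate training content. The sketch below assumes the taxonomy is a nested dict and that a leaf label matching as a whole phrase is a good enough first rule; `TAXONOMY`, the sample documents and all function names are hypothetical.

```python
import re

# A hypothetical starter taxonomy as nested dicts; leaves map to {}.
TAXONOMY = {
    "Engineering": {
        "Bridge Engineering": {"Bridge Design Standards": {}, "Bridge Inspection": {}},
        "Pavement Engineering": {},
    },
}


def walk(tree: dict, path: tuple = ()):
    """Yield (path, children) for every node, depth-first."""
    for label, children in tree.items():
        node = path + (label,)
        yield node, children
        yield from walk(children, node)


def structure_stats(tree: dict) -> dict:
    """Rough size and shape checks for 'not too flat, not too large'."""
    nodes = list(walk(tree))
    parents = [children for _, children in nodes if children]
    return {
        "nodes": len(nodes),
        "max_depth": max((len(p) for p, _ in nodes), default=0),
        "avg_children": sum(len(c) for c in parents) / len(parents) if parents else 0.0,
    }


def seed_training_sets(tree: dict, docs: dict[str, str]) -> dict[str, list[str]]:
    """Use each leaf label as a first-pass rule and collect the docs it matches."""
    seeds = {}
    for path, children in walk(tree):
        if children:
            continue  # only leaves become seed rules here
        pattern = re.compile(r"\b" + re.escape(path[-1].lower()) + r"\b")
        seeds[path[-1]] = [doc_id for doc_id, text in docs.items()
                           if pattern.search(text.lower())]
    return seeds


docs = {
    "d1": "Revised bridge design standards for steel girders.",
    "d2": "Annual bridge inspection schedule.",
    "d3": "Pavement engineering field notes.",
}
print(structure_stats(TAXONOMY))  # {'nodes': 5, 'max_depth': 3, 'avg_children': 2.0}
print(seed_training_sets(TAXONOMY, docs))
```

The seeds are only candidates; per the process above, they are reviewed and then refined into real rules.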

Text Analytics Development: Categorization Process
- Start: term building from content
  - Basic set of terms that appear often or are important to the content
  - Auto-suggested and/or human-generated
- Add terms to the rule; get 90%+ recall
- Apply to a broader set of content; build back up to 90%+
- Apply to new types of content; build precision with rules
- Repeat, refine, repeat, refine, repeat
- Develop logic templates
- Test against more, new content; add more terms, refine the logic of rules
- Repeat until “done” (90%?)
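
The iterate-until-90% loop depends on measuring recall and precision for each rule version against a hand-labeled sample. A minimal sketch, assuming a small labeled set and treating 0.90 as a working target; the documents, rule versions and the `TARGET` constant are hypothetical, with version 1 built for recall and version 2 adding an exclusion to recover precision.

```python
import re

TARGET = 0.90  # the slide's "90%?" treated here as a working target, not a rule


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def evaluate(rule, docs: dict[str, str], relevant: set[str]) -> tuple[float, float]:
    """Return (recall, precision) of a rule against hand-labeled relevant doc ids."""
    matched = {doc_id for doc_id, text in docs.items() if rule(text)}
    hits = len(matched & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(matched) if matched else 0.0
    return recall, precision


def rule_v1(text: str) -> bool:
    # First pass: broad terms only, aiming for recall.
    return bool(words(text) & {"bridge", "bridges"})


def rule_v2(text: str) -> bool:
    # Refinement: exclude a known false-positive context to raise precision.
    return rule_v1(text) and not (words(text) & {"card", "tournament"})


docs = {
    "a": "Bridge design standards for highway bridges",
    "b": "Bridge deck design manual",
    "c": "Bridge inspection report, District 4",
    "d": "Card game bridge tournament rules",
}
relevant = {"a", "b", "c"}
for rule in (rule_v1, rule_v2):
    recall, precision = evaluate(rule, docs, relevant)
    print(rule.__name__, recall, precision, recall >= TARGET and precision >= TARGET)
# rule_v1 1.0 0.75 False
# rule_v2 1.0 1.0 True
```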

Text Analytics Development: Entity Extraction Process
- Facet design: from the knowledge audit, K map
- Find and convert catalogs:
  - Organization: internal resources
  - People: corporate yellow pages, HR
  - Include variants
  - Scripts to convert catalogs (programming resource)
- Text mining: terms, subject matter experts
- Build initial rules: follow the categorization process
  - Differences: scale, threshold (application dependent)
  - Recall vs. precision: balance set by application
  - Issue: disambiguation (Ford: company, person, car)
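
A converted catalog maps each canonical entity to its variants, and ambiguous names such as "Ford" need context to resolve. The sketch below assumes exact variant matching on normalized tokens and a simple context-window vote over hand-picked cue words; the catalog entries, cue lists and function names are hypothetical, and production extractors use richer rules or trained models for disambiguation.

```python
import re

# Hypothetical catalog converted from internal resources: canonical name -> variants.
CATALOG = {
    "Federal Highway Administration": ["federal highway administration", "fhwa"],
    "Environmental Protection Agency": ["environmental protection agency", "epa"],
}

# Hypothetical context cues for the ambiguous name "Ford".
FORD_CUES = {
    "company": {"shares", "motor", "company", "ceo", "earnings"},
    "person": {"mr", "president", "gerald", "harrison", "said"},
    "car": {"drove", "truck", "pickup", "sedan", "mustang"},
}


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_entities(text: str) -> set[str]:
    """Map any catalog variant found in the text to its canonical name."""
    padded = " " + " ".join(tokens(text)) + " "
    return {name for name, variants in CATALOG.items()
            if any(f" {v} " in padded for v in variants)}


def disambiguate(text: str, term: str, cues: dict[str, set[str]], window: int = 3) -> list[str]:
    """Label each mention of `term` with the sense whose cue words dominate nearby.

    Mentions with no cue words in the window, or a tie between senses, are 'unknown'.
    """
    toks = tokens(text)
    labels = []
    for i, tok in enumerate(toks):
        if tok != term:
            continue
        context = set(toks[max(0, i - window): i + window + 1])
        scores = {sense: len(context & cue_words) for sense, cue_words in cues.items()}
        best = max(scores.values())
        winners = [sense for sense, s in scores.items() if s == best]
        labels.append(winners[0] if best > 0 and len(winners) == 1 else "unknown")
    return labels


print(sorted(extract_entities("The FHWA and the EPA reviewed the guardrail.")))
# ['Environmental Protection Agency', 'Federal Highway Administration']
text = "Harrison Ford said he drove a Ford pickup to the site. Later, Ford shares rose after earnings."
print(disambiguate(text, "ford", FORD_CUES))  # ['person', 'car', 'company']
```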

Conclusion
- Think Big, Start Small, Scale Fast: strategic foundation
- Faceted search works, but requires metadata+
- Combination of data & text, structured & unstructured
- Taxonomy design
  - Small modules
  - Part of an ontology
  - Subject + multiple single facets
- Text analytics is a platform: search and applications
  - FOIA requests: all projects in which a model Guardrail from Supplier Y was installed; when, location (Route 29)
- Taxonomy is Dead! Long Live Catonomy!
  - Mind the gap: need categorization (Don’t Just Sit There!)

Questions?