Text Analytics Software Choosing the Right Fit Tom Reamy Chief Knowledge Architect KAPS Group Text Analytics World October 20.

Slides:



Advertisements
Similar presentations
Taxonomy Development in an Enterprise Context Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Advertisements

Taxonomy Development An Infrastructure Model Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Top Tips Enterprise Content Management Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Metadata Strategies Alternatives for creating value from metadata Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services.
Improving Navigation and Findability Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Beyond Sentiment New Dimensions for Social Media A Panel Discussion of Trends and Ideas Dave Hills, Twelvefold Media Mike Lazarus, Atigeo, LLC Moderator:
Copyright © 2012, SAS Institute Inc. All rights reserved. #analytics2012 Quick Start for Text Analytics Tom Reamy Chief Knowledge Architect KAPS Group.
Enterprise Information Architecture A Platform for Integrating Your Organization’s Information and Knowledge Activities Tom Reamy Chief Knowledge Architect.
Search, Browse, and Faceted Navigation Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Faceted Navigation: Search and Browse Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Taxonomy Development Case Studies
Innovation in Search? Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Model of Taxonomy Development Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Knowledge Architecture Process & Case Studies Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Semantic Infrastructure Workshop Development Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Semantic Infrastructure Workshop Development Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Taxonomy Boot Camp Panel Text Analytics Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Improving Search for Discovery Tom Reamy Chief Knowledge Architect KAPS Group Program Chair – Text Analytics World Knowledge Architecture Professional.
Automatic Facets: Faceted Navigation and Entity Extraction Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services.
Copyright © 2011, SAS Institute Inc. All rights reserved. #analytics2011 Text Analytics Evaluation A Case Study: Amdocs Tom Reamy Chief Knowledge Architect.
Beyond Sentiment Mining Social Media A Panel Discussion of Trends and Ideas Marie Wallace, IBM Marcello Pellacani, Expert System Fabio Lazzarini, CRIBIS.
Enterprise Semantic Infrastructure Workshop Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Beyond Sentiment Mining Social Media Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Facets and Faceted Navigation Development Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Expanding Enterprise Roles for Librarians Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Text Analytics Workshop Development Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Best of Both Worlds Text Analytics and Text Mining Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Unstructured Content Management Taxonomic Publishing Models Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services.
Selecting Taxonomy Software Who, Why, How Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Taxonomy and Knowledge Organization Taxonomy in Context Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Cyborg Categorization Salvation for Search? Tom Reamy Information Architect Charles Schwab © 2001 Charles Schwab & Co., Inc., member NYSE/SIPC. All rights.
Building a Foundation for Info Apps Tom Reamy Chief Knowledge Architect KAPS Group Program Chair – Text Analytics World Knowledge Architecture Professional.
Enterprise Search/ Text Analytics Evaluation Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Text Analytics And Text Mining Best of Text and Data
Best of All Worlds Text Analytics and Text Mining and Taxonomy Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services.
New Directions in Social Media Tom Reamy Chief Knowledge Architect KAPS Group
SemTech Text Analytics Evaluation Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Text Analytics and Taxonomies Tom Reamy Chief Knowledge Architect KAPS Group
Smart Text How to Turn Big Text into Big Data Tom Reamy Chief Knowledge Architect KAPS Group Program Chair – Text Analytics World.
Adding Semantics to Enterprise Search Workshop
Mashup Mindset Moving Mashups to Next Level Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Integrating an Enterprise Taxonomy with Local Variations Tom Reamy Chief Knowledge Architect KAPS Group Program Chair – Text Analytics World Knowledge.
Applying Semantics to Search Text Analytics Tom Reamy Chief Knowledge Architect KAPS Group Enterprise Search Summit New York.
Text Analytics Workshop Applications Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Taxonomy and Social Media Social Taxonomies Tom Reamy Chief Knowledge Architect KAPS Group Program Chair – Text Analytics World Knowledge Architecture.
Content Categorization Tools Taxonomies & Technologies for Infrastructure Solutions Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture.
Text Analytics Summit Text Analytics Evaluation Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
New Directions in Social Media Tom Reamy Chief Knowledge Architect KAPS Group
Metadata and Taxonomies The Best of Both Worlds Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Integrating an Enterprise Taxonomy with Local Variations Tom Reamy Chief Knowledge Architect KAPS Group Taxonomy Boot Camp.
Text Analytics Mini-Workshop Quick Start Tom Reamy Chief Knowledge Architect KAPS Group Program Chair – Text Analytics World Knowledge Architecture Professional.
Enterprise Semantic Infrastructure Workshop Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Folksonomy Folktales Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Selecting Taxonomy Software Who, Why, How Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Text Analytics Workshop Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Advanced Semantics and Search Beyond Tag Clouds and Taxonomies Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services.
Text Analytics for Search Applications Workshop Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Text Analytics A Tool for Taxonomy Development Tom Reamy Chief Knowledge Architect KAPS Group Program Chair – Text Analytics World Knowledge Architecture.
Text Analytics Workshop Applications Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Text Analytics Workshop Tom Reamy Chief Knowledge Architect KAPS Group Program Chair – Text Analytics World Knowledge Architecture Professional Services.
Taxonomy and Text Analytics Case Studies Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Taxonomy Development An Infrastructure Model Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services
Text Analytics Webinar
Tom Reamy Chief Knowledge Architect KAPS Group
Tom Reamy Chief Knowledge Architect KAPS Group
Enterprise Social Networks A New Semantic Foundation
Program Chair: Tom Reamy Chief Knowledge Architect
Text Analytics Workshop: Introduction
Text Analytics Workshop
Program Chair: Tom Reamy Chief Knowledge Architect
Presentation transcript:

Text Analytics Software Choosing the Right Fit Tom Reamy Chief Knowledge Architect KAPS Group Text Analytics World October 20 New York

2 Agenda  Introduction – Text Analytics Basics  Evaluation Process & Methodology – Two Stages – Initial Filters & POC  Proof of Concept – Methodology – Results  Text Analytics and “Text Analytics”  Conclusions

3 KAPS Group: General  Knowledge Architecture Professional Services  Virtual Company: Network of consultants – 8-10  Partners – SAS, SAP, FAST, Smart Logic, Concept Searching, etc.  Consulting, Strategy, Knowledge architecture audit  Services: – Taxonomy/Text Analytics development, consulting, customization – Evaluation of Enterprise Search, Text Analytics – Text Analytics Assessment, Fast Start – Technology Consulting – Search, CMS, Portals, etc. – Knowledge Management: Collaboration, Expertise, e-learning – Applied Theory – Faceted taxonomies, complexity theory, natural categories

4 Introduction to Text Analytics Text Analytics Features  Noun Phrase Extraction – Catalogs with variants, rule based dynamic – Multiple types, custom classes – entities, concepts, events – Feeds facets  Summarization – Customizable rules, map to different content  Fact Extraction – Relationships of entities – people-organizations-activities – Ontologies – triples, RDF, etc.  Sentiment Analysis – Rules – Objects and phrases

5 Introduction to Text Analytics Text Analytics Features  Auto-categorization – Training sets – Bayesian, Vector space – Terms – literal strings, stemming, dictionary of related terms – Rules – simple – position in text (Title, body, url) – Semantic Network – Predefined relationships, sets of rules – Boolean– Full search syntax – AND, OR, NOT – Advanced – DIST(#), ORDDIST#, PARAGRAPH, SENTENCE  This is the most difficult to develop  Build on a Taxonomy  Combine with Extraction – If any of list of entities and other words

Case Study – Categorization & Sentiment 6

7

8

Evaluation Process & Methodology Overview  Start with Self Knowledge – Think Big, Start Small, Scale Fast  Eliminate the unfit – Filter One- Ask Experts - reputation, research – Gartner, etc. Market strength of vendor, platforms, etc. Feature scorecard – minimum, must have, filter to top 3 – Filter Two – Technology Filter – match to your overall scope and capabilities – Filter not a focus – Filter Three – In-Depth Demo – 3-6 vendors  Deep POC (2) – advanced, integration, semantics  Focus on working relationship with vendor. 9

Design of the Text Analytics Selection Team Traditional Candidates – IT&, Business, Library  IT - Experience with software purchases, needs assess, budget – Search/Categorization is unlike other software, deeper look  Business -understand business, focus on business value  They can get executive sponsorship, support, and budget – But don’t understand information behavior, semantic focus  Library, KM - Understand information structure  Experts in search experience and categorization – But don’t understand business or technology 10

Design of the Text Analytics Selection Team  Interdisciplinary Team, headed by Information Professionals  Relative Contributions – IT – Set necessary conditions, support tests – Business – provide input into requirements, support project – Library – provide input into requirements, add understanding of search semantics and functionality  Much more likely to make a good decision  Create the foundation for implementation 11

Evaluating Taxonomy/Text Analytics Software Start with Self Knowledge  Strategic and Business Context  Info Problems – what, how severe  Strategic Questions – why, what value from the text analytics, how are you going to use it – Platform or Applications?  Formal Process - KA audit – content, users, technology, business and information behaviors, applications - Or informal for smaller organization,  Text Analytics Strategy/Model – forms, technology, people – Existing taxonomic resources, software  Need this foundation to evaluate and to develop 12

13 Varieties of Taxonomy/ Text Analytics Software  Taxonomy Management – Synaptica, SchemaLogic  Full Platform – SAS, SAP, Smart Logic, Linguamatics, Concept Searching, Expert System, IBM, GATE  Embedded – Search or Content Management – FAST, Autonomy, Endeca, Exalead, etc. – Nstein, Interwoven, Documentum, etc.  Specialty / Ontology (other semantic) – Sentiment Analysis – Lexalytics, Clarabridge, Lots of players – Ontology – extraction, plus ontology

Vendors of Taxonomy/ Text Analytics Software – Attensity – Business Objects – Inxight – Clarabridge – ClearForest – Concept Searching – Data Harmony / Access Innovations – Expert Systems – GATE (Open Source) – IBM Infosphere – Lexalytics – Multi-Tes – Nstein – SAS – SchemaLogic – Smart Logic – Synaptica 14

15 Initial Evaluation – Factors Traditional Software Evaluation - Deeper  Basic & Advanced Capabilities  Lack of Essential Feature – No Sentiment Analysis, Limited language support  Customization vs. OOB – Strongest OOB – highest customization cost  Company experience, multiple products vs. platform  Ease of integration – API’s, Java – Internal and External Applications – Technical Issues, Development Environment  Total Cost of Ownership and support, initial price  POC Candidates – 1-4

16 Initial Evaluation – Factors Case Studies  Amdocs – Customer Support Notes – short, badly written, millions of documents – Total Cost, multiple languages, Integration with their application – Distributed expertise – Platform – resell full range of services, Sentiment Analysis – Twenty to Four to POC (Two) to SAS  GAO – Library of 200 page PDF formal documents, plus public web site – People – library staff – 3-4 taxonomists – centralized expertise – Enterprise search, general public – Twenty to POC with SAS

Phase II - Proof Of Concept - POC  Measurable Quality of results is the essential factor  4 weeks POC – bake off / or short pilot  Real life scenarios, categorization with your content  2 rounds of development, test, refine / Not OOB  Need SME’s as test evaluators – also to do an initial categorization of content  Majority of time is on auto-categorization  Need to balance uniformity of results with vendor unique capabilities – have to determine at POC time  Taxonomy Developers – expert consultants plus internal taxonomists 17

18 POC Design: Evaluation Criteria & Issues  Basic Test Design – categorize test set – Score – by file name, human testers  Categorization & Sentiment – Accuracy 80-90% – Effort Level per accuracy level  Quantify development time – main elements  Comparison of two vendors – how score? – Combination of scores and report  Quality of content & initial human categorization – Normalize among different test evaluators  Quality of taxonomists – experience with text analytics software and/or experience with content and information needs and behaviors  Quality of taxonomy – structure, overlapping categories

Text Analytics POC Outcomes Evaluation Factors  Variety & Limits of Content – Twitter to large formal libraries  Quality of Categorization – Scores – Recall, Precision (harder) – Operators – NOT, DIST, START,  Development Environment & Methodology – Toolkit or Integrated Product – Effort Level and Usability  Importance of relevancy – can be used for precision, applications  Combination of workbench, statistical modeling  Measures – scores, reports, discussions 19

POC and Early Development: Risks and Issues  CTO Problem –This is not a regular software process  Semantics is messy not just complex – 30% accuracy isn’t 30% done – could be 90%  Variability of human categorization  Categorization is iterative, not “the program works” – Need realistic budget and flexible project plan  Anyone can do categorization – Librarians often overdo, SME’s often get lost (keywords)  Meta-language issues – understanding the results – Need to educate IT and business in their language 20

Text Analytics and “Text Analytics” – Text Mining  TA is pre-processing for text mining  TA adds huge dimensions of unstructured text – Now 85-90% of all content, Social Media  TA can improve the quality of text – Categorization, Disambiguated metadata extraction  Unstructured text into data - What are the possibilities? – New Kinds of Taxonomies – emotion, small smart modular – Information Overload – search, facets, auto-tagging, etc. – Behavior Prediction – individual actions (cancel or not?) – Customer & Business Intelligence – new relationships – Crowd sourcing – technical support – Expertise Analysis – documents, authors, communities 21

Conclusion  Start with self-knowledge – what will you use it for? – Current Environment – technology, information  Basic Features are only filters, not scores  Integration – need an integrated team (IT, Business, KA) – For evaluation and development  POC – your content, real world scenarios – not scores  Foundation for development, experience with software – Development is better, faster, cheaper  Categorization is essential, time consuming  Text Analytics opens up new worlds of applications 22

Questions? Tom Reamy KAPS Group Knowledge Architecture Professional Services