Enterprise Search / Text Analytics Evaluation
Tom Reamy, Chief Knowledge Architect
KAPS Group, Knowledge Architecture Professional Services


2 Agenda – Part II
• Introduction
  – Text Analytics Can Rescue Search
  – Elements of Text Analytics
• Evaluating Text Analytics Software
  – Varieties of Software – platform and embedded
• Three Phase Process / Two Examples
  – Initial Evaluation – 4 vendors
  – Demos – from 4 to 2
  – 4-6 week POC – 2 vendors
• Conclusions

3 KAPS Group: General
• Knowledge Architecture Professional Services
• Virtual Company: Network of consultants – 8-10
• Partners – SAS, SAP, FAST, Smart Logic, Concept Searching, etc.
• Consulting, Strategy, Knowledge architecture audit
• Services:
  – Taxonomy/Text Analytics development, consulting, customization
  – Technology Consulting – Search, CMS, Portals, etc.
  – Evaluation of Enterprise Search, Text Analytics
  – Metadata standards and implementation
  – Knowledge Management: Collaboration, Expertise, e-learning
  – Applied Theory – Faceted taxonomies, complexity theory, natural categories

4 Introduction: Text Analytics to the Rescue
• Enterprise Search is Dead!
• Taxonomy is Dead!
• Long Live Text Analytics!
• ECM and ES failed because search is about semantics / meaning, not technology
• Taxonomy failed because it is too rigid and too dumb
• Metadata failed because it is too hard to do for authors (not really)
• They all failed because Search is a semantic infrastructure element, not a project

5 Introduction to Text Analytics: Text Analytics Features
• Noun Phrase Extraction
  – Catalogs with variants, rule based dynamic
  – Multiple types, custom classes – entities, concepts, events
  – Feeds facets
• Summarization
  – Customizable rules, map to different content
• Fact Extraction
  – Relationships of entities – people-organizations-activities
  – Ontologies – triples, RDF, etc.
• Sentiment Analysis
  – Rules – Objects and phrases

6 Introduction to Text Analytics: Text Analytics Features
• Auto-categorization
  – Training sets – Bayesian, Vector space
  – Terms – literal strings, stemming, dictionary of related terms
  – Rules – simple – position in text (Title, body, url)
  – Semantic Network – Predefined relationships, sets of rules
  – Boolean – Full search syntax – AND, OR, NOT
  – Advanced – NEAR (#), PARAGRAPH, SENTENCE
• This is the most difficult to develop
• Build on a Taxonomy
• Combine with Extraction
  – If any of list of entities and other words
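The Boolean and NEAR rule types on this slide can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the rule format, the function names (`tokenize`, `near`, `matches_rule`) and the "billing complaint" category are invented, and each commercial categorization tool has its own rule syntax.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def near(tokens, a, b, distance):
    """True if terms a and b occur within `distance` tokens of each other."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)

def matches_rule(text, all_of=(), any_of=(), none_of=(), near_pairs=()):
    """Hypothetical rule format: AND over all_of, OR over any_of,
    NOT over none_of, plus NEAR constraints as (term_a, term_b, distance)."""
    tokens = tokenize(text)
    vocab = set(tokens)
    if not all(t in vocab for t in all_of):
        return False
    if any_of and not any(t in vocab for t in any_of):
        return False
    if any(t in vocab for t in none_of):
        return False
    return all(near(tokens, a, b, d) for a, b, d in near_pairs)

# Illustrative category rule, invented for this sketch.
billing_rule = dict(
    all_of=("invoice",),
    any_of=("wrong", "incorrect", "overcharged"),
    none_of=("resolved",),
    near_pairs=(("invoice", "wrong", 5),),
)

note = "Customer says the invoice total is wrong for March."
print(matches_rule(note, **billing_rule))
```

PARAGRAPH and SENTENCE operators could be handled the same way by splitting the text into units first and applying the rule to each unit.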


12 Varieties of Taxonomy / Text Analytics Software
• Taxonomy Management
  – Synaptica, SchemaLogic
• Full Platform
  – SAS-Teragram, SAP-Inxight, Clear Forest, Smart Logic, Data Harmony, Concept Searching, Expert System, IBM, GATE
• Content Management
  – Nstein, Interwoven, Documentum, etc.
• Embedded – Search
  – FAST, Autonomy, Endeca, Exalead, etc.
• Specialty
  – Sentiment Analysis – Lexalytics

13 Evaluation Process & Methodology
• Start with Self Knowledge
  – Think Big, Start Small, Scale Fast
• Eliminate the unfit
  – Filter One – Ask Experts – reputation, research – Gartner, etc.
      • Market strength of vendor, platforms, etc.
      • Feature scorecard – minimum, must have, filter to top 3-6
  – Filter Two – Technology Filter – match to your overall scope and capabilities – Filter not a focus
  – Filter Three – In-Depth Demo – 3-6 vendors
      • Beyond “Yes, we have that feature.”
• Deep POC (2) – advanced, integration, semantics
• Focus on working relationship with vendor.
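The feature scorecard in Filter One could be sketched in Python as follows. The must-have set, the weights, the 0-5 scoring scale and the vendor entries are all hypothetical placeholders, not data from the evaluations described in this deck.

```python
from dataclasses import dataclass

@dataclass
class Vendor:
    name: str
    features: dict  # feature name -> score from 0 (absent) to 5 (strong)

# Hypothetical must-have features and weights; replace with your own scope.
MUST_HAVE = {"categorization", "entity_extraction"}
WEIGHTS = {"categorization": 3, "entity_extraction": 2, "sentiment": 1, "usability": 1}

def passes_must_have(vendor, minimum=3):
    """A vendor is eliminated if any must-have feature scores below minimum."""
    return all(vendor.features.get(f, 0) >= minimum for f in MUST_HAVE)

def weighted_score(vendor):
    """Weighted sum of feature scores; missing features count as 0."""
    return sum(w * vendor.features.get(f, 0) for f, w in WEIGHTS.items())

def shortlist(vendors, top_n=3):
    """Drop vendors that fail a must-have, then keep the top_n by score."""
    fit = [v for v in vendors if passes_must_have(v)]
    return sorted(fit, key=weighted_score, reverse=True)[:top_n]

# Invented example entries.
vendors = [
    Vendor("Vendor 1", {"categorization": 4, "entity_extraction": 4, "sentiment": 2, "usability": 3}),
    Vendor("Vendor 2", {"categorization": 5, "entity_extraction": 3, "sentiment": 0, "usability": 5}),
    Vendor("Vendor 3", {"categorization": 2, "entity_extraction": 5, "sentiment": 5, "usability": 5}),
]

for v in shortlist(vendors):
    print(v.name, weighted_score(v))
```

Treating must-haves as a hard filter before scoring keeps a vendor with many secondary strengths from outranking one that covers the core requirement.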

14 Evaluation Process & Methodology: Initial Evaluation – Basic Requirements
• Platform – range of capabilities
  – Categorization, Sentiment analysis, etc.
• Technical
  – Search evaluation + Integration – APIs, Java based, Linux run time
  – Scalability – millions of documents a day
  – Import-Export – XML, RDF
• Usability
• Multiple Language Support

15 Evaluating Text Analytics Software: Initial Evaluation – Usability
• Ease of use – copy, paste, rename, merge, etc.
• User Documentation, user manuals, on-line help, training and tutorials
• Visualization – file structure, tree, Hierarchy and alphabetical
• Automatic Taxonomy/Node & Rule Generation
  – Nonsense for Taxonomy
  – Node – suggestions for sub-categories, rules
• Variety of node relationships – child-parent, related

16 Initial Evaluation: Example Outcomes
• Filter One:
  – Company A, B – sentiment analysis focus, weak categorization
  – Company C – Lack of full suite of text analytics
  – Company D – business concerns, support
  – Open Source – license issues
  – Ontology Vendors – missing categorization capabilities
• 4 Demos
  – Saw a variety of different approaches, but
  – Company X – lacking sentiment analysis, require 2 vendors
  – Company Y – lack of language support, development cost

17 Evaluating Taxonomy Software: Proof Of Concept – POC
• Quality of results is the essential factor
• 4-6 weeks POC – bake off / or short pilot
• Real life scenarios, categorization with your content
• Preparation:
  – Preliminary analysis of content and users' information needs
  – Set up software in lab – relatively easy
  – Train taxonomist(s) on software(s)
  – Develop taxonomy if none available
• 4-6 week POC – 2-3 rounds of development, test, refine / Not OOB
• Need SMEs as test evaluators – also to do an initial categorization of content

18 Evaluating Taxonomy Software: POC
• Majority of time is on auto-categorization
• Need to balance uniformity of results with vendor unique capabilities – have to determine at POC time
• Risks – getting software installed and working, getting the right content, initial categorization of content
• Elements:
  – Content
  – Search terms / search scenarios
  – Training sets
  – Test sets of content
• Taxonomy Developers – expert consultants plus internal taxonomists

19 Evaluating Taxonomy Software: POC Feature Test Cases
• Auto-categorization to existing taxonomy – variety of content
• Clustering – automatic node generation
• Summarization
• Entity extraction – build a number of catalogs – People, products, etc.
• Sentiment Analysis – products and categorization rules
• Evaluate usability in action by taxonomists
• Question – Integration with Ontologies?
• Technical / integration – Output in XML, APIs
• Map above to Client use cases

20 POC Design Discussion: Evaluation Criteria
• Basic Test Design – categorize test set
  – Score – by file name, human testers
  – Accuracy Level – 80-90%
  – Effort Level per accuracy level
• Sentiment Analysis
  – Accuracy Level – 80-90%
  – Effort Level per accuracy level
• Quantify development time – main elements
• Comparison of two vendors – how to score?
  – Combination of scores and report
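A basic test design of this kind, scoring each vendor's categorization by file name against the human testers' labels and setting it beside development effort, might look like the Python sketch below. The file names, categories, hour counts and the 0.8 target are invented for illustration; the slide gives only the 80-90% accuracy range.

```python
def accuracy(human, predicted):
    """Share of files whose predicted category matches the human label.
    Files missing from `predicted` count as misses."""
    if not human:
        raise ValueError("no human-labelled files to score against")
    hits = sum(1 for name, label in human.items() if predicted.get(name) == label)
    return hits / len(human)

def compare_vendors(human, runs, hours, target=0.8):
    """Report accuracy, development hours and target status per vendor."""
    report = {}
    for vendor, predicted in runs.items():
        acc = accuracy(human, predicted)
        report[vendor] = {
            "accuracy": acc,
            "hours": hours[vendor],
            "meets_target": acc >= target,
        }
    return report

# Invented test set: file name -> category assigned by human testers.
human = {
    "doc1.pdf": "Energy", "doc2.pdf": "Defense", "doc3.pdf": "Health",
    "doc4.pdf": "Energy", "doc5.pdf": "Health",
}
# Invented vendor output for the same files.
runs = {
    "vendor_1": {"doc1.pdf": "Energy", "doc2.pdf": "Defense", "doc3.pdf": "Health",
                 "doc4.pdf": "Energy", "doc5.pdf": "Energy"},
    "vendor_2": {"doc1.pdf": "Energy", "doc2.pdf": "Health", "doc3.pdf": "Health",
                 "doc4.pdf": "Defense", "doc5.pdf": "Health"},
}
hours = {"vendor_1": 40, "vendor_2": 25}  # development effort, invented

report = compare_vendors(human, runs, hours)
for vendor, row in report.items():
    print(vendor, row)
```

Keeping hours next to accuracy reflects the "effort level per accuracy level" criterion: the numbers feed the comparison report rather than replace it.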

21 Evaluating Taxonomy Software: POC – Issues
• Quality of content
• Quality of initial human categorization
• Normalize among different test evaluators
• Quality of taxonomists – experience with text analytics software and/or experience with content and information needs and behaviors
• Quality of taxonomy
  – General issues – structure (too flat or too deep)
  – Overlapping categories
  – Differences in use – browse, index, categorize
• Foundation for Development
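One way to check how far test evaluators diverge before normalizing their scores is an agreement statistic. The Python sketch below uses Cohen's kappa for two evaluators; that choice is an assumption of this example, not a method named in the slides, and the labels are invented.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two evaluators labelling the same items in order.
    kappa = (observed - expected) / (1 - expected)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each evaluator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both evaluators used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Invented labels from two evaluators for the same four documents.
evaluator_a = ["Energy", "Energy", "Health", "Health"]
evaluator_b = ["Energy", "Health", "Health", "Health"]
print(cohens_kappa(evaluator_a, evaluator_b))
```

A low value suggests the category definitions or the evaluator briefing need work before vendor accuracy scores are compared.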

22 Text Analytics Evaluation: Context Dependent – Tale of Two POCs
• Taxonomy
  – GAO – flat, not very orthogonal, political concerns
      • Department centric, subject
  – Company A – flat, two taxonomies, technical – open to change?
      • Action oriented, activities and events
• Content
  – GAO – giant 200 page PDF formal documents, variety
      • Start_200, Title
  – Company A – short cryptic customer support notes, Social Media
      • Creative spelling, combination of formal and individual
• Culture / Applications
  – GAO – Formal Federal – big applications, infrastructure
  – Company A – informal, technical, lots of small apps

23 Text Analytics Evaluation – Conclusions
• Text Analytics with Search and ECM can finally deliver the promise
• Search / Text analytics is not like other software – meaning
• Only way to deal with meaning is in context – your context
  – Technical issues are filters, not features
• Enterprise Text Analytics (Platform) is best
  – If you have to start with one application, plan for the platform
• POC seems expensive, but is cheaper than making a dead end choice
  – It is also the foundation and head start on development

Questions? Tom Reamy KAPS Group Knowledge Architecture Professional Services