1 Enterprise Search / Text Analytics Evaluation
Tom Reamy
Chief Knowledge Architect
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com

2 Agenda – Part II
• Introduction
– Text Analytics Can Rescue Search
– Elements of Text Analytics
• Evaluating Text Analytics Software
– Varieties of Software – platform and embedded
• Three-Phase Process / Two Examples
– Initial Evaluation – 4 vendors
– Demos – from 4 to 2
– 4–6 week POC – 2 vendors
• Conclusions

3 KAPS Group: General
• Knowledge Architecture Professional Services
• Virtual company: network of 8–10 consultants
• Partners – SAS, SAP, FAST, Smart Logic, Concept Searching, etc.
• Consulting, strategy, knowledge architecture audits
• Services:
– Taxonomy / text analytics development, consulting, customization
– Technology consulting – search, CMS, portals, etc.
– Evaluation of enterprise search and text analytics software
– Metadata standards and implementation
– Knowledge management: collaboration, expertise, e-learning
– Applied theory – faceted taxonomies, complexity theory, natural categories

4 Introduction: Text Analytics to the Rescue
• Enterprise Search is dead!
• Taxonomy is dead!
• Long live Text Analytics!
• ECM and enterprise search failed because search is about semantics and meaning, not technology
• Taxonomy failed because it is too rigid and too dumb
• Metadata failed because it is too hard for authors to do (not really)
• They all failed because search is a piece of semantic infrastructure, not a project

5 Introduction to Text Analytics: Text Analytics Features
• Noun phrase extraction (see the extraction sketch below)
– Catalogs with variants; rule-based, dynamic
– Multiple types, custom classes – entities, concepts, events
– Feeds facets
• Summarization
– Customizable rules, mapped to different content
• Fact extraction
– Relationships of entities – people–organizations–activities
– Ontologies – triples, RDF, etc.
• Sentiment analysis
– Rules
– Objects and phrases
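Entity and noun phrase extraction is the most concrete of these features. As a hedged illustration (not part of the original deck), the sketch below uses the open-source spaCy library – not one of the vendor platforms discussed here – to pull noun chunks and named entities from a sentence, roughly the kind of typed output that feeds facets:

```python
# Minimal noun-phrase / entity extraction sketch using spaCy (illustrative only;
# not one of the vendor platforms evaluated in this talk).
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("KAPS Group evaluated text analytics platforms from SAS and SAP "
          "during a six-week proof of concept in Washington.")

# Noun phrases: candidate facet values and taxonomy terms
for chunk in doc.noun_chunks:
    print("NP:", chunk.text)

# Named entities: typed extraction (people, organizations, places, dates)
for ent in doc.ents:
    print("ENTITY:", ent.text, "->", ent.label_)
```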

6 Introduction to Text Analytics: Text Analytics Features
• Auto-categorization
– Training sets – Bayesian, vector space
– Terms – literal strings, stemming, dictionaries of related terms
– Rules – simple – position in text (title, body, URL)
– Semantic networks – predefined relationships, sets of rules
– Boolean – full search syntax – AND, OR, NOT
– Advanced – NEAR (#), PARAGRAPH, SENTENCE (see the rule sketch below)
• This is the most difficult feature to develop
• Build on a taxonomy
• Combine with extraction – e.g., fire a category if any of a list of entities appears with certain other words
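To make the rule syntax concrete, here is a minimal, self-contained sketch of a Boolean categorization rule with a NEAR operator. The operators mirror the ones named above, but the syntax and word lists are invented for illustration – this is not any vendor's actual rule language:

```python
# Toy Boolean/NEAR categorization rule. Illustrative sketch only, not any
# vendor's actual rule language.
import re

def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())

def near(tokens, term_a, term_b, max_distance):
    """True if term_a and term_b occur within max_distance tokens of each other."""
    pos_a = [i for i, tok in enumerate(tokens) if tok == term_a]
    pos_b = [i for i, tok in enumerate(tokens) if tok == term_b]
    return any(abs(a - b) <= max_distance for a in pos_a for b in pos_b)

def matches_battery_safety(text):
    """Hypothetical rule: "battery" NEAR/5 "fire" AND NOT "recall"."""
    tokens = tokenize(text)
    return near(tokens, "battery", "fire", 5) and "recall" not in tokens

doc = "Customer reports the battery caught fire after charging overnight."
print(matches_battery_safety(doc))  # True -> assign the Battery Safety category
```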

[Slides 7–11: image-only slides – no transcript text available]

12 Varieties of Taxonomy / Text Analytics Software
• Taxonomy management – Synaptica, SchemaLogic
• Full platform – SAS (Teragram), SAP (Inxight), ClearForest, Smart Logic, Data Harmony, Concept Searching, Expert System, IBM, GATE
• Content management – Nstein, Interwoven, Documentum, etc.
• Embedded in search – FAST, Autonomy, Endeca, Exalead, etc.
• Specialty – sentiment analysis – Lexalytics

13 Evaluation Process & Methodology
• Start with self-knowledge – think big, start small, scale fast
• Eliminate the unfit:
– Filter One – ask the experts – reputation and research (Gartner, etc.); market strength of the vendor, platforms, etc.; feature scorecard – minimum and must-have features; filter to the top 3–6
– Filter Two – technology filter – match to your overall scope and capabilities – a filter, not a focus
– Filter Three – in-depth demos of 3–6 vendors – get beyond "Yes, we have that feature."
• Deep POC with 2 vendors – advanced features, integration, semantics
• Focus on the working relationship with the vendor

14 Evaluation Process & Methodology: Initial Evaluation – Basic Requirements
• Platform – range of capabilities – categorization, sentiment analysis, etc.
• Technical – search evaluation criteria plus:
– Integration – APIs, Java-based, Linux runtime
– Scalability – millions of documents a day
– Import/export – XML, RDF (see the SKOS sketch below)
• Usability
• Multiple language support
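The import/export requirement is easy to probe concretely. As a hedged illustration (not from the deck), the sketch below uses the open-source rdflib library to build a two-node taxonomy fragment in SKOS and serialize it as RDF/XML – the kind of round-trip an evaluation would ask each vendor to support:

```python
# Sketch: express a tiny taxonomy fragment in SKOS and serialize to RDF/XML,
# the sort of import/export round-trip worth testing during an evaluation.
# Illustrative only; requires: pip install rdflib
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

TAX = Namespace("http://example.org/taxonomy/")  # hypothetical namespace

g = Graph()
g.bind("skos", SKOS)

g.add((TAX.TextAnalytics, RDF.type, SKOS.Concept))
g.add((TAX.TextAnalytics, SKOS.prefLabel, Literal("Text Analytics", lang="en")))

g.add((TAX.SentimentAnalysis, RDF.type, SKOS.Concept))
g.add((TAX.SentimentAnalysis, SKOS.prefLabel, Literal("Sentiment Analysis", lang="en")))
g.add((TAX.SentimentAnalysis, SKOS.broader, TAX.TextAnalytics))

print(g.serialize(format="xml"))  # RDF/XML export; try re-importing with Graph().parse()
```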

15 Evaluating Text Analytics Software: Initial Evaluation – Usability
• Ease of use – copy, paste, rename, merge, etc.
• Documentation – user manuals, online help, training and tutorials
• Visualization – file structure, tree view, hierarchical and alphabetical displays
• Automatic taxonomy/node and rule generation
– Nonsense at the whole-taxonomy level
– Useful at the node level – suggestions for sub-categories and rules
• Variety of node relationships – child–parent, related

16 Initial Evaluation: Example Outcomes
• Filter One:
– Companies A and B – sentiment analysis focus, weak categorization
– Company C – lacked a full suite of text analytics
– Company D – business concerns, support
– Open source – license issues
– Ontology vendors – missing categorization capabilities
• Four demos – saw a variety of different approaches, but:
– Company X – lacked sentiment analysis, so two vendors would be required
– Company Y – lacked language support; development cost

17 Evaluating Taxonomy Software: Proof of Concept (POC)
• Quality of results is the essential factor
• 4–6 week POC – a bake-off, or a short pilot
• Real-life scenarios – categorization with your content
• Preparation:
– Preliminary analysis of content and of users' information needs
– Set up the software in a lab – relatively easy
– Train taxonomist(s) on the software
– Develop a taxonomy if none is available
• 4–6 week POC – 2–3 rounds of develop, test, refine – not out-of-the-box
• Need SMEs as test evaluators – also to do an initial categorization of content

18 Evaluating Taxonomy Software: POC
• The majority of the time goes to auto-categorization
• Need to balance uniformity of results against vendor-unique capabilities – this has to be determined at POC time
• Risks – getting the software installed and working, getting the right content, the initial categorization of content
• Elements:
– Content
– Search terms / search scenarios
– Training sets
– Test sets of content (see the split sketch below)
• Taxonomy developers – expert consultants plus internal taxonomists
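One practical note on training and test sets: keep them separate, or categorization accuracy will look better than it is. A minimal sketch, assuming the documents and labels come from the SMEs' initial categorization – the data and the scikit-learn usage are illustrative, not from the deck:

```python
# Sketch: hold out a test set from the human-categorized content so POC accuracy
# is measured on documents the vendors' rules or training never saw.
# Data is invented for illustration. Requires: pip install scikit-learn
from sklearn.model_selection import train_test_split

docs = [
    "claim about dental benefits",
    "smoke coming from the charger",
    "question about vision benefits",
    "device overheated during use",
    "benefits enrollment deadline",
    "burn hazard reported by a customer",
]
labels = ["Benefits", "Safety", "Benefits", "Safety", "Benefits", "Safety"]

train_docs, test_docs, train_labels, test_labels = train_test_split(
    docs, labels, test_size=2, random_state=42, stratify=labels
)
# Give train_docs/train_labels to each vendor for rule building or training;
# score every vendor on the same untouched test_docs/test_labels.
```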

19 Evaluating Taxonomy Software: POC Feature Test Cases
• Auto-categorization against the existing taxonomy – variety of content
• Clustering – automatic node generation
• Summarization
• Entity extraction – build a number of catalogs – people, products, etc.
• Sentiment analysis – products and categorization rules (baseline sketch below)
• Evaluate usability in action, by the taxonomists
• Question – integration with ontologies?
• Technical / integration – output in XML, APIs
• Map the above to the client's use cases
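For the sentiment analysis test case, a simple lexicon scorer makes a useful baseline to compare each vendor's output against. A minimal sketch – the word lists are invented for the example, not taken from the talk:

```python
# Baseline lexicon-based sentiment scorer for POC comparison. The word lists are
# invented for this illustration; a real POC would tune them to your content.
import re

POSITIVE = {"great", "love", "excellent", "fast", "reliable"}
NEGATIVE = {"broken", "slow", "hate", "crash", "refund"}

def sentiment(text):
    """Return (score, label): positive hits minus negative hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return score, label

print(sentiment("Love the product, fast and reliable"))          # (3, 'positive')
print(sentiment("The app is slow and broken, I want a refund"))  # (-3, 'negative')
```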

20 POC Design Discussion: Evaluation Criteria
• Basic test design – categorize a test set
– Score – by file name, by human testers (scoring sketch below)
– Accuracy level – 80–90%
– Effort level per accuracy level
• Sentiment analysis
– Accuracy level – 80–90%
– Effort level per accuracy level
• Quantify development time – main elements
• Comparison of the two vendors – how to score?
– A combination of scores and a report
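Scoring "by file name" suggests comparing each engine's assigned category for a file against the human judgment for that same file. Under that assumption (the data below is invented), per-vendor accuracy can be computed like this:

```python
# Sketch: score each vendor's categorization against the SMEs' gold judgments,
# keyed by file name. Data is invented for illustration.
gold = {"report1.pdf": "Benefits", "report2.pdf": "Safety", "report3.pdf": "Benefits"}

vendor_results = {
    "VendorX": {"report1.pdf": "Benefits", "report2.pdf": "Safety", "report3.pdf": "Safety"},
    "VendorY": {"report1.pdf": "Benefits", "report2.pdf": "Benefits", "report3.pdf": "Benefits"},
}

for vendor, assigned in vendor_results.items():
    correct = sum(assigned.get(name) == category for name, category in gold.items())
    print(f"{vendor}: {correct}/{len(gold)} correct ({correct / len(gold):.0%})")
    # Compare each vendor against the 80-90% accuracy target, and record the
    # rule-building effort it took to get there.
```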

21 Evaluating Taxonomy Software: POC Issues
• Quality of the content
• Quality of the initial human categorization
• Normalizing among different test evaluators (agreement sketch below)
• Quality of the taxonomists – experience with text analytics software and/or with the content, information needs, and behaviors
• Quality of the taxonomy
– General issues – structure (too flat or too deep)
– Overlapping categories
– Differences in use – browse, index, categorize
• Foundation for development
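Normalizing among evaluators usually starts with measuring how much they agree before trusting or averaging their scores. One standard measure is Cohen's kappa; a minimal sketch using scikit-learn (the labels are invented, and the method is a suggestion rather than the deck's own):

```python
# Sketch: measure agreement between two test evaluators who categorized the
# same documents, using Cohen's kappa. Labels are invented for illustration.
# Requires: pip install scikit-learn
from sklearn.metrics import cohen_kappa_score

evaluator_1 = ["Benefits", "Safety", "Safety", "Benefits", "Recall"]
evaluator_2 = ["Benefits", "Safety", "Benefits", "Benefits", "Recall"]

kappa = cohen_kappa_score(evaluator_1, evaluator_2)
print(f"Cohen's kappa: {kappa:.2f}")  # values near 0.8+ usually read as strong agreement
```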

22 Text Analytics Evaluation: Context Dependent – A Tale of Two POCs
• Taxonomy
– GAO – flat, not very orthogonal, political concerns; department-centric, subject-based
– Company A – flat, two taxonomies, technical – open to change? Action-oriented: activities and events
• Content
– GAO – giant 200-page formal PDF documents, with variety; Start_200, Title
– Company A – short, cryptic customer support notes and social media; creative spelling; a combination of formal and individual styles
• Culture / Applications
– GAO – formal Federal – big applications, infrastructure
– Company A – informal, technical, lots of small apps

23 Text Analytics Evaluation – Conclusions
• Text analytics, combined with search and ECM, can finally deliver on the promise
• Search / text analytics is not like other software – it is about meaning
• The only way to deal with meaning is in context – your context
– Technical issues are filters, not features
• Enterprise text analytics as a platform is best
– If you have to start with one application, plan for the platform
• A POC seems expensive, but it is cheaper than making a dead-end choice
– It is also the foundation of, and a head start on, development

24 Questions?
Tom Reamy
tomr@kapsgroup.com
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com

