Published by Bathsheba Fields. Modified over 9 years ago.

Text Analytics for Search Applications Workshop
Tom Reamy, Chief Knowledge Architect
KAPS Group – Knowledge Architecture Professional Services
http://www.kapsgroup.com

Agenda
Introduction – Text Analytics & Infrastructure Platform
– Text Analytics Features
– Semantic Infrastructure – Taxonomy, Metadata, Technology
– Value of Text Analytics
– Getting Started with Text Analytics
Development – Taxonomy, Categorization, Faceted Metadata
Text Analytics Applications
– Integration with Search and ECM
– Platform for Information Applications
Questions / Discussions

KAPS Group: General
Knowledge Architecture Professional Services – Network of Consultants
Partners – SAS, SAP, IBM, FAST, Smart Logic, Concept Searching
– Attensity, Clarabridge, Lexalytics
Strategy – IM & KM - Text Analytics, Social Media, Integration
Services:
– Taxonomy/Text Analytics development, consulting, customization
– Text Analytics Quick Start – Audit, Evaluation, Pilot
– Social Media: Text based applications – design & development
Clients:
– Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, etc.
Applied Theory – Faceted taxonomies, complexity theory, natural categories, emotion taxonomies
Presentations, Articles, White Papers – http://www.kapsgroup.com

Agenda – Introduction
Text Analytics & Semantic Infrastructure
Text Analytics Features – Categorization & Extraction
Semantic Infrastructure – Taxonomy, Metadata, Technology
Value of Text Analytics – Enterprise Search that works
Getting Started with Text Analytics
– Text Analytics Strategy & Vision
– Text Analytics Evaluation / Quick Start

Introduction to Text Analytics
Text Analytics Features
Noun Phrase Extraction / Fact Extraction
– Catalogs with variants, rule based dynamic
– Relationships of entities – people-organizations-activities
Sentiment Analysis
– Objects and phrases – statistics & rules
– Positive and Negative
Summarization – replace snippets
Auto-categorization – built on a taxonomy
– Training sets, Terms, Semantic Networks
– Rules: AND, OR, NOT, DIST, PARAGRAPH, SENTENCE
Auto-categorization as Foundation
– Disambiguation - Identification of objects, events, context
– Build rules based, not simply Bag of Individual Words
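
The rule operators named on this slide (AND, OR, NOT, DIST, SENTENCE) can be illustrated with a minimal Python sketch. This is not any vendor's rule language: the function names, the tokenizer, the sentence splitter, and the three example categories are all assumptions made for illustration. PARAGRAPH is left out; it could be built the same way as SENTENCE by splitting on blank lines.

```python
import re

def tokenize(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())

def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

def TERM(word):
    return lambda text: word in tokenize(text)

def AND(*rules):
    return lambda text: all(rule(text) for rule in rules)

def OR(*rules):
    return lambda text: any(rule(text) for rule in rules)

def NOT(rule):
    return lambda text: not rule(text)

def DIST(max_gap, first, second):
    """True when the two words occur within max_gap tokens of each other."""
    def rule(text):
        tokens = tokenize(text)
        pos_first = [i for i, t in enumerate(tokens) if t == first]
        pos_second = [i for i, t in enumerate(tokens) if t == second]
        return any(abs(i - j) <= max_gap for i in pos_first for j in pos_second)
    return rule

def SENTENCE(*rules):
    """True when every rule matches within a single sentence."""
    return lambda text: any(
        all(rule(s) for rule in rules) for s in split_sentences(text)
    )

# Hypothetical categories and rules, for illustration only.
CATEGORY_RULES = {
    "Banking": AND(OR(TERM("bank"), TERM("loan")), NOT(TERM("river"))),
    "Interest rates": DIST(3, "interest", "rate"),
    "Loan default": SENTENCE(TERM("loan"), TERM("default")),
}

def categorize(text):
    """Return the names of all categories whose rule matches the text."""
    return [name for name, rule in CATEGORY_RULES.items() if rule(text)]
```

The "Banking" rule shows why rules go beyond a bag of individual words: the word "bank" alone is ambiguous, and the NOT clause is a crude form of disambiguation.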

Case Study – Categorization & Sentiment

Introduction to Text Analytics
Taxonomy & Metadata
Thesauri, Controlled Vocabulary, Glossaries, Product Catalogs
– Resources to build on
SharePoint – Managed Metadata Services
– Term stores – corporate taxonomies
– Enterprise Keywords (Folksonomy)
Metadata standards – Dublin Core - Mostly syntactic not semantic
– Semantic – keywords – very poor performance, no structure
Facets – classes of metadata
– Standard - People, Organization, Document type-purpose
– Requires huge amounts of metadata

Introduction to Text Analytics
TA & Taxonomy: Complementary Information Platform
Taxonomy provides a consistent and common vocabulary
– Enterprise resource – integrated not centralized
Text Analytics provides consistent tagging
– Human indexing is subject to inter- and intra-individual variation
Taxonomy provides the basic structure for categorization
– And candidate terms
Text Analytics provides the power to apply the taxonomy
– And metadata of all kinds
Text Analytics and Taxonomy Together – Platform
– Consistent in every dimension
– Powerful and economic

Introduction to Text Analytics
Taxonomy and Text Analytics
Standard Taxonomies = starter categorization rules
– Example – MeSH – bottom 5 layers are terms
Categorization taxonomy structure
– Tradeoff of depth and complexity of rules
– Easier to maintain taxonomy, but need to refine rules
Analysis of taxonomy – suitable for categorization
– Structure – not too flat, not too large
– Orthogonal categories
Smaller modular taxonomies
– More flexible relationships – not just Is-A-Kind/Child-Of
Different kinds of taxonomies
– Sentiment – products and features
– Taxonomy of Sentiment, Emotion - Expertise – process

Introduction to Text Analytics
Metadata - Tagging
How do you bridge the gap – taxonomy to documents?
Tagging documents with taxonomy nodes is tough
– And expensive – central or distributed
Library staff – experts in categorization not subject matter
– Too limited, narrow bottleneck
– Often don’t understand business processes and business uses
Authors – Experts in the subject matter, terrible at categorization
– Intra and Inter inconsistency, “intertwingleness”
– Choosing tags from taxonomy – complex task
– Folksonomy – almost as complex, wildly inconsistent
– Resistance – not their job, cognitively difficult = non-compliance
Text Analytics is the answer(s)!

Introduction to Text Analytics
Content Management – SharePoint
Mind the Gap – Manual, Automatic, Hybrid
All require human effort – issue of where and how effective
Manual - human effort is tagging (difficult, inconsistent)
Automatic and Hybrid - human effort is prior to tagging
– Build on expertise – librarians on categorization, SMEs on subject terms
Hybrid Model
– Publish Document -> Text Analytics analysis -> suggestions for categorization, entities, metadata -> present to author
– Cognitive task is simple -> react to a suggestion instead of choosing from memory or from a complex taxonomy
– Feedback – if author overrides -> suggestion for new category
– Facets – Requires a lot of Metadata - Entity Extraction feeds facets
Hybrid – Automatic is really a spectrum – depends on context
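
The hybrid model on this slide can be sketched as a small Python workflow: analyze a published document, present suggestions, and treat any author override as a candidate new category. Everything here is an assumption made for illustration. The taxonomy, the term lists, and the function names are hypothetical, and the keyword match stands in for a real text analytics engine.

```python
from dataclasses import dataclass, field

# Hypothetical taxonomy: category name -> trigger terms.
TAXONOMY = {
    "Safety": {"recall", "hazard"},
    "Pricing": {"price", "discount"},
}

@dataclass
class ReviewResult:
    accepted: list = field(default_factory=list)
    new_category_candidates: list = field(default_factory=list)

def suggest_categories(text, taxonomy=TAXONOMY):
    """Stand-in for the analysis step: suggest a category if any of its terms occurs."""
    words = set(text.lower().split())
    return [name for name, terms in taxonomy.items() if words & terms]

def apply_author_review(suggestions, author_tags):
    """Keep tags the author confirmed; log overrides for the taxonomy team."""
    result = ReviewResult()
    for tag in author_tags:
        if tag in suggestions:
            result.accepted.append(tag)
        else:
            # Author override: a possible new category, not an error.
            result.new_category_candidates.append(tag)
    return result

suggestions = suggest_categories("Product recall announced after hazard report")
review = apply_author_review(suggestions, ["Safety", "Supplier quality"])
```

The author only reacts to `suggestions`; the override "Supplier quality" ends up in `review.new_category_candidates` as feedback for taxonomy maintenance.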

Introduction to Text Analytics
Benefits of Text Analytics
Why Text Analytics?
– Enterprise search has failed to live up to its potential
– Enterprise Content management has failed to live up to its potential
– Taxonomy has failed to live up to its potential
– Adding metadata, especially keywords has not worked
What is missing?
– Intelligence – human level categorization, conceptualization
– Infrastructure – Integrated solutions not technology, software
Text Analytics can be the foundation that (finally) drives success – search, content management, and much more

Text Analytics Platform – Benefits
IDC White Paper
Time Wasted
– Reformat information - $5.7 million per 1,000 per year
– Not finding information - $5.3 million per 1,000
– Recreating content - $4.5 million per 1,000
Small Percent Gain = large savings
– 1% - $10 million
– 5% - $50 million
– 10% - $100 million
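
The arithmetic behind these figures can be sketched in Python. The three cost rates come from the slide; reading "per 1,000" as 1,000 employees is an assumption, and the function names are hypothetical. The slide does not state the organization size behind its savings figures, so the headcount is left as a parameter. Its round numbers ($10 million at 1%) correspond to a total cost of $1 billion per year.

```python
# Annual cost per 1,000 employees, as quoted on the slide (US dollars).
# Assumption: "per 1,000" refers to 1,000 employees.
COST_PER_1000 = {
    "reformatting information": 5_700_000,
    "not finding information": 5_300_000,
    "recreating content": 4_500_000,
}

def annual_cost(employees):
    """Total yearly cost of wasted time for a given headcount."""
    return sum(COST_PER_1000.values()) * employees / 1000

def annual_savings(employees, percent_gain):
    """Yearly savings if wasted time is reduced by percent_gain percent."""
    return annual_cost(employees) * percent_gain / 100

# The three rates add up to $15.5 million per 1,000 employees per year,
# so a 1% gain for 1,000 employees is $155,000 per year.
per_thousand_total = annual_cost(1000)
one_percent_gain = annual_savings(1000, 1)
```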

Text Analytics Platform – Benefits
Findability within and outside the enterprise
– Savings per year - $millions
Rescue enterprise search and ECM projects
– Add semantics to search
Clean up enterprise content
– Duplication and accurate categorization
Improve the quality of information access
– Finding the right information can save millions
Build smarter applications
– Social networking, locate expertise within the enterprise

Text Analytics Platform – Benefits
Understand your customers
– What they are talking about and how they feel about it
Empower your employees
– Not only more time, but they work smarter
Understand your competitors
– What they are working on, talking about
– Combine unstructured content and rich data sources – more intelligent analysis

Text Analytics Platform – Dangers
Text Analytics as a software project
Not enough resources – to develop, to maintain-refine
Wrong resources – SMEs, IT, Library
– Need all of the above and taxonomists+
Bad Design:
– Start with bad taxonomy
– Wrong taxonomy – too big or too flat
Bad Categorization / Entity Extraction
– Right kind of experience

Getting Started with Text Analytics
Text Analytics Vision & Strategy
Strategic Questions – why, what value from the text analytics, how are you going to use it
– Platform or Applications?
What are the basic capabilities of Text Analytics?
What can Text Analytics do for Search?
– After 10 years of failure – get search to work?
What can you do with smart search based applications?
– RM, PII, Social
ROI for effective search – difficulty of believing
– Problems with metadata, taxonomy

Getting Started with Text Analytics
Text Analytics Vision & Strategy
Simple Subject Taxonomy structure
– Easy to develop and maintain
Combined with categorization capabilities
– Added power and intelligence
Combined with people tagging, refining tags
Combined with Faceted Metadata
– Dynamic selection of simple categories
– Allow multiple user perspectives
  Can’t predict all the ways people think
  Monkey, Banana, Panda
Combined with ontologies and semantic data
– Multiple applications – Text mining to Search
– Combine search and browse

Step 1: TA Information Audit
Start with Self Knowledge
Info Problems – what, how severe
Formal Process - KA audit – content, users, technology, business and information behaviors, applications
– Or informal for smaller organizations
Contextual interviews, content analysis, surveys, focus groups, ethnographic studies, Text Mining
Category modeling – Cognitive Science – how people think
– Natural level categories mapped to communities, activities
– Novices prefer higher levels
– Balance of informative and distinctiveness
Text Analytics Strategy/Model – forms, technology, people

Step 1: TA Information Audit
Start with Self Knowledge
Ideas – Content and Content Structure
– Map of Content – Tribal language silos
– Structure – articulate and integrate
– Taxonomic resources
People – Producers & Consumers
– Communities, Users, Central Team
Activities – Business processes and procedures
– Semantics, information needs and behaviors
– Information Governance Policy
Technology
– CMS, Search, portals, text analytics
– Applications – BI, CI, Semantic Web, Text Mining

Step 2: TA Evaluation
Varieties of Taxonomy / Text Analytics Software
Taxonomy Management - extraction
Full Platform
– SAS, SAP, Smart Logic, Concept Searching, Expert System, IBM, Linguamatics, GATE
Embedded – Search or Content Management
– FAST, Autonomy, Endeca, Vivisimo, NLP, etc.
– Interwoven, Documentum, etc.
Specialty / Ontology (other semantic)
– Sentiment Analysis – Attensity, Lexalytics, Clarabridge, lots of others
– Ontology – extraction, plus ontology

Step 2: Text Analytics Evaluation
Different Kind of Software Evaluation
Traditional Software Evaluation - Start
– Filter One – Ask Experts - reputation, research – Gartner, etc.
  Market strength of vendor, platforms, etc.
  Feature scorecard – minimum, must have, filter to top 6
– Filter Two – Technology Filter – match to your overall scope and capabilities – Filter not a focus
– Filter Three – In-Depth Demo – 3-6 vendors
Reduce to 1-3 vendors
Vendors have different strengths in multiple environments
– Millions of short, badly typed documents, Build application
– Library 200 page PDF, enterprise & public search
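
The feature scorecard in Filter One (must-have features, then cut to the top 6) can be sketched in Python. The vendor names, feature names, and weights below are hypothetical placeholders, not an assessment of any real product.

```python
def shortlist(vendors, must_have, weights, keep=6):
    """Drop vendors missing a must-have feature, then rank the rest by weighted score."""
    qualified = [v for v in vendors if must_have <= v["features"]]

    def score(vendor):
        # Sum the weights of the optional features this vendor offers.
        return sum(w for feature, w in weights.items() if feature in vendor["features"])

    ranked = sorted(qualified, key=score, reverse=True)
    return [v["name"] for v in ranked[:keep]]

# Hypothetical scorecard data, for illustration only.
vendors = [
    {"name": "Vendor A", "features": {"categorization", "extraction", "sentiment"}},
    {"name": "Vendor B", "features": {"categorization"}},
    {"name": "Vendor C", "features": {"categorization", "extraction"}},
]
must_have = {"categorization", "extraction"}
weights = {"sentiment": 2, "summarization": 1}

top = shortlist(vendors, must_have, weights)
```

A scorecard like this only narrows the field; as the slide notes, the later filters and in-depth demos carry more weight in the final choice.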

Design of the Text Analytics Selection Team
Traditional Candidates – IT, Business, Library
IT - Experience with software purchases, needs assessment, budget
– Search/Categorization is unlike other software, deeper look
Business - understand business, focus on business value
– They can get executive sponsorship, support, and budget
– But don’t understand information behavior, semantic focus
Library, KM - Understand information structure
– Experts in search experience and categorization
– But don’t understand business or technology

Design of the Text Analytics Selection Team
Interdisciplinary Team, headed by Information Professionals
Relative Contributions
– IT – Set necessary conditions, support tests
– Business – provide input into requirements, support project
– Library – provide input into requirements, add understanding of search semantics and functionality
Much more likely to make a good decision
Create the foundation for implementation

Step 3: Proof of Concept / Pilot Project
4 weeks POC – bake off / or short pilot
Real life scenarios, categorization with your content
2 rounds of development, test, refine / Not out of the box (OOB)
Need SMEs as test evaluators – also to do an initial categorization of content
Measurable Quality of results is the essential factor
Majority of time is on auto-categorization
Need to balance uniformity of results with vendor unique capabilities – have to determine at POC time
Taxonomy Developers – expert consultants plus internal taxonomists
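
One way to make "measurable quality of results" concrete is to score each vendor's auto-categorization against the SMEs' initial categorization. The slide does not name a metric; precision, recall, and F1 are a common choice and are used here as an assumption. The document IDs and categories are hypothetical.

```python
def score(predicted, gold):
    """Micro-averaged precision, recall and F1 over all documents.

    predicted, gold: dict mapping document id -> set of category names.
    gold is the SME categorization; predicted is the tool's output.
    """
    tp = fp = fn = 0
    for doc_id, gold_tags in gold.items():
        pred_tags = predicted.get(doc_id, set())
        tp += len(pred_tags & gold_tags)   # correct tags
        fp += len(pred_tags - gold_tags)   # tags the SMEs did not assign
        fn += len(gold_tags - pred_tags)   # SME tags the tool missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical POC data: two documents tagged by SMEs and by one vendor's tool.
gold = {"doc1": {"Safety", "Pricing"}, "doc2": {"Legal"}}
predicted = {"doc1": {"Safety"}, "doc2": {"Legal", "Marketing"}}

precision, recall, f1 = score(predicted, gold)
```

Running the same scoring after each of the two development rounds shows whether refinement is actually improving results, and gives a uniform basis for comparing vendors.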

Questions?
Tom Reamy
tomr@kapsgroup.com
KAPS Group – Knowledge Architecture Professional Services
http://www.kapsgroup.com

Resources
Conferences:
– Text Analytics World – All aspects of text analytics; Call for Speakers – Oct 3-4 Boston
– Text Analytics Summit – social media focus
LinkedIn Groups:
– Text Analytics World
– Text Analytics Group
– Data and Text Professionals
– Sentiment Analysis
– Metadata Management
– Semantic Technologies

Resources
Books
– Women, Fire, and Dangerous Things – George Lakoff
– Knowledge, Concepts, and Categories – Koen Lamberts and David Shanks
– The Stuff of Thought – Steven Pinker
Journals
– Academic – Cognitive Science, Linguistics, NLP
– Applied – Scientific American Mind, New Scientist