Taxonomy Boot Camp Panel
Text Analytics
Tom Reamy
Chief Knowledge Architect
KAPS Group
Knowledge Architecture Professional Services

Agenda
Taxonomy and Text Analytics
– Search, Taxonomy, and Text Analytics
Case Study
– Taxonomy Development
– Text Analytics as a Taxonomy tool
– Case Studies – Expertise & Sentiment & Beyond
Future of Text Analytics and Taxonomy
– Beyond Indexing - Categorization
– Sentiment, Expertise, Ontologies

Taxonomy and Text Analytics
Text Analytics Features
Noun Phrase Extraction
– Catalogs with variants, rule based dynamic
– Multiple types, custom classes – entities, concepts, events
– Feeds facets
Summarization
– Customizable rules, map to different content
Fact Extraction
– Relationships of entities – people-organizations-activities
– Ontologies – triples, RDF, etc.
Sentiment Analysis
– Rules – Objects and phrases – positive and negative

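A minimal sketch of fact extraction into triples, assuming a simple pattern of the form person, verb, organization. The sample sentences, the verb list, and the function names are illustrative assumptions, not part of the original slides; a production system would use a real entity extractor rather than a regular expression.

```python
import re

# Illustrative pattern: "<First Last> <verb> <Organization Name>".
# The verb list is a hypothetical stand-in for a relationship catalog.
PATTERN = re.compile(
    r"(?P<person>[A-Z][a-z]+ [A-Z][a-z]+) "
    r"(?P<verb>works for|joined|advises) "
    r"(?P<org>[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*)"
)

def extract_triples(text):
    """Return (subject, predicate, object) triples found in the text."""
    return [
        (m["person"], m["verb"].replace(" ", "_"), m["org"])
        for m in PATTERN.finditer(text)
    ]

def people_at(triples, org):
    """Query the triples: which people are related to this organization?"""
    return sorted({s for s, _, o in triples if o == org})

# Made-up sample text for illustration only.
sample = "Ada Lopez joined Acme Corp. Later, Ben Ortiz advises Acme Corp on search."
triples = extract_triples(sample)
print(triples)
print(people_at(triples, "Acme Corp"))
```

Storing facts as triples keeps the extraction step separate from the questions asked later, which is the same shape that RDF stores use.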
Taxonomy and Text Analytics
Text Analytics Features
Auto-categorization
– Training sets – Bayesian, Vector space
– Terms – literal strings, stemming, dictionary of related terms
– Rules – simple – position in text (Title, body, url)
– Semantic Network – Predefined relationships, sets of rules
– Boolean – Full search syntax – AND, OR, NOT
– Advanced – DIST (#), PARAGRAPH, SENTENCE
This is the most difficult to develop
Build on a Taxonomy
Combine with Extraction
– If any of list of entities and other words

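The term and position approaches above can be sketched as follows. This is a minimal illustration, assuming hypothetical category term lists, made-up field weights, and a crude plural stripper standing in for a real stemmer; none of these values come from the slides.

```python
import re
from collections import defaultdict

# Hypothetical term lists; in practice these would hang off taxonomy nodes.
CATEGORY_TERMS = {
    "Billing": {"invoice", "charge", "refund", "payment"},
    "Network": {"outage", "signal", "coverage", "roaming"},
}
# Assumed weights: a term in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}

def stem(token):
    """Crude plural stripping, standing in for a real stemmer."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token

def categorize(doc, threshold=2.0):
    """Score each category by weighted term hits; keep those at or above threshold."""
    scores = defaultdict(float)
    for field, weight in FIELD_WEIGHTS.items():
        tokens = [stem(t) for t in re.findall(r"[a-z]+", doc.get(field, "").lower())]
        for category, terms in CATEGORY_TERMS.items():
            scores[category] += weight * sum(1 for t in tokens if t in terms)
    kept = [c for c, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda c: -scores[c])

doc = {
    "title": "Refund request",
    "body": "Customer disputes two charges on the last invoice.",
}
print(categorize(doc))  # prints ['Billing']
```

The threshold is where the recall-precision tradeoff shows up: lowering it tags more documents, raising it tags fewer with more confidence.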
Case Study – Categorization & Sentiment
Search, Taxonomy, and Text Analytics
Elements
Multiple Knowledge Structures
– Facet – orthogonal dimension of metadata
– Taxonomy - Subject matter / aboutness
– Categorization, clusters, entity extraction into facets
A Hybrid Model of ECM and Metadata
– Authors, editors-librarians, Text Analytics
– Submit a document -> TA generates metadata, extracts concepts, suggests categorization (keywords) -> author OK’s (easy task) -> librarian monitors for issues
– Use results as input into analytics
And/or Dynamic categorization-extraction at results time

Search, Taxonomy and Text Analytics
Multiple Applications
Platform for Information Applications
– Content Aggregation
– Duplicate Documents – save millions!
– Text Mining – BI, CI – sentiment analysis
– Combine with Data Mining – disease symptoms, new Predictive Analytics
– Social – Hybrid folksonomy / taxonomy / auto-metadata
– Social – expertise, categorize tweets and blogs, reputation
– Ontology – travel assistant – SIRI
Use your Imagination!

Taxonomy and Text Analytics
Case Study – Taxonomy Development
Problem – 200,000 new uncategorized documents
Old taxonomy – need one that reflects change in corpus
Text mining, entity extraction, categorization
Content – 250,000 large documents, search logs, etc.
Bottom Up – terms in documents – frequency, date
Clustering – suggested categories
Clustering – chunking for editors
Entity Extraction – people, organizations, programming languages
Time savings – only feasible way to scan documents
Quality – important terms, co-occurring terms

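The bottom-up step (term frequency and co-occurring terms) might look like the sketch below. It assumes a tiny stopword list and three made-up document titles; the function names are hypothetical, and a real corpus of this size would need a proper tokenizer and phrase extraction.

```python
import re
from collections import Counter
from itertools import combinations

# Small illustrative stopword list, not a complete one.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "for", "is", "with", "on"}

def terms(doc):
    """Lowercase word tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", doc.lower()) if t not in STOPWORDS]

def candidate_terms(docs, top_n=10):
    """Count how many documents each term, and each pair of terms, appears in."""
    doc_freq = Counter()
    pair_freq = Counter()
    for doc in docs:
        unique = sorted(set(terms(doc)))
        doc_freq.update(unique)
        pair_freq.update(combinations(unique, 2))
    return doc_freq.most_common(top_n), pair_freq.most_common(top_n)

# Made-up documents for illustration only.
docs = [
    "Java memory tuning for the server",
    "Tuning Java garbage collection",
    "Python packaging guide",
]
top_terms, top_pairs = candidate_terms(docs)
print(top_terms[:2])
print(top_pairs[0])  # prints (('java', 'tuning'), 2)
```

Frequent terms suggest candidate taxonomy nodes, and frequent pairs hint at which terms belong under the same node; editors still make the final call.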
Case Study – Taxonomy Development
Taxonomy and Text Analytics Applications
Expertise Analysis
Sentiment Analysis to Expertise Analysis (KnowHow)
– Know How, skills, “tacit” knowledge
Experts write and think differently
Basic level is lower, more specific
– Levels: Superordinate – Basic – Subordinate
– Mammal – Dog – Golden Retriever
– Furniture – chair – kitchen chair
Experts organize information around processes, not subjects
Build expertise categorization rules

Expertise Analysis
Expertise – application areas
Taxonomy / Ontology development / design – audience focus
– Card sorting – non-experts use superficial similarities
Business & Customer intelligence – add expertise to sentiment
– Deeper research into communities, customers
Text Mining - Expertise characterization of writer, corpus
eCommerce – Organization/Presentation of information – expert, novice
Expertise location - Generate automatic expertise characterization based on documents
Experiments - Pronoun Analysis – personality types
– Essay Evaluation Software - Apply to expertise characterization
Model levels of chunking, procedure words over content

Beyond Sentiment: Behavior Prediction
Case Study – Telecom Customer Service
Problem – distinguish customers likely to cancel from mere threats
Analyze customer support notes
General issues – creative spelling, second hand reports
Develop categorization rules
– First – distinguish cancellation calls – not simple
– Second - distinguish cancel what – one line or all
– Third – distinguish real threats

Beyond Sentiment: Behavior Prediction – Case Study
Basic Rule
  (START_20, (AND,
    (DIST_7, "[cancel]", "[cancel-what-cust]"),
    (NOT, (DIST_10, "[cancel]", (OR, "[one-line]", "[restore]", "[if]")))))
Examples:
– customer called to say he will cancell his account if the does not stop receiving a call from the ad agency.
– cci and is upset that he has the asl charge and wants it off or her is going to cancel his act
– ask about the contract expiration date as she wanted to cxl teh acct
Combine sophisticated rules with sentiment statistical training and Predictive Analytics

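The basic rule can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the vendor's rule engine: START_20 is read as "look only at the first 20 tokens", DIST_n as "two terms within n tokens of each other", and the contents of the bracketed term lists are hypothetical, since the slides do not give them.

```python
import re

# Hypothetical term lists standing in for the bracketed vocabularies in the rule.
# They include misspellings because support notes use creative spelling.
VOCAB = {
    "[cancel]": {"cancel", "cancell", "cxl"},
    "[cancel-what-cust]": {"account", "acct", "act", "service"},
    "[one-line]": {"line"},
    "[restore]": {"restore", "reconnect"},
    "[if]": {"if", "unless"},
}

def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())

def positions(tokens, names):
    """Indexes of tokens that belong to any of the named term lists."""
    terms = set().union(*(VOCAB[name] for name in names))
    return [i for i, tok in enumerate(tokens) if tok in terms]

def dist(tokens, n, left, right):
    """Assumed DIST_n: a left term and a right term within n tokens of each other."""
    return any(
        abs(i - j) <= n
        for i in positions(tokens, left)
        for j in positions(tokens, right)
    )

def likely_to_cancel(note, start=20):
    window = tokenize(note)[:start]  # assumed START_20 semantics
    wants_cancel = dist(window, 7, ["[cancel]"], ["[cancel-what-cust]"])
    hedged = dist(window, 10, ["[cancel]"], ["[one-line]", "[restore]", "[if]"])
    return wants_cancel and not hedged

print(likely_to_cancel(
    "ask about the contract expiration date as she wanted to cxl teh acct"
))  # prints True
print(likely_to_cancel(
    "customer called to say he will cancell his account if the does not "
    "stop receiving a call from the ad agency."
))  # prints False
```

Under these assumptions the conditional note is filtered out by the "[if]" clause, which is how the rule separates a threat from a cancellation request.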
Beyond Sentiment - Wisdom of Crowds
Crowd Sourcing Technical Support
Example – Android User Forum
Develop a taxonomy of products, features, problem areas
Develop Categorization Rules:
– “I use the SDK method and it isn't to bad a all. I'll get some pics up later, I am still trying to get the time to update from fresh 1.0 to 1.1.”
– Find product & feature – forum structure
– Find problem areas in response, nearby text for solution
Automatic – simply expose lists of “solutions”
– Search Based application
Human mediated – experts scan and clean up solutions

Text Analytics Development
Best Practices - Principles
Categorization taxonomy structure
– Tradeoff of depth and complexity of rules
– Multiple avenues – facets, terms, rules, etc.
No right balance
– Recall-precision balance is application specific
– Training sets of starting points, rules rule
– Need for custom development
Different kinds of taxonomies
– Sentiment – products and features
– Expertise – process
– Categorization – smaller – power in categorization rules
– Facets – combine – more orthogonal categories

Taxonomy and Text Analytics
Conclusions
Text Analytics (entity extraction, auto-categorization, sentiment analysis) is an essential platform
Text Analytics adds a new dimension to taxonomy
– Taxonomists are an essential resource – understand information structure
Enterprise Search – Hybrid ECM model with text analytics
Future – new kinds of applications:
– Text Mining and Data mining, research tools, sentiment
– Social Media – multiple sources for multiple applications
– Beyond Sentiment – expertise applications, behavior
– NeuroAnalytics – cognitive science meets taxonomy and more
Watson is just the start

Questions?
Tom Reamy
KAPS Group
Knowledge Architecture Professional Services