Best of All Worlds Text Analytics and Text Mining and Taxonomy Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services.

Presentation transcript:

1 Best of All Worlds: Text Analytics and Text Mining and Taxonomy
Tom Reamy, Chief Knowledge Architect
KAPS Group, Knowledge Architecture Professional Services
http://www.kapsgroup.com

2 Agenda
• Text Analytics Introduction
  – Text Analytics
  – Text Mining
• Case Study – Taxonomy Development
• Text Analytics, Text Mining, and Taxonomy
• Text Analytics Applications
  – New Directions
  – Search & Info Apps
  – Expertise Analysis, Behavior Prediction, More
• Conclusions

3 KAPS Group: General
• Knowledge Architecture Professional Services – Network of Consultants
• Partners – SAS, SAP, IBM, FAST, Smart Logic, Concept Searching, Attensity, Clarabridge, Lexalytics
• Strategy – IM & KM: Text Analytics, Social Media, Integration
• Services:
  – Taxonomy/Text Analytics development, consulting, customization
  – Text Analytics Quick Start – Audit, Evaluation, Pilot
  – Social Media: Text based applications – design & development
• Clients:
  – Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, etc.
• Applied Theory – Faceted taxonomies, complexity theory, natural categories, emotion taxonomies
• Presentations, Articles, White Papers – http://www.kapsgroup.com

4 Taxonomy, Text Mining, and Text Analytics
Text Analytics Features
• Noun Phrase Extraction
  – Catalogs with variants, rule based dynamic
  – Multiple types, custom classes – entities, concepts, events
  – Feeds facets
• Summarization
  – Customizable rules, map to different content
• Fact Extraction
  – Relationships of entities – people-organizations-activities
  – Ontologies – triples, RDF, etc.
• Sentiment Analysis
  – Rules – Objects and phrases – positive and negative

5 Taxonomy, Text Mining, and Text Analytics
Text Analytics Features
• Auto-categorization
  – Training sets – Bayesian, Vector space
  – Terms – literal strings, stemming, dictionary of related terms
  – Rules – simple – position in text (Title, body, url)
  – Semantic Network – Predefined relationships, sets of rules
  – Boolean – Full search syntax – AND, OR, NOT
  – Advanced – DIST (#), PARAGRAPH, SENTENCE
• This is the most difficult to develop
• Build on a Taxonomy
• Combine with Extraction
  – If any of list of entities and other words
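
The Boolean and proximity operators listed on this slide can be illustrated with a small sketch. It is not any vendor's rule syntax: the helpers `tokenize`, `positions`, `dist`, and `is_merger_news`, the term lists, and the example category are all hypothetical. The DIST semantics assumed here (a term from each list within N tokens of each other) is one plausible reading of the operator.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def positions(tokens, terms):
    """Return the index of every token that appears in `terms`."""
    return [i for i, tok in enumerate(tokens) if tok in terms]

def dist(tokens, n, terms_a, terms_b):
    """Assumed DIST_n: a term from each set occurs within n tokens of the other."""
    return any(abs(i - j) <= n
               for i in positions(tokens, terms_a)
               for j in positions(tokens, terms_b))

def is_merger_news(text):
    """Hypothetical rule: (AND, (DIST_5, merger terms, company terms), (NOT, rumor terms))."""
    tokens = tokenize(text)
    merger = {"merger", "acquisition", "acquire", "acquires"}
    company = {"corp", "inc", "company"}
    rumor = {"rumor", "rumors", "rumored"}
    return dist(tokens, 5, merger, company) and not positions(tokens, rumor)
```

The term sets play the role of the "list of entities and other words" mentioned above; in a real system they would more likely come from an entity catalog or taxonomy node than from hand-typed sets.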

7 Case Study – Categorization & Sentiment

14 Taxonomy and Text Analytics

15 Taxonomy and Text Analytics

16 Taxonomy, Text Mining, and Text Analytics
Case Study – Taxonomy Development
• Problem – 200,000 new uncategorized documents
• Old taxonomy – need one that reflects change in corpus
• Text mining, entity extraction, categorization
• Content – 250,000 large documents, search logs, etc.
• Bottom Up – terms in documents – frequency, date
• Clustering – suggested categories
• Clustering – chunking for editors
• Entity Extraction – people, organizations, programming languages
• Time savings – only feasible way to scan documents
• Quality – important terms, co-occurring terms
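
As a rough illustration of the bottom-up step (frequent terms and co-occurring terms as candidate taxonomy nodes), here is a minimal sketch using only the Python standard library. The stopword list, the function names, and the choice of document frequency as the measure are assumptions for illustration; the slides do not say which tools or measures the case study used.

```python
from collections import Counter
from itertools import combinations
import re

# Tiny illustrative stopword list; a real project would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "on", "with"}

def terms(doc):
    """Lowercase word tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", doc.lower()) if t not in STOPWORDS]

def term_frequencies(docs):
    """Count how many documents each term appears in (document frequency)."""
    counts = Counter()
    for doc in docs:
        counts.update(set(terms(doc)))
    return counts

def cooccurrences(docs):
    """Count unordered term pairs that appear together in the same document."""
    pairs = Counter()
    for doc in docs:
        pairs.update(combinations(sorted(set(terms(doc))), 2))
    return pairs
```

Calling `term_frequencies(docs).most_common(50)` and `cooccurrences(docs).most_common(50)` gives editors a ranked list of candidate terms and term pairs to review, which is the kind of time saving the slide describes.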

17 Case Study – Taxonomy Development

18 Case Study – Taxonomy Development

19 Case Study – Taxonomy Development

20 Text Analytics Development

21 New Directions in Social Media
Text Analytics, Text Mining, and Predictive Analytics
• Two Systems of the Brain
  – Fast, System 1, Immediate patterns (TM)
  – Slow, System 2, Conceptual, reasoning (TA)
• Text Analytics – pre-processing for TM
  – Discover additional structure in unstructured text
  – Behavior Prediction – adding depth in individual documents
  – New variables for Predictive Analytics, Social Media Analytics
  – New dimensions – 90% of information
• Text Mining for TA – Semi-automated taxonomy development
  – Bottom Up – terms in documents – frequency, date, clustering
  – Improve speed and quality – semi-automatic

22 Text Analytics and Taxonomy
Complementary Information Platform
• Taxonomy provides a consistent and common vocabulary
  – Enterprise resource – integrated not centralized
• Text Analytics provides consistent tagging
  – Human indexing is subject to inter- and intra-individual variation
• Taxonomy provides the basic structure for categorization
  – And candidate terms
• Text Analytics provides the power to apply the taxonomy
  – And metadata of all kinds
• Text Analytics and Taxonomy Together – Platform
  – Consistent in every dimension
  – Powerful and economic

23 Taxonomy, Text Mining, and Text Analytics
Metadata – Tagging – the Problem
• How do you bridge the gap – taxonomy to documents?
• Tagging documents with taxonomy nodes is tough
  – And expensive – central or distributed
• Library staff – experts in categorization not subject matter
  – Too limited, narrow bottleneck
  – Often don’t understand business processes and business uses
• Authors – Experts in the subject matter, terrible at categorization
  – Intra and Inter inconsistency, “intertwingleness”
  – Choosing tags from taxonomy – complex task
  – Folksonomy – almost as complex, wildly inconsistent
  – Resistance – not their job, cognitively difficult = non-compliance
• Text Analytics is the answer(s)!

24 Taxonomy, Text Mining, and Text Analytics
Metadata Tagging – the Solution
• Mind the Gap – Manual, Automatic, Hybrid
• All require human effort – issue of where and how effective
• Manual – human effort is tagging (difficult, inconsistent)
• Automatic and Hybrid – human effort is prior to tagging
  – Build on expertise – librarians on categorization, SMEs on subject terms
• Hybrid Model
  – Publish Document -> Text Analytics analysis -> suggestions for categorization, entities, metadata -> present to author
  – Cognitive task is simple -> react to a suggestion instead of select from head or a complex taxonomy
  – Feedback – if author overrides -> suggestion for new category
  – Facets – Requires a lot of Metadata – Entity Extraction feeds facets
• Hybrid – Automatic is really a spectrum – depends on context
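
The hybrid workflow (suggest, let the author react, feed overrides back) can be sketched as follows. Everything here is hypothetical: `suggest_tags` is a naive substring matcher standing in for a real text analytics engine, and `TaggingResult` and `review` are illustrative names, not part of any product mentioned in the slides.

```python
from dataclasses import dataclass, field

@dataclass
class TaggingResult:
    tags: list
    new_category_candidates: list = field(default_factory=list)

def suggest_tags(text, taxonomy):
    """Stand-in for the text analytics step: suggest nodes whose terms occur in the text."""
    lowered = text.lower()
    return [node for node, node_terms in taxonomy.items()
            if any(term in lowered for term in node_terms)]

def review(suggestions, author_decisions):
    """Apply the author's reactions to the suggested tags.

    `author_decisions` maps a suggested tag either to True (accept) or to a
    replacement string typed by the author (override). Tags the author does
    not mention are treated as accepted.
    """
    result = TaggingResult(tags=[])
    for tag in suggestions:
        decision = author_decisions.get(tag, True)
        if decision is True:
            result.tags.append(tag)
        else:
            result.tags.append(decision)
            # An override is fed back as a candidate for a new category.
            result.new_category_candidates.append(decision)
    return result
```

The author only confirms or corrects a short list, and each correction accumulates in `new_category_candidates` for the taxonomy team to review later.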

25 Taxonomy, Text Mining, and Text Analytics
Applications: Search
• Multiple Knowledge Structures
  – Facet – orthogonal dimension of metadata
  – Taxonomy – Subject matter / aboutness
  – Ontology – Relationships / Facts: Subject – Verb – Object
• Software – Search, ECM, auto-categorization, entity extraction, Text Analytics and Text Mining
• People – tagging, evaluating tags, fine tune rules and taxonomy
• People – Users, social tagging, suggestions
• Rich Search Results – context and conversation

28 Taxonomy, Text Mining, and Text Analytics
Applications: Search-Based Applications
• Platform for Information Applications
  – Content Aggregation
  – Duplicate Documents – save millions!
  – Text Mining – BI, CI – sentiment analysis
  – Combine with Data Mining – disease symptoms, new Predictive Analytics
  – Social – Hybrid folksonomy / taxonomy / auto-metadata
  – Social – expertise, categorize tweets and blogs, reputation
  – Ontology – travel assistant – SIRI
• Use your Imagination!

29 Taxonomy, Text Mining, and Text Analytics
Applications: Expertise Analysis
• Sentiment Analysis to Expertise Analysis (KnowHow)
  – Know How, skills, “tacit” knowledge
• Experts write and think differently
• Basic level is lower, more specific
  – Levels: Superordinate – Basic – Subordinate
  – Mammal – Dog – Golden Retriever
  – Furniture – chair – kitchen chair
• Experts organize information around processes, not subjects
• Build expertise categorization rules

30 Taxonomy, Text Mining, and Text Analytics
Expertise – application areas
• Taxonomy / Ontology development / design – audience focus
  – Card sorting – non-experts use superficial similarities
• Business & Customer intelligence – add expertise to sentiment
  – Deeper research into communities, customers
• Text Mining – Expertise characterization of writer, corpus
• eCommerce – Organization/Presentation of information – expert, novice
• Expertise location – Generate automatic expertise characterization based on documents
• Experiments – Pronoun Analysis – personality types
  – Essay Evaluation Software – Apply to expertise characterization
  – Model levels of chunking, procedure words over content

31 Beyond Sentiment: Behavior Prediction
Case Study – Telecom Customer Service
• Problem – distinguish customers likely to cancel from mere threats
• Analyze customer support notes
• General issues – creative spelling, second hand reports
• Develop categorization rules
  – First – distinguish cancellation calls – not simple
  – Second – distinguish cancel what – one line or all
  – Third – distinguish real threats

32 Beyond Sentiment
Behavior Prediction – Case Study
• Basic Rule
  (START_20, (AND,
    (DIST_7, "[cancel]", "[cancel-what-cust]"),
    (NOT, (DIST_10, "[cancel]", (OR, "[one-line]", "[restore]", "[if]")))))
• Examples:
  – customer called to say he will cancell his account if the does not stop receiving a call from the ad agency.
  – cci and is upset that he has the asl charge and wants it off or her is going to cancel his act
  – ask about the contract expiration date as she wanted to cxl teh acct
• Combine sophisticated rules with sentiment statistical training and Predictive Analytics
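
To make the basic rule concrete, here is a minimal Python sketch of how it might be evaluated. Several things are assumptions, not the presenter's implementation: the word lists behind the bracketed concepts (`CONCEPTS`) are invented, START_20 is read as "only look at the first 20 tokens", and DIST_n is read as "within n tokens". The function names are hypothetical.

```python
import re

# Hypothetical expansions of the bracketed concept lists in the rule,
# including the kind of "creative spelling" found in support notes.
CONCEPTS = {
    "cancel": {"cancel", "cancell", "cancelled", "cxl", "disconnect"},
    "cancel-what-cust": {"account", "acct", "act", "service"},
    "one-line": {"line"},
    "restore": {"restore", "reinstate"},
    "if": {"if"},
}

def tokenize(text):
    """Lowercase the note and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def hits(tokens, concept):
    """Token positions that match any word in the named concept list."""
    return [i for i, tok in enumerate(tokens) if tok in CONCEPTS[concept]]

def dist(tokens, n, concept_a, concept_b):
    """Assumed DIST_n: the two concepts occur within n tokens of each other."""
    return any(abs(i - j) <= n
               for i in hits(tokens, concept_a)
               for j in hits(tokens, concept_b))

def real_cancel_threat(note):
    """Evaluate the basic rule under the assumptions stated above."""
    tokens = tokenize(note)[:20]  # START_20, assumed: first 20 tokens only
    near_target = dist(tokens, 7, "cancel", "cancel-what-cust")
    hedged = any(dist(tokens, 10, "cancel", c)
                 for c in ("one-line", "restore", "if"))
    return near_target and not hedged
```

Under these assumptions, the first example note is rejected because "if" sits close to "cancell" (a conditional threat), while the third note ("wanted to cxl teh acct") is accepted. Different readings of START and DIST would change the outcome for borderline notes.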

33 Beyond Sentiment – Wisdom of Crowds
Crowd Sourcing Technical Support
• Example – Android User Forum
• Develop a taxonomy of products, features, problem areas
• Develop Categorization Rules:
  – “I use the SDK method and it isn't to bad a all. I'll get some pics up later, I am still trying to get the time to update from fresh 1.0 to 1.1.”
  – Find product & feature – forum structure
  – Find problem areas in response, nearby text for solution
• Automatic – simply expose lists of “solutions”
  – Search Based application
• Human mediated – experts scan and clean up solutions

34 Taxonomy, Text Mining, and Text Analytics
Conclusions
• Text Analytics is an essential platform for multiple applications
• Text Analytics and Text Mining and Taxonomy are mutually enriching approaches
• Sentiment Analysis, Beyond Positive & Negative
• New emotion taxonomies, context around terms
• New applications – Expertise, behavior prediction, etc.
• Future – new kinds of applications:
  – Enterprise Search – Hybrid ECM model with text analytics
  – Expertise Analysis, Behavior Prediction, and more
  – Social Media and Big Data built from TM & TA
  – NeuroAnalytics – cognitive science meets taxonomy and more
• Watson is just the start

35 Questions?
Tom Reamy, tomr@kapsgroup.com
KAPS Group, Knowledge Architecture Professional Services
http://www.kapsgroup.com

36 Resources
• Books
  – Women, Fire, and Dangerous Things, George Lakoff
  – Knowledge, Concepts, and Categories, Koen Lamberts and David Shanks
  – Formal Approaches in Categorization, Ed. Emmanuel Pothos and Andy Wills
  – The Mind, Ed. John Brockman: good introduction to a variety of cognitive science theories, issues, and new ideas
  – Any cognitive science book written after 2009

37 Resources
• Conferences – Web Sites
  – Text Analytics World – http://www.textanalyticsworld.com
  – Text Analytics Summit – http://www.textanalyticsnews.com
  – Semtech – http://www.semanticweb.com

38 Resources
• Blogs
  – SAS – http://blogs.sas.com/text-mining/
• LinkedIn Groups:
  – Text Analytics World
  – Text Analytics Group
  – Data and Text Professionals
  – Sentiment Analysis
  – Metadata Management
  – Semantic Technologies

39 Resources
• Web Sites
  – Taxonomy Community of Practice: http://finance.groups.yahoo.com/group/TaxoCoP/
  – Whitepaper – CM and Text Analytics – http://www.textanalyticsnews.com/usa/contentmanagementmeetstextanalytics.pdf
  – Whitepaper – Enterprise Content Categorization strategy and development – http://www.kapsgroup.com

40 Resources
• Articles
  – Malt, B. C. 1995. Category coherence in cross-cultural perspective. Cognitive Psychology 29, 85-148
  – Rifkin, A. 1985. Evidence for a basic level in event taxonomies. Memory & Cognition 13, 538-56
  – Shaver, P., J. Schwartz, D. Kirson, C. O’Connor 1987. Emotion knowledge: further exploration of a prototype approach. Journal of Personality and Social Psychology 52, 1061-1086
  – Tanaka, J. W. & M. E. Taylor 1991. Object categories and expertise: is the basic level in the eye of the beholder? Cognitive Psychology 23, 457-82

