Text Analytics Workshop Tom Reamy Chief Knowledge Architect KAPS Group Program Chair – Text Analytics World Knowledge Architecture Professional Services.

2 Agenda  Introduction – State of Text Analytics – Text Analytics Features – Information / Knowledge Environment – Taxonomy, Metadata, Information Technology – Value of Text Analytics – Quick Start for Text Analytics  Development – Taxonomy, Categorization, Faceted Metadata  Text Analytics Applications – Integration with Search and ECM – Platform for Information Applications  Questions / Discussions

3 Introduction: KAPS Group  Knowledge Architecture Professional Services – Network of Consultants  Applied Theory – Faceted taxonomies, complexity theory, natural categories, emotion taxonomies  Services: – Strategy – IM & KM - Text Analytics, Social Media, Integration – Taxonomy/Text Analytics development, consulting, customization – Text Analytics Quick Start – Audit, Evaluation, Pilot – Social Media: Text based applications – design & development  Partners – Smart Logic, Expert Systems, SAS, SAP, IBM, FAST, Concept Searching, Attensity, Clarabridge, Lexalytics  Clients: – Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, World Bank, etc.  Presentations, Articles, White Papers –

4 Text Analytics Workshop Introduction: Text Analytics  History – academic research, focus on NLP  Inxight – out of Xerox PARC – Moved TA from academic and NLP to auto-categorization, entity extraction, and Search-Meta Data  Explosion of companies – many based on Inxight extraction with some analytical-visualization front ends – Half from 2008 are gone - Lucky ones got bought  Focus on enterprise text analytics – shift to sentiment analysis - easier to do, obvious pay off (customers, not employees) – Backlash – Real business value?  Enterprise search down, taxonomy up – need for metadata – not great results from either – 10 years of effort for what?  Text Analytics is slowly growing – time for a jump?

5 Text Analytics Workshop Current State of Text Analytics  Big Data – Big Text is bigger, text into data, data for text – Watson – ensemble methods, pun module  Social Media / Sentiment – look for real business value – New techniques, emotion taxonomies  Enterprise Text Analytics (ETA) – ETA is the platform for unstructured text applications – Wide Range of InfoApps – BI,CI, Fraud, social media  Has Text Analytics Arrived? – Survey – 28% just getting started, 11% not yet, 17.5% ETA  What is holding it back? – Lack of clarity about business value, what it is – 55% – Lack of strategic vision, real examples  Gartner – new report on text analytics

6 Introduction: Future Directions What is Text Analytics Good For?

7 Text Analytics Workshop What is Text Analytics?  Text Mining – NLP, statistical, predictive, machine learning  Semantic Technology – ontology, fact extraction  Extraction – entities – known and unknown, concepts, events – Catalogs with variants, rule based  Sentiment Analysis – Objects and phrases – statistics & rules – Positive and Negative  Auto-categorization – Training sets, Terms, Semantic Networks – Rules: Boolean - AND, OR, NOT – Advanced – DIST(#), ORDDIST#, PARAGRAPH, SENTENCE – Disambiguation - Identification of objects, events, context – Build rules based, not simply Bag of Individual Words

Case Study – Categorization & Sentiment 8

Case Study – Taxonomy Development 12

13 Text Analytics Workshop TA & Taxonomy Complementary Information Platform  Taxonomy provides a consistent and common vocabulary – Enterprise resource – integrated not centralized  Text Analytics provides consistent tagging – Human indexing is subject to inter and intra individual variation  Taxonomy provides the basic structure for categorization – And candidate terms  Text Analytics provides the power to apply the taxonomy – And metadata of all kinds  Text Analytics and Taxonomy Together – Platform – Consistent in every dimension – Powerful and economic

Text Analytics Workshop Taxonomy and Text Analytics  Standard Taxonomies = starter categorization rules – Example – MeSH – bottom 5 layers are terms  Categorization taxonomy structure – Tradeoff of depth and complexity of rules – Easier to maintain taxonomy, but need to refine rules  Analysis of taxonomy – suitable for categorization – Structure – not too flat, not too large, orthogonal categories  Smaller modular taxonomies – More flexible relationships – not just Is-A-Kind/Child-Of  Different kinds of taxonomies – emotion, expertise  No standards for text analytics – custom jobs – Importance of starting resources 14

15 Text Analytics Workshop Metadata - Tagging  How do you bridge the gap – taxonomy to documents?  Tagging documents with taxonomy nodes is tough – And expensive – central or distributed  Library staff –experts in categorization not subject matter – Too limited, narrow bottleneck – Often don’t understand business processes and business uses  Authors – Experts in the subject matter, terrible at categorization – Intra and Inter inconsistency, “intertwingleness” – Choosing tags from taxonomy – complex task – Folksonomy – almost as complex, wildly inconsistent – Resistance – not their job, cognitively difficult = non-compliance  Text Analytics is the answer(s)!

16 Text Analytics Workshop Mind the Gap – Manual-Automatic-Hybrid  All require human effort – issue of where and how effective  Manual - human effort is tagging (difficult, inconsistent) – Small, high value document collections, trained taggers  Automatic - human effort is prior to tagging – auto-categorization rules and/or NLP algorithm effort  Hybrid Model – before (like automatic) and after – Build on expertise – librarians on categorization, SME’s on subject terms  Facets – Requires a lot of Metadata - Entity Extraction feeds facets – more automatic, feedback by design  Manual - Hybrid – Automatic is a spectrum – depends on context

17 Text Analytics Workshop Benefits of Text Analytics  Why Text Analytics? – Enterprise search has failed to live up to its potential – Enterprise Content management has failed to live up to its potential – Taxonomy has failed to live up to its potential – Adding metadata, especially keywords has not worked  What is missing? – Intelligence – human level categorization, conceptualization – Infrastructure – Integrated solutions not technology, software  Text Analytics can be the foundation that (finally) drives success – search, content management, and much more

Strategic Vision for Text Analytics Costs and Benefits  IDC study – quantify cost of bad search  Three areas: – Time spent searching – Recreation of documents – Bad decisions / poor quality work  Costs – 50% search time is bad search = $2,500 year per person – Recreation of documents = $5,000 year per person – Bad quality (harder) = $15,000 year per person  Per 1,000 people = $ 22.5 million a year – 30% improvement = $6.75 million a year – Add own stories – especially cost of bad information – Human measure - # of FTE’s, savings passed on to customers, etc. 18
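The cost arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the per-person dollar figures are the ones quoted above, while the function names and the dictionary layout are illustrative assumptions, not part of the IDC study.

```python
# Per-person annual cost figures as quoted on the slide (USD).
# The key names are hypothetical labels chosen for this sketch.
COST_PER_PERSON = {
    "bad_search_time": 2_500,
    "document_recreation": 5_000,
    "bad_quality_work": 15_000,
}


def annual_cost(headcount: int) -> int:
    """Total yearly cost of poor search for `headcount` people."""
    return headcount * sum(COST_PER_PERSON.values())


def annual_savings(headcount: int, improvement: float) -> float:
    """Yearly savings if the cost is reduced by the fraction `improvement`."""
    return annual_cost(headcount) * improvement


if __name__ == "__main__":
    print(annual_cost(1_000))           # cost per 1,000 people
    print(annual_savings(1_000, 0.30))  # savings at a 30% improvement
```

Swapping in an organization's own headcount and cost estimates is the "add own stories" step the slide recommends.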

19 Getting Started with Text Analytics Need for a Quick Start  Text Analytics is weird, a bit academic, and not very practical It involves language and thinking and really messy stuff  On the other hand, it is really difficult to do right (Rocket Science)  Organizations don’t know what text analytics is and what it is for  TAW Survey shows - need two things: Strategic vision of text analytics in the enterprise Business value, problems solved, information overload Text Analytics as platform for information access Real life functioning program showing value and demonstrating an understanding of what it is and does  Quick Start – Strategic Vision – Software Evaluation – POC / Pilot

20 Getting Started with Text Analytics Text Analytics Vision & Strategy  Strategic Questions – why, what value from the text analytics, how are you going to use it – Platform or Applications?  What are the basic capabilities of Text Analytics?  What can Text Analytics do for Search? – After 10 years of failure – get search to work?  What can you do with smart search based applications? – RM, PII, Social  ROI for effective search – difficulty of believing – Problems with metadata, taxonomy

Quick Start Step One- Knowledge Audit  Ideas – Content and Content Structure – Map of Content – Tribal language silos – Structure – articulate and integrate – Taxonomic resources  People – Producers & Consumers – Communities, Users, Central Team  Activities – Business processes and procedures – Semantics, information needs and behaviors – Information Governance Policy  Technology – CMS, Search, portals, text analytics – Applications – BI, CI, Semantic Web, Text Mining 21

Quick Start Step One- Knowledge Audit  Info Problems – what, how severe  Formal Process – Knowledge Audit – Contextual interviews, content analysis, surveys, focus groups, ethnographic studies, Text Mining  Informal for smaller organizations, specific application  Category modeling – Cognitive Science – how people think – Panda, Monkey, Banana  Natural level categories mapped to communities, activities Novices prefer higher levels Balance of informativeness and distinctiveness  Strategic Vision – Text Analytics and Information/Knowledge Environment 22

23 Quick Start Step Two - Software Evaluation Varieties of Taxonomy/ Text Analytics Software  Software is more important to text analytics – No spreadsheets for semantics  Taxonomy Management - extraction  Full Platform – SAS, SAP, Smart Logic, Concept Searching, Expert System, IBM, Linguamatics, GATE  Embedded – Search or Content Management – FAST, Autonomy, Endeca, Vivisimo, NLP, etc. – Interwoven, Documentum, etc.  Specialty / Ontology (other semantic) – Sentiment Analysis – Attensity, Lexalytics, Clarabridge, Lots – Ontology – extraction, plus ontology

Quick Start Step Two - Software Evaluation Different Kind of software evaluation  Traditional Software Evaluation - Start – Filter One- Ask Experts - reputation, research – Gartner, etc. Market strength of vendor, platforms, etc. Feature scorecard – minimum, must have, filter to top 6 – Filter Two – Technology Filter – match to your overall scope and capabilities – Filter not a focus – Filter Three – In-Depth Demo – 3-6 vendors  Reduce to 1-3 vendors  Vendors have different strengths in multiple environments – Millions of short, badly typed documents, Build application – Library 200 page PDF, enterprise & public search 24

Quick Start Step Two - Software Evaluation Design of the Text Analytics Selection Team  IT - Experience with software purchases, needs assess, budget – Search/Categorization is unlike other software, deeper look  Business -understand business, focus on business value  They can get executive sponsorship, support, and budget – But don’t understand information behavior, semantic focus  Library, KM - Understand information structure  Experts in search experience and categorization – But don’t understand business or technology  Interdisciplinary Team, headed by Information Professionals  Much more likely to make a good decision  Create the foundation for implementation 25

Quick Start Step Three – Proof of Concept / Pilot Project  POC use cases – basic features needed for initial projects  Design - Real life scenarios, categorization with your content  Preparation: – Preliminary analysis of content and users information needs Training & test sets of content, search terms & scenarios – Train taxonomist(s) on software(s) – Develop taxonomy if none available  Four week POC – 2 rounds of develop, test, refine / Not OOB  Need SME’s as test evaluators – also to do an initial categorization of content  Majority of time is on auto-categorization 26

27 POC Design: Evaluation Criteria & Issues  Basic Test Design – categorize test set – Score – by file name, human testers  Categorization & Sentiment – Accuracy 80-90% – Effort Level per accuracy level  Combination of scores and report  Operators (DIST, etc.), relevancy scores, markup  Development Environment – Usability, Integration  Issues: – Quality of content & initial human categorization – Normalize among different test evaluators – Quality of taxonomy – structure, overlapping categories

Quick Start for Text Analytics Proof of Concept -- Value of POC  Selection of best product(s)  Identification and development of infrastructure elements – taxonomies, metadata – standards and publishing process  Training by doing –SME’s learning categorization, Library/taxonomist learning business language  Understand effort level for categorization, application  Test suitability of existing taxonomies for range of applications  Explore application issues – example – how accurate does categorization need to be for that application – 80-90%  Develop resources – categorization taxonomies, entity extraction catalogs/rules 28

POC and Early Development: Risks and Issues  CTO Problem –This is not a regular software process  Semantics is messy not just complex – 30% accuracy isn’t 30% done – could be 90%  Variability of human categorization  Categorization is iterative, not “the program works” – Need realistic budget and flexible project plan  Anyone can do categorization – Librarians often overdo, SME’s often get lost (keywords)  Meta-language issues – understanding the results – Need to educate IT and business in their language 29

Development 30

31 Text Analytics Development: Categorization Process Start with Taxonomy and Content  Starter Taxonomy – If no taxonomy, develop (steal) initial high level Textbooks, glossaries, Intranet structure Organization Structure – facets, not taxonomy  Analysis of taxonomy – suitable for categorization – Structure – not too flat, not too large – Orthogonal categories  Content Selection – Map of all anticipated content – Selection of training sets – if possible – Automated selection of training sets – taxonomy nodes as first categorization rules – apply and get content
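The idea of using taxonomy nodes as the first categorization rules can be sketched as a plain phrase match. Everything here is an assumption made for illustration: the function name, the document ids, and the sample texts are hypothetical, and the node labels are borrowed from the subject taxonomy shown later in the deck.

```python
import re


def seed_training_sets(node_labels, documents):
    """Treat each taxonomy node label as a first-pass rule.

    A document is a candidate training example for a node when the node
    label appears in it as a whole phrase (case-insensitive).
    """
    seeds = {}
    for label in node_labels:
        pattern = re.compile(r"\b" + re.escape(label) + r"\b", re.IGNORECASE)
        seeds[label] = [
            doc_id for doc_id, text in documents.items() if pattern.search(text)
        ]
    return seeds


# Hypothetical content; real projects would pull from the content map.
DOCS = {
    "doc1": "A survey of marine toxins found in coastal shellfish.",
    "doc2": "Quarterly budget review for the facilities division.",
    "doc3": "Methods in marine microbiology and marine toxins analysis.",
}

if __name__ == "__main__":
    print(seed_training_sets(["Marine microbiology", "Marine toxins"], DOCS))
```

The documents returned for each node are only candidates. They still need review before being used as training sets or as the basis for richer rules.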

32 Text Analytics Workshop Text Analytics Development: Categorization Process  First Round of Categorization Rules  Term building – from content – basic set of terms that appear often / important to content  Add terms to rule, apply to broader set of content  Repeat for more terms – get recall-precision “scores”  Repeat, refine, repeat, refine, repeat  Get SME feedback – formal process – scoring  Get SME feedback – human judgments  Test against more, new content  Repeat until “done” – 90%?
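The recall-precision "scores" in this loop are the standard information retrieval measures. A minimal sketch, assuming each round yields a set of document ids the rule matched and a set the SMEs judged to belong to the category (the function name and the ids are illustrative):

```python
def precision_recall(predicted: set, relevant: set) -> tuple:
    """Precision and recall for one category rule.

    predicted: ids of documents the rule assigned to the category
    relevant:  ids of documents human evaluators assigned to the category
    """
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    rule_hits = {"d1", "d2", "d3", "d4"}   # hypothetical rule output
    sme_labels = {"d1", "d2", "d5"}        # hypothetical SME judgments
    p, r = precision_recall(rule_hits, sme_labels)
    print(f"precision={p:.2f} recall={r:.2f}")
```

Adding terms to a rule usually raises recall and can lower precision, which is why the slide describes repeated rounds of refinement instead of a single pass.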

33 Text Analytics Workshop Text Analytics Development: Entity Extraction Process  Facet Design – from Knowledge Audit, K Map  Find and Convert catalogs: – Organization – internal resources – People – corporate yellow pages, HR – Include variants – Scripts to convert catalogs – programming resource  Build initial rules – follow categorization process – Differences – scale, threshold – application dependent – Recall – Precision – balance set by application – Issue – disambiguation – Ford company, person, car
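A catalog with variants, plus a simple context check for the Ford ambiguity, might look like the sketch below. The catalog entries, cue words, and window size are all hypothetical, and counting cue words is a stand-in for the categorization-based disambiguation the workshop describes, not a reproduction of any vendor's method.

```python
import re

# Hypothetical catalog: surface variant -> canonical entity.
VARIANTS = {
    "henry ford": "Henry Ford (person)",
    "mr. ford": "Henry Ford (person)",
    "ford motor company": "Ford Motor Company (organization)",
}

# Ambiguous surface forms, each with candidate senses and context cue words.
AMBIGUOUS = {
    "ford": [
        ("Ford Motor Company (organization)", {"company", "shares", "automaker", "plant"}),
        ("Ford (vehicle)", {"drove", "truck", "car", "sedan", "parked"}),
    ],
}


def extract(text, window=5):
    """Return canonical entities found in `text`, sorted alphabetically."""
    lowered = text.lower()
    found = []
    # Longest variants first, so "henry ford" wins over the bare "ford".
    for variant in sorted(VARIANTS, key=len, reverse=True):
        pattern = r"(?<!\w)" + re.escape(variant) + r"(?!\w)"
        if re.search(pattern, lowered):
            found.append(VARIANTS[variant])
            lowered = re.sub(pattern, " ", lowered)  # mask matched text
    tokens = re.findall(r"[a-z]+", lowered)
    for i, tok in enumerate(tokens):
        if tok in AMBIGUOUS:
            context = set(tokens[max(0, i - window): i + window + 1])
            scored = [(len(context & cues), name) for name, cues in AMBIGUOUS[tok]]
            best_score, best_name = max(scored)
            if best_score > 0:  # leave unresolved mentions out
                found.append(best_name)
    return sorted(set(found))


if __name__ == "__main__":
    print(extract("Henry Ford founded the company. Mr. Ford later retired."))
    print(extract("She parked the Ford truck outside."))
```

Dropping unresolved mentions favors precision over recall. An application that needs recall could instead keep them flagged as ambiguous, which is the balance the slide says the application has to set.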

34 Text Analytics Development: Entity Extraction Process  Demo – SAS Enterprise Content Categorization  Amdocs Motivation – BillGreaterThanLast – build rule  BillIncludesProrate – auto rule  GAO Project Three – Agriculture and New Agriculture

35 Text Analytics Workshop Case Study - Background  Inxight Smart Discovery  Multiple Taxonomies – Healthcare – first target – Travel, Media, Education, Business, Consumer Goods,  Content – 800+ Internet news sources – 5,000 stories a day  Application – Newsletters – Editors using categorized results – Easier than full automation

36 Text Analytics Workshop Case Study - Approach  Initial High Level Taxonomy – Auto generation – very strange – not usable – Editors High Level – sections of newsletters – Editors & Taxonomy Pro’s - Broad categories & refine  Develop Categorization Rules – Multiple Test collections – Good stories, bad stories – close misses - terms  Recall and Precision Cycles – Refine and test – taxonomists – many rounds – Review – editors – 2-3 rounds  Repeat – about 4 weeks

37 Text Analytics Workshop Case Study – Issues & Lessons  Taxonomy Structure: Aggregate vs. independent nodes – Children Nodes – subset – rare  Trade-off of depth of taxonomy and complexity of rules  No best answer – taxonomy structure, format of rules – Need custom development – Recall more important than precision – editors role  Combination of SME and Taxonomy pros – Combination of Features – Entity extraction, terms, Boolean, filters, facts  Training sets and find similar are weakest  Plan for ongoing refinement

38 Text Analytics Workshop Enterprise Environment – Case Studies  A Tale of Two Taxonomies – It was the best of times, it was the worst of times  Basic Approach – Initial meetings – project planning – High level K map – content, people, technology – Contextual and Information Interviews – Content Analysis – Draft Taxonomy – validation interviews, refine – Integration and Governance Plans

39 Text Analytics Workshop Enterprise Environment – Case One – Taxonomy, 7 facets  Taxonomy of Subjects / Disciplines: – Science > Marine Science > Marine microbiology > Marine toxins  Facets: – Organization > Division > Group – Clients > Federal > EPA – Facilities > Division > Location > Building X – Content Type – Knowledge Asset > Proposals – Instruments > Environmental Testing > Ocean Analysis > Vehicle – Methods > Social > Population Study – Materials > Compounds > Chemicals

40 Text Analytics Workshop Enterprise Environment – Case One – Taxonomy, 7 facets  Project Owner – KM department – included RM, business process  Involvement of library - critical  Realistic budget, flexible project plan  Successful interviews – build on context – Overall information strategy – where taxonomy fits  Good Draft taxonomy and extended refinement – Software, process, team – train library staff – Good selection and number of facets  Developed broad categorization and one deep-Chemistry  Final plans and hand off to client

41 Text Analytics Workshop Enterprise Environment – Case Two – Taxonomy, 4 facets  Taxonomy of Subjects / Disciplines: – Geology > Petrology  Facets: – Organization > Division > Group – Process > Drill a Well > File Test Plan – Assets > Platforms > Platform A – Content Type > Communication > Presentations

42 Enterprise Environment – Case Two – Taxonomy, 4 facets Environment & Project Issues  Value of taxonomy understood, but not the complexity and scope – Under budget, under staffed  Location – not KM – tied to RM and software – Solution looking for the right problem  Importance of an internal library staff – Difficulty of merging internal expertise and taxonomy  Project mind set – not infrastructure – Rushing to meet deadlines doesn’t work with semantics Importance of integration – with team, company – Project plan more important than results

43 Enterprise Environment – Case Two – Taxonomy, 4 facets Research and Design Issues  Research Issues – Not enough research – and wrong people – Misunderstanding of research – wanted tinker toy connections Interview 1 leads to taxonomy node 2  Design Issues – Not enough facets – Wrong set of facets – business not information – Ill-defined facets – too complex internal structure

44 Enterprise Environment – Case Two – Taxonomy, 4 facets Conclusion: Risk Factors  Political-Cultural-Semantic Environment – Not simple resistance - more subtle – re-interpretation of specific conclusions and sequence of conclusions / Relative importance of specific recommendations  Access to content and people – Enthusiastic access  Importance of a unified project team – Working communication as well as weekly meetings

Applications 45

46 Quick Start for Text Analytics Building on the Foundation  Text Analytics: Create the Platform – CM & Search – New Electronic Publishing Process Use text analytics to tag, new hybrid workflow – New Enterprise Search Build faceted navigation on metadata, extraction  Enhance Information Access in the Enterprise - InfoApps – Governance, Records Management, Doc duplication, Compliance – Applications – Business Intelligence, CI, Behavior Prediction – eDiscovery, litigation support, Fraud detection – Productivity / Portals – spider and categorize, extract

47 Quick Start for Text Analytics Information Platform: Content Management  Hybrid Model – Internal Content Management – Publish Document -> Text Analytics analysis -> suggestions for categorization, entities, metadata - > present to author – Cognitive task is simple -> react to a suggestion instead of select from head or a complex taxonomy – Feedback – if author overrides -> suggestion for new category  External Information - human effort is prior to tagging – More automated, human input as specialized process – periodic evaluations – Precision usually more important – Target usually more general

48 Text Analytics and Search Multi-dimensional and Smart  Faceted Navigation has become the basic/ norm – Facets require huge amounts of metadata – Entity / noun phrase extraction is fundamental – Automated with disambiguation (through categorization)  Taxonomy – two roles – subject/topics and facet structure – Complex facets and faceted taxonomies  Clusters and Tag Clouds – discovery & exploration  Auto-categorization – aboutness, subject facets – This is still fundamental to search experience – InfoApps only as good as fundamentals of search  People – tagging, evaluating tags, fine tune rules and taxonomy

51 Integrated Facet Application Design Issues - General  What is the right combination of elements? – Dominant dimension or equal facets – Browse topics and filter by facet, search box – How many facets do you need?  Scale requires more automated solutions – More sophisticated rules  Issue of disambiguation: – Same person, different name – Henry Ford, Mr. Ford, Henry X. Ford – Same word, different entity – Ford and Ford  Number of entities and thresholds per results set / document – Usability, audience needs  Relevance Ranking – number of entities, rank of facets

52 Quick Start for Text Analytics Text and Data: Two Way Street  New types of applications – New ways to make sense of data, enrich data  Harvard – Analyzing Text as Data – Detecting deception, Frame Analysis  Narrative Science – take data (baseball statistics, financial data) and turn into a story  Political campaigns using Big Data, social media, and text analytics  Watson for healthcare – help doctors keep up with massive information overload

53 Quick Start for Text Analytics Social Media: Beyond Simple Sentiment  Beyond Good and Evil (positive and negative) – Social Media is approaching next stage (growing up) – Where is the value? How to get better results?  Importance of Context – around positive and negative words – Rhetorical reversals – “I was expecting to love it” – Issues of sarcasm (“Really Great Product”), slanguage  Granularity of Application – Early Categorization – Politics or Sports  Limited value of Positive and Negative – Degrees of intensity, complexity of emotions and documents  Addition of focus on behaviors – why someone calls a support center – and likely outcomes

54 Quick Start for Text Analytics Social Media: Beyond Simple Sentiment  Two basic approaches [Limited accuracy, depth] – Statistical Signature of Bag of Words – Dictionary of positive & negative words  Essential – need full categorization and concept extraction  New Taxonomies – Appraisal Groups – Adjective and modifiers – “not very good” – Supports more subtle distinctions than positive or negative  Emotion taxonomies - Joy, Sadness, Fear, Anger, Surprise, Disgust – New Complex – pride, shame, confusion, skepticism
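An appraisal group such as "not very good" can be scored by treating the adjective and its modifiers as one unit instead of scoring each word separately. The sketch below is a toy under stated assumptions: the word lists and weights are invented for illustration, and flipping the sign on negation is a deliberate simplification of how appraisal theory treats such phrases.

```python
import re

# Hypothetical lexicons and weights for this sketch.
ADJECTIVES = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
INTENSIFIERS = {"very": 1.5, "really": 1.5, "slightly": 0.5}
NEGATORS = {"not", "never"}


def appraisal_groups(text):
    """Return (phrase, score) pairs for each adjective and its modifiers."""
    tokens = re.findall(r"[a-z']+", text.lower())
    groups = []
    for i, tok in enumerate(tokens):
        if tok not in ADJECTIVES:
            continue
        score = ADJECTIVES[tok]
        j = i - 1
        # Walk left over intensifiers, scaling the adjective's score.
        while j >= 0 and tokens[j] in INTENSIFIERS:
            score *= INTENSIFIERS[tokens[j]]
            j -= 1
        # A negator directly before the group flips the polarity.
        if j >= 0 and tokens[j] in NEGATORS:
            score = -score
            j -= 1
        groups.append((" ".join(tokens[j + 1: i + 1]), score))
    return groups


if __name__ == "__main__":
    print(appraisal_groups("The service was not very good"))
    print(appraisal_groups("A really great product"))
```

A dictionary of positive and negative words would score "not very good" as positive because of "good". Grouping the modifiers with the adjective is what lets the phrase come out negative.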

55 Quick Start for Text Analytics Social Media: Beyond Simple Sentiment  Expertise Analysis – Experts think & write differently – process, chunks – Categorization rules for documents, authors, communities  Applications: – Business & Customer intelligence, Voice of the Customer – Deeper understanding of communities, customers – better models – Security, threat detection – behavior prediction, Are they experts? – Expertise location – generate automatic expertise characterization  Crowd Sourcing – technical support to wikis  Political – conservative and liberal minds/texts – Disgust, shame, cooperation, openness

56 Quick Start for Text Analytics Behavior Prediction – Telecom Customer Service  Problem – distinguish customers likely to cancel from mere threats  Basic Rule – (START_20, (AND, (DIST_7, "[cancel]", "[cancel-what-cust]"), (NOT, (DIST_10, "[cancel]", (OR, "[one-line]", "[restore]", "[if]")))))  Examples: – customer called to say he will cancell his account if the does not stop receiving a call from the ad agency. – cci and is upset that he has the asl charge and wants it off or her is going to cancel his act  More sophisticated analysis of text and context in text  Combine text analytics with Predictive Analytics and traditional behavior monitoring for new applications
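The basic rule on this slide can be approximated in a few lines of code. This is a sketch under stated assumptions, not the vendor's rule engine: it reads START_20 as "look only at the first 20 words", DIST_n as "two concepts occur within n words of each other", and treats each bracketed name as a reference to a word list. The `CONCEPTS` word lists and the function names are hypothetical stand-ins for the real concept definitions, which the slide does not show.

```python
CONCEPTS = {  # hypothetical word lists for the bracketed concepts
    "cancel": {"cancel", "cancell", "canceling", "cancellation"},
    "cancel-what-cust": {"account", "acct", "act", "service"},
    "one-line": {"line"},
    "restore": {"restore", "reinstate"},
    "if": {"if", "unless"},
}


def positions(tokens: list[str], concept: str) -> list[int]:
    """Indexes of tokens that belong to the named concept."""
    words = CONCEPTS[concept]
    return [i for i, tok in enumerate(tokens) if tok in words]


def within(tokens: list[str], a: str, b: str, max_dist: int) -> bool:
    """Assumed DIST_n: some a-word and b-word are at most max_dist apart."""
    return any(
        abs(i - j) <= max_dist
        for i in positions(tokens, a)
        for j in positions(tokens, b)
    )


def likely_to_cancel(text: str, start: int = 20) -> bool:
    """Approximate the slide's rule on the first `start` words of a note."""
    tokens = [t.strip(".,!?\"'").lower() for t in text.split()][:start]
    # (DIST_7, "[cancel]", "[cancel-what-cust]")
    has_target = within(tokens, "cancel", "cancel-what-cust", 7)
    # (DIST_10, "[cancel]", (OR, "[one-line]", "[restore]", "[if]"))
    hedged = any(
        within(tokens, "cancel", c, 10) for c in ("one-line", "restore", "if")
    )
    return has_target and not hedged
```

Under these assumptions, the slide's first example (a cancellation that depends on "if") is not flagged, because "if" falls within 10 words of "cancell", while a note such as "Customer called to cancel his account today" is flagged. Including the misspelling "cancell" in the word list reflects how noisy call-center notes are; a production rule set would need far broader concept coverage.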

57 Text Analytics Workshop Conclusions  Text Analytics and Taxonomy are partners – enrich each other  Text Analytics can mind the gap – between taxonomies and documents  Text Analytics needs strategic vision and quick start – Need to approach as platform – deep context – understand information environment  Text Analytics is a platform for huge range of applications: – Search and Content Management and Basic productivity apps – New kinds of applications - social, data, InfoApps of all kinds  Want to learn more – come to Text Analytics World in SF in April! – Call for Speakers: Nov 2

Questions? Tom Reamy KAPS Group Knowledge Architecture Professional Services

59 Resources  Books – Women, Fire, and Dangerous Things George Lakoff – Knowledge, Concepts, and Categories Koen Lamberts and David Shanks – Formal Approaches in Categorization Ed. Emmanuel Pothos and Andy Wills – The Mind Ed John Brockman Good introduction to a variety of cognitive science theories, issues, and new ideas – Any cognitive science book written after 2009

60 Resources  Conferences – Web Sites – Text Analytics World - All aspects of text analytics; Call for Speakers; April 17-18, San Francisco – Text Analytics Summit – Semtech

61 Resources  Blogs – SAS-  Web Sites – Taxonomy Community of Practice: – LinkedIn – Text Analytics Summit Group – Whitepaper – CM and Text Analytics - eetstextanalytics.pdf – Whitepaper – Enterprise Content Categorization strategy and development

62 Resources  Articles – Malt, B. C. Category coherence in cross-cultural perspective. Cognitive Psychology 29, – Rifkin, A. Evidence for a basic level in event taxonomies. Memory & Cognition 13, – Shaver, P., J. Schwartz, D. Kirson, C. O’Connor. Emotion knowledge: further exploration of a prototype approach. Journal of Personality and Social Psychology 52, – Tanaka, J. W. & M. E. Taylor. Object categories and expertise: is the basic level in the eye of the beholder? Cognitive Psychology 23,

63 Resources  LinkedIn Groups: – Text Analytics World – Text Analytics Group – Data and Text Professionals – Sentiment Analysis – Metadata Management – Semantic Technologies  Journals – Academic – Cognitive Science, Linguistics, NLP – Applied – Scientific American Mind, New Scientist