Text Analytics Workshop
Tom Reamy, Chief Knowledge Architect
KAPS Group, Knowledge Architecture Professional Services

Agenda
Introduction – Elements & Infrastructure
Platform
– Semantics not technology
– Infrastructure not project
– Value of Text Analytics
Evaluating Software
– Two Phase Process
– Designing the Team and Content Structures
Development – Taxonomy, Categorization, Faceted Metadata
Text Analytics Applications
– Integration with Search and ECM
– Platform for Information Applications

KAPS Group: General
Knowledge Architecture Professional Services
Virtual Company: Network of consultants – 8-10
Partners – SAS, SAP, Microsoft-FAST, Concept Searching, etc.
Consulting, Strategy, Knowledge architecture audit
Services:
– Taxonomy/Text Analytics development, consulting, customization
– Technology Consulting – Search, CMS, Portals, etc.
– Evaluation of Enterprise Search, Text Analytics
– Metadata standards and implementation
– Knowledge Management: Collaboration, Expertise, e-learning
– Applied Theory – Faceted taxonomies, complexity theory, natural categories

Introduction to Text Analytics: Semantic Infrastructure - Elements
Taxonomy – Thesauri, Controlled Vocabulary
Metadata – Standard (Dublin Core) and Facets
Basic Text Analytics
– Categorization – Document Topics – Aboutness
– Entity Extraction – noun phrases, feed facets
– Summarization – beyond snippets
Advanced Text Analytics
– Fact extraction – ontologies
– Sentiment Analysis – good, bad, and ugly
What is in a Name – text analytics or ?

Introduction to Text Analytics: Taxonomy
Thesauri, Controlled Vocabulary
– Resources to build on
– Indexing not categorization
Taxonomy – Foundation for Categorization
– Browse – classification scheme
– Formal – Is-Child-Of, Is-Part-Of
– Large taxonomies - MeSH – indexing all topics
– Small is better – for categorization and faceted navigation

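
The formal Is-Child-Of relation on this slide can be pictured as a simple parent map. The sketch below is only an illustration: the category names and the helper functions `ancestors` and `children` are hypothetical, not taken from the workshop or from any taxonomy tool.

```python
# Hypothetical mini-taxonomy: each key Is-Child-Of its value (None marks the root).
IS_CHILD_OF = {
    "Knowledge Management": None,
    "Search": "Knowledge Management",
    "Enterprise Search": "Search",
    "Taxonomy": "Knowledge Management",
    "Faceted Navigation": "Taxonomy",
}


def ancestors(node, parents=IS_CHILD_OF):
    """Return the chain of broader categories from node up to the root."""
    chain = []
    parent = parents[node]
    while parent is not None:
        chain.append(parent)
        parent = parents[parent]
    return chain


def children(node, parents=IS_CHILD_OF):
    """Return the direct narrower categories of node, e.g. for a browse view."""
    return sorted(child for child, parent in parents.items() if parent == node)


print(ancestors("Enterprise Search"))
print(children("Knowledge Management"))
```

A small map like this is easy to walk in both directions, which is part of why a small taxonomy suits categorization and faceted navigation better than a very large indexing vocabulary.
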
Introduction to Text Analytics: Metadata
Metadata standards
– Dublin Core - Mostly syntactic not semantic
– Description – static or dynamic (summarization)
– Semantic – keywords – very poor performance
Best Bets – high level categorization-search
– Human judgments
Audience – mixed results
– Role, function, expertise, information behaviors
Facets – classes of metadata
– Standard - People, Organization, Document type-purpose
– Specialized – methods, materials, products

Introduction to Text Analytics: Text Analytics
Categorization
– Multiple techniques – examples, terms, Boolean
– Built on a taxonomy
Entity Extraction
– Catalogs with variants, rule based dynamic
Summarization
– Rules – find sentences in a document
Fact Extraction
– Relationships of entities – people-organizations-activities
Sentiment Analysis
– Rules – adjectives & adverbs not nouns

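
Two of the techniques named above, Boolean categorization rules and catalog-based entity extraction with variants, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the rule format, the category names, the catalog entries, and the functions `categorize` and `extract_entities` are all hypothetical, and commercial text analytics tools use much richer rule languages.

```python
import re

# Hypothetical Boolean rules: a category matches when every term in "all" is
# present, at least one term in "any" is present, and no term in "none" appears.
RULES = {
    "Enterprise Search": {"all": {"search"}, "any": {"enterprise", "intranet"}, "none": {"job"}},
    "Taxonomy": {"all": {"taxonomy"}, "any": {"facet", "facets", "thesaurus"}, "none": set()},
}

# Hypothetical entity catalog: canonical name mapped to its known variants.
CATALOG = {
    "KAPS Group": ["KAPS Group", "KAPS"],
    "MeSH": ["MeSH", "Medical Subject Headings"],
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def categorize(text, rules=RULES):
    """Return the sorted names of all categories whose Boolean rule matches."""
    words = tokenize(text)
    return sorted(
        name
        for name, rule in rules.items()
        if rule["all"] <= words and (rule["any"] & words) and not (rule["none"] & words)
    )


def extract_entities(text, catalog=CATALOG):
    """Return the sorted canonical names of catalog entities found in the text."""
    found = set()
    for canonical, variants in catalog.items():
        for variant in variants:
            pattern = r"\b" + re.escape(variant) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.add(canonical)
                break
    return sorted(found)


doc = "KAPS evaluates enterprise search and taxonomy tools with facets."
print(categorize(doc))
print(extract_entities(doc))
```

The extracted canonical names are what would feed facets such as People or Organization, while the category names come from the taxonomy the rules are built on.
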
Introduction to Text Analytics: Text Analytics
Why Text Analytics?
– Enterprise search has failed to live up to its potential
– Enterprise content management has failed to live up to its potential
– Taxonomy has failed to live up to its potential
– Adding metadata, especially keywords, has not worked
What is missing?
– Intelligence – human level categorization, conceptualization
– Infrastructure – Integrated solutions not technology, software
Text Analytics can be the foundation that (finally) drives success – search, content management, and much more

Text Analytics Platform: 4 Basic Contexts
Ideas – Content Structure
– Language and Mind of your organization
– Applications - exchange meaning, not data
People – Company Structure
– Communities, Users
– Central team - establish standards, facilitate
Activities – Business processes and procedures
Technology
– CMS, Search, portals, taxonomy tools
– Applications – BI, CI, Text Mining

Text Analytics Platform: The start and foundation
Knowledge Architecture Audit
Knowledge Map - Understand what you have, what you are, what you want
– The foundation of the foundation
Contextual interviews, content analysis, surveys, focus groups, ethnographic studies
Category modeling – “Intertwingledness” - learning new categories is influenced by other, related categories
Natural level categories mapped to communities, activities
Novices prefer higher levels
Balance of informativeness and distinctiveness
Living, breathing, evolving foundation is the goal

Text Analytics Platform – Benefits
IDC White Paper
Time Wasted
– Reformat information - $5.7 million per 1,000 per year
– Not finding information - $5.3 million per 1,000
– Recreating content - $4.5 million per 1,000
Small Percent Gain = large savings
– 1% - $10 million
– 5% - $50 million
– 10% - $100 million

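
The arithmetic behind these figures can be checked with a short calculation. The sketch below assumes that "per 1,000" means per 1,000 employees per year for all three items, which the slide does not state outright; the function names `annual_waste` and `savings` are hypothetical.

```python
# IDC figures from the slide, in dollars per year.
# Assumption: each figure is per 1,000 employees.
COST_PER_1000 = {
    "reformatting information": 5_700_000,
    "not finding information": 5_300_000,
    "recreating content": 4_500_000,
}


def annual_waste(employees):
    """Estimated dollars wasted per year for an organization of this size."""
    per_1000 = sum(COST_PER_1000.values())  # 15,500,000 for the slide's figures
    return per_1000 * employees // 1000


def savings(employees, percent_gain):
    """Dollars saved per year if the waste is reduced by percent_gain percent."""
    return annual_waste(employees) * percent_gain // 100


for percent in (1, 5, 10):
    print(percent, savings(1_000, percent))
```

Under these assumptions the three items add up to $15.5 million per 1,000 employees, so a 1% gain for 1,000 employees is $155,000 a year. The slide's round figures of $10, $50 and $100 million correspond to a total of about $1 billion, which at this rate would mean roughly 64,500 employees; the slide itself does not give a headcount.
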
Text Analytics Platform – Benefits
Findability within and outside the enterprise
– Savings per year - $millions
Rescue enterprise search and ECM projects
– Add semantics to search
Clean up enterprise content
– Duplication and accurate categorization
Improve the quality of information access
– Finding the right information can save millions
Build smarter applications
– Social networking, locate expertise within the enterprise

Text Analytics Platform – Benefits
Understand your customers
– What they are talking about and how they feel about it
Empower your employees
– Not only more time, but they work smarter
Understand your competitors
– What they are working on, talking about
– Combine unstructured content and rich data sources – more intelligent analysis

Text Analytics Platform – Dangers
Text Analytics as a software project
Not enough resources – to develop, to maintain-refine
Wrong resources – SMEs, IT, Library
– Need all of the above and taxonomists+
Bad Design:
– Start with bad taxonomy
– Wrong taxonomy – too big or too flat
Bad Categorization / Entity Extraction
– Right kind of experience

Resources
Books
– Women, Fire, and Dangerous Things – George Lakoff
– Knowledge, Concepts, and Categories – Koen Lamberts and David Shanks
– The Stuff of Thought – Steven Pinker
Web Sites
– Text Analytics News
– Text Analytics Wiki

Resources
Blogs
– SAS - Manya Mayes – Chief Strategist
Web Sites
– Taxonomy Community of Practice
– Whitepaper – CM and Text Analytics - eetstextanalytics.pdf

Questions?
Tom Reamy
KAPS Group, Knowledge Architecture Professional Services