Text Analytics and Text Mining
Best of Text and Data
Tom Reamy
Chief Knowledge Architect
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com
Agenda
  Text Analytics Capabilities
  Text Analytics Applications
  Text Mining and Text Analytics
  Data and Unstructured Content
  Case Study – Text Mining for Taxonomy Development
  Conclusion
KAPS Group: General
  Knowledge Architecture Professional Services
  Virtual Company: Network of consultants – 8-10
  Partners – SAS, Smart Logic, Microsoft-FAST, Concept Searching, etc.
  Consulting, Strategy, Knowledge architecture audit
  Services:
    Text Analytics evaluation, development, consulting, customization
    Knowledge Representation – taxonomy, ontology, Prototype
    Metadata standards and implementation
    Knowledge Management: Collaboration, Expertise, e-learning
  Applied Theory – Faceted taxonomies, complexity theory, natural categories
Introduction to Text Analytics
Text Analytics Features
  Noun Phrase Extraction
    Catalogs with variants, rule based dynamic
    Multiple types, custom classes – entities, concepts, events
    Feeds facets
  Summarization
    Customizable rules, map to different content
  Fact Extraction
    Relationships of entities – people-organizations-activities
    Ontologies – triples, RDF, etc.
  Sentiment Analysis
    Statistical, rules – full categorization set of operators
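To make "catalogs with variants" concrete, here is a minimal Python sketch of catalog-driven entity extraction. It is an illustration only, not how any of the products named in this deck work: the `CATALOG` contents, the `extract_entities` name, and the matching strategy are all assumptions.

```python
import re

# Hypothetical catalog: canonical entity -> (entity type, known variants).
CATALOG = {
    "International Business Machines": ("organization", ["IBM", "I.B.M."]),
    "SAS Institute": ("organization", ["SAS"]),
}

def extract_entities(text):
    """Return sorted (canonical name, type, matched variant) tuples found in text."""
    found = set()
    for canonical, (etype, variants) in CATALOG.items():
        for variant in [canonical] + variants:
            # Require non-word characters on both sides so "SAS" does not
            # match inside a longer token such as "SASS".
            pattern = r"(?<!\w)" + re.escape(variant) + r"(?!\w)"
            if re.search(pattern, text):
                found.add((canonical, etype, variant))
    return sorted(found)

print(extract_entities("IBM and SAS announced a partnership."))
```

Because every variant maps back to one canonical name, the extracted values can feed a facet directly; a real catalog would also need rules for ambiguous variants.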
Introduction to Text Analytics
Text Analytics Features
  Auto-categorization
    Training sets – Bayesian, Vector space
    Terms – literal strings, stemming, dictionary of related terms
    Rules – simple – position in text (Title, body, url)
    Semantic Network – Predefined relationships, sets of rules
    Boolean – Full search syntax – AND, OR, NOT
    Advanced – NEAR (#), PARAGRAPH, SENTENCE
  This is the most difficult to develop
  Build on a Taxonomy
  Combine with Extraction, Sentiment
  Foundation for best text analytics & combination
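The Boolean and NEAR operators listed above can be sketched in a few lines of Python. This is an assumption-laden illustration, not the rule syntax of any vendor: the category, the rule itself, and the helper names `tokenize`, `near`, and `matches_vaccine_safety` are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def near(tokens, a, b, distance):
    """True if terms a and b occur within `distance` tokens of each other."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)

def matches_vaccine_safety(text):
    """Hypothetical rule:
    (vaccine NEAR/5 reaction) AND (adverse OR severe) AND NOT veterinary
    """
    tokens = tokenize(text)
    vocab = set(tokens)
    return (
        near(tokens, "vaccine", "reaction", 5)
        and ("adverse" in vocab or "severe" in vocab)
        and "veterinary" not in vocab
    )

print(matches_vaccine_safety("The patient had an adverse reaction to the vaccine."))
```

PARAGRAPH and SENTENCE operators would follow the same pattern, with the proximity test applied to paragraph or sentence boundaries instead of token distance.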
Varieties of Taxonomy / Text Analytics Software
  Taxonomy Management
    Synaptica, SchemaLogic
  Full Platform
    SAS-Teragram, SAP-Inxight, Smart Logic, Data Harmony, Concept Searching, Expert System, IBM, GATE
  Content Management – embedded
  Embedded – Search
    FAST, Autonomy, Endeca, Exalead, etc.
  Specialty
    Sentiment Analysis, VOC – Lexalytics, Attensity / Reports
    Ontology – extraction, plus ontology
Text Analytics Applications
  Platform for Multiple Applications
  Content Aggregation, Duplicate Documents – save millions!
  Business intelligence, Customer Intelligence
  Social Media – sentiment analysis, Voice of the Customer
  Social – Hybrid folksonomy / taxonomy / auto-metadata
  Social – expertise, categorize tweets and blogs, reputation
  Ontology – travel assistant, semantic web, etc.
  eDiscovery, Reputation management, Customer Experience
  Expertise Location, Crowd sourcing Technical support
Text Analytics Applications: Enterprise Search – Elements
  Text Analytics can “solve” enterprise search
  Multiple Knowledge Structures
    Facet – orthogonal dimension of metadata
    Taxonomy – Subject matter / aboutness
  Software – Search, ECM, auto-categorization, entity extraction, Text Analytics and Text Mining
  People – tagging, evaluating tags, fine tune rules and taxonomy
  Rich Search Results – context and conversation
  Platform for search based applications
Text Analytics and Text Mining
Data and Unstructured Content
  80% of content is unstructured – adding to semantic web is major
  Text Analytics – content into data
    Big Data meets Big Content
  Real integration of text and ontology
    Beyond “hasDescription”
    Improve accuracy of extracted entities, facts – disambiguation
      Pipeline – oil & gas OR research / Ford
    Add Concepts, not just “Things” – 68% want this
  Semantic Web + Text Analytics = real world value
  Linked Data + Text Analytics – best of both worlds
  Build superior foundation elements – taxonomies, categorization
Text Analytics and Text Mining and Data Mining
Vaccine Adverse Reaction
  Combine with Data Mining
  New sources of information
    News stories, medical records
    Blogs, social
  Find new connections, sources of knowledge
  Vaccine Adverse Effects – disease, symptoms, variables
  Unstructured text into a data source
    Some preliminary analysis, content structure
  Find unknown adverse effects and prevalence
  Drug Discovery + search / research – 5 year story
Text Analytics Applications Example – Vaccine Adverse Effects
Text Analytics and Text Mining
Case Study – Taxonomy Development
  Problem – 200,000 new uncategorized documents
  Old taxonomy – need one that reflects change in corpus
  Text mining, entity extraction, categorization
  Bottom Up – terms in documents – frequency, date
  Clustering – suggested categories
  Clustering – chunking for editors
  Time savings – only feasible way to scan documents
  Quality – important terms, co-occurring terms
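The bottom-up step above (term frequency and co-occurring terms) can be sketched in plain Python. This is a toy illustration under stated assumptions, not the tooling used in the case study: the stopword list, the single-word tokenization, and the function names are hypothetical, and a 200,000-document corpus would need a far more scalable approach.

```python
from collections import Counter
from itertools import combinations
import re

# Tiny illustrative stopword list; a real project would use a fuller one.
STOPWORDS = {"the", "a", "of", "and", "in", "for", "to", "is"}

def terms(doc):
    """Return the set of distinct non-stopword terms in one document."""
    return {t for t in re.findall(r"[a-z]+", doc.lower()) if t not in STOPWORDS}

def document_and_pair_frequency(docs):
    """Count how many documents each term, and each term pair, appears in."""
    term_df = Counter()
    pair_df = Counter()
    for doc in docs:
        doc_terms = sorted(terms(doc))
        term_df.update(doc_terms)
        # Sorted input makes each pair a canonical (smaller, larger) tuple.
        pair_df.update(combinations(doc_terms, 2))
    return term_df, pair_df

docs = [
    "Vaccine safety and adverse events",
    "Adverse events in vaccine trials",
    "Taxonomy of search terms",
]
term_df, pair_df = document_and_pair_frequency(docs)
print(term_df.most_common(3))
print(pair_df.most_common(3))
```

Frequently co-occurring pairs are candidate groupings that an editor can review when proposing new taxonomy nodes.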
Text Analytics and Text Mining
Case Study – Taxonomy Development
  Text into Data:
    Article, Abstract, Title, Subtitle – fields & source of terms
    Add Data: PubDate, journalTitle, Taxonomy Node
  Terms – Map to frequency, date, date ranges, Taxonomy Node
    New Terms, Trends
    Relevance – frequency, Abstract, Title, human judgment
  Entity Extraction – Authors, Organizations, Products
  Categorization – build on clusters & taxonomy
  Combination – reports, visualizations, interactive explorations
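Mapping terms to dates in order to surface new terms and trends might look like the following Python sketch. The record layout (`title`, `abstract`, `pub_date` keys with ISO-style dates) and the function names are assumptions made for illustration; the case study's actual data model is not described in these slides.

```python
from collections import Counter, defaultdict
import re

def term_counts_by_year(records):
    """Map each publication year to a Counter of terms from title and abstract."""
    by_year = defaultdict(Counter)
    for rec in records:
        year = int(rec["pub_date"][:4])  # assumes dates like "2009-05-01"
        text = f'{rec["title"]} {rec["abstract"]}'.lower()
        by_year[year].update(re.findall(r"[a-z]+", text))
    return by_year

def new_terms(by_year, year):
    """Terms seen in `year` that never appear in any earlier year."""
    earlier = set()
    for y, counts in by_year.items():
        if y < year:
            earlier.update(counts)
    return sorted(set(by_year[year]) - earlier)

# Hypothetical records standing in for extracted article fields.
records = [
    {"title": "Search taxonomy", "abstract": "taxonomy design", "pub_date": "2008-03-01"},
    {"title": "Social search", "abstract": "folksonomy tagging", "pub_date": "2010-06-15"},
]
by_year = term_counts_by_year(records)
print(new_terms(by_year, 2010))
```

Terms that first appear in recent years are candidates for new taxonomy nodes, subject to the human relevance judgment the slide mentions.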
Case Study – Taxonomy Development
Conclusion
The best is yet to come!
  Text Analytics impact is huge – solve information overload
  Enterprise Search and Search Based Applications: Save millions and enhance productivity
  Combination of Text Analytics & Text Mining – unlimited range of applications
    Mutual Enrichment – more data, add structure to unstructured
  Add Ontology = Richer Text Analytics – smarter, more useful
  Text Analytics + Text Mining + Semantic Web
    Move from theory to new practical applications
  The best is yet to come!
Questions?
Tom Reamy
tomr@kapsgroup.com
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com