Tom Reamy Chief Knowledge Architect KAPS Group

Intelligent Interactions with Search Results: Getting Beyond Those Blue Results Lists (or Smart Text)
Tom Reamy, Chief Knowledge Architect, KAPS Group
http://www.kapsgroup.com
Program Chair, Text Analytics World
Taxonomy Boot Camp, KMWorld, Enterprise Search Summit: Nov., Washington DC

Agenda
- Introduction: Search and Structure: Smart Text
  - Smart Text: the foundation of text analytics
  - Adding structure to unstructured text
  - Dynamic sections and more, better relevancy calculations
  - Complex document summaries, deeper personalization
- Case Study: Publishing: Processing 700K Proposals
- Beyond Search: Building on the Foundation
- Conclusions

KAPS Group: General
- Knowledge Architecture Professional Services: a network of consultants
- Strategy: IM & KM, text analytics, social media, integration
- Services:
  - Taxonomy/text analytics development, consulting, customization
  - Text Analytics Fast Start: audit, evaluation, pilot
  - Social media: text-based applications, design & development
- Clients: Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, DOT, World Bank, etc.
- Partners: Expert System, SAS, SAP, IBM, FAST, Smartlogic, Concept Searching, Attensity, Clarabridge, Lexalytics
- Applied theory: faceted taxonomies, complexity theory, natural categories, emotion taxonomies
- Presentations, articles, white papers: http://www.kapsgroup.com

Introduction: Elements of Smart Text (Text Analytics)
- Text mining: NLP, statistical, predictive, machine learning
- Extraction: entities (known and unknown), concepts, events
  - Disambiguation: "Ford" (the company, the car, or a person?)
- Fact extraction: ontology, relationships of entities
- Sentiment analysis: positive/negative, about products, companies
- Auto-categorization
  - Training sets, terms
  - Rules:
    - Simple: position in text (title, body, URL)
    - Boolean: full search syntax (AND, OR, NOT)
    - Advanced: DIST(#), ORDDIST(#), PARAGRAPH, SENTENCE
  - Based on taxonomy/ontology
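To make the advanced rule operators concrete, here is a minimal Python sketch that evaluates them over a token stream. The semantics are our assumption about how such rule languages typically behave (DIST_n: both terms within n words in either order; ORDDIST_n: within n words and in order; SENTENCE: both terms in one sentence), and all function names are hypothetical, not a vendor API.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    # Lowercase word tokens; punctuation is discarded.
    return re.findall(r"[a-z0-9']+", text.lower())


def positions(tokens: List[str], term: str) -> List[int]:
    # Token offsets of a single-word term; a trailing '*' is a prefix wildcard.
    if term.endswith("*"):
        stem = term[:-1]
        return [i for i, t in enumerate(tokens) if t.startswith(stem)]
    return [i for i, t in enumerate(tokens) if t == term]


def dist(tokens: List[str], a: str, b: str, n: int) -> bool:
    # DIST_n: a and b within n tokens of each other, in either order.
    return any(abs(i - j) <= n for i in positions(tokens, a) for j in positions(tokens, b))


def orddist(tokens: List[str], a: str, b: str, n: int) -> bool:
    # ORDDIST_n: like DIST_n, but a must come before b.
    return any(0 < j - i <= n for i in positions(tokens, a) for j in positions(tokens, b))


def same_sentence(text: str, a: str, b: str) -> bool:
    # SENTENCE: both terms inside one sentence (naive split on . ! ?).
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        toks = tokenize(sentence)
        if positions(toks, a) and positions(toks, b):
            return True
    return False


text = "The patient enrolled in a clinical trial. Animals were not used."
toks = tokenize(text)
# Boolean AND / OR / NOT map directly onto Python's and / or / not.
match = dist(toks, "clinical", "trial*", 2) and not same_sentence(text, "trial*", "animals")
```

Here `match` is true: "clinical" and "trial" are adjacent, and "animals" sits in a different sentence even though it is within five words of "clinical". That difference between word distance and sentence scope is exactly why both kinds of operator are useful.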

Enterprise Text Analytics
- Search is still the #1 application: 30-50% of applications
- New standard search: facets (more and more metadata), auto-categorization built on taxonomies, clustering
- Trend: text analytics/search as semantic infrastructure, a platform for info apps (search-based applications)
- SharePoint: a major focus of TA companies, fixing problems with taxonomy/folksonomy
- Hybrid workflow: publish document -> TA analysis -> suggestions for categorization, entities, metadata -> present to author
- External information: more automation and extraction; precision is more important

Enterprise Text Analytics: Adding Structure to Unstructured Content
- Beyond documents: categorization by corpus, by page, by section, or even by sentence or phrase
- Documents are not unstructured; they have a variety of structures
- Text indicators define the sections of a document
  - Objectives, Abstract, Purpose, Aim: all the "same" section
  - Sections range from specific ("Abstract") to functional ("Evidence")
- The start of a section is easy to find; where does it end?
  - Experiment: clusters/vocabulary to define a section
  - Textual complexity, level of generality
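A minimal sketch of the "text indicators" idea: several heading words are normalized to one canonical section, and each section is assumed to end where the next recognized heading begins. The synonym table and the one-word-heading convention are hypothetical; the slide's point is that real section ends are harder and may need clustering or vocabulary shifts instead.

```python
import re
from typing import Dict, List

# Hypothetical heading vocabulary: several surface forms map to one canonical section.
SECTION_SYNONYMS: Dict[str, str] = {
    "abstract": "abstract", "objectives": "abstract", "purpose": "abstract", "aim": "abstract",
    "methods": "methods", "methodology": "methods",
    "results": "results", "findings": "results",
}

# A heading is assumed to be a line holding a single word, optionally followed by ':'.
HEADING = re.compile(r"^\s*([A-Za-z]+)\s*:?\s*$")


def split_sections(text: str) -> Dict[str, str]:
    sections: Dict[str, List[str]] = {}
    current = "body"  # text before the first recognized heading
    for line in text.splitlines():
        m = HEADING.match(line)
        if m and m.group(1).lower() in SECTION_SYNONYMS:
            # A recognized heading starts a new section (and ends the previous one).
            current = SECTION_SYNONYMS[m.group(1).lower()]
            continue
        sections.setdefault(current, []).append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}
```

For example, a document headed "Purpose:" and one headed "Aim" both end up with an `abstract` entry, so downstream rules can target one section name regardless of the author's wording.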

Enterprise Text Analytics: Categorization and Beyond
- Need flexible categorization and taxonomy: from tweets to 200-page PDFs
- Rules or sample documents?
  - Need more precision and granularity than sample documents can provide
  - Training sets: not as easy as thought
- Applications require sophisticated rules, not just categorization by similarity
- Separate the logic of the rules from the text: stable rules, changing text
- Scores: relevancy with thresholds, not just frequency of words
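One way to score "relevancy with thresholds, not just frequency" is sketched below, under stated assumptions: hits are weighted by where they occur (title, URL, body, the positions named in the rules list earlier), raw counts are dampened with a logarithm, and a category is assigned only if its score clears a threshold. The weights, the log dampening, and the function names are all hypothetical choices for illustration.

```python
import math
from typing import Dict, List

# Hypothetical field weights: one hit in the title outweighs one in the body.
FIELD_WEIGHTS: Dict[str, float] = {"title": 3.0, "url": 2.0, "body": 1.0}


def relevancy(doc: Dict[str, str], terms: List[str]) -> float:
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = doc.get(field, "").lower().split()
        hits = sum(words.count(term) for term in terms)
        # log1p dampens raw frequency: the tenth mention adds far less than the first.
        total += weight * math.log1p(hits)
    return total


def categorize(doc: Dict[str, str], categories: Dict[str, List[str]], threshold: float) -> List[str]:
    # Assign every category whose score clears the threshold.
    return sorted(name for name, terms in categories.items() if relevancy(doc, terms) >= threshold)
```

Keeping the weights and threshold outside the text being scored is one way to realize "stable rules, changing text": the same scoring logic applies unchanged to a tweet or a 200-page PDF.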

Enterprise Text Analytics: Document Type Rules
(START_2000, (AND, (OR, _/article:"[Abstract]", _/article:"[Methods]"), (OR,_/article:"clinical trial*", _/article:"humans", (NOT, (DIST_5, (OR,_/article:"approved", _/article:"safe", _/article:"use", _/article:"animals"), ...
- In plain language: if the article has sections like Abstract or Methods, AND it has phrases around "clinical trials"/"humans", and it does NOT have words like "animals" within 5 words of "clinical trial" words, count it and add up a relevancy score
- Primary issue: major mentions, not every mention
- Combination of noun-phrase extraction and categorization
- Results: virtually 100%
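The rule on the slide is cut off, so the following Python sketch follows its plain-language explanation rather than the exact syntax. It assumes the document has already been split into named sections (lowercase keys such as `abstract` and `methods`), treats "clinical trial*" as "clinical" followed by any word starting with "trial", and uses the slide's excluded words. Relevancy scoring and the "major mentions" refinement are not modeled; the function names are hypothetical.

```python
import re
from typing import Dict, List

# Words the slide's rule lists inside its NOT (DIST_5 ...) clause.
EXCLUDED = {"approved", "safe", "use", "animals"}


def trial_positions(tokens: List[str]) -> List[int]:
    # "clinical trial*": 'clinical' followed by any word starting with 'trial'.
    return [i for i in range(len(tokens) - 1)
            if tokens[i] == "clinical" and tokens[i + 1].startswith("trial")]


def is_human_trial(sections: Dict[str, str], window: int = 5) -> bool:
    # 1. Structural test: the article must have an Abstract or Methods section.
    if not {"abstract", "methods"} & set(sections):
        return False
    tokens = re.findall(r"[a-z]+", " ".join(sections.values()).lower())
    trials = trial_positions(tokens)
    # 2. Topical test: it must mention clinical trials or humans.
    if not trials and "humans" not in tokens:
        return False
    # 3. Exclusion: no excluded word within `window` tokens of a clinical-trial mention.
    excluded = [j for j, t in enumerate(tokens) if t in EXCLUDED]
    return not any(abs(i - j) <= window for i in trials for j in excluded)
```

The three numbered steps mirror the AND/OR/NOT nesting of the rule: structure first, topic second, and the proximity-based exclusion last.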

Case Study: Publishing
Project: Reed Construction Data
- 700,000 proposals, with wide variation
- Process proposals: extract data, 30-50 types
- Current manual process: internal teams, expensive and slow
- Structure a variety of unstructured documents:
  - Generate a table of contents
  - Generate sections and capture text
  - Semi-automatically extract key information
- Save time & money, flexible hiring, new offerings

Publishing Project: Example Rules: Automated Table of Contents

Publishing Project: Example Rules: Key Data Extraction
- Bid dates/times
- Roles (architect, designer, etc.): names, addresses, etc.
- Project attributes: cost, invitation number, parking, etc.
- Some easy, some hard: addresses!
- Example: ARCHITECT: MICHEAL KIM ARCHITECTURE 1 HOLDEN STREET BROOKLINE, MA 02445 P: (617) 739-6925 F: (772) 325-2991
- Technique: create broad and stable templates that tolerate variation in the text
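Addresses are the hard case, because name, street, and city run together with no delimiters. A minimal sketch of one "broad template" follows: a regular expression, assuming US-style, all-caps, single-line contact blocks. The role list, street suffixes, and the `extract_contact` name are hypothetical. The street is anchored on its leading house number, which is what separates it from a multi-word firm name.

```python
import re
from typing import Dict, Optional

# Hypothetical template for a role/contact block in a bid document.
CONTACT = re.compile(
    r"(?P<role>ARCHITECT|ENGINEER|DESIGNER|OWNER)\s*:\s*"
    r"(?P<name>[A-Z][A-Z&.,' -]*?)\s+"
    # Street: starts with a house number, ends with a common street suffix.
    r"(?P<street>\d+[A-Z0-9 .'-]*?\b(?:STREET|ST|AVENUE|AVE|ROAD|RD|DRIVE|DR|BOULEVARD|BLVD|LANE|LN|WAY))\s+"
    r"(?P<city>[A-Z][A-Z .'-]*?),\s*(?P<state>[A-Z]{2})\s+(?P<zip>\d{5}(?:-\d{4})?)"
    # Phone and fax are optional.
    r"(?:\s+P:\s*(?P<phone>\(\d{3}\)\s*\d{3}-\d{4}))?"
    r"(?:\s+F:\s*(?P<fax>\(\d{3}\)\s*\d{3}-\d{4}))?"
)


def extract_contact(text: str) -> Optional[Dict[str, Optional[str]]]:
    # Returns the named fields, or None when no contact block is found.
    m = CONTACT.search(text)
    return m.groupdict() if m else None
```

Applied to the slide's example, this yields role `ARCHITECT`, name `MICHEAL KIM ARCHITECTURE`, street `1 HOLDEN STREET`, and city/state/ZIP `BROOKLINE`, `MA`, `02445`, plus both phone numbers. Real proposals would need more variants, for example multi-line blocks or P.O. boxes.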

Publishing Project: Example Rules: Key Project Data

Publishing Project: Process & Approach

Smart Search: Metadata, Metadata, Metadata
- Basic facets: date, people, organization, content type
- Advanced facets: materials, methods, project attributes, etc.
  - Context dependent
- Deep personalization: selection of facets by role, community, task, content
- Smart summarization
  - Better conceptual description
  - Complex summaries: key data, document sections, etc.
- Smart search: beyond simple relevancy
- Next: beyond search, with active agents that don't need questions
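A minimal sketch of "selection of facets by role": each result carries extracted metadata, and the facets shown (with their value counts) depend on who is searching. The role-to-facet table is a hypothetical example; roles without a mapping fall back to the basic facets from the slide.

```python
from collections import Counter
from typing import Dict, List

BASIC_FACETS = ["Date", "People", "Organization", "Content-Type"]

# Hypothetical role-to-facet mapping used for personalization.
FACETS_BY_ROLE: Dict[str, List[str]] = {
    "estimator": ["Project Attributes", "Materials", "Date"],
    "engineer": ["Materials", "Methods", "Date"],
}


def facet_counts(results: List[Dict[str, List[str]]], role: str) -> Dict[str, Counter]:
    # Roles without a mapping fall back to the basic facets.
    counts: Dict[str, Counter] = {}
    for facet in FACETS_BY_ROLE.get(role, BASIC_FACETS):
        counter: Counter = Counter()
        for doc in results:
            counter.update(doc.get(facet, []))
        counts[facet] = counter
    return counts
```

The facet values themselves would come from the extraction and categorization steps described earlier; this function only decides which of them to surface.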

Building on the Foundation: Applications
Pronoun Analysis: Fraud Detection; Enron Emails
- Patterns of "function" words reveal a wide range of insights
- Function words: pronouns, articles, prepositions, conjunctions, etc.
  - Used at a high rate, short and hard to detect, very social, processed in the brain differently than content words
- Areas: sex, age, power/status, personality; individuals and groups
- Lying/fraud detection: documents with lies have:
  - Fewer, shorter words; fewer conjunctions; more positive emotion words
  - More use of "if, any, those, he, she, they, you"; less use of "I"
- Current research: 76% accuracy in some contexts
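The cues listed above can be turned into numeric features. The sketch below computes them only and is not a fraud classifier; the 76% figure refers to the research cited on the slide, not to this code. The "distancing" words are the ones named on the slide, while the conjunction list and feature names are hypothetical choices.

```python
import re
from typing import Dict

# Word lists are illustrative; the "distancing" words are the ones named on the slide.
DISTANCING = {"if", "any", "those", "he", "she", "they", "you"}
CONJUNCTIONS = {"and", "but", "or", "because", "although", "while", "so", "yet"}


def function_word_profile(text: str) -> Dict[str, float]:
    words = re.findall(r"[a-z']+", text.lower())
    n = len(words) or 1  # avoid division by zero on empty input
    return {
        "avg_word_length": sum(len(w) for w in words) / n,
        "i_rate": words.count("i") / n,
        "distancing_rate": sum(w in DISTANCING for w in words) / n,
        "conjunction_rate": sum(w in CONJUNCTIONS for w in words) / n,
    }
```

A profile like this would typically be one input to a predictive model alongside other signals, rather than a verdict on its own.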

Building on the Foundation: Social Media
Beyond Simple Sentiment
- Beyond good and evil (positive and negative)
  - Degrees of intensity, complexity of emotions and documents
- Importance of context around positive and negative words
  - Rhetorical reversals: "I was expecting to love it"
  - Issues of sarcasm ("Really Great Product") and slanguage
- Essential: full categorization and concept extraction
- New taxonomies: appraisal groups ("not very good")
  - Support more subtle distinctions than positive or negative
- Emotion taxonomies: joy, sadness, fear, anger, surprise, disgust
  - New complex emotions: pride, shame, confusion, skepticism
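A minimal sketch of appraisal groups: a sentiment word is scored together with the modifiers directly to its left, so "not very good" differs from both "good" and "very good". The tiny lexicons and the choice to treat negation as "flip and soften" (multiply by -0.5) are assumptions made for illustration, not a published scheme.

```python
import re
from typing import List

# Hypothetical mini-lexicons; real appraisal taxonomies are far larger.
BASE = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0, "slightly": 0.5}
NEGATORS = {"not", "never", "hardly"}


def appraisal_scores(text: str) -> List[float]:
    words = re.findall(r"[a-z']+", text.lower())
    scores = []
    for i, word in enumerate(words):
        if word not in BASE:
            continue
        value = BASE[word]
        j = i - 1
        # Walk left over the modifiers that form the appraisal group.
        while j >= 0 and (words[j] in INTENSIFIERS or words[j] in NEGATORS):
            if words[j] in INTENSIFIERS:
                value *= INTENSIFIERS[words[j]]
            else:
                value *= -0.5  # negation flips and softens: "not good" is mildly negative
            j -= 1
        scores.append(value)
    return scores
```

With these lexicons, "very good" scores 1.5 and "not very good" scores -0.75. Rhetorical reversals such as "I was expecting to love it" and sarcasm are out of reach for a local window like this; they need the broader context the slide calls for.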

Building on the Foundation: Applications
Behavior Prediction: Telecom Customer Service
- Problem: distinguish customers likely to cancel from those making mere threats
- Basic rule/intention:
  (START_20, (AND, (DIST_7,"[cancel]", "[cancel-what-cust]"), (NOT,(DIST_10, "[cancel]", (OR, "[one-line]", "[restore]", "[if]")))))
- Examples:
  - "customer called to say he will cancell his account if the does not stop receiving a call from the ad agency."
  - "cci and is upset that he has the asl charge and wants it off or her is going to cancel his act"
- More sophisticated analysis of text and of context in text
- Combine text analytics with predictive analytics and traditional behavior monitoring for new applications
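A rough Python translation of the cancellation rule follows, assuming the bracketed names are concept word lists and reading START_20 as "consider only the first 20 words". Both readings are assumptions, and the word lists are hypothetical. They include spelling variants such as "cancell", which appears in the first sample note.

```python
import re
from typing import List, Set

# Hypothetical word lists standing in for the rule's [cancel], [cancel-what-cust],
# [one-line], [restore] and [if] concepts.
CANCEL = {"cancel", "cancell", "canceling", "cancelling", "disconnect"}
CANCEL_WHAT = {"account", "acct", "act", "service", "contract"}
EXCEPTIONS = {"one-line", "restore", "if", "unless"}


def near(tokens: List[str], a: Set[str], b: Set[str], n: int) -> bool:
    # DIST_n: some word from a within n tokens of some word from b.
    left = [i for i, t in enumerate(tokens) if t in a]
    right = [j for j, t in enumerate(tokens) if t in b]
    return any(abs(i - j) <= n for i in left for j in right)


def likely_to_cancel(note: str, start: int = 20) -> bool:
    # START_n is read here as "look only at the first n words" (an assumption).
    tokens = re.findall(r"[a-z'-]+", note.lower())[:start]
    return near(tokens, CANCEL, CANCEL_WHAT, 7) and not near(tokens, CANCEL, EXCEPTIONS, 10)
```

On the first sample note this returns False: "cancell" is near "account", but "if" follows within the exclusion window, which marks the note as a conditional threat rather than a firm intention.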

Building on the Foundation: Current Applications
- Survey analysis: add analysis of free text
- Automated essay scoring, second generation: moving beyond words (e.g., counting polysyllabic words) to meaning
- Storytelling: data-heavy domains such as sports and finance
  - 90% of news machine-written by 2025; books?
- Legal review/eDiscovery
  - TA: categorize and filter down to a smaller, more relevant set
  - The payoff is big: one firm with 1.6M docs saved $2M
- Voice of the customer/employee/voter
  - Analysis of blogs, tweets, social networks
  - Early identification of problems with products and services
- Customer relationship & brand management, fraud detection

Smart Text: New Directions: Integration
- Deep integration: text analytics
  - New forms of rules: combine text mining and text analytics
  - Incorporate clusters: a CLUSTER operator, like SENTENCE but more flexible and dynamic
- More dynamic sections
  - Build up from "categorization" of sentences, based on co-reference
  - Smaller units: appraisal taxonomies for subjects; build larger units from them
  - Complex units: collections of paragraphs based on meaning
  - Sentence-level sentiment techniques for subjects
- Smarter relevancy: not frequency; develop new scoring
- Hybrid machine-human: where, when, and how?
  - Development, tagging, usage, analytics, etc.
  - How do we get the best of both?
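A minimal sketch of "dynamic sections built up from categorized sentences": label each sentence, then merge runs of adjacent sentences that share a label into one section. Adjacency is a deliberate simplification of the co-reference linking the slide mentions, and `toy_label` is a hypothetical keyword labeller standing in for a real sentence categorizer.

```python
import re
from itertools import groupby
from typing import Callable, List, Tuple


def dynamic_sections(text: str, label: Callable[[str], str]) -> List[Tuple[str, str]]:
    # Split into sentences (naively), label each one, then merge adjacent
    # sentences that share a label into one section.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [(name, " ".join(group)) for name, group in groupby(sentences, key=label)]


def toy_label(sentence: str) -> str:
    # Hypothetical keyword labeller standing in for a real sentence categorizer.
    s = sentence.lower()
    if "cost" in s or "price" in s:
        return "pricing"
    if "steel" in s or "concrete" in s:
        return "materials"
    return "other"
```

Because sections are derived from sentence labels rather than from headings, they can appear in documents that have no explicit structure at all, which is what distinguishes them from the heading-based splitting of fixed sections.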

Conclusions
- Text analytics can feed and extend big data and cognitive science applications
- Discover structure in (un)structured text
  - Apply text analytics to sections of documents: new kinds of relevancy
- Create multiple views into the data inside text: smart, interactive search results (facets plus)
- Modular design: better search, new applications, Watson
- Future: cognitive computing, which learns, discovers patterns based on context, and is highly integrated, meaning-based, and highly interactive
  - Text analytics adds depth of meaning
- Future: Women, Fire, and Dangerous Things
  - Text analytics and cognitive science: metaphor analysis, deep language understanding, common sense?

Coming Soon! New book, due in November: Text Analytics: Everything You Need to Know to Conquer Information Overload, Mine Social Media for Real Value, and Turn Big Text into Big Data

Questions?
Tom Reamy
tomr@kapsgroup.com
KAPS Group: Knowledge Architecture Professional Services
http://www.kapsgroup.com