Text Analytics World: Future Directions of Text Analytics: Smarter, Bigger, and Better. Tom Reamy, Chief Knowledge Architect, KAPS Group; Program Chair – Text Analytics World.


Text Analytics World
Future Directions of Text Analytics: Smarter, Bigger, and Better
Tom Reamy, Chief Knowledge Architect, KAPS Group
Program Chair – Text Analytics World
Knowledge Architecture Professional Services

2 Text Analytics World Highlights
• Keynote – Peter Morville, Information Architecture+
• Keynote – Future of Text Analytics: Bigger, Better, Smarter
• Social Media and Enterprise Text Analytics – new techniques, new applications, new directions, integration
• Two panels of leading TA experts
  – Interactive: What you always wanted to know about TA, but were afraid to ask
• Great Companies: visit sponsors and hear great case studies
• Text Analytics Workshop – Thursday
• Logistics

3 Agenda
• Introduction:
  – Current State of Text Analytics – Survey / Report
• Enterprise Text Analytics
  – Search – still fundamental
  – Shift from information to business
• Social Media – Next Generation
  – Different World: Content, Structures, Applications
• Future of Text Analytics
  – Roadblocks, Deep Vision
• Questions

4 Introduction: KAPS Group
• Knowledge Architecture Professional Services – network of consultants
• Applied Theory – faceted taxonomies, complexity theory, natural categories, emotion taxonomies
• Services:
  – Strategy – IM & KM: Text Analytics, Social Media, Integration
  – Taxonomy/Text Analytics development, consulting, customization
  – Text Analytics Quick Start – Audit, Evaluation, Pilot
  – Social Media: text-based applications – design and development
• Partners – SAS, Smartlogic, Expert System, SAP, IBM, FAST, Concept Searching, Attensity, Clarabridge, Lexalytics
• Projects – portals, taxonomy, text analytics (news, expertise location), information strategy, text analytics evaluation, Quick Start in Text Analytics
• Clients: Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, World Bank, etc.
• Presentations, Articles, White Papers –

5 Introduction: Coming Soon
• New Book: Text Analytics: How to Conquer Information Overload and Get Real Value from Social Media
  – Due end of May
  – Free copy to workshop attendees
• One randomly selected person at the conference will receive a free copy – stay tuned!

6 Current State of Text Analytics
• History – academic research, focus on NLP
• Inxight – spun out of Xerox PARC
  – Moved TA from academic NLP to auto-categorization, entity extraction, and search metadata
• Explosion of companies – many based on Inxight extraction with some analytical/visualization front ends
  – Half of those from 2008 are gone; the lucky ones got bought
• Early applications – news aggregation and enterprise search
• Second Wave = shift to sentiment analysis
• Third Wave = multiple enterprise and social applications
  – Watson = new levels of excitement
  – Need a practical version

7 Current State of Text Analytics: Vendor Space
• Taxonomy Management – SchemaLogic, PoolParty
• Taxonomy and Semantic Networks, Text Analytics Solutions – Access Innovations, Luminoso
• Extraction and Analytics – Linguamatics (pharma), Temis, a whole range of companies
• Business Intelligence – ClearForest, Inxight
• Sentiment Analysis – Attensity, Lexalytics, Clarabridge
• Open Source – GATE
• Stand-alone text analytics platforms – IBM, SAS, SAP, Smartlogic, Expert System, Basis, Open Text, Megaputer, Temis, Concept Searching
• Embedded in content management and search – Autonomy, FAST, Endeca, Exalead, etc.
• Market mindshare – IBM, SAS, Clarabridge, Lexalytics

8 Current Market: Text Analytics Surveys, Seth Grimes Report
• Market – $2 billion
• Enterprise search – 30-50% of the market ($1 billion)
• Text analytics is growing 20% a year; it is 10% of analytics
• Fragmented market – no clear leader
• Social and Voice of the Customer are huge
• Investor money is still going mostly to social
• Cloud-based software as a service continues to grow
• Growth as a market has slowed; as a technique it is expanding
  – (My view: time for a new direction, a new characterization of the field, etc.)
• US market differs from Europe/Asia – project oriented

9 Seth Grimes Report + Interviews with Leading Analysts: Current Trends
• From mundane to advanced – from reducing manual labor to “Cognitive Computing”
• Enterprise – shift from information to business: cost cutting rather than productivity gains
• Embedded solutions – not called TA (but should be, because they suffer from weak TA)
• Graph databases (he has been saying this since 2010; he’ll be right one of these years) – Open Knowledge Graphs
• Human-machine – still need a human hybrid
• Rules – hard to maintain and to adapt to new text (the wrong kind of rules)

10 Seth Grimes Report: Current and Future Trends
• Top four in the Grimes survey:
  – Ability to generate taxonomies (64%)
  – Ability to use specialized taxonomies, ontologies, etc. (54%)
  – Broad information extraction (53%)
  – Document classification (53%)
• Top business applications:
  – Brand/product/reputation management (38%)
  – Voice of the Customer (39%)
  – Competitive Intelligence (33%)
  – Search, information access, etc. (29%)
  – (Research: 38%, not listed as a choice)

11 Seth Grimes Report: Current and Future Trends
• Users currently extract more, and more diverse, types of information and apply insights in new ways and for new purposes, yet user satisfaction still lags on accuracy and ease of use
• 74% satisfied with TA; only 4% disappointed
• Most dissatisfaction – ease of use (29%) and availability of professional services/support (50%)
• 48% likely to recommend their provider; 36% would recommend against

12 Enterprise Text Analytics
• Search is still #1 = 30-50% of applications
• New standard search – facets (more and more metadata), auto-categorization built on taxonomies, clustering
• Trend = Text Analytics/Search as semantic infrastructure
  – Platform for info apps (search-based applications)
• SharePoint – major focus of TA companies
  – Fix problems with taxonomy/folksonomy
  – Hybrid workflow: publish document -> TA analysis -> suggestions for categorization, entities, metadata -> present to author
• External information = more automation, extraction – precision more important
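A minimal Python sketch of the hybrid workflow above, in which text analytics proposes and the author decides. Everything here is hypothetical: the three-category `TAXONOMY`, its trigger terms, the `min_score` threshold, and the `analyze`/`publish` functions are illustrative names, and simple term overlap stands in for a real categorization engine. It is not how any particular SharePoint add-on works.

```python
import re
from dataclasses import dataclass, field

# Hypothetical taxonomy: category -> trigger terms (illustrative only).
TAXONOMY = {
    "Finance/Invoices": {"invoice", "payment", "billing"},
    "HR/Benefits": {"benefits", "enrollment", "pension"},
    "IT/Security": {"password", "phishing", "firewall"},
}


@dataclass
class Suggestion:
    category: str
    score: float                      # share of the category's trigger terms found
    evidence: list[str] = field(default_factory=list)


def analyze(text: str, min_score: float = 0.3) -> list[Suggestion]:
    """TA analysis step: suggest categories whose trigger terms occur in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    suggestions = []
    for category, terms in TAXONOMY.items():
        hits = sorted(terms & tokens)
        score = len(hits) / len(terms)
        if score >= min_score:
            suggestions.append(Suggestion(category, score, hits))
    return sorted(suggestions, key=lambda s: s.score, reverse=True)


def publish(text: str, author_review) -> dict:
    """Publish -> analyze -> present suggestions to the author -> keep what they accept."""
    suggestions = analyze(text)
    accepted = author_review(suggestions)   # the human stays in the loop
    return {"text": text, "categories": [s.category for s in accepted]}


doc = "Reminder: the invoice payment portal now requires a new password."
# Stand-in for the author: accept only the stronger suggestions.
record = publish(doc, author_review=lambda sugg: [s for s in sugg if s.score >= 0.5])
print(record["categories"])  # ['Finance/Invoices']
```

Keeping the author's review as a plain callback makes the human step explicit: suggestions are presented, not applied automatically.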

13 Enterprise Text Analytics: Adding Structure to Unstructured Content
• Beyond documents – categorization by corpus, by page, by section, or even by sentence or phrase
• Documents are not unstructured – they have a variety of structures:
  – Sections – from specific (“Abstract”) to functional (“Evidence”)
  – Corpus – document types/purpose
  – Textual complexity, level of generality
• Need to develop flexible categorization and taxonomy – from tweets to 200-page PDFs
• Applications require sophisticated rules, not just categorization by similarity


15 Enterprise Text Analytics: Document Type Rules
  (START_2000, (AND, (OR, _/article:"[Abstract]", _/article:"[Methods]"), (OR, _/article:"clinical trial*", _/article:"humans", (NOT, (DIST_5, (OR, _/article:"approved", _/article:"safe", _/article:"use", _/article:"animals"),
• If the article has sections like Abstract or Methods
  – AND has phrases around “clinical trials / humans”
  – and does not have words like “animals” within 5 words of the “clinical trial” words
  – then count it and add up a relevancy score
• Primary issue – major mentions, not every mention
  – Combination of noun phrase extraction and categorization
  – Results – virtually 100%
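The rule above can be approximated in Python to show how its pieces combine. This is a sketch under stated assumptions, not the vendor's rule engine: `START_2000` is read as "only the first 2,000 tokens count", the truncated second operand of `DIST_5` is taken from the explanation on the slide (the "clinical trial" terms), and `find_phrase` and `doc_type_score` are hypothetical helpers.

```python
import re

# Term lists taken from the rule on the slide.
SECTION_NAMES = {"abstract", "methods"}
TARGET_PHRASES = ["clinical trial*", "humans"]
EXCLUDED_WORDS = ["approved", "safe", "use", "animals"]
TRIAL_PHRASE = "clinical trial*"


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def find_phrase(tokens: list[str], phrase: str) -> list[int]:
    """Start positions of a phrase; a trailing '*' on a word is a prefix wildcard."""
    words = phrase.lower().split()
    hits = []
    for i in range(len(tokens) - len(words) + 1):
        if all(
            tokens[i + k].startswith(w[:-1]) if w.endswith("*") else tokens[i + k] == w
            for k, w in enumerate(words)
        ):
            hits.append(i)
    return hits


def doc_type_score(sections: dict[str, str], start: int = 2000, dist: int = 5) -> int:
    """Relevancy score for the 'clinical trial in humans' document type (0 = no match)."""
    # (OR, [Abstract], [Methods]): the article must have one of these sections.
    if not SECTION_NAMES & {name.lower() for name in sections}:
        return 0
    # START_2000, read here (an assumption) as: only the first 2,000 tokens count.
    tokens = tokenize(" ".join(sections.values()))[:start]
    # (NOT, (DIST_5, ...)): reject if an excluded word is within `dist` words of a trial phrase.
    trial_hits = find_phrase(tokens, TRIAL_PHRASE)
    for word in EXCLUDED_WORDS:
        if any(abs(e - t) <= dist for e in find_phrase(tokens, word) for t in trial_hits):
            return 0
    # Count every target mention to build the relevancy score.
    return sum(len(find_phrase(tokens, p)) for p in TARGET_PHRASES)


trial = {"Abstract": "We report a randomized clinical trial in humans.",
         "Methods": "Clinical trials enrolled 200 humans across sites."}
animal = {"Abstract": "The drug was tested in animals before any clinical trial."}
print(doc_type_score(trial), doc_type_score(animal))  # 4 0
```

A production rule would also weight mentions by position or section to capture "major mentions, not every mention"; this sketch simply counts them.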

16 Enterprise Text Analytics: Building on the Foundation – Applications
• Focus on business value, cost cutting
• Enhancing information access is a means, not an end
  – Governance, records management, document duplication, compliance
  – Applications – business intelligence, CI, behavior prediction
  – eDiscovery, litigation support
  – Risk management
  – Productivity/portals – spider and categorize, extract
  – KM communities and knowledge bases
• New sources – turn field notes into expertise and knowledge bases
  – Capture in real time, in the users’ own language and concepts

17 Enterprise Text Analytics: Applications – Pronoun Analysis: Fraud Detection; Enron Emails
• Function words = pronouns, articles, prepositions, conjunctions, etc.
  – Used at a high rate, short and hard to detect, very social, processed in the brain differently than content words
• Patterns of function words reveal a wide range of insights
• Areas: sex, age, power/status, personality – individuals and groups
• Lying/fraud detection: documents with lies have
  – Fewer and shorter words, fewer conjunctions, more positive emotion words
  – More use of “if, any, those, he, she, they, you”, less use of “I”
• Current research – 76% accuracy in some contexts
• Text analytics can improve accuracy and utilize new sources
• Combining it with data analytics can further improve accuracy
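To make the function-word cues concrete, here is a Python sketch that computes the features listed on the slide for a piece of text. It only extracts features; it is not the fraud-detection model from the research mentioned, and the word lists (especially `POSITIVE_EMOTION`) are small hypothetical stand-ins for validated dictionaries.

```python
import re

# Small illustrative word lists; real studies use much larger validated dictionaries.
CONJUNCTIONS = {"and", "but", "or", "nor", "so", "yet", "because", "although"}
OTHER_REFERENCE = {"if", "any", "those", "he", "she", "they", "you"}   # from the slide
POSITIVE_EMOTION = {"good", "great", "happy", "love", "excellent"}      # hypothetical


def function_word_profile(text: str) -> dict[str, float]:
    """Rates of the cues the slide lists, each as a share of all words."""
    words = re.findall(r"[a-z']+", text.lower())
    n = len(words) or 1                      # avoid division by zero on empty text

    def rate(hit) -> float:
        return sum(1 for w in words if hit(w)) / n

    return {
        "word_count": float(len(words)),
        "mean_word_length": sum(map(len, words)) / n,
        "conjunctions": rate(lambda w: w in CONJUNCTIONS),
        "positive_emotion": rate(lambda w: w in POSITIVE_EMOTION),
        "other_reference": rate(lambda w: w in OTHER_REFERENCE),
        "first_person_i": rate(lambda w: w == "i"),
    }


profile = function_word_profile("I think they said you would handle it if any problem came up")
print(profile["other_reference"], profile["first_person_i"])
```

In practice these rates would be compared against a baseline for the same writer or corpus, and combined with data analytics as the slide suggests, rather than read in isolation.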

18 Social Media: Next Generation – Beyond Simple Sentiment
• Beyond Good and Evil (positive and negative)
  – Degrees of intensity, complexity of emotions and documents
• Importance of context – around positive and negative words
  – Rhetorical reversals – “I was expecting to love it”
  – Issues of sarcasm (“Really Great Product”), slanguage
• Essential – need full categorization and concept extraction
• Voice of the Customer: must have
  – Need full text analytics to do it well
• New conceptual models, models of users, communities
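The importance of context around sentiment words can be illustrated with a deliberately simple Python sketch. The `LEXICON`, the negator list, the three-word window, and the idea of flipping polarity inside "expecting to" frames are all assumptions for illustration, not a published algorithm.

```python
import re

LEXICON = {"love": 1, "great": 1, "good": 1, "hate": -1, "bad": -1, "slow": -1}  # hypothetical
NEGATORS = {"not", "never", "no"}
# Expectation frames: sentiment inside them often describes what the writer expected,
# not what happened ("I was expecting to love it").
EXPECTATION_FRAMES = [("expecting", "to"), ("expected", "to"), ("hoping", "to")]


def polarity(text: str, window: int = 3) -> int:
    """Sum of lexicon scores, adjusted by the words just before each sentiment word."""
    words = re.findall(r"[a-z']+", text.lower())
    total = 0
    for i, word in enumerate(words):
        if word not in LEXICON:
            continue
        score = LEXICON[word]
        left = max(0, i - window)
        if any(w in NEGATORS for w in words[left:i]):
            score = -score                       # simple negation: "not good"
        if any(tuple(words[j:j + 2]) in EXPECTATION_FRAMES for j in range(left, i)):
            score = -score                       # rhetorical reversal
        total += score
    return total


for s in ["I love it", "I was expecting to love it", "It was not bad", "Really Great Product"]:
    print(s, "->", polarity(s))
```

A sarcastic “Really Great Product” still scores +1 here, which is the slide's point: keyword-level rules are not enough, and full categorization and concept extraction are needed.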

19 New Content Characteristics: It’s a Very Different World
• Scale – orders of magnitude: hundreds of millions, billions
• Speed – a million a day
• Size – Twitter, blogs, forums: 140 characters to a few sentences
• Quality – misspellings, lack of structure, incoherence
• Conversations – not stand-alone documents
  – Can’t tell what a “document” is about without reference to previous threads
• Purpose – communicate: social grooming, rants
  – Not exchange of ideas, policies, etc.
• Simple content complexity – single thoughts, simplicity of emotion

20 New Content Characteristics: It’s a Very Different World – Search and Taxonomy
• i tried very slow, NO GOOGLE search, some apps not working.. This is not a "with GOOGLE" My friend has incredible, that is much batter.. Anyways i returned samsung, replace incredible. What's great about it: 4" LCD What's not so great: NOT A GOOGLE PHONE
• (nt 2.0)willie John ci to/for: wanted to know about charges for pic mail for ;bill date 4/5/2010 | repeat: no | auth: pin | ptns affected: | information/instructions given: sup gave pic mail for free and gave adj for $ 2.40 new bal is $ | any mobile, anytime: n | ir: yes | ir- n |

21 New Content Characteristics: It’s a Very Different World – Topical Current Content
• Content not archived (for users)
• No real need for search (or just very simple search)
• Very poor (if any) metadata – no faceted search
• Focus on phrases and sentences, not documents
• Little need for a complex subject taxonomy
• About emotions, things, products, people
• Emotion – simple structures, infinite kinds of expression

22 It’s a Very Different World
• Companies are mining this resource, and they need to add structure to get a deeper understanding
• Varieties of structure:
  – Simple topical taxonomies, 2-3 levels
  – Emotion taxonomies, ontologies, and semantic networks
  – Dynamic taxonomies – built on public taxonomies and the enterprise taxonomy, exposed as hierarchical triples
• Need more automatic/semi-automatic solutions – advanced text analytics
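One way to read "dynamic taxonomies exposed as hierarchical triples" is a taxonomy stored as subject-predicate-object statements, similar to SKOS `broader` links. The Python sketch below is an illustration under that assumption; the terms, the plain-string predicate `"broader"`, and the helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical topical taxonomy as (narrower, predicate, broader) triples,
# in the spirit of SKOS "broader" statements.
TRIPLES = [
    ("Smartphones", "broader", "Mobile Devices"),
    ("Tablets", "broader", "Mobile Devices"),
    ("Mobile Devices", "broader", "Products"),
    ("Battery Life", "broader", "Product Features"),
    ("Product Features", "broader", "Products"),
]


def children(triples):
    """Map each broader term to its narrower terms."""
    kids = defaultdict(list)
    for narrower, predicate, broader in triples:
        if predicate == "broader":
            kids[broader].append(narrower)
    return kids


def path_to_root(term, triples):
    """Follow 'broader' links upward (assumes one parent per term and no cycles)."""
    parent = {n: b for n, p, b in triples if p == "broader"}
    path = [term]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]


def render(term, kids, depth=0):
    """Indented outline of the hierarchy below `term`."""
    lines = ["  " * depth + term]
    for child in sorted(kids.get(term, [])):
        lines += render(child, kids, depth + 1)
    return lines


# A new triple (e.g. mapped from an enterprise taxonomy) extends the hierarchy immediately.
TRIPLES.append(("Screen Size", "broader", "Product Features"))
print("\n".join(render("Products", children(TRIPLES))))
print(" > ".join(path_to_root("Smartphones", TRIPLES)))
```

Because the hierarchy is rebuilt from the triples on demand, appending a triple changes the outline at once, which is what makes the structure dynamic in this reading.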

23 New Kinds of Social Taxonomies
• New taxonomies – Appraisal
  – Appraisal groups – adjectives and modifiers: “not very good”
  – Four types – Attitude, Orientation, Graduation, Polarity
  – Supports more subtle distinctions than positive or negative
• Emotion taxonomies – Joy, Sadness, Fear, Anger, Surprise, Disgust
  – New complex – pride, shame, embarrassment, love, awe
  – New situational/transient – confusion, concentration, skepticism
• Beyond keywords – need text analytics
  – Analysis of phrases, multiple contexts – conditionals, oblique
  – Analysis of conversations – dynamics of exchange, private language
  – Enterprise taxonomy rolled into a categorization taxonomy
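A simplified Python sketch of extracting appraisal groups such as “not very good”. It assumes one reading of the four attributes: attitude type (affect, appreciation, judgment), orientation (positive or negative), graduation (intensifier force), and polarity (marked when negated). The lexicons and their force values are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-lexicons; a real system would use a much larger appraisal lexicon.
ADJECTIVES = {                    # head word -> (attitude type, orientation)
    "good": ("appreciation", 1),
    "beautiful": ("appreciation", 1),
    "happy": ("affect", 1),
    "angry": ("affect", -1),
    "dishonest": ("judgment", -1),
}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0, "slightly": 0.5}   # graduation (force)
NEGATORS = {"not", "never"}


@dataclass
class AppraisalGroup:
    head: str
    attitude: str       # affect, appreciation, or judgment
    orientation: int    # +1 / -1 after any negation
    graduation: float   # product of intensifier forces (1.0 = none)
    polarity: str       # "marked" if negated, else "unmarked"


def appraisal_groups(text: str) -> list[AppraisalGroup]:
    words = re.findall(r"[a-z]+", text.lower())
    groups = []
    for i, word in enumerate(words):
        if word not in ADJECTIVES:
            continue
        attitude, orientation = ADJECTIVES[word]
        force, negated = 1.0, False
        j = i - 1
        # Walk left over the modifiers that belong to this group.
        while j >= 0 and (words[j] in INTENSIFIERS or words[j] in NEGATORS):
            if words[j] in INTENSIFIERS:
                force *= INTENSIFIERS[words[j]]
            else:
                negated = not negated
            j -= 1
        groups.append(AppraisalGroup(word, attitude,
                                     -orientation if negated else orientation,
                                     force, "marked" if negated else "unmarked"))
    return groups


print(appraisal_groups("The screen is not very good"))
```

Keeping graduation and polarity separate from orientation is what allows finer distinctions than positive/negative: “not very good” and “extremely dishonest” are both negative here, but differ in attitude type, force, and polarity.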

24 Social Media: Next Generation – Variety of New Applications
• Crowdsourcing technical support
  – User forums – find the problem area, then nearby text for the solution
  – Automatic or human mediated
• Legal review
  – Significant trend – computer-assisted review (manual review cannot cope with so many documents)
  – TA – categorize and filter to a smaller, more relevant set
  – Payoff is big – one firm with 1.6M documents saved $2M
• Financial services
  – Trend – using text analytics with predictive analytics for risk and fraud
  – Combine unstructured text (why) and transaction data (what)
  – Customer relationship management, fraud detection
  – Stock market prediction – Twitter, impact articles

25 Social Media: Next Generation – Variety of New Applications
• Voice of the Customer (Employee, Voter)
  – Early discovery of issues with products, services, and customers
  – Identify opportunities for new products and services, sales, or feature improvements
  – Enable companies to find and understand correlations between promotional campaigns and customer reactions
  – Can lead to business or competitor intelligence
• Current tools are better at gathering information than analyzing it
• Possibilities are (almost) endless
• And a little bit scary – deep psychology, conservative vs. liberal brains

26 Social Media: Next Generation – Behavior Prediction: Telecom Customer Service
• Problem – distinguish customers likely to cancel from mere threats
• Basic rule:
  (START_20, (AND, (DIST_7, "[cancel]", "[cancel-what-cust]"), (NOT, (DIST_10, "[cancel]", (OR, "[one-line]", "[restore]", "[if]")))))
• Examples:
  – customer called to say he will cancell his account if the does not stop receiving a call from the ad agency.
  – cci and is upset that he has the asl charge and wants it off or her is going to cancel his act
• More sophisticated analysis of text and of context in text
• Combine text analytics with predictive analytics and traditional behavior monitoring for new applications
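The telecom rule can be approximated in Python to show how the proximity and negation operators interact. This sketch rests on assumptions throughout: the bracketed concepts are replaced by small hypothetical word lists (the slide does not show their definitions), `DIST_n` is read as "within n words", and the `START_20` scope is left out because its exact meaning in the vendor's rule language is not given.

```python
import re

# Hypothetical stand-ins for the bracketed concepts in the rule.
CONCEPTS = {
    "cancel": {"cancel", "cancell", "cancelled", "canceled", "disconnect"},
    "cancel-what-cust": {"account", "acct", "act", "service", "contract"},
    "one-line": {"line"},            # guess: cancelling a single line, not the account
    "restore": {"restore", "reactivate"},
    "if": {"if", "unless"},
}


def positions(words: list[str], concept: str) -> list[int]:
    return [i for i, w in enumerate(words) if w in CONCEPTS[concept]]


def within(words: list[str], a: str, b: str, dist: int) -> bool:
    """DIST_n: some mention of concept a lies within n words of some mention of b."""
    return any(abs(i - j) <= dist
               for i in positions(words, a) for j in positions(words, b))


def likely_to_cancel(note: str) -> bool:
    words = re.findall(r"[a-z]+", note.lower())
    wants_to_cancel = within(words, "cancel", "cancel-what-cust", 7)
    hedged = any(within(words, "cancel", c, 10) for c in ("one-line", "restore", "if"))
    return wants_to_cancel and not hedged   # (AND, DIST_7, (NOT, DIST_10 ...))


notes = [
    "customer called to say he will cancell his account if the does not stop "
    "receiving a call from the ad agency.",
    "cci and is upset that he has the asl charge and wants it off or her is going "
    "to cancel his act",
]
print([likely_to_cancel(n) for n in notes])  # [False, True]
```

Run on the two sample notes from the slide, this sketch flags only the second: in the first, "if" falls within ten words of "cancell", so the NOT clause discounts it as a conditional threat.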

27 Future of Text Analytics: Obstacles – Survey Results
• What factors are holding back adoption of TA?
  – Lack of clarity about TA and its business value – 47%
  – Lack of senior management buy-in – 8.5%
• Need an articulated strategic vision and an immediate practical win
• Issue – TA is strategic, but US companies want short-term projects
  – Sneak a project in, then build the infrastructure
  – Difficulty of speaking “enterprise”
• Integration issue – who owns the infrastructure? IT, the library, ?
  – IT understands infrastructure, but not text
  – Need interdisciplinary collaboration
  – Stanford is offering an English-Computer Science degree: close, but what is really needed is a library science-computer science degree

28 Future of Text Analytics: Primary Obstacle – Complexity
• Usability of software is one element
• More important is the difficulty of conceptual and document models
  – Language is easy to learn, hard to understand and model
• Need to add more intelligence (semantic networks) and ways for the system to learn – social feedback
• Customization – text analytics is heavily context dependent
  – Content, questions, taxonomy/ontology
  – Level of specificity – e.g., telecommunications
  – Specialized vocabularies, acronyms

29 New Directions in Text Analytics: Conclusions
• Text analytics is still growing: more mature applications and techniques
• Find the right balance between infrastructure and application focus
• Essential theme – integration: text and data, enterprise and social
• Big obstacles remain
  – A strategic vision of text analytics in the enterprise
  – A concrete, quick application to drive acceptance
• Future – Women, Fire, and Dangerous Things
  – Text analytics and cognitive science = metaphor analysis, deep language understanding, common sense?

30 New Directions in Text Analytics: Conclusions
• Bigger:
  – Big Data gets the press, but Big Text is bigger, and potentially more valuable
  – Needs more systemic solutions
  – Number and variety of TA applications still growing
• Better:
  – Libraries of modules
  – Ensemble methods
  – Cognitive computing – TA foundation
• Smarter:
  – Not AI, but smarts without waiting 50 years
• Great time to get into text analytics
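As an illustration of ensemble methods, here is a Python sketch of weighted voting over several categorizers. The three member models and their weights are hypothetical toys; the point is only the combination step, where each model votes for a label with its weight and the heaviest label wins.

```python
from collections import defaultdict
from typing import Callable

Classifier = Callable[[str], str]


# Three toy member models (hypothetical), each returning a label.
def rule_model(text: str) -> str:
    return "complaint" if "cancel" in text.lower() else "other"


def keyword_model(text: str) -> str:
    t = text.lower()
    return "complaint" if any(w in t for w in ("upset", "charge", "slow")) else "other"


def length_model(text: str) -> str:
    # Deliberately weak stand-in for a statistical model.
    return "complaint" if len(text.split()) > 12 else "other"


def weighted_vote(text: str, members: list[tuple[Classifier, float]]) -> str:
    """Each member votes for a label with its weight; the heaviest label wins."""
    votes: dict[str, float] = defaultdict(float)
    for classify, weight in members:
        votes[classify(text)] += weight
    # Ties go to the label that received its first vote earliest.
    return max(votes, key=votes.get)


MEMBERS = [(rule_model, 2.0), (keyword_model, 1.0), (length_model, 0.5)]
print(weighted_vote("wants the charge removed", MEMBERS))                           # other
print(weighted_vote("I am upset about the slow service and will cancel", MEMBERS))  # complaint
```

Giving the rule-based module the largest weight is a choice made for this sketch, not a recommendation: in the first example its "other" vote outweighs the keyword model's "complaint".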

Questions? Tom Reamy, Program Chair – Text Analytics World, KAPS Group