
1 Text Analytics World
Future Directions of Text Analytics: Smarter, Bigger, and Better
Tom Reamy, Chief Knowledge Architect, KAPS Group
Program Chair – Text Analytics World
Knowledge Architecture Professional Services
http://www.kapsgroup.com

2 Text Analytics World Highlights
• Keynote – Peter Morville, Information Architecture+
• Keynote – Future of Text Analytics – Bigger, Better, Smarter
• Social Media and Enterprise Text Analytics – new techniques, new applications, new directions – Integration
• Two Panels – leading TA experts: Interactive: What you always wanted to know about TA, but were afraid to ask.
• Great Companies: Visit Sponsors & hear great case studies
• Text Analytics Workshop – Thursday
• Logistics

3 Agenda
• Introduction:
  – Current State of Text Analytics
  – Survey / Report
• Enterprise Text Analytics
  – Search – still fundamental
  – Shift from information to business
• Social Media – Next Generation
  – Different World: Content, Structures, Applications
• Future of Text Analytics
  – Roadblocks, Deep Vision
• Questions

4 Introduction: KAPS Group
• Knowledge Architecture Professional Services – Network of Consultants
• Applied Theory – Faceted taxonomies, complexity theory, natural categories, emotion taxonomies
• Services:
  – Strategy – IM & KM – Text Analytics, Social Media, Integration
  – Taxonomy/Text Analytics development, consulting, customization
  – Text Analytics Quick Start – Audit, Evaluation, Pilot
  – Social Media: Text based applications – design & development
• Partners – SAS, Smart Logic, Expert Systems, SAP, IBM, FAST, Concept Searching, Attensity, Clarabridge, Lexalytics
• Projects – Portals, taxonomy, Text analytics – news, expertise location, information strategy, text analytics evaluation, Quick Start in Text A.
• Clients: Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, World Bank, etc.
• Presentations, Articles, White Papers – www.kapsgroup.com

5 Introduction: Coming Soon
• New Book: Text Analytics: How to Conquer Information Overload and Get Real Value from Social Media
• Due end of May
• Free Copy to Workshop Attendees
• One randomly selected person at the conference will receive a free copy – stay tuned!

6 Text Analytics World Current State of Text Analytics
• History – academic research, focus on NLP
• Inxight – out of Xerox PARC
  – Moved TA from academic and NLP to auto-categorization, entity extraction, and Search-Meta Data
• Explosion of companies – many based on Inxight extraction with some analytical-visualization front ends
  – Half from 2008 are gone – Lucky ones got bought
• Early applications – News aggregation and Enterprise Search
• Second Wave = shift to sentiment analysis
• Third Wave = Multiple Enterprise & Social Applications
  – Watson = New Levels of Excitement
  – Need practical version

7 Text Analytics World Current State of Text Analytics: Vendor Space
• Taxonomy Management – SchemaLogic, Pool Party
• Taxonomy & Semantic Networks – Text Analytics Solutions – Access Innovation, Luminoso
• Extraction and Analytics – Linguamatics (Pharma), Temis, whole range of companies
• Business Intelligence – Clear Forest, Inxight
• Sentiment Analysis – Attensity, Lexalytics, Clarabridge
• Open Source – GATE
• Stand alone text analytics platforms – IBM, SAS, SAP, Smart Logic, Expert System, Basis, Open Text, Megaputer, Temis, Concept Searching
• Embedded in Content Management, Search – Autonomy, FAST, Endeca, Exalead, etc.
• Market Mindshare – IBM, SAS, Clarabridge, Lexalytics

8 Current Market: Text Analytics Surveys, Seth Grimes Report
• Market – 2014 – $2Bil
• Enterprise search – 30-50% of market ($1Bil)
• Text Analytics is growing 20% a year, 10% of analytics
• Fragmented market – no clear leader
• Social and Voice of Customer is huge
• Money (investor) is still mostly social
• Cloud-based Software as Service continues to grow
• Growth as a market – slowed, as a technique – expanding
  – (Me – time for new direction, characterization of field, etc.)
• US market different than Europe/Asia – project oriented

9 Seth Grimes Report + Interviews Leading Analysts: Current Trends
• From Mundane to Advanced – reducing manual labor to “Cognitive Computing”
• Enterprise – Shift from Information to Business – cost cutting rather than productivity gains
• Embedded solutions – not called TA (but should be because they suffer from weak TA)
• Graph databases (saying since 2010 – he’ll be right one of these years): Open Knowledge Graphs
• Human-Machine – still need human hybrid
• Rules – hard to maintain and to adapt to new text (wrong kind of rules)

10 Seth Grimes Report Current and Future Trends
• Top four in Grimes survey:
  – Ability to generate taxonomies (64%)
  – Ability to use specialized taxonomies, ontologies, etc. (54%)
  – Broad information extraction (53%)
  – Document Classification (53%)
• Top business applications
  – Brand/product/reputation management (38%)
  – Voice of the Customer (39%)
  – Competitive Intelligence (33%)
  – Search, Info Access, etc. (29%)
  – (Research 38% – not listed as a choice)

11 Seth Grimes Report Current and Future Trends
• Currently extracting more, and more diverse, types of info, applying insights in new ways and for new purposes – yet user satisfaction still lagging – accuracy and ease of use
• 74% satisfied with TA – only 4% disappointed
• Most dissatisfaction – ease of use (29%) and availability of professional services/support (50%)
• 48% likely to recommend their provider – 36% would recommend against

12 Enterprise Text Analytics
• Search is still #1 = 30-50% of applications
• New Standard Search – facets (more and more metadata), auto-categorization built on taxonomies, clustering
• Trend = Text Analytics/Search as Semantic Infrastructure
  – Platform for Info Apps (Search-based applications)
• SharePoint – Major focus of TA companies – fix problems with taxonomy/folksonomy
  – Hybrid workflow – Publish document -> TA analysis -> suggestions for categorization, entities, metadata -> present to author
• External information = more automation, extraction – precision more important

13 Enterprise Text Analytics Adding Structure to Unstructured Content
• Beyond Documents – categorization by corpus, by page, sections or even sentence or phrase
• Documents are not unstructured – variety of structures
  – Sections – Specific – “Abstract” to Function “Evidence”
  – Corpus – document types/purpose
  – Textual complexity, level of generality
• Need to develop flexible categorization and taxonomy – tweets to 200 page PDF
• Applications require sophisticated rules, not just categorization by similarity


15 Enterprise Text Analytics Document Type Rules
• (START_2000, (AND, (OR, _/article:"[Abstract]", _/article:"[Methods]"), (OR, _/article:"clinical trial*", _/article:"humans", (NOT, (DIST_5, (OR, _/article:"approved", _/article:"safe", _/article:"use", _/article:"animals"),
• If the article has sections like Abstract or Methods
• AND has phrases around “clinical trials / Humans” and not words like “animals” within 5 words of “clinical trial” words – count it and add up a relevancy score
• Primary issue – major mentions, not every mention
  – Combination of noun phrase extraction and categorization
  – Results – virtually 100%
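The logic described on this slide can be approximated in plain Python. This is only a rough sketch, not the categorization engine's own rule language: it assumes START_2000 means "within the first 2000 words" and DIST_5 means "within 5 words", and the function name, tokenizer and scoring are all hypothetical.

```python
import re

# Section markers and exclusion words are taken from the slide's rule;
# everything else here is an illustrative assumption.
SECTION_MARKERS = {"[abstract]", "[methods]"}
EXCLUDE_WORDS = {"approved", "safe", "use", "animals"}


def tokenize(text):
    """Lowercase word tokens; square brackets are kept so '[Abstract]' survives."""
    return re.findall(r"[\w\[\]]+", text.lower())


def clinical_trial_score(text, start=2000, dist=5):
    """Count 'clinical trial*' / 'humans' mentions in the first `start` words
    that have no exclusion word within `dist` words (assumed semantics)."""
    tokens = tokenize(text)[:start]
    if not any(tok in SECTION_MARKERS for tok in tokens):
        return 0  # no Abstract/Methods section marker, so the rule does not apply
    score = 0
    for i, tok in enumerate(tokens):
        is_trial = tok == "clinical" and i + 1 < len(tokens) and tokens[i + 1].startswith("trial")
        if not (is_trial or tok == "humans"):
            continue
        nearby = tokens[max(0, i - dist): i + dist + 2]
        if EXCLUDE_WORDS.intersection(nearby):
            continue  # e.g. "animals" close to the mention: skip it
        score += 1  # each clean mention adds to the relevancy score
    return score
```

A document would then be assigned the "clinical trial on humans" type when the score passes some threshold, which would have to be tuned on real content.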

16 Enterprise Text Analytics Building on the Foundation: Applications
• Focus on business value, cost cutting
• Enhancing information access is a means, not an end
  – Governance, Records Management, Doc duplication, Compliance
  – Applications – Business Intelligence, CI, Behavior Prediction
  – eDiscovery, litigation support
  – Risk Management
  – Productivity / Portals – spider and categorize, extract
  – KM communities & knowledge bases
• New sources – field notes into expertise, knowledge base – capture real time, own language-concepts

17 Enterprise Text Analytics: Applications Pronoun Analysis: Fraud Detection; Enron Emails
• Function words = pronouns, articles, prepositions, conjunctions, etc.
  – Used at a high rate, short and hard to detect, very social, processed in the brain differently than content words
• Patterns of “Function” words reveal wide range of insights
• Areas: sex, age, power-status, personality – individuals and groups
• Lying / Fraud detection: Documents with lies have:
  – Fewer, shorter words, fewer conjunctions, more positive emotion words
  – More use of “if, any, those, he, she, they, you”, less “I”
• Current research – 76% accuracy in some contexts
• Text Analytics can improve accuracy and utilize new sources
• Combining with Data analytics can improve accuracy
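The function-word signals listed on this slide can be turned into simple per-document features. The sketch below only computes rates. It is not a lie detector, and the conjunction list and the function name are illustrative assumptions rather than anything from the research the slide refers to.

```python
import re
from collections import Counter

# Words the slide lists as more frequent in deceptive documents.
OTHER_REFERENCE = {"if", "any", "those", "he", "she", "they", "you"}
# Small illustrative conjunction list (an assumption, not a complete lexicon).
CONJUNCTIONS = {"and", "but", "or", "because", "so", "although"}


def function_word_profile(text):
    """Return simple rates for the function-word signals named on the slide."""
    # Contractions such as "I'm" stay whole, so they are not counted as "I".
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {"word_count": 0, "mean_word_length": 0.0,
                "other_reference_rate": 0.0, "first_person_rate": 0.0,
                "conjunction_rate": 0.0}
    counts = Counter(words)
    return {
        "word_count": total,
        "mean_word_length": sum(len(w) for w in words) / total,
        "other_reference_rate": sum(counts[w] for w in OTHER_REFERENCE) / total,
        "first_person_rate": counts["i"] / total,
        "conjunction_rate": sum(counts[w] for w in CONJUNCTIONS) / total,
    }
```

Features like these would normally feed a trained classifier alongside other evidence, since any single rate means little on its own.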

18 Social Media: Next Generation Beyond Simple Sentiment
• Beyond Good and Evil (positive and negative)
  – Degrees of intensity, complexity of emotions and documents
• Importance of Context – around positive and negative words
  – Rhetorical reversals – “I was expecting to love it”
  – Issues of sarcasm, (“Really Great Product”), slanguage
• Essential – need full categorization and concept extraction
• Voice of the Customer: Must Have
  – Need full Text Analytics to do well
• New conceptual models, models of users, communities

19 New Content Characteristics It’s a Very Different World
• Scale – orders of magnitude – 100’s of millions, Billions
• Speed – 20-100 million a day
• Size – Twitter, Blogs, forums, email – 140 characters to a few sentences
• Quality – misspellings, lack of structure, incoherence
• Conversations – not stand alone docs
  – Can’t tell what a “document” is about without reference to previous threads
• Purpose – communicate – social grooming, rant
  – Not exchange of ideas, policies, etc.
• Simple Content Complexity – single thoughts, simplicity of emotion

20 New Content Characteristics It’s a Very Different World – Search and Taxonomy
• i tried very slow, NO GOOGLE search, some apps not working.. This is not a "with GOOGLE" My friend has incredible, that is much batter.. Anyways i returned samsung, replace incredible. What's great about it: 4" LCD What's not so great: NOT A GOOGLE PHONE
• (nt 2.0)willie John ci to/for: wanted to know about charges for pic mail for ;bill date 4/5/2010 | repeat: no | auth: pin | ptns affected: 7777777777 | information/instructions given: sup gave pic mail for free and gave adj for $ 2.40 new bal is $ 147.53 | any mobile, anytime: n | ir: yes | ir-email: n |

21 New Content Characteristics It’s a Very Different World – Topical Current Content
• Content not archived (for users)
• No real need for search (or just very simple search)
• Very Poor (if any) metadata – not faceted search
• Focus on phrases, sentences – not documents
• Little need of a complex subject taxonomy
• About emotions, things, products, people
• Emotion – simple structures, infinite kinds of expression

22 It’s a Very Different World
• Companies are mining this resource and they need to add structure to get deeper understanding
• Varieties of structure:
  – Simple topical taxonomies 2-3 levels
  – Emotion taxonomies, Ontologies and Semantic Networks
  – Dynamic taxonomies – built on public taxonomies, enterprise taxonomy – exposed in hierarchical triples.
• Need more automatic / semi-automatic solutions
  – Advanced text analytics

23 New Kinds of Social Taxonomies
• New Taxonomies – Appraisal
  – Appraisal Groups – Adjective and modifiers – “not very good”
  – Four types – Attitude, Orientation, Graduation, Polarity
  – Supports more subtle distinctions than positive or negative
• Emotion taxonomies – Joy, Sadness, Fear, Anger, Surprise, Disgust
  – New Complex – pride, shame, embarrassment, love, awe
  – New situational/transient – confusion, concentration, skepticism
• Beyond Keywords – Need Text Analytics
  – Analysis of phrases, multiple contexts – conditionals, oblique
  – Analysis of conversations – dynamic of exchange, private language
  – Enterprise taxonomy rolled into a categorization taxonomy

24 Social Media: Next Generation Variety of New Applications
• Crowd Sourcing Technical Support
  – User Forums – find problem area, nearby text for solution
  – Automatic or Human mediated
• Legal Review
  – Significant trend – computer-assisted review (manual = too many)
  – TA – categorize and filter to smaller, more relevant set
  – Payoff is big – One firm with 1.6 M docs – saved $2M
• Financial Services
  – Trend – using text analytics with predictive analytics – risk and fraud
  – Combine unstructured text (why) and transaction data (what)
  – Customer Relationship Management, Fraud Detection
  – Stock Market Prediction – Twitter, impact articles

25 Social Media: Next Generation Variety of New Applications
• Voice of the Customer (Employee, Voter)
  – Early discovery of issues with product, service, customer issues
  – Identify opportunities for new products and service, sales or new feature improvements
  – Enable companies to find and understand correlations between promotional campaigns and customer reactions
  – It can lead to business or competitor intelligence
• Current – better at gathering information than analyzing
• Possibilities are (almost) endless
• And a little bit scary – deep psychology, conservative-liberal brains

26 Social Media: Next Generation Behavior Prediction – Telecom Customer Service
• Problem – distinguish customers likely to cancel from mere threats
• Basic Rule
  – (START_20, (AND, (DIST_7, "[cancel]", "[cancel-what-cust]"), (NOT, (DIST_10, "[cancel]", (OR, "[one-line]", "[restore]", "[if]")))))
• Examples:
  – customer called to say he will cancell his account if the does not stop receiving a call from the ad agency.
  – cci and is upset that he has the asl charge and wants it off or her is going to cancel his act
• More sophisticated analysis of text and context in text
• Combine text analytics with Predictive Analytics and traditional behavior monitoring for new applications
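One reading of the basic rule above can be sketched in Python. The concept vocabularies below stand in for the bracketed concepts ("[cancel]", "[cancel-what-cust]", "[one-line]", "[restore]", "[if]") and are entirely hypothetical, as is the function name. START_20 is assumed to mean a cancel term within the first 20 words, and DIST_n is assumed to mean "within n words".

```python
import re

# Hypothetical stand-ins for the rule's concept lists (not from the slide).
CANCEL = {"cancel", "cancell", "cancels", "canceling", "cancelling"}
CANCEL_WHAT = {"account", "acct", "act", "service", "contract"}
ONE_LINE = {"line"}
RESTORE = {"restore", "reinstate"}
IF_WORDS = {"if", "unless"}


def positions(tokens, vocab):
    """Indexes of all tokens that belong to a concept vocabulary."""
    return [i for i, tok in enumerate(tokens) if tok in vocab]


def within(pos_a, pos_b, dist):
    """True when some position in pos_a is at most `dist` words from one in pos_b."""
    return any(abs(a - b) <= dist for a in pos_a for b in pos_b)


def likely_to_cancel(note, start=20):
    """Fire when a cancel term near an account term is not hedged by a
    conditional, single-line or restore term (assumed rule semantics)."""
    tokens = re.findall(r"[a-z']+", note.lower())
    cancel_pos = [i for i in positions(tokens, CANCEL) if i < start]
    if not cancel_pos:
        return False
    has_object = within(cancel_pos, positions(tokens, CANCEL_WHAT), 7)
    hedged = within(cancel_pos, positions(tokens, ONE_LINE | RESTORE | IF_WORDS), 10)
    return has_object and not hedged
```

Under these assumptions the first example note on the slide does not fire, because "if" appears within ten words of "cancell", which matches the goal of separating conditional threats from firm intent.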

27 Future of Text Analytics Obstacles - Survey Results
• What factors are holding back adoption of TA?
  – Lack of clarity about TA and business value - 47%
  – Lack of senior management buy-in - 8.5%
• Need articulated strategic vision and immediate practical win
• Issue – TA is strategic, US wants short term projects
  – Sneak Project in, then build infrastructure – difficulty of speaking enterprise
• Integration Issue – who owns infrastructure? IT, Library, ?
  – IT understands infrastructure, but not text
  – Need interdisciplinary collaboration – Stanford is offering English-Computer Science Degree – close, but really need a library-computer science degree

28 Future of Text Analytics Primary Obstacle: Complexity
• Usability of software is one element
• More important is difficulty of conceptual-document models
  – Language is easy to learn, hard to understand and model
• Need to add more intelligence (semantic networks) and ways for the system to learn – social feedback
• Customization – Text Analytics – heavily context dependent
  – Content, Questions, Taxonomy-Ontology
  – Level of specificity – Telecommunications
  – Specialized vocabularies, acronyms

29 New Directions in Text Analytics Conclusions
• Text Analytics still growing: more mature applications and technique
• Find the right balance of infrastructure and application focus
• Essential theme – integration – text and data, enterprise and social
• Big obstacles remain
  – Strategic Vision of text analytics in the enterprise
  – Concrete and quick application to drive acceptance
• Future – Women, Fire, and Dangerous Things
  – Text Analytics and Cognitive Science = Metaphor Analysis, deep language understanding, common sense?

30 New Directions in Text Analytics Conclusions
• Bigger:
  – Big Data gets the press, but Big Text is bigger – and potentially more valuable
  – Needs more systemic solutions
  – Number and variety of TA Applications still growing
• Better:
  – Libraries of Modules
  – Ensemble Methods
  – Cognitive Computing – TA Foundation
• Smarter:
  – Not AI, but smarts without waiting for 50 years
• Great Time to get into Text Analytics

31 Questions? Tom Reamy Program Chair – Text Analytics World tomr@kapsgroup.com KAPS Group http://www.kapsgroup.com

